Developer quick reference

Disclaimer

⚠️ This is a quick-reference guide, not a complete guide to making a plugin. Use this to copy-paste commands while working on plugins and to troubleshoot them in the testing and scheduling stages. Please consult the official 📗Plugin Tutorials for detailed guidance.

Tips

ℹ️ Plugin=App

📗 = recommended code docs and tutorials from Sage.

👉 First make a minimalistic app with a core functionality to test on the node. Later you may add all the options you want.

☝ Avoid making a plugin from scratch. Use another plugin or this template for your first plugin or use 🆕 Cookiecutter Template.

⚠️ Repository names should contain only lowercase letters, digits, and hyphens (-). Do not use underscores (_).

Requirements : Install Docker, git, and Python

Components of a plugin

Typical components of a Sage plugin are described below:

1. An application

This is just your usual Python program, either a single .py script or a set of directories with many components (e.g. ML models, unit tests, test data, etc).

👉 First do this step on your machine and perfect it until you are happy with the core functionality.

app/app.py* : the main Python file (sometimes named main.py). It contains the code that defines the functionality of the plugin or calls other scripts to do the tasks. It usually imports the plugin interface with from waggle.plugin import Plugin to get data from the built-in sensors and publish the output.

Note: Variable names in plugin.publish should be descriptive and specific.

Install pywaggle: pip3 install -U 'pywaggle[all]'
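As a rough illustration of the app described above, a minimal app.py might look like the sketch below. The measurement name env.temperature.mean, the summarize helper, and the sample readings are hypothetical placeholders. The sketch assumes pywaggle's Plugin context manager and its publish(name, value) call. The waggle import sits inside main() so the core logic can be tested on a machine without pywaggle.

```python
# app.py: a minimal plugin sketch (names and values are illustrative assumptions)


def summarize(samples):
    """Return the mean of the samples; stands in for the real core functionality."""
    return sum(samples) / len(samples)


def run(plugin, samples):
    """Compute one value and publish it under a descriptive, specific name."""
    value = summarize(samples)
    plugin.publish("env.temperature.mean", value)
    return value


def main():
    # Imported here so that summarize() and run() can be unit-tested without pywaggle.
    from waggle.plugin import Plugin

    with Plugin() as plugin:
        run(plugin, [21.5, 22.0, 22.5])  # hypothetical readings
```

In a real app.py you would call main() at the end of the file. Keeping the computation in plain functions makes it easy to cover them in app/test.py.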

app/test.py : optional but recommended file, contains the unit tests for the plugin.

2. Dockerizing the app

👉 Put everything in a Docker container using a waggle base image and make it work. This may require some work if libraries are not compatible. Always use the latest base images from Docker Hub.

Dockerfile* : contains instructions for building a Docker image for the plugin. It specifies the waggle base image from dockerhub, sets up the environment, installs dependencies, and sets the entrypoint for the container.

⚠️ Keep it simple: ENTRYPOINT ["python3", "/app/app.py"]

requirements.txt* : lists the Python dependencies for the plugin. It is used by the Dockerfile to install the required packages using pip.

build.sh : is an optional shell script to automate building the complicated Docker image with tags etc.

Makefile : an optional but recommended file that includes commands for building the Docker image, running tests, and deploying the plugin.

3. ECR configs and docs

You can do this step (except sage.yaml) after testing on the node but before the ECR submission. 😄

sage.yaml* : the configuration file used for ECR and job submission. Most importantly, it specifies the version and input arguments.

README.md and ecr-meta/ecr-science-description.md* : a Markdown file describing the scientific rationale of the plugin as an extended abstract. This includes a description of the plugin, installation instructions, usage examples, data downloading code snippets, and other relevant information.

💡 Keep the same text in both files and follow the template of ecr-science-description.md.

ecr-meta/ecr-icon.jpg : is an icon (512px x 512px or smaller) for the plugin in the Sage portal.

ecr-meta/ecr-science-image.jpg : is a key image or figure plot that best represents the scientific output of the plugin.

info

📗 Check Sage Tutorial Part 1 and Part 2

Getting access to the node

Follow this page: https://portal.sagecontinuum.org/account/access to access the nodes.
To test your connection the first time, execute ssh waggle-dev-sshd and enter your ssh key passphrase. You should get the following output,

Enter passphrase for key /Users/bhupendra/.ssh/id_rsa:
no command provided
Connection to 192.5.86.5 closed.

Enter the passphrase to continue.

To connect to the node, execute ssh waggle-dev-node-V032 and enter your passphrase (required twice).

You should see the following message,

We are connecting you to node V032

info

📗 See Sage Tutorial: Part 3 for details on this topic.

Testing plugins on the nodes

danger

⚠️ Do not run any app or install packages directly on the node. Use Docker container or pluginctl commands.

1. Download and run it

Download

If you have not done so already, put your plugin in a public GitHub repository at this stage.
To test the app on a node, go to a node W0xx (e.g. W023) and clone your repo there using the command git clone.
At this stage, you can play with your plugin in the Docker container until you are happy. If you make changes to the plugin on the node, I recommend replicating them in your local repository and pushing them to GitHub and the node.
Alternatively, commit from the node with git commit -am 'changes from node' and git push -u origin main.
However, before committing from the node, you must run the following commands at least once in your git repository on the node: git config [--local] user.name "Full Name" and git config [--local] user.email "email@address.com"

danger

⚠️ Make sure your Dockerfile has a proper entrypoint or the pluginctl run will fail.

Testing with Pluginctl

info

📗 For more details on this topic check pluginctl docs.

Then, to test, execute the command sudo pluginctl build . (the single dot is the current directory). This will output the plugin-image registry address at the end of the build. Example: 10.31.81.1:5000/local/my-plugin-name
To run the plugin without input arguments, use sudo pluginctl deploy -n <some-unique-name> <10.31.81.1:5000/local/my-plugin-name>
To run it with input arguments, use sudo pluginctl deploy -n <some-unique-name> <10.31.81.1:5000/local/my-plugin-name> -- -i top_camera
If you need a GPU, add the selector: sudo pluginctl deploy -n <some-unique-name> --selector resource.gpu=true <10.31.81.1:5000/local/my-plugin-name> -- -i top_camera
❗ -- is a separator. After the -- all arguments are for your entrypoint i.e. app.py.
To check running plugins, execute sudo pluginctl ps.
To stop the plugin, execute sudo pluginctl rm cloud-motion, where cloud-motion is the name given with -n.
To check the logs, execute sudo pluginctl logs cloud-motion.
⚠️ Do not forget to stop the plugins after testing or they will run forever.
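To make the role of the -- separator concrete, here is a small Python sketch that assembles a pluginctl command line from its parts. The helper name pluginctl_cmd and the deployment name my-test are hypothetical. It uses only the flags that appear in this guide (-n, --selector, --privileged); whether a given subcommand accepts each flag should be checked against the pluginctl docs.

```python
import shlex


def pluginctl_cmd(name, image, plugin_args=(), subcommand="deploy",
                  selector=None, privileged=False):
    """Assemble a pluginctl command line as a list of arguments."""
    cmd = ["sudo", "pluginctl", subcommand, "-n", name]
    if selector:
        cmd += ["--selector", selector]
    if privileged:
        cmd.append("--privileged")
    cmd.append(image)
    if plugin_args:
        # Everything after "--" goes to the plugin entrypoint, not to pluginctl.
        cmd += ["--", *plugin_args]
    return cmd


cmd = pluginctl_cmd(
    "my-test",  # hypothetical deployment name
    "10.31.81.1:5000/local/my-plugin-name",
    plugin_args=["-i", "top_camera"],
)
print(shlex.join(cmd))
```

Building the command as a list keeps pluginctl options and plugin arguments visibly separate, and the list can be passed directly to subprocess.run on the node.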

Testing USBSerial devices

👉 The USB serial device template is in the Cookiecutter Template. Also check the wxt536 plugin.

Steps for working with a USB serial device

First, you need to confirm which computing unit the USB device is connected to, RPi or nxcore.
Then, add the --selector and --privileged options to the pluginctl command during testing, and specify the same in the job.yaml for scheduling.
To test the plugin on nxcore, which has the USB device, use the command sudo pluginctl run -n testname --selector zone=core --privileged 10.31.81.1:5000/local/plugin-name.
The --selector and --privileged attributes should be added to the pluginSpec in the job submission script as shown in the example YAML code.
You can check which computing unit is being used by the edge scheduler by running the kubectl describe pod command and checking the output.

⚠️ If you get empty output or a folder-not-found error, recheck that you are using the correct USB port for the device.

2. Check if it worked?

3. Check why it failed?

When you encounter a failing/long pending job with an error, you can use the following steps to help you diagnose the issue:

First check the Dockerfile entrypoint.
Use the command sudo kubectl get pod to get the name of the pod associated with the failing job.
Use the command sudo kubectl logs <POD_NAME> to display the logs for the pod associated with the failing job. These logs will provide you with information on why the job failed.
Use the command sudo kubectl describe pod <POD_NAME> to display detailed information about the pod associated with the failing job. This information can help you identify any issues with the pod itself, such as issues with its configuration or resources.

By following these steps, you can better understand why the job is failing and take steps to resolve the issue.
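If you repeat these steps often, they can be wrapped in a small helper. The sketch below is only an illustration: the function name diagnose_pod is hypothetical, and it assumes you already looked up the pod name with sudo kubectl get pod. It runs the logs and describe commands from the steps above and collects their output.

```python
import subprocess


def diagnose_pod(pod_name, run=subprocess.run):
    """Run the kubectl logs and describe steps for one pod and return their output."""
    steps = [
        ["sudo", "kubectl", "logs", pod_name],
        ["sudo", "kubectl", "describe", "pod", pod_name],
    ]
    outputs = {}
    for cmd in steps:
        result = run(cmd, capture_output=True, text=True)
        # Keep stderr as well, since error messages are often written there.
        outputs[" ".join(cmd[1:])] = result.stdout + result.stderr
    return outputs
```

The run parameter defaults to subprocess.run and exists only so the helper can be exercised without a cluster.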

4. Troubleshooting inside the container using pluginctl

Follow this tutorial to get into an already running container to troubleshoot the issue. If the plugin fails instantly and you are not able to get inside the container, use the following commands to override the entrypoint.

First, deploy with a custom entrypoint (--entrypoint /bin/bash):

sudo pluginctl deploy -n testnc --entrypoint /bin/bash 10.31.81.1:5000/local/plugin-mobotix-scan -- -c 'while true; do date; sleep 1; done'

Note the -c 'while true; do date; sleep 1; done' instead of your usual plugin arguments. Now if you do sudo pluginctl logs testnc you will see the logs i.e. date.

Access the Plugin Container: sudo pluginctl exec -ti testnc -- /bin/bash

Edge Code Repository

How to get your plugin on ECR

To publish your Plugin on ECR, follow these steps:

Go to https://portal.sagecontinuum.org/apps/.
Click on "Explore the Apps Portal".
Click on "My Apps". You must be logged in to continue.
Click "Create App" and enter your Github Repo URL.
Click "Register and Build App".
On your app page, click on the "Tags" tab to get the registry link you need to run the job on the node, using either pluginctl or a job script. It will look like: docker pull registry.sagecontinuum.org/bhupendraraut/mobotix-move:1.23.3.2
Repeat the above process for updating the plugin.

warning

After the build process is complete, you need to make the plugin public to schedule it.

👉 If you have skipped step 3. ECR Configs and Docs, do it before submitting it to the ECR. Ensure that your ecr-meta/ecr-science-description.md and sage.yaml files are properly configured for this process.

Versioning your code

danger

You cannot resubmit the plugin to ECR with the same version number.

So think about how you will change the version every time you resubmit to ECR, and settle on your own versioning style. 🤔
I use 'vx.y.m.d', e.g. 'v0.23.3.4', but then I can only have one version a day, so now I am thinking of adding an incremental integer to it.
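A date-based scheme like this is easy to generate in code. The sketch below is one possible take, not an ECR requirement: the function name make_version and the optional trailing build counter are assumptions, and whether ECR accepts a fifth version component is not confirmed here.

```python
from datetime import date


def make_version(major, day, build=None):
    """Format a version as major.yy.m.d, with an optional build counter appended."""
    parts = [major, day.year % 100, day.month, day.day]
    if build is not None:
        parts.append(build)
    return ".".join(str(p) for p in parts)


print(make_version(0, date(2023, 3, 4)))           # 0.23.3.4
print(make_version(0, date(2023, 3, 4), build=2))  # 0.23.3.4.2
```

Pass date.today() as the second argument to stamp the current day.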

After ECR registry test (generally not required)

Plugins that tested successfully generally just work. However, if they fail in scheduled jobs after running for a while, or after passing the tests above, do the following.
To test a plugin on a node after it has been built on the ECR, run: sudo pluginctl run --name test-run registry.sagecontinuum.org/bhupendraraut/cloud-motion:1.23.01.24 -- -input top
This command runs the plugin from the specified ECR image (version 1.23.01.24) and passes the "-input top" argument to the plugin. Note the -- after the image, which tells pluginctl that the arguments that follow are for the plugin.

👉 Note the use of sudo in all pluginctl and docker commands on the node.

Assuming that the plugin has been installed correctly and the ECR image is available, running this command should test the cloud-motion plugin on the node.

If this fails, you may also have to run the kubectl commands for the pod, as in the testing section.

Scheduling the job

warning

❗ If you get an error like registry does not exist in ECR, then check that your plugin is made public.

Follow this link to get an understanding of how to submit a job
Here are the parameters we set for the Mobotix sampler plugin,

-name thermalimaging registry.sagecontinuum.org/bhupendraraut/mobotix-sampler:1.22.4.13 \
   --ip 10.31.81.14 \
   -u userid \
   -p password \
   --frames 1 \
   --timeout 5 \
   --loopsleep 60

Your science rule can be a cronjob (more information can be found here).
This runs every 15 minutes: "thermalimaging": cronjob("thermalimaging", "*/15 * * * *").
Use Crontab Guru to check your cron expressions.
You can also make it triggered by a value. Please read this for supported functions.

Scheduling scripts

✨ Check user friendly job submission UI.

📗 Check sesctl docs for command line tool.

☝ Do not use _, upper case letters or . in the job name. Use only lowercase letters, numbers and -.
☝ Ensure that the plugin is set to 'public' in the Sage app portal.
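The naming rule above can be checked before you submit a job. This is a minimal sketch that encodes the rule as stated here (lowercase letters, numbers, and - only); the function name is_valid_job_name is hypothetical, and the scheduler may enforce further limits not covered by it.

```python
import re

# Lowercase letters, digits, and hyphens only, as described in the tips above.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_job_name(name):
    """Return True if the name uses only the allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["atmoswxt", "w030-k3s-upgrade-test", "My_Job", "cloud.motion"]:
    print(candidate, is_valid_job_name(candidate))
```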

`job.yaml` example for USB device

name: atmoswxt
plugins:
- name: waggle-wxt536
  pluginSpec:
    image: registry.sagecontinuum.org/jrobrien/waggle-wxt536:0.23.4.13
    privileged: true
    selector:
      zone: core
nodeTags: []
nodes:
  W057: true
  W039: true
scienceRules:
- 'schedule("waggle-wxt536"): cronjob("waggle-wxt536", "1/10 * * * *")'
successCriteria:
- WallClock('1day')

Multiple jobs example

If you do not want all your plugins to run at the same time, use this example.

name: w030-k3s-upgrade-test
plugins:
- name: object-counter-bottom
  pluginSpec:
    image: registry.sagecontinuum.org/yonghokim/object-counter:0.5.1
    args:
    - -stream
    - bottom_camera
    - -all-objects
    selector:
      resource.gpu: "true"
- name: cloud-cover-bottom
  pluginSpec:
    image: registry.sagecontinuum.org/seonghapark/cloud-cover:0.1.3
    args:
    - -stream
    - bottom_camera
    selector:
      resource.gpu: "true"
- name: surfacewater-classifier
  pluginSpec:
    image: registry.sagecontinuum.org/seonghapark/surface_water_classifier:0.0.1
    args:
    - -stream
    - bottom_camera
    - -model
    - /app/model.pth
- name: avian-diversity-monitoring
  pluginSpec:
    image: registry.sagecontinuum.org/dariodematties/avian-diversity-monitoring:0.2.5
    args:
    - --num_rec
    - "1"
    - --silence_int
    - "1"
    - --sound_int
    - "20"
- name: cloud-motion-v1
  pluginSpec:
    image: registry.sagecontinuum.org/bhupendraraut/cloud-motion:1.23.02.20
    args:
    - --input
    - bottom_camera
- name: imagesampler-bottom
  pluginSpec:
    image: registry.sagecontinuum.org/theone/imagesampler:0.3.1
    args:
    - -stream
    - bottom_camera
- name: audio-sampler
  pluginSpec:
    image: registry.sagecontinuum.org/seanshahkarami/audio-sampler:0.4.1
nodeTags: []
nodes:
  W030: true
scienceRules:
- 'schedule(object-counter-bottom): cronjob("object-counter-bottom", "*/5 * * * *")'
- 'schedule(cloud-cover-bottom): cronjob("cloud-cover-bottom", "01-59/5 * * * *")'
- 'schedule(surfacewater-classifier): cronjob("surfacewater-classifier", "02-59/5
  * * * *")'
- 'schedule("avian-diversity-monitoring"): cronjob("avian-diversity-monitoring", "*
  * * * *")'
- 'schedule("cloud-motion-v1"): cronjob("cloud-motion-v1", "03-59/5 * * * *")'
- 'schedule(imagesampler-bottom): cronjob("imagesampler-bottom", "04-59/5 * * * *")'
- 'schedule(audio-sampler): cronjob("audio-sampler", "*/5 * * * *")'
successCriteria:
- Walltime(1day)

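Here object-counter runs at minutes 0, 5, 10, etc.; cloud-cover at 1, 6, 11, etc.; surface water at 2, 7, 12, etc.; cloud-motion at 3, 8, 13, etc.; and image-sampler at 4, 9, 14, etc. avian-diversity-monitoring runs every minute, and audio-sampler runs every 5 minutes on the same minutes as object-counter.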
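To check a staggered schedule like this before submitting it, you can expand the minute field of each cron expression. The sketch below is a simplified illustration rather than a full cron parser: the function name expand_minutes is hypothetical, it handles only the forms used in this guide ("*", "*/n", "a-b/n", and a single number), and it treats "a/n" as "a-59/n", which is an assumption about how the scheduler reads that form.

```python
def expand_minutes(field):
    """Expand the minute field of a cron expression into a list of minutes."""
    rng, _, step = field.partition("/")
    step = int(step) if step else 1
    if rng == "*":
        start, end = 0, 59
    elif "-" in rng:
        low, high = rng.split("-")
        start, end = int(low), int(high)
    else:
        start = int(rng)
        # A bare start with a step ("a/n") is assumed to run until minute 59.
        end = 59 if "/" in field else start
    return list(range(start, end + 1, step))


schedules = {
    "object-counter-bottom": "*/5",
    "cloud-cover-bottom": "01-59/5",
    "surfacewater-classifier": "02-59/5",
    "cloud-motion-v1": "03-59/5",
    "imagesampler-bottom": "04-59/5",
}
for plugin, minute_field in schedules.items():
    print(plugin, expand_minutes(minute_field)[:3])
```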

Debugging failed jobs

Do you know how to identify why a job is failing?

✨ When the job failures are seen as red markers on your job page, you can click them to see the error.

Alternatively, detailed errors can be found using sage_data_client.

Requirements: sage_data_client and utils.py
By specifying the plugin name and node, the following code will print out the reasons for job failure within the last 60 minutes.

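# get_data, parse_events and fill_completion_failure are helpers from utils.py
from utils import *

mynode = "w030"     # node to inspect
myplugin = "water"  # substring of the plugin name to match

# Collect the failure records for this node from the last 60 minutes
df = fill_completion_failure(parse_events(get_data(mynode, start="-60m")))

# Print the error log of every failure whose plugin name contains myplugin
for _, p in df[df["plugin_name"].str.contains(myplugin)].iterrows():
    print(p["error_log"])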

Downloading the data

Sage docs for accessing-data

See Your Data on Sage Portal

To check your data on Sage Portal, follow these steps:

Click on the Data tab at the top of the portal page.
Select Data Query Browser from the dropdown menu.
Then, select your app in the filter. This will show all the data that is uploaded by your app using the plugin.publish() and plugin.upload() methods.

In addition, you can visualize the data as a time series and select multiple variables to plot together in one chart, which can be useful for identifying trends or patterns.

Download all images with wget

Visit https://training-data.sagecontinuum.org/
Select the node and period for the data.
Select the required data and download the text file urls-xxxxxxx.txt containing the URLs.
To select only the top camera images, use the vim command :g/^\(.*top\)\@!.*$/d. This will delete URLs that do not contain the word 'top'.
Copy the following command from the website and run it in your terminal: wget -r -N -i urls-xxxxxxx.txt
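If you prefer not to edit the URL file in vim, the same filtering can be done with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and it assumes the file has one URL per line and that top camera URLs contain the word 'top'.

```python
def filter_urls(lines, keyword="top"):
    """Keep only the non-empty lines that contain the keyword."""
    return [line.strip() for line in lines if line.strip() and keyword in line]


def filter_url_file(src, dst, keyword="top"):
    """Write the matching URLs from src to dst and return how many were kept."""
    with open(src) as f:
        kept = filter_urls(f, keyword)
    with open(dst, "w") as f:
        f.write("".join(url + "\n" for url in kept))
    return len(kept)
```

For example, filter_url_file("urls-xxxxxxx.txt", "urls-top.txt") writes a reduced list that you can then pass to wget with -i urls-top.txt.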

Sage data client for text data

Sage data client python Notebook Example
PyPI link. Install with pip install sage-data-client

info

📗 Documentation for accessing the data.

Querying data example

The sage_data_client package provides a query() function, which takes the following parameters:

import sage_data_client
import pandas as pd

df = sage_data_client.query(
    start="2023-01-08T00:00:09Z",  # Start time in "YYYY-MM-DDTHH:MM:SSZ" or "YYYYMMDD-HH:MM:SS" format
    end="2024-01-08T23:23:24Z",    # End time in the same format as start time
    filter={
        "plugin": ".*mobotix-scan.*",  # Regex for filtering by plugin name
        "vsn": "W056",                # Specific node identifier
        "name": "upload",             # Specific data field
        "filename": ".*_position1.nc" # Regex for filtering filenames
    }
)

df = df.sort_values('timestamp')  # sort_values returns a new DataFrame, so reassign it
df

Filter Criteria

start and end: Time should be specified in UTC, using the format YYYY-MM-DDTHH:MM:SSZ or YYYYMMDD-HH:MM:SS.
filter: A dictionary for additional filtering criteria. Each key is a column name in the df.
Use regular expressions (denoted as .*pattern.*) for flexible matching within text fields like plugin or filename.

Downloading Files

Use additional pandas operations on df to keep only the records of interest, then download the files using a function like the one below, which reads the URLs from the value column and authenticates each request.

import requests
import os
from requests.auth import HTTPBasicAuth

uname = 'username'
upass = 'token_as_password'

def download_files(df, download_path, uname, upass):
    # Create the download directory if it does not exist
    os.makedirs(download_path, exist_ok=True)

    for index, row in df.iterrows():
        # The 'value' column holds the file URL
        url = row['value']

        # Use the last part of the URL as the local file name
        filename = url.split('/')[-1]

        # Download using credentials
        response = requests.get(url, auth=HTTPBasicAuth(uname, upass))
        if response.status_code == 200:
            # Build the destination path
            file_path = os.path.join(download_path, filename)
            # Write a new file
            with open(file_path, 'wb') as file:
                file.write(response.content)
            print(f"Downloaded {filename} to {file_path}")
        else:
            print(f"Failed to download {url}, status code: {response.status_code}")

# usage
download_files(df, '/Users/bhupendra/projects/epcape_pier/data/downloaded/nc_pos1', uname, upass)

More data analysis resources

Miscellaneous

Find PT Mobotix thermal camera ip on the node

Run nmap -sP 10.31.81.1/24

Nmap scan report for ws-nxcore-000048B02D3AF49F (10.31.81.1)
Host is up (0.0012s latency).
Nmap scan report for switch (10.31.81.2)
Host is up (0.0058s latency).
Nmap scan report for ws-rpi (10.31.81.4)
Host is up (0.00081s latency).
Nmap scan report for 10.31.81.10
Host is up (0.0010s latency).
Nmap scan report for 10.31.81.15
Host is up (0.00092s latency).
Nmap scan report for 10.31.81.17
Host is up (0.0014s latency).
Nmap done: 256 IP addresses (6 hosts up) scanned in 2.42 seconds

From the output, run the following command for each IP, e.g. curl -u admin:meinsm -X POST 'http://10.31.81.15/control/rcontrol?action=putrs232&rs232outtext=%FF%01%00%0F%00%00%10' (quote the URL so that the shell does not interpret the &).
The IP for which the output is OK is the Mobotix.
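When the scan lists many hosts, it can help to extract the candidate IPs from the nmap output before probing them. The sketch below is an illustration under assumptions: the function names are hypothetical, and it relies on the output format shown above, where named hosts appear as "name (ip)" and unnamed hosts as a bare IP.

```python
import re

# Matches "Nmap scan report for name (ip)" as well as "Nmap scan report for ip".
HOST_LINE = re.compile(r"^Nmap scan report for (?:(\S+) \()?(\d+\.\d+\.\d+\.\d+)\)?$")


def parse_hosts(nmap_output):
    """Return a (hostname or None, ip) tuple for each 'Nmap scan report' line."""
    hosts = []
    for line in nmap_output.splitlines():
        match = HOST_LINE.match(line.strip())
        if match:
            hosts.append((match.group(1), match.group(2)))
    return hosts


def unnamed_ips(nmap_output):
    """Return the IPs reported without a hostname: candidates to probe with curl."""
    return [ip for name, ip in parse_hosts(nmap_output) if name is None]
```

Skipping the named hosts (nxcore, switch, RPi) leaves only the addresses worth testing with the curl command.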

SSH 'Broken Pipe' Issue and Solution

A 'Broken pipe' error occurs when the SSH session to waggle-dev-node is inactive for longer than about 10 to 15 minutes, resulting in a closed connection.

client_loop: send disconnect: Broken pipe
Connection to waggle-dev-node-w021 closed by remote host.
Connection to waggle-dev-node-w021 closed.

Solution

To prevent the SSH session from timing out and to maintain the connection, the following configuration options can be added to the SSH config file:

# Keep the SSH connection alive by sending a message to the server every 60 seconds
Host *
  TCPKeepAlive yes
  ServerAliveInterval 60
  ServerAliveCountMax 999
