The cyberinfrastructure consists of coordinating hardware and software services enabling AI at the edge. Below is a quick summary of the different infrastructure pieces, starting at the highest level and zooming into each component to understand how the components relate and the role each plays.
There are 2 main components of the cyberinfrastructure:
- Nodes that exist at the edge
- The cloud that hosts services and storage systems to facilitate running “science goals” @ the edge
Every edge node maintains connections to 2 core cloud components: one to a Beehive and one to a Beekeeper.
The Beekeeper is an administrative server that allows system administrators to perform actions on the nodes, such as gathering health metrics and performing software updates. All nodes "phone home" to their Beekeeper and maintain this "life-line" connection.
Details & source code: https://github.com/waggle-sensor/beekeeper
The Node <-> Beehive connection is the pipeline for the science. Over this connection, instructions are sent to the node, and data from applications (plugins) running on the node is published into the Beehive storage systems.
The overall infrastructure supports multiple Beehives, where each node is associated with a single Beehive. The set of nodes associated with a Beehive creates a "project" where each "project" is separate, having its own data store, web services, etc.
In the example above, there are 2 nodes associated with Beehive 1, while a single node is associated with Beehive 2. All nodes in this example are administered by a single Beekeeper.
Note: the example above shows a single Beekeeper, but a second Beekeeper could have been used for administrative isolation.
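The association rules described above (each node belongs to exactly one Beehive and phones home to one Beekeeper) can be modeled as a small data structure. The following is a minimal sketch, not the actual Waggle data model: the `Node` class, the `group_by_beehive` helper, and all identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical record: one node, its single Beehive, and its Beekeeper."""
    node_id: str
    beehive: str
    beekeeper: str


def group_by_beehive(nodes):
    """Group node IDs into "projects", one project per Beehive."""
    projects = defaultdict(list)
    for node in nodes:
        projects[node.beehive].append(node.node_id)
    return dict(projects)


# Mirrors the example above: two nodes on Beehive 1, one node on Beehive 2,
# all administered by a single Beekeeper. The IDs are made up.
nodes = [
    Node("node-a", "beehive-1", "beekeeper-1"),
    Node("node-b", "beehive-1", "beekeeper-1"),
    Node("node-c", "beehive-2", "beekeeper-1"),
]

projects = group_by_beehive(nodes)
beekeepers = {node.beekeeper for node in nodes}
print(projects)
print(beekeepers)
```

Adding a second Beekeeper for administrative isolation would only change the `beekeeper` field of the affected nodes; the project grouping stays the same.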
Details & source code: https://github.com/waggle-sensor/waggle-beehive-v2
Looking deeper into the Beehive infrastructure, it contains 2 main components:
- software services such as the Edge Scheduler (ES), Lambda Triggers (LT), data APIs, and websites/portals
- data storage systems such as the Data Repository (DR) and the Edge Code Repository (ECR)
The Beehive is the “command center” for interacting with the Waggle nodes at the edge. It hosts the websites and interfaces that allow scientists to create science goals to run plugins at the edge and to browse the data produced by those plugins.
The software services and data storage systems are deployed within a Kubernetes environment to allow for easy administration and to support a multi-server architecture with redundancy and service replication.
While the services running within Beehive are many (both graphical and REST-style API interfaces), the following is an outline of the most vital.
Data Repository (DR)
The Data Repository is the data store for housing all the edge-produced plugin data. It consists of different storage technologies (e.g. InfluxDB) and techniques to store simple textual data (e.g. key-value pairs) in addition to large binary (blob) data (e.g. audio, images, video). The Data Repository additionally has an API interface for easy access to this data.
The data store is a time-series database of key-value pairs with each entry containing metadata about how and when the data originated @ the edge. Included in this metadata is the data collection timestamp, plugin version used to collect the data, the node the plugin was run on, and the specific compute unit within the node that the plugin was running on.
In the above example, the value of 25050 was collected @ 2022-06-10T22:37:47.369013647Z from the bme680 sensor on node 000048b02d35a97c via the plugin recorded in that entry's metadata.
Note: see the Access and use data site for more details and data access examples.
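To make the shape of a stored entry concrete, here is a minimal sketch of one key-value record with its metadata, using the values quoted in the example above. The field names (`timestamp`, `value`, `meta`, `sensor`, `node`) and both helper functions are assumptions chosen for illustration; they are not taken from the Data Repository API.

```python
from datetime import datetime, timezone

# One entry, using the values from the example above.
# Field names are hypothetical, not the Data Repository's schema.
record = {
    "timestamp": "2022-06-10T22:37:47.369013647Z",
    "value": 25050,
    "meta": {"sensor": "bme680", "node": "000048b02d35a97c"},
}


def parse_timestamp(text):
    """Parse a UTC timestamp, truncating nanoseconds to microseconds."""
    # datetime keeps at most microsecond precision: use 6 fractional digits.
    base, _, frac = text.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def describe(entry):
    """Render one entry as a human-readable line."""
    meta = entry["meta"]
    when = parse_timestamp(entry["timestamp"])
    return (
        f"{entry['value']} from {meta['sensor']} "
        f"on node {meta['node']} at {when.isoformat()}"
    )


print(describe(record))
```

The timestamp carries nanosecond precision, which Python's `datetime` cannot represent, so the sketch truncates it rather than rounding.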
Details & source code: https://github.com/waggle-sensor/data-repository
Edge Scheduler (ES)
The Edge Scheduler is the suite of services running in Beehive that facilitates running plugins @ the edge. Included here are user interfaces and APIs for scientists to create and manage their science goals. The Edge Scheduler continuously analyzes node workloads against all the science goals to determine how the science goals are deployed to the Beehive's nodes. When it determines that a node's science goals need to be updated, the Edge Scheduler interfaces with WES running on that node to update the node's local copy of the science goals. Essentially, the Edge Scheduler is the overseer of all the Beehive's nodes, deploying science goals to them to meet the scientists' plugin execution objectives.
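The update step described above, where a node's local copy of its science goals is refreshed only when it no longer matches what the scheduler wants, can be sketched as a simple comparison. This is an illustrative assumption about the logic, not the Edge Scheduler's implementation; the function name `nodes_needing_update` and the goal and node names are hypothetical.

```python
def nodes_needing_update(desired, local):
    """Return {node: goals} for nodes whose local copy differs from the desired set.

    desired: node ID -> goals the scheduler wants on that node
    local:   node ID -> goals the node currently holds (may lack some nodes)
    """
    return {
        node: sorted(goals)
        for node, goals in desired.items()
        if set(goals) != set(local.get(node, ()))
    }


# Hypothetical state: node-a is missing goal-2, node-b is already up to date.
desired = {"node-a": {"goal-1", "goal-2"}, "node-b": {"goal-1"}}
local = {"node-a": {"goal-1"}, "node-b": {"goal-1"}}

updates = nodes_needing_update(desired, local)
print(updates)
```

Only the nodes returned here would need a new copy pushed to them; how that push happens is outside the scope of this sketch.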
Details & source code: https://github.com/waggle-sensor/edge-scheduler
Edge Code Repository (ECR)
The Edge Code Repository is the "app store" that hosts all the tested and benchmarked edge plugins that can be deployed to the nodes. This is the interface allowing users to discover existing plugins (for potential inclusion in their science goals) in addition to submitting their own. At its core, the ECR provides a verified and versioned repository of plugin Docker images that are pulled by the nodes when a plugin is to be downloaded as a run-time component of a science goal.
Details & source code: https://github.com/waggle-sensor/edge-code-repository
Lambda Triggers (LT)
The Lambda Triggers service provides a framework for running reactive code within the Beehive. There are two kinds of reaction triggers considered here: From-Edge and To-Edge.
From-Edge triggers, or messages that originate from an edge node, can be used to trigger lambda functions -- for example, if high wind velocity is detected, a function could be triggered to determine how to reconfigure sensors or launch a computation or send an alert.
To-Edge triggers are messages intended to change a node's behavior. For example, an HPC calculation or cloud-based data analysis could trigger an Edge Scheduler API call to request that a science goal be run on a particular set of edge nodes.
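A From-Edge trigger such as the high-wind example can be sketched as a handler function that is called for each matching message. This is a minimal illustration under stated assumptions: the message fields, the `wind.speed` topic name, the threshold, and the `dispatch` helper are all hypothetical and do not reflect the Lambda Triggers service's interface.

```python
WIND_ALERT_THRESHOLD = 20.0  # hypothetical threshold, in m/s


def high_wind_handler(message):
    """React to one wind-speed message; return an action dict or None."""
    if message["value"] > WIND_ALERT_THRESHOLD:
        return {
            "action": "alert",
            "node": message["node"],
            "wind_speed": message["value"],
        }
    return None


def dispatch(messages, handlers):
    """Run every handler registered for a message's name and collect actions."""
    actions = []
    for message in messages:
        for handler in handlers.get(message["name"], []):
            action = handler(message)
            if action is not None:
                actions.append(action)
    return actions


# Hypothetical messages arriving from edge nodes.
handlers = {"wind.speed": [high_wind_handler]}
messages = [
    {"name": "wind.speed", "node": "node-a", "value": 5.2},
    {"name": "wind.speed", "node": "node-b", "value": 27.5},
    {"name": "env.temperature", "node": "node-a", "value": 21.0},
]

actions = dispatch(messages, handlers)
print(actions)
```

In a To-Edge setup, the returned action would instead be a request to the Edge Scheduler; that call is not shown because its API is not described here.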
Details & source code: https://github.com/waggle-sensor/lambda-triggers
Nodes are the edge computing component of the cyberinfrastructure. All nodes consist of 3 items:
- Persistent storage for housing downloaded plugins and caching published data before it is transferred to the node's Beehive
- CPU and GPU compute modules where plugins are executed and perform the accelerated inferences
- Sensors such as environment sensors, cameras and LiDAR systems
Edge nodes enable fast computation @ the edge, leveraging the large non-volatile storage to handle caching of high frequency data (including images, audio and video) in the event the node is "offline" from its Beehive. Through expansion ports the nodes support the adding and removing of sensors to fully customize the node deployments for the particular deployment environment.
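The caching behavior described above is a store-and-forward pattern: data is queued locally and uploaded in order once the Beehive is reachable again. The sketch below illustrates the idea only. The `StoreAndForward` and `FakeLink` classes are hypothetical, and the queue is kept in memory for brevity, whereas a node would keep it on its non-volatile storage.

```python
from collections import deque


class StoreAndForward:
    """Queue items locally and upload them in order when the link is up."""

    def __init__(self, send):
        self._send = send  # callable returning True if the upload succeeded
        self._pending = deque()

    def publish(self, item):
        self._pending.append(item)
        self.flush()

    def flush(self):
        # Stop at the first failure so ordering is preserved.
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()

    @property
    def backlog(self):
        return len(self._pending)


class FakeLink:
    """Test double standing in for the node's connection to its Beehive."""

    def __init__(self):
        self.online = False
        self.delivered = []

    def send(self, item):
        if self.online:
            self.delivered.append(item)
        return self.online


link = FakeLink()
cache = StoreAndForward(link.send)

cache.publish("image-001")  # offline: stays in the local queue
cache.publish("image-002")
backlog_while_offline = cache.backlog

link.online = True
cache.publish("image-003")  # back online: the whole backlog drains in order
print(backlog_while_offline, link.delivered, cache.backlog)
```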
Overall, even though the nodes may use different CPU architectures and different sensor configurations, they all leverage the same Waggle Edge Stack (WES) to run plugins.
Wild Sage Node (Wild Waggle Node)
The Wild Sage Node (or Wild Waggle Node) is a custom-built weather-proof enclosure intended for remote outdoor installation. The node features software and hardware resilience via a custom operating system and custom circuit board. Internal to the node are a power supply and a PoE network switch, supporting the addition of sensors through standard Ethernet (PoE), USB and other embedded protocols via the node expansion ports.
The technical capabilities of these nodes consist of:
- NVIDIA Xavier NX ARM64 Node Controller w/ 8GB of shared CPU/GPU RAM
- 1 TB of NVMe storage
- 4x PoE expansion ports
- 1x USB2 expansion port
- optional Stevenson Shield housing a RPi 4 w/ environmental sensors & microphone
- optional 2nd NVIDIA Xavier NX ARM64 Edge Processor
Node installation manual: https://sagecontinuum.org/docs/installation-manuals/wsn-manual
Details & source code: https://github.com/waggle-sensor/wild-waggle-node
A Blade Node is a standard, commercially available server intended for use in a climate-controlled machine room, or an extended-temperature-range, telecom-grade blade for harsher environments. The AMD64-based operating system supports these types of nodes, enabling the services needed to support WES.
The above diagram shows the basic technical configuration of a Blade Node:
- Multi-core AMD64 CPU
- 32GB of RAM
- Dedicated NVIDIA T4 GPU
- 1 TB of SSD storage
Note: it is possible to add the same optional Stevenson Shield housing that is available for the Wild Sage Nodes.
Details & source code: https://github.com/waggle-sensor/waggle-blade
Running plugins @ the Edge
Included in the Waggle operating systems are the core components necessary to enable running plugins @ the edge. At the heart of this is k3s, which creates a protected & isolated run-time environment. This environment, combined with the tools and services provided by WES, enables plugin access to the node's CPU, GPU, sensors and cameras.
Waggle Edge Stack (WES)
The Waggle Edge Stack is the set of core services running within the edge node's k3s run-time environment that provide all the features plugins need to run on the Waggle nodes. The WES services coordinate with the core Beehive services to download & run scheduled plugins (including load balancing) and to facilitate uploading plugin-published data to the Beehive data repository. Through abstraction technologies and WES-provided tools, plugins have access to sensor and camera data.
The above diagram demonstrates 2 plugins running on a Waggle node. Plugin 1 ("neon-kafka") is an example plugin that is running alongside Plugin 2 ("data-smooth"). In this example, "neon-kafka" (via the WES tools) is reading metrics from the node's sensors and then publishing that data within the WES run-time environment (internal to the node). At the same time, the "data-smooth" plugin is subscribing to this data stream, performing some sort of inference and then publishing the inference results (via WES tools) to Beehive.
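The publish/subscribe flow between the two plugins can be sketched with an in-memory message bus. This is an illustration of the pattern only and does not use the WES tools: the `LocalBus` and `DataSmooth` classes, the `env.temperature` topic, and the moving average (standing in for "some sort of inference") are all assumptions.

```python
from collections import defaultdict


class LocalBus:
    """Hypothetical in-node message bus: topic name -> subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, value):
        for callback in self._subscribers[topic]:
            callback(value)


class DataSmooth:
    """Stand-in for "data-smooth": moving average over the last `window` values."""

    def __init__(self, bus, upload, window=3):
        self._values = []
        self._window = window
        self._upload = upload  # callable standing in for publishing to Beehive
        bus.subscribe("env.temperature", self.on_reading)

    def on_reading(self, value):
        # Keep only the most recent `window` readings, then publish their mean.
        self._values = (self._values + [value])[-self._window:]
        self._upload(sum(self._values) / len(self._values))


bus = LocalBus()
uploaded = []  # stands in for results sent to Beehive
smoother = DataSmooth(bus, uploaded.append)

# Stand-in for "neon-kafka": publish raw sensor readings inside the node.
for reading in [20.0, 22.0, 24.0, 26.0]:
    bus.publish("env.temperature", reading)

print(uploaded)
```

The raw readings never leave the node in this sketch; only the smoothed values reach the `uploaded` list, which mirrors the split between node-internal publishing and publishing to Beehive.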
Note: see the Edge apps guide on how to create a Waggle plugin.
Details & source code: https://github.com/waggle-sensor/waggle-edge-stack
What is a plugin?
Plugins are the user-developed modules that the cyberinfrastructure is designed around. At its simplest, a "plugin" is code that runs @ the edge to perform some task. That task may be as simple as collecting sample camera images or as complex as an inference combining sensor data and results published from other plugins. A plugin's code will interface with the edge node's sensor(s) and then publish resulting data via the tools provided by WES. All developed plugins are hosted by the Beehive Edge Code Repository.
See how to create plugins for details.
A "science goal" is a rule-set for how and when plugins are run on edge nodes. These science goals are created by scientists to accomplish a science objective through the execution of plugins in a specific manner. Goals are written in a human-readable language and managed within the Beehive Edge Scheduler. It is then the cyberinfrastructure's responsibility to deploy the science goals to the edge nodes and execute the goal's plugins. The tutorial walks through running a science goal.
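As a rough illustration of a rule-set that decides how and when plugins run, the sketch below pairs each plugin with a condition evaluated against a node's current state. It is not the science goal format used by the Edge Scheduler: the `Rule` and `ScienceGoal` classes, the plugin names, and the `hour` state key are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule: run `plugin` whenever `condition(state)` is true."""
    plugin: str
    condition: Callable[[Dict[str, float]], bool]


@dataclass
class ScienceGoal:
    """Hypothetical goal: a named list of rules."""
    name: str
    rules: List[Rule]

    def plugins_to_run(self, state):
        """Return the plugins whose conditions hold for the given node state."""
        return [rule.plugin for rule in self.rules if rule.condition(state)]


# Made-up goal: sample images during daytime, sample audio at all hours.
goal = ScienceGoal(
    name="monitor-site",
    rules=[
        Rule("image-sampler", lambda state: 6 <= state["hour"] < 18),
        Rule("audio-sampler", lambda state: True),
    ],
)

daytime = goal.plugins_to_run({"hour": 12})
nighttime = goal.plugins_to_run({"hour": 23})
print(daytime, nighttime)
```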