Three weeks ago I sat in a review where the question was simple and the answer wasn't: "why can't we see what the robots are doing right now?" A client had a dozen ROS 2-based mobile robots running warehouse routes, each one perfectly capable of publishing rich telemetry on its own topics, and precisely zero of that data made it anywhere a human with a dashboard could look at it. The robot's own software stack was fine. The problem was that nobody had built the bridge between "data exists on the robot" and "data exists somewhere else," and in 2021 that bridge is not a solved problem you can buy off the shelf. What follows is the pattern I ended up building with AWS IoT Greengrass 2.0, along with an honest account of the rough edges you'll hit doing the same thing.
I want to be upfront about scope: this isn't a "here's a finished managed product" post, because that product doesn't exist yet. This is glue. Useful glue, glue that holds up in production, but glue you assemble yourself out of Greengrass components, MQTT topics, and a handful of decisions about what data actually needs to leave the robot.
Why bridge ROS 2 to the cloud at all?
The short answer: fleet visibility and management shouldn't require coupling your robot's application code to a specific cloud vendor. A ROS 2 node that talks directly to a proprietary cloud SDK is a ROS 2 node that's now hard to test in isolation, hard to run on a robot with no network, and locked to whichever vendor wrote that SDK. What you actually want is for the robot to keep doing exactly what it does today — publish and subscribe to ROS 2 topics using DDS, same as always — and have something else, sitting alongside the robot's own stack, decide what crosses the network boundary and how.
That "something else" is where an edge runtime earns its keep. You get telemetry for fleet monitoring (battery levels, mission status, error codes, position), you get the ability to pull diagnostic data on demand instead of walking over to the robot with a laptop, and you get a foundation for over-the-air updates later — all without a single line of AWS-specific code inside the robot's own ROS graph. That separation is the entire design goal here, and it's worth stating plainly because it's easy to violate by accident the first time you're in a hurry.
What is AWS IoT Greengrass 2.0, concretely?
AWS IoT Greengrass is an edge runtime from AWS that runs on the robot's onboard compute (or on a nearby edge box on the same network) and lets you deploy, run, and manage software as discrete, versioned units called components. Greengrass 2.0, which reached general availability in December 2020, rebuilt the whole thing around this component model, a real architectural shift from Greengrass 1.x's more monolithic, Lambda-function-centric approach. A component is self-contained: it has a recipe (what to run, what dependencies it needs, which lifecycle hooks to fire on install, start, and stop), and Greengrass's job is to deploy components to a device or fleet of devices, keep them running, and update them when you push a new deployment.
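To make the recipe idea concrete, here's a sketch of a recipe for the bridge component, built as a Python dict and serialized to JSON (Greengrass accepts recipes as JSON or YAML). The component name com.example.RosBridge, the publisher, the S3 artifact URI, the robotId/topics configuration keys, and the ROS 2 Foxy install path are hypothetical placeholders. The top-level keys follow the Greengrass 2.0 recipe format as I understand it, so check them against the recipe reference before deploying anything.

```python
import json

# Hypothetical placeholders: substitute your own component name and artifact location.
COMPONENT_NAME = "com.example.RosBridge"
ARTIFACT_URI = "s3://example-bucket/ros-bridge/1.0.0/battery_bridge.py"


def build_recipe(version: str) -> dict:
    """Return a Greengrass 2.0-style recipe for the ROS 2 bridge component."""
    return {
        "RecipeFormatVersion": "2020-01-25",
        "ComponentName": COMPONENT_NAME,
        "ComponentVersion": version,
        "ComponentDescription": "Subscribes to ROS 2 topics and republishes them to AWS IoT Core.",
        "ComponentPublisher": "Example Robotics",
        # Defaults the component reads at runtime; a deployment can override them.
        "ComponentConfiguration": {
            "DefaultConfiguration": {"robotId": "r-0113", "topics": ["/battery_state"]}
        },
        "Manifests": [
            {
                "Platform": {"os": "linux"},
                "Lifecycle": {
                    # Assumes a POSIX sh runs the script, so use '.' rather than bash's
                    # 'source' to load the ROS 2 environment before starting the bridge.
                    "Run": ". /opt/ros/foxy/setup.sh && "
                           "python3 -u {artifacts:path}/battery_bridge.py"
                },
                # Greengrass downloads this artifact; {artifacts:path} is its directory.
                "Artifacts": [{"URI": ARTIFACT_URI}],
            }
        ],
    }


recipe_json = json.dumps(build_recipe("1.0.0"), indent=2)
print(recipe_json)
```

Note that nothing in it touches the robot's ROS 2 workspace: the Run script loads the installed ROS 2 environment so rclpy is importable, then starts the bridge as an ordinary process that Greengrass manages.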
The part that matters for this bridge: the ROS 2-to-cloud connector I'm describing here is just another component. It doesn't get special treatment from Greengrass, it doesn't require modifying the robot's ROS 2 workspace, and it runs as its own process alongside whatever ROS 2 nodes are already running on the robot. That's the whole trick: you're not injecting cloud awareness into the robot's control software, you're deploying a separate, independently managed piece of software next to it that happens to subscribe to the same DDS topics everything else on the robot can already see.
```mermaid
graph TD
    ROS["ROS 2 nodes<br/>(navigation, perception, control)"] -->|"DDS topics"| BRIDGE["Greengrass component<br/>rclpy/rclcpp subscriber"]
    BRIDGE -->|"MQTT publish"| CORE["AWS IoT Core<br/>device gateway"]
    CORE --> RULES["IoT rules engine"]
    RULES --> KINESIS["Kinesis Data Streams"]
    RULES --> S3["S3 (via Kinesis Firehose)"]
    RULES --> DASH["Fleet dashboard / alerting"]
    GG["Greengrass core"] -.->|"manages lifecycle"| BRIDGE
    GG -.->|"OTA component updates"| BRIDGE
```
The bridge component subscribes to ROS 2 topics over DDS and republishes the data as MQTT messages, without the robot's own ROS graph ever knowing AWS exists. The IoT rules engine is what actually routes data onward; the bridge's only job is getting it into IoT Core.
How does the actual DDS-to-MQTT bridge work?
The bridge component is a small program using rclpy (Python) or rclcpp (C++) — the standard ROS 2 client libraries — to subscribe to whichever topics you've decided are worth shipping off the robot. For each message received, it serializes the relevant fields and publishes them as an MQTT message to a topic on AWS IoT Core, using the AWS IoT Device SDK for the actual MQTT connection, mutual TLS authentication, and certificate-based device identity.
This sounds simple, and the mechanism is simple. What's not simple is the translation underneath it, and I'd rather you hear that from me now than discover it during an incident. ROS 2 topics are typed: every topic has a message definition (a .msg file) with named, typed fields, and DDS provides many-to-many pub/sub with QoS policies (reliability, durability, history depth) baked into the middleware. MQTT is flat, byte-string pub/sub with none of that: no schema enforcement, no native concept of "this field is a float32," just topics and payloads. Bridging the two means doing real translation work: serializing a typed ROS 2 message into JSON or a binary encoding MQTT can carry, and making a deliberate choice about which QoS guarantees you're willing to lose in the process. DDS's "keep last N with reliable delivery" doesn't map cleanly onto MQTT's QoS 0/1/2 levels (and AWS IoT Core supports only QoS 0 and 1), and pretending it does is how you end up debugging a "missing message" bug that's actually a translation gap.
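One way to keep that choice deliberate rather than accidental is to have the bridge derive its MQTT QoS from each subscription's DDS profile and log what doesn't carry over. DdsQos and map_qos below are hypothetical helpers, not part of rclpy or the AWS IoT Device SDK. The only facts they encode are that IoT Core accepts MQTT QoS 0 and 1 (not 2) and that plain MQTT publishing has no direct equivalent of DDS durability or history depth.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DdsQos:
    """Hypothetical summary of the ROS 2 / DDS QoS profile a bridge subscription uses."""
    reliability: str    # "reliable" or "best_effort"
    durability: str     # "volatile" or "transient_local"
    history_depth: int  # keep-last N


def map_qos(dds: DdsQos) -> Tuple[int, List[str]]:
    """Choose an MQTT QoS that AWS IoT Core accepts and list what doesn't carry over.

    IoT Core supports only QoS 0 and 1, so "reliable" maps to QoS 1 and
    everything else maps to QoS 0.
    """
    mqtt_qos = 1 if dds.reliability == "reliable" else 0
    caveats = []
    if mqtt_qos == 1:
        caveats.append("QoS 1 is at-least-once: consumers must tolerate duplicates")
    if dds.durability == "transient_local":
        caveats.append("transient_local: late MQTT subscribers get no earlier samples")
    if dds.history_depth > 1:
        caveats.append(f"keep-last-{dds.history_depth} history has no MQTT equivalent")
    return mqtt_qos, caveats


qos, caveats = map_qos(DdsQos("reliable", "transient_local", 10))
for note in caveats:
    print(f"mqtt qos={qos}: {note}")
```

Logging those caveats once at startup costs nothing and gives whoever investigates the next "missing message" report a head start.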
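Here's the minimal version for a single topic, bridging the standard sensor_msgs BatteryState message:

```python
import json
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import BatteryState
from awscrt import mqtt
from awsiot import mqtt_connection_builder


class BatteryBridge(Node):
    def __init__(self, mqtt_connection):
        super().__init__('battery_bridge')
        self.mqtt = mqtt_connection
        # Subscribe with a keep-last history depth of 10.
        self.subscription = self.create_subscription(
            BatteryState, '/battery_state', self.on_battery, 10)

    def on_battery(self, msg):
        # Flatten the typed message into the few fields a dashboard needs.
        payload = {
            'robot_id': 'r-0113',
            'voltage': msg.voltage,
            'percentage': msg.percentage,
            'timestamp': self.get_clock().now().to_msg().sec,
        }
        # publish() expects the QoS enum, not a bare int; IoT Core has no QoS 2.
        self.mqtt.publish(
            topic='fleet/r-0113/battery',
            payload=json.dumps(payload),
            qos=mqtt.QoS.AT_LEAST_ONCE)


def main():
    # Placeholder endpoint and credential paths for this robot's IoT identity.
    connection = mqtt_connection_builder.mtls_from_path(
        endpoint='<your-iot-endpoint>', client_id='r-0113',
        cert_filepath='device.pem.crt', pri_key_filepath='private.pem.key',
        ca_filepath='AmazonRootCA1.pem')
    connection.connect().result()  # block until the mutual-TLS connection is up
    rclpy.init()
    rclpy.spin(BatteryBridge(connection))


if __name__ == '__main__':
    main()
```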
That's a deliberately small example — one topic, one field set, a flat JSON payload. The real version of this component has a config file mapping ROS 2 topic names to MQTT topics and a serialization function per message type, because you don't want to hand-write a bridge node per topic. But the shape doesn't change: subscribe on the ROS 2 side, translate, publish on the MQTT side, and accept that you've made an explicit choice about what got lost in translation.
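Here's a sketch of that config-plus-serializer shape. BRIDGE_CONFIG, SERIALIZERS, build_routes, and translate are hypothetical names, and the config would realistically come from the component's Greengrass configuration or a shipped file. The message is faked with SimpleNamespace so the translation logic runs without a ROS 2 install. It also handles one translation gap worth knowing about: BatteryState uses NaN for unmeasured fields such as percentage, and Python's json.dumps writes a bare NaN by default, which isn't valid JSON.

```python
import json
import math
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Tuple

Serializer = Callable[[Any], Dict[str, Any]]

# Hypothetical config mapping ROS 2 topics to MQTT topic templates.
BRIDGE_CONFIG: List[Dict[str, str]] = [
    {"ros_topic": "/battery_state", "ros_type": "sensor_msgs/msg/BatteryState",
     "mqtt_topic": "fleet/{robot_id}/battery"},
]


def finite_or_none(x: float):
    """Map NaN/inf to None so the payload stays valid JSON."""
    return x if math.isfinite(x) else None


def serialize_battery(msg: Any) -> Dict[str, Any]:
    return {
        "voltage": finite_or_none(msg.voltage),
        "percentage": finite_or_none(msg.percentage),
        "stamp_sec": msg.header.stamp.sec,  # when the reading was taken
    }


# One serializer per ROS 2 message type, keyed by fully qualified type name.
SERIALIZERS: Dict[str, Serializer] = {
    "sensor_msgs/msg/BatteryState": serialize_battery,
}


def build_routes(config, robot_id: str) -> Dict[str, Tuple[str, Serializer]]:
    """Resolve each ROS 2 topic to (MQTT topic, serializer); unknown types fail at startup."""
    routes = {}
    for entry in config:
        serializer = SERIALIZERS[entry["ros_type"]]  # KeyError means no serializer exists
        routes[entry["ros_topic"]] = (entry["mqtt_topic"].format(robot_id=robot_id), serializer)
    return routes


def translate(routes, ros_topic: str, msg: Any, robot_id: str) -> Tuple[str, str]:
    mqtt_topic, serializer = routes[ros_topic]
    body = dict(serializer(msg), robot_id=robot_id)
    # allow_nan=False raises instead of emitting invalid JSON if a NaN slips through.
    return mqtt_topic, json.dumps(body, allow_nan=False)


routes = build_routes(BRIDGE_CONFIG, "r-0113")
fake_msg = SimpleNamespace(voltage=24.1, percentage=float("nan"),
                           header=SimpleNamespace(stamp=SimpleNamespace(sec=1625000000)))
topic, payload = translate(routes, "/battery_state", fake_msg, "r-0113")
print(topic, payload)
```

Wiring this into rclpy would mean one create_subscription call per config entry, with a callback that calls translate and publishes the result; that part is omitted here.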
What does AWS IoT Core actually do once the data arrives?
AWS IoT Core is the managed MQTT broker and device gateway the bridge component talks to. On its own it's not much more than "a place MQTT messages land" — the useful part is the IoT rules engine sitting behind it, which lets you write SQL-like rules that match on MQTT topic and message content, then route matching messages to other AWS services. A rule matching fleet/+/battery can fan out to a Kinesis Data Stream for near-real-time dashboards, to S3 via Kinesis Firehose for long-term storage and later analysis, or trigger a Lambda function if a battery percentage drops below a threshold you care about.
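The + in fleet/+/battery is the standard MQTT single-level wildcard, which is what lets one rule cover every robot's battery topic (# is the multi-level counterpart). The small matcher below follows those MQTT wildcard rules to show which topics such a filter picks up. It's an illustration of the semantics, not IoT Core's implementation, and it skips edge cases such as the special handling of topics that start with $.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a filter using '+' and '#' wildcards."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            # '#' must be the last level; it matches the parent level and everything below.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False  # topic is shorter than the filter
        if level != "+" and level != t_levels[i]:
            return False  # literal level mismatch; '+' accepts any single level
    return len(f_levels) == len(t_levels)


for t in ["fleet/r-0113/battery", "fleet/r-0207/battery", "fleet/r-0113/mission"]:
    print(t, topic_matches("fleet/+/battery", t))
```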
This is the load-bearing design decision in the whole architecture: the bridge component's only responsibility is getting ROS 2 data into IoT Core as MQTT. Everything downstream — where it's stored, who gets alerted, what dashboard shows it — is IoT Core rules and other AWS services, not more robot-side code. If you decide next quarter you want the same telemetry also landing in a different data store, that's a rules engine change, not a robot redeployment.
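To make "a rules engine change, not a robot redeployment" concrete, here's a sketch of two rule definitions shaped like the topicRulePayload argument of the IoT CreateTopicRule and ReplaceTopicRule APIs (create_topic_rule and replace_topic_rule in boto3). The ARNs, stream names, and the 0.2 threshold are placeholders; the threshold is a fraction because BatteryState reports percentage on a 0 to 1 scale. Landing the same telemetry in another store later means appending one entry to actions and replacing the rule, with nothing redeployed to the robots.

```python
import json

# Placeholder ARNs and names; substitute your own account, region, and resources.
ROLE_ARN = "arn:aws:iam::123456789012:role/example-iot-rules"
ALERT_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:example-low-battery"


def telemetry_rule(extra_actions=()) -> dict:
    """Every battery message from every robot goes to a Kinesis stream."""
    return {
        # topic(2) is the second topic level: the robot ID in fleet/<robot_id>/battery.
        "sql": "SELECT *, topic(2) AS robot FROM 'fleet/+/battery'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {"kinesis": {"roleArn": ROLE_ARN,
                         "streamName": "example-fleet-telemetry",
                         # Substitution template: each robot's records share a partition key.
                         "partitionKey": "${topic(2)}"}},
            *extra_actions,
        ],
    }


def low_battery_rule(threshold: float) -> dict:
    """Only messages below the threshold trigger the alerting Lambda."""
    return {
        "sql": f"SELECT * FROM 'fleet/+/battery' WHERE percentage < {threshold}",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [{"lambda": {"functionArn": ALERT_LAMBDA_ARN}}],
    }


# "Also archive to S3" is one extra action on the same rule.
archive = {"firehose": {"roleArn": ROLE_ARN,
                        "deliveryStreamName": "example-fleet-archive",
                        "separator": "\n"}}
print(json.dumps(telemetry_rule(extra_actions=[archive]), indent=2))
```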
Why do you need Stream Manager for anything beyond small messages?
Because MQTT over IoT Core is built for small, frequent messages (battery state, mission status, error codes), not for a ten-megabyte rosbag segment or a burst of camera frames; IoT Core caps a single MQTT message payload at 128 KB. Robot networks are also unreliable in ways a typical IoT device on a stable WiFi network isn't. A robot moving through a warehouse loses and regains connectivity constantly: you cannot assume the upload succeeds on the first try, and you cannot assume the robot has a live connection at the exact moment interesting data was captured.
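The size half of that decision is easy to encode in the bridge. The sketch below routes a payload either to the MQTT path or to Stream Manager based on its size and on whether it's part of a burst. choose_path and the 16 KB budget are hypothetical choices; the only hard number is IoT Core's documented 128 KB cap on a single MQTT message.

```python
# AWS IoT Core rejects MQTT messages with payloads over 128 KB.
IOT_CORE_MAX_PAYLOAD_BYTES = 128 * 1024
# Hypothetical budget for what the bridge treats as "small"; well under the cap.
MQTT_BUDGET_BYTES = 16 * 1024


def choose_path(payload: bytes, bursty: bool = False) -> str:
    """Return 'mqtt' for small telemetry, 'stream_manager' for large or bursty data."""
    if len(payload) > IOT_CORE_MAX_PAYLOAD_BYTES:
        # Not a judgment call: this payload cannot be published to IoT Core at all.
        return "stream_manager"
    if bursty or len(payload) > MQTT_BUDGET_BYTES:
        # Could fit, but belongs in a locally buffered, retried upload.
        return "stream_manager"
    return "mqtt"


battery_json = b'{"robot_id": "r-0113", "voltage": 24.1, "percentage": 0.82}'
rosbag_segment = bytes(10 * 1024 * 1024)  # stand-in for a ten-megabyte segment
print(choose_path(battery_json), choose_path(rosbag_segment))
```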
Greengrass Stream Manager is the component AWS ships specifically for this: it manages named streams of data on the device, buffers locally when the network is unavailable, and handles the actual upload to S3 or Kinesis once connectivity returns, with configurable retry and backpressure behavior so a full local disk doesn't silently drop everything. In practice, this is what you use for rosbag segments you want preserved after an anomaly, or periodic image captures you want in the cloud for later model training — data too large or too bursty for the MQTT bridge, but still something you want off the robot reliably rather than best-effort.
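The setting worth deciding up front is what happens when a stream fills while the robot is offline: Stream Manager's stream definitions let you choose between rejecting new data and overwriting the oldest data (StrategyOnFull.RejectNewData and StrategyOnFull.OverwriteOldestData in its SDK). The toy model below only makes that trade-off concrete. LocalBuffer and OnFull are hypothetical, count items rather than bytes, and are not the Stream Manager API, whose whole point is that you don't write this yourself.

```python
from collections import deque
from enum import Enum
from typing import Callable


class OnFull(Enum):
    # Stand-ins for Stream Manager's RejectNewData / OverwriteOldestData choices.
    REJECT_NEW = "reject_new"
    OVERWRITE_OLDEST = "overwrite_oldest"


class LocalBuffer:
    """Toy bounded on-device buffer; counts items, not bytes. Not the Stream Manager API."""

    def __init__(self, max_items: int, on_full: OnFull):
        self.items = deque()
        self.max_items = max_items
        self.on_full = on_full

    def append(self, item: bytes) -> bool:
        """Buffer an item; return False if it was rejected because the buffer is full."""
        if len(self.items) >= self.max_items:
            if self.on_full is OnFull.REJECT_NEW:
                return False
            self.items.popleft()  # make room by dropping the oldest item
        self.items.append(item)
        return True

    def drain(self, upload: Callable[[bytes], bool]) -> int:
        """Upload items oldest-first while upload() succeeds; keep the rest for later."""
        sent = 0
        while self.items:
            if not upload(self.items[0]):
                break  # connection dropped: leave this item at the head for the next try
            self.items.popleft()
            sent += 1
        return sent


newest = LocalBuffer(3, OnFull.OVERWRITE_OLDEST)
oldest = LocalBuffer(3, OnFull.REJECT_NEW)
for i in range(5):
    newest.append(bytes([i]))
    oldest.append(bytes([i]))
print(list(newest.items), list(oldest.items))
```

Which behavior you want depends on whether the oldest or the newest data is more valuable to you; either way, it's a choice to make explicitly per stream.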
Be honest with yourself about how much of this is still manual. There is no fully polished, single-product "ROS-to-cloud" offering as of mid-2021 — you are assembling Greengrass components, IoT Core rules, and Stream Manager configuration yourself, and every one of those pieces has its own failure modes you'll discover in production, not in the demo. Certificate and policy management at fleet scale is the operational cost nobody mentions in the getting-started guide: every robot needs its own X.509 certificate, its own IoT policy scoping exactly which MQTT topics it can publish and subscribe to, and a provisioning process that doesn't involve someone manually clicking through the console for robot number 47. Get the least-privilege policy wrong and you've either got a robot that can impersonate the whole fleet's topics, or a debugging session where nothing publishes and the error is a silent authorization failure three layers down.
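Here's a sketch of the least-privilege piece, assuming each robot's IoT thing name matches both its robot_id (the r-0113 in fleet/r-0113/battery) and the MQTT client ID it connects with, which is the simplest way to make the ${iot:Connection.Thing.ThingName} policy variable resolve the way this policy expects. That variable is what lets one policy serve the whole fleet while still confining each robot to its own topics, so nobody hand-writes a policy for robot number 47. The region, account ID, and the commands topic are placeholders, and the policy covers only the bridge's MQTT traffic, not the permissions the Greengrass core device itself needs.

```python
import json

# Placeholders: substitute your own region and account ID.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
# Policy variable that IoT Core resolves to the connecting thing's name.
THING = "${iot:Connection.Thing.ThingName}"


def fleet_robot_policy() -> str:
    """One policy document for the whole fleet that confines each robot to its own topics."""
    arn = f"arn:aws:iot:{REGION}:{ACCOUNT_ID}"
    statements = [
        # A robot may only connect using its own thing name as the client ID.
        {"Effect": "Allow", "Action": "iot:Connect",
         "Resource": f"{arn}:client/{THING}"},
        # Publish only under fleet/<own thing name>/...
        {"Effect": "Allow", "Action": "iot:Publish",
         "Resource": f"{arn}:topic/fleet/{THING}/*"},
        # Subscribe is checked against topic filters, Receive against topics.
        # The commands/ subtree is a hypothetical downlink channel.
        {"Effect": "Allow", "Action": "iot:Subscribe",
         "Resource": f"{arn}:topicfilter/fleet/{THING}/commands/*"},
        {"Effect": "Allow", "Action": "iot:Receive",
         "Resource": f"{arn}:topic/fleet/{THING}/commands/*"},
    ]
    return json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2)


print(fleet_robot_policy())
```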
How does this become the OTA channel too?
Once Greengrass is already deployed and managing the bridge component, it's a short step to using the same mechanism for deploying updates — a new bridge configuration, an updated ML model artifact for on-robot perception, a patched component with a bug fix. Greengrass fleet deployments let you push a new component version to a defined group of devices (a subset for canary testing, then the rest), and the deployment mechanism handles rollback if a device reports a failed install. This matters because it closes a loop you'll eventually want closed anyway: the same infrastructure that gets telemetry off the robot is the one that gets updates back onto it, and you don't need a second system for that.
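Here's a sketch of the canary-then-fleet flow as two request bodies shaped like the Greengrass V2 CreateDeployment API (create_deployment in boto3's greengrassv2 client). The component name, version, thing group names, and account details are hypothetical. One behavior worth knowing before copying this: a new deployment to a thing group replaces that group's previous deployment, so the components map should list everything the group is meant to run, not just the component being updated.

```python
# Hypothetical names and ARNs. The dict keys mirror the Greengrass V2
# CreateDeployment request (boto3: greengrassv2.create_deployment(**request)).
COMPONENT = "com.example.RosBridge"
GROUP_ARN = "arn:aws:iot:us-east-1:123456789012:thinggroup/{}"


def deployment_request(group: str, version: str) -> dict:
    return {
        "targetArn": GROUP_ARN.format(group),
        "deploymentName": f"ros-bridge-{version}-{group}",
        # List every component the group should run; this deployment replaces the last one.
        "components": {COMPONENT: {"componentVersion": version}},
        # If the update fails on a device, roll that device back to its previous state.
        "deploymentPolicies": {"failureHandlingPolicy": "ROLLBACK"},
    }


def rollout_plan(version: str, canary_group: str = "robots-canary",
                 fleet_group: str = "robots-fleet") -> list:
    """Canary group first; submit the fleet request only after the canary looks healthy."""
    return [deployment_request(canary_group, version),
            deployment_request(fleet_group, version)]


for request in rollout_plan("1.0.1"):
    print(request["targetArn"], request["deploymentName"])
```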
| Piece | What it does | Where the real work is |
|---|---|---|
| Greengrass component (bridge) | Subscribes to ROS 2 topics, translates, publishes MQTT | Message serialization, QoS mapping, per-topic config |
| AWS IoT Core | MQTT broker, device gateway, certificate-based auth | Certificate/policy provisioning at fleet scale |
| IoT rules engine | Routes matched MQTT messages to Kinesis, S3, Lambda | Rule authoring, matching on topic and payload |
| Greengrass Stream Manager | Buffers and reliably uploads large/bursty data | Local buffer sizing, backpressure, retry policy |
| Greengrass fleet deployment | OTA delivery of updated components/models | Canary group definition, rollback handling |
What would I actually tell a team starting this today?
Start with a narrow slice — one or two telemetry topics, battery and mission status are a good first target because they're small, low-frequency, and immediately useful for a dashboard. Get the certificate provisioning and IoT policy scoping right before you add more topics, because retrofitting least-privilege access after you've got fifty robots publishing under a shared over-broad policy is a miserable afternoon. Only reach for Stream Manager once you actually have a use case for large or bursty data — rosbag segments around an anomaly, periodic images — rather than routing everything through it by default. And resist the urge to build a general-purpose "ROS message to JSON" auto-serializer before you've bridged three or four real topics by hand; you'll design a much better abstraction once you've felt the actual variety of message shapes you're translating.
What to carry away
The mechanism here isn't exotic: a Greengrass component subscribes to ROS 2 topics using the same client libraries any ROS 2 node would use, translates typed DDS messages into MQTT payloads, and publishes to AWS IoT Core, where the rules engine takes over routing. The robot's own ROS graph never has to know a cloud vendor exists. The two things that will actually cost you time are the DDS-to-MQTT impedance mismatch — you're deciding what QoS guarantees and type safety you're willing to give up, not getting them for free — and certificate/policy management once you're past a handful of robots. Stream Manager is the piece to reach for specifically when you have real bandwidth to move reliably over a network you don't control, not by default. None of this is a finished product yet, and I'd be surprised if it stays that way for long — I expect the pattern here (fleet telemetry, edge bridging, OTA deployment) to keep showing up as robotics teams scale past a handful of test units into real fleets, and the further this same idea is pushed, the more it starts looking like a full data platform rather than a point-to-point bridge.