Onboard Observability Eventjourney

Andi Lamprecht · 3 min read · Accepted
ADR-0093 · Author: Sybil Melton · Date: 2025-02-07 · Products: uncrew
Originally 0099-OnBoard-Observability_EventJourney (v3) · Source on Confluence ↗

Event Journey

This document describes the journey of an example onboard event/observable from its issuance to the point where it reaches its audience, specifically: a subscriber to the Flight Log, or an Uncrew Operator or Uncrew engineer using the Honeycomb frontend.

[Figure: observability journey diagram]

Uncrew OTEL Collector

OTEL traces need to find their way to Honeycomb, and that calls for an OTEL Collector. The OpenTelemetry documentation explains when to use one:

For trying out and getting started with OpenTelemetry, sending your data directly to a backend is a great way to get value quickly. Also, in a development or small-scale environment you can get decent results without a collector.
However, in general we recommend using a collector alongside your service, since it allows your service to offload data quickly and the collector can take care of additional handling like retries, batching, encryption or even sensitive data filtering.

Indeed, we see a number of reasons that justify an onboard OTEL collector, notably:

  • Filtering the spans, separating those written to local disk (for later upload) from those that make it onto the wire in real time (if any);
  • Collecting and exporting system-level events as OTEL observables;
  • Sampling the observables, as some of the onboard events happen very frequently (like the 50Hz control loop).
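
The filtering and sampling roles listed above can be illustrated with a minimal sketch. It is not the Collector's actual implementation: the `Span` and `Router` types, the `realtime` flag, the span names and the 1-in-50 rate are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    realtime: bool = False  # hypothetical flag: worth sending over the wire now


@dataclass
class Router:
    # Keep 1 of every N spans for named high-rate sources (assumed values).
    keep_one_in: dict = field(default_factory=lambda: {"control_loop": 50})
    _seen: dict = field(default_factory=dict)

    def route(self, span: Span) -> str:
        """Return 'drop', 'wire', or 'disk' for a span."""
        n = self.keep_one_in.get(span.name, 1)
        count = self._seen.get(span.name, 0)
        self._seen[span.name] = count + 1
        if count % n != 0:
            return "drop"  # sampled out
        return "wire" if span.realtime else "disk"


router = Router()
decisions = [router.route(Span("control_loop")) for _ in range(100)]
print(decisions.count("disk"), decisions.count("drop"))
```

With these assumed settings, two seconds of a 50Hz loop (100 spans) leave two spans on disk and none on the wire, while a span flagged `realtime` is routed to the wire immediately.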

Traces would then travel from the onboard components to the Collector using whatever mechanism the OTEL Collector service supports (the OTLP protocol over HTTP or gRPC). We could, however, consider using a ROS2 topic for that instead; such a custom collector may be useful to the broader ROS2 developer audience.
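
To make the OTLP option concrete, here is a rough sketch of the JSON body an onboard component could POST to a Collector over OTLP/HTTP. It uses only the standard library and builds the payload without sending it. The service name, span name and scope name are hypothetical, and the URL assumes a Collector listening on the default OTLP/HTTP port on localhost.

```python
import json
import secrets
import time

# Default OTLP/HTTP traces endpoint of a local Collector (standard port 4318).
COLLECTOR_URL = "http://localhost:4318/v1/traces"


def otlp_json_payload(service: str, span_name: str, start_ns: int, end_ns: int) -> dict:
    """Build a one-span OTLP/JSON export request body."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "example"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 128-bit id, hex encoded
                    "spanId": secrets.token_hex(8),    # 64-bit id, hex encoded
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are carried as strings in OTLP/JSON.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


now = time.time_ns()
body = json.dumps(otlp_json_payload("flight-controller", "arm_motors", now, now + 1_000_000))
# An HTTP POST of `body` to COLLECTOR_URL with Content-Type: application/json
# would hand the span to the Collector; the request itself is omitted here.
```

In practice an OpenTelemetry SDK exporter would produce this request; the sketch only shows what crosses the boundary between a component and the Collector.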

Transmitting traces from the Collector to Honeycomb may or may not pass through the Avatar. All traffic outgoing from the onboard subsystems will be subject to QoS policing and shaping, regardless of whether it goes towards the Avatar or elsewhere. The only good reason to pass the OTEL observables via the Avatar is if they are useful to the Avatar.
Since we want the Collector to arbitrarily delay, sample and filter the observables, it’s hard to see any use in involving the Avatar here.

Collecting System Events

An argument was made that the journal will already be recorded to the ulog by Auterion OS. That argument also admits that the journal will not already be annotated with OTEL events. The argument for collecting system events is that it provides a path to view the system journal within Honeycomb, even if it ends up writing the “same” data twice: once to the ulog (for one audience segment) and once more to Honeycomb (for the Engineering audience segment).

ROS2 Trace Ingress

ROS2 events don’t carry tracing information, so those that originate outside of Uncrew (like the PX4 events) need to be “admitted” to Uncrew and stamped with the trace_id of a root span newly coined for each of them. This implies translating one ROS2 event into another, and to do that we appoint a new architectural role/agent: the ROS2 Trace Ingress. This agent will need to be configured to admit/translate specific ROS2 events, and as the number of distinct event kinds is likely large, we may need to generate the code that does it.
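
The admit-and-stamp step can be sketched as follows. This is an illustration under assumptions, not the agent's design: the `ADMITTED` table, both topic names, and the `TracedEvent` type are hypothetical, and a real agent would subscribe and publish through ROS2 rather than call a plain function.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

# Hypothetical admission table: external topic -> Uncrew-side topic.
ADMITTED = {
    "/fmu/out/vehicle_status": "/uncrew/traced/vehicle_status",
}


@dataclass(frozen=True)
class TracedEvent:
    topic: str
    payload: bytes
    trace_id: str  # 32 hex chars, identifies the newly coined trace
    span_id: str   # 16 hex chars, identifies its root span


def admit(topic: str, payload: bytes) -> Optional[TracedEvent]:
    """Translate an untraced external event into a traced one, or reject it."""
    target = ADMITTED.get(topic)
    if target is None:
        return None  # topic is not configured for admission
    return TracedEvent(target, payload, secrets.token_hex(16), secrets.token_hex(8))


event = admit("/fmu/out/vehicle_status", b"\x01")
```

Every admitted event receives a fresh trace_id, so two events on the same topic start two separate traces. The table is the part that would likely be generated if the number of event kinds grows large.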

Neither the ROS2 Trace Ingress nor the OTEL Collector will be very complex, but they are distinct roles. The former is safety critical (if it fails, safety-critical messages stop flowing), while the latter is not (if it fails, only diagnostics stop flowing). They should stay separate.
