Avatar Telemetry Performance
Originally: 0106-Avatar-telemetry_PERFORMANCE (v4), source on Confluence.
Context
This performance test was conducted to determine how many UAV telemetry messages per second a single avatar can handle. The avatar was deployed in a Kubernetes cluster, while mock UAVs and mock pilots were run from a local machine. Each mock instance included a pilot subscription to telemetry and a UAV mock generating traffic at 20 position telemetry messages per second. The test was considered successful if the avatar handled the target traffic for 20 minutes without losing any telemetry data.
Prerequisites
- Avatar broadcast channel buffer (the internal buffer for telemetry messages) was increased from 1000 to 10000
- k8s memory request/limit was increased to 512Mi/1024Mi
Test 1
10 Mocks ~ 200 msg per second in total
Above 200 msg per second, the avatar starts failing to submit messages to pubsub; this does not affect the pipes between UAV and pilot.
Test 2 without pubsub
70 Mocks ~ 1400 msg per second in total
In tests involving more than 70 mocks, the avatar's memory consumption sometimes spiked to 3-4 GB and the goroutine count escalated to 30-40k, after which Kubernetes terminated the avatar with an Out Of Memory (OOM) kill. Even in runs that survived, metrics showed a significant goroutine spike, from 700-800 up to 4.8k.
Upon further investigation, I identified that the issue likely stemmed from the LaunchDarkly SDK. The symptoms suggested a bug related to concurrent access to shared memory within the SDK. To mitigate this issue in subsequent tests, I temporarily disabled LaunchDarkly in the avatar configuration. This adjustment aimed to isolate the problem and verify the hypothesis of the SDK’s impact on system stability.
Test 3 without pubsub and launchdarkly
100 Mocks ~ 2000 msg per second in total
150 Mocks ~ 3000 msg per second in total
Summary
| Test | Number of mocks | Msg/sec | Max heap alloc | Goroutines |
|---|---|---|---|---|
| Test 1 | 10 | 200 | 17.8M | 178 |
| Test 2 | 70 | 1400 | 94M | 1659 (with a spike to 5k, probably caused by the LD SDK) |
| Test 3 | 100 | 2000 | 110M | 1001 |
| Test 3.5 | 150 | 3000 | 227M | 1700 |
Based on these tests, we need to optimize our pubsub integration to support more messages per second, and investigate the LaunchDarkly issue: if it really is a bug in concurrent access to shared memory, it can trigger at any time and crash the avatar.
My laptop struggles to run more than 150 mocks; for the next tests we should run the mocks from the cloud.