Avatar Telemetry Performance

Andi Lamprecht · Accepted
ADR-0128 · Author: Sybil Melton · Date: 2025-02-07 · Products: uncrew
Originally 0106-Avatar-telemetry_PERFORMANCE (v4) · Source on Confluence ↗

Performance test

Context

This performance test was conducted to understand how many telemetry messages per second an avatar can handle for UAVs. The avatar was deployed in a Kubernetes cluster, and mock UAVs along with pilot mocks were run from a local machine. Each mock instance included a pilot subscription to telemetry and a UAV mock to generate traffic; each UAV mock generates 20 position telemetry messages per second. The test was considered successful if the avatar could handle the desired traffic for 20 minutes without any loss of telemetry data.

Prerequisites

  • Avatar broadcast channel buffer (the internal buffer for telemetry messages) was increased from 1000 to 10000
  • Kubernetes memory request/limit was increased to 512Mi/1024Mi
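The broadcast channel buffer determines how much telemetry the avatar can absorb while consumers lag before messages are lost. A minimal sketch of that tradeoff, assuming a non-blocking send into a buffered Go channel (the avatar's actual internals are not shown in this ADR):

```go
package main

import "fmt"

// trySend performs a non-blocking send into a buffered channel;
// when the buffer is full, the message is dropped and counted as
// lost. This only illustrates what the buffer size controls.
func trySend(ch chan int, msg int) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false // buffer full: telemetry would be lost
	}
}

func main() {
	// Buffer increased from 1000 to 10000 for these tests.
	buf := make(chan int, 10000)
	dropped := 0
	for i := 0; i < 12000; i++ {
		if !trySend(buf, i) {
			dropped++
		}
	}
	// With no consumer draining the buffer, everything past the
	// buffer capacity is lost.
	fmt.Println("dropped:", dropped) // dropped: 2000
}
```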

Test 1

10 Mocks ~ 200 msg per second in total

https://ui.honeycomb.io/droneup/environments/test/datasets/uncrew-avatar/result/4EfF6nRdeM6?hideCompare

At more than 200 msg per second, the avatar starts failing to submit messages to pub/sub; this does not affect the pipes between UAV and pilot.
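The observation that pub/sub submissions fail while UAV-to-pilot pipes keep working suggests the two paths are decoupled. A hedged sketch of such a fan-out, where the pipe is delivered to synchronously and the pub/sub copy is sent non-blocking so a backlogged publisher loses only the pub/sub copy (`fanOut` and both channels are illustrative, not the avatar's real code):

```go
package main

import "fmt"

// fanOut delivers each telemetry message to the pilot pipe
// synchronously, and to a pub/sub publish queue with a
// non-blocking send, so a slow publisher can only lose the
// pub/sub copy and never blocks the UAV-to-pilot pipe.
func fanOut(msgs []string, pipe chan<- string, pubsub chan string) (delivered, publishFailed int) {
	for _, m := range msgs {
		pipe <- m // pilot pipe: must not lose data
		delivered++
		select {
		case pubsub <- m:
		default:
			publishFailed++ // pub/sub backlog: drop, don't block
		}
	}
	return
}

func main() {
	msgs := []string{"p1", "p2", "p3", "p4"}
	pipe := make(chan string, 10)
	pubsub := make(chan string, 2) // simulate a backlogged publisher
	d, f := fanOut(msgs, pipe, pubsub)
	fmt.Println(d, "delivered,", f, "publish failures") // 4 delivered, 2 publish failures
}
```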

Test 2 without pubsub

70 Mocks ~ 1400 msg per second in total

https://ui.honeycomb.io/droneup/environments/test/datasets/uncrew-avatar/result/HG88z7BA7Vh?hideCompare

In tests involving more than 70 mocks, the avatar’s memory consumption sometimes spiked to 3-4 GB and the number of goroutines escalated to 30-40k, after which Kubernetes terminated the avatar with an Out Of Memory (OOM) error. Even in the 70-mock test itself, the metrics showed a significant spike in goroutines, jumping from 700-800 to 4.8k.

Upon further investigation, I identified that the issue likely stemmed from the LaunchDarkly SDK. The symptoms suggested a bug related to concurrent access to shared memory within the SDK. To mitigate this issue in subsequent tests, I temporarily disabled LaunchDarkly in the avatar configuration. This adjustment aimed to isolate the problem and verify the hypothesis of the SDK’s impact on system stability.

Test 3 without pubsub and launchdarkly

100 Mocks ~ 2000 msg per second in total

https://ui.honeycomb.io/droneup/environments/test/datasets/uncrew-avatar/result/k7SgDFWDZ5A?hideCompare

150 Mocks ~ 3000 msg per second in total

https://ui.honeycomb.io/droneup/environments/test/datasets/uncrew-avatar/result/t1dMUmUH9D9?hideCompare

Summary

| Test     | Number of mocks | Msg/sec | Max heap alloc | Goroutines |
|----------|-----------------|---------|----------------|------------|
| Test 1   | 10              | 200     | 17.8M          | 178        |
| Test 2   | 70              | 1400    | 94M            | 1659 (with spike to 5k, probably caused by LD SDK) |
| Test 3   | 100             | 2000    | 110M           | 1001       |
| Test 3.5 | 150             | 3000    | 227M           | 1700       |

Based on these tests, we need to optimize our work with pub/sub to support more messages per second, and investigate the issue with LaunchDarkly: if it is a bug in concurrent access to shared memory, it can happen at any time and crash the avatar.
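One plausible direction for the pub/sub optimization is batching, so one publish call carries many telemetry messages instead of one. The ADR does not specify the chosen optimization; this is only a sketch of the idea with an illustrative `batch` helper.

```go
package main

import "fmt"

// batch groups messages into fixed-size batches so that a single
// pub/sub publish call can carry many telemetry messages.
func batch(msgs []string, size int) [][]string {
	var out [][]string
	for size > 0 && len(msgs) > 0 {
		n := size
		if len(msgs) < n {
			n = len(msgs)
		}
		out = append(out, msgs[:n])
		msgs = msgs[n:]
	}
	return out
}

func main() {
	// One second of Test 1 traffic: 200 position messages.
	msgs := make([]string, 0, 200)
	for i := 0; i < 200; i++ {
		msgs = append(msgs, fmt.Sprintf("pos-%d", i))
	}
	batches := batch(msgs, 50)
	fmt.Println(len(batches), "publish calls instead of 200") // 4 publish calls instead of 200
}
```

Batch size would need tuning against pub/sub message-size limits and acceptable added latency.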

My laptop struggles to run more than 150 mocks; for the next tests we need to try running the mocks from the cloud.
