Apollo gRPC Serialization Format Options
Originally: 0084-Apollo-GRPC_Serialization_Format_Options (v3)
Uncrew Inter-Process Serialization Format
Context
We want a serialization protocol that works well across languages (at a minimum, C++ and Golang) as we have to communicate across language boundaries.
We are currently using protobufs, so all options will be weighed against that as the standard.
Options Considered
Option Summary
- Both Cereal and Boost were immediately disqualified. They both work well (and would likely be my preferred options) for pure C++ development, but that isn’t our use case.
- The remaining options all have pros and cons, only some of which I feel qualified to properly weigh.
Cap’n Proto
| Pros | Cons |
|---|---|
| Written by the primary author of Protobufs-V2 with the intention of fixing everything wrong with protobufs | Go support exists, but seems very clunky to me and isn’t maintained by the author of the serialization standard |
| Orders of magnitude faster than protobufs | no gRPC support |
| Very robust schema language | |
FlatBuffers
| Pros | Cons |
|---|---|
| Faster than protobufs | no gRPC support |
| Good support in all languages | slightly awkward syntax in C++ |
| | geared pretty heavily towards video game development |
Protobufs
| Pros | Cons |
|---|---|
| gRPC Support | notoriously bloated in every sense |
| Schema has worked well enough for our purposes so far | Schema is deficient in many ways (I don’t really like the tone of that article, but the points are valid enough) |
| good golang support | C++ is a bit of a second-class citizen |
| | Questionable implementation, as outlined by Keith here |
JSON
| Pros | Cons |
|---|---|
| easily usable json libraries available in virtually every language | relatively slow |
| very easily human-readable | Requires extra effort on our end to get gRPC working with JSON instead of protobufs |
| | larger data size compared to many of the alternatives |
| doesn’t require a pre-compile step | No built-in schema |
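To make the "no built-in schema" con concrete, here is a minimal sketch of the kind of validation we would have to write and maintain ourselves if JSON were the wire format. The field names and the `validate` helper are hypothetical and exist only for illustration; a schema-driven format generates the equivalent checks from the schema file.

```python
import json

# Hypothetical hand-written "schema": field name -> accepted Python types.
# With a schema-driven format this would come from generated code instead.
EXPECTED_FIELDS = {
    "vehicle_id": (int,),
    "timestamp_ms": (int,),
    "alt_m": (int, float),  # JSON does not distinguish 120 from 120.0
}


def validate(payload: str) -> dict:
    """Parse a JSON message and reject missing, unknown, or mistyped fields."""
    msg = json.loads(payload)
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    for name, types in EXPECTED_FIELDS.items():
        if name not in msg:
            raise ValueError(f"missing field: {name}")
        value = msg[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            raise ValueError(f"wrong type for field: {name}")
    unknown = set(msg) - set(EXPECTED_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return msg
```

Without a check like this, a misspelled key such as `vehicle_di` parses as perfectly valid JSON and the mistake only surfaces later, wherever the field is first read.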
Recommendation
Using anything but Protobufs would make using gRPC a significantly larger burden, as we would have to build and maintain our own gRPC encoders. As such, protobufs should be our de facto standard when working with gRPC. As mentioned, however, we have encountered some significant issues when working with protobufs in C++. Because of these, we should not be strictly held to protobufs when they are causing more issues than the alternatives, but they should be the first attempt.
While most data science tools do prefer JSON, using JSON as our serialization format for inter-process communication is not the correct option. JSON messages are significantly larger than those of every other serialization option I looked at (protobufs is anywhere from ~25% to ~80% of the size of the equivalent JSON depending on message sizes and compression levels, according to this), and all of the serialization options listed above have conversions to and from JSON. If we do need JSON in our applications, the conversion can be done at the time of usage instead of making JSON the format for message transmission.
The larger size of JSON also means we should store messages in long-term storage as protobufs instead of as JSON. In addition, converting from protobufs to JSON at the time of export is a fairly simple task in the handful of languages we’re likely to use for any data science purposes: Scala, Golang, C++, Java, and Python.
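The size argument and the "convert to JSON at the time of usage" approach can be illustrated with a rough, standard-library-only sketch. Note the assumptions: a fixed `struct` layout stands in for a schema-driven binary format (it is not protobufs and has no schema evolution), and the message fields and helper names are hypothetical. The numbers it prints are only indicative of the general trend, not a benchmark of any of the options above.

```python
import json
import struct
import zlib

# Hypothetical message fields, for illustration only.
FIELDS = ("vehicle_id", "timestamp_ms", "lat", "lon", "alt_m")
# Stand-in binary layout: uint32, uint64, three float64, little-endian,
# no padding. This is 36 bytes per message.
LAYOUT = struct.Struct("<IQddd")


def encode_binary(msg):
    """Pack one message into the fixed binary layout."""
    return LAYOUT.pack(*(msg[name] for name in FIELDS))


def decode_binary(payload):
    """Unpack one message back into a plain dict."""
    return dict(zip(FIELDS, LAYOUT.unpack(payload)))


def encode_json(msg):
    """Encode one message as compact JSON (no extra whitespace)."""
    return json.dumps(msg, separators=(",", ":")).encode("utf-8")


def to_json_for_export(payload):
    """Convert a stored binary message to JSON only when a tool needs it."""
    return json.dumps(decode_binary(payload))


def size_report(messages):
    """Total bytes for each encoding, with and without zlib compression."""
    binary = b"".join(encode_binary(m) for m in messages)
    text = b"".join(encode_json(m) for m in messages)
    return {
        "binary": len(binary),
        "json": len(text),
        "binary_zlib": len(zlib.compress(binary)),
        "json_zlib": len(zlib.compress(text)),
    }


def make_messages(count):
    """Generate synthetic messages with slowly varying values."""
    return [
        {
            "vehicle_id": i % 8,
            "timestamp_ms": 1_700_000_000_000 + 100 * i,
            "lat": 37.0 + i * 1e-4,
            "lon": -122.0 - i * 1e-4,
            "alt_m": 100.0 + 0.5 * i,
        }
        for i in range(count)
    ]


if __name__ == "__main__":
    for name, size in size_report(make_messages(1000)).items():
        print(f"{name}: {size} bytes")
```

The point of `to_json_for_export` is that the compact form is what gets transmitted and stored, and JSON is produced only at the boundary where a data science tool asks for it.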
If we are willing to bite the bullet and switch off of gRPC for whatever reason, I would recommend Cap’n Proto. Its schema language is much better than that of protobufs, and its performance was unmatched by anything else I tested. The downside is that while C++ is a second-class citizen in Protobufs, Golang seems to be a bit of a second-class citizen in Cap’n Proto. That being said, I think the net benefit of Cap’n Proto is greater than that of Protobufs if we discount the networking implications of each.