Aapeli Vuorinen


Published 2019-07-12.

Why don’t protocol buffers have fixed length arrays?

I’ve been using Google’s protocol buffers recently for some projects, and for a long time I had a gripe with the fact that they don’t include a type for fixed-length arrays. I mean, surely you’d expect that from a serious serialisation library like this?

Well, let’s think about it a bit. Maybe the situation isn’t as hideous as it seems.

One of the core objectives of the protocol buffer design is backwards and forwards compatibility, which works mainly because a parser can ignore practically everything it doesn’t understand or need. So for instance, a load balancing server in an RPC system could read just the field identifying the service endpoint and skip everything else when deciding where to relay the request.
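To make that concrete, here is a minimal sketch in Python of a reader that pulls one field out of a serialised message and skips the rest without any schema. It is hand-rolled against the wire format rather than using the protobuf library, and the message layout (field 1 as the endpoint name, fields 2 and 3 as unrelated data) is purely an assumption for illustration.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def find_field(buf, wanted):
    """Return the payload of the first length-delimited field `wanted`.

    Every other field is skipped using only its wire type, so no
    .proto definition is needed. Returns None if the field is absent.
    """
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint: ends at the first byte without the high bit
            start = pos
            _, end = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit value
            start, end = pos, pos + 8
        elif wire_type == 2:  # length-delimited: explicit byte length comes first
            length, start = read_varint(buf, pos)
            end = start + length
        elif wire_type == 5:  # fixed 32-bit value
            start, end = pos, pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number == wanted and wire_type == 2:
            return bytes(buf[start:end])
        pos = end  # skip the value without interpreting it
    return None


# Hypothetical request: field 2 = 150 (varint), field 3 = packed [1, 2, 3],
# field 1 = the endpoint name the load balancer cares about.
message = (
    b"\x10\x96\x01"
    + b"\x1a\x03\x01\x02\x03"
    + b"\x0a\x0c" + b"users.Lookup"
)
endpoint = find_field(message, 1)
```

Here `find_field(message, 1)` steps over the integer and the packed array, neither of which it knows anything about, and returns `b"users.Lookup"`.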

But the catch is that, as in any type-length-value encoding, a server can only skip a field it knows nothing about if it can work out how many bytes that value occupies. A fixed-length array type wouldn’t help here: the length would live in the .proto file, which that server may never have seen, so it would still have to be written on the wire. But now we’ve ended up with exactly what packed repeated fields already do (the default for numeric types in proto3; in proto2, repeated fields are unpacked unless you ask otherwise, and repeat the field key before every element, which is extra overhead): one key, one byte length, then the elements back to back. Hence we’re not really wasting space in the serialisation by not having a specific fixed-length array type, as we’d have to include the length anyway.
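A small sketch of the two encodings shows how little the length prefix costs. This is hand-written Python against the wire format, not the protobuf library; it assumes non-negative integers in a varint-typed repeated field, and the field number 6 is arbitrary.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_packed(field_number, values):
    """One key, one byte length, then all elements back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)  # wire type 2: length-delimited
    return key + encode_varint(len(payload)) + payload


def encode_unpacked(field_number, values):
    """The field key is repeated in front of every element."""
    key = encode_varint((field_number << 3) | 0)  # wire type 0: varint
    return b"".join(key + encode_varint(v) for v in values)


values = [3, 270, 86942]
packed = encode_packed(6, values)      # hex: 32 06 03 8e 02 9e a7 05
unpacked = encode_unpacked(6, values)  # hex: 30 03 30 8e 02 30 9e a7 05
```

The packed form is 8 bytes and the unpacked form is 9. A hypothetical fixed-length array type could at best drop the single length byte from the packed form, and only by making the field impossible to skip without the schema.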

Of course it would be nice to tell the code generator exactly how many elements we expect and get a runtime error if we deserialise a different number, instead of having to check it ourselves. But that would make the whole process more complex, and it would add one more thing to think about for compatibility: if we later require a few more elements, do we have to retire this field number and start using a new one?
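In the meantime, checking it ourselves is only a few lines. The helper below is a hypothetical sketch, not part of any protobuf API: `require_length` and the `position` field name are made up for illustration, and it should work on anything that supports `len()` and iteration.

```python
def require_length(values, expected, name):
    """Return `values` as a tuple, or raise if it has the wrong number of elements."""
    if len(values) != expected:
        raise ValueError(
            f"{name}: expected {expected} elements, got {len(values)}"
        )
    return tuple(values)


# Hypothetical usage right after deserialising a message with a
# repeated field `position` that should always hold three values.
position = require_length([1.0, 2.5, -4.0], 3, "position")

try:
    require_length([1.0, 2.5], 3, "position")
except ValueError as err:
    error_message = str(err)
```

Because the expected length lives in our own code rather than in the schema, changing it later doesn’t touch the wire format or the field number.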

So no, protocol buffers don’t have fixed-length arrays, and at first that can be rather frustrating. But I hope I’ve shown why it doesn’t really matter that I can’t specify the length of an array in my .proto files.