> Each field in each struct has both a name and a numeric id. Only ids are used for serialization, so field names can be changed at any time.
Fair enough, your field names can be renamed. But then the 'contract' is the field numbers, not the names.
> All fields are marked as optional or repeated, never required. Most code is written to handle missing fields gracefully.
So if all fields are optional, and you provide no fields at all, what happens? I assume the process rejects it, because it's not of the correct type?
> Changing the type or id of an existing field is forbidden.
Forbidden by what?
> Adding a new field is okay, as long as you use an id that was never used before. (Each struct definition has a comment indicating the next available id to use.)
I can understand this being the least problematic change to a type. But it still leads to 'if x has y field' behaviour, as your code tries to manage the full range of possible message types it might receive.
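To make the 'if x has y field' point concrete, here is a minimal sketch in Python. The `UserMsg` type, its fields and `display_name` are all hypothetical, not taken from any real schema. The only point is that every optional field turns into another branch in the consumer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserMsg:
    # Hypothetical message: every field is optional, never required.
    name: Optional[str] = None      # original field
    nickname: Optional[str] = None  # added later under a fresh id


def display_name(msg: UserMsg) -> str:
    # Each optional field adds another "is it there?" branch.
    if msg.nickname is not None:
        return msg.nickname
    if msg.name is not None:
        return msg.name
    # A message with no fields at all still has to be handled somewhere.
    return "<anonymous>"
```

With two fields that is three cases. Each further optional field multiplies the combinations the handler has to think about.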
> Removing a field is okay if you've checked that no one is using it anymore.
That sounds super fluffy.
> As a small but intentional bonus, you can change an optional field to repeated while preserving binary compatibility.
Sorry, I don't follow. This bit confuses me: 'change an optional field to repeated'.
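My best guess at the mechanism, sketched in Python below, is that a repeated field is just the same id occurring several times on the wire. This is an assumption about a toy tag-value format, not a description of the real encoding. `Record`, `read_optional` and `read_repeated` are names I made up for illustration.

```python
from typing import List, Optional, Tuple

# A decoded record in a toy tag-value format: (field id, value) pairs in wire order.
Record = List[Tuple[int, int]]


def read_optional(record: Record, field_id: int) -> Optional[int]:
    # "optional" reader: keep the last occurrence of the id, if any (assumed rule).
    values = [v for fid, v in record if fid == field_id]
    return values[-1] if values else None


def read_repeated(record: Record, field_id: int) -> List[int]:
    # "repeated" reader: keep every occurrence of the id, in wire order.
    return [v for fid, v in record if fid == field_id]
```

Under that assumption, a new reader sees an old record `[(1, 7)]` as the one-element list `[7]`, and an old reader sees a new record `[(1, 7), (1, 9)]` as just `9`. If that is what is meant, it is compatible at the binary level, but the old reader silently drops values, which is not obviously compatible at the level of meaning.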
> In the end it works out. You can think of breakages that could theoretically happen, but they don't.
I can think of many:
* If IDs are picked by a human, at some point a human will make a mistake and re-use an existing one
* If 'Changing the type or id of an existing field is forbidden' is a human-enforced constraint, then at some point it will be violated
* If you think a certain struct pattern can't happen any more (you think you've retired all nodes that send the old format), you remove the many matches that deal with legacy messaging, and then you realise that there is an old node still sending it after all
* You may re-add a field to a type which was previously removed and cause unexpected behaviour in parts of the system that match on that old format
* Removing a field that you thought wasn't used any more but actually still is
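The id re-use case from the list above can be sketched in a few lines of Python. Everything here is hypothetical: the dict-based wire format, the schema names and the field names are made up for illustration, and skipping unknown ids is an assumption of the sketch.

```python
from typing import Dict

# Toy wire message: field id -> raw value. Names never go over the wire.
Wire = Dict[int, str]

# Hypothetical schema history for one struct.
SCHEMA_V1 = {1: "name", 2: "email"}  # old nodes still send this
SCHEMA_V3 = {1: "name", 2: "phone"}  # id 2 was removed, then re-used by mistake


def decode(wire: Wire, schema: Dict[int, str]) -> Dict[str, str]:
    # Decode by id only; ids missing from the schema are skipped (assumption).
    return {schema[fid]: value for fid, value in wire.items() if fid in schema}


# A message produced by an old node that was never retired.
old_message: Wire = {1: "Ann", 2: "ann@example.com"}
decoded = decode(old_message, SCHEMA_V3)
```

Here `decoded` is `{"name": "Ann", "phone": "ann@example.com"}`. Nothing fails and nothing is rejected. The email address simply turns up as a phone number.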
By the way, I'm not suggesting it's impossible to develop robust systems without a static type system of some sort, but I do think the hoops you're jumping through in items 1-6 indicate the problems of not using static types. Each change in functionality could just use a new struct with a new function, where the old function maps the old struct to the new one. That captures precisely the change in logic in one place, has no runtime cost for nodes that are sending the new struct, and can't lead to the edge cases I listed above.
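A minimal sketch of that alternative, in Python with dataclasses. The `OrderV1` and `OrderV2` types, the default quantity of 1 and the handler names are all hypothetical. Python only checks the annotations if you run a type checker such as mypy over it, so treat this as an outline of the shape rather than a definitive implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderV1:
    # Hypothetical original message shape.
    item: str


@dataclass(frozen=True)
class OrderV2:
    # Hypothetical new shape: quantity is mandatory, not "optional".
    item: str
    quantity: int


def upgrade(old: OrderV1) -> OrderV2:
    # The only place that knows how V1 relates to V2 (default quantity assumed).
    return OrderV2(item=old.item, quantity=1)


def handle_v2(order: OrderV2) -> str:
    # Core logic only ever sees the current shape; no "has field?" checks.
    return f"{order.quantity} x {order.item}"


def handle_v1(order: OrderV1) -> str:
    # The old entry point maps the old struct to the new one and delegates.
    return handle_v2(upgrade(order))
```

Nodes sending `OrderV2` go straight to `handle_v2` and never pay for the mapping. Retiring the old format means deleting `OrderV1`, `upgrade` and `handle_v1`, and a type checker will point at anything that still depends on them.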