Unfortunately, in a security context, that is not only not guaranteed but will be actively attacked, so in practice I'm not sure it buys you that much from a security perspective. A net positive, I think, but certainly not enough that you can metaphorically kick back and enjoy your lemonade.
Binary formats gave us one of the oldest security vulnerabilities: a message that simply claims a length larger than the buffer the C program allocated. Though I'm inclined to credit that particular joy to C and not the data format itself; nowadays there aren't many languages where simply claiming to be really long will get you anywhere like that.
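A minimal sketch of the bug class, assuming a hypothetical wire record of a little-endian `u32` length followed by that many payload bytes (the record layout and function name are illustrative, not from any particular protocol):

```c
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 64

/* Copies a length-prefixed record into `out` (which must hold
 * BUF_SIZE bytes). Returns 0 on success, -1 on a malformed record.
 * The two comparisons below are exactly what the classic vulnerable
 * version omits: it trusts the claimed length and memcpy's away.   */
int read_record(const uint8_t *wire, size_t wire_len, uint8_t *out)
{
    if (wire_len < 4)
        return -1;
    uint32_t claimed = (uint32_t)wire[0]       | (uint32_t)wire[1] << 8
                     | (uint32_t)wire[2] << 16 | (uint32_t)wire[3] << 24;
    /* An attacker controls `claimed`; check it against both the
     * destination buffer and the bytes actually received.          */
    if (claimed > BUF_SIZE || claimed > wire_len - 4)
        return -1;
    memcpy(out, wire + 4, claimed);
    return 0;
}
```

Drop either check and a record claiming `0xffffffff` bytes walks straight off the end of the buffer.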
English is not immune. Think about “who’s on first” — there is no way to distinguish the untrustworthy name “who” from a grammatical part of the conversation.
Sure there is. Barring a pathologically bad wire format design, they're easier to parse than an equivalent human-editable encoding.
Dropping the requirement that humans be able to edit the format also enables us to:
- Avoid introducing character encoding — a huge problem space just on its own — into the list of things that all parsers must get right.
- Define non-malleable encodings; in other words, ensure that there exists only one valid encoding for any valid message, eliminating parser bugs that emerge around handling (or not) multiple different ways to encode the same thing.
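The second point can be sketched with a hypothetical LEB128-style varint decoder that enforces a canonical, minimal encoding (the function name and the five-byte `u32` limit are my assumptions, not any particular format's spec). A lenient decoder would accept both `0x05` and `0x85 0x00` as the integer 5; a canonical one admits exactly one byte sequence per value:

```c
#include <stdint.h>
#include <stddef.h>

/* Decodes a little-endian base-128 varint (7 value bits per byte,
 * high bit = "more bytes follow") into a u32. Returns the number of
 * bytes consumed, or -1 if the input is truncated, overlong, or not
 * the unique minimal encoding of its value.                         */
int decode_varint_canonical(const uint8_t *p, size_t len, uint32_t *out)
{
    uint32_t value = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        uint8_t byte = p[i];
        value |= (uint32_t)(byte & 0x7f) << (7 * i);
        if ((byte & 0x80) == 0) {
            /* Minimality: a zero final byte in a multi-byte encoding
             * means a shorter encoding of the same value exists.     */
            if (i > 0 && byte == 0)
                return -1;
            /* The fifth byte may only carry 4 bits for a u32.        */
            if (i == 4 && (byte & 0x70))
                return -1;
            *out = value;
            return (int)(i + 1);
        }
    }
    return -1; /* ran out of input, or more than 5 bytes claimed */
}
```

With this rule, "the bytes are valid" and "the bytes are the one true encoding of this message" become the same check, and a whole class of differential-parsing bugs (two parsers reading the same bytes as different messages) disappears.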
I've said similar things before. If you want a boolean, there's nothing simpler and less error-prone than a single bit: it represents exactly the values you need, nothing more and nothing less. You could take a whole byte if you didn't want to pack bits, and use the "0 is false, nonzero is true" convention, which maps naturally onto a lot of programming languages; that gives you 256 distinct values, but the set of inputs is still small and finite, with each one having a defined interpretation.
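The two byte-level conventions above can be written out directly; the function names here are illustrative, not from any library. The strict variant keeps the encoding non-malleable (one valid byte per value), while the lenient one adopts the C convention where every byte has a defined meaning:

```c
#include <stdint.h>

/* Strict: only 0x00 and 0x01 are accepted, so each boolean has
 * exactly one wire encoding. Returns 0 on success, -1 otherwise. */
int parse_bool_strict(uint8_t byte, int *out)
{
    if (byte > 1)
        return -1;
    *out = byte;
    return 0;
}

/* Lenient: "0 is false, nonzero is true". 256 possible inputs,
 * but all of them finite and fully specified.                    */
int parse_bool_lenient(uint8_t byte)
{
    return byte != 0;
}
```

The trade-off is the one discussed earlier in the thread: the lenient reader can never reject a byte, but 255 different byte values now encode "true", which is exactly the malleability the strict reader rules out.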