If it wouldn't surprise you, then I think you should recalibrate your feelings about how surprising Unicode encodings are. There aren't very many of them, they haven't changed in a very long time, and they don't deal with any of the stuff that makes Unicode complicated (collation, combining characters, etc.). They just encode 21-bit integers, albeit sometimes in a highly convoluted way for backwards-compatibility reasons (UTF-16). It's not the kind of thing that needs to be estimated, or where a layer of FUD is warranted (as it kind of is with combining characters). When talking about code points, it's just "up to 4 bytes" in UTF-8, UTF-16, or UTF-32, high confidence, nothing more to it.
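If you'd rather check than take my word for it, here's a rough sketch that brute-forces the claim in Python. The helper name `max_encoded_len` is made up for this comment; the only real API it leans on is `str.encode`. It skips the surrogate range (U+D800 to U+DFFF) because those aren't encodable scalar values, and it uses the `-le` codec variants so that a byte order mark doesn't inflate the count.

```python
def max_encoded_len(encoding: str) -> int:
    """Return the longest encoding, in bytes, of any single code point.

    Hypothetical helper for illustration: walks every Unicode scalar
    value (U+0000..U+10FFFF minus surrogates) and encodes each one.
    """
    longest = 0
    for cp in range(0x110000):
        # Surrogates are reserved for UTF-16 pairs and cannot be
        # encoded on their own, so skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        longest = max(longest, len(chr(cp).encode(encoding)))
    return longest


if __name__ == "__main__":
    # The "-le" variants avoid the BOM that plain "utf-16"/"utf-32" prepend.
    for name in ("utf-8", "utf-16-le", "utf-32-le"):
        print(name, max_encoded_len(name))
```

This only says something about single code points. A user-perceived character built from several code points (a base letter plus combining marks, say) can of course be longer, which is exactly the part where some caution is warranted.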