Better HN

z3t4 2y ago

Sorry, I meant code points/characters. It would not surprise me if there existed an encoding or language where my wording would be technically correct, but I do not know of any such encoding. I also did not know that there exist more than 5 combinations in Unicode, but I'm not surprised, and my implementation is probably buggy. But I do challenge you to test how well your favourite editor (terminal emulator, cough) handles Unicode emojis.


WorldMaker 2y ago

UTF-8's original specification included 5-byte and 6-byte encodings to cover the full 31-bit code point range, but later specifications marked those sequences "invalid" because of UTF-16's current 21-bit limit, aligning the two specifications for now rather than fixing the bugs in UTF-16 (or scrapping UTF-16 altogether). In theory, UTF-8 could even extend beyond 6-byte encodings (and UTF-32 into 8-byte encodings and beyond) if a next tier (63-bit code points), or the one after that, ever needed to open up. (No one expects that any time soon, of course. Today Unicode is nowhere close to filling 21 bits, much less 31. That would be a massive shock, and the compatibility headache would be terrible: UTF-16 would break, and so would today's software that hard-codes the assumption that UTF-8 never goes past 4-byte encodings.)
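For anyone who wants to see the byte layout concretely, here is a minimal Python sketch of the original 1-to-6-byte scheme. The function name `utf8_encode_legacy` is made up for illustration, it does no surrogate checking, and it is not how any particular library implements UTF-8. For code points up to U+10FFFF it should agree with Python's built-in encoder; the 5-byte and 6-byte forms are the ones current specifications reject.

```python
def utf8_encode_legacy(cp: int) -> bytes:
    """Encode a code point with the original 1-to-6-byte UTF-8 layout.

    Hypothetical helper for illustration only: no surrogate validation,
    and sequences longer than 4 bytes are invalid in today's UTF-8.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])  # ASCII is a single byte, unchanged

    # (payload bits available, total length in bytes, lead-byte marker)
    layouts = (
        (11, 2, 0xC0),
        (16, 3, 0xE0),
        (21, 4, 0xF0),
        (26, 5, 0xF8),  # no longer valid UTF-8
        (31, 6, 0xFC),  # no longer valid UTF-8
    )
    for bits, length, lead in layouts:
        if cp < (1 << bits):
            break

    out = []
    # Emit continuation bytes (10xxxxxx) from the low end, 6 bits at a time.
    for _ in range(length - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(lead | cp)
    return bytes(reversed(out))


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1F600, 0x10FFFF, 0x200000, 0x7FFFFFFF):
        print(f"U+{cp:X}: {utf8_encode_legacy(cp).hex(' ')}")
```

With this sketch, U+10FFFF (the current ceiling) takes 4 bytes, U+200000 is the first value that needs 5, and the 31-bit maximum needs all 6.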

rcoveson 2y ago

If it wouldn't surprise you, then I think you should recalibrate your feelings about how surprising Unicode encodings are. There aren't very many of them, they haven't changed in a very long time, and they don't deal with any of the stuff that makes Unicode very complicated (collation, combining characters, etc.). They just encode 21-bit integers, albeit sometimes in a highly convoluted way for backwards-compatibility reasons (UTF-16). It's not the kind of thing that needs to be estimated, or where a layer of FUD is warranted (as it kind of is with combining characters). When talking about code points, it's just "up to 4 bytes", high confidence, nothing more to it.
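The "up to 4 bytes" claim is easy to check. The following is a rough Python sketch, with `utf16_surrogates` and `encoded_sizes` as made-up helper names; it relies only on the standard `str.encode` codecs. It shows the convoluted part, UTF-16's surrogate pair arithmetic, and measures the size of a single code point in each of the three encoding forms.

```python
from typing import Dict, Tuple


def utf16_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    UTF-16 (high, low) surrogate pair. Hypothetical helper for illustration."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000  # leaves a 20-bit value
    high = 0xD800 | (v >> 10)    # top 10 bits
    low = 0xDC00 | (v & 0x3FF)   # bottom 10 bits
    return high, low


def encoded_sizes(ch: str) -> Dict[str, int]:
    """Byte length of one code point in each encoding form (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


if __name__ == "__main__":
    high, low = utf16_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")
    for ch in ("A", "\u20ac", "\U0001F600", "\U0010FFFF"):
        print(f"U+{ord(ch):04X}: {encoded_sizes(ch)}")
```

Under these assumptions, no single code point comes out longer than 4 bytes in any of the three forms, including U+10FFFF. Anything longer on screen (a flag, a family emoji) is a sequence of several code points, which is the combining-characters problem and not an encoding problem.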
