indil on Hacker News

Ask HN: Is Unicode Designed Badly?

The more I learn about Unicode, the more complicated it gets. It was rather shocking to learn that the presence of combining characters makes most "reverse a string" programming solutions incorrect, and that strings need to be normalized before they can be compared reliably. The whole thing seems so much more complicated than it should be, but perhaps that's just the nature of the problem?
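To make both surprises concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper `reverse_clusters` is a hypothetical name of my own, and it only approximates user-perceived characters by gluing combining marks to the preceding base character; full grapheme cluster segmentation (UAX #29) covers more cases, such as emoji sequences, that this does not handle.

```python
import unicodedata

composed = "caf\u00e9"     # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Both render as "café", yet they differ code point by code point.
equal_raw = composed == decomposed
# After normalizing both to NFC they compare equal.
equal_nfc = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# Naive reversal puts the accent first, detached from its base letter 'e'.
naive = decomposed[::-1]


def reverse_clusters(s):
    """Reverse s while keeping combining marks attached to their base.

    Hypothetical helper: a simplification, not full UAX #29 segmentation.
    """
    clusters = []
    for ch in s:
        # A nonzero combining class marks a character that attaches to
        # the one before it.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


print(equal_raw)                           # False
print(equal_nfc)                           # True
print(ascii(naive))                        # '\u0301efac'
print(ascii(reverse_clusters(decomposed))) # 'e\u0301fac'
```

The slice-based reversal is the usual "reverse a string" answer, and it is the one that breaks here: the accent ends up at the start of the string instead of on the `e`.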

Was Unicode designed well? If it were designed from scratch today, with no legacy considerations, would the ideal design look like the current design? What would you change?

Being extremely ignorant of the problem space, I would first consider combining characters for the chopping block. Just make every character a precomposed character (one code point), so there's no need for normalization. I'm curious if such a scheme could fit every code point into 32 bits, though. Would this be feasible?
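As a rough way to poke at the feasibility question, the Python sketch below (standard library only) checks two things: how many bits the current code space needs, and whether every base-plus-mark sequence already has a precomposed code point. The variable names are mine, and the `x` plus acute accent pair is just one illustrative sequence that has no precomposed form.

```python
import sys
import unicodedata

# The Unicode code space ends at U+10FFFF, so one code point fits in 21 bits.
max_code_point = sys.maxunicode
bits_needed = max_code_point.bit_length()

# 'e' + COMBINING ACUTE ACCENT has a precomposed form (U+00E9),
# so NFC composes it into a single code point.
e_acute = unicodedata.normalize("NFC", "e\u0301")

# 'x' + COMBINING ACUTE ACCENT has no precomposed form,
# so NFC leaves it as two code points.
x_acute = unicodedata.normalize("NFC", "x\u0301")

print(hex(max_code_point), bits_needed)  # 0x10ffff 21
print(len(e_acute))                      # 1
print(len(x_acute))                      # 2
```

So today's code points fit in 32 bits with room to spare, but a precomposed-only scheme would need a new code point for every base-and-mark combination anyone wants to write, and marks can be stacked.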

3 points by indil 3y ago | 14 comments
