est 13y ago

Having read all of your comments: are you suggesting that a Unicode object should not have len() or substring()?

A standard like that is totally not embarrassing.

cmccabe 13y ago

I am suggesting that people read about Unicode before designing supposedly cross-platform applications or programming languages. It's not that hard, just different than ASCII.

est (OP) 13y ago

Since you understand Unicode so well, can you explain dietrichepp's theory that Unicode doesn't need counting or offsets?

http://news.ycombinator.com/item?id=4834931

And why is UCS4 (not variable-length) chosen in many Unicode implementations? Why is wchar_t always 32-bit on POSIX?

cmccabe 13y ago

> Since you understand Unicode so well, can you explain dietrichepp's theory that Unicode doesn't need counting or offsets?

Unicode doesn't have "characters." If you talk about characters, all you've succeeded in doing is confusing yourself. Leave characters back in ASCII-land where they belong.

Counting code points is stupid. If you like counting code points, go sit in the corner. You don't understand Unicode.

You can count graphemes, but it's not going to be easy. And most of the time, I don't see why you would need to do that.
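A quick Python 3 sketch of why these counts disagree. The helper `approx_grapheme_count` is a hypothetical name, and it is only a rough approximation: it skips combining marks and does not implement the full segmentation rules of UAX #29 (emoji sequences, Hangul jamo, and so on), which is part of why counting graphemes "is not going to be easy."

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Rough approximation: count code points that are not combining marks.

    Hypothetical helper for illustration only. Real grapheme segmentation
    (UAX #29) handles many more cases than this.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


s = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

code_points = len(s)                   # Python 3 str length counts code points
utf8_bytes = len(s.encode("utf-8"))    # size of the UTF-8 encoding in bytes
graphemes = approx_grapheme_count(s)   # approximate user-perceived units
nfc_len = len(unicodedata.normalize("NFC", s))  # composed form is U+00E9

print(code_points, utf8_bytes, graphemes, nfc_len)  # prints: 2 3 1 1
```

The same visible "é" is two code points before normalization and one after, so a code point count says little about what the user sees.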

> And why is UCS4 (not variable-length) chosen in many Unicode implementations? Why is wchar_t always 32-bit on POSIX?

wchar_t is a horrible abomination that begs for death. Nobody should use it. Use UTF-8 instead. I think Python used to use UCS4, but it doesn't any more. It's a horrible representation because your strings can bloat up by as much as 4x.
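As a rough illustration of the size difference, the sketch below compares encoded sizes in Python 3. The sample strings are arbitrary choices, and `utf-32-le` is used as a stand-in for a fixed-width 4-byte representation (it also avoids the byte order mark that plain `utf-32` would prepend).

```python
ascii_text = "hello, world"  # 12 ASCII characters, arbitrary sample

ascii_utf8 = len(ascii_text.encode("utf-8"))       # 1 byte per code point
ascii_utf32 = len(ascii_text.encode("utf-32-le"))  # 4 bytes per code point
ascii_ratio = ascii_utf32 / ascii_utf8

cjk_text = "日本語"  # 3 code points, each 3 bytes in UTF-8

cjk_utf8 = len(cjk_text.encode("utf-8"))
cjk_utf32 = len(cjk_text.encode("utf-32-le"))

print(ascii_utf8, ascii_utf32, ascii_ratio)  # prints: 12 48 4.0
print(cjk_utf8, cjk_utf32)                   # prints: 9 12
```

The 4x figure is the worst case, reached for ASCII-only text; for text outside ASCII the gap is smaller.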
