What those libraries do is equivalent to normalizing first: normalize(UTF-16 `0x00 0x41 0x03 0x08`) has length 1.
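To make that concrete, here is a minimal sketch in Python. It assumes the four bytes are big-endian UTF-16 (U+0041 followed by U+0308, the combining diaeresis) and that "normalize" means NFC composition; neither is stated explicitly above.

```python
import unicodedata

# Assumption: the bytes are UTF-16 big-endian, i.e. U+0041 then U+0308.
raw = bytes([0x00, 0x41, 0x03, 0x08])
decoded = raw.decode("utf-16-be")

# Assumption: "normalize" means NFC, which composes the pair into U+00C4.
composed = unicodedata.normalize("NFC", decoded)

print(len(decoded))   # two code points before normalization
print(len(composed))  # one code point after normalization
```

Without the normalization step the same string reports length 2, which is the difference being argued about.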
Back to my top comment: I stated that UCS-2 counts faster than UTF-8 internally, because every BMP code point is just two bytes. What's wrong with that? If variable-length is so good, why is py3k using UCS-4 internally? (Which means every character is exactly 32 bits. There, I said character again.)
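A rough sketch of the counting difference, in Python. The helper names `count_ucs2` and `count_utf8` are made up for illustration, and the sketch assumes BMP-only text with no surrogate pairs:

```python
def count_ucs2(data: bytes) -> int:
    # Fixed width: every BMP code point is two bytes, so no scan is needed.
    return len(data) // 2


def count_utf8(data: bytes) -> int:
    # Variable width: walk every byte and skip continuation bytes (10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


text = "naïve Ä"  # BMP-only sample
print(count_ucs2(text.encode("utf-16-be")))
print(count_utf8(text.encode("utf-8")))
```

Both calls agree with `len(text)`, but the first is a single division while the second has to touch every byte. Note that both count code points, not what a user would call characters.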