It's not about encodings at all, actually. It's about the API that is presented to the programmer.
And the way you take all of this into account is by refusing to accept any defaults. So, for example, a string type should not have a "length" operation at all. It should have "length in code points", "length in graphemes" etc. operations. And maybe, if you do decide to expose UTF-8 (which I think is a bad idea), "length in bytes". But every time someone asks for the length, they should be forced to specify which one they want (and hence think about why they actually need it).
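A minimal sketch of what such an API could look like, in Python (the function names are hypothetical, and the grapheme counter is a rough approximation rather than a full UAX #29 implementation):

```python
import unicodedata

def length_in_code_points(s: str) -> int:
    # Python strings are sequences of code points, so len() already counts them.
    return len(s)

def length_in_bytes(s: str) -> int:
    # Only meaningful for a specific encoding; here, UTF-8.
    return len(s.encode("utf-8"))

def length_in_graphemes(s: str) -> int:
    # Rough approximation: count code points that are not combining marks.
    # A real implementation would follow the grapheme cluster rules of UAX #29.
    return sum(1 for c in s if unicodedata.combining(c) == 0)

s = "e\u0301"  # "é" written as 'e' plus a combining acute accent
print(length_in_code_points(s))  # 2
print(length_in_bytes(s))        # 3
print(length_in_graphemes(s))    # 1
```

Three different "lengths" for what the user sees as one character, which is exactly why a single default is a trap.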
Similarly, strings shouldn't support plain integer indexing at all. They should support operations like "nth code point", "nth grapheme" etc. Again, forcing the programmer to decide every time, and to think about the implications of that decision.
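Sketching the indexing side in the same hypothetical style (again, the grapheme segmentation below is a simplification of the real UAX #29 rules, just enough to show the difference):

```python
import unicodedata

def nth_code_point(s: str, n: int) -> str:
    # Python's built-in indexing is already by code point.
    return s[n]

def nth_grapheme(s: str, n: int) -> str:
    # Rough approximation of grapheme clusters: a base code point
    # together with the combining marks that follow it.
    clusters: list[str] = []
    for c in s:
        if clusters and unicodedata.combining(c) != 0:
            clusters[-1] += c  # attach combining mark to the previous base
        else:
            clusters.append(c)
    return clusters[n]

s = "e\u0301x"  # "éx" with a combining accent
print(repr(nth_code_point(s, 0)))  # 'e' alone, accent stripped off
print(repr(nth_grapheme(s, 0)))    # 'e\u0301', the "é" the user perceives
```

The two calls give different answers for the same index, which is the decision the programmer should be making consciously rather than getting one of them by default.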
It wouldn't solve all problems, of course. But I bet it would reduce them significantly, because wrong assumptions about what a string's "length" or "nth character" means are the most common source of those problems.