I did not know that strings in Ruby have encodings. Is there a reason for that? I personally don't like mixing characters and opaque byte sequences as they are very different.
The representation of a Rust String in memory is guaranteed valid UTF-8. To me, a "sequence of Unicode scalar values" is an abstract description, because it could be implemented via UTF-8, UTF-16 or UTF-32.
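To make the distinction concrete, here is a minimal sketch using only the standard library. The helper names `measure` and `try_string` are made up for illustration; the calls inside them (`len`, `chars`, `encode_utf16`, `String::from_utf8`) are plain `std`.

```rust
/// Returns (UTF-8 byte length, number of Unicode scalar values,
/// number of UTF-16 code units) for the same string.
/// `measure` is a hypothetical helper, not a std function.
fn measure(s: &str) -> (usize, usize, usize) {
    (
        s.len(),                  // bytes of the actual in-memory UTF-8
        s.chars().count(),        // the abstract "sequence of scalar values" view
        s.encode_utf16().count(), // what a UTF-16 implementation would store
    )
}

/// Building a String from raw bytes checks UTF-8 validity up front.
/// `try_string` is a hypothetical helper, not a std function.
fn try_string(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```

For `"h\u{e9}llo"` the three counts differ between bytes and scalar values, and arbitrary bytes such as `[0xff, 0xfe]` are rejected rather than stored as an invalid `String`.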
> I personally find it unfortunate that they dictate the storage of it at the API level
It is extraordinarily convenient and provides a very transparent way to analyze the performance of string operations.
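A rough sketch of what I mean by transparent, assuming nothing beyond `std` (the function names `prefix` and `nth_char` are mine, purely illustrative): because the storage is known to be UTF-8, you can tell from the signature which operations are cheap byte-offset work and which have to decode.

```rust
/// Slicing by byte offset is O(1). It returns None if the offset is out of
/// bounds or falls in the middle of a multi-byte character.
/// `prefix` is a hypothetical helper, not a std function.
fn prefix(s: &str, byte_len: usize) -> Option<&str> {
    s.get(..byte_len)
}

/// Finding the n-th scalar value is O(n): UTF-8 is variable width, so the
/// iterator has to decode from the start of the string.
/// `nth_char` is a hypothetical helper, not a std function.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}
```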
For transcoding, there is the in-progress `encoding` crate: https://github.com/lifthrasiir/rust-encoding
I note that Go does things very similarly (`string` is conventionally UTF-8) and it works famously for them. They have a much more mature set of encoding libraries, but they work the same as the equivalent libraries would work in Rust: transcode to and from UTF-8 at the boundaries. See: https://godoc.org/golang.org/x/text
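As a toy illustration of "transcode at the boundaries", here is a hand-rolled ISO-8859-1 (Latin-1) converter. To be clear, this is not the API of the `encoding` crate or of Go's `x/text`; `decode_latin1` and `encode_latin1` are hypothetical names, and I picked Latin-1 only because its bytes map one-to-one onto U+0000..=U+00FF, so it fits in a few lines of `std`.

```rust
/// Inbound boundary: Latin-1 bytes become a UTF-8 String.
/// Every Latin-1 byte is the code point of the same value, so `b as char` is exact.
fn decode_latin1(input: &[u8]) -> String {
    input.iter().map(|&b| b as char).collect()
}

/// Outbound boundary: UTF-8 String back to Latin-1 bytes.
/// Returns None if any character is outside the Latin-1 range.
fn encode_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF { Some(cp as u8) } else { None }
        })
        .collect()
}
```

Everything between the two calls only ever sees a normal `String`; the legacy encoding exists only at the edges.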
[edit] I remember a talk where Matz was asked this specific question. He tried to explain it clearly, but seemed confused as to how the questioner could have such a poor grasp of Unicode (the difference between monolingual Americans and Japanese, I guess).