Apparently Python uses a variety of internal representations depending on the string itself. I looked it up because I saw UTF-32 and thought there's no way that's what they do -- it's pretty much always the wrong answer.
It uses Latin-1 for ASCII strings, UCS-2 for strings that contain code points in the BMP and UCS-4 only for strings that contain code points outside the BMP.
It would be pretty silly for them to explode all strings to 4-byte characters.