If I create a new Unicode object u'\x82\xb1\x82\xea\x82\xcd' how does
this creation process interpret the bytes in the byte string?
It doesn't, because there is no byte-string. You have created a Unicode
object from a literal string of unicode characters, not bytes. Those
characters are:
Dec Hex Char
130 0x82 ‚
177 0xb1 ±
130 0x82 ‚
234 0xea ê
130 0x82 ‚
205 0xcd Í
Don't be fooled by the fact that all of the characters happen to be in
the range 0-255; that is irrelevant.
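A minimal sketch of the point above: each \xNN escape inside a unicode literal names one character (code point), not one byte. The variable names are just for illustration.

```python
# Each \xNN escape in a unicode literal is one code point, not a byte.
s = u'\x82\xb1\x82\xea\x82\xcd'

# Six characters, whose ordinals match the table above.
code_points = [ord(c) for c in s]
print(len(s))
print(code_points)

# The same string spelled with \uNNNN escapes: identical characters.
same = (s == u'\u0082\u00b1\u0082\u00ea\u0082\u00cd')
print(same)
```

No encoding is consulted at any point here; the literal is already a sequence of characters.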
Does it
assume the string represents a utf-16 encoding, a utf-8 encoding,
etc...?
None of the above. It assumes nothing. It takes a string of characters,
end of story.
For reference the string is これは in the 'shift-jis' encoding.
No it is not. The way to get a unicode literal with those characters is
to use a unicode-aware editor or terminal:
>>> for c in u'これは':
...     print ord(c), hex(ord(c)), c
...
12371 0x3053 こ
12428 0x308c れ
12399 0x306f は
You are confusing characters with bytes. I believe that what you are
thinking of is the following: you start with a byte string, and then
decode it into unicode:
これは
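The decode step described here can be sketched roughly as follows. It is written with an explicit b'' prefix so it should behave the same on Python 2.6+ and Python 3; the variable names are illustrative, and the latin-1 round trip at the end is one common way (not the only one) to rescue a string that was mistakenly typed as a unicode literal.

```python
# Start from a byte string: these are Shift-JIS bytes, not characters.
raw = b'\x82\xb1\x82\xea\x82\xcd'

# Decoding with the right codec yields three Unicode characters.
text = raw.decode('shift_jis')
print(len(text))
print([hex(ord(c)) for c in text])

# If the bytes were mistakenly written as a unicode literal, latin-1 maps
# code points 0-255 back to the same byte values, so the original bytes
# can be recovered and then decoded properly.
mistaken = u'\x82\xb1\x82\xea\x82\xcd'
recovered = mistaken.encode('latin-1').decode('shift_jis')
print(recovered == text)
```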
If you get the encoding wrong, you will get the wrong characters:
놂춂
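A small sketch of what "getting the encoding wrong" looks like in practice. Which wrong codec produced the output above is not stated; UTF-16 little-endian is an assumption here, chosen because it yields Hangul-range characters consistent with that output. Other wrong codecs either give different garbage or fail outright.

```python
raw = b'\x82\xb1\x82\xea\x82\xcd'

# Correct codec: three hiragana characters.
right = raw.decode('shift_jis')
print([hex(ord(c)) for c in right])

# Wrong codec (assumed: UTF-16 little-endian): each byte pair becomes
# one unrelated character.
wrong_utf16 = raw.decode('utf-16-le')
print([hex(ord(c)) for c in wrong_utf16])

# Wrong codec (latin-1): each byte becomes one character in 0-255, which
# is exactly the string from the original question.
wrong_latin1 = raw.decode('latin-1')
print(wrong_latin1 == u'\x82\xb1\x82\xea\x82\xcd')

# Some wrong codecs do not produce garbage at all; they raise instead.
try:
    raw.decode('utf-8')
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)
```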
If you start with the Unicode characters, you can encode it into various
byte strings:
'\xe3\x81\x93\xe3\x82\x8c\xe3\x81\xaf'
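As a rough illustration of the encode direction, the same three characters can be turned into different byte strings depending on the codec. The choice of codecs below is just an example set; the UTF-8 result matches the byte string shown above.

```python
# Three Unicode characters, written with escapes so the source file's own
# encoding does not matter.
text = u'\u3053\u308c\u306f'

utf8_bytes = text.encode('utf-8')
sjis_bytes = text.encode('shift_jis')
utf16_bytes = text.encode('utf-16-le')

# Same characters, three different byte sequences of different lengths.
print(len(utf8_bytes), len(sjis_bytes), len(utf16_bytes))

# Decoding each with its own codec gets the original characters back.
round_trip_ok = (
    utf8_bytes.decode('utf-8') == text
    and sjis_bytes.decode('shift_jis') == text
    and utf16_bytes.decode('utf-16-le') == text
)
print(round_trip_ok)
```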