N
NevilleDNZ
Hi,
Apologies first as I am not a unicode expert.... indeed I the details
probably totally elude me. Not withstanding: how can I convert a
binary string containing UTF-8 binary into a python unicode string?
cutdown example:
$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222
\317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS
" # xterm
encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"
+"\N{runic cross punctuation}","AOK
"
print "German/ALCOR quoting:",u"\N{runic cross
punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹÄ" SUCCEEDS
German/ALCOR quoting: á›testá› AOK
German/ALCOR quoting:
Traceback (most recent call last):
File "./uc.py", line 5, in <module>
print "German/ALCOR quoting:",u"\N{runic cross
punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0:
ordinal not in range(128)
The last print statement fails because the ascii "imported" characters
are 8 bit encoded UTF-8 and dont know it! How do I tell "imported" that
it is actually already UTF-8 unicode?
Cheers
NevilleDNZ
Apologies first as I am not a unicode expert.... indeed I the details
probably totally elude me. Not withstanding: how can I convert a
binary string containing UTF-8 binary into a python unicode string?
cutdown example:
$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222
\317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS
encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"
+"\N{runic cross punctuation}","AOK
print "German/ALCOR quoting:",u"\N{runic cross
punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹÄ" SUCCEEDS
German/ALCOR quoting: á›testá› AOK
German/ALCOR quoting:
Traceback (most recent call last):
File "./uc.py", line 5, in <module>
print "German/ALCOR quoting:",u"\N{runic cross
punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0:
ordinal not in range(128)
The last print statement fails because the ascii "imported" characters
are 8 bit encoded UTF-8 and dont know it! How do I tell "imported" that
it is actually already UTF-8 unicode?
Cheers
NevilleDNZ