Q: a simple(?) raw-utf-8 conversion to internal type unicode "\304\246\311\231\316\257\316\271\303\2

NevilleDNZ

Hi,

Apologies first, as I am not a unicode expert... indeed the details
probably totally elude me. Notwithstanding: how can I convert a
binary string containing UTF-8 bytes into a python unicode string?

cutdown example:
$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :)" # xterm encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+u"\N{runic cross punctuation}","AOK :)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :)
German/ALCOR quoting: ᛭test᛭ AOK :)
German/ALCOR quoting:
Traceback (most recent call last):
  File "./uc.py", line 5, in <module>
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

The last print statement fails because the "imported" characters are
treated as ASCII when they are really 8-bit encoded UTF-8 and don't
know it! How do I tell "imported" that it is actually already UTF-8?
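
To make the failure concrete, here is a minimal sketch of what the concatenation appears to do behind the scenes: the byte string gets decoded with the 'ascii' codec, and the very first byte (0xc4) is outside the 7-bit range. The bytes literal form (b"...") is an assumption so the snippet runs on Python 2.6+ as well as Python 3, and try_ascii is a hypothetical helper name, not something from uc.py.

```python
# Same bytes as "imported" in uc.py, written as a bytes literal.
imported = (b"\304\246\311\231\316\257\316\271\303\222 "
            b"\317\216\317\203\305\224\304\271\304\220")

def try_ascii(raw):
    """Hypothetical helper: decode raw as ASCII, returning the error instead of raising."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc

err = try_ascii(imported)
print(type(err).__name__)            # UnicodeDecodeError
print(err.start)                     # 0, the position of the offending byte
print(hex(bytearray(imported)[0]))   # 0xc4, the byte named in the traceback
```

Pure ASCII input such as b"test" decodes without complaint, which would explain why the "test" line is AOK.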

Cheers
NevilleDNZ
 
NevilleDNZ

It was just TOO easy... After posting my message to Google Groups, I
re-read the posting there and found that Google had pointed me
to a python-unicode tutorial...
www.reportlab.com/i18n/python_unicode_tutorial.html - exercise one :)

Gosh, sometimes a google is worth so much more than ₁₀¹⁰⁰!

Happy New Year
NevilleD

It works now:
$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :)
German/ALCOR quoting: ᛭test᛭ AOK :)
German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ FAILS :-(
nevilled@alfa:/root0/home/nevilled/Project/20 $ vi ./uc.py
nevilled@alfa:/root0/home/nevilled/Project/20 $ cat ./uc.py
#!/usr/bin/env python
imported=unicode("\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220","utf-8")
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :)" # xterm encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}test\N{runic cross punctuation}","AOK :)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","Just TOO easy :)"

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :)
German/ALCOR quoting: ᛭test᛭ AOK :)
German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ Just TOO easy :)
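
For anyone finding this later, the same fix can also be spelled with the decode method instead of the unicode() constructor. This is only a sketch: it assumes a bytes literal (b"...") so that it runs on Python 2.6+ and Python 3.3+ alike, and the variable names imported_bytes, cross and quoted are made up for illustration.

```python
# The raw UTF-8 bytes from uc.py, as a bytes literal.
imported_bytes = (b"\304\246\311\231\316\257\316\271\303\222 "
                  b"\317\216\317\203\305\224\304\271\304\220")

# Tell Python the bytes are UTF-8; the result is a unicode (text) string.
imported = imported_bytes.decode("utf-8")

# Text + text concatenates without any implicit ASCII decoding.
cross = u"\N{RUNIC CROSS PUNCTUATION}"
quoted = cross + imported + cross

print(len(imported_bytes))                           # 21 bytes
print(len(imported))                                 # 11 code points
print(len(quoted))                                   # 13 code points
print(imported.encode("utf-8") == imported_bytes)    # True, the round trip is lossless
```

The lengths show the difference between the two types: ten two-byte characters plus one space come to 21 bytes, but only 11 characters once decoded.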
 
