Q: a simple(?) raw-utf-8 conversion to internal type unicode "\304\246\311\231\316\257\316\271\303\2

Discussion in 'Python' started by NevilleDNZ, Jan 1, 2007.

  1. NevilleDNZ

    Hi,

    Apologies first, as I am not a Unicode expert... indeed, the details
    probably elude me totally. Notwithstanding: how can I convert a
    binary string containing UTF-8 data into a Python unicode string?

    Cut-down example:
    $ cat ./uc.py
    #!/usr/bin/env python
    imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
    print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :)" # displays correctly if the xterm encoding is UTF-8
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+u"\N{runic cross punctuation}","AOK :)"
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("

    $ ./uc.py
    English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :)
    German/ALCOR quoting: ᛭test᛭ AOK :)
    German/ALCOR quoting:
    Traceback (most recent call last):
      File "./uc.py", line 5, in <module>
        print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

    The last print statement fails because the "imported" characters are
    8-bit UTF-8-encoded bytes and Python doesn't know it, so it falls back
    to the ascii codec! How do I tell Python that "imported" is actually
    already UTF-8?
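For readers on Python 3, where bytes and text are separate types, the implicit step that fails in the traceback can be reproduced explicitly. This is a sketch, not code from the thread: it assumes the same bytes as `imported` (including the space visible in the script's output), and `ascii_decode_error` is a hypothetical helper. Python 2 decoded the byte string with the default ascii codec when concatenating it with a unicode literal; calling `bytes.decode("ascii")` on the same data raises the same error, while Python 3 refuses the mixed concatenation outright.

```python
# Python 3 sketch: the same bytes as the "imported" string in uc.py
imported = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"


def ascii_decode_error(data):
    """Hypothetical helper: return the error raised when decoding `data`
    with the ascii codec, which Python 2 did implicitly when mixing
    str and unicode. Returns None if the data is pure ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = ascii_decode_error(imported)
print(err)  # 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

# Python 3 does not guess a codec at all: str + bytes is a TypeError
try:
    "\N{RUNIC CROSS PUNCTUATION}" + imported
except TypeError as exc:
    print("TypeError:", exc)
```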

    Cheers
    NevilleDNZ
    NevilleDNZ, Jan 1, 2007
    #1

  2. NevilleDNZ

    Just TOO easy.... Re: Q: a simple(?) raw-utf-8 conversion to internal type unicode "\304\246\311\231\316\257\316\271\303\222"

    It was just TOO easy... after posting my message to Google Groups, I
    re-read the posting there and found that Google had pointed me to a
    Python Unicode tutorial:
    www.reportlab.com/i18n/python_unicode_tutorial.html - exercise one :)

    Gosh, sometimes a google is worth so much more than ₁₀¹⁰⁰!

    Happy New Year
    NevilleD

    It works now:
    $ ./uc.py
    English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :)
    German/ALCOR quoting: ᛭test᛭ AOK :)
    German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ FAILS :-(
    nevilled@alfa:/root0/home/nevilled/Project/20 $ vi ./uc.py
    nevilled@alfa:/root0/home/nevilled/Project/20 $ cat ./uc.py
    #!/usr/bin/env python
    imported=unicode("\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220","utf-8") # decode the UTF-8 bytes into a unicode object
    print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :)" # displays correctly if the xterm encoding is UTF-8
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}test\N{runic cross punctuation}","AOK :)"
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","Just TOO easy :)"

    $ ./uc.py
    English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :)
    German/ALCOR quoting: ᛭test᛭ AOK :)
    German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ Just TOO easy :)
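The working script relies on Python 2's `unicode(bytes, "utf-8")`. Python 3 has no `unicode` builtin; a minimal sketch of the equivalent conversion, assuming the same byte values as the script, is `bytes.decode("utf-8")`, with `str.encode("utf-8")` going back the other way (which is what a UTF-8 xterm ultimately receives). The variable names here are illustrative, not from the thread.

```python
# Python 3 sketch of the fix in uc.py: decode the UTF-8 bytes once, then use str
imported_bytes = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"

# Equivalent of Python 2's unicode(imported_bytes, "utf-8")
imported = imported_bytes.decode("utf-8")

cross = "\N{RUNIC CROSS PUNCTUATION}"  # U+16ED, the character used in the script
quoted = cross + imported + cross  # str + str: no implicit codec involved

# 21 bytes (ten 2-byte characters plus one space) become 11 characters
print(len(imported_bytes), len(imported))  # 21 11

# Encoding back to UTF-8 round-trips to the original bytes
assert imported.encode("utf-8") == imported_bytes
```

Decoding once at the boundary where the bytes enter the program, and encoding only when writing them out, keeps every concatenation in between a plain text operation.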

    NevilleDNZ, Jan 1, 2007
    #2