Unicode problem.... as always

Discussion in 'Python' started by Todd Jenista, Jul 1, 2003.

  1. Todd Jenista

    Todd Jenista Guest

    I am building a parser in Python and, unfortunately, people have
    decided to put Unicode characters in the files I am parsing.
    The parser has a fit when I search for one \uXXXX symbol and the
    file also contains another Unicode symbol. In this case, a search
    and replace of © with µ causes the infamous ordinal error.
    My quick fix, because both symbols have good context, is to change
    them both to the placeholder "UTF8" and then replace that
    placeholder at the end with the original µ. The problem is that I
    am getting a µ when I try to re-insert using \u00b5 which is the
    UTF8 code.
    Words of wisdom would be greatly appreciated.
    Todd Jenista, Jul 1, 2003
    #1

  2. Todd Jenista wrote:

    > I have a parser I am building with python and, unfortunately, people
    > have decided to put unicode characters in the files I am parsing.


    Maybe this helps. It decodes a Latin-1 byte string to Unicode
    and then encodes the result as UTF-8.
    >>> s="ä"
    >>> s_u=unicode(s, "latin1")
    >>> s_utf8=s_u.encode("utf8")


    You need to know the encoding of the input (e.g. UTF-8 or UTF-16).
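
    Applied to the search and replace from the question, the same idea
    might look like the sketch below: decode once, do the replacement on
    Unicode text, and encode once at the end. The helper name
    replace_symbol and the sample bytes are made up for illustration, and
    Latin-1 is only an assumed input encoding; substitute whatever the
    files really use.

```python
# Sketch only: replace_symbol is a hypothetical helper, and Latin-1 is an
# assumed input encoding. Use the real encoding of the files being parsed.

def replace_symbol(raw, encoding="latin-1"):
    """Swap the copyright sign for the micro sign and return UTF-8 bytes."""
    text = raw.decode(encoding)                 # bytes -> Unicode text
    text = text.replace(u"\u00a9", u"\u00b5")   # replace on text, not on bytes
    return text.encode("utf-8")                 # Unicode text -> UTF-8 bytes

# Made-up sample input in Latin-1: one copyright sign, one micro sign.
raw = b"10 \xa9m and 5 \xb5m"
result = replace_symbol(raw)
```

    Because the replacement happens on decoded text, the other non-ASCII
    character in the input no longer gets in the way.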

    thomas
    Thomas Güttler, Jul 1, 2003
    #2
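
    A note on the \u00b5 from the first post: U+00B5 is the Unicode code
    point of the micro sign, not its UTF-8 form. Encoded as UTF-8 it
    becomes two bytes. If the unwanted output was an extra character in
    front of the µ (an assumption; the post does not show it), a likely
    cause is UTF-8 bytes being displayed as Latin-1. A small sketch of
    that effect:

```python
micro = u"\u00b5"                        # code point U+00B5, the micro sign

utf8_bytes = micro.encode("utf-8")       # two bytes: 0xC2 0xB5
latin1_bytes = micro.encode("latin-1")   # one byte: 0xB5

# Reading the UTF-8 bytes back as Latin-1 yields two characters,
# U+00C2 followed by U+00B5, instead of the single micro sign.
misread = utf8_bytes.decode("latin-1")

# Decoding with the encoding that was actually used restores the original.
roundtrip = utf8_bytes.decode("utf-8")
```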