Convert unicode escape sequences to unicode in a file

Discussion in 'Python' started by Jeremy, Jan 11, 2011.

  1. Jeremy

    Jeremy Guest

    I have a file that has unicode escape sequences, i.e.,

    J\u00e9r\u00f4me

    and I want to replace all of them in a file and write the results to a new file. The simple script I've created is copied below. However, I am getting the following error:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 947: ordinal not in range(128)

    It appears that the data isn't being converted when writing to the file. Can someone please help?

    Thanks,
    Jeremy


    import codecs
    import re

    if __name__ == "__main__":
        f = codecs.open(filename, 'r', 'unicode-escape')
        lines = f.readlines()
        line = ''.join(lines)
        f.close()

        utFound = re.sub(r'STRINGDECODE\((.+?)\)', r'\1', line)
        print(utFound[:1000])

        o = open('newDice.sql', 'w')
        o.write(utFound.decode('utf-8'))
        o.close()
    Jeremy, Jan 11, 2011

  2. Alex Willmer

    Alex Willmer Guest

    On Jan 11, 8:53 pm, Jeremy wrote:
    > I have a file that has unicode escape sequences, i.e.,
    >
    > J\u00e9r\u00f4me
    >
    > and I want to replace all of them in a file and write the results to a new file.  The simple script I've created is copied below.  However, I am getting the following error:
    >
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 947: ordinal not in range(128)
    >
    > It appears that the data isn't being converted when writing to the file.  Can someone please help?


    Are you _sure_ that your file contains the characters '\', 'u', '0',
    '0', 'e' and '9'? I expect that actually your file contains a byte
    with value 0xe9 and you have inspected the file using Python, which
    has printed the byte using a Unicode escape sequence. Open the file
    using a text editor or hex editor and look at the value at offset 947
    to be sure.

    If so, you need to replace 'unicode-escape' with the actual encoding
    of the file.
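
To make the two possibilities concrete, here is a minimal sketch of how the raw bytes differ and which decoding suits each case. The helper names `classify_bytes` and `read_window` are hypothetical, and ISO-8859-1 (`latin-1`) is only an assumed stand-in for "the actual encoding"; the real file may use something else.

```python
def classify_bytes(chunk):
    """Say which of the two cases a raw byte string looks like."""
    if b'\\u' in chunk:
        return 'literal escape sequences'
    if any(byte > 0x7F for byte in bytearray(chunk)):
        return 'raw non-ASCII bytes'
    return 'plain ASCII'


def read_window(path, offset, width=16):
    """Read the raw bytes surrounding `offset` without decoding them."""
    with open(path, 'rb') as f:
        f.seek(max(0, offset - width))
        return f.read(2 * width)


# The same name stored two different ways.
escaped = b'J\\u00e9r\\u00f4me'   # 16 ASCII bytes, backslashes included
latin1 = b'J\xe9r\xf4me'          # 6 bytes, ASSUMING ISO-8859-1

# Each case needs its own codec to arrive at the same text.
text_from_escapes = escaped.decode('unicode_escape')
text_from_latin1 = latin1.decode('latin-1')
```

Something like `classify_bytes(read_window(filename, 947))` would then give a rough hint about which case applies. Bear in mind that the position in the error message counts characters in the decoded string, so it need not match the byte offset in the file.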

    > if __name__ == "__main__":
    >     f = codecs.open(filename, 'r', 'unicode-escape')
    >     lines = f.readlines()
    >     line = ''.join(lines)
    >     f.close()
    >
    >     utFound = re.sub('STRINGDECODE\((.+?)\)', r'\1', line)
    >     print(utFound[:1000])
    >
    >     o = open('newDice.sql', 'w')
    >     o.write(utFound.decode('utf-8'))
    >     o.close()
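
Separately from the question of what the input contains, the quoted script can raise this exact error on the output side under Python 2. `print` fails if stdout uses ASCII, a file from plain `open()` encodes unicode text with ASCII on `write`, and calling `.decode('utf-8')` on text that is already unicode makes Python 2 encode it with ASCII first. A minimal sketch of one way around that, assuming the file really does hold literal `\uXXXX` sequences and that UTF-8 output is wanted; `unescape_sql` and `convert_file` are hypothetical names:

```python
import io
import re

# Matches STRINGDECODE(...) wrappers, as in the regex from the original script.
STRINGDECODE_CALL = re.compile(r'STRINGDECODE\((.+?)\)')


def unescape_sql(raw_bytes):
    """Turn literal \\uXXXX sequences into characters, then drop the wrappers."""
    # unicode_escape also interprets other backslash escapes (\n, \t, ...)
    # and treats any raw non-ASCII bytes as Latin-1.
    text = raw_bytes.decode('unicode_escape')
    return STRINGDECODE_CALL.sub(r'\1', text)


def convert_file(src_path, dst_path):
    """Read escaped bytes from src_path and write UTF-8 text to dst_path."""
    with io.open(src_path, 'rb') as src:
        text = unescape_sql(src.read())
    # Naming the encoding explicitly avoids the implicit ASCII encode.
    with io.open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(text)
    return text
```

The text is decoded once on the way in and encoded once on the way out, so no `.decode()` call is needed before writing.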
    Alex Willmer, Jan 11, 2011
