converting octal strings to unicode

Discussion in 'Python' started by flamingivanova@gmail.com, Dec 24, 2004.

  1. Guest

    I have several ASCII files that contain '\ooo' strings which represent
    the octal value for a character. I want to convert these files to
    Unicode, and I came up with the following script. But it seems to me
    that there must be a much simpler way to do it. Could someone more
    experienced suggest some improvements?

    I want to convert a file containing, e.g.:

    hello \326du

    into a Unicode file containing:

    hello Ödu


    ----------8<---------------------------------------
    #!/usr/bin/python

    import re, string, sys

    if len(sys.argv) > 1:
        file = open(sys.argv[1],'r')
        lines = file.readlines()
        file.close()
    else:
        print "give a filename"
        sys.exit()

    def to_unichr(str):
        oct = string.atoi(str.group(1),8)
        return unichr(oct)

    for line in lines:
        line = string.rstrip(unicode(line,'Latin-1'))
        if re.compile(r'\\\d\d\d').search(line):
            line = re.sub(r'\\(\d\d\d)', to_unichr, line)
        line = line.encode('utf-8')
        print line

    ----------8<---------------------------------------
    , Dec 24, 2004
    #1
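
For readers on Python 3, where `unichr`, `unicode()` and `string.atoi` no longer exist, the regex approach in the script above might be sketched roughly as follows. The names `OCTAL_ESCAPE` and `decode_octal_escapes` are made up for illustration, and the pattern is narrowed from `\d\d\d` to octal digits only (an assumption about the input) so that something like `\999` is left alone instead of raising an error.

```python
import re

# Matches a backslash followed by exactly three octal digits, e.g. \326.
# (Hypothetical name; the original script used r'\\(\d\d\d)'.)
OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def decode_octal_escapes(text):
    """Replace each backslash-ooo escape in text with the character it names."""
    # int(..., 8) parses the three digits as octal; chr() gives the character.
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


sample = "hello \\326du"            # the eleven characters: hello \326du
converted = decode_octal_escapes(sample)
# converted is now "hello \u00d6du", i.e. "hello Ödu"
```

Reading the input with `open(path, encoding="latin-1")` and writing the result with `encoding="utf-8"` would cover the file handling that the original script does by hand.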

  2. On 23 Dec 2004 18:41:57 -0800, rumours say that
    might have written:

    >I have several ASCII files that contain '\ooo' strings which represent
    >the octal value for a character. I want to convert these files to
    >Unicode, and I came up with the following script. But it seems to me
    >that there must be a much simpler way to do it. Could someone more
    >experienced suggest some improvements?


    decoded_string = "\326du".decode("string_escape")
    unicode_text = unicode(decoded_string, "latin-1")
    --
    TZOTZIOY, I speak England very best.
    "Be strict when sending and tolerant when receiving." (from RFC1958)
    I really should keep that in mind when talking with people, actually...
    Christos TZOTZIOY Georgiou, Dec 24, 2004
    #2

  3. On 23 Dec 2004 18:41:57 -0800, rumours say that
    might have written:

    >I have several ASCII files that contain '\ooo' strings which represent
    >the octal value for a character. I want to convert these files to
    >Unicode, and I came up with the following script. But it seems to me
    >that there must be a much simpler way to do it. Could someone more
    >experienced suggest some improvements?


    (hope I cancelled the previous off-by-one-backslash post...)

    your_string = "\\326du"
    decoded_string = your_string.decode("string_escape")
    unicode_text = unicode(decoded_string, "latin-1")
    --
    TZOTZIOY, I speak England very best.
    "Be strict when sending and tolerant when receiving." (from RFC1958)
    I really should keep that in mind when talking with people, actually...
    Christos TZOTZIOY Georgiou, Dec 24, 2004
    #3
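
The `string_escape` codec and the `unicode()` built-in used in the reply above are Python 2 only. A rough Python 3 counterpart, assuming the text contains nothing beyond Latin-1, is to round-trip through bytes and decode with the `unicode_escape` codec. The helper name `decode_escapes` is hypothetical.

```python
def decode_escapes(raw):
    """Interpret backslash escapes in raw the way a Python string literal would."""
    # Latin-1 maps code points 0-255 to single bytes, so this step is lossless
    # for ASCII or Latin-1 input (it raises UnicodeEncodeError otherwise).
    as_bytes = raw.encode("latin-1")
    # unicode_escape turns \326 into U+00D6 and reads all other bytes as Latin-1.
    return as_bytes.decode("unicode_escape")


your_string = "\\326du"             # backslash, 3, 2, 6, d, u
unicode_text = decode_escapes(your_string)
# unicode_text is now "\u00d6du", i.e. "Ödu"
```

Like `string_escape`, this interprets every escape it recognises (`\n`, `\t`, `\x41` and so on), not only three-digit octal ones, so it suits files where any backslash sequence is meant as an escape. When only `\ooo` should be touched, a regex substitution is the narrower tool.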