encoding hell - any chance of salvation ?

Discussion in 'Python' started by southof40, Mar 7, 2011.

  1. southof40

    southof40 Guest

    Hi - I've got some code which uses array
    (http://docs.python.org/library/array.html) to store characters read
    from a file (it's not my code; it comes from
    http://sourceforge.net/projects/pygold/).

    The read is done, in GrammarReader.py, like this ...

    def readString(self, maxsize = -1):
        result = array('u')
        char = None
        while True:
            if (maxsize >= 0) and (len(result) >= maxsize):
                break
            char = self.reader.read(2)
            if (char == '') or (char == '\x00\x00'):
                break
            result.append(char)
        return result.tounicode()

    ... and it raises the error "TypeError: array item must be unicode
    character" (full stack trace at bottom).

    The whole unicode thing is a bit strange because the input file is a
    compiled grammar and so not a text file at all (the file can be
    downloaded from here http:///kubadev.com/share/VBScript.cgt)

    Can anyone make a suggestion as to the best way to allow the array
    object to accept what is in essence a binary file ?

    Here's the full stack trace ...

    >>> p=pygold.Parser('C:/data/Gold-Parser-VBScript-Grammar/VBScript-Test0-UTF8.cgt','utf-8')

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pygold\Parser.py", line 100, in __init__
        self.loadTables(filename)
      File "pygold\Parser.py", line 365, in loadTables
        reader = GrammarReader(filename, self.encoding)
      File "pygold\GrammarReader.py", line 14, in __init__
        if not self.hasValidHeader():
      File "pygold\GrammarReader.py", line 43, in hasValidHeader
        header = self.readString(64) ## read max 64 chars
      File "pygold\GrammarReader.py", line 68, in readString
        result.append(char)
    TypeError: array item must be unicode character
    southof40, Mar 7, 2011
    #1

  2. Tom Zych

    Tom Zych Guest

    southof40 wrote:
    > ...
    > result = array('u')
    > ...
    > ... and results in the error"TypeError: array item must be unicode
    > character" is raised (full stack trace at bottom) .
    > ...
    > Can anyone make a suggestion as to the best way to allow the array
    > object to accept what is in essence a binary file ?


    Glancing at the docs, it appears you want to use 'c', 'b', or 'B'
    instead of 'u' when creating the array.
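
A minimal sketch of that suggestion, not taken from pygold itself. It assumes Python 3 (where the 'c' type code no longer exists, so 'B' is used) and a reader opened in binary mode. The name read_raw_string is hypothetical. It collects the raw 2-byte units and leaves any decoding to the caller:

```python
import io
from array import array

def read_raw_string(reader, maxsize=-1):
    """Hypothetical helper: gather 2-byte units until a double NUL or EOF."""
    result = array('B')  # unsigned bytes, so binary data is accepted
    # maxsize counts 2-byte units, mirroring the original character limit
    while maxsize < 0 or len(result) // 2 < maxsize:
        chunk = reader.read(2)
        if chunk in (b'', b'\x00\x00'):
            break
        result.frombytes(chunk)  # append the raw bytes, no text conversion
    return result.tobytes()

# In-memory stand-in for the grammar file (sample data, not real .cgt content)
data = "GOLD".encode("utf-16-le") + b"\x00\x00" + b"rest"
raw = read_raw_string(io.BytesIO(data))
```

The result is a bytes object, so the caller still has to decide how to interpret it.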

    --
    Tom Zych /
    "Would you like a lovely fluffy little white rabbit, little girl,
    or a cutesy wootesly little brown rabbit?"
    "Actually, I don't think my python would notice."
    Tom Zych, Mar 7, 2011
    #2

  3. Terry Reedy

    Terry Reedy Guest

    On 3/7/2011 6:24 AM, southof40 wrote:
    > Hi - I've got some code which uses array (http://docs.python.org/
    > library/array.html) to store charcters read from a file (it's not my
    > code it comes from here http://sourceforge.net/projects/pygold/)
    >
    > The read is done, in GrammarReader.py, like this ...
    >
    > def readString(self, maxsize = -1):
    >     result = array('u')
    >     char = None
    >     while True:
    >         if (maxsize >= 0) and (len(result) >= maxsize):
    >             break
    >         char = self.reader.read(2)
    >         if (char == '') or (char == '\x00\x00'):
    >             break

              print(type(char), char) # to see what is going on

    >         result.append(char)
    >     return result.tounicode()
    >
    > ... and results in the error"TypeError: array item must be unicode
    > character" is raised (full stack trace at bottom) .
    >
    > The whole unicode thing is a bit strange because the input file is a
    > compiled grammar and so not a text file at all (the file able to be
    > downloaded from here http:///kubadev.com/share/VBScript.cgt)
    >
    > Can anyone make a suggestion as to the best way to allow the array
    > object to accept what is in essence a binary file ?
    >
    > Here's the full stack trace ...
    >
    > >>> p=pygold.Parser('C:/data/Gold-Parser-VBScript-Grammar/VBScript-Test0-UTF8.cgt','utf-8')
    >
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    >   File "pygold\Parser.py", line 100, in __init__
    >     self.loadTables(filename)
    >   File "pygold\Parser.py", line 365, in loadTables
    >     reader = GrammarReader(filename, self.encoding)
    >   File "pygold\GrammarReader.py", line 14, in __init__
    >     if not self.hasValidHeader():
    >   File "pygold\GrammarReader.py", line 43, in hasValidHeader
    >     header = self.readString(64) ## read max 64 chars
    >   File "pygold\GrammarReader.py", line 68, in readString
    >     result.append(char)
    > TypeError: array item must be unicode character
    >
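
If the print call added above shows a two-element string (two bytes or two characters) rather than a single unicode character, that would explain the error, since array('u').append() accepts exactly one unicode character. One possible way around it is sketched below. It assumes Python 3, a reader opened in binary mode, and that the file stores strings as NUL-terminated UTF-16 little-endian text; that last point is an assumption about the .cgt format, not something confirmed in this thread. The name read_utf16_string is hypothetical:

```python
import io

def read_utf16_string(reader, maxsize=-1):
    """Hypothetical helper: read 2-byte units up to a double NUL, then decode."""
    raw = bytearray()
    # maxsize counts 2-byte code units, like the original loop
    while maxsize < 0 or len(raw) // 2 < maxsize:
        pair = reader.read(2)
        if len(pair) < 2 or pair == b'\x00\x00':
            break  # EOF, truncated unit, or string terminator
        raw += pair
    # Decode once at the end so surrogate pairs are handled together
    # (assumes UTF-16 little-endian, see the note above)
    return raw.decode('utf-16-le')

# In-memory stand-in for the grammar file (sample data, not real .cgt content)
stream = io.BytesIO("Sample Header".encode("utf-16-le") + b"\x00\x00" + b"\x01")
header = read_utf16_string(stream, 64)
```

Decoding the collected bytes in one step avoids the array('u') type entirely.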



    --
    Terry Jan Reedy
    Terry Reedy, Mar 7, 2011
    #3
  4. southof40

    southof40 Guest

    Thanks for both the suggestions. I haven't yet had time to try them
    out but will do so and report back.
    southof40, Mar 8, 2011
    #4
