_csv.Error: string with NUL bytes

Discussion in 'Python' started by fscked, May 3, 2007.

  1. fscked

    fscked Guest

    Anyone have an idea of what I might do to fix this? I have googled and
    can only find some random conversations about it that don't make
    sense to me.

    I am basically reading in a csv file to create an xml and get this
    error.

    I don't see any empty values in any fields or anything...
     
    fscked, May 3, 2007
    #1

  2. Larry Bates

    Larry Bates Guest

    fscked wrote:
    > Anyone have an idea of what I might do to fix this? I have googled and
    > can only find some random conversations about it that don't make
    > sense to me.
    >
    > I am basically reading in a csv file to create an xml and get this
    > error.
    >
    > I don't see any empty values in any fields or anything...
    >


    You really should post some code and the actual traceback you get
    for us to help. I suspect that you have an ill-formed record in
    your CSV file. If you can't control that, you may have to write your
    own CSV dialect parser.

    -Larry
     
    Larry Bates, May 3, 2007
    #2

  3. fscked

    fscked Guest

    On May 3, 9:11 am, Larry Bates <> wrote:
    > fscked wrote:
    > > Anyone have an idea of what I might do to fix this? I have googled and
    > > can only find some random conversations about it that don't make
    > > sense to me.

    >
    > > I am basically reading in a csv file to create an xml and get this
    > > error.

    >
    > > I don't see any empty values in any fields or anything...

    >
    > You really should post some code and the actual traceback you get
    > for us to help. I suspect that you have an ill-formed record in
    > your CSV file. If you can't control that, you may have to write your
    > own CSV dialect parser.
    >
    > -Larry


    Certainly, here is the code:

    import os,sys
    import csv
    from elementtree.ElementTree import Element, SubElement, ElementTree

    def indent(elem, level=0):
        i = "\n" + level*" "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + " "
            for elem in elem:
                indent(elem, level+1)
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i

    root = Element("{Boxes}boxes")
    myfile = open('test.csv', 'rb')
    csvreader = csv.reader(myfile)

    for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
        mainbox = SubElement(root, "{Boxes}box")
        mainbox.attrib["city"] = city
        mainbox.attrib["country"] = country
        mainbox.attrib["phone"] = phone
        mainbox.attrib["address"] = address
        mainbox.attrib["name"] = name
        mainbox.attrib["pl_heartbeat"] = heartbeat
        mainbox.attrib["sw_ver"] = sw_ver
        mainbox.attrib["hw_ver"] = hw_ver
        mainbox.attrib["date_activated"] = activated
        mainbox.attrib["mac_address"] = mac
        mainbox.attrib["boxid"] = boxid

    indent(root)

    ElementTree(root).write('test.xml', encoding='UTF-8')

    The traceback is as follows:

    Traceback (most recent call last):
      File "createXMLPackage.py", line 35, in ?
        for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
    _csv.Error: string with NUL bytes
    Exit code: 1 , 0001h
     
    fscked, May 3, 2007
    #3
  4. In <>, fscked wrote:

    > The traceback is as follows:
    >
    > Traceback (most recent call last):
    > File "createXMLPackage.py", line 35, in ?
    > for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name,
    > address, phone, country, city, in csvreader:
    > _csv.Error: string with NUL bytes
    > Exit code: 1 , 0001h


    As Larry said, this most likely means there are null bytes in the CSV file.

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, May 3, 2007
    #4
  5. fscked

    fscked Guest

    On May 3, 9:29 am, Marc 'BlackJack' Rintsch <> wrote:
    > In <>, fscked wrote:
    > > The traceback is as follows:

    >
    > > Traceback (most recent call last):
    > > File "createXMLPackage.py", line 35, in ?
    > > for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name,
    > > address, phone, country, city, in csvreader:
    > > _csv.Error: string with NUL bytes
    > > Exit code: 1 , 0001h

    >
    > As Larry said, this most likely means there are null bytes in the CSV file.
    >
    > Ciao,
    > Marc 'BlackJack' Rintsch


    How would I go about identifying where it is?
     
    fscked, May 3, 2007
    #5
  6. fscked

    Guest

    On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
    > > As Larry said, this most likely means there are null bytes in the CSV file.
    > >
    > > Ciao,
    > > Marc 'BlackJack' Rintsch

    >
    > How would I go about identifying where it is?


    A hex editor might be easiest.

    You could also use Python:

    print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
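
    If the file is large, printing all of it gets unwieldy. Here is a rough
    sketch for a current Python 3 (the one-liner above is Python 2) that
    reports only the byte offsets of the first few NULs. The names
    nul_offsets and limit are made up for illustration.

```python
def nul_offsets(data, limit=10):
    """Return the offsets of the first `limit` NUL bytes in a bytes object."""
    offsets = []
    pos = data.find(b"\0")
    while pos != -1 and len(offsets) < limit:
        offsets.append(pos)
        pos = data.find(b"\0", pos + 1)
    return offsets

# Typical use (the file name is an assumption):
#     with open("filewithnuls", "rb") as f:
#         print(nul_offsets(f.read()))
```

    A lone offset suggests a damaged record; NULs at every second offset
    more likely mean the file is in a 16-bit encoding.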

    Dustin
     
    , May 3, 2007
    #6
  7. fscked

    Guest

    On May 3, 10:12 am, wrote:
    > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
    > > > As Larry said, this most likely means there are null bytes in the CSV file.

    >
    > > > Ciao,
    > > > Marc 'BlackJack' Rintsch

    >
    > > How would I go about identifying where it is?

    >
    > A hex editor might be easiest.
    >
    > You could also use Python:
    >
    > print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
    >
    > Dustin


    Hmm, interesting if I run:

    print open("test.csv").read().replace("\0", ">>>NUL<<<")

    every single character gets a >>>NUL<<< between them...

    What the heck does that mean?

    Example, here is the first field in the csv

    89114608511,

    the above code produces:
    >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,
     
    , May 3, 2007
    #7
  8. fscked

    Guest

    On Thu, May 03, 2007 at 10:28:34AM -0700, wrote:
    > On May 3, 10:12 am, wrote:
    > > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
    > > > > As Larry said, this most likely means there are null bytes in the CSV file.

    > >
    > > > > Ciao,
    > > > > Marc 'BlackJack' Rintsch

    > >
    > > > How would I go about identifying where it is?

    > >
    > > A hex editor might be easiest.
    > >
    > > You could also use Python:
    > >
    > > print open("filewithnuls").read().replace("\0", ">>>NUL<<<")
    > >
    > > Dustin

    >
    > Hmm, interesting if I run:
    >
    > print open("test.csv").read().replace("\0", ">>>NUL<<<")
    >
    > every single character gets a >>>NUL<<< between them...
    >
    > What the heck does that mean?
    >
    > Example, here is the first field in the csv
    >
    > 89114608511,
    >
    > the above code produces:
    > >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,


    I'm guessing that your file is in UTF-16, then -- Windows seems to do
    that a lot. It kind of makes it *not* a CSV file, but oh well. Try

    print open("test.csv", "rb").read().decode('utf-16').replace(u"\0", u">>>NUL<<<")

    I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
    way to get the CSV reader to handle such encoding without reading in the
    whole file, decoding it, and setting up a StringIO file.
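
    For reference, the whole-file approach described above could look
    roughly like this on Python 3, where csv works on text and the buffer
    class is io.StringIO. This is only a sketch: rows_from_utf16_bytes is a
    hypothetical helper, and the default encoding is a guess based on the
    NUL-before-each-character output earlier in the thread.

```python
import csv
import io

def rows_from_utf16_bytes(raw, encoding="utf-16-be"):
    """Decode the whole file contents at once, then parse them as CSV."""
    text = raw.decode(encoding)
    # newline="" leaves line endings alone so the csv module can handle them
    return list(csv.reader(io.StringIO(text, newline="")))

# Typical use (the file name is an assumption):
#     with open("test.csv", "rb") as f:
#         rows = rows_from_utf16_bytes(f.read())
```

    This holds the entire file in memory twice (bytes and text), which is
    fine for small files but is exactly what a streaming solution avoids.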

    Dustin
     
    , May 3, 2007
    #8
  9. Peter Otten

    Peter Otten Guest

    wrote:

    > I'm guessing that your file is in UTF-16, then -- Windows seems to do
    > that a lot. It kind of makes it *not* a CSV file, but oh well. Try
    >
    > print open("test.csv", "rb").read().decode('utf-16').replace(u"\0", u">>>NUL<<<")
    >
    > I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
    > way to get the CSV reader to handle such encoding without reading in the
    > whole file, decoding it, and setting up a StringIO file.


    Not pretty, but seems to work:

    from __future__ import with_statement

    import csv
    import codecs

    def recoding_reader(stream, from_encoding, args=(), kw={}):
        intermediate_encoding = "utf8"
        efrom = codecs.lookup(from_encoding)
        einter = codecs.lookup(intermediate_encoding)
        rstream = codecs.StreamRecoder(stream, einter.encode, efrom.decode,
                                       efrom.streamreader, einter.streamwriter)

        for row in csv.reader(rstream, *args, **kw):
            yield [unicode(column, intermediate_encoding) for column in row]

    def main():
        file_encoding = "utf16"

        # generate sample data:
        data = u"\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
        with open("tmp.txt", "wb") as f:
            f.write(data.encode(file_encoding))

        # read it
        with open("tmp.txt", "rb") as f:
            for row in recoding_reader(f, file_encoding):
                print u" | ".join(row)

    if __name__ == "__main__":
        main()

    Data from the file is recoded to UTF-8, then passed to a csv.reader() whose
    output is decoded to unicode.
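
    On Python 3 the recoding step should not be needed, because csv.reader
    takes text and open() can do the decoding while streaming. A minimal
    sketch under that assumption; read_utf16_csv is a made-up name, and
    the default encoding only works if the file starts with a BOM:

```python
import csv

def read_utf16_csv(path, encoding="utf-16"):
    """Yield CSV rows from a UTF-16 file, decoding as the file is read."""
    # newline="" is what the csv documentation recommends for file objects
    with open(path, encoding=encoding, newline="") as f:
        for row in csv.reader(f):
            yield row
```

    For a file without a BOM, pass the byte order explicitly, for example
    encoding="utf-16-be" or encoding="utf-16-le".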

    Peter
     
    Peter Otten, May 3, 2007
    #9
  10. John Machin

    John Machin Guest

    On May 4, 3:40 am, wrote:
    > On Thu, May 03, 2007 at 10:28:34AM -0700, wrote:
    > > On May 3, 10:12 am, wrote:
    > > > On Thu, May 03, 2007 at 09:57:38AM -0700, fscked wrote:
    > > > > > As Larry said, this most likely means there are null bytes in the CSV file.

    >
    > > > > > Ciao,
    > > > > > Marc 'BlackJack' Rintsch

    >
    > > > > How would I go about identifying where it is?

    >
    > > > A hex editor might be easiest.

    >
    > > > You could also use Python:

    >
    > > > print open("filewithnuls").read().replace("\0", ">>>NUL<<<")

    >
    > > > Dustin

    >
    > > Hmm, interesting if I run:

    >
    > > print open("test.csv").read().replace("\0", ">>>NUL<<<")

    >
    > > every single character gets a >>>NUL<<< between them...

    >
    > > What the heck does that mean?

    >
    > > Example, here is the first field in the csv

    >
    > > 89114608511,

    >
    > > the above code produces:
    > > >>>NUL<<<8>>>NUL<<<9>>>NUL<<<1>>>NUL<<<1>>>NUL<<<4>>>NUL<<<6>>>NUL<<<0>>>NUL<<<8>>>NUL<<<5>>>NUL<<<1>>>NUL<<<1>>>NUL<<<,

    >
    > I'm guessing that your file is in UTF-16, then -- Windows seems to do
    > that a lot.


    Do what a lot? Encode data in UTF-16xE without putting in a BOM or
    telling the world in some other fashion what x is? Humans seem to do
    that occasionally. When they use Windows software, the result is
    highly likely to be encoded in UTF-16LE -- unless of course the human
    deliberately chooses otherwise (e.g. the "Unicode bigendian" option in
    NotePad's "Save As" dialogue). Further, the data is likely to have a
    BOM prepended.

    The above is consistent with BOM-free UTF-16BE.
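
    The reasoning can be checked mechanically. The sketch below (Python 3;
    guess_utf16_variant is a hypothetical helper, and the heuristic only
    makes sense for mostly-ASCII data) looks for a BOM first and otherwise
    compares how many NULs sit at even versus odd byte offsets:

```python
import codecs

def guess_utf16_variant(data):
    """Guess a codec name for UTF-16 bytes, or None if nothing stands out."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the BOM tells the codec the byte order
    even_nuls = data[0::2].count(b"\0")  # NULs at offsets 0, 2, 4, ...
    odd_nuls = data[1::2].count(b"\0")   # NULs at offsets 1, 3, 5, ...
    if even_nuls > odd_nuls:
        return "utf-16-be"  # zero high byte comes before each ASCII byte
    if odd_nuls > even_nuls:
        return "utf-16-le"  # zero high byte comes after each ASCII byte
    return None
```

    A NUL in front of every character, as in the output quoted above, puts
    the NULs at even offsets, which is the big-endian case.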
     
    John Machin, May 3, 2007
    #10