_csv.Error: string with NUL bytes

fscked

Does anyone have an idea of what I might do to fix this? I have googled and
can only find some random conversations about it that don't make
sense to me.

I am basically reading in a csv file to create an xml and get this
error.

I don't see any empty values in any fields or anything...
 
Larry Bates

fscked said:
Does anyone have an idea of what I might do to fix this? I have googled and
can only find some random conversations about it that don't make
sense to me.

I am basically reading in a csv file to create an xml and get this
error.

I don't see any empty values in any fields or anything...

You really should post some code and the actual traceback you
get for us to help. I suspect that you have an ill-formed record in
your CSV file. If you can't control that, you may have to write your
own CSV dialect parser.

-Larry
 
fscked

You really should post some code and the actual traceback you
get for us to help. I suspect that you have an ill-formed record in
your CSV file. If you can't control that, you may have to write your
own CSV dialect parser.

-Larry

Certainly, here is the code:

import os, sys
import csv
from elementtree.ElementTree import Element, SubElement, ElementTree

def indent(elem, level=0):
    i = "\n" + level*" "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + " "
        for elem in elem:
            indent(elem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

root = Element("{Boxes}boxes")
myfile = open('test.csv', 'rb')
csvreader = csv.reader(myfile)

for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
    mainbox = SubElement(root, "{Boxes}box")
    mainbox.attrib["city"] = city
    mainbox.attrib["country"] = country
    mainbox.attrib["phone"] = phone
    mainbox.attrib["address"] = address
    mainbox.attrib["name"] = name
    mainbox.attrib["pl_heartbeat"] = heartbeat
    mainbox.attrib["sw_ver"] = sw_ver
    mainbox.attrib["hw_ver"] = hw_ver
    mainbox.attrib["date_activated"] = activated
    mainbox.attrib["mac_address"] = mac
    mainbox.attrib["boxid"] = boxid

indent(root)

ElementTree(root).write('test.xml', encoding='UTF-8')

The traceback is as follows:

Traceback (most recent call last):
  File "createXMLPackage.py", line 35, in ?
    for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
_csv.Error: string with NUL bytes
Exit code: 1 , 0001h
 
Marc 'BlackJack' Rintsch

The traceback is as follows:

Traceback (most recent call last):
  File "createXMLPackage.py", line 35, in ?
    for boxid, mac, activated, hw_ver, sw_ver, heartbeat, name, address, phone, country, city, in csvreader:
_csv.Error: string with NUL bytes
Exit code: 1 , 0001h

As Larry said, this most likely means there are null bytes in the CSV file.

Ciao,
Marc 'BlackJack' Rintsch
 
fscked

As Larry said, this most likely means there are null bytes in the CSV file.

Ciao,
Marc 'BlackJack' Rintsch

How would I go about identifying where it is?
 
dustin

How would I go about identifying where it is?

A hex editor might be easiest.

You could also use Python:

print open("filewithnuls").read().replace("\0", ">>>NUL<<<")

Dustin
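
For a larger file, a list of the byte offsets where the NULs sit can be easier to read than the replaced text. The following is a minimal Python 3 sketch, not code from this thread; `find_nul_offsets` and its `limit` parameter are hypothetical names chosen for illustration, and the sample bytes are made up.

```python
def find_nul_offsets(data, limit=10):
    """Return the offsets of the first `limit` NUL bytes in a bytes object."""
    offsets = []
    start = 0
    while len(offsets) < limit:
        pos = data.find(b"\x00", start)
        if pos == -1:
            break  # no further NUL bytes
        offsets.append(pos)
        start = pos + 1
    return offsets


# Made-up sample: one stray NUL inside an otherwise ordinary record.
sample = b"123,abc\x00def,456\r\n"
nul_positions = find_nul_offsets(sample)
print(nul_positions)
```

For a real file, read it in binary mode first, for example `open("test.csv", "rb").read()`. A single isolated offset points at one bad record; NULs at every other offset suggest the whole file is in a two-byte encoding.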
 
IAmStarsky

A hex editor might be easiest.

You could also use Python:

print open("filewithnuls").read().replace("\0", ">>>NUL<<<")

Dustin

Hmm, interesting if I run:

print open("test.csv").read().replace("\0", ">>>NUL<<<")

every single character gets a >>>NUL<<< between them...

What the heck does that mean?

Example, here is the first field in the csv

89114608511,

the above code produces:
 
dustin

Hmm, interesting if I run:

print open("test.csv").read().replace("\0", ">>>NUL<<<")

every single character gets a >>>NUL<<< between them...

What the heck does that mean?

Example, here is the first field in the csv

89114608511,

the above code produces:

I'm guessing that your file is in UTF-16, then -- Windows seems to do
that a lot. It kind of makes it *not* a CSV file, but oh well. Try

print open("test.csv", "rb").read().decode('utf-16').replace("\0", ">>>NUL<<<")

I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
way to get the CSV reader to handle such encoding without reading in the
whole file, decoding it, and setting up a StringIO file.

Dustin
 
Peter Otten

I'm guessing that your file is in UTF-16, then -- Windows seems to do
that a lot. It kind of makes it *not* a CSV file, but oh well. Try

print open("test.csv", "rb").read().decode('utf-16').replace("\0", ">>>NUL<<<")

I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
way to get the CSV reader to handle such encoding without reading in the
whole file, decoding it, and setting up a StringIO file.

Not pretty, but seems to work:

from __future__ import with_statement

import csv
import codecs

def recoding_reader(stream, from_encoding, args=(), kw={}):
    intermediate_encoding = "utf8"
    efrom = codecs.lookup(from_encoding)
    einter = codecs.lookup(intermediate_encoding)
    rstream = codecs.StreamRecoder(stream, einter.encode, efrom.decode,
                                   efrom.streamreader, einter.streamwriter)

    for row in csv.reader(rstream, *args, **kw):
        yield [unicode(column, intermediate_encoding) for column in row]

def main():
    file_encoding = "utf16"

    # generate sample data:
    data = u"\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
    with open("tmp.txt", "wb") as f:
        f.write(data.encode(file_encoding))

    # read it
    with open("tmp.txt", "rb") as f:
        for row in recoding_reader(f, file_encoding):
            print u" | ".join(row)

if __name__ == "__main__":
    main()

Data from the file is recoded to UTF-8, then passed to a csv.reader() whose
output is decoded to unicode.

Peter
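
On Python 3 the recoding step is usually not needed, because the csv module works on text and the file layer can do the decoding incrementally. Here is a minimal sketch of the same idea, assuming Python 3 and UTF-16 data that starts with a BOM. `iter_csv_rows` is a hypothetical helper name, and the in-memory `io.BytesIO` stands in for a file opened in binary mode.

```python
import csv
import io


def iter_csv_rows(binary_stream, encoding="utf-16"):
    """Yield CSV rows from a binary stream, decoding it piece by piece."""
    # newline="" leaves line endings untouched so csv.reader can handle them.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    for row in csv.reader(text_stream):
        yield row


# Same sample data as in the post above, encoded as UTF-16 with a BOM.
data = "\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n".encode("utf-16")
rows = list(iter_csv_rows(io.BytesIO(data)))
for row in rows:
    print(" | ".join(row))
```

With a real file, the object returned by `open("test.csv", encoding="utf-16", newline="")` can be passed straight to `csv.reader`.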
 
John Machin

I'm guessing that your file is in UTF-16, then -- Windows seems to do
that a lot.

Do what a lot? Encode data in UTF-16xE without putting in a BOM or
telling the world in some other fashion what x is? Humans seem to do
that occasionally. When they use Windows software, the result is
highly likely to be encoded in UTF-16LE -- unless of course the human
deliberately chooses otherwise (e.g. the "Unicode bigendian" option in
NotePad's "Save As" dialogue). Further, the data is likely to have a
BOM prepended.

The above is consistent with BOM-free UTF-16BE.
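
For mostly-ASCII text, UTF-16BE puts a NUL byte before each character and UTF-16LE puts it after, so the position of the NULs hints at the byte order when there is no BOM. The following Python 3 sketch turns that into a rough guess. `guess_utf16_codec` is a hypothetical helper, the heuristic only makes sense for mostly-ASCII data, and it does not try to tell UTF-16 apart from UTF-32.

```python
import codecs


def guess_utf16_codec(head):
    """Guess a codec for bytes suspected to be UTF-16; None if no evidence."""
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic codec reads the BOM to pick the byte order and strips it.
        return "utf-16"
    # No BOM: count NUL bytes at even and odd offsets (assumption: ASCII text).
    even_nuls = head[0::2].count(b"\x00")
    odd_nuls = head[1::2].count(b"\x00")
    if even_nuls > odd_nuls:
        return "utf-16-be"  # NUL comes first in each two-byte unit
    if odd_nuls > even_nuls:
        return "utf-16-le"  # NUL comes second in each two-byte unit
    return None


# First field from the thread, encoded both ways without a BOM.
be_guess = guess_utf16_codec("89114608511,".encode("utf-16-be"))
le_guess = guess_utf16_codec("89114608511,".encode("utf-16-le"))
print(be_guess, le_guess)
```

The returned name can then be used as the `encoding` when decoding or opening the file. Treat a `None` result as "look at the bytes in a hex editor" rather than as proof the file is not UTF-16.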
 
