csv read _csv.Error: line contains NULL byte

chip9munk

Hi all!

I am reading from a huge csv file (> 20 GB), so I have to read it line by line:

for i, row in enumerate(input_reader):
    # and I do something on each row

Everything works fine until I get to a row with some strange symbols "0I`00�^".
At that point I get an error: _csv.Error: line contains NULL byte

How can I skip such a row and keep going, or "decipher" it in some way?

I have tried:
csvFile = open(input_file_path, 'rb')
csvFile = open(input_file_path, 'rU')
csvFile = open(input_file_path, 'r')

and nothing works.

If I do:

try:
    for i, row in enumerate(input_reader):
        # and I do something on each row
except Exception:
    sys.exc_clear()

I simply stop at that line. I would like to skip it and move on.

Please help!

Best,

Chip Munk
 
Tim Golden

> Hi all!
>
> I am reading from a huge csv file (> 20 Gb), so I have to read line by line:
>
> for i, row in enumerate(input_reader):
>     # and I do something on each row
>
> Everything works fine until i get to a row with some strange symbols "0I`00�^"
> at that point I get an error: _csv.Error: line contains NULL byte
>
> How can i skip such row and continue going, or "decipher" it in some way?

Well, you have several options:

Without disturbing your existing code too much, you could wrap the
input_reader in a generator which skips malformed lines. That would look
something like this:

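def unfussy_reader(reader):
    while True:
        try:
            yield next(reader)
        except csv.Error:
            # log the problem or whatever
            continue
        except StopIteration:
            # end of input; needed on Python 3.7+, where a StopIteration
            # escaping a generator is turned into a RuntimeError
            return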


If you know what to do with the malformed data, you could strip it out
and carry on. Whatever works best for you.
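
As a rough sketch of the "strip it out" route, assuming the NUL characters are just noise that can safely be dropped: filter each line before the csv reader ever sees it. The helper name strip_nul_bytes and the in-memory sample are made up for illustration; a real file opened with open(path, newline="") could be passed in the same way.

```python
import csv
import io


def strip_nul_bytes(lines):
    """Yield each line with any NUL characters removed."""
    for line in lines:
        yield line.replace("\x00", "")


# In-memory stand-in for the real file, with a NUL in the second line.
sample = io.StringIO("abc,def\nghi\x00,klm\n123,456\n", newline="")

# csv.reader accepts any iterable of strings, so the filter can sit
# between the file and the reader.
rows = list(csv.reader(strip_nul_bytes(sample)))
print(rows)
```

Because the filter is a generator, lines are still processed one at a time, which matters for a file too large to hold in memory.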

Alternatively you could subclass the standard Reader and do something
equivalent to the above in the __next__ method.
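
A possible shape for the class-based variant is sketched below. As far as I can tell, csv.reader is a factory function rather than a class you can inherit from, so this sketch wraps a reader instead of subclassing it. UnfussyReader and its skipped counter are hypothetical names. To trigger a csv.Error reliably, the demo uses strict=True with a badly quoted field as a stand-in for whatever makes a real row unparseable.

```python
import csv
import io


class UnfussyReader:
    """Iterator that wraps a csv reader and skips rows it cannot parse."""

    def __init__(self, reader):
        self._reader = reader
        self.skipped = 0  # number of rows dropped so far

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                # StopIteration from the wrapped reader propagates and
                # ends the iteration normally.
                return next(self._reader)
            except csv.Error:
                self.skipped += 1


# The second line has a stray character after the closing quote, which
# strict=True treats as an error.
sample = io.StringIO('abc,def\n"ghi"x,klm\n123,456\n', newline="")
reader = UnfussyReader(csv.reader(sample, strict=True))
rows = list(reader)
print(rows, reader.skipped)
```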

TJG
 
chip9munk

> Without disturbing your existing code too much, you could wrap the
> input_reader in a generator which skips malformed lines. That would look
> something like this:
>
> def unfussy_reader(reader):
>     while True:
>         try:
>             yield next(reader)
>         except csv.Error:
>             # log the problem or whatever
>             continue


I am sorry, I do not understand how to get at each row this way.

Could you also explain this:
if I define this function,
how do I change my for loop to get each row?

Thanks!
 
Tim Golden

> I am sorry I do not understand how to get to each row in this way.
>
> Please could you explain also this:
> If I define this function,
> how do I change my for loop to get each row?

Does this help?

<code>
#!python3
import csv

def unfussy_reader(csv_reader):
    while True:
        try:
            yield next(csv_reader)
        except csv.Error:
            # log the problem or whatever
            print("Problem with some row")
            continue
        except StopIteration:
            # end of input (a StopIteration escaping a generator
            # becomes a RuntimeError on Python 3.7+)
            return

if __name__ == '__main__':
    #
    # Generate malformed csv file for
    # demonstration purposes
    #
    with open("temp.csv", "w") as fout:
        fout.write("abc,def\nghi\x00,klm\n123,456")

    #
    # Open the malformed file for reading, fire up a
    # conventional CSV reader over it, wrap that reader
    # in our "unfussy" generator and enumerate over that
    # generator.
    #
    with open("temp.csv", newline="") as fin:
        reader = unfussy_reader(csv.reader(fin))
        for n, row in enumerate(reader):
            print(n, "=>", row)

</code>
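
For the "log the problem or whatever" placeholder, one option is to record which line was skipped, so the bad rows can be inspected later. The sketch below is a variant of the generator with an extra, hypothetical skipped_lines parameter; it relies on the reader's line_num attribute. Since a csv.Error needs some trigger, the demo uses strict=True with a badly quoted field in place of the NUL byte.

```python
import csv
import io


def unfussy_reader(csv_reader, skipped_lines):
    """Yield rows from csv_reader, recording the line number of each bad row."""
    while True:
        try:
            yield next(csv_reader)
        except csv.Error:
            # line_num counts the source lines read so far, so here it
            # refers to the line that triggered the error.
            skipped_lines.append(csv_reader.line_num)
        except StopIteration:
            return


# The second line has a stray character after the closing quote, which
# strict=True treats as an error.
sample = io.StringIO('abc,def\n"ghi"x,klm\n123,456\n', newline="")
skipped = []
rows = list(unfussy_reader(csv.reader(sample, strict=True), skipped))
print(rows, skipped)
```

For a 20 GB input it may be better to write the line numbers to a log file than to keep them in a list, depending on how many rows turn out to be bad.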


TJG
 
chip9munk

Ok, I have figured it out:

for i, row in enumerate(unfussy_reader(input_reader)):
    # and I do something on each row

Sorry, it is my first "face to face" with generators!

Thank you very much!

Best,
Chip Munk
 
Mark Lawrence
