Reading a large bz2 textfile exits early


Norman Rieß

Hello,

I am trying to read a large bz2-compressed text file using the bz2 module.
The file is 1717362770 lines long and 8 GB in size.
Using this code

source_file = bz2.BZ2File(file, "r")
for line in source_file:
print line.strip()

print "Exiting"
print "I used file: " + file

The loop exits cleanly after 4311 lines, in the middle of a line, and the
prints are executed.
This happened on two different boxes running different Linux distributions.
Is there something I am missing, or something that should be done differently?

Thank you.

Regards,
Norman
 

Norman Rieß

On 02/21/10 22:09, Dennis Lee Bieber wrote:
Please verify your indentation! What you posted above is invalid in
many ways.

I am sorry, the indentation suffered from pasting.

This is the actual code:

source_file = bz2.BZ2File(file, "r")
for line in source_file:
    print line.strip()

print "Exiting"
print "I used file: " + file
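
For comparison, here is a minimal sketch of the same loop in Python 3 that counts lines instead of printing them, so the result can be checked against the expected total. The function name `count_lines` and the UTF-8 encoding are assumptions made for illustration; the code above is Python 2 and nothing in this thread says which encoding the file uses.

```python
import bz2


def count_lines(path, encoding="utf-8"):
    """Iterate over a bz2-compressed text file and return the number of lines read."""
    count = 0
    # bz2.open in text mode ("rt") decodes the decompressed bytes line by line.
    with bz2.open(path, "rt", encoding=encoding) as source_file:
        for _line in source_file:
            count += 1
    return count
```

If the returned count is far below the known number of lines, the reader stopped early rather than the printing being at fault.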
 

Steven D'Aprano

This is the actual code:

source_file = bz2.BZ2File(file, "r")
for line in source_file:
    print line.strip()

print "Exiting"
print "I used file: " + file


Have you verified that the bz file is good by opening it in another
application?
 

Norman Rieß

On 02/22/10 09:02, Steven D'Aprano wrote:
Have you verified that the bz file is good by opening it in another
application?

Yes, bzcat is running through the file fine. And piping bzcat output
into the python script reading stdin works fine, too.
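
The stdin variant mentioned here can be sketched roughly as follows. This is an illustration in Python 3 rather than the script actually used; the helper name `copy_stripped` is hypothetical.

```python
import sys


def copy_stripped(src, dst):
    """Write every line of src to dst with surrounding whitespace removed.

    Returns the number of lines copied.
    """
    count = 0
    for line in src:
        dst.write(line.strip() + "\n")
        count += 1
    return count


# Intended use at the end of a script (not executed here):
#     copy_stripped(sys.stdin, sys.stdout)
```

Saved as a script with that final call enabled, it would be run as `bzcat filename.bz2 | python3 script.py`, so the decompression is done entirely by bzcat and the bz2 module is not involved.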
 

Lie Ryan

On 02/22/10 09:02, Steven D'Aprano wrote:

Yes, bzcat is running through the file fine. And piping bzcat output
into the python script reading stdin works fine, too.

Test with something other than bzcat; bzcat does certain things
differently because of the way it works (it is a cat for bzipped files).
Try using plain "bunzip2 filename.bz2".
 

Norman Rieß

On 02/22/10 14:29, Lie Ryan wrote:
Test with something other than bzcat; bzcat does certain things
differently because of the way it works (it is a cat for bzipped files).
Try using plain "bunzip2 filename.bz2".

Did that too. Works as expected.
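
Since bzcat and bunzip2 both read the whole file while the Python loop stops early, one possibility, not confirmed anywhere in this thread, is that the file consists of several concatenated bz2 streams. The command-line tools decompress all streams, whereas BZ2File only gained multi-stream support in Python 3.3; older versions stop at the end of the first stream. The sketch below (Python 3, with the hypothetical helper name `count_bz2_streams`) counts the streams in a file so that this guess can be checked.

```python
import bz2


def count_bz2_streams(path, chunk_size=1024 * 1024):
    """Return (number of complete bz2 streams, total decompressed bytes)."""
    streams = 0
    total = 0
    decomp = bz2.BZ2Decompressor()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            while chunk:
                total += len(decomp.decompress(chunk))
                if decomp.eof:
                    # End of one stream: whatever follows it belongs to the
                    # next stream, so feed it to a fresh decompressor.
                    streams += 1
                    chunk = decomp.unused_data
                    decomp = bz2.BZ2Decompressor()
                else:
                    chunk = b""
    return streams, total
```

A result with more than one stream would be consistent with the early exit; a result of exactly one stream would rule this explanation out.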
 

Stefan Behnel

Lie Ryan, 22.02.2010 14:29:
Test with something other than bzcat; bzcat does certain things
differently because of the way it works (it is a cat for bzipped files).
Try using plain "bunzip2 filename.bz2".

Please note that all of this has already been suggested on the python-tutor
list.

Stefan
 
