Looping through a file a block of text at a time, not by line

Rosario Morgan

Hello

Help is greatly appreciated in advance.

I need to loop through a file 6000 bytes at a time. I was going to
use the following but do not know how to advance through the file 6000
bytes at a time.

file = open('hotels.xml')
block = file.read(6000)
newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)
print newblock

I cannot use readlines because the file is 138MB all on one line.

Suggestions?

-Rosario
 
Rune Strand

Rosario said:
Hello

Help is greatly appreciated in advance.

I need to loop through a file 6000 bytes at a time. I was going to
use the following but do not know how to advance through the file 6000
bytes at a time.

file = open('hotels.xml')
block = file.read(6000)
newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)
print newblock

I cannot use readlines because the file is 138MB all on one line.

Suggestions?

-Rosario

There is probably a terser way to do this, but this seems to work:
import os

offset = 0
grab_size = 6000
file_size = os.stat('hotels.xml')[6]
f = open('hotels.xml', 'r')

while offset < file_size:
    f.seek(offset)
    data_block = f.read(grab_size)
    offset += grab_size
    print data_block
f.close()
 
Fredrik Lundh

Rune said:
Probably a more terse way to do this, but this seems to work
import os

offset = 0
grab_size = 6000
file_size = os.stat('hotels.xml')[6]

ouch. why not just loop until f.read returns an empty string ?
f = open('hotels.xml', 'r')

while offset < file_size:
    f.seek(offset)
    data_block = f.read(grab_size)
    offset += grab_size
    print data_block
f.close()

here's a shorter and more reliable version:

f = open(filename)
for block in iter(lambda: f.read(6000), ""):
    ... process block

here's the terse version:

for block in iter(lambda f=open(filename): f.read(6000), ""): ...
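
For readers on Python 3, here is a minimal sketch of the same iter-with-sentinel idea wrapped in a generator. The name iter_blocks and its parameters are made up for illustration. It assumes the file is opened in binary mode, where read() returns bytes and the end-of-file sentinel is therefore b"" rather than "".

```python
from functools import partial


def iter_blocks(path, block_size=6000):
    """Yield successive blocks of at most block_size bytes from path.

    iter_blocks is a hypothetical helper name, not part of any library.
    """
    # Binary mode: read() returns bytes, so end of file shows up as b"".
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read(block_size) repeatedly
        # and stops as soon as the call returns the sentinel.
        for block in iter(partial(f.read, block_size), b""):
            yield block
```

Every block except possibly the last has exactly block_size bytes, and the with statement closes the file once the generator is exhausted.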

:::

what happens if a <Rate> element straddles the border between two 6000
byte blocks, btw ?

</F>
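
One possible way to cope with an element that straddles a block boundary is to hold back the tail of each block that might be the start of a <Rate> element and prepend it to the next block. The sketch below is only an illustration under stated assumptions: strip_rates and _safe_length are hypothetical names, the blocks are text strings, and the pattern is the one from the question. If an opening "<Rate" is never followed by "></Rate>", the held-back text grows until the end of the input.

```python
import re

RATE = re.compile(r"<Rate.*?></Rate>")  # same pattern as in the question
OPEN = "<Rate"


def _safe_length(text):
    """Return how many leading characters of text can be emitted now."""
    # After substitution, any remaining "<Rate" has no closing part yet,
    # so everything from the first one onward must wait for more input.
    start = text.find(OPEN)
    if start != -1:
        return start
    # The block may also end partway through "<Rate" itself ("<", "<R", ...).
    for n in range(min(len(OPEN) - 1, len(text)), 0, -1):
        if text.endswith(OPEN[:n]):
            return len(text) - n
    return len(text)


def strip_rates(blocks):
    """Yield the text of blocks with complete <Rate...></Rate> spans removed."""
    pending = ""
    for block in blocks:
        pending = RATE.sub("", pending + block)
        safe = _safe_length(pending)
        if safe:
            yield pending[:safe]
        pending = pending[safe:]
    if pending:
        # Whatever is left never found its closing tag; emit it unchanged.
        yield pending
```

Joining the yielded pieces should give the same result as applying the pattern to the whole file at once for input like the one described, while only keeping one block plus any unfinished element in memory.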
 
bruno at modulix

Rosario said:
Hello

Help is greatly appreciated in advance.

I need to loop through a file 6000 bytes at a time. I was going to
use the following but do not know how to advance through the file 6000
bytes at a time.

file = open('hotels.xml')

while True:
    block = file.read(6000)
    if not block:
        break
    do_something_with_block(block)

or:

block = file.read(6000)
while block:
    do_something_with_block(block)
    block = file.read(6000)

newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)

Either you compile the regexp once and use the compiled regexp object:

exp = re.compile(r'<Rate.*?></Rate>')
(...)
newblock = exp.sub('', block)

or you use a non-compiled regexp:

newblock = re.sub(r'<Rate.*?></Rate>','',block)

Here, the first solution may be better, since the same pattern is
applied to every block. Using a SAX parser may be an option too...
(maybe overkill, or maybe the RightThingToDo(tm), depending on the
context...)
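
To make the SAX option concrete, here is a rough sketch using the standard library's xml.sax and XMLGenerator. The names RateStripper and strip_rate_elements are invented for this example. It assumes the input is well-formed XML and that whole Rate elements should be dropped, including any content they have, which is broader than what the regexp in the question matches. The output is re-serialized XML, so quoting and the XML declaration may differ from the input.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class RateStripper(XMLGenerator):
    """Re-emit incoming SAX events, except those inside skipped elements."""

    def __init__(self, out, skip_tag="Rate"):
        super().__init__(out, encoding="utf-8")
        self._skip_tag = skip_tag
        self._depth = 0  # greater than 0 while inside a skipped element

    def startElement(self, name, attrs):
        if self._depth or name == self._skip_tag:
            self._depth += 1  # entering (or nesting inside) a skipped element
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1  # leaving a skipped element, write nothing
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_rate_elements(source, out):
    """Parse source (a file name or binary stream) and write filtered XML to out."""
    xml.sax.parse(source, RateStripper(out))
```

For example, strip_rate_elements('hotels.xml', open('hotels_norates.xml', 'w', encoding='utf-8')) would stream the file through the parser without loading it all at once (the output file name is made up). The parser reads the input incrementally, so block boundaries are no longer a concern.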
I cannot use readlines because the file is 138MB all on one line.

So much for the "XML is human readable and editable"....
 
