Looping through a file a block of text at a time not by line

Rosario Morgan · Jun 14, 2006

Hello

Help is greatly appreciated in advance.

I need to loop through a file 6000 bytes at a time. I was going to
use the following but do not know how to advance through the file 6000
bytes at a time.

file = open('hotels.xml')
block = file.read(6000)
newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)
print newblock

I cannot use readlines because the file is 138MB all on one line.

Suggestions?

-Rosario

Rune Strand · Jun 14, 2006

Rosario said:
Hello

Help is greatly appreciated in advance.

I need to loop through a file 6000 bytes at a time. I was going to
use the following but do not know how to advance through the file 6000
bytes at a time.

file = open('hotels.xml')
block = file.read(6000)
newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)
print newblock

I cannot use readlines because the file is 138MB all on one line.

Suggestions?

-Rosario

There is probably a terser way to do this, but this seems to work:
import os

offset = 0
grab_size = 6000
file_size = os.stat('hotels.xml')[6]
f = open('hotels.xml', 'r')

while offset < file_size:
    f.seek(offset)
    data_block = f.read(grab_size)
    offset += grab_size
    print data_block
f.close()

Fredrik Lundh · Jun 14, 2006

Rune said:
There is probably a terser way to do this, but this seems to work:
import os

offset = 0
grab_size = 6000
file_size = os.stat('hotels.xml')[6]

ouch. why not just loop until f.read returns an empty string ?

f = open('hotels.xml', 'r')

while offset < file_size:
    f.seek(offset)
    data_block = f.read(grab_size)
    offset += grab_size
    print data_block
f.close()

here's a shorter and more reliable version:

f = open(filename)
for block in iter(lambda: f.read(6000), ""):
    ... process block

here's the terse version:

for block in iter(lambda f=open(filename): f.read(6000), ""): ...
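
Editor's note: the two-argument form iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns the sentinel. The sketch below is a minimal illustration written for Python 3, where a file opened in binary mode returns b"" at end of file (a text-mode file returns ""). The helper name read_blocks and the in-memory demo data are assumptions made for illustration; they are not from the thread.

```python
import io
from functools import partial

def read_blocks(fileobj, size=6000):
    """Return an iterator over blocks of up to `size` bytes, ending at EOF."""
    # iter(callable, sentinel) keeps calling callable() until it returns
    # the sentinel. A binary-mode file returns b"" once it is exhausted.
    return iter(partial(fileobj.read, size), b"")

# Demonstration on an in-memory "file" of 13000 bytes (hypothetical data).
demo = io.BytesIO(b"x" * 13000)
sizes = [len(block) for block in read_blocks(demo)]
print(sizes)  # [6000, 6000, 1000]
```

With a real file, the same helper could be used as "for block in read_blocks(open(filename, 'rb')): ...".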

:::

what happens if a <Rate> element straddles the border between two 6000
byte blocks, btw ?

</F>
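
Editor's note: one possible way to deal with the straddling problem raised above is to hold back the tail of each block whenever it might be the start of an unfinished <Rate> element, and prepend it to the next block before substituting. This is only a sketch under stated assumptions: it reuses the regular expression from the original post, assumes the data contains no newlines inside a Rate element (the thread says the file is all on one line), and the name strip_rates and the sample data are hypothetical.

```python
import re

RATE = re.compile(r'<Rate.*?></Rate>')  # pattern from the original post
OPEN = '<Rate'

def strip_rates(blocks):
    """Yield text with <Rate...></Rate> removed, even across block borders."""
    carry = ''
    for block in blocks:
        text = RATE.sub('', carry + block)
        # Anything left that starts with "<Rate" has no closing tag yet.
        cut = text.find(OPEN)
        if cut == -1:
            # The block may end inside the opening tag itself, e.g. "<Ra".
            last = text.rfind('<')
            if last != -1 and OPEN.startswith(text[last:]):
                cut = last
        if cut == -1:
            carry = ''
        else:
            carry, text = text[cut:], text[:cut]
        if text:
            yield text
    if carry:
        # Unfinished element at end of input: pass it through unchanged.
        yield carry

# Hypothetical sample, split into 7-character blocks to force straddling.
data = '<Hotel>aaa<Rate x="1"></Rate>bbb<Rate y="2"></Rate>ccc</Hotel>'
chunks = [data[i:i + 7] for i in range(0, len(data), 7)]
cleaned = ''.join(strip_rates(chunks))
print(cleaned)  # <Hotel>aaabbbccc</Hotel>
```

Note that the held-back text grows until the closing tag arrives, so a <Rate> element that is never closed would end up buffered until the end of the file.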

bruno at modulix · Jun 14, 2006

Rosario said:
Hello

Help is greatly appreciated in advance.

I need to loop through a file 6000 bytes at a time. I was going to
use the following but do not know how to advance through the file 6000
bytes at a time.

file = open('hotels.xml')

while True:
    block = file.read(6000)
    if not block:
        break
    do_something_with_block(block)

or:

block = file.read(6000)
while block:
    do_something_with_block(block)
    block = file.read(6000)

newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)

Either you compile the regexp once and use the compiled regexp object:

exp = re.compile(r'<Rate.*?></Rate>')
(...)
newblock = exp.sub('', block)

or you use a non-compiled regexp:

newblock = re.sub(r'<Rate.*?></Rate>','',block)

Here, the first solution may be better. Using a SAX parser may be an
option too... (maybe overkill, or maybe the RightThingToDo(tm),
depending on the context...)
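
Editor's note: to make the SAX suggestion concrete, here is a rough sketch of a filter that copies the document while skipping every Rate element. It is written for Python 3 and uses xml.sax together with xml.sax.saxutils.XMLGenerator from the standard library. The class name DropRate and the sample document are assumptions made for illustration. Unlike the regular expression, it only drops elements named exactly "Rate", and the output is re-serialized rather than byte-for-byte identical to the input.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

class DropRate(XMLGenerator):
    """Write the parsed document to `out`, omitting <Rate> elements."""

    def __init__(self, out):
        XMLGenerator.__init__(self, out, encoding='utf-8')
        self._depth = 0  # greater than 0 while inside a <Rate> element

    def startElement(self, name, attrs):
        if self._depth or name == 'Rate':
            self._depth += 1  # entering Rate, or a child nested inside it
        else:
            XMLGenerator.startElement(self, name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1  # leaving Rate, or a child nested inside it
        else:
            XMLGenerator.endElement(self, name)

    def characters(self, content):
        if not self._depth:
            XMLGenerator.characters(self, content)

# Hypothetical sample; a real run would pass an open file instead.
source = io.BytesIO(
    b'<Hotels><Hotel name="A"><Rate code="x"></Rate></Hotel></Hotels>')
out = io.StringIO()
xml.sax.parse(source, DropRate(out))
result = out.getvalue()
print(result)
```

The parser reads its input in chunks, so this approach should not need the whole 138MB file in memory, and element boundaries are handled by the parser rather than by block arithmetic.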

I cannot use readlines because the file is 138MB all on one line.

So much for the "XML is human readable and editable"....
