Iterating over a binary file

Derek

Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.
 
Peter Otten

Derek said:
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.

You can tuck away the ugliness in a generator:

def blocks(infile, size=1024):
    while True:
        block = infile.read(size)
        if len(block) == 0:
            break
        yield block

#use it:
for data in blocks(f):
    someobj.update(data)

Peter
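
For readers on Python 3, here is a minimal sketch of the same generator idea. It assumes Python 3 (where binary reads return `bytes`), and it uses an in-memory `io.BytesIO` as a stand-in for a real file opened with `open(filename, 'rb')`; the sample data and sizes are made up for illustration.

```python
import io

def blocks(infile, size=1024):
    """Yield successive chunks of at most `size` bytes until EOF."""
    while True:
        block = infile.read(size)
        if not block:  # read() returns b'' at end of file
            break
        yield block

# Illustration only: BytesIO stands in for open(filename, 'rb').
sample = io.BytesIO(b"x" * 2500)
sizes = [len(chunk) for chunk in blocks(sample)]
print(sizes)  # [1024, 1024, 452]
```

The caller's loop stays a plain `for`, and the end-of-file test lives in exactly one place.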
 
Paul Rubin

Derek said:
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.

You can make it even uglier:

f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if len(data) <= 0:
        break
    someobj.update(data)
f.close()

There's been proposals around to add an assignment-expression operator
like in C, so you could say something like

f = file(filename, 'rb')
while len(data := f.read(1024)) > 0:
    someobj.update(data)
f.close()

but that's the subject of holy war around here too many times ;-). Don't
hold your breath waiting for it.
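
For what it's worth, an assignment expression did eventually arrive: Python 3.8 added `:=` (PEP 572). Below is a sketch of the loop in that style, assuming Python 3.8 or later. The `Counter` class is a hypothetical stand-in for `someobj`, and `io.BytesIO` stands in for a real file.

```python
import io

class Counter:
    """Hypothetical stand-in for someobj: tallies the bytes it is given."""
    def __init__(self):
        self.total = 0

    def update(self, data):
        self.total += len(data)

someobj = Counter()
f = io.BytesIO(b"a" * 3000)   # stands in for open(filename, 'rb')
while data := f.read(1024):   # loop ends when read() returns b''
    someobj.update(data)
f.close()
print(someobj.total)  # 3000
```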
 
Ville Vainio

Paul Rubin said:
You can make it even uglier:

f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if len(data) <= 0:
        break
    someobj.update(data)
f.close()

There's been proposals around to add an assignment-expression operator
like in C, so you could say something like

f = file(filename, 'rb')
while len(data := f.read(1024)) > 0:
    someobj.update(data)
f.close()

It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C. If there is lots of
stuff in while you have to stare at it a bit more, and it becomes
"idiomatic", something you learn, perhaps even cookbook stuff, instead
of obvious-as-such.

Paul Rubin said:
but that's the subject of holy war around here too many times ;-). Don't
hold your breath waiting for it.

Probably true. Instead of ":=", I wouldn't mind getting rid of
expressions/statements difference as a whole.
 
Derrick 'dman' Hudson

Ville Vainio said:
It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C.

Probably true. Instead of ":=", I wouldn't mind getting rid of
expressions/statements difference as a whole.

Uh-oh. Don't go there. If there was no difference, then you would be
able to perform assignment, even define a class, in the condition of a
while. I don't think you want that based on what you said above. (I
certainly don't want to have to read code with such complexity!)

-D

--
For society, it's probably a good thing that engineers value function
over appearance. For example, you wouldn't want engineers to build
nuclear power plants that only _look_ like they would keep all the
radiation inside.
(Scott Adams - The Dilbert principle)

www: http://dman13.dyndns.org/~dman/ jabber: (e-mail address removed)
 
Paul Rubin

Ville Vainio said:
It's funny, but I find the first version much more readable than the
second one. Especially if I consciously forget the "do lots of stuff
in condition part of while" indoctrination from C. If there is lots of
stuff in while you have to stare at it a bit more, and it becomes
"idiomatic", something you learn, perhaps even cookbook stuff, instead
of obvious-as-such.

Idioms exist because they're useful, and there's already plenty of
them in Python, like ''.join(stringlist) or "for i in xrange(n)" etc.

Maybe the condition in the while statement makes that statement twice
as hard to read. However, the example as a whole can still be easier,
simply because it's shorter.

Version 1:

Statement                                  Reading difficulty
=========                                  ==================

f = file(filename, 'rb')                   1
while 1:                                   1
    data = f.read(1024)                    1
    if len(data) <= 0:                     1
        break                              1
    someobj.update(data)                   1
f.close()                                  1

Total reading difficulty:                  7

Now the second version:

Statement                                  Reading difficulty
=========                                  ==================

f = file(filename, 'rb')                   1
while len(data := f.read(1024)) > 0:       2
    someobj.update(data)                   1
f.close()                                  1

Total reading difficulty:                  5

I got through college on a version of this reasoning. I was a math
major. I had friends studying history and literature who said "that's
a hard subject", but I thought they were crazy. But in a normal math
class, there's one textbook that you use for the whole semester, and
you cover maybe half the chapters in it. I was able to keep up. But
in a literature course, you usually have to read a different entire
book from cover to cover EVERY WEEK. I took a couple classes like
that and barely survived. Yes, it takes a lot more effort to read a
page of a math book than a page of a novel. When you compare the
total reading load though, math was a much easier major than
literature or history.

It's the same with programs. I'd rather read 5 lines of tight code
that each actually does something, than 3 pages of loose code (the
kind that's usually written in Java) that spastically meanders trying
to do the same thing, even if the individual loose lines are easier to
read than the tight lines.
 
Daniel Ehrenberg

Derrick 'dman' Hudson said:
Uh-oh. Don't go there. If there was no difference, then you would be
able to perform assignment, even define a class, in the condition of a
while. I don't think you want that based on what you said above. (I
certainly don't want to have to read code with such complexity!)

-D

I was able to create a simple text pager (like Unix's more) in some
nested list comprehensions. Just because I can do that doesn't mean
that real programs will be made like that. IMHO the difference between
statements and expressions doesn't really make sense, and it is one of
the few advantages Lisp/Scheme (and almost Lua) has over Python.

Daniel Ehrenberg
 
Jp Calderone

Derek said:
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.

f = file(filename, 'rb')
for data in iter(lambda: f.read(1024), ''):
    someobj.update(data)
f.close()

Jp
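
One caveat if you try this on Python 3: a file opened in binary mode returns `bytes`, so the sentinel has to be `b''` rather than `''`, otherwise the loop never sees its stop value. Here is a sketch under that assumption, using `functools.partial` in place of the lambda and `io.BytesIO` as a stand-in for a real file:

```python
import io
from functools import partial

f = io.BytesIO(b"z" * 2050)  # stands in for open(filename, 'rb')
chunks = []
# iter(callable, sentinel) keeps calling f.read(1024) until it returns b''.
for data in iter(partial(f.read, 1024), b""):
    chunks.append(len(data))
f.close()
print(chunks)  # [1024, 1024, 2]
```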
 
Sambo

Paul said:
Idioms exist because they're useful, and there's already plenty of
them in Python, like ''.join(stringlist) or "for i in xrange(n)" etc.

Maybe the condition in the while statement makes that statement twice
as hard to read. However, the example as a whole can still be easier,
simply because it's shorter.

Version 1:

Statement                                  Reading difficulty
=========                                  ==================

f = file(filename, 'rb')                   1
while 1:                                   1
    data = f.read(1024)                    1
    if len(data) <= 0:                     1
        break                              1
    someobj.update(data)                   1
f.close()                                  1

Total reading difficulty:                  7

Now the second version:

Statement                                  Reading difficulty
=========                                  ==================

f = file(filename, 'rb')                   1
while len(data := f.read(1024)) > 0:       2
    someobj.update(data)                   1
f.close()                                  1

Total reading difficulty:                  5

I got through college on a version of this reasoning. I was a math
major. I had friends studying history and literature who said "that's
a hard subject", but I thought they were crazy. But in a normal math
class, there's one textbook that you use for the whole semester, and
you cover maybe half the chapters in it. I was able to keep up. But
in a literature course, you usually have to read a different entire
book from cover to cover EVERY WEEK. I took a couple classes like
that and barely survived. Yes, it takes a lot more effort to read a
page of a math book than a page of a novel. When you compare the
total reading load though, math was a much easier major than
literature or history.

It's the same with programs. I'd rather read 5 lines of tight code
that each actually does something, than 3 pages of loose code (the
kind that's usually written in Java) that spastically meanders trying
to do the same thing, even if the individual loose lines are easier to
read than the tight lines.

I would say that depends on the person's competency in a given language.
Naturally, once you are writing long/large programs it is better to have tight
code, but for a newbie it is too much to translate at once.
While I consider myself an expert in "C", I am still learning "C++".

That does not mean a language has to lack the capability.

Then again, how large a program can you or would you want to write with Python?

Cheers, Sam.
 
Anton Vredegoor

Paul Rubin said:
Statement                                  Reading difficulty
=========                                  ==================

f = file(filename, 'rb')                   1
while len(data := f.read(1024)) > 0:       2
    someobj.update(data)                   1
f.close()                                  1

Total reading difficulty:                  5

In Python it can be done even simpler than in C, by making the
"someobj.update" method return the length of the data:

#derek.py

class X:

    def update(self,data):
        #print a chunk and a space
        print data,
        return len(data)

def test():
    x = X()
    f = file('derek.py','rb')
    while x.update(f.read(1)):
        pass
    f.close()

if __name__=='__main__':
    test()

IMHO the generator solution proposed earlier is more natural to some
(all?) Python programmers.

Anton
 
Andrew MacIntyre

Derek said:
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.

I believe the canonical form is:

f = file(filename, 'rb')
while 1:
    data = f.read(1024)
    if not data:
        break
    someobj.update(data)
f.close()

This was also the canonical form for text files, in the case where
f.readlines() wasn't appropriate, prior to the introduction of file
iterators and xreadlines().
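
In Python 3 the same loop is usually wrapped in a `with` block so the file gets closed even if `update()` raises. A rough sketch follows. Using a `hashlib` object as `someobj` is only an assumption about what `update()` might be for, and the helper name `chunked_sha256` and the temporary file are made up for the demonstration.

```python
import hashlib
import os
import tempfile

def chunked_sha256(path, size=1024):
    """Feed a file to SHA-256 in fixed-size chunks (hypothetical helper)."""
    h = hashlib.sha256()          # plays the role of someobj
    with open(path, "rb") as f:   # closes the file even on error
        while True:
            data = f.read(size)
            if not data:          # b'' means end of file
                break
            h.update(data)
    return h.hexdigest()

# Illustration: write a temporary file and compare with a one-shot hash.
payload = b"binary\x00data" * 500
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
result = chunked_sha256(tmp.name)
os.remove(tmp.name)
print(result == hashlib.sha256(payload).hexdigest())  # True
```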
 
Peter Abel

Derek said:
Pardon the newbie question, but how can I iterate over blocks of data
from a binary file (i.e., I can't just iterate over lines, because
there may be no end-of-line delimiters at all). Essentially I want to
do this:

f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any
way to avoid it, or a more elegant syntax? Thanks.

There's an approach to mimic the following C statements in Python:

while (result = f.read(1024))
{
    do_some_thing(result);
}

....     global result
....     result=val
....     return val
....
....     print len(result)
....

121

Regards
Peter
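
Only fragments of the interactive session in this post survive, so what follows is a guess at the idea rather than the original code. It assumes a helper (given the hypothetical name `assign` here) that stores its argument in a global and returns it, so the call can sit in a `while` condition. It is written for Python 3, and the 1145-byte sample is invented so that the last chunk comes out at 121 bytes like the output shown above.

```python
import io

result = None

def assign(val):
    """Store val in the module-level name `result` and return it."""
    global result
    result = val
    return val

f = io.BytesIO(b"q" * 1145)  # stands in for a real binary file
lengths = []
while assign(f.read(1024)):  # falsy b'' at EOF ends the loop
    lengths.append(len(result))
print(lengths)  # [1024, 121]
```

The price of this trick is a global variable, which is part of why the generator version tends to be preferred.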
 
Samuel Walters

|Thus Spake Derek On the now historical date of Tue, 06 Jan 2004 15:25:11 -0500|
f = file(filename, 'rb')
data = f.read(1024)
while len(data) > 0:
    someobj.update(data)
    data = f.read(1024)
f.close()

The above code works, but I don't like making two read() calls. Any way
to avoid it, or a more elegant syntax? Thanks.

Sounds to me like what you're missing (as in "longing in the heart" not
"missed while reading the documentation") is a "do while" construct.

----- Not Real Python Code ------
f = file(filename, 'rb')
do:
    data = f.read(1024)
    if len(data) > 0:
        someobj.update(data)
while len(data) > 0
----- End of Fictional Python Code -----

Python doesn't have this construct. My understanding is that "do while"
was left out because anything it can do can also be accomplished with a
"for" or "while" statement. I'm probably wrong, but that's my
understanding.

Sometimes I miss the "do while" construct because it *can* make code more
legible, but I've mentally replaced it with many of the constructs
mentioned elsewhere in these threads. Generators can make a nice way of
hiding the complexity of code, but it's a judgment call of when your code
starts to become obtuse enough to hide bits of it elsewhere.

HTH

Sam Walters.
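
The usual way to get "do while" behaviour out of a plain `while` is `while True` with the test at the bottom of the body. Here is a small sketch on an unrelated problem, counting decimal digits, where the body has to run at least once because 0 still has one digit. The function name `digit_count` is just for illustration.

```python
def digit_count(n):
    """Count the decimal digits of a non-negative integer."""
    count = 0
    while True:          # do:
        count += 1       #     the body always runs at least once
        n //= 10
        if n == 0:       # while n != 0
            break
    return count

print(digit_count(0), digit_count(7), digit_count(1234))  # 1 1 4
```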
 
Derek

Samuel Walters said:
Sounds to me like what you're missing (as in "longing in
the heart" not "missed while reading the documentation") is
a "do while" construct.

----- Not Real Python Code ------
f = file(filename, 'rb')
do:
    data = f.read(1024)
    if len(data) > 0:
        someobj.update(data)
while len(data) > 0
----- End of Fictional Python Code -----

Yup. Being the naive Python newbie that I am, your fictional code is
exactly what I wrote first without realizing Python has no do loop.

Samuel Walters said:
Python doesn't have this construct. My understanding is
that since anything that can be done with a "do while" can
be accomplished with a "for" or "while" statement, that "do
while" was not included. I'm probably wrong, but that's my
understanding.

Sometimes I miss the "do while" construct because it *can*
make code more legible, but I've mentally replaced it
with many of the constructs mentioned elsewhere in these
threads. Generators can make a nice way of hiding the
complexity of code, but it's a judgment call of when your
code starts to become obtuse enough to hide bits of it
elsewhere.

Generators seem like a powerful way to hide complexity. While in this
case my code is so simple that a generator would probably introduce
unnecessary obfuscation, it's a technique I'm sure I'll use a great
deal in the future. (And to think I didn't know generators even
existed when I asked this question.)
 
Samuel Walters

|Thus Spake Derek On the now historical date of Wed, 07 Jan 2004 16:01:52 -0500|
Yup. Being the naive Python newbie that I am, your fictional code is
exactly what I wrote first without realizing Python has no do loop.

I still do that all the time. Most languages have a do-while and I just
can't get it through my thick skull that python doesn't have that. At
least now I usually catch it as soon as I type "do."

Also take note that python has no select-case construct. The equivalent
construct is if-elif-elif-elif-else. If you think about it, this makes
sense. In languages with a select-case construct, you're usually
comparing a statically typed variable with a set of constant cases. In a
dynamically typed language, there's no guarantee that the variable being
passed into a select clause will come even close to being like the
constant cases. if-elif-else allows you to deal with each case in a much
more dynamic way via on-the-fly comparisons. select-case is a construct I
have not missed in the slightest because Python's if-elif-else construct is
just as legible and doesn't have the danger of forgetting a "break" clause.

Derek said:
Generators seem like a powerful way to hide complexity. While in this
case my code is so simple that a generator would probably introduce
unnecessary obfuscation, it's a technique I'm sure I'll use a great deal
in the future. (And to think I didn't know generators even existed when I
asked this question.)

Generators are fairly new, and for the moment I've put them under the same
mental category as regular expressions, threads and thermonuclear
warheads. Sometimes they're exactly what you need, but most of the time
they unnecessarily complicate things and are probably not what you want to
use.

Sam Walters.
 
Peter Hansen

Paul said:
Version 1:

Statement                                  Reading difficulty
=========                                  ==================

f = file(filename, 'rb')                   1
while 1:                                   1
    data = f.read(1024)                    1
    if len(data) <= 0:                     1
        break                              1
    someobj.update(data)                   1
f.close()                                  1

Total reading difficulty:                  7

Now the second version:

Statement                                  Reading difficulty
=========                                  ==================

f = file(filename, 'rb')                   1
while len(data := f.read(1024)) > 0:       2
    someobj.update(data)                   1
f.close()                                  1

Total reading difficulty:                  5

Hmmm... why only "2" for that line? It combines two function
calls, an assignment, a comparison, and control flow. Sounds
a lot like a candidate for, say, a "4", or maybe higher... after
all, the reading difficulty surely grows in some exponential
fashion, not just linearly with line length or number of nested
parentheses.

(Obviously this is all subjective... that's my point. Many people
who have grown comfortable with Python would find the second
unacceptable, and would certainly re-write the first to use
a nice iterator or a nice function call or something anyway,
so the point is sort of moot for them.)

-Peter
 
Terry Reedy

Samuel Walters said:
Also take note that python has no select-case construct. The equivalent
construct is if-elif-elif-elif-else.

Dictionaries are also useful for some of the things people do with case
select.

tjr
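
A quick sketch of the dictionary approach, in case it helps: each "case" becomes a key, and the value is the function to call. The names `OPS` and `calc` are invented for this example, and the code is written for Python 3.

```python
import operator

# Each key plays the role of a case label; the value is the action.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def calc(op, a, b):
    """Look up the operator and apply it; unknown keys act as 'default'."""
    try:
        func = OPS[op]
    except KeyError:
        raise ValueError("unsupported operator: %r" % op)
    return func(a, b)

print(calc("+", 2, 3), calc("*", 4, 5))  # 5 20
```

There is no fall-through to worry about, and adding a case is a one-line change to the dictionary.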
 
