Graceful detection of EOF

MickeyBob

How does one detect the EOF gracefully? Assuming I have a pickle file
containing an unknown number of objects, how can I read (i.e.,
pickle.load()) until the EOF is encountered without generating an EOF
exception?

Thanks for any assistance.
MickeyBob
 
Jeff Epler

Write a file-like object that can "look ahead" and provide a flag to
check in your unpickling loop, and which implements enough of the file
protocol ("read" and "readline", apparently) to please pickle. The
following worked for me.

class PeekyFile:
    def __init__(self, f):
        self.f = f
        self.peek = ""  # at most one character read ahead of the caller

    def eofnext(self):
        # True if the next read would hit end of file.
        if self.peek: return False
        try:
            self.peek = self.f.read(1)
        except EOFError:
            return True
        return not self.peek

    def read(self, n=None):
        # Return the peeked character first, then the rest of the request.
        if n is not None:
            n = n - len(self.peek)
            result = self.peek + self.f.read(n)
        else:
            result = self.peek + self.f.read()
        self.peek = ""
        return result

    def readline(self):
        result = self.peek + self.f.readline()
        self.peek = ""
        return result

import StringIO, pickle
o = StringIO.StringIO()
for x in range(5):
    pickle.dump(x, o)
i = PeekyFile(StringIO.StringIO(o.getvalue()))
while 1:
    if i.eofnext():
        break
    print pickle.load(i)
print "at the end"
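
For readers on Python 3, a similar look-ahead can be sketched without a wrapper class, on the assumption that the stream is an io.BufferedReader (which is what open(path, "rb") normally returns). Its peek() method returns an empty bytes object at end of file. The in-memory buffer below is only a stand-in for a real pickle file, and this is a sketch rather than the approach posted above.

```python
import io
import pickle

# Build an in-memory stand-in for a pickle file holding several objects.
buf = io.BytesIO()
for x in range(5):
    pickle.dump(x, buf)

# BufferedReader.peek() looks ahead without consuming any data.
reader = io.BufferedReader(io.BytesIO(buf.getvalue()))

loaded = []
while reader.peek(1):          # empty bytes means end of file
    loaded.append(pickle.load(reader))

print(loaded)
print("at the end")
```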

 
Andrew Dalke

MickeyBob said:
How does one detect the EOF gracefully? Assuming I have a pickle file
containing an unknown number of objects, how can I read (i.e.,
pickle.load()) until the EOF is encountered without generating an EOF
exception?

Why isn't catching the exception graceful?

# UNTESTED CODE

import pickle

def load_pickle_iter(infile):
    while 1:
        try:
            yield pickle.load(infile)
        except EOFError:
            break

for obj in load_pickle_iter(open("mydata.pickle", "rb")):
    print obj


This is well in line with the normal Python idiom,
as compared to "look before you leap".
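
A Python 3 rendering of the same generator, as a small self-contained sketch. The in-memory buffer stands in for "mydata.pickle" and is an assumption made only so the example can run on its own.

```python
import io
import pickle

def load_pickle_iter(infile):
    """Yield unpickled objects until pickle.load() reports end of file."""
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            return

# Stand-in for a real file opened with open("mydata.pickle", "rb").
buf = io.BytesIO()
for x in range(5):
    pickle.dump(x, buf)
buf.seek(0)

objects = list(load_pickle_iter(buf))
print(objects)
```

An empty stream simply yields nothing, so the caller needs no special case for it.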

Andrew
 
Jeremy Jones

Andrew said:
Why isn't catching the exception graceful?

# UNTESTED CODE

def load_pickle_iter(infile):
    while 1:
        try:
            yield pickle.load(infile)
        except EOFError:
            break

for obj in load_pickle_iter(open("mydata.pickle", "rb")):
    print obj


This is well in line with the normal Python idiom,
as compared to "look before you leap".

Andrew

So, what you're saying is that the Python way, in contradistinction to
"look before you leap", is "land in it, then wipe it off?" Can we get
that in the Zen of Python? :)

Seriously, this is beautiful. I understand generators, but haven't
become accustomed to using them yet. That is just beautiful, which _is_
Zen.


Jeremy Jones
 
Egbert Bouwman

A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.
Any suggestions ?
egbert
 
Peter Otten

Egbert said:
A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains  information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.

>>> for first in lines:
...     print first
...     break
...
a
>>> for line in lines:
...     print line
...
b
c

Unless it hurts your feelings to unconditionally break out of a for-loop,
that is.

Peter
 
Gerrit

Peter said:
>>> for first in lines:
...     print first
...     break
...
a
>>> for line in lines:
...     print line
...
b
c

Unless it hurts your feelings to unconditionally break out of a for-loop,
that is.

How about:
>>> first = lines.next()
>>> for line in lines:
...     print line
...
b
c

Would hurt less feeling I presume.

Gerrit.

 
Alex Martelli

Jeremy Jones said:
So, what you're saying is that the Python way, in contradistinction to
"look before you leap", is "land in it, then wipe it off?" Can we get
that in the Zen of Python? :)

The "normal Python idiom" is often called, in honor and memory of
Admiral Grace Murray Hopper (arguably the most significant woman in the
history of programming languages to this time), "it's Easier to Ask
Forgiveness than Permission" (EAFP, vs the LBYL alternative). This
motto has been attributed to many, but Hopper was, reportedly, the
first to use it in our field.

In the general case, trying to ascertain that an operation will succeed
before attempting the operation has many problems. Often you end up
repeating the same steps between the ascertaining and the actual usage,
which offends the "Once and Only Once" principle as well as slowing
things down. Sometimes you cannot ensure that the ascertaining and the
operating pertain to exactly the same thing -- the world can have
changed in-between, or the code might present subtle differences between
the two cases.

In contrast, if a failed attempt can be guaranteed to not alter
persistent state and only result in an easily catchable exception, EAFP
can better deliver on its name. In terms of your analogy, there's
nothing to "wipe off" -- if the leap "misfires", no damage is done.
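
The contrast can be sketched with a file read in Python 3. The function names and the temporary path below are hypothetical, chosen only for illustration. In the LBYL version the file can vanish between the check and the open; in the EAFP version a failed open() alters nothing and raises an exception that is easy to catch.

```python
import os
import tempfile

def read_lbyl(path):
    # LBYL: check first, then act. The world may change between the two steps.
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None

def read_eafp(path):
    # EAFP: just attempt the operation and handle the failure.
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None

# A path that is assumed not to exist on this machine.
missing = os.path.join(tempfile.gettempdir(), "eafp-demo-does-not-exist.txt")
print(read_eafp(missing))
```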


Alex
 
Alex Martelli

Egbert Bouwman said:
A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.

option 1, the one I would use:

thefile = open('somehugefile.txt')
first_line = thefile.next()
deal_with_first(first_line)
for line in thefile:
    deal_with_other(line)

this requires Python 2.3 or better, so that thefile IS-AN iterator; in
2.2, get an iterator with foo=iter(thefile) and use .next and for on
that (better still, upgrade!).

option 2, not unreasonable (not repeating the open & calls...):

first_line = thefile.readline()
for line in thefile: ...

option 3, a bit cutesy:

for first_line in thefile: break
for line in thefile: ...

(again, in 2.2 you'll need some foo=iter(thefile)).


I'm sure there are others, but 3 is at least 2 too many already,
so...;-)
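
For reference, option 1 in Python 3 spelling might look like the sketch below, where the built-in next() replaces the .next() method. The io.StringIO object and its contents are assumptions standing in for the huge file, and the two lists stand in for deal_with_first and deal_with_other.

```python
import io

# Stand-in for open('somehugefile.txt'); the contents are made up.
thefile = io.StringIO("width=3\nabc\ndef\n")

first_line = next(thefile)          # consumes only the first line
settings = first_line.strip()

others = []
for line in thefile:                # continues from the second line
    others.append(line.strip())

print(settings, others)
```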


Alex
 
Peter Otten

Gerrit said:
[as opposed to 'for first in lines: break']
Would hurt less feeling I presume.

>>> first = lines.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration

I feel a little uneasy with that ... unless I'm sure I want to deal with the
StopIteration elsewhere.
Looking at it from another angle, the initial for-loop is just a peculiar
way to deal with an empty iterable. So the best (i. e. clear, robust and
general) approach is probably

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # deal with empty iterator, e. g.:
    raise ValueError("need at least one item")
else:
    # process remaining data

part of which is indeed your suggestion.

Peter
 
Josiah Carlson

Gerrit said:
>>> first = lines.next()
>>> for line in lines:
...     print line
...
b
c

Would hurt less feeling I presume.

Unless it was empty, then you'd get the dreaded StopIteration!

IMO, unconditionally breaking out of a for loop is the nicer way of
handling things in this case, no exceptions to catch.
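
A sketch of that pattern in Python 3, with the first-item code placed inside the loop body before the break, so nothing outside the loop depends on the loop variable. The function name and the tuples it records are hypothetical, used only to make the control flow visible.

```python
def handle(lines):
    """Treat the first item specially, then process the rest."""
    seen = []
    items = iter(lines)
    for first in items:
        seen.append(("first", first))   # first-item code goes before the break
        break
    for line in items:                  # resumes at the second item
        seen.append(("other", line))
    return seen

print(handle("abc"))
print(handle(""))
```

With an empty input neither loop body runs, so there is no exception to catch and no unbound name is touched.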

- Josiah
 
Mel Wilson

A file is too large to fit into memory.
The first line must receive a special treatment, because
it contains information about how to handle the rest of the file.

Of course it is not difficult to test if you are reading the first line
or another one, but it hurts my feelings to do a test which by definition
succeeds at the first record, and never afterwards.
Any suggestions ?

f = file("lines.txt", "rt")
first_line_processing (f.readline())
for line in f:
    line_processing (line)

ought to work.

Regards. Mel.
 
Alex Martelli

Peter Otten said:
Looking at it from another angle, the initial for-loop is just a peculiar
way to deal with an empty iterable. So the best (i. e. clear, robust and
general) approach is probably

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # deal with empty iterator, e. g.:
    raise ValueError("need at least one item")
else:
    # process remaining data

I think it can't be optimal, as coded, because it's more nested than it
needs to be (and "flat is better than nested"): since the exception
handler doesn't fall through, I would omit the try statement's else
clause and outdent the "process remaining data" part. The else clause
would be needed if the except clause could fall through, though.
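
The flattened shape described here might look like the following Python 3 sketch. The function name and the returned tuple are assumptions made so the example is self-contained; collecting the remainder into a list is only for demonstration and would not suit a file too large for memory.

```python
def first_and_rest(iterable):
    items = iter(iterable)
    try:
        first = next(items)
    except StopIteration:
        # The handler always raises, so control never falls through it.
        raise ValueError("need at least one item")
    # No else clause: the remaining processing sits at function level.
    rest = list(items)
    return first, rest

print(first_and_rest("abc"))
```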


Alex
 
Steven Bethard

Josiah Carlson said:
IMO, unconditionally breaking out of a for loop is the nicer way of
handling things in this case, no exceptions to catch.

There's still a NameError to catch if you haven't initialized line:

...     break
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'line' is not defined

I don't much like the break out of a for loop, because it feels like a misuse
of a construct designed for iteration... But take your pick: StopIteration or
NameError. =)

Steve
 
Peter Otten

Steven said:
There's still a NameError to catch if you haven't initialized line:
...     break
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'line' is not defined

No, you would put code specific to the first line into the loop before the
break.

Steven said:
I don't much like the break out of a for loop, because it feels like a
misuse of a construct designed for iteration...

I can understand that.

Peter
 
Peter Otten

Alex said:
I think it can't be optimal, as coded, because it's more nested than it
needs to be (and "flat is better than nested"): since the exception
handler doesn't fall through, I would omit the try statement's else
clause and outdent the "process remaining data" part. The else clause
would be needed if the except clause could fall through, though.

I relied more on the two letters 'e. g.' than I should have as there are two
different aspects I wanted to convey:

1. Don't let the StopIteration propagate:

items = iter(...)
try:
    first = items.next()
except StopIteration:
    raise MeaningfulException("clear indication of what caused the error")

2. General structure when handling the first item specially:

items = iter(...)
try:
    first = items.next()
except StopIteration:
    # handle error
else:
    # a. code relying on 'first'
# b. code independent of 'first' or relying on the error handler
# defining a proper default.

where both (a) and (b) are optional.

As we now have two variants, I have to drop the claim to generality.
Regarding the Zen incantation, "flat is better than nested", I tend to measure
nesting as max(indent level) rather than avg(), i. e. following my (perhaps
odd) notion the else clause would affect nesting only if it contained an
additional if, for, etc. Therefore I have no qualms about sometimes using else
where it doesn't affect control flow:

def whosAfraidOf(color):
    if color == red:
        return peopleAfraidOfRed
    else:
        # if it ain't red it must be yellow - nobody's afraid of blue
        return peopleAfraidOfYellow

as opposed to

def whosAfraidOf(color):
    if color == red:
        return peopleAfraidOfRed
    return peopleAfraidOfAnyOtherColor

That said, usually my programs have bigger problems than the above subtlety.

Peter
 
Egbert Bouwman

option 3, a bit cutesy:

for first_line in thefile: break
for line in thefile: ...

(again, in 2.2 you'll need some foo=iter(thefile)).

This technique depends on the file being positioned at line 2,
after the break.

However, in the Nutshell book, page 191, you write:
Interrupting such a loop prematurely (e.g. with break)
leaves the file's current position with an arbitrary value.

So the information about the current position is useless.

Have I discovered a contradiction?
egbert
 
Alex Martelli

Egbert Bouwman said:
This technique depends on the file being positioned at line 2,
after the break.

Not exactly, if by "being positioned" you mean what's normally meant for
file objects (what will thefile.tell() respond, what next five bytes
will thefile.read(5) read, and so on). All it depends on is the
_iterator_ on the file being "positioned" in the sense in which
iterators are positioned (what item will come if you call next on the
iterator).

In 2.3 a file is-an iterator; in 2.2 you need to explicitly get an
iterator as indicated in the parenthesis you've also quoted.

However, in the Nutshell book, page 191, you write:

So the information about the current position is useless.

Do I discover a contradiction ?

Nope -- the file's current position is (e.g.) what tell will respond if
you call it, and that IS arbitrary. In 2.2 (which is what the Nutshell
covers) you need to explicitly get an iterator to do anything else; in
2.3 you can rely on the fact that a file is its own iterator to make
your code simpler. But the iteration state is not connected with the
file's current position.
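
The same separation can still be observed in Python 3, as the sketch below tries to show. It assumes CPython's io.TextIOWrapper, which, as far as I know, refuses tell() while line iteration is in progress, yet the iterator itself resumes at the second line. The in-memory bytes are a made-up stand-in for a real text file.

```python
import io

# Stand-in for a text file opened with open(...); contents are made up.
raw = io.BytesIO(b"first\nsecond\nthird\n")
thefile = io.TextIOWrapper(raw, encoding="ascii")

for first_line in thefile:
    break                       # leave the loop after one line

# The file position is not meaningful in the middle of iteration.
try:
    thefile.tell()
    tell_worked = True
except OSError:
    tell_worked = False

# The iterator state, however, is well defined: the next item is line two.
remaining = [line.strip() for line in thefile]
print(tell_worked, remaining)
```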


Alex
 
Alex Martelli

Steven Bethard said:
I don't much like the break out of a for loop, because it feels like a misuse
of a construct designed for iteration... But take your pick: StopIteration or
NameError. =)

Jacopini and Bohm have much to answer for...;-)



Alex
 
