eof

Diez B. Roggisch

braver said:
(Well, TextMate is pretty new, and I've just got a brand new Carbon
Emacs-devel from ports. And tabs don't match in a Python bundle and
the Python mode. Have to fix'em tabs. Chews a fig, mumbles to
himself... :)

Which is the reason one should use spaces.

So why Python's IO cannot yield f.eof() as easily as Ruby's can? :)

Because that requires buffering, something that affects speed. Are you
willing to sacrifice the speed for _all_ use cases just for the _few_
that would actually benefit from the eof()? I myself have seldom found
the need for eof() - but I constantly use the generator style of
line production that files implement.

Considering your own repeated remarks about "I'd only use ruby if it
wasn't slower than Python", I'd think you could value that.

And you have been shown clear, concise solutions to your problem. Which
add the benefit of working in general stream scenarios, not only with
actual files. Granted, they aren't part of the stdlib - but then, lots
of things aren't.

Diez
 
braver

Because that's not how you compare languages. You compare languages by stating what you are actually trying to do, and figuring out the most natural solution in each language. Not "I can do this in x--how come I can't do it in y?"

Python doesn't have f.eof() because it doesn't compare to Ruby? Or
because I'm trying to compare them? :) That's giving up to Ruby too
early!

Ruby has iterators and generators too, but it also has my good ol'
f.eof(). I challenge the assumption here of some majestically
Python-wayist spirit forbidding Python to have f.eof(), while Ruby, which
has all the same features, has it. Saying "it's not the Python way" is
not a valid argument.

The suspicion lurking in the thread above was that this has to do with
Python IO buffering, that it somehow can't tell the f.eof() with
automatic look-ahead/push-back/simulate read, as transparently and
effectively as (in practice) Ruby does without much fuss. The reason
why such a useful feature -- useful not in Ruby or Perl or Pascal, but
algorithmically -- is not present in Python is a recurrent mystery,
evidenced in this group recurrently.

Cheers,
Alexy
 
Hrvoje Niksic

Diez B. Roggisch said:
Because that requires buffering, something that affects speed.

I don't get it, Python's files are implemented on top of stdio FILE
objects, which do buffering and provide EOF checking (of the sort
where you can check if a previous read hit the EOF, but still). Why
not export that functionality?

Considering your own repeated remarks about "I'd only use ruby if it
wasn't slower than Python", I'd think you could value that.

I see no reason why exposing the EOF check would slow things down.
 
braver

Granted, they aren't part of the stdlib - but then, lots
of things aren't.

As Hendrik noticed, I can't even add my own f.eof() if I want to have
buffering -- is that right? The tradeoff between speed and
convenience is something I'd rather determine and enable myself, if I
have the right tools.

Cheers,
Alexy
 
Hrvoje Niksic

braver said:
As Hendrik noticed, I can't even add my own f.eof() if I want to
have buffering -- is that right?

You can, you just need to inherit from the built-in file type. Then
instances of your class get a __dict__ and with it the ability to
attach arbitrary information to any instance. For example:

class MyFile(file):
    def __init__(self, *args, **kwds):
        file.__init__(self, *args, **kwds)
        self.eof = False

    def read(self, size=None):
        if size is None:
            val = file.read(self)
            self.eof = True
        else:
            val = file.read(self, size)
            if len(val) < size:
                self.eof = True
        return val

    def readline(self, size=None):
        if size is None:
            val = file.readline(self)
        else:
            val = file.readline(self, size)
        if len(val) == 0:
            self.eof = True
        return val

The code needed to support iteration is left as an exercise for the
reader.
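
One possible take on that exercise, as a rough sketch only. It assumes Python 3, where the built-in file type used above no longer exists, so it wraps a file-like object instead of subclassing. The name EOFTrackingFile is made up for illustration, and treating a short read as end of file is an assumption that holds for regular files but not necessarily for pipes or sockets.

```python
import io


class EOFTrackingFile:
    """Wrap a file-like object and remember whether a read has hit EOF.

    Hypothetical helper: as with the subclass above, `eof` only becomes
    True after a read has come back short or empty.
    """

    def __init__(self, fileobj):
        self._f = fileobj
        self.eof = False

    def read(self, size=-1):
        val = self._f.read(size)
        # Reading everything, or getting fewer characters than requested,
        # is taken to mean the end was reached (assumption: regular file).
        if size is None or size < 0 or len(val) < size:
            self.eof = True
        return val

    def readline(self):
        val = self._f.readline()
        if not val:
            self.eof = True
        return val

    # Iteration support: stop when readline() returns an empty string.
    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line


f = EOFTrackingFile(io.StringIO("one\ntwo\n"))
lines = list(f)

g = EOFTrackingFile(io.StringIO("abc"))
g.read(2)
eof_after_full_read = g.eof
g.read(2)
eof_after_short_read = g.eof
```

Note that the flag is still set by a read that comes back short or empty, so it reports the past rather than predicting the future.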
 
J. Clifford Dyer

Python doesn't have f.eof() because it doesn't compare to Ruby? Or
because I'm trying to compare them? :) That's giving up to Ruby too
early!

No and no, to your two questions. I'm not giving up on Ruby at all. In fact, I've never tried Ruby. My point isn't that the languages don't compare. My point is that your question shouldn't be "why doesn't python have an eof method on its file objects?" Your question should be "I want to do something different with the last line of a file that I iterate over. How do I do that best in python?" You've been given a couple solutions, and a very valid reason (performance) why buffered file objects are not the default. You may also consider trying subclassing file with a buffered file object that provides self.eof. (I recommend making it an attribute rather than a method. Set it when you hit eof.) That way you have the fast version, and the robust version.

You may find something of interest in the for/else construction as well:

for line in file:
    pass
else:
    # This gets processed at the end unless you break out of the for loop.
    pass
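
The underlying use case, treating the last line differently while iterating, can also be handled with one item of lookahead inside a generator. A minimal sketch, written for Python 3; with_last_flag is a hypothetical name, and io.StringIO stands in for a real file.

```python
import io


def with_last_flag(lines):
    """Yield (line, is_last) pairs using one item of lookahead."""
    it = iter(lines)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for cur in it:
        # A later item exists, so `prev` was not the last one.
        yield prev, False
        prev = cur
    yield prev, True


f = io.StringIO("a\nb\nc\n")
result = [(line.rstrip("\n"), is_last) for line, is_last in with_last_flag(f)]
```

Because it works on any iterable, the same helper applies to pipes and sockets as well as to plain files.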

Ruby has iterators and generators too, but it also has my good ol'
f.eof(). I challenge the assumption here of some majestically
Python-wayist spirit forbidding Python to have f.eof(), while Ruby, which
has all the same features, has it. Saying "it's not the Python way" is
not a valid argument.

No, but showing a different python way is more valid, and if you were more forthcoming about your use case from the get-go, you would have gotten fewer vague answers.

The suspicion lurking in the thread above was that this has to do with
Python IO buffering, that it somehow can't tell the f.eof() with
automatic look-ahead/push-back/simulate read, as transparently and
effectively as (in practice) Ruby does without much fuss. The reason
why such a useful feature -- useful not in Ruby or Perl or Pascal, but
algorithmically -- is not present in Python is a recurrent mystery,
evidenced in this group recurrently.

A mystery which has been answered a couple times in this thread--it causes a performance hit, and python is designed so that you don't suffer that performance hit, unless you want it, so you have to program for it yourself.

You yourself said that performance is a complaint of yours regarding Ruby, so why claim that Ruby's way is clearly better in a case where it causes a known performance hit?

Cheers,
Cliff
 
braver

You yourself said that performance is a complaint of yours regarding Ruby, so why claim that Ruby's way is clearly better in a case where it causes a known performance hit?

See Hrvoje's remark above -- we can have EOF and eat it too! Perhaps
it was just somehow omitted from the standard Python library because
it was disliked by some folks or just forgotten. Is there a history of
the eof()'s fall from grace? Was it ever considered for inclusion?

Cheers,
Alexy
 
Neil Cerutti

I don't get it, Python's files are implemented on top of stdio
FILE objects, which do buffering and provide EOF checking (of
the sort where you can check if a previous read hit the EOF,
but still). Why not export that functionality?

You have to make a failed read attempt before feof returns true.

I see no reason why exposing the EOF check would slow things down.

I think it's too low level, and so doesn't do what naive users
expect. It's really only useful, even in C, as part of the
forensic study of a stream in an error state, yet naive C
programmers often write code like:

while (!feof(f)) {
    /* Read a line and process it. */
}

...and are flummoxed by the way it fails to work.

I think Python is well rid of such a seldom-useful source of
confusion.
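
For comparison, in Python the read itself reports the end of input: readline() returns an empty string once nothing is left, so the loop tests the result of the read rather than a separate flag. A small sketch of that idiom, written for Python 3 with io.StringIO standing in for a real file:

```python
import io

f = io.StringIO("first\nsecond\n")

lines = []
while True:
    line = f.readline()
    if not line:          # empty string: the read itself reports end of file
        break
    lines.append(line)

# The same loop written with the two-argument form of iter(),
# which calls f.readline until it returns the sentinel "".
f.seek(0)
same_lines = list(iter(f.readline, ""))
```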
 
greg

Hendrik said:
So he can't even help himself by setting his
own EOF attribute to False initially, and
to True when he sees an empty string.

Is there a reason for this Bondage style?

There's a fair amount of overhead associated with providing
the ability to set arbitrary attributes on an object, which
is almost never wanted for built-in types, so it's not
provided by default.

You can easily get it if you want it by defining a Python
subclass of the type concerned.
 
MonkeeSage

I think it's too low level, and so doesn't do what naive users
expect. It's really only useful, even in C, as part of the
forensic study of a stream in an error state, [...]

Indeed. I just wrote a little implementation of an IPS patcher for the
ips patches used on many old game roms (snes, genesis) for doing fan
translations from Japanese to other languages. The basic format of a
patch is the ASCII header "PATCH", followed by 3 bytes giving the offset
into the data file at which to apply the patch chunk, 2 bytes giving the
chunk size, then n bytes of chunk, repeated, with a final ASCII "EOF" footer. As I was
using Haskell, the function was recursive, and it was useful to check
that "EOF" were the final bytes read and that no more bytes had been
read between the last data chunk and eof. In other words, on the
corner case that all the data in the patch was structurally valid,
except up to two bytes after the last chunk and before the "EOF",
checking that the absolute position in the file was eof gave me the
ability to differentiate the error states of the patch lacking the
closing ascii "EOF", or including extra data between the last chunk
and the "EOF." Without checking eof (or doing something more complex),
I would have only been able to detect the error as a missing footer.

Regards,
Jordan
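
The check described above can be sketched in Python rather than Haskell. This is an illustration under stated assumptions, not the original patcher: parse_ips and PatchError are hypothetical names, offsets and sizes are assumed to be big-endian, and only the record layout given in the post is handled (no other record types).

```python
class PatchError(Exception):
    """Raised when the patch does not follow the layout described above."""


def parse_ips(blob):
    """Return a list of (offset, data) chunks from the patch bytes."""
    if not blob.startswith(b"PATCH"):
        raise PatchError("missing PATCH header")
    pos = 5
    chunks = []
    while True:
        rest = blob[pos:]
        if rest[:3] == b"EOF":
            # The footer must sit exactly at the end of the file.
            if len(rest) > 3:
                raise PatchError("data after EOF footer")
            return chunks
        size = int.from_bytes(rest[3:5], "big")
        data = rest[5:5 + size]
        if len(rest) < 5 or len(data) < size:
            # Ran out of bytes mid-record. Looking at the very end of the
            # file tells the two error states apart.
            if rest.endswith(b"EOF"):
                raise PatchError("stray bytes before EOF footer")
            raise PatchError("missing EOF footer")
        chunks.append((int.from_bytes(rest[:3], "big"), data))
        pos += 5 + size


record = b"\x00\x00\x10" + b"\x00\x02" + b"hi"
good = parse_ips(b"PATCH" + record + b"EOF")
```

The distinction between the two error messages is only possible because the parser knows where the file ends, which is the point being made about eof.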
 
braver

I think Python is well rid of such a seldom-useful source of
confusion.

So all that code folks wrote in Algol-like languages, -- e.g. this
works in Ada, --

while not End_of_File(f) loop
   --
end loop;

-- are confusing? Why not interpret it as reading until ^D on a
pipe? And plain files work fine. (What would Ruby do? :)

Historically, is it possible to trace the eof-related design decision
in stdlib? Most languages have the look-ahead eof, so when Python was
codified, there should have been some decisions made.

Can we say that f.eof() in fact can check for EOF right after we've
read all characters from a file, but before a failed attempt to read
beyond? In Python's idiom,

for line in file:
    # look at a line
# we can tell eof occurs right here after the last line

After the last line, we've read all bytes but didn't try a new line
yet -- is it the semantics of the for line in file:? I assume it'll
do the right thing if our file ends in \n. What if the last line is
not \n-terminated?

Cheers,
Alexy
 
Neil Cerutti

Can we say that f.eof() in fact can check for EOF right after
we've read all characters from a file, but before a failed
attempt to read beyond? In Python's idiom,

for line in file:
    # look at a line
# we can tell eof occurs right here after the last line

After the last line, we've read all bytes but didn't try a new
line yet -- is it the semantics of the for line in file:?

Yes. After the above construction, there's no need to check for
eof.

I assume it'll do the right thing if our file ends in \n. What
if the last line is not \n-terminated?

Nothing bad happens as far as I know, but it may depend on the
underlying clib.
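
This is easy to check for an in-memory text stream. The sketch below assumes Python 3's io module, so it says nothing about files backed by the C library in the Python version under discussion; it only shows that iteration hands back an unterminated last line as-is.

```python
import io

# The second line has no trailing newline.
f = io.StringIO("first\nlast without newline")
lines = list(f)

# The final item is returned without a newline rather than being dropped.
last_is_unterminated = not lines[-1].endswith("\n")
```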
 
greg

braver said:
Historically, is it possible to trace the eof-related design decision
in stdlib?

You seem to be assuming that someone started out with a design
that included an eof() of the kind you want, and then decided
to remove it.

But I doubt that such a method was ever considered in the first
place. Someone used to programming with the C stdlib doesn't
think in terms of testing for EOF separately from reading,
because the C stdlib doesn't work that way.

Pascal started out with an eof() function because the earliest
implementations only worked with disk files. Later, when people
tried to make Pascal programs work interactively, they found
out that it was a mistake, as it provides opportunities such
as the following classic wrong way to read interactive input
in Pascal:

while not eof(input) do begin
  write('Enter some data: ');
  readln(input, line);
end

which stops and waits for input before printing the first
prompt.

By not providing an eof() function, C -- and Python -- make
it clear that testing for eof is not a passive operation.
It's always obvious what's going on, and it's much harder to
make mistakes like the above.
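
The ordering problem can be made visible with a toy model in Python. This is only a sketch: LoggingInput and LookaheadReader are hypothetical classes, and the "interactive" stream is simulated by a list of lines plus a log of events.

```python
class LoggingInput:
    """Stand-in for an interactive stream; records every wait for input."""

    def __init__(self, lines, log):
        self._lines = iter(lines)
        self._log = log

    def readline(self):
        self._log.append("wait for input")
        return next(self._lines, "")   # "" signals end of input


class LookaheadReader:
    """Reader whose eof() has to read ahead in order to answer."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = None

    def eof(self):
        if self._pending is None:
            self._pending = self._stream.readline()
        return self._pending == ""

    def readline(self):
        if self._pending is None:
            self._pending = self._stream.readline()
        line, self._pending = self._pending, None
        return line


log = []
reader = LookaheadReader(LoggingInput(["42\n"], log))
while not reader.eof():            # same shape as the Pascal loop above
    log.append("print prompt")
    reader.readline()
```

In this model the first "wait for input" entry lands in the log before the first "print prompt", which is the behaviour described for the Pascal loop.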

when Python was
codified, there should have been some decisions made.

Some decisions were made when the C stdlib was designed, and
I think they were the right ones. Python wisely followed them.

for line in file:
    # look at a line
# we can tell eof occurs right here after the last line

No, if the line you just read ends in "\n", you don't know whether
eof has been reached until the for-loop tries to read another line.
 
MonkeeSage

By not providing an eof() function, C -- and Python -- make
it clear that testing for eof is not a passive operation.
It's always obvious what's going on, and it's much harder to
make mistakes like the above.

err...C has feof() in stdio (see "man 3 ferror").

Regards,
Jordan
 
hdante

Ruby has iterators and generators too, but it also has my good ol'
f.eof(). I challenge the assumption here of some majectically Python-

Ruby doesn't have the good ol' eof. Good old eof tests a single flag
and requires a pre read(). Ruby's eof blocks and does buffering (and
this is a very strong technical statement). I find it ok that ruby
subverts the good old eof, but it's unacceptable for python to do so.

Besides, it's probable that your code could work with the following
construct.

def try_read(f):
    line = f.readline()
    eof = (line == '')
    return (line, eof)

def xor(a, b):
    return a and not b or b and not a

count = 0
while True:
    next_line, eof = try_read(f)
    if not eof:
        count += 1
        line = next_line
        process(line)
    if xor(count % 1000 == 0, eof):
        summarize(count, line)
    if eof:
        break

wayist spirit forbidding Python to have f.eof(), while Ruby, which has
all the same features, has it. Saying "it's not the Python way" is
not a valid argument.

Yes, it is.
 
MonkeeSage

Ruby doesn't have the good ol' eof. Good old eof tests a single flag
and requires a pre read(). Ruby's eof blocks and does buffering (and
this is a very strong technical statement).

Actually, to be a bit more technical, IO#eof acts like standard C eof
for File objects, it only blocks / requires a previous read() on
character devices and pipes and such. For files, it's the same as
checking the absolute position of the file stream: f.tell ==
File.size(f.path).

Of course, the same can be done in python quite easily (and probably
better implemented):

f.tell() == os.stat(f.name).st_size

I don't honestly see what the big deal is about including or excluding
an eof function / method in python. If you need it, it is easy to
implement, and like yourself and others have shown, you usually don't
need it.

Regards,
Jordan
 
hdante

Actually, to be a bit more technical, IO#eof acts like standard C eof
for File objects, it only blocks / requires a previous read() on
character devices and pipes and such. For files, it's the same as
checking the absolute position of the file stream: f.tell ==
File.size(f.path).

This is not the same as ISO C. f.tell could be equal to
File.size(f.path) and eof could be false. An extra read() is required.
 
Dennis Lee Bieber

Pascal started out with an eof() function because the earliest
implementations only worked with disk files. Later, when people
tried to make Pascal programs work interactively, they found
out that it was a mistake, as it provides opportunities such
as the following classic wrong way to read interactive input
in Pascal:

Pascal I/O worked with a "one element preread", where what we'd
consider a read operation was performed by the open operation -- which
made console I/O a royal pain (by pure Pascal syntax, the implicit open
of "stdin" required one character to be available before the rest of the
program could run).

In Pascal, one obtained the read-in element using a pointer
dereference operation, and /then/ performed a read operation to move the
"pre-read element" into the pointer reference.

Instead of:

f = open(somefile)
read(f)
do something with f^ (the dereference)

Original Pascal uses

f = open(somefile)
do something with f^
read(f)


--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
MonkeeSage

This is not the same as ISO C. f.tell could be equal to
File.size(f.path) and eof could be false. An extra read() is required.

My bad. As you might have surmised, I'm not a genius when it comes to
C. I thought that the eof flag was set when the pointer into the
stream was the same as the length of the stream, but I guess it makes
sense that as an error flag, it would only be set after an attempted
read past the end of the stream (in which case, I guess you'd get a
NULL from the read, equivalent to python's empty string?).

Ps. braver, if you really want a ruby-like eof method on file objects
in python, how about overriding the open() alias with a file subclass
including eof?

import os

class open(file):
    def __init__(self, name):
        self.size = os.stat(name).st_size
        file.__init__(self, name)
    def eof(self):
        return self.tell() == self.size

f = open('tmp.py')
print f.eof() # False
f.read()
print f.eof() # True

Regards,
Jordan
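
The same position-versus-size check can be sketched for Python 3, where the file built-in used above no longer exists. The helper name at_end is made up for illustration, and the check is only meaningful for regular files opened in binary mode. As noted earlier in the thread, this is not the same thing as the C eof flag, which is set only after a read has failed.

```python
import os
import tempfile


def at_end(f):
    """True when the current position equals the file's size on disk.

    Hypothetical helper: pipes and terminals have no size to compare
    against, so this only applies to regular files.
    """
    return f.tell() == os.fstat(f.fileno()).st_size


with tempfile.TemporaryFile() as f:   # binary mode by default
    f.write(b"hello\n")
    f.flush()                         # make sure the size on disk is current
    f.seek(0)
    before = at_end(f)
    f.read()
    after = at_end(f)
```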
 
greg

Dennis said:
Pascal I/O worked with a "one element preread", where what we'd
consider a read operation was performed by the open operation -- which
made console I/O a royal pain

Yep. Later implementations reduced the pain somewhat by
using a "lazy" scheme which deferred the read until
you tried to do something with the buffer. But they
couldn't completely hide the fact that testing for eof
requires a lookahead.

Original Pascal uses

f = open(somefile)
do something with f^
read(f)

Actually, I think it was get(f) to read the next record
into the buffer. Read(f, x) was a higher-level procedure
equivalent to something like

x = f^;
get(f)

Plus for text files read() and write() did various other
fancy things.
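
The buffer-variable scheme discussed in the last few posts can be modelled in a few lines of Python. This is a toy sketch, not real Pascal semantics in full: PascalFile is a hypothetical class, `buffer` plays the role of f^, and the pre-read happens when the object is created, as described for the original (non-lazy) implementations.

```python
class PascalFile:
    """Toy model of a Pascal file with a one-element buffer variable (f^)."""

    def __init__(self, items):
        self._items = iter(items)
        self.eof = False
        self.buffer = None
        self.get()            # opening the file pre-reads the first element

    def get(self):
        """Advance: load the next element into the buffer, or set eof."""
        try:
            self.buffer = next(self._items)
        except StopIteration:
            self.buffer = None
            self.eof = True

    def read(self):
        """Higher-level read, equivalent to: x = f^; get(f)."""
        x = self.buffer
        self.get()
        return x


f = PascalFile([10, 20, 30])
values = []
while not f.eof:              # eof is known in advance thanks to the pre-read
    values.append(f.read())

empty_is_eof = PascalFile([]).eof
```

The model shows why eof can be answered without a failed read here: the lookahead has already happened, which is also why it is awkward for interactive input.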
 
