> There are several different entities covered by the terms "EOF" and/or
> "end-of-file", and I think [the OP is] mixing them up.
Indeed.
> 1. EOF is a macro that expands to a constant integer expression. ...
> 2. The end of a file is simply the position in the file at which it
> ends, depending on its current size.
> 3. An "end-of-file indicator" is an internal flag associated with a
> stream. ...
Those are all we have in C, anyway. In actual implementations we
may also have an "EOF character", just to confuse the issue further.
(More on that later.)
> These are three entirely different things.
As is the fourth, alas.
A "stream" is an entity in your executing program that's associated
with an external file. The file exists outside your program,
typically sitting on a disk somewhere (though there are a myriad of
other possibilities).
It is worth mentioning at least two of the other possibilities,
since they normally occur when running any C program (on a hosted
system anyway). An input stream can be connected to the user's
keyboard ("stdin"), and output streams to the user's
screen/window/whatever ("stdout" and "stderr" both, normally).
The OP may (or may not) find some help by stepping outside the C
language for a while. I think he mentioned something about Linux
at one point, although for my purposes here, Linux, Unix, or even
Windows would all be similar enough. (Something like VMS or TSO
would not.)
If we put C aside for a while, we can ignore the whole concept of
a "stream". Here, a "file" is simply an on-disk entity. It is
definitely *not* something interactive like a keyboard or screen,
nor a communications channel to a remote computer like a socket.
(Unix-like systems attempt to make those special things act a lot
like disk files, with varying degrees of success. But we want to
ignore those too.)
Now, given an on-disk file, of whatever size, the OS has some way
of letting you "open" the file, then poke around in the contents.
A Unix-like system does this with the open(), read(), and lseek()
functions (none of which are Standard C, remember; they just happen
to be there, with known behavior, on the Unix-like systems).
When you open() a file, you get a small integer number called a
"file descriptor". There is *no* "end of file indicator" associated
with this descriptor, but there is a "current position within file"
associated with it. The current position is initially 0.
To move the current position (without doing anything else), you
use lseek().
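Here is a minimal sketch of that, in C. The file is faked with
tmpfile() and fileno() -- those are just a convenient way to get a
disposable file descriptor for the demonstration, not part of the
technique itself:

    /* Write a few bytes, jump the current position back to
     * offset 2 with lseek(), and read the byte found there. */
    #include <stdio.h>
    #include <unistd.h>

    int byte_at_offset_2(void)
    {
        FILE *tmp = tmpfile();      /* scratch file, deleted on close */
        int fd = fileno(tmp);       /* its underlying descriptor */
        char c;

        write(fd, "abcde", 5);      /* current position is now 5 */
        lseek(fd, 2, SEEK_SET);     /* move current position to 2 */
        read(fd, &c, 1);            /* reads the byte at offset 2 */
        fclose(tmp);
        return c;                   /* 'c' */
    }
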
To read data starting at the current position, you use read(). You
give read() three numbers: a file descriptor, a pointer to a buffer,
and a number of bytes. The descriptor identifies the file and
carries the current lseek() offset. The buffer pointer tells the
OS where to copy the file data. The number of bytes tells the OS
how many bytes to read from the file, into that buffer:
int fd, result;
char buf[SIZE];
...
result = read(fd, buf, SIZE);
If the read is completely, totally, 100% successful, then:
- "result" is set to SIZE;
- the successfully-read bytes are put in buf[0] through buf[SIZE-1]; and
- the current lseek() position is moved forward SIZE bytes.
If the read fails entirely for some reason -- e.g., if the file is
on a floppy disk and the disk has gone bad -- the call returns -1:
- "result" is set to -1;
- the contents of buf[] may or may not be garbage[%];
- the variable "errno" is set to indicate the underlying error
(usually EIO, but maybe something else depending on the OS); and
- the current lseek() position does not change.
[% The buffer contents tend to depend on whether the OS uses DMA
and how the hardware behaves when the bad floppy reports its
bad-ness.]
There is a third possibility as well, though. Suppose that SIZE
is 100, the current offset is 200, and the file is only 220 bytes
long. In this case, there are only 20 bytes remaining in the file.
This read() call needs to "partly succeed" and "partly fail". How
can read() report "partial success"?
The answer is: if -1 means "error" and 100 means "total success",
then any number *between* those two means "partial success". The
exact *amount* of success is given by the number. In this case,
since there are 20 bytes left, the read() will return 20:
- "result" is set to 20;
- the successfully-read bytes are put in buf[0] through buf[19]; and
- the current lseek() position is moved forward 20 bytes.
In other words, "partial success" looks EXACTLY THE SAME as "total
success", except that the count is less than the number of bytes
you asked for.
Now, what happens if the current position is at, or even *beyond*,
the end of the file? One option -- one that was used in some OSes
before Unix -- is to report this as an "error". Unix could have
done that: it could have returned -1, and set errno to EEOF or some
such. This is not how Ken Thompson decided to do it, though.
Instead, he had the read() report "partial success", with the number
of bytes successfully read being zero. If read() calls this "partial
success" and reports zero bytes read, then:
- "result" is set to 0;
- the successfully-read bytes are not put anywhere (because
there are none), so buf[] is left unchanged; and
- the current lseek() position is moved forward 0 bytes,
which leaves it unchanged.
So, if one uses the low-level calls (open(), optionally lseek(),
read(), and close()) on a Unix-like system, one simply checks the
return value from each read() call: -1 means error, 0 means "end
of file", and any other value means "success", with the successful
count being very important, since it may be less than you asked
for.
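That rule, turned into a sketch of a read loop, looks like this.
The 220-byte file from the example above is manufactured with
tmpfile()/fileno() (my own scaffolding, not anything from the OP's
code), and SIZE is 100 as in the text:

    #include <stdio.h>
    #include <unistd.h>

    #define SIZE 100

    int count_reads(void)   /* how many read() calls succeed */
    {
        FILE *tmp = tmpfile();
        int fd = fileno(tmp);
        char buf[SIZE];
        ssize_t result;
        int nreads = 0;

        for (int i = 0; i < 220; i++)   /* make a 220-byte file */
            write(fd, "x", 1);
        lseek(fd, 0, SEEK_SET);         /* rewind to the start */

        while ((result = read(fd, buf, SIZE)) > 0)
            nreads++;           /* gets 100, 100, then 20 bytes */
        /* here, result is 0 ("end of file") or -1 ("error") */
        fclose(tmp);
        return nreads;          /* 3 */
    }

Note that the loop condition "> 0" treats every positive count --
total or partial success -- the same way, exactly as described above.
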
Now, the reason for all of this "off-topic drift", as it were, is
that C's stdio was originally designed to "wrap around" the Unix
I/O model, but also allow for the more obnoxious I/O models found
on other systems available at the time.
In C, instead of a "file descriptor", you get a "stream". On a
Unix-like system, a stream is a pretty thin wrapper around the
underlying file descriptor. On other systems, it may be much
"thicker", hiding all kinds of system-level obnoxiousness, such as
different I/O routines for "interactive" streams (keyboard and
display) than for "disk" streams (on-disk files). (Yes, other OSes
really do require different OS calls to do I/O on files vs devices.)
In any case, however, a "stream" still has, as an underlying concept,
a "current seek position". On a Unix-like system, this position
-- which you manipulate with fseek() -- is exactly the same thing
as the descriptor's byte offset, which the library manipulates with
lseek(). On more-obnoxious systems, though, the "fseek position"
may have little or even no resemblance to a byte offset. (In fact,
on some VMS systems, the values returned by ftell() are derived
from pointers obtained by malloc(). Each ftell() does a new malloc()
to remember exactly where you are in the file now, so that the full
positioning information can be passed to RMS and/or the SYS$QIO OS
calls.)
Now, fread() could have used a trick similar to Unix's read(): it
could return a short count for end-of-file, and -1 for error. But
there is one problem: fread() returns a size_t, which is an *unsigned*
integer. There is no "-1". So fread() returns a short count --
zero, if nothing at all was read -- for both "encountered end of
file" and "encountered error".
Similarly, fgetc() could have used a trick to report "end of file"
and "error" separately. In this case, fgetc() -- and getc() and
getchar(), which are defined in terms of fgetc() -- returns a value
in [0..UCHAR_MAX] on success. If UCHAR_MAX is 255 (as it usually
is), that means there are 256 "successful" values. Since fgetc()
returns an ordinary "int", and an "int" must be able to represent
negative values from -1 down to at least -32767, there are *plenty*
of extra values to use. But -- for some reason (I have no idea
what reason) -- the guys who wrote the original implementations
chose to report both "end of file" and "error" with a single return
value, just like fread().
As Keith wrote above, the value fgetc() returns, to indicate any
kind of failure -- "end of file" failure, or "error reading disk"
failure, or whatever -- is the one defined in <stdio.h>. On most
implementations, the value fgetc() returns for these failures is
-1. The C Standard requires only that <stdio.h> define EOF as
"a negative int value", but most implementors use -1.
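This is why the return value of fgetc() must be stored in an "int"
and never a "char": EOF has to be distinct from every one of the
256 "successful" values. A quick sketch of that guarantee:

    #include <limits.h>
    #include <stdio.h>

    /* EOF must differ from every successful fgetc() return
     * value, which runs from 0 through UCHAR_MAX. */
    int eof_is_distinct(void)
    {
        for (int v = 0; v <= UCHAR_MAX; v++)
            if (v == EOF)
                return 0;       /* cannot happen */
        return 1;
    }
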
This is where the "end of file indicator" on the stream -- Keith's
item #3 -- comes in. Both fread() and fgetc() fail to distinguish
between the two "failure" cases. (These cases, remember, are:
"(A) You asked me to read, but I failed to read anything because,
while everything is all still working fine, there is nothing left
to read!" and "(B) You asked me to read, but I failed to read
anything ... and by the way, look out, the floppy disk is on fire!")
The guys who wrote the C "standard I/O" library decided to allow
you -- the C programmer -- to be able to distinguish between
these two cases, using the feof() and ferror() macros:
FILE *fp;
...
... attempt to read something from the file ...
if (our attempt to read failed) {
if (feof(fp))
printf(
"this was case (A): read failed, but all is well;\n"
"this was just the normal end of file.\n");
if (ferror(fp))
printf(
"this was case (B): read failed, and something is\n"
"badly wrong. Better check: is the disk on fire?\n");
} else {
... our attempt to read worked; use the data ...
}
The tricky bits with feof() and ferror() are:
1) These are "after-the-fact flags", not "predictions about the
future". You should only use the macros to test *why* a
read operation failed, after one has *actually failed*.
2) Once one or both of these flags are set, they *stay* set
until you, the C programmer, take action to clear them. There
are a number of ways to clear them, including the clearerr()
function. The clearerr() function clears both of them without
doing anything else. It does not correct the underlying
problem (e.g., put out the fire, in case (B)). But if you
have some way of correcting it, you can do that, and then do
clearerr(), and then try your read again.
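Both points can be seen in a few lines of C. Here I force an
ordinary, case-(A) end-of-file by reading from an empty scratch
stream (an empty tmpfile() -- my own scaffolding, chosen because
it cannot produce a case-(B) error):

    #include <stdio.h>

    int demo_flags(void)
    {
        FILE *fp = tmpfile();           /* an empty stream */
        int ok;

        if (fgetc(fp) != EOF)           /* must fail: nothing there */
            return 0;
        ok = feof(fp) && !ferror(fp);   /* case (A), not case (B) */
        clearerr(fp);                   /* clears both flags... */
        ok = ok && !feof(fp) && !ferror(fp);
        fclose(fp);
        return ok;                      /* 1 */
    }
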
One of the many things confusing the OP is that the fseek() function
clears the end-of-file indicator -- the flag that feof() tests --
whenever the fseek() succeeds. It does so even if the fseek() puts
the current seek position at or beyond the actual end of the actual
on-disk file (assuming, as always, that there is in fact an actual
on-disk file involved).
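You can watch this happen. In the sketch below (again using an
empty tmpfile() as a stand-in for a real file), even an fseek()
back to the position we were already at clears the flag:

    #include <stdio.h>

    int fseek_clears_eof(void)
    {
        FILE *fp = tmpfile();
        int ok;

        fgetc(fp);                  /* hits end of file */
        ok = feof(fp);              /* the indicator is set */
        fseek(fp, 0L, SEEK_SET);    /* a successful fseek()... */
        ok = ok && !feof(fp);       /* ...clears the indicator */
        fclose(fp);
        return ok;                  /* 1 */
    }
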
The rule here, then, is the same as always: try to do the I/O,
and pay attention to the return value from your I/O function:
if (fseek(fp, newpos, SEEK_SET)) ... do something about failure ...
result = fread(buf, item_size, number_of_items, fp);
if (result == 0) {
if (feof(fp))
... fread() failed due to normal, ordinary EOF ...
else
... fread() must have failed due to serious problem ...
} else {
... work with the data: fread() got "result" items ...
}
or:
if (fseek(fp, newpos, SEEK_SET)) ... do something about failure ...
result = fgetc(fp);
if (result == EOF) {
if (feof(fp))
... fgetc() failed due to normal, ordinary EOF ...
else
... fgetc() must have failed due to serious problem ...
} else {
... work with the data: fgetc() got the byte in "result" ...
}
Beginners can use a simple rule: NEVER call feof(). NEVER, EVER.
Just look at the return value from fread() or fgetc() or whatever.
Assume that "read failed" means "normal end-of-file", i.e., that
disks never catch on fire (and, more practically, that no one
ever uses a magnet to put the floppy up on the fridge).
More-advanced C programmers can move on to the more-advanced rule:
use feof() (and ferror()) ONLY after a read operation (fread, fgetc,
etc) has failed. (People actually do stick magnets on floppies,
or take the floppy out of its "wrapper", or hammer more than one
into a drive, or any number of other bone-headed stunts.)
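The beginner's rule, in code, is simply to drive the loop with the
return value of getc() itself -- never with feof(). (The classic
bug is "while (!feof(fp))", which runs the loop body one extra
time after the last successful read.) The count_demo() helper here
is just my own scaffolding to feed the loop a known file:

    #include <stdio.h>

    long count_bytes(FILE *fp)  /* count bytes until EOF or error */
    {
        int c;                  /* int, not char: must hold EOF too */
        long n = 0;

        while ((c = getc(fp)) != EOF)
            n++;
        return n;
    }

    long count_demo(void)       /* run count_bytes() on "hello" */
    {
        FILE *fp = tmpfile();
        long n;

        fputs("hello", fp);
        rewind(fp);
        n = count_bytes(fp);
        fclose(fp);
        return n;               /* 5 */
    }
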
Now, about that fourth item, the "EOF character" ... if it even
exists. This is another OS-specific thing.
Consider the Unix-like system, in which keyboard input is "just
like" an ordinary file. You open() the keyboard ("/dev/tty" or
"/dev/console" perhaps), or -- more simply -- get the file descriptor
handed to you in the usual way. You call read() to get input.
What if the user wants to signal "end of input"? He has to get
your read() call to return 0. How can he do that?
On a Unix system, the trick is to input the "EOF character" at
the start of a line (i.e., after pushing the RETURN or ENTER key).
This "EOF character" is usually control-D, but is changeable.
The same technique applies to Microsoft's DOS and Windows systems,
except that the key is different: you use control-Z instead of
control-D. But here things get even weirder.
On very old MS-DOS systems, and systems that predated MS-DOS, *disk*
files had a peculiar problem: they did not have a "size" associated
with them. Instead of a "size", they had a "number of disk sectors".
A disk file was 0 sectors long, or 1 sector long, or 2 sectors, or
3 or 4 or whatever. If a disk sector was 512 bytes (though 128
and 256 were also common), then a disk *file* could be 0 bytes, or
512 bytes, or 1024 bytes, or 1536 bytes, or 2048 bytes, and so on
-- but no disk file could ever be just 20 bytes. It had to be at
least "one sector", if it had any bytes at all.
So, on these systems, how could you mark the end of a text file?
The answer was: pick a character, call it the "end of file" character,
and write that somewhere in the last sector. When reading the
file, if you encounter that character, pretend that there is no
more data in the file, even if there really is more.
The "EOF character" in these disk files was usually control-Z (this,
incidentally, is why MS-DOS and Windows use control-Z as a keyboard
"EOF" character). Some I/O routines might detect ^Z as EOF only
in the *last* sector, while others would detect it in *any* sector
(though the latter took less machine code, and with every byte
being precious, this was certainly more common). The fact that
this sort of "EOF character" exists is part of the reason that you
have to fopen() a binary file with "rb" to read it. (It is not
the whole reason, but it is part of the reason.) If you fopen()
with just plain "r", a control-Z byte in the stream may -- or may
not -- cause your stdio to report "end of file".
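A last sketch, to make the "rb" point concrete. Write a control-Z
byte (0x1A) into a scratch file and read it back through a binary
stream: it comes back as ordinary data. Whether plain "r" (text
mode) would instead report end of file is up to the implementation,
which is exactly the point above. (tmpfile() serves as the binary
stream here; the Standard says it is opened in "wb+" mode.)

    #include <stdio.h>

    int read_ctrl_z(void)
    {
        FILE *fp = tmpfile();   /* opened "wb+" per the C Standard */
        int c;

        fputc(0x1A, fp);        /* the would-be "EOF character" */
        rewind(fp);
        c = fgetc(fp);          /* binary mode: just another byte */
        fclose(fp);
        return c;               /* 0x1A, not EOF */
    }
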