fseek

  • Thread starter Christopher Benson-Manica

Christopher Benson-Manica

On thinking about the "replace a word in a file" thread, I wondered how easy
it would be to accomplish the same thing with only one file pointer. This led
me to some questions...

"For a text stream, offset must be zero, or a value returned by ftell (in
which case origin must be SEEK_SET)."

If offset is a value returned by ftell (which returns the current file
position), and origin is SEEK_SET, then fseek() sets the position to the
current position. What is the point of doing so? And, more importantly, why
can't text streams be fseek()'ed randomly like a binary stream can (i.e.,
offset can be any number of bytes)? Do I understand this paragraph correctly?
If so, can fgetpos() and fsetpos() be used to approximate fseek() for text
streams?
 
mf

Christopher said:
"For a text stream, offset must be zero, or a value returned by ftell (in
which case origin must be SEEK_SET)."

If offset is a value returned by ftell (which returns the current file
position), and origin is SEEK_SET, then fseek() sets the position to the
current position. What is the point of doing so? And, more importantly, why

It seems to make more sense if you read that as
"or a value previously returned by ftell," so you
may save the ftell value and use it later.
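
A minimal sketch of that reading, assuming nothing beyond standard C: the helper below (peek_line is a made-up name, not a library function) saves the value returned by ftell(), reads a line, and then hands the saved value back to fseek() with SEEK_SET. That is the one nonzero-offset form the quoted paragraph permits on a text stream.

```c
#include <assert.h>
#include <stdio.h>

/* Read one line into buf, then return to where the read started.
   Only the text-stream-safe form of fseek() is used: an offset that
   ftell() returned earlier, combined with SEEK_SET.
   peek_line is a hypothetical helper name.
   Returns 0 on success, -1 on any failure. */
int peek_line(FILE *fp, char *buf, int size)
{
    long mark;

    assert(fp != NULL && buf != NULL && size > 0);

    mark = ftell(fp);                    /* remember the current position */
    if (mark == -1L)
        return -1;
    if (fgets(buf, size, fp) == NULL)    /* read ahead */
        return -1;
    if (fseek(fp, mark, SEEK_SET) != 0)  /* go back to the saved position */
        return -1;
    return 0;
}
```

After a successful call the next read on the stream returns the same line again, because the position has been restored.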
 
Christopher Benson-Manica

mf said:
It seems to make more sense if you read that as
"or a value previously returned by ftell," so you
may save the ftell value and use it later.

Aahh... ;( Now *that* makes more sense. Still, why the restriction? Or
rather, why do binary streams get to seek wherever they want, while text
streams are constrained to be proper and seek to someplace that was at least
valid at some point in the past? Maybe I should just go home, eh? ;)
 
Eric Sosman

Christopher said:
Aahh... ;( Now *that* makes more sense. Still, why the restriction? Or
rather, why do binary streams get to seek wherever they want, while text
streams are constrained to be proper and seek to someplace that was at least
valid at some point in the past? Maybe I should just go home, eh? ;)

Usually, people who wonder about this are thinking of
trying to fseek() to an arithmetically-calculated position
in the file: so-and-so many characters before or after such-
and-such a position. But how should this new position be
defined? Where should fseek(stream, 1000, SEEK_SET) land?

- The same position you'd reach by starting at the
beginning and doing getc(stream) 1000 times? This
might take just as long as doing the getc()s, if
there's no way to calculate the destination. For
example, consider reading an MS-DOS file, where
pairs of CR/LF must be translated to single '\n'
characters; the proper destination depends on how
many such pairs are skipped over, and that may not
be knowable without actually reading them.

- 1000 bytes from the start of the file? That might
not even be a valid character position at all; it
might not be possible to arrive at that position
by repeated getc()s. For example, consider reading
an OpenVMS VAR file, where each line consists of a
two-byte count, a "payload," and a possible padding
byte. Some of those bytes have no "image" in the
data getc() can read; conversely, the '\n' characters
returned by getc() do not actually exist in the file.

There are lots and lots of file organizations out there,
and a requirement to support arbitrary seeking in text streams
would make efficient C implementations difficult or impossible.
One of the reasons fgetpos() and fsetpos() were created was
to give better support to file systems less simple than those
C grew up with.
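
On the original question of whether fgetpos() and fsetpos() can approximate fseek(): they are bookmarks too. fpos_t is an opaque type that the implementation can fill with whatever it needs to identify a position, and the only thing a program may do with the value is hand it back to fsetpos() on a stream for the same file. A minimal sketch, where peek_char is a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Return the next character without consuming it, by bookmarking the
   position with fgetpos() and restoring it with fsetpos().
   peek_char is a hypothetical helper name.
   Returns EOF at end of file or on error. */
int peek_char(FILE *fp)
{
    fpos_t mark;   /* opaque: no arithmetic is possible on this value */
    int c;

    assert(fp != NULL);

    if (fgetpos(fp, &mark) != 0)
        return EOF;
    c = getc(fp);
    /* Nothing was consumed at end of file, so only restore on success. */
    if (c != EOF && fsetpos(fp, &mark) != 0)
        return EOF;
    return c;
}
```

Note that there is no way to ask for "mark plus 1000 characters", so this does not give text streams arbitrary seeking either.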
 
Kelsey Bjarnason

On thinking about the "replace a word in a file" thread, I wondered how easy
it would be to accomplish the same thing with only one file pointer. This led
me to some questions...

"For a text stream, offset must be zero, or a value returned by ftell (in
which case origin must be SEEK_SET)."

If offset is a value returned by ftell (which returns the current file
position), and origin is SEEK_SET, then fseek() sets the position to the
current position. What is the point of doing so?

The value returned by ftell could have been from earlier:

get position
read
read
read
seek back to saved position

And, more importantly, why
can't text streams be fseek()'ed randomly like a binary stream can

Try this:

File x contains a series of chars '0', '1', '2' and so on, up to '9'.
Between each is written a "CRLF". In text mode, "CRLF" is treated as "\n"
on read, despite being two bytes in the file and only one after conversion.

Now, when attempting to seek to position n in the file... where n is
specified in "bytes from start of file"... exactly how many such "CRLF"
pairs are there to cope with and how should the offset be altered as a
result? After all, if I fgetc 9 times, the 10th will produce a specific
result; if I fseek to the 10th position and fgetc, I should get the same
result, no? But that would mean that fseek needs to figure out, on the
fly, the contents of the file and what transformations would be done. If
nothing else, it would be hellishly inefficient.
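
To make the cost concrete, here is a sketch of the scan a character-counting fseek() would have to perform on a CR LF file. It works on an in-memory copy of the raw bytes so that it behaves the same on every platform; byte_offset_of_char is a hypothetical helper, not something a C library provides.

```c
#include <assert.h>
#include <stddef.h>

/* Given the raw bytes of a file with CR LF line endings, find the byte
   offset at which translated character n begins, where n counts from 0
   the way repeated getc() calls on a text stream would.
   Returns len if the data holds n or fewer translated characters.
   byte_offset_of_char is a hypothetical helper name. */
size_t byte_offset_of_char(const unsigned char *raw, size_t len, size_t n)
{
    size_t i = 0;     /* byte offset into the raw data */
    size_t seen = 0;  /* translated characters skipped so far */

    assert(raw != NULL);

    while (i < len && seen < n) {
        /* A CR LF pair occupies two bytes but reads as one '\n'. */
        if (raw[i] == '\r' && i + 1 < len && raw[i + 1] == '\n')
            i += 2;
        else
            i += 1;
        seen++;
    }
    return i;
}
```

For the file described above, character 2 (the '1') starts at byte 3 and character 4 (the '2') starts at byte 6. The only way to learn that is to walk over every byte before the target, which is exactly what a seek is supposed to avoid.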
 
Christopher Benson-Manica

Eric Sosman said:
Usually, people who wonder about this are thinking of
trying to fseek() to an arithmetically-calculated position
in the file: so-and-so many characters before or after such-
and-such a position. But how should this new position be
defined? Where should fseek(stream, 1000, SEEK_SET) land?

Ah, I see now. Far be it from me to inconvenience implementors... ;)
Thanks.
 
Alan Balmer

Ah, I see now. Far be it from me to inconvenience implementors... ;)
Thanks.

It's a bit more than that. How do you seek past a newline if you're
counting bytes?
 
Glen Herrmannsfeldt

Christopher Benson-Manica said:
Ah, I see now. Far be it from me to inconvenience implementors... ;)

It is not so much the implementors' convenience. The idea of fseek() is to
move the pointer without reading all the characters in between. C already
supplies ways to read all the characters, if you want to do that.

-- glen
 
Joe Wright

Alan said:
It's a bit more than that. How do you seek past a newline if you're
counting bytes?

Is this a trick question? What does seeking have to do with counting
bytes? What does 'seeking past a newline' mean? If you are counting
bytes and discover a '\n' you might perform an ftell(stream) and save
the result in a long. This will be the address that you can eventually
pass to fseek() to arrive at the place you are now, one past the '\n'.
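
That idea extends naturally to an index of line starts. The following is only a sketch under the assumption that one pass over the file is acceptable; index_lines and goto_line are hypothetical names. Each stored value comes from ftell() and is only ever passed back to fseek() with SEEK_SET, so no arithmetic is done on it.

```c
#include <assert.h>
#include <stdio.h>

/* Store the ftell() value of the start of each line in starts[], up to
   max entries, and return how many were stored.
   index_lines is a hypothetical helper name. */
size_t index_lines(FILE *fp, long *starts, size_t max)
{
    size_t count = 0;
    int at_line_start = 1;
    int c;

    assert(fp != NULL && starts != NULL);

    for (;;) {
        long pos = -1L;

        if (at_line_start) {
            if (count == max)
                break;              /* no room for another entry */
            pos = ftell(fp);        /* we are one past the previous '\n' */
            if (pos == -1L)
                break;
        }
        c = getc(fp);
        if (c == EOF)
            break;
        if (at_line_start)
            starts[count++] = pos;  /* a line really does begin here */
        at_line_start = (c == '\n');
    }
    return count;
}

/* Position the stream at the start of the given line (0-based).
   Returns 0 on success, -1 if the line is not indexed or fseek() fails. */
int goto_line(FILE *fp, const long *starts, size_t count, size_t line)
{
    if (line >= count)
        return -1;
    return fseek(fp, starts[line], SEEK_SET) == 0 ? 0 : -1;
}
```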
 
Glen Herrmannsfeldt

Joe Wright said:
Is this a trick question? What does seeking have to do with counting
bytes? What does 'seeking past a newline' mean? If you are counting
bytes and discover a '\n' you might perform an ftell(stream) and save
the result in a long. This will be the address that you can eventually
pass to fseek() to arrive at the place you are now, one past the '\n'.

A seek, such as fseek(), puts the file position pointer at the specified
position in the file. In a file format whose line ending consists of several
characters, the whole sequence is represented by a single '\n' to a program
counting getc() calls within a loop. If one expected the positions returned
by ftell() or used by fseek() to match those from counting getc() calls, the
only implementation for fseek() would be to read from the beginning of the
file while counting '\r' characters that are followed by '\n'.

The problem gets worse for the file format used by many IBM mainframe OSs.
The number of records in a block, or of blocks on a disk track, is not
necessarily knowable. In that case, pretty much the only solution is to
read every block from the beginning, though it does not need to examine
every byte that is read. (For a file written only once, it should be
possible to seek by track and block. If a file is closed, reopened, and
then appended to, there might be short blocks in the middle of the file.
There is no other way to account for them.)

-- glen
 
Richard Bos

Alan Balmer said:
True - the standard does not define the units for text streams (though
it's still bytes in at least some implementations.) However, you can
open the same file as binary, then use the results of fseek and ftell
to position the text file.

Erm... how? AFAIAA, the results of an fseek() on a file opened as a
binary stream do not have to be relevant to the same file opened as a
text stream.

Richard
 
Alan Balmer

Only for a binary stream.

True - the standard does not define the units for text streams (though
it's still bytes in at least some implementations.) However, you can
open the same file as binary, then use the results of fseek and ftell
to position the text file.
 
Dan Pop

In said:
The offset parameter of fseek is in bytes.

Wrong on text streams, where it is in an unspecified unit.

The number of bytes in a newline is platform dependent.

Entirely irrelevant, for both kinds of streams. If you attach a binary
stream to a text file, all bets are off: you may not see a single
newline character in the whole file (implementations storing each line
of text in a variable size record typically don't bother to store the
newline character at all: it is implied, after the last character of the
record).

Dan
 
Dan Pop

Alan Balmer said:
True - the standard does not define the units for text streams (though
it's still bytes in at least some implementations.) However, you can
open the same file as binary, then use the results of fseek and ftell
to position the text file.

Where did you get the idea from? Chapter and verse, please.

Dan
 
Irrwahn Grausewitz

Alan Balmer said:
True - the standard does not define the units for text streams (though
it's still bytes in at least some implementations.) However, you can
open the same file as binary, then use the results of fseek and ftell
to position the text file.

AFAICT: no, you can't. Hence the restriction for using fseek() on
text streams:

ISO/IEC 9899:TC1 7.19.9.2#4

For a text stream, either offset shall be zero, or offset shall be a
value returned by an earlier successful call to the ftell function on
a stream associated with the same file and whence shall be SEEK_SET.

Well, it does not explicitly say "a stream associated with the same file
opened in the same (text) mode", but the C-Rationale explains:

"Whereas a binary file can be treated as an ordered sequence of bytes
counting from zero, a text file need not map one-to-one to its
internal representation (see 7.19.2). Thus, only seeks to an earlier
reported position are permitted for text files. [...]"

Therefore I don't think the procedure you suggested will work
portably.

Regards
 
Glen Herrmannsfeldt

Irrwahn Grausewitz said:
(snip)
Therefore I don't think the procedure you suggested will work
portably.

I am not sure now of the status of C compilers for the PDP-10, but the
normal text file format has five ASCII characters per 36-bit word. A
possible C-compatible binary format would have four 9-bit bytes per word.
That would make using fseek() between text and binary files extremely
complicated.
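
As a rough worked example of the mismatch, assuming exactly the two layouts just described (five text characters per word, four binary bytes per word) and nothing else about any real PDP-10 compiler, the same numeric offset lands in different machine words depending on the mode. The function names are hypothetical.

```c
#include <assert.h>

/* Index of the 36-bit word holding text character n (0-based), with
   five ASCII characters packed into each word. Hypothetical helper. */
long text_word(long n)
{
    assert(n >= 0);
    return n / 5;
}

/* Index of the 36-bit word holding binary byte n (0-based), with four
   9-bit bytes packed into each word. Hypothetical helper. */
long binary_word(long n)
{
    assert(n >= 0);
    return n / 4;
}
```

Offset 1000 counted in text characters falls in word 200, while offset 1000 counted in 9-bit bytes falls in word 250, so a position obtained in one mode says nothing useful in the other.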

-- glen
 
Irrwahn Grausewitz

Glen Herrmannsfeldt said:
(snip)

I am not sure now of the status of C compilers for the PDP-10, but the
normal text file format has five ASCII characters per 36-bit word. A
possible C-compatible binary format would have four 9-bit bytes per word.
That would make using fseek() between text and binary files extremely
complicated.

Indeed. And it leads to this (again, quoting the C-Rationale):

"The need to encode both record position and position within a record
in a long value may constrain the size of text files upon which fseek
and ftell can be used to be considerably smaller than the size of
binary files."
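
To illustrate what such an encoding might look like, here is a purely hypothetical layout; no real implementation is claimed to use it. It assumes a 32-bit long and reserves the low 16 bits for the position within a record, which leaves 15 bits for the record number once the sign bit is set aside.

```c
#include <assert.h>

/* Hypothetical layout: low 16 bits hold the offset within a record,
   the 15 bits above them hold the record number. */
#define OFFSET_BITS 16
#define OFFSET_MASK ((1UL << OFFSET_BITS) - 1UL)
#define RECORD_MAX  0x7FFFUL

/* Pack a (record, offset) pair into a single long. */
long pack_pos(unsigned long record, unsigned long offset)
{
    assert(record <= RECORD_MAX);
    assert(offset <= OFFSET_MASK);
    return (long)((record << OFFSET_BITS) | offset);
}

/* Recover the record number from a packed position. */
unsigned long pos_record(long pos)
{
    return (unsigned long)pos >> OFFSET_BITS;
}

/* Recover the offset within the record from a packed position. */
unsigned long pos_offset(long pos)
{
    return (unsigned long)pos & OFFSET_MASK;
}
```

Under these assumptions a text file tops out at 32768 records, while a binary file addressed by plain byte offsets could use the whole positive range of the long. It also shows why adding 1000 to such a value does not mean "1000 characters further on".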
 
Alan Balmer

Where did you get the idea from? Chapter and verse, please.

Dan

7.19.9.2

"For a text stream, either offset shall be zero, or offset shall be a
value returned by an _earlier successful call to the ftell function on
a stream associated with the same file_ and whence shall be SEEK_SET."

Emphasis added.
 
