Read only the last line

RyanS09

Hello-

I am trying to write a snippet which will open a text file with an
integer on each line. I would like to read the last integer in the
file. I am currently using:

    file = fopen("f.txt", "r+");
    fseek(file, -2, SEEK_END);
    fscanf(file, "%d", &c);

This works fine if the integer is only a single character. When I get
into larger numbers though (e.g. 502), it only reads in the 2. Is there
anything I can do to read the last line as an entity instead of looping
through the entire file? Thanks in advance-

-Ryan
 
Keith Thompson

No, there isn't. A text file is a sequence of characters, not a
sequence of lines; there's no way to jump directly to the beginning of
the last line without knowing in advance how long it is.

As far as the standard is concerned, fseek(file, -2, SEEK_END)
invokes undefined behavior. The standard says:

    For a text stream, either offset shall be zero, or offset shall be
    a value returned by an earlier successful call to the ftell
    function on a stream associated with the same file and whence
    shall be SEEK_SET.

The result of ftell() on a text stream contains unspecified
information, usable only in a call to fseek() with whence==SEEK_SET.

Realistically, if you can assume that the last line is shorter than,
say, 80 characters, you can *probably* get away with doing an
fseek(file, -80, SEEK_END), then reading everything up to the end of
the file and grabbing just the last line from that. But that's still
a bit risky; for example, if the system encodes end-of-line as a CR-LF
pair, the fseek() could land you between a CR and an LF. And there's
always the risk that you've guessed wrong, and the last line is really
90 characters long.
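
A minimal sketch of the fixed-window idea described above. It is not portable in the sense discussed in this thread. It assumes the file can be opened as a binary stream ("rb") on a system where seeking relative to SEEK_END behaves sensibly, and it does not detect a last line longer than the window. The name last_int_in_window and the WINDOW constant are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WINDOW 80   /* assumed upper bound on the last line's length */

/* Returns 0 and stores the last integer in *out, or -1 on failure. */
int last_int_in_window(const char *path, long *out)
{
    char buf[WINDOW + 1];
    size_t n;
    char *p, *end;
    FILE *fp = fopen(path, "rb");   /* binary stream: an assumption */

    if (fp == NULL)
        return -1;
    /* If the file is shorter than WINDOW bytes the seek fails,
       so fall back to reading from the beginning. */
    if (fseek(fp, -(long)WINDOW, SEEK_END) != 0)
        rewind(fp);
    n = fread(buf, 1, WINDOW, fp);
    fclose(fp);
    buf[n] = '\0';

    /* Drop trailing line terminators (LF or CR-LF). */
    while (n > 0 && (buf[n - 1] == '\n' || buf[n - 1] == '\r'))
        buf[--n] = '\0';
    if (n == 0)
        return -1;

    /* The last line starts after the final remaining '\n', if any. */
    p = strrchr(buf, '\n');
    p = (p != NULL) ? p + 1 : buf;

    *out = strtol(p, &end, 10);
    return (end == p) ? -1 : 0;
}
```

Stripping both '\r' and '\n' sidesteps the CR-LF issue for the final line only; a window that happens to start between a CR and an LF is harmless here because everything before the last '\n' is ignored.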

The only portable approach is to read the entire file. You can likely
do better by trading off portability for performance.

<OT>
The Unix "tail -1" command does this. Source code for the GNU version
of the tail command is included in the coreutils package. The
implementation is undoubtedly non-portable and more complex than you
need, but you might get some ideas from it.
</OT>
 
Eric Sosman

It can't be done in perfectly portable C, in part
because the way you're using fseek() isn't portable:

For a text stream, either offset shall be zero,
or offset shall be a value returned by an earlier
successful call to the ftell function [...] and
whence shall be SEEK_SET. (7.19.9.2/4)

Another problem is that each line (except the first)
begins immediately after the '\n' that ends the preceding
line; it is the preceding '\n' that marks the succeeding
character as a line-starter. You cannot tell whether an
arbitrary position in a text file is or isn't the start
of a line without reading the preceding character to see
whether it's a newline. And yet another problem is that
the file on disk may mark the line endings in some other
way, with an encoding that can only be translated to '\n'
by reading it in the forward direction.

Reading the whole file from start to finish is the
only perfectly portable approach. Remember the line most
recently read until you've successfully read another, and
when you reach end-of-file the remembered line is the last
one. (You might also wish to use ftell() to remember the
position of each line start, so you can fseek() back to the
last one again.) Unless the file is very large, this is a
perfectly reasonable approach.
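
The read-and-remember method can be sketched as follows. This is only an illustration: the function name read_last_line is invented, blank lines are skipped, and lines are assumed to fit in the 256-byte working buffer (a longer line would be split by fgets()).

```c
#include <stdio.h>

/* Copies the last non-blank line of fp into last (size bytes, including
   the terminator) and returns last, or NULL if no line was read.
   Assumes every line fits in the working buffer below. */
char *read_last_line(FILE *fp, char *last, size_t size)
{
    char cur[256];
    int found = 0;

    if (size == 0)
        return NULL;
    while (fgets(cur, sizeof cur, fp) != NULL) {
        if (cur[0] == '\n')
            continue;                       /* skip blank lines */
        snprintf(last, size, "%s", cur);    /* remember this line */
        found = 1;
    }
    return found ? last : NULL;
}
```

The remembered line can then be handed to strtol() or sscanf() to extract the integer.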

If the file is very large and you're willing to try
something that isn't portable (i.e., isn't guaranteed to
work), you could try seeking to a spot about ten or a dozen
lines' worth before the end and using the read-and-remember
method on the tail of the file. This may -- *may* -- work;
only you can judge whether the risk is worth the reward.
 
Flash Gordon

Read it backwards a character at a time using fseek, building your number
as you go. Don't forget to check for file operations failing and for
integer overflow.
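
One way this could look, as a rough sketch only. It assumes a binary stream on which byte offsets from ftell() and fseek() are meaningful, which the standard does not promise for text streams. The names byte_at and last_int_backwards are invented, and INT_MIN itself is not handled.

```c
#include <limits.h>
#include <stdio.h>

/* Reads the byte at offset pos; returns EOF on failure. */
static int byte_at(FILE *fp, long pos)
{
    if (fseek(fp, pos, SEEK_SET) != 0)
        return EOF;
    return fgetc(fp);
}

/* Builds the last integer of the file from its final digit backwards.
   Returns 0 on success, -1 on I/O error, missing number, or overflow. */
int last_int_backwards(FILE *fp, int *out)
{
    long pos;
    int ch, value = 0, place = 1, digits = 0;

    if (fseek(fp, 0L, SEEK_END) != 0 || (pos = ftell(fp)) < 0)
        return -1;

    /* Skip trailing line terminators. */
    while (pos > 0 && ((ch = byte_at(fp, pos - 1)) == '\n' || ch == '\r'))
        pos--;

    /* Consume digits from least to most significant. */
    while (pos > 0 && (ch = byte_at(fp, pos - 1)) >= '0' && ch <= '9') {
        int d = ch - '0';
        if (digits > 0) {
            if (place > INT_MAX / 10)
                return -1;          /* place value would overflow */
            place *= 10;
        }
        if (d != 0 && (d > INT_MAX / place || value > INT_MAX - d * place))
            return -1;              /* value would overflow */
        value += d * place;
        digits++;
        pos--;
    }
    if (digits == 0)
        return -1;
    if (pos > 0 && byte_at(fp, pos - 1) == '-')
        value = -value;
    *out = value;
    return 0;
}
```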
--
Flash Gordon
Living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro -
http://clc-wiki.net/wiki/Intro_to_clc
 
websnarf

The core C language is not really useful for performing this kind of
operation as others have posted. fseek and ftell are really inadequate
functions as they assume file sizes are always less than LONG_MAX which
doesn't make any sense on modern file systems. Many systems have 64
bit versions of these functions which makes a little more sense, but in
about 30 years we're going to wish people were just using intmax_t.

Anyhow, if we ignore that problem, and the other bizarre ANSI-ism that
you can't seek to some offset you haven't previously visited, what I
would suggest is the following: in a loop, seek to offsets of -1, -2,
-4, -8, -16, -32, and so on from the end. Then read until the end (to
avoid redundancy you only need to read the half you have not yet seen,
except the first time, when you read a whole byte). Then scan for the
last '\n' found. If you find a '\n', you know the offset of the last
line; otherwise go on to the next offset. If the file itself is only
one line, or a seek or read comes up short, you can detect this and
just read the whole file as the line. You should expect to read roughly
no more than twice the length of the last line of the file.

You could also just go backwards in fixed sized offsets (this would
tend to reduce the amount of over-reading), however, I would be
suspicious of the performance of fseek(). My own experience (on
Windows 98) is that the performance of fseek can be roughly as bad as
O(n), where n is the position you are seeking to. Growing the offset
exponentially keeps the number of seeks small, which limits this cost.
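
A sketch of the doubling search, under the same non-portable assumptions as the post: a binary stream with meaningful byte offsets. For simplicity it re-reads the whole tail on every pass instead of only the new half, and it ignores a '\n' that is the very last byte of the file. The name last_line_offset is invented.

```c
#include <stdio.h>

/* Returns the byte offset at which the last line starts, or -1 on error.
   A '\n' that is the final byte of the file does not start a new line. */
long last_line_offset(FILE *fp)
{
    long size, back = 1, start, i, found;
    int ch;

    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        return -1;

    for (;;) {
        start = (back < size) ? size - back : 0;
        if (fseek(fp, start, SEEK_SET) != 0)
            return -1;
        found = -1;
        for (i = start; i < size && (ch = fgetc(fp)) != EOF; i++)
            if (ch == '\n' && i + 1 < size)
                found = i + 1;      /* a line begins right after this '\n' */
        if (found >= 0)
            return found;
        if (start == 0)
            return 0;               /* the whole file is one line */
        /* Double the window without overflowing long. */
        back = (back > size / 2) ? size : back * 2;
    }
}
```

After a successful call, fseek(fp, offset, SEEK_SET) followed by fscanf() or fgets() reads the last line.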
 
Michael Mair

I assume that you actually can seek to the end of the file.
This is not a given.

Determine numlen, the number of digits of INT_MAX (or, if you admit
negative numbers, the maximum number of digits plus sign of
INT_MAX and INT_MIN). INT_MAX and INT_MIN are defined in
<limits.h>. For portability, use a general mechanism to compute
numlen rather than hard-coding a value.
Then seek back from the file end by numlen+2.
Read numlen+2 characters. The number you are looking for is
stored between the last and the second-to-last '\n' [1].
Find the second-to-last '\n', then apply sscanf() or strtol() to
extract the number that follows it.

[1] This assumes a portable file structure i.e. the file ending
with a '\n'.
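
The two computational steps above (finding numlen and pulling the number out of the tail) might be sketched like this. The function names int_text_width and parse_tail are invented, and the tail is assumed to have already been read into a NUL-terminated buffer that ends with '\n'; the seek itself is left out because it is the non-portable part.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Number of characters needed to print any int: the digits of INT_MAX
   plus one for a possible '-' sign. */
size_t int_text_width(void)
{
    size_t width = 1;               /* room for the sign */
    int v = INT_MAX;

    do {
        width++;
        v /= 10;
    } while (v != 0);
    return width;
}

/* tail is a NUL-terminated string holding the end of the file and must
   end with '\n'. Parses the number that follows the second-to-last '\n'
   (or starts the buffer if there is only one '\n').
   Returns 0 on success, -1 otherwise. */
int parse_tail(const char *tail, long *out)
{
    size_t n = strlen(tail), i;
    char *end;

    if (n == 0 || tail[n - 1] != '\n')
        return -1;
    for (i = n - 1; i > 0 && tail[i - 1] != '\n'; i--)
        ;                           /* walk back to the line start */
    *out = strtol(tail + i, &end, 10);
    return (end == tail + i) ? -1 : 0;
}
```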


Cheers
Michael
 
Eric Sosman

Michael said:
I assume that you actually can seek to the end of the file.
This is not a given.

fseek(stream, 0, SEEK_END) is fine. Of course, not all
streams are seekable, and there's always the possibility of
I/O error.

Determine the number of digits of INT_MAX (or, if you admit
negative numbers, the maximum number of digits plus sign from
INT_MAX and INT_MIN), numlen. INT_MAX and INT_MIN is found in
<limits.h>. For portability, use a general mechanism.
Then seek back from the file end by numlen+2.

That's the bogus bit: fseek(stream, nonzero, SEEK_END)
and fseek(stream, nonzero, SEEK_CUR) have undefined behavior
on text streams (they violate a "shall"). They'll work on
many systems (just as INT_MAX+1 "works" on many systems), but
not on all. Worth a try, perhaps, but not portable.
 
Michael Mair

Eric said:
fseek(stream, 0, SEEK_END) is fine. Of course, not all
streams are seekable, and there's always the possibility of
I/O error.

This is what I meant; thank you for clarifying.

That's the bogus bit: fseek(stream, nonzero, SEEK_END)
and fseek(stream, nonzero, SEEK_CUR) have undefined behavior
on text streams (they violate a "shall"). They'll work on
many systems (just as INT_MAX+1 "works" on many systems), but
not on all. Worth a try, perhaps, but not portable.

Gah. I love C file semantics. Footnote 225 tells you that
fseek(stream, 0, SEEK_END) has UB for binary streams, and
7.19.9.2#4 tells you that, on a text stream, fseek(stream, nonzero,
whence) works only if nonzero has been returned by ftell() and whence
is SEEK_SET.

Thank you for the correction.

-Michael
 
Joe Wright

Eric said:
fseek(stream, 0, SEEK_END) is fine. Of course, not all
streams are seekable, and there's always the possibility of
I/O error.



That's the bogus bit: fseek(stream, nonzero, SEEK_END)
and fseek(stream, nonzero, SEEK_CUR) have undefined behavior
on text streams (they violate a "shall"). They'll work on
many systems (just as INT_MAX+1 "works" on many systems), but
not on all. Worth a try, perhaps, but not portable.

If it's a text stream and all lines are terminated with '\n',
including the last one, we first trip through the file looking for '\n'
characters and recording the position (offset) of the next character.

    /* Fragment from inside a function returning int; needs <stdio.h>. */
    FILE *fp = fopen("file.txt", "r");
    int ch;
    long prev = 0, here = 0;    /* here starts at 0 so a one-line file works */

    if (fp == NULL)
        return -1;              /* could not open the file */
    while ((ch = fgetc(fp)) != EOF)
        if (ch == '\n') {
            prev = here;        /* start of the line that just ended */
            here = ftell(fp);   /* start of the next line, or end of file */
        }

At EOF, here is really the end of file and prev is the offset of the
previous (last) line.

    fseek(fp, prev, SEEK_SET);

points you to it. Good luck.
 
Keith Thompson

That will work (assuming the file is seekable at all), but it requires
reading the entire file, which the OP was trying to avoid.

It records the position of the beginning of the last line, which will
let you re-read from that position, but it assumes that the file isn't
going to change; if you're assuming that, you might as well just read
the entire file and remember the last line.
 
Joe Wright

Keith said:
That will work (assuming the file is seekable at all), but it requires
reading the entire file, which the OP was trying to avoid.

It records the position of the beginning of the last line, which will
let you re-read from that position, but it assumes that the file isn't
going to change; if you're assuming that, you might as well just read
the entire file and remember the last line.

Good morning Keith. Thank you for reading. Regardless of what the OP was
trying to avoid, we must read the entire file. Agreed? Who or what might
change the file while I am reading it? Reading the file a character at a
time simplifies things so I don't need to know how long a line is. We
can know how long the last line is by subtracting prev from here,
allocating space for it, backing up (fseek()) to prev and reading the line.

It's a beautiful bright Sunday afternoon in Arlington. Who'sit on the
Left Coast?
 
Eric Sosman

Joe said:
[...] Reading the file character at a
time simplifies things so I don't need to know how long a line is. We
can know how long the last line is by subtracting prev from here,
allocating space for it, backing up (fseek()) to prev and reading the line.

7.19.9.4/2: "[...] For a text stream, its file position
indicator contains unspecified information, [...] the difference
between two such return values is not necessarily a meaningful
measure of the number of characters written or read."

So, calculating a line length by subtracting two file
positions is no use. You'd need to count each character of the
line as you read it, starting a new count after each '\n' that
isn't the last character in the file.
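
The counting approach could look something like this sketch (the function name last_line_length is invented). The count restarts only when a character actually follows a '\n', so a final newline does not wipe out the length of the last line.

```c
#include <stdio.h>

/* Returns the number of characters in the last line of fp, not counting
   its terminating '\n' (if present). Returns 0 for an empty stream. */
size_t last_line_length(FILE *fp)
{
    size_t count = 0;       /* characters in the line being read */
    int ch, pending = 0;    /* pending: previous character was '\n' */

    while ((ch = fgetc(fp)) != EOF) {
        if (pending) {      /* that '\n' was not the last character */
            count = 0;
            pending = 0;
        }
        if (ch == '\n')
            pending = 1;
        else
            count++;
    }
    return count;
}
```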
 
Mark McIntyre

fseek and ftell are really inadequate
functions as they assume file sizes are always less than LONG_MAX which
doesn't make any sense on modern file systems. Many systems have 64
bit versions of these functions which makes a little more sense, but in
about 30 years we're going to wish people were just using intmax_t.

Anyhow, if we ignore that problem, and the other bizarre ANSI-ism that
you can't seek to some offset you haven't previously visited,

Y'know, your advice would be considerably more useful if you skipped
the pointless diatribe at the start and just got right into answering
the question. If you don't like C, why do you hang around here?

Mark McIntyre
 
Eric Sosman

[...]
Anyhow, if we ignore that problem, and the other bizarre ANSI-ism that
you can't seek to some offset you haven't previously visited
[... in a text stream ...]

Lots of people (including some who ought to know better)
are seduced by the notion that one can do arithmetic on the
"file position indicators" of text streams. In Standard C,
this is not reliable and leads to undefined behavior.

Some people who have learned the above profess that
Standard C is to blame for this sad situation. Such people
have had but limited experience of different systems' notions
of "file," and would do well to refrain from insulting those
whose knowledge is greater. It is better to remain silent and
risk suspicion of folly than to open one's mouth and confirm it.

Then again, there's always the faint possibility that the
Young Turks have a better idea than the Old Fogeys. If so,
let's hear about the better idea -- but let's hear about it
without gratuitous insults. We stand on the shoulders of
giants; let us refrain from making horsefly-buzzings in their
ears lest they become annoyed and swat us.
 
websnarf

Eric said:
[...]
Anyhow, if we ignore that problem, and the other bizarre ANSI-ism that
you can't seek to some offset you haven't previously visited
[... in a text stream ...]

Lots of people (including some who ought to know better)
are seduced by the notion that one can do arithmetic on the
"file position indicators" of text streams. In Standard C,
this is not reliable and leads to undefined behavior.

Some people who have learned the above profess that
Standard C is to blame for this sad situation. Such people
have had but limited experience of different systems' notions
of "file," and would do well to refrain from insulting those
whose knowledge is greater. It is better to remain silent and
risk suspicion of folly than to open one's mouth and confirm it.

ANSI C contains two functions which satisfy this: fgetpos(), and
fsetpos(). Notice how they used a data type (fpos_t) that does not
imply any arithmetic capabilities?
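
For illustration, here is a sketch of how fgetpos() and fsetpos() can be used to return to the start of the last line without any arithmetic on positions. It still reads the whole file once. The function name seek_to_last_line is invented.

```c
#include <stdio.h>

/* Positions fp at the start of its last line using only fgetpos() and
   fsetpos(). Returns 0 on success, -1 on failure. */
int seek_to_last_line(FILE *fp)
{
    fpos_t line_start, candidate;
    int ch, have_candidate = 0;

    rewind(fp);
    if (fgetpos(fp, &line_start) != 0)
        return -1;
    while ((ch = fgetc(fp)) != EOF) {
        if (have_candidate) {       /* another line follows the '\n' */
            line_start = candidate;
            have_candidate = 0;
        }
        if (ch == '\n') {
            if (fgetpos(fp, &candidate) != 0)
                return -1;
            have_candidate = 1;
        }
    }
    if (ferror(fp))
        return -1;
    clearerr(fp);                   /* clear EOF before repositioning */
    return fsetpos(fp, &line_start) == 0 ? 0 : -1;
}
```

After a successful call, fgets() or fscanf() on fp reads the last line.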

So you're saying people who see fseek/ftell, which clearly use "long
int" as the file position, are not naturally supposed to assume they can
do file pointer arithmetic, especially in light of the knowledge that
functions with more restrictive semantics exist as separate functions?

I am well aware of systems that have a hard time with byte by byte file
offsets. One such obscure system is Windows 98. The difference is,
rather than simply punting and implementing some obscure kind of fseek,
they bit the bullet and did the obvious, but very slow, implementation
that at least behaved consistently.

Then again, there's always the faint possibility that the
Young Turks have a better idea than the Old Fogeys. If so,
let's hear about the better idea

Oh it may surprise you. The first thing I would start with is to stop
idolizing the original creators and/or the C standard, and recognize
when mistakes have been made.

You want an improvement? Not a problem:

#include <stdio.h>
#include <stdint.h>
int fseekB (FILE * fp, intmax_t offs);

The value offs is always taken as an offset (in bytes) from the
beginning of the file, if the offset does not refer to a valid file
position, -1 is returned and errno is set. No value of offs can lead
to UB so long as fp is a valid and active file.

#include <stdio.h>
#include <stdint.h>
intmax_t ftellB (FILE * fp);

The value returned is the exact offset in the file corresponding to
the next byte to be read or written. For a valid
non-empty file pointer fp, fseekB (fp, ftellB (fp)) is basically a
NO-OP. At EOF, the position returned is the length of the file. The
function properly observes ungetc() and all other ordinary file
operations.

These functions can retain the property of UB, that the ANSI C
committee cherishes so much, if fp is not a valid open file.
Personally, I would prefer that errors be returned if fp is either NULL
or a *closed* fp.

What could be simpler?

If a system has problems implementing it, then that's too bad. The x86
and 68K world implemented entire floating point emulators just to be
compliant with the C standard; a little work on the part of others
won't kill them. And as Microsoft and others have demonstrated, it's
not actually an impractical thing to implement.

If that doesn't work for you, then point out to me one system 1) where
it would be impractical to implement this, and 2) which has a working
C99 compiler on it. The marginal platforms are not going to be updated
beyond C89 anyway, so there is little sense in making provisions for
them.

-- but let's hear about it
without gratuitous insults. We stand on the shoulders of
giants; [...]

It's hard to *stand* if you are grovelling all the time.
 
CBFalconer

Eric Sosman wrote:
.... snip ...
-- but let's hear about it without gratuitous insults. We stand
on the shoulders of giants; [...]

Its hard to *stand* if you are grovelling all the time.

When I was 15 I was amazed at the ignorance of my parents. When I
was 25 I was amazed how much they had learned in the past 10
years. You are, apparently, about 15.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
Mark McIntyre

No value of offs can lead
to UB so long as fp is a valid and active file.

pipes too?

The value returned is the exact offset in the file where the
corresponding to the next byte to be read/written.

and you've tested this for sparse files, databases, etc? Files with
multiple read/write operations permitted? Files with lockable
sections?

These functions can retain the property of UB, that the ANSI C
committee cherishes so much,

Remarks like this merely make you look like a dickhead.

If a system has problems implementing it, then that's too bad.

Good plan, reduce the portability of C
<flame bait>?
to suit your own apparent inability to programme safely.

-- but let's hear about it
without gratuitous insults. We stand on the shoulders of
giants; [...]

Its hard to *stand* if you are grovelling all the time.

And how about when you're so far up yourself that you haven't seen
daylight in weeks?
Mark McIntyre
 
Mark McIntyre

The first thing I would start with is to stop
idolizing the original creators and/or the C standard, and recognize
when mistakes have been made.

You again equate "mistake" with compromise, portability and consensus.
Of course, if C were owned by one designer, that person could do whatever
they liked, eliminating any inconvenient platform or feature along the
way. But it's not. It's owned by a committee representing a wide range
of interests, and therefore has to reflect a wider consensus than your
narrow view.

Unfortunately yet again any contribution you might have had to make to
this group is marred by your bias.
Mark McIntyre
 
Jordan Abel

pipes too?

A "request that cannot be satisfied" results in a nonzero return, not
UB. It is arguable that this also applies to a call of fseek on a text
stream with a value that does not correspond to a position in the file
which ftell might have returned.

and you've tested this for sparse files, databases, etc? Files with
multiple read/write operations permitted? Files with lockable
sections?

Again, failure is not the same as UB. What is a specific case that you
think invokes UB?
 
Eric Sosman

Jordan Abel wrote on 02/22/06 14:37:
A "request that cannot be satisfied" results in a nonzero return, not
UB. It is arguable that this also applies to a call of fseek on a text
stream with a value that does not correspond to a position in the file
which ftell might have returned.

Again, failure is not the same as UB. What is a specific case that you
think invokes UB?

Keep in mind that we're speaking of text streams, where
the number of characters written to a stream need not be the
same as the number of bytes written to the file. A familiar
example is putc('\n', stream) on Windows, where one character
generates two bytes. There are also systems where writing a
newline produces no bytes in the file, systems where a file
contains both data bytes and metadata bytes, and systems that
use state-dependent encodings for extended character sets.
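
A small experiment, offered only as a sketch, makes the character-versus-byte distinction visible. It writes a few lines through a text stream and then counts the raw bytes through a binary stream. The function name bytes_for_text_lines is invented, and the result depends on the platform: typically 2 bytes per line on Unix-like systems and 3 on Windows.

```c
#include <stdio.h>

/* Writes `lines` one-character lines through a text stream, then returns
   the size of the resulting file in bytes as seen through a binary
   stream, or -1 on failure. */
long bytes_for_text_lines(const char *path, int lines)
{
    FILE *fp = fopen(path, "w");    /* text stream */
    long bytes = 0;
    int i;

    if (fp == NULL)
        return -1;
    for (i = 0; i < lines; i++) {
        putc('x', fp);
        putc('\n', fp);             /* one character, possibly two bytes */
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");         /* binary stream: count raw bytes */
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        bytes++;
    fclose(fp);
    return bytes;
}
```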

It's not so much a problem of U.B., but of failure that
doesn't produce a reliable indication. Seek to a position
that happens to be in the middle of a multi-byte character
or in the middle of a stretch of metadata, and the problem
may be difficult to detect: a byte in a file does not always
stand alone, but may require prior context (at an arbitrary
separation) for proper interpretation. Here's the stuff of
a nightmare or two: Imagine opening a stream for update,
seeking to the middle of a stretch of metadata, successfully
writing "Hello, world!" there, and only later discovering
that the successful write has corrupted the file structure
and made the entire tail end unreadable ...

It would be nice if one could do meaningful arithmetic on
file position indicators in text streams, but given the rich
variety of file encodings that exist in the world it is not
always possible to do so. The C Standard recognizes this
difficulty, and so does not attempt to guarantee that seeking
to arbitrary positions in text files will work as desired.
The Standard is cognizant of imperfections in reality, and
does not insist that reality rearrange itself for the Standard's
convenience.
 
