Reading LAST line from text file without iterating through the file?

Robin Wenger

Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

Robin
 
Knute Johnson

Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

Robin

You could use a RandomAccessFile and search backwards from the end for a
linefeed. Depending on the size of the line and the size of the file,
it might not be more efficient than reading the whole file.
 
Lew

Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

Yes, but it's tricky. You need to open the file for random access and seek
backwards from the end until you hit a newline.
 
Ian Shef

Robin Wenger wrote:
Is it possible to read the last text line from a text file WITHOUT
reading the previous (n-1) lines?

Robin

Yes, under certain circumstances. For example, you can do it if you know "n"
and know that all of the lines are of some fixed (and also known) length,
because the last line then starts at a computable offset. There are other
situations as well.
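
The fixed-length case could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name FixedLengthLastLine and the parameter recordLength are made up for this example, every line is assumed to occupy exactly recordLength bytes including its terminator, and the text is assumed to be in a single-byte encoding. Under those assumptions the offset (n-1) * recordLength is the same as the file length minus recordLength, so "n" does not even have to be passed in.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class FixedLengthLastLine {
    /**
     * Reads the last record of a file in which every line occupies exactly
     * recordLength bytes, terminator included (an assumption of this sketch).
     */
    static String readLastRecord(String fileName, int recordLength) throws IOException {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            long length = raf.length();
            if (length < recordLength || length % recordLength != 0) {
                throw new IOException(
                        "File is not a whole number of " + recordLength + "-byte records");
            }
            byte[] record = new byte[recordLength];
            raf.seek(length - recordLength);   // jump straight to the start of record n
            raf.readFully(record);
            // Single-byte encoding assumed; strip the trailing CR/LF, if any.
            return new String(record, StandardCharsets.US_ASCII)
                    .replaceAll("[\\r\\n]+\\z", "");
        }
    }
}
```

For a file containing "aaaa\nbbbb\ncccc\n", calling FixedLengthLastLine.readLastRecord(name, 5) reads only the final five bytes.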
 
Eric Sosman

Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

Others have mentioned using RandomAccessFile to work backward from the
end of the file until you find the penultimate line-ending. This can
work, but it can also fail. Consider a file with context-sensitive
encoding, for example, where the meaning of a byte depends on the values
of bytes that precede it. If you read an isolated byte of value 91 from
such a file, without knowing whether it's a free-standing character or a
part of a multi-byte sequence or possibly preceded by a "shift-out," you
won't know what that byte value means.

One strategy is to estimate a typical line length of N characters,
seek to 100*N (say) bytes before the end, and start reading from
there. A nice feature of most multi-byte encoding schemes is that they
tend to self-synchronize: You may get misinterpreted garbage for a
while, but things are likely to get back on track eventually. If you
want to get fancy you can apply reasonability tests to what you (think
you've) read, and restart at END-1000*N if things seem unreasonable.
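
A rough sketch of that strategy, under several assumptions: the class name TailLastLine and the parameter typicalLineLength are invented for this example, the encoding is taken to be one that re-synchronizes after a bad start (UTF-8 does; UTF-16 read from an odd offset would not), and the only "reasonability test" applied is whether a line ending shows up in the decoded tail. If none does, the window is widened tenfold, which mirrors the step from 100*N to 1000*N.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

final class TailLastLine {
    /** Decodes only the tail of the file and returns the text after its last line ending. */
    static String readLastLine(String fileName, int typicalLineLength, Charset charset)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            long length = raf.length();
            long window = 100L * Math.max(1, typicalLineLength);
            for (;;) {
                long start = Math.max(0, length - window);
                if (length - start > Integer.MAX_VALUE) {
                    throw new IOException("Last line too long for this sketch");
                }
                byte[] tail = new byte[(int) (length - start)];
                raf.seek(start);
                raf.readFully(tail);
                // The first few characters may be garbage if we landed mid-character.
                String text = new String(tail, charset);

                // Ignore the terminator(s) that end the last line itself.
                int end = text.length();
                while (end > 0 && isLineEnd(text.charAt(end - 1))) {
                    end--;
                }
                int sep = Math.max(text.lastIndexOf('\n', end - 1),
                                   text.lastIndexOf('\r', end - 1));
                if (sep >= 0 || start == 0) {
                    return text.substring(sep + 1, end);
                }
                window *= 10;   // no line ending in the window: look further back
            }
        }
    }

    private static boolean isLineEnd(char c) {
        return c == '\n' || c == '\r';
    }
}
```

The guess only affects how many bytes are read, not the result, provided the encoding assumption holds: a poor estimate costs extra passes, and in the worst case the loop ends up reading the whole file.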
 
Arne Vajhøj

Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

In general no.

All the RandomAccessFile tricks are based on the assumption that lines
are separated by something - they do not work with record formats
that contain a line length instead of a delimiter.
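
To illustrate the point, here is a sketch for a purely hypothetical length-prefixed layout: each record is a 2-byte big-endian unsigned length followed by that many payload bytes. The layout and the class name LengthPrefixedLastRecord are assumptions made for this example and do not describe any particular operating system's format. Because a payload byte can look exactly like a length byte, there is nothing reliable to search for from the end; the reader has to start at the beginning and hop from one length field to the next, although it can at least skip the payloads.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedLastRecord {
    /** Walks the length fields from the start of the file and returns the last payload. */
    static String readLastRecord(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            long length = raf.length();
            long pos = 0;
            long lastStart = -1;   // offset of the most recent payload seen
            int lastLength = 0;
            while (pos + 2 <= length) {
                raf.seek(pos);
                int recordLength = raf.readUnsignedShort();   // big-endian 16-bit count
                if (pos + 2 + recordLength > length) {
                    throw new IOException("Truncated record at offset " + pos);
                }
                lastStart = pos + 2;
                lastLength = recordLength;
                pos = lastStart + recordLength;   // hop over the payload without reading it
            }
            if (lastStart < 0) {
                return "";   // no records at all
            }
            byte[] payload = new byte[lastLength];
            raf.seek(lastStart);
            raf.readFully(payload);
            // Single-byte encoding assumed for the payload.
            return new String(payload, StandardCharsets.US_ASCII);
        }
    }
}
```

This still touches every record header, so the cost grows with the number of lines even though the line contents are never read.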

If Unix/Linux/Windows/MacOS X is all you need to support then try:

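import java.io.IOException;
import java.io.RandomAccessFile;

public static String readLastLineUnSup(String fnm) throws IOException {
    // try-with-resources closes the file even if a read fails
    try (RandomAccessFile raf = new RandomAccessFile(fnm, "r")) {
        StringBuilder res = new StringBuilder();
        boolean seenText = false;
        // Walk backwards from the last byte; stop at the start of the file
        // so a one-line file does not seek to a negative offset.
        for (long ix = raf.length() - 1; ix >= 0; ix--) {
            raf.seek(ix);
            int c = raf.read();
            if (c == '\r' || c == '\n') {
                if (seenText) break;   // terminator of the previous line
                continue;              // trailing terminator of the last line
            }
            seenText = true;
            res.append((char) c);      // one byte per char: single-byte encodings only
        }
        return res.reverse().toString();
    }
}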

Arne
 
Eric Sosman

"Record formats" are not relevant here, nor was someone else's concern
about compressed formats -- the OP clearly said "a text file", by which
is generally understood flat ASCII with CR, LF, or CRLF as line delimiter.

OpenVMS supports many record formats, but the "native" one for
text files is VAR: A two-byte binary count, the payload characters,
and if necessary a padding byte to make the total byte count even.

The "next most native" format is VFC, which is sort of like VAR
except that the first N (fixed) bytes of the payload are metadata
(line numbers, carriage control, ...) instead of line content.

Then come the easy formats: STREAM, STREAM-LF, STREAM-CR, and
FIXED. Oh, yes, and UNDEF; let's not forget UNDEF (although, to be
honest, UNDEF is more commonly used for "binary" than "text" files).

(Strangest text file format I ever ran into used line-*bracketing*
characters: a CR before and an LF after. The rationale for this format
caused me to shake my head and sigh: It was said that as you printed
such a file on a typewriter-like console, possibly with long pauses
between lines for progress messages and the like, then the LF at end-
of-line would move the paper so the print head wouldn't interfere with
reading it. As I said, shake the head.)

In short, all I'm asking is that you delete the word "generally"
because your experience is insufficiently general.
 
Lars Enderin

2011-02-24 15:00, Ken Wesson wrote:
[...]
Obsolete systems do not interest me.
then…

Since those days, the world has standardized on ASCII flat files for
text files.

LOL!

Windows text files are flat ASCII files (with CRLF line ends). Mac text
files are flat ASCII files (with CR line ends). Unix text files are flat
ASCII files (with LF line ends). And that exhausts 99.99% of the
operating system market share right there, if not more, not counting
smartphones which are all too modern to be using weird legacy formats for
text files.

I can't remember the last time I had to interoperate with any machine
that had anything other than standard ASCII as the native format for text
files. It's gotta be decades.

ASCII character values are limited to the 0-127 range. That's an
outdated "standard".
 
Michael Wojcik

Ah, the warm blanket of provincialism.

On the IBM i machines (formerly i Series, formerly System i, formerly
AS/400, successor to the System/3x), using the default filesystem, a
text "file" is actually a series of records in a "member" of a
"physical file". The i operating system hides implementation details,
but access to the contents of the "file" is record-oriented, not
byte-oriented.

In the alternate Hierarchical File System supported by the i machines
for POSIX compatibility, text files are byte-oriented, but usually
EBCDIC, not ASCII.

On IBM and other EBCDIC mainframe systems, there are a variety of
formats for text files, but flat byte-oriented ASCII isn't one of
them, unless you're running Linux.

Obsolete systems do not interest me.

Apparently, neither do prominent ones that you don't happen to know
about. What a surprise.

Since those days, the world has
standardized on ASCII flat files for text files.

Only for sufficiently small values of "the world".
 
Jussi Piitulainen

Ken said:
[...]
Obsolete systems do not interest me.
then…

Since those days, the world has standardized on ASCII flat files
for text files.

LOL!

Windows text files are flat ASCII files (with CRLF line ends). Mac
text files are flat ASCII files (with CR line ends). Unix text files
are flat ASCII files (with LF line ends). And that exhausts 99.99%
of the operating system market share right there, if not more, not
counting smartphones which are all too modern to be using weird
legacy formats for text files.

I can't remember the last time I had to interoperate with any
machine that had anything other than standard ASCII as the native
format for text files. It's gotta be decades.

I remember when we used a seven-bit character code to write my native
language. We could toggle the way we viewed the character codes where
we had put those characters that were not in ASCII. It was either
brackets and braces or those letters, but never both.

V{nkyr{-{{kk|si{. It's not a happy memory.
 
Lars Enderin

2011-02-24 15:19, Jussi Piitulainen wrote:
Ken said:
On 2/24/11 9:06 PM, Ken Wesson wrote:
[...]
Obsolete systems do not interest me.

then…

Since those days, the world has standardized on ASCII flat files
for text files.

LOL!

Windows text files are flat ASCII files (with CRLF line ends). Mac
text files are flat ASCII files (with CR line ends). Unix text files
are flat ASCII files (with LF line ends). And that exhausts 99.99%
of the operating system market share right there, if not more, not
counting smartphones which are all too modern to be using weird
legacy formats for text files.

I can't remember the last time I had to interoperate with any
machine that had anything other than standard ASCII as the native
format for text files. It's gotta be decades.

I remember when we used a seven-bit character code to write my native
language. We could toggle the way we viewed the character codes where
we had put those characters that were not in ASCII. It was either
brackets and braces or those letters, but never both.

V{nkyr{-{{kk|si{. It's not a happy memory.

I have the same experience. C code wasn't very readable with "Swedish
ASCII". At least Finnish doesn't use "å", except when quoting Swedish words.
 
RedGrittyBrick

Windows text files are flat ASCII files (with CRLF line ends).

Actually I find that, nowadays, lots of text files on Windows are
so-called 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16
with BOM).

Even on my ancient XP boxes, Notepad offers only ANSI, Unicode, Unicode
big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns out to be
CP-1252), Text-Document DOS format (turns out to be CP-850) and
Unicode. No ASCII.
 
