Reading LAST line from text file without iterating through the file?

Jim Janney

Ken Wesson said:
Who asked you for your opinions of others here?


You're one to talk about provincialism. Who the hell uses these ancient
museum pieces any more?

Um, that would be me, or rather my employer's customers.

Indirectly, anyone who has an account with a bank or credit union is
likely using an EBCDIC-based machine. There are some that don't, but
it's not the way to bet.
 
Tom Anderson

Yes, and it was a good explanation. Unfortunately, I don't think he
understood the explanation, nor do I think he will understand further
clarification. I think it more likely that the harder anyone tries to
explain to him these points, the more dug in his heels will be.

To do otherwise would necessarily require an admission that there's no single
"text file" format, and that even if there were, ASCII or any of the
single-byte derivatives thereof ain't it. I don't see any way such an
admission would ever be produced.

There is a single text file format: lines of characters in some encoding,
terminated by an end-of-line sequence which is distinguishable from any
other characters.

It's merely the case that some current mainframes, and some obscure or
historical systems, do not store text in text files!

tom
 
Martin Gregorie

There is nothing at all prominent about those IBM dinosaurs. They may
have been prominent 30 years ago, but not now.
You know, you sound exactly like a character who surfaced in a Y2K
newsgroup back in 1998/99. He refused to believe that any computers apart
from PCs were in use at the time.
 
Michael Wojcik

Ken said:
Who asked you for your opinions of others here?

No one. I offer them out of sheer generosity. No thanks are necessary.
In the twenty years I've been on Usenet, I've found offering my
opinions on the local idiots to be immensely useful. At least to me.
You're one to talk about provincialism. Who the hell uses these ancient
museum pieces any more?

Thousands of organizations, which is why they still enjoy healthy sales.
There is nothing at all prominent about those IBM dinosaurs. They may
have been prominent 30 years ago, but not now.

Tell that to the many thousands of organizations that still use them.

And the majority of business transactions still runs on IBM mainframe
and midrange systems, and similar offerings from other companies.

IBM had just shy of $100B in sales last year. A good chunk of that was
from mainframes: mainframe sales were up 68% from 2009, to the best
level in six years. MIPS capacity (mainframe processing capacity owned
by customers) rose 58%, and IBM acquired a couple dozen new mainframe
customers - businesses that bought their first mainframes.[1]

As usual, you don't know what the hell you're talking about, and
clearly can't be bothered to do even a moment of research before
posting something else that demonstrates your ignorance. Not that
you'll learn anything from this exchange, either, I suppose.


[1] http://www.theregister.co.uk/2011/01/18/ibm_q4_2010_numbers/
 
Lew

Actually I find that, nowadays, lots of text files on Windows are so-called
'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 with BOM).

Even on my ancient XP boxes, Notepad offers only ANSI, Unicode, Unicode
big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns out to be
CP-1252), Text-Document DOS format (turns out to be CP-850) and Unicode. No
ASCII.

Windows hasn't used ASCII in decades.
 
Arne Vajhøj

"Record formats" are not relevant here,

They are - because the record format determines whether RandomAccessFile
has a chance of working or not.
nor was someone else's concern
about compressed formats -- the OP clearly said "a text file", by which
is generally understood flat ASCII with CR, LF, or CRLF as line delimiter.

That is probably true among non IT pros.

But this group is for IT pros.

They know that there are other character sets and other
record formats.

Arne
 
Arne Vajhøj

OpenVMS supports many record formats, but the "native" one for
text files is VAR: A two-byte binary count, the payload characters,
and if necessary a padding byte to make the total byte count even.

Yep.

NOS/VE (it may not be relevant here because I don't think there
exists a Java for NOS/VE) used 6 byte length + data + 6 byte length.

The trailing 6 byte length made it possible to reliably read the
file backwards, which the VMS format does not allow.
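
To illustrate why a trailing length helps, here is a minimal sketch of reading the last record of a file whose records carry a length both before and after the payload. The layout is hypothetical and simplified (2-byte big-endian lengths instead of 6-byte ones, ISO-8859-1 payload); it is not the actual NOS/VE on-disk format, and the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TrailingLengthSketch {
    // Hypothetical layout per record: [2-byte length][payload][2-byte length],
    // lengths stored big-endian (ByteBuffer's default byte order).
    static byte[] encode(String... records) {
        int size = 0;
        for (String r : records) {
            size += 4 + r.getBytes(StandardCharsets.ISO_8859_1).length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (String r : records) {
            byte[] payload = r.getBytes(StandardCharsets.ISO_8859_1);
            out.putShort((short) payload.length); // leading length
            out.put(payload);
            out.putShort((short) payload.length); // trailing length
        }
        return out.array();
    }

    // Reads the last record without touching any earlier record:
    // the final two bytes say how far back the payload starts.
    // Assumes the file holds at least one complete record.
    static String lastRecord(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int len = buf.getShort(file.length - 2) & 0xFFFF;
        int start = file.length - 2 - len;
        return new String(file, start, len, StandardCharsets.ISO_8859_1);
    }
}
```

With only a leading length, the same file can be walked forward record by record, but there is nothing at the end of the file that says where the last record begins.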

Arne
 
Arne Vajhøj

Obsolete systems do not interest me.

Whether a solution works in general or not depends on whether
it is guaranteed to work on all platforms or not.

Using RandomAccessFile and searching for CR and LF is not.

Whether it works on platforms that interest you is completely
irrelevant.
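
For reference, the approach under discussion looks roughly like the sketch below. It only works under assumptions that the thread shows are not universal: a stream-style file with LF or CRLF line ends, and an encoding in which the byte 0x0A never means anything but a line feed (UTF-8 is assumed for decoding here). The class and method names are made up for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class LastLineSketch {
    // Scans backward from the end of the file for an LF byte and
    // returns the bytes after it, decoded as UTF-8 (an assumption).
    static String lastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Ignore one trailing terminator (LF, CRLF, or a lone CR), if present.
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;
            // Walk back until the previous LF or the start of the file.
            long start = end;
            while (start > 0 && byteAt(raf, start - 1) != '\n') start--;
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    // Reads a single byte at an absolute position (slow but simple).
    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```

Reading one byte per seek keeps the sketch short; a real implementation would read a block at a time. On record-oriented or count-prefixed files, or with encodings such as UTF-16 or EBCDIC, this scan gives wrong answers or none at all.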
Since those days, the world has
standardized on ASCII flat files for text files.

Not really.

Windows uses CP-1252, UTF-8 and UTF-16
Unix/Linux/VMS uses ISO-8859-1 and UTF-8
IBM mainframe uses EBCDIC

There are really very few systems today that use just ASCII.

Arne
 
Arne Vajhøj

[...]
Obsolete systems do not interest me.
then…

Since those days, the world has standardized on ASCII flat files for
text files.

LOL!

Windows text files are flat ASCII files (with CRLF line ends).

No.

They are CP-1252, UTF-8 or UTF-16.
Mac text
files are flat ASCII files (with CR line ends). Unix text files are flat
ASCII files (with LF line ends).

No.

They are ISO-8859-1 or UTF-8.
And that exhausts 99.99% of the
operating system market share right there, if not more,

No.

z/OS, i, OpenVMS, MPE have a lot more market share than 0.01%.
I can't remember the last time I had to interoperate with any machine
that had anything other than standard ASCII as the native format for text
files. It's gotta be decades.

Possible that you only work with 20+ year old Unix and OpenVMS
systems with 7 bit VT100 access.

But that is not very common.

Arne
 
Arne Vajhøj

There is a single text file format: lines of characters in some
encoding, terminated by an end-of-line sequence which is distinguishable
from any other characters.

It's merely the case that some current mainframes, and some obscure or
historical systems, do not store text in text files!

No.

There are also count prefix (and sometimes suffix) formats.

They have the advantage of being able to actually have
all possible values in lines.

And the disadvantage that various hacks assuming all records
use delimiters do not work.
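
A minimal sketch of both points, using a layout loosely modeled on the one described earlier in the thread (2-byte count, payload, pad byte to an even total). The little-endian count and all names here are assumptions made for the example, not a statement about any real file system.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class CountPrefixSketch {
    // Assumed layout per record: 2-byte little-endian count, payload bytes,
    // then one pad byte if the payload length is odd.
    static List<byte[]> readRecords(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<byte[]> records = new ArrayList<>();
        while (buf.remaining() >= 2) {
            int len = buf.getShort() & 0xFFFF;
            byte[] payload = new byte[len];
            buf.get(payload);                  // payload may hold any byte value
            if ((len & 1) == 1 && buf.hasRemaining()) {
                buf.get();                     // skip the pad byte
            }
            records.add(payload);
        }
        return records;
    }

    // What a delimiter-based hack would see: every 0x0A byte looks like a line end.
    static int countLfBytes(byte[] data) {
        int n = 0;
        for (byte b : data) {
            if (b == 0x0A) n++;
        }
        return n;
    }
}
```

A record whose payload contains the byte 0x0A is read back intact by `readRecords`, while a scan for LF bytes over the same data reports a line end in the middle of that record and none at the real record boundaries.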

Arne
 
Arne Vajhøj

Well, these days we use the 8th bit for accented characters instead of
just wasting it.

Then it is not ASCII.
Technically it's not your granddaddy's ASCII with that
in use, but it's close enough for government work, and certainly close
enough not to mess with using tests for CR/LF to detect line boundaries.

The character set and the record format are independent of each other.
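
A small sketch of why the encoding still has to be known before testing bytes for line ends, even when the record format is plain LF-delimited text. It uses only charsets from `java.nio.charset.StandardCharsets`; the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NewlineBytesSketch {
    // Counts bytes equal to 0x0A, which is what a byte-level LF test does.
    static int countLfBytes(byte[] data) {
        int n = 0;
        for (byte b : data) {
            if (b == 0x0A) n++;
        }
        return n;
    }

    // Encodes the text in the given charset and counts 0x0A bytes in the result.
    static int lfBytesIn(String text, Charset cs) {
        return countLfBytes(text.getBytes(cs));
    }

    public static void main(String[] args) {
        // One letter (U+010A, LATIN CAPITAL LETTER C WITH DOT ABOVE) and one line feed.
        String text = "\u010A\n";
        System.out.println("UTF-8:    " + lfBytesIn(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + lfBytesIn(text, StandardCharsets.UTF_16BE));
    }
}
```

The text contains exactly one line feed. In UTF-8 it encodes to C4 8A 0A, so the byte test finds one match. In UTF-16BE it encodes to 01 0A 00 0A, so the same test finds two, one of them inside the letter.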

Arne
 
Arne Vajhøj

2011-02-24 15:19, Jussi Piitulainen wrote:
Ken said:
On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:

On 2/24/11 9:06 PM, Ken Wesson wrote:
[...]
Obsolete systems do not interest me.

then…

Since those days, the world has standardized on ASCII flat files
for text files.

LOL!

Windows text files are flat ASCII files (with CRLF line ends). Mac
text files are flat ASCII files (with CR line ends). Unix text files
are flat ASCII files (with LF line ends). And that exhausts 99.99%
of the operating system market share right there, if not more, not
counting smartphones which are all too modern to be using weird
legacy formats for text files.

I can't remember the last time I had to interoperate with any
machine that had anything other than standard ASCII as the native
format for text files. It's gotta be decades.

I remember when we used a seven-bit character code to write my native
language. We could toggle the way we viewed the character codes where
we had put those characters that were not in ASCII. It was either
brackets and braces or those letters, but never both.

V{nkyr{-{{kk|si{. It's not a happy memory.

I have the same experience. C code wasn't very readable with "Swedish
ASCII". At least Finnish doesn't use "å", except when quoting Swedish words.

Good old ISO 646 NRC.

Horrible by today's standards.

But back then it was what we had.

Arne
 
Arne Vajhøj

That's why we now actually use that 8th bit for something useful, if need
be.

Well - you are the one that has been claiming that everybody is using
a 7 bit standard (ASCII) today.

Arne
 
Arne Vajhøj

Windows hasn't used ASCII in decades.

I don't think it ever has.

DOS used CP-437, CP-850 etc.

32/64 bit Windows uses CP-1252 (which is practically the
same as ISO-8859-1) and some UTF-16.

.NET added UTF-8.

I don't remember 16 bit Windows, but I am pretty sure
that it did not use ASCII.

Arne

PS: CP-850 and CP-1252 are for western countries - other
countries use other char sets.
 
Arne Vajhøj

Ah, the warm blanket of provincialism.
Yep.


On the IBM i machines (formerly i Series, formerly System i, formerly
AS/400, successor to the System/3x), using the default filesystem, a
text "file" is actually a series of records in a "member" of a
"physical file". The i operating system hides implementation details,
but access to the contents of the "file" is record-oriented, not
byte-oriented.

And it is a pretty good guess that the RandomAccessFile searching
for CR and LF will fail on i also then.
In the alternate Hierarchical File System supported by the i machines
for POSIX compatibility, text files are byte-oriented, but usually
EBCDIC, not ASCII.

On IBM and other EBCDIC mainframe systems, there are a variety of
formats for text files, but flat byte-oriented ASCII isn't one of
them, unless you're running Linux.

Linux will be either ISO-8859-1 or UTF-8 not ASCII.

Arne
 
Arne Vajhøj

Who asked you for your opinions of others here?

Anyone posting to usenet gives the entire world the
opportunity to comment on them.

The smart people try to post something smart.
You're one to talk about provincialism. Who the hell uses these ancient
museum pieces any more?

Lots of places.

Retail sector, public sector, financial sector
There is nothing at all prominent about those IBM dinosaurs. They may
have been prominent 30 years ago, but not now.

Both z/OS and i are widely used today.
Fine, then -- corporate America and home computers in America then.

OK - neither z/OS nor i is common on home computers.

But they are very common in corporate America.

If all z/OS systems disappeared over night then everything
would break down, because so many critical systems are
running on them.

Arne
 
Arne Vajhøj

Tell that to the many thousands of organizations that still use them.

And the majority of business transactions still runs on IBM mainframe
and midrange systems, and similar offerings from other companies.

IBM had just shy of $100B in sales last year. A good chunk of that was
from mainframes: mainframe sales were up 68% from 2009, to the best
level in six years.

The biggest chunk of IBM's revenue is services.

But they still sell a lot of big iron.

They don't publicize numbers at the OS level, but I would guess that
at least 10 B$ was mainframe HW & SW.

Arne
 