Reading LAST line from text file without iterating through the file?

Eric Sosman

You know, that's what you can expect when you are unpleasant, nasty, and
rude about things -- other people display a curious unwillingness to
listen to anything you have to say. An old adage comes to mind --
something about honey and vinegar?

(It doesn't help when your "counterexamples" are obscure formats used on
dinosaurian machines of yesteryear; the fact is that text files with CR/
LF line delimiters are standard on a set of operating systems that have
the overwhelming majority of the market share for such these days.)

Interesting. On the one hand he retreats from his earlier claim
that "ASCII" encoding is universal, and on the other he advances the
notion that CR/LF is The One True Delimiter. So, which hand advances
and which retreats? Is he spinning clockwise or counterclockwise?
Well, maybe his rotation will make him a sort of human eggbeater,
better at mixing the vinegar with the honey. (Ugh.)
 
Tom Anderson

Yes.

There are also count prefix (and sometimes suffix) formats.

Which, although they may be used to store text, are not text files.
They have the advantage of being able to actually have
all possible values in lines.

True. I wish we used more formats like this.

tom
 
Tom Anderson

The biggest chunk of IBM's revenue is services.

But they still sell a lot of big iron.

Do they actually sell them? What happened to the leasing model?

tom
 
Arne Vajhøj

Do they actually sell them? What happened to the leasing model?

Good question.

I don't know if they sell or lease them out.

IBM delivers boxes to customers and gets a ton of money
in return.

Arne
 
Arne Vajhøj

Which, although they may be used to store text, are not text files.

Of course they are text files.

If I edit Foobar.java in a text editor, write a Java program
and save it, then why should it be less of a text file just because
the record format used on that system is not delimited?

Arne
 
Martin Gregorie

I doubt that. He may have correctly pointed out that the vast *majority*
of computers were PCs at the time. (Now, laptops and smartphones may
have the slight edge, or perhaps even server blades, now that typical
servers are racks full of small computers instead of single big
computers.)

If he did claim they *all* were then he was an idiot.

He did and he was.
 
Arne Vajhøj

They are not, since files in record formats are not text files.

Given that:
data + LF
data + CR + LF
are also record formats, that is nonsense.
Other character sets mostly intersect in ASCII. Nearly all in any kind of
widespread use intersect in using characters 10 and 13 as the potential-
line-end characters. And "other record formats" are not relevant in a
discussion of text files, as has been explained already.

As has been proven not to be the case.

Arne
 
Arne Vajhøj

On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:

On 2/24/11 9:06 PM, Ken Wesson wrote:
[...]
Obsolete systems do not interest me.

then…

Since those days, the world has standardized on ASCII flat files for
text files.

LOL!

Windows text files are flat ASCII files (with CRLF line ends).

No.

They are CP-1252, UTF-8 or UTF-16.

All of which are ASCII++, for all intents and purposes.

This is an IT group.

Not a group for hairdressers or chefs.

This means that we use exact terms.

ASCII is a very well defined standard specified by ANSI and ISO.

There is no such thing as ASCII++.
Which are ASCII++, for all intents and purposes.


Nonsense. There are *at least* ten thousand PCs running Windows for every
one machine running one of those operating systems.

Ten thousand *PCs running Windows*.

The PC/mainframe ratio is probably like 100000:1.

But the relevance is not that big, because mainframes happen
to be a lot more expensive than PCs.
I work with what nearly everyone in the field works with these days: a
mix of Unix, MacOS, and Windows, mainly Unix server blades whose services
are accessed by mainly Windows desktop/netbook users with a smattering of
Mac users and a small but growing contingent of smartphone users.

Then you won't have any users using ASCII.

Arne
 
Martin Gregorie

If by "very common" you mean used on one in ten thousand or fewer of
their computers. For every single z/OS machine in corporate America
there are probably a thousand blade servers and ten thousand office PCs
and employer-provided laptops and God alone knows how many employee
smartphones with plans and/or handsets paid for by their company.
By that standard PCs, in which let's include desktops and laptops, are
also a tiny proportion of all computers once you count phones and
all the embedded computers in vehicles.

IMO it's a silly argument because very many PCs are used for only a small
part of the day and do very little apart from using electricity and
occasionally receiving and sending a few e-mails. A better measure is the
number of transactions and documents handled by each machine per year.
 
Arne Vajhøj

If you're counting it that way, that's 3 places. Hardly "lots". :)

I have news for you - the number of business entities in those
3 sectors is a lot higher than 3.

We already understand that you have no knowledge about businesses.

But I assume that you have seen a world map. Are you not aware
that other countries have public sectors?
See other posts. Perhaps a collected few tens of thousands of computers
using museum-worthy OSes like those versus a collected *billion* or more
of machines running Windows, MacOS, iOS, Android, and Unix.

There are also more flies than humans on earth.

That does not make flies more important.
If by "widely used" you mean on one in ten thousand or fewer computers.

But a lot more in revenue.
If by "very common" you mean used on one in ten thousand or fewer of
their computers. For every single z/OS machine in corporate America there
are probably a thousand blade servers and ten thousand office PCs and
employer-provided laptops and God alone knows how many employee
smartphones with plans and/or handsets paid for by their company.

And?

If a company buys a mainframe for 20 M$ and 10000 PCs for 10 M$,
then by spending it is 2/3 mainframe.
A somewhat scary thought, but hardly relevant unless you're trying to
stir up enough public alarm to foment a general movement to replace these
legacy systems with more modern ones.

It is relevant because the point is that most of the world's
important data is processed by mainframes.

Some claim 80% of all financial data is stored on mainframes.

Sure they can be replaced. 10-20 years and 10-20 trillion dollars.

Arne
 
Arne Vajhøj

How fortunate that i runs on fewer than one in ten thousand machines.
Does Java even run on i?
Yes.


Both contain ASCII as a subset -- if you take a pure-ASCII file and
reencode it in either the result is the identical byte sequence.

Yes, but that does not change that they do not use ASCII. They
use ISO-8859-1 or UTF-8.
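
The subset relationship itself is easy to check from Java. What follows is only a minimal sketch; the class name AsciiSubsetDemo and the sample strings are made up for illustration. It encodes the same text in US-ASCII, ISO-8859-1 and UTF-8 and compares the bytes: pure-ASCII text comes out identical in all three, while a single character outside 0-127 makes the encodings diverge.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiSubsetDemo {
    // True if the text encodes to the same bytes in US-ASCII, ISO-8859-1 and UTF-8.
    static boolean sameBytesInAll(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, latin1) && Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        System.out.println(sameBytesInAll("last line"));    // true: pure ASCII text
        System.out.println(sameBytesInAll("Vajh\u00f8j"));  // false: U+00F8 is outside ASCII
        // The same character takes 1 byte in ISO-8859-1 and 2 bytes in UTF-8.
        System.out.println("\u00f8".getBytes(StandardCharsets.ISO_8859_1).length); // 1
        System.out.println("\u00f8".getBytes(StandardCharsets.UTF_8).length);      // 2
    }
}
```

So the byte sequences agree only for as long as the content stays within 0-127; the file's declared encoding is still ISO-8859-1 or UTF-8.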

Arne
 
Martin Gregorie

They are not, since files in record formats are not text files.


Your personal opinions of others are not the topic of this newsgroup. Do
you have anything Java-related to say?


Other character sets mostly intersect in ASCII. Nearly all in any kind
of widespread use intersect in using characters 10 and 13 as the
potential-line-end characters. And "other record formats" are not
relevant in a discussion of text files, as has been explained already.
Bad argument: a text file contains records. They are variable length
records with a 'newline' encoding as the delimiter.

BTW, you can use C to handle iSeries text files through the usual gets()
and puts() functions despite the iSeries holding text in what are
effectively database rows. They have three fields per row - a line
number, a fixed length text field and an 8 byte ID. The latter is
equivalent to the way the last few columns of punched cards were often
used. I don't know why an OS/400 text file would need an ID field, but
it's there. The reason that C's standard text handling works on these
files is down to the standard library, which is written to inter-convert
between C's internal null delimited string representations of lines and
the external fixed field representation.

Getting back on topic, I haven't used Java on an OS/400 but it's available
and will almost certainly work the same way and, in addition, will
probably manage the mapping between EBCDIC and Unicode. It has to or it
would break WORA.
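
As a rough illustration of that EBCDIC/Unicode mapping, here is a small sketch that can be run on any platform. It assumes the JRE ships the "IBM037" charset (one common EBCDIC code page; support for it is optional, so the code checks first), and the class name EbcdicDemo is invented for the example. It says nothing about how an OS/400 JVM actually maps its file rows, only that the character conversion is ordinary Charset work.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EbcdicDemo {
    // "IBM037" is an assumption for illustration; it is not a charset every JRE must provide.
    static final String EBCDIC_NAME = "IBM037";

    static boolean ebcdicAvailable() {
        return Charset.isSupported(EBCDIC_NAME);
    }

    static byte[] toEbcdic(String s) {
        return s.getBytes(Charset.forName(EBCDIC_NAME));
    }

    static String fromEbcdic(byte[] bytes) {
        return new String(bytes, Charset.forName(EBCDIC_NAME));
    }

    public static void main(String[] args) {
        if (!ebcdicAvailable()) {
            System.out.println(EBCDIC_NAME + " is not available in this JRE");
            return;
        }
        byte[] ebcdic = toEbcdic("JAVA");
        byte[] ascii = "JAVA".getBytes(StandardCharsets.US_ASCII);
        // The byte values differ between the two encodings...
        System.out.printf("EBCDIC first byte: 0x%02X, ASCII first byte: 0x%02X%n",
                ebcdic[0] & 0xFF, ascii[0] & 0xFF);
        // ...but decoding with the right charset gives back the same String.
        System.out.println(fromEbcdic(ebcdic));
    }
}
```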
 
Michael Wojcik

Tom said:
Which, although they may be used to store text, are not text files.

And with what do you support your claim for this definition of "text
file"?

I hope it's something more solid than KW's flailing appeals to
"notion" and the like, which are unsupported by contemporary or
historical uses of the term "text", in the computing disciplines or
more broadly. Have you something better to offer?
 
Michael Wojcik

Ken said:
I doubt that. He may have correctly pointed out that the vast *majority*
of computers were PCs at the time. (Now, laptops and smartphones may have
the slight edge, or perhaps even server blades, now that typical servers
are racks full of small computers instead of single big computers.)

Embedded computers have a huge majority over all general-purpose
computers, by orders of magnitude, if we're counting CPUs. The line
between "smartphones" and other mobile phones is fuzzy; but among
computing devices that support at least some general-purpose
applications (as opposed to dedicated controllers), phones are far and
away in the majority, by number of CPUs.

In other words, wrong again, Ken.
 
Michael Wojcik

Ken said:
Calling other people names is hardly what I would call "generosity"

No, you wouldn't.
nor
is polluting a newsgroup with off-topic traffic.

Unlike polluting a newsgroup with ignorance and dull repetition, eh?
Your personal opinions of others are not the topic of this newsgroup.

Actually, they are. Check the charter.
Do you have anything Java-related to say?
Yes.


Ah, must be vendor lockin. Sucks to be them. Soon they'll be outcompeted
by newer, nimbler firms that use modern things like the free Unixes on
commodity hardware.

Yes, soon, no doubt. O glorious day, when we are ushered into the Age
of Wessonism! Free unicorns for all!
Of course, they might last a while if they can keep
convincing the government to give them "bailouts" or other protectionist
help in the face of competitors and their own screwups.

Careful, Ken - you'll short out your keyboard with all that spittle.
Still, your "thousands" of organizations are outweighed by the *hundreds*
of thousands that don't use such systems

No, they aren't. But do let us know when your cognitive abilities pass
beyond counting.

(On second thought - don't.)
have no bearing on this discussion, which has to do with the majority of
*computers*

No, it doesn't. You don't have the power to determine what the
discussion is about; it's about whatever the participants - all the
participants - decide to discuss.

I'm pleased to see that my prediction of your failure to learn was
right on the money.
 
Arne Vajhøj

Actually, "in general" tends to have some kind of implicit scope that is
usually less than "all platforms". For instance, when discussing a Java
solution, we can exclude platforms that Java doesn't run on.

True.

But Java does run on some of these platforms.
It probably runs on all platforms Java is normally used on. It certainly
runs on 99.99% or more of the machines anyone is likely to run Java on,

If you are counting machines: yes.

If you are counting dollars: no.
AND the remaining less than .01% are ones sufficiently oddball that their
operators will *know* to expect common crossplatform software to often
break on them. Typical C code using I/O will probably not work on such
machines without heavy modification, even C code that compiles and works
fine on every POSIX-compliant system and every Windows box and most other
machines.

C code, just like Java code, works if the code has well-defined
behavior according to the standard.

But this functionality is not guaranteed to work in C either.

fgetpos and fsetpos do not work on offsets but on an opaque type
that can contain more than an offset.

fseek and ftell work on offsets for binary files, but for text
files the position value is opaque.

POSIX/SUS then adds lseek, which will either work with
offsets or return an error.
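
For comparison, the byte-offset approach looks like this in Java. This is only a sketch under stated assumptions: the file is a plain stream of bytes with LF or CR LF terminators, and the encoding is one where byte 0x0A can only mean LF (ASCII, ISO-8859-1, UTF-8). It is not expected to be meaningful for UTF-16 files or for record-oriented files, which is exactly the portability caveat above. The class name LastLine and the method readLastLine are made up for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LastLine {
    // Returns the last line by seeking backwards from the end of the file.
    static String readLastLine(Path file, Charset charset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long end = raf.length();
            // Ignore one trailing LF or CR LF, if present.
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;
            // Walk backwards to the previous LF or to the start of the file.
            long start = end;
            while (start > 0 && byteAt(raf, start - 1) != '\n') start--;
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            return new String(buf, charset);
        }
    }

    // Reads the single byte at the given offset (slow, but keeps the sketch short).
    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lastline", ".txt");
        Files.write(tmp, "first\r\nsecond\r\nthird\r\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readLastLine(tmp, StandardCharsets.US_ASCII)); // third
        Files.delete(tmp);
    }
}
```

Reading one byte per seek is slow; a real version would read the tail of the file in blocks, but the offset logic would stay the same.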
Hell, these machines may not even be able to represent C source
trees normally, requiring the compiler vendor to jump through hoops and
requiring unusual tools and IDEs be used to hack C sources and not just
the system text editor.

Text editors are by definition able to create text files and source
code is text files.

Try to think logically.
Hell, I wouldn't be surprised if there were no
working C implementations on some of these systems

They do have C.
-- and I'd be
surprised if many, if any, of them ran Java at all, let alone had a fully
compliant JavaSE 5/6 implementation.

I am not surprised that you would be surprised - you don't seem to know
much about systems.

z/OS, i and OpenVMS all have certified Java versions.
On the contrary, whether software works on platforms that interest its
developer and user base is 100% relevant and whether it works on
platforms that *don't* interest its developer and user base is irrelevant.

No.

Not if the discussion is about general usage.

And it is bad Java programming to write code that only works
on some Java platforms even though the expectation is that the
program will only be used on platforms where it does work.
All ASCII supersets. Which means the common denominator among all those
is ... ta-da! ASCII. :)

That does not make them use ASCII.
And hardly anyone uses IBM mainframe (sic). What was that figure again?
0.01% of all computers?

I think the number was 80% of financial data.
:)
But many that use ASCII.

Very few.

Most support ASCII because they use something that
is compatible with ASCII.

Arne
 
Arne Vajhøj

Funny that something so "completely different" intersects with ASCII in
the entirety of ASCII's range (0-127). It just specifies what 128-255
mean instead of leaving those values undefined. Unicode specifies what
128-65535 mean and still intersects with ASCII on 0-127.

It is still a different char set.

Arne
 
Arne Vajhøj

Technically they are, since the various more recent standards they use
contain ASCII as a subset and generally reduce to ASCII if you strip the
high bit off (code pages) or the high byte and highest remaining bit (16-
bit encodings). So they are using ASCII and sometimes some additional
stuff that encloses and contains ASCII.

They are not using ASCII.

They are using something that is backwards compatible
with ASCII.

Arne
 
Arne Vajhøj

It contains ASCII as a subset.

So it is ASCII. And more.

That makes it not ASCII.

A Java 1.6 app is not a Java 1.2.2 app just because
some of the functionality was present in Java 1.2.2
as well.
Record formats are not relevant here, since text files do not have record
formats;

Lines are a record format.
they are raw sequences in some character set more or less by
definition. Anything with additional structure over and above that is
something other than a text file. Generically we call such things "binary
files" though commonly binary files do *contain* text. But all contain
additional structure that cannot be represented in, say, a
java.lang.String without resort to some form of escaping or encoding. And
that makes them not pure text, but text-and-some-other-stuff or some-
other-stuff-that-happens-to-contain-text.

Not true.

Which you can easily verify by having a Java program read
such a file: those lines are read fine into a String.
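
I can't reproduce a record-format file from z/OS or OpenVMS in a post, so the sketch below only shows the delimited-stream side of the same point: BufferedReader.readLine treats LF, CR LF and CR alike as record boundaries and hands back just the line content as a String. The class name LineRecords and the sample text are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineRecords {
    // Reads every line from the text; the terminators themselves are not returned.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Three different terminators, one result shape.
        System.out.println(readLines("one\ntwo\r\nthree\rfour")); // [one, two, three, four]
    }
}
```

The program never sees which delimiter was used, only the records, which is why the external representation of a line does not decide whether the file is text.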

Arne
 
