ifstream question

James Kanze · Feb 1, 2008

* Thomas J. Gritzan:

He must really like the loop and a half idiom, to use it when
there isn't a half a loop to begin with.

Actually, in this case, the code is just wrong. The test really
should be:
if ( ! infile ) break ;
If the file doesn't end with a '\n', then his code will not "Do
something with value" for the last value. (Of course, if a file
does not end with a '\n', it's not a legal text file. But most
of us would like to be a bit more tolerant, on systems which
allow it.)
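A minimal sketch of the difference, assuming the original loop tested something like good() after the read (the code under discussion is not quoted here, so that is a guess). The function names are hypothetical. Both functions use the same loop-and-a-half shape; only the exit test differs.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Exit only when the extraction itself failed (failbit or badbit).
std::vector<int> read_with_fail_test(std::istream& in) {
    std::vector<int> values;
    for (;;) {
        int value;
        in >> value;
        if (!in) break;            // equivalent to in.fail()
        values.push_back(value);   // "Do something with value"
    }
    return values;
}

// Exit as soon as any state bit is set, including eofbit alone.
std::vector<int> read_with_good_test(std::istream& in) {
    std::vector<int> values;
    for (;;) {
        int value;
        in >> value;
        if (!in.good()) break;     // also true when only eofbit is set
        values.push_back(value);
    }
    return values;
}
```

With the input "1 2 3" and no trailing newline, reading the 3 runs into end of file, so eofbit is set even though the value was read successfully. The good() version then drops that last value; the fail test keeps it.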

As a general rule, any time you see something other than what is
in the FAQ, you can probably assume that the person who wrote it
isn't familiar with iostreams.

The example in the FAQ is not as ugly as this code.

The FAQ code,

int i = 0;
while (std::cin >> x) { // RIGHT! (reliable)
++i;
// Work with x ...
}

is apparently simpler and shorter, because it uses implicit
conversion, it uses a side-effect based expression, it leaves
out a declaration, it exposes local variable to outside code,
and it doesn't check for errors.

It's more than apparently simpler, but...

The most important reason to use it is because it is the
standard idiom. I don't particularly like the side effects in a
condition either, but in this case, the idiom is so ubiquitous as
to cause questions to be raised when it isn't used. If it
weren't for this, I'd write:

int i ;
std::cin >> i ;
while ( std::cin ) {
// ...
std::cin >> i ;
}

You might prefer:

for (;;) {
int i ;
std::cin >> i ;
if ( ! std::cin ) {
break ;
}
// ...
}

(even if in this case, it's a bit longer.) But that's not the
issue here. I don't write it like that, however, because as I
said above---any time you see something other than what is in
the FAQ... Most experienced programmers, on seeing such code,
would start by asking if I understood iostream, or if he knew
better, by asking what was special here that caused me to break
with the standard idiom.

Judging from discussions here and elsewhere, few programmers
understand what that code /actually does/ -- and doesn't do.

Judging from discussions here and elsewhere, few programmers
know iostream very well at all. (Not that that stops them from
criticising it.)

To check own understanding, what does (std::cin >> x) as
condition, actually check?

failbit and badbit. The only two bits which are relevant with
regards to the success of an operation.
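A small sketch to check this against a string stream (the function name is made up for illustration). It reads a number that is immediately followed by end of input, so eofbit ends up set while the stream still converts to true.

```cpp
#include <sstream>

// Returns true if the stream tests true after a read that hit end of
// input, i.e. the condition ignores eofbit and looks only at fail().
bool condition_ignores_eofbit() {
    std::istringstream in("42");     // no trailing white space
    int x = 0;
    bool ok = static_cast<bool>(in >> x);
    return ok            // the condition is true ...
        && in.eof()      // ... although eofbit is set ...
        && !in.fail()    // ... because neither failbit nor badbit is
        && x == 42;
}
```

fail() itself returns true when either failbit or badbit is set, which is why testing the stream covers both bits.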

A more important question is what actually might cause it to be
false. And in a lot of cases, how can you distinguish what the
real reason was, and respond accordingly. And while, unlike
you, I rather like iostream, I'll have to admit that error
reporting is not its strong point; I can construct cases where
you cannot correctly determine the real reason.
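The usual heuristic for sorting out the reason after a failed read looks something like the sketch below. ReadResult and read_int are hypothetical names, and this is only the common approximation, not a reliable classification.

```cpp
#include <istream>
#include <sstream>

enum class ReadResult { ok, io_error, end_of_input, format_error };

// Try to read one int and guess why the read failed, if it did.
ReadResult read_int(std::istream& in, int& value) {
    if (in >> value) return ReadResult::ok;
    if (in.bad())    return ReadResult::io_error;      // badbit: serious stream error
    if (in.eof())    return ReadResult::end_of_input;  // failed and saw end of input
    return ReadResult::format_error;                   // failbit only: bad text
}
```

The order of the tests matters, and the eof() case is where it gets murky: a failed read that also ran into end of input (for instance, text ending in a lone sign character) will typically come back as end_of_input even though the data was malformed.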

James Kanze · Feb 1, 2008

[ ... ]

BUT: O, beautiful synergy: as it happens, the first file
line's three numbers (the rest have five) are not intended
to go into my x,y,z,r,c database, so all I had to do was
read these three and their two char type commas (but _not_
read the newline) _outside_ of the loop, which then gave the
loop exactly what it expected (a char first --the \n-- in
the loop-read) when I went inside the loop to fill the
database with numbers. IOW, it was merely a matter of
which, a number or a character, I read first in the loop.
If the number first, it missed the last number, and if the
char first, it missed the whole file. (I could also have
forced my file_writer_ to stick a comma into position 0 in
the file, but that'd have been a major kludge; ugly and
obviously a very clumsy workaround). Thanks for the more
info, Andy (and Alf), but this seems to have been a case of
a neophyte (me) doing something _so_ obvious and stupid that
the experts here (you included) didn't see what sheer idiocy
I had perpetrated upon myself. (*grin*)


I wouldn't be quite so hard on yourself -- things like this
really can be a pain for almost anybody to get right.


There are other ways that can be a bit easier though. For one
example, when you read numbers from a stream, the stream uses
a locale to actually read the numbers, as well as to classify
the other characters that get read from the stream. When you
read numbers, it treats white- space as delimiters between the
numbers


You're telling someone who's having problems grokking streams to
write a locale!

Actually, I rather find that sort of use of locales an
abuse---the character class is called "space", and not
"separator". I'd much rather write some sort of manipulator to
handle separators. Much more flexible, and much more accurate.
Done correctly, for example, it wouldn't allow multiple
commas to be treated as a single separator. Something like:

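std::istream&
separ(
    std::istream& source )
{
    source >> std::ws ;             // skip leading white space
    if ( source.peek() == ',' ) {
        source.get() ;              // consume at most one comma
    }
    return source ;                 // a manipulator must return its stream
}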

This allows either white space alone or a comma with optional
white space to be treated as a separator, e.g.:

std::cin >> x >> separ >> y >> separ >> z ;

will accept "1 2 3", "1, 2 3", "1 2 , 3", etc.
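For anyone who wants to try this on strings rather than on std::cin, here is a self-contained sketch. It repeats the manipulator in full, and parse_triple is a hypothetical helper added only for the demonstration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Separator manipulator: optional white space, then at most one comma.
std::istream& separ(std::istream& source) {
    source >> std::ws;
    if (source.peek() == ',') {
        source.get();
    }
    return source;
}

// Parse three ints separated by white space and/or single commas.
// Returns false if the text does not match.
bool parse_triple(const std::string& text, int& x, int& y, int& z) {
    std::istringstream in(text);
    return static_cast<bool>(in >> x >> separ >> y >> separ >> z);
}
```

Because separ consumes at most one comma, an input such as "1,,2 3" is rejected: the second comma is still in the stream when the next number is read, and that extraction fails.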

Alf P. Steinbach · Feb 1, 2008

* James Kanze:

He must really like the loop and a half idiom, to use it when
there isn't a half a loop to begin with.

Actually, in this case, the code is just wrong. The test really
should be:
if ( ! infile ) break ;
If the file doesn't end with a '\n', then his code will not "Do
something with value" for the last value. (Of course, if a file
does not end with a '\n', it's not a legal text file. But most
of us would like to be a bit more tolerant, on systems which
allow it.)

Yes, I included a disclaimer and also good() versus !fail() has already
been discussed extensively both in this thread and earlier threads.

I just refuse to go around remembering arcane details of iostreams.

They're that ugly.

As a general rule, any time you see something other than what is
in the FAQ, you can probably assume that the person who wrote it
isn't familiar with iostreams.

It's more than apparently simpler, but...

The most important reason to use it is because it is the
standard idiom. I don't particularly like the side effects in a
condition either, but in this case, the idiom is so ubiquitous as
to cause questions to be raised when it isn't used. If it
weren't for this, I'd write:

int i ;
std::cin >> i ;
while ( std::cin ) {
// ...
std::cin >> i ;
}

You might prefer:

for (;;) {
int i ;
std::cin >> i ;
if ( ! std::cin ) {
break ;
}
// ...
}

(even if in this case, it's a bit longer.) But that's not the
issue here. I don't write it like that, however, because as I
said above---any time you see something other than what is in
the FAQ...

Let's not elevate the FAQ to infallible authority.

Both you and I have contributed to the FAQ (at least I think you have, I
know I have).

So saying our own statements and viewpoints etc. are godlike
authoritative just because they have ended up in the FAQ, it would be a
very circular argument, and besides, in addition to much great
advice, the FAQ also contains some dubious statements and examples.

Most experienced programmers, on seeing such code,
would start by asking if I understood iostream, or if he knew
better, by asking what was special here that caused me to break
with the standard idiom.

Some idioms are good.

Some are not.

I find it most clear when the textual sequence in the code is the same
as the sequence the operations will occur in, and when there are no or
very few side-effects: a one-to-one mapping from idea to code and back.

The problem is mostly that C++ does not support, syntactically, loops
with exit in middle.

It's the same as C++ not supporting modules: we have to implement them
using rather primitive language constructs, essentially compiling our
ideas from an internal high-level language down to C++, and then to
understand the code, decompiling.

Idioms can help in that translation process, but some idioms hinder that
process -- much like people went on using MS-DOS just because others
did (hey, can't be bad then) and it worked sufficiently to do things.

Judging from discussions here and elsewhere, few programmers
know iostream very well at all. (Not that that stops them from
criticising it.)

failbit and badbit. The only two bits which are relevant with
regards to the success of an operation.

A more important question is what actually might cause it to be
false. And in a lot of cases, how can you distinguish what the
real reason was, and respond accordingly. And while, unlike
you, I rather like iostream, I'll have to admit that error
reporting is not its strong point; I can construct cases where
you cannot correctly determine the real reason.

Cheers,

- Alf

Jerry Coffin · Feb 1, 2008

[ ... ]

You're telling someone who's having problems grokking streams to
write a locale!

Yes -- I wasn't paying much (any, really) attention to who it was when I
wrote that. Under the circumstances, it was a poor suggestion.

Actually, I rather find that sort of use of locales an
abuse---the character class is called "space", and not
"separator".

True, but irrelevant. From the iostream's viewpoint, anything the locale
calls a "space" is (quite consistently) used as a "separator". This is
true not only when extracting numbers, but also strings, and so on.

As such, this is a perfectly valid and reasonable use of a locale.
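For reference, the locale approach can be sketched roughly as follows. The facet name comma_is_space and the helper read_all are hypothetical; the technique is to derive from std::ctype<char>, copy the classic table, and add the space bit to ','.

```cpp
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// A ctype<char> facet whose classification table also marks ',' as space.
class comma_is_space : public std::ctype<char> {
public:
    comma_is_space() : std::ctype<char>(make_table()) {}

private:
    static const mask* make_table() {
        // Copy the classic "C" table once, then flag the comma.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        return &table[0];
    }
};

// Read every int from text, treating commas like white space.
std::vector<int> read_all(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet.
    in.imbue(std::locale(in.getloc(), new comma_is_space));
    std::vector<int> values;
    int v;
    while (in >> v) {
        values.push_back(v);
    }
    return values;
}
```

With this facet, any run of commas and white space counts as one separator, so "1,,2" reads as two values with no empty field between them.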

I'd much rather write some sort of manipulator to
handle separators. Much more flexible, and much more accurate.

"Accurate" means it adheres more closely to some defined standard. In
this case, the only standard provided was his original description of
the file format, to which this adhered _perfectly_. As such, no higher
degree of accuracy is possible.

Done correctly, for example, it wouldn't allow multiple
commas to be treated as a single separator.

This is a perfectly reasonable way of fulfilling that requirement IF IT
EXISTS. Claiming that it's more accurate is simply a falsehood in this
case. It's not really more flexible either -- it simply fulfills a
different set of requirements. On one hand, if you want a series of
separators to be translated as a series of empty fields, it fulfills
that requirement quite nicely. OTOH, it makes it much more difficult to
meet a requirement that an arbitrary number of separators can be used to
separate any two fields.

Your solution would be extremely good if it was required -- but at least
according to the OP, no such requirement exists.

John Brawley · Feb 1, 2008

Jerry Coffin said:
[ ... ]

You're telling someone who's having problems grokking streams to
write a locale!


Yes -- I wasn't paying much (any, really) attention to who it was when I
wrote that. Under the circumstances, it was a poor suggestion.

Actually, I rather find that sort of use of locales an
abuse---the character class is called "space", and not
"separator".


True, but irrelevant. From the iostream's viewpoint, anything the locale
calls a "space" is (quite consistently) used as a "separator". This is
true not only when extracting numbers, but also strings, and so on.

As such, this is a perfectly valid and reasonable use of a locale.

I'd much rather write some sort of manipulator to
handle separators. Much more flexible, and much more accurate.


"Accurate" means it adheres more closely to some defined standard. In
this case, the only standard provided was his original description of
the file format, to which this adhered _perfectly_. As such, no higher
degree of accuracy is possible.

Done correctly, for example, it wouldn't allow multiple
commas to be treated as a single separator.


This is a perfectly reasonable way of fulfilling that requirement IF IT
EXISTS. Claiming that it's more accurate is simply a falsehood in this
case. It's not really more flexible either -- it simply fulfills a
different set of requirements. On one hand, if you want a series of
separators to be translated as a series of empty fields, it fulfills
that requirement quite nicely. OTOH, it makes it much more difficult to
meet a requirement that an arbitrary number of separators can be used to
separate any two fields.

Your solution would be extremely good if it was required -- but at least
according to the OP, no such requirement exists.

The OP (me) has been watching, learning, and using/trying the snippets.

There are me-pertinent situations in which any of the several methods would
be preferred. For example, I have several hard-won Python programs which
both spit and read comma-delimited files, so if I use the shortest form (I
tend to like short), which reads space-delimited lines (and also ignores
newlines, an added plus), I break the Python version's filereaders
(my visuals are all in the Python; no C++/OpenGL visuals in the C++
version yet).

On the other hand, a reader that will take a file with 'confused'
delimitings will make the C++ version compatible with those files and others
not of my own making, while still simplifying the codewriting. On yet
another hand, I can go back and alter all of the Python file-readers and
writers to use the shortest/sweetest of these filereading methods, making
everything compatible with everything else, and still have short/sweet/terse
filereaders in the C++ version, OR I can ignore the Python tools altogether
and write the OpenGL visuals into the C++ version, using the densest
possible filereading code and write my files to suit it.

I note all this to point up the fact that for such a minor issue, so many
different potential solutions exist amongst y'all, that it's like
approaching a smorgasbord for me: I can pick whichever method I like best
and use that, while anybody else reading the thread can learn oodles about
all of this.

You are a resource of inestimable value.
The amount of trouble I ran into --info-seeking on the 'net and such--
trying to make this filereader (when I had had nowhere near the same amount
of trouble doing the same in Python) almost turned me off to what I was
trying to do. Here, though, your comments and discussion pointed repeatedly
right TO the answers that I needed.

Feel good, guys. You _are_ good.
Thanks.

James Kanze · Feb 1, 2008

[ ... ]

I'd much rather write some sort of manipulator to
handle separators. Much more flexible, and much more accurate.


"Accurate" means it adheres more closely to some defined standard. In
this case, the only standard provided was his original description of
the file format, to which this adhered _perfectly_. As such, no higher
degree of accuracy is possible.

Accurate may not be the right word, but it didn't seem to me
that he wanted to have more than one comma handled as a single
separator. It looked more like he was dealing with a CSV
format.

This is a perfectly reasonable way of fulfilling that
requirement IF IT EXISTS. Claiming that it's more accurate is
simply a falsehood in this case. It's not really more flexible
either -- it simply fulfills a different set of requirements.

The general approach is more flexible, in that it allows for a
greater degree of variation in defining your separators (and
signaling an error if a separator doesn't conform).

On one hand, if you want a series of separators to be
translated as a series of empty fields, it fulfills that
requirement quite nicely. OTOH, it makes it much more
difficult to meet a requirement that an arbitrary number
of separators can be used to separate any two fields.

It's pretty trivial to add a loop on the comma.
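A sketch of what that loop might look like. The name separs is hypothetical, chosen to keep it apart from the single-comma version.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Separator manipulator that accepts any number of commas, each
// optionally surrounded by white space.
std::istream& separs(std::istream& source) {
    source >> std::ws;
    while (source.peek() == ',') {
        source.get();        // consume this comma
        source >> std::ws;   // and any white space after it
    }
    return source;
}

// Hypothetical helper for trying the manipulator on a string.
bool parse_triple_loose(const std::string& text, int& x, int& y, int& z) {
    std::istringstream in(text);
    return static_cast<bool>(in >> x >> separs >> y >> separs >> z);
}
```

This gives up the ability to see empty fields, which is the trade-off being argued over: which version is right depends on what the file format actually specifies.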

Your solution would be extremely good if it was required --
but at least according to the OP, no such requirement exists.

That wasn't the way I interpreted what he said, but of course,
he didn't present an exact specification. In the end, it
probably depends on the wording in the specification: if the
specification says that commas are to be treated as white space,
then a new locale is a reasonable solution; if the specification
defines separators, then I think a manipulator which eats
separators is more appropriate. (I also think that a beginner
will reach the level of being able to write such manipulators
much sooner than being able to write a new locale. I know that
I have problems when I need to implement a new locale, and I'm
hardly a beginner.)

James Kanze · Feb 1, 2008

* James Kanze:

[...]

I just refuse to go around remembering arcane details of iostreams.

They're that ugly.

The naming conventions and the error handling are bad, I'll
grant you that. And of course, IF you use the "standard" idiom,
you don't have to remember them.

[...]
[...]

Let's not elevate the FAQ to infallible authority.

Both you and I have contributed to the FAQ (at least I think
you have, I know I have).

If it were only the FAQ, and other experts in iostream were
saying other things... But all of the people I know who are
really competent in iostream use the same idiom.

So saying our own statements and viewpoints etc. are godlike
authoritative just because they have ended up in the FAQ, it
would be a very circular argument, and besides, in
addition to much great advice, the FAQ also contains some
dubious statements and examples.

I know. (As it happened, this idiom predates the time I learned
C++, so I can't claim any responsibility for it.)

Some idioms are good.

Some are not.

More precisely: the fact that something is an established idiom
is a positive point for it. Other, negative points, can
outweigh that. In this case, although I'd have done something
different if the idiom didn't exist, the bad points of the idiom
are small enough that they don't outweigh the advantage of using
the idiom.
