Why does my simple while loop printf 2x?


BartC

Ben Bacarisse said:
Fine. I do it the other way unless there is a compelling reason not to,
but that is because pretty much all the parsers I've written want to
ignore newlines -- they operate on token streams. Reading lines would
just add an irrelevant detail.

(The parsers I write now tend to read an entire file into memory (although
even that is apparently also fraught with problems if you read this
newsgroup too much).

Newlines are significant in the source syntax, but have a more indirect
effect on the resulting token stream. This kind of parser (for language
source) is for a stricter, more precise syntax than might be used for
processing command-line input.

A few years ago I might have pulled in the source code a line at a time, but now
the largest file I might typically want to process would occupy around
0.0025% of the installed ram I happen to have in my PC.

Of course you can't read a file-at-a-time for interactive input;
line-at-a-time is the next best thing, which exactly matches the
line-buffering used by the language runtime.)
 
Ian Collins

Ben said:
Fine. I do it the other way unless there is a compelling reason not to,
but that is because pretty much all the parsers I've written want to
ignore newlines -- they operate on token streams. Reading lines would
just add an irrelevant detail.

Which just goes to show there is no correct answer!

The stuff I have been doing recently is parsing the output of command
line tools, which obviously is designed to be human readable so line
breaks are significant. I also have parsers for JSON and XML, which
operate on token streams and are character based.
 
Keith Thompson

BartC said:
An input error where? You're now mandating that an input file *must* end
with a newline otherwise Y<newline> or N<newline> will never occur. With
line input, these untidy details can be taken care of in one central
location, which can always deliver a line buffer with a well-defined ending
(0 in my case, perhaps always \n in others), instead of having to worry
about it in a dozen places.

Sorry, I thought you were saying that would be an input error.
If you're going to worry about that all the time, then you will never get
anywhere.

If you're going to worry about that all the time, you just might write
robust code that doesn't misbehave in the presence of unexpected input.
But you could do everything just right, and the OP's code could still be
given input that consists of a hundred billion lines all containing "?\n"
(until perhaps the last one which might have Y or N). It'll work, but might
take a few years to complete. (And I'm still uncertain about being tolerant
about all those "?" lines by allowing another chance to get it right, in a
situation where that is clearly inappropriate because there is no human to
take heed of the error message.)

You can define whatever input requirements you like (or rather, the OP
or his instructor can). Different requirements call for different code.
For some requirements, you need to store each entire line in memory; for
others, you don't.
 
Keith Thompson

BartC said:
(The parsers I write now tend to read an entire file into memory (although
even that is apparently also fraught with problems if you read this
newsgroup too much).
[...]

I think it's safe to say that any problems with reading entire files
into memory are entirely independent of whether you read this newsgroup.
 
Ike Naar

// if expecting n and get n or blank, return false
if ((yn.compare(0, 1, "n") == 0 && str.compare(0, 1, "") == 0) ||
str.compare(0, 1, "n") == 0)
return false;
else
return false;

This can be simplified to

return false;
 
BartC

Keith Thompson said:
BartC said:
(The parsers I write now tend to read an entire file into memory
(although
even that is apparently also fraught with problems if you read this
newsgroup too much).
[...]

I think it's safe to say that any problems with reading entire files
into memory are entirely independent of whether you read this newsgroup.

I'm sure it's only in this newsgroup that the most unlikely problems are
highlighted.

For example, you can't read a 20-line configuration file into memory without
worrying about how to obtain the size of the file, whether it will fit into
memory, whether the file's size will change between determining the size,
and reading it in (so you can only read it safely a character at a time)
etc.

In fact I don't know if there's any other language group that would
seriously discuss the possibility of exhausting memory (on a machine that
has more memory than existed in the whole world a few decades ago) while
reading a simple response from a keyboard!
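
The worries listed above (obtaining the size, fitting in memory, the size changing between the check and the read) can be sidestepped by not asking for the size at all and reading in chunks until end-of-file. A minimal sketch in C, assuming a hypothetical helper named slurp and a caller-chosen max_size limit, neither of which comes from this thread:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining on `in` into a malloc'd
   buffer, growing it as needed. The file size is never queried, so it does
   not matter whether the size changes while reading.
   Returns NULL on allocation failure, on a read error, or if the input
   exceeds max_size bytes. The result is NOT NUL-terminated; its length is
   stored in *out_len. The caller frees the buffer. */
char *slurp(FILE *in, size_t max_size, size_t *out_len)
{
    assert(in != NULL && out_len != NULL);

    size_t cap = 4096;              /* initial capacity, an arbitrary choice */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        len += fread(buf + len, 1, cap - len, in);
        if (ferror(in) || len > max_size) {
            free(buf);
            return NULL;            /* read error or documented limit exceeded */
        }
        if (len < cap)
            break;                  /* short read without error: end-of-file */
        if (cap > SIZE_MAX / 2) {
            free(buf);
            return NULL;            /* doubling would overflow size_t */
        }
        cap *= 2;
        char *bigger = realloc(buf, cap);
        if (bigger == NULL) {
            free(buf);
            return NULL;            /* out of memory: fail, do not crash */
        }
        buf = bigger;
    }
    *out_len = len;
    return buf;
}
```

For a 20-line configuration file a call such as slurp(fp, 1 << 20, &len) would be typical; the limit is what turns "garbage in" into a reported error rather than memory exhaustion.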
 
Ben Bacarisse

BartC said:
In fact I don't know if there's any other language group that would
seriously discuss the possibility of exhausting memory (on a machine that
has more memory than existed in the whole world a few decades ago) while
reading a simple response from a keyboard!

People intent on causing trouble are looking for code written with that
point of view.
 
Keith Thompson

Ben Bacarisse said:
People intent on causing trouble are looking for code written with that
point of view.

Exactly. Code that expects keyboard input but crashes when fed a
large file, the contents of /dev/zero (which acts like an infinitely
large file full of zero bytes), or when your cat sits on the keyboard
is unacceptably fragile, regardless of what language it's written in.
 
Osmium

Keith Thompson said:
Exactly. Code that expects keyboard input but crashes when fed a
large file, the contents of /dev/zero (which acts like an infinitely
large file full of zero bytes), or when your cat sits on the keyboard
is unacceptably fragile, regardless of what language it's written in.

Would you post your suggested solution for the infinitely large file of zero
bytes?
 
BartC

Keith Thompson said:
Exactly. Code that expects keyboard input but crashes when fed a
large file, the contents of /dev/zero (which acts like an infinitely
large file full of zero bytes), or when your cat sits on the keyboard
is unacceptably fragile, regardless of what language it's written in.

On my machine it would take about 5 years to fill up the memory if the cat
sits on the keyboard. The chances are something else will happen before
then.

I don't think anyone is suggesting programs should just crash. Just that
they ought to specify certain operating limits. At least then no-one will be
waiting around for years before noticing something is amiss.

And if someone wants to cause trouble by somehow arranging for /dev/zero to
be the input to a program expecting a certain response (and will therefore
go into an endless loop), what can the programmer do about it? Being
intolerant of abnormally long inputs can be one way of dealing with it.
 
Ben Bacarisse

Osmium said:
Would you post your suggested solution for the infinitely large file of zero
bytes?

For the case that started all this (reading an interactive response), I
think he has. If you have some other case in mind, the details matter.
There is no universal solution.
 
Osmium

"Ben Bacarisse" wrote:
For the case that started all this (reading an interactive response), I
think he has. If you have some other case in mind, the details matter.
There is no universal solution.

I was hoping to get his very favorite answer. One of his answers was to
redefine it away.
 
Ben Bacarisse

BartC said:
On my machine it would take about 5 years to fill up the memory if the
cat sits on the keyboard. The chances are something else will happen
before then.

I don't think anyone is suggesting programs should just crash. Just
that they ought to specify certain operating limits. At least then
no-one will be waiting around for years before noticing something is
amiss.

And if someone wants to cause trouble by somehow arranging for
/dev/zero to be the input to a program expecting a certain response
(and will therefore go into an endless loop), what can the programmer
do about it?

By writing the simplest and most obvious bit of code! There is no
actual programming problem here. An endless loop is not a problem.
Putting the data from an endless loop into some buffer is a problem.

Look at what you said:

In fact I don't know if there's any other language group that would
seriously discuss the possibility of exhausting memory (on a machine
that has more memory than existed in the whole world a few decades
ago) while reading a simple response from a keyboard!

If the program does not consume memory, no one cares. You seemed to
object to the absurdity of considering the possibility as if it either
could not happen, or was never a problem when it did. You are clearly
mocking the very idea of discussing the issue in these days of
multi-GB machines. Why is it unreasonable to caution people about
writing code that can have these issues?

It was in C++, but someone did post a problematic bit of code. You can
trivially re-write it to avoid the problem. That seems to me to be an
entirely reasonable exchange.

Being intolerant of abnormally long inputs can be one way
of dealing with it.

Yes. How do you do it?
 
Ben Bacarisse

Osmium said:
"Ben Bacarisse" wrote:


I was hoping to get his very favorite answer. One of his answers was to
redefine it away.

I don't know what this means, but the code was posted. Was it not what
you expected or wanted?
 
James Kuyper

According to this post from Keith, there might not be a '\n' character
following:

The original code never stopped for any reason other than the arrival of
the desired input, so I presume he'd want to continue doing the same
thing even if he changed what the desired input was. That makes it
trivial to deal with if you're not trying to store the entire line in
one buffer:

If the first character after a 'Y' or an 'N' is an '\n', one of the
desired patterns has been read, and the program should break out of the
loop. Otherwise, just keep reading and discarding one character at a
time until a '\n' is read, and then resume scanning for one of the
desired patterns. If there's never a following '\n', then it will keep
reading indefinitely, which seems consistent with the design of the
original program. Personally, I'd recommend exiting the program if
there's an I/O error or end-of-file is detected - doing so won't
complicate the code significantly.
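
The scan described above can be sketched in C roughly as follows. This is one reading of the description, not the code posted earlier in the thread; the function name read_yes_no is made up for illustration, and it assumes the 'Y' or 'N' must be the first character of its line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 'Y' or 'N' once a line consisting of exactly
   that one character has been read, or EOF on end-of-file or I/O error.
   No line is ever stored, so line length does not matter. */
int read_yes_no(FILE *in)
{
    assert(in != NULL);
    for (;;) {
        int first = getc(in);
        if (first == EOF)
            return EOF;
        if (first == '\n')
            continue;                   /* empty line: resume scanning */

        int next = getc(in);
        if ((first == 'Y' || first == 'N') && next == '\n')
            return first;               /* one of the desired patterns */

        /* Anything else: read and discard up to the end of the line. */
        while (next != '\n') {
            if (next == EOF)
                return EOF;
            next = getc(in);
        }
    }
}
```

A caller would typically use read_yes_no(stdin) and exit when it returns EOF. Fed an endless stream with no newline, the function loops forever but its memory use stays constant.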
 
Keith Thompson

Osmium said:
Would you post your suggested solution for the infinitely large file of zero
bytes?

My suggested solution is not crashing.

I have nothing more specific than that. The specific behavior of a
given program when given overly large input depends on the requirements
for that program.

In some cases, it might be acceptable to say that the behavior is
undefined (not necessarily in the specific way the C standard uses the
phrase "undefined behavior"). In such cases, crashing might be
acceptable.
 
Keith Thompson

BartC said:
On my machine it would take about 5 years to fill up the memory if the
cat sits on the keyboard. The chances are something else will happen
before then.

I don't think anyone is suggesting programs should just crash. Just
that they ought to specify certain operating limits. At least then
no-one will be waiting around for years before noticing something is
amiss.

Unix has a number of utilities designed to process text, where "text"
consists of a sequence of lines, each line terminated by a newline
character. "grep" is a good example.

Many of the original Unix implementations imposed arbitrary limits on
the length of a line, and would misbehave in various ways if fed input
that exceeded those limits.

One of the changes made by the GNU replacements for those utilities (see
the "coreutils" package) is to remove those limits, allowing input lines
to be arbitrarily long, subject only to memory resources.

I find that extremely useful. Text is not limited to human-readable
text. In some contexts, it can be perfectly reasonable to have a text
file with lines thousands, or even millions, of characters long. I don't
want my text-processing tools telling me that I don't *need* lines
longer than, say, 1024 characters.

And if someone wants to cause trouble by somehow arranging for
/dev/zero to be the input to a program expecting a certain response
(and will therefore go into an endless loop), what can the programmer
do about it? Being intolerant of abnormally long inputs can be one way
of dealing with it.

I just tried it:

$ grep hello /dev/zero
grep: /dev/zero: Cannot allocate memory

It failed because grep *does* have to store an entire line in memory.
The program we've been discussing, which reads single-character 'Y' or
'N' input, does not.

If you want to read entire lines into memory, and impose some limit on
how long those lines can be *and* document the program's behavior when
that limit is exceeded, you can certainly do so. But it's not necessary
in the case we're discussing. With the code I posted, you don't need to
set a limit on the length of a line. On the other hand, if you decide
to do so anyway, you can add a few lines of code to count characters as
they're entered.

Storing entire lines creates problems. There are several ways to solve
those problems. One way is to avoid them by not storing entire lines.
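
Counting characters as they are read, without storing the line, might look like the sketch below. It is not the code referred to above; the names read_yes_no_limited, max_line, and the enum values are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum answer { ANSWER_YES, ANSWER_NO, ANSWER_EOF, ANSWER_TOO_LONG };

/* Hypothetical helper: scans lines one character at a time, counting the
   characters on each line. Returns ANSWER_TOO_LONG as soon as a line
   exceeds max_line characters (newline not counted), so the limit is
   enforced without any line buffer. */
enum answer read_yes_no_limited(FILE *in, size_t max_line)
{
    assert(in != NULL);
    for (;;) {
        size_t count = 0;   /* characters seen so far on this line */
        int first = EOF;    /* first character of this line, if any */
        int c;

        while ((c = getc(in)) != EOF && c != '\n') {
            if (count == 0)
                first = c;
            if (++count > max_line)
                return ANSWER_TOO_LONG;
        }
        if (c == EOF)
            return ANSWER_EOF;          /* end-of-file or I/O error */
        if (count == 1 && first == 'Y')
            return ANSWER_YES;
        if (count == 1 && first == 'N')
            return ANSWER_NO;
        /* Any other line: ignore it and scan the next one. */
    }
}
```

What the program does on ANSWER_TOO_LONG (report and exit, or skip to the next newline and retry) is a requirements decision that would need documenting either way.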
 
BartC

Ben Bacarisse said:
By writing the simplest and most obvious bit of code! There is no
actual programming problem here. An endless loop is not a problem.
Putting the data from an endless loop into some buffer is a problem.

I haven't specified how the data will be put into a buffer. You would expect
any code that did that to be aware the buffer is of a finite size. For
example, using fgets().

Look at what you said:

In fact I don't know if there's any other language group that would
seriously discuss the possibility of exhausting memory (on a machine
that has more memory than existed in the whole world a few decades
ago) while reading a simple response from a keyboard!

If the program does not consume memory, no one cares.

The choice seems to be between using next to no memory (your preference),
using a small line buffer of perhaps 2000 bytes on a machine that might have
multiples of 1000000000 bytes (my preference), or potentially using up all
the available memory (which, for some reason, some see as a consequence of
using a line-buffered solution, perhaps because they don't like the idea of
not coping with lines that might have unreasonably long lengths).

You seemed to
object to the absurdity of considering the possibility as if it either
could not happen, or was never a problem when it did. You are clearly
mocking the very idea of discussing the issue in these days of
multi-GB machines.

No. I just would never let it get to that point where the memory capacity
would be under threat from something so trivial. It is desirable from a
coding point of view, especially programming at a higher level (from
languages such as Python for example), to easily iterate through lines in a
file. That requires that for each iteration you are presented with a string
containing the contents of the line.

At this level, you don't want to mess about with the character-at-a-time
treatment that has been discussed. You read the line, and it should just
work. The underlying system (most likely a C implementation) should make
sure it behaves as expected.

It is entirely reasonable to expect a multi-GB machine to have enough
capacity for a string containing /one line/ of a text file. It is not
reasonable to compromise these expectations, because of the rare possibility
that someone will feed it garbage (ie. a file that is clearly not a
line-oriented text file). Deal with that possibility, yes, but don't throw
out the baby too.

Yes. How do you do it?

Raising an error is one way. The probability is that something /is/ wrong,
but the requirement to have to find space to store such inputs means such
checks will be in place. They might not be, with a solution that just scans
characters from a stream, but then effectively hangs because it is given a
series of billion-character lines to deal with.

(Although my main objection to the character solution is that it is at a
lower-level than line-based ones. If you are using line-based input in the
application anyway, you don't want to mix it up with low-level access.)
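
A line-based reader that uses a small fixed buffer and raises an error on over-long lines could be sketched with fgets() as below. The names read_line and line_status are hypothetical, and the buffer size is left to the caller (the "perhaps 2000 bytes" mentioned above would be one choice).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_EOF, LINE_TOO_LONG };

/* Hypothetical helper: reads one line into buf (at most size - 1 bytes)
   and strips the trailing newline. Returns LINE_TOO_LONG if no newline was
   found and the stream is not at end-of-file, which is also what happens
   for input containing NUL bytes, since strchr stops at the first NUL.
   On LINE_TOO_LONG the rest of the line is left unread; the caller is
   expected to report an error rather than carry on. */
enum line_status read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size >= 2);

    if (fgets(buf, (int)size, in) == NULL)
        return LINE_EOF;                /* end-of-file or read error */

    char *nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';                     /* complete line: drop the newline */
        return LINE_OK;
    }
    /* No newline: acceptable only as a final, unterminated line. */
    return feof(in) ? LINE_OK : LINE_TOO_LONG;
}
```

With this shape the Y/N check becomes a plain strcmp(buf, "Y") on the returned line, and the same reader can serve every other line-oriented input in the application.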
 
BartC

Keith Thompson said:
I just tried it:

$ grep hello /dev/zero
grep: /dev/zero: Cannot allocate memory

It failed because grep *does* have to store an entire line in memory.
The program we've been discussing, which reads single-character 'Y' or
'N' input, does not.

Since /dev/zero is clearly erroneous input, wouldn't it have been better if
it did report an error, instead of just accepting any old garbage?

If you want to read entire lines into memory, and impose some limit on
how long those lines can be *and* document the program's behavior when
that limit is exceeded, you can certainly do so.

Since you're knowledgeable about C, let me ask this: when using 'getchar()'
to read characters from stdin, this input is line-buffered. It will not
return until Enter has been pressed. So even though your program only scans
characters, whatever code is responsible for buffering the line, still has
to store the entire contents of the line in memory.

So the problem doesn't disappear! Admittedly it's made a bit worse if you
need the system's buffer *and* your buffer. But why are the memory problems
only an issue in your program, and not in the i/o library?

But it's not necessary
in the case we're discussing. With the code I posted, you don't need to
set a limit on the length of a line. On the other hand, if you decide
to do so anyway, you can add a few lines of code to count characters as
they're entered.

Storing entire lines creates problems. There are several ways to solve
those problems. One way is to avoid them by not storing entire lines.

Suppose the application in question does need to store entire lines at some
point. Then the problems need to be solved (unless you suggest that no
program should *ever* use line-buffered input). Once they are solved, why
not use the same solution when reading Y or N?
 
