Why does my simple while loop printf 2x?

J

James Kuyper

Since /dev/zero is clearly erroneous input, wouldn't it have been better if
it did report an error, instead of just accepting any old garbage?

It did - the only thing that makes /dev/zero invalid input for grep is
the fact that it requires (infinitely) more memory to store the first
line, than grep has available to it - and that's precisely the error
condition that grep reported, and I strongly suspect that it reported
that error condition shortly after finding out that it needed more space
than was available.

What validity test would you suggest that grep use? An arbitrary upper
line length of 1024 characters, rendering grep useless for someone who
needs to extract relevant lines from a file that contains some lines
longer than 1024 characters? Why impose a limit that doesn't need to be
imposed?

....
Suppose the application in question does need to store entire lines at some
point.

No one has ever advocated using character-at-a-time processing when
entire lines are needed, only when they are not needed (as is the case
for this thread).
Then the problems need to be solved (unless you suggest that no
program should *ever* use line-buffered input). Once they are solved, why
not use the same solution when reading Y or N?

For the same reason you don't add a full text editor to a program just
because it needs a line of input from the user. Or are you in the habit
of doing that? Just because you already have the text editor code
already written is no reason for inserting it into a program that
doesn't need it.
 
K

Keith Thompson

BartC said:
Since /dev/zero is clearly erroneous input, wouldn't it have been better if
it did report an error, instead of just accepting any old garbage?

How is it "clearly erroneous input"? I've used grep on binary files;
"grep -l", which reports whether the pattern or not, can be useful in
that case. Should the grep command recognize the name "/dev/zero" and
treat it specially? If not, on what basis should grep recognize it as
erroneous input?
Since you're knowledgeable about C, let me ask this: when using 'getchar()'
to read characters from stdin, this input is line-buffered. It will not
return until Enter has been pressed. So even though your program only scans
characters, whatever code is responsible for buffering the line, still has
to store the entire contents of the line in memory.

So the problem doesn't disappear! Admittedly it's made a bit worse if you
need the system's buffer *and* your buffer. But why are the memory problems
only an issue in your program, and not in the i/o library?

I presume that the implementation of the standard library is reasonably
robust. The point is that if my program reads a character at a time,
then I don't have to write extra code to cope with very long lines. If
getchar() fails because the stdio library failed to allocate memory, it
will report an error to my program, which can then deal with it as it
sees fit.
Suppose the application in question does need to store entire lines at some
point. Then the problems need to be solved (unless you suggest that no
program should *ever* use line-buffered input). Once they are solved, why
not use the same solution when reading Y or N?

Because a line input routine doesn't entirely *solve* the problem of
very long input lines. It can detect long lines, and it can report a
failure. For a program that doesn't need to store the entire line, why
add an unnecessary failure mode?
 
J

James Kuyper

On 05/19/2014 03:27 PM, Keith Thompson wrote:
....
How is it "clearly erroneous input"? I've used grep on binary files;
"grep -l", which reports whether the pattern or not, can be useful in
that case. Should the grep command recognize the name "/dev/zero" and
treat it specially? If not, on what basis should grep recognize it as
erroneous input?

Note: grep could, in principle, be implemented in such a way that that
"grep -l hello /dev/zero" does not need to store the entire input line.
However, that doesn't seem to have been the case, at least on my system.
The command "grep -l hello /dev/zero" produces the message "grep: memory
exhausted" and and exit status of 2.
 
B

Ben Bacarisse

BartC said:
I haven't specified how the data will be put into a buffer. You would expect
any code that did that to be aware the buffer is of a finite size. For
example, using fgets().

So we agree it seems. Read the whole line into memory is a bad idea. I
don't know why you objected to that bit of advice (not to you, but you
did seem to object to it).
The choice seems to be between using next to no memory (your preference), to
use a small line buffer of perhaps 2000 bytes on a machine that might have
multiples of 1000000000 bytes (my preference),

That misses the point. The choice is between using little storage and
having no limit on line length, and using a fixed-size buffer and not
saying what you do when a line is too long. Once you say what you think
should be done, it will be possible to compare how easy, safe, clear this
alternative is.
or to potentially use up all
the available memory (which for some reason, some see as a consequence of
using a line-buffered solution perhaps because they don't like the idea of
not coping with lines that might have unreasonably long lengths).

We both (now) agree that this is not a reasonable option (at least in
this case). I don't know why you seemed surprised that someone would warn
again the dangers of it, but never mind -- this one is off the table.
No. I just would never let it get to that point where the memory capacity
would be under threat from something so trivial.

It's like trying to nail a jelly to wall! I never said you would do
that. I have no idea what your solution is because you posed code in an
undefined notation.

I wondered why you seemed surprised that we would discuss this option --
not that you might permit it. I quoted your words. They had the usual
breathless disbelief that such a thing would be discussed -- only here
could such a thing be discussed! Yet you seem to accept that it is a
problem and you, at least, would never let it get that far.
It is desirable from a
coding point of view, especially programming at a higher level (from
languages such as Python for example), to easily iterate through lines
in a file. That requires that for each iteration you are presented
with a string containing the contents of the line.

At this level, you don't want mess about with the character-at-a-time
treatment that has been discussed. You read the line, and it should just
work. The underlying system (most likely a C implementation) should make
sure it behaves as expected.

Right. Let's be clear? If suggest the character-at-a-time is simpler
and more general for reading a simple response, I am not saying the line
reading is evil and no one should ever do it even when the format is
line oriented.
It is entirely reasonable to expect a multi-GB machine to have enough
capacity for a string containing /one line/ of a text file.

And then this. No, it isn't. It is entirely reasonable to assume that
a short line of text can be read (maybe a few million characters even),
but it is also reasonable to avoid any problems that can be caused by
someone exploiting the fact that the programmer assumed that any line
can be stored.

You seem to accept that above, and then you go and say something else.
I don't know where you stand on this.
It is not
reasonable to compromise these expectations, because of the rare
possibility that someone will feed it garbage (ie. a file that is
clearly not a line-oriented text file). Deal with that possibility,
yes, but don't throw out the baby too.

How? Just propose a solution and we can compare the safety and
simplicity of the two options. I don't know how to define garbage such
that is caches all these problems. Maybe you do.
Raising an error is one way. The probability is that something /is/ wrong,
but the requirement to have to find space to store such inputs means such
checks will be in place. They might not be, with a solution that just scans
characters from a stream, but then effectively hangs because it is given a
series of billion-character lines to deal with.

That's not answering how you would do it. What are you actually
proposing that is simple and safe?
 
T

Tonton Th

Since /dev/zero is clearly erroneous input, wouldn't it have been better if
it did report an error, instead of just accepting any old garbage?

What about that :

tth@plop:~$ yes 'makeporn' | tr -d '\n' | grep 'not war'
 
G

glen herrmannsfeldt

James Kuyper said:
(snip)
(snip)
It did - the only thing that makes /dev/zero invalid input for grep is
the fact that it requires (infinitely) more memory to store the first
line, than grep has available to it - and that's precisely the error
condition that grep reported, and I strongly suspect that it reported
that error condition shortly after finding out that it needed more space
than was available.

Well, that isn't the only thing. Historically, many unix utilities
didn't properly handle input that had '\0' in the input stream.
What validity test would you suggest that grep use? An arbitrary upper
line length of 1024 characters, rendering grep useless for someone who
needs to extract relevant lines from a file that contains some lines
longer than 1024 characters? Why impose a limit that doesn't need to be
imposed?

Many text processing utilities will complain, and often fail, for
non-text input. Many that can process UTF-8 data will complain,
and often fail, if given data that doesn't conform, such as, for
example, random bytes.

As well as I know it, in addition to removing some of the older
line length restrictions, GNU has removed the restriction on '\0'
from some programs.

Also, some programs allow one to specify the maximum line length,
which is better than a built-in limit, and allows for more efficient
memory management.

-- glen
 
G

glen herrmannsfeldt

(snip)
Note: grep could, in principle, be implemented in such a way that that
"grep -l hello /dev/zero" does not need to store the entire input line.
However, that doesn't seem to have been the case, at least on my system.
The command "grep -l hello /dev/zero" produces the message "grep: memory
exhausted" and and exit status of 2.

Well, for one, it could spool to disk when it couldn't allocate
more memory. In the case of /dev/zero, that delays the error
detection, but might be useful for some programs on some systems.

In days of smaller memories, it used to be usual for compilers and
assemblers to write temporary files to keep data between passes
that wouldn't fit in memory. (Usually core in those days.)

-- glen
 
D

DFS

You want to read the first character on an input line, and discard the
rest of the line.


I want to read it, and I also want it to be Y/N.

BartC's response of 5/15@5:30pm (and tweaked by me) is almost the
ticket. It requires the first character to be Y/N, regardless of what
comes after.



Here's a function that does that:

/*
* Returns the first character on an input line;
* discards the rest of the line.
* Returns '\n' for an empty line, EOF at end-of-file.
*/
int get_first_char(void) {
int result = getchar();
if (result != EOF && result != '\n') {
int discard;
while ((discard = getchar()) != '\n' && discard != EOF) {
/* nothing */
}
}
return result;
}


As you know, that doesn't force the first character to be Y/N.


Thanks
 
B

Ben Bacarisse

DFS said:
I want to read it, and I also want it to be Y/N.

BartC's response of 5/15@5:30pm (and tweaked by me) is almost the
ticket. It requires the first character to be Y/N, regardless of what
comes after.

I suspect a language issue. I can't see how you can require something
that simply may not be the case.

I'm glad you've found a solution, but I'd like to see what it is but I
can't find that message. The canonical way to reference a post is via
the message-id (found in the headers). Can you post a message-id or the
actual solution?

<snip>
 
D

DFS

I suspect a language issue. I can't see how you can require something
that simply may not be the case.

I'm glad you've found a solution, but I'd like to see what it is but I
can't find that message. The canonical way to reference a post is via
the message-id (found in the headers). Can you post a message-id or the
actual solution?

<snip>


==========================================================
BartC's original post

<[email protected]>

void skiptoeol(void){
while (getchar()!='\n') {}
}

int confirm(void) {
char c,d;
while (1) {
printf("Enter Y or N (not case-sensitive): ");
c=toupper(getchar());
if (c == 'Y' || c == 'N') {
skiptoeol();
break;
}
printf("Error.\n");
skiptoeol();
}
return c;
}

int main(void) {
char c;

c=confirm();
printf("%s\n",(c=='Y'?"Yes":"No"));

}

==========================================================

My tweaked version - I eliminated skiptoeol() and toupper().


int confirm(void)
{
char c;
while (1) {
printf("Enter Y or N (not case-sensitive):");
c=getchar();
if (c=='Y'||c=='y'||c=='N'||c=='n') {
while (getchar()!='\n') {}
break;
}
while (getchar()!='\n') {}
}
return c;
}


int main(void)
{
printf("You entered: %c\n", confirm());
return 0;
}

==========================================================
 
D

DFS

When I say 'require' I mean the program stays in the loop until you
enter Y/N as the first character. You can still type other letters
after it.

Y is valid.
YX is valid (tho I'd like to restrict the input to Y or N).
XY isn't.
 
I

Ian Collins

DFS said:
My tweaked version - I eliminated skiptoeol() and toupper().

Why skip toupper()? It just leaves a bigger comparison for the caller.
int confirm(void)
{
char c;
while (1) {
printf("Enter Y or N (not case-sensitive):");
c=getchar();
if (c=='Y'||c=='y'||c=='N'||c=='n') {
while (getchar()!='\n') {}
break;
}
while (getchar()!='\n') {}
}
return c;
}

This would fail (get stuck) if it were to hit an end of file before a
newline. You should re-add skiptoeol() and check for both.
 
D

DFS

Why skip toupper()? It just leaves a bigger comparison for the caller.

What does 'bigger comparison for the caller' mean? Are you talking
about this:

(c='Y'||c='N') versus (c='Y'||c='y'||c='N'||c='n')?


I deleted toupper() because it generated a compiler warning (implicit
function... I think) unless I wrote the function definition:

int toupper(int c);

Keith Thompson pointed out it would be better to #include <ctype.h>.

Bottom line: I did it to save a couple lines of code.


This would fail (get stuck) if it were to hit an end of file before a
newline. You should re-add skiptoeol() and check for both.

How can it hit EOF before a newline if the user is just typing a letter
or two on the keyboard?
 
I

Ian Collins

DFS said:
What does 'bigger comparison for the caller' mean? Are you talking
about this:

(c='Y'||c='N') versus (c='Y'||c='y'||c='N'||c='n')?

Yes. Why make it harder?
I deleted toupper() because it generated a compiler warning (implicit
function... I think) unless I wrote the function definition:

int toupper(int c);

Keith Thompson pointed out it would be better to #include <ctype.h>.

You should.
Bottom line: I did it to save a couple lines of code.

Well if you write "c=toupper(getchar())" it's the same number of lines...
How can it hit EOF before a newline if the user is just typing a letter
or two on the keyboard?

They type ^D on Unix or ^Z (I think) on windows. Or if someone feeds
input from a file.
 
D

DFS

Yes. Why make it harder?


It's not any harder than an extra #include and toupper().


You should.


What kind of trouble will it cause if I write the function definition

int toupper(int c);

rather than #include ctype.h

?



Well if you write "c=toupper(getchar())" it's the same number of lines...

But then I have to add an #include or a function definition.



They type ^D on Unix or ^Z (I think) on windows. Or if someone feeds
input from a file.

Why would they type that or 'feed input from a file'?

Don't users always follow instructions?
 
I

Ian Collins

DFS said:
It's not any harder than an extra #include and toupper().

A helper function, such as yours, should make life easier for the caller.
What kind of trouble will it cause if I write the function definition

int toupper(int c);

rather than #include ctype.h

Why bother?
But then I have to add an #include or a function definition.
So?


Why would they type that or 'feed input from a file'?

Don't users always follow instructions?

You're taking the piss now, aren't you?
 
K

Keith Thompson

DFS said:
On 05/17/2014 03:35 PM, Keith Thompson wrote: [...]
Here's a function that does that:

/*
* Returns the first character on an input line;
* discards the rest of the line.
* Returns '\n' for an empty line, EOF at end-of-file.
*/
int get_first_char(void) {
int result = getchar();
if (result != EOF && result != '\n') {
int discard;
while ((discard = getchar()) != '\n' && discard != EOF) {
/* nothing */
}
}
return result;
}


As you know, that doesn't force the first character to be Y/N.

It was never intended to. The idea is that you call the function in a
loop until it returns 'Y' or 'N' (or 'y' or 'n'). If it returns EOF,
give up because there's no more input.

If you want to put the loop inside the function you can do that, of
course, but separating the code to return the first character of a line
from the code that checks the value seems cleaner to me.
 
K

Keith Thompson

DFS said:
On 5/19/2014 11:44 PM, Ian Collins wrote: [...]
This would fail (get stuck) if it were to hit an end of file before a
newline. You should re-add skiptoeol() and check for both.

How can it hit EOF before a newline if the user is just typing a letter
or two on the keyboard?

On a Unix-like system, if it's reading from the keyboard, it will hit
end-of-file before newline if the user types Ctrl-D twice in the middle
of a line.

If input is from a file, it will hit end-of-file before newline if the
last line of the file is not terminated by a newline (but it's supposed
to be an interactive program and you can probably just ignore that
possibility).
 
B

BartC

James Kuyper said:
On 05/19/2014 02:52 PM, BartC wrote:

It did - the only thing that makes /dev/zero invalid input for grep ...
What validity test would you suggest that grep use?

I was talking about the program that needs a Y/N response.
An arbitrary upper
line length of 1024 characters, rendering grep useless for someone who
needs to extract relevant lines from a file that contains some lines
longer than 1024 characters? Why impose a limit that doesn't need to be
imposed?

I'm not familiar with grep. If it really works with /lines/, rather than a
continuous stream of characters, then I see nothing wrong with specifying a
reasonable upper limit for an individual line length. This would be chosen
so that it would never be reached in the majority of cases, while sensibly
balking at input which is clearly not intended for it: /dev/zero example, or
some extremely large file that doesn't contain newlines but is just small
enough for memory to be allocated for it, which will however be enough to
slow everything down.

And of course the actual line limit can be specified as an option.

But then what do I know about making perfectly reasonable compromises?
For the same reason you don't add a full text editor to a program just
because it needs a line of input from the user.

A text editor (if adding the source code for one rather than invoking an
external command) is a bit more complicated than just calling fgets()
instead of a loop containing getchar().
Just because you already have the text editor code
already written is no reason for inserting it into a program that
doesn't need it.

My personal opinion is that it is a good thing to move onto a higher, more
abstract level as soon as possible when dealing with stuff like this. I've
done a lot of command-line programming, mostly it does not take input
directly from a terminal or console (in other words, it is not just an
extension of a Linux terminal dialog).

But if you are using broad concepts such as 'read the next line' and 'look
at the first token on this line', then you are largely immune from these
details. It's also harder to have such input redirected from a file or from
/dev/zero.
 
B

Ben Bacarisse

DFS said:
==========================================================
BartC's original post

<[email protected]>

Thanks. Not everyone find message-ids useful, but they are the "gold
standard" for citing someone's post.
void skiptoeol(void){
while (getchar()!='\n') {}
}

int confirm(void) {
char c,d;
while (1) {
printf("Enter Y or N (not case-sensitive): ");
c=toupper(getchar());
if (c == 'Y' || c == 'N') {
skiptoeol();
break;
}
printf("Error.\n");
skiptoeol();
}
return c;
}

int main(void) {
char c;

c=confirm();
printf("%s\n",(c=='Y'?"Yes":"No"));

}

The logic is bit odd here. If the user hits return the next line is
consumed and discarded.
==========================================================

My tweaked version - I eliminated skiptoeol() and toupper().

skiptoeol() was not perfect (it should test for EOF) but why get rid of
it?
int confirm(void)
{
char c;
while (1) {
printf("Enter Y or N (not case-sensitive):");
c=getchar();
if (c=='Y'||c=='y'||c=='N'||c=='n') {
while (getchar()!='\n') {}
break;
}
while (getchar()!='\n') {}
}
return c;
}

Did you really format it like this?

Anyway, I'd write it like this:

int confirm(void)
{
char reply = 0;
do {
int c;
if (reply != 0) puts("Y or N must be the first character.");
printf("Enter Y or N (not case-sensitive): ");

reply = c = toupper(getchar());
while (c != EOF && c != '\n') c = getchar();

} while (reply != 'Y' && reply != 'N');
return reply;
}

c is of type int so that it can detect EOF. The logic is such that the
code loops indefinitely if there is no y or n at the start of a line, so
detecting EOF might seem pointless, but at least this way the user gets
told what's wrong. Personally, I'd also break the loop when EOF is seen
which I'd treat either as a "no" or as a reason to call exit().

<snip>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,900
Latest member
Nell636132

Latest Threads

Top