Why does my simple while loop printf 2x?

Discussion in 'C Programming' started by DFS, May 15, 2014.

  1. DFS

    James Kuyper Guest

    It did - the only thing that makes /dev/zero invalid input for grep is
    the fact that it requires (infinitely) more memory to store the first
    line, than grep has available to it - and that's precisely the error
    condition that grep reported, and I strongly suspect that it reported
    that error condition shortly after finding out that it needed more space
    than was available.

    What validity test would you suggest that grep use? An arbitrary upper
    line length of 1024 characters, rendering grep useless for someone who
    needs to extract relevant lines from a file that contains some lines
    longer than 1024 characters? Why impose a limit that doesn't need to be

    No one has ever advocated using character-at-a-time processing when
    entire lines are needed, only when they are not needed (as is the case
    for this thread).
    For the same reason you don't add a full text editor to a program just
    because it needs a line of input from the user. Or are you in the habit
    of doing that? Just because you already have the text editor code
    already written is no reason for inserting it into a program that
    doesn't need it.
    James Kuyper, May 19, 2014
    1. Advertisements

  2. How is it "clearly erroneous input"? I've used grep on binary files;
    "grep -l", which reports whether the pattern or not, can be useful in
    that case. Should the grep command recognize the name "/dev/zero" and
    treat it specially? If not, on what basis should grep recognize it as
    erroneous input?
    I presume that the implementation of the standard library is reasonably
    robust. The point is that if my program reads a character at a time,
    then I don't have to write extra code to cope with very long lines. If
    getchar() fails because the stdio library failed to allocate memory, it
    will report an error to my program, which can then deal with it as it
    sees fit.
    Because a line input routine doesn't entirely *solve* the problem of
    very long input lines. It can detect long lines, and it can report a
    failure. For a program that doesn't need to store the entire line, why
    add an unnecessary failure mode?
    Keith Thompson, May 19, 2014
    1. Advertisements

  3. DFS

    James Kuyper Guest

    On 05/19/2014 03:27 PM, Keith Thompson wrote:
    Note: grep could, in principle, be implemented in such a way that that
    "grep -l hello /dev/zero" does not need to store the entire input line.
    However, that doesn't seem to have been the case, at least on my system.
    The command "grep -l hello /dev/zero" produces the message "grep: memory
    exhausted" and and exit status of 2.
    James Kuyper, May 19, 2014
  4. So we agree it seems. Read the whole line into memory is a bad idea. I
    don't know why you objected to that bit of advice (not to you, but you
    did seem to object to it).
    That misses the point. The choice is between using little storage and
    having no limit on line length, and using a fixed-size buffer and not
    saying what you do when a line is too long. Once you say what you think
    should be done, it will be possible to compare how easy, safe, clear this
    alternative is.
    We both (now) agree that this is not a reasonable option (at least in
    this case). I don't know why you seemed surprised that someone would warn
    again the dangers of it, but never mind -- this one is off the table.
    It's like trying to nail a jelly to wall! I never said you would do
    that. I have no idea what your solution is because you posed code in an
    undefined notation.

    I wondered why you seemed surprised that we would discuss this option --
    not that you might permit it. I quoted your words. They had the usual
    breathless disbelief that such a thing would be discussed -- only here
    could such a thing be discussed! Yet you seem to accept that it is a
    problem and you, at least, would never let it get that far.
    Right. Let's be clear? If suggest the character-at-a-time is simpler
    and more general for reading a simple response, I am not saying the line
    reading is evil and no one should ever do it even when the format is
    line oriented.
    And then this. No, it isn't. It is entirely reasonable to assume that
    a short line of text can be read (maybe a few million characters even),
    but it is also reasonable to avoid any problems that can be caused by
    someone exploiting the fact that the programmer assumed that any line
    can be stored.

    You seem to accept that above, and then you go and say something else.
    I don't know where you stand on this.
    How? Just propose a solution and we can compare the safety and
    simplicity of the two options. I don't know how to define garbage such
    that is caches all these problems. Maybe you do.
    That's not answering how you would do it. What are you actually
    proposing that is simple and safe?
    Ben Bacarisse, May 19, 2014
  5. DFS

    Tonton Th Guest

    What about that :

    tth@plop:~$ yes 'makeporn' | tr -d '\n' | grep 'not war'
    Tonton Th, May 19, 2014
  6. Well, that isn't the only thing. Historically, many unix utilities
    didn't properly handle input that had '\0' in the input stream.
    Many text processing utilities will complain, and often fail, for
    non-text input. Many that can process UTF-8 data will complain,
    and often fail, if given data that doesn't conform, such as, for
    example, random bytes.

    As well as I know it, in addition to removing some of the older
    line length restrictions, GNU has removed the restriction on '\0'
    from some programs.

    Also, some programs allow one to specify the maximum line length,
    which is better than a built-in limit, and allows for more efficient
    memory management.

    -- glen
    glen herrmannsfeldt, May 20, 2014
  7. (snip)
    Well, for one, it could spool to disk when it couldn't allocate
    more memory. In the case of /dev/zero, that delays the error
    detection, but might be useful for some programs on some systems.

    In days of smaller memories, it used to be usual for compilers and
    assemblers to write temporary files to keep data between passes
    that wouldn't fit in memory. (Usually core in those days.)

    -- glen
    glen herrmannsfeldt, May 20, 2014
  8. DFS

    DFS Guest

    I want to read it, and I also want it to be Y/N.

    BartC's response of 5/15@5:30pm (and tweaked by me) is almost the
    ticket. It requires the first character to be Y/N, regardless of what
    comes after.

    As you know, that doesn't force the first character to be Y/N.

    DFS, May 20, 2014
  9. I suspect a language issue. I can't see how you can require something
    that simply may not be the case.

    I'm glad you've found a solution, but I'd like to see what it is but I
    can't find that message. The canonical way to reference a post is via
    the message-id (found in the headers). Can you post a message-id or the
    actual solution?

    Ben Bacarisse, May 20, 2014
  10. DFS

    DFS Guest

    BartC's original post


    void skiptoeol(void){
    while (getchar()!='\n') {}

    int confirm(void) {
    char c,d;
    while (1) {
    printf("Enter Y or N (not case-sensitive): ");
    if (c == 'Y' || c == 'N') {
    return c;

    int main(void) {
    char c;




    My tweaked version - I eliminated skiptoeol() and toupper().

    int confirm(void)
    char c;
    while (1) {
    printf("Enter Y or N (not case-sensitive):");
    if (c=='Y'||c=='y'||c=='N'||c=='n') {
    while (getchar()!='\n') {}
    while (getchar()!='\n') {}
    return c;

    int main(void)
    printf("You entered: %c\n", confirm());
    return 0;

    DFS, May 20, 2014
  11. DFS

    DFS Guest

    When I say 'require' I mean the program stays in the loop until you
    enter Y/N as the first character. You can still type other letters
    after it.

    Y is valid.
    YX is valid (tho I'd like to restrict the input to Y or N).
    XY isn't.
    DFS, May 20, 2014
  12. DFS

    Ian Collins Guest

    Why skip toupper()? It just leaves a bigger comparison for the caller.
    This would fail (get stuck) if it were to hit an end of file before a
    newline. You should re-add skiptoeol() and check for both.
    Ian Collins, May 20, 2014
  13. DFS

    DFS Guest

    What does 'bigger comparison for the caller' mean? Are you talking
    about this:

    (c='Y'||c='N') versus (c='Y'||c='y'||c='N'||c='n')?

    I deleted toupper() because it generated a compiler warning (implicit
    function... I think) unless I wrote the function definition:

    int toupper(int c);

    Keith Thompson pointed out it would be better to #include <ctype.h>.

    Bottom line: I did it to save a couple lines of code.

    How can it hit EOF before a newline if the user is just typing a letter
    or two on the keyboard?
    DFS, May 20, 2014
  14. DFS

    Ian Collins Guest

    Yes. Why make it harder?
    You should.
    Well if you write "c=toupper(getchar())" it's the same number of lines...
    They type ^D on Unix or ^Z (I think) on windows. Or if someone feeds
    input from a file.
    Ian Collins, May 20, 2014
  15. DFS

    DFS Guest

    It's not any harder than an extra #include and toupper().

    What kind of trouble will it cause if I write the function definition

    int toupper(int c);

    rather than #include ctype.h


    But then I have to add an #include or a function definition.

    Why would they type that or 'feed input from a file'?

    Don't users always follow instructions?
    DFS, May 20, 2014
  16. DFS

    Ian Collins Guest

    A helper function, such as yours, should make life easier for the caller.
    Why bother?
    You're taking the piss now, aren't you?
    Ian Collins, May 20, 2014
  17. It was never intended to. The idea is that you call the function in a
    loop until it returns 'Y' or 'N' (or 'y' or 'n'). If it returns EOF,
    give up because there's no more input.

    If you want to put the loop inside the function you can do that, of
    course, but separating the code to return the first character of a line
    from the code that checks the value seems cleaner to me.
    Keith Thompson, May 20, 2014
  18. On a Unix-like system, if it's reading from the keyboard, it will hit
    end-of-file before newline if the user types Ctrl-D twice in the middle
    of a line.

    If input is from a file, it will hit end-of-file before newline if the
    last line of the file is not terminated by a newline (but it's supposed
    to be an interactive program and you can probably just ignore that
    Keith Thompson, May 20, 2014
  19. DFS

    BartC Guest

    I was talking about the program that needs a Y/N response.
    I'm not familiar with grep. If it really works with /lines/, rather than a
    continuous stream of characters, then I see nothing wrong with specifying a
    reasonable upper limit for an individual line length. This would be chosen
    so that it would never be reached in the majority of cases, while sensibly
    balking at input which is clearly not intended for it: /dev/zero example, or
    some extremely large file that doesn't contain newlines but is just small
    enough for memory to be allocated for it, which will however be enough to
    slow everything down.

    And of course the actual line limit can be specified as an option.

    But then what do I know about making perfectly reasonable compromises?
    A text editor (if adding the source code for one rather than invoking an
    external command) is a bit more complicated than just calling fgets()
    instead of a loop containing getchar().
    My personal opinion is that it is a good thing to move onto a higher, more
    abstract level as soon as possible when dealing with stuff like this. I've
    done a lot of command-line programming, mostly it does not take input
    directly from a terminal or console (in other words, it is not just an
    extension of a Linux terminal dialog).

    But if you are using broad concepts such as 'read the next line' and 'look
    at the first token on this line', then you are largely immune from these
    details. It's also harder to have such input redirected from a file or from
    BartC, May 20, 2014
  20. Thanks. Not everyone find message-ids useful, but they are the "gold
    standard" for citing someone's post.
    The logic is bit odd here. If the user hits return the next line is
    consumed and discarded.
    skiptoeol() was not perfect (it should test for EOF) but why get rid of
    Did you really format it like this?

    Anyway, I'd write it like this:

    int confirm(void)
    char reply = 0;
    do {
    int c;
    if (reply != 0) puts("Y or N must be the first character.");
    printf("Enter Y or N (not case-sensitive): ");

    reply = c = toupper(getchar());
    while (c != EOF && c != '\n') c = getchar();

    } while (reply != 'Y' && reply != 'N');
    return reply;

    c is of type int so that it can detect EOF. The logic is such that the
    code loops indefinitely if there is no y or n at the start of a line, so
    detecting EOF might seem pointless, but at least this way the user gets
    told what's wrong. Personally, I'd also break the loop when EOF is seen
    which I'd treat either as a "no" or as a reason to call exit().

    Ben Bacarisse, May 20, 2014
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.