Reading a number from stdin

Discussion in 'C Programming' started by pandit, May 21, 2014.

  1. When talking about something as generic as reading a number
    from stdin, I don't know what "normally" is supposed to mean.
    Some applications might only want numbers below 100; others might
    need to handle anything in the range of int. If you're writing
    a library routine to read integers, restricting the input to an
    arbitrary range is not a good idea.

    Furthermore, if you call scanf("%5d", &n) and enter "123456", n is
    set to 12345 and the 6 is left in the input stream. You could add
    code to check for that, but then you might as well use strtol().
    I'm not at all sure what sscanf() *should* do for an out-of-range
    numeric input. The trouble is that it just returns the number
    of items matched; it has no good way to distinguish between
    syntactically bad input and an out-of-range number.

    If the behavior were to be defined in a future standard, it could
    either treat it as a matching failure, or it could set the object to,
    say, INT_MIN or INT_MAX and set errno to ERANGE, like strtol() does.

    Either would be better than leaving the behavior undefined.

    (Incidentally, it's your runtime library, not your compiler, that
    implements sscanf.)
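
The "%5d" behavior described above can be observed with a small sketch. The helper name scan_width5 is hypothetical; it uses sscanf() rather than scanf() so the input is a plain string, and "%n" to report how many characters were consumed.

```c
#include <stdio.h>

/* Hypothetical helper: match at most five characters of an integer
   with "%5d" and record, via "%n", how many characters of s were
   consumed. Returns 1 if a number was matched, 0 otherwise. */
int scan_width5(const char *s, int *value, int *consumed)
{
    *consumed = 0;
    /* "%n" stores the count of characters read so far; it does not
       add to the return value of sscanf(). */
    if (sscanf(s, "%5d%n", value, consumed) != 1)
        return 0;
    return 1;
}
```

Called with "123456", this stores 12345 and reports 5 characters consumed, so s[5] is the leftover '6'. A caller can compare the consumed count against the string length to notice unread input, but that still says nothing about out-of-range values.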
     
    Keith Thompson, May 23, 2014
    #21

  2. Oh of course. It also allows whitespace to match nothing.
    Not good.
    True.

    Really you've got to use strtol for robust parsing of integers.
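
A minimal sketch of what strtol-based parsing could look like, assuming the whole string must be one decimal integer that fits in an int. The names parse_int and parse_status are made up for illustration; nothing here is from the posts above.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
enum parse_status { PARSE_OK, PARSE_SYNTAX, PARSE_RANGE };

/* Parse s as a decimal int. Leading whitespace is accepted because
   strtol() skips it; trailing whitespace is accepted explicitly.
   *out is written only on PARSE_OK. */
enum parse_status parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)
        return PARSE_SYNTAX;            /* no digits at all */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return PARSE_SYNTAX;            /* trailing junk such as "12x" */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return PARSE_RANGE;             /* does not fit in long or int */
    *out = (int)v;
    return PARSE_OK;
}
```

Unlike the scanf family, this distinguishes syntactically bad input from an out-of-range number, which is the distinction discussed earlier in the thread.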
     
    Malcolm McLean, May 23, 2014
    #22

  3. Numbers usually mean something; typically integers are counts of something
    in the real world. A few applications, like a calculator, might process numbers
    without understanding their meaning. But normally it has to be programmed
    in. The number might be the number of employees. That could conceivably
    go above a hundred thousand, but only for the very largest organisations.
    The number of characters in someone's name is never going to go that large,
    nor is the number of optimisation levels for a compiler. You might add -O4,
    -O5, and so on, and it's hard to set a definite limit, but it's never going to go
    super-high.
     
    Malcolm McLean, May 23, 2014
    #23
  4. When a program has to read something, the input has to conform
    to certain expectations, otherwise the input is erroneous.

    These expectations need to be laid down in an ILS (input-language
    specification).

    The core of the ILS is the specification of the syntax of the input
    language (IL), usually using a grammar, using - for example - EBNF.

    How errors in the input are to be treated is specified in the
    requirement specifications (RS) for the software, which also
    includes the ILS.

    Given such an RS and money, a programmer then can write a
    parser with error handling for that input language in C.

    »Reading a number« or »reading numbers« cannot serve as an RS,
    because it is still too vague.
     
    Stefan Ram, May 23, 2014
    #24
  5. Systems like that soon hit reality.

    The grammar might specify an integer as a + or - sign followed by a sequence of digits,
    with zero as the special case in which a leading zero is allowed.
    However, C only allows easy representation of integers which will fit in a basic type.

    You can of course code an arbitrary-precision integer representation to read the grammar,
    only to find that it's referring to a user option that's unlikely to go above three.
    That sort of thing adds massively to the costs of development and adds potential points of
    failure. Also, people might ignore the specifications because they are so detached
    from the actual requirements, leading to the worst possible situation - code which
    doesn't in fact behave as documented.

    Or you can write the grammar in terms of basic input functions you have.
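
To make the mismatch concrete, here is a rough sketch of a recognizer for one possible reading of such a grammar, assuming the sign is optional and a lone "0" is the only form that may start with zero. The function name matches_integer_grammar is hypothetical. It checks syntax only, so it happily accepts digit strings far too long for any basic C type.

```c
/* Hypothetical recognizer for the assumed grammar
     integer = [ "+" | "-" ] , ( "0" | nonzero-digit , { digit } ) ;
   Returns 1 if the whole string s matches, 0 otherwise.
   No range check is performed. */
int matches_integer_grammar(const char *s)
{
    if (*s == '+' || *s == '-')
        s++;                      /* optional sign */
    if (*s == '0')
        return s[1] == '\0';      /* zero must stand alone */
    if (*s < '1' || *s > '9')
        return 0;                 /* must start with a nonzero digit */
    while (*s >= '0' && *s <= '9')
        s++;                      /* remaining digits */
    return *s == '\0';            /* nothing may follow */
}
```

A string such as "123456789012345678901234567890" matches the grammar but cannot be stored in an int or a long, which is where the grammar and the available C types part ways.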
     
    Malcolm McLean, May 23, 2014
    #25
  6. I think you meant

    scanf("%5d %5d", &i, &j);
     
    Keith Thompson, May 23, 2014
    #26
  7. If such a grammar is »not realistic«, what do you then say about

    decimal-constant:
    nonzero-digit
    decimal-constant digit

    which is quoted straight from N1570 (6.4.4.1)?
     
    Stefan Ram, May 23, 2014
    #27
  8. So *depending on the application's requirements*, it might make sense to
    restrict the range of input values.

    scanf(), or even fgets() followed by sscanf(), is not a useful or safe
    way to do that, though it may be good enough for a quick-and-dirty
    program where you aren't concerned about incorrect input.
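
One alternative, sketched under assumptions: read a line with fgets(), convert it with strtol(), and then apply the application's own range. The names read_long_in_range and read_status, and the 64-byte buffer, are illustrative choices and not anything from this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
enum read_status { READ_OK, READ_EOF, READ_SYNTAX, READ_RANGE };

/* Read one line from fp and parse it as a decimal long in [lo, hi].
   *out is written only on READ_OK. */
enum read_status read_long_in_range(FILE *fp, long lo, long hi, long *out)
{
    char buf[64];
    char *end;
    long v;

    if (fgets(buf, sizeof buf, fp) == NULL)
        return READ_EOF;                /* end of input or read error */
    if (strchr(buf, '\n') == NULL && !feof(fp)) {
        /* Line did not fit in buf: discard the rest and reject it. */
        int c;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
        return READ_SYNTAX;
    }
    errno = 0;
    v = strtol(buf, &end, 10);
    if (end == buf)
        return READ_SYNTAX;             /* no digits */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')
        return READ_SYNTAX;             /* trailing junk */
    if (errno == ERANGE || v < lo || v > hi)
        return READ_RANGE;              /* outside long or outside [lo, hi] */
    *out = v;
    return READ_OK;
}
```

With stdin as fp and, say, lo = 0 and hi = 100, the caller can tell a typo from a value the application does not accept and re-prompt accordingly.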
     
    Keith Thompson, May 23, 2014
    #28
  9. Yes, of course, the 5 was the key thing! Thanks.

    <snip>
     
    Ben Bacarisse, May 23, 2014
    #29
  10. It is possible to talk formal grammars without also talking gibberish.
    But the psychological reality is that they engender a gibberish
    mentality. "N1570 (6.4.4.1)?" is gibberish. It doesn't mean anything
    to anyone who doesn't have that particular document in mind.

    As the author of MiniBasic ( http://sourceforge.net/projects/minibasic/?source=directory ), I'm not opposed to formal grammars, for formal
    language specification. For an average text file format, however,
    it's overkill.
    It's not usually clear what a program should do when presented with
    a huge integer, or a huge real, in a place where a number is
    expected and allowed. It's not usually worth worrying too much about
    it, because the data must almost always be corrupt: numbers usually mean
    something, and very high numbers are seldom valid. As long as the
    program doesn't crash, and throws the file out, it's likely to be OK
    in all but the most rigorous of environments.

    My options parser uses a scanf-like interface to extract options, but
    it calls strtol() internally, then throws out anything out of the
    range of a signed int, at the parse level. The caller then throws out
    anything out of range at the application level.
    That's a reasonable, general-purpose solution to getting an integer
    from the command line.
     
    Malcolm McLean, May 24, 2014
    #30
  11. It is exactly this kind of thinking that has led us to the
    current situation, where a CSV file that has been created by
    a program A cannot be read by program B that is claiming to
    be able to read CSV files.
     
    Stefan Ram, May 24, 2014
    #31
  12. CSV should have been a bit more tightly specified.
    I've got a parser on my website. It's quite a hunk of code, the header
    has to be intelligently guessed, and nan is grief because not all
    versions of C handle it the same.

    Here it is
    http://www.malcolmmclean.site11.com/www/


    It's not terribly efficient. Unfortunately CSV files can be very
    large and reading them can be a performance bottleneck. My version
    reads the whole lot into memory with a separate allocation for
    every string, which is only OK for small to medium-sized files on
    medium to big machines.
     
    Malcolm McLean, May 24, 2014
    #32
