scanf()

Discussion in 'C Programming' started by Edward Rutherford, Aug 6, 2012.

  1. Hello

    If scanf() fails for a matching error, what can we say about the location
    of the file pointer for subsequent reads from stdin? Could it have read
    on through an indeterminate amount of stdin, or will it always be
    positioned immediately after the last successful conversion from the
    format string?

    Regards
    Edward Rutherford, Aug 6, 2012
    #1

  2. Eric Sosman

    On 8/6/2012 4:58 PM, Edward Rutherford wrote:
    > Hello
    >
    > If scanf() fails for a matching error, what can we say about the location
    > of the file pointer for subsequent reads from stdin? Could it have read
    > on through an indeterminate amount of stdin, or will it always be
    > positioned immediately after the last successful conversion from the
    > format string?


    The former. For example, consider reading with "%e" and
    encountering the input "123.4567e+---". The first ten characters
    are a valid prefix for a floating-point number, which is then
    spoiled by the eleventh. Yet scanf() can't simply push the
    eleventh character back onto the stream and convert the first
    ten: They are valid as a prefix but not as a complete number.
    scanf() would have to push back three characters ('-', '+', 'e')
    to arrive at something valid, but it has only one character of
    push-back to work with. And, of course, there's simply no way
    it can get all the way back to a position before the '1'.

    --
    Eric Sosman
    Eric Sosman, Aug 6, 2012
    #2

  3. James Kuyper

    On 08/06/2012 04:58 PM, Edward Rutherford wrote:
    > Hello
    >
    > If scanf() fails for a matching error, what can we say about the location
    > of the file pointer for subsequent reads from stdin? Could it have read
    > on through an indeterminate amount of stdin, or will it always be
    > positioned immediately after the last successful conversion from the
    > format string?
    >
    > Regards
    >


    "An input item is defined as the longest sequence of input characters
    which does not exceed any specified field width and which is, or is a
    prefix of, a matching input sequence. The first character, if any,
    after the input item remains unread." (C11 7.21.6.2p9).
    James Kuyper, Aug 6, 2012
    #3
  4. Eric Sosman <> writes:
    > On 8/6/2012 4:58 PM, Edward Rutherford wrote:
    >> If scanf() fails for a matching error, what can we say about the location
    >> of the file pointer for subsequent reads from stdin? Could it have read
    >> on through an indeterminate amount of stdin, or will it always be
    >> positioned immediately after the last successful conversion from the
    >> format string?

    >
    > The former. For example, consider reading with "%e" and
    > encountering the input "123.4567e+---". The first ten characters
    > are a valid prefix for a floating-point number, which is then
    > spoiled by the eleventh. Yet scanf() can't simply push the
    > eleventh character back onto the stream and convert the first
    > ten: They are valid as a prefix but not as a complete number.
    > scanf() would have to push back three characters ('-', '+', 'e')
    > to arrive at something valid, but it has only one character of
    > push-back to work with. And, of course, there's simply no way
    > it can get all the way back to a position before the '1'.


    And that's not the only problem with using scanf to read numeric data.

    Consider reading with "%e" and encountering the input "1.0e9999999".
    The behavior is undefined -- and since you're reading from stdin,
    there's no way (using scanf alone) to avoid that.

    (This was something I really hoped C11 would fix, but it didn't.)

    The safe way to read floating-point data from stdin is to read lines
    using fgets() (or something similar -- obviously not gets()), and then
    parse it using strtod().

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Aug 6, 2012
    #4
  5. Edward Rutherford <> writes:

    > If scanf() fails for a matching error, what can we say about the location
    > of the file pointer for subsequent reads from stdin? Could it have read
    > on through an indeterminate amount of stdin, or will it always be
    > positioned immediately after the last successful conversion from the
    > format string?


    No, it can read an unlimited number of characters before reporting a
    matching failure. There are lots of details, but since you are asking
    an all-or-nothing question, you probably don't care about them. A
    "worst case" example occurs when processing "%d" -- an unlimited number
    of white-space characters can be read before any matching failure can be
    reported. If there is such a failure, the character that causes the
    matching failure will be pushed back onto the stream, but that's all.

    The scanf functions only ever push back at most one character so you
    can use that as a rule of thumb to imagine what they must consume before
    a directive can fail.

    For some input patterns %n is an effective solution, but you don't say
    what your overall objective is so I can't be sure.

    --
    Ben.
    Ben Bacarisse, Aug 6, 2012
    #5
  6. Ben Bacarisse wrote:

    > Edward Rutherford <> writes:
    >
    >> If scanf() fails for a matching error, what can we say about the
    >> location of the file pointer for subsequent reads from stdin? Could it
    >> have read on through an indeterminate amount of stdin, or will it
    >> always be positioned immediately after the last successful conversion
    >> from the format string?

    >
    > No, it can read an unlimited number of characters before reporting a
    > matching failure. There are lots of details, but since you are asking
    > an all-or-nothing question, you probably don't care about them. A
    > "worst case" example occurs when processing "%d" -- an unlimited number
    > of white-space characters can be read before any matching failure can be
    > reported. If there is such a failure, the character that causes the
    > matching failure will be pushed back onto the stream, but that's all.
    >
    > The scanf functions only ever push back at most one character so you can
    > use that as a rule of thumb to imagine what they must consume before a
    > directive can fail.
    >
    > For some input patterns %n is an effective solution, but you don't say
    > what your overall objective is so I can't be sure.


    That's unfortunate, because then there's no way to recover the non-
    matching data, as far as I can see.

    Wouldn't it be better if scanf() buffered non-matching characters, either
    seamlessly in the background (so that future reads from stdin saw those
    characters) or by returning a pointer to a buffer containing them?
    Edward Rutherford, Aug 7, 2012
    #6
  7. James Kuyper

    On 08/07/2012 02:37 PM, Edward Rutherford wrote:
    > Ben Bacarisse wrote:
    >
    >> Edward Rutherford <> writes:
    >>
    >>> If scanf() fails for a matching error, what can we say about the
    >>> location of the file pointer for subsequent reads from stdin? Could it
    >>> have read on through an indeterminate amount of stdin, or will it
    >>> always be positioned immediately after the last successful conversion
    >>> from the format string?

    >>
    >> No, it can read an unlimited number of characters before reporting a
    >> matching failure. There are lots of details, but since you are asking
    >> an all-or-nothing question, you probably don't care about them. A
    >> "worst case" example occurs when processing "%d" -- an unlimited number
    >> of white-space characters can be read before any matching failure can be
    >> reported. If there is such a failure, the character that causes the
    >> matching failure will be pushed back onto the stream, but that's all.
    >>
    >> The scanf functions only ever push back at most one character so you can
    >> use that as a rule of thumb to imagine what they must consume before a
    >> directive can fail.
    >>
    >> For some input patterns %n is an effective solution, but you don't say
    >> what your overall objective is so I can't be sure.

    >
    > That's unfortunate, because then there's no way to recover the non-
    > matching data, as far as I can see.
    >
    > Wouldn't it be better if scanf() buffered non-matching characters, either
    > seamlessly in the background (so that future reads from stdin saw those
    > characters) or by returning a pointer to a buffer containing them?


    scanf() was not intended to have that complicated an interface, and it's
    far too late to make any changes to its interface now.

    If you want to do something like that, the standard library provides you
    with the pieces you need to assemble to do it yourself: fgets() from
    <stdio.h> and the strto*() functions from <stdlib.h> are the most
    relevant ones.
    James Kuyper, Aug 7, 2012
    #7
  8. Eric Sosman

    On 8/7/2012 2:37 PM, Edward Rutherford wrote:
    > Ben Bacarisse wrote:
    >
    >> Edward Rutherford <> writes:
    >>
    >>> If scanf() fails for a matching error, [...] Could it
    >>> have read on through an indeterminate amount of stdin, [...]

    >>
    >> No, it can read an unlimited number of characters before reporting a
    >> matching failure. [...]

    >
    > That's unfortunate, because then there's no way to recover the non-
    > matching data, as far as I can see.


    Right. That's one of the reasons scanf() and its brethren
    are difficult to use in "industrial-strength" applications.

    > Wouldn't it be better if scanf() buffered non-matching characters, either
    > seamlessly in the background (so that future reads from stdin saw those
    > characters) or by returning a pointer to a buffer containing them?


    Which part of "indeterminate amount" and "unlimited number"
    do you have trouble understanding? ;-)

    If you need to revisit those characters, you'll need to buffer
    them yourself (or fseek() back to them, if the input is seekable).
    You'd need to read them into a buffer first, and then apply sscanf()
    to the buffer; as Ben suggests, the "%n" specifier may be helpful
    in navigating.

    My own preference when parsing input fancy enough to warrant
    backtracking is to read it as plain characters (perhaps a line at
    a time, if "line" makes sense), store them in a buffer, and pick
    them apart with strxxx() functions. Even sscanf() has infelicities.

    --
    Eric Sosman
    Eric Sosman, Aug 7, 2012
    #8
  9. Edward Rutherford <> writes:

    > Ben Bacarisse wrote:
    >
    >> Edward Rutherford <> writes:
    >>
    >>> If scanf() fails for a matching error, what can we say about the
    >>> location of the file pointer for subsequent reads from stdin? Could it
    >>> have read on through an indeterminate amount of stdin, or will it
    >>> always be positioned immediately after the last successful conversion
    >>> from the format string?

    >>
    >> No, it can read an unlimited number of characters before reporting a
    >> matching failure. There are lots of details, but since you are asking
    >> an all-or-nothing question, you probably don't care about them. A
    >> "worst case" example occurs when processing "%d" -- an unlimited number
    >> of white-space characters can be read before any matching failure can be
    >> reported. If there is such a failure, the character that causes the
    >> matching failure will be pushed back onto the stream, but that's all.
    >>
    >> The scanf functions only ever push back at most one character so you can
    >> use that as a rule of thumb to imagine what they must consume before a
    >> directive can fail.
    >>
    >> For some input patterns %n is an effective solution, but you don't say
    >> what your overall objective is so I can't be sure.

    >
    > That's unfortunate, because then there's no way to recover the non-
    > matching data, as far as I can see.


    You've got good answers already so I'll just indulge in picking a small
    nit: you can always get the non-matching data -- it's left in the stream
    by definition! I know what you mean, of course, I just don't know a
    good word for it.

    To make up for being mean, here's some advice that might help. If you
    want to be able to back-up to the last place a conversion worked, you
    might keep track of the stream position (using fgetpos) after every
    successful fscanf call. After every failed one, fsetpos to the last
    saved position.

    That's no use for sscanf, of course, and it does mean you probably have
    to do calls with only one conversion specifier at a time, but it might
    be all you need.

    > Wouldn't it be better if scanf() buffered non-matching characters, either
    > seamlessly in the background (so that future reads from stdin saw those
    > characters) or by returning a pointer to a buffer containing them?


    --
    Ben.
    Ben Bacarisse, Aug 7, 2012
    #9
  10. John Bode

    On Tuesday, August 7, 2012 1:37:19 PM UTC-5, Edward Rutherford wrote:
    > Ben Bacarisse wrote:
    >
    > > The scanf functions only ever push back at most one character so you can
    > > use that as a rule of thumb to imagine what they must consume before a
    > > directive can fail.
    > >
    > > For some input patterns %n is an effective solution, but you don't say
    > > what your overall objective is so I can't be sure.

    >
    > That's unfortunate, because then there's no way to recover the non-
    > matching data, as far as I can see.
    >


    Which is why scanf() is the wrong tool for all but the simplest
    input tasks. Unless I can guarantee that my input is always
    well-behaved, I avoid using anything from the *scanf() family.

    Use fgets() to consume an entire line, then parse and convert each
    element using tools like strtok(), strtod(), strtol(), etc.
    John Bode, Aug 9, 2012
    #10
  11. On Thursday, 9 August 2012 22:00:49 UTC+1, John Bode wrote:
    >
    > Which is why scanf() is the wrong tool for all but the simplest
    > input tasks. Unless I can guarantee that my input is always
    > well-behaved, I avoid using anything from the *scanf() family.
    >

    It depends on the quality of parsing you want. Let's say we have x, y
    co-ordinates in columns.

    while(fscanf(fp, "%f %f\n", &x, &y) == 2)
    {
    /* assign x and y to arrays */
    }

    won't catch malformed input like four columns in a rogue line.
    But for many applications, it's probably good enough. If it's a
    bad file, it will fail eventually. And no-one's got an interest in
    providing malicious input to deliberately get a wrong result.
    Malcolm McLean, Aug 11, 2012
    #11
  12. Malcolm McLean <> writes:
    > On Thursday, 9 August 2012 22:00:49 UTC+1, John Bode wrote:
    >> Which is why scanf() is the wrong tool for all but the simplest
    >> input tasks. Unless I can guarantee that my input is always
    >> well-behaved, I avoid using anything from the *scanf() family.
    >>

    > It depends on the quality of parsing you want. Let's say we have x, y
    > co-ordinates in columns.
    >
    > while(fscanf(fp, "%f %f\n", &x, &y) == 2)
    > {
    > /* assign x and y to arrays */
    > }
    >
    > won't catch malformed input like four columns in a rogue line.
    > But for many applications, it's probably good enough. If it's a
    > bad file, it will fail eventually. And no-one's got an interest in
    > providing malicious input to deliberately get a wrong result.


    There are worse possibilities than wrong results. If the input includes
    something like "1.0e999999999", the behavior is undefined.

    If you can treat the input file as part of the program, so that an error
    in the input file is the same as a coding error, then it's probably ok
    to use fscanf. Otherwise ...

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Aug 11, 2012
    #12
  13. On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
    > Malcolm McLean <> writes:
    >
    > There are worse possibilities than wrong results. If the input includes
    > something like "1.0e999999999", the behavior is undefined.
    >

    But what is the program meant to do with such an input value? UB is probably the
    best thing that can happen to it.
    "Undefined behaviour" means "the C standard imposes no restrictions on the
    behaviour of the implementation", not that the behaviour exists in some
    ontological state of undefinedness.
    Malcolm McLean, Aug 11, 2012
    #13
  14. BartC

    "Malcolm McLean" <> wrote in message
    news:...
    > On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
    >> Malcolm McLean <> writes:
    >>
    >> There are worse possibilities than wrong results. If the input includes
    >> something like "1.0e999999999", the behavior is undefined.
    >>

    > But what is the program meant to do with such an input value? UB is
    > probably the
    > best thing that can happen to it.
    > "Undefined behaviour" means "the C standard imposes no restrictions on the
    > behaviour of the implementation", not that the behaviour exists in some
    > ontological state of undefinedness.


    So, you have an application which consists of a user interface and a bunch
    of data, much of which has not been committed to disk. It's been running for
    several hours, but if the user then chooses a command which reads something
    from a file, and accidentally chooses a binary instead of a text file, the
    application should just crash with the loss of all data?

    You've obviously never had to deal with irate clients on the phone!

    (With the routines I use (the C stuff is buried several layers deep so I
    couldn't tell you exactly what library functions are used), any non-numeric
    input will return 0.0 when trying to read floating point. Anything numeric
    but out-of-range such as 1.0e99999999 is read as INF.

    Input is always line oriented too (so running into end-of-line while trying
    to read more numbers will just return 0.0). All solid behaviour,
    although it's possible some of this is due to a well-behaved version of
    scanf() somewhere.)

    --
    Bartc
    BartC, Aug 11, 2012
    #14
  15. James Kuyper

    On 08/11/2012 04:34 PM, Malcolm McLean wrote:
    > On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
    >> Malcolm McLean <> writes:
    >>
    >> There are worse possibilities than wrong results. If the input includes
    >> something like "1.0e999999999", the behavior is undefined.
    >>

    > But what is the program meant to do with such an input value? UB is probably the
    > best thing that can happen to it.


    In most of my programs, the program is meant to report the fact that it
    has run into a problem with the input, and to identify the offending
    input and the context in which it occurred (for this particular program
    that would mean identifying the line number). In some cases, at the
    appropriate level (which is, in general, not the same as the level at
    which the problem was detected), it is meant to retry the action with
    something changed that might prevent or avoid the problem; for this
    program, that might mean asking the user to provide the name of a
    different file to use as input. UB is almost never on my agenda for the
    appropriate response to an error condition.

    > "Undefined behaviour" means "the C standard imposes no restrictions on the
    > behaviour of the implementation", not that the behaviour exists in some
    > ontological state of undefinedness.


    I try my best to avoid having my programs enter a given state unless
    their behavior in that state is defined by something (not necessarily
    the C standard), and I know what that definition is.
    --
    James Kuyper
    James Kuyper, Aug 12, 2012
    #15
  16. Malcolm McLean <> writes:
    > On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
    >> Malcolm McLean <> writes:
    >>
    >> There are worse possibilities than wrong results. If the input includes
    >> something like "1.0e999999999", the behavior is undefined.
    >>

    > But what is the program meant to do with such an input value? UB is probably the
    > best thing that can happen to it.


    That's absurd. Undefined behavior means that whatever the *worst*
    thing is, it can happen, whether that's crashing the program, or
    continuing to execute quietly with bad data, or reformatting your
    hard drive.

    And before you reject that last possibility, consider this: Is
    there code in your operating system that's designed to reformat
    a hard drive? Are you *certain* that an instance of undefined
    behavior can't possibly corrupt some function pointer and cause
    that code to be invoked?

    I admit that feeding "1.0e999999999" to scanf isn't likely to reformat
    your hard drive (or I wouldn't have tried it just now), but why take the
    risk? If you use gets() and strtod(), you can at least detect an input
    error and abort the program.

    > "Undefined behaviour" means "the C standard imposes no restrictions on the
    > behaviour of the implementation", not that the behaviour exists in some
    > ontological state of undefinedness.


    Nor does it mean "the program is guaranteed to crash with a
    meaningful error message".

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Aug 12, 2012
    #16
  17. Eric Sosman

    On 8/11/2012 8:30 PM, Keith Thompson wrote:
    >[...]
    > I admit that feeding "1.0e999999999" to scanf isn't likely to reformat
    > your hard drive (or I wouldn't have tried it just now), but why take the
    > risk? If you use gets() and strtod(), you can at least detect an input
    > error and abort the program.


    Extend your wrist, Keith, whilst I administer the slap.
    And I'm rescinding all the gold stars you won last term, too.

    --
    Eric Sosman
    Eric Sosman, Aug 12, 2012
    #17
  18. Eric Sosman <> writes:
    > On 8/11/2012 8:30 PM, Keith Thompson wrote:
    >>[...]
    >> I admit that feeding "1.0e999999999" to scanf isn't likely to reformat
    >> your hard drive (or I wouldn't have tried it just now), but why take the
    >> risk? If you use gets() and strtod(), you can at least detect an input
    >> error and abort the program.

    >
    > Extend your wrist, Keith, whilst I administer the slap.
    > And I'm rescinding all the gold stars you won last term, too.


    Aarrgghh!

    Perhaps my f key isn't working. Nope, there it is, can't use that
    excuse.

    Wrist slap humbly accepted -- but I assure it it was a typo, not a
    thinko.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Aug 12, 2012
    #18
  19. Keith Thompson <> writes:
    [...]
    > Wrist slap humbly accepted -- but I assure it it was a typo, not a
    > thinko.


    .... but I assure *you* it was a typo ...

    http://en.wikipedia.org/wiki/Muphry's_law

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Aug 12, 2012
    #19
  20. Eric Sosman

    On 8/11/2012 9:13 PM, Keith Thompson wrote:
    > Eric Sosman <> writes:
    >> On 8/11/2012 8:30 PM, Keith Thompson wrote:
    >>> [...]
    >>> I admit that feeding "1.0e999999999" to scanf isn't likely to reformat
    >>> your hard drive (or I wouldn't have tried it just now), but why take the
    >>> risk? If you use gets() and strtod(), you can at least detect an input
    >>> error and abort the program.

    >>
    >> Extend your wrist, Keith, whilst I administer the slap.
    >> And I'm rescinding all the gold stars you won last term, too.

    >
    > Aarrgghh!
    >
    > Perhaps my f key isn't working. Nope, there it is, can't use that
    > excuse.
    >
    > Wrist slap humbly accepted -- but I assure it it was a typo, not a
    > thinko.


    A reudian slip, no doubt.

    --
    Eric Sosman
    Eric Sosman, Aug 12, 2012
    #20