Correct behaviour of scanf and sscanf

Discussion in 'C Programming' started by Rob Thorpe, Mar 14, 2005.

  1. Rob Thorpe

    Rob Thorpe Guest

    Given the code:-

    r = sscanf (s, "%lf", x);

    What is the correct output if the string s is simply "-" ?

    If "-" is considered the beginning of a number that has been cut
    short, then the correct output is r = EOF. If it is taken to be an
    ordinary character in the stream, then the output should be r = 0, as
    far as I can see. My compiler gives EOF.

    Does the standard specify which is correct?
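    A minimal sketch of the experiment, assuming `x` was meant to be a
    `double` (so `sscanf` needs `&x`, not `x`). The helper name
    `scan_lone_minus` is hypothetical. It only reports the result, because
    whether that result is `EOF` or `0` is exactly what the replies debate.

```c
#include <stdio.h>

/* Run the experiment from the question: scan a lone "-" with "%lf".
   Returns sscanf's result, which is EOF on some implementations
   and 0 on others. */
int scan_lone_minus(void)
{
    double x = 0.0;                  /* %lf expects a double *, hence &x */
    int r = sscanf("-", "%lf", &x);

    if (r == EOF)
        puts("input failure: sscanf returned EOF");
    else
        printf("sscanf returned %d (0 means matching failure)\n", r);
    return r;
}
```

    Calling `scan_lone_minus()` prints and returns whatever the local C
    library produces.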
    Rob Thorpe, Mar 14, 2005
    #1

  2. Rob Thorpe

    Mac Guest

    On Mon, 14 Mar 2005 10:44:13 -0800, Rob Thorpe wrote:

    > Given the code:-
    >
    > r = sscanf (s, "%lf", x);
    >
    > What is the correct output if the string s is simply "-" ?
    >
    > If "-" is considered the beginning of a number, that has been
    > cut-short then the correct output is that r = EOF. If it is taken to
    > be a letter in the stream, then the output should be r = 0, as far as
    > I can see. My compiler gives EOF.
    >
    > Does the standard specify which is correct?


    From C-89 (or a reasonable facsimile thereof)

    *************beginning of excerpt ***************

    4.9.6.6 The sscanf function

    [snip]

    Returns

    The sscanf function returns the value of the macro EOF if an input
    failure occurs before any conversion. Otherwise, the sscanf function
    returns the number of input items assigned, which can be fewer than
    provided for, or even zero, in the event of an early matching failure.

    *********** end of excerpt **************
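    To make the quoted rule concrete, here is a small sketch (the helper
    `count_two_doubles` is hypothetical) that scans two `%lf` items. Per
    the wording above, `EOF` should come back only when input runs out
    before the first conversion; an early matching failure, or input
    running out after one conversion, yields a count instead.

```c
#include <stdio.h>

/* Scan two doubles and return sscanf's result unchanged:
   EOF only if input fails before ANY conversion, otherwise the
   number of items assigned (possibly fewer than 2, or 0). */
int count_two_doubles(const char *s)
{
    double a = 0.0, b = 0.0;
    return sscanf(s, "%lf %lf", &a, &b);
}
```

    With this helper, "1.5 2.5" gives 2, "1.5 abc" and "1.5" both give 1,
    "abc" gives 0, and "" gives EOF.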


    The behavior you describe sounds correct to me. sscanf() is supposed to
    return EOF if an input failure occurs before any conversion. Elsewhere,
    it says that encountering the end of the string is identical to
    encountering the end of file in a fscanf() call. So this is just like
    calling fscanf on a text file which has the single character '-' in it.
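    A sketch that checks this equivalence directly, assuming `tmpfile()`
    is available for a scratch stream: it feeds the single character '-'
    to `fscanf` and the string "-" to `sscanf`. If end-of-string really
    behaves like end-of-file, the two results should match. The helper
    name is hypothetical, and the shared value itself is
    implementation-dependent.

```c
#include <stdio.h>

/* Compare fscanf on a stream holding just "-" with sscanf on "-".
   Returns 1 if the results agree, 0 if they differ,
   and -1 if no scratch stream could be created. */
int stream_and_string_agree(void)
{
    FILE *fp = tmpfile();
    double a = 0.0, b = 0.0;
    int rf, rs;

    if (fp == NULL)
        return -1;
    fputs("-", fp);
    rewind(fp);                       /* back to the start before reading */
    rf = fscanf(fp, "%lf", &a);
    fclose(fp);

    rs = sscanf("-", "%lf", &b);
    printf("fscanf: %d, sscanf: %d\n", rf, rs);
    return rf == rs;
}
```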

    HTH

    --Mac
    Mac, Mar 15, 2005
    #2

  3. Rob Thorpe

    Eric Sosman Guest

    Rob Thorpe wrote:

    > Given the code:-
    >
    > r = sscanf (s, "%lf", x);
    >
    > What is the correct output if the string s is simply "-" ?
    >
    > If "-" is considered the beginning of a number, that has been
    > cut-short then the correct output is that r = EOF. If it is taken to
    > be a letter in the stream, then the output should be r = 0, as far as
    > I can see. My compiler gives EOF.
    >
    > Does the standard specify which is correct?


    Haven't seen a reply in the several hours since I first
    saw the message, so (fools rush in ...) I'll hazard a guess.

    The Standard speaks of two sorts of failure for *scanf()
    directives: "matching failure," which amounts to an input
    sequence that doesn't satisfy the syntax required by the
    directive, and "input failure," meaning that the source of
    input characters dried up -- for the stream-input versions
    this means EOF was sensed, and for sscanf() it means the
    scan reached the end of the string. On a matching failure,
    *scanf() stops operating and returns the number of items
    already matched and converted (0, in your example), while
    for an input failure *scanf() returns EOF.
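    The two failure kinds map onto the return value as follows. This is a
    sketch with a hypothetical helper, `classify_scan`, exercised only on
    inputs whose classification is not in dispute: "2.5" converts, "abc"
    is a matching failure, and "" or "   " (nothing but whitespace) is an
    input failure.

```c
#include <stdio.h>

/* Hypothetical classification of how one "%lf" directive ended. */
enum scan_outcome { CONVERTED, MATCHING_FAILURE, INPUT_FAILURE, UNEXPECTED };

enum scan_outcome classify_scan(const char *s)
{
    double d;

    switch (sscanf(s, "%lf", &d)) {
    case 1:   return CONVERTED;         /* item matched and assigned     */
    case 0:   return MATCHING_FAILURE;  /* input present but not a match */
    case EOF: return INPUT_FAILURE;     /* input ran out first           */
    default:  return UNEXPECTED;        /* cannot happen for one item    */
    }
}
```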

    So the question boils down to this: When "%lf" processes
    "-", is the failure a matching failure or an input failure?

    One point of view considers it a matching failure. The
    characters for "%f" are supposed to be something strtod() would
    swallow: an optional all-whitespace prefix, an optional sign,
    and then a character string resembling a floating-point constant
    as written in C source code. The string "-" doesn't match this
    description (the floating-point constant is missing), so it could
    be called a matching failure.
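    The strtod() side of this argument can be checked directly: when the
    subject sequence is empty or malformed, strtod() performs no
    conversion and stores its input pointer in `*endptr`. A small sketch,
    with the hypothetical name `strtod_accepts`:

```c
#include <stdlib.h>

/* Returns 1 if strtod() converts something at the start of s
   (after optional leading whitespace), 0 if it converts nothing.
   For "-", strtod leaves end == s: no conversion was performed. */
int strtod_accepts(const char *s)
{
    char *end;

    (void)strtod(s, &end);
    return end != s;
}
```

    So `strtod_accepts("-")` is 0 while `strtod_accepts("-1")` is 1.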

    The other viewpoint holds that no "mismatch" was detected
    before end-of-string, so it's an input failure. The sequence
    of characters is perfectly good as the prefix of a valid match,
    and the only thing preventing a complete match is the fact that
    no more input was available. Hence (says this argument), the
    operation ends with an input failure rather than a matching
    failure, and EOF is the correct return value.

    IMHO the Standard is not entirely clear about which argument
    is correct: is an incomplete prefix a failure to match, or a
    failure of the input source? To me, the language of the Standard
    doesn't shine enough light into this dark corner -- but if anyone
    happens to have a torch to hand, I'd welcome illumination ...

    Trying to put myself in the place of an implementor, I'd
    imagine the input failure (EOF) outcome would be "more natural,"
    but I don't think the Standard's language actually says so in
    so many words.

    The fool has rushed in; tread, o ye angels!

    --
    Eric Sosman
    Eric Sosman, Mar 15, 2005
    #3
  4. Rob Thorpe

    CBFalconer Guest

    Rob Thorpe wrote:
    >
    > Given the code:-
    >
    > r = sscanf (s, "%lf", x);
    >
    > What is the correct output if the string s is simply "-" ?
    >
    > If "-" is considered the beginning of a number, that has been
    > cut-short then the correct output is that r = EOF. If it is
    > taken to be a letter in the stream, then the output should be
    > r = 0, as far as I can see. My compiler gives EOF.
    >
    > Does the standard specify which is correct?


    The problem is that detection of such a lone '-' requires reading
    two characters, the second of which is not a digit. C only
    guarantees one level of pushback via ungetc, so whatever routine is
    doing the parsing (such as scanf) cannot leave the input stream
    unaltered and report 'No number available'. With string sources
    this obviously does not apply. So the question is "should the
    string and stream operations function in the same manner". A
    similar (but worse) problem arises after reading the e in floating
    point formats. "3.0e-x" should return 3.0 and have to push back
    three chars.
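    To see why the single level of pushback matters, here is a
    hypothetical hand-rolled check (not how any real scanf is written)
    that needs a second character to decide whether a '-' starts a
    number. With only one guaranteed ungetc(), the character after the
    '-' can be returned to the stream, but the '-' itself cannot.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical helper: returns 1 if the stream starts with an optional
   '-' followed by a digit, 0 otherwise. Only ONE character is pushed
   back (the depth the Standard guarantees), so if the '-' is not
   followed by a digit, the '-' stays consumed. */
int minus_then_digit(FILE *fp)
{
    int c = getc(fp);

    if (c == '-')
        c = getc(fp);            /* a second character is needed to decide */
    if (c == EOF)
        return 0;
    ungetc(c, fp);               /* return the peeked character only */
    return isdigit(c) != 0;
}
```

    On a stream containing "-x", `minus_then_digit` returns 0 and the next
    getc() yields 'x'; the '-' is gone.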

    I think the proper thing would be to guarantee three level
    pushback, maybe in C05. This requires defining what is to be done
    when an application attempts excess pushback :)

    --
    "If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers." - Keith Thompson
    CBFalconer, Mar 15, 2005
    #4
  5. Rob Thorpe

    tigervamp Guest

    CBFalconer wrote:
    > Rob Thorpe wrote:
    > >
    > > Given the code:-
    > >
    > > r = sscanf (s, "%lf", x);
    > >
    > > What is the correct output if the string s is simply "-" ?
    > >
    > > If "-" is considered the beginning of a number, that has been
    > > cut-short then the correct output is that r = EOF. If it is
    > > taken to be a letter in the stream, then the output should be
    > > r = 0, as far as I can see. My compiler gives EOF.
    > >
    > > Does the standard specify which is correct?

    >
    > The problem is that detection of such a lone '-' requires reading
    > two characters, the second of which is not a digit. C only
    > guarantees one level of pushback via ungetc,


    Footnote 242 in section 7.19.6.2 (fscanf) indicates that a _maximum_ of
    one character can be pushed back, and the standard does not say that
    sscanf behaves differently.

    > so whatever routine is
    > doing the parsing (such as scanf) cannot leave the input stream
    > unaltered and report 'No number available'. With string sources
    > this obviously does not apply. So the question is "should the
    > string and stream operations function in the same manner".


    According to the standard they should.

    > A similar (but worse) problem arises after reading the e in floating
    > point formats. "3.0e-x" should return 3.0 and have to push back
    > three chars.


    fscanf should consume the "3.0e-", read the "x", recognize a matching
    failure, push the "x" back onto the stream, and return 0. This is the
    behavior defined in example 3 of section 7.19.6.2p20 (fscanf), and again
    the standard specifies that sscanf should behave the same.
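    Footnote 242 notes that some sequences acceptable to strtod() are
    unacceptable to fscanf, and "3.0e-x" is one of them. A sketch
    comparing the two (the function name is hypothetical): strtod() backs
    off to "3.0" and reports three characters used, while sscanf with
    "%lf" returns 0 under the one-character-pushback reading and 1 on
    implementations that push back more.

```c
#include <stdio.h>
#include <stdlib.h>

/* Run strtod and sscanf("%lf") on the same string.
   *sd   receives strtod's value,
   *used receives how many characters strtod accepted,
   *r    receives sscanf's return value (0 or 1 for "3.0e-x",
         depending on how far the implementation pushes back). */
void compare_strtod_sscanf(const char *s, double *sd, size_t *used, int *r)
{
    char *end;
    double d;

    *sd = strtod(s, &end);
    *used = (size_t)(end - s);
    *r = sscanf(s, "%lf", &d);
}
```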

    I think that in the OP's case the behavior should be similar and the
    return value should be 0; glibc does this, and I think they are right
    here. From what I can tell, EOF is never returned if a character was
    read (regardless of whether it matched or was pushed back), but I may
    well be wrong.

    > I think the proper thing would be to guarantee three level
    > pushback, maybe in C05. This requires defining what is to be done
    > when an application attempts excess pushback :)


    I think the current behavior is pretty clear and well-defined, but
    notable implementations do not follow it. Solaris and glibc both push
    back multiple characters to achieve the output you described above.
    Details about the Solaris behavior can be found at
    http://iforce.sun.com/protected/solaris10/adoptionkit/general/scanf.txt;
    apparently there are cases that require at least 5 characters to be
    pushed back to follow the behavior you outlined.



    Rob Gamble
    tigervamp, Mar 15, 2005
    #5
  6. Rob Thorpe

    Rob Thorpe Guest

    CBFalconer wrote:
    > Rob Thorpe wrote:
    > >
    > > Given the code:-
    > >
    > > r = sscanf (s, "%lf", x);
    > >
    > > What is the correct output if the string s is simply "-" ?
    > >
    > > If "-" is considered the beginning of a number, that has been
    > > cut-short then the correct output is that r = EOF. If it is
    > > taken to be a letter in the stream, then the output should be
    > > r = 0, as far as I can see. My compiler gives EOF.
    > >
    > > Does the standard specify which is correct?

    >
    > The problem is that detection of such a lone '-' requires reading
    > two characters, the second of which is not a digit. C only
    > guarantees one level of pushback via ungetc, so whatever routine is
    > doing the parsing (such as scanf) cannot leave the input stream
    > unaltered and report 'No number available'. With string sources
    > this obviously does not apply. So the question is "should the
    > string and stream operations function in the same manner". A
    > similar (but worse) problem arises after reading the e in floating
    > point formats. "3.0e-x" should return 3.0 and have to push back
    > three chars.
    >
    > I think the proper thing would be to guarantee three level
    > pushback, maybe in C05. This requires defining what is to be done
    > when an application attempts excess pushback :)


    Thanks, that explains it.

    I wondered whether testing for both 0 and EOF is OTT, but since it works
    as you describe, it's necessary in very many situations.
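    If a caller does want to tell the cases apart, a sketch like the
    following (the name `read_double` is hypothetical) checks for success
    first and then reports EOF and 0 separately. A lone "-" may land in
    either branch, so both have to be treated as "no number".

```c
#include <stdio.h>

/* Returns 1 and stores the value on success, 0 otherwise.
   EOF and 0 get different diagnostics, but both mean "no number". */
int read_double(const char *s, double *out)
{
    int r = sscanf(s, "%lf", out);

    if (r == 1)
        return 1;
    if (r == EOF)
        fprintf(stderr, "input failure: no (complete) number before end\n");
    else
        fprintf(stderr, "matching failure: not a number\n");
    return 0;
}
```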
    Rob Thorpe, Mar 15, 2005
    #6
  7. Rob Thorpe

    Dan Pop Guest

    Eric Sosman writes:

    >Rob Thorpe wrote:
    >
    >> Given the code:-
    >>
    >> r = sscanf (s, "%lf", x);
    >>
    >> What is the correct output if the string s is simply "-" ?
    >>
    >> If "-" is considered the beginning of a number, that has been
    >> cut-short then the correct output is that r = EOF. If it is taken to
    >> be a letter in the stream, then the output should be r = 0, as far as
    >> I can see. My compiler gives EOF.
    >>
    >> Does the standard specify which is correct?

    >
    > Haven't seen a reply in the several hours since I first
    >saw the message, so (fools rush in ...) I'll hazard a guess.
    >
    > The Standard speaks of two sorts of failure for *scanf()
    >directives: "matching failure," which amounts to an input
    >sequence that doesn't satisfy the syntax required by the
    >directive, and "input failure," meaning that the source of
    >input characters dried up -- for the stream-input versions
    >this means EOF was sensed, and for sscanf() it means the
    >scan reached the end of the string. On a matching failure,
    >*scanf() stops operating and returns the number of items
    >already matched and converted (0, in your example), while
    >for an input failure *scanf() returns EOF.
    >
    > So the question boils down to this: When "%lf" processes
    >"-", is the failure a matching failure or an input failure?
    >
    > One point of view considers it a matching failure. The
    >characters for "%f" are supposed to be something strtod() would
    >swallow: an optional all-whitespace prefix, an optional sign,
    >and then a character string resembling a floating-point constant
    >as written in C source code. The string "-" doesn't match this
    >description (the floating-point constant is missing), so it could
    >be called a matching failure.
    >
    > The other viewpoint holds that no "mismatch" was detected
    >before end-of-string, so it's an input failure. The sequence
    >of characters is perfectly good as the prefix of a valid match,
    >and the only thing preventing a complete match is the fact that
    >no more input was available. Hence (says this argument), the
    >operation ends with an input failure rather than a matching
    >failure, and EOF is the correct return value.
    >
    > IMHO the Standard is not entirely clear about which argument
    >is correct: is an incomplete prefix a failure to match, or a
    >failure of the input source? To me, the language of the Standard
    >doesn't shine enough light into this dark corner -- but if anyone
    >happens to have a torch to hand, I'd welcome illumination ...


    An incomplete prefix followed by an end-of-file condition cannot be a
    matching failure the way "- " or "-foo" would be. We're clearly in the
    case where an input failure occurred before any conversion, just as if
    the input were an empty string.
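    The contrasting inputs are easy to test. In this sketch (the name
    `scan_one` is hypothetical), "- " and "-foo" each supply a character
    after the sign that cannot continue a number, so "%lf" should report
    a matching failure (0), while the empty string is the unambiguous
    input-failure case (EOF).

```c
#include <stdio.h>

/* Return sscanf's result for a single "%lf" directive on s. */
int scan_one(const char *s)
{
    double d;
    return sscanf(s, "%lf", &d);
}
```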

    I agree that the text of the standard is less than crystal clear and I
    wouldn't be surprised to see different behaviours on different
    implementations. OTOH, as an implementation user, especially in the
    case of sscanf, I see no problem: if the function doesn't return 1, it is
    obvious that the input string doesn't contain a valid number.
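    That advice reduces to a one-line check. A sketch, using the
    hypothetical name `has_number`; it sidesteps the EOF-versus-0
    question for "-" entirely.

```c
#include <stdio.h>

/* Nonzero iff sscanf converted exactly one double from the start of s.
   Both EOF and 0 are simply "no valid number". */
int has_number(const char *s, double *out)
{
    return sscanf(s, "%lf", out) == 1;
}
```

    It only looks at the start of the string: "1.5abc" still succeeds,
    because "%lf" stops at the 'a' after converting "1.5".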

    Dan
    --
    Dan Pop
    Dan Pop, Mar 15, 2005
    #7