Error in scanf implementation or error in example in standard?

Discussion in 'C Programming' started by Simon Biber, Nov 29, 2006.

  1. Simon Biber

    Simon Biber Guest

    The following Example 3 is given in the 1999 C standard for the function
    fscanf:

    > EXAMPLE 3 To accept repeatedly from stdin a quantity, a unit of
    > measure, and an item name:
    >
    > #include <stdio.h>
    > /* ... */
    > int count; float quant; char units[21], item[21];
    > do {
    > count = fscanf(stdin, "%f%20s of %20s", &quant, units, item);
    > fscanf(stdin,"%*[^\n]");
    > } while (!feof(stdin) && !ferror(stdin));
    >
    > If the stdin stream contains the following lines:
    >
    > 2 quarts of oil
    > -12.8degrees Celsius
    > lots of luck
    > 10.0LBS of
    > dirt
    > 100ergs of energy
    >
    > the execution of the above example will be analogous to the following
    > assignments:
    >
    > quant = 2; strcpy(units, "quarts"); strcpy(item, "oil");
    > count = 3;
    > quant = -12.8; strcpy(units, "degrees");
    > count = 2; // "C" fails to match "o"
    > count = 0; // "l" fails to match "%f"
    > quant = 10.0; strcpy(units, "LBS"); strcpy(item, "dirt");
    > count = 3;
    > count = 0; // "100e" fails to match "%f"
    > count = EOF;


    I have tested several implementations and none of them get the last case
    right. In no case does fscanf return 0 indicating failure to match
    "100ergs of energy" with "%f".

    The actual behaviour varies. Some will match '100', leaving the 'e' unread:

    quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    count = 3;

    While others will match '100e', leaving the 'r' unread:

    quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    count = 3;

    But I am yet to come across an implementation that does what the example
    in the Standard specifies. Is this a failure in the implementations or
    in the Standard itself?

    --
    Simon.
     
    Simon Biber, Nov 29, 2006
    #1

  2. Simon Biber wrote:
    > The following Example 3 is given in the 1999 C standard for the function
    > fscanf:
    >
    > > EXAMPLE 3 To accept repeatedly from stdin a quantity, a unit of
    > > measure, and an item name:
    > >
    > > #include <stdio.h>
    > > /* ... */
    > > int count; float quant; char units[21], item[21];
    > > do {
    > > count = fscanf(stdin, "%f%20s of %20s", &quant, units, item);
    > > fscanf(stdin,"%*[^\n]");
    > > } while (!feof(stdin) && !ferror(stdin));
    > >
    > > If the stdin stream contains the following lines:
    > >
    > > 2 quarts of oil
    > > -12.8degrees Celsius
    > > lots of luck
    > > 10.0LBS of
    > > dirt
    > > 100ergs of energy
    > >
    > > the execution of the above example will be analogous to the following
    > > assignments:
    > >
    > > quant = 2; strcpy(units, "quarts"); strcpy(item, "oil");
    > > count = 3;
    > > quant = -12.8; strcpy(units, "degrees");
    > > count = 2; // "C" fails to match "o"
    > > count = 0; // "l" fails to match "%f"
    > > quant = 10.0; strcpy(units, "LBS"); strcpy(item, "dirt");
    > > count = 3;
    > > count = 0; // "100e" fails to match "%f"
    > > count = EOF;

    >
    > I have tested several implementations and none of them get the last case
    > right. In no case does fscanf return 0 indicating failure to match
    > "100ergs of energy" with "%f".
    >
    > The actual behaviour varies. Some will match '100', leaving the 'e' unread:
    >
    > quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    > count = 3;
    >
    > While others will match '100e', leaving the 'r' unread:
    >
    > quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    > count = 3;
    >
    > But I am yet to come across an implementation that does what the example
    > in the Standard specifies. Is this a failure in the implementations or
    > in the Standard itself?


    Footnote 245 in n1124 states:
    "fscanf pushes back at most one input character onto the input stream.
    Therefore, some sequences that are acceptable to strtod, strtol, etc.,
    are unacceptable to fscanf."

    This was added in response to Defect Report #22:
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

    In the case of 100ergs, fscanf reads up to the r before realizing that
    the "e" is not part of the number. At that point, given the
    one-character pushback limit, it can no longer push back both the r and
    the e, so it has to return with a failure, since 100e is not a valid
    number.
    Many implementations allow more than one character pushback and take
    advantage of this fact in the fscanf function, hence the behavior you
    have seen. Technically such implementations are in violation of the
    Standard but the sentiment among many implementors is that the
    requirement is unjustified and they just live with non-conformance.

    Robert Gamble
     
    Robert Gamble, Nov 29, 2006
    #2

  3. Robert Gamble said:

    <snip>

    > Many implementations allow more than one character pushback and take
    > advantage of this fact in the fscanf function, hence the behavior you
    > have seen. Technically such implementations are in violation of the
    > Standard


    Why?

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Nov 29, 2006
    #3
  4. Richard Bos

    Richard Bos Guest

    "Robert Gamble" wrote:

    > Simon Biber wrote:
    > > I have tested several implementations and none of them get the last case
    > > right. In no case does fscanf return 0 indicating failure to match
    > > "100ergs of energy" with "%f".
    > >
    > > The actual behaviour varies. Some will match '100', leaving the 'e' unread:
    > >
    > > quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    > > count = 3;
    > >
    > > While others will match '100e', leaving the 'r' unread:
    > >
    > > quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    > > count = 3;
    > >
    > > But I am yet to come across an implementation that does what the example
    > > in the Standard specifies. Is this a failure in the implementations or
    > > in the Standard itself?

    >
    > Footnote 245 in n1124 states:
    > "fscanf pushes back at most one input character onto the input stream.
    > Therefore, some sequences that are acceptable to strtod, strtol, etc.,
    > are unacceptable to fscanf."


    True, but feetneet are not normative. Strictly speaking, there's a
    conflict between two parts of the Standard; the footnote makes it clear
    that in this case, the intent was that the part about a single character
    pushback buffer for input streams overrides the part about parsing
    numbers, but it would be better if that were made explicit in the
    _normative_ text in the next TC.

    Richard
     
    Richard Bos, Nov 29, 2006
    #4
  5. Richard Heathfield wrote:
    > Robert Gamble said:
    >
    > <snip>
    >
    > > Many implementations allow more than one character pushback and take
    > > advantage of this fact in the fscanf function, hence the behavior you
    > > have seen. Technically such implementations are in violation of the
    > > Standard

    >
    > Why?


    Why what? Why such implementations aren't technically conforming?
    Because implementations that push back more than one character in the
    fscanf family of functions do not behave as mandated by the Standard.
    I am not sure I understand your point, perhaps you could clarify with a
    multi-word response.

    Robert Gamble
     
    Robert Gamble, Nov 29, 2006
    #5
  6. Ben Pfaff

    Ben Pfaff Guest

    "Robert Gamble" writes:

    > Many implementations allow more than one character pushback and take
    > advantage of this fact in the fscanf function, hence the behavior you
    > have seen. Technically such implementations are in violation of the
    > Standard but the sentiment among many implementors is that the
    > requirement is unjustified and they just live with non-conformance.


    C99 says this in the description of the ungetc function:

    One character of pushback is guaranteed. If the ungetc
    function is called too many times on the same stream without
    an intervening read or file positioning operation on that
    stream, the operation may fail.

    I don't see a requirement that *only* one character of pushback
    be supported, only that *at least* one character of pushback be
    supported.

    On the other hand, perhaps you are talking about the following
    text and footnote for the fscanf function; your article seems
    ambiguous to me:

    An input item is read from the stream, unless the specification
    includes an n specifier. An input item is defined as the
    longest sequence of input characters which does not exceed
    any specified field width and which is, or is a prefix of, a
    matching input sequence.242)

    242) fscanf pushes back at most one input character onto the
    input stream. Therefore, some sequences that are
    acceptable to strtod, strtol, etc., are unacceptable
    to fscanf.
    --
    int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
    \n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
    );while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
    );}return 0;}
     
    Ben Pfaff, Nov 29, 2006
    #6
  7. Richard Bos wrote:
    > "Robert Gamble" wrote:
    >
    > > Simon Biber wrote:
    > > > I have tested several implementations and none of them get the last case
    > > > right. In no case does fscanf return 0 indicating failure to match
    > > > "100ergs of energy" with "%f".
    > > >
    > > > The actual behaviour varies. Some will match '100', leaving the 'e' unread:
    > > >
    > > > quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    > > > count = 3;
    > > >
    > > > While others will match '100e', leaving the 'r' unread:
    > > >
    > > > quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    > > > count = 3;
    > > >
    > > > But I am yet to come across an implementation that does what the example
    > > > in the Standard specifies. Is this a failure in the implementations or
    > > > in the Standard itself?

    > >
    > > Footnote 245 in n1124 states:
    > > "fscanf pushes back at most one input character onto the input stream.
    > > Therefore, some sequences that are acceptable to strtod, strtol, etc.,
    > > are unacceptable to fscanf."

    >
    > True, but feetneet are not normative.


    And neither are the examples for that matter.

    > Strictly speaking, there's a
    > conflict between two parts of the Standard; the footnote makes it clear
    > that in this case, the intent was that the part about a single character
    > pushback buffer for input streams overrides the part about parsing
    > numbers, but it would be better if that were made explicit in the
    > _normative_ text in the next TC.


    I certainly agree that it would have been nice if this footnote was
    part of the normative text, I don't know why it isn't. The only
    conflict I see is the one in the C90 Standard which was addressed in DR
    022. Although the footnote is non-normative, it along with the example
    and the fact that it was the result of a DR make it abundantly clear
    what the intent was. If intent isn't enough though, a careful reading
    of the normative changes made in the DR (which were carried through to
    C99) yields the same result even if not as clearly spelled out.

    Robert Gamble
     
    Robert Gamble, Nov 29, 2006
    #7
  8. Robert Gamble said:

    > Richard Heathfield wrote:
    >> Robert Gamble said:
    >>
    >> <snip>
    >>
    >> > Many implementations allow more than one character pushback and take
    >> > advantage of this fact in the fscanf function, hence the behavior you
    >> > have seen. Technically such implementations are in violation of the
    >> > Standard

    >>
    >> Why?

    >
    > Why what? Why such implementations aren't technically conforming?


    Yes.

    > Because implementations that push back more than one character in the
    > fscanf family of functions do not behave as mandated by the Standard.


    Why not?

    > I am not sure I understand your point, perhaps you could clarify with a
    > multi-word response.


    <grin> Okay, let me see if I can make it clearer. Maybe you're right that
    providing more than the minimum level of pushback is against the rules, and
    maybe you're not. I can see why an implementation *must* provide at least
    one character of pushback, but where is it *forbidden* from providing more?

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Nov 29, 2006
    #8
  9. Ben Pfaff wrote:
    > "Robert Gamble" writes:
    >
    > > Many implementations allow more than one character pushback and take
    > > advantage of this fact in the fscanf function, hence the behavior you
    > > have seen. Technically such implementations are in violation of the
    > > Standard but the sentiment among many implementors is that the
    > > requirement is unjustified and they just live with non-conformance.

    >
    > C99 says this in the description of the ungetc function:
    >
    > One character of pushback is guaranteed. If the ungetc
    > function is called too many times on the same stream without
    > an intervening read or file positioning operation on that
    > stream, the operation may fail.
    >
    > I don't see a requirement that *only* one character of pushback
    > be supported, only that *at least* one character of pushback be
    > supported.


    I was speaking specifically of the pushback used by the fscanf function
    which I thought was clear based on the footnote that I cited. I
    certainly did not mean to imply that multi-character pushback was
    itself incorrect, just its use in the fscanf function.

    > On the other hand, perhaps you are talking about the following
    > text and footnote for the fscanf function; your article seems
    > ambiguous to me:
    >
    > An input item is read from the stream, unless the specification
    > includes an n specifier. An input item is defined as the
    > longest sequence of input characters which does not exceed
    > any specified field width and which is, or is a prefix of, a
    > matching input sequence.242)
    >
    > 242) fscanf pushes back at most one input character onto the
    > input stream. Therefore, some sequences that are
    > acceptable to strtod, strtol, etc., are unacceptable
    > to fscanf.


    Right, I cited this exact footnote at the beginning of my original
    article, perhaps you missed it.

    Robert Gamble
     
    Robert Gamble, Nov 29, 2006
    #9
  10. Richard Heathfield wrote:
    > Robert Gamble said:
    >
    > > Richard Heathfield wrote:
    > >> Robert Gamble said:
    > >>
    > >> <snip>
    > >>
    > >> > Many implementations allow more than one character pushback and take
    > >> > advantage of this fact in the fscanf function, hence the behavior you
    > >> > have seen. Technically such implementations are in violation of the
    > >> > Standard
    > >>
    > >> Why?

    > >
    > > Why what? Why such implementations aren't technically conforming?

    >
    > Yes.
    >
    > > Because implementations that push back more than one character in the
    > > fscanf family of functions do not behave as mandated by the Standard.

    >
    > Why not?
    >
    > > I am not sure I understand your point, perhaps you could clarify with a
    > > multi-word response.

    >
    > <grin> Okay, let me see if I can make it clearer. Maybe you're right that
    > providing more than the minimum level of pushback is against the rules, and
    > maybe you're not. I can see why an implementation *must* provide at least
    > one character of pushback, but where is it *forbidden* from providing more?


    First let me make clear that I am speaking only of the pushback
    functionality used within the fscanf function itself, not the pushback
    capability of a stream in general (which can provide pushback for as
    many characters as it desires); at least one person seems to have been
    confused by my original statement. The Standard makes it clear through
    the discussed footnote and example that the behavior shall be as if a
    maximum of one character of pushback was used within the fscanf
    function ("fscanf pushes back at most one input character onto the
    input stream"). Although footnotes and examples are non-normative, the
    same meaning is supported by the normative changes that were provoked
    by DR 022:

    In subclause 7.9.6.2, page 135, lines 31-33, change:

    "An input item is defined as the longest matching sequence of input
    characters, unless that exceeds a specified field width, in which case
    it is the initial subsequence of that length in the sequence."

    to:

    "An input item is defined as the longest sequence of input characters
    which does not exceed any specified field width and which is, or is a
    prefix of, a matching input sequence."

    Robert Gamble
     
    Robert Gamble, Nov 29, 2006
    #10
  11. Robert Gamble said:

    > The Standard makes it clear through
    > the discussed footnote and example that the behavior shall be as if a
    > maximum of one character of pushback was used within the fscanf
    > function ("fscanf pushes back at most one input character onto the
    > input stream").


    Thank you for clarifying. I know you know that footn...

    > Although footnotes and examples are non-normative,


    ....er, quite so.

    > the
    > same meaning is supported by the normative changes that were provoked
    > by DR 022:


    I've found DRs 200 through 294. I can't find DR 022.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Nov 29, 2006
    #11
  12. Richard Heathfield wrote:
    > Robert Gamble said:
    >
    > > The Standard makes it clear through
    > > the discussed footnote and example that the behavior shall be as if a
    > > maximum of one character of pushback was used within the fscanf
    > > function ("fscanf pushes back at most one input character onto the
    > > input stream").

    >
    > Thank you for clarifying. I know you know that footn...
    >
    > > Although footnotes and examples are non-normative,

    >
    > ...er, quite so.
    >
    > > the
    > > same meaning is supported by the normative changes that were provoked
    > > by DR 022:

    >
    > I've found DRs 200 through 294. I can't find DR 022.


    The link was in my original response:
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

    Robert Gamble
     
    Robert Gamble, Nov 29, 2006
    #12
  13. Ben Pfaff

    Ben Pfaff Guest

    "Robert Gamble" writes:

    >> On the other hand, perhaps you are talking about the following
    >> text and footnote for the fscanf function; your article seems
    >> ambiguous to me:


    [...]

    > Right, I cited this exact footnote at the beginning of my original
    > article, perhaps you missed it.


    I did miss it, sorry.
    --
    Ben Pfaff
    web: http://benpfaff.org
     
    Ben Pfaff, Nov 29, 2006
    #13
  14. Robert Gamble said:

    > Richard Heathfield wrote:

    <snip>
    >>
    >> I've found DRs 200 through 294. I can't find DR 022.

    >
    > The link was in my original response:
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.


    My apologies for missing that. It does appear that the text under
    consideration is still non-normative. (It's footnote 245 in n1124, for
    those who don't know).

    Having said that, I accept that the intent of footnotes, despite their
    non-normative status, is to clarify the meaning of the Standard, so I'll
    shut up now.

    (Like I care ***so much*** about fscanf! :) )

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Nov 29, 2006
    #14
  15. Simon Biber

    Simon Biber Guest

    Robert Gamble wrote:
    > Simon Biber wrote:
    >> I have tested several implementations and none of them get the last case
    >> right. In no case does fscanf return 0 indicating failure to match
    >> "100ergs of energy" with "%f".
    >>
    >> The actual behaviour varies. Some will match '100', leaving the 'e' unread:
    >>
    >> quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    >> count = 3;
    >>
    >> While others will match '100e', leaving the 'r' unread:
    >>
    >> quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    >> count = 3;
    >>
    >> But I am yet to come across an implementation that does what the example
    >> in the Standard specifies. Is this a failure in the implementations or
    >> in the Standard itself?

    >
    > Footnote 245 in n1124 states:
    > "fscanf pushes back at most one input character onto the input stream.
    > Therefore, some sequences that are acceptable to strtod, strtol, etc.,
    > are unacceptable to fscanf."
    >
    > This was added in response to Defect Report #22:
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.
    >
    > In the case of 100ergs, fscanf reads up to the r before realizing that
    > the "e" is not part of the number but at that point, given the one
    > character pushback limit, it can no longer push back both the r and the
    > e so it has to return with a failure since 100e is not a valid number.


    But none of the implementations I tested actually return with a failure!

    Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
    LCC-Win32 or Turbo C, none of them return with a failure. They interpret
    100e as a valid number, with the value 100.

    That's the real bug, not the quibble on how many characters are pushed back.

    --
    Simon.
     
    Simon Biber, Nov 30, 2006
    #15
  16. Simon Biber wrote:
    > Robert Gamble wrote:
    > > Simon Biber wrote:
    > >> I have tested several implementations and none of them get the last case
    > >> right. In no case does fscanf return 0 indicating failure to match
    > >> "100ergs of energy" with "%f".
    > >>
    > >> The actual behaviour varies. Some will match '100', leaving the 'e' unread:
    > >>
    > >> quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    > >> count = 3;
    > >>
    > >> While others will match '100e', leaving the 'r' unread:
    > >>
    > >> quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    > >> count = 3;
    > >>
    > >> But I am yet to come across an implementation that does what the example
    > >> in the Standard specifies. Is this a failure in the implementations or
    > >> in the Standard itself?

    > >
    > > Footnote 245 in n1124 states:
    > > "fscanf pushes back at most one input character onto the input stream.
    > > Therefore, some sequences that are acceptable to strtod, strtol, etc.,
    > > are unacceptable to fscanf."
    > >
    > > This was added in response to Defect Report #22:
    > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.
    > >
    > > In the case of 100ergs, fscanf reads up to the r before realizing that
    > > the "e" is not part of the number but at that point, given the one
    > > character pushback limit, it can no longer push back both the r and the
    > > e so it has to return with a failure since 100e is not a valid number.

    >
    > But none of the implementations I tested actually return with a failure!
    >
    > Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
    > LCC-Win32 or Turbo C, none of them return with a failure. They interpret
    > 100e as a valid number, with the value 100.
    >
    > That's the real bug, not the quibble on how many characters are pushed back.


    There are 2 problems here. Implementations that convert 100 and leave
    the "e" on the stream are probably realizing that the "e" is not part
    of the number when they read the "r" and are pushing back too many
    characters. Implementations that convert 100 and leave the "r" as the
    first character on the stream are incorrectly accepting "100e" as
    equivalent to "100e1". glibc is known to accept certain invalid
    numeric sequences but they don't seem willing to acknowledge such
    problems.

    I tested a number of implementations a while ago and had the same
    results that you have seen. I believe that at least the Solaris
    and glibc folk are aware of this particular issue but they don't seem
    to have any plans to change their behavior. I believe that uClibc
    (http://uclibc.org/) handled this case correctly, but I'm not positive.

    I haven't tried this on Dinkumware as I don't have access to it but if
    this was going to be handled correctly on any implementation it would
    probably be the Dinkumware C99 library. Their library claims to be
    certified by Perennial as C99-compliant and I believe the behavior in
    question is tested in the certification process. If anyone has access
    to this library it would be nice if they could confirm how it handles
    this. Additionally, if it does handle this correctly, I would be
    curious to know if the same string is handled the same way with the
    sscanf function (I believe it should but some people do not, the
    Standard isn't crystal clear in my opinion).

    Robert Gamble
     
    Robert Gamble, Nov 30, 2006
    #16
  17. P.J. Plauger

    P.J. Plauger Guest

    "Robert Gamble" wrote in message
    news:...

    > Simon Biber wrote:
    >> Robert Gamble wrote:
    >> > Simon Biber wrote:
    >> >> I have tested several implementations and none of them get the last
    >> >> case
    >> >> right. In no case does fscanf return 0 indicating failure to match
    >> >> "100ergs of energy" with "%f".
    >> >>
    >> >> The actual behaviour varies. Some will match '100', leaving the 'e'
    >> >> unread:
    >> >>
    >> >> quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    >> >> count = 3;
    >> >>
    >> >> While others will match '100e', leaving the 'r' unread:
    >> >>
    >> >> quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    >> >> count = 3;
    >> >>
    >> >> But I am yet to come across an implementation that does what the
    >> >> example
    >> >> in the Standard specifies. Is this a failure in the implementations or
    >> >> in the Standard itself?
    >> >
    >> > Footnote 245 in n1124 states:
    >> > "fscanf pushes back at most one input character onto the input stream.
    >> > Therefore, some sequences that are acceptable to strtod, strtol, etc.,
    >> > are unacceptable to fscanf."
    >> >
    >> > This was added in response to Defect Report #22:
    >> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.
    >> >
    >> > In the case of 100ergs, fscanf reads up to the r before realizing that
    >> > the "e" is not part of the number but at that point, given the one
    >> > character pushback limit, it can no longer push back both the r and the
    >> > e so it has to return with a failure since 100e is not a valid number.

    >>
    >> But none of the implementations I tested actually return with a failure!
    >>
    >> Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
    >> LCC-Win32 or Turbo C, none of them return with a failure. They interpret
    >> 100e as a valid number, with the value 100.
    >>
    >> That's the real bug, not the quibble on how many characters are pushed
    >> back.

    >
    > There are 2 problems here. Implementations that convert 100 and leave
    > the "e" on the stream are probably realizing that the "e" is not part
    > of the number when it reads the "r" and are pushing back too many
    > characters. Implementations that convert 100 and leave the "r" as the
    > first character on the stream are incorrectly accepting "100e" as
    > equivalent to "100e1". glibc is known to accept certain invalid
    > numeric sequences but they don't seem willing to acknowledge such
    > problems.
    >
    > I tested a number of implementations a while ago and had the same
    > results that you have seen. I believe that at least the Solaris
    > and glibc folk are aware of this particular issue but they don't seem
    > to have any plans to change their behavior. I believe that uClibc
    > (http://uclibc.org/) handled this case correctly, but I'm not positive.
    >
    > I haven't tried this on Dinkumware as I don't have access to it but if
    > this was going to be handled correctly on any implementation it would
    > probably be the Dinkumware C99 library. Their library claims to be
    > certified by Perennial as C99-compliant and I believe the behavior in
    > question is tested in the certification process. If anyone has access
    > to this library it would be nice if they could confirm how it handles
    > this. Additionally, if it does handle this correctly, I would be
    > curious to know if the same string is handled the same way with the
    > sscanf function (I believe it should but some people do not, the
    > Standard isn't crystal clear in my opinion).


    We do it right (if only to score 100 per cent on the Perennial C99
    validation suite), where by "right" I mean what the DR tells us
    to do -- consume "100e", fail, and leave "r" in the input stream.
    We do the same for both scanf and sscanf, since the code is common.

    P.J. Plauger
    Dinkumware, Ltd.
    http://www.dinkumware.com
     
    P.J. Plauger, Nov 30, 2006
    #17
  18. P.J. Plauger wrote:
    > "Robert Gamble" wrote in message
    > news:...
    >
    > > Simon Biber wrote:
    > >> Robert Gamble wrote:
    > >> > Simon Biber wrote:
    > >> >> I have tested several implementations and none of them get the last
    > >> >> case
    > >> >> right. In no case does fscanf return 0 indicating failure to match
    > >> >> "100ergs of energy" with "%f".
    > >> >>
    > >> >> The actual behaviour varies. Some will match '100', leaving the 'e'
    > >> >> unread:
    > >> >>
    > >> >> quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    > >> >> count = 3;
    > >> >>
    > >> >> While others will match '100e', leaving the 'r' unread:
    > >> >>
    > >> >> quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    > >> >> count = 3;
    > >> >>
    > >> >> But I am yet to come across an implementation that does what the
    > >> >> example
    > >> >> in the Standard specifies. Is this a failure in the implementations or
    > >> >> in the Standard itself?
    > >> >
    > >> > Footnote 245 in n1124 states:
    > >> > "fscanf pushes back at most one input character onto the input stream.
    > >> > Therefore, some sequences that are acceptable to strtod, strtol, etc.,
    > >> > are unacceptable to fscanf."
    > >> >
    > >> > This was added in response to Defect Report #22:
    > >> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.
    > >> >
    > >> > In the case of 100ergs, fscanf reads up to the r before realizing that
    > >> > the "e" is not part of the number but at that point, given the one
    > >> > character pushback limit, it can no longer push back both the r and the
    > >> > e so it has to return with a failure since 100e is not a valid number.
    > >>
    > >> But none of the implementations I tested actually return with a failure!
    > >>
    > >> Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
    > >> LCC-Win32 or Turbo C, none of them return with a failure. They interpret
    > >> 100e as a valid number, with the value 100.
    > >>
    > >> That's the real bug, not the quibble on how many characters are pushed
    > >> back.
    > >
    > > There are 2 problems here. Implementations that convert 100 and leave
    > > the "e" on the stream are probably realizing that the "e" is not part
    > > of the number when it reads the "r" and are pushing back too many
    > > characters. Implementations that convert 100 and leave the "r" as the
    > > first character on the stream are incorrectly accepting "100e" as
    > > equivalent to "100e1". glibc is known to accept certain invalid
    > > numeric sequences but they don't seem willing to acknowledge such
    > > problems.
    > >
    > > I tested a number of implementations a while ago and had the same
    > > results that you have seen. I believe that at least the Solaris
    > > and glibc folk are aware of this particular issue but they don't seem
    > > to have any plans to change their behavior. I believe that uClibc
    > > (http://uclibc.org/) handled this case correctly, but I'm not positive.
    > >
    > > I haven't tried this on Dinkumware as I don't have access to it but if
    > > this was going to be handled correctly on any implementation it would
    > > probably be the Dinkumware C99 library. Their library claims to be
    > > certified by Perennial as C99-compliant and I believe the behavior in
    > > question is tested in the certification process. If anyone has access
    > > to this library it would be nice if they could confirm how it handles
    > > this. Additionally, if it does handle this correctly, I would be
    > > curious to know if the same string is handled the same way with the
    > > sscanf function (I believe it should be, but some people do not; the
    > > Standard isn't crystal clear in my opinion).
    >
    > We do it right (if only to score 100 per cent on the Perennial C99
    > validation suite), where by "right" I mean what the DR tells us
    > to do -- consume "100e", fail, and leave "r" in the input stream.
    > We do the same for both scanf and sscanf, since the code is common.


    Thanks very much for the input. I sense from you the same sentiment
    that I have seen expressed by other implementors: that the
    one-character maximum pushback mandate isn't well received. The
    Rationale doesn't provide any insight into why this decision was made,
    but I assume it was to support implementations that provide only
    a single character of pushback while keeping results consistent with
    implementations that could provide more. Do you feel that there is a
    better way to handle this? Has there been any discussion of changing
    this behavior in the Standard? And is this a common sentiment in your
    experience?
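
To make the one-character pushback argument concrete, here is a rough sketch of a matcher for decimal floating-point input that may return only the last character it read. It is an illustration, not how any particular library implements %f: struct src, src_get, src_unget, keep and match_float are all hypothetical names, the input is a string rather than a FILE, and leading white space, hex floats, INF and NAN are not handled.

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */
#include <stdlib.h>  /* strtod, size_t */

#define CAP 64  /* assumed limit on the length of the collected number */

/* Hypothetical input source: a string cursor with ONE character of
   pushback, the minimum that ungetc guarantees. */
struct src {
    const char *p;  /* next unread character */
    int can_unget;  /* nonzero if the last character read may be returned */
};

static int src_get(struct src *s)
{
    if (*s->p == '\0') {
        s->can_unget = 0;
        return EOF;
    }
    s->can_unget = 1;
    return (unsigned char)*s->p++;
}

static void src_unget(struct src *s)
{
    if (s->can_unget) {  /* a second call in a row does nothing */
        s->p--;
        s->can_unget = 0;
    }
}

/* Store c in buf (dropping it if buf is full) and read the next character. */
static int keep(char *buf, size_t *n, int c, struct src *s)
{
    if (*n < CAP - 1)
        buf[(*n)++] = (char)c;
    return src_get(s);
}

/* Returns 1 and stores the value on a match, 0 on a matching failure.
   On failure everything read stays consumed except the last character. */
int match_float(struct src *s, double *out)
{
    char buf[CAP];
    size_t n = 0;
    int digits = 0, exp_digits = 0;
    int c = src_get(s);

    if (c == '+' || c == '-')
        c = keep(buf, &n, c, s);
    for (; isdigit(c); digits++)
        c = keep(buf, &n, c, s);
    if (c == '.') {
        c = keep(buf, &n, c, s);
        for (; isdigit(c); digits++)
            c = keep(buf, &n, c, s);
    }
    if (digits == 0) {
        src_unget(s);
        return 0;
    }
    if (c == 'e' || c == 'E') {
        c = keep(buf, &n, c, s);
        if (c == '+' || c == '-')
            c = keep(buf, &n, c, s);
        for (; isdigit(c); exp_digits++)
            c = keep(buf, &n, c, s);
        if (exp_digits == 0) {  /* e.g. "100e" followed by 'r' */
            src_unget(s);       /* only the 'r' can go back */
            return 0;
        }
    }
    src_unget(s);  /* return the first character that is not part of the number */
    buf[n] = '\0';
    *out = strtod(buf, NULL);
    return 1;
}
```

Given "100ergs of energy", this sketch consumes "100e", reads the 'r', pushes it back and reports a matching failure, leaving "rgs of energy" unread. Accepting "100" instead would require returning both the 'r' and the 'e', which a single slot of pushback cannot do.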

    Robert Gamble
     
    Robert Gamble, Nov 30, 2006
    #18
  19. Random832

    Random832 Guest

    2006-11-30 <>,
    Robert Gamble wrote:
    > Simon Biber wrote:
    >> Robert Gamble wrote:
    >> > Simon Biber wrote:
    >> >> I have tested several implementations and none of them get the last case
    >> >> right. In no case does fscanf return 0 indicating failure to match
    >> >> "100ergs of energy" with "%f".
    >> >>
    >> >> The actual behaviour varies. Some will match '100', leaving the 'e' unread:
    >> >>
    >> >> quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
    >> >> count = 3;
    >> >>
    >> >> While others will match '100e', leaving the 'r' unread:
    >> >>
    >> >> quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
    >> >> count = 3;
    >> >>
    >> >> But I am yet to come across an implementation that does what the example
    >> >> in the Standard specifies. Is this a failure in the implementations or
    >> >> in the Standard itself?
    >> >
    >> > Footnote 245 in n1124 states:
    >> > "fscanf pushes back at most one input character onto the input stream.
    >> > Therefore, some sequences that are acceptable to strtod, strtol, etc.,
    >> > are unacceptable to fscanf."
    >> >
    >> > This was added in response to Defect Report #22:
    >> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.
    >> >
    >> > In the case of 100ergs, fscanf reads up to the r before realizing that
    >> > the "e" is not part of the number but at that point, given the one
    >> > character pushback limit, it can no longer push back both the r and the
    >> > e so it has to return with a failure since 100e is not a valid number.
    >>
    >> But none of the implementations I tested actually return with a failure!
    >>
    >> Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
    >> LCC-Win32 or Turbo C, none of them return with a failure. They interpret
    >> 100e as a valid number, with the value 100.
    >>
    >> That's the real bug, not the quibble on how many characters are pushed back.
    >
    > There are 2 problems here. Implementations that convert 100 and leave
    > the "e" on the stream are probably realizing that the "e" is not part
    > of the number when it reads the "r" and are pushing back too many
    > characters. Implementations that convert 100 and leave the "r" as the
    > first character on the stream are incorrectly accepting "100e" as
    > equivalent to "100e1".


    100e0, actually, and it's arguable* that the two are in fact equivalent.

    * Arguable. adj. That for which "one would be wrong, but one could argue it."
     
    Random832, Nov 30, 2006
    #19
  20. CBFalconer

    CBFalconer Guest

    "P.J. Plauger" wrote:

    ... snip about parsing "100ergs" as a real ...
    >
    > We do it right (if only to score 100 per cent on the Perennial C99
    > validation suite), where by "right" I mean what the DR tells us
    > to do -- consume "100e", fail, and leave "r" in the input stream.
    > We do the same for both scanf and sscanf, since the code is common.


    Which makes sense, especially if you consider the spec as reading
    "stop on the first character that cannot describe a real". It also
    makes sense if you conceive of an empty field as describing zero.
    This more or less agrees with the standard's description of strtod
    (at least in N869):

    [#4] If the subject sequence has the expected form for a
    floating-point number, the sequence of characters starting
    with the first digit or the decimal-point character
    (whichever occurs first) is interpreted as a floating
    constant according to the rules of 6.4.4.2, except that the
    decimal-point character is used in place of a period, and
    that if neither an exponent part nor a decimal-point
    character appears in a decimal floating point number, or if
    a binary exponent part does not appear in a binary floating
    point number, an exponent part of the appropriate type with
    value zero is assumed to follow the last digit in the
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    string. If the subject sequence begins with a minus sign,
    the sequence is interpreted as negated.235) A character
    sequence INF or INFINITY is interpreted as an infinity, if
    representable in the return type, else like a floating
    constant that is too large for the range of the return type.
    A character sequence NAN or NAN(n-char-sequence-opt), is
    interpreted as a quiet NaN, if supported in the return type,
    else like a subject sequence part that does not have the
    expected form; the meaning of the n-char sequences is
    implementation-defined.236) A pointer to the final string
    is stored in the object pointed to by endptr, provided that
    endptr is not a null pointer.
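
By way of contrast with fscanf, strtod works on a whole string and is free to back up as far as it needs. The small sketch below (leading_number is a hypothetical wrapper, not a library function) shows it taking "100" as the subject sequence and leaving endptr at the 'e', since an exponent part needs at least one digit. This is the kind of difference footnote 245 describes.

```c
#include <stdlib.h>

/* Hypothetical wrapper: convert the leading number in text with strtod
   and report where the conversion stopped through *rest. */
double leading_number(const char *text, const char **rest)
{
    char *end;
    double value = strtod(text, &end);

    *rest = end;  /* first character that is not part of the number */
    return value;
}
```

Called with "100ergs of energy", the wrapper yields 100 and leaves rest pointing at "ergs of energy". Called with "100e2rgs", it yields 10000 and leaves rest at "rgs".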

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
     
    CBFalconer, Nov 30, 2006
    #20