Error in scanf implementation or error in example in standard?

Simon Biber · Nov 29, 2006

The following Example 3 is given in the 1999 C standard for the function
fscanf:

EXAMPLE 3 To accept repeatedly from stdin a quantity, a unit of
measure, and an item name:

#include <stdio.h>
/* ... */
int count; float quant; char units[21], item[21];
do {
count = fscanf(stdin, "%f%20s of %20s", &quant, units, item);
fscanf(stdin,"%*[^\n]");
} while (!feof(stdin) && !ferror(stdin));

If the stdin stream contains the following lines:

2 quarts of oil
-12.8degrees Celsius
lots of luck
10.0LBS of
dirt
100ergs of energy

the execution of the above example will be analogous to the following
assignments:

quant = 2; strcpy(units, "quarts"); strcpy(item, "oil");
count = 3;
quant = -12.8; strcpy(units, "degrees");
count = 2; // "C" fails to match "o"
count = 0; // "l" fails to match "%f"
quant = 10.0; strcpy(units, "LBS"); strcpy(item, "dirt");
count = 3;
count = 0; // "100e" fails to match "%f"
count = EOF;

I have tested several implementations and none of them get the last case
right. In no case does fscanf return 0 indicating failure to match
"100ergs of energy" with "%f".

The actual behaviour varies. Some will match '100', leaving the 'e' unread:

quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
count = 3;

While others will match '100e', leaving the 'r' unread:

quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
count = 3;

But I am yet to come across an implementation that does what the example
in the Standard specifies. Is this a failure in the implementations or
in the Standard itself?
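
For anyone who wants to reproduce this, here is a minimal harness. It is only a sketch: it assumes a hosted implementation where tmpfile() works, and the names `struct record`, `run_example` and `print_records` are made up for illustration. The loop body is the Standard's example, reading from a temporary file instead of stdin.

```c
#include <stdio.h>
#include <string.h>

/* One iteration of the loop: fscanf's return value and whatever it stored. */
struct record {
    int count;
    float quant;
    char units[21];
    char item[21];
};

/* Writes `input` to a temporary file, runs the loop from EXAMPLE 3 on it,
   and stores at most `max` iterations in `out`. Returns the number of
   iterations stored, or -1 if the temporary file cannot be created. */
int run_example(const char *input, struct record *out, int max)
{
    FILE *fp;
    int n = 0;

    if (max <= 0)
        return 0;
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(input, fp);
    rewind(fp);

    do {
        struct record r;
        memset(&r, 0, sizeof r);   /* so unassigned fields print as empty */
        r.count = fscanf(fp, "%f%20s of %20s", &r.quant, r.units, r.item);
        fscanf(fp, "%*[^\n]");     /* discard the rest of the line */
        out[n++] = r;
    } while (n < max && !feof(fp) && !ferror(fp));

    fclose(fp);
    return n;
}

/* Prints each stored iteration in roughly the Standard's notation. */
void print_records(const struct record *r, int n)
{
    for (int i = 0; i < n; i++)
        printf("count = %d; quant = %g; units = \"%s\"; item = \"%s\"\n",
               r[i].count, (double)r[i].quant, r[i].units, r[i].item);
}
```

Call run_example with the six input lines joined by "\n" and pass the result to print_records. The first four iterations should agree with the Standard on any implementation; the iteration for "100ergs of energy" is the one that differs.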

Robert Gamble · Nov 29, 2006

Simon said:
The following Example 3 is given in the 1999 C standard for the function
fscanf:

EXAMPLE 3 To accept repeatedly from stdin a quantity, a unit of
measure, and an item name:

#include <stdio.h>
/* ... */
int count; float quant; char units[21], item[21];
do {
count = fscanf(stdin, "%f%20s of %20s", &quant, units, item);
fscanf(stdin,"%*[^\n]");
} while (!feof(stdin) && !ferror(stdin));

If the stdin stream contains the following lines:

2 quarts of oil
-12.8degrees Celsius
lots of luck
10.0LBS of
dirt
100ergs of energy

the execution of the above example will be analogous to the following
assignments:

quant = 2; strcpy(units, "quarts"); strcpy(item, "oil");
count = 3;
quant = -12.8; strcpy(units, "degrees");
count = 2; // "C" fails to match "o"
count = 0; // "l" fails to match "%f"
quant = 10.0; strcpy(units, "LBS"); strcpy(item, "dirt");
count = 3;
count = 0; // "100e" fails to match "%f"
count = EOF;

I have tested several implementations and none of them get the last case
right. In no case does fscanf return 0 indicating failure to match
"100ergs of energy" with "%f".

The actual behaviour varies. Some will match '100', leaving the 'e' unread:

quant = 100; strcpy(units, "ergs"); strcpy(item, "energy");
count = 3;

While others will match '100e', leaving the 'r' unread:

quant = 100; strcpy(units, "rgs"); strcpy(item, "energy");
count = 3;

But I am yet to come across an implementation that does what the example
in the Standard specifies. Is this a failure in the implementations or
in the Standard itself?

Footnote 245 in n1124 states:
"fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc.,
are unacceptable to fscanf."

This was added in response to Defect Report #22:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

In the case of 100ergs, fscanf reads up to the r before realizing that
the "e" is not part of the number but at that point, given the one
character pushback limit, it can no longer push back both the r and the
e so it has to return with a failure since 100e is not a valid number.
Many implementations allow more than one character pushback and take
advantage of this fact in the fscanf function, hence the behavior you
have seen. Technically such implementations are in violation of the
Standard but the sentiment among many implementors is that the
requirement is unjustified and they just live with non-conformance.

Robert Gamble
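
The effect of the one-character limit can be sketched with a hand-rolled matcher. This is not how any particular library implements fscanf; `match_float` and `take` are hypothetical helpers, and the sketch ignores hex floats, INF, NAN, field widths and locale. It reads one character at a time with getc and calls ungetc at most once.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Stores c in buf if there is room, then returns the next input character. */
static int take(char *buf, size_t *len, size_t cap, int c, FILE *fp)
{
    if (*len < cap - 1)
        buf[(*len)++] = (char)c;
    return getc(fp);
}

/* Hypothetical helper: matches a decimal floating-point input item using
   at most one character of pushback. Returns 1 and stores the value on
   success, 0 on a matching failure, EOF if the input ends before any
   non-space character is read. */
int match_float(FILE *fp, double *out)
{
    char buf[128];
    size_t len = 0;
    int c, mant_digits = 0, exp_digits = 0, has_exp = 0;

    do {                                   /* skip leading white space */
        c = getc(fp);
    } while (isspace(c));
    if (c == EOF)
        return EOF;

    if (c == '+' || c == '-')
        c = take(buf, &len, sizeof buf, c, fp);
    while (isdigit(c)) {
        mant_digits++;
        c = take(buf, &len, sizeof buf, c, fp);
    }
    if (c == '.') {
        c = take(buf, &len, sizeof buf, c, fp);
        while (isdigit(c)) {
            mant_digits++;
            c = take(buf, &len, sizeof buf, c, fp);
        }
    }
    if (mant_digits > 0 && (c == 'e' || c == 'E')) {
        has_exp = 1;                       /* the 'e' is now consumed */
        c = take(buf, &len, sizeof buf, c, fp);
        if (c == '+' || c == '-')
            c = take(buf, &len, sizeof buf, c, fp);
        while (isdigit(c)) {
            exp_digits++;
            c = take(buf, &len, sizeof buf, c, fp);
        }
    }
    if (c != EOF)
        ungetc(c, fp);                     /* the single pushback */

    /* "100e" is a prefix of a valid number but not a valid number itself. */
    if (mant_digits == 0 || (has_exp && exp_digits == 0))
        return 0;

    buf[len] = '\0';
    *out = strtod(buf, NULL);
    return 1;
}
```

On "100ergs" this consumes "100e", pushes the 'r' back, and reports a matching failure, which is the behaviour the Standard's example describes. A matcher that wanted to return 100 and leave "ergs" unread would have to push back two characters.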

Richard Heathfield · Nov 29, 2006

Robert Gamble said:

Many implementations allow more than one character pushback and take
advantage of this fact in the fscanf function, hence the behavior you
have seen. Technically such implementations are in violation of the
Standard

Why?

Richard Bos · Nov 29, 2006

Robert Gamble said:
Footnote 245 in n1124 states:
"fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc.,
are unacceptable to fscanf."

True, but feetneet are not normative. Strictly speaking, there's a
conflict between two parts of the Standard; the footnote makes it clear
that in this case, the intent was that the part about a single character
pushback buffer for input streams overrides the part about parsing
numbers, but it would be better if that were made explicit in the
_normative_ text in the next TC.

Richard

Robert Gamble · Nov 29, 2006

Richard Heathfield said:
Robert Gamble said:

Why?

Why what? Why such implementations aren't technically conforming?
Because implementations that push back more than one character in the
fscanf family of functions do not behave as mandated by the Standard.
I am not sure I understand your point, perhaps you could clarify with a
multi-word response.

Robert Gamble

Ben Pfaff · Nov 29, 2006

Robert Gamble said:
Many implementations allow more than one character pushback and take
advantage of this fact in the fscanf function, hence the behavior you
have seen. Technically such implementations are in violation of the
Standard but the sentiment among many implementors is that the
requirement is unjustified and they just live with non-conformance.

C99 says this in the description of the ungetc function:

One character of pushback is guaranteed. If the ungetc
function is called too many times on the same stream without
an intervening read or file positioning operation on that
stream, the operation may fail.

I don't see a requirement that *only* one character of pushback
be supported, only that *at least* one character of pushback be
supported.

On the other hand, perhaps you are talking about the following
text and footnote for the fscanf function; your article seems
ambiguous to me:

An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the
longest sequence of input characters which does not exceed
any specified field width and which is, or is a prefix of, a
matching input sequence.242)

242) fscanf pushes back at most one input character onto the
input stream. Therefore, some sequences that are
acceptable to strtod, strtol, etc., are unacceptable
to fscanf.
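
The ungetc wording quoted above can be checked with a small probe. This is a sketch under the assumption that the stream is open for reading; `pushback_depth` is a hypothetical name. It shows how many consecutive ungetc calls a given implementation accepts; the Standard only promises that the first one works.

```c
#include <stdio.h>

/* Hypothetical probe: counts how many characters in a row ungetc accepts
   on `fp`, trying at most `limit`. The pushed-back characters are read
   off again before returning, so the stream ends up where it started. */
int pushback_depth(FILE *fp, int limit)
{
    int depth = 0;

    while (depth < limit && ungetc('a' + depth % 26, fp) != EOF)
        depth++;
    for (int i = 0; i < depth; i++)
        (void)getc(fp);
    return depth;
}
```

A result of 1 and a result of 4096 are both conforming as far as ungetc itself is concerned. The question in this thread is what fscanf is allowed to do internally, which is a separate matter.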

Robert Gamble · Nov 29, 2006

Richard Bos said:
True, but feetneet are not normative.

And neither are the examples for that matter.

Strictly speaking, there's a
conflict between two parts of the Standard; the footnote makes it clear
that in this case, the intent was that the part about a single character
pushback buffer for input streams overrides the part about parsing
numbers, but it would be better if that were made explicit in the
_normative_ text in the next TC.

I certainly agree that it would have been nice if this footnote was
part of the normative text, I don't know why it isn't. The only
conflict I see is the one in the C90 Standard which was addressed in DR
022. Although the footnote is non-normative, it along with the example
and the fact that it was the result of a DR make it abundantly clear
what the intent was. If intent isn't enough though, a careful reading
of the normative changes made in the DR (which were carried through to
C99) yields the same result even if not as clearly spelled out.

Robert Gamble

Richard Heathfield · Nov 29, 2006

Robert Gamble said:

Why what? Why such implementations aren't technically conforming?
Yes.

Because implementations that push back more than one character in the
fscanf family of functions do not behave as mandated by the Standard.

Why not?

I am not sure I understand your point, perhaps you could clarify with a
multi-word response.

<grin> Okay, let me see if I can make it clearer. Maybe you're right that
providing more than the minimum level of pushback is against the rules, and
maybe you're not. I can see why an implementation *must* provide at least
one character of pushback, but where is it *forbidden* from providing more?

Robert Gamble · Nov 29, 2006

Ben said:
C99 says this in the description of the ungetc function:

One character of pushback is guaranteed. If the ungetc
function is called too many times on the same stream without
an intervening read or file positioning operation on that
stream, the operation may fail.

I don't see a requirement that *only* one character of pushback
be supported, only that *at least* one character of pushback be
supported.

I was speaking specifically of the pushback used by the fscanf function
which I thought was clear based on the footnote that I cited. I
certainly did not mean to imply that multi-character pushback was
itself incorrect, just its use in the fscanf function.

On the other hand, perhaps you are talking about the following
text and footnote for the fscanf function; your article seems
ambiguous to me:

An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the
longest sequence of input characters which does not exceed
any specified field width and which is, or is a prefix of, a
matching input sequence.242)

242) fscanf pushes back at most one input character onto the
input stream. Therefore, some sequences that are
acceptable to strtod, strtol, etc., are unacceptable
to fscanf.

Right, I cited this exact footnote at the beginning of my original
article, perhaps you missed it.

Robert Gamble

Robert Gamble · Nov 29, 2006

Richard Heathfield said:
Robert Gamble said:

Why not?

<grin> Okay, let me see if I can make it clearer. Maybe you're right that
providing more than the minimum level of pushback is against the rules, and
maybe you're not. I can see why an implementation *must* provide at least
one character of pushback, but where is it *forbidden* from providing more?

First let me make clear that I am speaking only of the pushback
functionality used within the fscanf function itself, not the pushback
capability of a stream in general (which can provide pushback for as
many characters as it desires), at least one person seems to have been
confused by my original statement. The Standard makes it clear through
the discussed footnote and example that the behavior shall be as if a
maximum of one character of pushback was used within the fscanf
function ("fscanf pushes back at most one input character onto the
input stream"). Although footnotes and examples are non-normative, the
same meaning is supported by the normative changes that were provoked
by DR 022:

In subclause 7.9.6.2, page 135, lines 31-33, change:

"An input item is defined as the longest matching sequence of input
characters, unless that exceeds a specified field width, in which case
it is the initial subsequence of that length in the sequence."

to:

"An input item is defined as the longest sequence of input characters
which does not exceed any specified field width and which is, or is a
prefix of, a matching input sequence."

Robert Gamble

Richard Heathfield · Nov 29, 2006

Robert Gamble said:

The Standard makes it clear through
the discussed footnote and example that the behavior shall be as if a
maximum of one character of pushback was used within the fscanf
function ("fscanf pushes back at most one input character onto the
input stream").

Thank you for clarifying. I know you know that footn...

Although footnotes and examples are non-normative,

....er, quite so.

the
same meaning is supported by the normative changes that were provoked
by DR 022:

I've found DRs 200 through 294. I can't find DR 022.

Robert Gamble · Nov 29, 2006

Richard Heathfield said:
Robert Gamble said:

Thank you for clarifying. I know you know that footn...

...er, quite so.

I've found DRs 200 through 294. I can't find DR 022.

The link was in my original response:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

Robert Gamble

Ben Pfaff · Nov 29, 2006

Robert Gamble said:
On the other hand, perhaps you are talking about the following
text and footnote for the fscanf function; your article seems
ambiguous to me:

[...]

Right, I cited this exact footnote at the beginning of my original
article, perhaps you missed it.

I did miss it, sorry.

Richard Heathfield · Nov 29, 2006

Robert Gamble said:

The link was in my original response:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

My apologies for missing that. It does appear that the text under
consideration is still non-normative. (It's footnote 245 in n1124, for
those who don't know).

Having said that, I accept that the intent of footnotes, despite their
non-normative status, is to clarify the meaning of the Standard, so I'll
shut up now.

(Like I care ***so much*** about fscanf!)

Simon Biber · Nov 30, 2006

Robert said:
Footnote 245 in n1124 states:
"fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc.,
are unacceptable to fscanf."

This was added in response to Defect Report #22:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_022.html.

In the case of 100ergs, fscanf reads up to the r before realizing that
the "e" is not part of the number but at that point, given the one
character pushback limit, it can no longer push back both the r and the
e so it has to return with a failure since 100e is not a valid number.

But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.

That's the real bug, not the quibble on how many characters are pushed back.

Robert Gamble · Nov 30, 2006

Simon said:
But none of the implementations I tested actually return with a failure!

Try it -- whether on Solaris, Linux, Cygwin, DJGPP, Microsoft VC++,
LCC-Win32 or Turbo C, none of them return with a failure. They interpret
100e as a valid number, with the value 100.

That's the real bug, not the quibble on how many characters are pushed back.

There are 2 problems here. Implementations that convert 100 and leave
the "e" on the stream are probably realizing that the "e" is not part
of the number when it reads the "r" and are pushing back too many
characters. Implementations that convert 100 and leave the "r" as the
first character on the stream are incorrectly accepting "100e" as
equivalent to "100e1". glibc is known to accept certain invalid
numeric sequences but they don't seem willing to acknowledge such
problems.

I tested a number of implementations a while ago and had the same
results that you have seen. I believe that at least the Solaris
and glibc folk are aware of this particular issue but they don't seem
to have any plans to change their behavior. I believe that uClibc
(http://uclibc.org/) handled this case correctly, but I'm not positive.

I haven't tried this on Dinkumware as I don't have access to it but if
this was going to be handled correctly on any implementation it would
probably be the Dinkumware C99 library. Their library claims to be
certified by Perennial as C99-compliant and I believe the behavior in
question is tested in the certification process. If anyone has access
to this library it would be nice if they could confirm how it handles
this. Additionally, if it does handle this correctly, I would be
curious to know if the same string is handled the same way with the
sscanf function (I believe it should but some people do not, the
Standard isn't crystal clear in my opinion).

Robert Gamble
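
The sscanf question can be checked without a stream at all. A minimal sketch, with `probe_float` as a hypothetical name: it applies "%f" followed by "%n" so the caller can see both the return value and how many characters the conversion consumed.

```c
#include <stdio.h>

/* Hypothetical probe: applies "%f" to `text` with sscanf and returns
   sscanf's result. On success *consumed is the number of characters read
   by the conversion (recorded by %n); otherwise it is left at -1. */
int probe_float(const char *text, float *value, int *consumed)
{
    *consumed = -1;
    return sscanf(text, "%f%n", value, consumed);
}
```

For "100ergs of energy", a return value of 0 is what the Standard's example expects. A return value of 1 with *consumed equal to 3 means the implementation matched "100" and backed up over the "e"; a return value of 1 with *consumed equal to 4 means it accepted "100e" as a number.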

P.J. Plauger · Nov 30, 2006

Robert Gamble said:
There are 2 problems here. Implementations that convert 100 and leave
the "e" on the stream are probably realizing that the "e" is not part
of the number when it reads the "r" and are pushing back too many
characters. Implementations that convert 100 and leave the "r" as the
first character on the stream are incorrectly accepting "100e" as
equivalent to "100e1". glibc is known to accept certain invalid
numeric sequences but they don't seem willing to acknowledge such
problems.

I tested a number of implementations a while ago and had the same
results that you have seen. I believe that at least the Solaris
and glibc folk are aware of this particular issue but they don't seem
to have any plans to change their behavior. I believe that uClibc
(http://uclibc.org/) handled this case correctly, but I'm not positive.

I haven't tried this on Dinkumware as I don't have access to it but if
this was going to be handled correctly on any implementation it would
probably be the Dinkumware C99 library. Their library claims to be
certified by Perennial as C99-compliant and I believe the behavior in
question is tested in the certification process. If anyone has access
to this library it would be nice if they could confirm how it handles
this. Additionally, if it does handle this correctly, I would be
curious to know if the same string is handled the same way with the
sscanf function (I believe it should but some people do not, the
Standard isn't crystal clear in my opinion).

We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Robert Gamble · Nov 30, 2006

P.J. Plauger said:
We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.

Thanks very much for the input. I sense from you the same sentiment
that I have seen expressed from other implementors, that the one
character max pushback mandate isn't well-received. Although the
Rationale doesn't provide any insight as to why this decision was made
I would assume it would be to support implementations that only provide
a single character pushback while keeping results consistent among
implementations that could provide more. Do you feel that there is a
better way to handle this, has there been any discussion on changing
this behavior in the Standard, and is this a common sentiment in your
experience?

Robert Gamble

Random832 · Nov 30, 2006

Robert Gamble said:
There are 2 problems here. Implementations that convert 100 and leave
the "e" on the stream are probably realizing that the "e" is not part
of the number when it reads the "r" and are pushing back too many
characters. Implementations that convert 100 and leave the "r" as the
first character on the stream are incorrectly accepting "100e" as
equivalent to "100e1".

100e0, actually - which it's arguable* that it in fact is equivalent.

* Arguable. adj. That for which "one would be wrong, but one could argue it."

CBFalconer · Nov 30, 2006

.... snip about parsing "100ergs" as a real ...

P.J. Plauger said:
We do it right (if only to score 100 per cent on the Perennial C99
validation suite), where by "right" I mean what the DR tells us
to do -- consume "100e", fail, and leave "r" in the input stream.
We do the same for both scanf and sscanf, since the code is common.

Which makes sense, especially if you consider the spec as reading
"stop on the first character that cannot describe a real". It also
makes sense if you conceive of an empty field as describing zero.
This more or less agrees with the standard (at least N869):

[#4] If the subject sequence has the expected form for a
floating-point number, the sequence of characters starting
with the first digit or the decimal-point character
(whichever occurs first) is interpreted as a floating
constant according to the rules of 6.4.4.2, except that the
decimal-point character is used in place of a period, and
that if neither an exponent part nor a decimal-point
character appears in a decimal floating point number, or if
a binary exponent part does not appear in a binary floating
point number, an exponent part of the appropriate type with
value zero is assumed to follow the last digit in the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
string. If the subject sequence begins with a minus sign,
the sequence is interpreted as negated.235) A character
sequence INF or INFINITY is interpreted as an infinity, if
representable in the return type, else like a floating
constant that is too large for the range of the return type.
A character sequence NAN or NAN(n-char-sequence-opt), is
interpreted as a quiet NaN, if supported in the return type,
else like a subject sequence part that does not have the
expected form; the meaning of the n-char sequences is
implementation-defined.236) A pointer to the final string
is stored in the object pointed to by endptr, provided that
endptr is not a null pointer.
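
For comparison, strtod has the whole string in front of it and can stop wherever it likes, which is the difference the fscanf footnote is pointing at. A small sketch, with `strtod_rest` as a hypothetical wrapper, shows where strtod stops:

```c
#include <stdlib.h>

/* Hypothetical helper: converts the start of `text` with strtod and
   returns a pointer to the first character strtod did not use. */
const char *strtod_rest(const char *text, double *value)
{
    char *end;

    *value = strtod(text, &end);
    return end;
}
```

Given "100ergs of energy", strtod takes "100" as the subject sequence, so the value is 100 and the returned pointer addresses the "e". fscanf cannot do the same on a stream once it has read past the "e" and may push back only one character.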
