Correct behaviour of scanf and sscanf

Rob Thorpe

Given the code:-

r = sscanf (s, "%lf", x);

What is the correct output if the string s is simply "-" ?

If "-" is considered the beginning of a number that has been cut
short, then the correct result is r = EOF. If it is taken to be a
letter in the stream, then the result should be r = 0, as far as
I can see. My compiler gives EOF.

Does the standard specify which is correct?
 
Mac

Rob Thorpe said:
Given the code:-

r = sscanf (s, "%lf", x);

What is the correct output if the string s is simply "-" ?

If "-" is considered the beginning of a number, that has been
cut-short then the correct output is that r = EOF. If it is taken to
be a letter in the stream, then the output should be r = 0, as far as
I can see. My compiler gives EOF.

Does the standard specify which is correct?

From C-89 (or a reasonable facsimile thereof)

*************beginning of excerpt ***************

4.9.6.6 The sscanf function

[snip]

Returns

The sscanf function returns the value of the macro EOF if an input
failure occurs before any conversion. Otherwise, the sscanf function
returns the number of input items assigned, which can be fewer than
provided for, or even zero, in the event of an early matching failure.

*********** end of excerpt **************


The behavior you describe sounds correct to me. sscanf() is supposed to
return EOF if an input failure occurs. Elsewhere, it says that
encountering the end of the string is identical to encountering the end
of file in a fscanf() call. So this is just like calling fscanf on a text
file which has the single character '-' in it.

HTH

--Mac
 
Eric Sosman

Rob said:
Given the code:-

r = sscanf (s, "%lf", x);

What is the correct output if the string s is simply "-" ?

If "-" is considered the beginning of a number, that has been
cut-short then the correct output is that r = EOF. If it is taken to
be a letter in the stream, then the output should be r = 0, as far as
I can see. My compiler gives EOF.

Does the standard specify which is correct?

Haven't seen a reply in the several hours since I first
saw the message, so (fools rush in ...) I'll hazard a guess.

The Standard speaks of two sorts of failure for *scanf()
directives: "matching failure," which amounts to an input
sequence that doesn't satisfy the syntax required by the
directive, and "input failure," meaning that the source of
input characters dried up -- for the stream-input versions
this means EOF was sensed, and for sscanf() it means the
scan reached the end of the string. On a matching failure,
*scanf() stops operating and returns the number of items
already matched and converted (0, in your example), while
for an input failure *scanf() returns EOF.
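
A minimal sketch of the distinction just described, assuming a hosted
C implementation. The enum and the function name scan_outcome are
hypothetical, introduced only for illustration; the only library call
used is sscanf().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for the three outcomes discussed in this thread. */
enum scan_result {
    SCAN_INPUT_FAILURE,    /* sscanf returned EOF */
    SCAN_MATCHING_FAILURE, /* sscanf returned 0   */
    SCAN_CONVERTED         /* sscanf returned 1   */
};

/* Classify what sscanf(s, "%lf", ...) reports for the string s. */
enum scan_result scan_outcome(const char *s)
{
    double x;
    int r;

    assert(s != NULL);
    r = sscanf(s, "%lf", &x);
    if (r == EOF)
        return SCAN_INPUT_FAILURE;
    if (r == 0)
        return SCAN_MATCHING_FAILURE;
    return SCAN_CONVERTED;
}
```

An empty string should give SCAN_INPUT_FAILURE and a string such as
"x" should give SCAN_MATCHING_FAILURE. The string "-" is the disputed
case and may come back as either one, depending on the library.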

So the question boils down to this: When "%lf" processes
"-", is the failure a matching failure or an input failure?

One point of view considers it a matching failure. The
characters for "%f" are supposed to be something strtod() would
swallow: an optional all-whitespace prefix, an optional sign,
and then a character string resembling a floating-point constant
as written in C source code. The string "-" doesn't match this
description (the floating-point constant is missing), so it could
be called a matching failure.

The other viewpoint holds that no "mismatch" was detected
before end-of-string, so it's an input failure. The sequence
of characters is perfectly good as the prefix of a valid match,
and the only thing preventing a complete match is the fact that
no more input was available. Hence (says this argument), the
operation ends with an input failure rather than a matching
failure, and EOF is the correct return value.

IMHO the Standard is not entirely clear about which argument
is correct: is an incomplete prefix a failure to match, or a
failure of the input source? To me, the language of the Standard
doesn't shine enough light into this dark corner -- but if anyone
happens to have a torch to hand, I'd welcome illumination ...

Trying to put myself in the place of an implementor, I'd
imagine the input failure (EOF) outcome would be "more natural,"
but I don't think the Standard's language actually says so in
so many words.

The fool has rushed in; tread, o ye angels!
 
CBFalconer

Rob said:
Given the code:-

r = sscanf (s, "%lf", x);

What is the correct output if the string s is simply "-" ?

If "-" is considered the beginning of a number, that has been
cut-short then the correct output is that r = EOF. If it is
taken to be a letter in the stream, then the output should be
r = 0, as far as I can see. My compiler gives EOF.

Does the standard specify which is correct?

The problem is that detection of such a lone '-' requires reading
two characters, the second of which is not a digit. C only
guarantees one level of pushback via ungetc, so whatever routine is
doing the parsing (such as scanf) cannot leave the input stream
unaltered and report 'No number available'. With string sources
this obviously does not apply. So the question is "should the
string and stream operations function in the same manner". A
similar (but worse) problem arises after reading the e in floating
point formats. "3.0e-x" should return 3.0 and have to push back
three chars.

I think the proper thing would be to guarantee three level
pushback, maybe in C05. This requires defining what is to be done
when an application attempts excess pushback :)
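
For comparison, a small sketch of the string-source case, where no
pushback limit applies. It uses strtod(), which reports through its
end pointer how much of the string it accepted. The helper name
strtod_consumed is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: store the converted value in *value and return
   how many leading characters of s strtod() accepted as a number. */
size_t strtod_consumed(const char *s, double *value)
{
    char *end;

    assert(s != NULL && value != NULL);
    *value = strtod(s, &end);
    return (size_t)(end - s);
}
```

For "3.0e-x" this should report 3 characters consumed with a value of
3.0, leaving "e-x" unread. For a lone "-" it should report 0
characters, since no conversion can be performed. Whether the *scanf()
family is allowed to back up in the same way is the question under
discussion here.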
 
tigervamp

CBFalconer said:
The problem is that detection of such a lone '-' requires reading
two characters, the second of which is not a digit. C only
guarantees one level of pushback via ungetc,

Footnote 242 in section 7.19.6.2 (fscanf) indicates that a _maximum_ of
one character can be pushed back, and the standard does not say that
sscanf behaves differently.

CBFalconer said:
so whatever routine is
doing the parsing (such as scanf) cannot leave the input stream
unaltered and report 'No number available'. With string sources
this obviously does not apply. So the question is "should the
string and stream operations function in the same manner".

According to the standard they should.

CBFalconer said:
A similar (but worse) problem arises after reading the e in floating
point formats. "3.0e-x" should return 3.0 and have to push back
three chars.

fscanf should consume the "3.0e-x", recognize a matching failure, push
the "x" back onto the stream, and return 0. This is the behavior
defined in example 3 of section 7.19.6.2p20 (fscanf), and again the
standard specifies that sscanf should behave the same.

I think that in the OP's case the behavior should be similar and the
return value should be 0; glibc does this and I think they are right
here. From what I can tell, EOF is never returned if a character was
read (regardless of whether it matched or was pushed back), but I may
well be wrong.

CBFalconer said:
I think the proper thing would be to guarantee three level
pushback, maybe in C05. This requires defining what is to be done
when an application attempts excess pushback :)

I think the current behavior is pretty clear and well-defined, but
notable implementations do not follow it. Solaris and glibc both push
back multiple characters to achieve the output you described above;
details about the Solaris behavior can be found at
http://iforce.sun.com/protected/solaris10/adoptionkit/general/scanf.txt.
Apparently there are instances that require at least 5 characters to be
pushed back to follow the behavior you outlined.
--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Rob Gamble
 
Rob Thorpe

CBFalconer said:
The problem is that detection of such a lone '-' requires reading
two characters, the second of which is not a digit. C only
guarantees one level of pushback via ungetc, so whatever routine is
doing the parsing (such as scanf) cannot leave the input stream
unaltered and report 'No number available'. With string sources
this obviously does not apply. So the question is "should the
string and stream operations function in the same manner". A
similar (but worse) problem arises after reading the e in floating
point formats. "3.0e-x" should return 3.0 and have to push back
three chars.

I think the proper thing would be to guarantee three level
pushback, maybe in C05. This requires defining what is to be done
when an application attempts excess pushback :)

Thanks, that explains it.

I wondered if testing for both 0 and EOF is OTT, but since it works as
you describe, it's necessary in very many situations.
 
Dan Pop

Eric Sosman said:
Haven't seen a reply in the several hours since I first
saw the message, so (fools rush in ...) I'll hazard a guess.

The Standard speaks of two sorts of failure for *scanf()
directives: "matching failure," which amounts to an input
sequence that doesn't satisfy the syntax required by the
directive, and "input failure," meaning that the source of
input characters dried up -- for the stream-input versions
this means EOF was sensed, and for sscanf() it means the
scan reached the end of the string. On a matching failure,
*scanf() stops operating and returns the number of items
already matched and converted (0, in your example), while
for an input failure *scanf() returns EOF.

So the question boils down to this: When "%lf" processes
"-", is the failure a matching failure or an input failure?

One point of view considers it a matching failure. The
characters for "%f" are supposed to be something strtod() would
swallow: an optional all-whitespace prefix, an optional sign,
and then a character string resembling a floating-point constant
as written in C source code. The string "-" doesn't match this
description (the floating-point constant is missing), so it could
be called a matching failure.

The other viewpoint holds that no "mismatch" was detected
before end-of-string, so it's an input failure. The sequence
of characters is perfectly good as the prefix of a valid match,
and the only thing preventing a complete match is the fact that
no more input was available. Hence (says this argument), the
operation ends with an input failure rather than a matching
failure, and EOF is the correct return value.

IMHO the Standard is not entirely clear about which argument
is correct: is an incomplete prefix a failure to match, or a
failure of the input source? To me, the language of the Standard
doesn't shine enough light into this dark corner -- but if anyone
happens to have a torch to hand, I'd welcome illumination ...

An incomplete prefix followed by an end of file condition cannot be a
matching failure like "- " or "-foo": we're clearly in the case where
an input failure occurred before any conversion, just as if the input
were an empty string.

I agree that the text of the standard is less than crystal clear and I
wouldn't be surprised to see different behaviours on different
implementations. OTOH, as an implementation user, especially in the
case of sscanf, I see no problem: if the function doesn't return 1, it is
obvious that the input string doesn't contain a valid number.
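
A minimal sketch of that usage pattern: compare the return value
against 1 and treat everything else (0 or EOF) as "no number". The
helper name parse_double is hypothetical, not something from the
standard library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if sscanf() converted a double from
   the start of s into *out, and 0 otherwise (whether sscanf reported
   a matching failure or an input failure). */
int parse_double(const char *s, double *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "%lf", out) == 1;
}
```

With this check, "-", "-foo" and the empty string are all rejected
regardless of whether the library reports 0 or EOF. Note that it only
tests the start of the string: input such as "2.5abc" is still
accepted, because the conversion succeeds before the trailing
characters are seen.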

Dan
 
