Puzzle!


Richard

Tor Rustad said:
Very strange advice. The problem at hand *might* have specified that the
OP doesn't even need to check for input errors.

And the input stream might be via a pipe with guaranteed input data integrity.
 

Keith Thompson

Tor Rustad said:
Hmm... I can't say I agree with your interpretation, Keith.

How can the %li conversion in scanf overflow a long object?
For %s and %c conversions such an overflow is possible, but I
don't see the same for "numbers".

Easily, if the input string represents a number outside the range of
the target type.
Of course, I assume there is no type mismatch... between conversion
specifiers and object.

#include <stdio.h>
#include <limits.h>
#include <string.h>

int main(void)
{
#define BIG_ENOUGH 100
    char buf[BIG_ENOUGH];
    long n = 0;
    int result;

    sprintf(buf, "%ld", LONG_MAX);
    memset(buf, '9', strlen(buf));

    printf("LONG_MAX = %ld\n", LONG_MAX);
    printf("buf = \"%s\"\n", buf);

    result = sscanf(buf, "%li", &n);
    if (result == 1) {
        printf("n = %ld\n", n);
    }
    else {
        printf("n = %ld (may not be meaningful)\n", n);
    }
    printf("sscanf() returned %d\n", result);

    return 0;
}

I get the following results on various systems (all of which are
valid, since the behavior is undefined):

LONG_MAX = 2147483647
buf = "9999999999"
n = 2147483647
sscanf() returned 1

LONG_MAX = 9223372036854775807
buf = "9999999999999999999"
n = 9223372036854775807
sscanf() returned 1

LONG_MAX = 9223372036854775807
buf = "9999999999999999999"
n = -8446744073709551617
sscanf() returned 1

LONG_MAX = 2147483647
buf = "9999999999"
n = 1410065407
sscanf() returned 1

LONG_MAX = 9223372036854775807
buf = "9999999999999999999"
n = 0 (may not be meaningful)
sscanf() returned -1
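The wrapped values in the third and fourth runs are exactly what you get by keeping the low 64 or 32 bits of the decimal value and reading them as a two's complement number. The sketch below redoes that arithmetic with unsigned types, so it is well defined; the function names are hypothetical, and it only illustrates the arithmetic, not what those libraries actually do internally (the signed overflow itself is undefined, so no portable program can reproduce it).

```c
/* Read the low 32 bits of v as a two's complement 32-bit value.
   Uses explicit arithmetic, so no implementation-defined
   unsigned-to-signed conversion is involved. */
long long wrap_to_int32(unsigned long long v)
{
    unsigned long long low = v & 0xFFFFFFFFULL;   /* keep 32 bits */

    if (low > 0x7FFFFFFFULL)                      /* sign bit set */
        return (long long)low - 0x100000000LL;
    return (long long)low;
}

/* Read the low 64 bits of v as a two's complement 64-bit value. */
long long wrap_to_int64(unsigned long long v)
{
    v &= 0xFFFFFFFFFFFFFFFFULL;                   /* keep 64 bits */

    if (v > 0x7FFFFFFFFFFFFFFFULL)                /* sign bit set */
        return -(long long)(0xFFFFFFFFFFFFFFFFULL - v) - 1;
    return (long long)v;
}
```

For instance, wrap_to_int32(9999999999ULL) gives 1410065407, as in the fourth run, and wrap_to_int64(9999999999999999999ULL) gives -8446744073709551617, as in the third.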
 

Chris Torek

But how can I avoid [various problems with scanf]?

A little too extreme...

Not in general. Maybe in some specific, well-controlled cases. :)
For example:

do {
int k = scanf("%d", &n);
switch (k) {
[snippage]

As someone (I think Keith Thompson) pointed out, this can misbehave
(formally, has "undefined behavior") if someone enters an overlarge
number. Your second variant (using %ld and a "long") still suffers
from this problem. (In practice, real implementations almost always
do at least one of these things: use strtol() internally, thus
clamping out of range inputs; use atoi() or atol() internally,
often behaving annoyingly but predictably by "wrapping" out of
range inputs based on the machine's internal representation; or
discover -- in some cases based solely on input length, and thus
sometimes having trouble with leading zeros -- that the value would
have been out of range, and stop the scanf engine with a matching
failure.)

More practically, this code is rather "user-unfriendly" when the
scanf() occurs after a recent prompt:

printf("enter a number: ");
fflush(stdout);
/* your loop using scanf() here */

If the user types nothing but a carriage return, the computer
simply sits there, having accepted the input line, waiting for
more input -- but without producing any diagnostic output like:
puts("Please enter an integer");

or repeating the prompt or anything. The scanf() call is still
running, waiting for more input, since "%d" skips leading white
space, and an empty line is "white space".

The rule that beginners should never use scanf() is, I think, a
good one. (To gain experience with the scanf family's behavior,
one can get input lines into buffers and apply sscanf(), then
inspect the subsequent wreckage. The buffers preserve the original
input and help insulate the program's "weird internal behavior due
to scanf" from its "obvious and immediate external I/O behavior".
In other words, the buffers ... well, they *buffer*:

buffer ... [as transitive verb] ... 12. to cushion, shield,
or protect. 13. to lessen the adverse effect of ...
-- from <http://dictionary.reference.com/browse/buffer>

The scanf() function is a lot like a field of cactus in bloom:
pretty, but it is dangerous to get too close. :) )
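Following the buffering advice above, here is a minimal sketch that reads whole lines with fgets, so that an empty line can be noticed and the prompt repeated instead of leaving the user staring at a silent program. The function name, the hint text, and the two-stream signature are illustrative assumptions; converting the number out of the buffer is a separate step.

```c
#include <ctype.h>
#include <stdio.h>

/* Return 1 if s holds nothing but white space (e.g. just "\n"). */
static int is_blank(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return *s == '\0';
}

/* Write prompt to out, then read one line from in into buf.  While the
   line is blank, print a hint and prompt again instead of waiting
   silently.  Returns 1 when a non-blank line is in buf, 0 on end of
   file or a read error.  A line longer than size - 1 characters is
   returned in pieces by fgets; real code would have to handle that. */
int read_nonblank_line(FILE *in, FILE *out, const char *prompt,
                       char *buf, int size)
{
    for (;;) {
        fputs(prompt, out);
        fflush(out);            /* make sure the prompt is visible */
        if (fgets(buf, size, in) == NULL)
            return 0;
        if (!is_blank(buf))
            return 1;
        fputs("Please enter an integer\n", out);
    }
}
```

A caller would then convert buf with sscanf or strtol; if that conversion fails, the original text is still in buf and can be echoed back in an error message.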
 

Tor Rustad

Keith said:
Easily, if the input string represents a number outside the range of
the target type.

Thanks for your detailed answer, Keith. Do you perhaps also have a link
to a comp.std.c thread where this topic has been discussed?

When I read the C standard, I expected numerical overflow/underflow to
be a matching failure. If underflow or overflow is *not* *inappropriate
input*, what do we call such input then???

Indeed, TR 24731 clearly defines this as a matching failure too, but the
scanf_s() family of functions is just an extension.

If an implementation still accepts the input item, it *should* IMO give it
the value LONG_MAX/LONG_MIN, so that the caller can detect the error condition.

When the "numerical" directive is processed, the valid range of the
numerical input field is known a priori to the implementation, so IMO
there is no acceptable reason for implementations (or the standard) to put
this into UB land.

C89 also states:

<quote>
The following conversion specifiers are valid:

d Matches an optionally signed decimal integer, whose format is the
same as expected for the subject sequence of the strtol function with
the value 10 for the base argument. The corresponding argument shall
be a pointer to integer.
</quote>

and since the strtol() function very much detects underflow/overflow, I guess
people cleverer than me have been fooled by this too (e.g. I can't
recall Les Hatton listing this case in his UB list in "Safer C").


I get the following results on various systems (all of which are
valid, since the behavior is undefined):

LONG_MAX = 2147483647
buf = "9999999999"
n = 2147483647
sscanf() returned 1

If 1 is returned, overflow (or underflow) can still be detected by checking
for LONG_MAX (or LONG_MIN).
LONG_MAX = 9223372036854775807
buf = "9999999999999999999"
n = 9223372036854775807
sscanf() returned 1

Also, ok.
LONG_MAX = 9223372036854775807
buf = "9999999999999999999"
n = -8446744073709551617
sscanf() returned 1

Which C implementation is this?
LONG_MAX = 2147483647
buf = "9999999999"
n = 1410065407
sscanf() returned 1

Which C implementation is this?
LONG_MAX = 9223372036854775807
buf = "9999999999999999999"
n = 0 (may not be meaningful)
sscanf() returned -1

Did scanf return EOF? It is not an input failure in my view.
 

Army1987

Keith Thompson said:
Army1987 said:
"Mark McIntyre" wrote in message:
Don't use scanf.

A little too extreme...
For example:

do {
int k = scanf("%d", &n); [snip]

Or, to avoid strange problems if the number entered is too large:

#include <limits.h>
#include <errno.h>
do {
long tmp;
int k = scanf("%ld", &tmp);
[snip]

C99 7.19.6.2p10:

If this object does not have an appropriate type, or if the result
of the conversion cannot be represented in the object, the
behavior is undefined.

Yes, but, two paragraphs below:
d Matches an optionally signed decimal integer, whose format is the same as
expected for the subject sequence of the strtol function with the value 10
for the base argument. The corresponding argument shall be a pointer to
signed integer.

7.20.1.4p8:
The strtol, strtoll, strtoul, and strtoull functions return the converted
value, if any. If no conversion could be performed, zero is returned. If the
correct value is outside the range of representable values, LONG_MIN,
LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned
(according to the return type and sign of the value, if any), and the value
of the macro ERANGE is stored in errno.

So apparently the condition you're mentioning can only happen with
int or narrower types. If my interpretation of the Standard is
correct, with long the worst that can happen is that tmp becomes
LONG_MAX or LONG_MIN and errno becomes ERANGE. (But I would not be
that sure, since "whose format is the same as expected for the
subject sequence of the strtol function with the value 10 for the
base argument" doesn't clearly state that the result will be the
same, too. But then, there is nothing to guarantee that the
sequence "12345" isn't converted to the int -23.)
 

Army1987

Chris Torek said:
More practically, this code is rather "user-unfriendly" when the
scanf() occurs after a recent prompt:

printf("enter a number: ");
fflush(stdout);
/* your loop using scanf() here */

If the user types nothing but a carriage return, the computer
simply sits there, having accepted the input line, waiting for
more input -- but without producing any diagnostic output like:


or repeating the prompt or anything. The scanf() call is still
running, waiting for more input, since "%d" skips leading white
space, and an empty line is "white space".
At which point the user simply enters the number and hits
enter again, and the program goes on correctly. Unless the user thinks
that simply hitting enter means the program should read their
mind.
 

Keith Thompson

Tor Rustad said:
Thanks for your detailed answer, Keith. Do you perhaps also have a link
to a comp.std.c thread where this topic has been discussed?

No, I don't recall it being discussed there (which is no indication
that it hasn't been discussed).
When I read the C standard, I expected numerical overflow/underflow to
be a matching failure. If underflow or overflow is *not*
*inappropriate input*, what do we call such input then???

Really? C99 7.19.6.2 seems very clear to me:

... the input item ... is converted to a type appropriate to the
conversion specifier. ... if the result of the conversion cannot
be represented in the object, the behavior is undefined.

Note that it doesn't specify how the conversion is performed.

[...]
If an implementation still accepts the input item, it *should* IMO give
it the value LONG_MAX/LONG_MIN, so that the caller can detect the error
condition.

How could the caller distinguish between an overflow and a valid input
that happens to yield the value LONG_MAX or LONG_MIN?
When the "numerical" directive is processed, the valid range of the
numerical input field is known a priori to the implementation, so IMO
there is no acceptable reason for implementations (or the standard) to put
this into UB land.

I agree, but the standard permits it.

A naive implementation could perform the conversion (e.g., repeatedly
multiplying by 10 and adding the value of the next digit) without
concern for numeric overflow. If a numeric overflow occurs (for a
signed or floating-point type), the behavior is undefined. I don't
think the standard *should* permit such an implementation, but it seems
clear that it does.
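The naive multiply-by-10 loop described above can be made safe by checking, before each step, whether `value * 10 + digit` would exceed the limit. A minimal sketch of that check, restricted for brevity to strings of decimal digits (no sign, no white space) and using a hypothetical function name; it shows the technique, not how any real *scanf is implemented:

```c
#include <limits.h>

/* Accumulate the decimal digits of s into *out without ever performing
   an overflowing operation.  Returns 1 on success, 0 if s is empty or
   contains a non-digit, and -1 if the value would exceed LONG_MAX (in
   which case *out is left untouched). */
int accumulate_digits(const char *s, long *out)
{
    long value = 0;

    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        int digit;

        if (*s < '0' || *s > '9')
            return 0;
        digit = *s - '0';       /* '0'..'9' are contiguous in C */
        /* value * 10 + digit > LONG_MAX  <=>  value > (LONG_MAX - digit) / 10 */
        if (value > (LONG_MAX - digit) / 10)
            return -1;
        value = value * 10 + digit;
    }
    *out = value;
    return 1;
}
```

The test is done before the multiplication, because once `value * 10` has overflowed it is already too late: the behavior is undefined and no later check can be trusted.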
C89 also states:

<quote>
The following conversion specifiers are valid:

d Matches an optionally signed decimal integer, whose format is the
same as expected for the subject sequence of the strtol function with
the value 10 for the base argument. The corresponding argument shall
be a pointer to integer.
</quote>

and since the strtol() function very much detects underflow/overflow, I
guess people cleverer than me have been fooled by this too (e.g. I
can't recall Les Hatton listing this case in his UB list in "Safer
C").

I note that the C99 standard says (7.19.6.2p12) that "%d"

Matches an optionally signed decimal integer, whose format is the
same as expected for the subject sequence of the strtol function
with the value 10 for the base argument. The corresponding
argument shall be a pointer to signed integer.

This merely describes the *format* of the input, not the manner in
which the conversion is performed. In particular, note that strtol()
would not work for "%lld" (type long long), but the format is the
same. I think the reference to strtol is merely intended to avoid
repeating the description of the format.
If 1 is returned, overflow (or underflow) can still be detected by
checking for LONG_MAX (or LONG_MIN).

No, it doesn't distinguish between an overflow and an actual input of
"2147483647".
Also, ok.

See above.
Which C implementation is this?

I got that on Solaris and AIX in 64-bit mode.
Which C implementation is this?

Solaris and AIX, both in 32-bit mode.
Did scanf return EOF? It is not an input failure in my view.

That's from an Alpha OSF1 system. Yes, scanf() returned EOF (scanf()
and sscanf() behave the same way, and EOF == -1).

I think that treating it as an input failure is the most sensible
solution. It would be nice if there were a way to distinguish
different kinds of input failure (numeric overflow or underflow vs.
something that's not in the right format), but that would complicate
the interface.
 

Keith Thompson

Army1987 said:
Keith Thompson said:
Army1987 said:
"Mark McIntyre" wrote in message
news:... [...]
Don't use scanf.

A little too extreme...
For example:

do {
int k = scanf("%d", &n); [snip]

Or to avoid strange problem if the number input is too large:

#include <limits.h>
#include <errno.h>
do {
long tmp;
int k = scanf("%ld", &tmp);
[snip]

C99 7.19.6.2p10:

If this object does not have an appropriate type, or if the result
of the conversion cannot be represented in the object, the
behavior is undefined.

Yes, but, two paragraphs below:
d Matches an optionally signed decimal integer, whose format is the same as
expected for the subject sequence of the strtol function with the value 10
for the base argument. The corresponding argument shall be a pointer to
signed integer.

Yes, it says the *format* is the same as expected by the strtol
function. It doesn't say that it uses the strtol function or
equivalent to do the conversion.

As I mentioned elsethread, strtol only handles type long; "%lld" would
require using strtoll. The strol function is mentioned only to
specify the format, not the algorithm. Likewise, the description of
%a, %e, %f, and %g mentions strtod, which doesn't handle long double.
If the standard intends to specify how the conversions are performed,
it doesn't do a very good job of it.
7.20.1.4p8:
The strtol, strtoll, strtoul, and strtoull functions return the converted
value, if any. If no conversion could be performed, zero is returned. If the
correct value is outside the range of representable values, LONG_MIN,
LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned
(according to the return type and sign of the value, if any), and the value
of the macro ERANGE is stored in errno.

So apparently the condition you're mentioning can only happen with
int or narrower types. If my interpretation of the Standard is
correct, with long the worst that can happen is that tmp becomes
LONG_MAX or LONG_MIN and errno becomes ERANGE. (But I would not be
that sure, since "whose format is the same as expected for the
subject sequence of the strtol function with the value 10 for the
base argument" doesn't clearly state that the result will be the
same, too. But then, there is nothing to guarantee that the
sequence "12345" isn't converted to the int -23.)

I don't agree with your interpretation. As far as I can tell, the
standard doesn't say *how* the conversion is performed. And the
statement of undefined behavior in C99 7.19.6.2p10 seems clear and
unambiguous.
 

Tak

ACM International Collegiate Programming Contest.
<http://icpc.baylor.edu/icpc/>

I have no idea why this implies that one can't use fgets and fgetc.

I don't know. When I use fgets or fgetc, the system returns "Wrong
Answer". Maybe it's because the test data is not stored in a file, so we
can't use fgets or fgetc to get the data?
 

Tor Rustad

Keith said:
[...]
When I read the C standard, I expected numerical overflow/underflow to
be a matching failure, if *not* underflow or overflow is
*inappropriate input*, what do we call such input then???

Really? C99 7.19.6.2 seems very clear to me:

I wasn't really arguing my case here; I had accepted that your
interpretation was correct after your previous post. However,

How could the caller distinguish between an overflow and a valid input
that happens to yield the value LONG_MAX or LONG_MIN?

By defining the valid range of input item 'x' as e.g. LONG_MIN < x <
LONG_MAX, which is no great loss in practice.

The alternative is to specify a field width, as we do for %s, which is
not exactly an attractive solution.
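The field-width alternative mentioned here does work for numbers, because a maximum field width bounds how many characters are converted. Since LONG_MAX is at least 2147483647, any input item of at most 9 characters (the sign counts toward the width) fits in a long, so "%9ld" cannot overflow. The price is that longer numbers have to be rejected. A sketch, with a hypothetical helper name:

```c
#include <ctype.h>
#include <stdio.h>

/* Scan a long from s with a field width of 9, so the converted value is
   at most 999999999 in magnitude and always fits in a long.  Returns 1
   on success, 0 if there is no number, if other characters follow it,
   or if the number was longer than 9 characters. */
int scan_bounded_long(const char *s, long *out)
{
    long v;
    int used = 0;               /* characters consumed, set by %n */

    if (sscanf(s, "%9ld%n", &v, &used) != 1)
        return 0;
    s += used;
    while (isspace((unsigned char)*s))
        s++;
    if (*s != '\0')
        return 0;               /* trailing junk, or a 10th digit */
    *out = v;
    return 1;
}
```

So the approach is safe, but as Tor says it is not attractive: valid numbers above 999999999 are refused even on systems where long could hold them.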
I got that on Solaris and AIX in 64-bit mode.

I was surprised it was Solaris and AIX.
 

Tor Rustad

Tor Rustad wrote:

[Sorry, misclick and prev post slipped away, before I was finished
editing my reply]

C89 wording was:

<quote>
The fscanf function executes each directive of the format in turn.
If a directive fails, as detailed below, the fscanf function returns.
Failures are described as input failures (due to the unavailability of
input characters), or matching failures (due to inappropriate input).
[...]
An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the longest
sequence of input characters (up to any specified maximum field width)
which is an initial subsequence of a matching sequence. The first
character, if any, after the input item remains unread. If the length
of the input item is zero, the execution of the directive fails: this
condition is a matching failure, unless an error prevented input from
the stream, in which case it is an input failure.
</quote>

Hence, an input failure is due to the "unavailability of input characters",
which isn't exactly what a numerical overflow/underflow is in this context,
quite the opposite...
 

Keith Thompson

Tak said:
I don't know. When I use fgets or fgetc, the system returns "Wrong
Answer". Maybe it's because the test data is not stored in a file, so we
can't use fgets or fgetc to get the data?

You wrote:

I solve problems from ACM-ICPC, so fgets fgetc are not useful.

which implies that you can't use fgets and fgetc *because* you're
solving problems from ACM-ICPC. Now it looks like that has nothing to
do with your problem.

Saying that you get a "Wrong Answer" doesn't tell us anything. If
you'll show us some sample code, perhaps we can help you figure out
why fgets and fgetc aren't working. There's no reason they shouldn't
work if you use them properly.
 

Keith Thompson

Tor Rustad said:
By defining the valid range of input item 'x' as e.g. LONG_MIN < x <
LONG_MAX, which is no great loss in practice.

I disagree. Reserving a value as an error indicator, such as EOF for
fgetc or NULL for malloc, is sensible when that value can't also be a
valid result. Reserving a value that could be valid limits the
usefulness of the interface. Whether this is a significant loss
depends on the application.
The alternative is to specify a field width, as we do for %s, which is
not exactly an attractive solution.

IMHO, a better alternative given the current interface is to treat
overflow as a matching error. It would be even better to be able to
distinguish between overflow and non-numeric input, but IMHO that's
not as important. The *scanf functions can't be expected to provide a
universal solution for parsing arbitrary input; if you need something
more sophisticated, you can write it yourself.
 

Keith Thompson

Keith Thompson said:
No, I don't recall it being discussed there (which is no indication
that it hasn't been discussed).

I started a thread on comp.std.c, subject
Undefined behavior for *scanf with "%d"
 

Eric Sosman

Keith Thompson wrote on 06/04/07 04:28:
You wrote:

I solve problems from ACM-ICPC, so fgets fgetc are not useful.

which implies that you can't use fgets and fgetc *because* you're
solving problems from ACM-ICPC. Now it looks like that has nothing to
do with your problem.

Saying that you get a "Wrong Answer" doesn't tell us anything. If
you'll show us some sample code, perhaps we can help you figure out
why fgets and fgetc aren't working. There's no reason they shouldn't
work if you use them properly.

Keep in mind that the O.P. is engaged in a programming
contest. Try to offer help in a form that will require his
entry to involve a substantial portion of his own work.

(He's up-front about the matter, not like those scum
who want their homework done for them. As such, I feel
he deserves help, not abuse -- but let's remain aware of
the context, shall we?)
 

Walter Roberson

I don't know. When I use fgets or fgetc, the system returns "Wrong
Answer". Maybe it's because the test data is not stored in a file, so we
can't use fgets or fgetc to get the data?

Your response hints to me that you did not use fgets() or fgetc()
properly. The standard C library does not operate on "files", it
operates on "streams". It is entirely possible with C that one
run of a program will have a stream attached to a file and the
next run with the stream attached to a terminal, without any changes
in the code.

The key to using fgets() or fgetc() with the standard input stream
is to pass stdin as the FILE* argument, such as

if ( fgets(mybuffer, sizeof mybuffer, stdin) != NULL ) { ... }
 

pete

Walter said:
Your response hints to me that you did not use fgets() or fgetc()
properly.

There's always a way to use fgetc.


N869
7.19.3 Files
[#11]
The byte input functions
read characters from the stream as if by successive calls to
the fgetc function.
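As pete says, fgetc is always available, and the other byte input functions are specified in terms of it. A minimal sketch of reading one line from a stream (stdin in an actual contest program) with nothing but fgetc; the function name and the choice to discard the rest of an overlong line are assumptions:

```c
#include <stdio.h>

/* Read one line from in using only fgetc.  Up to size - 1 characters
   are stored in buf, followed by '\0'; the newline is not stored, and
   characters that do not fit are read and discarded.  Returns the
   number of characters stored, or -1 if end of file (or an error) is
   hit before any character of a new line is read.  Assumes size >= 1. */
long read_line_fgetc(FILE *in, char *buf, size_t size)
{
    size_t n = 0;               /* characters stored in buf */
    size_t seen = 0;            /* characters read from this line */
    int c;                      /* int, not char, so EOF stays distinct */

    while ((c = fgetc(in)) != EOF && c != '\n') {
        seen++;
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    if (c == EOF && seen == 0)
        return -1;
    return (long)n;
}
```

A call such as read_line_fgetc(stdin, line, sizeof line) works the same whether stdin is attached to a file, a pipe, or a terminal, which is the point Walter makes about streams.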
 

Tor Rustad

Keith said:
I disagree. Reserving a value as an error indicator, such as EOF for
fgetc or NULL for malloc, is sensible when that value can't also be a
valid result. Reserving a value that could be valid limits the
usefulness of the interface. Whether this is a significant loss
depends on the application.

My point is that this is better than having UB, so if an implementation
"must" accept the input item, then the first two of the implementations
you tested did the best thing in my view (i.e. Solaris and AIX did the
worst thing).

IMHO, a better alternative given the current interface is to treat
overflow as a matching error. It would be even better to be able to
distinguish between overflow and non-numeric input, but IMHO that's
not as important. The *scanf functions can't be expected to provide a
universal solution for parsing arbitrary input; if you need something
more sophisticated, you can write it yourself.

We do agree on this, and this is what I believed the intent of the
C89 standard was. After a matching error has been detected, errno could
be set to indicate the reason for the failure.

PS. I did check "The Single UNIX ® Specification, Version 2" to see whether
the overflow/underflow case was addressed there, without any success.
 

Tor Rustad

Keith said:
I started a thread on comp.std.c, subject
Undefined behavior for *scanf with "%d"

Excellent! Perhaps P.J. Plauger will comment on whether TR 24731 (Part
I: Bounds-checking interfaces) intentionally identified this as a
matching failure.
 

Tor Rustad

Tor said:
Keith said:
Keith Thompson said:
[...]
Thanks for your detailed answer Keith, do you perhaps also have a link
to comp.std.c, where this topic has been discussed?
No, I don't recall it being discussed there (which is no indication
that it hasn't been discussed).

I started a thread on comp.std.c, subject
Undefined behavior for *scanf with "%d"

Excellent! Perhaps P.J. Plauger will comment on whether TR 24731 (Part
I: Bounds-checking interfaces) intentionally identified this as a
matching failure.

I checked the latest draft of TR 24731, and it is clear that it doesn't
include numerical overflow/underflow as a matching failure after all.

<quote>
fscanf_s function is equivalent to fscanf except that the c, s and [
conversion...
</quote>

Sorry!
 
