scanf pushes back two chars?

Steve Kobes

I ran the following program on gcc:

#include <stdio.h>

int main(void)
{
    unsigned int i;
    char c;

    if (scanf("%x", &i) == 1)
        printf("i = %u\n", i);

    if (scanf("%c", &c) == 1)
        printf("c = %c\n", c);

    return 0;
}

I gave it the following input:

0xg

I expected the first scanf call to fail, since 'g' is not a hex digit
and "0x" is not a hex number. The 'g' would then be pushed back onto
the input stream, and the only output would be:

c = g

Instead, I got the following output:

i = 0
c = x

Is this output correct? It seems like the first scanf call must have
pushed *two* characters ('x' and 'g') back onto the input stream. Was
it supposed to?

--Steve
 
Jack Klein

Steve said:
[snip program, input, and output]

Is this output correct? It seems like the first scanf call must have
pushed *two* characters ('x' and 'g') back onto the input stream. Was
it supposed to?

--Steve

What makes you think it pushed back two characters? There are several
other possible explanations.

There have been implementations of scanf() that do not handle a "0x"
prefix properly, yours could be one. Have you verified the results
when you enter "0xfg"?

In any case, you have a defect in your library's implementation of the
function, which is a library problem, not a gcc problem. I suggest
you contact the source of the library.
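
One way to run the check suggested above is a small probe. This is only a sketch: probe_hex is a hypothetical helper name, and it uses sscanf on a string instead of scanf on stdin, on the assumption that your library uses the same conversion logic for both.

```c
#include <stdio.h>

/* Hypothetical helper, not from the thread: apply "%x%c" to a string
 * and report what the library converted. Returns sscanf's result,
 * i.e. the number of successful conversions. */
int probe_hex(const char *input, unsigned int *value, char *next)
{
    *value = 0;
    *next = '\0';
    return sscanf(input, "%x%c", value, next);
}
```

For "0xfg" a conforming library should report two conversions, the value 15, and 'g' as the next character. What it reports for "0xg" is exactly the point in dispute in this thread, so compare that result across libraries.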
 
Martin Ambuhl

Steve said:
[snip program and input]

I expected the first scanf call to fail, since 'g' is not a hex digit
and "0x" is not a hex number. The 'g' would then be pushed back onto
the input stream, and the only output would be:

c = g

Instead, I got the following output:

i = 0
c = x


To add to your puzzlement, with gcc 3.41 I get
i = 0
c = g

Your result seems right and mine wrong. The specification for
strtoul reads:

If the value of base is 16, the characters 0x or 0X may optionally
precede the sequence of letters and digits, following the sign if
present.

4 The subject sequence is defined as the longest initial subsequence
of the input string, starting with the first non-white-space
character, that is of the expected form. The subject sequence contains
no characters if the input string is empty or consists entirely of
white space, or if the first non-white-space character is other than a
sign or a permissible letter or digit.

The longest subsequence of the expected form is '0', with the next tail
beginning with 'x'. Note that the '0x' is optional, so '0' *is* of the
expected form.
 
Chris Torek

Martin said:
... the specification for strtoul reads

If the value of base is 16, the characters 0x or 0X may optionally
precede the sequence of letters and digits, following the sign if
present.

4 The subject sequence is defined as the longest initial subsequence
of the input string, starting with the first non-white-space
character, that is of the expected form. The subject sequence contains
no characters if the input string is empty or consists entirely of
white space, or if the first non-white-space character is other than a
sign or a permissible letter or digit.

The longest subsequence of the expected form is '0', with the next tail
beginning with 'x'. Note that the '0x' is optional, so '0' *is* of the
expected form.

The scanf() specification in the ANSI/ISO C standard -- at least the
original C89 one -- is confusing.

When I wrote the scanf engine for 4.4BSD, I decided that input of
the form:

4.123e+oops

would match %f (and similar) specifications through the "4.123"
but not the "e+oops" part, and thus leave "e+oops" unread in the
stream, available for subsequent input operations (scanf %c
conversions, getc(), etc.).

Since then, I have been told that scanf is supposed to be highly
destructive and discard some or all of the unusable "e+oops" because
it *begins* with a legitimate sequence, but is not actually one.
(Compare 4.123e+oops with 4.123e+17andtrailingstuff, for instance.)

A few examples of just what is to be matched, what is to be discarded,
and what is to be left behind in the input stream, on various inputs
for various formats, would help implementors a great deal, I think.
(It certainly would have helped me.)
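
To make the "destructive" reading concrete, here is a rough sketch of it for decimal floating-point input. It assumes the interpretation argued later in this thread, namely that the input item is the longest prefix of some valid number. float_item_len is a hypothetical name, and the sketch ignores field widths, leading white space, hex floats, and inf/nan.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the longest prefix of s that is, or
 * could be extended into, a decimal floating-point number. *complete
 * is set to 1 if that prefix is itself a valid number, else to 0. */
size_t float_item_len(const char *s, int *complete)
{
    size_t i = 0, mant = 0, expd = 0;

    if (s[i] == '+' || s[i] == '-')
        i++;
    while (isdigit((unsigned char)s[i])) { i++; mant++; }
    if (s[i] == '.') {
        i++;
        while (isdigit((unsigned char)s[i])) { i++; mant++; }
    }
    if (mant == 0) {            /* no mantissa digits: nothing valid yet */
        *complete = 0;
        return i;
    }
    if (s[i] == 'e' || s[i] == 'E') {
        i++;                    /* "4.123e" could still become a number */
        if (s[i] == '+' || s[i] == '-')
            i++;
        while (isdigit((unsigned char)s[i])) { i++; expd++; }
        *complete = expd > 0;   /* an exponent needs at least one digit */
        return i;
    }
    *complete = 1;
    return i;
}
```

Under this reading, "4.123e+oops" gives an item of 7 characters ("4.123e+") that is not a complete number, while "4.123e+17andtrailingstuff" gives 9 characters that are.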
 
CBFalconer

Chris said:
.... snip ...

Since then, I have been told that scanf is supposed to be highly
destructive and discard some or all of the unusable "e+oops" because
it *begins* with a legitimate sequence, but is not actually one.
(Compare 4.123e+oops with 4.123e+17andtrailingstuff, for instance.)

A few examples of just what is to be matched, what is to be discarded,
and what is to be left behind in the input stream, on various inputs
for various formats, would help implementors a great deal, I think.
(It certainly would have helped me.)

Most, if not all, of these anomalies could be handled in one swell
foop by increasing the pushback limit of ungetc to 3.
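
The standard guarantees only one character of pushback for ungetc, so a scanner that wants more has to keep its own buffer. A minimal sketch of that idea follows, reading from a string to keep it self-contained. struct pb_reader, pb_getc, and pb_ungetc are hypothetical names, not library functions, and the limit of 3 is simply the figure suggested above.

```c
#include <stddef.h>
#include <stdio.h>   /* for EOF */

#define PB_MAX 3     /* pushback depth; the figure suggested above */

/* Hypothetical reader with its own small pushback stack. */
struct pb_reader {
    const char *src;      /* input text, NUL-terminated */
    size_t      pos;      /* next unread position in src */
    int         buf[PB_MAX];
    int         n;        /* number of characters pushed back */
};

/* Return the most recently pushed-back character, else the next
 * character of the source, else EOF at the end of the text. */
int pb_getc(struct pb_reader *r)
{
    if (r->n > 0)
        return r->buf[--r->n];
    if (r->src[r->pos] == '\0')
        return EOF;
    return (unsigned char)r->src[r->pos++];
}

/* Push c back; fails with EOF once PB_MAX characters are pending. */
int pb_ungetc(int c, struct pb_reader *r)
{
    if (c == EOF || r->n == PB_MAX)
        return EOF;
    r->buf[r->n++] = c;
    return c;
}
```

With this, a scanner that has read "4.123e+o" can push back 'o', '+', and 'e' in that order and leave "e+oops" for the next reader. The same layering could sit on top of getc for a real stream.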
 
Old Wolf

Martin Ambuhl said:
To add to your puzzlement, with gcc 3.41 I get
i = 0
c = g

Your result seems right and mine wrong.

Isn't this a feature of glibc rather than gcc?
 
Steve Kobes

Martin said:
Steve said:
I expected the first scanf call to fail, since 'g' is not
a hex digit and "0x" is not a hex number. The 'g' would
then be pushed back onto the input stream, and the only
output would be:

c = g

Instead, I got the following output:

i = 0
c = x

To add to your puzzlement, with gcc 3.41 I get
i = 0
c = g

Your result seems right and mine wrong.
[snip strtoul spec]

In scanf,

"An input item is defined as the longest sequence of input characters
(up to any specified maximum field width) which is an initial
subsequence of a matching sequence." (C89)

"An input item is defined as the longest sequence of input characters
which does not exceed any specified field width and which is, or is a
prefix of, a matching input sequence." (C99, AFAICT trying to say the
same thing more clearly)

And in the next paragraph:

"If the input item is not a matching sequence, the execution of the
directive fails: this condition is a matching failure." (C89 and C99)

This sounds pretty clear to me: given 0xg with %x, "0x", not "0", is
the input item, and it is a matching failure because "0x" is not a
matching input sequence. And there's no reason to think that the 'x'
should be pushed back onto the input stream.

Same with 4.123e+oops and %f... the input item is "4.123e+" (the
longest sequence that COULD begin a match), all of which is read and
discarded.

This means GCC gets it wrong for both Martin and me (in two different
ways, which I don't understand since I'm also using 3.4.1). Visual
C++ 6 seems to do it correctly.
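
To spell out my reading of the rule for %x, here is a rough sketch. hex_item_len is a hypothetical name; it ignores field widths and assumes leading white space has already been skipped.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the input item for %x, taken as the
 * longest prefix of s that is, or is a prefix of, a matching sequence
 * (optional sign, optional 0x/0X, one or more hex digits). *complete
 * is 1 if the item is itself a matching sequence, else 0. */
size_t hex_item_len(const char *s, int *complete)
{
    size_t i = 0, digits = 0;

    if (s[i] == '+' || s[i] == '-')
        i++;
    /* "0x" belongs to the item even if no digit follows, because it
     * is a prefix of a matching sequence such as "0x1". */
    if (s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X'))
        i += 2;
    while (isxdigit((unsigned char)s[i])) { i++; digits++; }

    *complete = digits > 0;
    return i;
}
```

For "0xg" this gives an item of length 2 that is not a match, so the directive fails with the stream positioned at 'g'. For "0xfg" the item is "0xf" and the conversion succeeds.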
 
Keith Thompson

Old Wolf said:
Isn't this a feature of glibc rather than gcc?

Yes. More precisely, it's a feature of whatever C library you happen
to be using. (gcc is commonly used with libraries other than glibc.)

<OT>
There's no gcc 3.41, at least not yet; the latest release is 3.4.1.
</OT>
 
