scanf()

  • Thread starter Edward Rutherford

Edward Rutherford

Hello

If scanf() fails for a matching error, what can we say about the location
of the file pointer for subsequent reads from stdin? Could it have read
on through an indeterminate amount of stdin, or will it always be
positioned immediately after the last successful conversion from the
format string?

Regards
 

Eric Sosman

> Hello
>
> If scanf() fails for a matching error, what can we say about the location
> of the file pointer for subsequent reads from stdin? Could it have read
> on through an indeterminate amount of stdin, or will it always be
> positioned immediately after the last successful conversion from the
> format string?

The former. For example, consider reading with "%e" and
encountering the input "123.4567e+---". The first ten characters
are a valid prefix for a floating-point number, which is then
spoiled by the eleventh. Yet scanf() can't simply push the
eleventh character back onto the stream and convert the first
ten: They are valid as a prefix but not as a complete number.
scanf() would have to push back three characters ('-', '+', 'e')
to arrive at something valid, but it has only one character of
push-back to work with. And, of course, there's simply no way
it can get all the way back to a position before the '1'.
 

James Kuyper

> Hello
>
> If scanf() fails for a matching error, what can we say about the location
> of the file pointer for subsequent reads from stdin? Could it have read
> on through an indeterminate amount of stdin, or will it always be
> positioned immediately after the last successful conversion from the
> format string?
>
> Regards

"An input item is defined as the longest sequence of input characters
which does not exceed any specified field width and which is, or is a
prefix of, a matching input sequence.285) The first character, if any,
after the input item remains unread." (7.21.6.2p9).
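Here is a minimal sketch of the quoted rule in action. The helper name read3_and_peek is hypothetical, and a temporary file stands in for stdin only so the example can be checked. With a field width of 3, "%3d" applied to "12345" takes "123" as the input item, and the '4' after it is still the next character on the stream.

```c
#include <stdio.h>

/* Hypothetical helper: reads one int with field width 3 ("%3d") and
   returns the first character after the input item, pushing it back
   with ungetc() so the stream is left as fscanf() left it.
   Returns EOF if the conversion fails or nothing follows the item. */
int read3_and_peek(FILE *fp, int *value)
{
    if (fscanf(fp, "%3d", value) != 1)
        return EOF;
    int next = fgetc(fp);   /* per 7.21.6.2p9 this character was left unread */
    if (next != EOF)
        ungetc(next, fp);   /* a single pushback, which is always allowed */
    return next;
}
```

A second call on the same stream would read "45" and stop at the following space, which is likewise left unread.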
 

Keith Thompson

Eric Sosman said:
> The former. For example, consider reading with "%e" and
> encountering the input "123.4567e+---". The first ten characters
> are a valid prefix for a floating-point number, which is then
> spoiled by the eleventh. Yet scanf() can't simply push the
> eleventh character back onto the stream and convert the first
> ten: They are valid as a prefix but not as a complete number.
> scanf() would have to push back three characters ('-', '+', 'e')
> to arrive at something valid, but it has only one character of
> push-back to work with. And, of course, there's simply no way
> it can get all the way back to a position before the '1'.

And that's not the only problem with using scanf to read numeric data.

Consider reading with "%e" and encountering the input "1.0e9999999".
The behavior is undefined -- and since you're reading from stdin,
there's no way (using scanf alone) to avoid that.

(This was something I really hoped C11 would fix, but it didn't.)

The safe way to read floating-point data from stdin is to read lines
using fgets() (or something similar -- obviously not gets()), and then
parse it using strtod().
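Here is a minimal sketch of that advice. It assumes a line fits in a 256-byte buffer, and the helper names parse_double and read_double_line are made up for the example. Unlike scanf(), strtod() reports an out-of-range value such as "1.0e9999999" by setting errno to ERANGE, and its end pointer shows whether the whole line was a number.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: converts all of text (leading and trailing white
   space allowed) to a double.  Returns 0 on success, -1 if there is no
   number or there is junk after it, -2 if strtod() reported ERANGE. */
int parse_double(const char *text, double *out)
{
    char *end;
    errno = 0;
    double v = strtod(text, &end);
    if (end == text)
        return -1;                 /* nothing convertible */
    if (errno == ERANGE)
        return -2;                 /* overflow (some libraries flag underflow too) */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return -1;                 /* e.g. "12abc" */
    *out = v;
    return 0;
}

/* Hypothetical helper: reads one line (assumed to fit in 256 bytes)
   and converts it.  Returns -3 at end of file or on a read error. */
int read_double_line(FILE *fp, double *out)
{
    char line[256];
    if (fgets(line, sizeof line, fp) == NULL)
        return -3;
    return parse_double(line, out);
}
```

Typical use would be read_double_line(stdin, &d). A line longer than the buffer would be split across calls, and this sketch does not detect that.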
 

Ben Bacarisse

Edward Rutherford said:
> If scanf() fails for a matching error, what can we say about the location
> of the file pointer for subsequent reads from stdin? Could it have read
> on through an indeterminate amount of stdin, or will it always be
> positioned immediately after the last successful conversion from the
> format string?

No, it can read an unlimited number of characters before reporting a
matching failure. There are lots of details, but since you are asking
an all-or-nothing question, you probably don't care about them. A
"worst case" example occurs when processing "%d" -- an unlimited number
of white-space characters can be read before any matching failure can be
reported. If there is such a failure, the character that causes the
matching failure will be pushed back onto the stream, but that's all.

The scanf functions only ever push back at most one character, so you
can use that as a rule of thumb to imagine what they must consume before
a directive can fail.
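Ben's "%d" case can be observed directly on a seekable stream. In this minimal sketch, try_read_int is a hypothetical helper, and ftell() is assumed to work on the stream. The helper makes one "%d" attempt and reports how many characters that attempt used up and which character is now at the front of the stream.

```c
#include <stdio.h>

/* Hypothetical helper: makes one "%d" attempt on fp.  Returns fscanf's
   result, stores in *consumed how many characters the attempt used up
   (skipped white space counts too) and in *next the character now at
   the front of the stream (or EOF).  Assumes ftell() works on fp. */
int try_read_int(FILE *fp, int *value, long *consumed, int *next)
{
    long start = ftell(fp);
    int result = fscanf(fp, "%d", value);
    *consumed = ftell(fp) - start;
    *next = fgetc(fp);
    if (*next != EOF)
        ungetc(*next, fp);    /* leave the stream as fscanf() left it */
    return result;
}
```

Given seven white-space characters followed by "x42", the attempt returns 0. All seven white-space characters are gone, and only the 'x' that caused the failure remains to be read.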

For some input patterns %n is an effective solution, but you don't say
what your overall objective is so I can't be sure.
 

Edward Rutherford

Ben said:
> No, it can read an unlimited number of characters before reporting a
> matching failure. There are lots of details, but since you are asking
> an all-or-nothing question, you probably don't care about them. A
> "worst case" example occurs when processing "%d" -- an unlimited number
> of white-space characters can be read before any matching failure can be
> reported. If there is such a failure, the character that causes the
> matching failure will be pushed back onto the stream, but that's all.
>
> The scanf functions only ever push back at most one character, so you can
> use that as a rule of thumb to imagine what they must consume before a
> directive can fail.
>
> For some input patterns %n is an effective solution, but you don't say
> what your overall objective is so I can't be sure.

That's unfortunate, because then there's no way to recover the non-
matching data, as far as I can see.

Wouldn't it be better if scanf() buffered non-matching characters, either
seamlessly in the background (so that future reads from stdin saw those
characters) or by returning a pointer to a buffer containing them?
 

James Kuyper

> That's unfortunate, because then there's no way to recover the non-
> matching data, as far as I can see.
>
> Wouldn't it be better if scanf() buffered non-matching characters, either
> seamlessly in the background (so that future reads from stdin saw those
> characters) or by returning a pointer to a buffer containing them?

scanf() was not intended to have that complicated an interface, and it's
far too late to make any changes to its interface now.

If you want to do something like that, the standard library provides you
with the pieces you need to assemble to do it yourself: fgets() from
<stdio.h> and the strto*() functions from <stdlib.h> are the most
relevant ones.
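Here is a sketch of assembling those pieces. parse_longs is a hypothetical helper, and the line is assumed to have come from fgets(). Because strtol()'s end pointer marks exactly where conversion stopped, the non-matching text is still available to the caller.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts up to max white-space separated longs
   from line.  Returns how many were stored in vals and sets *rest to
   the first text that did not convert (including any white space in
   front of it), so the caller still has the non-matching data in hand.
   An out-of-range number also stops the scan, leaving *rest before it. */
size_t parse_longs(const char *line, long *vals, size_t max, const char **rest)
{
    size_t n = 0;
    const char *p = line;
    while (n < max) {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);
        if (end == p || errno == ERANGE)
            break;              /* p still points at the unconverted text */
        vals[n++] = v;
        p = end;
    }
    *rest = p;
    return n;
}
```

For "10 -20 x30", this stores 10 and -20 and leaves *rest pointing at " x30", which the program can report or try to parse some other way.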
 

Eric Sosman

> Ben said:
>> Edward Rutherford said:
>>> If scanf() fails for a matching error, [...] Could it
>>> have read on through an indeterminate amount of stdin, [...]
>>
>> No, it can read an unlimited number of characters before reporting a
>> matching failure. [...]
>
> That's unfortunate, because then there's no way to recover the non-
> matching data, as far as I can see.

Right. That's one of the reasons scanf() and its brethren
are difficult to use in "industrial-strength" applications.

> Wouldn't it be better if scanf() buffered non-matching characters, either
> seamlessly in the background (so that future reads from stdin saw those
> characters) or by returning a pointer to a buffer containing them?

Which part of "indeterminate amount" and "unlimited number"
do you have trouble understanding? ;-)

If you need to revisit those characters, you'll need to buffer
them yourself (or fseek() back to them, if the input is seekable).
You'd need to read them into a buffer first, and then apply sscanf()
to the buffer; as Ben suggests, the "%n" specifier may be helpful
in navigating.
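Here is a small sketch of that idea. It assumes the text has already been read into a string, and sum_ints is a hypothetical helper. "%n" stores how many characters this sscanf() call has consumed, so each call can resume where the previous one stopped. After a failure, the offending text is still in the buffer. As with scanf(), an integer too large for int is still undefined behavior here; strtol() avoids that.

```c
#include <stdio.h>

/* Hypothetical helper: converts the ints at the start of buf one at a
   time, using "%n" to advance.  Returns how many were converted, sets
   *sum to their total and *stop to the offset of the first text that
   did not convert; that text is still there in buf. */
int sum_ints(const char *buf, long *sum, int *stop)
{
    int offset = 0, count = 0;
    int value, used;
    *sum = 0;
    /* %n does not count toward sscanf's return value */
    while (sscanf(buf + offset, "%d%n", &value, &used) == 1) {
        *sum += value;
        offset += used;   /* characters consumed by this call */
        count++;
    }
    *stop = offset;
    return count;
}
```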

My own preference when parsing input fancy enough to warrant
backtracking is to read it as plain characters (perhaps a line at
a time, if "line" makes sense), store them in a buffer, and pick
them apart with strxxx() functions. Even sscanf() has infelicities.
 

Ben Bacarisse

Edward Rutherford said:
> That's unfortunate, because then there's no way to recover the non-
> matching data, as far as I can see.

You've got good answers already so I'll just indulge in picking a small
nit: you can always get the non-matching data -- it's left in the stream
by definition! I know what you mean, of course, I just don't know a
good word for it.

To make up for being mean, here's some advice that might help. If you
want to be able to back-up to the last place a conversion worked, you
might keep track of the stream position (using fgetpos) after every
successful fscanf call. After every failed one, fsetpos to the last
saved position.

That's no use for sscanf, of course, and it does mean you probably have
to do calls with only one conversion specifier at a time, but it might
be all you need.
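Here is one way Ben's suggestion might look, with one conversion per fscanf() call. read_ints_with_backup is a hypothetical helper, and it assumes a stream on which fgetpos() and fsetpos() succeed, such as a regular file.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max ints, one fscanf() call each.
   After every success the position is saved with fgetpos(); on the
   first failure fsetpos() restores it, undoing whatever the failed
   call consumed (a successful fsetpos also discards any pushback).
   Returns the number of ints read, or -1 if fgetpos/fsetpos fails. */
int read_ints_with_backup(FILE *fp, int *vals, int max)
{
    fpos_t good;
    int n = 0;
    if (fgetpos(fp, &good) != 0)
        return -1;
    while (n < max) {
        if (fscanf(fp, "%d", &vals[n]) != 1) {
            if (fsetpos(fp, &good) != 0)
                return -1;
            break;
        }
        n++;
        if (fgetpos(fp, &good) != 0)
            return -1;
    }
    return n;
}
```

On input "12 34 \n  abc", this reads 12 and 34. The white space that the failed third attempt swallowed is then available again, so a following fgets() returns " \n".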
 

John Bode

> That's unfortunate, because then there's no way to recover the non-
> matching data, as far as I can see.

Which is why scanf() is the wrong tool for all but the simplest
input tasks. Unless I can guarantee that my input is always
well-behaved, I avoid using anything from the *scanf() family.

Use fgets() to consume an entire line, then parse and convert each
element using tools like strtok(), strtod(), strtol(), etc.
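Here is a sketch of that approach. The helper name split_doubles and its delimiter set are assumptions for the example. strtok() splits a line that fgets() has read, and strtod()'s end pointer together with errno checks that each field is entirely a number and in range. strtok() modifies the line in place and keeps hidden state, so it cannot interleave the parsing of two strings.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits line (modified in place) on spaces, tabs,
   commas and newlines, and converts each field with strtod().
   Returns the number of values stored, or -1 if any field is not
   wholly a number or is out of range.  Fields beyond max are ignored. */
int split_doubles(char *line, double *vals, int max)
{
    const char *delims = " \t,\n";
    int n = 0;
    for (char *tok = strtok(line, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        char *end;
        errno = 0;
        double v = strtod(tok, &end);
        if (end == tok || *end != '\0' || errno == ERANGE)
            return -1;      /* e.g. "five", "3x", or "1e99999" */
        vals[n++] = v;
    }
    return n;
}
```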
 

Malcolm McLean

On Thursday, 9 August 2012 22:00:49 UTC+1, John Bode wrote:
> Which is why scanf() is the wrong tool for all but the simplest
> input tasks. Unless I can guarantee that my input is always
> well-behaved, I avoid using anything from the *scanf() family.

It depends on the quality of parsing you want. Let's say we have x, y
co-ordinates in columns.

while(fscanf(fp, "%f %f\n", &x, &y) == 2)
{
    /* assign x and y to arrays */
}

won't catch malformed input like four columns in a rogue line.
But for many applications, it's probably good enough. If it's a
bad file, it will fail eventually. And no-one's got an interest in
providing malicious input to deliberately get a wrong result.
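For comparison, here is a line-oriented sketch that does catch a rogue line. The helper read_xy and its 256-byte line limit are assumptions. Each line from fgets() must hold exactly two numbers that strtod() can convert in range, and the line number of the first bad line is reported. Because it uses strtod() rather than fscanf(), out-of-range input such as "1.0e999999999" is also reported instead of causing undefined behavior.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads "x y" pairs, one per line, into xs/ys.
   A line with too few or too many columns, or an out-of-range number,
   is rejected and its number stored in *bad_line.  Returns the number
   of pairs read, or -1 on a malformed line.  Lines longer than the
   buffer are not handled specially in this sketch. */
int read_xy(FILE *fp, double *xs, double *ys, int max, int *bad_line)
{
    char line[256];
    int n = 0, lineno = 0;
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        char *p = line, *end;
        double xy[2];
        lineno++;
        for (int i = 0; i < 2; i++) {
            errno = 0;
            xy[i] = strtod(p, &end);
            if (end == p || errno == ERANGE) {   /* missing or bad number */
                *bad_line = lineno;
                return -1;
            }
            p = end;
        }
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '\0') {                        /* e.g. a third or fourth column */
            *bad_line = lineno;
            return -1;
        }
        xs[n] = xy[0];
        ys[n] = xy[1];
        n++;
    }
    return n;
}
```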
 

Keith Thompson

Malcolm McLean said:
> On Thursday, 9 August 2012 22:00:49 UTC+1, John Bode wrote:
> It depends on the quality of parsing you want. Let's say we have x, y
> co-ordinates in columns.
>
> while(fscanf(fp, "%f %f\n", &x, &y) == 2)
> {
>     /* assign x and y to arrays */
> }
>
> won't catch malformed input like four columns in a rogue line.
> But for many applications, it's probably good enough. If it's a
> bad file, it will fail eventually. And no-one's got an interest in
> providing malicious input to deliberately get a wrong result.

There are worse possibilities than wrong results. If the input includes
something like "1.0e999999999", the behavior is undefined.

If you can treat the input file as part of the program, so that an error
in the input file is the same as a coding error, then it's probably ok
to use fscanf. Otherwise ...
 

Malcolm McLean

On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
> There are worse possibilities than wrong results. If the input includes
> something like "1.0e999999999", the behavior is undefined.

But what is the program meant to do with such an input value? UB is probably
the best thing that can happen to it.
"Undefined behaviour" means "the C standard imposes no restrictions on the
behaviour of the implementation", not that the behaviour exists in some
ontological state of undefinedness.
 

BartC

Malcolm McLean said:
> On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
> But what is the program meant to do with such an input value? UB is
> probably the best thing that can happen to it.
> "Undefined behaviour" means "the C standard imposes no restrictions on the
> behaviour of the implementation", not that the behaviour exists in some
> ontological state of undefinedness.

So, you have an application which consists of a user interface and a bunch
of data, much of which has not been committed to disk. It's been running for
several hours, but if the user then chooses a command which reads something
from a file, and accidentally chooses a binary instead of a text file, the
application should just crash with the loss of all data?

You've obviously never had to deal with irate clients on the phone!

(With the routines I use (the C stuff is buried several layers deep so I
couldn't tell you exactly what library functions are used), any non-numeric
input will return 0.0 when trying to read floating point. Anything numeric
but out-of-range such as 1.0e99999999 is read as INF.

Input is always line oriented too (so running into end-of-line while trying
to read more numbers will just return 0.0). All solid behaviour,
although it's possible some of this is due to a well-behaved version of
scanf() somewhere.)
 

James Kuyper

> On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
> But what is the program meant to do with such an input value? UB is
> probably the best thing that can happen to it.

In most of my programs, the program is meant to report the fact that it
has run into a problem with the input, and to identify the offending
input and the context in which it occurred (for this particular program
that would mean identifying the line number). In some cases, at the
appropriate level (which is, in general, not the same as the level at
which the problem was detected), it is meant to retry the action with
something changed that might prevent or avoid the problem; for this
program, that might mean asking the user to provide the name of a
different file to use as input. UB is almost never on my agenda for the
appropriate response to an error condition.

> "Undefined behaviour" means "the C standard imposes no restrictions on the
> behaviour of the implementation", not that the behaviour exists in some
> ontological state of undefinedness.

I try my best to avoid having my programs enter a given state unless
their behavior in that state is defined by something (not necessarily
the C standard), and I know what that definition is.
 

Keith Thompson

Malcolm McLean said:
> On Saturday, 11 August 2012 18:40:56 UTC+1, Keith Thompson wrote:
> But what is the program meant to do with such an input value? UB is
> probably the best thing that can happen to it.

That's absurd. Undefined behavior means that whatever the *worst*
thing is, it can happen, whether that's crashing the program, or
continuing to execute quietly with bad data, or reformatting your
hard drive.

And before you reject that last possibility, consider this: Is
there code in your operating system that's designed to reformat
a hard drive? Are you *certain* that an instance of undefined
behavior can't possibly corrupt some function pointer and cause
that code to be invoked?

I admit that feeding "1.0e999999999" to scanf isn't likely to reformat
your hard drive (or I wouldn't have tried it just now), but why take the
risk? If you use gets() and strtod(), you can at least detect an input
error and abort the program.

> "Undefined behaviour" means "the C standard imposes no restrictions on the
> behaviour of the implementation", not that the behaviour exists in some
> ontological state of undefinedness.

Nor does it mean "the program is guaranteed to crash with a
meaningful error message".
 

Eric Sosman

> [...]
> I admit that feeding "1.0e999999999" to scanf isn't likely to reformat
> your hard drive (or I wouldn't have tried it just now), but why take the
> risk? If you use gets() and strtod(), you can at least detect an input
> error and abort the program.

Extend your wrist, Keith, whilst I administer the slap.
And I'm rescinding all the gold stars you won last term, too.
 

Keith Thompson

Eric Sosman said:
>> [...]
>> I admit that feeding "1.0e999999999" to scanf isn't likely to reformat
>> your hard drive (or I wouldn't have tried it just now), but why take the
>> risk? If you use gets() and strtod(), you can at least detect an input
>> error and abort the program.
>
> Extend your wrist, Keith, whilst I administer the slap.
> And I'm rescinding all the gold stars you won last term, too.

Aarrgghh!

Perhaps my f key isn't working. Nope, there it is, can't use that
excuse.

Wrist slap humbly accepted -- but I assure you it was a typo, not a
thinko.
 

Eric Sosman

> Eric Sosman said:
>> [...]
>>> I admit that feeding "1.0e999999999" to scanf isn't likely to reformat
>>> your hard drive (or I wouldn't have tried it just now), but why take the
>>> risk? If you use gets() and strtod(), you can at least detect an input
>>> error and abort the program.
>>
>> Extend your wrist, Keith, whilst I administer the slap.
>> And I'm rescinding all the gold stars you won last term, too.
>
> Aarrgghh!
>
> Perhaps my f key isn't working. Nope, there it is, can't use that
> excuse.
>
> Wrist slap humbly accepted -- but I assure you it was a typo, not a
> thinko.

A reudian slip, no doubt.