Warning against Scanf

B Thomas

Hi,
I was reading O'Reilly's "Practical C programming" book and it warns
against the use of scanf, suggesting that it be avoided completely.
Instead it recommends using fgets and sscanf. However, no
explanation is offered other than that scanf handles end of lines very
badly. I have experienced such problems when doing some numerical
programming but never understood them. For example, some consecutive
scanfs would not read in values correctly.

Could someone please give me a pointer on this issue, in particular how to
use scanf safely?

sincerely
b thomas
 
Richard Heathfield

B said:
Hi,
I was reading O'Reilly's "Practical C programming" book and it warns
against the use of scanf, suggesting that it be avoided completely.

I agree with that advice, although it is certainly *possible* to use scanf
safely.
Instead it recommends using fgets and sscanf.

I wouldn't even use sscanf, to be frank.
However, no
explanation is offered other than that scanf handles end of lines very
badly.

Rather, handling them well is tricky.
I have experienced such problems when doing some numerical
programming but never understood them. For example, some consecutive
scanfs would not read in values correctly.

Could someone please give me a pointer on this issue, in particular how to
use scanf safely?

Dan Pop is your man, I think. IIRC he's the only regular contributor to this
newsgroup who thinks using scanf is a good idea.
 
pete

Richard said:
I agree with that advice, although it is certainly *possible* to use scanf
safely.


I wouldn't even use sscanf, to be frank.


Rather, handling them well is tricky.


Dan Pop is your man, I think.
IIRC he's the only regular contributor to this
newsgroup who thinks using scanf is a good idea.

I've implemented Dan's technique on somebody's homework,
once or twice.

http://groups.google.com/[email protected]

I like it.

He also has a variation using feof.

http://groups.google.com/[email protected]
 
Malcolm

B Thomas said:
Could someone please give me a pointer on this issue, in particular how to
use scanf safely?
scanf("%s", str) is gets() in disguise.

However, scanf() does have its own miniature formatting language that will
allow you to specify the maximum width of the string, how to handle
overflow, etc.
I don't know this formatting language (though I seldom read fields from text
files). It's likely that many of the programmers you work with won't be
familiar with it either.
 
Kevin Goodsell

B said:
Hi,
I was reading O'Reilly's "Practical C programming" book and it warns
against the use of scanf, suggesting that it be avoided completely.
Instead it recommends using fgets and sscanf. However, no
explanation is offered other than that scanf handles end of lines very
badly. I have experienced such problems when doing some numerical
programming but never understood them. For example, some consecutive
scanfs would not read in values correctly.

Could someone please give me a pointer on this issue, in particular how to
use scanf safely?

You should check the FAQ. It covers a number of scanf() gotchas and
talks about the recommendation to avoid it.

The bottom line, I think, is that it's very difficult to use scanf()
correctly. The lack of type checking is bad enough by itself (though we
tolerate that in simpler cases like printf), but the number of ways
scanf can fail, and the effort required to handle them and to avoid bugs
and security holes, make it very tricky. Avoiding its use is one way to go.

If you do use it, be very careful. Double-check your conversion
specifiers (refer to a book or the standard - don't try to just remember
them). Make sure you pass pointer arguments (that is, don't forget to
take the address where appropriate). Don't use %s or %[ without an
*accurate* field width specified (remember that it must be at least 1
less than the available space to leave room for the '\0'). Check the
return value, and handle the case where a conversion fails. If possible,
use a LINT-like program to type-check the call (look for splint - it's a
good, free LINT), or a compiler that does such type-checking (gcc is the
only one I know of).
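
A minimal sketch of two of the points above: an accurate field width on %s
and a checked return value. The helper name parse_name_and_age and the
16-byte buffer are made up for illustration, and sscanf is used instead of
scanf only so the input can be supplied as a string; the conversion
specifiers behave the same way.

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word and one int
   from `text`. Returns 1 if both conversions succeed, 0 otherwise.
   `name` must point to at least 16 bytes. */
int parse_name_and_age(const char *text, char name[16], int *age)
{
    /* %15s: the width is one less than the buffer size, which leaves
       room for the terminating '\0'. */
    int converted = sscanf(text, "%15s %d", name, age);

    /* sscanf returns the number of successful conversions (or EOF),
       so anything other than 2 means the input did not match. */
    return converted == 2;
}
```

With the input "bob thirty" the %d conversion fails, sscanf returns 1, and
the helper reports failure instead of leaving *age silently unset.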

-Kevin
 
nrk

Richard said:
I agree with that advice, although it is certainly *possible* to use scanf
safely.

Yes. However, it is easier to get it wrong than to get it right.
I wouldn't even use sscanf, to be frank.

You're right in that there's very little difference between using scanf
directly and fgets+sscanf (in sscanf you know the size of input, but you
can still invoke undefined behavior if you're careless with your conversion
specifiers as the content is not under your control). So, if you believe
scanf is dangerous, then so is fgets+sscanf. However, I believe it is
considerably difficult to build your own alternatives that are safer
(depending on the task you want to do of course). Rather, if one devotes
some of the time spent building their own complex input routines to
understanding the *scanf family, one could end up with a more robust and
effective alternative.

To OP: Can you elaborate on this? Specifically, what is the book
complaining about as far as scanf's handling of end of lines?
Rather, handling them well is tricky.


Dan Pop is your man, I think. IIRC he's the only regular contributor to
this newsgroup who thinks using scanf is a good idea.

-nrk.
 
Chris Torek

[prefers fgets+sscanf over scanf]

You're right in that there's very little difference between using scanf
directly and fgets+sscanf (in sscanf you know the size of input, but you
can still invoke undefined behavior if you're careless with your conversion
specifiers as the content is not under your control). ...

This is not the only, or perhaps (depending on one's point of view)
even the major, difference. The other significant difference between
using scanf() directly, and using some input-reading function --
whether fgets() or some other function -- followed by sscanf(),
is that the latter separates what one might reasonably consider
fundamentally different tasks.

Specifically, fgets() (or ggets() or any of the variants designed
to avoid fgets()'s limitations) reads "raw", "uninterpreted" data.
(I use double quotes around "raw" and "uninterpreted" because
fgets() and company do in fact interpret data as "lines of text"
separated by newlines, which is one of the two fundamental file
formats required of all hosted C implementations. The other is
the "even raw-er" binary file, which you get with fopen's "b"
modifier -- "rb", "wb", and the like -- in the open-mode parameter.)
Compare this with the scanf() family, which includes directives
like %d -- "interpret an integer" -- and %f and %[ and so on.

It is certainly possible to "read and interpret" more or less
simultaneously. Indeed, one of the features of certain GUI
applications is that they can do this "per input event", and beep
or otherwise gripe at you the very instant you do something
inappropriate, such as attempting to enter the letter 'z' in a
numeric field. But ANSI/ISO C is too primitive for this -- in
portable yet interactive C, we have to just read a line at a time,
then do our best to make sense of it.

This is where scanf() goes wrong.

Suppose, for instance, you do this:

n = scanf("%d", &intvar);

and the user enters 'zx81'. What does scanf() do with this, and
what would you *like* to have happen? A GUI interface might
reject the z, reject the x, and then accept the 81. The scanf
engine, on the other hand, sees the 'z' and rejects it but LEAVES
IT IN THE INPUT STREAM. It never gets as far as the 8.

If you put this scanf() call in a loop, the engine keeps rejecting
the same 'z' over and over again, never making any progress. The
program runs forever (or until externally interrupted).

If, on the other hand, you use fgets() (or ggets() etc.) first, so
as to read a "raw" line, *then* apply sscanf(), you not only can
detect the failure to convert the 'z', you also get the presumably-desired
effect of having entirely consumed the only interactive input item
portable C supports, i.e., the entire line. While this may be less
desirable than interactively catching the 'z' and 'x' as the user
pushes the keys, it does at least keep the program from getting
stuck in an infinite loop.
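
As a rough sketch of that read-then-interpret split (read_int_line and the
256-byte buffer are assumptions for illustration, not anything from the
posts above):

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from `in`, then tries to convert
   an int from it. Returns 1 if an int was converted, 0 if the line held
   no int, and EOF at end of input or on a read error. */
int read_int_line(FILE *in, int *out)
{
    char line[256];

    /* Step 1: read the "raw" line. */
    if (fgets(line, sizeof line, in) == NULL)
        return EOF;

    /* Step 2: interpret it. The line is already consumed at this point,
       whether or not the conversion succeeds. */
    return sscanf(line, "%d", out) == 1 ? 1 : 0;
}
```

Given the input lines "zx81" and "42", the first call returns 0 and the
second returns 1 with 42 stored, so a caller that loops makes progress. The
sketch ignores lines longer than the buffer, which fgets would split across
calls.
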
So, if you believe scanf is dangerous, then so is fgets+sscanf.

(But fgets() followed by sscanf() allows the insertion of limit
checking, and avoids infinite loops. It is possible to do both of
these with scanf(), but it is also clumsy to do so.)
However, I believe it is
considerably difficult to build your own alternatives that are safer
(depending on the task you want to do of course). Rather, if one devotes
some of the time spent building their own complex input routines to
understanding the *scanf family, one could end up with a more robust and
effective alternative.

Interactive user input is perhaps *the* most difficult thing to
do with computers, because it means the computer must work with
that most unpredictable of I/O devices, the human being. :)
To OP: Can you elaborate on this? Specifically, what is the book
complaining about as far as scanf's handling of end of lines?

Consider the "%d" example again, with a bit more prefix:

printf("Please enter an integer: ");
fflush(stdout);
n = scanf("%d", &intvar);

Not only does this have the "scanf engine jams up on alphabetic
input" problem, it also has another series of problems relating
to whitespace. The "%d" directive does not *just* mean "convert
an int", it *also* means "skip (ignore) leading white space", and
to the scanf engine, all white space -- spaces, tabs, newlines,
formfeeds, vertical tabs: basically anything for which isspace()
returns nonzero -- is equivalent.

If the user simply presses the ENTER (or RETURN) key, the program
sits there impassively. The scanf() code has eaten the newline
and is still awaiting more input -- but the computer does not
issue a new prompt to the human saying "please, no blank lines;
I need an integer, a series of digits". This too is often not
what one wanted.

Worse, suppose the user enters "123,456" (comma and all). The
scanf() engine reads and converts the 123 and leaves the comma and
subsequent digits and newline in the input stream, for the next
input operation to find. If the user really does enter just an
integer, scanf() leaves the newline behind. Since ANSI/ISO C's
interactive input *is* a primitive line-at-a-time based model, this
is quite a disservice. Because different scanf formats imply "skip
whitespace", it is impossible to tell a priori just how many input
lines (or "input events" as a human operator might see them) scanf()
has consumed.
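
Once the line is in a buffer, the leftover text is at least easy to
inspect. A small sketch, assuming a hypothetical helper parse_int_prefix
that uses the standard %n directive to report where the conversion stopped:

```c
#include <stdio.h>

/* Hypothetical helper: converts a leading int from `line` and returns a
   pointer to the first character that was not consumed. Returns NULL if
   no int could be converted. */
const char *parse_int_prefix(const char *line, int *out)
{
    int used = 0;

    /* %n stores the number of characters consumed so far. It is not a
       conversion, so it does not count toward sscanf's return value. */
    if (sscanf(line, "%d%n", out, &used) != 1)
        return NULL;

    return line + used;
}
```

For "123,456\n" this stores 123 and returns a pointer to the comma, so the
caller can decide whether trailing text is an error instead of leaving it
for the next read to trip over.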

One can attempt to work around the "trailing newline left in the
input stream", but almost invariably, C programmers' first attempts
to do so read something like this:

n = scanf("%d\n", &intvar);

(though most leave off the "n =" as well!). This simply does not
work -- because once again, the scanf engine interprets "any white
space" as "ANY white space". The newline in the format directive
does *not* mean "eat the trailing newline", nor even "eat trailing
blanks if any followed by a trailing newline" (which is probably
what the programmer wants), but rather "eat all white space including
as many newlines as possible". This means that even if the user
enters an integer as directed, the computer *still* just sits there
impassively, waiting for more input lines. If the user presses
ENTER again, the computer continues waiting for input. Only when
the user enters something "not white space" -- such as "wake up
you stupid machine" -- does the scanf() call return! And, of
course, it leaves this "bad" input in the input stream, where it
immediately jams up the scanf engine on the next "%d" format.
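
One common workaround, shown here only as a sketch and not as something
proposed in this thread, is to leave the "\n" out of the format and discard
the rest of the line by hand after the conversion. The name discard_line is
made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: discards characters from `in` up to and including
   the next newline. Returns the number of characters discarded before the
   newline, or EOF if the stream ended before any character was read. */
int discard_line(FILE *in)
{
    int c;
    int count = 0;

    while ((c = getc(in)) != EOF && c != '\n')
        count++;

    if (c == EOF && count == 0)
        return EOF;
    return count;
}
```

A call such as fscanf(in, "%d", &intvar) followed by discard_line(in)
consumes exactly one line per value, and a nonzero return from discard_line
tells the caller that something followed the number on that line.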

As long as the user sees (correctly) that ENTER is the way to push
input through to an interactive ANSI-C program -- that he gets to
edit any input until that point, and the ENTER key, well, *enters*
it, committing it to the program's input -- the scanf() function
remains badly misdesigned for interactive input. This is the
programming-language equivalent of mismatched impedances on radio
antennae, or plugging a 120 volt appliance into a 220 volt socket,
or any number of similar analogies: scanf() works in the wrong
units. The units the interactive program and its user exchange
are "input lines", but the units scanf() handles are "things that
match the next format directive", and there is no single directive
that *ever* means "an input line".

While fgets() is not perfect, it *does* get "an input line", so it
works far, far better for the average programmer and interactive
user. The pieces fit: in the usual case, the part that sticks out
of the user goes smoothly into the computer, and no blood is shed. :)
 
Keith Thompson

Malcolm said:
scanf("%s", str) is gets() in disguise.

In some ways. It has the same problems with buffer overflow (you
can't control how much input you'll get from stdin, so there's no way
to avoid overflowing str). But the "%s" specifier matches a sequence
of non-whitespace characters, not an entire input line.
However, scanf() does have its own miniature formatting language that will
allow you to specify the maximum width of the string, how to handle
overflow, etc.
I don't know this formatting language (though I seldom read fields from text
files). It's likely that many of the programmers you work with won't be
familiar with it either.

But since scanf() (along with fscanf() and sscanf()) is defined by the
language standard, it's easy enough to find a description of the
formatting language, either in the standard itself or in any decent
textbook.
 
Kelsey Bjarnason

Hi,
I was reading O'Reilly's "Practical C programming" book and it warns
against the use of scanf, suggesting that it be avoided completely.
Instead it recommends using fgets and sscanf. However, no
explanation is offered other than that scanf handles end of lines very
badly. I have experienced such problems when doing some numerical
programming but never understood them. For example, some consecutive
scanfs would not read in values correctly.

Could someone please give me a pointer on this issue, in particular how to
use scanf safely?

Simple exemplars:

int x;
scanf( "%d", &x );
/* Does weird things when the user enters "one". */

int x = 1;
while( x != 3 )
    scanf( "%d", &x );
/* Enter, say, 'd'; even a subsequent entry of '3' will fail to end the
   loop. */

char buff[128];
scanf( "%s", buff );
/* Hey, we just re-created gets! */


There are, obviously, ways to deal with such things... but one problem is
that if scanf pukes as a result of incorrect data ('d' instead of an
integer, say) it leaves the data in the buffer, thus screwing up the next
read, and the next, and the next...

Using fgets rips all (or as much as it can) of the data out of the buffer,
regardless of what's there. Net result, if you determine the input was
garbage, just loop and fgets again. You also don't need to worry about
buffer overflows, input limiting modifiers, yadda yadda yadda; just get
the input and process it as you will.
 
Keith Thompson

Kelsey Bjarnason said:
Using fgets rips all (or as much as it can) of the data out of the buffer,
regardless of what's there. Net result, if you determine the input was
garbage, just loop and fgets again. You also don't need to worry about
buffer overflows, input limiting modifiers, yadda yadda yadda; just get
the input and process it as you will.

Looping and calling fgets() again is sensible for an interactive
program, but not for one that's reading input from a file.
 
Joe Wright

Keith said:
Looping and calling fgets() again is sensible for an interactive
program, but not for one that's reading input from a file.
You can't mean that. Looping on fgets() is *the* way to read a text
file. Are you proposing that fscanf() might be better? Pray tell.
 
Keith Thompson

Joe Wright said:
You can't mean that. Looping on fgets() is *the* way to read a text
file. Are you proposing that fscanf() might be better? Pray tell.

You missed the context: "if you determine the input was garbage, just
loop and fgets again".

In an interactive program, if the user provides garbage input, it's
sensible to print an error message and re-prompt. For example (assume
anything after ": " is entered by the user):

Enter a number: ten
Enter a number: 10
That's better, thank you for entering the number 10.

You can't do this if you're reading from a file. If there's an error
in an input file, you have to find some other way to deal with it
(commonly by printing an error message and bailing out).

You're correct, of course, that looping and calling fgets() again is a
sensible way to get the *next* line of input, whether you're reading
from an interactive source or from a file.
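
The interactive re-prompt case might be sketched like this. The helper
prompt_for_int and its stream parameters are assumptions made so the loop
can be exercised without a terminal; an interactive program would pass
stdin and stdout.

```c
#include <stdio.h>

/* Hypothetical helper: prompts on `out` and reads lines from `in` until
   one of them converts to an int. Returns 1 on success, 0 if the input
   ends (or a read error occurs) before a number is entered. */
int prompt_for_int(FILE *in, FILE *out, int *value)
{
    char line[256];

    for (;;) {
        fputs("Enter a number: ", out);
        fflush(out);

        if (fgets(line, sizeof line, in) == NULL)
            return 0;   /* no more input: stop asking */

        if (sscanf(line, "%d", value) == 1)
            return 1;

        /* The garbage line has already been consumed, so looping and
           prompting again makes progress. */
    }
}
```

When the data comes from a file rather than a person, the caller would more
likely treat the first failed conversion as an error and bail out instead
of looping.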
 
Tak-Shing Chan

You can't mean that. Looping on fgets() is *the* way to read a text
file. Are you proposing that fscanf() might be better? Pray tell.

Sure, if the text file is formatted. For example:

joe.txt:
3 4 Matrix dimensions
0.1 2.3 4.5 6.7 Comment 2
8.9 0.1 2.3 4.5 Comment 3
6.7 8.9 0.1 2.3 Comment 4

main.c:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("joe.txt", "r");
    unsigned n, m, i;
    double *buf;

    if (!fp) return EXIT_FAILURE;
    /* First line: the matrix dimensions */
    if (fscanf(fp, "%u%u", &n, &m) != 2)
        return EXIT_FAILURE;
    if (!(buf = malloc(n * m * sizeof *buf)))
        return EXIT_FAILURE;
    for (i = 0; i < n * m; i++) {
        /* Before each row, skip the trailing comment on the previous line */
        if (!(i % m)) fscanf(fp, "%*[^\n]"), getc(fp);
        /* Store element i of the matrix */
        if (fscanf(fp, "%lf", &buf[i]) != 1)
            return EXIT_FAILURE;
    }
    fclose(fp);

    /* Do whatever you want with buf here */
    free(buf);
    return 0;
}

Tak-Shing
 
