> Since you can determine the length of the input string, %s can be
> perfectly safe with sscanf.
Agreed.
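To make the sscanf case concrete, here's a minimal sketch. The name
first_word is my own invention for illustration. It assumes the input
is an ordinary null-terminated string, so a destination of
strlen(input) + 1 bytes is big enough for any word "%s" can extract.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract the first whitespace-delimited word of input into a freshly
   allocated buffer.  The buffer holds strlen(input) + 1 bytes, and the
   word cannot be longer than the whole input, so "%s" cannot overflow.
   Returns NULL on allocation failure or if input contains no word.
   The caller must free the result. */
char *first_word(const char *input)
{
    char *word = malloc(strlen(input) + 1);
    if (word == NULL)
        return NULL;
    if (sscanf(input, "%s", word) != 1) {
        free(word);
        return NULL;
    }
    return word;
}
```

The safety comes entirely from sizing the destination from the source,
which is possible here only because the whole input is already in
memory.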
For scanf(), it's not immediately obvious how to make the length
"mandatory".
For example, this:
    char buf[10];
    int result;
    result = scanf("%s", buf);
is as unsafe as any call to gets(), since arbitrary input can easily
overflow buf. We can avoid the gets() problem by removing the gets()
function from the standard library, but we're not contemplating
removing the scanf() function altogether, so we can't prevent the
string "%s" being passed as the first argument to the scanf()
function.
Probably the best solution would be to mandate that a "%s" directive
with no length always fails; the call scanf("%s", buf) would then
return 0 and would not modify the buf array. (To be clear, this would
be incompatible with the current standard; we're talking about
proposed changes in a future version of the standard.)
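For illustration only, here's roughly how an implementation might spot
such a directive before reading any input. has_unbounded_s is a
hypothetical name of mine, not part of the standard or of any actual
proposal, and the parsing is deliberately simplified: it understands
"%%", the '*' assignment-suppression flag, a decimal field width, and
the length modifiers, but it makes no attempt to handle "%[" scansets
(which have the same overflow problem).

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if fmt contains an "s" directive that stores into a
   buffer without a maximum field width.  Simplified sketch: "%["
   scansets are not parsed. */
bool has_unbounded_s(const char *fmt)
{
    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') {              /* "%%" matches a literal '%' */
            fmt++;
            continue;
        }
        bool suppressed = false;
        if (*fmt == '*') {              /* "%*s" stores nothing */
            suppressed = true;
            fmt++;
        }
        bool has_width = false;
        while (isdigit((unsigned char)*fmt)) {
            has_width = true;
            fmt++;
        }
        /* Skip length modifiers such as the 'l' in "%ls". */
        while (*fmt != '\0' && strchr("hljztL", *fmt) != NULL)
            fmt++;
        if (*fmt == 's' && !suppressed && !has_width)
            return true;
        if (*fmt != '\0')
            fmt++;                      /* step past the conversion specifier */
    }
    return false;
}
```

Under the rule suggested above, a scanf-family function would treat
any format for which a check like this succeeds as a matching failure
at that directive.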
In general, we want (or at least I want) to avoid cases where
arbitrary input from stdin can cause a buffer overflow. There are
other cases to consider. For example: fscanf(stdin, "%s", buf)
presents exactly the same problem as scanf("%s", buf); catching it
would require fscanf() to behave differently based on whether its
first argument is stdin.
Even more generally, I think we want to avoid cases where arbitrary
input from an interactive device (not just stdin) can cause a buffer
overflow. The standard doesn't currently require I/O operations to be
aware of whether they're dealing with an interactive device, and this
may not be possible in general. C99 7.19.3p7 provides a vague
precedent:
    ... the standard input and standard output streams are fully
    buffered if and only if the stream can be determined not to refer
    to an interactive device.
but I'm not comfortable with the idea of such a drastic change in the
behavior of fscanf(f, "%s", buf) depending on whether f *can be
determined* to refer to an interactive device.
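To show what "can be determined" might mean in practice, here's a
sketch for a POSIX system. Note that isatty() and fileno() are POSIX,
not ISO C, so this is exactly the kind of thing the C standard can't
assume; stream_is_interactive is a name I made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>   /* POSIX, not ISO C: declares isatty() */

/* Return 1 if f can be determined to refer to a terminal, 0 otherwise.
   A result of 0 covers both "not a terminal" and "could not tell". */
int stream_is_interactive(FILE *f)
{
    return isatty(fileno(f)) == 1;
}
```

Even here the answer depends on how the program was invoked: the same
fscanf(f, "%s", buf) call would see a terminal in one run and a pipe
or file in the next.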
It would be simpler to eliminate "%s" for all of scanf, sscanf, and
fscanf. sscanf and fscanf with "%s" are potentially safe because it's
possible to know the maximum length in advance, but if the maximum
length is known, it can also be specified. If nothing else, the
programmer can just specify the length of the buffer, which I'd say is
good practice anyway.
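Here's a minimal sketch of specifying the length. WORD_MAX, the STR
macros, and read_word are names I've chosen for the example; the
stringizing trick assumes WORD_MAX is defined as a plain decimal
literal so that it can be pasted into the format string.

```c
#include <stdio.h>

#define WORD_MAX 9            /* characters stored, excluding the '\0' */
#define STR_(x) #x
#define STR(x) STR_(x)        /* expands WORD_MAX before stringizing */

/* Read one whitespace-delimited word from f into buf, storing at most
   WORD_MAX characters plus the terminating '\0'.  Returns whatever
   fscanf returns: 1 if a word was stored, EOF on end of input. */
int read_word(FILE *f, char buf[WORD_MAX + 1])
{
    /* The format string becomes "%9s". */
    return fscanf(f, "%" STR(WORD_MAX) "s", buf);
}
```

Note that the field width counts the characters read, not the
terminating null character, so the buffer needs one byte more than the
width. Any characters of an over-long word that don't fit stay in the
stream and are picked up by the next read.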