advanced scanf

B

Bernard Liang

In response, I have another question about the scanf family. After
reading in a %d value, for instance, do they immediately wade through
all subsequent whitespace until a non-whitespace character is reached,
and then continue analysing the format string? For instance, suppose I
have a buffer

int num;
char chr1, chr2, chr3, chr4;
char[] my_string = "1 1 2 3 5 8abc";

and then I do something like
sscanf(my_string, "%*d %*d %*d %*d %d %c%c%c%c", &num, &chr1, &chr2,
&chr3, &chr4);

would I find " " in chr[1-4], or "8abc"?

-BL

Eric said:
>
> Bernard Liang wrote On 07/20/06 17:32,:
>
>>Consider the following excerpt below, with the included relevant
>>declarations.
>>
>>Firstly, the lines marked with **, ***, **** at the beginning are not
>>supposed to have those stars at the beginning; they are only used to
>>direct attention.
>
>
> I've removed them; hope you don't mind.
>
>
>>p_num is somehow getting changed from its previous value to 0 by the
>>sscanf command (***) that doesn't even involve p_num. [...]
>>
>> int pktsize, interval, numgums, p_pktsize, p_interval, p_num;
>> char trial, p_trial;
>>[...]
>> if (sscanf(buffer, "%d %d %d %1s %*d %lf", &pktsize,
>>&interval, &numgums, &trial, &commrate) == 5) {
>
>
> The "%1s" conversion specifier skips white space,
> reads one character, and stores it as a string starting
> at the indicated place, in this case `&trial'. Observe
> that I said "as a string," meaning that two characters
> are stored: the input character (as the string's content)
> plus a zero byte (to mark the end of the string). The
> input character goes into `trial' as intended, but you
> have not provided any memory to hold the zero byte. The
> sscanf() function will try to store the zero anyhow, and
> whatever happens next is in the laps of the gods.
>
> What happened in this case is *probably* that sscanf()
> stored the zero byte in the memory location immediately
> after `trial', and *probably* that memory was part of the
> memory used for `p_num', hence the mysterious value change.
>
> If you want to read just one character and treat it as
> a single character (not as a string), you should probably
> use "%c" instead of "%s" (but note that "%c" does not skip
> leading white space, if that's important). If you want
> to read a one-character string into a two-character array
> (one for the payload, one for the zero), keep "%1s" but
> change `char trial' to `char trial[2]' and change `&trial'
> to `trial'.
>
 
D

Dann Corbit

Instead of using scanf(), I suggest using fgets() followed by sscanf().

Always check the return of the function.

From the current ANSI/ISO C standard:

7.19.6.7 The sscanf function
Synopsis
1 #include <stdio.h>
int sscanf(const char * restrict s,
const char * restrict format, ...);
Description
2 The sscanf function is equivalent to fscanf, except that input is obtained
from a string (specified by the argument s) rather than from a stream.
Reaching the end of the string is equivalent to encountering end-of-file for
the fscanf function. If copying takes place between objects that overlap,
the behavior is undefined.
Returns
3 The sscanf function returns the value of the macro EOF if an input failure
occurs before any conversion. Otherwise, the sscanf function returns the
number of input items assigned, which can be fewer than provided for, or
even zero, in the event of an early matching failure.

7.19.6.2 The fscanf function
Synopsis
1 #include <stdio.h>
int fscanf(FILE * restrict stream,
const char * restrict format, ...);
Description
2 The fscanf function reads input from the stream pointed to by stream,
under control of the string pointed to by format that specifies the
admissible input sequences and how they are to be converted for assignment,
using subsequent arguments as pointers to the objects to receive the
converted input. If there are insufficient arguments for the format, the
behavior is undefined. If the format is exhausted while arguments remain,
the excess arguments are evaluated (as always) but are otherwise ignored.
3 The format shall be a multibyte character sequence, beginning and ending
in its initial shift state. The format is composed of zero or more
directives: one or more white-space characters, an ordinary multibyte
character (neither % nor a white-space character), or a conversion
specification. Each conversion specification is introduced by the character
%.
After the %, the following appear in sequence:
- An optional assignment-suppressing character *.
- An optional nonzero decimal integer that specifies the maximum field width
(in characters).
- An optional length modifier that specifies the size of the receiving
object.
- Aconversion specifier character that specifies the type of conversion to
be applied.
4 The fscanf function executes each directive of the format in turn. If a
directive fails, as detailed below, the function returns. Failures are
described as input failures (due to the occurrence of an encoding error or
the unavailability of input characters), or matching failures (due to
inappropriate input).
5 A directive composed of white-space character(s) is executed by reading
input up to the first non-white-space character (which remains unread), or
until no more characters can be read.
6 A directive that is an ordinary multibyte character is executed by reading
the next characters of the stream. If any of those characters differ from
the ones composing the directive, the directive fails and the differing and
subsequent characters remain unread. Similarly, if end-of-file, an encoding
error, or a read error prevents a character from being read, the directive
fails.
7 A directive that is a conversion specification defines a set of matching
input sequences, as described below for each specifier. A conversion
specification is executed in the following steps:
8 Input white-space characters (as specified by the isspace function) are
skipped, unless the specification includes a [, c, or n specifier.241)
9 An input item is read from the stream, unless the specification includes
an n specifier. An input item is defined as the longest sequence of input
characters which does not exceed any specified field width and which is, or
is a prefix of, a matching input sequence.242)
The first character, if any, after the input item remains unread. If the
length of the input item is zero, the execution of the directive fails; this
condition is a matching failure unless end-of-file, an encoding error, or a
read error prevented input from the stream, in which case it is an input
failure.
10 Except in the case of a % specifier, the input item (or, in the case of a
%n directive, the count of input characters) is converted to a type
appropriate to the conversion specifier. If the input item is not a matching
sequence, the execution of the directive fails: this condition is a matching
failure. Unless assignment suppression was indicated by a *, the result of
the conversion is placed in the object pointed to by the first argument
following the format argument that has not already received a conversion
result. If this object does not have an appropriate type, or if the result
of the conversion cannot be represented in the object, the behavior is
undefined.
11 The length modifiers and their meanings are:
hh Specifies that a following d, i, o, u, x, X, or n conversion specifier
applies to an argument with type pointer to signed char or unsigned char.
h Specifies that a following d, i, o, u, x, X, or n conversion specifier
applies to an argument with type pointer to short int or unsigned short int.
l (ell) Specifies that a following d, i, o, u, x, X, or n conversion
specifier applies to an argument with type pointer to long int or unsigned
long int; that a following a, A, e, E, f, F, g, or G conversion specifier
applies to an argument with type pointer to double; or that a following c,
s, or [ conversion specifier applies to an argument with type pointer to
wchar_t.
ll (ell-ell) Specifies that a following d, i, o, u, x, X, or n conversion
specifier applies to an argument with type pointer to long long int or
unsigned long long int.
j Specifies that a following d, i, o, u, x, X, or n conversion specifier
applies to an argument with type pointer to intmax_t or uintmax_t.
z Specifies that a following d, i, o, u, x, X, or n conversion specifier
applies to an argument with type pointer to size_t or the corresponding
signed integer type.
t Specifies that a following d, i, o, u, x, X, or n conversion specifier
applies to an argument with type pointer to ptrdiff_t or the corresponding
unsigned integer type.
L Specifies that a following a, A, e, E, f, F, g, or G conversion specifier
applies to an argument with type pointer to long double. If a length
modifier appears with any conversion specifier other than as specified
above, the behavior is undefined.
12 The conversion specifiers and their meanings are:
d Matches an optionally signed decimal integer, whose format is the same as
expected for the subject sequence of the strtol function with the value 10
for the base argument. The corresponding argument shall be a pointer to
signed integer.
i Matches an optionally signed integer, whose format is the same as expected
for the subject sequence of the strtol function with the value 0 for the
base argument. The corresponding argument shall be a pointer to signed
integer.
o Matches an optionally signed octal integer, whose format is the same as
expected for the subject sequence of the strtoul function with the value 8
for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
u Matches an optionally signed decimal integer, whose format is the same as
expected for the subject sequence of the strtoul function with the value 10
for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
x Matches an optionally signed hexadecimal integer, whose format is the same
as expected for the subject sequence of the strtoul function with the value
16 for the base argument. The corresponding argument shall be a pointer to
unsigned integer.
a,e,f,g Matches an optionally signed floating-point number, infinity, or
NaN, whose format is the same as expected for the subject sequence of the
strtod
function. The corresponding argument shall be a pointer to floating.
c Matches a sequence of characters of exactly the number specified by the
field width (1 if no field width is present in the directive).243)
If no l length modifier is present, the corresponding argument shall be a
pointer to the initial element of a character array large enough to accept
the sequence. No null character is added. If an l length modifier is
present, the input shall be a sequence of multibyte characters that begins
in the initial shift state. Each multibyte character in the sequence is
converted to a wide character as if by a call to the mbrtowc function, with
the conversion state described by an mbstate_t object initialized to zero
before the first multibyte character is converted. The corresponding
argument shall be a pointer to the initial element of an array of wchar_t
large enough to accept the resulting sequence of wide characters. No null
wide character is added.
s Matches a sequence of non-white-space characters.243)
If no l length modifier is present, the corresponding argument shall be a
pointer to the initial element of a character array large enough to accept
the sequence and a terminating null character, which will be added
automatically. If an l length modifier is present, the input shall be a
sequence of multibyte characters that begins in the initial shift state.
Each multibyte character is converted to a wide character as if by a call to
the mbrtowc function, with the conversion state described by an mbstate_t
object initialized to zero before the first multibyte character is
converted. The corresponding argument shall be a pointer to the initial
element of an array of wchar_t large enough to accept the sequence and the
terminating null wide character, which will be added automatically.
[ Matches a nonempty sequence of characters from a set of expected
characters (the scanset).243)
If no l length modifier is present, the corresponding argument shall be a
pointer to the initial element of a character array large enough to accept
the sequence and a terminating null character, which will be added
automatically. If an l length modifier is present, the input shall be a
sequence of multibyte characters that begins in the initial shift state.
Each multibyte character is converted to a wide character as if by a call to
the mbrtowc function, with the conversion state described by an mbstate_t
object initialized to zero before the first multibyte character is
converted. The corresponding argument shall be a pointer to the initial
element of an array of wchar_t large enough to accept the sequence and the
terminating null wide character, which will be added automatically. The
conversion specifier includes all subsequent characters in the format
string, up to and including the matching right bracket (]). The characters
between the brackets (the scanlist) compose the scanset, unless the
character after the left bracket is a circumflex (^), in which case the
scanset contains all characters that do not appear in the scanlist between
the circumflex and the right bracket. If the conversion specifier begins
with [] or [^], the right bracket character is in the scanlist and the next
following right bracket character is the matching right bracket that ends
the specification; otherwise the first following right bracket character is
the one that ends the
specification. If a - character is in the scanlist and is not the first, nor
the second where the first character is a ^, nor the last character, the
behavior is implementation-defined.
p Matches an implementation-defined set of sequences, which should be the
same as the set of sequences that may be produced by the %p conversion of
the fprintf function. The corresponding argument shall be a pointer to a
pointer to void. The input item is converted to a pointer value in an
implementation-defined manner. If the input item is a value converted
earlier during the same program execution, the pointer that results shall
compare equal to that value; otherwise the behavior of the %p conversion is
undefined.
n No input is consumed. The corresponding argument shall be a pointer to
signed integer into which is to be written the number of characters read
from the input stream so far by this call to the fscanf function. Execution
of a %n directive does not increment the assignment count returned at the
completion of execution of the fscanf function. No argument is converted,
but one is consumed. If the conversion specification includes an assignment
suppressing character or a field width, the behavior is undefined.
% Matches a single % character; no conversion or assignment occurs. The
complete conversion specification shall be %%.
13 If a conversion specification is invalid, the behavior is undefined.244)
14 The conversion specifiers A, E, F, G, and X are also valid and behave the
same as, respectively, a, e, f, g, and x.
15 Trailing white space (including new-line characters) is left unread
unless matched by a directive. The success of literal matches and suppressed
assignments is not directly determinable other than via the %n directive.
Returns
16 The fscanf function returns the value of the macro EOF if an input
failure occurs before any conversion. Otherwise, the function returns the
number of input items assigned, which can be fewer than provided for, or
even zero, in the event of an early matching failure.
17 EXAMPLE 1 The call:
#include <stdio.h>
/* ... */
int n, i; float x; char name[50];
n = fscanf(stdin, "%d%f%s", &i, &x, name);
with the input line:
25 54.32E-1 thompson
will assign to n the value 3, to i the value 25, to x the value 5.432, and
to name the sequence
thompson\0.
18 EXAMPLE 2 The call:
#include <stdio.h>
/* ... */
int i; float x; char name[50];
fscanf(stdin, "%2d%f%*d %[0123456789]", &i, &x, name);
with input:
56789 0123 56a72
will assign to i the value 56 and to x the value 789.0, will skip 0123, and
will assign to name the sequence 56\0. The next character read from the
input stream will be a.
19 EXAMPLE 3 To accept repeatedly from stdin a quantity, a unit of measure,
and an item name:
#include <stdio.h>
/* ... */
int count; float quant; char units[21], item[21];
do {
count = fscanf(stdin, "%f%20s of %20s", &quant, units, item);
fscanf(stdin,"%*[^\n]");
} while (!feof(stdin) && !ferror(stdin));
20 If the stdin stream contains the following lines:
2 quarts of oil
-12.8degrees Celsius
lots of luck
10.0LBS of
dirt
100ergs of energy
the execution of the above example will be analogous to the following
assignments:
quant = 2; strcpy(units, "quarts"); strcpy(item, "oil");
count = 3;
quant = -12.8; strcpy(units, "degrees");
count = 2; // "C" fails to match "o"
count = 0; // "l" fails to match "%f"
quant = 10.0; strcpy(units, "LBS"); strcpy(item, "dirt");
count = 3;
count = 0; // "100e" fails to match "%f"
count = EOF;
21 EXAMPLE 4 In:
#include <stdio.h>
/* ... */
int d1, d2, n1, n2, i;
i = sscanf("123", "%d%n%n%d", &d1, &n1, &n2, &d2);
the value 123 is assigned to d1 and the value 3 to n1. Because %n can never
get an input failure the value
of 3 is also assigned to n2. The value of d2 is not affected. The value 1 is
assigned to i.
22 EXAMPLE 5 In these examples, multibyte characters do have a
state-dependent encoding, and the members of the extended character set that
consist of more than one byte each consist of exactly two bytes, the first
of which is denoted here by a and the second by an uppercase letter, but are
only recognized as such when in the alternate shift state. The shift
sequences are denoted by ­ and ¯, in which the first causes entry into the
alternate shift state.
23 After the call:
#include <stdio.h>
/* ... */
char str[50];
fscanf(stdin, "a%s", str);
with the input line:
a­ X Y¯ bc
str will contain ­ X Y¯\0 assuming that none of the bytes of the shift
sequences (or of the multibyte characters, in the more general case) appears
to be a single-byte white-space character.
24 In contrast, after the call:
#include <stdio.h>
#include <stddef.h>
/* ... */
wchar_t wstr[50];
fscanf(stdin, "a%ls", wstr);
with the same input line, wstr will contain the two wide characters that
correspond to X and Y and a terminating null wide character.
25 However, the call:
#include <stdio.h>
#include <stddef.h>
/* ... */
wchar_t wstr[50];
fscanf(stdin, "a­ X¯%ls", wstr);
with the same input line will return zero due to a matching failure against
the ¯ sequence in the format
string.
26 Assuming that the first byte of the multibyte character X is the same as
the first byte of the multibyte character Y, after the call:
#include <stdio.h>
#include <stddef.h>
/* ... */
wchar_t wstr[50];
fscanf(stdin, "a­ Y¯%ls", wstr);
with the same input line, zero will again be returned, but stdin will be
left with a partially consumed
multibyte character.
Forward references: the strtod, strtof, and strtold functions (7.20.1.3),
the strtol, strtoll, strtoul, and strtoull functions (7.20.1.4), conversion
state
(7.24.6), the wcrtomb function (7.24.6.3.3).

241) These white-space characters are not counted against a specified field
width.
242) fscanf pushes back at most one input character onto the input stream.
Therefore, some sequences that are acceptable to strtod, strtol, etc., are
unacceptable to fscanf.
243) No special provisions are made for multibyte characters in the matching
rules used by the c, s, and [
conversion specifiers - the extent of the input field is determined on a
byte-by-byte basis. The resulting field is nevertheless a sequence of
multibyte characters that begins in the initial shift state.
244) See ''future library directions'' (7.26.9).
 
E

Eric Sosman

Bernard Liang wrote On 07/20/06 18:02,:
In response, I have another question about the scanf family. After
reading in a %d value, for instance, do they immediately wade through
all subsequent whitespace until a non-whitespace character is reached,
and then continue analysing the format string?

No. White-space skipping happens (if it happens at
all) at the start of processing a conversion specifier,
not after the processing is finished.
For instance, suppose I
have a buffer

int num;
char chr1, chr2, chr3, chr4;
char[] my_string = "1 1 2 3 5 8abc";

and then I do something like
sscanf(my_string, "%*d %*d %*d %*d %d %c%c%c%c", &num, &chr1, &chr2,
&chr3, &chr4);

would I find " " in chr[1-4], or "8abc"?

You would find 8,a,b,c -- but *not* for the reason you
may be thinking. The spaces between 5 and 8 are not skipped
by the "%d" that converts the 5, nor are they skipped by the
"%c" that converts the 8. They are not "skipped" at all: they
are *matched* by the white space between "%d" and "%c" in the
format string. A white space character (any kind) in the format
matches as many white space characters (any kinds) as it can
from the input, stopping when it reaches (and does not consume)
a non-white character or when it hits the end.

Perhaps this would be a good time for you to reopen your C
textbook and spend some time reading what it has to say about
the scanf() family ...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,754
Messages
2,569,525
Members
44,997
Latest member
mileyka

Latest Threads

Top