sscanf question

Discussion in 'C Programming' started by Chad, Aug 1, 2008.

  1. Chad

    Chad Guest

    Given the following

    #include <stdio.h>

    int main(void)
    {
    char line[BUFSIZ];
    long arg1, arg2;

    while(fgets(line, BUFSIZ, stdin) != NULL){
    if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){
    snprintf(line, sizeof(line), "%ld\n", arg1 + arg2);
    if(fputs(line, stdout) == EOF){
    fprintf(stderr, "output error\n");
    }
    }
    else{
    fprintf(stderr, "invalid input\n");
    }
    }

    return 0;
    }

    When I run it, I get the following
    [cdalten@localhost oakland]$ ./add
    4 5
    9
    65
    invalid input

    Now, here's the question. How come there has to be a space between the
    numbers? sscanf() in this case doesn't even have a space in the format
    args. Ie, it is like the following

    (sscanf(line,"%ld%ld",&arg1,&arg2)

    More to the point. How come when I enter 65, the computer won't spit
    back 11?
     
    Chad, Aug 1, 2008
    #1

  2. Chad

    Chad Guest

    On Aug 1, 5:05 am, Richard Heathfield <> wrote:
    > Chad said:
    >
    > > Given the following

    >
    > > #include <stdio.h>

    >
    > > int main(void)
    > > {
    > > char line[BUFSIZ];
    > > long arg1, arg2;

    >
    > > while(fgets(line, BUFSIZ, stdin) != NULL){
    > > if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){

    >
    > <snip>
    >
    > > When I run it, I get the following
    > > [cdalten@localhost oakland]$ ./add
    > > 4 5
    > > 9
    > > 65
    > > invalid input

    >
    > > Now, here's the question. How come there has to be a space between the
    > > numbers?

    >
    > If there didn't have to be, how would you add sixty-five to twenty-seven?
    >


    No idea. For whatever reasons, I thought I was maybe missing
    something. It does happen from time to time.
     
    Chad, Aug 1, 2008
    #2

  3. Chad

    santosh Guest

    Chad wrote:

    > Given the following
    >
    > #include <stdio.h>
    >
    > int main(void)
    > {
    > char line[BUFSIZ];
    > long arg1, arg2;
    >
    > while(fgets(line, BUFSIZ, stdin) != NULL){
    > if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){
    > snprintf(line, sizeof(line), "%ld\n", arg1 + arg2);
    > if(fputs(line, stdout) == EOF){
    > fprintf(stderr, "output error\n");
    > }
    > }
    > else{
    > fprintf(stderr, "invalid input\n");
    > }
    > }
    >
    > return 0;
    > }
    >
    > When I run it, I get the following
    > [cdalten@localhost oakland]$ ./add
    > 4 5
    > 9
    > 65
    > invalid input
    >
    > Now, here's the question. How come there has to be a space between the
    > numbers?


    How will you differentiate two numbers otherwise? Consider the string

    3456

    How many numbers does this represent? One, two, three or four?

    > sscanf() in this case doesn't even have a space in the format
    > args. Ie, it is like the following
    >
    > (sscanf(line,"%ld%ld",&arg1,&arg2)
    >
    > More to the point. How come when I enter 65, the computer won't spit
    > back 11?


    The standard scanf field separator is whitespace as defined by the
    isspace function.
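
    A minimal sketch of that point, assuming a hypothetical helper named count_longs (it is not part of the original program): any character for which isspace() is true separates the two fields, so a tab or a newline works as well as a space.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many of the two %ld conversions
   succeeded for the given line (the raw sscanf result). */
int count_longs(const char *line, long *a, long *b)
{
    return sscanf(line, "%ld%ld", a, b);
}
```

    Calling count_longs("4\t5", &a, &b) or count_longs("  4\n5\n", &a, &b) should report two conversions, just as "4 5" does.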
     
    santosh, Aug 1, 2008
    #3
  4. Chad

    Bartc Guest

    "santosh" <> wrote in message
    news:g6v5tv$2tp$...
    > Chad wrote:


    >> Now, here's the question. How come there has to be a space between the
    >> numbers?

    >
    > How will you differentiate two numbers otherwise? Consider the string
    >
    > 3456
    >
    > How many numbers does this represent? One, two, three or four?


    I make it ten different numbers, although not all at the same time.

    It is useful sometimes to pack numbers like this, where the width of each
    number is fixed. Perhaps using something like %2d%2d for the scanf format.
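
    A small sketch of the fixed-width idea, using the OP's long variables (so the conversions are %1ld and %2ld rather than %2d). The helper names add_digits and add_pairs are made up for illustration, and -1 is used as a failure marker only because the fields here cannot be negative.

```c
#include <stdio.h>

/* Hypothetical helper: treat the input as two one-digit fields. */
long add_digits(const char *line)
{
    long a, b;

    if (sscanf(line, "%1ld%1ld", &a, &b) != 2)
        return -1;              /* fewer than two fields converted */
    return a + b;
}

/* Hypothetical helper: treat the input as two two-digit fields. */
long add_pairs(const char *line)
{
    long a, b;

    if (sscanf(line, "%2ld%2ld", &a, &b) != 2)
        return -1;
    return a + b;
}
```

    With these, add_digits("65") gives 11 and add_pairs("6527") gives 92. The width is a maximum, not an exact count, so shorter fields are still accepted.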

    --
    Bartc
     
    Bartc, Aug 1, 2008
    #4
  5. Chad

    Bill Reid Guest

    Richard Heathfield <> wrote in message
    news:...
    > Chad said:
    >
    > > Given the following
    > >
    > > #include <stdio.h>
    > >
    > > int main(void)
    > > {
    > > char line[BUFSIZ];
    > > long arg1, arg2;
    > >
    > > while(fgets(line, BUFSIZ, stdin) != NULL){
    > > if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){

    >
    > <snip>
    >
    > > When I run it, I get the following
    > > [cdalten@localhost oakland]$ ./add
    > > 4 5
    > > 9
    > > 65
    > > invalid input
    > >
    > > Now, here's the question. How come there has to be a space between the
    > > numbers?

    >
    > If there didn't have to be, how would you add sixty-five to twenty-seven?


    Boy, the trolls are having a field day today, with "troll zero" doing
    his usual bang-up job of trollery...

    Gee, how can you add 65 to 27, given a string of 6527, I don't know,
    maybe like THIS:

    sscanf(line,"%2ld%2ld",&arg1,&arg2); /* arg1 and arg2 are long in the OP's code */

    arg3=arg1+arg2;

    Though this MAY not be exactly what OP is looking for or really
    trying to do...but in any event, there is a max field width specifier
    in the *scanf() functions, so you can scan in numbers of a certain
    number of digits even though there is no space between them, as
    was pointed out by others...

    To illustrate a few examples (for the benefit of "troll zero", in case
    he ever wants to move beyond Usenet trollery and actually write
    some useful code):

    /* [0-pad year][0-pad month][0-pad day] "20020308" */
    case df_YZMZDZ :
    sscanf(string,"%04d%02d%02d",
    &date_components.year,
    &date_components.month,
    &date_components.day);
    break;
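
    A self-contained sketch of the same date case, assuming a hypothetical struct date_parts with int members (the real type of date_components is not shown above). It also uses %n as a cheap sanity check on how many characters were consumed. This is not full validation: white space before each field is still skipped, and month or day ranges are not checked.

```c
#include <stdio.h>

/* Hypothetical stand-in for the date_components structure. */
struct date_parts {
    int year, month, day;
};

/* Returns 1 if three fields were converted and exactly 8 characters
   were consumed, 0 otherwise. */
int parse_yyyymmdd(const char *s, struct date_parts *out)
{
    int n = 0;

    if (sscanf(s, "%4d%2d%2d%n",
               &out->year, &out->month, &out->day, &n) != 3)
        return 0;
    return n == 8;
}
```

    For "20020308" this should yield year 2002, month 3, day 8.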

    Or a very typical set of defines (or enums) for configuration or whatever
    with levels of categories:

    #define category_0 0
    #define category_0_config_0 1
    ....
    #define category_1 100
    #define category_1_config_1 101
    ....

    Note that we can now populate our configuration variables (or whatever)
    from a "single" number (such as 206234317) in a configuration file (or
    whatever) by scanning it using the field width specifier as above...

    ---
    William Ernest Reid
     
    Bill Reid, Aug 1, 2008
    #5
  6. Chad

    Chris Torek Guest

    In article <>
    Chad <> wrote:
    >Given the following
    >
    >#include <stdio.h>
    >
    >int main(void)
    >{
    > char line[BUFSIZ];
    > long arg1, arg2;
    >
    > while(fgets(line, BUFSIZ, stdin) != NULL){
    > if(sscanf(line,"%ld%ld",&arg1,&arg2) == 2){
    > snprintf(line, sizeof(line), "%ld\n", arg1 + arg2);
    > if(fputs(line, stdout) == EOF){
    > fprintf(stderr, "output error\n");
    > }


    You could just printf() the result, instead of snprintf()-ing and
    then fputs()-ing. But this code is all correct, at least.

    > }
    > else{
    > fprintf(stderr, "invalid input\n");
    > }
    > }
    >
    > return 0;
    >}


    >When I run it, I get the following
    >[cdalten@localhost oakland]$ ./add
    >4 5
    >9
    >65
    >invalid input
    >
    >Now, here's the question. How come there has to be a space between the
    >numbers? sscanf() in this case doesn't even have a space in the format
    >args.


    Most people seem to misunderstand how the scanf engine works.

    The scanf family of functions all use a common "engine" to do
    their work. This "scanf engine" is fairly simple -- one might
    even say "simplistic" -- and simply executes "directives" in a
    sequence, one after another, until one of them "fails" or the
    engine runs out of directives, whichever occurs first.

    Directives consist of literal text, white space, or "conversions"
    introduced by "%". They are given by the "format arg" (singular,
    not plural) -- the first argument to the various scanf functions.

    Input characters are taken from a supplied stdio stream (which is
    to say, a valid "FILE *" value) as needed. Input characters are
    "consumed" in the process, which means that a future fgetc() on
    the stream will no longer see them: they are gone forever. (But
    one single "char" may be read and then put back, in the same manner
    as ungetc(), if needed for internal purposes.) In the case of
    sscanf() (and vsscanf() in C99), a temporary internal "string-stream"
    is created with input coming from the string, and destroyed by the
    time sscanf() (or vsscanf()) returns, and characters "consumed" by
    the stream are still in the original string, so in this respect,
    string-streams are much more forgiving.

    Your format contains two %ld directives (and no white space, and no
    other characters), as you note. But each "%ld" directive means
    the same thing, which is:

    Step 1: consume (and ignore) any white space on the stream, so
    that the next available input character is non-white-space.

    Step 2: consume (and save) as many decimal digits as possible,
    with optional prefix sign, so that the next available input
    character is not a decimal digit.

    Step 3: convert the consumed digits to a "long" and store the
    result via the supplied pointer (which must of course point to
    a "long").

    This directive will fail if there are no decimal digits available
    after the whitespace is consumed. Steps 2 and 3 may also be combined
    internally.
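
    One way to watch the three steps from outside is the %n directive, which stores the number of characters consumed so far. This is only an illustrative sketch; consumed_by_ld is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: runs a single %ld directive on s.
   Returns the number of characters consumed (skipped white space
   plus sign plus digits), or -1 if the directive failed. */
int consumed_by_ld(const char *s, long *value)
{
    int n = 0;

    if (sscanf(s, "%ld%n", value, &n) != 1)
        return -1;
    return n;
}
```

    For "  65 7" the result should be 4 (two blanks skipped in step 1, two digits taken in step 2) with 65 stored in step 3. For "65" the single directive takes both digits, which is why nothing is left for a second %ld.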

    (The failure is an "input failure" if fgetc() would return EOF on
    the stream, and a "matching failure" otherwise. C99 adds one more
    failure case, which I do not fully understand and do not address
    here. The difference between "input failure" and the others only
    matters for the first conversion: input failure with no conversions
    makes the scanf engine return EOF, while matching failure at that
    point makes it return 0. If there have been successful conversions
    and assignments, the engine returns the number of assignments.

    In this case, if the engine returns 1 -- indicating the first "%ld"
    worked, but the second failed -- you cannot distinguish immediately
    between "input failure" and "matching failure" for the second
    conversion. If this were not a "string-stream", you could use
    (feof(stream) || ferror(stream)) to test whether there was an input
    failure on the stream, if you really cared.)
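
    The possible return values for the OP's format can be sketched as follows. The function name classify and the strings it returns are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: names the outcome of sscanf(line, "%ld%ld", ...). */
const char *classify(const char *line)
{
    long a, b;

    switch (sscanf(line, "%ld%ld", &a, &b)) {
    case 2:
        return "two numbers";
    case 1:
        return "one number";        /* second %ld failed, cause unknown */
    case 0:
        return "matching failure";  /* first %ld saw a non-digit */
    default:
        return "input failure";     /* EOF: string ran out first */
    }
}
```

    An empty or all-blank line is an input failure, "abc" is a matching failure, and "65\n" gives one number because the second %ld skips the newline and then runs out of input.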

    Note that both steps 1 and 2 can require "putting back" a character
    on the stream, because "consume as many characters as possible that
    meet some test" is done as if by:

    int c;

    do {
    c = fgetc(stream);
    } while (c != EOF && whatever_test_applies_here(c));
    if (c != EOF)
    ungetc(c, stream);

    (but usually "more efficiently", in some tricky way the implementation
    has internally).
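
    The same loop written out as a complete function, as a sketch only. skip_while is a hypothetical name, and the count it returns is an addition for illustration; a real implementation does this on its internal buffer.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: consume characters from stream for as long as
   test() accepts them, then put back the first one it rejects.
   Returns the number of characters consumed. */
int skip_while(FILE *stream, int (*test)(int))
{
    int c;
    int n = 0;

    while ((c = fgetc(stream)) != EOF && test(c))
        n++;
    if (c != EOF)
        ungetc(c, stream);      /* the rejected character stays readable */
    return n;
}
```

    Calling skip_while(stream, isspace) corresponds to step 1, and skip_while(stream, isdigit) to the digit part of step 2.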

    >More to the point. How come when I enter 65, the computer won't spit
    >back 11?


    If you follow the three steps described above, it should become clear
    why.

    Note that changing the directives will change the steps. If you
    include a width specifier, step 2 in particular changes. "%1ld"
    and "%2ld" would limit step 2 to consuming at most 1 or at most 2
    characters (respectively), for instance, i.e., the directive is now
    handled with code equivalent to:

    int c;
    int i, max;
    char buf[SOME_SIZE];

    do {
    c = fgetc(stream);
    } while (isspace(c)); /* note that isspace(EOF) is false */
    if (c != EOF)
    ungetc(c, stream);

    #define IS_SIGN(c) ((c) == '+' || (c) == '-')

    max = <the supplied format width>;
    for (i = 0; i < max; i++) {
    c = fgetc(stream);
    /* note that isdigit(EOF) is false, so no separate test needed */
    if (isdigit(c) || (i == 0 && IS_SIGN(c)))
    buf[i] = c;
    else {
    /* put back only the character that stopped the scan */
    if (c != EOF)
    ungetc(c, stream);
    break;
    }
    }

    if (i > 0 && isdigit(buf[i - 1])) {
    buf[i] = '\0';
    *va_arg(ap, long *) = atol(buf); /* or strtol(buf, NULL, 10) */
    } else
    ... handle input or matching failure, depending on c==EOF ...

    Note: I cannot find the actual limit in the C99 draft standard text
    I keep handy for searches, but I believe that "%ld" -- with no hard
    limit on the number of decimal digits -- is allowed to "act like"
    %4095ld or similar, so that the size of the "buf" into which decimal
    digits are saved need not be infinite. (Of course, implementations
    can simply combine the reading and converting:

    unsigned long result = 0;
    int sign = 0;

    max = <whatever, possibly LLONG_MAX with max having type long long>;
    for (i = 0; i < max; i++) {
    c = fgetc(stream);
    if (i == 0 && IS_SIGN(c)) {
    if (c == '-')
    sign = 1;
    } else if (isdigit(c))
    result = (result * 10) + (c - '0');
    else
    break;
    }
    ... handle the rest similarly, but no need for atol/strtol ...

    but this means that %ld input behaves differently on overflow than
    does strtol(). This is allowed, but I prefer implementations that
    handle overflow the same way in the scanf engine as in strtol().)
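
    For comparison, this is roughly what strtol()'s overflow reporting looks like when it is called directly. It is a sketch with a hypothetical helper name, parse_long, not a description of any particular scanf implementation.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse one decimal long starting at s.
   Returns 1 on success, 0 if no digits were found, and -1 if the
   value does not fit in a long. *end is set to the first character
   that was not consumed. */
int parse_long(const char *s, char **end, long *out)
{
    long v;

    errno = 0;
    v = strtol(s, end, 10);
    if (*end == s)
        return 0;               /* nothing converted */
    if (errno == ERANGE)
        return -1;              /* strtol clamped to LONG_MIN/LONG_MAX */
    *out = v;
    return 1;
}
```

    Because *end is updated, the helper can be called again on the remainder of the line to pick up a second number.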

    This is a lot to remember (which is why it is a good idea to keep
    a reference handy, to look up all the details on how the scanf
    engine has to work). But there are a couple of key items that
    you should memorize, if you are going to use the scanf family:

    - almost all conversions begin by skipping initial white space;
    - "white space" includes newlines;
    - almost all conversions DO NOT skip trailing white space.
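
    A short sketch of those three points on a real stream. The helper name next_char_after_long is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read one long with fscanf, then return the
   very next character still waiting in the stream (or EOF if the
   conversion failed or nothing is left). */
int next_char_after_long(FILE *stream, long *value)
{
    if (fscanf(stream, "%ld", value) != 1)
        return EOF;
    return fgetc(stream);
}
```

    If the stream holds "\n  42\n7", the leading newline and blanks are skipped, 42 is stored, and the character returned is the '\n' that follows it: the trailing white space was left behind.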

    This means that applying the scanf family to a stdio stream almost
    always leaves "trailing white space" -- usually a newline -- behind
    in the stream. This trailing white space will cause you trouble
    later. It is tempting to add code to remove it, but this is usually
    a mistake, because it is only *almost* always left behind, so if
    you simply always remove another "line" ended by a newline, you
    will sometimes remove input you should have left alone. The best
    approach -- besides "avoid scanf entirely" :) -- tends to be "read
    a line with a line-oriented function, then use sscanf() on the
    resulting string". This gives you much more control, and much more
    "obvious predictability" on how the program will behave with various
    inputs. Code that is obvious and predictable tends to be easier
    to debug, and hence more reliable and useful in the long run, than
    code that is obscure.
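
    A minimal sketch of the "read a line, then sscanf() it" approach, with one extra check that nothing but white space follows the second number. parse_two_longs is a hypothetical helper; the " %n" idiom relies on the white-space directive consuming any trailing blanks and newline before %n records the position.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if line holds exactly two longs
   followed only by white space, 0 otherwise. */
int parse_two_longs(const char *line, long *a, long *b)
{
    int n = 0;

    if (sscanf(line, "%ld%ld %n", a, b, &n) != 2)
        return 0;
    return line[n] == '\0';     /* anything left over is junk */
}
```

    In the OP's loop this would replace the bare sscanf() test: call fgets() as before, then accept the line only if parse_two_longs(line, &arg1, &arg2) returns 1.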

    (It is possible, but somewhat difficult, to "read a line" with
    the scanf family. The code to do this is somewhat obscure. It
    does appear here in comp.lang.c now and then.)
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: gmail (figure it out) http://web.torek.net/torek/index.html
     
    Chris Torek, Aug 3, 2008
    #6
  7. Chad

    CBFalconer Guest

    Chris Torek wrote:
    >

    .... snip good explanation of scanf ...
    >
    > (It is possible, but somewhat difficult, to "read a line" with
    > the scanf family. The code to do this is somewhat obscure. It
    > does appear here in comp.lang.c now and then.)


    Er - Dan Pop hasn't posted here for at least 2 years :)

    --
    [mail]: Chuck F (cbfalconer at maineline dot net)
    [page]: <http://cbfalconer.home.att.net>
    Try the download section.
     
    CBFalconer, Aug 3, 2008
    #7
  8. Chad

    Guest

    On Aug 3, 11:02 am, Chris Torek <> wrote:
    <snip>
    > (It is possible, but somewhat difficult, to "read a line" with
    > the scanf family. The code to do this is somewhat obscure. It
    > does appear here in comp.lang.c now and then.)


    I'd say it's impossible in robust code; the stream can have unwanted
    embedded null bytes, which scanf will happily read.
     
    , Aug 3, 2008
    #8
  9. Chad

    CBFalconer Guest

    wrote:
    > Chris Torek <> wrote:
    >
    > <snip>
    >
    >> (It is possible, but somewhat difficult, to "read a line" with
    >> the scanf family. The code to do this is somewhat obscure. It
    >> does appear here in comp.lang.c now and then.)

    >
    > I'd say it's impossible in robust code; the stream can have
    > unwanted embedded null bytes, which scanf will happily read.


    So? A null byte is not a digit, nor a period, so it will normally
    be treated as marking the end of a numeric field.

    --
    [mail]: Chuck F (cbfalconer at maineline dot net)
    [page]: <http://cbfalconer.home.att.net>
    Try the download section.
     
    CBFalconer, Aug 3, 2008
    #9
  10. writes:

    > On Aug 3, 11:02 am, Chris Torek <> wrote:
    > <snip>
    >> (It is possible, but somewhat difficult, to "read a line" with
    >> the scanf family. The code to do this is somewhat obscure. It
    >> does appear here in comp.lang.c now and then.)

    >
    > I'd say it's impossible in robust code; the stream can have unwanted
    > embedded null bytes, which scanf will happily read.


    That is not a problem. Given:

    char line[101], nl[2];
    int nchars;

    The call:

    scanf("%100[^\n]%n%[\n]", line, &nchars, nl)

    tells us all we need to know. If the return is 2 we saw a whole
    line. If the return is 1 it is partial. In both cases, nchars is the
    number of characters read (excluding a newline if present) and will
    happily include nulls in this count.

    --
    Ben.
     
    Ben Bacarisse, Aug 3, 2008
    #10
  11. Ben Bacarisse <> writes:

    > writes:
    >
    >> On Aug 3, 11:02 am, Chris Torek <> wrote:
    >> <snip>
    >>> (It is possible, but somewhat difficult, to "read a line" with
    >>> the scanf family. The code to do this is somewhat obscure. It
    >>> does appear here in comp.lang.c now and then.)

    >>
    >> I'd say it's impossible in robust code; the stream can have unwanted
    >> embedded null bytes, which scanf will happily read.

    >
    > That is not a problem. Given:
    >
    > char line[101], nl[2];
    > int nchars;
    >
    > The call:
    >
    > scanf("%100[^\n]%n%[\n]", line, &nchars, nl)


    I missed the 1 in the %1[\n] format, sorry. Anyway, you get the idea...

    > tells us all we need to know. If the return is 2 we saw a whole
    > line. If the return is 1 it is partial. In both cases, nchars is the
    > number of characters read (excluding a newline if present) and will
    > happily include nulls in this count.
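
    With the corrected %1[\n], the call can be wrapped up as below. This is a sketch using fscanf() on an arbitrary stream; the helper name read_line_100 is hypothetical. One case the return value does not cover above: an empty line makes the first directive fail, so the result is 0 and the newline is left unread.

```c
#include <stdio.h>

/* Hypothetical helper: read up to 100 characters of one line.
   Returns the fscanf result: 2 for a complete line, 1 for a partial
   line (buffer full or no newline before end of input), 0 for an
   empty line, EOF at end of input. *nchars is the number of
   characters stored in line, not counting the newline. */
int read_line_100(FILE *stream, char line[101], int *nchars)
{
    char nl[2];

    *nchars = 0;
    return fscanf(stream, "%100[^\n]%n%1[\n]", line, nchars, nl);
}
```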


    --
    Ben.
     
    Ben Bacarisse, Aug 3, 2008
    #11
  12. Chad

    Guest

    On Aug 3, 4:27 pm, CBFalconer <> wrote:
    > wrote:
    > > Chris Torek <> wrote:

    >
    > > <snip>

    >
    > >> (It is possible, but somewhat difficult, to "read a line" with
    > >> the scanf family. The code to do this is somewhat obscure. It
    > >> does appear here in comp.lang.c now and then.)

    >
    > > I'd say it's impossible in robust code; the stream can have
    > > unwanted embedded null bytes, which scanf will happily read.

    >
    > So? A null byte is not a digit, nor a period, so it will normally
    > be treated as marking the end of a numeric field.


    What are you talking about?
     
    , Aug 3, 2008
    #12
  13. Chad

    Guest

    On Aug 3, 4:57 pm, Ben Bacarisse <> wrote:
    > Ben Bacarisse <> writes:
    > > writes:

    >
    > >> On Aug 3, 11:02 am, Chris Torek <> wrote:
    > >> <snip>
    > >>> (It is possible, but somewhat difficult, to "read a line" with
    > >>> the scanf family. The code to do this is somewhat obscure. It
    > >>> does appear here in comp.lang.c now and then.)

    >
    > >> I'd say it's impossible in robust code; the stream can have unwanted
    > >> embedded null bytes, which scanf will happily read.

    >
    > > That is not a problem. Given:

    >
    > > char line[101], nl[2];
    > > int nchars;

    >
    > > The call:

    >
    > > scanf("%100[^\n]%n%[\n]", line, &nchars, nl)

    >
    > I missed the 1 in the %1[\n] format, sorry. Anyway, you get the idea...
    >
    > > tells us all we need to know. If the return is 2 we saw a whole
    > > line. If the return is 1 it is partial. In both cases, nchars is the
    > > number of characters read (excluding a newline if present) and will
    > > happily include nulls in this count.


    nchars is indeed the number of characters/bytes read. There needs to be
    another check like

    if(strlen(line) != nchars) /* embedded null bytes */

    As you see, `line' is processed twice, which may be unwanted in
    'robust' code.
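
    The check could also be written with memchr(), which stops at the first null byte instead of measuring the whole string. This is only a sketch, has_embedded_nul is a hypothetical name, and it still makes a second pass over the buffer.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if any of the first nchars bytes
   of line is a null byte, 0 otherwise. */
int has_embedded_nul(const char *line, int nchars)
{
    if (nchars <= 0)
        return 0;
    return memchr(line, '\0', (size_t)nchars) != NULL;
}
```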
     
    , Aug 3, 2008
    #13
  14. On Sun, 3 Aug 2008 14:25:15 -0700 (PDT), wrote:

    >On Aug 3, 4:27 pm, CBFalconer <> wrote:
    >> wrote:
    >> > Chris Torek <> wrote:

    >>
    >> > <snip>

    >>
    >> >> (It is possible, but somewhat difficult, to "read a line" with
    >> >> the scanf family. The code to do this is somewhat obscure. It
    >> >> does appear here in comp.lang.c now and then.)

    >>
    >> > I'd say it's impossible in robust code; the stream can have
    >> > unwanted embedded null bytes, which scanf will happily read.

    >>
    >> So? A null byte is not a digit, nor a period, so it will normally
    >> be treated as marking the end of a numeric field.

    >
    >What are you talking about?


    Well, you said '\0' characters would pose some difficulty. He said
    they wouldn't.

    So let's start at the beginning. Why do you think they prevent robust
    code from using scanf? Before you answer, I suggest you look through
    the archives for posts by Dan Pop that describe exactly how to do
    this.

    --
    Remove del for email
     
    Barry Schwarz, Aug 4, 2008
    #14
  15. Chad

    CBFalconer Guest

    wrote:
    > CBFalconer <> wrote:
    >> wrote:
    >>> Chris Torek <> wrote:

    >>
    >>> <snip>
    >>>
    >>>> (It is possible, but somewhat difficult, to "read a line" with
    >>>> the scanf family. The code to do this is somewhat obscure. It
    >>>> does appear here in comp.lang.c now and then.)
    >>>
    >>> I'd say it's impossible in robust code; the stream can have
    >>> unwanted embedded null bytes, which scanf will happily read.

    >>
    >> So? A null byte is not a digit, nor a period, so it will normally
    >> be treated as marking the end of a numeric field.

    >
    > What are you talking about?


    The handling of null (i.e. '\0') bytes by the scanf family.

    --
    [mail]: Chuck F (cbfalconer at maineline dot net)
    [page]: <http://cbfalconer.home.att.net>
    Try the download section.
     
    CBFalconer, Aug 4, 2008
    #15
