Re: sscanf(): weird behavior?

Discussion in 'C Programming' started by Chris Torek, Jun 26, 2003.


    In article <>
    Sidney Cadot <> writes:
    >Thanks for your answers. The standard (I'm talking about N869 here
    >which is the closest thing I have) talks about a "string" argument
    >which, I assume, implies zero-termination in its terminology.


    Yes, this is where both the ability to crash, and the slowness you
    have experienced, come from.

    >N869 says [sscanf] is "equivalent" to fscanf. Now one could read
    >that to imply:
    >
    >(1) it has the same Big-Oh complexity as fscanf;


    The C Standards *never* address performance.

    >(2) If I use the format string "X" and the contents start with "Y",
    >the fscanf spec says the "Y" character and subsequent characters will
    >remain "unread". For the sscanf() I would assume this translates to
    >"not read-accessed".


    Although the scanf engine itself is not going to access them, the
    "string" wording at the front gives sscanf() license to read through
    the argument looking for the '\0' byte -- and indeed, real
    implementations do precisely that.

    >As to (mild) "abuse" of sscanf in my application, I beg to differ. The
    >400 MB behemoths I'm reading are mmap()'d read-only files in my
    >application in a binary format, which happen to contain ASCII-encoded
    >numbers here and there.


    I did not call it "abuse" myself; I merely pointed out that the C
    library is generally optimized for typical uses, and yours is
    nowhere near typical. Yours will probably also remain atypical as
    long as the C library is so slow at it, creating a certain
    chicken-and-egg problem. :)

    >It can happen that I have something like XXXXXYYYYYYYY where XXXXX is
    >a 5-digit decimal number and YYYYY is binary data possibly containing
    >digits. sscanf(BUF, "%5d") would be ideal for the job, if not for the
    >\0 restriction, and time complexity behavior.
    >
    >I'm sure I can find a way around this ...


    The easiest is, I think, to memcpy() the desired portion into a
    valid (and short :) ) C string, then apply strtol() on it. Why
    strtol()? Because sscanf() has to parse a format string, find the
    "%d" directive, copy[%] the number from the string-stream into a
    suitable -- i.e., '\0'-terminated -- buffer, and call strtol() or
    its equivalent anyway -- so if you already know that the input is
    supposed to be a number, you can avoid a lot of work.
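
    A minimal sketch of that memcpy()-then-strtol() approach follows.
    The helper name field_to_long() and the 32-byte scratch buffer are
    my own assumptions for illustration, not anything from the C
    library or from the code under discussion. It copies a fixed-width
    field into a short, writable, '\0'-terminated buffer and insists
    that strtol() consume the whole field.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert exactly `width` bytes at `src` to a long.
 * `src` need not be '\0'-terminated and may point into read-only
 * (e.g. mmap()'d) memory, since only the local copy is modified.
 * Returns 0 and stores the value in *out on success; returns -1 if the
 * field is too wide, is not entirely a decimal number, or is out of range.
 */
int field_to_long(const char *src, size_t width, long *out)
{
    char tmp[32];               /* short, writable, terminated copy */
    char *end;
    long val;

    assert(src != NULL && out != NULL);
    if (width == 0 || width >= sizeof tmp)
        return -1;

    memcpy(tmp, src, width);
    tmp[width] = '\0';

    errno = 0;
    val = strtol(tmp, &end, 10);
    if (end != tmp + width || errno == ERANGE)
        return -1;              /* no digits, junk in field, or overflow */

    *out = val;
    return 0;
}
```

    With the XXXXXYYYYYYYY layout above, one would call something like
    field_to_long(buf, 5, &n); only the five copied bytes are ever
    examined, no matter what follows them. Note that strtol() still
    accepts leading white space and a sign inside the field; reject
    those separately if the format forbids them.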

    A particularly smart C compiler could see that a "%d" directive
    (one without a count, that is) will just invoke strtol() and optimize
    out the sscanf() step, but by doing this manually, you avoid the
    need for a particularly smart C compiler.

    [%Footnote: This copy step is not required for string-streams if
    the implementor is willing to duplicate the work that strtol()
    would perform, or have both the scanf engine and strtol() call some
    subsidiary function. But because scanf formats can have field
    widths -- as in the %5d directive in this very example -- and
    strtol() does not have such limits, and because a string-stream's
    backing memory may be read-only so that it is not always possible
    (much less advisable) to punch a '\0' byte in, scanf cannot blindly
    call strtol(), which might in this case read more than the desired
    five digits. Moreover, for input that comes from unbuffered FILE
    streams, if the scanf engine is going to use strtol() at all, it
    *does* have to copy the characters to a temporary buffer. The
    scanf() engine could do the strtol() work "in line", as it were,
    one incoming digit at a time, but only at the cost of considerably
    more code and the inability to share so much of the five integral
    conversions (%d, %i, %o, %u, and %x) in all their variants (%hh, %h,
    none, %l, and %ll modifiers). (The ll modifiers need to use
    strtoll() of course, and the unsigned variants need strtoul() or
    strtoull(), but by factoring, we wind up with far less code inside
    the scanf engine, which is already enough of a maintenance nightmare
    as it is :) .)]
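
    For comparison, the "in line" alternative mentioned in the footnote
    might look roughly like the sketch below. The function name
    scan_digits() is hypothetical, and the sketch handles only unsigned
    decimal digits (no sign, white space, or other bases), which is
    exactly the part that grows once every conversion and modifier
    needs its own variant. It never writes to the input and never reads
    past the field width, so it needs no '\0' at all.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical sketch: accumulate at most `width` decimal digits from
 * `src`, one character at a time, without copying and without needing
 * a terminator.  Returns the number of bytes consumed and stores the
 * value in *out; returns 0 (leaving *out untouched) if the field does
 * not start with a digit or the value would overflow a long.
 */
size_t scan_digits(const char *src, size_t width, long *out)
{
    long val = 0;
    size_t i;

    assert(src != NULL && out != NULL);
    for (i = 0; i < width; i++) {
        int d;

        if (src[i] < '0' || src[i] > '9')
            break;                      /* first non-digit ends the field */
        d = src[i] - '0';
        if (val > (LONG_MAX - d) / 10)
            return 0;                   /* val * 10 + d would overflow */
        val = val * 10 + d;
    }
    if (i > 0)
        *out = val;
    return i;
}
```

    The return value plays the role of the "characters consumed" count
    that a scanf engine must track anyway, so the caller can resume
    reading at src + i.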
    --
    In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://67.40.109.61/torek/index.html (for the moment)
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, Jun 26, 2003