sscanf parsing doubt

Discussion in 'C Programming' started by Simone Mehta, Sep 26, 2004.

  1. Simone Mehta

    Simone Mehta Guest

    hi All,
    I am parsing a CSV file.
    I want to read every row into a char array of reasonable size and then
    extract strings from it.
    <snippet>
    char foo[128]="hello,world,bye,bye,world";
    .....
    sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
    <snippet/>
    This is giving me junk.
    I understand it is not finding '\0' to scan (%s) strings,
    but then I cannot use %c either.
    I think I can use something like "%64c%*[,]%64c".
    Please enlighten me as to the algorithm to use here. Am I doing it the
    right way?

    Thanks In Advance,
    Simone Mehta.





    --
    live life Queen Size.
     
    Simone Mehta, Sep 26, 2004
    #1
  2. Michael Mair

    Michael Mair Guest

    Hi,

    > <snippet>
    > char foo[128]="hello,world,bye,bye,world";
    > ....
    > sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
    > <snippet/>
    > This is giving me junk .
    > I understand it is not finding '\0' to scan (%s) strings.


    Nope. It gives you junk because %s spans from white space to
    white space. Commas are not white spaces, so s1 gets it all.

    Check the return value of sscanf(); it tells you how many
    input items were actually converted and stored.

    Use a scanset: for example, you can scan for "%[^, \t]",
    which stops at the first comma, blank or tab.


    > but then I cannot use %c also .
    > I think i can use like "%64c%*[,]%64c" .


    No. The c conversion specifier will not give you strings
    but character arrays without a terminating '\0', which can be
    nasty to handle. Apart from that, the problem of the comma being
    gobbled by %64c still persists.


    That said, using a field width for reading in the
    strings to be stored in s1 through s5 is a Good Idea.
    If a string before the last item was too long, the return value
    of sscanf will tell you. For the last item, look up
    Pop's Device here in the newsgroup to see how to get
    rid of the rest of the line.


    Cheers
    Michael


    #include <stdio.h>
    #include <stdlib.h>


    #define MAXITEMLEN 32

    #define STRINGIZE(s) #s
    #define XSTR(s) STRINGIZE(s)

    #define DONTSCAN ", \t"
    #define ITEMFORMAT "[^" DONTSCAN "]"
    #define MAXITEMFORMAT XSTR(MAXITEMLEN) ITEMFORMAT

    #define ONEITEM "%" MAXITEMFORMAT
    #define SEP "%*[" DONTSCAN "]"

    int main (void)
    {
        char foo[128] = "hello,world, bye ,\tbye\t,world";
        /* +1: a field width of MAXITEMLEN stores up to MAXITEMLEN
           characters plus the terminating '\0' */
        char s0[MAXITEMLEN + 1], s1[MAXITEMLEN + 1], s2[MAXITEMLEN + 1];
        char s3[MAXITEMLEN + 1], s4[MAXITEMLEN + 1];
        int rv;

        rv = sscanf(foo, " " ONEITEM SEP ONEITEM SEP ONEITEM SEP
                    ONEITEM SEP ONEITEM, s0, s1, s2, s3, s4);

        /* Print only the items that were converted; the cases
           fall through on purpose. */
        switch (rv) {
        case 5:
            fprintf(stdout,"s4: %s\n",s4);
            /* fall through */
        case 4:
            fprintf(stdout,"s3: %s\n",s3);
            /* fall through */
        case 3:
            fprintf(stdout,"s2: %s\n",s2);
            /* fall through */
        case 2:
            fprintf(stdout,"s1: %s\n",s1);
            /* fall through */
        case 1:
            fprintf(stdout,"s0: %s\n",s0);
            /* fall through */
        default:
            if (rv != 5) {
                fprintf(stderr, "Did not get all items!\n");
                exit(EXIT_FAILURE);
            }
        }


        return 0;
    }
     
    Michael Mair, Sep 26, 2004
    #2
  3. pete

    pete Guest

    Simone Mehta wrote:
    >
    > hi All,
    > I am parsing a CSV file.
    > I want to read every row into a char array of reasonable size and then
    > extract strings from it.
    > <snippet>
    > char foo[128]="hello,world,bye,bye,world";
    > ....
    > sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
    > <snippet/>
    > This is giving me junk .
    > I understand it is not finding '\0' to scan (%s) strings.
    > but then I cannot use %c also .
    > I think i can use like "%64c%*[,]%64c" .
    > Please enlighten me as to the algo to be used here . Am i doing it the
    > right way ?


    I think the simplest way is to read whole lines from the file
    into strings, and then to process the strings in memory.

    /* BEGIN output from new.c */

    helloworldbyebyeworld

    /* END output from new.c */



    /* BEGIN new.c */

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char foo[128] = "hello,world,bye,bye,world";
        char *pointer;

        for (pointer = foo; *pointer != '\0'; ++pointer) {
            if (*pointer == ',') {
                /* shift the tail, including its '\0', one place left */
                memmove(pointer, pointer + 1, strlen(pointer));
            }
        }
        puts("\n/* BEGIN output from new.c */\n");
        puts(foo);
        puts("\n/* END output from new.c */");
        return 0;
    }

    /* END new.c */


    --
    pete
     
    pete, Sep 26, 2004
    #3
  4. Michael Mair

    Michael Mair Guest

    Hi pete,


    it seems to me that you misunderstood the OP's question:

    >> I am parsing a CSV file.
    >>I want to read every row into a char array of reasonable size and then

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    >>extract strings from it.

    ^^^^^^^^^^^^^^^^^^^^^^^^^
    Note: The OP is doing things line by line.
    He wants to set s1 through s5.

    >> [snip! code <snippet> and questions to that]

    >
    > I think the simplest way is to read whole lines from the file
    > into strings, and then to process the strings in memory.


    Which is what the OP does, if I understood him/her correctly.


    > /* BEGIN output from new.c */
    >
    > helloworldbyebyeworld
    >
    > /* END output from new.c */
    >
    >
    > /* BEGIN new.c */
    >
    > #include <stdio.h>
    > #include <string.h>
    >
    > int main(void)
    > {
    > char foo[128] = "hello,world,bye,bye,world";
    > char *pointer;
    >
    > for (pointer = foo; *pointer != '\0'; ++pointer) {
    > if (*pointer == ',') {
    > memmove(pointer, pointer + 1, strlen(pointer));
    > }
    > }
    > puts("\n/* BEGIN output from new.c */\n");
    > puts(foo);
    > puts("\n/* END output from new.c */");
    > return 0;
    > }
    >
    > /* END new.c */


    I would suggest the following modification:

    > #include <stdio.h>
    > #include <string.h>

    #include <assert.h>

    #define MAXNUMENTRIES 5

    > int main(void)
    > {
    > char foo[128] = "hello,world,bye,bye,world";

    char *pointer, *s[MAXNUMENTRIES+1];
    size_t i=0;
    >

    s[i++] = foo;
    > for (pointer = foo; *pointer != '\0'; ++pointer) {
    > if (*pointer == ',') {

    *pointer = '\0';
    s[i++] = pointer+1;
    > }
    > }

    assert(i<=MAXNUMENTRIES);
    s[i] = NULL; /* Signify end of valid entries */
    > puts("\n/* BEGIN output from new.c */\n");

    for (i=0; s[i] != NULL; i++)
        puts(s[i]);
    > puts("\n/* END output from new.c */");
    > return 0;
    > }


    I did not test it, though; just wanted to make clear
    how to do it :)


    Cheers
    Michael
     
    Michael Mair, Sep 26, 2004
    #4
  5. Simone Mehta

    Simone Mehta Guest

    Hi pete,Michael,
    thanks for the useful replies.

    >Michael Mair <-stuttgart.de>
    >
    > it seems to me that you misunderstood the OP's question:


    you are right Michael I want to scan line by line.
    >
    >
    > I would suggest the following modification:
    >
    > > #include <stdio.h>
    > > #include <string.h>

    > #include <assert.h>
    >
    > #define MAXNUMENTRIES 5
    >

    I am able to get the same result using your program, Michael,
    but the reason I want to use sscanf is that
    CSV files also have strings with quotes,
    like "hello",world,"foo",FSM,"comp,lang,c".
    That being the case, I will have to maintain a small FSM for the
    quotes, which can make things difficult.
    So I wanted to train sscanf to identify strings with or without
    quotes.
    But sscanf seems to have a really bad man page, or maybe I am not able to
    understand much from it.
    I would in the above case be interested in
    s1=hello
    s2=world
    s3=foo
    s4=FSM
    s5=comp,lang,c

    Any sscanf URLs or bookmarks that explain a little more would
    be a great help. Google has helped me a lot, but not much on this one
    though...

    TIA,
    Simone Mehta
     
    Simone Mehta, Sep 27, 2004
    #5
  6. Michael Mair

    Michael Mair Guest

    Hi Simone,

    >>I would suggest the following modification:
    >>

    [Modified code, original code from pete]
    >
    > I am able to get the same using your program michael.
    > but need to go for sscanf is that .
    > csv files have strings with quotes also.
    > like "hello",world,"foo",FSM,"comp,lang,c"
    > so this being the case. I will have to maintain a small FSM when it
    > comes to quote which can make things difficult.
    > So i wanted to train sscanf to identify quotes or strings without
    > them.


    Hmmm, considering that, I would advise you to abandon sscanf
    as a solution for the whole line -- you just cannot get that
    in readable code. So, sscanf essentially will give you more
    of a headache than it gains in (seeming) shortness and
    conciseness.

    > but sscanf seems to have a real bad man page or maybe I am not able to
    > understand much from it.

    .....
    > any sscanf URLs/bookmarks any one has, explaining a little more would
    > be a great help. google has helped me a lot but not much on this one
    > though...


    Well, it is not very good, but the man pages at dinkumware.com
    ( http://www.dinkumware.com/refxc.html ) about formatted I/O may
    help you a little bit more. Apart from that: Many people are
    requesting scanf-format help around here, so maybe a google-search
    through comp.lang.c archives can give you a better understanding
    of what is happening.


    > I would in the above case be interested in
    > s1=hello
    > s2=world
    > s3=foo
    > s4=FSM
    > s5=comp,lang,c


    If you know _beforehand_ in which places to expect quotation marks,
    you can easily adjust the format in my example.
    Otherwise, I would just go through the string in the way pete
    has shown. If you encounter a '\"' as the first character after
    a comma (and zero or more white spaces), just search for the closing
    '\"' instead of a terminating ',' and, after finding it, throw away
    everything up to the next ','...
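
    A minimal sketch of the walk described above, splitting the line in
    place. The name split_quoted and its interface are hypothetical, and
    the sketch assumes quotes are never escaped inside a quoted field:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place into at most `max` fields
   and stores a pointer to each field in out[]. Returns the field count. */
size_t split_quoted(char *line, char *out[], size_t max)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && out != NULL);
    while (n < max) {
        p += strspn(p, " \t");          /* skip white space before the field */
        if (*p == '"') {
            char *end = strchr(p + 1, '"');
            if (end == NULL)
                break;                  /* unterminated quote: give up */
            out[n++] = p + 1;           /* field starts after the quote */
            *end = '\0';
            p = strchr(end + 1, ',');   /* throw away everything up to ',' */
            if (p == NULL)
                break;
            ++p;
        } else {
            out[n++] = p;
            p = strchr(p, ',');
            if (p == NULL)
                break;                  /* last field */
            *p++ = '\0';
        }
    }
    return n;
}
```

    Called on a writable copy of the example line, this should yield the
    five fields hello, world, foo, FSM and comp,lang,c.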


    Cheers
    Michael
     
    Michael Mair, Sep 27, 2004
    #6
  7. Dag Viken

    Dag Viken Guest

    "Simone Mehta" <> wrote in message
    news:...
    > hi All,
    > I am parsing a CSV file.
    > I want to read every row into a char array of reasonable size and then
    > extract strings from it.
    > <snippet>
    > char foo[128]="hello,world,bye,bye,world";
    > ....
    > sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
    > <snippet/>
    > This is giving me junk .
    > I understand it is not finding '\0' to scan (%s) strings.
    > but then I cannot use %c also .
    > I think i can use like "%64c%*[,]%64c" .
    > Please enlighten me as to the algo to be used here . Am i doing it the
    > right way ?
    >
    > Thanks In Advance,
    > Simone Mehta.


    You could use
    sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^,]", s1, s2, s3, s4, s5);
    where s1, s2, s3, s4 and s5 all point to string buffers.

    You could also try this:

    char foo[128] = "hello,world,bye,bye,world";
    char* sep = ",";
    char* str;
    int n;
    for (n=0, str=strtok(foo,sep); n++, str!=NULL; str=strtok(NULL,sep))
        printf("%d: %s\n", n, str);

    which gives me the output:
    1: hello
    2: world
    3: bye
    4: bye
    5: world

    Note that strtok will replace the commas with null characters ('\0') in
    foo. Also, avoid strtok in multi-threaded applications since it uses
    static data to preserve context.
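
    One more caveat: strtok treats a run of delimiters as a single
    separator, so an empty field such as the one in "a,,b" silently
    disappears. A minimal sketch of a replacement that keeps empty fields
    and holds no static state; next_field is a hypothetical name, not a
    library function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strtok replacement. The caller owns the cursor, so there
   is no hidden static state. Returns the next field, or NULL when the
   line is exhausted. Modifies the buffer like strtok does. */
char *next_field(char **cursor)
{
    char *start;
    char *comma;

    assert(cursor != NULL);
    start = *cursor;
    if (start == NULL)
        return NULL;            /* previous call returned the last field */
    comma = strchr(start, ',');
    if (comma != NULL) {
        *comma = '\0';          /* terminate this field */
        *cursor = comma + 1;    /* next field starts after the comma */
    } else {
        *cursor = NULL;         /* this was the last field */
    }
    return start;
}
```

    Start with a cursor pointing at the buffer (char *cur = foo;) and call
    next_field(&cur) until it returns NULL.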

    Dag
     
    Dag Viken, Sep 27, 2004
    #7
  8. Dan Pop

    Dan Pop Guest

    In <> (Simone Mehta) writes:

    > I am parsing a CSV file.
    >I want to read every row into a char array of reasonable size and then
    >extract strings from it.
    ><snippet>
    >char foo[128]="hello,world,bye,bye,world";
    >....
    >sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
    ><snippet/>
    >This is giving me junk .


    What else can you expect from your brain dead sscanf call?

    >I understand it is not finding '\0' to scan (%s) strings.


    You appear to be completely clueless about how %s works.

    >but then I cannot use %c also .


    %c is useful only when you know in advance how many characters you want
    to read. And it doesn't store its output as a properly terminated string.

    >I think i can use like "%64c%*[,]%64c" .


    %64c is hardly any better than %s. I'd say it's actually worse...

    >Please enlighten me as to the algo to be used here . Am i doing it the
    >right way ?


    Nope. Which is to be expected, since you have obviously not bothered to
    *carefully* read the specification of the sscanf function. The first rule
    of programming: if you don't know what you're doing, don't do it at all.

    A %s directive starts by skipping white space (if any) and then it
    consumes everything until a white space character or the null character
    terminating the input string is encountered. Your string has no white
    space characters, so the first %s will store the whole string in s1.
    So, %s is useless for your purpose. The right solution is:

    rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);

    The last conversion specification can be %s if your fields cannot contain
    white space. No need for %*[,] unless you want to skip multiple commas,
    which doesn't make much sense (no point in skipping multiple commas if
    you don't know their exact position inside the input string).

    Always check the value of rc, instead of blindly assuming that all 5
    fields were properly extracted from the input string.

    Trivia quiz: why did I use %[^\n] for the last conversion?

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
    Email:
    Currently looking for a job in the European Union
     
    Dan Pop, Sep 27, 2004
    #8
  9. Chris Torek

    Chris Torek Guest

    In article <news:>
    Simone Mehta <> wrote:
    >csv files have strings with quotes also.
    >like "hello",world,"foo",FSM,"comp,lang,c"
    >so this being the case. I will have to maintain a small FSM when it
    >comes to quote
    >which can make things difficult.
    >So i wanted to train sscanf to identify quotes or strings without
    >them. ...


    The scanf engine is less powerful than regular expressions, and
    in this case, is not powerful enough to do what you want.

    Note that even regular expressions -- which *can* match quotes,
    at least in some RE systems -- cannot handle more-general parsing
    tasks, such as matching parentheses. But clearly the scanf engine,
    which does only literal matches without alternation, is not enough
    by itself to handle both quoted and unquoted strings. The closest
    you can get is a sort of "manual alternation" scheme:

    while (there is more to scan) {
        if (this item begins with a double quote) {
            run scanf engine on RE-subset "[^"]+", e.g.:

                ret = sscanf(&buf[offset], "\"%79[^\"]%c%n",
                             dequoted_string, &doublequote_char, &more_offset);
                if (ret != 2) ... handle error ...

            now doublequote_char is " and more_offset says how many
            characters were scanned. Note that this assumes the
            dequoted_string[] array has size 80 or more (%79 above).
        } else {
            run scanf engine on RE-subset [^,]+
        }
    }

    This is still not good enough for "real" CSV files, which allow
    quoting the quote marks (in various ways).
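
    The "manual alternation" scheme above could be sketched roughly as
    follows. The function scan_field, the FIELD_MAX constant and the
    convention of stepping over one trailing comma are assumptions made
    for illustration. Empty fields (quoted or not) are reported as
    errors, because a scanset must match at least one character:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define FIELD_MAX 80    /* matches the %79 widths below */

/* Hypothetical helper: scans one field starting at buf[*offset] and
   advances *offset past the field and one trailing comma.
   Returns 1 on success, 0 on error or at the end of the string. */
int scan_field(const char *buf, size_t *offset, char field[FIELD_MAX])
{
    const char *p;
    int used = 0;

    assert(buf != NULL && offset != NULL && field != NULL);
    p = buf + *offset;
    if (*p == '"') {
        char closing;

        /* quoted item: match the opening quote, read up to the next one */
        if (sscanf(p, "\"%79[^\"]%c%n", field, &closing, &used) != 2)
            return 0;
        if (closing != '"')
            return 0;           /* item longer than 79 characters */
    } else {
        /* unquoted item: read up to the next comma */
        if (sscanf(p, "%79[^,]%n", field, &used) != 1)
            return 0;
    }
    p += used;
    if (*p == ',')
        ++p;                    /* step over the separator */
    *offset = (size_t)(p - buf);
    return 1;
}
```

    Calling it in a loop with an offset that starts at 0 walks the line
    item by item until it returns 0.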

    I recommend writing a real (but ad-hoc) lexer (or finding one, e.g.,
    via google search, and adapting it if needed).
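
    As an illustration only, an ad-hoc lexer step might look like the
    sketch below. It assumes one particular quoting convention (a doubled
    "" inside a quoted field stands for a single quote mark), which is
    just one of the "various ways" mentioned above. The name lex_field
    and its return codes are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer step: copies the next field of *src into dst
   (capacity cap), removing the enclosing quotes and un-doubling "".
   Advances *src past the field and its comma.
   Returns 1 if a field was read, 0 when the line is exhausted,
   -1 on error (field too long or unterminated quote). */
int lex_field(const char **src, char *dst, size_t cap)
{
    const char *p;
    size_t n = 0;
    int quoted = 0;

    assert(src != NULL && dst != NULL && cap > 0);
    p = *src;
    if (p == NULL)
        return 0;                       /* previous call hit end of line */
    if (*p == '"') {
        quoted = 1;
        ++p;
    }
    for (;;) {
        char c = *p;

        if (quoted) {
            if (c == '\0')
                return -1;              /* unterminated quoted field */
            if (c == '"' && p[1] == '"') {
                p += 2;                 /* "" stands for one literal quote */
            } else if (c == '"') {
                ++p;                    /* closing quote */
                quoted = 0;
                continue;
            } else {
                ++p;
            }
        } else {
            if (c == '\0' || c == ',' || c == '\n')
                break;                  /* end of field */
            ++p;
        }
        if (n + 1 >= cap)
            return -1;                  /* field does not fit in dst */
        dst[n++] = c;
    }
    dst[n] = '\0';
    *src = (*p == ',') ? p + 1 : NULL;  /* NULL marks the end of the line */
    return 1;
}
```

    Unlike the scanset approach, this accepts empty fields and commas
    inside quotes, at the cost of writing the state handling by hand.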
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, Sep 27, 2004
    #9
  10. Ravi Uday

    Ravi Uday Guest

    "Dan Pop" <> wrote in message
    news:cj9fkr$qvh$...
    > In <>

    (Simone Mehta) writes:
    >
    > > I am parsing a CSV file.
    > >I want to read every row into a char array of reasonable size and then
    > >extract strings from it.
    > ><snippet>
    > >char foo[128]="hello,world,bye,bye,world";
    > >....
    > >sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
    > ><snippet/>
    > >This is giving me junk .

    >
    > What else can you expect from your brain dead sscanf call?
    >
    > >I understand it is not finding '\0' to scan (%s) strings.

    >
    > You appear to be completely clueless about how %s works.
    >
    > >but then I cannot use %c also .

    >
    > %c is useful only when you know in advance how many characters you want
    > to read. And it doesn't store its output as a properly terminated string.
    >
    > >I think i can use like "%64c%*[,]%64c" .

    >
    > %64c is hardly any better than %s. I'd say it's actually worse...
    >
    > >Please enlighten me as to the algo to be used here . Am i doing it the
    > >right way ?

    >
    > Nope. Which is to be expected, since you have obviously not bothered to
    > *carefully* read the specification of the sscanf function. The first rule
    > of programming: if you don't know what you're doing, don't do it at all.
    >
    > A %s directive starts by skipping white space (if any) and then it
    > consumes everything until a white space character or the null character
    > terminating the input string are encountered. Your string has no white
    > space characters, so the first %s will store the whole string in s1.
    > So, %s is useless for your purpose. The right solution is:
    >
    > rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);
    >
    > The last conversion specification can be %s if your fields cannot contain
    > white space. No need for %*[,] unless you want to skip multiple commas,
    > which doesn't make much sense (no point in skipping multiple commas if
    > you don't know their exact position inside the input string).
    >
    > Always check the value of rc, instead of blindly assuming that all 5
    > fields were properly extracted from the input string.
    >
    > Trivia quiz: why did I use %[^\n] for the last conversion?
    >

    Does it serve any purpose? sscanf would stop anyway when it
    encounters the '\0', which is present in the OP's code.

    > Dan
    > --
    > Dan Pop
    > DESY Zeuthen, RZ group
    > Email:
    > Currently looking for a job in the European Union
     
    Ravi Uday, Sep 28, 2004
    #10
  11. Dan Pop

    Dan Pop Guest

    In <1096348626.492309@sj-nntpcache-3> "Ravi Uday" <> writes:


    >"Dan Pop" <> wrote in message
    >news:cj9fkr$qvh$...
    >>
    >> Trivia quiz: why did I use %[^\n] for the last conversion?
    >>

    > Does it serve any purpose ? Because sscanf would terminate anyways if it
    >encounters '\0' which in the OP
    > code is present.


    Try broadening your horizon, beyond the artificial example of the OP.
    In real programs, where do such strings come from?
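
    The hint presumably points at fgets: a line read with fgets normally
    still carries its trailing '\n', and a final %[^,] would store that
    newline in the last field. A minimal sketch under that assumption;
    parse_row, NFIELDS and FIELD_LEN are hypothetical names, and the
    field widths are an addition to the format quoted above:

```c
#include <assert.h>
#include <stdio.h>

#define NFIELDS 5
#define FIELD_LEN 32    /* hypothetical limit: 31 characters plus '\0' */

/* Hypothetical helper: splits one line, as fgets would deliver it,
   into up to five fields. Returns the number of fields stored. */
int parse_row(const char *line, char f[NFIELDS][FIELD_LEN])
{
    int rc;

    assert(line != NULL && f != NULL);
    /* The widths (31) keep sscanf from overrunning the buffers. The last
       conversion stops at '\n', so the newline kept by fgets is not
       stored in the final field. */
    rc = sscanf(line, "%31[^,],%31[^,],%31[^,],%31[^,],%31[^\n]",
                f[0], f[1], f[2], f[3], f[4]);
    return rc < 0 ? 0 : rc;     /* treat EOF (empty input) as no fields */
}
```

    A return value below 5 means the row was short, a field was too long
    for its buffer, or a field was empty (a scanset must match at least
    one character), so the caller should check it before using the
    fields.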

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
    Email:
    Currently looking for a job in the European Union
     
    Dan Pop, Sep 28, 2004
    #11
  12. Moonie

    Moonie Guest

    sscanf( str, "%s%*c%s%*c%s%*c%s%*c%s", would suffice or have you tried
    strtok()
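
    A quick check suggests that the %s%*c pattern does not split on
    commas, for the reason given earlier in the thread: the first %s
    stops only at white space, so it swallows the whole line. The helper
    name below is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns how many items the "%s%*c%s" pattern
   converts for the given line. */
int fields_with_percent_s(const char *line)
{
    char a[128], b[128];

    assert(line != NULL);
    /* widths added so that long input cannot overrun the buffers */
    return sscanf(line, "%127s%*c%127s", a, b);
}
```

    For "hello,world" this converts one item (the whole string), while
    "hello world" gives two, so the pattern only helps when the fields
    are separated by white space.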


    -
    Moonie
    -----------------------------------------------------------------------
    Posted via http://www.codecomments.co
    -----------------------------------------------------------------------
     
    Moonie, Sep 28, 2004
    #12
  13. Aslam Sheikh Durrani

    (Dan Pop) wrote in message news:<cj9fkr$qvh$>...
    > In <> (Simone Mehta) writes:
    >
    > > I am parsing a CSV file.
    > >I want to read every row into a char array of reasonable size and then
    > >extract strings from it.
    > ><snippet>
    > >char foo[128]="hello,world,bye,bye,world";
    > >....
    > >sscanf(foo,"%s%*[,]%s%*[,]%s%*[,]%s%*[,]%s",s1,s2,s3,s4,s5);
    > ><snippet/>
    > >This is giving me junk .

    >
    > What else can you expect from your brain dead sscanf call?
    >
    > >I understand it is not finding '\0' to scan (%s) strings.

    >
    > You appear to be completely clueless about how %s works.

    It appears you are in complete control of the situation. Then pray give
    the right answer, and stop bullying the OP.
    >
    > >but then I cannot use %c also .

    >
    > %c is useful only when you know in advance how many characters you want
    > to read. And it doesn't store its output as a properly terminated string.
    >
    > >I think i can use like "%64c%*[,]%64c" .

    >
    > %64c is hardly any better than %s. I'd say it's actually worse...
    >
    > >Please enlighten me as to the algo to be used here . Am i doing it the
    > >right way ?

    >
    > Nope. Which is to be expected, since you have obviously not bothered to
    > *carefully* read the specification of the sscanf function. The first rule
    > of programming: if you don't know what you're doing, don't do it at all.

    The OP has some confusion; that's why she has turned to the list.
    Don't scare her. I am sure she must have tried the circumflex with
    little success.
    >
    > A %s directive starts by skipping white space (if any) and then it
    > consumes everything until a white space character or the null character
    > terminating the input string are encountered. Your string has no white
    > space characters, so the first %s will store the whole string in s1.
    > So, %s is useless for your purpose. The right solution is:
    >
    > rc = sscanf(foo, "%[^,],%[^,],%[^,],%[^,],%[^\n]", s1, s2, s3, s4, s5);
    >
    > The last conversion specification can be %s if your fields cannot contain
    > white space. No need for %*[,] unless you want to skip multiple commas,
    > which doesn't make much sense (no point in skipping multiple commas if
    > you don't know their exact position inside the input string).
    >
    > Always check the value of rc, instead of blindly assuming that all 5
    > fields were properly extracted from the input string.

    Please stop expecting people to paste complete code here. Some code
    is always left out for clarity.

    >
    > Trivia quiz: why did I use %[^\n] for the last conversion?
    >
    > Dan


    Your signature says you are looking for a job...
    Such arrogance can only prolong the search.
     
    Aslam Sheikh Durrani, Oct 3, 2004
    #13