Parsing - is this a sensible idea?

Discussion in 'C++' started by gw7rib@aol.com, Nov 16, 2008.

  1. Guest

    I have a program that needs to do a small amount of relatively simple
    parsing. The routines I've written work fine, but the code using them
    is a bit long-winded.

    I therefore had the idea of creating a class to do parsing. It could
    be used as follows:

    int a, n, x, y;
    Parser par;
    par << string;
    if (par >> "From" >> ' ' >> x >> ' ' >> "to" >> ' ' >> y) a = 1;
    else if (par >> "Number" >> ' ' >> n) a = 2;
    else a = 3;

    Then if string is "From 3 to 5" this will set a=1, x=3, y=5. If the
    string is "Number 2" this will set a=2 and n=2. If string is
    "Other" then a=3. For convenience, I'll assume that an input of "From
    4 other" is allowed to alter the value of x while returning a=3.

    I think I could write a class that would do this. It would need to
    keep track of whether the current parsing was succeeding and, if so,
    how far through the string it had got. It would need overloaded >>
    operators, obviously, some of them taking references. And it would
    need a conversion operator, which I think would need to be to void *,
    which would not only return whether the current parse had succeeded
    but would also reset the flag and counter ready for another attempt.

    So my questions are, is this a sensible thing to try to do, and are
    there any potential snags that I haven't spotted?

    Thanks.
    Paul.
    , Nov 16, 2008
    #1

  2. On 2008-11-16 22:16, wrote:
    > I have a program that needs to do a small amount of relatively simple
    > parsing. The routines I've written work fine, but the code using them
    > is a bit long-winded.
    >
    > I therefore had the idea of creating a class to do parsing. It could
    > be used as follows:
    >
    > int a, n, x, y;
    > Parser par;
    > par << string;
    > if (par >> "From" >> ' ' >> x >> ' ' >> "to" >> ' ' >> y) a = 1;
    > else if (par >> "Number" >> ' ' >> n) a = 2;
    > else a = 3;
    >
    > Then if string is "From 3 to 5" this will set a=1, x=3, y=5. If the
    > string is "Number 2" this will set a=2 and n=2. If string is
    > "Other" then a=3. For convenience, I'll assume that an input of "From
    > 4 other" is allowed to alter the value of x while returning a=3.
    >
    > I think I could write a class that would do this. It would need to
    > keep track of whether the current parsing was succeeding and, if so,
    > how far through the string it had got. It would need overloaded >>
    > operators, obviously, some of them taking references. And it would
    > need a conversion operator, which I think would need to be to void *,
    > which would not only return whether the current parse had succeeded
    > but would also reset the flag and counter ready for another attempt.
    >
    > So my questions are, is this a sensible thing to try to do, and are
    > there any potential snags that I haven't spotted?


    If you need to parse a lot you should probably try a tool like yacc or
    some other parser generator. If you only need to be able to parse a very
    small grammar (and want a good exercise) you can try to write the state
    machine by hand.

    Your example looks like a run-time construct (though perhaps you can make
    it compile-time with some fancy template metaprogramming), which does
    not sound like a good idea to me.
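
    As a rough illustration of the hand-written state machine option, here
    is a minimal sketch for lines of the form "Number <n>" only. The
    function name, the State enum and the choice to allow zero or more
    whitespace characters after the keyword are assumptions made for this
    example, not anything from the original program.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical states for recognising "Number <n>".
enum class State { Keyword, Space, Digits, Fail };

// Returns true and stores the value in n if line is "Number", optional
// whitespace, then one or more decimal digits. Overflow is not checked.
bool parse_number_line(const std::string& line, int& n) {
    static const std::string keyword = "Number";
    State state = State::Keyword;
    std::size_t matched = 0;  // characters of the keyword matched so far
    int value = 0;

    for (char c : line) {
        const unsigned char uc = static_cast<unsigned char>(c);
        switch (state) {
        case State::Keyword:
            if (c != keyword[matched]) state = State::Fail;
            else if (++matched == keyword.size()) state = State::Space;
            break;
        case State::Space:
            if (std::isspace(uc)) break;  // stay in this state
            if (std::isdigit(uc)) { value = c - '0'; state = State::Digits; }
            else state = State::Fail;
            break;
        case State::Digits:
            if (std::isdigit(uc)) value = value * 10 + (c - '0');
            else state = State::Fail;
            break;
        case State::Fail:
            break;
        }
        if (state == State::Fail) return false;
    }
    if (state != State::Digits) return false;  // input ended too early
    n = value;
    return true;
}
```

    Every extra line format needs its own states and transitions, which is
    why this approach only pays off for a very small grammar.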

    --
    Erik Wikström
    Erik Wikström, Nov 16, 2008
    #2

  3. Guest

    On 16 Nov, 21:42, Erik Wikström wrote:
    > On 2008-11-16 22:16, wrote:
    > > I have a program that needs to do a small amount of relatively simple
    > > parsing. The routines I've written work fine, but the code using them
    > > is a bit long-winded.

    >
    > > I therefore had the idea of creating a class to do parsing. It could
    > > be used as follows:

    >
    > > int a, n, x, y;
    > > Parser par;
    > > par << string;
    > > if (par >> "From" >> ' ' >> x >> ' ' >> "to" >> ' ' >> y) a = 1;
    > > else if (par >> "Number" >> ' ' >> n) a = 2;
    > > else a = 3;

    >
    > > Then if string is "From 3 to 5" this will set a=1, x=3, y=5. If the
    > > string is "Number 2" this will set a=2 and n=2. If string is
    > > "Other" then a=3. For convenience, I'll assume that an input of "From
    > > 4 other" is allowed to alter the value of x while returning a=3.

    >
    > > I think I could write a class that would do this. It would need to
    > > keep track of whether the current parsing was succeeding and, if so,
    > > how far through the string it had got. It would need overloaded >>
    > > operators, obviously, some of them taking references. And it would
    > > need a conversion operator, which I think would need to be to void *,
    > > which would not only return whether the current parse had succeeded
    > > but would also reset the flag and counter ready for another attempt.

    >
    > > So my questions are, is this a sensible thing to try to do, and are
    > > there any potential snags that I haven't spotted?

    >
    > If you need to parse a lot you should probably try a tool like yacc or
    > some other parser-generator. If you only need to be able to parse a very
    > small grammar (and want a good exercise) you can try to write the state-
    > machine by hand.


    I don't think I'm going to be doing that much parsing, though I'll
    bear that in mind if I do.

    > You example looks like a runtime-construct (though, perhaps you can make
    > it compile-time with some fancy template meta-programming) which does
    > not sound like a good idea to me.


    How my example works: par >> "text" checks whether the next part of
    the string being parsed starts with the characters "text". par >> n
    checks whether the next part of the string is a number and, if so,
    sets n to that number. par >> ' ' skips whitespace. The routine
    doesn't build up a "template" of what the string is supposed to look
    like; it just checks each part in turn, as I would have thought any
    parser needs to.
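
    A minimal sketch of a class with these semantics might look like the
    following. It is an illustration under assumptions, not the actual
    code: the member names are invented, numbers are taken to be unsigned
    decimal with no overflow check, and explicit operator bool (C++11) is
    used where 2008-era code would have used operator void*.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical Parser: mutable position and success flag, reset on test.
class Parser {
public:
    // par << text: load a new string and start from the beginning.
    Parser& operator<<(const std::string& text) {
        text_ = text;
        pos_ = 0;
        ok_ = true;
        return *this;
    }

    // par >> "literal": succeed only if the literal is at the current position.
    Parser& operator>>(const char* literal) {
        if (!ok_) return *this;
        const std::size_t len = std::strlen(literal);
        if (text_.compare(pos_, len, literal) == 0) pos_ += len;
        else ok_ = false;
        return *this;
    }

    // par >> ' ': skip any run of whitespace (assumed here never to fail).
    Parser& operator>>(char) {
        while (ok_ && pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        return *this;
    }

    // par >> n: read one or more decimal digits into n.
    Parser& operator>>(int& n) {
        if (!ok_) return *this;
        std::size_t p = pos_;
        int value = 0;
        while (p < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[p]))) {
            value = value * 10 + (text_[p] - '0');
            ++p;
        }
        if (p == pos_) ok_ = false;      // no digits found
        else { n = value; pos_ = p; }
        return *this;
    }

    // Report the result of the chain, then rewind for the next attempt.
    explicit operator bool() {
        const bool result = ok_;
        pos_ = 0;
        ok_ = true;
        return result;
    }

private:
    std::string text_;
    std::size_t pos_ = 0;
    bool ok_ = true;
};
```

    With this sketch the if / else if chain from the first post compiles
    unchanged, and "From 4 other" sets x to 4 before the "to" check fails.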

    Thanks for any further thoughts.
    Paul.
    , Nov 16, 2008
    #3
  4. Joe Smith Guest

    Paul wrote:
    >How my example works - par >> "text" will check to see whether the
    >next bit of the string to be parsed contains the characters "text".
    >par >> n will check to see if the next bit of the string is a number,
    >and if so, set n to that number. par >> ' ' will skip whitespace. The
    >routine doesn't build up a "template" of what the string is supposed
    >to look like, it just checks each bit of it in turn, as I would have
    >thought any parser needs to.


    It is definitely possible.

    The only part of your design that sticks out as really weird is the
    side effects of the conversion operator. I would prefer to have the
    operator>> overloads return copies of the original with the changed
    member variables. If you use a reference-counting smart pointer for
    the string, your class would be no larger than 4 integers on most
    platforms (one for the pointer, one for its reference count, one for
    the position and less than 1 for the flag). The cost of copying four
    integers is not terrible. If all the lines you want to parse are
    fairly short, as in your examples, you won't be making too many
    copies. This is likely a reasonable tradeoff for avoiding the magic
    in the operator void*().

    In general, though, returning copies is not scalable. On the other
    hand, your design has limited scalability too, as advanced parsing
    requires more sophisticated techniques. But considering your
    examples, it sounds like you don't need a powerful parser, just
    something to parse simple strings, so all this might be just fine
    for you.
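
    A rough sketch of the copy-returning variant, to show the shape of the
    idea rather than a finished design. The class name ValueParser and its
    members are invented for this example, std::shared_ptr stands in for
    the reference-counting smart pointer, and explicit operator bool
    (C++11) replaces operator void*().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical immutable parser: every operator>> returns a modified copy.
class ValueParser {
public:
    explicit ValueParser(const std::string& text)
        : text_(std::make_shared<const std::string>(text)) {}

    // Match a literal at the current position.
    ValueParser operator>>(const char* literal) const {
        ValueParser next(*this);
        if (!next.ok_) return next;
        const std::size_t len = std::strlen(literal);
        if (text_->compare(pos_, len, literal) == 0) next.pos_ += len;
        else next.ok_ = false;
        return next;
    }

    // Skip any run of whitespace (assumed never to fail).
    ValueParser operator>>(char) const {
        ValueParser next(*this);
        while (next.ok_ && next.pos_ < text_->size() &&
               std::isspace(static_cast<unsigned char>((*text_)[next.pos_])))
            ++next.pos_;
        return next;
    }

    // Read one or more decimal digits into n (no overflow check).
    ValueParser operator>>(int& n) const {
        ValueParser next(*this);
        if (!next.ok_) return next;
        std::size_t p = pos_;
        int value = 0;
        while (p < text_->size() &&
               std::isdigit(static_cast<unsigned char>((*text_)[p]))) {
            value = value * 10 + ((*text_)[p] - '0');
            ++p;
        }
        if (p == pos_) next.ok_ = false;
        else { n = value; next.pos_ = p; }
        return next;
    }

    // No side effects: the original object is never changed by a chain.
    explicit operator bool() const { return ok_; }

private:
    std::shared_ptr<const std::string> text_;  // shared, never copied
    std::size_t pos_ = 0;
    bool ok_ = true;
};
```

    Because each chain starts from the same unmodified object, for example
    const ValueParser par("Number 2"), a failed attempt needs no reset
    before the next else if branch.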
    Joe Smith, Nov 17, 2008
    #4
  5. James Kanze Guest

    On Nov 16, 11:09 pm, wrote:
    > On 16 Nov, 21:42, Erik Wikström wrote:
    > > On 2008-11-16 22:16, wrote:
    > > > I have a program that needs to do a small amount of
    > > > relatively simple parsing. The routines I've written work
    > > > fine, but the code using them is a bit long-winded.


    > > > I therefore had the idea of creating a class to do
    > > > parsing. It could be used as follows:


    > > > int a, n, x, y;
    > > > Parser par;
    > > > par << string;
    > > > if (par >> "From" >> ' ' >> x >> ' ' >> "to" >> ' ' >> y) a = 1;
    > > > else if (par >> "Number" >> ' ' >> n) a = 2;
    > > > else a = 3;


    > > > Then if string is "From 3 to 5" this will set a=1, x=3,
    > > > y=5. If the string is "Number 2" this will set a=2 and
    > > > n=2. If string is "Other" then a=3. For convenience, I'll
    > > > assume that an input of "From 4 other" is allowed to alter
    > > > the value of x while returning a=3.


    > > > I think I could write a class that would do this. It would
    > > > need to keep track of whether the current parsing was
    > > > succeeding and, if so, how far through the string it had
    > > > got. It would need overloaded >> operators, obviously,
    > > > some of them taking references. And it would need a
    > > > conversion operator, which I think would need to be to
    > > > void *, which would not only return whether the current
    > > > parse had succeeded but would also reset the flag and
    > > > counter ready for another attempt.


    > > > So my questions are, is this a sensible thing to try to
    > > > do, and are there any potential snags that I haven't
    > > > spotted?


    > > If you need to parse a lot you should probably try a tool
    > > like yacc or some other parser-generator. If you only need
    > > to be able to parse a very small grammar (and want a good
    > > exercise) you can try to write the state machine by hand.


    > I don't think I'm going to be doing that much parsing, though
    > I'll bear that in mind if i do.


    > > You example looks like a runtime-construct (though, perhaps
    > > you can make it compile-time with some fancy template
    > > meta-programming) which does not sound like a good idea to
    > > me.


    > How my example works - par >> "text" will check to see whether
    > the next bit of the string to be parsed contains the
    > characters "text".


    I think that that's what I really don't care for in it. One
    expects >> to read, not to check.

    What's wrong with just using boost::regex?
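
    For the two line formats in the original post, a regex version could
    be sketched as below. It uses std::regex from C++11 as a stand-in for
    boost::regex, which offers the similarly named regex_match and smatch.
    The function name classify and the patterns are assumptions made for
    this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns 1 for "From <x> to <y>", 2 for "Number <n>",
// and 3 for anything else. Outputs are written only on a full match.
int classify(const std::string& line, int& n, int& x, int& y) {
    static const std::regex from_to(R"(From\s+(\d+)\s+to\s+(\d+))");
    static const std::regex number(R"(Number\s+(\d+))");

    std::smatch m;
    if (std::regex_match(line, m, from_to)) {
        x = std::stoi(m[1].str());  // may throw std::out_of_range on huge input
        y = std::stoi(m[2].str());
        return 1;
    }
    if (std::regex_match(line, m, number)) {
        n = std::stoi(m[1].str());
        return 2;
    }
    return 3;
}
```

    Two differences from the operator>> design: regex_match requires the
    whole line to match, so "From 4 other" leaves x untouched, and \s+
    demands at least one whitespace character between tokens.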

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Nov 17, 2008
    #5
  6. Jerry Coffin Guest

    In article <52baba84-3b2a-40fc-b95b-
    >, says...
    > I have a program that needs to do a small amount of relatively simple
    > parsing. The routines I've written work fine, but the code using them
    > is a bit long-winded.
    >
    > I therefore had the idea of creating a class to do parsing. It could
    > be used as follows:


    Depending on what you're doing, I'd consider using a regular expression
    library such as boost::regex, or a template-based parser generator such
    as boost::Spirit 2.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
    Jerry Coffin, Nov 19, 2008
    #6
