Parsing - is this a sensible idea?

gw7rib

I have a program that needs to do a small amount of relatively simple
parsing. The routines I've written work fine, but the code using them
is a bit long-winded.

I therefore had the idea of creating a class to do parsing. It could
be used as follows:

int a, n, x, y;
Parser par;
par << string;
if (par >> "From" >> ' ' >> x >> ' ' >> "to" >> ' ' >> y) a = 1;
else if (par >> "Number" >> ' ' >> n) a = 2;
else a = 3;

Then if string is "From 3 to 5" this will set a=1, x=3, y=5. If the
string is "Number 2" this will set a=2 and n=2. If string is
"Other" then a=3. For convenience, I'll assume that an input of "From
4 other" is allowed to alter the value of x while returning a=3.

I think I could write a class that would do this. It would need to
keep track of whether the current parsing was succeeding and, if so,
how far through the string it had got. It would need overloaded >>
operators, obviously, some of them taking references. And it would
need a conversion operator, which I think would need to be to void *,
which would not only return whether the current parse had succeeded
but would also reset the flag and counter ready for another attempt.
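
For readers following along, the class described above might be sketched
roughly as follows. This is only one possible reading of the description,
not the poster's actual code: the names Parser, classify, text_, pos_ and
ok_ are made up for illustration, ' ' is assumed to skip zero or more
whitespace characters, and an explicit operator bool (C++11) stands in
for the conversion to void * mentioned in the post.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Illustrative sketch only: Parser and classify are hypothetical names.
class Parser {
public:
    // Load a new string and start again from the beginning.
    Parser& operator<<(const std::string& s) {
        text_ = s;
        pos_ = 0;
        ok_ = true;
        return *this;
    }

    // Check that the next characters are exactly the given literal.
    Parser& operator>>(const char* literal) {
        if (!ok_) return *this;
        const std::size_t len = std::strlen(literal);
        if (text_.compare(pos_, len, literal) == 0) pos_ += len;
        else ok_ = false;
        return *this;
    }

    // Assumption: ' ' skips a (possibly empty) run of whitespace;
    // any other character must match exactly.
    Parser& operator>>(char c) {
        if (!ok_) return *this;
        if (c == ' ') {
            while (pos_ < text_.size() && is_space(text_[pos_])) ++pos_;
        } else if (pos_ < text_.size() && text_[pos_] == c) {
            ++pos_;
        } else {
            ok_ = false;
        }
        return *this;
    }

    // Read an unsigned decimal number into out (overflow is not checked).
    Parser& operator>>(int& out) {
        if (!ok_) return *this;
        const std::size_t start = pos_;
        int value = 0;
        while (pos_ < text_.size() && is_digit(text_[pos_])) {
            value = value * 10 + (text_[pos_] - '0');
            ++pos_;
        }
        if (pos_ == start) ok_ = false;  // no digits found
        else out = value;
        return *this;
    }

    // Report whether the current attempt succeeded, then reset the flag
    // and position so that the next attempt starts from the beginning.
    explicit operator bool() {
        const bool result = ok_;
        ok_ = true;
        pos_ = 0;
        return result;
    }

private:
    static bool is_space(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    static bool is_digit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

    std::string text_;
    std::size_t pos_ = 0;
    bool ok_ = true;
};

// The if/else chain from the post, wrapped in a function.
int classify(const std::string& s, int& x, int& y, int& n) {
    Parser par;
    par << s;
    if (par >> "From" >> ' ' >> x >> ' ' >> "to" >> ' ' >> y) return 1;
    if (par >> "Number" >> ' ' >> n) return 2;
    return 3;
}
```

One snag this makes visible: because the conversion resets the state,
testing the same parser twice in a row gives different answers, so the
result of each attempt has to be consumed exactly once.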

So my questions are, is this a sensible thing to try to do, and are
there any potential snags that I haven't spotted?

Thanks.
Paul.
 
Erik Wikström

If you need to parse a lot, you should probably try a tool like yacc or
some other parser generator. If you only need to be able to parse a very
small grammar (and want a good exercise), you can try to write the state
machine by hand.

Your example looks like a runtime construct (though perhaps you can make
it compile-time with some fancy template metaprogramming), which does
not sound like a good idea to me.
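
As a rough illustration of the hand-written state machine suggested
above, here is a minimal sketch that recognises only lines of the form
"Number <digits>", one of the two inputs from the original post. The
function name parse_number_line and the State names are invented for
this example, and overflow of the number is not checked.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical hand-written state machine for "Number <digits>".
// Returns true and stores the value in n only if the whole line matches.
bool parse_number_line(const std::string& s, int& n) {
    enum class State { Keyword, NeedSpace, SkipSpace, Digits };
    static const std::string keyword = "Number";

    State state = State::Keyword;
    std::size_t k = 0;  // how much of the keyword has been matched
    int value = 0;

    for (char ch : s) {
        const unsigned char c = static_cast<unsigned char>(ch);
        switch (state) {
        case State::Keyword:
            if (ch != keyword[k]) return false;
            if (++k == keyword.size()) state = State::NeedSpace;
            break;
        case State::NeedSpace:  // at least one space after the keyword
            if (!std::isspace(c)) return false;
            state = State::SkipSpace;
            break;
        case State::SkipSpace:  // further spaces, then the first digit
            if (std::isspace(c)) break;
            if (!std::isdigit(c)) return false;
            value = ch - '0';
            state = State::Digits;
            break;
        case State::Digits:     // remaining digits; anything else fails
            if (!std::isdigit(c)) return false;
            value = value * 10 + (ch - '0');
            break;
        }
    }

    if (state != State::Digits) return false;  // input ended too early
    n = value;
    return true;
}
```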
 
gw7rib

If you need to parse a lot, you should probably try a tool like yacc or
some other parser generator. If you only need to be able to parse a very
small grammar (and want a good exercise), you can try to write the state
machine by hand.

I don't think I'm going to be doing that much parsing, though I'll
bear that in mind if I do.

Your example looks like a runtime construct (though perhaps you can make
it compile-time with some fancy template metaprogramming), which does
not sound like a good idea to me.

How my example works - par >> "text" will check to see whether the
next bit of the string to be parsed contains the characters "text".
par >> n will check to see if the next bit of the string is a number,
and if so, set n to that number. par >> ' ' will skip whitespace. The
routine doesn't build up a "template" of what the string is supposed
to look like, it just checks each bit of it in turn, as I would have
thought any parser needs to.

Thanks for any further thoughts.
Paul.
 
Joe Smith

Paul said:
How my example works - par >> "text" will check to see whether the
next bit of the string to be parsed contains the characters "text".
par >> n will check to see if the next bit of the string is a number,
and if so, set n to that number. par >> ' ' will skip whitespace. The
routine doesn't build up a "template" of what the string is supposed
to look like, it just checks each bit of it in turn, as I would have
thought any parser needs to.

It is definitely possible.

The only part of your design that sticks out as really weird is the
side effects of the conversion operator. I would prefer to have the
operator>> overloads return copies of the original with the changed
member variables. If you use a reference-counting smart pointer for
the string, your class would be no larger than 4 integers on most
platforms (one for the pointer, one for its reference count, one for
the position and less than 1 for the flag). The cost of copying four
integers is not terrible. If all the lines you want to parse are
fairly short, like in your examples, you won't be making too many
copies. This is likely a reasonable tradeoff for avoiding the magic
in the operator void*().

In general, though, returning copies is not scalable. On the other
hand, your design has limited scalability too, as advanced parsing
requires more sophisticated techniques. But considering your
examples, it sounds like you don't need a powerful parser, but
want something to parse simple strings, so all this might be just fine
for you.
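
A minimal sketch of the copy-returning variant described above, to make
the idea concrete. It assumes std::shared_ptr as the reference-counting
smart pointer and uses an explicit operator bool in place of operator
void*(); the names ValueParser and classify_by_value are hypothetical.
Each operator>> leaves the original untouched and returns an updated
copy, so the conversion needs no side effects.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Illustrative sketch only: ValueParser is a hypothetical name.
class ValueParser {
public:
    explicit ValueParser(const std::string& s)
        : text_(std::make_shared<std::string>(s)) {}

    // Match a literal; returns an advanced copy, or a failed copy.
    ValueParser operator>>(const char* literal) const {
        ValueParser next(*this);
        if (!ok_) return next;
        const std::size_t len = std::strlen(literal);
        if (text_->compare(pos_, len, literal) == 0) next.pos_ += len;
        else next.ok_ = false;
        return next;
    }

    // Assumption: any char argument skips a (possibly empty) run of
    // whitespace, as ' ' does in the original example.
    ValueParser operator>>(char) const {
        ValueParser next(*this);
        if (!ok_) return next;
        while (next.pos_ < text_->size() && is_space((*text_)[next.pos_]))
            ++next.pos_;
        return next;
    }

    // Read an unsigned decimal number into out (overflow is not checked).
    ValueParser operator>>(int& out) const {
        ValueParser next(*this);
        if (!ok_) return next;
        int value = 0;
        while (next.pos_ < text_->size() && is_digit((*text_)[next.pos_])) {
            value = value * 10 + ((*text_)[next.pos_] - '0');
            ++next.pos_;
        }
        if (next.pos_ == pos_) next.ok_ = false;  // no digits found
        else out = value;
        return next;
    }

    // Pure query: nothing is reset here.
    explicit operator bool() const { return ok_; }

private:
    static bool is_space(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    static bool is_digit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

    std::shared_ptr<const std::string> text_;  // shared by all copies
    std::size_t pos_ = 0;
    bool ok_ = true;
};

int classify_by_value(const std::string& s, int& x, int& y, int& n) {
    const ValueParser par(s);  // never modified, so every attempt starts at 0
    if (par >> "From" >> ' ' >> x >> ' ' >> "to" >> ' ' >> y) return 1;
    if (par >> "Number" >> ' ' >> n) return 2;
    return 3;
}
```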
 
James Kanze

I don't think I'm going to be doing that much parsing, though
I'll bear that in mind if I do.
How my example works - par >> "text" will check to see whether
the next bit of the string to be parsed contains the
characters "text".

I think that that's what I really don't care for in it. One
expects >> to read, not to check.

What's wrong with just using boost::regex?
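
For comparison, a regular-expression version of the example might look
something like the sketch below. It uses std::regex from the C++11
standard library rather than boost::regex itself (the two interfaces are
close, but this is a substitution, not the Boost API), and the function
name classify_with_regex is invented here. Unlike the proposed Parser,
it requires the whole string to match and leaves x untouched for an
input such as "From 4 other".

```cpp
#include <regex>
#include <string>

// Illustrative sketch using std::regex in place of boost::regex.
int classify_with_regex(const std::string& s, int& x, int& y, int& n) {
    static const std::regex from_re(R"(From\s+(\d+)\s+to\s+(\d+))");
    static const std::regex number_re(R"(Number\s+(\d+))");

    std::smatch m;
    // regex_match succeeds only if the entire string matches the pattern.
    if (std::regex_match(s, m, from_re)) {
        x = std::stoi(m[1].str());  // stoi throws if the number is too large
        y = std::stoi(m[2].str());
        return 1;
    }
    if (std::regex_match(s, m, number_re)) {
        n = std::stoi(m[1].str());
        return 2;
    }
    return 3;
}
```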
 
Jerry Coffin

I have a program that needs to do a small amount of relatively simple
parsing. The routines I've written work fine, but the code using them
is a bit long-winded.

I therefore had the idea of creating a class to do parsing. It could
be used as follows:

Depending on what you're doing, I'd consider using a regular expression
library such as boost::regex, or a template-based parser generator such
as boost::Spirit 2.
 
