On Apr 29, 3:32 pm, "kwikius"
I used to agree, but some time ago someone "politely suggested"
using a formal parser rather than writing parsers by hand, and
now I am completely converted. Parser generators verify the
grammar that is presented to them and point out ambiguities
that a hand-written parser would never spot (I have written
various parsers by hand). They are also easier for others to
understand.
I think it depends a lot on the grammar. I regularly use flex
for smaller things. In general, if the grammar isn't too
complex, a parser generator may be simpler (and if you define a
grammar yourself, you should definitely strive to make it not
too complex). In practice, however, most real programming
languages have very complex grammars (C++ is probably one of the
worst), and hand-written parsers can usually give better error
messages, handle error recovery more gracefully, and make it
easier to "cheat" a bit when necessary to make things work. (I
suspect, for example, that most C++ compilers use some sort of
backtracking in cases where it isn't clear from the initial
sequence whether you're dealing with a declaration or an
expression.)
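To make the backtracking "cheat" concrete, here is a minimal sketch of a hand-written recursive-descent parser that tries one reading, rewinds, and tries another. The grammar, the `StatementParser` class and its space-separated token format are all hypothetical and far simpler than C++'s; only the "try a declaration first, then rewind and try an expression" idea comes from the text above. Preferring the declaration roughly mirrors C++'s rule that a statement which can be read as a declaration is one.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy grammar (not C++'s):
//   declaration: TYPE [ '*' ] NAME ';'
//   expression:  NAME { ( '*' | '+' ) NAME } ';'
// A TYPE is also a NAME, so a statement starting with a type is ambiguous
// until we have seen more tokens.
class StatementParser
{
public:
    StatementParser(std::string const& text, std::set<std::string> types)
        : myTypes(std::move(types))
    {
        std::istringstream in(text);              // tokens are space-separated
        for (std::string tok; in >> tok; ) {
            myTokens.push_back(tok);
        }
    }

    // Try a declaration first; on failure, rewind and try an expression.
    std::string classify()
    {
        std::size_t const start = myPos;
        if (declaration()) {
            return "declaration";
        }
        myPos = start;                            // backtrack
        if (expression()) {
            return "expression";
        }
        return "error";
    }

private:
    std::set<std::string> myTypes;                // names known to be types
    std::vector<std::string> myTokens;
    std::size_t myPos = 0;

    bool atEnd() const { return myPos == myTokens.size(); }

    // Consume `tok` if it is the next token.
    bool accept(std::string const& tok)
    {
        if (!atEnd() && myTokens[myPos] == tok) { ++myPos; return true; }
        return false;
    }

    // Any token starting with a letter or '_' (type names included).
    bool name()
    {
        if (atEnd()) return false;
        char const c = myTokens[myPos][0];
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            ++myPos;
            return true;
        }
        return false;
    }

    bool typeName()
    {
        if (!atEnd() && myTypes.count(myTokens[myPos]) != 0) { ++myPos; return true; }
        return false;
    }

    bool declaration()
    {
        if (!typeName()) return false;
        accept("*");                              // optional pointer declarator
        return name() && accept(";") && atEnd();
    }

    bool expression()
    {
        if (!name()) return false;
        while (accept("*") || accept("+")) {
            if (!name()) return false;
        }
        return accept(";") && atEnd();
    }
};
```

With `T` registered as a type, `"T * p ;"` classifies as a declaration, while `"T + p ;"` consumes `T` as a type, fails at `+`, rewinds and succeeds as an expression. The set of type names has to be passed in because, as in C++, the parse depends on knowing which names denote types.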
As for "easier for others to understand", it obviously depends
on which "others". I've been hassled for using flex because
some of the "others" aren't familiar with the tool, and don't
feel at home with anything more complex than recursive descent.
Also, Bjarne Stroustrup himself says that the C++ grammar is
"absurd". See:
page 38, column 2, halfway down, the paragraph starting "However,
tools and environments...".
Yes. C++ is one of the most difficult languages to parse.
I'm certainly no expert on regular expressions, but AFAIK you can't
abstract part of a regular expression into a production (e.g.
"integer" in my example above), so you end up with a long
expression that is difficult to read and verify (which is hard
work). If you could have productions... I think you'd have a
parser grammar. But as I say, I am no expert, and I'm sure
someone will correct me if I'm wrong about that.
The grammar that he's parsing is regular, so you don't need
anything more complicated than a regular expression. And the
regular expression matchers I know (e.g. my own or Boost's) all
take the pattern as a string, and a string can be built up from
named pieces. So you would start with something like:
std::string const integer( "\\d+" ) ;
and build up the final expression as a string. For the original
problem, you might end up with something like:
#include <boost/regex.hpp>
#include <string>

// Hypothetical wrapper function, so that the fragment compiles.
boost::regex
makeLinePattern()
{
    std::string const integer( "\\d+" ) ;
    std::string const spaces( "\\s+" ) ;
    std::string const time(
        integer + ":" + integer + ":" + integer + "\\." +
        integer ) ;
    std::string const ipAddress(
        integer + "\\." + integer
                + "\\." + integer
                + "\\." + integer ) ;
    std::string const fullAddress(
        ipAddress + "\\." + integer ) ;
                        // Or should this use a "/" as a
                        // separator?
    std::string const protocol( "\\l+" ) ;
                        // \l is a lower case letter in Boost's
                        // Perl syntax; or "\\S+" ?
    std::string const line( time
        + spaces + fullAddress
        + spaces + fullAddress
        + spaces + protocol
        + spaces + integer ) ;
    return boost::regex( line ) ;
}
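For anyone without Boost to hand, here is a sketch of the same divide-and-conquer composition with C++11's `std::regex` swapped in for `boost::regex`. Assumptions: the default ECMAScript grammar of `std::regex` has no `\l` class, so `protocol` is written as `[a-z]+` (lower case letters only); the names `buildLinePattern` and `matchesLine` are hypothetical; and the sample input below is invented, since the original problem's data isn't quoted here.

```cpp
#include <regex>
#include <string>

// Build the full pattern from named sub-patterns, one "production" each.
std::string buildLinePattern()
{
    std::string const integer("\\d+");
    std::string const spaces("\\s+");
    std::string const timestamp(
        integer + ":" + integer + ":" + integer + "\\." + integer);
    std::string const ipAddress(
        integer + "\\." + integer + "\\." + integer + "\\." + integer);
    std::string const fullAddress(ipAddress + "\\." + integer);
    std::string const protocol("[a-z]+");   // assumed: lower case name
    return timestamp + spaces + fullAddress + spaces + fullAddress
         + spaces + protocol + spaces + integer;
}

// True if the whole of `line` matches the composed pattern.
bool matchesLine(std::string const& line)
{
    static std::regex const pattern(buildLinePattern());
    return std::regex_match(line, pattern);
}
```

For example, `matchesLine("12:34:56.789 192.168.0.1.80 10.0.0.2.443 tcp 1500")` is true, while the same line without the `.789` fraction is rejected. Each named piece can also be tested on its own, which addresses the readability and verification concern raised above.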
As usual: divide and conquer. (Note that if you're not afraid
of a few local macros, the fact that C++ concatenates adjacent
string literals means that you can actually do all of this at
compile time, replacing the std::string const with #define, and
dropping the +'s.)
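A sketch of that macro variant, assuming hypothetical macro names: each macro expands to one or more string literals, so after preprocessing the initializer is a run of adjacent literals that the compiler joins into a single literal. `[a-z]+` again stands in for `\l+` so the pattern also works with `std::regex`, and the macros are `#undef`'d afterwards to keep them local. Only the string is assembled at compile time; the regex itself is still compiled when first used.

```cpp
#include <regex>
#include <string>

// Hypothetical local macros; each expands to string literals only.
#define INTEGER      "\\d+"
#define SPACES       "\\s+"
#define TIMESTAMP    INTEGER ":" INTEGER ":" INTEGER "\\." INTEGER
#define IP_ADDRESS   INTEGER "\\." INTEGER "\\." INTEGER "\\." INTEGER
#define FULL_ADDRESS IP_ADDRESS "\\." INTEGER
#define PROTOCOL     "[a-z]+"

// Adjacent literals are concatenated by the compiler: no +, no std::string.
char const linePattern[] =
    TIMESTAMP SPACES FULL_ADDRESS SPACES FULL_ADDRESS
    SPACES PROTOCOL SPACES INTEGER;

// Keep the macros local.
#undef PROTOCOL
#undef FULL_ADDRESS
#undef IP_ADDRESS
#undef TIMESTAMP
#undef SPACES
#undef INTEGER

// True if the whole of `line` matches the literal-built pattern.
bool matchesLineLiteral(std::string const& line)
{
    static std::regex const pattern(linePattern);
    return std::regex_match(line, pattern);
}
```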