YARD : Generic regular expression parser

christopher diggins · Dec 10, 2004

There seem to be a gazillion regular expression libraries. Most of them
only work on text, but I wanted something that also works on arbitrary
sequences of data ( this is useful, for instance, when building parse trees
from token lists ). I think this is possible with the Spirit library from
Boost, but its syntax and complexity are too much for me. I have almost
finished YARD ( Yet Another Recursive Descent parser ), a lightweight,
truly generic regex parser (and it runs like a bat out of hell).

You define rules as follows:

typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'\''> > WordChar_parser;
typedef re_plus<WordChar_parser> Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> >
Ident_parser;
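
For readers wondering what sits behind these typedefs, here is a minimal
sketch of how such combinators could be written. The post does not show
YARD's internals, so everything below is an assumption: the static Accept
convention is taken from the tokenizer further down, while the members of
ParseInputStream beyond AtEnd, GetIndex and GotoNext ( Get, SetIndex ) and
the MatchLength helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stream: a cursor over a character buffer.
struct ParseInputStream {
    const char* mText;
    std::size_t mSize;
    std::size_t mPos;
    ParseInputStream(const char* text, std::size_t size)
        : mText(text), mSize(size), mPos(0) {}
    bool AtEnd() const { return mPos >= mSize; }
    char Get() const { assert(!AtEnd()); return mText[mPos]; }
    void GotoNext() { ++mPos; }
    std::size_t GetIndex() const { return mPos; }
    void SetIndex(std::size_t i) { mPos = i; }
};

// Matches exactly the character C; consumes nothing on failure.
template<char C>
struct Char_parser {
    static bool Accept(ParseInputStream& s) {
        if (s.AtEnd() || s.Get() != C) return false;
        s.GotoNext();
        return true;
    }
};

// Matches one character in the inclusive range [First, Last].
template<char First, char Last>
struct CharRange_parser {
    static bool Accept(ParseInputStream& s) {
        if (s.AtEnd() || s.Get() < First || s.Get() > Last) return false;
        s.GotoNext();
        return true;
    }
};

// Ordered choice: try A, then B. Relies on failed rules not consuming input.
template<typename A, typename B>
struct re_or {
    static bool Accept(ParseInputStream& s) {
        return A::Accept(s) || B::Accept(s);
    }
};

// Sequence: A followed by B; rewinds the stream if either part fails.
template<typename A, typename B>
struct re_and {
    static bool Accept(ParseInputStream& s) {
        std::size_t start = s.GetIndex();
        if (A::Accept(s) && B::Accept(s)) return true;
        s.SetIndex(start);
        return false;
    }
};

// Zero or more repetitions; stops if a repetition consumes nothing.
template<typename R>
struct re_star {
    static bool Accept(ParseInputStream& s) {
        for (;;) {
            std::size_t before = s.GetIndex();
            if (!R::Accept(s) || s.GetIndex() == before) break;
        }
        return true;
    }
};

// One or more repetitions.
template<typename R>
struct re_plus {
    static bool Accept(ParseInputStream& s) {
        return R::Accept(s) && re_star<R>::Accept(s);
    }
};

// The rules from the post, built on the sketch above.
typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'\''> > WordChar_parser;
typedef re_plus<WordChar_parser> Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> >
    Ident_parser;

// Hypothetical helper: characters matched at the start of text, 0 if none.
template<typename Rule>
std::size_t MatchLength(const char* text) {
    ParseInputStream s(text, std::strlen(text));
    return Rule::Accept(s) ? s.GetIndex() : 0;
}
```

With this sketch, MatchLength<Ident_parser>("_foo42 bar") consumes the six
characters of "_foo42", and an input starting with a digit is rejected
because IdentFirstChar_parser fails and re_and rewinds the stream.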

Then you hand them to a tokenizer as follows:

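#include <cstdio>
#include <cstdlib>
#include <fstream>
using namespace std;

int main()
{
    // Placeholder path; the original post does not say where sFileName comes from.
    const char* sFileName = "input.txt";
    int nBufSize = GetFileSize(sFileName);   // helper defined elsewhere
    char* pBuf = static_cast<char*>(calloc(nBufSize, 1));
    ifstream f;
    f.open(sFileName);
    f.read(pBuf, nBufSize);
    f.close();
    Tokenizer<Word_parser> tknzr;
    tknzr.Parse(pBuf, nBufSize);
    OutputTokens(tknzr.Begin(), tknzr.End()); // helper defined elsewhere
    free(pBuf);
    getchar();                                // wait for a key press before exiting
    return 0;
}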

A tokenizer in this case is really simple:

template<typename Parser_T>
struct Tokenizer {
    void Parse(char* pText, int nSize)
    {
        ParseInputStream stream(pText, nSize);
        while (!stream.AtEnd()) {
            // Remember where this attempt starts so the token span can be recorded.
            int index = stream.GetIndex();
            if (Parser_T::Accept(stream)) {
                mTkns.push_back(Token(index, stream.GetIndex()));
            }
            stream.GotoNext();
        }
    }
    TokenIter Begin() { return mTkns.begin(); }
    TokenIter End() { return mTkns.end(); }
private:
    TokenList mTkns;
};
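
The same scanning loop can be sketched for sequences that are not text,
which is the use case mentioned at the top ( rules over token lists ). This
is only an illustration under stated assumptions: SeqStream, Elem_parser,
TokenKind, Sum_parser and FindMatches are hypothetical names, not part of
YARD, and unlike the tokenizer above this version only skips an element
when the rule fails to match.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token kinds, as an earlier lexing pass might produce them.
enum TokenKind { kNumber, kPlus, kIdent };

// Hypothetical stream: a cursor over a vector of any element type.
template<typename Elem>
struct SeqStream {
    const std::vector<Elem>& mData;
    std::size_t mPos;
    explicit SeqStream(const std::vector<Elem>& d) : mData(d), mPos(0) {}
    bool AtEnd() const { return mPos >= mData.size(); }
    const Elem& Get() const { assert(!AtEnd()); return mData[mPos]; }
    void GotoNext() { ++mPos; }
    std::size_t GetIndex() const { return mPos; }
    void SetIndex(std::size_t i) { mPos = i; }
};

// Matches one element equal to the compile-time constant V.
template<typename Elem, Elem V>
struct Elem_parser {
    template<typename Stream>
    static bool Accept(Stream& s) {
        if (s.AtEnd() || s.Get() != V) return false;
        s.GotoNext();
        return true;
    }
};

// Sequence: A then B; rewinds on failure.
template<typename A, typename B>
struct re_and {
    template<typename Stream>
    static bool Accept(Stream& s) {
        std::size_t start = s.GetIndex();
        if (A::Accept(s) && B::Accept(s)) return true;
        s.SetIndex(start);
        return false;
    }
};

// Zero or more repetitions; stops if a repetition consumes nothing.
template<typename R>
struct re_star {
    template<typename Stream>
    static bool Accept(Stream& s) {
        for (;;) {
            std::size_t before = s.GetIndex();
            if (!R::Accept(s) || s.GetIndex() == before) break;
        }
        return true;
    }
};

// Rule over tokens: Number ( Plus Number )*
typedef Elem_parser<TokenKind, kNumber> NumberTok_parser;
typedef Elem_parser<TokenKind, kPlus> PlusTok_parser;
typedef re_and<NumberTok_parser,
               re_star<re_and<PlusTok_parser, NumberTok_parser> > > Sum_parser;

// Collects the [begin, end) index ranges where Rule matches.
template<typename Rule, typename Elem>
std::vector<std::pair<std::size_t, std::size_t> >
FindMatches(const std::vector<Elem>& data) {
    std::vector<std::pair<std::size_t, std::size_t> > out;
    SeqStream<Elem> s(data);
    while (!s.AtEnd()) {
        std::size_t start = s.GetIndex();
        if (Rule::Accept(s) && s.GetIndex() > start) {
            out.push_back(std::make_pair(start, s.GetIndex()));
        } else {
            s.SetIndex(start);  // no match here: skip one element
            s.GotoNext();
        }
    }
    return out;
}
```

For the token list Number Plus Number Ident Number, FindMatches<Sum_parser>
reports the ranges [0, 3) and [4, 5). The rules never mention char, which is
the point of making the stream a template parameter.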

What I want to know is: is it obvious to programmers how this works and how
to use it? Is the verbosity acceptable? Also, would people be more
interested if I showed some benchmarks comparing it to other libraries?

TIA

Markus Elfring · Jan 5, 2005

Can the definitions that are described in the section "7 Regular expressions
[tr.re]" of the document "(Draft) Technical Report on Standard Library
Extensions" be changed with other template parameters to match your
suggested use cases?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1687.pdf

christopher diggins · Jan 5, 2005

Markus Elfring said:
Can the definitions that are described in the section "7 Regular expressions
[tr.re]" of the document "(Draft) Technical Report on Standard Library
Extensions" be changed with other template parameters to match your
suggested use cases?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1687.pdf

Sorry but I don't quite understand the question ( nor the document ), could
you explain more?

Markus Elfring · Jan 9, 2005

christopher diggins said:
Sorry but I don't quite understand the question ( nor the document ), could
you explain more?

What don't you understand in the referenced document?
Would you like to reuse anything from this template library for regular
expressions, which is still in development?

When do you want a regexp to be evaluated: at compile time ( as with
Boost.Spirit / Phoenix ) or at run time?

Regards,
Markus
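
To make the compile time versus run time distinction concrete, here is a
small sketch. The run-time half uses std::regex, the form in which the TR1
regular expression proposal was later standardized in C++11. The
compile-time half is a deliberately tiny stand-in for a template grammar;
CharRange_parser, Plus_parser, LowerWord_parser, MatchesAtRunTime and Demo
are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Run time: the pattern is an ordinary string, compiled when the
// std::regex object is constructed, so it can come from user input.
inline bool MatchesAtRunTime(const std::string& pattern,
                             const std::string& text) {
    const std::regex re(pattern);
    return std::regex_match(text, re);
}

// Compile time: the pattern is a type, fixed when the program is built.
template<char First, char Last>
struct CharRange_parser {
    static bool Accept(char c) { return c >= First && c <= Last; }
};

// One or more characters, each accepted by R, covering the whole text.
template<typename R>
struct Plus_parser {
    static bool Accept(const std::string& text) {
        if (text.empty()) return false;
        for (std::size_t i = 0; i < text.size(); ++i) {
            if (!R::Accept(text[i])) return false;
        }
        return true;
    }
};

typedef Plus_parser<CharRange_parser<'a', 'z'> > LowerWord_parser;

// Usage: both forms accept a lower-case word.
inline void Demo() {
    assert(MatchesAtRunTime("[a-z]+", "hello"));
    assert(LowerWord_parser::Accept("hello"));
}
```

The trade-off is the usual one: the typedef form lets the compiler inline
the whole matcher but cannot change after the build, while the std::regex
form pays for pattern compilation at run time and accepts any pattern
string.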
