Eric Mahurin
Jim Freeze was curious as to how Grammar (LL parser) compares to RACC (LR
parser). I was a little bit too... As a test case, I compared against the
racc calc.y parser (simple expression calculator) with a huge 2.6MB
expression (randomly generated by a script). Since ruby could evaluate this
expression directly (I used load), I also threw it into the mix. Here are
the results:
parser   lexer    user   vmsize   notes
-------  -------  -----  -------  -----
ruby     ruby      0.8s   30.6MB  -
racc     Regexp   33.4s  278.2MB  w/o racc C extension
racc     Regexp   15.2s  142.1MB  w/ racc C extension
Grammar  Grammar  22.1s   11.4MB  multi-threaded (300 token buf)
Grammar  Grammar  20.9s   11.4MB  multi-threaded, inlined
Grammar  Grammar  21.1s   11.3MB  one token at a time
Grammar  Grammar  20.1s   11.4MB  one token at a time, inlined
Grammar  Regexp   15.6s   76.0MB  -
Grammar  Regexp   13.8s   74.4MB  inlined
I think the best apples-to-apples comparison is racc w/ its Regexp lexer
and no C extension (mine is pure ruby) vs. my Grammar parser with a similar
Regexp lexer. You can even get a little more speed by inlining your actions
with Grammar. Unfortunately this makes it less readable because you have to
put your code in strings instead of a simple block. My parser is more than
twice as fast under those circumstances (15.6s vs. 33.4s). racc's C
extension gets it back on par with my pure ruby solution. I think the memory
usage is so high for the racc solutions because they typically generate all
the tokens up front (in memory). Every racc example I found did this. I'm
not sure why.
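To make the "all tokens up front" point concrete, here is a minimal sketch of the two styles. The names tokenize_all and each_token and the token pattern are hypothetical, made up for illustration; they are not taken from racc, its examples, or Grammar.

```ruby
require 'strscan'

# Hypothetical token pattern for a simple calculator:
# integers, the four operators, and parentheses.
TOKEN = /\s*(\d+|[-+*\/()])/

# Eager style: the whole token list is built in memory before parsing starts.
def tokenize_all(str)
  str.scan(TOKEN).flatten
end

# Lazy style: tokens are produced one at a time, only when the consumer asks.
def each_token(str)
  return enum_for(:each_token, str) unless block_given?
  ss = StringScanner.new(str)
  while ss.scan(TOKEN)
    yield ss[1]
  end
end
```

With the eager version, memory grows with the number of tokens; with the lazy version a parser can pull tokens through the enumerator (for example with each_token(str).next) and only the current token needs to be alive.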
I only showed the Regexp lexer solution because that is what the racc
examples use. For a Grammar parser, I recommend using a Grammar lexer. A
Regexp lexer isn't as readable or flexible, and most importantly a Regexp is
meant to work with a String, not an IO. Because Regexp works on Strings,
all of the racc examples I found read the IO into a String first.
That is why the memory usage is so much higher for the Regexp lexers above.
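One possible way to keep a Regexp lexer from reading the whole IO into a String is to feed it one line at a time. This is only a sketch under the assumption that no token spans a line break, so it would not help with input that is a single huge line. The helper name each_token_from_io is hypothetical and is not part of racc or Grammar.

```ruby
require 'stringio'
require 'strscan'

# Hypothetical helper: yields tokens from any IO-like object while holding
# only one line of input in memory at a time.
# Assumption: no token spans a line break.
def each_token_from_io(io)
  return enum_for(:each_token_from_io, io) unless block_given?
  io.each_line do |line|
    ss = StringScanner.new(line)
    while ss.scan(/\s*(\d+|[-+*\/()])/)
      yield ss[1]
    end
  end
end

# StringIO stands in for a File or socket here.
io = StringIO.new("1 + 2 *\n(3 - 4)\n")
tokens = each_token_from_io(io).to_a
```

The Regexp still only ever sees a String, but that String is a single line rather than the entire input.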
BTW, I did this on my experimental code. Hopefully this will become version
0.6 in a few weeks.