Parser generator

  • Thread starter Rodrigo B. de Oliveira
  • Start date
R

Rodrigo B. de Oliveira

I'm evaluating languages/frameworks for creating a toy language compiler
and I'd love to use Ruby. Unfortunately, I couldn't find a mature parser
generator for Ruby (automatic AST generation is a plus). Any advice?

Thanks in advance,
Rodrigo
 
G

gabriele renzi

On Tue, 29 Jul 2003 10:00:03 +0900, "Rodrigo B. de Oliveira" wrote:
> I'm evaluating language/frameworks for creating
> a toy language compiler and I'd love to use ruby.
> Unfortunately I couldn't find a mature parser generator
> for ruby (automatic AST generation is a plus). Advices?

I don't know what you mean by 'mature', but look on RAA for Racc and Rockit.
The first is yacc/bison-like; Rockit is, I think, ANTLR-like.

Something like RBison may also exist, but I have heard little about it.
 
M

Mauricio Fernández

'racc' is pretty stable, I think. 'rockit' is newer, but includes
automatic AST support.

racc http://raa.ruby-lang.org/list.rhtml?name=racc
rockit http://raa.ruby-lang.org/list.rhtml?name=rockit

Last time I used rockit (a couple of months ago), it just didn't work
for me. The parser was behaving very strangely; it didn't seem to match
my grammar (I spent a few hours checking it by verifying the tokens and
following the state transitions...).

Having the AST built automatically is nice, but it's fairly easy to make
it in the action sections of racc anyway (although rockit's is nicer since
it has array-like structures instead of recursive things), and I solved
all my problems (without changing the grammar) in a breeze when I switched
to racc.

--
_ _
| |__ __ _| |_ ___ _ __ ___ __ _ _ __
| '_ \ / _` | __/ __| '_ ` _ \ / _` | '_ \
| |_) | (_| | |_\__ \ | | | | | (_| | | | |
|_.__/ \__,_|\__|___/_| |_| |_|\__,_|_| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

I did this 'cause Linux gives me a woody. It doesn't generate revenue.
-- Dave '-ddt->` Taylor, announcing DOOM for Linux
 
R

Robert Feldt

> On Tue, 29 Jul 2003 10:00:03 +0900, "Rodrigo B. de Oliveira" wrote:

Rockit builds your AST and has EBNF constructs, but see below for my
recommendation.

> I don't know what you mean by 'mature', but look on RAA for Racc and Rockit.
> The first is yacc/bison-like; Rockit is, I think, ANTLR-like.

Rockit is GLR, that is, Generalized LR, which is in effect LR(infinity). It
is very nice, since you can parse any context-free language with it, and
grammars look very nice and are easy to understand.

I think ANTLR is LL(k) for a varying and controllable k (I guess sort of
LL(infinity) in practice...). It's powerful if you know what you are doing,
but you generally need to "direct it" more than a GLR parser, i.e. write
more. The upside is that performance *can* be better.

Recently, some new algorithms and ideas for GLR have made C implementations
approach bison (LALR) in performance (I think the best are within 3-10% of
bison) when the grammar is unambiguous (which it is the majority of the
time in practice). This is the best of both worlds, since you only take the
performance hit when there is no way to decide early which parse tree is
the correct one.
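
To make the ambiguity point concrete, here is a minimal sketch (not a GLR parser, and not taken from Rockit or any tool named in this thread) that only counts how many parse trees an input has under the deliberately ambiguous toy grammar `E -> E '+' E | 'n'`. The method name `count_parses` is made up for this illustration. A generalized parser has to keep all of these alternatives alive until it can choose between them, which is where the extra cost comes from.

```ruby
# Counts the parse trees of tokens[i...j] under the ambiguous toy grammar
#   E -> E '+' E | 'n'
# Results are memoized per (i, j) span so shared sub-spans are counted once.
def count_parses(tokens, i = 0, j = tokens.length, memo = {})
  memo[[i, j]] ||= begin
    if j - i == 1
      tokens[i] == "n" ? 1 : 0
    else
      total = 0
      # k is the position of a candidate top-level '+' that splits the span.
      ((i + 1)...(j - 1)).each do |k|
        next unless tokens[k] == "+"
        total += count_parses(tokens, i, k, memo) *
                 count_parses(tokens, k + 1, j, memo)
      end
      total
    end
  end
end

puts count_parses(%w[n + n])      # one tree: nothing to decide
puts count_parses(%w[n + n + n])  # two trees: (n+n)+n and n+(n+n)
```

With an unambiguous grammar every span has at most one tree, so a generalized parser has nothing extra to track.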

Rockit 0.3.8 is unfortunately too immature (bugs and poor performance) to
use for parsing large grammars/inputs, since I haven't had enough time to
work on it. There is a later version that uses one of the new C-implemented
GLR backends for nice speed, but I never get enough time to tighten it up
and package it. Hopefully it will happen some day, but for now you can
consider it vaporware... ;)

So a safe summary is to go for Racc for time-tested algorithms and a really
good implementation. If you want to stay on the bleeding edge, you might
want to watch out for Rockit or other GLR offerings for Ruby (or maybe a
Ruby port of ANTLR?).

Regards,

Robert Feldt
 
R

Robert Feldt

> Robert,
>
> I have thought it would be good to have a native Ruby parser generator.

I agree.

> Are these GLR algorithms implemented in a sufficiently readable fashion
> that it would be worth translating them? I have delved into various parser
> generators and implemented many parsers, but I do find much of the code
> written by the sort of language-theory buffs who build these things to
> be completely but unnecessarily impenetrable.
>
> One reason for a native implementation is to allow dynamic extension
> of a grammar, so you can add rules to an operating parser and have the
> generator regenerate the parser on the fly. Great for parser development
> and extensible languages...

My plan is to get it working as I want with the C back-ends and then
translate the necessary parts to Ruby. Having a pure-Ruby one is
essential, yes.
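
As a rough illustration of the "add rules to an operating parser" idea, the sketch below keeps a grammar as plain Ruby data and interprets it directly. This is an assumption-laden toy: the class `ToyGrammar` and its methods are hypothetical, it does not regenerate LR or GLR tables the way Racc or Rockit would, and it loops forever on left-recursive rules. It only shows that a grammar held in Ruby objects can change between two parses.

```ruby
# A toy grammar held as data: each nonterminal (a Symbol) maps to a list of
# alternatives; each alternative is a sequence of Symbols (nonterminals)
# and Strings (literal tokens).
class ToyGrammar
  def initialize
    @rules = Hash.new { |hash, name| hash[name] = [] }
  end

  # Adds one alternative to a rule. Can be called between parses.
  def add_rule(name, *sequence)
    @rules[name] << sequence
    self
  end

  # True when the whole token list can be derived from +start+.
  def accepts?(start, tokens)
    match(start, tokens, 0).include?(tokens.length)
  end

  private

  # Returns every position at which +symbol+ can end when it starts at +pos+.
  def match(symbol, tokens, pos)
    if symbol.is_a?(String)
      return tokens[pos] == symbol ? [pos + 1] : []
    end

    ends = @rules[symbol].flat_map do |sequence|
      sequence.reduce([pos]) do |positions, part|
        positions.flat_map { |p| match(part, tokens, p) }.uniq
      end
    end
    ends.uniq
  end
end

grammar = ToyGrammar.new
grammar.add_rule(:expr, "n")
puts grammar.accepts?(:expr, %w[n + n])    # false: no rule for '+' yet

grammar.add_rule(:expr, "n", "+", :expr)   # extend the running grammar
puts grammar.accepts?(:expr, %w[n + n])    # true after the extension
```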

I'd say they are not super easy to translate, since there are no real
docs on the design and internals. However, you can start with
only the parts that are necessary during parsing, since they are generally
smaller than the rest. That way you can both generate fast parsers in C
that can be compiled as Ruby extensions and have a pure-Ruby parser when
you want portability. When that is in place, you can focus on
doing the rest of it in pure Ruby. (Since the GLR generation phase is
very similar to the LALR one, you could actually use parts of Racc;
it's unclear whether it's worth it, though.)

> Can you point me to the GLR implementations of which you speak?

Elkhound is the fastest, but it's C++ and I wanted a pure-C one, so
I use D Parser (dparser.sf.net).

If you want to work on translating it to Ruby, I think it would be good if
we collaborated on it, since there are obvious overlaps / connections
to Rockit.

Regards,

Robert
 
M

maillist

Why would you want a native Ruby parser generator??? I am not trying to
be critical, but I can't understand why you would need one. How would you
use one and what for?

(Please excuse my naivety ;)
 
R

Robert Feldt

> Why would you want a native Ruby parser generator??? I am not trying to
> be critical, but I can't understand why you would need one. How would you
> use one and what for?

Well, there are two different issues here: what language the parser
generator is implemented in, and what language the generated parsers are in.

You want one that *generates* pure-Ruby parsers, since you want to deploy
them with pure-Ruby programs, don't want to require people to compile
anything, want to extend them dynamically more easily, etc. (As a side note,
you also want the option to generate a parser as a Ruby C extension if speed
is really crucial.)
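
The split between the generator and what it generates can be sketched in a few lines. The example below is a hypothetical mini generator (the names `generate_tokenizer_source` and `ToyTokenizer` are invented here, and it emits a tokenizer rather than a full parser): it is written in Ruby and its output is plain Ruby source text that depends only on the standard `strscan` library, so the generated code could be shipped without a compile step.

```ruby
require "strscan"

# Hypothetical mini generator: given a class name and a table of token
# patterns, returns Ruby source text for a tokenizer class.
def generate_tokenizer_source(class_name, patterns)
  lines = []
  lines << "class #{class_name}"
  lines << "  def tokenize(input)"
  lines << "    scanner = StringScanner.new(input)"
  lines << "    tokens = []"
  lines << "    until scanner.eos?"
  lines << "      next if scanner.skip(/\\s+/)"
  patterns.each do |type, regexp|
    lines << "      if (text = scanner.scan(#{regexp.inspect}))"
    lines << "        tokens << [#{type.inspect}, text]"
    lines << "        next"
    lines << "      end"
  end
  lines << "      raise ArgumentError, \"unexpected input at offset \#{scanner.pos}\""
  lines << "    end"
  lines << "    tokens"
  lines << "  end"
  lines << "end"
  lines.join("\n")
end

source = generate_tokenizer_source("ToyTokenizer", { int: /\d+/, plus: /\+/ })
eval(source) # a real generator would write this text to a .rb file instead
tokens = ToyTokenizer.new.tokenize("12 + 7")
p tokens
```

Because the output is ordinary Ruby text, the same generator could in principle grow a second back end that emits C for a Ruby extension, which is the speed option mentioned above.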

The main reason for having a pure-Ruby-implemented generator is that it's
easier to experiment with and extend. For example, the C/C++-implemented
GLR generators have little support for error handling
in the generated parsers. If you have the generator in Ruby, you can
try different proposals for error-handling algorithms faster and more easily
and find a suitable or hybrid solution. If it's in C, it takes more time to
implement each of them, so it takes longer to find the "best" solution, etc.
I guess the argument is really the same as in any argument about why you'd
want something in Ruby vs. some other language...

Regards,

Robert Feldt
 
E

Eric Hodel

> My plan is to get it working as I want with the C back-ends and then
> translate the necessary parts to Ruby. Having a pure Ruby one is
> essential yes.

Well, if you aren't in too big a hurry, there's a pure-Ruby Coco/R
port coming out RSN (hopefully next Tuesday).

> I'd say they are not super-easy to translate since there is no real
> docs on the design and internals. However, you can start with
> only the parts that are necessary during parsing since they are generally
> smaller than the rest. Thus you can generate both fast parsers using C
> but can be compiled to Ruby extension and have a pure Ruby parser when
> you want for portability. When that is in place you can focus on
> doing the rest of it in pure Ruby (Since the GLR generation phase is
> very similar to the LALR one you could actually use parts of Racc,
> unclear if its worth it though).

We found that porting Coco/R from Java to Ruby was pretty easy, because
the Java version knew how to generate itself, so we just needed to make all
the same function calls in the same order, and we'd be done. (It took a
while to figure this out, though.)

We still need to restore encapsulation, because the people who wrote
Coco/R for Java broke it quite a bit.

--
Eric Hodel - (e-mail address removed) - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04
 
E

Eric Hodel

> Eric Hodel <[email protected]> wrote on Thu, 31 Jul 2003 08:18:15:
> Is this a new port or are you referring to the Coco/Rb in RAA?

The Coco/Rb in the RAA is a wrapper around the C version of Coco/R. This is
a new port that will be pure Ruby.

> P.S. What is "RSN"?

"Real Soon Now". We're close, but don't have a definite release date.

--
Eric Hodel - (e-mail address removed) - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04
 
L

Lothar Scholz

I really doubt that scripting languages offer faster implementations of
complex algorithms than C (which has bison/lex etc.).

All the parsers I've seen so far that are written in Python are
really bad: they try to do things with regular expressions that can't be
done (regular expressions are not very powerful and cannot describe
context-free languages in general; read Chomsky's non-political
literature). In fact, is there a MIME parser that is written correctly in a
scripting language? For Perl and Python the answer is no. I had to go to
C++ to get one.
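
A standard textbook illustration of that limit is arbitrarily deep nesting, which classical regular expressions cannot track because they have no stack. The sketch below is only an illustration of the formal point (it is not a MIME parser, and the method name `balanced?` is made up here): a few lines of Ruby with an explicit stack handle what a pure regular language cannot. Some regex engines, Ruby's included, add recursion extensions that go beyond regular languages, so the claim applies to regular expressions in the formal sense.

```ruby
# Returns true when every bracket in +text+ is closed in the right order.
# The explicit stack is the extra power a classical regular expression lacks.
def balanced?(text)
  openers = { ")" => "(", "]" => "[" }
  stack = []
  text.each_char do |ch|
    if openers.value?(ch)
      stack.push(ch)                       # remember the opening bracket
    elsif openers.key?(ch)
      return false unless stack.pop == openers[ch]
    end
  end
  stack.empty?
end

puts balanced?("(a[b](c))")  # true
puts balanced?("(a[b)]")     # false: brackets closed in the wrong order
```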

And do you want only a lexical parser, or also one that gathers semantic
information about Ruby code?
 