looking for tips on how to implement "ruby-style" Domain SpecificLanguage in Python

K

Kay Schluehr

O.K. Mark. Since you seem to accept the basic requirement to build an
*external* DSL I can provide some help. I'm the author of EasyExtend
( EE ) which is a system to build external DSLs for Python.

http://www.fiber-space.de/EasyExtend/doc/EE.html

EE is very much work in progress and in the last year I was more
engaged with increasing power than enhance accessibility for
beginners. So be warned.

A DSL in EE is called a *langlet*. Download the EE package and import
it in a Python shell. A langlet can then be built this way:

This creates a bunch of files in a directory

<site-packages-path>/EasyExtend/langlets/my_langlet

Among them is run_my_langet.py and langlet.py. You can cd to the
directory and apply

$python run_my_langlet.py

which opens a console with prompt 'myl>'. Each langlet is immediatly
interactive. A user can also run a langlet specific module like

$python run_my_langlet.py mod.dsl

with the suffix .dsl defined in the langlet builder function. Each
module xxx.dsl can be imported from other modules of the my_langlet
langlet. EE provides a generic import hook for user defined suffixes.

In order to do anything meaningful one has to implement langlet
transformations in the langlet.py module. The main transformations are
defined in a class called LangletTransformer. It defines a set of
visitor methods that are marked by a decorator called @trans. Each
@trans method is named like a terminal/non-terminal in a grammar file
and responds to a terminal or non-terminal node of the parse tree
which is traversed. The structure of the parse tree is the same as
those you'd get from Pythons builtin parser. It is entirely determined
by 4 files:

- Grammar which is precisely the Python grammar found in the Python
source distribution.
- Grammar.ext which defines new non-terminals and overwrites old
ones.
- Token which defines Pythons token.
- Token.ext which is the analog of Grammar.ext for token definitions.

The Grammar.ext file is in the directory my_langlet/parsedef. There is
also an analog lexdef directory for Token.ext.

A possible Grammar.ext extension of the Python grammar that overwrites
two non-terminals of looks like this:

Grammar.ext
-----------

trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME | NAME |
NUMBER | STRING

atom: ('(' [yield_expr|testlist_gexp] ')' |
'[' [listmaker] ']' |
'{' [dictmaker] '}' |
'`' testlist1 '`' |
NAME | NUMBER | STRING)

-----------

Once this has been defined you can start a new my_langlet session and
type

myl> navigate_to 'www.openstreetmap.org' website
Traceback (most recent call last):
File "C:\lang\Python25\lib\site-packages\EasyExtend\eeconsole.py",
line 270, in compile_cst
_code = compile(src,"<input>","single", COMPILER_FLAGS)
File "<input>", line 1
navigate_to 'www.openstreetmap.org' website
^
SyntaxError: invalid syntax
myl>

It will raise a SyntaxError but notice that this error stems from the
*compiler*, not the parser. The parser perfectly accepts the modified
non-terminals and produces a parse tree. This parse tree has to be
transformed into a valid Python parse tree that can be accepted by
Pythons bytecode compiler.

I'm not going into detail here but recommend to read the tutorial

http://www.fiber-space.de/EasyExtend/doc/tutorial/EETutorial.html

that walks through a complete example that defines a few terminals,
non-terminals and the transformations accordingly. It also shows how
to use command line options to display parse tree properties, unparse
parse trees back into source code ( you can eliminate DSL code from
the code base entirely and replace it by equivalent Python code ), do
some validation on transformed parse trees etc.
 
J

Jonathan Gardner

It seems we're defining "DSL" in two different ways.

You can't write a DSL in Python because you can't change the syntax and
you don't have macros.

You can write a compiler in Python that will compile your "DSL."

Yes, that's what I'm saying. You can get the same results even thought
you can't manipulate the Python language itself as it's compiling
Python because you can feed it Python code that you've generated.

As another poster mentioned, eventually PyPy will be done and then
you'll get more of an "in-Python" DSL.

Of course, such a language wouldn't be Python anymore because Python
doesn't have such features.
 
J

Jonathan Gardner

May I ask why you consider it as important that the interpreter is
written in Python? I see no connection between PyPy and syntactical
Python extensions and the latter isn't an objective of PyPy. You can
write Python extensions with virtually any Python aware parser.
M.A.Lemburg already mentioned PLY and PLY is used for Cython. Then
there is ANTLR which provides a Python grammar. I also know about two
other Python aware parsers. One of them was written by myself.

If you're going to manipulate the Python compiler/interpreter from the
Python program itself, it's only reasonable that the Python compiler/
interpreter be written in Python so that it can be manipulated.

If you haven't already made it through SICP, you really should. It
will help you understand why being able to write a language in itself
is a big deal and why having the language around to be manipulated
from within the language is very useful.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,043
Latest member
CannalabsCBDReview

Latest Threads

Top