Has anyone implemented BASIC in Python?

L

Leif K-Brooks

Has anyone ever tried implementing a simple unstructured BASIC dialect
in Python? I'm getting interested in language implementation, and
looking at a reasonably simple example like that could be pretty
interesting.
 
H

has

Leif K-Brooks said:
Has anyone ever tried implementing a simple unstructured BASIC dialect
in Python? I'm getting interested in language implementation, and
looking at a reasonably simple example like that could be pretty
interesting.

Dunno about BASIC, but Ian Bicking has done Logo: http://pylogo.sourceforge.net/

HTH
 
A

Andrea Griffini

Has anyone ever tried implementing a simple unstructured BASIC dialect
in Python? I'm getting interested in language implementation, and
looking at a reasonably simple example like that could be pretty
interesting.

I've no idea why you think that an unstructured language would
be a good starting point. IMO a good starting point would be just
an expression parser; after than an interpreter that just executes
a parsed tree and finally a true compiler that generates elementar
instructions. Ugly non-structured languages are not going to be
that easier to implement IMO.

Anyway I found the idea of a BASIC interpreter in python intriguing,
so I wrote one this weekend. I wrote it for my egoistic fun, so
I'm not sure if it would be useful for you (or anyone else).

Anyway here we go... http://www.gripho.it/py_basic.tgz

Comments are welcome...

Andrea
 
L

Leif K-Brooks

Andrea said:
I've no idea why you think that an unstructured language would
be a good starting point.

There's no stack to deal with, and generally less possible expressions
to worry about. You're probably right about doing an expression parser
first, though; I will. Thanks for the advice.
Anyway I found the idea of a BASIC interpreter in python intriguing,
so I wrote one this weekend. I wrote it for my egoistic fun, so
I'm not sure if it would be useful for you (or anyone else).

Whoa, that's your idea of a weekend project? I'm speechless.

I really appreciate the example, and I'll try my best to understand it.
Did you use a parser generator or anything like that, or is it purely
hand-coded?
 
E

Erik Max Francis

Leif said:
Has anyone ever tried implementing a simple unstructured BASIC dialect
in Python? I'm getting interested in language implementation, and
looking at a reasonably simple example like that could be pretty
interesting.

On a whim, I implemented a Python BASIC interpreter which supported all
statements in primitive BASIC, but got bored before I implemented all
the standard operators in expressions.
 
A

Andrea Griffini

There's no stack to deal with, and generally less possible expressions
to worry about. You're probably right about doing an expression parser
first, though; I will. Thanks for the advice.

I resorted to use two explicit stacks; one for FORs and one
for GOSUBs. There is no stack for expression evaluation because
python stack is used indirectly instead (expressions are evaluated
directly from the parse tree). Both of the explicit stacks
could have been dropped by considering a structured approach
with functions instead of the concept of just GOSUB-RETURN.
GOSUB-RETURN requires an explicit stack because it's a runtime
concept; not a parse-time concept. So it's not only uglier to
use... it's also harder to write.

Writing an interpreter that executes directly a parse tree of
a structured language program is IMO *easier* than writing an
interpreter for an unstructured language program.
That's why I think that starting from BASIC is not a good idea.

Things are different if we're talking about the VM approach,
but that's IMO something that should come later.
Whoa, that's your idea of a weekend project? I'm speechless.

Writing parsers, interpreters and compilers is a lot simpler than
many do think. Of course the evil is in the details and getting
something that comes even just close to (just for example ;-) )
python is a *LOT* harder.

I have "The dragon" book, and I saw that the part dedicated
to recursive descent parsers is incredibly small.
No wonder that writing compilers and interpreters is considered
such a difficult task... IMO the shift-reduce approach is just
*too* abstract (at least too abstract for *ME*, that is ;-) ).

Note also that working without any requirement (i.e. creating
requirements on the fly) and without any timeline (i.e. doing
whatever you think it would be nice to do) and without any
(pointy-haired) boss or (annoying) user is a lot easier.

I actually even think that the percentage of time I spent coding
the tic-tac-toe example in BASIC is not so small...
I really appreciate the example, and I'll try my best to understand it.
Did you use a parser generator or anything like that, or is it purely
hand-coded?

I never use yacc/lex, I just prefer hand-coding the parsers.

I never felt as pressing the problem of the speed of parsing
(I've read that this is one main superiority of shift-reduce
parsers). An hand-coded recursive descent parser is in my
opinion very easy to write... and very easy to maintain.
In the years I just found myself using tables for binary
operators to avoid repeating annoying code for just different
precedence levels or grouping; so I normally end up have just
one function for parsing "terms" (literals, varables, function
calls, parenthesized expressions and unary operators calls)
and one for parsing "expressions" that handles all binary
operators and that accepts the precedence level as a parameter.
Statements normally get their separate handcoded parsing
function (and this implies a lot of freedom about the syntax,
including making parsing dependent on the *meaning* of a symbol).
Also it's very easy to provide meaningful error messages.

For starting I would suggest however coding a separate
function at least for every precedence level (i.e. one for
mul/div, one for add/sub etc.... with lower precedence
ones that call higher precedence ones and with the highest
"term" one that calls the lowest "expr" for example for
parenthesized subexpressions).

Andrea
 
L

Leif K-Brooks

Andrea said:
Writing an interpreter that executes directly a parse tree of
a structured language program is IMO *easier* than writing an
interpreter for an unstructured language program.
That's why I think that starting from BASIC is not a good idea.

I see what you're saying now. I had entirely forgotten about GOSUB and
FOR (I haven't used BASIC for a very long time, thank God), so I was
thinking thinking about stackless code with IF and GOTO for flow control.

Thanks a lot for the detailed replies, by the way; they've been very
helpful.
Writing parsers, interpreters and compilers is a lot simpler than
many do think.

Is there a (virtual) book you would recommend reading to learn about
writing them? A course isn't really possibly for me, so I'm hoping that
isn't the only option.
 
P

Paul Rubin

Leif K-Brooks said:
Is there a (virtual) book you would recommend reading to learn about
writing them? A course isn't really possibly for me, so I'm hoping
that isn't the only option.

Structure and Interpretation of Computer Programs, by Abelson and
Sussman. Not the easiest book in the world, but by the time you
finish it, you'll really understand some things.

Full text available from:

http://mitpress.mit.edu/sicp/
 
P

Paul Rubin

Peter Hansen said:
Please do!

Well, it's not real Forth, just a mini-subset that I threw together in
a few hours, to get some understanding of how Forth is supposed to
work. But you can type stuff like

: square dup * ;
3 square .

and it will print 9 at you. Part of the perversity is that each word
is compiled to a Python lambda and is invoked by calling the lambda.
The idea was to make both Pythonistas and Forthwrights(?) scream in
disgust ;-). Please forgive the lack of comments etc. I wasn't going
to release it (at least in that condition), but you asked for it ;-).

http://www.nightsong.com/phr/python/forth.py
 
P

Peter Hansen

Paul said:

Thank you, Paul. Several others wrote to check if you had
sent anything to me privately, so there must be interest. :)

(My interest stems from two areas: simple curiosity and the
possibility of using Forth as a "scripting" language for
some embedded systems at some point, and the thought
that a Python-based Forth could be used to allow PC-side
test-driven development. Mostly I have the same curiosity
that led you to write it in the first place, but either
less time or more laziness. ;-)

-Peter
 
M

Michael J. Fromberger

Andrea Griffini said:
Writing parsers, interpreters and compilers is a lot simpler than
many do think. Of course the evil is in the details and getting
something that comes even just close to (just for example ;-) )
python is a *LOT* harder.

[...]

I never use yacc/lex, I just prefer hand-coding the parsers.

I never felt as pressing the problem of the speed of parsing
(I've read that this is one main superiority of shift-reduce
parsers). An hand-coded recursive descent parser is in my
opinion very easy to write... and very easy to maintain.

For simple, single-purpose languages, I am inclined to agree that
hand-written parsers are easy enough to write and maintain. However,
when you are dealing with a more complex general-purpose language, and
one whose grammar needs to evolve over time, then I think a
parser-generator is a much better solution by far.

The chief trouble in maintaining a hand-written recursive descent parser
is that the grammar for the input language is hidden inside the
implementation. The author of the code can usually pick it out, but
over time, even the author may find it difficult (and yes, I am speaking
from a certain degree of first-hand experience in this matter).

In contrast, the grammars consumed by most parser-generators are
explicitly written out as grammars, in the spirit (of not the form) of
BNF notation. If you want to make a change to the grammar (and thereby,
the parser), you can easily see how the changes will affect the rest of
the language, and a new parser can be created quickly and easily, simply
by re-running the parser generator. The only lines of code you need to
touch are the places where your change affects the translation, and that
is often quite minimal.

Furthermore, naive implementation of recursive descent is fraught with a
few subtle perils to trip the novice and the unwary. For instance, you
must carefully avoid left-recursive productions, or your parser may not
terminate. Also, error-handling is tricky in recursive descent, because
much of the parser's state is implicit in stack frames that must be
correctly unwound when an error forces you to bail out. If you're
writing in a language (like Python) with good automatic memory
management, the latter is less of an issue, but recursive descent
parsers written in languages without automatic memory management, like C
and C++, must contend mightily with this.

Of course, there is no silver bullet, but the availability of good LR(1)
and LALR(1) parser generators should not be discounted, even if the
theory behind them seems to be slightly complicated, on the surface.

Cheers,
-M
 
A

Aahz

On a whim, I implemented a Python BASIC interpreter which supported all
statements in primitive BASIC, but got bored before I implemented all
the standard operators in expressions.

For some reason, I thought that after "but" you were going to write,
"...but I don't have the proof handy." ;-)
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"To me vi is Zen. To use vi is to practice zen. Every command is a
koan. Profound to the user, unintelligible to the uninitiated. You
discover truth everytime you use it." (e-mail address removed)
 
A

Andrea Griffini

On Mon, 23 Aug 2004 11:10:53 -0400, "Michael J. Fromberger"

[OT]
For simple, single-purpose languages, I am inclined to agree that
hand-written parsers are easy enough to write and maintain. However,
when you are dealing with a more complex general-purpose language, and
one whose grammar needs to evolve over time, then I think a
parser-generator is a much better solution by far.

Like I said before may be it's me... but I see the grammar definition
and the sparse collection of C code snippets required to use yacc &
friends harder to understand than a recursive descending parser.

One could go *forever* by abstracting and generalizing and
formalizing, but there is a ROI point that must be considered,
and, for me, shift-reduce parser are beyond the ROI point.
If the formal description (with the ugly pieces of real meat that
you're required to stick to it) becomes harder to follow and
maintain than another (that is still formal) imperative description
of how the parsing is done (i.e. a program doing the parsing)
then IMO there is simply no point in pursuing further abstraction.

Sometimes, for reasons I don't understand, there is a strong
temptation of abstracting just too much, and the result can
become quite ugly. If you do not understand what I mean then
The chief trouble in maintaining a hand-written recursive descent parser
is that the grammar for the input language is hidden inside the
implementation. The author of the code can usually pick it out, but
over time, even the author may find it difficult (and yes, I am speaking
from a certain degree of first-hand experience in this matter).

Normally the grammar is documented ALSO elsewhere in a very
human readable form. Also grammar sure evolves but (in my
experience) it's not the most common change.
In contrast, the grammars consumed by most parser-generators are
explicitly written out as grammars, in the spirit (of not the form) of
BNF notation. If you want to make a change to the grammar (and thereby,
the parser), you can easily see how the changes will affect the rest of
the language, and a new parser can be created quickly and easily, simply
by re-running the parser generator. The only lines of code you need to
touch are the places where your change affects the translation, and that
is often quite minimal.

I love BNF form, and I even wrote long ago a program that was
accepting a BNF description (and a very "clean" one, written
as you can find it in reference manuals of languages) and then
could check inputs for compliance with the grammar.

However when writing a compiler the grammar is just one part...
you do not have to just read the source. Spreading the meat of
the compiler is in my opinion worse than spreading the grammar...
it's like spreading the business logic of an application in
the buttons you press in a gui.

It's only a personal opinion, of course.
Furthermore, naive implementation of recursive descent is fraught with a
few subtle perils to trip the novice and the unwary. For instance, you
must carefully avoid left-recursive productions, or your parser may not
terminate.

That is something that never happened to me. Some trap I fell
in was for example writing tokenizers that recognized integers
with something like

/-?\d+/

and then later bumping in "x-1" being tokenized as <ident><number>.

Those are not serious issues, however.
Also, error-handling is tricky in recursive descent, because
much of the parser's state is implicit in stack frames that must be
correctly unwound when an error forces you to bail out. If you're
writing in a language (like Python) with good automatic memory
management, the latter is less of an issue, but recursive descent
parsers written in languages without automatic memory management, like C
and C++, must contend mightily with this.

Here is an excerpt of a parser I wrote about a dozen years ago
in C. The grammar is for a complete programming language (a
language designed for the implementation of the server side
of a three-thiers client/business logic/RDBMS architecture).

case -RW_IF :
SkipToken();
c=ParseExpr();
if (!ParseError && NextIs(-RW_THEN))
{ t=ParseStat();
if (!ParseError)
{ if (CurToken==-RW_ELSE)
{ SkipToken(); e=ParseStat(); }
else
e=NULL;
r=NewIf(c,t,e);
}
else
{ Deref(t); Deref(c); }
}
else
Deref(c);
break;

This is the case for parsing IF-THEN-ELSE statement.
Surely it's a bit annoying to remember to do all those
"Deref" call in case of failure, but I wouldn't call this
"contend mightly". The parser for the whole language
(a general purpose programming language with support
for embedded SQL) is about 1000 lines of C.
Of course, there is no silver bullet, but the availability of good LR(1)
and LALR(1) parser generators should not be discounted, even if the
theory behind them seems to be slightly complicated, on the surface.

Surely they're good for someone. I've to say I didn't really
invest a lot of time on shift-reduce parsers; but for the problems
I've been facing I found no serious reason for using those tools
instead of a recursive parser. Once a bit of low-level tricks
(like having current/skip instead of get/unget; or using
reference counters when the language is not providing them to
you) are in place things are really not complex at all.

Of course this is just my personal view, based on the experience
I had on the few languages I implemented. For example I never
dared to write a parser for C++: things like the rule that
says "if it can be interpreted as a declaration then it's a
declaration" really scare me (that is the rule that Scott Meyers
calls "the C++ most vexing parse" and implies that a C++ parser
has to swallow a potentially unlimited number of tokens before
deciding what is the semantic meaning for the very first of them).

In language I invented I never had the crazy idea of allowing
both:

X x(...);
X x = X(...);

for the price of introducing ambiguity with function declarations.

Note that if a grammar is that complex then even USERS of the
grammar will fall into the traps. This is very common for C++;
the following code snippet for example...

double pi = 3.141592654;
int x( int(pi) );

may trick many C++ programmers into thinking that an integer
variable named x is being initialized with value 3.

It's hard to get to this level of complexity with a recursive
descent parser ? Good ... I've another excuse to stay away
from pointless confusion ;-)


Andrea

PS: Ok ok ok... I'll give shift-reduce parser and their tools
another look. May be it's just my laziness that is showing up
and my brain that tries to find excuses for justifying it.
 
P

Paul Rubin

Andrea Griffini said:
One could go *forever* by abstracting and generalizing and
formalizing, but there is a ROI point that must be considered,
and, for me, shift-reduce parser are beyond the ROI point.

They are hard to code by hand, but the idea is normally you'd create
them with a parser generator.
 
M

Michael J. Fromberger

Andrea Griffini said:
Like I said before may be it's me... but I see the grammar definition
and the sparse collection of C code snippets required to use yacc &
friends harder to understand than a recursive descending parser.

Admittedly, YACC and Bison have a somewhat steep initial learning curve.
However, in terms of long-term maintainability, that learning curve is
well worth the trouble for non-trivial language projects. By combining
the context-free grammar with associated semantic information (the
so-called "sparse collection of C code snippets" you mention above), and
allowing the tool to generate your parser code, you are much more likely
to get a parser you can actually argue is correct.

Correct code may not be fashionable, but it is certainly much more
maintainable than incorrect code, or code whose correctness cannot
easily be determined. The usual response: "It's worked fine so far!"
does not cut much ice when you're writing something important, as I am
sure you know!
for me, shift-reduce parser are beyond the ROI point.

Ah, well. Suit yourself. :) In my opinion, that's equivalent to a
builder saying, "Power tools do not have good return-on-investment over
doing all our cutting, fitting, and joining using hand tools." But,
perhaps if you only ever need to work with simple grammars, that might
be true for your own needs.
Normally the grammar is documented ALSO elsewhere in a very
human readable form.

Of course -- but how do you know your human-readable documentation has
any necessary correlation with the actions of the parser? With a hand
coded parser, you basically have nothing but the author's hopeful
assurance. Automatically generated parsers are generated directly from
an EBNF analogue of the grammar (e.g., YACC input), which means you can
be sure the parser matches the grammar, unless there's a bug in the
parser generator, and the latter is S.E.P.

And, heck, if you don't like bottom-up shift-reduce parsers of the LR(k)
family, there are even parser-generators that generate top-down
predictive parsers of various stripes (e.g., ANTLR), that are reputed to
work well for many common tasks.
Also, error-handling is tricky in recursive descent, because
much of the parser's state is implicit in stack frames that must be
correctly unwound when an error forces you to bail out.

Here is an excerpt of a parser I wrote about a dozen years ago
in C. [...]

case -RW_IF :
SkipToken();
c=ParseExpr();
if (!ParseError && NextIs(-RW_THEN))
{ t=ParseStat();
if (!ParseError)
{ if (CurToken==-RW_ELSE)
{ SkipToken(); e=ParseStat(); }
else
e=NULL;
r=NewIf(c,t,e);
}
else
{ Deref(t); Deref(c); }
}
else
Deref(c);
break;

This is the case for parsing IF-THEN-ELSE statement.

If you'll forgive me for being a bit harsh, I find this code practically
unreadable. You've got at least three layers of nested error-handling
here, just to make sure your dereferences all happen in the correct
order. Having to explicitly check error codes like this, and manually
handle memory, is PRECISELY the reason why I argued that error-handling
is tricky in recursive descent. If you handed me this code and asked me
to find a problem in it, I'd probably tear all my hair out.

Does that mean you shouldn't use recursive descent? Of course not! The
technique is clever, and has its place. But this example could have
been so much nicer, even just using a table-driven top-down predictive
parser, rather than an explicitly recursive one.
Of course this is just my personal view, based on the experience
I had on the few languages I implemented. For example I never
dared to write a parser for C++

Yeah. Well, C++ is pretty awful to parse, no matter WHAT type of
algorithm you want to use. I won't dispute that at all. :)
PS: Ok ok ok... I'll give shift-reduce parser and their tools
another look. May be it's just my laziness that is showing up and
my brain that tries to find excuses for justifying it.

I think that, in the long run, you will probably find your efforts well
rewarded. I didn't intend to become the advocate for automatic parser
generators, but I do think that someone just starting out should be
aware that recursive descent is not always the best solution.

-M
 
J

John Tobler

Leif K-Brooks said:
Is there a (virtual) book you would recommend reading to learn about
writing them? A course isn't really possibly for me, so I'm hoping that
isn't the only option.

You do not need to spend a penny for books or information about
compiler writing. Let's go fishing!

Start with the Open Directory Project:

http://dmoz.org/Computers/Programming/Compilers/


.... and Google's equivalent of it (which incorporates the ODP but may
modify or supplement its results):

http://directory.google.com/Top/Computers/Programming/Compilers/


From those two links, alone, you should be able to find everything
from excellent beginner tutorials to advanced theoretical references.


Now, let's try a simple Google search for "compiler writing":

http://www.google.com/search?sourceid=mozclient&ie=utf-8&oe=utf-8&q=compiler+writing


.... and "compiler theory"

http://www.google.com/search?sourceid=mozclient&ie=utf-8&oe=utf-8&q=compiler+theory


.... and "tutorial+compiler"

http://www.google.com/search?hl=en&lr=&ie=UTF-8&q=tutorial+compiler&btnG=Search


Then, let's narrow our scope to tiny things:

http://www.google.com/search?sourceid=mozclient&ie=utf-8&oe=utf-8&q=tiny+compiler


That should give you some reasonably-sized learning projects to
emulate.

Have fun!

John Tobler
CSharpener's Weblog:
http://weblogs.asp.net/jtobler/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top