XML based programming language

stefaan

Hello,

I have recently had to deal with an XML-based
programming language. (The programs are
generated programmatically.)

XML leads to a "two-level" parsing problem: first
parse the xml into tokens, then parse the tokens
to grasp their meaning (based on the semantics
of the programming language).

Basically, I used elementtree as a sophisticated "lexer" and wrote a
recursive descent parser to perform the semantic analysis and
interpretation.
(It works great.)
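Below is a minimal sketch of that setup, assuming a hypothetical toy language whose only elements are `<num value="..."/>`, `<add>` and `<mul>`. ElementTree does the tokenizing and tree building. A hand-written, one-branch-per-rule recursive function then builds a small AST and interprets it. The tag and function names are illustrative and do not come from Stefaan's actual language.

```python
import xml.etree.ElementTree as ET


def parse_expr(elem):
    """Build a small AST (nested tuples) from one element, one branch per rule."""
    if elem.tag == "num":
        return ("num", int(elem.get("value")))
    if elem.tag in ("add", "mul"):
        operands = list(elem)
        if len(operands) != 2:
            raise ValueError("<%s> needs exactly two operands" % elem.tag)
        return (elem.tag, parse_expr(operands[0]), parse_expr(operands[1]))
    raise ValueError("unexpected element <%s>" % elem.tag)


def evaluate(node):
    """Interpret the AST produced by parse_expr."""
    if node[0] == "num":
        return node[1]
    left, right = evaluate(node[1]), evaluate(node[2])
    return left + right if node[0] == "add" else left * right


program = ET.fromstring(
    '<add><num value="2"/><mul><num value="3"/><num value="4"/></mul></add>'
)
print(evaluate(parse_expr(program)))  # 14
```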

But I keep wondering: do parser generator tools
exist that I could have used instead of writing
the recursive descent parser manually?

Best regards,
Stefaan.
 
Diez B. Roggisch

stefaan said:
Hello,

I have recently had to deal with an XML-based
programming language. (The programs are
generated programmatically.)

XML leads to a "two-level" parsing problem: first
parse the xml into tokens, then parse the tokens
to grasp their meaning (based on the semantics
of the programming language).

Basically, I used elementtree as a sophisticated "lexer" and wrote a
recursive descent parser to perform the semantic analysis and
interpretation.
(It works great.)

But I keep wondering: do parser generator tools
exist that I could have used instead of writing
the recursive descent parser manually?

You haven't written a recursive descent parser. At least not in the
usual sense of the word.

A parser (recursive descent or otherwise) takes a string written in
the language it accepts and, in the field of programming languages,
usually returns an abstract syntax tree. That tree is what one then
works on: code generation, interpretation, optimization.
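Python's standard `ast` module is a parser that everyone has at hand, and it illustrates this textbook sense of "parser": source text goes in and an abstract syntax tree comes out. It serves only as an analogy for the definition and plays no part in the XML setup under discussion.

```python
import ast

# Python's own parser: source text in, abstract syntax tree out.
tree = ast.parse("x * (y + 1)", mode="eval")

# An "eval"-mode parse returns ast.Expression; its body is the expression node.
print(type(tree).__name__)             # Expression
print(type(tree.body).__name__)        # BinOp
print(type(tree.body.op).__name__)     # Mult
print(type(tree.body.right).__name__)  # BinOp (the parenthesised y + 1)
```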

What you wrote is usually called a reducer, the part that traverses the
tree, rewriting it, transforming it for interpretation and whatnot.

I've been working with tools that take an XML Schema or DTD and generate
typed objects from it that can be deserialized from an
XML stream. The better of these tools generate visitors and/or matchers,
which basically are objects that traverse the generated object tree in
document order, via typed methods. Something like this (Java pseudocode):

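class Expr {
    // minimal stand-in for a generated node class
    java.util.List<SubExpr> subExpressions = new java.util.ArrayList<>();
}

class SubExpr {
}

class Visitor {

    // generic entry point: dispatch on the runtime type to the typed overloads
    public void visit(Object o) {
        if (o instanceof Expr) {
            visit((Expr) o);
        } else if (o instanceof SubExpr) {
            visit((SubExpr) o);
        }
    }

    // visit the sub-expressions in document order
    public void visit(Expr e) {
        for (SubExpr se : e.subExpressions) {
            visit(se);
        }
    }

    public void visit(SubExpr e) {
        // not doing anything
    }

}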


You can then subclass this visitor, for example to create an interpreter.

All of this is theoretically possible in Python, too. Using multimethods
one can create the dispatching, and so forth.
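Here is a sketch of the typed-dispatch idea in Python. It assumes Python 3.8+ for `functools.singledispatchmethod`. This is single dispatch on the node's type, which is enough for a visitor, rather than full multimethods. `Expr`, `SubExpr` and `SubExprCounter` are hypothetical stand-ins for generated classes.

```python
from functools import singledispatchmethod


# Hypothetical typed node classes, standing in for what a schema-driven
# code generator might produce.
class SubExpr:
    def __init__(self, name):
        self.name = name


class Expr:
    def __init__(self, sub_expressions):
        self.sub_expressions = sub_expressions


class SubExprCounter:
    """Visitor that counts SubExpr nodes, dispatching on the node's type."""

    def __init__(self):
        self.count = 0

    @singledispatchmethod
    def visit(self, node):
        # Fallback for node types without a registered overload.
        raise TypeError("no visit overload for %s" % type(node).__name__)

    @visit.register
    def _(self, node: Expr):
        # Visit the sub-expressions in document order.
        for se in node.sub_expressions:
            self.visit(se)

    @visit.register
    def _(self, node: SubExpr):
        self.count += 1


counter = SubExprCounter()
counter.visit(Expr([SubExpr("a"), SubExpr("b")]))
print(counter.count)  # 2
```

Extending such a dispatcher from a subclass is less direct than overriding an ordinary method. That arguably supports preferring a simpler scheme.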

I'm just not too convinced that it really is worth the effort. A simple
tag-name-based dispatching scheme, together with the really nice
ElementTree API, suffices in my eyes. Then you could do something like this:


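class Visitor(object):

    def visit(self, node):
        descend = True
        # look up a handler named after the tag, e.g. visit_expr for <expr>
        handler = getattr(self, "visit_%s" % node.tag, None)
        if handler is not None:
            # the handler's return value decides whether to visit the children
            descend = handler(node)
        if descend:
            for child in node:
                self.visit(child)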


Then for an element "expr" you could define

class Foo(Visitor):
    def visit_expr(self, node):
        ...
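The following is a self-contained run of the tag-name dispatching scheme on a small document, using the standard library's `xml.etree.ElementTree`. The `<program>`, `<expr>` and `<stmt>` tags and the `ExprCounter` class are made up for illustration. A handler's return value decides whether its children are visited, so a handler that returns nothing (as a body of `...` would) stops the descent.

```python
import xml.etree.ElementTree as ET


class Visitor(object):
    def visit(self, node):
        descend = True
        # Look for a handler named after the tag, e.g. visit_expr for <expr>.
        handler = getattr(self, "visit_%s" % node.tag, None)
        if handler is not None:
            # The handler's return value decides whether to visit the children.
            descend = handler(node)
        if descend:
            for child in node:
                self.visit(child)


class ExprCounter(Visitor):
    """Counts <expr> elements, including nested ones."""

    def __init__(self):
        self.count = 0

    def visit_expr(self, node):
        self.count += 1
        return True  # keep descending into nested elements


doc = ET.fromstring("<program><expr><expr/></expr><stmt><expr/></stmt></program>")
counter = ExprCounter()
counter.visit(doc)
print(counter.count)  # 3
```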


HTH,

Diez
 
stefaan

Thank you Diez for answering.
As far as I can see, it more or less corresponds to what I have.

But my question was perhaps more this:

"If elementtree is 'lex', what is 'yacc'?"
 
Diez B. Roggisch

stefaan said:
Thank you Diez for answering.
As far as I can see, it more or less corresponds to what I have.

But my question was perhaps more this:

"If elementtree is 'lex', what is 'yacc'?"

Elementtree isn't lex. You are comparing apples and oranges here. Lex
tokenizes, yacc creates trees. Both are covered by XML itself: it
defines the tokenization and the parsing, and both are built into
elementtree. So elementtree is lex _and_ yacc for XML. And if your
language is written in XML, that's all there is to it.
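The standard library shows this split in practice. `ElementTree.fromstring` rejects input that is not well-formed XML by raising `ParseError`, but it accepts any well-formed document, whatever its tags. The helper name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if ElementTree can tokenize and parse text as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<output><sum/></output>"))   # True
print(is_well_formed("<output><sum></output>"))    # False: mismatched tag
print(is_well_formed("<banana><peel/></banana>"))  # True: any well-formed XML passes
```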

Diez
 
stefaan

Diez B. Roggisch said:
Elementtree isn't lex. You are comparing apples and oranges here. Lex
tokenizes, yacc creates trees. Both are covered by XML itself: it
defines the tokenization and the parsing, and both are built into
elementtree. So elementtree is lex _and_ yacc for XML. And if your
language is written in XML, that's all there is to it.

I see your point. But yacc does more: I specify a grammar, and yacc will
reject input files that do not conform to the grammar.
Elementtree OTOH will happily accept any well-formed XML file; all checking
has to be implemented manually by me.
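Hand-written checking of that kind might look like the following sketch. It hard-codes a hypothetical table of allowed child elements (roughly what a DTD content model would declare) and collects violations instead of raising. The `ALLOWED_CHILDREN` table and the `grammar_errors` function are assumptions for illustration, and they check only element nesting, not attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar: for each known tag, the child tags it may contain.
ALLOWED_CHILDREN = {
    "output": {"sum"},
    "sum": set(),
}


def grammar_errors(elem):
    """Return a list of messages, one per violation found under elem."""
    if elem.tag not in ALLOWED_CHILDREN:
        return ["unknown element <%s>" % elem.tag]
    problems = []
    for child in elem:
        # Unknown child tags are reported by the recursive call below.
        if child.tag in ALLOWED_CHILDREN and child.tag not in ALLOWED_CHILDREN[elem.tag]:
            problems.append("<%s> not allowed inside <%s>" % (child.tag, elem.tag))
        problems.extend(grammar_errors(child))
    return problems


print(grammar_errors(ET.fromstring('<output><sum arg1="a" arg2="b"/></output>')))  # []
print(grammar_errors(ET.fromstring("<sum><output/></sum>")))
# ['<output> not allowed inside <sum>']
```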

Best regards,
Stefaan.
 
Laurent Pointal

stefaan wrote:
I see your point. But yacc does more: I specify a grammar, and yacc will
reject input files that do not conform to the grammar.
Elementtree OTOH will happily accept any well-formed XML file; all checking
has to be implemented manually by me.

Best regards,
Stefaan.

For a programming language represented in XML, isn't the DTD (or another
XML schema language) your grammar?

I.e., just use an XML validator tool and you don't need to write the
checking yourself. DTDs may not be enough; see other schema languages.
 
Diez B. Roggisch

stefaan said:
I see your point. But yacc does more: I specify a grammar, and yacc will
reject input files that do not conform to the grammar.
Elementtree OTOH will happily accept any well-formed XML file; all checking
has to be implemented manually by me.

First of all: nearly all parsers accept syntactically more than their
language's semantics actually admit. So you will have to hand-code
certain checks (most of the time context-sensitive ones). But this is a
small digression.
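The following sketch shows such a context-sensitive check. It assumes a hypothetical flat language with `<set var="..."/>` and `<get var="..."/>` statements executed in document order. The rule "every variable must be set before it is read" depends on document order, which a DTD content model does not capture, so checks like this are typically hand-coded. The function name `undefined_reads` is illustrative.

```python
import xml.etree.ElementTree as ET


def undefined_reads(program):
    """Return variable names read by <get> before any <set> has defined them."""
    defined = set()
    problems = []
    for stmt in program.iter():  # document order, root included
        if stmt.tag == "set":
            defined.add(stmt.get("var"))
        elif stmt.tag == "get" and stmt.get("var") not in defined:
            problems.append(stmt.get("var"))
    return problems


prog = ET.fromstring(
    '<program><set var="a" value="1"/><get var="a"/><get var="b"/></program>'
)
print(undefined_reads(prog))  # ['b']
```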

What you are after, then, is a validating parser, not just a check for
well-formed XML documents.

I'm not sure where ElementTree stands regarding this, but I think 4suite
offers DTD, W3C XML Schema and RELAX NG support.

All of these are grammar specifications that let you define the
structure of your XML documents with more constraints.

Diez
 
Jarek Zgoda

Diez B. Roggisch wrote:
I'm not sure where ElementTree stands regarding this, but I think 4suite
offers DTD, W3C XML Schema and RELAX NG support.

This varies by implementation. Fredrik Lundh's original implementation
is based on the expat parser, so it cannot check anything beyond the
well-formedness of a document. AFAIK the lxml implementation exposes
some of libxml2's abilities to check a document's conformance to an
XML Schema definition.
 
stefaan

Diez B. Roggisch said:
All of these are grammar specifications that let you define the
structure of your XML documents with more constraints.

OK, I should have foreseen the schema-checker answer... My point really
is that yacc can do even more than just check conformance to a grammar.

It also allows me to specify semantic actions,
e.g. to help in building an abstract syntax tree from
the concrete syntax tree, or to implement a very basic
interpreter...

mock example:
<output><sum arg1="a" arg2="b"/></output>

No schema checker can take this specification and simply output "22".
XSLT might be able to implement it, but it is complex for anything
real-life. Elementtree can immediately give me the concrete syntax tree,
but any semantic actions have to be implemented during a
manually programmed tree traversal.
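Here is a minimal sketch of "semantic actions during a manually programmed tree traversal" for the mock example. The thread never says where `a` and `b` get their values, so the bindings `a = 10` and `b = 12` (which give "22") are purely an assumption. The `ACTIONS` table and the function names are also assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed variable bindings; the mock example does not define a and b.
ENV = {"a": 10, "b": 12}


def eval_sum(elem, env):
    # Semantic action for <sum>: look up both arguments and add them.
    return env[elem.get("arg1")] + env[elem.get("arg2")]


def eval_output(elem, env):
    # Semantic action for <output>: evaluate each child, one result per line.
    return "\n".join(str(ACTIONS[child.tag](child, env)) for child in elem)


# Tag name -> semantic action, applied during a hand-written traversal.
ACTIONS = {"sum": eval_sum, "output": eval_output}


def run(text, env):
    root = ET.fromstring(text)
    return ACTIONS[root.tag](root, env)


print(run('<output><sum arg1="a" arg2="b"/></output>', ENV))  # 22
```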

Anyway, it is not urgent for me, I have something which works,
it just seems like something's missing still from the
existing XML tool collection. Or I am being thick-headed ;)
 
Jarek Zgoda

stefaan wrote:
No schema checker can take this specification and simply output "22".
XSLT might be able to implement it, but it is complex for anything
real-life. Elementtree can immediately give me the concrete syntax tree,
but any semantic actions have to be implemented during a
manually programmed tree traversal.

Don't you think the lex/yacc combo is complex too, for anything
real-life? The "XML tree simplification implementations" (as Elementtree
can be considered) have other complex tasks to do. Usually implementors
look at things from a document-processing perspective, and the interface
is tailored to such tasks. ;)
 
greg

Diez said:
What you are after, then, is a validating parser, not just a check for
well-formed XML documents.

I'm not sure where ElementTree stands regarding this, but I think 4suite
offers DTD, W3C XML Schema and RELAX NG support.

So he's effectively written his own validating parser, which
is a legitimate thing to do. His programming language likely
has constraints that can't be easily expressed using any of
the standard W3C buzzword-compliant validation mechanisms.

Also his validator has a chance to do other useful computation
along the way, such as semantic analysis and/or code generation.
That stuff has to be done anyway, and validation sort of comes
out of that for free. So inserting an extra validation step
might not be of any advantage.
 
stefaan

Jarek Zgoda wrote:
Don't you think the lex/yacc combo is complex too, for anything
real-life?

If real-life means C++, then yes, it is impossible :)
If real-life means some domain-specific language, then it is OK.

Jarek Zgoda wrote:
The "XML tree simplification implementations" (as Elementtree
can be considered) have other complex tasks to do.

I fully agree. Perhaps I need a validating parser with user-definable
hooks for semantic actions. This would be a layer on top of
Elementtree.
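One possible shape for such a layer is sketched below under stated assumptions. Each hypothetical `Rule` lists the child tags an element may contain, plus an action that receives the element and its children's already-computed values. Actions therefore fire bottom-up, roughly like yacc actions on each reduction. This sketch does not claim that anything like it ships with ElementTree. `Rule`, `HookedParser` and the `<sum>`/`<num>` tags are illustrative.

```python
import xml.etree.ElementTree as ET


class Rule:
    """A hypothetical grammar rule: permitted child tags plus a semantic action."""

    def __init__(self, children, action):
        self.children = set(children)
        self.action = action  # called as action(elem, child_values)


class HookedParser:
    """Validate an ElementTree tree against rules, firing actions bottom-up."""

    def __init__(self, rules):
        self.rules = rules

    def parse(self, elem):
        rule = self.rules.get(elem.tag)
        if rule is None:
            raise ValueError("unknown element <%s>" % elem.tag)
        values = []
        for child in elem:
            if child.tag not in rule.children:
                raise ValueError("<%s> not allowed inside <%s>" % (child.tag, elem.tag))
            values.append(self.parse(child))
        # Like a yacc action on reduction: the children are already evaluated.
        return rule.action(elem, values)


rules = {
    "sum": Rule({"num", "sum"}, lambda elem, values: sum(values)),
    "num": Rule((), lambda elem, values: int(elem.get("value"))),
}
parser = HookedParser(rules)
print(parser.parse(ET.fromstring('<sum><num value="20"/><num value="2"/></sum>')))  # 22
```

Validation and evaluation happen in the same pass here. That echoes greg's point that validation can come out of the semantic processing for free.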
 
Diez B. Roggisch

stefaan said:
OK, I should have foreseen the schema-checker answer... My point really
is that yacc can do even more than just check conformance to a grammar.

It also allows me to specify semantic actions,
e.g. to help in building an abstract syntax tree from
the concrete syntax tree, or to implement a very basic
interpreter...

mock example:

<output><sum arg1="a" arg2="b"/></output>

No schema checker can take this specification and simply output "22".
XSLT might be able to implement it, but it is complex for anything
real-life. Elementtree can immediately give me the concrete syntax tree,
but any semantic actions have to be implemented during a
manually programmed tree traversal.

Yep, they have. But to be brutally honest: I haven't felt the need to use
semantic actions when working with e.g. ANTLR. IMHO they only work for small
examples like the one above. Mixing the syntactic structure definition
with "real" code gets really messy, and it makes you very rigid with
regard to even small grammar changes.

The moment things get more complex, you want an AST, and you work on
that. That is much easier and more robust, even if you alter your
grammar a bit.

And XML _is_ your AST, and working on it means... writing code.
*If* there were anything like yacc's semantic actions for XML, it would be
an extension to XSD or some other schema language. I'm not aware of such
a beast.

Diez
 
