When little languages grow...

  • Thread starter Hugh Sasse Staff Elec Eng
  • Start date
H

Hugh Sasse Staff Elec Eng

Is the "mini-language" given or you can build your own "mini-language"?

Build my own.
If you can write your own language, then you should consider writing a
DSL suitable for your problem a la Rails, and then, let the Ruby

I've not grokked Rails yet, but yes, it is Domain Specific. The
problem is not designing the language (because I know the sort of
things I want to say); the problem is how to parse it with some
abstract machine such that I can maintain that machine and improve
it. Also I don't want to open up security holes unneccesarily:
"Deliver us from Eval". One doesn't intend to get hostile users, but
the possibilty is there.
interpreter do the whole job.

Kind Regards,
Ed

Thank you,
Hugh
 
T

Trans

Hugh, I sent you an email with parser.rb attached, but I'm not sure if
I had the right email address. Let me know if you got it thanks.

This is being sent via google groups, so I assume it will get to you
;-)

T.
 
C

Clifford Heath

Mark said:
My personal solution to this is to use Coco/R, an LL(1) scanner/generator.

I haven't used Coco, but I'd second Mark's recommendation to stay with
an LL(1) parser if possible. If not, then LL(n) ala ANTLR, but not LALR
ala yacc. The LL parsers are easy to write manually using recursive
descent, which I've done a few times.
[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. ... Maybe there is something I can read which
will turn the problem around, so it becomes easy to handle?

Shift just means "delay a decision about what I've just seen". Reduce is
the operation you do when you do decide. If you explore the ambiguity
in your grammar rules, these start to make more sense.
The primary disadvantage of Coco/R is the LL(1) part.

ANTLR does LL(n) for arbitrary n I believe - though you should avoid
n > 3 or humans start to have trouble parsing your language :).

It's a shame that the ANTLR folk at Purdue went Java-only when they
dropped their old C-based implementation. A multi-lingual ANTLR would
be super-cool, especially if it would generate Ruby.
As an
example, Ruby can not, as far as I have tried, be converted into an LL(1)
grammar, though C can.

Not without a tie-in to the lexical analyser to help recognise goto
labels, which require LL(2). Such a tie-in is commonly used however.
A simple example of the ruby grammar

Good example, thanks Mark.

I should point out that the major reason for the success of XML
(contrary to most of the hyped claims about it) is that it allows
people to create languages without having to create parsers. Or
rather, they use an XML parser which yields a DOM, and can process
the AST at will.

If you can live with the ugliness of XML and the size&speed of Rexml,
you should consider it.

There's no good reason why a language like Ruby shouldn't have
grammar rules as first-class objects (as Regexp's are), yielding
Ruby objects that reflect the AST, allowing attribute-grammar
parsers to be written and integrated directly within a program.

Such a tool, integrated into the Ruby interpreter itself, would
allow extension modules to define *Ruby syntax extensions*, so
that the language itself becomes plastic.

I haven't thought much about what these last two features would
look like in Ruby's case.

Clifford Heath.
 
C

Carlos

My present example is that I want to parse Constructive Solid
Geometry descriptions, at the moment limited to cones, spheres,
bricks with the co-ordinates specified, and I need to specify
material types as well.

I'd also like to be able to declare new objects so they can be
placed. Silly example: Get two small spheres to cap the ends of
a cylinder and call the result a Sausage. Then place several
Sausages in the space at different points.

Later I'd have to extend the language to be able to rotate them into
psoition. Lots of creeping featurism is likely, I suspect. I
didn't have material types to deal with before.

How about using POV-ray's scene description language?

Here somebody did a yacc parser for that language:
http://www.io.com/~wwagner/pov.html
(skip to header "The Parser"). Maybe you can adapt it to use with racc.

(I don't know about the licence.)

Good luck.
 
M

Mathieu Bouchard

_ Of course it was vast overkill for 99% of what you need to
do with that kind of system. As much as I dislike Tcl for other
reasons, it is a very good language for extending applications
via simple scripting. Adding your own specialized commands is
fairly straightforward. I know quite a few scientific groups
that are using Python to do similar things these days.

Another thing I like about Tcl is that it has _no_ special cases. Its
syntax, while superficially like shell scripts, is deeply like Lisp
(actually simpler: think Scheme with a few details backwards).

The equivalent of Ruby's "def" in Tcl is "proc". Well, proc is a
procedure, so you can redefine it, so as to add features to it.

That kind of goodies is the things that make Tcl quite bearable to me,
despite lacking a damn lot features (I originally switched to Perl from
Tcl)

I still have to work with Tcl because of a GUI made in Tk, but I'm quite
happy with it.

_____________________________________________________________________
Mathieu Bouchard -=- Montréal QC Canada -=- http://artengine.ca/matju
 
S

Sam Roberts

Quoteing (e-mail address removed), on Thu, Jan 27, 2005 at 02:08:04AM +0900:
I seem to have run into my parsing problem again. Whatever I'm
doing I usually end up having to parse non-simplistic input, and I'm
still not happy about the apparently available solutions to this.
So I'm wondering what other people do.

The application is immaterial at the moment, but the problem is that
I need to do more than can be done with a simple case statement, and
if I were to use case statements managing the problem would get too
big.

The conventional wisdom is to use some form of parser
generator (Yacc, Bison, Racc, Rockit,...) but I don't have
confidence in my ability to get these working well.[1].
I have had great difficulty in the past, certainly.
snip

[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. This might just be a weakness on my
part or may have something to do with people's difficulties in
handling modal interfaces: it is hard to switch contexts rapidly.
Maybe there is something I can read which will turn the problem
around, so it becomes easy to handle?

Hi Hugh.

Two suggestions:

Most of the docs for parser-generators assume some (lots?) knowledge of
compiler theory. An exception is the O'Reilly Lex&Yacc book. If any of
the generators you mention above is anything like Yacc, I think you
might like the book. Its at a practical level, it assumes you are a
competent programmer, you just haven't taken a compiler course. The
first few chapters got me up to speed very quickly (I think you develop
a calculator). After that, the example used is generating a
mini-language to specify curses screen layouts. I don't do any curses
stuff, but the examples were used to good effect.

I think that if you do this often, mastering one of these tools will be
a life skill you'll never regret! And yacc (or one of the others) is a
mini-language in its own right, always good to learn another.

On another track, I have deliberately NOT used these grammers when
parseing the BNF for internet email. In my experience, if you can write
the grammer down in BNF, its usually pretty easy to write a recursive
descent parser for it. It's a little tedious though, which is why all
the tools exist to generate the code for you...

Have fun,
Sam
 
D

Dave Burt

ts said:
H> Thread.new(input){|source|
H> $SAFE=5
H> instance_eval source
H> }.value

Sorry to say this but this is the most common error that I see when
someone try to eval some code with $SAFE >= 4

The code will be eval'ed with $SAFE >= 4 but the result (#value) will be
used with $SAFE = 0 and you can have problems.

The result of #eval must be cleaned with $SAFE >= 4, before it's
returned.

Hi,

I'm having trouble understanding what you mean by this. Can you say it in
Ruby?

Cheers,
Dave
 
K

Kaspar Schiess

(In response to @brains.eng.cse.dmu.ac.uk by Hugh Sasse Staff Elec Eng)
I seem to have run into my parsing problem again. Whatever I'm
doing I usually end up having to parse non-simplistic input, and I'm
still not happy about the apparently available solutions to this.
So I'm wondering what other people do.

Just a few random thoughts:

- All of this depends on how complex your mini-language is. Should it be
Turing complete or just really something like fstab ? See
http://www.faqs.org/docs/artu/ch08s01.html (a chapter from the Art of
Unix Programming, including a taxonomy of languages).

- If your language needs loopiness and control constructs like Ruby has
(or should look like Ruby), then you could help me think about a cut-down
version of Ruby for configuration files: This version of Ruby should be
accessible from within Ruby and be totally safe (=cut off of the external
world). Such a thing would have to be execution timeouted (endless loops
and stack overflows) too...

- Your problem was essentially the topic of one of my semester projects.
I tried out a lot of different parser generators, for different
languages. None really could score in the category of simplicity.
Probably just a complex subject. But the problem of having to create a
minilanguage pops up all the time, as you say.

- Parse error reporting: A lot of programming languages have bad parse
error reporting, which essentially comes from the fact that the parser
generator does not help you in this. ANTLR is a bit better there, but IMO
still sucks.

I think there are two really good ideas in this thread:
a) Create and Integrate Grammars as first class objects in Ruby.
(Clifford Heath)
b) Use a cut-down version of Ruby that is designed with security in mind
(various people, this has been done).

I can't help feeling that we're missing a really simple way to do all of
this ...

kaspar

hand manufactured code - www.tua.ch/ruby
 
H

Hugh Sasse Staff Elec Eng

I haven't used Coco, but I'd second Mark's recommendation to stay with
an LL(1) parser if possible. If not, then LL(n) ala ANTLR, but not LALR
ala yacc. The LL parsers are easy to write manually using recursive
descent, which I've done a few times.

I dug out my copy of "Crafting a Compiler", ISBN 0-8053-3021-4
last night, and agree that LL(1) is simpler because it can be done
from a fairly straightforward table. Yacc and Bison are LALR I
think.
[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. ... Maybe there is something I can read
which will turn the problem around, so it becomes easy to handle?

Shift just means "delay a decision about what I've just seen". Reduce is
the operation you do when you do decide. If you explore the ambiguity
in your grammar rules, these start to make more sense.

That's a good description, but it's difficult to hold this
state in one's, (or is that just my?), head
ANTLR does LL(n) for arbitrary n I believe - though you should avoid
n > 3 or humans start to have trouble parsing your language :).

It's a shame that the ANTLR folk at Purdue went Java-only when they
dropped their old C-based implementation. A multi-lingual ANTLR would
be super-cool, especially if it would generate Ruby.

There is also PCCTS

http://dynamo.ecn.purdue.edu/~hankd/PCCTS/

but, although very flexible, it is not what I'd call simple. I've
not really retained what I read about it, but I remember it is a
good design of something with sufficient complexity to meet its
goals.
Not without a tie-in to the lexical analyser to help recognise goto
labels, which require LL(2). Such a tie-in is commonly used however.


Good example, thanks Mark.

I should point out that the major reason for the success of XML
(contrary to most of the hyped claims about it) is that it allows
people to create languages without having to create parsers. Or
rather, they use an XML parser which yields a DOM, and can process
the AST at will.

This is a good point. I'd not really given much thought to XML.
I'm thinking of this for describing problems expressed in
constructive solid geometry, and XML is pretty unpleasant to edit by
hand. I've not got into XSLT, but maybe there {is, could be} a utility that
could take something with Rubyesque blocks and transform it into
XML. I'm not in a position to start writng it, and, of course, I'd
have to parse the input....

"Hence, by induction..."
If you can live with the ugliness of XML and the size&speed of Rexml,

Rexml does make XML handling nice, I have to say.
you should consider it.

There's no good reason why a language like Ruby shouldn't have
grammar rules as first-class objects (as Regexp's are), yielding
Ruby objects that reflect the AST, allowing attribute-grammar
parsers to be written and integrated directly within a program.

"That's a Rite good idea, is that!"
Such a tool, integrated into the Ruby interpreter itself, would
allow extension modules to define *Ruby syntax extensions*, so
that the language itself becomes plastic.

I haven't thought much about what these last two features would
look like in Ruby's case.

Clifford Heath.
Thank you,
Hugh
 
H

Hugh Sasse Staff Elec Eng

My present example is that I want to parse Constructive Solid
Geometry descriptions, at the moment limited to cones, spheres,
[...]

How about using POV-ray's scene description language?

Here somebody did a yacc parser for that language:
http://www.io.com/~wwagner/pov.html
(skip to header "The Parser"). Maybe you can adapt it to use with racc.

Thanks, I'll take a look at that.
(I don't know about the licence.)

Good luck.
Hugh
 
H

Hugh Sasse Staff Elec Eng

Hi Hugh.

Two suggestions:

Most of the docs for parser-generators assume some (lots?) knowledge of
compiler theory. An exception is the O'Reilly Lex&Yacc book. If any of
[...]
Thanks, I'll see if I can get my hands on this. I've read a few books
about these but that description sounds good.
I think that if you do this often, mastering one of these tools will be
a life skill you'll never regret! And yacc (or one of the others) is a
mini-language in its own right, always good to learn another.

Agreed, I'd really like to be fluent in it.
On another track, I have deliberately NOT used these grammers when
parseing the BNF for internet email. In my experience, if you can write
the grammer down in BNF, its usually pretty easy to write a recursive
descent parser for it. It's a little tedious though, which is why all
the tools exist to generate the code for you...

According to "Crafting a Compiler" ISBN 0-8053-3201-4 it seems that
recursive descent implies LL(1), so I think I'd agree with aiming
for that constraint. It describes tecniques for removing the
ambiguity that increases LL(1) to LL(k), so often this is doable.
Have fun,
Sam
Thank you,
Hugh
 
H

Hugh Sasse Staff Elec Eng

Hi,

I'm having trouble understanding what you mean by this. Can you say it in
Ruby?

My understanding is

valid_input = Thread.new(input){|source|
$SAFE=5
result = instance_eval source
final_result = validate(result)
}.value

would be better than:

result = Thread.new(input){|source|
$SAFE=5
result = instance_eval source
}.value
final_result = validate(result) # OOPS!!!

because the validation takes places while "the safety catch is on"
in the first of these two cases.
Cheers,
Dave
Hugh
 
T

ts

H> My understanding is

H> valid_input = Thread.new(input){|source|
H> $SAFE=5
H> result = instance_eval source
H> final_result = validate(result)
H> }.value

Well, a better example is

uln% cat b.rb
#!/usr/local/bin/ruby
source = <<EOT
a = Object.new
class << a
def to_s
puts "hello :)"
super
end
end
a
EOT

puts Thread.new { $SAFE = 4; eval source }.value
uln%

uln% b.rb
hello :)
#<Object:0x2a955c0738>
uln%


i.e. you *MUST* validate the object with $SAFE = 4 and never trust it.



Guy Decoux
 
H

Hugh Sasse Staff Elec Eng

(In response to @brains.eng.cse.dmu.ac.uk by Hugh Sasse Staff Elec Eng)
I seem to have run into my parsing problem again. [...]
So I'm wondering what other people do.

Just a few random thoughts:

- All of this depends on how complex your mini-language is. Should it be
Turing complete or just really something like fstab ? See
http://www.faqs.org/docs/artu/ch08s01.html (a chapter from the Art of
Unix Programming, including a taxonomy of languages).

Interesting that declarative languages are considered less general,
but that may be an artifact of those selected for the figure.
- If your language needs loopiness and control constructs like Ruby has

Mine doesn't, but...
(or should look like Ruby), then you could help me think about a cut-down
version of Ruby for configuration files: This version of Ruby should be

Yes, I'd like one of those. Even having Lua Tables as a native data
type to Ruby would be a big help, but this would be good to have for
the more general case.
,
accessible from within Ruby and be totally safe (=cut off of the external
world). Such a thing would have to be execution timeouted (endless loops
and stack overflows) too...

Good idea.
- Your problem was essentially the topic of one of my semester projects.
I tried out a lot of different parser generators, for different
languages. None really could score in the category of simplicity.
Probably just a complex subject. But the problem of having to create a
minilanguage pops up all the time, as you say.

Thanks for this confirmation that I'm still on the right track! :)
- Parse error reporting: A lot of programming languages have bad parse
error reporting, which essentially comes from the fact that the parser
generator does not help you in this. ANTLR is a bit better there, but IMO
still sucks.

Yes it is difficult. I think you have to sprinkle '.' all over yacc
grammars to do this.
I think there are two really good ideas in this thread:
a) Create and Integrate Grammars as first class objects in Ruby.
(Clifford Heath)
b) Use a cut-down version of Ruby that is designed with security in mind
(various people, this has been done).

I can't help feeling that we're missing a really simple way to do all of
this ...

I think so too, but I can't see the wood for the trees...
kaspar

hand manufactured code - www.tua.ch/ruby

Hugh
 
D

Dave Burt

Hugh Sasse Staff Elec Eng said:
My understanding is

valid_input = Thread.new(input){|source|
$SAFE=5
result = instance_eval source
final_result = validate(result)
}.value

would be better than:

result = Thread.new(input){|source|
$SAFE=5
result = instance_eval source
}.value
final_result = validate(result) # OOPS!!!

because the validation takes places while "the safety catch is on"
in the first of these two cases.

So essentially, the problem is that the result of eval here is tainted
(untrusted input) and needs to be treated with appropriate care. If it was
used at $SAFE=0, the result object would have normal Ruby priveleges. So the
safe thing to do is to only access that result with $SAFE in place, and
maybe to create a new object with valid data taken from the result to use
later.

Thanks for the explanation.

Cheers,
Dave
 
D

Dave Burt

ts said:
Well, a better example is

uln% cat b.rb
#!/usr/local/bin/ruby
source = <<EOT
a = Object.new
class << a
def to_s
puts "hello :)"
super
end
end
a
EOT

puts Thread.new { $SAFE = 4; eval source }.value
uln%

uln% b.rb
hello :)
#<Object:0x2a955c0738>
uln%


i.e. you *MUST* validate the object with $SAFE = 4 and never trust it.

Like this?

Thread.new { $SAFE = 4; puts eval(source) }

Or like this?

puts Thread.new { $SAFE = 4; String.new(eval(source).to_s) }.value

Both of these generate the SecurityError that your example above bypasses by
running at $SAFE=0.

Dave
 
T

ts

D> puts Thread.new { $SAFE = 4; String.new(eval(source).to_s) }.value

Yes, like this

If you want a String, you must call String::new

You can even write it

puts Thread.new { $SAFE = 4; String.new(eval(source)) }.value

if you accept only String object for the result


Guy Decoux
 
M

Mark Probert

Hi ..

There is also PCCTS
PCCTS was the predecessor of ANTLR. I haven't played with either for quite
some time but I seem to remember that PCCTS is written in C and could
generate C or C++, whilst ANTLR is in Java and generates Java, C, C++ and C#.
Also, I think that Dr Parr rolled the lexer into ANTLR in release 2 (a la
Coco).

A quick search turned up a few other LL(k) generators, though most seem to be
directed at Java and C#.
but, although very flexible, it is not what I'd call simple.
That's why I liked Coco/R. A single pass lexer/scanner and a "readable"
grammar. Plus it is fast and efficient.

For most of my work, the LL(1) restriction hasn't been an issue. The only
problem came when I tried to parse Ruby code for fun. Not that I need to do
that for real, after all, there is eval() ;-)
Forth? ;-) it's a case of back to the future ..

Regards,
 
T

tony summerfelt

Another thing I like about Tcl is that it has _no_ special cases.

i'm not sure you'd define them as 'special cases' but tcl can be
quirky if you come from a language like perl:

set x 1

and to refer to it you use '$' so:

puts $x

but to increment it you'd do:

incr x

ie. no '$' used.

and the comment thing is a bit annoying for uneven braces in the same
line...


Its
That kind of goodies is the things that make Tcl quite bearable to me,
despite lacking a damn lot features

which features is it missing for you?
I still have to work with Tcl because of a GUI made in Tk, but I'm quite
happy with it.

tk is easily one of the quicker ways to crank out a gui for an
application in tcl...

although i'm really starting to like rubywebdialogs gui code...

http://home.cogeco.ca/~tsummerfelt1
telnet://ventedspleen.dyndns.org
 
C

Clifford Heath

Mark said:
PCCTS was the predecessor of ANTLR.

Actually, it was the toolset that contained the original ANTLR.
When they decided they needed a Java version, they didn't go
multi-lingual - strange decision for a language research group!

Another suite of language tools we used successfully a few years
back was Eli. Afraid I can't remember much about it now, except
there seemed to be too many tools insufficiently well integrated.

I also had fun with TXL, but that's a language-transformation
language not a compiler generator. It's only applicable when the
input and output grammars are identical or can be structurally
fused - very difficult.
Forth? ;-) it's a case of back to the future ..

No, I mean syntax extensions - not choosing a language with an
almost complete absence of syntax :). So you could define useful
and appropriate domain-specific-languages...

Hugh said:
it's difficult to hold this state in one's, (or is that just my?),
head

No, I agree. The reason is that the grammar rules are transformed
first into simple rules each containing only *two* items. If you
look into how that transformation is done, you'll know what's
happening for a given grammar. The state is much more obvious then,
because it's defined in terms of these atomic rules.

For example, a rule A ::= B C D gets turned into two rules, either
B_C ::= B C
A ::= B_C D
or
C_D ::= C D
A ::= B C_D

depending on precedence. After processing all rules, there will
normally be a number of duplicates, ambiguities and other problems,
so there's about five more stages of checking and massaging until
a parser can be generated. Unfortunately the individuals who are
capable of implementing these processes are so adept at linguistics
that they can't explain the processes in terms that we mortals can
understand :)... though I'm sure I almost understood it once... :)

Clifford.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,265
Latest member
TodLarocca

Latest Threads

Top