When little languages grow...

  • Thread starter Hugh Sasse Staff Elec Eng

Hugh Sasse Staff Elec Eng

I seem to have run into my parsing problem again. Whatever I'm
doing I usually end up having to parse non-simplistic input, and I'm
still not happy about the apparently available solutions to this.
So I'm wondering what other people do.

The application is immaterial at the moment, but the problem is that
I need to do more than can be done with a simple case statement, and
if I were to use case statements, managing the problem would get too
big.
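When the input really is one verb followed by a few words, the usual step beyond a growing case statement is a dispatch table. The sketch below is only an illustration: the verbs and the names `COMMANDS` and `dispatch` are hypothetical. Adding a verb costs one more table entry. Nothing in this shape can express nesting or user-defined objects, though, and that is the point where a grammar or an embedded DSL becomes necessary.

```ruby
# Hypothetical verbs: each handler receives the words that follow the verb.
COMMANDS = {
  "sphere" => ->(radius) { { kind: :sphere, radius: Float(radius) } },
  "brick"  => ->(x, y, z) { { kind: :brick, size: [x, y, z].map { |v| Float(v) } } },
}.freeze

# Look the verb up in the table instead of walking a long case statement,
# checking the argument count against the handler's arity.
def dispatch(line)
  verb, *args = line.split
  handler = COMMANDS.fetch(verb) { raise ArgumentError, "unknown command #{verb.inspect}" }
  unless handler.arity == args.size
    raise ArgumentError, "#{verb} expects #{handler.arity} argument(s), got #{args.size}"
  end
  handler.call(*args)
end
```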

The conventional wisdom is to use some form of parser
generator (Yacc, Bison, Racc, Rockit,...) but I don't have
confidence in my ability to get these working well.[1]
I have had great difficulty in the past, certainly.

Other possibilities I have considered and tried are to lash together
some form of Lisp [cf Greenspun's 10th rule of programming] or Forth,
but I don't consider myself fluent in either of those languages, and
they are not as easy a user interface for other people as Ruby would
be. I can get something working, but find it hard to maintain or
improve. [2]

So the next possibility is to use something like

input = nil
File.open("input.txt"){|f|
  input = f.read
}
Thread.new(input){|source|
  $SAFE=5
  instance_eval source
}.value

or something, and actually make the commands in the language into methods
of some Ruby object.

It is often observed that it is difficult to add security to a
system, compared to building it in from the start. Can I do this
and still have a good level of security? Should I make the parser
object (whose methods I'm using) a subclass of NilClass, to limit it as
much as possible? I need to give people enough rope to hold their
input together, but not enough to hang themselves (or me). I don't
want people to be able to execute arbitrary code, or fiddle with
objects they should not need to touch.
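On the "subclass of Nil" question: NilClass has no usable `new`, so a subclass of it could not be instantiated in the usual way. Since Ruby 1.9 there is BasicObject, which has almost no methods of its own and does not include Kernel. That comes close to the minimal parser object described above. Here is a minimal sketch, assuming a modern Ruby. The names (`CommandSet`, `sphere`, `cone`, `run_commands`) are hypothetical.

```ruby
# Hypothetical command receiver (BasicObject needs Ruby 1.9+). BasicObject does
# not include Kernel, so calls such as `system` or `open` in the input raise
# NoMethodError. This narrows the vocabulary only; it is NOT a security
# boundary (the input can still name ::Kernel or ::File explicitly).
class CommandSet < BasicObject
  def initialize
    @commands = []
  end

  # The only "words" of the little language.
  def sphere(radius)
    @commands << [:sphere, radius]
  end

  def cone(radius, height)
    @commands << [:cone, radius, height]
  end

  def commands
    @commands
  end
end

# Evaluate the input against a fresh CommandSet and return what the trusted
# methods recorded, ignoring whatever value the input's last line produces.
def run_commands(source)
  set = CommandSet.new
  set.instance_eval(source)
  set.commands
end
```

`run_commands` returns only what the trusted methods recorded. The caller therefore never depends on whatever object the input's last expression happens to produce. The argument values still come from the input, however, and would need checking.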

Is there another way to handle input flexibly that I have completely
missed? I've googled for things to do with little languages and
parsing, but have found nothing enlightening.

Thank you,
Hugh

[1] I find that thinking in the manner of a shift/reduce parser is
particularly unnatural to me. This might just be a weakness on my
part or may have something to do with people's difficulties in
handling modal interfaces: it is hard to switch contexts rapidly.
Maybe there is something I can read which will turn the problem
around, so it becomes easy to handle?

[2] Immensely powerful and fast systems have been written in Forth,
and Lisp is very powerful in the right hands. I just don't have the
experience with these to be effective, yet.
 

Zach Dennis

To make sure that I and the rest of the list are correctly hearing your
question: you are interested in the Ruby way to write a code generator?

And you are looking for input on other parsing approaches or implemented
solutions that others may have experience with, in languages and tools
such as Lisp, Forth, Yacc, Bison, Racc, etc.?

Zach



 

ts

H> Thread.new(input){|source|
H> $SAFE=5
H> instance_eval source
H> }.value

Sorry to say this, but this is the most common error that I see when
someone tries to eval some code with $SAFE >= 4.

The code will be eval'ed with $SAFE >= 4, but the result (#value) will be
used with $SAFE = 0, and you can have problems.

The result of #eval must be cleaned with $SAFE >= 4 before it's
returned.


Guy Decoux
 

Hugh Sasse Staff Elec Eng

> To make sure myself and the rest of the list is correctly hearing your
> question. You are interested in the ruby way to write a code generator?

There's probably more than one way, but yes. I don't need to
generate an executable for later use, so interpreting my input is
fine. I need to manage the complexity so I can cope with future
expansion, if any.

> And you are looking for input other parsing or implemented solutions others
> may have experience with, with languages and tools such as; lisp, forth, yacc,
> bison, racc, etc.. ?

I'm looking to do this in Ruby. Experience of things that simplify
this, whether they come from other languages or not, is what I am
after. I mention the other languages because I have tried their
styles of handling this problem. My success has been limited. So,
what can I do for more success? :) Look at the problem differently?
Use another technique?

Hope that is clearer,
Thank you,
Hugh
 

Hugh Sasse Staff Elec Eng

> H> Thread.new(input){|source|
> H> $SAFE=5
> H> instance_eval source
> H> }.value
>
> Sorry to say this but this is the most common error that I see when
> someone try to eval some code with $SAFE >= 4
>
> The code will be eval'ed with $SAFE >= 4 but the result (#value) will be
> used with $SAFE = 0 and you can have problems.

Yes, that's a good point.

> The result of #eval must be cleaned with $SAFE >= 4, before it's
> returned.
>
> Guy Decoux

Thank you. That would be better than cleaning it afterwards; I'd
not really considered that risk.

Hugh
 

Robert Klemme

Hugh,

the one thing I didn't see in your posting is a statement about the
language. What capabilities should it have? If it's just assigning
constants to vars (like often needed for configurations) then Regexp is
probably fine. From what you write I'm guessing that your envisioned
language is more complex - but how complex? Maybe it's a special case for
which someone somewhere has a solution already.
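For the simple configuration case, a line-oriented Regexp parser is usually enough. Here is a minimal sketch, assuming a hypothetical `name = value` format with `#` comments. The names `parse_config`, `convert` and `ASSIGNMENT` are illustrative, not an existing library.

```ruby
# Hypothetical format: one `name = value` per line; `#` starts a comment.
# Caveat: the comment stripping below would also cut a `#` inside quotes.
ASSIGNMENT = /\A([A-Za-z_]\w*)\s*=\s*(.+)\z/

def parse_config(text)
  text.each_line.with_index(1).each_with_object({}) do |(line, lineno), vars|
    line = line.sub(/#.*/, "").strip # drop comments and surrounding blanks
    next if line.empty?
    m = ASSIGNMENT.match(line) or raise ArgumentError, "line #{lineno}: cannot parse #{line.inspect}"
    vars[m[1]] = convert(m[2])
  end
end

# Turn the right-hand side into an Integer, Float or String.
def convert(raw)
  case raw
  when /\A-?\d+\z/      then Integer(raw, 10) # base 10, so "010" is 10, not octal
  when /\A-?\d+\.\d+\z/ then Float(raw)
  when /\A"(.*)"\z/     then $1
  else raw
  end
end
```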

Regards

robert


 

why the lucky stiff

Hugh said:
> The application is immaterial at the moment, but the problem is that
> I need to do more than can be done with a simple case statement, and
> if I were to use case statements managing the problem would get too
> big.
>
> ...
>
> So the next possibility is to use something like
>
> input = nil
> File.open("input.txt"){|f|
> input = f.read
> }
> Thread.new(input){|source|
> $SAFE=5
> instance_eval source
> }.value
>
> or something, and actually make the commands in the language methods
> of some Ruby object.

I hear ya. I wish there was a way to open a jailed namespace. It'd be
like chroot'ing into a module. The sandbox module would be addressable
from ::Object, but would only include a limited set of modules when
offered under chroot.

_why
 

Mark Probert

Hi ..

> I seem to have run into my parsing problem again. Whatever I'm
> doing I usually end up having to parse non-simplistic input, and I'm
> still not happy about the apparently available solutions to this.
> So I'm wondering what other people do.

My personal solution to this is to use Coco/R, an LL(1) scanner/parser
generator. You can find more information at:

http://www.scifac.ru.ac.za/coco

The primary advantage of this approach, IMHO, is that all of the grammar /
scanning rules are in a single file (rather than the lex/yacc approach).
This makes the grammar quite easy to read and extend, once you are familiar
with the process. Ryan Davies has a pure Ruby version, and I have a Ruby
extension version. Both seem to work well for little languages.

> [1] I find that thinking in the manner of a shift/reduce parser is
> particularly unnatural to me. ... Maybe there is something I can read which
> will turn the problem around, so it becomes easy to handle?

Pat Terry has a book "Compilers and Compiler Generators" that covers LL(1)
(and other) topics very well. You can find it at:

http://www.scifac.ru.ac.za/compilers/

The primary disadvantage of Coco/R is the LL(1) part. This means that your
grammar needs to be fairly well formed and not arbitrarily complex. As an
example, Ruby can not, as far as I have tried, be converted into an LL(1)
grammar, though C can.

Here is a simple example grammar for my extension library (the famous
four-function calculator). Note that this generates a Ruby extension; once
you compile and link it, you can use it from Ruby like this:

# ---( test.rb )-------------
require 'Calc'

f = File.readlines("calc.inp")
t = Calc.new
t.run(f)

if t.success
  puts "parsed ok!"
  t.capture.each { |ans| puts " ans==#{ans}" }
else
  puts "Errors ::"
  t.errs.each { |err| puts " --> #{err}" }
end


# ---( calc.inp )-----------
var a,b,c,d;

write 1+(2*3)+4;
write 100/10;

a := 37-12-(4*5);
write a;
b := a*16;
write b*2



# ---( calc.atg )-----------
$C /* Generate Main Module */
COMPILER Calc

#define upcase(c) ((c >= 'a' && c <= 'z')? c-32:c)
int VARS[10000];

int get_spix()
{
  /* Maps a name to a VARS slot using only its first two letters, so e.g.
     "a" and "aa" share slot 0; digits in names are not handled. */
  char name[20];
  LEX_S(name, sizeof(name) - 1);
  if (strlen(name) >= 2)
    return 26*(upcase(name[1])-'A')+(upcase(name[0])-'A');
  else
    return (upcase(name[0])-'A');
}

int get_number()
{
  char name[20];
  LEX_S(name, sizeof(name) - 1);
  return atoi(name);
}

void new_var(int spix)
{
  VARS[spix] = 0;
}

int get_var(int spix)
{
  return VARS[spix];
}

void write_val(int val)
{
  char tmp[20];

  sprintf(tmp, "%d", val);
  t_capture_output(tmp);
}

void set_var(int spix, int val)
{
  VARS[spix] = val;
}

IGNORE CASE

CHARACTERS
  letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
  digit  = "0123456789".
  eol    = CHR(13) .
  lf     = CHR(10) .

COMMENTS
  FROM '--' TO eol

IGNORE eol + lf

TOKENS
  ident  = letter {letter | digit} .
  number = digit {digit} .

PRODUCTIONS
  Calc =
    [Declarations] StatSeq .

  Declarations
    =                                  (. int spix; .)
      'VAR'
      Ident <&spix>                    (. new_var(spix); .)
      { ',' Ident <&spix>              (. new_var(spix); .)
      } ';'.

  StatSeq =
    Stat {';' Stat}.

  Stat
    =                                  (. int spix, val; .)
      | "WRITE" Expr <&val>            (. write_val(val); .)
      | Ident <&spix> ":=" Expr <&val> (. set_var(spix, val); .) .

  Expr <int *exprVal>
    =                                  (. int termVal; .)
      Term <exprVal>
      { '+' Term <&termVal>            (. *exprVal += termVal; .)
      | '-' Term <&termVal>            (. *exprVal -= termVal; .)
      } .

  Term <int *termVal>
    =                                  (. int factVal; .)
      Fact <termVal>
      { '*' Fact <&factVal>            (. *termVal *= factVal; .)
      | '/' Fact <&factVal>            (. *termVal /= factVal; .)
      } .

  Fact <int *factVal>
    =                                  (. int spix; .)
      Ident <&spix>                    (. *factVal = get_var(spix); .)
      | number                         (. *factVal = get_number(); .)
      | '(' Expr <factVal> ')' .

  Ident <int *spix>
    = ident                            (. *spix = get_spix(); .) .

END Calc.


I hope that this helps.

Regards,
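For comparison, the calculator grammar in calc.atg can also be written by hand as a recursive-descent parser in pure Ruby. Each production becomes one method, and each method decides which branch to take from the next token alone, which is what the LL(1) restriction buys. This is a sketch, not Coco/R output or Mark's extension; `CalcInterpreter` and `CalcError` are hypothetical names. Like the semantic actions in calc.atg, it evaluates while it parses.

```ruby
require 'strscan'

class CalcError < StandardError; end
# Hypothetical recursive-descent interpreter for the calc.atg grammar: one
# method per production, each choosing its branch from the next token only.
class CalcInterpreter
  # Whitespace and '--' comments, then numbers, identifiers and operators.
  TOKEN = /\s+|--[^\n]*|(\d+)|([A-Za-z][A-Za-z0-9]*)|(:=|[-+*\/();,])/
  attr_reader :output

  def initialize(source)
    @tokens = tokenize(source)
    @pos, @vars, @output = 0, {}, []
  end

  # Calc = [Declarations] StatSeq .   StatSeq = Stat {';' Stat} .
  def run
    declarations if peek == "var"
    separated(";") { stat }
    raise CalcError, "unexpected #{peek.inspect}" unless peek.nil?
    self
  end

  private
  def tokenize(source)
    scanner = StringScanner.new(source)
    tokens = []
    until scanner.eos?
      scanner.scan(TOKEN) or raise CalcError, "bad input near #{scanner.rest[0, 10].inspect}"
      # Identifiers are downcased to mimic IGNORE CASE; blanks and comments give nil.
      tok = scanner[1]&.to_i || scanner[2]&.downcase || scanner[3]
      tokens << tok if tok
    end
    tokens
  end

  def peek; @tokens[@pos]; end
  def advance; @pos += 1; @tokens[@pos - 1]; end
  def expect(tok)
    raise CalcError, "expected #{tok.inspect}, got #{peek.inspect}" unless peek == tok
    advance
  end

  # The shape "item {sep item}" shared by StatSeq and Declarations.
  def separated(sep)
    yield
    while peek == sep
      advance
      yield
    end
  end

  # Declarations = 'VAR' Ident {',' Ident} ';' .
  def declarations
    expect("var")
    separated(",") { @vars[ident] = 0 }
    expect(";")
  end

  # Stat = 'WRITE' Expr | Ident ':=' Expr .
  def stat
    if peek == "write"
      advance
      @output << expr
    else
      name = ident
      expect(":=")
      @vars[name] = expr
    end
  end

  # Expr = Term {('+'|'-') Term} .   Term = Fact {('*'|'/') Fact} .
  # Ruby's Integer#/ floors, whereas C's int division truncates toward zero.
  def expr; binary(%w[+ -]) { term }; end
  def term; binary(%w[* /]) { fact }; end
  # Left-associative "operand {op operand}" evaluation.
  def binary(ops)
    value = yield
    value = value.public_send(advance, yield) while ops.include?(peek)
    value
  end

  # Fact = Ident | number | '(' Expr ')' .
  def fact
    case peek
    when Integer then advance
    when "(" then advance; value = expr; expect(")"); value
    else
      name = ident
      @vars.fetch(name) { raise CalcError, "undeclared variable #{name}" }
    end
  end
  def ident
    tok = peek
    raise CalcError, "expected identifier, got #{tok.inspect}" unless tok.is_a?(String) && tok.match?(/\A[a-z]/)
    advance
  end
end
```

Unlike the C version, this sketch reports a read of an undeclared variable as an error instead of returning 0, and it does not truncate variable names to two letters.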
 

Hugh Sasse Staff Elec Eng

> Hugh,
>
> the one thing I didn't see in your posting is a statement about the language.
> What capabilities should it have?

I was trying to keep this general because I run into the problem of
parsing non-simplistic grammars so often. It's easy to do the
<verb> <direct object>
type grammars with lots of verbs[1], but...

My present example is that I want to parse Constructive Solid
Geometry descriptions, at the moment limited to cones, spheres and
bricks with the co-ordinates specified, and I need to specify
material types as well.

I'd also like to be able to declare new objects so they can be
placed. Silly example: get two small spheres to cap the ends of
a cylinder and call the result a Sausage. Then place several
Sausages in the space at different points.

Later I'd have to extend the language to be able to rotate them into
position. Lots of creeping featurism is likely, I suspect. I
didn't have material types to deal with before.

> If it's just assigning constants to vars
> (like often needed for configurations) then Regexp is probably fine.

Agreed.

> From what you write I'm guessing that your envisioned language is more complex -
> but how complex? Maybe it's a special case for which someone somewhere has a
> solution already.
>
> Regards
>
> robert

Thank you
Hugh

[1] Some years back I got Arthur Secret's Agora (Perl, web by email)
program working and extended it considerably. I used regexps for
that.
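Applied to the CSG example, the instance_eval approach from the opening post might look like the sketch below. Everything in it (`Scene`, `define`, `place`, the keyword names and the `:meat` material) is hypothetical. It shows how a compound object such as a Sausage and per-primitive material types become ordinary methods and keyword arguments. It does nothing about the security concerns raised in the thread, because `Scene` is an ordinary Object.

```ruby
# Hypothetical CSG description language: each command is a method on Scene,
# and a description is evaluated with scene.instance_eval(source).
class Scene
  Shape = Struct.new(:kind, :params, :material)
  Placement = Struct.new(:name, :at)

  attr_reader :definitions, :placements

  def initialize
    @definitions = {}
    @placements = []
    @parts = nil # collects primitives while inside a `define` block
  end

  def sphere(radius:, at: [0, 0, 0], material: :default)
    add Shape.new(:sphere, { radius: radius, at: at }, material)
  end

  def cylinder(radius:, length:, material: :default)
    add Shape.new(:cylinder, { radius: radius, length: length }, material)
  end

  # Group primitives under a new name, e.g. a Sausage.
  def define(name, &block)
    @parts = []
    instance_eval(&block)
    @definitions[name] = @parts
  ensure
    @parts = nil
  end

  # Put a copy of a previously defined object at a position.
  def place(name, at:)
    raise ArgumentError, "unknown object #{name.inspect}" unless @definitions.key?(name)
    @placements << Placement.new(name, at)
  end

  private

  def add(shape)
    raise ArgumentError, "primitives must appear inside define" unless @parts
    @parts << shape
  end
end

def load_scene(source)
  scene = Scene.new
  scene.instance_eval(source)
  scene
end
```

A description file for this sketch would read like `define :sausage do ... end` followed by `place :sausage, at: [10, 0, 0]`. Rotation would then be one more keyword on `place`.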
 

Hugh Sasse Staff Elec Eng

> I hear ya. I wish there was a way to open a jailed namespace. It'd be
> like chroot'ing into a module. The sandbox module would be addressable
> from ::Object, but would only include a limited set of modules when
> offered under chroot.

That's a nice metaphor...
[I'll have to explore that RedHanded site a bit more too!
:)]

Hugh
 

Hugh Sasse Staff Elec Eng

> Perhaps you should take a look at Lua:

I really like Lua (I bought the book), but I need to keep this as pure Ruby.
Given time, energy, etc., I'd love to provide Lua tables to Ruby.... I
don't think they'd solve my problem here, though.
Thank you,
Hugh
 

Hugh Sasse Staff Elec Eng

> Hi ..
>
> My personal solution to this is to use Coco/R, an LL(1) scanner/generator.
> You can find more information at:
>
> http://www.scifac.ru.ac.za/coco

Thank you. I'd seen this on RAA but not explored it...

> The primary advantage of this approach, IMHO, is that all of the grammar /
> scanning rules are in a single file (rather than the lex/yacc approach).
> This makes the grammar quite easy to read and extend, once you are familiar
> with the process. Ryan Davies has a pure ruby version, and I have a ruby
> extension version. Both seem to work well for little languages.

I'll probably stay with the pure Ruby one, but I'll certainly look
at this. I suspect that the (1) might be the problem for what I'm
trying to do. I can't remember how LL differs from LR now, but I'll
find that out pretty easily.

[...]

> Pat Terry has a book "Compilers and Compiler Generators" that covers LL(1)
> (and other) topics very well. You can find it at:
>
> http://www.scifac.ru.ac.za/compilers/

Thank you.
Hugh
 

Booker C. Bense

> I hear ya. I wish there was a way to open a jailed namespace. It'd be
> like chroot'ing into a module. The sandbox module would be addressable
> from ::Object, but would only include a limited set of modules when
> offered under chroot.

This is the one place where Tcl (at least circa 1994 Tcl)
absolutely fits like a glove. There was a client/server program
called sysctl, written by some guys at IBM, that allowed you to
assign ACLs to every command in the language. I've never seen
any other secure distributed scripting system that comes close.

Of course, it was vast overkill for 99% of what you need to
do with that kind of system. As much as I dislike Tcl for other
reasons, it is a very good language for extending applications
via simple scripting. Adding your own specialized commands is
fairly straightforward. I know quite a few scientific groups
that are using Python to do similar things these days.

Booker C. Bense

 

Mark Probert

Hi ..

> I'll probably stay with the pure ruby one, but I'll certainly look
> at this. I suspect that the 1 might be the problem for what I'm
> trying to do. I can't remember how LL differs from LR now, but I'll
> find that pretty easily.

If you are starting from scratch, then LL(1) usually isn't a problem. If you
can define the grammar in well-formed BNF, then LL(1) will be fine. The real
problem comes with feature creep and making sure that you don't hose your
grammar too much ;-)

Regards,
 

Trans

Hi Hugh,

What are you trying to parse exactly? Is it full Ruby source or a small
subset?

If full source, have you considered what exactly might be
unsafe? Maybe there's a way to create your own "Safety Net" by making
certain parts of Ruby inaccessible. Not sure how exactly; it seems like
namespaces would be needed, but it would be an interesting challenge.
Also you might try ParseTree.

OTOH, if only a subset (or something completely different), I have a
general-purpose and easy-to-use Parser class I've been working on for
Carats. Perhaps you'd like to try it and see if it can help? Doing so
could also help me test/improve it for everyone.

T.
 

Hugh Sasse Staff Elec Eng

> Hi Hugh,
>
> What are you trying to parse exactly? Is it full Ruby source or a small
> subset?

Not full Ruby source, just a mini-language, maybe using Ruby's
parser instead of botching(!) my own...

> If full source, have you considered what it is exactly that might be
> unsafe?

I don't know what would be unsafe yet: I'm assuming that there are
some smart people out there who like doing heap overruns and other
fiendish tricks that I'd not envisage.... So, I'd like to be safe
by default rather than be safe for the cases I have considered.

> Maybe there's a way to create your own "Safety Net" by making
> certain parts of Ruby inaccessible. Not sure how exactly, seems like
> namespaces would be needed, but it would be an interesting challenge.
> Also you might try ParseTree.

I'll have a look for that, thanks.

> OTOH, if only a subset (or something completely different), I have a
> general purpose and easy to use Parser class I've been working on for
> Carats. Perhaps you'd like to try it and see if it can help? Doing so
> could also help me test/improve it for everyone.

OK, bowl the URL in my direction! Thank you.
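On using Ruby's own parser: on current Ruby, the standard library's Ripper can play a role similar to the ParseTree suggestion above. The sketch below uses a hypothetical command whitelist and a helper named `vet`. It tokenizes the input with `Ripper.lex` and rejects any token type outside a small allowed set before anything is evaluated. This is a coarse, token-level check, not a proof of safety. A stricter version would walk the tree returned by `Ripper.sexp`.

```ruby
require 'ripper'

# Hypothetical vocabulary of the little language.
ALLOWED_COMMANDS = %w[sphere cone brick].freeze

# Token types that may appear at all; anything else (constants, '.', '::',
# backticks, keywords, string interpolation, assignment...) is rejected.
ALLOWED_TOKENS = %i[
  on_ident on_int on_float on_comma on_sp on_nl on_ignored_nl on_semicolon
  on_comment on_lparen on_rparen on_lbracket on_rbracket
  on_tstring_beg on_tstring_content on_tstring_end
].freeze

# Returns a list of problems found in +source+; empty means it passed.
def vet(source)
  problems = []
  Ripper.lex(source).each do |(line, col), type, token, *|
    if type == :on_op && token == "-"
      next # allow negative numbers such as -1
    elsif !ALLOWED_TOKENS.include?(type)
      problems << "#{line}:#{col}: #{type} #{token.inspect} is not allowed"
    elsif type == :on_ident && !ALLOWED_COMMANDS.include?(token)
      problems << "#{line}:#{col}: unknown command #{token.inspect}"
    end
  end
  problems
end
```

Keyword-style arguments such as `at:` lex as `on_label`, so this particular whitelist rejects them. Extending it means adding token types deliberately, one at a time.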
 

Hugh Sasse Staff Elec Eng

> Hi ..
>
> If you are starting from scratch, then LL(1) usually isn't a problem. If you
> can define the grammar in well formed BNF, then LL(1) will be fine. The real
> problem comes with feature creep and making sure that you don't hose your
> grammar too much ;-)

Yes. Just adding material types seems insurmountable at the moment.
Thank you,
Hugh
 

Edgardo Hames

> Not full ruby source, just a mini-language, maybe using Ruby's
> parser instead of botching(!) my own...

Is the "mini-language" given, or can you build your own "mini-language"?
If you can write your own language, then you should consider writing a
DSL suitable for your problem a la Rails, and then let the Ruby
interpreter do the whole job.

Kind Regards,
Ed
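One reading of "a DSL a la Rails" is class-level macros. In that style a description is an ordinary Ruby class body, and each command is a class method inherited from a base class, much like Rails declarations such as `has_many`. Here is a minimal sketch with hypothetical names (`SceneDescription`, `material`, `brick`, `Workshop`) and illustrative values.

```ruby
# Hypothetical base class: its class methods are the "keywords" of the language.
# Each subclass gets its own @materials/@solids (class-level instance variables).
class SceneDescription
  def self.materials
    @materials ||= {}
  end

  def self.solids
    @solids ||= []
  end

  # material :steel, density: 7850
  def self.material(name, density:)
    materials[name] = { density: density }
  end

  # brick [x1, y1, z1], [x2, y2, z2], material: :steel
  def self.brick(corner1, corner2, material:)
    raise ArgumentError, "unknown material #{material.inspect}" unless materials.key?(material)
    solids << { kind: :brick, from: corner1, to: corner2, material: material }
  end
end

# A description written in the little language: just a class body.
class Workshop < SceneDescription
  material :steel, density: 7850
  brick [0, 0, 0], [1, 2, 3], material: :steel
end
```

A file written this way is loaded as plain Ruby, so the approach trusts whoever writes the description completely.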
 

George

> I hear ya. I wish there was a way to open a jailed namespace. It'd be
> like chroot'ing into a module. The sandbox module would be addressable
> from ::Object, but would only include a limited set of modules when
> offered under chroot.

Sounds like capability security, where - to super-simplify - you can
only access the objects/functionality you can name. It's used by the E
language - see a good description at

http://www.skyhunter.com/marcs/ewalnut.html#SEC42

My guess though is that Ruby is too 'sloppy' for such a thing to work
fully securely. I'd be very happy to be proved wrong!

-- George
 
