ANN: Free-form-operators patch

Bill Kelly · Oct 8, 2004

Markus said:
* It makes the parser about 150 lines shorter

Wow, amazing! Sounds like you found a way to generalize
what used to be code handling the existing operators
separately?

That would seem to be quite a positive development,
regardless of whether the generalized free-form operators
are (ultimately) to be allowed in standard Ruby.

Nice work!

Regards,

Bill

Jim Weirich · Oct 8, 2004

Charles Hixson said:

This is one place that I don't think Eiffel got things quite right.
Precedence should be definable, not necessarily higher than all
built-in operators. If one is defining a cross-product, e.g., one
doesn't necessarily want it to be of higher precedence than whatever one
is using to define the literal representation of the matrix. And
ordinary arithmetic might be of even higher precedence. E.g.:

| .< 1 + 3, 5 * 2, 6 / 3 >. .<n21, n22, n23>. .<n31, n32, n33>. | .*. M

Hmmmm ... I'm not sure that the .<, >. and | are technically operators
(certainly not binary operators). They seem to be merely syntactical
markers.

The problem with definable precedence is that precedence is a compile time
operation and the selection of the operator implementation is a run time
thing. Consider ...

x @a y @b z

(note: Syntax is Eiffel style syntax, so @a and @b are operators. Ignore
for the moment that this conflicts with Ruby syntax for instance variables
and go with me here).

Should this be parsed ((x @a y) @b z) or (x @a (y @b z))? Operators are
generally transformed into method calls, e.g. (x @a y) => (x.@a(y)). So
we look at the runtime class of x to find the operator and determine the
precedence. Unfortunately, at the time the expression is compiled the
runtime class of x isn't available! Indeed, x may have a different class
every time the expression is executed.

So, that leaves us two choices: (1) Reparse the expression every time it is
executed, or (2) globally define the precedence of an operator across all
classes.

(1) seems a bit too dynamic for my taste, and (2) is troublesome when you
have more than one class defining the operator.
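
A minimal sketch of the compile-time half of that argument, using stock Ruby and built-in operators only. The class name Traced is made up for illustration. The grouping of the expression is fixed by the grammar before any object exists, so the class of x never gets a say in it:

```ruby
# Records every operator call so we can see the grouping the parser chose.
class Traced
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def +(other)
    Traced.new("(#{name} + #{other.name})")
  end

  def *(other)
    Traced.new("(#{name} * #{other.name})")
  end
end

x = Traced.new("x")
y = Traced.new("y")
z = Traced.new("z")

# The parser has already decided that * binds tighter than +;
# Traced only supplies the method bodies at run time.
grouping = (x + y * z).name
puts grouping
```

Whatever class x happens to have, the call sequence is y.*(z) first and x.+(...) second. A per-class precedence would need information the parser does not have.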

Given that, I wouldn't mind user defined operators in Ruby under the
following conditions...

(A) The operators are visually distinctive. In Eiffel, all user defined
operators must begin with a limited set of symbols (@, #, &, and | IIRC).
After the initial symbol, any printable, non-space character is allowed.
So @in, @cross, &and, #..# are all recognized as operators according to
the Eiffel rules.

(B) Fixed precedence. User defined unary operators should have the same
precedence as other unary operators. User defined Binary operators should
be the same as the highest precedence binary operator under the unary
operators.

Point A means that I don't have to parse a random string of symbols in my
head and decide whether it is legal Ruby syntax or user defined operators
(or Beetle Bailey swearing). Ruby's problem is that all the punctuation
characters are already used for other purposes, so it would be difficult to
set aside yet another character for this purpose. Perhaps something like
@op@ can be used.

Regarding the matrix example, I find it less than compelling, especially
when existing Ruby can be used to get something quite readable ...

Matrix [
  [ 1 + 3, 5 * 2, 6 / 3 ],
  [ n21,   n22,   n23   ],
  [ n31,   n32,   n33   ]
].cross M

Phil Tomson · Oct 9, 2004

A perfect example of why free-form-operators are a bad idea. ;-)

No offense intended. It's just that custom notations are not obvious to
readers/maintainers of the code. Also, adding such flexibility to the
parser means it is probably less capable of recognizing and diagnosing
errors. Then there is the problem of clashes when multiple extensions
try to define the same ops for entirely different operations.

I think this patch is quite cool and I even have immediate applications
for it (the much discussed ':=' operator)... However, I have the same
concern mentioned above. It seems that it could make it more difficult
to detect some syntax errors. Couldn't it also lead to some parsing
ambiguities?

Phil

Markus · Oct 9, 2004

Wow, amazing! Sounds like you found a way to generalize
what used to be code handling the existing operators
separately?

Yes, though I haven't gone as far in that direction as I'd like;
since I'm not a C programmer (and this is my first attempt to mess with
the parser) I was being conservative.

Nice work!

Thanks!

-- Markus

Markus · Oct 9, 2004

Charles Hixson said:

Hmmmm ... I'm not sure that the .<, >. and | are technically operators
(certainly not binary operators). They seem to be merely syntactical
markers.

You could do it, but it would be kludgey unless you generalized it
properly (i.e. take the people who designed Smalltalk and the people
behind PostScript out for a beer and take notes). The basic trick:
define a class that is a "partial structure" and is the result (and
consumer) of the individual expressions.
But without doing gymnastics, you're right.

The problem with definable precedence is that precedence is a compile time
operation and the selection of the operator implementation is a run time
thing. Consider ...

x @a y @b z

(note: Syntax is Eiffel style syntax, so @a and @b are operators. Ignore
for the moment that this conflicts with Ruby syntax for instance variables
and go with me here).

Should this be parsed ((x @a y) @b z) or (x @a (y @b z))? Operators are
generally transformed into method calls, e.g. (x @a y) => (x.@a(y)). So
we look at the runtime class of x to find the operator and determine the
precedence. Unfortunately, at the time the expression is compiled the
runtime class of x isn't available! Indeed, x may have a different class
every time the expression is executed.

So, that leaves us two choices: (1) Reparse the expression every time it is
executed, or (2) globally define the precedence of an operator across all
classes.

(1) seems a bit too dynamic for my taste, and (2) is troublesome when you
have more than one class defining the operator.

This is a very good description of why I'm unsure about letting the
user set the precedence/associativity/arity. Do you mind if I use it?

The best answer I have come up with so far is to require that if
their compile-time properties are changed they must be declared before
their first use, and that any additional declaration must match.
Declarations of this sort are not commonly done in ruby (but they are
not unheard of either).

Given that, I wouldn't mind user defined operators in Ruby under the
following conditions...

(A) The operators are visually distinctive. In Eiffel, all user defined
operators must begin with a limited set of symbols (@, #, &, and | IIRC).
After the initial symbol, any printable, non-space character is allowed.
So @in, @cross, &and, #..# are all recognized as operators according to
the Eiffel rules.

I'm being a bit more conservative at this point: only characters
that are already found in ruby operators (and only on combinations that
are not already used) may be user defined.

(B) Fixed precedence. User defined unary operators should have the same
precedence as other unary operators. User defined Binary operators should
be the same as the highest precedence binary operator under the unary
operators.

Why the highest? I know that's how Eiffel does it, but I don't
recall the rationale. I've also considered some function of the length
or contents (such that the built in operators would "fit" the scheme),
but I haven't found anything I like.

Point A means that I don't have to parse a random string of symbols in my
head and decide whether it is legal Ruby syntax or user defined operators
(or Beetle Bailey swearing). Ruby's problem is that all the punctuation
characters are already used for other purposes, so it would be difficult to
set aside yet another character for this purpose. Perhaps something like
@op@ can be used.

I think "duck typing" on character classes does the job nicely. If
it looks like an operator (e.g., =>, <-+, +/-, !~~; that is, if it is
composed of, and only of, the characters used in Ruby to construct
operators and nothing else), it is an operator.
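
A rough sketch of what that test could look like. The character set and the list of built-in spellings below are assumptions made for illustration, not taken from the patch, and the method names are hypothetical:

```ruby
# Assumption: the characters operators may be built from.
OPERATOR_TOKEN = /\A[-!~+*\/%&|^<>=]+\z/

# Assumption: a partial list of operator spellings stock Ruby already uses.
BUILT_IN = %w[! ~ + - * / % ** & | ^ << >> < > <= >= == === != =~ !~ <=> && || = =>].freeze

# True when the token is made of operator characters and nothing else.
def looks_like_operator?(token)
  OPERATOR_TOKEN.match?(token)
end

# True when the token looks like an operator and is not already taken.
def free_form?(token)
  looks_like_operator?(token) && !BUILT_IN.include?(token)
end

%w[<-+ +/- !~~ ** @in foo].each do |token|
  puts "#{token}: operator-like=#{looks_like_operator?(token)} free=#{free_form?(token)}"
end
```

Under these assumptions an Eiffel-style name such as @in is rejected, because it contains characters that never appear in a Ruby operator.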

Regarding the matrix example, I find it less than compelling, especially
when existing Ruby can be used to get something quite readable ...

Matrix [
  [ 1 + 3, 5 * 2, 6 / 3 ],
  [ n21,   n22,   n23   ],
  [ n31,   n32,   n33   ]
].cross M

I agree, though I think the initial point was about the value of
being able to experiment to find a comfortable notation, and not so much
that the example provided was the best one.

-- Markus

Markus · Oct 9, 2004

I think this patch is quite cool and I even have immediate applications
for it (the much discussed ':=' operator)... However, I have the same
concern mentioned above. It seems that it could make it more difficult
to detect some syntax errors. Couldn't it also lead to some parsing
ambiguities?

I don't think so. My policy is: if it could be ambiguous, warn and
interpret it as the old parser would have (break no code). In general,
it is no more confusing to the parser than the existence of reserved
words and identifiers, both made out of lower-case letters.

-- Markus

trans. (T. Onoma) · Oct 9, 2004

On Friday 08 October 2004 11:12 pm, Markus wrote:
| This is a very good description of why I'm unsure about letting the
| user set the precedence/associativity/arity. Do you mind if I use it?
|
| The best answer I have come up with so far is to require that if
| their compile-time properties are changed they must be declared before
| their first use, and that any additional declaration must match.
| Declarations of this sort are not commonly done in ruby (but they are
| not unheard of either).

Just a thought. Since you are already using only characters already used in
Ruby operators (which I think is a good idea), you may also fix the
precedence according to those. You would just need to come up with a good set
of rules. For example, where R means one or more other op-chars:

Ruby (high to low)    User-defined examples with same precedence
**                    R**  **R  R**R   (contains ** goes right to top)
! ~ + -               all unary operators (can we do those 'def @++' ?)
* / %                 *R  /R  %R
+ -                   +R  -R
&                     &R
^ |                   ^R  |R
< >                   <R  R>
=                     =R  R=R          (equals on end illegal)

It may have to be a bit more complex than that but you get the idea. Like I
said, just a thought.

T.

Markus · Oct 9, 2004

On Friday 08 October 2004 11:12 pm, Markus wrote:
| This is a very good description of why I'm unsure about letting the
| user set the precedence/associativity/arity. Do you mind if I use it?
|
| The best answer I have come up with so far is to require that if
| their compile-time properties are changed they must be declared before
| their first use, and that any additional declaration must match.
| Declarations of this sort are not commonly done in ruby (but they are
| not unheard of either).

Just a thought. Since you are already using only characters already used in
Ruby operators (which I think is a good idea), you may also fix the
precedence according to those. You would just need to come up with a good set
of rules. For example, where R means one or more other op-chars:

Ruby (high to low)    User-defined examples with same precedence
**                    R**  **R  R**R   (contains ** goes right to top)
! ~ + -               all unary operators (can we do those 'def @++' ?)
* / %                 *R  /R  %R
+ -                   +R  -R
&                     &R
^ |                   ^R  |R
< >                   <R  R>
=                     =R  R=R          (equals on end illegal)

It may have to be a bit more complex than that but you get the idea. Like I
said, just a thought.

That was my "doodle in meeting" project for a few days last week.
I didn't come up with anything I liked. For example, what do you do
with something that comprises two or more extant operators? How do you
decide which part is the R and which is not-R? How do you find the
buddha-nature of a mismatched pair of socks? And what if there is a
chance congruence hidden in an operator that the users don't see?

I'll add your suggestions to my notes, but I'm not optimistic about
that path.

-- Markus

Jim Weirich · Oct 9, 2004

Markus said:
This is a very good description of why I'm unsure about letting the
user set the precedence/associativity/arity. Do you mind if I use it?

Please do!

I'm being a bit more conservative at this point: only characters
that are already found in ruby operators (and only on combinations that
are not already used) may be user defined.

Hmmm ... I felt my position was the conservative one. I think yours
will be quite hard on the programmer.

Quick, without looking ahead ... which of the following operators can
appear in valid ruby code today ...

(A) <-+
(B) +/-
(C) !~~
(D) ++

I'll comment on something else (to provide spoiler prevention).

[...] User defined Binary operators should
be the same as the highest precedence binary operator under the unary
operators.


Why the highest? I know that's how Eiffel does it, but I don't
recall the rationale. I've also considered some function of the length
or contents (such that the built in operators would "fit" the scheme),
but I haven't found anything I like.

I'm not strongly tied to it being the highest. Whatever it is, it
should be easy to remember. I don't want to have to remember 35 levels
of operator precedence (I can go back to C++ if I want that).

So highest seems like an easy to remember option. If you have another
easy to remember scheme, I would be open to it.

Now back to the legal operators ...

(A) is legal, e.g. 2<-+1 => false
(B) is illegal, e.g. 2+/-1 => illegal; however, 2/-1 is legal
(C) is illegal (but I'm not sure why)
    ! ~~1 is legal, but !~~1 gives strange results
    try: ruby -e '!~~1' and see for yourself.
(D) is legal, e.g. ++1 => 1
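
The legal cases can be checked directly on a stock Ruby. This is only a sketch of the three expressions that parse; the variable names are arbitrary. (B) and (C) are left out because they do not parse. For (C), the likely reason is that the lexer reads !~ as the negated-match operator, which leaves ~1 without a left operand.

```ruby
a = 2<-+1   # read as 2 < (-(+1))
b = ++1     # read as +(+1)
c = 2/-1    # read as 2 / (-1)

p [a, b, c]
```

This prints [false, 1, -2], so each run of symbols is really a chain of separate built-in operators rather than one token.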

Anyways, I'm not sure it will be immediately obvious to the casual
observer that +/- is a user defined operator and ++ and <-+ are
concatenations of existing legal Ruby built in operators. (Or that !~~
looks like a concatenation of built in operators, but for some reason
is not).

trans. (T. Onoma) · Oct 9, 2004

On Friday 08 October 2004 11:46 pm, Markus wrote:
| > Ruby (high to low)    User-defined examples with same precedence
| > **                    R**  **R  R**R   (contains ** goes right to top)
| > ! ~ + -               all unary operators (can we do those 'def @++' ?)
| > * / %                 *R  /R  %R
| > + -                   +R  -R
| > &                     &R
| > ^ |                   ^R  |R
| > < >                   <R  R>
| > =                     =R  R=R          (equals on end illegal)
| >
| > It may have to be a bit more complex than that but you get the idea. Like
| > I said, just a thought.
|
| That was my "doodle in meeting" project for a few days last week.
| I didn't come up with anything I liked. For example, what do you do
| with something that comprises two or more extant operators? How do you
| decide which part is the R and which is not-R? How do you find the
| buddha-nature of a mismatched pair of socks? And what if there is a
| chance congruence hidden in an operator that the users don't see?

Hmm... well, workable rules can be made. The rules themselves have precedence
top to bottom:

1. contains ** precedence level 0 (highest)
2. unaries level 1
3. starts with < or ends with >, level 6
4. contains =, level 7 (lowest)
5. starts with * / %, level 2
6. starts with + -, level 3
7. starts with &, level 4
8. starts with ^ |, level 5
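
As a sketch, the eight rules translate almost line for line into a lookup. The method name precedence_level and the unary: flag are made up here; whether a token is unary cannot be read off its spelling, so it has to be passed in:

```ruby
# Hypothetical helper: maps an operator token to the proposed level
# (0 binds tightest, 7 loosest). Rules are tried top to bottom.
def precedence_level(op, unary: false)
  return 0 if op.include?("**")                          # rule 1
  return 1 if unary                                      # rule 2
  return 6 if op.start_with?("<") || op.end_with?(">")   # rule 3
  return 7 if op.include?("=")                           # rule 4
  return 2 if op.start_with?("*", "/", "%")              # rule 5
  return 3 if op.start_with?("+", "-")                   # rule 6
  return 4 if op.start_with?("&")                        # rule 7
  return 5 if op.start_with?("^", "|")                   # rule 8
  nil                                                    # no rule applies
end

%w[+/- -> := <= !~~].each do |op|
  puts "#{op} => #{precedence_level(op).inspect}"
end
```

Note that -> comes out at level 6, not level 3, because rule 3 (ends with >) fires before rule 6 (starts with -), and a binary !~~ matches no rule at all. Both may be the kind of surprise Markus was worried about.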

OTOH, if you are saying such rules don't always please you, well, you have two
choices: make precedence user definable with all the baggage that looks like
it will entail, or accept that you can't always be pleased and that following
Ruby's general order at least makes it easy to remember.

| I'll add your suggestions to my notes, but I'm not optimistic about
| that path.

Well, you never really know until you try. Right?

T.

trans. (T. Onoma) · Oct 9, 2004

|
| (A) is legal, e.g. 2<-+1 => false
| (B) is illegal, e.g. 2+/-1 => illegal; however, 2/-1 is legal
| (C) is illegal (but I'm not sure why)
|     ! ~~1 is legal, but !~~1 gives strange results
|     try: ruby -e '!~~1' and see for yourself.
| (D) is legal, e.g. ++1 => 1
|
| Anyways, I'm not sure it will be immediately obvious to the casual
| observer that +/- is a user defined operator and ++ and <-+ are
| concatenations of existing legal Ruby built in operators. (Or that !~~
| looks like a concatenation of built in operators, but for some reason
| is not).

Spaces become important, which isn't necessarily a bad thing. A rule that
operator "on top of operator" requires a space solves the problem. So

'!1' or '! 1'

is fine, but two in a row must be

'! !1' or '! ! 1', not '!!1'

as '!!' would be considered one operator.

Or did you figure out a better way, Markus?

T.

Jim Weirich · Oct 9, 2004

trans. (T. Onoma) said:
Spaces become important, which isn't necessarily a bad thing. A rule that
operator "on top of operator" requires a space solves the problem. So

'!1' or '! 1'

is fine, but two in a row must be

'! !1' or '! ! 1', not '!!1'

This is fine, but at this point you are making incompatible changes to
Ruby. I don't want user defined operators nearly so much that I would
want to break existing code.

trans. (T. Onoma) · Oct 9, 2004

| > Spaces become important, which isn't necessarily a bad thing. A rule that
| > operator "on top of operator" requires a space solves the problem. So
| >
| > '!1' or '! 1'
| >
| > is fine, but two in a row must be
| >
| > '! !1' or '! ! 1', not '!!1'
|
| This is fine, but at this point you are making incompatible changes to
| Ruby. I don't want user defined operators nearly so much that I would
| want to break existing code.

Sure. I understand. But I wonder how often two operators are used back to
back? I imagine that the most common case is !-1. And actually I think - +
should probably be considered literal "numeral" chars when in front of other
such chars always. Just the same, given the infrequency, I don't think it's
out of the question, unless I've overlooked some common idioms. Think of
any?

T.

Charles Hixson · Oct 9, 2004

Phil said:
...
I think this patch is quite cool and I even have immediate applications
for it (the much discussed ':=' operator)... However, I have the same
concern mentioned above. It seems that it could make it more difficult
to detect some syntax errors. Couldn't it also lead to some parsing
ambiguities?

Phil

Perhaps one could require that the user defined operators be delimited
by whitespace? In that way no ambiguities could be introduced.

Markus · Oct 9, 2004

Please do!
Thanks.

Hmmm ... I felt my position was the conservative one. I think yours
will be quite hard on the programmer.

Quick, without looking ahead ... which of the following operators can
appear in valid ruby code today ...

(A) <-+
(B) +/-
(C) !~~
(D) ++

I'll comment on something else (to provide spoiler prevention).

I'll reply without reading ahead. I don't think any of them are
valid operators in a version of ruby without my patch or something like
it. (D) could occur in the construct x++3, but it would be two
operators. With my patch, any of them could be used as operators if you
defined them, but (D) would generate a warning and parse as two
operators if you didn't make your intention clear with white space.
((A) and (B) would also warn.) I'll read on and see how I did...

[...] User defined Binary operators should
be the same as the highest precedence binary operator under the unary
operators.


Why the highest? I know that's how Eiffel does it, but I don't
recall the rationale. I've also considered some function of the length
or contents (such that the built in operators would "fit" the scheme),
but I haven't found anything I like.


I'm not strongly tied to it being the highest. Whatever it is, it
should be easy to remember. I don't want to have to remember 35 levels
of operator precedence (I can go back to C++ if I want that).

So highest seems like an easy to remember option. If you have another
easy to remember scheme, I would be open to it.

*sigh* Alas, I do not.

Now back to the legal operators ...

(A) is legal, e.g. 2<-+1 => false
(B) is illegal, e.g. 2+/-1 => illegal; however, 2/-1 is legal
(C) is illegal (but I'm not sure why)
    ! ~~1 is legal, but !~~1 gives strange results
    try: ruby -e '!~~1' and see for yourself.
(D) is legal, e.g. ++1 => 1

Ooooo! Good catch on (A). It's three operators though. The main
cost of this patch is that operators will have to be treated the same
way identifiers are; you can't run them together indiscriminately and
have the compiler guess what you meant.

Anyways, I'm not sure it will be immediately obvious to the casual
observer that +/- is a user defined operator and ++ and <-+ are
concatenations of existing legal Ruby built in operators. (Or that !~~
looks like a concatenation of built in operators, but for some reason
is not).

That's why you use white space. Note that the confusions you
listed were not created by the free form operators, but were present in
the language all along.

-- Markus

Markus · Oct 9, 2004

This is fine, but at this point you are making incompatible changes to
Ruby. I don't want user defined operators nearly so much that I would
want to break existing code.

T (I never know what to call you. Trans? Someone said "Tom" once,
but I don't know if that was a guess)--anyway "T" pretty much nailed the
syntactic issue, but didn't get into how I am attempting to resolve it.
Short answer: if a concatenation of operator symbols might be ambiguous
I generate a warning (suggesting that spaces be used) and interpret it
as two separate operators (i.e., mimic the old behaviour). If you want
free form operators, and want to have them overlap with the existing
operators, you have to use spaces to make your meaning clear.

-- Markus

Markus · Oct 9, 2004

| > Spaces become important, which isn't necessarily a bad thing. A rule that
| > operator "on top of operator" requires a space solves the problem. So
| >
| > '!1' or '! 1'
| >
| > is fine, but two in a row must be
| >
| > '! !1' or '! ! 1', not '!!1'
|
| This is fine, but at this point you are making incompatible changes to
| Ruby. I don't want user defined operators nearly so much that I would
| want to break existing code.

Sure. I understand. But I wonder how often two operators are used back to
back? I imagine that the most common case is !-1. And actually I think - +
should probably be considered literal "numeral" chars when in front of other
such chars always. Just the same, given the infrequency, I don't think it's
out of the question, unless I've overlooked some common idioms. Think of
any?

No, but going through the day's feedback I found an uncommon one:

x = proc {|a,*|a}

takes one or more arguments and returns the first one, unless you've
applied my patch in which case it blows up.

I'm fixing all known bugs to post a new version in an hour or so,
but this seems pretty perverse to me. Why not just write:

x = proc {|a,*ignore|a}

or something?
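
For reference, a small sketch of how the two spellings behave on an unpatched Ruby. The variable names are illustrative. Presumably the patched lexer trips because *| in |a,*| is an unbroken run of operator characters, though that is a guess and not something stated above:

```ruby
first_anon  = proc { |a, *| a }        # anonymous splat: extra arguments are dropped
first_named = proc { |a, *ignore| a }  # the named-splat spelling suggested above

p first_anon.call(1, 2, 3)
p first_named.call(1, 2, 3)
```

Both print 1, so the named form is a drop-in replacement wherever the anonymous splat is used.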

-- Markus

gabriele renzi · Oct 9, 2004

Markus wrote:

Oh, no worries. There's no risk of that.

The problem is, what if I want more power than you do? And what if
Angelina Mathematica or Bob Business wants something that neither of us
do? I'd say that the language should provide enough flexibility for
everyone, so long as it's not required that anyone use it who doesn't
want it.

The problem I can see is this: +, *, and - have more or less obvious semantics.
You can expect them to be implemented as similar things in different
domains.
But given that -> is not a standard operator, what happens
if Angelina and Joe Automator each implement it themselves for
different things?

I mean, one could intend it as a function-definition operator, while
the other could use it to indicate the movements of a hunter in the usual
wumpus labyrinth. Then their two libraries would be completely incompatible.
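
A sketch of that clash on stock Ruby. Since -> cannot be defined without the patch, >> stands in for it here, and the Node class and both "libraries" are invented for illustration:

```ruby
# Stand-in for a class that both libraries extend (hypothetical).
class Node
  attr_reader :label

  def initialize(label)
    @label = label
  end
end

# "Library A": >> builds a function definition.
class Node
  def >>(body)
    [:define, label, body]
  end
end

# "Library B", loaded later: >> moves the hunter to another room.
class Node
  def >>(room)
    [:move, label, room]
  end
end

result = Node.new(:f) >> :x
p result
```

The later definition silently replaces the earlier one (Ruby only mentions it when warnings are enabled), so code written against Library A now gets [:move, :f, :x]. The same thing can already happen today with any built-in operator on a shared class.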

Maybe we could be happy if standard operators such as
+= or != could be overridden, and everything could become prefix.
I guess this could be enough power.

gabriele renzi · Oct 9, 2004

Jason Voegele wrote:

FYI, Eiffel has user-defined prefix and infix operators. The language
rule is that all user-defined operators have a higher precedence than
built-in operators. The built-in operators follow typical precedence
rules.

wow, and I thought Eiffel was one of those non funny languages

Note that even Haskell has [^a-z] as operators, and even the funny thing
that x `foo` y works as foo x y

gabriele renzi · Oct 9, 2004

Charles Hixson wrote:

This is one place that I don't think Eiffel got things quite right.
Precedence should be definable, not necessarily higher than all
built-in operators. If one is defining a cross-product, e.g., one
doesn't necessarily want it to be of higher precedence than whatever one
is using to define the literal representation of the matrix. And
ordinary arithmetic might be of even higher precedence. E.g.:

Yes, but this would be a library-interaction nightmare.
