[RCR] New [] Semantics

Martin DeMello · Oct 7, 2004

Yukihiro Matsumoto said:
Hi,

In message "Re: Range behavior (Re: [RCR] New [] Semantics)"

|Not at all. This is something particular to a Range so why overload #include?
|with that function? Doing so can cause duck-typing problems. To me member?
|and include? are just different names for the same thing and should stay that
|way.

Your opinion makes sense. What do you think is the best way to fix it?
Or maybe we first need to define the problem to fix to evaluate the fix.

If I've followed along properly, the problem is as follows: Enumerable
sets up a contract that include? is a synonym for member?, in much the
same way that map is a synonym for collect. By changing one but not
the other, Range is breaking that contract, which might affect duck
typed code. (Incidentally, I see this as an excellent argument against
having "officially blessed" synonyms at all, but that's another
argument.)
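To make the contract concrete, here is a minimal sketch. `Lopsided` is a hypothetical class invented for illustration, not anything from the standard library. It overrides include? with bounds semantics but leaves the member? it gets from Enumerable untouched, which is the kind of split described above:

```ruby
# Hypothetical class: include? is overridden, member? is not.
class Lopsided
  include Enumerable

  def initialize(lo, hi)
    @lo = lo
    @hi = hi
  end

  # Discrete iteration over the integers lo..hi.
  def each
    @lo.upto(@hi) { |i| yield i }
  end

  # Bounds test only; it no longer agrees with Enumerable#member?.
  def include?(x)
    @lo <= x && x <= @hi
  end
end

r = Lopsided.new(1, 3)
p r.include?(2.5)  # bounds test
p r.member?(2.5)   # element-by-element test inherited from Enumerable
```

Duck-typed code that treats the two names as interchangeable gets different answers from the same object depending on which name it happens to call.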

Of course, the underlying problem may be that range should not include
Enumerable at all, or that ContinuousRange and DiscreteRange should be
two entirely different objects, with only the latter including
Enumerable. Then we could have a separate RangeLike mixin, with a
contains? operator that does bounds testing, and the DiscreteRange would
mix in Enumerable and hence get include? (and member?) with discrete
semantics.
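A rough sketch of how that split might look. Everything here is hypothetical: the names `RangeLike`, `ContinuousRange`, `DiscreteRange`, `lower`, and `upper` are chosen for illustration only. The bounds are deliberately not called `first`/`last`, because Enumerable defines its own `first` and would shadow such an accessor.

```ruby
# Hypothetical mixin: bounds testing only.
# The host class must provide lower, upper and exclude_end?.
module RangeLike
  def contains?(x)
    return false unless lower <= x
    exclude_end? ? x < upper : x <= upper
  end
end

# Hypothetical continuous range: bounds testing, no iteration.
class ContinuousRange
  include RangeLike
  attr_reader :lower, :upper

  def initialize(lower, upper, exclude_end = false)
    @lower = lower
    @upper = upper
    @exclude_end = exclude_end
  end

  def exclude_end?
    @exclude_end
  end
end

# Hypothetical discrete range: adds Enumerable, so include? and member?
# walk the members produced by succ.
class DiscreteRange < ContinuousRange
  include Enumerable

  def each
    val = lower
    while contains?(val)
      yield val
      val = val.succ
    end
  end
end

c = ContinuousRange.new(1.0, 2.0)
d = DiscreteRange.new(1, 5)
p c.contains?(1.5)
p d.contains?(2.5)  # bounds semantics
p d.include?(2.5)   # discrete semantics, from Enumerable
p d.to_a
```

With this arrangement include? and member? stay synonyms everywhere, and the bounds test has its own name.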

Separating Range functionality into a mixin might also make it easier to
get goodies like negative-stepping and infinite ranges.

martin

trans. (T. Onoma) · Oct 7, 2004

matz wrote:
| > Your opinion makes sense. What do you think is the best way to fix it?
| > Or maybe we first need to define the problem to fix to evaluate the fix.

Let me think on it some more. It's actually not a simple problem once you
really start to think about it (see below) --but we need a simple fix.

---

On Thursday 07 October 2004 05:04 am, Martin DeMello wrote:
| > |Not at all. This is something particular to a Range so why overload
| > | #include? with that function? Doing so can cause duck-typing problems.
| > | To me member? and include? are just different names for the same thing
| > | and should stay that way.
| >

| If I've followed along properly, the problem is as follows: Enumerable
| sets up a contract that include? is a synonym for member?, in much the
| same way that map is a synonym for collect. By changing one but not
| the other, Range is breaking that contract, which might affect duck
| typed code. (Incidentally, I see this as an excellent argument against
| having "officially blessed" synonyms at all, but that's another
| argument.)

Yes, in a sense that is right. But the issue is twofold. The 1st is the issue
with Range itself. The 2nd, broader and encompassing the 1st, is synonym
overriding --which occurs in a number of places throughout the libs.

To properly address the 1st I think it helps to look at the 2nd. A good
example of this is Object#=== :

Case Equality -- For class Object, effectively the same as calling #==,
but typically overridden by descendants to provide meaningful semantics
in case statements.

Indeed, a few classes redefine this. The most obvious is Regexp, which
makes :=== the same as :=~. Less obvious is Range itself, which makes it the
same as :member? Or is it include? --between? --ah ha, that's a catch, isn't
it!
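For reference, a small sketch of how these === overrides show up in a case statement. `classify` is a made-up helper; the clauses are ordered so that the Regexp and Range tests run before the class test.

```ruby
# Regexp#=== is a match test, so a regexp works as a `when` clause.
p(/ab/ === "cab")
# Class#=== tests whether the argument is an instance of the class.
p(Integer === 3)
# Range#=== on a numeric range behaves as a bounds test.
p((1..10) === 5)

# Hypothetical helper: each `when` clause calls === on its own object.
def classify(x)
  case x
  when /ab/    then :regexp
  when 1..10   then :range
  when Integer then :integer
  else              :other
  end
end

p classify("cab")
p classify(5)
p classify(50)
```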

Given this, let's look directly at the 1st issue along with what you said. (We
can come back to the override question later.)

| Of course, the underlying problem may be that range should not include
| Enumerable at all, or that ContinuousRange and DiscreteRange should be
| two entirely different objects, with only the latter including
| Enumerable. Then we could have a separate RangeLike mixin, with a
| contains? operator that does bounds testing, and the DiscreteRange would
| mix in Enumerable and hence get include? (and member?) with discrete
| semantics.

At first, I didn't think it mattered, but b/c of what I just touched on above
I see you do have a very good point --there is an important difference. As it
would be imprudent to discount the need for one use over the other, somehow
we need to have both. Off the cuff I see two options: 1) An internal flag:
discrete vs. continuous 2) Or two separate classes, as you suggest. But there
is more to it...

This becomes increasingly interesting (or frustrating, depending on your slant)
when we consider the further advantages of having a NumericRange --notice
that its advantage is specific to discreteness (member modulo). But all
Ranges are based on succ and are therefore by definition discrete --only
Numeric ranges can be Continuous.

Consider further how succ determines successive members --a Range is an
indeterminate ordered set built by iteration. Oddly one defines a Range with
a first and last argument, but iterations are supposed to be defined by a
seed (first) and the number of successive iterations. And there is good
reason for this: there is no way to be sure that any given _last_ is a member
of the set! Look at this:

class ShrinkyDink
  def initialize(x); @x = x.to_i; end
  def succ; @x - 1; end
  def <=>(b); @x <=> b; end
end

rng = ShrinkyDink.new(0)...ShrinkyDink.new(100)

Not only is ShrinkyDink.new(100) not a member of the range, but nothing
"inbetween" the two is either!

I'm going to stop here for now. I have more to offer, but I want to work on it
a little bit more before I go into it. Also...

| Separating Range functionality into a mixin might also make it easier to
| get goodies like negative-stepping and infinite ranges.

Could you elaborate on this more?

T.

Brian Candler · Oct 7, 2004

Off the cuff I see two options: 1) An internal flag:
discrete vs. continuous 2) Or two separate classes, as you suggest. But there
is more to it...

I can understand why there is not an explicit flag at the moment, because
generally it's implicit:
- you can't use the continuous methods if the lower and upper bounds don't
respond to spaceship*
- you can't use the discrete methods if the lower bound doesn't respond to
'succ', or the upper bound doesn't respond to spaceship*

[*] Actually, I might not have those the right way round. I don't know,
without looking at the source, whether it is bounds <=> member or member <=>
bounds which is tested.

irb(main):001:0> (3.3..5.3).each {|x| puts x}
TypeError: cannot iterate from Float
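The same split can be seen from a script. This is a small sketch using only core Range and Float methods; the exact wording of the error message may differ between Ruby versions.

```ruby
r = 3.3..5.3

# The bounds test only needs <=>, so it works for Float endpoints.
p r.include?(4.0)

# Iteration needs succ on the lower bound, which Float does not provide.
begin
  r.each { |x| puts x }
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

p 3.3.respond_to?(:succ)
p 3.respond_to?(:succ)
```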

Since the discrete methods rely on succ, == and spaceship all being
available, then I can't think of any case where a range would work usefully
for the discrete case but not the continuous case.

The flag you propose might be useful if you want to make the semantics of
=== different for discrete/continuous ranges. However it seems reasonable
for === to mean 'continuous range', because that's the case which almost
always is available.

Actually, although Range#each is useful, I can see little use for
Range#member?. Why iterate through a range stepwise when there is almost
always a better way to test membership? (It may be domain-specific though,
e.g. if bounds are Integer then test for elem.to_i == elem)

If Range#member? is not particularly useful, that's another reason why it
should not be the default for ===. And it could be inherited from
Enumerable, rather than having its own implementation.
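As an illustration of the "better way to test membership" for Integer bounds, here is a hedged sketch. `discrete_member?` is a hypothetical helper, not an existing Range method; it combines the bounds test with the elem.to_i == elem check mentioned above, so nothing is walked stepwise.

```ruby
# Hypothetical helper for Integer-bounded ranges: membership by bounds plus
# an integrality check, with no stepwise iteration.
def discrete_member?(range, elem)
  return false unless elem.respond_to?(:to_i) && elem.to_i == elem
  return false unless range.begin <= elem
  range.exclude_end? ? elem < range.end : elem <= range.end
end

p discrete_member?(1..5, 3)
p discrete_member?(1..5, 3.5)
p discrete_member?(1...5, 5)
p discrete_member?(1..1_000_000_000, 999_999_999)
```

The last call answers immediately, where an element-by-element walk would have to visit almost a billion values.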

This becomes increasingly interesting (or frustrating, depending on your slant)
when we consider the further advantages of having a NumericRange --notice
that its advantage is specific to discreteness (member modulo). But all
Ranges are based on succ and are therefore by definition discrete --only
Numeric ranges can be Continuous.

I don't see in principle why you can't have a continuous range foo..bar
where foo and bar are not Numeric, but do respond to <=>. Can't think of a
useful example though.

Consider further how succ determines successive members --a Range is an
indeterminate ordered set built by iteration. Oddly one defines a Range with
a first and last argument, but iterations are supposed to be defined by a
seed (first) and the number of successive iterations. And there is good
reason for this: there is no way to be sure that any given _last_ is a member
of the set! Look at this:

Indeed - which is why a Ruby discrete range relies on spaceship to determine
whether the end of iteration has been reached.

class ShrinkyDink
  def initialize(x); @x = x.to_i; end
  def succ; @x - 1; end
  def <=>(b); @x <=> b; end
end

rng = ShrinkyDink.new(0)...ShrinkyDink.new(100)

This is a bit odd though. Your 'succ' method returns an Integer, not another
instance of ShrinkyDink. Do you mean this?

def succ; self.class.new(@x-1); end

Not only is ShrinkyDink.new(100) not a member of the range, but nothing
"inbetween" the two is either!

It's not a 'member', although ShrinkyDink(-50) is. And ShrinkyDink(50) is
included within the range, but it's not a member. Argh!

class ShrinkyDink
  attr_reader :x
  def initialize(x); @x = x.to_i; end
  def succ; ShrinkyDink.new(@x - 1); end
  def <=>(b); @x <=> b.x; end
end

rng = ShrinkyDink.new(0)...ShrinkyDink.new(100)
p rng.include?(ShrinkyDink.new(50)) # true
p rng.include?(ShrinkyDink.new(150)) # false
# p rng.member?(ShrinkyDink.new(-50)) # would be true, except for Ruby bug

# rng.each { |a| p a } # infinite loop!

There is an implied contract that #succ takes you 'closer' to the upper
bound. Unfortunately the direction of the spaceship test is fixed. In your
case

rng = ShrinkyDink.new(100)...ShrinkyDink.new(0)

might have been useful in the discrete case, except that it never iterates
because ShrinkyDink.new(100) > ShrinkyDink.new(0) by your spaceship
definition. It's not useful in the continuous case either, because all
values are out of range.

So, how else could discrete ranges be handled? Maybe if they provided either
(a) an iteration count [as you suggested]; or
(b) a final element, which can be tested using == alone; or
(c) a proc which determines when the iteration is complete

rather than forcing the test { |e| e < upperbound }
or { |e| e <= upperbound } which happens now.

-------------------------------------------------------------------------
class DiscreteRange1
  include Enumerable
  def initialize(lower, count)
    @lower = lower
    @count = count
  end
  def each
    val = @lower
    @count.times do
      yield val
      val = val.succ
    end
  end
end

class DiscreteRange2
  include Enumerable
  def initialize(lower, final, exclude_end=false)
    @lower = lower
    @final = final
    @exclude_end = exclude_end
  end
  def each
    val = @lower
    while val != @final
      yield val
      val = val.succ
    end
    yield val unless @exclude_end
  end
end

class DiscreteRange3
  include Enumerable
  def initialize(lower, &cond)
    @lower = lower
    @cond = cond
  end
  def each
    val = @lower
    while @cond.call(val)
      yield val
      val = val.succ
    end
  end
end

DiscreteRange1.new(10,5).each { |x| puts x }
DiscreteRange2.new(10,14).each { |x| puts x }
DiscreteRange3.new(10) { |x| x < 15 }.each { |x| puts x }
-------------------------------------------------------------------------

The third is the most general pattern. Current Ruby ranges could be
simulated by

DiscreteRange3.new(lower) { |x| x < upper } # exclude_end
DiscreteRange3.new(lower) { |x| x <= upper } # not exclude_end

So, all very interesting, but where does this leave (3..5) ?

- we want interval semantics:
case n
when (0...3)
# xxx
when (3..7.5)
# xxx
end

- we want discrete iterator semantics
b = (0..3).to_a

- there's no reason why the upper bound couldn't be a proc:

(0..proc{|x| x<3}).to_a

- if the upper bound is not a proc, then ISTM that DiscreteRange2 would be a
reasonable pattern, allowing your ShrinkyDinks to iterate backwards. However
it would break

(0..3.5).to_a # becomes infinite

- ranges with a length, rather than an upper bound, are definitely of
interest because of the parallel with substrings and array slices:

a[3,2] # from 3 for 2 elements

and so I wonder if having an explicit object for them could tidy this area
up.
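A few of the cases in the list above, as they can be checked against core Array and Range behaviour. This is only a sketch; the array `a` is made-up sample data.

```ruby
a = %w[a b c d e f]

# Start-and-length form: from index 3, for 2 elements.
p a[3, 2]
# The same slice written with an upper bound instead of a length.
p a[3..4]
p a[3...5]

# With the "less than or equal to upper" test, a Float upper bound still
# terminates an Integer iteration. A DiscreteRange2-style "stop when equal"
# test would never see 3.5 and would run forever.
p((0..3.5).to_a)
```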

That's more than enough thinking aloud though!

Regards,

Brian.

Markus · Oct 7, 2004

--only Numeric ranges can be Continuous.

Not so. Time ranges, for example, can be Continuous, as could
Colour or FuzzyTruthValue or...

Consider further how succ determines successive members --a Range is an
indeterminate ordered set built by iteration. Oddly one defines a Range with
a first and last argument, but iterations are supposed to be defined by a
seed (first) and the number of successive iterations.

Supposed by whom? You can just as easily define an iteration with
a termination test (e.g. iterating over the lines of a File).
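A minimal sketch of that kind of iteration. A StringIO stands in for the File here (an assumption made so the example is self-contained); the loop ends when `gets` returns nil, not after a count known in advance.

```ruby
require "stringio"

# Stand-in for a File opened for reading.
io = StringIO.new("one\ntwo\nthree\n")

lines = []
# Termination test: gets returns nil at end of input.
while (line = io.gets)
  lines << line.chomp
end

p lines
```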

And there is good
reason for this: there is no way to be sure that any given _last_ is a member
of the set! Look at this:

class ShrinkyDink
  def initialize(x); @x = x.to_i; end
  def succ; @x - 1; end
  def <=>(b); @x <=> b; end
end

rng = ShrinkyDink.new(0)...ShrinkyDink.new(100)

Not only is ShrinkyDink.new(100) not a member of the range, but nothing
"inbetween" the two is either!

This fails because the implementation of either succ or <=> is
aberrant; by definition x.succ > x, while that is not the case for
ShrinkyDinks. If you start going down that path (e.g. "but what if they
write:

def ShrinkyDink.to_s()
  `rm -f *`
end

"?) no answer will withstand scrutiny.
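The x.succ > x expectation can be written down as a check. `succ_ascends?` is a hypothetical helper, and the ShrinkyDink used is the variant whose succ returns another ShrinkyDink, as suggested earlier in the thread.

```ruby
# Hypothetical helper: does succ move the value strictly upward under <=>?
def succ_ascends?(x)
  (x.succ <=> x) == 1
end

# ShrinkyDink with succ returning a ShrinkyDink rather than an Integer.
class ShrinkyDink
  attr_reader :x
  def initialize(x); @x = x.to_i; end
  def succ; ShrinkyDink.new(@x - 1); end
  def <=>(b); @x <=> b.x; end
end

p succ_ascends?(41)
p succ_ascends?("az")
p succ_ascends?(ShrinkyDink.new(0))
```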

-- Markus

trans. (T. Onoma) · Oct 7, 2004

Brian, you make some excellent points, and clarify the issues. Thanks.

A BIG question that comes to mind after reading your post is this: What do
people usually use Range for? Is it continuous or discrete? I know there is
some of both, but I wonder if one far exceeds the other. Hopefully yes, b/c I
think the solution must basically fall to something like this:

- Range is defined as discrete. It represents an iterative set.

- The range has its own #succ which *defaults* to the #succ
method of the seed (first) value, but can be easily overridden.

- The Range's #succ can be overridden as either a proc or an object.
If an object, then each step is simply determined by the addition
of that object, i.e. self + #succ. If that succ object is a
Numeric, then the range is a numeric range and can gain the
speed advantages of the faster modulo #member? code.

- Range is Enumerable, and #include? is the same as #member? and #===.

- It should be defined by seed and number of steps, not an end member.
If not, then we need a "when to end" proc, as well as end_exclusive?
in order for this proc to default to succ <=> end >= 0, or > 0,
plus the dilemma of infinite loops. That gets messy. So I think not.

- A new class is made for the continuous uses called Interval.

- The Interval is defined by a first and last, exclusive/inclusive.

- For Interval the #member?, #include?, and #=== methods mean
"circumscribes".

- I'm not sure how the two will relate, if one can be subclass of the other
but no doubt the Interval can have a suitable #to_rng(div_size) method.
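A hedged sketch of the Interval half of this proposal. The class and its methods are hypothetical. In particular, `to_rng` returns a plain Array of members spaced `div_size` apart, which is an assumption made because the redefined Range it would convert to does not exist.

```ruby
# Hypothetical Interval: a continuous range defined by its two bounds.
class Interval
  attr_reader :first, :last

  def initialize(first, last, exclude_end = false)
    @first = first
    @last = last
    @exclude_end = exclude_end
  end

  def exclude_end?
    @exclude_end
  end

  # "Circumscribes": true when x lies between the bounds.
  def include?(x)
    return false unless @first <= x
    @exclude_end ? x < @last : x <= @last
  end
  alias member? include?
  alias === include?

  # Hypothetical conversion to discrete members, div_size apart.
  def to_rng(div_size)
    members = []
    val = @first
    while include?(val)
      members << val
      val += div_size
    end
    members
  end
end

iv = Interval.new(0, 2)
p iv.include?(1.5)
p iv === 2
p Interval.new(0, 2, true) === 2
p iv.to_rng(0.5)
```

Because all three predicates are the same bounds test, the synonym contract holds within Interval; the discrete view only appears once a step size is supplied.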

Of course that's all well and good, but what about the literal form now? Will
a..b be an Interval or a Range? And how will we literally write the other?

T.


I don't give a damn for a man that can only spell a word one way.
-Mark Twain

Ara.T.Howard · Oct 7, 2004

A BIG question that comes to mind after reading your post is this: What do
people usually use Range for?

- huge index sets into binary data
- open ended time ranges (infinity on one end)
- floating point intervals
- sets of strings

Is it continuous or discrete?
both

I know there is some of both, but I wonder if one far exceeds the other.
Hopefully yes, b/c I think the solution must basically fall to something
like this:

- Range is defined as discrete. It represents an iterative set.

this seems limiting because it disallows even the most basic ranges like
float ranges - which are both infinite and continuous. also, if you allow
user defined succ procs you allow non-discrete discrete objects. in
other words you give the user the ability to break the definition, which is
like allowing a method of Array to somehow make an Array not an Array -
this can't be good. i realize we are talking about the coder making an
infinite loop for himself, but what's different here is that you are wanting
to encapsulate loop behaviour. if you encapsulate a discrete loop then all
operations on that loop should not be able to break its discreteness, no?

- Range is Enumerable, and #include? is the same as #member? and #===.

this eliminates some of the most useful aspects of ranges though? eg:

now = Time::now

period = (now ... (now + 42.days))   # 42.days: assumes a Numeric#days helper

stamp = method_returning_time_with_usecs   # not named 'then', a reserved word

do_something if period.include? stamp
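A runnable sketch of the same idea, under a few assumptions: the bounds test is spelled Range#cover?, which later Ruby versions provide for exactly this purpose; the 42 days are written out in seconds; and `stamp` is a made-up value standing in for the time with microseconds (`then` itself is a reserved word in Ruby, so it cannot be used as a variable name).

```ruby
now    = Time.now
period = now...(now + 42 * 24 * 60 * 60)   # 42 days, in seconds

# A time with sub-second precision that no succ-style walk would land on.
stamp = now + 0.25

p period.cover?(stamp)
p period.cover?(now - 1)
```

The point of the example survives: a purely discrete include? could not answer this question, since Time has no succ to walk with.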

- A new class is made for the continuous uses called Interval.

that would break a lot of code. why not flip-flop: leave Range the way it
is and introduce a new class called IterativeSet or Iteration or
DiscreteRange or something. this might be more in line with what you are
wanting. for example it might make sense for Iterations to be able to do
this:

it = Iteration::new 0, 42

it.iterate(1){|i| p i} #=> 0, 1, 2, 3, etc...

it.iterate(2){|i| p i} #=> 0, 2, 4, 6, etc...

perhaps there could even be a literal

it = 0 -> 42

it.each{|i| p i}

or

it = 0 +> 42

it.each{|i| p i}

where the -> or +> indicate a 'step' operator
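a minimal sketch of what such an Iteration class might look like. the class and its `iterate` method are hypothetical, and the proposed `->` / `+>` literals are not modelled. the last line shows the stepping that Range already offers for numeric bounds.

```ruby
# Hypothetical Iteration: walks from first to last in caller-chosen steps.
class Iteration
  def initialize(first, last)
    @first = first
    @last = last
  end

  def iterate(step = 1)
    # Without a block, hand back an enumerator over the same walk.
    return enum_for(:iterate, step) unless block_given?
    val = @first
    while val <= @last
      yield val
      val += step
    end
  end
end

iter = Iteration.new(0, 10)
p iter.iterate(1).to_a
p iter.iterate(2).to_a

# Existing Ruby: stepping over a numeric Range.
p((0..10).step(2).to_a)
```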

kind regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================

Brian Candler · Oct 7, 2004

- The Ranges #succ can be overridden as either a proc or a an object.
If an object, then each step is simply determined by the addition
of that object, i.e. self + #succ. If that succ object is a
Numeric, then the range is a numeric range and can gain the
speed advantages of the faster modulo #member? code.

I think more generally, if you provide a proc, then you don't need to
stipulate addition here; the proc could be { |x| x+1 }, or it could be
{ |x| x*2 }

class DiscreteRange4
  include Enumerable
  def initialize(seed, count, succmeth=:succ)
    @seed = seed
    @count = count
    @succmeth = succmeth
  end
  def each
    val = @seed
    if @succmeth.respond_to?(:call)
      @count.times do
        yield val
        val = @succmeth.call(val)
      end
    else
      @count.times do
        yield val
        val = val.send(@succmeth)
      end
    end
  end
end

DiscreteRange4.new(10,5).each { |x| puts x }
DiscreteRange4.new(10,5,proc {|x| x=x-1}).each { |x| puts x }

(leaving aside that we might want to replace count with an end value or an
end test proc)

So, we've made an iterator generator. I'm not sure how useful this is,
because in these cases it's probably simpler just to write your own
iterator:

x = 10
5.times do
puts x
x = x-1
end

Especially where the range is given by a length, and so you can use n.times
{ block } to run it.

Regards,

Brian.

trans. (T. Onoma) · Oct 7, 2004

11, trans. (T. Onoma) wrote:
| > --only Numeric ranges can be Continuous.
|
| Not so. Time ranges, for example, can be Continuous, as could
| Colour or FuzzyTruthValue or...

True. Although I think one could argue, that the defining characteristic is a
*capability* to functionally map to rational numbers. So in a sense they are
numeric too, even if they don't use Numeric as a base class. Is that
important?

So yes, you are right. We need to consider Intervals of variant objects. But
how?

| > Consider further how succ determines successive members --a Range is an
| > indeterminate ordered set built by iteration. Oddly one defines a Range
| > with a first and last argument, but iterations are supposed to be defined
| > by a seed (first) and the number of successive iterations.
|
| Supposed by whom? You can just as easily define an iteration with
| a termination test (e.g. iterating over the lines of a File).

Okay, sure. My point was simply that you get no promises with a termination
test.

| > And there is good
| > reason for this: there is no way to be sure that any given _last_ is a
| > member of the set! Look at this:
| >
| > class ShrinkyDink
| > def initialize(x); @x = x.to_i; end
| > def succ; @x - 1; end
| > def <=>(b); @x <=> b; end
| > end
| >
| > rng = ShrinkyDink.new(0)...ShrinkyDink.new(100)
| >
| > Not only is ShrinkyDink.new(100) not a member of the range, but nothing
| > "inbetween" the two is either!
|
| This fails because the implementation of either succ or <=> is
| aberrant; by definition x.succ > x, while that is not the case for
| ShrinkyDinks. If you start going down that path (e.g. "but what if they
| write:

Not really. I went to an extreme purposefully to demonstrate a point. For
something more realistic try the Collatz conjecture.

Nonetheless, I take it your point is basically "it's the programmer's problem"
with regards to whether there's a real end member or termination?

T.

trans. (T. Onoma) · Oct 7, 2004

19:21AM +0900, trans. (T. Onoma) wrote:
| > - The Ranges #succ can be overridden as either a proc or a an object.
| > If an object, then each step is simply determined by the addition
| > of that object, i.e. self + #succ. If that succ object is a
| > Numeric, then the range is a numeric range and can gain the
| > speed advantages of the faster modulo #member? code.
|
| I think more generally, if you provide a proc, then you don't need to
| stipulate addition here; the proc could be { |x| x+1 }, or it could be
| { |x| x*2 }

Except you gain no speed advantages in dealing with numeric ranges, which I'm
sure is the most common type of range in use.

| class DiscreteRange4
| include Enumerable
| def initialize(seed, count, succmeth=:succ)
| @seed = seed
| @count = count
| @succmeth = succmeth
| end
| def each
| val = @seed
| if @succmeth.respond_to?(:call)
| @count.times do
| yield val
| val = @succmeth.call(val)
| end
| else
| @count.times do
| yield val
| val = val.send(@succmeth)
| end
| end
| end
| end
|
| DiscreteRange4.new(10,5).each { |x| puts x }
| DiscreteRange4.new(10,5,proc {|x| x=x-1}).each { |x| puts x }
|
| (leaving aside that we might want to replace count with an end value or an
| end test proc)
|
| So, we've made an iterator generator. I'm not sure how useful this is,
| because in these cases it's probably simpler just to write your own
| iterator:
|
| x = 10
| 5.times do
| puts x
| x = x-1
| end
|
| Especially where the range is given by a length, and so you can use n.times
| { block } to run it.

Then why do we have a range at all? I can easily test x >= a && x <= b, too. So
interval functionality seems trivial. On the other hand, a Range has the
advantages of being a portable and pollable iterator, based on inherent traits
of the seed value (i.e. succ) --but being able to override this is hardly
useless either. Its advantage is that it is a self-contained object with
useful methods.

Do you think DiscreteRange (Range) and ContinuousRange (Interval) need to be
separated in some way as has been suggested?

T.

Markus · Oct 7, 2004

11, trans. (T. Onoma) wrote:
| > --only Numeric ranges can be Continuous.
|
| Not so. Time ranges, for example, can be Continuous, as could
| Colour or FuzzyTruthValue or...

True. Although I think one could argue, that the defining characteristic is a
*capability* to functionally map to rational numbers. So in a sense they are
numeric too, even if they don't use Numeric as a base class. Is that
important?

Well, there are continuous mathematical constructs that _don't_ map
to the rationals (e.g. the reals), so I think it's a bad definition. I
may someday want to implement something that can represent sqrt(2)
exactly...

So yes, you are right. We need to consider Intervals of variant objects. But
how?

By having an Interval class that uses duck typing on its
sentinels? Maybe I'm missing the problem (I'm busy trying to get my
roll-your-own-operators patch working and haven't been following this
thread as closely as I'd like to) but why is this considered a problem?

It seems to me that restricting them to a specific base class would
be the harder (and less useful) of the options.

| > Consider further how succ determines successive members --a Range is an
| > indeterminate ordered set built by iteration. Oddly one defines a Range
| > with a first and last argument, but iterations are supposed to be defined
| > by a seed (first) and the number of successive iterations.
|
| Supposed by whom? You can just as easily define an iteration with
| a termination test (e.g. iterating over the lines of a File).

Okay, sure. My point was simply that you get no promises with a termination
test.

Start: first
Succ: next = (2+sqr(this))*(2+sqr(step))
Test: this < last

Here's a simple (albeit contrived) case with a termination test
where you can promise that it terminates, even without knowing how many
steps it takes.
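A hedged sketch of that scheme. It rests on two assumptions: "sqr" is read as squaring, and "step" is read as the zero-based iteration counter. `climb` is a hypothetical name. Under that reading each new value exceeds the previous one by more than 3, since (2 + t*t) * (2 + s*s) >= 4 + 2*t*t, so the test `this < last` must eventually fail.

```ruby
# Hypothetical helper collecting the values visited before the test fails.
def climb(first, last)
  values = []
  this = first
  step = 0
  while this < last                       # Test: this < last
    values << this
    this = (2 + this**2) * (2 + step**2)  # Succ
    step += 1
  end
  values
end

p climb(0, 1000)
```

The number of steps is not known up front, yet termination is guaranteed by the growth of the successor alone.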

| > And there is good
| > reason for this: there is no way to be sure that any given _last_ is a
| > member of the set! Look at this:
| >
| > class ShrinkyDink
| > def initialize(x); @x = x.to_i; end
| > def succ; @x - 1; end
| > def <=>(b); @x <=> b; end
| > end
| >
| > rng = ShrinkyDink.new(0)...ShrinkyDink.new(100)
| >
| > Not only is ShrinkyDink.new(100) not a member of the range, but nothing
| > "inbetween" the two is either!
|
| This fails because the implementation of either succ or <=> is
| aberrant; by definition x.succ > x, while that is not the case for
| ShrinkyDinks. If you start going down that path (e.g. "but what if they
| write:

Not really. I went to an extreme purposefully to demonstrate a point. For
something more realistic try the Collatz conjecture.

*laugh* First response: I'll write Collatz#succ if you write
Collatz#<=>!

Second response: Unless you're talking about the congruence sets
(attractors), in which case I'll write <=> and you can write succ. Then
if "p Collatz.new(1).succ" returns anything we can split the money.

Third response: what exactly would a range (or interval) mean in
that context?

Nonetheless, I take it your point is basically "it's the programmer's problem"
with regard to whether there's a real end member or termination?

Not exactly. But we can't prevent them from writing code that
won't _ever_ terminate without severely hobbling the language (e.g.
Turing) so we shouldn't make this a major consideration. Better to
have the semantics clear (and well named, and well documented) so that
they can avoid shooting themselves in the foot, rather than try to
control their feet.

For that matter (again, with the contrived examples), I might want
that sort of behavior; suppose my subtype of Numeric defines a
congruence operation that is _not_ monotonic; thus member? on a Range of
them might need to iterate through them.

-- Markus

Markus · Oct 7, 2004

it = 0 -> 42

or

it = 0 +> 42

where the -> or +> indicate a 'step' operator

I've almost sort of got the roll your own operators patch
working*...

-- Markus

* Translation: it seems to work or it dies mysteriously at whim**.

** Its whim, not mine.

Ara.T.Howard · Oct 7, 2004

I've almost sort of got the roll your own operators patch
working*...

-- Markus

* Translation: it seems to work or it dies mysteriously at whim**.

** Its whim, not mine.

lol.

sounds pretty cool.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================

trans. (T. Onoma) · Oct 7, 2004

55, trans. (T. Onoma) wrote:
| > On Thursday 07 October 2004 11:27 am, Markus wrote:
| > | On Thu, 2004-10-07 at 07:11, trans. (T. Onoma) wrote:
| > | > --only Numeric ranges can be Continuous.
| > |
| > | Not so. Time ranges, for example, can be Continuous, as could
| > | Colour or FuzzyTruthValue or...
| >
| > True. Although I think one could argue, that the defining characteristic
| > is a *capability* to functionally map to rational numbers. So in a sense
| > they are numeric too, even if they don't use Numeric as a base class. Is
| > that important?
|
| Well, there are continuous mathematical constructs that _don't_ map
| to the rationals (e.g. the reals), so I think it's a bad definition. I
| may someday want to implement something that can represent sqrt(2)
| exactly...

Sorry, I meant reals.

| > So yes, you are right. We need to consider Intervals of variant objects.
| > But how?
|
| By having an Interval class that uses duck typing on its
| sentinels? Maybe I'm missing the problem (I'm busy trying to get my
| roll-your-own-operators patch working and haven't been following this
| thread as closely as I'd like to) but why is this considered a problem?

Okay, yes. I just meant which methods apply? I guess #<=> is all we have. So
if they are comparable they can form a continuous range. And membership
consists in falling between the sentinels.
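
To make that concrete, here is a rough sketch of an interval that relies only on #<=> from its sentinels. The class name `Interval` and the `contains?` method echo names floated earlier in the thread, but the code itself is an illustration, not an existing Ruby class. The `exclude_end` keyword is an assumption added here to mirror `..` versus `...`.

```ruby
# A duck-typed interval: the sentinels only need to answer <=>.
# Hypothetical sketch, not part of Ruby's standard library.
class Interval
  attr_reader :first, :last

  def initialize(first, last, exclude_end: false)
    @first = first
    @last = last
    @exclude_end = exclude_end
  end

  # True when value falls between the sentinels.
  # Values that cannot be compared (<=> returns nil) are never members.
  def contains?(value)
    lower = value <=> first
    upper = value <=> last
    return false if lower.nil? || upper.nil?

    lower >= 0 && (@exclude_end ? upper < 0 : upper <= 0)
  end
end

p Interval.new(0, 9).contains?(0.42)                            # => true
p Interval.new("a", "c").contains?("bb")                        # => true
p Interval.new(Time.at(0), Time.at(100)).contains?(Time.at(50)) # => true
p Interval.new(0, 9).contains?("x")                             # => false
```

Nothing here requires a common base class: numbers, strings and Time values all work because each answers #<=>.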

| It seems to me that restricting them to a specific base class would
| be the harder (and less useful) of the options.
|
| > | > Consider further how succ determines successive members --a Range is
| > | > an indeterminate ordered set built by iteration. Oddly one defines a
| > | > Range with a first and last argument, but iterations are supposed to
| > | > be defined by a seed (first) and the number of successive iterations.
| > |
| > | Supposed by whom? You can just as easily define an iteration with
| > | a termination test (e.g. iterating over the lines of a File).
| >
| > Okay, sure. My point was simply that you get no promises with a
| > termination test.
|
| Start: first
| Succ: next = (2+sqr(this))*(2+sqr(step))
| Test: this < last
|
| Here's a simple (albeit contrived) case with a termination test
| where you can promise that it terminates, even without knowing how many
| steps it takes.

I knew you were going to say something like that! But I think you know what I
mean ... just that you don't _always_ know -- and in fact a range might
terminate with one seed but not another. But I don't think that's too
important. We can go the "shoot yourself in the foot" route. That's fine with
me. It would be nice to offer a notation, though, for "keep your foot out of
trouble" too.

(i.e. seed,steps)
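
A rough sketch of what a (seed, steps) notation could mean, assuming `steps` counts the number of succ calls applied after the seed. The helper name `seed_steps` is invented for illustration. Because the loop runs a fixed number of times, it terminates no matter how succ behaves.

```ruby
# Build the members of a "range" from a seed and a fixed number of
# successor steps. Hypothetical helper; always terminates.
def seed_steps(seed, steps)
  members = [seed]
  steps.times { members << members.last.succ }
  members
end

p seed_steps(1, 3)      # => [1, 2, 3, 4]
p seed_steps("az", 2)   # => ["az", "ba", "bb"]
```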

I'll have to finish later.... got to run.

T.

Markus · Oct 7, 2004

!!! -> <- -- +++ -=- */* !! ~~~ <=..< ....

* Have you ever wanted to define your own operators in ruby?

* Have you ever wanted to play around with experimental hash or
range semantics, but wanted a concise syntax for the
constructors?

* Are you crazy enough to install a compiler patch from someone
WHO OPENLY ADMITS TO NOT KNOWING C? (for those of you that need
everything quantified, that's a metric craziness index of about
0.6 Why)

If you answered yes to all of the above, have I got a patch for you.

----------------------------------------------------------------

As mentioned previously, I've been working on a patch that would let you
write things like:

class Pair
  attr_accessor :l,:r
  def initialize(l,r)
    @l,@r = l,r
  end
end

class Object
  def -->(other)
    Pair.new(self,other)
  end
end

print 1-->5,"\n"

This is the pre-alpha release of that patch. Basically, any sequence
of "operator characters" that isn't otherwise used is now user
definable, though you are warned to use spaces where this would be
ambiguous (e.g. x+=-1).
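
The rule described here (take the longest run of operator characters, return the built-in token if the run is one, otherwise treat it as user definable) can be sketched in Ruby. This illustrates the idea only and is not the patch itself: `scan_operator`, `OPERATOR_CHARS` and `BUILTIN_OPERATORS` are names made up here, the built-in table is abbreviated, and the trimming of a trailing `+` or `-` is modelled on the patch's warning that operators ending in +/- should be space delimited.

```ruby
# Characters the patch treats as operator characters.
OPERATOR_CHARS = "+-*/=<>.%^!&|~"

# Abbreviated table of operators Ruby already defines.
BUILTIN_OPERATORS = %w[
  **= ** *= * != !~ ! =~ => === == = <=> <= <<= << < >= >>= >> >
  &&= && &= & ||= || |= | += + -= - /= / ^= ^ ~ ... .. . %= %
].freeze

# Scan the operator starting at `pos`; returns [kind, text] or nil.
def scan_operator(source, pos = 0)
  len = 0
  len += 1 while pos + len < source.length && OPERATOR_CHARS.include?(source[pos + len])
  return nil if len.zero?

  # A run ending in + or - that is glued to the next token gives that
  # last character back (the patch also prints a warning here).
  following = source[pos + len]
  if len > 1 && "+-".include?(source[pos + len - 1]) && following && following !~ /\s/
    len -= 1
  end

  text = source[pos, len]
  [BUILTIN_OPERATORS.include?(text) ? :builtin : :generic, text]
end

p scan_operator("1-->5", 1)     # => [:generic, "-->"]
p scan_operator("a <=> b", 2)   # => [:builtin, "<=>"]
p scan_operator("x+=-1", 1)     # => [:builtin, "+="]
```

The last call shows why `x+=-1` needs special care: without the trimming rule the whole run `+=-` would be read as one unknown operator.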

In this version, the operators are always binary, non-associative,
and mid-precedence. I can see how to let the users set the precedence,
but can not figure out what "scope" the precedence declarations should
have. I can NOT see how it could depend on the class of the recipient,
which would be in some ways ideal and in others hideous.
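
For comparison, an unpatched Ruby only lets you redefine its built-in operators, each with a fixed precedence, so the closest stock equivalent of `1-->5` is an ordinary method call. A minimal sketch, in which the method name `pair_with` and the `to_s` format are invented for illustration:

```ruby
class Pair
  attr_accessor :l, :r

  def initialize(l, r)
    @l, @r = l, r
  end

  # Printable form chosen here to echo the proposed operator.
  def to_s
    "#{l}-->#{r}"
  end
end

class Object
  # Hypothetical stand-in for the user-defined --> operator.
  def pair_with(other)
    Pair.new(self, other)
  end
end

puts 1.pair_with(5)   # prints 1-->5
```

A method call sidesteps the precedence question entirely, at the cost of the concise syntax the patch is after.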

Several of you have expressed interest in this; let me know if you
try it out & what you think. Bug reports are especially welcome.

-- Markus

----------------------------------------------------------------

Normally I try to preface code I post with some sort of warning about
the quality/soundness of the code I'm posting. In this case, though, my
C skills aren't up to rating my code here. To me, it is much easier to
read (albeit in theory microscopically slower) than the way C is usually
written. Native speakers of C will doubtlessly be scandalized by my
accent.

So the best I can do is say, 1) it works on my machine, 2) I
haven't been able to measure any performance impact, 3) it doesn't seem
to break anything that I know of, though it does generate some warnings
(by design) on existing scripts that run operators together (e.g.
x+=-1).

I am NOT using this in a production environment, nor would I
recommend anyone else do so. But it's fun to play with.

Attachment: ruby_free_op_patch.0.00001.diff

--- ruby-1.8.2.mqr/parse.y 2004-10-07 11:32:40.000000000 -0700
+++ ruby-1.8.2/parse.y 2004-07-21 20:34:08.000000000 -0700
@@ -293,7 +293,7 @@
%token tAMPER /* & */
%token tSYMBEG tSTRING_BEG tXSTRING_BEG tREGEXP_BEG tWORDS_BEG tQWORDS_BEG
%token tSTRING_DBEG tSTRING_DVAR tSTRING_END
-%token tGENERIC_OP /* -->, <++<, etc. */
+
/*
* precedence table
*/
@@ -311,7 +311,7 @@
%nonassoc tDOT2 tDOT3
%left tOROP
%left tANDOP
-%nonassoc tCMP tEQ tEQQ tNEQ tMATCH tNMATCH tGENERIC_OP
+%nonassoc tCMP tEQ tEQQ tNEQ tMATCH tNMATCH
%left '>' tGEQ '<' tLEQ
%left '|' '^'
%left '&'
@@ -917,7 +917,6 @@
| '^' { $$ = '^'; }
| '&' { $$ = '&'; }
| tCMP { $$ = tCMP; }
- | tGENERIC_OP { $$ = tGENERIC_OP; }
| tEQ { $$ = tEQ; }
| tEQQ { $$ = tEQQ; }
| tMATCH { $$ = tMATCH; }
@@ -1127,10 +1126,6 @@
{
$$ = call_op($1, tCMP, 1, $3);
}
- | arg tGENERIC_OP arg
- {
- $$ = call_op($1, tGENERIC_OP, 1, $3);
- }
| arg '>' arg
{
$$ = call_op($1, '>', 1, $3);
@@ -3304,65 +3299,12 @@

#define IS_ARG() (lex_state == EXPR_ARG || lex_state == EXPR_CMDARG)

-/* MQR */
-#define WAS_ARG() (pre_op_state == EXPR_ARG || pre_op_state == EXPR_CMDARG)
-#define AMBI_ARG() (WAS_ARG() && space_seen && !ISSPACE(*lex_p))
-#define TRUE 1
-#define FALSE 0
-#define is_operator_character(c) (ISASCII(c) && ((c) == '+' || (c) == '-' || (c) == '*' || (c) == '/' || (c) == '=' || (c) == '<' || (c) == '>' || (c) == '.' || (c) == '%' || (c) == '^' || (c) == '!' || (c) == '&' || (c) == '|' || (c) == '~'))
-
-static int is_operator(found_len,len,op,state)
- char *op;
- int found_len,len;
- enum lex_state state;
- {
- if (found_len == len && strncmp(lex_p, op, len) == 0) {
- lex_p += len;
- lex_state = state;
- return TRUE;
- }
- else
- return FALSE;
- }
-static int space_op_warn(len,result)
- int len,result;
- {
- int i;
- for (i=len;i > 0;i--) pushback(i);
- rb_warning("Put space between operators for future version");
- return result;
- }
-static int operand_prefix_warn(ambiguious,pre_op_state,as_op,as_prefix)
- int ambiguious,as_op,as_prefix;
- enum lex_state pre_op_state;
- {
- if (ambiguious) {
- /* patch the 'as_op' character into message? */
- rb_warning("`*' or '&' interpreted as argument prefix");
- return as_prefix;
- }
- else if (pre_op_state == EXPR_BEG || pre_op_state == EXPR_MID) {
- return as_prefix;
- }
- else {
- return as_op;
- }
- }
-static int assignment_operator(op)
- int op;
- {
- yylval.id = op;
- lex_state = EXPR_BEG;
- return tOP_ASGN;
- }
-
static int
yylex()
{
- register int c,c2;
+ register int c;
int space_seen = 0;
int cmd_state;
- enum lex_state pre_op_state,post_op_state;

if (lex_strterm) {
int token;
@@ -3386,7 +3328,6 @@
cmd_state = command_start;
command_start = Qfalse;
retry:
-
switch (c = nextc()) {
case '\0': /* NUL */
case '\004': /* ^D */
@@ -3419,6 +3360,54 @@
command_start = Qtrue;
lex_state = EXPR_BEG;
return '\n';
+
+ case '*':
+ if ((c = nextc()) == '*') {
+ if ((c = nextc()) == '=') {
+ yylval.id = tPOW;
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ pushback(c);
+ c = tPOW;
+ }
+ else {
+ if (c == '=') {
+ yylval.id = '*';
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ pushback(c);
+ if (IS_ARG() && space_seen && !ISSPACE(c)){
+ rb_warning("`*' interpreted as argument prefix");
+ c = tSTAR;
+ }
+ else if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
+ c = tSTAR;
+ }
+ else {
+ c = '*';
+ }
+ }
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG; break;
+ }
+ return c;
+
+ case '!':
+ lex_state = EXPR_BEG;
+ if ((c = nextc()) == '=') {
+ return tNEQ;
+ }
+ if (c == '~') {
+ return tNMATCH;
+ }
+ pushback(c);
+ return '!';
+
case '=':
if (was_bol()) {
/* skip embedded rd document */
@@ -3439,8 +3428,30 @@
lex_p = lex_pend;
goto retry;
}
- break;
}
+
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG; break;
+ }
+ if ((c = nextc()) == '=') {
+ if ((c = nextc()) == '=') {
+ return tEQQ;
+ }
+ pushback(c);
+ return tEQ;
+ }
+ if (c == '~') {
+ return tMATCH;
+ }
+ else if (c == '>') {
+ return tASSOC;
+ }
+ pushback(c);
+ return '=';
+
case '<':
c = nextc();
if (c == '<' &&
@@ -3452,8 +3463,53 @@
int token = heredoc_identifier();
if (token) return token;
}
- pushback(c);
- break;
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG; break;
+ }
+ if (c == '=') {
+ if ((c = nextc()) == '>') {
+ return tCMP;
+ }
+ pushback(c);
+ return tLEQ;
+ }
+ if (c == '<') {
+ if ((c = nextc()) == '=') {
+ yylval.id = tLSHFT;
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ pushback(c);
+ return tLSHFT;
+ }
+ pushback(c);
+ return '<';
+
+ case '>':
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG; break;
+ }
+ if ((c = nextc()) == '=') {
+ return tGEQ;
+ }
+ if (c == '>') {
+ if ((c = nextc()) == '=') {
+ yylval.id = tRSHFT;
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ pushback(c);
+ return tRSHFT;
+ }
+ pushback(c);
+ return '>';
+
case '"':
lex_strterm = NEW_STRTERM(str_dquote, '"', 0);
return tSTRING_BEG;
@@ -3491,12 +3547,24 @@
if (!IS_ARG()){
int c2 = 0;
switch (c) {
- case ' ': c2 = 's'; break;
- case '\n': c2 = 'n'; break;
- case '\t': c2 = 't'; break;
- case '\v': c2 = 'v'; break;
- case '\r': c2 = 'r'; break;
- case '\f': c2 = 'f'; break;
+ case ' ':
+ c2 = 's';
+ break;
+ case '\n':
+ c2 = 'n';
+ break;
+ case '\t':
+ c2 = 't';
+ break;
+ case '\v':
+ c2 = 'v';
+ break;
+ case '\r':
+ c2 = 'r';
+ break;
+ case '\f':
+ c2 = 'f';
+ break;
}
if (c2) {
rb_warn("invalid character syntax; use ?\\%c", c2);
@@ -3521,193 +3589,142 @@
lex_state = EXPR_END;
yylval.node = NEW_LIT(INT2FIX(c));
return tINTEGER;
- case '+': case '-':
- if (lex_state == EXPR_FNAME || lex_state == EXPR_DOT) break;
- c2 = nextc();
- if (lex_state == EXPR_BEG || lex_state == EXPR_MID ||
- (IS_ARG() && space_seen && !ISSPACE(c2))) {
- if (IS_ARG()) arg_ambiguous();
+
+ case '&':
+ if ((c = nextc()) == '&') {
lex_state = EXPR_BEG;
- pushback(c2);
- if (ISDIGIT(c2)) {
- if (c == '+') goto start_num; else return tUMINUS_NUM;
- }
- return (c == '+') ? tUPLUS : tUMINUS;
- }
- pushback(c2);
- break;
- case '/':
- if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
- lex_strterm = NEW_STRTERM(str_regexp, '/', 0);
- return tREGEXP_BEG;
- }
- c = nextc();
- pushback(c);
- if (c == '=') break;
- if (IS_ARG() && space_seen && (!ISSPACE(c))) { /* && (c != '=')) { */
- arg_ambiguous();
- lex_strterm = NEW_STRTERM(str_regexp, '/', 0);
- return tREGEXP_BEG;
+ if ((c = nextc()) == '=') {
+ yylval.id = tANDOP;
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
}
- break;
- /* needed?
- case '.':
- c = nextc();
+ pushback(c);
+ return tANDOP;
+ }
+ else if (c == '=') {
+ yylval.id = '&';
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
pushback(c);
- if (ISDIGIT(c)) {
- yyerror("no .<digit> floating literal anymore; put 0 before dot");
+ if (IS_ARG() && space_seen && !ISSPACE(c)){
+ rb_warning("`&' interpreted as argument prefix");
+ c = tAMPER;
}
- */
- case '%':
- if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
- int term;
- int paren;
+ else if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
+ c = tAMPER;
+ }
+ else {
+ c = '&';
+ }
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG;
+ }
+ return c;

- c = nextc();
- quotation:
- if (!ISALNUM(c)) {
- term = c;
- c = 'Q';
+ case '|':
+ if ((c = nextc()) == '|') {
+ lex_state = EXPR_BEG;
+ if ((c = nextc()) == '=') {
+ yylval.id = tOROP;
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
}
- else {
- term = nextc();
- if (ISALNUM(term) || ismbchar(term)) {
- yyerror("unknown type of %string");
- return 0;
- }
+ pushback(c);
+ return tOROP;
+ }
+ if (c == '=') {
+ yylval.id = '|';
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ if (lex_state == EXPR_FNAME || lex_state == EXPR_DOT) {
+ lex_state = EXPR_ARG;
+ }
+ else {
+ lex_state = EXPR_BEG;
+ }
+ pushback(c);
+ return '|';
+
+ case '+':
+ c = nextc();
+ if (lex_state == EXPR_FNAME || lex_state == EXPR_DOT) {
+ lex_state = EXPR_ARG;
+ if (c == '@') {
+ return tUPLUS;
}
- if (c == -1 || term == -1) {
- rb_compile_error("unterminated quoted string meets end of file");
- return 0;
+ pushback(c);
+ return '+';
+ }
+ if (c == '=') {
+ yylval.id = '+';
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ if (lex_state == EXPR_BEG || lex_state == EXPR_MID ||
+ (IS_ARG() && space_seen && !ISSPACE(c))) {
+ if (IS_ARG()) arg_ambiguous();
+ lex_state = EXPR_BEG;
+ pushback(c);
+ if (ISDIGIT(c)) {
+ c = '+';
+ goto start_num;
}
- paren = term;
- if (term == '(') term = ')';
- else if (term == '[') term = ']';
- else if (term == '{') term = '}';
- else if (term == '<') term = '>';
- else paren = 0;
-
- switch (c) {
- case 'Q':
- lex_strterm = NEW_STRTERM(str_dquote, term, paren);
- return tSTRING_BEG;
-
- case 'q':
- lex_strterm = NEW_STRTERM(str_squote, term, paren);
- return tSTRING_BEG;
-
- case 'W':
- lex_strterm = NEW_STRTERM(str_dquote | STR_FUNC_QWORDS, term, paren);
- do {c = nextc();} while (ISSPACE(c));
- pushback(c);
- return tWORDS_BEG;
-
- case 'w':
- lex_strterm = NEW_STRTERM(str_squote | STR_FUNC_QWORDS, term, paren);
- do {c = nextc();} while (ISSPACE(c));
- pushback(c);
- return tQWORDS_BEG;
-
- case 'x':
- lex_strterm = NEW_STRTERM(str_xquote, term, paren);
- return tXSTRING_BEG;
-
- case 'r':
- lex_strterm = NEW_STRTERM(str_regexp, term, paren);
- return tREGEXP_BEG;
+ return tUPLUS;
+ }
+ lex_state = EXPR_BEG;
+ pushback(c);
+ return '+';

- case 's':
- lex_strterm = NEW_STRTERM(str_ssym, term, paren);
- lex_state = EXPR_FNAME;
- return tSYMBEG;
+ case '-':
+ c = nextc();
+ if (lex_state == EXPR_FNAME || lex_state == EXPR_DOT) {
+ lex_state = EXPR_ARG;
+ if (c == '@') {
+ return tUMINUS;
+ }
+ pushback(c);
+ return '-';
+ }
+ if (c == '=') {
+ yylval.id = '-';
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ if (lex_state == EXPR_BEG || lex_state == EXPR_MID ||
+ (IS_ARG() && space_seen && !ISSPACE(c))) {
+ if (IS_ARG()) arg_ambiguous();
+ lex_state = EXPR_BEG;
+ pushback(c);
+ if (ISDIGIT(c)) {
+ return tUMINUS_NUM;
+ }
+ return tUMINUS;
+ }
+ lex_state = EXPR_BEG;
+ pushback(c);
+ return '-';

- default:
- yyerror("unknown type of %string");
- return 0;
+ case '.':
+ lex_state = EXPR_BEG;
+ if ((c = nextc()) == '.') {
+ if ((c = nextc()) == '.') {
+ return tDOT3;
}
+ pushback(c);
+ return tDOT2;
}
- c = nextc();
- if (IS_ARG() && space_seen && !ISSPACE(c)) goto quotation;
pushback(c);
- break;
- }
-
- pushback(c);
- int op_len = 0;
- while (lex_p+op_len < lex_pend && is_operator_character(c = lex_p[op_len])) {
- op_len++;
- }
- if (op_len > 0) {
- c = lex_p[op_len-1];
- if (op_len > 1 && (c == '-' || c == '+') && !(ISSPACE(lex_p[op_len]) || lex_p+op_len >= lex_pend) ) {
- rb_warn("operators ending in +/- should be space delimited");
- op_len--;
- }
- }
- pre_op_state = lex_state;
- switch (lex_state) {
- case EXPR_FNAME: case EXPR_DOT:
- post_op_state = EXPR_ARG; break;
- default:
- post_op_state = EXPR_BEG; break;
- }
- if (is_operator(op_len,3,"**=", EXPR_BEG )) { return assignment_operator(tPOW); }
- if (is_operator(op_len,2,"**", post_op_state)) { return tPOW; }
- if (is_operator(op_len,2,"*=", EXPR_BEG )) { return assignment_operator('*'); }
- if (is_operator(op_len,1,"*", post_op_state)) { return operand_prefix_warn(AMBI_ARG(),pre_op_state,'*',tSTAR); }
- if (is_operator(op_len,2,"!=", EXPR_BEG )) { return tNEQ; }
- if (is_operator(op_len,2,"!~", EXPR_BEG )) { return tNMATCH; }
- if (is_operator(op_len,1,"!", EXPR_BEG )) { return '!'; }
- if (is_operator(op_len,2,"=~", post_op_state)) { return tMATCH; }
- if (is_operator(op_len,2,"=>", post_op_state)) { return tASSOC; }
- if (is_operator(op_len,3,"===", post_op_state)) { return tEQQ; }
- if (is_operator(op_len,2,"==", post_op_state)) { return tEQ; }
- if (is_operator(op_len,1,"=", post_op_state)) { return '='; }
- if (is_operator(op_len,2,"=%", post_op_state)) { return space_op_warn(1,'='); }
- if (is_operator(op_len,3,"<=>", post_op_state)) { return tCMP; }
- if (is_operator(op_len,2,"<=", post_op_state)) { return tLEQ; }
- if (is_operator(op_len,3,"<<=", EXPR_BEG )) { return assignment_operator(tLSHFT); }
- if (is_operator(op_len,2,"<<", post_op_state)) { return tLSHFT; }
- if (is_operator(op_len,1,"<", post_op_state)) { return '<'; }
- if (is_operator(op_len,2,">=", post_op_state)) { return tGEQ; }
- if (is_operator(op_len,3,">>=", EXPR_BEG )) { return assignment_operator(tRSHFT); }
- if (is_operator(op_len,2,">>", post_op_state)) { return tRSHFT; }
- if (is_operator(op_len,1,">", post_op_state)) { return '>'; }
- if (is_operator(op_len,3,"&&=", EXPR_BEG )) { return assignment_operator(tANDOP); }
- if (is_operator(op_len,2,"&&", post_op_state)) { return tANDOP; }
- if (is_operator(op_len,2,"&=", EXPR_BEG )) { return assignment_operator('&'); }
- if (is_operator(op_len,1,"&", post_op_state)) { return operand_prefix_warn(AMBI_ARG(),pre_op_state,'&',tAMPER); }
- if (is_operator(op_len,3,"||=", EXPR_BEG )) { return assignment_operator(tOROP); }
- if (is_operator(op_len,2,"||", post_op_state)) { return tOROP; }
- if (is_operator(op_len,2,"|=", EXPR_BEG )) { return assignment_operator('|'); }
- if (is_operator(op_len,1,"|", post_op_state)) { return '|'; }
- if (is_operator(op_len,2,"+@", EXPR_ARG )) { return tUPLUS; }
- if (is_operator(op_len,2,"+=", EXPR_BEG )) { return assignment_operator('+'); }
- if (is_operator(op_len,1,"+", post_op_state)) { return '+'; }
- if (is_operator(op_len,2,"-@", EXPR_ARG )) { return tUMINUS; }
- if (is_operator(op_len,2,"-=", EXPR_BEG )) { return assignment_operator('-'); }
- if (is_operator(op_len,1,"-", post_op_state)) { return '-'; }
- if (is_operator(op_len,2,"/=", EXPR_BEG )) { return assignment_operator('/'); }
- if (is_operator(op_len,1,"/", post_op_state)) { return '/'; }
- if (is_operator(op_len,2,"^=", EXPR_BEG )) { return assignment_operator('^'); }
- if (is_operator(op_len,1,"^", post_op_state)) { return '^'; }
- if (is_operator(op_len,2,"~@", post_op_state)) { return '~'; }
- if (is_operator(op_len,1,"~", post_op_state)) { return '~'; }
- if (is_operator(op_len,3,"...", EXPR_BEG )) { return tDOT3; }
- if (is_operator(op_len,2,"..", EXPR_BEG )) { return tDOT2; }
- if (is_operator(op_len,1,".", EXPR_DOT )) { return '.'; }
- if (is_operator(op_len,2,"%=", EXPR_BEG )) { return assignment_operator('%'); }
- if (is_operator(op_len,1,"%", post_op_state)) { return '%'; }
- if (op_len > 0) {
- int i=0;
- newtok();
- for (i=0;i<op_len;i++) tokadd(nextc());
- tokfix();
- yylval.id = rb_intern(tok());
- lex_state = post_op_state;
- return tGENERIC_OP;
- }
- switch (c = nextc()) {
+ if (ISDIGIT(c)) {
+ yyerror("no .<digit> floating literal anymore; put 0 before dot");
+ }
+ lex_state = EXPR_DOT;
+ return '.';
+
start_num:
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
@@ -3964,12 +3981,67 @@
lex_state = EXPR_FNAME;
return tSYMBEG;

+ case '/':
+ if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
+ lex_strterm = NEW_STRTERM(str_regexp, '/', 0);
+ return tREGEXP_BEG;
+ }
+ if ((c = nextc()) == '=') {
+ yylval.id = '/';
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ pushback(c);
+ if (IS_ARG() && space_seen) {
+ if (!ISSPACE(c)) {
+ arg_ambiguous();
+ lex_strterm = NEW_STRTERM(str_regexp, '/', 0);
+ return tREGEXP_BEG;
+ }
+ }
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG; break;
+ }
+ return '/';
+
+ case '^':
+ if ((c = nextc()) == '=') {
+ yylval.id = '^';
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG; break;
+ }
+ pushback(c);
+ return '^';
+
case ';':
command_start = Qtrue;
case ',':
lex_state = EXPR_BEG;
return c;

+ case '~':
+ if (lex_state == EXPR_FNAME || lex_state == EXPR_DOT) {
+ if ((c = nextc()) != '@') {
+ pushback(c);
+ }
+ }
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG; break;
+ }
+ return '~';
+
case '(':
command_start = Qtrue;
if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
@@ -4034,6 +4106,91 @@
pushback(c);
return '\\';

+ case '%':
+ if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
+ int term;
+ int paren;
+
+ c = nextc();
+ quotation:
+ if (!ISALNUM(c)) {
+ term = c;
+ c = 'Q';
+ }
+ else {
+ term = nextc();
+ if (ISALNUM(term) || ismbchar(term)) {
+ yyerror("unknown type of %string");
+ return 0;
+ }
+ }
+ if (c == -1 || term == -1) {
+ rb_compile_error("unterminated quoted string meets end of file");
+ return 0;
+ }
+ paren = term;
+ if (term == '(') term = ')';
+ else if (term == '[') term = ']';
+ else if (term == '{') term = '}';
+ else if (term == '<') term = '>';
+ else paren = 0;
+
+ switch (c) {
+ case 'Q':
+ lex_strterm = NEW_STRTERM(str_dquote, term, paren);
+ return tSTRING_BEG;
+
+ case 'q':
+ lex_strterm = NEW_STRTERM(str_squote, term, paren);
+ return tSTRING_BEG;
+
+ case 'W':
+ lex_strterm = NEW_STRTERM(str_dquote | STR_FUNC_QWORDS, term, paren);
+ do {c = nextc();} while (ISSPACE(c));
+ pushback(c);
+ return tWORDS_BEG;
+
+ case 'w':
+ lex_strterm = NEW_STRTERM(str_squote | STR_FUNC_QWORDS, term, paren);
+ do {c = nextc();} while (ISSPACE(c));
+ pushback(c);
+ return tQWORDS_BEG;
+
+ case 'x':
+ lex_strterm = NEW_STRTERM(str_xquote, term, paren);
+ return tXSTRING_BEG;
+
+ case 'r':
+ lex_strterm = NEW_STRTERM(str_regexp, term, paren);
+ return tREGEXP_BEG;
+
+ case 's':
+ lex_strterm = NEW_STRTERM(str_ssym, term, paren);
+ lex_state = EXPR_FNAME;
+ return tSYMBEG;
+
+ default:
+ yyerror("unknown type of %string");
+ return 0;
+ }
+ }
+ if ((c = nextc()) == '=') {
+ yylval.id = '%';
+ lex_state = EXPR_BEG;
+ return tOP_ASGN;
+ }
+ if (IS_ARG() && space_seen && !ISSPACE(c)) {
+ goto quotation;
+ }
+ switch (lex_state) {
+ case EXPR_FNAME: case EXPR_DOT:
+ lex_state = EXPR_ARG; break;
+ default:
+ lex_state = EXPR_BEG; break;
+ }
+ pushback(c);
+ return '%';
+
case '$':
lex_state = EXPR_END;
newtok();
@@ -4886,7 +5043,6 @@
case '^':
case '&':
case tCMP:
- case tGENERIC_OP:
case '>':
case tGEQ:
case '<':

Bill Atkins · Oct 7, 2004

It's got its own whims? I'd be careful with that code.

Markus · Oct 7, 2004

It's got its own whims? I'd be careful with that code.

It's working now (see the ANNouncement a few messages back). I
swear C is for masochists. It turned out the code didn't have its own
whims after all; it was using an ill-initialized pointer to access
something else's whims.

-- Markus

Markus · Oct 7, 2004

Shoot.

It doesn't correctly distinguish multiple user-defined operators on one
class. I know what the problem is, and tonight I'll try to figure out
how to fix it.

-- Markus

Yukihiro Matsumoto · Oct 8, 2004

Hi,

In message "Re: Range behavior (Re: [RCR] New [] Semantics)"

|A BIG question that comes to mind after reading your post is this: What do
|people usually use Range for? Is it continuous or discrete? I know there is
|some of both, but I wonder if one far exceeds the other. Hopefully yes, b/c I
|think the solution must basically fall to something like this:

<snip>

Define the problem to solve. What do you want to fix?

Ranges serving both continuous and discrete? Or "member?" and
"include?" behaving differently? Or having fun with designing
ranges? Or something else?

For your note, I refuse to separate continuous ranges and discrete
ranges. In Ruby, I'd rather choose classes to have multiple purpose.
For example, arrays can be served as stack, queue, etc. I feel same
to ranges too.

I thought "member?" (which means item is included in the member set of
enumeration) and "include?" (which means item is included between
upper and lower bound) can be easily distinguished.

I have several ideas, but I have to know what is the problem first.

matz.

Randy W. Sims · Oct 8, 2004

Yukihiro Matsumoto said:
Hi,

In message "Re: Range behavior (Re: [RCR] New [] Semantics)"

|A BIG question that comes to mind after reading your post is this: What do
|people usually use Range for? Is it continuous or discrete? I know there is
|some of both, but I wonder if one far exceeds the other. Hopefully yes, b/c I
|think the solution must basically fall to something like this:

<snip>

Define the problem to solve. What do you want to fix?

Ranges serving both continuous and discrete? Or "member?" and
"include?" behaving differently? Or having fun with designing
ranges? Or something else?

For your note, I refuse to separate continuous ranges and discrete
ranges. In Ruby, I'd rather choose classes to have multiple purpose.
For example, arrays can be served as stack, queue, etc. I feel same
to ranges too.

I thought "member?" (which means item is included in the member set of
enumeration) and "include?" (which means item is included between
upper and lower bound) can be easily distinguished.

I have several ideas, but I have to know what is the problem first.

FWIW, I like the current implementation.

I think it is important to be able to express ideas like those being
expressed here, and it's great to be able to make suggestions and have a
conversation with the language author. Unfortunately, a lot of people
(myself included) sometimes get in a mode where we over design things.
We start thinking hypothetically and abstractly about things that would
be nice to have. But most of them are not strictly necessary. A lot of
the suggestions I've seen on this and other lang forums, especially in
"scripting" languages suffer this problem.

I've always liked C++'s philosophy of a complete, but minimal core. It
keeps things simple, fewer points of failure, easy to understand and
learn, etc. But it's hard to find a balance. Some simple additions like
a ++ operator are used so much they are useful to the language. But some
operators that are just as simple (implementation wise) are just not
useful enough to be implemented. This seems to be a problem, for
example, with Perl6 IMHO. An example might be an xor operator (^^). It
might be useful and expressive in some situations, but it's probably not
useful often enough to be worth the cost of having yet another operator,
and it also has an easy workaround ((A || B) && !(A && B)).
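
As a quick check of that workaround, here it is as a Ruby method (the name `xor` is chosen here for illustration). For true/false operands Ruby already offers `^`, and the two agree on every combination:

```ruby
# Logical xor spelled with the workaround from the post.
def xor(a, b)
  (a || b) && !(a && b)
end

[true, false].product([true, false]).each do |a, b|
  puts "#{a} xor #{b} = #{xor(a, b)} (built-in ^ gives #{a ^ b})"
end
```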

The same thing could be argued about continuous ranges. Workarounds are easy:

irb(main):001:0> rng = 0..9
=> 0..9
irb(main):002:0> 0.42 > rng.first && 0.42 < rng.last
=> true

And I'm not sure they're used often enough to warrant more.

Just MHO,
Randy.

trans. (T. Onoma) · Oct 8, 2004

On Thursday 07 October 2004 08:08 pm, Yukihiro Matsumoto wrote:
| Define the problem to solve. What do you want to fix?
|
| Ranges serving both continuous and discrete? Or "member?" and
| "include?" behaving differently? Or having fun with designing
| ranges? Or something else?

Well, it might look like the latter, b/c I am also "exploring". But also b/c it
would be nice to solve as many outstanding issues at once rather than just
one at a time --b/c one might affect another, and you would end up
changing them again later (or learn to live with the problems).

| For your note, I refuse to separate continuous ranges and discrete
| ranges. In Ruby, I'd rather choose classes to have multiple purpose.
| For example, arrays can be served as stack, queue, etc. I feel same
| to ranges too.
|
| I thought "member?" (which means item is included in the member set of
| enumeration) and "include?" (which means item is included between
| upper and lower bound) can be easily distinguished.
|
| I have several ideas, but I have to know what is the problem first.

Okay. In summary the problem centers around how to define #===. When you
think about that you realize Range is serving two purposes which are not
perfectly compatible:

1) a discrete iterative set where .succ on first is used
as generator and membership is defined like

val <=> succ </<= last

2) an interval where membership is determined like

val <=> first >/>= 0 && val <=> last </<= 0
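
The two readings can be put side by side as plain Ruby methods. This is a sketch for illustration only: the method names are made up, inclusive endpoints are assumed, and the first method trusts succ to eventually pass `last` (it will loop forever otherwise, which is exactly the worry raised earlier in the thread).

```ruby
# 1) Discrete reading: walk from first with succ and look for val.
def succ_member?(first, last, val)
  current = first
  loop do
    cmp = current <=> last
    return false if cmp.nil? || cmp > 0   # walked past the end
    return true if current == val
    current = current.succ
  end
end

# 2) Interval reading: val only has to fall between the bounds.
def interval_member?(first, last, val)
  (val <=> first) >= 0 && (val <=> last) <= 0
end

p succ_member?(1, 10, 5)         # => true
p interval_member?(1, 10, 5)     # => true
p succ_member?(1, 10, 2.5)       # => false
p interval_member?(1, 10, 2.5)   # => true
```

The last two lines are the incompatibility in a nutshell: 2.5 lies between the bounds but is never generated by succ.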

Another problem is that numeric ranges, which are likely the most commonly
used by a large margin, are *very* slow to determine #member? --but they don't
need to be slow.
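
A sketch of why it need not be slow, assuming Integer endpoints, an inclusive end, and a finite or NaN Numeric candidate: the discrete question "would succ ever produce this value?" has an arithmetic answer, so no iteration is required. The method name is invented for illustration.

```ruby
# Constant-time discrete membership for an Integer range first..last.
# A value is a member when it is whole and lies within the bounds.
def integer_range_member?(first, last, val)
  val % 1 == 0 && first <= val && val <= last
end

p integer_range_member?(1, 10**9, 999_999_999)   # => true
p integer_range_member?(1, 10**9, 2.5)           # => false
p integer_range_member?(1, 10**9, 0)             # => false
```

The first call answers immediately, where walking succ from 1 would take close to a billion steps.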

There are a couple of other small things that would be nice, but these are the
main important ones (IMHO) that I'm interested in solving. Brian and Markus
may have other ideas.

T.
