RCR 296: Destructive methods return self

Nikolai Weibull · Mar 20, 2005

* ES (Mar 20, 2005 18:10):

Ropes are used fairly widely and would be a good solution to this
problem. In addition, I see no reason why Strings and Ropes could not
use the same idiom as Fixnum and Bignum, where the former is
automatically converted (but the latter may also be explicitly
instantiated).

That's a very good point. There is definitely some sort of correlation
between the relationship between Fixnum and Bignum and that between
String and Rope.

Interesting... I certainly see the validity of your call to optimize
the nondestructive operations; I'm sure the performance can be
reasonably close to destructive when properly designed. However, it
will still have an impact in _some_ programs even when we're dealing
with these minimal differences.

And I still assert that it's conceptually better to s.chomp! than to s
= s.chomp
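
As a small illustration of the two forms being compared (a sketch, not taken from the thread): the bang form modifies the receiver in place, while the non-bang form builds a new String and rebinds the variable.

```ruby
# In place: the same object is modified.
s = "hello\n".dup
id_before = s.object_id
s.chomp!
same_object = (s.object_id == id_before)

# Copying: chomp returns a new String and the variable is rebound.
t = "hello\n".dup
original = t
t = t.chomp
rebound = !t.equal?(original)   # original still ends in "\n"
```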

Definitely. But that's due to the fact that Strings are implemented as
a sequence of bytes, where we can simply decrement RSTRING(str)->len and
write a '\0' at RSTRING(str)->ptr[RSTRING(str)->len - 1]. It's
interesting to note that chop! returns self, so that it may easily be
chained with other methods,
nikolai

David A. Black · Mar 20, 2005

Hi --

Sure there is: a Ruby implementation should implement the Ruby language as
currently defined. What I'm saying is that there is nothing in the
definition of the Ruby language that forces an implementor to use contiguous
blocks of memory for strings or arrays. Managing large strings as linked
chunks is something text editors already do for performance. Assertions
about how strings should or should not be used are based, in part, on
assumptions that should be questioned.

I would rather say:
The most widely used implementation is the de facto standard.

I would say that Thing X is a Ruby implementation iff Matz says it is

In the absence of a written standard, there's no other real
benchmark -- except, perhaps, passing all tests in the Ruby test
suite, which I think anything claiming to be a full Ruby
implementation would clearly have to do.

David

Yukihiro Matsumoto · Mar 22, 2005

Hi,

In message "Re: RCR 296: Destructive methods return self"

|Perhaps a more interesting solution would be to get rid of all
|destructive methods and work on optimizing their existing
|non-destructive counterparts. Matz seems to prefer this solution.

I don't like this proposal, since it loses a feature (modification
detection) and there's no way to recover it without a serious performance
penalty, which would make using bang methods pointless.

Alternative plans:

(1) remove bang methods altogether.

(2) current bang methods are bad because they work in most
cases, and fail if no change has been made. thus let us make bang
methods return something other than self, a boolean for example,
so that chaining always fails. this forces no chaining of bang methods.

(3) introduce "real" multiple return values a la Common Lisp, and let
the first returned value be the receiver and the second returned
value be a boolean denoting success/failure.

(4) add some kind of reference counting, and if the receiver is
referenced from only one place, modify the receiver in place, to
gain performance.
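
For readers following along, the behavior these plans react to can be sketched as below. It shows current Ruby as I understand it: a bang method returns nil when it changed nothing, so a chain only works when every step actually modifies the string.

```ruby
s = "  Hello  ".dup
chained = s.strip!.downcase!        # works: both calls change the string

t = "hello".dup
no_change = t.strip!                # nil, there was nothing to strip

error = begin
  "hello".dup.strip!.downcase!      # strip! returns nil, so this raises
rescue NoMethodError => e
  e
end

# The nil return is what makes "modify detection" possible:
u = "a-b".dup
modified = !u.gsub!(/-/, "_").nil?
```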

matz.

Glenn Parker · Mar 22, 2005

Yukihiro said:
Alternative plans:

(1) remove bang methods altogether.

Backwards compatibility is lost.

(2) current bang methods are bad because they work in most
cases, and fail if no change has been made. thus let us make bang
methods return something other than self, a boolean for example,
so that chaining always fails. this forces no chaining of bang methods.

Tolerable, but only because it removes the element of surprise.

(3) introduce "real" multiple return values a la Common Lisp, and let
the first returned value be the receiver and the second returned
value be a boolean denoting success/failure.

Ick. Ever wonder why other languages don't do this?

(4) add some kind of reference counting, and if the receiver is
referenced from only one place, modify the receiver in place, to
gain performance.

Oooo, pretty! Seems like it might defeat the goal of optimizing things,
unless the reference counting could reduce the overall impact of garbage
collection.

When I'm playing with simple algorithms, disabling garbage collection
seems to yield a guaranteed 20% speedup, even for relatively short tasks
(< 30 sec.).

Another option:

(5) add alternative destructive methods with different names that
always return self.

This is the change with the least impact, but I realize it is ugly.
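
Option (5) could look roughly like this. The names `strip_!` and `downcase_!` are made up for illustration (nothing in the thread settles on names); each wrapper calls the existing bang method and then returns self unconditionally.

```ruby
class String
  # Hypothetical names: existing bang methods wrapped to always return self.
  def strip_!
    strip!
    self
  end

  def downcase_!
    downcase!
    self
  end
end

s = "hello".dup
result = s.strip_!.downcase_!   # safe to chain even though nothing changes
```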

Daniel Amelang · Mar 22, 2005

Yes, I am liking the proposal less and less as time goes on. And I'm
the one who wrote it!

I *am* liking the "(1) remove bang methods altogether" idea more and
more. It has the positive (minor) side effect of reducing the number
of instance methods for String and Array, which is nice. And of
course, it eliminates the chaining confusion, since bang methods are
no longer an option. Losing the bang methods has also proven to be
only a minor loss in the efficiency category (we could live without
them).

But, this alone doesn't solve the problem with the modification
detection. Is it _that_ common? I've heard only one person complain so
far. Yet, you (matz) seem to be concerned about losing that
functionality, so I'll be concerned too.

Will (2) really solve the problem, or just mitigate it slightly?

(3) is an interesting solution to the detection problem. Perhaps
having real multiple return values will solve some other issues in
Rubyland also (anyone?). Using a modified version of the 'strip'
method, let me illustrate a minor modification in the language that
allows for easier use of multiple return values:

class String
  # The new strip returns both the string result and
  # a flag that some change occurred.
  def strip
    ...
    return result, changed
  end
end

# This works now:
str, changed = "hello ".strip

# But currently, when you only want the string result, you have to do this:
str, _ = "hello ".strip

# Because this:
str = "hello ".strip

# gives you an array

# Why not make the minor change such that this:
*str = "hello ".strip

# gives you the array and this:

str = "hello ".strip

only gives you the _first_ of the multiple return values. That way we can return
multiple return values without requiring the receivers to use _ all the time
to throw away the rest.

I'm done. Thanks for taking the time to consider my proposal.

Dan

David A. Black · Mar 22, 2005

Hi --

(3) is an interesting solution to the detection problem. Perhaps
having real multiple return values will solve some other issues in
Rubyland also (anyone?). Using a modified version of the 'strip'
method, let me illustrate a minor modification in the language that
allows for easier use of multiple return values:

class String
  # The new strip returns both the string result and
  # a flag that some change occurred.
  def strip
    ...
    return result, changed
  end
end

# This works now:
str, changed = "hello ".strip

# But currently, when you only want the string result, you have to do this:
str, _ = "hello ".strip

# Because this:
str = "hello ".strip

# gives you an array

Actually you can just do:

str, = "hello ".strip

to assign the first array element to str. That's a nice construct,
but it would be extremely annoying and feels kind of ad hoc to require it
for every strip operation (not to mention sub, gsub, reverse, etc.)
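
To make the assignment forms concrete, here is how current Ruby treats a method that returns two values. `strip_with_flag` is a hypothetical helper standing in for the modified `strip` discussed above, so the real `String#strip` is left alone.

```ruby
# Hypothetical helper: returns the stripped copy and whether anything changed.
def strip_with_flag(str)
  result = str.strip
  return result, result != str
end

str, changed = strip_with_flag("hello ")   # both values
first, = strip_with_flag("hello ")         # trailing comma: first value only
whole = strip_with_flag("hello ")          # plain assignment: the whole Array
```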

Maybe we need StripData, SubData, etc., like MatchData... (No, not
really.)

David

Navindra Umanee · Mar 22, 2005

Glenn Parker said:
Oooo, pretty! Seems like it might defeat the goal of optimizing things,
unless the reference counting could reduce the overall impact of garbage
collection.

When I'm playing with simple algorithms, disabling garbage collection
seems to yield a guaranteed 20% speedup, even for relatively short tasks
(< 30 sec.).

Interesting observation and benchmark! It might be neat to have
further details on how you are testing this. Or is this measurement
done on an actual app that you have deployed?

I guess it might make sense for a Ruby app to disable GC and invoke it
when really necessary in a speed-critical situation.
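
A rough sketch of that idea, for anyone who wants to measure it themselves. `without_gc` is a hypothetical helper, the workload is an arbitrary stand-in, and no particular speedup is implied; results depend heavily on the Ruby version and on what the program allocates.

```ruby
require "benchmark"

# Hypothetical helper: run the block with GC disabled, then restore GC
# and collect once.
def without_gc
  was_disabled = GC.disable   # true if GC was already disabled
  yield
ensure
  unless was_disabled
    GC.enable
    GC.start
  end
end

# Arbitrary allocation-heavy stand-in workload.
work = -> { 50_000.times.map { |i| "item #{i}".upcase } }

with_gc = Benchmark.realtime { work.call }
without = Benchmark.realtime { without_gc { work.call } }
puts format("GC on: %.3fs, GC off: %.3fs", with_gc, without)
```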

Cheers,
Navin.

Jeremy Tregunna · Mar 22, 2005

Interesting observation and benchmark! It might be neat to have
further details on how you are testing this. Or is this measurement
done on an actual app that you have deployed?

I guess it might make sense for a Ruby app to disable GC and invoke it
when really necessary in a speed-critical situation.

Or maybe it would be wiser to move to an incremental GC, where the
overall impact of the GC is negligible.

Hal Fulton · Mar 22, 2005

Obviously I'm only expressing my opinion here, which often
is nearly worthless:

Yukihiro said:
Alternative plans:

(1) remove bang methods altogether.

"The cradle is too short. Let's cut off the baby's feet."

(2) current bang methods are bad because they work in most
cases, and fail if no change has been made. thus let us make bang
methods return something other than self, a boolean for example,
so that chaining always fails. this forces no chaining of bang methods.

Again, please no. I like chaining.

(3) introduce "real" multiple return values a la Common Lisp, and let
the first returned value be the receiver and the second returned
value be a boolean denoting success/failure.

Interesting, but only if "the common case is the prettier one" and
back compatibility is maintained. I don't want to go sprinkling
commas and asterisks through old code.

(4) add some kind of reference counting, and if the receiver is
referenced from only one place, modify the receiver in place, to
gain performance.

Hmm, does this work? Would there be times it would not be obvious whether we
were changing the original object or not? If so, unacceptable.

I would almost suggest what I once suggested as a joke: Combine the ! and ?
suffixes.

gsub! returns self
gsub!? returns self or nil (yes, it looks silly)

Or alternatively:

gsub! returns self or nil as now
gsub_! returns self (yes, it's ugly)

Or perhaps:

Give the objects a "changed?" flag. Every bang method is expected to
set it...

obj.gsub!(...) # returns self and sets flag
if obj.changed? then... # not thread-friendly, but I could live with it

Is that really an expensive solution?
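
A minimal sketch of the flag idea, using a String subclass so core String stays untouched. `TrackedString` and `changed?` are hypothetical names; only `gsub!` is wrapped here, and since the flag is per-object state it is, as noted, not thread-friendly.

```ruby
class TrackedString < String
  def gsub!(*args, &block)
    result = super            # nil when nothing was replaced
    @changed = !result.nil?
    self                      # always chainable
  end

  # True if the most recent wrapped bang call modified the string.
  def changed?
    @changed ? true : false
  end
end

obj = TrackedString.new("foo bar")
returned = obj.gsub!(/foo/, "baz")
first_flag = obj.changed?
obj.gsub!(/nomatch/, "x")
second_flag = obj.changed?
```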

Hal

Bill Kelly · Mar 22, 2005

Hal Fulton said:
I would almost suggest what I once suggested as a joke: Combine the ! and ?
suffixes.

gsub! returns self
gsub!? returns self or nil (yes, it looks silly)

It looks a little silly, but it seems pretty straightforward
to read and understand: "Oh, it's gsub-bang with a question."

The occasions when I've cared about the flag returned by the
bang methods have typically been associated with some conditional,
like:

save(data) if data.gsub!?(/foo/, bar)

I.e. the ? is not too far from the 'if' so maybe reduces its
silliness slightly by giving it something to relate to visually?

Doesn't seem too heinous to me.

Regards,

Bill

Mathieu Bouchard · Mar 22, 2005

Ick. Ever wonder why other languages don't do this?

Perl shows some cases of LISP-style multiple return values. Especially,
functions can detect whether they're in scalar-context or list-context,
and some of them may return ($a,$b,$c) in list-context while returning
just $a in scalar-context.

The biggest problem with LISP must be that the form is called
(GET-HIDDEN-RETURNED-MULTIPLE-VALUE-DATA-STUFF-USING-LENGTHY-MACRO-NAME)

The most similar thing in Ruby may be that you can do:

a, = foo()
a,b,c = foo()

But there are so many things wrong with it that its usefulness is very
limited. It's too far from being the thing.

Oooo, pretty! Seems like it might defeat the goal of optimizing
things, unless the reference counting could reduce the overall impact
of garbage collection.

There's a reference-counting system that only uses two bits (and so could
fit in the flags section of a Ruby boxed-object) and has four different
values: 1, 2, 3, and "more". This is based on the fact that most objects
have very few references to them. Then, for the "more" case, a
Mark-and-sweep can do the job, and you don't have to run it nearly as
often as if the Mark-and-sweep had to do all the job. However, I'm not
sure it's *that* good. Plus, I don't know what it would do to the
Ruby-FFI. I've seen Python-FFI C code, and INCREF DECREF INCREF DECREF is
about as annoying as Python's tab tab tab self self self () () () or
Ruby's end end end end end. I'd rather just define mark() and free().
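
To make the two-bit scheme easier to picture, here is a toy model in Ruby. It is only my reading of the description above, not how any real interpreter does it: `TwoBitRefCount` is a made-up class, the count saturates at "more", and a saturated count is never decremented (that case is left to mark-and-sweep).

```ruby
# Toy model of a saturating two-bit reference count.
# Encoded values 0..3 stand for 1, 2, 3 and "more" references.
class TwoBitRefCount
  MORE = 3   # saturated: the exact count is unknown from here on

  def initialize
    @bits = 0        # a new object starts with one reference
    @freed = false
  end

  def incref
    @bits += 1 if @bits < MORE
    self
  end

  # Returns true when the last known reference is dropped.
  def decref
    return false if @bits == MORE || @freed   # sticky: left to mark-and-sweep
    if @bits.zero?
      @freed = true
    else
      @bits -= 1
    end
    @freed
  end

  # Exactly one reference: the case where plan (4) could modify in place.
  def unique?
    @bits.zero? && !@freed
  end
end
```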

When I'm playing with simple algorithms, disabling garbage collection
seems to yield a guaranteed 20% speedup, even for relatively short tasks
(< 30 sec.).

At some point there was a big problem in Ruby dealing with large numbers
of objects. I don't recall which version got it fixed but I'm pretty sure
it wasn't there before 1.8.0. If you happen to still run Ruby 1.6.8 for
example, then I think you don't have it. If the total number of objects
got very large, this meant a *tremendous* speedup.

(5) add alternative destructive methods with different names that
always return self.
This is the change with the least impact, but I realize it is ugly.

I know how to solve the problem!!!!!!!!

add support for multiple bangs signs at the end of methods!!!!!!!

like, foo.gsub!!!!!!!!!!!!!!!!!!!

this is so ELiTE!!!!!!!!!!!!!!111

W00t!!!!!!!!!!!!!1

ok, sorry for this one.

_____________________________________________________________________
Mathieu Bouchard -=- Montréal QC Canada -=- http://artengine.ca/matju

Patrick Hurley · Mar 22, 2005

(2) current bang methods are bad because they work most of the

Again, please no. I like chaining.

I am curious: is it not the case that chaining the bang functions is
generally risky? I am sure that there are cases where you know the
input well enough that this is not a problem. But for the vast
majority of cases I always use the "non-bang" method (I like chaining
too); it is safer and the performance is not that bad.

Of the choices I think this one would cause the fewest problems -- it
is a least surprise sort of thing. For new users (of which I am one), it
is often a surprise the first few times a Ruby program dies on an
undefined method for NilClass on a chain. (Yes, I know we should read
the docs better.)

Glenn Parker · Mar 22, 2005

Hal said:
I would almost suggest what I once suggested as a joke: Combine the ! and ?
suffixes.

gsub! returns self
gsub!? returns self or nil (yes, it looks silly)

Not all that silly looking, IMHO.

Malte Milatz · Mar 22, 2005

Christian Neukirchen:

I actually like that.

+1

Malte

Florian Gross · Mar 22, 2005

Daniel said:
# Why not make the minor change such that this:
*str = "hello ".strip

# gives you the array and this:

str = "hello ".strip

only gives you the _first_ of the multiple return values. That way we can return
multiple return values without requiring the receivers to use _ all the time
to throw away the rest.

Of course that would be nice, but with Ruby's current multi-return-value
semantics it is not possible. If the above were to be true then this
would also be true:

a = [1, 2, 3]
a # => 1

And you'll have to agree that that won't work.
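
The underlying point can be checked directly: a method that does `return 1, 2` hands back an ordinary Array, so an assignment cannot tell "multiple values" apart from an Array that was built on purpose. (`pair` and `list` are illustrative names.)

```ruby
def pair
  return 1, 2      # sugar for returning [1, 2]
end

def list
  [1, 2]           # an explicit Array
end

a = pair
b = list
indistinguishable = (a == b && a.class == Array && b.class == Array)
```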

Perhaps it's finally time to introduce a Tuple class?

Ben Giddings · Mar 22, 2005

Yukihiro said:
(2) current bang methods are bad because they work in most
cases, and fail if no change has been made. thus let us make bang
methods return something other than self, a boolean for example,
so that chaining always fails. this forces no chaining of bang methods.

I think this one is the most consistent way of doing things. In a way,
using chaining with receiver-modifying methods doesn't make sense. You
really don't want to modify the return value of the method, you want to
modify the original object. I think the main reason people want to chain
bang methods is that they don't like having to type the variable a lot
of times.

the_config_string_for_foo = $stdin.gets
the_config_string_for_foo.strip!.downcase!

is easier to type than

the_config_string_for_foo = $stdin.gets
the_config_string_for_foo.strip!
the_config_string_for_foo.downcase!

But it seems to me that what people are really looking for is a way to
apply multiple methods to an object without having to retype its name:

the_config_string_for_foo = $stdin.gets
the_config_string_for_foo.apply { strip!; downcase! }

This has been proposed a few times, I think. I can't remember how you
(Matz) felt about it. Is the reason Ruby doesn't have a way to do it
because you don't like the idea, or you were hoping for consensus on the
exact method of doing it?
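
For reference, such an `apply` can already be approximated with `instance_eval`. This is a sketch only: `apply` is not a built-in method, and defining it on Object is done here just to mirror the example above.

```ruby
class Object
  # Hypothetical: evaluate the block with the receiver as self, return self.
  def apply(&block)
    instance_eval(&block)
    self
  end
end

the_config_string_for_foo = "  Some Value \n".dup
returned = the_config_string_for_foo.apply { strip!; downcase! }
```

An alternative that keeps the outer `self` intact is the built-in `tap` with an explicit block parameter, e.g. `s.tap { |x| x.strip!; x.downcase! }`.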

(4) add some kind of reference counting, and if the receiver is
referenced from only one place, modify the receiver in place, to
gain performance.

This sounds like "more work for Matz", but also a really good solution.
Most of the time people only want bang methods for efficiency. If
that weren't an issue then it would be even easier to have the
bang-methods return boolean (modified / not modified) and then people
could choose the method based on what they were after (chaining vs. "was
something changed") and not based on efficiency.

Ben

ES · Mar 22, 2005

On 3/22/2005, Hal Fulton said:
Obviously I'm only expressing my opinion here, which often
is nearly worthless:

"The cradle is too short. Let's cut off the baby's feet."

Again, please no. I like chaining.

Interesting, but only if "the common case is the prettier one" and
back compatibility is maintained. I don't want to go sprinkling
commas and asterisks through old code.

Hmm, does this work? Would there be times it would not be obvious whether we
were changing the original object or not? If so, unacceptable.

I would almost suggest what I once suggested as a joke: Combine the ! and ?
suffixes.

gsub! returns self
gsub!? returns self or nil (yes, it looks silly)

+1 for interrobang!

Or alternatively:

gsub! returns self or nil as now
gsub_! returns self (yes, it's ugly)

Or perhaps:

Give the objects a "changed?" flag. Every bang method is expected to
set it...

obj.gsub!(...) # returns self and sets flag
if obj.changed? then... # not thread-friendly, but I could live with it

Is that really an expensive solution?

Hal

E

Csaba Henk · Mar 22, 2005

Obviously I'm only expressing my opinion here, which often
is nearly worthless:

"The cradle is too short. Let's cut off the baby's feet."

Again, please no. I like chaining.

Interesting, but only if "the common case is the prettier one" and
back compatibility is maintained. I don't want to go sprinkling
commas and asterisks through old code.

Hmm, does this work? Would there be times it would not be obvious whether we
were changing the original object or not? If so, unacceptable.

I would almost suggest what I once suggested as a joke: Combine the ! and ?
suffixes.

gsub! returns self
gsub!? returns self or nil (yes, it looks silly)

I also have a proposal which I thought to be half a joke, but since then
I realized I quite like it:

As we have now that upon writing

obj()

a call method is invoked implicitly if obj is a local var, why not
have

obj{}

do an implicit call to instance_eval? It would make sense, of course,
only if you let it happen not only for local vars (which is a limitation
for the "obj()" stuff and I don't clearly see why).

So then you could do:

"aaa" { sub! /a/, "b"; chop! }

You couldn't do

"aaa" { sub! /a/, @b }

but I guess that's not the hottest case. Usually when you feel like
chaining destructive methods you use literals, don't you?
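
The limitation with `@b` comes from `instance_eval` changing `self` inside the block. Here is a sketch using today's `instance_eval` (the proposed `obj{}` syntax does not exist); `Holder` is a made-up class.

```ruby
class Holder
  def initialize
    @b = "b"
  end

  # Inside instance_eval, @b is looked up on the string, not on the Holder,
  # so the block sees an unset instance variable.
  def ivar_seen_by_block
    "aaa".dup.instance_eval { @b }
  end

  # Workaround: copy the value into a local, which the block closes over.
  def substitute
    b = @b
    "aaa".dup.instance_eval { sub!(/a/, b); chop!; self }
  end
end
```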

Csaba

ES · Mar 22, 2005

I think this one is the most consistent way of doing things. In a way,
using chaining with receiver-modifying methods doesn't make sense. You
really don't want to modify the return value of the method, you want to
modify the original object.

That doesn't make any sense unless you mean that if I want 'hello' to
become 'Hello', these are really two objects instead of the latter
being a modification of the first? But even in that case, would you
say that modifying class Test's state should also create a new object?

I think the main reason people want to chain
bang methods is that they don't like having to type the variable a lot
of times.

the_config_string_for_foo = $stdin.gets
the_config_string_for_foo.strip!.downcase!

is easier to type than

the_config_string_for_foo = $stdin.gets
the_config_string_for_foo.strip!
the_config_string_for_foo.downcase!

But it seems to me that what people are really looking for is a way to
apply multiple methods to an object without having to retype its name:

the_config_string_for_foo = $stdin.gets
the_config_string_for_foo.apply { strip!; downcase! }

This has been proposed a few times, I think. I can't remember how you
(Matz) felt about it. Is the reason Ruby doesn't have a way to do it
because you don't like the idea, or you were hoping for consensus on the
exact method of doing it?

This sounds like "more work for Matz", but also a really good solution.
Most of the time people only want bang methods for efficiency. If
that weren't an issue then it would be even easier to have the
bang-methods return boolean (modified / not modified) and then people
could choose the method based on what they were after (chaining vs. "was
something changed") and not based on efficiency.

Ben

E

Yukihiro Matsumoto · Mar 22, 2005

Hi,

In message "Re: RCR 296: Destructive methods return self"

|But it seems to me that what people are really looking for is a way to
|apply multiple methods to an object without having to retype its name:
|
|the_config_string_for_foo = $stdin.gets
|the_config_string_for_foo.apply { strip!; downcase! }
|
|This has been proposed a few times, I think. I can't remember how you
|(Matz) felt about it. Is the reason Ruby doesn't have a way to do it
|because you don't like the idea, or you were hoping for consensus on the
|exact method of doing it?

I want some syntactical support if cascading method calls have to be
introduced, not a method (except for instance_eval, of course).

matz.
