#map, #select semantics

J

James Coglan

[Note: parts of this message were removed to make it a legal post.]

I imagine this has come up before, though I can't find anything about it. I
was talking to a friend last night who mentioned that in Smalltalk, the
#collect and #select methods (and some others) return the same type as the
object they're called on, and in Ruby they always return an Array, no matter
what the receiver type is. This seems like a good idea to me, though there
are some questions:

1. What do they mean when applied to a Hash? Do you want a list of values,
or keys, or pairs? Or do you want to map each key and value to a different
key and value, and have #map build a Hash for you? If so, how do you
represent that as a return value? Is it [key, value] or {key => value} ?

2. Is part of the contract of #map that you get back a collection of the
same size as the input? If so, what happens if you #map a Set and produce
duplicates? The output will be smaller than the input (similarly for hash
key collisions). Certainly from an FP viewpoint, a map is a one-to-one
transform from input to output values, so it would seem logical to have as
many output values as input values.

3. There is no uniform method for adding items to a collection. We have
Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to the
collection would involve changing pointers on existing members of the
collection. Is it possible to come up with a single method/message whose
role is to provide a uniform way to add something to any collection?

These are pretty much of the top of my head, but I'd be very interested in
hearing discussion of pros and cons, and any history that shows why Ruby
differs from Smalltalk. I also heard that Ruby 2.0 will "correct" this
behaviour -- is this true? Any discussion and/or links to further reading
welcome.

P.S. This was raised through a discussion of JS.Class[1], which implements
Enumerable, Hash and Set. The Enumerable methods return native JavaScript
arrays, which don't have the Enumerable methods -- this causes some
frustration and breaks the Ruby-like model the library is designed to
support.

[1] http://jsclass.jcoglan.com
 
D

David A. Black

Hi --

I imagine this has come up before, though I can't find anything about it. I
was talking to a friend last night who mentioned that in Smalltalk, the
#collect and #select methods (and some others) return the same type as the
object they're called on, and in Ruby they always return an Array, no matter
what the receiver type is. This seems like a good idea to me, though there
are some questions:

I'm one of the few, I think, who prefer to get everything back in an
array. I like the idea of a uniform structure for result sets from
Enumerable operations. Among other things, it's impossible for it to
work universally the other way, so it's always going to be "return an
array, except for a few special cases."
1. What do they mean when applied to a Hash? Do you want a list of values,
or keys, or pairs? Or do you want to map each key and value to a different
key and value, and have #map build a Hash for you? If so, how do you
represent that as a return value? Is it [key, value] or {key => value} ?

Hash#select and #reject return hashes in 1.9:
=> {3=>4, 5=>6}

#map returns an array. I don't think it can be otherwise, since you're
only returning one value from the code block. Of course it would be
possible to write something like this:

module Enumerable
def map2hash
res = {}
each do |k,v|
res[k] = yield(v)
end
res
end
end
2. Is part of the contract of #map that you get back a collection of the
same size as the input? If so, what happens if you #map a Set and produce
duplicates? The output will be smaller than the input (similarly for hash
key collisions). Certainly from an FP viewpoint, a map is a one-to-one
transform from input to output values, so it would seem logical to have as
many output values as input values.

I think any quasi-mappish operation that departs from real map
semantics would have to be a different method. Mapping an enumerable
through a function to an array is just too basic and too useful not to
be available.
3. There is no uniform method for adding items to a collection. We have
Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to the
collection would involve changing pointers on existing members of the
collection. Is it possible to come up with a single method/message whose
role is to provide a uniform way to add something to any collection?

You could try -- it should be implementable in Ruby, at least for
proof of concept.


David

--
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Q: What's the best way to get a really solid knowledge of Ruby?
A: Come to our Ruby training in Edison, New Jersey, September 14-17!
Instructors: David A. Black and Erik Kastner
More info and registration: http://rubyurl.com/vmzN
 
R

Rick DeNatale

I imagine this has come up before, though I can't find anything about it. I
was talking to a friend last night who mentioned that in Smalltalk, the
#collect and #select methods (and some others) return the same type as the
object they're called on, and in Ruby they always return an Array, no matter
what the receiver type is. This seems like a good idea to me, though there
are some questions:

I'm not sure what you mean is a good idea, the Smalltalk way, the Ruby
way, or that there's a difference?

Actually, it's not completely true that in Smalltalk collect and
select return the same class (and in both Ruby and Smalltalk classes
are not types), as the receiver. In Smalltalk they return an instance
of the species of the receiver. The abstract Collection class
implements collect and select which calls self species to get the
class to instantiate the result.

The collect method then enumerates the receiver and adds each element
to the result using the add: method. Select pretty much is the same
except that adds the element conditional on what the block evaluation
returns.

The abstract implementation of species is just self class, but it is
overridden when that's not appropriate. For example the Interval class
(ST's rough equivalent of Range) overrides species to return Array.

I'm going to reorder your questions a bit.
2. Is part of the contract of #map that you get back a collection of the
same size as the input? If so, what happens if you #map a Set and produce
duplicates? The output will be smaller than the input (similarly for hash
key collisions). Certainly from an FP viewpoint, a map is a one-to-one
transform from input to output values, so it would seem logical to have as
many output values as input values.

Not in Smalltalk. If you send collect to a Set, you'll get back a Set
of the results of evaluating the block for each element of the
receiver, which means that duplicates will be removed. So the result
Set may be smaller than the original.
1. What do they mean when applied to a Hash? Do you want a list of values,
or keys, or pairs? Or do you want to map each key and value to a different
key and value, and have #map build a Hash for you? If so, how do you
represent that as a return value? Is it [key, value] or {key => value} ?


3. There is no uniform method for adding items to a collection. We have
Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to the
collection would involve changing pointers on existing members of the
collection. Is it possible to come up with a single method/message whose
role is to provide a uniform way to add something to any collection?

Smalltalk implements Dictionary (which is the rough equivalent of
Ruby's Hash) as a subclass of Set. A Smalltalk Dictionary is a Set of
Associations. An instance of Association represents a key-value pair,
and two Associations are considered equal if the keys are equal. When
you enumerate a Dictionary in Smalltalk with the do: method (the rough
equivalent of #each), the values of the associations are enumerated.
There's also an associationsDo: method which enumerates the
associations.

In Smalltalk there is a uniform method to add an item to a collection
add: So to add an item to a Dictionary you add an association.

So what about collect/select for Dictionaries.

In Smalltalk, Dictionary overrides collect: to return an instance of
Bag holding the values in the Dictionary, a Bag is just what it sounds
like, it's a bag of objects which is unordered and may have
duplicates.

Dictionary also overrides select: to use associationsDo: instead of
do: it returns an instance of the species of the receiver (a
Dictionary for Dictionary instances, but maybe something else for a
subclass) after adding each association value from evaluating the
block argument for each association in the receiver. So in this case
you'd get back a Dictionary or something like a Dictionary.

All of this is based on "Blue-Book" Smalltalk-80, there might be some
slight variation in more modern Smalltalk implementations/dialects.

Now, is the Ruby or Smalltalk approach 'better' I guess if I had my
druthers, I guess I differ a bit from my friend David A. Black here,
and I'd prefer it if Ruby were a bit more Smalltalk like here, and
Ruby 1.9 has made some steps in that direction.

But it is what it is.

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
 
R

Rick DeNatale

Hash#select and #reject return hashes in 1.9:


=3D> {1=3D>2, 3=3D>4, 5=3D>6}

=3D> {3=3D>4, 5=3D>6}

#map returns an array. I don't think it can be otherwise, since you're
only returning one value from the code block.

Both of these are more like Smalltalk. As I pointed out in another
posting to this thread, in the second case Smalltalk would return a
Bag of values from the receiver.

I honestly can't recall ever using map with a Dictionary in Smalltalk,
or with a Hash in Ruby for that matter.
Of course it would be
possible to write something like this:

module Enumerable
=A0def map2hash
=A0 =A0res =3D {}
=A0 =A0each do |k,v|
=A0 =A0 =A0res[k] =3D yield(v)
=A0 =A0end
=A0 =A0res
=A0end
end

It's trickier coming up with a map for Hash which enumerates over the
key,value pairs and allows the block to return a new key,value pair,
what do you do if there are duplicate keys?

Smalltalk finessed that question.
I think any quasi-mappish operation that departs from real map
semantics would have to be a different method. Mapping an enumerable
through a function to an array is just too basic and too useful not to
be available.


I think there are two sides to that. If you think of the world of
collections primarily as arrays, then this makes sense.

OTOH, if you think of collections more abstractly, then having map on
a Set return a Set which might be smaller because of duplication in
the results of the block is useful as well, and if you want the array
behavior you can make an Array from the set ant then map that.
3. There is no uniform method for adding items to a collection. We have
Array#<<, Hash#[]=3D, Set#add and Set#add?. For linked lists, adding to = the
collection would involve changing pointers on existing members of the
collection. Is it possible to come up with a single method/message whose
role is to provide a uniform way to add something to any collection?

You could try -- it should be implementable in Ruby, at least for
proof of concept.

The difficulty is that for some Ruby classes, I'm thinking of Hash in
particular here, it's not clear what an element is, in Smalltalk every
collection is a collection of elements, with Dictionary elements being
key-value associations.


--=20
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
 
D

David A. Black

Hi --

Now, is the Ruby or Smalltalk approach 'better' I guess if I had my
druthers, I guess I differ a bit from my friend David A. Black here,
and I'd prefer it if Ruby were a bit more Smalltalk like here, and
Ruby 1.9 has made some steps in that direction.

But it is what it is.

The problem is that it isn't what it is :) Maybe it's just Hash
that's special-cased. It's certainly impossible to return, say, a file
handle from doing fh.map, meaningless to return a range from
range.map (though range.map is sort of hybrid behavior anyway), and
maybe or maybe not meaningful for other classes, including
user-defined ones, to return an object of the same class.

I guess I like the simplicity of defining each, including Enumerable,
and getting pre-defined behavior. I know that Hash and Array already
override a few things, though, so it's never been quite that simple.
Still, given the underlying mechanism, there's no way to generalize
returning a same-class object.


David

--
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Q: What's the best way to get a really solid knowledge of Ruby?
A: Come to our Ruby training in Edison, New Jersey, September 14-17!
Instructors: David A. Black and Erik Kastner
More info and registration: http://rubyurl.com/vmzN
 
D

David A. Black

Hi --

I think there are two sides to that. If you think of the world of
collections primarily as arrays, then this makes sense.

I wouldn't extrapolate a world-view from it :) None of this has to
be winner-take-all; there's room for all of these methods to exist. I
just like having a foundational map operation that returns a result
set in an array. I'm not ruling out other kinds of functionality, nor
claiming that non-array collections are actually arrays.

For example:

class Hash
def map2 # or whatever
h = {}
each do |k,v|
k2, v2 = yield(k,v)
h[k2] = v2
end
h
end
end

h = {1 => 2, 3 => 4 }

p h.map2 {|k,v| [k*10, v-1] }

# => {30=>3, 10=>1}

Nothing wrong with having such a method. I just don't think it's a
candidate to replace Hash#map.


David

--
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Q: What's the best way to get a really solid knowledge of Ruby?
A: Come to our Ruby training in Edison, New Jersey, September 14-17!
Instructors: David A. Black and Erik Kastner
More info and registration: http://rubyurl.com/vmzN
 
7

7rans

I guess I like the simplicity of defining each, including Enumerable,
and getting pre-defined behavior. I know that Hash and Array already
override a few things, though, so it's never been quite that simple.
Still, given the underlying mechanism, there's no way to generalize
returning a same-class object.

Sure there is --or at least it can be made complete predictable. Just
define a special class method, eg.

class Array
def self.enumerable_instance; []; end
end

class Hash
def self.enumerable_instance; {}; end
end

Then a uniform "add". Eg. #<< (btw I love Hash#<<) and the whole thing
can be generalized.

This has been discussed before, but for whatever reason the idea has
been disregarded. I suspect efficiency hacks have come to out way any
of these nice "uniformalizaions".

T.
 
7

7rans

I don't mean that *something* can't be defined for every class; I mean
specifically that it will never be the case that every Enumerable
class will return an instance of itself from its iterations.

That's true. And we would never really want it to. There are many
classes we may make Enumerable just to have the conveniences for some
internal property. For example, I once defined an #each method for
ObjectSpace, and made it enumerable. I sure wouldn't want a new
ObjectSpace returned! ;-)

I think people who suggest the idea, don't really mean exactly what
they think they do. That is to say, they see Set and Hash enumerable
methods returning Arrays and think, why not Sets and Hashes? Then they
take the mental leap to the idea that enumerables ought to return an
instance of the originating class. But on further reflection, they
would realize that the ideal solution is a predictable and
controllable system for determining the return container.

It's a pretty powerful idea. It could even be designed to allow the
container to be changed on the fly. Let's say we have

[[:a,1], [:b,2]].map{ |(k,v)| [k, v+1] } #=3D> [[:a,2], [:b,3]]

We might have something like:

[[:a,1], [:b,2]].using({}).map{ |(k,v)| [k, v+1] } #=3D>
{:a=3D>2, :b=3D>3}

--
 
R

Rick DeNatale

Hi,

In message "Re: #map, #select semantics"
=A0 =A0on Sat, 29 Aug 2009 21:32:21 +0900, 7rans <[email protected]> wr= ites:

|Let's say we have
|
| =A0[[:a,1], [:b,2]].map{ |(k,v)| [k, v+1] } =A0#=3D> [[:a,2], [:b,3]]
|
|We might have something like:
|
| =A0[[:a,1], [:b,2]].using({}).map{ |(k,v)| [k, v+1] } =A0#=3D> {:a=3D>2= , :b=3D>3}

[[:a,1], [:b,2]].inject({}){|h,(k,v)| h[k]=3Dv+1; h} =A0 =A0 #=3D> {:a=3D=
2, :b=3D>3}

+1

Proving that inside every map there's an inject struggling to break free! <=
G>

Seriously, although my Smalltalk background gives me a philosophical
preference for having ruby enumeration work more like Smalltalk, from
a pragmatic perspective, I'd prefer it if changes wouldn't break my
existing code.


--=20
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
 
D

David A. Black

No, but it wouldnt hurt you! <G>

I would say: no, *and* it wouldn't hurt me :) I'd be happy to know
more about Smalltalk, and I strongly suspect I wouldn't want it to be
more like Ruby.


David

--
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Ruby/Rails training, mentoring, consulting, code-review
Latest book: The Well-Grounded Rubyist (http://www.manning.com/black2)

September Ruby training in NJ has been POSTPONED. Details to follow.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,743
Messages
2,569,478
Members
44,898
Latest member
BlairH7607

Latest Threads

Top