Mode method for Array

Glenn

[Note: parts of this message were removed to make it a legal post.]

Hi,

I'd like to write a get_mode method for the Array class. The method would return an array of the most frequently occurring element or elements.

So [3, 1, 1, 55, 55].get_mode would return [1, 55].

I have a way to do this but I don't know if it's the best way. I was wondering if anyone had any suggestions?

Thanks!
 
Eustáquio 'TaQ' Rangel

I'd like to write a get_mode method for the Array class. The method would return an array of the most frequently occurring element or elements.
So [3, 1, 1, 55, 55].get_mode would return [1, 55].
I have a way to do this but I don't know if it's the best way. I was wondering if anyone had any suggestions?

What is your way? Knowing it would give us some idea of what criteria you are
using to find the most frequent elements. Using something like

irb(main):001:0> [3,1,1,55,55].inject(Hash.new(0)){|memo,item| memo[item] += 1; memo}.sort_by {|e| e[1]}.reverse
=> [[55, 2], [1, 2], [3, 1]]

can return you some elements ordered by frequency.
 
Trans

Hi,

I'd like to write a get_mode method for the Array class. The method would return an array of the most frequently occurring element or elements.
So [3, 1, 1, 55, 55].get_mode would return [1, 55].

I have a way to do this but I don't know if it's the best way. I was wondering if anyone had any suggestions?

Facets has:

module Enumerable

  # In Statistics mode is the value that occurs most
  # frequently in a given set of data.

  def mode
    count = Hash.new(0)
    each {|x| count[x] += 1 }
    count.sort_by{|k,v| v}.last[0]
  end

end

Hmm... but that ignores ties: only one of the tied values is returned. I'll have to consider how to fix it.

T.
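
For later readers, a minimal sketch of a tie-aware variant. It is not the Facets implementation; `modes` is a hypothetical standalone helper, and it assumes Ruby 1.9+ (ordered hashes, and `Hash#select` returning a Hash).

```ruby
# Hypothetical helper, not part of Facets: returns every value tied for the
# highest count, in order of first appearance (relies on Ruby 1.9+ hashes).
def modes(enum)
  counts = Hash.new(0)
  enum.each { |x| counts[x] += 1 }
  return [] if counts.empty?

  top = counts.values.max
  counts.select { |_, n| n == top }.keys
end

p modes([3, 1, 1, 55, 55])  # prints [1, 55]
p modes([])                 # prints []
```

Returning an array even when there is a single winner keeps the return type the same for tied and untied input.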
 
Erik Veenstra

Using Enumerable#cluster_by (already defined in Facets):

module Enumerable
  def mode
    cluster_by do |element|
      element
    end.cluster_by do |cluster|
      cluster.length
    end.last.ergo do |clusters|
      clusters.transpose.first
    end # || []
  end
end

gegroet,
Erik V.
 
Erik Veenstra

There's one more problem with your code: [].mode doesn't work.

gegroet,
Erik V.
 
Brian Candler

Shame that the standard Hash#invert doesn't handle duplicate values
well. My suggestion:

class Hash
  def ninvert
    inject({}) { |h,(k,v)| (h[v] ||= []) << k; h }
  end
end

class Array
  def get_mode
    (inject(Hash.new(0)) { |h,e| h[e] += 1; h }.ninvert.max || [[]]).last
  end
end

p [3, 1, 1, 55, 55].get_mode
p [3, 1, 1, 55].get_mode
p [:foo, 3, "bar", :foo, 4, "bar"].get_mode
p [].get_mode

(with ruby 1.8, if there are multiple mode values you get them in an
arbitrary order; I think with 1.9 you'd get them in the order first seen
in the original array)
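
The same "invert, keeping duplicates" idea can be sketched with methods from later Ruby versions. This assumes Ruby 2.7+ for `Enumerable#tally`; `get_modes` is a hypothetical name, written as a plain method instead of a patch to Array.

```ruby
# Assumes Ruby 2.7+ (Enumerable#tally). get_modes is a hypothetical name.
def get_modes(list)
  tally = list.tally                                # element => count
  by_count = tally.keys.group_by { |k| tally[k] }   # count => [elements]
  # keys.max is nil for an empty list, so fetch falls back to [].
  by_count.fetch(by_count.keys.max, [])
end

p get_modes([3, 1, 1, 55, 55])
p get_modes([:foo, 3, "bar", :foo, 4, "bar"])
p get_modes([])
```

Because hashes keep insertion order from 1.9 onwards, tied values come back in the order they first appear in the input.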
 
Erik Veenstra

And since we all love speed, we tend to avoid inject. (For
those who don't know: inject and inject! are really, really
slow. I mean, _really_ slow...)

Speed is also the reason for "pre-defining" variables used in
iterations. (6% faster!!)

For low level methods like these, speed is much more important
than readability. And the inject versions of the methods below
aren't even more readable than the faster implementations.

So, we'll go for the fast ones:

module Enumerable
  def mode
    empty? ? [] : frequencies.group_by_value.max.last
  end

  def frequencies
    x = nil
    res = Hash.new(0)
    each{|x| res[x] += 1}
    res
  end
end

class Hash
  def group_by_value
    k = v = nil
    res = {}
    each{|k, v| (res[v] ||= []) << k}
    res
  end
end

gegroet,
Erik V. - http://www.erikveen.dds.nl/
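
The speed claims above are easy to check on your own interpreter. Here is a rough timing sketch; the helper names (`freq_inject`, `freq_each`, `seconds`) and the data size are assumptions made for illustration, and the numbers will differ between Ruby versions and machines.

```ruby
# Two ways of building the same frequency table.
def freq_inject(list)
  list.inject(Hash.new(0)) { |h, x| h[x] += 1; h }
end

def freq_each(list)
  counts = Hash.new(0)
  list.each { |x| counts[x] += 1 }
  counts
end

# Hypothetical wall-clock helper; good enough for a rough comparison only.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

data = Array.new(200_000) { |i| i % 1000 }
t_inject = seconds { freq_inject(data) }
t_each   = seconds { freq_each(data) }
printf("inject: %.4fs  each: %.4fs\n", t_inject, t_each)
```

A single run like this is noisy, so repeat it a few times before drawing conclusions.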
 
Robert Klemme

2008/10/1 Erik Veenstra said:
Speed is also the reason for "pre-defining" variables used in
iterations. (6% faster!!)

Premature optimization IMHO.

Here's another nice and short one:

irb(main):001:0> module Enumerable
irb(main):002:1> def mode
irb(main):003:2> max = 0
irb(main):004:2> c = Hash.new 0
irb(main):005:2> each {|x| cc = c[x] += 1; max = cc if cc > max}
irb(main):006:2> c.select {|k,v| v == max}.map {|k,v| k}
irb(main):007:2> end
irb(main):008:1> end
=> nil
irb(main):009:0> [3, 1, 1, 55, 55].mode
=> [55, 1]
irb(main):010:0> [].mode
=> []
irb(main):011:0>

Cheers

robert
 
David A. Black

Hi --

And since we all love speed, we tend to avoid inject. (For
those who don't know: inject and inject! are really, really
slow. I mean, _really_ slow...)

What's inject! ?


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.rubypal.com for details and updates!
 
David A. Black

Hi --

Premature optimization IMHO.

As much as I like inject, I have to say I've always felt that the ones
that look like this:

inject({}) {|h,item| do_something; h }

are kind of unidiomatic. Evan Phoenix was saying recently on IRC (I
hope I'm remembering/quoting correctly) that his rule of thumb was
that inject was for cases where the accumulator was not the same
object every time, and that where a single object is having elements
added to it, an each iteration from the source collection was better.
I tend to agree, though I'm not able to come up with a very technical
rationale.

What say you, oh inject king?


David

 
Brian Candler

David said:
What say you, oh inject king?

I don't know who that is, but I'll add my 2c anyway:
As much as I like inject, I have to say I've always felt that the ones
that look like this:

inject({}) {|h,item| do_something; h }

are kind of unidiomatic.

I agree; 'inject' is ideally for when you're creating a new data
structure each iteration rather than modifying an existing one. You
could do

inject({}) {|h,item| h.merge(something => otherthing)}

but that creates lots of waste.

I only used it as a convenient holder for the target object. Maybe
there's a more ruby-ish pattern where the target is the same each time
round, although I don't know what you'd call it:

module Enumerable
  def into(obj)
    each { |e| yield obj, e }
    obj
  end
end

src = {:foo=>1, :bar=>1, :baz=>2}
p src.into({}) { |tgt,(k,v)| (tgt[v] ||= []) << k }

There was also a previous suggestion of generalising map so that it
would build into an arbitrary object, not just an array.

module Enumerable
  def map2(target = [])
    each { |e| target << (yield e) }
    target
  end
end

p [1,2,3].map2 { |e| e * 2 }

class Hash
  def <<(x)
    self[x[0]] = x[1]
  end
end

p [1,2,3].map2({}) { |e| [e, e * 2] }

That would allow any target which implements :<<, so map to $stdout
would be fine.

It's not so useful here, since we'd need a :<< method suitable for hash
inversion. And I suppose for completeness, you'd need a wrapper class
analogous to Enumerator to map :<< to an arbitrary method name...
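
For comparison, Ruby 1.9 and later ship `Enumerable#each_with_object`, which covers much the same ground as the `into` helper above, with the block arguments in the opposite order (element first, accumulator second). A short sketch using the same sample hash:

```ruby
src = { foo: 1, bar: 1, baz: 2 }

# each_with_object hands the same accumulator to every iteration and returns
# it at the end, so the block's own return value is ignored.
inverted = src.each_with_object({}) { |(k, v), tgt| (tgt[v] ||= []) << k }

p inverted[1]  # prints [:foo, :bar]
p inverted[2]  # prints [:baz]
```

No trailing `; h` is needed in the block, which is the part of the inject version that looked unidiomatic.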
 
Robert Klemme

2008/10/1 David A. Black said:
Hi --



As much as I like inject, I have to say I've always felt that the ones
that look like this:

inject({}) {|h,item| do_something; h }

are kind of unidiomatic. Evan Phoenix was saying recently on IRC (I
hope I'm remembering/quoting correctly) that his rule of thumb was
that inject was for cases where the accumulator was not the same
object every time, and that where a single object is having elements
added to it, an each iteration from the source collection was better.

In that case #map might be more appropriate - at least if the target
collection is an Array. Btw, did we ever discuss having #map accept a
parameter which defaults to []? i.e.

module Enumerable
  def map(target = [])
    # yield needs parentheses here; "target << yield x" does not parse
    each {|x| target << yield(x)}
    target
  end
end

I tend to agree, though I'm not able to come up with a very technical
rationale.

What say you, oh inject king?

Um..., I kind of agree about the unidiomaticness. It's ugly. These
are certainly much nicer

inject(0) {|h,item| item + h }
inject("") {|s,item| s << item }

I have to admit I use it sparingly these days. :)

Kind regards

robert
 
David A. Black

Hi --

2008/10/1 David A. Black said:
Hi --



As much as I like inject, I have to say I've always felt that the ones
that look like this:

inject({}) {|h,item| do_something; h }

are kind of unidiomatic. Evan Phoenix was saying recently on IRC (I
hope I'm remembering/quoting correctly) that his rule of thumb was
that inject was for cases where the accumulator was not the same
object every time, and that where a single object is having elements
added to it, an each iteration from the source collection was better.

In that case #map might be more appropriate - at least if the target
collection is an Array. Btw, did we ever discuss having #map accept a
parameter which defaults to []? i.e.

module Enumerable
def map(target = [])
each {|x| target << yield x}
target
end
end

I don't think it's a good idea; it generalizes the idea of a mapping
of a collection beyond anything that really seems to me to be a
mapping. If I saw:

[1,2,3,4,5] => "Hi."

I would not just say it's a weird mapping; I would not be able to
identify it as a mapping at all. It's a more general transformation.


David

 
Robert Klemme

2008/10/1 David A. Black said:
In that case #map might be more appropriate - at least if the target
collection is an Array. Btw, did we ever discuss having #map accept a
parameter which defaults to []? i.e.

module Enumerable
def map(target = [])
each {|x| target << yield x}
target
end
end

I don't think it's a good idea; it generalizes the idea of a mapping
of a collection beyond anything that really seems to me to be a
mapping. If I saw:

[1,2,3,4,5] => "Hi."

I would not just say it's a weird mapping; I would not be able to
identify it as a mapping at all. It's a more general transformation.

I'm not sure I understand exactly what you mean. My point was simply
to allow the caller to provide the target collection where something
is mapped to. Of course you can then also use a String or anything
else that supports << (a stream for example) which I find quite neat.
But where does "=>" syntax come into play?

Kind regards

robert
 
David A. Black

Hi --

2008/10/1 David A. Black said:
In that case #map might be more appropriate - at least if the target
collection is an Array. Btw, did we ever discuss having #map accept a
parameter which defaults to []? i.e.

module Enumerable
def map(target = [])
each {|x| target << yield x}
target
end
end

I don't think it's a good idea; it generalizes the idea of a mapping
of a collection beyond anything that really seems to me to be a
mapping. If I saw:

[1,2,3,4,5] => "Hi."

I would not just say it's a weird mapping; I would not be able to
identify it as a mapping at all. It's a more general transformation.

I'm not sure I understand exactly what you mean. My point was simply
to allow the caller to provide the target collection where something
is mapped to. Of course you can then also use a String or anything
else that supports << (a stream for example) which I find quite neat.
But where does "=>" syntax come into play?

It doesn't; I'm just using it to separate the collection and a
potential "mapping".

[1,2,3,4,5].map("") {...} # Result: "Hi."

I don't think that << correctly represents the concept of mapping, so
I would not want map to be generalized to any <<-capable target
object. It's more a <<'ing, or something, than a mapping. It happens
that the current behavior of map can be implemented using << and an
empty array, but I don't think that means that << per se is at the
heart of mapping.


David

 
Robert Klemme

It doesn't; I'm just using it to separate the collection and a
potential "mapping".

[1,2,3,4,5].map("") {...} # Result: "Hi."

I don't think that << correctly represents the concept of mapping, so
I would not want map to be generalized to any <<-capable target
object. It's more a <<'ing, or something, than a mapping. It happens
that the current behavior of map can be implemented using << and an
empty array, but I don't think that means that << per se is at the
heart of mapping.

Ah, now I get your point. Thanks for elaborating. So you'd rather call
such a method #append or similar.

Kind regards

robert
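
To make the idea under discussion concrete, here is a sketch of such a method kept separate from #map. The name `collect_into` is an assumption (the thread only floats "#append or similar"), and it is written as a plain method, not as a patch to Enumerable. It assumes Ruby 2.3+ for the `+""` unfrozen string literal.

```ruby
require "stringio"

# Hypothetical helper (the name is an assumption, not a core method): feeds
# each block result to any target that responds to <<, then returns the target.
def collect_into(source, target)
  source.each { |x| target << yield(x) }
  target
end

doubled = collect_into([1, 2, 3], []) { |e| e * 2 }              # Array target
joined  = collect_into([1, 2, 3], +"") { |e| e.to_s }            # String target
stream  = collect_into([1, 2, 3], StringIO.new) { |e| "#{e}\n" } # IO-like target

p doubled        # prints [2, 4, 6]
p joined         # prints "123"
p stream.string  # prints "1\n2\n3\n"
```

With a String or stream target the result is no longer element-for-element, which is the reason given above for not calling it a mapping.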
 
David A. Black

It doesn't; I'm just using it to separate the collection and a
potential "mapping".

[1,2,3,4,5].map("") {...} # Result: "Hi."

I don't think that << correctly represents the concept of mapping, so
I would not want map to be generalized to any <<-capable target
object. It's more a <<'ing, or something, than a mapping. It happens
that the current behavior of map can be implemented using << and an
empty array, but I don't think that means that << per se is at the
heart of mapping.

Ah, now I get your point. Thanks for elaborating. So you'd rather call such
a method #append or similar.

I'm pretty happy just iterating and using <<. I'm not sure there's a
need to wrap it in another method.


David

 
Todd Benson

And since we all love speed, we tend to avoid inject. (For
those who don't know: inject and inject! are really, really
slow. I mean, _really_ slow...)

I use inject for conceptual reasons. You can always refactor for speed later.

Todd
 
Trans

Hi --



2008/10/1 David A. Black said:
In that case #map might be more appropriate - at least if the target
collection is an Array. Btw, did we ever discuss having #map accept a
parameter which defaults to []? i.e.

module Enumerable
  def map(target = [])
    each {|x| target << yield x}
    target
  end
end

I don't think it's a good idea; it generalizes the idea of a mapping
of a collection beyond anything that really seems to me to be a
mapping. If I saw:

[1,2,3,4,5] => "Hi."

I would not just say it's a weird mapping; I would not be able to
identify it as a mapping at all. It's a more general transformation.

I'm not sure I understand exactly what you mean. My point was simply
to allow the caller to provide the target collection where something
is mapped to. Of course you can then also use a String or anything
else that supports << (a stream for example) which I find quite neat.
But where does "=>" syntax come into play?

It doesn't; I'm just using it to separate the collection and a
potential "mapping".

[1,2,3,4,5].map("") {...} # Result: "Hi."

I don't think that << correctly represents the concept of mapping, so
I would not want map to be generalized to any <<-capable target
object. It's more a <<'ing, or something, than a mapping. It happens
that the current behavior of map can be implemented using << and an
empty array, but I don't think that means that << per se is at the
heart of mapping.

Why not? Building up a collection requires some means of "building
up", and #<< is that means. Standardizing around that method allows
for a more comprehensible and flexible system. Is it a semantic thing
for you? Would it help to think of #collect, instead of #map?

In any case, Robert's idea had to do with providing an initial
collection with which to build, whether #<< is used to do that or not.

T.
 
