count : array to array of hashes

  • Thread starter Jean-Sébastien

Jean-Sébastien

Hi,
I'm trying to count distinct ids in an array (array_of_ids =
[1,2,3,1,2]), and I want the result to be an array of hashes like this:
result = [{:id=>1,:count=>2},{:id=>2,:count=>2},{:id=>3,:count=>1}]

I tried the following, but it always raises an error:
result = array_of_ids.inject(Array.new){ |a,x| elt = a.find{|h| h[:id] == x} || {:id => x, :count => 0}; elt[:count] += 1; elt}

I based it on this post:
http://groups.google.com/group/comp...nk=gst&q=array++count&rnum=8#04a7b10da8195cec

Could someone help?

Regards
 

Robert Dober

irb(main):005:0> a=[1,2,3,42,2,2,3]
=> [1, 2, 3, 42, 2, 2, 3]
irb(main):006:0> a.inject([]){|r,ele| r[ele][:count]+=1 rescue r[ele]={:id => ele, :count=>1}; r}.compact
=> [{:count=>1, :id=>1}, {:count=>3, :id=>2}, {:count=>2, :id=>3}, {:count=>1, :id=>42}]
irb(main):007:0>

HTH
Robert



--
[...] as simple as possible, but no simpler.
-- Attributed to Albert Einstein
 

Stefano Crocco

On Wednesday 1 August 2007, Jean-Sébastien wrote:
Hi,
I'm trying to count distinct ids in an array (array_of_ids =
[1,2,3,1,2]), and I want the result to be an array of hashes like this:
result = [{:id=>1,:count=>2},{:id=>2,:count=>2},{:id=>3,:count=>1}]

I tried the following, but it always raises an error:
result = array_of_ids.inject(Array.new){ |a,x| elt = a.find{|h| h[:id] == x} || {:id => x, :count => 0}; elt[:count] += 1; elt}

I based it on this post:
http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/3122212f6db8d7f5/04a7b10da8195cec?lnk=gst&q=array++count&rnum=8#04a7b10da8195cec

Could someone help?

Regards

There are two problems in your code:
1- In the inject block, you return elt, which is (or would be, if the code
worked) the hash for the id being processed. You should return a, that is,
the array which contains the hashes. Correcting this should give a piece of
code which executes without errors, but which returns an empty array.
2- You never insert the hashes you create inside the inject block into the
array a: you only store them in the local variable elt, which is discarded
after each iteration. The inject block should be:

result = array_of_ids.inject(Array.new){ |a,x|
  elt = ( a.find{|h| h[:id] == x} || a[a.size] = {:id => x, :count => 0} )
  elt[:count] += 1
  a
}

As you can see, after the || operator a new hash is created and inserted at
the end of a (that is, at index a.size). Since an assignment always returns
the value being assigned (this is why I didn't use <<, which returns the
array, not the inserted element), elt is then set to the new hash. Of
course, all this happens only if find returns nil.
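
To make the point about return values concrete, here is a small sketch. It first contrasts a[a.size] = h with a << h, then runs the corrected block on the sample array from the question. The names h, via_assignment, via_shovel and acc are made up for this illustration, and the extra parentheses around the assignment are only there for readability.

```ruby
# An index assignment evaluates to the assigned value; << evaluates to the array.
a = []
h = { :id => 1, :count => 0 }
via_assignment = (a[a.size] = h)   # the hash that was just stored
via_shovel     = (a << h)          # the array itself, not the element

puts via_assignment.equal?(h)      # same object as h
puts via_shovel.equal?(a)          # same object as a

# The corrected block, run on the sample ids from the question.
array_of_ids = [1, 2, 3, 1, 2]
result = array_of_ids.inject(Array.new) { |acc, x|
  # Reuse the hash for x if present, otherwise append a new one and keep it.
  elt = (acc.find { |e| e[:id] == x } || (acc[acc.size] = { :id => x, :count => 0 }))
  elt[:count] += 1
  acc                              # inject needs the accumulator back
}
p result
```

For the sample input this produces one hash per id, in order of first appearance, with counts 2, 2 and 1.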

If you can rely on the ids being positive integers, and don't care whether
the resulting array contains the hashes in the same order as the ids appear
in the array, here's another approach you can consider:

result = array_of_ids.inject([]) do |res, i|
  res[i-1] ||= {:id => i, :count => 0}
  res[i-1][:count] += 1
  res
end

This code stores the data for id i at position i-1 in the array (the -1 is
there to avoid a nil element at the beginning). This should make it faster,
since you don't need to iterate over the whole array to check whether the
data for an id is already there: either it is at position id - 1 or it
isn't there. A quick benchmark, done by creating an array of 100_000 ids
with values randomly chosen between 1 and 11, gives:

                           user     system      total        real
original approach      5.730000   1.490000   7.220000 (  7.310224)
alternative approach   1.250000   0.150000   1.400000 (  1.416871)

Changing the range of the ids from 11 to 101 gives:
                           user     system      total        real
alternative approach   1.270000   0.190000   1.460000 (  1.472353)
original approach     37.730000  11.360000  49.090000 ( 51.056527)

Increasing it to 1001 gives:
                           user     system      total        real
alternative approach   1.500000   0.220000   1.720000 (  1.733568)

The original approach takes much longer (I didn't have the patience to wait
for it to complete).
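
For anyone who wants to repeat the comparison, a benchmark along these lines could look roughly like the sketch below. The script actually used for the numbers above was not posted, so this is only a guess at its shape: the method names count_with_find and count_by_index, the max_id variable and the use of rand(1..max_id) to generate the ids are all assumptions. Timings will differ by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical wrapper around the corrected find-based block.
def count_with_find(ids)
  ids.inject([]) do |acc, x|
    elt = acc.find { |h| h[:id] == x } || (acc[acc.size] = { :id => x, :count => 0 })
    elt[:count] += 1
    acc
  end
end

# Hypothetical wrapper around the index-based block (ids must be positive integers).
def count_by_index(ids)
  ids.inject([]) do |res, i|
    res[i - 1] ||= { :id => i, :count => 0 }
    res[i - 1][:count] += 1
    res
  end
end

# Assumed setup: 100_000 ids drawn at random from 1..max_id.
max_id = 11
ids = Array.new(100_000) { rand(1..max_id) }

Benchmark.bm(20) do |bm|
  bm.report('original approach')    { count_with_find(ids) }
  bm.report('alternative approach') { count_by_index(ids) }
end
```

Raising max_id to 101 or 1001 reproduces the other two scenarios. One thing to keep in mind with the index-based version: if some ids never occur, the corresponding slots stay nil, so calling compact on the result may be needed.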

I hope this helps

Stefano
 
