Array and hash iteration questions

Ben Giddings

I have a CSV file and I'm trying to do a few things with it. Essentially
what it boils down to is: count the number of times a certain value is
seen, then count the number of times another value is seen in conjunction
with the first one.

I'm iterating over the lines of the file and splitting each one into an array
with arr = line.split(/,/). That part works well, but I have a few
questions about how to do the rest efficiently.

In order to count the number of times something is seen, I took the approach:

cases = Hash.new(0)
...
cases[arr[324]] += 1
...

But now I want to save the number of cases where another value occurs with
the first one. (Essentially errors indexed by case)

The approach I have now is:

cases = Hash.new(0)
errors = Hash.new(0)
...
ca = arr[324]   # named ca because case is a reserved word in Ruby
cases[ca] += 1
if arr[532] =~ /Error/
  errors[ca] += 1
end
...

That works, but it seems to me that I really should be doing this with one
hash, not two. Any suggestions?

Next, I want to print out the values. It is easy to do this with
cases.each, but I'd like to print them out, sorted by case. The best
solution I have so far uses cases.keys.sort.each, then inside the block
uses cases[key] (and errors[key]).

Any ideas would be appreciated.

Ben
 
Robert Klemme

Ben Giddings said:
That works, but it seems to me that I really should be doing this with one
hash, not two. Any suggestions?

cases = Hash.new {|h,k| h[k] = [0, 0]}
...
ca = arr[324]
counter = cases[ca]
counter[0] += 1
counter[1] += 1 if /Error/ =~ arr[532]

Next, I want to print out the values. It is easy to do this with
cases.each, but I'd like to print them out, sorted by case. The best
solution I have so far uses cases.keys.sort.each, then inside the block
uses cases[key] (and errors[key]).

cases.sort.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end
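
For reference, here is a minimal self-contained sketch of the single-hash approach above. The sample lines and the column positions (case_col, status_col) are made up for illustration and stand in for arr[324] and arr[532]; report_lines is likewise a hypothetical helper name, not something from the original script.

```ruby
# Hypothetical column positions, standing in for 324 and 532 in the real file.
case_col   = 0
status_col = 2

# Made-up sample data: case name, some other field, status text.
lines = [
  "beta,x,OK",
  "alpha,x,Error: timeout",
  "beta,x,Error: refused",
  "alpha,x,OK",
  "alpha,x,OK",
  "gamma,x,OK"
]

# Block form: each new key gets its own [count, errors] array stored in the hash.
cases = Hash.new { |h, k| h[k] = [0, 0] }

lines.each do |line|
  arr = line.split(/,/)
  counter = cases[arr[case_col]]   # one hash lookup, then plain array access
  counter[0] += 1
  counter[1] += 1 if /Error/ =~ arr[status_col]
end

# Hash#sort returns an array of [key, value] pairs ordered by key.
def report_lines(cases)
  cases.sort.map do |ca, counter|
    line = format("%10s: %4d", ca, counter[0])
    line += format(" %4d", counter[1]) if counter[1] > 0
    line
  end
end

puts report_lines(cases)
```

Building the report as an array of strings first keeps the formatting separate from the printing, which makes it easier to check.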

Regards

robert
 
Ben Giddings

Robert said:
cases = Hash.new {|h,k| h[k] = [0, 0]}

Ah. I couldn't remember how to use the block form properly. I'm actually
going to use:

cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}

Because it will make some of the later stuff clearer, like:

cases[ca]['Number'] += 1   # ca rather than case, which is a reserved word in Ruby
cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/

Robert said:
cases.sort.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end

Aha, I just assumed hash didn't have a sort method, because the concept of
a "sorted hash" seemed meaningless, but since it actually returns an array
containing [key, value] pairs, that's perfect!
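
To illustrate the point about Hash#sort, here is a small sketch using the nested-hash counters described above. The case names and tallies are invented for illustration.

```ruby
# Nested counters: the outer block form gives every case its own Hash.new(0).
cases = Hash.new { |hash, key| hash[key] = Hash.new(0) }

# Made-up tallies, inserted out of order on purpose.
cases["delta"]["Number"] += 3
cases["alpha"]["Number"] += 2
cases["alpha"]["Errors"] += 1

# Hash#sort converts the hash to an array of [key, value] pairs and sorts
# that array, so the hash itself is left untouched.
sorted = cases.sort

sorted.each do |ca, counter|
  # A counter that was never incremented reads as 0 thanks to Hash.new(0).
  printf "%10s: %4d %4d\n", ca, counter["Number"], counter["Errors"]
end

# sort_by allows other orderings, e.g. most frequent case first.
by_count = cases.sort_by { |_ca, counter| -counter["Number"] }
```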

Thanks Robert

Ben
 
Robert Klemme

Ben Giddings said:
Robert said:
cases = Hash.new {|h,k| h[k] = [0, 0]}

Ah. I couldn't remember how to use the block form properly. I'm actually
going to use:

cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}

Because it will make some of the later stuff more clear like

cases[ca]['Number'] += 1   # ca rather than case, which is a reserved word in Ruby
cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/

No need to use a Hash for this...

Number = 0
Errors = 1

cases[ca][Number] += 1   # ca rather than case, which is a reserved word in Ruby
cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

I might be a bit picky, but storing the array ref saves one hash lookup.
It *can* affect performance if you have a large number of cases... (see
below; although the timing is dominated by the iteration here, you can see
that the array is faster)

counters = cases[ca]
counters[Number] += 1
counters[Errors] += 1 if arr[OFFSET] =~ /Error/

You could as well do

cases[ca].instance_eval do
  self[Number] += 1
  self[Errors] += 1 if arr[OFFSET] =~ /Error/
end

I'm getting carried away... :)
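
A short sketch of why holding on to the array reference works, and why the block form of Hash.new matters for it. Plain local variables (number, errors) are used here in place of the Number and Errors constants, and the keys are made up for illustration.

```ruby
number = 0   # index of the total count (Number in the post above)
errors = 1   # index of the error count (Errors in the post above)

# Block form: the block runs once per missing key and stores a fresh array.
cases = Hash.new { |h, k| h[k] = [0, 0] }

counters = cases["alpha"]          # single hash lookup
counters[number] += 1              # from here on, plain array indexing
counters[errors] += 1

# The reference points at the very array stored in the hash.
same_object = counters.equal?(cases["alpha"])

# Contrast: a default *object* is shared by every missing key and is never
# stored under the key, so counts from different cases run together.
shared = Hash.new([0, 0])
shared["alpha"][number] += 1
shared["beta"][number] += 1
```

With the shared default, both increments land in the same array and the hash stays empty, which is why the block form is the one to use for per-key counters.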

Aha, I just assumed hash didn't have a sort method, because the concept of
a "sorted hash" seemed meaningless, but since it actually returns an array
containing [key, value] pairs, that's perfect!

It is! Thanks to Matz's wisdom.

Thanks Robert

You're welcome.

Kind regards

robert


10:17:02 [ruby]: ruby -rprofile lookups.rb
     % cumulative    self              self    total
  time  seconds   seconds    calls  ms/call  ms/call  name
 62.50    13.93     13.93        2  6962.50 11140.50  Integer#upto
 26.22    19.77      5.84   100001     0.06     0.06  Hash#[]
 11.28    22.28      2.51   100001     0.03     0.03  Array#[]
  0.07    22.30      0.01        1    15.00    15.00  Profiler__.start_profile
  0.00    22.30      0.00        2     0.00 11140.50  Object#test
  0.00    22.30      0.00        3     0.00     0.00  Module#method_added
  0.00    22.30      0.00        1     0.00 11171.00  Object#testArray
  0.00    22.30      0.00        1     0.00 22281.00  #toplevel
  0.00    22.30      0.00        1     0.00 11110.00  Object#testHash
10:17:25 [ruby]: cat lookups.rb


# Look up index/key 2 repeatedly on whatever collection is passed in.
def test(coll)
  0.upto( 100000 ) do
    coll[2]
  end
end

def testHash
  test( { 0 => 0, 1 => 1, 2 => 2 } )
end

def testArray
  test( [0, 1, 2] )
end

testHash
testArray

10:18:15 [ruby]:
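
The same comparison can be sketched with the standard library's Benchmark module, which times a block as a whole instead of instrumenting every method call the way the profiler does. The iteration count and the time_lookups helper name are arbitrary choices for this sketch, and timings vary by machine and Ruby version, so no figures are shown.

```ruby
require 'benchmark'

# Hypothetical helper: time n lookups of index/key 2 on any collection
# that responds to []. Returns the elapsed real time in seconds.
def time_lookups(coll, n)
  Benchmark.realtime do
    n.times { coll[2] }
  end
end

n = 100_000
hash_time  = time_lookups({ 0 => 0, 1 => 1, 2 => 2 }, n)
array_time = time_lookups([0, 1, 2], n)

printf "Hash#[]:  %.4f s\n", hash_time
printf "Array#[]: %.4f s\n", array_time
```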
 
Robert Klemme

Robert Klemme said:
No need to use a Hash for this...

Number = 0
Errors = 1

cases[ca][Number] += 1   # ca rather than case, which is a reserved word in Ruby
cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

I might be a bit picky, but storing the array ref saves one hash lookup.

It *can* affect performance if you have a large number of cases... (see
below; although the timing is dominated by the iteration here, you can see
that the array is faster)

This sentence should really have appeared several lines above: it's the
argument in favour of using arrays instead of hashes for the counters.

Regards

robert
 
Alan Chen

Robert Klemme said:
No need to use a Hash for this...

Number = 0
Errors = 1

cases[ca][Number] += 1   # ca rather than case, which is a reserved word in Ruby
cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

I might be a bit picky, but storing the array ref saves one hash lookup.
It *can* affect performance if you have a large number of cases... (see
below; although the timing is dominated by the iteration here, you can see
that the array is faster)

I'm not sure if my testing method is quite consistent, but making a specific
record object looks like it could speed things up even more...

ruby -rprofile lookups.rb
     % cumulative    self              self    total
  time  seconds   seconds    calls  ms/call  ms/call  name
 73.74    13.08     13.08        3  4359.00  5911.67  Integer#upto
 14.47    15.64      2.57   100001     0.03     0.03  Hash#[]
 11.79    17.73      2.09   100001     0.02     0.02  Array#[]
  0.08    17.75      0.01        1    15.00    15.00  Profiler__.start_profile
  0.00    17.75      0.00        1     0.00 17735.00  #toplevel
  0.00    17.75      0.00        1     0.00     0.00  Class#inherited
  0.00    17.75      0.00        1     0.00  1329.00  Object#testObj
  0.00    17.75      0.00        2     0.00  8203.00  Object#test
  0.00    17.75      0.00        1     0.00     0.00  TestObj#initialize
  0.00    17.75      0.00        1     0.00  8203.00  Object#testArray
  0.00    17.75      0.00        9     0.00     0.00  Module#method_added
  0.00    17.75      0.00        1     0.00  8203.00  Object#testHash
  0.00    17.75      0.00        1     0.00     0.00  Module#attr_accessor
  0.00    17.75      0.00        1     0.00     0.00  Class#new

type lookups.rb
def test(coll)
  0.upto( 100000 ) do
    coll[2]
  end
end

def testHash
  test( { 0 => 0, 1 => 1, 2 => 2 } )
end

def testArray
  test( [0, 1, 2] )
end


# a simple record class...
class TestObj
  attr_accessor :num, :err
  def initialize
    @num = 0
    @err = 0
  end
end

def testObj
  to = TestObj.new
  0.upto( 100000 ) do
    to.err
  end
end

testHash
testArray
testObj
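
A record object like TestObj can also be declared with Ruby's built-in Struct, which generates the accessors and still supports index-style access. The sketch below is an illustration only; the counter_class name and the sample keys are assumptions and not part of the benchmark above.

```ruby
# Struct.new builds a small class with num/err accessors.
counter_class = Struct.new(:num, :err)

# Every new case gets its own record starting at zero.
cases = Hash.new { |h, k| h[k] = counter_class.new(0, 0) }

rec = cases["alpha"]   # keep the reference, as with the array version
rec.num += 1
rec.err += 1
cases["beta"].num += 1

cases.sort.each do |ca, r|
  printf "%10s: %4d %4d\n", ca, r.num, r.err
end
```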
 
