Array and hash iteration questions

Discussion in 'Ruby' started by Ben Giddings, Sep 30, 2003.

  1. Ben Giddings

    I have a CSV file and I'm trying to do a few things with it. Essentially
    what it boils down to is: count the number of times a certain value is
    seen, then count the number of times another value is seen in conjunction
    with the first one.

    I'm iterating over the lines of the file, and splitting them into an array
    with arr = line.split(/,/). That part works well, but there are a few
    questions about how to do something efficiently.
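Splitting with `line.split(/,/)` works as long as no field contains a quoted comma. The sketch below uses Ruby's standard `csv` library, which honours quoted fields. The sample line is invented for illustration.

```ruby
require 'csv'

# Invented sample line: the second field contains a comma inside quotes.
line = 'alpha,"beta, gamma",Error: timeout'

naive  = line.split(/,/)        # also splits inside the quoted field
parsed = CSV.parse_line(line)   # treats "beta, gamma" as a single field

p naive.size    # 4
p parsed        # ["alpha", "beta, gamma", "Error: timeout"]
```

To read a whole file row by row, `CSV.foreach(path) { |arr| ... }` yields each parsed row as an array. `path` is a placeholder.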

    In order to count the number of times something is seen, I took the approach:

    cases = Hash.new(0)
    ...
    cases[arr[324]] += 1
    ...
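`Hash.new(0)` makes a missing key read as 0, so `+= 1` works the first time a value is seen. Below is a small runnable version of that counting step. The rows are hypothetical, and column 0 stands in for `arr[324]`.

```ruby
# Hypothetical rows; column 0 stands in for arr[324] in the original.
rows = [
  ["disk", "ok"],
  ["net",  "Error: timeout"],
  ["disk", "Error: full"],
  ["disk", "ok"],
]

cases = Hash.new(0)          # missing keys read as 0 (nothing is stored on read)
rows.each do |arr|
  cases[arr[0]] += 1         # expands to cases[k] = cases[k] + 1
end

# Ruby 2.7 and later can produce the same counts with Enumerable#tally.
tallied = rows.map { |arr| arr[0] }.tally
```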

    But now I want to save the number of cases where another value occurs with
    the first one. (Essentially errors indexed by case)

    The approach I have now is:

    cases = Hash.new(0)
    errors = Hash.new(0)
    ...
    ca = arr[324]
    cases[ca] += 1
    if arr[532] =~ /Error/
      errors[ca] += 1
    end
    ...

    That works, but it seems to me that I really should be doing this with one
    hash, not two. Any suggestions?

    Next, I want to print out the values. It is easy to do this with
    cases.each, but I'd like to print them out, sorted by case. The best
    solution I have so far uses cases.keys.sort.each, then inside the block
    uses cases[key] (and errors[key]).

    Any ideas would be appreciated.

    Ben
     
    Ben Giddings, Sep 30, 2003

  2. Robert Klemme

    "Ben Giddings" <> wrote in message
    news:...
    > I have a CSV file and I'm trying to do a few things with it. Essentially
    > what it boils down to is: count the number of times a certain value is
    > seen, then count the number of times another value is seen in conjunction
    > with the first one.
    >
    > I'm iterating over the lines of the file, and splitting them into an array
    > with arr = line.split(/,/). That part works well, but there are a few
    > questions about how to do something efficiently.
    >
    > In order to count the number of times something is seen, I took the approach:
    >
    > cases = Hash.new(0)
    > ...
    > cases[arr[324]] += 1
    > ...
    >
    > But now I want to save the number of cases where another value occurs with
    > the first one. (Essentially errors indexed by case)
    >
    > The approach I have now is:
    >
    > cases = Hash.new(0)
    > errors = Hash.new(0)
    > ...
    > ca = arr[324]
    > cases[ca] += 1
    > if arr[532] =~ /Error/
    >   errors[ca] += 1
    > end
    > ...
    >
    > That works, but it seems to me that I really should be doing this with one
    > hash, not two. Any suggestions?


    cases = Hash.new {|h,k| h[k] = [0, 0]}
    ...
    ca = arr[324]
    counter = cases[ca]
    counter[0] += 1

    counter[1] += 1 if /Error/ =~ arr[532]
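The block form matters here. `Hash.new([0, 0])` would return the *same* array for every missing key and would never store it. The block form creates and stores a fresh `[0, 0]` per key. The sketch below shows the difference; the keys are arbitrary examples.

```ruby
# Pitfall: a single shared default array, and no keys are ever stored.
shared = Hash.new([0, 0])
shared["a"][0] += 1   # mutates the one default array
shared["b"][0] += 1   # ...and the same array again
p shared.size         # 0
p shared["zzz"]       # [2, 0]: every missing key sees the mutated default

# Block form: a new [0, 0] is created and stored for each new key.
cases = Hash.new { |h, k| h[k] = [0, 0] }
cases["a"][0] += 1
cases["b"][0] += 1
cases["b"][1] += 1
```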

    > Next, I want to print out the values. It is easy to do this with
    > cases.each, but I'd like to print them out, sorted by case. The best
    > solution I have so far uses cases.keys.sort.each, then inside the block
    > uses cases[key] (and errors[key]).


    cases.sort.each do |ca, counter|
      printf "%10s: %4d", ca, counter[0]
      printf " %4d", counter[1] if counter[1] > 0
      print "\n"
    end
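The same report can be built as an array of strings with `format` instead of printing directly, which makes the layout easy to check. This sketch assumes Robert's `key => [count, errors]` layout; the sample data is hypothetical.

```ruby
# Hypothetical counters in the layout key => [count, errors].
cases = { "net" => [1, 1], "disk" => [3, 1], "cpu" => [2, 0] }

# Hash#sort (from Enumerable) yields [key, value] pairs ordered by key.
lines = cases.sort.map do |ca, counter|
  line = format("%10s: %4d", ca, counter[0])
  line << format(" %4d", counter[1]) if counter[1] > 0
  line
end

puts lines
```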

    Regards

    robert
     
    Robert Klemme, Oct 1, 2003

  3. Ben Giddings

    Robert Klemme wrote:
    > cases = Hash.new {|h,k| h[k] = [0, 0]}


    Ah. I couldn't remember how to use the block form properly. I'm actually
    going to use:

    cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}

    Because it will make some of the later stuff more clear like

    cases[ca]['Number'] += 1
    cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/
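With nested `Hash.new(0)` defaults, every case gets its own inner hash, and any counter name reads as 0 until it is incremented. Below is a runnable sketch of Ben's variant. The rows and `OFFSET = 1` are hypothetical stand-ins for the real CSV columns.

```ruby
# Hypothetical rows of [case, status]; OFFSET stands in for the real status column.
OFFSET = 1
rows = [
  ["disk", "ok"],
  ["net",  "Error: timeout"],
  ["disk", "Error: full"],
  ["cpu",  "ok"],
]

# Each new case gets its own inner Hash.new(0), so any counter name starts at 0.
cases = Hash.new { |hash, key| hash[key] = Hash.new(0) }

rows.each do |arr|
  ca = arr[0]
  cases[ca]['Number'] += 1
  cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/
end

# "cpu" never stored an 'Errors' key, but reading it still gives 0.
cases.sort.each do |ca, c|
  printf "%-5s %d %d\n", ca, c['Number'], c['Errors']
end
```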

    > cases.sort.each do |ca, counter|
    > printf "%10s: %4d", ca, counter[0]
    > printf " %4d", counter[1] if counter[1] > 0
    > print "\n"
    > end


    Aha, I just assumed hash didn't have a sort method, because the concept of
    a "sorted hash" seemed meaningless, but since it actually returns an array
    containing [key, value] pairs, that's perfect!
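As Ben observes, `Hash#sort` returns an Array of `[key, value]` pairs. The same pairs can be ordered by any criterion with `sort_by`, for example most errors first. The data below is hypothetical.

```ruby
cases = {
  "net"  => { "Number" => 1, "Errors" => 1 },
  "disk" => { "Number" => 3, "Errors" => 2 },
  "cpu"  => { "Number" => 2, "Errors" => 0 },
}

by_key = cases.sort                 # Array of [key, value], ordered by key
p by_key.class                      # Array

# Highest error count first (sort_by is not guaranteed stable for ties).
by_errors = cases.sort_by { |_ca, c| -c["Errors"] }
p by_errors.map(&:first)            # ["disk", "net", "cpu"]
```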

    Thanks Robert

    Ben
     
    Ben Giddings, Oct 1, 2003
  4. Robert Klemme

    "Ben Giddings" <> wrote in message
    news:...
    > Robert Klemme wrote:
    > > cases = Hash.new {|h,k| h[k] = [0, 0]}
    >
    > Ah. I couldn't remember how to use the block form properly. I'm actually
    > going to use:
    >
    > cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}
    >
    > Because it will make some of the later stuff more clear like
    >
    > cases[ca]['Number'] += 1
    > cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/


    No need to use a Hash for this...

    Number = 0
    Errors = 1

    cases[ca][Number] += 1
    cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

    I might be a bit picky, but storing the array ref saves one hash lookup.
    It *can* affect performance if you have a large number of cases... (see
    below; although the timing is dominated by the iteration here, you can see
    that the array is faster)

    counters = cases[ca]
    counters[Number] += 1
    counters[Errors] += 1 if arr[OFFSET] =~ /Error/
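Putting Robert's pieces together: array counters, named index constants, and one cached reference per row. In this sketch the rows and `OFFSET = 1` are hypothetical stand-ins for the real columns.

```ruby
# Index constants naming the two slots of each counter array.
Number = 0
Errors = 1
OFFSET = 1   # hypothetical column holding the status text

rows = [["disk", "ok"], ["net", "Error: timeout"], ["disk", "Error: full"]]
cases = Hash.new { |h, k| h[k] = [0, 0] }

rows.each do |arr|
  counters = cases[arr[0]]   # one hash lookup per row, then array indexing
  counters[Number] += 1
  counters[Errors] += 1 if arr[OFFSET] =~ /Error/
end
```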

    You could also do

    cases[ca].instance_eval do
      self[Number] += 1
      self[Errors] += 1 if arr[OFFSET] =~ /Error/
    end
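Inside `instance_eval`, `self` is the counter array, so `self[Number]` indexes it. Local variables such as `arr` remain visible because the block is a closure. Below is a one-row sketch; the row and `OFFSET = 1` are hypothetical.

```ruby
Number = 0
Errors = 1
OFFSET = 1   # hypothetical status column

cases = Hash.new { |h, k| h[k] = [0, 0] }
arr = ["net", "Error: timeout"]   # hypothetical row

cases[arr[0]].instance_eval do
  # self is the [0, 0] array stored for "net"; arr comes from the enclosing scope.
  self[Number] += 1
  self[Errors] += 1 if arr[OFFSET] =~ /Error/
end
```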

    I'm getting carried away... :)

    > > cases.sort.each do |ca, counter|
    > > printf "%10s: %4d", ca, counter[0]
    > > printf " %4d", counter[1] if counter[1] > 0
    > > print "\n"
    > > end
    >
    > Aha, I just assumed hash didn't have a sort method, because the concept of
    > a "sorted hash" seemed meaningless, but since it actually returns an array
    > containing [key, value] pairs, that's perfect!


    It is! Thanks to Matz's wisdom.

    > Thanks Robert


    You're welcome.

    Kind regards

    robert


    10:17:02 [ruby]: ruby -rprofile lookups.rb
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ms/call  ms/call  name
     62.50    13.93     13.93        2  6962.50 11140.50  Integer#upto
     26.22    19.77      5.84   100001     0.06     0.06  Hash#[]
     11.28    22.28      2.51   100001     0.03     0.03  Array#[]
      0.07    22.30      0.01        1    15.00    15.00  Profiler__.start_profile
      0.00    22.30      0.00        2     0.00 11140.50  Object#test
      0.00    22.30      0.00        3     0.00     0.00  Module#method_added
      0.00    22.30      0.00        1     0.00 11171.00  Object#testArray
      0.00    22.30      0.00        1     0.00 22281.00  #toplevel
      0.00    22.30      0.00        1     0.00 11110.00  Object#testHash
    10:17:25 [ruby]: cat lookups.rb


    def test(coll)
      0.upto( 100000 ) do
        coll[2]
      end
    end

    def testHash
      test( { 0 => 0, 1 => 1, 2 => 2 } )
    end

    def testArray
      test( [0, 1, 2] )
    end

    testHash
    testArray

    10:18:15 [ruby]:
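The `-rprofile` numbers include the profiler's own per-call overhead. As Robert notes, the iteration dominates. Below is a sketch that times the same two lookups with the standard `Benchmark` module instead. The iteration count `N` is an assumption, and no particular timings are claimed; results depend on machine and Ruby version.

```ruby
require 'benchmark'

N = 1_000_000   # assumed iteration count, larger than the original 100000

hash  = { 0 => 0, 1 => 1, 2 => 2 }
array = [0, 1, 2]

# Benchmark.realtime returns elapsed wall-clock seconds as a Float.
# Both loops pay the same N.times block overhead, so only the lookup differs.
t_hash  = Benchmark.realtime { N.times { hash[2] } }
t_array = Benchmark.realtime { N.times { array[2] } }

printf "Hash#[]:  %.3fs\nArray#[]: %.3fs\n", t_hash, t_array
```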
     
    Robert Klemme, Oct 2, 2003
  5. Robert Klemme

    "Robert Klemme" <> wrote in message
    news:blgp2a$bvnb8$-berlin.de...
    >
    > "Ben Giddings" <> wrote in message
    > news:...
    > > Robert Klemme wrote:
    > > > cases = Hash.new {|h,k| h[k] = [0, 0]}
    > >
    > > Ah. I couldn't remember how to use the block form properly. I'm actually
    > > going to use:
    > >
    > > cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}
    > >
    > > Because it will make some of the later stuff more clear like
    > >
    > > cases[ca]['Number'] += 1
    > > cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/
    >
    > No need to use a Hash for this...
    >
    > Number = 0
    > Errors = 1
    >
    > cases[ca][Number] += 1
    > cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/
    >
    > I might be a bit picky, but storing the array ref saves one hash lookup.
    > It *can* affect performance if you have a large number of cases... (see
    > below; although the timing is dominated by the iteration here, you can see
    > that the array is faster)


    This sentence should really have appeared several lines above: it's the
    argument in favour of using arrays instead of hashes for the counters.

    Regards

    robert
     
    Robert Klemme, Oct 2, 2003
  6. Alan Chen

    "Robert Klemme" <> wrote in message news:<blgp2a$bvnb8$-berlin.de>...
    > No need to use a Hash for this...
    >
    > Number = 0
    > Errors = 1
    >
    > cases[ca][Number] += 1
    > cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/
    >
    > I might be a bit picky, but storing the array ref saves one hash lookup.
    > It *can* affect performance if you have a large number of cases... (see
    > below; although the timing is dominated by the iteration here, you can see
    > that the array is faster)


    I'm not sure if my testing method is quite consistent, but making a specific
    record object looks like it could speed things up even more...

    >ruby -rprofile lookups.rb

      %   cumulative   self              self     total
     time   seconds   seconds    calls  ms/call  ms/call  name
     73.74    13.08     13.08        3  4359.00  5911.67  Integer#upto
     14.47    15.64      2.57   100001     0.03     0.03  Hash#[]
     11.79    17.73      2.09   100001     0.02     0.02  Array#[]
      0.08    17.75      0.01        1    15.00    15.00  Profiler__.start_profile
      0.00    17.75      0.00        1     0.00 17735.00  #toplevel
      0.00    17.75      0.00        1     0.00     0.00  Class#inherited
      0.00    17.75      0.00        1     0.00  1329.00  Object#testObj
      0.00    17.75      0.00        2     0.00  8203.00  Object#test
      0.00    17.75      0.00        1     0.00     0.00  TestObj#initialize
      0.00    17.75      0.00        1     0.00  8203.00  Object#testArray
      0.00    17.75      0.00        9     0.00     0.00  Module#method_added
      0.00    17.75      0.00        1     0.00  8203.00  Object#testHash
      0.00    17.75      0.00        1     0.00     0.00  Module#attr_accessor
      0.00    17.75      0.00        1     0.00     0.00  Class#new
    >type lookups.rb

    def test(coll)
      0.upto( 100000 ) do
        coll[2]
      end
    end

    def testHash
      test( { 0 => 0, 1 => 1, 2 => 2 } )
    end

    def testArray
      test( [0, 1, 2] )
    end


    # a simple record class...
    class TestObj
      attr_accessor :num, :err
      def initialize
        @num = 0
        @err = 0
      end
    end

    def testObj
      to = TestObj.new
      0.upto( 100000 ) do
        to.err
      end
    end

    testHash
    testArray
    testObj
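Alan's record idea can also be plugged into the counting loop itself. The sketch below swaps in Ruby's built-in `Struct` as a shortcut for a hand-written class like `TestObj`. Whether it is faster than `TestObj` is not measured here. `Counter`, `OFFSET` and the rows are hypothetical names and data.

```ruby
# Hypothetical counter record; Struct generates num/num=/err/err= accessors.
Counter = Struct.new(:num, :err)

OFFSET = 1   # hypothetical status column
rows = [["disk", "ok"], ["net", "Error: timeout"], ["disk", "Error: full"]]

cases = Hash.new { |h, k| h[k] = Counter.new(0, 0) }

rows.each do |arr|
  c = cases[arr[0]]
  c.num += 1
  c.err += 1 if arr[OFFSET] =~ /Error/
end

cases.sort.each do |ca, c|
  printf "%10s: %4d %4d\n", ca, c.num, c.err
end
```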

    > 10:17:02 [ruby]: ruby -rprofile lookups.rb
    >   %   cumulative   self              self     total
    >  time   seconds   seconds    calls  ms/call  ms/call  name
    >  62.50    13.93     13.93        2  6962.50 11140.50  Integer#upto
    >  26.22    19.77      5.84   100001     0.06     0.06  Hash#[]
    >  11.28    22.28      2.51   100001     0.03     0.03  Array#[]
    >   0.07    22.30      0.01        1    15.00    15.00  Profiler__.start_profile
    >   0.00    22.30      0.00        2     0.00 11140.50  Object#test
    >   0.00    22.30      0.00        3     0.00     0.00  Module#method_added
    >   0.00    22.30      0.00        1     0.00 11171.00  Object#testArray
    >   0.00    22.30      0.00        1     0.00 22281.00  #toplevel
    >   0.00    22.30      0.00        1     0.00 11110.00  Object#testHash
     
    Alan Chen, Oct 2, 2003