Premature Optimization

Discussion in 'Ruby' started by poopdeville@gmail.com, Oct 25, 2006.

  1. Guest

    Hi everybody,

    I'm writing a fairly open-ended question. I'm hoping for suggestions,
    opinions, advice. Suppose I have n arrays, each of which has m
    entries. m is a fairly large integer, on the order of 10,000. Each
    entry is either 1 or 0.

    The first task I need to accomplish is figuring out how many times a 1
    occurs in the ith entry in an array. So for concreteness, if I had
    arrays:

    first = [1,0,0,0,0]
    second = [1,1,0,0,0]
    third = [0,0,0,1,0]

    I would end up with
    count = [2,1,0,1,0]

    I'm just trying to give the general flavor of what I'm working on. I
    know I can use some simple each_with_index loops to increment
    count[index] (something along the lines of:)

    count = Array.new(m,0)
    [first, second, third].each do |array|
    array.each_with_index do |item, index|
    count[index] += item
    end
    end

    There are going to be m * 3n * (two constant multipliers for the
    looping) object allocations and method calls. m is fairly large, and I
    have other, similar, tasks to accomplish with this data. The faster I
    can process this data, the more data I can process in a given amount of
    time, and the more accurate the analysis will be.

    Is there a slick way to do this with unpacking and packing? Or some
    other way to do this with strings? Any modules or libraries I should
    look into? I'm fairly new to Ruby, though not to scientific
    computation. I realize I won't be able to do this any faster than with
    m * n * (a constant multiplier), but I need that constant multiplier to
    be as small as possible. Any advice would be appreciated.

    Thanks!
    'cid 'ooh
    , Oct 25, 2006
    #1
    1. Advertising

  2. wrote:
    > Hi everybody,
    >
    > I'm writing a fairly open-ended question. I'm hoping for suggestions,
    > opinions, advice. Suppose I have n arrays, each of which has m
    > entries. m is a fairly large integer, on the order of 10,000. Each
    > entry is either 1 or 0.


    > Is there a slick way to do this with unpacking and packing? Or some
    > other way to do this with strings? Any modules or libraries I should
    > look into? I'm fairly new to Ruby, though not to scientific
    > computation. I realize I won't be able to do this any faster than with
    > m * n * (a constant multiplier), but I need that constant multiplier to
    > be as small as possible. Any advice would be appreciated.


    I would look out for an implementation of a BitSet as this data
    structure seems crucial for your application. You could even create one
    your own - it's not too hard.

    Kind regards

    robert
    Robert Klemme, Oct 25, 2006
    #2
    1. Advertising

  3. Guest

    On Thu, 26 Oct 2006 wrote:

    > Hi everybody,
    >
    > I'm writing a fairly open-ended question. I'm hoping for suggestions,
    > opinions, advice. Suppose I have n arrays, each of which has m entries. m
    > is a fairly large integer, on the order of 10,000. Each entry is either 1
    > or 0.
    >
    > The first task I need to accomplish is figuring out how many times a 1
    > occurs in the ith entry in an array. So for concreteness, if I had arrays:
    >
    > first = [1,0,0,0,0]
    > second = [1,1,0,0,0]
    > third = [0,0,0,1,0]
    >
    > I would end up with
    > count = [2,1,0,1,0]


    harp:~ > cat a.rb
    require 'narray'

    first = NArray.to_na [1,0,0,0,0]
    second = NArray.to_na [1,1,0,0,0]
    third = NArray.to_na [0,0,0,1,0]

    count = first.eq(1) + second.eq(1) + third.eq(1)

    p count


    harp:~ > ruby a.rb
    NArray.byte(5):
    [ 2, 1, 0, 1, 0 ]


    > I'm just trying to give the general flavor of what I'm working on. I know I
    > can use some simple each_with_index loops to increment count[index]
    > (something along the lines of:)
    >
    > count = Array.new(m,0)
    > [first, second, third].each do |array|
    > array.each_with_index do |item, index|
    > count[index] += item
    > end
    > end
    >
    > There are going to be m * 3n * (two constant multipliers for the
    > looping) object allocations and method calls. m is fairly large, and I
    > have other, similar, tasks to accomplish with this data. The faster I
    > can process this data, the more data I can process in a given amount of
    > time, and the more accurate the analysis will be.


    i use narray on huge (> 1gb) datasets all the time. it's blindingly fast.


    regards.

    -a
    --
    my religion is very simple. my religion is kindness. -- the dalai lama
    , Oct 25, 2006
    #3
  4. On 10/25/06, <> wrote:
    > On Thu, 26 Oct 2006 wrote:
    >
    > > Hi everybody,
    > >
    > > I'm writing a fairly open-ended question. I'm hoping for suggestions,
    > > opinions, advice. Suppose I have n arrays, each of which has m entries. m
    > > is a fairly large integer, on the order of 10,000. Each entry is either 1
    > > or 0.
    > >
    > > The first task I need to accomplish is figuring out how many times a 1
    > > occurs in the ith entry in an array. So for concreteness, if I had arrays:
    > >
    > > first = [1,0,0,0,0]
    > > second = [1,1,0,0,0]
    > > third = [0,0,0,1,0]
    > >
    > > I would end up with
    > > count = [2,1,0,1,0]

    >
    > harp:~ > cat a.rb
    > require 'narray'
    >
    > first = NArray.to_na [1,0,0,0,0]
    > second = NArray.to_na [1,1,0,0,0]
    > third = NArray.to_na [0,0,0,1,0]
    >
    > count = first.eq(1) + second.eq(1) + third.eq(1)
    >
    > p count
    >
    >
    > harp:~ > ruby a.rb
    > NArray.byte(5):
    > [ 2, 1, 0, 1, 0 ]
    >
    >
    > > I'm just trying to give the general flavor of what I'm working on. I know I
    > > can use some simple each_with_index loops to increment count[index]
    > > (something along the lines of:)
    > >
    > > count = Array.new(m,0)
    > > [first, second, third].each do |array|
    > > array.each_with_index do |item, index|
    > > count[index] += item
    > > end
    > > end
    > >
    > > There are going to be m * 3n * (two constant multipliers for the
    > > looping) object allocations and method calls. m is fairly large, and I
    > > have other, similar, tasks to accomplish with this data. The faster I
    > > can process this data, the more data I can process in a given amount of
    > > time, and the more accurate the analysis will be.

    >
    > i use narray on huge (> 1gb) datasets all the time. it's blindingly fast.
    >
    >


    What's it going to take to get this into the stdlib? It is awesome.
    Wilson Bilkovich, Oct 25, 2006
    #4
  5. Guest

    On Thu, 26 Oct 2006, Wilson Bilkovich wrote:

    > What's it going to take to get this into the stdlib? It is awesome.


    it sure it. i've be campaigning for years to no avail... maybe it's time to
    bug matz again? it even compiles on windows: both it and the gsl can be found
    precompiled here:

    http://codeforpeople.com/lib/ruby/rb-gsl-win/

    regards.

    -a
    --
    my religion is very simple. my religion is kindness. -- the dalai lama
    , Oct 26, 2006
    #5
  6. dga Guest

    Someone already mentioned the NArray way; if you treat the entire
    thing as a set of matrix operations, you can make it insanely fast in
    either NArray or GSL. But don't treat it as "N" arrays with "M"
    entries -- treat it as a matrix, and then all of your operations can be
    done in the external (very fast) code:

    m = GSL::Matrix.new([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0])

    You can then either cheat, and "flatten" the matrix into a vector and
    find the sum:

    puts m.to_v.sum

    (which works since they're ones)

    Or pretend you're in Matlab and do it as a set of matrix operations:

    colones = GSL::Matrix.new(1, m.size[1])
    colones[0].set_all(1)

    rowones = GSL::Matrix.new(1, m.size[0])
    rowones[0].set_all(1)

    puts ((colones * m.transpose) * rowones.transpose)[0,0]

    -Dave


    wrote:
    > Hi everybody,
    >
    > I'm writing a fairly open-ended question. I'm hoping for suggestions,
    > opinions, advice. Suppose I have n arrays, each of which has m
    > entries. m is a fairly large integer, on the order of 10,000. Each
    > entry is either 1 or 0.
    >
    > The first task I need to accomplish is figuring out how many times a 1
    > occurs in the ith entry in an array. So for concreteness, if I had
    > arrays:
    >
    > first = [1,0,0,0,0]
    > second = [1,1,0,0,0]
    > third = [0,0,0,1,0]
    >
    > I would end up with
    > count = [2,1,0,1,0]
    >
    > I'm just trying to give the general flavor of what I'm working on. I
    > know I can use some simple each_with_index loops to increment
    > count[index] (something along the lines of:)
    >
    > count = Array.new(m,0)
    > [first, second, third].each do |array|
    > array.each_with_index do |item, index|
    > count[index] += item
    > end
    > end
    >
    > There are going to be m * 3n * (two constant multipliers for the
    > looping) object allocations and method calls. m is fairly large, and I
    > have other, similar, tasks to accomplish with this data. The faster I
    > can process this data, the more data I can process in a given amount of
    > time, and the more accurate the analysis will be.
    >
    > Is there a slick way to do this with unpacking and packing? Or some
    > other way to do this with strings? Any modules or libraries I should
    > look into? I'm fairly new to Ruby, though not to scientific
    > computation. I realize I won't be able to do this any faster than with
    > m * n * (a constant multiplier), but I need that constant multiplier to
    > be as small as possible. Any advice would be appreciated.
    >
    > Thanks!
    > 'cid 'ooh
    dga, Oct 26, 2006
    #6
  7. dga Guest

    Tweaking a bit and benchmarking with 100 rows of 10,000 elements each

    user system total real
    Create Matrix 3.940000 0.090000 4.030000 ( 4.037665)
    Add Up Matrix 0.710000 0.000000 0.710000 ( 0.711897)
    Create GSL::Matrix 3.400000 0.050000 3.450000 ( 3.496827)
    Add GSL::Matrix 0.080000 0.030000 0.110000 ( 0.111982)
    Matrix Add GSL::Matrix 0.020000 0.000000 0.020000 ( 0.017661)

    With the "Matrix Add" done as:

    (@m * colones.transpose).to_v.sum

    (to avoid @m.transpose, which creates a copy of all of @m).

    Have fun. Note that there are faster ways to initialize a GSL array
    from an external source, if you end up concerned about that source of
    overhead. (Or an NArray).

    -Dave

    dga wrote:
    > Someone already mentioned the NArray way; if you treat the entire
    > thing as a set of matrix operations, you can make it insanely fast in
    > either NArray or GSL. But don't treat it as "N" arrays with "M"
    > entries -- treat it as a matrix, and then all of your operations can be
    > done in the external (very fast) code:
    >
    > m = GSL::Matrix.new([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0])
    >
    > You can then either cheat, and "flatten" the matrix into a vector and
    > find the sum:
    >
    > puts m.to_v.sum
    >
    > (which works since they're ones)
    >
    > Or pretend you're in Matlab and do it as a set of matrix operations:
    >
    > colones = GSL::Matrix.new(1, m.size[1])
    > colones[0].set_all(1)
    >
    > rowones = GSL::Matrix.new(1, m.size[0])
    > rowones[0].set_all(1)
    >
    > puts ((colones * m.transpose) * rowones.transpose)[0,0]
    >
    > -Dave
    >
    >
    > wrote:
    > > Hi everybody,
    > >
    > > I'm writing a fairly open-ended question. I'm hoping for suggestions,
    > > opinions, advice. Suppose I have n arrays, each of which has m
    > > entries. m is a fairly large integer, on the order of 10,000. Each
    > > entry is either 1 or 0.
    > >
    > > The first task I need to accomplish is figuring out how many times a 1
    > > occurs in the ith entry in an array. So for concreteness, if I had
    > > arrays:
    > >
    > > first = [1,0,0,0,0]
    > > second = [1,1,0,0,0]
    > > third = [0,0,0,1,0]
    > >
    > > I would end up with
    > > count = [2,1,0,1,0]
    > >
    > > I'm just trying to give the general flavor of what I'm working on. I
    > > know I can use some simple each_with_index loops to increment
    > > count[index] (something along the lines of:)
    > >
    > > count = Array.new(m,0)
    > > [first, second, third].each do |array|
    > > array.each_with_index do |item, index|
    > > count[index] += item
    > > end
    > > end
    > >
    > > There are going to be m * 3n * (two constant multipliers for the
    > > looping) object allocations and method calls. m is fairly large, and I
    > > have other, similar, tasks to accomplish with this data. The faster I
    > > can process this data, the more data I can process in a given amount of
    > > time, and the more accurate the analysis will be.
    > >
    > > Is there a slick way to do this with unpacking and packing? Or some
    > > other way to do this with strings? Any modules or libraries I should
    > > look into? I'm fairly new to Ruby, though not to scientific
    > > computation. I realize I won't be able to do this any faster than with
    > > m * n * (a constant multiplier), but I need that constant multiplier to
    > > be as small as possible. Any advice would be appreciated.
    > >
    > > Thanks!
    > > 'cid 'ooh
    dga, Oct 26, 2006
    #7
  8. Guest

    wrote:
    > Hi everybody,
    >
    > I'm writing a fairly open-ended question. I'm hoping for suggestions,
    > opinions, advice.

    <snip>

    Thanks for all your replies. I've worked with the Gnu Scientific
    Library (and ATLAS, too) before, but I didn't realize there were Ruby
    binding for it. I settled on a Ruby/GSL with a custom compiled ATLAS
    backend (instead of just BLAS like GSL uses normally). It's screaming
    fast.

    'cid 'ooh
    , Oct 30, 2006
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. nomadx
    Replies:
    1
    Views:
    477
    dinser
    Sep 19, 2003
  2. Wayne Deleersnyder

    Premature end of script headers

    Wayne Deleersnyder, Nov 21, 2003, in forum: Perl
    Replies:
    1
    Views:
    651
    Wayne Deleersnyder
    Nov 21, 2003
  3. tenxian
    Replies:
    6
    Views:
    501
    Arne Vajhøj
    May 1, 2008
  4. Ravikiran

    Zero Optimization and Sign Optimization???

    Ravikiran, Nov 17, 2008, in forum: C Programming
    Replies:
    22
    Views:
    842
    Thad Smith
    Nov 24, 2008
  5. jimxoch
    Replies:
    40
    Views:
    1,151
    Pascal J. Bourguignon
    Oct 5, 2009
Loading...

Share This Page