# Premature Optimization

Discussion in 'Ruby' started by poopdeville@gmail.com, Oct 25, 2006.

1. ### Guest

Hi everybody,

I'm writing a fairly open-ended question. I'm hoping for suggestions,
opinions, advice. Suppose I have n arrays, each of which has m
entries. m is a fairly large integer, on the order of 10,000. Each
entry is either 1 or 0.

The first task I need to accomplish is figuring out how many times a 1
occurs in the ith entry in an array. So for concreteness, if I had
arrays:

first = [1,0,0,0,0]
second = [1,1,0,0,0]
third = [0,0,0,1,0]

I would end up with
count = [2,1,0,1,0]

I'm just trying to give the general flavor of what I'm working on. I
know I can use some simple each_with_index loops to increment
count[index] (something along the lines of

count = Array.new(m,0)
[first, second, third].each do |array|
array.each_with_index do |item, index|
count[index] += item
end
end

There are going to be m * 3n * (two constant multipliers for the
looping) object allocations and method calls. m is fairly large, and I
have other, similar, tasks to accomplish with this data. The faster I
can process this data, the more data I can process in a given amount of
time, and the more accurate the analysis will be.

Is there a slick way to do this with unpacking and packing? Or some
other way to do this with strings? Any modules or libraries I should
look into? I'm fairly new to Ruby, though not to scientific
computation. I realize I won't be able to do this any faster than with
m * n * (a constant multiplier), but I need that constant multiplier to
be as small as possible. Any advice would be appreciated.

Thanks!
'cid 'ooh

, Oct 25, 2006

2. ### Robert KlemmeGuest

wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice. Suppose I have n arrays, each of which has m
> entries. m is a fairly large integer, on the order of 10,000. Each
> entry is either 1 or 0.

> Is there a slick way to do this with unpacking and packing? Or some
> other way to do this with strings? Any modules or libraries I should
> look into? I'm fairly new to Ruby, though not to scientific
> computation. I realize I won't be able to do this any faster than with
> m * n * (a constant multiplier), but I need that constant multiplier to
> be as small as possible. Any advice would be appreciated.

I would look out for an implementation of a BitSet as this data
structure seems crucial for your application. You could even create one
your own - it's not too hard.

Kind regards

robert

Robert Klemme, Oct 25, 2006

3. ### Guest

On Thu, 26 Oct 2006 wrote:

> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice. Suppose I have n arrays, each of which has m entries. m
> is a fairly large integer, on the order of 10,000. Each entry is either 1
> or 0.
>
> The first task I need to accomplish is figuring out how many times a 1
> occurs in the ith entry in an array. So for concreteness, if I had arrays:
>
> first = [1,0,0,0,0]
> second = [1,1,0,0,0]
> third = [0,0,0,1,0]
>
> I would end up with
> count = [2,1,0,1,0]

harp:~ > cat a.rb
require 'narray'

first = NArray.to_na [1,0,0,0,0]
second = NArray.to_na [1,1,0,0,0]
third = NArray.to_na [0,0,0,1,0]

count = first.eq(1) + second.eq(1) + third.eq(1)

p count

harp:~ > ruby a.rb
NArray.byte(5):
[ 2, 1, 0, 1, 0 ]

> I'm just trying to give the general flavor of what I'm working on. I know I
> can use some simple each_with_index loops to increment count[index]
> (something along the lines of
>
> count = Array.new(m,0)
> [first, second, third].each do |array|
> array.each_with_index do |item, index|
> count[index] += item
> end
> end
>
> There are going to be m * 3n * (two constant multipliers for the
> looping) object allocations and method calls. m is fairly large, and I
> have other, similar, tasks to accomplish with this data. The faster I
> can process this data, the more data I can process in a given amount of
> time, and the more accurate the analysis will be.

i use narray on huge (> 1gb) datasets all the time. it's blindingly fast.

regards.

-a
--
my religion is very simple. my religion is kindness. -- the dalai lama

, Oct 25, 2006
4. ### Wilson BilkovichGuest

On 10/25/06, <> wrote:
> On Thu, 26 Oct 2006 wrote:
>
> > Hi everybody,
> >
> > I'm writing a fairly open-ended question. I'm hoping for suggestions,
> > opinions, advice. Suppose I have n arrays, each of which has m entries. m
> > is a fairly large integer, on the order of 10,000. Each entry is either 1
> > or 0.
> >
> > The first task I need to accomplish is figuring out how many times a 1
> > occurs in the ith entry in an array. So for concreteness, if I had arrays:
> >
> > first = [1,0,0,0,0]
> > second = [1,1,0,0,0]
> > third = [0,0,0,1,0]
> >
> > I would end up with
> > count = [2,1,0,1,0]

>
> harp:~ > cat a.rb
> require 'narray'
>
> first = NArray.to_na [1,0,0,0,0]
> second = NArray.to_na [1,1,0,0,0]
> third = NArray.to_na [0,0,0,1,0]
>
> count = first.eq(1) + second.eq(1) + third.eq(1)
>
> p count
>
>
> harp:~ > ruby a.rb
> NArray.byte(5):
> [ 2, 1, 0, 1, 0 ]
>
>
> > I'm just trying to give the general flavor of what I'm working on. I know I
> > can use some simple each_with_index loops to increment count[index]
> > (something along the lines of
> >
> > count = Array.new(m,0)
> > [first, second, third].each do |array|
> > array.each_with_index do |item, index|
> > count[index] += item
> > end
> > end
> >
> > There are going to be m * 3n * (two constant multipliers for the
> > looping) object allocations and method calls. m is fairly large, and I
> > have other, similar, tasks to accomplish with this data. The faster I
> > can process this data, the more data I can process in a given amount of
> > time, and the more accurate the analysis will be.

>
> i use narray on huge (> 1gb) datasets all the time. it's blindingly fast.
>
>

What's it going to take to get this into the stdlib? It is awesome.

Wilson Bilkovich, Oct 25, 2006
5. ### Guest

On Thu, 26 Oct 2006, Wilson Bilkovich wrote:

> What's it going to take to get this into the stdlib? It is awesome.

it sure it. i've be campaigning for years to no avail... maybe it's time to
bug matz again? it even compiles on windows: both it and the gsl can be found
precompiled here:

http://codeforpeople.com/lib/ruby/rb-gsl-win/

regards.

-a
--
my religion is very simple. my religion is kindness. -- the dalai lama

, Oct 26, 2006
6. ### dgaGuest

Someone already mentioned the NArray way; if you treat the entire
thing as a set of matrix operations, you can make it insanely fast in
either NArray or GSL. But don't treat it as "N" arrays with "M"
entries -- treat it as a matrix, and then all of your operations can be
done in the external (very fast) code:

m = GSL::Matrix.new([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0])

You can then either cheat, and "flatten" the matrix into a vector and
find the sum:

puts m.to_v.sum

(which works since they're ones)

Or pretend you're in Matlab and do it as a set of matrix operations:

colones = GSL::Matrix.new(1, m.size[1])
colones[0].set_all(1)

rowones = GSL::Matrix.new(1, m.size[0])
rowones[0].set_all(1)

puts ((colones * m.transpose) * rowones.transpose)[0,0]

-Dave

wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice. Suppose I have n arrays, each of which has m
> entries. m is a fairly large integer, on the order of 10,000. Each
> entry is either 1 or 0.
>
> The first task I need to accomplish is figuring out how many times a 1
> occurs in the ith entry in an array. So for concreteness, if I had
> arrays:
>
> first = [1,0,0,0,0]
> second = [1,1,0,0,0]
> third = [0,0,0,1,0]
>
> I would end up with
> count = [2,1,0,1,0]
>
> I'm just trying to give the general flavor of what I'm working on. I
> know I can use some simple each_with_index loops to increment
> count[index] (something along the lines of
>
> count = Array.new(m,0)
> [first, second, third].each do |array|
> array.each_with_index do |item, index|
> count[index] += item
> end
> end
>
> There are going to be m * 3n * (two constant multipliers for the
> looping) object allocations and method calls. m is fairly large, and I
> have other, similar, tasks to accomplish with this data. The faster I
> can process this data, the more data I can process in a given amount of
> time, and the more accurate the analysis will be.
>
> Is there a slick way to do this with unpacking and packing? Or some
> other way to do this with strings? Any modules or libraries I should
> look into? I'm fairly new to Ruby, though not to scientific
> computation. I realize I won't be able to do this any faster than with
> m * n * (a constant multiplier), but I need that constant multiplier to
> be as small as possible. Any advice would be appreciated.
>
> Thanks!
> 'cid 'ooh

dga, Oct 26, 2006
7. ### dgaGuest

Tweaking a bit and benchmarking with 100 rows of 10,000 elements each

user system total real
Create Matrix 3.940000 0.090000 4.030000 ( 4.037665)
Add Up Matrix 0.710000 0.000000 0.710000 ( 0.711897)
Create GSL::Matrix 3.400000 0.050000 3.450000 ( 3.496827)
Add GSL::Matrix 0.080000 0.030000 0.110000 ( 0.111982)
Matrix Add GSL::Matrix 0.020000 0.000000 0.020000 ( 0.017661)

With the "Matrix Add" done as:

(@m * colones.transpose).to_v.sum

(to avoid @m.transpose, which creates a copy of all of @m).

Have fun. Note that there are faster ways to initialize a GSL array
from an external source, if you end up concerned about that source of

-Dave

dga wrote:
> Someone already mentioned the NArray way; if you treat the entire
> thing as a set of matrix operations, you can make it insanely fast in
> either NArray or GSL. But don't treat it as "N" arrays with "M"
> entries -- treat it as a matrix, and then all of your operations can be
> done in the external (very fast) code:
>
> m = GSL::Matrix.new([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0])
>
> You can then either cheat, and "flatten" the matrix into a vector and
> find the sum:
>
> puts m.to_v.sum
>
> (which works since they're ones)
>
> Or pretend you're in Matlab and do it as a set of matrix operations:
>
> colones = GSL::Matrix.new(1, m.size[1])
> colones[0].set_all(1)
>
> rowones = GSL::Matrix.new(1, m.size[0])
> rowones[0].set_all(1)
>
> puts ((colones * m.transpose) * rowones.transpose)[0,0]
>
> -Dave
>
>
> wrote:
> > Hi everybody,
> >
> > I'm writing a fairly open-ended question. I'm hoping for suggestions,
> > opinions, advice. Suppose I have n arrays, each of which has m
> > entries. m is a fairly large integer, on the order of 10,000. Each
> > entry is either 1 or 0.
> >
> > The first task I need to accomplish is figuring out how many times a 1
> > occurs in the ith entry in an array. So for concreteness, if I had
> > arrays:
> >
> > first = [1,0,0,0,0]
> > second = [1,1,0,0,0]
> > third = [0,0,0,1,0]
> >
> > I would end up with
> > count = [2,1,0,1,0]
> >
> > I'm just trying to give the general flavor of what I'm working on. I
> > know I can use some simple each_with_index loops to increment
> > count[index] (something along the lines of
> >
> > count = Array.new(m,0)
> > [first, second, third].each do |array|
> > array.each_with_index do |item, index|
> > count[index] += item
> > end
> > end
> >
> > There are going to be m * 3n * (two constant multipliers for the
> > looping) object allocations and method calls. m is fairly large, and I
> > have other, similar, tasks to accomplish with this data. The faster I
> > can process this data, the more data I can process in a given amount of
> > time, and the more accurate the analysis will be.
> >
> > Is there a slick way to do this with unpacking and packing? Or some
> > other way to do this with strings? Any modules or libraries I should
> > look into? I'm fairly new to Ruby, though not to scientific
> > computation. I realize I won't be able to do this any faster than with
> > m * n * (a constant multiplier), but I need that constant multiplier to
> > be as small as possible. Any advice would be appreciated.
> >
> > Thanks!
> > 'cid 'ooh

dga, Oct 26, 2006
8. ### Guest

wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,

<snip>

Thanks for all your replies. I've worked with the Gnu Scientific
Library (and ATLAS, too) before, but I didn't realize there were Ruby
binding for it. I settled on a Ruby/GSL with a custom compiled ATLAS
backend (instead of just BLAS like GSL uses normally). It's screaming
fast.

'cid 'ooh

, Oct 30, 2006