Trying to make Array#collect massively parallel with OpenMP

Daniel Berger

Hi all,

Windows XP Home
VC++ 8 (free edition)

Just for kicks I tried creating a parallel Array#collect method, which
I called Array#acollect (asynch. collect). I added the following C
code to array.c, rebuilt and reinstalled, but it doesn't seem to be
any faster. Could this be an issue with my compiler? Or a Windows
thing?

static VALUE rb_ary_acollect(VALUE ary){
    long i;
    VALUE collect;

    if (!rb_block_given_p())
        return rb_ary_new4(RARRAY(ary)->len, RARRAY(ary)->ptr);

    collect = rb_ary_new2(RARRAY(ary)->len);

    #pragma omp parallel for
    for (i = 0; i < RARRAY(ary)->len; i++)
        rb_ary_push(collect, rb_yield(RARRAY(ary)->ptr[i]));

    return collect;
}

rb_define_method(rb_cArray, "acollect", rb_ary_acollect, 0);

# bench_collect.rb
require "benchmark"

MAX = 4000

array = []
MAX.times{ |n|
  array[n] = 2 * n
}

# No significant difference (?)
Benchmark.bm(30) do |x|
  x.report("Array#collect"){
    MAX.times{ array.collect{ |e| e += 4 } }
  }
  x.report("Array#acollect"){
    MAX.times{ array.acollect{ |e| e += 4 } }
  }
end

Ideas?

Thanks,

Dan
 
Jan Svitok

Hi,

I'm no expert in either Ruby internals or OpenMP. I just have a few
ideas for you (though most of them will probably be obvious):

1. http://msdn2.microsoft.com/en-us/library/fw509c3b(VS.80).aspx says
you need to add the /openmp compiler switch (try checking the _OPENMP
define to see if the compiler recognizes omp pragmas). #include "omp.h"
might help as well.

2. On the same page they say you won't see any difference if the whole
loop runs in under 15 ms on a specific machine (i.e. the thread startup
time).

3. Try running one loop outside the benchmark to set up the thread pool.

4. Is the assignment intentional (e += 4)?

5. (Just for my curiosity:) are you using 1.8 or 1.9?

6. I've heard that the 1.8 interpreter runs in one thread. I suppose your
code runs correctly with multiple threads because there should not be
any (re)allocations (e.g. it might crash if there was a local var in
the block). Right?

7. What machine are you running this code on? (If you send me the
binary or patch I might try it on my Core 2 machine if that helps.)

8. I suppose the difference might be bigger if you used a more
complicated (longer) block (relates to #2).

9. If all else fails, try running pure C loops (without calling Ruby
functions) with omp optimisation, possibly wrapped in a Ruby
function, to see if omp at least makes a difference at the C level.
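
Points 2, 3 and 8 in the list above are about measurement rather than about OpenMP itself. As a rough sketch (not something posted in this thread), the standard library's Benchmark.bmbm runs a rehearsal pass before the measured pass, which is one way to get the warm-up from point 3, and a heavier block body gives each call more work to do, as in point 8. The names build_array and heavy are made up for illustration, the sizes are kept small so it finishes quickly, and only the stock Array#collect is timed, since acollect exists only in a patched interpreter:

```ruby
require "benchmark"

# Build the same kind of input as bench_collect.rb: [0, 2, 4, ...].
def build_array(size)
  Array.new(size) { |n| 2 * n }
end

# Hypothetical heavier workload: adds 1 + 2 + ... + 50 to e, so each
# call does noticeably more than a single addition.
def heavy(e)
  (1..50).inject(e) { |acc, k| acc + k }
end

array = build_array(200)

# bmbm runs every report once as a rehearsal, then again for the
# figures it prints under the second heading.
Benchmark.bmbm(30) do |x|
  x.report("collect, e + 4")    { 100.times { array.collect { |e| e + 4 } } }
  x.report("collect, heavy(e)") { 100.times { array.collect { |e| heavy(e) } } }
end
```

On a build that does have acollect, adding two more reports with acollect in place of collect would give the comparison asked about in the original post.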

ok, I've run out of ideas for now...

Jano
 
Daniel Berger

1. http://msdn2.microsoft.com/en-us/library/fw509c3b(VS.80).aspx says
you need to add /openmp compiler switch (try using _OPENMP define to
see if the compiler recognizes omp pragmas). #include "omp.h" might
help as well.


Ah, thanks. I tried that and I got:

LINK : fatal error LNK1104: cannot open file 'VCOMP.lib'

It doesn't look like I have omp.h. This may be a header that's not
included in the free version of VC++. I'll have to ask around.

Thanks,

Dan
 
Jan Svitok

Seems OMP is included in standard and up.
http://members.gamedev.net/Rivorus/surge/html/surge_act/setting_up_your_compiler.html

If you send me the patch/instructions I can compile that for you.

Jano
 
Daniel Berger

Seems OMP is included in standard and up.
http://members.gamedev.net/Rivorus/surge/html/surge_act/setting_up_yo...
Drat.

If you send me the patch/instructions I can compile that for you.


Edit array.c and add the rb_ary_acollect function in my OP anywhere
above the Init_array() declaration. Add "rb_define_method(rb_cArray,
"acollect", rb_ary_acollect, 0);" where all the other method
definitions are (near the bottom). Then recompile and install.

Thanks,

Dan
 
Daniel Berger

Hi,

As I wasn't able to compile 1.8.5-p12, I used the brand new 1.8.6 with
VS 2005 SP1 on XP SP2.

Patch to array.c is included and I added -openmp to CPPFLAGS.

Second run without assignments:

W:\projects\ruby\ruby\ruby-1.8.6\win32>bench_collect.rb
                                    user     system      total        real
Array#collect                   6.875000   0.047000   6.922000 (  6.953000)
Array#acollect
W:/projects/ruby/ruby/ruby-1.8.6/win32/bench_collect.rb:17: [BUG]
cross-thread violation on rb_thread_schedule()
ruby 1.8.6 (2007-03-13) [i386-mswin32_80]

Ouch. I had a feeling it would collapse. Maybe wrapping the relevant
code in RUBY_CRITICAL would work, but that may defeat the purpose,
assuming it even works at all.

Oh, well. It was fun to try at least. :)

Thanks,

Dan
 
Jan Svitok

I have tried with 1.9 as well. It crashed in an even more interesting
way -- two pages of hex numbers, see below.

So the conclusion is that once Ruby is able to handle threads, this
could be a way to speed it up. So far it just goes down faster ;-)

Jano

                                    user     system      total        real
Array#collect                   2.515000   0.031000   2.546000 (  2.546000)
Array#acollect -- stack frame ------------
0000 (00BD0020): 00000004
0001 (00BD0024): 00000005
0002 (00BD0028): 00b5f70c
0003 (00BD002C): 00000004
0004 (00BD0030): 00b5f6e4
0005 (00BD0034): 00b60738
0006 (00BD0038): 0000003d
0007 (00BD003C): 00b5f6f8
0008 (00BD0040): 00b5f6d0
0009 (00BD0044): 00000004
0010 (00BD0048): 00ba5049
0011 (00BD004C): 00000004
0012 (00BD0050): 00b5f694
0013 (00BD0054): 0000003d
0014 (00BD0058): 00b5feb4
0015 (00BD005C): 00b5f680
0016 (00BD0060): 00000000
0017 (00BD0064): 00000004
0018 (00BD0068): 00000004
0019 (00BD006C): 00ba5049
0020 (00BD0070): 00b5f66c
0021 (00BD0074): 00b5f630
0022 (00BD0078): 00b5f66c
0023 (00BD007C): 00b4bc20
0024 (00BD0080): 00b4bc0c
0025 (00BD0084): 00b4bbf8
0026 (00BD0088): 00000004
0027 (00BD008C): 00000004
0028 (00BD0090): 00baa369
0029 (00BD0094): 00b60738
0030 (00BD0098): 00b4bbd0
0031 (00BD009C): 00b4bb58
0032 (00BD00A0): 00b4bb08
0033 (00BD00A4): 00000004
0034 (00BD00A8): 00000004
0035 (00BD00AC): 00000004
0036 (00BD00B0): 00baa369
0037 (00BD00B4): 00ba51bd
0038 (00BD00B8): 00001f41
0039 (00BD00BC): 00000004
0040 (00BD00C0): 00c4fdf5
0041 (00BD00C4): 00000004
0042 (00BD00C8): 00c4fdf5
0043 (00BD00CC): 00bd00b5 (= 37)
0044 (00BD00D0): 00b5f70c
0045 (00BD00D4): 00000004
0046 (00BD00D8): 00c4fd35
0047 (00BD00DC): 00000004
0048 (00BD00E0): 00c4fd35
0049 (00BD00E4): 00000231
0050 (00BD00E8): 00000231
0051 (00BD00EC): 00000004
0052 (00BD00F0): 00000001 <- lfp <- dfp
-- control frame ----------
c:0015 p:---- s:0053 b:0053 l:000052 d:000052 CFUNC :initialize
c:0014 p:---- s:0051 b:0053 l:000052 d:000052 CFUNC :new
c:0013 p:---- s:0047 b:0047 l:000046 d:000046 CFUNC :acollect
c:0012 p:0008 s:0044 b:0044 l:000000D8 d:000043 BLOCK bench_collect.rb:17
c:0011 p:---- s:0043 b:0043 l:000042 d:000042 FINISH
c:0010 p:---- s:0041 b:0041 l:000040 d:000040 CFUNC :times
c:0009 p:0013 s:0038 b:0038 l:000000D8 d:000037 BLOCK bench_collect.rb:17
c:0008 p:0037 s:0037 b:0037 l:000036 d:000036 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:293
c:0007 p:0037 s:0029 b:0029 l:000028 d:000028 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:377
c:0006 p:0023 s:0022 b:0022 l:000000D8 d:0000026C BLOCK bench_collect.rb:16
c:0005 p:0134 s:0020 b:0020 l:000019 d:000019 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:177
c:0004 p:0037 s:0011 b:0011 l:000010 d:000010 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:207
c:0003 p:0048 s:0005 b:0005 l:000000D8 d:000000D8 TOP bench_collect.rb:12
c:0002 p:---- s:0002 b:0002 l:000001 d:000001 FINISH
c:0001 p:---- s:0000 b:-001 l:000000 d:000000 ------
---------------------------
-- stack frame ------------
-- control frame ----------
c:0019 p:---- s:-3096584 b:-001 l:00000000 d:00000000 ------
c:0018 p:---- s:-3096584 b:-001 l:00000000 d:00000000 ------
c:0017 p:---- s:-3096584 b:-001 l:00000000 d:00000000 ------
c:0016 p:---- s:-3096584 b:-001 l:00000000 d:00000000 ------
c:0015 p:---- s:0053 b:0053 l:000052 d:000052 CFUNC :initialize
c:0014 p:---- s:0051 b:0053 l:000052 d:000052 CFUNC :new
c:0013 p:---- s:0047 b:0047 l:000046 d:000046 CFUNC :acollect
c:0012 p:0008 s:0044 b:0044 l:000000D8 d:000043 BLOCK bench_collect.rb:17
c:0011 p:---- s:0043 b:0043 l:000042 d:000042 FINISH
c:0010 p:---- s:0041 b:0041 l:000040 d:000040 CFUNC :times
c:0009 p:0013 s:0038 b:0038 l:000000D8 d:000037 BLOCK bench_collect.rb:17
c:0008 p:0037 s:0037 b:0037 l:000036 d:000036 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:293
c:0007 p:0037 s:0029 b:0029 l:000028 d:000028 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:377
c:0006 p:0023 s:0022 b:0022 l:000000D8 d:0000026C BLOCK bench_collect.rb:16
c:0005 p:0134 s:0020 b:0020 l:000019 d:000019 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:177
c:0004 p:0037 s:0011 b:0011 l:000010 d:000010 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:207
c:0003 p:0048 s:0005 b:0005 l:000000D8 d:000000D8 TOP bench_collect.rb:12
c:0002 p:---- s:0002 b:0002 l:000001 d:000001 FINISH
c:0001 p:---- s:0000 b:-001 l:000000 d:000000 ------
---------------------------
-- stack frame ------------
0000 (00BD0020): 00000004
0001 (00BD0024): 00000005
0002 (00BD0028): 00b5f70c
0003 (00BD002C): 00000004
0004 (00BD0030): 00b5f6e4
0005 (00BD0034): 00b60738
0006 (00BD0038): 0000003d
0007 (00BD003C): 00b5f6f8
0008 (00BD0040): 00b5f6d0
0009 (00BD0044): 00000004
0010 (00BD0048): 00ba5049
0011 (00BD004C): 00000004
0012 (00BD0050): 00b5f694
0013 (00BD0054): 0000003d
0014 (00BD0058): 00b5feb4
0015 (00BD005C): 00b5f680
0016 (00BD0060): 00000000
0017 (00BD0064): 00000004
0018 (00BD0068): 00000004
0019 (00BD006C): 00ba5049
0020 (00BD0070): 00b5f66c
0021 (00BD0074): 00b5f630
0022 (00BD0078): 00b5f66c
0023 (00BD007C): 00b4bc20
0024 (00BD0080): 00b4bc0c
0025 (00BD0084): 00b4bbf8
0026 (00BD0088): 00000004
0027 (00BD008C): 00000004
0028 (00BD0090): 00baa369
0029 (00BD0094): 00b60738
0030 (00BD0098): 00b4bbd0
0031 (00BD009C): 00b4bb58
0032 (00BD00A0): 00b4bb08
0033 (00BD00A4): 00000004
0034 (00BD00A8): 00000004
0035 (00BD00AC): 00000004
0036 (00BD00B0): 00baa369
0037 (00BD00B4): 00ba51bd
0038 (00BD00B8): 00001f41
0039 (00BD00BC): 00000004
0040 (00BD00C0): 00c4fdf5
0041 (00BD00C4): 00000004
0042 (00BD00C8): 00c4fdf5
0043 (00BD00CC): 00bd00b5 (= 37)
0044 (00BD00D0): 00b5f70c
0045 (00BD00D4): 00000004
0046 (00BD00D8): 00c4fd35
0047 (00BD00DC): 00000004
0048 (00BD00E0): 00c4fd35
0049 (00BD00E4): 00000231
0050 (00BD00E8): 00000231
-- control frame ----------
c:0014 p:---- s:0051 b:0053 l:000052 d:000052 CFUNC :new
c:0013 p:---- s:0047 b:0047 l:000046 d:000046 CFUNC :acollect
c:0012 p:0008 s:0044 b:0044 l:000000D8 d:000043 BLOCK bench_collect.rb:17
c:0011 p:---- s:0043 b:0043 l:000042 d:000042 FINISH
c:0010 p:---- s:0041 b:0041 l:000040 d:000040 CFUNC :times
c:0009 p:0013 s:0038 b:0038 l:000000D8 d:000037 BLOCK bench_collect.rb:17
c:0008 p:0037 s:0037 b:0037 l:000036 d:000036 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:293
c:0007 p:0037 s:0029 b:0029 l:000028 d:000028 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:377
c:0006 p:0023 s:0022 b:0022 l:000000D8 d:0000026C BLOCK bench_collect.rb:16
c:0005 p:0134 s:0020 b:0020 l:000019 d:000019 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:177
c:0004 p:0037 s:0011 b:0011 l:000010 d:000010 METHOD c:/ruby19/usr/lib/ruby/1.9/
benchmark.rb:207
c:0003 p:0048 s:0005 b:0005 l:000000D8 d:000000D8 TOP bench_collect.rb:12
c:0002 p:---- s:0002 b:0002 l:000001 d:000001 FINISH
c:0001 p:---- s:0000 b:-001 l:000000 d:000000 ------
---------------------------
DBG> : "bench_collect.rb:17:in `acollect'"
DBG> : "bench_collect.rb:17:in `block (3 levels) in <main>'"
DBG> : "bench_collect.rb:17:in `times'"
DBG> : "bench_collect.rb:17:in `block (2 levels) in <main>'"
DBG> : "c:/ruby19/usr/lib/ruby/1.9/benchmark.rb:293:in `measure'"
DBG> : "c:/ruby19/usr/lib/ruby/1.9/benchmark.rb:377:in `item'"
DBG> : "bench_collect.rb:16:in `block in <main>'"
DBG> : "c:/ruby19/usr/lib/ruby/1.9/benchmark.rb:177:in `benchmark'"
DBG> : "c:/ruby19/usr/lib/ruby/1.9/benchmark.rb:207:in `bm'"
DBG> : "bench_collect.rb:12:in `<main>'"
[BUG] cfp consistency error - call0
ruby 1.9.0 (2007-03-13) [i386-mswin32_80]


This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
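
For anyone who wants to play with the idea from this post without patching array.c, here is a minimal pure-Ruby sketch of the same shape. Plain Ruby threads stand in for the OpenMP workers, and threaded_collect and its workers parameter are hypothetical names, not part of Ruby or of the patch. Each worker stores its results by index into a preallocated array, so the output order does not depend on which thread finishes first. On the standard interpreter, Ruby blocks do not run on several cores at the same time, so this shows the structure only and should not be expected to be faster:

```ruby
# Hypothetical stand-in for the C-level acollect: split the indices into
# slices, hand each slice to a Ruby thread, and store results by index.
def threaded_collect(array, workers = 4, &block)
  # Mirror the C code: without a block, return a copy of the array.
  return array.dup unless block

  result = Array.new(array.size)
  slice_size = [(array.size.to_f / workers).ceil, 1].max

  threads = (0...array.size).each_slice(slice_size).map do |indices|
    Thread.new do
      # Each thread touches only its own indices, so no two threads
      # write to the same slot.
      indices.each { |idx| result[idx] = block.call(array[idx]) }
    end
  end

  threads.each(&:join)
  result
end

array = Array.new(4000) { |n| 2 * n }
shifted = threaded_collect(array) { |e| e + 4 }
puts shifted.first(3).inspect
puts(shifted == array.collect { |e| e + 4 })
```

Because the block is called through the normal Ruby thread machinery rather than from a foreign OS thread, this version does not trip the interpreter checks that produced the dumps above.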
 
Rick DeNatale

Ouch. I had a feeling it would collapse. Maybe wrapping the relevant
code in RUBY_CRITICAL would work, but that may defeat the purpose,
assuming it even works at all.

Even if it worked, it would have some other problems. If the block
argument were at all sensitive to evaluation order, for example, the
results would be indeterminate, I think.

For a cooked up example:

i = 0
(1..100).to_a.acollect {|elem| i += 1}

Interaction with the GC might also be interesting.
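
The evaluation-order point can be shown without any real parallelism. The sketch below is only an illustration: collect_in_order is a made-up helper that calls the block once per element, visiting the indices in whatever order it is given and storing each result at the element's own index. Running it forwards and backwards simulates two possible schedules a parallel collect might pick:

```ruby
# Hypothetical helper: call the block once per element, visiting the
# indices in the given order, and store each result at its own index.
def collect_in_order(array, order)
  result = Array.new(array.size)
  order.each { |idx| result[idx] = yield(array[idx]) }
  result
end

array    = (1..5).to_a
forward  = (0...array.size).to_a
backward = forward.reverse

# A block that depends only on its argument gives the same answer
# whichever order the elements are visited in.
p collect_in_order(array, forward)  { |e| e + 4 }
p collect_in_order(array, backward) { |e| e + 4 }

# The cooked-up block from above depends on how many calls came before
# it, so the two orders disagree.
i = 0
p collect_in_order(array, forward)  { |_elem| i += 1 }
i = 0
p collect_in_order(array, backward) { |_elem| i += 1 }
```

With real threads there is the further problem that i += 1 is a read followed by a write, so two workers could also interleave in the middle of it.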
 
