Concurrent (using threads) slower than sequential - doubt

Carlos Ortega · Oct 5, 2008

Hi Folks.
While starting to study the benefits of using threads in Ruby, I tried
to solve the following problem:

I have 3 text files ( C:\numbers0.txt, C:\numbers1.txt, C:\numbers2.txt );
each file contains a very large list of numbers.
I read and sum each file in a different thread.
Finally, I add up all the subtotals to produce the final result.

Here is the code.
===================

require 'thread'
m_threads = []

print "INITIAL TIME := ", initial_time = Time.now, "\n"
3.times do |i|
  # Store each thread in the array so it can be joined later
  m_threads[i] = Thread.new do
    total_per_thread = 0
    case i
    when 0 then path = "C:\\numbers0.txt"
    when 1 then path = "C:\\numbers1.txt"
    when 2 then path = "C:\\numbers2.txt"
    end
    File.open( path, "r" ) do |m_file|
      while line = m_file.gets
        total_per_thread = line.to_i + total_per_thread
      end
      # Publish the subtotal as a thread-local variable
      Thread.current[:INDEX] = total_per_thread
    end
  end
end

result = 0
m_threads.each{ |t| t.join; result = t[:INDEX] + result; }

print "FINAL TIME := ", final_time = Time.now, "\n"
print "TOTAL TIME := ", total_time = final_time-initial_time, "\n"
print "Total := ", result, "\n"

=======================================
Output (CONCURRENT - Using Threads):

INITIAL TIME := Sun Oct 05 22:07:26 -0500 2008

FINAL TIME := Sun Oct 05 22:07:38 -0500 2008
TOTAL TIME := 11.485
Total := 1150000000
========================================

I verified that each thread did its job, and the result is OK too.
I also solved the same problem with a sequential program that uses no
threads at all.
Here is the code:

print "INITIAL Time := ", initial_time = Time.now, "\n"

paths = [ "C:\\numbers0.txt", "C:\\numbers1.txt", "C:\\numbers2.txt" ]
result = 0
for m_path in paths
  # Read-only mode is enough; the files are never written
  File.open( m_path, "r" ) do |m_file|
    while line = m_file.gets
      result = line.to_i + result
    end
  end
end

print "FINAL time := ", final_time = Time.now, "\n"
print "TOTAL time := ", total_time = final_time - initial_time, "\n"
print "Total := ", result, "\n"

=======================================
Output (SEQUENTIAL - No Threads):

INITIAL TIME := Sun Oct 05 22:34:47 -0500 2008
FINAL TIME := Sun Oct 05 22:34:57 -0500 2008
TOTAL TIME := 10.656
Total := 1150000000

=======================================
As you can see, the thread-based program runs slower.
I thought that using threads would make it faster, but it didn't... Why
is it slower?

Any help will be much appreciated.

Charles Oliver Nutter · Oct 6, 2008

Carlos said:
As you see, the thread based program run slower.
I thought that by using threads it will be faster, but it didn't....Why
is it slower?

You may want to try with JRuby, which actually uses native threads. On a
multi-core system, it should improve performance.

- Charlie

Robert Klemme · Oct 6, 2008

2008/10/6 Yukihiro Matsumoto said:
In message "Re: Concurent (using threads) slower than sequential -doubt"

|As you see, the thread based program run slower.
|I thought that by using threads it will be faster, but it didn't....Why
|is it slower?

Threads require context switching, so that they tend to run slower,
especially green threads like Ruby 1.8 has.

There is another issue which may easily have a more serious impact:
since all three files reside in the same directory, they are read from
the same physical device (most likely a local (S)ATA disk). And since
these files are large, chances are that they are spread over the disk
and do not fit into the operating system's buffer cache. Reading them
concurrently will lead to noticeably more head movement and less
efficient disk caching than the sequential approach.
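
To get a feeling for which of the two effects dominates on a given machine, one could time the raw read separately from the number parsing. This is only a minimal sketch: it assumes each file fits in memory (which may not hold for the original data), and `time_read_and_parse` is a hypothetical helper name, not something from this thread.

```ruby
require 'benchmark'

# Hypothetical helper: times the raw read and the parse/sum step
# separately for one file. Assumes the whole file fits in memory.
def time_read_and_parse(path)
  data = nil
  sum = 0
  read_time  = Benchmark.realtime { data = File.read(path) }
  parse_time = Benchmark.realtime do
    data.each_line { |line| sum += line.to_i }
  end
  { :read => read_time, :parse => parse_time, :sum => sum }
end

# When run as a script, report the timings for every file given on
# the command line.
if __FILE__ == $0
  ARGV.each do |path|
    r = time_read_and_parse(path)
    printf("%s: read %.3fs, parse %.3fs, sum %d\n",
           path, r[:read], r[:parse], r[:sum])
  end
end
```

If the parse time is much larger than the read time, the job is mostly CPU-bound and green threads are unlikely to help; if the read time dominates, the disk is the likelier bottleneck. Note that a second run may read from the operating system's cache and look much faster than the first.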

Kind regards

robert

Carlos Ortega · Oct 6, 2008

Thanks to all of you (Matz, Charles and Robert).

Just one more doubt...

Since the threads I created are stored in an array of Thread objects,
I tried to read each one's value using the [ ] notation:

for t in m_threads
  print t[:INDEX], "\n"
end

The interpreter does not raise any error, but the results are always:
nil
nil
nil

I tried to verify whether they are still running:

Thread.list.each{|t| p t}

Results were:
#<Thread:0x29c5fc0 run>
#<Thread:0x29c6100 run>
#<Thread:0x29c6240 run>
#<Thread:0x294c74c run>

So they are indeed running... my doubt is: why can't I read the values
stored by the threads?
In fact, in the statement
m_threads.each{ |t| t.join; result = t[:INDEX] + result; }

I can compute the result variable only after executing t.join. If
I take out the t.join call, the interpreter raises an error:

PbaThreads.rb:10 : undefined method `+' for nil:NilClass (NoMethodError)

Could you clarify this, please?

Best Regards

Robert Klemme · Oct 6, 2008

Well, this is obvious: you cannot access the result before it's there.
Since you are setting this as the last statement in the thread you
need to wait (i.e. join) until the thread finishes.
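
A small sketch of the same effect, with `sleep` standing in for the file reading (an assumption made only to keep the example short):

```ruby
# A thread that stores its result in a thread-local variable as its
# last step, like the original program does.
t = Thread.new do
  sleep 0.2                    # stands in for reading a large file
  Thread.current[:INDEX] = 42
end

before = t[:INDEX]             # read while the thread is most likely still sleeping
t.join                         # block until the thread has finished
after = t[:INDEX]              # the value has been stored by now

p before
p after
```

Here `before` is expected to be nil in practice, because the thread has not reached its last statement yet, while `after` holds the stored value once `join` has returned.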

Btw, you can use Thread#value for this. Here's a variant:

require 'benchmark'

# The files are numbered 0..2
files = (0..2).map {|i| "C:\\numbers#{i}.txt"}

Benchmark.bmbm 10 do |b|
  b.report "threaded" do
    threads = files.map do |file|
      Thread.new file do |f|
        File.open f do |io|
          io.inject(0) {|sum, l| sum + l.to_i}
        end
      end
    end

    # Thread#value joins the thread and returns the block's result
    puts threads.inject(0) {|sum, th| sum + th.value}
  end

  b.report "sequential" do
    puts files.inject(0) {|s, f|
      File.open f do |io|
        io.inject(s) {|sum, l| sum + l.to_i}
      end
    }
  end
end

Kind regards

robert

Erik Veenstra · Oct 6, 2008

If you are on Linux, you might want to have a look at the gem
"forkandreturn" [1]. ForkAndReturn handles each element in an
enumeration in a separate process [2].

gegroet,
Erik V. - http://www.erikveen.dds.nl/

[1] http://www.erikveen.dds.nl/forkandreturn/doc/index.html

[2] ...if you're on a multicore machine. Oops. Will be fixed in
the next release.

----------------------------------------------------------------

$ cat count1.rb
files = ["numbers0.txt", "numbers1.txt", "numbers2.txt"]
result = 0

files.collect do |file|
  res = 0

  File.open(file) do |file|
    file.each do |line|
      res += line.to_i
    end
  end

  res
end.each do |res|
  result += res
end

p result

----------------------------------------------------------------

$ diff -ur count[12].rb | clean_diff
+require "forkandreturn"
+
files = ["numbers0.txt", "numbers1.txt", "numbers2.txt"]
result = 0

-files.collect do |file|
+files.concurrent_collect do |file|
res = 0

File.open(file) do |file|

----------------------------------------------------------------

$ time ruby count1.rb
81627450482688

real 0m15.309s
user 0m15.201s
sys 0m0.076s

----------------------------------------------------------------

$ time ruby count2.rb
81627450482688

real 0m8.976s <=== Multicore!
user 0m17.177s <=== Multicore!
sys 0m0.204s

----------------------------------------------------------------

$ uname -a
Linux laptop 2.6.24-19-generic #1 SMP Wed Aug 20 22:56:21 UTC 2008
i686 GNU/Linux

----------------------------------------------------------------

$ ruby --version
ruby 1.8.6 (2008-06-20 patchlevel 230) [i686-linux]

----------------------------------------------------------------

$ gem list | grep -ie forkandreturn
forkandreturn (0.2.0)

----------------------------------------------------------------
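
For readers who want to see the idea without the gem, here is a minimal sketch of the process-per-element approach using only `fork`, `IO.pipe` and `Marshal` from core Ruby. It is an illustration that assumes a Unix-like system, not ForkAndReturn's actual implementation, and `fork_collect` is a hypothetical name.

```ruby
# Hypothetical helper: runs the block for each element in its own child
# process and returns the results in the original order. Requires a
# platform that supports fork (so not native Windows Ruby).
def fork_collect(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      # Serialize the block's result and send it to the parent
      writer.write(Marshal.dump(yield(item)))
      writer.close
      exit!(0)                 # leave the child without running at_exit handlers
    end
    writer.close               # the parent keeps only the read end
    [pid, reader]
  end

  # Collect the results; reader.read returns once the child has closed
  # its end of the pipe.
  children.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end
```

It would be called like `fork_collect(files) { |file| ... }`, with the block returning the subtotal for one file. Because each child is a separate process, the operating system can schedule them on different cores, which is where the lower "real" time above comes from.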

Erik Veenstra · Oct 6, 2008

[2] ...if you're on a multicore machine. Oops. Will be fixed in
the next release.

It's released...

gegroet,
Erik V.

Carlos Ortega · Oct 6, 2008

Erik said:
[2] ...if you're on a multicore machine. Oops. Will be fixed in
the next release.

It's released...

gegroet,
Erik V.

Thank you Erik and Robert...

I will try on both environments.

Regards
Carlos

Prashant Srinivasan · Oct 7, 2008

Carlos, that sounds about correct. I did some similar tests early this
year [1]. Basically your problem is that Ruby runs on one kernel
thread/LWP irrespective of how many userland threads you create. It's
also expensive to switch between threads (the cost varies depending on
which hardware platform you're running on), so these two factors combine
to make it slower for you when you use threads.

JRuby was almost as bad until JRuby 1.1.1, after which it started
doing better with threads (this was due to a bug fix by Charles [2]).
It's now much better at scaling with threads compared with MRI, but
still quite poor in absolute terms [3]: its scalability on an
embarrassingly threaded program eroded 54% jumping from 1 to 2 threads
and became worse after that. (*Caveat:* My numbers are old, they're
from March, and things may have gotten much better since!)

[1] http://blogs.sun.com/prashant/resource/files/jruby-ruby-comparison.xls
[2] Ref to Charles' entry
http://blog.headius.com/2008/04/shared-data-considered-harmful.html
[3] http://blogs.sun.com/prashant/resource/files/jruby-threads.xls

-ps

--
Prashant Srinivasan
F/OSS Enthusiast
Sun Microsystems, Inc.
http://blogs.sun.com/prashant
GnuPG key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x82FBDE5A
