Threading Loops


Bobby S.

I understand how to run a function in a thread, but I don't understand how to
apply threading outside of that. I am trying to make a picture renamer and
I want to thread the renaming part to speed it up.


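pic_names = Dir["E:/**/*.{JPG,jpg}"]  # no stray "{" before E:, and no space inside the braces
pic_numb = 1
batch_name = "test"

#Want to thread this part.
pic_names.each do |name|
  print '.'
  # new_name has no directory part, so each file is moved into the current working directory
  new_name = batch_name + pic_numb.to_s + '.jpg'
  File.rename name, new_name
  pic_numb += 1
end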

Thanks in advance.
 

Roy Zuo

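It is very convenient to use Thread in Ruby, but maybe you should first get rid
of the pic_numb self-increment statement: the threads run in no particular
order, so a single counter read and incremented by all of them would not give
each file a predictable number. Use each element's index instead: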

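pic_names.collect.with_index do |name, pic_numb|
  # pic_numb is the element's index (starting at 0), a block parameter,
  # so each thread captures its own value
  Thread.new do
    print '.'
    new_name = batch_name + pic_numb.to_s + '.jpg'
    File.rename name, new_name
  end
end.each { |thread| thread.join }  # wait for every rename to finish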
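A self-contained version of the same thread-per-file idea, runnable against scratch files. It is a sketch under assumptions: `threaded_rename` is a hypothetical helper name, each file is kept in its own directory via `File.join(File.dirname(name), ...)` (the loop above passes a bare name, which moves the file into the current working directory), and numbering starts at 1 like the original `pic_numb`. Each thread returns its new path, and `Thread#value` joins the thread before handing that result back.

```ruby
require 'tmpdir'

# Hypothetical helper: renames each path to "<batch_name><n>.jpg" in the
# file's own directory, one thread per file. The number comes from the
# element's index (plus 1), so no counter is shared between threads.
def threaded_rename(paths, batch_name)
  threads = paths.collect.with_index do |name, i|
    Thread.new do
      new_name = File.join(File.dirname(name), "#{batch_name}#{i + 1}.jpg")
      File.rename(name, new_name)
      new_name # becomes the thread's value
    end
  end
  threads.map(&:value) # Thread#value waits for the thread, then returns its result
end

# Demo on throwaway files in a temporary directory.
Dir.mktmpdir do |dir|
  %w[IMG_1.JPG IMG_2.jpg IMG_3.JPG].each { |f| File.write(File.join(dir, f), '') }
  renamed = threaded_rename(Dir[File.join(dir, '*.{JPG,jpg}')].sort, 'test')
  p renamed.map { |path| File.basename(path) } # => ["test1.jpg", "test2.jpg", "test3.jpg"]
end
```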

Roy


Bobby S.

Sorry for double posting; I just wanted to say I understand it now. I used
irb till I understood what was going on.
 

7stud --

Hi,

I don't think threading will speed up your execution time in this case.
Threading is an illusion: processing switches rapidly between threads;
threads don't actually execute at the same time. So unless there is
some waiting going on somewhere in your code, and switching to another
thread will make use of that downtime, there won't be any improvement in
execution speed.
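To illustrate the point about waiting, here is a small sketch in which `sleep` stands in for some external wait (a network call, a slow device). It is a simulation under that assumption, not a measurement of `File.rename`: run one after another the five waits add up, while in threads they overlap, because a sleeping thread lets the others run. The helper name `timed` is hypothetical.

```ruby
# Hypothetical helper: returns the wall-clock seconds the block takes.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# sleep stands in for waiting on something outside Ruby (an assumption).
sequential = timed { 5.times { sleep 0.2 } }
threaded   = timed { Array.new(5) { Thread.new { sleep 0.2 } }.each(&:join) }

printf("sequential: %.2fs  threaded: %.2fs\n", sequential, threaded)
```

The sequential run should take roughly 1 s and the threaded one roughly 0.2 s; exact figures vary by machine.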

Even if you have a computer with multiple processors, it won't help:

===
...if your machine has more than one processor, Ruby threads won't take
advantage of that fact - because they run in one process, and in a
single native thread, they are constrained to run on one processor at a
time.

http://rubylearning.com/satishtalim/ruby_threads.html
===
 

7stud --

Bobby S. wrote in post #997304:
The threading made no difference in speed at all, not what I expected.

Ok, that is what I expected. To truly do two things at once in ruby,
you have to create multiple processes. So you need to read up on
fork().
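A minimal sketch of the fork() approach, assuming a Unix-like OS: `Process.fork` raises `NotImplementedError` on Windows, so it would not run as-is on an `E:` drive setup like the one in the question. `forked_rename` is a hypothetical helper; it splits the list into `workers` slices and renames each slice in its own child process. Numbers are assigned before forking, so every child already knows its files' numbers without any shared counter. Whether this is actually faster for renames is a separate question, since the work is mostly file-system I/O rather than CPU.

```ruby
require 'tmpdir'

# Hypothetical helper: renames +paths+ to "<batch_name><n>.jpg" in child
# processes. Needs Process.fork (not available on Windows).
# Returns one success flag per child process.
def forked_rename(paths, batch_name, workers = 2)
  return [] if paths.empty?

  # Assign numbers up front so each child knows its files' numbers.
  numbered = paths.each_with_index.map { |path, i| [path, i + 1] }
  slice_size = (numbered.size / workers.to_f).ceil
  pids = numbered.each_slice(slice_size).map do |slice|
    Process.fork do
      status = 0
      begin
        slice.each do |path, n|
          File.rename(path, File.join(File.dirname(path), "#{batch_name}#{n}.jpg"))
        end
      rescue StandardError => e
        warn e.message
        status = 1
      end
      exit!(status) # exit! skips at_exit handlers inherited from the parent
    end
  end
  pids.map { |pid| Process.wait2(pid).last.success? }
end

# Demo on throwaway files in a temporary directory.
Dir.mktmpdir do |dir|
  4.times { |k| File.write(File.join(dir, "IMG_#{k}.JPG"), '') }
  p forked_rename(Dir[File.join(dir, '*.JPG')].sort, 'test')
end
```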
 

Kevin Bullock

I can't imagine that even forking would help you here; your OS will still
be performing only one I/O call at a time. Unless you've got an OS
that's doing some funny optimization like aggregating filesystem
metadata changes, you won't see any speedup. And that would be a strange
thing to optimize for.

Multiprocessing generally only helps when your application is CPU bound,
not I/O bound.
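One way to check this on your own machine is to time both strategies on scratch files. This is a sketch under assumptions: `time_renames` and `rename_one` are hypothetical helpers, and the files live in whatever directory `Dir.mktmpdir` picks, which may be on a different drive or file system than your photos, so the numbers may not carry over. No expected timings are given because they depend on the OS, the file system, and the Ruby implementation.

```ruby
require 'benchmark'
require 'tmpdir'

# Hypothetical helper: renames one scratch file to "test<n>.jpg" in its own directory.
def rename_one(path, i)
  File.rename(path, File.join(File.dirname(path), "test#{i + 1}.jpg"))
end

# Hypothetical helper: creates +count+ empty .JPG files in a fresh temp
# directory, times the block that renames them, and returns elapsed seconds.
def time_renames(count)
  Dir.mktmpdir do |dir|
    paths = Array.new(count) do |i|
      File.join(dir, "IMG_#{i}.JPG").tap { |path| File.write(path, '') }
    end
    Benchmark.realtime { yield paths }
  end
end

sequential = time_renames(500) do |paths|
  paths.each_with_index { |path, i| rename_one(path, i) }
end

threaded = time_renames(500) do |paths|
  paths.each_with_index.map { |path, i| Thread.new { rename_one(path, i) } }.each(&:join)
end

printf("sequential: %.3fs  threaded: %.3fs\n", sequential, threaded)
```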

 

7stud --

Bobby S. wrote in post #997293:
Thank you so much, that helped a lot and now I understand threading
better.

Could you explain the first part?
pic_names.collect.with_index do |name, pic_numb|

I understand how the threading is working but not the loop. I read that
.collect returns all elements in an array. But what about the
with_index, and adding pic_numb to the block parameters?

In ruby 1.9, if you call collect() without supplying a block, you get
what's called an 'enumerator' back. It's an object of the class
Enumerator, which has a method called with_index(). with_index() works
just like each()--but it sends a second argument to the block: the index
of the element.
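A quick check of that in plain Ruby (the file names are made up for the example):

```ruby
names = ['a.JPG', 'b.JPG', 'c.JPG']

enum = names.collect # no block given: returns an Enumerator
p enum.class         # => Enumerator

# with_index yields each element plus its index (starting at 0);
# the array returned holds the block's results.
p enum.with_index { |name, i| "#{i}:#{name}" } # => ["0:a.JPG", "1:b.JPG", "2:c.JPG"]
```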

I don't like that collect() loop at all. collect() returns an array
containing elements of the original array for which the block evaluates
to true. But the only thing inside the block is Thread.new(), which
always returns something that evaluates to true, so all elements of the
original array are selected by collect() and returned in a new array,
which is then discarded because the result of collect() isn't assigned
to a variable. So, why not just use each(), which does not return a new
array:

arr = ['a', 'b', 'c']
arr.each.with_index do |el, index|
p [el, index]
end

--output:--
["a", 0]
["b", 1]
["c", 2]
 

Bobby S.

Yeah, I just went to irb and started playing with .collect and with_index
till I figured out what they did, since I couldn't find any information on
with_index that I could understand.

I didn't understand threading much, but I still wanted to know how to do
this for future reference, so it helps even though it didn't speed things
up, if that makes sense. Also, your explanation helped a lot.
 

Christopher Dicely

Hi,

I don't think threading will speed up your execution time in this case.
Threading is an illusion: processing switches rapidly between threads;
threads don't actually execute at the same time.

That's only true if you are using Ruby 1.8 with its green threads, or Ruby 1.9
without calling native routines that release the global interpreter lock
(GIL). If you are using JRuby, threads are native threads with no GIL, so
they do run concurrently, and you should see a speedup on tasks that are
CPU-bound and efficiently parallelizable.

===
...if your machine has more than one processor, Ruby threads won't take
advantage of that fact - because they run in one process, and in a
single native thread, they are constrained to run on one processor at a
time.

http://rubylearning.com/satishtalim/ruby_threads.html
===

This is true in MRI 1.8, which uses "green" threads implemented in the
runtime library with a single native thread backing them. It's not
true, from what I understand, in older versions of MacRuby, Ruby 1.9,
or the current versions of Rubinius (in all three of these, threads are
native threads, but concurrency is limited by a global interpreter lock),
nor in current MacRuby or JRuby (threads are native threads with no global
lock). I haven't seen anything about Maglev's threading model.

While the Thread API is part of the Ruby language, threading
implementations vary between Ruby implementations.
 

Robert Klemme

Bobby S. wrote in post #997293:

In ruby 1.9, if you call collect() without supplying a block, you get
what's called an 'enumerator' back. It's an object of the class
Enumerator, which has a method called with_index(). with_index() works
just like each()--but it sends a second argument to the block: the index
of the element.

I don't like that collect() loop at all. collect() returns an array
containing elements of the original array for which the block evaluates
to true.

It seems you are confusing #collect with #select here.

irb(main):006:0> a=[true,false,nil,1,2,3]
=> [true, false, nil, 1, 2, 3]
irb(main):007:0> a.collect {|x| x}
=> [true, false, nil, 1, 2, 3]
irb(main):008:0> a.select {|x| x}
=> [true, 1, 2, 3]
irb(main):009:0> a.collect {false}
=> [false, false, false, false, false, false]
irb(main):010:0> a.select {false}
=> []
irb(main):011:0> a.select {true}
=> [true, false, nil, 1, 2, 3]

#collect does the same as #map: it creates a new Array containing the
result of block evaluation on each element in the original Enumerable.

irb(main):012:0> a.collect {|x| x.inspect}
=> ["true", "false", "nil", "1", "2", "3"]

But the only thing inside the block is Thread.new(), which
always returns something that evaluates to true, so all elements of the
original array are selected by collect() and returned in a new array,
which is then discarded because the result of collect() isn't assigned
to a variable.

That's not true either:
pic_names.collect.with_index do |name, pic_numb|
  Thread.new do
    print '.'
    new_name = batch_name + pic_numb.to_s + '.jpg'
    File.rename name, new_name
  end
end.each { |thread| thread.join }

This creates a thread for each input and then joins on all of them.
It's perfectly appropriate and even elegant to use #collect here.

irb(main):013:0> a.collect.with_index {|x,y| [x,y]}
=> [[true, 0], [false, 1], [nil, 2], [1, 3], [2, 4], [3, 5]]

With Threads:

irb(main):014:0> a.collect.with_index {|x,y| Thread.new {}}
=> [#<Thread:0x106a8278 run>, #<Thread:0x106a81d0 run>,
#<Thread:0x106a8160 run>, #<Thread:0x106a80f0 run>, #<Thread:0x106a8048
run>, #<Thread:0x106a7fbc run>]
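The same point as a small standalone script: `each.with_index` hands back the receiver, so the `Thread` objects created in the block are not kept anywhere, whereas `collect.with_index` returns them so they can be joined. The variable names are illustrative; `Thread#value` is used because it joins the thread and returns its block's result.

```ruby
items = %w[a b c]

# each.with_index returns the receiver, so the threads made in the block are lost.
result = items.each.with_index { |item, i| Thread.new { "#{item}#{i}" } }
p result # => ["a", "b", "c"]

# collect.with_index (same as map.with_index) returns the block results: the threads.
threads = items.collect.with_index { |item, i| Thread.new { "#{item}#{i}" } }
p threads.map(&:value) # => ["a0", "b1", "c2"]
```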

Kind regards

robert
 

Robert Klemme

Bobby S. wrote in post #997304:

Ok, that is what I expected. To truly do two things at once in ruby,
you have to create multiple processes. So you need to read up on
fork().

It's questionable whether that will yield any benefit, since it's likely
not the CPU that's making it slow but rather I/O. Depending on the file
system, there might be some locking on the directory involved when
renaming. In any case, a file system needs to take measures not to leave
disk contents inconsistent when there are multiple concurrent writes to a
directory. That's where the bottleneck likely lies.

I don't think that doing the rename concurrently will give any
improvements. I would just do it sequentially or at least not with so
many threads (at most 2). I don't think it's worth going through that
hassle.
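If you do want to try a small number of threads, a fixed pool that pulls jobs from a queue avoids creating one thread per file. A sketch, assuming two workers as suggested above; `pooled_rename` is a hypothetical helper, and `:done` is an arbitrary stop marker telling each worker to finish. `Queue` is thread-safe (on Ruby 1.9 and earlier it also needs `require 'thread'`), and each job carries its own number, so nothing shared is incremented.

```ruby
require 'tmpdir'

# Hypothetical helper: renames +paths+ to "<batch_name><n>.jpg" in each file's
# own directory using a fixed pool of +workers+ threads fed from a Queue.
def pooled_rename(paths, batch_name, workers = 2)
  jobs = Queue.new
  paths.each_with_index { |path, i| jobs << [path, i + 1] } # number assigned up front
  workers.times { jobs << :done }                           # one stop marker per worker

  pool = Array.new(workers) do
    Thread.new do
      while (job = jobs.pop) != :done # Queue#pop blocks until an item is available
        path, n = job
        File.rename(path, File.join(File.dirname(path), "#{batch_name}#{n}.jpg"))
      end
    end
  end
  pool.each(&:join)
end

# Demo on throwaway files in a temporary directory.
Dir.mktmpdir do |dir|
  5.times { |k| File.write(File.join(dir, "IMG_#{k}.JPG"), '') }
  pooled_rename(Dir[File.join(dir, '*.JPG')].sort, 'test')
  p Dir[File.join(dir, '*.jpg')].map { |path| File.basename(path) }.sort
  # => ["test1.jpg", "test2.jpg", "test3.jpg", "test4.jpg", "test5.jpg"]
end
```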

Kind regards

robert
 

Brian Candler

Renaming is almost certainly disk-limited, so doing the renames in parallel
is almost certainly not going to be faster. In any case, Ruby threads don't
run Ruby code in parallel: only one executes at a time (in MRI, anyway;
other implementations like JRuby are different).
 
