trouble with threads


gm gm

I am trying to expand my web crawler to use multiple threads (with
mechanize), and I am having some trouble. It seems that each thread is
not creating a local variable, but rather that they are all sharing the
"index" variable below:

threads = []
mutex = Mutex.new

10.times do |i|
  # each thread starts at offset i and steps by 10 through @will_visit
  threads << Thread.new(i) { |index|
    while index < @will_visit.size
      current_link = @will_visit[index]
      begin
        index += 10
        puts current_link
        page = @agent.get(current_link)
        if page.kind_of?(WWW::Mechanize::Page)
          page.links.each do |link|
            mutex.synchronize do
              if validLink?(link)
                @will_visit.push(link.href)
              end
            end
          end
        end

        puts "Currently visiting page #{index} of #{@will_visit.size}"
      rescue Exception => msg
        puts "Error with " + current_link
        puts msg
        puts msg.backtrace
      end
    end
  }
end

threads.each { |t| t.join }

From what I have read on Google, the 'index' variable should be
independent between threads, but it seems that it is shared. The problem
may also be with the fact that @agent is shared, but I am not sure.
 

MenTaLguY

gm said:
From what I have read on Google, the 'index' variable should be
independent between threads, but it seems that it is shared. The problem
may also be with the fact that @agent is shared, but I am not sure.

'index' is indeed independent here; @agent being shared, however, is
very likely to cause problems. As far as I know, WWW::Mechanize agents
are not safe for use by multiple threads. Each thread will need its
own agent.

-mental
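
To make that suggestion concrete, here is a minimal sketch of the one-agent-per-thread approach. The seed URLs, the thread count, and the every-Nth-link work split are assumptions made up for the example, not code from this thread:

require 'rubygems'
require 'thread'
require 'mechanize'   # older mechanize versions expose the WWW::Mechanize namespace

urls  = ["http://example.com/a", "http://example.com/b"]   # hypothetical seed list
mutex = Mutex.new
found = []

threads = []
4.times do |i|
  threads << Thread.new(i) do |index|
    agent = WWW::Mechanize.new          # one agent per thread, never shared
    while index < urls.size
      page = agent.get(urls[index])
      if page.kind_of?(WWW::Mechanize::Page)
        # only the shared array needs the mutex
        mutex.synchronize { found.concat(page.links.map { |l| l.href }) }
      end
      index += 4                        # each thread takes every 4th URL
    end
  end
end

threads.each { |t| t.join }

With one agent per thread, no two requests ever touch the same agent's internal state.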
 

Judson Lester

MenTaLguY said:
'index' is indeed independent here; @agent being shared, however, is
very likely to cause problems. As far as I know, WWW::Mechanize agents
are not safe for use by multiple threads. Each thread will need its
own agent.

That agrees completely with my direct experience. WWW::Mechanize is,
sadly, not threadsafe - it reuses a buffer for each request, and if
you start a new request before another completes, the new request
will clobber the input buffer. I gladly share my painfully won
experience with you.

Judson
 

gm gm

Judson said:
That agrees completely with my direct experience. WWW::Mechanize is,
sadly, not threadsafe - it reuses a buffer for each request, and if
you start a new request before another completes, the new request
will clobber the input buffer. I gladly share my painfully won
experience with you.

Judson

That would explain the crazy output. I will try to modify it a bit and
see how it works. Thanks!
 

gm gm

Judson said:
That agrees completely with my direct experience. WWW::Mechanize is,
sadly, not threadsafe - it reuses a buffer for each request, and if
you start a new request before another completes, the new request
will clobber the input buffer. I gladly share my painfully won
experience with you.

Judson

That seemed to make a big difference. Also, do you think I need to put
a mutex around the 'out.puts' call (i.e. file output)?
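
As a concrete picture of what that would look like, here is a small sketch of wrapping writes to a shared file handle in a mutex so lines from different threads cannot interleave; the file name and the message text are made up for the example:

require 'thread'

out       = File.open("crawl.log", "w")   # hypothetical shared log file
out_mutex = Mutex.new

threads = []
4.times do |i|
  threads << Thread.new(i) do |index|
    # serialize writes so each line reaches the file intact
    out_mutex.synchronize { out.puts "thread #{index} finished a page" }
  end
end

threads.each { |t| t.join }
out.close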
 

7stud --

gm said:
I am trying to expand my web crawler to use multiple threads (with
mechanize), and I am having some trouble. It seems that each thread is
not creating a local variable, but rather that they are all sharing the
"index" variable


threads = []

10.times do |i|
  threads << Thread.new(i) do |index|
    if index == 0
      index += 100
    end

    puts index
  end
end

threads.each do |t|
  t.join
end

--output:--
100
1
2
3
4
5
6
7
8
9


If the threads shared the index variable, then each line of the output
would be 100. Instead, i is assigned to each thread's own index, and
nothing a thread does to its index variable has any effect on another
thread's index variable.
 
