I am trying to expand my web crawler to use multiple threads (with
mechanize), and I'm having some trouble. It seems that each thread is
not creating a local variable, but rather that they are all sharing the
"index" variable below:
threads = []
mutex = Mutex.new
10.times do |i|
  threads << Thread.new(i) { |index|
    while index < @will_visit.size
      current_link = @will_visit[index]
      begin
        index += 10
        puts current_link
        page = @agent.get(current_link)
        if page.kind_of?(WWW::Mechanize::Page)
          page.links.each do |link|
            mutex.synchronize do
              if validLink?(link)
                @will_visit.push(link.href)
              end
            end
          end
        end
        puts "Currently visiting page #{index} of #{@will_visit.size}"
      rescue Exception => msg
        puts "Error with #{current_link}"
        puts msg
        puts msg.backtrace
      end
    end
  }
end
threads.each { |t| t.join }
From what I have read on Google, the 'index' variable should be
independent between threads, but it seems that it is shared. The problem
may also be with the fact that @agent is shared, but I am not sure.
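In case @agent really is the issue, I was considering giving each thread its own agent instead of sharing one. Here is a sketch of that pattern with a hypothetical FakeAgent standing in for WWW::Mechanize.new (so it runs without the gem):

```ruby
# Stand-in for WWW::Mechanize.new; the point is the per-thread
# instance pattern, not the class itself.
class FakeAgent
  def get(url)
    "page for #{url}"
  end
end

threads = 3.times.map do |i|
  Thread.new(i) do |index|
    agent = FakeAgent.new   # constructed inside the block: one per thread, nothing shared
    agent.get("http://example.com/#{index}")
  end
end

# Thread#value joins the thread and returns its block's result
pages = threads.map(&:value)
puts pages.inspect
```

With the real gem, each thread would do `agent = WWW::Mechanize.new` at the top of its block instead.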
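To check my understanding, here is a stripped-down version of the pattern I thought would give each thread its own copy (assuming a stock Ruby interpreter): values passed as arguments to Thread.new are bound to the block parameters, which are local to each thread.

```ruby
threads = 5.times.map do |i|
  # i is handed to Thread.new as an argument, so each thread's `index`
  # is an independent local copy, not a shared binding.
  Thread.new(i) do |index|
    index += 100   # mutating this copy cannot affect the other threads
    index
  end
end

# Thread#value joins the thread, then returns the block's result
puts threads.map(&:value).inspect  # => [100, 101, 102, 103, 104]
```

Is there something about mechanize or the while loop above that breaks this, or am I misreading the behavior?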