thread help

Bart Nessux

Howdy,

Below is a script that I'm using to try and count the number of HTTP
servers within a company's private network. There are 65,536 possible
hosts that may have HTTP servers on them. Anyway, I wrote this script
at first w/o threads. It works, but it takes days to run as it probes
one IP at a time... so I thought I'd try to make it threaded so it could
test several dozen IPs at once.

I'm no expert on threading, far from it. Could someone show me how I can
make this work correctly? I want to probe 64 unique IP addresses for HTTP
servers simultaneously, not the same IP addy 64 times (as I'm doing
now). Any tips would be much appreciated.

Bart

import urllib2, socket, threading, time

class trivialthread(threading.Thread):
    def run(self):
        socket.setdefaulttimeout(1)

        hosts = []
        networks = []

        # Add the network 192.168.0 possibility.
        networks.append("192.168.0.")
        n = 0
        while n < 255:
            n = n + 1
            # Generate and add networks 192.168.1-255 to the list of networks.
            networks.append("192.168.%s." %(n))

        for network in networks:
            h = 0
            # Add the n.n.n.0 host possibility
            hosts.append(network+str(h))
            while h < 255:
                h = h + 1
                # Add hosts 1 - 255 to each network.
                hosts.append(network+str(h))

        websites = file('websites.txt', 'w')
        for ip in hosts:
            try:
                f = urllib2.urlopen("http://%s" %ip)
                f.read()
                f.close()
                print>> websites, ip
            except urllib2.URLError:
                print ip
            except socket.timeout:
                print ip, "Timed Out..................."
            except socket.sslerror:
                print ip, "SSL Error..................."
        websites.close()

if __name__ == '__main__':
    threads = []
    for x in range(64):
        thread = trivialthread()
        threads.append(thread)
    for thread in threads:
        thread.start()
    while threading.activeCount() > 0:
        print str(threading.activeCount()), "threads running incl. main"
        time.sleep(1)
 
Aahz

Bart said:
I'm no expert on threading, far from it. Could someone show me how I can
make this work correctly? I want to probe 64 unique IP addresses for HTTP
servers simultaneously, not the same IP addy 64 times (as I'm doing
now). Any tips would be much appreciated.

Create a threading.Thread subclass that takes one IP address and a list
of ports to scan. Start 64 instances of this class, each with a
different IP address.
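
A minimal sketch of this suggestion, written for Python 3 (the code in this thread is Python 2). The names `ProbeThread`, `port_is_open` and `probe_batch` are made up for illustration, and a plain TCP connect stands in for the HTTP request, so treat it as one possible shape rather than the intended implementation.

```python
import socket
import threading


def port_is_open(ip, port, timeout=1.0):
    """Return True if a TCP connection to (ip, port) succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


class ProbeThread(threading.Thread):
    """Thread that checks one IP address against a list of ports."""

    def __init__(self, ip, ports, check=port_is_open):
        super().__init__()
        self.ip = ip
        self.ports = ports
        self.check = check      # injectable so the class can be tried offline
        self.open_ports = []

    def run(self):
        self.open_ports = [p for p in self.ports if self.check(self.ip, p)]


def probe_batch(ips, ports, check=port_is_open):
    """Start one thread per address, wait for all of them, return the findings."""
    threads = [ProbeThread(ip, ports, check) for ip in ips]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {t.ip: t.open_ports for t in threads}
```

Calling `probe_batch` on successive slices of 64 addresses keeps 64 probes in flight at a time, each on a different address.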
 
Scott David Daniels

Aahz said:
Create a threading.Thread subclass that takes one IP address and a list
of ports to scan. Start 64 instances of this class, each with a
different IP address.

An alternative is to create a queue into which you push IP addresses to
contact, and have each thread read addresses off the queue when they are
free to process them. This has the advantage of decoupling the number
of threads from the number of addresses you want to examine.

-Scott David Daniels
 
Aahz

Scott said:
An alternative is to create a queue into which you push IP addresses to
contact, and have each thread read addresses off the queue when they are
free to process them. This has the advantage of decoupling the number
of threads from the number of addresses you want to examine.

Absolutely, but that requires a bit more work for someone who isn't
already familiar with threading.
 
Roger Binns

Scott said:
An alternative is to create a queue into which you push IP addresses to
contact, and have each thread read addresses off the queue when they are
free to process them. This has the advantage of decoupling the number
of threads from the number of addresses you want to examine.

That is also the general best design pattern for doing threading.
Have a Queue.Queue object where you place work items and have
threads pull items off the queue and execute them. You can use
callbacks or another Queue for placing the results on.

The Queue.Queue class has the nice property that it is very
thread safe, and you can do both blocking and timed waits
on it.
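
A rough sketch of that pattern in Python 3, where the module is called `queue` rather than `Queue`. The function `run_pool` and its parameters are invented for the example. Work items go on one queue, results on another, and each worker uses a timed wait so that it exits once the work runs out.

```python
import queue
import threading


def run_pool(items, handler, num_workers=64, poll=0.1):
    """Run handler(item) for every item on num_workers threads; return a dict."""
    work = queue.Queue()
    results = queue.Queue()
    for item in items:
        work.put(item)

    def worker():
        while True:
            try:
                # Timed wait: give up once nothing has arrived for `poll` seconds.
                item = work.get(timeout=poll)
            except queue.Empty:
                return
            try:
                results.put((item, handler(item)))
            except Exception as exc:
                # Report the failure instead of letting it kill the worker.
                results.put((item, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so the results queue is no longer changing.
    collected = {}
    while not results.empty():
        item, outcome = results.get()
        collected[item] = outcome
    return collected
```

The number of workers is independent of the number of items, which is the decoupling Scott describes.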

The other problem to deal with that is particular to Python is
how to stop threads when you want to shutdown or cancel actions.
At the very least you can have a 'shutdown' message you place
on the Queue that when any thread reads, it shuts down.
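
One way the 'shutdown' message could look, again as a Python 3 sketch with invented names (`SHUTDOWN`, `start_workers`, `stop_workers`). It assumes one shutdown message is queued per worker, so each thread reads exactly one and exits.

```python
import queue
import threading

SHUTDOWN = object()  # unique marker that can never be confused with real work


def start_workers(work, results, handler, count):
    """Start `count` threads that process items from `work` until told to stop."""

    def worker():
        while True:
            item = work.get()  # blocking wait
            if item is SHUTDOWN:
                return
            results.put(handler(item))

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    return threads


def stop_workers(work, threads):
    """Queue one shutdown message per worker, then wait for them all to exit."""
    for _ in threads:
        work.put(SHUTDOWN)
    for t in threads:
        t.join()
```

Because the queue is first-in, first-out, everything queued before the shutdown messages is still processed; a worker only stops between items, never in the middle of one.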

Unfortunately Python doesn't allow interrupting a thread,
so any thread doing something will run to completion. You
can check a variable or something in lines of Python
code, but cannot do anything when in C code. For example
if you do some networking stuff and the C code (eg a DNS
lookup followed by a TCP connect) takes 2 minutes, then
you will have to wait at least that long.

In the simplest case you can just make all your threads be
daemon. Python will shutdown when there are no non-daemon
threads left, so you can just exit your main loop and all
will shutdown. However that means the worker threads just
get abruptly stopped in the middle of what they were
doing.
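
A small Python 3 sketch of the daemon approach, with `process_all` as a made-up name. The workers block on the queue forever and are never told to stop; the main thread waits only until every queued item has been handled, and the daemon flag lets the interpreter exit afterwards.

```python
import queue
import threading


def process_all(items, handler, num_workers=8):
    """Handle every item on daemon worker threads and return (item, result) pairs."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()  # blocks forever once the queue is empty
            try:
                outcome = handler(item)
                with lock:
                    results.append((item, outcome))
            finally:
                work.task_done()  # lets work.join() account for this item

    for _ in range(num_workers):
        # daemon=True: these threads do not keep the interpreter alive.
        threading.Thread(target=worker, daemon=True).start()

    for item in items:
        work.put(item)
    work.join()  # returns once task_done() has been called for every item
    return results
```

Anything a daemon worker is doing when the program exits is simply cut off, so this suits work that is safe to abandon.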

(IMHO it would be *really* nice if Python provided a way
to interrupt threads).

Roger
 
Peter Hansen

Roger said:
In the simplest case you can just make all your threads be
daemon. Python will shutdown when there are no non-daemon
threads left, so you can just exit your main loop and all
will shutdown. However that means the worker threads just
get abruptly stopped in the middle of what they were
doing.

(IMHO it would be *really* nice if Python provided a way
to interrupt threads).

Sounds like you can't eat your cake and have it, too. If
you _could_ interrupt threads**, wouldn't that mean "the worker
threads just get abruptly stopped in the middle of what they
were doing"?

-Peter

** There is a way to interrupt threads in Python now, but
it requires an extension routine, or perhaps something with
ctypes. Findable in the archives for this newsgroup/list.
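
The ctypes route is usually done through CPython's `PyThreadState_SetAsyncExc`. A hedged sketch for Python 3 on CPython follows; `async_raise`, `Interrupted` and `demo` are illustrative names, not anything from the archives. The exception is only delivered once the target thread is back executing Python bytecode, so it does not break a thread out of a long blocking C call.

```python
import ctypes
import threading
import time


class Interrupted(Exception):
    """Raised inside a thread that another thread wants to interrupt."""


def async_raise(thread, exc_type):
    """Ask CPython to raise exc_type in `thread` at its next bytecode boundary."""
    tid = ctypes.c_ulong(thread.ident)
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        tid, ctypes.py_object(exc_type))
    if affected == 0:
        raise ValueError("no thread with that id")
    if affected > 1:
        # Should not happen; undo the request, as the C API docs advise.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc affected several threads")


def demo():
    """Interrupt a looping worker and return what it logged."""
    log = []

    def worker():
        try:
            while True:
                time.sleep(0.01)  # short sleeps, so bytecode runs frequently
        except Interrupted:
            log.append("interrupted")  # the thread can clean up here

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(0.05)
    async_raise(t, Interrupted)
    t.join(2)
    return log, t.is_alive()
```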
 
Roger Binns

Peter said:
Sounds like you can't eat your cake and have it, too. If
you _could_ interrupt threads**, wouldn't that mean "the worker
threads just get abruptly stopped in the middle of what they
were doing"?

I meant in the same way that you can in Java. In that case an
InterruptedException is thrown which the thread can catch
and do whatever it wants with.

As an example at the moment, socket.accept is a blocking
call and if a thread is executing that there is no way
of stopping it.

This would make shutdown and reconfigurations possible.
For example you could interrupt all relevant threads
and they could check a variable to see if they should
shutdown, bind to a different port, abandon the current
work item etc.
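
Without real interruption, a common workaround is to give the listening socket a timeout so the thread wakes up regularly and checks a flag. This Python 3 sketch assumes a `threading.Event` as the flag; `serve` and its parameters are invented for the example.

```python
import socket
import threading


def serve(listener, stop, handled, poll=0.2):
    """Accept connections until `stop` is set, waking every `poll` seconds."""
    listener.settimeout(poll)
    while not stop.is_set():
        try:
            conn, addr = listener.accept()
        except socket.timeout:
            continue  # nothing arrived; go round again and re-check the flag
        with conn:
            handled.append(addr)  # stand-in for real request handling
```

The cost is that shutdown can lag by up to `poll` seconds, and the thread still cannot be stopped while it is busy inside a single long call.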

Roger
 
Peter Hansen

Roger said:
I meant in the same way that you can in Java. In that case an
InterruptedException is thrown which the thread can catch
and do whatever it wants with.

I didn't think things worked quite that way in Java. For
example, I thought InterruptedException was seen by a thread
only if it had actually been asleep at the time it was sent.

I also didn't know it would actually terminate certain
blocking calls, such as in socket stuff.

Oh well, it's been a while...

-Peter
 
Bart Nessux

Scott said:
An alternative is to create a queue into which you push IP addresses to
contact, and have each thread read addresses off the queue when they are
free to process them. This has the advantage of decoupling the number
of threads from the number of addresses you want to examine.

-Scott David Daniels

I like this idea. I read up on the queue and threading module at
python.org and a few other sites around the Web and came up with this,
however, it doesn't work. I get these errors when it runs:

Exception in thread Thread-149:
Traceback (most recent call last):
File "/usr/lib/python2.3/threading.py", line 434, in __bootstrap
self.run()
File "/usr/lib/python2.3/threading.py", line 414, in run
self.__target(*self.__args, **self.__kwargs)
File "www_reads_threaded_1.py", line 49, in sub_thread_proc
f = urllib2.urlopen(url).read()
File "/usr/lib/python2.3/urllib2.py", line 129, in urlopen
return _opener.open(url, data)
File "/usr/lib/python2.3/urllib2.py", line 324, in open
type_ = req.get_type()
AttributeError: 'NoneType' object has no attribute 'get_type'

The problem I have is this: I know too little about thread programming.
If anyone thinks the code I have below could be made to work for my
tasks (probe 65,000 IPs for HTTP servers using threads to speed things
up), then please *show* me how I might change it in order for it to work.

Thanks again,
Bart

networks = []
hosts = []
urls = []
#socket.setdefaulttimeout(30)
max_threads = 2
http_timeout = 30
start_time = time.time()

# Add the network 192.168.0 possibility.
networks.append("192.168.0.")

# Generate and add networks 192.168.1-255 to the list of networks.
n = 0
while n < 255:
    n = n + 1
    networks.append("192.168.%s." %(n))

# Generate and add hosts 1-255 to each network
for network in networks:
    h = 0
    # Add the n.n.n.0 host possibility
    hosts.append(network+str(h))
    while h < 255:
        h = h + 1
        hosts.append(network+str(h))

for ip in hosts:
    ip = "http://" + ip
    urls.append(ip)

urls = dict(zip(urls,urls))
# print urls

# Create a queue of urls to feed the threads
url_queue = Queue.Queue()
for url in urls:
    url_queue.put(url)
    # print url

def test_HTTP(url_queue):
    def sub_thread_proc(url, result):
        # try:
        f = urllib2.urlopen(url).read()
        # except Exception:
        # print "Exception"
        # else:
        result.append(url)
    while 1:
        try:
            url = url_queue.get(0)
        except Queue.Empty:
            return
        result = []
        sub_thread = threading.Thread(target=sub_thread_proc,
                                      args=(url,result))
        sub_thread.setDaemon(True)
        sub_thread.start()
        sub_thread.join(http_timeout)
        print result

test_HTTP(urls)
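
A note on the traceback above, offered as a likely reading rather than a confirmed diagnosis: the last line calls `test_HTTP(urls)`, and `urls` is the dict, not `url_queue`. On a dict, `.get(0)` looks up the key `0`, finds nothing, and returns `None` without ever raising `Queue.Empty`, so the loop keeps starting threads that call `urlopen(None)`. The sketch below shows the difference between the two objects. It is Python 3, where the module is named `queue`, and `drain` with its `limit` is made up for the demonstration.

```python
import queue

urls = ["http://192.168.0.%d" % h for h in range(3)]
url_dict = dict(zip(urls, urls))   # what test_HTTP() was actually given
url_queue = queue.Queue()          # what it was presumably meant to be given
for url in urls:
    url_queue.put(url)


def drain(source, limit=5):
    """Call source.get(0) repeatedly, as the while loop does, at most limit times."""
    seen = []
    for _ in range(limit):
        try:
            seen.append(source.get(0))
        except queue.Empty:
            break
    return seen


print(drain(url_dict))    # five None values: a dict never signals "empty"
print(drain(url_queue))   # the three URLs, then queue.Empty ends the loop
```

Passing `url_queue` in the final call would be the first thing to try.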
 
Eddie Corns

Bart said:
The problem I have is this: I know too little about thread programming.
If anyone thinks the code I have below could be made to work for my
tasks (probe 65,000 IPs for HTTP servers using threads to speed things
up), then please *show* me how I might change it in order for it to work.

I haven't been following this thread but if I was doing this I would want to
use asynchronous programming. It would finally force me to get to grips with
twisted.

Eddie
 
Bart Nessux

One word Eddie... WOW!

This async stuff is fabulous! It works and it's dead easy for my
application. There is no cpu limit with what I'm doing... only I/O
problems. At your suggestion, I looked at twisted, and then just used
the standard python asyncore module because it looked so darn easy, and
as it turned out, it was.

Thanks a million for the advice. I was looking in the *wrong* direction.
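
The asyncore code itself is not shown in the thread. asyncore has since been removed from the standard library (as of Python 3.12), so the sketch below uses `asyncio` instead: a different module built on the same single-threaded, event-driven idea. The names `http_port_open` and `scan`, the limit of 64 and the one-second timeout are assumptions for illustration, and a bare TCP connect to port 80 stands in for a real HTTP request.

```python
import asyncio


async def http_port_open(host, port=80, timeout=1.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    return True


async def scan(hosts, probe=http_port_open, limit=64):
    """Probe every host with at most `limit` probes in flight; return the hits."""
    gate = asyncio.Semaphore(limit)

    async def check(host):
        async with gate:  # wait for a free slot before probing
            return host, await probe(host)

    pairs = await asyncio.gather(*(check(h) for h in hosts))
    return [host for host, ok in pairs if ok]
```

A run over one of the networks might look like `asyncio.run(scan(["192.168.0.%d" % h for h in range(256)]))`. Since the job is I/O-bound, a single thread can keep many connections pending at once.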
 
Roger Binns

Peter said:
I didn't think things worked quite that way in Java. For
example, I thought InterruptedException was seen by a thread
only if it had actually been asleep at the time it was sent.

I also didn't know it would actually terminate certain
blocking calls, such as in socket stuff.

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Thread.html#interrupt()

As is the Java way, they have different types of interrupted
exceptions and sockets etc would need to be InterruptibleChannels.
They also use checked exceptions which makes life a lot harder.

More on Java best practises for thread interruption:

http://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html

I would just settle for a nice clean mechanism whereby you can
call Thread.interrupt() and have an InterruptException in that
thread (which ultimately terminates the thread if it isn't
handled anywhere).

Some threads won't be interruptible because they are deep in
extension libraries. Perhaps that can be returned to the
caller of Thread.interrupt().

Roger
 
