Python proxy checker: change to threaded version


elca

Hello all,

I have a Python proxy checker.

To speed up the check, I decided to change it to a multithreaded version.

The thread module is new to me. I have tried several times to convert the
script to a threaded version and have looked for a lot of information, but
it is not so easy for a novice Python programmer.

If anyone can help, I would really appreciate it!

Thanks in advance!


import urllib2, socket

socket.setdefaulttimeout(180)
# read the list of proxy IPs into proxyList, one "host:port" per line
# (read() alone returns one string, so the loop would get single characters)
proxyList = [line.strip() for line in open('listproxy.txt') if line.strip()]

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        req = urllib2.Request('http://www.yahoo.com')  # <--- check whether proxy is alive
        # open through this opener directly; install_opener() would replace
        # the global opener, which is unsafe once several threads run at once
        sock = opener.open(req)
    except urllib2.HTTPError, e:
        print 'Error code: ', e.code
        return e.code
    except Exception, detail:
        print "ERROR:", detail
        return 1
    return 0

for item in proxyList:
    if is_bad_proxy(item):
        print "Bad Proxy", item
    else:
        print item, "is working"

r0g

elca said:
<snipped original post>



The trick to threads is to create a subclass of threading.Thread, define
the 'run' function and call the 'start()' method. I find threading quite
generally useful so I created this simple generic function for running
things in threads...


def run_in_thread( func, func_args=[], callback=None, callback_args=[] ):
    import threading
    class MyThread ( threading.Thread ):
        def run ( self ):

            # Call function
            if function_args:
                result = function(*function_args)
            else:
                result = function()

            # Call callback
            if callback:
                if callback_args:
                    callback(result, *callback_args)
                else:
                    callback(result)

    MyThread().start()
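For reference, here is the bare pattern described above (subclass threading.Thread, override run(), call start()) without the generic wrapper, sketched in Python 3 rather than the Python 2 used in this thread. The Worker class and its result attribute are illustrative names, not from the original post; join() is added so the caller can wait for the thread and then read the result.

```python
import threading

class Worker(threading.Thread):
    """Runs one call of `func` in its own thread and keeps the result."""

    def __init__(self, func, *args):
        super().__init__()
        self.func = func
        self.args = args
        self.result = None

    def run(self):
        # run() executes in the new thread once start() is called
        self.result = self.func(*self.args)

w = Worker(pow, 2, 10)
w.start()        # spawns the thread, which calls run()
w.join()         # wait for the thread to finish
print(w.result)  # 1024
```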


You need to pass it a test function (+args) and, if you want to get a
result back from each thread, you also need to provide a callback
function (+args). The first parameter of the callback function receives
the result of the test function, so your callback would look something
like this...

def cb( result, item ):
    if result:
        print "Bad Proxy", item
    else:
        print item, "is working"


And your calling loop would be something like this...

for item in proxyList:
    run_in_thread( is_bad_proxy, func_args=[ item ], callback=cb,
                   callback_args=[ item ] )


Also, you might want to limit the number of concurrent threads so as not
to overload your system. One quick and dirty way to do this is to put
this at the top of your calling loop...

import threading, time
while threading.activeCount() > 9: time.sleep(1)

Note: this is far from an exact method, but it works well enough for
one-off scripting use.
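If exact limits matter, a thread pool is the usual alternative. The following is a minimal Python 3 sketch (this thread's code is Python 2) that swaps the hand-rolled limit for concurrent.futures.ThreadPoolExecutor, which never runs more than max_workers calls at once. check_all, its checker parameter and the 30-second timeout are assumptions made for illustration; is_bad_proxy is a port of the original function that opens the URL through its own opener instead of installing a global one.

```python
import concurrent.futures
import urllib.error
import urllib.request

def is_bad_proxy(pip, url="http://www.yahoo.com", timeout=30):
    """Return 0 if `url` loads through proxy `pip`, else an error code or 1."""
    handler = urllib.request.ProxyHandler({"http": pip})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    try:
        # Use this opener directly: install_opener() would replace the
        # process-wide opener that every worker thread shares.
        with opener.open(url, timeout=timeout):
            pass
    except urllib.error.HTTPError as e:
        return e.code
    except Exception:
        return 1
    return 0

def check_all(proxies, checker=is_bad_proxy, max_workers=10):
    """Run `checker` on every proxy using at most `max_workers` threads."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `proxies`
        return dict(zip(proxies, pool.map(checker, proxies)))
```

With a list of "host:port" strings, check_all(proxies) returns a dict mapping each proxy to 0 (working) or an error code, so the printing loop from the original script can run over its items. The checker parameter also makes the function testable without network access.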

Hope this helps.


Suggestions from hardcore pythonistas on how to make my run_in_thread
function more elegant are quite welcome also :)


Roger Heathcote

elca

r0g said:
<snipped>

Hello :)
Thanks for your reply!
I will test it now and comment soon.
Thanks again

Terry Reedy

r0g said:
The trick to threads is to create a subclass of threading.Thread, define
the 'run' function and call the 'start()' method. I find threading quite
generally useful so I created this simple generic function for running
things in threads...

Great idea. Thanks for posting this.
def run_in_thread( func, func_args=[], callback=None, callback_args=[] ):
    import threading
    class MyThread ( threading.Thread ):
        def run ( self ):

            # Call function
            if function_args:
                result = function(*function_args)
            else:
                result = function()

The check is not necessary: by design, f(*[]) == f().
Names do not match param names ;=)

            # Call callback
            if callback:
                if callback_args:
                    callback(result, *callback_args)
                else:
                    callback(result)

Ditto. g(x,*[]) == g(x)
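Both of Terry's identities are easy to confirm in an interactive session; f and g below are throwaway example functions, written here in Python 3.

```python
def f(*args):
    return args

def g(x, *args):
    return (x, args)

# Unpacking an empty list passes no extra arguments at all,
# so f(*[]) is the same call as f(), and g(x, *[]) the same as g(x).
print(f(*[]) == f())      # True
print(g(1, *[]) == g(1))  # True
```

This is why the `if function_args:` and `if callback_args:` branches can be dropped.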

def run(self):
    result = func(*func_args) # matching run_in_thread param names
    callback(result, *callback_args)

MyThread().start()

This is one of the best uses I have seen for a nested class definition.

Suggestions from hardcore pythonistas on how to make my run_in_thread
function more elegant are quite welcome also :)

I shortened it, at least.

Terry Jan Reedy

r0g

Terry said:
r0g said:
The trick to threads is to create a subclass of threading.Thread, define
the 'run' function and call the 'start()' method. I find threading quite
generally useful so I created this simple generic function for running
things in threads...

Great idea. Thanks for posting this.
def run_in_thread( func, func_args=[], callback=None, callback_args=[] ):
    import threading
    class MyThread ( threading.Thread ):
        def run ( self ):

            # Call function
            if function_args:
                result = function(*function_args)
            else:
                result = function()

The check is not necessary: by design, f(*[]) == f().


Excellent, thanks Terry :)

I knew it would be simpler than I thought! I've been writing a lot of
PHP and AS3 recently and it's easy to forget how python often just works
without needing the same level of hand holding, error checking &
defensive coding as other languages!


Names do not match param names ;=)


Oops yeah! Thought I'd refactor my painfully verbose variable names
before posting in a 70 char wide medium but it looks like I missed one!
*blush*

Roger.

r0g

Terry said:
r0g said:
The trick to threads is to create a subclass of threading.Thread, define
the 'run' function and call the 'start()' method. I find threading quite
generally useful so I created this simple generic function for running
things in threads...

Great idea. Thanks for posting this.
def run_in_thread( func, func_args=[], callback=None, callback_args=[] ):
<snipped cumbersome older version>


Okay, so here's the more concise version for posterity / future googlers...

import threading

def run_in_thread( func, func_args=[], callback=None, callback_args=[] ):
    class MyThread ( threading.Thread ):
        def run ( self ):
            result = func(*func_args)
            if callback:
                callback(result, *callback_args)
    MyThread().start()
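A usage sketch of this version, ported to Python 3. One hypothetical change is made: run_in_thread returns the thread object so that the caller can join() every thread before reading the results. The built-in len stands in for a real check such as is_bad_proxy, and cb collects results instead of printing them.

```python
import threading

def run_in_thread(func, func_args=[], callback=None, callback_args=[]):
    class MyThread(threading.Thread):
        def run(self):
            result = func(*func_args)
            if callback:
                callback(result, *callback_args)
    t = MyThread()
    t.start()
    return t  # hypothetical addition: lets the caller join() the thread

results = []

def cb(result, item):
    # list.append is thread-safe in CPython, so worker threads can share this list
    results.append((item, result))

threads = [run_in_thread(len, func_args=[s], callback=cb, callback_args=[s])
           for s in ["a", "bb", "ccc"]]
for t in threads:
    t.join()
print(sorted(results))  # [('a', 1), ('bb', 2), ('ccc', 3)]
```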


Roger.

r0g

Rhodri said:
r0g said:
The trick to threads is to create a subclass of threading.Thread, define
the 'run' function and call the 'start()' method. I find threading quite
generally useful so I created this simple generic function for running
things in threads...

Great idea. Thanks for posting this.
def run_in_thread( func, func_args=[], callback=None, callback_args=[] ):

I'm mighty wary of having mutable defaults for parameters. They make for
the most annoying errors. Even though they're actually safe here, I'd
still write:

def run_in_thread(func, func_args=(), callback=None, callback_args=()):

out of sheer paranoia.
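The annoying errors Rhodri has in mind come from the fact that a default value is evaluated once, when the def statement runs, and then shared by every call. run_in_thread only reads its default lists, which is why they are safe there; the hypothetical append_to below modifies its default and shows the problem, with the usual None-sentinel fix alongside (Python 3).

```python
def append_to(item, target=[]):
    # The default list is created once, when the def statement runs,
    # and is shared by every call that does not pass `target`.
    target.append(item)
    return target

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2]  (same list as the first call)

def append_to_safe(item, target=None):
    # None as a sentinel: build a fresh list on every call
    if target is None:
        target = []
    target.append(item)
    return target

print(append_to_safe(1))  # [1]
print(append_to_safe(2))  # [2]
```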



Excellent point, thanks :)

I'm starting to suspect this is the highest-quality group in all of Usenet!


Roger.

Lie Ryan

Neat, but I think you mean

if callback is not None:
    callback(result, *callback_args)

for that last line.
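For plain functions the two tests behave identically, because a function object is always true. They differ only when the callback is an object that defines __bool__ or __len__. The hypothetical Collector below is a callable list, which is false while it is empty, so `if callback:` silently skips it on the first call (Python 3 sketch).

```python
class Collector(list):
    """A callable list: calling it records the result. Empty means false."""
    def __call__(self, result, *args):
        self.append(result)

def notify_truthy(result, callback=None):
    if callback:              # also skips callables that happen to be false
        callback(result)

def notify_not_none(result, callback=None):
    if callback is not None:  # only skips a missing callback
        callback(result)

c1, c2 = Collector(), Collector()
notify_truthy(42, c1)
notify_not_none(42, c2)
print(c1, c2)  # [] [42]
```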

how about:
import threading

def run_in_thread( func, func_args=[], callback=lambda r,*a: None,
                   callback_args=[] ):
    class MyThread ( threading.Thread ):
        def run ( self ):
            result = func(*func_args)
            callback(result, *callback_args)
    MyThread().start()


(and for me, I'd )

r0g

Lie said:
Neat, but I think you mean

if callback is not None:
    callback(result, *callback_args)

for that last line.

how about:
import threading

def run_in_thread( func, func_args=[], callback=lambda r,*a: None,
                   callback_args=[] ):
    class MyThread ( threading.Thread ):
        def run ( self ):
            result = func(*func_args)
            callback(result, *callback_args)
    MyThread().start()


(and for me, I'd )


Cool, that's a neat trick I'd never have thought of. I think the 2 line
alternative might be a little more pythonic though, in terms of
readability & simplicity...

if callback:
    callback(result, *callback_args)

That could be because I'm not terribly au fait with the whole lambda
calculus thing though. What say those who are comfortable with it?
Obvious or oblique?

Roger.