Weird asyncore behaviour


Freddie

Hi,

I've been playing around with asyncore for one of my projects, currently
using it to fetch HTTP pages without blocking. I'm having a few issues,
though. With the code below, I would start a new async_http instance with
"async_http(parent, returnme, url)". If it encountered a redirect, that
object would start a new one as "async_http(parent, returnme, url,
self._seen)" to remember which URLs it had seen. The problem is that
after a while (usually once several async_http objects are active at the
same time), newly created async_http objects would have the seen parameter
with URLs filled in already! I have absolutely no idea how that could be
happening :\ I 'solved' it by explicitly passing "seen={}" for new
objects, but I would still like to know why this is happening :)

Freddie


class async_http(asyncore.dispatcher):
    def __init__(self, parent, returnme, url, seen={}):
        asyncore.dispatcher.__init__(self)
        self.parent = parent
        self.returnme = returnme
        self.url = url
        self._seen = seen
        # split url, connect, etc

    # at some point in here, we would find a redirect and go:
    def at_some_point(self):
        self._seen[self.url] = 1
        async_http(self.parent, self.returnme, self.url, self._seen)
        self.close()
 

Andrew Bennetts

Freddie said:
class async_http(asyncore.dispatcher):
    def __init__(self, parent, returnme, url, seen={}):

That line is the problem. Default arguments are only created once, at
function definition time, not every time the function is called. So the
same 'seen' dictionary is being used for all async_http instances.
Generally, rather than using mutable default arguments, do:

    def __init__(self, parent, returnme, url, seen=None):
        if seen is None:
            seen = {}
        ...

-Andrew.
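
To see the shared default in isolation, here is a minimal sketch that needs no asyncore at all. The function name remember and the example URLs are made up for illustration; only the "seen={}" default mirrors the original code.

```python
def remember(url, seen={}):  # deliberately buggy: this dict is built once, at def time
    seen[url] = 1
    return seen


first = remember("http://a.example/")
second = remember("http://b.example/")

# Neither call passed 'seen', so both received the very same dictionary.
print(first is second)                    # True
print(sorted(second))                     # ['http://a.example/', 'http://b.example/']

# The default object is stored on the function itself and reused by every call.
print(remember.__defaults__[0] is first)  # True
```

This would also explain why passing "seen={}" explicitly made the symptom go away: each such call builds a fresh dictionary at call time instead of falling back to the one stored on the function.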
 

Erik Max Francis

Freddie said:
class async_http(asyncore.dispatcher):
    def __init__(self, parent, returnme, url, seen={}):
                                                   ^^

This is your problem. Default arguments are only created once, when the
def statement is executed, so the default seen argument is the same
physical object shared by every call that doesn't pass one explicitly.
That isn't what you want, as it will accumulate changes.

Instead use the idiom:

    def f(dictArg=None):
        if dictArg is None:
            dictArg = {}  # create a new one each time
        ...
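
Applied to the redirect case from the original post, the idiom might look roughly like the sketch below. Fetcher is a hypothetical stand-in for async_http with the networking and the parent/returnme arguments left out, and the redirect method and URLs are invented for the example; it is not the poster's actual class.

```python
class Fetcher:
    """Hypothetical stand-in for async_http, with no networking."""

    def __init__(self, url, seen=None):
        if seen is None:
            seen = {}  # a fresh dict for every fetch that starts a new chain
        self.url = url
        self._seen = seen

    def redirect(self, new_url):
        # Record the current URL, then hand the same dict to the follow-up
        # fetch so the whole redirect chain shares one history.
        self._seen[self.url] = 1
        return Fetcher(new_url, self._seen)


a = Fetcher("http://a.example/")
b = a.redirect("http://a.example/moved")
c = Fetcher("http://c.example/")  # unrelated fetch, started from scratch

print(b._seen)  # {'http://a.example/': 1}
print(c._seen)  # {}
```

Objects in one redirect chain still share a dictionary, which is the intended behaviour, while an unrelated fetch starts with an empty one.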
 
