Multithreading app memory consumption

Roman Petrichev

Hi folks.
I've just run into a very nasty memory consumption problem.
I have a multithreaded app with 150 threads that all run the same
function: through urllib2 it just gets a web page's HTML code
and assigns it to a local variable. On the next turn the variable is
overwritten with another page's code. At every moment the total size of
the values of the variables containing code is not more than 15MB (I've just
invented a tricky way to measure this). But during the first 30 minutes
all the system memory (512MB) is consumed and 'MemoryError's are raised.
Why is that, and how can I keep the memory consumption within, say,
400MB? Maybe there is a memory leak there?
Thanks

The test app code:


Q = Queue.Queue()
for i in rez: #rez length - 5000
    Q.put(i)


def checker():
    while True:
        try:
            url = Q.get()
        except Queue.Empty:
            break
        try:
            opener = urllib2.urlopen(url)
            data = opener.read()
            opener.close()
        except:
            sys.stderr.write('ERROR: %s\n' % traceback.format_exc())
            try:
                opener.close()
            except:
                pass
            continue
        print len(data)


for i in xrange(150):
    new_thread = threading.Thread(target=checker)
    new_thread.start()
 
Dennis Lee Bieber

How much stack space gets allocated for 150 threads?

Roman said:
Q = Queue.Queue()
for i in rez: #rez length - 5000

Can't be the "test code" as you don't show the imports or where
"rez" is defined.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
Roman Petrichev

Dennis said:
How much stack space gets allocated for 150 threads?

Actually I don't know. How can I get to know this?

Dennis said:
Can't be the "test code" as you don't show the imports or where
"rez" is defined.

Isn't it clear that "rez" is just a list of 5000 URLs? I cannot post it
here, but believe me, none of the pages are big: "At every moment the
total size of the values of the variables containing code is not more
than 15MB"

Regards
 
Istvan Albert

Roman said:
try:
    url = Q.get()
except Queue.Empty:
    break

This code will never raise the Queue.Empty exception. Only a
non-blocking get does:

url = Q.get(block=False)

As mentioned before you should post working code if you expect people
to help.
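
For anyone following along, here is a minimal sketch of the difference. It is written for Python 3, where the module is called `queue` rather than `Queue`, and the URLs are made-up placeholders, not anything from Roman's list:

```python
import queue  # this module is named Queue in the Python 2 code quoted above

q = queue.Queue()
# Placeholder URLs, purely for illustration
for url in ["http://example.invalid/a", "http://example.invalid/b"]:
    q.put(url)

drained = []
while True:
    try:
        # block=False makes get() raise queue.Empty on an empty queue
        # instead of waiting for another item to arrive
        url = q.get(block=False)
    except queue.Empty:
        break
    drained.append(url)

print(drained)
```

With the default blocking `get()`, the loop would instead wait on the empty queue and the `except` branch would not be reached.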

i.
 
Dennis Lee Bieber

Roman said:
Actually I don't know. How can I get to know this?

Unfortunately I don't know of any utility for finding stack sizes --
it may be somewhere in the OS documentation (I'm sure there is a default
size somewhere). Though I don't expect 150 threads to use more than 1MB
total...

Even if these are using all physical memory, I'd just expect some of
the threads to start paging out to disk.
 
Neil Hodgson

Roman Petrichev:
Actually I don't know. How can I get to know this?

On Linux, each thread will often be allocated 10 megabytes of stack.
This can be viewed and altered with the ulimit command.
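
The limit that `ulimit -s` reports can also be read from inside Python. A small sketch, assuming a Unix system (the `resource` module is not available on Windows) and Python 3 syntax; `describe_limit` is just a helper name made up for this example:

```python
import resource  # Unix-only standard library module

def describe_limit(value):
    """Format an rlimit value given in bytes for printing."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d KiB" % (value // 1024)

# RLIMIT_STACK is the stack size limit that `ulimit -s` shows and changes
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("soft stack limit:", describe_limit(soft))
print("hard stack limit:", describe_limit(hard))
```

Whether new threads actually get this size depends on the threading library, so treat the number as a hint rather than a guarantee.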

Neil
 
Bryan Olson

Roman said:
Hi folks.
I've just run into a very nasty memory consumption problem.
I have a multithreaded app with 150 threads [...]

Don't know if this is the heart of your problem, but there's no
limit to how big "data" could be, after

data = opener.read()

Furthermore, you keep it until "data" gets over-written the next
time through the loop. You might try restructuring checker() to
make data local to one iteration, as in:

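def checker():
    # Keep going until onecheck() reports that the queue is empty.
    while onecheck():
        pass

def onecheck():
    # Returns False once the queue is empty, True otherwise.
    try:
        url = Q.get(block=False)
    except Queue.Empty:
        return False
    try:
        opener = urllib2.urlopen(url)
        data = opener.read()
        opener.close()
        print len(data)
    except:
        sys.stderr.write('ERROR: %s\n' % traceback.format_exc())
        try:
            opener.close()
        except:
            pass
    return True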
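
On the "no limit to how big data could be" point, one possible approach is to cap how much is read from each response. This is only a sketch in Python 3: `read_limited` and the 64 KiB cap are invented for the example, and an `io.BytesIO` object stands in for the object returned by `urlopen()` so that it runs without a network:

```python
import io

def read_limited(response, max_bytes, chunk_size=8192):
    """Read at most max_bytes from a file-like object, chunk by chunk."""
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        chunk = response.read(min(chunk_size, remaining))
        if not chunk:  # end of the response body
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# Stand-in for a 100000-byte page; a real response object also has read(n)
fake_page = io.BytesIO(b"x" * 100000)
data = read_limited(fake_page, max_bytes=65536)
print(len(data))
```

With a cap like this the worst case per thread is known in advance, so 150 threads hold at most 150 times the cap in page data.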
 
Bryan Olson

Dennis said:
How much stack space gets allocated for 150 threads?

In Python 2.5, each thread will be allocated

thread.stack_size()

bytes of stack address space. Note that address space is
not physical memory, nor even virtual memory. On modern
operating systems, the memory gets allocated as needed,
and 150 threads should not be a problem.
 
Andrew MacIntyre

Bryan said:
In Python 2.5, each thread will be allocated

thread.stack_size()

bytes of stack address space. Note that address space is
not physical memory, nor even virtual memory. On modern
operating systems, the memory gets allocated as needed,
and 150 threads should not be a problem.

Just a note that [thread|threading].stack_size() returns 0 to indicate
the platform default, and that value will always be returned unless an
explicit value has previously been set.
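
A short sketch of that behaviour, in Python 3 syntax. The 256 KiB figure is an arbitrary value picked for illustration, not a recommendation, and the call may be rejected on some platforms:

```python
import threading

default = threading.stack_size()  # 0 means "use the platform default"
requested = 256 * 1024            # 256 KiB, an arbitrary illustrative value

try:
    # Returns the previous setting; applies to threads created afterwards
    previous = threading.stack_size(requested)
except (ValueError, RuntimeError):
    # ValueError: the size was rejected; RuntimeError: the platform
    # does not support changing the thread stack size
    previous = None

current = threading.stack_size()

# Start one thread to confirm that threads can still be created
results = []
t = threading.Thread(target=lambda: results.append("ran"))
t.start()
t.join()

print("default:", default, "current:", current, "thread result:", results)
```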

The POSIX thread platforms (those that support programmatic setting of
this parameter) have the best support for sanity checking the requested
size - the value gets checked when actually set, rather than when the
thread creation is attempted.

The platform default thread stack sizes I can recall are:
Windows: 1MB (though this may be affected by linker options)
Linux: 1MB or 8MB depending on threading library and/or distro
FreeBSD: 64kB
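
To put those defaults next to the 150 threads in the original post, here is the arithmetic as a quick Python 3 sketch. The sizes are the ones listed above from memory, and the result is reserved stack address space, which per Bryan's note is not the same as memory actually used:

```python
THREADS = 150  # thread count from the original post

# Default stack sizes in KiB, as recalled in the list above
defaults_kib = {
    "Windows": 1024,
    "Linux (1MB)": 1024,
    "Linux (8MB)": 8192,
    "FreeBSD": 64,
}

# Total stack address space reserved by all threads, in MB
reserved_mb = {}
for platform, kib in defaults_kib.items():
    reserved_mb[platform] = THREADS * kib / 1024.0

for platform, mb in reserved_mb.items():
    print("%-12s %8.1f MB" % (platform, mb))
```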

 
Roman Petrichev

Thank you guys for your replies.
I've just realized that there was no memory leak and it was just my
mistake to think so. I was almost disappointed with my favorite
programming language before looking into the problem. Actually the app
consumes as much memory as it should and I had just miscalculated.
Regards
 
Dennis Lee Bieber

I do wonder if using those 150 threads is really effective... What
speed is the network connection capable of sustaining? For an idealized
(ignoring overhead) example, I've got a 1.5Mbps DSL (best I've actually
seen is a download running around 750Kbps). Distribute that 1.5Mbps
among 150 separate connections pulling data, and the average rate per
connection is choked down to 10Kbps -- and maybe a lot of buffering at
some upstream router that may be running multiple T1 lines and getting
each page at 60+Kbps.

Using 50 threads would allow each thread to obtain 30Kbps, which may
be a closer match to the rate the data is sent by the servers. There may
also be less overhead tied up in task switching.

Unfortunately, testing this won't be that easy -- especially if some
upstream machine caches pages (avoiding the need to go all the way to
the "source" on subsequent requests).

Nonetheless... You might want to time how long the program takes
to complete with the 150 threads, then cut the number of threads to 100
and repeat... Then 50... 25... At some point I'd expect to see the run
time start to go up -- this would be the point, I'd think, at which the
network connection is not being fully used. In my hypothetical example,
that would happen when the number of threads drops below 25 (25 @ 60Kbps
=> 1.5Mbps)
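
A rough sketch of both ideas in Python 3: the per-thread share of the link, and a harness that times the same job at different thread counts. Everything named here (`per_thread_kbps`, `timed_run`, `fake_fetch`, the placeholder URLs) is made up for the example, and `fake_fetch` only sleeps, so the timings it prints say nothing about a real network:

```python
import queue
import threading
import time

def per_thread_kbps(link_kbps, n_threads):
    """Idealized share of the link per connection, ignoring overhead."""
    return link_kbps / float(n_threads)

def timed_run(urls, n_threads, fetch):
    """Fetch all urls with n_threads workers; return elapsed seconds."""
    q = queue.Queue()
    for u in urls:
        q.put(u)

    def worker():
        while True:
            try:
                u = q.get(block=False)  # raises queue.Empty when drained
            except queue.Empty:
                return
            fetch(u)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def fake_fetch(url):
    # Stand-in for urlopen(url).read(); replace with a real fetch to measure
    time.sleep(0.01)

urls = ["http://example.invalid/%d" % i for i in range(40)]

for n in (150, 50, 25):
    print("%3d threads on 1.5Mbps: %.0f Kbps each" % (n, per_thread_kbps(1500, n)))

for n in (20, 10, 5):
    print("%2d threads: %.3f s" % (n, timed_run(urls, n, fake_fetch)))
```

Swapping `fake_fetch` for a real download and stepping the thread count down, as described above, should show roughly where the run time starts to climb.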
 