Memory problem with Python

Squzer Crawler

I am developing a distributed environment at my college using Python. I
am using threads in the client to download web pages. Even though I am
reusing the threads, memory usage keeps increasing, and I don't know why.
I am using Berkeley DB for the URL queue and BeautifulSoup for parsing
the web pages.

Any ideas for reducing the memory usage? Please tell me.

I want my program to run in bounded memory.
 
soring

Squzer Crawler said:
I am developing a distributed environment at my college using Python. I
am using threads in the client to download web pages. Even though I am
reusing the threads, memory usage keeps increasing, and I don't know why.
I am using Berkeley DB for the URL queue and BeautifulSoup for parsing
the web pages.

Isn't the increased memory the result of storing the already
processed pages?

Look first at all places where your code instantiates new
objects - and make sure you don't keep references to objects that
are no longer needed.

Also, reusing threads has nothing to do with saving memory; it only
saves thread creation time, if I understand your problem description
correctly.
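
To illustrate the point about stray references, here is a minimal sketch. ParsedPage, leaky_cache, process_leaky and process_bounded are made-up names, and ParsedPage only stands in for whatever parse tree the real crawler builds. The weakref probes rely on CPython's reference counting, so other interpreters may free the object later.

```python
import weakref


class ParsedPage:
    """Hypothetical stand-in for a parsed document (e.g. a parse tree)."""

    def __init__(self, url, html):
        self.url = url
        self.html = html

    def links(self):
        # Crude link extraction, only for the sake of the example.
        return [word for word in self.html.split() if word.startswith("http")]


leaky_cache = []  # a module-level list like this grows with every page


def process_leaky(url, html):
    page = ParsedPage(url, html)
    leaky_cache.append(page)  # this reference keeps the whole page alive
    return page.links(), weakref.ref(page)


def process_bounded(url, html):
    page = ParsedPage(url, html)
    # Only the small list of links survives; `page` is dropped on return.
    return page.links(), weakref.ref(page)


html = "see http://example.com/a and http://example.com/b"
links_a, probe_a = process_leaky("http://example.com/", html)
links_b, probe_b = process_bounded("http://example.com/", html)

print("leaky page still alive:", probe_a() is not None)
print("bounded page released:", probe_b() is None)
```

If a probe like this stays alive after a page has been handled, something (a list, a dict, a closure, a thread-local) is still holding the parsed page.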
 
Squzer Crawler

soring said:
Isn't the increased memory the result of storing the already
processed pages?

Look first at all places where your code instantiates new
objects - and make sure you don't keep references to objects that
are no longer needed.

Also, reusing threads has nothing to do with saving memory; it only
saves thread creation time, if I understand your problem description
correctly.

What about cyclic references? Can I use the garbage collector (gc) in
my program?

If so, please tell me how. I am calling gc.collect() at the end of
each fetch. Will that slow my program down? If so, how else should I
call it?
 
Josiah Carlson

Squzer said:
What about cyclic references? Can I use the garbage collector (gc) in
my program?

If so, please tell me how. I am calling gc.collect() at the end of
each fetch. Will that slow my program down? If so, how else should I
call it?

Garbage collection should happen automatically as long as you are
deleting references to objects you no longer need. If gc.garbage isn't
empty, then you have unbreakable reference cycles. It seems more
likely, as soring@gmail says, that you are keeping copies of the things
you already parsed in memory.
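
A rough way to check this, as a sketch only: Node and make_cycle are invented for the example. It builds one reference cycle, forces a collection, and then looks at gc.garbage. Note that on recent Python 3 versions gc.garbage is normally empty even for cycles whose objects define __del__, so an empty list there does not rule out a leak through ordinary references.

```python
import gc


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a: refcounts never reach zero


gc.collect()   # start from a clean state
gc.disable()   # keep the automatic collector out of the measurement
make_cycle()
found = gc.collect()        # number of unreachable objects found
leftover = len(gc.garbage)  # objects the collector could not free
gc.enable()

print("unreachable objects found:", found)
print("uncollectable objects left in gc.garbage:", leftover)
```

Calling gc.collect() after every fetch costs a full pass over all tracked objects each time, so if it is needed at all, running it every N pages is probably a gentler choice.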

What you can do (if you aren't able to find the bug) is have a wrapper
program that repeatedly starts up your url fetcher via os.system().
Then have your url fetcher close itself down every few hours.
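
Something along these lines, as a sketch under assumptions: supervise, max_runs and pause_seconds are names made up here, and the command run below is a harmless placeholder rather than a real fetcher. A real setup would pass the fetcher's own command line and leave max_runs as None so the loop never ends.

```python
import os
import sys
import time


def supervise(command, max_runs=None, pause_seconds=1.0):
    """Run `command` via os.system() again each time it exits.

    max_runs=None restarts forever; a number limits the restarts,
    which is mainly useful for trying the wrapper out.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        status = os.system(command)  # blocks until the child exits
        runs += 1
        print(f"run {runs} exited with status {status}")
        time.sleep(pause_seconds)    # avoid a tight loop if it fails at once
    return runs


# Placeholder child process that exits immediately; the fetcher's own
# command would go here instead.
placeholder = f'"{sys.executable}" -c "pass"'
runs = supervise(placeholder, max_runs=2, pause_seconds=0)
```

Since each run is a fresh process, whatever memory the fetcher leaked goes back to the operating system when it exits. The URL queue in Berkeley DB is on disk, so the next run can pick up where the last one stopped.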

- Josiah
 
