multithreading app memory consumption

Discussion in 'Python' started by Roman Petrichev, Oct 23, 2006.

  1. Hi folks.
    I've just run into a very nasty memory consumption problem.
    I have a multithreaded app with 150 threads that all run the same
    function: it fetches a web page's HTML through urllib2 and assigns it
    to a local variable. On the next iteration the variable is overwritten
    with another page's HTML. At any moment the combined size of the
    variables holding page content is no more than 15 MB (I've just
    invented a tricky way to measure this). But within the first 30 minutes
    all the system memory (512 MB) is consumed and MemoryErrors start
    being raised. Why is that, and how can I keep memory consumption within,
    say, 400 MB? Maybe there is a memory leak?
    Thanks

    The test app code:


    Q = Queue.Queue()
    for i in rez: #rez length - 5000
        Q.put(i)


    def checker():
        while True:
            try:
                url = Q.get()
            except Queue.Empty:
                break
            try:
                opener = urllib2.urlopen(url)
                data = opener.read()
                opener.close()
            except:
                sys.stderr.write('ERROR: %s\n' % traceback.format_exc())
                try:
                    opener.close()
                except:
                    pass
                continue
            print len(data)


    for i in xrange(150):
        new_thread = threading.Thread(target=checker)
        new_thread.start()
     
    Roman Petrichev, Oct 23, 2006
    #1

  2. On Mon, 23 Oct 2006 03:31:28 +0400, Roman Petrichev <>
    declaimed the following in comp.lang.python:

    > Hi folks.
    > I've just faced with very nasty memory consumption problem.
    > I have a multythreaded app with 150 threads which use the only and the
    > same function - through urllib2 it just gets the web page's html code
    > and assigns it to local variable. On the next turn the variable is
    > overritten with another page's code. At every moment the summary of
    > values of the variables containig code is not more than 15Mb (I've just
    > invented a tricky way to measure this). But during the first 30 minutes
    > all the system memory (512Mb) is consumed and 'MemoryError's is arising.
    > Why is it so and how can I limit the memory consumption in borders, say,
    > 400Mb? Maybe there is a memory leak there?
    > Thnx
    >

    How much stack space gets allocated for 150 threads?

    >
    > Q = Queue.Queue()
    > for i in rez: #rez length - 5000


    Can't be the "test code" as you don't show the imports or where
    "rez" is defined.
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Oct 23, 2006
    #2

  3. Dennis Lee Bieber wrote:
    > On Mon, 23 Oct 2006 03:31:28 +0400, Roman Petrichev <>
    > declaimed the following in comp.lang.python:
    >
    >> Hi folks.
    >> I've just faced with very nasty memory consumption problem.
    >> I have a multythreaded app with 150 threads which use the only and the
    >> same function - through urllib2 it just gets the web page's html code
    >> and assigns it to local variable. On the next turn the variable is
    >> overritten with another page's code. At every moment the summary of
    >> values of the variables containig code is not more than 15Mb (I've just
    >> invented a tricky way to measure this). But during the first 30 minutes
    >> all the system memory (512Mb) is consumed and 'MemoryError's is arising.
    >> Why is it so and how can I limit the memory consumption in borders, say,
    >> 400Mb? Maybe there is a memory leak there?
    >> Thnx
    >>

    > How much stack space gets allocated for 150 threads?

    Actually I don't know. How can I get to know this?
    >> Q = Queue.Queue()
    >> for i in rez: #rez length - 5000

    >
    > Can't be the "test code" as you don't show the imports or where
    > "rez" is defined.

    Isn't it clear that "rez" is just a list of 5000 urls? I cannot place it
    here, but believe me all of them are not big - "At every moment the
    summary of values of the variables containig code is not more than 15Mb"

    Regards
     
    Roman Petrichev, Oct 23, 2006
    #3
  4. Roman Petrichev wrote:

    > try:
    > url = Q.get()
    > except Queue.Empty:
    > break


    This code will never raise the Queue.Empty exception. Only a
    non-blocking get does:

    url = Q.get(block=False)

    As mentioned before you should post working code if you expect people
    to help.
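
    A minimal sketch of the difference, for anyone following along. It is
    written so that it also runs on Python 3, where the module is named
    queue; the fallback import covers the Python 2 name used in this
    thread. The helper name drain and the example URLs are made up for
    illustration.

```python
try:
    import queue  # Python 3 name
except ImportError:
    import Queue as queue  # Python 2 name, as used in the thread


def drain(q):
    """Collect items until the queue is empty, without ever blocking."""
    items = []
    while True:
        try:
            # block=False makes get() raise queue.Empty instead of waiting
            items.append(q.get(block=False))
        except queue.Empty:
            break
    return items


q = queue.Queue()
for url in ["http://example.com/a", "http://example.com/b"]:
    q.put(url)

drained = drain(q)
```

    With a plain q.get() the loop would simply wait forever once the
    queue runs dry, so the worker threads would never exit.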

    i.
     
    Istvan Albert, Oct 23, 2006
    #4
  5. On Mon, 23 Oct 2006 12:07:47 +0400, Roman Petrichev <>
    declaimed the following in comp.lang.python:


    > Actually I don't know. How can I get to know this?


    Unfortunately I don't know of any utility for finding stack sizes --
    it may be somewhere in the OS documentation (I'm sure there is a default
    size somewhere). Though I don't expect 150 threads to use more than 1MB
    total...

    Even if these are using all physical memory, I'd just expect some of
    the threads to start paging out to disk.
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Oct 23, 2006
    #5
  6. Roman Petrichev:
    > Dennis Lee Bieber wrote:
    >> How much stack space gets allocated for 150 threads?

    > Actually I don't know. How can I get to know this?


    On Linux, each thread will often be allocated 10 megabytes of stack.
    This can be viewed and altered with the ulimit command.

    Neil
     
    Neil Hodgson, Oct 24, 2006
    #6
  7. Roman Petrichev wrote:
    > Hi folks.
    > I've just faced with very nasty memory consumption problem.
    > I have a multythreaded app with 150 threads

    [...]
    >
    > The test app code:
    >
    >
    > Q = Queue.Queue()
    > for i in rez: #rez length - 5000
    > Q.put(i)
    >
    >
    > def checker():
    > while True:
    > try:
    > url = Q.get()
    > except Queue.Empty:
    > break
    > try:
    > opener = urllib2.urlopen(url)
    > data = opener.read()
    > opener.close()
    > except:
    > sys.stderr.write('ERROR: %s\n' % traceback.format_exc())
    > try:
    > opener.close()
    > except:
    > pass
    > continue
    > print len(data)
    >
    >
    > for i in xrange(150):
    > new_thread = threading.Thread(target=checker)
    > new_thread.start()


    Don't know if this is the heart of your problem, but there's no
    limit to how big "data" could be, after

    data = opener.read()

    Furthermore, you keep it until "data" gets over-written the next
    time through the loop. You might try restructuring checker() to
    make data local to one iteration, as in:

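    def checker():
        # loop until onecheck() reports that the queue is empty
        while onecheck():
            pass

    def onecheck():
        try:
            # non-blocking get; a blocking get never raises Queue.Empty
            url = Q.get(block=False)
        except Queue.Empty:
            return False
        try:
            opener = urllib2.urlopen(url)
            data = opener.read()
            opener.close()
            print len(data)
        except:
            sys.stderr.write('ERROR: %s\n' % traceback.format_exc())
            try:
                opener.close()
            except:
                pass
        return True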
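
    On the point that read() has no upper bound: one possible way to cap
    it is to read in chunks and stop at a limit. This is only a sketch.
    The helper read_capped, the 20000-byte cap and the chunk size are
    assumptions for illustration, and io.BytesIO stands in for the object
    returned by urlopen() so the example runs without a network.

```python
import io


def read_capped(response, max_bytes, chunk_size=8192):
    """Read at most max_bytes from a file-like object, in chunks.

    Returns (data, truncated); truncated is True when more data was
    available than the cap allowed.
    """
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        chunk = response.read(min(chunk_size, remaining))
        if not chunk:
            # reached the end of the stream while still under the cap
            return b"".join(chunks), False
        chunks.append(chunk)
        remaining -= len(chunk)
    # Cap reached: read one more byte to see whether anything was left over
    truncated = bool(response.read(1))
    return b"".join(chunks), truncated


# io.BytesIO stands in for the object returned by urlopen()
fake_page = io.BytesIO(b"x" * 50000)
data, truncated = read_capped(fake_page, max_bytes=20000)
```

    With 150 threads, a cap like this bounds the worst case at roughly
    150 times max_bytes for page data, whatever the servers send.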


    --
    --Bryan
     
    Bryan Olson, Oct 24, 2006
    #7
  8. Dennis Lee Bieber wrote:
    > How much stack space gets allocated for 150 threads?


    In Python 2.5, each thread will be allocated

    thread.stack_size()

    bytes of stack address space. Note that address space is
    not physical memory, nor even virtual memory. On modern
    operating systems, the memory gets allocated as needed,
    and 150 threads should not be a problem.


    --
    --Bryan
     
    Bryan Olson, Oct 24, 2006
    #8
  9. On Mon, 23 Oct 2006 23:25:47 GMT, Neil Hodgson
    <> declaimed the following in
    comp.lang.python:

    >
    > On Linux, each thread will often be allocated 10 megabytes of stack.
    > This can be viewed and altered with the ulimit command.
    >

    If true, 151 (main + 150 threads) wants 1.5GB... Sounds a bit high
    to me... 1MB each I could see...
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Oct 24, 2006
    #9
  10. Bryan Olson wrote:

    > In Python 2.5, each thread will be allocated
    >
    > thread.stack_size()
    >
    > bytes of stack address space. Note that address space is
    > not physical memory, nor even virtual memory. On modern
    > operating systems, the memory gets allocated as needed,
    > and 150 threads is not be a problem.


    Just a note that [thread|threading].stack_size() returns 0 to indicate
    the platform default, and that value will always be returned unless an
    explicit value has previously been set.

    The Posix thread platforms (those that support programmatic setting of
    this parameter) have the best support for sanity checking the requested
    size - the value gets checked when actually set, rather than when the
    thread creation is attempted.

    The platform default thread stack sizes I can recall are:
    Windows: 1MB (though this may be affected by linker options)
    Linux: 1MB or 8MB depending on threading library and/or distro
    FreeBSD: 64kB
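
    As a rough sketch of how the setting can be used (Python 2.5 or
    later): the 256 KiB figure below is an arbitrary choice for
    illustration, not a recommendation, and whether a given size is
    accepted depends on the platform.

```python
import threading

requested = 256 * 1024  # hypothetical per-thread size; Python's minimum is 32 KiB

try:
    # Passing a size returns the previous setting (0 = platform default)
    previous = threading.stack_size(requested)
    applied = True
except (ValueError, RuntimeError):
    # ValueError: size rejected; RuntimeError: not changeable on this platform
    previous = 0
    applied = False

# Threads created from here on use the new size (if it was applied)
results = []
workers = [threading.Thread(target=results.append, args=(i,)) for i in range(150)]
for w in workers:
    w.start()
for w in workers:
    w.join()

if applied:
    threading.stack_size(previous)  # restore the earlier setting

# Stack address space reserved if each of the 150 threads gets `requested` bytes
reserved_mib = 150 * requested / (1024.0 * 1024.0)
```

    At 256 KiB per thread, 150 threads reserve 37.5 MiB of stack address
    space, compared with about 150 MiB at a 1MB default.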

    --
    -------------------------------------------------------------------------
    Andrew I MacIntyre "These thoughts are mine alone..."
    E-mail: (pref) | Snail: PO Box 370
    (alt) | Belconnen ACT 2616
    Web: http://www.andymac.org/ | Australia
     
    Andrew MacIntyre, Oct 24, 2006
    #10
  11. Thank you guys for your replies.
    I've just realized that there was no memory leak and it was my
    mistake to think so. I was almost disappointed in my favorite
    programming language before looking into the problem properly. Actually
    the app consumes as much memory as it should, and I had just miscalculated.
    Regards
     
    Roman Petrichev, Oct 24, 2006
    #11
  12. On Tue, 24 Oct 2006 16:48:58 +0400, Roman Petrichev <>
    declaimed the following in comp.lang.python:

    > Thank you guys for your replies.
    > I've just realized that there was no memory leak and it was just my
    > mistake to think so. I've almost disappointed with my favorite
    > programming language before addressing the problem. Actually the app
    > consume as much memory as it should and I've just miscalculated.
    > Regards


    I do wonder if using 150 threads is really effective... What
    speed is the network connection capable of sustaining? For an idealized
    (ignoring overhead) example, I've got a 1.5Mbps DSL (best I've actually
    seen is a download running around 750Kbps). Distribute that 1.5Mbps
    among 150 separate connections pulling data, and the average rate per
    connection is choked down to 10Kbps -- with maybe a lot of buffering at
    some upstream router that may be running multiple T1 lines and getting
    each page at 60+Kbps.

    Using 50 threads would allow each thread to obtain 30Kbps, which may
    be a closer match to the rate the data is sent by the servers. There may
    also be less overhead tied up in task switching.
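
    The arithmetic behind those figures, as a small sketch. It assumes the
    link is shared evenly and ignores protocol overhead, as the example
    does; the function name is made up for illustration.

```python
def per_connection_kbps(link_kbps, connections):
    """Average rate per connection when a link is shared evenly (no overhead)."""
    return link_kbps / float(connections)


link_kbps = 1500.0  # the 1.5Mbps DSL line from the example

rate_150 = per_connection_kbps(link_kbps, 150)  # 150 threads
rate_50 = per_connection_kbps(link_kbps, 50)    # 50 threads
```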

    Unfortunately, testing this won't be that easy -- especially if some
    upstream machine caches pages (avoiding the need to go all the way to
    the "source" on subsequent requests).

    Nonetheless... You might want to time how long the program takes
    to complete with the 150 threads, then cut the number of threads to 100
    and repeat... Then 50... 25... At some point I'd expect to see the run
    time start to go up -- this would be the point, I'd think, at which the
    network connection is not being fully used. In my hypothetical example,
    that would happen when the number of threads drops below 25 (25 @ 60Kbps
    => 1.5Mbps)
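
    One way such a timing experiment could be set up, as a sketch only.
    The names timed_run and fake_fetch are hypothetical, fake_fetch just
    sleeps instead of doing a real download, and the thread counts are
    scaled down so the example finishes quickly. It is written to run on
    Python 3 with a fallback import for Python 2.

```python
import threading
import time

try:
    import queue  # Python 3 name
except ImportError:
    import Queue as queue  # Python 2 name


def timed_run(items, n_threads, work):
    """Run work(item) over all items using n_threads; return elapsed seconds."""
    q = queue.Queue()
    for item in items:
        q.put(item)

    def worker():
        while True:
            try:
                item = q.get(block=False)  # non-blocking, so workers can exit
            except queue.Empty:
                return
            work(item)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


def fake_fetch(url):
    time.sleep(0.01)  # stand-in for a network request


urls = ["http://example.com/%d" % i for i in range(40)]
timings = dict((n, timed_run(urls, n, fake_fetch)) for n in (20, 10, 5))
```

    For a real measurement, fake_fetch would be replaced by the actual
    download function and the counts by 150, 100, 50 and 25. Caching
    upstream can still distort repeated runs, as noted above.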
    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, Oct 24, 2006
    #12
