performance degradation when looping through lists

Discussion in 'Python' started by Joachim Worringen, Apr 7, 2006.

  1. I need to process large lists (in my real application, this is to parse the
    content of a file). I noticed that the performance of accessing the individual
    list elements degrades over runtime.

    This can be reproduced easily using this code:

    import time

    N = 100000
    p = 10000

    A = []
    for i in range(N):
        A.append(str(i))

    j = 0
    t = time.clock()
    for i in range(len(A)):
        j += int(A[i])
        if i % p == 0:
            t = time.clock() - t
            print t

    (the string conversion only serves to increase the duration of each iteration;
    you can observe the same effect with ints, too).

    When running this, I get output like this:
    0.0
    0.37
    0.03
    0.4
    0.06
    0.43
    0.09
    0.46
    0.13
    0.49

    I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
    [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2

    I wonder why
    1. The execution times alternate between "fast" and "slow" (I observe the same
    effect in my much more complex application)
    2. The execution times increase steadily, from "0" to 0.13 for the "fast"
    phases, and from 0.37 to 0.49 for the slow phases.

    Within my application, the effect is more drastic, as the following numbers
    show (the processing time per line is roughly the same for each line!):
    at line 10000 (32258 lines/s)
    at line 20000 (478 lines/s)
    at line 30000 (21276 lines/s)
    at line 40000 (475 lines/s)
    at line 50000 (15873 lines/s)
    at line 60000 (471 lines/s)
    at line 70000 (12658 lines/s)
    at line 80000 (468 lines/s)
    at line 90000 (10638 lines/s)
    at line 100000 (464 lines/s)
    at line 110000 (9090 lines/s)
    at line 120000 (461 lines/s)
    at line 130000 (7936 lines/s)
    at line 140000 (457 lines/s)
    at line 150000 (7042 lines/s)
    at line 160000 (454 lines/s)
    at line 170000 (6369 lines/s)
    at line 180000 (451 lines/s)
    at line 190000 (5780 lines/s)
    at line 200000 (448 lines/s)
    at line 210000 (4854 lines/s)
    at line 220000 (444 lines/s)
    at line 230000 (4504 lines/s)
    at line 240000 (441 lines/s)
    at line 250000 (4201 lines/s)
    at line 260000 (438 lines/s)
    at line 270000 (3952 lines/s)
    at line 280000 (435 lines/s)
    at line 290000 (3717 lines/s)
    at line 300000 (432 lines/s)
    at line 310000 (3508 lines/s)
    at line 320000 (429 lines/s)
    at line 330000 (3322 lines/s)
    at line 340000 (426 lines/s)
    at line 350000 (3154 lines/s)
    at line 360000 (423 lines/s)
    at line 370000 (3003 lines/s)
    at line 380000 (421 lines/s)
    at line 390000 (2873 lines/s)

    Any ideas why this happens, and what I could do about it? It really makes
    my application non-scalable as the lines/s go down even further.

    --
    Joachim - reply to joachim at domain ccrl-nece dot de

    Opinion expressed is personal and does not constitute
    an opinion or statement of NEC Laboratories.
     
    Joachim Worringen, Apr 7, 2006
    #1

  2. Joachim Worringen wrote:
    > I need to process large lists (in my real application, this is to parse
    > the content of a file).


    Then you probably want to use generators instead of lists. The problem
    with large lists is that they eat a lot of memory - which can result in
    swapping.
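
    Something along these lines, as a minimal sketch (the file name and the
    per-line parsing are placeholders):

    # process the file lazily, one line at a time, instead of building a big
    # list first; only one line needs to be in memory at any moment
    def parsed_lines(filename):
        f = open(filename)
        for line in f:          # file objects are lazy line iterators
            yield line.strip()  # placeholder for the real parsing
        f.close()

    total = 0
    for item in parsed_lines("input.txt"):   # hypothetical file name
        total += len(item)
    print total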

    > I noticed that the performance of accessing the
    > individual list elements degrades over runtime.


    I leave this point to gurus, but it may have to do with swapping. Also,
    this is not real-time, so variations may have to do with your OS task
    scheduler.

    My 2 cents
    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    bruno at modulix, Apr 7, 2006
    #2

  3. bruno at modulix wrote:
    > Joachim Worringen wrote:
    >> I need to process large lists (in my real application, this is to parse
    >> the content of a file).

    >
    > Then you probably want to use generators instead of lists. The problem
    > with large lists is that they eat a lot of memory - which can result in
    > swapping.


    The effect also shows up in tiny examples (like the one posted) which surely
    don't swap on a 512 MB machine.

    Also, I only read parts of the file into memory, to avoid exhausting it.

    Of course, using less memory is always a good idea - do you have a pointer on
    how to use generators for this application (basically, buffering file content
    in memory for faster access)? BTW, the effect also shows up with the linecache
    module.
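
    What I do now is roughly the following (a simplified sketch; the file name
    and the processing function are placeholders):

    def process(line):
        pass                                   # stands in for the real parsing

    f = open("trace.dat")                      # placeholder file name
    while True:
        chunk = f.readlines(4 * 1024 * 1024)   # sizehint: read about 4 MB of lines
        if not chunk:
            break
        for line in chunk:
            process(line)
    f.close()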

    >> I noticed that the performance of accessing the
    >> individual list elements degrades over runtime.

    >
    > I leave this point to gurus, but it may have to do with swapping. Also,
    > this is not real-time, so variations may have to do with your OS tasks
    > scheduler.


    See above for the swapping. And the OS scheduler may create variations in
    runtime, but not a monotonic degradation. I don't think these two effects
    come into play here.

    --
    Joachim - reply to joachim at domain ccrl-nece dot de

    Opinion expressed is personal and does not constitute
    an opinion or statement of NEC Laboratories.
     
    Joachim Worringen, Apr 7, 2006
    #3
  4. Joachim Worringen wrote:

    > I need to process large lists (in my real application, this is to parse
    > the content of a file). I noticed that the performance of accessing the
    > individual list elements degrades over runtime.
    >
    > This can be reproduced easily using this code:
    >
    > import time
    >
    > N = 100000
    > p = 10000
    >
    > A = []
    > for i in range(N):
    >     A.append(str(i))
    >
    > j = 0
    > t = time.clock()
    > for i in range(len(A)):
    >     j += int(A[i])
    >     if i % p == 0:
    >         t = time.clock() - t
    >         print t
    >
    > (the string conversion only serves to increase the duration of each
    > iteration; you can observe the same effect with ints, too).
    >
    > When running this, I get output like this:
    > 0.0
    > 0.37
    > 0.03
    > 0.4
    > 0.06
    > 0.43
    > 0.09
    > 0.46
    > 0.13
    > 0.49
    >
    > I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
    > [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2
    >
    > I wonder why
    > 1. The execution times alternate between "fast" and "slow" (I observe the
    > same effect in my much more complex application)


    Your timing code is buggy: t ends up holding an elapsed time rather than a
    timestamp, so the prints alternate between (roughly) the absolute clock value
    and a small remainder. Change it to

    import time

    N = 100000
    p = 10000

    A = []
    for i in range(N):
        A.append(str(i))

    j = 0
    start = time.clock()
    for i in range(len(A)):
        j += int(A[i])
        if i % p == 0:
            end = time.clock()
            print end - start
            start = end
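
    Now start always holds a timestamp, and each printed figure covers exactly
    the last p iterations, so the numbers should come out roughly constant.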

    Does the problem persist? I hope not.

    Peter
     
    Peter Otten, Apr 7, 2006
    #4
  5. Peter Otten wrote:
    > Your timing code is buggy. Change it to


    Ooops, you're right. Everything is fine now... Thanks.

    Joachim

    --
    Joachim - reply to joachim at domain ccrl-nece dot de

    Opinion expressed is personal and does not constitute
    an opinion or statement of NEC Laboratories.
     
    Joachim Worringen, Apr 7, 2006
    #5
  6. Joachim Worringen on comp.lang.python said:

    > I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
    > [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2


    Check Peter Otten's answer, and remember as well that GCC 2.96 is known to
    cause highly strange issues wherever it is used.

    --
    Alan Franzoni <>
    -
    Remove .xyz from my address in order to contact me.
    -
    GPG Key Fingerprint:
    5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
     
    Alan Franzoni, Apr 7, 2006
    #6
  7. Guest

    Hi,

    I wrote a program some days back and I was using lists heavily for
    performing operations such as pop, remove, append. My list size was
    around 1024x3 and there were around 20 different objects like this.

    My program also degraded over time. I cannot post the code, as it is a lot
    of code.

    But I want to ask: why do lists degrade, and what faster alternative to
    lists is there?
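
    The access pattern is essentially this toy sketch (not the real code; the
    shapes and names are invented):

    A = [[i, i + 1, i + 2] for i in range(1024)]   # roughly the 1024x3 shape
    while A:
        row = A.pop(0)    # popping from the front shifts every remaining row: O(n)
        # ... process row; occasionally A.remove(...) or A.append(...) ...

    I have read that collections.deque (new in Python 2.4) offers O(1) pops from
    both ends; would that be the right replacement here?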

    Any help is greatly appreciated,
     
    , Apr 7, 2006
    #7
