performance degradation when looping through lists

Discussion in 'Python' started by Joachim Worringen, Apr 7, 2006.

  1. I need to process large lists (in my real application, this is to parse the
    content of a file). I noticed that the performance of accessing the individual
    list elements degrades over runtime.

    This can be reproduced easily using this code:

    import time

    N = 100000
    p = 10000

    A = []
    for i in range(N):
        A.append(str(i))

    j = 0
    t = time.clock()
    for i in range(len(A)):
        j += int(A[i])
        if i % p == 0:
            t = time.clock() - t
            print t

    (the string conversion only serves to increase the duration of each iteration;
    you can observe the same effect with ints, too).

    When running this, I get output like this:
    0.0
    0.37
    0.03
    0.4
    0.06
    0.43
    0.09
    0.46
    0.13
    0.49

    I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
    [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2

    I wonder why
    1. The execution times alternate between "fast" and "slow" (I observe the same
    effect in my much more complex application)
    2. The execution times increase steadily, from "0" to 0.13 for the "fast"
    phases, and from 0.37 to 0.49 for the slow phases.

    Within my application, the effect is more drastic, as the following numbers
    show (the processing time per line is roughly the same for each line!):
    at line 10000 (32258 lines/s)
    at line 20000 (478 lines/s)
    at line 30000 (21276 lines/s)
    at line 40000 (475 lines/s)
    at line 50000 (15873 lines/s)
    at line 60000 (471 lines/s)
    at line 70000 (12658 lines/s)
    at line 80000 (468 lines/s)
    at line 90000 (10638 lines/s)
    at line 100000 (464 lines/s)
    at line 110000 (9090 lines/s)
    at line 120000 (461 lines/s)
    at line 130000 (7936 lines/s)
    at line 140000 (457 lines/s)
    at line 150000 (7042 lines/s)
    at line 160000 (454 lines/s)
    at line 170000 (6369 lines/s)
    at line 180000 (451 lines/s)
    at line 190000 (5780 lines/s)
    at line 200000 (448 lines/s)
    at line 210000 (4854 lines/s)
    at line 220000 (444 lines/s)
    at line 230000 (4504 lines/s)
    at line 240000 (441 lines/s)
    at line 250000 (4201 lines/s)
    at line 260000 (438 lines/s)
    at line 270000 (3952 lines/s)
    at line 280000 (435 lines/s)
    at line 290000 (3717 lines/s)
    at line 300000 (432 lines/s)
    at line 310000 (3508 lines/s)
    at line 320000 (429 lines/s)
    at line 330000 (3322 lines/s)
    at line 340000 (426 lines/s)
    at line 350000 (3154 lines/s)
    at line 360000 (423 lines/s)
    at line 370000 (3003 lines/s)
    at line 380000 (421 lines/s)
    at line 390000 (2873 lines/s)

    Any ideas why this happens, and what I could do about it? It really makes
    my application non-scalable as the lines/s go down even further.

    --
    Joachim - reply to joachim at domain ccrl-nece dot de

    Opinion expressed is personal and does not constitute
    an opinion or statement of NEC Laboratories.
     
    Joachim Worringen, Apr 7, 2006
    #1

  2. Joachim Worringen wrote:
    > I need to process large lists (in my real application, this is to parse
    > the content of a file).


    Then you probably want to use generators instead of lists. The problem
    with large lists is that they eat a lot of memory - which can result in
    swapping.
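
    Something along these lines, as a minimal sketch (the file name and the
    per-line parsing are placeholders):

    # process the file lazily, one line at a time, instead of building a big
    # list first; only one line needs to be in memory at any moment
    def parsed_lines(filename):
        f = open(filename)
        for line in f:          # file objects are lazy line iterators
            yield line.strip()  # placeholder for the real parsing
        f.close()

    total = 0
    for item in parsed_lines("input.txt"):   # hypothetical file name
        total += len(item)
    print total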

    > I noticed that the performance of accessing the
    > individual list elements degrades over runtime.


    I leave this point to gurus, but it may have to do with swapping. Also,
    this is not real-time, so variations may have to do with your OS task
    scheduler.

    My 2 cents
    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    bruno at modulix, Apr 7, 2006
    #2

  3. bruno at modulix wrote:
    > Joachim Worringen wrote:
    >> I need to process large lists (in my real application, this is to parse
    >> the content of a file).

    >
    > Then you probably want to use generators instead of lists. The problem
    > with large lists is that they eat a lot of memory - which can result in
    > swapping.


    The effect also shows up in tiny examples (like the one posted) which surely
    don't swap on a 512 MB machine.

    Also, I only read parts of the file into memory, to avoid exhausting it.

    Of course, using less memory is always a good idea - do you have a pointer on
    how to use generators for this application (basically, buffering file content
    in memory for faster access)? BTW, the effect also shows up with the linecache
    module.
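
    What I do now is roughly the following (a simplified sketch; the file name
    and the processing function are placeholders):

    def process(line):
        pass                                   # stands in for the real parsing

    f = open("trace.dat")                      # placeholder file name
    while True:
        chunk = f.readlines(4 * 1024 * 1024)   # sizehint: read about 4 MB of lines
        if not chunk:
            break
        for line in chunk:
            process(line)
    f.close()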

    >> I noticed that the performance of accessing the
    >> individual list elements degrades over runtime.

    >
    > I leave this point to gurus, but it may have to do with swapping. Also,
    > this is not real-time, so variations may have to do with your OS tasks
    > scheduler.


    See above for the swapping. And the OS scheduler may create variations in
    runtime, but not a monotonic degradation. I don't think these two effects
    come into play here.

    --
    Joachim - reply to joachim at domain ccrl-nece dot de

    Opinion expressed is personal and does not constitute
    an opinion or statement of NEC Laboratories.
     
    Joachim Worringen, Apr 7, 2006
    #3
  4. Joachim Worringen wrote:

    > I need to process large lists (in my real application, this is to parse
    > the content of a file). I noticed that the performance of accessing the
    > individual list elements degrades over runtime.
    >
    > This can be reproduced easily using this code:
    >
    > import time
    >
    > N = 100000
    > p = 10000
    >
    > A = []
    > for i in range(N):
    >     A.append(str(i))
    >
    > j = 0
    > t = time.clock()
    > for i in range(len(A)):
    >     j += int(A[i])
    >     if i % p == 0:
    >         t = time.clock() - t
    >         print t
    >
    > (the string conversion only serves to increase the duration of each
    > iteration; you can observe the same effect with ints, too).
    >
    > When running this, I get output like this:
    > 0.0
    > 0.37
    > 0.03
    > 0.4
    > 0.06
    > 0.43
    > 0.09
    > 0.46
    > 0.13
    > 0.49
    >
    > I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
    > [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2
    >
    > I wonder why
    > 1. The execution times alternate between "fast" and "slow" (I observe the
    > same effect in my much more complex application)


    Your timing code is buggy: t ends up holding an elapsed time rather than a
    timestamp, so the prints alternate between (roughly) the absolute clock value
    and a small remainder. Change it to

    import time

    N = 100000
    p = 10000

    A = []
    for i in range(N):
        A.append(str(i))

    j = 0
    start = time.clock()
    for i in range(len(A)):
        j += int(A[i])
        if i % p == 0:
            end = time.clock()
            print end - start
            start = end
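
    Now start always holds a timestamp, and each printed figure covers exactly
    the last p iterations, so the numbers should come out roughly constant.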

    Does the problem persist? I hope not.

    Peter
     
    Peter Otten, Apr 7, 2006
    #4
  5. Peter Otten wrote:
    > Your timing code is buggy. Change it to


    Ooops, you're right. Everything is fine now... Thanks.

    Joachim

    --
    Joachim - reply to joachim at domain ccrl-nece dot de

    Opinion expressed is personal and does not constitute
    an opinion or statement of NEC Laboratories.
     
    Joachim Worringen, Apr 7, 2006
    #5
  6. Joachim Worringen on comp.lang.python said:

    > I use Python 2.3.4 (#1, Sep 3 2004, 12:08:45)
    > [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-110)] on linux2


    Check Peter Otten's answer, and remember as well that GCC 2.96 is known to
    cause highly strange issues wherever it is used.

    --
    Alan Franzoni <>
    -
    Remove .xyz from my address in order to contact me.
    -
    GPG Key Fingerprint:
    5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
     
    Alan Franzoni, Apr 7, 2006
    #6
  7. Guest

    Hi,

    I wrote a program some days back and I was using lists heavily for
    performing operations such as pop, remove, append. My list size was
    around 1024x3 and there were around 20 different objects like this.

    My program also degraded over time. I cannot post the code, as it is a lot
    of code.

    But I want to ask: why do lists degrade, and what faster alternative to
    lists is there?
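
    The access pattern is essentially this toy sketch (not the real code; the
    shapes and names are invented):

    A = [[i, i + 1, i + 2] for i in range(1024)]   # roughly the 1024x3 shape
    while A:
        row = A.pop(0)    # popping from the front shifts every remaining row: O(n)
        # ... process row; occasionally A.remove(...) or A.append(...) ...

    I have read that collections.deque (new in Python 2.4) offers O(1) pops from
    both ends; would that be the right replacement here?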

    Any help is greatly appreciated,
     
    , Apr 7, 2006
    #7
