Paul Simmonds wrote:
> [some alternative implementations]
> I've done some timings on the functions above; here are the results:
>
> Python 2.2.1, 200000-line file (all data lines)
> try/except with split:   3.08s
> if with slicing:         2.32s
> try/except with slicing: 2.34s
>
> So slicing seems quicker than split, and using if instead of
> try/except appears to speed it up a little more. I don't know how much
> faster the current version of the interpreter would be, but I doubt
> the ranking would change much.
Interesting. I doubt that split() itself is slow; instead,
I believe that merely calling a method, instead of using a
syntactic construct, makes things slower, since method lookup
is not so cheap. Unfortunately, split cannot be cached in a
local variable, since it is obtained anew as a bound method of
each line, every time. On the other hand, the same holds for
the find method...
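To look at the method-lookup cost in isolation, here is a minimal sketch using the standard timeit module. It is written for Python 3, not the 2.2 interpreter used in this thread; the sample line and iteration counts are assumptions, and absolute numbers will vary by machine and version. It compares calling line.index('!') with calling a local alias of the unbound str.index.

```python
import timeit

# Hypothetical sample line with the same shape as the test data: key!value
LINE = "this is some silly key 42!and that some silly value"


def time_lookup(number=1_000_000, repeat=3):
    """Return (bound, unbound): best-of-`repeat` seconds for `number` calls.

    Both statements run inside the function timeit generates, where `line`
    and `idx` are local variables. The only difference is that
    line.index('!') must look up the `index` attribute on every call.
    CPython 3.7+ avoids building a bound-method object for such calls,
    so the gap may be smaller than it was under Python 2.2.
    """
    setup = f"line = {LINE!r}; idx = str.index"
    bound = min(timeit.repeat("line.index('!')", setup=setup,
                              number=number, repeat=repeat))
    unbound = min(timeit.repeat("idx(line, '!')", setup=setup,
                                number=number, repeat=repeat))
    return bound, unbound


if __name__ == "__main__":
    bound, unbound = time_lookup()
    print(f"line.index('!'): {bound:.3f} s")
    print(f"idx(line, '!'):  {unbound:.3f} s")
```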
Well, I wrote a test program and found that the test
results depended heavily on the order in which the
functions were called! This means the measurements are not
independent, probably because of memory usage.
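One way to reduce such order effects within a single process is to shuffle the call order every round, collect garbage before each call, and keep each function's best time. This is a minimal Python 3 sketch of that approach; it is a swapped-in technique, not the one used in this thread, and the names benchmark, parse_find and parse_split are illustrative (the parsers mirror test_find and test_split from the attached program).

```python
import gc
import random
import time


def make_data(n):
    # Same line shape as make_data() in the attached program: "key N!value"
    return ["this is some silly key %d!and that some silly value" % i
            for i in range(n)]


def parse_find(data):
    d = {}
    for line in data:
        i = line.find("!")
        if i >= 0:
            d[line[:i]] = line[i + 1:]
    return d


def parse_split(data):
    d = {}
    for line in data:
        try:
            key, value = line.split("!", 1)
        except ValueError:
            continue
        d[key] = value
    return d


def benchmark(funcs, data, rounds=5, seed=0):
    """Return {name: best time in seconds}, one call per function per round.

    The call order is reshuffled every round and the garbage collector runs
    before each call, so no function always inherits the memory state left
    behind by the same predecessor.
    """
    rng = random.Random(seed)
    best = {f.__name__: float("inf") for f in funcs}
    order = list(funcs)
    for _ in range(rounds):
        rng.shuffle(order)
        for f in order:
            gc.collect()
            start = time.perf_counter()
            f(data)
            elapsed = time.perf_counter() - start
            best[f.__name__] = min(best[f.__name__], elapsed)
    return best


if __name__ == "__main__":
    results = benchmark([parse_find, parse_split], make_data(200_000))
    for name, t in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name:<12} {t:.3f} s")
```

Running each variant in its own process remains the more thorough isolation; the shuffle only spreads the order effects evenly instead of removing them.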
Here are some results on Win32, testing repeatedly:

D:\slpdev\src\2.2\src\PCbuild>python -i \python22\py\testlines.py
function test_index for 200000 lines took 1.064 seconds.
function test_find for 200000 lines took 1.402 seconds.
function test_split for 200000 lines took 1.560 seconds.

function test_index for 200000 lines took 1.395 seconds.
function test_find for 200000 lines took 1.502 seconds.
function test_split for 200000 lines took 1.888 seconds.

function test_index for 200000 lines took 1.416 seconds.
function test_find for 200000 lines took 1.655 seconds.
function test_split for 200000 lines took 1.755 seconds.
For that reason, I added a command-line mode for testing
single functions, with these results:
D:\slpdev\src\2.2\src\PCbuild>python \python22\py\testlines.py index
function test_index for 200000 lines took 1.056 seconds.
D:\slpdev\src\2.2\src\PCbuild>python \python22\py\testlines.py find
function test_find for 200000 lines took 1.092 seconds.
D:\slpdev\src\2.2\src\PCbuild>python \python22\py\testlines.py split
function test_split for 200000 lines took 1.255 seconds.
The results look much more reasonable; the index version is
still the fastest.
Then I added another test, using an unbound str.index function,
which was again a bit faster.
Finally, I moved the try..except clause out of the inner loop
by using an explicit, restartable iterator; see the attached program.
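The restartable-iterator trick from test_index3 in the attached program can be restated for Python 3 as a minimal sketch. The name parse_restartable and the sample lines are illustrative, not from the original; unlike the benchmark data, the sample contains a line without '!' so the except branch actually runs.

```python
def parse_restartable(lines):
    """Build a key -> value dict from 'key!value' lines, skipping lines
    without '!'.

    The try/except wraps the whole for loop instead of each line. When
    str.index raises ValueError, the while loop re-enters the same
    iterator, which resumes at the line after the offending one.
    """
    d = {}
    idx = str.index          # unbound method held in a local
    it = iter(lines)         # one iterator shared by every restart
    while True:
        try:
            for line in it:
                i = idx(line, "!")
                d[line[:i]] = line[i + 1:]
            return d         # iterator exhausted without an exception
        except ValueError:
            continue         # skip the bad line and resume iteration


print(parse_restartable(["a!1", "no separator", "b!2"]))  # {'a': '1', 'b': '2'}
```

The try block is entered once per bad line rather than once per line. Since CPython 3.11 entering a try block is essentially free when no exception is raised, so this restructuring may gain little on modern interpreters.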
D:\slpdev\src\2.2\src\PCbuild>python \python22\py\testlines.py index3
function test_index3 for 200000 lines took 0.997 seconds.
As a side result, split seems to be unnecessarily slow.
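For readers on a later Python: str.partition, added in Python 2.5 and therefore not available to the 2.2 interpreter used here, splits at the first separator and always returns a 3-tuple, so neither a try/except nor a find() >= 0 test is needed. This is a minimal sketch outside the original benchmark (parse_partition is an illustrative name); whether it beats the find/slice version should be measured rather than assumed.

```python
def parse_partition(lines):
    """Build a key -> value dict from 'key!value' lines using str.partition.

    partition returns (head, sep, tail); sep is the empty string when
    '!' does not occur, in which case the line is skipped.
    """
    d = {}
    for line in lines:
        key, sep, value = line.partition("!")
        if sep:
            d[key] = value
    return d


print(parse_partition(["a!1", "no separator", "b!2"]))  # {'a': '1', 'b': '2'}
```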
cheers - chris
--
Christian Tismer             :^)
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/
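# testlines.py: the attached test program (Python 2.2).
# Usage: python testlines.py [name ...]   e.g. "python testlines.py index3";
# with no arguments, all five variants are timed in one process.
import sys, time

def test_index(data):
    # str.index raises ValueError when the line contains no '!'
    d={}
    for l in data:
        try: i=l.index('!')
        except ValueError: continue
        d[l[:i]]=l[i+1:]
    return d

def test_find(data):
    # str.find returns -1 instead of raising, so a plain if suffices
    d={}
    for l in data:
        i=l.find('!')
        if i >= 0:
            d[l[:i]]=l[i+1:]
    return d

def test_split(data):
    # split at the first '!' only; the unpacking fails if there is no '!'
    d={}
    for l in data:
        try:
            key, value = l.split("!", 1)
        except ValueError: continue
        d[key] = value
    return d

def test_index2(data):
    # like test_index, but calls the unbound str.index held in a local,
    # avoiding a bound-method lookup per line
    d={}
    idx = str.index
    for l in data:
        try: i=idx(l, '!')
        except ValueError: continue
        d[l[:i]]=l[i+1:]
    return d

def test_index3(data):
    # the try..except wraps the whole for loop; after a ValueError the
    # while loop re-enters the same iterator, which resumes at the next line
    d={}
    idx = str.index
    it = iter(data)
    while 1:
        try:
            for l in it:
                i=idx(l, '!')
                d[l[:i]]=l[i+1:]
            else:
                # iterator exhausted without an exception
                return d
        except ValueError: continue

def make_data(n=200000):
    return [ "this is some silly key %d!and that some silly value" % i for i in xrange(n) ]

def test(funcnames, n=200000):
    # time.clock is the high-resolution timer on Windows, time.time elsewhere
    if sys.platform == "win32":
        default_timer = time.clock
    else:
        default_timer = time.time
    data = make_data(n)
    for name in funcnames.split():
        fname = "test_"+name
        f = globals()[fname]
        t = default_timer()
        f(data)
        t = default_timer() - t
        print "function %-10s for %d lines took %0.3f seconds." % (fname, n, t)

if __name__ == "__main__":
    funcnames = "index find split index2 index3"
    if len(sys.argv) > 1:
        funcnames = " ".join(sys.argv[1:])
    test(funcnames)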