String concatenation benchmarking weirdness

Rotwang

Hi all,

the other day I 2to3'ed some code and found it ran much slower in 3.3.0
than 2.7.2. I fixed the problem but in the process of trying to diagnose
it I've stumbled upon something weird that I hope someone here can
explain to me. In what follows I'm using Python 2.7.2 on 64-bit Windows
7. Suppose I do this:

from timeit import timeit

# find out how the time taken to append a character to the end of a byte
# string depends on the size of the string

results = []
for size in range(0, 10000001, 100000):
    results.append(timeit("y = x + 'a'",
                          setup = "x = 'a' * %i" % size, number = 1))

If I plot results against size, what I see is that the time taken
increases approximately linearly with the size of the string, with the
string of length 10000000 taking about 4 milliseconds. On the other
hand, if I replace the statement to be timed with "x = x + 'a'" instead
of "y = x + 'a'", the time taken seems to be pretty much independent of
size, apart from a few spikes; the string of length 10000000 takes about
4 microseconds.
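
To reproduce the two measurements side by side, something along these lines should work. It is only a sketch: the helper name time_concat is made up for illustration, the sizes are scaled down so it finishes quickly, and whether the second column stays flat depends on the interpreter and its version.

```python
from timeit import timeit

def time_concat(stmt, sizes):
    """Time a single execution of `stmt` for each starting length of x."""
    results = []
    for size in sizes:
        setup = "x = 'a' * %i" % size
        results.append(timeit(stmt, setup=setup, number=1))
    return results

sizes = range(0, 1000001, 100000)

# Result bound to a new name: the original string must stay alive.
new_name = time_concat("y = x + 'a'", sizes)
# Result rebinds x: the original string is discarded by the assignment.
same_name = time_concat("x = x + 'a'", sizes)

for size, t_new, t_same in zip(sizes, new_name, same_name):
    print("%8i  %.2e  %.2e" % (size, t_new, t_same))
```

With number = 1 each figure is a single noisy sample, so occasional spikes like the ones described above are to be expected.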

I get similar results with strings (but not bytes) in 3.3.0. My guess is
that this is some kind of optimisation that treats strings as mutable
when carrying out operations that result in the original string being
discarded. If so it's jolly clever, since it knows when there are other
references to the same string:

timeit("x = x + 'a'", setup = "x = y = 'a' * %i" % size, number = 1)
# grows linearly with size

timeit("x = x + 'a'", setup = "x, y = 'a' * %i, 'a' * %i"
       % (size, size), number = 1)
# stays approximately constant

It also can see through some attempts to fool it:

timeit("x = ('' + x) + 'a'", setup = "x = 'a' * %i" % size, number = 1)
# stays approximately constant

timeit("x = x*1 + 'a'", setup = "x = 'a' * %i" % size, number = 1)
# stays approximately constant

Is my guess correct? If not, what is going on? If so, is it possible to
explain to a programming noob how the interpreter does this? And is
there a reason why it doesn't work with bytes in 3.3?
 
Rotwang

Hi all,

the other day I 2to3'ed some code and found it ran much slower in 3.3.0 than
2.7.2. I fixed the problem but in the process of trying to diagnose it I've
stumbled upon something weird that I hope someone here can explain to me.

[stuff about timings]

Is my guess correct? If not, what is going on? If so, is it possible to
explain to a programming noob how the interpreter does this?

Basically, yes. You can find the discussion behind that optimization at:

http://bugs.python.org/issue980695

It knows when there are other references to the string because all
objects in CPython are reference-counted. It also works despite your
attempts to "fool" it because after evaluating the first operation
(which is easily optimized to return the string itself in both cases),
the remaining part of the expression is essentially "x = TOS + 'a'",
where x and the top of the stack are the same string object, which is
the same state the original code reaches after evaluating just the x.
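
Both points can be checked from Python itself. The following is a rough sketch that assumes CPython; the function names are invented for illustration, and the absolute values returned by sys.getrefcount differ between versions, so only the difference between the two cases is meaningful.

```python
import sys

def refcount_single(n):
    x = 'a' * n            # only x refers to the new string
    return sys.getrefcount(x)

def refcount_aliased(n):
    x = y = 'a' * n        # x and y both refer to the same string
    return sys.getrefcount(x)

# The alias shows up as one additional reference.
extra = refcount_aliased(1000) - refcount_single(1000)
print("extra references caused by the alias:", extra)

# The "fooling" sub-expressions hand back the very same object in CPython,
# so no copy has been made by the time the final + 'a' is evaluated.
n = 1000
x = 'a' * n
same_after_empty_concat = ('' + x) is x
same_after_times_one = (x * 1) is x
print(same_after_empty_concat, same_after_times_one)
```

The identity checks rely on CPython implementation details and are not guaranteed by the language.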

Nice, thanks.

The stated use case for this optimization is to make repeated
concatenation more efficient, but note that it is still generally
preferable to use the ''.join() construct, because the optimization is
specific to CPython and may not exist for other Python
implementations.
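
As a small illustration of the two styles (the function names and test data here are made up), both of the following build the same string, but only the second avoids depending on the CPython-specific optimization:

```python
def build_with_plus(pieces):
    """Repeated concatenation; fast only where the interpreter optimizes it."""
    s = ''
    for p in pieces:
        s += p
    return s

def build_with_join(pieces):
    """Collect the pieces first, then join them in a single pass."""
    return ''.join(pieces)

pieces = [str(i) for i in range(1000)]
plus_result = build_with_plus(pieces)
join_result = build_with_join(pieces)
print(plus_result == join_result)
```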

The slowdown in my code was caused by a method that built up a string of
bytes by repeatedly using +=, before writing the result to a WAV file.
My fix was to replace the bytes string with a bytearray, which seems
about as fast as the rewrite I just tried with b''.join. Do you know
whether the bytearray method will still be fast on other implementations?
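
For reference, the three approaches being compared look roughly like this. This is a sketch with invented names and dummy data rather than the actual WAV-writing code, and it makes no claim about relative speed on any particular implementation:

```python
def build_bytes_plus(chunks):
    """Repeated += on an immutable bytes object."""
    data = b''
    for c in chunks:
        data += c
    return data

def build_bytearray(chunks):
    """Append to a mutable bytearray, converting to bytes at the end."""
    buf = bytearray()
    for c in chunks:
        buf += c           # extends the existing buffer in place
    return bytes(buf)

def build_join(chunks):
    """Join all chunks in one call."""
    return b''.join(chunks)

# Dummy data standing in for the audio frames.
chunks = [bytes([i % 256]) * 4 for i in range(1000)]
results = [f(chunks) for f in (build_bytes_plus, build_bytearray, build_join)]
print(results[0] == results[1] == results[2], len(results[0]))
```

Because bytearray is mutable by definition, appending to it does not rely on the interpreter noticing that the old object is about to be discarded.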
 
wxjmfauth

from timeit import timeit, repeat

size = 1000

r = repeat("y = x + 'a'", setup = "x = 'a' * %i" % size)
print('1:', r)
r = repeat("y = x + 'é'", setup = "x = 'a' * %i" % size)
print('2:', r)
r = repeat("y = x + 'œ'", setup = "x = 'a' * %i" % size)
print('3:', r)
r = repeat("y = x + '€'", setup = "x = 'a' * %i" % size)
print('4:', r)
r = repeat("y = x + '€'", setup = "x = '€' * %i" % size)
print('5:', r)
r = repeat("y = x + 'œ'", setup = "x = 'œ' * %i" % size)
print('6:', r)
r = repeat("y = é + 'œ'", setup = "é = 'œ' * %i" % size)
print('7:', r)
r = repeat("y = é + 'œ'", setup = "é = '€' * %i" % size)
print('8:', r)


c:\python32\pythonw -u "vitesse3.py"
1: [0.3603178435286996, 0.42901157137281515, 0.35459694357592086]
2: [0.3576409223543202, 0.4272010951864649, 0.3590055732104662]
3: [0.3552022735516487, 0.4256544908828328, 0.35824546465278573]
4: [0.35488168890607774, 0.4271707696118834, 0.36109528098614074]
5: [0.3560675370237849, 0.4261538782668417, 0.36138160167082134]
6: [0.3570182634788317, 0.4270155971913008, 0.35770629956705324]
7: [0.3556977225493485, 0.4264969117143753, 0.3645634239700426]
8: [0.35511247834379844, 0.4259628665308437, 0.3580737510097034]
Exit code: 0
c:\Python33\pythonw -u "vitesse3.py"
1: [0.3053600256152646, 0.3306491917840535, 0.3044963374976518]
2: [0.36252767208680514, 0.36937298133086727, 0.3685573415262271]
3: [0.7666293438924097, 0.7653473991487574, 0.7630926729867262]
4: [0.7636680712265038, 0.7647586103955284, 0.7631395397838059]
5: [0.44721085450773934, 0.3863234021671369, 0.45664368355696094]
6: [0.44699700013114807, 0.3873974001136613, 0.45167383387335036]
7: [0.4465200615491014, 0.387050034441188, 0.45459690419205856]
8: [0.44760587465455437, 0.3875261853459726, 0.45421212384964704]
Exit code: 0


The difference between a correct (coherent) unicode handling and ...

jmf
 
Terry Reedy


The difference between a correct (coherent) unicode handling and ...

By 'correct' Jim means 'speedy', for a subset of string operations*,
rather than 'accurate'. In 3.2 and before, CPython does not handle
extended plane characters correctly on Windows and other narrow builds.
This is, by the way, true of many other languages. For instance, Tcl 8.5
and before (not sure about the new 8.6) does not handle them at all. The
same is true of Microsoft command windows.
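
A quick way to see the behaviour in 3.3 and later is sketched below. It assumes CPython 3.3+, where each string is stored with the narrowest character width that fits its contents; the variable names are only for illustration. On a narrow 3.2 build the same non-BMP character would have been stored as a surrogate pair and reported a length of 2.

```python
import sys

ascii_s = 'a' * 1000
bmp_s = '\u20ac' * 1000          # EURO SIGN, inside the Basic Multilingual Plane
astral_s = '\U0001F600' * 1000   # a character outside the BMP

# One code point, regardless of which plane it lives in.
astral_len = len('\U0001F600')
print("length of one non-BMP character:", astral_len)

# Memory use grows with the widest character in the string.
sizes = [sys.getsizeof(s) for s in (ascii_s, bmp_s, astral_s)]
print("bytes used by 1000-character strings:", sizes)
```

This per-string storage width is also why appending '€' to an all-ASCII string in the timings above costs more than appending 'a': the result has to be built with wider characters.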

* Let's try another comparison:

from timeit import timeit
print(timeit("a.encode()", "a = 'a'*10000"))

3.2: 12.1 seconds
3.3: 0.7 seconds

3.3 is 15 times faster!!! (The factor increases with the length of a.)

A fairer comparison is the approximately 120 microbenchmarks in
Tools/stringbench.py. Here they are, uncensored, for 3.3.0 and 3.2.3.
Stringbench is in the Tools directory of some distributions but not all
(it is not included on Windows). It can be downloaded from
http://hg.python.org/cpython/file/6fe28afa6611/Tools/stringbench

In Firefox, right-click on the stringbench.py link and 'Save link as...'
to somewhere you can run it from.

stringbench v2.0
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
2013-01-12 06:17:51.685781
bytes unicode
(in ms) (in ms) % comment
========== case conversion -- dense
0.41 0.43 95.2 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.42 0.43 95.8 ("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.41 0.43 95.8 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.42 0.43 96.3 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.83 1.95 94.1 s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.10 0.10 98.7 "Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.46 2.44 100.9 dna.count("AACT") (*10)
========== count newlines
0.77 0.75 103.6 ...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.30 0.27 110.5 ("A"*1000).find("A") (*1000)
0.45 0.06 750.5 "A" in "A"*1000 (*1000)
0.30 0.27 110.4 ("A"*1000).index("A") (*1000)
0.24 0.22 107.2 ("A"*1000).partition("A") (*1000)
0.33 0.29 116.6 ("A"*1000).rfind("A") (*1000)
0.32 0.29 107.9 ("A"*1000).rindex("A") (*1000)
0.20 0.21 94.1 ("A"*1000).rpartition("A") (*1000)
0.42 0.45 93.4 ("A"*1000).rsplit("A", 1) (*1000)
0.39 0.41 95.9 ("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.32 0.27 121.1 ("AB"*1000).find("AB") (*1000)
0.45 0.06 729.5 "AB" in "AB"*1000 (*1000)
0.30 0.27 111.2 ("AB"*1000).index("AB") (*1000)
0.23 0.28 85.0 ("AB"*1000).partition("AB") (*1000)
0.33 0.30 110.6 ("AB"*1000).rfind("AB") (*1000)
0.33 0.30 110.5 ("AB"*1000).rindex("AB") (*1000)
0.22 0.27 83.1 ("AB"*1000).rpartition("AB") (*1000)
0.46 0.47 96.7 ("AB"*1000).rsplit("AB", 1) (*1000)
0.44 0.48 90.9 ("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.24 0.29 84.0 "Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.26 0.28 92.9 "Andrew".endswith("Anders") (*1000)
========== endswith single character
0.25 0.28 90.0 "Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A 0.67 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A 0.06 0.0 "A".join("") (*100)
========== join empty string, with 5 character sep
N/A 0.06 0.0 "ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
0.87 1.27 68.8 "A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.14 1.54 74.0 "ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.27 0.37 72.0 "A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.32 0.43 75.7 "ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A 1.30 0.0 "A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A 1.37 0.0 "ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
3.25 3.23 100.5 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
2.79 2.78 100.4 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
1.98 1.94 102.3 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
3.24 3.23 100.3 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
4.26 3.62 117.7 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
3.23 3.23 100.1 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.32 2.32 100.1 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
3.23 3.21 100.8 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.58 3.57 100.4 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.60 3.60 100.0 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
3.60 3.56 101.2 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
0.62 0.58 106.3 ("AB"*300+"C").find("BC") (*1000)
0.92 0.82 111.8 ("AB"*300+"CA").find("CA") (*1000)
0.73 0.33 218.8 "BC" in ("AB"*300+"C") (*1000)
0.61 0.60 101.0 ("AB"*300+"C").index("BC") (*1000)
0.54 0.82 66.4 ("AB"*300+"C").partition("BC") (*1000)
0.66 0.63 104.6 ("C"+"AB"*300).rfind("CA") (*1000)
0.91 0.88 102.3 ("BC"+"AB"*300).rfind("BC") (*1000)
0.65 0.62 105.1 ("C"+"AB"*300).rindex("CA") (*1000)
0.53 0.56 94.5 ("C"+"AB"*300).rpartition("CA") (*1000)
0.75 0.77 96.6 ("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.65 0.67 97.0 ("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.89 0.87 102.3 ("A"*1000).find("B") (*1000)
1.03 0.64 159.1 "B" in "A"*1000 (*1000)
0.67 0.68 98.7 ("A"*1000).partition("B") (*1000)
0.87 0.85 102.8 ("A"*1000).rfind("B") (*1000)
0.67 0.68 98.5 ("A"*1000).rpartition("B") (*1000)
0.87 0.87 99.2 ("A"*1000).rsplit("B", 1) (*1000)
0.86 0.85 101.5 ("A"*1000).split("B", 1) (*1000)
========== no match, two characters
1.22 1.16 104.9 ("AB"*1000).find("BC") (*1000)
1.93 2.02 95.2 ("AB"*1000).find("CA") (*1000)
1.37 0.94 145.3 "BC" in "AB"*1000 (*1000)
1.39 2.14 65.1 ("AB"*1000).partition("BC") (*1000)
2.32 2.31 100.7 ("AB"*1000).rfind("BC") (*1000)
1.47 1.44 102.1 ("AB"*1000).rfind("CA") (*1000)
2.26 2.27 99.7 ("AB"*1000).rpartition("BC") (*1000)
2.46 2.45 100.2 ("AB"*1000).rsplit("BC", 1) (*1000)
1.15 1.16 99.1 ("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.13 0.12 105.0 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.12 0.12 105.2 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.08 0.10 80.6 "A"*10 (*1000)
========== repeat 1 character 1000 times
0.16 0.18 93.1 "A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11 0.13 84.4 "ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.39 0.41 94.8 "ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
2.02 2.36 85.6 "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
3.12 3.23 96.6 dna.replace("ATC", "ATT") (*10)
========== replace single character
0.33 0.40 82.4 "This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.75 0.86 87.4 "...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.41 0.48 86.1 "When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.14 0.18 79.3 ("Here are some words. "*2).partition(" ") (*1000)
0.11 0.14 75.1 ("Here are some words. "*2).rpartition(" ") (*1000)
0.35 0.39 90.3 ("Here are some words. "*2).rsplit(None, 1) (*1000)
0.32 0.38 83.9 ("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.74 2.02 86.3 "...text...".rsplit("\n") (*10)
1.69 1.97 85.5 "...text...".split("\n") (*10)
1.89 2.55 74.0 "...text...".splitlines() (*10)
========== split newlines
0.35 0.39 88.9 "this\nis\na\ntest\n".rsplit("\n") (*1000)
0.34 0.40 86.4 "this\nis\na\ntest\n".split("\n") (*1000)
0.32 0.40 80.7 "this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.28 2.30 99.1 dna.rsplit("ACTAT") (*10)
2.63 2.66 98.9 dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.55 0.69 79.0 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.58 0.70 82.9 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.51 2.12 71.4 human_text.rsplit() (*10)
1.51 2.05 73.6 human_text.split() (*10)
========== split whitespace (small)
0.48 0.68 70.1 ("Here are some words. "*2).rsplit() (*1000)
0.48 0.64 74.9 ("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.24 0.25 95.9 "Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.24 0.25 95.7 "Andrew".startswith("Anders") (*1000)
========== startswith single character
0.23 0.25 95.4 "Andrew".startswith("A") (*1000)
========== strip terminal newline
0.09 0.21 44.1 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.09 0.12 74.0 "\nHello!".rstrip() (*1000)
0.09 0.12 74.0 "Hello!\n".rstrip() (*1000)
0.09 0.12 71.6 "\nHello!\n".strip() (*1000)
0.09 0.12 73.2 "\nHello!".strip() (*1000)
0.09 0.12 72.9 "Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.09 0.13 69.6 "\t \tHello".rstrip() (*1000)
0.09 0.13 72.3 "Hello\t \t".rstrip() (*1000)
0.07 0.08 86.8 "Hello\t \t".strip() (*1000)
========== tab split
0.59 0.65 90.9 GFF3_example.rsplit("\t", 8) (*1000)
0.55 0.59 94.2 GFF3_example.rsplit("\t") (*1000)
0.52 0.57 90.7 GFF3_example.split("\t", 8) (*1000)
0.52 0.57 90.1 GFF3_example.split("\t") (*1000)
108.87 116.31 93.6 TOTAL

stringbench v2.0
3.2.3 (default, Apr 11 2012, 07:12:16) [MSC v.1500 64 bit (AMD64)]
2013-01-12 06:23:05.994000
bytes unicode
(in ms) (in ms) % comment
========== case conversion -- dense
0.63 3.01 21.0 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.63 2.90 21.5 ("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.84 2.83 29.8 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.50 3.47 14.3 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.82 1.75 103.9 s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.09 0.08 115.5 "Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.40 2.64 91.1 dna.count("AACT") (*10)
========== count newlines
0.77 0.75 101.6 ...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.19 0.18 101.9 ("A"*1000).find("A") (*1000)
0.39 0.05 824.7 "A" in "A"*1000 (*1000)
0.19 0.19 96.3 ("A"*1000).index("A") (*1000)
0.20 0.22 87.5 ("A"*1000).partition("A") (*1000)
0.20 0.20 101.8 ("A"*1000).rfind("A") (*1000)
0.20 0.20 101.2 ("A"*1000).rindex("A") (*1000)
0.18 0.22 82.5 ("A"*1000).rpartition("A") (*1000)
0.41 0.45 91.7 ("A"*1000).rsplit("A", 1) (*1000)
0.42 0.43 99.0 ("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.19 0.19 102.3 ("AB"*1000).find("AB") (*1000)
0.39 0.05 781.6 "AB" in "AB"*1000 (*1000)
0.19 0.20 97.9 ("AB"*1000).index("AB") (*1000)
0.23 0.33 71.1 ("AB"*1000).partition("AB") (*1000)
0.20 0.20 101.6 ("AB"*1000).rfind("AB") (*1000)
0.20 0.20 100.1 ("AB"*1000).rindex("AB") (*1000)
0.22 0.31 70.4 ("AB"*1000).rpartition("AB") (*1000)
0.47 0.53 90.0 ("AB"*1000).rsplit("AB", 1) (*1000)
0.45 0.52 85.0 ("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.18 0.18 97.6 "Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.18 0.18 100.4 "Andrew".endswith("Anders") (*1000)
========== endswith single character
0.18 0.18 97.1 "Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A 0.53 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A 0.05 0.0 "A".join("") (*100)
========== join empty string, with 5 character sep
N/A 0.05 0.0 "ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
1.02 1.02 99.6 "A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.25 1.48 84.4 "ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.31 0.25 122.9 "A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.36 0.41 88.4 "ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A 1.06 0.0 "A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A 1.22 0.0 "ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
2.52 2.68 94.0 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
2.35 3.06 76.9 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
1.55 1.61 96.2 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
2.51 2.68 94.0 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
3.57 4.66 76.7 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
3.23 3.24 99.8 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.35 2.56 91.7 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
3.23 3.24 99.8 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.58 3.92 91.4 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.62 3.96 91.4 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
2.89 3.38 85.4 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
0.52 0.52 99.5 ("AB"*300+"C").find("BC") (*1000)
0.69 0.90 76.5 ("AB"*300+"CA").find("CA") (*1000)
0.67 0.37 179.2 "BC" in ("AB"*300+"C") (*1000)
0.51 0.53 96.8 ("AB"*300+"C").index("BC") (*1000)
0.48 0.81 59.3 ("AB"*300+"C").partition("BC") (*1000)
0.55 0.55 101.5 ("C"+"AB"*300).rfind("CA") (*1000)
0.85 0.85 100.0 ("BC"+"AB"*300).rfind("BC") (*1000)
0.55 0.55 100.3 ("C"+"AB"*300).rindex("CA") (*1000)
0.52 0.60 87.1 ("C"+"AB"*300).rpartition("CA") (*1000)
0.78 0.82 95.4 ("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.65 0.72 91.2 ("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.77 0.77 100.6 ("A"*1000).find("B") (*1000)
0.98 0.63 155.1 "B" in "A"*1000 (*1000)
0.66 0.66 99.7 ("A"*1000).partition("B") (*1000)
0.77 0.77 100.4 ("A"*1000).rfind("B") (*1000)
0.66 0.66 99.7 ("A"*1000).rpartition("B") (*1000)
0.88 0.88 100.4 ("A"*1000).rsplit("B", 1) (*1000)
0.88 0.87 101.2 ("A"*1000).split("B", 1) (*1000)
========== no match, two characters
1.19 1.21 98.1 ("AB"*1000).find("BC") (*1000)
1.79 2.51 71.2 ("AB"*1000).find("CA") (*1000)
1.28 1.08 119.1 "BC" in "AB"*1000 (*1000)
1.10 2.11 52.1 ("AB"*1000).partition("BC") (*1000)
2.37 2.37 100.0 ("AB"*1000).rfind("BC") (*1000)
1.36 1.36 100.5 ("AB"*1000).rfind("CA") (*1000)
2.25 2.26 99.9 ("AB"*1000).rpartition("BC") (*1000)
2.38 2.62 90.7 ("AB"*1000).rsplit("BC", 1) (*1000)
1.18 1.30 90.1 ("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.12 0.32 37.1 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.12 0.30 37.9 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.08 0.09 90.3 "A"*10 (*1000)
========== repeat 1 character 1000 times
0.16 0.19 82.2 "A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11 0.12 98.3 "ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.40 0.58 67.9 "ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
1.95 2.13 91.7 "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
2.93 3.25 90.3 dna.replace("ATC", "ATT") (*10)
========== replace single character
0.25 0.26 96.6 "This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.73 1.01 72.0 "...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.30 0.34 89.0 "When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.12 0.13 93.3 ("Here are some words. "*2).partition(" ") (*1000)
0.11 0.11 98.8 ("Here are some words. "*2).rpartition(" ") (*1000)
0.32 0.37 86.5 ("Here are some words. "*2).rsplit(None, 1) (*1000)
0.32 0.33 96.9 ("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.76 2.19 80.5 "...text...".rsplit("\n") (*10)
1.72 2.10 81.9 "...text...".split("\n") (*10)
1.87 2.58 72.4 "...text...".splitlines() (*10)
========== split newlines
0.36 0.34 103.9 "this\nis\na\ntest\n".rsplit("\n") (*1000)
0.35 0.33 105.9 "this\nis\na\ntest\n".split("\n") (*1000)
0.31 0.34 89.7 "this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.18 2.34 93.4 dna.rsplit("ACTAT") (*10)
2.50 2.64 94.5 dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.59 0.62 95.3 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.55 0.59 93.1 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.54 2.34 65.5 human_text.rsplit() (*10)
1.51 2.22 68.3 human_text.split() (*10)
========== split whitespace (small)
0.46 0.60 76.5 ("Here are some words. "*2).rsplit() (*1000)
0.45 0.51 87.6 ("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.18 0.18 97.3 "Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.18 0.18 100.1 "Andrew".startswith("Anders") (*1000)
========== startswith single character
0.17 0.18 96.8 "Andrew".startswith("A") (*1000)
========== strip terminal newline
0.11 0.21 52.0 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.06 0.07 92.1 "\nHello!".rstrip() (*1000)
0.06 0.07 92.2 "Hello!\n".rstrip() (*1000)
0.06 0.07 91.2 "\nHello!\n".strip() (*1000)
0.06 0.07 91.1 "\nHello!".strip() (*1000)
0.06 0.07 91.1 "Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.07 0.07 89.4 "\t \tHello".rstrip() (*1000)
0.07 0.07 91.4 "Hello\t \t".rstrip() (*1000)
0.04 0.05 88.7 "Hello\t \t".strip() (*1000)
========== tab split
0.57 0.56 100.8 GFF3_example.rsplit("\t", 8) (*1000)
0.53 0.53 100.7 GFF3_example.rsplit("\t") (*1000)
0.49 0.49 101.2 GFF3_example.split("\t", 8) (*1000)
0.51 0.49 103.5 GFF3_example.split("\t") (*1000)
102.13 125.57 81.3 TOTAL
 
Ian Kelly

The difference between a correct (coherent) unicode handling and ...

This thread was about byte string concatenation, not unicode, so your
rant is not even on-topic here.
 
