String concatenation benchmarking weirdness

Discussion in 'Python' started by Rotwang, Jan 11, 2013.

  1. Rotwang

    Rotwang Guest

    Hi all,

    the other day I 2to3'ed some code and found it ran much slower in 3.3.0
    than 2.7.2. I fixed the problem but in the process of trying to diagnose
    it I've stumbled upon something weird that I hope someone here can
    explain to me. In what follows I'm using Python 2.7.2 on 64-bit Windows
    7. Suppose I do this:

    from timeit import timeit

    # find out how the time taken to append a character to the end of a byte
    # string depends on the size of the string

    results = []
    for size in range(0, 10000001, 100000):
        results.append(timeit("y = x + 'a'",
                              setup = "x = 'a' * %i" % size, number = 1))

    If I plot results against size, what I see is that the time taken
    increases approximately linearly with the size of the string, with the
    string of length 10000000 taking about 4 milliseconds. On the other
    hand, if I replace the statement to be timed with "x = x + 'a'" instead
    of "y = x + 'a'", the time taken seems to be pretty much independent of
    size, apart from a few spikes; the string of length 10000000 takes about
    4 microseconds.
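A minimal sketch of the comparison just described, written so it also runs on Python 3. The helper names `time_append` and `compare` are made up for illustration and are not part of timeit; the sizes are arbitrary, and the absolute numbers will differ from machine to machine.

```python
from timeit import timeit

def time_append(stmt, size, setup_template="x = 'a' * %d"):
    """Time one execution of stmt after building a string of the given size."""
    return timeit(stmt, setup=setup_template % size, number=1)

def compare(sizes=(10**5, 10**6, 10**7)):
    """Return {size: (rebind_time, new_name_time)} for the two statements."""
    return {
        size: (
            time_append("x = x + 'a'", size),  # result rebinds the same name
            time_append("y = x + 'a'", size),  # result goes to a new name
        )
        for size in sizes
    }
```

Calling `compare()` and printing the pairs side by side should make the difference between the two statements visible, if the interpreter in use performs the optimisation at all.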

    I get similar results with strings (but not bytes) in 3.3.0. My guess is
    that this is some kind of optimisation that treats strings as mutable
    when carrying out operations that result in the original string being
    discarded. If so it's jolly clever, since it knows when there are other
    references to the same string:

    timeit("x = x + 'a'", setup = "x = y = 'a' * %i" % size, number = 1)
    # grows linearly with size

    timeit("x = x + 'a'", setup = "x, y = 'a' * %i, 'a' * %i"
           % (size, size), number = 1)
    # stays approximately constant

    It also can see through some attempts to fool it:

    timeit("x = ('' + x) + 'a'", setup = "x = 'a' * %i" % size, number = 1)
    # stays approximately constant

    timeit("x = x*1 + 'a'", setup = "x = 'a' * %i" % size, number = 1)
    # stays approximately constant

    Is my guess correct? If not, what is going on? If so, is it possible to
    explain to a programming noob how the interpreter does this? And is
    there a reason why it doesn't work with bytes in 3.3?


    --
    I have made a thing that superficially resembles music:

    http://soundcloud.com/eroneity/we-berated-our-own-crapiness
    Rotwang, Jan 11, 2013
    #1

  2. Rotwang

    Rotwang Guest

    On 11/01/2013 20:16, Ian Kelly wrote:
    > On Fri, Jan 11, 2013 at 12:03 PM, Rotwang <> wrote:
    >> Hi all,
    >>
    >> the other day I 2to3'ed some code and found it ran much slower in 3.3.0 than
    >> 2.7.2. I fixed the problem but in the process of trying to diagnose it I've
    >> stumbled upon something weird that I hope someone here can explain to me.
    >>
    >> [stuff about timings]
    >>
    >> Is my guess correct? If not, what is going on? If so, is it possible to
    >> explain to a programming noob how the interpreter does this?

    >
    > Basically, yes. You can find the discussion behind that optimization at:
    >
    > http://bugs.python.org/issue980695
    >
    > It knows when there are other references to the string because all
    > objects in CPython are reference-counted. It also works despite your
    > attempts to "fool" it because after evaluating the first operation
    > (which is easily optimized to return the string itself in both cases),
    > the remaining part of the expression is essentially "x = TOS + 'a'",
    > where x and the top of the stack are the same string object, which is
    > the same state the original code reaches after evaluating just the x.


    Nice, thanks.
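To illustrate the reference-counting point quoted above, here is a small sketch using `sys.getrefcount`. It only shows that CPython can tell when a second name refers to the same string; it does not reproduce the concatenation optimisation itself. The absolute value returned by `getrefcount` is an implementation detail, so only the difference between the two readings is meaningful here.

```python
import sys

n = 1000
x = "a" * n                    # built at run time, so only x refers to it
before = sys.getrefcount(x)

y = x                          # a second name bound to the same object
after = sys.getrefcount(x)

# The count goes up by one when another reference exists; this is the
# kind of information the interpreter has available when deciding
# whether a string can safely be resized in place.
print("extra references after 'y = x':", after - before)
```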


    > The stated use case for this optimization is to make repeated
    > concatenation more efficient, but note that it is still generally
    > preferable to use the ''.join() construct, because the optimization is
    > specific to CPython and may not exist for other Python
    > implementations.


    The slowdown in my code was caused by a method that built up a string of
    bytes by repeatedly using +=, before writing the result to a WAV file.
    My fix was to replace the bytes string with a bytearray, which seems
    about as fast as the rewrite I just tried with b''.join. Do you know
    whether the bytearray method will still be fast on other implementations?
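The two approaches mentioned here can be sketched as follows. The function names are hypothetical and the chunk data is invented for the example; this is not the original WAV-writing code. Neither version relies on the CPython-specific string optimisation, since `bytearray` is mutable by design and `join` builds the result in a single call, but the sketch makes no claim about how fast either one is on other implementations.

```python
def build_with_join(chunks):
    """Collect the pieces first, then concatenate them in a single call."""
    return b"".join(chunks)

def build_with_bytearray(chunks):
    """Grow a mutable buffer in place, then freeze it as bytes."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk           # extends the existing bytearray object
    return bytes(buf)

# Example data: 1000 four-byte chunks.
chunks = [bytes([i % 256]) * 4 for i in range(1000)]
print(build_with_join(chunks) == build_with_bytearray(chunks))
```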


    --
    I have made a thing that superficially resembles music:

    http://soundcloud.com/eroneity/we-berated-our-own-crapiness
    Rotwang, Jan 11, 2013
    #2

  3. Rotwang

    Guest

    from timeit import timeit, repeat

    size = 1000

    r = repeat("y = x + 'a'", setup = "x = 'a' * %i" % size)
    print('1:', r)
    r = repeat("y = x + 'é'", setup = "x = 'a' * %i" % size)
    print('2:', r)
    r = repeat("y = x + 'œ'", setup = "x = 'a' * %i" % size)
    print('3:', r)
    r = repeat("y = x + '€'", setup = "x = 'a' * %i" % size)
    print('4:', r)
    r = repeat("y = x + '€'", setup = "x = '€' * %i" % size)
    print('5:', r)
    r = repeat("y = x + 'œ'", setup = "x = 'œ' * %i" % size)
    print('6:', r)
    r = repeat("y = é + 'œ'", setup = "é = 'œ' * %i" % size)
    print('7:', r)
    r = repeat("y = é + 'œ'", setup = "é = '€' * %i" % size)
    print('8:', r)



    >c:\python32\pythonw -u "vitesse3.py"

    1: [0.3603178435286996, 0.42901157137281515, 0.35459694357592086]
    2: [0.3576409223543202, 0.4272010951864649, 0.3590055732104662]
    3: [0.3552022735516487, 0.4256544908828328, 0.35824546465278573]
    4: [0.35488168890607774, 0.4271707696118834, 0.36109528098614074]
    5: [0.3560675370237849, 0.4261538782668417, 0.36138160167082134]
    6: [0.3570182634788317, 0.4270155971913008, 0.35770629956705324]
    7: [0.3556977225493485, 0.4264969117143753, 0.3645634239700426]
    8: [0.35511247834379844, 0.4259628665308437, 0.3580737510097034]
    >Exit code: 0
    >c:\Python33\pythonw -u "vitesse3.py"

    1: [0.3053600256152646, 0.3306491917840535, 0.3044963374976518]
    2: [0.36252767208680514, 0.36937298133086727, 0.3685573415262271]
    3: [0.7666293438924097, 0.7653473991487574, 0.7630926729867262]
    4: [0.7636680712265038, 0.7647586103955284, 0.7631395397838059]
    5: [0.44721085450773934, 0.3863234021671369, 0.45664368355696094]
    6: [0.44699700013114807, 0.3873974001136613, 0.45167383387335036]
    7: [0.4465200615491014, 0.387050034441188, 0.45459690419205856]
    8: [0.44760587465455437, 0.3875261853459726, 0.45421212384964704]
    >Exit code: 0



    The difference between a correct (coherent) unicode handling and ...

    jmf
    , Jan 12, 2013
    #3
  4. Rotwang

    Terry Reedy Guest

    On 1/12/2013 3:38 AM, wrote:
    > from timeit import timeit, repeat
    >
    > size = 1000
    >
    > r = repeat("y = x + 'a'", setup = "x = 'a' * %i" % size)
    > print('1:', r)
    > r = repeat("y = x + 'é'", setup = "x = 'a' * %i" % size)
    > print('2:', r)
    > r = repeat("y = x + 'œ'", setup = "x = 'a' * %i" % size)
    > print('3:', r)
    > r = repeat("y = x + '€'", setup = "x = 'a' * %i" % size)
    > print('4:', r)
    > r = repeat("y = x + '€'", setup = "x = '€' * %i" % size)
    > print('5:', r)
    > r = repeat("y = x + 'œ'", setup = "x = 'œ' * %i" % size)
    > print('6:', r)
    > r = repeat("y = é + 'œ'", setup = "é = 'œ' * %i" % size)
    > print('7:', r)
    > r = repeat("y = é + 'œ'", setup = "é = '€' * %i" % size)
    > print('8:', r)
    >
    >
    >
    >> c:\python32\pythonw -u "vitesse3.py"

    > 1: [0.3603178435286996, 0.42901157137281515, 0.35459694357592086]
    > 2: [0.3576409223543202, 0.4272010951864649, 0.3590055732104662]
    > 3: [0.3552022735516487, 0.4256544908828328, 0.35824546465278573]
    > 4: [0.35488168890607774, 0.4271707696118834, 0.36109528098614074]
    > 5: [0.3560675370237849, 0.4261538782668417, 0.36138160167082134]
    > 6: [0.3570182634788317, 0.4270155971913008, 0.35770629956705324]
    > 7: [0.3556977225493485, 0.4264969117143753, 0.3645634239700426]
    > 8: [0.35511247834379844, 0.4259628665308437, 0.3580737510097034]
    >> Exit code: 0
    >> c:\Python33\pythonw -u "vitesse3.py"

    > 1: [0.3053600256152646, 0.3306491917840535, 0.3044963374976518]
    > 2: [0.36252767208680514, 0.36937298133086727, 0.3685573415262271]
    > 3: [0.7666293438924097, 0.7653473991487574, 0.7630926729867262]
    > 4: [0.7636680712265038, 0.7647586103955284, 0.7631395397838059]
    > 5: [0.44721085450773934, 0.3863234021671369, 0.45664368355696094]
    > 6: [0.44699700013114807, 0.3873974001136613, 0.45167383387335036]
    > 7: [0.4465200615491014, 0.387050034441188, 0.45459690419205856]
    > 8: [0.44760587465455437, 0.3875261853459726, 0.45421212384964704]
    >> Exit code: 0

    >
    >
    > The difference between a correct (coherent) unicode handling and ...


    By 'correct' Jim means 'speedy', for a subset of string operations*,
    rather than 'accurate'. In 3.2 and before, CPython does not handle
    extended plane characters correctly on Windows and other narrow builds.
    This is, by the way, true of many other languages. For instance, Tcl 8.5
    and before (not sure about the new 8.6) does not handle them at all. The
    same is true of Microsoft command windows.
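A small sketch of what "extended plane" means in practice. The helper `utf16_code_units` is made up for this example. On Python 3.3 and later a character outside the Basic Multilingual Plane counts as one character, even though UTF-16 needs two code units (a surrogate pair) to store it; on a narrow build of 3.2 or earlier, `len` would report the two code units instead.

```python
def utf16_code_units(s):
    """Number of 16-bit units needed to store s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

bmp_char = "\u20ac"            # EURO SIGN, inside the BMP
astral_char = "\U0001F600"     # a code point beyond U+FFFF

for ch in (bmp_char, astral_char):
    print(hex(ord(ch)), "len:", len(ch), "UTF-16 units:", utf16_code_units(ch))
```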

    * let's try another comparison:

    from timeit import timeit
    print(timeit("a.encode()", "a = 'a'*10000"))

    3.2: 12.1 seconds
    3.3: 0.7 seconds

    3.3 is 15 times faster!!! (The factor increases with the length of a.)
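For background: Python 3.3 changed the internal string representation (PEP 393) so that the storage per character depends on the widest character in the string. The sketch below is one way to observe this on CPython; `bytes_per_char` is a hypothetical helper, `sys.getsizeof` results are CPython-specific, and linking this to the timings in this thread is an interpretation rather than something measured here.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Extra bytes needed to store one more copy of ch in a long string."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+0153": "\u0153",
    "BMP U+20AC": "\u20ac",
    "astral U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    print(label, bytes_per_char(ch))
```

On CPython 3.3 or later I would expect this to report 1, 1, 2, 2 and 4 bytes respectively. That would also fit the earlier timings in which appending a non-Latin-1 character to an all-ASCII string was the slow case, since the result has to be stored in a wider form.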

    A fairer comparison is the approximately 120 micro benchmarks in
    Tools/stringbench.py. Here they are, uncensored, for 3.3.0 and 3.2.3.
    The Tools directory ships with some distributions but not all (not,
    for instance, the Windows one). stringbench can be downloaded from
    http://hg.python.org/cpython/file/6fe28afa6611/Tools/stringbench

    In Firefox, right-click on the stringbench.py link and 'Save link as...'
    to somewhere you can run it from.

    >>>

    stringbench v2.0
    3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
    2013-01-12 06:17:51.685781
    bytes unicode
    (in ms) (in ms) % comment
    ========== case conversion -- dense
    0.41 0.43 95.2 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
    0.42 0.43 95.8 ("where in the world is carmen san deigo?"*10).upper() (*1000)
    ========== case conversion -- rare
    0.41 0.43 95.8 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
    0.42 0.43 96.3 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
    ========== concat 20 strings of words length 4 to 15
    1.83 1.95 94.1 s1+s2+s3+s4+...+s20 (*1000)
    ========== concat two strings
    0.10 0.10 98.7 "Andrew"+"Dalke" (*1000)
    ========== count AACT substrings in DNA example
    2.46 2.44 100.9 dna.count("AACT") (*10)
    ========== count newlines
    0.77 0.75 103.6 ...text.with.2000.newlines.count("\n") (*10)
    ========== early match, single character
    0.30 0.27 110.5 ("A"*1000).find("A") (*1000)
    0.45 0.06 750.5 "A" in "A"*1000 (*1000)
    0.30 0.27 110.4 ("A"*1000).index("A") (*1000)
    0.24 0.22 107.2 ("A"*1000).partition("A") (*1000)
    0.33 0.29 116.6 ("A"*1000).rfind("A") (*1000)
    0.32 0.29 107.9 ("A"*1000).rindex("A") (*1000)
    0.20 0.21 94.1 ("A"*1000).rpartition("A") (*1000)
    0.42 0.45 93.4 ("A"*1000).rsplit("A", 1) (*1000)
    0.39 0.41 95.9 ("A"*1000).split("A", 1) (*1000)
    ========== early match, two characters
    0.32 0.27 121.1 ("AB"*1000).find("AB") (*1000)
    0.45 0.06 729.5 "AB" in "AB"*1000 (*1000)
    0.30 0.27 111.2 ("AB"*1000).index("AB") (*1000)
    0.23 0.28 85.0 ("AB"*1000).partition("AB") (*1000)
    0.33 0.30 110.6 ("AB"*1000).rfind("AB") (*1000)
    0.33 0.30 110.5 ("AB"*1000).rindex("AB") (*1000)
    0.22 0.27 83.1 ("AB"*1000).rpartition("AB") (*1000)
    0.46 0.47 96.7 ("AB"*1000).rsplit("AB", 1) (*1000)
    0.44 0.48 90.9 ("AB"*1000).split("AB", 1) (*1000)
    ========== endswith multiple characters
    0.24 0.29 84.0 "Andrew".endswith("Andrew") (*1000)
    ========== endswith multiple characters - not!
    0.26 0.28 92.9 "Andrew".endswith("Anders") (*1000)
    ========== endswith single character
    0.25 0.28 90.0 "Andrew".endswith("w") (*1000)
    ========== formatting a string type with a dict
    N/A 0.67 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
    ========== join empty string, with 1 character sep
    N/A 0.06 0.0 "A".join("") (*100)
    ========== join empty string, with 5 character sep
    N/A 0.06 0.0 "ABCDE".join("") (*100)
    ========== join list of 100 words, with 1 character sep
    0.87 1.27 68.8 "A".join(["Bob"]*100)) (*1000)
    ========== join list of 100 words, with 5 character sep
    1.14 1.54 74.0 "ABCDE".join(["Bob"]*100)) (*1000)
    ========== join list of 26 characters, with 1 character sep
    0.27 0.37 72.0 "A".join(list("ABC..Z")) (*1000)
    ========== join list of 26 characters, with 5 character sep
    0.32 0.43 75.7 "ABCDE".join(list("ABC..Z")) (*1000)
    ========== join string with 26 characters, with 1 character sep
    N/A 1.30 0.0 "A".join("ABC..Z") (*1000)
    ========== join string with 26 characters, with 5 character sep
    N/A 1.37 0.0 "ABCDE".join("ABC..Z") (*1000)
    ========== late match, 100 characters
    3.25 3.23 100.5 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
    2.79 2.78 100.4 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
    1.98 1.94 102.3 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
    3.24 3.23 100.3 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
    4.26 3.62 117.7 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
    3.23 3.23 100.1 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
    2.32 2.32 100.1 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
    3.23 3.21 100.8 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
    3.58 3.57 100.4 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
    3.60 3.60 100.0 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
    3.60 3.56 101.2 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
    ========== late match, two characters
    0.62 0.58 106.3 ("AB"*300+"C").find("BC") (*1000)
    0.92 0.82 111.8 ("AB"*300+"CA").find("CA") (*1000)
    0.73 0.33 218.8 "BC" in ("AB"*300+"C") (*1000)
    0.61 0.60 101.0 ("AB"*300+"C").index("BC") (*1000)
    0.54 0.82 66.4 ("AB"*300+"C").partition("BC") (*1000)
    0.66 0.63 104.6 ("C"+"AB"*300).rfind("CA") (*1000)
    0.91 0.88 102.3 ("BC"+"AB"*300).rfind("BC") (*1000)
    0.65 0.62 105.1 ("C"+"AB"*300).rindex("CA") (*1000)
    0.53 0.56 94.5 ("C"+"AB"*300).rpartition("CA") (*1000)
    0.75 0.77 96.6 ("C"+"AB"*300).rsplit("CA", 1) (*1000)
    0.65 0.67 97.0 ("AB"*300+"C").split("BC", 1) (*1000)
    ========== no match, single character
    0.89 0.87 102.3 ("A"*1000).find("B") (*1000)
    1.03 0.64 159.1 "B" in "A"*1000 (*1000)
    0.67 0.68 98.7 ("A"*1000).partition("B") (*1000)
    0.87 0.85 102.8 ("A"*1000).rfind("B") (*1000)
    0.67 0.68 98.5 ("A"*1000).rpartition("B") (*1000)
    0.87 0.87 99.2 ("A"*1000).rsplit("B", 1) (*1000)
    0.86 0.85 101.5 ("A"*1000).split("B", 1) (*1000)
    ========== no match, two characters
    1.22 1.16 104.9 ("AB"*1000).find("BC") (*1000)
    1.93 2.02 95.2 ("AB"*1000).find("CA") (*1000)
    1.37 0.94 145.3 "BC" in "AB"*1000 (*1000)
    1.39 2.14 65.1 ("AB"*1000).partition("BC") (*1000)
    2.32 2.31 100.7 ("AB"*1000).rfind("BC") (*1000)
    1.47 1.44 102.1 ("AB"*1000).rfind("CA") (*1000)
    2.26 2.27 99.7 ("AB"*1000).rpartition("BC") (*1000)
    2.46 2.45 100.2 ("AB"*1000).rsplit("BC", 1) (*1000)
    1.15 1.16 99.1 ("AB"*1000).split("BC", 1) (*1000)
    ========== quick replace multiple character match
    0.13 0.12 105.0 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
    ========== quick replace single character match
    0.12 0.12 105.2 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
    ========== repeat 1 character 10 times
    0.08 0.10 80.6 "A"*10 (*1000)
    ========== repeat 1 character 1000 times
    0.16 0.18 93.1 "A"*1000 (*1000)
    ========== repeat 5 characters 10 times
    0.11 0.13 84.4 "ABCDE"*10 (*1000)
    ========== repeat 5 characters 1000 times
    0.39 0.41 94.8 "ABCDE"*1000 (*1000)
    ========== replace and expand multiple characters, big string
    2.02 2.36 85.6 "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
    ========== replace multiple characters, dna
    3.12 3.23 96.6 dna.replace("ATC", "ATT") (*10)
    ========== replace single character
    0.33 0.40 82.4 "This is a test".replace(" ", "\t") (*1000)
    ========== replace single character, big string
    0.75 0.86 87.4 "...text.with.2000.lines...replace("\n", " ") (*10)
    ========== replace/remove multiple characters
    0.41 0.48 86.1 "When shall we three meet again?".replace("ee", "") (*1000)
    ========== split 1 whitespace
    0.14 0.18 79.3 ("Here are some words. "*2).partition(" ") (*1000)
    0.11 0.14 75.1 ("Here are some words. "*2).rpartition(" ") (*1000)
    0.35 0.39 90.3 ("Here are some words. "*2).rsplit(None, 1) (*1000)
    0.32 0.38 83.9 ("Here are some words. "*2).split(None, 1) (*1000)
    ========== split 2000 newlines
    1.74 2.02 86.3 "...text...".rsplit("\n") (*10)
    1.69 1.97 85.5 "...text...".split("\n") (*10)
    1.89 2.55 74.0 "...text...".splitlines() (*10)
    ========== split newlines
    0.35 0.39 88.9 "this\nis\na\ntest\n".rsplit("\n") (*1000)
    0.34 0.40 86.4 "this\nis\na\ntest\n".split("\n") (*1000)
    0.32 0.40 80.7 "this\nis\na\ntest\n".splitlines() (*1000)
    ========== split on multicharacter separator (dna)
    2.28 2.30 99.1 dna.rsplit("ACTAT") (*10)
    2.63 2.66 98.9 dna.split("ACTAT") (*10)
    ========== split on multicharacter separator (small)
    0.55 0.69 79.0 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
    0.58 0.70 82.9 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
    ========== split whitespace (huge)
    1.51 2.12 71.4 human_text.rsplit() (*10)
    1.51 2.05 73.6 human_text.split() (*10)
    ========== split whitespace (small)
    0.48 0.68 70.1 ("Here are some words. "*2).rsplit() (*1000)
    0.48 0.64 74.9 ("Here are some words. "*2).split() (*1000)
    ========== startswith multiple characters
    0.24 0.25 95.9 "Andrew".startswith("Andrew") (*1000)
    ========== startswith multiple characters - not!
    0.24 0.25 95.7 "Andrew".startswith("Anders") (*1000)
    ========== startswith single character
    0.23 0.25 95.4 "Andrew".startswith("A") (*1000)
    ========== strip terminal newline
    0.09 0.21 44.1 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
    0.09 0.12 74.0 "\nHello!".rstrip() (*1000)
    0.09 0.12 74.0 "Hello!\n".rstrip() (*1000)
    0.09 0.12 71.6 "\nHello!\n".strip() (*1000)
    0.09 0.12 73.2 "\nHello!".strip() (*1000)
    0.09 0.12 72.9 "Hello!\n".strip() (*1000)
    ========== strip terminal spaces and tabs
    0.09 0.13 69.6 "\t \tHello".rstrip() (*1000)
    0.09 0.13 72.3 "Hello\t \t".rstrip() (*1000)
    0.07 0.08 86.8 "Hello\t \t".strip() (*1000)
    ========== tab split
    0.59 0.65 90.9 GFF3_example.rsplit("\t", 8) (*1000)
    0.55 0.59 94.2 GFF3_example.rsplit("\t") (*1000)
    0.52 0.57 90.7 GFF3_example.split("\t", 8) (*1000)
    0.52 0.57 90.1 GFF3_example.split("\t") (*1000)
    108.87 116.31 93.6 TOTAL
    >>>

    stringbench v2.0
    3.2.3 (default, Apr 11 2012, 07:12:16) [MSC v.1500 64 bit (AMD64)]
    2013-01-12 06:23:05.994000
    bytes unicode
    (in ms) (in ms) % comment
    ========== case conversion -- dense
    0.63 3.01 21.0 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
    0.63 2.90 21.5 ("where in the world is carmen san deigo?"*10).upper() (*1000)
    ========== case conversion -- rare
    0.84 2.83 29.8 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
    0.50 3.47 14.3 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
    ========== concat 20 strings of words length 4 to 15
    1.82 1.75 103.9 s1+s2+s3+s4+...+s20 (*1000)
    ========== concat two strings
    0.09 0.08 115.5 "Andrew"+"Dalke" (*1000)
    ========== count AACT substrings in DNA example
    2.40 2.64 91.1 dna.count("AACT") (*10)
    ========== count newlines
    0.77 0.75 101.6 ...text.with.2000.newlines.count("\n") (*10)
    ========== early match, single character
    0.19 0.18 101.9 ("A"*1000).find("A") (*1000)
    0.39 0.05 824.7 "A" in "A"*1000 (*1000)
    0.19 0.19 96.3 ("A"*1000).index("A") (*1000)
    0.20 0.22 87.5 ("A"*1000).partition("A") (*1000)
    0.20 0.20 101.8 ("A"*1000).rfind("A") (*1000)
    0.20 0.20 101.2 ("A"*1000).rindex("A") (*1000)
    0.18 0.22 82.5 ("A"*1000).rpartition("A") (*1000)
    0.41 0.45 91.7 ("A"*1000).rsplit("A", 1) (*1000)
    0.42 0.43 99.0 ("A"*1000).split("A", 1) (*1000)
    ========== early match, two characters
    0.19 0.19 102.3 ("AB"*1000).find("AB") (*1000)
    0.39 0.05 781.6 "AB" in "AB"*1000 (*1000)
    0.19 0.20 97.9 ("AB"*1000).index("AB") (*1000)
    0.23 0.33 71.1 ("AB"*1000).partition("AB") (*1000)
    0.20 0.20 101.6 ("AB"*1000).rfind("AB") (*1000)
    0.20 0.20 100.1 ("AB"*1000).rindex("AB") (*1000)
    0.22 0.31 70.4 ("AB"*1000).rpartition("AB") (*1000)
    0.47 0.53 90.0 ("AB"*1000).rsplit("AB", 1) (*1000)
    0.45 0.52 85.0 ("AB"*1000).split("AB", 1) (*1000)
    ========== endswith multiple characters
    0.18 0.18 97.6 "Andrew".endswith("Andrew") (*1000)
    ========== endswith multiple characters - not!
    0.18 0.18 100.4 "Andrew".endswith("Anders") (*1000)
    ========== endswith single character
    0.18 0.18 97.1 "Andrew".endswith("w") (*1000)
    ========== formatting a string type with a dict
    N/A 0.53 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
    ========== join empty string, with 1 character sep
    N/A 0.05 0.0 "A".join("") (*100)
    ========== join empty string, with 5 character sep
    N/A 0.05 0.0 "ABCDE".join("") (*100)
    ========== join list of 100 words, with 1 character sep
    1.02 1.02 99.6 "A".join(["Bob"]*100)) (*1000)
    ========== join list of 100 words, with 5 character sep
    1.25 1.48 84.4 "ABCDE".join(["Bob"]*100)) (*1000)
    ========== join list of 26 characters, with 1 character sep
    0.31 0.25 122.9 "A".join(list("ABC..Z")) (*1000)
    ========== join list of 26 characters, with 5 character sep
    0.36 0.41 88.4 "ABCDE".join(list("ABC..Z")) (*1000)
    ========== join string with 26 characters, with 1 character sep
    N/A 1.06 0.0 "A".join("ABC..Z") (*1000)
    ========== join string with 26 characters, with 5 character sep
    N/A 1.22 0.0 "ABCDE".join("ABC..Z") (*1000)
    ========== late match, 100 characters
    2.52 2.68 94.0 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
    2.35 3.06 76.9 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
    1.55 1.61 96.2 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
    2.51 2.68 94.0 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
    3.57 4.66 76.7 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
    3.23 3.24 99.8 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
    2.35 2.56 91.7 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
    3.23 3.24 99.8 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
    3.58 3.92 91.4 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
    3.62 3.96 91.4 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
    2.89 3.38 85.4 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
    ========== late match, two characters
    0.52 0.52 99.5 ("AB"*300+"C").find("BC") (*1000)
    0.69 0.90 76.5 ("AB"*300+"CA").find("CA") (*1000)
    0.67 0.37 179.2 "BC" in ("AB"*300+"C") (*1000)
    0.51 0.53 96.8 ("AB"*300+"C").index("BC") (*1000)
    0.48 0.81 59.3 ("AB"*300+"C").partition("BC") (*1000)
    0.55 0.55 101.5 ("C"+"AB"*300).rfind("CA") (*1000)
    0.85 0.85 100.0 ("BC"+"AB"*300).rfind("BC") (*1000)
    0.55 0.55 100.3 ("C"+"AB"*300).rindex("CA") (*1000)
    0.52 0.60 87.1 ("C"+"AB"*300).rpartition("CA") (*1000)
    0.78 0.82 95.4 ("C"+"AB"*300).rsplit("CA", 1) (*1000)
    0.65 0.72 91.2 ("AB"*300+"C").split("BC", 1) (*1000)
    ========== no match, single character
    0.77 0.77 100.6 ("A"*1000).find("B") (*1000)
    0.98 0.63 155.1 "B" in "A"*1000 (*1000)
    0.66 0.66 99.7 ("A"*1000).partition("B") (*1000)
    0.77 0.77 100.4 ("A"*1000).rfind("B") (*1000)
    0.66 0.66 99.7 ("A"*1000).rpartition("B") (*1000)
    0.88 0.88 100.4 ("A"*1000).rsplit("B", 1) (*1000)
    0.88 0.87 101.2 ("A"*1000).split("B", 1) (*1000)
    ========== no match, two characters
    1.19 1.21 98.1 ("AB"*1000).find("BC") (*1000)
    1.79 2.51 71.2 ("AB"*1000).find("CA") (*1000)
    1.28 1.08 119.1 "BC" in "AB"*1000 (*1000)
    1.10 2.11 52.1 ("AB"*1000).partition("BC") (*1000)
    2.37 2.37 100.0 ("AB"*1000).rfind("BC") (*1000)
    1.36 1.36 100.5 ("AB"*1000).rfind("CA") (*1000)
    2.25 2.26 99.9 ("AB"*1000).rpartition("BC") (*1000)
    2.38 2.62 90.7 ("AB"*1000).rsplit("BC", 1) (*1000)
    1.18 1.30 90.1 ("AB"*1000).split("BC", 1) (*1000)
    ========== quick replace multiple character match
    0.12 0.32 37.1 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
    ========== quick replace single character match
    0.12 0.30 37.9 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
    ========== repeat 1 character 10 times
    0.08 0.09 90.3 "A"*10 (*1000)
    ========== repeat 1 character 1000 times
    0.16 0.19 82.2 "A"*1000 (*1000)
    ========== repeat 5 characters 10 times
    0.11 0.12 98.3 "ABCDE"*10 (*1000)
    ========== repeat 5 characters 1000 times
    0.40 0.58 67.9 "ABCDE"*1000 (*1000)
    ========== replace and expand multiple characters, big string
    1.95 2.13 91.7 "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
    ========== replace multiple characters, dna
    2.93 3.25 90.3 dna.replace("ATC", "ATT") (*10)
    ========== replace single character
    0.25 0.26 96.6 "This is a test".replace(" ", "\t") (*1000)
    ========== replace single character, big string
    0.73 1.01 72.0 "...text.with.2000.lines...replace("\n", " ") (*10)
    ========== replace/remove multiple characters
    0.30 0.34 89.0 "When shall we three meet again?".replace("ee", "") (*1000)
    ========== split 1 whitespace
    0.12 0.13 93.3 ("Here are some words. "*2).partition(" ") (*1000)
    0.11 0.11 98.8 ("Here are some words. "*2).rpartition(" ") (*1000)
    0.32 0.37 86.5 ("Here are some words. "*2).rsplit(None, 1) (*1000)
    0.32 0.33 96.9 ("Here are some words. "*2).split(None, 1) (*1000)
    ========== split 2000 newlines
    1.76 2.19 80.5 "...text...".rsplit("\n") (*10)
    1.72 2.10 81.9 "...text...".split("\n") (*10)
    1.87 2.58 72.4 "...text...".splitlines() (*10)
    ========== split newlines
    0.36 0.34 103.9 "this\nis\na\ntest\n".rsplit("\n") (*1000)
    0.35 0.33 105.9 "this\nis\na\ntest\n".split("\n") (*1000)
    0.31 0.34 89.7 "this\nis\na\ntest\n".splitlines() (*1000)
    ========== split on multicharacter separator (dna)
    2.18 2.34 93.4 dna.rsplit("ACTAT") (*10)
    2.50 2.64 94.5 dna.split("ACTAT") (*10)
    ========== split on multicharacter separator (small)
    0.59 0.62 95.3 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
    0.55 0.59 93.1 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
    ========== split whitespace (huge)
    1.54 2.34 65.5 human_text.rsplit() (*10)
    1.51 2.22 68.3 human_text.split() (*10)
    ========== split whitespace (small)
    0.46 0.60 76.5 ("Here are some words. "*2).rsplit() (*1000)
    0.45 0.51 87.6 ("Here are some words. "*2).split() (*1000)
    ========== startswith multiple characters
    0.18 0.18 97.3 "Andrew".startswith("Andrew") (*1000)
    ========== startswith multiple characters - not!
    0.18 0.18 100.1 "Andrew".startswith("Anders") (*1000)
    ========== startswith single character
    0.17 0.18 96.8 "Andrew".startswith("A") (*1000)
    ========== strip terminal newline
    0.11 0.21 52.0 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
    0.06 0.07 92.1 "\nHello!".rstrip() (*1000)
    0.06 0.07 92.2 "Hello!\n".rstrip() (*1000)
    0.06 0.07 91.2 "\nHello!\n".strip() (*1000)
    0.06 0.07 91.1 "\nHello!".strip() (*1000)
    0.06 0.07 91.1 "Hello!\n".strip() (*1000)
    ========== strip terminal spaces and tabs
    0.07 0.07 89.4 "\t \tHello".rstrip() (*1000)
    0.07 0.07 91.4 "Hello\t \t".rstrip() (*1000)
    0.04 0.05 88.7 "Hello\t \t".strip() (*1000)
    ========== tab split
    0.57 0.56 100.8 GFF3_example.rsplit("\t", 8) (*1000)
    0.53 0.53 100.7 GFF3_example.rsplit("\t") (*1000)
    0.49 0.49 101.2 GFF3_example.split("\t", 8) (*1000)
    0.51 0.49 103.5 GFF3_example.split("\t") (*1000)
    102.13 125.57 81.3 TOTAL

    --
    Terry Jan Reedy
    Terry Reedy, Jan 12, 2013
    #4
  5. Rotwang

    Ian Kelly Guest

    On Sat, Jan 12, 2013 at 1:38 AM, <> wrote:
    > The difference between a correct (coherent) unicode handling and ...


    This thread was about byte string concatenation, not unicode, so your
    rant is not even on-topic here.
    Ian Kelly, Jan 12, 2013
    #5
