relative speed of increment syntaxes (or "i=i+1" vs "i+=1")

Discussion in 'Python' started by Laurent, Aug 21, 2011.

  1. Laurent

    Laurent Guest

    Hi Folks,

    I was arguing with a guy who was sure that incrementing a variable i with "i += 1" is faster than "i = i + 1". I couldn't tell whether he was right or wrong, so I did a little benchmark with the very useful timeit module.
    Here are the results on my little Linux Eeepc Netbook (using Python 3.2):


    Computing, please wait...

    Results for 1000000 times "i = i + 1":
    0.37591004371643066
    0.3827171325683594
    0.37238597869873047
    0.37305116653442383
    0.3725881576538086
    0.37294602394104004
    0.3712761402130127
    0.37357497215270996
    0.371567964553833
    0.37359118461608887
    Total 3.7396 seconds.

    Results for 1000000 times "i += 1":
    0.3821070194244385
    0.3802030086517334
    0.3828878402709961
    0.3823058605194092
    0.3801591396331787
    0.38340115547180176
    0.3795340061187744
    0.38153910636901855
    0.3835160732269287
    0.381864070892334
    Total 3.8175 seconds.

    ==> "i = i + 1" is 2.08% faster than "i += 1".



    I did many tests and "i = i + 1" always seems to be around 2% faster than "i += 1". This is no surprise, as the += notation seems to be a syntactic sugar layer that has to be converted to i = i + 1 anyway. Am I wrong in my interpretation?

    Btw here's the trivial Python 3.2 script I made for this benchmark:


    import timeit

    r = 10
    n = 1000000

    s1 = "i = i + 1"
    s2 = "i += 1"

    t1 = timeit.Timer(stmt=s1, setup="i = 0")
    t2 = timeit.Timer(stmt=s2, setup="i = 0")

    print("Computing, please wait...")

    results1 = t1.repeat(repeat=r, number=n)
    results2 = t2.repeat(repeat=r, number=n)

    print('\nResults for {} times "{}":'.format(n, s1))
    sum1 = 0
    for result in results1:
        print(result)
        sum1 += result
    print("Total {:.5} seconds.".format(sum1))

    print('\nResults for {} times "{}":'.format(n, s2))
    sum2 = 0
    for result in results2:
        print(result)
        sum2 += result
    print("Total {:.5} seconds.".format(sum2))

    print('\n==> "{}" is {:.3}% faster than "{}".'.format(s1, (sum2 / sum1) * 100 - 100, s2))



    Comments are welcome...
     
    Laurent, Aug 21, 2011
    #1

  2. Laurent

    woooee Guest

    > as the += notation seems to be a syntactic sugar layer that has to be
    > converted to i = i + 1 anyway.

    That has always been my understanding. For strings, the faster way is to
    append the pieces to a list, as concatenating usually requires the original
    string, an intermediate block of memory, and the memory for the final
    string:

    x_list.append(value)
    to_string = "".join(x_list)
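
    A minimal timing sketch of the two string-building approaches (my
    example, not a definitive benchmark; CPython can sometimes optimize
    str += in place, so the gap varies):

    import timeit

    def build_by_concat(n=1000):
        # Repeated concatenation: may copy the string on every step.
        s = ""
        for i in range(n):
            s += "x"
        return s

    def build_by_join(n=1000):
        # Append to a list, join once at the end.
        parts = []
        for i in range(n):
            parts.append("x")
        return "".join(parts)

    print(timeit.timeit(build_by_concat, number=1000))
    print(timeit.timeit(build_by_join, number=1000))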
     
    woooee, Aug 21, 2011
    #2

  3. Laurent

    Laurent Guest

    Well, I agree with you about string concatenation, but here I'm talking about integer incrementation...
     
    Laurent, Aug 21, 2011
    #3
  4. On 21-08-11 19:03, Laurent wrote:
    > Well I agree with you about string concatenation, but here I'm talking about integers incrementation...


    Seems the two forms are not 100% identical:

    >>> import dis
    >>> def f1(x):
    ...     x = x + 1
    ...
    >>> def f2(x):
    ...     x += 1
    ...
    >>> dis.dis(f1)
      2           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 BINARY_ADD
                  7 STORE_FAST               0 (x)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
    >>> dis.dis(f2)
      2           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_FAST               0 (x)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
    >>>



    What the precise difference (semantics and speed) is between the
    BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
    code or maybe someone knows it from memory :)
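
    (By the way, you can make the same comparison without defining throwaway
    functions; a small sketch of mine, disassembling compiled statements
    directly:)

    import dis

    # Only the opcode in the middle differs: BINARY_ADD versus INPLACE_ADD.
    for stmt in ("i = i + 1", "i += 1"):
        print("---", stmt)
        dis.dis(compile(stmt, "<string>", "exec"))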

    Irmen
     
    Irmen de Jong, Aug 21, 2011
    #4
  5. Laurent

    Laurent Guest

    Thanks for these explanations. So 2% speed difference just between "B..." and "I..." entries in a huge alphabetically-ordered switch case? Wow. Maybe there is some material for speed enhancement here...
     
    Laurent, Aug 21, 2011
    #5
  6. Laurent

    Hans Mulder Guest

    On 21/08/11 19:14:19, Irmen de Jong wrote:

    > What the precise difference (semantics and speed) is between the
    > BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
    > code or maybe someone knows it from memory :)


    There is a clear difference in semantics: BINARY_ADD always produces
    a new object, INPLACE_ADD may modify its left-hand operand in situ
    (if it's mutable).

    Integers are immutable, so for integers the semantics are the same,
    but for lists, for example, the two are different:

    >>> x = [2, 3, 5, 7]
    >>> y = [11, 13]
    >>> x+y
    [2, 3, 5, 7, 11, 13]
    >>> x
    [2, 3, 5, 7]              # x still has its original value
    >>> x += y
    >>> x
    [2, 3, 5, 7, 11, 13]      # x is now modified
    >>>


    For integers, I would not expect a measurable difference in speed.
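
    (A minimal sketch of my own to make the immutability point concrete:
    += on an int rebinds the name to a new object rather than modifying
    the old one.)

    i = 1000
    old = i            # keep a second reference to the original object
    i += 1
    print(old is i)    # False: += rebound i to a brand-new int
    print(old, i)      # 1000 1001 -- the original object is unchanged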

    Hope this helps,

    -- HansM
     
    Hans Mulder, Aug 21, 2011
    #6
  7. Laurent

    Laurent Guest

    Well, 2% more time after a million iterations, so you're right, I won't worry about it.
     
    Laurent, Aug 21, 2011
    #7
  8. Laurent

    Roy Smith Guest

    In article <>,
    Christian Heimes <> wrote:

    > On 21.08.2011 19:27, Andreas Löscher wrote:
    > > As for using integers, the first case (lines 1319 and 1535) is taken and
    > > there is no difference in code. However, Python uses a huge switch-case
    > > construct to execute its opcodes and INPLACE_ADD comes after
    > > BINARY_ADD, hence the difference in speed.

    >
    > I don't think that's the reason. Modern compilers turn a switch statement
    > into a jump or branch table rather than a linear search like chained
    > elif statements.


    This is true even for very small values of "modern". I remember the
    Unix v6 C compiler (circa 1977) was able to do this.
     
    Roy Smith, Aug 21, 2011
    #8
  9. Laurent

    Terry Reedy Guest

    On 8/21/2011 1:27 PM, Andreas Löscher wrote:
    >> What the precise difference (semantics and speed) is between the
    >> BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
    >> code or maybe someone knows it from memory :)
    >>
    >> Irmen
    >>

    > from Python/ceval.c:
    >
    > 1316     case BINARY_ADD:
    > 1317         w = POP();
    > 1318         v = TOP();
    > 1319         if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
    > 1320             /* INLINE: int + int */
    > 1321             register long a, b, i;
    > 1322             a = PyInt_AS_LONG(v);
    > 1323             b = PyInt_AS_LONG(w);
    > 1324             /* cast to avoid undefined behaviour
    > 1325                on overflow */
    > 1326             i = (long)((unsigned long)a + b);
    > 1327             if ((i^a) < 0 && (i^b) < 0)
    > 1328                 goto slow_add;
    > 1329             x = PyInt_FromLong(i);
    > 1330         }
    > 1331         else if (PyString_CheckExact(v) &&
    > 1332                  PyString_CheckExact(w)) {
    > 1333             x = string_concatenate(v, w, f, next_instr);
    > 1334             /* string_concatenate consumed the ref to v */
    > 1335             goto skip_decref_vx;
    > 1336         }
    > 1337         else {
    > 1338           slow_add:
    > 1339             x = PyNumber_Add(v, w);
    > 1340         }
    > 1341         Py_DECREF(v);
    > 1342       skip_decref_vx:
    > 1343         Py_DECREF(w);
    > 1344         SET_TOP(x);
    > 1345         if (x != NULL) continue;
    > 1346         break;
    >
    > 1532     case INPLACE_ADD:
    > 1533         w = POP();
    > 1534         v = TOP();
    > 1535         if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
    > 1536             /* INLINE: int + int */
    > 1537             register long a, b, i;
    > 1538             a = PyInt_AS_LONG(v);
    > 1539             b = PyInt_AS_LONG(w);
    > 1540             i = a + b;
    > 1541             if ((i^a) < 0 && (i^b) < 0)
    > 1542                 goto slow_iadd;
    > 1543             x = PyInt_FromLong(i);
    > 1544         }
    > 1545         else if (PyString_CheckExact(v) &&
    > 1546                  PyString_CheckExact(w)) {
    > 1547             x = string_concatenate(v, w, f, next_instr);
    > 1548             /* string_concatenate consumed the ref to v */
    > 1549             goto skip_decref_v;
    > 1550         }
    > 1551         else {
    > 1552           slow_iadd:
    > 1553             x = PyNumber_InPlaceAdd(v, w);
    > 1554         }
    > 1555         Py_DECREF(v);
    > 1556       skip_decref_v:
    > 1557         Py_DECREF(w);
    > 1558         SET_TOP(x);
    > 1559         if (x != NULL) continue;
    > 1560         break;
    >
    > As for using integers, the first case (lines 1319 and 1535) is taken and
    > there is no difference in code. However, Python uses a huge switch-case
    > construct to execute its opcodes and INPLACE_ADD comes after
    > BINARY_ADD, hence the difference in speed.
    >
    > To be clear, this is nothing you should consider when writing fast code.
    > Complexity-wise they both are the same.


    With 64-bit 3.2.2 on my Win 7 Pentium, the difference was 4%, and with
    floats (0.0 and 1.0), 6%.

    --
    Terry Jan Reedy
     
    Terry Reedy, Aug 21, 2011
    #9
  10. Laurent

    Laurent Guest


    > With 64-bit 3.2.2 on my Win 7 Pentium, the difference was 4%, and with
    > floats (0.0 and 1.0), 6%.


    For floats it is understandable. But for integers, seriously, 4% is a lot. I would never have thought an interpreter would show a difference like this between two syntaxes for something as fundamental as adding 1.
     
    Laurent, Aug 21, 2011
    #10
  12. Laurent

    Laurent Guest

    Actually, a 6% difference between floats themselves is just as hard to understand.
     
    Laurent, Aug 21, 2011
    #12
  14. Laurent

    Laurent Guest

    I did the test several times with floats on my machine and the difference is not as big as for integers:


    ==> "i = i + 1.0" is 0.732% faster than "i += 1.0".

    It seems normal, as float addition is supposed to be slower than integer addition, so the syntactic difference is comparatively less important.
     
    Laurent, Aug 21, 2011
    #14
  15. Laurent

    Nobody Guest

    On Sun, 21 Aug 2011 09:52:23 -0700, Laurent wrote:

    > I did many tests and "i = i + 1" always seems to be around 2% faster
    > than "i += 1". This is no surprise, as the += notation seems to be a
    > syntactic sugar layer that has to be converted to i = i + 1 anyway. Am I
    > wrong in my interpretation?


    It depends. If the value on the left has an __iadd__ method, that will be
    called; the value will be updated in-place, so all references to that
    object will be affected:

    >>> import numpy as np
    >>> a = np.zeros(3)
    >>> b = a
    >>> a
    array([ 0., 0., 0.])
    >>> b
    array([ 0., 0., 0.])
    >>> a += 1
    >>> a
    array([ 1., 1., 1.])
    >>> b
    array([ 1., 1., 1.])

    If the value on the left doesn't have an __iadd__ method, then addition is
    performed and the name is re-bound to the result:

    >>> a = a + 1
    >>> a
    array([ 2., 2., 2.])
    >>> b
    array([ 1., 1., 1.])
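
    (The same distinction can be reproduced with a plain Python class. A
    minimal sketch; the class name and API are made up for illustration:)

    class Counter:
        def __init__(self, n):
            self.n = n
        def __add__(self, other):
            return Counter(self.n + other)   # returns a NEW object
        def __iadd__(self, other):
            self.n += other                  # mutates this object in place
            return self

    c = Counter(0)
    alias = c
    c += 1            # __iadd__: alias sees the change
    print(alias.n)    # 1
    c = c + 1         # __add__: c is rebound; alias keeps the old object
    print(alias.n)    # still 1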

    If you're writing code which could reasonably be expected to work with
    arbitrary "numeric" values, you should decide which to use according to
    whether in-place modification is appropriate rather than trivial
    performance differences. If a difference of a few percent is significant,
    Python is probably the wrong language in the first place.
     
    Nobody, Aug 21, 2011
    #15
  16. On Sunday, 21.08.2011 at 14:52 -0400, Roy Smith wrote:
    > In article <>,
    > Christian Heimes <> wrote:
    >
    > > On 21.08.2011 19:27, Andreas Löscher wrote:
    > > > As for using integers, the first case (lines 1319 and 1535) is taken and
    > > > there is no difference in code. However, Python uses a huge switch-case
    > > > construct to execute its opcodes and INPLACE_ADD comes after
    > > > BINARY_ADD, hence the difference in speed.

    > >
    > > I don't think that's the reason. Modern compilers turn a switch statement
    > > into a jump or branch table rather than a linear search like chained
    > > elif statements.

    >
    > This is true even for very small values of "modern". I remember the
    > Unix v6 C compiler (circa 1977) was able to do this.


    What is the difference in speed between a jump table that is searched
    from top to bottom and an ordinary if-then-elif chain? The
    difference can only be in the search algorithm used for the table.
    Without optimization (linear search) both are the same. If the compiler
    applies some magic, the difference can be relevant (linear complexity for
    if-then-elif..., and O(1) if you use a dictionary).
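
    (As a rough Python analogy of that last point, a sketch of my own: a dict
    gives O(1) dispatch, while an if-elif chain is linear in the number of
    cases. The handler names are made up for illustration.)

    def binary_add(a, b):
        return a + b

    def inplace_add(a, b):
        return a + b

    # One hash lookup, no matter how many opcodes the table holds.
    DISPATCH = {
        "BINARY_ADD": binary_add,
        "INPLACE_ADD": inplace_add,
    }

    def execute(opcode, a, b):
        return DISPATCH[opcode](a, b)

    print(execute("INPLACE_ADD", 1, 1))  # 2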

    Since the executed code for integers is the same, there must be a slower
    path to the code of INPLACE_ADD than to BINARY_ADD.

    How would such a jump table work to behave the same as a
    switch-case statement? Beware that things like

    case PRINT_NEWLINE_TO:
    1802     w = stream = POP();
    1803     /* fall through to PRINT_NEWLINE */
    1804
    1805 case PRINT_NEWLINE:

    must be supported.


    By the way, these are the first lines of ceval.c since at least version 2.4:

    1
    2 /* Execute compiled code */
    3
    4 /* XXX TO DO:
    5    XXX speed up searching for keywords by using a dictionary
    6    XXX document it!
    7 */

    :)
     
    Andreas Löscher, Aug 22, 2011
    #16
  17. On Sunday, 21.08.2011 at 12:53 -0700, Laurent wrote:
    > > With 64-bit 3.2.2 on my Win 7 Pentium, the difference was 4%, and with
    > > floats (0.0 and 1.0), 6%.

    >
    > For floats it is understandable. But for integers, seriously, 4% is a lot. I would never have thought an interpreter would show a difference like this between two syntaxes for something as fundamental as adding 1.


    It's not as bad as you think. The addition of two integers is a cheap
    task (in terms of computation power). If you have two ways to do it,
    every little thing (jumps in the code etc.) will have a larger relative
    impact on the execution time than it would for an expensive operation.

    But every improvement to your algorithm will easily result in a
    significantly shorter execution time than replacing a += 1 with a = a + 1
    will ever do. :)
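
    (To put numbers on that, a minimal sketch of my own: replacing a counting
    loop with the closed-form sum dwarfs any +=-versus-+ difference.)

    import timeit

    def by_loop(n=100000):
        # Increment an accumulator once per element.
        total = 0
        for i in range(n):
            total += i
        return total

    def by_formula(n=100000):
        # Closed-form sum of 0..n-1: an algorithmic improvement.
        return n * (n - 1) // 2

    print(timeit.timeit(by_loop, number=100))     # milliseconds per call
    print(timeit.timeit(by_formula, number=100))  # near-instant in total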
     
    Andreas Löscher, Aug 22, 2011
    #17
  18. 2011/8/22 Andreas Löscher <-chemnitz.de>:
    > How would such a jump table work to behave the same as a
    > switch-case statement?


    If your switch statement uses a simple integer enum with sequential
    values, then it can be done quite easily. Take this as an example:

    switch (argc)
    {
        case 0: printf("No args at all, this is weird\n"); break;
        case 1: printf("No args\n"); break;
        case 2: printf("Default for second arg\n");
        case 3: printf("Two args\n"); break;
        default: printf("Too many args\n"); break;
    }

    I compiled this using Open Watcom C, looked at the disassembly, and
    hereby translate it into pseudocode (I'll email/post the full 80x86
    disassembly if you like):

    1) Check if argc > 3 (unsigned comparison), if so jump to default case.
    2) Left shift argc two places, add a constant offset, fetch a pointer
    from there, and jump to it - that's the jump table. One JMP statement.
    3) Code follows for each case.

    Incidentally, the Open Watcom compiler actually turned several of the
    cases into offset-load of the appropriate string pointer, and then a
    jump to the single call to printf. The fall-through from 'case 2' to
    'case 3' works fine, although it means that 'case 2' has to be
    de-optimized from that one simplification.

    This type of optimization works best when the case values are
    sequential. (If I remove the 'case 0', the compiler decrements argc
    and proceeds as above.) Otherwise, the jump table has to
    have a lot of copies of the "default" pointer.

    Chris Angelico
     
    Chris Angelico, Aug 22, 2011
    #18
  19. Laurent

    Terry Reedy Guest

    On 8/21/2011 7:17 PM, Andreas Löscher wrote:
    > On Sunday, 21.08.2011 at 14:52 -0400, Roy Smith wrote:
    >> In article<>,
    >> Christian Heimes<> wrote:
    >>
    >>> On 21.08.2011 19:27, Andreas Löscher wrote:
    >>>> As for using integers, the first case (lines 1319 and 1535) is taken and
    >>>> there is no difference in code. However, Python uses a huge switch-case
    >>>> construct to execute its opcodes and INPLACE_ADD comes after
    >>>> BINARY_ADD, hence the difference in speed.
    >>>
    >>> I don't think that's the reason. Modern compilers turn a switch statement
    >>> into a jump or branch table rather than a linear search like chained
    >>> elif statements.

    >>
    >> This is true even for very small values of "modern". I remember the
    >> Unix v6 C compiler (circa 1977) was able to do this.

    >
    > What is the difference in speed between a jump table that is searched
    > from top to bottom and an ordinary if-then-elif chain? The
    > difference can only be in the search algorithm used for the table.
    > Without optimization (linear search) both are the same. If the compiler
    > applies some magic, the difference can be relevant (linear complexity for
    > if-then-elif..., and O(1) if you use a dictionary).


    A jump or branch table is applicable when the case values are all
    small ints, like bytes or less. For C, the table is simply an array of
    pointers (addresses; the entries for unused byte codes would be a void
    pointer). Hence, O(1) access.
    https://secure.wikimedia.org/wikipedia/en/wiki/Jump_table

    > Since the executed code for integers is the same, there must be a slower
    > path to the code of INPLACE_ADD than to BINARY_ADD.
    >
    > How would such a jump table work to behave the same as a
    > switch-case statement? Beware that things like
    >
    > case PRINT_NEWLINE_TO:
    > 1802 w = stream = POP();
    > 1803 /* fall through to PRINT_NEWLINE */


    Add a jump to the address of the code for PRINT_NEWLINE.

    > 1804
    > 1805 case PRINT_NEWLINE:
    >
    > must be supported.


    --
    Terry Jan Reedy
     
    Terry Reedy, Aug 22, 2011
    #19
  20. 2011/8/22 Andreas Löscher <-chemnitz.de>:
    > But every improvement to your algorithm will easily result in a
    > significantly shorter execution time than replacing a += 1 with a = a + 1
    > will ever do. :)
    >


    Agreed. If Python needed a faster alternative to "a = a + 1", then I would
    recommend an "a.inc()" call or something; some way to avoid looking up
    the value of 1. (An equivalent of C's ++a operation, if you like.) But
    I think you'd be hard-pressed to find any situation where improving
    the speed of incrementing is significant that wouldn't be better
    served by algorithmic improvements (even just using "for a in
    range()") or by dropping to C.

    ChrisA
     
    Chris Angelico, Aug 22, 2011
    #20