# relative speed of incremention syntaxes (or "i=i+1" VS "i+=1")

Discussion in 'Python' started by Laurent, Aug 21, 2011.

1. ### LaurentGuest

Hi Folks,

I was arguing with a guy who was sure that incrementing a variable i with "i += 1" is faster than "i = i + 1". I couldn't tell if he was right or wrong so I did a little benchmark with the very useful timeit module.
Here are the results on my little Linux Eeepc Netbook (using Python 3.2):

Results for 1000000 times "i = i + 1":
0.37591004371643066
0.3827171325683594
0.37238597869873047
0.37305116653442383
0.3725881576538086
0.37294602394104004
0.3712761402130127
0.37357497215270996
0.371567964553833
0.37359118461608887
Total 3.7396 seconds.

Results for 1000000 times "i += 1":
0.3821070194244385
0.3802030086517334
0.3828878402709961
0.3823058605194092
0.3801591396331787
0.38340115547180176
0.3795340061187744
0.38153910636901855
0.3835160732269287
0.381864070892334
Total 3.8175 seconds.

==> "i = i + 1" is 2.08% faster than "i += 1".

I did many tests and "i = i + 1" always seems to be around 2% faster than "i += 1". This is no surprise as the += notation seems to be a syntaxic sugar layer that has to be converted to i = i + 1 anyway. Am I wrong in my interpretation?

Btw here's the trivial Python 3.2 script I made for this benchmark:

import timeit

r = 10
n = 1000000

s1 = "i = i + 1"
s2 = "i += 1"

t1 = timeit.Timer(stmt=s1, setup="i = 0")
t2 = timeit.Timer(stmt=s2, setup="i = 0")

results1 = t1.repeat(repeat=r, number=n)
results2 = t2.repeat(repeat=r, number=n)

print('\nResults for {} times "{}":'.format(n, s1))
sum1 = 0
for result in results1:
print(result)
sum1 += result
print("Total {:.5} seconds.".format(sum1))

print('\nResults for {} times "{}":'.format(n, s2))
sum2 = 0
for result in results2:
print(result)
sum2 += result
print("Total {:.5} seconds.".format(sum2))

print('\n==> "{}" is {:.3}% faster than "{}".'.format(s1,(sum2 / sum1) * 100 - 100, s2))

Laurent, Aug 21, 2011

2. ### woooeeGuest

as the += notation seems to be a syntaxic sugar layer that has to be
converted to i = i + 1 anyway.

That has always been my understanding. The faster way is to append to
a list as concatenating usually, requires the original string,
accessing an intermediate block of memory, and the memory for the
final string.
x_list.append(value)
to_string = "".join(x_list)

woooee, Aug 21, 2011

3. ### LaurentGuest

Well I agree with you about string concatenation, but here I'm talking about integers incrementation...

Laurent, Aug 21, 2011
4. ### Irmen de JongGuest

On 21-08-11 19:03, Laurent wrote:
> Well I agree with you about string concatenation, but here I'm talking about integers incrementation...

Seems the two forms are not 100% identical:

>>> import dis
>>> def f1(x):

.... x=x+1
....
>>> def f2(x):

.... x+=1
....
>>>
>>> dis.dis(f1)

7 STORE_FAST 0 (x)
13 RETURN_VALUE
>>> dis.dis(f2)

7 STORE_FAST 0 (x)
13 RETURN_VALUE
>>>

What the precise difference (semantics and speed) is between the
code or maybe someone knows it from memory

Irmen

Irmen de Jong, Aug 21, 2011
5. ### LaurentGuest

Thanks for these explanations. So 2% speed difference just between "B..." and "I..." entries in a huge alphabetically-ordered switch case? Wow. Maybe there is some material for speed enhancement here...

Laurent, Aug 21, 2011
6. ### Hans MulderGuest

On 21/08/11 19:14:19, Irmen de Jong wrote:

> What the precise difference (semantics and speed) is between the
> BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
> code or maybe someone knows it from memory

There is a clear difference in semantics: BINARY_ADD always produces
a new object, INPLACE_ADD may modify its left-hand operand in situ
(if it's mutable).

Integers are immutable, so for integers the semantics are the same,
but for lists, for example, the two are different:

>>> x = [2, 3, 5, 7]
>>> y = [11, 13]
>>> x+y

[2, 3, 5, 7, 11, 13]
>>> x

[2, 3, 5, 7] # x still has its original value
>>> x += y
>>> x

[2, 3, 5, 7, 11, 13] # x is now modified
>>>

For integers, I would not expect a measurable difference in speed.

Hope this helps,

-- HansM

Hans Mulder, Aug 21, 2011
7. ### LaurentGuest

Well 2% more time after 1 million iterations so you're right I won't consider it.

Laurent, Aug 21, 2011
8. ### Roy SmithGuest

In article <>,
Christian Heimes <> wrote:

> Am 21.08.2011 19:27, schrieb Andreas Löscher:
> > As for using Integers, the first case (line 1319 and 1535) are true and
> > there is no difference in Code. However, Python uses a huge switch-case
> > construct to execute it's opcodes and INPLACE_ADD cames after
> > BINARY_ADD, hence the difference in speed.

>
> I don't think that's the reason. Modern compiles turn a switch statement
> into a jump or branch table rather than a linear search like chained
> elif statements.

This is true even for very small values of "modern". I remember the
Unix v6 C compiler (circa 1977) was able to do this.

Roy Smith, Aug 21, 2011
9. ### Terry ReedyGuest

On 8/21/2011 1:27 PM, Andreas LÃ¶scher wrote:
>> What the precise difference (semantics and speed) is between the
>> BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
>> code or maybe someone knows it from memory
>>
>> Irmen
>>

> from Python/ceval.c:
>
> 1317 w = POP();
> 1318 v = TOP();
> 1319 if (PyInt_CheckExact(v)&& PyInt_CheckExact(w)) {
> 1320 /* INLINE: int + int */
> 1321 register long a, b, i;
> 1322 a = PyInt_AS_LONG(v);
> 1323 b = PyInt_AS_LONG(w);
> 1324 /* cast to avoid undefined behaviour
> 1325 on overflow */
> 1326 i = (long)((unsigned long)a + b);
> 1327 if ((i^a)< 0&& (i^b)< 0)
> 1329 x = PyInt_FromLong(i);
> 1330 }
> 1331 else if (PyString_CheckExact(v)&&
> 1332 PyString_CheckExact(w)) {
> 1333 x = string_concatenate(v, w, f, next_instr);
> 1334 /* string_concatenate consumed the ref to v */
> 1335 goto skip_decref_vx;
> 1336 }
> 1337 else {
> 1339 x = PyNumber_Add(v, w);
> 1340 }
> 1341 Py_DECREF(v);
> 1342 skip_decref_vx:
> 1343 Py_DECREF(w);
> 1344 SET_TOP(x);
> 1345 if (x != NULL) continue;
> 1346 break;
>
> 1533 w = POP();
> 1534 v = TOP();
> 1535 if (PyInt_CheckExact(v)&& PyInt_CheckExact(w)) {
> 1536 /* INLINE: int + int */
> 1537 register long a, b, i;
> 1538 a = PyInt_AS_LONG(v);
> 1539 b = PyInt_AS_LONG(w);
> 1540 i = a + b;
> 1541 if ((i^a)< 0&& (i^b)< 0)
> 1543 x = PyInt_FromLong(i);
> 1544 }
> 1545 else if (PyString_CheckExact(v)&&
> 1546 PyString_CheckExact(w)) {
> 1547 x = string_concatenate(v, w, f, next_instr);
> 1548 /* string_concatenate consumed the ref to v */
> 1549 goto skip_decref_v;
> 1550 }
> 1551 else {
> 1553 x = PyNumber_InPlaceAdd(v, w);
> 1554 }
> 1555 Py_DECREF(v);
> 1556 skip_decref_v:
> 1557 Py_DECREF(w);
> 1558 SET_TOP(x);
> 1559 if (x != NULL) continue;
> 1560 break;
>
> As for using Integers, the first case (line 1319 and 1535) are true and
> there is no difference in Code. However, Python uses a huge switch-case
> construct to execute it's opcodes and INPLACE_ADD cames after
> BINARY_ADD, hence the difference in speed.
>
> To be clear, this is nothing you should consider when writing fast code.
> Complexity wise they both are the same.

With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
floats (0.0 and 1.0), 6%

--
Terry Jan Reedy

Terry Reedy, Aug 21, 2011
10. ### LaurentGuest

> With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
> floats (0.0 and 1.0), 6%

For floats it is understandable. But for integers, seriously, 4% is a lot. I would never have thought an interpreter would have differences like this in syntax for something as fundamental as adding 1.

Laurent, Aug 21, 2011
11. ### LaurentGuest

> With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
> floats (0.0 and 1.0), 6%

For floats it is understandable. But for integers, seriously, 4% is a lot. I would never have thought an interpreter would have differences like this in syntax for something as fundamental as adding 1.

Laurent, Aug 21, 2011
12. ### LaurentGuest

Actually 6% between float themselves is just as not-understandable.

Laurent, Aug 21, 2011
13. ### LaurentGuest

Actually 6% between float themselves is just as not-understandable.

Laurent, Aug 21, 2011
14. ### LaurentGuest

I did the test several times with floats on my machine and the difference is not as big as for integers:

==> "i = i + 1.0" is 0.732% faster than "i += 1.0".

It seems normal as the float addition is supposed to be slower than integer addition, thus the syntaxic difference is comparatively less important.

Laurent, Aug 21, 2011
15. ### NobodyGuest

On Sun, 21 Aug 2011 09:52:23 -0700, Laurent wrote:

> I did many tests and "i = i + 1" always seems to be around 2% faster
> than "i += 1". This is no surprise as the += notation seems to be a
> syntaxic sugar layer that has to be converted to i = i + 1 anyway. Am I
> wrong in my interpretation?

It depends. If the value on the left has an __iadd__ method, that will be
called; the value will be updated in-place, so all references to that
object will be affected:

> import numpy as np
> a = np.zeros(3)
> b = a
> a

array([ 0., 0., 0.])
> b

array([ 0., 0., 0.])
> a += 1
> a

array([ 1., 1., 1.])
> b

array([ 1., 1., 1.])

If the value on the left doesn't have an __iadd__ method, then addition is
performed and the name is re-bound to the result:

> a = a + 1
> a

array([ 2., 2., 2.])
> b

array([ 1., 1., 1.])

If you're writing code which could reasonably be expected to work with
arbitrary "numeric" values, you should decide which to use according to
whether in-place modification is appropriate rather than trivial
performance differences. If a difference of a few percent is significant,
Python is probably the wrong language in the first place.

Nobody, Aug 21, 2011
16. ### Andreas LöscherGuest

Am Sonntag, den 21.08.2011, 14:52 -0400 schrieb Roy Smith:
> In article <>,
> Christian Heimes <> wrote:
>
> > Am 21.08.2011 19:27, schrieb Andreas Lscher:
> > > As for using Integers, the first case (line 1319 and 1535) are true and
> > > there is no difference in Code. However, Python uses a huge switch-case
> > > construct to execute it's opcodes and INPLACE_ADD cames after
> > > BINARY_ADD, hence the difference in speed.

> >
> > I don't think that's the reason. Modern compiles turn a switch statement
> > into a jump or branch table rather than a linear search like chained
> > elif statements.

>
> This is true even for very small values of "modern". I remember the
> Unix v6 C compiler (circa 1977) was able to do this.

What is the difference in speed between a jump table that is searched
from top to bottom in comparison to an ordinary if-then-elif...? The
difference can only be in the search algorithm regarding the table.
Without optimization (linear search) both are the same. If the compiler
applies some magic the difference can be relevant (linear complexity for
if-then-elif... and O(1) if you would use a dictionary).

Hence the executed code for integers is the same, there must be a slower

How would such an jump table work to behave the same liek a
switch-case-statement? Beware, that things like

case PRINT_NEWLINE_TO:
1802 w = stream = POP();
1803 /* fall through to PRINT_NEWLINE */
1804
1805 case PRINT_NEWLINE:

must be supported.

Bye the way:
First line of ceval.c since at least Version 2.4
1
2 /* Execute compiled code */
3
4 /* XXX TO DO:
5 XXX speed up searching for keywords by using a dictionary
6 XXX document it!
7 */

Andreas Löscher, Aug 22, 2011
17. ### Andreas LöscherGuest

Am Sonntag, den 21.08.2011, 12:53 -0700 schrieb Laurent:
> > With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
> > floats (0.0 and 1.0), 6%

>
> For floats it is understandable. But for integers, seriously, 4% is a lot.. I would never have thought an interpreter would have differences like this in syntax for something as fundamental as adding 1.

It's not as bad as you think. The addition of two integers is a cheap
task (in terms of computation power). If you have two way's to to it,
every little think (jumps in the code etc. ) will have a larger impact
on the execution time than on an expensive operation.

But every improvement on your algorithm will easily result in a
significant shorter execution time than replaceing a+=1 with a=a+1 will
ever do.

Andreas Löscher, Aug 22, 2011
18. ### Chris AngelicoGuest

2011/8/22 Andreas Löscher <-chemnitz.de>:
> How would such an jump table work to behave the same liek a
> switch-case-statement?

If your switch statement uses a simple integer enum with sequential
values, then it can be done quite easily. Take this as an example:

switch (argc)
{
case 0: printf("No args at all, this is weird\n"); break;
case 1: printf("No args\n"); break;
case 2: printf("Default for second arg\n");
case 3: printf("Two args\n"); break;
default: printf("Too many args\n"); break;
}

I compiled this using Open Watcom C, looked at the disassembly, and
hereby translate it into pseudocode (I'll email/post the full 80x86
disassembly if you like):

1) Check if argc > 3 (unsigned comparison), if so jump to default case.
2) Left shift argc two places, add a constant offset, fetch a pointer
from there, and jump to it - that's the jump table. One JMP statement.
3) Code follows for each case.

Incidentally, the Open Watcom compiler actually turned several of the
cases into offset-load of the appropriate string pointer, and then a
jump to the single call to printf. The fall-through from 'case 2' to
'case 3' works fine, although it means that 'case 2' has to be
de-optimized from that one simplification.

This type of optimization works best when the case values are
sequential. (If I remove the 'case 0', the compiler decrements argc
and proceeds to continue as above.) Otherwise, the jump table has to
have a lot of copies of the "default" pointer.

Chris Angelico

Chris Angelico, Aug 22, 2011
19. ### Terry ReedyGuest

On 8/21/2011 7:17 PM, Andreas LÃ¶scher wrote:
> Am Sonntag, den 21.08.2011, 14:52 -0400 schrieb Roy Smith:
>> In article<>,
>> Christian Heimes<> wrote:
>>
>>> Am 21.08.2011 19:27, schrieb Andreas Lscher:
>>>> As for using Integers, the first case (line 1319 and 1535) are true and
>>>> there is no difference in Code. However, Python uses a huge switch-case
>>>> construct to execute it's opcodes and INPLACE_ADD cames after
>>>> BINARY_ADD, hence the difference in speed.
>>>
>>> I don't think that's the reason. Modern compiles turn a switch statement
>>> into a jump or branch table rather than a linear search like chained
>>> elif statements.

>>
>> This is true even for very small values of "modern". I remember the
>> Unix v6 C compiler (circa 1977) was able to do this.

>
> What is the difference in speed between a jump table that is searched
> from top to bottom in comparison to an ordinary if-then-elif...? The
> difference can only be in the search algorithm regarding the table.
> Without optimization (linear search) both are the same. If the compiler
> applies some magic the difference can be relevant (linear complexity for
> if-then-elif... and O(1) if you would use a dictionary).

A jump or branch table is applicable when the case value values are all
small ints, like bytes or less. For C, the table is simply an array of
pointers (addressess, with entries for unused byte codes would be a void
pointer). Hence, O(1) access.
https://secure.wikimedia.org/wikipedia/en/wiki/Jump_table

> Hence the executed code for integers is the same, there must be a slower
>
> How would such an jump table work to behave the same liek a
> switch-case-statement? Beware, that things like
>
> case PRINT_NEWLINE_TO:
> 1802 w = stream = POP();
> 1803 /* fall through to PRINT_NEWLINE */

> 1804
> 1805 case PRINT_NEWLINE:
>
> must be supported.

--
Terry Jan Reedy

Terry Reedy, Aug 22, 2011
20. ### Chris AngelicoGuest

2011/8/22 Andreas Löscher <-chemnitz.de>:
> But every improvement on your algorithm will easily result in a
> significant shorter execution time than replaceing a+=1 with a=a+1 will
> ever do.
>

Agreed. If Python needed a faster alternative to "a=a+1", then I would
recommend an "a.inc()" call or something; some way to avoid looking up
the value of 1. (An equivalent to C's ++a operation, if you like.) But
I think you'd be hard-pressed to find any situation where improving
the speed of incrementing will be significant, that wouldn't be better
served by algorithmic improvements (even just using "for a in
range()") or dropping to C.

ChrisA

Chris Angelico, Aug 22, 2011