relative speed of incremention syntaxes (or "i=i+1" VS "i+=1")


L

Laurent

Hi Folks,

I was arguing with a guy who was sure that incrementing a variable i with "i += 1" is faster than "i = i + 1". I couldn't tell if he was right or wrong so I did a little benchmark with the very useful timeit module.
Here are the results on my little Linux Eeepc Netbook (using Python 3.2):


Computing, please wait...

Results for 1000000 times "i = i + 1":
0.37591004371643066
0.3827171325683594
0.37238597869873047
0.37305116653442383
0.3725881576538086
0.37294602394104004
0.3712761402130127
0.37357497215270996
0.371567964553833
0.37359118461608887
Total 3.7396 seconds.

Results for 1000000 times "i += 1":
0.3821070194244385
0.3802030086517334
0.3828878402709961
0.3823058605194092
0.3801591396331787
0.38340115547180176
0.3795340061187744
0.38153910636901855
0.3835160732269287
0.381864070892334
Total 3.8175 seconds.

==> "i = i + 1" is 2.08% faster than "i += 1".



I did many tests and "i = i + 1" always seems to be around 2% faster than "i += 1". This is no surprise as the += notation seems to be a syntaxic sugar layer that has to be converted to i = i + 1 anyway. Am I wrong in my interpretation?

Btw here's the trivial Python 3.2 script I made for this benchmark:


import timeit

r = 10
n = 1000000

s1 = "i = i + 1"
s2 = "i += 1"

t1 = timeit.Timer(stmt=s1, setup="i = 0")
t2 = timeit.Timer(stmt=s2, setup="i = 0")

print("Computing, please wait...")

results1 = t1.repeat(repeat=r, number=n)
results2 = t2.repeat(repeat=r, number=n)

print('\nResults for {} times "{}":'.format(n, s1))
sum1 = 0
for result in results1:
print(result)
sum1 += result
print("Total {:.5} seconds.".format(sum1))

print('\nResults for {} times "{}":'.format(n, s2))
sum2 = 0
for result in results2:
print(result)
sum2 += result
print("Total {:.5} seconds.".format(sum2))

print('\n==> "{}" is {:.3}% faster than "{}".'.format(s1,(sum2 / sum1) * 100 - 100, s2))



Comments are welcome...
 
Ad

Advertisements

W

woooee

as the += notation seems to be a syntaxic sugar layer that has to be
converted to i = i + 1 anyway.

That has always been my understanding. The faster way is to append to
a list as concatenating usually, requires the original string,
accessing an intermediate block of memory, and the memory for the
final string.
x_list.append(value)
to_string = "".join(x_list)
 
L

Laurent

Well I agree with you about string concatenation, but here I'm talking about integers incrementation...
 
I

Irmen de Jong

Well I agree with you about string concatenation, but here I'm talking about integers incrementation...

Seems the two forms are not 100% identical:
.... x=x+1
........ x+=1
....2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 BINARY_ADD
7 STORE_FAST 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE

What the precise difference (semantics and speed) is between the
BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
code or maybe someone knows it from memory :)

Irmen
 
L

Laurent

Thanks for these explanations. So 2% speed difference just between "B..." and "I..." entries in a huge alphabetically-ordered switch case? Wow. Maybe there is some material for speed enhancement here...
 
H

Hans Mulder

What the precise difference (semantics and speed) is between the
BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
code or maybe someone knows it from memory :)

There is a clear difference in semantics: BINARY_ADD always produces
a new object, INPLACE_ADD may modify its left-hand operand in situ
(if it's mutable).

Integers are immutable, so for integers the semantics are the same,
but for lists, for example, the two are different:
x = [2, 3, 5, 7]
y = [11, 13]
x+y [2, 3, 5, 7, 11, 13]
x [2, 3, 5, 7] # x still has its original value
x += y
x [2, 3, 5, 7, 11, 13] # x is now modified

For integers, I would not expect a measurable difference in speed.

Hope this helps,

-- HansM
 
Ad

Advertisements

R

Roy Smith

Christian Heimes said:
Am 21.08.2011 19:27, schrieb Andreas Löscher:

I don't think that's the reason. Modern compiles turn a switch statement
into a jump or branch table rather than a linear search like chained
elif statements.

This is true even for very small values of "modern". I remember the
Unix v6 C compiler (circa 1977) was able to do this.
 
T

Terry Reedy

from Python/ceval.c:

1316 case BINARY_ADD:
1317 w = POP();
1318 v = TOP();
1319 if (PyInt_CheckExact(v)&& PyInt_CheckExact(w)) {
1320 /* INLINE: int + int */
1321 register long a, b, i;
1322 a = PyInt_AS_LONG(v);
1323 b = PyInt_AS_LONG(w);
1324 /* cast to avoid undefined behaviour
1325 on overflow */
1326 i = (long)((unsigned long)a + b);
1327 if ((i^a)< 0&& (i^b)< 0)
1328 goto slow_add;
1329 x = PyInt_FromLong(i);
1330 }
1331 else if (PyString_CheckExact(v)&&
1332 PyString_CheckExact(w)) {
1333 x = string_concatenate(v, w, f, next_instr);
1334 /* string_concatenate consumed the ref to v */
1335 goto skip_decref_vx;
1336 }
1337 else {
1338 slow_add:
1339 x = PyNumber_Add(v, w);
1340 }
1341 Py_DECREF(v);
1342 skip_decref_vx:
1343 Py_DECREF(w);
1344 SET_TOP(x);
1345 if (x != NULL) continue;
1346 break;

1532 case INPLACE_ADD:
1533 w = POP();
1534 v = TOP();
1535 if (PyInt_CheckExact(v)&& PyInt_CheckExact(w)) {
1536 /* INLINE: int + int */
1537 register long a, b, i;
1538 a = PyInt_AS_LONG(v);
1539 b = PyInt_AS_LONG(w);
1540 i = a + b;
1541 if ((i^a)< 0&& (i^b)< 0)
1542 goto slow_iadd;
1543 x = PyInt_FromLong(i);
1544 }
1545 else if (PyString_CheckExact(v)&&
1546 PyString_CheckExact(w)) {
1547 x = string_concatenate(v, w, f, next_instr);
1548 /* string_concatenate consumed the ref to v */
1549 goto skip_decref_v;
1550 }
1551 else {
1552 slow_iadd:
1553 x = PyNumber_InPlaceAdd(v, w);
1554 }
1555 Py_DECREF(v);
1556 skip_decref_v:
1557 Py_DECREF(w);
1558 SET_TOP(x);
1559 if (x != NULL) continue;
1560 break;

As for using Integers, the first case (line 1319 and 1535) are true and
there is no difference in Code. However, Python uses a huge switch-case
construct to execute it's opcodes and INPLACE_ADD cames after
BINARY_ADD, hence the difference in speed.

To be clear, this is nothing you should consider when writing fast code.
Complexity wise they both are the same.

With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
floats (0.0 and 1.0), 6%
 
L

Laurent

With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
floats (0.0 and 1.0), 6%

For floats it is understandable. But for integers, seriously, 4% is a lot. I would never have thought an interpreter would have differences like this in syntax for something as fundamental as adding 1.
 
L

Laurent

With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
floats (0.0 and 1.0), 6%

For floats it is understandable. But for integers, seriously, 4% is a lot. I would never have thought an interpreter would have differences like this in syntax for something as fundamental as adding 1.
 
Ad

Advertisements

L

Laurent

I did the test several times with floats on my machine and the difference is not as big as for integers:


==> "i = i + 1.0" is 0.732% faster than "i += 1.0".

It seems normal as the float addition is supposed to be slower than integer addition, thus the syntaxic difference is comparatively less important.
 
N

Nobody

I did many tests and "i = i + 1" always seems to be around 2% faster
than "i += 1". This is no surprise as the += notation seems to be a
syntaxic sugar layer that has to be converted to i = i + 1 anyway. Am I
wrong in my interpretation?

It depends. If the value on the left has an __iadd__ method, that will be
called; the value will be updated in-place, so all references to that
object will be affected:
import numpy as np
a = np.zeros(3)
b = a
a array([ 0., 0., 0.])
b array([ 0., 0., 0.])
a += 1
a array([ 1., 1., 1.])
b
array([ 1., 1., 1.])

If the value on the left doesn't have an __iadd__ method, then addition is
performed and the name is re-bound to the result:
a = a + 1
a array([ 2., 2., 2.])
b
array([ 1., 1., 1.])

If you're writing code which could reasonably be expected to work with
arbitrary "numeric" values, you should decide which to use according to
whether in-place modification is appropriate rather than trivial
performance differences. If a difference of a few percent is significant,
Python is probably the wrong language in the first place.
 
A

Andreas Löscher

Am Sonntag, den 21.08.2011, 14:52 -0400 schrieb Roy Smith:
This is true even for very small values of "modern". I remember the
Unix v6 C compiler (circa 1977) was able to do this.

What is the difference in speed between a jump table that is searched
from top to bottom in comparison to an ordinary if-then-elif...? The
difference can only be in the search algorithm regarding the table.
Without optimization (linear search) both are the same. If the compiler
applies some magic the difference can be relevant (linear complexity for
if-then-elif... and O(1) if you would use a dictionary).

Hence the executed code for integers is the same, there must be a slower
path to the code of BINARY_ADD than to INPLACE_ADD.

How would such an jump table work to behave the same liek a
switch-case-statement? Beware, that things like

case PRINT_NEWLINE_TO:
1802 w = stream = POP();
1803 /* fall through to PRINT_NEWLINE */
1804
1805 case PRINT_NEWLINE:

must be supported.


Bye the way:
First line of ceval.c since at least Version 2.4
1
2 /* Execute compiled code */
3
4 /* XXX TO DO:
5 XXX speed up searching for keywords by using a dictionary
6 XXX document it!
7 */

:)
 
Ad

Advertisements

A

Andreas Löscher

Am Sonntag, den 21.08.2011, 12:53 -0700 schrieb Laurent:
For floats it is understandable. But for integers, seriously, 4% is a lot.. I would never have thought an interpreter would have differences like this in syntax for something as fundamental as adding 1.

It's not as bad as you think. The addition of two integers is a cheap
task (in terms of computation power). If you have two way's to to it,
every little think (jumps in the code etc. ) will have a larger impact
on the execution time than on an expensive operation.

But every improvement on your algorithm will easily result in a
significant shorter execution time than replaceing a+=1 with a=a+1 will
ever do. :)
 
C

Chris Angelico

2011/8/22 Andreas Löscher said:
How would such an jump table work to behave the same liek a
switch-case-statement?

If your switch statement uses a simple integer enum with sequential
values, then it can be done quite easily. Take this as an example:

switch (argc)
{
case 0: printf("No args at all, this is weird\n"); break;
case 1: printf("No args\n"); break;
case 2: printf("Default for second arg\n");
case 3: printf("Two args\n"); break;
default: printf("Too many args\n"); break;
}

I compiled this using Open Watcom C, looked at the disassembly, and
hereby translate it into pseudocode (I'll email/post the full 80x86
disassembly if you like):

1) Check if argc > 3 (unsigned comparison), if so jump to default case.
2) Left shift argc two places, add a constant offset, fetch a pointer
from there, and jump to it - that's the jump table. One JMP statement.
3) Code follows for each case.

Incidentally, the Open Watcom compiler actually turned several of the
cases into offset-load of the appropriate string pointer, and then a
jump to the single call to printf. The fall-through from 'case 2' to
'case 3' works fine, although it means that 'case 2' has to be
de-optimized from that one simplification.

This type of optimization works best when the case values are
sequential. (If I remove the 'case 0', the compiler decrements argc
and proceeds to continue as above.) Otherwise, the jump table has to
have a lot of copies of the "default" pointer.

Chris Angelico
 
T

Terry Reedy

Am Sonntag, den 21.08.2011, 14:52 -0400 schrieb Roy Smith:

What is the difference in speed between a jump table that is searched
from top to bottom in comparison to an ordinary if-then-elif...? The
difference can only be in the search algorithm regarding the table.
Without optimization (linear search) both are the same. If the compiler
applies some magic the difference can be relevant (linear complexity for
if-then-elif... and O(1) if you would use a dictionary).

A jump or branch table is applicable when the case value values are all
small ints, like bytes or less. For C, the table is simply an array of
pointers (addressess, with entries for unused byte codes would be a void
pointer). Hence, O(1) access.
https://secure.wikimedia.org/wikipedia/en/wiki/Jump_table
Hence the executed code for integers is the same, there must be a slower
path to the code of BINARY_ADD than to INPLACE_ADD.

How would such an jump table work to behave the same liek a
switch-case-statement? Beware, that things like

case PRINT_NEWLINE_TO:
1802 w = stream = POP();
1803 /* fall through to PRINT_NEWLINE */

add jump to address of the code for PRINT_NEWLINE
 
Ad

Advertisements

C

Chris Angelico

2011/8/22 Andreas Löscher said:
But every improvement on your algorithm will easily result in a
significant shorter execution time than replaceing a+=1 with a=a+1 will
ever do. :)

Agreed. If Python needed a faster alternative to "a=a+1", then I would
recommend an "a.inc()" call or something; some way to avoid looking up
the value of 1. (An equivalent to C's ++a operation, if you like.) But
I think you'd be hard-pressed to find any situation where improving
the speed of incrementing will be significant, that wouldn't be better
served by algorithmic improvements (even just using "for a in
range()") or dropping to C.

ChrisA
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top