Is a 32-bit build faster than a 64-bit build?


Raymond Hettinger

Has anyone here benchmarked a 32-bit Python versus a 64-bit Python for
Django or some other webserver?

My hypothesis is that for apps not needing the 64-bit address space,
the 32-bit version has better memory utilization and hence better
cache performance. If so, then switching Python builds may enable a
single server to handle a greater traffic load. Has anyone here tried
that?
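A quick way to see the memory half of that in isolation is to run the
same snippet under both builds and compare sys.getsizeof() output
(just a sketch of per-object overhead, not a Django measurement):

import sys

# Report which build is running and the footprint of a few common objects.
# Run this under the 32-bit and the 64-bit Python and diff the output.
print("build:", "64-bit" if sys.maxsize > 2**32 else "32-bit")

samples = [
    ("int", 1),
    ("str", "hello world"),
    ("tuple", (1, 2, 3)),
    ("list(1000)", list(range(1000))),
    ("dict(1000)", dict((i, str(i)) for i in range(1000))),
]
for name, obj in samples:
    print("%-12s %6d bytes" % (name, sys.getsizeof(obj)))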


Raymond
 

Stefan Sonnenberg-Carstens

On 12.11.2010 22:24, Raymond Hettinger wrote:
My hypothesis is that for apps not needing the 64-bit address space,
the 32-bit version has better memory utilization and hence better
cache performance. If so, then switching Python builds may enable a
single server to handle a greater traffic load. Has anyone here tried
that?
In most (all?) cases I have seen, the performance did not differ when
comparing Python scripts run in a 32-bit or a 64-bit environment.
But that is not true for compiled code.
Just take your script and do some benchmarking.
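For instance, something along these lines is enough for a first
comparison (just a sketch; workload() is a placeholder for whatever
your script actually does):

import timeit

def workload():
    # Stand-in for the real work, e.g. rendering a template or handling a request.
    return sum(i * i for i in range(10000))

# Best-of-five timing of 100 calls; run once under each interpreter and compare.
best = min(timeit.repeat(workload, repeat=5, number=100))
print("best of 5: %.3f s per 100 calls" % best)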

In Germany we say: Probieren geht über Studieren.
Don't study - try it.

An ex-post explanation will readily present itself, I promise :)
 

Stefan Behnel

Raymond Hettinger, 12.11.2010 22:24:
Has anyone here benchmarked a 32-bit Python versus a 64-bit Python for
Django or some other webserver?

My hypothesis is that for apps not needing the 64-bit address space,
the 32-bit version has better memory utilization and hence better
cache performance.

OTOH, x86_64 has more registers and allows more aggressive default
compiler flags (e.g. there is no x86_64 processor without MMX and
SSE). So, if you don't compile your software yourself (including the
OS kernel and libraries, e.g. OpenSSL), it will likely run faster on
64 bits simply due to the better compiler optimisations.

There are good reasons for both being able to run faster depending on the
specific code, so benchmarking is basically all you can do.
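(If in doubt which build a given interpreter actually is, something
like this tells you:)

import platform, struct, sys

print("pointer size:", struct.calcsize("P") * 8, "bits")
print("sys.maxsize:", sys.maxsize)
print("machine:", platform.machine())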

Stefan
 

Antoine Pitrou

Has anyone here benchmarked a 32-bit Python versus a 64-bit Python for
Django or some other webserver?

My hypothesis is that for apps not needing the 64-bit address space,
the 32-bit version has better memory utilization and hence better
cache performance. If so, then switching Python builds may enable a
single server to handle a greater traffic load. Has anyone here tried
that?

On micro-benchmarks, x86-64 is always faster by about 10-30% compared
to x86, simply because of the extended register set and other
additional niceties.

On a benchmark that stresses the memory system a little more, such as
dcbench-py3k.py in http://bugs.python.org/issue9520, the 64-bit build
is still faster until the tested data structure (a large dict)
overwhelms the 2MB last-level cache in my CPU, after which the 32-bit
build becomes 10% faster for the same number of elements.

To be clear, here are the figures in 64-bit mode:

10000 words ( 9092 keys), 2893621 inserts/s, 13426069 lookups/s, 86 bytes/key (0.8MB)
20000 words ( 17699 keys), 3206654 inserts/s, 12338002 lookups/s, 44 bytes/key (0.8MB)
40000 words ( 34490 keys), 2613517 inserts/s, 7643726 lookups/s, 91 bytes/key (3.0MB)
80000 words ( 67148 keys), 2579562 inserts/s, 4872069 lookups/s, 46 bytes/key (3.0MB)
160000 words ( 130897 keys), 2377487 inserts/s, 5765316 lookups/s, 48 bytes/key (6.0MB)
320000 words ( 254233 keys), 2119978 inserts/s, 5003979 lookups/s, 49 bytes/key (12.0MB)
640000 words ( 493191 keys), 1965413 inserts/s, 4640743 lookups/s, 51 bytes/key (24.0MB)
1280000 words ( 956820 keys), 1854546 inserts/s, 4338543 lookups/s, 52 bytes/key (48.0MB)

And here are the figures in 32-bit mode:

10000 words ( 9092 keys), 2250163 inserts/s, 9487229 lookups/s, 43 bytes/key (0.4MB)
20000 words ( 17699 keys), 2543235 inserts/s, 7653839 lookups/s, 22 bytes/key (0.4MB)
40000 words ( 34490 keys), 2360162 inserts/s, 8851543 lookups/s, 45 bytes/key (1.5MB)
80000 words ( 67148 keys), 2415169 inserts/s, 8581037 lookups/s, 23 bytes/key (1.5MB)
160000 words ( 130897 keys), 2203071 inserts/s, 6914732 lookups/s, 24 bytes/key (3.0MB)
320000 words ( 254233 keys), 2005980 inserts/s, 5670133 lookups/s, 24 bytes/key (6.0MB)
640000 words ( 493191 keys), 1856385 inserts/s, 4929790 lookups/s, 25 bytes/key (12.0MB)
1280000 words ( 956820 keys), 1746364 inserts/s, 4530747 lookups/s, 26 bytes/key (24.0MB)
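For context, the measurement is roughly of this shape (a simplified
sketch only, not the actual dcbench-py3k.py from the issue, which
derives its keys from a word list and also reports bytes per key):

import time

def bench(nkeys):
    keys = [str(i) for i in range(nkeys)]

    t0 = time.time()
    d = {}
    for k in keys:          # insert phase
        d[k] = None
    t1 = time.time()
    for k in keys:          # lookup phase
        d[k]
    t2 = time.time()

    print("%8d keys: %9.0f inserts/s, %9.0f lookups/s"
          % (nkeys, nkeys / (t1 - t0), nkeys / (t2 - t1)))

for n in (100000, 300000, 1000000):
    bench(n)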


However, it's not obvious to me that a program like "Django or some
other webserver" would have really bad cache locality. Even if the
total working set is larger than the CPU cache, there can still be
quite good cache efficiency if a large fraction of CPU time is spent
on small datasets.

By the way, I've been experimenting with denser dicts and with
linear probing (in the hope that it will improve cache efficiency and
spatial locality in real applications), and there don't seem to be any
adverse consequences on micro-benchmarks. Do you think I should upload
a patch?
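(For anyone who hasn't looked at dictobject.c: the current dict is an
open-addressed table whose probe sequence deliberately jumps around
the slot array; linear probing scans consecutive slots instead, so
collisions stay within the same cache lines, and "denser" means a
higher fill factor before the table grows. A toy illustration of the
probing idea only, not the patch itself:)

class LinearProbeDict(object):
    """Toy open-addressed table with linear probing -- illustration only."""

    def __init__(self):
        self.slots = [None] * 8        # each slot is None or a (key, value) pair
        self.used = 0

    def _find_slot(self, key):
        mask = len(self.slots) - 1
        i = hash(key) & mask
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) & mask         # linear probing: try the next slot over
        return i

    def __setitem__(self, key, value):
        if (self.used + 1) * 3 > len(self.slots) * 2:   # keep the table under 2/3 full
            self._resize()
        i = self._find_slot(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)

    def __getitem__(self, key):
        slot = self.slots[self._find_slot(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]

    def _resize(self):
        entries = [s for s in self.slots if s is not None]
        self.slots = [None] * (len(self.slots) * 2)
        self.used = 0
        for key, value in entries:
            self[key] = value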

Regards

Antoine.
 
