Is a 32-bit build faster than a 64-bit build?


Raymond Hettinger

Has anyone here benchmarked a 32-bit Python versus a 64-bit Python for
Django or some other webserver?

My hypothesis is that for apps not needing the 64-bit address space,
the 32-bit version has better memory utilization and hence better
cache performance. If so, then switching Python builds may enable a
single server to handle a greater traffic load. Has anyone here tried
that?
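A quick way to see the memory half of that in isolation is to run the
same snippet under both builds and compare sys.getsizeof() output
(just a sketch of per-object overhead, not a Django measurement):

import sys

# Report which build is running and the footprint of a few common objects.
# Run this under the 32-bit and the 64-bit Python and diff the output.
print("build:", "64-bit" if sys.maxsize > 2**32 else "32-bit")

samples = [
    ("int", 1),
    ("str", "hello world"),
    ("tuple", (1, 2, 3)),
    ("list(1000)", list(range(1000))),
    ("dict(1000)", dict((i, str(i)) for i in range(1000))),
]
for name, obj in samples:
    print("%-12s %6d bytes" % (name, sys.getsizeof(obj)))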


Raymond
 

Stefan Sonnenberg-Carstens

On 12.11.2010 22:24, Raymond Hettinger wrote:
My hypothesis is that for apps not needing the 64-bit address space,
the 32-bit version has better memory utilization and hence better
cache performance. If so, then switching Python builds may enable a
single server to handle a greater traffic load. Has anyone here tried
that?
In most (all?) cases I have seen, the performance did not differ when
comparing Python scripts run in a 32-bit or a 64-bit environment.
But that is not true for compiled code.
Just take your script and do some benchmarking.
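For instance, something along these lines is enough for a first
comparison (just a sketch; workload() is a placeholder for whatever
your script actually does):

import timeit

def workload():
    # Stand-in for the real work, e.g. rendering a template or handling a request.
    return sum(i * i for i in range(10000))

# Best-of-five timing of 100 calls; run once under each interpreter and compare.
best = min(timeit.repeat(workload, repeat=5, number=100))
print("best of 5: %.3f s per 100 calls" % best)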

In Germany we say: Probieren geht über Studieren.
Don't study - try it.

An ex-post explanation will readily present itself, I promise :)
 

Stefan Behnel

Raymond Hettinger, 12.11.2010 22:24:
Has anyone here benchmarked a 32-bit Python versus a 64-bit Python for
Django or some other webserver?

My hypothesis is that for apps not needing the 64-bit address space,
the 32-bit version has better memory utilization and hence better
cache performance.

OTOH, x86_64 has more registers and allows more aggressive default
compiler flags (e.g. there is no x86_64 processor without MMX and
SSE). So, if you don't compile your software yourself (including the
OS kernel and libraries, e.g. OpenSSL), it will likely run faster on
64 bits simply due to the better compiler optimisations.

There are good reasons for both being able to run faster depending on the
specific code, so benchmarking is basically all you can do.
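(If in doubt which build a given interpreter actually is, something
like this tells you:)

import platform, struct, sys

print("pointer size:", struct.calcsize("P") * 8, "bits")
print("sys.maxsize:", sys.maxsize)
print("machine:", platform.machine())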

Stefan
 

Antoine Pitrou

Has anyone here benchmarked a 32-bit Python versus a 64-bit Python for
Django or some other webserver?

My hypothesis is that for apps not needing the 64-bit address space,
the 32-bit version has better memory utilization and hence better
cache performance. If so, then switching Python builds may enable a
single server to handle a greater traffic load. Has anyone here tried
that?

On micro-benchmarks, x86-64 is always faster by about 10-30% compared
to x86, simply because of the extended register set and other
additional niceties.

On a benchmark that stresses the memory system a little more, such as
dcbench-py3k.py in http://bugs.python.org/issue9520, the 64-bit build
is still faster until the tested data structure (a large dict)
overwhelms the 2MB last-level cache in my CPU, after which the 32-bit
build becomes 10% faster for the same number of elements.

To be clear, here are the figures in 64-bit mode:

10000 words ( 9092 keys), 2893621 inserts/s, 13426069 lookups/s, 86 bytes/key (0.8MB)
20000 words ( 17699 keys), 3206654 inserts/s, 12338002 lookups/s, 44 bytes/key (0.8MB)
40000 words ( 34490 keys), 2613517 inserts/s, 7643726 lookups/s, 91 bytes/key (3.0MB)
80000 words ( 67148 keys), 2579562 inserts/s, 4872069 lookups/s, 46 bytes/key (3.0MB)
160000 words ( 130897 keys), 2377487 inserts/s, 5765316 lookups/s, 48 bytes/key (6.0MB)
320000 words ( 254233 keys), 2119978 inserts/s, 5003979 lookups/s, 49 bytes/key (12.0MB)
640000 words ( 493191 keys), 1965413 inserts/s, 4640743 lookups/s, 51 bytes/key (24.0MB)
1280000 words ( 956820 keys), 1854546 inserts/s, 4338543 lookups/s, 52 bytes/key (48.0MB)

And here are the figures in 32-bit mode:

10000 words ( 9092 keys), 2250163 inserts/s, 9487229 lookups/s, 43 bytes/key (0.4MB)
20000 words ( 17699 keys), 2543235 inserts/s, 7653839 lookups/s, 22 bytes/key (0.4MB)
40000 words ( 34490 keys), 2360162 inserts/s, 8851543 lookups/s, 45 bytes/key (1.5MB)
80000 words ( 67148 keys), 2415169 inserts/s, 8581037 lookups/s, 23 bytes/key (1.5MB)
160000 words ( 130897 keys), 2203071 inserts/s, 6914732 lookups/s, 24 bytes/key (3.0MB)
320000 words ( 254233 keys), 2005980 inserts/s, 5670133 lookups/s, 24 bytes/key (6.0MB)
640000 words ( 493191 keys), 1856385 inserts/s, 4929790 lookups/s, 25 bytes/key (12.0MB)
1280000 words ( 956820 keys), 1746364 inserts/s, 4530747 lookups/s, 26 bytes/key (24.0MB)
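For context, the measurement is roughly of this shape (a simplified
sketch only, not the actual dcbench-py3k.py from the issue, which
derives its keys from a word list and also reports bytes per key):

import time

def bench(nkeys):
    keys = [str(i) for i in range(nkeys)]

    t0 = time.time()
    d = {}
    for k in keys:          # insert phase
        d[k] = None
    t1 = time.time()
    for k in keys:          # lookup phase
        d[k]
    t2 = time.time()

    print("%8d keys: %9.0f inserts/s, %9.0f lookups/s"
          % (nkeys, nkeys / (t1 - t0), nkeys / (t2 - t1)))

for n in (100000, 300000, 1000000):
    bench(n)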


However, it's not obvious to me that a program like "Django or some
other webserver" would have really bad cache locality. Even if the
total working set is larger than the CPU cache, there can still be
quite good cache efficiency if a large fraction of CPU time is spent
on small datasets.

By the way, I've been experimenting with denser dicts and with
linear probing (in the hope that it will improve cache efficiency and
spatial locality in real applications), and there don't seem to be any
adverse consequences on micro-benchmarks. Do you think I should upload
a patch?
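(For anyone who hasn't looked at dictobject.c: the current dict is an
open-addressed table whose probe sequence deliberately jumps around
the slot array; linear probing scans consecutive slots instead, so
collisions stay within the same cache lines, and "denser" means a
higher fill factor before the table grows. A toy illustration of the
probing idea only, not the patch itself:)

class LinearProbeDict(object):
    """Toy open-addressed table with linear probing -- illustration only."""

    def __init__(self):
        self.slots = [None] * 8        # each slot is None or a (key, value) pair
        self.used = 0

    def _find_slot(self, key):
        mask = len(self.slots) - 1
        i = hash(key) & mask
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) & mask         # linear probing: try the next slot over
        return i

    def __setitem__(self, key, value):
        if (self.used + 1) * 3 > len(self.slots) * 2:   # keep the table under 2/3 full
            self._resize()
        i = self._find_slot(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)

    def __getitem__(self, key):
        slot = self.slots[self._find_slot(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]

    def _resize(self):
        entries = [s for s in self.slots if s is not None]
        self.slots = [None] * (len(self.slots) * 2)
        self.used = 0
        for key, value in entries:
            self[key] = value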

Regards

Antoine.
 
