Building Python 2.4 with icc and processor-specific optimizations

Michael Hoffman

Just out of curiosity, I was wondering if anyone has
compiled Python 2.4 with the Intel C Compiler and its
processor-specific optimizations. I can build it fine
with OPT="-O3" or OPT="-xN" but when I try to combine
them I get this as soon as ./python is run:

"""
case $MAKEFLAGS in \
*-s*) CC='icc -pthread' LDSHARED='icc -pthread -shared' OPT='-DNDEBUG -O3 -xN' ./python -E ./setup.py -q build;; \
*) CC='icc -pthread' LDSHARED='icc -pthread -shared' OPT='-DNDEBUG -O3 -xN' ./python -E ./setup.py build;; \
esac
'import site' failed; use -v for traceback
Traceback (most recent call last):
File "./setup.py", line 6, in ?
import sys, os, getopt, imp, re
File "/usr/local/src/Python-2.4/Lib/os.py", line 130, in ?
raise ImportError, 'no os specific module found'
ImportError: no os specific module found
make: *** [sharedmods] Error 1
"""

Also, if I run ./python, I have this interesting result:

"""
$ ./python
'import site' failed; use -v for traceback
Python 2.4 (#34, Mar 12 2005, 18:46:28)
[GCC Intel(R) C++ gcc 3.0 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
('__main__', '__builtin__', '__builtin__', '__builtin__', '__builtin__',
'__builtin__', '__builtin__', '__builtin__', '__builtin__',
'__builtin__', '__builtin__', '__builtin__', '__builtin__',
'exceptions', 'gc', 'gc')
"""

Whoa--what's going on? Any ideas?
 
Michael Hoffman

Further investigation reveals that the function that sets
sys.builtin_module_names sorts the list before turning it into a
tuple. And binarysort() in Objects/listobject.c doesn't work when
optimized in that fashion. Adding #pragma optimize("", off)
beforehand solves the problem. Why that is, I have no idea. Is
anyone else curious?
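
A quick way to spot this kind of breakage from inside the interpreter is to check the tuple directly. The sketch below is only an illustration: the helper name check_module_names is made up for this example, and it assumes a correctly built interpreter exposes sys.builtin_module_names sorted and free of duplicates, as described above. The "broken" tuple is copied from the output earlier in the thread.

```python
import sys
from collections import Counter


def check_module_names(names):
    """Return a list of problems found in a tuple of module names.

    Hypothetical helper for illustration; not part of Python itself.
    """
    problems = []
    # Report every name that occurs more than once.
    for name, count in sorted(Counter(names).items()):
        if count > 1:
            problems.append("%r appears %d times" % (name, count))
    # A correctly sorted list compares equal to its sorted copy.
    if list(names) != sorted(names):
        problems.append("names are not in sorted order")
    return problems


# The tuple printed by the miscompiled ./python in the first post.
broken = ("__main__",) + ("__builtin__",) * 12 + ("exceptions", "gc", "gc")

if __name__ == "__main__":
    for problem in check_module_names(broken):
        print("broken build:", problem)
    print("this interpreter:", check_module_names(sys.builtin_module_names) or "looks fine")
```

Running the same check against a freshly built ./python would show whether a given combination of OPT flags produces a usable interpreter before setup.py gets a chance to fail.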

Also, if anyone is looking for a way to squeeze a little extra time
out of the startup, perhaps sorting the list at build-time,
rather than when Python starts would be good. Although probably
not worth the trouble. ;-)
 
Martin v. Löwis

Michael said:
> Further investigation reveals that the function that sets
> sys.builtin_module_names sorts the list before turning it into a
> tuple. And binarysort() in Objects/listobject.c doesn't work when
> optimized in that fashion. Adding #pragma optimize("", off)
> beforehand solves the problem. Why that is, I have no idea. Is
> anyone else curious?

I would really like to know, indeed. OTOH, I probably don't have the
time to analyse it myself.

Looks like a compiler bug to me: perhaps, some condition is compile-time
asserted to be always true even though it could happen that it is false.

OTOH, it could also be Python's failure to follow C's aliasing rules
correctly; Python casts between C pointers which, in strict C, causes
undefined behaviour. So if your compiler has something similar to GCC's
-fno-strict-aliasing, you could see whether this helps.

If not, just try comparing the assembler output of either code, on
a function-by-function basis. Alternatively, try to annotate the
calls that go out of the sorting (e.g. to RichCompareBool) so that
you get tracing, and then see where the traces differ.

> Also, if anyone is looking for a way to squeeze a little extra time
> out of the startup, perhaps sorting the list at build-time,
> rather than when Python starts would be good. Although probably
> not worth the trouble. ;-)

Probably not. config.c is hand-written in some (embedded Python)
environments, and expecting it to be sorted would break these
environments.

Regards,
Martin
 
Michael Hoffman

Martin said:
> OTOH, it could also be Python's failure to follow C's aliasing rules
> correctly; Python casts between C pointers which, in strict C, causes
> undefined behaviour. So if your compiler has something similar to GCC's
> -fno-strict-aliasing, you could see whether this helps.

There's nothing like that specifically. There is an -falias option,
for which the manual just says "assume aliasing."

> If not, just try comparing the assembler output of either code, on
> a function-by-function basis.

Oh boy, it's a 10,000 line diff. The joys of interprocedural
optimization. I think I'll quit while I'm ahead...

> Alternatively, try to annotate the
> calls that go out of the sorting (e.g. to RichCompareBool) so that
> you get tracing, and then see where the traces differ.

Well, they go wrong almost right away:

non-optimized:

PyObject_RichCompareBool('signal', 'thread', 0)
PyObject_RichCompareBool('posix', 'signal', 0)
PyObject_RichCompareBool('errno', 'posix', 0)
PyObject_RichCompareBool('_sre', 'errno', 0)
PyObject_RichCompareBool('_codecs', '_sre', 0)
PyObject_RichCompareBool('zipimport', '_codecs', 0)
PyObject_RichCompareBool('zipimport', 'posix', 0)
PyObject_RichCompareBool('zipimport', 'thread', 0)
PyObject_RichCompareBool('_symtable', 'posix', 0)

optimized:

PyObject_RichCompareBool('signal', 'thread', 0)
PyObject_RichCompareBool('posix', 'errno', 0) # hmmm, comparing in the wrong direction
PyObject_RichCompareBool('posix', 'thread', 0)
PyObject_RichCompareBool('posix', 'signal', 0)
PyObject_RichCompareBool('errno', 'errno', 0) # totally bogus!
PyObject_RichCompareBool('errno', 'errno', 0) # and repeating it twice for good measure!
PyObject_RichCompareBool('_sre', 'errno', 0)
PyObject_RichCompareBool('_sre', 'errno', 0)
PyObject_RichCompareBool('_sre', 'posix', 0)

Well, I have probably spent too much time on this already. To top things off, Python
compiled with -O3 and without -xN actually runs faster, so I shouldn't even be going
down this road.
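
For anyone who wants a reference to compare traces against, here is a rough Python model of the sort: detect an initial run, reverse it if it is descending, then insert the remaining items by binary search. It is a simplified sketch, not a transcription of Objects/listobject.c. The function name sort_with_trace and the starting order of the module list are assumptions; the order was inferred from the non-optimized trace above. The third argument 0 in the traced calls is Py_LT, so the model records "less than" comparisons as (left, right) pairs.

```python
def sort_with_trace(names):
    """Sort a copy of names; return (sorted_list, list of lt(a, b) calls)."""
    items = list(names)
    trace = []

    def lt(a, b):
        # Stand-in for PyObject_RichCompareBool(a, b, Py_LT), with tracing.
        trace.append((a, b))
        return a < b

    n = len(items)
    if n < 2:
        return items, trace

    # Step 1: measure the initial run and note whether it is descending.
    run = 2
    descending = lt(items[1], items[0])
    while run < n:
        if lt(items[run], items[run - 1]) != descending:
            break
        run += 1
    if descending:
        items[:run] = items[:run][::-1]

    # Step 2: binary insertion of each remaining item into the sorted prefix.
    for start in range(run, n):
        pivot = items[start]
        left, right = 0, start
        while left < right:
            mid = left + ((right - left) >> 1)
            if lt(pivot, items[mid]):
                right = mid
            else:
                left = mid + 1
        # Shift the tail of the prefix right by one and drop the pivot in.
        items[left + 1:start + 1] = items[left:start]
        items[left] = pivot
    return items, trace


# Assumed starting order, inferred from the non-optimized trace.
modules = ["thread", "signal", "posix", "errno", "_sre", "_codecs",
           "zipimport", "_symtable"]

if __name__ == "__main__":
    result, calls = sort_with_trace(modules)
    for a, b in calls:
        print("lt(%r, %r)" % (a, b))
    print(result)
```

With that starting order, the first nine comparisons of the model match the non-optimized trace line for line, which suggests the plain -O3 build is doing the expected thing. The optimized trace, by contrast, compares 'errno' with itself, which this procedure cannot do because the pivot is never part of the prefix it is being inserted into.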
 
