C extension + libm oddity [fmod(2.0, 2.0) == nan ?!]

  • Thread starter Lonnie Princehouse

Lonnie Princehouse

I've been trying to debug this for two days now, and it's a long shot
but I'm hoping that someone here might recognize a solution. I've got
a C extension which calls a function in a C library, which calls
another function in another library, which calls another function,
which calls fmod from the standard C math library. All of these are
shared libraries on Linux (x86 Gentoo 2.6.9). In other words, the
calling looks like this:

Python ->
Python C extension ->
Function from library 1 ->
Function from library 2 ->
fmod from libm.so

The critical line of C code looks like this:
ans = fmod ( x, 2.0 );

... where x is a double with value 2.0 (confirmed by gdb) and ans is a
double.

Now, fmod(2.0, 2.0) should be 0.0. The problem? ans is getting
assigned nan! I have stepped through it in the debugger now dozens of
times. Either fmod is putting the wrong return value on the stack, or
the stack is getting corrupted by something else and "ans" is getting
assigned the wrong value.

This happens only inside of the layered Python extension mess; if I try
to compile a test C program that makes similar calls to fmod, they work
just fine. Likewise, a very simple Python wrapper around fmod also
works (e.g. Python -> Python C extension -> fmod)

This all runs in a single thread, so it doesn't seem like it would be a
threading issue unless Python is making some threads under the hood.
All of the intermediary libraries were compiled with (-g -fPIC), no
optimization.

The intermediary libraries represent thousands of lines of very old
code. It is very possible that all sorts of memory leaks and other
subtle bugs exist, but what kind of memory leak could even cause this
kind of glitch!? How can I even approach debugging it? My next step
right now is going to be stepping through the individual
instructions... arrrrrrrrrrrrrgggh.


Versions: Python 2.4.1, gcc 3.3.6, glibc 2.3.5, Gentoo Linux 2.6.9.
 
Paul Rubin

Lonnie Princehouse said:
Now, fmod(2.0, 2.0) should be 0.0. The problem? ans is getting
assigned nan! I have stepped through it in the debugger now dozens of
times. Either fmod is putting the wrong return value on the stack, or
the stack is getting corrupted by something else and "ans" is getting
assigned the wrong value.

Have you compiled the C extension with all optimization turned off?
Especially with optimizations on, you can't really tell what is going
to get assigned in a variable because the code isn't computing the
intermediate values you might expect it to. I suggest disassembling
it and stepping through it instruction by instruction if you haven't
done that.
 
Lonnie Princehouse

Paul Rubin said:
Have you compiled the C extension with all optimization turned off?

Yes. The C extension's objects are compiled only with the debugging
flag and -fPIC for position-independent code, which is necessary for
shared objects. No optimization. The only things that have any
optimization are Python and glibc (using -O2). I guess I should try
glibc without optimization too... ick.

Paul Rubin said:
I suggest disassembling it and stepping through it instruction by
instruction if you haven't done that.

Unfortunately I think you are correct ;-)
 
