Inconsistency producing constant for float "infinity"

Peter Hansen

I'm investigating a puzzling problem involving an attempt to generate a
constant containing an (IEEE 754) "infinity" value. (I understand that
special float values are a "platform-dependent accident" etc...)

The issue appears possibly to point to a bug in the Python compiler,
with it producing inconsistent results. I'm using "Python 2.4.2 (#67,
Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32".

This code sometimes produces a float of 1.0, sometimes infinity (or,
since I'm on Windows, the float with string representation of "1.#INF"),
as seen in the operand of the LOAD_CONST instruction:

    def floatstr(o, allow_nan=True):
        INFINITY = 1e66666

        # more code follows which does *not* rebind INFINITY

     27           0 LOAD_CONST               1 (1.0)
                  3 STORE_FAST               2 (INFINITY)
    ....


And, at other times, under circumstances I've yet to isolate:

     27           0 LOAD_CONST               1 (1.#INF)
                  3 STORE_FAST               2 (INFINITY)
    ...



I'll keep digging to narrow down what's going on, but I wondered if
anyone had heard of or seen a similar problem, or is aware of an
existing issue that could cause this. I checked the list on sourceforge
but can't see anything relevant, nor did Google help.

Just had a thought: could this be an issue involving "marshal" and
either the writing of or reading of the .pyc file? Is it known to be
unsafe to have Python source with special float constants?

-Peter
 
Sybren Stuvel

Peter Hansen enlightened us with:
> I'm investigating a puzzling problem involving an attempt to
> generate a constant containing an (IEEE 754) "infinity" value. (I
> understand that special float values are a "platform-dependent
> accident" etc...)

Why aren't you simply using the fpconst package?

Sybren
 
Peter Hansen

Sybren said:
> Why aren't you simply using the fpconst package?

Probably because it's not in the stdlib yet, assuming that's still true.

(Using it might be an option anyway. I'm investigating a problem on
Win32 with simplejson, so it would be Bob Ippolito's choice whether
fpconst is a reasonable solution to the problem.)

My guess about marshal was correct. The problem (value becoming 1.0)
appears when running from .pyc files. Immediately after the source code
is changed, the code works, since it doesn't unmarshal the .pyc file but
just works from the bytecode freshly compiled in memory.

This demonstrates what would be the heart of the problem, which I guess
means this is not surprising to almost anyone, but perhaps will be a
wakeup call to anyone who might still be unaware and has code that
relies on constants like 1e6666 producing infinities:
1.0

-Peter
 
Tim Peters

[Peter Hansen]
[also Peter]
> ...
> My guess about marshal was correct.

Yup.

> The problem (value becoming 1.0) appears when running from .pyc
> files. Immediately after the source code is changed, the code works,
> since it doesn't unmarshal the .pyc file but just works from the
> bytecode freshly compiled in memory.
>
> This demonstrates what would be the heart of the problem, which I guess
> means this is not surprising to almost anyone, but perhaps will be a
> wakeup call to anyone who might still be unaware and has code that
> relies on constants like 1e6666 producing infinities:

It has a much better chance of working from .pyc in Python 2.5.
Michael Hudson put considerable effort into figuring out whether the
platform uses a recognizable IEEE double storage format, and, if so,
marshal and pickle take different paths that preserve infinities,
NaNs, and signed zeroes.

For Pythons earlier than that, it's a better idea to avoid trying to
express special values as literals. That not only relies on platform
accidents about how marshal works, but also on platform accidents
about how the C string->float library works. For example, do instead:

inf = 1e300 * 1e300
nan = inf - inf

If you're /on/ an IEEE-754 platform, those will produce an infinity
and a NaN, from source or from .pyc.
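
A minimal sketch of that approach, assuming an IEEE-754 platform and a reasonably recent Python (math.isinf and math.isnan are not available in the releases discussed in this thread). The helper name is_special is made up for illustration:

    ```python
    import math

    # Build the special values by arithmetic instead of writing them as literals.
    inf = 1e300 * 1e300   # overflows to +infinity on an IEEE-754 platform
    nan = inf - inf       # infinity minus infinity is a NaN

    def is_special(x):
        """Return True for infinities and NaNs (hypothetical helper)."""
        return math.isinf(x) or math.isnan(x)

    print(is_special(inf), is_special(nan), is_special(1.0))
    ```

A NaN compares unequal to everything, itself included, so `nan != nan` is another quick check that works without the math module.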

Illustrating the remarkable truth that the Microsoft C string->float
routines can't read what the MS float->string routines produce for
special values (try reading "1.#INF" from C code with a double format
and you'll also get 1.0 back).
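
To see why the result is 1.0 rather than an error, here is a rough simulation of a strtod-style parser, which converts the longest numeric prefix it recognizes and ignores the rest. This is only a sketch: strtod_like is a made-up name, and the regular expression is a simplification, not the actual Microsoft C runtime logic.

    ```python
    import re

    # Longest leading run that looks like a decimal float: optional sign,
    # digits with an optional fraction, and an optional exponent.
    _NUMERIC_PREFIX = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")

    def strtod_like(text):
        """Convert the longest numeric prefix of text, ignoring what follows."""
        match = _NUMERIC_PREFIX.match(text)
        if match is None:
            return 0.0   # nothing numeric at the start
        return float(match.group(0))

    # "1.#INF" stops being numeric at the "#", so only "1." is converted.
    print(strtod_like("1.#INF"))
    ```

Under that reading, the text "1.#INF" written into a .pyc silently becomes the float 1.0 when it is read back.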

Here in Python 2.5, on Windows (note that marshal grew a new binary
float format, primarily to support this):
'IEEE, little-endian'
 
Alex Martelli

Tim Peters said:
> It has a much better chance of working from .pyc in Python 2.5.
> Michael Hudson put considerable effort into figuring out whether the
> platform uses a recognizable IEEE double storage format, and, if so,
> marshal and pickle take different paths that preserve infinities,
> NaNs, and signed zeroes.

Isn't marshal constrained to work across platforms (for a given Python
release), and pickle also constrained to work across releases (for a
given protocol)? I'm curious about how this still allows them to "take
different paths" (yeah, I _could_ study the sources, but I'm lazy:)...


Alex
 
Tim Peters

[Tim Peters]

[Alex Martelli]
> Isn't marshal constrained to work across platforms (for a given Python
> release), and pickle also constrained to work across releases (for a
> given protocol)?

Yes to both.

> I'm curious about how this still allows them to "take different
> paths" (yeah, I _could_ study the sources, but I'm lazy:)...

Good questions. Pickle first: pickle, with protocol >= 1, has always
had a binary format for Python floats, which is identical to the
big-endian IEEE-754 double-precision storage format. This is
independent of the native C double representation: even on a non-IEEE
box (e.g., VAX or Cray), protocol >= 1 pickle does the best it can to
/encode/ native doubles in the big-endian 754 double storage /format/.
This is explained in the docs for the "float8" opcode in
pickletools.py:

    The format is unique to Python, and shared with the struct
    module (format string '>d') "in theory" (the struct and cPickle
    implementations don't share the code -- they should). It's
    strongly related to the IEEE-754 double format, and, in normal
    cases, is in fact identical to the big-endian 754 double format.
    On other boxes the dynamic range is limited to that of a 754
    double, and "add a half and chop" rounding is used to reduce
    the precision to 53 bits. However, even on a 754 box,
    infinities, NaNs, and minus zero may not be handled correctly
    (may not survive roundtrip pickling intact).
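
As a rough illustration of the float8 payload described above, the following compares a binary pickle with what struct produces for '>d'. It assumes a current Python 3 interpreter rather than the 2.x releases under discussion, and the variable names are only for this example:

    ```python
    import pickle
    import struct

    value = 1.5

    # Binary protocols store the float as 8 bytes after the opcode.
    binary_pickle = pickle.dumps(value, protocol=2)
    # Protocol 0 stores a text rendering of the float instead.
    text_pickle = pickle.dumps(value, protocol=0)

    # The same float in big-endian IEEE-754 double format.
    be_bytes = struct.pack(">d", value)

    payload_found = be_bytes in binary_pickle
    text_found = b"1.5" in text_pickle
    print(payload_found, text_found)
    ```

The text form is the one that depends on string<->float conversion, which is where special values historically went wrong.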

The problem has been that C89 defines nothing about signed zeroes,
infinities, or NaNs, so even on a 754 box there was no consistency
across platforms in what C library routines like frexp() returned when
fed one of those things. As a result, what Python's "best non-heroic
effort" code for constructing a 754 big-endian representation actually
did was a platform-dependent accident when fed a 754 special-case
value. Likewise for trying to construct a native C double from a 754
representation of a 754 special-case value -- again, there was no
guessing what C library routines like ldexp() would return in those
cases.
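
The frexp()-based approach can be sketched in Python as follows. This is not the actual floatobject.c code, only an illustration of the idea for finite, normal values; the function name pack_double_portable is made up, and zeros, subnormals, infinities, and NaNs are deliberately rejected because they are exactly the cases where frexp() gives no portable answer.

    ```python
    import math
    import struct

    def pack_double_portable(x):
        """Encode a finite, normal float as 8 big-endian IEEE-754 bytes via frexp."""
        if x == 0.0 or math.isinf(x) or math.isnan(x):
            raise ValueError("special values are out of scope for this sketch")
        sign = 1 if x < 0 else 0
        mantissa, exponent = math.frexp(abs(x))   # abs(x) == mantissa * 2**exponent
        # frexp returns 0.5 <= mantissa < 1; IEEE-754 wants 1 <= mantissa < 2.
        mantissa *= 2.0
        exponent -= 1
        biased = exponent + 1023                  # exponent bias for doubles
        if not 1 <= biased <= 2046:
            raise ValueError("subnormal values are out of scope for this sketch")
        # Drop the implicit leading 1 and scale the fraction to a 52-bit integer.
        fraction = int((mantissa - 1.0) * (1 << 52))
        bits = (sign << 63) | (biased << 52) | fraction
        return struct.pack(">Q", bits)

    print(pack_double_portable(1.5) == struct.pack(">d", 1.5))
    ```

Every step here relies on the value being an ordinary number, which is why the special cases ended up as platform-dependent accidents.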

Part of what Michael Hudson did for 2.5 is add code to guess whether
the native C double format /is/ the big-endian or little-endian 754
double-precision format. If so, protocol >= 1 pickle in 2.5 uses much
simpler code to pack and unpack Python floats, simply copying from/to
native bytes verbatim (possibly reversing the byte order, depending on
platform endianness). Python doesn't even try to guess whether a C
double is "normal", or an inf, NaN, or signed zero then, so can't
screw that up -- it just copies the bits blindly.
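
A small sketch of the "copy the bits blindly" idea, assuming a Python 3 interpreter on an IEEE-754 platform. float.__getformat__ reports what the interpreter concluded about the native double format; the two helper functions are hypothetical names for this example and are not how the C code is organized.

    ```python
    import math
    import struct
    import sys

    # What the interpreter decided about the native C double format.
    native_format = float.__getformat__("double")

    def copy_bits_big_endian(x):
        """Take the native 8 bytes of x and reorder them to big-endian."""
        native = struct.pack("=d", x)
        return native if sys.byteorder == "big" else native[::-1]

    def from_big_endian(data):
        """Reverse of copy_bits_big_endian: reorder the bytes and reinterpret."""
        native = data if sys.byteorder == "big" else data[::-1]
        return struct.unpack("=d", native)[0]

    for value in (float("inf"), float("-inf"), -0.0, float("nan")):
        print(repr(value), repr(from_big_endian(copy_bits_big_endian(value))))
    ```

Because nothing inspects the value on the way through, infinities, NaNs, and the sign of zero come back exactly as they went in.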

That's much better on IEEE-754 boxes, although I bet it still has
subtle problems. For example, IIRC, 754 doesn't wholly define the
difference in storage formats for signaling NaNs versus quiet NaNs, so
I bet it's still theoretically possible to pickle a signaling NaN on
one 754 box and get back a quiet NaN (or vice versa) when unpickled on
a different 754 box.

Protocol 0 (formerly known as "text mode") pickles are still a crap
shoot for 754 special values, since there's still no consistency
across platforms in what the C string<->double routines produce or
accept for special values.

Now on to marshal. Before 2.5, marshal only had a "text mode" storage
format for Python floats, much like protocol=0 pickle. So, as for
pickle protocol 0, what marshal produced or reconstructed for a 754
special value was a platform-dependent accident.

Michael added a binary marshal format for Python floats in 2.5, which
uses the same code protocol >= 1 pickle uses for serializing and
unserializing Python floats (except that the marshal format is
little-endian instead of big-endian). These all go thru
floatobject.c's _PyFloat_Pack8 and _PyFloat_Unpack8 now, and a quick
glance at those will show that they take different paths according to
whether Michael's native-format-guessing code decided that the native
format was ieee_big_endian_format, ieee_little_endian_format, or
unknown_format. The long-winded pre-2.5 pack/unpack code is only used
in the unknown_format case now.
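
The byte-order difference between the two binary formats can be observed from Python. This sketch assumes a current Python 3 interpreter on an IEEE-754 platform; it checks only the 8 payload bytes and makes no claim about the type-code byte that marshal writes in front of them.

    ```python
    import marshal
    import pickle
    import struct

    x = float("inf")

    marshal_bytes = marshal.dumps(x)
    pickle_bytes = pickle.dumps(x, protocol=2)

    little_endian = struct.pack("<d", x)
    big_endian = struct.pack(">d", x)

    # marshal ends with the little-endian payload; pickle embeds the big-endian one.
    marshal_is_le = marshal_bytes.endswith(little_endian)
    pickle_has_be = big_endian in pickle_bytes

    # The infinity survives a trip through marshal, as it would through a .pyc.
    restored = marshal.loads(marshal_bytes)
    print(marshal_is_le, pickle_has_be, restored)
    ```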
 
