# The Cephes Mathematical Library

J

#### jacob navia

The cephes math library is a legend in the internet. It was
written by Stephen L. Moshier in 1984, but there was apparently
an even earlier version in assembler, before he rewrote that
in C. It is used in many systems, for instance in the python
language.

I started working with this library around 1997 or whereabouts.
I rewrote the core of it in assembler, fixed some (minor)
bugs, and added several functions like the Lambert's W
function and others. I rewrote all the functions needed for
replicating all math.h

This library is at the center of the qfloat data type in lcc-win
that was one of the first concrete applications of operator

Basically, the library works around

#define _NQ_ 12
typedef struct qfloatstruct {
int sign;
int exponent;
unsigned int mantissa[_NQ_];
}qfloat;

For a 32 bit system, this structure can be seen as an array
of 14 32 bit integers, with array[0] the sign, array[1] the
exponent, and array[2..13] the mantissa.

Within the mantissa, the first array position is always zero
and is used to hold bits that overflow from the higher
positions during the calculations.

When doing the four operations, the software expands this
structure by adding a zero to the end, to hold overflows in the
other direction.

I maintained this structure within the assembler core, and
essentially the 32 bit version uses the above description as is.

When preparing the 64 bit version however, I decided to
optimize the data layout.

1) The extra zero will no longer be stored in the number
but only stored in the temporary forms when actually
doing the calculations. Numbers will be expanded before
any of the four operations, then packed afterwards.

2) The size of the numbers must be a multiple of 2 to better
use the cache lines. A size of 64 bytes is a sensible choice.

All this leads naturally to the following structure
#define MANTISSA_LENGTH 7
typedef struct tagQfloat {
int sign;
unsigned int exponent;
unsigned long long mantissa[MANTISSA_LENGTH];
} Qfloat;

This allows for a precision of 448 bits, with 132 decimal
digits.

Obviously it would be too expensive to pass a 64 byte
structure by value for each function call, so I decided
to use references everywhere. The trick I used is that
instead of declaring variables like this:
Qfloat a,b,c;
I declare them like this:
Qfloat a[1],b[1],c[1];

This has the highly beneficial side effect that when
you call some function like
to add a+b and put the result in c, the numbers are
automatically passed by reference without having to use the
ugly
notation and maintaining the bulk of the library
code intact.

This 64 bit version has more or less the same speed as the 32 bit
version, even if it has around 25% more precision (132 digits
instead of 105). Obviously reducing the number of loops by
half (mantissa is not 14 but 7 elements wide) offset the
extra precision work.

This new library will be shipped with the 64 bit version of lcc-win.

The purpose of this message is to discuss the data structures and the
choices I did. Obviously this is a serious implementation
of floating point, not just an academic exercise where the numbers
are represented just by "unsigned char" or similar solutions.

J

#### jacob navia

jacob said:
The cephes math library is a legend in the internet. It was
written by Stephen L. Moshier in 1984, but there was apparently
an even earlier version in assembler, before he rewrote that
in C. It is used in many systems, for instance in the python
language.

Sorry. After I posted that, I found following comment in
qflti.c
/*
* Revision history:
*
* SLM, 5 Jan 84 PDP-11 assembly language version
* SLM, 2 Mar 86 fixed bug in asctoq()
* SLM, 6 Dec 86 C language version
*
*/

SLM is Stephen L. Moshier

The URL for the version of Mr Moshier is:

http://www.netlib.org/cephes/index.html

J

#### jacob navia

Eric said:
jacob said:
[...]
The purpose of this message is to discuss the data structures and the
choices I did. Obviously this is a serious implementation
of floating point, not just an academic exercise where the numbers
are represented just by "unsigned char" or similar solutions.

Okay, I'll bite: Does this discussion have anything to do
with C, or is it about implementation strategies for a library
of assembly code?

The core of the library is written in C, with the four operations
(add, subtract and multiply, + shift in assembler). I do not
see why you think the library is in assembler.
There seemed to be three C-ish things in your message:

doesn't pad) struct objects. I guess the assumptions
are correct for your compiler, but they also seem
gratuitous: The operations are described in terms of
arrays, so why not just use arrays?

The problem is that arrays must be homogeneous. The exponent and the
sign are 32 bit, the mantissa is 64 bit. If you find a C array that
will do that I would take it immediately of course.

Nowhere in the code a specific layout is assumed. Precisely this
is the object of using a structure. I do not see where do you see
anything that assumes a specific layout...

Obviously, if we have
struct qfloat{ int32_t sign;int32_t exponent;
long long mantissa[7]};
and the compiler aligns each 32 bit integer into a 64 bit slot
we will have a waste of memory locations but the library will work
Obviously the assembler routines would need to be modified to
fit the compiler layout but I see no problem with this.

2) A trick for avoiding an address-of operator in function
calls, by declaring the variables as one-element arrays.
The price paid is that simple assignment now needs extra
syntax: `x = y' becomes `x[0] = y[0]'. TANSTAAFL.

There isn't a single assignment
qfloat a,b;
a=b

in the whole library. Why?

Precisely because the library declared the numbers as arrays. For
unknown hysterical reasons arrays can't be assigned to.

NOW that I have rewritten this library to use structures, the assignment
as you say is possible in the source code. Within the library source
code the function
qmov(a,b); //Move a into b
is used.
3) I thought there was a third thing, but now I can't find it.

So: You've done something of interest to lcc users and maybe
to people who need in high-precision floating-point, and you
should feel good about having done so. But I fail to understand
why the implementation details of a library of assembly code are
a fit topic for a forum devoted to a different language.

The cephes library is written in C. I made that very clear in my
message but maybe not clear enough for you.

Besides the extra precision, the library offers many special functions
in an open implementation. The list of functions is very long, with
elliptical integrals, statistics, physics, and many other functions like
Riemann zeta, Bessel functions, gamma, etc.

All that is written in C, for 32 bit, 64 bit and 448 bit precision.
This is a well known library, and I thought that we could discuss it in
this group.