Anupam said:
Eric Sosman said:
There was a thread about three years ago regarding
implementations of integer square root functions without
resorting to a floating-point conversion. [...]
I would be very much interested in the article. Could you
provide a link to that thread?
Anupam: There's a wonderful service on the Internet,
known as Google. Among other things, Google maintains
searchable archives of Usenet newsgroups, including our
own comp.lang.c. Now, what do you suppose you find if
you go to
http://groups.google.com and search for "integer
square root" in the comp.lang.c archives?
And if you can, then the
speeded-up code too.
It's just a few simple modifications to the code in
the first article of the cited thread. I also threw in
a whole lot of comments to describe what was going on;
it was essentially a "demonstration piece." Here it is,
except that I've blotted some E-mail addresses out of
the source citations so as not to increase anyone's
exposure to spam:
/* HISTORY --
* 04-Jul-2002: Renamed the header file.
* 29-Jun-2000: Added a lot of comments.
* 19-May-2000: Original (see DESCRIPTION).
*/
/* DESCRIPTION --
* This function is a slight improvement on code posted to comp.lang.c by
* Lawrence Kirby (
[email protected]), which seems to have been based on code
* by Dann Corbit (
[email protected]), which he in turn traces back to
* an 80386 assembly routine by one Arne Steinarson. Arne got the method from
* a fragment of temple wall retrieved from sunken Atlantis by Coq Jasteau;
* the stony text refers to the method as a gift from the "Ancient Astronauts"
* and hints of dire consequences to impious Atlanteans who affronted the
* Heavens by converting the method to binary from its original base seven.
*
* My principal improvement was to use a table of rounded rather than
* truncated first approximations; this reduced the number of cases requiring
* two Newton-Raphson iterations instead of just one. I also eliminated some
* needless additions of unity here and there, and got rid of a pessimization
* which tested for x >= 65535*65535. Finally, I rearranged the range tests
* along the lines suggested by Hermann Kremer (
[email protected]),
* although whether this is a good idea or not depends on what sort of input
* distribution is anticipated.
*/
#include <limits.h>
#include "imath.h"
#if UINT_MAX != 0xFFFFFFFF
#error "This code needs a 32-bit unsigned integer"
#endif
unsigned int
isqrt(unsigned int x)
{
/* This table holds the rounded square roots of the integers 0..255, each
* expressed as a fixed-point binary number with four bits to the left and
* four bits to the right of the binary point. For example, sqrt(2) is
* 1.414... decimal or 1.01101... binary, which is rounded to 1.0111 and
* represented in the table as 0x17 == 23.
*/
static unsigned char est[0x100] = {
0, 16, 23, 28, 32, 36, 39, 42, 45, 48, 51, 53, 55, 58, 60, 62, 64, 66,
68, 70, 72, 73, 75, 77, 78, 80, 82, 83, 85, 86, 88, 89, 91, 92, 93, 95,
96, 97, 99, 100, 101, 102, 104, 105, 106, 107, 109, 110, 111, 112, 113,
114, 115, 116, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128,
129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 139, 140, 141,
142, 143, 144, 145, 146, 147, 148, 148, 149, 150, 151, 152, 153, 153,
154, 155, 156, 157, 158, 158, 159, 160, 161, 162, 162, 163, 164, 165,
166, 166, 167, 168, 169, 169, 170, 171, 172, 172, 173, 174, 175, 175,
176, 177, 177, 178, 179, 180, 180, 181, 182, 182, 183, 184, 185, 185,
186, 187, 187, 188, 189, 189, 190, 191, 191, 192, 193, 193, 194, 195,
195, 196, 197, 197, 198, 199, 199, 200, 200, 201, 202, 202, 203, 204,
204, 205, 206, 206, 207, 207, 208, 209, 209, 210, 210, 211, 212, 212,
213, 213, 214, 215, 215, 216, 216, 217, 218, 218, 219, 219, 220, 221,
221, 222, 222, 223, 223, 224, 225, 225, 226, 226, 227, 227, 228, 229,
229, 230, 230, 231, 231, 232, 232, 233, 234, 234, 235, 235, 236, 236,
237, 237, 238, 238, 239, 239, 240, 241, 241, 242, 242, 243, 243, 244,
244, 245, 245, 246, 246, 247, 247, 248, 248, 249, 249, 250, 250, 251,
251, 252, 252, 253, 253, 254, 254, 255, 255 };
unsigned int r;
if (x >= 1U << 14) {
/* Some values in this range require at least one Newton-Raphson step
* to refine the initial estimate.
*/
if (x >= 1U << 30) {
r = est[x >> 24] << 8;
/* Some values in this range require two Newton-Raphson iterations
* instead of just one, so perform the first one here.
*/
r = (r + x / r) / 2;
}
else if (x >= 1U << 28)
r = est[x >> 22] << 7;
else if (x >= 1U << 26)
r = est[x >> 20] << 6;
else if (x >= 1U << 24)
r = est[x >> 18] << 5;
else if (x >= 1U << 22)
r = est[x >> 16] << 4;
else if (x >= 1U << 20)
r = est[x >> 14] << 3;
else if (x >= 1U << 18)
r = est[x >> 12] << 2;
else if (x >= 1U << 16)
r = est[x >> 10] << 1;
else
r = est[x >> 8] << 0;
/* Refine the estimate with one (or one more) Newton-Raphson step.
*/
r = (r + x / r) / 2;
}
else {
/* In this range, the initial estimate (after scaling and rounding)
* is very close and needs at most a small final tweak.
*/
if (x >= 1U << 12)
r = (est[x >> 6] + 1) >> 1;
else if (x >= 1U << 10)
r = (est[x >> 4] + 2) >> 2;
else if (x >= 1U << 8)
r = (est[x >> 2] + 4) >> 3;
else {
/* In this range, the initial estimate is exact and requires no
* tweaking at all.
*/
return est[x >> 0] >> 4;
}
}
/* Final tweak: The estimate (original or once- or twice-refined) is either
* correct as it stands or is one too large. (Don't worry about overflow
* in the multiplication; `r' is strictly less than 65536 here.)
*/
if (r * r > x)
--r;
return r;
}