Long Num speed


fermineutron

For a while now I have been "playing" with a little C program to
compute the factorial of large numbers. Currently it takes about 1
second per 1000 multiplications; that is, 25000P1000 will take about a
second. It will take longer for 50000P1000, as expected, since more
digits will be in the answer. Now, on the Num Analyses forum/group
there is a post reporting that its author wrote Java code that computed
1000000! in about a second. That is about 10000 times faster than I
would expect my code to do it. So the two possibilities are:
1) I am doing something terribly wrong
2) The other person is lying

At the moment I am inclined to believe that it's number 1.

I am posting my code below; I would like to hear your opinions about
why it is slow and how I can improve its speed.

I know that there are public BIGNUM libraries which are already
optimized for such calculations, but I don't want to use them, because
I want to approach this problem at a lower level. I am mostly
interested in finding out how to get this code to perform faster, or
what alternative algorithms I should consider. The factorial
calculation is just a test program.

===================start paste==========================

#include<stdio.h>
#include<stdlib.h>
#include<math.h>

#define al 1024*20
#define base 1000
typedef long int IntegerArrayType;

struct AEI{
IntegerArrayType data[al];
long int digits;
};

void pack(IntegerArrayType i, struct AEI *N1);
void Amult(struct AEI * A, struct AEI * B, struct AEI * C);
void AEIprintf(struct AEI * N1);
void AEIfprintf(FILE * fp, struct AEI * N1);


int main(void)
{

struct AEI *N1, *MO, *Ans;
long i = 0, j = 0, ii, NUM, iii;
FILE *ff;

N1=malloc(sizeof(struct AEI));
MO=malloc(sizeof(struct AEI));
Ans=malloc(sizeof(struct AEI));
while (i < al){
N1->data[i] = 0;
MO->data[i] = 0;
Ans->data[i]=0;
i++;
}

printf("Enter integer to Factorialize: ");
scanf("%ld", &NUM);

pack(1, N1);
pack(1, Ans);
ff = fopen("Results.txt", "w");
printf("you entered: %ld", NUM);

i=1;
while(i < NUM ){

iii=0;
while(iii<NUM && iii<1000){
ii = 1;
while (ii < al)
{
MO->data[ii] = 0;
ii++;
}
pack((i+iii), MO);
Amult(N1, MO, N1);
iii++;
}
i+=iii;
Amult(Ans, N1, Ans);
printf("\nProgress is: %d",i);
pack(1, N1);
}
if(ff!=NULL){
fprintf(ff,"\n%d\n",i-1);
AEIfprintf(ff, Ans);
}
fclose(ff);

printf("\nProgress: 100\%");

return 0;
}


void AEIprintf(struct AEI *N1){

float fieldLength;
double temp;
char format1[8];
long j, FL0;
j = N1->digits-1;
FL0=(long)log10((float)base);
fieldLength = (float)log10((float)base);
temp = modf(fieldLength, &fieldLength);


format1[0] = '%';
format1[1] = '0';
format1[2] = fieldLength + 48;
format1[3] = 'd';
format1[4] = 0x00;

printf("%*d", FL0, N1->data[j]);
j--;

while (j >= 0)
{
printf(format1, N1->data[j]);

j--;
}

return;
}


void AEIfprintf(FILE * fp, struct AEI *N1){
long j = N1->digits-1;

double fieldLength, temp;
char format0[8], format1[8];

fieldLength = (int)log10(base);
temp = modf(fieldLength, &fieldLength);

format0[0] = '%';
format0[1] = fieldLength + 48;
format0[2] = 'd';
format0[3] = 0x00;
format1[0] = '%';
format1[1] = '0';
format1[2] = fieldLength + 48;
format1[3] = 'd';
format1[4] = 0x00;

fprintf(fp,format0, N1->data[j]);
j--;

while (j >= 0){
fprintf(fp, format1, N1->data[j]);
j--;
}


return;
}



void pack(IntegerArrayType i, struct AEI * N1)
{
long t = 1, i1, j = 0;

while (t == 1){
i1 = i % base;
N1->data[j] = i1;
i = (i - i1) / base;
j++;
if (i == 0)
t = 0;
}
N1->digits=j;
return;
}






void Amult(struct AEI * A, struct AEI * B, struct AEI * C){
/*C = A * B; */
long i, ii,d, result, carry=0, digits=0;
struct AEI *Ans;
Ans=malloc(sizeof(struct AEI));
i=0;
d= (A->digits+B->digits-1);
while(i<d){
Ans->data[i]=carry;
carry=0;
ii=0;
while(ii<=i){
if(B->data[ii]!=0){
Ans->data[i]+=A->data[i-ii]*B->data[ii];
carry+=Ans->data[i]/base;
Ans->data[i]=Ans->data[i]%base;
}
ii++;
}
carry+=Ans->data[i]/base;
Ans->data[i]=Ans->data[i]%base;

i++;
}
if(carry!=0){
d++;
Ans->data[i]=carry;
}

C->digits=d;
i=0;
while(i<d){
C->data[i]=Ans->data[i];
i++;
}
return;
}

====================end paste============================

I tried to indent the code with spaces instead of tabs, but if some
parts end up not properly indented, I hope no one will hold it against
me.

Thanks ahead
 

Ian Collins

fermineutron said:
For a while now I have been "playing" with a little C program to
compute the factorial of large numbers. Currently it takes about 1
second per 1000 multiplications; that is, 25000P1000 will take about a
second. It will take longer for 50000P1000, as expected, since more
digits will be in the answer. Now, on the Num Analyses forum/group
there is a post reporting that its author wrote Java code that computed
1000000! in about a second. That is about 10000 times faster than I
would expect my code to do it. So the two possibilities are:
1) I am doing something terribly wrong
2) The other person is lying

At the moment I am inclined to believe that it's number 1.

I am posting my code below; I would like to hear your opinions about
why it is slow and how I can improve its speed.
Before anyone reviews the code, have you profiled it?

If not, why not? If you have, where were the bottlenecks?
 

Frederick Gotham

fermineutron:
Now, on the Num Analyses forum/group there is a post reporting that
its author wrote Java code that computed 1000000! in about a second.
That is about 10000 times faster than I would expect my code to do it.
So the two possibilities are:
1) I am doing something terribly wrong
2) The other person is lying


The burden of proof is on the Java dude.

Probability suggests that he's lying, because only a very small proportion of
proficient programmers waste their time on mickey-mouse hold-my-hand
languages such as Java. Also, Java is slow.

Either that or the algorithm's something stupid like:

char const *Func(void)
{
return "234950723957325847125797509327509237509327592387509";
}

Ask the Java dude if you can see his code. If he refuses, assume that he's a
liar, then egg his house.
 

fermineutron

Ian said:
Before anyone reviews the code, have you profiled it?

If not, why not? If you have, where were the bottlenecks?

I profiled it, but there were no obvious bottlenecks that I would not
anticipate being there by design.

here is the profiler output

http://igorpetrusky.awardspace.com/Temp/RunStats.html

I was thinking that maybe there is some other algorithm that is better
than mine for the long int arithmetic?

Factorial calculation is just a driver for the multiplication function.
 

Barry Schwarz

For a while now I have been "playing" with a little C program to
compute the factorial of large numbers. Currently it takes about 1
second per 1000 multiplications; that is, 25000P1000 will take about a
second. It will take longer for 50000P1000, as expected, since more
digits will be in the answer. Now, on the Num Analyses forum/group
there is a post reporting that its author wrote Java code that computed
1000000! in about a second. That is about 10000 times faster than I
would expect my code to do it. So the two possibilities are:
1) I am doing something terribly wrong
2) The other person is lying

At the moment I am inclined to believe that it's number 1.

I am posting my code below; I would like to hear your opinions about
why it is slow and how I can improve its speed.

I know that there are public BIGNUM libraries which are already
optimized for such calculations, but I don't want to use them, because
I want to approach this problem at a lower level. I am mostly
interested in finding out how to get this code to perform faster, or
what alternative algorithms I should consider. The factorial
calculation is just a test program.

I don't know if my comment in Amult addresses the question you voiced,
but you need to pay attention to your compiler diagnostics.
===================start paste==========================

#include<stdio.h>
#include<stdlib.h>
#include<math.h>

#define al 1024*20
#define base 1000
typedef long int IntegerArrayType;

struct AEI{
IntegerArrayType data[al];
long int digits;
};

void pack(IntegerArrayType i, struct AEI *N1);
void Amult(struct AEI * A, struct AEI * B, struct AEI * C);
void AEIprintf(struct AEI * N1);
void AEIfprintf(FILE * fp, struct AEI * N1);


int main(void)
{

struct AEI *N1, *MO, *Ans;
long i = 0, j = 0, ii, NUM, iii;
FILE *ff;

N1=malloc(sizeof(struct AEI));
MO=malloc(sizeof(struct AEI));
Ans=malloc(sizeof(struct AEI));
while (i < al){
N1->data[i] = 0;
MO->data[i] = 0;
Ans->data[i]=0;
i++;
}

printf("Enter integer to Factorialize: ");


Please learn to indent consistently. This is like trying to roller
skate on cobblestones.

snip rest of function until
printf("\nProgress: 100\%");

The correct way to include a '%' in a printf string is %%, not \%.
Didn't your compiler complain about an undefined escape sequence?
return 0;
}


void AEIprintf(struct AEI *N1){

float fieldLength;
double temp;
char format1[8];
long j, FL0;
j = N1->digits-1;
FL0=(long)log10((float)base);
fieldLength = (float)log10((float)base);
temp = modf(fieldLength, &fieldLength);

This is a constraint violation. The second argument should be a
double*, not a float*. During execution it will invoke undefined
behavior. Didn't your compiler complain about a type mismatch?
format1[0] = '%';
format1[1] = '0';
format1[2] = fieldLength + 48;

What is 48? You probably mean '0' and should use that.
format1[3] = 'd';
format1[4] = 0x00;

While the hex value is correct, '\0' is more in keeping with common C
idiom.
printf("%*d", FL0, N1->data[j]);
j--;

while (j >= 0)
{
printf(format1, N1->data[j]);

j--;
}

return;
}


snip

void Amult(struct AEI * A, struct AEI * B, struct AEI * C){
/*C = A * B; */
long i, ii,d, result, carry=0, digits=0;
struct AEI *Ans;
Ans=malloc(sizeof(struct AEI));
i=0;
d= (A->digits+B->digits-1);
while(i<d){
Ans->data[i]=carry;
carry=0;
ii=0;
while(ii<=i){
if(B->data[ii]!=0){
Ans->data[i]+=A->data[i-ii]*B->data[ii];
carry+=Ans->data[i]/base;
Ans->data[i]=Ans->data[i]%base;
}
ii++;
}
carry+=Ans->data[i]/base;
Ans->data[i]=Ans->data[i]%base;

i++;
}
if(carry!=0){
d++;
Ans->data[i]=carry;
}

C->digits=d;
i=0;
while(i<d){
C->data[i]=Ans->data[i];


You could have simply said *C = *Ans and let the compiler do this
possibly more efficiently.

Is there a reason you did not do the work in C directly (and avoid the
time spent in malloc and copying *Ans to *C)?

Why do you go to so much trouble to avoid for loops?
for (i = 0; i < d; i++)
C->...;
}
return;

Where do you free Ans? You are causing repeated memory leaks.
}

====================end paste============================

I tried to indent the code with spaces instead of tabs, but if some
parts end up not properly indented, I hope no one will hold it against
me.

Thanks ahead


Remove del for email
 

Richard Heathfield

Barry Schwarz said:
On 1 Nov 2006 14:28:20 -0800, "fermineutron" <[email protected]>
wrote:
float fieldLength;
double temp;
char format1[8];
long j, FL0;
j = N1->digits-1;
FL0=(long)log10((float)base);
fieldLength = (float)log10((float)base);
temp = modf(fieldLength, &fieldLength);

This is a constraint violation. The second argument should be a
double*, not a float*. During execution it will invoke undefined
behavior. Didn't your compiler complain about a type mismatch?

In my experience, fermineutron pays no heed to correctness issues, and is
concerned only with speed. He has a track record of ignoring corrections. I
think you're wasting your time, Barry.
 

Samuel Stearley

Hi
The math software in my TI-89 calculator with a 12 MHz 68k CPU can
calculate all 613 digits of 299! in 1 second.
It then takes 5 seconds to convert that to a string for display. But
for giggles I once wrote my own display routine that does the integer
to string conversion in 1/3 of a second.
 

websnarf

It's a factorial calculation. The bottleneck is in the big integer
multiply, which itself should have a bottleneck in the platform's
multiply.
I profiled it, but there were no obvious bottlenecks that I would not
anticipate being there by design.

here is the profiler output

http://igorpetrusky.awardspace.com/Temp/RunStats.html

I was thinking that maybe there is some other algorithm that is better
than mine for the long int arithmetic?

Probably so. Consider that you are doing nothing more than computing
the straight product of the numbers using no arithmetic short cuts at
all.

So here's what comes off the top of my head: Ask yourself the
following problem. How many factors of 2 are there in 1000000! ?
Certainly every other number is even. But every 4th number has 2
factors of 2, and every 8th number has 3 factors of two in it. So the
answer is:

f(2) = floor(1000000/2) + floor(1000000/4) + floor(1000000/8) + ...

Similarly we can figure out the number of factors of 3s, 5s, 7s, 11s,
and all the primes less than 1000000, as f(3), f(5), etc. Then the
result you are looking for is:

1000000! = pow(2,f(2))*pow(3,f(3))*pow(5,f(5))*...

Now, the question is -- what makes us think this will be any faster?
Well, the pow() function can be computed with successive squaring
tricks. Squaring is faster than straight multiplying because
(a*q^r + b)^2 = a^2*q^(2*r) + b^2 + 2*a*b*(q^r). And the resulting big
number multiplies that you have to perform here can be accelerated
using any number of big number multiply acceleration tricks (Karatsuba,
Toom-Cook, or DFTs).

I don't know how much faster, if any, doing things this way would be.
If you find a faster way in the literature, I would be interested in
knowing it.
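
For what it's worth, f(p) itself is cheap to compute. A minimal,
untested sketch in C (plain machine integers; the function name is made
up for illustration):

/* Exponent of the prime p in n!, i.e. f(p) = floor(n/p) + floor(n/p^2) + ...
   (Legendre's formula, as described above). */
unsigned long prime_exponent(unsigned long n, unsigned long p)
{
    unsigned long f = 0;
    while (n >= p) {
        n /= p;      /* n is now floor(original n / p^k) */
        f += n;      /* add the count of multiples of p^k up to the original n */
    }
    return f;
}

For example, prime_exponent(1000000, 2) is 999993, so 1000000! contains
2^999993 as a factor.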
 

Chris Dollin

Frederick said:
Probability suggests that he's lying, because only a very small proportion of
proficient programmers waste their time on mickey-mouse hold-my-hand
languages such as Java. Also, Java is slow.

Can we drop the insults, please?
 

Keith Thompson

So here's what comes off the top of my head: Ask yourself the
following problem. How many factors of 2 are there in 1000000! ?
Certainly every other number is even. But every 4th number has 2
factors of 2, and every 8th number has 3 factors of two in it. So the
answer is:

f(2) = floor(1000000/2) + floor(1000000/4) + floor(1000000/8) + ...

Similarly we can figure out the number of factors of 3s, 5s, 7s, 11s,
and all the primes less than 1000000, as f(3), f(5), etc. Then the
result you are looking for is:

1000000! = pow(2,f(2))*pow(3,f(3))*pow(5,f(5))*...

Now, the question is -- what makes us think this will be any faster?
Well, the pow() function can be computed with successive squaring
tricks. Squaring is faster than straight multiplying because
(a*q^r + b)^2 = a^2*q^(2*r) + b^2 + 2*a*b*(q^r). And the resulting big
number multiplies that you have to perform here can be accelerated
using any number of big number multiply acceleration tricks (Karatsuba,
Toom-Cook, or DFTs).

Hmm, interesting approach. But does it really speed things up? If
you want to compute, say, 3**1024 with extended-precision integers,
successive squaring reduces the number of multiplications, but is it
faster than doing the 1023 multiplications? I'm thinking that
multiplying, say, a 499-digit number by a 1-digit number might be
quicker than multiplying two 250-digit numbers.

I haven't really thought this through (feel free to say that that's
obvious).
 

Richard Heathfield

Keith Thompson said:

If
you want to compute, say, 3**1024 with extended-precision integers,
successive squaring reduces the number of multiplications, but is it
faster than doing the 1023 multiplications? I'm thinking that
multiplying, say, a 499-digit number by a 1-digit number might be
quicker than multiplying two 250-digit numbers.

Sure, but obtaining the 499-digit number in the first place is likely to be
slower.

Measure, measure, measure!
 

websnarf

Keith said:
Hmm, interesting approach. But does it really speed things up? If
you want to compute, say, 3**1024 with extended-precision integers,
successive squaring reduces the number of multiplications, but is it
faster than doing the 1023 multiplications? I'm thinking that
multiplying, say, a 499-digit number by a 1-digit number might be
quicker than multiplying two 250-digit numbers.

I haven't really thought this through (feel free to say that that's
obvious).

I'll resist the temptation. Especially since I only went halfway. :)

Anyhow, the issue is not whether or not the squaring trick is saving
you overall performance (it does -- this is not controversial). The
real controversy with my approach is that it breaks the number down
into its constituent prime factors first, then recombines them. I.e.,
it's faster to multiply 72 by 91 directly than to first break it down
into 8 * 9 * 7 * 13. The hope, however, is that the number of primes is
sufficiently less than n (in this case there are about 78K primes under
1 million) to compensate for the cost of the power computations. Notice
that the vast majority of the primes will have a power less than, say,
6, and that in fact the most common power is 1.
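
The squaring trick itself, in outline -- an untested sketch on plain
unsigned integers; for something like 1000000! the multiplies below
would of course be bignum multiplies (the OP's Amult or similar), but
the control flow is the same:

/* pow(b, e) by successive squaring: O(log e) multiplies instead of e - 1. */
unsigned long long ipow(unsigned long long b, unsigned long e)
{
    unsigned long long r = 1;
    while (e > 0) {
        if (e & 1)    /* odd exponent: fold one factor of b into the result */
            r *= b;
        b *= b;       /* square the base */
        e >>= 1;
    }
    return r;
}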
 

websnarf

Frederick said:
fermineutron:

The burden of proof is on the Java dude.

Probability suggests that he's lying, because only a very small proportion of
proficient programmers waste their time on mickey-mouse hold-my-hand
languages such as Java. Also, Java is slow.

Either that or the algorithm's something stupid like:

char const *Func(void)
{
return "234950723957325847125797509327509237509327592387509";
}

1 million factorial is a very very very very very very big number. I
assure you that they did not solve the problem in the way you suggest
unless they are rigging against a very specific benchmark. Can you
imagine a library that's 100s of megs larger than necessary just to hold
a table of factorials?
Ask the Java dude if you can see his code. If he refuses, assume that he's a
liar, then egg his house.

Right, because nobody ever writes proprietary code.

In the case of bignum libraries, while I don't know this first-hand,
the specification and architecture of the Java language itself suggest
very strongly to me that its bignum performance should be equal to
the best bignum libraries in existence, and faster than any portable
pure C bignum library. Again, I don't know this from direct first-hand
knowledge, but there is a very strong possibility that you are completely
wrong on this point. Java may be very slow at many things, but if it's
fast at anything, bignum is probably one of the prime candidates.
 

CBFalconer

It's a factorial calculation. The bottleneck is in the big integer
multiply, which itself should have a bottleneck in the platform's
multiply.


Probably so. Consider that you are doing nothing more than
computing the straight product of the numbers using no arithmetic
short cuts at all.

So here's what comes off the top of my head: Ask yourself the
following problem. How many factors of 2 are there in 1000000! ?
Certainly every other number is even. But every 4th number has 2
factors of 2, and every 8th number has 3 factors of two in it.
So the answer is:

f(2) = floor(1000000/2) + floor(1000000/4) + floor(1000000/8) + ...

Similarly we can figure out the number of factors of 3s, 5s, 7s,
11s, and all the primes less than 1000000, as f(3), f(5), etc.
Then the result you are looking for is:

1000000! = pow(2,f(2))*pow(3,f(3))*pow(5,f(5))*...

Now, the question is -- what makes us think this will be any
faster? Well, the pow() function can be computed with successive
squaring tricks. Squaring is faster than straight multiplying
because (a*q^r + b)^2 = a^2*q^(2*r) + b^2 + 2*a*b*(q^r). And
the resulting big number multiplies that you have to perform
here can be accelerated using any number of big number multiply
acceleration tricks (Karatsuba, Toom-Cook, or DFTs).

I don't know how much faster, if any, doing things this way would
be. If you find a faster way in the literature, I would be
interested in knowing it.

This doesn't answer the 'faster' question, but your method is
basically the same as my algorithm for computing large factorials
in limited size registers, which I published here over 3 years ago.

/* compute factorials, extended range
on a 32 bit machine this can reach fact(15) without
unusual output formats. With the prime table shown
overflow occurs at 101.

Public domain, by C.B. Falconer. 2003-06-22
*/

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

/* 2 and 5 are handled separately
Placing 2 at the end attempts to preserve such factors
for use with the 5 factor and exponential notation
*/
static unsigned char primes[] = {3,7,11,13,17,19,23,29,31,37,
41,43,47,53,57,59,61,67,71,
/* add further primes here -->*/
2,5,0};
static unsigned int primect[sizeof primes]; /* = {0} */

static double fltfact = 1.0;

/* ----------------- */

static
unsigned long int fact(unsigned int n, unsigned int *zeroes)
{
unsigned long val;
unsigned int i, j, k;

#define OFLOW ((ULONG_MAX / j) < val)

/* This is a crude mechanism for passing back values */
for (i = 0; i < sizeof primes; i++) primect[i] = 0;

for (i = 1, val = 1UL, *zeroes = 0; i <= n; i++) {
fltfact *= i; /* approximation */
j = i;
/* extract exponent of 10 */
while ((0 == (j % 5)) && (!(val & 1))) {
j /= 5; val /= 2;
(*zeroes)++;
}
/* Now try to avoid any overflows */
k = 0;
while (primes[k] && OFLOW) {
/* remove factors primes[k] */
while (0 == (val % primes[k]) && OFLOW) {
val /= primes[k];
++primect[k];
}
while (0 == (j % primes[k]) && OFLOW) {
j /= primes[k];
++primect[k];
}
k++;
}

/* Did we succeed in the avoidance */
if (OFLOW) {
#if DEBUG
fprintf(stderr, "Overflow at %u, %lue%u * %u\n",
i, val, *zeroes, j);
#endif
val = 0;
break;
}
val *= j;
}
return val;
} /* fact */

/* ----------------- */

int main(int argc, char *argv[])
{
unsigned int x, zeroes;
unsigned long f;

if ((2 == argc) && (1 == sscanf(argv[1], "%u", &x))) {
if (!(f = fact(x, &zeroes))) {
fputs("Overflow\n", stderr);
return EXIT_FAILURE;
}

printf("Factorial(%u) == %lu", x, f);
if (zeroes) printf("e%u", zeroes);
for (x = 0; primes[x]; x++) {
if (primect[x]) {
printf(" * pow(%d,%d)", primes[x], primect[x]);
}
}
putchar('\n');
printf("or approximately %.0f.\n", fltfact);
return 0;
}
fputs("Usage: fact n\n", stderr);
return EXIT_FAILURE;
} /* main */
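
For example, assuming the above is saved as fact.c and built as fact,
running it for n = 10 should print:

Factorial(10) == 36288e2
or approximately 3628800.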
 

Stefan Krah

* fermineutron said:
For a while now I have been "playing" with a little C program to
compute the factorial of large numbers. Currently it takes about 1
second per 1000 multiplications; that is, 25000P1000 will take about a
second. It will take longer for 50000P1000, as expected, since more
digits will be in the answer. Now, on the Num Analyses forum/group
there is a post reporting that its author wrote Java code that computed
1000000! in about a second. That is about 10000 times faster than I
would expect my code to do it. So the two possibilities are:
1) I am doing something terribly wrong
2) The other person is lying

At the moment I am inclined to believe that it's number 1.

Yes. I'm pretty sure that "the person" used the Prime-Schönhage algorithm and
the Java port of the apfloat library.

Using the C++ version of apfloat I get these times for 1000000! :

builtin factorial(): 45s

Prime-Schönhage: 15s


All on a Celeron 1.2 GHz. Apfloat's number theoretic transforms probably perform
much faster on a 64 bit platform.


Stefan Krah
 

websnarf

CBFalconer said:
This doesn't answer the 'faster' question, but your method is
basically the same as my algorithm for computing large factorials
in limited size registers, which I published here over 3 years ago.

Your solution computes a mathematically equivalent re-expression, but
the similarities end there. The differences start where the
*algorithms* begin. Your algorithm counts each factor individually
(++primect[k]), paying a divide and modulo penalty for each occurring
factor for each number one by one. As you can see in my formulations
above, calculating each f(p) will perform log(p,n) divides and no
modulos. So yeah, your solution definitely doesn't answer the 'faster'
question.
/* compute factorials, extended range
on a 32 bit machine this can reach fact(15) without
unusual output formats. With the prime table shown
overflow occurs at 101.

The OP asked for 1000000!. You're off by 4 orders of magnitude. For
factorials that small, why wouldn't you just create a table?
Public domain, by C.B. Falconer. 2003-06-22
*/

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

/* 2 and 5 are handled separately
Placing 2 at the end attempts to preserve such factors
for use with the 5 factor and exponential notation
*/
static unsigned char primes[] = {3,7,11,13,17,19,23,29,31,37,
41,43,47,53,57,59,61,67,71,
/* add further primes here -->*/
2,5,0};

Ok, seriously dude. Do I need to say "640K ought to be enough for
anyone" yet again to you? What's your problem with prime number
sequences anyways? This is the second time I've seen you express them
in a minuscule finite table, for problems that require a sequence of
them that is unbounded above.
 

Ancient_Hacker

fermineutron said:
For a while now i have been "playing" with a little C program to
compute the factorial of large numbers.

You asked this question before, and I gave you the answer.

You're doing one digit at a time. Doing it with 32 bits at a time will
be at least 400,000,000 times faster. Why? Because with 32 bits you
can store numbers up to about 4 billion, AND you don't need to do a
divide and a mod each time.
 

fermineutron

Ancient_Hacker said:
You asked this question before, and I gave you the answer.

You're doing one digit at a time. Doing it with 32 bits at a time will
be at least 400,000,000 times faster. Why? Because with 32 bits you
can store numbers up to about 4 billion, AND you don't need to do a
divide and a mod each time.

I am doing 3 digits at a time; see the base constant at the top of the
file. Theoretically I could use 4 digits, or, if I made the code C99-only
and hence used a 64-bit long long int, I could increase the base even
higher. The constraint on the base is that base^2 must fit in the
largest int type available.
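
For what it's worth, a minimal, untested sketch of that idea for the
bignum-times-machine-integer case: 9 decimal digits per array element
and a C99 unsigned long long intermediate (the identifiers are made up
for illustration, not taken from the posted program):

#define BIGBASE 1000000000UL  /* 10^9 per element; element*m + carry fits in 64 bits */

/* Multiply the number stored little-endian in a[0..n-1] by m, in place.
   Returns the new element count. */
long mul_small(unsigned long *a, long n, unsigned long m)
{
    unsigned long long carry = 0;
    long i;
    for (i = 0; i < n; i++) {
        unsigned long long t = (unsigned long long)a[i] * m + carry;
        a[i]  = (unsigned long)(t % BIGBASE);
        carry = t / BIGBASE;
    }
    while (carry != 0) {              /* append any new high-order elements */
        a[n++] = (unsigned long)(carry % BIGBASE);
        carry /= BIGBASE;
    }
    return n;
}

That cuts the number of array elements (and hence the divides and mods)
per multiplication to a third of what base 1000 needs.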
 
