# Long number base conversion

Discussion in 'C Programming' started by Eyegor, Nov 25, 2010.

1. ### EyegorGuest

Hi,
I am working on a project, just for fun of it, so there is no 3rd
party restictions.
What I am trying to do is to write a simple bignum library, simmilar
to GMP. I know GMP exists and it works and I am not going to write
anything better than what it is, this is just for learning purposes.
Anyhow, I creaed several basic functions to add and subtract large
numbers which are basically Array Encoded Integers. I use 16-bit
signed integers as elements in the array, and the whole array
represents a very large integer, say 16000 bits or so.

The problem I ran into is writing a replica of scanf function.
Basically I want to comvert a decimal input like
67627457272523772428537372395768 into the binary array. Any suggestion
on how to do this would be greatly appreciated. I dont actually need
the code, just an algorithm suggestion. I wrote a printf function like
this about 5 years ago, but I dont have that code anymore and want to
replicate it.

thanks

Eyegor, Nov 25, 2010

2. ### BartCGuest

"Eyegor" <> wrote in message
news:...

> What I am trying to do is to write a simple bignum library, simmilar

> The problem I ran into is writing a replica of scanf function.
> Basically I want to comvert a decimal input like
> 67627457272523772428537372395768 into the binary array. Any suggestion
> on how to do this would be greatly appreciated. I dont actually need
> the code, just an algorithm suggestion. I wrote a printf function like
> this about 5 years ago, but I dont have that code anymore and want to
> replicate it.

s = "67627457272523772428537372395768"

n = 0
for each character c in s:
n = n*10 + (c-'0')

You need to use your bigint arithmetic for operations here.

--
Bartc

BartC, Nov 25, 2010

3. ### EyegorGuest

On Nov 24, 8:36 pm, "BartC" <> wrote:
> "Eyegor" <> wrote in message
>
> news:...
>
> > What I am trying to do is to write a simple bignum library, simmilar
> > The problem I ran into is writing a replica of scanf function.
> > Basically I want to comvert a decimal input like
> > 67627457272523772428537372395768 into the binary array. Any suggestion
> > on how to do this would be greatly appreciated. I dont actually need
> > the code, just an algorithm suggestion. I wrote a printf function like
> > this about 5 years ago, but I dont have that code anymore and want to
> > replicate it.

>
> s = "67627457272523772428537372395768"
>
> n = 0
> for each character c in s:
>    n = n*10 + (c-'0')
>
> You need to use your bigint arithmetic for operations here.
>
> --
> Bartc

Enginious, Thank you.

Eyegor, Nov 25, 2010
4. ### Keith ThompsonGuest

Eyegor <> writes:
> I am working on a project, just for fun of it, so there is no 3rd
> party restictions.
> What I am trying to do is to write a simple bignum library, simmilar
> to GMP. I know GMP exists and it works and I am not going to write
> anything better than what it is, this is just for learning purposes.
> Anyhow, I creaed several basic functions to add and subtract large
> numbers which are basically Array Encoded Integers. I use 16-bit
> signed integers as elements in the array, and the whole array
> represents a very large integer, say 16000 bits or so.

Why are you using *signed* integers? I'd think that unsigned
integers would be more convenient. If you have a bignum represented
as an array of 1000 16-bit integers, you only need 1 sign bit,
not a thousand of them.

> The problem I ran into is writing a replica of scanf function.
> Basically I want to comvert a decimal input like
> 67627457272523772428537372395768 into the binary array. Any suggestion
> on how to do this would be greatly appreciated. I dont actually need
> the code, just an algorithm suggestion. I wrote a printf function like
> this about 5 years ago, but I dont have that code anymore and want to
> replicate it.

Presumably you've already got functions that implement the arithmetic
operators (+, -, *, /, %) for your Array Encoded Integers. Just use
them to implement the string-to-bignum conversion in the same way
you'd implement a string-to-integer conversion. It's the same
algorithm (which I presume you already know or can find or invent),
just using different operations.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Nov 25, 2010
5. ### bertGuest

On Nov 25, 1:21 am, Eyegor <> wrote:
> Hi,
> I am working on a project, just for fun of it, so there is no 3rd
> party restictions.
> What I am trying to do is to write a simple bignum library, simmilar
> to GMP. I know GMP exists and it works and I am not going to write
> anything better than what it is, this is just for learning purposes.
> Anyhow, I creaed several basic functions to add and subtract large
> numbers which are basically Array Encoded Integers. I use 16-bit
> signed integers as elements in the array, and the whole array
> represents a very large integer, say 16000 bits or so.
>
> The problem I ran into is writing a replica of scanf function.
> Basically I want to comvert a decimal input like
> 67627457272523772428537372395768 into the binary array. Any suggestion
> on how to do this would be greatly appreciated. I dont actually need
> the code, just an algorithm suggestion. I wrote a printf function like
> this about 5 years ago, but I dont have that code anymore and want to
> replicate it.

Yes, BartC is perfectly correct.

You will find that a couple of useful additions to
your range of functions are bignum * int -> bignum,
and bignum / int -> (quot bignum, rem int). The
latter is particularly handy for the inverse of
scanf, churning out the decimal digits of a bigint.
--

bert, Nov 25, 2010
6. ### Tom St DenisGuest

On Nov 24, 8:21 pm, Eyegor <> wrote:
> Hi,
> I am working on a project, just for fun of it, so there is no 3rd
> party restictions.
> What I am trying to do is to write a simple bignum library, simmilar
> to GMP. I know GMP exists and it works and I am not going to write
> anything better than what it is, this is just for learning purposes.
> Anyhow, I creaed several basic functions to add and subtract large
> numbers which are basically Array Encoded Integers. I use 16-bit
> signed integers as elements in the array, and the whole array
> represents a very large integer, say 16000 bits or so.

Why not check out LibTomMath [disclosure I was the original developer/
maintainer]? It's public domain, has a whole suite of typical bignum
functions (and some required for crypto), even comes with a book on
bignum math in PDF format in the archive.

I'm not trying to discourage you from writing your own code, by all
means do that. You'll also learn more if you try then to explain the
algorithms (worked for me).

http://libtom.org/?page=features&newsitems=5&whatfile=ltm

Tom

Tom St Denis, Nov 25, 2010
7. ### Mark WoodingGuest

"BartC" <> writes:

> "Eyegor" <> wrote in message
> news:...
>
> > What I am trying to do is to write a simple bignum library, simmilar

>
> > The problem I ran into is writing a replica of scanf function.
> > Basically I want to comvert a decimal input like
> > 67627457272523772428537372395768 into the binary array. Any suggestion
> > on how to do this would be greatly appreciated. I dont actually need
> > the code, just an algorithm suggestion. I wrote a printf function like
> > this about 5 years ago, but I dont have that code anymore and want to
> > replicate it.

>
> s = "67627457272523772428537372395768"
>
> n = 0
> for each character c in s:
> n = n*10 + (c-'0')

This is a really sucky way of doing it. Good large-integer
multiplication algorithms work best when their inputs are approximately
the same length. The above algorithm is more or less pathological.

Here's a sketch of a fairly good algorithm. The algorithm will read a
sequence V of base-R digits of unknown length L, most significant first,
and output the integer represented by V

V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]

The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
such that 0 <= X < R^N. The stack maintains an invariant: if (X0, N0)
is nearer the top than (X1, N1) then N0 < N1. The algorithm to push an
item (X, N) onto the stack is therefore like this:

if the stack is empty, push (X, N) and return
let (X', N') be the topmost item on the stack
if N' > N then push (X, N) and return
pop (X', N')
set X <- X' R^N + X and N <- N + N' and go back to the top

Now, to convert the sequence V:

if V is exhausted, finish up (below)
read a digit D from V
push the pair (D, 1) onto the stack (above)

Finally, finish up as follows:

if the stack is empty, return 0 (or report an error: the
sequence was empty, which you might consider invalid)
let (Y, N) be the topmost item on the stack
while the stack is not empty:
pop (X, N')
set Y <- X R^N + Y and N <- N + R^N
output Y

It's not hard to compute a (fairly low) upper bound on the size of stack
required, so this can be allocated statically. All of the values of N
stored on the stack are powers of 2 (easy proof). You can save a number
of multiplications by caching R^N for the various values of N you
encounter rather than computing them afresh each time; again, the number
of items in this cache is bounded above by a small number you can
determine statically.

Plausible bounds are simple functions of CHAR_BIT and sizeof whatever
type you're using to store lengths of your bignums (this will be an
overestimate if there are padding bits, but that doesn't matter).

(I have an implementation of this in portable C, licensed under LGPL.)

-- [mdw]

Mark Wooding, Nov 25, 2010
8. ### BartCGuest

"Mark Wooding" <> wrote in message
news:...
> "BartC" <> writes:
>
>> "Eyegor" <> wrote in message
>> news:...

>> > The problem I ran into is writing a replica of scanf function.
>> > Basically I want to comvert a decimal input like
>> > 67627457272523772428537372395768 into the binary array. Any suggestion
>> > on how to do this would be greatly appreciated. I dont actually need
>> > the code, just an algorithm suggestion.

>> s = "67627457272523772428537372395768"
>>
>> n = 0
>> for each character c in s:
>> n = n*10 + (c-'0')

>
> This is a really sucky way of doing it. Good large-integer
> multiplication algorithms work best when their inputs are approximately
> the same length. The above algorithm is more or less pathological.

It'll do to get started. I like to test code (even pseudo-code), before I
post, the above took me a few minutes (not in C though).

> Here's a sketch of a fairly good algorithm. The algorithm will read a
> sequence V of base-R digits of unknown length L, most significant first,
> and output the integer represented by V
>
> V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]

I would just assume R is 10. If bigints are stored in binary form, and R is
2 or 16 (the mostly likely apart from 10), then a string can be translated
very efficiently with a few bit-operations, without multiplies or additions.

(As it happens, my own bigint library stores numbers in decimal format,
allowing a similarly fast conversion, but the entire library is so slow
anyway that it's not worth bothering with.)

> The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
> such that 0 <= X < R^N. The stack maintains an invariant: if (X0, N0)
> is nearer the top than (X1, N1) then N0 < N1. The algorithm to push an
> item (X, N) onto the stack is therefore like this:

....

I'll have to study this further.

--
Bartc

BartC, Nov 25, 2010
9. ### EyegorGuest

On Nov 25, 12:20 pm, "BartC" <> wrote:
> "Mark Wooding" <> wrote in message
>
> news:...
>
>
>
> > "BartC" <> writes:

>
> >> "Eyegor" <> wrote in message
> >>news:....
> >> > The problem I ran into is writing a replica of scanf function.
> >> > Basically I want to comvert a decimal input like
> >> > 67627457272523772428537372395768 into the binary array. Any suggestion
> >> > on how to do this would be greatly appreciated. I dont actually need
> >> > the code, just an algorithm suggestion.
> >> s = "67627457272523772428537372395768"

>
> >> n = 0
> >> for each character c in s:
> >>   n = n*10 + (c-'0')

>
> > This is a really sucky way of doing it.  Good large-integer
> > multiplication algorithms work best when their inputs are approximately
> > the same length.  The above algorithm is more or less pathological.

>
> It'll do to get started. I like to test code (even pseudo-code), before I
> post, the above took me a few minutes (not in C though).
>
> > Here's a sketch of a fairly good algorithm.  The algorithm will read a
> > sequence V of base-R digits of unknown length L, most significant first,
> > and output the integer represented by V

>
> >        V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]

>
> I would just assume R is 10. If bigints are stored in binary form, and R is
> 2 or 16 (the mostly likely apart from 10), then a string can be translated
> very efficiently with a few bit-operations, without multiplies or additions.
>
> (As it happens, my own bigint library stores numbers in decimal format,
> allowing a similarly fast conversion, but the entire library is so slow
> anyway that it's not worth bothering with.)
>
> > The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
> > such that 0 <= X < R^N.  The stack maintains an invariant: if (X0, N0)
> > is nearer the top than (X1, N1) then N0 < N1.  The algorithm to push an
> > item (X, N) onto the stack is therefore like this:

>
> ...
>
> I'll have to study this further.
>
> --
> Bartc

You don't want to have base10 library, because it will by about 100
times slower than what it can be, in multiplication routine. The trick
is to make library base multiple of 2 so you can use left bit shift in

I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
boring job babysitting technicians, so occasionally I get into a self
improvement mood, this happens when I feel like I am getting dumber at
work day after day. Anyhow, during these "self improvement moods I
usually code in C or do some other learning project. This is one of
those learning projects. I did this before, but lost code couple years
back during pc upgrade. Last time I wrote a factorial calculator which
used my own bignum library and could calculate 1,000,000 factorial
with no problems.

Eyegor, Nov 25, 2010
10. ### BartCGuest

"Eyegor" <> wrote in message
news:...
> On Nov 25, 12:20 pm, "BartC" <> wrote:

>> (As it happens, my own bigint library stores numbers in decimal format,
>> allowing a similarly fast conversion, but the entire library is so slow
>> anyway that it's not worth bothering with.)

> You don't want to have base10 library, because it will by about 100
> times slower than what it can be, in multiplication routine. The trick
> is to make library base multiple of 2 so you can use left bit shift in

Binary allows you to use 16x16->32-bit multiplies (or even 32x32->64-bit
ones), instead of 3.3x3.3 bits->6.6 bits (as I stored one decimal digit per
byte).

A decimal-based library isn't that terrible (some things are easier, such as
printing results), but my library was only quickly put together to solve a
problem, and is not too serious (otherwise I wouldn't have implemented it in
an interpreted language...)

> I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
> boring job babysitting technicians, so occasionally I get into a self
> improvement mood, this happens when I feel like I am getting dumber at
> work day after day. Anyhow, during these "self improvement moods I
> usually code in C or do some other learning project. This is one of
> those learning projects. I did this before, but lost code couple years
> back during pc upgrade. Last time I wrote a factorial calculator which
> used my own bignum library and could calculate 1,000,000 factorial
> with no problems.

How long did it take? My slow library took 90 minutes just to work out
100,000! (and you can read that exclamation mark either way...)

Might be time to upgrade it I think...

--
Bartc

BartC, Nov 25, 2010
11. ### EyegorGuest

On Nov 25, 5:55 pm, "BartC" <> wrote:
> "Eyegor" <> wrote in message
>
> news:...
>
> > On Nov 25, 12:20 pm, "BartC" <> wrote:
> >> (As it happens, my own bigint library stores numbers in decimal format,
> >> allowing a similarly fast conversion, but the entire library is so slow
> >> anyway that it's not worth bothering with.)

> > You don't want to have base10 library, because it will by about 100
> > times slower than what it can be, in multiplication routine. The trick
> > is to make library base multiple of 2 so you can use left bit shift in

>
> Binary allows you to use 16x16->32-bit multiplies (or even 32x32->64-bit
> ones), instead of 3.3x3.3 bits->6.6 bits (as I stored one decimal digit per
> byte).
>
> A decimal-based library isn't that terrible (some things are easier, such as
> printing results), but my library was only quickly put together to solve a
> problem, and is not too serious (otherwise I wouldn't have implemented it in
> an interpreted language...)
>
> > I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
> > boring job babysitting technicians, so occasionally I get into a self
> > improvement mood, this happens when I feel like I am getting dumber at
> > work day after day. Anyhow, during these "self improvement moods I
> > usually code in C or do some other learning project. This is one of
> > those learning projects. I did this before, but lost code couple years
> > back during pc upgrade. Last time I wrote a factorial calculator which
> > used my own bignum library and could calculate 1,000,000 factorial
> > with no problems.

>
> How long did it take? My slow library took 90 minutes just to work out
> 100,000! (and you can read that exclamation mark either way...)
>
> Might be time to upgrade it I think...
>
> --
> Bartc

Eyegor, Nov 26, 2010
12. ### EyegorGuest

On Nov 25, 7:30 pm, Eyegor <> wrote:
> On Nov 25, 5:55 pm, "BartC" <> wrote:
>
>
>
> > "Eyegor" <> wrote in message

>
> >news:....

>
> > > On Nov 25, 12:20 pm, "BartC" <> wrote:
> > >> (As it happens, my own bigint library stores numbers in decimal format,
> > >> allowing a similarly fast conversion, but the entire library is so slow
> > >> anyway that it's not worth bothering with.)
> > > You don't want to have base10 library, because it will by about 100
> > > times slower than what it can be, in multiplication routine. The trick
> > > is to make library base multiple of 2 so you can use left bit shift in
> > > your multiplication function.

>
> > Binary allows you to use 16x16->32-bit multiplies (or even 32x32->64-bit
> > ones), instead of 3.3x3.3 bits->6.6 bits (as I stored one decimal digit per
> > byte).

>
> > A decimal-based library isn't that terrible (some things are easier, such as
> > printing results), but my library was only quickly put together to solve a
> > problem, and is not too serious (otherwise I wouldn't have implemented it in
> > an interpreted language...)

>
> > > I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
> > > boring job babysitting technicians, so occasionally I get into a self
> > > improvement mood, this happens when I feel like I am getting dumber at
> > > work day after day. Anyhow, during these "self improvement moods I
> > > usually code in C or do some other learning project. This is one of
> > > those learning projects. I did this before, but lost code couple years
> > > back during pc upgrade. Last time I wrote a factorial calculator which
> > > used my own bignum library and could calculate 1,000,000 factorial
> > > with no problems.

>
> > How long did it take? My slow library took 90 minutes just to work out
> > 100,000! (and you can read that exclamation mark either way...)

>
> > Might be time to upgrade it I think...

>
> > --
> > Bartc

>

Decimal multiplication is what kills you. Change library to base 2
(256, 2^16 etc) and use left shifts.

Eyegor, Nov 26, 2010
13. ### BartCGuest

"Eyegor" <> wrote in message
news:...
> On Nov 25, 7:30 pm, Eyegor <> wrote:
>> On Nov 25, 5:55 pm, "BartC" <> wrote:

>> > > Last time I wrote a factorial calculator which
>> > > used my own bignum library and could calculate 1,000,000 factorial
>> > > with no problems.

>>
>> > How long did it take? My slow library took 90 minutes just to work out
>> > 100,000! (and you can read that exclamation mark either way...)

>>
>> > Might be time to upgrade it I think...

> Decimal multiplication is what kills you. Change library to base 2
> (256, 2^16 etc) and use left shifts.

About a 25000:1 difference, assuming an estimated 500000 seconds for my code
to calculate 1000000! I don't think using decimals would make that much
difference (and interpreted code might only account for 10:1 of it). I will
look tomorrow to see what's going on.

(BTW you're not running it on one of those Chinese supercomputers are you?)

--
Bartc

BartC, Nov 26, 2010
14. ### Mark WoodingGuest

"BartC" <> writes:

> How long did it take? My slow library took 90 minutes just to work out
> 100,000! (and you can read that exclamation mark either way...)

Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
almost 20s to convert the result to decimal. I compute 1000000! (and
output the result in hex, but that's trivial) in 1m 40s; decimal output
takes much longer: almost 50 minutes.

I'm using Karatsuba multiplication for large numbers, but nothing
fancier than that -- one probably ought to be using FFT-based algorithms
for these sizes of numbers, but my library is mainly intended for
cryptography, for which even Karatsuba is of questionable benefit. My
library is written entirely in (obsessively) portable C.

Beyond that, my algorithms for factorial computation and base coversion
are divide-and-conquer affairs, designed to operate on numbers of
similar sizes. Factorial is computed bottom-up, and uses a similar
stack trick to the decimal-to-binary converter I outlined earlier;
binary-to-decimal uses a top-down recursive algorithm I sketched in a

(My library is, for some reason, uncommonly fast at factorial
computations, even though I only implemented them as a useless toy.
When I compared a few years ago, it was faster than GMP, even though GMP
has fancier multiply algorithms and chunks of assembler code. I suspect
that GMP has improved since then. My source code is publicly
available.)

-- [mdw]

Mark Wooding, Nov 26, 2010
15. ### BartCGuest

"Mark Wooding" <> wrote in message
news:...
> "BartC" <> writes:
>
>> How long did it take? My slow library took 90 minutes just to work out
>> 100,000! (and you can read that exclamation mark either way...)

>
> Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
> almost 20s to convert the result to decimal. I compute 1000000! (and
> output the result in hex, but that's trivial) in 1m 40s; decimal output
> takes much longer: almost 50 minutes.

I've got it down to some 23 seconds for 100000! and 3500 seconds for
1000000!

This is using the basic algorithms for multiplication and factorials. I
suppose some fancier algorithms are on the cards, but that 3000 seconds for
decimal conversion you mentioned sounds a bit worrying; are there no fast
division techniques?

--
Bartc

BartC, Nov 27, 2010
16. ### Mark WoodingGuest

"BartC" <> writes:

> I've got it down to some 23 seconds for 100000! and 3500 seconds for 1000000!

Nice going!

> This is using the basic algorithms for multiplication and factorials. I
> suppose some fancier algorithms are on the cards, but that 3000 seconds for
> decimal conversion you mentioned sounds a bit worrying; are there no fast
> division techniques?

There are; I've not implemented any because they're quite complicated
and they aren't useful in my target application. The one I'm using is
basically textbook long division (in base 2^32), as described in Knuth.
Places to look for better ones include Knuth, and Crandall and Pomerance
`Primes: A Computational Perspective'.

Base conversion in particular does repeated divisions by the same values
R^(2^n); it might be worth caching an approximations to R^(2^-n) and

Good luck.

-- [mdw]

Mark Wooding, Nov 27, 2010
17. ### WillemGuest

Mark Wooding wrote:
) "BartC" <> writes:
)
)> I've got it down to some 23 seconds for 100000! and 3500 seconds for 1000000!
)
) Nice going!
)
)> This is using the basic algorithms for multiplication and factorials. I
)> suppose some fancier algorithms are on the cards, but that 3000 seconds for
)> decimal conversion you mentioned sounds a bit worrying; are there no fast
)> division techniques?
)
) There are; I've not implemented any because they're quite complicated
) and they aren't useful in my target application. The one I'm using is
) basically textbook long division (in base 2^32), as described in Knuth.
) Places to look for better ones include Knuth, and Crandall and Pomerance
) `Primes: A Computational Perspective'.
)
) Base conversion in particular does repeated divisions by the same values
) R^(2^n); it might be worth caching an approximations to R^(2^-n) and

There are other techniques to convert to decimal that are more tailored to
decimal specifically. These don't need repeated divisions of the whole
number. One rather simple technique is repeated addition and doubling(*),
but there are more sophisticated techniques available. Off the cuff, I
would say that an adaptation of arithmetic encoding would work well.

*) The doubling is the crux of the algorithm, so you need a fast way
to double a decimal number representation. The one I write a long
long time ago used BCD-mode in assembly and an addition loop.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Willem, Nov 27, 2010
18. ### Eric SosmanGuest

[OT] Re: Long number base conversion

On 11/26/2010 9:07 PM, BartC wrote:
> "Mark Wooding" <> wrote in message
> news:...
>> "BartC" <> writes:
>>
>>> How long did it take? My slow library took 90 minutes just to work out
>>> 100,000! (and you can read that exclamation mark either way...)

>>
>> Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
>> almost 20s to convert the result to decimal. I compute 1000000! (and
>> output the result in hex, but that's trivial) in 1m 40s; decimal output
>> takes much longer: almost 50 minutes.

I hope you're getting more than one decimal digit per division.
For example, dividing by 1000000000 gives you nine decimal digits at
a whack in a form that fits easily in a 32-bit integer.

> I've got it down to some 23 seconds for 100000! and 3500 seconds for
> 1000000!
>
> This is using the basic algorithms for multiplication and factorials. I
> suppose some fancier algorithms are on the cards, but that 3000 seconds
> for decimal conversion you mentioned sounds a bit worrying; are there no
> fast division techniques?

If you know you want to convert the result to decimal (and perhaps
that you don't want to do much else with it), one thing you can do is
handle the trailing zeroes specially. As you're plowing along building
up 1 * 2 * 3 * ... you can remove all the 2's and 5's from each new
factor and count how many you've removed. Then at the end you'll have

N! = big_product * (2 ^ twos_count) * (5 ^ fives_count)

You now know that N! ends with min(twos_count, fives_count) trailing
zeroes. Since twos_count >= fives_count you don't even need the min
operation; the number of trailing zeroes is just fives_count, and
what comes before is the decimal expansion of

big_product * (2 ^ (twos_count - fives_count))

Assuming the underlying representation uses "digits" that are powers
of two, you can compute this with a shift and one multiplication (by
a power of two less than the digit base). The speed advantage comes
from the fact that you needn't do divisions to calculate all those
trailing zeroes, and that you're working with smaller bignumbers
throughout.

Any prime P appears in the prime factorization of N!

N/p + N/p^2 + N/p^3 + ...

times (using integer division with truncation), so 1000000! is
divisible by

1000000/5 + 100000/25 + ... + 1000000/390625 = 249998

and thus ends with a quarter-million trailing zeroes. Omitting them
from the accumulation means that the numbers you encounter along the
way are about 830,000 bits shorter than if you'd multiplied the whole
thing out naively.

--
Eric Sosman
lid

Eric Sosman, Nov 27, 2010
19. ### Mark WoodingGuest

Re: [OT] Re: Long number base conversion

Eric Sosman <> writes:

> I hope you're getting more than one decimal digit per division. For
> example, dividing by 1000000000 gives you nine decimal digits at a
> whack in a form that fits easily in a 32-bit integer.

I described my binary-to-decimal algorithm elsewhere; I explained that
it was a top-down divide-and-conquer algorithm, which should have given
a hint. Here's a fuller description.

Suppose the number to convert is N, and we want it in base R.

* If N = 0 then print `0' and stop. If N is negative, then print `-',
set N = -N, and continue.

* For i = 0 upwards, compute R^(2^i) by successive squaring and store
the results in a list. Stop when you find a number R^M which is
larger than N; don't bother storing this number. The maximum number
of entries in this list can be determined statically, and isn't very
large.

* Invoke the function convert(N, M/2, false), where convert(n, m, lzp)
is defined as follows.

-- (Invariant: 0 <= n < R^(2m).)

-- If n = 0 then: if lzp is true, print m `0' characters; return.

-- If m = 1 then print the base-R digit corresponding to n, and
return. (This is where the recursion bottoms out. You can
stop earlier, e.g., if n fits in a machine word: you must
ensure, if lzp is true, that you pad the output to m

-- Compute q and r such that n = q R^m + r with 0 <= r < R^m
using standard Euclidean division-with-remainder. (You should
have R^m already computed. Note that 0 <= q < R^m also,
maintaining the invariant for the recursive calls.)

-- If q = 0 and lzp is false, then perform convert(r, m/2,
false). Otherwise, perform convert(q, m/2, lzp) and
convert(r, m/2, true).

As I mentioned, this also has the benefit of producing output in
approximately the right order. If you watch it work on a large number,
you can see it pause at various points, most notably halfway through.

> If you know you want to convert the result to decimal (and perhaps
> that you don't want to do much else with it), one thing you can do is
> handle the trailing zeroes specially.

[...]

> Any prime P appears in the prime factorization of N!
>
> N/p + N/p^2 + N/p^3 + ...
>
> times (using integer division with truncation),

Yes. We can determine an upper bound on this simply by ignoring the
truncation and computing the limit (since p > 1, this is convergent): we
get N/(p - 1). Note that this is linear in N, whereas the length of N!
is bounded below (Ramanujan's approximation) by a multiple of N (log N -
1). Therefore as N gets larger, this trick becomes slowly less
effective. It's a shame, because it's a nice analysis.

-- [mdw]

Mark Wooding, Nov 27, 2010
20. ### BartCGuest

Re: [OT] Re: Long number base conversion

"Eric Sosman" <> wrote in message
news:icr452\$boe\$-september.org...
> On 11/26/2010 9:07 PM, BartC wrote:
>> "Mark Wooding" <> wrote in message
>> news:...
>>> "BartC" <> writes:
>>>
>>>> How long did it take? My slow library took 90 minutes just to work out
>>>> 100,000! (and you can read that exclamation mark either way...)
>>>
>>> Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
>>> almost 20s to convert the result to decimal. I compute 1000000! (and

>> I've got it down .... 3500 seconds for 1000000!

(/Update/: down to 2200 seconds by changing the way the result is copied at
each step.)

>>> output the result in hex, but that's trivial) in 1m 40s; decimal output
>>> takes much longer: almost 50 minutes.

> I hope you're getting more than one decimal digit per division.
> For example, dividing by 1000000000 gives you nine decimal digits at
> a whack in a form that fits easily in a 32-bit integer.

Actually the same technique can be used for the OP's problem of reading
decimal numbers into a binary big number; digits are read in groups of 9
using ordinary arithmetic, before adding into the big number.

(That is, if input numbers are going to be large; usually the output is the
problem.)

>> This is using the basic algorithms for multiplication and factorials. I

> If you know you want to convert the result to decimal (and perhaps
> that you don't want to do much else with it), one thing you can do is
> handle the trailing zeroes specially. As you're plowing along building
> up 1 * 2 * 3 * ... you can remove all the 2's and 5's from each new
> factor and count how many you've removed. Then at the end you'll have

You mean that in 11! * 12, the 12 becomes 6, and I add one to the number of
two's?

What about *25 which becomes *5, do I apply the rule again to get *1?

(I've just tried this, and unless I made a mistake (which is quite likely),
I found it difficult to measure the difference on 100000!, and that's
without doing the adjustments at the end.)

--
Bartc

BartC, Nov 27, 2010