Long number base conversion

Eyegor · Nov 25, 2010

Hi,
I am working on a project, just for fun of it, so there is no 3rd
party restictions.
What I am trying to do is to write a simple bignum library, simmilar
to GMP. I know GMP exists and it works and I am not going to write
anything better than what it is, this is just for learning purposes.
Anyhow, I creaed several basic functions to add and subtract large
numbers which are basically Array Encoded Integers. I use 16-bit
signed integers as elements in the array, and the whole array
represents a very large integer, say 16000 bits or so.

The problem I ran into is writing a replica of scanf function.
Basically I want to comvert a decimal input like
67627457272523772428537372395768 into the binary array. Any suggestion
on how to do this would be greatly appreciated. I dont actually need
the code, just an algorithm suggestion. I wrote a printf function like
this about 5 years ago, but I dont have that code anymore and want to
replicate it.

thanks

BartC · Nov 25, 2010

What I am trying to do is to write a simple bignum library, simmilar

The problem I ran into is writing a replica of scanf function.
Basically I want to comvert a decimal input like
67627457272523772428537372395768 into the binary array. Any suggestion
on how to do this would be greatly appreciated. I dont actually need
the code, just an algorithm suggestion. I wrote a printf function like
this about 5 years ago, but I dont have that code anymore and want to
replicate it.

s = "67627457272523772428537372395768"

n = 0
for each character c in s:
n = n*10 + (c-'0')

You need to use your bigint arithmetic for operations here.

Eyegor · Nov 25, 2010

s = "67627457272523772428537372395768"

n = 0
for each character c in s:
n = n*10 + (c-'0')

You need to use your bigint arithmetic for operations here.

Enginious, Thank you.

Keith Thompson · Nov 25, 2010

Eyegor said:
I am working on a project, just for fun of it, so there is no 3rd
party restictions.
What I am trying to do is to write a simple bignum library, simmilar
to GMP. I know GMP exists and it works and I am not going to write
anything better than what it is, this is just for learning purposes.
Anyhow, I creaed several basic functions to add and subtract large
numbers which are basically Array Encoded Integers. I use 16-bit
signed integers as elements in the array, and the whole array
represents a very large integer, say 16000 bits or so.

Why are you using *signed* integers? I'd think that unsigned
integers would be more convenient. If you have a bignum represented
as an array of 1000 16-bit integers, you only need 1 sign bit,
not a thousand of them.

The problem I ran into is writing a replica of scanf function.
Basically I want to comvert a decimal input like
67627457272523772428537372395768 into the binary array. Any suggestion
on how to do this would be greatly appreciated. I dont actually need
the code, just an algorithm suggestion. I wrote a printf function like
this about 5 years ago, but I dont have that code anymore and want to
replicate it.

Presumably you've already got functions that implement the arithmetic
operators (+, -, *, /, %) for your Array Encoded Integers. Just use
them to implement the string-to-bignum conversion in the same way
you'd implement a string-to-integer conversion. It's the same
algorithm (which I presume you already know or can find or invent),
just using different operations.

bert · Nov 25, 2010

Hi,
I am working on a project, just for fun of it, so there is no 3rd
party restictions.
What I am trying to do is to write a simple bignum library, simmilar
to GMP. I know GMP exists and it works and I am not going to write
anything better than what it is, this is just for learning purposes.
Anyhow, I creaed several basic functions to add and subtract large
numbers which are basically Array Encoded Integers. I use 16-bit
signed integers as elements in the array, and the whole array
represents a very large integer, say 16000 bits or so.

The problem I ran into is writing a replica of scanf function.
Basically I want to comvert a decimal input like
67627457272523772428537372395768 into the binary array. Any suggestion
on how to do this would be greatly appreciated. I dont actually need
the code, just an algorithm suggestion. I wrote a printf function like
this about 5 years ago, but I dont have that code anymore and want to
replicate it.

Yes, BartC is perfectly correct.

You will find that a couple of useful additions to
your range of functions are bignum * int -> bignum,
and bignum / int -> (quot bignum, rem int). The
latter is particularly handy for the inverse of
scanf, churning out the decimal digits of a bigint.
--

Tom St Denis · Nov 25, 2010

Hi,
I am working on a project, just for fun of it, so there is no 3rd
party restictions.
What I am trying to do is to write a simple bignum library, simmilar
to GMP. I know GMP exists and it works and I am not going to write
anything better than what it is, this is just for learning purposes.
Anyhow, I creaed several basic functions to add and subtract large
numbers which are basically Array Encoded Integers. I use 16-bit
signed integers as elements in the array, and the whole array
represents a very large integer, say 16000 bits or so.

Why not check out LibTomMath [disclosure I was the original developer/
maintainer]? It's public domain, has a whole suite of typical bignum
functions (and some required for crypto), even comes with a book on
bignum math in PDF format in the archive.

I'm not trying to discourage you from writing your own code, by all
means do that. You'll also learn more if you try then to explain the
algorithms (worked for me).

http://libtom.org/?page=features&newsitems=5&whatfile=ltm

Tom

Mark Wooding · Nov 25, 2010

BartC said:
s = "67627457272523772428537372395768"

n = 0
for each character c in s:
n = n*10 + (c-'0')

This is a really sucky way of doing it. Good large-integer
multiplication algorithms work best when their inputs are approximately
the same length. The above algorithm is more or less pathological.

Here's a sketch of a fairly good algorithm. The algorithm will read a
sequence V of base-R digits of unknown length L, most significant first,
and output the integer represented by V

V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]

The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
such that 0 <= X < R^N. The stack maintains an invariant: if (X0, N0)
is nearer the top than (X1, N1) then N0 < N1. The algorithm to push an
item (X, N) onto the stack is therefore like this:

if the stack is empty, push (X, N) and return
let (X', N') be the topmost item on the stack
if N' > N then push (X, N) and return
pop (X', N')
set X <- X' R^N + X and N <- N + N' and go back to the top

Now, to convert the sequence V:

if V is exhausted, finish up (below)
read a digit D from V
push the pair (D, 1) onto the stack (above)

Finally, finish up as follows:

if the stack is empty, return 0 (or report an error: the
sequence was empty, which you might consider invalid)
let (Y, N) be the topmost item on the stack
while the stack is not empty:
pop (X, N')
set Y <- X R^N + Y and N <- N + R^N
output Y

It's not hard to compute a (fairly low) upper bound on the size of stack
required, so this can be allocated statically. All of the values of N
stored on the stack are powers of 2 (easy proof). You can save a number
of multiplications by caching R^N for the various values of N you
encounter rather than computing them afresh each time; again, the number
of items in this cache is bounded above by a small number you can
determine statically.

Plausible bounds are simple functions of CHAR_BIT and sizeof whatever
type you're using to store lengths of your bignums (this will be an
overestimate if there are padding bits, but that doesn't matter).

(I have an implementation of this in portable C, licensed under LGPL.)

-- [mdw]

BartC · Nov 25, 2010

Mark Wooding said:
This is a really sucky way of doing it. Good large-integer
multiplication algorithms work best when their inputs are approximately
the same length. The above algorithm is more or less pathological.

It'll do to get started. I like to test code (even pseudo-code), before I
post, the above took me a few minutes (not in C though).

Here's a sketch of a fairly good algorithm. The algorithm will read a
sequence V of base-R digits of unknown length L, most significant first,
and output the integer represented by V

V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]

I would just assume R is 10. If bigints are stored in binary form, and R is
2 or 16 (the mostly likely apart from 10), then a string can be translated
very efficiently with a few bit-operations, without multiplies or additions.

(As it happens, my own bigint library stores numbers in decimal format,
allowing a similarly fast conversion, but the entire library is so slow
anyway that it's not worth bothering with.)

The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
such that 0 <= X < R^N. The stack maintains an invariant: if (X0, N0)
is nearer the top than (X1, N1) then N0 < N1. The algorithm to push an
item (X, N) onto the stack is therefore like this:

....

I'll have to study this further.

Eyegor · Nov 25, 2010

This is a really sucky way of doing it. Good large-integer
multiplication algorithms work best when their inputs are approximately
the same length. The above algorithm is more or less pathological.

Click to expand...

It'll do to get started. I like to test code (even pseudo-code), before I
post, the above took me a few minutes (not in C though).

Here's a sketch of a fairly good algorithm. The algorithm will read a
sequence V of base-R digits of unknown length L, most significant first,
and output the integer represented by V

Click to expand...

V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]

Click to expand...

I would just assume R is 10. If bigints are stored in binary form, and R is
2 or 16 (the mostly likely apart from 10), then a string can be translated
very efficiently with a few bit-operations, without multiplies or additions.

(As it happens, my own bigint library stores numbers in decimal format,
allowing a similarly fast conversion, but the entire library is so slow
anyway that it's not worth bothering with.)

The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
such that 0 <= X < R^N. The stack maintains an invariant: if (X0, N0)
is nearer the top than (X1, N1) then N0 < N1. The algorithm to push an
item (X, N) onto the stack is therefore like this:

Click to expand...

...

I'll have to study this further.

You don't want to have base10 library, because it will by about 100
times slower than what it can be, in multiplication routine. The trick
is to make library base multiple of 2 so you can use left bit shift in
your multiplication function.

I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
boring job babysitting technicians, so occasionally I get into a self
improvement mood, this happens when I feel like I am getting dumber at
work day after day. Anyhow, during these "self improvement moods I
usually code in C or do some other learning project. This is one of
those learning projects. I did this before, but lost code couple years
back during pc upgrade. Last time I wrote a factorial calculator which
used my own bignum library and could calculate 1,000,000 factorial
with no problems.

BartC · Nov 25, 2010

You don't want to have base10 library, because it will by about 100
times slower than what it can be, in multiplication routine. The trick
is to make library base multiple of 2 so you can use left bit shift in
your multiplication function.

Binary allows you to use 16x16->32-bit multiplies (or even 32x32->64-bit
ones), instead of 3.3x3.3 bits->6.6 bits (as I stored one decimal digit per
byte).

A decimal-based library isn't that terrible (some things are easier, such as
printing results), but my library was only quickly put together to solve a
problem, and is not too serious (otherwise I wouldn't have implemented it in
an interpreted language...)

I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
boring job babysitting technicians, so occasionally I get into a self
improvement mood, this happens when I feel like I am getting dumber at
work day after day. Anyhow, during these "self improvement moods I
usually code in C or do some other learning project. This is one of
those learning projects. I did this before, but lost code couple years
back during pc upgrade. Last time I wrote a factorial calculator which
used my own bignum library and could calculate 1,000,000 factorial
with no problems.

How long did it take? My slow library took 90 minutes just to work out
100,000! (and you can read that exclamation mark either way...)

Might be time to upgrade it I think...

Eyegor · Nov 26, 2010

Binary allows you to use 16x16->32-bit multiplies (or even 32x32->64-bit
ones), instead of 3.3x3.3 bits->6.6 bits (as I stored one decimal digit per
byte).

A decimal-based library isn't that terrible (some things are easier, such as
printing results), but my library was only quickly put together to solve a
problem, and is not too serious (otherwise I wouldn't have implemented it in
an interpreted language...)

How long did it take? My slow library took 90 minutes just to work out
100,000! (and you can read that exclamation mark either way...)

Might be time to upgrade it I think...

About 20 seconds.

Eyegor · Nov 26, 2010

About 20 seconds.

Decimal multiplication is what kills you. Change library to base 2
(256, 2^16 etc) and use left shifts.

BartC · Nov 26, 2010

Decimal multiplication is what kills you. Change library to base 2
(256, 2^16 etc) and use left shifts.

About a 25000:1 difference, assuming an estimated 500000 seconds for my code
to calculate 1000000! I don't think using decimals would make that much
difference (and interpreted code might only account for 10:1 of it). I will
look tomorrow to see what's going on.

(BTW you're not running it on one of those Chinese supercomputers are you?)

Mark Wooding · Nov 26, 2010

BartC said:
How long did it take? My slow library took 90 minutes just to work out
100,000! (and you can read that exclamation mark either way...)

Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
almost 20s to convert the result to decimal. I compute 1000000! (and
output the result in hex, but that's trivial) in 1m 40s; decimal output
takes much longer: almost 50 minutes.

I'm using Karatsuba multiplication for large numbers, but nothing
fancier than that -- one probably ought to be using FFT-based algorithms
for these sizes of numbers, but my library is mainly intended for
cryptography, for which even Karatsuba is of questionable benefit. My
library is written entirely in (obsessively) portable C.

Beyond that, my algorithms for factorial computation and base coversion
are divide-and-conquer affairs, designed to operate on numbers of
similar sizes. Factorial is computed bottom-up, and uses a similar
stack trick to the decimal-to-binary converter I outlined earlier;
binary-to-decimal uses a top-down recursive algorithm I sketched in a
different thread quite recently.

(My library is, for some reason, uncommonly fast at factorial
computations, even though I only implemented them as a useless toy.
When I compared a few years ago, it was faster than GMP, even though GMP
has fancier multiply algorithms and chunks of assembler code. I suspect
that GMP has improved since then. My source code is publicly
available.)

-- [mdw]

BartC · Nov 27, 2010

Mark Wooding said:
Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
almost 20s to convert the result to decimal. I compute 1000000! (and
output the result in hex, but that's trivial) in 1m 40s; decimal output
takes much longer: almost 50 minutes.

I've got it down to some 23 seconds for 100000! and 3500 seconds for
1000000!

This is using the basic algorithms for multiplication and factorials. I
suppose some fancier algorithms are on the cards, but that 3000 seconds for
decimal conversion you mentioned sounds a bit worrying; are there no fast
division techniques?

Mark Wooding · Nov 27, 2010

BartC said:
I've got it down to some 23 seconds for 100000! and 3500 seconds for 1000000!

Nice going!

This is using the basic algorithms for multiplication and factorials. I
suppose some fancier algorithms are on the cards, but that 3000 seconds for
decimal conversion you mentioned sounds a bit worrying; are there no fast
division techniques?

There are; I've not implemented any because they're quite complicated
and they aren't useful in my target application. The one I'm using is
basically textbook long division (in base 2^32), as described in Knuth.
Places to look for better ones include Knuth, and Crandall and Pomerance
`Primes: A Computational Perspective'.

Base conversion in particular does repeated divisions by the same values
R^(2^n); it might be worth caching an approximations to R^(2^-n) and
multiplying by that instead.

Good luck.

-- [mdw]

Willem · Nov 27, 2010

Mark Wooding wrote:
)
)> I've got it down to some 23 seconds for 100000! and 3500 seconds for 1000000!
)
) Nice going!
)
)> This is using the basic algorithms for multiplication and factorials. I
)> suppose some fancier algorithms are on the cards, but that 3000 seconds for
)> decimal conversion you mentioned sounds a bit worrying; are there no fast
)> division techniques?
)
) There are; I've not implemented any because they're quite complicated
) and they aren't useful in my target application. The one I'm using is
) basically textbook long division (in base 2^32), as described in Knuth.
) Places to look for better ones include Knuth, and Crandall and Pomerance
) `Primes: A Computational Perspective'.
)
) Base conversion in particular does repeated divisions by the same values
) R^(2^n); it might be worth caching an approximations to R^(2^-n) and
) multiplying by that instead.

There are other techniques to convert to decimal that are more tailored to
decimal specifically. These don't need repeated divisions of the whole
number. One rather simple technique is repeated addition and doubling(*),
but there are more sophisticated techniques available. Off the cuff, I
would say that an adaptation of arithmetic encoding would work well.

*) The doubling is the crux of the algorithm, so you need a fast way
to double a decimal number representation. The one I write a long
long time ago used BCD-mode in assembly and an addition loop.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Eric Sosman · Nov 27, 2010

I hope you're getting more than one decimal digit per division.
For example, dividing by 1000000000 gives you nine decimal digits at
a whack in a form that fits easily in a 32-bit integer.

I've got it down to some 23 seconds for 100000! and 3500 seconds for
1000000!

This is using the basic algorithms for multiplication and factorials. I
suppose some fancier algorithms are on the cards, but that 3000 seconds
for decimal conversion you mentioned sounds a bit worrying; are there no
fast division techniques?

If you know you want to convert the result to decimal (and perhaps
that you don't want to do much else with it), one thing you can do is
handle the trailing zeroes specially. As you're plowing along building
up 1 * 2 * 3 * ... you can remove all the 2's and 5's from each new
factor and count how many you've removed. Then at the end you'll have

N! = big_product * (2 ^ twos_count) * (5 ^ fives_count)

You now know that N! ends with min(twos_count, fives_count) trailing
zeroes. Since twos_count >= fives_count you don't even need the min
operation; the number of trailing zeroes is just fives_count, and
what comes before is the decimal expansion of

big_product * (2 ^ (twos_count - fives_count))

Assuming the underlying representation uses "digits" that are powers
of two, you can compute this with a shift and one multiplication (by
a power of two less than the digit base). The speed advantage comes
from the fact that you needn't do divisions to calculate all those
trailing zeroes, and that you're working with smaller bignumbers
throughout.

Any prime P appears in the prime factorization of N!

N/p + N/p^2 + N/p^3 + ...

times (using integer division with truncation), so 1000000! is
divisible by

1000000/5 + 100000/25 + ... + 1000000/390625 = 249998

and thus ends with a quarter-million trailing zeroes. Omitting them
from the accumulation means that the numbers you encounter along the
way are about 830,000 bits shorter than if you'd multiplied the whole
thing out naively.

Mark Wooding · Nov 27, 2010

Eric Sosman said:
I hope you're getting more than one decimal digit per division. For
example, dividing by 1000000000 gives you nine decimal digits at a
whack in a form that fits easily in a 32-bit integer.

I described my binary-to-decimal algorithm elsewhere; I explained that
it was a top-down divide-and-conquer algorithm, which should have given
a hint. Here's a fuller description.

Suppose the number to convert is N, and we want it in base R.

* If N = 0 then print `0' and stop. If N is negative, then print `-',
set N = -N, and continue.

* For i = 0 upwards, compute R^(2^i) by successive squaring and store
the results in a list. Stop when you find a number R^M which is
larger than N; don't bother storing this number. The maximum number
of entries in this list can be determined statically, and isn't very
large.

* Invoke the function convert(N, M/2, false), where convert(n, m, lzp)
is defined as follows.

-- (Invariant: 0 <= n < R^(2m).)

-- If n = 0 then: if lzp is true, print m `0' characters; return.

-- If m = 1 then print the base-R digit corresponding to n, and
return. (This is where the recursion bottoms out. You can
stop earlier, e.g., if n fits in a machine word: you must
ensure, if lzp is true, that you pad the output to m
characters with leading `0's.)

-- Compute q and r such that n = q R^m + r with 0 <= r < R^m
using standard Euclidean division-with-remainder. (You should
have R^m already computed. Note that 0 <= q < R^m also,
maintaining the invariant for the recursive calls.)

-- If q = 0 and lzp is false, then perform convert(r, m/2,
false). Otherwise, perform convert(q, m/2, lzp) and
convert(r, m/2, true).

As I mentioned, this also has the benefit of producing output in
approximately the right order. If you watch it work on a large number,
you can see it pause at various points, most notably halfway through.

If you know you want to convert the result to decimal (and perhaps
that you don't want to do much else with it), one thing you can do is
handle the trailing zeroes specially.
[...]

Any prime P appears in the prime factorization of N!

N/p + N/p^2 + N/p^3 + ...

times (using integer division with truncation),

Yes. We can determine an upper bound on this simply by ignoring the
truncation and computing the limit (since p > 1, this is convergent): we
get N/(p - 1). Note that this is linear in N, whereas the length of N!
is bounded below (Ramanujan's approximation) by a multiple of N (log N -
1). Therefore as N gets larger, this trick becomes slowly less
effective. It's a shame, because it's a nice analysis.

-- [mdw]

BartC · Nov 27, 2010

(/Update/: down to 2200 seconds by changing the way the result is copied at
each step.)

I hope you're getting more than one decimal digit per division.
For example, dividing by 1000000000 gives you nine decimal digits at
a whack in a form that fits easily in a 32-bit integer.

Actually the same technique can be used for the OP's problem of reading
decimal numbers into a binary big number; digits are read in groups of 9
using ordinary arithmetic, before adding into the big number.

(That is, if input numbers are going to be large; usually the output is the
problem.)

If you know you want to convert the result to decimal (and perhaps
that you don't want to do much else with it), one thing you can do is
handle the trailing zeroes specially. As you're plowing along building
up 1 * 2 * 3 * ... you can remove all the 2's and 5's from each new
factor and count how many you've removed. Then at the end you'll have

You mean that in 11! * 12, the 12 becomes 6, and I add one to the number of
two's?

What about *25 which becomes *5, do I apply the rule again to get *1?

(I've just tried this, and unless I made a mistake (which is quite likely),
I found it difficult to measure the difference on 100000!, and that's
without doing the adjustments at the end.)

C / asm / long ints	32	Oct 26, 2006
Large number conversion, ASCII to binary	10	Jul 14, 2008
SENTINEL CONTROL LOOP WHEN DEALING WITH TWO ARRAYS	1	Oct 26, 2023
[semi OT] - Lack of long double implementation in VS	10	Oct 23, 2011
Finding a perfect number.	19	Apr 28, 2006
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
Long Num speed	35	Nov 1, 2006
Can't solve problems! please Help	0	Sep 26, 2022

Long number base conversion

Eyegor

BartC

Eyegor

Keith Thompson

bert

Tom St Denis

Mark Wooding

BartC

Eyegor

BartC

Eyegor

Eyegor

BartC

Mark Wooding

BartC

Mark Wooding

Willem

Eric Sosman

Mark Wooding

BartC

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads