Long number base conversion

Discussion in 'C Programming' started by Eyegor, Nov 25, 2010.

  1. Eyegor

    Eyegor Guest

    Hi,
    I am working on a project, just for fun of it, so there is no 3rd
    party restictions.
    What I am trying to do is to write a simple bignum library, simmilar
    to GMP. I know GMP exists and it works and I am not going to write
    anything better than what it is, this is just for learning purposes.
    Anyhow, I creaed several basic functions to add and subtract large
    numbers which are basically Array Encoded Integers. I use 16-bit
    signed integers as elements in the array, and the whole array
    represents a very large integer, say 16000 bits or so.

    The problem I ran into is writing a replica of scanf function.
    Basically I want to comvert a decimal input like
    67627457272523772428537372395768 into the binary array. Any suggestion
    on how to do this would be greatly appreciated. I dont actually need
    the code, just an algorithm suggestion. I wrote a printf function like
    this about 5 years ago, but I dont have that code anymore and want to
    replicate it.

    thanks
     
    Eyegor, Nov 25, 2010
    #1
    1. Advertising

  2. Eyegor

    BartC Guest

    "Eyegor" <> wrote in message
    news:...

    > What I am trying to do is to write a simple bignum library, simmilar


    > The problem I ran into is writing a replica of scanf function.
    > Basically I want to comvert a decimal input like
    > 67627457272523772428537372395768 into the binary array. Any suggestion
    > on how to do this would be greatly appreciated. I dont actually need
    > the code, just an algorithm suggestion. I wrote a printf function like
    > this about 5 years ago, but I dont have that code anymore and want to
    > replicate it.


    s = "67627457272523772428537372395768"

    n = 0
    for each character c in s:
    n = n*10 + (c-'0')

    You need to use your bigint arithmetic for operations here.

    --
    Bartc
     
    BartC, Nov 25, 2010
    #2
    1. Advertising

  3. Eyegor

    Eyegor Guest

    On Nov 24, 8:36 pm, "BartC" <> wrote:
    > "Eyegor" <> wrote in message
    >
    > news:...
    >
    > > What I am trying to do is to write a simple bignum library, simmilar
    > > The problem I ran into is writing a replica of scanf function.
    > > Basically I want to comvert a decimal input like
    > > 67627457272523772428537372395768 into the binary array. Any suggestion
    > > on how to do this would be greatly appreciated. I dont actually need
    > > the code, just an algorithm suggestion. I wrote a printf function like
    > > this about 5 years ago, but I dont have that code anymore and want to
    > > replicate it.

    >
    > s = "67627457272523772428537372395768"
    >
    > n = 0
    > for each character c in s:
    >    n = n*10 + (c-'0')
    >
    > You need to use your bigint arithmetic for operations here.
    >
    > --
    > Bartc


    Enginious, Thank you.
     
    Eyegor, Nov 25, 2010
    #3
  4. Eyegor <> writes:
    > I am working on a project, just for fun of it, so there is no 3rd
    > party restictions.
    > What I am trying to do is to write a simple bignum library, simmilar
    > to GMP. I know GMP exists and it works and I am not going to write
    > anything better than what it is, this is just for learning purposes.
    > Anyhow, I creaed several basic functions to add and subtract large
    > numbers which are basically Array Encoded Integers. I use 16-bit
    > signed integers as elements in the array, and the whole array
    > represents a very large integer, say 16000 bits or so.


    Why are you using *signed* integers? I'd think that unsigned
    integers would be more convenient. If you have a bignum represented
    as an array of 1000 16-bit integers, you only need 1 sign bit,
    not a thousand of them.

    > The problem I ran into is writing a replica of scanf function.
    > Basically I want to comvert a decimal input like
    > 67627457272523772428537372395768 into the binary array. Any suggestion
    > on how to do this would be greatly appreciated. I dont actually need
    > the code, just an algorithm suggestion. I wrote a printf function like
    > this about 5 years ago, but I dont have that code anymore and want to
    > replicate it.


    Presumably you've already got functions that implement the arithmetic
    operators (+, -, *, /, %) for your Array Encoded Integers. Just use
    them to implement the string-to-bignum conversion in the same way
    you'd implement a string-to-integer conversion. It's the same
    algorithm (which I presume you already know or can find or invent),
    just using different operations.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 25, 2010
    #4
  5. Eyegor

    bert Guest

    On Nov 25, 1:21 am, Eyegor <> wrote:
    > Hi,
    > I am working on a project, just for fun of it, so there is no 3rd
    > party restictions.
    > What I am trying to do is to write a simple bignum library, simmilar
    > to GMP. I know GMP exists and it works and I am not going to write
    > anything better than what it is, this is just for learning purposes.
    > Anyhow, I creaed several basic functions to add and subtract large
    > numbers which are basically Array Encoded Integers. I use 16-bit
    > signed integers as elements in the array, and the whole array
    > represents a very large integer, say 16000 bits or so.
    >
    > The problem I ran into is writing a replica of scanf function.
    > Basically I want to comvert a decimal input like
    > 67627457272523772428537372395768 into the binary array. Any suggestion
    > on how to do this would be greatly appreciated. I dont actually need
    > the code, just an algorithm suggestion. I wrote a printf function like
    > this about 5 years ago, but I dont have that code anymore and want to
    > replicate it.


    Yes, BartC is perfectly correct.

    You will find that a couple of useful additions to
    your range of functions are bignum * int -> bignum,
    and bignum / int -> (quot bignum, rem int). The
    latter is particularly handy for the inverse of
    scanf, churning out the decimal digits of a bigint.
    --
     
    bert, Nov 25, 2010
    #5
  6. Eyegor

    Tom St Denis Guest

    On Nov 24, 8:21 pm, Eyegor <> wrote:
    > Hi,
    > I am working on a project, just for fun of it, so there is no 3rd
    > party restictions.
    > What I am trying to do is to write a simple bignum library, simmilar
    > to GMP. I know GMP exists and it works and I am not going to write
    > anything better than what it is, this is just for learning purposes.
    > Anyhow, I creaed several basic functions to add and subtract large
    > numbers which are basically Array Encoded Integers. I use 16-bit
    > signed integers as elements in the array, and the whole array
    > represents a very large integer, say 16000 bits or so.


    Why not check out LibTomMath [disclosure I was the original developer/
    maintainer]? It's public domain, has a whole suite of typical bignum
    functions (and some required for crypto), even comes with a book on
    bignum math in PDF format in the archive.

    I'm not trying to discourage you from writing your own code, by all
    means do that. You'll also learn more if you try then to explain the
    algorithms (worked for me).

    http://libtom.org/?page=features&newsitems=5&whatfile=ltm

    Tom
     
    Tom St Denis, Nov 25, 2010
    #6
  7. Eyegor

    Mark Wooding Guest

    "BartC" <> writes:

    > "Eyegor" <> wrote in message
    > news:...
    >
    > > What I am trying to do is to write a simple bignum library, simmilar

    >
    > > The problem I ran into is writing a replica of scanf function.
    > > Basically I want to comvert a decimal input like
    > > 67627457272523772428537372395768 into the binary array. Any suggestion
    > > on how to do this would be greatly appreciated. I dont actually need
    > > the code, just an algorithm suggestion. I wrote a printf function like
    > > this about 5 years ago, but I dont have that code anymore and want to
    > > replicate it.

    >
    > s = "67627457272523772428537372395768"
    >
    > n = 0
    > for each character c in s:
    > n = n*10 + (c-'0')


    This is a really sucky way of doing it. Good large-integer
    multiplication algorithms work best when their inputs are approximately
    the same length. The above algorithm is more or less pathological.

    Here's a sketch of a fairly good algorithm. The algorithm will read a
    sequence V of base-R digits of unknown length L, most significant first,
    and output the integer represented by V

    V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]

    The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
    such that 0 <= X < R^N. The stack maintains an invariant: if (X0, N0)
    is nearer the top than (X1, N1) then N0 < N1. The algorithm to push an
    item (X, N) onto the stack is therefore like this:

    if the stack is empty, push (X, N) and return
    let (X', N') be the topmost item on the stack
    if N' > N then push (X, N) and return
    pop (X', N')
    set X <- X' R^N + X and N <- N + N' and go back to the top

    Now, to convert the sequence V:

    if V is exhausted, finish up (below)
    read a digit D from V
    push the pair (D, 1) onto the stack (above)

    Finally, finish up as follows:

    if the stack is empty, return 0 (or report an error: the
    sequence was empty, which you might consider invalid)
    let (Y, N) be the topmost item on the stack
    while the stack is not empty:
    pop (X, N')
    set Y <- X R^N + Y and N <- N + R^N
    output Y

    It's not hard to compute a (fairly low) upper bound on the size of stack
    required, so this can be allocated statically. All of the values of N
    stored on the stack are powers of 2 (easy proof). You can save a number
    of multiplications by caching R^N for the various values of N you
    encounter rather than computing them afresh each time; again, the number
    of items in this cache is bounded above by a small number you can
    determine statically.

    Plausible bounds are simple functions of CHAR_BIT and sizeof whatever
    type you're using to store lengths of your bignums (this will be an
    overestimate if there are padding bits, but that doesn't matter).

    (I have an implementation of this in portable C, licensed under LGPL.)

    -- [mdw]
     
    Mark Wooding, Nov 25, 2010
    #7
  8. Eyegor

    BartC Guest

    "Mark Wooding" <> wrote in message
    news:...
    > "BartC" <> writes:
    >
    >> "Eyegor" <> wrote in message
    >> news:...


    >> > The problem I ran into is writing a replica of scanf function.
    >> > Basically I want to comvert a decimal input like
    >> > 67627457272523772428537372395768 into the binary array. Any suggestion
    >> > on how to do this would be greatly appreciated. I dont actually need
    >> > the code, just an algorithm suggestion.


    >> s = "67627457272523772428537372395768"
    >>
    >> n = 0
    >> for each character c in s:
    >> n = n*10 + (c-'0')

    >
    > This is a really sucky way of doing it. Good large-integer
    > multiplication algorithms work best when their inputs are approximately
    > the same length. The above algorithm is more or less pathological.


    It'll do to get started. I like to test code (even pseudo-code), before I
    post, the above took me a few minutes (not in C though).

    > Here's a sketch of a fairly good algorithm. The algorithm will read a
    > sequence V of base-R digits of unknown length L, most significant first,
    > and output the integer represented by V
    >
    > V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]


    I would just assume R is 10. If bigints are stored in binary form, and R is
    2 or 16 (the mostly likely apart from 10), then a string can be translated
    very efficiently with a few bit-operations, without multiplies or additions.

    (As it happens, my own bigint library stores numbers in decimal format,
    allowing a similarly fast conversion, but the entire library is so slow
    anyway that it's not worth bothering with.)

    > The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
    > such that 0 <= X < R^N. The stack maintains an invariant: if (X0, N0)
    > is nearer the top than (X1, N1) then N0 < N1. The algorithm to push an
    > item (X, N) onto the stack is therefore like this:

    ....

    I'll have to study this further.

    --
    Bartc
     
    BartC, Nov 25, 2010
    #8
  9. Eyegor

    Eyegor Guest

    On Nov 25, 12:20 pm, "BartC" <> wrote:
    > "Mark Wooding" <> wrote in message
    >
    > news:...
    >
    >
    >
    > > "BartC" <> writes:

    >
    > >> "Eyegor" <> wrote in message
    > >>news:....
    > >> > The problem I ran into is writing a replica of scanf function.
    > >> > Basically I want to comvert a decimal input like
    > >> > 67627457272523772428537372395768 into the binary array. Any suggestion
    > >> > on how to do this would be greatly appreciated. I dont actually need
    > >> > the code, just an algorithm suggestion.
    > >> s = "67627457272523772428537372395768"

    >
    > >> n = 0
    > >> for each character c in s:
    > >>   n = n*10 + (c-'0')

    >
    > > This is a really sucky way of doing it.  Good large-integer
    > > multiplication algorithms work best when their inputs are approximately
    > > the same length.  The above algorithm is more or less pathological.

    >
    > It'll do to get started. I like to test code (even pseudo-code), before I
    > post, the above took me a few minutes (not in C though).
    >
    > > Here's a sketch of a fairly good algorithm.  The algorithm will read a
    > > sequence V of base-R digits of unknown length L, most significant first,
    > > and output the integer represented by V

    >
    > >        V[0] R^(L-1) + V[1] R^(L-2) + ... + V[L - 1]

    >
    > I would just assume R is 10. If bigints are stored in binary form, and R is
    > 2 or 16 (the mostly likely apart from 10), then a string can be translated
    > very efficiently with a few bit-operations, without multiplies or additions.
    >
    > (As it happens, my own bigint library stores numbers in decimal format,
    > allowing a similarly fast conversion, but the entire library is so slow
    > anyway that it's not worth bothering with.)
    >
    > > The algorithm is nonrecursive, but makes use of a stack of pairs (X, N)
    > > such that 0 <= X < R^N.  The stack maintains an invariant: if (X0, N0)
    > > is nearer the top than (X1, N1) then N0 < N1.  The algorithm to push an
    > > item (X, N) onto the stack is therefore like this:

    >
    > ...
    >
    > I'll have to study this further.
    >
    > --
    > Bartc


    You don't want to have base10 library, because it will by about 100
    times slower than what it can be, in multiplication routine. The trick
    is to make library base multiple of 2 so you can use left bit shift in
    your multiplication function.

    I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
    boring job babysitting technicians, so occasionally I get into a self
    improvement mood, this happens when I feel like I am getting dumber at
    work day after day. Anyhow, during these "self improvement moods I
    usually code in C or do some other learning project. This is one of
    those learning projects. I did this before, but lost code couple years
    back during pc upgrade. Last time I wrote a factorial calculator which
    used my own bignum library and could calculate 1,000,000 factorial
    with no problems.
     
    Eyegor, Nov 25, 2010
    #9
  10. Eyegor

    BartC Guest

    "Eyegor" <> wrote in message
    news:...
    > On Nov 25, 12:20 pm, "BartC" <> wrote:


    >> (As it happens, my own bigint library stores numbers in decimal format,
    >> allowing a similarly fast conversion, but the entire library is so slow
    >> anyway that it's not worth bothering with.)


    > You don't want to have base10 library, because it will by about 100
    > times slower than what it can be, in multiplication routine. The trick
    > is to make library base multiple of 2 so you can use left bit shift in
    > your multiplication function.


    Binary allows you to use 16x16->32-bit multiplies (or even 32x32->64-bit
    ones), instead of 3.3x3.3 bits->6.6 bits (as I stored one decimal digit per
    byte).

    A decimal-based library isn't that terrible (some things are easier, such as
    printing results), but my library was only quickly put together to solve a
    problem, and is not too serious (otherwise I wouldn't have implemented it in
    an interpreted language...)


    > I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
    > boring job babysitting technicians, so occasionally I get into a self
    > improvement mood, this happens when I feel like I am getting dumber at
    > work day after day. Anyhow, during these "self improvement moods I
    > usually code in C or do some other learning project. This is one of
    > those learning projects. I did this before, but lost code couple years
    > back during pc upgrade. Last time I wrote a factorial calculator which
    > used my own bignum library and could calculate 1,000,000 factorial
    > with no problems.


    How long did it take? My slow library took 90 minutes just to work out
    100,000! (and you can read that exclamation mark either way...)

    Might be time to upgrade it I think...

    --
    Bartc
     
    BartC, Nov 25, 2010
    #10
  11. Eyegor

    Eyegor Guest

    On Nov 25, 5:55 pm, "BartC" <> wrote:
    > "Eyegor" <> wrote in message
    >
    > news:...
    >
    > > On Nov 25, 12:20 pm, "BartC" <> wrote:
    > >> (As it happens, my own bigint library stores numbers in decimal format,
    > >> allowing a similarly fast conversion, but the entire library is so slow
    > >> anyway that it's not worth bothering with.)

    > > You don't want to have base10 library, because it will by about 100
    > > times slower than what it can be, in multiplication routine. The trick
    > > is to make library base multiple of 2 so you can use left bit shift in
    > > your multiplication function.

    >
    > Binary allows you to use 16x16->32-bit multiplies (or even 32x32->64-bit
    > ones), instead of 3.3x3.3 bits->6.6 bits (as I stored one decimal digit per
    > byte).
    >
    > A decimal-based library isn't that terrible (some things are easier, such as
    > printing results), but my library was only quickly put together to solve a
    > problem, and is not too serious (otherwise I wouldn't have implemented it in
    > an interpreted language...)
    >
    > > I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
    > > boring job babysitting technicians, so occasionally I get into a self
    > > improvement mood, this happens when I feel like I am getting dumber at
    > > work day after day. Anyhow, during these "self improvement moods I
    > > usually code in C or do some other learning project. This is one of
    > > those learning projects. I did this before, but lost code couple years
    > > back during pc upgrade. Last time I wrote a factorial calculator which
    > > used my own bignum library and could calculate 1,000,000 factorial
    > > with no problems.

    >
    > How long did it take? My slow library took 90 minutes just to work out
    > 100,000! (and you can read that exclamation mark either way...)
    >
    > Might be time to upgrade it I think...
    >
    > --
    > Bartc


    About 20 seconds.
     
    Eyegor, Nov 26, 2010
    #11
  12. Eyegor

    Eyegor Guest

    On Nov 25, 7:30 pm, Eyegor <> wrote:
    > On Nov 25, 5:55 pm, "BartC" <> wrote:
    >
    >
    >
    > > "Eyegor" <> wrote in message

    >
    > >news:....

    >
    > > > On Nov 25, 12:20 pm, "BartC" <> wrote:
    > > >> (As it happens, my own bigint library stores numbers in decimal format,
    > > >> allowing a similarly fast conversion, but the entire library is so slow
    > > >> anyway that it's not worth bothering with.)
    > > > You don't want to have base10 library, because it will by about 100
    > > > times slower than what it can be, in multiplication routine. The trick
    > > > is to make library base multiple of 2 so you can use left bit shift in
    > > > your multiplication function.

    >
    > > Binary allows you to use 16x16->32-bit multiplies (or even 32x32->64-bit
    > > ones), instead of 3.3x3.3 bits->6.6 bits (as I stored one decimal digit per
    > > byte).

    >
    > > A decimal-based library isn't that terrible (some things are easier, such as
    > > printing results), but my library was only quickly put together to solve a
    > > problem, and is not too serious (otherwise I wouldn't have implemented it in
    > > an interpreted language...)

    >
    > > > I am a Nuclear Engineer with M.Sc. in Physics, but I have a really
    > > > boring job babysitting technicians, so occasionally I get into a self
    > > > improvement mood, this happens when I feel like I am getting dumber at
    > > > work day after day. Anyhow, during these "self improvement moods I
    > > > usually code in C or do some other learning project. This is one of
    > > > those learning projects. I did this before, but lost code couple years
    > > > back during pc upgrade. Last time I wrote a factorial calculator which
    > > > used my own bignum library and could calculate 1,000,000 factorial
    > > > with no problems.

    >
    > > How long did it take? My slow library took 90 minutes just to work out
    > > 100,000! (and you can read that exclamation mark either way...)

    >
    > > Might be time to upgrade it I think...

    >
    > > --
    > > Bartc

    >
    > About 20 seconds.


    Decimal multiplication is what kills you. Change library to base 2
    (256, 2^16 etc) and use left shifts.
     
    Eyegor, Nov 26, 2010
    #12
  13. Eyegor

    BartC Guest

    "Eyegor" <> wrote in message
    news:...
    > On Nov 25, 7:30 pm, Eyegor <> wrote:
    >> On Nov 25, 5:55 pm, "BartC" <> wrote:


    >> > > Last time I wrote a factorial calculator which
    >> > > used my own bignum library and could calculate 1,000,000 factorial
    >> > > with no problems.

    >>
    >> > How long did it take? My slow library took 90 minutes just to work out
    >> > 100,000! (and you can read that exclamation mark either way...)

    >>
    >> > Might be time to upgrade it I think...


    >> About 20 seconds.


    > Decimal multiplication is what kills you. Change library to base 2
    > (256, 2^16 etc) and use left shifts.


    About a 25000:1 difference, assuming an estimated 500000 seconds for my code
    to calculate 1000000! I don't think using decimals would make that much
    difference (and interpreted code might only account for 10:1 of it). I will
    look tomorrow to see what's going on.

    (BTW you're not running it on one of those Chinese supercomputers are you?)

    --
    Bartc
     
    BartC, Nov 26, 2010
    #13
  14. Eyegor

    Mark Wooding Guest

    "BartC" <> writes:

    > How long did it take? My slow library took 90 minutes just to work out
    > 100,000! (and you can read that exclamation mark either way...)


    Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
    almost 20s to convert the result to decimal. I compute 1000000! (and
    output the result in hex, but that's trivial) in 1m 40s; decimal output
    takes much longer: almost 50 minutes.

    I'm using Karatsuba multiplication for large numbers, but nothing
    fancier than that -- one probably ought to be using FFT-based algorithms
    for these sizes of numbers, but my library is mainly intended for
    cryptography, for which even Karatsuba is of questionable benefit. My
    library is written entirely in (obsessively) portable C.

    Beyond that, my algorithms for factorial computation and base coversion
    are divide-and-conquer affairs, designed to operate on numbers of
    similar sizes. Factorial is computed bottom-up, and uses a similar
    stack trick to the decimal-to-binary converter I outlined earlier;
    binary-to-decimal uses a top-down recursive algorithm I sketched in a
    different thread quite recently.

    (My library is, for some reason, uncommonly fast at factorial
    computations, even though I only implemented them as a useless toy.
    When I compared a few years ago, it was faster than GMP, even though GMP
    has fancier multiply algorithms and chunks of assembler code. I suspect
    that GMP has improved since then. My source code is publicly
    available.)

    -- [mdw]
     
    Mark Wooding, Nov 26, 2010
    #14
  15. Eyegor

    BartC Guest

    "Mark Wooding" <> wrote in message
    news:...
    > "BartC" <> writes:
    >
    >> How long did it take? My slow library took 90 minutes just to work out
    >> 100,000! (and you can read that exclamation mark either way...)

    >
    > Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
    > almost 20s to convert the result to decimal. I compute 1000000! (and
    > output the result in hex, but that's trivial) in 1m 40s; decimal output
    > takes much longer: almost 50 minutes.


    I've got it down to some 23 seconds for 100000! and 3500 seconds for
    1000000!

    This is using the basic algorithms for multiplication and factorials. I
    suppose some fancier algorithms are on the cards, but that 3000 seconds for
    decimal conversion you mentioned sounds a bit worrying; are there no fast
    division techniques?

    --
    Bartc
     
    BartC, Nov 27, 2010
    #15
  16. Eyegor

    Mark Wooding Guest

    "BartC" <> writes:

    > I've got it down to some 23 seconds for 100000! and 3500 seconds for 1000000!


    Nice going!

    > This is using the basic algorithms for multiplication and factorials. I
    > suppose some fancier algorithms are on the cards, but that 3000 seconds for
    > decimal conversion you mentioned sounds a bit worrying; are there no fast
    > division techniques?


    There are; I've not implemented any because they're quite complicated
    and they aren't useful in my target application. The one I'm using is
    basically textbook long division (in base 2^32), as described in Knuth.
    Places to look for better ones include Knuth, and Crandall and Pomerance
    `Primes: A Computational Perspective'.

    Base conversion in particular does repeated divisions by the same values
    R^(2^n); it might be worth caching an approximations to R^(2^-n) and
    multiplying by that instead.

    Good luck.

    -- [mdw]
     
    Mark Wooding, Nov 27, 2010
    #16
  17. Eyegor

    Willem Guest

    Mark Wooding wrote:
    ) "BartC" <> writes:
    )
    )> I've got it down to some 23 seconds for 100000! and 3500 seconds for 1000000!
    )
    ) Nice going!
    )
    )> This is using the basic algorithms for multiplication and factorials. I
    )> suppose some fancier algorithms are on the cards, but that 3000 seconds for
    )> decimal conversion you mentioned sounds a bit worrying; are there no fast
    )> division techniques?
    )
    ) There are; I've not implemented any because they're quite complicated
    ) and they aren't useful in my target application. The one I'm using is
    ) basically textbook long division (in base 2^32), as described in Knuth.
    ) Places to look for better ones include Knuth, and Crandall and Pomerance
    ) `Primes: A Computational Perspective'.
    )
    ) Base conversion in particular does repeated divisions by the same values
    ) R^(2^n); it might be worth caching an approximations to R^(2^-n) and
    ) multiplying by that instead.

    There are other techniques to convert to decimal that are more tailored to
    decimal specifically. These don't need repeated divisions of the whole
    number. One rather simple technique is repeated addition and doubling(*),
    but there are more sophisticated techniques available. Off the cuff, I
    would say that an adaptation of arithmetic encoding would work well.

    *) The doubling is the crux of the algorithm, so you need a fast way
    to double a decimal number representation. The one I write a long
    long time ago used BCD-mode in assembly and an addition loop.


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Nov 27, 2010
    #17
  18. Eyegor

    Eric Sosman Guest

    [OT] Re: Long number base conversion

    On 11/26/2010 9:07 PM, BartC wrote:
    > "Mark Wooding" <> wrote in message
    > news:...
    >> "BartC" <> writes:
    >>
    >>> How long did it take? My slow library took 90 minutes just to work out
    >>> 100,000! (and you can read that exclamation mark either way...)

    >>
    >> Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
    >> almost 20s to convert the result to decimal. I compute 1000000! (and
    >> output the result in hex, but that's trivial) in 1m 40s; decimal output
    >> takes much longer: almost 50 minutes.


    I hope you're getting more than one decimal digit per division.
    For example, dividing by 1000000000 gives you nine decimal digits at
    a whack in a form that fits easily in a 32-bit integer.

    > I've got it down to some 23 seconds for 100000! and 3500 seconds for
    > 1000000!
    >
    > This is using the basic algorithms for multiplication and factorials. I
    > suppose some fancier algorithms are on the cards, but that 3000 seconds
    > for decimal conversion you mentioned sounds a bit worrying; are there no
    > fast division techniques?


    If you know you want to convert the result to decimal (and perhaps
    that you don't want to do much else with it), one thing you can do is
    handle the trailing zeroes specially. As you're plowing along building
    up 1 * 2 * 3 * ... you can remove all the 2's and 5's from each new
    factor and count how many you've removed. Then at the end you'll have

    N! = big_product * (2 ^ twos_count) * (5 ^ fives_count)

    You now know that N! ends with min(twos_count, fives_count) trailing
    zeroes. Since twos_count >= fives_count you don't even need the min
    operation; the number of trailing zeroes is just fives_count, and
    what comes before is the decimal expansion of

    big_product * (2 ^ (twos_count - fives_count))

    Assuming the underlying representation uses "digits" that are powers
    of two, you can compute this with a shift and one multiplication (by
    a power of two less than the digit base). The speed advantage comes
    from the fact that you needn't do divisions to calculate all those
    trailing zeroes, and that you're working with smaller bignumbers
    throughout.

    Any prime P appears in the prime factorization of N!

    N/p + N/p^2 + N/p^3 + ...

    times (using integer division with truncation), so 1000000! is
    divisible by

    1000000/5 + 100000/25 + ... + 1000000/390625 = 249998

    and thus ends with a quarter-million trailing zeroes. Omitting them
    from the accumulation means that the numbers you encounter along the
    way are about 830,000 bits shorter than if you'd multiplied the whole
    thing out naively.

    --
    Eric Sosman
    lid
     
    Eric Sosman, Nov 27, 2010
    #18
  19. Eyegor

    Mark Wooding Guest

    Re: [OT] Re: Long number base conversion

    Eric Sosman <> writes:

    > I hope you're getting more than one decimal digit per division. For
    > example, dividing by 1000000000 gives you nine decimal digits at a
    > whack in a form that fits easily in a 32-bit integer.


    I described my binary-to-decimal algorithm elsewhere; I explained that
    it was a top-down divide-and-conquer algorithm, which should have given
    a hint. Here's a fuller description.

    Suppose the number to convert is N, and we want it in base R.

    * If N = 0 then print `0' and stop. If N is negative, then print `-',
    set N = -N, and continue.

    * For i = 0 upwards, compute R^(2^i) by successive squaring and store
    the results in a list. Stop when you find a number R^M which is
    larger than N; don't bother storing this number. The maximum number
    of entries in this list can be determined statically, and isn't very
    large.

    * Invoke the function convert(N, M/2, false), where convert(n, m, lzp)
    is defined as follows.

    -- (Invariant: 0 <= n < R^(2m).)

    -- If n = 0 then: if lzp is true, print m `0' characters; return.

    -- If m = 1 then print the base-R digit corresponding to n, and
    return. (This is where the recursion bottoms out. You can
    stop earlier, e.g., if n fits in a machine word: you must
    ensure, if lzp is true, that you pad the output to m
    characters with leading `0's.)

    -- Compute q and r such that n = q R^m + r with 0 <= r < R^m
    using standard Euclidean division-with-remainder. (You should
    have R^m already computed. Note that 0 <= q < R^m also,
    maintaining the invariant for the recursive calls.)

    -- If q = 0 and lzp is false, then perform convert(r, m/2,
    false). Otherwise, perform convert(q, m/2, lzp) and
    convert(r, m/2, true).

    As I mentioned, this also has the benefit of producing output in
    approximately the right order. If you watch it work on a large number,
    you can see it pause at various points, most notably halfway through.

    > If you know you want to convert the result to decimal (and perhaps
    > that you don't want to do much else with it), one thing you can do is
    > handle the trailing zeroes specially.


    [...]

    > Any prime P appears in the prime factorization of N!
    >
    > N/p + N/p^2 + N/p^3 + ...
    >
    > times (using integer division with truncation),


    Yes. We can determine an upper bound on this simply by ignoring the
    truncation and computing the limit (since p > 1, this is convergent): we
    get N/(p - 1). Note that this is linear in N, whereas the length of N!
    is bounded below (Ramanujan's approximation) by a multiple of N (log N -
    1). Therefore as N gets larger, this trick becomes slowly less
    effective. It's a shame, because it's a nice analysis.

    -- [mdw]
     
    Mark Wooding, Nov 27, 2010
    #19
  20. Eyegor

    BartC Guest

    Re: [OT] Re: Long number base conversion

    "Eric Sosman" <> wrote in message
    news:icr452$boe$-september.org...
    > On 11/26/2010 9:07 PM, BartC wrote:
    >> "Mark Wooding" <> wrote in message
    >> news:...
    >>> "BartC" <> writes:
    >>>
    >>>> How long did it take? My slow library took 90 minutes just to work out
    >>>> 100,000! (and you can read that exclamation mark either way...)
    >>>
    >>> Ouch. Mine can compute 100000! in about 2 seconds -- but it takes
    >>> almost 20s to convert the result to decimal. I compute 1000000! (and


    >> I've got it down .... 3500 seconds for 1000000!


    (/Update/: down to 2200 seconds by changing the way the result is copied at
    each step.)

    >>> output the result in hex, but that's trivial) in 1m 40s; decimal output
    >>> takes much longer: almost 50 minutes.


    > I hope you're getting more than one decimal digit per division.
    > For example, dividing by 1000000000 gives you nine decimal digits at
    > a whack in a form that fits easily in a 32-bit integer.


    Actually the same technique can be used for the OP's problem of reading
    decimal numbers into a binary big number; digits are read in groups of 9
    using ordinary arithmetic, before adding into the big number.

    (That is, if input numbers are going to be large; usually the output is the
    problem.)

    >> This is using the basic algorithms for multiplication and factorials. I


    > If you know you want to convert the result to decimal (and perhaps
    > that you don't want to do much else with it), one thing you can do is
    > handle the trailing zeroes specially. As you're plowing along building
    > up 1 * 2 * 3 * ... you can remove all the 2's and 5's from each new
    > factor and count how many you've removed. Then at the end you'll have


    You mean that in 11! * 12, the 12 becomes 6, and I add one to the number of
    two's?

    What about *25 which becomes *5, do I apply the rule again to get *1?

    (I've just tried this, and unless I made a mistake (which is quite likely),
    I found it difficult to measure the difference on 100000!, and that's
    without doing the adjustments at the end.)

    --
    Bartc
     
    BartC, Nov 27, 2010
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. George Marsaglia

    Assigning unsigned long to unsigned long long

    George Marsaglia, Jul 8, 2003, in forum: C Programming
    Replies:
    1
    Views:
    747
    Eric Sosman
    Jul 8, 2003
  2. Daniel Rudy

    unsigned long long int to long double

    Daniel Rudy, Sep 19, 2005, in forum: C Programming
    Replies:
    5
    Views:
    1,245
    Peter Shaggy Haywood
    Sep 20, 2005
  3. Mathieu Dutour

    long long and long

    Mathieu Dutour, Jul 17, 2007, in forum: C Programming
    Replies:
    4
    Views:
    516
    santosh
    Jul 24, 2007
  4. Replies:
    4
    Views:
    493
    Zeppe
    Sep 12, 2008
  5. chen li
    Replies:
    6
    Views:
    150
    Giles Bowkett
    Jan 23, 2007
Loading...

Share This Page