Which is a better way to implement this algorithm?

Discussion in 'C++' started by mike3, Apr 20, 2008.

  1. mike3

    mike3 Guest

    Hi.
    (Xposted to both comp.lang.c++ and comp.programming since I've got
    questions related to both C++ language and general programming)

    I've got the following C++ code. The first routine runs in like 65% of
    the time of the second routine. Yet both do the same thing. However,
    the second one seems better in terms of the way the code is written
    since it helps encapsulate the transformation in the inner loop better
    making it easier to read, at least in my opinion. Maybe it's not,
    maybe the former is "better" that way and I should go with it, but if
    the latter is "better" in that respect should I just ditch it anyway
    and tradeoff for performance since I want this thing to be fast???

    What each routine does is multiply two arbitrary-precision integers
    together. The second one though uses an additional "slice" type that
    provides a window enabling the "multiply and add" operation to be
    performed on a limited range of digits, which can then be advanced
    across the number, making that part of the algorithm clear.

    I'm using the simple "grade school" multiply algorithm. Note how the
    second routine more easily outlines this algorithm, while in the first
    it is a little more difficult to see. Which would you prefer, exactly?

    For brevity, class definitions and other members have been omitted.
    However it should not be too difficult to figure out the necessary
    information.
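
    Roughly, the routines below assume something like the following
    (an approximate sketch; the real class has more members, and the
    member bodies here are just my shorthand):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    typedef std::uint32_t Digit32;    /* one digit, radix 2^32 */
    typedef std::uint64_t TwoDigit64; /* wide enough for a digit product */
    const unsigned DIGIT_BITS = 32;
    const Digit32 MAX_DIGIT = 0xFFFFFFFFu;

    class RawDigit
    {
    public:
       std::size_t GetLength() const { return digits.size(); }
       void Zeroize() { std::fill(digits.begin(), digits.end(), 0); }
       void Mul(const RawDigit &a, const RawDigit &b);
       void MulInPlace(const RawDigit &b);
    private:
       std::vector<Digit32> digits; /* least significant digit first */
    };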

    Also, could I lose the typecasts in this code or do I need them?

    The reason why I'm asking is because I remembered getting told earlier
    by someone here (Alf P. Steinbach, group: comp.lang.c++) about how my
    last implementation of my program (this is for a fractal generator)
    had some sort of bad, bug-inducing "abstraction gap" yet I was unable
    to get enough elaboration in the responses about it, so I've been more or less
    shooting around trying to figure out how to make it better (although
    he did give me some *examples* of where there were problems, which I
    have since fixed). But it seems I'm losing performance, and that's not
    a good thing for my application. Not to mention also that my original
    bignum implementation was called "silly" as well ("...although I didn't
    look at the silly bignum class, whatever it's fault is..." ref:
    http://groups.google.com/group/comp.lang.c++/msg/ac8df038d0b8dab9?dmode=source)
    without any explanation as to what exactly was so silly about it. So I
    had to start dark-shooting there too trying to figure out what I had
    done wrong.

    First version (fastest):
    ---
    void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
    {
       /* Check to see if we're doing an in-place multiply */
       if(&a == this)
       {
          MulInPlace(b);
          return;
       } else if(&b == this) {
          MulInPlace(a);
          return;
       }

       /* Get lengths. */
       std::size_t rLength(GetLength());
       std::size_t aLength(a.GetLength());
       std::size_t bLength(b.GetLength());

       /* Zeroize this. */
       Zeroize();

       /* Do the multiplication. */
       TwoDigit64 carry(0);
       for(std::size_t i(0); i<aLength && i<rLength; i++)
       {
          carry = 0;

          for(std::size_t j(0); j<bLength && (i+j)<rLength; j++)
          {
             TwoDigit64 tmp((static_cast<TwoDigit64>(a.digits[i])*b.digits[j]) +
                            digits[i+j] + carry);
             carry = tmp >> DIGIT_BITS;
             tmp -= carry << DIGIT_BITS;
             digits[i+j] = tmp;
          }

          if(i+bLength < rLength)
             digits[i+bLength] = carry;
       }
    }
    ---

    Second version (slower):
    ---
    *** Slice manipulation.
    inline Digit32 MulAddOp(Digit32 a, Digit32 b, Digit32 c, Digit32 &carry)
    {
       TwoDigit64 sum(a + (static_cast<TwoDigit64>(b)*c) + carry);
       carry = sum >> DIGIT_BITS;
       return(sum & MAX_DIGIT);
    }

    inline Digit32 MulAddCarryPropOpL(Digit32 a, Digit32 &carry)
    {
       Digit32 sum(a + carry);
       carry = sum < carry; /* the sum wrapped around iff it is below the carry */
       return(sum);
    }

    inline Digit32 MulAddCarryPropOpR(Digit32 b, Digit32 c, Digit32 &carry)
    {
       TwoDigit64 sum((static_cast<TwoDigit64>(b)*c) + carry);
       carry = sum >> DIGIT_BITS;
       return(sum & MAX_DIGIT);
    }

    Digit32 RawDigitSlice::MulAdd(const ConstRawDigitSlice &a,
                                  const Digit32 b,
                                  const ConstRawDigitSlice &c)
    {
       /* Set up iterators */
       DigitIterator32 ri(GetStartIterator());
       ConstDigitIterator32 ai(a.GetConstStartIterator());
       ConstDigitIterator32 ci(c.GetConstStartIterator());

       /* Get minimum length of a and c. */
       std::size_t minLength(std::min(std::min(a.length, c.length), length));

       /* Do the combined multiply + add */
       Digit32 carry(0);
       std::size_t i(0);
       for(; i<minLength; ++i,++ri,++ai,++ci)
          *ri = MulAddOp(*ai, b, *ci, carry);

       /* Handle the remaining part(s) of a or c. */
       std::size_t largerLength(std::min(std::max(a.length, c.length), length));
       if(a.length >= c.length)
       {
          for(; i<largerLength; ++i,++ri,++ai)
             *ri = MulAddCarryPropOpL(*ai, carry);
       } else {
          for(; i<largerLength; ++i,++ri,++ci)
             *ri = MulAddCarryPropOpR(b, *ci, carry);
       }

       /* Finish carry propagation */
       if(largerLength < length)
       {
          for(; i<length; ++i,++ri)
             *ri = MulAddCarryPropOpL(0, carry);
       }

       /* Done! */
       return(carry);
    }

    *** This next routine is in a different source file, the one
    implementing the full "RawDigit" class.
    *** The former would be in another file that implements the
    "RawDigitSlice" class.
    void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
    {
       /* Check to see if we're doing an in-place multiply */
       if(&a == this)
       {
          MulInPlace(b);
          return;
       } else if(&b == this) {
          MulInPlace(a);
          return;
       }

       /* Get lengths. */
       std::size_t rLength(GetLength());
       std::size_t aLength(a.GetLength());
       std::size_t bLength(b.GetLength());

       /* Zeroize this. */
       Zeroize();

       /* Set up slices. */
       RawDigitSlice runningSum(*this, 0, aLength + 1); /* explanation:
       (<RawDigit object>, <origin digit idx>, <nbr of digits in slice>) */
       ConstRawDigitSlice as(a);

       /* Do the multiplication: one MulAdd per digit of b, advancing
          the running-sum window one digit each time. */
       for(std::size_t i(0); i<bLength && i<rLength; ++i,++runningSum)
          runningSum.MulAdd(runningSum, b.digits[i], as);
    }
    ---
     
    mike3, Apr 20, 2008
    #1

  2. "mike3" <> wrote in message
    news:...
    : Hi.
    : (Xposted to both comp.lang.c++ and comp.programming since I've got
    : questions related to both C++ language and general programming)
    :
    : I've got the following C++ code. The first routine runs in like 65% of
    : the time of the second routine. Yet both do the same thing. However,
    : the second one seems better in terms of the way the code is written
    : since it helps encapsulate the transformation in the inner loop better
    : making it easier to read, at least in my opinion. Maybe it's not,
    : maybe the former is "better" that way and I should go with it, but if
    : the latter is "better" in that respect, should I just ditch it anyway
    : and trade off for performance, since I want this thing to be fast?
    As a first-time reader of both implementations, I find the first one
    much easier to understand and maintain than the second one. Adding
    levels of abstraction without a clear benefit only obfuscates code.

    : I'm using the simple "grade school" multiply algorithm. Note how the
    : second routine more easily outlines this algorithm, while in the first
    : it is a little more difficult to see. Which would you prefer, exactly?
    The first one.

    : Also, could I lose the typecasts in this code or do I need them?
    I'll comment & review the first function below.
    At some point, you do need to (at least implicitly) cast
    an operand to the larger type, as this might not happen
    automatically for some combinations of types, in C++.
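
    For example, with 32-bit digits:

      std::uint32_t a = 0xFFFFFFFFu, b = 0xFFFFFFFFu;
      std::uint64_t bad  = a * b;  // 32x32 multiply; truncated to 32 bits
                                   // before the widening assignment
      std::uint64_t good = static_cast<std::uint64_t>(a) * b;  // full product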

    : First version (fastest):
    : ---
    : void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
    : {
    : /* Check to see if we're doing an in-place multiply */
    : if(&a == this)
    : {
    : MulInPlace(b);
    : return;
    : } else if(&b == this) {
    : MulInPlace(a);
    : return;
    : }
    General class design: unless there is an established reason
    not to do so, I would use operator overloading and implement
    (for the Num class):
      friend  Num   operator * ( const Num& a, const Num& b );
              Num&  operator *=( const Num& b );
    Maybe RawDigit::Mul is an internal private member, and the
    above operations are provided? But then the special cases
    (e.g. multiply in place) would best be handled in the
    layers above...
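
    For instance, a thin wrapper along these lines (just a sketch; how
    Num stores its digits and implements MulInPlace is up to you):

      class Num
      {
      public:
         Num& operator *=( const Num& b )
         {
            MulInPlace(b);   // low-level in-place multiply
            return *this;
         }
         friend Num operator *( Num a, const Num& b )
         {
            return a *= b;   // the by-value parameter is the only copy
         }
      private:
         void MulInPlace( const Num& b ); // defined on the digit vector
      };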

    : /* Get lengths. */
    : std::size_t rLength(GetLength());
    : std::size_t aLength(a.GetLength());
    : std::size_t bLength(b.GetLength());
    :
    : /* Zeroize this. */
    : Zeroize();

    Is the intent truly to implement modulo math, and to allow
    the result to be truncated? If not the result should be
    resized automatically, and the code is simplified.

    : /* Do the multiplication. */
    : TwoDigit64 carry(0);
    Technically, the carry should only require 1 digit,
    and can be defined within the outer loop below.

    : for(std::size_t i(0);i<aLength && i<rLength;i++)
    : {
    : carry = 0;
    :
    : for(std::size_t j(0);j<bLength && (i+j)<rLength;j++)
    : {
    : TwoDigit64
    : tmp((static_cast<TwoDigit64>(a.digits[i])*b.digits[j]) +
    : digits[i+j] + carry);
    : carry = tmp >> DIGIT_BITS;
    : tmp -= carry << DIGIT_BITS;
    : digits[i+j] = tmp;
    : }
    :
    : if(i+bLength < rLength)
    : digits[i+bLength] = carry;
    : }
    : }

    Let's say that you have the following digit-related
    definitions within your class:
    typedef ... Digit;
    typedef ... TwoDigit;
    const unsigned digitBitCount = ...;
    const Digit digitBitMask = 0xF...UL;

    Assuming no truncation (because of adequate pre-allocation
    of result digits), the same algorithm can be written as:

    for( unsigned aPos = 0 ; aPos<aLength ; ++aPos )
    {
       Digit carry = 0;
       TwoDigit const aDig = a.digits[aPos]; //NB: cast "hidden" here
       for( unsigned bPos = 0 ; bPos<bLength ; ++bPos )
       {
           TwoDigit mul = aDig * b.digits[bPos]
                        + this->digits[aPos+bPos]
                        + carry;

           this->digits[aPos+bPos] =   mul &  digitBitMask;
           carry                   = ( mul >> digitBitCount );
       }
       this->digits[aPos+bLength] = carry;
    }

    There are many correct ways to write this, and some are
    probably better in some ways than this example. But I
    hope that you will find it useful.
    In any case, given the very acceptable complexity of this loop,
    I would not bother breaking it up or adding layers of
    encapsulation.

    Regards -Ivan
    --
    http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
    Brainbench MVP for C++ <> http://www.brainbench.com
     
    Ivan Vecerina, Apr 20, 2008
    #2

  3. mike3

    Guest

    On 20 Apr., 08:26, mike3 <> wrote:
    > Hi.
    > (Xposted to both comp.lang.c++ and comp.programming since I've got
    > questions related to both C++ language and general programming)
    >
    > I've got the following C++ code. The first routine runs in like 65% of
    > the time of the second routine. Yet both do the same thing. However,
    > the second one seems better in terms of the way the code is written
    > since it helps encapsulate the transformation in the inner loop better
    > making it easier to read, at least in my opinion. Maybe it's not,
    > maybe the former is "better" that way and I should go with it, but if
    > the latter is "better" in that respect, should I just ditch it anyway
    > and trade off for performance, since I want this thing to be fast?
    >
    > What each routine does is multiply two arbitrary-precision integers
    > together. The second one though uses an additional "slice" type that
    > provides a window enabling the "multiply and add" operation to be
    > performed on a limited range of digits, which can then be advanced
    > across the number, making that part of the algorithm clear.
    >
    > I'm using the simple "grade school" multiply algorithm. Note how the
    > second routine more easily outlines this algorithm, while in the first
    > it is a little more difficult to see. Which would you prefer, exactly?
    >
    > For brevity, class definitions and other members have been omitted.
    > However it should not be too difficult to figure out the necessary
    > information.
    >
    > Also, could I lose the typecasts in this code or do I need them?
    >
    > The reason why I'm asking is because I remembered getting told earlier
    > by someone here (Alf P. Steinbach, group: comp.lang.c++) about how my
    > last implementation of my program (this is for a fractal generator)
    > had some sort of bad, bug-inducing "abstraction gap" yet I was unable
    > to get enough elaboration in the responses about it, so I've been more or less
    > shooting around trying to figure out how to make it better (although
    > he did give me some *examples* of where there were problems, which I
    > have since fixed). But it seems I'm losing performance, and that's not
    > a good thing for my application. Not to mention also that my original
    > bignum implementation was called "silly" as well ("...although I didn't
    > look at the silly bignum class, whatever it's fault is..." ref: http://groups.google.com/group/comp.lang.c++/msg/ac8df038d0b8dab9?dmo...)
    > without any explanation as to what exactly was so silly about it. So I
    > had to start dark-shooting there too trying to figure out what I had
    > done wrong.
    >
    > First version (fastest):
    > ---
    > void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
    > {
    > /* Check to see if we're doing an in-place multiply */
    > if(&a == this)
    > {
    > MulInPlace(b);
    > return;
    > } else if(&b == this) {
    > MulInPlace(a);
    > return;
    > }
    >
    > /* Get lengths. */
    > std::size_t rLength(GetLength());
    > std::size_t aLength(a.GetLength());
    > std::size_t bLength(b.GetLength());
    >
    > /* Zeroize this. */
    > Zeroize();
    >
    > /* Do the multiplication. */
    > TwoDigit64 carry(0);
    > for(std::size_t i(0);i<aLength && i<rLength;i++)
    > {
    > carry = 0;
    >
    > for(std::size_t j(0);j<bLength && (i+j)<rLength;j++)
    > {
    > TwoDigit64
    > tmp((static_cast<TwoDigit64>(a.digits[i])*b.digits[j]) +
    > digits[i+j] + carry);
    > carry = tmp >> DIGIT_BITS;
    > tmp -= carry << DIGIT_BITS;
    > digits[i+j] = tmp;
    > }
    >
    > if(i+bLength < rLength)
    > digits[i+bLength] = carry;
    > }}
    >

    [snip Second version (slower)]

    I consider the first version easier to read. I prefer to see the
    main algorithm on one page instead of in "millions of small
    methods". What I am missing is:
    - The signs. All your values seem to be unsigned.
      Do you want to use two's complement representation or
      sign + magnitude?
    - The management of the size. The user of the functions
      should not be bothered with resizing the big integers.

    Did you know that big integer libraries already exist? Some
    people have taken on the burden of writing a library for big
    integer functions. For example: me.
    If you download Seed7
    (http://sourceforge.net/project/showfiles.php?group_id=151126)
    you will see that the file seed7/src/big_rtl.c contains a bigInteger
    library written in C. This library manages the memory for the
    digits automatically and contains the usual arithmetic operations
    (including two forms of bigInteger division and remainder which
    truncate towards zero or minus infinity). The multiplication uses
    the Karatsuba algorithm when possible.
    BTW: I plan to do an improved release today (by coincidence I
    did several improvements for bigIntegers). If you wait approx.
    12 hours you can get the new version.

    Greetings Thomas Mertes

    Seed7 Homepage: http://seed7.sourceforge.net
    Seed7 - The extensible programming language: User defined statements
    and operators, abstract data types, templates without special
    syntax, OO with interfaces and multiple dispatch, statically typed,
    interpreted or compiled, portable, runs under linux/unix/windows.
     
    , Apr 20, 2008
    #3
  4. mike3

    mike3 Guest

    On Apr 20, 1:43 am, "Ivan Vecerina"
    <> wrote:
    > "mike3" <> wrote in message
    >
    > news:...
    > : Hi.
    > : (Xposted to both comp.lang.c++ and comp.programming since I've got
    > : questions related to both C++ language and general programming)
    > :
    > : I've got the following C++ code. The first routine runs in like 65% of
    > : the time of the second routine. Yet both do the same thing. However,
    > : the second one seems better in terms of the way the code is written
    > : since it helps encapsulate the transformation in the inner loop better
    > : making it easier to read, at least in my opinion. Maybe it's not,
    > : maybe the former is "better" that way and I should go with it, but if
    > : the latter is "better" in that respect, should I just ditch it anyway
    > : and trade off for performance, since I want this thing to be fast?
    > As a first-time reader of both implementations, I find the first one
    > much easier to understand and maintain than the second one.  Adding
    > levels of abstraction without a clear benefit only obfuscates code.
    >


    OK, then I'll go for the first. Especially considering it's faster...

    > : I'm using the simple "grade school" multiply algorithm. Note how the
    > : second routine more easily outlines this algorithm, while in the first
    > : it is a little more difficult to see. Which would you prefer, exactly?
    > The first one.
    >


    Hmm.

    > : Also, could I lose the typecasts in this code or do I need them?
    > I'll comment & review the first function below.
    > At some point, you do need to (at least implicitly) cast
    > an operand to the larger type, as this might not happen
    > automatically for some combinations of types, in C++.
    >


    So then in this case a cast is OK, right?

    > : First version (fastest):
    > : ---
    > : void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
    > : {
    > :    /* Check to see if we're doing an in-place multiply */
    > :    if(&a == this)
    > :    {
    > :      MulInPlace(b);
    > :      return;
    > :    } else if(&b == this) {
    > :      MulInPlace(a);
    > :      return;
    > :    }
    > General class design: unless there is an established reason
    > not to do so, I would use operator overloading and implement
    > (for the Num class):
    >   friend  Num   operator * ( const Num& a, const Num& b );
    >           Num&  operator *=( const Num& b );
    > Maybe RawDigit::Mul is an internal private member, and the
    > above operations are provided?  But then the special cases
    > (e.g. multiply in place) would best be handled in the
    > layers above...
    >


    I've tried overloaded operators, but one seems to have to construct
    a temporary copy to hold the result to return in the case of the binary
    operators. This is a big problem for my app, as I have performance
    concerns. Deallocating and reallocating memory billions of times
    is a serious waste of performance. (These routines will be called
    billions of times.)

    Therefore I do the opposite and build these low-level routines first,
    then build the overloaded operators using them, which can then be
    used in all non-performance-critical areas of the program. If it turns
    out that I don't need the MulInPlace() functionality to be integrated
    with Mul(), I'll just remove it and document that Mul() does not
    work in-place. Is that acceptable practice?
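
    To illustrate the layering I mean (a sketch; the sizing constructor
    here is hypothetical, the point is where the allocation happens):

    /* Convenience operator for non-critical code: one temporary. */
    RawDigit operator*(const RawDigit &a, const RawDigit &b)
    {
       RawDigit r(a.GetLength() + b.GetLength());
       r.Mul(a, b);
       return r;
    }

    /* Hot path: the caller allocates the result once and reuses it,
       so nothing gets allocated inside the loop. */
    void MulMany(RawDigit &r, const RawDigit &a, const RawDigit &b,
                 long iterations)
    {
       for(long n = 0; n < iterations; ++n)
          r.Mul(a, b);
    }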

    > :    /* Get lengths. */
    > :    std::size_t rLength(GetLength());
    > :    std::size_t aLength(a.GetLength());
    > :    std::size_t bLength(b.GetLength());
    > :
    > :    /* Zeroize this. */
    > :    Zeroize();
    >
    > Is the intent truly to implement modulo math, and to allow
    > the result to be truncated? If not the result should be
    > resized automatically, and the code is simplified.
    >


    So I can omit this functionality and just document that the
    routine doesn't behave as most would expect? (If I have operands
    of 3 differing lengths, I'd usually expect a Mul() to respect
    those lengths, but then again your opinion may differ.)

    Also, the above does nothing but clear the result buffer,
    which you need since we're going to add (as in arithmetical addition)
    numbers into it. You need that for the multiply algorithm to work.

    And you need to get the lengths of the input operands because
    you're going to loop over the digits, no? And you don't want to
    read outside the buffer or read too few digits? I think you would,
    but then again maybe I'm wrong...

    > :    /* Do the multiplication. */
    > :    TwoDigit64 carry(0);
    > Technically, the carry should only require 1 digit,
    > and can be defined within the outer loop below.
    >


    But I thought then that would create a performance-wasting conversion
    to/from 32/64 again and again.

    > :    for(std::size_t i(0);i<aLength && i<rLength;i++)
    > :    {
    > :       carry = 0;
    > :
    > :       for(std::size_t j(0);j<bLength && (i+j)<rLength;j++)
    > :       {
    > :          TwoDigit64
    > : tmp((static_cast<TwoDigit64>(a.digits[i])*b.digits[j]) +
    > :                         digits[i+j] + carry);
    > :          carry = tmp >> DIGIT_BITS;
    > :          tmp -= carry << DIGIT_BITS;
    > :          digits[i+j] = tmp;
    > :       }
    > :
    > :       if(i+bLength < rLength)
    > :         digits[i+bLength] = carry;
    > :    }
    > : }
    >
    > Let's say that you have the following digit-related
    > definitions within your class:
    >   typedef ...  Digit;
    >   typedef ...  TwoDigit;
    >   const unsigned digitBitCount = ...;
    >   const Digit digitBitMask = 0xF...UL;
    >
    > Assuming no truncation (because of adequate pre-allocation
    > of result digits), the same algorithm can be written as:
    >
    >  for( unsigned aPos = 0 ; aPos<aLength ; ++aPos )
    >  {
    >     Digit carry = 0;
    >     TwoDigit const aDig = a.digits[aPos]; //NB: cast "hidden" here
    >     for( unsigned bPos = 0 ; bPos<bLength ; ++bPos )
    >     {
    >         TwoDigit mul = aDig * b.digits[bPos]
    >                      + this->digits[aPos+bPos]
    >                      + carry;
    >
    >         this->digits[aPos+bPos] =   mul &  digitBitMask;
    >         carry                   = ( mul >> digitBitCount );
    >     }
    >     this->digits[aPos+bLength] = carry;
    >  }
    >
    > There are many correct ways to write this, and some are
    > probably better in some ways than this example.  But I
    > hope that you will find it useful.
    > In any case, given the very acceptable complexity of this loop,
    > I would not bother breaking it up or adding layers of
    > encapsulation.
    >


    OK, then.
     
    mike3, Apr 20, 2008
    #4
  5. mike3

    mike3 Guest

    On Apr 20, 2:22 am, wrote:
    > On 20 Apr., 08:26, mike3 <> wrote:

    <snip>
    > > First version (fastest):
    > > ---
    > > void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
    > > {
    > >     /* Check to see if we're doing an in-place multiply */
    > >     if(&a == this)
    > >     {
    > >       MulInPlace(b);
    > >       return;
    > >     } else if(&b == this) {
    > >       MulInPlace(a);
    > >       return;
    > >     }

    >
    > >     /* Get lengths. */
    > >     std::size_t rLength(GetLength());
    > >     std::size_t aLength(a.GetLength());
    > >     std::size_t bLength(b.GetLength());

    >
    > >     /* Zeroize this. */
    > >     Zeroize();

    >
    > >     /* Do the multiplication. */
    > >     TwoDigit64 carry(0);
    > >     for(std::size_t i(0);i<aLength && i<rLength;i++)
    > >     {
    > >        carry = 0;

    >
    > >        for(std::size_t j(0);j<bLength && (i+j)<rLength;j++)
    > >        {
    > >           TwoDigit64
    > > tmp((static_cast<TwoDigit64>(a.digits[i])*b.digits[j]) +
    > >                          digits[i+j] + carry);
    > >           carry = tmp >> DIGIT_BITS;
    > >           tmp -= carry << DIGIT_BITS;
    > >           digits[i+j] = tmp;
    > >        }

    >
    > >        if(i+bLength < rLength)
    > >          digits[i+bLength] = carry;
    > >     }}

    >
    > [snip Second version (slower)]
    >
    > I consider the first version easier to read. I prefer to see the
    > main algorithm on one page instead of in "millions of small
    > methods". What I am missing is:
    > - The signs. All your values seem to be unsigned.
    >   Do you want to use two's complement representation or
    >   sign + magnitude?


    Digits are usually not signed; this is not a balanced-radix
    representation. This is simple radix-2^32 arithmetic.

    This is also a raw unsigned number type that just allows
    basic digit manipulations. I'm going to use it to build a
    floating-point type on top.

    > - The management of the size. The user of the functions
    >   should not be bothered with resizing the big integers.
    >


    See above. This is a raw base type. In the full application
    I do not need automatic resizes (since with floating point one
    usually truncates or rounds the result of a multiplication
    anyway), plus they would hinder performance, and I need
    lots of the latter.

    > Did you know that big integer libraries already exist? Some
    > people have taken on the burden of writing a library for big
    > integer functions. For example: me.


    Actually, I've looked at other packages when trying to figure out
    how to come up with mine.

    The reason for producing my own package was primarily licensing
    issues, since I do not know how I'm going to license the software
    when I decide to release it for distribution. If that were not a
    concern I would have just used someone else's package. I could
    always switch to a different package in the future.

    Plus it's fun to program it and I can learn more about programming
    this way.

    > If you download Seed7
    > (http://sourceforge.net/project/showfiles.php?group_id=151126)
    > you will see that the file seed7/src/big_rtl.c contains a bigInteger
    > library written in C. This library manages the memory for the
    > digits automatically and contains the usual arithmetic operations
    > (including two forms of bigInteger division and remainder which
    > truncate towards zero or minus infinity). The multiplication uses
    > the Karatsuba algorithm when possible.


    I'm curious: is that Karatsuba D&C (divide & conquer) method
    faster even on smaller number sizes like 512 bits or so? Because
    if so, I'm thinking of implementing it in my program as well.

    Also, I'm writing this with C++, not C, which means that looking
    at a C package may be less than helpful.

    > BTW: I plan to do an improved release today (by coincidence I
    > did several improvements for bigIntegers). If you wait approx.
    > 12 hours you can get the new version.
    >
     
    mike3, Apr 20, 2008
    #5
  6. mike3

    mike3 Guest

    On Apr 20, 1:43 am, "Ivan Vecerina"
    <> wrote:
    > "mike3" <> wrote in message

    <snip>
    > Assuming no truncation (because of adequate pre-allocation
    > of result digits), the same algorithm can be written as:
    >
    >  for( unsigned aPos = 0 ; aPos<aLength ; ++aPos )
    >  {
    >     Digit carry = 0;
    >     TwoDigit const aDig = a.digits[aPos]; //NB: cast "hidden" here
    >     for( unsigned bPos = 0 ; bPos<bLength ; ++bPos )
    >     {
    >         TwoDigit mul = aDig * b.digits[bPos]
    >                      + this->digits[aPos+bPos]
    >                      + carry;
    >
    >         this->digits[aPos+bPos] =   mul &  digitBitMask;
    >         carry                   = ( mul >> digitBitCount );
    >     }
    >     this->digits[aPos+bLength] = carry;
    >  }
    >
    > There are many correct ways to write this, and some are
    > probably better in some ways than this example.  But I
    > hope that you will find it useful.
    > In any case, given the very acceptable complexity of this loop,
    > I would not bother breaking it up or adding layers of
    > encapsulation.
    >


    Unfortunately, I noticed this last implementation where the
    cast is "hidden" seems to lose performance. My benchmark
    returned 17 seconds for this vs 12 for a similar routine where the
    only difference is the cast is not "hidden" by assigning the digit
    to a TwoDigit64.

    I'm wondering if maybe this is because the compiler has done
    a 32x32 multiply when using the one-liner with internal cast, and
    returns 64-bit result, but when you try to "hide" the cast it tries
    to do a 64x32 or even 64x64 multiply since it sees one 64-bit
    operand coming in, which wastes time. I guess it depends on the
    optimizer. In the one with the cast the optimizer can "see" that
    both operands are 32-bit, so it could "know" to only do 32x32->64
    instead of 64x32->64 or 64x64->64 (mod mul by 2^64).

    Is this a good theory? Should I go with the casts?
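
    In code, the two forms I'm comparing boil down to this (reconstructed
    for illustration; these wrappers aren't my actual routines):

    #include <cstdint>

    typedef std::uint32_t Digit32;
    typedef std::uint64_t TwoDigit64;

    /* Explicit cast at the multiply (~12 s): both operands are visibly
       32-bit, so the compiler can emit a single 32x32->64 multiply. */
    inline TwoDigit64 MulExplicit(Digit32 a, Digit32 b,
                                  Digit32 add, TwoDigit64 carry)
    {
       return static_cast<TwoDigit64>(a)*b + add + carry;
    }

    /* "Hidden" cast (~17 s): one operand is widened up front, so the
       compiler may schedule a full 64-bit multiply in the loop. */
    inline TwoDigit64 MulHidden(TwoDigit64 aWide, Digit32 b,
                                Digit32 add, TwoDigit64 carry)
    {
       return aWide*b + add + carry;
    }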
     
    mike3, Apr 20, 2008
    #6
  7. "mike3" <> wrote in message
    news:...
    >Unfortunately, I noticed this last implementation where the
    >cast is "hidden" seems to lose performance. My benchmark
    >returned 17 seconds for this vs 12 for a similar routine where the
    >only difference is the cast is not "hidden" by assigning the digit
    >to a TwoDigit64.
    >
    >I'm wondering if maybe this is because the compiler has done
    >a 32x32 multiply when using the one-liner with internal cast, and
    >returns 64-bit result, but when you try to "hide" the cast it tries
    >to do a 64x32 or even 64x64 multiply since it sees one 64-bit
    >operand coming in, which wastes time. I guess it depends on the
    >optimizer. In the one with the cast the optimizer can "see" that
    >both operands are 32-bit, so it could "know" to only do 32x32->64
    >instead of 64x32->64 or 64x64->64 (mod mul by 2^64).
    >
    >Is this a good theory? Should I go with the casts?


    These things are optimizer- and platform-dependent, but
    yes, a "widened" multiplication is the likely culprit.
    If you care about such low-level optimizations, I would
    recommend learning to look at the assembly code (it is not
    that difficult to get to a level where you can look and
    see what your compiler does / where overheads come from).

    Although in general it is always advisable to first look
    for better algorithms (as these micro-optimizations can
    be platform-specific, and become redundant or even
    counter-productive in the future). For instance, a 64-bit
    platform might be able to store the 64-bit word into
    a local register, and execute the code I posted faster.


    --
    http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
    Brainbench MVP for C++ <> http://www.brainbench.com
     
    Ivan Vecerina, Apr 20, 2008
    #7
  8. mike3

    mike3 Guest

    On Apr 20, 2:39 pm, "Ivan Vecerina"
    <> wrote:
    > "mike3" <> wrote in message
    >
    > news:...
    >
    >
    >
    >
    >
    > >Unfortunately, I noticed this last implementation where the
    > >cast is "hidden" seems to lose performance. My benchmark
    > >returned 17 seconds for this vs 12 for a similar routine where the
    > >only difference is the cast is not "hidden" by assigning the digit
    > >to a TwoDigit64.

    >
    > >I'm wondering if maybe this is because the compiler has done
    > >a 32x32 multiply when using the one-liner with internal cast, and
    > >returns 64-bit result, but when you try to "hide" the cast it tries
    > >to do a 64x32 or even 64x64 multiply since it sees one 64-bit
    > >operand coming in, which wastes time. I guess it depends on the
    > >optimizer. In the one with the cast the optimizer can "see" that
    > >both operands are 32-bit, so it could "know" to only do 32x32->64
    > >instead of 64x32->64 or 64x64->64 (mod mul by 2^64).

    >
    > >Is this a good theory? Should I go with the casts?

    >
    > These things are optimizer- and platform-dependent, but
    > yes, a "widened" multiplication is the likely culprit.
    > If you care about such low-level optimizations, I would
    > recommend learning to look at the assembly code (it is not
    > that difficult to get to a level where you can look and
    > see what your compiler does / where overheads come from).
    >
    > Although in general it is always advisable to first look
    > for better algorithms (as these micro-optimizations can
    > be platform-specific, and become redundant or even
    > counter-productive in the future). For instance, a 64-bit
    > platform might be able to store the 64-bit word into
    > a local register, and execute the code I posted faster.
    >


    So what would be the best course of action in this case?
     
    mike3, Apr 20, 2008
    #8
  9. mike3

    mike3 Guest

    Questioning the design of the bignum package (was Re: Which is a better way to implement this algorithm?)

    On Apr 20, 12:26 am, mike3 <> wrote:
    <snip>

    Anyway this discussion made me wonder whether or not the design of the
    package overall is so good. You can get a full copy of the parts made
    so far here:

    http://www.mediafire.com/?z2j9t2s9mv9

    A batch file is included to compile it with Microsoft's C++ compiler
    under Windows. Currently it only implements the "RawDigit" unsigned
    integer type, but is made to allow the building of a floating point
    package on top (which is why you'll find things like "AddRsh" and
    "SubRsh", which are necessary for floating point arithmetic.).

    Is the current design I have there good, or poor? How does it compare
    to my old implementation, which you can find here (see "mp" directory
    in unzipped file):

    http://www.mediafire.com/?cfmzd1y3yij

    I'd like some discussion on this as I'm really just shooting around
    considering the lack of specifics given in Alf's posts.
     
    mike3, Apr 21, 2008
    #9
  10. "mike3" <> wrote in message
    news:...
    On Apr 20, 2:39 pm, "Ivan Vecerina"
    <> wrote:
    :> Although in general it is always advisable to first look
    :> for better algorithms (as these micro-optimizations can
    :> be platform-specific, and become redundant or even
    :> counter-productive in the future). For instance, a 64-bit
    :> platform might be able to store the 64-bit word into
    :> a local register, and execute the code I posted faster.
    :>
    :
    :So what would be the best course of action in this case?

    If squeezing out more performance matters, study the literature and
    find a faster algorithm. From a vague memory of reading
    Knuth's TAOCP (was 20 years ago for me, as a teenager),
    I vaguely remember that there is a faster algorithm for
    multiplying multi-word integers, whose complexity is
    O( n^log2(3) ) = O( n^1.58 ) rather than O(n^2) as you
    do now. If your integers are large enough, this could
    be considered... ( quick google search -> for example
    http://en.wikipedia.org/wiki/Karatsuba_algorithm ).
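
    Just to sketch the idea (an illustration written from the definition,
    not production code: little-endian digit vectors in radix 2^32, with
    the recursion threshold left as a tuning parameter):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    typedef std::uint32_t Digit;
    typedef std::uint64_t TwoDigit;
    typedef std::vector<Digit> Num; // least significant digit first

    static Num SchoolMul(const Num &a, const Num &b)
    {
       Num r(a.size() + b.size(), 0);
       for(std::size_t i = 0; i < a.size(); ++i)
       {
          TwoDigit carry = 0;
          for(std::size_t j = 0; j < b.size(); ++j)
          {
             TwoDigit t = static_cast<TwoDigit>(a[i])*b[j] + r[i+j] + carry;
             r[i+j] = static_cast<Digit>(t);
             carry = t >> 32;
          }
          r[i + b.size()] = static_cast<Digit>(carry);
       }
       return r;
    }

    static Num Add(const Num &a, const Num &b)
    {
       Num r(std::max(a.size(), b.size()) + 1, 0);
       TwoDigit carry = 0;
       for(std::size_t i = 0; i < r.size(); ++i)
       {
          TwoDigit t = carry;
          if(i < a.size()) t += a[i];
          if(i < b.size()) t += b[i];
          r[i] = static_cast<Digit>(t);
          carry = t >> 32;
       }
       return r;
    }

    static void SubInPlace(Num &r, const Num &a) // r -= a; assumes r >= a
    {
       std::int64_t borrow = 0;
       for(std::size_t i = 0; i < r.size(); ++i)
       {
          std::int64_t t = static_cast<std::int64_t>(r[i]) - borrow
                         - (i < a.size() ? a[i] : 0);
          borrow = t < 0;
          r[i] = static_cast<Digit>(t);
       }
    }

    static void AddShifted(Num &r, const Num &a, std::size_t shift)
    {  // r += a * 2^(32*shift)
       TwoDigit carry = 0;
       for(std::size_t i = 0; i < a.size() || carry; ++i)
       {
          std::size_t k = i + shift;
          if(k >= r.size()) r.resize(k + 1, 0);
          TwoDigit t = static_cast<TwoDigit>(r[k]) + carry
                     + (i < a.size() ? a[i] : 0);
          r[k] = static_cast<Digit>(t);
          carry = t >> 32;
       }
    }

    Num KaratsubaMul(const Num &a, const Num &b)
    {
       const std::size_t threshold = 16; // tune per machine
       if(a.size() < threshold || b.size() < threshold)
          return SchoolMul(a, b);

       std::size_t half = std::max(a.size(), b.size()) / 2;
       Num a0(a.begin(), a.begin() + std::min(half, a.size()));
       Num a1(a.begin() + std::min(half, a.size()), a.end());
       Num b0(b.begin(), b.begin() + std::min(half, b.size()));
       Num b1(b.begin() + std::min(half, b.size()), b.end());

       Num z0 = KaratsubaMul(a0, b0);                    // low * low
       Num z2 = KaratsubaMul(a1, b1);                    // high * high
       Num z1 = KaratsubaMul(Add(a0, a1), Add(b0, b1));  // cross terms
       SubInPlace(z1, z0);  // z1 = a0*b1 + a1*b0
       SubInPlace(z1, z2);

       Num r = z0;
       AddShifted(r, z1, half);
       AddShifted(r, z2, 2*half);
       return r;
    }

    Below the threshold this falls back to the grade-school routine,
    which matches the advice elsewhere in this thread.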

    But large integer multiplications are commonly needed
    (math, cryptography, etc), and some have certainly been
    optimized in assembly language, or even SIMD-optimized.
    I'm not into this lately, but you will certainly find
    open source projects that do it. Or it might be possible
    to use the "Big Number" library subset from Intel's
    Integrated Performance Primitives. It depends...

    Good luck -Ivan

    --
    http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
    Brainbench MVP for C++ <> http://www.brainbench.com
     
    Ivan Vecerina, Apr 21, 2008
    #10
  11. mike3

    mike3 Guest

    On Apr 21, 7:26 am, "Ivan Vecerina"
    <> wrote:
    > "mike3" <> wrote in message
    >
    > news:...
    > On Apr 20, 2:39 pm, "Ivan Vecerina"<> wrote:
    >
    > :> Although in general it is always advisable to first look
    > :> for better algorithms (as these micro-optimizations can
    > :> be platform-specific, and become redundant or even
    > :> counter-productive in the future). For instance, a 64-bit
    > :> platform might be able to store the 64-bit word into
    > :> a local register, and execute the code I posted faster.
    > :>
    > :
    > :So what would be the best course of action in this case?
    >
    > If squeezing out more performance matters, study the literature and
    > find a faster algorithm. From a vague memory of reading
    > Knuth's TAOCP (was 20 years ago for me, as a teenager),
    > I vaguely remember that there is a faster algorithm for
    > multiplying multi-word integers, whose complexity is
    > O( n^log2(3) ) = O( n^1.58 )   rather than O(n^2) as you
    > do now.  If your integers are large enough, this could
    > be considered... ( quick google search -> for example http://en.wikipedia.org/wiki/Karatsuba_algorithm ).
    >


    Ah, Knuth... the book I've dreamed of having for a long time
    but haven't been able to get due to a lack of money and
    access to an academic library. I only got one look at it,
    and another opportunity has not yet presented itself... :(

    Anyway, I've heard of these algorithms before, and have even used
    one once. There's also the FFT (Fast Fourier Transform),
    which is even better at O(n log(n)). The problem, though, is that
    these fast algorithms do not work so well for the relatively
    small number sizes I am considering here as they have more
    overhead. Here I only need a few hundred bits. If I need
    more, I could implement the faster method and then switch to
    it when the numbers exceed a certain size threshold, but
    for that small-size case, "grade school" like I'm using may
    actually be best.

    I've done some of this before in my attempts to write a program
    to calculate digits of pi to a huge number of places (something
    like a hundred million or so), although I haven't been able to
    solve the disk I/O problems in that arena (and the lack of source
    code to the really disk-efficient programs is not exactly a boon,
    you know), where the data to be worked with is so huge it has
    to be stored on disk.

    What I've heard though from those dabblings is that the
    best algorithm to use goes like this (a dispatch sketch
    follows the list):

    1. For little numbers, use Grade School.

    2. For bigger numbers, use Karatsuba.

    3. For really big numbers, use FFT.
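
    In code, the dispatch I have in mind would be something like this
    (the thresholds are rough guesses I'd have to benchmark, and the
    three routines stand in for the respective implementations):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    typedef std::vector<std::uint32_t> Num;

    Num SchoolMul(const Num &a, const Num &b);    /* (1) */
    Num KaratsubaMul(const Num &a, const Num &b); /* (2) */
    Num FFTMul(const Num &a, const Num &b);       /* (3) */

    Num Mul(const Num &a, const Num &b)
    {
       const std::size_t n = std::max(a.size(), b.size());
       if(n < 32)   return SchoolMul(a, b);    /* little numbers */
       if(n < 4096) return KaratsubaMul(a, b); /* bigger numbers */
       return FFTMul(a, b);                    /* really big numbers */
    }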

    Since I'm in the regime of (1), I have to use (1), and if a
    typecast is needed to get my compiler to generate fast code,
    I'll do it. Either that or just recode the mult routine in
    ASM instead of C++. In ASM I can get the 32x32->64 in
    one instruction, although the x86's paucity of registers is
    not exactly a nice thing when doing this stuff. I do actually
    have a 64-bit x86 processor in my machine, which has more
    registers, but my development OS (Windows) is only 32 bits
    and hence can only make do with 32-bit x86. I do have 64-bit
    Solaris 10 UNIX on my machine as well... but developing
    Windows programs under UNIX?! No way, especially considering I
    wouldn't be able to execute the 64-bit code under Windows
    anyway...

    > But large integer multiplications are commonly needed
    > (math, cryptography, etc), and some have certainly been
    > optimized in assembly language, or even SIMD-optimized.
    > I'm not into this lately, but you will certainly find
    > open source projects that do it.  Or it might be possible
    > to use the "Big Number" library subset from Intel's
    > Integrated Performance Primitives. It depends...
    >


    Hmm. I'm thinking of taking a look over GMP's ASM source code
    for x86 to see if maybe there might be some hint in it as
    to how to do a fast ASM-based mul.

    But for the generic C++ code I think I'll just stay with the
    typecast in there. Although I guess it's not a good idea to use
    typecasts all over the place, this is *not* "all over the place";
    this is a very tiny, low-level core part of the code.
     
    mike3, Apr 22, 2008
    #11
  12. mike3

    mike3 Guest

    Re: Questioning the design of the bignum package (was Re: Which is a better way to implement this algorithm?)

    On Apr 21, 1:39 am, mike3 <> wrote:
    > On Apr 20, 12:26 am, mike3 <> wrote:
    > <snip>
    >
    > Anyway this discussion made me wonder whether or not the design of the
    > package overall is so good. You can get a full copy of the parts made
    > so far here:
    >
    > http://www.mediafire.com/?z2j9t2s9mv9
    >
    > A batch file is included to compile it with Microsoft's C++ compiler
    > under Windows. Currently it only implements the "RawDigit" unsigned
    > integer type, but is made to allow the building of a floating point
    > package on top (which is why you'll find things like "AddRsh" and
    > "SubRsh", which are necessary for floating point arithmetic.).
    >
    > Is the current design I have there good, or poor? How does it compare
    > to my old implementation, which you can find here (see "mp" directory
    > in unzipped file):
    >
    > http://www.mediafire.com/?cfmzd1y3yij
    >
    > I'd like some discussion on this as I'm really just shooting around
    > considering the lack of specifics given in Alf's posts.


    Any answers? *Bumping also because of high spamlevel in this group*
     
    mike3, Apr 23, 2008
    #12
  13. Re: Questioning the design of the bignum package (was Re: Which is a better way to implement this algorithm?)

    "mike3" <> wrote in message
    news:...
    On Apr 21, 1:39 am, mike3 <> wrote:
    :> Is the current design I have there good, or poor? How does it
    :> compare to my old implementation, which you can find here
    :> (see "mp" directory in unzipped file):
    :>
    :> http://www.mediafire.com/?cfmzd1y3yij
    :>
    :> I'd like some discussion on this as I'm really just shooting around
    :> considering the lack of specifics given in Alf's posts.
    :
    :Any answers? *Bumping also because of high spamlevel in this group*

    I think the issue might be that it would take quite some work to
    review the design of the whole library you are pointing to.
    You may need to start by explaining in what way your library
    is superior to available open source packages with a license
    that is less restrictive than yours: http://gmplib.org/

    You might also get more responses in a forum dedicated to
    numerical methods.

    Good luck,
    Ivan
    --
    http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
    Brainbench MVP for C++ <> http://www.brainbench.com
     
    Ivan Vecerina, Apr 24, 2008
    #13
  14. mike3

    mike3 Guest

    Re: Questioning the design of the bignum package (was Re: Which is a better way to implement this algorithm?)

    On Apr 24, 2:46 pm, "Ivan Vecerina"
    <> wrote:
    > "mike3" <> wrote in message
    >
    > news:...
    > On Apr 21, 1:39 am, mike3 <> wrote:
    > :> Is the current design I have there good, or poor? How does it
    > :> compare to my old implementation, which you can find here
    > :> (see "mp" directory in unzipped file):
    > :>
    > :>http://www.mediafire.com/?cfmzd1y3yij
    > :>
    > :> I'd like some discussion on this as I'm really just shooting around
    > :> considering the lack of specifics given in Alf's posts.
    > :
    > :Any answers? *Bumping also because of high spamlevel in this group*
    >
    > I think the issue might be that it would take quite some work to
    > review the design of the whole library you are pointing to.


    I'd suppose it might. However, I'll give some more specific
    questions below.

    > You may need to start by explaining in what way your library
    > is superior to available open source packages with a license
    > that is less restrictive than yours:  http://gmplib.org/
    >


    For one, it's not, nor does it have to be. This is NOT part of a
    competition to make the best bignum package on Earth. This is a way
    to get something I'll own all the copyrights to so I have _zero_
    licensing issues (I.e. I can use it in whatever programs I make
    and license _those_ programs under whatever licenses I please),
    plus I learn something more about programming, plus I also
    can have some fun.

    Anyway, the _specific_ questions I had about the design are:

    1. Since the goal here is to put together a floating point package,
    should I bother with that "RawDigit" raw-integer thing in there
    at all or just have "primitive" routines that manipulate digit
    vectors more directly, say that take in digit vectors plus a
    length parameter specifying how many digits we want to work on?
    But then I've been told in previous posts (and it is clarification
    of those previous posts that I keep coming back for) that it
    is "better" not to do that, because it is more difficult to
    implement boundschecking and other "safety" features. Although
    interestingly enough GMP seems to use the arrays+length-parameter
    approach (since you seem to advocate GMP do you think it's well-
    written?), however GMP is written in C and this is C++, where
    different rules apply, so I'm not sure if it's good practice to
    carry over that approach to C++, even with std::vector<> instead
    of a raw array in there.

    2. I noticed that for some reason, when I finally got some big
    floating-point stuff made (built off RawDigit and its "slice"
    thing), the addition is like 3x slower than a straight-up RawDigit
    addition (even a RawDigit "AddRsh" operation that includes a right-
    shift like the floating addition does.) Is that unavoidable, or is
    it the consequence of all those abstractions (a profiler showed
    "RawDigitSlice" constructions taking up seemingly-significant
    time)?
    If so, is it a good idea to drop some of those abstractions? But then
    I've been told how my first whack at this had a big "abstraction
    gap" and this makes code more bug-prone. Does this mean there is a
    tradeoff between bug-resistance of code and performance, and this
    tradeoff is inevitable and cannot be minimized?

    Do you, or someone else, have any thoughts or answers on these
    specific issues?
     
    mike3, Apr 25, 2008
    #14
  15. mike3

    mike3 Guest

    Re: Questioning the design of the bignum package (was Re: Which is a better way to implement this algorithm?)

    On Apr 24, 2:59 pm, Razii <> wrote:
    > On Thu, 24 Apr 2008 22:46:21 +0200, "Ivan Vecerina"
    >
    > <> wrote:
    > >*Bumping also because of high spamlevel in this group*

    >
    > There is no such thing as "*bumping" in the USENET. How the threads
    > are sorted depend on the newsreader client and the setting. I didn't
    > see any "bumping"


    OK...
     
    mike3, Apr 25, 2008
    #15
  16. mike3

    mike3 Guest

    Would this be an acceptable approach? (was Re: Questioning the design of the bignum package)

    On Apr 24, 6:57 pm, mike3 <> wrote:
    > On Apr 24, 2:46 pm, "Ivan Vecerina"
    >
    >
    >
    >
    >
    > <> wrote:
    > > "mike3" <> wrote in message

    >
    > >news:...
    > > On Apr 21, 1:39 am, mike3 <> wrote:
    > > :> Is the current design I have there good, or poor? How does it
    > > :> compare to my old implementation, which you can find here
    > > :> (see "mp" directory in unzipped file):
    > > :>
    > > :>http://www.mediafire.com/?cfmzd1y3yij
    > > :>
    > > :> I'd like some discussion on this as I'm really just shooting around
    > > :> considering the lack of specifics given in Alf's posts.
    > > :
    > > :Any answers? *Bumping also because of high spamlevel in this group*

    >
    > > I think the issue might be that it would take quite some work to
    > > review the design of the whole library you are pointing to.

    >
    > I'd suppose it might. However, below, I'll give some more
    > specific questions.
    >
    > > You may need to start by explaining in what way your library
    > > is superior to available open source packages with a license
    > > that is less restrictive than yours:  http://gmplib.org/

    >
    > For one, it's not, nor does it have to be. This is NOT part of a
    > competition to make the best bignum package on Earth. This is a way
    > to get something I'll own all the copyrights to so I have _zero_
    > licensing issues (I.e. I can use it in whatever programs I make
    > and license _those_ programs under whatever licenses I please),
    > plus I learn something more about programming, plus I also
    > can have some fun.
    >
    > Anyway, the _specific_ questions I had about the design are:
    >
    > 1. Since the goal here is to put together a floating point package,
    >    should I bother with that "RawDigit" raw-integer thing in there
    >    at all or just have "primitive" routines that manipulate digit
    >    vectors more directly, say that take in digit vectors plus a
    >    length parameter specifying how many digits we want to work on?
    >    But then I've been told in previous posts (and it is clarification
    >    of those previous posts that I keep coming back for) that it
    >    is "better" not to do that, because it is more difficult to
    >    implement boundschecking and other "safety" features. Although
    >    interestingly enough GMP seems to use the arrays+length-parameter
    >    approach (since you seem to advocate GMP do you think it's well-
    >    written?), however GMP is written in C and this is C++, where
    >    different rules apply, so I'm not sure if it's good practice to
    >    carry over that approach to C++, even with std::vector<> instead
    >    of a raw array in there.
    >
    > 2. I noticed that for some reason, when I finally got some big
    >    floating-point stuff made (built off RawDigit and its "slice"
    >    thing), the addition is like 3x slower than a straight-up RawDigit
    >    addition (even a RawDigit "AddRsh" operation that includes a right-
    >    shift like the floating addition does.) Is that unavoidable, or is
    >    it the consequence of all those abstractions (a profiler showed
    >    "RawDigitSlice" constructions taking up seemingly-significant
    > time)?
    >    If so, is it a good idea to drop some of those abstractions? But then
    >    I've been told how my first whack at this had a big "abstraction
    >    gap" and this makes code more bug-prone. Does this mean there is a
    >    tradeoff between bug-resistance of code and performance, and this
    >    tradeoff is inevitable and cannot be minimized?
    >
    > Do you, or someone else, have any thoughts or answers on these
    > specific issues?


    NEW:

    I thought of yet another approach to this problem. Would it be OK
    to have, say, a bunch of simple "primitive" routines that simply
    operate on vectors of bignum digits, assuming all are the same
    length, with an explicitly passed length parameter? But instead of
    passing them a vector or an iterator to one, they'd take a special
    "safe iterator" that has a bounds check in it.

    Ex. I'd have routines like:

    // Compute v0 = v1 + v2.
    Digit32 FG3DDigitVector_Add(SafeIterator v0, SafeIterator v1,
                                SafeIterator v2, std::size_t length)
    {
       ...
    }
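
    The "safe iterator" itself might look something like this (just a
    sketch of the idea, not finished code):

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    typedef std::uint32_t Digit32;

    /* Bounds-checked iterator over a digit vector: asserts on an
       out-of-range dereference instead of silently reading or writing
       past the end of the buffer. */
    class SafeIterator
    {
    public:
       SafeIterator(std::vector<Digit32> &v, std::size_t pos = 0)
          : v_(&v), pos_(pos) {}

       Digit32 &operator*() const
       {
          assert(pos_ < v_->size() && "SafeIterator: out of range");
          return (*v_)[pos_];
       }

       SafeIterator &operator++() { ++pos_; return *this; }

    private:
       std::vector<Digit32> *v_;
       std::size_t pos_;
    };

    With NDEBUG defined the checks compile away, so a release build
    wouldn't have to pay for them.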

    This keeps the primitives lean and mean, since they don't need all
    that extra baggage checking various length combinations that
    may never even occur, just so the routines behave as expected.
    (You would expect that if you passed two operands of different
    lengths the routine would respect that, even if you aren't actually
    going to _do_ it, and making the routine not do so (e.g. treat
    operands as being of the shortest length passed) just seems bad to
    me. Maybe I'm wrong.) And then we use these primitives to
    build integer/floating-point types that take care of all the
    mixed-length cases, sign handling, etc.

    This is much like the approach used in GMP and many other bignum
    packages, although since those are C, not C++, they take raw pointers
    to arrays of digits vs. safe iterators into vectors of digits, but
    still...

    So is this a good approach to use here or not, even though I'm
    using C++ not C?
     
    mike3, Apr 25, 2008
    #16
  17. Re: Would this be an acceptable approach? (was Re: Questioning the design of the bignum package (was Re: Which is a better way to implement this algorithm?))

    :"mike3" <> wrote in message
    news:...
    :On Apr 24, 6:57 pm, mike3 <> wrote:
    :> On Apr 24, 2:46 pm, "Ivan Vecerina"
    :> > You may need to start by explaining in what way your library
    :> > is superior to available open source packages with a license
    :> > that is less restrictive than yours: http://gmplib.org/
    :>
    :> For one, it's not, nor does it have to be. This is NOT part of a
    :> competition to make the best bignum package on Earth. This is a way
    :> to get something I'll own all the copyrights to so I have _zero_
    :> licensing issues (I.e. I can use it in whatever programs I make
    :> and license _those_ programs under whatever licenses I please),
    :> plus I learn something more about programming, plus I also
    :> can have some fun.

    Sure. But understand this may limit the interest of others in
    your specific library ;)

    :> Anyway, the _specific_ questions I had about the design are:
    :>
    :> 1. Since the goal here is to put together a floating point package,
    :> should I bother with that "RawDigit" raw-integer thing in there
    :> at all or just have "primitive" routines that manipulate digit
    :> vectors more directly, say that take in digit vectors plus a
    :> length parameter specifying how many digits we want to work on?
    :> But then I've been told in previous posts (and it is clarification
    :> of those previous posts that I keep coming back for) that it
    :> is "better" not to do that, because it is more difficult to
    :> implement boundschecking and other "safety" features. Although
    :> interestingly enough GMP seems to use the arrays+length-parameter
    :> approach (since you seem to advocate GMP do you think it's well-
    :> written?), however GMP is written in C and this is C++, where
    :> different rules apply, so I'm not sure if it's good practice to
    :> carry over that approach to C++, even with std::vector<> instead
    :> of a raw array in there.

    There are no universal rules as to what levels of abstraction need
    to be implemented; like all interfaces, they need to be logical (to
    users), reusable, and self-contained. High-level interfaces need to
    be safe and simple, but lower-level interfaces that are more
    complex or that leave some safety checks to the users may be needed.
    These low-level interfaces, for computational stuff, often
    provide a C-like interface: either to facilitate use in a variety
    of contexts (by avoiding high-level types or specific
    memory management approaches); or to facilitate implementation
    in assembly code; or because different users have different
    expectations of what a well-designed high-level C++ class is.

    I am not a user of GMP (not doing any high-precision numeric stuff,
    although I deal with other performance-sensitive computations).
    A quick check of GMP revealed interesting low-level primitives,
    which seem designed to provide a common high-performance back-bone,
    that can be optimized in assembly language for a given platform:
    http://gmplib.org/manual/Low_002dlevel-Functions.html#Low_002dlevel-Functions

    At the other end of the spectrum, C++ classes built on the
    underlying framework are also provided (I haven't checked these...).

    Given the purpose and scope of these libraries, both of the above
    design choices seem very valid.

    :> 2. I noticed that for some reason, when I finally got some big
    :> floating-point stuff made (built off RawDigit and its "slice"
    :> thing), the addition is like 3x slower than a straight-up RawDigit
    :> addition (even a RawDigit "AddRsh" operation that includes a right-
    :> shift like the floating addition does.) Is that unavoidable, or is
    :> it the consequence of all those abstractions (a profiler showed
    :> "RawDigitSlice" constructions taking up seemingly-significant
    :> time)?
    :> If so is it a good idea to drop some of those abstractions? But
    :> then
    :> I've been told how my first whack at this had a big "abstraction
    :> gap" and this makes code more bug-prone. Does this mean there is a
    :> tradeoff between bug-resistance of code and performance, and this
    :> tradeoff is inevitable and cannot be minimized?
    :>
    :> Do you, or someone else, have any thoughts or answers on these
    :> specific issues?

    Abstractions have a performance cost; the compiler's optimizer isn't
    infinitely smart, even when you do your best to help it.
    In "early" C++ days, the "Stepanov score" on "abstraction pentaly"
    was a popular metric of how well compilers were able to optimize-
    out higher-level C++ constructs (use web search with the keywords
    if you wish to find out more). And then, you may hit performance
    problems in debug builds, where compiler optimizations aren't available.
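
    For a flavor of what such a measurement looks like, here is a toy
    example in the spirit of that benchmark (illustrative only, not the
    actual Stepanov code; compile with and without optimizations to
    watch the penalty move):
    ---
    #include <cstddef>
    #include <ctime>
    #include <iostream>

    /* A trivial abstraction over double: semantically identical to a
       bare double, so ideally it should cost nothing. */
    struct Wrapped {
        double v;
        Wrapped(double d = 0.0) : v(d) {}
    };
    inline Wrapped operator+(Wrapped a, Wrapped b)
        { return Wrapped(a.v + b.v); }

    int main()
    {
        const std::size_t n = 10000;
        const int reps = 10000;
        static double  raw[n];
        static Wrapped wrp[n];
        for (std::size_t i = 0; i < n; ++i) { raw[i] = 1.0; wrp[i] = 1.0; }

        std::clock_t t0 = std::clock();
        double s1 = 0.0;
        for (int k = 0; k < reps; ++k)
            for (std::size_t i = 0; i < n; ++i) s1 += raw[i];

        std::clock_t t1 = std::clock();
        Wrapped s2;
        for (int k = 0; k < reps; ++k)
            for (std::size_t i = 0; i < n; ++i) s2 = s2 + wrp[i];
        std::clock_t t2 = std::clock();

        /* The ratio of the two times is the "abstraction penalty". */
        std::cout << "raw:     " << double(t1 - t0) / CLOCKS_PER_SEC
                  << " s\n"
                  << "wrapped: " << double(t2 - t1) / CLOCKS_PER_SEC
                  << " s\n"
                  << "sums: " << s1 << " " << s2.v   /* defeat elision */
                  << "\n";
        return 0;
    }
    ---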

    More generally, adding classes/functions means more complexity.
    So there is always a balancing to be done between the addition
    of targeted abstractions that allow re-use or simplify code,
    and the "fluff" abstractions that may never add any value.
    See also http://en.wikipedia.org/wiki/Anti-pattern

    :NEW:
    :
    :I thought of yet another approach to this problem. Would it be OK
    :to have, say, a bunch of simple "primitive" routines that simply
    :operate
    :on vectors of bignum digits, assuming all are the same length,
    :with explicitly passed length parameter? But instead of passing
    :them a vector or an iterator to one they'd take a special "safe
    :iterator" that has a boundscheck in it.
    <...>
    :This is much like the approach used in GMP and many other bignum
    :packages, although since that's C not C++ they take raw pointers to
    :arrays of digits vs. safe-iterators into vectors of digits, but
    :still...

    Safe-iterators in low-level code seem to imply repeated redundant
    safety checks. Fine in general; but since your stated goal is
    optimal performance, is this the preferred approach for you?
    You may want to study the GMP code or ask the GMP maintainers;
    it might be smart to try to learn from their experience.
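
    For concreteness, here is a minimal sketch of the kind of safe
    iterator being proposed; all names are illustrative, and in this
    version the bounds check simply compiles away via assert() under
    NDEBUG (the poster's version reportedly swaps in a plain iterator
    through the preprocessor instead):
    ---
    #include <cassert>
    #include <cstddef>
    #include <vector>

    typedef unsigned int Digit;

    class SafeDigitIter {
    public:
        /* Must be constructed from the owning vector: a plain iterator
           carries no knowledge of its boundaries. */
        SafeDigitIter(std::vector<Digit> &v, std::size_t pos = 0)
            : v_(&v), pos_(pos) {}

        Digit &operator*() const { return check(0); }
        Digit &operator[](std::size_t i) const { return check(i); }
        SafeDigitIter &operator++() { ++pos_; return *this; }

        /* The unchecked iterator buried inside can be extracted. */
        std::vector<Digit>::iterator base() const
            { return v_->begin() + pos_; }

    private:
        Digit &check(std::size_t i) const
        {
            /* assert() vanishes in release builds (-DNDEBUG) */
            assert(pos_ + i < v_->size() && "digit access out of bounds");
            return (*v_)[pos_ + i];
        }
        std::vector<Digit> *v_;
        std::size_t pos_;
    };
    ---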

    :So is this a good approach to use here or not, even though I'm
    :using C++ not C?

    C++ is a multi-paradigm language. I learned C++ before C,
    and rarely ever wrote C programs; I'm a firm adherent of RAII
    and object-oriented design and encapsulation using C++ classes.
    But I do use C-style interfaces in my code when appropriate
    (even though what I call a "C-style" interface might actually
    be parameterized as a template, like the functions
    in the standard <algorithm> header).


    Good luck -Ivan
    --
    http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
     
    Ivan Vecerina, Apr 26, 2008
    #17
  18. Re: Questioning the design of the bignum package (was Re: Which is a better way to implement this algorithm?)

    "Razii" <> wrote in message
    news:...
    : On Thu, 24 Apr 2008 22:46:21 +0200, "Ivan Vecerina"
    : <> wrote:
    :
    : >*Bumping also because of high spamlevel in this group*

    Hey, hey, you are misattributing to me something that
    was written by mike3!

    -Ivan
     
    Ivan Vecerina, Apr 26, 2008
    #18
  19. mike3

    mike3 Guest

    Re: Would this be an acceptable approach? (was Re: Questioning the

    On Apr 26, 12:37 am, "Ivan Vecerina"
    <> wrote:
    > :"mike3" <> wrote in message
    >
    > news:...
    > :On Apr 24, 6:57 pm, mike3 <> wrote:
    > :> On Apr 24, 2:46 pm, "Ivan Vecerina"
    > :> > You may need to start by explaining in what way your library
    > :> > is superior to available open source packages with a license
    > :> > that is less restrictive than yours: http://gmplib.org/
    > :>
    > :> For one, it's not, nor does it have to be. This is NOT part of a
    > :> competition to make the best bignum package on Earth. This is a way
    > :> to get something I'll own all the copyrights to so I have _zero_
    > :> licensing issues (I.e. I can use it in whatever programs I make
    > :> and license _those_ programs under whatever licenses I please),
    > :> plus I learn something more about programming, plus I also
    > :> can have some fun.
    >
    > Sure. But understand this may limit the interest of others in
    > your specific library ;)
    >


    The bignum package is not intended to be an end in and of itself;
    it is part of a larger program. So there is no reason for anyone
    else to choose my package over any other. If I decide on a
    Free/open-source distribution/licensing model for my program in
    the end, I suppose they could then rip out the package and use it,
    but as you said they'd probably consider other options instead.
    That is not really relevant, though, since having someone use this
    bignum package in their program over GMP or something was never
    the intention. (And if I do go for Free/open-source
    distribution/licensing I might very well abandon this and swap in
    GMP or something. Still, the programming experience it gives is
    definitely a boon.)

    My performance requirements are not determined by the performance of
    competing bignum packages, but by the performance of the competitors
    to the main program itself, since that is what I'd be distributing,
    not just a bignum package. So that will determine how fast the bignum
    package needs to be, as well as the rest of the program overall.

    > :> Anyway, the _specific_ questions I had about the design are:
    > :>
    > :> 1. Since the goal here is to put together a floating point package,
    > :> should I bother with that "RawDigit" raw-integer thing in there
    > :> at all or just have "primitive" routines that manipulate digit
    > :> vectors more directly, say that take in digit vectors plus a
    > :> length parameter specifying how many digits we want to work on?
    > :> But then I've been told in previous posts (and it is clarification
    > :> of these previous posts that is why I keep coming back) that it
    > :> is "better" not to do that because it is more difficult to
    > :> implement boundschecking and other "safety" features. Although
    > :> interestingly enough GMP seems to use the arrays+length-parameter
    > :> approach (since you seem to advocate GMP do you think it's well-
    > :> written?), however GMP is written in C and this is C++, where
    > :> different rules apply, so I'm not sure if it's good practice to
    > :> carry over that approach to C++, even with std::vector<> instead
    > :> of a raw array in there.
    >
    > There are no universal rules as to what levels of abstraction need
    > to be implemented; like all interfaces, they need to be logical (to
    > users), reusable, self-contained. High-level interfaces need to
    > be safe and simple, but lower-level interfaces that are more
    > complex or that leave some safety checks to the users may be needed.
    > These low-level interfaces, for computational stuff, may often
    > provide a C-like interface: either to facilitate use in a variety
    > of contexts (by avoiding use of high-level types or specific
    > memory management approaches); or to facilitate implementation
    > in assembly code; or because different users have different
    > expectations of what a well designed high-level C++ class is.
    >


    Well, I'm talking about the design of the low-level primitives here
    (e.g. routines to perform arithmetic on strings of digits,
    regardless of whether those digits form an integer, fixed-point, or
    floating-point bignum). I've just now gone toward the C-like
    interface for them (i.e. standalone functions, not members of
    classes) since it seems faster and more flexible.

    Anyway, as I mention later, I do provide some safety in the
    low-level primitives (at least the generic C++ versions, not any
    assembler code) by using a "safe iterator" to access elements, with
    internal boundschecking that can be switched off in release builds.
    This makes it easier to debug bignum implementations built using
    these primitives, and yet high performance (as fast as a raw array
    on my box) is achieved in releases.

    > I am not a user of GMP (not doing any high-precision numeric stuff,
    > although I deal with other performance-sensitive computations).
    > A quick check of GMP revealed interesting low-level primitives,
    > which seem designed to provide a common high-performance backbone,
    > that can be optimized in assembly language for a given platform:
    > http://gmplib.org/manual/Low_002dlevel-Functions.html#Low_002dlevel-F...
    >
    > At the other end of the spectrum, C++ classes built on the
    > underlying framework are also provided (I haven't checked these...).
    >
    > Given the purpose and scope of these libraries, both of the above
    > design choices seem very valid.
    >


    Ah, so I guess it isn't necessarily a big problem to have
    "C-like"-interfaced primitives like that in my C++ program, then?

    > :> 2. I noticed that for some reason, when I finally got some big
    > :> floating-point stuff made (built off RawDigit and its "slice"
    > :> thing), the addition is like 3x slower than a straight-up RawDigit
    > :> addition (even a RawDigit "AddRsh" operation that includes a right-
    > :> shift like the floating addition does.) Is that unavoidable, or is
    > :> it the consequence of all those abstractions (a profiler showed
    > :> "RawDigitSlice" constructions taking up seemingly-significant
    > :> time)?
    > :> If so is it a good idea to drop some of those abstractions? But
    > :> then
    > :> I've been told how my first whack at this had a big "abstraction
    > :> gap" and this makes code more bug-prone. Does this mean there is a
    > :> tradeoff between bug-resistance of code and performance, and this
    > :> tradeoff is inevitable and cannot be minimized?
    > :>
    > :> Do you, or someone else, have any thoughts or answers on these
    > :> specific issues?
    >
    > Abstractions have a performance cost; the compiler's optimizer isn't
    > infinitely smart, even when you do your best to help it.
    > In "early" C++ days, the "Stepanov score" on "abstraction pentaly"
    > was a popular metric of how well compilers were able to optimize-
    > out higher-level C++ constructs (use web search with the keywords
    > if you wish to find out more). And then, you may hit performance
    > problems in debug builds, where compiler optimizations aren't available.
    >
    > More generally, adding classes/functions means more complexity.
    > So there is always a balancing to be done between the addition
    > of targeted abstractions that allow re-use or simplify code,
    > and the "fluff" abstractions that may never add any value.
    > See also http://en.wikipedia.org/wiki/Anti-pattern
    >


    So then, would all those heavy abstractions in the tiny low-level
    primitives be "fluff", especially if they create unused code and
    eat at performance? For example, since each RawDigit object has its
    own length, I put different-length-operand handling into the
    primitives just to "make sense" given the object structure, even
    though the routines built on top of them never pass
    different-length operands and could handle that case
    themselves(*).

    I've noticed that with the simpler, "C-like"/GMPian primitive
    interface approach the primitives are much leaner and meaner. I
    prefer my code lean and mean rather than fat and bloated with
    "cookies and cakes" to make it taste yummier. Lean and mean also
    means there are fewer places for bugs; lots of seldom-if-ever-used
    code creates "bug magnets". To me that makes it better. Don't you
    think?

    (*) Example: Addition of two integers of different length can
    be built from two all-same-length primitives, one of which
    adds digits and the other of which propagates a carry.
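
    A hedged sketch of that composition, assuming little-endian digit
    order; bignum_add_n is the hypothetical same-length adder from the
    earlier sketch, and bignum_add_1 is the carry-propagation
    primitive:
    ---
    #include <cstddef>

    typedef unsigned int Digit;

    Digit bignum_add_n(Digit *r, const Digit *a, const Digit *b,
                       std::size_t n);            /* as sketched earlier */

    /* Propagates 'carry' into the n-digit number at r; returns the
       carry out. Exits early once the carry dies. */
    Digit bignum_add_1(Digit *r, std::size_t n, Digit carry)
    {
        for (std::size_t i = 0; i < n && carry; ++i) {
            r[i] += carry;
            carry = (r[i] < carry);   /* wrapped iff result < addend */
        }
        return carry;
    }

    /* Mixed-length r = a + b (with an >= bn), composed entirely from
       the two same-length primitives; neither primitive needs any
       special casing for operand lengths. */
    Digit bignum_add(Digit *r, const Digit *a, std::size_t an,
                     const Digit *b, std::size_t bn)
    {
        Digit carry = bignum_add_n(r, a, b, bn);     /* low bn digits */
        for (std::size_t i = bn; i < an; ++i)        /* a's high digits */
            r[i] = a[i];
        return bignum_add_1(r + bn, an - bn, carry); /* push the carry */
    }
    ---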

    > :NEW:
    > :
    > :I thought of yet another approach to this problem. Would it be OK
    > :to have, say, a bunch of simple "primitive" routines that simply
    > :operate
    > :on vectors of bignum digits, assuming all are the same length,
    > :with explicitly passed length parameter? But instead of passing
    > :them a vector or an iterator to one they'd take a special "safe
    > :iterator" that has a boundscheck in it.
    > <...>
    > :This is much like the approach used in GMP and many other bignum
    > :packages, although since that's C not C++ they take raw pointers to
    > :arrays of digits vs. safe-iterators into vectors of digits, but
    > :still...
    >
    > Safe-iterators in low-level code seem to imply repeated redundant
    > safety checks. Fine in general; but since your stated goal is
    > optimal performance, is this the preferred approach for you?
    > You may want to study the GMP code or ask the GMP maintainers;
    > it might be smart to try to learn from their experience.
    >


    The reason I chose the safe-iterators for the primitives was to
    make debugging any higher-level stuff built on top of them easier,
    as some of that stuff involves element accesses that I have had
    occasionally go out of bounds in previous attempts at
    implementation, and tracking down those bugs was very difficult due
    to the lack of proper bounds-checking. In fact, all these threads
    of mine about implementing bignum packages ultimately started with
    a thread about a very, very difficult-to-find memory bug (which in
    the end turned out to have nothing to do with the bignum package
    anyway, but which turned up design flaws in the program that
    frustrated the discovery of its ultimate cause). Also, since the
    safe-iterator can be reduced to essentially a normal iterator in a
    release build with the appropriate commands to the preprocessor,
    there is no performance loss.

    (A safe-iterator does not behave quite the same as a normal
    iterator, though. E.g. you cannot construct a safe-iterator from a
    normal iterator without specifying the vector it belongs to, since
    the normal iterator contains no knowledge of boundaries (barring
    non-portable implementation-specific stuff that is not good to rely
    on). You can, however, extract the normal iterator buried inside
    the safe iterator if you want it.)

    > :So is this a good approach to use here or not, even though I'm
    > :using C++ not C?
    >
    > C++ is a multi-paradigm language. I learned C++ before C,
    > and rarely ever wrote C programs; I'm a firm adherent of RAII
    > and object-oriented design and encapsulation using C++ classes.
    > But I do use C-style interfaces in my code when appropriate
    > (even though what I call a "C-style" interface might actually
    > be parameterized as a template, like the functions
    > in the standard <algorithm> header).
    >


    See, and this is where my dilemma occurred: should I bother with
    the more "C-like" interface for the low-level primitive stuff, or
    should I go and throw on all that object-oriented abstraction
    (like in the code I posted where I had all those "RawDigit" and
    "slice" abstractions)? It's those abstraction penalties that I
    noticed piling up and eating at my performance, and I wanted good
    performance here (the penalties were whittling it away to
    saddening levels).
     
    mike3, Apr 27, 2008
    #19
  20. mike3

    mike3 Guest

    New Discovery about performance of bignum package (was Re: Would this

    On Apr 27, 2:40 am, mike3 <> wrote:
    <snip>

    ADDENDUM: I have just discovered that the
    "RawDigit"+"RawDigitSlice" approach (approach 1) in the posted
    bignum package is in fact not significantly slower, if at all, than
    the "safe-iterator"+"C-like primitives" approach (approach 2).
    After making a fresh implementation of the latter and reexamining
    my original implementation, I found I had been measuring the latter
    wrong, and that, oddly enough, raw *pointers* and arrays seem a bit
    faster than a vector (but not by too much: like 8 or 9 vs. 6.5
    sec.). Hmm, why all the slamming I got back in September of '07 for
    not using vector, then? Or perhaps those small speedups are not
    worth the advantages of vector, like automatic memory allocation.
    Especially not when the generic C++ routines are more or less just
    placeholders for faster CPU-specific assembler code that would use
    pointers anyway. So now we have a toss-up here. Which should I go
    with?

    One thing I noticed about approach 1, though, was that I ended up
    coding functionality I did not use. For example, I put
    mixed-length-operand handling into many of the primitives even
    though, given how they were used, I did not need it there; I just
    put it in to "make sense", since the operands can be of different
    lengths (they each have their own length field). Approach 2 created
    leaner primitives. Also, all that extra functionality makes it
    more difficult to set up assembler.
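
    Since the conclusion here already flipped once on a measurement
    error, a minimal timing harness along these lines helps keep the
    vector-vs-pointer comparison honest (illustrative only; build with
    optimizations on, and expect the numbers to vary by platform and
    compiler):
    ---
    #include <cstddef>
    #include <ctime>
    #include <iostream>
    #include <vector>

    typedef unsigned int Digit;

    int main()
    {
        const std::size_t n = 1000;
        const int reps = 100000;
        std::vector<Digit> va(n, 1), vb(n, 2), vr(n);
        Digit *pa = &va[0], *pb = &vb[0], *pr = &vr[0];

        /* vector<> indexing */
        std::clock_t t0 = std::clock();
        for (int k = 0; k < reps; ++k)
            for (std::size_t i = 0; i < n; ++i)
                vr[i] = va[i] + vb[i];

        /* raw pointer indexing over the same storage */
        std::clock_t t1 = std::clock();
        for (int k = 0; k < reps; ++k)
            for (std::size_t i = 0; i < n; ++i)
                pr[i] = pa[i] + pb[i];
        std::clock_t t2 = std::clock();

        std::cout << "vector:  " << double(t1 - t0) / CLOCKS_PER_SEC
                  << " s\n"
                  << "pointer: " << double(t2 - t1) / CLOCKS_PER_SEC
                  << " s\n"
                  << "check: " << vr[0]   /* keep results observable */
                  << "\n";
        return 0;
    }
    ---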

    However, given that I got grilled back in September for making a
    big "abstraction gap", would that indicate that perhaps I should go
    with approach 1 anyway, keep that extra "RawDigit" layer of
    abstraction, and settle for the increased complexity, the unused
    "sensemaking" code, and the difficulty of writing the assembler?
    I've come to a real tough spot here.
     
    mike3, Apr 29, 2008
    #20