M
mike3
Hi.
(Xposted to both comp.lang.c++ and comp.programming since I've got
questions related to both C++ language and general programming)
I've got the following C++ code. The first routine runs in like 65% of
the time of the second routine. Yet both do the same thing. However,
the second one seems better in terms of the way the code is written
since it helps encapsulate the transformation in the inner loop better
making it easier to read, at least in my opinion. Maybe it's not,
maybe the former is "better" that way and I should go with it, but if
the latter is "better" in that respect should I just ditch it anyway
and tradeoff for performance since I want this thing to be fast???
What each routine does is multiply two arbitrary-precision integers
together. The second one though uses an additional "slice" type that
provides a window enabling the "multiply and add" operation to be
performed on a limited range of digits, which can then be advanced
across the number, making clear that part of the algorithn.
I'm using the simple "grade school" multiply algorithm. Note how the
second routine more easily outlines this algorithm, while in the first
it is a little more difficult to see. Which would you prefer, exactly?
For brevity, class definitions and other members have been omitted.
However it should not be too difficult to figure out the necessary
information.
Also, could I lose the typecasts in this code or do I need them?
The reason why I'm asking is because I remembered getting told earlier
by someone here (Alf P. Steinbach, group: comp.lang.c++) about how my
last implementation of my program (this is for a fractal generator)
had some sort of bad, bug-inducing "abstraction gap" yet I was unable
to get enough elaboration responses about it so I've been more or less
shooting around trying to figure out how to make it better (although
he did give me some *examples* of where there were problems, which I
have since fixed.). But it seems I'm losing performance and that's not
a good thing for my application. Not to mention also that my original
bignum implementation was called "silly" as well("...although I didn't
look at the silly bignum class, whatever it's fault is..." ref:
http://groups.google.com/group/comp.lang.c++/msg/ac8df038d0b8dab9?dmode=source)
without any explanation as to what exactly was so silly about it. So I
had to start dark-shooting there too trying to figure out what I had
done wrong.
First version (fastest):
---
void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
{
/* Check to see if we're doing an in-place multiply */
if(&a == this)
{
MulInPlace(b);
return;
} else if(&b == this) {
MulInPlace(a);
return;
}
/* Get lengths. */
std::size_t rLength(GetLength());
std::size_t aLength(a.GetLength());
std::size_t bLength(b.GetLength());
/* Zeroize this. */
Zeroize();
/* Do the multiplication. */
TwoDigit64 carry(0);
for(std::size_t i(0);i<aLength && i<rLength;i++)
{
carry = 0;
for(std::size_t j(0);j<bLength && (i+j)<rLength;j++)
{
TwoDigit64
tmp((static_cast<TwoDigit64>(a.digits)*b.digits[j]) +
digits[i+j] + carry);
carry = tmp >> DIGIT_BITS;
tmp -= carry << DIGIT_BITS;
digits[i+j] = tmp;
}
if(i+bLength < rLength)
digits[i+bLength] = carry;
}
}
---
Second version (slower):
---
*** Slice manipulation.
inline Digit32 MulAddOp(Digit32 a, Digit32 b, Digit32 c, Digit32
&carry)
{
TwoDigit64 sum(a + (static_cast<TwoDigit64>(b)*c) + carry);
carry = sum >> DIGIT_BITS;
return(sum & MAX_DIGIT);
}
inline Digit32 MulAddCarryPropOpL(Digit32 a, Digit32 &carry)
{
Digit32 sum(a + carry);
carry = sum < carry;
return(sum);
}
inline Digit32 MulAddCarryPropOpR(Digit32 b, Digit32 c, Digit32
&carry)
{
TwoDigit64 sum((static_cast<TwoDigit64>(b)*c) + carry);
carry = sum >> DIGIT_BITS;
return(sum & MAX_DIGIT);
}
Digit32 RawDigitSlice::MulAdd(const ConstRawDigitSlice &a,
const Digit32 b,
const ConstRawDigitSlice &c)
{
/* Set up iterators */
DigitIterator32 ri(GetStartIterator());
ConstDigitIterator32 ai(a.GetConstStartIterator());
ConstDigitIterator32 ci(c.GetConstStartIterator());
/* Get minimum length of a and c. */
std::size_t minLength(std::min(std::min(a.length, c.length),
length));
/* Do the combined multiply + add */
Digit32 carry(0);
std::size_t i(0);
for(i;i<minLength;++i,++ri,++ai,++ci)
*ri = MulAddOp(*ai, b, *ci, carry);
/* Handle the remaining part(s) of a or b. */
int largerLength(std::min(std::max(a.length, c.length), length));
if(a.length >= c.length)
{
for(i;i<largerLength;++i,++ri,++ai)
*ri = MulAddCarryPropOpL(*ai, carry);
} else {
for(i;i<largerLength;++i,++ri,++ci)
*ri = MulAddCarryPropOpR(b, *ci, carry);
}
/* Finish carry propagation */
if(largerLength < length)
{
for(i;i<length;++i,++ri)
*ri = MulAddCarryPropOpL(0, carry);
}
/* Done! */
return(carry);
}
*** This next routine is in a different source file, the one
implementing the full "RawDigit" class.
*** The former would be in another file that implements the
"RawDigitSlice" class.
void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
{
/* Check to see if we're doing an in-place multiply */
if(&a == this)
{
MulInPlace(b);
return;
} else if(&b == this) {
MulInPlace(a);
return;
}
/* Get lengths. */
std::size_t rLength(GetLength());
std::size_t aLength(a.GetLength());
std::size_t bLength(b.GetLength());
/* Zeroize this. */
Zeroize();
/* Set up slices. */
RawDigitSlice runningSum(*this, 0, aLength + 1); /* explanation:
(<RawDigit object>, <origin digit idx>, <nbr of digits in slice>) */
ConstRawDigitSlice as(a);
/* Do the multiplication. */
for(std::size_t i(0);i<bLength && i<rLength;++i,++runningSum)
acc.MulAdd(runningSum, b, as);
}
---
(Xposted to both comp.lang.c++ and comp.programming since I've got
questions related to both C++ language and general programming)
I've got the following C++ code. The first routine runs in like 65% of
the time of the second routine. Yet both do the same thing. However,
the second one seems better in terms of the way the code is written
since it helps encapsulate the transformation in the inner loop better
making it easier to read, at least in my opinion. Maybe it's not,
maybe the former is "better" that way and I should go with it, but if
the latter is "better" in that respect should I just ditch it anyway
and tradeoff for performance since I want this thing to be fast???
What each routine does is multiply two arbitrary-precision integers
together. The second one though uses an additional "slice" type that
provides a window enabling the "multiply and add" operation to be
performed on a limited range of digits, which can then be advanced
across the number, making clear that part of the algorithn.
I'm using the simple "grade school" multiply algorithm. Note how the
second routine more easily outlines this algorithm, while in the first
it is a little more difficult to see. Which would you prefer, exactly?
For brevity, class definitions and other members have been omitted.
However it should not be too difficult to figure out the necessary
information.
Also, could I lose the typecasts in this code or do I need them?
The reason why I'm asking is because I remembered getting told earlier
by someone here (Alf P. Steinbach, group: comp.lang.c++) about how my
last implementation of my program (this is for a fractal generator)
had some sort of bad, bug-inducing "abstraction gap" yet I was unable
to get enough elaboration responses about it so I've been more or less
shooting around trying to figure out how to make it better (although
he did give me some *examples* of where there were problems, which I
have since fixed.). But it seems I'm losing performance and that's not
a good thing for my application. Not to mention also that my original
bignum implementation was called "silly" as well("...although I didn't
look at the silly bignum class, whatever it's fault is..." ref:
http://groups.google.com/group/comp.lang.c++/msg/ac8df038d0b8dab9?dmode=source)
without any explanation as to what exactly was so silly about it. So I
had to start dark-shooting there too trying to figure out what I had
done wrong.
First version (fastest):
---
void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
{
/* Check to see if we're doing an in-place multiply */
if(&a == this)
{
MulInPlace(b);
return;
} else if(&b == this) {
MulInPlace(a);
return;
}
/* Get lengths. */
std::size_t rLength(GetLength());
std::size_t aLength(a.GetLength());
std::size_t bLength(b.GetLength());
/* Zeroize this. */
Zeroize();
/* Do the multiplication. */
TwoDigit64 carry(0);
for(std::size_t i(0);i<aLength && i<rLength;i++)
{
carry = 0;
for(std::size_t j(0);j<bLength && (i+j)<rLength;j++)
{
TwoDigit64
tmp((static_cast<TwoDigit64>(a.digits)*b.digits[j]) +
digits[i+j] + carry);
carry = tmp >> DIGIT_BITS;
tmp -= carry << DIGIT_BITS;
digits[i+j] = tmp;
}
if(i+bLength < rLength)
digits[i+bLength] = carry;
}
}
---
Second version (slower):
---
*** Slice manipulation.
inline Digit32 MulAddOp(Digit32 a, Digit32 b, Digit32 c, Digit32
&carry)
{
TwoDigit64 sum(a + (static_cast<TwoDigit64>(b)*c) + carry);
carry = sum >> DIGIT_BITS;
return(sum & MAX_DIGIT);
}
inline Digit32 MulAddCarryPropOpL(Digit32 a, Digit32 &carry)
{
Digit32 sum(a + carry);
carry = sum < carry;
return(sum);
}
inline Digit32 MulAddCarryPropOpR(Digit32 b, Digit32 c, Digit32
&carry)
{
TwoDigit64 sum((static_cast<TwoDigit64>(b)*c) + carry);
carry = sum >> DIGIT_BITS;
return(sum & MAX_DIGIT);
}
Digit32 RawDigitSlice::MulAdd(const ConstRawDigitSlice &a,
const Digit32 b,
const ConstRawDigitSlice &c)
{
/* Set up iterators */
DigitIterator32 ri(GetStartIterator());
ConstDigitIterator32 ai(a.GetConstStartIterator());
ConstDigitIterator32 ci(c.GetConstStartIterator());
/* Get minimum length of a and c. */
std::size_t minLength(std::min(std::min(a.length, c.length),
length));
/* Do the combined multiply + add */
Digit32 carry(0);
std::size_t i(0);
for(i;i<minLength;++i,++ri,++ai,++ci)
*ri = MulAddOp(*ai, b, *ci, carry);
/* Handle the remaining part(s) of a or b. */
int largerLength(std::min(std::max(a.length, c.length), length));
if(a.length >= c.length)
{
for(i;i<largerLength;++i,++ri,++ai)
*ri = MulAddCarryPropOpL(*ai, carry);
} else {
for(i;i<largerLength;++i,++ri,++ci)
*ri = MulAddCarryPropOpR(b, *ci, carry);
}
/* Finish carry propagation */
if(largerLength < length)
{
for(i;i<length;++i,++ri)
*ri = MulAddCarryPropOpL(0, carry);
}
/* Done! */
return(carry);
}
*** This next routine is in a different source file, the one
implementing the full "RawDigit" class.
*** The former would be in another file that implements the
"RawDigitSlice" class.
void RawDigit::Mul(const RawDigit &a, const RawDigit &b)
{
/* Check to see if we're doing an in-place multiply */
if(&a == this)
{
MulInPlace(b);
return;
} else if(&b == this) {
MulInPlace(a);
return;
}
/* Get lengths. */
std::size_t rLength(GetLength());
std::size_t aLength(a.GetLength());
std::size_t bLength(b.GetLength());
/* Zeroize this. */
Zeroize();
/* Set up slices. */
RawDigitSlice runningSum(*this, 0, aLength + 1); /* explanation:
(<RawDigit object>, <origin digit idx>, <nbr of digits in slice>) */
ConstRawDigitSlice as(a);
/* Do the multiplication. */
for(std::size_t i(0);i<bLength && i<rLength;++i,++runningSum)
acc.MulAdd(runningSum, b, as);
}
---