data types (bitwidth) and multiplication


Ralf Hildebrandt

Hi all!


First of all: I am a C-newbie.


I have noticed a "strange" behavior with the standard integer
multiplication. The code is:

int main(void)
{
    int a = 0x1234;
    int b = 0xABCD;
    long int result_1 = (a * b);
    long int result_2 = (long int)(a * b); //the same as result_1
    return 0;
}

My machine is a 16-bit microcontroller (Texas Instruments MSP430).
Therefore "int" is 16 bits wide and "long int" is 32 bits wide. The
behavior is independent of the C compiler (tested with IAR Embedded
Workbench and MSP430-GCC).

For clarity I use the following syntax (like VHDL):

a(15 downto 0) depicts all bits of variable a
result_1(31 downto 0) depicts all bits of result_1 (long int)
result_1(15 downto 0) depicts only the lower 16 bits (int)



The problem:
As everybody knows, (16 bits) * (16 bits) = (32 bits).
Therefore the correct result would be (a * b)(31 downto 0). But only
result_1(15 downto 0) holds the correct lower bits of the result, and
result_1(31 downto 16) is the sign extension of result_1(15 downto 0).
In other words: (a * b)(15 downto 0) is taken (and sign-extended) and
(a * b)(31 downto 16) is thrown away.

O.k., any arithmetic operation in C on two "int" operands leads to an
"int" result, and the conversion to "long int" is done afterwards.
(So it is easy to see that result_2 == result_1.)

Let's take a closer look at the hardware before going on:
Every machine that has an n*n hardware multiplier computes the result
of this multiplication with bit width 2n. Therefore the correct result
is already stored in the output register of the hardware multiplier.

The first "solution" of this problem could be this:

int main(void)
{
    long int a = 0x1234;
    long int b = 0xABCD;
    long int result_1 = (a * b);
    return 0;
}

It computes the correct result, but has a *major* disadvantage: Because
both "a" and "b" are now declared to be 32 bits wide, the compiler has
to map this code to a full (32 bits) * (32 bits) multiplication, which
leads to using the 16-bit hardware multiplier several times (and even in
software a full 32-bit multiplication has to be done).
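To make the cost concrete, here is a rough sketch of how a compiler or
runtime library might build just the *lower* 32 bits of a 32x32 product
out of 16x16->32 hardware multiplies (a hypothetical schoolbook
decomposition, not the actual MSP430 library code):

#include <stdint.h>

/* Hypothetical sketch: even the low 32 bits of a 32x32 product
   already need three 16x16->32 partial products. */
uint32_t mul32_lo(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    uint32_t lo  = (uint32_t)a_lo * b_lo;       /* bits 31 downto 0  */
    uint32_t mid = (uint32_t)a_lo * b_hi
                 + (uint32_t)a_hi * b_lo;       /* bits 47 downto 16 */

    return lo + (mid << 16);  /* a_hi * b_hi lies entirely above bit 31 */
}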


Finally - my questions:
Does a construct exist in C that forces the compiler to take all
32 bits of the result of a 16*16 bit multiplication?

Why is the result of the arithmetic operation "*" defined to have the
same data type as the operands rather than doubled bit width?



Additionally: The same problem should occur at any other bit-width
boundary (wherever a data type exists that is twice the bit width of
the machine).


Thanks in advance.
Ralf
 

Lew Pitcher

> Hi all!
>
> First of all: I am a C-newbie.
>
> I have noticed a "strange" behavior with the standard integer

Not strange behaviour. Just behaviour you don't understand.

> multiplication. The code is:
>
> int main(void)
> {
>     int a = 0x1234;
>     int b = 0xABCD;
>     long int result_1 = (a * b);
>     long int result_2 = (long int)(a * b); //the same as result_1

Yes, so? You perform a 16-bit multiplication, then explicitly cast the
result to long int. Your (long int) cast doesn't affect the precision of
the multiplication; it affects the precision of the expression of the
result of the integer multiplication. In other words, your (long int)
cast simply converts the result of an int multiplication to long.

If you wanted to increase the precision of the results, you should have cast the
two values to long /before/ you multiplied. In other words, you should have
long int result_3 = (long int)a * (long int)b;

or even
long int result_4 = (long int)a * b;

So long as one of the two operands is of long int precision, the multiplication
will be performed with long int precision, and the results will be of long int
precision. The cast(s) on the operand(s) simply coerce one or both of the
operands into long int precision, thus leading to long int multiplication, and a
suitable long int result.
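A minimal self-contained sketch of the difference (assuming a target
where unsigned int is 16 bits and unsigned long is 32 bits, like the
OP's MSP430; unsigned types are used so the truncating case stays well
defined):

#include <stdio.h>

int main(void)
{
    unsigned int a = 0x1234;
    unsigned int b = 0xABCD;

    unsigned long narrow = (unsigned long)(a * b); /* 16-bit multiply, */
                                                   /* widened too late */
    unsigned long wide   = (unsigned long)a * b;   /* 32-bit multiply, */
                                                   /* full result      */

    printf("narrow = 0x%lX\n", narrow);  /* low 16 bits only: 0x4FA4 */
    printf("wide   = 0x%lX\n", wide);    /* full product:  0xC374FA4 */
    return 0;
}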

--
Lew Pitcher
IT Consultant, Enterprise Technology Solutions
Toronto Dominion Bank Financial Group

(Opinions expressed are my own, not my employers')
 

Ralf Hildebrandt

Hi Lew and Grumble!

> Yes, so? You perform a 16-bit multiplication, then explicitly cast the
> results to long int. Your (long int) cast doesn't affect precision of
> the multiplication, it affects the precision of the expression of the
> results of the integer multiplication. In other words, your (long int)
> cast simply converts the results of an int multiplication to long.

This is exactly what I observed and what I explained in my question.

> If you wanted to increase the precision of the results, you should
> have cast the two values to long /before/ you multiplied. In other
> words, you should have
>
> long int result_3 = (long int)a * (long int)b;
>
> or even
>
> long int result_4 = (long int)a * b;

or as Grumble said:
> http://www.eskimo.com/~scs/C-faq/q3.14.html


> So long as one of the two operands is of long int precision, the
> multiplication will be performed with long int precision, and the
> results will be of long int precision. The cast(s) on the operand(s)
> simply coerce one or both of the operands into long int precision,
> thus leading to long int multiplication, and a suitable long int
> result.

Is this the only way to compute a (16 bits) * (16 bits) = (32 bits)
multiplication? I can't imagine that this *really silly* workaround is
the only way.

Let me explain again:
If one of the multiplicands is of type "long int", the other one will be
converted to "long int" too (if that hasn't already been done). This is
a basic principle of arithmetic operations in C.
But now - both operands being "long int" - truly a full (32 bits) * (32
bits) = (64 bits) operation is computed and the upper 32 bits of the
result are thrown away.

Again - for clarity: I can see that the hardware multiplier computes
the multiplication *two times* (two 16-bit multiplications that result
in a 32-bit multiplication). But the correct result of the 16-bit
multiplication can be computed with only one 16-bit multiplication.
As I am a hardware engineer, I will translate the problem to software:
If I don't have any hardware multiplier, the multiplication is done in
software (some conditional accumulations). This means

long int result_3 = (long int)a * (long int)b; // and derivatives

truly results in a complete 32-bit multiplication. All necessary steps
of a complete 32-bit multiplication are carried out, and the correct
result (64 bits wide) is computed. (I can see the correct result in the
registers.) But then, after all this unnecessary computing, the upper
half of the result is thrown away.

Both realisations - with hardware multiplier or in software - result in
a *doubled* computing load. I verified this by studying the assembler code.



Back to the 16-bit multiplication: I can see, studying the assembler
code, that the correct result of the "int" * "int" multiplication is
visible in the registers (both if a hardware multiplier is used and if
all is done in software), but after this, the upper half of the
result is destroyed.



Conclusion: On an n-bit machine where type "int" is n bits wide, AND
there exists an integer data type with 2n bit width (let's call it
"int2"), a doubled and unnecessary computing load is incurred if one
wants the result of a multiplication to be as wide as "int2".

I am not sure, but this problem should also occur on a standard 32-bit
x86 machine if one wants to compute an "int" * "int" = "qword"
multiplication. (A qword is 64 bits wide, isn't it?)
O.k. - today nearly everybody does not care whether 1 or 2
multiplications are computed when only one is necessary, but I do!
Especially for low-power and even for high-speed applications on small
(embedded) systems, the (limited) CPU power has a major influence and I
do not want to waste it.


So is coding the multiplication in assembler the only solution?


Thanks for your comments.
Ralf
 

Grumble

Ralf said:
> Conclusion: On an n-bit machine where type "int" is n bits wide, AND
> there exists an integer data type with 2n bit width (let's call it
> "int2"), a doubled and unnecessary computing load is incurred if one
> wants the result of a multiplication to be as wide as "int2".
>
> I am not sure, but this problem should also occur on a standard 32-bit
> x86 machine if one wants to compute an "int" * "int" = "qword"
> multiplication. (A qword is 64 bits wide, isn't it?)
> O.k. - today nearly everybody does not care whether 1 or 2
> multiplications are computed when only one is necessary, but I do!
> Especially for low-power and even for high-speed applications on small
> (embedded) systems, the (limited) CPU power has a major influence and I
> do not want to waste it.
>
> So is coding the multiplication in assembler the only solution?

[ I think this is called the "widening multiply" problem. ]

I'm no C expert, but I don't think you can compute an NxN product
and expect to grab 2N bits in C. I could be wrong.

As far as I can tell, the problem does occur on IA32: MUL takes two
32-bit operands as input and stores the 64-bit result across two
32-bit registers (EDX:EAX).

You could use inline assembly, if your compiler supports it.
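For example, with GCC's extended inline assembly (a compiler extension,
not standard C; the function name and constraints below are only an
illustration), the widening MUL is reachable directly:

/* Hedged sketch, GCC on IA32: MUL with a 32-bit operand multiplies
   by EAX implicitly and leaves the 64-bit product in EDX:EAX. */
unsigned long long widening_mul(unsigned int a, unsigned int b)
{
    unsigned int lo, hi;
    __asm__ ("mull %3"
             : "=a" (lo), "=d" (hi)   /* outputs: EAX -> lo, EDX -> hi */
             : "a" (a), "r" (b));     /* inputs: a in EAX, b in a reg  */
    return ((unsigned long long)hi << 32) | lo;
}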
 

glen herrmannsfeldt

Ralf said:
> Hi Lew and Grumble!

(snip)

> or as Grumble said:

(snip)

> Is this the only way to compute a (16 bits) * (16 bits) = (32 bits)
> multiplication? I can't imagine that this *really silly* workaround is
> the only way.

I believe that some compilers recognize that both operands were
converted from 16 bits, and do a 16-bit multiply when you
cast one or both to long.

I don't at all know how many or which compilers do that.

It is a tradition for high-level languages, but consider, if they
didn't do it, what would happen when multiplying a series of numbers:

i = a*b*c*d*e*f;

All variables are 16 bits: a*b would be 32 bits; that 32-bit product
times c (converted) would be a 64-bit product; convert d for a 128-bit
product; convert e for a 256-bit product; convert f for a 512-bit
product.

So it is always that you get out what you put in.

(PL/I has the MULTIPLY function that allows one to specify
the precision of the result. Maybe C could add that.)
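Something similar can be spelled in C today with a helper function whose
name states the result precision (a hedged sketch; mul16x16_32 is a
hypothetical name, and stdint.h requires C99):

#include <stdint.h>

/* Hypothetical C analogue of PL/I's MULTIPLY: the result precision
   is named explicitly instead of being inferred from the operands. */
static inline int32_t mul16x16_32(int16_t x, int16_t y)
{
    return (int32_t)x * y;  /* a good compiler can emit one 16x16->32 multiply */
}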

-- glen
 

J. J. Farrell

For context: 16-bit int, 32-bit long.

The OP's "long int result_1 = (a * b);" gives a 16-bit result in result_1.

> Is this the only way to compute a (16 bits) * (16 bits) = (32 bits)
> multiplication? I can't imagine that this *really silly* workaround is
> the only way.

It's the normal way to get the result you want in C. That's not
directly related to the width of multiplication that the compiler
generates for the processor.

> Let me explain again:
> If one of the multiplicands is of type "long int", the other one will be
> converted to "long int" too (if that hasn't already been done). This is
> a basic principle of arithmetic operations in C.
> But now - both operands being "long int" - truly a full (32 bits) * (32
> bits) = (64 bits) operation is computed and the upper 32 bits of the
> result are thrown away.

Yes, that's how the C Virtual Machine is required to behave (in
essence - I'm ignoring any edge conditions here).

> Again - for clarity: I can see that the hardware multiplier computes
> the multiplication *two times* (two 16-bit multiplications that result
> in a 32-bit multiplication). But the correct result of the 16-bit
> multiplication can be computed with only one 16-bit multiplication.
> As I am a hardware engineer, I will translate the problem to software:
> If I don't have any hardware multiplier, the multiplication is done in
> software (some conditional accumulations). This means
>
> long int result_3 = (long int)a * (long int)b; // and derivatives
>
> truly results in a complete 32-bit multiplication. All necessary steps
> of a complete 32-bit multiplication are carried out, and the correct
> result (64 bits wide) is computed. (I can see the correct result in the
> registers.) But then, after all this unnecessary computing, the upper
> half of the result is thrown away.
>
> Both realisations - with hardware multiplier or in software - result in
> a *doubled* computing load. I verified this by studying the assembler code.

You proved something about the behaviour of your compiler used
in the way you invoked it. You didn't prove anything about C.

> Back to the 16-bit multiplication: I can see, studying the assembler
> code, that the correct result of the "int" * "int" multiplication is
> visible in the registers (both if a hardware multiplier is used and if
> all is done in software), but after this, the upper half of the
> result is destroyed.
>
> Conclusion: On an n-bit machine where type "int" is n bits wide, AND
> there exists an integer data type with 2n bit width (let's call it
> "int2"), a doubled and unnecessary computing load is incurred if one
> wants the result of a multiplication to be as wide as "int2".

The compiler can do what it wants as long as it emulates the
defined behaviour of the C Virtual Machine. It could widen
them all to 128 bits, do the arithmetic, and throw away the
extra bits, if it wanted to. This is an issue of the quality
of implementation of the compiler.

In the case of

long int result = (long int)a * b;

where a and b are ints, the compiler can "see" that it is
being asked to multiply two 16-bit numbers to give a 32-bit
result. There's no reason why it shouldn't generate machine
code to do exactly that, since the program wouldn't be able
to tell the difference between its doing that and its
following the semantics of the C Virtual Machine.

> So is coding the multiplication in assembler the only solution?

Either that, or find a decent C compiler. A compiler tuned
for getting the best out of small processors in embedded
environments ought to make use of an optimization like this.
 

CBFalconer

Ralf said:

[... snip ...]

> Conclusion: On an n-bit machine where type "int" is n bits wide, AND
> there exists an integer data type with 2n bit width (let's call it
> "int2"), a doubled and unnecessary computing load is incurred if one
> wants the result of a multiplication to be as wide as "int2".

In addition, any time you perform a multiplication (or any other
arithmetic operation) where the result exceeds the capacity of the
signed type, the result is undefined behavior, and absolutely
anything can happen. This does not apply to unsigned types, where
the overflow action is carefully defined.

I.e., if you have 16-bit integers, then

int i, a, b;
long lg;

a = 4; b = 10000;
i = a * b;

results in undefined behavior. However,

lg = (long)a * b;

is well defined, but a following "i = lg;" is not.
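A compilable illustration of the unsigned contrast (a sketch assuming
16-bit unsigned int, as on the OP's target):

#include <stdio.h>

int main(void)
{
    unsigned int a = 4, b = 40000;

    /* With 16-bit unsigned int, overflow wraps modulo 2^16, so this
       is well defined (160000 mod 65536 = 28928); the signed version
       above is undefined behavior instead. */
    unsigned int u = a * b;

    printf("u = %u\n", u);
    return 0;
}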
 

Kevin Bracey

In message <n4mEb.423972$ao4.1359319@attbi_s51>
glen herrmannsfeldt said:
> I believe that some compilers recognize that both operands were
> converted from 16 bits, and do a 16-bit multiply when you
> cast one or both to long.
>
> I don't at all know how many or which compilers do that.

The Norcroft ARM C and C++ compilers (supplied as part of ARM Ltd's
development tools) distinguish the 5 cases

s32 x s32 -> 64
u32 x u32 -> 64
s32 x 64  -> 64
u32 x 64  -> 64
64  x 64  -> 64

(note that the signedness matters for narrow arguments). This is
analogous to the OP's original example, given that it's a 32-bit CPU
with 32x32->64 multiply instructions.

It's a very obvious optimisation to add for architectures that benefit from
it, and fairly straightforward to implement. One only needs to transform the
expression tree:


        <64*64>                         <s32*64>
        /     \                         /      \
       /       \                       /        \
      /         \                     /          \
<cast to ll>  <ll expr>    into  <int expr>   <ll expr>
     |
     |
     |
<int expr>

and have separate code for each type of multiply. I'd agree that it looks
damned ugly in the source though, and it's not intuitive to the end user that
the compiler doesn't actually have to do a full 64x64->64 multiply.

The OP should put the casts in, inspect the compiler's output, and if it
is actually performing a full 32x32 multiply in his case, suggest the
improvement to his vendor.
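Concretely, the thing to write and then inspect in the compiler's
output might look like this (a sketch; mul16_wide is a hypothetical
helper name, and the types match the OP's 16-bit int / 32-bit long
target):

/* If the optimisation is implemented, a compiler for a 16-bit target
   should reduce this to a single 16x16->32 multiply rather than a
   full 32x32 multiply. */
long mul16_wide(int a, int b)
{
    return (long)a * b;
}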
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,020
Latest member
GenesisGai

Latest Threads

Top