Efficient division/remainder in C

N

nessuno

I can't find any discussion of this question in this NG.

I'd like to implement some variable precision integer arithmetic in C,
and do it efficiently. A problem arises with the divide/remainder
operations. I'm using gcc on x86, but a lot of what I say applies to
other architectures.

First, I assume the div and ldiv functions are more efficient than
using / and % (obviously it depends on the compiler and level of
optimization, but it's one operation instead of two).

However, div and ldiv require the length of the quotient, divisor,
dividend and remainder to all be the same. Now all the machines I've
ever worked with at the machine language level (quite a few of them)
implement integer division with a dividend that is twice as long as the
divisor, quotient and remainder. Moreover, this is what naturally
arises in multiple precision arithmetic: you get dividends twice as
long as everything else. So suppose I'm using 16-bit divisors,
quotients and remainders, but 32-bit dividends. If I were writing in
assembly language, I would use the instructions that divide 16 bits
into 32.

But when I write in C, I'm forced to use the ldiv function, since one
of the operands is 32-bits. (Let's say int=16-bit and long=32-bit, as
on my machine.) Then the compiler has to implement this ldiv function
by using the instruction that divides 32-bits into 64-bits, since the
divisor is now 32-bits. In other words, the compiled code is forced to
use a divide instruction with operands twice as large as needed, due to
the design of the div and ldiv functions.
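
To make it concrete, the inner step I have in mind looks roughly like
this (the names are made up; assume 16-bit unsigned int and 32-bit
unsigned long):

/* One digit-step of schoolbook long division.  rem_in is the remainder
   carried over from the previous step and must be < dvsr, so the
   quotient always fits in 16 bits.                                     */
unsigned int div_step(unsigned int rem_in, unsigned int digit,
                      unsigned int dvsr, unsigned int *rem_out)
{
    unsigned long dvnd = ((unsigned long)rem_in << 16) | digit;

    *rem_out = (unsigned int)(dvnd % dvsr);   /* 32-by-16 remainder */
    return (unsigned int)(dvnd / dvsr);       /* 32-by-16 quotient  */
}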

Anybody know a way around this (apart from inserting assembly code in
my C program)?

Thanks in advance...
 
E

Eric Sosman

[...]
First, I assume the div and ldiv functions are more efficient than
using / and % (obviously it depends on the compiler and level of
optimization, but it's one operation instead of two).
[...]
But when I write in C, I'm forced to use the ldiv function, since one
of the operands is 32-bits. (Let's say int=16-bit and long=32-bit, as
on my machine.) Then the compiler has to implement this ldiv function
by using the instruction that divides 32-bits into 64-bits, since the
divisor is now 32-bits. In other words, the compiled code is forced to
use a divide instruction with operands twice as large as needed, due to
the design of the div and ldiv functions.

The leap from "I assume" to "I'm forced" seems a
risky one. What have your measurements shown about
the efficiencies of /, %, both / and %, and ldiv?

You *have* made measurements, haven't you?

HAVEN'T YOU????
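
A rough harness along these lines is all it takes (the file name is
made up, the loop count is arbitrary, and the divisor comes from
time() only so the compiler can't precompute the divisions):

/* BEGIN time_div.c */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    long i, dvsr;
    unsigned long sum;
    clock_t t0, t1;

    dvsr = (long)time(NULL) % 100 + 2;   /* some run-time value >= 2 */

    sum = 0;
    t0 = clock();
    for (i = 1; i < 10000000L; i++)
        sum += i / dvsr + i % dvsr;
    t1 = clock();
    printf("/ and %% : %.2f s (sum %lu)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, sum);

    sum = 0;
    t0 = clock();
    for (i = 1; i < 10000000L; i++) {
        ldiv_t d = ldiv(i, dvsr);
        sum += d.quot + d.rem;
    }
    t1 = clock();
    printf("ldiv    : %.2f s (sum %lu)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, sum);

    return 0;
}

/* END time_div.c */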
 
S

Shark

Use inline assembly for the system you are on. Use macros to determine
which routine to use depending on your system. Burst into flames.
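
For instance, a rough sketch of that idea with gcc's extended asm on
x86 (div32by16 is a made-up name; divw needs the quotient to fit in 16
bits, i.e. (dvnd >> 16) < dvsr, or the CPU raises a divide fault):

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
static unsigned short div32by16(unsigned long dvnd, unsigned short dvsr,
                                unsigned short *rem)
{
    unsigned short quot = (unsigned short)dvnd;          /* low word  -> AX */
    unsigned short hi   = (unsigned short)(dvnd >> 16);  /* high word -> DX */

    __asm__("divw %2" : "+a"(quot), "+d"(hi) : "rm"(dvsr));
    *rem = hi;            /* divw leaves the remainder in DX */
    return quot;          /* and the quotient in AX          */
}
#else
static unsigned short div32by16(unsigned long dvnd, unsigned short dvsr,
                                unsigned short *rem)
{
    *rem = (unsigned short)(dvnd % dvsr);    /* portable fallback */
    return (unsigned short)(dvnd / dvsr);
}
#endif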
 
J

Jordan Abel

I can't find any discussion of this question in this NG.

I'd like to implement some variable precision integer arithmetic in C,
and do it efficiently. A problem arises with the divide/remainder
operations. I'm using gcc on x86, but a lot of what I say applies to
other architectures.

First, I assume the div and ldiv functions are more efficient than
using / and % (obviously it depends on the compiler and level of
optimization, but it's one operation instead of two).

Not really - it requires a function call and use of a structure, and any
compiler should optimize use of / and % with the same operands anyway.
But when I write in C, I'm forced to use the ldiv function, since one
of the operands is 32-bits. (Let's say int=16-bit and long=32-bit, as
on my machine.) Then the compiler has to implement this ldiv function
by using the instruction that divides 32-bits into 64-bits, since the
divisor is now 32-bits. In other words, the compiled code is forced to
use a divide instruction with operands twice as large as needed, due to
the design of the div and ldiv functions.

i'm pretty sure that there are instructions on x86 that take 32 bits
for both operands.
Anybody know a way around this (apart from inserting assembly code in
my C program)?

just use / and %.
 
N

nessuno

You are right. Suppose I have declarations
unsigned int quot,rem,dvsr;
unsigned long dvnd;
and then I say
quot=dvnd/dvsr;
rem =dvnd%dvsr;

If the compiler is smart enough to generate a single divide instruction
(that's all that's required on the x86 hardware), then I have what I
want.

Maybe I will try to figure out how to list assembly output from gcc to
see what instructions it actually generates.

Thanks for the reply!
 
J

James Dow Allen

Jordan said:
Not really - it requires a function call and use of a structure, and any
compiler should optimize use of / and % with the same operands anyway.

just use / and %.

Some of the responses make me wonder if your question was understood.
You want both "/" and "%" but don't want to needlessly repeat the
division. Do compilers find this optimization?

Someone from Sun implies that speed comparison tests are appropriate.
He should preach this in-house. It was trivial to beat memcpy() on
Sun-3 (which was optimized for Sun-2). The early Sparc's integer
multiply could not only be bettered by rewriting the assembly, it
*could be beaten without assembly, in a C routine!* (Either way the
speed improvement was huge.)

James
 
C

Chuck F.

Keith said:
(e-mail address removed) writes:

[...]
Maybe I will try to figure out how to list assembly output from gcc to
see what instructions it actually generates.

<OT>
gcc -S
</OT>
I use: gcc -gstabs+ -Wa,-ahldn -c

and pipe the result to less.
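
For example, with a made-up source file div.c (the -O2 just makes sure
you're looking at optimised code):

gcc -O2 -S div.c                               (assembly ends up in div.s)
gcc -O2 -gstabs+ -Wa,-ahldn -c div.c | less    (source/asm listing on stdout)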

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
 
L

Logan Shaw

James said:
Someone from Sun implies that speed comparison tests are appropriate.
He should preach this in-house. It was trivial to beat memcpy() on
Sun-3 (which was optimized for Sun-2).

How many employees from 20 years ago do you think are still there? :)

- Logan
 
C

Christian Bau

"James Dow Allen said:
Some of the responses make me wonder if your question was understood.
You want both "/" and "%" but don't want to needlessly repeat the
division. Do compilers find this optimization?

Someone from Sun implies that speed comparison tests are appropriate.
He should preach this in-house. It was trivial to beat memcpy() on
Sun-3 (which was optimized for Sun-2). The early Sparc's integer
multiply could not only be bettered by rewriting the assembly, it
*could be beaten without assembly, in a C routine!* (Either way the
speed improvement was huge.)

They probably didn't measure it.
That's why he is telling you: Measure it.

Or are you suggesting that measuring the speed of code before trying to
optimise it for speed is a bad idea, because it was suggested by someone
who works for a company that shipped some sub-optimal code twenty years
ago?
 
J

Jim

You are right. Suppose I have declarations
unsigned int quot,rem,dvsr;
unsigned long dvnd;
and then I say
quot=dvnd/dvsr;
rem =dvnd%dvsr;

If the compiler is smart enough to generate a single divide instruction
(that's all that's required on the x86 hardware), then I have what I
want.

Maybe I will try to figure out how to list assembly output from gcc to
see what instructions it actually generates.

Thanks for the reply!

You should check as well if the optimiser results are different for
signed and unsigned integer types, since there is a slight
complication with the sign of the modulo and the way it is rounded if
one of the incoming values is negative.
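
For instance, with the truncation toward zero that C99 requires,
-7 / 2 == -3 and -7 % 2 == -1, while 7 / -2 == -3 and 7 % -2 == 1;
under C90 the rounding direction for negative operands is
implementation-defined.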

Jim
 
J

James Dow Allen

I'm surprised no one was curious about the fast C-code multiply.
It typically takes shorts in, long out -- a common case.
If you've not seen the trick before, consider it a puzzle
to reverse engineer it from the fragment:
c = arr[x+y] - arr[x-y]; /* c = x * y */
It gives competitive performance even on some machines with
blazingly fast multiplies.
They probably didn't measure it.
That's why he is telling you: Measure it.

Mr. Sosman's comment was of course excellent.
I thought this went without saying. What in my post
implied otherwise? (I deliberately removed an attribution
to reduce risk that my comment would somehow be seen
as opposing.)
Or are you suggesting that measuring the speed of code before trying to
optimise it for speed is a bad idea, ...

Hunh? I'll admit to a sardonic tone that is sometimes confusing,
but what's with the blatant misconstruction?

The only point I was alluding to is that even big organizations with
which one would *like* to associate technical excellence often
fail sharply. ... so sharply it almost seems unfair here to chastise
OP.

James
 
J

Jordan Abel

I'm surprised no one was curious about the fast C-code multiply.
It typically takes shorts in, long out -- a common case.
If you've not seen the trick before, consider it a puzzle
to reverse engineer it from the fragment:
c = arr[x+y] - arr[x-y]; /* c = x * y */
It gives competitive performance even on some machines with
blazingly fast multiplies.

OK, i'll bite - what goes in the array?
 
P

pete

Jordan said:
I'm surprised no one was curious about the fast C-code multiply.
It typically takes shorts in, long out -- a common case.
If you've not seen the trick before, consider it a puzzle
to reverse engineer it from the fragment:
c = arr[x+y] - arr[x-y]; /* c = x * y */
It gives competitive performance even on some machines with
blazingly fast multiplies.

OK, i'll bite - what goes in the array?

Corners.

/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
    unsigned x, y, arr[256];

    /* arr[n] holds n * n / 4 -- a table of quarter squares */
    for (x = 0; 255 > x; ++x) {
        arr[x] = x * x / 4;
    }
    for (x = 0; 127 > x; ++x) {
        for (y = 0; 127 > y; ++y) {
            printf("%u * %u is %u\n", x, y,
                   x > y
                   ? arr[x + y] - arr[x - y]
                   : arr[x + y] - arr[y - x]);
        }
    }
    return 0;
}

/* END new.c */
 
C

Chuck F.

Jordan said:
.... snip ...
I'm surprised no one was curious about the fast C-code
multiply. It typically takes shorts in, long out -- a common
case. If you've not seen the trick before, consider it a
puzzle to reverse engineer it from the fragment:

c = arr[x+y] - arr[x-y]; /* c = x * y */

It gives competitive performance even on some machines with
blazingly fast multiplies.

OK, i'll bite - what goes in the array?

A table of squares. The result then needs a division by four.
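
(The identity at work: (x+y)*(x+y) - (x-y)*(x-y) == 4*x*y. pete's code
stores n*n/4 in the table, which absorbs that final division; and since
x+y and x-y always have the same parity, the truncation in n*n/4
cancels out, so the result is exact.)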

 
M

Michael Mair

Jordan said:
Christian said:
Someone from Sun implies that speed comparison tests are appropriate.
He should preach this in-house. It was trivial to beat memcpy() on
Sun-3 (which was optimized for Sun-2). The early Sparc's integer
multiply could not only be bettered by rewriting the assembly, it
*could be beaten without assembly, in a C routine!* (Either way the
speed improvement was huge.)

I'm surprised no one was curious about the fast C-code multiply.
It typically takes shorts in, long out -- a common case.
If you've not seen the trick before, consider it a puzzle
to reverse engineer it from the fragment:
c = arr[x+y] - arr[x-y]; /* c = x * y */
It gives competitive performance even on some machines with
blazingly fast multiplies.


OK, i'll bite - what goes in the array?

In addition to the other answers, see e.g.
http://www.mccw.hetlab.tk/92/Multiplication/en.html
for an "explanation".

Cheers
Michael
 
J

James Dow Allen

pete said:
Jordan said:
c = arr[x+y] - arr[x-y]; /* c = x * y */
OK, i'll bite - what goes in the array?
...
printf("%u * %u is %u\n", x, y,
x > y
?arr[x+y] - arr[x-y]
:arr[x+y] - arr[y-x]);

I just want to point out that a solution can simultaneously
(a) accept signeds, not just unsigneds
(b) avoid the x > y ? timewaster
(c) obey the rules of standard C.

Again this is left as an exercise.

James
 
P

pete

James said:
Jordan said:
c = arr[x+y] - arr[x-y]; /* c = x * y */
OK, i'll bite - what goes in the array?
...
printf("%u * %u is %u\n", x, y,
x > y
?arr[x+y] - arr[x-y]
:arr[x+y] - arr[y-x]);

I just want to point out that a solution can simultaneously
(a) accept signeds, not just unsigneds
(b) avoid the x > y ? timewaster
(c) obey the rules of standard C.

Again this is left as an exercise.

/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
    int x, y, arr[256], *ptr;

    /* ptr points at the middle of arr, so negative indices stay in bounds */
    ptr = arr + 128;
    for (x = -128; 127 > x; ++x) {
        ptr[x] = x * x / 4;
    }
    for (x = -63; 64 > x; ++x) {
        for (y = -63; 64 > y; ++y) {
            printf("%d * %d is %d\n",
                   x, y, ptr[x + y] - ptr[x - y]);
        }
    }
    return 0;
}

/* END new.c */
 
J

Jordan Abel

Jordan said:
c = arr[x+y] - arr[x-y]; /* c = x * y */
OK, i'll bite - what goes in the array?
...
printf("%u * %u is %u\n", x, y,
x > y
?arr[x+y] - arr[x-y]
:arr[x+y] - arr[y-x]);

I just want to point out that a solution can simultaneously
(a) accept signeds, not just unsigneds

How, by passing a possibly negative offset into an array? Or by testing
both vars to be < 0?
 
