>> to accelerate division

fordge · Dec 8, 2004

we usually use <<n as a cheaper option to multiplication when its
factor of
2^n right,

even for non 2^n say

ix24 ==> ix(16+8) ==> (i<<4)+(i<<3)

similarly divison 2^n is >>n

doubt is there a cheaper option to i/24 ==> i/(16+8) ==> ????????
or i/72.....

also are there other faster arithmetic alternatives like the >> and <<
for
div n multiplication

fORDGE

Mysidia · Dec 8, 2004

Well, you could reduce it somewhat, but the divide wants to stick
around...

i/(24) == i/(6*4) == i/6/4 == i/(3*2)/4 == i/3/2/2/2
== (i / 3) >> 1 >> 1 >> 1

i / 24 == (i >> 3) / 3

If you can figure out a way of expressing i / 3 as bit shifts,
then you have an improvement.
But I am doubting there is a way, without doing the full divide.

-Jh

Ben Pfaff · Dec 8, 2004

Mysidia said:
If you can figure out a way of expressing i / 3 as bit shifts,
then you have an improvement.
But I am doubting there is a way, without doing the full divide.

You can implement i/3 as a multiplication followed by a bit
shift. In some cases this may be an improvement. However, if it
is an improvement, your compiler may already be doing it for you.

See chapter 10 of Warren, Jr., _Hacker's Delight_, Addison-Wesley
2003, ISBN 0-201-91465-4, for details.

James McIninch · Dec 8, 2004

<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

.... so you wouldn't expect any performance difference. Likewise, the
following should all generate exactly the same code:

i<<1;
i*=2;
i=i*2;

Also, depending on the hardware, 'i * 24' may be faster than '(i<<4) +
(i<<3)' ... assuming that your machine didn't optimize the code into the
same thing...

Andrey Tarasevich · Dec 8, 2004

we usually use <<n as a cheaper option to multiplication when its
factor of
2^n right,

even for non 2^n say

ix24 ==> ix(16+8) ==> (i<<4)+(i<<3)

similarly divison 2^n is >>n

This is waste of characters and your time, let alone the reduced
readability of the code. In situations when the multiplier is a
compile-time constant, the compiler will do this optimization much
better than you will ever be able to. On many modern platforms in many
cases shifts are not the most efficient way to implement multiplication.
By obscuring your code with these shifts you are more likely to confuse
the compiler and impede real optimizations. The compiler has to be able
de-optimize your '(i << 4) + (i << 3)' into normal 'i * 24' and then
re-optimize it efficiently. Some compilers might not be able to do that,
leaving you with lousy shifts instead of really efficient multiplication.

also are there other faster arithmetic alternatives like the >> and <<
for
div n multiplication

What is 'div n multiplication'? Anyway, the fastest way to multiply 'i'
by 24 is to write 'i * 24'.

Andrey Tarasevich · Dec 8, 2004

James said:
Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

Highly unlikely in case of signed 'i'. In general case shift and
division mean two completely different things for signed values.

Also, depending on the hardware, 'i * 24' may be faster than '(i<<4) +
(i<<3)' ... assuming that your machine didn't optimize the code into the
same thing...

Strictly speaking, it depends on the compiler, not on the hardware.
There's no direct correspondence between C operators and machine
language commands, i.e. the notion of "hardware-defined speed" is not
directly applicable to C operators.

E. Robert Tisdale · Dec 8, 2004

James said:
<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

Let's try that.

> cat f0.c

int f(int i) {
return i >> 1;
}

> gcc -Wall -std=c99 -pedantic -O2 -S f0.c
> cat f1.c

int f(int i) {
return i >>= 1;
}

> gcc -Wall -std=c99 -pedantic -O2 -S f1.c
> cat f2.c

int f(int i) {
return i/2;
}

> gcc -Wall -std=c99 -pedantic -O2 -S f2.c
> cat f3.c

int f(int i) {
return i /= 2;
}

> gcc -Wall -std=c99 -pedantic -O2 -S f3.c
> cat f4.c

int f(int i) {
return i = i/2;
}

> gcc -Wall -std=c99 -pedantic -O2 -S f4.c
> diff f0.s f1.s

1c1
< .file "f0.c"
---

> .file "f1.c"
> diff f1.s f2.s

1c1
< .file "f1.c"
---

> .file "f2.c" 10a11,13
> movl %eax, %edx
> shrl $31, %edx
> addl %edx, %eax
> diff f2.s f3.s

1c1
< .file "f2.c"
---

> .file "f3.c"
> diff f3.s f4.s

1c1
< .file "f3.c"
---

Charlie Gordon · Dec 8, 2004

E. Robert Tisdale said:
James said:

<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

Click to expand...

Let's try that. [...]

of all 4 tests, only 2 really make sense.

int fshift(int i) { return i >> 1; }
int fdiv(int i) { return i / 2; }

gcc optimizes i / 2 using shifts, but generates different code from i >> 1:

namely on intel chips, signed division by 2 is equivalent to :

(i + (i said:
int f(int i) {
return i >> 1;
}

int f(int i) {
return i >>= 1;
}

No difference in semanticsObviously

Charlie Gordon · Dec 8, 2004

Charlie Gordon said:
E. Robert Tisdale said:

James said:

<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

Click to expand...

Let's try that. [...]

Click to expand...

of all 4 tests, only 2 really make sense.

int fshift(int i) { return i >> 1; }
int fdiv(int i) { return i / 2; }

gcc optimizes i / 2 using shifts, but generates different code from i >> 1:

namely on intel chips, signed division by 2 is equivalent to :

(i + (i < 0)) >> 1

Sorry for not clipping the rest of the quote, this post was submitted
erroneously by a keyboard shortcut I wasn't aware of.

i / 2 <=> (i + (i < 0)) >> 1 assuming a ton of things that are mostly true
on the ia86 family :
- 2's complement
- signed division truncates toward 0
- signed shift duplicates the sign bit...

However, given this expression, gcc will generate a test and 2 JMPs.
You have to be very verbose to make it generate the same code :

return (i + (signed)((unsigned)i >> ((sizeof(i) * CHAR_BIT - 1)))) >> 1;

or the alternative:

return (i + ((i >> ((sizeof(i) * CHAR_BIT - 1))) & 1)) >> 1;

And still this assumes i is of type int. The expression must change for other
signed integral types.

Furthermore, for other simple divisions by a power of 2, gcc will generate a
test and 2 JMPs for the / operator.
But will generate linear (and potentially faster) code for these convoluted
expressions :

dividing by 4:

return (i + ((i >> ((sizeof(i) * CHAR_BIT - 1))) & 3)) >> 2;

dividing by 8:

return (i + ((i >> ((sizeof(i) * CHAR_BIT - 1))) & 7)) >> 3;

etc.

As a conclusion, it is more important to write readable code (using / and %) and
give the compiler every opportunity for optimisation by using or casting to
unsigned types. Yet this is a risky game:
- it is vain to rely on any particular compiler behaviour.
- even hand optimized code may prove slower.
- unsigned types have its own lot of problems.
- integer arithmetic is not necessarily 2's complement.
- >> has implementation defined semantics (can we list architectures where it
differs from ia86 ?).
- the java unsigned shift operator >>> was not added to the C standard (beats me
why).
- casting a signed arithmetic type to its unsigned version cannot be done
generically, reliably and portably.
- casting an unsigned arithmetic type to its signed version cannot be done
generically at all.

Such a can of worms!

David Kastrup · Dec 8, 2004

James McIninch said:
<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

It better not. The first should do nothing (apart from which >>
should usually round towards negative infinity while / commonly rounds
towards zero).

... so you wouldn't expect any performance difference. Likewise, the
following should all generate exactly the same code:

i<<1;
i*=2;
i=i*2;

Again, the first should do nothing.

If your optimizer does not even do that, get another compiler.

David Kastrup · Dec 8, 2004

Ben Pfaff said:
You can implement i/3 as a multiplication followed by a bit
shift. In some cases this may be an improvement. However, if it
is an improvement, your compiler may already be doing it for you.

Why not just write

i *= 2863311531;

As long as i is multiple of 3, this will work fine on 32bit machines.

Ben Pfaff · Dec 8, 2004

David Kastrup said:
Why not just write

i *= 2863311531;

As long as i is multiple of 3, this will work fine on 32bit machines.

Because it is hideously obfuscatory and nonportable? It is a
much better choice to get a compiler with a smart optimizer.

=?ISO-8859-1?Q?=22Nils_O=2E_Sel=E5sdal=22?= · Dec 8, 2004

we usually use <<n as a cheaper option to multiplication when its
factor of
2^n right,

No. You leave this to the compiler.
Don't obscure things like this unless you have determined
it to be a bottleneck. A sane compiler will optimize this
in the best way anyways.

Lawrence Kirby · Dec 8, 2004

On Wed, 08 Dec 2004 09:09:03 +0100, Charlie Gordon wrote:

....

- the java unsigned shift operator >>> was not added to the C
standard (beats me why).

In C you can use >> on an unsigned integer type to achieve this. Java
doesn't have unsigned integer types so requires a different approach.

Lawrence

Richard Bos · Dec 8, 2004

David Kastrup said:
Why not just write

i *= 2863311531;

As long as i is multiple of 3, this will work fine on 32bit machines.

Not if i is a signed int, it won't - at least, it's not guaranteed to.
Signed overflow has undefined behaviour.

Richard

Chris Croughton · Dec 8, 2004

This is waste of characters and your time, let alone the reduced
readability of the code. In situations when the multiplier is a
compile-time constant, the compiler will do this optimization much
better than you will ever be able to. On many modern platforms in many
cases shifts are not the most efficient way to implement multiplication.
By obscuring your code with these shifts you are more likely to confuse
the compiler and impede real optimizations. The compiler has to be able
de-optimize your '(i << 4) + (i << 3)' into normal 'i * 24' and then
re-optimize it efficiently. Some compilers might not be able to do that,
leaving you with lousy shifts instead of really efficient multiplication.

Please get my cow-orkers to understand that, they criticise my code for
using "x * 2" and say that it should be "x << 1", I've had to fight to
get sensible things through. (Yes, there are times when using shifts is
clearer, like with bit field operations, but when it's an arithmetic
operation like finding the mean of two numbers it is nonobvious to use a
shift.)

Chris C

jacob navia · Dec 8, 2004

we usually use <<n as a cheaper option to multiplication when its
factor of
2^n right,

even for non 2^n say

ix24 ==> ix(16+8) ==> (i<<4)+(i<<3)

similarly divison 2^n is >>n

doubt is there a cheaper option to i/24 ==> i/(16+8) ==> ????????
or i/72.....

also are there other faster arithmetic alternatives like the >> and <<
for
div n multiplication

fORDGE

To divide by 24 the lcc-win32 compiler emits

movl 8(%ebp),%eax ; argument in eax
movl $715827883,%edx
movl %eax,%ecx ; copy eax since it will be destroyed
imull %edx ; multiply by the constant above
sarl $2,%edx ; shift result in eax:edx
sarl $31,%ecx
subl %ecx,%edx ; adjust
movl %edx,%eax ; result in edx moved to eax

You see?
No division at all.

In many cases it is not worth to try to outsmart the compiler in C
for this low level stuff...

And in general, optimizations for this level can't be done in C,
since it is a high level language that doesn't give you any access
to the registers of the machine anyway.

References:

This is the algorithm proposed by

Torbjorn Granlund and Peter L. Montgomery, "Division by Invariant
Integers using Multiplication", in Proceedings of the SIGPLAN PLDI'94
Conference, June 1994.

Dave · Dec 8, 2004

Chris said:
Please get my cow-orkers to understand that, they criticise my code for
using "x * 2" and say that it should be "x << 1", I've had to fight to
get sensible things through. (Yes, there are times when using shifts is
clearer, like with bit field operations, but when it's an arithmetic
operation like finding the mean of two numbers it is nonobvious to use a
shift.)

Chris C

Simply get them to show you experimental evidence that one is quicker
than the other. Or show them evidence that there is no difference. Or
compile via assembly and see what the compiler does with those two
statements.

Sounds to me like your organisation is placing an inappropriate value on
premature optimisation.

Xenos · Dec 8, 2004

Andrey Tarasevich said:
Highly unlikely in case of signed 'i'. In general case shift and
division mean two completely different things for signed values.

Not unlikely. There are a lot of processors that have signed preserved or
arithmetic shift operations.

Xenos · Dec 8, 2004

David Kastrup said:
It better not. The first should do nothing (apart from which >>
should usually round towards negative infinity while / commonly rounds
towards zero).

Any why should it do nothing?

Again, the first should do nothing.

Again?

Minimum Total Difficulty	0	Nov 15, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
String of digits, certain radix, perform division	5	Oct 25, 2008
All CRUD operations work except POST. Why?	2	May 28, 2023
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
[QUIZ] Long Division (#180)	6	Oct 17, 2008
Drawing missing in bitmap in a pure C win32 program	4	Jun 3, 2023
Help with code	0	Jun 12, 2022

>> to accelerate division

fordge

Mysidia

Ben Pfaff

James McIninch

Andrey Tarasevich

Andrey Tarasevich

E. Robert Tisdale

Charlie Gordon

Charlie Gordon

David Kastrup

David Kastrup

Ben Pfaff

=?ISO-8859-1?Q?=22Nils_O=2E_Sel=E5sdal=22?=

Lawrence Kirby

Richard Bos

Chris Croughton

jacob navia

Dave

Xenos

Xenos

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads