>> to accelerate division

F

fordge

we usually use <<n as a cheaper option to multiplication when its
factor of
2^n right,

even for non 2^n say

ix24 ==> ix(16+8) ==> (i<<4)+(i<<3)

similarly divison 2^n is >>n

doubt is there a cheaper option to i/24 ==> i/(16+8) ==> ????????
or i/72.....

also are there other faster arithmetic alternatives like the >> and <<
for
div n multiplication


fORDGE
 
M

Mysidia

Well, you could reduce it somewhat, but the divide wants to stick
around...

i/(24) == i/(6*4) == i/6/4 == i/(3*2)/4 == i/3/2/2/2
== (i / 3) >> 1 >> 1 >> 1

i / 24 == (i >> 3) / 3

If you can figure out a way of expressing i / 3 as bit shifts,
then you have an improvement.
But I am doubting there is a way, without doing the full divide.

-Jh
 
B

Ben Pfaff

Mysidia said:
If you can figure out a way of expressing i / 3 as bit shifts,
then you have an improvement.
But I am doubting there is a way, without doing the full divide.

You can implement i/3 as a multiplication followed by a bit
shift. In some cases this may be an improvement. However, if it
is an improvement, your compiler may already be doing it for you.

See chapter 10 of Warren, Jr., _Hacker's Delight_, Addison-Wesley
2003, ISBN 0-201-91465-4, for details.
 
J

James McIninch

<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

.... so you wouldn't expect any performance difference. Likewise, the
following should all generate exactly the same code:

i<<1;
i*=2;
i=i*2;

Also, depending on the hardware, 'i * 24' may be faster than '(i<<4) +
(i<<3)' ... assuming that your machine didn't optimize the code into the
same thing...
 
A

Andrey Tarasevich

we usually use <<n as a cheaper option to multiplication when its
factor of
2^n right,

even for non 2^n say

ix24 ==> ix(16+8) ==> (i<<4)+(i<<3)

similarly divison 2^n is >>n

This is waste of characters and your time, let alone the reduced
readability of the code. In situations when the multiplier is a
compile-time constant, the compiler will do this optimization much
better than you will ever be able to. On many modern platforms in many
cases shifts are not the most efficient way to implement multiplication.
By obscuring your code with these shifts you are more likely to confuse
the compiler and impede real optimizations. The compiler has to be able
de-optimize your '(i << 4) + (i << 3)' into normal 'i * 24' and then
re-optimize it efficiently. Some compilers might not be able to do that,
leaving you with lousy shifts instead of really efficient multiplication.
also are there other faster arithmetic alternatives like the >> and <<
for
div n multiplication

What is 'div n multiplication'? Anyway, the fastest way to multiply 'i'
by 24 is to write 'i * 24'.
 
A

Andrey Tarasevich

James said:
Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

Highly unlikely in case of signed 'i'. In general case shift and
division mean two completely different things for signed values.
Also, depending on the hardware, 'i * 24' may be faster than '(i<<4) +
(i<<3)' ... assuming that your machine didn't optimize the code into the
same thing...

Strictly speaking, it depends on the compiler, not on the hardware.
There's no direct correspondence between C operators and machine
language commands, i.e. the notion of "hardware-defined speed" is not
directly applicable to C operators.
 
E

E. Robert Tisdale

James said:
<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

Let's try that.
> cat f0.c
int f(int i) {
return i >> 1;
}
> gcc -Wall -std=c99 -pedantic -O2 -S f0.c
> cat f1.c
int f(int i) {
return i >>= 1;
}
> gcc -Wall -std=c99 -pedantic -O2 -S f1.c
> cat f2.c
int f(int i) {
return i/2;
}
> gcc -Wall -std=c99 -pedantic -O2 -S f2.c
> cat f3.c
int f(int i) {
return i /= 2;
}
> gcc -Wall -std=c99 -pedantic -O2 -S f3.c
> cat f4.c
int f(int i) {
return i = i/2;
}
> gcc -Wall -std=c99 -pedantic -O2 -S f4.c
> diff f0.s f1.s
1c1
< .file "f0.c"
---
> .file "f1.c"
> diff f1.s f2.s
1c1
< .file "f1.c"
---
> .file "f2.c" 10a11,13
> movl %eax, %edx
> shrl $31, %edx
> addl %edx, %eax
> diff f2.s f3.s
1c1
< .file "f2.c"
---
> .file "f3.c"
> diff f3.s f4.s
1c1
< .file "f3.c"
---
 
C

Charlie Gordon

E. Robert Tisdale said:
James said:
<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

Let's try that. [...]

of all 4 tests, only 2 really make sense.

int fshift(int i) { return i >> 1; }
int fdiv(int i) { return i / 2; }

gcc optimizes i / 2 using shifts, but generates different code from i >> 1:

namely on intel chips, signed division by 2 is equivalent to :

(i + (i said:
int f(int i) {
return i >> 1;
}

int f(int i) {
return i >>= 1;
}

No difference in semanticsObviously
 
C

Charlie Gordon

Charlie Gordon said:
E. Robert Tisdale said:
James said:
<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

Let's try that. [...]

of all 4 tests, only 2 really make sense.

int fshift(int i) { return i >> 1; }
int fdiv(int i) { return i / 2; }

gcc optimizes i / 2 using shifts, but generates different code from i >> 1:

namely on intel chips, signed division by 2 is equivalent to :

(i + (i < 0)) >> 1

Sorry for not clipping the rest of the quote, this post was submitted
erroneously by a keyboard shortcut I wasn't aware of.

i / 2 <=> (i + (i < 0)) >> 1 assuming a ton of things that are mostly true
on the ia86 family :
- 2's complement
- signed division truncates toward 0
- signed shift duplicates the sign bit...

However, given this expression, gcc will generate a test and 2 JMPs.
You have to be very verbose to make it generate the same code :

return (i + (signed)((unsigned)i >> ((sizeof(i) * CHAR_BIT - 1)))) >> 1;

or the alternative:

return (i + ((i >> ((sizeof(i) * CHAR_BIT - 1))) & 1)) >> 1;

And still this assumes i is of type int. The expression must change for other
signed integral types.

Furthermore, for other simple divisions by a power of 2, gcc will generate a
test and 2 JMPs for the / operator.
But will generate linear (and potentially faster) code for these convoluted
expressions :

dividing by 4:

return (i + ((i >> ((sizeof(i) * CHAR_BIT - 1))) & 3)) >> 2;

dividing by 8:

return (i + ((i >> ((sizeof(i) * CHAR_BIT - 1))) & 7)) >> 3;

etc.

As a conclusion, it is more important to write readable code (using / and %) and
give the compiler every opportunity for optimisation by using or casting to
unsigned types. Yet this is a risky game:
- it is vain to rely on any particular compiler behaviour.
- even hand optimized code may prove slower.
- unsigned types have its own lot of problems.
- integer arithmetic is not necessarily 2's complement.
- >> has implementation defined semantics (can we list architectures where it
differs from ia86 ?).
- the java unsigned shift operator >>> was not added to the C standard (beats me
why).
- casting a signed arithmetic type to its unsigned version cannot be done
generically, reliably and portably.
- casting an unsigned arithmetic type to its signed version cannot be done
generically at all.

Such a can of worms!
 
D

David Kastrup

James McIninch said:
<posted & mailed>

Keep in mind that your compiler will probably turn all of these statements
into the same code:

i>>1;
i/=2;
i=i/2;

It better not. The first should do nothing (apart from which >>
should usually round towards negative infinity while / commonly rounds
towards zero).
... so you wouldn't expect any performance difference. Likewise, the
following should all generate exactly the same code:

i<<1;
i*=2;
i=i*2;

Again, the first should do nothing.

If your optimizer does not even do that, get another compiler.
 
D

David Kastrup

Ben Pfaff said:
You can implement i/3 as a multiplication followed by a bit
shift. In some cases this may be an improvement. However, if it
is an improvement, your compiler may already be doing it for you.

Why not just write

i *= 2863311531;

As long as i is multiple of 3, this will work fine on 32bit machines.
 
B

Ben Pfaff

David Kastrup said:
Why not just write

i *= 2863311531;

As long as i is multiple of 3, this will work fine on 32bit machines.

Because it is hideously obfuscatory and nonportable? It is a
much better choice to get a compiler with a smart optimizer.
 
?

=?ISO-8859-1?Q?=22Nils_O=2E_Sel=E5sdal=22?=

we usually use <<n as a cheaper option to multiplication when its
factor of
2^n right,
No. You leave this to the compiler.
Don't obscure things like this unless you have determined
it to be a bottleneck. A sane compiler will optimize this
in the best way anyways.
 
L

Lawrence Kirby

On Wed, 08 Dec 2004 09:09:03 +0100, Charlie Gordon wrote:

....
- the java unsigned shift operator >>> was not added to the C
standard (beats me why).

In C you can use >> on an unsigned integer type to achieve this. Java
doesn't have unsigned integer types so requires a different approach.

Lawrence
 
R

Richard Bos

David Kastrup said:
Why not just write

i *= 2863311531;

As long as i is multiple of 3, this will work fine on 32bit machines.

Not if i is a signed int, it won't - at least, it's not guaranteed to.
Signed overflow has undefined behaviour.

Richard
 
C

Chris Croughton

This is waste of characters and your time, let alone the reduced
readability of the code. In situations when the multiplier is a
compile-time constant, the compiler will do this optimization much
better than you will ever be able to. On many modern platforms in many
cases shifts are not the most efficient way to implement multiplication.
By obscuring your code with these shifts you are more likely to confuse
the compiler and impede real optimizations. The compiler has to be able
de-optimize your '(i << 4) + (i << 3)' into normal 'i * 24' and then
re-optimize it efficiently. Some compilers might not be able to do that,
leaving you with lousy shifts instead of really efficient multiplication.

Please get my cow-orkers to understand that, they criticise my code for
using "x * 2" and say that it should be "x << 1", I've had to fight to
get sensible things through. (Yes, there are times when using shifts is
clearer, like with bit field operations, but when it's an arithmetic
operation like finding the mean of two numbers it is nonobvious to use a
shift.)

Chris C
 
J

jacob navia

we usually use <<n as a cheaper option to multiplication when its
factor of
2^n right,

even for non 2^n say

ix24 ==> ix(16+8) ==> (i<<4)+(i<<3)

similarly divison 2^n is >>n

doubt is there a cheaper option to i/24 ==> i/(16+8) ==> ????????
or i/72.....

also are there other faster arithmetic alternatives like the >> and <<
for
div n multiplication


fORDGE
To divide by 24 the lcc-win32 compiler emits


movl 8(%ebp),%eax ; argument in eax
movl $715827883,%edx
movl %eax,%ecx ; copy eax since it will be destroyed
imull %edx ; multiply by the constant above
sarl $2,%edx ; shift result in eax:edx
sarl $31,%ecx
subl %ecx,%edx ; adjust
movl %edx,%eax ; result in edx moved to eax

You see?
No division at all.

In many cases it is not worth to try to outsmart the compiler in C
for this low level stuff...

And in general, optimizations for this level can't be done in C,
since it is a high level language that doesn't give you any access
to the registers of the machine anyway.

References:

This is the algorithm proposed by

Torbjorn Granlund and Peter L. Montgomery, "Division by Invariant
Integers using Multiplication", in Proceedings of the SIGPLAN PLDI'94
Conference, June 1994.
 
D

Dave

Chris said:
Please get my cow-orkers to understand that, they criticise my code for
using "x * 2" and say that it should be "x << 1", I've had to fight to
get sensible things through. (Yes, there are times when using shifts is
clearer, like with bit field operations, but when it's an arithmetic
operation like finding the mean of two numbers it is nonobvious to use a
shift.)

Chris C

Simply get them to show you experimental evidence that one is quicker
than the other. Or show them evidence that there is no difference. Or
compile via assembly and see what the compiler does with those two
statements.

Sounds to me like your organisation is placing an inappropriate value on
premature optimisation.
 
X

Xenos

Andrey Tarasevich said:
Highly unlikely in case of signed 'i'. In general case shift and
division mean two completely different things for signed values.
Not unlikely. There are a lot of processors that have signed preserved or
arithmetic shift operations.
 
X

Xenos

David Kastrup said:
It better not. The first should do nothing (apart from which >>
should usually round towards negative infinity while / commonly rounds
towards zero).

Any why should it do nothing?
Again, the first should do nothing.
Again?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top