Optimisation questions

R

Raj Pashwar

Hello :

My current program runs too slow. It uses x/8 and x%8 many times, where x
is an int in both cases.

I have replaced x/8 with x>>3, as this is faster (pretty sure).
However, I am not sure whether this is faster than x%8 :
x-=(x>>3)<<3;

Is this generally a good optimisation to use?

Also I've got a memory region that I constantly need to access, so I
would like it to be in cache. I know that I can use 'register' in
variable declarations, to speed up it's access. But if I have a pointer
to that mem. region and I declared it 'register' I will get faster access
to the pointer, not to the mem. region it points to, isn't it? What is
the way out of this paradox?

Thank You.
 
I

Ian Collins

Hello :

My current program runs too slow. It uses x/8 and x%8 many times, where x
is an int in both cases.

I have replaced x/8 with x>>3, as this is faster (pretty sure).
However, I am not sure whether this is faster than x%8 :
x-=(x>>3)<<3;

Is this generally a good optimisation to use?

No, any half decent compiler will be able to optimise x/8 without the
programmer resorting to hackery.
Also I've got a memory region that I constantly need to access, so I
would like it to be in cache. I know that I can use 'register' in
variable declarations, to speed up it's access. But if I have a pointer
to that mem. region and I declared it 'register' I will get faster access
to the pointer, not to the mem. region it points to, isn't it? What is
the way out of this paradox?

Not really, register is only a hit and it is pretty well redundant these
days. Et the optimiser do its job.

Memory regions will be cached when they are accessed.
 
B

BartC

Raj Pashwar said:
Hello :

My current program runs too slow. It uses x/8 and x%8 many times, where x
is an int in both cases.

I have replaced x/8 with x>>3, as this is faster (pretty sure).
However, I am not sure whether this is faster than x%8 :
x-=(x>>3)<<3;
Is this generally a good optimisation to use?

Try x&7 in place of x%8. The double-shift code above does something entirely
different (as well as modifying x).

But, unless your program does nothing except /8 and %8 operations, it's
unlikely to make much difference. And it's likely these simple optimisations
have already been done internally.

What sort of speed-up are you looking for?
 
E

Eric Sosman

Hello :

My current program runs too slow. It uses x/8 and x%8 many times, where x
is an int in both cases.

I have replaced x/8 with x>>3, as this is faster (pretty sure).

"Pretty sure?" What do your (ahem) measurements think?

Also, note that x/8 and x>>3 are not equivalent if x is
negative. The C language does not define x>>3 for negative x,
but you are likely to get either a very large positive result
with no obvious relation to "one eighth of x" or else a negative
result that is a little smaller than you'd like.

If you happen to know that x is non-negative, so x>>3 is
well-defined, consider sharing your knowledge with the compiler
by making x an unsigned int. If you do so and then write x/8,
it is very likely that the compiler will substitute a shift.
However, I am not sure whether this is faster than x%8 :
x-=(x>>3)<<3;

Neither am I. Again, what do your measurements think?
Is this generally a good optimisation to use?

Probably not. x&7 is tempting and certainly simpler, but
as with the shift it, too, has issues with negative x. Once again,
making x an unsigned int may enable the compiler to make this or
some other optimization.

Let's also note that most machines' integer division produces
both the quotient and remainder in one operation. If the compiler
sees x/8 and x%8 close together, with no intervening change to x,
it may well combine them into just one division and make use of both
the outputs. A simplistic "divide takes THIS long, remainder takes
THAT long, add the two" may not be appropriate.
Also I've got a memory region that I constantly need to access, so I
would like it to be in cache. I know that I can use 'register' in
variable declarations, to speed up it's access. But if I have a pointer
to that mem. region and I declared it 'register' I will get faster access
to the pointer, not to the mem. region it points to, isn't it? What is
the way out of this paradox?

Stop guessing and measure!
 
K

Keith Thompson

Raj Pashwar said:
My current program runs too slow. It uses x/8 and x%8 many times, where x
is an int in both cases.

I have replaced x/8 with x>>3, as this is faster (pretty sure).
However, I am not sure whether this is faster than x%8 :
x-=(x>>3)<<3;

Is this generally a good optimisation to use?

Also I've got a memory region that I constantly need to access, so I
would like it to be in cache. I know that I can use 'register' in
variable declarations, to speed up it's access. But if I have a pointer
to that mem. region and I declared it 'register' I will get faster access
to the pointer, not to the mem. region it points to, isn't it? What is
the way out of this paradox?

Are you enabling all optimization options in your compiler? For
example, if you're using gcc, are you using "-O3"? If not, that will
probably give you a better performance boost than any
micro-optimizations tweaks you might make in your source code.
 
K

Keith Thompson

Ian Collins said:
Not really, register is only a hit and it is pretty well redundant these
days. Et the optimiser do its job.

"hit" --> "hint"
"Et" --> "Let"
 
B

BartC

BartC said:
Try x&7 in place of x%8. The double-shift code above does something
entirely different (as well as modifying x).

Actually, x-((x>>3)<<3) does I think get the same answer (when the shifts
are the right way round..), but still, you're using 3 ops instead of 1:
x&=7;
 
J

James Kuyper

Hello :

My current program runs too slow. It uses x/8 and x%8 many times, where x
is an int in both cases.

I have replaced x/8 with x>>3, as this is faster (pretty sure).
However, I am not sure whether this is faster than x%8 :
x-=(x>>3)<<3;

As a general rule, your compiler should generate exactly the same code
for x/8 and x>>3, because they're required to have exactly the same
behavior. If it doesn't, get a better one. What that code will be may be
different on different machines, it might be called a division
instruction on one machine, and a shift instruction on a different
machine, but you can generally trust the compiler to know which one is
faster. If it doesn't, get a better one.
Is this generally a good optimisation to use?

Also I've got a memory region that I constantly need to access, so I
would like it to be in cache. I know that I can use 'register' in
variable declarations, to speed up it's access.

That's unlikely to be the case; if putting it in a register would speed
up, you can be reasonably sure the compiler already puts it in a
register, even if you don't specifically tell it to do so. If it didn't
put it in a register, it probably had a good reason for not doing so; in
that case, if it paid attention to your "register" declaration, it could
actually slow your program down. For precisely this reason, most
compilers ignore the keyword "register", and the way the standard is
written, this is perfectly acceptable behavior.
 
S

Stephen Sprunk

As a general rule, your compiler should generate exactly the same code
for x/8 and x>>3, because they're required to have exactly the same
behavior.

That only holds if x is non-negative. Since x is an int, the compiler
cannot safely make that assumption, so it cannot do strength reduction.

If it is possible to change x from int to unsigned int, that is another
story.

S
 
G

Geoff

Hello :

My current program runs too slow. It uses x/8 and x%8 many times, where x
is an int in both cases.

How slow is too slow? How did you measure it?
Where is your measurement data?

Debug build or release build?
Which compiler and what version are you using?
What is your target platform?
I have replaced x/8 with x>>3, as this is faster (pretty sure).
However, I am not sure whether this is faster than x%8 :
x-=(x>>3)<<3;

To be sure, you must measure.

What if x is negative? Are the results of x/8 vs. x>>3 the same across
your problem space?
Is this generally a good optimisation to use?

Generally, no.

Generally, the result can depend on the value range of x and the
architecture of your target. Generally, writing code this way is
called 'premature optimization'. Write code to say what you mean, let
the compiler do it's job regarding the details of the translation.
Obtain correct results first, then turn on optimizations and test the
code again.
Also I've got a memory region that I constantly need to access, so I
would like it to be in cache. I know that I can use 'register' in
variable declarations, to speed up it's access. But if I have a pointer
to that mem. region and I declared it 'register' I will get faster access
to the pointer, not to the mem. region it points to, isn't it? What is
the way out of this paradox?

Declaring 'register' is a hint to the compiler, it's free to disregard
it if it wants. A 'region' can never be in register, you are correct.
Any decent, state-of-the-art compiler will know more than you about
how to optimize for this. What compiler and what version are you
using?

The way out of this paradox is to select a state-of-the-art compiler
for your target platform. What compiler and version are you using?
 
J

James Kuyper

That only holds if x is non-negative.

You're right - the result is implementation-defined if x is negative.
For my purposes, that's just about as bad as undefined behavior, so for
precisely that reason I never use >> on values that could be negative.
As a result, I have a tendency to automatically assume that an
expression like x>>3 implies that the author is certain (justifiably or
not) that x is not negative. However, when questions at this level are
being asked, that's a bad assumption, and I shouldn't have made it.
 
K

Kaz Kylheku

You're right - the result is implementation-defined if x is negative.

Yes and the point is that this is what helps to optimize it better.
For my purposes, that's just about as bad as undefined behavior, so for

It is nowhere near as "bad". It must not be a failure, and it must be
documented by the implementation!
precisely that reason I never use >> on values that could be negative.

In C90, that means you would also not use / on values that could be negative,
because yu could get truncation toward infinity or zero, and
implementation-defined is just about as bad as undefined ...

Speaking of division, whenever I see x/y, I tend to think that the programmer
must be assuming that y is not zero. Oh, and whenever I see x + y,
it must be that the programmer assumed that the sum of x + y is not greater
than INT_MAX ...

Anyway, let's see how GCC (4.5.2, x86) weighs in on this:

int div8_0(int x)
{
return x >> 3;
}

int div8_1(int x)
{
return x / 8;
}

..globl div8_0
.type div8_0, @function
div8_0:
..LFB0:
.cfi_startproc
movl %edi, %eax
sarl $3, %eax
ret
.cfi_endproc
..LFE0:
.size div8_0, .-div8_0
.p2align 4,,15
..globl div8_1
.type div8_1, @function
div8_1:
..LFB1:
.cfi_startproc
leal 7(%rdi), %eax
testl %edi, %edi
cmovns %edi, %eax
sarl $3, %eax
ret
.cfi_endproc
..LFE1:


Two more instructions in the div case!

So the advice "use / 8; the compiler will generate the same code" is quite
obsolete. It was that way on some compilers for the C90 language.

(For instance, if division has truncation toward negative infinity semantics,
and the arithmetic bit shifter shifts through the sign and copies the sign,
then such a reduction can be made.)

The reduction can also be made if you somehow make make it obvious to the
compiler that the variable cannot be negative.

(Doing that by making it unsigned, however, has its downsides: like
introducing bugs due to the discontinuity in unsigned in the neighborhood
of zero, and surprising value changes on conversion.)
 
R

Raj Pashwar

Thanks for all the answers, x&8 looks like a good one for efficiency.

I also had another idea while sleeping - amazing how that helps :) At
program startup, maybe in another thread, I can build an array of all
x/8, x%8 for x's that might occur, then just look up in this array
instead of doing the division every time. I will try this today also.

Thanks
 
I

Ian Collins

Thanks for all the answers, x&8 looks like a good one for efficiency.

Efficiency for what? Did you actually read what we wrote about the
futility of such silly optimisations? Did you measure the performance
of your code?
 
Z

Zoltan Kocsi

Thanks for all the answers, x&8 looks like a good one for efficiency.

I also had another idea while sleeping - amazing how that helps :) At
program startup, maybe in another thread, I can build an array of all
x/8, x%8 for x's that might occur, then just look up in this array
instead of doing the division every time. I will try this today also.

Chances are, your table will be slower, especially if it is decent size.
A table look up means an address calculation and a memory fetch, which even
with superscalar architectures and large caches is likely to cost you more
than a single instruction register-register AND or shift.

By the way, why would you need to build the table in a different thread? To
waste memory and cycles?

You'd be better off with listening to what the others replied to your
question. Unless x must be signed, change it to unsigned and the compiler
will merrily optimise your division away.

Zoltan
 
N

Nick Keighley

Here's the most important thing about optimisation: Any attempt at
optimisation is completely pointless unless you measure first how much
time your program spends in which part of the program.

I known this is the received wisdom, if not the dogma. But it isn't
actually true. Yes it's a good idea to measure. But then pressing the
"run" button and counting heartbeats until the answer appears is a
measurement. And sometimes a good-enough one.

You /can/ improve the performance of a program without measuring all
the different bits. I know I've done it (tweaking a hash calculation
resulted in a massive performance improvement). Sometimes educated
guess-work /is/ good enough.

So, yes, the advice should read "any attempt at optimisation should be
guided by careful measurement of parts of the program". But your
"completly pointless" comment is just hyperbole that does more harm
than good.
 
N

Nick Keighley

My current program runs too slow.

how do you know?

It uses x/8 and x%8 many times, where x
is an int in both cases.

I have replaced x/8 with x>>3, as this is faster (pretty sure).

and? Does it go any faster. I'd bet not.
However, I am not sure whether this is faster than x%8 :
x-=(x>>3)<<3;

measure it
Is this generally a good optimisation to use?

dunno. What does your measurement tell you?
Also I've got a memory region that I constantly need to access, so I
would like it to be in cache. I know that I can use 'register' in
variable declarations, to speed up it's

that's "its"
access. But if I have a pointer
to that mem. region and I declared it 'register' I will get faster access
to the pointer, not to the mem. region it points to, isn't it? What is
the way out of this paradox?

I see no paradox. Look up what "paradox" means. It doesn't mean "I
wish the laws of physics were different"
 
B

BartC

I also had another idea while sleeping - amazing how that helps :) At
program startup, maybe in another thread, I can build an array of all
x/8, x%8 for x's that might occur, then just look up in this array
instead of doing the division every time. I will try this today also.

If the range of x is small enough (8 or 16 bits), that it would make the
table viable, then you don't need a separate thread to set it up. But if
it's so big (32 bits for example), then such a huge table will likely make
your machine grind to a halt.

And a table-lookup will almost certainly take longer than a shift or mask
operation.

But, x/8 and x%8 sound like you're doing something with bit pointers.
Perhaps post more of your code here, so you can get more helpful responses,
once we know what you're trying to do.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top