Fastest way to get minimum of four integer values?

anders.johansen · Apr 1, 2008

Hi,

I have a tight loop where I basically do this over and over (millions
of times):

int cell = <calculation>
const int above = <calculation>
if (cell > above) cell = above;
const int below = <calculation>
if (cell > below) cell = below;
const int trans = <calculation>
if (cell > trans) cell = trans;

Is there a faster way to do this?

Sincerely,
Anders

kasthurirangan.balaji · Apr 1, 2008

Hi,

I have a tight loop where I basically do this over and over (millions
of times):

int cell = <calculation>
const int above = <calculation>
if (cell > above) cell = above;
const int below = <calculation>
if (cell > below) cell = below;
const int trans = <calculation>
if (cell > trans) cell = trans;

Is there a faster way to do this?

Sincerely,
Anders

the if could be replaced like
cell=(cell > trans ? trans : cell);

By replacing i cannot say what would be the improvement. There are
other factors to be considered.
moreover, you haven't said anything about the <calculations>. you may
also want to consider running in parallel. all depends on what you are
trying to achieve, about which you haven't said much.

Thanks,
Balaji.

Martin · Apr 1, 2008

Hi,

I have a tight loop where I basically do this over and over (millions
of times):

int cell = <calculation>
const int above = <calculation>
if (cell > above) cell = above;
const int below = <calculation>
if (cell > below) cell = below;
const int trans = <calculation>
if (cell > trans) cell = trans;

Is there a faster way to do this?

Sincerely,
Anders

One obvious improvement is moving all <calculations> (or at least some
of them) out of the loop, if it's possible. If no - IMO there is no
considerable improvement.

Kai-Uwe Bux · Apr 1, 2008

Hi,

I have a tight loop where I basically do this over and over (millions
of times):

int cell = <calculation>
const int above = <calculation>
if (cell > above) cell = above;
const int below = <calculation>
if (cell > below) cell = below;
const int trans = <calculation>
if (cell > trans) cell = trans;

Is there a faster way to do this?

Not from an algorithmic point of view. To find the minimum of four
values will require computing all of them and three additional comparisons.

On the other hand, there could be bit-fiddling techniques to find the
minimum without using branch operations. This can lead to better pipelining
on some processors. See, for instance:

www.cellperformance.com/articles/2006/04/benefits_to_branch_elimination.html

YMMV

Best

Kai-Uwe Bux

Lionel B · Apr 1, 2008

the if could be replaced like
cell=(cell > trans ? trans : cell);

Pointless. That would be no faster, is not (to my mind) any clearer and
potentially involves a redundant self-assignment of cell to itself
(although an optimising compiler would probably not generate code for
this).

[...]

Yannick Tremblay · Apr 1, 2008

Pointless. That would be no faster, is not (to my mind) any clearer and
potentially involves a redundant self-assignment of cell to itself
(although an optimising compiler would probably not generate code for
this).

Agree, any decent compiler should generate the exact same machine code for:

cell = ( cell > trans ? trans : cell);

and

if( cell > trans )
cell = trans;
else
cell = cell;

The fact that the first one can be written is C on one line is of no
relevance to the compiler.

Yan

Puppet_Sock · Apr 1, 2008

I have a tight loop where I basically do this over and over (millions
of times):

int cell = <calculation>
const int above = <calculation>
if (cell > above) cell = above;
const int below = <calculation>
if (cell > below) cell = below;
const int trans = <calculation>
if (cell > trans) cell = trans;

Is there a faster way to do this?

Question: In the above psuedo code, the various <calculation>
entries use a lot of time, yes? That is, the various if-greater
tests are a tiny portion of the overall run-time, yes?
So, if the calcs happen every time through this loop, you
probably will find more opportunity for optimizing in those
calcs rather than the comparatively small potential in
the if-greater tests.

Also: Every time somebody starts asking about "make it faster"
I have a standard response I trot out. Where's your stop-watch?
Where's your performance spec? First measure how much time
various parts of the code take to run. Then look to your spec
to see if that is acceptable. If you've made the spec, relax!
And don't sweat the stuff that takes a small part of the time.
Socks

anders.johansen · Apr 1, 2008

First of all, thanks to all who answered.

I was kinda hoping somebody had a magic bit-twiddeling trick up their
sleeve for this. Never hurts to ask

More info follows and responses to some concrete questions.

Question: In the above psuedo code, the various <calculation>
entries use a lot of time, yes? That is, the various if-greater
tests are a tiny portion of the overall run-time, yes?
So, if the calcs happen every time through this loop, you
probably will find more opportunity for optimizing in those
calcs rather than the comparatively small potential in
the if-greater tests.

Actually the calculations are mainly table look-ups. It looks like
this:

int thismin = 100

<...>

<loop starts>

int cell = (acceptMatrix[false][source[i-1]][target[1]]) ?
distmatrix[i-1][1] : distmatrix[i-1][1]+1 ;
const int above (distmatrix[i-1][2]+1);
if (cell>above) cell=above;
const int left (distmatrix[1]+1);
if (cell>left) cell=left;
const int trans (
distmatrix[i-2][0] + 1 +
(!acceptMatrix[false][source[i-2]][target[1]]) +
(!acceptMatrix[true][source[i-1]][target[0]])
);
if (cell>trans) cell=trans;
distmatrix[2]=cell;
if (cell<thismin) thismin=cell;

<loop ends>

distmatrix is a 2-D array of int.

acceptmatrix is a 3D array of bool.

i and j are unsigned int loop indexes

source and target are const unsigned char * const

An additional question is whether the use of named const temporaries
is detrimental. Should I perhaps instead hand-inline the table
lookups, and let the compiler find the common expressions and
optimize, or...

I don't know how smart the compiler is, but I suspect "not too
bright". It's Borland C++ Builder version 6, so a 6 years old compiler
at best.

Also: Every time somebody starts asking about "make it faster"
I have a standard response I trot out. Where's your stop-watch?
Where's your performance spec? First measure how much time
various parts of the code take to run. Then look to your spec
to see if that is acceptable. If you've made the spec, relax!
And don't sweat the stuff that takes a small part of the time.
Socks

Click to expand...

The "stop-watch" is outside this algorithm, but currently this code
fragment dominates execution time. Changing the calculation of "trans"
from an if/then based structure to the current gobbledygook decreased
run time by approximately 15%, probably by eliminating a few branches
and a temporary or two. I compare the output to the original
implementation for a large dataset, so I am sure that behaviour was
not changed in any obvious way.I am also pursuing other ways to
decrease the data set that the algorithm iterates over, but that's an
entirely different discussion.

Of course I could just get some hotshot to hand-assy it for me, but
then I would be locked-in to that particular "magic" implementation.
Further, I doubt that solution would be as easy to hand-unroll as the
C/C++ version is.

Anders

Michael DOUBEZ · Apr 2, 2008

(e-mail address removed) a écrit :

First of all, thanks to all who answered.

I was kinda hoping somebody had a magic bit-twiddeling trick up their
sleeve for this. Never hurts to ask

More info follows and responses to some concrete questions.
[snip]

int thismin = 100

<...>

<loop starts>

int cell = (acceptMatrix[false][source[i-1]][target[1]]) ?
distmatrix[i-1][1] : distmatrix[i-1][1]+1 ;
const int above (distmatrix[i-1][2]+1);
if (cell>above) cell=above;
const int left (distmatrix[1]+1);
if (cell>left) cell=left;
const int trans (
distmatrix[i-2][0] + 1 +
(!acceptMatrix[false][source[i-2]][target[1]]) +
(!acceptMatrix[true][source[i-1]][target[0]])
);

Funny that you add integers and boolean.
What value do you expect from a conversion from bool to int ?
Or is the code incomplete ?

if (cell>trans) cell=trans;
distmatrix[2]=cell;
if (cell<thismin) thismin=cell;

<loop ends>

distmatrix is a 2-D array of int.

acceptmatrix is a 3D array of bool.

i and j are unsigned int loop indexes

Click to expand...

I see no j variable in your code.

source and target are const unsigned char * const

An additional question is whether the use of named const temporaries
is detrimental. Should I perhaps instead hand-inline the table
lookups, and let the compiler find the common expressions and
optimize, or...

I don't know how smart the compiler is, but I suspect "not too
bright". It's Borland C++ Builder version 6, so a 6 years old compiler
at best.

Click to expand...

In this case I suggest that you get a look at the assembly generated by
the compiler. And depending on the operations that seems consuming, you
might want to tinker a bit with compiler's directives.

Here is how I would rewrite the code you gave in order to minimize the
number of operation but I wouldn't expect to outsmart any modern compiler.

<loop starts>

int cell=distmatrix[i-1][1];
const char source_im1(source[i-1]); //reused later
const char target_1 (target[1]); //reused later
if(!acceptMatrix[false][src_im1][target_1])
{ //increment cell only in given case
++cell;
}

//change here to gain additions - might lose in memory load/unload
int cell2 (distmatrix[i-1][2]);
const int left (distmatrix[1]);
if (cell2>left) cell2=left;

int trans (distmatrix[i-2][0]);
if(!acceptMatrix[false][source[i-2]][target[1]])
{
++trans;
}
if(!acceptMatrix[true][source_im1][target_1])
{
++trans;
}
if (cell2>trans) cell2=trans;
//addition we add not done before
++cell2
if (cell>cell2) cell=cell2;

distmatrix[2]=cell;
if (cell<thismin) thismin=cell;

<loop ends>

Michael

anders.johansen · Apr 2, 2008

Hmmm,

How about this:

int cell = (acceptMatrix[false][source[i-1]][target[1]]) ?
distmatrix[i-1][1] : distmatrix[i-1][1]+1 ;
// Newval replaces above. All add one instrs. moved down
int newval (distmatrix[i-1][2]);
const int left (distmatrix[1]);
const int trans (
distmatrix[i-2][0] +
(!acceptMatrix[false][source[i-2]][target[1]]) +
(!acceptMatrix[true][source[i-1]][target[0]])
);
if (left < newval) newval = left;
if (trans < newval) newval = trans;
// Add one moved here
if (++newval < cell) cell=newval;
if (distmatrix[2]=cell < thismin) thismin=cell;

This saves two "+ 1" instructions, and replaces the remaining one with
a preincrement. The ++newval could (should?) be moved out of the if
statement for clearer code.

Daniel Pitts · Apr 6, 2008

Actually the calculations are mainly table look-ups. It looks like
this:

int thismin = 100

<...>

<loop starts>

int cell = (acceptMatrix[false][source[i-1]][target[1]]) ?
distmatrix[i-1][1] : distmatrix[i-1][1]+1 ;
const int above (distmatrix[i-1][2]+1);
if (cell>above) cell=above;
const int left (distmatrix[1]+1);
if (cell>left) cell=left;
const int trans (
distmatrix[i-2][0] + 1 +
(!acceptMatrix[false][source[i-2]][target[1]]) +
(!acceptMatrix[true][source[i-1]][target[0]])
);
if (cell>trans) cell=trans;
distmatrix[2]=cell;
if (cell<thismin) thismin=cell;

<loop ends>

Array lookups aren't terribly expensive, but they do have a cost. if
you can save a reference to distmatrix[i-1] (since its used in a few
places), that might shave a nanosecond or two. If you have nested
loops, look for calculations or references you can do outside of the
inner most loop (or outside of both loops).

For instance, if "i" is from your outer loop, you could "precalculate"
distmatrix, distmatrix[i-1], dismatrix[i-2], source[i-1],
source[i-2], etc...

Chances are, you'll have more luck finding a better algorithm than
you'll have shaving significant time off by optimizing this algorithm.

distmatrix is a 2-D array of int.

acceptmatrix is a 3D array of bool.

i and j are unsigned int loop indexes

source and target are const unsigned char * const

An additional question is whether the use of named const temporaries
is detrimental. Should I perhaps instead hand-inline the table
lookups, and let the compiler find the common expressions and
optimize, or...

Click to expand...

It shouldn't really hurt, they would probably compile to the same thing.

Martin York · Apr 7, 2008

Array lookups aren't terribly expensive, but they do have a cost. if
you can save a reference to distmatrix[i-1] (since its used in a few
places), that might shave a nanosecond or two.

Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Not sure I buy that. Could be wrong!

Of course we are getting down to machine specific behavior at this
point. But accessing a memory location via an array or reference would
be just as expensive as most memory access instructions (Its been a
while so things may have changed) allow a major pointer and and offset
as part of the same instruction.

Actually the expensive bit is not in the memory retrieval but in
loading the memory from main memory into the local cache. If your code
is getting cache misses the processor stall time for this is many
orders of magnitude greater than any theoretical difference between an
array verses reference read.

As a result I think any changes to code for a perceived code
performance increase must be significant before you consider change to
the code that makes it less readable (read maintainable) as any
performance increases on a particular data size may evaporate when
used on a new data size because of changes in cache traversal/usage.

ppi · Apr 7, 2008

Hi,

I have a tight loop where I basically do this over and over (millions
of times):

int cell = <calculation>
const int above = <calculation>
if (cell > above) cell = above;
const int below = <calculation>
if (cell > below) cell = below;
const int trans = <calculation>
if (cell > trans) cell = trans;

Is there a faster way to do this?

Sincerely,
Anders

you may get better results by helping the compiler filling up the
pipeline:

int a = cell > above ? above : cell;
int b = below > trans ? trans : below;
cell = a < b ? b : a;

ideally 'a' and can 'b' be computed efficiently: they are both
independent and cached in registers.
By doing so you provide the compiler with more information regarding
the computation, namely that his free to compute a or b whenever it
wants, in this order or not.
Keeping on re-assigning the same variable again and again just make
the program instructions code stuck in the pipeline.
If you can write something like the previous code, you should notice a
speed-up.
This is nice feature off powerpc and AMD/Intel pentium processors.

You may be interested by:
- http://www.agner.org/optimize/optimizing_cpp.pdf
- http://developer.amd.com/TechnicalArticles/Articles/pages/7162004127.aspx
- http://developer.amd.com/TechnicalArticles/Articles/pages/6212004126.aspx

Even if the last 2 links are from the amd website, they are still
relevant for general optimization.

Unfortunately this means that you do have a good compiler and that the
code/algorithm you use can actually make this possible.

Hope it will help,
-- paulo

James Kanze · Apr 8, 2008

Array lookups aren't terribly expensive, but they do have a cost. if
you can save a reference to distmatrix[i-1] (since its used in a few
places), that might shave a nanosecond or two.
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Click to expand...

Not sure I buy that. Could be wrong!

More to the point (and this applies to the last response by ppi
as well, concerning pipeline stall): the compiler knows this
better than you, and turning on optimization will normally
result in the best solution here. Compilers have been caching
array accesses and common subexpressions for several decades
now, and all modern compilers know about pipelines, memory
caches, etc., and take them into account when optimizing. All
such micro-optimizations do in source code is make it more
convoluted---more difficult for the human reader to understand,
and more difficult for the compiler to optimize.

anders.johansen · Apr 9, 2008

Hi,

Well, I can tell you that manual loop unrolling for one special case
gave a massive speedup, and it now runs in about 25% of the time it
used to. I may attempt the approach of helping the compiler bound the
life of temporaries by using curlies, but initial results indicate
that this is detrimental to performance. Next I will probably try to
eliminate named temporaries, on the assumption that the compiler is
able to identify the references easier withour the aliassing
introduced by named temps.

Still, so far I have gotten some nice speedups. They are not related
to the issue I brought up here, but the solutions are in many cases
inspired by the discussion, so you have all helped me in various
ways

Thank you all!

Anders

Minimum Total Difficulty	0	Nov 15, 2023
How to try a range of hex values in C# code ?	0	Nov 19, 2022
Outputting signal values to terminal Within Character Array	0	Dec 10, 2021
TF-IDF	1	Aug 19, 2021
How do I use Find and Loop in VBA for Excel to identify, delete, and insert blank row for values greater than 6?	0	Feb 28, 2022
New way of doing integer types	9	Jan 17, 2008
the fastest way to do an odd/even check?	6	Feb 1, 2007
Best way to access methods of Objects and its sub-Objects	8	Apr 26, 2013

Fastest way to get minimum of four integer values?

anders.johansen

kasthurirangan.balaji

Martin

Kai-Uwe Bux

Lionel B

Yannick Tremblay

Puppet_Sock

anders.johansen

Michael DOUBEZ

anders.johansen

Daniel Pitts

Martin York

ppi

James Kanze

anders.johansen

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads