IF-free conception.

Ian Collins · Feb 16, 2013

BartC said:
That makes a change from the majority of open-source C code that appears to
be Unix-oriented, in its distribution format and tool-chains expected to be
used to build it, just for a start!

Sums up the platform cultures....

glen herrmannsfeldt · Feb 16, 2013

But *you're* the one making (rather vague) claims about increased
performance. It's up to you to support those claims. I, for one,
am not going to do that work for you.

I can believe that replacing conditional code with non-conditional
("if-free") code can sometimes result in performance improvements.
If so, modern optimizing compilers are sophisticated enough to
perform such tranformations themselves. Do they? (I don't know the
answer to that). And I think that speculative execution avoids at
least some of the performance problems of conditional code; have you
looked into whether that might make your if-less code unnecessary?

I believe that some now have conditional load, which allows for a
conditional without the execution path uncertainty of a branch.

IA32 has CMOVxx (where xx is the condition) which can replace some
branch instructions. It might not go all the way back to the 386,
and it might be that compilers generating 32 bit code still stay
with instructions from the 386 (at least by default).

Between branch prediction and speculative execution, much of the
cost of branches is reduced.

-- glen

glen herrmannsfeldt · Feb 16, 2013

(snip, I wrote)

CDC was the famous one. There is a story that when Seymour Cray
switched to 2's complement for the Cray-1 somebody asked him why he'd
used 1's before. He said he'd only just learned about 2's complement.

I did use some CDC machines in the 1980's, but I don't remember if
they had C compilers. About the time I learned C, I had enough other
computers to use.

Unisys produced ones complement machines more recently, but again
I don't know about C compilers.

-- glen

Evgeney Knyazhev · Feb 16, 2013

1st & foremost, my Friends, my dozen of Thanks for reply of Yours. at the 2nd moment, i'd like to put some things clear:

*my very goal to verify one idea or another.
*i'm too lazy to polish code (yes, i know it's extremely bad

).
*i'd like to move on Linux, but no time + too lazy again.
* branch prediction is very reason to slow processor down, so it's really good idea to avoid if-statements at some cases as much as possible. (heap sort has been one of such representative examples).
-----------------------
let's compare two type of codes in the heap sort.

IF-based:
------------------------------------
if(child+1<end){
if(a[child]<a[child+1])child++;
}

Evgeney Knyazhev · Feb 17, 2013

http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/

Good explanation of IF-Disaster

Ian Collins · Feb 17, 2013

Evgeney said:
1st & foremost, my Friends, my dozen of Thanks for reply of Yours. at the 2nd moment, i'd like to put some things clear:

*my very goal to verify one idea or another.
*i'm too lazy to polish code (yes, i know it's extremely bad ).
*i'd like to move on Linux, but no time + too lazy again.
* branch prediction is very reason to slow processor down, so it's really good idea to avoid if-statements at some cases as much as possible. (heap sort has been one of such representative examples).

If you are so sure, post two *working C* examples and your benchmark
results. Otherwise this looks like yet another attempt at a pointless
premature micro-optimisation.

Evgeney Knyazhev · Feb 17, 2013

Ian, ye have code -- you can do it yourself for yourself

i don't see any reason why i need to do it for you. http://stackoverflow.com/questions/...-a-sorted-array-faster-than-an-unsorted-array
learn the problem, learn the experiences & you'll figure out why i'm confident of it ;D

Ben Bacarisse · Feb 17, 2013

Evgeney Knyazhev said:
1st & foremost, my Friends, my dozen of Thanks for reply of Yours. at
the 2nd moment, i'd like to put some things clear:

*my very goal to verify one idea or another.

But you don't do any verification! In fact when asked if you have any
evidence to support your claims you tell the asker to do it. When I did
that (me) you say that the example I was foolish enough to waste my time
on was just for fun.

*i'm too lazy to polish code (yes, i know it's extremely bad ).
*i'd like to move on Linux, but no time + too lazy again.
* branch prediction is very reason to slow processor down, so it's
really good idea to avoid if-statements at some cases as much as
possible.

Why, then, did it not work for the code of yours that I tried?

(heap sort has been one of such representative examples).
----------------------- let's compare two type of codes in the heap
sort.

IF-based:
------------------------------------
if(child+1<end){
if(a[child]<a[child+1])child++;
}

Simple answer: you might need it for the code to work: your re-write is
not the same as the original. It is possible (likely, I'd say) that the
child + 1 < end test is there to prevent accessing a[child + 1] when
that element is out-of-bounds.

James Kuyper · Feb 17, 2013

---------------------------------
Well, so far the only results you've gotten is corrections to your
mistakes about C, expressions of disgust, reported timings that
contradict the supposed speed advantages, and redirection to more
appropriate windows/C++ forums. I haven't seen any positive expressions
of interest in the idea.

True, but this doesn't even come close. The costs to the clarity and
maintainability of the code are enormous. You've ignored requests to
provide test results measuring the supposed benefits, and other people
have produced test results that seem to suggest that your approach
actually makes things worse. Even if there were actual benefits, this
would constitute micro-optimization of the worst sort.

Evgeney Knyazhev · Feb 17, 2013

----------------------------------
Simple answer: you might need it for the code to work: your re-write is
not the same as the original. It is possible (likely, I'd say) that the
child + 1 < end test is there to prevent accessing a[child + 1] when
that element is out-of-bounds.

Ian Collins · Feb 17, 2013

Evgeney Knyazhev wrote:

Please fix your quoting!

Ian, ye have code -- you can do it yourself for yourself i don't
see any reason why i need to do it for you.
http://stackoverflow.com/questions/...-a-sorted-array-faster-than-an-unsorted-array

As the answer there says, at least two common compilers can optimise
away the branch (I tried gcc and the sort made no difference).

Prefer simple code over obscure hacks unless you have a very specific
need for one.

Evgeney Knyazhev · Feb 17, 2013

well, James, perhaps later i'll get gcc to re-write code to it, or shall doit on Java. about readability: high-performance code frequently ain't goodreadable. Automatic Optimization with such technique would be preferable, but AO is too closely related with ATP (automated theorem proving). to avoid IFs ain't easy-go Beast at many cases.

Evgeney Knyazhev · Feb 17, 2013

Ian, what a kind of optimization ye're saying?

James Kuyper · Feb 17, 2013

----------------------------------
Simple answer: you might need it for the code to work: your re-write is
not the same as the original. It is possible (likely, I'd say) that the
child + 1 < end test is there to prevent accessing a[child + 1] when
that element is out-of-bounds.
----------------------------------
Ben, it's the code from original source:

-----------------------------
dv=(child+1<end);
dv=(a[child]<a[child+1])*dv;
child+=dv;
----------------------------

The original code never accesses a[child+1] when child+1>=end. There are
other possibilities, but one of the most common reasons for writing code
like that is that for larger values of 'child', the expression
a[child+1] has undefined behavior, because it refers to a non-existent
element of an array.

Your re-write always evaluates the expression a[child+1], regardless of
the value of 'child', before multiplying by 0. Thus, it fails to avoid
the undefined behavior that the original version successfully avoided.

James Kuyper · Feb 17, 2013

well, James, perhaps later i'll get gcc to re-write code to it, or
shall do it on Java. about readability: high-performance code
frequently ain't good readable. Automatic Optimization with such
technique would be preferable, but AO is too closely related with ATP
(automated theorem proving). to avoid IFs ain't easy-go Beast at many
cases.

So far. several people have reported timing results suggesting that the
compilers they were using took the simple, clear, if-based code, and
automatically optimized it to run as fast as, or even faster than, your
obscure and unreadable if-free code. Do you know of any compiler for
which this is NOT the case? Could you identify that compiler, the
platform you were using it on, and the results of your timing tests?

Ian Collins · Feb 17, 2013

Evgeney said:
Ian, what a kind of optimization ye're saying?

Optimising your quoting stile would be a good start!

I'm saying the (completely off topic C++) code you referred to ran just
a fast with or without the sort when compiled with gcc -O3.

Which just goes to show that micro-optimisations are best left to the
compiler.

glen herrmannsfeldt · Feb 17, 2013

(snip on avoiding branches)

True, but this doesn't even come close. The costs to the clarity and
maintainability of the code are enormous. You've ignored requests to
provide test results measuring the supposed benefits, and other people
have produced test results that seem to suggest that your approach
actually makes things worse. Even if there were actual benefits, this
would constitute micro-optimization of the worst sort.

You might look at:

http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

Written by Linus Torvalds, it seems to indicate that CMOV isn't
all that useful compared to branches.

Well, more specifically, on the P4 it isn't all that good, but
then mispredicted branches aren't either. On Core 2, both CMOV
and branches are better.

The discussion relates to out-of-order execution, which is done
by many of the more recent processors, and the effect of different
instructions on it.

Seems that for a truly unpredicable branch that CMOV might be
better. (A test on the result of a random number generator.)

Compilers might not be so good at knowing which ones are more
predicatable and which less.

-- glen

88888 Dihedral · Feb 17, 2013

Evgeney Knyazhevæ–¼ 2013å¹´2æœˆ16æ—¥æ˜ŸæœŸå…UTC+8ä¸Šåˆ4æ™‚51åˆ†47ç§’å¯«é“ï¼š

Good Time to all, Amici(Friends).

Here, i'd like to discuss how to reduce conditional branches in the code.1st of the all, i would like to share some tricks. Their description is here (http://alg0z.blogspot.ru/2013/02/if-or-no-if.html).

I'll give an example that will work to reduce the if part.

a= a&1; // integer for reducing to the 0,1 system

Invoke functor[a] immediately.

a=a | 0xffffffff ;// 0, 0xffffffff system for 32 bits
a2 = a ^ 0xffffffff;
y=(a&z1)| ( a2&z2); // if a y=z1 else z2 ;;
// good for very slow branching HW only

Bart van Ingen Schenau · Feb 17, 2013

well, James, perhaps later i'll get gcc to re-write code to it, or shall
do it on Java. about readability: high-performance code frequently ain't
good readable. Automatic Optimization with such technique would be
preferable, but AO is too closely related with ATP (automated theorem
proving). to avoid IFs ain't easy-go Beast at many cases.

Hight performance code does not have to be unreadable.
In fact, with modern optimizing compilers, it is far from obvious to hand-
optimize code, because any optimization you make might inhibit the
compiler from performing further optimizations, in effect rendering the
result *less* optimized.

As a point in fact, here are some measurements of your if-less sort
algorithms:

Optimised version (-O3):
(C) 2013 Evgeney Knyazhev.

Start No-IF heapsort
No-IF heapsort: 100000 ticks (100.000000 msec)
Start qsort
qsort: 20000 ticks (20.000000 msec)
Start std::sort
std::sort: 20000 ticks (20.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 60000 ticks (60.000000 msec)
Start No-IF two-way tides
No-IF two-way tides: 122200000 ticks (122200.000000 msec)

Debug version (-O0):
(C) 2013 Evgeney Knyazhev.

Start No-IF heapsort
No-IF heapsort: 140000 ticks (140.000000 msec)
Start qsort
qsort: 20000 ticks (20.000000 msec)
Start std::sort
std::sort: 30000 ticks (30.000000 msec)
Start std::make_heap + std::sort_heap
std::make_heap + std::sort_heap: 100000 ticks (100.000000 msec)
Start No-IF two-way tides
No-IF two-way tides: 221790000 ticks (221790.000000 msec)

As you can see, the No-IF algorithms are the two slowest ones (the two-
way tides might even be outperformed by a bogo-sort).

The code that generated the results (removed most of the C++ specific
stuff to keep it on topic here):

// IF-free sorting.cpp : Defines the entry point for the console
application.
// (C) 2013 Evgeney Knyazhev
// app name: IF-free Alg0Z
// ver: I.0

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <algorithm>

void modExp(uint64_t mod, uint64_t *seed, uint64_t exp)
{
int root = 1;
uint64_t lseed = *seed;

for(int i = 0; exp >> (i + 1) > 0; i++)
{
if(((exp >> i) & 1) == 1)
{
root *= lseed;
root %= mod;
}
lseed = (lseed * lseed) % mod;
}
*seed = root;
return;
}

void pseudoRND(int64_t *arr, int num0fItems, uint64_t mod, uint64_t seed,
uint64_t exp)
{
int size0fT1 = sizeof(uint64_t) * 8 - 1;
uint64_t sign = 0;

for(int i = 0; i < num0fItems; i++)
{
modExp(mod, &seed, exp);
sign = (seed >> size0fT1);
arr = -1 * seed * sign + seed * ((sign + 1) & 1);
}
}

#define chosenT uint64_t

/*********************** HeapSort ********************************/
inline void swp(int64_t *p0, int64_t *p1)
{
int64_t empty;
empty = *p0;
*p0 = *p1;
*p1 = empty;
return;
}

inline int64_t le(int64_t *p0, int64_t *p1)
{
int64_t i1, i2, empty, *pEmpty;
pEmpty = ∅
i2 = (*p0 < *p1);
i1 = (i2 + 1) & 1;
p1 = (int64_t*)(i2 * (chosenT)p1 + i1 * (chosenT)pEmpty);
p0 = (int64_t*)(i2 * (chosenT)p0 + i1 * (chosenT)pEmpty);
empty = *p0;
*p0 = *p1;
*p1 = empty;
return i2;
}

inline int siftDown(int64_t *a, int root, int end)
{
int child;
int count = 0;
int64_t dv;
child = root << 1;
child++;
while(child < end)
{
dv = (child + 1 < end);
dv = (a[child] < a[child+1]) * dv;
child += dv;
dv = le(&a[root], &a[child]);
root = child;
end *= dv;
child = root << 1;
child++;
count++;
}
return count;
}

inline int Heapify(int64_t *a, int end)
{
int root = (end-1) >> 1;
int cnt = 0;
while(root > -1)
{
cnt += siftDown(a, root--, end);
}
return cnt;
}

inline int heapSort(int64_t *arr, int end)
{
int count = Heapify(arr, end);
while(end > 0)
{
swp(&arr[0], &arr[end - 1]);
count += siftDown(arr, 0, end-1);
end--;
}
return count;
}
/*********************** HeapSort ********************************/

int comp(const void* a, const void* b)
{
const int64_t* lhs = (const int64_t*)a;
const int64_t* rhs = (const int64_t*)b;
return lhs - rhs;
}

/****************** main ******************/

int main(int argc, char* argv[])
{
printf("(C) 2013 Evgeney Knyazhev.\n\n");

int size0fInt = sizeof(int64_t) * 8 - 1;
int N0fItems = 100000;
int bottom = 0;
int DownThe;
int UpThe = 1;
int i1;
int i2;
int64_t deltaV;
int64_t *array2sort;
int64_t *unsortedArr;
int64_t *p0;
int64_t *p1;
int64_t *pEmpty;
int64_t empty;
char StopThe = 0;
char GV; /*greater value*/
char unit = 0;
pEmpty = ∅
chosenT seed = 613;
chosenT exp = 259;
chosenT mod = 91477361;
chosenT countIteration;

array2sort = (int64_t*)malloc(N0fItems * size0fInt);
unsortedArr = (int64_t*)malloc(N0fItems * size0fInt);

if(array2sort == NULL || unsortedArr == NULL)
{
printf("Not enough memory for array\n");
return 0;
}
pseudoRND(unsortedArr, N0fItems, mod, seed, exp);
for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

DownThe = N0fItems - 1;
countIteration = 0;
bottom = 0;
StopThe = 0;

printf("Start No-IF heapsort\n");
clock_t start = clock();
countIteration = heapSort(array2sort,N0fItems);
clock_t stop = clock();
printf("No-IF heapsort: %ld ticks (%f msec)\n", stop-start, (1000.0*
(stop-start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start qsort\n");
start = clock();
qsort(array2sort, N0fItems, sizeof(array2sort[0]), comp);
stop = clock();
printf("qsort: %ld ticks (%f msec)\n", stop-start, (1000.0*(stop-start)/
CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start std::sort\n");
start = clock();
std::sort(array2sort, array2sort+N0fItems);
stop = clock();
printf("std::sort: %ld ticks (%f msec)\n", stop-start, (1000.0*(stop-
start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start std::make_heap + std::sort_heap\n");
start = clock();
std::make_heap(array2sort, array2sort+N0fItems);
std::sort_heap(array2sort, array2sort+N0fItems);
stop = clock();
printf("std::make_heap + std::sort_heap: %ld ticks (%f msec)\n", stop-
start, (1000.0*(stop-start)/CLOCKS_PER_SEC));

for(unsigned int p = 0; p < (unsigned)N0fItems; p++)
array2sort[p] = unsortedArr[p];

printf("Start No-IF two-way tides\n");
start = clock();
/***********************************
IF-free Sorting two-way tides
************************************/
while(StopThe != 1)
{
p1 = &array2sort[DownThe];
p0 = &array2sort[DownThe - 1];
deltaV = *p1 - *p0;
GV = ((uint64_t)deltaV) >> size0fInt;
if(*p0 > *p1)
i2 = 1;
i1 = (GV + 1) & 1;
i2 = GV & 1;
p1 = (int64_t*)(i2 * (chosenT)p1 + i1 * (chosenT)pEmpty);
p0 = (int64_t*)(i2 * (chosenT)p0 + i1 * (chosenT)pEmpty);
empty = *p1;
*p1 = *p0;
*p0 = empty;
unit = i2 + i1 * unit;
p1 = &array2sort[UpThe];
p0 = &array2sort[UpThe - 1];
deltaV = *p1 - *p0;
GV = ((uint64_t)deltaV) >> size0fInt;
i1 = (GV + 1) & 1;
i2 = GV & 1;
p1 = (int64_t*)(i2 * (chosenT)p1 + i1 * (chosenT)pEmpty);
p0 = (int64_t*)(i2 * (chosenT)p0 + i1 * (chosenT)pEmpty);
empty = *p1;
*p1 = *p0;
*p0 = empty;
unit = i2 + unit * i1;
UpThe++;
deltaV = UpThe - N0fItems;
GV = ((uint64_t)deltaV) >> size0fInt;
i1 = (GV + 1) & 1;
i2 = GV & 1;
bottom += i1;
UpThe = i2 * UpThe + i1 * bottom;
DownThe--;
/*deltaV=bottom-DownThe;
GV=((unsigned int)deltaV)>>size0fInt;
i1=(GV+1)&1;
i2=GV&1;*/
N0fItems -= i1;
DownThe = i2 * DownThe + i1 * (N0fItems - 1);
StopThe = i1 & (unit + 1);
unit &= i2;
countIteration++;
}
/**************************************
IF-free Sorting two-way tides
***************************************/
stop = clock();
printf("No-IF two-way tides: %ld ticks (%f msec)\n", stop-start,
(1000.0*(stop-start)/CLOCKS_PER_SEC));

return 0;
}

Bart v Ingen Schenau

Noob · Feb 17, 2013

glen said:
IA32 has CMOVxx (where xx is the condition) which can replace some
branch instructions. It might not go all the way back to the 386,

Indeed not.

Intel introduced CMOVcc with the Pentium Pro (so -march=i686 in gcc).

The availability of CMOVcc is signaled by bit 15 of EDX after
executing CPUID with EAX=1.

https://en.wikipedia.org/wiki/X86_instruction_listings#Added_with_Pentium_Pro
https://en.wikipedia.org/wiki/Processor_supplementary_capability#IA-32
http://www.sandpile.org/x86/opc_2.htm
http://www.sandpile.org/x86/cpuid.htm

Regards.

What should I do Before I give up programming?	6	Jan 14, 2023
a fast malloc/free implementation & benchmarks	0	Mar 20, 2011
CSS Grid in a nutshell (Part 1)	0	May 31, 2023
Optimization idea: put (y&1) in [] instead of if()	11	Nov 4, 2009
I don't know if im on the right forum but i think someone did something weird on my instagram	0	Jan 31, 2023
Help with Javascript on my wordpress page	1	Aug 31, 2023
There is something weird going on my instagram and i need to know if it was done on purpose by someone else	2	Jan 31, 2023
Free root access	1	Apr 1, 2009

IF-free conception.

Ian Collins

glen herrmannsfeldt

glen herrmannsfeldt

Evgeney Knyazhev

Evgeney Knyazhev

Ian Collins

Evgeney Knyazhev

Ben Bacarisse

James Kuyper

Evgeney Knyazhev

Ian Collins

Evgeney Knyazhev

Evgeney Knyazhev

James Kuyper

James Kuyper

Ian Collins

glen herrmannsfeldt

88888 Dihedral

Bart van Ingen Schenau

Noob

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads