B
Ben Bacarisse
bartc said:Ben Bacarisse said:A solution using such a de Bruijn sequence has already been posted[1].
On my x86_64 machine it is the fastest solution so far:
mine & and shift: 84ns
mod 37 + lookup: 43ns
de Bruijn multiply plus lookup: 19ns
All are compiler with optimisation (the function ended up inlined and
%37 was done as previouslt descibed).
The de Bruijn method omits the x & -x from the bithacks site since we
don't need it here:
int bit_set_4(uint32_t x)
{
static const int MultiplyDeBruijnBitPosition[32] = {
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
return MultiplyDeBruijnBitPosition[x * 0x077CB531U >> 27];
}
That's pretty good, about as fast (on my machine) as a function
containing just a single BSF inline instruction.
But it might be worth pointing out that this code I think only works
properly if there is one and only one bit set. For x==0, it gives the
same result as x==1, and for x's with extra bits set, it seems to go
wrong.
You mean pointed out more explicitly than my saying that I removed the
x & -x from the linked bithack because we don't need it in this
example? More explicitly than all the text up thread about "if you
really have only one bit set then..."?
BSF (find lowest set bit) also doesn't deal with x = 0, but gives
expected results on the rest.
To add to my timings:
gcc asm("bsf"): 14ns;