fast templated bitfield finder

shaun roe · Dec 27, 2007

For a bit of seasonal festive fun, I thought I'd try making a bitfield
function, i.e. a function returning, for example, the value of bits 1 to
5 of a word as an integer, when the exact bits are known at compile
time.
The data word is always 32 bit unsigned integer (its a data stream from
some embedded controller).

The legacy code I have has lines like:
errNum = (dataWord>>1) & 31; //bits 1 to 5 is an error code

and what I would aim for is something like:
errNum=errBits(word);

the catch is that speed is all important, so I thought maybe templates
could help me to compile in the bitfields. My feeble attempts are below;
they are both about 40 times slower than the existing code (using gcc 4
on a mac).

Can anyone show me a better solution, not using macros (I am considering
using a union of a bitfield struct and an unsigned int, though I might
have to worry about endianness and what happens when I go to a 64 bit
machine) ?

attempt a)

template <int BIT1, int BIT2, typename T>
inline
T getBitField2( const T word){
static const unsigned int bitLen = BIT2 - BIT1 + 1;
static const unsigned int mask = (1<<bitLen) -1;
return (word >> BIT1) & mask;
}

used like:
errNum = getBitField2<1,5>(word);

attempt b)

template <int MASKLENGTH>
inline
const int maskit(){
return (1<<MASKLENGTH)-1;
}

template<class NumericalType>
inline
NumericalType getBitField(const NumericalType thisNumber, unsigned int
bitStart, unsigned int bitEnd){
const unsigned int bitLength = bitEnd-bitStart+1;
const NumericalType bitMask = (1<<bitLength) -1;
return (thisNumber>>bitStart) & bitMask;
}

template <unsigned int BIT1, unsigned int BIT2>
class BitField{
private:
const unsigned int mask;
public:
BitField():mask(maskit<BIT2>()){

}
inline int
operator()(const unsigned int word) const{
return (word>> BIT1) & mask;
}
};

(b) is used like:
BitField<1,5> errBits;

errNum = errBits(word);

surprisingly (for me) both (a) and the function call in using (b) have
approximately the same dismal timing, when I thought that instantiating
the class first (in b) would hardwire the bit numbers, so that the
function call would be much quicker.

jkherciueh · Dec 27, 2007

shaun said:
For a bit of seasonal festive fun, I thought I'd try making a bitfield
function, i.e. a function returning, for example, the value of bits 1 to
5 of a word as an integer, when the exact bits are known at compile
time.
The data word is always 32 bit unsigned integer (its a data stream from
some embedded controller).

The legacy code I have has lines like:
errNum = (dataWord>>1) & 31; //bits 1 to 5 is an error code

and what I would aim for is something like:
errNum=errBits(word);

the catch is that speed is all important, so I thought maybe templates
could help me to compile in the bitfields. My feeble attempts are below;
they are both about 40 times slower than the existing code (using gcc 4
on a mac).

Can anyone show me a better solution, not using macros (I am considering
using a union of a bitfield struct and an unsigned int, though I might
have to worry about endianness and what happens when I go to a 64 bit
machine) ?

attempt a)

template <int BIT1, int BIT2, typename T>
inline
T getBitField2( const T word){
static const unsigned int bitLen = BIT2 - BIT1 + 1;
static const unsigned int mask = (1<<bitLen) -1;
return (word >> BIT1) & mask;
}

used like:
errNum = getBitField2<1,5>(word);

attempt b)

template <int MASKLENGTH>
inline
const int maskit(){
return (1<<MASKLENGTH)-1;
}

template<class NumericalType>
inline
NumericalType getBitField(const NumericalType thisNumber, unsigned int
bitStart, unsigned int bitEnd){
const unsigned int bitLength = bitEnd-bitStart+1;
const NumericalType bitMask = (1<<bitLength) -1;
return (thisNumber>>bitStart) & bitMask;
}

template <unsigned int BIT1, unsigned int BIT2>
class BitField{
private:
const unsigned int mask;
public:
BitField():mask(maskit<BIT2>()){

}
inline int
operator()(const unsigned int word) const{
return (word>> BIT1) & mask;
}
};

(b) is used like:
BitField<1,5> errBits;

errNum = errBits(word);

surprisingly (for me) both (a) and the function call in using (b) have
approximately the same dismal timing, when I thought that instantiating
the class first (in b) would hardwire the bit numbers, so that the
function call would be much quicker.

In (a) and (b) you compute the bitMask inside a function to be called at
run-time. What about:

template < unsigned int from_, unsigned int to_ >
struct bit_mask {

static unsigned long const from = from_;
static unsigned long const to = to_;

private:

static unsigned long const mask = ( 1 << ( to - from ) ) - 1;

public:

static
unsigned long get ( unsigned long word ) {
return ( ( word >> from ) & mask );
}

};

#include <iostream>

int main ( void ) {
std::cout << bit_mask<1,6>::get( 1024 + 16 + 8 + 4 ) << '\n';
}

Best

Kai-Uwe Bux

Dave Rahardja · Dec 28, 2007

In (a) and (b) you compute the bitMask inside a function to be called at
run-time. What about:

template < unsigned int from_, unsigned int to_ >
struct bit_mask {

static unsigned long const from = from_;
static unsigned long const to = to_;

private:

static unsigned long const mask = ( 1 << ( to - from ) ) - 1;

What do from and to mean, and what are their ranges? If I read your
code correctly, then from has the range 0..31 and to has the range
1..32. If so this statement will cause a potential overflow when to ==
32 and from == 0.

public:

static
unsigned long get ( unsigned long word ) {
return ( ( word >> from ) & mask );
}

};

#include <iostream>

int main ( void ) {
std::cout << bit_mask<1,6>::get( 1024 + 16 + 8 + 4 ) << '\n';
}

How bout...

// Constraint: T has to be an integer type.

// Generate a mask containing bitCount_ number of one bits in the LSB position
template <int bitCount_, typename T>
struct Mask
{
static const T value = (T(1) << bitCount_) |
Mask<bitCount_ - 1, T>::value;
};

template <typename T>
struct Mask<0, T>
{
static const T value = 1;
};

// bitCount_ - Number of bits we are interested in
// lsb_ - Position of least significant bit we are interested in
template <int bitCount_, int lsb_, typename T>
inline T bitMask(const T& val)
{
return (val >> lsb_) & Mask<bitCount_, T>::value;
}

#include <iostream>
int main()
{
std::cout << bitMask<5, 3>(0x14AF24AB) << std::endl; // "21"
}

-dr

shaun roe · Dec 29, 2007

How bout...

// Constraint: T has to be an integer type.

// Generate a mask containing bitCount_ number of one bits in the LSB position
template <int bitCount_, typename T>
struct Mask
{
static const T value = (T(1) << bitCount_) |
Mask<bitCount_ - 1, T>::value;
};

template <typename T>
struct Mask<0, T>
{
static const T value = 1;
};

// bitCount_ - Number of bits we are interested in
// lsb_ - Position of least significant bit we are interested in
template <int bitCount_, int lsb_, typename T>
inline T bitMask(const T& val)
{
return (val >> lsb_) & Mask<bitCount_, T>::value;
}

#include <iostream>
int main()
{
std::cout << bitMask<5, 3>(0x14AF24AB) << std::endl; // "21"
}

-dr

thanks, this definitely looks promising, and the recursive template is
sexy enough... and yet I find the same timing again as my own attempts;
I'm guessing that theres an 'inline' being ignored or some lazy
instantiation to overcome...
I'll give it another couple of goes to see what happens, maybe trying to
force an early instantiation

Erik WikstrÃ¶m · Dec 29, 2007

thanks, this definitely looks promising, and the recursive template is
sexy enough... and yet I find the same timing again as my own attempts;
I'm guessing that theres an 'inline' being ignored or some lazy
instantiation to overcome...
I'll give it another couple of goes to see what happens, maybe trying to
force an early instantiation

Instantiation of templates always occurs during compilation and can not
affect run- time performance.

Dave Rahardja · Dec 29, 2007

thanks, this definitely looks promising, and the recursive template is
sexy enough... and yet I find the same timing again as my own attempts;
I'm guessing that theres an 'inline' being ignored or some lazy
instantiation to overcome...
I'll give it another couple of goes to see what happens, maybe trying to
force an early instantiation

Compile with optimizations turned on, then examine the assembler
output. The invocation of bitMask<>() should turn into an inlined
shift-and-mask operation (on almost all microprocessors, it's a single
instruction). Can't get any more efficient than that!

-dr

shaun roe · Dec 30, 2007

Dave Rahardja said:
Compile with optimizations turned on, then examine the assembler
output. The invocation of bitMask<>() should turn into an inlined
shift-and-mask operation (on almost all microprocessors, it's a single
instruction). Can't get any more efficient than that!

-dr

doh! of course I was compiling in debug mode; optimizing made a LOT of
difference, many thanks!

specialize a template for bitset<N>	3	Mar 16, 2006
bitfield confusion	33	Jul 11, 2013
bitfield organization	7	May 27, 2010
Alternative approach to bitfields	9	Jul 23, 2012
The code below does not work with -O3 with gcc 4.5 or later.	1	Dec 15, 2012
Writing Data to Bit Field	31	Feb 27, 2006
Pointer to templated function	2	Apr 17, 2011
Drawing missing in bitmap in a pure C win32 program	4	Jun 3, 2023

fast templated bitfield finder

shaun roe

jkherciueh

Dave Rahardja

shaun roe

Erik WikstrÃ¶m

Dave Rahardja

shaun roe

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads