Population count of a bit string

J

jacob navia

Hi

The "population count" in a bit string should be the number of
bits set (1) in it. It is identical to the "accumulate" function
(The sum of all bits).

I have developed this code, but I am not really satisfied with
it, specifically

o portability
o compatibility with C89

I assume that an unsigned long is at least 32 bits wide.

Thanks in advance for suggestions/comments.

static unsigned bitcount(uintmax_t x)
{
if (64 == sizeof(x)*CHAR_BIT) {
x -= (x>>1) & 0x5555555555555555UL; // 0-2 in 2 bits
x = ((x>>2) & 0x3333333333333333UL) + (x & 0x3333333333333333UL); // 0-4 in 4 bits
x = ((x>>4) + x) & 0x0f0f0f0f0f0f0f0fUL; // 0-8 in 8 bits
x *= 0x0101010101010101UL;
return x>>56;
}
else { // Assume that 32 bit unsigned long exists.
x -= (x>>1) & 0x55555555UL; // 0-2 in 2 bits
x = ((x>>2) & 0x33333333UL) + (x & 0x33333333UL); // 0-4 in 4 bits
x = ((x>>4) + x) & 0x0f0f0f0fUL; // 0-8 in 8 bits
x *= 0x01010101UL;
return x>>24;
}
}


/* In C89 uintmax_t doesn't exist. I use: */
#ifndef NO_C99
#include <stdint.h>
#else
typedef unsigned long uintmax_t;
#endif
static uintmax_t PopulationCount(BitString *b)
{
uintmax_t *pumax,x,result=0;
size_t iterations;
int count;
unsigned char *p;

if (b == NULL || b->count == 0)
return 0;
/* Treat 32 or 64 bits at once */
pumax = (uintmax_t *)b->contents;
/* Since b->contents is the result of a malloc operation,
it must be aligned correctly for all types, including
uintmax_t
*/
iterations = b->count/(sizeof(x)*CHAR_BIT);
/* Position the pointer p to get the last few bytes
that are left once the uintmax_t pointer finishes.
*/
p = b->contents+(iterations*sizeof(x));
while (iterations > 0) {
x = *pumax++;
if (x != 0)
result += bitcount(x);
iterations--;
}
count = (b->count%(sizeof(x)*CHAR_BIT));
if (count) {
x=0;
/* count is NOT unsigned! This supposes that an
integer can hold CHAR_BIT-1 characters.
*/
while (count > 0) {
x = (x << CHAR_BIT) | *p++;
count -= CHAR_BIT;
}
result += bitcount(x);
}
return result;
}

---------------------------------------------------------------------------------
 
K

Keith Thompson

jacob navia said:
The "population count" in a bit string should be the number of
bits set (1) in it. It is identical to the "accumulate" function
(The sum of all bits).

I have developed this code, but I am not really satisfied with
it, specifically

o portability
o compatibility with C89

I don't have much to say about the algorithm at the time, but I do
have some minor comments.
I assume that an unsigned long is at least 32 bits wide.

There's no need to state this assumption; unsigned long is guaranteed
to be at least 32 bits wide.

I think you're ignoring the possibility of padding bits. That's not a
bad assumption to make, but you might want to make it explicit.

[...]
/* In C89 uintmax_t doesn't exist. I use: */
#ifndef NO_C99
#include <stdint.h>
#else
typedef unsigned long uintmax_t;
#endif

How is NO_C99 defined?

In principle, you should be able to test the value of
__STDC_VERSION__, but there could be non-C99 compilers that provide
<stdint.h> and/or have an unsigned type bigger than unsigned long.

Perhaps you could configure it by generating an executing a small test
program, much like autoconf does.
static uintmax_t PopulationCount(BitString *b)
{ [...]
/* count is NOT unsigned! This supposes that an
integer can hold CHAR_BIT-1 characters.
*/

Why isn't count unsigned?

You should rephrase that second sentence. First, I think you mean
"int" rather than "integer". Second, an int almost certainly can't
hold CHAR_BIT-1 characters; you probably mean that it can hold values
up to CHAR_BIT-1.

You might consider a compile-time assertion of some sort, in addition
to describing your assumptions in comments.

[...]
 
J

James Dow Allen

The "population count" in a bit string should be the number of
bits set (1) in it. It is identical to the "accumulate" function
(The sum of all bits).

This is a case where straightforward code will often be faster,
I think, than a bit-twiddling trick. Here's what I use:

/* Table for fast bit counting */
#define BC1(x) x, x+1, x+1, x+2
#define BC2(x) BC1(x), BC1(x+1), BC1(x+1), BC1(x+2)
#define BC3(x) BC2(x), BC2(x+1), BC2(x+1), BC2(x+2)
#define BC4(x) BC3(x), BC3(x+1), BC3(x+1), BC3(x+2)
#define BC5(x) BC4(x), BC4(x+1), BC4(x+1), BC4(x+2)
static char BitCnt[1 << 11] = {
BC5(0), BC5(1),
};

int ones32(Uint32 g)
{
return 0
+ BitCnt[g & 0x7ff]
+ BitCnt[g>>11 & 0x7ff]
+ BitCnt[g>>22 & 0x3ff];
}

This code assumes the argument to ones32 has 32 bits
exactly. Other cases left as exercise. :)

James Dow Allen
 
J

jacob navia

Keith Thompson a écrit :
I don't have much to say about the algorithm at the time, but I do
have some minor comments.


There's no need to state this assumption; unsigned long is guaranteed
to be at least 32 bits wide.

This is in C99, I do not know if C89 made this assumption.

I think you're ignoring the possibility of padding bits. That's not a
bad assumption to make, but you might want to make it explicit.
[...]

/* In C89 uintmax_t doesn't exist. I use: */
#ifndef NO_C99
#include <stdint.h>
#else
typedef unsigned long uintmax_t;
#endif

How is NO_C99 defined?

By a configurable define in the header file.
In principle, you should be able to test the value of
__STDC_VERSION__, but there could be non-C99 compilers that provide
<stdint.h> and/or have an unsigned type bigger than unsigned long.

Perhaps you could configure it by generating an executing a small test
program, much like autoconf does.
static uintmax_t PopulationCount(BitString *b)
{ [...]
/* count is NOT unsigned! This supposes that an
integer can hold CHAR_BIT-1 characters.
*/

Why isn't count unsigned?

Because I test for count > 0 !!!
If you have a count of bits left of (say) 5, subtracting 8 will give
a negative number. I test for that. If it would be unsigned the test would
not work.
 
K

Keith Thompson

Keith Thompson said:

And it's not an assumption, it's a requirement. If an implementation
has unsigned long narrower than 32 bits, that implementation does not
conform either to C89 or to C99.
 
M

Michael Foukarakis

Hi

The "population count" in a bit string should be the number of
bits set (1) in it. It is identical to the "accumulate" function
(The sum of all bits).

I have developed this code, but I am not really satisfied with
it, specifically

o portability
o compatibility with C89

I assume that an unsigned long is at least 32 bits wide.

Thanks in advance for suggestions/comments.

static unsigned bitcount(uintmax_t x)
{
        if (64 == sizeof(x)*CHAR_BIT) {
            x -=  (x>>1) & 0x5555555555555555UL;                                // 0-2 in 2 bits
            x  = ((x>>2) & 0x3333333333333333UL) + (x & 0x3333333333333333UL);  // 0-4 in 4 bits
            x  = ((x>>4) + x) & 0x0f0f0f0f0f0f0f0fUL;                           // 0-8 in 8 bits
            x *= 0x0101010101010101UL;
            return   x>>56;
        }
        else { // Assume that 32 bit unsigned long exists.
            x -=  (x>>1) & 0x55555555UL;                        // 0-2 in 2 bits
            x  = ((x>>2) & 0x33333333UL) + (x & 0x33333333UL);  // 0-4 in 4 bits
            x  = ((x>>4) + x) & 0x0f0f0f0fUL;                   // 0-8 in 8 bits
            x *= 0x01010101UL;
            return  x>>24;
        }

}

The only minor quibble with that code is that if the bit string is a
representation of a signed quantity, maintaining the sign could be a
mistake - depending on the semantics of the situation.

Other than that, it seems quite OK. C89 required unsigned long
integers to be at least 32-bits wide, so you're in the clear there.
Portability though is a different issue, you're likely to encounter
weird issues there.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,023
Latest member
websitedesig25

Latest Threads

Top