# Getting a range of bits from a bit vector

J

#### jacob navia

Hi

Suppose you have a bit vector composed of a descriptor and a sequence of
bits.
In descriptor->count you have the number of valid bits, and in
descriptor->contents you have a sequence of bits declared as
unsigned chars. The function "Create()" creates a new vector.

Does this function looks OK to you?

< begin code > ---------------------------------------------------
static BitString *GetRange(BitString *bs,size_t start,size_t end)
{
size_t t,len;
BitString *result;
size_t startbyte,endbyte,shiftamount,bytesToCopy,idx;

if (bs == NULL) {
NullPtrError("GetRange");

return NULL;
}
if (start >= bs->count)
return NULL;
if (start > end) {
t = start;
start = end;
end = t;
}
if (end >= bs->count)
end = bs->count;
len = end-start;
result = Create(len);
if (len == 0)
return result;
result->count = len;
startbyte = start/CHAR_BIT;
endbyte = end/CHAR_BIT;
shiftamount = start&(CHAR_BIT-1);
if (shiftamount == 0) {
/* Optimize this case. We can do just a memory move */
memmove(result->contents,bs->contents+startbyte,
endbyte - startbyte);
}
else {
/* Copy the first byte. Bring the first bit to be copied into
the position zero by shifting right all bits smaller than
the start index */
result->contents[0] = (bs->contents[startbyte] >> shiftamount);
bytesToCopy = (endbyte-startbyte);
idx = 1;
while (bytesToCopy) {
/* Put the low bits of the next byte into the correct
position of the last byte of the result */
unsigned b = (bs->contents[++startbyte] <<
(CHAR_BIT-shiftamount));
result->contents[idx-1] |= b;
/* Put the high bits now in the low position of the result */
b = (bs->contents[startbyte] >> (CHAR_BIT-shiftamount));
result->contents[idx] |= b;
bytesToCopy--;
idx++;
}
}
return result;
}
< end code > ---------------------------------------------------

I

#### Ike Naar

if (start >= bs->count)
return NULL;
if (start > end) {
t = start;
start = end;
end = t;
}

First, I do not like this kind of "fudging", and wonder if
the function would be cleaner if it just would require
``0 <= start <= end <= bs->count'' as a precondition,
returning NULL when the precondition is not satisfied.

Anyway, *if* you decide to swap start/end so that in the end
``start <= end'' holds,
then I think it's cleaner to put the ``start >= bs->count'' check
after the start/end swap, not before it.

Motivation: apparently ``start=10, end=5, bs->count=12'' is okay,
and equivalent to ``start=5, end=10, bs->count=12''.
Then it's a bit odd that ``start=10, end=5, bs->count=8'' would be
an error instead of being equivalent to the (acceptable)
``start=5, end=10, bs->count=8''.

I would also change the error check ``start >= bs-count''
to ``start > bs->count'' because I do not think it is an error to
extract the empty range from the empty bitvector, so the situation
start = end = bs->count = 0 should yield an empty bitvector, not
an error.

J

#### Juha Niskanen

if (start >= bs->count)
return NULL;

I would use assert() here and reserve NULL return value to indicate
potentially recoverable condition like memory exhaustion.
if (start > end) {
t = start;
start = end;
end = t;
}

Why not simply assert here instead of swapping endpoints? If caller can't
pass arguments in correct order, he might be doing something else wrong
as well. It would be best to detect potential programmer logic errors as
early as possible.
result = Create(len);
if (len == 0)
return result;

I think you meant

if (result == NULL)
return result;

Or if intent was to return an empty bitvector when len==0, how does Create()
fail when len!=0 and it cannot allocate memory?
result->count = len;

So the Create() did not fully construct a new bitvector after all? If it did,
then this assignment is redundant. If it did not, then you returned a partially
constructed empty bitvector earlier in len==0 case.
shiftamount = start&(CHAR_BIT-1);

Do you need to support machines where CHAR_BIT is not a power of two?

B

#### BartC

Suppose you have a bit vector composed of a descriptor and a sequence of
bits.
In descriptor->count you have the number of valid bits, and in
descriptor->contents you have a sequence of bits declared as
unsigned chars. The function "Create()" creates a new vector.

Does this function looks OK to you?
startbyte = start/CHAR_BIT;
endbyte = end/CHAR_BIT;
shiftamount = start&(CHAR_BIT-1);
if (shiftamount == 0) {
/* Optimize this case. We can do just a memory move */
memmove(result->contents,bs->contents+startbyte,
endbyte - startbyte);
}
else {
/* Copy the first byte. Bring the first bit to be copied into
the position zero by shifting right all bits smaller than
the start index */
....

Have you thought of having a Bit Vector where the first bit can start
anywhere within the first byte rather than at bit 0 or bit 7?

It sounds untidy, but it means no shifting is needed when extracting a
subrange, while it is also possible to point into the subrange of an
existing bit vector, without extracting.

However, operations between two bit vectors might be more awkward if they
are not aligned the same way, and an offset of 0..7 needs to be maintained
and accounted for in every indexing op.

So it depends on what sort of operations are more frequent. Perhaps
alignment can be an option.

J

#### jacob navia

Le 17/03/11 10:11, Ike Naar a écrit :
First, I do not like this kind of "fudging", and wonder if
the function would be cleaner if it just would require
``0<= start<= end<= bs->count'' as a precondition,
returning NULL when the precondition is not satisfied.

Yes, I thought of that but all functions in the library should be
"forgiving" in the sense that specifying a range from 8 to 3
is not really an error, more a matter of conventions that
Anyway, *if* you decide to swap start/end so that in the end
``start<= end'' holds,
then I think it's cleaner to put the ``start>= bs->count'' check
after the start/end swap, not before it.

I had it before the swap but then the problem arises that the
end could be larger than the length of the bitstring after the
swap, so I would need to do it again.
Motivation: apparently ``start=10, end=5, bs->count=12'' is okay,
and equivalent to ``start=5, end=10, bs->count=12''.
Then it's a bit odd that ``start=10, end=5, bs->count=8'' would be
an error instead of being equivalent to the (acceptable)
``start=5, end=10, bs->count=8''.

I would also change the error check ``start>= bs-count''
to ``start> bs->count'' because I do not think it is an error to
extract the empty range from the empty bitvector, so the situation
start = end = bs->count = 0 should yield an empty bitvector, not
an error.

You are right in this last point.

J

#### jacob navia

Le 17/03/11 11:06, Juha Niskanen a écrit :
I would use assert() here and reserve NULL return value to indicate
potentially recoverable condition like memory exhaustion.

Yes, actually I will issue an error ("Bad arguments") with the standard
way the library answers to errors. But since the return value is fixed
I *have* to return NULL.
Why not simply assert here instead of swapping endpoints? If caller can't
pass arguments in correct order, he might be doing something else wrong
as well. It would be best to detect potential programmer logic errors as
early as possible.

Well is it really an error to demand a range between 8 and 5?
Doesn't really look like an error to me.
I think you meant

if (result == NULL)
return result;

Or if intent was to return an empty bitvector when len==0, how does Create()
fail when len!=0 and it cannot allocate memory?

No, there is a misunderrstanding here. Create returns a sring that can
hold up to len bits. But the string is initially empty, i.e. count is
always zero. There are two things to keep in mind:

1) The capacity of the string (argument to create())
2) The actual number of bits in the strings, at creation time zero.
So the Create() did not fully construct a new bitvector after all? If it did,
then this assignment is redundant. If it did not, then you returned a partially
constructed empty bitvector earlier in len==0 case.

See above
Do you need to support machines where CHAR_BIT is not a power of two?

Surely not, sorry. Imagine a byte size of 13... That would break SO MANY
programs that I would not need to worry about.

J

#### James Waldby

.
startbyte = start/CHAR_BIT;
endbyte = end/CHAR_BIT;
shiftamount = start&(CHAR_BIT-1);
....

Even if you plan to ignore the case where CHAR_BIT isn't 2^n, you
still could compute shiftamount = start - startbyte * CHAR_BIT.

K

#### Keith Thompson

jacob navia said:
Le 17/03/11 10:11, Ike Naar a Ã©crit :

Yes, I thought of that but all functions in the library should be
"forgiving" in the sense that specifying a range from 8 to 3
is not really an error, more a matter of conventions that

Hmm. Personally, I'm not comfortable with that.

If I writes GetRange(bs, 8, 3), how likely is it that I really
want the range from bit 3 to bit 8? How much more likely is it
that I've made some kind of error (like assuming that the third
argument is a count), one that you've chosen not to diagnose?

IMHO, detecting and handling incorrect calls is nearly as important
as properly handling correct calls.

If you do want to swap the end points like this, I strongly suggest
documenting the behavior. And once people start depending on this,
you're stuck with the decision.

And a minor point: I'd make t local to the block. Actually,
I might write:

if (start > end) {
const size_t old_start = start;
start = end;
end = old_start;
}

(Not that there's anything wrong with the name "t".)

[...]

B

#### Barry Schwarz

Hi

Suppose you have a bit vector composed of a descriptor and a sequence of
bits.
In descriptor->count you have the number of valid bits, and in
descriptor->contents you have a sequence of bits declared as
unsigned chars. The function "Create()" creates a new vector.

Does this function looks OK to you?

< begin code > ---------------------------------------------------
static BitString *GetRange(BitString *bs,size_t start,size_t end)

Let bs->count == 24 and let bs->contents point to (or be an array of)
three 8-bit bytes holding the 24 bits. We can designate the 24 bits
as ABCDEFGH IJKLMNOP QRSTUVWX. If we call this function with
start set to 7 and end set to 21, I expect the result to be the 15-bit
sequence occupying two bytes HIJKLMNO PQRSTUV0.

I know that some architectures number the bits in a byte from right to
left but we are dealing with a artificial construct here (effectively
a bit array) and bit 7 should be "next to" bit 8, especially if we are
going to accept that not all bytes are 8-bits. On a 9-bit byte
system, the original data would look like ABCDEFGHI JKLMNOPQR
STUVWXzzz and the result should be HIJKLMNOP QRSTUV000.

The important point is, if you ignore the byte boundary, the sequence
of bits should be the same regardless of the value of CHAR_BIT.
{
size_t t,len;
BitString *result;
size_t startbyte,endbyte,shiftamount,bytesToCopy,idx;

if (bs == NULL) {
NullPtrError("GetRange");

return NULL;
}
if (start >= bs->count)
return NULL;
if (start > end) {
t = start;
start = end;
end = t;
}
if (end >= bs->count)
end = bs->count;
len = end-start;

You are missing a +1 here. The range 0 to 7 contains 8 bits, not 7.
result = Create(len);
if (len == 0)
return result;

Does Create() initialize the members of the newly allocated
descriptor? If not, result->count is unspecified and when the calling
function attempts to evaluate it the result is undefined. If it does,
it should set result->count to len and eliminate the need for the next
line.
result->count = len;
startbyte = start/CHAR_BIT;
endbyte = end/CHAR_BIT;
shiftamount = start&(CHAR_BIT-1);

Change to
shiftamount = start % CHAR_BIT;
to handle the case where CHAR_BIT is not a power of 2.
if (shiftamount == 0) {
/* Optimize this case. We can do just a memory move */
memmove(result->contents,bs->contents+startbyte,
endbyte - startbyte);

You are missing a +1 here also.
}
else {
/* Copy the first byte. Bring the first bit to be copied into
the position zero by shifting right all bits smaller than
the start index */
result->contents[0] = (bs->contents[startbyte] >> shiftamount);

I don't understand this statement at all. It stores bits that should
be discarded. In any event, continuing with the example, contents[0]
contains 0000000A or 0000000AB for the 8-bit and 9-bit cases.
bytesToCopy = (endbyte-startbyte);
idx = 1;
while (bytesToCopy) {
/* Put the low bits of the next byte into the correct
position of the last byte of the result */
unsigned b = (bs->contents[++startbyte] <<
(CHAR_BIT-shiftamount));

1 - In the first iteration, b contains JKLMNOP0 or JKLMNOP00.
2 - In the second iteration, b contains RSTUVWX0 or UVWXzzz00
result->contents[idx-1] |= b;

1 - And contents[0] now contains JKLMNOPA or LMNOPQRAB.
2 - And contents[1] now contains RSTUVWXH or UVWXzzzJK.
/* Put the high bits now in the low position of the result */
b = (bs->contents[startbyte] >> (CHAR_BIT-shiftamount));

1 - And b now contains 0000000H or 0000000JK
result->contents[idx] |= b;

1- And contents[1], if Create() initialized it to 0, contains the
same.
bytesToCopy--;
idx++;

2 - At the end of the second iteration contents now contains either
JKLMNOPA RSTUVWXH
or
LMNOPQRAB UVWXzzzJK
and I don't think either is what you intended.

K

#### Keith Thompson

jacob navia said:
Le 17/03/11 11:06, Juha Niskanen a Ã©crit :

Surely not, sorry. Imagine a byte size of 13... That would break SO MANY
programs that I would not need to worry about.

No argument there -- but you might consider enforcing this assumption
in your code, either with an assert or with a #error (if you can
figure out a way to determine at compile time whether CHAR_BIT is
a power of 2).

G

#### Gene

No argument there -- but you might consider enforcing this assumption
in your code, either with an assert or with a #error (if you can
figure out a way to determine at compile time whether CHAR_BIT is
a power of 2).

#if (CHAR_BIT & (CHAR_BIT - 1)) != 0
#error "CHAR_BIT is not a power of 2"
#endif

?

K

#### Keith Thompson

Gene said:
#if (CHAR_BIT & (CHAR_BIT - 1)) != 0
#error "CHAR_BIT is not a power of 2"
#endif

?

Yes, that's what I had in mind; I was just too lazy to look it up.

#ifndef CHAR_BIT
#error "CHAR_BIT is not defined (do you need #include <limits.h>?)"
#endif

Though I suppose it's just as easy to have "#include <limits.h>" in your
boilerplate as to have the above test.

I

#### ImpalerCore

No argument there -- but you might consider enforcing this assumption
in your code, either with an assert or with a #error (if you can
figure out a way to determine at compile time whether CHAR_BIT is
a power of 2).

\code
/*!
* \brief Assert a condition evaluated at compile time.
* \param name A descriptive name.
* \param expr The expression to evaluate.
*
* The motivation of an assertion that is evaluated at compile time
* is to declare requirements that are typically platform specific.
* For example, it is relatively common to find code that makes
* assumptions on the sizes of types. The \c C_STATIC_ASSERT macro
* can evaluate these conditions and force a compile time error if
* they are not true.
*
* \usage
* \include macros/C_STATIC_ASSERT_example.c
*
* On an embedded compiler, the integer type \c int is often 16-bits
* wide. In this scenario, the example above should display
* something similar to the following.
*
* \code
* C_STATIC_ASSERT_example.c:4: error: size of array
`sizeof_int_at_least_32_bits' is negative.
* \endcode
*/
#define C_STATIC_ASSERT(name, expr) extern char (name)[(expr) ? 1 :
-1]
\endcode

\code
#include <stdlib.h>
#include <limits.h>
#include <static_assert.h>

C_STATIC_ASSERT( CHAR_BIT_is_8_bits, CHAR_BIT == 8 );
C_STATIC_ASSERT( sizeof_int_at_least_32_bits, sizeof(int) >= 4 );

int main( void )
{
return EXIT_SUCCESS;
}
\endcode

To ensure that CHAR_BIT is a power of two, off the top of my head I'd
write something like:

C_STATIC_ASSERT( size_of_CHAR_BIT_supported,
CHAR_BIT == 8 || CHAR_BIT == 32 );

and add more terms representing supported powers-of-two as required if
I ran into other CHAR_BIT environments.

Best regards,
John D.

R

#### Ralf Damaschke

jacob navia said:
if (start >= bs->count)
return NULL;
if (start > end) {
t = start;
start = end;
end = t;
}

I second the comment from Ike Naar and I cannot follow your
objection.
Also, IMHO I would expect something slightly meaningful for a
documented start>end handling, such as reversing the bit string
selected (or of course getting an empty bit string).
if (shiftamount == 0) {
/* Optimize this case. We can do just a memory move */
memmove(result->contents,bs->contents+startbyte,
endbyte - startbyte);
}

The only reason I can imagine for not using memcpy would be a
common buffer with overlapping values that then of course change
all the time. Not a very useful concept (which you surely didn't
use).
while (bytesToCopy) {
/* Put the low bits of the next byte into the
correct
position of the last byte of the result */
unsigned b = (bs->contents[++startbyte] <<
(CHAR_BIT-shiftamount));
result->contents[idx-1] |= b;
/* Put the high bits now in the low position of the
result */
b = (bs->contents[startbyte] >>
(CHAR_BIT-shiftamount));

That shift width is inconsistent wrt to the layout derived by
contents[0].
result->contents[idx] |= b;

In some cases this one is probably off-by-one: consider a copy of
bits 1 to 8. Perhaps you decided to allocate one byte more than
needed for the sake of saving an additional check, but did you?

-- Ralf

B

#### BartC

Keith Thompson said:
Hmm. Personally, I'm not comfortable with that.

If I writes GetRange(bs, 8, 3), how likely is it that I really
want the range from bit 3 to bit 8? How much more likely is it
that I've made some kind of error (like assuming that the third
argument is a count), one that you've chosen not to diagnose?

If you've made an error, then you will get wrong results.

Specifying a bit range both ways is useful: some tend to use 0 to 31 for
generic bit arrays, and 31 to 0 for bits within a word (which also matches
the perceived order of the bits, with msb on the left, and lsb on the
right). This is what I did when I had a similar sort of library.

Of course, specifying 8 to 3 might also mean you want 3 to 8, but reversed;
but here I'd be happier with a specific reversal function.

Or, allowing 8 to 3 as a synonym for 3 to 8 could be an option.

K

#### Keith Thompson

BartC said:
If you've made an error, then you will get wrong results.

If it's practical (and sometimes it isn't), I'd much rather get an
explicit error indicatin than wrong results.
Specifying a bit range both ways is useful: some tend to use 0 to 31 for
generic bit arrays, and 31 to 0 for bits within a word (which also matches
the perceived order of the bits, with msb on the left, and lsb on the
right). This is what I did when I had a similar sort of library.

Of course, specifying 8 to 3 might also mean you want 3 to 8, but reversed;
but here I'd be happier with a specific reversal function.

Or, allowing 8 to 3 as a synonym for 3 to 8 could be an option.

As I said, my own preference is to require the arguments to be
given in a well-defined order. But I can deal with any consistent
behavior, as long as it's clearly documented.

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.