array of bits

\\ · May 13, 2010

Hello,
I would like to implement array of bits using the appropriate data
type for the machine
in which the code is compiled. So, using 32 bits for 32 bits machines
(and os), and 64 bits
for 64 bits machines (and os).
How can I know which is the natural word size of the machine and do
all the rest?
Thanks,
tano

Tom St Denis · May 13, 2010

Hello,
I would like to implement array of bits using the appropriate data
type for the machine
in which the code is compiled. So, using 32 bits for 32 bits machines
(and os), and 64 bits
for 64 bits machines (and os).
How can I know which is the natural word size of the machine and do
all the rest?
Thanks,
tano

Make an array of unsigned int then use CHAR_BIT * sizeof(int) to
figure out how many bits are in it.

Tom

\\ · May 13, 2010

Well, I hope there is a better way.
I think that even a 64 bit machine could use 32 bit registers for
unsigned int.
In 64 bit machines, I would like to use 64 bit data types.
Since I have to to AND and OR operation between bitsets, I think there
is a
good point in using 64 bits when available.
Is it possible that there is no way to get if I am compiling under a
64 bit machine?

Dann Corbit · May 13, 2010

Hello,
I would like to implement array of bits using the appropriate data
type for the machine
in which the code is compiled. So, using 32 bits for 32 bits machines
(and os), and 64 bits
for 64 bits machines (and os).
How can I know which is the natural word size of the machine and do
all the rest?
Thanks,
tano

It's a FAQ:

20.8: How can I implement sets or arrays of bits?

A: Use arrays of char or int, with a few macros to access the
desired bit at the proper index. Here are some simple macros to
use with arrays of char:

#include <limits.h> /* for CHAR_BIT */

#define BITMASK(b) (1 << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))

(If you don't have <limits.h>, try using 8 for CHAR_BIT.)

References: H&S Sec. 7.6.7 pp. 211-216.

BTW, the book has a more detailed answer. If you want to actually
implement it as an array, you will have to use C++, because C does not
have operator overloading.

But if you are using C++ then it is already done for you in the STL
bitset template.

\\ · May 13, 2010

Yes, I read the FAQ.

My doubts are not how to do operations, but how can I choose
the most appropriate type for bitset.
I.e. choose a 32 bits type on a 32 bits machine,
and a 64 bits type on a 64 bits machine.
However thanks,
tano

Keith Thompson · May 13, 2010

\"\" said:
Well, I hope there is a better way.
I think that even a 64 bit machine could use 32 bit registers for
unsigned int.
In 64 bit machines, I would like to use 64 bit data types.
Since I have to to AND and OR operation between bitsets, I think there
is a
good point in using 64 bits when available.
Is it possible that there is no way to get if I am compiling under a
64 bit machine?

(Please find a way to format your paragraphs more sanely. The jagged
lines are difficult to read.)

What exactly is a "64 bit machine"? If you can rigorously define the
term (I can't), then you can probably figure out a way to determine
whether you're on one.

My best suggestion: Look at a variety of systems, and for each one,
see which integer type is the one you'd like to use. Ideally
unsigned int should be the best choice, since it's *supposed*
to have "the natural size suggested by the architecture of the
execution environment" (C99 6.2.5p5), but requirements for backward
compatibility with 32-bit systems often override this suggestion.
Some supposedly 64-bit systems even have 32-bit longs.

It may be that unsigned long will be the best choice on a wide
variety of systems, or perhaps one of the int*fast_t types defined in
<stdint.h> (for implementations that provide <stdint.h>). Or make it
a build-time configuration option, chosen manually for each target.

You should also profile your code with both 32-bit and 64-bit
integers on each platform. You might be surprised by the results.
I don't have any particular surpise in mind, I'm just saying that
you shouldn't make assumptions.

Ben Pfaff · May 13, 2010

\"\" said:
My doubts are not how to do operations, but how can I choose
the most appropriate type for bitset.
I.e. choose a 32 bits type on a 32 bits machine,
and a 64 bits type on a 64 bits machine.

I would probably just use "unsigned long". In the most common
cases it will have the sizes that you request, and in other cases
it will still work.

Eric Sosman · May 13, 2010

Yes, I read the FAQ.

My doubts are not how to do operations, but how can I choose
the most appropriate type for bitset.
I.e. choose a 32 bits type on a 32 bits machine,
and a 64 bits type on a 64 bits machine.

If you can define precisely what you mean by "32 bits machine"
and "64 bits machine," perhaps someone will think of a test you can
make to tell them apart. My bet is that a rigorous definition will
prove to be elusive -- but go ahead, give it a try, surprise me.

Keith Thompson · May 13, 2010

Datesfat Chicks said:
That depends on whether you want to do it at compile time or runtime.

At compile time ... other posters have suggested various limit files.

Also (and somebody will flame me for this--don't blame them), the standards
may (or may not) require a correlation between the way the preprocessor and
the compiler behave. So something like:

#if ((1 << 32) == 0)
... this is a 32-bit platform
#else
... this is a 64-bit platform
#endif

might (or might not) be required to work.

Why would anyone flame you for being unsure about this?

The actual requirement in C99 is that preprocessor arithmetic is
done in the equivalent of intmax_t and uintmax_t, so the above
won't do what you're trying to do. And since intmax_t must be at
least 64 bits, 1<<32 will never be 0 in the preprocessor. Even if
that weren't the case, shifting a 32-bit value by 32 or more bits
is undefined.

As far as at runtime ... the standard technique is to keep left-shifting "1"
as an unsigned integer and count how many times it takes to become 0, i.e.

unsigned how_many_bits(void)
{
unsigned mask=1, count=0;

while(mask)
{
mask <<= 1;
count++;
}

return(count);
}

This is much simpler:

#include <limits.h>
sizeof(int) * CHAR_BIT

Your function is likely to give a more useful answer if unsigned
int has padding bits, but I don't know of any implementation where
it does.

I'm gonna get so flamed ...

No, just corrected. (Some people can't tell the difference; I
trust you can.)

Ersek, Laszlo · May 13, 2010

If you can define precisely what you mean by "32 bits machine"
and "64 bits machine," perhaps someone will think of a test you can
make to tell them apart.

I believe the OP thinks of something like:

"64 bits machine": where uint64_t is supported, and the C compiler
generates machine instructions (or assembly code) that access such objects
as single machine words.

"32 bits machine": a machine -- which is not also a "64 bits machine" --
where uint32_t is supported, and where the C compiler generates machine
instructions (or assembly code) that access such objects as single machine
words.

I guess he's looking for the widest unsigned integer type that fits into a
machine register. Yes, I know. There is probably nothing in the standard
that would help us determine the answer portably. ("What's a register to
begin with?") The OP seems to ask for a portable guess that will work fast
on most sensible implemntations. I'd consider

#include <stdint.h> // SIZE_MAX
#include <limits.h> // UINT_MAX
#include <stddef.h> // size_t -- perhaps move this after the #if

#if SIZE_MAX > UINT_MAX
typedef size_t mytype;
#else
typedef unsigned mytype;
#endif

Cheers,
lacos

Keith Thompson · May 13, 2010

Ersek said:
I believe the OP thinks of something like:

"64 bits machine": where uint64_t is supported, and the C compiler
generates machine instructions (or assembly code) that access such objects
as single machine words.

"32 bits machine": a machine -- which is not also a "64 bits machine" --
where uint32_t is supported, and where the C compiler generates machine
instructions (or assembly code) that access such objects as single machine
words.

I think that nearly boils down to "64-bit machines have 64-bit machine
words, and 32-bit machines have 32-bit machine words". All that remains
is to define "machine word".

Which you've more or less already done: a machine word can be accessed
by a single instruction. But there's no portable way to detect that in
C.

I guess he's looking for the widest unsigned integer type that fits into a
machine register. Yes, I know. There is probably nothing in the standard
that would help us determine the answer portably. ("What's a register to
begin with?") The OP seems to ask for a portable guess that will work fast
on most sensible implemntations. I'd consider

#include <stdint.h> // SIZE_MAX
#include <limits.h> // UINT_MAX
#include <stddef.h> // size_t -- perhaps move this after the #if

#if SIZE_MAX > UINT_MAX
typedef size_t mytype;
#else
typedef unsigned mytype;
#endif

Why not just this?

typedef size_t mytype;

Your version uses unsigned int only when it and size_t are the same
size (so it probably doesn't make any difference) or when unsigned
int is bigger than size_t (which is possible but I've never heard
of it).

Size_t is probably the same size as a machine address register,
which is probably one "word" (whatever that means). It's not
really what size_t is intended for, but it looks like it could be
a reasonable heuristic.

Ben Pfaff · May 13, 2010

Keith Thompson said:
Which you've more or less already done: a machine word can be accessed
by a single instruction. But there's no portable way to detect that in
C.

I'm not sure that even such a definition is foolproof. For
example, I have heard that the IBM 360 architecture supports
decimal arithmetic of variable-length quantities in a single
instruction, but I doubt that considering such instructions would
give an answer that the OP would be comfortable with.

Ersek, Laszlo · May 14, 2010

[snip]

I guess he's looking for the widest unsigned integer type that fits
into a machine register. Yes, I know. There is probably nothing in the
standard that would help us determine the answer portably. ("What's a
register to begin with?") The OP seems to ask for a portable guess that
will work fast on most sensible implemntations. I'd consider

#include <stdint.h> // SIZE_MAX
#include <limits.h> // UINT_MAX
#include <stddef.h> // size_t -- perhaps move this after the #if

#if SIZE_MAX > UINT_MAX
typedef size_t mytype;
#else
typedef unsigned mytype;
#endif

Click to expand...

Why not just this?

typedef size_t mytype;

Your version uses unsigned int only when it and size_t are the same size
(so it probably doesn't make any difference) or when unsigned int is
bigger than size_t (which is possible but I've never heard of it).

Yes, I wanted to protect against a fortuitous "degenerate" case, in which
the size_t elements of the array would be constantly promoted when
manipulated, and then converted back when stored.

Thanks!
lacos

Lew Pitcher · May 14, 2010

I'm not sure that even such a definition is foolproof. For
example, I have heard that the IBM 360 architecture supports
decimal arithmetic of variable-length quantities in a single
instruction,

The AP ("Add Packed") instruction is one of the IBM360 "Storage/Storage"
machine language instructions. Consequently, each of the two operands can
be, independantly, between 1 and 256 bytes in length, representing numbers
with between 1 and 511 decimal digits. With this instruction, it *is*
possible to perform a decimal add between two packed-decimal values with
different lengths.

but I doubt that considering such instructions would
give an answer that the OP would be comfortable with.

No, Indeed, the IBM360 would give the OP nightmares, because it is
simultaneously an 8bit, a 16bit, a 32bit and a 64bit machine, which uses
24-bit and 31-bit addresses. Later versions would expand the hardware
address space to 63-bit addresses (note that none of the address types fit
exactly into the C uint*t space).

HTH

\\ · May 14, 2010

Ok,
it seems that the question is not simply to define
so I try to put it clear.

I have to use array of bits not only with operations
like set/reset a bit, but also to compare a piece of
an array with a piece of another array, with AND/OR
logical operations, so I would like to use the widest
type available on the machine the code is compiled in.
The widest type naturally supported, obviosly. If I
compile on a 32 bit system, and the compiler could
simulate 64 bit data types through multiple 32 bit
operations, I am not interested in 64, I should
choose 32.

I guess he's looking for the widest unsigned
integer type that fits into a machine register.
Yes!!

(beginner rule: don't do performance optimization;
advanced rule: don't do it *yet*).

Well, normally I am not interested in the trick of
the month to get faster code, but in this case, I
think that choosing the right type is even a way to
write better code.

I am studying your answers, however, and thank you
very much.
tano

Keith Thompson · May 14, 2010

\"\" said:
it seems that the question is not simply to define
so I try to put it clear.

I have to use array of bits not only with operations
like set/reset a bit, but also to compare a piece of
an array with a piece of another array, with AND/OR
logical operations, so I would like to use the widest
type available on the machine the code is compiled in.
The widest type naturally supported, obviosly. If I
compile on a 32 bit system, and the compiler could
simulate 64 bit data types through multiple 32 bit
operations, I am not interested in 64, I should
choose 32.

Well, normally I am not interested in the trick of
the month to get faster code, but in this case, I
think that choosing the right type is even a way to
write better code.

I am studying your answers, however, and thank you
very much.

My advice:

Use a single typedef that defines which unsigned type you want to use.
Other than that single declaration, write all your code so that it
doesn't care which type it's dealing with. It could even be unsigned
char.

Once you've got your code working, then worry about which type is more
efficient on a given system. You can easily tweak the typedef and
rebuild to do performance comparisons.

Don't assume that 32-bit integers are going to be faster than 64-bit
integer, or vice versa, until you've *measured* it.

robertwessel2 · May 14, 2010

The AP ("Add Packed") instruction is one of the IBM360 "Storage/Storage"
machine language instructions. Consequently, each of the two operands can
be, independantly, between 1 and 256 bytes in length, representing numbers
with between 1 and 511 decimal digits. With this instruction, it *is*
possible to perform a decimal add between two packed-decimal values with
different lengths.

Actually the decimal instructions allow two four bit lengths, and so
support operands of 1-16 bytes (and 1-31 decimal digits). Other "SS"
format instructions use the same field to represent a single eight bit
length instead.

No, Indeed, the IBM360 would give the OP nightmares, because it is
simultaneously an 8bit, a 16bit, a 32bit and a 64bit machine, which uses
24-bit and 31-bit addresses. Later versions would expand the hardware
address space to 63-bit addresses (note that none of the address types fit
exactly into the C uint*t space).

zArch extended the address space to 64, not 63 bits. And 24 and 32
bit addresses are almost always handled as a full (32 bit) register
(not that there haven't weren't cases of software taking pains to only
use three bytes to store a 24 bit address). And for the most part, at
the architectural level, and in user mode, the address size impacts
only where the address space wraps around (with the notable exception
of the two original S/360 subroutine call instructions, BAL and BALR,
which require a noteworthy semantic change when moving from 24 bit to
31 bit addressing - although subroutine call instructions without the
oddities of BAL and BALR have been in the ISA since the seventies).

\\ · May 14, 2010

Wow, it seems that C99 provides something that could help me
(quoted from libc manual):

If you want an integer with the widest range possible on
the platform on which it is being used, use one of the
following. If you use these, you should write code that
takes into account the variable size and range of the integer.

intmax_t
uintmax_t

Tell me if I'm wrong.

bart.c · May 14, 2010

"" said:
Wow, it seems that C99 provides something that could help me
(quoted from libc manual):

Tell me if I'm wrong.

On my machine, these are 64-bits, even though my processor is 32-bits (or
used in 32-bit mode anyway).

Seebs · May 14, 2010

Wow, it seems that C99 provides something that could help me
(quoted from libc manual):

Tell me if I'm wrong.

You're wrong.

See, intmax_t is, on many systems, very inefficient, because it's a larger
type than any of the hardware types.

-s

printing bits ... the right way	2	Apr 1, 2010
Shifting bits	31	Nov 29, 2011
32/64 bit cc differences	110	Jan 10, 2014
bits, representations in integers	9	Oct 9, 2008
Types of bits	9	May 20, 2012
the need for 64 bits	12	Dec 28, 2009
How to use 32 bits JRE on Windows 7 64 bits computer?	5	Jun 7, 2012
Finding number of bits of integer	29	Oct 18, 2007

array of bits

\\

Tom St Denis

\\

Dann Corbit

\\

Keith Thompson

Ben Pfaff

Eric Sosman

Keith Thompson

Ersek, Laszlo

Keith Thompson

Ben Pfaff

Ersek, Laszlo

Lew Pitcher

\\

Keith Thompson

robertwessel2

\\

bart.c

Seebs

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads