Rigorously speaking, I think we can say that every 8-bit entity
can be interpreted as an integral value in the range 0-255.
Furthermore, IF char is 8 bits on his implementation (which does
happen from time to time), then he can count on an unsigned char
taking values in the range 0-255. And IF, in addition, his
architecture uses 2's complement for negative values (not really
an exceptional case either), he can also count on char taking
values in the range -128 to 127.
And while there are two very big if's in there, the cases where
they don't hold are exceptional enough that I think he'd have
mentioned them if they didn't. For most programmers, they are
practical considerations only when one is striving for maximum
portability (or one is actually targeting one of the exotics).
I've been burning up the web looking for answers, and finding
(again) that my thinking outside the box is getting me into
trouble: I'm a techie. I look at a computer and I see its
innards; I look at (or imagine) a file on disk and I *know* it's
nothing but magnetic domains in N and S orientations:
Physically, all we have is magnetic domains and electric charge.
Neither of which is, strictly speaking, 0's and 1's, but with an
appropriate discriminator, both can be interpreted as such. Of
course, even at the hardware level, you rarely have access at
that low a level. The machines I use all have hardware which
organizes those bits into bytes and words (and half words, and
double words), and interprets the resulting objects in different
ways: unsigned binary integers, 2's complement binary integers,
BCD, characters (not very often any more---that's usually left
to software today), floating point values, etc., etc.
The closest you can come to the individual "bits" is usually
machine bytes or machine words, unsigned char or some unsigned
integral type in C++.
Formally, all integral types but char may have padding bits.
Practically, again, such cases are rare and exotic. Although at
least one machine in the fairly recent past still used a tagged
architecture---rather than having two different machine
instructions, add and fadd, for integral and floating point add,
it had one machine instruction, which interpreted the bits in
the word to determine the type. If the mantissa field was zero,
it was an integer, otherwise a floating point. Obviously, the
results of overwriting an "int" with random bits on this
machine would be interesting, to put it mildly: you could
easily end up with an "int" that, when multiplied by 2, gave 3.
(But only if the program contained undefined behavior
elsewhere.)
Unless you already have to deal with such an exotic, I'd say
that you're on pretty safe grounds assuming that unsigned int is
16/32/64 bits, and corresponds to the values of the individual
bytes put end to end. (In other words, for most people, the
preceding paragraph can be classed as historical trivia, of no
real relevance to their programming today.)
Of course, if portability is no issue, you can even assume that
int is 4 bytes, or whatever it happens to be on your machine.
Bits. Ones and zeroes.
My need is to check the quality of my random number generators
(C++), because I'm getting an odd bias in a method I'm using
in my program. (So, no, I'm not worried about portability;
this is a precision check that matters only to me.)
For me, the best possible scenario is to *know* the random
numbers I use are truly random, and no computer-pseudorandom
generator can give me those.
By definition. They're not supposed to, either.
The best, short of building my own from scratch, is to take
advantage of the free *true* RNGs online, and they all put out
linear strings of random binary data.
You don't necessarily have to go online. At least on Unix
systems, all you have to do is open "/dev/random". Note that on
most hardware, without a dedicated white noise generator, random
bits don't come quickly. The system stores a certain number of
them, and once you've read these, reading from /dev/random can
be *very* slow (a couple of seconds per byte). For this reason,
I tend to use /dev/random only for seeding my pseudo-random
generator. (Or for applications where I don't need many random
values, like generating include guards, e.g.:
guard1=${prefix}`basename "$filename" | sed -e 's:[^a-zA-Z0-9_]:_:g'`
guard2=`date +%Y%m%d`
guard3=`od -td2 -N 16 /dev/random | head -1 | awk '
    BEGIN {
        p = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        m = length( p )
    }
    {
        for ( i = 2 ; i <= NF ; ++ i ) {
            x = $i
            if ( x < 0 ) x += 65536
            printf( "%c", substr( p, (x%m)+1, 1 ) )
            x = int(x / m)
            printf( "%c", substr( p, (x%m)+1, 1 ) )
            x = int(x / m)
            printf( "%c", substr( p, (x%m)+1, 1 ) )
        }
    }
    END {
        printf( "\n" )
    }' `
guard=${guard1}_${guard2}${guard3}
# ...
echo "#ifndef $guard"
echo "#define $guard"
echo
# ...
echo "#endif"
.)
(Some parse them for you; I prefer the raw format.)
I saw articles by James Kanze and a few others, but nothing I
found pinned down the problem I now face, which is how to read
'x' number of binary bits from a file and simply treat them as
if they were a long integer.
Formally, or practically on most machines?
If you've read all of what I've written, you know that a large
part of my argument is based on the fact that there is no such
thing as "unformatted" data. Well, I was wrong: you've found
such a case---a string of random bits is about as unformatted as
you can get. In this case, if you want guaranteed perfect
portability (which you can't get anyway, since your random
number source isn't going to be available on all machines),
you'd read unsigned char, and assemble them into unsigned long
using shifts and ors (<< and |). Practically, however, you
probably don't care about byte order, and you almost certainly
don't have to worry about porting to a 36 bit 1's complement
machine, or some other such exotic; in this particular case, I'd
just declare an array of unsigned long, reinterpret_cast the
pointer to it to char*, and use istream::read. (Having opened
the file in binary mode, of course, and having imbued the
stream with std::locale::classic() before starting to read.)
(Also, I rather suspect that unsigned would be most appropriate
here. I don't know exactly what you are doing with the numbers
afterwards, but typically, if you're thinking of them in terms
of bits, then the unsigned integral types are more appropriate.
One less abstraction to deal with.)