Rigorously speaking, I think we can say that every 8-bit entity
can be interpreted as an integral value in the range 0-255.
Furthermore, IF char is 8 bits on his implementation (which does
happen from time to time), then he can count on an unsigned char
taking values in the range 0-255. And IF, in addition, his
architecture uses 2's complement for negative values (not really
an exceptional case either), he can also count on char taking
values in the range -128 to 127.
And while there are two very big if's in there, the cases where
they don't hold are exceptional enough that I think he'd have
mentioned them if they didn't. For most programmers, they are
practical considerations only when one is striving for maximum
portability (or one is actually targeting one of the exotics).
I've been burning up the web looking for answers, and finding
(again) that my thinking outside the box is getting me into
trouble: I'm a techie. I look at a computer and I see its
innards; I look at (or imagine) a file on disk and I *know* it's
nothing but magnetic domains in N and S orientations:
Physically, all we have is magnetic domains and electric charge.
Neither of which is, strictly speaking, 0's and 1's, but with an
appropriate discriminator, both can be interpreted as such. Of
course, even at the hardware level, you rarely have access at
that low a level. The machines I use all have hardware which
organizes those bits into bytes and words (and half words, and
double words), and interprets the resulting objects in different
ways: unsigned binary integers, 2's complement binary integers,
BCD, characters (not very often any more---that's usually left
to software today), floating point values, etc., etc.
The closest you can come to the individual "bits" is usually
machine bytes or machine words, unsigned char or some unsigned
integral type in C++.
Formally, all integral types but char may have padding bits.
Practically, again, such cases are rare and exotic. Although at
least one machine in the fairly recent past still used a tagged
architecture---rather than having two different machine
instructions, add and fadd, for integral and floating point add,
it had one machine instruction, which interpreted the bits in
the word to determine the type. If the mantissa field was zero,
it was an integer, otherwise a floating point. Obviously, the
results of overwriting an "int" with random bits on this
machine would be interesting, to put it mildly: you could
easily end up with an "int" that, when multiplied by 2, gave 3.
(But only if the program contained undefined behavior
elsewhere.)
Unless you already have to deal with such an exotic, I'd say
that you're on pretty safe grounds assuming that unsigned int is
16/32/64 bits, and corresponds to the values of the individual
bytes put end to end. (In other words, for most people, the
preceding paragraph can be classed as historical trivia, of no
real relevance to their programming today.)
Of course, if portability is no issue, you can even assume that
int is 4 bytes, or whatever it happens to be on your machine.
Bits. Ones and zeroes.
My need is to check the quality of my random number generators
(C++), because I'm getting an odd bias in a method I'm using
in my program. (So, no, I'm not worried about portability;
this is a precision check that matters only to me.)
For me, the best possible scenario is to *know* the random
numbers I use are truly random, and no computer-pseudorandom
generator can give me those.
By definition. They're not supposed to, either.
The best, short of building my own from scratch, is to take
advantage of the free *true* RNGs online, and they all put out
linear strings of random binary data.
You don't necessarily have to go online. At least on Unix
systems, all you have to do is open "/dev/random". Note that on
most hardware, without a dedicated white noise generator, random
bits don't come quickly. The system stores a certain number of
them, and once you've read these, reading from /dev/random can
be *very* slow (a couple of seconds per byte). For this reason,
I tend to use /dev/random only for seeding my pseudo-random
generator. (Or for applications where I don't need many random
values, like generating include guards, e.g.:
guard1=${prefix}`basename "$filename" | sed -e 's:[^a-zA-Z0-9_]:_:g'`
guard2=`date +%Y%m%d`
guard3=`od -td2 -N 16 /dev/random | head -1 | awk '
    BEGIN {
        p = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        m = length( p )
    }
    {
        for ( i = 2 ; i <= NF ; ++ i ) {
            x = $i
            if ( x < 0 ) x += 65536
            printf( "%c", substr( p, (x%m)+1, 1 ) )
            x = int(x / m)
            printf( "%c", substr( p, (x%m)+1, 1 ) )
            x = int(x / m)
            printf( "%c", substr( p, (x%m)+1, 1 ) )
        }
    }
    END {
        printf( "\n" )
    }' `
guard=${guard1}_${guard2}${guard3}
# ...
echo "#ifndef $guard"
echo "#define $guard"
echo
# ...
echo "#endif"
.)
(Some parse them for you; I prefer the raw format.)
I saw articles by James Kanze and a few others, but nothing I
found pinned down the problem I now face, which is how to read
'x' number of binary bits from a file and simply treat them as
if they were a long integer.
Formally, or practically on most machines?
If you've read all of what I've written, you know that a large
part of my argument is based on the fact that there is no such
thing as "unformatted" data. Well, I was wrong: you've found
such a case---a string of random bits is about as unformatted as
you can get. In this case, if you want guaranteed perfect
portability (which you can't get anyway, since your random
number source isn't going to be available on all machines),
you'd read unsigned char, and assemble them into unsigned long
using shifts and ors (<< and |). Practically, however, you
probably don't care about byte order, and you almost certainly
don't have to worry about porting to a 36 bit 1's complement
machine, or some other such exotic; in this particular case, I'd
just declare an array of unsigned long, reinterpret_cast the
pointer to it to char*, and use istream::read. (Having opened
the file in binary mode, of course, and having imbued the
stream with std::locale::classic() before starting to read.)
(Also, I rather suspect that unsigned would be most appropriate
here. I don't know exactly what you are doing with the numbers
afterwards, but typically, if you're thinking of them in terms
of bits, then the unsigned integral types are more appropriate.
One less abstraction to deal with.)