Bit-Pattern of Representation of Objects


Robbie Hatley

I was struggling to come up with a way to discern the actual
bit patterns of the representations of C++ objects (esp.
objects of small built-in types), and I came up with the
following mess. But I'm wondering, is there an easier way
to do this? This seems so klunky.

#include <iostream>
#include <cmath>
template<typename T>
void Binary(T const & object)
{
size_t size = 8 * sizeof(object); // Size of object in bits.
unsigned long long int mask =
static_cast<unsigned long long int>
(pow(2.0, static_cast<double>(size - 1)) + 0.1);
unsigned long long int pattern =
*reinterpret_cast<const unsigned long long int*>(&object);
for( ; mask > 0 ; mask >>= 1)
{
if(pattern & mask) std::cout << "1";
else std::cout << "0";
}
std::cout << std::endl;
return;
}


--
Cheers,
Robbie Hatley
East Tustin, CA, USA
lone wolf intj at pac bell dot net
(put "[usenet]" in subject to bypass spam filter)
http://home.pacbell.net/earnur/
 

Alf P. Steinbach

* Robbie Hatley:
I was struggling to come up with a way to discern the actual
bit patterns of the representations of C++ objects (esp.
objects of small built-in types), and I came up with the
following mess. But I'm wondering, is there an easier way
to do this? This seems so klunky.

#include <iostream>
#include <cmath>
template<typename T>
void Binary(T const & object)
{
size_t size = 8 * sizeof(object); // Size of object in bits.

Use CHAR_BITS or whatever it's called: a byte isn't necessarily 8 bits.

unsigned long long int mask =

C++ does not have a 'long long' type, yet.

static_cast<unsigned long long int>
(pow(2.0, static_cast<double>(size - 1)) + 0.1);

Use left shift operator.

unsigned long long int pattern =
*reinterpret_cast<const unsigned long long int*>(&object);

You don't know that sizeof(object) <= sizeof(long long).

for( ; mask > 0 ; mask >>= 1)
{
if(pattern & mask) std::cout << "1";
else std::cout << "0";

std::cout << !!(pattern & mask);

}
std::cout << std::endl;
return;

Unnecessary 'return'.
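For comparison, here is a minimal sketch of Robbie's function with Alf's suggestions folded in: CHAR_BIT instead of 8, a left shift instead of pow(), a compile-time size check, and no stray `return`. It assumes C++11, where `unsigned long long` and `static_assert` are standard. The name `BinaryString` and the choice to return a `std::string` rather than print are illustrative, not from the thread. Copying exactly sizeof(T) bytes with `std::memcpy` replaces the `reinterpret_cast`, which could read past the end of a small object. Bytes come out in memory order, most significant bit first within each byte.

```cpp
#include <climits>   // CHAR_BIT, UCHAR_MAX
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy
#include <string>

// Hypothetical helper: Robbie's mask-and-shift approach with Alf's fixes.
template<typename T>
std::string BinaryString(T const & object)
{
    // The whole object must fit in one unsigned long long.
    static_assert(sizeof(T) <= sizeof(unsigned long long),
                  "object is too large for this approach");

    // CHAR_BIT, not 8: a byte is not necessarily 8 bits.
    std::size_t const size = CHAR_BIT * sizeof(T);   // size in bits

    // Copy exactly sizeof(T) bytes instead of reading through a
    // reinterpret_cast'ed unsigned long long pointer.
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &object, sizeof(T));

    // Pack the bytes in memory order; the first byte becomes most significant.
    unsigned long long pattern = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
    {
        if (i) pattern <<= CHAR_BIT;   // no shift before the first byte
        pattern |= bytes[i];
    }

    // A left shift replaces pow(); (size - 1) is below the pattern's width.
    std::string bits;
    for (unsigned long long mask = 1ULL << (size - 1); mask > 0; mask >>= 1)
        bits += (pattern & mask) ? '1' : '0';
    return bits;
}
```

`std::cout << BinaryString(x) << '\n';` prints the pattern; an object larger than an `unsigned long long` is rejected at compile time instead of being silently truncated.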
 

Frederick Gotham

Robbie Hatley posted:
I was struggling to come up with a way to discern the actual
bit patterns of the representations of C++ objects


Here's some code I wrote recently if you're interested:

#include <cstddef>
#include <cassert>
#include <climits>
#include <ostream>

void PrintBits(void const * const mem,size_t amount_bytes,std::ostream &os)
{
assert( mem );
assert( amount_bytes );

char static str[CHAR_BIT + 1] = {};

unsigned char const *p = reinterpret_cast<unsigned char const*>(mem);

do
{
unsigned const byte_val = *p++;

char *pos = str;

unsigned to_and_with = 1U << CHAR_BIT - 1;

do *pos++ = byte_val & to_and_with ? '1' : '0';
while(to_and_with >>= 1);

os << str;

} while (--amount_bytes);
}


template<class T>
inline void PrintObjBits(T const &obj,std::ostream &os)
{
PrintBits(&obj,sizeof obj,os);
}


#include <iostream>

int main()
{
long double array[4] = { 241.126, 632.225, 2662.2523, 23345.2352 };

PrintObjBits(array,std::cout);
}
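Note that `PrintBits` walks the bytes in memory order, so for a multi-byte integer the printed pattern depends on the machine's byte order. The sketch below makes this visible by formatting an object's bytes the same way, most significant bit first within each byte, but collecting them in a string with a space between bytes. The helper name `BytesInMemoryOrder` is hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: same byte-by-byte, MSB-first walk as PrintBits,
// but returning the result so it can be inspected.
template<class T>
std::string BytesInMemoryOrder(T const &obj)
{
    unsigned char bytes[sizeof obj];
    std::memcpy(bytes, &obj, sizeof obj);

    std::string out;
    for (std::size_t i = 0; i != sizeof obj; ++i)
    {
        for (unsigned mask = 1U << (CHAR_BIT - 1); mask != 0; mask >>= 1)
            out += (bytes[i] & mask) ? '1' : '0';
        if (i + 1 != sizeof obj) out += ' ';   // separate the bytes
    }
    return out;
}
```

On a little-endian machine such as x86, `BytesInMemoryOrder(1u)` shows the single set bit at the end of the first byte; on a big-endian machine it is the very last character.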
 

Pete Becker

Robbie said:
I was struggling to come up with a way to discern the actual
bit patterns of the representations of C++ objects (esp.
objects of small built-in types), and I came up with the
following mess. But I'm wondering, is there an easier way
to do this? This seems so klunky.

Copy the object into a suitably sized array of unsigned char, then dump
the unsigned chars in hex. If you really need binary rather than hex,
the conversion is a fairly simple exercise.
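Pete's suggestion can be sketched as follows. `HexDump` and `BinDump` are hypothetical names. The binary conversion he calls a simple exercise is handled here by `std::bitset<CHAR_BIT>::to_string()`, which writes the most significant bit first. The hex field width of 2 is only a minimum, chosen with 8-bit bytes in mind.

```cpp
#include <bitset>
#include <climits>
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Copy the object into an array of unsigned char, then print each byte in hex.
template<class T>
std::string HexDump(T const &obj)
{
    unsigned char buf[sizeof obj];
    std::memcpy(buf, &obj, sizeof obj);   // well-defined byte copy

    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (std::size_t i = 0; i != sizeof obj; ++i)
    {
        if (i) os << ' ';
        // Cast so the byte prints as a number, not as a character.
        os << std::setw(2) << static_cast<unsigned>(buf[i]);
    }
    return os.str();
}

// Same copy, but each byte rendered in binary via std::bitset.
template<class T>
std::string BinDump(T const &obj)
{
    unsigned char buf[sizeof obj];
    std::memcpy(buf, &obj, sizeof obj);

    std::string out;
    for (std::size_t i = 0; i != sizeof obj; ++i)
    {
        if (i) out += ' ';
        out += std::bitset<CHAR_BIT>(buf[i]).to_string();   // MSB first
    }
    return out;
}
```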
 

Robbie Hatley

Alf P. Steinbach said:
Use CHAR_BITS or whatever it's called: a byte isn't necessarily 8 bits.

"CHAR_BIT". True. However, the vast majority of the world's computers,
especially non-mainframe computers, use 8-bit bytes. That's pretty
ubiquitous.

But to be pedantic, I'll do:

size_t size = CHAR_BIT * sizeof(object);
C++ does not have a 'long long' type, yet.

Not yet, but the committee is working on it. Besides, many
C++ compilers jumped the gun and provided long long and
unsigned long long years ago, so they're actually quite common.
Use left shift operator.

Great idea! Thanks! Saves the inefficient pow() call.
I'll do this:

unsigned long long mask = static_cast<unsigned long long>(1) << (size - 1);
You don't know that sizeof(object) <= sizeof(long long).

Yes, I do. Think about it. Unsigned long long has a maximum
value of over 18 quintillion, so it can express the size of
an object of over 18EB. ("EB" is "exabytes". One exabyte
is 10^18 bytes, or 1 billion gigabytes.) You give me a computer
with 18EB of memory, I'll give you $500 for it. :) Of course,
pointers would then need to become unsigned long long in order
to address that much memory space, but it would be worth it.
(Caveat: that's assuming long long is at least 64 bits. If it's
not, the implementor should not have bothered implementing it.)

Anyway, thanks for the tips!

 

Robbie Hatley

Yes, I do. Think about it. Unsigned long long has a maximum
value of over 18 quintillion, so it can express the size of
an object of over 18EB...

Oops, I blew it. I confused "size expressible by long long" (which
is about 18 quintillion bytes) with "size of long long" (which is
only 8 bytes on most implementations). I must be getting sleepy.

So, my little program, as written, can only handle objects up to
8 bytes in length, not 18EB. Oh, well. I was only off by about
18 orders of magnitude. Not bad for a Sunday afternoon.


 

Robbie Hatley

Frederick Gotham said:
#include <cstddef>
#include <cassert>
#include <climits>
#include <ostream>

void PrintBits(void const * const mem,size_t amount_bytes,std::ostream &os)
{
assert( mem );
assert( amount_bytes );

char static str[CHAR_BIT + 1] = {};

unsigned char const *p = reinterpret_cast<unsigned char const*>(mem);

do
{
unsigned const byte_val = *p++;

char *pos = str;

unsigned to_and_with = 1U << CHAR_BIT - 1;

do *pos++ = byte_val & to_and_with ? '1' : '0';
while(to_and_with >>= 1);

os << str;

} while (--amount_bytes);
}


template<class T>
inline void PrintObjBits(T const &obj,std::ostream &os)
{
PrintBits(&obj,sizeof obj,os);
}


#include <iostream>

int main()
{
long double array[4] = { 241.126, 632.225, 2662.2523, 23345.2352 };

PrintObjBits(array,std::cout);
}

Yes, that's much better than my approach. No limit on the size
of the object that way, other than that you're assuming it's
contiguous in memory.

I like the nested do loops, which break the problem first into
bytes, then into bits. And the fact that it just riffles through
the existing data in-place, without any copying. Cool. Thanks
for sharing this.

 

Kai-Uwe Bux

Robbie said:
"CHAR_BIT". True. However, the vast majority of the world's computers,
especially non-mainframe computers, use 8-bit bytes. That's pretty
ubiquitous.

But to be pedantic, I'll do:

size_t size = CHAR_BIT * sizeof(object);


Not yet, but the committee is working on it. Besides, many
C++ compilers jumped the gun and provided long long and
unsigned long long years ago, so they're actually quite common.


Great idea! Thanks! Saves the inefficient pow() call.
I'll do this:

unsigned long long mask = static_cast<unsigned long long>(1) << (size - 1);


Yes, I do. Think about it. Unsigned long long has a maximum
value of over 18 quintillion, so it can express the size of
an object of over 18EB. ("EB" is "exabytes". One exabyte
is 10^18 bytes, or 1 billion gigabytes.) You give me a computer
with 18EB of memory, I'll give you $500 for it. :)

You are arguing

sizeof(object) <= std::numeric_limits<unsigned long long>::max()

not

sizeof( object ) <= sizeof( long long )

Note sizeof(long long) may be as low as 8 (or even 1 if chars are really
huge).
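To make the distinction concrete: the quantity Robbie was reasoning about is `std::numeric_limits<unsigned long long>::max()`, while what actually limits his code is `sizeof(unsigned long long)`. A small C++11 sketch; the constants and the helper `fits_in_pattern` are hypothetical names.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// The bound Robbie reasoned about: the largest value an unsigned long long
// can hold (at least 2^64 - 1, about 1.8e19).
constexpr unsigned long long kMaxValue =
    std::numeric_limits<unsigned long long>::max();

// The bound the code actually needs: how many bytes an unsigned long long
// occupies (typically 8), and hence how many bits one pattern can hold.
constexpr std::size_t kPatternBytes = sizeof(unsigned long long);
constexpr std::size_t kPatternBits  = CHAR_BIT * kPatternBytes;

// Can an object of type T be copied into a single pattern?
template<class T>
constexpr bool fits_in_pattern()
{
    return sizeof(T) <= kPatternBytes;
}
```

A 9-byte array is tiny compared with `kMaxValue`, yet `fits_in_pattern<char[9]>()` is false wherever `unsigned long long` is 8 bytes.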

Anyway, what happens in the code is that you try to store the bit pattern of
object in a variable of type unsigned long long, whose size might be too
small. So what about:

#include <cstddef>
#include <climits>

template < typename T >
struct bit_pattern {

static std::size_t const size = sizeof( T );

typedef unsigned char const * address;

static
address mem_location ( T const & t ) {
return ( reinterpret_cast< address >( &t ) );
}

template < typename OutIter >
static
OutIter dump_bits ( T const & t, OutIter where ) {
address loc = mem_location( t );
for ( std::size_t index = 0; index < size; ++index ) {
unsigned char c = loc[index];
unsigned char mask = 1;
for ( std::size_t bit_pos = 0; bit_pos < CHAR_BIT; ++ bit_pos ) {
where = ( ( c & mask ) != 0 );
++ where;
mask <<= 1;
}
}
return ( where );
}

}; // bit_pattern

template < typename T, typename OutIter >
OutIter dump_bits ( T const & t, OutIter where ) {
return ( bit_pattern<T>::dump_bits( t, where ) );
}

#include <iostream>
#include <iterator>


int main ( void ) {
int i = 5;
std::ostream_iterator< bool > bool_writer ( std::cout );
dump_bits( i, bool_writer );
std::cout << '\n';
}


Best

Kai-Uwe Bux
 

Robbie Hatley

Kai-Uwe Bux said:
You are arguing

sizeof(object) <= std::numeric_limits<unsigned long long>::max()

not

sizeof( object ) <= sizeof( long long )


Yes, I realized that a minute later. Really stupid error on my
part. (See my other post on that.)

#include <cstddef>
#include <climits>

template < typename T >
struct bit_pattern {

static std::size_t const size = sizeof( T );

typedef unsigned char const * address;

static
address mem_location ( T const & t ) {
return ( reinterpret_cast< address >( &t ) );
}

template < typename OutIter >
static
OutIter dump_bits ( T const & t, OutIter where ) {
address loc = mem_location( t );
for ( std::size_t index = 0; index < size; ++index ) {
unsigned char c = loc[index];
unsigned char mask = 1;
for ( std::size_t bit_pos = 0; bit_pos < CHAR_BIT; ++ bit_pos ) {
where = ( ( c & mask ) != 0 );
++ where;
mask <<= 1;
}
}
return ( where );
}

}; // bit_pattern

template < typename T, typename OutIter >
OutIter dump_bits ( T const & t, OutIter where ) {
return ( bit_pattern<T>::dump_bits( t, where ) );
}

#include <iostream>
#include <iterator>


int main ( void ) {
int i = 5;
std::ostream_iterator< bool > bool_writer ( std::cout );
dump_bits( i, bool_writer );
std::cout << '\n';
}

Hmmm... Really heavy-weight C++ solution, as opposed to Frederick
Gotham's C-flavored solution. A template function which invokes
a template member function in a template struct. Yikes.

I notice that in both dump_bits functions, OutIter is passed by
value instead of by ref. Is that a mistake, or is that by design?
It will cause the three copies of OutIter -- argument, parameter,
return -- to be independent of each other. I suppose there's an
advantage there, because a calling function could maintain an
"original entry point" iterator, as well as a "where we ended up"
iterator returned from dump_bits().

On second thought, I think that's a mistake, not a virtue. Since
this is a stream iterator, we probably DON'T want the ability to
go back and overwrite some earlier part of the stream. So I'm
thinking OutIter should be passed by reference.

I can't say I understand everything I'm looking at here. Like
THIS line of code:

std::ostream_iterator< bool > bool_writer ( std::cout );

I've never used stream iterators before. So this is basically
making a bool-writing ostream iterator and connecting it to cout?
How do you use that, something like the following?

*bool_writer = ( /* boolean value */ );
++bool_writer;

But I notice in your code, you have

where = ( ( c & mask ) != 0 );
++where;

Shouldn't that be more like the following?

*where = ( ( c & mask ) != 0 );
++where;


 

Kai-Uwe Bux

Robbie said:
Kai-Uwe Bux said:
You are arguing

sizeof(object) <= std::numeric_limits<unsigned long long>::max()

not

sizeof( object ) <= sizeof( long long )


Yes, I realized that a minute later. Really stupid error on my
part. (See my other post on that.)

#include <cstddef>
#include <climits>

template < typename T >
struct bit_pattern {

static std::size_t const size = sizeof( T );

typedef unsigned char const * address;

static
address mem_location ( T const & t ) {
return ( reinterpret_cast< address >( &t ) );
}

template < typename OutIter >
static
OutIter dump_bits ( T const & t, OutIter where ) {
address loc = mem_location( t );
for ( std::size_t index = 0; index < size; ++index ) {
unsigned char c = loc[index];
unsigned char mask = 1;
for ( std::size_t bit_pos = 0; bit_pos < CHAR_BIT; ++ bit_pos ) {
where = ( ( c & mask ) != 0 );
++ where;
mask <<= 1;
}
}
return ( where );
}

}; // bit_pattern

template < typename T, typename OutIter >
OutIter dump_bits ( T const & t, OutIter where ) {
return ( bit_pattern<T>::dump_bits( t, where ) );
}

#include <iostream>
#include <iterator>


int main ( void ) {
int i = 5;
std::ostream_iterator< bool > bool_writer ( std::cout );
dump_bits( i, bool_writer );
std::cout << '\n';
}

Hmmm... Really heavy-weight C++ solution, as opposed to Frederick
Gotham's C-flavored solution. A template function which invokes
a template member function in a template struct. Yikes.

That is just to allow for automatic type deduction. Moreover, there is a
certain amount of overkill here. One could just do:

#include <cstddef>
#include <climits>

typedef unsigned char const * address;

template < typename T >
address mem_location ( T const & t ) {
return ( reinterpret_cast< address >( &t ) );
}

template < typename T, typename OutIter >
OutIter dump_bits ( T const & t, OutIter where ) {
address loc = mem_location( t );
for ( std::size_t index = 0; index < sizeof(T); ++index ) {
unsigned char c = loc[index];
unsigned char mask = 1;
for ( std::size_t bit_pos = 0; bit_pos < CHAR_BIT; ++ bit_pos ) {
*where = ( ( c & mask ) != 0 );
++ where;
mask <<= 1;
}
}
return ( where );
}

#include <iostream>
#include <iterator>

int main ( void ) {
int i = 5;
std::ostream_iterator< bool > bool_writer ( std::cout );
dump_bits( i, bool_writer );
std::cout << '\n';
}


The main difference is that I use an output iterator in the interface for
dump_bits(). This way, one could use this function to initialize a vector
or any other sequence.
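As an illustration of that point, here is a self-contained sketch that repeats the simplified `dump_bits()` above and feeds it a `std::back_inserter` into a `std::vector<bool>`. The wrapper `bits_of` is a hypothetical name. Bits are stored least significant first within each byte, bytes in memory order, exactly as `dump_bits()` writes them.

```cpp
#include <climits>
#include <cstddef>
#include <iterator>
#include <vector>

typedef unsigned char const * address;

template < typename T >
address mem_location ( T const & t ) {
  return ( reinterpret_cast< address >( &t ) );
}

// Writes the bits of t to where: bytes in memory order, LSB first per byte.
template < typename T, typename OutIter >
OutIter dump_bits ( T const & t, OutIter where ) {
  address loc = mem_location( t );
  for ( std::size_t index = 0; index < sizeof(T); ++index ) {
    unsigned char c = loc[index];
    unsigned char mask = 1;
    for ( std::size_t bit_pos = 0; bit_pos < CHAR_BIT; ++ bit_pos ) {
      *where = ( ( c & mask ) != 0 );
      ++ where;
      mask <<= 1;
    }
  }
  return ( where );
}

// Collect the bits into a container instead of printing them.
template < typename T >
std::vector<bool> bits_of ( T const & t ) {
  std::vector<bool> bits;
  bits.reserve( sizeof(T) * CHAR_BIT );
  dump_bits( t, std::back_inserter( bits ) );
  return bits;
}
```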

I notice that in both dump_bits functions, OutIter is passed by
value instead of by ref. Is that a mistake, or is that by design?
It will cause the three copies of OutIter -- argument, parameter,
return -- to be independent of each other. I suppose there's an
advantage there, because a calling function could maintain an
"original entry point" iterator, as well as a "where we ended up"
iterator returned from dump_bits().

On second thought, I think that's a mistake, not a virtue. Since
this is a stream iterator, we probably DON'T want the ability to
go back and overwrite some earlier part of the stream. So I'm
thinking OutIter should be passed by reference.

In this regard, I am just following the precedent set by the algorithms from
the standard library: all iterators are passed by value. This implies that
iterator objects had better be designed to be small.

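Following that convention, the returned iterator tells the caller where the next write should go, so calls can be chained, and the caller keeps its own copy of the starting position. A self-contained sketch writing two bytes back to back into a plain `bool` array; `dump_bits` is repeated in condensed form, and `dump_two` is a hypothetical helper.

```cpp
#include <climits>
#include <cstddef>

// Condensed dump_bits(): bytes in memory order, LSB first within each byte.
template < typename T, typename OutIter >
OutIter dump_bits ( T const & t, OutIter where ) {
  unsigned char const * loc = reinterpret_cast< unsigned char const * >( &t );
  for ( std::size_t index = 0; index < sizeof(T); ++index )
    for ( std::size_t bit_pos = 0; bit_pos < CHAR_BIT; ++bit_pos ) {
      *where = ( ( loc[index] >> bit_pos ) & 1 ) != 0;
      ++where;
    }
  return where;
}

// The iterator is passed by value; the value returned from the first call
// is where the second call starts writing.
inline std::size_t dump_two ( unsigned char a, unsigned char b, bool * out ) {
  bool * const start = out;
  bool * mid = dump_bits( a, out );
  bool * end = dump_bits( b, mid );
  return static_cast< std::size_t >( end - start );   // bits written
}
```

For a stream iterator the by-value convention cannot rewind anything: every copy of a `std::ostream_iterator` writes to the same stream, so earlier output stays put regardless of which copy the caller keeps.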
I can't say I understand everything I'm looking at here. Like
THIS line of code:

std::ostream_iterator< bool > bool_writer ( std::cout );

I've never used stream iterators before. So this is basically
making a bool-writing ostream iterator and connecting it to cout?

Yep.

How do you use that, something like the following?

*bool_writer = ( /* boolean value */ );
++bool_writer;

Yep.

But I notice in your code, you have

where = ( ( c & mask ) != 0 );
++where;

Shouldn't that be more like the following?

*where = ( ( c & mask ) != 0 );
++where;

Yes, that's a typo. Thanks for catching that. I wonder why it compiled and
produced the expected output.
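The typo compiles because of how `std::ostream_iterator` is specified: `operator*` and `operator++` both just return the iterator itself, and it is `operator=` taking a value that writes to the stream. So `where = value;` performs the write directly, and `*where = value;` reaches the same `operator=`. With a raw pointer such as `bool*` as the iterator, the typo would be rejected by the compiler. A sketch showing that both spellings produce the same output; `write_both_ways` is a hypothetical name and assumes C++11.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Writes the same values two ways through ostream_iterator<bool>.
// Both spellings end up in ostream_iterator::operator=(const bool&).
inline std::string write_both_ways () {
  std::ostringstream with_star, without_star;
  std::ostream_iterator<bool> a( with_star ), b( without_star );
  bool const values[] = { true, false, true };
  for ( bool v : values ) {
    *a = v; ++a;   // the intended form
    b = v;  ++b;   // the typo: still writes, since assignment does the output
  }
  return with_star.str() + "|" + without_star.str();
}
```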


Best

Kai-Uwe Bux
 

Alf P. Steinbach

* Kai-Uwe Bux:

ASIPW (Academic Spanner In Practical Wheel): how well does your code
work with diamond pattern virtual inheritance? <g>

Actually it's potentially far worse than that, from an academic or
language lawyer point of view, depending on which committee member
faction one favors.

IIRC, David Abrahams argued that for /any/ non-POD class the compiler is
allowed to distribute an object's value representation hither and thither
in memory, with just some pointers or offsets here and there to connect
things (essentially that's what's done for the case mentioned above, but
you don't expect it elsewhere in practice), and further that this was
inherent in the phrase "region of storage" (or memory, whatever it was,
look up the definition of "object" in the standard), which, as he saw
it, was not necessarily a contiguous region but rather any set of bytes.
I.e., that even a /variable/, a named object, is not necessarily
contained in the set of bytes from its start address through sizeof
bytes. My own view is, perhaps needless to say, that it is.
 

Frederick Gotham

Robbie Hatley posted:
No limit on the size of the object that way, other than that you're
assuming it's contiguous in memory.


Hmm... (give my brain a minute to mull that over).

Of course, in C++, an object can be really "fancy", and have all sorts of
constructors and overloaded operators, but *ultimately*, the object itself
is stored as a simple sequence of bits in memory.

Is it not a fundamental assumption that these bits are contiguous?

I like the nested do loops, which break the problem first into
bytes, then into bits.


If ever you have a sleepless night and want to expend some brain power, you
could become an efficiency junkie and unroll whatever loops you can. Instead
of having:

unsigned to_and_with = 1U << CHAR_BIT - 1;

do *pos++ = byte_val & to_and_with ? '1' : '0';
while(to_and_with >>= 1);


We could have:


#if CHAR_BIT > 8

// Bits above the low eight are handled by a loop that stops at bit 7.
for(unsigned to_and_with = 1U << (CHAR_BIT - 1); to_and_with != 128; to_and_with >>= 1)
{
*pos++ = byte_val & to_and_with ? '1' : '0';
}

#endif

*pos++ = byte_val & 128 ? '1' : '0';
*pos++ = byte_val & 64 ? '1' : '0';
*pos++ = byte_val & 32 ? '1' : '0';
*pos++ = byte_val & 16 ? '1' : '0';
*pos++ = byte_val & 8 ? '1' : '0';
*pos++ = byte_val & 4 ? '1' : '0';
*pos++ = byte_val & 2 ? '1' : '0';
*pos++ = byte_val & 1 ? '1' : '0';


(Of course, we like to hope the compiler will do this for us.)

And the fact that it just riffles through
the existing data in-place, without any copying. Cool.


If I were to go for extreme efficiency, I would only make one final output,
rather than outputting each byte one by one.
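A minimal sketch of that single-output idea, assuming the caller is happy to hold the whole bit string in memory. `BitsToString` and `PrintBitsOnce` are hypothetical names. The byte order and bit order match `PrintBits`, and unlike the `do`/`while` version it also tolerates `amount_bytes == 0`.

```cpp
#include <climits>
#include <cstddef>
#include <ostream>
#include <string>

// Format every bit into one preallocated buffer.
inline std::string BitsToString(void const *mem, std::size_t amount_bytes)
{
    unsigned char const *p = static_cast<unsigned char const *>(mem);
    std::string out(amount_bytes * CHAR_BIT, '0');   // start with all '0'
    std::string::size_type pos = 0;
    for (std::size_t i = 0; i != amount_bytes; ++i)
    {
        unsigned const byte_val = p[i];
        for (unsigned mask = 1U << (CHAR_BIT - 1); mask != 0; mask >>= 1, ++pos)
            if (byte_val & mask) out[pos] = '1';
    }
    return out;
}

// Issue a single write for the whole object.
inline void PrintBitsOnce(void const *mem, std::size_t amount_bytes, std::ostream &os)
{
    std::string const s = BitsToString(mem, amount_bytes);
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}
```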

Thanks for sharing this.


You're welcome. :)
 

Alf P. Steinbach

* Frederick Gotham:
Robbie Hatley posted:


Hmm... (give my brain a minute to mull that over).

Of course, in C++, an object can be really "fancy", and have all sorts of
constructors and overloaded operators, but *ultimately*, the object itself
is stored as a simple sequence of bits in memory.

Is it not a fundamental assumption that these bits are contiguous?

No (although for a POD they are, and in practice they are except when
virtual inheritance is involved).
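C++11 later made the contiguous case explicit: an object of trivially copyable or standard-layout type occupies contiguous bytes of storage. A dump routine can therefore refuse types outside that guarantee at compile time. A minimal sketch, assuming a C++11 library that provides `std::is_trivially_copyable`; the name `GuardedBits` is hypothetical. A class with a virtual base is never trivially copyable, so the diamond case is rejected.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Only accept types whose sizeof bytes are guaranteed to be the whole object.
template<class T>
std::string GuardedBits(T const &obj)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "the bytes of this type may not represent the whole object");
    unsigned char bytes[sizeof obj];
    std::memcpy(bytes, &obj, sizeof obj);

    std::string out;   // bytes in memory order, MSB first within each byte
    for (std::size_t i = 0; i != sizeof obj; ++i)
        for (unsigned mask = 1U << (CHAR_BIT - 1); mask != 0; mask >>= 1)
            out += (bytes[i] & mask) ? '1' : '0';
    return out;
}
```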
 

Kai-Uwe Bux

Alf said:
* Kai-Uwe Bux:

ASIPW (Academic Spanner In Practical Wheel): how well does your code
work with diamond pattern virtual inheritance? <g>

Perfectly fine: the requirements were a little vague.
Actually it's potentially far worse than that, from an academic or
language lawyer point of view, depending on which committee member
faction one favors.

IIRC, David Abrahams argued that for /any/ non-POD class the compiler is
allowed to distribute an object's value representation hither and thither
in memory, with just some pointers or offsets here and there to connect
things (essentially that's what's done for the case mentioned above, but
you don't expect it elsewhere in practice), and further that this was
inherent in the phrase "region of storage" (or memory, whatever it was,
look up the definition of "object" in the standard), which, as he saw
it, was not necessarily a contiguous region but rather any set of bytes.
I.e., that even a /variable/, a named object, is not necessarily
contained in the set of bytes from its start address through sizeof
bytes. My own view is, perhaps needless to say, that it is.

I wonder whether that interesting point of view implies that, say,
malloc(sizeof(T)) with a following placement new cannot be used to
construct an object of type T because the layout of T may require its
sizeof(T) bytes to be scattered in memory in a peculiar non-contiguous way.
That would be of interest for anybody using std::vector<T> with a custom
allocator.
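For reference, the pattern Kai-Uwe alludes to looks like this: obtain exactly sizeof(T) bytes of raw storage, construct the object in them with placement new, then destroy it before freeing. This is essentially what a custom allocator used with `std::vector<T>` does. A minimal sketch; `Point` and `sum_via_placement_new` are hypothetical names.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical type with a constructor, so it must be built in place.
struct Point {
    int x, y;
    Point(int ax, int ay) : x(ax), y(ay) {}
};

// Construct a Point in exactly sizeof(Point) bytes obtained from malloc.
inline int sum_via_placement_new(int ax, int ay)
{
    void *raw = std::malloc(sizeof(Point));
    if (!raw) throw std::bad_alloc();
    Point *p = ::new (raw) Point(ax, ay);   // the object lives in those bytes
    int const result = p->x + p->y;
    p->~Point();                            // destroy before releasing storage
    std::free(raw);
    return result;
}
```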


Best

Kai-Uwe Bux
 

Frederick Gotham

Alf P. Steinbach posted:
No (although for a POD they are, and in practice they are except when
virtual inheritance is involved).


If we have an object, there are two operators we can apply to it which will
give us all the info about how it's stored in memory:

Address-of operator: &obj
sizeof operator: sizeof obj

Let's say that the former gives us a memory location of:

1032

And that the latter gives us a size of:

8

Can we not therefore assume that the object's bits are dispersed between
the following memory locations:

1032 through 1039 (inclusive)

Therefore, if we print out all the bits from 1032 through 1039, are we not
printing out all of the object's bits?
 

Alf P. Steinbach

* Frederick Gotham:
Alf P. Steinbach posted:



If we have an object, there are two operators we can apply to it which will
give us all the info about how it's stored in memory:

Address-of operator: &obj
sizeof operator: sizeof obj

Let's say that the former gives us a memory location of:

1032

And that the latter gives us a size of:

8

Can we not therefore assume that the object's bits are dispersed between
the following memory locations:

1032 through 1039 (inclusive)

Therefore, if we print out all the bits from 1032 through 1039, are we not
printing out all of the object's bits?

struct Base{ char id; Base( char anId = '?' ): id(anId) {} };
struct A: virtual Base { int x; A(): x( 1 ) {} };
struct B: virtual Base { int y; B(): y( 2 ) {} };
struct Derived: A, B { Derived( char anId ): Base( anId ) {} };

void showBits( B const& o )
{
// Assumption of o being contiguous is made.
}

int main()
{
B b;
Derived d( 'd' ); // Derived has no default constructor.

showBits( b ); // Possibly OK.
showBits( d ); // Definitely bad.
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top