casting X* to char*


Mark P

A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).

I can think of three ways to do this:

char* pc = (char*) px;
char* pc = static_cast<char*> (static_cast<void*> (px));
char* pc = reinterpret_cast<char*> (px);

From my reading of the standard it seems that the results of these
casts are all unspecified.

Is this true?

In practice, would these casts ever do anything besides the obvious?

Is there any portable way to access the bytes of an object's representation?

Thanks,
Mark
 

Salt_Peter

Mark said:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).

What for? Why not provide your own operator<< and operator>> for type X?
Let's consider what happens if type X is composed of primitive types,
containers, pointers and references.

Doesn't it make sense that it's a better solution to stream those
components rather than carrying out a byte-by-byte transfer of the
object? What if the target of the transfer uses a different architecture
(or a different compiler)? Why would you want to have to deal with the
type's padding?

The proposed solutions above and below therefore constitute a good
example of undefined behaviour. Not to mention unnecessary bits being
transferred and having to detect the source and target architectures as
well as writing a new function to handle each possibility.
Why do it the hard way with undefined results when writing a simple
operator rids you of all the headaches?

std::ostream& operator<<(std::ostream& os, const X& x)
{
// stream the relevant components of x into os
return os;
}

and make the above function a friend of type X.
It doesn't get any simpler. And it's portable.
I can think of three ways to do this:

char* pc = (char*) px;
char* pc = static_cast<char*> (static_cast<void*> (px));
char* pc = reinterpret_cast<char*> (px);

From my reading of the standard it seems that the results of these
casts are all unspecified.

I don't see a cast, I see a non-portable hack. The above is in fact
guaranteed to fail. Don't use casts unless you develop a healthy respect
for what they do.
Is this true?

In practice, would these casts ever do anything besides the obvious?

Is there any portable way to access the bytes of an object's
representation?

Of course there is. The key is to access the *relevant* bytes. What is
relevant can very well depend on the requirements and needs.
You are assuming that an object will occupy in memory the sum of the
allocations of its components. If your computer did not rely on
segment+index addressing schemes it would slow to a crawl. In C++ it's
critical to provide code that is transparent to the platform it's running
on. Your code needs to stream the object's components with no regard to
the architecture/platform underneath. That's what C++ is all about.

Let's slap together a dumb example.

#include <iostream>
#include <ostream>

class X
{
char c;
int n;
public:
X() : c(' '), n(0) { }
~X() { }
};

int main()
{
X x;
std::cout << "sizeof(x) = " << sizeof(x) << std::endl;
}

/*
sizeof(x) = 8
*/

Interestingly enough on my system a char is 1 byte and an integer is 4
bytes (your mileage may well vary). So why the size of 8 bytes? Answer:
padding. Why would you ever want to pay an 8 byte transfer when you can
simply stream the char and integer? No hacking required. With a portable
result too.

Let's prove it: how about streaming type X objects to your standard
output? After all, it uses a std::ostream too, doesn't it? You can
replace std::cout with any interface that can accept a standard output
stream. All you need is a simple operator<< to stream your precious type
in any fashion you desire.

#include <iostream>
#include <ostream>

class X
{
char c;
int n;
public:
X() : c(' '), n(0) { } // default ctor
X(char c_, int n_) : c(c_), n(n_) { }
~X() { }
/* friend operator<< */
friend std::ostream& operator<<(std::ostream& os, const X& r_x)
{
os << "c = " << r_x.c; // stream the char
os << "; n = " << r_x.n; // stream the integer
return os << std::endl;
}
};

int main()
{
X xa('a', 0);
X xb('b', 1);

std::cout << "xa: " << xa;
std::cout << "xb: " << xb;

}

/*
xa: c = a; n = 0
xb: c = b; n = 1
*/

Are you now seeing the simplicity and power in the design? What if the
private member integer was in fact another class? No problem, write an
op<< for it too. What if I needed a container of 1000 X elements? What
if you needed to stream the whole container of 1000 X elements to
standard output?

Your way you would need hundreds of lines of code. And it would still
not be portable. I can create, load and stream a container of 1000 X
elements in 3 lines of code excluding includes. Yes: 3. Completely
portable and reusable.

hint: std::vector< X > vn(1000); // and std::copy(...) to std::cout
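
For readers who want the hint spelled out, here is a minimal sketch of what it seems to point at. It assumes the X from the example above with an operator<< that writes only the members (no trailing newline), so that std::ostream_iterator can supply the separator. The helper names stream_all and to_text are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Same members as the X above; operator<< streams them one by one.
class X
{
    char c;
    int n;
public:
    X() : c(' '), n(0) { }
    X(char c_, int n_) : c(c_), n(n_) { }
    friend std::ostream& operator<<(std::ostream& os, const X& r_x)
    {
        return os << "c = " << r_x.c << "; n = " << r_x.n;
    }
};

// Hypothetical helper: writes every element of v to os, one per line.
void stream_all(std::ostream& os, const std::vector<X>& v)
{
    assert(os.good()); // precondition: the caller passes a usable stream
    std::copy(v.begin(), v.end(), std::ostream_iterator<X>(os, "\n"));
}

// Hypothetical helper: same output, captured in a string for inspection.
std::string to_text(const std::vector<X>& v)
{
    std::ostringstream oss;
    stream_all(oss, v);
    return oss.str();
}
```

With <iostream> included, creating std::vector<X> vn(1000) and calling stream_all(std::cout, vn) would print all 1000 elements. Note that this produces a text form of the members, which is a different thing from the raw bytes the original question asked about.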
 

Michiel.Salters

Mark said:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array). ....
Is there any portable way to access the bytes of an object's representation?

Yes. The correct way is indeed to cast it to a char*. There is one
catch.
The object must have a POD type. POD is Plain Old Data, which roughly
means any old C object that can be memcpy'd as bytes. Almost all C++
features will make a type non-POD, see the standard or any advanced
book.

Of course, it may be a portable way to *access* the bytes of a POD, but
that still doesn't mean those *bytes* are portable. And the bytes of a
pointer are notoriously unusable later on.
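
As a rough illustration of the POD case, the sketch below copies an array's object representation into a byte buffer and back. It assumes C++11 or later and uses std::is_trivially_copyable as a stand-in for the "can be memcpy'd as bytes" property described above; Sample, to_bytes and from_bytes are hypothetical names, not anything from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical POD type used only for illustration.
struct Sample
{
    char tag;
    int value;
};

// Copies the object representation of n objects starting at p into a buffer.
template <class T>
std::vector<unsigned char> to_bytes(const T* p, std::size_t n)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying only makes sense for trivially copyable types");
    const unsigned char* first = reinterpret_cast<const unsigned char*>(p);
    return std::vector<unsigned char>(first, first + n * sizeof(T));
}

// Rebuilds objects from bytes produced by to_bytes in the same program.
template <class T>
std::vector<T> from_bytes(const std::vector<unsigned char>& bytes)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying only makes sense for trivially copyable types");
    assert(bytes.size() % sizeof(T) == 0); // buffer must hold whole objects
    std::vector<T> out(bytes.size() / sizeof(T));
    if (!bytes.empty())
        std::memcpy(out.data(), bytes.data(), bytes.size());
    return out;
}
```

The round trip is only meaningful within the same program on the same platform. As noted above, padding, byte order and any pointer members make the bytes themselves non-portable.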

HTH,
Michiel Salters
 

mlimber

Salt_Peter said:
Mark said:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).

What for? Why not provide your own operator<< and operator>> for type X?
Let's consider what happens if type X is composed of primitive types,
containers, pointers and references.
[snip]

Good advice. See also these FAQs:

http://www.parashift.com/c++-faq-lite/serialization.html

Cheers! --M
 

Tomás

Mark P posted:

char* pc = (char*) px;


Perfectly okay.

char* pc = static_cast<char*> (static_cast<void*> (px));


Perfectly okay.

char* pc = reinterpret_cast<char*> (px);


Perfectly okay.

From my reading of the standard it seems that the results of these
casts are all unspecified.


Incorrect. EVERY object is made up of bytes, regardless of its type, and
regardless of whether it qualifies as a POD. The following code is
perfectly okay:

#include <string>
#include <iostream>

template<class T>
void PrintObjectBytes( const T &obj )
{
const unsigned char * const p_last_byte =
reinterpret_cast<const unsigned char *>(&obj) + ( sizeof(obj) - 1 );

for( const unsigned char *p = reinterpret_cast<const unsigned char *>(&obj);
/* Nothing Condition */;
++p )
{
std::cout << static_cast<unsigned>(*p) << '\n';

if ( p == p_last_byte ) break;
}
}


#include <cstdlib>

int main()
{
std::string str("Hello World!");

PrintObjectBytes( str );

std::system("PAUSE");
}


-Tomás
 

Tomás

posted:

Yes. The correct way is indeed to cast it to a char*. There is one
catch.
The object must have a POD type. POD is Plain Old Data, which roughly
means any old C object that can be memcpy'd as bytes. Almost all C++
features will make a type non-POD, see the standard or any advanced
book.


You're incorrect.

Find me the fanciest, most advanced class you can find... and I guarantee
you it's made up of bytes.

A type doesn't have to be a POD in order for you to access its bytes. See
my post elsewhere in thread for an example.


-Tomás
 

mlimber

Tomás said:
EVERY object is made up of bytes, regardless of its type, and
regardless of whether it qualifies as a POD.

Sure, but the meaning of those bytes might be different than expected.
For instance, a virtual table might be included or the compiler might
have inserted padding between members. If one is serializing an object
(as the OP indicated), then those bytes are not necessarily meaningful
when unserialized at some later time or on some other machine.
The following code is
perfectly okay:

#include <string>
#include <iostream>

template<class T>
void PrintObjectBytes( const T &obj )
{
const unsigned char * const p_last_byte =
reinterpret_cast<const unsigned char *>(&obj) + ( sizeof(obj) - 1 );

for( const unsigned char *p = reinterpret_cast<const unsigned char *>(&obj);
/* Nothing Condition */;
++p )
{
std::cout << static_cast<unsigned>(*p) << '\n';

if ( p == p_last_byte ) break;

Use the for-loop condition instead of this line, which unnecessarily
duplicates the functionality of the for-loop construct.

Cheers! --M
 

Tomás

mlimber posted:

Use the for-loop condition instead of this line, which unnecessarily
duplicates the functionality of the for-loop construct.


I want the condition to be tested AFTER the loop body, sort of like how you
can have a "do loop" instead of a "while loop".

Alas, C++ doesn't provide a "do for" loop.


-Tomás
 

red floyd

Tomás said:
mlimber posted:




I want the condition to be tested AFTER the loop body, sort of like how you
can have a "do loop" instead of a "while loop".

Alas, C++ doesn't provide a "do for" loop.

do {
// loop body
} while (condition);
 

Mark P

Tomás said:
Mark P posted:




Perfectly okay.




Perfectly okay.




Perfectly okay.




Incorrect. EVERY object is made up of bytes, regardless of its type, and
regardless of whether it qualifies as a POD. The following code is
perfectly okay:

[example code snipped]

OK, but what then to make of 5.2.10.7 describing reinterpret_cast, below?

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.65) Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are object
types and where the alignment requirements of T2 are no stricter than
those of T1) and back to its original type yields the original pointer
value, the result of such a pointer conversion is unspecified."

Doesn't this mean that the converted pointer may not point to the bytes
of the original object? I can't imagine why this would ever happen, but
it seems that the standard permits it.

-Mark
 

dan2online

Salt_Peter said:
Mark said:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).

What for? Why not provide your own operator<< and operator>> for type X?
Let's consider what happens if type X is composed of primitive types,
containers, pointers and references.

char *pc = (char *)px is reasonable in many cases if you want to
manipulate the bytes. In this scenario, X is a plain old data (POD)
type. Here is an example:
double *px = new double [100];
char *pc = (char *)px;
With this you can decode the floating point format by manipulating the
raw byte string.

But if the type X is not POD, it looks complicated to access the raw
byte string.
 

mlimber

Tomás said:
mlimber posted:




I want the condition to be tested AFTER the loop body, sort of like how you
can have a "do loop" instead of a "while loop".

Alas, C++ doesn't provide a "do for" loop.

I meant something more along these lines:

typedef unsigned char uchar;
const uchar * const end =
reinterpret_cast<const uchar*>(&obj) + sizeof(obj);
for( const uchar*p = reinterpret_cast<const uchar*>(&obj);
p != end;
++p )
{
std::cout << static_cast<unsigned>(*p) << '\n';
}

Cheers! --M
 

Tomás

mlimber posted:
const uchar * const end = reinterpret_cast<const uchar*>(&obj) +
sizeof(obj);


I have a phobia of pointers to "one past the end". (In fact I've a phobia
of pointers which point to anything other than legitimate addresses.)

I know they're not taboo, but they just don't make sense to me.

What happens if an object is located near the "border", right near the end
of memory?

Things are extra hairy if you're dealing with very large objects.

The Standard doesn't say anything about what happens when pointer
arithmetic overflows.

-Tomás
 

Tomás

Mark P posted:
[example code snipped]

OK, but what then to make of 5.2.10.7 describing reinterpret_cast,
below?

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.65) Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value, the result of such a pointer conversion is
unspecified."

The reason they mention alignment requirements is as follows:

On a certain system, a char may be 8-Bit... however, the smallest amount
of memory that the CPU can access may be 16 bits. Therefore, a "char*"
would need an extra bit to indicate whether it's the first or last 8
bits.

For that reason the following expression may be false:

sizeof(char*) == sizeof(int*)


And accordingly, the following isn't guaranteed to work:

#include <cassert>

int main()
{
char k[5] = {};

int *p = reinterpret_cast<int*>( k + 1 );

char *pc = reinterpret_cast<char*>( p ); // int* does not convert to char* implicitly

assert( pc == k + 1 );
}

Doesn't this mean that the converted pointer may not point to the
bytes of the original object? I can't imagine why this would ever
happen, but it seems that the standard permits it.


A "char" has the least alignment requirements -- there's no problem.


-Tomás
 

mlimber

Tomás said:
mlimber posted:



I have a phobia of pointers to "one past the end". (In fact I've a phobia
of pointers which point to anything other than legitimate addresses.)

I know they're not taboo, but they just don't make sense to me.

What happens if an object is located near the "border", right near the end
of memory?

See below, but the implication seems to be that they cannot be
allocated too close to a "border."
Things are extra hairy if you're dealing with very large objects.

No. The "end" pointer above points to the address immediately following
the object, not sizeof(T) after that point.
The Standard doesn't say anything about what happens when pointer
arithmetic overflows.

Incorrect. 5.7 para. 5 says, "[I]f the expression P points to the last
element of an array object, the expression (P)+1 points one past the
last element of the array object, and if the expression Q points one
past the last element of an array object, the expression (Q)-1 points
to the last element of the array object. If both the pointer operand
and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce
an overflow; otherwise, the behavior is undefined."

Use them without fear!
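
To make the quoted rule concrete, here is a small sketch of the usual half-open range idiom over an object's bytes. The end pointer is formed and compared against but never dereferenced. The function name count_zero_bytes is hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Counts the zero-valued bytes in obj's representation using a half-open
// range [begin, end). The end pointer is only compared, never dereferenced.
template <class T>
std::size_t count_zero_bytes(const T& obj)
{
    const unsigned char* const begin =
        reinterpret_cast<const unsigned char*>(&obj);
    const unsigned char* const end = begin + sizeof(obj); // one past the last byte

    assert(static_cast<std::size_t>(end - begin) == sizeof(obj));
    return static_cast<std::size_t>(
        std::count(begin, end, static_cast<unsigned char>(0)));
}
```

Because the standard algorithms all take ranges of this shape, the one-past-the-end pointer is what lets the same bytes be handed to std::count, std::copy and similar functions without a special case for the last element.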

Cheers! --M
 

Tomás

mlimber posted:
Incorrect. 5.7 para. 5 says, "[I]f the expression P points to the last
element of an array object, the expression (P)+1 points one past the
last element of the array object, and if the expression Q points one
past the last element of an array object, the expression (Q)-1 points
to the last element of the array object. If both the pointer operand
and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce
an overflow; otherwise, the behavior is undefined."



My phobia is based more on disgust than logic.

Let's say we have the following structure which is used extensively
throughout our program:

struct Monkey {

long double settings[64];

unsigned long vars[128];

};

It doesn't seem very C++-ish (or even C-ish) to effectively waste the
last kilobyte or so of memory.

C++ is my favourite programming language because it's efficient in a
hardcore kind of way, but also has advanced, fancy features. Unions are a
brilliant example of such efficiency. Some things disgust me though,
namely "one past the end", and the way in which you can supply "delete"
and "free" with a null pointer... it would have been more efficient to
have:

template<class T>
inline void ndelete( T p ) { if (p) delete p; }


-Tomás
 

Mark P

Tomás said:
Mark P posted:
[example code snipped]

OK, but what then to make of 5.2.10.7 describing reinterpret_cast,
below?

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.65) Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value, the result of such a pointer conversion is
unspecified."

[alignment details snipped]
A "char" has the least alignment requirements -- there's no problem.

I don't think you're reading this correctly. Compacting some of the
clauses, I can write that section as:

"A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that [a certain special case does a
special thing], the result of such a pointer conversion is unspecified."

That is, unless you're casting from T1* to T2* and back to T1* (with the
additional proviso about alignment), the result of this conversion is
unspecified.
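
The one thing the quoted paragraph does promise can be sketched as follows: converting an object pointer to char* (char having the weakest alignment requirement) and back yields the original pointer value. The function name through_char_and_back and the Widget type are hypothetical. The sketch says nothing about what the intermediate char* points to, which is exactly the part the paragraph leaves unspecified.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
struct Widget
{
    int id;
    double weight;
};

// Converts p to char* and back again; returns the pointer after the round trip.
template <class T>
T* through_char_and_back(T* p)
{
    char* pc = reinterpret_cast<char*>(p); // alignment of char is no stricter than T's
    T* back = reinterpret_cast<T*>(pc);
    assert(back == p); // the round trip is the guaranteed special case
    return back;
}
```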

-Mark
 

mlimber

Tomás said:
mlimber posted:
Incorrect. 5.7 para. 5 says, "[I]f the expression P points to the last
element of an array object, the expression (P)+1 points one past the
last element of the array object, and if the expression Q points one
past the last element of an array object, the expression (Q)-1 points
to the last element of the array object. If both the pointer operand
and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce
an overflow; otherwise, the behavior is undefined."



My phobia is based more on disgust than logic.

Let's say we have the following structure which is used extensively
throughout our program:

struct Monkey {

long double settings[64];

unsigned long vars[128];

};

It doesn't seem very C++-ish (or even C-ish) to effectively waste the
last kilobyte or so of memory.


You're not wasting it; you just can't put a Monkey there (apparently).
You could, however, potentially put many other things in that memory.

Cheers! --M
 

mlimber

Tomás said:
mlimber posted:
Incorrect. 5.7 para. 5 says, "[I]f the expression P points to the last
[snip]

It doesn't seem very C++-ish (or even C-ish) to effectively waste the
last kilobyte or so of memory.


PS, Footnote 75 in that same section says, "[A]n implementation need
only provide one extra byte (which might overlap another object in the
program) just after the end of the object in order to satisfy the 'one
past the last element' requirements."

Cheers! --M
 
