char *, unsigned char * and POD types

john

Hi, at first the code doesn't seem to work. Any ideas?:


#include <iostream>
#include <cstdlib>

int main()
{
using namespace std;

int x= 7;

char *p= reinterpret_cast<char *>(&x);

for(size_t i= 0; i< sizeof(x); ++i)
cout<< p[i]<< '\n';

}

Two more questions.

Q1) Is the above char * use guaranteed to work with all POD types?
Q2) If I remember well, unsigned char * covers more types. Am I wrong,
and it covers only POD types?


Thanks in advance.
 
Rolf Magnus

john said:
Hi, at first the code doesn't seem to work.

Define "doesn't seem to work".
Any ideas?:


#include <iostream>
#include <cstdlib>

int main()
{
using namespace std;

int x= 7;

char *p= reinterpret_cast<char *>(&x);

for(size_t i= 0; i< sizeof(x); ++i)
cout<< p[i]<< '\n';


This will print the bytes as characters.
}

Two more questions.

Q1) Is the above char * use guaranteed to work with all POD types?

IIRC, the result of the reinterpret_cast is unspecified.
Q2) If I remember well, unsigned char * covers more types. Am I wrong,
and it covers only POD types?

It's the same for char and unsigned char. Only POD types are covered.
 
Jim Langston

Rolf said:
john said:
Hi, at first the code doesn't seem to work.

Define "doesn't seem to work".
Any ideas?:


#include <iostream>
#include <cstdlib>

int main()
{
using namespace std;

int x= 7;

char *p= reinterpret_cast<char *>(&x);

for(size_t i= 0; i< sizeof(x); ++i)
cout<< p[i]<< '\n';


This will print the bytes as characters.


Right, what did you (the OP) expect it to output? On most computers, one
character would be 0x07 and the other 3 0x00 or nulls. If you wanted to see
the value of the bytes (the numerical value) you'll need to cast the char to
a number.
cout<< static_cast<int>( p[i] ) << '\n';
may give you what you expect, although you haven't stated what you expect so
I can only guess. You may want to use static_cast<unsigned int>.
IIRC, the result of the reinterpret_cast is unspecified.

AFAIK there is no set requirement for a C or C++ program to store its
numbers in any particular way as long as it follows the requirements, i.e.
sizeof(char) <= sizeof(int) <= sizeof(long) etc. A computer could use
whatever internal storage it deems best. So the output of the program is
unspecified, and in fact on big-endian and little-endian machines you will
get different outputs.
It's the same for char and unsigned char. Only POD types are covered.

I don't understand the question. "covers more types" of what? Basically
what you are trying to do (or so I think) is close, you just need to cast
the character to a number to see the value of it.
 
James Kanze

Rolf said:
john wrote:
Hi, at first the code doesn't seem to work.
Define "doesn't seem to work".
Any ideas?:
#include <iostream>
#include <cstdlib>
int main()
{
using namespace std;
int x= 7;
char *p= reinterpret_cast<char *>(&x);
for(size_t i= 0; i< sizeof(x); ++i)
cout<< p[i]<< '\n';

This will print the bytes as characters.

Right, what did you (the OP) expect it to output? On most
computers, one character would be 0x07 and the other 3 0x00 or
nulls.

s/3/n/

The number of bytes in an int is not necessarily 4.
If you wanted to see the value of the bytes (the numerical
value) you'll need to cast the char to a number.
cout<< static_cast<int>( p[i] ) << '\n';
may give you what you expect, although you haven't stated what
you expect so I can only guess. You may want to use
static_cast<unsigned int>.


You might be surprised by the output of unsigned int. Not here,
of course, but if char is signed, and one of the bytes
corresponds to a negative value. Converting first to unsigned
char, then to int or unsigned int does the trick, but generally,
if the goal is a binary dump of the contents of something, I'd
cast the address to unsigned char*, rather than char*, and then
cast to int in the output. I'd also output in hexadecimal.
AFAIK there is no set requirement for a C or C++ program to
store their numbers in any particular way as long as they
follow the requirements, I.E. sizeof char <= sizeof int <=
sizeof long int etc... A computer could use whatever internal
storage it deems best. So the output of the program is
unspecified, and for a fact on big endian and little endian
machines you will get different outputs.

The standard does require a pure binary representation for the
integral types, and that corresponding unsigned and signed types
have the same representation for values which are representable
in both, which means the sign bit is the high order bit. (At
least, I seem to recall something like the second requirement.)
For other than character types, however, you can have padding
bits, which play no part in the value.

The values actually output depend very much on the
implementation, and do vary from one architecture to the next,
and in some cases, from one compiler to the next. On the other
hand, reinterpret_cast, used in this way, invokes no undefined
behavior, and is guaranteed to give you a memory dump of whatever
you're looking at.
I don't understand the question. "covers more types" of what?
Basically what you are trying to do (or so I think) is close,
you just need to cast the character to a number to see the
value of it.

About the only thing I can think of: char can be signed, and if
the machine uses 1's complement or signed magnitude, then +0 and -0
will both output a simple 0, even though they have different
representations at the bit level.

Also, if you output an int with the value 511, on most machines
(2's complement), if char is signed, one of the bytes will have
the value -1, which is probably not what is wanted. For any
use I can think of for this, 255, or more likely, FF, would be more
reasonable.
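To make the signed/unsigned point concrete, here is a minimal sketch that reads the same object through both pointer types. The helper names bytesViaUnsignedChar and bytesViaPlainChar are made up for this illustration and do not come from the thread. It assumes 8-bit bytes; whether plain char is signed depends on the implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only (names are not from the thread).

// Reads each byte through unsigned char, so every value lies in 0..UCHAR_MAX.
template <typename T>
std::vector<int> bytesViaUnsignedChar(T const& obj)
{
    unsigned char const* p = reinterpret_cast<unsigned char const*>(&obj);
    std::vector<int> values;
    for (std::size_t i = 0; i != sizeof obj; ++i)
        values.push_back(p[i]);   // promoted to int without changing the value
    return values;
}

// Reads each byte through plain char. If char is signed on this
// implementation, bytes with the high bit set come out negative.
template <typename T>
std::vector<int> bytesViaPlainChar(T const& obj)
{
    char const* p = reinterpret_cast<char const*>(&obj);
    std::vector<int> values;
    for (std::size_t i = 0; i != sizeof obj; ++i)
        values.push_back(p[i]);
    return values;
}
```

For an object holding the single byte 0xFF, the first helper always yields 255, while the second yields 255 or a negative value depending on whether char is signed. The number of entries is always sizeof of the object, whatever that happens to be on the platform.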
 
Tomás Ó hÉilidhe

john said:
#include <iostream>
#include <cstdlib>

int main()
{
using namespace std;

int x= 7;

char *p= reinterpret_cast<char *>(&x);

for(size_t i= 0; i< sizeof(x); ++i)
cout<< p[i]<< '\n';

}

Two more questions.

Q1) Is the above char * use guaranteed to work with all POD types?
Q2) If I remember well, unsigned char * covers more types. Am I wrong,
and it covers only POD types?



A "plain" char should only be used for storing characters. If you
want to use a byte for a different purpose (e.g. storing numbers), then
go with unsigned char or signed char.

If you're trying to print the bytes of an object, then the following
code is perfectly well-defined and portable:

#include <iostream>

template<class T>
void PrintBytes(T const &obj)
{
char unsigned const volatile *p =
reinterpret_cast<char unsigned const volatile*>(&obj);

char unsigned const volatile *const pend = p + sizeof obj;

do std::cout << *p++;
while (pend != p);
}

Although someone has suggested the contrary, the behaviour of the
reinterpret_cast is perfectly well-defined.
 
john

Tom said:
john said:
#include <iostream>
#include <cstdlib>

int main()
{
using namespace std;
int x= 7;
char *p= reinterpret_cast<char *>(&x);
for(size_t i= 0; i< sizeof(x); ++i)
cout<< p[i]<< '\n';
}

Two more questions.

Q1) Is the above char * use guaranteed to work with all POD types?
Q2) If I remember well, unsigned char * covers more types. Am I wrong,
and it covers only POD types?



A "plain" char should only be used for storing characters. If you
want to use a byte for a different purpose (e.g. storing numbers), then
go with unsigned char or signed char.


AFAIK, only unsigned char * and char * are guaranteed to work for
getting the byte values of a POD, and not a signed char *.

If you're trying to print the bytes of an object, then the following
code is perfectly well-defined and portable:

#include <iostream>

template<class T>
void PrintBytes(T const &obj)
{
char unsigned const volatile *p =
reinterpret_cast<char unsigned const volatile*>(&obj);

char unsigned const volatile *const pend = p + sizeof obj;

do std::cout << *p++;
while (pend != p);
}

Despite someone has suggested to the contrary, the behaviour of the
reinterpret_cast is perfectly well-defined.


I think volatile is not necessary for non-volatile PODs, and I think
it is overkill here, or am I missing something?
 
john

john said:
AFAIK, only unsigned char * and char * are guaranteed to work for
getting the byte values of a POD, and not a signed char *.




I think volatile is not necessary for non-volatile PODs, and for this
reason I think it is overkill here, or am I missing something?
 
Tomás Ó hÉilidhe

john said:
AFAIK, only unsigned char * and char * are guaranteed to work for
getting the byte values of a POD, and not a signed char *.


You might be right... I hadn't thought about it coz I'd never use a signed
char for that purpose. I know signed char can't have padding bits, but I
wonder if it can still have invalid bit patterns (negative zero and all
that lark).

If dealing with two's complement, I don't see any reason why you couldn't
use signed char.

I think volatile is not necessary for non volatile PODs, and I think
here is an overkill, or am I missing something?


I just stuck it in so you could use it on anything.
 
James Kanze

Tomás Ó hÉilidhe said:
#include <iostream>
#include <cstdlib>
int main()
{
using namespace std;
int x= 7;
char *p= reinterpret_cast<char *>(&x);
for(size_t i= 0; i< sizeof(x); ++i)
cout<< p[i]<< '\n';
}
Two more questions.
Q1) Is the above char * use guaranteed to work with all POD types?
Q2) If I remember well, unsigned char * covers more types. Am I wrong,
and it covers only POD types?

A "plain" char should only be used for storing characters. If you
want to use a byte for a different purpose (e.g. storing numbers), then
go with unsigned char or signed char.

For clarity's sake. As far as the standard is concerned, char,
unsigned char and signed char are all (small) integral types.
Using plain char only for characters, the other two when you
want small integers, and unsigned char for raw memory, is a good
convention however.
If you're trying to print the bytes of an object, then the following
code is perfectly well-defined and portable:

Not unless the bytes in the object all correspond to printable
characters:).
#include <iostream>
template<class T>
void PrintBytes(T const &obj)
{
char unsigned const volatile *p =
reinterpret_cast<char unsigned const volatile*>(&obj);
char unsigned const volatile *const pend = p + sizeof obj;
do std::cout << *p++;
while (pend != p);
}
Despite someone has suggested to the contrary, the behaviour of the
reinterpret_cast is perfectly well-defined.

Sort of. Formally, I don't think that the standard guarantees
the above; you'd have to replace the reinterpret_cast with
static_cast< unsigned char const* >( static_cast< void const* >(
&obj ) ). Practically, there are enough other constraints on
reinterpret_cast that I can't imagine an implementation where it
didn't work (and it's what I also use).

On the other hand, outputting non-printable characters to a
stream opened in text mode is undefined behavior, and what
actually gets output also depends on the system. If obj is an
int with the value 10, for example, the above code outputs 4
bytes under Linux, 5 under Windows, both on an Intel PC.

I'm also dubious about the utility of volatile here.

I use a class template Dump (and a function template
which returns it, for type deduction) with the following
function:

template< typename T >
Dump< T >::Dump(
    T const& obj )
    : myObj( reinterpret_cast< unsigned char const* >( &obj ) )
{
}

template< typename T >
void
Dump< T >::print(
    std::ostream& dest ) const
{
    IOSave saver( dest ) ;
    dest.fill( '0' ) ;
    dest.setf( std::ios::hex, std::ios::basefield ) ;
    unsigned char const* const
        end = myObj + sizeof( T ) ;
    for ( unsigned char const* p = myObj ; p != end ; ++ p ) {
        if ( p != myObj ) {
            dest << ' ' ;
        }
        dest << std::setw( 2 ) << (unsigned int)( *p ) ;
    }
}

(The class provides an operator<< which calls this function, and
there is a function template which returns an instance of the
class, to exploit type deduction, so you can write things like:

int i = 10 ;
std::cout << "value = " << i
<< " (" << Gabi::dump( i ) << ")\n" ;

and see something like:

value = 10 (00 00 00 0a)
.)
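For anyone who wants to try this without the surrounding library, here is a self-contained sketch along the same lines. It is not the actual Gabi code: the class body, the free function dump, and the manual save and restore of the stream state (standing in for IOSave, which isn't shown in the post) are all assumptions made for the illustration.

```cpp
#include <cstddef>
#include <iomanip>
#include <ios>
#include <ostream>

// Sketch of a Dump-like class; the layout of the real class is not shown
// in the post, so everything here is an assumption.
template <typename T>
class Dump
{
public:
    explicit Dump(T const& obj)
        : myObj(reinterpret_cast<unsigned char const*>(&obj))
    {
    }

    void print(std::ostream& dest) const
    {
        // Stand-in for the IOSave guard: remember flags and fill character,
        // then restore them by hand once the bytes have been written.
        std::ios::fmtflags const oldFlags = dest.flags();
        char const oldFill = dest.fill('0');
        dest.setf(std::ios::hex, std::ios::basefield);
        for (std::size_t i = 0; i != sizeof(T); ++i) {
            if (i != 0) {
                dest << ' ';
            }
            // Convert to unsigned int so a number is printed, not a character.
            dest << std::setw(2) << static_cast<unsigned int>(myObj[i]);
        }
        dest.fill(oldFill);
        dest.flags(oldFlags);
    }

private:
    unsigned char const* myObj;   // first byte of the dumped object
};

template <typename T>
std::ostream& operator<<(std::ostream& dest, Dump<T> const& d)
{
    d.print(dest);
    return dest;
}

// Function template so the caller doesn't have to spell out T.
template <typename T>
Dump<T> dump(T const& obj)
{
    return Dump<T>(obj);
}
```

With this in place, std::cout << dump(i) writes the bytes of i as two-digit hex values separated by spaces, in whatever order the machine stores them. The Dump object only keeps a pointer, so it must not outlive the object it was built from.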
 
Tomás Ó hÉilidhe

James Kanze said:
Not unless the bytes in the object all correspond to printable
characters:).


We're printing unsigned char's, not char's. That should result in numbers
being printed rather than characters... right?

Sort of. Formally, I don't think that the standard guarantees
the above; you'd have to replace the reinterpret_cast with
static_cast< unsigned char const* >( static_cast< void const* >(
&obj ) ).


That's just for people who wet the bed at the thought of reinterpret
cast.

Every object is made up of bytes -- *every* object. The reinterpret cast
here is the perfect candidate for the job.

Practically, there are enough other constraints on
reinterpret_cast that I can't imagine an implementation where it
didn't work (and it's what I also use).

On the other hand, outputting non-printable characters to a
stream opened in text mode is undefined behavior, and what
actually gets output also depends on the system. If obj is an
int with the value 10, for example, the above code outputs 4
bytes under Linux, 5 under Windows, both on an Intel PC.


Again, I would have expected unsigned char's to result in the printing of
numbers instead of characters.
 
James Kanze

We're printing unsigned char's, not char's. That should result
in numbers being printed rather than characters... right?

No. (You've got a point that it probably should, but for
various historical reasons...)
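A quick way to check this for yourself: the sketch below streams the same unsigned char once as it is and once after conversion to unsigned int. The two function names are made up for the illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helpers for illustration only.

// operator<< for unsigned char inserts the byte as a single character.
inline std::string streamedAsIs(unsigned char byte)
{
    std::ostringstream out;
    out << byte;
    return out.str();
}

// Converting to unsigned int first selects the numeric overload.
inline std::string streamedAsNumber(unsigned char byte)
{
    std::ostringstream out;
    out << static_cast<unsigned int>(byte);
    return out.str();
}
```

Passing the value 7 to the first function gives a one-character string holding the byte 7 (a control character in ASCII), while the second gives the text "7".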
That's just for people who wet the bed at the thought of
reinterpret cast.
Every object is made up of bytes -- *every* object. The
reinterpret cast here is the perfect candidate for the job.

Except that the standard doesn't say so. Practically speaking,
the standard doesn't even guarantee that you can
reinterpret_cast a char* to an int* without getting a core dump.

Realistically, the standard doesn't say so, because it wants
reinterpret_cast to be pragmatically useful, and what it takes
to be pragmatically useful depends very much on the machine
architecture. IMHO, the intent is clear, and I don't worry
about using it when I'm working this close to the hardware.
 
john

James said:
Except that the standard doesn't say so. Practically speaking,
the standard doesn't even guarantee that you can
reinterpret_cast a char* to an int* without getting a core dump.

Realistically, the standard doesn't say so, because it wants
reinterpret_cast to be pragmatically useful, and what it takes
to be pragmatically useful depends very much on the machine
architecture. IMHO, the intent is clear, and I don't worry
about using it when I'm working this close to the hardware.


If I recall well from past discussions in clc++,
"reinterpret_cast<unsigned char *>(&x)" will not work as expected in
multiple inheritance cases (like a class C inheriting from both classes
A and B), while

"static_cast<unsigned char *> (static_cast<void *> (&x));" will work.


class A { /* ... */ };

class B { /* ... */ };

class C: public A, public B
{
// ...
};
 
James Kanze

john said:
If I recall well from past discussions in clc++,
"reinterpret_cast<unsigned char *>(&x)" will not work as expected in
multiple inheritance cases (like a class C inheriting from both classes
A and B, while
"static_cast<unsigned char *> (static_cast<void *> (&x));" will work.
class C: public A, public B
{
// ...
};

For what definition of work? I expect that both
reinterpret_cast and static_cast will behave more or less
identically here. In both cases, starting from the address of
the complete object, you'll end up with a pointer to the first
byte of the complete object. In both cases, starting with the
address of one of the base classes, you'll get the address of
the first byte of the sub-object.

There may be problems in the case where the compiler has applied
the empty base class optimization, but I would expect the
problems to be present in both cases. And in neither case can
you cast the resulting pointer back to anything but its original
type, and expect to use it. (In general, any time you go
through a void*, you have to be careful of this. It's a
frequent error with callbacks.)

The difference is that in the case of the two static_casts, the
standard requires it to work, whereas in the case of
reinterpret_cast, the standard formally leaves it
"implementation defined" (but with a number of indications that
the intent is for it to work).
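The multiple inheritance case is easy to try out. The sketch below uses made-up types A, B and C with one int member each, plus hypothetical helper names. It compares the two ways of getting a byte pointer to the complete object, and also shows that the base sub-objects start at different addresses. As far as I know, later revisions of the standard define reinterpret_cast between object pointer types in terms of the two static_casts, so on a current compiler the first comparison is expected to hold.

```cpp
// Made-up types for illustration; each base has a data member so that
// neither is an empty base class.
struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };

// Byte pointer to the complete object via reinterpret_cast.
inline unsigned char const* viaReinterpretCast(C const& obj)
{
    return reinterpret_cast<unsigned char const*>(&obj);
}

// Byte pointer to the complete object via void const*.
inline unsigned char const* viaTwoStaticCasts(C const& obj)
{
    return static_cast<unsigned char const*>(
        static_cast<void const*>(&obj));
}

// Byte pointer to the Base sub-object of obj. The derived-to-base
// conversion happens before the cast to void const*, so any address
// adjustment is applied by the compiler first.
template <typename Base>
unsigned char const* startOfBase(C const& obj)
{
    Base const* base = &obj;
    return static_cast<unsigned char const*>(
        static_cast<void const*>(base));
}
```

Starting from &c, both casts give the same address. startOfBase<A>(c) and startOfBase<B>(c) give different addresses, which is why a pointer that has gone through void* can only be cast back to the type it started from.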
 
