Q: reinterpret_cast with undefined behavior?

Jakob Bieling · Apr 30, 2005

Hi,

I am trying to determine the endianness of the system as follows:

int i = 1;
bool is_little = *reinterpret_cast <char*> (&i) != 0;

But now I am asking myself whether this use of reinterpret_cast is
valid according to the Standard.

thanks!

ben · Apr 30, 2005

signed char or unsigned char?

ben

Jakob Bieling · Apr 30, 2005

signed char or unsigned char?

On my system signed, but why does this matter? I am testing against
0 ..

Peter Koch Larsen · Apr 30, 2005

Jakob Bieling said:
Hi,

I am trying to determine the endianness of the system as follows:

int i = 1;
bool is_little = *reinterpret_cast <char*> (&i) != 0;

But now I was asking myself, if this use of reinterpret_cast is valid,
according to the Standard.

thanks!

This is perfectly valid - just go ahead. The signedness of char is
irrelevant in this respect.

/Peter

Andrew Koenig · Apr 30, 2005

int i = 1;
bool is_little = *reinterpret_cast <char*> (&i) != 0;

But now I was asking myself, if this use of reinterpret_cast is valid,
according to the Standard.

It is not. There is no requirement that int* and char* be represented in
compatible ways.

In fact, I can't even think of any requirement that the bits in an int be
stored in any particular order. If an implementation decided to store all
the even-numbered bits together, and then all the odd-numbered bits, I don't
think there would be anything wrong with that.

Jakob Bieling · Apr 30, 2005

It is not. There is no requirement that int* and char* be
represented in compatible ways.

In fact, I can't even think of any requirement that the bits in an
int be stored in any particular order. If an implementation decided
to store all the even-numbered bits together, and then all the
odd-numbered bits, I don't think there would be anything wrong with
that.

Guess I have to be more exact. Assuming the implementation uses
either the big- or the little-endian byte order and the binary
numeration system (as it 'shall' be used, according to 3.9.1/7).
Sticking to those restrictions, I assume the above code still has
unspecified behaviour, as your first point still stands, right? So I
came up with this:

char tmp [sizeof (int)];
int* i = new (tmp) int (1);
bool is_little = tmp [0] != 0;

Can I be sure of reading the memory used by the int, by accessing
'tmp' (still assuming that we have an implementation that conforms to
the restrictions I made above)?

Also, I was wondering whether I have to call the destructor there,
since it is just an int.

thanks!

Peter Koch Larsen · May 1, 2005

Andrew Koenig said:
It is not. There is no requirement that int* and char* be represented in
compatible ways.

In fact, I can't even think of any requirement that the bits in an int be
stored in any particular order. If an implementation decided to store all
the even-numbered bits together, and then all the odd-numbered bits, I
don't think there would be anything wrong with that.

I read the question as "is this reinterpret_cast legal and well-defined in
std C++?" In that case the answer is a clear YES. If the question is whether
the above code can determine all types of endianness, those in existence now
as well as any conceivable ones, then of course the answer is NO.

/Peter

Ioannis Vranos · May 1, 2005

Jakob said:
Hi,

I am trying to determine the endianness of the system as follows:

int i = 1;
bool is_little = *reinterpret_cast <char*> (&i) != 0;

But now I was asking myself, if this use of reinterpret_cast is
valid, according to the Standard.

I think this does the task:

#include <cstring>
#include <limits>
#include <vector>

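bool IsLittleEndian()
{
    using namespace std;

    unsigned int num= 0;

    unsigned char buffer[sizeof(num)];

    // Snapshot of the all-zero representation, used as the baseline
    memcpy(buffer, &num, sizeof(num));

    // Stores indices to modified bytes of num
    unsigned indices[2]= {0, 0};

    const unsigned char *p= reinterpret_cast<unsigned char *>(&num);

    num= 1;

    // The first byte that differs from the baseline holds the low-order bits
    for(unsigned i=0; i<sizeof(num); ++i)
    {
        if(p[i]!= buffer[i])
        {
            indices[0]= i;
            break;
        }
    }

    // num becomes UCHAR_MAX+1, so the change moves to the next more
    // significant byte
    num+= numeric_limits<unsigned char>::max();

    for(unsigned i=0; i<sizeof(num); ++i)
    {
        if(p[i]!= buffer[i])
        {
            indices[1]= i;
            break;
        }
    }

    // Little-endian: the less significant byte has the lower index
    return indices[0]< indices[1];
}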

Kanenas · May 1, 2005

It is not. There is no requirement that int* and char* be represented in
compatible ways.

Is "convertible so that one can always address the same memory
location as the other" a fair definition of what you mean by
"compatible"?

In fact, I can't even think of any requirement that the bits in an int be
stored in any particular order. If an implementation decided to store all
the even-numbered bits together, and then all the odd-numbered bits, I don't
think there would be anything wrong with that.

Such as an Intercal machine? That's a scary thought.

Kanenas

Kanenas · May 1, 2005

It is not. There is no requirement that int* and char* be represented in
compatible ways.

Is "convertible so that a char* can always be made to point to the
same memory location as an int*" a fair definition of what you mean by
"compatible"?

In fact, I can't even think of any requirement that the bits in an int be
stored in any particular order. If an implementation decided to store all
the even-numbered bits together, and then all the odd-numbered bits, I don't
think there would be anything wrong with that.

Such as an Intercal machine? That's a scary thought.

Kanenas

Kanenas · May 1, 2005

Guess I have to be more exact. Assuming the implementation uses
either the big- or the little-endian byte order and the binary
numeration system (as it 'shall' be used, according to 3.9.1/7).
Sticking to those restrictions, I assume the above code still has
unspecified behaviour, as your first point still stands, right? So I
came up with this:

Under the condition that a char* can always point to the same memory
location as an int*, the original test will work all of the time under
a decent compiler.

Under the condition that the size of a char pointer >= size of an int
pointer, the original test will work at least some of the time under a
decent compiler.

char tmp [sizeof (int)];
int* i = new (tmp) int (1);
bool is_little = tmp [0] != 0;

Can I be sure of reading the memory used by the int, by accessing
'tmp' (still assuming that we have an implementation that conforms to
the restrictions I made above)?

The placement new can go wrong if tmp isn't properly aligned for ints;
nothing checks the alignment for you.

Also, I was wondering, if I would have to call the d'tor there,
since it is just an int?

Only classes have destructors (structures and unions being kinds of
classes), so no.
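
On the alignment and destructor points above, here is a minimal sketch of the placement-new variant. It assumes a C++11 compiler (for alignas); the function name is made up for illustration and is not from the thread.

```cpp
#include <cassert>
#include <new>       // placement new

// Hypothetical helper: true if the lowest-addressed byte of an int
// holding 1 is non-zero.
bool first_byte_set_via_placement_new()
{
    // alignas (C++11) makes the buffer suitably aligned for an int;
    // a plain char array carries no such guarantee.
    alignas(int) unsigned char tmp[sizeof(int)];

    int* p = new (tmp) int(1);   // construct the int inside the buffer

    // Placement new hands back the address it was given.
    assert(static_cast<void*>(p) == static_cast<void*>(tmp));

    // int is trivially destructible, so no destructor call is needed.
    return tmp[0] != 0;          // inspect the first byte of the storage
}
```

Without alignas, nothing stops the buffer from landing on an address that is unsuitable for an int on a strict-alignment machine.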

Section 3.9/2 of the 1996 2nd draft standard (the most recent one I
have access to) not only guarantees there is a way of copying an int
into a char (or unsigned char) array but gives an example which
inspired:

unsigned char tmp[sizeof(int)];
int i=1;
memcpy(tmp, &i, sizeof(int));
bool is_little = tmp [0];
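
Wrapped into a function, the memcpy approach might look like the sketch below. The name is_little_endian is an assumption for illustration, and the check still presumes the machine is either big- or little-endian, as stated earlier in the thread.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy

// Hypothetical helper: true if the least significant byte of an
// unsigned int is stored at the lowest address.
bool is_little_endian()
{
    const unsigned value = 1;
    unsigned char bytes[sizeof value];

    // Copy the object representation into a byte array (3.9/2).
    std::memcpy(bytes, &value, sizeof value);

    // Copying the bytes back must restore the original value.
    unsigned copy = 0;
    std::memcpy(&copy, bytes, sizeof copy);
    assert(copy == value);

    return bytes[0] != 0;
}
```

No pointer conversion is involved here, so the question about int* and char* compatibility does not arise.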

Here's a test which also covers NUXI and IXUN machines.

int32_t i=0x01020304;
unsigned char tmp[sizeof(i)];
memcpy(tmp, &i, sizeof(i));
enum byte_order {
    other, order_1234, order_2143, order_3412, order_4321
};
byte_order local_byte_order;
/* For the most part, the test assumes bits within bytes
   are not permuted. The test also assumes the first byte determines
   the order (e.g. i isn't laid out as "\1\4\3\2").
*/
if (tmp[0] > 4) {
    // bits within byte are permuted
    local_byte_order = other;
} else {
    local_byte_order = static_cast<byte_order>(tmp[0]);
}

Here's another test which is better behaved for really odd byte orders
(i.e. not 1234, 4321, 3412, 2143).

int32_t i=0x01020304;
unsigned char tmp[sizeof(i)];
memcpy(tmp, &i, sizeof(i));
enum byte_order {
    other, order_1234, order_2143, order_3412, order_4321
};
byte_order local_byte_order;
// memcmp returns 0 on a match and works on unsigned char without a
// terminator; strcmp would need char* and a trailing '\0'
if (memcmp(tmp, "\01\02\03\04", sizeof(i)) == 0)
    local_byte_order = order_1234;
else if (memcmp(tmp, "\02\01\04\03", sizeof(i)) == 0)
    local_byte_order = order_2143;
else if (memcmp(tmp, "\03\04\01\02", sizeof(i)) == 0)
    local_byte_order = order_3412;
else if (memcmp(tmp, "\04\03\02\01", sizeof(i)) == 0)
    local_byte_order = order_4321;
else
    local_byte_order = other;

Picking a magic number better than 0x01020304 may allow for easier,
broader and more stable testing.
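
The comparison step can be separated from the probing step, so the classification can be exercised with hand-written byte patterns on any machine. This is a sketch only: classify and local_byte_order as functions are hypothetical names, and std::uint32_t from <cstdint> (C++11) is assumed to be available.

```cpp
#include <cassert>
#include <cstdint>   // std::uint32_t
#include <cstring>   // std::memcmp, std::memcpy

enum byte_order { other, order_1234, order_2143, order_3412, order_4321 };

// Classify the four bytes obtained by storing 0x01020304.
byte_order classify(const unsigned char (&b)[4])
{
    static const unsigned char patterns[4][4] = {
        {1, 2, 3, 4}, {2, 1, 4, 3}, {3, 4, 1, 2}, {4, 3, 2, 1}
    };
    for (int k = 0; k < 4; ++k)
        if (std::memcmp(b, patterns[k], 4) == 0)
            return static_cast<byte_order>(k + 1);  // enum values 1..4
    return other;
}

// Probe the machine this code runs on.
byte_order local_byte_order()
{
    static_assert(sizeof(std::uint32_t) == 4, "assumes 8-bit bytes");
    const std::uint32_t magic = 0x01020304;
    unsigned char b[4];
    std::memcpy(b, &magic, sizeof magic);
    return classify(b);
}
```

With this split, a layout such as "\1\4\3\2" simply falls through to other instead of being misreported.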

Kanenas

Kanenas · May 1, 2005

I read the question as "is this reinterpret_cast legal and well-defined in
std C++?" In that case the answer is a clear YES. If the question is whether
the above code can determine all types of endianness, those in existence now
as well as any conceivable ones, then of course the answer is NO.

Legal, yes. Well-defined, no. Probably work, yes. The requirement
(at least, in older standards) is that reinterpret_cast will convert
between object pointers, though the result is unspecified (not
undefined, but not well-defined). Here is the section of the 1996
draft standard stating the requirement:

5.2.10/7:
<quote>
A pointer to an object can be explicitly converted to a pointer to
an object of different type*. Except that converting an rvalue of type
"pointer to T1" to the type "pointer to T2" (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value, the result of such a pointer conversion is
unspecified.

* The types may have different cv-qualifiers, subject to the overall
restriction that a reinterpret_cast cannot cast away constness.
</quote>

Note that "object" in the quote above includes built-in types. Note
also that while one could imagine an architecture where the alignment
for chars is stricter than for ints, a real-world computer with such
an alignment scheme would be quite perverse (especially considering
that sizeof(int) >= sizeof(char)).
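
The one thing the quoted paragraph does promise is the round trip. A small sketch of that guarantee follows; round_trip is a made-up name, and unsigned char is used because its alignment requirement cannot be stricter than that of int.

```cpp
#include <cassert>

// Convert an int* to unsigned char* and back again. Per the quoted
// 5.2.10/7, the result compares equal to the original pointer.
int* round_trip(int* p)
{
    unsigned char* raw = reinterpret_cast<unsigned char*>(p);
    int* back = reinterpret_cast<int*>(raw);
    assert(back == p);   // the guaranteed part
    return back;
}
```

What the draft wording leaves unspecified is where raw itself points, which is exactly the part the endianness test relies on.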

As long as all memory locations which can store an int can be
addressed by a char*, it's possible for the original test to work
under a decent (whatever that means) compiler.

For the original test to work, it's not sufficient that pointer
conversion be value-preserving (i.e. the number of possible values for
pointers to two different types are the same, i.e. all pointers are of
the same size). This will guarantee pointer conversions will be
invertible, but there still may be some integer pointers which cannot
be converted to a char* which points to the same location as the int*.
For example, consider an architecture where int pointers are aligned
by shifting insignificant bits (e.g. an int32_t* with value 0x10
points at memory location 0x40) and all pointers are of the same size.
int pointers address the same number of locations as char pointers but
can address memory outside of the char arena (the maximum memory size
for the integer arena is sizeof(int) times greater than the char
arena). The original test will fail if 'i' happens to be located at a
memory location where a char* cannot point. There's a kind of
symmetry here: a char* can point inside the int arena where an int*
cannot point and an int* can point outside of the char arena where a
char* cannot point. (As a side note, pointer conversion by shifting
on such a machine is not compliant, as shifting is not invertible in
all cases; conversion by rotation, however, is compliant.)

As another approach to the original question, consider the following
gleaned from Stroustrup: any object pointer can be implicitly
converted to a void*, and a void* can be explicitly converted to a
pointer to any object type. This implies that (via conversion to a
void*) an int* can be converted to a char*, though it doesn't
guarantee that the resultant char* points to the same memory location
as the original int*.
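
The void* route can be written out as follows. This is only a sketch with hypothetical function names. As far as I know, C++11 and later define the reinterpret_cast between object pointer types in terms of this same two-step static_cast, so on a current compiler the two functions should agree; that is an observation about later standards, not about the 1996 draft discussed here.

```cpp
#include <cassert>

// Hypothetical helper: read the first byte of an int via void*.
unsigned char first_byte_via_void(const int& value)
{
    const void* v = &value;   // implicit: object pointer -> void*
    const unsigned char* c =
        static_cast<const unsigned char*>(v);   // explicit: void* -> char pointer
    assert(static_cast<const void*>(c) == v);   // converting back gives the same void*
    return *c;
}

// The same read using the cast from the original question.
unsigned char first_byte_via_reinterpret(const int& value)
{
    return *reinterpret_cast<const unsigned char*>(&value);
}
```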

Delving into this really made me glad I'm not responsible for the
standard.

Kanenas

Ioannis Vranos · May 1, 2005

Kanenas said:
Legal, yes. Well-defined, no.

The standard permits treating POD types as sequences of chars/unsigned chars. In this
regard the behaviour is well defined. The logic of the test is flawed though: for example,
there may be implementation-specific bits, such as padding bits, with the value 1.
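
One way to look at the padding-bit concern is to compare the number of value bits with the size of the object representation. The sketch below uses hypothetical helper names and assumes C++11; it checks unsigned int only.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy
#include <limits>
#include <vector>

// Copy the object representation of an unsigned int into a byte vector.
std::vector<unsigned char> object_bytes(unsigned value)
{
    std::vector<unsigned char> bytes(sizeof value);
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// True if every bit of the representation is a value bit.
bool unsigned_has_no_padding()
{
    return static_cast<std::size_t>(std::numeric_limits<unsigned>::digits)
           == sizeof(unsigned) * CHAR_BIT;
}

// Number of non-zero bytes in the representation of value.
std::size_t nonzero_bytes(unsigned value)
{
    std::size_t count = 0;
    for (unsigned char b : object_bytes(value))
        if (b != 0) ++count;
    return count;
}
```

If unsigned_has_no_padding() is false, a non-zero first byte no longer proves that the value bit of 1 lives there.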

Jack Klein · May 2, 2005

Hi,

I am trying to determine the endianness of the system as follows:

int i = 1;
bool is_little = *reinterpret_cast <char*> (&i) != 0;

But now I was asking myself, if this use of reinterpret_cast is
valid, according to the Standard.

thanks!

I would prefer to use a pointer to unsigned char here, because this is
the generic raw data type in C and C++.

I hope you realize that this test will always pass on a platform where
sizeof(int) == 1, which doesn't really tell you anything about endian
orientation at all.

Such platforms are rare outside the world of digital signal
processors, but there are indeed C++ compilers for some DSPs where
this is true.
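
A version of the test that takes both remarks into account could look like this sketch: it inspects unsigned char bytes and reports "unknown" when the integer occupies a single byte. The enum and function names are illustrative assumptions, and enum class requires C++11.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy

enum class Endian { little, big, unknown };

// Hypothetical helper: classify the byte order of unsigned int.
Endian detect_endianness()
{
    if (sizeof(unsigned) == 1)
        return Endian::unknown;   // a one-byte int has no byte order

    const unsigned value = 1;
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);

    if (bytes[0] == 1)
        return Endian::little;    // low-order byte at the lowest address
    if (bytes[sizeof value - 1] == 1)
        return Endian::big;       // low-order byte at the highest address
    return Endian::unknown;       // mixed or otherwise unusual layout
}
```

On compilers that support C++20, std::endian in <bit> reports the same information without any run-time probing.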
