casting X* to char*


Jack Saalweachter

Tomás said:
mlimber posted:

I have a phobia of pointers to "one past the end". (In fact I've a phobia
of pointers which point to anything other than legitimate addresses.)

This is ultimately open ranges versus closed ranges; [p, q) versus [p,
q]. Open ranges have a number of advantages over the closed ranges
[which you espouse].

For one, note that open ranges are able to represent the 'empty range'.

When 'p == q' (or begin() == end()), [p, q) represents an empty range.
Iteration over it is naturally avoided; hell, the pointers don't even
have to be valid. Iterating from [0, 0) is perfectly safe.

There is no way, however, to represent the empty range with closed
ranges. Consider constructing your 'p_last_byte' over a range of /zero/
bytes [granted, I'm not certain it could happen in ISO C++]: p + 0 - 1 =
p-1! The range is now [p, p-1]. Not only will your iteration merrily
print out *p, it will find that p != p-1, p+1 != p-1, ... And if p == 0,
p-1 = ~0, so your range becomes [0, ~0).
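A minimal sketch of the empty-range point, assuming a hypothetical helper named sum_half_open (not something from this thread). When first == last the loop body never runs, so nothing is dereferenced, and even a pair of null pointers works as an empty range.

```cpp
#include <cassert>

// Hypothetical helper: sums the ints in the range [first, last).
// When first == last the loop body never runs, so nothing is dereferenced.
int sum_half_open(const int* first, const int* last)
{
    int total = 0;
    for (const int* p = first; p != last; ++p)
        total += *p;
    return total;
}

// Exercises a full range, an empty range at the end of an array, and [null, null).
void demo_half_open()
{
    const int data[3] = { 1, 2, 3 };
    assert(sum_half_open(data, data + 3) == 6);
    assert(sum_half_open(data + 3, data + 3) == 0);

    const int* none = nullptr;   // nullptr assumes C++11 or later
    assert(sum_half_open(none, none) == 0);
}
```

A closed-range version of the same helper would need a separate flag or a special case to express "no elements".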


If you look into the standard algorithms and the way they're used, open
ranges prevent a lot of unnecessary checks. For instance:

v.erase( remove_if(v.begin(), v.end(), X), v.end() );

is perfectly safe thanks to the magic of open ranges. If there are no
elements to be removed, remove_if returns v.end(), and v.erase sees the
empty range [v.end(), v.end()). If v.begin() == v.end(), remove_if
doesn't die, and simply does nothing, returning v.end().

If the STL didn't use open ranges for iterators, that line of code would
require two separate checks, one to ascertain that v was not empty (or
else remove_if would die) and one to determine that remove_if had found
elements to remove (or else erase would die).
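As a rough illustration of the paragraphs above, here is the erase-remove line wrapped in a function. The predicate is_negative is only a stand-in for the "X" in the post; the point is that no emptiness check appears anywhere.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for the predicate "X" in the post (hypothetical).
bool is_negative(int value) { return value < 0; }

// Erase-remove idiom. Works unchanged when v is empty and when nothing matches,
// because both calls accept the empty range [v.end(), v.end()).
void erase_negatives(std::vector<int>& v)
{
    v.erase(std::remove_if(v.begin(), v.end(), is_negative), v.end());

    // Postcondition: no negative element is left.
    assert(std::find_if(v.begin(), v.end(), is_negative) == v.end());
}
```

Calling erase_negatives on an empty vector, or on one with no negative values, simply leaves it as it was.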


Jack Saalweachter
 

Salt_Peter

dan2online said:
Salt_Peter said:
Mark said:
A colleague asked me something along the lines of the following today.

For some type X he has:

X* px = new X[sz];

Then he wants to convert px to a char* (I'm guessing for the purpose of
serializing the object array).
What for? Why not provide your own operator<< and operator>> for type X?
Let's consider what happens if type X is composed of primitive types,
containers, pointers and references.

char *pc = (char *)px is reasonable for many cases if you want to
manipulate the bytes. In this scenario, X is plain old data type
(POD). Here is an example,
double *px = new double [100];
char *pc = (char *)px, so you can decode the floating point format by
manipulating the raw byte string.

But if the type X is not POD, it looks complicated to access the raw
byte string.

I'll say it again since you haven't yet got the picture. Even with a
POD, that raw byte string will often include padding. Consider a complex
POD with components that don't fit nicely together, i.e. 2 chars and a
double.
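To make the padding point concrete, here is a small sketch with a hypothetical struct matching the "2 chars and a double" example. The amount of padding is implementation-defined; 6 bytes is common where double has size and alignment 8, but that is an assumption about the platform, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical POD matching the "2 chars and a double" example.
struct TwoCharsAndDouble
{
    char   first;
    char   second;
    double value;
};

// Number of bytes in the object that belong to no member.
// The result is implementation-defined.
std::size_t padding_bytes()
{
    const std::size_t member_bytes = 2 * sizeof(char) + sizeof(double);
    assert(sizeof(TwoCharsAndDouble) >= member_bytes);
    return sizeof(TwoCharsAndDouble) - member_bytes;
}
```

Any padding bytes reported here are part of the raw byte string you get from the char* cast, and their values are not meaningful.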

The use of operator overloading is a far more powerful, efficient,
reusable, portable, maintainable, extensible and safe way to transfer
bits around. It's a win-win bargain and bug free - no pointers involved.
Technically, op<< and op>> are universal in that any type can be
streamed efficiently with all padding stripped away - guaranteed. Any
interface that can accept a std::ostream& or std::istream& will do to
swallow or send. And remember that the overloaded operator is not a
member function, the POD need not be a class.

Imagine a complex PODA which is a member of a PODB type. There is no
need to write a new function to stream both PODB and its PODA member +
other members. The operator you wrote for PODA will do just fine, you
only need worry about PODB's immediate needs since PODA already knows
how to stream itself (it's an object, not just a bunch of bytes).
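A minimal sketch of that composition, assuming two hypothetical types PodA and PodB and a simple whitespace-separated text format (none of this is from the original posts). Note that the default text formatting of double keeps only about 6 significant digits, so a real serializer would have to set the precision or pick a different encoding.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical PODs; the names and members are illustrative only.
struct PodA { char tag; double value; };
struct PodB { int id; PodA inner; };

// Writes the members one by one as text, so padding bytes are never sent.
std::ostream& operator<<(std::ostream& os, const PodA& a)
{
    return os << static_cast<int>(a.tag) << ' ' << a.value;
}

std::istream& operator>>(std::istream& is, PodA& a)
{
    int tag = 0;
    if (is >> tag >> a.value)
        a.tag = static_cast<char>(tag);
    return is;
}

// PodB handles its own member, then reuses PodA's operators.
std::ostream& operator<<(std::ostream& os, const PodB& b)
{
    return os << b.id << ' ' << b.inner;
}

std::istream& operator>>(std::istream& is, PodB& b)
{
    return is >> b.id >> b.inner;
}

// Sends a PodB through an in-memory stream and reads it back.
PodB round_trip(const PodB& in)
{
    std::stringstream buffer;
    buffer << in;

    PodB out = PodB();
    buffer >> out;
    assert(!buffer.fail());   // extraction succeeded
    return out;
}
```

The same operators work with any std::ostream or std::istream, such as a file stream, without touching the struct layout.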

Again, if you need a container of POD elements and you require streaming
the entire container's contents, you already have an operator for the
elements. Regardless of whether the container is sequential or not and
irrespective of the padding constraints.

Programming the bit transfer becomes much, much easier, bullet proof and
with a lot less code.
 

Tomás

mlimber posted:

PS, Footnote 75 in that same section says, "[A]n implementation need
only provide one extra byte (which might overlap another object in the
program) just after the end of the object in order to satisfy the 'one
past the last element' requirements."


Now I see : ).


Pointer to one past the end it is!


-Tomás
 

Tomás

Mark P posted:

That is, unless you're casting from T1* to T2* and back to T1* (with the
additional proviso about alignment), the result of this conversion is
unspecified.


To be honest, I don't need to read anything from the Standard, because I
know I have to be right (I'm not being arrogant, please bear with me...).
The Laws of Physics and The Laws of Mathematics over-rule anything that's
written in a programming language standard.

Firstly the Standard says that you can convert any pointer type to a
void*, e.g.:

void Func( double *p1, unsigned *p2, char* p3 )
{
void *p;

p = p1;

p = p2;

p = p3;
}

And it also says that you can convert back, and the original address
value will be perfectly preserved. Sample:

int main()
{
double k;

void *p1 = &k;

double *p2 = static_cast<double*>(p1);


*p2 = 45.372;
}


We all know that the smallest thing in C++ is a byte. No structure shall
be 8.5 bytes, or 2 and a third bytes, or one eighth of a byte. Whole bytes
only.

Therefore, by simple logic, one can see that every object is made up of
bytes, and that every object can be accessed as simply an array of bytes.
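A hedged sketch of "accessing an object as an array of bytes", assuming T is a POD (trivially copyable) type. The helper names object_bytes and from_bytes are made up for illustration. It reads the object representation through unsigned char, which is the access the Standard permits, and uses std::memcpy to rebuild the value.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Copies the object representation of a POD value into a byte vector.
template <class T>
std::vector<unsigned char> object_bytes(const T& value)
{
    const unsigned char* first = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(first, first + sizeof(T));
}

// Rebuilds a value from bytes produced by object_bytes<T>.
template <class T>
T from_bytes(const std::vector<unsigned char>& bytes)
{
    assert(bytes.size() == sizeof(T));
    T value;
    std::memcpy(&value, &bytes[0], sizeof(T));
    return value;
}
```

The byte order and any padding inside the vector are platform-specific, so the bytes are only meaningful to the same implementation that produced them.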

At the end of the day we're only dealing with chips, and electrical
current, and bits and bytes, there's nothing mysterious.

One thing which puzzled me before is this:
Why was there a void* in C++ at all? A "char*" can reliably store any
address, so why did we need a "void*". If you'd like, here's the thread:


http://groups.google.ie/group/comp.std.c++/browse_frm/thread/7da690d52e6df286/86e2383d1f830ddb?tvc=1&q=void*+group%3Acomp.std.c%2B%2B+author%3ATom%C3%A1s&hl=en#86e2383d1f830ddb


-Tomás
 

Mark P

Tomás said:
Mark P posted:

That is, unless you're casting from T1* to T2* and back to T1* (with the
additional proviso about alignment), the result of this conversion is
unspecified.


To be honest, I don't need to read anything from the Standard, because I
know I have to be right[...]
[snip]


Therefore, by simple logic, one can see that every object is made up of
bytes, and that every object can be accessed as simply an array of bytes.

Perhaps, but that wasn't my point. Would there be anything standard
non-conforming about an implementation which, for X not void and when
casting from void* to X* adds sizeof(X) to the address (modulo the
allowed range of addresses) and when casting from X* to void* subtracts
sizeof(X) from the address (modulo the allowed range of addresses)?

I'll repeat here the section of the standard I quoted earlier, with
added (by me) emphasis on the final clause:

"A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are object
types and where the alignment requirements of T2 are no stricter than
those of T1) and back to its original type yields the original pointer
value, *the result of such a pointer conversion is unspecified*."

Is not the result of the single operation of converting a char* to a
void* then unspecified?

-Mark
 

Tomás

Mark P posted:

Perhaps, but that wasn't my point. Would there be anything standard
non-conforming about an implementation which, for X not void and when
casting from void* to X* adds sizeof(X) to the address (modulo the
allowed range of addresses) and when casting from X* to void*
subtracts sizeof(X) from the address (modulo the allowed range of
addresses)?


See below.

"A pointer to an object can be explicitly converted to a pointer to an
object of different type.


Example:

double d; /* Source Type */

Except that converting an rvalue of type
“pointer to T1”


T1 = double

to the type “pointer to T2”


T2 = char

(where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1)


Nothing has less strict alignment requirements than a char.

and back to its original type yields the
original pointer value


Yippie, we've satisfied the conditions!

,*the result of such a pointer conversion is
unspecified*."

The two lines immediately above refer to when the conditions are NOT
satisfied.

We've gone from:

Strict alignment (double)

to:

Less strict alignment (char)


So we're okay. The Standard is actually giving us PLENTY of slack here!
For instance, the following will work perfectly if a long double has
stricter alignment requirements than an int:

long double ld;

int *p = reinterpret_cast<int*>(&ld);

long double *p2 = reinterpret_cast<long double*>(p);

*p2 = 453.235;


So there you have it: Anything can go to char* and then back to its
original pointer type.

-Tomás
 

Tomás

Tomás posted:



Here's a little program I threw together for giving the different pointer
sizes on a given platform. On Windows XP, it gives 4 for every one.


#include <iostream>
#include <cstdlib>
#include <cstring>

/* The following are only used for their types */
#include <string>
#include <vector>
#include <typeinfo>


template<unsigned width>
const char* CentreHoriz( const char* const p_in )
{
/* NB:

(1) Uses static data, so be careful with sequence points.
(2) Doesn't check that string isn't too long.
*/


static char buffer[width + 1]; /* Automatic null terminator */


std::memset( buffer, ' ', width * sizeof(*buffer) );


unsigned const len = std::strlen(p_in);

std::memcpy( buffer + width / 2 - len / 2,
p_in,
len);

return buffer;
}


template<class T>
void PrintRow( const char* const p )
{
std::cout
<< '|'
<< CentreHoriz<36>(p)
<< "|| "
<< sizeof(T)
<< " |\n"

<<
"-------------------------------------------------------------\n";
}




int main()
{
std::cout <<
"=============================================================\n"
"| How much memory does a particular pointer type consume? |\n"
"=============================================================\n"
"| Type || Bytes |\n"
"=============================================================\n";

PrintRow<char*>("char*");
PrintRow<short*>("short*");
PrintRow<int*>("int*");
PrintRow<long*>("long*");
PrintRow<float*>("float*");
PrintRow<double*>("double*");
PrintRow<long double*>("long double*");
PrintRow<bool*>("bool*");
PrintRow<wchar_t*>("wchar_t*");
PrintRow<std::string*>("std::string*");
PrintRow<std::vector<std::string>*>("std::vector<std::string>*");

std::cout << '\n';

std::system("PAUSE");
}

-Tomás
 

Mark P

Tomás said:
Mark P posted:

[irrelevant example snipped]
So we're okay. The Standard is actually giving us PLENTY of slack here!

[more cuts]
So there you have it: Anything can go to char* and then back to its
original pointer type.

That point has never been the issue even though you keep providing me
with examples to illustrate it. I think you're not understanding the
wording of the standard, so let me quote this yet again:

"A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are object
types and where the alignment requirements of T2 are no stricter than
those of T1) and back to its original type yields the original pointer
value, the result of such a pointer conversion is unspecified."

Allow me to rearrange the second sentence without altering its meaning:

"The result of such a pointer conversion is unspecified except [for one
special situation]."

In other words, *all* reinterpret_cast pointer-to-pointer conversions
have unspecified behavior except for one special case where you convert
back and forth between two types and respect alignment.

You keep showing me examples of the special case and I have no
disagreement with you over that case but my point, again, is that for
all other cases the standard states that the result is unspecified. In
particular, the result of the one-way conversion from void* to char* is
unspecified.
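For reference, the one case both sides agree is guaranteed can be sketched as follows, with T1 = double and T2 = char. The function name via_char is hypothetical.

```cpp
#include <cassert>

// T1 = double, T2 = char. char's alignment requirement is no stricter than
// double's, so converting there and back must yield the original pointer.
double* via_char(double* original)
{
    char* as_char = reinterpret_cast<char*>(original);
    return reinterpret_cast<double*>(as_char);
}

// Checks the round trip and then uses the recovered pointer.
void demo_round_trip()
{
    double d = 0.0;
    double* back = via_char(&d);
    assert(back == &d);

    *back = 45.372;
    assert(d == 45.372);
}
```

The disagreement in this thread is only about conversions that do not return to the original type, which this sketch deliberately avoids.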

Mark
 

Alf P. Steinbach

* Mark P:
In other words, *all* reinterpret_cast pointer-to-pointer conversions
have unspecified behavior except for one special case where you convert
back and forth between two types and respect alignment.

Almost, but not quite.

A pointer to a POD-struct object can be converted via a "suitable"
reinterpret_cast to a pointer to the first member of that object (and of
course back), §9.2/17.
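A small sketch of that §9.2/17 case, assuming a hypothetical POD-struct named Header whose first member is an int:

```cpp
#include <cassert>

// Hypothetical POD-struct; the first member is an int.
struct Header
{
    int    tag;
    double payload;
};

// Converts a pointer to the struct into a pointer to its first member.
int* first_member(Header* h)
{
    return reinterpret_cast<int*>(h);
}

// Checks both directions of the conversion.
void demo_first_member()
{
    Header h = { 42, 1.0 };
    int* p = first_member(&h);

    assert(p == &h.tag);
    assert(*p == 42);
    assert(reinterpret_cast<Header*>(p) == &h);
}
```

This only applies to the first member; reaching later members this way would need offsets and runs into padding.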

Yes, the standard is a bit inconsistent here, and in other places, which
is why it can be good fun to discuss what the standard really means...
 

Tomás

Mark P posted:

In
particular, the result of the one-way conversion from void* to char* is
unspecified.


Are you questioning the legality of the following?:

int main()
{
int val;

void *pvoid = &val; /* This is perfectly okay */


char *pchar = static_cast<char*>(pvoid); /* May get corrupted? */


pvoid = pchar; /* May not be reliable? */



int *p = static_cast<int*>(pvoid);

*p = 7;
}


Maybe the Standard doesn't state in plain English that this is legal...
but in my mind, it doesn't have to.

We all know that a "void*" can store ANY address reliably.

As every object is made up of bytes, we can also assume that a "char*"
can store ANY address reliably.

If two pointer types can store ANY address reliably, then it makes sense
that you can convert back and forth.

There have been many times when contemplating C++ that I thought I had
thought of everything... but then someone points out to me something that
I've overlooked. I can see no reason why a "char*" could not store any
address reliably, but nonetheless, an extra explicit paragraph in the
Standard wouldn't hurt.

All logic and reasoning aside, you could fill a warehouse with code that
stores arbitrary memory addresses in a "char*" (my own code included), so
it just wouldn't be appropriate to make it illegal.

This situation of "everybody's doing it, so we better make it legal" has
propagated elsewhere. In C code, you'd commonly see the following to get
the address of the one-past-last element of an array:

int array[10];

int *p = &array[10];


The line immediately above gets turned into:

int *p = & *(array + 10);


As you can see, an invalid pointer gets dereferenced -- Undefined
Behaviour.

However, so many people do it in their code that the C Standard committee
decided (in C99) that an address-of operator followed immediately by a
dereference operator cancel each other out. Therefore the line of code becomes:

int *p = array + 10;


No more undefined behaviour. A good example of "bowing to what everybody
does".
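A minimal sketch of using the one-past-the-end pointer in its well-defined form: array + 10 is formed and compared against, but never dereferenced. The function name sum_first_ten is made up for illustration.

```cpp
#include <cassert>

// Fills and sums a 10-element array, using array + 10 only as a sentinel.
int sum_first_ten()
{
    int array[10];
    int* const end = array + 10;   // one past the last element; never dereferenced

    int next = 1;
    for (int* p = array; p != end; ++p)
        *p = next++;

    int total = 0;
    for (const int* p = array; p != end; ++p)
        total += *p;

    assert(end - array == 10);
    return total;   // 1 + 2 + ... + 10
}
```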

But at the end of the day, I like to look at things from the perspective
of:

No matter how complicated or advanced or fancy a programming language
becomes, it's still built on bits, bytes and CPU instructions. If the
smallest addressable memory unit in C++ is going to be a char, then we
should be able to store ANY legitimate memory address in a "char*".


-Tomás
 

Mark P

Tomás said:
Mark P posted:




Are you questioning the legality of the following?:

int main()
{
int val;

void *pvoid = &val; /* This is perfectly okay */


char *pchar = static_cast<char*>(pvoid); /* May get corrupted? */


pvoid = pchar; /* May not be reliable? */



int *p = static_cast<int*>(pvoid);

*p = 7;
}

The issue is not "legality" in the sense of a well-formed program. The
issue, again, is that the behavior may be unspecified. In your specific
example I believe that this is *not* the case, however, since your
conversion sequence is: int* -> void* -> char* -> void* -> int*. In
particular, your conversion sequence "unwinds" itself and retraces its
steps back to the original type. Thus it falls under the special case
in the section of the standard that I have already quoted 4 times (and
won't repeat again for fear of violating copyright restrictions :) ).

Suppose instead you had offered:

int main ()
{
int val = 0;

void* pv = &val;
char* pc = static_cast<char*>(pv);
int* pi = reinterpret_cast<int*>(pc); // static_cast cannot convert char* to int*

*pi = 7; // now, what is val?
}

However logical it may seem that val should be 7, the standard
nonetheless indicates that the value of val is unspecified.
Maybe the Standard doesn't state in plain English that this is legal...
but in my mind, it doesn't have to.

We all know that a "void*" can store ANY address reliably.

Only you know what you mean by reliably.
As every object is made up of bytes, we can also assume that a "char*"
can store ANY address reliably.
Ditto.


If two pointer types can store ANY address reliably, then it makes sense
that you can convert back and forth.

And where did you get the idea that the standard always makes sense? :)
There have been many times when contemplating C++ that I thought I had
thought of everything... but then someone points out to me something that
I've overlooked. I can see no reason why a "char*" could not store any
address reliably, but nonetheless, an extra explicit paragraph in the
Standard wouldn't hurt.

All logic and reasoning aside, you could fill a warehouse with code that
stores arbitrary memory addresses in a "char*" (my own code included), so
it just wouldn't be appropriate to make it illegal.

It's clearly not illegal. Depending how it's used it may be unspecified
(though it's hard to imagine that an implementation would go out of its
way to make this not work as one would assume).

-Mark
 

dan2online

Salt_Peter said:
I'll say it again since you haven't yet got the picture. Even with a
POD, that raw byte string will often include padding. Consider a complex
POD with components that don't fit nicely together, i.e. 2 chars and a
double.

How will the padding bytes affect the manipulation of raw byte string?
In many cases, we need to look inside the internal format of the raw
byte string.
The use of operator overloading is a far more powerful, efficient,
reusable, portable, maintainable, extensible and safe way to transfer
bits around. It's a win-win bargain and bug free - no pointers involved.
Technically, op<< and op>> are universal in that any type can be
streamed efficiently with all padding stripped away - guaranteed. Any
interface that can accept a std::ostream& or std::istream& will do to
swallow or send. And remember that the overloaded operator is not a
member function, the POD need not be a class.

It will depend on your application.
Imagine a complex PODA which is a member of a PODB type. There is no
need to write a new function to stream both PODB and its PODA member +
other members. The operator you wrote for PODA will do just fine, you
only need worry about PODB's immediate needs since PODA already knows
how to stream itself (it's an object, not just a bunch of bytes).

Again, if you need a container of POD elements and you require streaming
the entire container's contents, you already have an operator for the
elements. Regardless of whether the container is sequential or not and
irrespective of the padding constraints.

Programming the bit transfer becomes much, much easier, bullet proof and
with a lot less code.

It is true for most cases, but not universal.
 

Tomás

Mark P posted:
In your specific example I believe that this is *not* the case,
however, since your conversion sequence is: int* -> void* -> char* ->
void* -> int*.

The point of my code is that I go from T* to char* and then back to T*. I
draw an analogy with other types:

double a = 56.253; /* Here's our original value */

int b = a; /* We store it in a different type */

double c = b; /* Now we bring it back to the original type */

assert( a == c ); /* Will the value have been preserved? */


In the example immediately above, "information will be lost" when we go
from double to int. Even though we finally go back to double, the
"corruption" has already taken place. Now let's look at it with pointers:

int a;

int *pint = &a; /* Here's our original value */

void *pvoid = pint; /* We store it in a different type */

int *pint2 = static_cast<int*>(pvoid); /* We bring it back to the original type */

assert( pint2 == pint ); /* Will the value have been preserved? */


The above code snippet is guaranteed to work because you can store any
object's address in a "void*" and there won't be any "corruption".

As you quoted several times, the Standard also specifies that you can
reliably go from T1* to T2* without "corruption", but only if the
alignment requirements of T2 are no stricter. As "char" is the smallest
and most simple type we have in C++, it should have the least alignment
requirements (if not none). Therefore the conversion from any legitimate
pointer value to "char*" should go off without a hitch. Example:

int a;

int *pint = &a; /* Here's our original value */

char *pchar =
reinterpret_cast<char*>(pint); /* We store it in a different type */

int *pint2 =
reinterpret_cast<int*>(pchar); /* We bring it back to the original type */

assert( pint2 == pint ); /* Will the value have been preserved? */


The above should be perfectly okay.

You have gone on to say, notwithstanding any of the above, that the
conversion from "void*" to "char*" may be unspecified. However, if you
consider that a "void*" (assuming it contains a legitimate address) had
to start off as some other pointer value, you can see how there should be
no problem with going to "char*", given that the original pointer value
would have been able to go directly to "char*". That is to say, if the
following is possible:

T* to char*

Then the following should also be possible:

T* to void* to char*

In particular, your conversion sequence "unwinds"
itself and retraces its steps back to the original type.


But as I demonstrated with my "double" example, the "corruption" has
already taken place.

Suppose instead you had offered:

int main ()
{
int val = 0;

void* pv = &val;


"pv" should hold val's address without any corruption.

char* pc = static_cast<char*>(pv);


"pc" should hold the address stored in pv without any corruption.

int* pi = reinterpret_cast<int*>(pc);


Back to the original type. Shouldn't be any corruption.

*pi = 7; // now, what is val?


Should work perfectly.

However logical it may seem that val should be 7, the standard
nonetheless indicates that the value of val is unspecified.


I suppose we have to decide just how pedantic the Standard has to be.
Should it be enough for us to presume that it works (because there's
about ten voices in my head shouting "For God's sake it works!"), or
should we be thinking, "The Standard has to state it explicitly in plain
English"?

Only you know what you mean by reliably.


reliably = no corruption, the original value is preserved perfectly.


And where did you get the idea that the standard always makes sense?
:)


Sometimes that's the only hope we have.

It's clearly not illegal. Depending how it's used it may be
unspecified (though it's hard to imagine that an implementation would
go out of its way to make this not work as one would assume).


I would never think twice about any "dangers" of using "char*". I see it
as a "universal pointer type", just like "void*".


-Tomás
 

Mark P

Tomás said:
As you quoted several times, the Standard also specifies that you can
reliably go from T1* to T2* without "corruption", but only if the
alignment requirements of T2 are no stricter.

No! That is not what it says. Read it again if it's not clear. It
says that you can [reliably] go from T1* to T2* and back to T1* subject
to alignment constraints. The "and back" clause is not optional; the
standard only guarantees the result when both casts are performed. This
does *not* imply that you can, for example, go from T1* to T2* to T3* to
T1*, which was exactly the example of my previous post.
I suppose we have to decide just how pedantic the Standard has to be.
Should it be enough for us to presume that it works (because there's
about ten voices in my head shouting "For God's sake it works!"), or
should we be thinking, "The Standard has to state it explicitly in plain
English"?

Insufficient pedantry of the Standard is not the issue here. In fact I
would argue it's the opposite. The Standard makes a point of stating
that the result of these casts is unspecified. Had the Standard said
nothing I might agree with you that it's been left for sensible people
to infer the obvious, but if the Standard explicitly tells us that the
result is unspecified, then you really can't make a case that we're
meant to infer the *opposite*.

-Mark
 
