C style casting

Ramesh Tharma

Hi,

Does anyone know what's wrong with the following code? I was told that it
will compile and run, but that it will crash for some values.

Assume that the variables are initialized.

char* c;
long* lg;

c = (char*) lg;
lg = (long*) c;

Thanks,
Ramesh
 
S.Tobias

Ramesh Tharma said:
Does anyone know what's wrong with the following code? I was told that it
will compile and run, but that it will crash for some values.
Assume that the variables are initialized.

How?

char* c;
long* lg;
c = (char*) lg;
lg = (long*) c;

There's nothing generally wrong, except that in the last line
if `c' is not aligned for type `long', then UB is invoked and
anything may happen. Everything depends on where `c' points to,
and on the implementation's alignment requirements.
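
A minimal sketch of the alignment point, under stated assumptions: the helper names `is_aligned_for_long` and `read_long_from_bytes` are made up for illustration, `_Alignof` needs C11 or later, and the pointer-to-`uintptr_t` conversion is implementation-defined (on common flat-address platforms it yields the numeric address). The second helper shows one way to read a `long` from an arbitrary byte address without ever forming a misaligned `long *`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p looks suitably aligned for long.
   Assumes the uintptr_t value of a pointer is its numeric address, which
   the Standard does not promise but common platforms provide. */
static int is_aligned_for_long(const void *p)
{
    return (uintptr_t)p % _Alignof(long) == 0;
}

/* Hypothetical helper: copies sizeof(long) bytes starting at any address
   into a properly aligned long, so no (long *) cast of a possibly
   misaligned char pointer is needed. */
static long read_long_from_bytes(const unsigned char *bytes)
{
    long value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

A caller could test `is_aligned_for_long(c)` before converting `c` to `long *`, and fall back to `read_long_from_bytes` when the check fails.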
 
junky_fellow

S.Tobias said:
There's nothing generally wrong, except that in the last line
if `c' is not aligned for type `long', then UB is invoked and
anything may happen. Everything depends on where `c' points to,
and on the implementation's alignment requirements.

Also, dereferencing "c" may give different results on implementations
that have different endianness.
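
A small sketch of what that difference looks like in practice. The function name `lowest_byte_first` is invented for this example; it inspects the first byte of a `long` through `unsigned char`, which is always a valid way to look at an object's bytes.

```c
/* Hypothetical helper: returns 1 if the least significant byte of a long
   is stored at the lowest address (little-endian layout), 0 otherwise.
   If sizeof(long) is 1 there is only one byte, and the result is 1. */
static int lowest_byte_first(void)
{
    long value = 1;
    const unsigned char *bytes = (const unsigned char *)&value;
    return bytes[0] == 1;
}
```

The result depends on the implementation, which is exactly the point: the same dereference of a character pointer into a `long` yields different bytes on different machines.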
 
Steffen Buehler

Ramesh said:
Does anyone know what's wrong with the following code? I was told
that it will compile and run, but that it will crash for some values.

Assume that the variables are initialized.

char* c;
long* lg;

c = (char*) lg;
lg = (long*) c;

I remember there are systems which allow char pointers to point to any
kind of address but restrict long pointers to only even addresses.
(Wasn't good old Amiga one of these?) So if you force a long pointer to
an odd address it will lead to an exception.

Best regards
Steffen
 
Richard Bos

Steffen Buehler said:
I remember there are systems which allow char pointers to point to any
kind of address

They must; char * has the same representation and alignment as void *.

but restrict long pointers to only even addresses.
(Wasn't good old Amiga one of these?)

Such things are quite common, I gather. But if you stick with what's
defined by C, you'll never have to worry about it.

So if you force a long pointer to an odd address it will lead to an exception.

Mind you, the code above is portable. The conversion from long * to char
* is allowed; and the conversion back, given that the first one is
correct, must result in the same long * that was originally converted to
char *.
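
The round trip can be sketched as a tiny function. The name `round_trip_through_char` is made up for illustration; the body is just the two casts from the original question, applied to a pointer that really does point at a `long`.

```c
/* Hypothetical helper: converts a long * to char * and back again.
   Because the pointer started out as a valid long *, the result must
   compare equal to the original pointer. */
static long *round_trip_through_char(long *lg)
{
    char *c = (char *)lg;   /* first conversion from the question */
    return (long *)c;       /* second conversion, back to long * */
}
```

The guarantee only covers pointers that originally came from a `long *`; it says nothing about a `char *` that was obtained some other way.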

Richard
 
bahadir.balban

The first conversion, from long * to char *, is just fine. However, there are
two problems with the second conversion.

1) If the char pointer is not word aligned, i.e. on a 32-bit machine its
address is not a multiple of four, then dereferencing the resulting long
pointer is an unaligned access. If your platform does not support
unaligned access, it's a problem.

2) You're accessing more (4 bytes) than what the char defines (1
byte). So, although it is unlikely, the 3 extra bytes you access may not
be accessible at all.

Thanks,
Bahadir
 
S.Tobias

Also, dereferencing "c" may give different results on implementations
that have different endianness.

The question was only about pointer conversions, and only that is what
I gave my answer to. Dereferencing above pointers (which was not
asked about) is another issue, and byte-sex is not the biggest headache.
For example, plain `char' might be signed and have a trap representation;
dereferencing `c' in such situation might cause UB, too. (It is valid
to access any object as `unsigned char' though.)
 
Robert Gamble

S.Tobias said:
There's nothing generally wrong, except that in the last line
if `c' is not aligned for type `long', then UB is invoked and
anything may happen. Everything depends on where `c' points to,
and on the implementation's alignment requirements.

Actually, it is guaranteed that you can safely convert a pointer to
char* and back again, lg will have the same value as it did before the
conversions. If the original value was properly aligned, it will be
after the conversions as well.

Robert Gamble
 
S.Tobias

Actually, it is guaranteed that you can safely convert a pointer to
char* and back again, lg will have the same value as it did before the
conversions. If the original value was properly aligned, it will be
after the conversions as well.

Yes, exactly so. But since the whole thing was not compilable,
I assumed those lines were unrelated snippets. I agree, if they
were - as written - part of a block, there would be absolutely
nothing wrong with them.
 
Peter Nilsson

S.Tobias said:
For example, plain `char' might be signed and have a trap representation;
dereferencing `c' in such situation might cause UB, too. (It is valid
to access any object as `unsigned char' though.)

6.2.6.1p5 leaves character types exempt from trap representations.
There are potentially signed and plain character representations for
which the value is merely unspecified.
 
Grumble

Who are you replying to? Why did you snip the original question?

The first conversion, from long * to char *, is just fine. However, there are
two problems with the second conversion.

1) If the char pointer is not word aligned, i.e. on a 32-bit machine its
address is not a multiple of four, then dereferencing the resulting long
pointer is an unaligned access. If your platform does not support
unaligned access, it's a problem.

2) You're accessing more (4 bytes) than what the char defines (1
byte). So, although it is unlikely, the 3 extra bytes you access may not
be accessible at all.

I have several platforms where the numbers you give are incorrect.

On the first platform, long is 64-bits wide.
On the second platform, long is 1 byte.
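
Rather than hard-coding 4 bytes or 32 bits, the sizes can be asked of the implementation. A minimal sketch, with invented function names and assuming C11 for `_Alignof`:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers that report what this implementation uses,
   instead of assuming a 4-byte, 4-aligned long. */

/* Size of long in bytes (a byte being CHAR_BIT bits, not necessarily 8). */
static size_t long_size_in_bytes(void)
{
    return sizeof(long);
}

/* Number of bits of storage in a long, padding bits included. */
static size_t long_storage_bits(void)
{
    return sizeof(long) * CHAR_BIT;
}

/* Alignment requirement of long, in bytes. */
static size_t long_alignment(void)
{
    return _Alignof(long);
}
```

On a platform where `long` is one byte, `long_size_in_bytes()` returns 1 and `CHAR_BIT` is at least 32, since `long` must be able to hold at least 32 bits.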
 
S.Tobias

6.2.6.1p5 leaves character types exempt from trap representations.
There are potentially signed and plain character representations for
which the value is merely unspecified.

I'm confused by that part. Now when I read it again it seems you're
right. I'd welcome others' comments on this, too.

I had a short discussion on this issue in c.s.c, here're excerpts
from my "posted" file:

# > On 28 Nov 2004 01:49:57 GMT, "S.Tobias"
#
# > > Some people seem to believe that access of non-character objects with
# > > a character type other than `unsigned char' automatically invokes UB.
# > No, not automatically. Only if signed char, and plain char if signed,
# > have trap representations and a byte accessed via one of these lvalues
# > contains such a trap representation.

[...]

# > No, it does not. The key wording is in paragraph 5 of 6.2.6.1, where
# > the term 'trap representation' is defined. Here are the first two
# > sentences:
# [snip]
# > Now here is what I was told when I raised the issue before. The fact
# > that access with a non-character type is specifically undefined does
# > not guarantee that no accesses with a character type (other than
# > unsigned char) might not also cause undefined behavior.
 
Jack Klein

6.2.6.1p5 leaves character types exempt from trap representations.

No, strangely enough, it does not. Even though it appears to.

There are potentially signed and plain character representations for
which the value is merely unspecified.

Read 6.2.6.2 p5. Any signed integer type may have a trap
representation even if it contains only sign and value bits. All bits
are off if signed char contains padding bits, which 6.2.6.2 p2
SPECIFICALLY ALLOWS.
 
Jack Klein

6.2.6.1p5 leaves character types exempt from trap representations.
There are potentially signed and plain character representations for
which the value is merely unspecified.

I'm confused by that part. Now when I read it again it seems you're
right. I'd welcome others' comments on this, too.

I had a short discussion on this issue in c.s.c, here're excerpts
from my "posted" file:

# > On 28 Nov 2004 01:49:57 GMT, "S.Tobias"
#
# > > Some people seem to believe that access of non-character objects with
# > > a character type other than `unsigned char' automatically invokes UB.
# > No, not automatically. Only if signed char, and plain char if signed,
# > have trap representations and a byte accessed via one of these lvalues
# > contains such a trap representation.

[...]

# > No, it does not. The key wording is in paragraph 5 of 6.2.6.1, where
# > the term 'trap representation' is defined. Here are the first two
# > sentences:
# [snip]
# > Now here is what I was told when I raised the issue before. The fact
# > that access with a non-character type is specifically undefined does
# > not guarantee that no accesses with a character type (other than
# > unsigned char) might not also cause undefined behavior.

Here is what I just posted as a reply to Peter:
6.2.6.1p5 leaves character types exempt from trap representations.

No, strangely enough, it does not. Even though it appears to.

There are potentially signed and plain character representations for
which the value is merely unspecified.

Read 6.2.6.2 p5. Any signed integer type may have a trap
representation even if it contains only sign and value bits. All bits
are off if signed char contains padding bits, which 6.2.6.2 p2
SPECIFICALLY ALLOWS.

I did raise this issue in comp.std.c, a long, long time ago. Not only
6.2.6.1 p5, but several other places the standard uses phrases like "a
character type" too loosely. In C89/90, there was no mention of 'trap
representations', or indeed of any possible problems with any integer
operations other than overflow or underflow of the signed types, or
division by 0.

Although, at least one member of the committee said that such things
were allowed to exist under the earlier versions of the standard, even
though they weren't mentioned.

And the reply I received was essentially what I posted and you quoted
above, and I will copy and paste here again:
# > Now here is what I was told when I raised the issue before. The fact
# > that access with a non-character type is specifically undefined does
# > not guarantee that no accesses with a character type (other than
# > unsigned char) might not also cause undefined behavior.

If you disbelieve this, see if you can cite a reference anywhere in
the standard that states specifically that this is NOT undefined.

Simple logic does indeed show that:

all but X is Y

....does not prove that:

X is NOT Y

If a byte in memory contains what is, for a given implementation, a
trap representation for signed char, and if that byte is read by an
lvalue of signed character type, or plain character type if plain char
is signed, then the behavior is undefined.

Should the wording of the standard in the several places where it uses
the phrase "a character type", when a signed character type might
invoke UB, be changed? I thought so.

BTW, our friends down the hall went us one better on this. The ISO
C++ standard specifically disallows padding bits in signed char, yet
ISO C allows them.
 
Peter Nilsson

Jack said:
No, strangely enough, it does not. Even though it appears to.

Okay, character types have trap representations; however, those
representations are exempt from invoking undefined behaviour.

Read 6.2.6.2 p5.

"The values of any padding bits are unspecified.45) A valid (non-trap)
object representation of a signed integer type where the sign bit is
zero is a valid object representation of the corresponding unsigned
type, and shall represent the same value."

Any signed integer type may have a trap representation even if it
contains only sign and value bits.

But it is 6.2.6.1p5 which specifies the consequences of accessing
a trap representation, and character lvalues are left _off_ the
undefined behaviour list.

All bits are off if signed char contains padding bits, which 6.2.6.2p2
SPECIFICALLY ALLOWS.

Do you mean to imply all (padding?) bits are zero (off?), or did you
make a typo in saying all 'bets' are off?!

If the former, there's no guarantee of that. Simply consider any
arbitrary object byte. Even on an 8-bit implementation, signed char
may have the range -127..127. But if I read a byte with the value
128 on such an implementation, the behaviour is not undefined.
The only thing you can say is the value is unspecified.

In other words, an unsigned char value of 128 may be a trap
representation
as a signed char, however the implementation must produce _some_ value
for that representation as a signed char.
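
Whether `signed char` has any spare bit patterns at all on a given implementation can be checked from `<limits.h>`. The sketch below uses invented function names and a simple counting argument: if `signed char` has no padding bits and its minimum is `-SCHAR_MAX - 1`, then it has exactly as many values as a byte has bit patterns, so no pattern is left over to be a trap representation. It says nothing about what the Standard requires when a pattern is left over, which is the question under discussion here.

```c
#include <limits.h>

/* Hypothetical helper: 1 if signed char has as many sign and value bits
   as unsigned char has value bits, i.e. no padding bits. */
static int signed_char_has_no_padding(void)
{
    return (unsigned)UCHAR_MAX == 2u * (unsigned)SCHAR_MAX + 1u;
}

/* Hypothetical helper: 1 if the most negative value is one below
   -SCHAR_MAX, as with two's complement, so the "sign bit set, all value
   bits zero" pattern is an ordinary value. */
static int signed_char_min_is_extended(void)
{
    return SCHAR_MIN == -SCHAR_MAX - 1;
}

/* 1 if every possible byte content is a valid signed char value here. */
static int every_byte_is_a_signed_char_value(void)
{
    return signed_char_has_no_padding() && signed_char_min_is_extended();
}
```

On the -127..127 implementation described above, `signed_char_min_is_extended()` would return 0, and the byte value 128 would be the one pattern without a `signed char` value.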
 
Lawrence Kirby

No, strangely enough, it does not. Even though it appears to.

Right, it effectively says that trap representations in character types
don't cause undefined behaviour.

Lawrence
 
S.Tobias

/Trap representation/ is an object representation that does not represent
any value of the object's type.

Okay, character types have trap representations; however, those
representations are exempt from invoking undefined behaviour.

That's my understanding too, after carefully reading that paragraph.
It seems to say that accessing an object itself through a character
type, does not produce UB.

(Note: the Std talks about two kinds of values: object value (which is
actually its byte representation), and a value of a given type.)

Hmm... if an object does not have a value (has a trap representation),
then its value cannot be merely unspecified. The object simply
does not have a value, period.


[
Do you mean to imply all (padding?) bits are zero (off?), or did you
make
a typo in saying all 'bets' are off?!

I think he meant "bets".
]


Jack said:
Although, at least one member of the committee said that such things
were allowed to exist under the earlier versions of the standard, even
though they weren't mentioned.

I agree. The current Std didn't even have to define "trap representation";
just mentioning that an object may not have a value is enough.

And the reply I received was essentially what I posted and you quoted
above, and I will copy and paste here again:

I don't quite agree with the word "access".

If you disbelieve this, see if you can cite a reference anywhere in
the standard that states specifically that this is NOT undefined.

I didn't bother to look, but I'm sure I wouldn't find anything.
The Std is mostly expressed in terms of values, i.e. it assumes
that an operand (argument, whatever...) has a value; representation
is somewhat a secondary idea (it comes down to the fact that objects
can be accessed through unsigned char).

Simple logic does indeed show that:

all but X is Y

...does not prove that:

X is NOT Y

Yes. But if I said: "The people living in the far North, hunting
fish and seals, are called Eskimos", this does not technically
mean that Australians aren't called that, too. However, I'm
sure you won't find any native Australian Eskimos hunting
kangaroos near Sydney.

We're dealing with human language here, not Mathematics. If the Std
explicitly excluded some types from a behaviour, presumably it intended
that the behaviour does not engage those types.

If a byte in memory contains what is, for a given implementation, a
trap representation for signed char, and if that byte is read by an
lvalue of signed character type, or plain character type if plain char
is signed, then the behavior is undefined.

Should the wording of the standard in the several places where it uses
the phrase "a character type", when a signed character type might
invoke UB, be changed? I thought so.

I think those words should be removed, or replaced with "unsigned char
type" where relevant. I think something along these lines happened
in TC2 wrt memcpy() and friends.

+++

Now I'll try to take Jack's side.

Although accessing trap representation of a signed char does not
seem to raise UB by itself, I think UB will be invoked anyway
at some point.

For an example, let's take the simplest, primary expression:
signed char c; /* assume trap representation */
c;
The expression "is converted to the value stored in the designated object"
(6.3.2.1p2). Since `c' does not have a value (presumably meaning "value of
a given type"), the behaviour is undefined because the Std fails
to define such situation.

I think this explanation should be valid for "++" and "--" operators
as well ("sizeof" and "&" obviously aren't problematic), and extensible
to arrays and other expressions.

+++

I have one more question: in the paragraph under discussion 6.2.6.1p5
it says trap representation can be produced by "a side effect that
modifies [...] the object by an lvalue expression that does
not have character type". So, for an example, how can you produce
a trap representation (in a valid way) in a `long' object with
a `long' lvalue?
 
pete

S.Tobias wrote:
Now I'll try to take Jack's side.

Although accessing trap representation of a signed char does not
seem to raise UB by itself, I think UB will be invoked anyway
at some point.

For an example, let's take the simplest, primary expression:
signed char c; /* assume trap representation */
c;
The expression
"is converted to the value stored in the designated object"
(6.3.2.1p2).
Since `c' does not have a value (presumably meaning "value of
a given type"), the behaviour is undefined because the Std fails
to define such situation.

That's what I think.

I think this explanation should be valid for "++" and "--" operators
as well ("sizeof" and "&" obviously aren't problematic),
and extensible to arrays and other expressions.

+++

I have one more question: in the paragraph under discussion 6.2.6.1p5
it says trap representation can be produced by "a side effect that
modifies [...] the object by an lvalue expression that does
not have character type". So, for an example, how can you produce
a trap representation (in a valid way) in a `long' object with
a `long' lvalue?

This could do it on a system that traps negative zero.

long a = rand();

a = a ^ -a;
 
Tim Rentsch

S.Tobias said:
/Trap representation/ is an object representation that does not represent
any value of the object's type.


That's my understanding too, after carefully reading that paragraph.
It seems to say that accessing an object itself through a character
type, does not produce UB.

IMO this conclusion isn't quite right... (read on)

Jack said:
Although, at least one member of the committee said that such things
were allowed to exist under the earlier versions of the standard, even
though they weren't mentioned.

I agree. The current Std didn't even have to define "trap representation";
just mentioning that an object may not have a value is enough.
And the reply I received was essentially what I posted and you quoted
above, and I will copy and paste here again:

[snip]

We're dealing with human language here, not Mathematics. If the Std
explicitly excluded some types from a behaviour, presumably it intended
that the behaviour does not engage those types.

This comment is a misreading. The statements in 6.2.6.1 p5 require
something of all types that aren't character types; the character
types aren't stated as excluded from the requirement, they just aren't
included in the statement.

Now I'll try to take Jack's side.

Although accessing trap representation of a signed char does not
seem to raise UB by itself, I think UB will be invoked anyway
at some point.

Here is my reading. See if this strikes your fancy:

1. Accessing an object with some type not a character type, where the
object holds a trap representation of the type of the object, must
produce undefined behavior.

2. Accessing an object with some type not a character type, where the
object holds a trap representation of the type of the object, but
the access is done through a signed character type or plain character
type if plain char has the same representation as signed char, might
or might not produce undefined behavior, depending on whether the
byte(s) accessed are trap representation for the signed character
type.

3. Accessing an object with some type that is a character type, where
the object holds a trap representation for the type of signed char,
and where the access is done through a signed character type or plain
character type if plain char has the same representation as signed
char, does produce undefined behavior.

4. Accessing an object of any type with any representation whether
trap representation or not, with the access being done through
an unsigned character type, is always defined behavior because
there are never trap representations for unsigned character.

I believe that's the reading most consistent with everything else
said about trap representations (at least that I've found).

I have one more question: in the paragraph under discussion 6.2.6.1p5
it says trap representation can be produced by "a side effect that
modifies [...] the object by an lvalue expression that does
not have character type". So, for an example, how can you produce
a trap representation (in a valid way) in a `long' object with
a `long' lvalue?

Three ways for trap representations to come into existence:

1. Uninitialized variables;

2. Trap representations for other types can be stored byte-by-byte
using 'unsigned char' access; and

3. Trap representations can be generated by exceptional conditions.
See notes 44 and 45, and section 6.5 p5. Notes 44 and 45 are
interesting, because they say "no arithmetic operation on valid
values can generate a trap representation other than as part of
an exceptional condition such as an overflow". Since exceptional
conditions *already* mean undefined behavior, saying they can
produce trap representations which can cause further undefined
behavior really isn't much cause for concern.
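
The mechanism in item 2 can be sketched as follows. The function name `copy_long_bytes` is invented for this example, and the sketch only ever copies a valid `long` representation; it is meant to show the byte-wise `unsigned char` access, not to manufacture a trap representation.

```c
#include <stddef.h>

/* Hypothetical helper: copies the object representation of *src into
   *dst one byte at a time through unsigned char lvalues. Whatever bit
   pattern *src holds ends up in *dst, without any long-typed access. */
static void copy_long_bytes(long *dst, const long *src)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    for (size_t i = 0; i < sizeof *dst; i++)
        d[i] = s[i];
}
```

If the source bytes came from somewhere other than a valid `long` (a file, a network buffer, another type), the same loop would store them just as readily, which is how a trap representation could arrive in a `long` object on an implementation that has one.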

Presumably trap representations could also be produced by, e.g.,
computing a (legal) value that's an 'unsigned long' and casting it to
'long'. The cast could be done indirectly, eg, casting the address of
the 'unsigned long' variable to '(long *)' and then dereferencing. In
either case, producing the value already required undefined behavior.
I believe there's no way to defined-ly produce a trap representation
for a 'long' object other than (1) or (2) above. (The wording in
6.2.6.1 p5 doesn't say that the operation that caused the store was a
defined operation.)

I guess it's also possible that only part of the object in question
could be modified, making the object as a whole a trap representation.
How this might happen without undefined behavior having already
happened I can't say....
 
S.Tobias

IMO this conclusion isn't quite right... (read on)
Jack said:
Although, at least one member of the committee said that such things
were allowed to exist under the earlier versions of the standard, even
though they weren't mentioned.

I agree. The current Std didn't even have to define "trap representation";
just mentioning that an object may not have a value is enough.
And the reply I received was essentially what I posted and you quoted
above, and I will copy and paste here again:

# > Now here is what I was told when I raised the issue before. The fact
# > that access with a non-character type is specifically undefined does
# > not guarantee that no accesses with a character type (other than
# > unsigned char) might not also cause undefined behavior.

[snip]

We're dealing with human language here, not Mathematics. If the Std
explicitly excluded some types from a behaviour, presumably it intended
that the behaviour does not engage those types.

This comment is a misreading. The statements in 6.2.6.1 p5 require
something of all types that aren't character types; the character
types aren't stated as excluded from the requirement, they just aren't
included in the statement.

Yes, and that was Jack Klein's POV, too. His argument was that even if
they were not included, it didn't mean they couldn't cause UB.
My argument was that if the Standard made some steps to "un-include"
them, then it probably meant to exclude them. Otherwise why does
the Standard mention character types at all?

Here is my reading. See if this strikes your fancy:

[ I wasn't sure what you meant by "Accessing an object with some type",
whether it was "object with some type" (ie. having some type), or
"accessing with some type". From the context it seems to follow that
it is the former, so I'll understand "an object having some type"
in each case. ]

1. Accessing an object with some type not a character type, where the
object holds a trap representation of the type of the object, must
produce undefined behavior.

You can access the object with a different type. A `long' object
may have a trap representation (for `long' type), but you could access
it with an `unsigned long' lvalue (which on the implementation doesn't
have a trap representation).

2. Accessing an object with some type not a character type, where the
object holds a trap representation of the type of the object, but
the access is done through a signed character type or plain character
type if plain char has the same representation as signed char, might
or might not produce undefined behavior, depending on whether the
byte(s) accessed are trap representation for the signed character
type.

3. Accessing an object with some type that is a character type, where
the object holds a trap representation for the type of signed char,
and where the access is done through a signed character type or plain
character type if plain char has the same representation as signed
char, does produce undefined behavior.

4. Accessing an object of any type with any representation whether
trap representation or not, with the access being done through
an unsigned character type, is always defined behavior because
there are never trap representations for unsigned character.

I believe that's the reading most consistent with everything else
said about trap representations (at least that I've found).

I don't quite understand the point of splitting the rule
into four cases, or more (e.g. I don't see a case where an object
that has a non-character type and holds a valid representation
for its type is accessed with `signed char' type, which may have
a trap representation; but now I'm not sure whether you intended to include it).

How is it different from the following:
1. All that matters is the type of the lvalue an object is accessed with.
If an object's representation is a trap representation for that
type (regardless of the object type), the behaviour is undefined.
2. `unsigned char' type does not have a trap representation.

(Above I actually assumed that `signed char' may cause UB
(to fit your description), which is under discussion here.)

I have one more question: in the paragraph under discussion 6.2.6.1p5
it says trap representation can be produced by "a side effect that
modifies [...] the object by an lvalue expression that does
not have character type". So, for an example, how can you produce
a trap representation (in a valid way) in a `long' object with
a `long' lvalue?
[snip]
Presumably also trap representations could also be produced by, eg,
computing a (legal) value that's an 'unsigned long' and casting it to
'long'. The cast could be done indirectly, eg, casting the address of
the 'unsigned long' variable to '(long *)' and then dereferencing. In
either case, producing the value already required undefined behavior.

I'm thinking of something similar, but inverse:

long l;
unsigned long *p = (void*)&l;
*p = some_value;

I think this is conforming (we may access `l' with `unsigned long'
lvalue), and it seems to fit the description. If `long' has a trap
representation, then such a write might generate it (no other UB
is invoked here).
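
A runnable version of that snippet, as a sketch only. The function name `store_through_unsigned_alias` is made up, and the portable-result claim in the comment assumes the stored value does not exceed LONG_MAX and that `long` has no padding bits, as on common implementations.

```c
/* Hypothetical helper: writes v into a long object through an
   unsigned long lvalue, then reads the object back as long.
   The store itself is an allowed access (unsigned type corresponding
   to the object's type). Reading the result as long is only safe if
   the stored pattern is a valid long; for v <= LONG_MAX on an
   implementation without padding bits it is the same value. */
static long store_through_unsigned_alias(unsigned long v)
{
    long l = 0;
    unsigned long *p = (void *)&l;

    *p = v;
    return l;
}
```

Calling it with a value above LONG_MAX is where an implementation with a trap representation for `long` could get into trouble on the final read.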

I think the meaning of this is that a compiler is allowed to
pre-fetch the value of `l' at any time after the store operation (for
optimization reasons), and the programmer is responsible not to put
anything wrong in it ("pre-fetch" means the whole operation might be
performed entirely in registers, even before physically storing
the results in memory). If the same value was stored via a character
type (memcpy), then the compiler could not do such an optimization.
 
