Casts on lvalues

Willem · Dec 5, 2012

BartC wrote:
) The offsets come from externally generated data, and initially were simple
) counts: +2, -1 etc. The code looked like this:
)
) R *p,*q;
)
) p=q+n;
)
) Very nice. Then I noticed this addition involved internally multiplying n by
) 16 (or a shift as it was), which wasn't so nice! It was just as easy to
) generate these numbers as multiples of 16 anyway (+32, -16 etc) so I did
) that. But then the code wasn't so pretty.

That's quite odd. A good compiler should have done that optimization for
you, if it's at all possible. Also, on x86 CPUs, there are addressing
modes that implicitly multiply by factors of two, with no speed penalty.

And, obviously, this is pretty much a micro optimization. Are you that
concerned with execution speed? Is this for some embedded cpu?

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Ian Collins · Dec 5, 2012

BartC said:
My original example was
more like this:

int* P;

++(char*)P;

ie. treat P as a char* pointer (so that on my machine, the value in P
increments by 1 instead of 4).

Which may well result in P not being a valid pointer to an int, ending
the world when it is dereferenced. Such a construct is like giving a 3
year old a box of matches. C does have plenty of matches on offer, but
they are normally in child proof boxes.

If you want a pointer to walk through some data built from an arbitrary
set of types, use a char*.
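
A minimal sketch of that advice, under stated assumptions: the packed record layout (a 1-byte tag followed by a 4-byte value) and the name sum_values are invented here for illustration and do not come from the thread. The byte pointer does the walking, and memcpy() does the reads, so no misaligned int pointer is ever created.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, assumed for illustration only:
   a 1-byte tag followed by a 4-byte value, with no padding. */
static_assert(sizeof(int32_t) == 4, "int32_t assumed to be 4 bytes");
enum { TAG_SIZE = 1, VALUE_SIZE = sizeof(int32_t) };

/* Sum the values of `count` packed records starting at `buf`. */
static int32_t sum_values(const unsigned char *buf, size_t count)
{
    const unsigned char *p = buf;  /* byte pointer: p + 1 moves one byte */
    int32_t total = 0;

    for (size_t i = 0; i < count; i++) {
        int32_t v;
        p += TAG_SIZE;             /* skip the tag */
        memcpy(&v, p, sizeof v);   /* p may be misaligned for int32_t */
        total += v;
        p += VALUE_SIZE;           /* step to the next record */
    }
    return total;
}
```

The value is copied out rather than read through a cast pointer, so the code does not depend on the buffer's alignment.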

BartC · Dec 5, 2012

Willem said:
BartC wrote:
) The offsets come from externally generated data, and initially were simple
) counts: +2, -1 etc. The code looked like this:
)
) R *p,*q;
)
) p=q+n;
)
) Very nice. Then I noticed this addition involved internally multiplying n by
) 16 (or a shift as it was), which wasn't so nice! It was just as easy to
) generate these numbers as multiples of 16 anyway (+32, -16 etc) so I did
) that. But then the code wasn't so pretty.

That's quite odd. A good compiler should have done that optimization for
you, if it's at all possible. Also, on x86 CPUs, there are addressing
modes that implicitly multiply by factors of two, with no speed penalty.

How can it be optimised? If the pointer has a 16-byte stride, and you're
adding an offset N, then N*16 must be added to the pointer. The x86's
address scaling feature only goes up to *8. This *16 is implemented with a
shift.

And, obviously, this is pretty much a micro optimization. Are you that
concerned with execution speed? Is this for some embedded cpu?

As it happened, it made very little difference (about 0.4% better), even
though this unnecessary operation was executed some 80 million times every
second. But sometimes you are just making dozens of tiny improvements which
together make a worthwhile difference. (Sometimes, they inexplicably make
things slower too.)

But I just don't like the idea of my code (and the instruction cache)
getting cluttered up with things that don't need to be there. I doubt they
will make it go faster!

(And yes I am hoping to get this working on a slow-running board computer --
one day.)
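
The scaling under discussion can be sketched as follows. The struct used for R is a stand-in (the thread never shows its definition, only that the stride is 16 bytes), and the helper names add_elements and add_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the thread's type R; the 16-byte size is an assumption. */
typedef struct { unsigned char bytes[16]; } R;
static_assert(sizeof(R) == 16, "R assumed to have a 16-byte stride");

/* Element arithmetic: the compiler scales n by sizeof(R). */
static R *add_elements(R *q, ptrdiff_t n)
{
    return q + n;
}

/* Byte arithmetic: byte_off is assumed to be a multiple of sizeof(R)
   already, so no scaling happens here. */
static R *add_bytes(R *q, ptrdiff_t byte_off)
{
    return (R *)((char *)q + byte_off);
}
```

Both helpers yield the same pointer when byte_off equals n * sizeof(R). The second form is only valid when the result still points at a properly aligned R inside the same array.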

BartC · Dec 5, 2012

Ian Collins said:
Which may well result in P not being a valid pointer to an int, ending the
world when it is dereferenced. Such a construct is like giving a 3 year
old a box of matches. C does have plenty of matches on offer, but they
are normally in child proof boxes.

But, I can do the equivalent of ++(char*)P by writing:

P = (int*)((char*)P+1);

(or near enough if that's not quite right.) So the world can still end.

Keith Thompson · Dec 6, 2012

BartC said:
Isn't that the point of having casts?

No. The point of a cast is to explicitly convert a *value* of some
type to a value of some other specified type.

There are several ways in C to treat an object of some type as if it
were an object of some other type. A simple cast is not one of them.

Depending on what you're trying to do, you can take the object's
address, cast it to some other pointer type, and then dereference the
result. You do so at your own risk; it can fail badly if the object
doesn't have the proper alignment for the target type, or if the target
type is larger than the object, or if you assign an invalid value (such
as a trap representation) to the object.

Or you can use a union; this avoids alignment issues, but the other
problems still apply.

Or you can use memcpy() to create a *copy* of an object's representation
in an object of a different type.
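
A minimal sketch of the memcpy() approach, assuming a 32-bit float (checked at compile time). The function names float_bits and bits_float are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float assumed to be 32 bits");

/* Copy the representation of a float into an integer of the same size. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The reverse direction: build a float from a bit pattern. */
static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Because only bytes are copied, alignment and aliasing are not a concern. What the bits mean still depends on the platform's float format; on an IEEE 754 implementation float_bits(1.0f) gives 0x3f800000.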

On a platform that you know inside-out, and where the alternative is to do
exactly the same but using assembly code with all of its disadvantages,
then there is plenty of value in doing it.

There are *plenty* of ways to do what you want to do without
resorting to assembly code. It's true that they're not as
syntactically "clean" as a hypothetical lvalue cast. Personally,
I think that's a good thing; type-punning is dangerous enough that
it shouldn't be done casually. If you think it should be easier
to do, that's fine. I don't think *anyone* is 100% happy with the
way C is defined.

I suppose you could define a macro to make it easier:

#include <stdio.h>

#define PUN(object, type) (*(type*)&(object))

int main(void) {
    unsigned u;
    PUN(u, float) = 1.0/3.0;
    printf("u (as float) = %g\n", PUN(u, float));
    printf("u = 0x%x\n", u);
    return 0;
}

[...]

Keith Thompson · Dec 6, 2012

BartC said:
OK, I see it. But: the (T)A=B example might be done instead as:

memcpy(&A, &B, sizeof(T));

So it expresses something that could conceivably make sense. Unlike the
other examples that don't!

Ok, you're right, lvalue casts could conceivably make sense.

C doesn't have lvalue casts.

Are there any unanswered questions remaining?

glen herrmannsfeldt · Dec 6, 2012

OK, I get that now. But that's only because the Book says so.
(snip)

It's not quite that either, because this might involve unwanted int/float
conversions.
(snip)

It means: "pretend that A is a variable of type T for this assignment".
(snip)
Well in this case it wouldn't do anything too useful! But turning it around
a little:

double a;

(int)a = 3142;

This just writes the bit-pattern for integer 3142 in (on my machine) the
bottom half of a. But that, I can currently do in C using instead:

OK, but now that C has complex variables:

In PL/I, you can use the functions REAL and IMAG to get the
real and imaginary parts of a complex expression, and COMPLEX to
get a complex value from two real expressions.

PL/I uses functions where C would use casts, so to get the fixed
point value from a floating point expression you use the function
FIXED.

In addition to the functions REAL, IMAG, and COMPLEX, there are also
pseudo-variables, which allow for statements like:

REAL(Z)=3;

which assigns 3 to the real part of Z, leaving the imaginary part
unchanged. You can even do:

DCL Z FIXED BIN(31,0) COMPLEX;
Z=0;
DO IMAG(Z)=1 TO 100 BY 3;
PUT SKIP LIST(Z,SQRT(Z));
END;

in which case IMAG is both a function and pseudo-variable.
(It also works with FLOAT variables.)

Other pseudo-variables are SUBSTR and UNSPEC.

-- glen

BartC · Dec 6, 2012

Ben Bacarisse said:
No, I don't think so. Certainly not directly -- a cast expression is
not an lvalue. You can do dangerous things like:

*(byte **)&p += n; /* don't do this!! */

or use a union with an R * and a char * pointer in it, but both
techniques rely on the representation of char and struct pointers being
the same -- they re-interpret the pointer rather than converting it.

I saw your comment and didn't look much further (I thought the dangers were
something to do with aliasing). Yet this seems exactly what I was looking
for.

So if someone wants an lvalue cast of the form (T)A, they can just do
*(T*)&A
instead. (Provided they know that, for example, pointer representations
happen to be compatible.)

So this statement:

C doesn't have lvalue casts.

isn't completely true, because it seems you can get around it easily by
turning it into an rvalue cast first. (Also I'm talking about type-punning
sorts of casts rather than type-conversion ones.)

Ben Bacarisse · Dec 6, 2012

BartC said:
I saw your comment and didn't look much further (I thought the dangers were
something to do with aliasing). Yet this seems exactly what I was looking
for.

So if someone wants an lvalue cast of the form (T)A, they can just do
*(T*)&A instead.

That depends on the meaning you give to "lvalue cast". Since C does not
have such a thing, you have to specify it, but your original post used
code that had different semantics. That's why, in part, I said "don't
do this" -- because it does not have the same meaning as the code you
presented.

(Provided they know that, for example, pointer representations
happen to be compatible.)

It seems odd to assume this when you don't need to. Your original post
complained only about the fact that your solution was rather wordy (or
messy -- I don't recall exactly), so I suggested an inline function to
tidy it up. Why would you exchange a universal solution for one with a
restriction?

So this statement:

C doesn't have lvalue casts.

isn't completely true, because it seems you can get around it easily by
turning it into an rvalue cast first.

It's still completely true. C does not have pass by reference either,
and any technique used to get round that restriction does not alter that
fact.

(Also I'm talking about type-punning
sorts of casts rather than type-conversion ones.)

OK, but how could anyone have known? The code you originally posted
used entirely portable type-conversions.

Willem · Dec 6, 2012

BartC wrote:
)> BartC wrote:
)> ) The offsets come from externally generated data, and initially were simple
)> ) counts: +2, -1 etc. The code looked like this:
)> )
)> ) R *p,*q;
)> )
)> ) p=q+n;
)> )
)> ) Very nice. Then I noticed this addition involved internally multiplying n by
)> ) 16 (or a shift as it was), which wasn't so nice! It was just as easy to
)> ) generate these numbers as multiples of 16 anyway (+32, -16 etc) so I did
)> ) that. But then the code wasn't so pretty.
)>
)> That's quite odd. A good compiler should have done that optimization for
)> you, if it's at all possible. Also, on x86 CPUs, there are addressing
)> modes that implicitly multiply by factors of two, with no speed penalty.
)
) How can it be optimised? If the pointer has a 16-byte stride, and you're
) adding an offset N, then N*16 must be added to the pointer.

I was under the impression that you were generating the offsets somewhere.
If you can pre-multiply those by 16, so can the compiler, one would think.

Take, for example:

int step = some_function();
struct struct_16_bytes_long *p = start;
while (p->marker != 0) {
    frobnicate(p->data);
    p += step;
}

In this case, the optimizer should pull the multiply-by-16 out of the loop,
shouldn't it?
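
For comparison, here is a sketch of the same loop with the multiply hoisted by hand. The struct rec layout and the name sum_data are assumptions made for illustration; whether a particular compiler performs this transformation on its own is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record standing in for struct_16_bytes_long. */
struct rec {
    int marker;   /* 0 terminates the walk */
    int data;
    int pad[2];
};
static_assert(sizeof(struct rec) == 16, "assumes 4-byte int, 16-byte record");

/* Visit records `step` elements apart, scaling the step only once. */
static long sum_data(const struct rec *start, ptrdiff_t step)
{
    /* The multiply by sizeof(struct rec) happens here, outside the loop. */
    const ptrdiff_t byte_step = step * (ptrdiff_t)sizeof(struct rec);
    const char *p = (const char *)start;
    long total = 0;

    for (;;) {
        const struct rec *r = (const struct rec *)p;
        if (r->marker == 0)
            break;
        total += r->data;
        p += byte_step;   /* plain byte addition inside the loop */
    }
    return total;
}
```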

) The x86's address scaling feature only goes up to *8. This *16 is
) implemented with a shift.

I wasn't aware the stride was 16 bytes.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Keith Thompson · Dec 6, 2012

BartC said:
So this statement:

C doesn't have lvalue casts.

isn't completely true, because it seems you can get around it easily by
turning it into an rvalue cast first. (Also I'm talking about type-punning
sorts of casts rather than type-conversion ones.)

Yes, it is completely true.

C doesn't have linked lists or binary trees either, but you can
easily implement them using structs and pointers. C gives you the
basic tools needed to build just about *anything*. Lvalue casts
(whatever you happen to mean by that phrase) are not one of those
basic tools, but they are something you can build.

And please keep in mind that there's a *big* difference between
conversion and type-punning. Conversion, as implemented by
a cast operator, converts a *value* from one type to another.
For example, a conversion from int to float gives you a float
with the mathematical value of the operand, regardless of how ints
and floats are represented. Pointer conversions conceptually do
the same thing; it just happens that most modern implementations
represent all pointers the same way, so pointer conversions can be
implemented as a reinterpretation of the representation.
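
The difference can be made concrete with a small sketch. It assumes a 32-bit float, and the function names convert and reinterpret are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(int32_t), "float assumed to be 32 bits");

/* Conversion: the result has the same mathematical value as the operand. */
static float convert(int32_t i)
{
    return (float)i;
}

/* Type-punning: the result has the same bits as the operand. */
static float reinterpret(int32_t i)
{
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}
```

convert(1) is 1.0f on any implementation. reinterpret(1) depends on the float format; on an IEEE 754 platform it is a tiny subnormal value, and it takes reinterpret(0x3f800000) to obtain 1.0f.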

Piotr Kalinowski · Dec 6, 2012

BartC said:
But, I can do the equivalent of ++(char*)P by writing:

P = (int*)((char*)P+1);

(or near enough if that's not quite right.) So the world can still end.

It's not about disallowing you to shoot yourself. It's about making it
more difficult, so you're slightly less likely to do it. It's about
increasing the cost of ending the world, so that you'll think twice
(hopefully) before doing it.

Regards,
Piotr Kalinowski

Philipp Thomas · Dec 7, 2012

Or you can use a union; this avoids alignment issues, but the other
problems still apply.

When used for type-punning that isn't guaranteed to work but depends
on the compiler.

Or you can use memcpy() to create a *copy* of an object's representation
in an object of a different type.

And this is the only clean way to fix type-punning and avoid alignment
issues.

Philipp

Keith Thompson · Dec 7, 2012

Philipp Thomas said:
When used for type-punning that isn't guaranteed to work but depends
on the compiler.

A footnote in the standard (N1570 6.5.2.3p3, footnote 95) says:

If the member used to read the contents of a union object is
not the same as the member last used to store a value in the
object, the appropriate part of the object representation of
the value is reinterpreted as an object representation in the
new type as described in 6.2.6 (a process sometimes called
"type punning"). This might be a trap representation.

I'm not sure why that's stated only in a non-normative footnote.
I suppose the implication is that it's already stated normatively,
but it's not clear to me that it is.

In any case, a union does avoid alignment issues.
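
A sketch of the union approach the footnote describes, assuming a 32-bit float. The union and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float assumed to be 32 bits");

/* Both members start at the same address, and the union is aligned
   suitably for either of them. */
union float_or_bits {
    float    f;
    uint32_t u;
};

static uint32_t float_bits_via_union(float f)
{
    union float_or_bits pun;
    pun.f = f;      /* store through one member */
    return pun.u;   /* read the same bytes through the other */
}
```

The result is still representation-dependent: the value shown for 1.0f on IEEE 754 systems, 0x3f800000, is not guaranteed by the language.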

Philip Lantz · Dec 7, 2012

BartC said:
OK, I see it. But: the (T)A=B example might be done instead as:

memcpy(&A, &B, sizeof(T));

So it expresses something that could conceivably make sense. Unlike the
other examples that don't!

Do you think that "B = (T)A" is similar to "memcpy(&B, &A, sizeof(T))"?

It's not.

So, if you did think that, it helps me understand why you think lvalue
casts make sense.

Öö Tiib · Dec 7, 2012

A footnote in the standard (N1570 6.5.2.3p3, footnote 95) says:

If the member used to read the contents of a union object is
not the same as the member last used to store a value in the
object, the appropriate part of the object representation of
the value is reinterpreted as an object representation in the
new type as described in 6.2.6 (a process sometimes called
"type punning"). This might be a trap representation.

I'm not sure why that's stated only in a non-normative footnote.
I suppose the implication is that it's already stated normatively,
but it's not clear to me that it is.

C99 has such texts:

6.2.5p20, union has an overlapping set of member objects
6.7.2.1p14, the value of at most one of the members can be stored in a union object at any time
Annex J.1 the value of a union member other than the last one stored into is unspecified.

I do not feel it safe to use union for type punning.

BartC · Dec 7, 2012

Philip Lantz said:
Do you think that "B = (T)A" is similar to "memcpy(&B, &A, sizeof(T))"?

No. I'm interested in type-punning the left-hand-side; memcpy might be one
way of achieving that in some cases.

Your example might also work, except that (T) on the right-hand-side does
type conversion not type-punning.

It's not.

So, if you did think that, it helps me understand why you think lvalue
casts make sense.

Since any 'lvalue cast' of the form (T)A can be written instead as *(T*)&A,
which is perfectly legal, then why shouldn't it make sense?

Look, I have this compiler project from a couple of months back. That
language also doesn't have lvalue casts (it wasn't too hot on casts, but it
*does* have the 'equivalence' feature which C doesn't have, which is what is
used instead, and is better IMO).

I decided to add lvalue casts to that language. It took ten minutes, and six
lines of code, to have them working for assignment! (Needs a bit more work
for general lvalues, and obviously a lot more testing.)

So I can now write in that language:

real x
int a

a:=x
real(a):=x # lvalue cast!

Intermediate output:

0003: 1:005 convert (a,x) int32, real64
0004: 1:006 move (a,x) real64, real64

And, since it was set up to produce C code, this is the final output
(obviously r64 and i32 are typedefs):

r64 x;
i32 a;

a = x;
*(r64*)&a = x;

Notice anything similar between this last line (which I haven't doctored)
and the *(T*)&A I wrote above? (I didn't even change any part of the code
generation; this is what naturally came out. However, I didn't run this
example because I just realised the destination is too small..)

Lvalue casts *can* be meaningful, and while people are right in that C
doesn't directly define them, they can be achieved with a simple
transformation.
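
The "destination is too small" hazard can be caught at compile time. The following is only a sketch: STORE_BITS and double_bits are hypothetical names, memcpy() is swapped in for the *(T*)&A form, and an 8-byte double is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bytes of `src` into `dst`, and refuse
   to compile when the two objects differ in size. */
#define STORE_BITS(dst, src)                              \
    do {                                                  \
        static_assert(sizeof(dst) == sizeof(src),         \
                      "STORE_BITS: size mismatch");       \
        memcpy(&(dst), &(src), sizeof(dst));              \
    } while (0)

/* Example use: keep the bit pattern of a double in a 64-bit integer. */
static uint64_t double_bits(double x)
{
    uint64_t a;
    STORE_BITS(a, x);   /* compiles only if double is 8 bytes */
    return a;
}
```

With a 32-bit destination, as in the a/x example above, the static_assert would reject the store instead of writing past the object.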

James Kuyper · Dec 7, 2012

C99 has such texts:

6.2.5p20, union has an overlapping set of member objects
6.7.2.1p14, the value of at most one of the members can be stored in a union object at any time
Annex J.1 the value of a union member other than the last one stored into is unspecified.

I do not feel it safe to use union for type punning.

Keep in mind that footnote 95 merely describes what the committee
intended to be the case from the very beginning, and what is easiest to
implement, and what virtually every real implementation of C always has
implemented, because a great many C programmers have always assumed it
was true. It's not really worthwhile worrying about the possibility that
a union won't work as described by footnote 95. There are much better
things to worry about - such as the fact that type punning inherently
requires building into your code implementation-specific knowledge about
how the two types are represented.

James Kuyper · Dec 7, 2012

On 12/07/2012 07:54 AM, BartC wrote:
....

Look, I have this compiler project from a couple of months back. That
language also doesn't have lvalue casts (it wasn't too hot on casts, but it
*does* have the 'equivalence' feature which C doesn't have, which is what is
used instead, and is better IMO).

C does have a feature with the same semantics as your "equivalence"
feature, just different syntax. That feature is called a union.

BartC · Dec 7, 2012

James Kuyper said:
On 12/07/2012 07:54 AM, BartC wrote:
...

C does have a feature with the same semantics as your "equivalence"
feature, just different syntax. That feature is called a union.

Unions are much more limited. For example, if you have:

int A[25];
double X;

you can't 'equivalence' X to A[16] without some difficulty. (Actually X
might span both A[16] and A[17].)

A might also be external, limiting the options further.

It's also harder to refer to A and X entirely independently; you might need
to use U.A and U.X (if you can even get that far).

(BTW 'equivalence' is a (now-deprecated) feature of Fortran. I thought
people might be familiar with it. My version just uses @, for example:
double X @ A[16]; )
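
In standard C, the closest portable approximation to overlaying X on A[16] is probably to copy through memcpy(). This is a sketch, not an equivalent of the @ feature: put_x and get_x are hypothetical names, and it assumes an 8-byte double and a 4-byte int, so that X covers A[16] and A[17].

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(double) == 2 * sizeof(int),
              "assumes 8-byte double and 4-byte int");

static int A[25];

/* Store x into the bytes occupied by A[16] and A[17]. */
static void put_x(double x)
{
    memcpy(&A[16], &x, sizeof x);
}

/* Load the double whose bytes are held in A[16] and A[17]. */
static double get_x(void)
{
    double x;
    memcpy(&x, &A[16], sizeof x);
    return x;
}
```

Unlike a true overlay, every access goes through a copy, so A and X cannot be named independently in expressions.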
