Casts on lvalues

Rui Maciel · Dec 7, 2012

BartC said:
Suppose I have these types:

#define byte unsigned char

typedef struct {
int a,b,c,d;
} R; /* assume this is 16 bytes */

R stores 4 objects of type int, whose size is platform-dependent. If you
need a 4-byte integer data type then your best bet would be int32_t,
which gives you the assurance that you will get an integer data type which
is exactly 32 bits wide.

And these variables:

int n; /* represents a *byte* offset */

If you wish to represent a byte offset then you should use size_t instead.

R* p;

I want to be able to do the following:
p += n;

but it doesn't work because n is a byte offset; it's not in terms of R
objects.

I don't know what you were trying to do, but if you were trying to access the
offset of a structure member by supplying the address of the struct then you
should use the offsetof() macro instead of relying on pointer arithmetic
voodoo.
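For reference, a minimal sketch of what offsetof() gives you for the struct R quoted above. The helper names offset_of_c and member_c are made up for illustration and do not come from the thread; the sketch only assumes that the first member sits at offset 0 and that later members sit at increasing offsets.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

typedef struct {
    int a, b, c, d;
} R;

/* Hypothetical helper: byte offset of member c within R. */
static size_t offset_of_c(void)
{
    return offsetof(R, c);
}

/* Hypothetical helper: address of member c, computed from a pointer to R
   plus the member offset. The byte arithmetic goes through
   unsigned char *, which is allowed to point into any object. */
static int *member_c(R *r)
{
    assert(r != NULL);
    return (int *)((unsigned char *)r + offsetof(R, c));
}
```

With an object `R r`, `member_c(&r)` yields the same address as `&r.c`, without hard-coding any member size or padding.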

But the obvious cast:

(byte*)p+=n;

doesn't appear to compile.
Try

The workarounds seem to be:

p += n/sizeof(R);

which I don't like.

You shouldn't. No one should. It represents all kinds of badness.

Even though p is always aligned (and n is known to be
a multiple of 16),

How do you know that p is always aligned, and that n is a multiple of 16?
Each non-bit-field member of a struct is aligned in an implementation-
defined manner, which means that there isn't any guarantee regarding the
alignment of the structure members.

In addition, padding may or may not exist.

Finally, R was defined as a struct composed of 4 objects of type int, and
the int data type is only guaranteed to be at least 16 bits wide; its exact
width depends on the implementation.

and I know the divide will cancel out, it seems funny
having to introduce a divide op in the first place. And:

p = (R*)((byte*)p+n);

which is what I'm using but looks very untidy in real code.
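One way to keep that cast out of the calling code is to wrap it in a small function. This is only a sketch: the name advance_bytes is hypothetical, and it assumes (as stated in the question) that n is a multiple of sizeof(R) and that the result still points into, or one past, the same array.

```c
#include <assert.h>
#include <stddef.h>   /* ptrdiff_t */

typedef struct {
    int a, b, c, d;
} R;

/* Hypothetical helper: move an R pointer by a signed byte offset.
   The caller must guarantee that n is a multiple of sizeof(R) and that
   the result stays inside (or one past the end of) the same array. */
static inline R *advance_bytes(R *p, ptrdiff_t n)
{
    assert(n % (ptrdiff_t)sizeof(R) == 0);
    return (R *)((unsigned char *)p + n);
}
```

The call site then reads `p = advance_bytes(p, n);`, which performs the same computation as the cast expression above.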

In any case, the question remains, *is* there a way to cast an lvalue as
I've shown above?

What exactly were you trying to accomplish with that code? It's rather odd
that someone tries to assign a pointer to a structure to the n-th member of
a struct of the same type.

Rui Maciel

James Kuyper · Dec 7, 2012

James Kuyper said:

On 12/07/2012 07:54 AM, BartC wrote:
...

C does have a feature with the same semantics as your "equivalence"
feature, just different syntax. That feature is called a union.

Unions are much more limited. For example, if you have:

int A[25];
double X;

you can't 'equivalence' X to A[16] without some difficulty. (Actually X
might span both A[16] and A[17].)

Your earlier description of how "equivalence" works in your language
didn't mention that it could be used in that fashion. Yes, that would be
a bit harder to do in C. If _Alignof(int) != _Alignof(double), making sure
that A[16] is correctly aligned to be used as a place to store a double
could, in principle, be problematic. The fact that you chose a power of
2 as a subscript makes this unlikely to be an issue on most machines,
but I presume your language also allows X to be equivalenced with A[17]?
If there was only one 'double' object equivalenced to A[17], you could
adjust the location of A to make sure that A[17] was correctly aligned
for a double. However, that approach wouldn't work if multiple
incompatible equivalences were specified. How does your language deal
with that? Can it only be implemented on platforms with no alignment
restrictions?

A might also be external, limiting the options further.

In C, if A were external, code compiled to work with A could be
optimized to assume that no pointer to double ever aliases any part of
A. The union approach removes permission to make such optimizations, at
least for code within the scope of the union, but if the declaration of
A is outside your control, that's not an option.
Allowing anything in C that's similar to your language's "equivalence"
would require disabling such optimizations. Allowing it for arbitrary
types would require disabling all such optimizations; I presume that
such optimizations are prohibited in your language? Or perhaps code
which would interact badly with such optimizations is prohibited? That
would be the equivalent of restrict-qualifying most pointer declarations
in C.

It's also harder to refer to A and X entirely independently; you might need
to use U.A and U.X (if you can even get that far).

(BTW 'equivalence' is a (now-deprecated) feature of Fortran. I thought
people might be familiar with it. My version just uses @, for example:
double X @ A[16]; )

It's been a couple of decades since the last time I wrote much Fortran
code, and I don't think I ever used that feature, but I was aware of it.
If it has the same capability you describe above, I didn't remember that
fact, which isn't particularly surprising.

BartC · Dec 7, 2012

Rui Maciel said:
BartC wrote:

How do you know that p is always aligned

p points into a malloc-allocated array of such structs.

Because all such values (originally +2, -1 etc) have been multiplied by 16
(or rather, sizeof(R)) in an initial pass of the data!

What exactly were you trying to accomplish with that code? It's rather
odd
that someone tries to assign a pointer to a structure to the n-th member
of
a struct of the same type.

p points into an array of structs. It's modified to point instead at a +2
or -1 etc offset from where it currently is.

But I decided to change that +2, -1 etc to +32, -16 etc because I didn't
like the 80 million extra left-shifts being executed every second for no
purpose.

Öö Tiib · Dec 7, 2012

Keep in mind that footnote 95 merely describes what the committee
intended to be the case from the very beginning, and what is easiest to
implement, and what virtually every real implementation of C always has
implemented, because a great many C programmers have always assumed it
was true.

Yes, I am not worried about how a union is implemented; I am worried that an
optimizer might ignore that footnote and optimize too much in some
case. So ... I prefer to memcpy or to cast (properly aligned) pointers.

glen herrmannsfeldt · Dec 7, 2012

James Kuyper said:
On 12/07/2012 08:48 AM, BartC wrote:

(snip, someone wrote)

C does have a feature with the same semantics as your "equivalence"
feature, just different syntax. That feature is called a union.

Unions are much more limited. For example, if you have:
int A[25];
double X;
you can't 'equivalence' X to A[16] without some difficulty. (Actually X
might span both A[16] and A[17].)

Your earlier description of how "equivalence" works in your language
didn't mention that it could be used in that fashion. Yes, that would be
a bit harder to do in C. If _Alignof(int) != _Alignof(double), making sure
that A[16] is correctly aligned to be used as a place to store a double
could, in principle, be problematic.
(snip)

It's also harder to refer to A and X entirely independently;
you might need to use U.A and U.X (if you can even get that far).
(BTW 'equivalence' is a (now-deprecated) feature of Fortran.
I thought people might be familiar with it. My version just
uses @, for example: double X @ A[16]; )

It's been a couple of decades since the last time I wrote much Fortran
code, and I don't think I ever used that feature, but I was aware of it.
If it has the same capability you describe above, I didn't remember that
fact, which isn't particularly surprising.

Fortran EQUIVALENCE has many of the same restrictions as union, though
it is often used to get bits between different types.

Fortran now has TRANSFER, an intrinsic function that is defined
to "transfer the physical representation." It also avoids any
questions about alignment.

But then C has memcpy() and casts to (unsigned char *) to do it.
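A minimal sketch of both C techniques mentioned here, copying an object representation with memcpy() and inspecting it through unsigned char *. The function names are invented for illustration, and the code assumes that double and uint64_t have the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a double into a 64-bit integer. */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);   /* assumption: 64-bit double */
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* The inverse: rebuild a double from its stored representation. */
static double bits_to_double(uint64_t bits)
{
    double x;
    assert(sizeof bits == sizeof x);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Read byte i of the representation through unsigned char *,
   which is allowed to alias an object of any type. */
static unsigned char byte_of(const double *x, size_t i)
{
    assert(i < sizeof *x);
    return ((const unsigned char *)x)[i];
}
```

Neither function dereferences a pointer of the "wrong" type, so alignment and effective-type questions do not arise.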

-- glen

BartC · Dec 7, 2012

James Kuyper said:
On 12/07/2012 08:48 AM, BartC wrote:

Unions are much more limited. For example, if you have:

int A[25];
double X;

you can't 'equivalence' X to A[16] without some difficulty. (Actually X
might span both A[16] and A[17].)

but I presume your language also allows X to be equivalenced with A[17]?

Yes. Or one byte past. (Well, if the hardware allows it, why not?)

However, that approach wouldn't work if multiple
incompatible equivalences were specified. How does your language deal
with that?

The language is dumb. If you override the default alignments by using this
aliasing, then it will use exactly the address you gave it. Just like you
can do in assembly code.

Can it only be implemented on platforms with no alignment
restrictions?

Usually the feature will be used sensibly. Doubly so if unaligned accesses
cause a crash. Although it's possible to deal with that properly; I've seen
gcc generate byte-at-a-time code for accessing an int via a pointer it
thought was unaligned.
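In C source, the portable way to ask for that kind of access is memcpy(), which has no alignment requirement; a compiler is then free to emit a single load where the hardware permits it. A small sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: read an int that starts at an arbitrary byte
   address. memcpy() does not require src to be aligned for int. */
static int load_int_unaligned(const unsigned char *src)
{
    int value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* Hypothetical helper: write an int at an arbitrary byte address. */
static void store_int_unaligned(unsigned char *dst, int value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}
```

For example, with a buffer `unsigned char buf[sizeof(int) + 1]`, storing to and loading from `buf + 1` is well defined whether or not that address is aligned for int.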

I presume that
such optimizations are prohibited in your language?

Not in the language. But my compilers never did sophisticated optimisations
anyway. (Actually I don't think they did unsophisticated ones either...)

Or perhaps code
which would interact badly with such optimizations is prohibited?

I don't think optimisations would need to be entirely ruled out. All the
information is there at compile-time (unlike pointers where you're never
quite sure what's pointing to what.) And you wouldn't use this stuff
everywhere.

BartC · Dec 7, 2012

glen herrmannsfeldt said:
Fortran EQUIVALENCE has many of the same restrictions as union, though
it is often used to get bits between different types.

This page shows you can still set up all sorts of relationships between
unrelated data:

http://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vn9b/index.html

Keith Thompson · Dec 7, 2012

Öö Tiib said:
C99 has such texts:

6.2.5p20, union has an overlapping set of member objects
6.7.2.1p14, the value of at most one of the members can be stored
in a union object at any time
Annex J.1 the value of a union member other than the last one
stored into is unspecified.

I do not feel it safe to use union for type punning.

Note that C99 doesn't have that footnote, but N1256 does. It was added
by Technical Corrigendum 3 -- but N1256 (which incorporates TC3)
still has that clause in Annex J.1. I suspect this is just an
oversight.

N1570 (essentially C2011) changed the wording of the clause in J.1
(which is a list of unspecified behaviors) from:
The value of a union member other than the last one stored into
(6.2.6.1)
to:
The values of bytes that correspond to union members other than the
one last stored into (6.2.6.1).

I can't find the list of C99 DRs (the link I had,
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm>
now points to the list of C2011 DRs), so I don't know which DR
introduced this change.

But the implication is that the committee felt that the existing C99
standard already implied what the footnote says. It's conceivable
that a compiler could behave differently, (e.g., by optimizing away
an assignment to a union member if that member is never accessed again)
-- but such a compiler would break a good deal of existing code that
depends on this behavior, whether it's guaranteed by the standard
or not.
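For concreteness, this is the kind of code that depends on the behavior the footnote describes: store into one union member, then read a different one. It is a sketch only; the type and function names are made up, and it assumes float and uint32_t have the same size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical union connecting a float with a same-sized integer. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Store through f, read back through u: the bytes of the float are
   reinterpreted as a uint32_t. */
static uint32_t bits_of_float(float f)
{
    union float_bits pun;
    assert(sizeof pun.f == sizeof pun.u);   /* assumption: 32-bit float */
    pun.f = f;
    return pun.u;
}

/* The reverse direction: store through u, read back through f. */
static float float_of_bits(uint32_t u)
{
    union float_bits pun;
    assert(sizeof pun.f == sizeof pun.u);
    pun.u = u;
    return pun.f;
}
```

A compiler that discarded the store to `pun.f` because `f` is never read again would break both functions.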

James Kuyper · Dec 7, 2012

Not in the language. But my compilers never did sophisticated optimisations
anyway. (Actually I don't think they did unsophisticated ones either...)

I don't think optimisations would need to be entirely ruled out. All the
information is there at compile-time (unlike pointers where you're never
quite sure what's pointing to what.) And you wouldn't use this stuff
everywhere.

I'm not talking about all optimizations, just the ones that depend upon
C's anti-aliasing guarantees. Consider the following function:

void func(int n, float array[static n], long *l)
{
    for(int i=0; i<n; i++)
        array[i] = *l;
    *l = n;
}

This isn't a particularly useful function - I've chosen it solely to
clearly illustrate my point - but my point also applies to more
complicated and useful functions.
An implementation is permitted, because of 6.5p7, to compile that
function as if it had been written:

void func(int n, float array[static n], long *l)
{
    long temp = *l;
    for(int i=0; i<n; i++)
        array[i] = temp;
    *l = n;
}

The optimized version has exactly the same behavior as the original, so
long as 6.5p7 is not violated. If C were changed to allow a long integer
to be equivalenced with an element of a float array, the value of *l
might change during the loop. If so, these two versions of the code
would no longer be equivalent, and there's an important question to be
asked about which one actually matches the user's intent. If the initial
value of *l was carefully chosen with the intent that it would, in fact,
be changed by this code inside that loop, the developer is going to be a
little upset if that didn't actually happen.
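The two versions can be put side by side and checked for the non-aliasing case. This is a sketch, not part of the original post: func_hoisted and check_equivalent are names invented here, and the check only covers the situation where *l is a separate object from the array, which is the case 6.5p7 lets the compiler assume.

```c
#include <assert.h>

/* The function as written: *l is re-read on every iteration. */
void func(int n, float array[static n], long *l)
{
    for (int i = 0; i < n; i++)
        array[i] = *l;
    *l = n;
}

/* The hand-hoisted form: *l is read once, before the loop. */
void func_hoisted(int n, float array[static n], long *l)
{
    long temp = *l;
    for (int i = 0; i < n; i++)
        array[i] = temp;
    *l = n;
}

/* Hypothetical check: with distinct objects, both forms agree. */
static void check_equivalent(void)
{
    float a1[3], a2[3];
    long l1 = 7, l2 = 7;

    func(3, a1, &l1);
    func_hoisted(3, a2, &l2);

    for (int i = 0; i < 3; i++) {
        assert(a1[i] == 7.0f);
        assert(a1[i] == a2[i]);
    }
    assert(l1 == 3 && l2 == 3);
}
```

The two would only diverge if a store to array[i] could change *l, which is exactly the overlap that the aliasing rule excludes.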

BartC · Dec 7, 2012

James Kuyper said:
On 12/07/2012 01:27 PM, BartC wrote:

I'm not talking about all optimizations, just the ones that depend upon
C's anti-aliasing guarantees. Consider the following function:

void func(int n, float array[static n], long *l)
{
    for(int i=0; i<n; i++)
        array[i] = *l;
    *l = n;
}

An implementation is permitted, because of 6.5p7, to compile that
function as if it had been written:

void func(int n, float array[static n], long *l)
{
    long temp = *l;
    for(int i=0; i<n; i++)
        array[i] = temp;
    *l = n;
}

The optimized version has exactly the same behavior as the original, so
long as 6.5p7 is not violated. If C were changed to allow a long integer
to be equivalenced with an element of a float array, the value of *l
might change during the loop.

(Seems like it might just be copied to itself? In this example).

If so, these two versions of the code
would no longer be equivalent, and there's an important question to be
asked about which one actually matches the user's intent. If the initial
value of *l was carefully chosen with the intent that it would, in fact,
be changed by this code inside that loop, the developer is going to be a
little upset if that didn't actually happen.

I can't see that the problems are going to be that different from the ones
you might get when using pointers. (And the solutions might be similar.)

Because even without equivalencing, 'l' might anyway point into the array,
via a cast.

If this was part of the language, at least the aliasing would be done in an
overt manner, not hidden away in a pointer cast, in an operation done at
runtime so you don't know at any time what is aliasing what.

James Kuyper · Dec 7, 2012

I'm not talking about all optimizations, just the ones that depend upon
C's anti-aliasing guarantees. Consider the following function:

void func(int n, float array[static n], long *l)
{
    for(int i=0; i<n; i++)
        array[i] = *l;
    *l = n;
}

An implementation is permitted, because of 6.5p7, to compile that
function as if it had been written:

void func(int n, float array[static n], long *l)
{
    long temp = *l;
    for(int i=0; i<n; i++)
        array[i] = temp;
    *l = n;
}

The optimized version has exactly the same behavior as the original, so
long as 6.5p7 is not violated. If C were changed to allow a long integer
to be equivalenced with an element of a float array, the value of *l
might change during the loop.

(Seems like it might just be copied to itself? In this example).

I think you're assuming that sizeof(long)==sizeof(float), which is
commonplace, but not required. However, even on systems where that is
true, the result is not a simple copy. The conversion to float will
produce a bit pattern that's pretty unlikely to be the same as the
original (unless the original represents 0).
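The point about the conversion can be checked directly by comparing the stored bytes of the long with the stored bytes of the converted float. This is a sketch under assumptions: the helper name is invented, it assumes sizeof(float) <= sizeof(long), and it assumes the usual case where float zero is stored as all-zero bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if converting v to float produces the
   same bytes as the first sizeof(float) bytes of v itself, else 0. */
static int conversion_keeps_bits(long v)
{
    float f = (float)v;               /* value conversion, not a byte copy */
    assert(sizeof f <= sizeof v);
    return memcmp(&f, &v, sizeof f) == 0;
}
```

On IEEE 754 implementations, `conversion_keeps_bits(0L)` is 1 and `conversion_keeps_bits(1L)` is 0, since 1.0f is not stored with the same bytes as the integer 1.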

I can't see that the problems are going to be that different from the ones
you might get when using pointers. (And the solutions might be similar.)

In C, such code using pointer casts violates 6.5p7, and therefore has
undefined behavior, so the solution is simply not to write such code.

Because even without equivalencing, 'l' might anyway point into the array,
via a cast.

Not with defined behavior.

If this was part of the language, at least the aliasing would be done in an
overt manner, not hidden away in a pointer cast, in an operation done at
runtime so you don't know at any time what is aliasing what.

As I said, the C language does provide a way to explicitly alias objects
of two different types, and the optimizations I've described above are
therefore prohibited when they occur within the scope of a union
declaration that connects the relevant types (except in those contexts
where the compiler can be certain that the relevant pointers do not
point at members of the same union object).

glen herrmannsfeldt · Dec 7, 2012

(snip, someone wrote)

Note that C99 doesn't have that footnote, but N1256 does. It was added
by Technical Corrigendum 3 -- but N1256 (which incorporates TC3)
still has that clause in Annex J.1. I suspect this is just an
oversight.
(snip)

But the implication is that the committee felt that the existing C99
standard already implied what the footnote says. It's conceivable
that a compiler could behave differently, (e.g., by optimizing away
an assignment to a union member if that member is never accessed again)
-- but such a compiler would break a good deal of existing code that
depends on this behavior, whether it's guaranteed by the standard
or not.

Some time ago, I was trying to figure out if it would be
possible for a C compiler to generate JVM code.

JVM has no operation that stores different types in the same memory.
I believe it isn't so hard to make a memcpy() that can do the
appropriate conversions and copies, though.

It seemed to me at the time that implementing union the same as
struct was one way that would work, and would also follow the
standard. (Though maybe less memory efficient.)

I asked here at the time, and many seemed to agree, though as you
say, it might break existing code.

-- glen

glen herrmannsfeldt · Dec 7, 2012

(snip on EQUIVALENCE)

This page shows you can still set up all sorts of relationships between
unrelated data:

http://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vn9b/index.html

EQUIVALENCE goes back to the first Fortran compiler, usually called
Fortran I, in 1956. It required at least 4K (36 bit) words to run.
A 704 with the full 32K (36 bit) words was pretty much the super
computer of the time. EQUIVALENCE was needed to conserve memory
use, and much less for type punning.

But yes, in later years, with more memory available, EQUIVALENCE
was commonly used to move bits between different data types.

TRANSFER wasn't added until, I believe, Fortran 90.
(Fortran 90 added dynamic allocation, which doesn't work
with EQUIVALENCE.)

-- glen

Ben Bacarisse · Dec 7, 2012

glen herrmannsfeldt said:
(snip, someone wrote)

Some time ago, I was trying to figure out if it would be
possible for a C compiler to generate JVM code.

JVM has no operation that stores different types in the same memory.
I believe it isn't so hard to make a memcpy() that can do the
appropriate conversions and copies, though.

It seemed to me at the time that implementing union the same as
struct was one way that would work, and would also follow the
standard. (Though maybe less memory efficient.)

I asked here at the time, and many seemed to agree, though as you
say, it might break existing code.

I don't think that would work out. In particular, 6.5.8 p5 says "All
pointers to members of the same union object compare equal". Since you
are talking about a compiler here, that alone is not the end of the
matter because the compiler might be able to arrange for this to appear
to be true. For example, a pointer might be made to consist of two
parts, one that is used for comparison while the other part is an offset
used for union access, but I fear that this won't fly in the long run.
(For example, this particular ruse will go wrong implementing the
special provision for "common initial sequences" in 6.5.2.3 p5. Again,
a fix-up is possible but I am sceptical about how far this can go on.)

It would be an interesting exercise to see if it would work, but I'm not
hopeful.

glen herrmannsfeldt · Dec 8, 2012

(snip, I wrote)

I don't think that would work out. In particular, 6.5.8 p5 says "All
pointers to members of the same union object compare equal". Since you
are talking about a compiler here, that alone is not the end of the
matter because the compiler might be able to arrange for this to appear
to be true. For example, a pointer might be made to consist of two
parts, one that is used for comparison while the other part is an offset
used for union access,

Well, most pointers have to have an Object reference to an array
and an offset into the array. Any scalar that can be pointed to
has to instead be an array of length 1.

but I fear that this won't fly in the long run.
(For example, this particular ruse will go wrong implementing the
special provision for "common initial sequences" in 6.5.2.3 p5. Again,
a fix-up is possible but I am sceptical about how far this can go on.)

As well as I know, all that fun stuff only has to happen for
(unsigned char *), or (void *), so the compiler only has to do that
extra stuff when those occur.

But yes, getting that one right will be tricky.

It would be an interesting exercise to see if it would work, but I'm not
hopeful.

It already gets tricky for struct. If you make a struct a Java class,
then you can have an object reference to the (instance of the) class,
which is different from a reference to a class member. So, extra
work to get that right.

-- glen

Phil Carmody · Dec 9, 2012

BartC said:
Plenty of terms can appear interchangeably on both left and right sides.

But while A can appear on either side, (T)A can't. And there doesn't
appear to be a convincing reason why not.

Your posts are getting more and more detached from reality. What would

(T)A = expr;

even mean? What is getting modified? Certainly A can't be. One could argue
that some temporary non-addressable pseudo-object containing the cast value
of A is being modified, but as that pseudo-object fails to be referenceable
after that line, the assignment is absolutely useless.

I want neither meaninglessness nor uselessness as a language feature.

Phil

James Kuyper · Dec 9, 2012

Your posts are getting more and more detached from reality. What would

(T)A = expr;

even mean? What is getting modified? Certainly A can't be. One could argue
that some temporary non-addressable pseudo-object containing the cast value
of A is being modified, but as that pseudo-object fails to be referenceable
after that line, the assignment is absolutely useless.

I want neither meaninglessness nor uselessness as a language feature.

He's made it clear what he wants it to mean. What he wants is equivalent
to the following C code, except for the fact that the following code
violates a constraint, and he wants this construct to have well-defined
behavior:

*(T*)&A = expr;

That this construct would have such a meaning is inconsistent with the
way casts work in the rest of the C language, which doesn't bother him;
in fact, I think he believes it's inconsistent for it to NOT work this
way. It would still be a horrible unsafe thing to do, even if it were
fully legal. It can be done somewhat more safely by declaring a union,
but he wants to be able to use it even in contexts where he can't change
the definition of A to be a union, and the syntax for doing it using a
union is more complicated than he wants it to be. Also, he thinks the
syntax to do this using a union is excessively complicated.

BartC · Dec 9, 2012

James Kuyper said:
On 12/09/2012 10:54 AM, Phil Carmody wrote:

(Neither do I.)

He's made it clear what he wants it to mean.

*(T*)&A = expr;

It can be done somewhat more safely by declaring a union,
but he wants to be able to use it even in contexts where he can't change
the definition of A to be a union, and the syntax for doing it using a
union is more complicated than he wants it to be. Also, he thinks the
syntax to do this using a union is excessively complicated.

It's not that complicated, but it's intrusive.

It would also affect uses of 'A' throughout the project. (And perhaps a
different module might want to apply a different union!) Sometimes you might
just want to use such a cast locally. Or it might be experimental, and you
want to try out something without turning the whole project upside-down.

There are plenty of uses for the feature. And if a union-like behaviour is
the way to do it, perhaps the compiler can take care of it, in trivial
cases. Or the rules for unions can be relaxed. So instead of having to
change:

S* p;

to:

union {
S* s;
T* t;
} p;

and changing every p to p.s or p.t, simpler to allow an anonymous union, for
example:

union {
S* p;
};

Now p can be used as before, but the compiler knows there might be aliasing
issues. This still requires the cast. Or:

union {
S* p;
T* pt;
};

Now p is only changed in a few places to pt, instead of using a cast. (As a
bonus, it would be nice to augment an existing union too.)

James Kuyper · Dec 9, 2012

On 12/09/2012 01:50 PM, BartC wrote:
....

There are plenty of uses for the feature. And if a union-like behaviour is
the way to do it, perhaps the compiler can take care of it, in trivial
cases. Or the rules for unions can be relaxed. So instead of having to
change:

S* p;

to:

union {
S* s;
T* t;
} p;

and changing every p to p.s or p.t, simpler to allow an anonymous union, for
example:

union {
S* p;
};

That wouldn't do anything useful. It doesn't tell the compiler which
other types p is aliased with, so it can't know how much space to
reserve for the union, nor what alignment requirements the union will have.

Now p can be used as before, but the compiler knows there might be aliasing
issues. This still requires the cast. Or:

union {
S* p;
T* pt;
};

That, on the other hand, would work, and is trivially compatible with
the current C language. Anonymous unions are already allowed as members
in structures (as of C2011), and stand-alone anonymous unions have, I
believe, long been a common extension to C.
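A sketch of the C2011 form, an anonymous union used as a structure member. Because S and T are never defined in this thread, the example substitutes an integer and a byte array for the two pointer members; the struct name and helper are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct with an anonymous union member (C11).
   value and bytes share storage and are named directly,
   with no intermediate member name. */
struct word {
    union {
        uint32_t      value;
        unsigned char bytes[4];
    };
};

/* Hypothetical helper: set every byte of the word to b. */
static void fill_bytes(struct word *w, unsigned char b)
{
    assert(sizeof w->value == sizeof w->bytes);
    memset(w->bytes, b, sizeof w->bytes);
}
```

Given `struct word w`, both `w.value` and `w.bytes[0]` are valid expressions, which is the "only changed in a few places" effect described in the quoted post.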

Keith Thompson · Dec 10, 2012

James Kuyper said:
On 12/09/2012 01:50 PM, BartC wrote: [...]

union {
S* p;
T* pt;
};

That, on the other hand, would work, and is trivially compatible with
the current C language. Anonymous unions are already allowed as members
in structures (as of C2011), and stand-alone anonymous unions have, I
believe, long been a common extension to C.

I think you may be mistaken on that last point.

Are you saying that (with such an extension), this:

union {
unsigned u;
float f;
};
f = 1.0;
printf("0x%x\n", u);

would be allowed? I haven't seen that; gcc warns "unnamed
struct/union that defines no instances". In other words, a
standalone anonymous union is already valid but useless.
