realloc() implicit free() ?


S.Tobias

Tim Rentsch said:
Suppose we have a union type
union variant_union_tag {
struct {
uint8_t kind;
} variant_indicator_byte;
struct {
uint8_t kind;
char c[1024*1024];
} char_variant;
struct {
uint8_t kind;
short s[1024*1024];
} short_variant;
[...]
} int_variant;
[...]
} long_variant;
[...]
} long_long_variant;

union variant_union_tag *v = malloc( sizeof v->variant_indicator_byte );
if( ! v ) bail( "no memory" );
v->variant_indicator_byte.kind = 0;

I am not quite sure, but I think this might be wrong.

Applying `->variant_indicator_byte' operator to `v' implies that `v' points
to a `union variant_union_tag' object, or, in case of dynamic objects, you
designate such an object to be that type. However the allocated object
cannot hold `union variant_union_tag' object, because its size is too
small, thus `->' should yield UB.
 

Tim Rentsch

S.Tobias said:
Tim Rentsch said:
Suppose we have a union type
union variant_union_tag {
struct {
uint8_t kind;
} variant_indicator_byte;
struct {
uint8_t kind;
char c[1024*1024];
} char_variant;
struct {
uint8_t kind;
short s[1024*1024];
} short_variant;
[...]
} int_variant;
[...]
} long_variant;
[...]
} long_long_variant;

union variant_union_tag *v = malloc( sizeof v->variant_indicator_byte );
if( ! v ) bail( "no memory" );
v->variant_indicator_byte.kind = 0;

I am not quite sure, but I think this might be wrong.

Applying `->variant_indicator_byte' operator to `v' implies that `v' points
to a `union variant_union_tag' object, or, in case of dynamic objects, you
designate such an object to be that type. However the allocated object
cannot hold `union variant_union_tag' object, because its size is too
small, thus `->' should yield UB.

You raise a good point. Is there language in the standard that gives
a clear answer here? I looked but didn't find any.

There's a reasonable argument in terms of equivalence. Instead of

v->variant_indicator_byte.kind = 0;

the assignment

* (uint8_t*)v = 0;

should have the same effect, and this assignment is, as far as I can
determine, required to work. Similarly, if the first struct in the
union were to have a struct tag 'variant_indicator_byte_struct', then
the assignments

(*(struct variant_indicator_byte_struct *)v).kind = 0;
((struct variant_indicator_byte_struct *)v)->kind = 0;

also are (as far as I can determine) required to work. It would be
surprising if the casted assignments were required to work but the
cited assignment, which certainly seems like it should be equivalent,
were not.

Was the intention of the committee that the cited assignment be
required to work? I don't know, but reading the language about the
semantics of flexible array member (the closest example I could find
that seems analogous), it seems clear that the cited assignment would
work if it were analogous to how a flexible array member is treated.

Reading section 6.5.2.3 (discussing the '.' and '->' expressional
forms), I don't see any language that clearly addresses the question
one way or the other.
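
For comparison, here is a minimal sketch of the flexible array member idiom mentioned above, assuming a C99 compiler. The names `variant_fam` and `variant_fam_new` are made up for illustration and are not part of the code under discussion. It only shows the case the standard does describe explicitly: a struct whose trailing array gets its length from the allocation size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tagged record: a kind byte followed by a payload whose
   length is chosen at allocation time (C99 flexible array member). */
struct variant_fam {
    uint8_t kind;
    unsigned char payload[];   /* flexible array member */
};

/* Allocate a record with room for payload_len payload bytes, zero the
   payload, and set the tag.  Returns NULL on allocation failure.
   (Overflow of the size computation is not checked in this sketch.) */
struct variant_fam *variant_fam_new(uint8_t kind, size_t payload_len)
{
    struct variant_fam *v = malloc(sizeof *v + payload_len);
    if (v == NULL)
        return NULL;
    v->kind = kind;
    if (payload_len > 0)
        memset(v->payload, 0, payload_len);
    return v;
}
```

A caller would write something like `struct variant_fam *v = variant_fam_new(2, 16);`, use `v->payload[0]` through `v->payload[15]`, and release the record with `free(v)`. Whether the undersized union allocation is entitled to the same treatment is exactly the open question.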
 

S.Tobias

Tim Rentsch said:
Tim Rentsch said:
Suppose we have a union type
union variant_union_tag {
struct {
uint8_t kind;
} variant_indicator_byte;
struct {
uint8_t kind;
char c[1024*1024];
} char_variant;
struct {
uint8_t kind;
short s[1024*1024];
} short_variant;
[...]
} int_variant;
[...]
} long_variant;
[...]
} long_long_variant;

union variant_union_tag *v = malloc( sizeof v->variant_indicator_byte );
if( ! v ) bail( "no memory" );
v->variant_indicator_byte.kind = 0;

I am not quite sure, but I think this might be wrong.

Applying `->variant_indicator_byte' operator to `v' implies that `v' points
to a `union variant_union_tag' object, or, in case of dynamic objects, you
designate such an object to be that type. However the allocated object
cannot hold `union variant_union_tag' object, because its size is too
small, thus `->' should yield UB.
You raise a good point. Is there language in the standard that gives
a clear answer here? I looked but didn't find any.

I *think* it is 6.5.2.3 p.3 and p.4. Pay attention to the words:
"The value is that of the named member of the object to which the first
expression points,[...]". It means that the result of the "->" operator
is defined _only_ for objects that have that member (and undefined
otherwise), i.e. the object must have the appropriate effective type.

I think my interpretation is in agreement with what Eric Sosman
described in his recent article (read the spoiler)
Subject: Re: pointer conversion
Message-ID: <[email protected]>
He gives an interesting analysis of what happened when he applied "->"
to a pointer to an object which did not have the specified member,
although that member was at the same position as the intended read.

(Although IMHO his answer missed the question a little, I saved
it anyway for myself for future, it was a very valuable example.)

There's a reasonable argument in terms of equivalence. Instead of
v->variant_indicator_byte.kind = 0;
the assignment
* (uint8_t*)v = 0;
should have the same effect, and this assignment is, as far as I can
determine, required to work.

I don't agree to "the same effect", of course (in context of this
discussion; see also my last remarks at the bottom). But I think
the second is valid.

(Actually, the second expression should have two casts:
*(uint8_t*)(struct variant_indicator_byte_struct*)v = 0;
For an explanation see the recent csc discussion on ptr conversions.)
Similarly, if the first struct in the
union were to have a struct tag 'variant_indicator_byte_struct', then
the assignments
(*(struct variant_indicator_byte_struct *)v).kind = 0;
((struct variant_indicator_byte_struct *)v)->kind = 0;

I see nothing wrong with these, either.
also are (as far as I can determine) required to work. It would be
surprising if the casted assignments were required to work but the
cited assignment, which certainly seems like it should be equivalent,
were not.

To be sure we understand each other:
It's this (the first): ^^^^
operator that is problematic.

+++

I'm probably not the right person to give definitive answers here.
I plan to ask a few questions about structs in csc in future,
after I do some deeper research myself first.

One question I would like to ask (concerning "the same effect" above)
is if setting subobjects (in dynamic objects) in a layout compatible
way with a struct is equivalent to initializing a struct object.
void *pv = malloc(enough_or_more);
struct s { int i; } *ps = pv;
int *pi = pv;
*pi = 42;
ps->i; // defined?
What is the effective type of the allocated object? Is it only `int',
or both `int' and `struct s'?
 

Tim Rentsch

S.Tobias said:
Tim Rentsch said:
Suppose we have a union type

union variant_union_tag {

struct {
uint8_t kind;
} variant_indicator_byte;

struct {
uint8_t kind;
char c[1024*1024];
} char_variant;

struct {
uint8_t kind;
short s[1024*1024];
} short_variant;

[...]
} int_variant;

[...]
} long_variant;

[...]
} long_long_variant;

};

...

union variant_union_tag *v = malloc( sizeof v->variant_indicator_byte );
if( ! v ) bail( "no memory" );
v->variant_indicator_byte.kind = 0;

I am not quite sure, but I think this might be wrong.

Applying `->variant_indicator_byte' operator to `v' implies that `v' points
to a `union variant_union_tag' object, or, in case of dynamic objects, you
designate such an object to be that type. However the allocated object
cannot hold `union variant_union_tag' object, because its size is too
small, thus `->' should yield UB.
You raise a good point. Is there language in the standard that gives
a clear answer here? I looked but didn't find any.

I *think* it is 6.5.2.3 p.3 and p.4. Pay attention to the words:
"The value is that of the named member of the object to which the first
expression points,[...]". It means that the result of the "->" operator
is defined _only_ for objects that have that member (and undefined
otherwise), i.e. the object must have the appropriate effective type.

I read through (and re-read through) 6.5.2.3, and I don't think it
implies what you say about effective type. Consider

struct { int x; int y; int z; } *p;
p = malloc( sizeof *p );
if( !p ) bomb( "no memory" );
p->y = 0;

Surely the last assignment is legal. Yet, the object in the allocated
memory has no effective type, because it hasn't been stored into yet.
We can store and access 'p->y' all day, and never assign any of the
other fields of the struct (or the whole struct), and that's got to be
legal, no question. The struct as a whole, or members other than
p->y, might never have an effective type at all -- still we can assign
and access p->y.

I think my interpretation is in agreement with what Eric Sosman
described in his recent article (read the spoiler)
Subject: Re: pointer conversion
Message-ID: <[email protected]>
He gives an interesting analysis of what happened when he applied "->"
to a pointer to an object which did not have the specified member,
although that member was at the same position as the intended read.

I read through Eric's email, and it definitely was interesting.
However, there is a key difference here - he was accessing struct's of
different types that were not in a union. There is a special
guarantee in cases where the struct's *are* in a union, as explained
in 6.5.2.3 p5. I believe that if his struct's had been contained in a
union, that code would have been required to work. See Example 3 in
6.5.2.3 p8. The second part of the example seems to imply that if the
union type had been visible in f(), then it would be ok, *even though*
the accesses were being made through struct pointers rather than
through a union type.

I don't agree to "the same effect", of course (in context of this
discussion; see also my last remarks at the bottom). But I think
the second is valid.

I think it should have the same effect because of the rules about
pointers to structs being the same as pointers to their first
member (if not a bitfield) and pointers to unions being the same
as pointers to any member of the union.
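
The address rules this argument leans on can be checked on a fully sized, declared union, where nobody disputes them. The sketch below is only that; the types `tag_only`, `tag_and_chars`, `small_variant` and the function `addresses_coincide` are hypothetical names, and the check says nothing by itself about the undersized malloc() case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, small stand-ins for the variant structs in the thread. */
struct tag_only      { uint8_t kind; };
struct tag_and_chars { uint8_t kind; char c[16]; };

union small_variant {
    struct tag_only      tag;
    struct tag_and_chars chars;
};

/* Returns 1 if the union, each of its members, and each member's first
   field all start at the same address, 0 otherwise. */
int addresses_coincide(union small_variant *u)
{
    void *whole = u;
    void *m1 = &u->tag;
    void *m2 = &u->chars;
    void *k1 = &u->tag.kind;
    void *k2 = &u->chars.kind;

    return whole == m1 && whole == m2 && whole == k1 && whole == k2;
}
```

Called on any declared `union small_variant u;` as `addresses_coincide(&u)`, this should return 1 on a conforming implementation.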

(Actually, the second expression should have two casts:
*(uint8_t*)(struct variant_indicator_byte_struct*)v = 0;
For an explanation see the recent csc discussion on ptr conversions.)

I'm not sure if two casts are required or not; however, even if two
casts are required, we should be able to do this:

*(uint8_t*)(void*)v = 0;

I see nothing wrong with these, either.

Ok, that's good... continuing...
To be sure we understand each other:
It's this (the first):
^^^^
operator that is problematic.

I agree that the ^^^^-underlined operator is the one in question. The
"region of storage" that makes up the (union) object doesn't have an
effective type by virtue of its having been declared or stored into,
so the effective type "is simply the type of the lvalue used for the
access" (6.5 p6). So we are accessing a "union object", because of
how it's being accessed, even if the storage that's been allocated
isn't enough to hold all the bytes that might be in one of the members
of the union.

I'm probably not the right person to give definitive answers here.
I plan to ask a few questions about structs in csc in future,
after I do some deeper research myself first.

It's always good to get thoughtful, constructive discussion even if
the participants don't necessarily have all the answers. In this case
one of the key questions is, Are there definitive answers to be found?

By the way, I'm not sure my arguments here are right. They are just
the best arguments I've been able to find so far.

One question I would like to ask (concerning "the same effect" above)
is if setting subobjects (in dynamic objects) in a layout compatible
way with a struct is equivalent to initializing a struct object.
void *pv = malloc(enough_or_more);
struct s { int i; } *ps = pv;
int *pi = pv;
*pi = 42;
ps->i; // defined?
What is the effective type of the allocated object? Is it only `int',
or both `int' and `struct s'?

Assuming that 'sizeof *ps <= enough_or_more' actually holds, I would
say (1) 'ps->i;' is well defined, and (2) the effective type of the
allocated object is only 'int', not 'struct s'. My reasoning on (2)
is this: it's perfectly allowable for an implementation to add
padding at the end of a struct, and this padding is part of the object
(even though it's irrelevant in some sense). In practice the lack of
stored-into-effective-type doesn't matter, because any access to the
storage will be done through some lvalue, and the lvalue will supply
an effective type since the region of memory as a whole doesn't have
one.

To say this another way, struct objects that have been allocated with
malloc() never have an effective type of struct unless a structure
assignment has been done - normally the effective type comes from the
lvalue used to provide access.
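
A runnable version of the 'pv' example, as a sketch under the reading given above (the store through 'pi' gives the bytes an effective type of 'int', and the lvalue 'ps->i' also has type 'int'). The function name `write_int_read_member` and its out-parameter are inventions for the sketch, and the allocation is deliberately a full `sizeof(struct s)` so that the size question does not arise.

```c
#include <assert.h>
#include <stdlib.h>

struct s { int i; };

/* Stores value through an int pointer into freshly allocated storage,
   then reads it back through a struct member lvalue.
   Returns 0 on success (result in *out), -1 if allocation fails. */
int write_int_read_member(int value, int *out)
{
    void *pv = malloc(sizeof(struct s));   /* "enough_or_more" */
    if (pv == NULL)
        return -1;

    int *pi = pv;
    struct s *ps = pv;

    *pi = value;     /* the store is made through an lvalue of type int */
    *out = ps->i;    /* the read uses an lvalue that also has type int  */

    free(pv);
    return 0;
}
```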
 

S.Tobias

Tim Rentsch said:
S.Tobias said:
Tim Rentsch said:
Suppose we have a union type

union variant_union_tag {

struct {
uint8_t kind;
} variant_indicator_byte;

struct {
uint8_t kind;
char c[1024*1024];
} char_variant;

struct {
uint8_t kind;
short s[1024*1024];
} short_variant;

[...]
} int_variant;

[...]
} long_variant;

[...]
} long_long_variant;

};

...

union variant_union_tag *v = malloc( sizeof v->variant_indicator_byte );
if( ! v ) bail( "no memory" );
v->variant_indicator_byte.kind = 0;

I am not quite sure, but I think this might be wrong.

Applying `->variant_indicator_byte' operator to `v' implies that `v' points
to a `union variant_union_tag' object, or, in case of dynamic objects, you
designate such an object to be that type. However the allocated object
cannot hold `union variant_union_tag' object, because its size is too
small, thus `->' should yield UB.
You raise a good point. Is there language in the standard that gives
a clear answer here? I looked but didn't find any.

I *think* it is 6.5.2.3 p.3 and p.4. Pay attention to the words:
"The value is that of the named member of the object to which the first
expression points,[...]". It means that the result of the "->" operator
is defined _only_ for objects that have that member (and undefined
otherwise), i.e. the object must have the appropriate effective type.
I read through (and re-read through) 6.5.2.3, and I don't think it
implies what you say about effective type. Consider
struct { int x; int y; int z; } *p;
p = malloc( sizeof *p );
if( !p ) bomb( "no memory" );
p->y = 0;
Surely the last assignment is legal. Yet, the object in the allocated
memory has no effective type, because it hasn't been stored into yet.

You have stored an int value into the object, and that part acquires
that effective type. I think that application of `->y' gives the
whole object the effective struct type "in the background". I know
that the Standard does not specify this, but it must work this way.

Consider this example:

struct s { int x; int y; int z; };
struct z { int x; int y; int z; };
void f(struct s *ps, struct z *pz)
{
int temp;
/* 1 */ temp = ps->y;
/* 2 */ pz->y = 0;
return temp;
}

Could compiler reorder lines 1 and 2? Yes, because it may assume
that `ps' and `pz' point to different objects. If they pointed
to the same (allocated) object[*], then line 1 would give it
an effective type `struct s', and line 2 would apply `z::->y'
to `struct s' object (ie. it would try to find z::y member in
an object that doesn't have it) and would raise UB; thus
compiler's assumption is not invalidated.

[*]allocated objects are intended to have similar access semantics
as declared objects (I have read it somewhere)

If the compiler could not make such an assumption (eg. when writing
through '->' didn't give the whole object a struct type), then there
would be no issue with the function `f' in the EXAMPLE 3 in 6.5.2.3.
(My example above is essentially the same.) The problem with it
is that `f' doesn't see the union and it doesn't know that both
its arguments may point to the same object, therefore it assumes
that they don't, and may access their members in independent ways,
*although* their layout is the same, which `f' necessarily knows
about.
We can store and access 'p->y' all day, and never assign any of the
other fields of the struct (or the whole struct), and that's got to be
legal, no question. The struct as a whole, or members other than
p->y, might never have an effective type at all -- still we can assign
and access p->y.

Yes, but if the object didn't have "the struct type as a whole",
how could you read (access) `p->y'?
I read through Eric's email, and it definitely was interesting.
However, there is a key difference here - he was accessing struct's of
different types that were not in a union.

No, this was not the problem; re-read his analysis. The problem
was that he was trying to read type2::prefix from a type1 struct.
Compiler assumed that since type2::prefix was read, so the object
had to be type2 type, so the compiler decided it could prefetch
more bytes (up to type2 size). This is what caused the crash, and
not merely the presence of pointer with the wrong type (of course
the type of the pointer disambiguated the member).

One way to fix this would be to cast and replace:
void func2(struct type2 *ptr) {
/* check for proper type first! */
if (ptr->prefix.type == TYPE2) {
with:
[...]
if (((struct type1 *)ptr)->prefix.type == TYPE2) {
Here we refer to a different `prefix' (type1::prefix), and the compiler
cannot pull more bytes than are in `struct type1'.

[ The above "fix" is not correct from design POV, I just wanted
to show what would work. The Right fix would be to write:
if (((struct common *)ptr)->type == TYPE2) {
where the conversion (cast) is well defined (`struct common'
is the type of the first member of `type2'). ]

Of course, what Eric Sosman described was one way for UB to emerge.
But it clearly shows that the compiler inferred the size of the object
from the operand of the "->" operator. It doesn't matter if it's
a struct (Eric's case) or a union (your case); (the difference is
only whether the members overlap or not). If you apply "->" to
a union, the compiler could also infer the size from the operand
pointer type.

There is a special
guarantee in cases where the struct's *are* in a union, as explained
in 6.5.2.3 p5. I believe that if his struct's had been contained in a
union, that code would have been required to work. See Example 3 in
6.5.2.3 p8.

No. The compiler would still assume that what is on the left of "->"
points to an object of size(struct type2) bytes, or more (if in a union),
and would still pull up to size(struct type2) bytes.
The second part of the example seems to imply that if the
union type had been visible in f(), then it would be ok, *even though*
the accesses were being made through struct pointers rather than
through a union type.

No, it describes a different situation, when structs are members
of a union, and you write to one and read from another. I have
explained it already above.

I think it should have the same effect because of the rules about
pointers to structs being the same as pointers to their first
member (if not a bitfield) and pointers to unions being the same
as pointers to any member of the union.

I have explained this above. The problem is that `v' points
to less memory than is indicated by `->variant_indicator_byte'
operator.
I'm not sure if two casts are required or not; however, even if two
casts are required, we should be able to do this:
*(uint8_t*)(void*)v = 0;

This is a minor issue here, I just mentioned it. The problem is
that conversion from (union variant_union_tag*) to (uint8_t*)
is implementation defined (the union doesn't have uint8_t
member). Conversion to the type of one of its members is well
defined. To (char*) it is well defined too, and so it is to (void*)
(ahem... well... yes... have you read that c.s.c discussion?).


[snip]

Assuming that 'sizeof *ps <= enough_or_more' actually holds, I would
say (1) 'ps->i;' is well defined, and (2) the effective type of the
allocated object is only 'int', not 'struct s'. My reasoning on (2) [snip]

To say this another way, struct objects that have been allocated with
malloc() never have an effective type of struct unless a structure
assignment has been done - normally the effective type comes from the
lvalue used to provide access.

Maybe you're right. I don't know... it's hard for me to accept that
the same object could then be read through different types:
struct s1 { int i; } *ps1 = pv;
struct s2 { int i; } *ps2 = pv;
int (*pa)[] = pv;
ps1->i;
ps2->i;
(*pa)[0];
but maybe it can, until it is assigned a whole `struct s' type.
 

Tim Rentsch

S.Tobias said:
Tim Rentsch said:
S.Tobias said:
Suppose we have a union type

union variant_union_tag {

struct {
uint8_t kind;
} variant_indicator_byte;

struct {
uint8_t kind;
char c[1024*1024];
} char_variant;

struct {
uint8_t kind;
short s[1024*1024];
} short_variant;

[...]
} int_variant;

[...]
} long_variant;

[...]
} long_long_variant;

};

...

union variant_union_tag *v = malloc( sizeof v->variant_indicator_byte );
if( ! v ) bail( "no memory" );
v->variant_indicator_byte.kind = 0;

I am not quite sure, but I think this might be wrong.

Applying `->variant_indicator_byte' operator to `v' implies that `v' points
to a `union variant_union_tag' object, or, in case of dynamic objects, you
designate such an object to be that type. However the allocated object
cannot hold `union variant_union_tag' object, because its size is too
small, thus `->' should yield UB.

You raise a good point. Is there language in the standard that gives
a clear answer here? I looked but didn't find any.

I *think* it is 6.5.2.3 p.3 and p.4. Pay attention to the words:
"The value is that of the named member of the object to which the first
expression points,[...]". It means that the result of the "->" operator
is defined _only_ for objects that have that member (and undefined
otherwise), i.e. the object must have the appropriate effective type.
I read through (and re-read through) 6.5.2.3, and I don't think it
implies what you say about effective type. Consider
struct { int x; int y; int z; } *p;
p = malloc( sizeof *p );
if( !p ) bomb( "no memory" );
p->y = 0;
Surely the last assignment is legal. Yet, the object in the allocated
memory has no effective type, because it hasn't been stored into yet.

You have stored an int value into the object, and that part acquires
that effective type. I think that application of `->y' gives the
whole object the effective struct type "in the background". I know
that the Standard does not specify this, but it must work this way.

The problem with this interpretation is that it just doesn't match the
definition of effective type. The notions of "type" and "effective
type" are distinct notions: objects can have a type without having an
effective type. See 6.3.2.1 p1 for a definition of "type of an object"
and 6.5 p6 for a definition of "effective type"; I think you'll see
what I mean.

Consider this example:

struct s { int x; int y; int z; };
struct z { int x; int y; int z; };
void f(struct s *ps, struct z *pz)
{
int temp;
/* 1 */ temp = ps->y;
/* 2 */ pz->y = 0;
return temp;
}

Could compiler reorder lines 1 and 2? Yes, because it may assume
that `ps' and `pz' point to different objects. If they pointed
to the same (allocated) object[*], then line 1 would give it
an effective type `struct s', and line 2 would apply `z::->y'
to `struct s' object (ie. it would try to find z::y member in
an object that doesn't have it) and would raise UB; thus
compiler's assumption is not invalidated.

[*]allocated objects are intended to have similar access semantics
as declared objects (I have read it somewhere)

I agree that reordering lines 1 and 2 would be allowed in this case.
(Of course you meant 'int f( ...' rather than 'void f( ...' as was
written.) But what happens if we add a union?

struct s { int x; int y; int z; };
struct z { int x; int y; int z; };
union u { struct s s; struct z z; };
int f(struct s *ps, struct z *pz){
int temp;
temp = ps->y; /* 1 */
pz->y = 0; /* 2 */
return temp;
}

int call_f(){
union u u;
static struct z z_ones = {1,1,1};
u.z = z_ones;
return f( &u.s, &u.z );
}

Now the compiler would not be allowed to reorder lines 1 and 2. The
reason is that the variable 'ps' might be pointing to what is actually
a 'union u' object that last had a 'struct z' stored in the 'z' member
(as shown by the 'call_f' function body), in which case the code is
permitted (by 6.5.2.3 p5) to look at 'ps->y' even though the effective
type of the memory at *ps is not 'struct s'. You see what I mean?

If the compiler could not make such an assumption (eg. when writing
through '->' didn't give the whole object a struct type), then there
would be no issue with the function `f' in the EXAMPLE 3 in 6.5.2.3.
(My example above is essentially the same.) The problem with it
is that `f' doesn't see the union and it doesn't know that both
its arguments may point to the same object, therefore it assumes
that they don't, and may access their members in independent ways,
*although* their layout is the same, which `f' necessarily knows
about.

I think we are in agreement here - what the compiler can assume in
the absence of the union definition and what the compiler can assume
in the presence of the union definition are different.

Yes, but if the object didn't have "the struct type as a whole",
how could you read (access) `p->y'?

Referring to your example (or the modified one with the union):

the type of 'ps' in 'ps->y' is 'struct s *'
the type of 'pz' in 'pz->y' is 'struct z *'

the type of 'y' in 'ps->y' is 'int'
the type of 'y' in 'pz->y' is 'int'

the effective type of 'ps->y' is 'int' (because an int had been stored)
the effective type of 'pz->y' is 'int'

the subexpression 'ps' in 'ps->y' has no effective type
the subexpression 'pz' in 'pz->y' has no effective type
(the reason in both cases is that these subexpressions
aren't used to access any object)

Expressions (and subexpressions) have type, but not necessarily
effective type; effective type is relevant only when accessing
objects.

No, this was not the problem; re-read his analysis. The problem
was that he was trying to read type2::prefix from a type1 struct.
Compiler assumed that since type2::prefix was read, so the object
had to be type2 type, so the compiler decided it could prefetch
more bytes (up to type2 size). This is what caused the crash, and
not merely the presence of pointer with the wrong type (of course
the type of the pointer disambiguated the member).

I have to concede on this one. My comment is right as far as it goes,
but yours is righter. What I was thinking was that his problem could
have been solved by bundling the structs up in a union. Maybe that
would actually have worked, but I (now) believe that technically a
cast along the lines of what he wrote in his email would still be
needed. (None of that changes my other comments though.)

One way to fix this would be to cast and replace:
void func2(struct type2 *ptr) {
/* check for proper type first! */
if (ptr->prefix.type == TYPE2) {
with:
[...]
if (((struct type1 *)ptr)->prefix.type == TYPE2) {
Here we refer to a different `prefix' (type1::prefix), and the compiler
cannot pull more bytes than are in `struct type1'.

Actually I think this fix isn't guaranteed to work. Because the
parameter is declared to be a 'struct type2 *', the compiler is
still allowed to hoist accesses to fields of that struct up
above the 'if' - just because the parameter is being casted doesn't
change the type as far as the compiler is concerned. Doing it the
other way:

void func2( struct type1 *ptr ){
if( ptr->prefix.type == TYPE2 ){
struct type2 *ptr2 = (struct type2 *)ptr;
... use ptr2 ...
}
}

The code below the 'if' can't be hoisted because the cast may end up
producing different behavior if it is done or if it isn't.
(Technically, I think the cast and subsequent accesses could also
be hoisted above the 'if' if the compiler knew that they couldn't
cause problems, eg, if all memory accesses were legal. But in that
case there wouldn't be any errant behavior.)

[ The above "fix" is not correct from design POV, I just wanted
to show what would work. The Right fix would be to write:
if (((struct common *)ptr)->type == TYPE2) {
where the conversion (cast) is well defined (`struct common'
is the type of the first member of `type2'). ]

Same comment as above. The cast needs to be in the body of the
'if', not in the 'if' expression.
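
Putting the two suggestions together (take a pointer to the common prefix, and cast only inside the body of the 'if'), a self-contained sketch might look like the following. The struct layouts, the `TYPE1`/`TYPE2` values, the payload array and the function `last_payload` are all assumptions for illustration; Eric's original code is not reproduced here.

```c
#include <assert.h>

enum { TYPE1 = 1, TYPE2 = 2 };          /* hypothetical tag values */

struct common { int type; };
struct type1  { struct common prefix; };
struct type2  { struct common prefix; int payload[64]; };

/* Inspects the tag through the shared prefix type only.  The pointer is
   converted to struct type2 * after the tag check has succeeded, so no
   struct type2 member is named before we know the object is one.
   Returns 1 and stores the last payload element in *out for a TYPE2
   object, otherwise returns 0 and leaves *out untouched. */
int last_payload(struct common *hdr, int *out)
{
    if (hdr->type == TYPE2) {
        struct type2 *p2 = (struct type2 *)hdr;  /* prefix is the first member */
        *out = p2->payload[63];
        return 1;
    }
    return 0;
}
```

Callers pass the address of the prefix member, for example `last_payload(&obj.prefix, &value)`, so a small `struct type1` object is never described to the compiler as a `struct type2`.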

Of course, what Eric Sosman described was one way for UB to emerge.
But it clearly shows that the compiler inferred the size of the object
from the operand of the "->" operator. It doesn't matter if it's
a struct (Eric's case) or a union (your case); (the difference is
only whether the members overlap or not). If you apply "->" to
a union, the compiler could also infer the size from the operand
pointer type.

Right; the presence of a union type, or even having the parameter
be a pointer to union type, is not enough by itself to guarantee
that the code in Eric's example would work.

There is a special
guarantee in cases where the struct's *are* in a union, as explained
in 6.5.2.3 p5. [...more stuff about the previous example snipped...]
The second part [of example 6.5.2.3 p8] seems to imply that if the
union type had been visible in f(), then it would be ok, *even though*
the accesses were being made through struct pointers rather than
through a union type.

No, it describes a different situation, when structs are members
of a union, and you write to one and read from another. I have
explained it already above.

Right, the situation in Eric's email and the situation in the second
part of 6.5.2.3 p8 are different. I stand by my assertion that if the
union type had been visible in f() then f() would be required to work
(using reasoning similar to the reasoning given for the 'call_f()'
code example).

I have explained this above. The problem is that `v' points
to less memory than is indicated by `->variant_indicator_byte'
operator.

Certainly 'v' points to less memory than one would expect that the
union type would need (assuming malloc() doesn't round up to 1GB or
something ridiculous like that). But I don't see any clear evidence
that that can make a difference in distinguishing these two cases.

Also, just to be sure we're on the same page - what you're meaning to
say is that the available memory at '*v' isn't as big as 'sizeof(*v)',
not that the available memory at 'v->variant_indicator_byte' isn't as
big as 'sizeof(v->variant_indicator_byte)' -- right?

However... looking at the bigger picture, code like this:

int f( union variant_union_tag *v ){
if( v->variant_indicator_byte.kind == SHORT_VARIANT ){
return v->short_variant.s[ 2000 ];
}
return 0;
}

might very well fail in the same way that Eric's example failed. The
question is, is that because undefined behavior is being evoked, or is
it because the implementation has interpreted the standard wrongly? I
still haven't found language in the standard that answers this
question clearly and unambiguously.
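
Until that question is settled, one conservative way to get the same space saving is to avoid naming the big union type at all: allocate exactly the struct that matches the kind, and read the tag through a `uint8_t` pointer as in the casted assignment discussed earlier. This is a sketch of that alternative, not the code under discussion; `KIND_NONE`, `KIND_SHORT`, `SHORT_COUNT`, `variant_new` and `variant_get_short` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { KIND_NONE = 0, KIND_SHORT = 2 };   /* hypothetical tag values */
#define SHORT_COUNT 4096

struct kind_only     { uint8_t kind; };
struct short_payload { uint8_t kind; short s[SHORT_COUNT]; };

/* Allocates zeroed storage sized for the requested kind.  Any kind
   other than KIND_SHORT yields a tag-only record marked KIND_NONE.
   Returns NULL on allocation failure. */
void *variant_new(uint8_t kind)
{
    if (kind == KIND_SHORT) {
        struct short_payload *p = calloc(1, sizeof *p);
        if (p != NULL)
            p->kind = KIND_SHORT;
        return p;
    }
    struct kind_only *k = calloc(1, sizeof *k);
    if (k != NULL)
        k->kind = KIND_NONE;
    return k;
}

/* Reads element idx of a short variant.  The tag is read through a
   uint8_t pointer, and the larger struct type is named only after the
   tag says the storage is that large.  Returns 1 on success, 0 if the
   record is not a short variant or idx is out of range. */
int variant_get_short(const void *v, size_t idx, short *out)
{
    uint8_t kind = *(const uint8_t *)v;
    if (kind != KIND_SHORT || idx >= SHORT_COUNT)
        return 0;
    *out = ((const struct short_payload *)v)->s[idx];
    return 1;
}
```

The cost is that the caller carries a `void *` rather than a pointer to the union, so the compiler can no longer check which variant is meant.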

This is a minor issue here, I just mentioned it. The problem is
that conversion from (union variant_union_tag*) to (uint8_t*)
is implementation defined (the union doesn't have uint8_t
member). Conversion to the type of one of its members is well
defined. To (char*) it is well defined too, and so it is to (void*)
(ahem... well... yes... have you read that c.s.c discussion?).

I admit I'm going by what I consider the spirit of the language in
section 6.7.2.1 p13,p14. Even if the language is a little off, I
believe it's clear what's intended to be guaranteed about pointer
conversion: namely, that pointers to a struct can be converted to
pointers to the first member of the struct and pointers to unions can
be converted to pointers to any of the union's members; these holding
recursively; and also holding vice versa when the pointer being
converted is pointing to an object that actually is in a struct or a
union. I've seen some of the discussion in comp.std.c; I think that
discussion is more about what the language in the standard should be
than it is about how the rules should be interpreted.
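
The conversions described above can be sketched as follows. This is only an illustration under the reading given in that paragraph (struct to first member, union to any member, applied recursively). The type and function names are hypothetical, and the union is a small stand-in for the one in the thread.

```c
#include <stdint.h>

struct indicator { uint8_t kind; };
struct char_variant { uint8_t kind; char c[16]; };
union variant {
    struct indicator ind;
    struct char_variant cv;
};

/* 1 if a pointer to the union equals a pointer to each of its members. */
int union_aliases_members(union variant *v)
{
    return (void *)v == (void *)&v->ind && (void *)v == (void *)&v->cv;
}

/* 1 if a pointer to the struct equals a pointer to its first member. */
int struct_aliases_first_member(struct char_variant *p)
{
    return (void *)p == (void *)&p->kind;
}

/* Applies both steps in turn: union -> member struct -> first member. */
uint8_t kind_via_nested_conversion(union variant *v)
{
    struct indicator *pi = (struct indicator *)v; /* union to member */
    uint8_t *pk = (uint8_t *)pi;                  /* struct to first member */
    return *pk;
}
```

Here the union object is complete, so the sketch says nothing about the undersized allocation; it only shows the pointer equivalences being relied on.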

[snip]

Assuming that 'sizeof *ps <= enough_or_more' actually holds, I would
say (1) 'ps->i;' is well defined, and (2) the effective type of the
allocated object is only 'int', not 'struct s'. My reasoning on (2) [snip]

To say this another way, struct objects that have been allocated with
malloc() never have an effective type of struct unless a structure
assignment has been done - normally the effective type comes from the
lvalue used to provide access.
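
A minimal sketch of that reading, assuming a one-member 'struct s' (this is not the snipped code; the declaration and the function name are hypothetical). The store goes through an lvalue of type int, so under the interpretation above the stored bytes take effective type int and can be read back through a plain int pointer.

```c
#include <stdlib.h>

struct s { int i; };   /* hypothetical; not the snipped declaration */

/* Stores through ps->i, then reads the same bytes through an int pointer.
   Returns the value read, or -1 if the allocation fails. */
int store_member_read_int(int value)
{
    struct s *ps = malloc(sizeof *ps);
    int result;

    if (ps == NULL)
        return -1;

    ps->i = value;          /* the lvalue ps->i has type int */
    result = *(int *)ps;    /* pointer to struct converted to first member */
    free(ps);
    return result;
}
```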

Maybe you're right. I don't know... it's hard for me to accept that
the same object could then be read through different types:
struct s1 { int i; } *ps1 = pv;
struct s2 { int i; } *ps2 = pv;
int (*pa)[] = pv;
ps1->i;
ps2->i;
(*pa)[0];
but maybe it can, until it is assigned a whole `struct s' type.

The notions of type and of effective type are different. Any
expression that *designates* an object has a type; effective type is
relevant only for expressions that *access* an object. The point of
6.5 p7 is to impose constraints on the relationship between the type
of an expression that designates an object and the effective type of
an access to an object.

Again you've brought up some good points. I still don't see any
language in the standard that gives a clear answer to the high level
question. I'm pretty sure though that the rules about effective
type don't imply any undefined behavior here.
 
S

S.Tobias

Tim Rentsch said:
S.Tobias said:
Tim Rentsch said:
I *think* it is 6.5.2.3 p.3 and p.4. Pay attention to the words:
"The value is that of the named member of the object to which the first
expression points,[...]". It means that the result of the "->" operator
is defined _only_ for objects that have that member (and undefined
otherwise), ie. the object must be of the appropriate effective type.
I read through (and re-read through) 6.5.2.3, and I don't think it
implies what you say about effective type. Consider
struct { int x; int y; int z; } *p;
p = malloc( sizeof *p );
if( !p ) bomb( "no memory" );
p->y = 0;
Surely the last assignment is legal. Yet, the object in the allocated
memory has no effective type, because it hasn't been stored into yet.

You have stored an int value into the object, and that part acquires
that effective type. I think that application of `->y' gives the
whole object the effective struct type "in the background". I know
that the Standard does not specify this, but it must work this way.
The problem with this interpretation is that it just doesn't match the
definition of effective type. The notions of "type" and "effective
type" are distinct notions: objects can have a type without having an
effective type. See 6.3.2.1 p1 for a definition of "type of an object"
and 6.5 p6 for a definition of "effective type"; I think you'll see
what I mean.

I don't quite get what you mean to say; "effective type" is the declared
type, or last stored type. You cannot access an object with any lvalue
you want, so I'd say that the definition in 6.3.2.1 is subjected to
the restrictions in 6.5p7.


I just want to add one more argument, to show that my supposition
is not completely unfounded:
struct sc { short s; char c; } s;
/* layout: ss--c--- */
memset(&s, 0, sizeof s); /* 00000000 */
void *p = &s;
*(short*)p = 42; /* SS000000 */
((struct sc*)p)->s = 42; /* SS??0??? */ // cf. 6.2.6.1p6
Writing into one struct member causes the whole struct to be accessed
(padding bytes take unspecified values). This is significantly
different from "injecting" values directly into parts where the members
are expected to be found.

Consider this example:

struct s { int x; int y; int z; };
struct z { int x; int y; int z; };
void f(struct s *ps, struct z *pz)
{
int temp;
/* 1 */ temp = ps->y;
/* 2 */ pz->y = 0;
return temp;
}

Could the compiler reorder lines 1 and 2? Yes, because it may assume
that `ps' and `pz' point to different objects. If they pointed
to the same (allocated) object[*], then line 1 would give it
an effective type `struct s', and line 2 would apply `z::->y'
to `struct s' object (ie. it would try to find z::y member in
an object that doesn't have it) and would raise UB; thus the
compiler's assumption is not invalidated.

[*]allocated objects are intended to have similar access semantics
as declared objects (I have read it somewhere)

I'm aware that my interpretation is not a strong one. The Std does not
actually say anywhere that objects may have members, ie. IMHO it is
ambiguous if in 6.5.2.3p3,4 by "member of the object" it means just
a value that is supposed to be found at that position, or something
more than that. I assume the latter, based on various hints, that
I'm trying to present. My point is that in an expression "p->m"
what counts for an access is not only the type of the whole expression,
but also the fact that a member is accessed. I think that if the Std
speaks about a member of an object, it means that the surrounding bytes
are presumed to be part of a struct/union (of a given effective
type) that holds that member. This is my whole point, which I'm trying
to prove.

I hope you don't believe me completely, because I don't, myself. ;-)
Otherwise I wouldn't plan to ask these questions in csc, would I?

Even if my interpretation is not quite correct, I think it agrees
with the responses in various DRs (take for an example DR236, and
see the line "union type must be used when changing effective type"
by the end - it seems to say it's not enough to "inject" a value;
I myself don't quite understand that explanation). I have found
related issues have had some discussions already:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_257.htm
(and links therein).

I agree that reordering lines 1 and 2 would be allowed in this case.
(Of course you meant 'int f( ...' rather than 'void f( ...' as was
written.) But what happens if we add a union?
struct s { int x; int y; int z; };
struct z { int x; int y; int z; };
union u { struct s s; struct z z; };
int f(struct s *ps, struct z *pz){
int temp;
temp = ps->y; /* 1 */
pz->y = 0; /* 2 */
return temp;
}
int call_f(){
union u u;
static struct z z_ones = {1,1,1};
u.z = z_ones;
return f( &u.s, &u.z );
}
Now the compiler would not be allowed to reorder lines 1 and 2. The
reason is that the variable 'ps' might be pointing to what is actually
a 'union u' object that last had a 'struct z' stored in the 'z' member
(as shown by the 'call_f' function body), in which case the code is
permitted (by 6.5.2.3 p5) to look at 'ps->y' even though the effective
type of the memory at *ps is not 'struct s'. You see what I mean?

Yes, now I see... Hmm... :-|
All I can say is that maybe the said exception is a double one:
it is both for union access, and for structs access.


[snip]

One way to fix this would be to cast and replace:
void func2(struct type2 *ptr) {
/* check for proper type first! */
if (ptr->prefix.type == TYPE2) {
with:
[...]
if (((struct type1 *)ptr)->prefix.type == TYPE2) {
Here we refer to a different `prefix' (type1::prefix), and the compiler
cannot pull more bytes than are in `struct type1'.
Actually I think this fix isn't guaranteed to work. Because the
parameter is declared to be a 'struct type2 *', the compiler is
still allowed to hoist accesses to fields of that struct up
above the 'if' - just because the parameter is being cast doesn't
change the type as far as the compiler is concerned.

I don't understand the reasons why you say this. I think that
it doesn't matter what type the pointer is; you could create hundreds
of different pointers, the compiler is not allowed to access the
object they point to (consider malloc(0)). What counts here is
that a member access is being performed, and what matters is
the type of the _expression_ which is the operand to "->" operator.


[snip]

Certainly 'v' points to less memory than one would expect that the
union type would need (assuming malloc() doesn't round up to 1GB or
something ridiculous like that). But I don't see any clear evidence
that that can make a difference in distinguishing these two cases.
Also, just to be sure we're on the same page - what you're meaning to
say is that the available memory at '*v' isn't as big as 'sizeof(*v)',
not that the available memory at 'v->variant_indicator_byte' isn't as
big as 'sizeof(v->variant_indicator_byte)' -- right?

Yes, precisely!
However... looking at the bigger picture, code like this:
int f( union variant_union_tag *v ){
if( v->variant_indicator_byte.kind == SHORT_VARIANT ){
return v->short_variant.s[ 2000 ];
}
return 0;
}
might very well fail in the same way that Eric's example failed.

Yes, that's the point.

Reading of a member is not the same as writing to it, but I believe
it is possible to assume some equivalence, ie. that reading
a member conceptually accesses the whole containing object.
The
question is, is that because undefined behavior is being invoked, or is
it because the implementation has interpreted the standard wrongly? I
still haven't found language in the standard that answers this
question clearly and unambiguously.

Agreed. :-|