"free space" with declared type

S.Tobias

Can an object with a _declared type_ serve as free space?

union
{
char space[ENOUGH];
maxalign_t unused;
} _free;

void *my_malloc(size_t size)
{
/*...*/
return _free.space;
}

int main()
{
int *pi = my_malloc(sizeof *pi);
if (pi)
*pi = 7;
}

If I'm not mistaken, the above code should invoke UB, because we
access an object declared as type char[] with an lvalue of type int.
Can it cause problems in some environments?
Is there a better alternative (that doesn't use malloc)? (I'm thinking
of something similar in idea to "Suba" module in libmba.)
 
Eric Sosman

S.Tobias said:
Can an object with a _declared type_ serve as free space?

Yes; the hard part is figuring out what the
declaration should be.
union
{
char space[ENOUGH];
maxalign_t unused;
} _free;

Nitpick: `_free' is an identifier reserved to
the implementation.
void *my_malloc(size_t size)
{
/*...*/
return _free.space;
}

int main()
{
int *pi = my_malloc(sizeof *pi);
if (pi)
*pi = 7;
}

If I'm not mistaken, the above code should invoke UB, because we
access an object declared as type char[] with an lvalue of type int.
Can it cause problems in some environments?

No U.B. that I can see, provided `maxalign_t' is
aligned strictly enough for an `int'. Data objects are
defined as sequences of bytes (there's special language
for bit-fields), and accessing the individual bytes of a
multi-byte object is permitted. Note that memcpy() would
make no sense if this were not so; neither would, say,
fread() and fwrite() when applied to a multi-byte object.
Is there a better alternative (that doesn't use malloc)? (I'm thinking
of something similar in idea to "Suba" module in libmba.)

"Is there a better alternative?" Well, what problem
are you trying to solve? For some problems, the clearly
superior alternative is exit(0) ;-)
 
S.Tobias

Eric Sosman said:
S.Tobias wrote:
Yes; the hard part is figuring out what the
declaration should be.

What do you mean? I thought that any non-const object type would be
good; it's just that "accidentally" char seemed most convenient.
Am I wrong?
Nitpick: `_free' is an identifier reserved to
the implementation.

Thanks, I've changed it to "my_free" now.
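union
{
char space[ENOUGH];
maxalign_t unused;
} my_free;

void *my_malloc(size_t size)
{
/*...*/
return my_free.space;
}

int main()
{
int *pi = my_malloc(sizeof *pi);
if (pi) {
*pi = 7;
*pi;
}
}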

If I'm not mistaken, the above code should invoke UB, because we
access an object declared as type char[] with an lvalue of type int.
Can it cause problems in some environments?
No U.B. that I can see, provided `maxalign_t' is
aligned strictly enough for an `int'. Data objects are
defined as sequences of bytes (there's special language
for bit-fields), and accessing the individual bytes of a
multi-byte object is permitted.

I thought that accessing (I have added a line for that) value
of object(s) of type char with an lvalue of type int was UB.
I'm referring here to 6.5 #6 and #7. my_free.space has a declared
type, and it cannot be changed by assignment through an lvalue with
different type.
Note that memcpy() would
make no sense if this were not so; neither would, say,
fread() and fwrite() when applied to a multi-byte object.

I'm not quite getting what you mean here. I think memcpy() (and others)
is not a good example here, because in theory it copies (accesses)
data by bytes (unsigned char), which is explicitly allowed.
Well, what problem
are you trying to solve?

The problem of UB, of course, which I think takes place there.
 
Jack Klein

Eric Sosman said:
S.Tobias wrote:
Yes; the hard part is figuring out what the
declaration should be.

What do you mean? I thought that any non-const object type would be
good; it's just that "accidentally" char seemed most convenient.
Am I wrong?
Nitpick: `_free' is an identifier reserved to
the implementation.

Thanks, I've changed it to "my_free" now.
union
{
char space[ENOUGH];
maxalign_t unused;
} my_free;

void *my_malloc(size_t size)
{
/*...*/
return my_free.space;
}

int main()
{
int *pi = my_malloc(sizeof *pi);
if (pi) {
*pi = 7;
*pi;
}
}

If I'm not mistaken, the above code should invoke UB, because we
access an object declared as type char[] with an lvalue of type int.
Can it cause problems in some environments?
No U.B. that I can see, provided `maxalign_t' is
aligned strictly enough for an `int'. Data objects are
defined as sequences of bytes (there's special language
for bit-fields), and accessing the individual bytes of a
multi-byte object is permitted.

I thought that accessing (I have added a line for that) value
of object(s) of type char with an lvalue of type int was UB.
I'm referring here to 6.5 #6 and #7. my_free.space has a declared
type, and it cannot be changed by assignment through an lvalue with
different type.

But an array of char is an array of bytes. I'm not going to bother
itemizing the reasoning beyond that, I assume you either already agree
with it or will take my word for it.

So what exactly is the difference between an uninitialized array of
ENOUGH chars and the pointer returned by a successful call to
malloc(ENOUGH)?

There is one and only one possible difference. The block of ENOUGH
chars returned by malloc() is guaranteed to be suitably aligned for
any object type; an arbitrary array of chars is not. Anything you can
store in the block returned by malloc() you can store in the arbitrary
array of characters, as long as the array is properly aligned for the
type. And you are assuming that your 'maxalign_t'

In C, memory is memory is memory, or to use the word the standard does
in place of memory, storage is storage is storage. Provided that
alignment requirements are met and const or volatile qualifiers are
not violated, all storage is the same.
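
The thread never shows how `maxalign_t` is defined, so the following is only a minimal sketch of the alignment part of the argument, under stated assumptions. The `maxalign_t` union below is a hypothetical definition (a union of types likely to have the strictest alignment, with no guarantee that it covers every type on every implementation), `ENOUGH` is given an arbitrary value, and `int_probe` and `space_is_aligned_for_int` are names invented for illustration. It checks alignment only and says nothing about the effective-type question debated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENOUGH 256  /* arbitrary size chosen for this sketch */

/* Hypothetical definition: a union of types likely to carry the
   strictest alignment. Not guaranteed to cover every type. */
typedef union {
    long long   ll;
    long double ld;
    void       *p;
    void      (*fp)(void);
} maxalign_t;

static union {
    char       space[ENOUGH];
    maxalign_t unused;   /* present only to force alignment */
} my_free;

/* Pre-C11 trick: the offset of 'i' is a multiple of int's alignment
   (on typical implementations it is exactly that alignment). */
struct int_probe { char c; int i; };

/* Returns 1 if my_free.space starts at an address suitable for int.
   Converting a pointer to uintptr_t is implementation-defined; on
   common platforms the result is the numeric address. */
int space_is_aligned_for_int(void)
{
    size_t int_align = offsetof(struct int_probe, i);
    assert(int_align != 0);
    return (uintptr_t)(void *)my_free.space % int_align == 0;
}
```

With a C11 compiler, `max_align_t` from `<stddef.h>` and `_Alignof` would replace both the hand-written union and the `offsetof` trick.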

my_free.space does indeed have a declared type, and is an object. But
objects do not actually have types, lvalues have types. Note the
definition of object in 3.14, "region of data storage in the execution
environment, the contents of which can represent values".

Once you actually store an object into memory using an lvalue of
integer type, it contains an integer value. Strictly speaking it no
longer contains char values, if char is signed on an implementation,
for one of the bytes of an arbitrary int value might be a trap
representation for a signed char.

Accessing an object of type array of chars with an lvalue of type int
would cause undefined behavior if that object actually contained char
values. It does not do so if the object actually contains an int
value.

Again, other than alignment, if your maxalign_t type is not correct,
there is no difference at all between your array of chars and the
block of bytes returned by malloc(). It is only by storing values
into objects, which must be done through an lvalue of one type or another,
that the contents take on a value of that type.

[snip]
I'm not quite getting what you mean here. I think memcpy() (and others)
is not a good example here, because in theory it copies (accesses)
data by bytes (unsigned char), which is explicitly allowed.

Actually, in theory, memcpy() and others do their work by 'magic', and
library functions are not bound by the rules that programs are.
The problem of UB, of course, which I think takes place there.

No UB here, no more than doing the same thing with the pointer
returned by a successful call to malloc().
 
S.Tobias

Jack Klein said:
Eric Sosman said:
S.Tobias wrote:
union
{
char space[ENOUGH];
maxalign_t unused;
} my_free;

void *my_malloc(size_t size)
{
/*...*/
return my_free.space;
}

int main()
{
int *pi = my_malloc(sizeof *pi);
if (pi) {
*pi = 7;
*pi;
}
}

If I'm not mistaken, the above code should invoke UB, because we
access an object declared as type char[] with an lvalue of type int.
Can it cause problems in some environments?
No U.B. that I can see, provided `maxalign_t' is
aligned strictly enough for an `int'. Data objects are
defined as sequences of bytes (there's special language
for bit-fields), and accessing the individual bytes of a
multi-byte object is permitted.

I thought that accessing (I have added a line for that) value
of object(s) of type char with an lvalue of type int was UB.
I'm referring here to 6.5 #6 and #7. my_free.space has a declared
type, and it cannot be changed by assignment through an lvalue with
different type.
But an array of char is an array of bytes. I'm not going to bother
itemizing the reasoning beyond that, I assume you either already agree
with it or will take my word for it.
Yes.

So what exactly is the difference between an uninitialized array of
ENOUGH chars and the pointer returned by a successful call to
malloc(ENOUGH)?

The difference is pretty abstract, but I think this is the crux
of the disagreement and of the problem.

6.5#6 differentiates between objects which have or have not
declared type (footnote 72 details that allocated objects have
no declared type). In context of value access, the /effective type/
of an object *with* declared type is the declared type, always.
Objects *without* declared type (from malloc()) acquire the effective
type by storing a value with a type (except character type; the
effective type is also transmitted through memcpy()).
This is the difference.

6.5#7 details rules for accessing value in an object, and is expressed
in terms of effective type of an object.

Summary:
- allocated object acquires and keeps the effective type, a declared
object does not;
- reading or writing a value is allowed through an lvalue with
*compatible* type (or qualified, or un/signed), or character type,
or through aggregate or union type, which contains the type.
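
For comparison, the allocated-object half of the summary above might be sketched as follows. This is only an illustration of the uncontroversial case, not code from the thread; `store_in_allocated` is a hypothetical name. The storage comes from malloc(), so it has no declared type, and the store through an `int` lvalue is what gives it effective type `int`.

```c
#include <assert.h>
#include <stdlib.h>

/* Stores 'value' into freshly allocated storage and reads it back
   into *out. Returns 1 on success, 0 if allocation fails. */
int store_in_allocated(int value, int *out)
{
    assert(out != NULL);

    int *pi = malloc(sizeof *pi);   /* no declared type, no effective type yet */
    if (pi == NULL)
        return 0;

    *pi = value;    /* the store gives the object effective type int */
    *out = *pi;     /* read through a compatible lvalue: permitted by 6.5p7 */

    free(pi);
    return 1;
}
```

The disputed case in this thread is the same pair of accesses applied to storage whose declared type is char[].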

[snipped alignment provisions]

[snipped storage quals provisions]
my_free.space does indeed have a declared type, and is an object. But
objects do not actually have types, lvalues have types. Note the
definition of object in 3.14, "region of data storage in the execution
environment, the contents of which can represent values".

Yes, but the Standard in certain contexts associates *effective types*
with objects.
Once you actually store an object into memory using an lvalue of
integer type, it contains an integer value. Strictly speaking it no
longer contains char values,

This is the point where I don't agree. This is true for allocated
objects. But an array with a declared type keeps its effective
type, no matter what value you store in it or how (the very attempt
to store such value through an incompatible type lvalue is already UB).

What's more, although clearly it's not so in the above code, I think
the compiler in general may assume that `pi' does not point
into the array `my_free.space' at all.

I have given all my arguments a few lines above. I don't claim my
understanding is better; of course, I might be in error. Could you
please give your arguments, best with pointers to the appropriate places
in the C Standard. Thanks.

[snip]


+++++

Actually, in theory, memcpy() and others do their work by 'magic', and
library functions are not bound by the rules that programs are.

I agree; that's why I wrote "in theory" - 'magic' is "in practice".

In c.s.c. discussion "access via character type" Douglas A. Gwyn
wrote:
See DR #274, which is reflected in TC#2,
for a related change we made to the wording of
the standard to better reflect our intent in
connection with the mem*() functions.

And the corresponding quote from TC2 is:
# 73. Page 324, 7.21.1
# Add a new paragraph 3:
# For all functions in this subclause, each character shall be
# interpreted as if it had the type unsigned char (and therefore every
# possible object representation is valid and has a different value).
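
The byte-wise access that everyone here agrees is allowed can be sketched in ordinary C. This is a hedged illustration, not how any real library implements memcpy(); `copy_bytes` is a hypothetical name. It accesses both objects only through `unsigned char` lvalues, which 6.5p7 permits for any object.

```c
#include <assert.h>
#include <stddef.h>

/* Copies n bytes from src to dst, one unsigned char at a time.
   The two regions must not overlap. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];   /* character-type access: always permitted */
}
```

Copying the full object representation of one `int` into another `int` this way leaves the destination holding the same value.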
 
Jack Klein

Jack Klein said:
S.Tobias wrote:
union
{
char space[ENOUGH];
maxalign_t unused;
} my_free;


void *my_malloc(size_t size)
{
/*...*/
return my_free.space;
}

int main()
{
int *pi = my_malloc(sizeof *pi);
if (pi)
{
*pi = 7;
*pi;
}
}

If I'm not mistaken, the above code should invoke UB, because we
access an object declared as type char[] with an lvalue of type int.
Can it cause problems in some environments?

No U.B. that I can see, provided `maxalign_t' is
aligned strictly enough for an `int'. Data objects are
defined as sequences of bytes (there's special language
for bit-fields), and accessing the individual bytes of a
multi-byte object is permitted.

I thought that accessing (I have added a line for that) value
of object(s) of type char with an lvalue of type int was UB.
I'm referring here to 6.5 #6 and #7. my_free.space has a declared
type, and it cannot be changed by assignment through an lvalue with
different type.
But an array of char is an array of bytes. I'm not going to bother
itemizing the reasoning beyond that, I assume you either already agree
with it or will take my word for it.
Yes.

So what exactly is the difference between an uninitialized array of
ENOUGH chars and the pointer returned by a successful call to
malloc(ENOUGH)?

The difference is pretty abstract, but I think this is the crux
of the disagreement and of the problem.

6.5#6 differentiates between objects which have or have not
declared type (footnote 72 details that allocated objects have
no declared type). In context of value access, the /effective type/
of an object *with* declared type is the declared type, always.
Objects *without* declared type (from malloc()) acquire the effective
type by storing a value with a type (except character type; the
effective type is also transmitted through memcpy()).
This is the difference.

6.5#7 details rules for accessing value in an object, and is expressed
in terms of effective type of an object.

Summary:
- allocated object acquires and keeps the effective type, a declared
object does not;
- reading or writing a value is allowed through an lvalue with
*compatible* type (or qualified, or un/signed), or character type,
or through aggregate or union type, which contains the type.

[snipped alignment provisions]

[snipped storage quals provisions]
my_free.space does indeed have a declared type, and is an object. But
objects do not actually have types, lvalues have types. Note the
definition of object in 3.14, "region of data storage in the execution
environment, the contents of which can represent values".

Yes, but the Standard in certain contexts associates *effective types*
with objects.
Once you actually store an object into memory using an lvalue of
integer type, it contains an integer value. Strictly speaking it no
longer contains char values,

This is the point where I don't agree. This is true for allocated
objects. But an array with a declared type keeps its effective
type, no matter what value you store in it or how (the very attempt
to store such value through an incompatible type lvalue is already UB).

What's more, although clearly it's not so in the above code, I think
the compiler in general may assume that `pi' does not point
into the array `my_free.space' at all.

I have given all my arguments a few lines above. I don't claim my
understanding is better; of course, I might be in error. Could you
please give your arguments, best with pointers to the appropriate places
in the C Standard. Thanks.

[snip]


+++++

Actually, in theory, memcpy() and others do their work by 'magic', and
library functions are not bound by the rules that programs are.

I agree; that's why I wrote "in theory" - 'magic' is "in practice".

In c.s.c. discussion "access via character type" Douglas A. Gwyn
wrote:
See DR #274, which is reflected in TC#2,
for a related change we made to the wording of
the standard to better reflect our intent in
connection with the mem*() functions.

And the corresponding quote from TC2 is:
# 73. Page 324, 7.21.1
# Add a new paragraph 3:
# For all functions in this subclause, each character shall be
# interpreted as if it had the type unsigned char (and therefore every
# possible object representation is valid and has a different value).

Note the magic words "as if" in this text.

As for the rest, you are way, way, way over thinking this. Although I
agree here, as in many other cases, the wording in the standard could
be much, much clearer.

Let's look at the terms "declared type" and "effective type".

For the rest of the discussion, assume that 'ca' defined below meets
the implementation's alignment requirements for int. And let's assume
sizeof(int) is 4.

char ca [sizeof(int)] = { 0 };
int *ip = (int *)ca;

The declared type of 'ca' is array of four chars. So far, so good.

The declared type of 'ip' is pointer to int. Regardless of what
address 'ip' contains, the effective type of '*ip' is int. Accessing
the value of '*ip' through its effective type (int) can cause
undefined behavior for any number of reasons, alignment, invalid
pointer, trap representation for int, and so on. But not at all how
the block of memory into which 'ip' points was defined.

Remember, memory is memory, or storage is storage, or at the lowest
level in C, bytes is bytes. The declared type of a region of storage
(that is, the type in the expression that defined and caused storage
to be allocated) does not change the underlying nature of the bytes.
There are no bytes that can only hold chars, or ints, or doubles, or
pointers.

By the way, if you still aren't convinced, you should take this to
comp.std.c and see the responses you get there, some from members of
the C standard committee, perhaps including Doug Gwyn who posts there
regularly.
 
Lawrence Kirby

On Tue, 14 Dec 2004 23:41:50 -0600, Jack Klein wrote:

....
Let's look at the terms "declared type" and "effective type".

For the rest of the discussion, assume that 'ca' defined below meets
the implementation's alignment requirements for int. And let's assume
sizeof(int) is 4.

char ca [sizeof(int)] = { 0 };
int *ip = (int *)ca;
The declared type of 'ca' is array of four chars. So far, so good.

The declared type of 'ip' is pointer to int. Regardless of what
address 'ip' contains, the effective type of '*ip' is int.

Nope. Objects have an effective type, lvalues have a type but it is not
the "effective type" as defined by C99. C99 6.5p6 says

"The /effective type/ of an object for an access to its stored value is
the declared type of the object, if any."

So the effective type of ca is array of sizeof(int) char. Always. However
it is accessed. The concept of effective type is used in aliasing rules.
C90 didn't define an effective type, but it has a problem because malloc'd
(allocated) objects don't have a declared type so the aliasing rules
in C90 don't work properly for malloc'd objects. "Effective type" was
invented to rectify this problem. For an object with a declared type, the
effective type mimics the C90 semantics, i.e. it is the same as the declared
type; for malloc'd objects it depends on what was last written to the object.

It has to be possible for the effective type of an object to be different
to the type of an lvalue used to access that object or there would be no
point in having the term. Specifically rules in C99 6.5p7 such as the
following depend on this difference:

"An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:

- a type compatible with the effective type of the object

....


- a character type."
Accessing
the value of '*ip' through its effective type (int) can cause
undefined behavior for any number of reasons, alignment, invalid
pointer, trap representation for int, and so on. But not at all how
the block of memory into which 'ip' points was defined.

Absolutely because of this, which is what the aliasing rules in 6.5p7 are
all about. Your example above violates 6.5p7 so it invokes undefined
behaviour irrespective of any of the other issues you mention. This is
also true in C90 which makes the same requirements without using the
concept of effective type (but is broken for malloc'd objects).
Remember, memory is memory, or storage is storage, or at the lowest
level in C, bytes is bytes.

Not according to the C standard. Objects also have type associated with
them. That incidentally includes qualifiers. If you try to access a
volatile defined object using a non-volatile lvalue you get undefined
behaviour.
The declared type of a region of storage
(that is, the type in the expression that defined and caused storage
to be allocated) does not change the underlying nature of the bytes.

But type affects what code is generated to access those bytes.
There are no bytes that can only hold chars, or ints, or doubles, or
pointers.

Yes, you could use memcpy() etc. to put anything you like in them. What
you can't do is use any lvalue you like to put the corresponding type of
data in a declared object, even if size, alignment, const, and
volatile considerations are met. Take an example:

long num = 1;
short *p = (short *)&num;

*p = 2;

This invokes undefined behaviour. The type of the lvalue *p is short but
the object it is accessing has an effective type of long. This is not a
combination permitted by C99 6.5p7. It is also undefined by C90 6.3. If
you're unsure the C90 text is a good place to start.

Why does the standard make this undefined? For efficiency reasons relating
to aliasing. Compiler optimisers need to track when an object can be
accessed in order, for example, to be able to hold its value in a
register. The compiler knows that num can't be accessed through a
short lvalue without invoking undefined behaviour and so can safely ignore
the *p = 2 side-effect for the purposes of optimising access to num. It
could continue to use the value of num it happened to have held in a
register, and maybe write it back to the memory object later clobbering
what was written by *p = 2.
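
If the goal really is to overwrite part of a `long` with the bytes of a `short`, a well-defined way to do it is to go through memcpy(), which the compiler must treat as possibly modifying the object. A minimal sketch, with `overwrite_prefix` and `read_prefix` as hypothetical names; the resulting value of the `long` itself depends on byte order and representation, so the sketch only reads the bytes back as a `short`.

```c
#include <assert.h>
#include <string.h>

/* Overwrites the first sizeof(short) bytes of *num with the object
   representation of 'value'. */
void overwrite_prefix(long *num, short value)
{
    assert(num != NULL);
    assert(sizeof value <= sizeof *num);
    memcpy(num, &value, sizeof value);
}

/* Reads the first sizeof(short) bytes of *num back as a short. */
short read_prefix(const long *num)
{
    short value;
    assert(num != NULL);
    memcpy(&value, num, sizeof value);
    return value;
}
```

Unlike `*p = 2` through a `short *`, neither function accesses the `long` through an lvalue of incompatible type.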

Lawrence
 
S.Tobias

Just before I started preparing a post to c.s.c., Lawrence Kirby
answered Jack Klein's article and took an opposite point of view
on the issue of type of objects (which I completely agree with).

What now? Is anybody going to continue here, or shall I open a
new discussion in c.s.c., as advised by Jack Klein?
 
Lawrence Kirby

Just before I started preparing a post to c.s.c., Lawrence Kirby
answered Jack Klein's article and took an opposite point of view
on the issue of type of objects (which I completely agree with).

What now? Is anybody going to continue here, or shall I open a
new discussion in c.s.c., as advised by Jack Klein?

That's up to you and whether you are convinced or not. :)

Another place to look would be the rationale which has a reasonable
section on the whole basis of the aliasing rules.

Lawrence
 
S.Tobias

That's up to you and whether you are convinced or not. :)

I just thought the discussion itself would be very interesting
to read. But so as not to keep others unnecessarily busy, I'll wait
for a better occasion.
Another place to look would be the rationale which has a reasonable
section on the whole basis of the aliasing rules.

Thanks, but I already knew that section.

+++

I would still like to obtain an answer or some comments to my
original question regarding access of my_free.space in the same
manner as allocated buffers.
Can it cause problems in some environments?

Could array of some TYPE be allocated in a specific memory for
the TYPE? From the previous answers I can guess that probably it's
not a problem in practice, ie. all memory is "equal".

As for aliasing, I think it is reasonable to assume that no nasal
daemons would fly out if I clearly separated the code that accesses
my_free.space as its native type (char[]) and some other type
(eg. int).

As a side-note, I think it would be Standard-compliant if pointers
recorded in their value the object type they're pointing to (this
is what might break this illegal aliasing). This is similar to
a case discussed before on aliasing int a[2][2] array, where pointers
might record the size of an array (this is actually part of
the type characteristics of the array).
Is there a better alternative?

Well? Any suggestions?
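
One possible answer, offered only as a sketch under stated assumptions: keep the declared byte array, but never access it through an `int` lvalue. Values are moved in and out with memcpy(), which sidesteps the effective-type question raised in this thread at the cost of less convenient syntax. The sketch assumes a C11 compiler for `alignas` and `max_align_t`; `POOL_SIZE`, `pool_alloc`, `pool_store_int` and `pool_load_int` are hypothetical names, and the allocator is a bare bump allocator with no way to free.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 1024  /* arbitrary size chosen for this sketch */

/* Declared type is an array of unsigned char, with maximal alignment. */
static alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Bump allocator: returns a chunk aligned for any fundamental type,
   or NULL if size is 0 or the pool is exhausted. */
void *pool_alloc(size_t size)
{
    const size_t align = alignof(max_align_t);

    if (size == 0 || size > POOL_SIZE)
        return NULL;

    /* Round up so that the next chunk stays aligned. */
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > POOL_SIZE - pool_used)
        return NULL;

    void *chunk = pool + pool_used;
    pool_used += rounded;
    return chunk;
}

/* Store and load an int by copying its bytes, never by dereferencing
   an int pointer into the pool. */
void pool_store_int(void *chunk, int value)
{
    assert(chunk != NULL);
    memcpy(chunk, &value, sizeof value);
}

int pool_load_int(const void *chunk)
{
    int value;
    assert(chunk != NULL);
    memcpy(&value, chunk, sizeof value);
    return value;
}
```

Compilers commonly reduce such fixed-size memcpy() calls to plain loads and stores, though that is an optimization and not something the standard promises.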
 
