Sizeof query

James Kuyper · Sep 16, 2009

Dik said:
Isn't sizeof(int) implementation defined?

"The expression ++E is equivalent to (E+=1)." (6.5.3.1p2)
"The type of an assignment expression is the type of the left operand
...." (6.5.16p3).

Ben Bacarisse · Sep 16, 2009

The hard part being, of course, remembering what sort of automatic
conversions apply to new++

I seem to recall these being defined in various circumstances:

usual arithmetic conversions
integer promotions
default argument promotions
default maze of promotion
usual twisty conversions

...all different.

Now will one of those result in new++ being an int? It means new=new+1, and
the 1 is an int. Something could happen.

I don't think conversions come into it. sizeof needs to know the type
of new++. The description of ++ directs us to the section on += where
we read that the type of this expression is that of the left hand side
(modulo type qualification). I don't think conversions come into it.

Before running the code, I guessed correctly. But I have a feeling it was
because there were a pair of hidden conversions that cancelled each other
out, not because there were no hidden conversions at all.

I think it is the latter -- there are no conversions involved.
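
A minimal sketch of the case under discussion, assuming a hosted C99 compiler. The helper name `size_of_postincrement` is made up for illustration and is not from the thread. Because the operand of sizeof is not evaluated here, `new` keeps its value, and the result is sizeof(char), which is 1 by definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reports sizeof(new++) for a char named new,
   and stores the value new has afterwards in *value_after. */
size_t size_of_postincrement(char *value_after)
{
    char new = 'a';            /* "new" is an ordinary identifier in C */
    size_t n = sizeof(new++);  /* operand is not evaluated: no increment */

    assert(n == sizeof(char)); /* the type of new++ is char */
    *value_after = new;        /* still 'a' */
    return n;
}
```

Some compilers warn that the side effect inside sizeof is never executed; that warning describes exactly the behaviour shown.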

Ben Bacarisse · Sep 16, 2009

Ben Bacarisse said:
I don't think conversions come into it.

I don't think conversions come into it.

there are no conversions involved.

I did not intend to be so repetitive or sound hectoring. I re-ordered
the text a few times and should have tidied up afterwards.

James Kuyper · Sep 16, 2009

James said:
something in my > desire to become evil?

Isn't sizeof(int) implementation defined?

"The expression ++E is equivalent to (E+=1)." (6.5.3.1p2)

Sorry - wrong operator, wrong citation - I'm not doing well this
morning. Correction:

You're apparently assuming that integer promotions must be applied to
new++. However, as footnote 48 says, "48) The integer promotions are
applied only: as part of the usual arithmetic conversions, to certain
argument expressions, to the operands of the unary +, -, and ~
operators, and to both operands of the shift operators, as specified by
their respective subclauses." Footnotes are not normative, but footnote
48 correctly summarizes the fact, that you can verify by reading the
normative text, that those are the only locations where the integer
promotions apply. The postfix ++ operator is not one of those locations.
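
The list in footnote 48 can be probed with sizeof, which reports the type of each expression without evaluating it. This is only a sketch: the function name `check_promotion_sites` is an assumption for illustration, and on an implementation where sizeof(int) == 1 the sizes could not tell the types apart.

```c
#include <assert.h>

/* Hypothetical check: compares the size of each expression's type
   with the size of the type the footnote predicts. */
void check_promotion_sites(void)
{
    char c = 1;

    /* Postfix ++ : the result has the operand's type, char. */
    assert(sizeof(c++) == sizeof(char));

    /* Unary +, - and ~ : the operand undergoes the integer promotions. */
    assert(sizeof(+c) == sizeof(int));
    assert(sizeof(-c) == sizeof(int));
    assert(sizeof(~c) == sizeof(int));

    /* Shift operators: both operands are promoted. */
    assert(sizeof(c << 1) == sizeof(int));

    /* Usual arithmetic conversions (binary +) include the promotions. */
    assert(sizeof(c + c) == sizeof(int));
}
```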

Ben Bacarisse · Sep 16, 2009

James Kuyper said:
Sorry - wrong operator, wrong citation - I'm not doing well this
morning. Correction:

You're apparently assuming that integer promotions must be applied to
new++. However, as footnote 48 says, "48) The integer promotions are
applied only: as part of the usual arithmetic conversions, to certain
argument expressions, to the operands of the unary +, -, and ~
operators, and to both operands of the shift operators, as specified
by their respective subclauses." Footnotes are not normative, but
footnote 48 correctly summarizes the fact, that you can verify by
reading the normative text, that those are the only locations where
the integer promotions apply. The postfix ++ operator is not one of
those locations.

I disagree. Specifically, I disagree that conversions are excluded
from the ++ operator. Not that it matters in this case, since there
is no evaluation involved, but the description of ++ (6.5.2.4 p2)
says: "See the discussions of additive operators and compound
assignment for information on constraints, types, and /conversions/
[...]" (my emphasis). The description of compound assignment is clear
that, except for the fact that the lvalue E1 is evaluated only once,
E1 += E2 behaves exactly like E1 = E1 + E2.

I think this only matters on "odd" architectures.
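
A sketch of what that equivalence means once the expression is actually evaluated, assuming UCHAR_MAX < INT_MAX as on mainstream implementations. The function names are hypothetical. The sum c + 1 is computed in the promoted type, and the result is then converted back to the type of c on assignment.

```c
#include <assert.h>
#include <limits.h>

/* Post-increments an unsigned char that holds UCHAR_MAX. */
unsigned char wrap_after_increment(void)
{
    unsigned char c = UCHAR_MAX;

    c++;  /* behaves like c = c + 1; the sum is converted back to unsigned char */
    return c;
}

/* Returns 1 if c + 1 exceeds UCHAR_MAX, i.e. the addition itself was
   carried out in a wider type (assumption: UCHAR_MAX < INT_MAX). */
int sum_is_computed_wider(void)
{
    unsigned char c = UCHAR_MAX;

    assert(UCHAR_MAX < INT_MAX); /* the stated assumption */
    return (c + 1) > UCHAR_MAX;
}
```

None of this affects sizeof(new++), where nothing is evaluated and only the type of the expression matters.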

Dik T. Winter · Sep 16, 2009

>
> Isn't sizeof(int) implementation defined?

I should have thought getter...

William Hughes · Sep 16, 2009

An object is, for purposes of that, equivalent to an array of one object,
so "&obj+1" is a defined value which would be the address of the next
object if there were an array of them.

Ok, we now have (&obj+1) - (&obj) = 1

However, It would seem to me that (when obj is not a char)

(char*)(&obj+1) and (char*)(&obj)
do not point within the same object

so the subtraction is still undefined. (admittedly, only
a perverse implementation would get this wrong)

- William Hughes

Dik T. Winter · Sep 16, 2009

>
> I should have thought getter...

And I think I ought to stop posting.

Keith Thompson · Sep 16, 2009

William Hughes said:
Ok, we now have (&obj+1) - (&obj) = 1

However, It would seem to me that (when obj is not a char)

(char*)(&obj+1) and (char*)(&obj)
do not point within the same object

so the subtraction is still undefined. (admittedly, only
a perverse implementation would get this wrong)

They don't point *within* the same object, but (char*)(&obj+1) points
just past the end of the object, which is permitted.

Any object may be treated as an array of char. (I don't have my copy
of the standard handy at the moment.)

William Hughes · Sep 16, 2009

[...]

Ok, we now have (&obj+1) - (&obj) = 1

However, It would seem to me that (when obj is not a char)

(char*)(&obj+1) and (char*)(&obj)
do not point within the same object

so the subtraction is still undefined. (admittedly, only
a perverse implementation would get this wrong)

They don't point *within* the same object, but (char*)(&obj+1) points
just past the end of the object, which is permitted.

Indeed. (note to self, when being insanely pedantic, do
not be sloppy)

Any object may be treated as an array of char. (I don't have my copy
of the standard handy at the moment.)

Yes, one character past an object that *may be treated* as an
array of char, not one character past an object that *is*
an array of char.

So my question becomes: Is this enough
wiggle room for the DS2K to insert
nasal demons? (We can't invoke the "as if"
rule without knowing what the "correct" behaviour
is)

On the other hand can you argue the type of pointer
makes no difference in determining whether where
they point is "legal" for the purpose of pointer
subtraction?

(Boy the light emanating from here won't reach practical
for decades)

- William Hughes

jameskuyper · Sep 16, 2009

William said:
[...]

Ok, we now have (&obj+1) - (&obj) = 1

However, It would seem to me that (when obj is not a char)

(char*)(&obj+1) and (char*)(&obj)
do not point within the same object

so the subtraction is still undefined. (admittedly, only
a perverse implementation would get this wrong)

They don't point *within* the same object, but (char*)(&obj+1) points
just past the end of the object, which is permitted.

Indeed. (note to self, when being insanely pedantic, do
not be sloppy)

Any object may be treated as an array of char. (I don't have my copy
of the standard handy at the moment.)

Yes, one character past an object that *may be treated* as an
array of char, not one character past an object that *is*
an array of char.

No, that's not the case, and it's not what's relevant. What is
relevant is that the rules for pointer arithmetic have explicit
special exceptions for pointers one past the end of an array; it is
for the purpose of those rules, among others, that a single object can
be treated as a 1-element array.

Section 6.5.6p8 says: "... if the expression P points to the last
element of an array object, the expression (P)+1 points one past the
last element of the array object, ..."; since obj is the only element
in the array, &obj points at the last element of the array, and you
can safely write &obj+1 (you cannot safely write &obj+2 or &obj-1).
You can treat any object as an array of unsigned char, and (unsigned
char*)(&obj+1) points one char beyond the end of the char array
corresponding to obj.

Section 6.5.6p9 says: "When two pointers are subtracted, both shall
point to elements of the same array object, or one past the last
element of the array object; ...", so the subtraction is also
perfectly acceptable.
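
A minimal sketch of the two subtractions being argued about, written on the reading given in this post; whether the standard's wording guarantees the byte-wise case is exactly what the thread disputes. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Distance, in elements of type double, from &obj to one past obj. */
ptrdiff_t element_span_of_double(void)
{
    double obj = 0.0;

    return (&obj + 1) - &obj;  /* obj acts as a 1-element array */
}

/* Distance, in bytes, between the same two locations. */
ptrdiff_t byte_span_of_double(void)
{
    double obj = 0.0;
    unsigned char *first = (unsigned char *)&obj;
    unsigned char *past  = (unsigned char *)(&obj + 1); /* one past the end */

    assert(past > first);
    return past - first;
}
```

On this reading the first function returns 1 and the second returns sizeof(double).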

So my question becomes: Is this enough
wiggle room for the DS2K to insert
nasal demons? ...

No.

....
(Boy the light emanating from here won't reach practical
for decades)

As a purely practical matter, checked-pointer implementations are
pretty rare, and used mainly for debugging purposes. However, I
believe that there are non-DS9K implementations which do perform
optimizations (such as removal of anti-aliasing checks) that are
guaranteed to work as intended only if developers avoid performing
pointer arithmetic under circumstances where that arithmetic has
undefined behavior. Therefore, it is of practical importance to know
whether or not that is the case.

Ben Bacarisse · Sep 16, 2009

Dik T. Winter said:
...

<snipped code that included:>

char new; printf("%ld\n", (long)sizeof(new++));

And I think I ought to stop posting.

That would be a shame! Anyway, sizeof(int) is not involved in the
code as posted.

William Hughes · Sep 16, 2009

William said:
William said:

[...]
Ok, we now have (&obj+1) - (&obj) = 1
However, It would seem to me that (when obj is not a char)
(char*)(&obj+1) and (char*)(&obj)
do not point within the same object
so the subtraction is still undefined. (admittedly, only
a perverse implementation would get this wrong)
They don't point *within* the same object, but (char*)(&obj+1) points
just past the end of the object, which is permitted.

Indeed. (note to self, when being insanely pedantic, do
not be sloppy)

Yes, one character past an object that *may be treated* as an
array of char, not one character past an object that *is*
an array of char.

No, that's not the case, and it's not what's relevant. What is
relevant is that the rules for pointer arithmetic have explicit
special exceptions for pointers one past the end of an array; it is
for the purpose of those rules, among others, that a single object can
be treated as a 1-element array.

Section 6.5.6p8 says: "... if the expression P points to the last
element of an array object, the expression (P)+1 points one past the
last element of the array object, ..."; since obj is the only element
in the array, &obj points at the last element of the array, and you
can safely write &obj+1 (you cannot safely write &obj+2 or &obj-1).
You can treat any object as an array of unsigned char, and (unsigned
char*)(&obj+1) points one char beyond the end of the char array
corresponding to obj.

Ok we now have several ways of looking at things

obj + 1, a "valid" place to point
We know we can treat obj as a character array
There is a char array corresponding to obj

<are the last two equivalent?>

My question remains: Is there enough
wiggle room for the DS2K to insert
nasal demons?

Possible analyses.

A Defined: If the pointers point into an array, or 1 past
the array things are defined. We can treat obj as
an array of char, so things are defined.

B Undefined: For things to be defined we need both
pointers to point into an array or one past the
array. The fact that we can treat obj as an
array of char does not mean we have an array
of char.

C Defined: For things to be defined we need both
pointers to point into an array or one past the
array. The type of the array does not have
to match the type of pointers. Casting to
(char*) may change what a pointer points at, it
does not change where it points.

Personally, I think B trumps A (being able to
treat obj as an array of char does not
provide enough existence of the char array to
make the subtraction defined) but C trumps B.

-William Hughes

Keith Thompson · Sep 16, 2009

William Hughes said:
Ok we now have several ways of looking at things

obj + 1, a "valid" place to point
We know we can treat obj as a character array
There is a char array corresponding to obj

<are the last two equivalent?>

My question remains: Is there enough
wiggle room for the DS2K to insert
nasal demons?

Possible analyses.

A Defined: If the pointers point into an array, or 1 past
the array things are defined. We can treat obj as
an array of char, so things are defined.

B Undefined: For things to be defined we need both
pointers to point into an array or one past the
array. The fact that we can treat obj as an
array of char does not mean we have an array
of char.

I disagree with your point B. If a pointer just past the end of the
notional char array isn't a valid pointer, then we're not able to
treat the object as an array of char. But the standard says that we
*can* treat an object as an array of char; therefore, the pointer just
past the end of the char array is valid.

On the other hand, I don't think the standard actually says that an
object can be treated as an array of char, at least not in so many
words. It's implied by C99 6.2.6.1:

Except for bit-fields, objects are composed of contiguous
sequences of one or more bytes, the number, order, and encoding of
which are either explicitly specified or implementation-defined.

...

Values stored in non-bit-field objects of any other object type
consist of n * CHAR_BIT bits, where n is the size of an object of
that type, in bytes. The value may be copied into an object of
type unsigned char [n] (e.g., by memcpy); the resulting set of
bytes is called the object representation of the value.

(I've replaced the multiplication symbol by *.)

Note that it talks about *copying* the value into an array of unsigned
char, not treating it in place as if it were an array of unsigned
char. There may be other wording that guarantees that the latter will
work as well.

The distinction between char and unsigned char may or may not be
significant. (I prefer to use unsigned char for this kind of thing
anyway.)

[snip]

William Hughes · Sep 16, 2009

[...]

Ok we now have several ways of looking at things

obj + 1, a "valid" place to point
We know we can treat obj as a character array
There is a char array corresponding to obj

<are the last two equivalent?>

My question remains: Is there enough
wiggle room for the DS2K to insert
nasal demons?

Possible analyses.

A Defined: If the pointers point into an array, or 1 past
the array things are defined. We can treat obj as
an array of char, so things are defined.

B Undefined: For things to be defined we need both
pointers to point into an array or one past the
array. The fact that we can treat obj as an
array of char does not mean we have an array
of char.

I disagree with your point B. If a pointer just past the end of the
notional char array isn't a valid pointer, then we're not able to
treat the object as an array of char.

Ok, you have convinced me.

But the standard says that we
*can* treat an object as an array of char; therefore, the pointer just
past the end of the char array is valid.

But just when you thought it was safe to go
back in the water.

On the other hand, I don't think the standard actually says that an
object can be treated as an array of char, at least not in so many
words. It's implied by C99 6.2.6.1:

Except for bit-fields, objects are composed of contiguous
sequences of one or more bytes, the number, order, and encoding of
which are either explicitly specified or implementation-defined.

...

Values stored in non-bit-field objects of any other object type
consist of n * CHAR_BIT bits, where n is the size of an object of
that type, in bytes. The value may be copied into an object of
type unsigned char [n] (e.g., by memcpy); the resulting set of
bytes is called the object representation of the value.

(I've replaced the multiplication symbol by *.)

Note that it talks about *copying* the value into an array of unsigned
char, not treating it in place as if it were an array of unsigned
char.

And I change back. Clearly a statement that something
can exist is not the same as a statement that something
does exist.

There may be other wording that guarantees that the latter will
work as well.

The distinction between char and unsigned char may or may not be
significant. (I prefer to use unsigned char for this kind of thing
anyway.)

Oh well, there is always C.

- William Hughes

jameskuyper · Sep 16, 2009

William Hughes wrote:
....

Possible analyses.

A Defined: If the pointers point into an array, or 1 past
the array things are defined. We can treat obj as
an array of char, so things are defined.

B Undefined: For things to be defined we need both
pointers to point into an array or one past the
array. The fact that we can treat obj as an
array of char does not mean we have an array
of char.

The fact that it can be treated as an array of char is all we need to
know in order to make the behavior defined. That's what "can be
treated as an array of char" means.

C Defined: For things to be defined we need both
pointers to point into an array or one past the
array. The type of the array does not have
to match the type of pointers.

The behavior of pointer arithmetic is defined in terms of the elements
of the corresponding array. In order for the rules of pointer
arithmetic to make any sense whatsoever, the element type of the array
being referred to must be the same as the type the pointers point at.
If that were not the case, then in the context of a declaration

int array[10];

according to the rules in 6.5.6p8, the expression (char*)array + 5
would refer to array[5], regardless of what value sizeof(int) has.
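
To make the arithmetic concrete, here is a small sketch that measures both offsets in bytes. The function names are hypothetical. The two results coincide only on an implementation where sizeof(int) == 1.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of (char *)array + 5 from the start of array. */
size_t offset_of_char_plus_5(void)
{
    int array[10] = {0};
    char *base = (char *)array;

    return (size_t)((base + 5) - base);  /* counted in char elements */
}

/* Byte offset of &array[5] from the start of array. */
size_t offset_of_element_5(void)
{
    int array[10] = {0};
    size_t offset = (size_t)((char *)&array[5] - (char *)array);

    assert(offset % sizeof(int) == 0);   /* a whole number of ints */
    return offset;
}
```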

Personally, I think B trumps A (being able to
treat obj as an array of char does not
provide enough existence of the char array to
make the subtraction defined) but C trumps B.

Correctness trumps everything, and of the three options you've given,
only A is correct.

jameskuyper · Sep 16, 2009

Keith Thompson wrote:
....

On the other hand, I don't think the standard actually says that an
object can be treated as an array of char, at least not in so many
words. It's implied by C99 6.2.6.1:

Except for bit-fields, objects are composed of contiguous
sequences of one or more bytes, the number, order, and encoding of
which are either explicitly specified or implementation-defined.

...

Values stored in non-bit-field objects of any other object type
consist of n * CHAR_BIT bits, where n is the size of an object of
that type, in bytes. The value may be copied into an object of
type unsigned char [n] (e.g., by memcpy); the resulting set of
bytes is called the object representation of the value.

(I've replaced the multiplication symbol by *.)

Note that it talks about *copying* the value into an array of unsigned
char, not treating it in place as if it were an array of unsigned
char. There may be other wording that guarantees that the latter will
work as well.

The key point in putting together the inference is the memcpy()
reference. That memcpy() is given only as an example, and not
explicitly stated as being the only way of doing it, implies that
there's nothing magical about memcpy(). In other words, any code which
has the same defined behavior as memcpy() should do the job equally
well. It's perfectly feasible to write ordinary C code that does the
same thing as memcpy(), though not perhaps as efficiently as the built-
in version. Given the definition of memcpy(), it's pretty hard for me
to see how such code could perform such a copy unless the object being
copied could indeed be treated "in place as if it were an array of
unsigned char."
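
A sketch of the kind of ordinary C code meant here, assuming non-overlapping source and destination. `byte_copy` and `copies_match` are hypothetical names, not anything from the standard or the thread. The loop reads the source object in place through an unsigned char pointer, and the result is compared with what memcpy() produces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical memcpy-like helper: copies n bytes one at a time. */
void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];  /* indexes the source as an array of unsigned char */
    return dst;
}

/* Returns 1 if the loop and memcpy yield the same object representation. */
int copies_match(void)
{
    double obj = 3.25;
    unsigned char by_loop[sizeof obj];
    unsigned char by_memcpy[sizeof obj];

    byte_copy(by_loop, &obj, sizeof obj);
    memcpy(by_memcpy, &obj, sizeof obj);
    return memcmp(by_loop, by_memcpy, sizeof obj) == 0;
}
```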

William Hughes · Sep 16, 2009

William Hughes wrote:

...

The fact that it can be treated as an array of char is all we need to
know in order to make the behavior defined. That's what "can be
treated as an array of char" means.

Indeed. I have agreed to this. However, it has been
noted that the standard does not use the form
"can be treated as an array of char", but only
mandates that the bytes can be copied. I do not
agree that this means that the pointer arithmetic must
be defined.

The behavior of pointer arithmetic is defined in terms of the elements
of the corresponding array. In order for the rules of pointer
arithmetic to make any sense whatsoever, the element type of the array
being referred to must be the same as the type the pointers point at.

Indeed. The *value* of

ptr1-ptr2

depends on the type of ptr1 (which is the same
as the type of ptr2). However, the *validity* of

ptr1-ptr2

is defined in terms of both pointing into,
or one beyond, the same object.
If pointers ptr3 and ptr4 with different types can be said
to point to the same location, then the
validity may be determined by an object whose
type is different from *ptr1.

- William Hughes

Flash Gordon · Sep 16, 2009

Dik said:
Isn't sizeof(int) implementation defined?

That's what I was thinking of, but I was wrong to think it was relevant.

Oh well, I'm never going to use sizeof like that anyway.

William Hughes · Sep 16, 2009

Keith Thompson wrote:

...

On the other hand, I don't think the standard actually says that an
object can be treated as an array of char, at least not in so many
words. It's implied by C99 6.2.6.1:

Except for bit-fields, objects are composed of contiguous
sequences of one or more bytes, the number, order, and encoding of
which are either explicitly specified or implementation-defined.

...

Values stored in non-bit-field objects of any other object type
consist of n * CHAR_BIT bits, where n is the size of an object of
that type, in bytes. The value may be copied into an object of
type unsigned char [n] (e.g., by memcpy); the resulting set of
bytes is called the object representation of the value.

(I've replaced the multiplication symbol by *.)

Note that it talks about *copying* the value into an array of unsigned
char, not treating it in place as if it were an array of unsigned
char. There may be other wording that guarantees that the latter will
work as well.

The key point in putting together the inference is the memcpy()
reference. That memcpy() is given only as an example, and not
explicitly stated as being the only way of doing it, implies that
there's nothing magical about memcpy(). In other words, any code which
has the same defined behavior as memcpy() should do the job equally
well. It's perfectly feasible to write ordinary C code that does the
same thing as memcpy(), though not perhaps as efficiently as the built-
in version. Given the definition of memcpy(), it's pretty hard for me
to see how such code could perform such a copy unless the object being
copied could indeed be treated "in place as if it were an array of
unsigned char."

I don't see this at all. It is not even necessary
for memcpy to read the array as char.
(The array might have to be read in larger chunks.)
And even if we do conclude that this section means that
we must be able to form and dereference a char
pointer that points within the object, the section
says nothing about doing pointer arithmetic
with (char*)(&obj +1).

- William Hughes
