Another sizeof question

Richard Tobin

Please excuse me if this has already been covered.

Given

char x[42];

is

sizeof(x[999])

any kind of error? If so, since the expression is not evaluated, how
would such an error be detected? What if the declaration was

int n = 42;
char x[n];

?

-- Richard
 
Keith Thompson

Please excuse me if this has already been covered.

Given

char x[42];

is

sizeof(x[999])

any kind of error?
[...]

I believe it's perfectly valid, and must yield 1.

x[999] is equivalent to *(x+999). The addition would invoke undefined
behavior, but only if it were evaluated.

I see no more reason for
    sizeof(x[999])
to invoke UB than for
    if (0) {
        x[999];
    }
to do so.
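
A minimal sketch of that claim, assuming a hosted C99 compiler. The function name `size_of_out_of_range_element` is made up for illustration. Only the type of `x[999]` matters, and that type is `char`.

```c
#include <stddef.h>

/* Hypothetical helper: applies sizeof to an out-of-range element. */
size_t size_of_out_of_range_element(void)
{
    char x[42];

    /* x[999] has type char and is not evaluated, so no out-of-bounds
       address is ever computed or dereferenced. */
    return sizeof(x[999]);
}
```

Calling the helper from any function should give 1, since `sizeof(char)` is 1 by definition.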
 
christian.bau

I believe it's perfectly valid, and must yield 1.

x[999] is equivalent to *(x+999). The addition would invoke undefined
behavior, but only if it were evaluated.

I see no more reason for
sizeof(x[999])
to invoke UB than for
if (0) {
x[999];
}
to do so.

It really must be fine. A similar situation is this one, which can be
found a gazillion times in everyone's code:

something* p;
p = malloc (sizeof (*p));

This is the recommended idiom to allocate memory, and the expression
*p on its own would invoke undefined behaviour just like x[999]. It's
fine because *p is not evaluated.
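
A compilable sketch of the idiom, for illustration only. `struct point` and `new_point` are hypothetical stand-ins for "something" and its allocator; they do not appear in the thread.

```c
#include <stdlib.h>

struct point { double x, y; };   /* hypothetical stand-in for "something" */

/* Allocates one struct point, or returns NULL on failure. */
struct point *new_point(void)
{
    struct point *p;             /* indeterminate at this point */

    /* *p is only inspected for its type here; p is not dereferenced. */
    p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = 0.0;
        p->y = 0.0;
    }
    return p;
}
```

The size requested follows the type of `*p`, so the call stays correct if the declared type of `p` changes later.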
 
Army1987

Please excuse me if this has already been covered.

Given

char x[42];

is

sizeof(x[999])

any kind of error? If so, since the expression is not evaluated, how
would such an error be detected? What if the declaration was

int n = 42;
char x[n];

Even in this case, x's type is a VLA, but x[999]'s type is char,
so it is not evaluated by sizeof.
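
The same point as a small sketch, assuming a compiler with VLA support (C99, or C11 and later where VLAs are available). The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: n must be positive. */
size_t size_of_vla_element(int n)
{
    char x[n];               /* x has the VLA type char[n] ...            */

    return sizeof(x[999]);   /* ... but x[999] has type char: unevaluated */
}
```

With `n` equal to 42 the result should still be 1, because the operand's type is plain `char` regardless of the array length.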
 
Richard Tobin

christian.bau said:
It really must be fine. A similar situation is this one, which can be
found a gazillion times in everyone's code:

something* p;
p = malloc (sizeof (*p));

Yes, that's a convincing argument.

-- Richard
 
Richard Tobin

int n = 42;
char x[n];
Even in this case, x's type is a VLA, but x[999]'s type is char,
so it is not evaluated by sizeof.

Oops, yes. Then what about

int n = 42;
char x[n][n];

sizeof(x[999]);

I can't see how it could cause a problem in practice, because what would
the compiler do with the result of computing x[999] anyway?

I haven't looked at how compilers handle this sort of thing, but I assume
they perform a kind of abstract interpretation in which expressions are
evaluated for their type rather than their value.

-- Richard
 
Charlie Gordon

Richard Tobin said:
int n = 42;
char x[n];
Even in this case, x's type is a VLA, but x[999]'s type is char,
so it is not evaluated by sizeof.

Oops, yes. Then what about

int n = 42;
char x[n][n];

sizeof(x[999]);

I can't see how it could cause a problem in practice, because what would
the compiler do with the result of computing x[999] anyway?

I haven't looked at how compilers handle this sort of thing, but I assume
they perform a kind of abstract interpretation in which expressions are
evaluated for their type rather than their value.

This is a good example of the problem with the wording of the Standard
regarding VLA-typed arguments to sizeof. There is no reason to evaluate
x[999] to determine its size. It looks like a defect in C99 IMHO.
 
Kenneth Brody

Charlie said:
Richard Tobin wrote: [...]
Oops, yes. Then what about

int n = 42;
char x[n][n];

sizeof(x[999]);

I can't see how it could cause a problem in practice, because what would
the compiler do with the result of computing x[999] anyway?
[...]
This is a good example of the problem with the wording of the Standard
regarding VLA-typed arguments to sizeof. There is no reason to evaluate
x[999] to determine its size. It looks like a defect in C99 IMHO.

I assume you are referring to 6.5.3.4p2:

If the type of the operand is a variable length array type, the
operand is evaluated; otherwise, the operand is not evaluated and
the result is an integer constant.

Note that it says the operand is evaluated if it is a VLA, not if it
is a member of a VLA. In the above example, "sizeof(x[999])" would
not, IMO, evaluate the operand, because "x[999]" is not a VLA. Only
if you did "sizeof(x)" would it need to evaluate the operand "x".

Okay, hold on... (Isn't stream of consciousness writing fun?) I see
that x is a two-dimensional VLA in this new example. I guess that
that would mean that "x[999]" is a VLA.

Is it possible to have a VLA of different-sized VLAs? For example,
can the VLA "foo" have foo[1] point to a 3-element VLA while foo[2]
points to a 4-element VLA? If that can be done, then I can see why
it may be necessary to evaluate the operand of "sizeof(foo[x])".

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
Richard Tobin

Okay, hold on... (Isn't stream of consciousness writing fun?) I see
that x is a two-dimensional VLA in this new example. I guess that
that would mean that "x[999]" is a VLA.

Yes.

Is it possible to have a VLA of different-sized VLAs?

No. There's no syntax for declaring such a thing, and it doesn't
make much sense implementationally: how would you represent it?
And if you come up with a way to represent it, it's unlikely to
be better than an array of pointers to different-sized arrays.

For example,
can the VLA "foo" have foo[1] point to a 3-element VLA while foo[2]
points to a 4-element VLA?

Point to, but not be.
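
A rough sketch of the "array of pointers" alternative, under the assumption that the different-sized arrays are themselves VLAs. The function name and the layout of `foo` are illustrative only.

```c
#include <stddef.h>

/* Hypothetical helper: m and n must be positive. Returns 1 if the
   elements of foo all have the same size, that of a char pointer. */
int pointer_elements_have_uniform_size(int m, int n)
{
    char a[m], b[n];                 /* two VLAs of different lengths */
    char *foo[3] = { NULL, a, b };   /* fixed-size array of pointers  */

    /* Every element of foo is a char *, whatever it points to, so the
       lengths m and n cannot be recovered from foo with sizeof. */
    return sizeof foo[1] == sizeof foo[2]
        && sizeof foo[1] == sizeof(char *);
}
```

Here `foo[1]` and `foo[2]` point to arrays of 3 and 4 elements when called with `(3, 4)`, but the elements of `foo` are ordinary pointers of one fixed size.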

-- Richard
 
Keith Thompson

int n = 42;
char x[n];
Even in this case, x's type is a VLA, but x[999]'s type is char,
so it is not evaluated by sizeof.

Oops, yes. Then what about

int n = 42;
char x[n][n];

sizeof(x[999]);

I can't see how it could cause a problem in practice, because what would
the compiler do with the result of computing x[999] anyway?

I haven't looked at how compilers handle this sort of thing, but I assume
they perform a kind of abstract interpretation in which expressions are
evaluated for their type rather than their value.

Since x[999] is a VLA, it's evaluated even when it's an argument to
sizeof, so this invokes undefined behavior.

In practice, since the evaluation isn't going to do anything, I would
expect an implementation to just compute the size and not try to read
the out-of-bounds array element.

My guess is that there's some case involving VLAs (or at least the
committee thought there was some case) where the evaluation is
actually necessary. The committee needed to define when arguments to
sizeof are evaluated and when they aren't; it was easier to say that
VLAs are evaluated than to define exactly when evaluation is
necessary.
 
Richard Tobin

Keith Thompson said:
My guess is that there's some case involving VLAs (or at least the
committee thought there was some case) where the evaluation is
actually necessary.

As far as I can see, the evaluation is needed only when the operand of
sizeof is a type, rather than an object, for example sizeof(char [n]).
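
A sketch of that type-name case, assuming VLA support. The helper name is made up. The size expression `n` inside the type name has to be evaluated at run time for the result to be known.

```c
#include <stddef.h>

/* Hypothetical helper: n must be positive. */
size_t size_of_char_array_type(int n)
{
    /* The operand is a type name, not an object. Its size depends on
       the current value of n, so n is evaluated here. */
    return sizeof(char [n]);
}
```

The result should track the argument: 7 for `n == 7`, 42 for `n == 42`.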

The description of sizeof talks about "the type of the operand"
regardless of whether the operand is an object or a type, which
is confusing - what is the type of a type?

In trying to come up with an example where the operand is an object,
and its size can vary depending on some variable, I tried:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* default to 0 when no argument is given */
    int a = argc > 1 ? atoi(argv[1]) : 0;
    int m=2, n=3;
    char x[1][m], y[1][n];

    printf("%d\n", (int)sizeof((a > 0 ? x: y)[1]));
    return 0;
}

The conditional wouldn't work if they weren't VLAs, because char[2]
and char[3] aren't compatible. But char[m] and char[n] are compatible
even when m=2 and n=3. The standard authors anticipated this, and
made it undefined behaviour when two VLA types have different sizes in a
context where compatible types are required.

-- Richard
 
Bart van Ingen Schenau

Keith said:
In practice, since the evaluation isn't going to do anything, I would
expect an implementation to just compute the size and not try to read
the out-of-bounds array element.

My guess is that there's some case involving VLAs (or at least the
committee thought there was some case) where the evaluation is
actually necessary. The committee needed to define when arguments to
sizeof are evaluated and when they aren't; it was easier to say that
VLAs are evaluated than to define exactly when evaluation is
necessary.

My guess is that the committee was thinking of two different uses of
VLAs with sizeof and tried to catch them both in a single statement.

In the case of
int n = 10;
int foo[n];
sizeof foo;
the sub-expression foo has to be evaluated in the sense that at runtime
the determination must be made how big foo actually is.
If foo were a non-VLA type, the determination would be done at
compile-time.

In the case of
int foo();
sizeof(x[foo()]);
the sub-expression foo() has to be evaluated (i.e. the function foo is
called) in order to determine the size of the type x[foo()].

I think that they tried to catch both these situations in a single
sentence and failed at it.
I think a better wording of the intentions (as I understand them) would
be:

2 The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. The size is
determined from the type of the operand.
If the operand has a variable length array type the result is an
integer; otherwise, the result is an integer constant.
If the operand names a variable length array type, the sub-expressions
in the operand will be evaluated; otherwise, the operand is not
evaluated.

This wording makes it clear that for expressions of VLA type, the
expression itself will not be evaluated, but sizeof will also not yield
a constant value (because the size of the VLA is only known at
runtime).

Bart v Ingen Schenau
 
Keith Thompson

Bart van Ingen Schenau said:
Keith said:
In practice, since the evaluation isn't going to do anything, I would
expect an implementation to just compute the size and not try to read
the out-of-bounds array element.

My guess is that there's some case involving VLAs (or at least the
committee thought there was some case) where the evaluation is
actually necessary. The committee needed to define when arguments to
sizeof are evaluated and when they aren't; it was easier to say that
VLAs are evaluated than to define exactly when evaluation is
necessary.

My guess is that the committee was thinking of two different uses of
VLAs with sizeof and tried to catch them both in a single statement.

In the case of
int n = 10;
int foo[n];
sizeof foo;
the sub-expression foo has to be evaluated in the sense that at runtime
the determination must be made how big foo actually is.
If foo were a non-VLA type, the determination would be done at
compile-time.

Determining the size of the object foo doesn't require evaluating
'foo' any more than determining the size of n requires evaluating it
to determine that its current value is 10. For a non-VLA type, the
size is known at compilation time. For a VLA type, the size is
associated with the type, not with some object of the type.

For example, given:

int n = 10;
typedef int vla[n];
vla foo, bar;

the size of the type "vla" will, in any sane implementation, be stored
once (probably in some anonymous object known to the compiler).
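
A small sketch of the typedef case, assuming VLA support. The function name is hypothetical. The size expression is evaluated when the typedef declaration is reached, so a later change to `n` does not resize `foo`, `bar`, or the type itself.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the sizes behave as described. */
int sizes_fixed_at_typedef(void)
{
    int n = 10;
    typedef int vla[n];    /* the size of the type is fixed here */
    vla foo, bar;

    n = 20;                /* changing n later does not resize anything */

    return sizeof foo == 10 * sizeof(int)
        && sizeof bar == sizeof foo
        && sizeof(vla) == sizeof foo;
}
```

All three sizes come from the one value recorded for the type, not from the objects `foo` and `bar`.
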
In the case of
int foo();
sizeof(x[foo()]);
the sub-expression foo() has to be evaluated (i.e. the function foo is
called) in order to determine the size of the type x[foo()].

Agreed.

I think that they tried to catch both these situations in a single
sentence and failed at it.
I think a better wording of the intentions (as I understand them) would
be:

2 The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. The size is
determined from the type of the operand.
If the operand has a variable length array type the result is an
integer; otherwise, the result is an integer constant.
If the operand names a variable length array type, the sub-expressions
in the operand will be evaluated; otherwise, the operand is not
evaluated.

This wording makes it clear that for expressions of VLA type, the
expression itself will not be evaluated, but sizeof will also not yield
a constant value (because the size of the VLA is only known at
runtime).

Close, but you still have to allow for types that contain VLAs. For
example, this:

int obj[10][n];

declares an array of fixed length 10, but each element of that array
is a VLA.
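
A sketch of that declaration, assuming VLA support. The function name is illustrative. The outer dimension is the constant 10, while the size of each element is only known at run time.

```c
#include <stddef.h>

/* Hypothetical helper: n must be positive. Returns 1 if obj has 10
   elements and each element holds n ints. */
int fixed_count_of_vla_rows(int n)
{
    int obj[10][n];
    size_t outer = sizeof obj / sizeof obj[0];        /* number of rows */
    size_t inner = sizeof obj[0] / sizeof obj[0][0];  /* ints per row   */

    return outer == 10 && inner == (size_t)n;
}
```

The index 0 is in range, so these `sizeof` expressions are harmless whether or not the operand `obj[0]` is considered evaluated.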
 
Ben Bacarisse

Keith Thompson said:
Bart van Ingen Schenau said:
Keith said:
In practice, since the evaluation isn't going to do anything, I would
expect an implementation to just compute the size and not try to read
the out-of-bounds array element.

My guess is that there's some case involving VLAs (or at least the
committee thought there was some case) where the evaluation is
actually necessary. The committee needed to define when arguments to
sizeof are evaluated and when they aren't; it was easier to say that
VLAs are evaluated than to define exactly when evaluation is
necessary.

My guess is that the committee was thinking of two different uses of
VLAs with sizeof and tried to catch them both in a single statement.

In the case of
int n = 10;
int foo[n];
sizeof foo;
the sub-expression foo has to be evaluated in the sense that at runtime
the determination must be made how big foo actually is.
If foo were a non-VLA type, the determination would be done at
compile-time.

Determining the size of the object foo doesn't require evaluating
'foo' any more than determining the size of n requires evaluating it
to determine that its current value is 10. For a non-VLA type, the
size is known at compilation time. For a VLA type, the size is
associated with the type, not with some object of the type.

For example, given:

int n = 10;
typedef int vla[n];
vla foo, bar;

the size of the type "vla" will, in any sane implementation, be stored
once (probably in some anonymous object known to the compiler).

In the case of
int foo();
sizeof(x[foo()]);
the sub-expression foo() has to be evaluated (i.e. the function foo is
called) in order to determine the size of the type x[foo()].

Agreed.

I agree if you read this example to mean that x is a type. If x is an
object, then your argument above applies, does it not? If x is a
type, then the (new) type x[foo()] can only have its size determined at
run time by 'evaluation' of the type. I put that in quotes because I
doubt that normal evaluation rules apply (for example, the 'value' has
no type in C).

I would favour going along with your argument above ("Determining the
size of the object foo doesn't require evaluating 'foo'") and altering
the standard to distinguish between sizeof applied to an expression
and sizeof applied to a type. The two cases are distinct in the
syntax, so I see no trouble with highlighting that they are different.
(In fact I think this would help; describing sizeof(int) as an
operator applied to an operand -- like all the other unary operators
-- is misleading.)

The wording would state that when the operand of sizeof is an
expression it is never evaluated. When sizeof appears in the form
where it precedes a type-name T in brackets, the behaviour would be
described using whatever wording is already used to describe the
evaluations of expressions embedded in types. It would probably be
necessary to add a clause that states that the result of sizeof is
only an integer constant when the type is not a VLA type.

This has two potential problems:

(1) It imposes a restriction on implementations: they must do what you
say all sane implementations do already, which is to store the size of
VLA objects somewhere (if they can't determine it by static analysis).

(2) I may be wrong, and there may be cases where the size of an
expression can't be determined without evaluation (given the sane idea
of recording VLA sizes above). I can't think of one, but that does not
mean much.
 
Ben Bacarisse

Keith Thompson said:
My guess is that there's some case involving VLAs (or at least the
committee thought there was some case) where the evaluation is
actually necessary.

As far as I can see, the evaluation is needed only when the operand of
sizeof is a type, rather than an object, for example sizeof(char
[n]).

I should have read the rest of the thread! You've put it so much
better.
<snip>
 
Kenneth Brody

Keith Thompson wrote:
[...]
Determining the size of the object foo doesn't require evaluating
'foo' any more than determining the size of n requires evaluating it
to determine that its current value is 10. For a non-VLA type, the
size is known at compilation time. For a VLA type, the size is
associated with the type, not with some object of the type.

For example, given:

int n = 10;
typedef int vla[n];
vla foo, bar;

the size of the type "vla" will, in any sane implementation, be stored
once (probably in some anonymous object known to the compiler).
[...]

Consider:

size_t foo(int n)
{
    int bar[n];
    return sizeof bar;
}

Doesn't "bar" need to be evaluated, at least to the point of finding
out where its size has been stored?

(I don't use VLAs, as many [most?] of the compilers I use don't
support it, so this is mostly a thought experiment to me. But it
does make me wonder.)

What about:

#include <stdio.h>
int main(void)
{
    int n=10;
    int x=1,y=1;
    char vla[n][n];
    char non_vla[10][10];
    size_t foo = sizeof vla[x++];
    size_t bar = sizeof non_vla[y++];
    printf("x is now %d\n",x);
    printf("y is now %d\n",y);
    return 0;
}

Since "vla[x++]" is a VLA, and therefore must be evaluated, does this
mean that side-effects are done as well? Will x be 2?

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
Keith Thompson

Ben Bacarisse said:
I would favour going along with your argument above ("Determining the
size of the object foo doesn't require evaluating 'foo'") and altering
the standard to distinguish between sizeof applied to an expression
and sizeof applied to a type. The two cases are distinct in the
syntax, so I see no trouble with highlighting that they are different.
(In fact I think this would help; describing sizeof(int) as an
operator applied to an operand -- like all the other unary operators
-- is misleading.)
[...]

I was about to say that the sizeof operator applied to a parenthesized
type name isn't the only case of an operator that doesn't really act
like one (because its operand isn't any kind of expression). The left
operand of the "." or "->" operator is an expression, but the right
operand is an identifier that must be the name of a struct or union
member; that "operand" is not evaluated, at least not in the same way
that an expression is evaluated.

Fortunately, I took a look at the grammar, and it turns out that
they're defined as postfix operators, just as (IMHO) they should be:

postfix-expression:
    primary-expression
    ...
    postfix-expression . identifier
    postfix-expression -> identifier

In a sense, there's a distinct postfix ".member" operator for each
member of each struct or union type, and likewise for "->". I'm not
sure why I thought they were defined as binary operators.

IMHO, it would have made more sense for 'sizeof expression' to be
treated as a unary-expression, but for 'sizeof ( type-name )' to be a
distinct kind of expression. Other symbols, such as "-" and "&",
denote two distinct operations, so the overloading wouldn't be a
problem.

But the current definition, though it introduces some special cases,
doesn't really cause any serious problems, so I don't think I'd
advocate changing the language.
 
Keith Thompson

Kenneth Brody said:
Keith Thompson wrote:
[...]
Determining the size of the object foo doesn't require evaluating
'foo' any more than determining the size of n requires evaluating it
to determine that its current value is 10. For a non-VLA type, the
size is known at compilation time. For a VLA type, the size is
associated with the type, not with some object of the type.

For example, given:

int n = 10;
typedef int vla[n];
vla foo, bar;

the size of the type "vla" will, in any sane implementation, be stored
once (probably in some anonymous object known to the compiler).
[...]

Consider:

size_t foo(int n)
{
int bar[n];
return sizeof bar;
}

Doesn't "bar" need to be evaluated, at least to the point of finding
out where its size has been stored?

Evaluating 'bar' means fetching the values of its elements; since they
haven't been initialized, doing so would invoke undefined behavior.

What needs to be "evaluated" is the *type* of bar -- just as it would
be if bar were an ordinary array. (In the latter case the evaluation
is trivial, since the size is constant.)
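
A sketch of the function under discussion, renamed to avoid clashing with other examples and assuming VLA support. In practice the elements of `bar` are never read; only the run-time size of its type is needed.

```c
#include <stddef.h>

/* Hypothetical helper: n must be positive. */
size_t vla_size_in_bytes(int n)
{
    int bar[n];          /* elements deliberately left uninitialized   */

    return sizeof bar;   /* needs only the recorded size of bar's type */
}
```

For `n == 5` the result should be `5 * sizeof(int)`, computed without touching any element of `bar`.
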
(I don't use VLAs, as many [most?] of the compilers I use don't
support it, so this is mostly a thought experiment to me. But it
does make me wonder.)

What about:

#include <stdio.h>
int main(void)
{
    int n=10;
    int x=1,y=1;
    char vla[n][n];
    char non_vla[10][10];
    size_t foo = sizeof vla[x++];
    size_t bar = sizeof non_vla[y++];
    printf("x is now %d\n",x);
    printf("y is now %d\n",y);
    return 0;
}

Since "vla[x++]" is a VLA, and therefore must be evaluated, does this
mean that side-effects are done as well? Will x be 2?

According to the current wording of the standard, x++ must be
evaluated, but y++ must not be evaluated.

But also, consider this:

#include <stdio.h>
int main(void)
{
    int n = 10;
    int x = 1, y = 1;
    char vla[10][n];
    char non_vla[n][10];
    size_t foo = sizeof vla[x++];
    size_t bar = sizeof non_vla[y++];
    printf("x is now %d\n", x);
    printf("y is now %d\n", y);
    return 0;
}

The standard, if taken literally, requires x to be incremented, but
not y, since the operand vla[x++] has the VLA type char[n], while the
operand non_vla[y++] has the fixed-size type char[10] (even though
'non_vla' is declared with a variable number of rows).

And here's an even more amusing example:

#include <stdio.h>

int func(void)
{
    puts("In func, returning 42");
    return 42;
}

int main(void)
{
    size_t size = sizeof(int[func()][10]);
    printf("size = %d\n", (int)size);
    return 0;
}

Since the argument to sizeof is a non-VLA type, it must not be
"evaluated" -- but it *must* be "evaluated" in order to determine the
size.

The fix is (a) to refer to "variably-modified" types rather than just
to VLAs, (b) to "evaluate" the argument of sizeof only if it's a
variably-modified type name, not if it's an expression, and (c) to
define what it means to "evaluate" a type name (namely, to evaluate
any expressions appearing within it).

As a programmer, it's easy to avoid this problem. Most 'sizeof'
expressions should apply to expressions anyway (usually to lvalues).
If you really need to apply 'sizeof' to a VLA type, keep it simple;
either define the type using expressions with no side effects, or use
a typedef so the side effects occur only when the type is first
declared.
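
A sketch of the typedef advice, assuming VLA support. The names `next_len`, `calls`, `row`, and `typedef_evaluated_once` are all made up for illustration. The side effect in the size expression happens once, where the typedef is declared, and later `sizeof` uses do not repeat it.

```c
#include <stddef.h>

static int calls;   /* counts calls to next_len() */

/* Hypothetical size function with a visible side effect. */
static int next_len(void)
{
    return ++calls + 2;
}

/* Returns 1 if next_len() ran exactly once and all sizes agree. */
int typedef_evaluated_once(void)
{
    calls = 0;

    typedef char row[next_len()];   /* the side effect happens here, once */
    row a, b;

    /* sizeof applied to lvalues, and to the typedef name itself. */
    size_t sa = sizeof a;
    size_t sb = sizeof b;
    size_t st = sizeof(row);

    return calls == 1 && sa == 3 && sb == 3 && st == 3;
}
```

Writing `sizeof(char[next_len()])` at each use instead would call `next_len` every time, which is the situation the advice above avoids.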
 
