Why can't functions return arrays?

sanjay.vasudevan

Why are the following declarations invalid in C?

int f()[];
int f()[10];

It would be great if anyone could also explain the design decision
for such a language restriction.

Regards,
Sanjay
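For comparison, C does accept a function that returns a pointer to an array, and the array length stays part of that pointer's type. A minimal sketch, assuming the array may have static storage duration so that it outlives the call; squares_table is a hypothetical name, and every call returns the same object.

```c
#include <stddef.h>

/* Not allowed: int f(void)[10]; (a function cannot return an array type). */

/* Allowed: squares_table returns a pointer to an array of 10 int;
   the length 10 is part of the return type. */
int (*squares_table(void))[10]
{
    static int table[10];          /* static, so it still exists after return */
    for (size_t i = 0; i < 10; i++)
        table[i] = (int)(i * i);
    return &table;                 /* &table has type int (*)[10] */
}
```

A caller writes int (*p)[10] = squares_table(); and indexes with (*p)[i]. Because the storage is static, the function is not reentrant and each call overwrites the previous contents.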
 
Walter Roberson

> Why are the following declarations invalid in C?
> int f()[];
> int f()[10];
> It would be great if anyone could also explain the design decision
> for such a language restriction.

Where would the array get temporarily stored? Who would be responsible
for freeing it?

In a stack-based machine that mixes parameters and control information,
the information for the called routine is already on the stack by the
time the hypothetical return array is created. As the return array could
be of any size, the caller cannot know how much space to reserve
"under" the stack frame for the called routine. Therefore the called
routine would either have to somehow insert the space for the array
"above" its own stack frame (so that it still exists when the routine
returns), or else return the address of the array.

If it returns an address, that cannot be the address of an automatic
variable in the called routine, because those automatic variables
become inaccessible once the call returns. It can't use a static
variable either, because it doesn't know the maximum size. So the only
way to make that work would be to malloc() an array, store the data
into it, and return the pointer to the malloc'd area.

But then what's the contract about who frees the array? If you require
that the calling routine -automatically- frees the array, then you add
noticeable complexity to the language. If you require that the calling
routine -explicitly- frees the array, then you have no more expressive
power than you already have by making the return type a pointer and
returning a pointer to a malloc'd area.
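The heap-based contract described above is the usual workaround in current C: return a pointer to malloc'd storage and make the caller free() it. A minimal sketch; make_squares is a hypothetical helper, and, as the post notes, nothing in the type system records who owns the buffer or how long it is.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns a pointer to n freshly allocated ints holding 0, 1, 4, 9, ...
   The caller owns the memory and must free() it.
   Returns NULL if n is 0, if n * sizeof(int) would overflow,
   or if the allocation fails. */
int *make_squares(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(int))
        return NULL;
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)(i * i);
    return p;
}
```

The caller has to remember both the length it asked for and the obligation to call free() on the result when done.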
 
jacob navia

Walter said:
>> Why are the following declarations invalid in C?
>> int f()[];
>> int f()[10];
>> It would be great if anyone could also explain the design decision
>> for such a language restriction.
>
> Where would the array get temporarily stored?

Either in its final destination or in a temporary variable, as with
all results from functions.

> Who would be responsible
> for freeing it?

Nobody; it would be either in a result variable or in a
temporary with automatic storage duration.
[snip]


Then, please tell me why this works?

struct f { int array[80]; };

struct f function(void)
{
    struct f result = {{0}}; /* initialized, so the returned copy is well defined */
    return result;
}

This is the same.
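Jacob's comparison can be checked directly: once the array is wrapped in a struct, it can be returned, assigned and copied by value. A small sketch; struct int80 and make_ramp are hypothetical names standing in for struct f and function.

```c
#include <stddef.h>

/* An 80-int array wrapped in a struct so it can travel by value. */
struct int80 { int array[80]; };

/* Returns the whole array by value: all 80 ints are copied out. */
struct int80 make_ramp(int start)
{
    struct int80 result;
    for (size_t i = 0; i < 80; i++)
        result.array[i] = start + (int)i;
    return result;
}
```

Assigning one struct int80 to another copies all 80 ints, so changing the copy leaves the original untouched.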

Functions can't return arrays because... an arbitrary
decision was made ages ago that must be preserved so that
new software remains compatible with the errors of
the past.
 
jacob navia

> Why are the following declarations invalid in C?
>
> int f()[];
> int f()[10];
>
> It would be great if anyone could also explain the design decision
> for such a language restriction.
>
> Regards,
> Sanjay

There is no reason. It is just a design mistake of the language.
 
Bartc

Walter Roberson said:
>> Why are the following declarations invalid in C?
>> int f()[];
>> int f()[10];
>> It would be great if anyone could also explain the design decision
>> for such a language restriction.
>
> Where would the array get temporarily stored? Who would be responsible
> for freeing it?

This is no different to the problem of returning a struct of arbitrary (but
known) size.

> As the return array could be of any size

I thought C arrays were always of a known, fixed size, apart from VLAs.

I think it was just decided that arrays and structs should behave
differently (because arrays are a little special), and there is no technical
reason why arrays cannot be passed to and returned from functions.
 
jacob navia

Bartc said:
> I think it was just decided that arrays and structs should behave
> differently (because arrays are a little special), and there is no technical
> reason why arrays cannot be passed to and returned from functions.

Exactly. I am of the same opinion.
 
user923005

> Why are the following declarations invalid in C?
>
> int f()[];
> int f()[10];
>
> It would be great if anyone could also explain the design decision
> for such a language restriction.

I think it is basically the same idea as the electoral college:
"Let's protect them from their own stupidity."

Of course, you can do this:
struct array_wrapper {
    int foo[100];
};
and return one of those.
 
Ben Bacarisse

Bartc said:
> I think it was just decided that arrays and structs should behave
> differently (because arrays are a little special), and there is no technical
> reason why arrays cannot be passed to and returned from functions.

A little history... The situation was not so odd at first. For some
time C restricted parameters and returned results to the basic types.
By the time that struct passing and returning was added, it was too
late to re-visit the secondary position held by arrays.
 
Richard Tobin

> Exactly. I am of the same opinion.

If you were designing the language from scratch, it would be no
problem. But you'll have trouble fitting it in now. The natural
syntax is already taken. How would you pass or return an array? You
can't just use its name, because that syntax is already used for
passing or returning the address of the first element. How can you
declare the formal argument? Same problem.
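Richard's point about the syntax being taken can be seen in two places: a parameter declared as an array is adjusted to a pointer, and an array name in a return statement yields the address of its first element. A sketch with hypothetical names; compilers such as GCC may warn about sizeof applied to an array parameter, which is exactly the surprise being illustrated.

```c
#include <stddef.h>

/* A parameter written as "int a[10]" is adjusted to "int *a",
   so sizeof a is the size of a pointer, not of 10 ints. */
size_t param_size(int a[10])
{
    return sizeof a;               /* same as sizeof(int *) */
}

/* In a return statement the array name converts to &arr[0]. */
int *first_element(void)
{
    static int arr[10] = {42};     /* static so the address stays valid */
    return arr;                    /* returns int *, not an array */
}
```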

-- Richard
 
Hallvard B Furuseth

> Why are the following declarations invalid in C?
> int f()[];
> int f()[10];

Probably because of the language (mis)feature that array expressions
often decay to pointers to the first element of the array.

What would you do with that function? The 2nd variant:
int a[10] = f();
would presumably be a case of assigning the contents of an array to
another array, which C does not support. This won't compile either:
int a[10], b[10];
void foo() { a = b; }
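Since a = b is rejected, copying one array into another has to be spelled out, typically with memcpy(), while the same ten ints wrapped in a struct can be assigned directly. A minimal sketch; copy_array, copy_struct and struct ten are hypothetical names.

```c
#include <string.h>

int a[10], b[10];

/* a = b; would not compile: an array is not a modifiable lvalue. */
void copy_array(void)
{
    memcpy(a, b, sizeof a);        /* copy all 10 ints explicitly */
}

struct ten { int v[10]; };
struct ten sa, sb;

void copy_struct(void)
{
    sa = sb;                       /* allowed: struct assignment copies v */
}
```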

Or if you would do this:
foo() { int *a; ... { ... a = f(); ... } }
then there is still an array-to-array assignment going on, only this
time to a temporary variable on the stack. However, in this case it's
worse: The compiler may not be able to tell when the temporary becomes
"dead" so the space can be reused. It'd have to keep it around until
the function returns. This is because it created a pointer to the
temporary, which it won't do with other kinds of temporaries.


Your first version is worse still: int *a = f() would need to return
data of unknown size and put it on the stack.
 
Default User

Richard said:
> If you were designing the language from scratch, it would be no
> problem. But you'll have trouble fitting it in now. The natural
> syntax is already taken. How would you pass or return an array? You
> can't just use its name, because that syntax is already used for
> passing or returning the address of the first element. How can you
> declare the formal argument? Same problem.

Further, arrays aren't modifiable lvalues. You can't assign the results
of the function to an array, which limits the usefulness to a degree.
You would really have to start over and make arrays a first-class type.

Brian
 
Bartc

Hallvard B Furuseth said:
[snip]
> Or if you would do this:
> foo() { int *a; ... { ... a = f(); ... } }
> then there is still an array-to-array assignment going on, only this
> time to a temporary variable on the stack. However, in this case it's
> worse: The compiler may not be able to tell when the temporary becomes
> "dead" so the space can be reused. It'd have to keep it around until
> the function returns. This is because it created a pointer to the
> temporary, which it won't do with other kinds of temporaries.

*a is a pointer to int. f() returns an array of int. So there is a type
mismatch and this should not compile, and the problem would not arise. I
don't think you can reasonably expect an array returned by a function to
'decay' to a pointer in the same way as a normal array.

(Creating a pointer to a value returned by a function is not a good idea
anyway, array or not)
 
Dik T. Winter

> I think it was just decided that arrays and structs should behave
> differently (because arrays are a little special), and there is no technical
> reason why arrays cannot be passed to and returned from functions.

Originally, functions could not return structs either (and structs could not
be passed as parameters). Currently both are allowed for structs, but
neither is allowed for arrays. I think the reason is that passing an
array as a parameter already had an established meaning (pass the address
of the first element).
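The established meaning Dik describes shows up as a difference in semantics: an array argument hands the callee the caller's storage, while a struct argument is copied. A short sketch with hypothetical names (struct wrap, zero_first, zero_first_copy).

```c
struct wrap { int v[4]; };

/* The array argument decays to &arr[0]: the callee writes into
   the caller's array. */
void zero_first(int arr[4])
{
    arr[0] = 0;
}

/* The struct argument is copied: the callee changes only its own copy
   and returns the modified element to show it did change locally. */
int zero_first_copy(struct wrap w)
{
    w.v[0] = 0;
    return w.v[0];
}
```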
 
Dik T. Winter

>
> Exactly. I am of the same opinion.

There is no technical reason. But it was already established long ago that
when an actual parameter was an array, what would be passed was not the
array but the address of the first element. Changing that to actually
passing the array would break a lot of code.
 
Shirsoft

>> Why are the following declarations invalid in C?
>> int f()[];
>> int f()[10];
>> It would be great if anyone could also explain the design decision
>> for such a language restriction.
>
> Where would the array get temporarily stored? Who would be responsible
> for freeing it?

[snip]

Sanjay and I had this discussion about functions returning arrays.
Let's say that the language allowed two things:
1. Returning an array from a function by just copying the address of the
first element.
2. Creating array aliases like regular variables.

What would be wrong with the following code then:

int[] func2(int a[]);
void func1()
{
    int a[5];
    int &b[] = func2(a); // b is some sort of an alias to the returned type
}
int[] func2(int a[])
{
    // do something
    return a;
}
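Part 1 of that proposal is already expressible if the return type is spelled as a pointer: returning a returns the address of the caller's first element, and an ordinary pointer serves as the "alias". A sketch that mirrors the posted code in valid C; using int *b in place of the hypothetical int &b[] is an assumption about what the alias is meant to do, and no array is copied.

```c
/* Valid-C approximation of the hypothetical int[] func2(int a[]). */
int *func2(int a[])               /* a is really int * */
{
    a[0] = 99;                    /* "do something" */
    return a;                     /* address of the caller's first element */
}

int func1(void)
{
    int a[5] = {0};
    int *b = func2(a);            /* b aliases a; nothing is copied */
    b[1] = 7;                     /* writes a[1] through the alias */
    return a[0] + a[1];           /* 99 + 7 */
}
```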
 
jacob navia

Hallvard B Furuseth wrote:
[snip]
> To get this to work cleanly, the language basically needs a new class of
> types: "array which does not decay to a pointer". Given that, one can
> decide just how such a type can be used, and you'll also have the
> answer to all of the above.

lcc-win features this kind of array using operator overloading.

Arrays that do not decay into pointers are very useful for easily making
o bounds-checked arrays
o counted strings
and many other applications.
 
Bartc

Hallvard B Furuseth said:
> Bartc writes:
>> Hallvard B Furuseth said:
>>> (e-mail address removed) writes:
>>>> Why are the following declarations invalid in C?
>>>> int f()[];
>>>> int f()[10];
>>> (...)
>>> Or if you would do this:
>>> foo() { int *a; ... { ... a = f(); ... } }
>> *a is a pointer to int. f() returns an array of int. So there is a type
>> mismatch and this should not compile, and the problem would not arise. I
>> don't think you can reasonably expect an array returned by a function to
>> 'decay' to a pointer in the same way as a normal array.
>
> No, I don't expect that - because the language forbids "int f()[10]".
> However, yes I do expect "a = <expression of type array of int>" to be
> treated the same way regardless of where that expression comes from.
> If it did not, just what should the program be allowed to do with it?

Well, it could be assigned to an array:

*a=f(); where a already points to a suitable space. Or possibly:

int b[10];
b=f();

Or it could be passed directly to another function that takes arrays of the
same size:

g(f());

> Would &f() be OK? *&f()? f()[0]? &f()[0]?
>
> To get this to work cleanly, the language basically needs a new class of
> types: "array which does not decay to a pointer". Given that, one can
> decide just how such a type can be used, and you'll also have the
> answer to all of the above.

Actually, adding such a new type of array wouldn't be that much of a
problem. But the special interaction between newarrays and pointers wouldn't
exist in the same way as for ordinary arrays.

So setting up dynamic arrays and indexing them as though they were fixed
arrays would not be possible.

So I don't see such newarrays being popular with C aficionados. Especially
as these arrays would have to be a fixed size so would have the same
limitations as early Pascal.

A new array type needs something extra, and that would need to be flexible
bounds, or bounds that are passed around with the array. But then, this
would be taking C to a slightly different level.
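The "bounds passed around with the array" idea can be approximated in today's C with a small struct that carries a length and a pointer, which also gives the bounds-checked access Jacob mentions. A minimal sketch under that assumption; struct int_span, span_new and span_at are hypothetical names, and the check relies on assert(), so it disappears when NDEBUG is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "counted array": the length travels with the data. */
struct int_span {
    size_t len;
    int *data;
};

/* Allocates a zeroed counted array; on failure returns len == 0, data == NULL.
   The caller must free(s.data). */
struct int_span span_new(size_t len)
{
    struct int_span s = { 0, NULL };
    if (len > 0) {
        s.data = calloc(len, sizeof *s.data);
        if (s.data != NULL)
            s.len = len;
    }
    return s;                       /* copies the struct, not the elements */
}

/* Bounds-checked element access (checked only when NDEBUG is not defined). */
int *span_at(struct int_span s, size_t i)
{
    assert(i < s.len);
    return &s.data[i];              /* points into the heap block, not into s */
}
```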
 
John Bode

>> Why are the following declarations invalid in C?
>> int f()[];
>> int f()[10];
>> It would be great if anyone could also explain the design decision
>> for such a language restriction.
>> Regards,
>> Sanjay
> There is no reason. It is just a design mistake of the language.

That is, in the opinion of the very opinionated Jacob Navia.

I happen to think it was a good choice. For about 30 years now you
have been able to pass and return fixed-size arrays by value by
wrapping them in a structure.

But that just brings up the question, why can you return structs
containing arrays by value, but not arrays themselves? Surely any
*technical* issue for one is just as valid for the other; so why the
difference?

I *suspect* the reason behind the OP's query is that array objects are
non-modifiable lvalues:

int a[10], b[10];
b = a; /* illegal! */

Since you can't assign a new array value to an array type object, it
doesn't make sense to allow functions to return array types. That
would make it not a mistake in design for no good reason (per Jacob),
but a consequence of how arrays are handled by the rest of the
language. Whether "how arrays are handled by the rest of the
language" is a mistake in design or not is an open issue.
 
Keith Thompson

> Why are the following declarations invalid in C?
>
> int f()[];
> int f()[10];
>
> It would be great if anyone could also explain the design decision
> for such a language restriction.

Handling arrays as first-class types and objects turns out to be a
non-trivial language design problem.

As others have mentioned, C didn't originally allow structures to be
assigned, passed as arguments, or returned as function results.
Adding this functionality wasn't much of a problem, partly because all
structures of a given type are the same size. The only really new
thing was that some operations required copying arbitrarily large (but
fixed) amounts of data, possibly by implicitly invoking memcpy() or
something like it.

A major problem for arrays is that array names are usually converted to
pointers; defining the syntax for first-class array objects and values
without breaking existing code would be difficult.

Also, arrays of a given type can usefully have different sizes.

Let's assume that we can work around the existing code problem by
defining a new kind of array, one that doesn't decay to a pointer.

Assume that f is a function that returns an array of int. Presumably
you'd want to be able to do something like this:

int arr[100];
arr = f();

What if f returns an array of 90 ints? Or 200? Do you reject it at
compile time? If so, you have to decide when writing f *exactly*
how many elements it's going to return; that's far less flexible than
C's current (admittedly unwieldy) mechanisms. Or do you reject it
during execution? If so, how? Do you introduce an exception
mechanism? Do you just say it's undefined behavior? Or do you copy
just part of the array?
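One of the existing mechanisms Keith refers to is to let the caller pass the destination and its capacity, with the function reporting how many elements it actually has; that turns the 90-or-200 question into something the caller can test at run time. A sketch; fill_values and its truncating, snprintf-like policy are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Produces a sequence of `want` values (0, 1, 2, ...), writing at most
   `cap` of them into dst. Returns the full length of the sequence, so a
   return value greater than cap tells the caller the output was truncated. */
size_t fill_values(int *dst, size_t cap, size_t want)
{
    size_t n = want < cap ? want : cap;
    for (size_t i = 0; i < n; i++)
        dst[i] = (int)i;
    return want;
}
```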

Regardless of what syntax you use to express them, arrays are
fundamentally special in ways that other types are not. C's current
way of dealing with that complexity is not pretty, but it works.

Having said all that, you could disallow the assignment and allow
array-returning functions to be used in initializers:

int arr[] = f();

That could probably be quite useful *if* you could manage to define it
without breaking existing code.
 
jacob navia

John Bode wrote:
[snip]
> I *suspect* the reason behind the OP's query is that array objects are
> non-modifiable lvalues:
>
> int a[10], b[10];
> b = a; /* illegal! */
>
> Since you can't assign a new array value to an array type object, it
> doesn't make sense to allow functions to return array types. That
> would make it not a mistake in design for no good reason (per Jacob),
> but a consequence of how arrays are handled by the rest of the
> language. Whether "how arrays are handled by the rest of the
> language" is a mistake in design or not is an open issue.

This is just the same for structures. They have to be the same size.

It would be the same for arrays: arrays would need to be the same
size.

But then, since size information is always discarded when passing an
array to a function, this would be difficult to achieve.
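The size is lost only when the array itself is passed. A pointer to the whole array keeps the length in its type, so the compiler can reject a mismatched argument. A minimal sketch; sum10 is a hypothetical name.

```c
#include <stddef.h>

/* p points to an array of exactly 10 int, so sizeof *p is 10 * sizeof(int)
   and the element count can be recovered inside the function. */
int sum10(int (*p)[10])
{
    int total = 0;
    for (size_t i = 0; i < sizeof *p / sizeof (*p)[0]; i++)
        total += (*p)[i];
    return total;
}
```

Calling sum10(&small) where small is declared int small[5] passes an incompatible pointer type, which the compiler diagnoses; the price is that the length is fixed at 10.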

What I wanted to emphasize is that the whole handling of arrays
is completely crazy in C.
 
