Inflexible array members

Adam Warner

Hi all,

One cannot return a pointer to an array type in C because C has no first
class array types. But one can return a pointer to a struct containing an
incomplete array via the illegal but widely supported zero array struct
hack:

#include <stdlib.h>

typedef struct byte_vector_t byte_vector_t;

struct byte_vector_t {
    unsigned char byte[0];
};

int main() {
    byte_vector_t *byte_vector=malloc(10);
    byte_vector->byte[9]=42;
    return 0;
}

It is frequently stated that flexible array members are a substitute for
the zero array struct hack. Let's see (by compiling file array.c below
with GNU C):

#include <stdlib.h>

typedef struct byte_vector_t byte_vector_t;

struct byte_vector_t {
    unsigned char byte[];
};

int main() {
    byte_vector_t *byte_vector=malloc(10);
    byte_vector->byte[9]=42;
    return 0;
}

$ gcc -std=c99 array.c
array.c:6: error: flexible array member in otherwise empty struct

GCC refuses to compile this code because C99 states (6.7.2.1,
paragraph 2):

A structure or union shall not contain a member with incomplete or
function type (hence, a structure shall not contain an instance of
itself, but may contain a pointer to an instance of itself), except
that the last member of a structure with more than one named member may
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
have incomplete array type; such a structure (and any union containing,
possibly recursively, a member that is such a structure) shall not be a
member of a structure or an element of an array.

This restriction is repeated in paragraph 16.
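
For comparison, here is a minimal sketch of a struct that satisfies the constraint quoted above by carrying a length header before the flexible array member. The length member and the byte_vector_new helper are assumptions added for illustration; they are not part of the original post.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct byte_vector_t byte_vector_t;

/* Hypothetical variant: "length" is the extra named member that
   C99 6.7.2.1 requires before a flexible array member. */
struct byte_vector_t {
    size_t length;          /* number of elements available in byte[] */
    unsigned char byte[];   /* C99 flexible array member */
};

/* Hypothetical helper: allocate a vector with room for n bytes.
   Returns NULL if the size would overflow or malloc fails. */
byte_vector_t *byte_vector_new(size_t n) {
    byte_vector_t *v;
    if (n > SIZE_MAX - sizeof *v)
        return NULL;
    v = malloc(sizeof *v + n);
    if (v != NULL)
        v->length = n;
    return v;
}
```

With this layout, byte_vector_new(10) yields an object whose byte[0] through byte[9] may be accessed, at the cost of the header the question hoped to avoid.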

The struct types are intended to serve as self-documenting code and stop
me from incorrectly indexing a pointer to a single value (and vice versa).
Typedefs also lead to self-documenting code but provide no additional
compile-time type safety.

Is this restriction upon flexible array members sensible? The core syntax
and semantics are better because one should not index past the elements of
an array (and a zero sized array has no elements). Is there a standard way
I can maintain the extra type safety of structs with an incomplete array
member without having to pad those structs with a non-zero sized header?

Many thanks,
Adam
 
Guest

Adam said:
Hi all,

One cannot return a pointer to an array type in C because C has no first
class array types.

Sure you can. Typically, you won't really want to, but it's possible.

typedef char array[10];
array *f(void) {
    /* ... */
}

You can also write it without a typedef as

char (*f(void))[10] {
    /* ... */
}

You can also leave out the size.

typedef char array[];
array *f(void) {
    /* ... */
}

or

char (*f(void))[] {
    /* ... */
}

The rest of your message seems irrelevant after that.
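
As a rough illustration of the declarations above, the following sketch fills in a body for the typedef-free form. The name make_array and the use of calloc are assumptions made for the example only.

```c
#include <stdlib.h>

/* Hypothetical function: returns a pointer to an array of 10 char,
   or NULL if allocation fails. calloc zero-fills the array. */
char (*make_array(void))[10] {
    return calloc(1, sizeof(char[10]));
}
```

A caller would write char (*p)[10] = make_array(); and then access elements as (*p)[i]. Because the pointed-to type is a complete array type, sizeof *p is 10.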
 
Adam Warner

Adam said:
Hi all,

One cannot return a pointer to an array type in C because C has no first
class array types.

Sure you can. Typically, you won't really want to, but it's possible.

typedef char array[10];
array *f(void) {
    /* ... */
}

You can also write it without a typedef as

char (*f(void))[10] {
    /* ... */
}

You can also leave out the size.

typedef char array[];
array *f(void) {
    /* ... */
}

or

char (*f(void))[] {
    /* ... */
}

Excellent! Thank you for the correction.

#include <stdint.h>
#include <stdlib.h>

typedef uint8_t octet_vector_t[];

int main() {
    octet_vector_t *vector=malloc(10);
    (*vector)[9]=42;
    return 0;
}

Please let me know if there's a way to avoid the explicit dereferencing
syntax for every array access.

Regards,
Adam
 
Guest

Adam said:
Excellent! Thank you for the correction.

Please let me know if there's a way to avoid the explicit dereferencing
syntax for every array access.

"Typically, you won't really want to." If you use a pointer to an
array's first element instead of one to the whole array, which is what
most functions accepting or returning arrays do, you won't have that
problem. Is there a reason that is not an option for you?
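
A small sketch of that approach, assuming a typedef for the element type rather than for the array type. The names octet_t, octet_vector_new and octet_demo are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element typedef: the typedef names one octet,
   not an array of them. */
typedef uint8_t octet_t;

/* Returns a pointer to the first of n zero-initialised octets,
   or NULL if allocation fails. */
octet_t *octet_vector_new(size_t n) {
    return calloc(n, sizeof(octet_t));
}

/* Plain subscripting works on the returned pointer;
   no (*vector)[i] syntax is needed. */
int octet_demo(void) {
    octet_t *vector = octet_vector_new(10);
    int value;
    if (vector == NULL)
        return -1;
    vector[9] = 42;
    value = vector[9];
    free(vector);
    return value;
}
```

The trade-off is that the pointer type no longer says anything about how many elements it points to.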
 
Guest

Harald said:
"Typically, you won't really want to." If you use a pointer to an
array's first element instead of one to the whole array, which is what
most functions accepting or returning arrays do, you won't have that
problem. Is there a reason that is not an option for you?

To clarify, I mean with typedefs.
 
Adam Warner

To clarify, I mean with typedefs.

From my perspective a type system should distinguish between a scalar and
a vector and not permit one to be misused as the other without an explicit
type cast (though a scalar can be abstracted as the first element of a
vector of length 1).

The declaration "int *var;" is conceptually a pointer to a scalar of type
int or a pointer to a vector of type int. If it's a scalar it should never
be positively indexed as an array yet var[12345] will silently compile.

So let's explicitly define scalars as vectors of length 1:

#include <stdlib.h>

typedef unsigned char byte_t[1];

int main() {
    byte_t *b=malloc(1);
    (*b)[1]=123;
    return 0;
}

$ gcc -std=c99 -Wall -Wextra array.c
$

This compiles without a single warning that the static indexing is out of
bounds.

There is a solution: If every scalar is defined within a struct then the
scalar cannot be mistaken for an array of scalars. This does have
implications for inefficient ABIs that return all structs, no matter how
small, via pointers to memory.
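
A minimal sketch of the struct-wrapped scalar, with the hypothetical names byte_scalar_t and set_byte. The wrapper makes subscripting the scalar itself a constraint violation, although pointer arithmetic on a pointer to the struct (for example b[1].value) is still accepted by the language.

```c
/* Hypothetical wrapper: a single scalar inside a struct. */
typedef struct {
    unsigned char value;
} byte_scalar_t;

/* Stores v in the wrapped scalar. */
void set_byte(byte_scalar_t *b, unsigned char v) {
    b->value = v;
    /* (*b)[1] = v;      would not compile: *b is a struct, not an array */
    /* b->value[1] = v;  would not compile: value is not an array or pointer */
}
```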

Regards,
Adam
 
Guest

Adam said:
From my perspective a type system should distinguish between a scalar and
a vector and not permit one to be misused as the other without an explicit
type cast (though a scalar can be abstracted as the first element of a
vector of length 1).

Ah, sorry, I don't think that's possible in standard C. Pointer
arithmetic is allowed for any pointer to a complete object type,
there's no way around that other than by not using a complete object
type, which is not an appropriate solution in most cases.
 
Mark McIntyre

The declaration "int *var;" is conceptually a pointer to a scalar of type
int or a pointer to a vector of type int.

Er, no it's not. It's conceptually a pointer to a block of memory
containing objects of type int.

It's not a pointer to a vector of anything.

If it's a scalar it should never be positively indexed as an array yet var[12345] will silently compile.

Sure, because what it points to is a block of memory.

(snip example of array bounds overrun)

This compiles without a single warning that the static indexing is out of
bounds.

So? It's the programmer's responsibility not to break the rules.

There is a solution:

Indeed there are many. There is however a penalty, often quite a
severe one.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Adam Warner

(snip example of array bounds overrun)

Here's that code example again:

#include <stdlib.h>

typedef unsigned char byte_t[1];

int main() {
    byte_t *b=malloc(1);
    (*b)[1]=123;
    return 0;
}

$ gcc -std=c99 -Wall -Wextra array.c
$

So? It's the programmer's responsibility not to break the rules.

Here we have an explicit type definition that byte_t is an unsigned char
array of length 1 and you don't even care about silent compilation of a
STATICALLY OBVIOUS buffer overrun.

Third parties suffer from programmers unintentionally breaking rules.
C programming is like having a car with the speedometer removed and a gas
pedal that keeps getting stuck. Yet drivers of these cars are mystified
why others don't want them on the road.

Regards,
Adam
 
Richard Heathfield

Adam Warner said:

Here we have an explicit type definition that byte_t is an unsigned char
array of length 1 and you don't even care about silent compilation of a
STATICALLY OBVIOUS buffer overrun.

The compiler is under no obligation to compile it silently. It is free to
issue a diagnostic message. QoI issue.

Third parties suffer from programmers unintentionally breaking rules.

Yes indeed. That's because too many programmers haven't a clue what they're
doing.

C programming is like having a car with the speedometer removed and a gas
pedal that keeps getting stuck.

Like Formula 1 cars, you mean? They don't have speedometers. Whilst the gas
pedal doesn't actually stick as such, you could be forgiven for thinking
so. They can really go some, can't they?

Yet drivers of these cars are mystified why others don't want them on the
road.

When you need to get from A to B in a tearing hurry, you need a fast car
/and/ a safe driver. If you are employing programmers who need to be told
by the compiler not to write to the second byte in a single-byte array, I
suggest you fire them and get someone bright instead.
 
Adam Warner

Adam Warner said:



The compiler is under no obligation to compile it silently. It is free to
issue a diagnostic message. QoI issue.


Yes indeed. That's because too many programmers haven't a clue what they're
doing.

I suggest there is dysfunction in the programming community and low
quality of implementation is a symptom. I hope you enjoyed my
light-hearted analogy.

Like Formula 1 cars, you mean? They don't have speedometers. Whilst the gas
pedal doesn't actually stick as such, you could be forgiven for thinking
so. They can really go some, can't they?


When you need to get from A to B in a tearing hurry, you need a fast car
/and/ a safe driver. If you are employing programmers who need to be told
by the compiler not to write to the second byte in a single-byte array, I
suggest you fire them and get someone bright instead.

Thank you for proving my point. Formula 1 cars are not permitted upon
public highways.

Regards,
Adam

PS: A programming language that incorporates levels of compilation safety
can be safer by default but just as fast (and dangerous) when you are in a
tearing hurry.
 
Keith Thompson

Adam Warner said:
From my perspective a type system should distinguish between a scalar and
a vector and not permit one to be misused as the other without an explicit
type cast (though a scalar can be abstracted as the first element of a
vector of length 1).

The declaration "int *var;" is conceptually a pointer to a scalar of type
int or a pointer to a vector of type int. If it's a scalar it should never
be positively indexed as an array yet var[12345] will silently compile.

I'm afraid you must be talking about some language other than C.

In C, array indexing is defined in terms of pointer arithmetic; x[y]
is merely a shorthand for (*x+y).

You might not like it, but it's a fundamental feature of the language.

See section 6 of the comp.lang.c FAQ, <http://www.c-faq.com/>, for
more information.
 
Richard Heathfield

Adam Warner said:
I suggest there is dysfunction in the programming community and low
quality of implementation is a symptom.

Yes, that's true - but the quality of implementation of C compilers tends to
be extremely good. The fact that a compiler does not diagnose something it
is not required to diagnose does not indicate that it is a poor
implementation.

I hope you enjoyed my light-hearted analogy.

Sure, but remember that analogies are only illustrations of a point. They
cannot /prove/ a point. If they could, two people could prove opposite
points simply by choosing conflicting analogies. As Bjarne Stroustrup once
said, "proof by analogy is fraud".

Thank you for proving my point. Formula 1 cars are not permitted upon
public highways.

"Proof by analogy is fraud." - Bjarne Stroustrup.

Computers are not public highways.

PS: A programming language that incorporates levels of compilation safety
can be safer by default but just as fast (and dangerous) when you are in a
tearing hurry.

You are confusing the language with implementations of that language.
 
Adam Warner

You are confusing the language with implementations of that language.

BTW I'm not. Java is an example of a language definition where array
bounds checking is mandated. One cannot disable array bounds checking and
be programming within the semantics of Java the language. For example one
must be able to traverse an array using an otherwise infinite index count
by catching an array index out of bounds exception when the array is
referenced out of range.

Common Lisp is an example of a language that incorporates levels of
compilation safety. There is a concept of safe code. Extra run time checks
are performed upon safe code to help ensure the integrity of the virtual
machine. Correct code will run with the same semantics at any level of
safety, including the disabling of bounds checking. Furthermore the
compilation settings can be locally tuned. For example, once one is sure
that a critical inner loop is correct, safety can be lowered for that
particular region of code.
<http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_s.htm#safe>
<http://www.lispworks.com/documentation/HyperSpec/Body/d_optimi.htm#optimize>

C the language does not incorporate the concept of safe code nor the
localisation of unsafe code. It is no accident implementations eschew code
safety.

Regards,
Adam
 
Joe Wright

Keith said:
Adam Warner said:
From my perspective a type system should distinguish between a scalar and
a vector and not permit one to be misused as the other without an explicit
type cast (though a scalar can be abstracted as the first element of a
vector of length 1).

The declaration "int *var;" is conceptually a pointer to a scalar of type
int or a pointer to a vector of type int. If it's a scalar it should never
be positively indexed as an array yet var[12345] will silently compile.

I'm afraid you must be talking about some language other than C.

In C, array indexing is defined in terms of pointer arithmetic; x[y]
is merely a shorthand for (*x+y).

You might not like it, but it's a fundamental feature of the language.

See section 6 of the comp.lang.c FAQ, <http://www.c-faq.com/>, for
more information.

Not (*x+y) but *(x+y) and the difference is non-trivial.
 
Keith Thompson

Joe Wright said:
Keith said:
Adam Warner said:
From my perspective a type system should distinguish between a scalar and
a vector and not permit one to be misused as the other without an explicit
type cast (though a scalar can be abstracted as the first element of a
vector of length 1).

The declaration "int *var;" is conceptually a pointer to a scalar of type
int or a pointer to a vector of type int. If it's a scalar it should never
be positively indexed as an array yet var[12345] will silently compile.

I'm afraid you must be talking about some language other than C.

In C, array indexing is defined in terms of pointer arithmetic; x[y]
is merely a shorthand for (*x+y).

You might not like it, but it's a fundamental feature of the
language.

See section 6 of the comp.lang.c FAQ, <http://www.c-faq.com/>, for
more information.

Not (*x+y) but *(x+y) and the difference is non-trivial.

Oops. You're right, of course. It was a typo; thank you for catching
it.
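
To make the difference concrete, here is a short sketch contrasting the two expressions. The function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* x[y] is defined as *(x + y): step y elements, then dereference. */
int index_by_definition(const int *x, size_t y) {
    return *(x + y);
}

/* (*x + y) dereferences first, then adds y to the value of x[0]. */
int mistyped_form(const int *x, int y) {
    return *x + y;
}
```

For int a[3] = {10, 20, 30}, index_by_definition(a, 2) yields 30 (the same as a[2]), while mistyped_form(a, 2) yields 12.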
 
Richard Heathfield

Adam Warner said:
BTW I'm not.

In C terms, you are.

C the language does not incorporate the concept of safe code nor the
localisation of unsafe code. It is no accident implementations eschew code
safety.

That's the programmer's job. Code safety is far too important to be left to
the compiler. Programmers should take responsibility for ensuring their
code is safe.
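
One way a programmer might take on that responsibility in standard C is sketched below, assuming the length is carried alongside the data. The names checked_bytes_t and checked_set are hypothetical; assert and NDEBUG are standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pairing of a buffer with its element count. */
typedef struct {
    unsigned char *data;
    size_t length;
} checked_bytes_t;

/* Stores v at index i. The assert aborts on an out-of-range index
   unless the file is compiled with NDEBUG defined, in which case
   the check is removed entirely. */
void checked_set(checked_bytes_t b, size_t i, unsigned char v) {
    assert(i < b.length);
    b.data[i] = v;
}
```

The check is a run-time one and is opt-in per call site, so it does not give the compile-time guarantee asked for earlier in the thread.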
 
Mark McIntyre

BTW I'm not.

I'm afraid you are. Worse, you confuse your examples. First you
mention a language which has no levels of compilation safety, and
mandates bounds-checking. That's fine, but it's a different language,
and pays a penalty for these features. Then you mention a language in
which some implementations permit you to mark code as safe. That's fine
too, but it's an implementation feature. Again that's fine, but again
there's a penalty.

C the language does not incorporate the concept of safe code nor the
localisation of unsafe code.

I disagree, but I suspect it'd be a bit like explaining to an
agoraphobic why it's quite safe to stand on a hilltop.

It is no accident implementations eschew code safety

Not merely a non sequitur, but a troll.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Adam Warner

Both you and Richard are snipping too much context. I responded to
Richard's response to my postscript:

PS: A programming language that incorporates levels of compilation
safety can be safer by default but just as fast (and dangerous) when
you are in a tearing hurry.

I'm afraid you are. Worse, you confuse your examples. First you
mention a language which has no levels of compilation safety, and
mandates bounds-checking. That's fine, but it's a different language,
and pays a penalty for these features.

Exactly! Point 1: I am not confusing a language with implementations
of that language because the Java Language Specification mandates bounds
checking semantics:
<http://java.sun.com/docs/books/jls/third_edition/html/arrays.html#10.4>

All array accesses are checked at run time; an attempt to use an index
that is less than zero or greater than or equal to the length of the
array causes an ArrayIndexOutOfBoundsException to be thrown.

This is an existence proof that safety can be mandated by a language
specification rather than being purely a property of implementations.

Point 2: You were supposed to realise there is a penalty to be paid for
this feature. Which is why I then mentioned another language specification
that provides for levels of compilation safety so that implementations can
support both safe code and code without, for example, bounds checking.
In contrast Java the language dictates that out of bounds indexing is
perfectly legal program semantics so one cannot indiscriminately elide
bounds checking from correct Java programs.

Then you mention a language in which some implementations permit you to
mark code as safe. That's fine too, but it's an implementation feature.
Again that's fine, but again there's a penalty.

The notion of safe and hence unsafe code is also part of the language
specification I cited. I am not confused about the difference between a
programming language specification and the implementation of a programming
language.

I disagree, but I suspect it'd be a bit like explaining to an
agoraphobic why it's quite safe to stand on a hilltop.

Not merely a non sequitur, but a troll.

Yet I still inhabit reality where the culture of C-is-a-portable-assembly-
language invariably eschews code safety. In your previous reply in this
thread you didn't even care about a basic static type definition check
that would have warned about a potential buffer overrun. A warning with no
runtime overhead implications and yet you still didn't care.

Indifference to code safety is a signal to implementers. Future
standardisation as a distillation of industry best practice will be
affected if there is no emphasis upon development of implementation
extensions that enhance code safety.

Regards,
Adam
 
