Generic linked list with internal storage?

E

Eric Sosman

[...]
Generally, true. I don't even consciously notice adding a small
list implementation -- it's too trivial to notice.

I'm curious what you consider a plain vanilla linked list?

I'll assume a basic familiarity with the notions of "list"
and of "link" (if these are lacking, ask any IUT student). As
expressed in C, a list's nodes are usually structs because this
is a convenient way to package payload and link(s) together in
one easily-managed blob. Each node contains a link to its
successor and perhaps also to its predecessor. The links are
usually struct pointers, sometimes array indices (if all the
nodes inhabit a single array), with a null pointer (or a special
index value) indicating "no successor/predecessor." There'll
also be some metadata: At the very least, a link to the list's
first node and often a link to the last as well (sometimes a
pointer-to-struct, sometimes a pointer-to-pointer-to-struct).
The metadata may be free-standing, or may inhabit a special
"list header" (especially for a circular list).

I'd consider the variations covered by the above to be
"plain vanilla" linked lists. Programmers should be familiar
with -- and comfortable with -- the basic operations of inserting,
deleting, searching, and traversing such lists. These operations
are so basic, in my view, that hiding the details away behind an
interface is (1) overkill and (2) obfuscatory. It makes about
as much sense to me as

int fetchFromIntArray(const int *array, size_t index) {
return array[index];
}
void storeIntoIntArray(int *array, size_t index, int value) {
array[index] = value;
}

What things might change a list's flavor to something other
than vanilla, to some flavor that might justify more opacity? A
non-exhaustive, er, list:

- Unusual links. If the links are self-relative offsets (as
might be used in lists shared between processes that map them
to different virtual addresses), or if the stored values are
the XOR of a forward and backward link, or some other odd
encoding is used, link manipulation may become intricate
enough to justify being packaged away somewhere.

- Bit-diddling. If the bits of the links' representation are
twiddled to encode additional information (e.g., by using
a few low-order "known to be zero" bits as flags), it's
probably best to put the encoding and decoding out of sight
somewhere. (The writers of LISP interpreters seem particularly
attracted to such techniques.)

- Peculiar nodes. If the list nodes are not structs, there's
usually something slightly exotic going on, worthy of being
hidden from us easily-scandalized maiden aunts.

- Mammoth metadata. If a list carries more than the usual
amount of metadata -- a node count, a link to the node most
recently inserted (at whatever position), a link to the node
at the midmost position, whatever -- it's probably best to
let a function package take care of the maintenance.

- Additional linkage. If there are additional links, as in
a skip list or some such, the burden of maintaining them
grows to the point where pre-packaging becomes justifiable.

.... and so on. However, it doesn't seem to me that a "generic"
package is what's needed for this sort of thing; the oddities are,
well, odd, and not easily generified. If you've got a list in
shared memory, using self-relative links whose low-order bits
encode extra information about the pointed-to nodes, you'll most
likely write a few functions to manage lists of exactly that
description; the IUT Senior Thesis Generic List Package simply
won't suffice.
Perhaps if
you could describe the vanilla list feature subset, maybe by listing
functions of an existing implementation like GLib or the C++ STL list,
or is there even a function interface to it?

I'm not intimate with GLib, and I have an irrational dislike
for all things C++, so I'll take a pass on this one.
Do you feel that adding a c_compare_function_t makes it a 'chocolate
twist with jimmies' linked list?

Sorry; I don't even know what a c_compare_function_t might be.
 
I

ImpalerCore

[...]
But for
a plain vanilla linked list ... Why bother?
Generally, true.  I don't even consciously notice adding a small
list implementation -- it's too trivial to notice.
I'm curious what you consider a plain vanilla linked list?

     I'll assume a basic familiarity with the notions of "list"
and of "link" (if these are lacking, ask any IUT student).  As
expressed in C, a list's nodes are usually structs because this
is a convenient way to package payload and link(s) together in
one easily-managed blob.  Each node contains a link to its
successor and perhaps also to its predecessor.  The links are
usually struct pointers, sometimes array indices (if all the
nodes inhabit a single array), with a null pointer (or a special
index value) indicating "no successor/predecessor."  There'll
also be some metadata: At the very least, a link to the list's
first node and often a link to the last as well (sometimes a
pointer-to-struct, sometimes a pointer-to-pointer-to-struct).
The metadata may be free-standing, or may inhabit a special
"list header" (especially for a circular list).

     I'd consider the variations covered by the above to be
"plain vanilla" linked lists.  Programmers should be familiar
with -- and comfortable with -- the basic operations of inserting,
deleting, searching, and traversing such lists.  These operations
are so basic, in my view, that hiding the details away behind an
interface is (1) overkill and (2) obfuscatory.  It makes about
as much sense to me as

        int fetchFromIntArray(const int *array, size_t index) {
            return array[index];
        }
        void storeIntoIntArray(int *array, size_t index, int value) {
            array[index] = value;
        }

     What things might change a list's flavor to something other
than vanilla, to some flavor that might justify more opacity?  A
non-exhaustive, er, list:

     - Unusual links.  If the links are self-relative offsets (as
       might be used in lists shared between processes that map them
       to different virtual addresses), or if the stored values are
       the XOR of a forward and backward link, or some other odd
       encoding is used, link manipulation may become intricate
       enough to justify being packaged away somewhere.

     - Bit-diddling.  If the bits of the links' representation are
       twiddled to encode additional information (e.g., by using
       a few low-order "known to be zero" bits as flags), it's
       probably best to put the encoding and decoding out of sight
       somewhere.  (The writers of LISP interpreters seem particularly
       attracted to such techniques.)

     - Peculiar nodes.  If the list nodes are not structs, there's
       usually something slightly exotic going on, worthy of being
       hidden from us easily-scandalized maiden aunts.

     - Mammoth metadata.  If a list carries more than the usual
       amount of metadata -- a node count, a link to the node most
       recently inserted (at whatever position), a link to the node
       at the midmost position, whatever -- it's probably best to
       let a function package take care of the maintenance.

     - Additional linkage.  If there are additional links, as in
       a skip list or some such, the burden of maintaining them
       grows to the point where pre-packaging becomes justifiable..

... and so on.  However, it doesn't seem to me that a "generic"
package is what's needed for this sort of thing; the oddities are,
well, odd, and not easily generified.  If you've got a list in
shared memory, using self-relative links whose low-order bits
encode extra information about the pointed-to nodes, you'll most
likely write a few functions to manage lists of exactly that
description; the IUT Senior Thesis Generic List Package simply
won't suffice.
Perhaps if
you could describe the vanilla list feature subset, maybe by listing
functions of an existing implementation like GLib or the C++ STL list,
or is there even a function interface to it?

     I'm not intimate with GLib, and I have an irrational dislike
for all things C++, so I'll take a pass on this one.

The fact that you ignore std::list and the C++ STL, one of the best
things to happen to containers, does say something to me about your
bias. If you haven't tried to use GLib shows that you prefer to do
things your own way, which is fine by me. If you had actually tried
to use either of these libraries, and found them wanting for reasons
x, y, and z, that would be one thing. But not experiencing them first
hand makes your opinion rather difficult to digest since I have a
completely different background than you. I came from C++ to C, so
that probably describes my bias.
     Sorry; I don't even know what a c_compare_function_t might be.

Something that abstracts comparison.

typedef int (*c_compare_function_t)( const void* p, const void* q )

If you manually encode a different search and sort for each struct and
fields that you manage, that's pretty hardcore.

Best regards,
John D.
 
J

jacob navia

pete a écrit :
Phil said:
Willem said:
jacob navia wrote:
) Which of
)
) strcpy(dst,src);
)
) or
) for (int i = 0; i<=strlen(src);i++) // [I added an 'i']
) dst = src;
)
) is clearer?

Given that they do different things, you can't compare clarity.
You're missing the point, which would have been illustrated
by the following:

Which of

strcpy(src, dst);

or

for (i = 0; dst != 0; i++)
src = dst;

is clearer ?

Well, both clearly have issues, so does that make them
equally clear?


You got it backwards.
jacob navia's for loop, does what strcpy does.

Willem's for loop, doesn't write a null byte.


It is VERY easy to make mistakes in trivial function code as
this example demonstrates.

I do not doubt that willem and carmody are experienced C
programmers, but...

There is nothing more trivial that strcpy, yet even an experienced
programer will not see a small (trivial) error...

This reinforces my argument: even if lists are "trivial" to implement,
it is better to avoid reimplementing the wheel...
 
E

Eric Sosman

[...]
Sorry; I don't even know what a c_compare_function_t might be.

Something that abstracts comparison.

typedef int (*c_compare_function_t)( const void* p, const void* q )

If you manually encode a different search and sort for each struct and
fields that you manage, that's pretty hardcore.

For lists where node-to-node comparison is important (which
is not true of all lists), it seems to me that the comparison is
associated with the nodes and not with the lists/trees/treaps/...
in which they reside. I'll usually search for an "equal" node
with something like

for (ptr = head; ptr != NULL; ptr = ptr->next) {
if (compareNodes(ptr, key) == 0)
break;
}
if (ptr != NULL) ...

and not with

static int compareWrapper(const void *p, const void *q) {
return compareNodes(p, q);
}
...
ptr = searchList(head, key, compareWrapper);
if (ptr != NULL) ...

The former seems to me (eye of the beholder, don'cha know)
simpler than the latter. Safer, too, because the compiler will
squawk if I accidentally use the comparator for some different
node type.

Sorting a list (when that makes sense) is, I feel, better
done by a "generic list-sorting function" than by a "generic list
package." That is, it makes more sense to me to abstract the
sorting operation itself than to abstract the features of the
thing being sorted. I've written such a function (that I'm
inordinately proud of), and I use it whenever needed -- but I'll
still use open code to build the list in the first place, and to
navigate it after sorting. Similarly, I'll use qsort() to sort
an array, but I'll use [] rather than some function package to
access it before and after. I truly don't see why I shouldn't.
 
P

Phil Carmody

pete said:
Phil said:
Willem said:
jacob navia wrote:
) Which of
)
) strcpy(dst,src);
)
) or
) for (int i = 0; i<=strlen(src);i++) // [I added an 'i']
) dst = src;
)
) is clearer?


Given that they do different things, you can't compare clarity.


You got it backwards.
jacob navia's for loop, does what strcpy does.


char buff[]="hello world";
src=buff+1;
dest=buff;
if (flag) {
strcpy(dst, src); // UB
} else {
for (int i = 0; i<=strlen(src);i++) // not UB
dst = src;
}

UB is not the same as not UB.

Phil
 
T

Tim Rentsch

jacob navia said:
[snip]

This reinforces my argument: even if lists are "trivial" to implement,
it is better to avoid reimplementing the wheel...

Simple linked lists predate the wheel. Historically
there is some question about whether the wheel was
invented before or after circular linked lists.

(Sorry, I wouldn't resist. :)
 
N

Nick

Jef Driesen said:
From: Jef Driesen <[email protected]>
Subject: Re: Generic linked list with internal storage?
Newsgroups: comp.lang.c
Date: Fri, 09 Apr 2010 13:47:49 +0200
Organization: BELNET - The Belgian Research Network



That makes the list is non generic, and that's not what I want.

But you write generic functions that work on a simple structure with
just the next and prev pointers. You just cast your specific structures
to these, and cast back to the specific types in your callback function.

The only constraint is that you have to have the same next/prev pointers
order in any structure that the generic functions work on.

This works because all structure pointers can be cast to others, and all
structures with common initial sequences have the same alignment for
that part.
 
T

Tim Rentsch

Nick said:
But you write generic functions that work on a simple structure with
just the next and prev pointers. You just cast your specific structures
to these, and cast back to the specific types in your callback function.

The only constraint is that you have to have the same next/prev pointers
order in any structure that the generic functions work on.

This works because all structure pointers can be cast to others, and all
structures with common initial sequences have the same alignment for
that part.

Actually I'm pretty sure that's not guaranteed. That is,
even though the offsets of the corresponding members might
have to be the same, the alignment of the struct as a whole
might be different. For example, if we have

struct foo { int i; double d; char b[4]; };
struct bas { int i; double d; char b[1]; };

I believe it's allowed for the alignment of a 'struct foo'
to be 4 even though the alignment of a 'struct bas' might
be only 1. I doubt any implementations actually do this,
but it's allowed (isn't it?) and not really so far-fetched
that an implementation that had made such a choice could
go badly wrong if a (struct bas*) were being used as a
(struct foo*).
 
G

Gene

But you write generic functions that work on a simple structure with
just the next and prev pointers.  You just cast your specific structures
to these, and cast back to the specific types in your callback function..
The only constraint is that you have to have the same next/prev pointers
order in any structure that the generic functions work on.
This works because all structure pointers can be cast to others, and all
structures with common initial sequences have the same alignment for
that part.

Actually I'm pretty sure that's not guaranteed.  That is,
even though the offsets of the corresponding members might
have to be the same, the alignment of the struct as a whole
might be different.  For example, if we have

   struct foo { int i;  double d;  char b[4]; };
   struct bas { int i;  double d;  char b[1]; };

I believe it's allowed for the alignment of a 'struct foo'
to be 4 even though the alignment of a 'struct bas' might
be only 1.  I doubt any implementations actually do this,
but it's allowed (isn't it?) and not really so far-fetched
that an implementation that had made such a choice could
go badly wrong if a (struct bas*) were being used as a
(struct foo*).

But the structs having different alignments don't cause a problem for
the "generic" list implementation. The Standard requires that a
pointer to one will access the common initial fields of the other.
 
T

Tim Rentsch

Gene said:
Nick said:
Jef Driesen <[email protected]> writes:
From: Jef Driesen <[email protected]>
Subject: Re: Generic linked list with internal storage?
Newsgroups: comp.lang.c
Date: Fri, 09 Apr 2010 13:47:49 +0200
Organization: BELNET - The Belgian Research Network
On 9/04/2010 12:45, bartc wrote:
A typical (double) linked list is implemented with these two data
structures:
typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;
typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;
typedef struct mydata_t {
...
} mydata_t;
The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for the
user defined mydata_t structure.
Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data?
Why not include a node_t (or just next and prev pointers) at the start of
the mydata_t struct?
(The list_t stuff would have to be a bit different though, either a
dedicated list_t for each mydata_t, or discrete head/tail pointers of type
*mydata_t)
That makes the list is non generic, and that's not what I want.
But you write generic functions that work on a simple structure with
just the next and prev pointers. You just cast your specific structures
to these, and cast back to the specific types in your callback function.
The only constraint is that you have to have the same next/prev pointers
order in any structure that the generic functions work on.
This works because all structure pointers can be cast to others, and all
structures with common initial sequences have the same alignment for
that part.

Actually I'm pretty sure that's not guaranteed. That is,
even though the offsets of the corresponding members might
have to be the same, the alignment of the struct as a whole
might be different. For example, if we have

struct foo { int i; double d; char b[4]; };
struct bas { int i; double d; char b[1]; };

I believe it's allowed for the alignment of a 'struct foo'
to be 4 even though the alignment of a 'struct bas' might
be only 1. I doubt any implementations actually do this,
but it's allowed (isn't it?) and not really so far-fetched
that an implementation that had made such a choice could
go badly wrong if a (struct bas*) were being used as a
(struct foo*).

But the structs having different alignments don't cause a problem for
the "generic" list implementation. The Standard requires that a
pointer to one will access the common initial fields of the other.

No, it doesn't. The "common initial sequence rule" holds only when
the two structure objects in question are members of a union object
(an actual union object, not just a type relationship), and then
only when the definition of the union type in question is visible at
the point of through-other-struct-type access. Trying to do that
with two arbitrary structs like the ones above, which happen to have
the same types of members at the beginning, but that aren't part of
an actual union object having those structs as members, results in
undefined behavior.
 
S

stan

Tim said:
Simple linked lists predate the wheel. Historically
there is some question about whether the wheel was
invented before or after circular linked lists.

(Sorry, I wouldn't resist. :)

I'm pretty sure Knuth covered all of the above, in real time as they
were discovered; and exhaustively too!

(Sorry I had a lapse of judgement myself :)
 
N

Nick

Tim Rentsch said:
Gene said:
From: Jef Driesen <[email protected]>
Subject: Re: Generic linked list with internal storage?
Newsgroups: comp.lang.c
Date: Fri, 09 Apr 2010 13:47:49 +0200
Organization: BELNET - The Belgian Research Network

On 9/04/2010 12:45, bartc wrote:
A typical (double) linked list is implemented with these two data
structures:

typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;

typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;

typedef struct mydata_t {
...
} mydata_t;

The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for the
user defined mydata_t structure.

Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data?

Why not include a node_t (or just next and prev pointers) at the start of
the mydata_t struct?

(The list_t stuff would have to be a bit different though, either a
dedicated list_t for each mydata_t, or discrete head/tail pointers of type
*mydata_t)

That makes the list is non generic, and that's not what I want.

But you write generic functions that work on a simple structure with
just the next and prev pointers. You just cast your specific structures
to these, and cast back to the specific types in your callback function.

The only constraint is that you have to have the same next/prev pointers
order in any structure that the generic functions work on.

This works because all structure pointers can be cast to others, and all
structures with common initial sequences have the same alignment for
that part.

Actually I'm pretty sure that's not guaranteed. That is,
even though the offsets of the corresponding members might
have to be the same, the alignment of the struct as a whole
might be different. For example, if we have

struct foo { int i; double d; char b[4]; };
struct bas { int i; double d; char b[1]; };

I believe it's allowed for the alignment of a 'struct foo'
to be 4 even though the alignment of a 'struct bas' might
be only 1. I doubt any implementations actually do this,
but it's allowed (isn't it?) and not really so far-fetched
that an implementation that had made such a choice could
go badly wrong if a (struct bas*) were being used as a
(struct foo*).

But the structs having different alignments don't cause a problem for
the "generic" list implementation. The Standard requires that a
pointer to one will access the common initial fields of the other.

No, it doesn't. The "common initial sequence rule" holds only when
the two structure objects in question are members of a union object
(an actual union object, not just a type relationship), and then
only when the definition of the union type in question is visible at
the point of through-other-struct-type access. Trying to do that
with two arbitrary structs like the ones above, which happen to have
the same types of members at the beginning, but that aren't part of
an actual union object having those structs as members, results in
undefined behavior.

I asked, with examples, about this a while ago, because I wanted to be
sure I was legal, and was assured I was. Anyone else got a view?

Can I do this (just typed in, not compiled or tested):

struct generic_list {
struct generic_list *next;
};

typedef void operate_function(void *list_item, void *args);

void iterate_over_list(void *list_param, operate_function *fn,
void *args) {
struct generic_list *list = list_param;
while(list) {
fn(list,args);
list = list->next;
}
}

And use this to operate on any list where the first item is a normal
"next" pointer?

[as in another thread, this is an example where I actually might well do
(*fn)(list,args)]

For the record I do, and it works. But I know how little that counts
for.
 
S

Seebs

I asked, with examples, about this a while ago, because I wanted to be
sure I was legal, and was assured I was. Anyone else got a view?

I am pretty sure that the compiler would have to go out of its way
to break anything; consensus of the committee has been that "all
struct pointers smell the same" -- meaning, as long as you really do have
something as an initial sequence, the fact that it could have been in
a union somewhere in another module means that it has to work as though
it is in fact in a union somewhere.

Consider a pair of struct types, struct a and struct b, which have
a common initial subsequence. And consider:

union c { struct a a; struct b b; };
and
void unionize(struct a *a, struct b *b) {
union c c;
if (a)
c->a = *a;
if (b)
c->b = *b;
};

Since that can exist, and unions have to hold objects in their unaltered
form, and you have to be able to access the common initial subsequence
in a and b, a compiler has to work VERY HARD to not work if you
pass a pointer to struct a or b, suitably cast, to a function expecting
their common initial subsequence.

I think the intent is probably that it is supposed to work. It's an easier
pitch in the case where you have a separate struct for the common initial
subsequence, because it is definitely the case that a pointer to a structure,
suitably converted, is a valid pointer to its initial member, and vice
versa.

-s
 
O

Oliver Jackson

pete said:
Phil said:
jacob navia wrote:
) Which of
)
)     strcpy(dst,src);
)
) or
)     for (int i = 0; i<=strlen(src);i++)  // [I added an 'i']
)         dst = src;
)
) is clearer?
Given that they do different things, you can't compare clarity.

You got it backwards.
jacob navia's for loop, does what strcpy does.

char buff[]="hello world";
src=buff+1;
dest=buff;
if (flag) {
  strcpy(dst, src); // UB} else {

  for (int i = 0; i<=strlen(src);i++) // not UB
    dst = src;

}

UB is not the same as not UB.


That's retarded. The strcpy call invokes undefined behavior only
because the standard doesn't spell out strcpy's precise
implementation. Anyway, your your protest has absolutely nothing to
do with the point Jacob was making, which has to do with accidentally
passing arguments to a function in the wrong order.
 
G

Gene

Gene said:
From: Jef Driesen <[email protected]>
Subject: Re: Generic linked list with internal storage?
Newsgroups: comp.lang.c
Date: Fri, 09 Apr 2010 13:47:49 +0200
Organization: BELNET - The Belgian Research Network
On 9/04/2010 12:45, bartc wrote:
A typical (double) linked list is implemented with these two data
structures:
typedef struct node_t {
    node_t *prev;
    node_t *next;
    void *data;
} node_t;
typedef struct list_t {
    node_t *head;
    node_t *tail;
} list_t;
typedef struct mydata_t {
    ...
} mydata_t;
The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for the
user defined mydata_t structure.
Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data?
Why not include a node_t (or just next and prev pointers) at the start of
the mydata_t struct?
(The list_t stuff would have to be a bit different though, either a
dedicated list_t for each mydata_t, or discrete head/tail pointers of type
*mydata_t)
That makes the list is non generic, and that's not what I want.
But you write generic functions that work on a simple structure with
just the next and prev pointers.  You just cast your specific structures
to these, and cast back to the specific types in your callback function.
The only constraint is that you have to have the same next/prev pointers
order in any structure that the generic functions work on.
This works because all structure pointers can be cast to others, and all
structures with common initial sequences have the same alignment for
that part.
Actually I'm pretty sure that's not guaranteed.  That is,
even though the offsets of the corresponding members might
have to be the same, the alignment of the struct as a whole
might be different.  For example, if we have
   struct foo { int i;  double d;  char b[4]; };
   struct bas { int i;  double d;  char b[1]; };
I believe it's allowed for the alignment of a 'struct foo'
to be 4 even though the alignment of a 'struct bas' might
be only 1.  I doubt any implementations actually do this,
but it's allowed (isn't it?) and not really so far-fetched
that an implementation that had made such a choice could
go badly wrong if a (struct bas*) were being used as a
(struct foo*).
But the structs having different alignments don't cause a problem for
the "generic" list implementation.  The Standard requires that a
pointer to one will access the common initial fields of the other.

No, it doesn't.  The "common initial sequence rule" holds only when
the two structure objects in question are members of a union object
(an actual union object, not just a type relationship), and then
only when the definition of the union type in question is visible at
the point of through-other-struct-type access.  Trying to do that
with two arbitrary structs like the ones above, which happen to have
the same types of members at the beginning, but that aren't part of
an actual union object having those structs as members, results in
undefined behavior.- Hide quoted text -

- Show quoted text -

You're right. Thanks for the correction. I'll delete the earlier
posting.
 
G

Gene

I am pretty sure that the compiler would have to go out of its way
to break anything; consensus of the committee has been that "all
struct pointers smell the same" -- meaning, as long as you really do have
something as an initial sequence, the fact that it could have been in
a union somewhere in another module means that it has to work as though
it is in fact in a union somewhere.

Consider a pair of struct types, struct a and struct b, which have
a common initial subsequence.  And consider:

        union c { struct a a; struct b b; };
and
        void unionize(struct a *a, struct b *b) {
                union c c;
                if (a)
                        c->a = *a;
                if (b)
                        c->b = *b;
        };

Since that can exist, and unions have to hold objects in their unaltered
form, and you have to be able to access the common initial subsequence
in a and b, a compiler has to work VERY HARD to not work if you
pass a pointer to struct a or b, suitably cast, to a function expecting
their common initial subsequence.

This is the argument I remember, and it made sense to me. However,
then I read this:
http://coding.derkeiler.com/Archive/C_CPP/comp.lang.c/2010-03/msg00220.html
which seems to offer a meaningful counterexample. Here it is for
convenience:

#include <stdio.h>
struct common { int a, b; };
struct big_struct { struct common com; double c; };
struct small_struct { struct common com; };

int main(void)
{
struct big_struct big;
struct small_struct *otherp = (void *)&big;
big.com.a = 0;
otherp->com.a = 11;
printf("big.com.a = %d\n", big.com.a);
return 0;
}

Ben says his gcc prints 0 (rather than 11 as mine does).
I think the intent is probably that it is supposed to work.  It's an easier
pitch in the case where you have a separate struct for the common initial
subsequence, because it is definitely the case that a pointer to a structure,
suitably converted, is a valid pointer to its initial member, and vice
versa.

But hypothetically, the conversion could involve pointer arithmetic,
no? If so, even this is, again, a not-certain-to-be-portable way to
implement generic lists, the OP's question.
 
S

Seebs

#include <stdio.h>
struct common { int a, b; };
struct big_struct { struct common com; double c; };
struct small_struct { struct common com; };

int main(void)
{
struct big_struct big;
struct small_struct *otherp = (void *)&big;
big.com.a = 0;
otherp->com.a = 11;
printf("big.com.a = %d\n", big.com.a);
return 0;
}

Ben says his gcc prints 0 (rather than 11 as mine does).

The interesting question, I guess, is whether simply declaring the
union (even if you don't use it) changes this. Or, for that matter,
having the addresses of one or more of these taken and passed
to something outside this module.
But hypothetically, the conversion could involve pointer arithmetic,
no?

So far as I can tell, no -- the pointer to the initial member has to
be the same as the pointer to the whole structure/union. (We spent a
while looking for this, but it's defined under the equality operator.)
If so, even this is, again, a not-certain-to-be-portable way to
implement generic lists, the OP's question.

Yeah. I think the problem is that strict aliasing rules have to be
subverted, and I think some compilers are smart enough to check whether
or not you've demonstrated that they're being subverted.

-s
 
T

Tim Rentsch

Nick said:
Tim Rentsch said:
Gene said:
From: Jef Driesen <[email protected]>
Subject: Re: Generic linked list with internal storage?
Newsgroups: comp.lang.c
Date: Fri, 09 Apr 2010 13:47:49 +0200
Organization: BELNET - The Belgian Research Network

On 9/04/2010 12:45, bartc wrote:
A typical (double) linked list is implemented with these two data
structures:

typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;

typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;

typedef struct mydata_t {
...
} mydata_t;

The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for the
user defined mydata_t structure.

Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data?

Why not include a node_t (or just next and prev pointers) at the start of
the mydata_t struct?

(The list_t stuff would have to be a bit different though, either a
dedicated list_t for each mydata_t, or discrete head/tail pointers of type
*mydata_t)

That makes the list is non generic, and that's not what I want.

But you write generic functions that work on a simple structure with
just the next and prev pointers. You just cast your specific structures
to these, and cast back to the specific types in your callback function.

The only constraint is that you have to have the same next/prev pointers
order in any structure that the generic functions work on.

This works because all structure pointers can be cast to others, and all
structures with common initial sequences have the same alignment for
that part.

Actually I'm pretty sure that's not guaranteed. That is,
even though the offsets of the corresponding members might
have to be the same, the alignment of the struct as a whole
might be different. For example, if we have

struct foo { int i; double d; char b[4]; };
struct bas { int i; double d; char b[1]; };

I believe it's allowed for the alignment of a 'struct foo'
to be 4 even though the alignment of a 'struct bas' might
be only 1. I doubt any implementations actually do this,
but it's allowed (isn't it?) and not really so far-fetched
that an implementation that had made such a choice could
go badly wrong if a (struct bas*) were being used as a
(struct foo*).

But the structs having different alignments don't cause a problem for
the "generic" list implementation. The Standard requires that a
pointer to one will access the common initial fields of the other.

No, it doesn't. The "common initial sequence rule" holds only when
the two structure objects in question are members of a union object
(an actual union object, not just a type relationship), and then
only when the definition of the union type in question is visible at
the point of through-other-struct-type access. Trying to do that
with two arbitrary structs like the ones above, which happen to have
the same types of members at the beginning, but that aren't part of
an actual union object having those structs as members, results in
undefined behavior.

I asked, with examples, about this a while ago, because I wanted to be
sure I was legal, and was assured I was. Anyone else got a view?

Can I do this (just typed in, not compiled or tested):

struct generic_list {
struct generic_list *next;
};

typedef void operate_function(void *list_item, void *args);

void iterate_over_list(void *list_param, operate_function *fn,
void *args) {
struct generic_list *list = list_param;
while(list) {
fn(list,args);
list = list->next;
}
}

And use this to operate on any list where the first item is a normal
"next" pointer? [snip]

Technically, it's undefined behavior. Practically speaking,
my guess is it will work on every implementation you're
likely to encounter. However, it's easy to devise
a perverse-but-conforming implementation where it would
fail.

I'm not sure how much this helps you, but if you follow
up with a more specific question I'll try to give a
more specific answer.
 
S

Seebs

The "common initial sequence rule" is required to work _only_ if
the struct object in question is contained in an actual union
object of the appropriate type. Not just if the union _type_
exists and is visible, but if the struct _object_ is actually in
a union _object_. If it's a freestanding struct, not contained
in a union object of a type that includes both structs, all bets
are off.

This surprises me, although I think you're right.

Speaking only for myself, I would *prefer* that the spec simply
give the guarantee that, if the types of the first N members of
two structure types are the same types in the same order, then you
can use a pointer to one as a pointer to the other so long as
you only access those members. In practice, it's what people
seem to mostly expect, and I don't think the lost optimizations
will be all that big a deal. In particular, "restrict" lets us
specify the thing we're most likely to care about -- that the
two arguments to a function which are both pointers to structures
necessarily point to different structures.

-s
 
T

Tim Rentsch

Seebs said:
I am pretty sure that the compiler would have to go out of its way
to break anything; consensus of the committee has been that "all
struct pointers smell the same" -- meaning, as long as you really do have
something as an initial sequence, the fact that it could have been in
a union somewhere in another module means that it has to work as though
it is in fact in a union somewhere.

Consider a pair of struct types, struct a and struct b, which have
a common initial subsequence. And consider:

union c { struct a a; struct b b; };
and
void unionize(struct a *a, struct b *b) {
union c c;
if (a)
c->a = *a;
if (b)
c->b = *b;
};

Of course you meant c.a and c.b, not c->a and c->b.
Since that can exist, and unions have to hold objects in their unaltered
form, and you have to be able to access the common initial subsequence
in a and b, a compiler has to work VERY HARD to not work if you
pass a pointer to struct a or b, suitably cast, to a function expecting
their common initial subsequence.

Practically speaking casting one struct pointer type to another
will almost certainly work for any members that happen to have
the same offset (and members in the "common leading sequence"
certainly will). However technically it's still undefined
behavior. In fact even just doing the cast of the pointer is
undefined behavior, since the two structures aren't guaranteed to
have the same alignment requirements; pointers to structures all
have the same alignment, but the structures they point to don't
have to.

I think the intent is probably that it is supposed to work. It's an easier
pitch in the case where you have a separate struct for the common initial
subsequence, because it is definitely the case that a pointer to a structure,
suitably converted, is a valid pointer to its initial member, and vice
versa.

The "common initial sequence rule" is required to work _only_ if
the struct object in question is contained in an actual union
object of the appropriate type. Not just if the union _type_
exists and is visible, but if the struct _object_ is actually in
a union _object_. If it's a freestanding struct, not contained
in a union object of a type that includes both structs, all bets
are off.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,905
Latest member
Kristy_Poole

Latest Threads

Top