Generic linked list with internal storage?

Tim Rentsch · Apr 13, 2010

Seebs said:
The interesting question, I guess, is whether simply declaring the
union (even if you don't use it) changes this.

Declaring a union type (including defining its members) makes
no difference. Only if the struct objects in question are
actually inside an actual union object does that provision
come into play.

Or, for that matter,
having the addresses of one or more of these taken and passed
to something outside this module.

Practically speaking that might have an effect, but the
technical answer is still undefined behavior. The second line
of main(), putting '(void*)&big' into 'otherp', is already
undefined behavior; however, even if that works (as indeed
it is likely to) the accesses through 'otherp' are also
undefined behavior, for the reason you later point out --
an object of one type is being accessed through a different
(and not an otherwise allowed) type.

So far as I can tell, no -- the pointer to the initial member has to
be the same as the pointer to the whole structure/union. (We spent a
while looking for this, but it's defined under the equality operator.)

Yeah. I think the problem is that strict aliasing rules have to be
subverted, and I think some compilers are smart enough to check whether
or not you've demonstrated that they're being subverted.

With some minor exceptions, it's always undefined behavior
to access an object of one type using a different type
if the object isn't in a union object that has both
types in it. Everyone knows about the exceptions for
accessing through (char*), and there a a couple of
others. But independent structs never qualify, outside
of the case where they are both in a union and the
actual object is in an actual union object.

Nick Keighley · Apr 14, 2010

On 4/10/2010 6:10 PM, ImpalerCore wrote:

I'll assume a basic familiarity with the notions of "list"
and of "link" (if these are lacking, ask any IUT student). As
expressed in C, a list's nodes are usually structs because this
is a convenient way to package payload and link(s) together in
one easily-managed blob. Each node contains a link to its
successor and perhaps also to its predecessor. The links are
usually struct pointers, sometimes array indices (if all the
nodes inhabit a single array), with a null pointer (or a special
index value) indicating "no successor/predecessor." There'll
also be some metadata: At the very least, a link to the list's
first node and often a link to the last as well (sometimes a
pointer-to-struct, sometimes a pointer-to-pointer-to-struct).
The metadata may be free-standing, or may inhabit a special
"list header" (especially for a circular list).

I'd consider the variations covered by the above to be
"plain vanilla" linked lists. Programmers should be familiar
with -- and comfortable with -- the basic operations of inserting,
deleting, searching, and traversing such lists.

yes they /should/ but are they actually?

These operations
are so basic, in my view, that hiding the details away behind an
interface is (1) overkill and (2) obfuscatory. It makes about
as much sense to me as

int fetchFromIntArray(const int *array, size_t index) {
return array[index];
}
void storeIntoIntArray(int *array, size_t index, int value) {
array[index] = value;
}

I disagree. Accessing a list is /not/ as simple as accessing an array.
If you'd spent as much time as I have debugging other peoples' buggy
list implementaions you'd be much more in favour of a list library.
It's one of the things that attracted me to C++, or at least to the
STL. I also object to application inter-mingled with list manipulation
code.

find_matching_call();

shouldn't be replaced with inline list walking code. Hey one day we
may decide the over-head of walking a list is too expensive and switch
to a tree or hash.

This is called abstraction and there should be separation of domain
and implementation.

Sorry; I don't even know what a c_compare_function_t might be.

me neither

--

In the development of the understanding of complex phenomena,
the most powerful tool available to the human intellect is
abstraction. Abstraction arises from the recognition of similarities
between certain objects, situations, or processes in the real world
and the decision to concentrate on these similarities and to ignore,
for the time being, their differences.
- C.A.R. Hoare

Nick Keighley · Apr 14, 2010

On Apr 11, 8:40 am, Eric Sosman <[email protected]> wrote:

[vanilla list]

Something that abstracts comparison.

typedef int (*c_compare_function_t)( const void* p, const void* q )

If you manually encode a different search and sort for each struct and
fields that you manage, that's pretty hardcore.

I've worked on a application that did this a *lot* (hand coded
searches)
It induced nausea

ImpalerCore · Apr 14, 2010

On Apr 11, 8:40 am, Eric Sosman <[email protected]> wrote:

Click to expand...

[vanilla list]

Something that abstracts comparison.

Click to expand...

typedef int (*c_compare_function_t)( const void* p, const void* q )

Click to expand...

If you manually encode a different search and sort for each struct and
fields that you manage, that's pretty hardcore.

Click to expand...

I've worked on a application that did this a *lot* (hand coded
searches)
It induced nausea

Interesting. How severe were the symptoms?

The kinds of comparison functions I use the most are of the form

\code
int c_vstrcmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;

return c_strcmp( s1, s2 );
}

int c_intcmp( const void* p, const void* q )
{
const int* i1 = (const int*)p;
const int* i2 = (const int*)q;

return ( *i1 > *i2 ) - ( *i1 < *i2 );
}

/* Not usable for sorting, but for searching I find it quite useful.
*/
int vstrstrcmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;

return strstr( s1, s2 ) ? 0 : -1;
}
\endcode

For struct types, there are specific comparison functions for each
field I need to search for. This is from one of the samples that I
used to test my list implementation.

\code
struct departure_record_tag
{
int id;
char destination[20];
char airline[20];
int flight_number;
char time[10];
};
typedef struct departure_record_tag departure_record;

departure_record departure_data[] =
{
{ 1, "Atlanta", "Delta", 1963, "12:00 PM" },
{ 2, "Atlanta", "AirTran Airways", 189, "12:30 PM" },
{ 3, "Boston", "Delta", 6650, "12:00 PM" },
{ 4, "Boston", "American", 4480, "13:00 PM" },
...
}

int departure_compare_by_destination( const void* p, const void* q )
{
const departure_record* dr1 = (const departure_record*)p;
const departure_record* dr2 = (const departure_record*)q;

return c_vstrcmp( dr1->destination, dr2->destination );
}

int departure_compare_by_destination( const void* p, const void* q )
{
const departure_record* dr1 = (const departure_record*)p;
const departure_record* dr2 = (const departure_record*)q;

return c_vstrcmp( dr1->airline, dr2->airline );
}

int departure_compare_by_flight_number( const void* p, const void* q )
{
const departure_record* dr1 = (const departure_record*)p;
const departure_record* dr2 = (const departure_record*)q;

return c_intcmp( &dr1->flight_number, &dr2->flight_number );
}
\endcode

There are also some comparison functions I use for searching that
don't require instantiating a departure_record to use. I find them to
be convenient, but they are not necessary.

\code
/* Not usable for sorting. */
int search_departure_by_airline( const void* p, const void* q )
{
const departure_record* dr = (const departure_record*)p;
const char* airline = (const char*)q;

return c_vstrcmp( dr->airline, airline );
}

c_list_t* departures = NULL;
c_list_t* l;

/* Fill up departures with records. */

l = c_list_search( departures, "Delta",
search_departures_by_airline );
if ( l ) {
/* Do something with it. */
}
\endcode

The alternative is hand-coding an iterative loop for each specific
search, while I don't find it nausea inducing, I do find it carpal
tunnel inducing for the easy stuff. For the harder stuff, I still
write hand-written loops. And by harder I mean where I have more than
simple equality to look for, like 'eq|ne|lt|gt|lteq|gteq' or multiple
fields. I wish I could figure out a way to do a tcpdump like filter
string that I could use, but it seems quite a bit out of my league.

Just some more random thoughts in the pile.

Best regards,
John D.

Michael Angelo Ravera · Apr 14, 2010

Yes, it might. In fact, almost certainly will, if the code has
any significant longevity.

It is possible to do something along the lines of what you're
asking about. Very briefly, the list "header" has an additional
member that stores a byte offset of where the node value is
stored, and what numeric value is stored in this offset member is
computed using one of the standard techniques for determining
alignment of a type. Doing this reliably and portably requires a
lot of attention to details regarding not only alignment but also
the rules about storing into structure members and padding bytes.
(There are some other details about what type is being used and
what interface to supply to read or store the data, but I'm
skipping over these.)

A more significant question, I think, is are you really sure you
want to do this, given that it seems like you aren't completely
sure of how to do it so it works? It's easy to sidestep all the
difficulties, just by doing two allocations. Until you know how
to avoid the problems, and avoid them reliably, you're probably
not in the best position to decide whether the single allocation
approach is a good one. So I would suggest either just doing
two allocations, or making learning how to implement a "data
and list header together" approach a pre-requisite to deciding
whether you do, in fact, actually want to use such an approach
in your program(s).- Hide quoted text -

- Show quoted text -

Assuming that the data contents are self-described, you SHOULD be able
to create a doublely linked list header structure that looks like
this:
typedef struct list_head
{
list_head * next;
list_head * prev;
grand_union data [0];
} list_head;

grand_union should be a union of all of the types with any alignment
issues something like:
typedef union grand_union
{
char *pc;
void *pv;
char c;
long l;
long long ll;
float f;
long float lf;
/* Include any other types that might have alignment issues here */
} grand_union;

In theory, the compiler should position the union within the struct in
such a way that all alignment issues are handled properly and the size
of the list_head struct will now make it possible to align on any of
the types included in grand_union (including a pointer, if that turns
you one).

The zero size allocation *IS* however, by some compilers, believed to
be nonstandard and will cost you a warning. If it works, you'd
allocate as follows:
new_node = malloc (sizeof (list_head) + mysize);

If you want to be completely standard, you can make the space
allocation be data [1] and use this allocation instead:
new_node = malloc (offsetof (list_head, data) + mysize);

If you have included a "mydatatype mdt" in the grand_union definiton,
you may allocate as follows:

new_node = malloc (offsetof (list_head, data.mdt[1]));

If your data is an array of long floats with num_cells occurances,
you could allocated as follows:

new_node = malloc (offsetof (list_head, data.lf[num_cells]));

Get the idea?

Eric Sosman · Apr 14, 2010

[...]
Assuming that the data contents are self-described, you SHOULD be able
to create a doublely linked list header structure that looks like
this:
typedef struct list_head
{
list_head * next;
list_head * prev;
grand_union data [0];
} list_head;

grand_union should be a union of all of the types [...]

The zero size allocation *IS* however, by some compilers, believed to
be nonstandard and will cost you a warning.

s/believed/correctly believed/
s/warning/diagnostic/

The C99 "flexible array member" is a way to do this cleanly.

If it works, you'd
allocate as follows:
new_node = malloc (sizeof (list_head) + mysize);

If you want to be completely standard, you can make the space
allocation be data [1] and use this allocation instead:
new_node = malloc (offsetof (list_head, data) + mysize);

For "completely standard," mysize must be greater than or
equal to sizeof(grand_union), even if the data to be stored
there is smaller. Why? Because the compiler has been told
that an entire `struct list_head' is allocated, and is therefore
at liberty to make references to any byte thereof. If some of
those bytes aren't there, you're out of luck.

Yes, I've seen this happen, even in a "slightly cleaner"
context. The program had a pile of struct types with common
initial sequences and different-sized continuations, and a union
that held all of them. A function took a pointer to the union
type, looked at a type code in the common initial part to decide
what kind of struct was actually present, and switch'ed to type-
specific code that dealt only with the "really-there" member:

struct foo { int type; int data; };
struct bar { int type; char data[42]; };
...
union all { struct foo foo; struct bar bar; ... };

void func(union all *up) {
switch(up->foo.type) {
case FOO:
... use up->foo ...
break;
case BAR:
... use up->bar ...
break;
...
}
}

On one platform, the program would get SIGSEGV (rarely; this
was very difficult to reproduce). Eventually, three of us figured
it out: The function was called with a pointer to a small-sized
`struct foo' that just happened to be allocated at the very end
of mapped memory, and the optimizing compiler ("knowing" that the
entire large-sized `union all' was present) had decided to do a
speculative pre-fetch of a toward-the-end field, inserting the
pre-fetch before the switch. And the field offset, being greater
than sizeof(struct foo), put the access off the end of memory ...

"If you lie to the compiler, it will get its revenge."
-- Henry Spencer

Flattening linked list problem	4	Dec 3, 2024
OT twalk	5	Sep 17, 2010
list = list->Next = malloc(sizeof(node));	4	May 25, 2007
How does a HEAD pointer end up pointing to the first node in a linked list?	3	Jan 24, 2023
Linked-list problem - compiles but segfaults	13	Jul 4, 2011
linked list	19	Nov 21, 2009
Stack using doubly linked list	1	Jun 1, 2011
Thread-Safe Generic List Queue in C	7	Mar 16, 2014

Generic linked list with internal storage?

Tim Rentsch

Nick Keighley

Nick Keighley

ImpalerCore

Michael Angelo Ravera

Eric Sosman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads