Generic linked list with internal storage?

Jef Driesen · Apr 8, 2010

Hi,

A typical (double) linked list is implemented with these two data
structures:

typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;

typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;

typedef struct mydata_t {
...
} mydata_t;

The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for
the user defined mydata_t structure.

Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data? Thus I don't want to tie my list to
the mydata_t structure like this:

typedef struct node_t {
node_t *prev;
node_t *next;
mydata_t data;
} node_t;

Maybe allocate storage for the node_t and the mydata_t structure in a
single block. Something like this:

node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
mydata_t *data = node + sizeof (node_t);

But I think this might cause trouble due to alignment issues?

Thanks,

Jef

Malcolm McLean · Apr 8, 2010

node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
mydata_t *data = node + sizeof (node_t);

But I think this might cause trouble due to alignment issues?

Yes. In practice it's unlikely that a structure consisting only of
pointers would break alignment, and even less likely that a couplet of
two pointers would do so. But it is possible and allowed by the
standard.
You can still store and extract the data by treating it as raw bytes,
but that's not very efficient.

MaciekL · Apr 8, 2010

Hi,

A typical (double) linked list is implemented with these two data
structures:

typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;

typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;

typedef struct mydata_t {
...
} mydata_t;

You can ommit aligment issue with follwowing example:

typedef struct node_mydata_t {
node_t node;
mydata_t mydata;
};

void function(void)
{
node_mydata_t * node_mydata = malloc(sizeof(*node_mydata));
node_t * node = &(node_mydata->node); /* same address as
node_mydata - node is a first element in structure */
node->data = &(node_mydata->mydata)
...
}

Best Regards

MaciekL · Apr 8, 2010

typedef struct node_mydata_t {
node_t node;
mydata_t mydata;
};

fixed:
typedef struct node_mydata_t {
node_t node;
mydata_t mydata;
} node_mydata_t;

jacob navia · Apr 8, 2010

Jef Driesen a écrit :

Hi,

A typical (double) linked list is implemented with these two data
structures:

typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;

typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;

typedef struct mydata_t {
...
} mydata_t;

The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for
the user defined mydata_t structure.

Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data? Thus I don't want to tie my list to
the mydata_t structure like this:

typedef struct node_t {
node_t *prev;
node_t *next;
mydata_t data;
} node_t;

Maybe allocate storage for the node_t and the mydata_t structure in a
single block. Something like this:

node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
mydata_t *data = node + sizeof (node_t);

But I think this might cause trouble due to alignment issues?

Thanks,

Jef

In the containers library I am working I added a field "ElementSize" to
the list header (your list_t), and when I allocate a node I allocate with:

node_t *n = malloc(sizeof(node)+list->ElementSize);

To assign the data I do:

memcpy(node->data,object,list->ElementSize);

The node structure is declared exactly like yours but at the end I
differ with:

typedef struct node_t {
node_t *prev;
node_t *next;
char data[];
} node_t;

Note that this is using C99.

Malcolm McLean · Apr 8, 2010

You can ommit aligment issue with follwowing example:

typedef struct node_mydata_t {
node_t node;
mydata_t mydata;

};

Then the linked list isn't generic.
What we need is some sort of way of specifying padding to the next
alignment point. Let's say, to make it complicated, that pointers are
4 bytes whilst long doubles need an alignment of ten (the strictest
alignment).

Position Bytes Contents
0 4 prev
4 4 next
8 2 padding
10 any arbitrary data (eg, a long double)

We need some portable way of inserting the two padding bytes. Im not
sure that this is possible in ANSI C.

ImpalerCore · Apr 8, 2010

Hi,

A typical (double) linked list is implemented with these two data
structures:

typedef struct node_t {
node_t *prev;
node_t *next;
void *data;

} node_t;

typedef struct list_t {
node_t *head;
node_t *tail;

} list_t;

typedef struct mydata_t {
...

} mydata_t;

The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for
the user defined mydata_t structure.

That is one of the drawbacks of using a void* in a linked list.
However, since the linked list nodes using void* are always the same
size (12 bytes assuming 32-bit pointers), this can be somewhat
mitigated by using some memory pooling technique. The GLib
implementation has a "memory slice" that it uses to obtain linked
list, but I've never done any measurements to see what the performance
effect of it is.

Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data? Thus I don't want to tie my list to
the mydata_t structure like this:

typedef struct node_t {
node_t *prev;
node_t *next;
mydata_t data;

} node_t;

This is not a generic linked list, its a specific linked list for
mydata_t. You need some generic mechanism like a void* or a resizable
byte array.

Maybe allocate storage for the node_t and the mydata_t structure in a
single block. Something like this:

node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
mydata_t *data = node + sizeof (node_t);

But I think this might cause trouble due to alignment issues?

Since this is in reference to a generic list, there may be alignment
issues in general. However, for many structures, you can use the
technique and get away with it, it's a caveat of the technique. I
know very little about alignment issues in general, but many memory
debuggers use a similar technique to pad a memory allocation with a
struct to maintain properties of the memory returned, like the size of
the allocation, start and end addresses for fence checking. You may
be able to glean how they handle alignment problems from their
source. DMalloc is one memory debugger that you could inspect the
code to see if they have something that you could use towards this
effect.

Thanks,

Jef

While the separate memory allocation cost of a linked list node with a
void* and its corresponding structure can be a major disadvantage,
there are some *major advantages* to using a void* in a generic linked
list interface.

1. Using function pointers to abstract behavior on the structure
pointed to by void*.

The two main function pointer types I'm referring to are these.

typedef void (*c_free_function_t)( void* );
typedef int (*c_compare_function_t)( const void* p, const void* q );

Consider what I use for my linked list free function.

\code
void c_list_free( c_list_t* list, c_free_function_t free_fn )
{
c_list_t* node;

while ( list )
{
node = list;
list = list->next;

if ( free_fn ) {
(*free_fn)( node->object );
}
c_free( node );
}
}
\endcode

The free_fn parameter abstracts what is required to properly destroy
the list node object. You can think of free_fn as a parallel to a C++
object destructor. If the linked list is responsible for the objects
it maintains, all you need to do to clean up the list is pass the
right destructor function. If the linked list is pointing to char*
arrays allocated with 'malloc', you can pass 'free' as the argument
and it will release each node's memory. If the structure is more
complicated, (for instance I have an auto-allocating structure called
c_string_t that maintains a dynamically allocated buffer), I can pass
a destructor specific to that c_string_t object. If the linked list
is a shallow reference to data, NULL can be passed to just destroy the
nodes themselves. I find this function convenient since I don't have
to write an iterating loop to release node resources.

While the free function is the appetizer, the comparison function is
the main course. It allows you to abstract list operations that
require comparisons. Typically this involves some kind of sorting,
but there other creative uses for them as well.

Consider a function to insert a generic object into a sorted linked
list (crosses my fingers that the code doesn't wrap too badly).

\code
c_list_t* c_list_insert_sorted( c_list_t* list,
void* object,
c_compare_function_t cmp_fn )
{
c_list_t* tmp_list = list;
c_list_t* node = NULL;
int cmp;

if ( cmp_fn )
{
/* c_new is a macro wrapper around malloc */
node = c_new( c_list_t, 1 );
if ( node )
{
node->object = object;
node->prev = NULL;
node->next = NULL;

if ( !list ) {
return node;
}

/*
* When defining the comparison function, it's second argument
is
* required to be the same type as the first, which should be
the
* structure pointed to by the list node's object pointer.
*/
cmp = (*cmp_fn)( object, tmp_list->object );

while ( (tmp_list->next) && (cmp > 0) )
{
tmp_list = tmp_list->next;
cmp = (*cmp_fn)( object, tmp_list->object );
}

if ( (!tmp_list->next) && (cmp > 0) )
{
tmp_list->next = node;
node->prev = tmp_list;
return list;
}

if ( tmp_list->prev )
{
tmp_list->prev->next = node;
node->prev = tmp_list->prev;
}
node->next = tmp_list;
tmp_list->prev = node;

if ( tmp_list == list ) {
return node;
}
}
}

return list;
}
\endcode

I can use this engine to insert any kind of object into a homogenous
list as long as I have some method of defining an order. Here are a
couple of comparison functions you could use to sort strings that use
char*.

int c_vstrcmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;

return c_strcmp( s1, s2 );
}

int vstrlencmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;

int slen1 = (int)strlen( s1 );
int slen2 = (int)strlen( s2 );

return slen1 - slen2;
}

Note that one orders strings alphanumerically while the other orders
strings by their length.

\code
int main( void )
{
c_list_t* grocery_list = NULL;
c_list_t* l;
size_t i;
char* s;

char* food_items[] = {
"flour", "milk", "butter", "sugar", "bread", "apples", "eggs"
};

for ( i = 0; i < C_ARRAY_N( food_items ); ++i )
{
s = c_strdup( food_items );
if ( s ) {
grocery_list = c_list_insert_sorted( grocery_list, s,
c_vstrcmp );
}
}

printf( "Grocery List:" );
for ( l = grocery_list; l != NULL; l = l->next ) {
printf( " %s", (char*)l->object );
}
printf( "\n" );

c_list_free( grocery_list, c_free );

return EXIT_SUCCESS;
}
\endcode

\output
Grocery List: apples bread butter eggs flour milk sugar
\endoutput

2. The other main advantage of using a void* or equivalent is the
ability to create *shallow copies* of the list.

Tying the data structure into the linked list node makes creating
views of your data really expensive. When the data is referenced by a
void*, all I need to do to create a linked list that is a shallow
reference to another linked list is to copy the pointer. I don't want
to reallocate a potentially big struct tied to the linked list node in
order to make a different view. Here is my function for creating
shallow copy of a linked list.

\code
c_list_t* c_list_copy( c_list_t* list )
{
c_list_t* new_list = NULL;
c_list_t* node;
c_list_t* last;
c_bool list_alloc_failed;

if ( list )
{
new_list = c_new( c_list_t, 1 );
if ( new_list )
{
new_list->object = list->object;
new_list->prev = NULL;
last = new_list;
list = list->next;

list_alloc_failed = FALSE;
while ( list && !list_alloc_failed )
{
last->next = c_new( c_list_t, 1 );
if ( last->next )
{
last->next->prev = last;
last = last->next;
last->object = list->object;
list = list->next;
}
else {
list_alloc_failed = TRUE;
}
}

last->next = NULL;

if ( list_alloc_failed )
{
while ( new_list )
{
node = new_list;
new_list = new_list->next;

c_free( node );
}
new_list = NULL;
}
}
}

return new_list;
}
\endcode

\code
int main( void )
{
c_list_t* donut_recipe = NULL;
c_list_t* low_cal;
c_list_t* l;
size_t i;
char* s;

char* ingredients[] = {
"dry yeast", "water", "milk", "shortening", "sugar", "salt",
"all-purpose flour", "eggs", "vanilla extract"
};
char* high_calories[] = { "shortening", "sugar", "eggs" };

for ( i = 0; i < C_ARRAY_N( ingredients ); ++i )
{
s = c_strdup( ingredients );
if ( s ) {
donut_recipe = c_list_insert_sorted( donut_recipe, s,
c_vstrcmp );
}
}

low_cal = c_list_copy( donut_recipe );

for ( i = 0; i < C_ARRAY_N( high_calories ); ++i ) {
low_cal = c_list_remove_if( low_cal, high_calories, c_vstrcmp,
NULL );
}

printf( "---Donut Recipe (low calorie)---\n" );
for ( l = low_cal; l != NULL; l = l->next ) {
printf( "%s\n", (char*)l->object );
}
printf( "\n" );

printf( "---Donut Recipe---\n" );
for ( l = donut_recipe; l != NULL; l = l->next ) {
printf( "%s\n", (char*)l->object );
}

c_list_free( low_cal, NULL );
c_list_free( donut_recipe, c_free );

return EXIT_SUCCESS;
}
\endcode

The output of the above program is

\output
---Donut Recipe (low calorie)---
all-purpose flour
dry yeast
milk
salt
vanilla extract
water

---Donut Recipe---
all-purpose flour
dry yeast
eggs
milk
salt
shortening
sugar
vanilla extract
water
\endoutput

Note that I removed some ingredients from the original donut recipe
without *modifying* or *copying* the original data. This is done
through the 'c_list_remove_if' function that is defined by the
following.

/*!
* \brief Remove all objects from a \c c_list_t that satisfy the
* comparison function \c cmp_fn. If none of the objects
* match, the \c c_list_t is unchanged.
* \param list A \c c_list_t.
* \param object The object to remove.
* \param cmp_fn The function to call for each object comparison. It
* should return 0 when the objects match.
* \param free_fn The list object free function.
* \return The new start of the \c c_list_t.
*/
c_list_t* c_list_remove_if( c_list_t* list, void* object,
c_compare_function_t cmp_fn, c_free_function_t free_fn );

Let's examine the call to c_list_remove_if a bit further

\code
low_cal = c_list_remove_if( low_cal, high_calories, c_vstrcmp,
NULL );
\endcode

The low_cal is the start of the linked list.
The high_calories is a specific string to search for using the
'c_vstrcmp' function to determine equality.
The last argument is NULL because this is a shallow copy of the list
data.

At the end, the 'low_cal' list view is freed using
'c_list_free( low_cal, NULL )' because the list doesn't own the memory
its pointing to.

These views can be anything from sorting records by a particular
field, removing nodes that don't match a constraint, all maintained
for the low-low cost of creating a list node.

I'm obviously slanted towards using the void* to implement a linked
list in a generic way. I think the linked list with the contained
data have a place, but they should be used for specific, not generic
applications. And by specific, I mean that if the generic list is too
slow, and can be verified via profiling, then go about creating a
custom linked list for the situation.

Best regards,
John D.

bartc · Apr 9, 2010

Jef Driesen said:
Hi,

A typical (double) linked list is implemented with these two data
structures:

typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;

typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;

typedef struct mydata_t {
...
} mydata_t;

The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for the
user defined mydata_t structure.

Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data? Thus I don't want to tie my list to the
mydata_t structure like this:

typedef struct node_t {
node_t *prev;
node_t *next;
mydata_t data;
} node_t;

Maybe allocate storage for the node_t and the mydata_t structure in a
single block. Something like this:

node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
mydata_t *data = node + sizeof (node_t);

Why not include a node_t (or just next and prev pointers) at the start of
the mydata_t struct?

(The list_t stuff would have to be a bit different though, either a
dedicated list_t for each mydata_t, or discrete head/tail pointers of type
*mydata_t)

Jef Driesen · Apr 9, 2010

That is one of the drawbacks of using a void* in a linked list.
However, since the linked list nodes using void* are always the same
size (12 bytes assuming 32-bit pointers), this can be somewhat
mitigated by using some memory pooling technique. The GLib
implementation has a "memory slice" that it uses to obtain linked
list, but I've never done any measurements to see what the performance
effect of it is.

Implementing memory pooling is a little overkill for my little app.

Since this is in reference to a generic list, there may be alignment
issues in general. However, for many structures, you can use the
technique and get away with it, it's a caveat of the technique. I
know very little about alignment issues in general, but many memory
debuggers use a similar technique to pad a memory allocation with a
struct to maintain properties of the memory returned, like the size of
the allocation, start and end addresses for fence checking. You may
be able to glean how they handle alignment problems from their
source. DMalloc is one memory debugger that you could inspect the
code to see if they have something that you could use towards this
effect.

I want to avoid non-portable code if possible. It's not that I don't
want to use external storage with void* pointers. I was just wondering
if it was possible to avoid it.

While the separate memory allocation cost of a linked list node with a
void* and its corresponding structure can be a major disadvantage,
there are some *major advantages* to using a void* in a generic linked
list interface.

1. Using function pointers to abstract behavior on the structure
pointed to by void*.

The two main function pointer types I'm referring to are these.

typedef void (*c_free_function_t)( void* );
typedef int (*c_compare_function_t)( const void* p, const void* q );

[... lots of interesting code ...]

One of the disadvantages with the external storage is that to destroy
the list, you have to walk the list and destroy each data item first.
This free function is a nice workaround.

2. The other main advantage of using a void* or equivalent is the
ability to create *shallow copies* of the list.

[... lots of interesting code ...]

I don't really need this functionality at the moment.

I'm obviously slanted towards using the void* to implement a linked
list in a generic way. I think the linked list with the contained
data have a place, but they should be used for specific, not generic
applications. And by specific, I mean that if the generic list is too
slow, and can be verified via profiling, then go about creating a
custom linked list for the situation.

I highly doubt the external storage will be too slow. Part of my
question was simply to find out whether it was possible to implement
something like what I proposed, but without running into problems due to
alignment, non-portable behavior, etc.

Jef Driesen · Apr 9, 2010

Why not include a node_t (or just next and prev pointers) at the start of
the mydata_t struct?

(The list_t stuff would have to be a bit different though, either a
dedicated list_t for each mydata_t, or discrete head/tail pointers of type
*mydata_t)

That makes the list is non generic, and that's not what I want.

SG · Apr 9, 2010

jacob said:
In the containers library I am working I added a field "ElementSize" to
the list header (your list_t), and when I allocate a node I allocate with:

node_t *n = malloc(sizeof(node)+list->ElementSize);

To assign the data I do:

memcpy(node->data,object,list->ElementSize);

The node structure is declared exactly like yours but at the end I
differ with:

typedef struct node_t {
node_t *prev;
node_t *next;
char data[];
} node_t;

Note that this is using C99.

But how do you access the data stored in that node? Say, you want to
store a list of doubles. Do you extract the double by memcopying the
bytes back into a double variable or do you write something like
(double*)(&node->data[0])? Alignment seems to be a problem in the
latter case which is IMHO unfortunate. OK, you could try to compute
how much padding would be necessary, store this offset in the "list
header" and allocate a couple of more bytes, I suppose.

If I remember correctly the Linux kernel uses something like this:

struct listlinks_t {
struct listlinks_t *prev;
struct listlinks_t *next;
};

#define containerof(ptr,type,member) \
((type*)((char*)ptr - offsetof(type,member)))

struct mli { /* mli = "my list item", concrete example */
struct listlinks_t listlinks;
double value;
};

(I didn't test the code, but I think the idea is clear)

where the list data structure only deals with listlinks_t structs and
alters some pointers of type listlinks_t*. When the user wants to
access the data of one list node he/she writes something like this:

struct listlinks_t *node = ...;
struct mli *p = containerof(node,struct mli,listlinks);
p->value;

The user manages the memory of the list items him/herself. In other
places the Linux kernel uses its own "kobject model" which includes
reference counting and something I would describe as "virtual
destructors" which allows the data structure (i.e. "kset") to take
control of the life-time management of objects.

IMHO, this is a nice solution -- for C code.

At first glimpse, it shares many similarities with what std::list<T>
can give you in C++. The only practical difference I see here is that
in C you could accidentally use the wrong type/member "parameters" for
the containerof macro. This is because the type information is lost
during the pointer arithmetic/casting hacks and the compiler can't
help you spot these kinds of errors for that reason. The compiler
"trusts" the programmer to get it right. In a way this reminds me of
the early Java days where containers' elements stored "references to
objects" which required manual casting in user code. Java's solution
to that problem was "generics". So, Java has a solution for this. C++
has a solution for this. What about C? Are you [not specifically you,
Jacob, but everyone here who's interested in a generic container
library written in C] comfortable with these kinds of pointer casts
and pointer arithmetics? Since there's no kind of runtime type
checking to see if a cast is valid at runtime nor is it desirable to
do something like this per default in a supposedly generic container
library for overhead reasons, wouldn't it make sense to get a little
help from the compiler via type checking at compile-time? I currently
don't see how this could be done in C, unfortunately. But maybe it's
just me. Maybe I'm overrating type-safety.

Cheers,
SG

SG · Apr 9, 2010

ImpalerCore said:
While the separate memory allocation cost of a linked list node with a
void* and its corresponding structure can be a major disadvantage,
there are some *major advantages* to using a void* in a generic linked
list interface.

If you go for the first approach (data embedded in the "node object")
it allows you to store a void pointer as data. So, while the second
approach /forces/ you to do one extra allocation, the first approach
is more flexible in that it allows you to emulate the behaviour of the
other approach (by storing pointers as data).

Cheers,
SG

Malcolm McLean · Apr 9, 2010

jacob navia wrote:

wouldn't it make sense to get a little
help from the compiler via type checking at compile-time? I currently
don't see how this could be done in C, unfortunately.

You could do it by creating a variable type caled "type", to hold type
values like "int", or "struct fred". It would be a simple enumeration
with a field for levels of indirection.
The vast majority of the use would be in constants that would
propagate.

SG · Apr 9, 2010

jacob said:
jacob said:

[...]

Click to expand...

[...] wouldn't it make sense to get a little
help from the compiler via type checking at compile-time? I
currently don't see how this could be done in C, unfortunately.

Click to expand...

(I meant: ...without adding new language features)

You could do it by creating a variable type caled "type", to hold type
values like "int", or "struct fred". It would be a simple enumeration
with a field for levels of indirection.
The vast majority of the use would be in constants that would
propagate.

Care to explain this a little more? Are you talking about a
hypothetical language feature? Does what you suggest help at compile-
time or at run-time? How does it help?

Cheers,
SG

ImpalerCore · Apr 9, 2010

If you go for the first approach (data embedded in the "node object")
it allows you to store a void pointer as data. So, while the second
approach /forces/ you to do one extra allocation, the first approach
is more flexible in that it allows you to emulate the behaviour of the
other approach (by storing pointers as data).

I will agree that using my generic linked list approach where the list
node uses a void*, I'm not able to avoid the extra allocation, (at
least not without some macro-ey and clever hackery). In a kernel
where things are intended to be very tight, this is a use-case where
the extra allocation would be highly frowned upon.

When trying to develop a linked list for my application, I looked at a
few popular implementations. The thing that caught my eye the most
was GLib's, primarily because of their use of a function pointer to
abstract sorting and search on the linked list via a comparison
function pointer. The concept was easily to understand, and I decided
to use their approach as a starting point. The linux kernel's
implementation was clever, but since it lacked an obvious method to
search and sort a generic list, I was pulled towards the GLib
implementation. Since I didn't find that the linux kernel's version
had a method to abstract sorting and searching, I didn't view it's
other benefits that strongly. But if someone did the work to
integrate a comparison function interface into the linux linked list
interface, it would get my attention.

Best regards,
John D.

ImpalerCore · Apr 9, 2010

That is one of the drawbacks of using a void* in a linked list.
However, since the linked list nodes using void* are always the same
size (12 bytes assuming 32-bit pointers), this can be somewhat
mitigated by using some memory pooling technique. The GLib
implementation has a "memory slice" that it uses to obtain linked
list, but I've never done any measurements to see what the performance
effect of it is.

Click to expand...

Implementing memory pooling is a little overkill for my little app.

Since this is in reference to a generic list, there may be alignment
issues in general. However, for many structures, you can use the
technique and get away with it, it's a caveat of the technique. I
know very little about alignment issues in general, but many memory
debuggers use a similar technique to pad a memory allocation with a
struct to maintain properties of the memory returned, like the size of
the allocation, start and end addresses for fence checking. You may
be able to glean how they handle alignment problems from their
source. DMalloc is one memory debugger that you could inspect the
code to see if they have something that you could use towards this
effect.

Click to expand...

I want to avoid non-portable code if possible. It's not that I don't
want to use external storage with void* pointers. I was just wondering
if it was possible to avoid it.

While the separate memory allocation cost of a linked list node with a
void* and its corresponding structure can be a major disadvantage,
there are some *major advantages* to using a void* in a generic linked
list interface.

Click to expand...

1. Using function pointers to abstract behavior on the structure
pointed to by void*.

Click to expand...

The two main function pointer types I'm referring to are these.

Click to expand...

typedef void (*c_free_function_t)( void* );
typedef int (*c_compare_function_t)( const void* p, const void* q );

Click to expand...

[... lots of interesting code ...]

Click to expand...

One of the disadvantages with the external storage is that to destroy
the list, you have to walk the list and destroy each data item first.
This free function is a nice workaround.

2. The other main advantage of using a void* or equivalent is the
ability to create *shallow copies* of the list.

Click to expand...

[... lots of interesting code ...]

Click to expand...

I don't really need this functionality at the moment.

I'm obviously slanted towards using the void* to implement a linked
list in a generic way. I think the linked list with the contained
data have a place, but they should be used for specific, not generic
applications. And by specific, I mean that if the generic list is too
slow, and can be verified via profiling, then go about creating a
custom linked list for the situation.

Click to expand...

I highly doubt the external storage will be too slow. Part of my
question was simply to find out whether it was possible to implement
something like what I proposed, but without running into problems due to
alignment, non-portable behavior, etc.

Just curious, what kinds of list sizes are you talking about?

The linux kernel linked list implementation may be something you can
go for if you don't need an abstraction for searching and sorting the
linked list. You may find its approach more useful if efficiency is a
high priority. It's an interesting implementation to study.

I honestly don't have enough experience to know if what you're
intending is portable. I use that technique you mention to create a
special allocator wrapper that keeps track of the number of malloc
function calls used and then return NULL to insert allocation faults
into my code to test response to allocation errors.

Someone with more experience may be able to give more of the pitfalls
of the technique, but I haven't been exposed to an environment where
that technique has caused a problem yet.

Best regards,
John D.

ImpalerCore · Apr 9, 2010

jacob said:
jacob said:

In the containers library I am working I added a field "ElementSize" to
the list header (your list_t), and when I allocate a node I allocate with:

Click to expand...

node_t *n = malloc(sizeof(node)+list->ElementSize);

Click to expand...

To assign the data I do:

The node structure is declared exactly like yours but at the end I
differ with:

Click to expand...

typedef struct node_t {
node_t *prev;
node_t *next;
char data[];
} node_t;

Click to expand...

Note that this is using C99.

Click to expand...

But how do you access the data stored in that node? Say, you want to
store a list of doubles. Do you extract the double by memcopying the
bytes back into a double variable or do you write something like
(double*)(&node->data[0])? Alignment seems to be a problem in the
latter case which is IMHO unfortunate. OK, you could try to compute
how much padding would be necessary, store this offset in the "list
header" and allocate a couple of more bytes, I suppose.

If I remember correctly the Linux kernel uses something like this:

struct listlinks_t {
struct listlinks_t *prev;
struct listlinks_t *next;
};

#define containerof(ptr,type,member) \
((type*)((char*)ptr - offsetof(type,member)))

struct mli { /* mli = "my list item", concrete example */
struct listlinks_t listlinks;
double value;
};

(I didn't test the code, but I think the idea is clear)

where the list data structure only deals with listlinks_t structs and
alters some pointers of type listlinks_t*. When the user wants to
access the data of one list node he/she writes something like this:

struct listlinks_t *node = ...;
struct mli *p = containerof(node,struct mli,listlinks);
p->value;

The user manages the memory of the list items him/herself. In other
places the Linux kernel uses its own "kobject model" which includes
reference counting and something I would describe as "virtual
destructors" which allows the data structure (i.e. "kset") to take
control of the life-time management of objects.

IMHO, this is a nice solution -- for C code.

At first glimpse, it shares many similarities with what std::list<T>
can give you in C++. The only practical difference I see here is that
in C you could accidentally use the wrong type/member "parameters" for
the containerof macro. This is because the type information is lost
during the pointer arithmetic/casting hacks and the compiler can't
help you spot these kinds of errors for that reason. The compiler
"trusts" the programmer to get it right. In a way this reminds me of
the early Java days where containers' elements stored "references to
objects" which required manual casting in user code. Java's solution
to that problem was "generics". So, Java has a solution for this. C++
has a solution for this. What about C? Are you [not specifically you,
Jacob, but everyone here who's interested in a generic container
library written in C] comfortable with these kinds of pointer casts
and pointer arithmetics? Since there's no kind of runtime type
checking to see if a cast is valid at runtime nor is it desirable to
do something like this per default in a supposedly generic container
library for overhead reasons, wouldn't it make sense to get a little
help from the compiler via type checking at compile-time? I currently
don't see how this could be done in C, unfortunately. But maybe it's
just me. Maybe I'm overrating type-safety.

Does it bother me that I don't have the compiler helping me out with
type safety with my linked list? No, not really, since there's not a
whole lot that one can do without either changing C or doing a lot of
extra work.

Casting void* to a specific pointer type doesn't bother me at all
simply because that's how generic things are done in C. It's well
defined, it's portable, so there isn't much of an issue for me there.
Sure it's a bit cumbersome to introduce casts when I need to access an
object in a linked list node, but it's not a deal killer to the point
that I find a different language to rewrite a project that is already
well entrenched in C.

Now doing things like allocating a single memory block and chopping it
up using pointer arithmetic and casting I'm less comfortable with, but
if it works for the platforms I need it too, I'll still do it if I
have to.

I don't think that type-safety is overrated. I think it should be
very highly rated. In fact, the issue is important enough that
several other languages were invented to handle type-safety better
than C. To me the question is if type-safety is so important that C
*needs* to be modified to support generics to remove this problem to
implement generic containers. At this point in time, my opinion is
no. If a programmer wants to swim in these shark-infested waters, who
are we to deny him.

Best regards,
John D.

SG · Apr 9, 2010

I will agree that using my generic linked list approach where the list
node uses a void*, I'm not able to avoid the extra allocation, (at
least not without some macro-ey and clever hackery). In a kernel
where things are intended to be very tight, this is a use-case where
the extra allocation would be highly frowned upon.

When trying to develop a linked list for my application, I looked at a
few popular implementations. The thing that caught my eye the most
was GLib's, primarily because of their use of a function pointer to
abstract sorting and search on the linked list via a comparison
function pointer. The concept was easily to understand, and I decided
to use their approach as a starting point. The linux kernel's
implementation was clever, but since it lacked an obvious method to
search and sort a generic list, I was pulled towards the GLib
implementation.

I see no reason why this design approach from the Linux kernel
prohibits arbitrary lists to be sorted according to arbitrary ordering
criteria. I would think that a generic list sort function a la

/**
* functions of type compare_t return TRUE if
* and only if *a comes before *b.
*/
typedef bool compare_t(void const *a, void const *b);

/**
* implements merge sort for linked lists
* using some arbitrary ordering criterion
*/
void list_sort(
struct listlinks_t *first,
ptrdiff_t link_member_offset,
compare_t *comp);

should do the trick. Here, list_sort doesn't need to know that all the
listlinks_t objects (that contain the prev/next pointers) are embedded
in some other struct containing the user data as well. But in order to
call the comparison function with the correct void pointers it needs
to know how to adjust the pointers similarly to how the containerof
macro works. This is done via the link_member_offset parameter. Usage
example:

struct my_double_item {
struct listlinks_t links;
double value;
};

bool my_double_compare(void const *a, void const *b)
{
return ((double const*)a) < ((double const*)b);
}

...
struct listlinks_t *first_node = ...;
list_sort(first_node,
offsetof(struct my_double_item,links),
&my_double_compare);
...

But what we get here is genericity via pointer casts, pointer
arithmetics (the type is actually "erased" which prevents the compiler
from helping us avoid type and pointer errors) and function pointers
(which do bring a certain runtime penalty compared to specialized
sorting functions with an "inline comparator"). Can you do better in
C? I don't know. Frankly, I'm not too happy about the type-safety
here. It looks a bit too hacky for my taste.

Cheers,
SG

ImpalerCore · Apr 9, 2010

I see no reason why this design approach from the Linux kernel
prohibits arbitrary lists to be sorted according to arbitrary ordering
criteria. I would think that a generic list sort function a la

I didn't say that the linux list sort couldn't be done, it's just that
the feature wasn't there in plain sight, and GLib's was. I chose the
path of least resistance.

/**
* functions of type compare_t return TRUE if
* and only if *a comes before *b.
*/
typedef bool compare_t(void const *a, void const *b);

/**
* implements merge sort for linked lists
* using some arbitrary ordering criterion
*/
void list_sort(
struct listlinks_t *first,
ptrdiff_t link_member_offset,
compare_t *comp);

and I guess list_search would look something like this?

struct listlinks_t* list_search( struct listlinks_t* first,
void* object,
ptrdiff_t link_member_offset,
compare_t* comp );

should do the trick. Here, list_sort doesn't need to know that all the
listlinks_t objects (that contain the prev/next pointers) are embedded
in some other struct containing the user data as well. But in order to
call the comparison function with the correct void pointers it needs
to know how to adjust the pointers similarly to how the containerof
macro works. This is done via the link_member_offset parameter. Usage
example:

struct my_double_item {
struct listlinks_t links;
double value;
};

bool my_double_compare(void const *a, void const *b)
{
return ((double const*)a) < ((double const*)b);
}

...
struct listlinks_t *first_node = ...;
list_sort(first_node,
offsetof(struct my_double_item,links),
&my_double_compare);
...

But what we get here is genericity via pointer casts, pointer
arithmetics (the type is actually "erased" which prevents the compiler
from helping us avoid type and pointer errors) and function pointers
(which do bring a certain runtime penalty compared to specialized
sorting functions with an "inline comparator"). Can you do better in
C? I don't know. Frankly, I'm not too happy about the type-safety
here. It looks a bit too hacky for my taste.

The easy answer to that question is, "use a different language",
*cough* C++ STL *cough*, that has better type safety built into the
language. These questions have been around for a very long time, and
C has tended to remain the way that it is with respect to type
safety. I think your criticisms are valid, and agree with them, and
if it was the only deciding factor, no one would choose C to do
anything of this sort. In fact, C was around when Stepanov was
working on his stuff, and he didn't use C. Hmmm, is that a
coincidence?

Do you believe that this is a worthy pursuit in C at all? You're not
alone if you don't think it is. However, there are some people who do
think it is.

Best regards,
John D.

jacob navia · Apr 9, 2010

SG a écrit :

jacob said:
jacob said:

In the containers library I am working I added a field "ElementSize" to
the list header (your list_t), and when I allocate a node I allocate with:

node_t *n = malloc(sizeof(node)+list->ElementSize);

To assign the data I do:

memcpy(node->data,object,list->ElementSize);

The node structure is declared exactly like yours but at the end I
differ with:

typedef struct node_t {
node_t *prev;
node_t *next;
char data[];
} node_t;

Note that this is using C99.

Click to expand...

But how do you access the data stored in that node?

Very easy. You use an iterator

Iterator *it = newIterator(mylist);
double *d;
for (d = GetFirst(it);
d != NULL;
d = it->GetNext(it) {
// Work with *d here
}

Say, you want to
store a list of doubles. Do you extract the double by memcopying the
bytes back into a double variable or do you write something like
(double*)(&node->data[0])?

You do not access the data directly. You use the API.

At first glimpse, it shares many similarities with what std::list<T>
can give you in C++. The only practical difference I see here is that
in C you could accidentally use the wrong type/member "parameters" for
the containerof macro. This is because the type information is lost
during the pointer arithmetic/casting hacks and the compiler can't
help you spot these kinds of errors for that reason. The compiler
"trusts" the programmer to get it right. In a way this reminds me of
the early Java days where containers' elements stored "references to
objects" which required manual casting in user code. Java's solution
to that problem was "generics". So, Java has a solution for this. C++
has a solution for this. What about C? Are you [not specifically you,
Jacob, but everyone here who's interested in a generic container
library written in C] comfortable with these kinds of pointer casts
and pointer arithmetics?

I have written a small utility (called "template.exe") that will take
a container written in a generic way, and will produce C code adapted
to the specific data structure you want (that should be typedefed)

I will publish its code and the sample input with the container
library.

Flattening linked list problem	4	Dec 3, 2024
OT twalk	5	Sep 17, 2010
list = list->Next = malloc(sizeof(node));	4	May 25, 2007
How does a HEAD pointer end up pointing to the first node in a linked list?	3	Jan 24, 2023
Linked-list problem - compiles but segfaults	13	Jul 4, 2011
linked list	19	Nov 21, 2009
Stack using doubly linked list	1	Jun 1, 2011
Thread-Safe Generic List Queue in C	7	Mar 16, 2014

Generic linked list with internal storage?

Jef Driesen

Malcolm McLean

MaciekL

MaciekL

jacob navia

Malcolm McLean

ImpalerCore

bartc

Jef Driesen

Jef Driesen

SG

SG

Malcolm McLean

SG

ImpalerCore

ImpalerCore

ImpalerCore

SG

ImpalerCore

jacob navia

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads