Hi,
A typical (double) linked list is implemented with these two data
structures:
typedef struct node_t {
node_t *prev;
node_t *next;
void *data;
} node_t;
typedef struct list_t {
node_t *head;
node_t *tail;
} list_t;
typedef struct mydata_t {
...
} mydata_t;
The major disadvantage of this approach is that the data contents of the
list needs to be stored outside of the list data structures. Thus if you
want to store some user defined data in the list, you end up with two
memory allocations per node: one for the node_t structure, and one for
the user defined mydata_t structure.
That is one of the drawbacks of using a void* in a linked list.
However, since the linked list nodes using void* are always the same
size (12 bytes assuming 32-bit pointers), this can be somewhat
mitigated by using some memory pooling technique. The GLib
implementation has a "memory slice" that it uses to obtain linked
list, but I've never done any measurements to see what the performance
effect of it is.
Is there a way to avoid this additional allocation, but without loosing
the ability to store custom data? Thus I don't want to tie my list to
the mydata_t structure like this:
typedef struct node_t {
node_t *prev;
node_t *next;
mydata_t data;
} node_t;
This is not a generic linked list, its a specific linked list for
mydata_t. You need some generic mechanism like a void* or a resizable
byte array.
Maybe allocate storage for the node_t and the mydata_t structure in a
single block. Something like this:
node_t *node = malloc (sizeof (node_t) + sizeof (mydata_t));
mydata_t *data = node + sizeof (node_t);
But I think this might cause trouble due to alignment issues?
Since this is in reference to a generic list, there may be alignment
issues in general. However, for many structures, you can use the
technique and get away with it, it's a caveat of the technique. I
know very little about alignment issues in general, but many memory
debuggers use a similar technique to pad a memory allocation with a
struct to maintain properties of the memory returned, like the size of
the allocation, start and end addresses for fence checking. You may
be able to glean how they handle alignment problems from their
source. DMalloc is one memory debugger that you could inspect the
code to see if they have something that you could use towards this
effect.
While the separate memory allocation cost of a linked list node with a
void* and its corresponding structure can be a major disadvantage,
there are some *major advantages* to using a void* in a generic linked
list interface.
1. Using function pointers to abstract behavior on the structure
pointed to by void*.
The two main function pointer types I'm referring to are these.
typedef void (*c_free_function_t)( void* );
typedef int (*c_compare_function_t)( const void* p, const void* q );
Consider what I use for my linked list free function.
\code
void c_list_free( c_list_t* list, c_free_function_t free_fn )
{
c_list_t* node;
while ( list )
{
node = list;
list = list->next;
if ( free_fn ) {
(*free_fn)( node->object );
}
c_free( node );
}
}
\endcode
The free_fn parameter abstracts what is required to properly destroy
the list node object. You can think of free_fn as a parallel to a C++
object destructor. If the linked list is responsible for the objects
it maintains, all you need to do to clean up the list is pass the
right destructor function. If the linked list is pointing to char*
arrays allocated with 'malloc', you can pass 'free' as the argument
and it will release each node's memory. If the structure is more
complicated, (for instance I have an auto-allocating structure called
c_string_t that maintains a dynamically allocated buffer), I can pass
a destructor specific to that c_string_t object. If the linked list
is a shallow reference to data, NULL can be passed to just destroy the
nodes themselves. I find this function convenient since I don't have
to write an iterating loop to release node resources.
While the free function is the appetizer, the comparison function is
the main course. It allows you to abstract list operations that
require comparisons. Typically this involves some kind of sorting,
but there other creative uses for them as well.
Consider a function to insert a generic object into a sorted linked
list (crosses my fingers that the code doesn't wrap too badly).
\code
c_list_t* c_list_insert_sorted( c_list_t* list,
void* object,
c_compare_function_t cmp_fn )
{
c_list_t* tmp_list = list;
c_list_t* node = NULL;
int cmp;
if ( cmp_fn )
{
/* c_new is a macro wrapper around malloc */
node = c_new( c_list_t, 1 );
if ( node )
{
node->object = object;
node->prev = NULL;
node->next = NULL;
if ( !list ) {
return node;
}
/*
* When defining the comparison function, it's second argument
is
* required to be the same type as the first, which should be
the
* structure pointed to by the list node's object pointer.
*/
cmp = (*cmp_fn)( object, tmp_list->object );
while ( (tmp_list->next) && (cmp > 0) )
{
tmp_list = tmp_list->next;
cmp = (*cmp_fn)( object, tmp_list->object );
}
if ( (!tmp_list->next) && (cmp > 0) )
{
tmp_list->next = node;
node->prev = tmp_list;
return list;
}
if ( tmp_list->prev )
{
tmp_list->prev->next = node;
node->prev = tmp_list->prev;
}
node->next = tmp_list;
tmp_list->prev = node;
if ( tmp_list == list ) {
return node;
}
}
}
return list;
}
\endcode
I can use this engine to insert any kind of object into a homogenous
list as long as I have some method of defining an order. Here are a
couple of comparison functions you could use to sort strings that use
char*.
int c_vstrcmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;
return c_strcmp( s1, s2 );
}
int vstrlencmp( const void* p, const void* q )
{
const char* s1 = (const char*)p;
const char* s2 = (const char*)q;
int slen1 = (int)strlen( s1 );
int slen2 = (int)strlen( s2 );
return slen1 - slen2;
}
Note that one orders strings alphanumerically while the other orders
strings by their length.
\code
int main( void )
{
c_list_t* grocery_list = NULL;
c_list_t* l;
size_t i;
char* s;
char* food_items[] = {
"flour", "milk", "butter", "sugar", "bread", "apples", "eggs"
};
for ( i = 0; i < C_ARRAY_N( food_items ); ++i )
{
s = c_strdup( food_items
);
if ( s ) {
grocery_list = c_list_insert_sorted( grocery_list, s,
c_vstrcmp );
}
}
printf( "Grocery List:" );
for ( l = grocery_list; l != NULL; l = l->next ) {
printf( " %s", (char*)l->object );
}
printf( "\n" );
c_list_free( grocery_list, c_free );
return EXIT_SUCCESS;
}
\endcode
\output
Grocery List: apples bread butter eggs flour milk sugar
\endoutput
2. The other main advantage of using a void* or equivalent is the
ability to create *shallow copies* of the list.
Tying the data structure into the linked list node makes creating
views of your data really expensive. When the data is referenced by a
void*, all I need to do to create a linked list that is a shallow
reference to another linked list is to copy the pointer. I don't want
to reallocate a potentially big struct tied to the linked list node in
order to make a different view. Here is my function for creating
shallow copy of a linked list.
\code
c_list_t* c_list_copy( c_list_t* list )
{
c_list_t* new_list = NULL;
c_list_t* node;
c_list_t* last;
c_bool list_alloc_failed;
if ( list )
{
new_list = c_new( c_list_t, 1 );
if ( new_list )
{
new_list->object = list->object;
new_list->prev = NULL;
last = new_list;
list = list->next;
list_alloc_failed = FALSE;
while ( list && !list_alloc_failed )
{
last->next = c_new( c_list_t, 1 );
if ( last->next )
{
last->next->prev = last;
last = last->next;
last->object = list->object;
list = list->next;
}
else {
list_alloc_failed = TRUE;
}
}
last->next = NULL;
if ( list_alloc_failed )
{
while ( new_list )
{
node = new_list;
new_list = new_list->next;
c_free( node );
}
new_list = NULL;
}
}
}
return new_list;
}
\endcode
\code
int main( void )
{
c_list_t* donut_recipe = NULL;
c_list_t* low_cal;
c_list_t* l;
size_t i;
char* s;
char* ingredients[] = {
"dry yeast", "water", "milk", "shortening", "sugar", "salt",
"all-purpose flour", "eggs", "vanilla extract"
};
char* high_calories[] = { "shortening", "sugar", "eggs" };
for ( i = 0; i < C_ARRAY_N( ingredients ); ++i )
{
s = c_strdup( ingredients );
if ( s ) {
donut_recipe = c_list_insert_sorted( donut_recipe, s,
c_vstrcmp );
}
}
low_cal = c_list_copy( donut_recipe );
for ( i = 0; i < C_ARRAY_N( high_calories ); ++i ) {
low_cal = c_list_remove_if( low_cal, high_calories, c_vstrcmp,
NULL );
}
printf( "---Donut Recipe (low calorie)---\n" );
for ( l = low_cal; l != NULL; l = l->next ) {
printf( "%s\n", (char*)l->object );
}
printf( "\n" );
printf( "---Donut Recipe---\n" );
for ( l = donut_recipe; l != NULL; l = l->next ) {
printf( "%s\n", (char*)l->object );
}
c_list_free( low_cal, NULL );
c_list_free( donut_recipe, c_free );
return EXIT_SUCCESS;
}
\endcode
The output of the above program is
\output
---Donut Recipe (low calorie)---
all-purpose flour
dry yeast
milk
salt
vanilla extract
water
---Donut Recipe---
all-purpose flour
dry yeast
eggs
milk
salt
shortening
sugar
vanilla extract
water
\endoutput
Note that I removed some ingredients from the original donut recipe
without *modifying* or *copying* the original data. This is done
through the 'c_list_remove_if' function that is defined by the
following.
/*!
* \brief Remove all objects from a \c c_list_t that satisfy the
* comparison function \c cmp_fn. If none of the objects
* match, the \c c_list_t is unchanged.
* \param list A \c c_list_t.
* \param object The object to remove.
* \param cmp_fn The function to call for each object comparison. It
* should return 0 when the objects match.
* \param free_fn The list object free function.
* \return The new start of the \c c_list_t.
*/
c_list_t* c_list_remove_if( c_list_t* list, void* object,
c_compare_function_t cmp_fn, c_free_function_t free_fn );
Let's examine the call to c_list_remove_if a bit further
\code
low_cal = c_list_remove_if( low_cal, high_calories, c_vstrcmp,
NULL );
\endcode
The low_cal is the start of the linked list.
The high_calories is a specific string to search for using the
'c_vstrcmp' function to determine equality.
The last argument is NULL because this is a shallow copy of the list
data.
At the end, the 'low_cal' list view is freed using
'c_list_free( low_cal, NULL )' because the list doesn't own the memory
its pointing to.
These views can be anything from sorting records by a particular
field, removing nodes that don't match a constraint, all maintained
for the low-low cost of creating a list node.
I'm obviously slanted towards using the void* to implement a linked
list in a generic way. I think the linked list with the contained
data have a place, but they should be used for specific, not generic
applications. And by specific, I mean that if the generic list is too
slow, and can be verified via profiling, then go about creating a
custom linked list for the situation.
Best regards,
John D.