Linked list or realloc()?

Goh, Yong Kwang

I'm currently doing a project that has an array whose size can be
determined only at runtime.

Originally, I did the prototype as a proof of concept (or a quick
hack). So one method is to create a linked list, using malloc() to
create each LinkedListNode as needed and free() to destroy a node.

But another, easier way would be to use the realloc() function call to
resize a memory block in the heap so that all the "nodes" are
consecutive and I can use the qsort() function. With a linked list, I
can't use the qsort() in stdlib.

Although realloc() seems a "clumsy" approach, I believe it's actually
more efficient and faster than a linked list, for two reasons:
(1) Using the qsort() that comes with stdlib is faster than some
bubble sort code I would write myself. At least it is optimized,
bug-free, and tested.
(2) Reallocating (resizing) has overhead, but the overhead of doing a
separate malloc() for each linked-list node is about the same. Also,
realloc() provides a block of consecutive memory bytes, and thus more
locality, which speeds up searching and sorting, unlike linked-list
nodes scattered all over the heap.

Are my two assumptions above true? Has anyone tried this before? And
how is the performance?
 
Richard Bos

Although realloc() seems a "clumsy" approach, I believe it's actually
more efficient and faster than a linked list, for two reasons:
(1) Using the qsort() that comes with stdlib is faster than some
bubble sort code I would write myself. At least it is optimized,
bug-free, and tested.

And it doesn't use bubble sort. However, a sorting function designed
for use with linked lists (a merge sort, for example) is likely to be
faster still.
(2) Reallocating (resizing) has overhead, but the overhead of doing a
separate malloc() for each linked-list node is about the same.

Not necessarily. It depends on the reallocation scheme you use. Remember
that realloc() could end up copying all the data in your list, possibly
every time you add an entry. This could be inefficient, and could also
leave some holes in your allocation arena. OTOH, if you only add data in
large chunks and you know in advance how large the currently added chunk
is, it could be a lot more efficient.

Richard
 
CBFalconer

Goh said:
I'm currently doing a project that has an array whose size can be
determined only at runtime.

Originally, I did the prototype as a proof of concept (or a quick
hack). So one method is to create a linked list, using malloc() to
create each LinkedListNode as needed and free() to destroy a node.

But another, easier way would be to use the realloc() function call to
resize a memory block in the heap so that all the "nodes" are
consecutive and I can use the qsort() function. With a linked list, I
can't use the qsort() in stdlib.

Although realloc() seems a "clumsy" approach, I believe it's actually
more efficient and faster than a linked list, for two reasons:
(1) Using the qsort() that comes with stdlib is faster than some
bubble sort code I would write myself. At least it is optimized,
bug-free, and tested.
(2) Reallocating (resizing) has overhead, but the overhead of doing a
separate malloc() for each linked-list node is about the same. Also,
realloc() provides a block of consecutive memory bytes, and thus more
locality, which speeds up searching and sorting, unlike linked-list
nodes scattered all over the heap.

Are my two assumptions above true? Has anyone tried this before? And
how is the performance?

No. Simply use mergesort to sort the linked list. The sort will
be almost as fast as quicksort, and much faster than any O(N*N)
sort. It will also be stable, unlike quicksort.

Assumption 2 is especially flawed in that a realloc is fairly
likely to require copying all data that came before. This is an
O(N*N) algorithm again.
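
A rough sketch of such a list merge sort, assuming a made-up node type
with an int key (the real node type and comparison would of course be
the poster's own):

#include <stddef.h>

struct node {
    int key;
    struct node *next;
};

/* Split the list in two with the slow/fast pointer walk; returns the
   head of the second half and terminates the first half. */
static struct node *split(struct node *head)
{
    struct node *slow = head, *fast = head->next;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    fast = slow->next;
    slow->next = NULL;
    return fast;
}

/* Merge two sorted lists; taking from list a on ties is what keeps
   the sort stable, which qsort() does not guarantee. */
static struct node *merge(struct node *a, struct node *b)
{
    struct node head, *tail = &head;

    while (a != NULL && b != NULL) {
        if (a->key <= b->key) { tail->next = a; a = a->next; }
        else                  { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;
    return head.next;
}

/* O(N log N); no node is ever copied, only relinked. */
struct node *mergesort_list(struct node *head)
{
    struct node *second;

    if (head == NULL || head->next == NULL)
        return head;
    second = split(head);
    return merge(mergesort_list(head), mergesort_list(second));
}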
 
nrk

CBFalconer said:
No. Simply use mergesort to sort the linked list. The sort will
be almost as fast as quicksort, and much faster than any O(N*N)
sort. It will also be stable, unlike quicksort.

Assumption 2 is especially flawed in that a realloc is fairly
likely to require copying all data that came before. This is an
O(N*N) algorithm again.

Copying in O(N*N)... now, are we talking about DS9K implementations here?
;-) Or were you talking about the realloc mechanism itself?

-nrk.
 
Al Bowers

I'm currently doing a project that has an array whose size can be
determined only at runtime.

Originally, I did the prototype as a proof of concept (or a quick
hack). So one method is to create a linked list, using malloc() to
create each LinkedListNode as needed and free() to destroy a node.

But another, easier way would be to use the realloc() function call to
resize a memory block in the heap so that all the "nodes" are
consecutive and I can use the qsort() function. With a linked list, I
can't use the qsort() in stdlib.

Although realloc() seems a "clumsy" approach, I believe it's actually
more efficient and faster than a linked list, for two reasons:
(1) Using the qsort() that comes with stdlib is faster than some
bubble sort code I would write myself. At least it is optimized,
bug-free, and tested.
(2) Reallocating (resizing) has overhead, but the overhead of doing a
separate malloc() for each linked-list node is about the same. Also,
realloc() provides a block of consecutive memory bytes, and thus more
locality, which speeds up searching and sorting, unlike linked-list
nodes scattered all over the heap.

Are my two assumptions above true? Has anyone tried this before? And
how is the performance?

I'm not sure that it would perform better, but it seems to me that if
the nodes are large objects, you can store the nodes in a
singly-linked list (or doubly-linked, if you're going to be removing
nodes) and at the same time keep an array of pointers to each of the
nodes in the list. This way, instead of moving the nodes around, you
sort and search the array of pointers with the Standard C functions
qsort and bsearch, leaving the list structure unchanged.

For example, you could create a data type similar to this:

typedef struct LIST
{
    struct bio *listhead;
    struct bio **sortedlist;
    size_t count;
} LIST;

where listhead is a pointer to the head of the list, sortedlist is an
array of node pointers that you grow with realloc(), and count is the
number of nodes created.

You then can write functions to manage the datatype.

Here is a simple example that can be modified to give you
even better performance.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX_NAME 64

typedef enum{NO,YES} bool;

struct bio
{
    char name[MAX_NAME];
    unsigned age;
    struct bio *next;
};

typedef struct LIST
{
    struct bio *listhead;
    struct bio **sortedlist;
    size_t count;
} LIST;

/* Prototypes */
bool AddToLIST(LIST *p, const char *name, unsigned age,
               int (*cmp)(const void *v1, const void *v2));
void FreeLIST(LIST *p);
int cmp(const void *v1, const void *v2);

int main(void)
{
    LIST presidents = {NULL};
    size_t i;

    AddToLIST(&presidents, "Washington, George", 34, cmp);
    AddToLIST(&presidents, "Clinton, Bill", 35, cmp);
    AddToLIST(&presidents, "Nixon, Richard", 36, cmp);
    puts("Sorted list of Presidents");
    for (i = 0; i < presidents.count; i++)
        printf("\tname: %s\n\tage: %u\n\n",
               presidents.sortedlist[i]->name,
               presidents.sortedlist[i]->age);
    FreeLIST(&presidents);
    return 0;
}

bool AddToLIST(LIST *p, const char *name, unsigned age,
               int (*cmp)(const void *v1, const void *v2))
{
    struct bio *new, **tmp;

    if ((new = malloc(sizeof *new)) == NULL) return NO;
    if ((tmp = realloc(p->sortedlist,
                       (p->count + 1) * sizeof *tmp)) == NULL)
    {
        free(new);
        return NO;
    }
    p->sortedlist = tmp;
    new->next = p->listhead;
    strncpy(new->name, name, MAX_NAME);
    new->name[MAX_NAME - 1] = '\0';
    new->age = age;
    p->listhead = p->sortedlist[p->count++] = new;
    if (p->count > 1)
        qsort(p->sortedlist, p->count, sizeof *p->sortedlist, cmp);
    return YES;
}

void FreeLIST(LIST *p)
{
    size_t i;

    for (i = 0; i < p->count; i++) free(p->sortedlist[i]);
    free(p->sortedlist);
    p->count = 0;
    p->listhead = NULL;
    p->sortedlist = NULL;
    return;
}

int cmp(const void *v1, const void *v2)
{
    /* qsort passes pointers to the array elements, which are
       themselves pointers to struct bio */
    const struct bio *s1 = *(const struct bio *const *)v1;
    const struct bio *s2 = *(const struct bio *const *)v2;

    return strcmp(s1->name, s2->name);
}
 
nrk

I'm currently doing a project that has an array whose size can be
determined only at runtime.

Originally, I did the prototype as a proof of concept (or a quick
hack). So one method is to create a linked list, using malloc() to
create each LinkedListNode as needed and free() to destroy a node.

But another, easier way would be to use the realloc() function call to
resize a memory block in the heap so that all the "nodes" are
consecutive and I can use the qsort() function. With a linked list, I
can't use the qsort() in stdlib.

You can use qsort() with any array. To use it with a linked list, put
pointers to the nodes into an array, sort that array of pointers with
qsort(), and then rebuild the list from the sorted array (a rough
sketch follows below). Whether this is worth the trouble is another
matter.
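
A rough sketch of that approach, again with a made-up node type and
key:

#include <stdlib.h>

struct node {
    int key;
    struct node *next;
};

static int cmp_nodes(const void *a, const void *b)
{
    const struct node *na = *(const struct node *const *)a;
    const struct node *nb = *(const struct node *const *)b;

    return (na->key > nb->key) - (na->key < nb->key);
}

/* Sort a list of n nodes through a temporary array of pointers;
   returns the new head, or the old head if the array can't be
   allocated. */
struct node *sort_list(struct node *head, size_t n)
{
    struct node **v;
    size_t i;

    if (n < 2 || (v = malloc(n * sizeof *v)) == NULL)
        return head;
    for (i = 0; head != NULL; head = head->next)
        v[i++] = head;                  /* collect the node pointers */
    qsort(v, n, sizeof *v, cmp_nodes);
    for (i = 0; i + 1 < n; i++)         /* relink in sorted order */
        v[i]->next = v[i + 1];
    v[n - 1]->next = NULL;
    head = v[0];
    free(v);
    return head;
}
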
Although realloc() seems a "clumsy" approach, I believe it's actually
more efficient and faster than a linked list, for two reasons:

Note that with realloc, there is copying of data involved. As a safe
lower bound, you have to assume that you'll need at least twice as
much memory as the data you possess. With a linked list you don't have
this problem. Also, depending on how you do your realloc (if you try
to expand by 1 every time the limit is reached, you're in for a
surprise on the performance front), the performance may or may not be
better than using a linked list. The answer to performance trade-offs,
as usual, is "it depends".
(1) Using the qsort() that comes with stdlib is faster than some
bubble sort code I would write myself. At least it is optimized,
bug-free, and tested.

Absolutely. Why re-invent a square wheel when a perfectly circular one has
been handed to you?
(2) Reallocating (resizing) has overhead, but the overhead of doing a
separate malloc() for each linked-list node is about the same. Also,
realloc() provides a block of consecutive memory bytes, and thus more
locality, which speeds up searching and sorting, unlike linked-list
nodes scattered all over the heap.

Not necessarily. The locality might help, but realloc likely has more
overhead than malloc. You have to factor in the copying cost, and also
keep in mind that you might need twice as much memory as your data.
Are my two assumptions above true? Has anyone tried this before? And
how is the performance?

The answer to the first question is, "It depends". The only way to
tell is to look at your specific inputs and requirements and then
analyze both approaches.

-nrk.
 
Eric Sosman

CBFalconer said:
[...]
No. Simply use mergesort to sort the linked list. The sort will
be almost as fast as quicksort, and much faster than any O(N*N)
sort. It will also be stable, unlike quicksort.

It's straightforward to implement a stable mergesort,
but not all mergesort implementations are necessarily stable.
 
Ben Pfaff

nrk said:
Copying in O(N*N)... now, are we talking about DS9K implementations here?
;-) Or were you talking about the realloc mechanism itself?

I think he's assuming that the array is extended arithmetically;
e.g. each time add room for another 16 items. In that case the
copying will make the algorithm O(N*N). But if you extend it
geometrically (e.g. each time add room for another 1/3 of the
current total) then the algorithm stays O(N).
 
Martin Dickopp

nrk said:
Copying in O(N*N)... now, are we talking about DS9K implementations here?
;-) Or were you talking about the realloc mechanism itself?

If you grow the memory area by a constant amount every time it is
exhausted, `realloc' has to copy /more/ data every time it is called.
This results in O(N*N) behavior.

Often, a feasible solution is to multiply the size of the memory area
by a constant factor every time it is exhausted. This way, the
intervals between growth operations increase at the same rate as the
amount of data that has to be copied, resulting in O(N) behavior
(amortized).
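
A minimal sketch of that geometric scheme (the doubling factor and the
starting capacity of 16 are arbitrary choices):

#include <stdlib.h>

/* Append one int to a growable array, doubling the capacity whenever
   it is exhausted, so the total copying cost stays O(N) amortized.
   Returns 0 on success, -1 on allocation failure. */
int append(int **arr, size_t *len, size_t *cap, int value)
{
    if (*len == *cap) {
        size_t newcap = *cap ? *cap * 2 : 16;
        int *tmp = realloc(*arr, newcap * sizeof *tmp);

        if (tmp == NULL)
            return -1;      /* the old block is still valid */
        *arr = tmp;
        *cap = newcap;
    }
    (*arr)[(*len)++] = value;
    return 0;
}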

Martin
 
Spacen Jasset

Goh said:
I'm currently doing a project that has an array whose size can be
determined only at runtime.

Originally, I did the prototype as a proof of concept (or a quick
hack). So one method is to create a linked list, using malloc() to
create each LinkedListNode as needed and free() to destroy a node.

But another, easier way would be to use the realloc() function call to
resize a memory block in the heap so that all the "nodes" are
consecutive and I can use the qsort() function. With a linked list, I
can't use the qsort() in stdlib.

Although realloc() seems a "clumsy" approach, I believe it's actually
more efficient and faster than a linked list, for two reasons:
(1) Using the qsort() that comes with stdlib is faster than some
bubble sort code I would write myself. At least it is optimized,
bug-free, and tested.
(2) Reallocating (resizing) has overhead, but the overhead of doing a
separate malloc() for each linked-list node is about the same. Also,
realloc() provides a block of consecutive memory bytes, and thus more
locality, which speeds up searching and sorting, unlike linked-list
nodes scattered all over the heap.

Are my two assumptions above true? Has anyone tried this before? And
how is the performance?

As the other posters have indicated, a merge sort on a linked list
would be preferable. However, if you require random access to your
data, then by all means use realloc. I personally don't think realloc
should be avoided on principle; there are many cases where it is at
least as efficient, and sometimes the code can be easier to write and
maintain. Dynamic strings, for example. It depends on your application
and what you want to do, of course. Repeatedly reallocating a chunk of
memory to be one bigger than it already is, though, is probably always
a bad thing to do.
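
As a rough illustration of the dynamic-string case (a sketch under the
same geometric-growth assumption discussed earlier, not anyone's
actual code):

#include <stdlib.h>
#include <string.h>

/* Append src to a heap-allocated, growable string, doubling the
   buffer as needed rather than growing by one byte at a time.
   Returns the (possibly moved) buffer, or NULL on failure. */
char *str_append(char *buf, size_t *len, size_t *cap, const char *src)
{
    size_t n = strlen(src);

    if (*len + n + 1 > *cap) {
        size_t newcap = *cap ? *cap : 32;
        char *tmp;

        while (newcap < *len + n + 1)
            newcap *= 2;
        if ((tmp = realloc(buf, newcap)) == NULL)
            return NULL;        /* caller still owns the old buffer */
        buf = tmp;
        *cap = newcap;
    }
    memcpy(buf + *len, src, n + 1);     /* copy including the '\0' */
    *len += n;
    return buf;
}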
 
