Hi! I'm writing a function that returns an array of (at maximum) 64
pointers to char. I have thought of three possibilities:
- The caller passes a pointer to a previously allocated array of 64
pointers. Similarly to sprintf(), the caller is entirely responsible for
handling the memory. The drawback is that my function will have to rely
on the correctness of such pointer to work properly.
- My function allocates the array and returns it. This is similar to
strdup(). Here I will have to rely on the caller function to properly
free the array, which is something I'm not very comfortable with.
- I'll have a "static char *ret[64]" in my function and will return
ret. I've not been able to think of a library function that behaves this
way, so I guess it's not recommended. The upside is that I'm free from
allocation/freeing problems. The downside is that the return values will be
overwritten at each call; if the user wants to keep the return values
for subsequent use he has to copy and store them somewhere. Another
drawback is that it consumes memory even if the function will never be
called.
In my specific case, I was leaning toward the third option but I'd like
to hear your opinion on pitfalls, things that I've missed or alternative
approaches that could work better.
There is no single best answer - what to do depends upon how your
function is to be used and what kind of tradeoffs you want to
make.
Here are some questions to think about:
(1) Is the function a general service routine that can be called
from many places or is it only used in one application? In other
words, is it idiosyncratic? If it is a general service routine
you had best (IMNSHO) make it thread safe. If it is
idiosyncratic you can take certain liberties.
(2) Does it have to be recursion-safe and/or thread-safe? Your
option three is very dangerous. Here is what can happen:
    while (some_condition) {
        data = your_function();
        do_something_clever();
        process(data);
    }
If do_something_clever calls your_function the ball game is over;
data has been scribbled on.
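A minimal sketch of that failure. It assumes a hypothetical your_function that stores one string per call in its static array; the argument and the helper data_survives are made up for illustration, and the elements are const char * only because the sketch stores string literals.

```c
#include <assert.h>
#include <string.h>

#define NDATA 64

/* Option three: every call shares one static array (hypothetical body). */
const char **your_function(const char *first)
{
    static const char *ret[NDATA];

    assert(first != NULL);
    ret[0] = first;
    return ret;
}

/* Stand-in for do_something_clever(): it happens to call your_function. */
void do_something_clever(void)
{
    (void)your_function("clobbered");
}

/* Returns 1 if the caller's data is intact after the nested call, else 0. */
int data_survives(void)
{
    const char **data = your_function("original");

    do_something_clever();
    return strcmp(data[0], "original") == 0;
}
```

Here data_survives() returns 0, because data[0] now points at the string stored by the nested call.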
(3) Will the function be called sporadically or will it be called
in a loop? If it is in a loop you can reuse the data space from
the previous iteration, e.g.,
    T *data = NULL;
    ...
    /* Optionally allocate space for data here */
    while (more_to_do) {
        data = your_function(data);
        ...
    }
    free(data);
Notice that we pass in the data pointer and return it; this is a
fairly common technique. Now what you can do is allocate space
within your_function if the data pointer is null. As a further
refinement, if the loop normally terminates with a "no more data"
condition you can use this code:
    T *data = NULL;
    ...
    while ((data = your_function(data)) != NULL) {
        ...
    }
In this case your_function frees the space and returns a null
pointer when it reaches termination. It reallocates space as
needed. This kind of usage pattern is a special case but it is
quite common.
A caveat is that if an instance of the data must outlive the
loop iteration that produced it, you will need to make a copy of it.
Incidentally passing in the data pointer and returning it is good
form; it preserves flexibility.
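One way the allocate-on-null, free-at-termination version might look. This is only a sketch: the rounds_left counter is a made-up stand-in for the real "no more data" test, and the payload is a placeholder string.

```c
#include <assert.h>
#include <stdlib.h>

#define NDATA 64

/* Hypothetical input source: three rounds of data, then "no more data". */
static int rounds_left = 3;

/* Allocates on the first call, reuses the space afterwards, and frees it
   and returns NULL once the input is exhausted. */
char **your_function(char **data)
{
    static char item[] = "item";   /* placeholder payload */

    if (rounds_left == 0) {
        free(data);                /* free(NULL) is a harmless no-op */
        return NULL;
    }
    if (data == NULL) {
        data = malloc(NDATA * sizeof *data);
        if (data == NULL)
            return NULL;           /* this sketch treats failure as end of data */
    }
    rounds_left--;
    data[0] = item;
    return data;
}

/* Runs the loop from the text and returns the number of iterations. */
int count_rounds(void)
{
    char **data = NULL;
    int n = 0;

    while ((data = your_function(data)) != NULL)
        n++;                       /* ... use data[0] here ... */
    assert(data == NULL);          /* nothing left for the caller to free */
    return n;
}
```

Note that the caller never calls malloc or free itself; the price is that it must not keep the pointer past the end of the loop.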
(4) Is the data being returned bounded in space or unbounded?
If bounded, is the data expected to be much smaller than the
bound? If the data has a reasonable space bound then you may as
well allocate that much space and be done with it. Otherwise you
will have an error condition to deal with. When the data space
is bounded it is simpler to let the caller provide it because it
can then come from automatic storage. Thus in your case we might
do
    char *data_array[64];
    ...
    your_function(data_array);
Here we don't have to allocate and deallocate the data space; it
is handled cheaply and automatically for us. If, however, the
data array were very large we might not want to do this; the
stack (automatic storage) is small compared to the size of the
heap (allocatable storage).
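A sketch of the caller-provided version, under the assumption that your_function simply copies pointers from some source (the source table and demo helper here are hypothetical, and the elements are const char * because the sketch stores string literals):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NDATA 64

/* Hypothetical data source standing in for whatever your_function computes. */
static const char *const source[] = { "alpha", "beta", "gamma" };
#define NSOURCE (sizeof source / sizeof source[0])

/* Fills the caller's array; slots beyond the available data are set to NULL. */
void your_function(const char *out[NDATA])
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < NDATA; i++)
        out[i] = (i < NSOURCE) ? source[i] : NULL;
}

/* Caller side: the array lives in automatic storage, no malloc or free. */
int demo(void)
{
    const char *data_array[NDATA];

    your_function(data_array);
    return strcmp(data_array[2], "gamma") == 0 && data_array[3] == NULL;
}
```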
(5) When the data size can be variable do we do automatic
resizing or user controlled resizing? User controlled resizing
is "safer" but more cumbersome; it requires extra code and
decisions on the calling side. Automatic resizing is more
convenient but more prone to bugs, e.g., dangling pointers and
erroneous deallocations.
(6) How do we determine the end of data when it can be variable
sized? There are two basic ways to do this; one is to return
(somehow) the size, and the other is to put a sentinel value at
the end of the data, typically some kind of null value. There
are arguments for each choice. Sometimes it does matter; most of
the time it is a matter of preferred style. However the choice
does impact your function's API and how you allocate storage.
If you go the sentinel route you have to allocate extra space for
the sentinel. Thus, in your case, you would say
    char *data_array[65];
or, more generally,
    #define NDATA 64
    ...
    char *data_array[NDATA + 1];
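A sketch of the sentinel route, assuming a null pointer is never a valid data item (the source table and the count_items helper are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define NDATA 64

/* Hypothetical data source standing in for whatever your_function computes. */
static const char *const source[] = { "alpha", "beta", "gamma" };
#define NSOURCE (sizeof source / sizeof source[0])

/* Stores at most NDATA pointers, then a NULL sentinel in the extra slot. */
void your_function(const char *out[NDATA + 1])
{
    size_t n = 0;

    assert(out != NULL);
    while (n < NDATA && n < NSOURCE) {
        out[n] = source[n];
        n++;
    }
    out[n] = NULL;
}

/* Caller side: walk the array until the sentinel; no size is needed. */
size_t count_items(void)
{
    const char *data_array[NDATA + 1];
    const char **p;
    size_t n = 0;

    your_function(data_array);
    for (p = data_array; *p != NULL; p++)
        n++;
    return n;
}
```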
On the other hand, if you go the "return size of data" route you have
the problem of how to return it. The problem is that C has no
good way to return two distinct things. What you have is a
choice of hacks and kludges. Here are some:
(a) You can pass the address of the size in the calling
sequence, e.g.,
    data = your_function(data, &size);
This works and is compact, which is about the best that can be
said for it. Function design is cleaner when the inputs come in
through the calling sequence, the outputs come back as the return
value, and the two are never mixed.
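For what it's worth, form (a) might be sketched like this. The choice of size_t for the count and the source table are assumptions, not part of the original call.

```c
#include <assert.h>
#include <stddef.h>

#define NDATA 64

/* Hypothetical data source standing in for whatever your_function computes. */
static const char *const source[] = { "alpha", "beta", "gamma" };
#define NSOURCE (sizeof source / sizeof source[0])

/* Fills data, stores the item count through size, and returns data
   (or NULL when there is nothing to return). */
const char **your_function(const char **data, size_t *size)
{
    size_t n = 0;

    assert(data != NULL && size != NULL);
    while (n < NDATA && n < NSOURCE) {
        data[n] = source[n];
        n++;
    }
    *size = n;
    return (n > 0) ? data : NULL;
}

/* Caller side: one call yields both the pointer and the count. */
size_t demo(void)
{
    const char *storage[NDATA];
    const char **data;
    size_t size = 0;

    data = your_function(storage, &size);
    return (data != NULL) ? size : 0;
}
```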
(b) We can turn (a) around and return the size. In this case
the call looks like:
    size = your_function(data);
This form can be quite convenient if we have a loop that
terminates when we run out of data and if the data size is
bounded. This time the code looks like:
    char *data[NDATA];
    ...
    while (your_function(data) > 0) {
        ...
    }
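A sketch of form (b). NDATA is set to 2 here only so that the short, made-up source takes several calls to drain; the file-scope cursor is a stand-in for wherever the real function keeps its read position.

```c
#include <assert.h>
#include <stddef.h>

#define NDATA 2   /* deliberately tiny for this sketch */

/* Hypothetical data source and read position. */
static const char *const source[] = { "a", "b", "c", "d", "e" };
#define NSOURCE (sizeof source / sizeof source[0])
static size_t cursor = 0;

/* Fills data with the next batch and returns its size; 0 means no more data. */
int your_function(const char *data[NDATA])
{
    int n = 0;

    assert(data != NULL);
    while (n < NDATA && cursor < NSOURCE)
        data[n++] = source[cursor++];
    return n;
}

/* Caller side: loop until the function reports an empty batch. */
int total_items(void)
{
    const char *data[NDATA];
    int size, total = 0;

    while ((size = your_function(data)) > 0)
        total += size;             /* ... process data[0..size-1] here ... */
    return total;
}
```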
(c) You can return a structure that holds both the data and its
size. For example:
    struct T_descr {
        char *data[NDATA];
        int size;
    };
    ...
    struct T_descr descr;
    int i;
    ...
    descr = your_function(&descr);
    for (i = 0; i < descr.size; i++) {
        /* do stuff with descr.data[i] */
    }
This makes the code more cumbersome; however it treats the data
as an object with properties, which may be a better way to handle
the data.
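One possible reading of form (c), matching the call above: the function fills the descriptor through the pointer and also returns it by value. The source table is hypothetical, and the elements are const char * because the sketch stores string literals.

```c
#include <assert.h>
#include <stddef.h>

#define NDATA 64

/* Hypothetical data source standing in for whatever your_function computes. */
static const char *const source[] = { "alpha", "beta", "gamma" };
#define NSOURCE ((int)(sizeof source / sizeof source[0]))

struct T_descr {
    const char *data[NDATA];
    int size;
};

/* Fills *descr, then returns a copy of it by value. */
struct T_descr your_function(struct T_descr *descr)
{
    int n = 0, i;

    assert(descr != NULL);
    while (n < NDATA && n < NSOURCE) {
        descr->data[n] = source[n];
        n++;
    }
    for (i = n; i < NDATA; i++)
        descr->data[i] = NULL;     /* leave no slot uninitialized */
    descr->size = n;
    return *descr;
}

/* Caller side: index-based loop driven by descr.size. */
int demo(void)
{
    struct T_descr descr;
    int i, seen = 0;

    descr = your_function(&descr);
    for (i = 0; i < descr.size; i++)
        if (descr.data[i] != NULL)
            seen++;
    return seen;
}
```

Returning the structure by value copies the whole pointer array on every call, which is part of what makes this form cumbersome.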
(d) You can pass a structure that holds the data and its size
but return the data. This looks slightly different:
    struct T_descr {
        char *data[NDATA+1];
        int size;
    };
    ...
    struct T_descr descr;
    char **data;
    int i;
    ...
    data = your_function(&descr);
    for (i = 0; i < descr.size; i++) {
        /* do stuff with data[i] */
    }
Ordinarily data would point to descr.data but could be set to
null on no data. An advantage of this form is that we can, so to
speak, have our cake and eat it too. That is, we can put in a
sentinel and use either sentinel based code or index based code,
depending on which is more convenient.
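A sketch of form (d) that fills in both the size and the sentinel, so the caller can loop either way (the source table and the demo helper are assumptions):

```c
#include <assert.h>
#include <stddef.h>

#define NDATA 64

/* Hypothetical data source standing in for whatever your_function computes. */
static const char *const source[] = { "alpha", "beta", "gamma" };
#define NSOURCE ((int)(sizeof source / sizeof source[0]))

struct T_descr {
    const char *data[NDATA + 1];   /* one extra slot for the sentinel */
    int size;
};

/* Fills descr and returns descr->data, or NULL when there is no data. */
const char **your_function(struct T_descr *descr)
{
    int n = 0;

    assert(descr != NULL);
    while (n < NDATA && n < NSOURCE) {
        descr->data[n] = source[n];
        n++;
    }
    descr->data[n] = NULL;
    descr->size = n;
    return (n > 0) ? descr->data : NULL;
}

/* Returns 1 if the index-based and sentinel-based counts agree. */
int demo(void)
{
    struct T_descr descr;
    const char **data, **p;
    int by_index = 0, by_sentinel = 0, i;

    data = your_function(&descr);
    if (data == NULL)
        return 0;
    for (i = 0; i < descr.size; i++)
        by_index++;
    for (p = data; *p != NULL; p++)
        by_sentinel++;
    return by_index == by_sentinel && by_index == 3;
}
```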
(e) An "industrial grade" version of (d) hides information in the
structure, e.g.,
    struct T_descr {
        void *private;
        int size;
    };
    ...
    struct T_descr descr = {NULL, 0};
    char **data;
    ...
    data = your_function(&descr);
The point of this form is that your_function can handle
allocation and deallocation behind the scenes and the user does
not have to do anything about it. Within your_function, private
points to a structure that holds data that is held from one call
to the next.
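A rough sketch of how (e) could work. The layout of the hidden struct state, the two-batch limit and the placeholder payload are all assumptions made for illustration; note also that the member name private is fine in C but is a keyword in C++.

```c
#include <assert.h>
#include <stdlib.h>

#define NDATA 64

struct T_descr {
    void *private;   /* owned by your_function; callers never touch it */
    int size;
};

/* Hidden state, known only to your_function. */
struct state {
    const char *data[NDATA + 1];
    int calls_left;
};

const char **your_function(struct T_descr *descr)
{
    struct state *st;

    assert(descr != NULL);
    st = descr->private;
    descr->size = 0;
    if (st == NULL) {                  /* first call: allocate the state */
        st = malloc(sizeof *st);
        if (st == NULL)
            return NULL;
        st->calls_left = 2;            /* pretend there are two batches */
        descr->private = st;
    }
    if (st->calls_left == 0) {         /* exhausted: clean up behind the scenes */
        free(st);
        descr->private = NULL;
        return NULL;
    }
    st->calls_left--;
    st->data[0] = "item";              /* placeholder payload */
    st->data[1] = NULL;
    descr->size = 1;
    return st->data;
}

/* Caller side: no allocation or deallocation code at all. */
int demo(void)
{
    struct T_descr descr = { NULL, 0 };
    const char **data;
    int total = 0;

    while ((data = your_function(&descr)) != NULL)
        total += descr.size;
    assert(descr.private == NULL);     /* state was released on the last call */
    return total;
}
```

Because all the state hangs off the descriptor rather than a static variable, two callers with separate descriptors do not interfere with each other.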
My apologies if this is a bit on the long-winded side; however I
thought it worthwhile to go through some of the alternatives and
issues.