/*
function to slurp in an ASCII file
Params: path - path to file
Returns: malloced string containing whole file
*/
I think we can improve this a great deal, with the result being
a function that is written entirely in Standard C and works in
every case in which it is possible for it to work, and -- by
calling a system-dependent function that the user is to supply,
but which may be replaced with a #define that simply returns 0
if desired -- is "reasonably efficient" as well.
char *loadfile(char *path)
{
    FILE *fp;
    int ch;
    long i = 0;
    long size = 0;
    char *answer;
    fp = fopen(path, "r");
    if(!fp)
    {
        printf("Can't open %s\n", path);
        return 0;
    }
In a posting I read earlier, Julienne Walker wrote a version
that used a user-supplied "FILE" instead of a name. I think
this is superior, since it allows one to skip over some initial
portion of the file. (It also eliminates the question of what
to do if the file cannot be opened.) So let us do that:
char *loadfile(FILE *fp, size_t *sizep) {
    size_t n;       /* number of bytes read so far */
    size_t space;   /* amount of space allocated */
    char *buf;      /* the buffer we are working with */
    char *new;      /* for realloc()ing */
Now we come to what I see as the real "point of argument" here.
We would like to get an "estimate" of the size of the file, so that
we can do a single malloc() to hold the contents. Of course, at
this particular point we might also like to subtract any initial
offset -- but Standard C gives us no portable way to turn the result
of ftell() or fgetpos() into such a number (for a text stream, the
value ftell() returns need not be a byte count, and fgetpos() fills
in an opaque fpos_t), so I will just proceed as if the
"initially-skipped count" is always zero. (If loadfile() is to
open the file, this is correct; if loadfile() takes an already-open
file as above, we could always add a "skipped bytes count" argument.)
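A caller that skips a header can count exactly what it skipped by reading the bytes itself instead of asking ftell(). Here is a minimal sketch of that, together with a way to apply the count to the estimate. Both skip_bytes() and remaining_estimate() are hypothetical helpers invented for illustration; they are not part of the code under discussion:

```c
#include <stdio.h>

/*
 * Hypothetical helper: read and discard up to "count" characters
 * from fp.  Returns how many were actually skipped, which is less
 * than "count" only on EOF or a read error.  Because it counts as
 * it reads, it also works on text streams, where ftell() need not
 * be a byte count.
 */
size_t skip_bytes(FILE *fp, size_t count)
{
    size_t skipped = 0;

    while (skipped < count && getc(fp) != EOF)
        skipped++;
    return skipped;
}

/*
 * Hypothetical adjustment: subtract the skipped count from a
 * whole-file estimate, clamping at 0 instead of wrapping around.
 */
size_t remaining_estimate(size_t whole_file, size_t skipped)
{
    return whole_file > skipped ? whole_file - skipped : 0;
}
```

On a text stream with newline translation, the skipped count is in characters, while an fstat()-style size is in bytes on disk. The difference only makes the result a slightly worse estimate, which the loop tolerates anyway.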
Here is where the system-dependent function comes in:
    size_t estimate = estimate_file_size(fp);
The "estimate_file_size" function can use fstat() (on POSIX
systems), or "SYS$GETFILEMETADATA" on some other system, or
we can just do:
#define estimate_file_size(fp) 0
because the result is only assumed to be an *estimate*, rather than
an exact answer. As a nice bonus, this means that, e.g., on POSIX
systems, where fstat() returns a handy exact answer that is entirely
wrong if the file is being modified as we read it, the code still
works.
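On a POSIX system, the user-supplied estimate_file_size() might look something like the following sketch. It assumes fileno() and fstat() are available; they are POSIX, not Standard C. It returns 0 ("no idea") for anything that is not a regular file, and also when fstat() fails or the size does not fit in a size_t, so pipes and terminals simply take the slow-but-correct path. Like the text above, it does not try to subtract the current position:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * POSIX-only sketch of the user-supplied estimate_file_size().
 * The result is only a hint: loadfile() must still work if it is
 * too small, too large, or 0.
 */
size_t estimate_file_size(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return 0;                       /* cannot tell: no estimate */
    if (!S_ISREG(st.st_mode) || st.st_size < 0)
        return 0;                       /* pipe, tty, etc.: no estimate */
    if ((unsigned long long)st.st_size > (size_t)-1)
        return 0;                       /* too big to represent */
    return (size_t)st.st_size;
}
```

A stream with unflushed output would make st_size lag behind what has been written. For a file opened only for reading, that does not arise.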
    answer = malloc(size + 100);
    if(!answer)
    {
        printf("Out of memory\n");
        fclose(fp);
        return 0;
    }
Instead of adding 100, we just use the estimate (plus 1 for the
'\0'). In case the estimate is zero, though, we use 1 (plus the
same 1).
To indicate failure, we return NULL (as you do here) but
do not print an error message, since general-purpose library routines
usually should not do so (the error may need to be logged rather
than printed, for instance):
    if (estimate == 0)
        estimate = 1;
    space = estimate;
    buf = malloc(space + 1);
    if (buf == NULL)
        return NULL;
(Since we eliminated the issue of opening the file, a NULL return
always means "unable to read", either due to an I/O error or due
to malloc() failure.)
    while( (ch = fgetc(fp)) != EOF)
        answer[i++] = ch;
Now we get to the part that deals with the fact that the estimate
is merely an estimate:
    for (n = 0;;) {
        size_t nsuccess, nattempt;
        int c;
        /*
         * Attempt to fill in the rest of the buffer.
         * We have read "n" bytes so far and we have
         * space for "space" bytes (plus 1 extra, not
         * counted in "space").
         *
         * If the read fails, we get a short count or
         * zero.  A short count could indicate normal EOF
         * or an I/O error, so we must use feof() and
         * ferror() to tell them apart.
         *
         * If we get everything we asked for, we hope
         * that we are now at EOF, but we may not be,
         * so check.
         */
        nattempt = space - n;
        nsuccess = fread(buf + n, 1, nattempt, fp);
        n += nsuccess;              /* we now have read this many */
        if (nsuccess < nattempt)    /* normal EOF or I/O error */
            break;
        c = getc(fp);
        if (c == EOF)               /* normal EOF or I/O error */
            break;
        /*
         * Our estimate must have been too low.  We actually
         * have room to save c (in the extra byte), so do that
         * now and count it, then enlarge the buffer, keeping
         * what we have read so far, and try again.
         */
        buf[n++] = c;
        estimate *= 2;              /* or any other suitable increase */
        new = realloc(buf, estimate + 1);
        if (new == NULL) {
            free(buf);
            return NULL;
        }
        buf = new;
        space = estimate;
    }
We exit the above loop only on normal EOF or error (or, in a sense,
if memory allocation fails, but in that case we return to the caller
rather than ending the loop). So now we check which is the case:
    /*
     * It might make for better flow to move the ferror() test
     * here and let the non-ferror(), i.e., feof(), case be the
     * last code in the function.  I thought it was interesting
     * to use feof() correctly in comp.lang.c for once, though.
     */
    if (feof(fp)) {
        /*
         * All is well -- we successfully read the entire file.
         * Optionally, we can realloc down here:
         *
         *     new = realloc(buf, n + 1);
         *     if (new != NULL)
         *         buf = new;
         *
         * This usually saves a few bytes when the estimate is poor,
         * and always saves one byte when the estimate was 0
         * (including for actually-empty files), but tends to
         * cost runtime.  (Of course, it would also make sense
         * to see if n differs from the estimate first.)
         */
        buf[n] = '\0';
        if (sizep != NULL)
            *sizep = n;
        return buf;
    }
    /*
     * Since the loop terminated, but feof(fp) was not set, we
     * must have had some kind of error (bad floppy disk?) while
     * reading the file.  Here, I choose to discard the data read
     * so far and return NULL, but there may be reasons to do
     * other things.  In general, however, error recovery is
     * extremely system-dependent.
     */
    free(buf);
    return NULL;
}
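One thing the loop above glosses over is overflow. For an absurdly large file, "estimate *= 2" and the later "+ 1" for the '\0' can in principle wrap around size_t, producing a tiny buffer. A minimal guard is sketched below. The helper next_space() is hypothetical and not part of the code above:

```c
#include <stddef.h>

/*
 * Hypothetical helper: the next buffer size to try after "space"
 * turned out to be too small.  Doubles it, but returns 0 if either
 * the doubling or the later "+ 1" for the '\0' would wrap around.
 */
size_t next_space(size_t space)
{
    /* largest s for which s * 2 + 1 still fits in a size_t */
    size_t limit = ((size_t)-1 - 1) / 2;

    if (space > limit)
        return 0;
    return space * 2;
}
```

In the loop, one would then write "estimate = next_space(estimate);" and treat a 0 result like a failed allocation (free(buf) and return NULL). Since estimate is always at least 1 there, a 0 result is unambiguous.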
At the risk of redundancy, here is the complete function, with
a leading comment added, and the extensive internal commenting
shrunken down. I also removed the "estimate" variable as it is
essentially the same as the "space" variable.
The following is entirely untested.
#include <stdio.h>
#include <stdlib.h>
/*
* Load from an existing opened file into memory, adding a
* terminating '\0' to make the result a valid C string. If
* sizep is non-NULL, set it to the number of bytes loaded
* (not including the terminating '\0').
*
* Returns NULL on failure, in which case *sizep is not useful.
*/
char *loadfile(FILE *fp, size_t *sizep) {
    size_t n;       /* number of bytes read so far */
    size_t space;   /* amount of space allocated */
    char *buf;      /* the buffer we are working with */
    char *new;      /* for realloc()ing */
    space = estimate_file_size(fp);     /* user-supplied; see above */
    if (space == 0)
        space = 1;  /* must attempt to read, even if empty file */
    buf = malloc(space + 1);
    if (buf == NULL)
        return NULL;
    /*
     * If the estimate is 100% accurate or over-estimates,
     * this loop runs only once.  (If the estimate *is*
     * accurate and all goes well, the getc() returns EOF.)
     */
    for (n = 0;;) {
        size_t nsuccess, nattempt;
        int c;
        /*
         * Attempt to fill in the rest of the buffer.  Note that
         * fread() returns a short count or 0 on EOF or error.
         */
        nattempt = space - n;
        nsuccess = fread(buf + n, 1, nattempt, fp);
        n += nsuccess;
        /*
         * Terminate loop on EOF or error.  If the estimate was
         * right, we have to attempt one more byte to see the EOF.
         */
        if (nsuccess < nattempt || (c = getc(fp)) == EOF)
            break;
        buf[n++] = c;   /* under-estimated -- save c and expand */
        space *= 2;
        new = realloc(buf, space + 1);  /* keeps the data read so far */
        if (new == NULL) {
            free(buf);
            return NULL;
        }
        buf = new;
    }
    if (ferror(fp)) {
        /* I/O error -- not dealt with very well here. */
        free(buf);
        return NULL;
    }
    /* Loop ended, and not due to error, so must be normal EOF. */
#ifdef OPTIONAL
    if (n < space) {
        new = realloc(buf, n + 1);
        if (new != NULL)
            buf = new;
    }
#endif
    buf[n] = '\0';
    if (sizep != NULL)
        *sizep = n;
    return buf;
}