Variable-sized lines of text in linked list



Scottman

I am trying to read a text file into memory without any knowledge of
how long each line will be. I am looking to store each line in a
linked list structure, however, I am unsure of how to dynamically
allocate space for each line.

This is how I set up the linked list...

typedef struct node {
    char *line;
    struct node *next;
} linkedlist;

linkedlist* createlinkedlist(void) {
    linkedlist* head;
    head = (linkedlist *)malloc(sizeof(linkedlist));
    head->line = NULL;
    head->next = NULL;
    return head;
}

void addnode(linkedlist* list, char *line) {
    linkedlist* freespot;
    linkedlist* newnode;
    freespot = list;
    while (freespot->next != NULL)
        freespot = freespot->next;
    newnode = (linkedlist *)malloc(sizeof(linkedlist));
    newnode->line = line;
    newnode->next = NULL;
    freespot->next = newnode;
}

So with this in place, how can I read in variable length lines,
malloc() the proper storage for each and pass the pointer to
addnode()?

Thanks!

Cheers,
Scott Nelson
 

santosh

Scottman said:
I am trying to read a text file into memory without any knowledge of
how long each line will be. I am looking to store each line in a
linked list structure, however, I am unsure of how to dynamically
allocate space for each line.

So with this in place, how can I read in variable length lines,
malloc() the proper storage for each and pass the pointer to
addnode()?


You essentially have to write a "read loop" which reads the line into
a block of memory which will have to be expanded by realloc as more of
the line is read in. Finally when you hit the '\n' character, you can
pass this buffer to addnode. The primitive you can use to read can be
getc or fgetc (if you want to read character by character) or fgets (if
you want to read in larger sequences). You must also work out what to
do if realloc fails in the middle of a re-size attempt, or if the read
function fails with an I/O error, or if you hit end-of-file before
reading any characters, or if the first character is a '\n' and so on.
This design is actually harder than it sounds, and there have been
repeated long threads about this issue in this group, in the past. A
Google search should throw up lots of interesting discussions.
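A minimal sketch of the read loop described above, assuming getc as the primitive and a doubling growth policy. The name read_line_alloc and the starting capacity of 16 bytes are hypothetical choices, not code from this thread. It returns NULL both at end-of-file and on failure, so a caller that cares can check feof() and ferror() afterwards to tell the cases apart.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one line from fp into a freshly malloc'd,
 * NUL-terminated buffer, growing it with realloc as needed. The '\n'
 * is not stored. Returns NULL at end-of-file before any character,
 * on a read error, or when memory runs out. The caller frees the result.
 */
char *read_line_alloc(FILE *fp)
{
    size_t cap = 16;                 /* initial capacity, arbitrary */
    size_t len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {        /* keep room for c and the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {       /* the old block is still valid */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (ferror(fp) || (c == EOF && len == 0)) {
        /* I/O error, or end-of-file before reading any character */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';                 /* an empty line "\n" gives "" */
    return buf;                      /* a last line without '\n' is kept */
}
```

Each returned pointer could then be passed to addnode(), which stores it as-is, so the list becomes responsible for freeing every line later.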
 

Ioan - Ciprian Tandau

I am trying to read a text file into memory without any knowledge of how
long each line will be. I am looking to store each line in a linked
list structure, however, I am unsure of how to dynamically allocate
space for each line.
void addnode(linkedlist* list, char *line) {
    linkedlist* freespot;
    linkedlist* newnode;
    freespot = list;
    while (freespot->next != NULL)
        freespot = freespot->next;
    newnode = (linkedlist *)malloc(sizeof(linkedlist));
    newnode->line = line;
    newnode->next = NULL;
    freespot->next = newnode;
}

You realize that you're iterating through the entire list for each new
line you want to add, right?
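One way to avoid that walk, sketched under the assumption that a small wrapper struct can travel alongside the list: keep a pointer to the last node, much as pete's list_append does with its tail argument. The names linelist, linelist_append and linelist_free are hypothetical. Unlike the original addnode, this version copies the string and uses no dummy head node.

```c
#include <stdlib.h>
#include <string.h>

/* The node type from the question. */
typedef struct node {
    char *line;
    struct node *next;
} linkedlist;

/* Hypothetical wrapper that remembers the last node. */
typedef struct {
    linkedlist *head;
    linkedlist *tail;
} linelist;

/* Append a private copy of text in O(1); returns 0 on success, or -1
 * if malloc fails (the list is left unchanged). */
int linelist_append(linelist *list, const char *text)
{
    size_t n = strlen(text) + 1;
    linkedlist *node = malloc(sizeof *node);

    if (node == NULL)
        return -1;
    node->line = malloc(n);
    if (node->line == NULL) {
        free(node);
        return -1;
    }
    memcpy(node->line, text, n);
    node->next = NULL;

    if (list->tail != NULL)
        list->tail->next = node;     /* link after the current last node */
    else
        list->head = node;           /* first node of an empty list */
    list->tail = node;
    return 0;
}

/* Release every node and the line it owns. */
void linelist_free(linelist *list)
{
    linkedlist *p = list->head;

    while (p != NULL) {
        linkedlist *next = p->next;
        free(p->line);
        free(p);
        p = next;
    }
    list->head = list->tail = NULL;
}
```

Usage starts from an empty wrapper, `linelist L = { NULL, NULL };`, and calls linelist_append(&L, line) once per line read.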
So with this in place, how can I read in variable length lines, malloc()
the proper storage for each and pass the pointer to addnode()?

You start by allocating a small array of characters and then start
reading characters into it. If you reach the limit of the allocated
memory and you still need to add more characters (including the string
termination character ('\0')), use *realloc* to resize the array and
continue populating it. *fgets* may help you with reading the characters.
Somebody else may even be able to point you to code that can help you
achieve this without implementing it yourself.
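A sketch of the same growing-array idea with fgets as the primitive, assuming a starting size of 32 bytes that doubles whenever fgets fills the buffer without reaching '\n'. The name fgets_line_alloc is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read one line with fgets, enlarging the buffer
 * with realloc whenever fgets fills it without reaching '\n'.
 * Returns a malloc'd string without the '\n', or NULL at end-of-file
 * with nothing read, on a read error, or if memory runs out.
 */
char *fgets_line_alloc(FILE *fp)
{
    size_t cap = 32, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';          /* complete line: drop the '\n' */
            return buf;
        }
        if (len + 1 < cap)                /* not full: hit EOF without '\n' */
            break;
        /* buffer full: double it and keep reading the same line */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len == 0 || ferror(fp)) {          /* nothing read, or read error */
        free(buf);
        return NULL;
    }
    return buf;                           /* last line had no '\n' */
}
```

When fgets returns NULL at end-of-file after a full buffer, it leaves the array unchanged, so the '\0' written by the previous call still terminates the string.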
 

santosh

Ioan said:
You realize that you're iterating through the entire list for each new
line you want to add, right?


You start by allocating a small array of characters and then start
reading characters into it. If you reach the limit of the allocated
memory and you still need to add more characters (including the string
termination character ('\0')), use *realloc* to resize the array and
continue populating it. *fgets* may help you with reading the
characters. Somebody else may even be able to point you to code that
can help you achieve this without implementing it yourself.

This is a good page on this subject.

<http://www.cpax.org.uk/prg/writings/fgetdata.php>
 

pete

Scottman said:
I am trying to read a text file into memory without any knowledge of
how long each line will be. I am looking to store each line in a
linked list structure, however, I am unsure of how to dynamically
allocate space for each line.

I have a program that does that here:
http://www.mindspring.com/~pfilandr/C/get_line/get_line.c

Here's another example:

/* BEGIN file_sort_2.c */
/*
From
program that reads 3 lists of numbers,
which are stored in three separate files,
and creates one sorted list.
Each file should contain not more than 15 numbers.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#define NFILES 3
#define MAX_LINES_PER_FILE 15
#define LU_RAND_SEED 0LU
#define LU_RAND(S) ((S) * 69069 + 362437 & 0XFFFFFFFFLU)
#define NMEMB(A) (sizeof (A) / sizeof *(A))

struct list_node {
    struct list_node *next;
    void *data;
};

typedef struct list_node list_type;

int numcomp(const list_type *a, const list_type *b);
int get_line(char **lineptr, size_t *n, FILE *stream);
list_type *list_append
    (list_type **head, list_type *tail, void *data, size_t size);
void list_free(list_type *node, void (*free_data)(void *));
list_type *list_sort(list_type *head,
    int (*compar)(const list_type *, const list_type *));
int list_fputs(const list_type *node, FILE *stream);
static list_type *sort_node (list_type *head,
    int (*compar)(const list_type *, const list_type *));
static list_type *merge_lists(list_type *head, list_type *tail,
    int (*compar)(const list_type *, const list_type *));
static list_type *split_list(list_type *head);

int main(void)
{
    long unsigned index, lu_seed, line;
    int rc;
    char *buff;
    size_t size;
    list_type *tail, *head;
    char fn[L_tmpnam];
    FILE *fp;

    puts("/* BEGIN file_sort_2.c output */");
    /*
    ** Open temporary input text files for writing.
    ** Write long unsigned values to standard output
    ** as well as to temporary input text files.
    ** Close each file after filling with long unsigned values.
    ** Open input text files for reading.
    ** Represent each line of each input text file
    ** as a string in a node of a linked list.
    ** Close each temp input file after reading.
    */
    size = 0;
    buff = NULL;
    head = tail = NULL;
    lu_seed = LU_RAND_SEED;
    tmpnam(fn);
    for (index = 0; index != NFILES; ++index) {
        fp = fopen(fn, "w");
        if (fp == NULL) {
            fputs("fopen(fn, \"w\") == NULL\n", stderr);
            break;
        }
        printf("\nInput file #%lu\n", index + 1);
        line = lu_seed % MAX_LINES_PER_FILE + 1;
        while (line-- != 0) {
            lu_seed = LU_RAND(lu_seed);
            fprintf( fp, "%lu\n", lu_seed);
            fprintf(stdout, "%lu\n", lu_seed);
        }
        fclose(fp);
        fp = fopen(fn, "r");
        if (fp == NULL) {
            fputs("fopen(fn, \"r\") == NULL\n", stderr);
            break;
        }
        while ((rc = get_line(&buff, &size, fp)) > 0) {
            tail = list_append(&head, tail, buff, rc);
            if (tail == NULL) {
                fputs("tail == NULL\n", stderr);
                break;
            }
        }
        fclose(fp);
        if (rc != EOF) {
            fprintf(stderr, "rc == %d\n", rc);
            break;
        }
    }
    /*
    ** Free allocated buffer used by get_line function.
    ** Remove temp input file.
    */
    free(buff);
    remove(fn);
    /*
    ** Sort list.
    ** Display list.
    ** Free list.
    */
    head = list_sort(head, numcomp);
    puts("\nSorted Output List");
    list_fputs(head, stdout);
    list_free(head, free);
    puts("\n/* END file_sort_2.c output */");
    return 0;
}

int numcomp(const list_type *a, const list_type *b)
{
    const long unsigned a_num = strtoul(a -> data, NULL, 10);
    const long unsigned b_num = strtoul(b -> data, NULL, 10);

    return b_num > a_num ? -1 : b_num != a_num;
}

int get_line(char **lineptr, size_t *n, FILE *stream)
{
    int rc;
    void *p;
    size_t count;

    count = 0;
    while ((rc = getc(stream)) != EOF
        || !feof(stream) && !ferror(stream))
    {
        ++count;
        if (count == (size_t)-2) {
            if (rc != '\n') {
                (*lineptr)[count] = '\0';
                (*lineptr)[count - 1] = (char)rc;
            } else {
                (*lineptr)[count - 1] = '\0';
            }
            break;
        }
        if (count + 2 > *n) {
            p = realloc(*lineptr, count + 2);
            if (p == NULL) {
                if (*n > count) {
                    if (rc != '\n') {
                        (*lineptr)[count] = '\0';
                        (*lineptr)[count - 1] = (char)rc;
                    } else {
                        (*lineptr)[count - 1] = '\0';
                    }
                } else {
                    if (*n != 0) {
                        **lineptr = '\0';
                    }
                    ungetc(rc, stream);
                }
                count = 0;
                break;
            }
            *lineptr = p;
            *n = count + 2;
        }
        if (rc != '\n') {
            (*lineptr)[count - 1] = (char)rc;
        } else {
            (*lineptr)[count - 1] = '\0';
            break;
        }
    }
    if (rc != EOF || !feof(stream) && !ferror(stream)) {
        rc = INT_MAX > count ? count : INT_MAX;
    } else {
        if (*n > count) {
            (*lineptr)[count] = '\0';
        }
    }
    return rc;
}

list_type *list_append
    (list_type **head, list_type *tail, void *data, size_t size)
{
    list_type *node;

    node = malloc(sizeof *node);
    if (node != NULL) {
        node -> next = NULL;
        node -> data = malloc(size);
        if (node -> data != NULL) {
            memcpy(node -> data, data, size);
            if (*head != NULL) {
                tail -> next = node;
            } else {
                *head = node;
            }
        } else {
            free(node);
            node = NULL;
        }
    }
    return node;
}

void list_free(list_type *node, void (*free_data)(void *))
{
    list_type *next_node;

    while (node != NULL) {
        next_node = node -> next;
        free_data(node -> data);
        free(node);
        node = next_node;
    }
}

list_type *list_sort(list_type *head,
    int (*compar)(const list_type *, const list_type *))
{
    return head != NULL ? sort_node(head, compar) : head;
}

int list_fputs(const list_type *node, FILE *stream)
{
    int rc = 0;

    while (node != NULL
        && (rc = fputs(node -> data, stream)) != EOF
        && (rc = putc('\n', stream)) != EOF)
    {
        node = node -> next;
    }
    return rc;
}

static list_type *sort_node(list_type *head,
    int (*compar)(const list_type *, const list_type *))
{
    list_type *tail;

    if (head -> next != NULL) {
        tail = split_list(head);
        tail = sort_node(tail, compar);
        head = sort_node(head, compar);
        head = merge_lists(head, tail, compar);
    }
    return head;
}

static list_type *split_list(list_type *head)
{
    list_type *tail;

    tail = head -> next;
    while ((tail = tail -> next) != NULL
        && (tail = tail -> next) != NULL)
    {
        head = head -> next;
    }
    tail = head -> next;
    head -> next = NULL;
    return tail;
}

static list_type *merge_lists(list_type *head, list_type *tail,
    int (*compar)(const list_type *, const list_type *))
{
    list_type *list, *sorted, **node;

    node = compar(head, tail) > 0 ? &tail : &head;
    list = sorted = *node;
    *node = sorted -> next;
    while (*node != NULL) {
        node = compar(head, tail) > 0 ? &tail : &head;
        sorted -> next = *node;
        sorted = *node;
        *node = sorted -> next;
    }
    sorted -> next = tail != NULL ? tail : head;
    return list;
}

/* END file_sort_2.c */
 

Morris Dovey

Scottman said:
So with this in place, how can I read in variable length lines,
malloc() the proper storage for each and pass the pointer to
addnode()?

Another way, besides those already mentioned, is to read characters
recursively until the end of the line is detected (at which point you
know exactly how long the line actually is), allocate a buffer of
that size, and move all of the input into the newly-allocated
buffer before returning a pointer to the buffer to the calling
function.
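A rough sketch of the recursive approach, assuming each call keeps one character in a local variable until the allocation is made; read_rec and read_line_recursive are hypothetical names. Because the recursion depth equals the line length, a very long line could exhaust the stack, which is the risk the depth-limiting refinement in the next paragraph addresses.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Each call reads one character. When '\n' or EOF is seen, depth is
 * the line length, so exactly depth + 1 bytes are allocated; the
 * characters are stored while the calls unwind. EOF here also covers
 * a read error.
 */
static char *read_rec(FILE *fp, size_t depth, int *at_eof)
{
    int c = getc(fp);
    char *buf;

    if (c == EOF || c == '\n') {
        *at_eof = (c == EOF);
        buf = malloc(depth + 1);
        if (buf != NULL)
            buf[depth] = '\0';
        return buf;
    }
    buf = read_rec(fp, depth + 1, at_eof);
    if (buf != NULL)
        buf[depth] = (char)c;        /* fill in on the way back up */
    return buf;
}

/* Hypothetical wrapper: returns a malloc'd line without the '\n', or
 * NULL at end-of-file with nothing read (or if malloc fails). */
char *read_line_recursive(FILE *fp)
{
    int at_eof = 0;
    char *line = read_rec(fp, 0, &at_eof);

    if (line != NULL && at_eof && line[0] == '\0') {
        free(line);                  /* EOF before any character */
        return NULL;
    }
    return line;
}
```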

Alternatively, you could limit the depth of recursion and create
a linked list of buffers that (collectively) contain the entire
line, and/or (optionally) push those buffers off to disk as they
were filled so as to not fail until your disk was completely
filled...
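The list-of-buffers variant can be sketched as follows, leaving out the optional spill-to-disk step. The chunk size of 64 and the names struct chunk and read_line_chunks are assumptions for illustration. No recursion is involved: fixed-size pieces are chained while reading, and a single exact-size copy is made once the end of the line is known.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64   /* arbitrary chunk size for this sketch */

/* One fixed-size piece of the line being read. */
struct chunk {
    struct chunk *next;
    size_t used;
    char data[CHUNK];
};

static void free_chunks(struct chunk *c)
{
    while (c != NULL) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
}

/*
 * Hypothetical helper: collect a line into CHUNK-byte pieces, then
 * allocate exactly total + 1 bytes and copy the pieces in order.
 * Returns NULL at end-of-file with nothing read, or if malloc fails.
 */
char *read_line_chunks(FILE *fp)
{
    struct chunk *first = NULL, *last = NULL, *p;
    size_t total = 0;
    char *line, *out;
    int c;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (last == NULL || last->used == CHUNK) {   /* need a new piece */
            p = malloc(sizeof *p);
            if (p == NULL) {
                free_chunks(first);
                return NULL;
            }
            p->next = NULL;
            p->used = 0;
            if (last != NULL)
                last->next = p;
            else
                first = p;
            last = p;
        }
        last->data[last->used++] = (char)c;
        total++;
    }
    if (c == EOF && total == 0)
        return NULL;                 /* nothing read, no chunks to free */

    line = malloc(total + 1);
    if (line != NULL) {
        out = line;
        for (p = first; p != NULL; p = p->next) {
            memcpy(out, p->data, p->used);
            out += p->used;
        }
        *out = '\0';
    }
    free_chunks(first);
    return line;
}
```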
 

cr88192

Scottman said:
I am trying to read a text file into memory without any knowledge of
how long each line will be. I am looking to store each line in a
linked list structure, however, I am unsure of how to dynamically
allocate space for each line.

So with this in place, how can I read in variable length lines,
malloc() the proper storage for each and pass the pointer to
addnode()?

usual, simplistic approach:
use a largish buffer to read input lines (somewhat longer than the longest
"sane" line), alternatively, it can be resized as needed as well;
usually, this can be read with either fgets, or a loop using fgetc (better
for the resize approach).

after this, we can use strdup or similar to allocate the exact string (the
main buffer is reused for reading each line, thus tending to be either the
initial size or the size of the longest line).

most of the time though, I make simplifying assumptions, such as assuming
256 chars is a sane maximum line length (if longer, the line is naturally
and arbitrarily broken at this limit by the use of fgets or similar).
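A sketch of the fixed-buffer-plus-copy approach, assuming a 256-byte buffer as in the post. strdup is POSIX (it only became ISO C in C23), so a small portable stand-in, dup_string, is used instead; it and next_line_copy are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 256   /* the "sane maximum"; longer lines come back in pieces */

/* Portable stand-in for strdup. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/*
 * Hypothetical helper: read the next line into one reused fixed-size
 * buffer, then return an exactly sized copy. A line longer than
 * MAXLINE - 1 characters is returned in several pieces, as described
 * above. Returns NULL at end-of-file or on allocation failure.
 */
char *next_line_copy(FILE *fp)
{
    static char buf[MAXLINE];        /* shared between calls: not reentrant */
    size_t len;

    if (fgets(buf, sizeof buf, fp) == NULL)
        return NULL;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[len - 1] = '\0';         /* drop the newline if it fit */
    return dup_string(buf);
}
```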


another common approach in my case is to allocate a buffer big enough for
the whole file, which is read in and processed as such (decomposed into
lines or tokens or whatever...).
 

ediebur

Scottman said:
I am trying to read a text file into memory without any knowledge of
how long each line will be. I am looking to store each line in a
linked list structure, however, I am unsure of how to dynamically
allocate space for each line.


Years ago I got interested in how text editors, Stallman's emacs in
particular, worked. I think I remember that one of his early design
choices was a linked list of text lines. He chose to go a different
way in the end, but this was definitely contemplated.
 

Morris Dovey

My web site server log seemed horribly cluttered and 'hits' from
one site were mixed with those from other sites, so I read the
log a line at a time and made lists (and lists of lists) so I
could produce a report in which requests would be grouped
chronologically by requestor. For example:

25 216.185.230.83
| 02-26 16:45 301 /DeSoto
| 02-26 16:45 200 /DeSoto/
| 02-26 20:17 200 /DeSoto/solar.html
| 02-26 20:17 200 /DeSoto/SC_Madison/
| 02-27 15:03 200 /DeSoto/SC_Types.html
| 02-27 15:26 200 /DeSoto/

for three visits, which is incredibly easier for me to read.

The main() function is fairly simple:

int main (int argc,char **argv)
{  char *line,*s,*z;
   unsigned have_page,is_image,line_count = 0,len;
   FILE *fp;
   user_e *q;
   evnt_e *e;
/*
 * Check parameters and open the log file
 */
   if (argc < 2)
   {  puts("Usage: logan <log file>\n");
      exit(EXIT_FAILURE);
   }
   if (!(fp = fopen(argv[1],"r")))
   {  puts("Error: can't open log file\n");
      exit(EXIT_FAILURE);
   }
/*
 * Eat the file
 */
   while ((line = getsm(fp)))
   {  ++line_count;
      /* keep every field within process()'s 256-byte buffers */
      if (strlen(line) > 255) line[255] = '\0';
      process(line);
      free(line);
   }
/*
 * Output the list of requestors and files requested
 */
   for (q=user_list.head; q; q=q->next)
   {  printf("\n%7u %s\n",q->count,q->user);
      have_page = 0;
      for (e=q->head; e; e=e->next)
      {  s = e->evnt->page->page;
         len = strlen(s);
         z = s + len;
         if (len > 4) is_image = (!strcmp(z-4,".gif") ||
                                  !strcmp(z-4,".ico") ||
                                  !strcmp(z-4,".jpg") ||
                                  !strcmp(z-4,".png"));
         else is_image = 0;
         if (have_page && is_image) continue;
         printf(" | %s %03u %s\n",
                ptime(e->evnt->when),e->evnt->code,s);
         if (!have_page && !is_image) have_page = 1;
      }
   }
   printf("\nLines: %u\n\n",line_count);
   return 0;
}

The process() function, which breaks the line into its
constituent parts, is a bit messier, but still isn't really
complex:

/*
 * Extract data from a log record to a log element
 */
log_e *process(char *line)
{  char *p, *s;
   char bbuf[256], cbuf[256], fbuf[256], pbuf[256], tbuf[256],
        ubuf[256];
   log_e *q = NULL;
/*
 * Extract the requestor
 */
   for (p = line,s = ubuf; *p; ++p)
   {  if (*p <= ' ') break;
      *s++ = *p;
   }
   if (*p) ++p;
   *s = '\0';

   for ( ; *p; ++p) if (*p == '[') break;
   if (*p) ++p;
/*
 * Extract the access date/time
 */
   for (s = tbuf; *p; ++p)
   {  if (*p == ']') break;
      else *s++ = *p;
   }
   if (*p) ++p;
   *s = '\0';

   for ( ; *p; ++p) if (*p == '"') break;
   if (*p) ++p;
/*
 * Extract the web page URL
 */
   if (!strncmp(p,"GET ",4)) p += 4;
   for (s = pbuf; *p; ++p)
   {  if ((*p == '"') || (*p == ' ')) break;
      else *s++ = *p;
   }
   if (*p == ' ') for (; *p; ++p) if (*p == '"') break;
   if (*p) ++p;
   *s = '\0';

   for ( ; *p; ++p) if (*p != ' ') break;
/*
 * Extract the return code
 */
   for (s = cbuf; *p; ++p)
   {  if ((*p < '0') || (*p > '9')) break;
      else *s++ = *p;
   }
   if (*p) ++p;
   *s = '\0';

   for ( ; *p; ++p) if (*p != ' ') break;
/*
 * Extract the byte count
 */
   for (s = bbuf; *p; ++p)
   {  if ((*p < '0') || (*p > '9')) break;
      else *s++ = *p;
   }
   if (*p) ++p;
   *s = '\0';

   for ( ; *p; ++p) if (*p == '"') break;
   if (*p) ++p;
/*
 * Extract the referring URL
 */
   for (s = fbuf; *p; ++p)
   {  if (*p == '"') break;
      else *s++ = *p;
   }
   if (*p) ++p;
   *s = '\0';
/*
 * Add the log record info to the list
 */
   if (!(q = malloc(sizeof(log_e))))
   {  puts("Allocation failure");
      return NULL;
   }
   q->next = NULL;
   q->user = add_user(q,ubuf);
   q->when = gtime(tbuf);
   q->page = add_page(q,pbuf);
   q->code = atoi(cbuf);
   q->bcnt = atoi(bbuf);
   q->from = add_from(q,fbuf);
   if (log_list.head) log_list.tail->next = q;
   else log_list.head = q;
   log_list.tail = q;
   return q;
}

And the remainder is basic stuff to add elements to lists and
convert/format time.
 

CBFalconer

Scottman said:
I am trying to read a text file into memory without any knowledge of
how long each line will be. I am looking to store each line in a
linked list structure, however, I am unsure of how to dynamically
allocate space for each line.

Just get the (short and simple) function ggets(), in ggets.zip.
Written in portable standard C, and put in public domain. One of
the demo programs reads complete text files into memory. You will
be amazed how simple. See:

<http://cbfalconer.home.att.net/download/>
 

Bill Reid

I am trying to read a text file into memory without any knowledge of
how long each line will be. I am looking to store each line in a
linked list structure, however, I am unsure of how to dynamically
allocate space for each line.

Maybe you don't have to malloc each line, but rather just the entire
file, with pointers to the start of each line, and replace each newline
with a terminating null (this decision will impact how easily you can
process the lines after you've stored them this way, but just throwing
out an idea that doesn't seem to be getting much play here).
This is how I set up the linked list...

typedef struct node {
char *line;

OK, but this can just be a pointer to the first character of each line
in what is essentially the entire file...right?
struct node *next;
} linkedlist;

OK, then in pseudo-code and hand-waving...
linkedlist* createlinkedlist(void) {
    unsigned num_nodes=0;
    size_t num_chars=0;
    int current_char;

    open_file_for_reading

    num_chars=count_each_character_in_file(file_pointer)

    rewind_file_pointer

    file_memory=malloc_file_memory(num_chars)

    malloc_list_head

    list_head->line=file_memory

    num_chars=0

    read_each_character_in_file_to_memory_until_end_of_file {

        if(current_char=='\n') {

            current_char='\0'
            malloc_next_node
            next_node->line=file_memory+num_chars+1
        }

        num_chars++
    }
}

....or something like that...
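A runnable version of the whole-file idea, under a slightly different assumption: instead of counting the characters first and rewinding, it reads with fread into a buffer that grows until end-of-file, which also sidesteps the file changing size between two passes. load_lines and free_lines are hypothetical names. Each node's line points into the single buffer, so only the nodes and that one buffer are freed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Node type from the question; line points into one shared buffer. */
typedef struct node {
    char *line;
    struct node *next;
} linkedlist;

/* Free the nodes and the shared text buffer. */
void free_lines(linkedlist *head, char *text)
{
    while (head != NULL) {
        linkedlist *next = head->next;
        free(head);
        head = next;
    }
    free(text);
}

/*
 * Read all of fp into one growing buffer, turn each '\n' into '\0',
 * and link one node per line. On success *text receives the buffer
 * and the first node is returned. Returns NULL for an empty file,
 * a read error, or an allocation failure.
 */
linkedlist *load_lines(FILE *fp, char **text)
{
    size_t cap = 4096, len = 0, n, i, start;
    char *buf = malloc(cap), *tmp;
    linkedlist *head = NULL, *tail = NULL, *node;

    *text = NULL;
    if (buf == NULL)
        return NULL;

    /* Always leave one spare byte for a final '\0'. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len == cap - 1) {
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp) || len == 0) {
        free(buf);
        return NULL;
    }

    for (start = 0; start < len; start = i + 1) {
        for (i = start; i < len && buf[i] != '\n'; i++)
            ;
        buf[i] = '\0';               /* the '\n', or the spare byte at the end */
        node = malloc(sizeof *node);
        if (node == NULL) {
            free_lines(head, buf);
            return NULL;
        }
        node->line = buf + start;
        node->next = NULL;
        if (tail != NULL)
            tail->next = node;
        else
            head = node;
        tail = node;
    }
    *text = buf;
    return head;
}
```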
 

Richard Heathfield

CBFalconer said:
Just get the (short and simple) function ggets(), in ggets.zip.

....and be aware of its problems. Chuck, if you're going to keep on pimping
this function, shouldn't you at least warn people of its shortcomings?
 

CBFalconer

Richard said:
CBFalconer said:

...and be aware of its problems. Chuck, if you're going to keep on
pimping this function, shouldn't you at least warn people of its
shortcomings?

I don't consider them shortcomings. They amount to this: since the
routine can collect lines of ANY length, if you supply a
sufficiently long line without any line terminations (normally
meaning <return> chars typed), the routine will malloc sufficient
space. If you then fail to free that space (normally a memory leak
error), it won't be freed, and you can run out of memory. Unlikely
in practice, and requires careful programmer and user
concentration.

Similarly all users of malloc, calloc, or realloc should be warned
that repeated use or use for sufficiently large demands can run out
of memory.
 

Richard Heathfield

CBFalconer said:
I don't consider them shortcomings. They amount to this: since the
routine can collect lines of ANY length, if you supply a
sufficiently long line without any line terminations (normally
meaning <return> chars typed), the routine will malloc sufficient
space. If you then fail to free that space (normally a memory leak
error), it won't be freed, and you can run out of memory. Unlikely
in practice, and requires careful programmer and user
concentration.

In fact, there are several problems.

The most obvious one, the problem of an immensely long line that might
exhaust storage, is simply a consequence of the fact that computers don't
have infinite storage, and all such functions suffer from this problem, so
it is not specific to ggets - I was not including it in ggets's
shortcomings.

The next problem is one that you refer to yourself - the risk of a memory
leak. Again, any function that seeks to acquire arbitrarily long lines
will run this risk, but ggets exacerbates the problem by requiring the
caller to free the buffer after every single call. This is a problem that
can be overcome by allowing the caller to provide a buffer, the length of
its current contents, and perhaps its current maximum capacity, so that
the function can try to acquire memory only when it needs to, and the
caller need not be burdened with constant calls to free().

A third problem is that of Denial of Memory attacks, where malicious users
seek (perhaps over a remote connection) to bring the program down by
asking it to deal with ludicrously long lines. One obvious way to deal
with this is to give the function information on the largest line with
which it can reasonably be expected to cope, and to allow it to reject or
perhaps truncate longer lines.
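A sketch of the kind of interface described in the last two paragraphs: the caller owns a reusable buffer and its capacity, and passes a maximum line length beyond which the line is skipped. The name get_line_reuse, its return codes, and the choice to discard rather than truncate an over-long line are assumptions for illustration; this is not the API of ggets or of any other function mentioned in this thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
enum { LINE_OK, LINE_EOF, LINE_TOO_LONG, LINE_NOMEM };

/* Grow *buf so that at least `need` bytes are available. */
static int ensure_cap(char **buf, size_t *cap, size_t need)
{
    size_t newcap;
    char *tmp;

    if (need <= *cap)
        return 1;
    newcap = *cap ? *cap : 64;
    while (newcap < need)
        newcap *= 2;
    tmp = realloc(*buf, newcap);     /* realloc(NULL, n) acts like malloc */
    if (tmp == NULL)
        return 0;
    *buf = tmp;
    *cap = newcap;
    return 1;
}

/*
 * Read one line into the caller's reusable buffer (*buf and *cap start
 * as NULL and 0). The '\n' is dropped and *len receives the length.
 * A line longer than maxlen characters is skipped to its end and
 * LINE_TOO_LONG is returned. LINE_EOF covers end-of-file with nothing
 * read, and read errors.
 */
int get_line_reuse(char **buf, size_t *cap, size_t *len,
                   size_t maxlen, FILE *fp)
{
    size_t n = 0;
    int c;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (n == maxlen) {
            while ((c = getc(fp)) != EOF && c != '\n')
                ;                    /* discard the rest of the line */
            return LINE_TOO_LONG;
        }
        if (!ensure_cap(buf, cap, n + 2))   /* this char plus '\0' */
            return LINE_NOMEM;
        (*buf)[n++] = (char)c;
    }
    if (ferror(fp) || (c == EOF && n == 0))
        return LINE_EOF;
    if (!ensure_cap(buf, cap, n + 1))       /* an empty line needs 1 byte */
        return LINE_NOMEM;
    (*buf)[n] = '\0';
    *len = n;
    return LINE_OK;
}
```

Because the buffer survives between calls, memory is requested only when a line is longer than any seen so far, and the caller frees it once, after the last line.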

<snip>
 

Richard Tobin

Richard Heathfield said:
A third problem is that of Denial of Memory attacks, where malicious users
seek (perhaps over a remote connection) to bring the program down by
asking it to deal with ludicrously long lines.

Whether the ability of a function to read a line limited only by the
amount of available memory is a disadvantage or an advantage depends
on the context. Not every program has to be robust against malicious
attacks.

-- Richard
 

Richard Heathfield

Richard Tobin said:
Whether the ability of a function to read a line limited only by the
amount of available memory is a disadvantage or an advantage depends
on the context. Not every program has to be robust against malicious
attacks.

Agreed - which is why ggets should not be dismissed out of hand.
Furthermore, ggets has the advantage of a very simple user interface. This
is not an advantage to be lightly dismissed. Nevertheless, the problems I
mentioned are also significant, and I think Chuck would be wiser to point
them out when he is recommending his function for other people, who might
not have as much experience as he does in deciding when ggets is, and is
not, appropriate in a particular situation.
 

Randy Howard

Richard Heathfield said:


Agreed - which is why ggets should not be dismissed out of hand.
Furthermore, ggets has the advantage of a very simple user interface. This
is not an advantage to be lightly dismissed. Nevertheless, the problems I
mentioned are also significant, and I think Chuck would be wiser to point
them out when he is recommending his function for other people, who might
not have as much experience as he does in deciding when ggets is, and is
not, appropriate in a particular situation.

Since we're on the topic of these, can you comment on why your fgetline
and fgetword functions seem to have an inconsistent API?

For example:

int fgetline(char **line, size_t *size, size_t maxrecsize, FILE *fp,
unsigned int flags);

int fgetword(char **word, size_t *size, const char *delimiters,
size_t maxrecsize, FILE *fp, unsigned int flags);


It seems to me it would be a lot more consistent if the common
arguments to both functions appeared first and in the same order.
 

CBFalconer

Richard said:
CBFalconer said:

In fact, there are several problems.

The most obvious one, the problem of an immensely long line that might
exhaust storage, is simply a consequence of the fact that computers don't
have infinite storage, and all such functions suffer from this problem, so
it is not specific to ggets - I was not including it in ggets's
shortcomings.

The next problem is one that you refer to yourself - the risk of a memory
leak. Again, any function that seeks to acquire arbitrarily long lines
will run this risk, but ggets exacerbates the problem by requiring the
caller to free the buffer after every single call. This is a problem that
can be overcome by allowing the caller to provide a buffer, the length of
its current contents, and perhaps its current maximum capacity, so that
the function can try to acquire memory only when it needs to, and the
caller need not be burdened with constant calls to free().

A third problem is that of Denial of Memory attacks, where malicious users
seek (perhaps over a remote connection) to bring the program down by
asking it to deal with ludicrously long lines. One obvious way to deal
with this is to give the function information on the largest line with
which it can reasonably be expected to cope, and to allow it to reject or
perhaps truncate longer lines.

<snip>

However, you carefully snipped the portion where I pointed out that
this is no different than using the malloc family in any program.
 

Richard Heathfield

Randy Howard said:

Since we're on the topic of these, can you comment on why your fgetline
and fgetword functions seem to have an inconsistent API?

For example:

int fgetline(char **line, size_t *size, size_t maxrecsize, FILE *fp,
unsigned int flags);

int fgetword(char **word, size_t *size, const char *delimiters,
size_t maxrecsize, FILE *fp, unsigned int flags);


It seems to me it would be a lot more consistent if the common
arguments to both functions appeared first and in the same order.

Well, they /are/ in the same order - but one takes an extra parameter. So
the question is where the delimiters parameter should go.

At the beginning? No - that's a really good place for the thing that
actually gets changed (following in the footsteps of *printf, strcpy,
strcat, memcpy, memset, etc).

At the end? No - that's a really good place for frequently-unused optional
information.

So it has to go somewhere in the middle. And once we've determined that,
it's really a matter of taste.

If you can come up with a compelling argument for putting the parameter
elsewhere, I'm all ears.
 

Bartc

Bill Reid said:
Maybe you don't have to malloc each line, but rather just the entire
file, with pointers to the start of each line, and replace each newline
with a terminating null (this decision will impact how easily you can
process the lines after you've stored them this way, but just throwing
out an idea that doesn't seem to be getting much play here).

Reading an entire file into memory is an admirable idea, but is frowned on
in this newsgroup for various reasons: the size of a file is impossible to
determine, or it could change while you're reading the file, or it might
be a 100GB monster, and therefore you should never attempt this even for
tiny files.

That doesn't mean you shouldn't do it; just keep quiet about it here :)
 
