Reading long lines from a file

Vlad Dogaru

Hello,

I suspect this comes up quite often, but I haven't found an exact
solution in the FAQ. I have to read and parse a file with arbitrarily
long lines and have come up with the following plan:

1. start with a statically allocated buffer and a pointer of equal size
2. read into the buffer using fgets and append to the pointer
3. if buffer does not contain '\n', reallocate buffer and jump to 2
4. return the pointer

Do you see anything wrong with this? If so, how can I improve it?

Thanks in advance,
Vlad Dogaru
 
Richard Heathfield

Vlad Dogaru said:
Hello,

I suspect this comes up quite often, but I haven't found an exact
solution in the FAQ. I have to read and parse a file with arbitrarily
long lines and have come up with the following plan:

1. start with a statically allocated buffer and a pointer of equal size
2. read into the buffer using fgets and append to the pointer
3. if buffer does not contain '\n', reallocate buffer and jump to 2
4. return the pointer

Do you see anything wrong with this? If so, how can I improve it?

To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a
pointer.

Once we get those impossibilities out of the way, we can dispense with
the unnecessary fgets call - your input is already buffered, so why
buffer it again through fgets?

Here's the plan:

Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
at this allocation with P. Set U to 0. Have a temporary pointer T
kicking about the place.

While you can read a character successfully that isn't a newline:
    If U == C - 1
        You're about to run out of space, so get some more
        T = realloc(P, C * 2)
        If that didn't work, you might want to try lower multipliers
        (1.5, 1.25 maybe) or even use add instead of multiply - and
        warn the caller that you're running low on RAM.
        Eventually, either you give up (in which case tell the user
        you failed), or you succeed, in which case set P = T
        Increase C to describe the new allocation amount accurately
    Endif

    If all is well
        P[U++] = the character you read
    Endif
Endwhile
If all is well
    P[U] = '\0'
Endif
P now contains the line.
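
As a rough illustration only, the plan above might be rendered in C along these lines. The function name read_line and the starting size of 16 are assumptions made for this sketch, and it simply gives up on the first failed realloc rather than retrying with a smaller growth factor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from fp, without the newline.
 * Returns a malloc'd string (caller frees), or NULL at end of file
 * with nothing read, or on allocation failure. */
char *read_line(FILE *fp)
{
    size_t cap = 16;    /* C in the plan: bytes allocated (assumed start) */
    size_t used = 0;    /* U in the plan: bytes in use */
    char *p;            /* P in the plan */
    int ch;

    assert(fp != NULL);
    p = malloc(cap);
    if (p == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == cap - 1) {              /* keep one byte for '\0' */
            char *t;                        /* T in the plan */
            if (cap > SIZE_MAX / 2) {       /* doubling would overflow */
                free(p);
                return NULL;
            }
            t = realloc(p, cap * 2);
            if (t == NULL) {                /* give up; p is still valid */
                free(p);
                return NULL;
            }
            p = t;
            cap *= 2;
        }
        p[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {           /* nothing left to read */
        free(p);
        return NULL;
    }
    p[used] = '\0';
    return p;
}
```

A caller would loop on read_line(fp) until it returns NULL, freeing each line after use. Note that this sketch cannot distinguish end of file from running out of memory; a fuller version would report the two separately.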

For a discussion of long-line issues, an implementation of a full line
capture function, and links to other such implementations, see
http://www.cpax.org.uk/prg/writings/fgetdata.php
 
pete

Vlad said:
Hello,

I suspect this comes up quite often, but I haven't found an exact
solution in the FAQ. I have to read and parse a file with arbitrarily
long lines and have come up with the following plan:

1. start with a statically allocated buffer and a pointer of equal size
2. read into the buffer using fgets and append to the pointer
3. if buffer does not contain '\n', reallocate buffer and jump to 2
4. return the pointer

Do you see anything wrong with this?

Possibly with the phrase "statically allocated".
There are three kinds of storage duration:
1 automatic
2 static
3 allocated

Only allocated memory can be reallocated.

If so, how can I improve it?

A few of the regulars here
have written their own getline functions:
http://www.cpax.org.uk/prg/writings/fgetdata.php#related
 
Vlad Dogaru

Richard said:
Vlad Dogaru said:


To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a
pointer.

Once we get those impossibilities out of the way, we can dispense with
the unnecessary fgets call - your input is already buffered, so why
buffer it again through fgets?


If anything, my lack of English skills has contributed to the
misunderstanding. I was talking about:
char b[100], *p;
Reading into b with fgets, then reallocating p as necessary to do a
strcat(p, b).

But your solution is much more elegant and now I see why fgets is
unnecessary.

Here's the plan:

Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
at this allocation with P. Set U to 0. Have a temporary pointer T
kicking about the place.

While you can read a character successfully that isn't a newline:
    If U == C - 1
        You're about to run out of space, so get some more
        T = realloc(P, C * 2)
        If that didn't work, you might want to try lower multipliers
        (1.5, 1.25 maybe) or even use add instead of multiply - and
        warn the caller that you're running low on RAM.
        Eventually, either you give up (in which case tell the user
        you failed), or you succeed, in which case set P = T
        Increase C to describe the new allocation amount accurately
    Endif

    If all is well
        P[U++] = the character you read
    Endif
Endwhile
If all is well
    P[U] = '\0'
Endif
P now contains the line.

For a discussion of long-line issues, an implementation of a full line
capture function, and links to other such implementations, see
http://www.cpax.org.uk/prg/writings/fgetdata.php


Thank you for the clarification and the link. I will look into it and I
am confident that I can write a similar function.

Vlad
 
David Mathog

Vlad said:
Hello,

I suspect this comes up quite often, but I haven't found an exact
solution in the FAQ. I have to read and parse a file with arbitrarily
long lines and have come up with the following plan:

1. start with a statically allocated buffer and a pointer of equal size
2. read into the buffer using fgets and append to the pointer
3. if buffer does not contain '\n', reallocate buffer and jump to 2
4. return the pointer

Do you see anything wrong with this? If so, how can I improve it?

This may not apply to your particular case, but in some instances I have
encountered with "arbitrarily long lines" one can just read a character
at a time, examine it, perform some action, and then continue. This
removes the need for a huge buffer, which in the worst case, might not
even fit into the computer's memory. Obviously this won't work if any
modification to the front of the line depends on a value near the end of
the line.
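
To make the character-at-a-time idea concrete, here is a small sketch that finds the length of the longest line in a file without ever storing a line. The task and the function name longest_line are only assumptions chosen for illustration; any per-character action could take the place of the counting.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical example: returns the length of the longest line in fp,
 * not counting newline characters. Uses constant memory no matter
 * how long the lines are. */
size_t longest_line(FILE *fp)
{
    size_t cur = 0;     /* length of the line being scanned */
    size_t best = 0;    /* longest length seen so far */
    int ch;

    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {
            if (cur > best)
                best = cur;
            cur = 0;
        } else {
            cur++;
        }
    }
    if (cur > best)     /* the last line may lack a newline */
        best = cur;
    return best;
}
```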

If you do go with the expanding buffer method, be sure that you do
NOT use strcat() to append each new chunk of text. Doing so will result
in each such addition scanning from the front of the buffer for the
terminal '\0' in the string. I've seen this bug many, many times.
It can cause a huge performance hit. Instead, keep track of the
length of the string in the buffer and just copy the new string directly
to the appropriate position, then adjust the length variable, and repeat.
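
A minimal sketch of that approach, assuming chunks are read with fgets into a 100-byte buffer as in the "char b[100], *p;" description earlier in the thread. The name read_line_chunked is made up for this example. The pointer starts out as NULL, which realloc accepts and treats like malloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp in fixed-size chunks.
 * Returns a malloc'd string without the newline (caller frees),
 * or NULL at end of file with nothing read or on allocation failure. */
char *read_line_chunked(FILE *fp)
{
    char b[100];        /* fixed chunk buffer */
    char *p = NULL;     /* realloc(NULL, n) behaves like malloc(n) */
    size_t len = 0;     /* tracked length of the string in p */

    assert(fp != NULL);
    while (fgets(b, sizeof b, fp) != NULL) {
        size_t n = strlen(b);
        char *t = realloc(p, len + n + 1);
        if (t == NULL) {
            free(p);
            return NULL;
        }
        p = t;
        /* Copy straight to the end using the known length;
         * no strcat, so no rescan from the front of p. */
        memcpy(p + len, b, n + 1);
        len += n;
        if (n > 0 && b[n - 1] == '\n') {
            p[--len] = '\0';    /* strip the newline; line complete */
            break;
        }
    }
    return p;
}
```

This grows the buffer by one chunk at a time for simplicity; doubling the capacity instead would reduce the number of realloc calls on very long lines. Like any strlen-based code, it will misbehave if the input contains embedded null characters.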

Regards,

David Mathog
 
Flash Gordon

Vlad Dogaru wrote, On 14/08/07 11:46:
Richard Heathfield wrote:
To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a pointer.

Once we get those impossibilities out of the way, we can dispense with
the unnecessary fgets call - your input is already buffered, so why
buffer it again through fgets?

If anything, my lack of English skills has contributed to the
misunderstanding. I was talking about:
char b[100], *p;
Reading into b with fgets, then reallocating p as necessary to do a
strcat(p, b).

Since we do not know what p points to we cannot say whether you are
allowed to realloc what it points to or not. You can only pass realloc
a null pointer or a pointer returned by malloc, calloc or realloc.

Also beware of denial-of-service attacks where a user deliberately
creates a file with a line 5GB long.

<snip>
 
Peter J. Holzer

Vlad Dogaru wrote, On 14/08/07 11:46:
Richard said:
To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a pointer.
[...]
If anything, my lack of English skills has contributed to the
misunderstanding. I was talking about:
char b[100], *p;
Reading into b with fgets, then reallocating p as necessary to do a
strcat(p, b).

Since we do not know what p points to we cannot say whether you are
allowed to realloc what it points to or not.

We cannot *know*, but I think it is reasonable to assume from the
description that he uses malloc to get the initial value for
p. You don't always have to assume the stupidest possible version if
something isn't specified exactly ;-).

Also beware of denial-of-service attacks where a user deliberately
creates a file with a line 5GB long.

ACK. But that's probably not something which should be hard-coded into
the application. After all, the program might run on a machine with 64
GB RAM where 5 GB of memory usage is quite acceptable. You could use a
configurable limit or rely on OS features to limit memory consumption
(e.g. ulimit on unixoid systems).
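
One way a configurable limit could look, purely as a sketch: the caller passes the maximum line length it is willing to accept. The function name read_line_limited, the too_long flag and the choice to discard the rest of an over-long line are all assumptions of this example, not something prescribed in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of at most max_len characters.
 * If the line is longer, sets *too_long to 1, skips the rest of the
 * line and returns NULL. Also returns NULL (with *too_long == 0) at
 * end of file with nothing read or on allocation failure. */
char *read_line_limited(FILE *fp, size_t max_len, int *too_long)
{
    size_t cap = 16, used = 0;  /* starting capacity is an assumption */
    char *p;
    int ch;

    assert(fp != NULL && too_long != NULL);
    *too_long = 0;
    p = malloc(cap);
    if (p == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == max_len) {              /* limit reached */
            *too_long = 1;
            while ((ch = getc(fp)) != EOF && ch != '\n')
                ;                           /* discard rest of line */
            free(p);
            return NULL;
        }
        if (used == cap - 1) {              /* keep room for '\0' */
            char *t;
            if (cap > SIZE_MAX / 2) {       /* doubling would overflow */
                free(p);
                return NULL;
            }
            t = realloc(p, cap * 2);
            if (t == NULL) {
                free(p);
                return NULL;
            }
            p = t;
            cap *= 2;
        }
        p[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {
        free(p);
        return NULL;
    }
    p[used] = '\0';
    return p;
}
```

The value of max_len could then come from a configuration file or a command-line option rather than being compiled in.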

hp
 
Spiros Bousbouras

Vlad Dogaru wrote, On 14/08/07 11:46:
Richard Heathfield wrote:
To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a pointer. [...]
If anything, my lack of English skills has contributed to the
misunderstanding. I was talking about:
char b[100], *p;
Reading into b with fgets, then reallocating p as necessary to do a
strcat(p, b).
Since we do not know what p points to we cannot say whether you are
allowed to realloc what it points to or not.

We cannot *know*, but I think it is reasonable to assume from the
description that he uses malloc to get the initial value for
p. You don't always have to assume the stupidest possible version if
something isn't specified exactly ;-).

Reading Flash Gordon's post I don't see him assuming anything.
He was simply aiming to cover all possibilities and I'm all for
that; we do aim to be accurate around here.
 
