Regarding: fgets() replacement

P

Paul Hsieh

Paul D. Boyle said:
There was a recent thread in this group which talked about the
shortcomings of fgets(). I decided to try my hand at writing a
replacement for fgets() using fgetc() and realloc() to read a line of
arbitrary length. I know that the better programmers in this group could
write a more robust function, but here is my shot at it anyway.
I would appreciate people's comments on my fget_line() code below
(usage example included). Any constructive criticism welcome regarding
logic, design, style, etc. Thanks.

Your algorithm can end up calling realloc O(lengthof(input)) times.
This will end up shredding some of the existing heaps for some widely
deployed C compilers in existence today. I.e., just input enough
stuff and malloc/realloc will start grinding to a halt. Its not hard
to write up test scenarios to see this happen for yourself with 2 of
the most popular Windows C compilers. Using double the amount of
memory at each step is a simple way to avoid this problem (it will
also improve performance by quite a bit.)

Your function also only supports the one semantic of reallocating the
memory required for whatever input is supplied. This isnt going to
work very well in systems like Linux/UNIX which support "yes".
(Consider "yes | ./a.out") The time required to fill in a whole
swap-file's worth of data is pretty long; so this is not a marginal
situation if a hacker is trying to slow down your system, for example.

Your function doesn't make any distinction between binary or text
files. The thing is -- if you read a text file in a binary mode, you
can receive extraneous characters (like '\0's that would not be read
from a text file). That's all fine since you return the length so the
set of characters read (with the exception of the last one that might
be rejected by "validate") but, just like fgets(), that puts your
function at odds with every other function in the standard C library
which always disallows internal characters of char * strings to be
'\0' (since that's the end-of-string marker).

The validate() function you specify is a lot weaker than it could and
should be. First of all, if the input is fairly large, why not build
a table indexed by all characters and just check the table instead?
It would be a lot faster. Ok, the reason is because a callback
function can process a lot more context -- for example it might be
possible to parse simple grammars via state machine. The way you do
this is by passing an opaque context parameter to it (passed in to
fget_line()) as well as the character index. So it should be:

int (*validate)(int character, int idx, void * context)

And add in the additional parameter void * context to fget_line().

There also appears to be a flaw in your program that allows
local_buffer to be dereferenced even if its NULL; i.e., if the first
realloc() fails.

If you would like to see how I solved this same problem in a more
general way see:

http://www.azillionmonkeys.com/qed/userInput.html
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

Using fgets 1
Replacement Problem 1
fgets() replacement 20
understanding fgets() 11
A process take input from /proc/<pid>/fd/0, but won't process it 0
fgets 1
fgets problem 23
fgets 6

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,008
Latest member
HaroldDark

Latest Threads

Top