Hi,
it's probably a FAQ, though I haven't found any good info on it: How do
you go about writing a filter, i.e. a program that reads data from
stdin and processes it?
Since an indefinite amount of data could be read from stdin, it can't
just be put into some (ever-increasing) buffer. Using a buffer of
limited size can make the data difficult to process, because the
buffer might be too small.
What's the solution for this?
The term "filter" is generally reserved for the kind of application that
only needs to keep a small portion of the input in memory at any given
time. A typical Unix filter is "cut". It splits each line of input
into fields, which can be either fixed-width or delimited by a
user-specifiable character that defaults to '\t'. It writes out a
specified subset of the fields, optionally replacing the delimiter
with an arbitrary string. I've never attempted to implement it, but it
seems to me that it could be implemented in a way that never keeps
more than one character of input in memory at any given time.
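To make the one-character claim concrete, here is a minimal sketch of a cut-like filter. The function name `cut_field` and its parameters are hypothetical; it selects only a single delimited field and does not reproduce every detail of the real cut (for example, lines containing no delimiter are not passed through whole).

```c
#include <stdio.h>

/* Hypothetical, simplified cut-style filter: write field number `field`
   (counting from 1) of every line of `in` to `out`, where fields are
   separated by the character `delim`. Only the current character and two
   small counters are kept in memory, however large the input is.
   Returns 0 on success, -1 if a read or write error occurred. */
int cut_field(FILE *in, FILE *out, int field, int delim)
{
    int c;
    int current = 1;   /* number of the field currently being read */
    int mid_line = 0;  /* nonzero once the current line has any input */

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            putc('\n', out);   /* end of record: terminate the output line */
            current = 1;
            mid_line = 0;
            continue;
        }
        mid_line = 1;
        if (c == delim) {
            /* Move to the next field; stop counting once past the wanted
               one so very long lines cannot overflow the counter. */
            if (current <= field)
                current++;
        } else if (current == field) {
            putc(c, out);      /* inside the wanted field: copy it through */
        }
    }
    if (mid_line)
        putc('\n', out);       /* last line lacked a trailing newline */
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

A program would use it as `cut_field(stdin, stdout, 2, '\t')`; memory use stays constant no matter how much data arrives on stdin.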
However, if for some reason your program does need to store the entire
input, then you need storage that can grow, and the C standard library
provides it. Start by allocating a buffer with malloc(). Whenever the
buffer gets full, call realloc() to expand it. I recommend increasing
the size by a fixed factor (2 would be a good value) rather than by a
fixed amount, so that the total cost of copying data during
reallocation stays proportional to the size of the input. Note that
there are a couple of tricky points in connection with calling realloc():
* if realloc() fails, it returns a null pointer, and pointers into the
old buffer are still valid. Therefore, if you make the mistake of
storing the value returned by realloc() directly into the same pointer
you were using to keep track of your buffer, you'll lose your ability to
access that buffer if realloc() fails.
* if realloc() succeeds, it may have moved your data to a new location
in memory, invalidating any pointers you may have been keeping that
pointed into your old buffer. You can't safely do anything with any of
the old pointer values, not even compare them for equality with the new
ones to determine whether the buffer was moved. For each such
pointer, determine its offset from the beginning of the buffer before
calling realloc(). If realloc() succeeds, calculate the new value for
the corresponding pointer by adding that offset to the start of the new
buffer.
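A minimal sketch of both points, assuming a `char` buffer whose capacity is tracked in a separate `size_t`. The helper names `grow_buffer` and `grow_keeping_mark` are hypothetical; the second one applies the offset technique to a single pointer (`mark`, for example the start of the current line) into the buffer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: double the capacity of *bufp. Returns 0 on success,
   -1 on failure. On failure *bufp and *capp are left untouched, so the old
   buffer and any pointers into it remain usable. */
int grow_buffer(char **bufp, size_t *capp)
{
    /* Refuse a zero capacity and any doubling that would overflow size_t. */
    if (*capp == 0 || *capp > SIZE_MAX / 2)
        return -1;
    size_t newcap = *capp * 2;
    char *tmp = realloc(*bufp, newcap); /* never assign straight to *bufp */
    if (tmp == NULL)
        return -1;                      /* old buffer is still valid */
    *bufp = tmp;
    *capp = newcap;
    return 0;
}

/* Hypothetical helper: grow the buffer while keeping *markp, a pointer
   into it, valid. The offset is computed before realloc() is called,
   because the old pointer value may not be used afterwards. */
int grow_keeping_mark(char **bufp, size_t *capp, char **markp)
{
    size_t offset = (size_t)(*markp - *bufp);
    if (grow_buffer(bufp, capp) != 0)
        return -1;                      /* nothing moved; *markp still valid */
    *markp = *bufp + offset;            /* rebuild from the new start */
    return 0;
}
```

The caller starts with a nonzero capacity from malloc() and calls one of these whenever the buffer is full. Because a failed call leaves the old buffer in place, the data read so far is still available for a fallback strategy.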
If realloc() does fail, you'll have to switch to a different approach.
One option is to create a temporary file using tmpfile(), write out the
data already in your buffer, and then copy the rest of standard input
into it. Once all of the input has been copied, you can move around in
the temporary file using fseek(), something you can't rely on being
able to do with stdin.
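A minimal sketch of that fallback, assuming the data read so far sits in `buf` (`len` bytes). The name `spill_to_tmpfile` is hypothetical; it copies the remaining input in fixed-size chunks, so only a small, bounded amount of input is held in memory during the copy.

```c
#include <stdio.h>

/* Hypothetical helper: copy the `len` bytes already in `buf`, followed by
   everything still unread on `in`, into a temporary file. Returns the file
   rewound to its start, or NULL on any error. */
FILE *spill_to_tmpfile(const char *buf, size_t len, FILE *in)
{
    char chunk[4096];  /* fixed-size staging area for the rest of the input */
    size_t n;
    FILE *tmp = tmpfile();

    if (tmp == NULL)
        return NULL;
    /* First the data that was already buffered before realloc() failed. */
    if (len > 0 && fwrite(buf, 1, len, tmp) != len) {
        fclose(tmp);
        return NULL;
    }
    /* Then the remainder of the input, one chunk at a time. */
    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (fwrite(chunk, 1, n, tmp) != n) {
            fclose(tmp);
            return NULL;
        }
    }
    if (ferror(in) || fflush(tmp) != 0) {
        fclose(tmp);
        return NULL;
    }
    rewind(tmp);  /* ready for reading and for fseek() */
    return tmp;
}
```

The caller can then free() the old buffer and use fseek() on the returned stream as often as needed; the temporary file is removed automatically when it is closed or when the program terminates.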