James Harris
1) a fresh buffer is allocated each time. There is no way to supply an
existing buffer, and consequently the program must either keep track of
all these buffers, keep freeing them as it goes, or leak them. My concern
is that the last is likely to be the most common. A reusable buffer would
make memory management simpler (at the expense of a more complicated
function interface).
2) there is no protection against denial of memory attacks. Given an
arbitrarily long line, the function will continue to try to allocate
memory until... it fails! A malicious user who is trying to soak up as
much memory as possible on the host system could exploit this in an
attempt to deny memory to other processes.
Chuck's argument is that fixing these would make the function interface
more complicated - i.e. the function would be harder to call.
This is certainly true, but not compelling, because making these changes
would make ggets easier to *use* - there is more to usage than the call
itself.
Your first point makes sense: under your suggestion the code which
allocated the buffer would be responsible for freeing it, which is the
right discipline in a language with no garbage collector. It would
also be faster.
I'm less sold on the second point, though. Say that, in order to
protect against denial-of-memory attacks, the user were to set a limit
on how much to read: how would that limit be chosen? We may be running
on a 4 Gbyte machine or a 64 Kbyte machine, and in either case not all
of the address space would be available to the process. So how would
the user portably choose a limit on what to read?
If we do get the user to choose a limit, how does the application logic
- i.e. the code using the function - handle hitting that limit? Surely
the idea of reading in a long section of data is to allow the code to
work on whole lines, or whole whatevers, in one go. If we still have to
write code to handle input data split across call boundaries, why not
use a standard library function in the first place?
As a slight aside, specifying how much memory to keep /free/ may be
more useful here than specifying how much we can use.
There are other issues with the ggets routine, IMHO:
3. Fixed-size increments - these do not scale well AND make it more
likely that any eventual memory allocation failure will leave only a
tiny amount of free space.
4. The routine returns only on a newline. In the discussions this
seems to have been taken as a gimme - and it seems to be common to
other routines I've found of the same type - but the OP did not specify
that he was trying to read lines. He may have another terminator in
mind.
5. These routines may be handling the wrong problem. The underlying
issue is not reading from a file but buffer management. It may be
better to write code to handle buffers than to write for the specific
(albeit most prevalent) case of reading a line. To illustrate, if we
knew the buffer was large enough we could read the input with
something like this (I admit my C may have syntax or other errors -
corrections welcome - but it should make the point nonetheless):
int ch;    /* int, not char, so that EOF can be distinguished */

while ((ch = getc(infile)) != EOF) {
    buffer[offset++] = ch;
    if (ch == endchar) break;
}
This loop is simple and would not need to be packaged in an imported
routine. It may be good enough as it stands to point the OP in the
right direction: i.e. do it yourself using getc rather than fgets. On
top of this loop we could add something like
if (need_space_for(&buffer, &bufsize, offset) != 0) {
    /* handle out-of-space error */
}
immediately prior to putting each character in the buffer. I'm thinking
of the "need_space_for" routine as:
1) returning immediately if the buffer is already large enough,
2) if not large enough, reallocating the buffer,
3) if unable to reallocate the buffer, returning non-zero.
For performance it would be good if step 1 could be implemented as a
macro/inline and the rest as a called function. Is that possible in C?
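Yes, that split is possible in C, even without C99's inline keyword. A
minimal sketch of the idea - where grow_buffer is an assumed helper
name and the doubling growth policy is my own choice, not anything from
ggets - might look like this:

```c
#include <stdlib.h>

/* Slow path: called only when the buffer is too small. Doubles the
   size until it covers `needed`; returns non-zero if realloc fails. */
static int grow_buffer(char **bufp, size_t *sizep, size_t needed)
{
    size_t newsize = *sizep ? *sizep : 64;
    while (newsize <= needed)
        newsize *= 2;            /* geometric growth, not fixed steps */
    char *tmp = realloc(*bufp, newsize);
    if (tmp == NULL)
        return 1;                /* caller handles out-of-space */
    *bufp = tmp;
    *sizep = newsize;
    return 0;
}

/* Fast path: a macro, so the common "already big enough" case costs
   one comparison and no function call. Note that, as with any macro,
   the arguments may be evaluated more than once. */
#define need_space_for(bufp, sizep, needed) \
    ((needed) < *(sizep) ? 0 : grow_buffer((bufp), (sizep), (needed)))
```

The macro keeps the per-character cost down to a single comparison, and
the function call only happens on the rare grow path.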
This should be quite flexible. If the programmer wishes to avoid the
(albeit small) cost of the macro part on each call he could read the
input in nested loops, in chunks of, say, 32 bytes, checking that there
is enough space for another chunk at the top of each loop rather than
once per call, calling the macro as
if (need_space_for(&buffer, &bufsize, offset + CHUNK_SIZE) != 0) ...
where CHUNK_SIZE = 32.
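To make the chunked idea concrete, here is a sketch of a complete
reader built that way. The names grow_buffer and read_chunked are my
own inventions for illustration, as is the doubling growth policy -
treat it as one possible shape, not a finished routine:

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 32

/* Slow path: grow the buffer, doubling until it covers `needed`. */
static int grow_buffer(char **bufp, size_t *sizep, size_t needed)
{
    size_t newsize = *sizep ? *sizep : 64;
    while (newsize <= needed)
        newsize *= 2;
    char *tmp = realloc(*bufp, newsize);
    if (tmp == NULL)
        return 1;
    *bufp = tmp;
    *sizep = newsize;
    return 0;
}

/* Fast path: one comparison in the common case. */
#define need_space_for(bufp, sizep, needed) \
    ((needed) < *(sizep) ? 0 : grow_buffer((bufp), (sizep), (needed)))

/* Read from infile into *bufp until endchar or EOF, checking space
   once per chunk rather than once per character. Returns the number
   of characters stored, or (size_t)-1 on allocation failure. */
static size_t read_chunked(FILE *infile, int endchar,
                           char **bufp, size_t *sizep)
{
    size_t offset = 0;
    for (;;) {
        if (need_space_for(bufp, sizep, offset + CHUNK_SIZE) != 0)
            return (size_t)-1;   /* out of space */
        int i;
        for (i = 0; i < CHUNK_SIZE; i++) {
            int ch = getc(infile);
            if (ch == EOF)
                return offset;
            (*bufp)[offset++] = (char)ch;
            if (ch == endchar)
                return offset;
        }
    }
}
```

Nothing in read_chunked cares that the data came from a file: the
space-management half is entirely in the macro and grow_buffer.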
The intention is that the same macro + routine combination could be
used for handling any flexibly-sized buffer in any circumstance, not
just for reading lines.
The above is just a suggestion. Can anyone see whether this would work
or not? As ever when throwing ideas out onto Usenet, it's interesting
to see what comes back - good and bad!