Is this string input function safe?

A

ais523

I use this function that I wrote for inputting strings. It's meant to
return a pointer to mallocated memory holding one input string, or 0 on
error. (Personally, I prefer to use 0 to NULL when returning null
pointers.) It looks pretty watertight to me, but my version of lint
complains about use of deallocated pointers, etc. Is this code
completely safe on all input, or have I missed something?

/* Header files included in the program containing malloc_getstr */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

/* This function gets a string of any length from stdin,
mallocating an appropriate amount of memory. It returns
0 on string length overflow or out-of-memory. Note that
slightly more space is allocated than neccesary (from
1 to 10 characters). */
char* malloc_getstr(void)
{
size_t s=0;
char* c;
char* t;
c=malloc(s+10);
if(!c)
{ /* out of memory */
return 0;
}
for(;;)
{
fgets(c+s,10,stdin);
if(strchr(c+s,'\n')) break; /* \n lets us know if the input has
ended. Check only the most recent
input. */
if(s>SIZE_MAX-9)
{ /* new string length too long to pass as realloc's argument */
free(c);
return 0; /* error return value */
}
s+=9;
t=realloc(c,s+10);
if(!t)
{ /* out of memory */
free(c);
return 0; /* error return value */
}
c=t;
}
*strchr(c+s,'\n')=0; /* replace the \n with a \0 */
return c;
}
 
E

Eric Sosman

ais523 said:
I use this function that I wrote for inputting strings. It's meant to
return a pointer to mallocated memory holding one input string, or 0 on
error. (Personally, I prefer to use 0 to NULL when returning null
pointers.) It looks pretty watertight to me, but my version of lint
complains about use of deallocated pointers, etc. Is this code
completely safe on all input, or have I missed something?

You've forgotten about end-of-file (and I/O errors).
/* Header files included in the program containing malloc_getstr */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

/* This function gets a string of any length from stdin,
mallocating an appropriate amount of memory. It returns
0 on string length overflow or out-of-memory. Note that
slightly more space is allocated than neccesary (from
1 to 10 characters). */
char* malloc_getstr(void)
{
size_t s=0;
char* c;
char* t;
c=malloc(s+10);
if(!c)
{ /* out of memory */
return 0;
}
for(;;)
{
fgets(c+s,10,stdin);
if(strchr(c+s,'\n')) break; /* \n lets us know if the input has
ended. Check only the most recent
input. */

... except that if fgets() detected end-of-file or error,
the contents of the buffer are indeterminate (for different
reasons in the two cases, but indeterminate anyhow). The
buffer might not contain a valid string, so the behavior of
strchr() is undefined.
if(s>SIZE_MAX-9)
{ /* new string length too long to pass as realloc's argument */

Looks like the test is wrong: the argument you're about
to pass to realloc will be s+19, not s+9. Another point of
view is to say that the test is right (approximately), but
in the wrong place: since you've already got s+10 characters
in the buffer, the test as it stands is too late to avoid
trouble. If you want to make the test, I'd suggest making it
right before allocating the memory.
free(c);
return 0; /* error return value */
}
s+=9;
t=realloc(c,s+10);
if(!t)
{ /* out of memory */
free(c);
return 0; /* error return value */
}
c=t;
}
*strchr(c+s,'\n')=0; /* replace the \n with a \0 */
return c;
}

General impression: Salvageable.

A few observations:

- Everybody writes a "better fgets()" eventually, and my
first used a strategy not unlike yours: call fgets(),
check for '\n', grow the buffer, lather, rinse, repeat.
However, I found that the function became much simpler
when I discarded fgets()/strchr() in favor of getc()
and testing each character as it arrived.

- The magic number 10 appears in five places in the code
(sometimes disguised as 9). I'm not as rabid as those
who insist on using #define for every number other than
0, 1, and perhaps 2, but when an essentially arbitrary
value shows up five times ...

- The test against SIZE_MAX shows praiseworthy diligence
(I failed to include a similar test in my own function,
so my hat's off to you.) The test will probably never
ever fire, but "better safe than sorry."

- Instead of limiting the function's usefulness by hard-
wiring stdin as the data source, why not accept a FILE*
function argument?

- Figuring out a good growth pattern for the buffer is a
matter of guesswork, and there's no one "best" way.
However, you might want to consider growing the buffer
more aggressively as the line gets longer and longer,
the idea being that a line that's already proved to be
at least 200 characters long may be more likely to grow
to 300 than to stop short of 210. (Of course, this
requires you to manage the 10/9 stuff differently.)

- Similarly with the size of the initial allocation: it
seems you're expecting very short input lines most of
the time. I'd suggest a larger starting size, not only
to avoid the work of copying the first part of the line
over and over and over again, but also so realloc() and
its friends won't have to deal with huge numbers of
itty-bitty memory fragments.
 
J

Josh Sebastian

In the time between my starting to write this and now, Eric Sosman has
already provided a response which covered most of what I wrote. But in
true "Me too" fashion, I shall post it anyway.
I use this function that I wrote for inputting strings. It's meant to
return a pointer to mallocated memory holding one input string, or 0 on
error. (Personally, I prefer to use 0 to NULL when returning null
pointers.)

Me too.
It looks pretty watertight to me, but my version of lint
complains about use of deallocated pointers, etc.

In my experience, lint gets confused if you do anything much more
complicated than a simple

x = malloc(...)
use_x();
free(x);
Is this code
completely safe on all input, or have I missed something?

You've missed something. See my comment after your call to fgets.
/* Header files included in the program containing malloc_getstr */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

/* This function gets a string of any length from stdin,

Why not pass the FILE* in as a parameter? That would allow you to use
it in a more general case and you could still provide a wrapper to make
using it for console input convenient.
mallocating an appropriate amount of memory. It returns
0 on string length overflow or out-of-memory. Note that
slightly more space is allocated than neccesary (from
1 to 10 characters). */
char* malloc_getstr(void)
{
size_t s=0;
char* c;
char* t;

Use variable names with more than a single character! Maybe

char* line;
char* alloc;
size_t len;
c=malloc(s+10);
if(!c)

You checked the return value of malloc. Good!
{ /* out of memory */
return 0;
}
for(;;)
{
fgets(c+s,10,stdin);

But you forgot to check the return value of fgets! If fgets fails
before any characters are read (say, stdin is already at EOF), then the
string doesn't get null-terminated. And the next thing you do is call
strchr, which assumes a null-terminated string.

If fgets succeeds the first time, you'll be fine even if it fails
thereafter because you don't overwrite the '\0' from the previous
iteration. But you still have that possibility of failure.

My safe line input function uses getc to read one character at a time
instead of mucking with fgets and its line terminators. I find it
easier, and though I haven't tested, I imagine that not having to scan
the characters again (which your strchr does below) makes up for any
lost efficiency. Heck, since getc's a macro, it might even be more
efficient. I've never had occasion to care, though.
if(strchr(c+s,'\n')) break; /* \n lets us know if the input has
ended. Check only the most recent
input. */

What if input ends (hard EOF) before a \n is read? Since you don't
check the return value of fgets, right now you have an infinite loop.
If you do check the return value of fgets but leave the bug here,
you'll incorrectly report an error condition to the caller (and forget
all the data along the way -- more about that later).
if(s>SIZE_MAX-9)

Why all these magic numbers? Have a #define EXPAND_SIZE 10, and then
that 9 here is really just (EXPAND_SIZE - 1). Also, I have found that
it's better to expand by a factor (eg, like start at size 10, and
double when you run out of space).
{ /* new string length too long to pass as realloc's argument */
free(c);

Hmm... so now you've removed a whole bunch of characters from the input
stream and failed to tell the caller any information about them. You
didn't even say how many you removed! I think that's a bad idea.
return 0; /* error return value */

I think you're trying to multiplex too much information into your
single output interface (the return value). Right now, it's doing
double-duty as the pointer to the dynamically-allocated string and an
error indicator. This is too restrictive. One alternative is:

int malloc_getstr(char** line)

Sometimes changing to a parameter makes things inconvenient for the
caller, since they can't just do something like

puts(malloc_getstr());

anymore, but in this case that would have been a memory leak anyway, so
you don't particularly want to encourage that. So the tradeoff here is
really going from

char* line;
if(line = malloc_getstr()) {
puts(line);
free(line);
}

to

char* line;
if(0 == malloc_getstr(&line)) {
puts(line);
free(line);
}

You also might want to consider allowing the user to pass a pointer to
size_t as a parmeter which, if not NULL, the object it points to gets
the number of characters read. After all, in the function you know this
already, and it's inconvenient to the tune of O(n) for the caller to
determine.
}
s+=9;
t=realloc(c,s+10);

Just a style issue: I would prefer to start c as NULL (or 0, if you
prefer) and then use realloc instead of malloc as my allocation
function, plus arrange my code in such a way that it appears only once
(albeit in a loop).
if(!t)
{ /* out of memory */
free(c);

Here you go again, forgetting things that might be useful to the
caller. If your user goes through the trouble to type a bazillion
characters as input, do you think he'll appreciate you just forgetting
it all because of a system limitation?
return 0; /* error return value */
}
c=t;
}
*strchr(c+s,'\n')=0; /* replace the \n with a \0 */

This call is suspect. Right now, the only exit point from the loop
(without returning) is when you find a '\n', so it's safe -- but that
might (should) change.
return c;
}

CBFalconer has posted code to his "ggets" function (search the
archives), which is an excellent implementation of the kind of thing
you're trying to do. I have my own which behaves a little differently,
but his is already out there and probably much more peer-reviewed than
mine. Even if you stick with a home-grown one, I suggest you study his.

Josh
 
C

CBFalconer

Josh said:
.... snip ...

CBFalconer has posted code to his "ggets" function (search the
archives), which is an excellent implementation of the kind of thing
you're trying to do. I have my own which behaves a little differently,
but his is already out there and probably much more peer-reviewed than
mine. Even if you stick with a home-grown one, I suggest you study his.

Available at <http://cbfalconer.home.att.net/download/ggets.zip>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
K

Keith Thompson

Fred Kleinschmidt said:
Falconer's ggets could be improved slightly:

Its signature is
int fggets(char* *ln, FILE *f);

The first thing it should do is check to ensure that ln!=NULL and f!=NULL

Why should it do that? Just don't call it with null pointers.
 
W

websnarf

Fred said:
Falconer's ggets could be improved slightly:

Its signature is
int fggets(char* *ln, FILE *f);

The first thing it should do is check to ensure that ln!=NULL and f!=NULL

Yeah, except that he's following the ANSI C committee's lead in
suggesting that a language and its libraries should not actually assist
with programmer safety. This "peer reviewed" function has other design
problems however:

1) The function will always fail if it is unable to allocate 112 bytes
(even if the input size is smaller than that.)

2) The (linear) realloc strategy will cause real world heaps to
fragment like crazy on very long inputs (like the kinds you see in
typical weblogs from attempted virus attacks against IIS.) Since he
calls realloc O(n) times, and realloc itself is an O(n) function (think
about it) that means the function is O(n^2).

3) The external function fggets returns with enum values that are only
declared locally in the file. Its done in a well defined way, but its
really kind of annoying. I mean, why not just declare all the return
values and throw them in the .h file? If its a namespace issue, there
is no reason not to pick hard coded values, since there are so few
output values. (There is also the 3-valued output for *ln (filled in,
NULL, or not touched) that can be used to represent various errors.)

4) The function doesn't have a consistent output on out-of-memory
conditions.

5) The function inherits all the other problems of fgets(), namely that
it cannot support binary files (since it ignores '\0's that are read
from stdin) and although the total length of the input is never more
than O(1) away from the inner loop, this information is not computed
and is lost by the time the function returns (which usually means that
one way or another, you will end up calling strlen or something like it
one additional time for that string, after it has been output.)

You can find a somewhat easier to understand, and better documented
dynamic string input function that doesn't have these sorts of problems
here:

http://www.pobox.com/~qed/userInput.html
 
M

Michael Mair

Yeah, except that he's following the ANSI C committee's lead in
suggesting that a language and its libraries should not actually assist
with programmer safety.

True enough; it is IMO still a good source for seeing how input
can be done yourself. If someone understands it, moving on to
more efficient or more secure versions is easier.
This "peer reviewed" function has other design
problems however:

1) The function will always fail if it is unable to allocate 112 bytes
(even if the input size is smaller than that.)

2) The (linear) realloc strategy will cause real world heaps to
fragment like crazy on very long inputs (like the kinds you see in
typical weblogs from attempted virus attacks against IIS.) Since he
calls realloc O(n) times, and realloc itself is an O(n) function (think
about it) that means the function is O(n^2).

3) The external function fggets returns with enum values that are only
declared locally in the file. Its done in a well defined way, but its
really kind of annoying. I mean, why not just declare all the return
values and throw them in the .h file? If its a namespace issue, there
is no reason not to pick hard coded values, since there are so few
output values. (There is also the 3-valued output for *ln (filled in,
NULL, or not touched) that can be used to represent various errors.)

4) The function doesn't have a consistent output on out-of-memory
conditions.

5) The function inherits all the other problems of fgets(), namely that
it cannot support binary files (since it ignores '\0's that are read
from stdin) and although the total length of the input is never more
than O(1) away from the inner loop, this information is not computed
and is lost by the time the function returns (which usually means that
one way or another, you will end up calling strlen or something like it
one additional time for that string, after it has been output.)

You can find a somewhat easier to understand, and better documented
dynamic string input function that doesn't have these sorts of problems
here:

http://www.pobox.com/~qed/userInput.html

Nicely explained although I do not like the belligerent tone.
Will definitely go into my list of resources to point newbies to.

BTW: Finding a line like
#define system(s) #error "....."
on the page does not exactly instill awe.

Cheers
Michael
 
M

Mark McIntyre

Yeah, except that he's following the ANSI C committee's lead in
suggesting that a language and its libraries should not actually assist
with programmer safety.

If you'd bothered to say that differently, I might have agreed with
you. As it is, you merely come across as a bitter person who dislikes
C and has some axe to grind against the committee. Perhaps you wanted
to attend but weren't allowed to, or the majority disagreed with your
ideas.
5) The function inherits all the other problems of fgets(), namely that
it cannot support binary files

You seem to have a misunderstanding about fgets. That isn't a
'problem' with fgets, its not supposed to handle binary files.
and is lost by the time the function returns (which usually means that
one way or another, you will end up calling strlen or something like it
one additional time for that string, after it has been output.)

This also isn't a 'problem', its simply not intended to calculate it.

Perhaps you should consider a less agressive approach to life?
Mark McIntyre
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,135
Latest member
VeronaShap
Top