Critic of the given code.

S

santosh

I've written the following function to return a string of arbitrary
length from stdin. So far, it seems to work as it should. Are there any
unportable assumptions and/or logical errors in the code? Would it be
better to return the length of the string? Or an integer value
indicating success or failure, (the type of failure too)?

Thanks.

#include <stdio.h>
#include <stdlib.h>

#define IBUFSIZ 32
#define BUF_DELTA IBUFSIZ
#define BUF_INP_DIFF 3
#define BUF_CROP_THRESHOLD (BUF_INP_DIFF + 1)

char *getl_stdin(void);

char *getl_stdin(void) {
char *inp_p = NULL, *inp_p_cpy;
int nc, buffer_exists = 0, no_buf_crop = 0;
size_t curr_bufsiz = 0, curr_inpsiz = 0;

while((nc = fgetc(stdin)) != EOF) {
if((curr_bufsiz - curr_inpsiz) < BUF_INP_DIFF) {
curr_bufsiz += BUF_DELTA;
if((inp_p = realloc(inp_p, curr_bufsiz)) == NULL) {
if(buffer_exists) {
inp_p = inp_p_cpy;
inp_p[curr_inpsiz] = nc;
inp_p[curr_inpsiz + 1] = '\0';
++curr_inpsiz;
curr_bufsiz -= BUF_DELTA;
no_buf_crop = 1;
break;
}
else
break;
}
else {
inp_p_cpy = inp_p;
inp_p[curr_inpsiz] = nc;
++curr_inpsiz;
if(!buffer_exists)
buffer_exists = 1;
else {
if(nc == '\n') {
inp_p[curr_inpsiz] = '\0';
++curr_inpsiz;
break;
}
}
}
}
else {
inp_p[curr_inpsiz] = nc;
++curr_inpsiz;
if(nc == '\n') {
inp_p[curr_inpsiz] = '\0';
++curr_inpsiz;
break;
}
}
}

if(inp_p != NULL && !no_buf_crop) {
if((curr_bufsiz - curr_inpsiz) >= BUF_CROP_THRESHOLD) {
if((inp_p = realloc(inp_p, curr_inpsiz)) == NULL)
inp_p = inp_p_cpy;
else {
inp_p_cpy = inp_p;
curr_bufsiz = curr_inpsiz;
}
}
}

return inp_p;
}

int main(void) {
char *input = NULL;
int main_RV = 0;

puts("Enter a line of text:");
input = getl_stdin();
if(input == NULL) {
puts("getl_stdin() returned NULL.");
main_RV = EXIT_FAILURE;
}
else
printf("You wrote:\n%s\n", input);

return main_RV;
}
 
R

Richard Heathfield

santosh said:
I've written the following function to return a string of arbitrary
length from stdin. So far, it seems to work as it should. Are there any
unportable assumptions

Yes - tabs in your code! Convert to spaces when posting to Usenet, or some
newsreaders will strip out your indentation (as mine did).
if((curr_bufsiz - curr_inpsiz) < BUF_INP_DIFF) {
curr_bufsiz += BUF_DELTA;

Bad idea. It means you have to keep going back to the system for memory over
and over. Better: curr_bufsiz *= SCALING_FACTOR - many people suggest a
factor of 2, although I prefer newsize = 3 * oldsize / 2;
if((inp_p = realloc(inp_p, curr_bufsiz)) == NULL) {

Bad idea. If the realloc fails, you just overwrote your only pointer to the
original data. Use a temp, and copy it over if it's not NULL.

I didn't check - how forgiving is your routine of hitting end-of-file in the
middle of a line (i.e. not immediately after a '\n')?

You might want to fix it to take a FILE * as an arg, rather than limiting it
to stdin.

You've basically re-invented Chuck's ggets() function, by the way - and your
function suffers from the same problem as his, that it has no way to re-use
an existing buffer - the caller is obliged to allocate fresh space for
every line he reads. (My own similar routine allows the buffer to be
re-used if that's what the programmer desires.)
 
C

Chris Dollin

santosh said:
I've written the following function to return a string of arbitrary
length from stdin. So far, it seems to work as it should. Are there any
unportable assumptions and/or logical errors in the code? Would it be
better to return the length of the string? Or an integer value
indicating success or failure, (the type of failure too)?

It looks hellishly complicated, and I can't easily tell what the contract is.

I'd guess that you could do it in about half the code, at the cost of
improving its readability.
 
S

santosh

Richard said:
santosh said:


Yes - tabs in your code! Convert to spaces when posting to Usenet, or some
newsreaders will strip out your indentation (as mine did).

Okay, though I use vim, I haven't really learnt all it's options.
Bad idea. It means you have to keep going back to the system for memory over
and over. Better: curr_bufsiz *= SCALING_FACTOR - many people suggest a
factor of 2, although I prefer newsize = 3 * oldsize / 2;

You're method allocates increasingly more memory, relative to the
previous increase, as input continues.
Bad idea. If the realloc fails, you just overwrote your only pointer to the
original data. Use a temp, and copy it over if it's not NULL.

I do copy the return value of realloc(), if successful into a reserve
pointer. Only for the very first allocation, this not done,
(obviously).
I didn't check - how forgiving is your routine of hitting end-of-file in the
middle of a line (i.e. not immediately after a '\n')?

If it hits EOF before NL, the function appends a null character and
returns the partial input, if it exists in the first place.
You might want to fix it to take a FILE * as an arg, rather than limiting it
to stdin.

Yes, I thought of that, but considered writing another function for
that.
You've basically re-invented Chuck's ggets() function, by the way - and your
function suffers from the same problem as his, that it has no way to re-use
an existing buffer - the caller is obliged to allocate fresh space for
every line he reads. (My own similar routine allows the buffer to be
re-used if that's what the programmer desires.)

Yes, I should have made it more flexible, including arguments which
indicate what to do upon different kinds of runtime failures.

I considered returning a simple 1 for success or 0 for failure. If the
latter, then set errno to the specific kind of error, i.e., malloc()
failure, realloc() failure, EOF etc.
Are user functions allowed to modify errno? And what range of integer
values for errno would not conflict with system defined standard
values?

Thanks for the input.
 
R

Richard Heathfield

santosh said:
You're method allocates increasingly more memory, relative to the
previous increase, as input continues.

Correct - but it means far fewer visits to the well, and you can always
realloc down to the exact amount, at the end.
I do copy the return value of realloc(), if successful into a reserve
pointer.

Actually, if the very first allocation fails, then at the point inp_p_cpy is
used, it has not yet been given a value, so the value is indeterminate and
the result undefined.
If it hits EOF before NL, the function appends a null character and
returns the partial input, if it exists in the first place.

Good. That's the right way, I think.
Are user functions allowed to modify errno?
Yes.

And what range of integer
values for errno would not conflict with system defined standard
values?

Going negative should be safe enough as long as you don't call strerror or
perror! Personally, I stay well away from errno.
Thanks for the input.

You're welcome.
 
S

santosh

Chris said:
It looks hellishly complicated, and I can't easily tell what the contract is.

It will return a pointer to a block of memory containing a string from
stdin.
If initial allocation fails a null pointer is returned.
If reallocation fails, the pre-existing partial input, if any, is
returned.
If end-of-file is hit, again, partial input, if any, is returned.
 
R

Richard Heathfield

Chris Dollin said:
It looks hellishly complicated, and I can't easily tell what the contract
is.

I'd guess that you could do it in about half the code, at the cost of
improving its readability.

Some cost! :)

The code I use nowadays to do this is just 23 lines long, and the core loop
is trivial. (In case you're wondering, the xxxx just means I haven't yet
decided on the prefix I'm using for the library.)

int xxxx_str_getline(xxxx_str *s, FILE *fp)
{
int rc = xxxx_ERROR_OK;
int ch = 0;
size_t count = 0;
s->nobj = 0;

while(xxxx_ERROR_OK == rc && (ch = getc(fp)) != '\n' && ch != EOF)
{
unsigned char character = ch;
rc = xxxx_arr_write(s, &character, count++, 1);
}
if(xxxx_ERROR_INSUFFICIENT_MEMORY != rc)
{
unsigned char nul = 0;
rc = xxxx_arr_write(s, &nul, count, 1);
}
if(xxxx_ERROR_OK == rc && ch == EOF && count == 0)
{
rc = xxxx_ERROR_EOF;
}
return rc;
}
 
C

CBFalconer

santosh said:
I've written the following function to return a string of arbitrary
length from stdin. So far, it seems to work as it should. Are there
any unportable assumptions and/or logical errors in the code? Would
it be better to return the length of the string? Or an integer value
indicating success or failure, (the type of failure too)?

.... snip code ...

The failure to return error indicators is fatal, IMO. Take a look
at my ggets function, at:

<http://cbfalconer.home.att.net/download/ggets.zip>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
C

CBFalconer

Richard said:
.... snip ...

You've basically re-invented Chuck's ggets() function, by the way
- and your function suffers from the same problem as his, that it
has no way to re-use an existing buffer - the caller is obliged to
allocate fresh space for every line he reads. (My own similar
routine allows the buffer to be re-used if that's what the
programmer desires.)

You mis-interpret my intent. ggets returns a fresh buffer for
every call. Thus there is no doubt about what to do with that
buffer, it *always* eventually needs freeing. At the same time
there is no problem simply storing the returned char* pointer
somewhere for later use.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
O

Old Wolf

santosh said:
Okay, though I use vim, I haven't really learnt all it's options.

Understandable, if you subscribe to the theory that the
human brain has finite storage capacity :)
 
F

Frank Silvermann

CBFalconer said:
Richard Heathfield wrote:
... snip ...

You mis-interpret my intent. ggets returns a fresh buffer for
every call. Thus there is no doubt about what to do with that
buffer, it *always* eventually needs freeing. At the same time
there is no problem simply storing the returned char* pointer
somewhere for later use.
Can you please post a link for the ggets() source. I fear that I am not
only posting to usenet but e-mailing cbfalconer at the same time. frank
 
R

Richard Heathfield

CBFalconer said:
You mis-interpret my intent. ggets returns a fresh buffer for
every call. Thus there is no doubt about what to do with that
buffer, it *always* eventually needs freeing.

Sure. So do all dynamically allocated buffers. But yours can't be
effectively re-used in the meantime. You need one allocation per line, and
there might be millions of lines.

It's not a misinterpretation of your intent. It's a criticism of your design
goals. :)
 
C

Chris Dollin

santosh said:
It will return a pointer to a block of memory containing a string from
stdin.
If initial allocation fails a null pointer is returned.
If reallocation fails, the pre-existing partial input, if any, is
returned.
If end-of-file is hit, again, partial input, if any, is returned.

You haven't said what happens to the newlines. (Do they stay in, or do
they get trimmed out.)

I'm uncomfortable with the description because `partial input` is returned
for two distinct cases - EOF and reallocation failure - and also
because the same problem - reallocation failure - is treated in two
separate ways.

I'm not sure how much of the contract is because the code turned out
that way or because the contract was designed that way. I do like that
something is returned if it's possible to return it, though.

[And I still think it could be done in half the code. But it's Friday.]
 
S

santosh

Chris said:
You haven't said what happens to the newlines. (Do they stay in, or do
they get trimmed out.)

I'm leaving then untouched. What's customary for such functions? The
standard C library seems undecided on this issue. fgets() retains
newlines while gets() doesn't.
I'm uncomfortable with the description because `partial input` is returned
for two distinct cases - EOF and reallocation failure

Both situations leave me with some input but no way to continue. The
question is whether to return this input, and let the caller deal with
it, or return EOF/NULL.
- and also because the same problem - reallocation failure - is treated in two
separate ways.

That is because failure of the first reallocation is simple while
failure of subsequent reallocations result in some residual, incomplete
input. If I had decided not to return incomplete input, I would've
dealt with both types of realloc() failures in the same manner.
I'm not sure how much of the contract is because the code turned out
that way or because the contract was designed that way.

Actually I decided, (on paper), what the function would do for each
eventuality, before writing the code. I just don't know if my choices
are correct or contrary to what better C programmers would do.
I do like that
something is returned if it's possible to return it, though.

[And I still think it could be done in half the code. But it's Friday.]

I doubt if I could do this function in significantly less code even on
a Wednesday. Programming is not a profession, but rather, a hobby to
me, and I confess, I've been putting off learning the fundamentals for
an inexcusable amount of time.
 
C

CBFalconer

Frank said:
Can you please post a link for the ggets() source. I fear that I
am not only posting to usenet but e-mailing cbfalconer at the same
time. frank

Don't do that (e-mailing at the same time). It is an annoyance.

<http://cbfalconer.home.att.net/download/ggets.zip>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
C

CBFalconer

Richard said:
CBFalconer said:

Sure. So do all dynamically allocated buffers. But yours can't be
effectively re-used in the meantime. You need one allocation per
line, and there might be millions of lines.

It's not a misinterpretation of your intent. It's a criticism of
your design goals. :)

So let's consider those. My objective was the simplicity of gets
with absolute safety. That means the minimum of parameters. I
wanted re-entrancy, or at least as much as the file system allows,
so that eliminates static storage. I wanted to return error
conditions, so that meant at least one parameter (two if the file
needs to be specified). I wanted complete consistency in the
usage, thus no possible added \n, as in fgets.

Now you may complain that the malloc/realloc/free suite has to be
used for each input line, creating overhead. My response is that
any such overhead is negligible compared with the actual external
file reading overhead.

Finally, the independently allocated memory per line can be very
useful, especially when the end objective is to store the line
itself in some form of data base.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
C

CBFalconer

santosh said:
Chris Dollin wrote:
.... snip ...


I'm leaving then untouched. What's customary for such functions?
The standard C library seems undecided on this issue. fgets()
retains newlines while gets() doesn't.

So did you ever consider why those routines do that? Apart from
safety, what is the fundamental difference between ggets and
fgets? Custom should affect such things as the order of
parameters, not the fundamental action.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
I

Ian Collins

santosh said:
I've written the following function to return a string of arbitrary
length from stdin. So far, it seems to work as it should. Are there any
unportable assumptions and/or logical errors in the code? Would it be
better to return the length of the string? Or an integer value
indicating success or failure, (the type of failure too)?

Thanks.

#include <stdio.h>
#include <stdlib.h>

#define IBUFSIZ 32
#define BUF_DELTA IBUFSIZ
#define BUF_INP_DIFF 3
#define BUF_CROP_THRESHOLD (BUF_INP_DIFF + 1)
Why not make these const unsigned and put them in the function, rather
than use macros? Puts the definition closer to the use and if you do
happen to be stepping though with a debugger, it'll show the the value.
char *getl_stdin(void);

char *getl_stdin(void) {
char *inp_p = NULL, *inp_p_cpy;
int nc, buffer_exists = 0, no_buf_crop = 0;
size_t curr_bufsiz = 0, curr_inpsiz = 0;
I'd put these one per line.

If you break out the guts of each if and else into a well named
function, the whole thing becomes easier to read.

Are there any unit tests?
 
R

Richard Heathfield

CBFalconer said:
So let's consider those.
Okay.

My objective was the simplicity of gets with absolute safety.

Part of the simplicity of gets is that you can re-use the same array over
and over again. I accept that, in order to provide safety, you have to
dynallocate the memory, so that's going to involve free()-ing the buffer;
it's an unavoidable complication.
That means the minimum of parameters.

Okay, so it's a trade-off between - er - ease of use and ease of use! Fair
enough - I'll stop beating on you. But have you considered writing a more
complicated, parameter-infested version (with a different name), a version
that allows buffer re-use - and then letting the user choose between the
two?
 
C

CBFalconer

.... snip about ggets and philosophies ...
Okay, so it's a trade-off between - er - ease of use and ease of
use! Fair enough - I'll stop beating on you. But have you
considered writing a more complicated, parameter-infested version
(with a different name), a version that allows buffer re-use -
and then letting the user choose between the two?

But you've already done that! You just don't publicize it as much.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top