Reading a line of text from a stream

Discussion in 'C Programming' started by Nelu, Mar 5, 2008.

  1. Nelu

    Nelu Guest

    I would appreciate some comments on the code below for reading a line of
    text from a stream.


    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /**
    * Reads a line of text from a stream and returns the string on
    * success and NULL on failure.
    * fstream - the stream to read from
    * maxRead - a variable that specifies the maximum number of
    * characters to read. If maxRead is NULL then SIZE_MAX-1 is
    * considered as the maximum value. If maxRead is different from NULL
    * then the function will set the value to the length of non-NULL
    * string returned by the function.
    * fullLine - a variable that specifies whether or not the end of line
    * has been reached (a complete line was read). It also returns true if
    EOF
    * has been reached. It can be set to NULL.
    */
    char *readStringWithLimit(FILE *fstream, size_t *maxRead,
    int *fullLine) {
    size_t capacityIncrement=256;
    size_t capacity=capacityIncrement;
    size_t readLimit;
    size_t stringLength=0;
    char *myStr, *tmpStr;
    int localFullLine;
    unsigned char c;
    int ic;

    if(maxRead==NULL) {
    readLimit=SIZE_MAX-1;
    } else {
    if((*maxRead)>SIZE_MAX-1) {
    readLimit=SIZE_MAX-1;
    } else {
    readLimit=*maxRead-1;
    }
    }

    myStr=malloc(capacity);
    if(!myStr) {
    return NULL;
    }
    while(1) {
    ic=fgetc(fstream);
    if(ic==EOF) {
    if(ferror(fstream)) {
    free(myStr);
    return NULL;
    } else {
    localFullLine=1;
    break;
    }
    } else {
    c=(unsigned char)ic;
    if(c=='\n') {
    localFullLine=1;
    break;
    } else {
    if(readLimit>=stringLength) {
    if(stringLength==capacity) {
    //calculate the necessary size
    if(SIZE_MAX-capacityIncrement>capacity) {
    capacity+=capacityIncrement;
    } else {
    capacity=SIZE_MAX;
    }
    tmpStr=realloc(myStr,capacity);
    if(tmpStr) {
    myStr=tmpStr;
    } else {
    free(myStr);
    return NULL;
    }
    }
    myStr[stringLength++]=c;
    } else {
    ungetc(ic,fstream);
    localFullLine=0;
    break;
    }
    }
    }
    }
    if(stringLength==capacity) {
    tmpStr=realloc(myStr,capacity+1);
    if(tmpStr) {
    myStr=tmpStr;
    } else {
    free(myStr);
    return NULL;
    }
    }
    myStr[stringLength]='\0';
    if(maxRead) {
    *maxRead=stringLength;
    }
    if(fullLine) {
    *fullLine=localFullLine;
    }
    return myStr;
    }


    Thank you.

    --
    Ioan - Ciprian Tandau
    tandau _at_ freeshell _dot_ org (hope it's not too late)
    (... and that it still works...)
    Nelu, Mar 5, 2008
    #1
    1. Advertising

  2. Nelu

    Micah Cowan Guest

    Nelu <> writes:

    > I would appreciate some comments on the code below for reading a line of
    > text from a stream.
    >
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    > #include <stdint.h>
    >
    > /**
    > * Reads a line of text from a stream and returns the string on
    > * success and NULL on failure.
    > * fstream - the stream to read from
    > * maxRead - a variable that specifies the maximum number of
    > * characters to read. If maxRead is NULL then SIZE_MAX-1 is
    > * considered as the maximum value. If maxRead is different from NULL
    > * then the function will set the value to the length of non-NULL
    > * string returned by the function.


    You've accidentally left out documentation that if maxRead is not
    NULL, it will be used to set an upper limit on the line read.

    The documentation above is also a bit hard to read; it'd be nice if it
    were easier to find the start of each parameter description. Perhaps
    by indenting continuation lines:

    * maxRead - a variable that specifies the maximum number of
    * characters to read. If maxRead is NULL then SIZE_MAX-1
    ...

    You've also neglected to explicitly specify that the terminating "\n"
    will not be included in the string.

    IME documentation tends to get out-of-sync with the code pretty
    quickly, even when it doesn't already start out that way. My
    preference is to document it in the form of readable test cases that
    demonstrate the supported use cases (if the function itself can't be
    made so readable as to make its usage obvious). Of course, I realize
    that various shops have various documentation requirements, let alone
    individual preferences.

    It's a personal preference thing, but I dislike that you're
    overloading maxRead's meaning to both specify an optional maximum
    _and_ receive the size of the string. It makes things cumbersome for
    callers that may desire one function or the other, but not both. The
    one who merely wishes to specify an upper length may have to copy
    their value into a "receiving" object, passing its pointer; the one
    who wants the length but not an upper length either has to pass a var
    preset to SIZE_MAX-1, or to call strlen() after.

    > * fullLine - a variable that specifies whether or not the end of line
    > * has been reached (a complete line was read). It also returns true if
    > EOF
    > * has been reached. It can be set to NULL.
    > */
    > char *readStringWithLimit(FILE *fstream, size_t *maxRead,
    > int *fullLine) {
    > size_t capacityIncrement=256;
    > size_t capacity=capacityIncrement;
    > size_t readLimit;
    > size_t stringLength=0;
    > char *myStr, *tmpStr;
    > int localFullLine;
    > unsigned char c;
    > int ic;
    >
    > if(maxRead==NULL) {
    > readLimit=SIZE_MAX-1;
    > } else {
    > if((*maxRead)>SIZE_MAX-1) {
    > readLimit=SIZE_MAX-1;
    > } else {
    > readLimit=*maxRead-1;
    > }
    > }
    >
    > myStr=malloc(capacity);
    > if(!myStr) {
    > return NULL;
    > }
    > while(1) {
    > ic=fgetc(fstream);


    Using fgets() is another option, and _might_ be faster; though it would
    require a call to strlen() to find the newline.

    > if(ic==EOF) {
    > if(ferror(fstream)) {
    > free(myStr);
    > return NULL;
    > } else {
    > localFullLine=1;
    > break;
    > }
    > } else {
    > c=(unsigned char)ic;
    > if(c=='\n') {
    > localFullLine=1;
    > break;
    > } else {
    > if(readLimit>=stringLength) {
    > if(stringLength==capacity) {

    ^^ see below
    > //calculate the necessary size
    > if(SIZE_MAX-capacityIncrement>capacity) {
    > capacity+=capacityIncrement;
    > } else {
    > capacity=SIZE_MAX;
    > }
    > tmpStr=realloc(myStr,capacity);
    > if(tmpStr) {
    > myStr=tmpStr;
    > } else {
    > free(myStr);
    > return NULL;
    > }
    > }
    > myStr[stringLength++]=c;
    > } else {
    > ungetc(ic,fstream);
    > localFullLine=0;
    > break;
    > }
    > }
    > }
    > }
    > if(stringLength==capacity) {
    > tmpStr=realloc(myStr,capacity+1);
    > if(tmpStr) {
    > myStr=tmpStr;
    > } else {
    > free(myStr);
    > return NULL;
    > }
    > }


    The above check and potential realloc() could have been avoided, if
    you instead checked for stringLength+1==capacity at the previous
    if statement that I marked.

    > myStr[stringLength]='\0';
    > if(maxRead) {
    > *maxRead=stringLength;
    > }
    > if(fullLine) {
    > *fullLine=localFullLine;
    > }
    > return myStr;
    > }


    This design suffers from a couple of difficiencies (IMO):
    - It is not possible to distinguish between an empty line and EOF.
    - It is not possible to distinguish between I/O failures and
    allocation failures (both return NULL).

    The first one seems like the larger problem. The caller can call
    feof(), to be sure, but it seems like poor design to require it.

    --
    Micah J. Cowan
    Programmer, musician, typesetting enthusiast, gamer...
    http://micah.cowan.name/
    Micah Cowan, Mar 5, 2008
    #2
    1. Advertising

  3. Nelu

    Nelu Guest

    On Wed, 05 Mar 2008 12:07:46 -0800, Micah Cowan wrote:

    > Nelu <> writes:
    >
    >> I would appreciate some comments on the code below for reading a line
    >> of text from a stream.

    <snip>
    >> /**
    >> * Reads a line of text from a stream and returns the string on *
    >> success and NULL on failure.
    >> * fstream - the stream to read from
    >> * maxRead - a variable that specifies the maximum number of *
    >> characters to read. If maxRead is NULL then SIZE_MAX-1 is * considered
    >> as the maximum value. If maxRead is different from NULL * then the
    >> function will set the value to the length of non-NULL * string
    >> returned by the function.

    >
    > You've accidentally left out documentation that if maxRead is not NULL,
    > it will be used to set an upper limit on the line read.
    >
    > The documentation above is also a bit hard to read; it'd be nice if it
    > were easier to find the start of each parameter description. Perhaps by
    > indenting continuation lines:
    >
    > * maxRead - a variable that specifies the maximum number of *
    > characters to read. If maxRead is NULL then SIZE_MAX-1
    > ...
    >
    > You've also neglected to explicitly specify that the terminating "\n"
    > will not be included in the string.
    >
    > IME documentation tends to get out-of-sync with the code pretty quickly,
    > even when it doesn't already start out that way. My preference is to
    > document it in the form of readable test cases that demonstrate the
    > supported use cases (if the function itself can't be made so readable as
    > to make its usage obvious). Of course, I realize that various shops have
    > various documentation requirements, let alone individual preferences.


    The documentation was just a late addition and I didn't spend time on it.
    I was more interested in the correctness of the code.

    >
    > It's a personal preference thing, but I dislike that you're overloading
    > maxRead's meaning to both specify an optional maximum _and_ receive the
    > size of the string. It makes things cumbersome for callers that may
    > desire one function or the other, but not both. The one who merely
    > wishes to specify an upper length may have to copy their value into a
    > "receiving" object, passing its pointer; the one who wants the length
    > but not an upper length either has to pass a var preset to SIZE_MAX-1,
    > or to call strlen() after.


    Fair enough.

    <snip code>
    > The above check and potential realloc() could have been avoided, if you
    > instead checked for stringLength+1==capacity at the previous if
    > statement that I marked.


    Good point.

    >
    > This design suffers from a couple of difficiencies (IMO):
    > - It is not possible to distinguish between an empty line and EOF. -
    > It is not possible to distinguish between I/O failures and allocation
    > failures (both return NULL).
    >


    I was thinking about adding these then I thought that it would get to be
    uglier than it already is.




    Thank you.


    --
    Ioan - Ciprian Tandau
    tandau _at_ freeshell _dot_ org (hope it's not too late)
    (... and that it still works...)
    Nelu, Mar 5, 2008
    #3
  4. Nelu

    Micah Cowan Guest

    Nelu <> writes:

    >> This design suffers from a couple of difficiencies (IMO):
    >> - It is not possible to distinguish between an empty line and EOF. -
    >> It is not possible to distinguish between I/O failures and allocation
    >> failures (both return NULL).

    >
    > I was thinking about adding these then I thought that it would get to be
    > uglier than it already is.


    Writing a readable implementation is surely important, but enabling
    users that are using your function to write readable code is, IMO,
    more important (also, I don't believe that the two have to be mutually
    exclusive).

    If you were even just returning NULL for EOF instead of an empty
    string (which would require only very minor alterations), users could
    write code like:

    while ((str = readStringWithLimit(...)) != NULL) {
    /* Do stuff with str */
    free(str);
    }
    /* Possible further checking with ferror() goes here.
    Need to use ferror() and feof() to determine which of
    I/O error, EOF, and out-of-memory have occured. */

    With code like the above, they'd be able to assume that the loop won't
    break unless either they've reached EOF or else an error condition has
    occurred.

    However, as it is, they must do something like:

    while ((str = readStringWithLimit(...)) != NULL) {
    if (*str == '\0') {
    if (feof(fstream))
    break;
    /* else it was just an empty line. */
    }
    /* Do stuff with str */
    free(str);
    }
    /* Possible further checking with ferror() goes here. */

    That seems much, much more cumbersome to me.

    Note that if they use my first code snippet with your current
    implementation, then if there are no errors the loop will go on
    forever, because it will just keep returning the empty string over and
    over again after EOF is reached.

    --
    Micah J. Cowan
    Programmer, musician, typesetting enthusiast, gamer...
    http://micah.cowan.name/
    Micah Cowan, Mar 5, 2008
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. dbuser
    Replies:
    8
    Views:
    502
    Thomas J. Gritzan
    Oct 10, 2005
  2. kaushikshome
    Replies:
    4
    Views:
    753
    kaushikshome
    Sep 10, 2006
  3. xyz
    Replies:
    3
    Views:
    607
  4. curious
    Replies:
    1
    Views:
    164
    Patrick Spence
    Oct 25, 2006
  5. Replies:
    2
    Views:
    133
Loading...

Share This Page