Taking a stab at getline

Discussion in 'C Programming' started by Noob, Feb 7, 2013.

  1. Noob

    Noob Guest

    Hello group,

    Here's my attempt at writing a "get_line" implementation, which
    reads an entire line from a file stream, dynamically allocating
    the space needed to store said line.

    Since this is for my own personal use, I didn't bother handling
    out-of-memory conditions elegantly. I just kick the bucket.
    ("We all kick the bucket in the end, in the end.")

    EOF is signaled by returning NULL.

    I'd like to hear suggestions/criticism.

    #include <stdio.h>
    #include <stdlib.h>
    #define BUFLEN 4000
    char *get_line(FILE *stream)
    {
    char *s = NULL; size_t len = 0; int c;

    do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */
    if (c == EOF) return NULL;
    ungetc(c, stream);

    while ( 1 )
    {
    size_t max = len + BUFLEN;
    s = realloc(s, max);
    if (s == NULL) exit(EXIT_FAILURE);
    while (len < max)
    {
    c = fgetc(stream);
    if (c == EOF || c == '\n')
    {
    s[len] = '\0';
    return realloc(s, len+1);
    }
    s[len++] = c;
    }
    }
    }

    Regards.
    Noob, Feb 7, 2013
    #1
    1. Advertising

  2. Noob

    BartC Guest

    "Noob" <root@127.0.0.1> wrote in message news:kf07e7$v0d$...

    > I'd like to hear suggestions/criticism.


    No user docs given. As I understand how it works:

    o The function allocates a new dynamic buffer for every line read; the
    caller is responsible for freeing this.

    o The buffer allocated will always be at least 4000 bytes? (Significant if
    the caller wants to store these pointers in an array)

    o A return value of NULL means end-of-file (out-of-memory signalled by
    aborting)

    o *Blank lines are ignored*. (This is different from a normal getline();
    blank lines can be significant. However blank lines containing only white
    space *are* returned.)

    o The return value points to a null-terminated string (which always contain
    at least one character), when it's not NULL.

    > while ( 1 )
    > {
    > size_t max = len + BUFLEN;
    > s = realloc(s, max);


    > if (s == NULL) exit(EXIT_FAILURE);
    > while (len < max)
    > {
    > c = fgetc(stream);
    > if (c == EOF || c == '\n')
    > {
    > s[len] = '\0';
    > return realloc(s, len+1);


    Why the extra byte allocated here?

    --
    Bartc
    BartC, Feb 7, 2013
    #2
    1. Advertising

  3. Noob

    BartC Guest

    "BartC" <> wrote in message news:0oOQs.3694$7...
    > "Noob" <root@127.0.0.1> wrote in message
    > news:kf07e7$v0d$...
    >


    > o The buffer allocated will always be at least 4000 bytes? (Significant if
    > the caller wants to store these pointers in an array)


    >> return realloc(s, len+1);

    >
    > Why the extra byte allocated here?


    I got it just as I pressed Send! (I rarely use realloc.)

    --
    Bartc
    BartC, Feb 7, 2013
    #3
  4. Noob

    Eric Sosman Guest

    On 2/7/2013 7:42 AM, Noob wrote:
    > Hello group,
    >
    > Here's my attempt at writing a "get_line" implementation, which
    > reads an entire line from a file stream, dynamically allocating
    > the space needed to store said line.
    >
    > Since this is for my own personal use, I didn't bother handling
    > out-of-memory conditions elegantly. I just kick the bucket.
    > ("We all kick the bucket in the end, in the end.")
    >
    > EOF is signaled by returning NULL.
    >
    > I'd like to hear suggestions/criticism.
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    > #define BUFLEN 4000
    > char *get_line(FILE *stream)
    > {
    > char *s = NULL; size_t len = 0; int c;
    >
    > do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */


    It's a matter of preference, but I wouldn't choose to do this.
    Whether an empty line is meaningful or not isn't something for a
    low-level utility function to decide.

    > if (c == EOF) return NULL;
    > ungetc(c, stream);
    >
    > while ( 1 )
    > {
    > size_t max = len + BUFLEN;


    There's a faint chance this sum could wrap around, but you're
    probably safe in ignoring it: You're likely to run out of memory
    before exceeding what a size_t can handle. Still, an "industrial-
    strength" function would be on guard against the possibility.

    > s = realloc(s, max);
    > if (s == NULL) exit(EXIT_FAILURE);


    Ugh. Ugh. Ugh. Yes, I saw what you wrote about kicking the
    bucket -- but here you're kicking the entire program's bucket,
    without even knowing what the bucket may hold. Ugh. Ugh. Ugh.

    > while (len < max)
    > {
    > c = fgetc(stream);


    Why fgetc() instead of getc()? (Same question above, too.)

    > if (c == EOF || c == '\n')
    > {
    > s[len] = '\0';
    > return realloc(s, len+1);


    ... possibly returning NULL after successfully reading a line,
    plus leaking the memory where the line resides ... You could
    retain the original pointer and return it if realloc() fails, or
    you could make BUFLEN smaller and just not worry about the extra
    few bytes.

    > }
    > s[len++] = c;
    > }
    > }
    > }


    --
    Eric Sosman
    d
    Eric Sosman, Feb 7, 2013
    #4
  5. Noob

    James Kuyper Guest

    On 02/07/2013 07:42 AM, Noob wrote:
    > Hello group,
    >
    > Here's my attempt at writing a "get_line" implementation, which
    > reads an entire line from a file stream, dynamically allocating
    > the space needed to store said line.
    >
    > Since this is for my own personal use, I didn't bother handling
    > out-of-memory conditions elegantly. I just kick the bucket.
    > ("We all kick the bucket in the end, in the end.")
    >
    > EOF is signaled by returning NULL.
    >
    > I'd like to hear suggestions/criticism.
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    > #define BUFLEN 4000
    > char *get_line(FILE *stream)
    > {
    > char *s = NULL; size_t len = 0; int c;
    >
    > do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */
    > if (c == EOF) return NULL;
    > ungetc(c, stream);
    >
    > while ( 1 )
    > {
    > size_t max = len + BUFLEN;
    > s = realloc(s, max);
    > if (s == NULL) exit(EXIT_FAILURE);
    > while (len < max)
    > {
    > c = fgetc(stream);
    > if (c == EOF || c == '\n')
    > {


    You're treating End Of File, End Of Line, and an I/O error in exactly
    the same fashion. In some contexts that might be acceptable, but I would
    ordinarily want to distinguish them. As long as you're restricted to
    reporting errors by returning NULL, you can't distinguish them. That's
    why I generally prefer using the return value of my function to report
    an error code; I'd take a pointer to a pointer as an argument to the
    function, and return the pointer the allocated memory through that argument.

    > s[len] = '\0';
    > return realloc(s, len+1);
    > }
    > s[len++] = c;
    > }
    > }
    > }


    When this routine is done, the only record of the length of the
    allocated memory is the null character at the end of the allocation. I
    prefer storing the length along with the allocation. To avoid having to
    allocate two separate objects, I use a flexible array member:

    struct allocation {
    size_t length;
    allocated_type data[];
    };

    In this case, allocated_type is char. Allocate the entire structure, and
    then store it's length in the structure. Reallocate as needed, storing
    the new length in the structure only after the reallocation is successful.

    On the rare occasions when I've done anything like this, I've usually
    occasionally desired to re-use an existing allocation. Therefore, the
    routine should take a pointer to an allocation as input, and malloc() a
    new one only if that pointer is null.
    --
    James Kuyper
    James Kuyper, Feb 7, 2013
    #5
  6. Noob

    James Kuyper Guest

    On 02/07/2013 08:55 AM, BartC wrote:
    ....
    > o The buffer allocated will always be at least 4000 bytes? (Significant if
    > the caller wants to store these pointers in an array)


    No, it gets reallocated below 4000 if less space is needed.

    ....
    > o *Blank lines are ignored*. (This is different from a normal getline();
    > blank lines can be significant. However blank lines containing only white
    > space *are* returned.)


    It ignores leading blank lines. That's a specialized usage, A general
    purpose function shouldn't work that way. But blank lines after the
    first blank line are not ignored; they cause the return of a pointer to
    null-terminated string containing no characters, which seems reasonable.
    --
    James Kuyper
    James Kuyper, Feb 7, 2013
    #6
  7. Noob

    Eric Sosman Guest

    On 2/7/2013 10:49 AM, James Kuyper wrote:
    > On 02/07/2013 08:55 AM, BartC wrote:
    > ...
    >> o The buffer allocated will always be at least 4000 bytes? (Significant if
    >> the caller wants to store these pointers in an array)

    >
    > No, it gets reallocated below 4000 if less space is needed.
    >
    > ...
    >> o *Blank lines are ignored*. (This is different from a normal getline();
    >> blank lines can be significant. However blank lines containing only white
    >> space *are* returned.)

    >
    > It ignores leading blank lines. That's a specialized usage, A general
    > purpose function shouldn't work that way. But blank lines after the
    > first blank line are not ignored; they cause the return of a pointer to
    > null-terminated string containing no characters, which seems reasonable.


    It looks like *all* empty lines are skipped, since the
    same skippage occurs on each call to the function. Or did
    I miss something?

    --
    Eric Sosman
    d
    Eric Sosman, Feb 7, 2013
    #7
  8. Noob

    James Kuyper Guest

    On 02/07/2013 11:07 AM, Eric Sosman wrote:
    > On 2/7/2013 10:49 AM, James Kuyper wrote:

    ....
    >> It ignores leading blank lines. That's a specialized usage, A general
    >> purpose function shouldn't work that way. But blank lines after the
    >> first blank line are not ignored; they cause the return of a pointer to
    >> null-terminated string containing no characters, which seems reasonable.

    >
    > It looks like *all* empty lines are skipped, since the
    > same skippage occurs on each call to the function. Or did
    > I miss something?


    No, I just made a stupid mistake while reading the code. It's one of
    several such mistakes, but I caught all (I hope!) of the others before
    posting my message.
    --
    James Kuyper
    James Kuyper, Feb 7, 2013
    #8
  9. Noob

    Noob Guest

    Eric Sosman wrote:
    > On 2/7/2013 7:42 AM, Noob wrote:
    >> Hello group,
    >>
    >> Here's my attempt at writing a "get_line" implementation, which
    >> reads an entire line from a file stream, dynamically allocating
    >> the space needed to store said line.
    >>
    >> Since this is for my own personal use, I didn't bother handling
    >> out-of-memory conditions elegantly. I just kick the bucket.
    >> ("We all kick the bucket in the end, in the end.")
    >>
    >> EOF is signaled by returning NULL.
    >>
    >> I'd like to hear suggestions/criticism.
    >>
    >> #include <stdio.h>
    >> #include <stdlib.h>
    >> #define BUFLEN 4000
    >> char *get_line(FILE *stream)
    >> {
    >> char *s = NULL; size_t len = 0; int c;
    >>
    >> do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */

    >
    > It's a matter of preference, but I wouldn't choose to do this.
    > Whether an empty line is meaningful or not isn't something for a
    > low-level utility function to decide.


    You're probably right. I'll change it in v2.

    >> if (c == EOF) return NULL;
    >> ungetc(c, stream);
    >>
    >> while ( 1 )
    >> {
    >> size_t max = len + BUFLEN;

    >
    > There's a faint chance this sum could wrap around, but you're
    > probably safe in ignoring it: You're likely to run out of memory
    > before exceeding what a size_t can handle. Still, an "industrial-
    > strength" function would be on guard against the possibility.


    I understand the point you are making.

    >> s = realloc(s, max);
    >> if (s == NULL) exit(EXIT_FAILURE);

    >
    > Ugh. Ugh. Ugh. Yes, I saw what you wrote about kicking the
    > bucket -- but here you're kicking the entire program's bucket,
    > without even knowing what the bucket may hold. Ugh. Ugh. Ugh.


    Thing is, since I chose NULL to mean EOF, there was no "room"
    left for me to signal OOM. Unless I change the prototype...
    I'll give it a shot in v2.

    >> while (len < max)
    >> {
    >> c = fgetc(stream);

    >
    > Why fgetc() instead of getc()? (Same question above, too.)


    At some point in the early stages, I was thinking that it might
    be "dangerous" to use the macro version. Obviously, there's no
    problem when 'stream' is a function argument.

    >> if (c == EOF || c == '\n')
    >> {
    >> s[len] = '\0';
    >> return realloc(s, len+1);

    >
    > ... possibly returning NULL after successfully reading a line,
    > plus leaking the memory where the line resides ... You could
    > retain the original pointer and return it if realloc() fails, or
    > you could make BUFLEN smaller and just not worry about the extra
    > few bytes.


    Wait what?! Reallocating to a smaller size (i.e. truncating
    excessive space) could fail?

    At this point, len < max, thus len+1 <= max

    And I thought realloc frees its argument if it doesn't return
    what was passed in?

    Regards.
    Noob, Feb 7, 2013
    #9
  10. Noob

    Noob Guest

    BartC wrote:

    > Noob wrote:
    >
    >> I'd like to hear suggestions/criticism.

    >
    > No user docs given. As I understand how it works:


    I need to add a comment explaining what the caller is supposed
    to do with the return value.

    > o The function allocates a new dynamic buffer for every line read; the
    > caller is responsible for freeing this.


    Correct.

    > o The buffer allocated will always be at least 4000 bytes? (Significant if
    > the caller wants to store these pointers in an array)


    No, the buffer is just big enough to store the string.

    > o A return value of NULL means end-of-file (out-of-memory signaled by
    > aborting)


    Right.

    > o *Blank lines are ignored*. (This is different from a normal getline();
    > blank lines can be significant. However blank lines containing only white
    > space *are* returned.)


    Right. I'll make the behavior more consistent.

    > o The return value points to a null-terminated string (which always contain
    > at least one character), when it's not NULL.


    Correct.

    >> while ( 1 )
    >> {
    >> size_t max = len + BUFLEN;
    >> s = realloc(s, max);

    >
    >> if (s == NULL) exit(EXIT_FAILURE);
    >> while (len < max)
    >> {
    >> c = fgetc(stream);
    >> if (c == EOF || c == '\n')
    >> {
    >> s[len] = '\0';
    >> return realloc(s, len+1);

    >
    > Why the extra byte allocated here?


    For the terminator :)

    Regards.
    Noob, Feb 7, 2013
    #10
  11. Noob

    Eric Sosman Guest

    On 2/7/2013 11:44 AM, Noob wrote:
    > Eric Sosman wrote:
    >> On 2/7/2013 7:42 AM, Noob wrote:
    >>> [...]
    >>> s = realloc(s, max);
    >>> if (s == NULL) exit(EXIT_FAILURE);

    >>
    >> Ugh. Ugh. Ugh. Yes, I saw what you wrote about kicking the
    >> bucket -- but here you're kicking the entire program's bucket,
    >> without even knowing what the bucket may hold. Ugh. Ugh. Ugh.

    >
    > Thing is, since I chose NULL to mean EOF, there was no "room"
    > left for me to signal OOM. Unless I change the prototype...
    > I'll give it a shot in v2.


    One possibility is to keep your "narrow" interface and
    let the caller figure it out:

    char *line = get_line(stream);
    if (line == NULL) {
    if (feof(stream))
    fputs("End of file\n", stderr);
    else if ferror(stream))
    fputs("I/O error\n", stderr);
    else
    fputs("Out of memory\n", stderr);
    }

    This is not as squeaky-clean as some other approaches (for example,
    you could return a struct containing a status code and a pointer),
    but it still allows the caller to discover everything needed.

    If you decide to return NULL (or other failure indication) on
    an I/O error, it might be a good idea to preserve the errno value
    in case the failing getc() set it. The issue is to prevent other
    functions from clobbering it while cleaning up: copy it to an int
    of your own, call free() or whatever, then restore it before
    returning.

    >>> return realloc(s, len+1);

    >>
    >> ... possibly returning NULL after successfully reading a line,
    >> plus leaking the memory where the line resides ... You could
    >> retain the original pointer and return it if realloc() fails, or
    >> you could make BUFLEN smaller and just not worry about the extra
    >> few bytes.

    >
    > Wait what?! Reallocating to a smaller size (i.e. truncating
    > excessive space) could fail?


    Sure. Consider a memory manager that uses different "arenas"
    for different allocation sizes: Shrinking an allocation might move
    it from one arena to another. Or consider a memory manager that
    *always* reallocates, maybe filling the old area with 0xDEADBEEF
    to aid in debugging.

    > At this point, len < max, thus len+1 <= max
    >
    > And I thought realloc frees its argument if it doesn't return
    > what was passed in?


    No: If realloc() fails, it leaves the original allocation
    untouched.

    --
    Eric Sosman
    d
    Eric Sosman, Feb 7, 2013
    #11
  12. Noob

    Noob Guest

    James Kuyper wrote:

    > You're treating End Of File, End Of Line, and an I/O error in exactly
    > the same fashion. In some contexts that might be acceptable, but I would
    > ordinarily want to distinguish them. As long as you're restricted to
    > reporting errors by returning NULL, you can't distinguish them. That's
    > why I generally prefer using the return value of my function to report
    > an error code; I'd take a pointer to a pointer as an argument to the
    > function, and return the pointer the allocated memory through that argument.


    getc returning -1 means either EOF or I/O error, right?
    At this point, would the following assertion hold:

    !feof(stream) == !!ferror(stream)

    If we're dealing with an I/O error, how can we learn more about
    the nature of the error? Is errno set at some point?

    > When this routine is done, the only record of the length of the
    > allocated memory is the null character at the end of the allocation. I
    > prefer storing the length along with the allocation. To avoid having to
    > allocate two separate objects, I use a flexible array member:
    >
    > struct allocation {
    > size_t length;
    > allocated_type data[];
    > };
    >
    > In this case, allocated_type is char. Allocate the entire structure, and
    > then store it's length in the structure. Reallocate as needed, storing
    > the new length in the structure only after the reallocation is successful.
    >
    > On the rare occasions when I've done anything like this, I've usually
    > occasionally desired to re-use an existing allocation. Therefore, the
    > routine should take a pointer to an allocation as input, and malloc() a
    > new one only if that pointer is null.


    The getline implementation in glibc works like you describe.
    Using a flexible array is an interesting suggestion.

    Regards.
    Noob, Feb 7, 2013
    #12
  13. Noob

    Eric Sosman Guest

    On 2/7/2013 12:16 PM, Noob wrote:
    > James Kuyper wrote:
    >
    >> You're treating End Of File, End Of Line, and an I/O error in exactly
    >> the same fashion. In some contexts that might be acceptable, but I would
    >> ordinarily want to distinguish them. As long as you're restricted to
    >> reporting errors by returning NULL, you can't distinguish them. That's
    >> why I generally prefer using the return value of my function to report
    >> an error code; I'd take a pointer to a pointer as an argument to the
    >> function, and return the pointer the allocated memory through that argument.

    >
    > getc returning -1 means either EOF or I/O error, right?


    Almost, but not quite. A return of EOF (often -1, but
    possibly some other negative value) means end-of-input or
    I/O error. On some "exotic" systems it might even mean a
    character was successfully read; see below.

    > At this point, would the following assertion hold:
    >
    > !feof(stream) == !!ferror(stream)


    On an "exotic" system where UCHAR_MAX > INT_MAX, it is
    possible that a perfectly ordinary unsigned char could be equal
    to EOF after conversion to int; if both feof() and ferror() are
    false you've successfully read that character.

    Even on a "vanilla" system I think both feof() and ferror()
    could be true, if you've kept on calling getc() after an earlier
    failure indication. The error and end-of-file indicators are
    "sticky:" once they're set, they remain set until you clear
    them (the actions that clear them are different for the two
    indicators). So if you breeze past an I/O error and then
    encounter end-of-input, I think both indicators could be set.

    > If we're dealing with an I/O error, how can we learn more about
    > the nature of the error? Is errno set at some point?


    Some implementations may set it; none are obliged to. This
    is part of the price one pays for portable I/O: The library
    shields you from the ugly details of the underlying system, but
    at the same time prevents you from peeking at them. "Shielding"
    and "hiding" are closely related ...

    --
    Eric Sosman
    d
    Eric Sosman, Feb 7, 2013
    #13
  14. Noob <root@127.0.0.1> writes:
    > James Kuyper wrote:
    >> You're treating End Of File, End Of Line, and an I/O error in exactly
    >> the same fashion. In some contexts that might be acceptable, but I would
    >> ordinarily want to distinguish them. As long as you're restricted to
    >> reporting errors by returning NULL, you can't distinguish them. That's
    >> why I generally prefer using the return value of my function to report
    >> an error code; I'd take a pointer to a pointer as an argument to the
    >> function, and return the pointer the allocated memory through that argument.

    >
    > getc returning -1 means either EOF or I/O error, right?


    Terminology problem.

    EOF is a macro defined in <stdio.h>. It expands to an integer
    constant expression of type int with a negative value. It's a value
    that can be returned by certain I/O functions, usually to indicate
    that no more input is available. Its value is typically -1, but
    you shouldn't assume that.

    If getc() returns the value of EOF, it means that it tried and failed
    to read a character from stdin. This can be either because of an
    end-of-file condition (which will cause feof(stdin) to return a true
    result), or by an error condition (which will cause ferror(stdin)
    to return a true result).

    EOF stands for "end of file", but don't confuse EOF (either the
    macro or the int value it denotes) with an end-of-file condition.

    As the standard says (N1570 7.21.7.5):

    int getc(FILE *stream);

    The getc function returns the next character from the input
    stream pointed to by stream. If the stream is at end-of-file,
    the end-of-file indicator for the stream is set and getc returns
    EOF. If a read error occurs, the error indicator for the stream
    is set and getc returns EOF.

    (It's a bit easier to read with the highlighting in the PDF.)

    > At this point, would the following assertion hold:
    >
    > !feof(stream) == !!ferror(stream)


    It should, yes.

    > If we're dealing with an I/O error, how can we learn more about
    > the nature of the error? Is errno set at some point?


    Maybe. The standard doesn't require anything in <stdio.h> to set errno,
    but POSIX does.

    [...]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 7, 2013
    #14
  15. Noob <root@127.0.0.1> writes:

    > Here's my attempt at writing a "get_line" implementation, which
    > reads an entire line from a file stream, dynamically allocating
    > the space needed to store said line.


    A couple of things that have not yet been said...

    > Since this is for my own personal use, I didn't bother handling
    > out-of-memory conditions elegantly. I just kick the bucket.
    > ("We all kick the bucket in the end, in the end.")
    >
    > EOF is signaled by returning NULL.
    >
    > I'd like to hear suggestions/criticism.
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    > #define BUFLEN 4000


    Tiny point: I don't like the name, since it is not the length of a
    buffer -- it's a length increment.

    More substantively, it's often better to grow the allocation by a
    constant factor rather than by a constant increment. However, since
    realloc is probably cheap compared to I/O it may not matter here, but
    it's worth considering.

    > char *get_line(FILE *stream)
    > {
    > char *s = NULL; size_t len = 0; int c;
    >
    > do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */
    > if (c == EOF) return NULL;
    > ungetc(c, stream);
    >
    > while ( 1 )
    > {
    > size_t max = len + BUFLEN;
    > s = realloc(s, max);
    > if (s == NULL) exit(EXIT_FAILURE);


    If you take Eric's suggestion of returning NULL (leaving the caller to
    figure our the reason for the failure) you should adjust this to free
    the allocated buffer before returning. If a huge line has caused an out
    of memory condition, the last thing you want to do is return without
    freeing what you can.

    > while (len < max)
    > {
    > c = fgetc(stream);
    > if (c == EOF || c == '\n')
    > {
    > s[len] = '\0';
    > return realloc(s, len+1);
    > }
    > s[len++] = c;
    > }
    > }
    > }
    >
    > Regards.


    --
    Ben.
    Ben Bacarisse, Feb 7, 2013
    #15
  16. Noob

    Shao Miller Guest

    On 2/7/2013 07:42, Noob wrote:
    >
    > Here's my attempt at writing a "get_line" implementation, which
    > reads an entire line from a file stream, dynamically allocating
    > the space needed to store said line.
    >


    Based on your other posts in this thread, I gather your next version
    won't skip lines, or it'll be documented and the name adjusted.

    > Since this is for my own personal use, I didn't bother handling
    > out-of-memory conditions elegantly. I just kick the bucket.
    > ("We all kick the bucket in the end, in the end.")
    >
    > EOF is signaled by returning NULL.
    >


    An alternative is to return an 'enum' value which a caller can 'switch'
    with, or opaque values which a header can define macros for.

    > I'd like to hear suggestions/criticism.
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    > #define BUFLEN 4000


    As Ben mentioned, it might be better called 'BUFINC', or some such. Or
    you could use 'BUFSIZ' from stdio.h. Or you could allow the caller to
    make the choice either through a parameter, encapsulating the choice in
    some kind of state, or even with a callback function that allows the
    caller to specify the method of growth. (You could have a default
    choice or callback, in case the caller isn't interested.)

    > char *get_line(FILE *stream)
    > {
    > char *s = NULL; size_t len = 0; int c;
    >
    > do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */
    > if (c == EOF) return NULL;
    > ungetc(c, stream);
    >
    > while ( 1 )
    > {
    > size_t max = len + BUFLEN;
    > s = realloc(s, max);
    > if (s == NULL) exit(EXIT_FAILURE);


    Although you said above that error-handling wasn't particularly
    graceful, this call to 'exit' seems a bit harsh. :)

    > while (len < max)
    > {
    > c = fgetc(stream);
    > if (c == EOF || c == '\n')
    > {
    > s[len] = '\0';
    > return realloc(s, len+1);
    > }
    > s[len++] = c;
    > }
    > }
    > }
    >


    Since a stream might be "line buffered", I wonder if you could gain any
    sort of advantage by using 'fscanf'... The following quick C89 code
    (could have bugs!) doesn't try to grow any buffer, but uses 'fscanf':

    #include <stdio.h>

    int fetch_a_line(char * buf, unsigned int bufsz, FILE * stream) {
    char format_buf[
    /* '%' */
    1 +
    /* 32 decimal digits */
    32 +
    /* "[^\n]%n" */
    6 +
    /* null terminator */
    1
    ];
    int matched;
    int bytes;
    int c;

    sprintf(format_buf, "%%%u[^\n]%%n", bufsz);
    matched = fscanf(stream, format_buf, buf, &bytes);
    c = getc(stream);
    if (c != '\n')
    ungetc(c, stream);
    if (matched > 0)
    return bytes;
    return matched;
    }

    int main(void) {
    char test[512];
    int bytes;

    while (1) {
    bytes = fetch_a_line(test, sizeof test - 1, stdin);
    if (bytes == EOF)
    break;
    printf("Read line of %d characters: %s\n", bytes, test);
    }
    printf("Done.\n");
    return 0;
    }

    --
    - Shao Miller
    --
    "Thank you for the kind words; those are the kind of words I like to hear.

    Cheerily," -- Richard Harter
    Shao Miller, Feb 8, 2013
    #16
  17. Noob

    Noob Guest

    pete wrote:

    > I deactivated that feature
    > and modified a get_line_test program which I already had,
    > so that it would work with your function.


    What was your conclusion of your test?
    Noob, Feb 8, 2013
    #17
  18. Philip Lantz <> writes:
    > Keith Thompson wrote:
    >> Noob writes:
    >> > getc returning -1 means either EOF or I/O error, right?

    >>
    >> Terminology problem.
    >>
    >> EOF is a macro defined in <stdio.h>.

    >
    > Yes, but "EOF" also means "end of file" in ordinary everyday
    > computerese, which is clearly the way he was using the word in the
    > sentence above.


    Yes, but given the existence of the EOF macro, using EOF *in a C
    context* to refer to an end-of-file condition is potentially confusing.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 9, 2013
    #18
  19. Noob

    Rosario1903 Guest

    On Thu, 07 Feb 2013 13:42:12 +0100, Noob <root@127.0.0.1> wrote:

    >Hello group,
    >
    >Here's my attempt at writing a "get_line" implementation, which
    >reads an entire line from a file stream, dynamically allocating
    >the space needed to store said line.
    >
    >Since this is for my own personal use, I didn't bother handling
    >out-of-memory conditions elegantly. I just kick the bucket.
    >("We all kick the bucket in the end, in the end.")
    >
    >EOF is signaled by returning NULL.
    >
    >I'd like to hear suggestions/criticism.


    // ritorna un puntatore da liberarsi tramite free()
    // ritorna 0 per errore
    char* __export GetLineI(char* testo)
    {unsigned s, i;
    int c;
    char *p,*t;
    s=1024; p=malloc(s+4);
    if(p==0) return 0;
    printf("%s", testo); fflush(stdout);
    for(i=0; (c=getchar())!=EOF ; )
    {if(i>=s)
    {s=s+1024;
    t=realloc(p, s+4);
    if(t==0||(int)s<0)
    {free(p); return 0;}
    p=t;
    }
    p=c; ++i;
    if(c=='\n')
    break;
    }
    p=0;
    return p;
    }
    Rosario1903, Feb 9, 2013
    #19
  20. Noob

    Shao Miller Guest

    On 2/9/2013 13:02, Rosario1903 wrote:
    > On Thu, 07 Feb 2013 13:42:12 +0100, Noob <root@127.0.0.1> wrote:
    >
    >> Hello group,
    >>
    >> Here's my attempt at writing a "get_line" implementation, which
    >> reads an entire line from a file stream, dynamically allocating
    >> the space needed to store said line.
    >>
    >> Since this is for my own personal use, I didn't bother handling
    >> out-of-memory conditions elegantly. I just kick the bucket.
    >> ("We all kick the bucket in the end, in the end.")
    >>
    >> EOF is signaled by returning NULL.
    >>
    >> I'd like to hear suggestions/criticism.

    >
    > [some code]


    Here's an 'indent -kr' version of Rosario1903's code:

    // ritorna un puntatore da liberarsi tramite free()
    // ritorna 0 per errore
    char *__export GetLineI(char *testo)
    {
    unsigned s, i;
    int c;
    char *p, *t;
    s = 1024;
    p = malloc(s + 4);
    if (p == 0)
    return 0;
    printf("%s", testo);
    fflush(stdout);
    for (i = 0; (c = getchar()) != EOF;) {
    if (i >= s) {
    s = s + 1024;
    t = realloc(p, s + 4);
    if (t == 0 || (int) s < 0) {
    free(p);
    return 0;
    }
    p = t;
    }
    p = c;
    ++i;
    if (c == '\n')
    break;
    }
    p = 0;
    return p;
    }


    --
    - Shao Miller
    --
    "Thank you for the kind words; those are the kind of words I like to hear.

    Cheerily," -- Richard Harter
    Shao Miller, Feb 9, 2013
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. John
    Replies:
    10
    Views:
    10,046
    Buster Copley
    Jul 14, 2003
  2. Poof
    Replies:
    1
    Views:
    382
  3. Cirene
    Replies:
    6
    Views:
    427
    Cowboy \(Gregory A. Beamer\)
    Sep 8, 2008
  4. Jim Cain
    Replies:
    1
    Views:
    204
    Yukihiro Matsumoto
    Jul 18, 2003
  5. Daniel Berger

    First stab at a weakref test suite

    Daniel Berger, Dec 21, 2007, in forum: Ruby
    Replies:
    6
    Views:
    131
    Michal Suchanek
    Dec 22, 2007
Loading...

Share This Page