Designing fgetline - a perspective

Discussion in 'C Programming' started by Richard Harter, Oct 12, 2007.

  1. The following is an exercise in thinking out a design.
    Comments and thoughts are welcome.

    Writing a routine to read lines from a file is one of
    those little design tasks that seems to create disagreement and
    confusion. The routine has various names; in this article we
    will call it fgetline. Here is a take on how to do it.

    What we want is a routine that keeps reading and returning lines
    from a file. A simple prototype for this function is

    char * fgetline(FILE *);

    There are some gotchas with this prototype - well, not really
    gotchas, rather little bits of awkwardness and inefficiency. One

    is that we are throwing away information that (usually) has to be

    recomputed, namely the length of the line and, as we shall see,
    status information. If we want to add the information to our
    prototype there are several that to do it - it can be
    passed back through the calling sequence or it can be returned by

    the function. (We could also pass it to a global - ugh, don't do

    it.) My vote is to have it all returned by the function which
    means we create a struct (record) that looks like:

    struct fgetline_info {
    char *line;
    size_t len;
    int status;
    };

    There are various choices that could be made in the types -
    season to taste. (A pointer to the struct could be passed in as
    an argument if need be.)


    A second issue is that there is no limit to how large the line
    can be. A plausible way to provide a limit that is to pass one
    in as an argument. If we do, our prototype looks like this:

    struct fgetline_info fgetline(FILE * fptr, size_t maxsize);

    This version has a little problem - where did the storage for the

    line come from? One way to deal with that is for fgetline to
    allocate storage for the line and for the calling routine to free

    it when it is done with the line.

    Okay, so how do we do this in fgetline? Well, there is a very
    simple way to do it, one that has been reinvented time and time
    again. We start out allocating a standard amount of space
    (numbers like 32, 64, and 80 bytes are typical) for a line.
    We read from the file until we either hit an EOL (end of line
    marker), a failure to read, or we have read as many characters as

    were allocated. If we haven't hit the EOL we reallocate the
    array, commonly by doubling the size, and doing another read.

    This works, but it is inefficient - we have to call malloc and
    free for every line. One way to get around this is to use a
    technique I call highwater buffering. In the file containing the

    code for fgetline we have two file-scope variables:

    static char * buffer = 0;
    static size_t size = 0;

    In fgetline we have some initialization code that allocates
    buffer space when the buffer size is zero. Thereafter we use our

    little doubling trick. The advantage of this scheme is that we
    have at most a small number of calls to malloc (the number of
    doublings needed to get to the largest line) and no calls to
    free.

    The very real disadvantage of this scheme is that it produces a
    dirty copy of the line. By a dirty copy, I mean one that can be
    scribbled on elsewhere before we are done with it. This will
    happen whenever fgetline is called anywhere else with any FILE
    argument before our next read. This is double plus ungood.

    Is there a way to get the efficiency of the highwater scheme
    without being dirty? Yes; the trick is that the calling program
    provides the initial buffer. If we add the buffer information to

    the prototype we get:

    fgetline_info fgetline(FILE * fptr,
    char * buffer,
    size_t size,
    size_t maxsize);

    There are three kinds of copies of the line that we might get
    from fgetline - clean, transient, and dirty. A clean copy is one

    that has no encumbrances - it lasts until it is specifically
    deleted. A transient copy is one that lasts until the next
    call to fgetline to get another line from the current file.
    (There may be another file being read elsewhere.)

    What kind of copy should fgetline provide, clean or transient?
    There are arguments for each choice. Clean copies are more
    expensive but safer. Transient copies are (usually) cheaper. In

    questions like this there is much to be said for the design
    principle that:

    When there is a significant choice as to the kind of
    output being delivered the library routine should let
    the user make the choice rather than dictating the
    choice unless offering the choice unduly complicates
    usage.

    So how do we provide a choice? That is simple; if the buffer
    pointer in the calling sequence is NULL, fgetline must return a
    clean copy; otherwise fgetline MAY return a transient copy. I
    say "may" because it might have to increase the buffer size.
    If it does the calling routine will have to check whether there
    was a size increase and, if so, will have to free the returned
    buffer. (The fgetline implementation can't realloc the passed
    in buffer; it will have to do a malloc for the first resizing.)

    It's probably best to have a value in the status field that
    signifies the size has been increased; call that value
    fg_increased and a normal read fg_normal. Then the usage for
    transient copies might look like this (first cut):

    ...
    struct fgetline_info fg;
    char buffer[80];
    int done = 0;
    ....
    for(;!done;) {
    ...
    fg = fgetline(fptr,buffer,80,FG_MAXSIZE);
    if (fg.status == fg_normal || fg.status == fg_increased) {
    /* do stuff */
    if (fg.status == fg_increased) free(fg.line);
    } else done = 1;
    }

    If we want clean copies the corresponding loop body would be

    fg = fgetline(fptr,0,0,FG_MAXSIZE);
    if (fg.status == fg_normal) {
    /* do stuff */
    free(fg.line);
    } else done = 1;

    What sort of return values should be available in the status
    field? These are the successful ones that occur to me:

    end_of file The last read (if any) found an EOL
    marker. The current read finds an
    immediate EOF.

    no_increase Normal read - either buffer was null or
    else it was not increased.

    increase Normal read - the buffer size has been
    increased.

    abn_no_increase Abnormal read - an EOF was found without
    an EOL. Buffer was not increased.

    abn_increase Abnormal read - an EOF was found without
    an EOL. Buffer was increased.

    In addition there are numerous kinds of errors that can occur.
    Calling sequence arguments include:

    no_file The file pointer is null.
    bad_buffer One and only one of buffere and size s
    zero.
    bad_size Size is 0 or is greater than maxsize
    bad_maxsize Maxsize is 0

    Then there are the memory allocation failures:

    bad_allocate Malloc or realloc failure
    big_line Line length is greater than maxsize.

    When there is a memory allocation failure the line and len fields

    hold what has been successfuly read.

    There are various conventions one could use for assigning code
    values for the possible return values. My view is that there
    should be a bit that is set only if there was an error, a bit
    that is set only if there was a memory allocation error, a bit
    that is set only if there was a buffer increase, and a bit that
    is set only if an EOL occurred.

    The point of doing this is that we can have simple tests to check

    whether an error occurred, whether some space has to be freed,
    and whether the file read is complete.

    As a final note, there is one minor decision left unsettled,
    namely should the returned line include an EOL marker before the
    string terminating 0. My take is that this matter is too trivial

    to warrant adding a flag to the calling sequence, and that it
    will be slightly less confusing if there is one present even if
    it has to be manufactured. Perhaps someone has a convincing
    argument one way or the other.


    Richard Harter,
    http://home.tiac.net/~cri, http://www.varinoma.com
    But the rhetoric of holistic harmony can generate into a kind of
    dotty, Prince Charles-style mysticism. -- Richard Dawkins
     
    Richard Harter, Oct 12, 2007
    #1
    1. Advertising

  2. Richard Harter said:

    <snip>

    > What we want is a routine that keeps reading and returning lines
    > from a file. A simple prototype for this function is
    >
    > char * fgetline(FILE *);
    >
    > There are some gotchas with this prototype


    Check out http://www.cpax.org.uk/prg/writings/fgetdata.php in which I dealt
    with these problems a few years back. Curiously, I too chose the name
    fgetline (and fgetword, for token-based rather than line-based input).

    You, um, wanna try a different name? Two fgetlines could get confusing.

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Oct 12, 2007
    #2
    1. Advertising

  3. Richard Harter

    Thad Smith Guest

    Richard Harter wrote:
    > The following is an exercise in thinking out a design.
    > Comments and thoughts are welcome.


    I enjoyed reading your design process.


    > Is there a way to get the efficiency of the highwater scheme
    > without being dirty? Yes; the trick is that the calling program
    > provides the initial buffer. If we add the buffer information to
    >
    > the prototype we get:
    >
    > fgetline_info fgetline(FILE * fptr,
    > char * buffer,
    > size_t size,
    > size_t maxsize);
    >
    > It's probably best to have a value in the status field that
    > signifies the size has been increased; call that value
    > fg_increased and a normal read fg_normal. Then the usage for
    > transient copies might look like this (first cut):
    >
    > ...
    > struct fgetline_info fg;
    > char buffer[80];
    > int done = 0;
    > ....
    > for(;!done;) {
    > ...
    > fg = fgetline(fptr,buffer,80,FG_MAXSIZE);
    > if (fg.status == fg_normal || fg.status == fg_increased) {
    > /* do stuff */
    > if (fg.status == fg_increased) free(fg.line);
    > } else done = 1;
    > }


    That is too messy for my taste. I can see giving the flexibility of
    using either a supplied buffer or allocating one, but to have the
    function choose based on the size of the line makes the application too
    error prone, IMO. If accepting a supplied buffer, I would only use
    that, giving an error indication if it were too short. The
    simplification in code is a good payoff for me.


    > What sort of return values should be available in the status
    > field? These are the successful ones that occur to me:
    >
    > end_of file The last read (if any) found an EOL
    > marker. The current read finds an
    > immediate EOF.
    >
    > no_increase Normal read - either buffer was null or
    > else it was not increased.
    >
    > increase Normal read - the buffer size has been
    > increased.
    >
    > abn_no_increase Abnormal read - an EOF was found without
    > an EOL. Buffer was not increased.
    >
    > abn_increase Abnormal read - an EOF was found without
    > an EOL. Buffer was increased.
    >
    > In addition there are numerous kinds of errors that can occur.
    > Calling sequence arguments include:
    >
    > no_file The file pointer is null.
    > bad_buffer One and only one of buffere and size s
    > zero.
    > bad_size Size is 0 or is greater than maxsize
    > bad_maxsize Maxsize is 0
    >
    > Then there are the memory allocation failures:
    >
    > bad_allocate Malloc or realloc failure
    > big_line Line length is greater than maxsize.


    Add I/O error.

    > As a final note, there is one minor decision left unsettled,
    > namely should the returned line include an EOL marker before the
    > string terminating 0. My take is that this matter is too trivial
    >
    > to warrant adding a flag to the calling sequence, and that it
    > will be slightly less confusing if there is one present even if
    > it has to be manufactured.


    I prefer a flag. Otherwise you need to usurp some character for an EOF
    marker. The constant EOF is intentionally outside (unsigned char)c for
    all characters c (excepting implementations with sizeof int = 1).

    --
    Thad
     
    Thad Smith, Oct 12, 2007
    #3
  4. On Fri, 12 Oct 2007 02:18:30 +0000, Richard Heathfield
    <> wrote:

    >Richard Harter said:
    >
    ><snip>
    >
    >> What we want is a routine that keeps reading and returning lines
    >> from a file. A simple prototype for this function is
    >>
    >> char * fgetline(FILE *);
    >>
    >> There are some gotchas with this prototype

    >
    >Check out http://www.cpax.org.uk/prg/writings/fgetdata.php in which I dealt
    >with these problems a few years back. Curiously, I too chose the name
    >fgetline (and fgetword, for token-based rather than line-based input).


    Thanks, I've read it; it's quite good; however it doesn't go into
    some of the issues I was concerned with.
    >
    >You, um, wanna try a different name? Two fgetlines could get confusing.


    But that's the logical name - getline gets a line from stdin,
    fgetline gets a line from an arbitrary file.



    Richard Harter,
    http://home.tiac.net/~cri, http://www.varinoma.com
    But the rhetoric of holistic harmony can generate into a kind of
    dotty, Prince Charles-style mysticism. -- Richard Dawkins
     
    Richard Harter, Oct 12, 2007
    #4
  5. Richard Harter said:

    > On Fri, 12 Oct 2007 02:18:30 +0000, Richard Heathfield
    > <> wrote:
    >

    <snip>

    >>Check out http://www.cpax.org.uk/prg/writings/fgetdata.php in which I
    >>dealt with these problems a few years back. Curiously, I too chose the
    >>name fgetline (and fgetword, for token-based rather than line-based
    >>input).

    >
    > Thanks, I've read it; it's quite good; however it doesn't go into
    > some of the issues I was concerned with.


    I'm sure, but I wasn't particularly fussed about that - I just wanted to
    draw your attention to the name clash.

    >>You, um, wanna try a different name? Two fgetlines could get confusing.

    >
    > But that's the logical name - getline gets a line from stdin,
    > fgetline gets a line from an arbitrary file.


    Er, I agree that it's the logical name (obviously!). Oh, well - if you want
    a name clash, a name clash you can have, I guess.

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Oct 12, 2007
    #5
  6. On Thu, 11 Oct 2007 20:32:32 -0600, Thad Smith
    <> wrote:

    >Richard Harter wrote:
    >> The following is an exercise in thinking out a design.
    >> Comments and thoughts are welcome.

    >
    >I enjoyed reading your design process.


    Thank you.
    >
    >
    >> Is there a way to get the efficiency of the highwater scheme
    >> without being dirty? Yes; the trick is that the calling program
    >> provides the initial buffer. If we add the buffer information to
    >>
    >> the prototype we get:
    >>
    >> fgetline_info fgetline(FILE * fptr,
    >> char * buffer,
    >> size_t size,
    >> size_t maxsize);
    >>
    >> It's probably best to have a value in the status field that
    >> signifies the size has been increased; call that value
    >> fg_increased and a normal read fg_normal. Then the usage for
    >> transient copies might look like this (first cut):
    >>
    >> ...
    >> struct fgetline_info fg;
    >> char buffer[80];
    >> int done = 0;
    >> ....
    >> for(;!done;) {
    >> ...
    >> fg = fgetline(fptr,buffer,80,FG_MAXSIZE);
    >> if (fg.status == fg_normal || fg.status == fg_increased) {
    >> /* do stuff */
    >> if (fg.status == fg_increased) free(fg.line);
    >> } else done = 1;
    >> }

    >
    >That is too messy for my taste. I can see giving the flexibility of
    >using either a supplied buffer or allocating one, but to have the
    >function choose based on the size of the line makes the application too
    >error prone, IMO. If accepting a supplied buffer, I would only use
    >that, giving an error indication if it were too short. The
    >simplification in code is a good payoff for me.


    Hmmm. My take is that I want the function to give me a complete
    line if at all possible but to do it in a way that minimizes
    calls to malloc/realloc/free. Passing a supplied buffer does this
    provided that we still do the expanding buffer trick. Treating a
    too small buffer as an error would move a bunch of mess to the
    calling program.

    That said, your point about being too error prone is well taken.
    It may be better to actually have two routines - one without an
    input buffer and one with. Call the one with no user supplied
    buffer fgetline and the one with the user supplied buffer
    fgetline_hp (hp for high performance :)).

    >
    >
    >> What sort of return values should be available in the status
    >> field? These are the successful ones that occur to me:
    >>
    >> end_of file The last read (if any) found an EOL
    >> marker. The current read finds an
    >> immediate EOF.
    >>
    >> no_increase Normal read - either buffer was null or
    >> else it was not increased.
    >>
    >> increase Normal read - the buffer size has been
    >> increased.
    >>
    >> abn_no_increase Abnormal read - an EOF was found without
    >> an EOL. Buffer was not increased.
    >>
    >> abn_increase Abnormal read - an EOF was found without
    >> an EOL. Buffer was increased.
    >>
    >> In addition there are numerous kinds of errors that can occur.
    >> Calling sequence arguments include:
    >>
    >> no_file The file pointer is null.
    >> bad_buffer One and only one of buffere and size s
    >> zero.
    >> bad_size Size is 0 or is greater than maxsize
    >> bad_maxsize Maxsize is 0
    >>
    >> Then there are the memory allocation failures:
    >>
    >> bad_allocate Malloc or realloc failure
    >> big_line Line length is greater than maxsize.

    >
    >Add I/O error.
    >
    >> As a final note, there is one minor decision left unsettled,
    >> namely should the returned line include an EOL marker before the
    >> string terminating 0. My take is that this matter is too trivial
    >>
    >> to warrant adding a flag to the calling sequence, and that it
    >> will be slightly less confusing if there is one present even if
    >> it has to be manufactured.

    >
    >I prefer a flag. Otherwise you need to usurp some character for an EOF
    >marker. The constant EOF is intentionally outside (unsigned char)c for
    >all characters c (excepting implementations with sizeof int = 1).


    I see that I was unclear. Here is the issue: When we return a
    line do we include \n or not. For example, suppose the line is
    "abc". Do we return abc\n\0 or abc\0? The "manufactured" comes
    into play when the last line of a file isn't properly terminated
    with a \n. If we are returning the EOL (not EOF) character then
    we have to manufacture one that is not actually present in the
    file. Upon reflection, it may be best to return abc\0.



    Richard Harter,
    http://home.tiac.net/~cri, http://www.varinoma.com
    But the rhetoric of holistic harmony can generate into a kind of
    dotty, Prince Charles-style mysticism. -- Richard Dawkins
     
    Richard Harter, Oct 12, 2007
    #6
  7. On Fri, 12 Oct 2007 03:43:48 +0000, Richard Heathfield
    <> wrote:

    >Richard Harter said:
    >

    [snip]
    >>
    >> But that's the logical name - getline gets a line from stdin,
    >> fgetline gets a line from an arbitrary file.

    >
    >Er, I agree that it's the logical name (obviously!). Oh, well - if you want
    >a name clash, a name clash you can have, I guess.


    Hey, I'm easy. Do you have any good suggestions? getln, fgetln,
    rdline, and readline are already in use.


    Richard Harter,
    http://home.tiac.net/~cri, http://www.varinoma.com
    But the rhetoric of holistic harmony can generate into a kind of
    dotty, Prince Charles-style mysticism. -- Richard Dawkins
     
    Richard Harter, Oct 12, 2007
    #7
  8. Richard Harter

    Logan Shaw Guest

    Richard Harter wrote:
    > Is there a way to get the efficiency of the highwater scheme
    > without being dirty? Yes; the trick is that the calling program
    > provides the initial buffer. If we add the buffer information to
    > the prototype we get:
    >
    > fgetline_info fgetline(FILE * fptr,
    > char * buffer,
    > size_t size,
    > size_t maxsize);
    >
    > There are three kinds of copies of the line that we might get
    > from fgetline - clean, transient, and dirty. A clean copy is one
    > that has no encumbrances - it lasts until it is specifically
    > deleted. A transient copy is one that lasts until the next
    > call to fgetline to get another line from the current file.
    > (There may be another file being read elsewhere.)
    >
    > What kind of copy should fgetline provide, clean or transient?
    > There are arguments for each choice. Clean copies are more
    > expensive but safer. Transient copies are (usually) cheaper. In
    > questions like this there is much to be said for the design
    > principle that:
    >
    > When there is a significant choice as to the kind of
    > output being delivered the library routine should let
    > the user make the choice rather than dictating the
    > choice unless offering the choice unduly complicates
    > usage.
    >
    > So how do we provide a choice? That is simple; if the buffer
    > pointer in the calling sequence is NULL, fgetline must return a
    > clean copy; otherwise fgetline MAY return a transient copy.


    I agree 100% so far, but this is where I diverge. IMHO, this is
    all a lot simpler if you let the caller make the choice up front
    before fgetline() is ever called. How do you do this when the
    length of the line is not known in advance? You let the user
    manage the storage for the "meta info" about the buffer (the
    address of the buffer, the size of it, etc.), and you track
    everything about the buffer either in local variables or in
    the storage that the user has supplied. If the user wants that
    meta info to be static, they declare it static. If they want
    to have a clean copy on every call, they make a new structure
    to track a buffer every time.Also notice that free_buffer() leaves the buffer metadata in a
    consistent state so that you can throw away the dynamic memory as
    soon as you know you don't need it yet still call freadline() on it
    again if you feel like it.

    It looks like this (extraneous fields omitted for clarity):

    struct LineBuffer {
    char *buffer;
    size_t buffer_size;
    };

    And the code looks like this:

    void init_buffer(LineBuffer *lb) {
    lb->buffer = NULL;
    lb->buffer_size = 0;
    }

    int increase_buffer(LineBuffer *lb, size_t newsize) {
    char *newbuffer = realloc(buffer, newsize);
    if (newbuffer == NULL) return 0;

    lb->buffer = newbuffer;
    lb->buffer_size = newsize;
    return 1;
    }

    void free_buffer(LineBuffer *lb) {
    free(lb->buffer);
    init_buffer(lb); /* leave in a consistent state */
    }

    int freadline(FILE *f, LineBuffer *lb) {
    int pos = 0;

    while (.... whatever ....) {
    if (pos >= lb->buffer_size &&
    ! increase_buffer(lb, 2 * lb->buffer_size + 1)) {
    return 0;
    }

    lb->buffer[pos++] = .... this character ....;
    }
    }

    Now the client code can do whatever it wants. It doesn't have to worry
    about conditionally freeing the old storage. Because the client is
    managing the metadata about the buffer and because it is passing that
    metadata to freadline(), the freadline() function can free() it if
    necessary.

    On a side note, since init_buffer() allocates no dynamic storage and
    merely leaves the metadata in a consistent state, no error handling
    is necessary, and only calls to freadline() have to worry about an
    error, thus reducing the amount of error-handling code needed.

    - Logan
     
    Logan Shaw, Oct 12, 2007
    #8
  9. * Richard Harter:
    > The following is an exercise in thinking out a design.
    > Comments and thoughts are welcome.
    >
    > Writing a routine to read lines from a file is one of
    > those little design tasks that seems to create disagreement and
    > confusion. The routine has various names; in this article we
    > will call it fgetline. Here is a take on how to do it.
    >
    > What we want is a routine that keeps reading and returning lines
    > from a file. A simple prototype for this function is
    >
    > char * fgetline(FILE *);
    >
    > There are some gotchas with this prototype - well, not really
    > gotchas, rather little bits of awkwardness and inefficiency. One
    >
    > is that we are throwing away information that (usually) has to be
    >
    > recomputed, namely the length of the line and, as we shall see,
    > status information. If we want to add the information to our
    > prototype there are several that to do it - it can be
    > passed back through the calling sequence or it can be returned by
    >
    > the function. (We could also pass it to a global - ugh, don't do
    >
    > it.) My vote is to have it all returned by the function which
    > means we create a struct (record) that looks like:
    >
    > struct fgetline_info {
    > char *line;
    > size_t len;
    > int status;
    > };


    This looks like a C++ std::string except it has mixed in that "status"
    field which has nothing to do with strings.


    > There are various choices that could be made in the types -
    > season to taste. (A pointer to the struct could be passed in as
    > an argument if need be.)
    >
    >
    > A second issue is that there is no limit to how large the line
    > can be. A plausible way to provide a limit that is to pass one
    > in as an argument. If we do, our prototype looks like this:
    >
    > struct fgetline_info fgetline(FILE * fptr, size_t maxsize);
    >
    > This version has a little problem - where did the storage for the
    >
    > line come from? One way to deal with that is for fgetline to
    > allocate storage for the line and for the calling routine to free
    >
    > it when it is done with the line.


    Well, the C++ function is

    std::eek:stream& std::getline( std::eek:stream&, std::string& );

    As far as I can see this solves all of the problems mentioned.

    What it doesn't solve is how to just ignore a line.

    For that, streams have the ignore() member function.

    OH! You have cross-posted to [comp.lang.c]! Well, darn, it shouldn't
    be too difficult to translate the C++ solution to C.

    Of course, then the programmer has to do what the compiler does for C++.

    And I think that's very relevant: the functionality you want is
    essentially to have all this stuff automated and enforced by the
    compiler, like in C++ and more high level languages.

    Cheers, & hth.,

    - Alf

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Oct 12, 2007
    #9
  10. Richard Harter said:

    > On Fri, 12 Oct 2007 03:43:48 +0000, Richard Heathfield
    > <> wrote:
    >
    >>Richard Harter said:
    >>

    > [snip]
    >>>
    >>> But that's the logical name - getline gets a line from stdin,
    >>> fgetline gets a line from an arbitrary file.

    >>
    >>Er, I agree that it's the logical name (obviously!). Oh, well - if you
    >>want a name clash, a name clash you can have, I guess.

    >
    > Hey, I'm easy. Do you have any good suggestions? getln, fgetln,
    > rdline, and readline are already in use.


    Ah. Er, um... Hmmm. The trouble is that fgetline *is* the logical name! :)

    Okay, how about... oh. fgets has kinda gone, too, hasn't it?

    Um... sgetline? (s for stream (the source) or for string (the target) at
    your discretion).

    Just a thought. Obviously, fgetline isn't trademarked or anything (well,
    not as far as I know, anyway), so you're free to use that if you choose.
    I'm not being 'precious' about it, honestly - but between you and me the
    name clash is particularly bad because we can't even say "fgetline - you
    know, Richard H's version"!

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Oct 12, 2007
    #10
  11. On Thu, 11 Oct 2007 23:24:50 -0500, Logan Shaw
    <> wrote:

    >Richard Harter wrote:


    >> So how do we provide a choice? That is simple; if the buffer
    >> pointer in the calling sequence is NULL, fgetline must return a
    >> clean copy; otherwise fgetline MAY return a transient copy.

    >
    >I agree 100% so far, but this is where I diverge. IMHO, this is
    >all a lot simpler if you let the caller make the choice up front
    >before fgetline() is ever called. How do you do this when the
    >length of the line is not known in advance? You let the user
    >manage the storage for the "meta info" about the buffer (the
    >address of the buffer, the size of it, etc.), and you track
    >everything about the buffer either in local variables or in
    >the storage that the user has supplied. If the user wants that
    >meta info to be static, they declare it static. If they want
    >to have a clean copy on every call, they make a new structure
    >to track a buffer every time.Also notice that free_buffer() leaves the buffer metadata in a
    >consistent state so that you can throw away the dynamic memory as
    >soon as you know you don't need it yet still call freadline() on it
    >again if you feel like it.
    >
    >It looks like this (extraneous fields omitted for clarity):
    >
    > struct LineBuffer {
    > char *buffer;
    > size_t buffer_size;
    > };
    >
    >And the code looks like this:
    >
    > void init_buffer(LineBuffer *lb) {
    > lb->buffer = NULL;
    > lb->buffer_size = 0;
    > }
    >
    > int increase_buffer(LineBuffer *lb, size_t newsize) {
    > char *newbuffer = realloc(buffer, newsize);
    > if (newbuffer == NULL) return 0;
    >
    > lb->buffer = newbuffer;
    > lb->buffer_size = newsize;
    > return 1;
    > }
    >
    > void free_buffer(LineBuffer *lb) {
    > free(lb->buffer);
    > init_buffer(lb); /* leave in a consistent state */
    > }
    >
    > int freadline(FILE *f, LineBuffer *lb) {
    > int pos = 0;
    >
    > while (.... whatever ....) {
    > if (pos >= lb->buffer_size &&
    > ! increase_buffer(lb, 2 * lb->buffer_size + 1)) {
    > return 0;
    > }
    >
    > lb->buffer[pos++] = .... this character ....;
    > }
    > }
    >
    >Now the client code can do whatever it wants. It doesn't have to worry
    >about conditionally freeing the old storage. Because the client is
    >managing the metadata about the buffer and because it is passing that
    >metadata to freadline(), the freadline() function can free() it if
    >necessary.
    >
    >On a side note, since init_buffer() allocates no dynamic storage and
    >merely leaves the metadata in a consistent state, no error handling
    >is necessary, and only calls to freadline() have to worry about an
    >error, thus reducing the amount of error-handling code needed.


    Many thanks for the comments. You have, I opine, the right idea
    (using metadata) but your actual code doesn't do the job. One
    problem is that your init_buffer initializes the buffer to NULL.
    This means that the client can't supply a buffer either from the
    stack or from static storage; it will have to come from the heap
    which is what we want to avoid. We can get around that by having
    a pair of pointer/length variables, e.g.

    struct fg_meta {
    char * buf0;
    size_t len0;
    char * buffer;
    size_t length;
    };

    void init_fgmeta(fg_meta * fg; char *buf, size_t len) {
    if (!buf) len = 0;
    if (!len) buf = 0;
    fg->buf0 = buf; fg->buffer = buf;
    fg->len0 = len; fg->length = len;
    };

    with the obvious adjustment to free_fgmeta/free_buffer.

    More importantly we don't want the user (client if you like) to
    have to manage the meta data, though they can have the option to
    do so if they like. The implication is that at initialization
    time the user *tells* the package how it wants to do things,
    presumably by passing a flag to the initialization routine. Does
    the user want a clean copy each time? Then the user takes
    responsibility for freeing the storage. Will the user accept
    transient copies? Then they don't have to anything about freeing
    storage.

    The way I would do this is to define a structure that holds all
    this information; call the pointer to it a handle. Then we have
    three routines, one to initialize the structure, one to do the
    actual reads (and whatever back room storage management is
    needed) and one to close the read. The latter takes care of any
    cleanup that needs to be done.

    That's the alternative to passing bits and pieces through calling
    sequences, and it probably is the road that should be taken.





    Richard Harter,
    http://home.tiac.net/~cri, http://www.varinoma.com
    But the rhetoric of holistic harmony can generate into a kind of
    dotty, Prince Charles-style mysticism. -- Richard Dawkins
     
    Richard Harter, Oct 12, 2007
    #11
  12. * Alf P. Steinbach:
    >
    > std::eek:stream


    I meant istream, of course.

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Oct 12, 2007
    #12
  13. On Fri, 12 Oct 2007 06:19:23 +0000, Richard Heathfield
    <> wrote:

    >Richard Harter said:
    >
    >> On Fri, 12 Oct 2007 03:43:48 +0000, Richard Heathfield
    >> <> wrote:
    >>
    >>>Richard Harter said:
    >>>

    >> [snip]
    >>>>
    >>>> But that's the logical name - getline gets a line from stdin,
    >>>> fgetline gets a line from an arbitrary file.
    >>>
    >>>Er, I agree that it's the logical name (obviously!). Oh, well - if you
    >>>want a name clash, a name clash you can have, I guess.

    >>
    >> Hey, I'm easy. Do you have any good suggestions? getln, fgetln,
    >> rdline, and readline are already in use.

    >
    >Ah. Er, um... Hmmm. The trouble is that fgetline *is* the logical name! :)
    >
    >Okay, how about... oh. fgets has kinda gone, too, hasn't it?
    >
    >Um... sgetline? (s for stream (the source) or for string (the target) at
    >your discretion).
    >
    >Just a thought. Obviously, fgetline isn't trademarked or anything (well,
    >not as far as I know, anyway), so you're free to use that if you choose.
    >I'm not being 'precious' about it, honestly - but between you and me the
    >name clash is particularly bad because we can't even say "fgetline - you
    >know, Richard H's version"!


    I did a bit of googling; freadline is pretty much free, so
    freadline it is.



    >
    >--
    >Richard Heathfield <http://www.cpax.org.uk>
    >Email: -http://www. +rjh@
    >Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    >"Usenet is a strange place" - dmr 29 July 1999


    Richard Harter,
    http://home.tiac.net/~cri, http://www.varinoma.com
    But the rhetoric of holistic harmony can generate into a kind of
    dotty, Prince Charles-style mysticism. -- Richard Dawkins
     
    Richard Harter, Oct 12, 2007
    #13
  14. "Richard Harter" <> wrote in message
    > The following is an exercise in thinking out a design.
    > Comments and thoughts are welcome.
    >

    Firstly let me say I think this is a massively valuable exercise. Any
    competent C programmer should be able to write an fgetline() without bugs,
    the cghallenge is in developing an itnerface for it.
    >
    > My vote is to have it all returned by the function which
    > means we create a struct (record) that looks like:
    >
    > struct fgetline_info {
    > char *line;
    > size_t len;
    > int status;
    > };
    >

    Sorry hopeless.
    Caller wants
    while(line = fgetline() )
    {
    processline();
    }

    He can't be bothered with fiddling rounf with the internals of structures.

    > A second issue is that there is no limit to how large the line
    > can be. A plausible way to provide a limit that is to pass one
    > in as an argument. If we do, our prototype looks like this:
    >
    > struct fgetline_info fgetline(FILE * fptr, size_t maxsize);
    >
    > This version has a little problem - where did the storage for the
    > line come from?
    >

    You're passing in a maximum size, so caller might as well make the line a
    buffer in his scope, as in fgets().

    > In fgetline we have some initialization code that allocates
    > buffer space when the buffer size is zero. Thereafter we use our
    > little doubling trick.
    >

    Is this relevant? Apart for the problem of malicious, massive lines designed
    to tie up resources, I don't think most people care whether the function can
    be implemented efficiently or not.
    >

    [ clen, transient, dirty copies ]
    > When there is a significant choice as to the kind of
    > output being delivered the library routine should let
    > the user make the choice rather than dictating the
    > choice unless offering the choice unduly complicates
    > usage.
    >

    And the choice does unduly complicate usage. Caller doesn't want to be
    bothered with all this.
    >
    > What sort of return values should be available in the status
    > field? These are the successful ones that occur to me:
    >
    > end_of file The last read (if any) found an EOL
    > marker. The current read finds an
    > immediate EOF.
    >
    > no_increase Normal read - either buffer was null or
    > else it was not increased.
    >
    > increase Normal read - the buffer size has been
    > increased.
    >
    > abn_no_increase Abnormal read - an EOF was found without
    > an EOL. Buffer was not increased.
    >
    > abn_increase Abnormal read - an EOF was found without
    > an EOL. Buffer was increased.
    >
    > In addition there are numerous kinds of errors that can occur.
    > Calling sequence arguments include:
    >
    > no_file The file pointer is null.
    > bad_buffer One and only one of buffere and size s
    > zero.
    > bad_size Size is 0 or is greater than maxsize
    > bad_maxsize Maxsize is 0
    >
    > Then there are the memory allocation failures:
    >
    > bad_allocate Malloc or realloc failure
    > big_line Line length is greater than maxsize.
    >
    > When there is a memory allocation failure the line and len fields
    > hold what has been successfuly read.
    >

    This is a real problem. Too many return values. You can't get away from
    allocation failures unless you also redo your entire allocation strategy.
    You can't prevent IO errors, though in practise they are unlikely. Premature
    EOF is neither unlikely nor something you can avoid.
    >
    > As a final note, there is one minor decision left unsettled,
    > namely should the returned line include an EOL marker before the
    > string terminating 0.
    >

    Why not just extend fgets().
    You're passing a maximum size, which 99% of the time will be less than 1024
    bytes, so caller can easily declare it in his scope. He can malloc a massive
    buffer if expecting a massive line. The only thing we lose is that if there
    is huge but legitimate line of unknown length, caller has to allocate more
    memory.

    The problem of line length is easily solved, pass in a length pointer /
    return it. That leaves only the problem of what to do on overlong lines. The
    ideal thing is to take as much error handling code out of caller as
    possible.
    So caller might want to exit, he might want to print a message to stderr, he
    might want the function to refuse to return any data, or he might want to do
    what fgets() does and essentially ignore the problem.

    So

    /*
    fgetsx() - fgets extension.
    Params; buff - buffer to hold line
    maxlen - length of buffer
    flags - how to handle overlong lines and IO errors
    fp - pointer to an open file
    Returns: number of characters placed in buff, or errorcode
    */
    int fgetsx(char *buff, size_t maxlen, int flags, FILE *fp);

    So now we've just got to decide the flags.

    FGETS_EXIT - call exit on error
    FGETS_CODE - return an error code on error
    FGETS_STDERR - print a message to stderr on error
    FGETS_IO - report IO errors
    FGETS_OVERFLOW - report overflow errors
    FGETS_EOF - report premature EOF as an error

    FGETS_ALL - report all errors

    Return value -1 = premature EOF
    -2 - IO error
    -3 - buffer overflow error

    Typical call

    while(len = fgetsx(buff, 1024, FGETS_EXIT| FGETS_STDERR | FGETS_ALL, fp))
    {
    /* here we know that buff is a valid normal line. Else program quits with
    an error message */
    }

    while(ret = fgets(buff, 1024, FGETS_CODE | FGETS_ALL, fp) )
    {
    if(ret < 0)
    {
    switch(ret)
    {
    case EOF: /* premature EOF, handle */
    case FGETS_IOERROR: /* io error, handle */
    case FGETS_OVERFLOW: /* line too long, handle */
    }
    }
    }


    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Oct 12, 2007
    #14
  15. On Fri, 12 Oct 2007 21:06:10 +0100, "Malcolm McLean"
    <> wrote:

    >"Richard Harter" <> wrote in message
    >> The following is an exercise in thinking out a design.
    >> Comments and thoughts are welcome.
    >>

    >Firstly let me say I think this is a massively valuable exercise. Any
    >competent C programmer should be able to write an fgetline() without bugs,
    >the cghallenge is in developing an itnerface for it.


    Thanks.

    >>
    >> My vote is to have it all returned by the function which
    >> means we create a struct (record) that looks like:
    >>
    >> struct fgetline_info {
    >> char *line;
    >> size_t len;
    >> int status;
    >> };
    >>

    >Sorry hopeless.
    >Caller wants
    >while(line = fgetline() )
    >{
    > processline();
    >}
    >
    >He can't be bothered with fiddling rounf with the internals of structures.


    Point well taken, though I don't agree with "can't be bothered" -
    some will and some won't. There are a couple of lessons here
    though:

    (a) The usage should be consistent with common practice. It is
    very common in C to use loops like:
    while(line = foo()) {...}
    and
    while((len = foo()) >= 0) {...}

    so the interface should provide for it.

    (b) The desires of the "the can't be bothered" school of
    programming should be respected - at least to the extent of
    providing usage options that produce results that aren't too
    gross.

    >
    >> A second issue is that there is no limit to how large the line
    >> can be. A plausible way to provide a limit that is to pass one
    >> in as an argument. If we do, our prototype looks like this:
    >>
    >> struct fgetline_info fgetline(FILE * fptr, size_t maxsize);
    >>
    >> This version has a little problem - where did the storage for the
    >> line come from?
    >>

    >You're passing in a maximum size, so caller might as well make the line a
    >buffer in his scope, as in fgets().


    I disagree. A common usage is read all of the lines of a file
    and keep them in memory. If the buffers holding the lines are
    all maximum size then you get memory bloat. The purpose of
    maxsize is to serve as a sanity check.

    >
    >> In fgetline we have some initialization code that allocates
    >> buffer space when the buffer size is zero. Thereafter we use our
    >> little doubling trick.
    >>

    >Is this relevant? Apart for the problem of malicious, massive lines designed
    >to tie up resources, I don't think most people care whether the function can
    >be implemented efficiently or not.


    I care. I like library code to be mean, lean, and efficient. I
    don't like library code produces bloated resource usage.

    >>

    >[ clen, transient, dirty copies ]
    >> When there is a significant choice as to the kind of
    >> output being delivered the library routine should let
    >> the user make the choice rather than dictating the
    >> choice unless offering the choice unduly complicates
    >> usage.
    >>

    >And the choice does unduly complicate usage. Caller doesn't want to be
    >bothered with all this.


    This user does. I dare say that most users don't know, don't
    care, and don't understand that their code is bloated and uses an
    order of magnitude more resources than necessary. Often it
    doesn't matter - the cost of writing more efficient code can
    be greater than the savings.

    That said, programmers should (in my personal opinion) at least
    understand the costs of various choices. Part of programming is
    about understanding tradeoffs. I feel that one ought to know and
    appreciate that it is cheaper to reuse space rather than
    continually allocate and deallocate it from the heap.


    >>
    >> What sort of return values should be available in the status
    >> field? These are the successful ones that occur to me:
    >>
    >> end_of file The last read (if any) found an EOL
    >> marker. The current read finds an
    >> immediate EOF.
    >>
    >> no_increase Normal read - either buffer was null or
    >> else it was not increased.
    >>
    >> increase Normal read - the buffer size has been
    >> increased.
    >>
    >> abn_no_increase Abnormal read - an EOF was found without
    >> an EOL. Buffer was not increased.
    >>
    >> abn_increase Abnormal read - an EOF was found without
    >> an EOL. Buffer was increased.
    >>
    >> In addition there are numerous kinds of errors that can occur.
    >> Calling sequence arguments include:
    >>
    >> no_file The file pointer is null.
    >> bad_buffer One and only one of buffere and size s
    >> zero.
    >> bad_size Size is 0 or is greater than maxsize
    >> bad_maxsize Maxsize is 0
    >>
    >> Then there are the memory allocation failures:
    >>
    >> bad_allocate Malloc or realloc failure
    >> big_line Line length is greater than maxsize.
    >>
    >> When there is a memory allocation failure the line and len fields
    >> hold what has been successfuly read.
    >>

    >This is a real problem. Too many return values. You can't get away from
    >allocation failures unless you also redo your entire allocation strategy.
    >You can't prevent IO errors, though in practise they are unlikely. Premature
    >EOF is neither unlikely nor something you can avoid.


    These aren't return values - they are information flags, bits in
    the status word. At least that's the way I would do it. I tend
    to take the view that library code ought to have a way to inform
    the user of all the happenings of interest. Of course it
    shouldn't be shoved down the user's throat.


    >>
    >> As a final note, there is one minor decision left unsettled,
    >> namely should the returned line include an EOL marker before the
    >> string terminating 0.
    >>

    >Why not just extend fgets().
    >You're passing a maximum size, which 99% of the time will be less than 1024
    >bytes, so caller can easily declare it in his scope. He can malloc a massive
    >buffer if expecting a massive line. The only thing we lose is that if there
    >is huge but legitimate line of unknown length, caller has to allocate more
    >memory.


    That's one way to do things but I really don't like it in part
    for reasons already discussed, and because I don't trust coding
    decisions made on the " most of the time it won't be a problem"
    theory. More precisely my feeling is that code "down at the
    bottom" should be very solid.

    Having done the motherhood and apple pie commercial we now return
    to your regular station.

    >
    >The problem of line length is easily solved, pass in a length pointer /
    >return it. That leaves only the problem of what to do on overlong lines. The
    >ideal thing is to take as much error handling code out of caller as
    >possible.
    >So caller might want to exit, he might want to print a message to stderr, he
    >might want the function to refuse to return any data, or he might want to do
    >what fgets() does and essentially ignore the problem.
    >
    > So
    >
    >/*
    > fgetsx() - fgets extension.
    > Params; buff - buffer to hold line
    > maxlen - length of buffer
    > flags - how to handle overlong lines and IO errors
    > fp - pointer to an open file
    > Returns: number of characters placed in buff, or errorcode
    >*/
    >int fgetsx(char *buff, size_t maxlen, int flags, FILE *fp);
    >
    >So now we've just got to decide the flags.
    >
    >FGETS_EXIT - call exit on error
    >FGETS_CODE - return an error code on error
    >FGETS_STDERR - print a message to stderr on error
    >FGETS_IO - report IO errors
    >FGETS_OVERFLOW - report overflow errors
    >FGETS_EOF - report premature EOF as an error
    >
    >FGETS_ALL - report all errors


    This idea I like - pass flags to the routine that tell it what to
    do with errors.


    >
    >Return value -1 = premature EOF
    > -2 - IO error
    > -3 - buffer overflow error


    I dunno know about this one - using special values as flags is an
    ancient programming trick, but in my experience it is error prone
    and maintainance problems prone. The problems come from having
    two distinct meanings for a variable.
    >
    >Typical call
    >
    >while(len = fgetsx(buff, 1024, FGETS_EXIT| FGETS_STDERR | FGETS_ALL, fp))
    >{
    > /* here we know that buff is a valid normal line. Else program quits with
    > an error message */
    >}
    >
    >while(ret = fgets(buff, 1024, FGETS_CODE | FGETS_ALL, fp) )
    >{
    > if(ret < 0)
    > {
    > switch(ret)
    > {
    > case EOF: /* premature EOF, handle */
    > case FGETS_IOERROR: /* io error, handle */
    > case FGETS_OVERFLOW: /* line too long, handle */
    > }
    > }
    >}


    I assume that second call should be ret = fgetsx.

    Where is buff? If it is on the stack you're making transient
    copies; if it is allocated each time (and you have to do the
    allocation in the user code) you're making clean bloated copies.
    Notice that in one call you've got len in one call and ret in
    another.

    Thanks for the comments - you've given me things to think about.



    Richard Harter,
    http://home.tiac.net/~cri, http://www.varinoma.com
    But the rhetoric of holistic harmony can generate into a kind of
    dotty, Prince Charles-style mysticism. -- Richard Dawkins
     
    Richard Harter, Oct 13, 2007
    #15
  16. "Richard Harter" <> wrote in message
    > I disagree. A common usage is read all of the lines of a file
    > and keep them in memory. If the buffers holding the lines are
    > all maximum size then you get memory bloat. The purpose of
    > maxsize is to serve as a sanity check.
    >

    That's a good point.
    If we're going to support reading the whole file, efficiently, which is
    worth doing, it means we should also support reading in a file structure in
    only a few lines.

    while(1)
    {
    fg = fgetline(fptr, 0, 0, FG_MAXSIZE);
    if(fg.status != fg_normal)
    {
    if(fg.status == end_of_file)
    break;
    /* error handling code */
    }
    temp = realloc(lines, (N+1) * sizeof(char *));
    if(!temp)
    {
    /* error handling code */
    }
    lines = temp;
    lines[N] = fg.line;
    N++;
    }

    it's not an utter disaster, but it is too much.

    How about.

    while(line = fg_getline(fptr, 0, 0, FG_MAXSIZE))
    {
    temp = realloc(lines, (N+1) * sizeof(char *));
    if(!temp)
    {
    /* error-handline code here */
    }
    lines = temp;
    lines[N] = line;
    N++;
    }
    /* error handler if fg_getline encounters a problem */
    if(fg_error(fptr))
    {
    switch(fg_error(fptr))
    {
    }
    }

    Don't really like that either. We've got the while() actually testing the
    condition, and we've moved some but not all of the error handling code out
    of the loop, but it still isn't a boiler-plate idiom.

    The other alternative is to have two functions, one for getting a single
    line, one for reading a whole file.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Oct 14, 2007
    #16
  17. Richard Harter

    CBFalconer Guest

    Malcolm McLean wrote:
    > "Richard Harter" <> wrote in message
    >
    >> I disagree. A common usage is read all of the lines of a file
    >> and keep them in memory. If the buffers holding the lines are
    >> all maximum size then you get memory bloat. The purpose of
    >> maxsize is to serve as a sanity check.

    >
    > That's a good point. If we're going to support reading the whole
    > file, efficiently, which is worth doing, it means we should also
    > support reading in a file structure in only a few lines.
    >

    .... snip code example ...

    Here is another example, from the testing suite for ggets. Note
    the simplicity and safety of capturing the entire file in memory
    (assuming sufficient memory exists). No excess memory is used.
    ggets.zip is available at:

    <http://cbfalconer.home.att.net/download>

    /* File freverse.c - testing ggets */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "ggets.h"

    typedef struct node {
    struct node *next;
    char *ln;
    } node, *nodeptr;

    /* ======================= */
    /* reverse string in place */
    /* return length of string */
    size_t revstring(char *stg)
    {
    char *last, temp;
    size_t lgh;

    lgh = strlen(stg);
    if (lgh > 1) {
    last = stg + lgh; /* points to '\0' */
    while (--last > stg) {
    temp = *stg; *stg++ = *last; *last = temp;
    }
    }
    return lgh;
    } /* revstring */

    /* ======================= */
    /* reverse file */
    int main(void)
    {
    char *line;
    nodeptr root, temp;
    int lncount, chcount;

    fprintf(stderr, "Reversing stdin to stdout\n");
    root = NULL;
    lncount = chcount = 0;
    while (0 == ggets(&line)) {
    if (NULL == (temp = malloc(sizeof *temp))) {
    fprintf(stderr, "Out of memory\n");
    return EXIT_FAILURE;
    }
    temp->next = root;
    chcount += revstring(line);
    lncount++;
    temp->ln = line;
    root = temp;
    }

    /* file captured, and in reverse ordered list */

    while (root) {
    (void)puts(root->ln);
    temp = root;
    root = root->next;
    free(temp->ln);
    free(temp);
    }
    fprintf(stderr, "%d chars in %d lines\n", chcount, lncount);
    return 0;
    } /* main */
    /* END freverse.c - testing ggets */

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
     
    CBFalconer, Oct 14, 2007
    #17
  18. Richard Harter

    ¬a\\/b Guest

    In data Sun, 14 Oct 2007 11:23:16 -0400, CBFalconer scrisse:
    >/* File freverse.c - testing ggets */
    >
    >#include <stdio.h>
    >#include <stdlib.h>
    >#include <string.h>
    >#include "ggets.h"
    >
    >typedef struct node {
    > struct node *next;
    > char *ln;
    >} node, *nodeptr;
    >
    >/* ======================= */
    >/* reverse string in place */
    >/* return length of string */
    >size_t revstring(char *stg)
    >{
    > char *last, temp;
    > size_t lgh;
    >
    > lgh = strlen(stg);
    > if (lgh > 1) {
    > last = stg + lgh; /* points to '\0' */
    > while (--last > stg) {
    > temp = *stg; *stg++ = *last; *last = temp;
    > }
    > }
    > return lgh;
    >} /* revstring */
    >
    >/* ======================= */
    >/* reverse file */
    >int main(void)
    >{
    > char *line;


    so it is ok even if here line!=0 ?

    > nodeptr root, temp;
    > int lncount, chcount;
    >
    > fprintf(stderr, "Reversing stdin to stdout\n");
    > root = NULL;
    > lncount = chcount = 0;
    > while (0 == ggets(&line)) {
    > if (NULL == (temp = malloc(sizeof *temp))) {
    > fprintf(stderr, "Out of memory\n");
    > return EXIT_FAILURE;
    > }
    > temp->next = root;
    > chcount += revstring(line);
    > lncount++;
    > temp->ln = line;
    > root = temp;
    > }
    >
    > /* file captured, and in reverse ordered list */
    >
    > while (root) {
    > (void)puts(root->ln);
    > temp = root;
    > root = root->next;
    > free(temp->ln);
    > free(temp);
    > }
    > fprintf(stderr, "%d chars in %d lines\n", chcount, lncount);
    > return 0;
    >} /* main */
    >/* END freverse.c - testing ggets */
    >
    >--
    > Chuck F (cbfalconer at maineline dot net)
    > Available for consulting/temporary embedded and systems.
    > <http://cbfalconer.home.att.net>
     
    ¬a\\/b, Oct 19, 2007
    #18
  19. On Sun, 14 Oct 2007 07:52:17 +0100, "Malcolm McLean"
    <> wrote:

    >
    >"Richard Harter" <> wrote in message
    >> I disagree. A common usage is read all of the lines of a file
    >> and keep them in memory. If the buffers holding the lines are
    >> all maximum size then you get memory bloat. The purpose of
    >> maxsize is to serve as a sanity check.
    >>

    >That's a good point.
    >If we're going to support reading the whole file, efficiently, which is
    >worth doing, it means we should also support reading in a file structure in
    >only a few lines.
    >
    >while(1)
    >{
    > fg = fgetline(fptr, 0, 0, FG_MAXSIZE);
    > if(fg.status != fg_normal)
    > {
    > if(fg.status == end_of_file)
    > break;
    > /* error handling code */
    > }
    > temp = realloc(lines, (N+1) * sizeof(char *));
    > if(!temp)
    > {
    > /* error handling code */
    > }
    > lines = temp;
    > lines[N] = fg.line;
    > N++;
    >}
    >
    >it's not an utter disaster, but it is too much.
    >
    >How about.
    >
    >while(line = fg_getline(fptr, 0, 0, FG_MAXSIZE))
    >{
    > temp = realloc(lines, (N+1) * sizeof(char *));
    > if(!temp)
    > {
    > /* error-handline code here */
    > }
    > lines = temp;
    > lines[N] = line;
    > N++;
    >}
    >/* error handler if fg_getline encounters a problem */
    >if(fg_error(fptr))
    >{
    > switch(fg_error(fptr))
    > {
    > }
    >}
    >
    >Don't really like that either. We've got the while() actually testing the
    >condition, and we've moved some but not all of the error handling code out
    >of the loop, but it still isn't a boiler-plate idiom.
    >
    >The other alternative is to have two functions, one for getting a single
    >line, one for reading a whole file.


    To be honest, I don't much like either alternative. However that
    may merely be a case of I like my wheel better. I have come up
    with an alternative that I like, but I haven't gone through the
    work of writing it all up. The essence is to have three
    routines, an open, a read, and a close, along with an include
    file, and an opaque struct to hold state. The usage looks
    something like this:

    #include "rdfline.h"
    ....
    struct rdf_info *rdf; /* Opaque, allocated by fg_open */
    FILE * fptr;
    char * line;
    size_t len;
    int status;

    ...

    if (!rdf_open(rdf,fptr,&line,&len,&status,maxsize,flags)) {
    /* Can't open - probably fopen failed */
    /* Do something about problem and goaway */
    }
    ...
    while (rdf_read(rdf)) {
    /* Optional error test code */
    /* Do stuff with line */
    }
    /* Optional error test code */
    rdf_close(rdf);

    The flags variable uses various bits to set flags that control
    things like error handling, memory management, and whether to
    produce clean or transient copies. Passing in a 0 uses the
    defaults which are:

    (a) The package does nothing about errors (writing to a log file,
    etc) except set an error bit in the status word.

    (b) A clean copy is produced; the user is responsible for freeing
    it.

    (c) rdf_read reads as much as it can and doesn't worry about a
    premature EOF or memory failure - set a flag bit if you want it
    to care.

    More later.




    Richard Harter,
    http://home.tiac.net/~cri, http://www.varinoma.com
    In the fields of Hell where the grass grows high
    Are the graves of dreams allowed to die
     
    Richard Harter, Oct 19, 2007
    #19
  20. "CBFalconer" <> wrote in message
    > Here is another example, from the testing suite for ggets. Note
    > the simplicity and safety of capturing the entire file in memory
    > (assuming sufficient memory exists). No excess memory is used.
    > ggets.zip is available at:
    >

    The ggets() interface isn't too bad at all. I didn't think so before this
    thread, but looking at the alternatives, its deficiencies are quite minor.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Oct 20, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jay Eckles
    Replies:
    1
    Views:
    1,673
  2. Replies:
    0
    Views:
    348
  3. Jeff Rodriguez
    Replies:
    4
    Views:
    447
    Jeff Rodriguez
    Nov 17, 2003
  4. Antoninus Twink

    Bug/Gross InEfficiency in HeathField's fgetline program

    Antoninus Twink, Oct 7, 2007, in forum: C Programming
    Replies:
    436
    Views:
    6,308
    user923005
    Nov 13, 2007
  5. Dik T. Winter
    Replies:
    2
    Views:
    372
    Charlie Gordon
    Nov 7, 2007
Loading...

Share This Page