Reading lines from a text file

Discussion in 'C Programming' started by Mark Hobley, Mar 22, 2010.

  1. Mark Hobley

    Mark Hobley Guest

    I want to read a text file a line at a time from within a C program. Are there
    some available functions or code already written that does this or do I need
    to code from scratch?

    If I am doing this from scratch, what is the best practise for allocating
    a buffer size for the input line?

    I guess open the file, scan once to determine the buffer size, then rewind and
    start reading. Has this already been done or do I need to code this from
    scratch?

    (My project is open source, so I can utilize GPL licenced code, if necessary.)

    C89 compatible code is preferred.

    Mark.

    --
    Mark Hobley
    Linux User: #370818 http://markhobley.yi.org/
    Mark Hobley, Mar 22, 2010
    #1
  2. On 2010-03-22, Mark Hobley <> wrote:
    > I want to read a text file a line at a time from within a C program. Are there
    > some available functions or code already written that does this or do I need
    > to code from scratch?
    >
    > If I am doing this from scratch, what is the best practise for allocating
    > a buffer size for the input line?
    >
    > I guess open the file, scan once to determine the buffer size, then rewind and
    > start reading. Has this already been done or do I need to code this from
    > scratch?
    >
    > (My project is open source, so I can utilize GPL licenced code, if necessary.)
    >


    Well, if you know how big your lines are, or know a reasonable
    maximum, you can just use:

    char buffer[1024];
    fgets(buffer, sizeof buffer, file);
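
As a sketch of how that fixed-size approach might look in full: the helper below (read_fixed_line is a made-up name, not a library function) strips the trailing newline and reports lines that did not fit. It assumes that throwing away the rest of an over-long line is acceptable for the application.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (size bytes) and strips
 * the trailing '\n'.
 * Returns 1 for a complete line, 0 at end of file (or on a read error),
 * and -1 if the line did not fit; in that case buf holds the start of
 * the line and the rest of it is read and discarded. */
int read_fixed_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int c;

    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';            /* complete line */
        return 1;
    }
    if (len + 1 < size)
        return 1;                       /* last line of file, no '\n' */

    /* The buffer is full and no newline was stored. Look at the next
     * character to tell an exact fit from a truncated line. */
    c = getc(fp);
    if (c == '\n' || c == EOF)
        return 1;                       /* the line fit exactly */
    while (c != '\n' && c != EOF)
        c = getc(fp);                   /* skip the rest of the line */
    return -1;
}
```

Called in a loop with a buffer such as char buffer[1024], a return value of -1 gives the caller the chance to report the over-long line instead of silently treating its tail as a separate line.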

    > C89 compatible code is preferred.
    >


    Otherwise, Chuck Falconer has a function called ggets() on his
    website that handles memory allocation and all that. I don't
    remember the link, but Google will find it.

    Richard Heathfield also has such a beast, according to the
    comments in Chuck's code. Given that Richard is still around
    and Chuck is not, you maybe will be better off with that.

    In either case, they're very easy functions to use.

    --
    Andrew Poelstra
    http://www.wpsoftware.net/andrew
    Andrew Poelstra, Mar 23, 2010
    #2
  3. (Mark Hobley) writes:

    > I want to read a text file a line at a time from within a C program. Are there
    > some available functions or code already written that does this or do I need
    > to code from scratch?

    <snip>
    > (My project is open source, so I can utilize GPL licenced code, if necessary.)


    glibc (the GNU C library) includes getline(), which POSIX.1-2008 also
    specifies. If you can't link against glibc you might be able to use the
    source (though extracting parts of the library might be fiddly).
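
A minimal sketch of the getline() route, assuming a POSIX.1-2008 system (so this is not C89). longest_line is a hypothetical helper name; getline() itself allocates and grows the buffer, and the caller frees it once at the end.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: returns the length of the longest line in fp,
 * not counting the '\n', or -1 if a read error occurred. */
long longest_line(FILE *fp)
{
    char   *line = NULL;          /* getline() allocates and grows this */
    size_t  cap  = 0;
    ssize_t n;
    long    longest = 0;

    while ((n = getline(&line, &cap, fp)) != -1) {
        if (n > 0 && line[n - 1] == '\n')
            n--;                  /* do not count the newline */
        if ((long)n > longest)
            longest = (long)n;
    }
    free(line);                   /* the buffer belongs to the caller */
    return ferror(fp) ? -1 : longest;
}
```

The same buffer is reused for every line, so a line that has to be kept must be copied before the next call.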

    <snip>
    --
    Ben.
    Ben Bacarisse, Mar 23, 2010
    #3
  4. Seebs

    Seebs Guest

    On 2010-03-22, Mark Hobley <> wrote:
    > I want to read a text file a line at a time from within a C program. Are there
    > some available functions or code already written that does this or do I need
    > to code from scratch?


    There are some.

    > If I am doing this from scratch, what is the best practise for allocating
    > a buffer size for the input line?


    Good question!

    > I guess open the file, scan once to determine the buffer size, then rewind and
    > start reading. Has this already been done or do I need to code this from
    > scratch?


    That's a very expensive way to do it. Reading is usually much more expensive
    than, say, copying in memory. If you can make reasonable guesses about buffer
    sizes, you should be able to do pretty well.

    Have a look at fgets(), which gets a string of definitely no more than a
    particular length. If a line is too long for it, you can call fgets()
    again to get more of the line.

    Do you need to keep multiple lines in memory, or do you just need to look
    at each one? A typical strategy I'll use for "look at each item in turn"
    is basically this:
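    size_t line_len = 256;
    char *line_data = malloc(line_len);

    while (line_data && fgets(line_data, (int)line_len, stdin)) {
        size_t this_line_len = strlen(line_data);

        /* No '\n' at the end: the buffer filled up (or we hit EOF). */
        while (this_line_len > 0 && line_data[this_line_len - 1] != '\n') {
            char *s = realloc(line_data, line_len * 2);
            if (s == NULL)
                break;              /* out of memory: keep the partial line */
            line_data = s;
            line_len *= 2;
            /* Continue at the terminating '\0', not at the old buffer end. */
            if (!fgets(line_data + this_line_len,
                       (int)(line_len - this_line_len), stdin))
                break;              /* EOF: the last line had no '\n' */
            this_line_len += strlen(line_data + this_line_len);
        }
        /* line_data now holds one line; process it here. */
    }
    free(line_data);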

    This omits quite a bit of error checking, but the basic idea is, you
    pick a buffer size, and use it, and if it's not big enough, you increase
    the buffer size, reallocate, then keep using that larger buffer. In
    most cases, you'll probably never even reallocate once.

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Mar 23, 2010
    #4
  5. Mark Hobley <> wrote:
    > I want to read a text file a line at a time from within a C program. Are there
    > some available functions or code already written that does this or do I need
    > to code from scratch?


    > If I am doing this from scratch, what is the best practise for allocating
    > a buffer size for the input line?


    The simplest method is to start with a guess for the length of the
    longest line and allocate that much. Now you use fgets() to read in
    a line and check if it ends in a '\n' - if it does, everything is
    ok, but if it doesn't, the line was too long to fit into the buffer
    you started off with. In that case you increase the size of the
    buffer, e.g. by doubling its size, using realloc(), and try to
    read the rest of the line by calling fgets() again (but with the
    first argument pointing into the buffer where the last try stopped).
    Then repeat the test for the final '\n' and keep increasing the
    buffer size if necessary. If you don't run out of memory you end
    up with a buffer that contains the complete line.

    The only special case you may have to consider is that the last
    line of a file may not end with a '\n' and then, of course, also
    what fgets() reads in can't contain that character - but if you
    try to read at the very end fgets() will return NULL, so it's
    possible to check for that condition.

    > I guess open the file, scan once to determine the buffer size, then rewind
    > and start reading.


    I guess reading the file twice just to find out the length of the
    longest line is too much work.
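
For comparison, the two-pass idea from the question could be sketched like this. max_line_buffer and read_all_lines are made-up names, and the sketch assumes a seekable stream (a regular file, not a pipe or a console) that nobody modifies between the two passes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Pass 1: scan the stream and return the buffer size needed for its
 * longest line, counting the '\n' and the terminating '\0'.
 * The stream is rewound afterwards, so it must be seekable. */
size_t max_line_buffer(FILE *fp)
{
    size_t cur = 0, longest = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        cur++;
        if (c == '\n') {
            if (cur > longest)
                longest = cur;
            cur = 0;
        }
    }
    if (cur > longest)
        longest = cur;              /* last line without '\n' */
    rewind(fp);
    return longest + 1;             /* room for the '\0' */
}

/* Pass 2: read every line with a buffer of exactly that size.
 * Returns the number of lines read, or -1 if malloc() fails. */
long read_all_lines(FILE *fp)
{
    size_t size = max_line_buffer(fp);
    char *buf = malloc(size);
    long count = 0;

    if (buf == NULL)
        return -1;
    while (fgets(buf, (int)size, fp) != NULL)
        count++;                    /* buf holds one complete line here */
    free(buf);
    return count;
}
```

Every byte is read twice, which is the cost being objected to here; the grow-as-you-go approach reads each byte once.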

    > Has this already been done or do I need to code this from
    > scratch?


    Probably everyone faced with the problem of reading lines of
    arbitrary length will have written such a function at least once ;-)
    Here's something I found looking through my files (although with
    quite a number of changes to the original, so be wary, I may have
    broken it!):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define LEN_GUESS 128

    int
    read_line( FILE  * fp,
               char ** line )
    {
        static char   *buf = NULL;
        static size_t  buf_len = LEN_GUESS;
        size_t         used = 0;        /* characters stored so far */

        if ( ! fp || ! line )
            return -1;                  /* bad argument(s) */

        if ( ! buf && ! ( buf = malloc( buf_len ) ) )
            return -1;                  /* running out of memory */
        *buf = '\0';

        while ( 1 )
        {
            char *tmp;

            if ( ! fgets( buf + used, ( int ) ( buf_len - used ), fp ) )
            {
                if ( ferror( fp ) )
                    return -1;          /* read failure */
                break;
            }

            used += strlen( buf + used );

            if ( used > 0 && buf[ used - 1 ] == '\n' )
                break;

            if ( ! ( tmp = realloc( buf, 2 * buf_len ) ) )
                return -1;              /* running out of memory */

            /* realloc() may move the block, so always index from buf */
            buf = tmp;
            buf_len *= 2;
        }

        *line = buf;
        return feof( fp ) ? 1 : 0;  /* indicate if EOF has been reached */
    }

    Note that it's, of course, not thread-safe. And when you call it
    again the last line returned will be overwritten. When you don't
    need to call the function anymore you should free() the returned
    pointer.

    > (My project is open source, so I can utilize GPL licenced code, if
    > necessary.) C89 compatible code is preferred.


    Use it for whatever you want if it fits your needs (but better
    check carefully that it works, it's not my tested version, I
    just checked that it compiles!) And, of course, there are quite
    a number of ways it could be improved, it's more meant for giving
    you a better idea of how it could be done.

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
    Jens Thoms Toerring, Mar 23, 2010
    #5
  6. Andrew Poelstra <> writes:
    [...]
    > Otherwise, Chuck Falconer has a function called ggets() on his
    > website that handles memory allocation and all that. I don't
    > remember the link, but Google will find it.
    >
    > Richard Heathfield also has such a beast, according to the
    > comments in Chuck's code. Given that Richard is still around
    > and Chuck is not, you maybe will be better off with that.


    "still around" meaning that Richard still posts here in comp.lang.c;
    Chuck used to, but hasn't lately.

    > In either case, they're very easy functions to use.


    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 24, 2010
    #6
  7. "bartc" <> writes:

    > "Mark Hobley" <> wrote in message
    > news:...
    >>I want to read a text file a line at a time from within a C program. Are
    >>there
    >> some available functions or code already written that does this or do I
    >> need
    >> to code from scratch?
    >>
    >> If I am doing this from scratch, what is the best practise for allocating
    >> a buffer size for the input line?

    >
    > I just use a fixed size, big enough for text files that are line-oriented.
    >
    > I've just checked and I'm using a 2KB buffer, but it could be much higher if
    > memory allows.
    >
    > If the lines are longer than that sort of size, the file probably isn't
    > line-oriented and could do with a different approach. (Or might use a
    > different newline convention from that expected. Either way, you have a file
    > that is not in the right format.)


    I have two CSV files I'm using at the moment whose longest lines have
    2201 and 2306 bytes and one old one with a 10155 byte line. It's hard
    to put an upper limit on what is reasonable. Today's absurd is
    tomorrow's "pah!".

    <snip>
    --
    Ben.
    Ben Bacarisse, Mar 24, 2010
    #7
  8. James Harris

    James Harris Guest

    On 22 Mar, 22:54, (Mark Hobley)
    wrote:
    > I want to read a text file a line at a time from within a C program. Are there
    > some available functions or code already written that does this or do I need
    > to code from scratch?


    Yes, I wrote a piece of code to do just that and incorporated in it
    helpful input from other people on comp.lang.c.

    http://codewiki.wikispaces.com/xbuf.c

    The section on reading lines shows what you are looking for and also
    why the code was needed, i.e. problems with other solutions.

    James
    James Harris, Mar 24, 2010
    #8
  9. On 23 Mar, 23:47, "bartc" <> wrote:
    > "Mark Hobley" <> wrote in message
    >
    > news:...
    >
    > >I want to read a text file a line at a time from within a C program. Are
    > >there
    > > some available functions or code already written that does this or do I
    > > need
    > > to code from scratch?

    >
    > > If I am doing this from scratch, what is the best practise for allocating
    > > a buffer size for the input line?

    >
    > I just use a fixed size, big enough for text files that are line-oriented.
    >
    > I've just checked and I'm using a 2KB buffer, but it could be much higher if
    > memory allows.
    >
    > If the lines are longer than that sort of size, the file probably isn't
    > line-oriented and could do with a different approach. (Or might use a
    > different newline convention from that expected. Either way, you have a file
    > that is not in the right format.)


    and what does your program do?


    > > I guess open the file, scan once to determine the buffer size, then rewind
    > > and
    > > start reading. Has this already been done or do I need to code this from
    > > scratch?

    >
    > From files that might work (although pedants might say that by the second
    > read, someone could have written a longer line to the file). From devices
    > such as consoles I'm not sure that would work.
    >
    > --
    > Bartc
    >
    > --- news://freenews.netfront.net/ - complaints: ---
    Nick Keighley, Mar 24, 2010
    #9
  10. "bartc" <> writes:
    [big snip]
    > What seems wrong is to let the input file dictate to you some ridiculous
    > 'line length' of perhaps a billion characters, and to go along with that.


    What seems wrong to me is to let limitations in the program impose
    some arbitrary limit on line length, when the input format you're
    trying to process imposes no such limit.

    If a file format specifies a maximum line length, then by all means go
    with that (and ideally report an error for any line that exceeds the
    limit, unless the format specification says that characters past the
    maximum are quietly ignored). If it doesn't, then handling
    arbitrarily long lines is better than imposing *any* limit other than
    what's imposed by available memory.

    And if the file format doesn't impose a maximum length but you're
    unwilling to handle very long lines, IMHO you should at least report
    an internal error if you see a line longer than you can handle.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 24, 2010
    #10
  11. "bartc" <> writes:

    > "Ben Bacarisse" <> wrote in message
    > news:...
    >> "bartc" <> writes:
    >>
    >>> "Mark Hobley" <> wrote in message
    >>> news:...
    >>>>I want to read a text file a line at a time from within a C program. Are

    >
    >>> I've just checked and I'm using a 2KB buffer, but it could be much
    >>> higher if
    >>> memory allows.
    >>>
    >>> If the lines are longer than that sort of size, the file probably isn't
    >>> line-oriented and could do with a different approach.

    >
    >> I have two CSV files I'm using at the moment whose longest lines have
    >> 2201 and 2306 bytes and one old one with a 10155 byte line. It's hard
    >> to put an upper limit on what is reasonable. Today's absurd is
    >> tomorrow's "pah!".

    >
    > The text file format is being abused then. This sounds like an export
    > from a database or spreadsheet. It's not text, unless you're used to
    > reading pages 60 feet wide.


    The structure is line-oriented. It should be read in text mode and a
    line ends when you see '\n'. I call that a text file.

    > If you already have code for a flexible getline(), then just use
    > it. Otherwise the next step up from a hard-coded size is a one-time
    > allocated buffer which remains the same size. Bung 20KB (or 200KB) in
    > there, and have done with it.


    These solutions work, of course. I was just disputing the fact that
    there is some maximum line length beyond which something stops being a
    text file.

    <snip>
    --
    Ben.
    Ben Bacarisse, Mar 24, 2010
    #11
  12. bartc

    bartc Guest

    "Keith Thompson" <> wrote in message
    news:...
    > "bartc" <> writes:
    > [big snip]
    >> What seems wrong is to let the input file dictate to you some ridiculous
    >> 'line length' of perhaps a billion characters, and to go along with that.

    >
    > What seems wrong to me is to let limitations in the program impose
    > some arbitrary limit on line length, when the input format you're
    > trying to process imposes no such limit.


    OK, but then be prepared for your getline() function to actually need to be
    a getfile() function with some input, and to potentially grab most of the
    memory in your system, or even to bring down the program (if a giant file
    uses the wrong newline format for example).

    --
    Bartc
    bartc, Mar 24, 2010
    #12
  13. Moi

    Moi Guest

    On Mon, 22 Mar 2010 22:54:42 +0000, Mark Hobley wrote:

    > I want to read a text file a line at a time from within a C program. Are
    > there some available functions or code already written that does this or
    > do I need to code from scratch?
    >
    > If I am doing this from scratch, what is the best practise for
    > allocating a buffer size for the input line?
    >
    > I guess open the file, scan once to determine the buffer size, then
    > rewind and start reading. Has this already been done or do I need to
    > code this from scratch?
    >
    > (My project is open source, so I can utilize GPL licenced code, if
    > necessary.)
    >
    > C89 compatible code is preferred.
    >
    > Mark.


    No need for limits.

    1) Read the entire file into one buffer using fread, realloc() when needed.
    2) Make a second pass on the buffer, find the line endings , handle \r\n,
    replace them by \0, save the beginnings of the lines in an array of
    pointers, realloc()ing when needed,
    3) Make a third pass: process each line , searching for commas, replacing
    them by \0, saving pointers to the beginnings, realloc()ing when needed.

    Step 2 and 3 need to take care of quoting / escaping.
    Step 1,2,3 _can_ be combined into one state machine.
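
Steps 1 and 2 might be sketched as below. slurp and split_lines are hypothetical names, the initial sizes are arbitrary, and the comma splitting and quote handling of step 3 are left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: read the whole stream into one malloc()ed, '\0'-terminated
 * buffer, doubling it with realloc() as needed. Returns NULL on failure. */
char *slurp(FILE *fp, size_t *len_out)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap), *tmp;

    if (buf == NULL)
        return NULL;
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len + 1 == cap) {           /* full: grow before the next read */
            if ((tmp = realloc(buf, cap * 2)) == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}

/* Step 2: overwrite each "\n" or "\r\n" in buf with '\0' and collect
 * pointers to the line starts. Returns a malloc()ed array of *count
 * pointers into buf, or NULL on failure. */
char **split_lines(char *buf, size_t len, size_t *count)
{
    size_t cap = 64, n = 0;
    char **lines = malloc(cap * sizeof *lines), **tmp;
    char *p = buf, *end = buf + len, *nl;

    if (lines == NULL)
        return NULL;
    while (p < end) {
        if (n == cap) {
            if ((tmp = realloc(lines, cap * 2 * sizeof *lines)) == NULL) {
                free(lines);
                return NULL;
            }
            lines = tmp;
            cap *= 2;
        }
        lines[n++] = p;
        nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)
            break;                      /* last line has no '\n' */
        if (nl > p && nl[-1] == '\r')
            nl[-1] = '\0';              /* strip the '\r' of "\r\n" */
        *nl = '\0';
        p = nl + 1;
    }
    *count = n;
    return lines;
}
```

The line pointers refer into the buffer returned by slurp, so that buffer has to stay allocated for as long as the lines are in use.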



    HTH,
    AvK
    Moi, Mar 24, 2010
    #13
  14. "bartc" <> writes:
    > "Keith Thompson" <> wrote in message
    > news:...
    >> "bartc" <> writes:
    >> [big snip]
    >>> What seems wrong is to let the input file dictate to you some ridiculous
    >>> 'line length' of perhaps a billion characters, and to go along with that.

    >>
    >> What seems wrong to me is to let limitations in the program impose
    >> some arbitrary limit on line length, when the input format you're
    >> trying to process imposes no such limit.

    >
    > OK, but then be prepared for your getline() function to actually need
    > to be a getfile() function with some input, and to potentially grab
    > most of the memory in your system, or even to bring down the program
    > (if a giant file uses the wrong newline format for example).


    Or don't read an entire line into memory at a time. For example,
    if you're reading an XML file -- well, you should be using an
    XML parser that somebody else has already written. But if you're
    writing an XML parser for some reason, it might make more sense to
    read and store input until you see a '<' or '>' rather than '\n'.
    I've seen XML files with extremely long lines, but not with extremely
    long tag names.

    But yes, sometimes it does make sense to read entire lines into memory
    at once, even if they might be inordinately long.
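
A rough sketch of that idea: read up to the next delimiter rather than the next '\n', so the memory needed depends on the distance between delimiters and not on line length. read_until is a made-up helper, and the delimiter set "<>" mentioned below is only the example from this post, not a real XML tokenizer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads characters from fp up to and including the
 * first character found in delims (or up to EOF) into a malloc()ed,
 * '\0'-terminated string. Returns NULL at EOF with nothing read, or if
 * memory runs out. The caller free()s the result. */
char *read_until(FILE *fp, const char *delims)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap), *tmp;
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (len + 2 > cap) {            /* keep room for c and the '\0' */
            if ((tmp = realloc(buf, cap * 2)) == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
        if (c != '\0' && strchr(delims, c) != NULL)
            break;                      /* a delimiter ends the chunk */
    }
    if (len == 0) {                     /* EOF before any character */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling read_until(fp, "<>") repeatedly yields chunks that each end in '<' or '>' (the final chunk may end at EOF instead), and newlines are just ordinary characters inside a chunk.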

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 25, 2010
    #14
  15. bartc

    bartc Guest

    "Keith Thompson" <> wrote in message
    news:...
    > "bartc" <> writes:
    >> "Keith Thompson" <> wrote in message
    >> news:...
    >>> "bartc" <> writes:


    >>> What seems wrong to me is to let limitations in the program impose
    >>> some arbitrary limit on line length, when the input format you're
    >>> trying to process imposes no such limit.

    >>
    >> OK, but then be prepared for your getline() function to actually need
    >> to be a getfile() function with some input


    > Or don't read an entire line into memory at a time. For example,
    > if you're reading an XML file -- well, you should be using an
    > XML parser that somebody else has already written. But if you're
    > writing an XML parser for some reason, it might make more sense to
    > read and store input until you see a '<' or '>' rather than '\n'.
    > I've seen XML files with extremely long lines, but not with extremely
    > long tag names.


    I think XML is one of those text formats (like C source files and HTML),
    which are not really line-oriented; newline is just another whitespace
    character.

    In that case, if you don't use a dedicated file reader as you've suggested,
    you can't really use simple line-input.

    --
    Bartc
    bartc, Mar 25, 2010
    #15
  16. "bartc" <> writes:
    > "Keith Thompson" <> wrote in message
    > news:...
    >> "bartc" <> writes:
    >>> "Keith Thompson" <> wrote in message
    >>> news:...
    >>>> "bartc" <> writes:
    >>>> What seems wrong to me is to let limitations in the program impose
    >>>> some arbitrary limit on line length, when the input format you're
    >>>> trying to process imposes no such limit.
    >>>
    >>> OK, but then be prepared for your getline() function to actually need
    >>> to be a getfile() function with some input

    >
    >> Or don't read an entire line into memory at a time. For example,
    >> if you're reading an XML file -- well, you should be using an
    >> XML parser that somebody else has already written. But if you're
    >> writing an XML parser for some reason, it might make more sense to
    >> read and store input until you see a '<' or '>' rather than '\n'.
    >> I've seen XML files with extremely long lines, but not with extremely
    >> long tag names.

    >
    > I think XML is one of those text formats (like C source files and
    > HTML), which are not really line-oriented; newline is just another
    > whitespace character.


    Quibble: C preprocessor directives are line-oriented. And a C
    compiler is allowed to impose a maximum line length on source files.

    > In that case, if you don't use a dedicated file reader as you've
    > suggested, you can't really use simple line-input.


    Sure you can, as long as your simple line-input can handle arbitrarily
    long lines (and you have enough memory to store them). Admittedly
    it might not be the ideal solution.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 25, 2010
    #16