getline problem

Discussion in 'C Programming' started by Gand Alf, Feb 10, 2011.

  1. Gand Alf

    Gand Alf Guest

    Hello

    I have been writing a getline-type function. It should read an arbitrary
    length line from a file and return it in a buffer.

    Here is my code


    #include<stdio.h>
    #include<stdlib.h>
    #include<string.h>
    void
    GL (char **buf, size_t * sz, FILE * fp)
    {
    size_t off = 0;
    char *new;
    for(;;)
    {
    if(*buf)
    {
    fgets (*buf + off, *sz - off, fp);
    if(strchr (*buf + off, '\n'))
    break;
    off += *sz - 1;
    }
    else
    *sz = 1;
    new = realloc (*buf, *sz <<= 1);
    if(!new)
    {
    free (*buf);
    *buf = *sz = 0;
    return;
    }
    *buf = new;
    }
    }

    int
    main (void)
    {
    char *p = 0;
    size_t sz = 0;
    while(!feof (stdin))
    {
    GL (&p, &sz, stdin);
    printf ("%d\n", strlen (p));
    }
    }


    However when I run this with the following test file as input:
    ==begin input==
    hello
    world
    everyone

    bye
    ==end input==
    the results are as follows:

    3
    6
    9
    1
    4
    4

    instead I would expect to get:
    5
    5
    8
    0
    3

    Can anyone see what the problem is?

    Thanks
    Gand Alf, Feb 10, 2011
    #1
    1. Advertising

  2. Gand Alf

    James Waldby Guest

    On Thu, 10 Feb 2011 19:48:18 +0000, Gand Alf wrote:
    > I have been writing a getline-type function. It should read an
    > arbitrary length line from a file and return it in a buffer.

    [snip #includes]
    > void
    > GL (char **buf, size_t * sz, FILE * fp) {
    > size_t off = 0;
    > char *new;
    > for(;;)

    What, you can't spell 'while' ?

    > {
    > if(*buf)
    > {
    > fgets (*buf + off, *sz - off, fp);
    > if(strchr (*buf + off, '\n'))
    > break;
    > off += *sz - 1;

    ^ You have an extra character there.
    > }

    [snip realloc & most of main]
    Note, your main starts off properly enough with
    "int main (void)", but at the end has no "return 0".

    > However when I run this with the following test file as input: ==begin
    > input==
    > hello
    > world
    > everyone
    >
    > bye
    > ==end input==
    > the results are as follows:
    >
    > 3
    > 6
    > 9
    > 1
    > 4
    > 4
    >
    > instead I would expect to get:
    > 5
    > 5
    > 8
    > 0
    > 3
    >
    > Can anyone see what the problem is?


    (a) Wrong expectation -- Per man fgets, "If a newline is read,
    it is stored into the buffer", and (eg) strlen("world\n") is 6.

    (b) Mishandled EOF test -- Again per man, "fgets() returns ...
    NULL ... when end of file occurs while no characters have been
    read", but you've discarded the value of fgets(), so can't put
    a \0 at **buf, etc

    --
    jiw
    James Waldby, Feb 10, 2011
    #2
    1. Advertising

  3. pete <> writes:
    > James Waldby wrote:
    >>
    >> On Thu, 10 Feb 2011 19:48:18 +0000, Gand Alf wrote:

    >
    >> > for(;;)

    >> What, you can't spell 'while' ?

    >
    > for(;;) is the special construct for infinite loops.


    It's *a* construct for infinite loops.

    > On some compilers, while(1) generates a warning.


    while (1) and for (;;) are equally valid. Compilers can warn about
    anything they like.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Feb 10, 2011
    #3
  4. In article <ij1jrl$i7p$-september.org>,
    James Waldby <> wrote:

    > On Thu, 10 Feb 2011 19:48:18 +0000, Gand Alf wrote:


    [...]

    > > for(;;)

    > What, you can't spell 'while' ?


    I forgot around the 999th time I changed
    `while(1)' to `for(<initialize>; <test>; <iterate>)'.

    --
    Michael Press
    Michael Press, Feb 11, 2011
    #4
  5. Gand Alf

    Myth__Buster Guest

    @ OP

    <problematic snippet>

    if(strchr (*buf + off, '\n'))
    break;

    </problematic snippet>

    Well, there is indeed an extra character being considered as part of
    the string read before the control passes back to main. So, for this
    we would just replace it with NUL providing the required NUL-
    terminated C string back to main.

    <corrected snippet>

    char *ptrToNewLineChar = strchr (*buf + off, '\n');

    if( ptrToNewLineChar != NULL )
    {
    *ptrToNewLineChar = '\0';
    break;
    }

    </corrected snippet>

    In addition, if the specified number of characters in fgets are not
    read in then, there would be a need to update the offset(variable -
    off) appropriately. I would suggest to consider the strlen( ) to help
    in this regard as the the read buffer(variable - buf) would be a C
    string, to count the exact number of characters read correctly once we
    chop off the newline character from the read buffer as shown above.
    The corresponding implementation is below.

    <problematic snippet>

    off += *sz - 1;

    </problematic snippet>

    <corrected snippet>

    off += strlen(*buf + off);

    </corrected snippet>


    Cheers.
    Myth__Buster, Feb 11, 2011
    #5
  6. Gand Alf

    James Harris Guest

    On Feb 10, 7:48 pm, Gand Alf <> wrote:
    > Hello
    >
    > I have been writing a getline-type function. It should read an arbitrary
    > length line from a file and return it in a buffer.


    ....

    > Can anyone see what the problem is?


    Just a FYI. fgets is ok for limited use but it has some issues which
    make it not suitable unless you can guarantee the input to it. See

    http://codewiki.wikispaces.com/xbuf.c

    and look at the section What's wrong with fgets?

    James
    James Harris, Feb 11, 2011
    #6
  7. On Feb 11, 8:03 pm, James Harris <>
    wrote:
    >
    >
    > and look at the section What's wrong with fgets?
    >

    This doesn't mention the real problem, which is that fgets() is just
    too difficult to use correctly if a program must process all input
    with absolutely no errors.

    You have to to call strchr() to check for the newline. If it is
    absent, a partial read has occurred. So you need to take action to
    ensure that the characters remaining in the stream are not treated as
    a whole line. All very fiddly.
    However in most applications you can just assume that a partial read
    will generate a parse error on the next line.
    Malcolm McLean, Feb 13, 2011
    #7
  8. Gand Alf

    James Harris Guest

    On Feb 13, 12:25 pm, Malcolm McLean <>
    wrote:
    > On Feb 11, 8:03 pm, James Harris <>
    > wrote:
    >
    > > and look at the section What's wrong with fgets?

    >
    > This doesn't mention the real problem, which is that fgets() is just
    > too difficult to use correctly if a program must process all input
    > with absolutely no errors.
    >
    > You have to to call strchr() to check for the newline. If it is
    > absent, a partial read has occurred. So you need to take action to
    > ensure that the characters remaining in the stream are not treated as
    > a whole line. All very fiddly.
    > However in most applications you can just assume that a partial read
    > will generate a parse error on the next line.


    It's worse than I thought!

    Incidentally, it occurred to me that another, more modern way to read
    a line from a file may be to mmap the file and use memchr or similar
    to scan it. Maybe the fastest option? As a slight downside, the code
    might need to account for very large files that exceed address space.

    James
    James Harris, Feb 13, 2011
    #8
  9. Gand Alf

    Eric Sosman Guest

    [OT] Re: getline problem

    On 2/13/2011 2:56 PM, James Harris wrote:
    > [...]
    > Incidentally, it occurred to me that another, more modern way to read
    > a line from a file may be to mmap the file and use memchr or similar
    > to scan it. Maybe the fastest option? As a slight downside, the code
    > might need to account for very large files that exceed address space.


    <off-topic reason="requires extensions">

    That's one downside. Another is that the system's line-ending
    conventions won't be translated to '\n' for you; you'll need to
    interpret them yourself. Still another downside is that not all
    input sources are mmap-able: Try it on your keyboard, for instance,
    or on a pipe or socket.

    </off-topic>

    --
    Eric Sosman
    lid
    Eric Sosman, Feb 13, 2011
    #9
  10. Gand Alf

    James Harris Guest

    On Feb 13, 9:00 pm, Eric Sosman <> wrote:
    > On 2/13/2011 2:56 PM, James Harris wrote:
    >
    > > [...]
    > > Incidentally, it occurred to me that another, more modern way to read
    > > a line from a file may be to mmap the file and use memchr or similar
    > > to scan it. Maybe the fastest option? As a slight downside, the code
    > > might need to account for very large files that exceed address space.

    >
    > <off-topic reason="requires extensions">
    >
    >      That's one downside.  Another is that the system's line-ending
    > conventions won't be translated to '\n' for you; you'll need to
    > interpret them yourself.


    True.

    >  Still another downside is that not all
    > input sources are mmap-able: Try it on your keyboard, for instance,
    > or on a pipe or socket.


    Sure. I did say "file".

    James
    James Harris, Feb 13, 2011
    #10
  11. On Thu, 10 Feb 2011 19:48:18 +0000 (UTC), Gand Alf <>
    wrote:

    >Hello
    >
    >I have been writing a getline-type function. It should read an arbitrary
    >length line from a file and return it in a buffer.
    >
    >Here is my code
    >
    >
    >#include<stdio.h>
    >#include<stdlib.h>
    >#include<string.h>
    >void
    >GL (char **buf, size_t * sz, FILE * fp)


    1 - The first time you call GL, *buf is NULL and *sz is 0
    2 - The second time you call GL, *sz is 8 and buf points to an 8-byte
    area.
    3 - The third time you call GL is the same as the second.
    4 - You should work out the fourth and subsequent iterations for
    yourself.

    >{
    > size_t off = 0;
    > char *new;
    > for(;;)
    > {
    > if(*buf)
    > {
    > fgets (*buf + off, *sz - off, fp);


    1B - On the second iteration, *buf points to "h\0".
    1C - On the third iteration, *buf points to "hel\0".
    1D - On the fourth iteration, *buf points to "hel\0lo\n\0" It should
    now be obvious why your first output is 3.

    2A - On the first iteration, *buf points to "world\n\0" . It should
    now be obvious why your second output is 6.

    3A - On the first iteration, *buf points to "everyon\0".
    3B - On the second iteration, *buf points to "everyone\n\0". It
    should be obvious why your third output is 9.

    > if(strchr (*buf + off, '\n'))


    1D - And you return to main.

    2A - And you return to main.

    3B - And you return to main.

    > break;
    > off += *sz - 1;


    1B - off contains 1.
    1C - off now contains 4.

    3A - off now contains 7.

    > }
    > else


    1A - You set *sz to 1 on the first iteration of the for loop.

    > *sz = 1;
    > new = realloc (*buf, *sz <<= 1);


    1A - Ignoring malloc failures, *sz is now 2 and new points to a 2-byte
    area.
    1B - *sz is now 4 and new points to a 4-byte area.
    1C - *sz is now 8 and new points to an 8-byte area.

    3A = *sz is now 16 and new points to a 16 byte area.

    > if(!new)
    > {
    > free (*buf);
    > *buf = *sz = 0;
    > return;
    > }
    > *buf = new;
    > }
    >}
    >
    >int
    >main (void)
    >{
    > char *p = 0;
    > size_t sz = 0;
    > while(!feof (stdin))
    > {
    > GL (&p, &sz, stdin);
    > printf ("%d\n", strlen (p));
    > }
    >}
    >
    >
    >However when I run this with the following test file as input:
    >==begin input==
    >hello
    >world
    >everyone
    >
    >bye
    >==end input==
    >the results are as follows:
    >
    >3
    >6
    >9
    >1
    >4
    >4
    >
    >instead I would expect to get:
    >5
    >5
    >8
    >0
    >3
    >
    >Can anyone see what the problem is?
    >
    >Thanks


    --
    Remove del for email
    Barry Schwarz, Feb 24, 2011
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. John
    Replies:
    10
    Views:
    10,032
    Buster Copley
    Jul 14, 2003
  2. Poof
    Replies:
    1
    Views:
    373
  3. Joe
    Replies:
    4
    Views:
    7,738
    Chris Theis
    Jan 22, 2004
  4. solartimba

    getline problem

    solartimba, Apr 6, 2004, in forum: C++
    Replies:
    1
    Views:
    424
    Leor Zolman
    Apr 6, 2004
  5. Alex

    getline problem

    Alex, Jun 5, 2004, in forum: C++
    Replies:
    6
    Views:
    501
    Victor Bazarov
    Jun 6, 2004
Loading...

Share This Page