getline problem

G

Gand Alf

Hello

I have been writing a getline-type function. It should read an arbitrary
length line from a file and return it in a buffer.

Here is my code


#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void
GL (char **buf, size_t * sz, FILE * fp)
{
size_t off = 0;
char *new;
for(;;)
{
if(*buf)
{
fgets (*buf + off, *sz - off, fp);
if(strchr (*buf + off, '\n'))
break;
off += *sz - 1;
}
else
*sz = 1;
new = realloc (*buf, *sz <<= 1);
if(!new)
{
free (*buf);
*buf = *sz = 0;
return;
}
*buf = new;
}
}

int
main (void)
{
char *p = 0;
size_t sz = 0;
while(!feof (stdin))
{
GL (&p, &sz, stdin);
printf ("%d\n", strlen (p));
}
}


However when I run this with the following test file as input:
==begin input==
hello
world
everyone

bye
==end input==
the results are as follows:

3
6
9
1
4
4

instead I would expect to get:
5
5
8
0
3

Can anyone see what the problem is?

Thanks
 
J

James Waldby

I have been writing a getline-type function. It should read an
arbitrary length line from a file and return it in a buffer. [snip #includes]
void
GL (char **buf, size_t * sz, FILE * fp) {
size_t off = 0;
char *new;
for(;;)
What, you can't spell 'while' ?
{
if(*buf)
{
fgets (*buf + off, *sz - off, fp);
if(strchr (*buf + off, '\n'))
break;
off += *sz - 1;
^ You have an extra character there.
[snip realloc & most of main]
Note, your main starts off properly enough with
"int main (void)", but at the end has no "return 0".
However when I run this with the following test file as input: ==begin
input==
hello
world
everyone

bye
==end input==
the results are as follows:

3
6
9
1
4
4

instead I would expect to get:
5
5
8
0
3

Can anyone see what the problem is?

(a) Wrong expectation -- Per man fgets, "If a newline is read,
it is stored into the buffer", and (eg) strlen("world\n") is 6.

(b) Mishandled EOF test -- Again per man, "fgets() returns ...
NULL ... when end of file occurs while no characters have been
read", but you've discarded the value of fgets(), so can't put
a \0 at **buf, etc
 
K

Keith Thompson

pete said:
for(;;) is the special construct for infinite loops.

It's *a* construct for infinite loops.
On some compilers, while(1) generates a warning.

while (1) and for (;;) are equally valid. Compilers can warn about
anything they like.
 
M

Myth__Buster

@ OP

<problematic snippet>

if(strchr (*buf + off, '\n'))
break;

</problematic snippet>

Well, there is indeed an extra character being considered as part of
the string read before the control passes back to main. So, for this
we would just replace it with NUL providing the required NUL-
terminated C string back to main.

<corrected snippet>

char *ptrToNewLineChar = strchr (*buf + off, '\n');

if( ptrToNewLineChar != NULL )
{
*ptrToNewLineChar = '\0';
break;
}

</corrected snippet>

In addition, if the specified number of characters in fgets are not
read in then, there would be a need to update the offset(variable -
off) appropriately. I would suggest to consider the strlen( ) to help
in this regard as the the read buffer(variable - buf) would be a C
string, to count the exact number of characters read correctly once we
chop off the newline character from the read buffer as shown above.
The corresponding implementation is below.

<problematic snippet>

off += *sz - 1;

</problematic snippet>

<corrected snippet>

off += strlen(*buf + off);

</corrected snippet>


Cheers.
 
J

James Harris

Hello

I have been writing a getline-type function. It should read an arbitrary
length line from a file and return it in a buffer.
....

Can anyone see what the problem is?

Just a FYI. fgets is ok for limited use but it has some issues which
make it not suitable unless you can guarantee the input to it. See

http://codewiki.wikispaces.com/xbuf.c

and look at the section What's wrong with fgets?

James
 
M

Malcolm McLean

and look at the section What's wrong with fgets?
This doesn't mention the real problem, which is that fgets() is just
too difficult to use correctly if a program must process all input
with absolutely no errors.

You have to to call strchr() to check for the newline. If it is
absent, a partial read has occurred. So you need to take action to
ensure that the characters remaining in the stream are not treated as
a whole line. All very fiddly.
However in most applications you can just assume that a partial read
will generate a parse error on the next line.
 
J

James Harris

This doesn't mention the real problem, which is that fgets() is just
too difficult to use correctly if a program must process all input
with absolutely no errors.

You have to to call strchr() to check for the newline. If it is
absent, a partial read has occurred. So you need to take action to
ensure that the characters remaining in the stream are not treated as
a whole line. All very fiddly.
However in most applications you can just assume that a partial read
will generate a parse error on the next line.

It's worse than I thought!

Incidentally, it occurred to me that another, more modern way to read
a line from a file may be to mmap the file and use memchr or similar
to scan it. Maybe the fastest option? As a slight downside, the code
might need to account for very large files that exceed address space.

James
 
E

Eric Sosman

[...]
Incidentally, it occurred to me that another, more modern way to read
a line from a file may be to mmap the file and use memchr or similar
to scan it. Maybe the fastest option? As a slight downside, the code
might need to account for very large files that exceed address space.

<off-topic reason="requires extensions">

That's one downside. Another is that the system's line-ending
conventions won't be translated to '\n' for you; you'll need to
interpret them yourself. Still another downside is that not all
input sources are mmap-able: Try it on your keyboard, for instance,
or on a pipe or socket.

</off-topic>
 
J

James Harris

[...]
Incidentally, it occurred to me that another, more modern way to read
a line from a file may be to mmap the file and use memchr or similar
to scan it. Maybe the fastest option? As a slight downside, the code
might need to account for very large files that exceed address space.

<off-topic reason="requires extensions">

     That's one downside.  Another is that the system's line-ending
conventions won't be translated to '\n' for you; you'll need to
interpret them yourself.
True.

 Still another downside is that not all
input sources are mmap-able: Try it on your keyboard, for instance,
or on a pipe or socket.

Sure. I did say "file".

James
 
B

Barry Schwarz

Hello

I have been writing a getline-type function. It should read an arbitrary
length line from a file and return it in a buffer.

Here is my code


#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void
GL (char **buf, size_t * sz, FILE * fp)

1 - The first time you call GL, *buf is NULL and *sz is 0
2 - The second time you call GL, *sz is 8 and buf points to an 8-byte
area.
3 - The third time you call GL is the same as the second.
4 - You should work out the fourth and subsequent iterations for
yourself.
{
size_t off = 0;
char *new;
for(;;)
{
if(*buf)
{
fgets (*buf + off, *sz - off, fp);

1B - On the second iteration, *buf points to "h\0".
1C - On the third iteration, *buf points to "hel\0".
1D - On the fourth iteration, *buf points to "hel\0lo\n\0" It should
now be obvious why your first output is 3.

2A - On the first iteration, *buf points to "world\n\0" . It should
now be obvious why your second output is 6.

3A - On the first iteration, *buf points to "everyon\0".
3B - On the second iteration, *buf points to "everyone\n\0". It
should be obvious why your third output is 9.
if(strchr (*buf + off, '\n'))

1D - And you return to main.

2A - And you return to main.

3B - And you return to main.
break;
off += *sz - 1;

1B - off contains 1.
1C - off now contains 4.

3A - off now contains 7.

1A - You set *sz to 1 on the first iteration of the for loop.
*sz = 1;
new = realloc (*buf, *sz <<= 1);

1A - Ignoring malloc failures, *sz is now 2 and new points to a 2-byte
area.
1B - *sz is now 4 and new points to a 4-byte area.
1C - *sz is now 8 and new points to an 8-byte area.

3A = *sz is now 16 and new points to a 16 byte area.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,743
Messages
2,569,478
Members
44,899
Latest member
RodneyMcAu

Latest Threads

Top