Segfault question.

name

When I started testing the algorithms for my wrap program, I threw together
this snippet of code, which works quite well. Except that it (predictably)
segfaults at the end when it tries to go beyond the file. At some point, I
tried to mend that behavior using feof() but without success. The
functionality is not harmed, but this has started to bug me. What am I
missing here? Sometimes being a code duffer is frustrating!! lol!!!

The code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char *argv[])
{
    FILE *fp;

    int len;
    char buf[100];

    if ((fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "can't open fp");
        return EXIT_FAILURE;
    }

    while (((len = strlen(fgets(buf, 80, fp))) != 0)) {
        printf(" %i\t", len);
        printf("%s", buf);
    }

    fclose(fp); /* Nah, no error checking here... */

    return EXIT_SUCCESS;
}

Thanks for reading.
 
Mike Wahler

name said:
When I started testing the algorithms for my wrap program, I threw together
this snippet of code, which works quite well.

No it doesn't. It invokes undefined behavior.
Except that it (predictably)
segfaults at the end when it tries to go beyond the file.

And I can see exactly why. See below.
At some point, I
tried to mend that behavior using feof() but without success.

Guessing rarely will fix the real problem.
The
functionality is not harmed,

Well, no, you can't kill something that's already dead. :)
but this has started to bug me.

Yes, you have a serious, fatal bug.
What am I
missing here?

You apparently forgot to check the documentation of a library
function, because you didn't allow for its possible failure.
Sometimes being a code duffer is frustrating!! lol!!!

Especially when you try to go too fast, as I suspect you've done.
The code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char *argv[])
{
    FILE *fp;

    int len;

I see below that you store the return value from 'strlen()'
in 'len'. This means its type should be 'size_t', not 'int'.
    char buf[100];

    if ((fp = fopen(argv[1], "r")) == NULL) {

You should check that 'argv[1]' is indeed a valid pointer
(i.e. make sure argc > 1) before trying to dereference it.
If argc <= 1, then the expression 'argv[1]' invokes
undefined behavior.
        fprintf(stderr, "can't open fp");
        return EXIT_FAILURE;
    }

    while (((len = strlen(fgets(buf, 80, fp))) != 0)) {

If 'fgets()' encounters an error or end of file, it will return
NULL. If you pass NULL as the argument to 'strlen()' you get
undefined behavior (which could be manifested as a 'segfault').
        printf(" %i\t", len);
        printf("%s", buf);
    }

    fclose(fp); /* Nah, no error checking here... */

Nor did you do error checking where it really mattered.
Check the return value of *any* function which is documented
to possibly return a 'failure' indication (as does 'fgets()').
    return EXIT_SUCCESS;
}
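A minimal fixed version of that loop might be sketched like this (not Mike's exact code, just one way to apply his advice): test fgets()'s return value before the buffer ever reaches strlen(), and give the length variable the type size_t.

```c
#include <stdio.h>
#include <string.h>

/* Read every line of fp, checking fgets()'s return value first.
   Returns the number of lines read, or -1 on a read error. */
static long print_lines(FILE *fp)
{
    char buf[100];
    long nlines = 0;

    /* fgets() returns NULL at end-of-file or on error. */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);   /* safe: buf is a real C string here */
        printf(" %zu\t%s", len, buf);
        nlines++;
    }
    return ferror(fp) ? -1 : nlines;
}
```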

-Mike
 
CBFalconer

name said:
When I started testing the algorithms for my wrap program, I
threw together this snippet of code, which works quite well.
Except that it (predictably) segfaults at the end when it tries
to go beyond the file. At some point, I tried to mend that
behavior using feof() but without success. The functionality is
not harmed, but this has started to bug me. What am I missing
here? Sometimes being a code duffer is frustrating!! lol!!!

The code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char *argv[])
{
    FILE *fp;

    int len;
    char buf[100];

    if ((fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "can't open fp");
        return EXIT_FAILURE;
    }

    while (((len = strlen(fgets(buf, 80, fp))) != 0)) {
        printf(" %i\t", len);
        printf("%s", buf);
    }

    fclose(fp); /* Nah, no error checking here... */

    return EXIT_SUCCESS;
}

Look up what fgets returns when it encounters end-of-file or an
i/o error. Then consider what strlen does when you ask it to chew
on that.

<rant> Please get rid of the excessive indentation in your code.
3 or 4 spaces is quite enough. The excessive space makes lines
too long and causes things to disappear over the right margin
(although the above lines are short enough to avoid that). Don't
use tabs. </rant>
 
Victor Nazarov

CBFalconer said:
Look up what fgets returns when it encounters end-of-file or an
i/o error. Then consider what strlen does when you ask it to chew
on that.
I've never been able to figure out when fgets returns NULL. Please can you
explain it?
For example, if I've got the following file:
aaaaaaaa\n
bbbbbbbbb<EOF>

And I would call fgets three times in a row with a big buffer (larger
than 10 bytes or chars). When would fgets return NULL?
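A quick experiment settles it (a sketch using tmpfile() to stand in for the example file; calls_until_null is a hypothetical helper):

```c
#include <stdio.h>

/* Count how many fgets() calls succeed before one returns NULL. */
static int calls_until_null(FILE *fp)
{
    char buf[64];   /* "big buffer": larger than either line */
    int n = 0;

    while (fgets(buf, sizeof buf, fp) != NULL)
        n++;
    return n;
}
```

For the two-line file above, the first call returns "aaaaaaaa\n", the second returns "bbbbbbbbb" (no newline, but still a valid string), and only the third returns NULL: fgets() reports end-of-file once there is nothing at all left to read.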
<rant> Please get rid of the excessive indentation in your code.
3 or 4 spaces is quite enough. The excessive space makes lines
too long and causes things to disappear over the right margin
(although the above lines are short enough to avoid that). Don't
use tabs. </rant>

IMHO if lines are too long it's time to create new function to solve the
part of the whole task.

> while (((len = strlen(fgets(buf, 80, fp))) != 0)) {
I think that functional style is good enough, so I suggest that you
write a wrapper for strlen.
Something like:
#include <errno.h>   /* EINVAL */
#include <limits.h>  /* INT_MAX */
#include <string.h>  /* strlen */

int my_strlen (const char *s)
{
    size_t tmp;

    if (s == NULL)
        return -1;
    tmp = strlen (s);
    if (tmp > INT_MAX) {
        errno = EINVAL;
        return -1;
    }
    return (int)tmp;
}
 
Chris Torek

<rant> Please get rid of the excessive indentation in your code.
3 or 4 spaces is quite enough. The excessive space makes lines
too long and causes things to disappear over the right margin
(although the above lines are short enough to avoid that). Don't
use tabs. </rant>

I personally do not mind 8-character-per-lexical-level indentation,
although I do think 4 works better. I do remember hearing, from
the "human/computer interaction" folks and people doing visual
studies, that anything less than three characters is not so good,
because -- depending on one's font -- two-character indentations
may not create sufficient angles to trigger the brain's horizontal
and vertical line detectors. These detectors exist, though, and
run all the time whether we want them to or not; careful indentation
takes advantage of them.

As for tabs: use them or do not, but do not change your system's
interpretation of them. If you want n-character indentation where
"n" differs from the system's interpretation of "hardware" tabs,
just make sure that when you push your "tab" key in your editor,
it inserts spaces and/or tabs in order to get to the n'th column.
(In some cases, this may mean pushing a key other than the one
labelled "tab". For instance, in vi/nvi/vim, use ^T and ^D to
indent and -- assuming you have autoindent set -- de-indent by the
value you have put in the "shiftwidth" setting. In emacs, of
course, the whole thing is fully programmable.) If you do this
*instead* of instructing your editor to re-interpret the hardware
tabs, then anyone using the same underlying system will be able to
edit your code and see the same columnization that you see.
 
name

Oops, included wrong file! My bad!! That was a prototype that didn't yield
correct results, as well as being badly constructed. The user version does
yield correct results but is still badly constructed (natch...), so exhibits
the same behavior.

Look up what fgets returns when it encounters end-of-file or an
i/o error. Then consider what strlen does when you ask it to chew
on that.

Okay, fgets returns a null pointer if it encounters either an EOF
immediately, or if it encounters an error. In the latter case, the string
array is undefined, so error checking fgets should be the first thing to do,
I gather. Passing a null pointer to strlen is what causes the segfault?
Does that mean that strlen returns that error because it doesn't recognize
what has been passed and so assumes it's outside of its allotted territory?

Or is something else entirely going on and I'm still at sea? <grin>

Thanks!
 
Chris Torek

Okay, fgets returns a null pointer if it encounters either an EOF
immediately, or if it encounters an error.

Yes (although the "error" case is a bit dodgy; some fgets()
implementations will only return NULL on EOF-or-error-at-start,
treating error-in-the-middle as a sign to return a valid C string
that does not end with '\n').
In the latter case, the string array is undefined, so error
checking fgets should be the first thing to do, I gather.

Yes. More precisely, check whether fgets() returned its first
argument or NULL (these are the only two possibilities). (You
can also use (feof(fp) || ferror(fp)) to see whether EOF and/or
error were encountered "along the way", but this may interact
badly with fgets() variants that handle partial input lines, as
I described above.)
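That post-loop check might be sketched like this (drain_lines is a hypothetical helper name; ferror() and feof() are the standard calls):

```c
#include <stdio.h>

/* Drain fp line by line, then report why the loop ended.
   Returns 0 on clean end-of-file, -1 if a read error occurred. */
static int drain_lines(FILE *fp)
{
    char buf[100];

    while (fgets(buf, sizeof buf, fp) != NULL)
        fputs(buf, stdout);

    return ferror(fp) ? -1 : 0;   /* not an error => plain end-of-file */
}
```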
Passing a null pointer to strlen is what causes the segfault?

Just so. The effect is officially undefined, but a "nice" system
such as a Linux box will trap the error at runtime and terminate
the program (by default -- programs can override this, and debuggers
can trap the problem before the program sees it). Less-nice systems
might have strlen() return 42.
Does that mean that strlen returns that error because it doesn't recognize
what has been passed and so assumes it's outside of its allotted territory?

Or is something else entirely going on and I'm still at sea? <grin>

On your system, strlen() never returns at all -- so it makes no
sense to say "strlen returns that error". You could say "strlen
produces that result", which at least avoids the word "returns". :)

The method by which Linux detects the problem and aborts the program
is beyond the scope of this newsgroup.[%] Here, it suffices to say
that strlen() requires a C string, and (char *)NULL does not qualify
as one. (A "C string" is a data structure consisting of one or
more "char"s in sequence, beginning with the char whose address is
given as a value of type "char *", and ending with the first '\0'.
Since NULL never points to a valid C data object, it cannot provide
the first byte of a string. Note that the empty string begins and
ends with its '\0' byte, which makes it quite different from NULL:
there is at least one valid C "char" there holding the '\0'.)

[% Still, I will mention that it has to do with "virtual memory"
and the on-chip MMU, which translates "virtual addresses" used by
running programs into "physical addresses" used to locate actual
bytes in RAM. The translation process has several trapping options,
with varying methods of handling them and "degrees of fatality":
areas can be marked entire-off-limits, or "within limits but not
present in RAM at the moment", or "valid but read-only", and so
on. On some CPUs, areas can even be marked execute-only, so that
it is impossible to read CPU instructions as data. Linux reserves
some areas as "not allocated to the program" and sets up the MMU
so that those areas are marked off-limits, then delivers a segmentation
fault if you attempt to read, write, or execute from such an area.]
 
name

<saved densely informative post for further study!!>

I gather that, for my purposes, the segfault at the EOF is comfortable,
because the EOF virtually always follows a newline and will be thus at the
beginning of a string. Doesn't bother me, but... suppose I process a file
where the EOF does show up without being preceded by a newline? At that
point, I can't just live with a segfault unless I'm sure of the data I'm
getting.

Certainly I'm not in the market for the solution to Schroedinger's God
Problem as obtained some thousands of centuries hence in a land far, far
away!! LOL!!! Ummm... that was (will have been?) 42, was it not? <grin>

Perhaps I should just use the venerable ((c=getc(fp))!=EOF) approach. I'm
using that to drive the wrap program and, as near as I can tell, it's simple
enough that it should be considered bullet-proof. I understand that's not
the most efficient way of going, but for what I'm doing, that's really not
relevant. I can say that I am not disposed to even touch the scanf family
of functions! <grin>
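The getc()-driven skeleton described above might look something like this (a sketch of the general shape, not the actual wrap program; count_lines_getc is a hypothetical name):

```c
#include <stdio.h>

/* Count lines the getc() way.  'c' must be an int, not a char:
   EOF is an int value distinct from every unsigned char, so the
   loop test cannot be fooled, and a final line with no trailing
   '\n' is still counted. */
static long count_lines_getc(FILE *fp)
{
    int c;            /* int, so it can hold EOF */
    long lines = 0;
    int in_line = 0;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            lines++;
            in_line = 0;
        } else {
            in_line = 1;
        }
    }
    if (in_line)      /* file ended mid-line, without a newline */
        lines++;
    return lines;
}
```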

I must presume that there are other more sophisticated strategies in use,
but I'm going to have to stick with what I think I can manage to understand,
lest I inundate myself unnecessarily!

In any case, thanks for all the info!
 
Barry Schwarz

I gather that, for my purposes, the segfault at the EOF is comfortable,
because the EOF virtually always follows a newline and will be thus at the
beginning of a string. Doesn't bother me, but... suppose I process a file
where the EOF does show up without being preceded by a newline? At that
point, I can't just live with a segfault unless I'm sure of the data I'm
getting.

Not a good plan. The segfault you are currently experiencing is the
result of undefined behavior. The thing about undefined behavior is
it need not be consistent. Tomorrow it could manifest itself in a
completely different fashion, such as deleting the file you just
finished processing. Upgrading your hardware, OS, or compiler could
also change the behavior.



<<Remove the del for email>>
 
name

Not a good plan. The segfault you are currently experiencing is the
result of undefined behavior. The thing about undefined behavior is
it need not be consistent. Tomorrow it could manifest itself in a
completely different fashion, such as deleting the file you just
finished processing. Upgrading your hardware, OS, or compiler could
also change the behavior.

Okay, thanks.
 
