Can I assume fgets won't modify the last bytes of the output array if unused?

Francis Moreau

Hello,

I think this is undefined behaviour, but I prefer asking just in case
I'm missing something.

Consider that a line is 8 characters long (including newline) and I pass
to fgets a buffer which can store at least 32 characters.

Can I assume that if fgets reads that line, then it won't modify any
characters in the buffer whose offset is greater than 8?

Thanks
 
Mark Wooding

Francis Moreau said:
Consider that a line is 8 characters long (including newline) and I pass
to fgets a buffer which can store at least 32 characters.

Can I assume that if fgets reads that line, then it won't modify any
characters in the buffer whose offset is greater than 8 ?

I think that you can, according to the standard: 7.19.7.2p2:

The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream
into the array pointed to by s. No additional characters are
read after a new-line character (which is retained) or after
end-of-file. A null character is written immediately after the
last character read into the array.

It only `reads ... characters ... into the array', and finally writes a
null character after the last one. It doesn't say it does anything else
to the array. An implementation that randomly trashes other parts of
the array would therefore be nonconforming.
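
Mark's reading can be checked empirically on a given implementation. The following is a minimal sketch, not a proof of conformance: it fills a 32-byte buffer with a known value, reads an 8-character line, and reports whether every byte past the terminator still holds the fill value. The function name `tail_untouched` and the fill byte 'X' are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: returns 1 if fgets() left every byte after the
 * terminating null untouched, 0 if it changed one, -1 on I/O failure. */
int tail_untouched(void)
{
    char buf[32];
    FILE *fp = tmpfile();
    size_t i, len;
    int ok = 1;

    if (fp == NULL)
        return -1;
    fputs("1234567\n", fp);          /* 8 characters including newline */
    rewind(fp);

    memset(buf, 'X', sizeof buf);    /* known value in every byte */
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    len = strlen(buf);               /* index of the terminating null */
    for (i = len + 1; i < sizeof buf; i++)
        if (buf[i] != 'X')
            ok = 0;
    return ok;
}
```

A result of 1 only shows what one particular library does for one particular input; it does not settle what the standard requires.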

-- [mdw]
 
Francis Moreau

I think that you can, according to the standard: 7.19.7.2p2:

The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream
into the array pointed to by s. No additional characters are
read after a new-line character (which is retained) or after
end-of-file. A null character is written immediately after the
last character read into the array.

It only `reads ... characters ... into the array', and finally writes a
null character after the last one. It doesn't say it does anything else
to the array. An implementation that randomly trashes other parts of
the array would therefore be nonconforming.

Well that's not really clear to me, it must indeed read at most n-1
characters and write them to the array with a null character. But it
doesn't say it must not trash following characters in the array even if
that sounds stupid...
 
Mark Wooding

Francis Moreau said:
Well that's not really clear to me, it must indeed read at most n-1
characters and write them to the array with a null character. But it
doesn't say it must not trash following characters in the array even
if that sounds stupid...

It also doesn't say that many other unhelpful and counterintuitive
things don't occur. If your implementation's `fgets' does something
other than what's described in the standard, then it's not conforming.
And that includes making `beep-beep' noises, or clobbering extra stuff
in the input array.

-- [mdw]
 
Francis Moreau

It also doesn't say that many other unhelpful and counterintuitive
things don't occur.

That's the reason why I think it's undefined.

If your implementation's `fgets' does something other than what's
described in the standard, then it's not conforming.

Well, I would say this differently: if my program relies on this
undefined behaviour then it's not conforming.
 
Eric Sosman

Hello,

I think this is undefined behaviour, but I prefer asking just in case
I'm missing something.

Consider that a line is 8 characters long (including newline) and I pass
to fgets a buffer which can store at least 32 characters.

Can I assume that if fgets reads that line, then it won't modify any
characters in the buffer whose offset is greater than 8 ?

As I understand it, fgets() is allowed to scribble on any or all
of the buffer's bytes, except that if there's an immediate end-of-file
it will touch none of them.

The wider question, I think, is "Why do you care?" Is this part
of a stratagem for dealing with lines that might contain '\0' or
some such?
 
Eric Sosman

[...]
The wider question, I think, is "Why do you care?" Is this part
of a stratagem for dealing with lines that might contain '\0' or
some such?

If there is a '\0' in the stream, fgets will put it in the buffer
without complaint.

In light of 7.19.2p2 I don't think even that much is guaranteed.

In my view this is a data error, not a problem with
fgets. There is no rational case for '\0' in a text stream.

... which doesn't stop people from trying to handle screwball
formats with text streams, even though there's no certainty that the
attempts will succeed. I was just trying to imagine why the O.P.
was concerned about the tail end of an fgets() buffer, and wondering
whether he was attempting to deal with dodgy input.
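
To make the '\0' concern concrete, here is a small sketch of what typically happens when a null byte sits in the middle of a line. It uses a binary temporary file (tmpfile() opens in "wb+" mode), so it sidesteps the text-stream caveat of 7.19.2p2 rather than answering it. The function name and the sample bytes are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: reads the six bytes 'a' 'b' '\0' 'c' 'd' '\n' with
 * fgets() and returns strlen() of the result. *newline_index receives
 * the offset of '\n' in the buffer, or -1 if none was found.
 * Returns -1 on I/O failure. */
long strlen_after_embedded_nul(long *newline_index)
{
    static const char data[] = { 'a', 'b', '\0', 'c', 'd', '\n' };
    char buf[16];
    const char *nl;
    FILE *fp = tmpfile();            /* binary stream, not a text stream */

    *newline_index = -1;
    if (fp == NULL)
        return -1;
    fwrite(data, 1, sizeof data, fp);
    rewind(fp);

    memset(buf, 0, sizeof buf);
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* memchr() scans the whole buffer; strlen() stops at the first '\0'. */
    nl = memchr(buf, '\n', sizeof buf);
    if (nl != NULL)
        *newline_index = (long)(nl - buf);
    return (long)strlen(buf);
}
```

On common implementations strlen() reports only the part before the embedded null, so any length-based test run after fgets() sees a shorter line than was actually read.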
 
Francis Moreau

Eric Sosman said:
As I understand it, fgets() is allowed to scribble on any or all
of the buffer's bytes, except that if there's an immediate end-of-file
it will touch none of them.

The wider question, I think, is "Why do you care?" Is this part
of a stratagem for dealing with lines that might contain '\0' or
some such?

I'm wondering what is the most efficient way to see if fgets() read an
entire line.

For example consider this:

char buf[16], *p;
size_t len;
FILE *fp;

/* initialise fp */

p = fgets(buf, sizeof(buf), fp);

from here you can do this:

/* check if the line is entirely read */
len = strlen(buf);
if (len == 15 && buf[14] != '\n') {
/* the line has not been read completely */

}

but you could also do:

buf[14] = '\n';
p = fgets(buf, sizeof(buf), fp);
if (buf[14] != '\n') {
/* the line has not been read completely */

}

which seems more efficient.

Hence my question...
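
The two checks described above can be put side by side as a rough sketch. The helper names are hypothetical, and the sentinel variant is only meaningful if fgets() really does leave unused bytes alone, which is exactly the point in question. The sentinel test here also treats a '\0' in buf[14] as a complete line, since a 14-character line puts its terminator there.

```c
#include <stdio.h>
#include <string.h>

#define BUFSZ 16

/* Portable check: inspect only what fgets() is known to have stored. */
static int truncated_by_strlen(const char *buf)
{
    size_t len = strlen(buf);
    return len == BUFSZ - 1 && buf[len - 1] != '\n';
}

/* Sentinel check: the caller stores '\n' in buf[BUFSZ - 2] before the
 * call. ASSUMPTION: fgets() does not modify bytes it does not need. */
static int truncated_by_sentinel(const char *buf)
{
    char c = buf[BUFSZ - 2];
    return c != '\0' && c != '\n';
}

/* Hypothetical driver: writes text to a temporary file, reads the first
 * chunk, and returns 1 (truncated) or 0 (complete) when both checks
 * agree, -2 when they disagree, -1 on I/O failure. */
int first_chunk_truncated(const char *text)
{
    char buf[BUFSZ];
    FILE *fp = tmpfile();
    int a, b;

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);

    buf[BUFSZ - 2] = '\n';           /* plant the sentinel */
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    a = truncated_by_strlen(buf);
    b = truncated_by_sentinel(buf);
    return a == b ? a : -2;
}
```

The strlen() version costs one pass over at most 15 bytes and depends on nothing beyond the documented behaviour of fgets(), which is why it is the safer default.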
 
Eric Sosman

Eric Sosman said:
As I understand it, fgets() is allowed to scribble on any or all
of the buffer's bytes, except that if there's an immediate end-of-file
it will touch none of them.

The wider question, I think, is "Why do you care?" Is this part
of a stratagem for dealing with lines that might contain '\0' or
some such?

I'm wondering what is the most efficient way to see if fgets() read an
entire line.

For example consider this:

char buf[16], *p;
FILE *fp;

/* initialise fp */

p = fgets(buf, sizeof(buf), fp);

from here you can do this:

/* check if the line is entirely read */
len = strlen(buf);
if (len == 15 && buf[14] != '\n') {
/* the line has not been read completely */

}

Possibly simpler, certainly briefer:

if (strchr(buf, '\n') == NULL) {
// incomplete line (or missing '\n' at EOF)
}

but you could also do:

buf[14] = '\n';
p = fgets(buf, sizeof(buf), fp);
if (buf[14] != '\n') {
/* the line has not been read completely */

... or consisted of thirteen characters plus '\n', and fgets()
dutifully set buf[14] = '\0'.
}

which seems more efficient.

Two thoughts: First, I/O is many orders of magnitude slower
than the CPU, so saving a few milliquavers probably just gets you
back to the idle loop sooner. Second, if it's really important to
get the answer ASAP you may be better off using getc() in a loop
and testing directly, rather than calling fgets() and then using
CSI forensics to post-analyze what it did.

It Would Be Nice If fgets() returned something more than a
single bit's worth of information, like the number of bytes read,
say, or a pointer to the '\0'. Unfortunately, the less-helpful
interface was already well-established before standardization got
underway, and (like some other features of the library) we're all
stuck with it.

Accept my best wishes for the holiday season, to you and yours
and all your vermiform appendices.
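
As an illustration of the getc() approach, and of an interface that reports more than one bit, here is a minimal sketch. `read_line` is a made-up name, not a standard function, and the choice to report completeness through an int pointer is just one possible design.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most size - 1 characters (size should be
 * at least 2), stopping after a newline or at end-of-file, and always
 * null-terminates buf. Returns the number of characters stored.
 * *complete is set to 1 if the read stopped at a newline or at
 * end-of-file, and to 0 if it stopped because the buffer was full.
 * A return value of 0 means nothing was read (end-of-file or error). */
size_t read_line(FILE *fp, char *buf, size_t size, int *complete)
{
    size_t n = 0;

    *complete = 0;
    if (size == 0)
        return 0;

    while (n + 1 < size) {
        int c = getc(fp);
        if (c == EOF) {              /* end of input or read error */
            *complete = 1;
            break;
        }
        buf[n++] = (char)c;
        if (c == '\n') {             /* newline is kept, as with fgets() */
            *complete = 1;
            break;
        }
    }
    buf[n] = '\0';
    return n;
}
```

Because the loop tests each character as it arrives, the caller gets both the length and the complete-or-truncated answer without scanning the buffer a second time.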
 
Francis Moreau

Eric Sosman said:
On 12/17/2010 5:26 AM, Francis Moreau wrote:
Hello,

I think this is undefined behaviour, but I prefer asking just in case
I'm missing something.

Consider that a line is 8 characters long (including newline) and I pass
to fgets a buffer which can store at least 32 characters.

Can I assume that if fgets reads that line, then it won't modify any
characters in the buffer whose offset is greater than 8 ?

As I understand it, fgets() is allowed to scribble on any or all
of the buffer's bytes, except that if there's an immediate end-of-file
it will touch none of them.

The wider question, I think, is "Why do you care?" Is this part
of a stratagem for dealing with lines that might contain '\0' or
some such?

I'm wondering what is the most efficient way to see if fgets() read an
entire line.

For example consider this:

char buf[16], *p;
FILE *fp;

/* initialise fp */

p = fgets(buf, sizeof(buf), fp);

from here you can do this:

/* check if the line is entirely read */
len = strlen(buf);
if (len == 15 && buf[14] != '\n') {
/* the line has not been read completely */

}

Possibly simpler, certainly briefer:

if (strchr(buf, '\n') == NULL) {
// incomplete line (or missing '\n' at EOF)
}

Yes.

but you could also do:

buf[14] = '\n';
p = fgets(buf, sizeof(buf), fp);
if (buf[14] != '\n') {
/* the line has not been read completely */

... or consisted of thirteen characters plus '\n', and fgets()
dutifully set buf[14] = '\0'.

You're right, the test should have been

if (buf[14] && buf[14] != '\n') {
...
Two thoughts: First, I/O is many orders of magnitude slower
than the CPU, so saving a few milliquavers probably just gets you
back to the idle loop sooner. Second, if it's really important to
get the answer ASAP you may be better off using getc() in a loop
and testing directly, rather than calling fgets() and then using
CSI forensics to post-analyze what it did.

You're right again that it doesn't make any difference, but I was only
trying to write this in good taste, nothing more.

And there are probably tons of developers who encounter this, so there's
probably a well-known pattern for checking it.

It Would Be Nice If fgets() returned something more than a
single bit's worth of information, like the number of bytes read,
say, or a pointer to the '\0'. Unfortunately, the less-helpful
interface was already well-established before standardization got
underway, and (like some other features of the library) we're all
stuck with it.

Accept my best wishes for the holiday season, to you and yours
and all your vermiform appendices.

Thanks Mister Sosman, I'll take care of my vermiform appendix, don't
worry ;)

Happy Christmas.
 
Tim Rentsch

Eric Sosman said:
As I understand it, fgets() is allowed to scribble on any or all
of the buffer's bytes, except that if there's an immediate end-of-file
it will touch none of them.

I don't see any text in the Standard that would allow an
implementation to do this. I believe Mark Wooding's reading
on this question is the correct one here.
 
Eric Sosman

I don't see any text in the Standard that would allow an
implementation to do this. I believe Mark Wooding's reading
on this question is the correct one here.

Can you find any text that forbids it?
 
Keith Thompson

Eric Sosman said:
Can you find any text that forbids it?

6.2.4p2:

An object exists, has a constant address, and retains its
last-stored value throughout its lifetime.

This applies to each element of the buffer past the ones into which
fgets() stores values.
 
Eric Sosman

6.2.4p2:

An object exists, has a constant address, and retains its
last-stored value throughout its lifetime.

This applies to each element of the buffer past the ones into which
fgets() stores values.

Doesn't seem convincing, because it also applies to the elements
fgets() *does* store to.

Here's a synopsis of my thinking, for what it's worth: When
you call fgets(buff, 100, stream) you give fgets() permission to
write on all of buff[0] through buff[99] -- and, given enough input,
fgets() will certainly do so. Aside from the special handling of
end-of-input, I see no language in the Standard that says buff[42]
must be left untouched, and no description of circumstances that
would put buff[42] off-limits. Do you see such language?
 
Keith Thompson

Eric Sosman said:
Doesn't seem convincing, because it also applies to the elements
fgets() *does* store to.

No, it doesn't; those elements get *new* last-stored values.

Here's a synopsis of my thinking, for what it's worth: When
you call fgets(buff, 100, stream) you give fgets() permission to
write on all of buff[0] through buff[99] -- and, given enough input,
fgets() will certainly do so. Aside from the special handling of
end-of-input, I see no language in the Standard that says buff[42]
must be left untouched, and no description of circumstances that
would put buff[42] off-limits. Do you see such language?

No, but I don't think such language is necessary.

Any library function does what it's specified to do, and no more
(at least no more that's visible). sqrt() doesn't write to stdout,
for example, even though nothing in the standard explicitly says
it doesn't.
 
Edward A. Falk

Can I assume that if fgets reads that line, then it won't modify any
characters in the buffer whose offset is greater than 8 ?

Any time you're working with computers, and the word 'assume' comes up,
I hope all the hair stands up on the back of your neck.

No, you may not assume this, unless the man page explicitly says so.
Then it wouldn't be an assumption.
 
Keith Thompson

Any time you're working with computers, and the word 'assume' comes up,
I hope all the hair stands up on the back of your neck.

No, you may not assume this, unless the man page explicitly says so.
Then it wouldn't be an assumption.

I disagree.

First, what a man page says isn't necessarily relevant; fgets()
is defined by the ISO C Standard. (If a particular man page says
something that's inconsistent with the standard, it may be either
an error in the man page or an admission of non-conformance.)

My own interpretation of the C standard is, in the absence of
I/O errors, fgets() may not write past the characters that it's
specified to read -- any more than it may write to any other object.

As I mentioned elsethread, I feel perfectly comfortable *assuming*
that sqrt() doesn't write to stdout, even though there's no explicit
statement that it doesn't do so.
 
Tim Rentsch

Eric Sosman said:
Can you find any text that forbids it?

I believe there is no reason to look for any. Library
functions _must_ do what their respective specifications
require them to do, and _may_ do what the Standard
explicitly grants them license to do. This principle
also holds more generally; for example, storing into
a structure member is explicitly permitted to change
the contents of padding bytes. So unless there is some
text that explicitly provides for changing some array
elements past those that were read, the implementation
is obliged not to change them.
 
