Why this works???

Keith Thompson

pete said:
My other interpretation of what


could possibly mean:
is that the below shown program,
for an implementation which requires
a terminating new-line character on the last line,
that "hello, world\n" is the last line, and that "Good bye" isn't.

#include <stdio.h>
int main(void) {
    printf("hello, world\n");
    printf("Good bye");
    return 0;
}

If "hello, world\n" is considered to be the last output line
of the above program, on implementations which require
a terminating new-line character on the last line,
then the behavior of the program is merely implementation defined,
rather than undefined, as I originally claimed in the subject line.

I don't see how you can justify that interpretation. You're assuming
that, for an implementation that doesn't require the trailing
new-line, a final unterminated output line *must* be discarded. Where
in the standard does it say, or even imply, this? If the
implementation added a trailing new-line itself, so the final output
consisted of the two lines "hello, world\n" and "Good bye\n", what
requirement of the standard would be violated?

The standard says a trailing new-line is (conditionally) required; it
doesn't say what happens if a program fails to meet that requirement.
It seems to me that that means the behavior is simply undefined.
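Whichever reading of the standard is right, a program can sidestep the question by making sure that what it writes ends with a new-line. A minimal sketch, assuming a hypothetical helper named put_text_terminated (it is not from the thread or from any library):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write text to out, appending '\n' when text is
   non-empty and does not already end with one.
   Returns 0 on success, EOF on a write error. */
int put_text_terminated(const char *text, FILE *out)
{
    size_t len = strlen(text);

    if (fputs(text, out) == EOF)
        return EOF;
    if (len > 0 && text[len - 1] != '\n' && putc('\n', out) == EOF)
        return EOF;
    return 0;
}
```

Calling put_text_terminated("Good bye", stdout) as the last output of the program above would leave the stream ending in "Good bye\n". The helper only looks at its own argument, so it does not help if earlier output left a partial line and text is empty.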
 
Richard Heathfield

Keith Thompson said:
That's a poor analogy.

Ahem. Dem's fitin' woids, Keith - Kaz doesn't *do* poor analogies. :)

I think you may have missed Kaz's point. What he's talking about is the
difference between "well-defined" in ISO C terms and "well-defined" in
terms of the capabilities of a particular machine about whose details ISO
is deliberately (and necessarily) ignorant.

Maybe a parallel example will serve to illustrate the difference. In terms
of ISO C, we can all agree that the behaviour of the following code
fragment is thoroughly, utterly, irrevocably, and bananararily undefined:

void printat(const char *s, int x, int y, int bg, int fg)
{
    unsigned char *scrbase = (unsigned char *)0xB8000000UL + 160 * y + 2 * x;
    while(*s != '\0')
    {
        *scrbase++ = (unsigned char)*s++;
        if(bg >= 0 && fg >= 0 && bg < 16 && fg < 16)
        {
            *scrbase = ((bg << 4) | fg);
        }
        ++scrbase;
    }
} /* Did I get this right? It's been a while. */

If, however, this code is compiled for large memory model and executed
under MS-DOS on an x86 machine in 80x25 colour text mode, the behaviour is
perfectly well-defined *on that machine*. Many DOS programs used precisely
this kind of "dirty trick" to get decent display performance by stepping
around the BIOS and going straight to video RAM. I would not be at all
surprised if you'd used this trick yourself, back in the dim and distant.
If this technique were not well-defined under the above circumstances,
there would have been a lot of very, very angry customers demanding their
money back from suppliers of such programs as Norton Utilities,
Multi-Edit, PC Tools, 1-2-3, dBase, and even Turbo C's IDE. The writers of
these programs were able to rely on the well-definedness of this technique
within the context of the dialect of C they were using - Turbo C,
Microsoft C, etc.

There are times when you need to step outside what ISO C can do. When you
do so, you *have* to rely on the well-definedness of operations and
techniques the behaviour of which ISO C does not itself define.
 
CBFalconer

Keith said:
.... snip ...


I don't see how you can justify that interpretation. You're
assuming that, for an implementation that doesn't require the
trailing new-line, a final unterminated output line *must* be
discarded. Where in the standard does it say, or even imply,
this? If the implementation added a trailing new-line itself,
so the final output consisted of the two lines "hello, world\n"
and "Good bye\n", what requirement of the standard would be
violated?

I think that interpretation can be deduced. Any 'line' can be
emitted without a final '\n', and will finally be actually executed
when the '\n' is actually emitted. That implies either a buffer,
or immediate (earlier) execution. I see no other possibility. Now
extend that situation to the last '\n'less line. Barring the use
of anticipators.
 
Harald van Dijk

I don't see how you can justify that interpretation. You're assuming
that, for an implementation that doesn't require the trailing new-line,
a final unterminated output line *must* be discarded. Where in the
standard does it say, or even imply, this?

It's an interesting and IMHO valid interpretation of
"A text stream is an ordered sequence of characters composed into /lines/,
each line consisting of zero or more characters plus a terminating new-
line character. Whether the last line requires a terminating new-line
character is implementation-defined."

"Lines" is italicised, so the above defines what a line is. If after
"a\nb\nc" is output, if the implementation documents that the last line
requires a terminating new-line character, then "c" does not match the
definition of "line". If "c" is not a line, then the lines are "a\n" and
"b\n". If the lines are "a\n" and "b\n", then the last line has a
terminating new-line character, so there's no reason the behaviour would
be undefined.
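The reading above can be made concrete with a small sketch that splits a character sequence the way the quoted definition does: a line is zero or more characters plus a terminating new-line, and anything after the last new-line is left over. The names split_lines and struct line_split are illustrative assumptions, not anything the standard defines:

```c
#include <stddef.h>

/* Hypothetical result type: how a character sequence divides into
   new-line-terminated lines plus any unterminated remainder. */
struct line_split {
    size_t complete_lines;  /* lines that end in '\n' */
    size_t trailing_chars;  /* characters after the last '\n', if any */
};

/* Count the complete lines in s and the characters left over after
   the last new-line. */
struct line_split split_lines(const char *s)
{
    struct line_split r = {0, 0};

    for (; *s != '\0'; ++s) {
        if (*s == '\n') {
            ++r.complete_lines;
            r.trailing_chars = 0;   /* a new line starts after '\n' */
        } else {
            ++r.trailing_chars;
        }
    }
    return r;
}
```

For "a\nb\nc" this reports two complete lines and one trailing character. Whether that trailing "c" counts as a last line missing its terminator, or as no line at all, is exactly the point in dispute.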
 
pete

No. I'm saying that "Good bye" would be the last line
on an implementation that didn't require a terminating '\n'
on the last line.

It's an interesting and IMHO valid interpretation of
"A text stream is an ordered sequence of characters
composed into /lines/,
each line consisting of zero or more characters
plus a terminating new-line character.
Whether the last line requires a terminating new-line
character is implementation-defined."

"Lines" is italicised, so the above defines what a line is.
If after "a\nb\nc" is output,
if the implementation documents that the last line
requires a terminating new-line character,
then "c" does not match the definition of "line".
If "c" is not a line, then the lines are "a\n" and "b\n".
If the lines are "a\n" and "b\n", then the last line has a
terminating new-line character,
so there's no reason the behaviour would be undefined.

That's pretty much what I thought I was thinking.
 
Kaz Kylheku

That's a poor analogy.  The behavior on division by zero is entirely
undefined.  Whether a text stream requires a trailing new-line is
implementation-defined, and must be documented by each implementation.

Yes. So each implementation must have a document which says:

``... a trailing newline is required.''

or

``... a trailing newline is not required.''

This is not implementation-defined /behavior/, but an implementation-
defined presence or absence of a requirement---a rather curious
animal.

The requirement belongs to the C standard, but whether or not this
requirement is ``enabled'' is determined by the implementation.

It's as if the standard's text was configurable in the following
manner:

#ifdef CONFIG_TEXT_STREAM_REQUIRES_NEWLINE
The last line written to a text stream shall be terminated by a
newline.
#endif

where the implementation controls whether this part of the text is
enabled.

But of course, implementations don't actually control the set of
requirements in the standard. The only way the above situation can be
achieved is if the standard imposes the requirement, and the
implementation either preserves it as is, or waives it.

Looking at this another way, using the standard only, what can we say
about the behavior of a program which doesn't terminate a text stream
with a newline? We know that it violates a requirement in some
implementations but not in others. Well, that's as good as undefined
behavior. If the behavior is undefined in any possible implementation,
then it's undefined.

Effectively, the standard requires text streams to be newline
terminated, but requires implementations to either document that this
requirement is waived in the local dialect, or else to document that
the requirement is upheld. But implementations can lift just about any
requirement in this way, such as the requirement that the divisor
operand in a division be non-zero, or that a dereferenced pointer be
non-null, etc.

Let's reason a little bit about how useless this is. There are two
cases: either the implementation upholds the requirement or it
doesn't. In the first case, the piece of documentation which says
``the newline is required'' is completely redundant, so it is useless
to require it. We know that it's required from the standard already!
In the second case, the documentation says ``the newline is not
required'', which is---doh---a documented extension over undefined
behavior. But there is already a blanket requirement in the standard
that extensions be documented! So it's redundant to reiterate this
requirement. Useless, again!

So the bottom line is that a statement of the form ``it is
implementation-defined whether X is required'' is just confusing
verbiage which logically means the same thing as ``X is required''.

A better analogy might be:

    int main(void) { int i = 40000; return 0; }

This has well defined behavior for an implementation with
INT_MAX >= 40000, and undefined behavior for an implementation
with INT_MAX < 40000.

Not sure about that one. The worst thing that happens is that an
implementation-defined value is stored in i. Narrowing integer
conversions don't produce undefined behavior, but an implementation-
defined result. (Of course, that result can be embroiled in some
computation which can produce undefined behavior due to that result).

But how about this:

#include <limits.h>

int main(void)
{
    if (INT_MAX < 40000)
        *((char *) 0) = 0;
    return 0;
}

This clearly has undefined behavior according to standard C, but in
specific dialects, it has a defined meaning. Those dialects would be
ones where INT_MAX >= 40000, or ones in which storing a byte through
the null pointer is a harmless operation (and documented as such).

I wouldn't call such implementations "dialects";
they're merely common variations of C.

Every implementation accepts some kind of dialect. When we write a
strictly conforming ISO C program, we are also writing in a special dialect.

Programs can be defined or undefined in implementation-specific
dialects (using their notion of ``defined'' or ``undefined'', of
course).

Well-defined standard C programs are well-defined in every conforming
dialect.
 
Keith Thompson

Kaz Kylheku said:
Yes. So each implementation must have a document which says:

``... a trailing newline is required.''

or

``... a trailing newline is not required.''

This is not implementation-defined /behavior/, but an implementation-
defined presence or absence of a requirement---a rather curious
animal.

I agree, and I wish the standard were clearer about what this means.
Even if my interpretation is correct, I'd be happier if the standard
at least said explicitly that the behavior is undefined, rather than
leaving it undefined by default.

The requirement belongs to the C standard, but whether or not this
requirement is ``enabled'' is determined by the implementation.

It's as if the standard's text was configurable in the following
manner:

#ifdef CONFIG_TEXT_STREAM_REQUIRES_NEWLINE
The last line written to a text stream shall be terminated by a
newline.
#endif

where the implementation controls whether this part of the text is
enabled.

Right, except that there's no CONFIG_TEXT_STREAM_REQUIRES_NEWLINE
macro that a program can test. I suppose you were speaking
figuratively, but if the implementation has to define it anyway, it
might be convenient to have it available for testing.

[...]
Let's reason a little bit about how useless this is. There are two
cases: either the implementation upholds the requirement or it
doesn't. In the first case, the piece of documentation which says
``the newline is required'' is completely redundant, so it is useless
to require it. We know that it's required from the standard already!
In the second case, the documentation says ``the newline is not
required'', which is---doh---a documented extension over undefined
behavior. But there is already a blanket requirement in the standard
that extensions be documented! So it's redundant to reiterate this
requirement. Useless, again!

I seem to have ignored the sentence preceding the one we're talking
about. Here's a bit more context:

A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined.

One possible interpretation is that a text stream is *either*:
an ordered ... plus a terminating new-line character
*or*:
an ordered ... plus a terminating new-line character, except that
the terminating new-line character is optional for the last line.

I suspect that this was the actual intent, but if so it could have
been phrased better. In particular, the second sentence seems to
contradict the first rather than qualifying it.

So the bottom line is that a statement of the form ``it is
implementation-defined whether X is required'' is just confusing
verbiage which logically means the same thing as ``X is required''.


Not sure about that one. The worst thing that happens is that an
implementation-defined value is stored in i. Narrowing integer
conversions don't produce undefined behavior, but an implementation-
defined result. (Of course, that result can be embroiled in some
computation which can produce undefined behavior due to that result).

Whoops, I momentarily forgot that overflow on a conversion doesn't
cause UB. Here's a better example of what I meant:

int main(void) { int i = 32767; i + 1; return 0; }

where an overflow in the addition does cause UB (on an implementation
with INT_MAX == 32767).

But how about this:

#include <limits.h>

int main(void)
{
    if (INT_MAX < 40000)
        *((char *) 0) = 0;
    return 0;
}

This clearly has undefined behavior according to standard C, but in
specific dialects, it has a defined meaning. Those dialects would be
ones where INT_MAX >= 40000, or ones in which storing a byte through
the null pointer is a harmless operation (and documented as such).


Every implementation accepts some kind of dialect. When we write a
strictly conforming ISO C program, we are also writing in a special dialect.

Since the standard doesn't say what a "dialect" is, I don't think
we're going to settle this one definitively.

Programs can be defined or undefined in implementation-specific
dialects (using their notion of ``defined'' or ``undefined'', of
course).

Well-defined standard C programs are well-defined in every conforming
dialect.

It's the behavior of a program that's defined or undefined, not the
program itself. (The standard defines the term "undefined behavior";
it doesn't define the term "undefined program".) A given program may
or may not invoke undefined behavior, depending on the environment in
which it's executed.
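The `i + 1` example above shows this environment dependence: the addition is fine where INT_MAX is larger than 32767 and overflows where it is not. Code that has to behave the same everywhere can test against INT_MAX before adding. A minimal sketch, assuming a hypothetical helper named checked_increment:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store i + 1 in *out and return true, or return
   false without touching *out when the addition would overflow (which
   would be undefined behavior for a signed int). */
bool checked_increment(int i, int *out)
{
    if (i == INT_MAX)
        return false;   /* i + 1 is not representable in int */
    *out = i + 1;
    return true;
}
```

The test happens before the addition, so the overflowing expression is never evaluated on any implementation, whatever its INT_MAX.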
 
Peter Nilsson

The worst that can happen in C99 is that an implementation-
defined signal is raised.

C99 moved the goalposts, allowing a signal in place of
a result value.

Whoops, I momentarily forgot that overflow on a
conversion doesn't cause UB.

Since there's no way to portably handle an 'implementation-
defined signal', it's effectively UB in C99.
 
Keith Thompson

Peter Nilsson said:
Since there's no way to portably handle an 'implementation-
defined signal', it's effectively UB in C99.

Since there's presumably an implementation-defined way to handle an
implementation-defined signal, it's effectively implementation-defined
behavior. Not all C code is, or needs to be, portable.

Though I'm not entirely pleased with something as fundamental as
arithmetic conversions being left implementation-defined.
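Code that would rather not depend on the implementation-defined result (or signal) of an out-of-range conversion can check the range first. A minimal sketch, assuming a hypothetical helper named long_to_int:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: convert value to int only when it is
   representable; otherwise leave *out untouched and report failure
   rather than rely on the implementation-defined conversion. */
bool long_to_int(long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX)
        return false;
    *out = (int)value;  /* in range, so the value is preserved exactly */
    return true;
}
```

With this, long_to_int(40000L, &i) succeeds on an implementation where INT_MAX >= 40000 and fails cleanly where it is not, on every implementation.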
 
Keith Thompson

pete said:
Harald van Dijk wrote: [...]
It's an interesting and IMHO valid interpretation of
"A text stream is an ordered sequence of characters
composed into /lines/,
each line consisting of zero or more characters
plus a terminating new-line character.
Whether the last line requires a terminating new-line
character is implementation-defined."

"Lines" is italicised, so the above defines what a line is.
If after "a\nb\nc" is output,
if the implementation documents that the last line
requires a terminating new-line character,
then "c" does not match the definition of "line".
If "c" is not a line, then the lines are "a\n" and "b\n".
If the lines are "a\n" and "b\n", then the last line has a
terminating new-line character,
so there's no reason the behaviour would be undefined.

That's pretty much what I thought I was thinking.

I've posted a question to comp.std.c, subject "Last line with or
without new-line".
 
Joe Wright

Keith said:
pete said:
Harald van Dijk wrote: [...]
It's an interesting and IMHO valid interpretation of
"A text stream is an ordered sequence of characters
composed into /lines/,
each line consisting of zero or more characters
plus a terminating new-line character.
Whether the last line requires a terminating new-line
character is implementation-defined."

"Lines" is italicised, so the above defines what a line is.
If after "a\nb\nc" is output,
if the implementation documents that the last line
requires a terminating new-line character,
then "c" does not match the definition of "line".
If "c" is not a line, then the lines are "a\n" and "b\n".
If the lines are "a\n" and "b\n", then the last line has a
terminating new-line character,
so there's no reason the behaviour would be undefined.
That's pretty much what I thought I was thinking.

I've posted a question to comp.std.c, subject "Last line with or
without new-line".

Whose business is it? I suggest it is not C which defines 'line' but
perhaps the OS and its utilities. I run the following program..

#include <stdio.h>
int main(void) {
    printf("Hello world\n");
    printf("Good bye");
    return 0;
}

...and redirect its output to a file, nl.txt. Now using a hex editor I
see that the second line is, in fact, not terminated with a newline. Now
I open nl.txt in an editor (edit.com?). I modify the file changing the
'H' to 'J' and save it. Now as I look at it in the hex editor I find the
editor has added the newline to the last line of the file.

I seem to remember vi will do the same thing.
 
Peter Nilsson

Keith Thompson said:
Since there's presumably an implementation-defined way to
handle an implementation-defined signal, it's effectively
implementation-defined behavior. Not all C code is, or
needs to be, portable.

If you're going to be lazy with conversion then why not also
be lazy about overflow?

Not all code needs to be implementation specific. Indeed,
I submit that much implementation specific code in existence
has no Good Reason to be so, unless ignorance and laziness
are Good Reasons.
 
Keith Thompson

Peter Nilsson said:
If you're going to be lazy with conversion then why not also
be lazy about overflow?

Not all code needs to be implementation specific. Indeed,
I submit that much implementation specific code in existence
has no Good Reason to be so, unless ignorance and laziness
are Good Reasons.

I think you're refuting arguments I wasn't trying to make. I was
merely disputing, in a narrow technical sense, your statement that
overflow on conversion is "effectively UB" -- though I agree that it
should be avoided as carefully as arithmetic overflow.
 
Keith Thompson

Joe Wright said:
Keith Thompson wrote: [...]
I've posted a question to comp.std.c, subject "Last line with or
without new-line".

Whose business is it? I suggest it is not C which defines 'line' but
perhaps the OS and its utilities.

Sorry, but the C standard does define "line"; see C99 7.19.2p2.

I run the following program..

#include <stdio.h>
int main(void) {
    printf("Hello world\n");
    printf("Good bye");
    return 0;
}

..and redirect its output to a file, nl.txt. Now using a hex editor I
see that the second line is, in fact, not terminated with a
newline.

You probably ran it under an implementation that *doesn't* require a
new-line on the last line of a text stream. The results are exactly
what I'd expect.

Now I open nl.txt in an editor (edit.com?). I modify the file
changing the 'H' to 'J' and save it. Now as I look at it in the hex
editor I find the editor has added the newline to the last line of the
file.

I seem to remember vi will do the same thing.

The authors of those programs (which may or may not be implemented in
C) chose to transform a text file without a trailing new-line into one
with a trailing new-line. And how is this relevant to text streams in
C?
 
santosh

Keith said:
I think you mean to ask whether its behavior is undefined. In some
circumstances, I believe it is.


Not quite. If the implementation requires a terminating new-line
character on the last line of a text stream, *and* if the putchar('\n')
call fails (regardless of whether putchar('X') fails or not), then the
behavior is undefined.

For example, it could create a malformed output file, or leave an
output device in an inconsistent physical state. This is true even if
the program tried *really hard* to terminate the file properly.

And the undefined status could mean that the subsequent 'return 0'
breaks horribly, so it's not even guaranteed to return successful
termination status.
 
CBFalconer

pete said:
.... snip ...

The standard doesn't specify whether putchar('X') returns 'X'
or EOF. The standard doesn't specify whether putchar('\n')
returns '\n' or EOF.

Yes it does. From N869:

7.19.7.9 The putchar function

Synopsis
[#1]
#include <stdio.h>
int putchar(int c);

Description

[#2] The putchar function is equivalent to putc with the
second argument stdout.

Returns

[#3] The putchar function returns the character written. If
a write error occurs, the error indicator for the stream is
set and putchar returns EOF.
 
Harald van Dijk

Yes it does.

No, it doesn't.
From N869:

[#3] The putchar function returns the character written. If a write
error occurs, the error indicator for the stream is set and putchar
returns EOF.

So, does putchar('X') return 'X' or EOF? This depends on whether a write
error occurs. Whether a write error occurs is not specified by the
standard, so neither is the return value.
 
CBFalconer

Harald said:
CBFalconer said:
pete wrote:

... snip ...


Yes it does.

No, it doesn't.
From N869:

[#3] The putchar function returns the character written. If a write
error occurs, the error indicator for the stream is set and putchar
returns EOF.

So, does putchar('X') return 'X' or EOF? This depends on whether a
write error occurs. Whether a write error occurs is not specified by
the standard, so neither is the return value.

Oh come on. If it succeeds (no error) it returns 'X'. If it fails
(error) it returns EOF. What's the problem?
 
Philip Potter

CBFalconer said:
Harald said:
CBFalconer said:
pete wrote:

... snip ...

The standard doesn't specify whether putchar('X') returns 'X'
or EOF. The standard doesn't specify whether putchar('\n')
returns '\n' or EOF.
Yes it does.
No, it doesn't.
From N869:

[#3] The putchar function returns the character written. If a write
error occurs, the error indicator for the stream is set and putchar
returns EOF.
So, does putchar('X') return 'X' or EOF? This depends on whether a
write error occurs. Whether a write error occurs is not specified by
the standard, so neither is the return value.

Oh come on. If it succeeds (no error) it returns 'X'. If it fails
(error) it returns EOF. What's the problem?

If you go back and read the messages directly upthread, it becomes clear:

The problem is that the Standard does not require putchar() to succeed.
(I'm not sure why this was phrased in terms of return value; perhaps to
avoid ambiguity and to stick to Standard language.) This means that
pete's program may not be strictly conforming because it is not
guaranteed to terminate the last output line with '\n'. See message
<[email protected]> (directly upthread) for details.
 
Harald van Dijk

Oh come on. If it succeeds (no error) it returns 'X'. If it fails
(error) it returns EOF. What's the problem?

That's exactly the point: there's no guarantee that putchar won't fail.
To quote pete's message:
Is new.c defined?
The standard doesn't specify whether putchar('X') returns 'X' or EOF.
The standard doesn't specify whether putchar('\n') returns '\n' or EOF.
Can this program be said to be guaranteed to do anything at all?

/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
    putchar('X');
    putchar('\n');
    return 0;
}

/* END new.c */

The idea is that because the standard doesn't specify whether
putchar('X') succeeds, and doesn't specify whether putchar('\n')
succeeds, it's unspecified whether an incomplete line is printed. When an
unterminated line is printed on an implementation that requires a line
terminator for each line, the behaviour is undefined by omission. When
it's unspecified whether the behaviour is defined, the behaviour is
effectively already undefined.
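A program cannot make putchar succeed, but it can at least notice when a write fails. A minimal sketch, assuming a hypothetical function named write_line_checked. It uses putc on a stream argument, since the quoted text says putchar is equivalent to putc with stdout:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write one complete line, "X\n", to out.
   Returns EXIT_SUCCESS only if both writes and the flush succeeded. */
int write_line_checked(FILE *out)
{
    if (putc('X', out) == EOF)
        return EXIT_FAILURE;
    if (putc('\n', out) == EOF)
        return EXIT_FAILURE;
    if (fflush(out) == EOF)     /* buffered data may only fail here */
        return EXIT_FAILURE;
    return EXIT_SUCCESS;
}
```

A main that ends with `return write_line_checked(stdout);` reports failure rather than always returning 0. This does not answer the objection above: if the '\n' write fails on an implementation that requires the terminator, the argument is that the behaviour is already undefined by the time the check runs.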
 
