Line input and implementation-defined behaviour


Enrico `Trippo' Porreca

Both the K&R book and Steve Summit's tutorial define a getline() function
that correctly tests the return value of getchar() against EOF.

I know that getchar() returns either EOF or the next character's value
as an unsigned char converted to int.

Since char may be signed (and if so, the return value of getchar() would
be outside its range), doesn't the commented line in the following code
produce implementation-defined behaviour?

char s[SIZE];
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    s[i] = c; /* ??? */
    i++;
}

s[i] = '\0';

If this is indeed implementation defined, is there any solution?
 

Simon Biber

Enrico `Trippo' Porreca said:
Since char may be signed (and if so, the return value of getchar()
would be outside its range), doesn't the commented line in the
following code produce implementation-defined behaviour?

Almost. If a character is read whose code is out of the range of
signed char, it produces an implementation-defined result, or an
implementation-defined signal is raised. This is not quite as bad
as implementation-defined behaviour, but almost.
char s[SIZE];
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    s[i] = c; /* ??? */
    i++;
}

s[i] = '\0';

If this is indeed implementation defined, is there any solution?


If char is signed, and the value of the character is outside the
range of signed char, then you have an out-of-range conversion to
a signed integer type, so: "either the result is implementation-defined
or an implementation-defined signal is raised." (C99 6.3.1.3#3)

However, because this is such an incredibly common operation in
existing C code, an implementor would be absolutely idiotic to
define this to have any undesired effects.
 

Enrico `Trippo' Porreca

Simon said:
char s[SIZE];
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    s[i] = c; /* ??? */
    i++;
}

s[i] = '\0';

If this is indeed implementation defined, is there any solution?


If char is signed, and the value of the character is outside the
range of signed char, then you have an out-of-range conversion to
a signed integer type, so: "either the result is implementation-defined
or an implementation-defined signal is raised." (C99 6.3.1.3#3)

However, because this is such an incredibly common operation in
existing C code, an implementor would be absolutely idiotic to
define this to have any undesired effects.


I agree, but AFAIK the implementor is allowed to be an idiot...
Am I right?

Is the following a plausible solution (i.e. without any trap
representation or type conversion or something-defined behaviour problem)?

char s[SIZE];
unsigned char *t = (unsigned char *) s;
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    t[i] = c; /* ??? */
    i++;
}

s[i] = '\0';
 

Simon Biber

Enrico `Trippo' Porreca said:
I agree, but AFAIK the implementor is allowed to be an idiot...
Am I right?

Yes, but trust me, anyone who fouled up the char<->int conversion
would break a large proportion of existing code that is considered
to be completely portable. Therefore their implementation would
not sell.

Consider the <ctype.h> functions, which require that the input is
an int whose value is within the range of unsigned char. That is
why we suggest that people cast to unsigned char like this:
char *p, s[] = "hello";
for (p = s; *p; p++)
    *p = toupper((unsigned char)*p);
If the value of *p was negative, then when converted to unsigned
char it becomes positive and outside the range of signed char. So this
could theoretically be outside the range of int, if int and signed
char have the same range. Therefore you have the same situation in
reverse - unsigned char to int conversion is not guaranteed to be
within range.
Is the following a plausible solution (i.e. without any trap
representation or type conversion or something-defined behaviour
problem)?

char s[SIZE];
unsigned char *t = (unsigned char *) s;
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    t[i] = c; /* ??? */


The assignment itself is safe, but since it places an arbitrary
representation into the elements of the array s, which are char
objects and possibly signed, it might generate a trap
representation. That is if signed char can have trap
representations. I'm not completely sure.
i++;
}

s[i] = '\0';
 

Malcolm

Simon Biber said:
char s[SIZE];
unsigned char *t = (unsigned char *) s;
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    t[i] = c; /* ??? */


s[i] = 0;
The assignment itself is safe, but since it places an arbitrary
representation into the elements of the array s, which are char
objects and possibly signed, it might generate a trap
representation. That is if signed char can have trap
representations. I'm not completely sure.
signed chars can trap. unsigned chars are guaranteed to be able to hold
arbitrary data so cannot.
You would have to be desperately unlucky for the implementation to allow
non-chars to be read in from stdin, and then for the function to trap. The
most likely place for the trap to trigger would be the assignment s[i] = 0,
since the compiler probably won't realise that pointer t actually points to
a buffer declared as straight char.
 

Peter Nilsson

Malcolm said:
Simon Biber said:
char s[SIZE];
unsigned char *t = (unsigned char *) s;
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    t[i] = c; /* ??? */


s[i] = 0;
The assignment itself is safe, but since it places an arbitrary
representation into the elements of the array s, which are char
objects and possibly signed, it might generate a trap
representation. That is if signed char can have trap
representations. I'm not completely sure.
signed chars can trap. unsigned chars are guaranteed to be able to hold
arbitrary data so cannot.
You would have to be desperately unlucky for the implementation to allow
non-chars to be read in from stdin, and then for the function to trap. The
most likely place for the trap to trigger would be the assignment s[i] = 0,

0 is a value in the range of signed char, so it is not possible for a
conforming compiler to replace the contents of object s[i] with a trap
representation.

[You can always initialise an uninitialised automatic variable, for instance,
even if its uninitialised state is a trap representation.]
since the compiler probably won't realise that pointer t actually points to
a buffer declared as straight char.

You seem to be confusing 'trap representations' for 'trap'. The latter term
commonly being used for raised exceptions on many architectures. A trap
representation, in and of itself, need not raise an exception.

Indeed, whilst the standards allow signed char to have trap representations,
sections like 6.2.6.1p5 effectively say that all reads via character lvalues
are privileged. So at worst, it would seem, reading a character trap
representation will only yield an unspecified value. [Non-trapping trap
representations!]
 

Malcolm

Peter Nilsson said:
The most likely place for the trap to trigger would be the assignment
s[i] = 0,


0 is a value in the range of signed char, so it is not possible for a
conforming compiler to replace the contents of object s[i] with a trap
representation.

What I meant was that the assignment may trigger the trap, if illegal
characters are stored into the array s. This is because values from s may be
loaded into registers as chars.
Indeed, whilst the standards allow signed char to have trap
representations, sections like 6.2.6.1p5 effectively say that all reads via
character lvalues are privileged. So at worst, it would seem, reading a
character trap representation will only yield an unspecified value. [Non-
trapping trap representations!]
It seems it would be unacceptable for the line

fgets(line, sizeof line, fp);

to cause a program abort if fed an illegal character, with nothing the
programmer can do to stop it. OTOH reads are the most likely way for corrupt
data to get into the program, and the whole point of trap representations is to
close down any program that is malfunctioning.
 

Enrico `Trippo' Porreca

Simon said:
Yes, but trust me, anyone who fouled up the char<->int conversion
would break a large proportion of existing code that is considered
to be completely portable. Therefore their implementation would
not sell.

Uhm... So I think I should use K&R's getline(), without being too
paranoid about it...

Thanks.
 

Dan Pop

In comp.lang.c Simon Biber said:
Almost. If a character is read whose code is out of the range of
signed char, it produces an implementation-defined result, or an
implementation-defined signal is raised. This is not quite as bad
as implementation-defined behaviour, but almost.

No implementation-defined signal is raised in C89 and I strongly doubt
that any *real* C99 implementation would do that, breaking existing C89
code.

Dan
 

Simon Biber

Added comp.std.c - we are discussing the effect of conversion of
an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
"either the result is implementation-defined or an
implementation-defined signal is raised."

Dan Pop said:
No implementation-defined signal is raised in C89 and I strongly doubt
that any *real* C99 implementation would do that, breaking existing C89
code.

Why was the 'implementation-defined signal' for signed integer
conversions added in C99? Was there some implementation that
required it, in order to be conforming?
 

Clive D. W. Feather

Simon Biber said:
Added comp.std.c - we are discussing the effect of conversion of
an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
"either the result is implementation-defined or an
implementation-defined signal is raised." [...]
Why was the 'implementation-defined signal' for signed integer
conversions added in C99? Was there some implementation that
required it, in order to be conforming?

No.

However, the point was raised - and many of us considered it a good one
- that the C89 Standard *requires* the silent generation of a nonsense
value with no easy way to detect that fact. In some programming
situations ("mission-critical code"), you'd much rather the compiler
generated code to trap this case and alert you in some way - a panic is
far better than a bad value slipping into a later calculation.

So we decided to offer this option to the compiler writer. There's no
requirement to take it, but it's available.
 

Douglas A. Gwyn

Clive D. W. Feather said:
However, the point was raised - and many of us considered it a good one
- that the C89 Standard *requires* the silent generation of a nonsense
value with no easy way to detect that fact. In some programming
situations ("mission-critical code"), you'd much rather the compiler
generated code to trap this case and alert you in some way - a panic is
far better than a bad value slipping into a later calculation.

Note that not everybody involved agrees with that reasoning.
In fact this is fundamentally flawed, since such conversions
can occur at translation time (within the #if constant-
expression) but the signal is an execution-time notion.
 

Dan Pop

In comp.std.c Clive D. W. Feather said:
Simon Biber said:
Added comp.std.c - we are discussing the effect of conversion of
an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
"either the result is implementation-defined or an
implementation-defined signal is raised." [...]
Why was the 'implementation-defined signal' for signed integer
conversions added in C99? Was there some implementation that
required it, in order to be conforming?

No.

However, the point was raised - and many of us considered it a good one
- that the C89 Standard *requires* the silent generation of a nonsense
value with no easy way to detect that fact.

C89 offers a very easy way of detecting it, where it actually matters:
compare the value before the conversion to the limits of the target type.

It also allows the detection of these limits, when they are not known at
compile time (see below).
In some programming
situations ("mission-critical code"), you'd much rather the compiler
generated code to trap this case and alert you in some way - a panic is
far better than a bad value slipping into a later calculation.

A panic is seldom desirable in mission-critical code and there is no way
to recover after the generation of such a signal without invoking
undefined behaviour. Therefore, mission-critical code has to do it the
C89 way, anyway.
So we decided to offer this option to the compiler writer. There's no
requirement to take it, but it's available.

It breaks portable C89 code that attempts to find the maximum value
that can be represented in an unknown signed integer type, say type_t:

unsigned long max = -1;

while ((type_t)max < 0 || (type_t)max != max) max >>= 1;

So, it is perfectly possible to write C89 code that is immune to
nonsensical values resulting from the conversion. There is NO way
to rewrite this code in *portable* C99.

Dan
 

lawrence.jones

In comp.std.c Simon Biber said:
Why was the 'implementation-defined signal' for signed integer
conversions added in C99? Was there some implementation that
required it, in order to be conforming?

Because raising an "overflow" signal is an entirely reasonable thing to
do in that situation. In C89, it wasn't entirely clear whether
"implementation-defined behavior" allowed that or not, but in C99 it's
perfectly clear that it does not, so the explicit license was added.

-Larry Jones

This sounds suspiciously like one of Dad's plots to build my character.
-- Calvin
 

Paul Eggert

Douglas A. Gwyn said:
In fact this is fundamentally flawed, since such conversions
can occur at translation time (within the #if constant-
expression) but the signal is an execution-time notion.

But doesn't the standard require a diagnostic if compile-time signed
integer overflow occurs, even in a preprocessor expression?

Perhaps the wording of the standard is flawed, but is there anything
wrong with the intent here? The intent seems to be that compile-time
overflow detection is required, and run-time overflow detection is
allowed but not required.
 

Al Grant

Clive D. W. Feather said:
However, the point was raised - and many of us considered it a good one
- that the C89 Standard *requires* the silent generation of a nonsense
value

No, it *requires* the silent generation of an implementation-defined
result. It does not require a nonsensical definition - implementations
can define it no more or less nonsensically than the unsigned case,
for example.
with no easy way to detect that fact. In some programming
situations ("mission-critical code"), you'd much rather the compiler
generated code to trap this case and alert you in some way - a panic is
far better than a bad value slipping into a later calculation.

In some programming situations ("mission-critical code") you'd
much rather be using a language with a coherent concept of range
types. Even if you go to the expense of implementing traps on
smaller-than-word signed types and bitfields, you still only have
a partial solution to the underlying requirement.
So we decided to offer this option to the compiler writer. There's no
requirement to take it, but it's available.

So why not offer that option for conversion-to-unsigned as well?
Or for overflow on unsigned values generally? Just the other day
I was looking at this:

typedef unsigned int Bool;

struct S {
    Bool flag:1;
};

#define MYFLAG 0x8000

void f(long n, struct S *sp) {
    Bool x = n & MYFLAG; /* oops */
    sp->flag = x;        /* oops */
}
 

Al Grant

lawrence.jones said:
Because raising an "overflow" signal is an entirely reasonable thing to
do in that situation. In C89, it wasn't entirely clear whether
"implementation-defined behavior" allowed that or not

It was entirely clear that it did.

It was also entirely clear that 3.2.1.2 did not use the phrase
"implementation-defined behavior". What it said was "if the
value cannot be represented the result is implementation-defined".
 

Dan Pop

In comp.std.c lawrence.jones said:
Because raising an "overflow" signal is an entirely reasonable thing to
do in that situation. In C89, it wasn't entirely clear whether
"implementation-defined behavior" allowed that or not, but in C99 it's
perfectly clear that it does not, so the explicit license was added.

The C89 text is perfectly clear:

... if the value cannot be represented the result is
implementation-defined.

So, it is only *the result* that is implementation-defined, not any other
aspect of the program's behaviour.

Dan
 

Kevin Easton

In comp.std.c lawrence.jones said:
Because raising an "overflow" signal is an entirely reasonable thing to
do in that situation. In C89, it wasn't entirely clear whether
"implementation-defined behavior" allowed that or not, but in C99 it's
perfectly clear that it does not, so the explicit license was added.

It's a pity it wasn't disabled by default, with the program
having to do something explicit to enable signal on overflow.

- Kevin.
 
