Line input and implementation-defined behaviour

Discussion in 'C Programming' started by Enrico `Trippo' Porreca, Sep 27, 2003.

  1. Both K&R book and Steve Summit's tutorial define a getline() function
    correctly testing the return value of getchar() against EOF.

    I know that getchar() returns either EOF or the character read, as an
    unsigned char converted to int.

    Since char may be signed (and if so, the return value of getchar() would
    be outside its range), doesn't the commented line in the following code
    produce implementation-defined behaviour?

    char s[SIZE];
    int c;
    size_t i = 0;

    while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
        s[i] = c; /* ??? */
        i++;
    }

    s[i] = '\0';

    If this is indeed implementation defined, is there any solution?

    --
    Enrico `Trippo' Porreca
     
    Enrico `Trippo' Porreca, Sep 27, 2003
    #1

  2. Simon Biber Guest

    "Enrico `Trippo' Porreca" <> wrote:
    > Since char may be signed (and if so, the return value of getchar()
    > would be outside its range), doesn't the commented line in the
    > following code produce implementation-defined behaviour?


    Almost. If a character is read whose code is out of the range of
    signed char, it produces an implementation-defined result, or an
    implementation-defined signal is raised. This is not quite as bad
    as implementation-defined behaviour, but almost.

    > char s[SIZE];
    > int c;
    > size_t i = 0;
    >
    > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    >     s[i] = c; /* ??? */
    >     i++;
    > }
    >
    > s[i] = '\0';
    >
    > If this is indeed implementation defined, is there any solution?


    If char is signed, and the value of the character is outside the
    range of signed char, then you have an out-of-range conversion to
    a signed integer type, so: "either the result is implementation-defined
    or an implementation-defined signal is raised." (C99 6.3.1.3#3)

    However, because this is such an incredibly common operation in
    existing C code, an implementor would be absolutely idiotic to
    define this to have any undesired effects.
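
Whether the out-of-range conversion described above can occur at all depends on the range of plain char, which can be checked against the limits in <limits.h> before storing. A minimal sketch; the helper name `fits_in_char` is hypothetical and does not come from the thread.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if c can be stored in a plain char
   without an out-of-range conversion, 0 otherwise. */
int fits_in_char(int c)
{
    return c >= CHAR_MIN && c <= CHAR_MAX;
}
```

Where plain char is unsigned, every non-EOF value returned by getchar() passes this check; where it is signed, values above CHAR_MAX do not.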

    --
    Simon.
     
    Simon Biber, Sep 27, 2003
    #2

  3. Simon Biber wrote:
    >>char s[SIZE];
    >>int c;
    >>size_t i = 0;
    >>
    >>while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    >>    s[i] = c; /* ??? */
    >>    i++;
    >>}
    >>
    >>s[i] = '\0';
    >>
    >>If this is indeed implementation defined, is there any solution?

    >
    > If char is signed, and the value of the character is outside the
    > range of signed char, then you have an out-of-range conversion to
    > a signed integer type, so: "either the result is implementation-defined
    > or an implementation-defined signal is raised." (C99 6.3.1.3#3)
    >
    > However, because this is such an incredibly common operation in
    > existing C code, an implementor would be absolutely idiotic to
    > define this to have any undesired effects.


    I agree, but AFAIK the implementor is allowed to be an idiot...
    Am I right?

    Is the following a plausible solution (i.e. without any trap
    representation or type conversion or something-defined behaviour problem)?

    char s[SIZE];
    unsigned char *t = (unsigned char *) s;
    int c;
    size_t i = 0;

    while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
        t[i] = c; /* ??? */
        i++;
    }

    s[i] = '\0';
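
The same idea can be sketched as a pair of functions. This assumes reading from a FILE * rather than stdin, and the names `store_byte` and `read_line_bytes` are hypothetical. Unlike the loop above, the bound is tested before reading, so a character is never read and then dropped.

```c
#include <stddef.h>
#include <stdio.h>

/* Stores the value returned by getc() through an unsigned char lvalue,
   so no out-of-range conversion to a signed type takes place. */
void store_byte(char *s, size_t i, int c)
{
    unsigned char *t = (unsigned char *)s;
    t[i] = (unsigned char)c;
}

/* Reads up to size - 1 characters of one line from fp into s and
   terminates it; the newline is consumed but not stored.
   Returns the number of characters stored. Assumes size > 0. */
size_t read_line_bytes(FILE *fp, char *s, size_t size)
{
    int c;
    size_t i = 0;

    while (i < size - 1 && (c = getc(fp)) != EOF && c != '\n') {
        store_byte(s, i, c);
        i++;
    }
    s[i] = '\0';
    return i;
}
```

Whether reading the stored bytes back as plain char is entirely free of problems is the question the replies go on to discuss.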

    --
    Enrico `Trippo' Porreca
     
    Enrico `Trippo' Porreca, Sep 27, 2003
    #3
  4. Simon Biber Guest

    "Enrico `Trippo' Porreca" <> wrote:
    > I agree, but AFAIK the implementor is allowed to be idiot...
    > Am I right?


    Yes, but trust me, anyone who fouled up the char<->int conversion
    would break a large proportion of existing code that is considered
    to be completely portable. Therefore their implementation would
    not sell.

    Consider the <ctype.h> functions, which require that the input is
    an int whose value is within the range of unsigned char. That is
    why we suggest that people cast to unsigned char like this:

    char *p, s[] = "hello";
    for (p = s; *p; p++)
        *p = toupper((unsigned char)*p);

    If the value of *p was negative, then after conversion to unsigned
    char it is positive and outside the range of signed char. It could
    therefore, in theory, also be outside the range of int, if int and
    signed char have the same range. So you have the same situation in
    reverse: the unsigned char to int conversion is not guaranteed to be
    within range.
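
The cast idiom can be wrapped in a small function. A minimal sketch; the name `upcase` is hypothetical.

```c
#include <ctype.h>

/* Upper-cases a NUL-terminated string in place. */
void upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++) {
        /* Cast first so toupper() never receives a negative value. */
        *p = (char)toupper((unsigned char)*p);
    }
}
```

Without the cast, a negative char value other than EOF passed to toupper() gives undefined behaviour.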

    > Is the following a plausible solution (i.e. without any trap
    > representation or type conversion or something-defined behaviour
    > problem)?
    >
    > char s[SIZE];
    > unsigned char *t = (unsigned char *) s;
    > int c;
    > size_t i = 0;
    >
    > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    >     t[i] = c; /* ??? */


    The assignment itself is safe, but since it places an arbitrary
    representation into the elements of the array s, which are char
    objects and possibly signed, it might generate a trap
    representation. That is if signed char can have trap
    representations. I'm not completely sure.

    >     i++;
    > }
    >
    > s[i] = '\0';


    --
    Simon.
     
    Simon Biber, Sep 27, 2003
    #4
  5. Malcolm Guest

    "Simon Biber" <> wrote in message
    >
    > > char s[SIZE];
    > > unsigned char *t = (unsigned char *) s;
    > > int c;
    > > size_t i = 0;
    > >
    > > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    > >     t[i] = c; /* ??? */

    s[i] = 0;
    >
    > The assignment itself is safe, but since it places an arbitrary
    > representation into the elements of the array s, which are char
    > objects and possibly signed, it might generate a trap
    > representation. That is if signed char can have trap
    > representations. I'm not completely sure.
    >

    signed chars can trap. unsigned chars are guaranteed to be able to hold
    arbitrary data so cannot.
    You would have to be desperately unlucky for the implementation to allow
    non-chars to be read in from stdin, and then for the function to trap. The
    most likely place for the trap to trigger would be the assignment s[i] = 0,
    since the compiler probably won't realise that pointer t actually points to
    a buffer declared as straight char.
     
    Malcolm, Sep 27, 2003
    #5
  6. "Malcolm" <> wrote in message
    news:bl52k9$ure$...
    >
    > "Simon Biber" <> wrote in message
    > >
    > > > char s[SIZE];
    > > > unsigned char *t = (unsigned char *) s;
    > > > int c;
    > > > size_t i = 0;
    > > >
    > > > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    > > >     t[i] = c; /* ??? */
    >
    > s[i] = 0;
    > >
    > > The assignment itself is safe, but since it places an arbitrary
    > > representation into the elements of the array s, which are char
    > > objects and possibly signed, it might generate a trap
    > > representation. That is if signed char can have trap
    > > representations. I'm not completely sure.
    > >

    > signed chars can trap. unsigned chars are guaranteed to be able to hold
    > arbitrary data so cannot.
    > You would have to be desperately unlucky for the implementation to allow
    > non-chars to be read in from stdin, and then for the function to trap. The
    > most likely place for the trap to trigger would be the assignment
    > s[i] = 0,

    0 is a value in the range of signed char, so it is not possible for a
    conforming compiler to replace the contents of object s with a trap
    representation.

    [You can always initialise an uninitialised automatic variable, for instance,
    even if its uninitialised state is a trap representation.]

    > since the compiler probably won't realise that pointer t actually points
    > to a buffer declared as straight char.


    You seem to be confusing 'trap representations' for 'trap'. The latter term
    commonly being used for raised exceptions on many architectures. A trap
    representation, in and of itself, need not raise an exception.

    Indeed, whilst the standards allow signed char to have trap representations,
    sections like 6.2.6.1p5 effectively say that all reads via character lvalues
    are privileged. So at worst, it would seem, reading a character trap
    representation will only yield an unspecified value. [Non-trapping trap
    representations!]
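
That exemption for character-type access is what makes byte-wise inspection of any object possible. A minimal sketch that touches an object only through unsigned char lvalues; the name `copy_representation` is hypothetical, and the function does nothing memcpy() does not already do.

```c
#include <stddef.h>

/* Copies the object representation of src into dst one byte at a time,
   reading and writing only through unsigned char lvalues. */
void copy_representation(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    for (i = 0; i < n; i++)
        d[i] = s[i];
}
```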

    --
    Peter
     
    Peter Nilsson, Sep 28, 2003
    #6
  7. Malcolm Guest

    "Peter Nilsson" <> wrote in message
    >
    > > The most likely place for the trap to trigger would be the assignment
    > > s[i] = 0,

    >
    > 0 is a value in the range of signed char, so it is not possible for a
    > conforming compiler to replace the contents of object s with a trap
    > representation.
    >

    What I meant was that the assignment may trigger the trap, if illegal
    characters are stored into the array s. This is because values from s may be
    loaded into registers as chars.
    >
    > Indeed, whilst the standards allow signed char to have trap
    > representations, sections like 6.2.6.1p5 effectively say that all reads
    > via character lvalues are privileged. So at worst, it would seem,
    > reading a character trap representation will only yield an unspecified
    > value. [Non-trapping trap representations!]
    >
    >

    It seems it would be unacceptable for the line

    fgets(line, sizeof line, fp);

    to cause a program abort if fed an illegal character, with nothing the
    programmer can do to stop it. OTOH reads are the most likely way for corrupt
    data to get into the program, and the whole point of trap representations is
    to close down any program that is malfunctioning.
     
    Malcolm, Sep 28, 2003
    #7
  8. Simon Biber wrote:
    >> I agree, but AFAIK the implementor is allowed to be idiot...
    >> Am I right?

    >
    > Yes, but trust me, anyone who fouled up the char<->int conversion
    > would break a large proportion of existing code that is considered
    > to be completely portable. Therefore their implementation would
    > not sell.


    Uhm... So I think I should use K&R's getline(), without being too
    paranoid about it...

    Thanks.

    --
    Enrico `Trippo' Porreca
     
    Enrico `Trippo' Porreca, Sep 28, 2003
    #8
  9. CBFalconer Guest

    Enrico `Trippo' Porreca wrote:
    > Simon Biber wrote:
    >
    > >> I agree, but AFAIK the implementor is allowed to be idiot...
    > >> Am I right?

    > >
    > > Yes, but trust me, anyone who fouled up the char<->int conversion
    > > would break a large proportion of existing code that is considered
    > > to be completely portable. Therefore their implementation would
    > > not sell.

    >
    > Uhm... So I think I should use K&R's getline(), without being too
    > paranoid about it...


    Consider ggets, available at:

    <http://cbfalconer.home.att.net/download/>

    which has the convenience of gets without the insecurities.

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
     
    CBFalconer, Sep 28, 2003
    #9
  10. Dan Pop Guest

    In <3f75cf48$0$4189$> "Simon Biber" <> writes:

    >"Enrico `Trippo' Porreca" <> wrote:
    >> Since char may be signed (and if so, the return value of getchar()
    >> would be outside its range), doesn't the commented line in the
    >> following code produce implementation-defined behaviour?

    >
    >Almost. If a character is read whose code is out of the range of
    >signed char, it produces an implementation-defined result, or an
    >implementation-defined signal is raised. This is not quite as bad
    >as implementation-defined behaviour, but almost.


    No implementation-defined signal is raised in C89 and I strongly doubt
    that any *real* C99 implementation would do that, breaking existing C89
    code.

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
    Email:
     
    Dan Pop, Sep 29, 2003
    #10
  11. Simon Biber Guest

    Added comp.std.c - we are discussing the effect of conversion of
    an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
    "either the result is implementation-defined or an
    implementation-defined signal is raised."

    "Dan Pop" <> wrote:
    > "Simon Biber" <> writes:
    > >Almost. If a character is read whose code is out of the range of
    > >signed char, it produces an implementation-defined result, or an
    > >implementation-defined signal is raised. This is not quite as bad
    > >as implementation-defined behaviour, but almost.

    >
    > No implementation-defined signal is raised in C89 and I strongly doubt
    > that any *real* C99 implementation would do that, breaking existing C89
    > code.


    Why was the 'implementation-defined signal' for signed integer
    conversions added in C99? Was there some implementation that
    required it, in order to be conforming?

    --
    Simon.
     
    Simon Biber, Sep 29, 2003
    #11
  12. In article <3f789d28$0$26924$>, Simon Biber
    <> writes
    >Added comp.std.c - we are discussing the effect of conversion of
    >an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
    > "either the result is implementation-defined or an
    > implementation-defined signal is raised."

    [...]
    >Why was the 'implementation-defined signal' for signed integer
    >conversions added in C99? Was there some implementation that
    >required it, in order to be conforming?


    No.

    However, the point was raised - and many of us considered it a good one
    - that the C89 Standard *requires* the silent generation of a nonsense
    value with no easy way to detect that fact. In some programming
    situations ("mission-critical code"), you'd much rather the compiler
    generated code to trap this case and alert you in some way - a panic is
    far better than a bad value slipping into a later calculation.

    So we decided to offer this option to the compiler writer. There's no
    requirement to take it, but it's available.

    --
    Clive D.W. Feather, writing for himself | Home: <>
    Tel: +44 20 8371 1138 (work) | Web: <http://www.davros.org>
    Fax: +44 870 051 9937 | Work: <>
    Written on my laptop; please observe the Reply-To address
     
    Clive D. W. Feather, Sep 30, 2003
    #12
  13. "Clive D. W. Feather" wrote:
    > Simon Biber <> writes
    > >Added comp.std.c - we are discussing the effect of conversion of
    > >an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
    > > "either the result is implementation-defined or an
    > > implementation-defined signal is raised."
    > >Why was the 'implementation-defined signal' for signed integer
    > >conversions added in C99? Was there some implementation that
    > >required it, in order to be conforming?

    > However, the point was raised - and many of us considered it a good one
    > - that the C89 Standard *requires* the silent generation of a nonsense
    > value with no easy way to detect that fact. In some programming
    > situations ("mission-critical code"), you'd much rather the compiler
    > generated code to trap this case and alert you in some way - a panic is
    > far better than a bad value slipping into a later calculation.


    Note that not everybody involved agrees with that reasoning.
    In fact this is fundamentally flawed, since such conversions
    can occur at translation time (within the #if constant-
    expression) but the signal is an execution-time notion.
     
    Douglas A. Gwyn, Sep 30, 2003
    #13
  14. Dan Pop Guest

    In <iSuee7DwZTe$> "Clive D. W. Feather" <> writes:

    >In article <3f789d28$0$26924$>, Simon Biber
    ><> writes
    >>Added comp.std.c - we are discussing the effect of conversion of
    >>an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
    >> "either the result is implementation-defined or an
    >> implementation-defined signal is raised."

    >[...]
    >>Why was the 'implementation-defined signal' for signed integer
    >>conversions added in C99? Was there some implementation that
    >>required it, in order to be conforming?

    >
    >No.
    >
    >However, the point was raised - and many of us considered it a good one
    >- that the C89 Standard *requires* the silent generation of a nonsense
    >value with no easy way to detect that fact.


    C89 offers a very easy way of detecting it, where it actually matters:
    compare the value before the conversion to the limits of the target type.

    It also allows the detection of these limits, when they are not known at
    compile time (see below).

    >In some programming
    >situations ("mission-critical code"), you'd much rather the compiler
    >generated code to trap this case and alert you in some way - a panic is
    >far better than a bad value slipping into a later calculation.


    A panic is seldom desirable in mission-critical code and there is no way
    to recover after the generation of such a signal without invoking
    undefined behaviour. Therefore, mission-critical code has to do it the
    C89 way, anyway.
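
The "C89 way" of comparing against the limits of the target type before converting might look like this. A minimal sketch, assuming signed char as the target type; the name `to_schar_checked` is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: converts v to signed char only when it fits.
   Returns 1 and stores the result on success, 0 otherwise. */
int to_schar_checked(long v, signed char *out)
{
    if (v < SCHAR_MIN || v > SCHAR_MAX)
        return 0;               /* would be out of range: refuse */
    *out = (signed char)v;      /* in range, so the value is preserved */
    return 1;
}
```

Because the out-of-range conversion is never performed, neither an implementation-defined result nor a signal can arise.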

    >So we decided to offer this option to the compiler writer. There's no
    >requirement to take it, but it's available.


    It breaks portable C89 code that attempts to find the maximum value
    that can be represented in an unknown signed integer type, say type_t:

    unsigned long max = -1;

    while ((type_t)max < 0 || (type_t)max != max) max >>= 1;

    So, it is perfectly possible to write C89 code that is immune to
    nonsensical values resulting from the conversion. There is NO way
    to rewrite this code in *portable* C99.
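
For illustration, the loop can be wrapped in a function with `type_t` fixed to short, an arbitrary choice made only for this sketch. It relies on each out-of-range cast silently producing a result, which is exactly the assumption under discussion; the function names are hypothetical.

```c
#include <limits.h>

/* type_t stands for the "unknown" signed integer type; short is an
   arbitrary choice for this illustration. */
typedef short type_t;

/* Returns the largest value representable in type_t, using the loop from
   the post. Each out-of-range cast relies on the implementation yielding
   a result rather than raising a signal. */
unsigned long type_t_max(void)
{
    unsigned long max = -1;     /* all bits set: ULONG_MAX */

    while ((type_t)max < 0 || (type_t)max != max)
        max >>= 1;
    return max;
}

/* Cross-check against the limit published in <limits.h>. */
int type_t_max_matches_limits(void)
{
    return type_t_max() == (unsigned long)SHRT_MAX;
}
```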

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
    Email:
     
    Dan Pop, Sep 30, 2003
    #14
  15. Enrico `Trippo' Porreca

    Guest

    In comp.std.c Simon Biber <> wrote:
    >
    > Why was the 'implementation-defined signal' for signed integer
    > conversions added in C99? Was there some implementation that
    > required it, in order to be conforming?


    Because raising an "overflow" signal is an entirely reasonable thing to
    do in that situation. In C89, it wasn't entirely clear whether
    "implementation-defined behavior" allowed that or not, but in C99 it's
    perfectly clear that it does not, so the explicit license was added.

    -Larry Jones

    This sounds suspiciously like one of Dad's plots to build my character.
    -- Calvin
     
    , Sep 30, 2003
    #15
  16. Paul Eggert Guest

    At Tue, 30 Sep 2003 14:52:05 GMT, "Douglas A. Gwyn" <> writes:

    > In fact this is fundamentally flawed, since such conversions
    > can occur at translation time (within the #if constant-
    > expression) but the signal is an execution-time notion.


    But doesn't the standard require a diagnostic if compile-time signed
    integer overflow occurs, even in a preprocessor expression?

    Perhaps the wording of the standard is flawed, but is there anything
    wrong with the intent here? The intent seems to be that compile-time
    overflow detection is required, and run-time overflow detection is
    allowed but not required.
     
    Paul Eggert, Sep 30, 2003
    #16
  17. Al Grant Guest

    "Clive D. W. Feather" <> wrote in message news:<iSuee7DwZTe$>...
    > However, the point was raised - and many of us considered it a good one
    > - that the C89 Standard *requires* the silent generation of a nonsense
    > value


    No, it *requires* the silent generation of an implementation-defined
    result. It does not require a nonsensical definition - implementations
    can define it no more or less nonsensically than the unsigned case,
    for example.

    > with no easy way to detect that fact. In some programming
    > situations ("mission-critical code"), you'd much rather the compiler
    > generated code to trap this case and alert you in some way - a panic is
    > far better than a bad value slipping into a later calculation.


    In some programming situations ("mission-critical code") you'd
    much rather be using a language with a coherent concept of range
    types. Even if you go to the expense of implementing traps on
    smaller-than-word signed types and bitfields, you still only have
    a partial solution to the underlying requirement.

    > So we decided to offer this option to the compiler writer. There's no
    > requirement to take it, but it's available.


    So why not offer that option for conversion-to-unsigned as well?
    Or for overflow on unsigned values generally? Just the other day
    I was looking at this:

    typedef unsigned int Bool;

    struct S {
        Bool flag:1;
    };

    #define MYFLAG 0x8000

    void f(long n, struct S *sp) {
        Bool x = n & MYFLAG; /* oops */
        sp->flag = x; /* oops */
    }
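
The silent truncation in this example can be made visible by putting the original assignment next to one possible fix. A minimal sketch reusing the declarations from the post; the function names `set_flag_truncating` and `set_flag_normalised` are hypothetical.

```c
typedef unsigned int Bool;

struct S {
    Bool flag:1;
};

#define MYFLAG 0x8000

/* As in the post: the masked value 0x8000 is reduced modulo 2 when stored
   in the 1-bit field, so the flag silently becomes 0. */
void set_flag_truncating(long n, struct S *sp)
{
    Bool x = n & MYFLAG;
    sp->flag = x;
}

/* One possible fix: reduce to 0 or 1 before storing. */
void set_flag_normalised(long n, struct S *sp)
{
    sp->flag = (n & MYFLAG) != 0;
}
```

Both conversions are well defined for unsigned types, which is why no diagnostic is required for the first version.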
     
    Al Grant, Oct 1, 2003
    #17
  18. Al Grant Guest

    wrote in message news:<>...
    > In comp.std.c Simon Biber <> wrote:
    > > Why was the 'implementation-defined signal' for signed integer
    > > conversions added in C99? Was there some implementation that
    > > required it, in order to be conforming?

    >
    > Because raising an "overflow" signal is an entirely reasonable thing to
    > do in that situation. In C89, it wasn't entirely clear whether
    > "implementation-defined behavior" allowed that or not


    It was entirely clear that it did.

    It was also entirely clear that 3.2.1.2 did not use the phrase
    "implementation-defined behavior". What it said was "if the
    value cannot be represented the result is implementation-defined".
     
    Al Grant, Oct 1, 2003
    #18
  19. Dan Pop Guest

    In <> writes:

    >In comp.std.c Simon Biber <> wrote:
    >>
    >> Why was the 'implementation-defined signal' for signed integer
    >> conversions added in C99? Was there some implementation that
    >> required it, in order to be conforming?

    >
    >Because raising an "overflow" signal is an entirely reasonable thing to
    >do in that situation. In C89, it wasn't entirely clear whether
    >"implementation-defined behavior" allowed that or not, but in C99 it's
    >perfectly clear that it does not, so the explicit license was added.


    The C89 text is perfectly clear:

    ... if the value cannot be represented the result is
    implementation-defined.

    So, it is only *the result* that is implementation-defined, not any other
    aspect of the program's behaviour.

    Dan
    --
    Dan Pop
    DESY Zeuthen, RZ group
    Email:
     
    Dan Pop, Oct 1, 2003
    #19
  20. Kevin Easton Guest

    In comp.lang.c wrote:
    > In comp.std.c Simon Biber <> wrote:
    >>
    >> Why was the 'implementation-defined signal' for signed integer
    >> conversions added in C99? Was there some implementation that
    >> required it, in order to be conforming?

    >
    > Because raising an "overflow" signal is an entirely reasonable thing to
    > do in that situation. In C89, it wasn't entirely clear whether
    > "implementation-defined behavior" allowed that or not, but in C99 it's
    > perfectly clear that it does not, so the explicit license was added.


    It's a pity it wasn't disabled by default, with the program
    having to do something explicit to enable signal on overflow.

    - Kevin.
     
    Kevin Easton, Oct 1, 2003
    #20
