Type of a string literal

Discussion in 'C Programming' started by Old Wolf, Dec 19, 2013.

  1. Old Wolf

    Old Wolf Guest

    C99 6.4.5/5 seems to say that string literals have type char[].

    However, the version of GCC I have installed fails to compile the following
    code, giving the error
    foo.c:6: error: assignment of read-only location

    This happens with "-ansi" and with "-std=c99", with or without the
    other usual warning switches.

    Is this a GCC bug or am I misinterpreting the standard?


    #include <stdio.h>

    int main(void)
    {
        if ( 0 )
            "abc"[0] = 1;

        return 0;
    }
     
    Old Wolf, Dec 19, 2013
    #1

  2. .... or of type wchar_t.
    6.4.5/6 says "If the program attempts to modify such an array, the
    behavior is undefined" so the issue is whether your program attempts to
    modify the array. It certainly doesn't look like it:
    I get the same error if I move the assignment into a static function
    which is never called. gcc then both complains about the assignment and
    tells me that the function is never called.

    It looks to me like gcc is overstepping the mark. clang is happy with
    it.
     
    Ben Bacarisse, Dec 19, 2013
    #2

  3. As I understand it, in K&R C string constants were writable.

    While you won't normally do it that way, you might have a pointer to
    one, and write over the constant using that.

    ANSI removed that ability, but compilers should still support it given
    the appropriate option.

    Or maybe you are suggesting that the paragraph quoted should have
    included the 'const' qualifier.

    On the other hand, inside an if(0) it seems a little mean of the
    compiler to disallow it. Could be a warning, though.

    -- glen
     
    glen herrmannsfeldt, Dec 19, 2013
    #3
  4. Old Wolf

    James Kuyper Guest

    It actually doesn't specify the type of string literals directly. It
    does specify that "The multibyte character sequence is then used to
    initialize an array of static storage duration and length just
    sufficient to contain the sequence.", so I would say it's more
    appropriate for the type to be char[N] for the appropriate value of
    N. It does specify the type of the elements of the array: it's char if
    there's no prefix or a u8 prefix, and wchar_t, char16_t, or char32_t
    for the L, u, or U prefix respectively.

    In most contexts, an lvalue of array type decays into a pointer to the
    appropriate element type, so the array length doesn't matter. However,
    there are three contexts where that's not the case: sizeof("Hello") should
    be 6, &"world!" should have the type char(*)[7], and the declaration:

    char greeting[] = "Hello world!";

    is equivalent to

    char greeting[13] = {'H', 'e', 'l', 'l', 'o', ' ',
                         'w', 'o', 'r', 'l', 'd', '!', '\0'};

    "If the program attempts to modify such an array, the behavior is
    undefined" (6.4.5p5). If "const" had been part of the original C
    language, string literals should have had a const-qualified type.
    However, by the time "const" was added, too much legacy code had been
    written which would have broken if that change were made.
     
    James Kuyper, Dec 19, 2013
    #4
  5. Old Wolf

    Eric Sosman Guest

    Not a bug, because a compiler is always allowed to issue
    any diagnostics it likes. That's why gcc can issue a warning
    for `if (x = 0) ...' even though it's perfectly legal C.

    If the assignment were attempted, the behavior would be
    undefined. The anonymous array created by the string literal
    is of type `char[4]' rather than `const char[4]', but that's
    a historical accident: `const' was a latecomer to C. Despite
    the array's non-`const'-ness, attempting to modify it yields
    undefined behavior.

    Perhaps one could argue that the compiler should have seen
    that the assignment would never be executed, optimized the whole
    business away, and suppressed the warning -- but that's an argument
    to have with the compiler developers, not with the language.

    More on the "historical accident": Before `const' there was no
    way for a function to advertise that it didn't intend to write through
    a pointer argument. A function operating on a string looked like:

    int oldfunc(string)
    char *string;
    { ... }

    It looked this way whether it wanted read-only or read-write access;
    the function definition was the same either way. Along came `const'
    and it became possible to state the difference:

    int readwrite(string)
    char *string;
    { ... }

    int readonly(string)
    const char *string;
    { ... }

    or with prototypes (which came along at the same time):

    int readwrite(char *string) { ... }

    int readonly(const char *string) { ... }

    At this point the Committee *could* have `const'-ified the string
    literal's array, but then what would have happened to oldfunc() --
    to all those oldfunc()'s in their myriad thousands in C code that
    had been written in the two decades preceding the ANSI Standard?
    Every attempt to call them with literal arguments would suddenly
    become an "error by fiat" -- with a diagnostic required, no less.
    How eager would folks have been to adopt a brand-new Standard whose
    immediate effect was to delegitimize a huge amount of pre-existing
    code? So the Committee took the "impure" but intensely practical
    stance that the literal arrays would be "non-`const' but please
    don't write them." That's the situation that still prevails.
     
    Eric Sosman, Dec 19, 2013
    #5
  6. N1570 6.5.1p4 (in the section on primary expressions) says:

    A string literal is a primary expression. It is an lvalue with type
    as detailed in 6.4.5.

    It would be better IMHO if 6.4.5 specified the type of the literal, as
    6.4.4 does for constants.

    As far as I can tell, there's no explicit statement about the *value* of
    a string literal. It's obvious that it's the value of the static array
    object described in 6.4.5, but the standard doesn't actually say so.
     
    Keith Thompson, Dec 19, 2013
    #6
  7. I'll have to check, but I don't think K&R1 *required* string literals to
    be writable.

    As of C89 (and C90, and C99, and C11), string literals are not const,
    but attempting to modify them (more precisely, the static arrays
    associated with them) has undefined behavior.

    There's no particular reason for a modern compiler to support
    modifying string literals, except perhaps to support old (bad) code.
    There's certainly no requirement in the standard to support it.

    Making string literals const (as C++ did) would have broken existing
    code. Prior to the 1989 ANSI C standard, the "const" qualifier didn't
    exist. A code snippet like this:

    int func(char *s) { /* ... */ }
    ...
    func("hello");

    would be illegal in C89/C90 if string literals were const; the solution
    would be to change the parameter to "const char *s" (which is a good
    idea anyway), but that wasn't possible in pre-ANSI C. The existing
    rule is a necessary compromise.

    [...]
     
    Keith Thompson, Dec 19, 2013
    #7
  8. It doesn't *explicitly* say that, but combined with 6.5.1p4 we can
    determine that the type of "hello" is char[6].

    It's a gcc bug, corrected in a later version.

    With gcc 4.1.2, I see the same error message you do.

    With gcc 4.7.1, I get:

    foo.c: In function 'main':
    foo.c:6:17: warning: assignment of read-only location '"abc"[0]' [enabled by default]

    which is a reasonable warning, but not a diagnostic required by the
    standard.

    A conforming hosted C implementation may not reject this program, since
    it doesn't violate any syntax rules or constraints. (Though I suppose a
    sufficiently perverse compiler could claim that it exceeds some capacity
    limit.)
     
    Keith Thompson, Dec 19, 2013
    #8
  9. Yes, it's a bug, because it's a fatal error, not a warning.
    The compiler rejects the program. The bug was corrected in a later
    version of gcc.

    [...]
     
    Keith Thompson, Dec 19, 2013
    #9
  10. Old Wolf

    osmium Guest

    That was one of the big criticisms of Schildt's books. He treated them as
    writable and the standard said (or implied) otherwise. The compilers I used
    always allowed me to write in them - but I didn't.
     
    osmium, Dec 19, 2013
    #10
  11. I suppose, and I think I wouldn't complain if it issued a
    warning for this one, but it issued an error instead.

    (snip)
    But it was an error, not a warning.

    Even without the if(0) the compiler doesn't know that it will
    ever be executed.

    -- glen
     
    glen herrmannsfeldt, Dec 19, 2013
    #11
  12. Old Wolf

    Eric Sosman Guest

    Thanks for the correction (and to Keith Thompson, too).

    The argument (for warning or for error) goes the other
    way around: The compiler needn't prove an execution *will*
    be attempted, but should in a case as simple as this one be
    able to prove that it *won't* be. I'm sure gcc can do such
    proofs at suitable optimization levels, but the larger
    question of whether to complain about "optimized out" code
    remains open. It seems to me the compiler does the developer
    a service by pointing out problems even in optimized out
    sections, since the reasons for optimizing out today may not
    obtain tomorrow (when a different and less helpful compiler
    may be in use). Replace `if(0)' with `if(CHAR_BIT!=8)' to
    get a code block that will be optimized away on nearly every
    platform, but should still be inspected and perhaps warned
    about.
     
    Eric Sosman, Dec 19, 2013
    #12
  13. Old Wolf

    Tim Rentsch Guest

    More precisely, because the program is strictly conforming. A
    conforming implementation is required to accept any strictly
    conforming program, but not any other programs. Not violating
    any syntax rule or constraint is one aspect of being strictly
    conforming, but not the only aspect. However this program
    qualifies on those other aspects also.

    Exceeding a capacity limit provides a basis for not being able to
    execute a program successfully, but not for rejecting (ie, not
    accepting) a program. Any strictly conforming program must be
    accepted by a conforming implementation, even if the resultant
    executable cannot be run successfully.
     
    Tim Rentsch, Dec 20, 2013
    #13
  14. What about 4p3?

    A program that is correct in all other aspects, operating on correct
    data, containing unspecified behavior shall be a correct program and
    act in accordance with 5.1.2.3.

    This program:

    #include <stdio.h>
    #include <limits.h>
    int main(void) {
    printf("%d\n", INT_MAX);
    }

    is not strictly conforming, since its behavior is
    implementation-defined, but I don't believe a compiler is permitted
    to reject it because of that.

    Practically speaking, any compiler with finite resources (more briefly,
    "any compiler") will have some programs that are just too big for it to
    process. A likely example:

    int main(void)
    {
    [6.02e23 lines of "{" omitted]
    [6.02e23 lines of "}" omitted]
    }

    This obviously exceeds the minimal translation limits specified in
    5.2.4.1, and very likely exceeds the actual translation limits of any
    given compiler. 5.2.4.1 requires a conforming implementation to
    "translate and execute" a program that hits all the minimal limits (127
    nesting levels of blocks, etc.), which I believe implies that it's not
    intended to translate *or* execute programs that exceed those limits.
     
    Keith Thompson, Dec 20, 2013
    #14