VLA question

Discussion in 'C Programming' started by Philip Lantz, Jun 14, 2013.

  1. Sure I was using a system with CHAR_BIT==8.

    My point was that I think Stephen is mistaken in his statement that:

    The only example I can envision a problem with is a character
    literal that today is negative. IIRC, the conversion to char is
    well-defined in that case. However, if character literals were
    char, it'd have a large positive value.

    I don't think that changing character constants from int to char
    would cause the values of any such constants to change from negative
    to positive, assuming the signedness of char isn't changed at the
    same time.
    Keith Thompson, Jun 27, 2013

  2. "The precision of an integer type is the number of bits it uses to
    represent values, excluding any sign and padding bits."

    In your example, the padding bit means that the precision of int is less
    than that of signed char. But that's not allowed because

    "[t]he rank of a signed integer type shall be greater than the rank of
    any signed integer type with less precision"


    "[t]he rank of long long int shall be greater than the rank of long
    int, which shall be greater than the rank of int, which shall be
    greater than the rank of short int, which shall be greater than the
    rank of signed char".
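
    A minimal sketch of what "precision" means in the quoted text, assuming
    it can be recovered by counting the value bits of each type's maximum.
    The helper names value_bits and precision_is_ordered are made up for
    illustration and are not part of the standard library.

```c
#include <limits.h>
#include <stdint.h>

/* Count the bits needed to represent max (a type's *_MAX value).
   For a signed type this is its precision: sign and padding bits
   do not contribute to the maximum value. */
int value_bits(uintmax_t max)
{
    int bits = 0;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Returns 1 if precision never decreases from signed char up to
   long long, which is what the quoted rank rules imply together. */
int precision_is_ordered(void)
{
    int sc = value_bits(SCHAR_MAX);
    int sh = value_bits(SHRT_MAX);
    int in = value_bits(INT_MAX);
    int lo = value_bits(LONG_MAX);
    int ll = value_bits(LLONG_MAX);
    return sc <= sh && sh <= in && in <= lo && lo <= ll;
}
```

    On a conforming implementation precision_is_ordered() should return 1;
    an int with fewer value bits than signed char would contradict the two
    quoted rank requirements.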
    Ben Bacarisse, Jun 28, 2013

  3. Philip Lantz

    Öö Tiib Guest

    Do not dump that common subset just because it is not idiomatic C++. I
    agree with you about different languages and idioms, but that does not
    matter when a situation can be solved by using the common subset.

    Example of a good property of the common subset of C and C++: it is
    better and more modern C than C89. (By my taste and some others', YMMV.)

    Example situation where that property helps: the code is required to be C
    and one of (multiple) targets is required to be the Microsoft compiler.
    The Microsoft C compiler is basically C89, but the Microsoft C++ compiler
    compiles the "common subset" pretty well (few warnings to silence).

    See? The code is going to be far from idiomatic C++, but that does not
    matter; it will be compiled with a C++ compiler regardless.
    Öö Tiib, Jun 28, 2013
  4. Philip Lantz

    Öö Tiib Guest

    If we are preprocessing for a C++11 compiler then something like:

    ptr->link = (decltype(ptr->link))malloc(sizeof *ptr->link);

    If for C++03 or earlier then it's a *lot* harder. Extensions may help.
    For example, g++ had a 'typeof' extension that worked like 'decltype'.
    It is basically the same.

    array = (decltype(array))malloc(n * sizeof *array);

    I pretty much see that this cast is redundant and
    annoying from the perspective of the code's readability.
    Yes, Stroustrup proposed 'decltype' in 2002, but simple things like it
    take decades, as always. Recent versions of gcc, msvc, clang, icc
    and even Borland's C++ Builder seem to recognize decltype, so it is
    quite widely supported.
    There may be other, even trickier cases. When a tool fails to
    understand a piece of code, that indicates that a human might
    fail as well. So it may be better to simplify code, not smartify tools.
    Öö Tiib, Jun 28, 2013
  5. Philip Lantz

    James Kuyper Guest

    He was just paraphrasing what I said - if he was wrong, I was wrong.

    "If an integer character constant contains a single
    character or escape sequence, its value is the one that results when an
    object with type char whose value is that of the single character or
    escape sequence is converted to type int."

    If a char object contains the representation of a value greater than
    INT_MAX, when that value is converted to int, the result will be
    negative. Therefore, under the current rules, the corresponding
    character literals must have a negative value. If the rules were changed
    to give them the type char, they would have the actual value of the
    corresponding char objects, which would be greater than INT_MAX.
    James Kuyper, Jun 28, 2013
  6. Philip Lantz

    Eric Sosman Guest

    Perhaps I've misunderstood (not for the first time, nor the
    last): I thought the cast-adder would produce code that was valid
    in the much-discussed "common subset" of C++ and C. Indeed, the
    word "redundant" suggests that line, since only in C would the
    cast be unnecessary. But then, the solution you offer is quite
    clearly not part of any "common subset" ... So I'm afraid I just
    haven't grasped your meaning.
    Eric Sosman, Jun 28, 2013
  7. Philip Lantz

    Ian Collins Guest

    #if !defined __cplusplus
    # define decltype(X) void*
    #endif
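
    A short sketch of how that macro might be used so that the same source
    builds as C and as C++11. The struct node type and the append_node
    function are hypothetical, chosen only to mirror the ptr->link example
    from earlier in the thread.

```c
#include <stdlib.h>

/* In C, decltype(X) becomes void*, so the cast is harmless.
   A C++11 compiler skips the macro and uses the real decltype. */
#if !defined __cplusplus
# define decltype(X) void*
#endif

struct node {
    struct node *link;
    int value;
};

/* Append a fresh node after 'ptr'.
   Returns 0 on success, -1 if malloc fails. */
int append_node(struct node *ptr, int value)
{
    ptr->link = (decltype(ptr->link))malloc(sizeof *ptr->link);
    if (ptr->link == NULL)
        return -1;
    ptr->link->link = NULL;
    ptr->link->value = value;
    return 0;
}
```

    The caller owns the new node and is expected to free() it. In C the cast
    collapses to (void*), which converts implicitly on assignment.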

    Ian Collins, Jun 28, 2013
  8. Philip Lantz

    Eric Sosman Guest

    Well, maybe. Sort of makes a mockery of the "common subset"
    notion, though. Take it a step further:

    #ifdef __cplusplus
    // Arbitrary C++ code here
    #else
    // Arbitrary C code here
    #endif

    .... and now the "common subset" includes the entirety of both
    languages -- the subset is the union!

    (Long ago in this newsgroup -- I think ANSI C was still
    a draft, with `noalias' -- somebody posted a "Hello, World!"
    program. The unusual feature was that the program would
    output "Hello, World!" when its source was fed to a C
    implementation *or* to a Fortran implementation *or* to the
    Unix shell. So: Is it useful to speak of the "common subset"
    of C, Fortran, and sh?)
    Eric Sosman, Jun 28, 2013
  9. Philip Lantz

    Öö Tiib Guest

    It can be. Conditional compilation and macros have to be used to get
    rid of things that are necessary in C++ and illegal in C, like
    'extern "C"' or 'decltype' casts. I had something like this in mind:

    ptr->link =
    #if defined __cplusplus
    (decltype(ptr->link))
    #endif
    malloc(sizeof *ptr->link);

    However, now I see that Ian Collins suggested an even better way.
    "Redundant" because such casts lower the readability of code (YMMV). When
    something that hurts readability is needed, the alternatives are to
    add it manually or to use a tool that adds it temporarily at compile
    time. I often prefer the latter.
    Öö Tiib, Jun 28, 2013
  10. Do you mean designated initializers?
    Keith Thompson, Jun 28, 2013
  11. Got it, you're right.


    CHAR_BIT == 16
    sizeof(int) == 1
    CHAR_MIN == 0
    CHAR_MAX == 65535
    INT_MIN == -32768
    INT_MAX == +32767

    '\xffff' is a character constant, which is of type int. Its value
    is the result of converting (char)65535 to type int, which is likely
    to be -1. If character constants were of type char, it would have
    the positive value (char)65535 of type char.

    Just to add to the frivolity, the result of the conversion
    is implementation-defined. Throw one's-complement and
    sign-and-magnitude into the mix, and things get fun.
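
    The hypothetical implementation above can be approximated on ordinary
    hardware. This is only a sketch: sim_char and sim_int are made-up
    stand-ins built on uint16_t and int16_t (assumed to exist), and the
    conversion models the common two's-complement wrap-around rather than
    anything the standard guarantees.

```c
#include <stdint.h>

typedef uint16_t sim_char;  /* stands in for a 16-bit unsigned plain char */
typedef int16_t  sim_int;   /* stands in for a 16-bit int */

/* Convert the way a wrapping two's-complement implementation would.
   The real conversion is implementation-defined; this is one choice. */
sim_int sim_char_to_int(sim_char c)
{
    if (c <= INT16_MAX)
        return (sim_int)c;
    /* c is in 32768..65535: map it to c - 65536, which fits in sim_int */
    return (sim_int)((int32_t)c - 65536);
}

/* Value of '\xffff' under the current rule: a char converted to int. */
long literal_as_int(void)
{
    return sim_char_to_int((sim_char)0xFFFF);
}

/* Value of '\xffff' if character constants had type char. */
long literal_as_char(void)
{
    return (sim_char)0xFFFF;
}
```

    With these stand-ins literal_as_int() yields -1 and literal_as_char()
    yields 65535, which is the sign flip being discussed.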
    Keith Thompson, Jun 28, 2013
  12. If a tool doesn't always work, then it can be extremely irritating.
    You might have thousands of C files to process. It fails on just one of them,
    but that means you've got to get a programmer to fix up the code manually,
    then document that, for that instance, the tool-chain fails. That adds a lot
    of cost, and means that errors can much more easily slip in.
    Malcolm McLean, Jun 28, 2013
  13. Philip Lantz

    Öö Tiib Guest

    I have not seen any tools that always work. Some are more robust, some
    less, but godlike robustness is missing. That is because there are always
    defects in code, compilers, linkers, standard libraries, operating systems
    and the hardware on which all of that runs.
    When we are talking about repositories of thousands of files then we are
    likely talking about efforts of thousands of man-days, and so we are
    likely talking about teams of tens of developers? A mere build may take
    several minutes. Therefore the build (involving compilers, code
    generators, static analyzers, running unit tests etc.) is best done
    by a continuous integration system (or farm) to save each developer
    the time of building it.

    A tool does not suddenly start to fail out of the blue. Either someone
    modified the file on which it fails, or modified the tool, or modified
    something on which one or the other depends. If integration is continuous
    then it is very clear who committed that breaking change. Just back out
    that breaking change-set, notify the one who committed it and let him
    deal with it. If he can't, then he will find help from someone who can.
    We are software developers, so dodging defects is our everyday bread
    and butter.
    Öö Tiib, Jun 28, 2013
  14. With unsigned plain char, a character literal with a negative int value
    today would have a large positive value if its type changed to char,
    assuming the implementation didn't change to signed plain char at the
    same time.

    CHAR_MAX > INT_MAX with signed plain char requires int to have padding
    bits and less range than char, which AIUI isn't allowed.

    Stephen Sprunk, Jun 29, 2013
  15. The computer can break.
    But most C compilers will always compile valid C code, most text editors will
    always show the real contents of files, most compressors will always
    archive correctly. The bugs are elsewhere. If you use the Unix philosophy
    of "each tool does one thing" then those tools tend to be stable and
    bug free. If you use the alternative philosophy of the "integrated
    system" then you're constantly adding features, and often things
    break. (However integrated systems are often easier to use, it's not
    all one way).
    Malcolm McLean, Jun 29, 2013
  16. Right -- but that's only an issue when CHAR_BIT >= 16, which is the
    context I missed in my previous response. As I also noted elsethread,
    the conversion from char to int, where char is an unsigned type and the
    value doesn't fit, is implementation-defined; the result is *probably*
    negative, but it's not guaranteed.
    I think that's right.
    Keith Thompson, Jun 29, 2013
  17. Philip Lantz

    Öö Tiib Guest

    That is what I said. When things fail, let the author of the situation
    fix it. In 99% of cases he messed something up. In 1% of cases he
    discovers a bug in the compiler or the like.
    I like that philosophy. I described a simple tool that can preprocess
    source before the compiler (add casts to mallocs, maybe add extern "C" to
    headers). As a result, the majority of good C code can be compiled with
    both a C and a C++ compiler. There will be cases that still can't, but
    then it is better to let a human adjust them instead of making the tool
    smarter and more error-prone.
    That is AFAIK still the Unix philosophy. We pipe together those simple
    tools to get more sophisticated results. If we only used each of those
    simple tools alone by hand then Unix would be annoying to use. Such a
    chain is more likely to fail since there are more tools and details, but
    in each case it is usually simple to trace the problem to some simple
    step in that complex chain.
    Öö Tiib, Jun 29, 2013
  18. Philip Lantz

    James Kuyper Guest

    Up to this point, you're saying almost exactly what I just said, just
    with slightly different wording.
    Almost. The conversion to 'int' would be guaranteed to produce exactly
    the same value that the character literal would have had under the
    current rules. In order to demonstrate the change, you have to convert
    it to a signed type with a MAX value greater than INT_MAX. 'long int' is
    likely to be such a type, but even intmax_t is not guaranteed to be such
    a type.
    You might consider it insane to have char be unsigned on such an
    implementation, but such an implementation could be fully conforming. It
    would violate some widely held expectations, but if it is fully
    conforming, then those expectations were unjustified. Is there any
    reason other than such expectations why you would consider such an
    implementation insane?
    For implementations where CHAR_MAX > INT_MAX, some character literals
    must have a negative value, so that never applies.
    Why? What provision of the C++ standard would force them to do that?
    To me, the single strongest argument against considering such code to be
    broken is the fact that the C standard guarantees that character
    literals have 'int' type. You haven't explained why you consider such
    code broken. My best guess is that you think that choosing 'int' rather
    than 'char' was so obviously and so seriously wrong, that programmers
    have an obligation to write their code so that it will continue to work
    if the committee ever gets around to correcting that mistake. I agree
    with you that the C++ rules are more reasonable, but I don't think it's
    likely that the C committee will ever change that feature of C, and it's
    even less likely that it will do so any time soon. Therefore, that
    doesn't seem like a reasonable argument to me - so I'd appreciate
    knowing what your actual argument against such code is.

    Summarizing what I said earlier, as far as I have been able to figure
    out, the behavior of C code can change as a result of character literals
    changing from 'int' to 'char' in only a few ways:
    1. sizeof('character_literal'), which is a highly implausible construct;
    its only plausible use that isn't redundant with #ifdef __cplusplus is
    by someone who incorrectly expects it to be equivalent to sizeof(char);
    and if someone did expect that, they should also have incorrectly
    expected it to be a synonym for '1'; so why not write '1' instead?
    2. _Generic() is too new and too poorly supported for code using it to
    be a significant problem at this time.
    3. Obscure, and possibly mythical, implementations where CHAR_MAX > INT_MAX.
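
    The first two items can be observed directly with a C11 compiler. A
    small sketch follows; the TYPE_NAME macro and the three function names
    are invented for illustration.

```c
#include <stddef.h>

/* Name the type of an expression; only the two types of interest are listed. */
#define TYPE_NAME(x) _Generic((x), char: "char", int: "int", default: "other")

/* In C a character constant has type int, so this is sizeof(int). */
size_t char_constant_size(void)
{
    return sizeof('C');
}

/* "int" when compiled as C. */
const char *char_constant_type(void)
{
    return TYPE_NAME('C');
}

/* A char object keeps its own type: "char". */
const char *char_object_type(void)
{
    char c = 'C';
    return TYPE_NAME(c);
}
```

    If character constants had type char, char_constant_size() would be 1
    and char_constant_type() would report "char", which is exactly the
    kind of code the first two items describe.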

    I consider the third item to be overwhelmingly the most significant of
    the three issues, even though the unlikelihood of such implementations
    makes it an insignificant issue in absolute terms. Ignoring the other
    two issues (and assuming that LONG_MAX > INT_MAX), consider the
    following code:

    char c = 'C';
    long literal = 'C';
    long variable = c;
    int offset = -13;

    Under the current rules, on an implementation where CHAR_MAX <= INT_MAX:
    c+offset and 'C'+ offset both have the type 'int'. 'c', 'literal' and
    'variable' are all guaranteed to be positive.

    Under the current rules, on an implementation where CHAR_MAX > INT_MAX:
    c+offset will have the type 'unsigned int', but 'C' + offset will have
    the type 'int'. It is possible (though extremely implausible) that c >
    INT_MAX. If it is, the same will be true of 'variable', but 'literal'
    will be negative.

    If character literals were changed to have the type 'char', on an
    implementation where CHAR_MAX <= INT_MAX:
    c+offset and 'C' + offset would both have the type 'int'. 'c',
    'literal', and 'variable' would all be guaranteed to be positive.

    If character literals were changed to have the type 'char', on an
    implementation where CHAR_MAX > INT_MAX:
    c + offset and 'C' + offset would both have the type 'unsigned int'. It
    would be possible (though extremely implausible) that c > INT_MAX. If it
    were, the same would be true for both 'literal' and 'variable'.

    Therefore, the only implementations where code would have different
    behavior if character literals were changed to 'char' are those where
    CHAR_MAX > INT_MAX. And the only differences involve behavior that,
    under the current rules, is different from the behavior for CHAR_MAX <=
    INT_MAX. Therefore, the only code that will break if this rule is
    changed is code that currently goes out of its way to correctly deal
    with the possibility that CHAR_MAX > INT_MAX. I cannot see how you could
    justify labeling code as 'broken', just because it correctly (in terms
    of the current standard) deals with such an extremely obscure side issue.
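
    The type claims in the four cases above can be checked on whatever
    implementation is at hand. This is a sketch using C11 _Generic; the
    SUM_TYPE macro and the function names are hypothetical.

```c
#include <limits.h>

/* Classify the type of an arithmetic expression:
   'i' for int, 'u' for unsigned int, '?' for anything else. */
#define SUM_TYPE(x) _Generic((x), int: 'i', unsigned int: 'u', default: '?')

/* Type of c + offset, where c is a char object. */
char object_sum_type(void)
{
    char c = 'C';
    int offset = -13;
    return SUM_TYPE(c + offset);
}

/* Type of 'C' + offset; under the current rules this is always int. */
char literal_sum_type(void)
{
    int offset = -13;
    return SUM_TYPE('C' + offset);
}

/* What the promotion rules predict for c + offset on this implementation. */
char expected_object_sum_type(void)
{
#if CHAR_MAX > INT_MAX
    return 'u';   /* char promotes to unsigned int */
#else
    return 'i';   /* char promotes to int */
#endif
}
```

    On the usual implementations, where CHAR_MAX <= INT_MAX, both sums
    report 'i'. Only where CHAR_MAX > INT_MAX would object_sum_type() and
    literal_sum_type() disagree.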

    On the other hand, the simplest way to deal with the possibility that
    CHAR_MAX > INT_MAX is to insert casts:

    if(c == (char)'C')

    long literal = (char)'C';

    Such code would not be affected by such a change. Only code that copes
    with the possibility by other methods (such as #if CHAR_MAX > INT_MAX)
    would be affected. I suppose you could call such code broken - but only
    if you can justify insisting that programmers have an obligation to deal
    with the possibility that the committee might change this rule.
    James Kuyper, Jul 1, 2013
  19. When I get confused, I tend to dump my current state in hopes that
    someone can point out an error that led to said confusion.
    Why? I thought that, while converting a negative value to unsigned was
    well-defined, converting an out-of-range unsigned value to signed was not.
    I consider it insane to have an unsigned plain char when character
    literals can be negative.
    Granted, one can create arbitrary character literals, but doing so
    ventures into "contrived" territory. I only mean to include real
    characters, which I think means ones in the source or execution
    character sets.
    In C++, character literals have type char, so if char is unsigned, then
    by definition no character literal can be negative.
    Well, I'm not sure how much of a "choice" that really was, rather than
    an accident of C's evolution from an untyped language and everything
    becoming an "int" by default.
    I cannot recall having seen any code that would break if that mistake
    were corrected, and I'm reasonably certain none of mine would because I
    thought character literals _were_ of type char until many years after
    first learning C--and I still code as if it were true because I want my
    code to still work if compiled as C++.
    We know there are systems where sizeof(int)==1; can we really assume
    that plain char is signed on all such implementations, which is the only
    way for them to _avoid_ CHAR_MAX > INT_MAX?
    My gut says more code would break on systems where CHAR_MAX > INT_MAX
    than would break if character literals were chars; few programmers would
    think about accommodating the former or even realize it could exist,
    whereas most either mistakenly think the latter is true or are actually
    coding for the C-like subset of C++ where it _is_ true.

    Stephen Sprunk, Jul 1, 2013
  20. Philip Lantz

    James Kuyper Guest

    I mentioned my argument for that conclusion earlier in this thread -
    both you and Keith seem to have skipped over it without either accepting
    it or explaining why you had rejected it. Here it is again.

    The standard defines the behavior of fputc() in terms of the conversion
    of int to unsigned char. It defines the behavior of fgetc()
    in terms of the conversion from unsigned char to int. All
    other I/O is defined in terms of the behavior of those two functions -
    the other I/O functions don't have to actually call those functions, but
    they are required to behave as if they did. It also requires that "Data
    read in from a binary stream shall compare equal to the data that were
    earlier written out to that stream, under the same implementation."
    (7.21.2p3). While, in general, conversion to signed type of a value that
    is too big to be represented by that type produces an
    implementation-defined result or raises an implementation-defined
    signal, for this particular conversion, I think that 7.21.2p3 implicitly
    prohibits the signal, and requires that if 'c' is an unsigned char, then

    (unsigned char)(int)c == c

    If CHAR_MAX > INT_MAX, then 'char' must behave the same as 'unsigned
    char'. Also, on such an implementation, there cannot be more valid 'int'
    values than there are 'char' values, and the inversion requirement
    implies that there cannot be more char values than there are valid 'int'
    values. This means that we must also have, if 'i' is an int object
    containing a valid representation, that

    (int)(char)i == i

    In particular, this applies when i==EOF, which is why comparing fgetc()
    values with EOF is not sufficient to determine whether or not the call
    was successful. Negative zero and positive zero have to convert to the
    same unsigned char, which would make it impossible to meet both
    inversion requirements, so it also follows that 'int' must have a 2's
    complement representation on such a platform.
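
    A sketch of both points, under the assumption that int can represent
    every unsigned char value (true on ordinary implementations, where the
    round trip is trivial). The function names roundtrip_holds and
    count_bytes are made up for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if every unsigned char value survives a trip through int. */
int roundtrip_holds(void)
{
    unsigned int v = 0;
    for (;;) {
        unsigned char c = (unsigned char)v;
        if ((unsigned char)(int)c != c)
            return 0;
        if (v == UCHAR_MAX)
            break;
        v++;
    }
    return 1;
}

/* Count the bytes left in 'stream'. An EOF result from fgetc() ends the
   loop only when feof() or ferror() confirms it, so a data value that
   happens to equal EOF would not stop the loop early.
   Returns -1 on a read error. */
long count_bytes(FILE *stream)
{
    long n = 0;
    for (;;) {
        int ch = fgetc(stream);
        if (ch == EOF && (feof(stream) || ferror(stream)))
            break;
        n++;
    }
    return ferror(stream) ? -1 : n;
}
```

    Checking feof() and ferror() costs almost nothing on common platforms
    and keeps the loop correct even on an implementation where a valid
    character converts to the same int value as EOF.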

    You've already said that. What you haven't done so far is explained why.
    I agree that there's a bit of conflict there, but 'insane' seems extreme.

    There's no requirement that any member, not even of the basic execution
    character set, have an encoding that is <= INT_MAX. It's pretty
    unlikely for members of the basic execution set, but it seems a very
    likely thing for members of the extended character set that are
    represented by UCNs for code points that are greater than INT_MAX. All
    such characters must have a character literal that is negative if
    I'd forgotten that C++ had a different rule for the value of a character
    literal than C does. The C rule is defined in terms of conversion of a
    char object's value to type 'int', which obviously would be
    inappropriate given that C++ gives character literals a type of 'char'.
    Somehow I managed to miss that "obvious" conclusion, and I didn't bother
    to check. Sorry.

    The essence of what I've been saying is that it's fairly difficult to
    write such code, except by relying upon sizeof() or _Generic(), and
    almost impossible to do so accidentally.
    Every time I've brought up the odd behavior of implementations which
    have UCHAR_MAX > INT_MAX, it's been argued that they either don't exist
    or are so rare that we don't need to bother worrying about them.
    Implementations where CHAR_MAX>INT_MAX must be even rarer (since they
    are a subset of implementations where UCHAR_MAX > INT_MAX), so I'm
    surprised (and a bit relieved) to see someone actually arguing for the
    probable existence of such implementations. I'd feel happier about it if
    someone could actually cite one, but I don't remember anyone ever doing so.
    Well, that follows from what I said above. Almost all breakage that
    would occur if character literals were changed to char would occur on
    platforms where CHAR_MAX > INT_MAX, and would therefore count for both
    categories. However, I'll go farther, and say that it's not only "more
    code", but "a lot more code".
    James Kuyper, Jul 1, 2013