disgusting compiler !! hahaha!!

Discussion in 'C Programming' started by raj shekar, May 7, 2014.

  1. This *definitely* causes UB. I think "Numerical Recipes in C" used
    that technique, and yes, it commonly works, but I can easily imagine
    it failing on some architectures if a_ happens to be allocated at
    the beginning of a segment.
    Keith Thompson, May 8, 2014

  2. BartC Guest

    That's measuring rather than counting. Measuring (as in your top example
    where you measure from left of the first box to the right of the last box),
    necessarily starts from zero (and in real-life tends to be a continuous
    value rather than a whole number).

    But there are three boxes above and it is natural to number them 1, 2 and 3.
    The physical positions might be 0.0, 1.0, 2.0 and 3.0 at edges of the boxes,
    and even 1.8 for 80% across the middle box.

    There are a few exceptions; for example, I would deal with discrete bits in a
    32-bit integer, but number them from 0 to 31 because that's the convention
    (but I'm not sure C assigns an index to them anyway).
    So did I; changing from 1999 to 2000 was a bigger deal than whether this was
    actually the third millennium or not.

    It would also be ludicrous if the 'nineties' for example didn't start until
    1-Jan-1991 and didn't end until 31-Dec-2000. But those are more examples of
    measurement (of time in this case), rather than counting or indexing.
    Yes, for some kinds of calculations, base 0 is simpler. This doesn't mean
    you have to use base 0 everywhere. In that language construct, Algol-68 uses
    base 1.
    Only if you have to do the index calculations yourself (accessing a 1D array
    as 3D for example). Otherwise base 1 is probably a little simpler than base
    0 (index from 1 to L, M, N for each index, instead of 0 to L-1, M-1, N-1).
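    The 3D-as-1D indexing mentioned above can be sketched in C. This is an
    illustration, not code from the thread; the dimensions L, M, N and the
    helper names at0/at1 are made up for the example:

    ```c
    /* Accessing a flat 1D array as 3D: with 0-based indices the mapping
       is the usual row-major formula; a 1-based scheme needs explicit -1
       adjustments done by hand. */
    #include <stdio.h>

    #define L 2
    #define M 3
    #define N 4

    int flat[L * M * N];

    /* 0-based: i in 0..L-1, j in 0..M-1, k in 0..N-1 */
    int *at0(int i, int j, int k) {
        return &flat[(i * M + j) * N + k];
    }

    /* 1-based: i in 1..L, j in 1..M, k in 1..N -- friendlier bounds,
       paid for with -1 corrections in the arithmetic. */
    int *at1(int i, int j, int k) {
        return &flat[((i - 1) * M + (j - 1)) * N + (k - 1)];
    }

    int main(void) {
        *at0(1, 2, 3) = 42;
        printf("%d\n", *at1(2, 3, 4));  /* same element via 1-based coordinates */
        return 0;
    }
    ```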
    BartC, May 8, 2014

  3. I find you both stupid and boring.

    Do I get full credit for that?

    The problem in US politics today is that it is no longer a Right/Left
    thing, or a Conservative/Liberal thing, or even a Republican/Democrat
    thing, but rather an Insane/not-Insane thing.

    (And no, there's no way you can spin this into any confusion about
    who's who...)
    Kenny McCormack, May 8, 2014
  4. (snip, I wrote)
    Yes, I was probably thinking about older C, but even so, compare
    C11 to Fortran 2008, or even PL/I in 1966. (Note, for one, that
    neither PL/I nor Fortran has reserved words. Just another
    complication for the compiler to figure out.)

    -- glen
    glen herrmannsfeldt, May 8, 2014
  5. For a large number of practical problems, you need double precision
    arithmetic to get single precision results.

    The precision you need for intermediate values in matrix
    computations tends to increase with the size of the matrix,
    and matrix computations have gotten a lot larger since the
    days of 36 bit machines.

    It would be interesting to have a 48 bit floating point type,
    so six bytes on byte addressed machines.
    -- glen
    glen herrmannsfeldt, May 8, 2014
  6. Well, it mostly fails at the beginning of a segment if you do bounds
    checking wrong, or if the arithmetic wraps wrong.

    Not that I like the solution very much.

    -- glen
    glen herrmannsfeldt, May 8, 2014
  7. You don't *have* to do it if your compiler supports VLAs.
    Keith Thompson, May 8, 2014
  8. Walter Banks Guest

    Points well taken. I agree that 32-bit floats are less than what is needed
    for many applications. Byte machines and some floating point
    packages make it not that hard to add a byte to the mantissa.

    I implemented 48 bit floats on a 24 bit processor a few years ago.
    (40-bit mantissa)

    Walter Banks, May 8, 2014
  9. David Brown Guest

    The problem is not in the compiler - it could add the "-'a'" when
    accessing the array, or it could (on most targets, in most situations)
    do the subtraction on the symbol at link time.

    The problem is user expectations, along with the expectations of other
    tools that read or write C code. It is a fundamental part of the
    definition of C that if xs is an array, then xs and xs[0] have the same
    address, and a pointer to xs[0] is the same as xs. Allowing non-zero
    lower bounds breaks that rule.

    In languages like Pascal or Ada, there never was such a rule - so there
    never has been such an assumption. In C++, the rule is true for POD
    arrays (inherited from C), but not for classes with a [] operator, and
    it is fine to have different lower bounds or different types for the
    indices. But making lower bounds non-zero would break C, so it is not
    going to happen (at least, not without a lot of effort, and perhaps some
    new syntax to make things clear).
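    The identity described above can be checked directly. A minimal sketch
    (the array name xs is just an example):

    ```c
    /* For an array xs, the array's address and the address of its first
       element coincide, and xs "decays" to &xs[0] in most expressions. */
    #include <assert.h>
    #include <stdio.h>

    int main(void) {
        int xs[5] = {0};
        assert((void *)xs == (void *)&xs[0]); /* same address */
        int *p = xs;                          /* decays to &xs[0] */
        assert(p == &xs[0]);
        puts("xs and &xs[0] coincide");
        return 0;
    }
    ```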

    As I said, I have no argument against it being a nice feature that would
    make some code clearer and give better compile-time checking. I just
    don't see it happening in C.
    I think the "[lower:]" syntax is /definitely/ optimistic for C.
    Infinite lists are common in functional programming languages, but we
    won't see lazy evaluation in C in the near future!
    gcc checks array bounds at compile time, but only if you have at least
    -O2 or -Os optimisation (and -Wall), as the optimisation passes are used
    to collect the information needed for checking. There are a number of
    gcc warnings that only work (or only work well) with optimisation enabled.

    I would say that compile-time checking of types and bounds is essential
    to such a feature - the whole point would be to help write better code,
    and that means making it easier for the programmer to write clearer code
    and also making it easier for the tools to spot mistakes.
    A lot of people use C++ as ABC - "A Better C". It is not necessarily a
    bad idea. And while C++ introduces a lot of extra complexity, it has
    got a bit easier with C++11 "auto", which can save a lot of messy
    template typing.
    David Brown, May 9, 2014
  10. BartC Guest

    C allows "[]", it just means the bounds are not specified:

    int a[] = {1,2,3};
    int (*b)[];
    extern int c[];
    BartC, May 9, 2014
  11. James Kuyper Guest

    a_-1 does violate a bound, so I don't see how implementing pointer
    arithmetic in such a way that it fails catastrophically would qualify as
    doing "bounds checking wrong".
    James Kuyper, May 9, 2014
  12. Walter Banks Guest

    C got variable scoping right but didn't see the significant
    advantages that similar scoping rules would have for functions.
    A lot of code reliability could have been improved with local
    functions.

    The implementation of nested functions has very little impact on
    compiler implementation. Quite a few C compilers have scoped-function
    capability implemented as a C extension.

    Walter Banks, May 9, 2014
  13. Walter Banks Guest

    C as a language could use range syntax (as if it really needs a larger grammar).
    'a' .. 'z'-style syntax has a lot of uses and is commonly found in code, so a
    tight compiler implementation would be useful.

    For example

    if ( a in [-5..46]) // stolen from pascal
    . . .

    Similarly with switch case

    case ['0'..'9'] :
    case [100..200] :

    Many compilers already have the code generation for these constructs and
    generate it by pattern matching to the most common source constructs.
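    Something close to this already exists in practice: GCC (and Clang)
    accept case ranges as a non-standard extension, spelled with spaces
    around "...". A sketch of that existing extension, not of any proposed
    syntax (the function name classify is made up for the example):

    ```c
    /* Case ranges as a GCC extension: "case low ... high:".
       This is not standard C and will not compile with -pedantic-errors. */
    #include <stdio.h>

    static const char *classify(int c) {
        switch (c) {
        case '0' ... '9': return "digit";   /* GCC/Clang extension */
        case 'a' ... 'z':
        case 'A' ... 'Z': return "letter";
        default:          return "other";
        }
    }

    int main(void) {
        printf("%s %s %s\n", classify('7'), classify('q'), classify('#'));
        return 0;
    }
    ```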


    Walter Banks, May 9, 2014
  14. David Brown Guest

    I often use languages that support local functions - Pascal and Python.
    It is rare that I find them useful, except for lambda functions in
    Python. When programming in C, I usually use gcc which has support for
    local functions, but I have never felt it would significantly improve my
    programs. I think that in most cases where local functions really would
    make a difference to the structure and quality of the program, you are
    probably better off using C++ with access to class member functions
    (including local classes) and lambdas.
    Local functions can often be implemented easily - they can be treated as
    a normal "static" function by the compiler. But sometimes the
    interaction between variables in the outer function and their usage
    inside the local function can make a real mess - the generated static
    function needs pointers to outer function variables, variables that used
    to be in registers now need to go on the stack, and your optimisation is
    lost. And when people start doing "clever" things like taking the
    address of the local function, it gets worse - on gcc, this is
    implemented using "trampolines" which are run-time generated code put on
    the stack.

    All in all, general local functions are a pain - and if you restrict
    them too much (such as disallowing their address to be taken), you lose
    many of the possible uses (such as for a sorting function for qsort()).

    Although gcc has nested functions (and has had for many years - they
    needed the functionality for languages such as Pascal and Ada), they are
    seldom used in C. C++ lambdas and class members are usually a better
    choice if you need such structures.

    (Note - I don't know the details of /why/ trampolines are needed for
    nested functions in C, while they are not needed for C++ lambdas.)
    David Brown, May 9, 2014
  15. Martin Shobe Guest

    GCC uses trampolines for C because taking the address of the nested
    function must result in a function pointer. C++ doesn't need that for
    lambdas since they are objects instead of functions.

    Martin Shobe
    Martin Shobe, May 9, 2014
  16. BartC Guest

    There is an even simpler implementation where the local function can't
    access the local variables of its enclosing function. (I found out I could
    do this simply by commenting out a check on functions being defined inside
    other functions.)

    The local function is compiled as though it was outside. The sole advantage
    is that the name of the local function only has a scope within its enclosing
    function. (And the same name can be reused inside another function, if the
    compiler provides a unique name for each nested function.)

    Also if you move/copy the main function elsewhere, it will take all its
    locals with it.
    BartC, May 9, 2014
  17. The languages with local (internal) functions usually don't have
    something like C's file scope functions.

    Sometimes it is convenient to have a small function to help with
    something, but not really worth a normal external function.
    Also, it is often more readable to have it nearby.

    Yes, it gets complicated in the case of recursion, where you need
    to get the right instance of the caller.
    If qsort() had a (void*) argument that it passed through to the called
    function, it would make some comparisons easier. That would avoid
    many of the cases where you would want to use the internal function
    to access outside data.
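    The pass-through pointer wished for above exists in practice as the
    non-standard qsort_r (GNU and BSD, with mutually incompatible
    signatures) and C11's optional qsort_s. A portable sketch of the idea,
    with a hand-rolled sort standing in for qsort (the names sort_r and
    cmp_int are made up for the example):

    ```c
    /* A sort routine that forwards a caller-supplied context pointer to
       the comparison function, so the comparator needs no file-scope
       globals -- the feature glen wishes qsort() had. */
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    typedef int (*cmp_fn)(const void *a, const void *b, void *ctx);

    /* Insertion sort with a context pointer, in the style of qsort_r. */
    static void sort_r(void *base, size_t n, size_t sz, cmp_fn cmp, void *ctx) {
        char *b = base, tmp[64];  /* sketch assumes element size <= 64 */
        for (size_t i = 1; i < n; i++) {
            memcpy(tmp, b + i * sz, sz);
            size_t j = i;
            while (j > 0 && cmp(b + (j - 1) * sz, tmp, ctx) > 0) {
                memcpy(b + j * sz, b + (j - 1) * sz, sz);
                j--;
            }
            memcpy(b + j * sz, tmp, sz);
        }
    }

    /* The context selects ascending (+1) or descending (-1) order. */
    static int cmp_int(const void *a, const void *b, void *ctx) {
        int dir = *(int *)ctx;
        return dir * (*(const int *)a - *(const int *)b);
    }

    int main(void) {
        int xs[] = {3, 1, 2};
        int descending = -1;
        sort_r(xs, 3, sizeof xs[0], cmp_int, &descending);
        assert(xs[0] == 3 && xs[1] == 2 && xs[2] == 1);
        return 0;
    }
    ```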

    -- glen
    glen herrmannsfeldt, May 9, 2014
  18. BartC Guest

    Even simpler is to use:

    if (a in -5..46) ...

    ie. matching against a range (instead of being a member of a set as I
    assumed the [...] construction was). This would directly map to:

    if (a>=-5 && a<=46) ...

    (and hopefully implemented so that a is evaluated once).
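    One way to get the "evaluated once" property in today's C is an inline
    function rather than a macro: a macro like
    #define IN(a,lo,hi) ((a) >= (lo) && (a) <= (hi)) would evaluate its
    first argument twice, which matters if it has side effects. A sketch,
    not a proposal for new syntax (in_range and next_value are made-up
    names):

    ```c
    /* An inline function receives a copy of its argument, so a
       side-effecting argument expression is evaluated exactly once. */
    #include <assert.h>

    static inline int in_range(int a, int lo, int hi) {
        return a >= lo && a <= hi;  /* a is a copy: evaluated once */
    }

    static int calls = 0;
    static int next_value(void) { ++calls; return 10; }  /* side-effecting */

    int main(void) {
        assert(in_range(next_value(), -5, 46)); /* 10 is in [-5, 46] */
        assert(calls == 1);                     /* argument ran exactly once */
        return 0;
    }
    ```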
    The argument against ranges for switch cases, when it frequently comes up in
    c.l.c., is that someone might be tempted to write:

    case 'A'..'Z':

    instead of:

    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G': case 'H':
    case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P':
    case 'Q': case 'R': case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
    case 'Y': case 'Z':

    (and the former does have a certain conciseness about it). This won't be
    quite right if it happens that EBCDIC is being used. But if that is not the
    case, or letters are not involved, then it is silly to have to type out
    dozens of consecutive case labels.
    BartC, May 9, 2014
  19. It's not quite that simple. References to objects defined in a
    containing function are non-trivial. There are at least two common ways
    to implement this:

    - A "static link", a pointer in a nested function to
    the frame for its parent (forming a linked list for multiple
    nesting levels); or

    - A "display", an array of pointers to the frames for all lexically
    enclosing functions.

    In either case, the compiler has to generate extra code to maintain
    these pointers. (No such extra code is needed for a program that
    doesn't have nested functions, or possibly if it does but they don't
    refer to parent declarations).

    And C function pointers make it possible to call a nested function when
    its parent is not executing. If the nested function refers to
    declarations in its parent, presumably the behavior is undefined (or, as
    the gcc documentation puts it, "all hell will break loose").

    An example:

    #include <stdio.h>

    void (*funcptr)(void);

    static void outer(void) {
        int outer_var = 42;
        void inner(void) {
            printf("in inner, outer_var = %d\n", outer_var);
        }
        funcptr = inner;
        inner();
    }

    int main(void) {
        outer();
        funcptr();
    }

    When I run this program on my system, the output is:

    in inner, outer_var = 42
    in inner, outer_var = 42

    but only by luck; the indirect call has undefined behavior since
    outer_var doesn't exist at that point.
    Keith Thompson, May 9, 2014
    CHIN Dihedral, May 10, 2014
