disgusting compiler !! hahaha!!

Discussion in 'C Programming' started by raj shekar, May 7, 2014.

  1. raj shekar

    Features of C that
    seem to have evolved with the compiler-writer in mind are:

    Arrays start at 0 rather than 1
    The fundamental C types map directly onto underlying hardware
    The auto keyword is apparently useless
    Array names in expressions "decay" into pointers
    Floating-point expressions were expanded to double-length-precision everywhere
    No nested functions (functions contained inside other functions)
     
    raj shekar, May 7, 2014
    #1

  2. Computers like to index from zero, but it isn't hard at all
    for a compiler to subtract one. Converting algorithms between
    zero based and one based indexing is harder than it seems it
    should be.
    That was, and is, pretty usual. PL/I allows one to specify what
    is actually needed, such that the compiler figures out how to
    implement it. Most languages give what the hardware gives.
    Pretty useless in PL/I, too, but it is there for consistency.
    Again, it doesn't make much difference to compilers.

    Well, overall, C is a fairly simple language to compile.
    This is just one feature.
    In the early days of C, it was used more for systems programming,
    and less for scientific programming. There was not so much worry
    about a little inefficiency in the generated code, and it simplifies
    the math library by only needing one of each. I suppose promoting
    function arguments could be separate from promoting for other
    operators, though.
    Multics and PL/I were used by many working on the beginnings of C,
    and PL/I has internal procedures. There are some complications
    in doing it right, especially in the case of function pointers
    and recursion.

    Fortran didn't add internal procedures until Fortran 90, and the
    ability to use them with pointers until Fortran 2003.

    -- glen
     
    glen herrmannsfeldt, May 7, 2014
    #2

  3. BartC

    Why not have a language allow both 0-based and 1-based? Sometimes 0-based is
    useful (for measuring or for use with offsets), sometimes 1-based is (for
    counting); and sometimes N-based is. It's not hard.

    Then no conversion is necessary.
    It means having a special kludge to make the type-system work, with a big
    hole in it where value-arrays would normally go.
    Have you tried it? For this class of language (considering the primitive
    types and operations that it has), it ought to be easy to compile.

    I've thought about it myself, then considered that a C compiler needs to
    make sense of nightmare headers full of indecipherable macros and pragmas
    and attributes (e.g. stdafx.h), and has to restrain itself from reporting
    things such as 'int a,a,a,a,a;' (apparently legal), and realised what an
    undertaking it would actually be.
     
    BartC, May 7, 2014
    #3
  4. tcc (Tiny C Compiler) comes with source. Unlike gcc it's not a massive project.
     
    Malcolm McLean, May 7, 2014
    #4
  5. jacob navia

    On 07/05/2014 20:37, Richard wrote:
    The APL language had a global variable called "Origin" that could be
    zero or one. Depending on its value, arrays would start at 1 (the default)
    or at zero (if you set Origin to zero).

    This gave users the choice, but led to subtle bugs. It was enough to
    forget an origin change somewhere for all your software to stop
    working: if you wrote it using origin 1 and somebody set the origin to
    zero, all your array accesses would be wrong.

    The nice thing about origin 1 is that, since array index zero doesn't
    exist, many functions can return zero to say "search failed". With
    origin zero you must return some other flag value (like 0xfffffff),
    which always causes problems.
     
    jacob navia, May 7, 2014
    #5
  6. James Kuyper

    I doubt that this was the primary reason for any of those features,
    though it was (and certainly should have been) one of the issues taken
    into consideration.
    Having spent most of my life using C, index-0 feels more natural, but
    that's only to be expected. I've translated a fair amount of index-1
    code written for Fortran into index-0 code for C. The translation
    process itself can be annoyingly tricky, but properly done, the
    translation is pretty much a wash; overall the code is generally about
    equally complicated before and after the conversion. However, to the
    extent that I saw a difference, I generally found that it was in C's
    favor: there were slightly more "+1"s in the Fortran code than in the C
    code.
    That can be true, and was particularly true on the platforms where C was
    first developed, but the mapping is not necessarily simple or obvious.
    Implementations have a lot of freedom in those choices, and developers
    have occasionally been surprised by the choices that were made.
    However, to the extent that it is true, that was done at least as much
    for the benefit of the developer as for the compiler-writer.
    'auto' did not become useless until 'implicit int' was removed from the
    language in C99. However, taking advantage of 'implicit int' was never a
    very good idea (which is why it was removed from the language), so
    'auto' was relatively useless even in C89. To understand why it was
    there, you need to look at the history of the languages that were
    predecessors to C, where 'auto' was more useful. I don't remember the
    details - but they have been mentioned by others in this newsgroup.
    More precisely, that applies to all lvalues of array type, whether or
    not they happen to be the names of arrays.

    You can't meaningfully talk about the consequences of changing just that
    one rule, because it's too tightly integrated into the other rules of C
    as they're currently written. For instance, subscripting is only
    defined for pointers. The only reason why array[2] refers to the third
    element of "array" is because "array" itself automatically converts into
    a pointer to the first element of "array". A suggestion that this rule
    should not have been chosen must be accompanied with suggestions about
    how the other rules of C should have been changed to work properly
    without that rule.

    However, one relevant issue is that C was designed to rely upon
    pass-by-value, which means that an array could only be passed to a
    function by creating a pointer. Having that pointer created
    automatically was designed as a convenience for the developer; I suspect
    that it might actually make things a little more complicated for the
    compiler writer, because it means that arrays are treated differently
    from other object types.
    I suspect that decision was based upon a lot of experience showing that
    single-precision was often insufficiently accurate.

    In modern C, it's a bit easier to avoid such implicit conversions than
    it was in K&R C. In C89, function prototypes were added, which allow you
    to declare function arguments to be float, thereby avoiding that
    implicit conversion. In C99, almost every <math.h> function that takes a
    double argument has another version with the same name plus an 'f' at
    the end which takes float arguments and (where appropriate) returns a
    float value.
     
    James Kuyper, May 7, 2014
    #6
  7. BartC

    There are innumerable benefits from being flexible in specifying the lower
    bound of an array.

    One being that it makes it easier to port code or an algorithm that uses a
    different base.
    C conflates arrays with pointers too much which is why you're thinking of
    offsets when you should be thinking of indices.
     
    BartC, May 7, 2014
    #7
  8. BartC

    There always seems to be a risk of off-by-one errors whenever I try it
    (with subtle bugs due to some <= needing to be < at some place you didn't
    check). Also, algorithms which make use of the oddness or evenness of an
    index need extra care.

    But one short-cut way of converting 1-based to 0-based is just to make the
    arrays one element longer, and to carry on using 1-based indexing (ignoring
    element zero). Not elegant, and a bit wasteful, but better than introducing
    bugs.

    (Going the other way would be more difficult, except that some languages
    that are 1-based, also allow N-based including 0-based. Being tolerant
    about these matters is helpful.)
     
    BartC, May 7, 2014
    #8
  9. James Harris

    Over the years I've come across different approaches to saying "not found"
    from which I infer that there is no single best answer. Possible responses:

    * one less than the lowest index
    * the lowest index
    * the highest index
    * one more than the highest index
    * throw an exception

    Just a thought but perhaps the best option is either to throw an exception
    and have a catch clause which deals with it appropriately or, if the
    language does not support exceptions, allow the caller to pass the value it
    wants to be returned if the index is not found.

    James
     
    James Harris, May 7, 2014
    #9
  10. James Kuyper

    Yes, that's the main thing that makes the process tricky.
    That's an example of what I consider not "properly done". The one-based
    indexing is going to confuse any maintenance programmer who's used to
    C's normal 0-based indexing (even if properly warned, and especially if
    not). I think it's better to bite the bullet and do what's needed to get
    it right the first time, rather than creating traps for unwary future
    maintainers.
     
    James Kuyper, May 7, 2014
    #10
  11. Stefan Ram

    What is »int« mapped onto?
    »auto« helps B programmers who want to start using C
    immediately feel at home. This is a reason for the
    great success of C, which is now number 1 on TIOBE,
    beating your favorite language.
    not always
    not always
    »no function /declarations/ contained
    inside other function /declarations/«.

    This keeps C simple and small, so that efficient
    C compilers are available for many targets.
     
    Stefan Ram, May 7, 2014
    #11
  12. James Kuyper

    While true, that's not directly relevant to his point. Change
    "declaration" to "definition", and what you say is both true, and
    exactly what I assume he's talking about.
     
    James Kuyper, May 7, 2014
    #12
  13. Stefan Ram

    Actually, these are two different values: there is a
    meta indicator (indicating failure / success), and a
    result (only in the case of success).

    Explicitly, the function thus should return two values
    (I call this: »out-of-band error indication«).

    #include <stdio.h>

    struct result
    { int valid; /* << here is the explicit error indicator */
      int value; /* meaningful only when valid is nonzero */ };

    static struct result divide( int const numerator, int const denominator )
    { struct result result = { 0, 0 };
      if( denominator != 0 )
      { result.valid = 1;
        result.value = numerator / denominator; }
      return result; }

    static void print_division( int const numerator, int const denominator )
    { struct result const result = divide( numerator, denominator );
      if( result.valid )printf( "result = %d\n", result.value ); }

    int main( void ){ print_division( 4, 0 ); print_division( 3, 1 ); }
     
    Stefan Ram, May 7, 2014
    #13
  14. Stefan Ram

    I have recently used too many sub-standard programming languages
    where definitions are called »declarations«, sorry.

    A /function definition/ is part of the C source text, so the meaning
    of »to nest« is inherited from the nesting of texts.

    A /function/ is an abstract entity that is not part of the source
    text. In C, there are indeed values that have »function type«.
    These are called »function designators« by N1570 6.3.2.1p4.
    They do not refer to /function definitions/ which usually are not
    available at run-time anymore. The meaning of the verb »to nest«
    is not specified for such functions which are the values of
    function designators.
     
    Stefan Ram, May 7, 2014
    #14
  15. (snip, I wrote)
    After writing that, I thought that one could have a [[ ]] operator
    that subtracts one and then indexes. As far as I know, there is no
    current use for that syntax that would cause an ambiguity.

    -- glen
     
    glen herrmannsfeldt, May 7, 2014
    #15
  16. (snip, someone wrote)
    For writing new programs and algorithms, there is nothing wrong
    with 0 origin, but it is a fair amount of work, and easy to get
    wrong, to convert an existing program or algorithm.

    It is still usual in matrix mathematics to index from 1.

    It wouldn't be hard at all to add a new operator.

    -- glen
     
    glen herrmannsfeldt, May 7, 2014
    #16
  17. A function *declaration* is something like:

    void func(void);

    The corresponding function *definition* (which also provides a
    declaration) is:

    void func(void) {
        /* ... */
    }

    What's disallowed in standard C is, for example:

    void outer(void) {
        void disallowed(void) {
            /* ... */
        }
        /* ... */
    }

    Both function declarations and function definitions are part of the
    source text. Nobody was referring to functions as abstract entities,
    or to function designators.
     
    Keith Thompson, May 7, 2014
    #17
  18. You'd probably have to add the syntax as a grammar rule, keeping [ and ]
    as the only tokens. Making it an actual operator (which to me implies
    that the [[ and the ]] are tokens) complicates the lexer since it can't
    apply the "maximal-munch" rule anymore.
     
    Ben Bacarisse, May 8, 2014
    #18
  19. Kaz Kylheku

    High-level languages also have zero-based arrays.

    Zero based arrays are also described in Lisp 2 (1968), and Lisp
    continues to have zero-based vectors today.

    Python has zero-based arrays.

    One based arrays are useful in some situations. Those situations
    are rare.

    *Supporting* one-based arrays isn't a bad idea.

    Making them *default* is stupid; the default should be zero based.

    Making one-based arrays the only choice is criminally insane.

    The same isn't true of zero based arrays. Zero based arrays being
    the only supported representation is perfectly workable.
    It's that type of language; C wouldn't be C if it wasn't like that;
    it would be something else.

    People who need the semantics of "types that map onto hardware"
    would not use that something else.

    If a higher-level-than-assembly programming language that maps types onto
    hardware didn't exist, it would have to be invented.
    The way arrays and pointers work in C is actually quite brilliant.
    You are misinformed. Float values undergo a special default argument
    promotion when passed to old-style functions without prototypes or
    as trailing arguments to variadic functions.
    This is too bad, but on the other hand it has been available for decades as an extension in
    GNU C.

    GNU C is more widely available, platform-wise, than whatever language
    you have in mind which has nested functions.
     
    Kaz Kylheku, May 8, 2014
    #19
  20. C++ had a similar problem with << and >>. In nested template
    definitions, a closing >> had to be written as > > so it wouldn't be
    interpreted as a right shift operator. A more recent version of the C++
    standard corrected the problem (I don't remember exactly how).
     
    Keith Thompson, May 8, 2014
    #20
