disgusting compiler !! hahaha!!

raj shekar · May 7, 2014

Features of C that
seem to have evolved with the compiler-writer in mind are:

Arrays start at 0 rather than 1
The fundamental C types map directly onto underlying hardware
The auto keyword is apparently useless
Array names in expressions "decay" into pointers
Floating-point expressions were expanded to double-length-precision everywhere
No nested functions (functions contained inside other functions)

glen herrmannsfeldt · May 7, 2014

raj shekar said:
Features of C that
seem to have evolved with the compiler-writer in mind are:

Arrays start at 0 rather than 1

Computers like to index from zero, but it isn't hard at all
for a compiler to subtract one. Converting algorithms between
zero based and one based indexing is harder than it seems it
should be.

The fundamental C types map directly onto underlying hardware

That was, and is, pretty usual. PL/I allows one to specify what
is actually needed, such that the compiler figures out how to
implement it. Most languages give what the hardware gives.

The auto keyword is apparently useless

Pretty useless in PL/I, too, but is there for consistency.

Array names in expressions "decay" into pointers

Again, it doesn't make much difference to compilers.

Well, overall, C is a fairly simple language to compile.
This is just one feature.

Floating-point expressions were expanded to
double-length-precision everywhere

In the early days of C, it was used more for systems programming,
and less for scientific programming. There was not so much worry
about a little inefficiency in the generated code, and it simplifies
the math library by only needing one of each. I suppose promoting
function arguments could be separate from promoting for other
operators, though.

No nested functions (functions contained inside other functions)

Multics and PL/I were used by many working on the beginnings of C,
and PL/I has internal procedures. There are some complications
in doing it right, especially in the case of function pointers
and recursion.

Fortran didn't add internal procedures until Fortran 90, and the
ability to use them with pointers until Fortran 2003.

-- glen

BartC · May 7, 2014

glen herrmannsfeldt said:
Computers like to index from zero, but it isn't hard at all
for a compiler to subtract one. Converting algorithms between
zero based and one based indexing is harder than it seems it
should be.

Why not have a language allow both 0-based and 1-based? Sometimes 0-based is
useful (for measuring or for use with offsets), sometimes 1-based is (for
counting); and sometimes N-based is. It's not hard.

Then no conversion is necessary.

Again, it doesn't make much difference to compilers.

It means having a special kludge to make the type-system work, with a big
hole in it where value-arrays would normally go.

Well, overall, C is a fairly simple language to compile.
This is just one feature.

Have you tried it? For this class of language (considering the primitive
types and operations that it has), it ought to be easy to compile.

I've thought about it myself, then considered that a C compiler needs to
make sense of nightmare headers full of indecipherable macros and pragmas
and attributes (e.g. stdafx.h), and has to restrain itself from reporting
things such as 'int a,a,a,a,a;' (apparently legal), and realised what an
undertaking it would actually be.

Malcolm McLean · May 7, 2014

I've thought about it myself, then considered that a C compiler needs to
make sense of nightmare headers full of indecipherable macros and pragmas
and attributes (e.g. stdafx.h), and has to restrain itself from reporting
things such as 'int a,a,a,a,a;' (apparently legal), and realised what an
undertaking it would actually be.

tcc (tiny C compiler) comes with source. Unlike gcc it's not a massive project.

jacob navia · May 7, 2014

On 07/05/2014 20:37, Richard wrote:

Because it would be ridiculously stupid, bordering on incompetent IMO.
"the first element is at the beginning : "0 offset""

The APL language had a global variable called "Origin" that could be
zero or one. Depending on its value, arrays would start at 1 (the
default) or at zero (if you set Origin to zero).

This gave users the choice, but led to subtle bugs. Forgetting an origin
change somewhere was enough to break all your software: if you wrote it
using origin 1 and somebody set the origin to zero, all your array
accesses would be wrong.

The nice thing with origin 1 is that since array index zero doesn't
exist, many functions can return zero to say "Search failed". With
origin zero you must return some other flag value (like 0xfffffff),
which always causes problems.

James Kuyper · May 7, 2014

Features of C that
seem to have evolved with the compiler-writer in mind are:

I doubt that this was the primary reason for any of those features,
though it was (and certainly should have been) one of the issues taken
into consideration.

Arrays start at 0 rather than 1

Having spent most of my life using C, index-0 feels more natural, but
that's only to be expected. I've translated a fair amount of index-1
code written for Fortran into index-0 code for C. The translation
process itself can be annoyingly tricky, but properly done, the
translation is pretty much a wash; overall the code is generally about
equally complicated before and after the conversion. However, to the
extent that I saw a difference, I generally found that it was in C's
favor: there were slightly more "+1"s in the Fortran code than in the C
code.

The fundamental C types map directly onto underlying hardware

That can be true, and was particularly true on the platforms where C was
first developed, but the mapping is not necessarily simple or obvious.
Implementations have a lot of freedom in those choices, and developers
have occasionally been surprised by the choices that were made.
However, to the extent that it is true, that was done at least as much
for the benefit of the developer as for the compiler-writer.

The auto keyword is apparently useless

'auto' did not become useless until 'implicit int' was removed from the
language in C99. However, taking advantage of 'implicit int' was never a
very good idea (which is why it was removed from the language), so
'auto' was relatively useless even in C89. To understand why it was
there, you need to look at the history of the languages that were
predecessors to C, where 'auto' was more useful. I don't remember the
details - but they have been mentioned by others in this newsgroup.

Array names in expressions "decay" into pointers

More precisely, that applies to all lvalues of array type, whether or
not they happen to be the names of arrays.

You can't meaningfully talk about the consequences of changing just that
one rule, because it's too tightly integrated into the other rules of C
as they're currently written. For instance, subscripting is only
defined for pointers. The only reason why array[2] refers to the third
element of "array" is because "array" itself automatically converts into
a pointer to the first element of "array". A suggestion that this rule
should not have been chosen must be accompanied with suggestions about
how the other rules of C should have been changed to work properly
without that rule.

However, one relevant issue is that C was designed to rely upon
pass-by-value, which means that an array could only be passed to a
function by creating a pointer. Having that pointer created
automatically was designed as a convenience for the developer; I suspect
that it might actually make things a little more complicated for the
compiler writer, because it means that arrays are treated differently
from other object types.

Floating-point expressions were expanded to double-length-precision everywhere

I suspect that decision was based upon a lot of experience showing that
single-precision was often insufficiently accurate.

In modern C, it's a bit easier to avoid such implicit conversions than
it was in K&R C. In C89, function prototypes were added, which allow you
to declare function arguments to be float, thereby avoiding that
implicit conversion. In C99, almost every <math.h> function that takes a
double argument has another version with the same name plus an 'f' at
the end which takes float arguments and (where appropriate) returns a
float value.

BartC · May 7, 2014

Richard said:
Because it would be ridiculously stupid, bordering on incompetent IMO.

There are innumerable benefits from being flexible in specifying the lower
bound of an array.

One being that it makes it easier to port code or an algorithm that uses a
different base.

"the first element is at the beginning : "0 offset""

Not hard.

C conflates arrays with pointers too much which is why you're thinking of
offsets when you should be thinking of indices.

BartC · May 7, 2014

James Kuyper said:
I doubt that this was the primary reason for any of those features,
though it was (and certainly should have been) one of the issues taken
into consideration.

Having spent most of my life using C, index-0 feels more natural, but
that's only to be expected. I've translated a fair amount of index-1
code written for Fortran into index-0 code for C. The translation
process itself can be annoyingly tricky, but properly done, the
translation is pretty much a wash; overall the code is generally about
equally complicated before and after the conversion.

There always seems to be a risk of off-by-one errors whenever I try it (with
subtle bugs due to some <= needing to be < at some place you didn't check.
Also with some algorithms which make use of the oddness or evenness of an
index, they will need extra care).

But one short-cut way of converting 1-based to 0-based is just to make the
arrays one element longer, and to carry on using 1-based indexing (ignoring
element zero). Not elegant, and a bit wasteful, but better than introducing
bugs.

(Going the other way would be more difficult, except that some languages
that are 1-based, also allow N-based including 0-based. Being tolerant
about these matters is helpful.)

James Harris · May 7, 2014

jacob navia said:
On 07/05/2014 20:37, Richard wrote:
....

The nice thing with origin 1 is that since array index zero doesn't
exist, many functions can return zero to say "Search failed". With
origin zero you must return some other flag value (like 0xfffffff),
which always causes problems.

Over the years I've come across different approaches to saying "not found"
from which I infer that there is no single best answer. Possible responses:

* one less than the lowest index
* the lowest index
* the highest index
* one more than the highest index
* throw an exception

Just a thought but perhaps the best option is either to throw an exception
and have a catch clause which deals with it appropriately or, if the
language does not support exceptions, allow the caller to pass the value it
wants to be returned if the index is not found.

James

James Kuyper · May 7, 2014

There always seems to be a risk of off-by-one errors whenever I try it (with
subtle bugs due to some <= needing to be < at some place you didn't check.
Also with some algorithms which make use of the oddness or evenness of an
index, they will need extra care).

Yes, that's the main thing that makes the process tricky.

But one short-cut way of converting 1-based to 0-based is just to make the
arrays one element longer, and to carry on using 1-based indexing (ignoring
element zero). Not elegant, and a bit wasteful, but better than introducing
bugs.

That's an example of what I consider not "properly done". The one-based
indexing is going to confuse any maintenance programmer who's used to
C's normal 0-based indexing (even if properly warned, and especially if
not). I think it's better to bite the bullet and do what's needed to get
it right the first time, rather than creating traps for unwary future
maintainers.

Stefan Ram · May 7, 2014

raj shekar said:
Arrays start at 0 rather than 1
http://www.purl.org/stefan_ram/pub/zero

The fundamental C types map directly onto underlying hardware

What is »int« mapped onto?

The auto keyword is apparently useless

»auto« helps B programmers who want to start using C
immediately feel at home. This is a reason for the
great success of C, which is now number 1 on TIOBE,
beating your favorite language.

Array names in expressions "decay" into pointers

not always

Floating-point expressions were expanded to double-length-precision everywhere

not always

No nested functions (functions contained inside other functions)

»no function /declarations/ contained
inside other function /declarations/«.

This keeps C simple and small, so that efficient
C compilers are available for many targets.

James Kuyper · May 7, 2014

»no function /declarations/ contained
inside other function /declarations/«.

While true, that's not directly relevant to his point. Change
"declaration" to "definition", and what you say is both true, and
exactly what I assume he's talking about.

Stefan Ram · May 7, 2014

James Harris said:
Just a thought but perhaps the best option is either to throw an exception
and have a catch clause which deals with it appropriately or, if the
language does not support exceptions, allow the caller to pass the value it
wants to be returned if the index is not found.

Actually, these are two different values: there is a meta indicator
(indicating failure / success), and a result (only in the case of
success).

Explicitly, the function thus should return two values
(I call this: »out-of-band error indication«).

#include <stdio.h>

struct result
{ int valid;   /* << here is the explicit error indicator */
  int value; };

static struct result divide( int const numerator, int const denominator )
{ struct result result = { 0, 0 };
  result.valid = denominator != 0;   /* only a nonzero denominator is valid */
  if( result.valid )result.value = numerator / denominator;
  return result; }

static void print_division( int const numerator, int const denominator )
{ struct result result = divide( numerator, denominator );
  if( result.valid )printf( "result = %d\n", result.value ); }

int main( void ){ print_division( 4, 0 ); print_division( 3, 1 ); return 0; }

Stefan Ram · May 7, 2014

James Kuyper said:
While true, that's not directly relevant to his point. Change
"declaration" to "definition", and what you say is both true, and
exactly what I assume he's talking about.

I have recently used too many sub-standard programming languages
where definitions are called »declarations«, sorry.

A /function definition/ is part of the C source text, so the meaning
of »to nest« is inherited from the nesting of texts.

A /function/ is an abstract entity that is not part of the source
text. In C, there are indeed values that have »function type«.
These are called »function designator« by N1570 6.3.2.1p4.
They do not refer to /function definitions/ which usually are not
available at run-time anymore. The meaning of the verb »to nest«
is not specified for such functions which are the values of
function designators.

glen herrmannsfeldt · May 7, 2014

(snip, I wrote)

Why not have a language allow both 0-based and 1-based?
Sometimes 0-based is useful (for measuring or for use with
offsets), sometimes 1-based is (for counting); and sometimes
N-based is. It's not hard.

Then no conversion is necessary.

After writing that, I thought that one could have a [[ ]] operator,
for a subtract one and index. As far as I know, there is no
current use for that syntax that would cause an ambiguity.

-- glen

glen herrmannsfeldt · May 7, 2014

(snip, someone wrote)

Because it would be ridiculously stupid, bordering on incompetent IMO.
"the first element is at the beginning : "0 offset""

For writing new programs and algorithms, there is nothing wrong
with 0 origin, but it is a fair amount of work, and easy to get
wrong, to convert an existing program or algorithm.

It is still usual in matrix mathematics to index from 1.

It wouldn't be hard at all to add a new operator.

-- glen

Keith Thompson · May 7, 2014

I have recently used too many sub-standard programming languages
where definitions are called »declarations«, sorry.

A /function definition/ is part of the C source text, so the meaning
of »to nest« is inherited from the nesting of texts.

A /function/ is an abstract entity that is not part of the source
text. In C, there are indeed values that have »function type«.
These are called »function designator« by N1570 6.3.2.1p4.
They do not refer to /function definitions/ which usually are not
available at run-time anymore. The meaning of the verb »to nest«
is not specified for such functions which are the values of
function designators.

A function *declaration* is something like:

void func(void);

The corresponding function *definition* (which also provides a
declaration) is:

void func(void) {
    /* ... */
}

What's disallowed in standard C is, for example:

void outer(void) {
    void disallowed(void) {
        /* ... */
    }
    /* ... */
}

Both function declarations and function definitions are part of the
source text. Nobody was referring to functions as abstract entities,
or to function designators.

Ben Bacarisse · May 7, 2014

glen herrmannsfeldt said:
(snip, I wrote)

Why not have a language allow both 0-based and 1-based?
Sometimes 0-based is useful (for measuring or for use with
offsets), sometimes 1-based is (for counting); and sometimes
N-based is. It's not hard.

Then no conversion is necessary.

After writing that, I thought that one could have a [[ ]] operator,
for a subtract one and index. As far as I know, there is no
current use for that syntax that would cause an ambiguity.

You'd probably have to add the syntax as a grammar rule, keeping [ and ]
as the only tokens. Making it an actual operator (which to me implies
that the [[ and the ]] are tokens) complicates the lexer since it can't
apply the "maximal-munch" rule anymore.

Kaz Kylheku · May 8, 2014

Features of C that
seem to have evolved with the compiler-writer in mind are:

Arrays start at 0 rather than 1

High-level languages also have zero-based arrays.

Zero based arrays are also described in Lisp 2 (1968), and Lisp
continues to have zero-based vectors today.

Python has zero-based arrays.

One based arrays are useful in some situations. Those situations
are rare.

*Supporting* one-based arrays isn't a bad idea.

Making them *default* is stupid; the default should be zero based.

Making one-based arrays the only choice is criminally insane.

The same isn't true of zero based arrays. Zero based arrays being
the only supported representation is perfectly workable.

The fundamental C types map directly onto underlying hardware

It's that type of language; C wouldn't be C if it wasn't like that;
it would be something else.

People who need the semantics of "types that map onto hardware"
would not use that something else.

If a higher-level-than-assembly programming language that maps types onto
hardware didn't exist, it would have to be invented.

Array names in expressions "decay" into pointers

The way arrays and pointers work in C is actually quite brilliant.

Floating-point expressions were expanded to double-length-precision
everywhere

You are misinformed. Float values undergo a special default argument
promotion when passed to old-style functions without prototypes or
as trailing arguments to variadic functions.

No nested functions (functions contained inside other functions)

This is too bad, but on the other hand it has been available for decades
as an extension in GNU C.

GNU C is more widely available, platform-wise, than whatever language
you have in mind which has nested functions.

Keith Thompson · May 8, 2014

Ben Bacarisse said:
glen herrmannsfeldt said:

BartC said:

Computers like to index from zero, but it isn't hard at all
for a compiler to subtract one. Converting algorithms between
zero based and one based indexing is harder than it seems it
should be.

Why not have a language allow both 0-based and 1-based?
Sometimes 0-based is useful (for measuring or for use with
offsets), sometimes 1-based is (for counting); and sometimes
N-based is. It's not hard.

Then no conversion is necessary.

After writing that, I thought that one could have a [[ ]] operator,
for a subtract one and index. As far as I know, there is no
current use for that syntax that would cause an ambiguity.

You'd probably have to add the syntax as a grammar rule, keeping [ and ]
as the only tokens. Making it an actual operator (which to me implies
that the [[ and the ]] are tokens) complicates the lexer since it can't
apply the "maximal-munch" rule anymore.

C++ had a similar problem with << and >>. In nested template
definitions, a closing >> had to be written as > > so it wouldn't be
interpreted as a right shift operator. A more recent version of the C++
standard corrected the problem (I don't remember exactly how).
