"The Unreasonable Effectiveness of C" by Damien Katz

BGB · Jan 17, 2013

C++ has std::valarray. Python has NumPy (although it's not actually part
of the standard library).

well, granted, but I partly meant "plain old arrays", rather than arrays
added as a library feature.

if one includes library features, then pretty much any language with
overloaded operators can have them, and one can include C as well if
they allow for a looser definition of "add" (which allows function calls).

BGB · Jan 17, 2013

BGB said:
BGB said:

But most good high-level langauges now include array operations.
[You can say a=b+c; where a, b, and c are arrays (of any rank)]
No true Scotsman? I use many high-level languages, and none of them have
native support for vector operations. I did use R a few years ago to
validate some C code, I suppose. GCC has built-in vector extensions
for C,
but they're only for single dimensions.

Click to expand...

I have seen hardly any languages which allow adding arrays in this way.

Click to expand...

It's not really that meaningful, except where you build a more specific
type based on 'array', designed for mathematical work.

But if the language allows it, then it would quite likely be to allow
list operations in general, so that in a = b op c, then 'op' is applied
between all corresponding elements (as in languages such as "K", but
without quite such an obvious syntax...)

yep.

I have actually seen more which will interpret this as meaning you
want to append the arrays into a single bigger array.

Click to expand...

That's far more useful. But better to use a symbol other than "+" I think.

IIRC, I think I have the operator '&' set up to do this (would have to
go check).

both '+' and '&' can be used to append strings.

my language diverges from JS though in the case where strings and
numbers are added:
'string+number' steps forwards a certain number of characters (or
results in null if the code goes off the end of the string).

likewise, 'string-number' allows walking backwards, and will produce
null if the code goes before the base.

'string&number', however, converts the number to a string and appends them.

this way made more sense to me, FWIW...

BartC · Jan 18, 2013

Nah. C is popular mainly because it's popular. It doesn't have any
hideous flaws for a low level language, but it's nothing special,
language-wise (and clearly better low-level-ish languages exist).

Which ones?

Giorgos Keramidas · Jan 18, 2013

While it is generally true, the actual meaning of that characterization
is not exactly unambiguous.

C language was true "portable assembler" - i.e. a tool for directly
expressing the programmer's intent in near-machine language - mostly in
its nascent stages, as described in "C Reference Manual" and the first
edition of "C Programming Language". That was indeed a true "assembler".

The standardized C was considerably reworked and became more abstracted
from the actual machines. In many cases when one refers to it as a
"portable assembler" in the above sense of the term, it usually means
that one simply lacks proper understanding of the true abstraction level
built into the language features.

Yet, standardized C retained the crown of "portable assembler" in a
different meaning of the term: it works rather well as a back end
representation for various higher-level language translators,
i.e. instead of translating the higher-level source into assembler it
often makes much more sense to translate them into C.

All quite right, indeed. As you have, quite correctly, written, I also
think there's an important difference between people who use C because
"it's a portable assembler" and people who use C because it makes many
things "portable across different machines".

I think the first group of people do not care at all and shouldn't
really be using the term 'portable' to describe C. WHat they mean by
'portable' is something closer to 'more human readable than assembly' or
'a nice procedural layer over machine code'.

The second group of people are those who care about struct member
alignment, the nuances that make intptr_t different from a cast of
(void *) to uint64_t and so on.

These two uses of 'portable' are _not_ interchangeable, but seeing
statements like "C is portable assembler" tends to confuse the two.

BartC · Jan 18, 2013

Robert Wessel said:
[Which ones?]

Click to expand...

Ada, Modula-2, some versions of Algol, to name a few. Quality is
something of a judgment call, of course.

OK, those aren't quite what I'd call low-level, in the way that C is.

I've worked on a few myself, but there seem to be few mainstream languages
now at the level of C or below (ie. 'machine-oriented'). (In the past there
were languages such as PL/M but now C seems to be the main one.)

glen herrmannsfeldt · Jan 19, 2013

(snip)

OK, those aren't quite what I'd call low-level, in the way that C is.

Yes, but you can also see that C is not quite as high level, either.

I've worked on a few myself, but there seem to be few mainstream languages
now at the level of C or below (ie. 'machine-oriented'). (In the past there
were languages such as PL/M but now C seems to be the main one.)

There was (maybe still is) PL/360.

It is really an assembler that looks like a HLL, as surprising
side effects occur if you aren't looking.

The only others comparable to C that I can think of are the
predecessors, B and BCPL, and, as mentioned, PL/M.

-- glen

Ian Collins · Jan 19, 2013

glen said:
(snip)

Yes, but you can also see that C is not quite as high level, either.

There was (maybe still is) PL/360.

It is really an assembler that looks like a HLL, as surprising
side effects occur if you aren't looking.

The only others comparable to C that I can think of are the
predecessors, B and BCPL, and, as mentioned, PL/M.

Or its successor: C++.

glen herrmannsfeldt · Jan 19, 2013

(snip)

(snip)
All quite right, indeed. As you have, quite correctly, written, I also
think there's an important difference between people who use C because
"it's a portable assembler" and people who use C because it makes many
things "portable across different machines".

OK, consider PL/360, which is S/360 specific, but looks like a
high-level language.

R1:=R1+R1+R1;

That multipliess that value in R1 by 4. (And note that R1 is
general register 1 on a S/360 or descendant processor.)

The generated code should be:

LR R1,R1
AR R1,R1
AR R1,R1

so, you can see a non-portable high-level (looking) language, which
is really an assembler in disguise.

I think the first group of people do not care at all and shouldn't
really be using the term 'portable' to describe C. WHat they mean by
'portable' is something closer to 'more human readable than assembly' or
'a nice procedural layer over machine code'.

So, C is portable in the sense that you don't have to keep track
of registers anymore. Assembler programmers always need to remember
which values are in which register, which need to be saved (such
as before a function call), etc.

Not counting the library (and you could have a pretty fancy library
to go along with your assembler) it is usually not so hard to guess
which instructions a C statement will generate, except for keeping
track of register contents.

The second group of people are those who care about struct member
alignment, the nuances that make intptr_t different from a cast of
(void *) to uint64_t and so on.

Well, assembler programmers can keep track of DSECTs (I don't know
what other assemblers call them, but that is what the OS/360
compilers use for something like struct.)

These two uses of 'portable' are _not_ interchangeable, but seeing
statements like "C is portable assembler" tends to confuse the two.

Much of the time, I might rather write in PL/I, but C compilers
are much easier to find. My first real language was Fortran and,
not so long after that, I learned PL/I. PL/I was so much more fun
to write than Fortran IV, though often ran slower.

-- glen

BGB · Jan 19, 2013

All quite right, indeed. As you have, quite correctly, written, I also
think there's an important difference between people who use C because
"it's a portable assembler" and people who use C because it makes many
things "portable across different machines".

I think the first group of people do not care at all and shouldn't
really be using the term 'portable' to describe C. WHat they mean by
'portable' is something closer to 'more human readable than assembly' or
'a nice procedural layer over machine code'.

The second group of people are those who care about struct member
alignment, the nuances that make intptr_t different from a cast of
(void *) to uint64_t and so on.

These two uses of 'portable' are _not_ interchangeable, but seeing
statements like "C is portable assembler" tends to confuse the two.

a person can basically use it somewhere in between:
C is a "portable assembler" in that it can generally do a lot of the
stuff ASM would normally do, but is high-level enough to not need to be
rewritten for each target machine (but can be used over a range of
machines).

though, there may still be things which C can't do, which may require
occasional use of real ASM code.

one thing that is often a hindrance to using C as a good target language
for compilers is, ironically, its fairly strict adherence to top-level
block-structuring.

some C extensions, such as computed goto, can help to a limited extent
here, but given computed goto can't cross function boundaries (or, by
extension, boundaries between one compilation unit and another), a limit
is placed on its use here (IOW: it doesn't scale very well).

the alternative option is mostly to break things down into smaller units
and emit each as its own function, making heavy use of function-calls
and function pointers instead, but this usually comes at some
performance cost (an HLL function will often be split into a number of C
functions and often involve some use of a "trampoline loop" to call
between them).

note that code compiled using this strategy will often not be
significantly faster IME than using a similar strategy to interpret
threaded code, since in both cases there will be notable slowdown due to
function calls/returns and the use of trampoline loops. code in both
cases will often be several times, or more, slower than "direct" C code
(code lacking excessive function calls or trampoline loops).

(in my own tests, I have usually seen a slowdown of at least 5x-8x from
doing so, typically with most of the running time the "trace dispatch
loop", or, essentially, a "trampoline loop").

it seems to be due mostly to the use of unpredictable indirect calls,
which the CPU seems to really dislike.

an alternative compromise could be though if (as a C extension) it were
possible to fetch and call label-pointers as if they were function
pointers (using the same signature as that of the parent function), and
also potentially refer to them from outside the function in question
(more like a struct), but currently I am not aware of any compiler which
supports this.

foo.label1(3, 4); //call into foo() via an arbitrary label.
fcn=&foo.label1; //get label pointer as function pointer.

possibly with something inside foo like:
"extern label1:"
with extern giving a compiler hint that maybe it should generate any
glue needed to make this callable.

this would not necessarily make calling them any faster (since they
would still need to set up a call-frame / ...), but could make the
general use of label-pointers and computed goto less limited.

though, yes, the big cost is (even if supported) it would still leave
the high-level compiler tied to a specific low-level compiler.

effectively, a lot of this still tends to leave ASM as a "better" target
for "reasonable performance" compiler output (since ASM doesn't really
care how anything is structured).

it can also allow for faster interpreters, although granted,
unpredictable indirect jumps are still a performance killer (in either C
or ASM).

however, making the interpreter work by spitting out a glob of native
code something like (or its CPU/ABI specific analogue):
push ebp
mov ebp, esp
....
mov eax, [...]
mov [esp], eax
call VM_OpA
mov [esp], eax
call VM_OpB
mov [esp], eax
call VM_OpC
mov [esp], eax
call VM_OpD
mov [esp], eax
....
mov esp, ebp
pop ebp
ret

can at least make things a fair bit faster...

or such...

Edward A. Falk · Jan 19, 2013

I have seen hardly any languages which allow adding arrays in this way.

Yeah, I was going to say. What languages *do* add arrays this way?
PL/1? Algol?

Lynn McGuire · Jan 20, 2013

Yeah, I was going to say. What languages *do* add arrays this way?
PL/1? Algol?

Fortran starting with the 1990 spec:

http://research.physics.illinois.edu/ElectronicStructure/498-s97/comp_info/array.html

Lynn

glen herrmannsfeldt · Jan 20, 2013

Edward A. Falk said:
Yeah, I was going to say. What languages *do* add arrays this way?
PL/1? Algol?

Click to expand...

I am not sure about ALGOL. PL/I had array operations from the beginning,
Fortran added them in Fortran 90.

Many interpreted languages have array operations, maybe back
to APL. Mathematica, Matlab, R, and similar systems all do.
A google search for "array expressions" shows Jive,
Haskell, and PostGreSQL (maybe SQL in general), and Cython.

PL/I might be the only one with structure expressions.
You can add (or multiply, or just about anything else)
two structures in PL/I for member by member evaluation.

DCL (1 X, 1Y) , 2 A FIXED DECIMAL(5,2), 2 B FLOAT BINARY(10);

(I haven't written many structure declarations in PL/I,
but I believe that is right.

X=1;
Y=2;
PUT SKIP LIST(X+Y);

-- glen

glen herrmannsfeldt · Jan 20, 2013

(snip)

though, there may still be things which C can't do, which may require
occasional use of real ASM code.

one thing that is often a hindrance to using C as a good target language
for compilers is, ironically, its fairly strict adherence to top-level
block-structuring.

some C extensions, such as computed goto, can help to a limited extent
here, but given computed goto can't cross function boundaries (or, by
extension, boundaries between one compilation unit and another), a limit
is placed on its use here (IOW: it doesn't scale very well).

longjmp() with an array of jmp_buf should work.

-- glen

BartC · Jan 20, 2013

glen herrmannsfeldt said:
(snip)

longjmp() with an array of jmp_buf should work.

I use computed goto for threaded code (I assume computed goto is the use of
label pointers). I use it because in my project, alternatives are slower.

Is longjmp (which I've never used) likely to match the speed of an indirect
goto? And is it suitable for doing large numbers of such jumps one after the
other? (With computed gotos, I do around 100 million every second.)

BGB · Jan 20, 2013

(snip)

longjmp() with an array of jmp_buf should work.

not sure how one would pull this off exactly...

if you mean having a bunch of "if(setjmp(&arr[N]))goto lbl;", this could
work, but IMO is likely to be slower than splitting things up into
functions and calling through function pointers.

I did have an idea which I am writing up in a spec, which I am calling
"CLIL" for "C-like Intermediate Language", which would include some of
the ideas for non-local computed goto (and calls via labels), but would
otherwise be a language more like "C--" or "B".

basically, it would probably be as part of a JIT backend for my
scripting language (and probably less of a pain than writing a full JIT
directly).

or such...

BGB · Jan 20, 2013

I use computed goto for threaded code (I assume computed goto is the use
of label pointers). I use it because in my project, alternatives are
slower.

yeah. computed goto is the use of label pointers and then using goto to
call them. it is nifty, but sadly, isn't available with MSVC, so I am
mostly using function pointers, but still using threaded code.

this works ok, but still can't reach native speeds.

like, using selection sort (an O(n^2) sort) on a 64k element array still
takes several minutes.

likewise for things like, at present, doing a "fib(40)" or similar via
the recursive Fibonacci function.

but things still take considerably less time with natively compiled code
(along the lines of a matter of seconds, like around 15 seconds or so
for the array sort).

so, around 133M swaps/sec for C, and about 8M swaps/second for
interpreted code. with each swap involving spinning around in a "for()"
loop and comparing array indices.

though all this is "usually good enough", and I have been at least
gradually squeezing more speed out of the plain-C interpreter.

Is longjmp (which I've never used) likely to match the speed of an
indirect goto? And is it suitable for doing large numbers of such jumps
one after the other? (With computed gotos, I do around 100 million every
second.)

I doubt it. "longjmp()" is actually likely to be a bit slower than
either computed goto or a call through a function-pointer, since it
essentially involves restoring registers potentially followed by several
indirect jumps.

AFAICT, the issue is mostly that making an indirect call seems to cause
a pipeline stall or similar, although at least the CPU seems somehow
able to predict its way through a return instruction.

a recent speed-squeezing trick though is breaking the opcodes up into
traces, which execute opcodes in sequence (basically, a trace will
unconditionally execute a chain of instructions, usually terminated via
a jump or an opcode which has a chance of throwing an exception).

for example:
BSVM_ThreadTrace *BSVM_ThTr_RunBasic4(BSVM_SVMState *ctx,
BSVM_ThreadTrace *tr)
{
BSVM_ThreadOp *op;
op=op->fcn(ctx, tr->thop);
op=op->fcn(ctx, op);
op=op->fcn(ctx, op);
op=op->fcn(ctx, op);
return(tr->next);
}

(which execute a fixed number of thread-ops and transfer control to the
next trace).

or, the slightly more generic fallback case:
BSVM_ThreadTrace *BSVM_ThTr_RunDefault(BSVM_SVMState *ctx,
BSVM_ThreadTrace *tr)
{
BSVM_ThreadOp *op;
int i;
op=tr->thop; i=tr->n_thop-1;
while(i--)
{ op=op->fcn(ctx, op); }
return(tr->end(ctx, tr, op));
}

all this can help some, but still isn't native speeds.

I don't know exactly how many ops per second are executing, but I
suspect it is probably "up there", given I am measuring loops involving
calling C functions (via the FFI) and getting/setting object fields
capable of spinning at a rate of several million iterations per second.

so, luckily, it probably at least isn't *that* slow, even if a JIT or
similar would probably be a bit faster.

BGB · Jan 20, 2013

<snip>

I was surprised to find that the following GCC hack works:

#include <stdio.h>

static void *One, *Two;

static int foo(void *label) {
__label__ one, two;
__attribute__((constructor, used)) void bar(void) {
One = &&one;
Two = &&two;
}

goto *label;
one:
return 1;
two:
return 2;
} /* foo() */

int main(void) {
printf("one: %d\n", foo(One));
printf("two: %d\n", foo(Two));

return 0;
} /* main() */

I discuss some of the details in a blog post "Acquiring address of labels
outside of scope": http://25thandclement.com/~william/blog/

well, yes, it works...

the downside is, of course, that you still have to call into the
function (or otherwise be inside the the function) to goto the label
(and have a single massive function to make much use of it).

I am imagining if some of these restrictions were lifted, such that a
language would allow direct inter-procedural goto, and ability to call
directly to labels within a function, so in these regards would be more
like assembler (sort of like a macro-assembler with more C like syntax).

I started writing out some ideas for such a language, but it isn't
really C (the idea ended up involving largely dropping
structured-programming constructs, instead treating functions more as a
way of declaring the active stack-frame layout), ... and would probably
use a simplistic single-pass compiler, and be largely untyped.

(in contrast, C puts considerably more distance between the language and
the underlying ASM code).

still thinking on it though, don't yet know if I will do anything with
the idea.

or such...

BGB · Jan 20, 2013

for example:
BSVM_ThreadTrace *BSVM_ThTr_RunBasic4(BSVM_SVMState *ctx,
BSVM_ThreadTrace *tr)
{
BSVM_ThreadOp *op;
op=op->fcn(ctx, tr->thop);
op=op->fcn(ctx, op);
op=op->fcn(ctx, op);
op=op->fcn(ctx, op);
return(tr->next);
}

blarg...

"clever" last-minute optimization kind of made this one critically broken.

fixed version more like:
BSVM_ThreadTrace *BSVM_ThTr_RunBasic4(BSVM_SVMState *ctx,
BSVM_ThreadTrace *tr)
{
BSVM_ThreadOp *op;
op=tr->thop;
op=op->fcn(ctx, op);
...
}

which avoids using an uninitialized variable...

well, nevermind then...

BGB · Jan 20, 2013

You can actually jump into the function. It should be just as safe (or not
safe) as that nested constructor function, which is also technically not
supposed to be invoked outside of the parent's invocation. That's why I was
surprised that GCC actually set it up for the linker to invoke.

I didn't see much that appeared to be outside of normal GCC behavior...

this works mostly because labels exist at compile time, rather than
runtime (as would be the case for auto-variables or similar).

so, in this case, what the nested function is doing is little-different
from a non-nested function, apart from being able to see the labels in
the parent function.

But by jumping straight into the function you wouldn't have the ability to
use any auto variables. You could "return", though, by using another
computed goto in like manner.

well, this is going outside of C-land here...

actually, I imagined that it would still allow auto-variables, just in a
very unsafe way:
whatever is the current stack-frame, will be interpreted as if it had
the same layout as that present in the function body.

this would mean the ability to "safely" goto between any two functions
with the same function-arguments and variable declarations, and still
use the variables (it would be undefined-behavior if the declared
variables don't match exactly though when doing so).

basically, the declared variables and arguments list would be taken as
the "active stack-frame layout" at the point where the code is compiled.

so, if the same variables exist by the same name in the same order, ...,
then they may be accessed as such. and, if the frame-layout doesn't
match, well then...

call/cc?

well, you could probably implement a call/cc via such a mechanism, but
it isn't exactly the same thing as call/cc, since all this would be
operating at a much lower level (call/cc being a much higher-level
mechanism).

basically, in typical ASM (or, non-macro ASM), there are no actual
"functions" in the high-level sense, it is mostly all just registers,
memory, and labels. the calling-convention is then a convention built on
top of these lower-level constructions.

in something like a code-generator, whatever higher-level constructions
exist, are typically mapped over to a "call frame" structure on the
stack, for example:

....
arg3
arg2
arg1
return EIP
saved EBP
var1
var2
var3
....

it can then be asserted then, that the main thing a function body does
is to establish the layout of the call-frame, but does not necessarily
mandate how control flow either enters or leaves the function (the top
of the function then is merely a "default" entry point).

then, a person can allow all the crazy semantics that can be allowed via
horizontal gotos.

calls via a label would be special, mostly in that a call via a label
would construct a new call-frame.

the most likely way to deal with this would be to mark the label as
"extern", and then the compiler would spit out the code necessary to
allow calls into the function via this label (so, it would "cheat"
slightly in this sense, by effectively generating code for multiple
entry points to a function, making the callable label-pointer
essentially be a normal function pointer).

some cases would probably also require generating frame-cleanup code
(similar to that used in tail-call optimization).

but, yeah, still all hypothetical at this point...

or such...

BartC · Jan 20, 2013

BGB said:
On 1/19/2013 7:26 PM, BartC wrote:

yeah. computed goto is the use of label pointers and then using goto to
call them. it is nifty, but sadly, isn't available with MSVC, so I am
mostly using function pointers, but still using threaded code.

this works ok, but still can't reach native speeds.

With my interpreter, and a combined bunch of benchmarks, I get a time of 51
seconds using these label pointers (for opcode dispatch). Using plain
switch it was 62 seconds. With a simple loop calling function pointers it
was 75 seconds (because most inlining ability is lost).

So there's not much in it, and still a long way from native code. (And when
I tested on an ARM processor, there seemed little difference between
indirect gotos, and switch.)

(Plugging in an actual ASM module to do the dispatch, but initially just
calling the C handlers, it was 82 seconds! So the same time as the function
dispatch method, plus extra overheads of the more complex ASM threaded
call-chain (plus save/restore registers). However, it took a just a few
hours to write enough inline functions in the ASM module to beat the best
51-second C timing, but only by 10%, so not worth the trouble or the loss of
flexibility.)

like, using selection sort (an O(n^2) sort) on a 64k element array still
takes several minutes.

likewise for things like, at present, doing a "fib(40)" or similar via the
recursive Fibonacci function.

For fib(40) I get 18 seconds using the function pointer loop, and 9.5
seconds using label pointers. I've achieved about 5 seconds in the past, but
there were too many complications. (Native C was 0.8 to 1.4 seconds.)

I've given up trying to get anywhere near the speed of native code (which is
not going to happen with the sort of approaches I use); instead I have in
mind an intermediate language to do that!

though all this is "usually good enough", and I have been at least
gradually squeezing more speed out of the plain-C interpreter.

That's the thing; for real programs, differences in speed are smaller (while
a few things are faster than C, for a given amount of effort, thanks to the
use of counted strings for example).

"The Unreasonable Effectiveness of C"	12	May 30, 2014
I'm tempted to quit out of frustration	1	Aug 13, 2023
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
On the development of C	211	Mar 9, 2009
C and the future of computing	0	Apr 1, 2011
C language now truly universal	0	Jan 1, 2011
Comparision of C Sharp and C performance	360	Dec 27, 2009
In the Matter of Herb Schildt: a Detailed Analysis of "C: TheComplete Nonsense"	109	Apr 3, 2010

"The Unreasonable Effectiveness of C" by Damien Katz

BGB

BGB

BartC

Giorgos Keramidas

BartC

glen herrmannsfeldt

Ian Collins

glen herrmannsfeldt

BGB

Edward A. Falk

Lynn McGuire

glen herrmannsfeldt

glen herrmannsfeldt

BartC

BGB

BGB

BGB

BGB

BGB

BartC

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads