Dennis Ritchie -- An Appreciation

Alan Curry · Nov 2, 2011

we can represent the data like this

typedef struct
{
char name[64];
int id;
float salary;
} EMPLOYEE;

EMPLOYEE employees[10];
or like this

typedef struct
{
char name[10][64];
int id[10];
float salary[10];
} EMPLOYEES;

EMPLOYEES employees;

When this program grows up it'll use malloc/realloc to create and grow the
array, and the second version will need a separate realloc for every field
while the first has one realloc for all.

How do you imagine that problem might be fixed?

nroberts · Nov 2, 2011

nroberts said:
nroberts said:

On Nov 2, 10:14 am, Malcolm McLean <[email protected]>
wrote: ...

We've got ten employees, each with name, payroll id, and salary.
we can represent the data like this
typedef struct
{
char name[64];
int id;
float salary;
} EMPLOYEE;
EMPLOYEE employees[10];
or like this
typedef struct
{
char name[10][64];
int id[10];
float salary[10];
} EMPLOYEES;
EMPLOYEES employees;
The two methods hold the same data, and have the same access
characteristics - iterating through the list is done in O(N) time,
random access is in O(constant) time, searching for the maximum salary
takes O(N) time, etc. Theya are logically equivalent.

Click to expand...

Click to expand...

I thought nobody was saying that!!!!

Click to expand...

No, the assertion that no one was making was that they were
interchangeable.

LOL!

I think that pretty clearly lays out why it's pointless to continue,
at least with you.

nroberts · Nov 2, 2011

On Wed, 2 Nov 2011 13:11:27 -0700 (PDT), jameskuyper

snip]

Well, yes - the point is, in C, one method is supported directly, and
the other way requires "writing custom functions to serve as binder
expressions". That's precisely the bias he's talking about. I can't
say its a very important bias, but it's a real one. It doesn't deserve
the amount of attention it's received so far, but that amount of
attention has been due almost entirely to the challenges you've made
against it.

Click to expand...

There is another bias. It is customary to define a struct with the
actual fields. Thus

typedef struct
{
char name[64];
int id;
float salary;

} EMPLOYEE;

The definition does not impact how the array is allocated. Thus

EMPLOYEE employees[10]; /*and */

EMPLOYEE *employees;
employees = malloc(10 * sizeof(*employees));

both work. However if we are doing struct of arrays we need different
definitions for the declared arrays case and for the allocated arrays
case. Thus

typedef struct
{
char name[10][64];
int id[10];
float salary[10];

} EMPLOYEES;

EMPLOYEES employees;

vs

typedef struct
{
char name[][64]; /* declaration probably wrong */
int *id;
float *salary;

} EMPLOYEES;

EMPLOYEES employees;

with the further inconvenience that every field must be malloced
separately. On the other hand we now have room for the number of
employees in the struct.

Again though, none of that has anything to do with syntax. In a
language with the abstractions of "array" and "structure" or "record"
that also provides the other features you are talking about such as
dynamic allocation, etc... you are going to have the exact same
issues. The logical structure of your program data directly effects
the complexity level of its algorithms. The wrong logical structure
for the wrong kind of operation is going to result in a very complex
algorithm.

Even if you implemented your program in direct machine code, if you
used the conceptual abstractions of "array" and "structure" the
operations you need to perform on your data is going to dictate what
contains what.

This is not a language issue. It's not a C bias, it's a logical one.
It's intrinsic to the definition of "structure" and "array". It's the
same in Pascal, Basic, and any other language that has these
abstractions. It cannot BE any other way.

You can organize your data in different manners, yes. When you have a
bunch of values that are related by a concept, and you want to have a
bunch of values of that concept, the simplest method of dealing with
that data in a language that HAS structures and arrays is to make an
array of structures. The concept is the structure, it has its value
elements, and it's held in an array so you can have many of them.
This is not the most convenient organization for all algorithms but
for most it will be regardless of the syntax you're using.

You can't just organize your data in an inversion of that and expect
things to work the same or as well. Again, this is not a syntatic
issue, it's an issue of equivalence...in that they are not. The
organization is completely different and thus the operations will be
different and some algorithms will be severely complicated by the
change while others will be simplified. This would be true though
whether your blocks are begun with '{' and ended with '}' or if
instead it was 'do' and 'done'.

Similarly you'd have the same issues in a language that only had
lists. Do you keep a list of employees that have names and ids, or do
you keep a list with name and id lists that have entries for each
employee? The choice made here will alter many, many thing and in
general the easiest method is very likely to be the former rather than
the later...be it in C or LISP.

Uncle Steve · Nov 2, 2011

On Nov 1, 6:35 pm, James Kuyper <[email protected]> wrote:

Click to expand...

On 11/01/2011 05:16 PM, nroberts wrote:

Click to expand...

On 11/01/2011 03:59 PM, nroberts wrote:
...

Click to expand...

and has nothing to do with syntax. Structures and arrays are in no
way interchangeable whether you are writing in C, C++, BASIC, or
Brainfuck.

Click to expand...

No one has suggested that they are interchangeable. Only that data which
can be stored as a single array of structures can also be stored as a
single structure by converting each member to an array.

Click to expand...

2+2 = 27?

Click to expand...

Unless you're making a completely pointless statement ...

Click to expand...

Richard Harter's point was precisely that it's significantly more
complicated iterating through the arrays in a structure than through an
array of structures.

Click to expand...

Because *C* has a bias in that direction. I was trying to understand
that assertion because it seems like nonsense to me, and now everyone
wants to pretend it was never made. That's fine. If you guys want to
make nonsensical assertions and then pretend you were making sense the
whole time it's no problem with me, I'll just remain unconvinced.

Click to expand...

We've got ten employees, each with name, payroll id, and salary.

we can represent the data like this

typedef struct
{
char name[64];
int id;
float salary;

} EMPLOYEE;

EMPLOYEE employees[10];
or like this

typedef struct
{
char name[10][64];
int id[10];
float salary[10];

} EMPLOYEES;

EMPLOYEES employees;

The two methods hold the same data, and have the same access
characteristics - iterating through the list is done in O(N) time,
random access is in O(constant) time, searching for the maximum salary
takes O(N) time, etc. Theya are logically equivalent.

Click to expand...

I thought nobody was saying that!!!!

The truth is that they are not at all logically equivalent. The
difference between the two is 100% logical.

The only
difference is the way the employees are laid out in memory.

Click to expand...

There could be no difference in how they're laid out in memory, the
difference is in their *logical* structure--completely the opposite of
what you're saying. I can write the statements to access a field in
English and the difference is still there:

Get the N'th employee from the employee's array and access its id
field.
Get the ids field from the employees structure and access its N'th
element.

No matter what syntax I use that is specific enough for talking to a
computer, I'm still going to have to access the elements in different
manners.

Talk of computer programming and you talk of "manners"? I must have
died and gone to hell.

Not in language X. In language X you're using the field() syntax.
You've essentially got an array of structures in language X no matter
what you might want and you're writing your generic mean function to
use binder expressions like: mean(field(employees, _1, salary)). You
can do this in C by the way though not at quite that high level
(you'll be writing custom functions to serve as binder expressions).

The cowardly lion might have a thing or two to say about that.

Which is all "language X" is. You could write it in C as a macro and
then still access the underlying memory. It's all going to depend
upon the needs of your program.

All you're doing here is arguing for abstractions. You're not showing
that C's syntax forces you into one form of data expression or
another, you're saying that it would be nice to be able to manipulate
your data at a higher level than structures and arrays. There's
nothing stopping you from doing that in C and there's nothing that
makes it more or less difficult except perhaps a lack of utility
functions in the standard library, which is not a syntax issue.

What isn't a syntax issue?

It is true that there are many languages at higher levels than C that
support the kind of expressions you're talking about. C++ in fact
provides much of the functionality you want here, or at least better
facilities to create it. But when you get to this point you're no
longer talking about structures and arrays, you're talking about even
higher level concepts. I do agree that abstractions can be a
wonderful thing, but this is clearly not a C specific issue.

Duh. We're talking about basic computer science. Your 'victorian'
values blind you to the truth of this reality. Oh, shit. Did I say
that on the record?

Regards,

Uncle Steve

James Kuyper · Nov 3, 2011

On 11/02/2011 06:11 PM, Richard Harter wrote:
.....

typedef struct
{
char name[][64]; /* declaration probably wrong */

That would be acceptable if it were the last member of the struct, in
which case it would declare a flexible array member. But if you were to
do that, then the struct object itself would have to be dynamically
allocated. To declare name as a pointer to an array of 64 char, you
would write

char (*name)[64];

int *id;
float *salary;
} EMPLOYEES;

EMPLOYEES employees;

with the further inconvenience that every field must be malloced
separately. On the other hand we now have room for the number of
employees in the struct.

In general, any struct containing a pointer to a dynamically allocated
block of memory of variable length should contain a count that indicates
how long that block is.

Kaz Kylheku · Nov 3, 2011

Not having scope tied destructors hurts.

Not having scope-tied exit code ("unwind protect") is what hurts!

This hurts even when you have destructors because destructors are
clumsy. They let you run code, but in a remote context somewhere with
its own scope which has no visibility into the scope which is
terminating (i.e. the object that actually needs the clean up).

I would gladly trade C++ destructors for unwind protect.

Uncle Steve · Nov 3, 2011

Not having scope-tied exit code ("unwind protect") is what hurts!

This hurts even when you have destructors because destructors are
clumsy. They let you run code, but in a remote context somewhere with
its own scope which has no visibility into the scope which is
terminating (i.e. the object that actually needs the clean up).

I would gladly trade C++ destructors for unwind protect.

Sounds like a pipe dream. HTF are you going to unwind protect Kylie
Minogue? Let's be realistic about this, eh?

Regards,

Uncle Steve

Ian Collins · Nov 3, 2011

Not having scope-tied exit code ("unwind protect") is what hurts!

This hurts even when you have destructors because destructors are
clumsy. They let you run code, but in a remote context somewhere with
its own scope which has no visibility into the scope which is
terminating (i.e. the object that actually needs the clean up).

I would gladly trade C++ destructors for unwind protect.

How would unwind-protect work in cases where you want an object's
destruction to cause the release of a managed resource (a smart pointer
for example)?

Ben Bacarisse · Nov 3, 2011

The big one is that if we have an array of structs
we can readily form a pointer to a particular struct whereas with a
struct of arrays we can't. With an array of structs we can do things
like

func(a) /* pass a pointer to a struct to a function */
b = a; /* Make a copy of a struct */

With a struct of arrays we can't. The issue here is that our data in
effect is a two dimensional array with one of the dimensions being the
fields. In some languages slicing is intrinsic to the language; in C
it is not.

The syntax favors one choice over another; i.e., it is
easier to do certain common things (examples above) with one choice
than the other.

Click to expand...

Part of the debate may be caused by your talking of the *syntax*
favouring one rather than the other. Whilst it is true that there is no
syntax for the "sliced" struct access, nor for a pointer to such a
"sliced" struct, the bias (if that is what it is) goes much deeper than
C's syntax.

The purpose of transposing the data structure is to get a different
layout in memory, so, inevitably, the sliced structs in the "struct of
arrays" layout are not objects in the C sense. All the numerous rules
about object representations, sizes, effective types, pointers to
objects and so on can't apply to these. In other words, the syntax of
the language merely represents its fundamental, low-level, view of
objects, types and values.

Of course, these slices need not be objects -- an enhanced C could
introduce a new kind of pointer and a new kind of "thing" to which these
point, but, again, that's a lot more than removing a bias in the syntax.

I accept that it's possible to view all these changes as simply the
consequence of having new syntax that supports slices or transposed data
structures, but, because of C's low level nature and its history of
having very informally defined semantics, most "C people" (I count
myself as one of these) are used to thinking of the underlying concepts
as fundamental, and of the syntax as being a natural (and rather trivial)
consequence of these.

By the way, in such a language one would want to avoid having to repeat
the size (as has been done in the various examples) -- maybe by just
having a "transpose" keyword on a type declaration or an object
definition. That way, such objects could be naturally malloc'd.

<snip>

Malcolm McLean · Nov 3, 2011

The purpose of transposing the data structure is to get a different
layout in memory, so, inevitably, the sliced structs in the "struct of
arrays" layout are not objects in the C sense. All the numerous rules
about object representations, sizes, effective types, pointers to
objects and so on can't apply to these. In other words, the syntax of
the language merely represents its fundamental, low-level, view of
objects, types and values.

Of course, these slices need not be objects -- an enhanced C could
introduce a new kind of pointer and a new kind of "thing" to which these
point, but, again, that's a lot more than removing a bias in the syntax.

If I have a struct employee, the payroll id and the salary might not
be physically stored on the same memory chip. There's a layer of
electronic wizardry between the memory substrate and the program which
maps maybe several chips into the same address space.
Similarly, at the program level, I need to tie the payroll id to the
salary. The easiest way of doing this is to have the salary directly
follow the payroll id in memory. Then we can pass about raw addresses,
and the data is tied that way. To do that in language X, we'd need
something like an "array base" which gave the base addresses for each
element, then we'd have to specify that records can only be accessed
via the array base (So if in C I pass in a struct employee * to
transfer the salary to the bank account referenced by the payroll id,
in language X I pass in an array base *, plus an index, though of
course the fact that I'm passing two values could be hidden by the
syntax).

Ben Bacarisse · Nov 3, 2011

Malcolm McLean said:
If I have a struct employee, the payroll id and the salary might not
be physically stored on the same memory chip. There's a layer of
electronic wizardry between the memory substrate and the program which
maps maybe several chips into the same address space.
Similarly, at the program level, I need to tie the payroll id to the
salary. The easiest way of doing this is to have the salary directly
follow the payroll id in memory. Then we can pass about raw addresses,
and the data is tied that way. To do that in language X, we'd need
something like an "array base" which gave the base addresses for each
element, then we'd have to specify that records can only be accessed
via the array base (So if in C I pass in a struct employee * to
transfer the salary to the bank account referenced by the payroll id,
in language X I pass in an array base *, plus an index, though of
course the fact that I'm passing two values could be hidden by the
syntax).

I can't tell if you agree with me or not.

I am saying that C's view of objects is defined in terms of it's view of
addresses and address arithmetic -- it's all self contained and
independent of the hardware. The proposal, to have simplex syntax to
declare and access sliced structures as easily as ordinary ones, goes
against that and is, in my opinion, better described as more than some
missing syntax.

nroberts · Nov 3, 2011

(e-mail address removed) (Richard Harter) writes:

The purpose of transposing the data structure is to get a different
layout in memory,

I disagree. The purpose of transposing the data structure is to get a
different *logical view* of the data, which alters the algorithms
you'll write to access or change that data.

They key here isn't that C has this or that syntax, or that it's
abstractions are close to the machine, but that the data is better
organized in one manner or another. Arrays are best for containing N
number of elements of the same type (which is also an abstraction
provided by C). Structs are best for containing N number of elements
of mixed type with different names and meanings. As I showed earlier,
the hierarchy of the data within the data structure is what may bias
us to choosing one or the other as you can show the same question come
up when dealing with languages that don't have structures and arrays.

A higher level of abstraction provides more freedom at the
implementation level than structs and arrays, but that's not the real
key issue. Furthermore, C is expressive enough to allow you to create
higher abstractions when you need them and are willing to pay the
cost. In my opinion it's not quite as expressive as C++, but it is
expressive enough.

I don't see this as a language issue, a syntax issue, or even a memory
layout issue which is only coincidentally similar to the logical
layout that IS the issue. There is certainly reason for this
coincidence in the design of C and many other languages, but from the
perspective of this problem the memory layout could be anything and we
wouldn't care.

Ben Bacarisse · Nov 4, 2011

nroberts said:
I disagree. The purpose of transposing the data structure is to get a
different *logical view* of the data, which alters the algorithms
you'll write to access or change that data.

I meant the purpose of doing so /in this discussion/ was to get a
different layout in memory. The fact that the accesses will then be all
messed up is what Richard Harter describes as a bias in the syntax, and
I prefer to think of as something more fundamental about C's view of
objects and types.

If you do it to get a different *logical view* of the data, then you are
quite content with C as it standard, because that is exactly what you
do get.

They key here isn't that C has this or that syntax, or that it's
abstractions are close to the machine, but that the data is better
organized in one manner or another.

Yes, but there is more than one meaning of better. I've seen a Lisp
system where the cons cells are implemented like this:

int car[1000], int cdr[1000];

rather than the more obvious

struct cell { int car, cdr; } cell[1000];

because such an organisation performed better. The logical relationship
is intended to be the same, but the programmer has been forced to break
it for performance reasons. (The two arrays were not put into a struct
because that would not buy you anything with C as it stands).

I imagine that Richard would prefer C to have something like this:

transposed struct cell { int car, cdr; } cell[1000];

so that cell[42] is still "an object", but cell[42].car is followed by
cell[43].car in memory.

This would enable the programmer to express two kinds of "better" -- the
way the data is logically related, and the way it is laid out in memory.

[This is an interesting discussion, but I may not be able to post for
several days. Oh well...]

<snip>

nroberts · Nov 4, 2011

I meant the purpose of doing so /in this discussion/ was to get a
different layout in memory. The fact that the accesses will then be all
messed up is what Richard Harter describes as a bias in the syntax, and
I prefer to think of as something more fundamental about C's view of
objects and types.

Right, but not specifically C...it's fundamental to the nature of
computing. One could argue for a different set of data structures be
made available, higher abstractions, but I don't see that it's
necessary for the purposes that C was trying to solve in its design.
I think C nicely provides a minimal language for software development;
something that's both small and relatively easy to use. It's not my
language of choice, but it's good for what it is.

They key here isn't that C has this or that syntax, or that it's
abstractions are close to the machine, but that the data is better
organized in one manner or another.

Click to expand...

Yes, but there is more than one meaning of better. I've seen a Lisp
system where the cons cells are implemented like this:

int car[1000], int cdr[1000];

rather than the more obvious

struct cell { int car, cdr; } cell[1000];

because such an organisation performed better.

Right. This is why I said generally, or at least did through most of
the discussion and just slipped up here. Generally speaking, stronger
cohesion equals a better program. Sometimes it pays to organize the
code in an inconvenient manner in order to get better performance out
of certain operations due to being able to use faster algorithms for
them. What you have to do with the data is as important as what it
represents.

The logical relationship
is intended to be the same, but the programmer has been forced to break
it for performance reasons. (The two arrays were not put into a struct
because that would not buy you anything with C as it stands).

The logical relationship may be intended to be the same, but it
isn't. The relationship is much less cohesive since in the former you
enforce the combination of car/cdr by putting them in the same bucket,
while in the latter that relationship is soft and enforced through
manipulation...which requires a lot more work and care from the
programmer. It works for this problem, that's not something I'm
denying, and it represents the same data relationships, also something
I'm not denying, but the way you use it is fundamentally different not
because of anything unique to C, but because of what buckets you're
putting things in.

I imagine that Richard would prefer C to have something like this:

transposed struct cell { int car, cdr; } cell[1000];

so that cell[42] is still "an object", but cell[42].car is followed by
cell[43].car in memory.

This would enable the programmer to express two kinds of "better" -- the
way the data is logically related, and the way it is laid out in memory.

Correct. This is "language X". It's a higher abstraction than
structs and arrays provide. Abstractions are great but you pay a cost
for them. All the operations you do by hand in by using the raw
abstractions has to be done by this "language X" abstraction. This is
fundamental to the logical construction of computers where the only
"data type" we have is a gigantic array that we can chunkify to
various lengths. It is true that you won't get that dot syntax in C,
but you can implement this abstraction with macros or functions on the
fundamental data types that C provides.

If there's a bias here in C it's in what abstractions it provides to
the programmer. In this it shares a great deal with other languages.
Even in languages that have different base abstractions, of which
there are not a lot, you'll have similar issues and apparent "biases".

I suppose that in one way I was wrong in saying this has nothing to do
with memory. The fact that computer memory looks like a giant array
means that all the abstractions we use have to be translatable to that
representation and that the further from it you get, the more
operations have to be performed. This is WHY structs and arrays are
what C has and not something else like "language X" features. They're
not at all far removed from the fundamental data structures of
computing devices. But like I showed with the list example, you don't
need these particular data structures to run into a bias for
cohesion. In fact, I don't think you even need computers for this
issue to exist since we could change the discussion to being about
REAL buckets, files in a drawer, etc...and it would still come up
along with all the tradeoffs everyone's bringing up.

Malcolm McLean · Nov 4, 2011

But like I showed with the list example, you don't
need these particular data structures to run into a bias for
cohesion.

The question is what's more important, for the record to be cohesive
or for the list of fields to be cohesive?

If you say "the record must be cohesive" then that makes it very
expensive to add or delete a field. If you say "the fields must be
cohesive" that makes it rather more expensive to add new records but
the difference isn't so great. When processing data, operations like
"get the mean salary" become easier with the fields cohesive,
operations like "copy the record" become easier if the record is
cohesive.

nroberts · Nov 4, 2011

The question is what's more important, for the record to be cohesive
or for the list of fields to be cohesive?

If you say "the record must be cohesive" then that makes it very
expensive to add or delete a field. If you say "the fields must be
cohesive" that makes it rather more expensive to add new records but
the difference isn't so great. When processing data, operations like
"get the mean salary" become easier with the fields cohesive,
operations like "copy the record" become easier if the record is
cohesive.

Assuming I agreed with all that, how does it support your assertion
that C has a bias toward record cohesion? You still haven't explained
why you think this is a language issue.

Ben Bacarisse · Nov 9, 2011

nroberts said:
Right, but not specifically C...it's fundamental to the nature of
computing.

Forgive me for cutting you off here, but the fact that you say this
makes it clear that I've not explained myself at all well. Until we
clear this up, it's just a waste of time discussing details.

I am saying that the organisation of data in memory, and the
organisation of data into logical structures are orthogonal, but many
languages (like C) link the two together.

In high-level languages, arrays are used to represent data sequences --
there need be no implication that one array element is next to its
successor in memory. Similarly, structures are used to group data into
units that can be manipulated as one. C insists that such a unit be
contiguous in memory, but this does not follow inevitably from the
desire to group two coordinates into a point.

A weak example is Algol 68 arrays. Given a 2D array, any row or column
of it can be passed to a procedure that wants a 1D array. Algol 68 does
not afford the same flexibility to arrays of structures or structures of
arrays (and I don't know any language that does), but a language *could*
do that, and it would not be without value to do so. C can't do it
without completely overhauling its notion of an object.

It's also possible that I am simply not understanding your point. You
wrote "it's fundamental to the nature of computing" just after I talked
about the layout of data in memory, so if the "it" refers to the layout
of data in memory, then I suspect we have a profound disagreement.

One could argue for a different set of data structures be
made available, higher abstractions, but I don't see that it's
necessary for the purposes that C was trying to solve in its design.

I am saying that C can't provide a higher abstraction. It
inevitable links the logical structure with the memory layout.

They key here isn't that C has this or that syntax, or that it's
abstractions are close to the machine, but that the data is better
organized in one manner or another.

Click to expand...

Yes, but there is more than one meaning of better. Â I've seen a Lisp
system where the cons cells are implemented like this:

Â int car[1000], int cdr[1000];

rather than the more obvious

Â struct cell { int car, cdr; } cell[1000];

because such an organisation performed better.

Click to expand...

Right. This is why I said generally, or at least did through most of
the discussion and just slipped up here. Generally speaking, stronger
cohesion equals a better program. Sometimes it pays to organize the
code in an inconvenient manner in order to get better performance out
of certain operations due to being able to use faster algorithms for
them. What you have to do with the data is as important as what it
represents.

I am pointing out that the choice here is unnecessary. A language could
permit this structure to be defined in the "cohesive" way and yet be
laid out in the most efficient way. C forces you to chose between two
logically different organisations if you need two physically different
layouts.

The logical relationship may be intended to be the same, but it
isn't.

Yes, we agree. I said the relationship is not the same -- the
programmer is forced to break it.

I imagine that Richard would prefer C to have something like this:

Â transposed struct cell { int car, cdr; } cell[1000];

so that cell[42] is still "an object", but cell[42].car is followed by
cell[43].car in memory.

This would enable the programmer to express two kinds of "better" -- the
way the data is logically related, and the way it is laid out in memory.

Click to expand...

Correct. This is "language X". It's a higher abstraction than
structs and arrays provide. Abstractions are great but you pay a cost
for them. All the operations you do by hand in by using the raw
abstractions has to be done by this "language X" abstraction. This is
fundamental to the logical construction of computers where the only
"data type" we have is a gigantic array that we can chunkify to
various lengths. It is true that you won't get that dot syntax in C,
but you can implement this abstraction with macros or functions on the
fundamental data types that C provides.

Some operations will be costly and some will be cheap. There is no
unavoidable cost to what I am saying. If efficiency forces an illogical
data design, then that, too, is a cost.

I think I should point out that I doubt this situation occurs
sufficiently often for it to be a concern to language designers. I am
interested in the example from a purely theoretical point of view --
what would a language look like that could use either layout at will?

If there's a bias here in C it's in what abstractions it provides to
the programmer. In this it shares a great deal with other languages.
Even in languages that have different base abstractions, of which
there are not a lot, you'll have similar issues and apparent "biases".

I don't think that is in dispute. I simply suggested that the bias is
not simply in the syntax of C -- it's rooted in C's notion of an object.

I suppose that in one way I was wrong in saying this has nothing to do
with memory. The fact that computer memory looks like a giant array
means that all the abstractions we use have to be translatable to that
representation and that the further from it you get, the more
operations have to be performed. This is WHY structs and arrays are
what C has and not something else like "language X" features.

I don't see that at all. I suggested "language X" specifically because
of the inefficiency of some C abstractions when mapped to some
hypothetical machine's memory and cache layout. It's the behaviour of
real hardware that means the "language X" has value. [This is just an
unfounded assumption, of course. It is possible that there is no data
structure, and no algorithm, running on any hardware that would benefit
from the transposed memory layout I am suggesting. But that is another
argument that no one has taken up one way or the other.]

<snip>

Malcolm McLean · Nov 9, 2011

I don't see that at all. I suggested "language X" specifically because
of the inefficiency of some C abstractions when mapped to some
hypothetical machine's memory and cache layout. It's the behaviour of
real hardware that means the "language X" has value. [This is just an
unfounded assumption, of course. It is possible that there is no data
structure, and no algorithm, running on any hardware that would benefit
from the transposed memory layout I am suggesting. But that is another
argument that no one has taken up one way or the other.]

Getting average salary is likely to be more efficient if the salaries
are contiguous in memory.

However the example is not a good one. Even if you employ all the
people in the world, you can calculate average salary probably faster
than you can press the key to launch the payroll program.

Ben Bacarisse · Nov 9, 2011

Malcolm McLean said:
I don't see that at all. Â I suggested "language X" specifically because
of the inefficiency of some C abstractions when mapped to some
hypothetical machine's memory and cache layout. Â It's the behaviour of
real hardware that means the "language X" has value. Â [This is just an
unfounded assumption, of course. Â It is possible that there is no data
structure, and no algorithm, running on any hardware that would benefit
from the transposed memory layout I am suggesting. Â But that is another
argument that no one has taken up one way or the other.]

Click to expand...

Getting average salary is likely to be more efficient if the salaries
are contiguous in memory.

However the example is not a good one.

Agreed. Note that my example was of a Lisp heap made up of cons cells.

nroberts · Nov 10, 2011

Forgive me for cutting you off here, but the fact that you say this
makes it clear that I've not explained myself at all well. Until we
clear this up, it's just a waste of time discussing details.

Perhaps you came in late and are unfamiliar with the original
statements and just found yourself on the wrong side of a discussion.
You weren't the one to originally claim that C syntax creates these
biases but you ended up arguing with me so I assumed you were
supporting it.

I am saying that the organisation of data in memory, and the
organisation of data into logical structures are orthogonal, but many
languages (like C) link the two together.

Because in computers, memory is one large array like structure.
That's the fundamental abstraction that our hardware attempts to
create.

In high-level languages, arrays are used to represent data sequences --
there need be no implication that one array element is next to its
successor in memory. Similarly, structures are used to group data into
units that can be manipulated as one.

Which is why quite a while back I argued that what people claiming
that C's syntax causes a bias like this are actually arguing for is
abstractions. The further away you get from the fundamental
abstraction that the hardware provides, a long array, the more
operations have to be conducted to project the "higher" abstraction
onto the "lower" one. This is why languages like "language X" cost
more in terms of performance. C is an attempt to provide just enough
abstraction to be useful across all computers, which means it is as
close to the underlying abstraction as necessary.

C insists that such a unit be
contiguous in memory, but this does not follow inevitably from the
desire to group two coordinates into a point.

Actually, C doesn't insist on this at all, it's just the simplest
mapping.

I am saying that C can't provide a higher abstraction. It
inevitable links the logical structure with the memory layout.

This is an incorrect view of higher level languages. Since computers
are what they are, every language inevitably links logical structure
to memory layout. It has to. It can provide an interface that does
not, but that interface comes at a cost and underneath you can be sure
that it projects that interface into the hardware's abstraction in a
very determined manner, just like C does.

I am pointing out that the choice here is unnecessary. A language could
permit this structure to be defined in the "cohesive" way and yet be
laid out in the most efficient way. C forces you to chose between two
logically different organisations if you need two physically different
layouts.

Because C doesn't provide higher abstractions. The abstractions you
are looking for CAN be created in C though and the lower level
abstractions that it provides are a nice way to describe the mapping.
In fact, most languages that we can think of originally start by being
written in a language like C, if they ever even go past that point and
become self-hosting.

On the other hand, what you seem to want, and I agree that it would be
quite difficult to provide in C (or ANY language) is an abstraction
that can, through the way you use it, interpret what the best layout
would be. This is some sort of super-optimizer and indeed would
require a lot of research to provide. The easiest method would
probably be to invent a new language and provide a compiler that did
this; you'd probably write it in C or C++. On the other hand, for
most of the problems you'd attempt to solve with this abstraction it
would probably be best to let the programmer decide what to turn on
and when. This would, at least, be a lot simpler to implement and
would solve most real-world issues. This is probably why I can't
think of a single language that implements anything like what you seem
to expect from C.

Something similar that perhaps actually does provide what you'd
expect, would be a state based template pattern that switches
implementation based on statistics of its use to date. This would be
both more powerful and less powerful than a new language with a really
smart optimizer...depending on your needs. This, and anything more
determinable, can be made in C without much difficulty.

Dennis Ritchie Has Died	28	Oct 13, 2011
Query from Dennis Ritchie	6	Mar 29, 2011
[OT] Dennis Ritchie dies at 70	37	Oct 13, 2011
Errata for The C Programming Language, Second Edition, by Brian Kernighanand Dennis Ritchie	4	May 16, 2009
simple question regarding 5.5 of Ritchie & Kernighan	9	Jun 19, 2005
Solutions for the Kernighan and Ritchie	13	Oct 2, 2008
I NEED AN EXPERT	3	Mar 3, 2024
I'm tempted to quit out of frustration	1	Aug 13, 2023

Dennis Ritchie -- An Appreciation

Alan Curry

nroberts

nroberts

Uncle Steve

James Kuyper

Kaz Kylheku

Uncle Steve

Ian Collins

Ben Bacarisse

Malcolm McLean

Ben Bacarisse

nroberts

Ben Bacarisse

nroberts

Malcolm McLean

nroberts

Ben Bacarisse

Malcolm McLean

Ben Bacarisse

nroberts

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads