What is Expressiveness in a Computer Language


Joachim Durchholz

Marshall said:
Hmmm. I have heard that argument before and I'm conflicted.

I'd want several things.

A way for me to indicate what assertions must be proven statically.
Highlighting (be it compiler messages or flashing colors in an IDE) that
marks assertions that *will* break.
And highlighting for assertions that *may* break.
In the language, a (possibly) simplistic inference engine definition
that gives me minimum guarantees about the things that it will be able
to prove; if something is out of the reach of the engine, a
straightforward way to add intermediate assertions until the inference
succeeds.

(Plus diagnostics that tell me where the actual error may be, whether
it's a bug in the code or an omission in the assertions. That's probably
the hardest part of it all.)

Regards,
Jo
 

Chris Smith

Marshall said:
Hmmm, well, I cannot agree. You've defined away the pointers
but then slipped them back in again by assumption ("objects
of these types have identity".)

First let me say that the terminology is somewhat problematic.
For the specific issue being discussed here, pointers, identity,
and objects are all the same concept. (I agree that "pointer"
connotes a low-level construct, however.)

Unless I'm missing your point, I disagree with your disagreement.
Mutability only makes sense because of object identity (in the generic
sense; no OO going on here). Without object identities, mutability is
useless. What's the use of changing something if you're not sure you'll
ever be able to find it again?

You may limit the scope of object identity arbitrarily, even to the
point that aliasing is impossible (though with lexical closure, that
gets even more limiting than it may first appear)... but you're just
trading off power for simplicity, and the really interesting uses of
mutations are those that allow access to specific objects from any
number of different bits of code, on a program-wide or at least module-wide
scope. Most mediocre programmers could replace assignment with
recursion if that assignment is limited to local variables of a single
subroutine. I don't necessarily agree that the result will be a better
program despite others' conviction on the matter; however, the
difference certainly isn't worth complicating the language with mutation
unless you're willing to allow the interesting uses of mutation as well.
Mutability by itself does not imply identity. I agree that mutability
plus identity implies aliasing problems, however.

We might have a terminological issue, then. I'd tend to say that
mutability definitely does imply identity, but identity doesn't imply
aliasing. Same difference.
 

Darren New

Chris said:
Unless I'm missing your point, I disagree with your disagreement.
Mutability only makes sense because of object identity (in the generic
sense; no OO going on here).

Depends what you mean by "object".

int x = 6; int y = 5; x = y;

I'd say x was mutable, with no "identity" problems involved?

Why is it problematic that variables have identity and are mutable?
Certainly I can later "find" whatever value I put into x.
 

Joe Marshall

Marshall said:
Again, I disagree: it is possible to have mutability without
pointers/identity/objects.

I think you are wrong, but before I make a complete ass out of myself,
I have to ask what you mean by `mutability'. (And
pointers/identity/objects, for that matter.)

Alan Bawden discusses the phenomenon of `state' in his Ph.D.
dissertation "Implementing Distributed Systems Using Linear Naming".
MIT AI Lab Technical Report AITR-1627, March 1993. He makes a
persuasive argument that `state' is associated with cycles in naming.
 

David Hopwood

Marshall said:
I look forward to reading this. I read a paper on JML a while ago and
found it quite interesting.

... or improve their performance ..
) without changing programs.

Hmmm. I have heard that argument before and I'm conflicted.

I can think of more reasons than just runtime safety for which I'd
want proofs. Termination for example, in highly critical code;
not something for which a runtime check will suffice.

It is true that some properties cannot be verified directly by a runtime check,
but that does not mean that runtime checks are not indirectly useful in verifying
them.

For example, we can check at runtime that a loop variant is strictly decreasing
with each iteration. Then, given that each iteration of the loop body terminates,
it is guaranteed that the loop terminates, *either* because the runtime check
fails, or because the variant goes to zero.
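
For illustration, a minimal Java sketch of such a runtime variant check
(a sketch only, assuming non-negative inputs):

static int gcd(int a, int b) {
    // Loop variant: b. It must stay non-negative and strictly decrease
    // on every iteration; otherwise the runtime check fails.
    long previous = Long.MAX_VALUE;
    while (b != 0) {
        if (b < 0 || b >= previous)
            throw new AssertionError("loop variant did not strictly decrease");
        previous = b;
        int r = a % b;   // 0 <= r < b for non-negative a, so the variant shrinks
        a = b;
        b = r;
    }
    return a;            // either the check above fired, or the variant reached 0
}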

In general, we can verify significantly more program properties using a
combination of runtime checks and static proof, than we can using static proof
alone. That may seem like quite an obvious statement, but the consequence is
that any particular property is, in general, not verified purely statically or
purely at runtime.

I am not opposed to being able to annotate an assertion to say that it should
be statically provable and that a runtime check should not be used. However,

- such annotations should be very lightweight and visually undistracting,
relative to the assertion itself;

- a programmer should not interpret such an annotation on a particular assertion
to mean that its static validity is not reliant on runtime checks elsewhere;

- if the class of assertions that are statically provable changes, then a
tool should be provided which can *automatically* add or remove these
annotations (with programmer approval when they are removed).


I'd like to make a couple more comments about when it is sufficient to detect
errors and when it is necessary to prevent them:

- If a language supports transactions, then this increases the proportion
of cases in which it is sufficient to detect errors in imperative code.
When state changes are encapsulated in a transaction, it is much easier
to recover if an error is detected, because invariants that were true of
objects used by the transaction when it started will be automatically
reestablished, as sketched after this list. (Purely functional code does
not need this.)

- Almost all safety-critical systems have a recovery or safe shutdown
behaviour which should be triggered when an error is detected in the
rest of the program. The part of the program that implements this behaviour
definitely needs to be statically correct, but it is usually only a small
amount of code.
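
A sketch of the transaction point above, in Java with an assumed,
hypothetical transaction API:

// None of these names come from a real library; they are assumptions
// made only to illustrate the recovery pattern.
interface Transaction { void commit(); void rollback(); }
interface Store { Transaction begin(); }
interface Account { void withdraw(int amount); void deposit(int amount); int balance(); }

void transfer(Store store, Account from, Account to, int amount) {
    Transaction tx = store.begin();        // all state changes below are inside tx
    try {
        from.withdraw(amount);
        to.deposit(amount);
        if (from.balance() < 0)            // error detected at runtime
            throw new IllegalStateException("overdraft");
        tx.commit();
    } catch (RuntimeException e) {
        tx.rollback();                     // recovery: the invariants that held at
                                           // begin() are reestablished automatically
    }
}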

Safety-critical systems that must either prevent errors or continue
functioning in their presence (aircraft control systems, for example) are
in a separate category that demands *much* greater verification effort. Even
for these systems, though, it is still useful to detect errors in cases
where they cannot be prevented. When multiple independent implementations
of a subsystem are used to check each other, this error detection can be
used as an input to the decision of which implementation is failing and
which should take over.
 

Marshall

Joachim said:
Well, the implication certainly holds from identity to mutability.
The only definition of identity that I found to hold up for all kinds of
references (pointers, shared-memory identifiers, URLs etc.) is this:

Two pieces of data are identical if and only if:
a) they are equal
b) they stay equal after applying an arbitrary operation to one of them.

This means that for immutable objects, there's no observable difference
between equality and identity (which I think is just fine).

Agreed on all counts.

For the implication from mutability to identity, I'm not sure whether
talking about mutation still makes sense without some kind of identity.
For example, you need to establish that the object after the mutation is
still "the same" in some sense, and this "the same" concept is exactly
identity.

Unless we have some specific mechanism to make two named variables have
the same identity (distinct from having the same value), then there is
no aliasing. Pointers or references are one such mechanism; lexical
closures over variables are another. (I don't know of any others.)
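
For example (a small Java sketch, not part of the original post),
references are exactly such a mechanism:

int[] cell = { 0 };    // a one-element array standing in for a mutable cell
int[] alias = cell;    // a second name bound to the same identity
alias[0] = 42;         // an update through one name ...
// ... is visible through the other: cell[0] is now 42. With plain local
// ints (as in the i-and-j example later in the thread), this cannot happen.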

Then we're agreeing about the most important point anyway.
Yes.



I'm sceptical.
Any examples?

See next post.


Marshall
 

David Hopwood

Chris said:
This seems to me a bit misleading. Perhaps someone will explain why I
should stop thinking this way; but currently I classify statements like
this in the "well, sort of" slot of my mind. If functional programming
were really just compilable postconditions, then functional programmers
would be able to skip a good bit of stuff that they really can't. For
example, time and space complexity of code is still entirely relevant
for functional programming. I can't simply write:

(define fib
  (lambda (x)
    (if (< x 2)
        1
        (+ (fib (- x 1)) (fib (- x 2))))))

and expect the compiler to create an efficient algorithm for me.

This is true, but note that postconditions also need to be efficient
if we are going to execute them.

That is, the difference you've pointed out is not a difference between
executable postconditions and functional programs. Both the inefficient
functional definition of 'fib' and an efficient one are executable
postconditions. In order to prove that the efficient implementation is
as correct as the inefficient one, we need to prove that, treated as
postconditions, the former implies the latter.

(In this case a single deterministic result is required, so the former
will be equivalent to the latter.)
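
For instance (an illustrative Java sketch, not from the post): proving
an efficient iterative fib correct amounts to showing that it satisfies
the same postcondition as the naive recursive definition quoted above.

// Sketch only: fib(x) = 1 for x < 2, else fib(x - 1) + fib(x - 2),
// matching the recursive Scheme definition above.
static long fib(int x) {
    long a = 1, b = 1;             // invariant: a = fib(i), b = fib(i + 1)
    for (int i = 0; i < x; i++) {
        long next = a + b;
        a = b;
        b = next;
    }
    return a;                      // on exit, a = fib(x)
}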
 

Marshall

Joe said:
I think you are wrong, but before I make a complete ass out of myself,
I have to ask what you mean by `mutability'. (And
pointers/identity/objects, for that matter.)

Responding to requests for examples from Joachim, Joe, and Chris....

The very simple example is the one Darren New already mentioned.

Consider the following Java fragment:

void foo() {
    int i = 0;
    int j = 0;

    // put any code here you want

    j = 1;
    i = 2;
    // check value of j here. It is still 1, no matter what you filled in above.
    // The assignment to i cannot be made to affect the value of j.
}


Those two local primitive variables cannot be made to have the same
identity. But you can update them, so this is an example of mutability
without the possibility of identity.

Earlier I also mentioned SQL tables as an example, although SQL
supports *explicit* aliasing via views.

Alan Bawden discusses the phenomenon of `state' in his Ph.D.
dissertation "Implementing Distributed Systems Using Linear Naming".
MIT AI Lab Technical Report AITR-1627, March 1993. He makes a
persuasive argument that `state' is associated with cycles in naming.

I would like to read that, but my brain generally runs out of gas at
about 21 pages, so it's about an order of magnitude bigger than I can
currently handle. :-( As to "cycles in naming", that's certainly an
issue. But is it a requirement for state? Back to Java locals, it seems
to me they meet the standard definition of state, despite the lack of
cycles.

As to pointers/references, I earlier mentioned the existence of the
reference/dereference operations as being definitional. Note that
one can go to some lengths to obscure them, but they're still there.
For example, Java has the reference and dereference operators;
Java's "." operator is actually C's "->" operator.

I am not so bold/foolish as to attempt a definition of "object" however.
:)


Marshall
 

Joe Marshall

Marshall said:
Consider the following Java fragment:

void foo() {
    int i = 0;
    int j = 0;

    // put any code here you want

    j = 1;
    i = 2;
    // check value of j here. It is still 1, no matter what you filled in above.
    // The assignment to i cannot be made to affect the value of j.
}

True, but you have hidden the pointers. Semantically, the identifiers
i and j refer not to integers but to locations that hold integers. The
assignment modifies the location.
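
One way to picture that (an illustrative sketch with the store made
explicit; the names here are assumptions, not from the post):

// The store maps locations to values; for brevity the variable names
// double as the locations here.
java.util.Map<String, Integer> store = new java.util.HashMap<>();
store.put("i", 0);    // int i = 0;
store.put("j", 0);    // int j = 0;
store.put("j", 1);    // j = 1;  -- updates the location named "j"
store.put("i", 2);    // i = 2;  -- a different location; "j" still maps to 1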
Those two local primitive variables cannot be made to have the same
identity. But you can update them, so this is an example of mutability
without the possibility of identity.

The identity is temporal: You use the same variable name at two
different times. Do you intend for the second `i' to mean the same
variable as the first `i'?
 

Chris Smith

Darren New said:
Depends what you mean by "object".

int x = 6; int y = 5; x = y;

I'd say x was mutable, with no "identity" problems involved?

The variable x definitely has identity that's independent of its value.
Some might call that a problem in and of itself, as it complicates the
formal model of the language and makes it difficult to predict what
result will be produced by normal order evaluation.

On the other hand, this thread seems to be using "identity" to mean
"identity with potential for aliasing", in which case it is vacuously
true that eliminating identity also prevents the problems that arise
from aliasing. It is true, and I agree on this with Marshall, that
eliminating the potential for aliasing solves a lot of problems with
checking invariants. I also see, though, that the majority (so far, I'd
say all) of the potential uses for which it's worth introducing mutation
into an otherwise mutation-free language allow the possibility of
aliasing, which sorta makes me wonder whether this problem is worth
solving. I'd like to see an example of code that would be harder to
write without mutation, but which can obey any restriction that's
sufficient to prevent aliasing.
Why is it problematic that variables have identity and are mutable?
Certainly I can later "find" whatever value I put into x.

I simply found the language confusing. I said it would be nonsensical
for a language to have mutation without identity.
 

Chris Smith

David Hopwood said:
This is true, but note that postconditions also need to be efficient
if we are going to execute them.

If checked by execution, yes. In which case, I am trying to get my head
around how it's any more true to say that functional languages are
compilable postconditions than to say the same of imperative languages.
In both cases, some statement is asserted through a description of a
means of computing it. There may be a distinction worth making here,
but I'm missing it so far.
 

Rob Warnock

+---------------
| Joachim Durchholz wrote:
| > Actually SQL has references - they are called "primary keys", but they
| > are references nevertheless.
|
| I strongly object; this is quite incorrect. I grant you that from the
| 50,000 foot level they appear identical, but they are not.
+---------------

Agreed. The only thing different about "primary" keys from any other
key is uniqueness -- a selection by primary key will return only one
record. Other than that constraint, many databases treat them exactly
the same as non-primary keys [e.g., can form indexes on them, etc.].

+---------------
| To qualify as a reference, there need to be reference and dereference
| operations on the reference datatype; there is no such operation in SQL.
+---------------

Not in "ANSI SQL92", say, but there might be in most SQL databases!
[See below re OIDs. Also, SQL:1999 had a "REF" type that was essentially
an OID.]

+---------------
| Would you say the relational algebra has references?
+---------------

Don't confuse "SQL" & "relational algebra"!! You'll get real
relational algebraists *way* bent out of shape if you do that!

+---------------
| > (Some SQL dialects also offer synthetic "ID" fields that are
| > guaranteed to remain stable over the lifetime of a record.
|
| Primary keys are updatable; there is nothing special about them.
+---------------

I think he's probably talking about "OIDs" (object IDs). Most
current SQL-based databases provide them, usually as a normally-
invisible "system column" that doesn't show up when you say
"SELECT * FROM", but that *does* appear if you say "SELECT oid, *",
and may be used as a "primary" key even on tables with no actual
primary key:

rpw3=# select * from toy limit 4;
   c1   |  c2   |               c3               | upd
--------+-------+--------------------------------+-----
 fall   | tape  | My Favorite Thanksgiving       |  16
 xmas   | book  | My Favorite Christmas          |   2
 xmas   | video | The Grinch who Stole Christmas |   4
 summer | book  | Unusual 4ths of July           |  17
(4 rows)

rpw3=# select oid, * from toy limit 4;
  oid  |   c1   |  c2   |               c3               | upd
-------+--------+-------+--------------------------------+-----
 19997 | fall   | tape  | My Favorite Thanksgiving       |  16
 19998 | xmas   | book  | My Favorite Christmas          |   2
 19999 | xmas   | video | The Grinch who Stole Christmas |   4
 20000 | summer | book  | Unusual 4ths of July           |  17
(4 rows)

rpw3=# select * from toy where oid = 19998;
  c1  |  c2  |          c3           | upd
------+------+-----------------------+-----
 xmas | book | My Favorite Christmas |   2
(1 row)

rpw3=# insert into toy values ('fall','book','Glory Road');
INSERT 32785 1

rpw3=# select oid, * from toy where oid = 32785;
  oid  |  c1  |  c2  |     c3     | upd
-------+------+------+------------+-----
 32785 | fall | book | Glory Road |  21
(1 row)

rpw3=#

See <http://www.postgresql.org/docs/8.1/static/datatype-oid.html>
for how PostgreSQL treats OIDs [including some critical limitations].


-Rob
 

Marshall

Joe said:
True, but you have hidden the pointers. Semantically, the identifiers
i and j refer not to integers but to locations that hold integers. The
assignment modifies the location.

What the implementation looks like shouldn't affect how we speak
of the logical model. In the above code, there are no pointers.

By your definition, "pointer" and "variable" are synonyms. That doesn't
seem like a good idea to me. (What if i and j end up in registers?
I have not heard it said that registers have addresses.)

The identity is temporal: You use the same variable name at two
different times. Do you intend for the second `i' to mean the same
variable as the first `i'?

Okay, so "i" and "i" have the same identity. I suppose you could argue
that the language's namespace is an address-space, and variable names
are addresses. At this point, though, you've made the concept of
identity so broad that it is now necessarily a property of all languages
that use named values, whether updatable or not. I think you would also
have to call "3" a void pointer by this scheme.

Clearly there is *some* difference between a language which allows
explicit pointers and thus aliasing and one that doesn't. What term
would you use? First-class variables?


Marshall
 

Marshall

Chris said:
The variable x definitely has identity that's independent of its value.

I'm not sure what you mean by that.
I also see, though, that the majority (so far, I'd
say all) of the potential uses for which it's worth introducing mutation
into an otherwise mutation-free language allow the possibility of
aliasing, which sorta makes me wonder whether this problem is worth
solving.

What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.


Marshall
 

Chris Smith

Marshall said:
I'm not sure what you mean by that.

I mean, simply, that when you can assign a value to a variable, then you
care that it is that variable and not a different one. That's identity
in the normal sense of the word. The code elsewhere in the procedure is
able to access the value of 'x', and that has meaning even though you
don't know what value 'x' has. This has definite implications, and is a
useful concept; for example, it means that the pure lambda calculus is
no longer sufficient to express the semantics of the programming language,
but instead something else is required.

What you are asking for is some subset of identity, and I've not yet
succeeded in understanding exactly what it is or what its limits are...
except that so far, it seems to have everything to do with pointers or
aliasing.
What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.

I'm not yet convinced that this is any different from a language with
standard pointer aliasing. If I have two tables A and B, and a foreign
key from A into B, then I run into the same problems with enforcing
constraints that I would see with a pointer model... when I update a
relation, I need to potentially check every other relation that contains
a foreign key into it, in order to ensure that its constraints are not
violated by that update. That's the same thing that is being
pointed out as a negative consequence of aliasing in other languages.
For example, executing:

UPDATE P SET x = 5 WHERE y = 43;

may result in the database having to re-evaluate the constraint that
says that for all P(x, y, z), x must be less than 4 when z = 17. One
difference is that while in general purpose programming languages this
appears to be a daunting task, databases are set up to do these kinds of
things all the time and contain optimizations for it... but the problem
is still the same, and it would still present the same difficulties in
doing formal proofs that running the above UPDATE statement doesn't
violate any invariants.

(If I'm wrong about that, please let me know; I'd be very interested if
that's so.)
 

David Hopwood

Chris said:
If checked by execution, yes. In which case, I am trying to get my head
around how it's any more true to say that functional languages are
compilable postconditions than to say the same of imperative languages.

A postcondition must, by definition [*], be a (pure) assertion about the
values that relevant variables will take after the execution of a subprogram.

If a subprogram is intended to have side effects, then its postcondition
can describe the results of the side effects, but must not reexecute them.
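
A small Java sketch of that distinction (illustrative only): the
postcondition below describes the result of the push via a saved "old"
value, without performing the push again.

static void push(java.util.Deque<Integer> stack, int x) {
    int oldSize = stack.size();    // capture the "old" state the postcondition mentions
    stack.push(x);                 // the side effect itself
    // Postcondition: pure assertions about the new state; no further side effects.
    assert stack.size() == oldSize + 1;
    assert stack.peek() == x;
}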


[*] E.g. see
<http://www.spatial.maine.edu/~worboys/processes/hoare axiomatic.pdf>,
although the term "postcondition" was introduced later than this paper.
 

Marshall

Chris said:
I mean, simply, that when you can assign a value to a variable, then you
care that it is that variable and not a different one. That's identity
in the normal sense of the word.

I guess it is, now that you mention it.

The code elsewhere in the procedure is
able to access the value of 'x', and that has meaning even though you
don't know what value 'x' has. This has definite implications, and is a
useful concept; for example, it means that the pure lambda calculus is
no longer sufficient to express the semantics of the programming language,
but instead something else is required.

What you are asking for is some subset of identity, and I've not yet
succeeded in understanding exactly what it is or what its limits are...
except that so far, it seems to have everything to do with pointers or
aliasing.

Perhaps it is specifically first-class identity, rather than identity
per se.

I'm not yet convinced that this is any different from a language with
standard pointer aliasing. If I have two tables A and B, and a foreign
key from A into B, then I run into the same problems with enforcing
constraints that I would see with a pointer model... when I update a
relation, I need to potentially check every other relation that contains
a foreign key into it, in order to ensure that its constraints are not
violated by that update. That's the same thing that is being
pointed out as a negative consequence of aliasing in other languages.

No, that's not the same thing. What you are describing here is
not an aliasing issue, but simply the consequences of allowing
constraints to mention more than one variable.

In our i-and-j example above, suppose there was a constraint
such that i < j. We have to re-check this constraint if we
update either i or j. That's not the same thing as saying
that i and j are aliased.
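
Spelled out as a sketch (illustrative, not from the original post):

int i = 0, j = 10;
assert i < j;    // the two-variable constraint holds
j = 5;           // updating j forces a recheck of i < j ...
assert i < j;    // ... and it still holds
i = 2;           // updating i forces the same recheck
assert i < j;    // no aliasing anywhere; the recheck happens only because
                 // the constraint mentions both variables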

A foreign key constraint is a multi-variable constraint.
Specifically, a foreign key from table A, attribute a
to table B, attribute b is the constraint:

forall a in A, exists b in B such that a = b.

Note that two variables, A and B, are referenced in
the constraint. In general, any constraint on two
variables will have to be rechecked upon update
to either.

For example, executing:

UPDATE P SET x = 5 WHERE y = 43;

may result in the database having to re-evaluate the constraint that
says that for all P(x, y, z), x must be less than 4 when z = 17.

I don't see any aliasing in this example either.

But consider how much worse this problem is if real aliasing
is possible. We have some pointer variable, and initialize
it with the return value from some function. We don't know
anything about what variable is involved. We update through
the pointer. What constraints must we recheck? Apparently
all of them; unless we have perfect alias analysis, we can't
tell what variables are affected by our update.
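
A sketch of the contrast (the function here is an assumption, made up
only for illustration):

int[] a = { 1 };    // constraint: a[0] < 10
int[] b = { 2 };    // constraint: b[0] < 10

// Returns one of the two cells; the caller cannot tell which.
int[] someCell() { return System.nanoTime() % 2 == 0 ? a : b; }

void update() {
    int[] p = someCell();   // p aliases *some* variable -- but which one?
    p[0] = 5;               // without alias analysis, every constraint that
                            // mentions a or b must now be rechecked
}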

One
difference is that while in general purpose programming languages this
appears to be a daunting task, databases are set up to do these kinds of
things all the time and contain optimizations for it... but the problem
is still the same, and it would still present the same difficulties in
doing formal proofs that running the above UPDATE statement doesn't
violate any invariants.

(If I'm wrong about that, please let me know; I'd be very interested if
that's so.)

I'm interested to hear your reply.


Marshall
 

Chris Smith

Marshall said:
Perhaps it is specifically first-class identity, rather than identity
per se.

As in: "the value of one variable can (be/refer to/depend on) the
identity of another variable"? I can certainly see this as a
reasonable concept to consider.
No, that's not the same thing. What you are describing here is
not an aliasing issue, but simply the consequences of allowing
constraints to mention more than one variable.
A foreign key constraint is a multi-variable constraint.
Specifically, a foreign key from table A, attribute a
to table B, attribute b is the constraint:

forall a in A, exists b in B such that a = b.

Note that two variables, A and B, are referenced in
the constraint.

There's confusion here coming from different usages of the word
variable. Let us talk instead of values, and of the abstract structures
that give them meaning. In both cases (invariants in a hypothetical
imperative language, and in a relational database), the constraints make
reference to these structures of values (relations, for example, or
various kinds of data structures), and not to the individual values or
objects that they contain. In both cases, the problem is not that we
don't know what structures to check to verify the invariant; rather,
it's that we have to check ALL of the values in that structure.

As someone pointed out, this is to be expected in a world of mutable
things with identity that are globally locatable. It is simple fact
that if I tell you "I spoke to Barbara's husband", you may need to trace
down who Barbara's husband is before you could discover that, for
example, maybe I actually spoke to your boss, or to your nephew's best-
friend's father. If databases are capable of modeling these kinds of
relationships (and of course they are), then they are as susceptible to
"aliasing" -- in a logical sense that avoids mention of pointer -- as
anyone else.
I don't see any aliasing in this example either.

Actually, this was probably a bad example. Let's stick to the others
involving relationships between tuples.
 

George Neuner

Marshall said:
On the other hand, there is no problem domain for which pointers
are a requirement. I agree they are deucedly convenient, though.

I would argue that pointers/references _are_ a requirement for I/O. I
know of no workable method for interpreting raw bits as meaningful
data other than to overlay a typed template upon them.
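
For what it's worth, even a pointer-free language ends up offering some
such overlay; a sketch using Java's java.nio.ByteBuffer (illustrative,
not from the post):

byte[] raw = { 0, 0, 0, 1, 0x40, 0x49, 0x0F, (byte) 0xDB };   // raw bits, e.g. from a device
java.nio.ByteBuffer view = java.nio.ByteBuffer.wrap(raw);     // overlay a typed view on them
int tag = view.getInt();       // first four bytes read as a 32-bit int (1)
float value = view.getFloat(); // next four bytes read as an IEEE-754 float (~3.1415927)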

Categorically disallowing address manipulation functionally cripples
the language because an important class of programs (system programs)
cannot be written.

Of course, languages can go overboard the other way too. IMO, C did
not need to provide address arithmetic at the language level;
reinterpretable references and array indexing would have sufficed for
any use. Modula 3's type safe view is an example of getting it right.

It is quite reasonable to say "I don't write _____ so I don't need
[whatever language feature enables writing it]". It is important,
however, to be aware of the limitation and make your choice
deliberately.


George
 

Marshall

George said:
I would argue that pointers/references _are_ a requirement for I/O. I
know of no workable method for interpreting raw bits as meaningful
data other than to overlay a typed template upon them.

I think I know what you mean. I agree that pointers are necessary
for, e.g., device drivers. So I have to weaken my earlier statement.

Categorically disallowing address manipulation functionally cripples
the language because an important class of programs (system programs)
cannot be written.

That's fair, although I could argue how important systems programming
is these days. (And C/C++ are ****-of-the-walk there anyway.)

Of course, languages can go overboard the other way too. IMO, C did
not need to provide address arithmetic at the language level,
reinterpretable references and array indexing would have sufficed for
any use. Modula 3's type safe view is an example of getting it right.

It is quite reasonable to say "I don't write _____ so I don't need
[whatever language feature enables writing it]". It is important,
however, to be aware of the limitation and make your choice
deliberately.

Agreed.


Marshall
 
