Enhance an array's static type by a lower length-bound.


Andreas Leitgeb

This is an idea that occurred to me while reading the Ada vs Java thread,
although it doesn't really apply there.

It is about an enhancement that I think could go into some future
version of Java (but likely won't anyway). I'm posting it here not
to "propose" it, but to learn about reasons why it might not be
doable or not worth it... or whether someone else has thought of it
already long before.

The point of the idea is that array variables would gain a static
bit of information, namely a lower bound (0 by default) on the length
of any array that can legally be stored in that variable.

I'll start with an example of what it might look like at source-level:

class Demo {
    static int get9th( int[10] a_ia ) {
        return a_ia[9]; // guaranteed not to throw an AIOOB-Exception
    }

    public static void main(String[] args) {
        int[20] ia1 = new int[42]; // Fine, it's just a lower bound.
        //int[20] ia2 = new int[1]; // this would cause a compile-time error.

        // new type[ constant ]'s return value would be of type type[constant];
        // new type[ runtime value ]'s return value would be of type type[0].
        // Later versions of Java could in principle generate a stricter
        // return type (a higher lower bound) if they can prove one for
        // the runtime value, but that might impact selection of method
        // overloads (see further below), so it might not be all that
        // good an idea.

        get9th( ia1 ); // get9th requires "at least 10" and gets "at least 20"
        //get9th( new int[5] ); // this would trigger a compile-time error.

        int[] ia2 = new int[42]; // or equivalently: int[0] ia2 = ...
        //get9th( ia2 ); // compile-time error: no guarantee on the length of ia2.
        get9th( (int[10]) ia2 ); // a length-cast: no compile-time error,
        // but a runtime check on the length; if too short, an exception
        // is thrown right at the cast. This is for treating return values
        // from old code that might have provided length guarantees only
        // in its documentation.

        // not exactly a design-goal, but a dead giveaway:
        if ( ia2 instanceof int[40] ) { ... }
    }
}

I believe such a change would be source-compatible, but might require
some severe changes at the bytecode/class-file-format level, to allow
such a lower bound to be specified in a type signature. Such a change
should be possible in a way that old class-files can still be used, e.g.:
get9th:(]10I)I (currently without bounds: get9th:([I)I )

Methods could overload others just for different length-bounds.
The one with the largest provably satisfiable length-bound would be
chosen among those with an otherwise matching signature.

I believe there would be no issue with contravariance in this
scenario, thus no need to ever specify upper bounds. If I'm missing
something, I'd be interested to know.

Any technical flaws so far?

One reason that might thwart it could be that arrays are already
a step-child of Java (think generics), so anything that improves
them may generally be seen as a downside, for conceivably making
them safer and thus more inviting to use.
Another could be that ArrayIndexOutOfBoundsExceptions are
not currently perceived as a problem that static checks could
alleviate anyway.
 

Joshua Cranmer

Andreas Leitgeb said:
I believe such a change would be source-compatible, but might require
some severe changes at the bytecode/class-file-format level, to allow
such a lower bound to be specified in a type signature. Such a change
should be possible in a way that old class-files can still be used, e.g.:
get9th:(]10I)I (currently without bounds: get9th:([I)I )

You could do it by adding a few new annotations, one to annotate
min-length values for method parameters/return value and one for the
fields. Note that this means that the type information is lost at
runtime, so an int[123] is sugared at runtime as int[] and could not be
exposed as int[123] (you would also have to add more information into
the LocalVariableTable).

The issue you get here is that oftentimes the useful array bound
isn't a static numeric constant, but rather depends on other values,
e.g. if you want to get the i'th pixel of an RGB pixel byte array,
you need to guarantee that the length is at least 3 * i.
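
To make this concrete in current Java (an illustrative sketch, not from
the original post; getRgbPixel and its 0-based pixel layout are made up):
since the required length depends on i, only a runtime guard can enforce it:

    // Illustrative sketch: pixel i (counting from 0) occupies indices
    // 3*i .. 3*i+2, so the needed length 3 * (i + 1) depends on i and
    // no constant lower bound in the parameter's type could express it.
    static int getRgbPixel(byte[] data, int i) {
        if (i < 0 || data.length < 3 * (i + 1)) {
            throw new IllegalArgumentException(
                "array too short for pixel " + i);
        }
        int r = data[3 * i]     & 0xFF;
        int g = data[3 * i + 1] & 0xFF;
        int b = data[3 * i + 2] & 0xFF;
        return (r << 16) | (g << 8) | b; // pack as 0xRRGGBB
    }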

On the other hand, I'm not sure the utility of this is particularly
great. You wouldn't be able to avoid any more runtime checks beyond what
the JIT could most likely already optimize out.
 

Robert Klemme

Andreas Leitgeb said:
The point of the idea is that array variables would gain a static
bit of information, namely a lower bound (0 by default) on the length
of any array that can legally be stored in that variable.
I believe such a change would be source-compatible, but might require
some severe changes at the bytecode/class-file-format level, to allow
such a lower bound to be specified in a type signature. Such a change
should be possible in a way that old class-files can still be used, e.g.:
get9th:(]10I)I (currently without bounds: get9th:([I)I )

Methods could overload others just for different length-bounds.
The one with the largest provably satisfiable length-bound would be
chosen among those with an otherwise matching signature.

Not sure whether this would work - especially since binding of the
signature occurs at compile time while you might only learn the real
length of a parameter at runtime.

Also there are probably a whole lot of issues with java.lang.reflect.

Andreas Leitgeb said:
Another could be that ArrayIndexOutOfBoundsExceptions are
not currently perceived as a problem that static checks could
alleviate anyway.

I think this is closer to the real reason why this is probably not such
a good idea. What you propose would be useful only if the added
property of arrays allowed for *compile time* checking - because
Java has runtime checks already. For compile time checks to be
reasonable you would need to forbid

int[] a1 = expr;
int[10] a2 = a1;

because depending on the complexity of expr it might not be possible for
the compiler to determine the guaranteed min length of a1.

And finally the question: how often do you know the array length in
advance? How useful is this in practice? I have the feeling that
people would use this as a lazy shortcut for defining a class with n
integer fields (see the sketch below). And then how does the usefulness
weigh against the cost of realizing this? I think your intuition (that
we won't see this) was pretty good already. :)
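
For illustration (a sketch added here, not from the thread; Point is a
made-up name), the shortcut and the class it would stand in for:

    // The "lazy shortcut": an array with a minimum length standing in
    // for named fields (hypothetical bounded syntax, kept in a comment
    // since it is not legal Java):
    //     int[3] point = new int[] { x, y, z };
    // versus the intended design, a proper class:
    class Point {
        final int x, y, z;
        Point(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
    }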

Kind regards

robert
 

Roedy Green

Andreas Leitgeb said:
The point of the idea is that array variables would gain a static
bit of information, namely a lower bound (0 by default) on the length
of any array that can legally be stored in that variable.

I used Pascal before I used Java. The one advantage of Java 0-based
arrays is you KNOW what the base is without having to check.

It takes a bit of a mental shift, but eventually you always think
iteration starting at 0, just as when you were a kid, you would not
start counting at anything but 1.
 

Lew

Roedy said:
I used Pascal before I used Java. The one advantage of Java 0-based
arrays is you KNOW what the base is without having to check.

It takes a bit of a mental shift, but eventually you always think
iteration starting at 0, just as when you were a kid, you would not
start counting at anything but 1.

+1

FWIW

There are times when some sort of variable indexing scheme could be useful for arrays, but in Java's case they opted for simplicity, as has been mentioned in this thread. Arrays are not the be-all and end-all, though - you can get the effects wanted with associative arrays, i.e., Maps. Thus you gain generics (and lose reified types), heritability, flexibility (and lose notational convenience) and sparseness. You can recapture some of those things (e.g., notational convenience) by switching to a different JVM language that supports, e.g., bracket notation for maps.

Some of the complaints about Java can be mitigated (and ever more so) by the growing number of languages that run on the JVM. It doesn't pay for a programmer to get too locked in to any one language, but a single portable and evolving platform such as the JVM can really empower us.
 

Andreas Leitgeb

Unfortunately, Roedy completely misunderstood what my post was about.

It was not about lower-bounding the indices, but about lower-bounding
the *length* - thus morphing certain runtime ArrayIndexOutOfBoundsExceptions
into compile-time errors. Are my postings really that hard to grasp?
Even the examples I provided?

Lew said:
+1

Also a +1 from me, for shredding this utterly wrong interpretation
of my post.
 

Andreas Leitgeb

Joshua Cranmer said:
I believe such a change would be source-compatible, but might require
some severe changes at the bytecode/class-file-format level, to allow
such a lower bound to be specified in a type signature. Such a change
should be possible in a way that old class-files can still be used, e.g.:
get9th:(]10I)I (currently without bounds: get9th:([I)I )
You could do it by adding a few new annotations,

I must admit that I've so far mostly ignored annotations for my
tasks. When you write "adding a few annotations", do you just mean
adding certain standard annotations in the respective places, or did
you mean defining a custom annotation and writing some code for a
compiler plug-in that would then deal with one's new annotations?

Either way, I doubt that it would go as far as the feature I
posted (especially with respect to overloading).
Joshua Cranmer said:
... one to annotate
min-length values for method parameters/return value and one for the
fields. Note that this means that the type information is lost at
runtime, so an int[123] is sugared at runtime as int[] and could not be
exposed as int[123] (you would also have to add more information into
the LocalVariableTable).

I wouldn't mind this loss. At runtime one can still check the .length
of the array. (Btw., I'm also quite content with generics' erasure ;-)
Joshua Cranmer said:
The issue you get here is that oftentimes the useful array bound
isn't a static numeric constant, but rather depends on other values,
e.g. if you want to get the i'th pixel of an RGB pixel byte array,
you need to guarantee that the length is at least 3 * i.

It's obvious that if some array's length doesn't have any *static*
bounds, then a static check won't do anything useful.
If arrays are used to implement the high-level concept of output
parameters, then a method might want to do this:

    void foo( int in, int[1] out ) {
        out[0] = in; /* example doesn't need to do sthg useful */
    }

to ensure that it won't be called with a zero-length array.
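
In current Java the same guarantee takes a runtime guard, e.g. (an
illustrative sketch; the choice of exception is arbitrary):

    // Current-Java equivalent: the length precondition is checked when
    // the method runs, not when the caller is compiled.
    static void foo(int in, int[] out) {
        if (out.length < 1) {
            throw new IllegalArgumentException(
                "out must hold at least one element");
        }
        out[0] = in;
    }
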
Joshua Cranmer said:
On the other hand, I'm not sure the utility of this is particularly
great. You wouldn't be able to avoid any more runtime checks beyond what
the JIT could most likely already optimize out.

The primary goal was of course compile-time checks. If any runtime-check
could be saved at all, that would be merely a nice side-effect.

This is some "solution in search of a *practical* problem to solve".
Maybe requiring a minimum length for an array is already an anti-pattern.
Maybe someone else happens to know of a plausible use case.
But I'd be mostly interested in whether anyone can find any plain
inconsistency in it.
 

Joshua Cranmer

Andreas Leitgeb said:
Joshua Cranmer said:
I believe such a change would be source-compatible, but might require
some severe changes at the bytecode/class-file-format level, to allow
such a lower bound to be specified in a type signature. Such a change
should be possible in a way that old class-files can still be used, e.g.:
get9th:(]10I)I (currently without bounds: get9th:([I)I )
You could do it by adding a few new annotations,

I must admit that I've so far mostly ignored annotations for my
tasks. When you write "adding a few annotations", do you just mean
adding certain standard annotations in the respective places, or did
you mean defining a custom annotation and writing some code for a
compiler plug-in that would then deal with one's new annotations?

I was actually referring to class file annotations, not Java annotations.
 

Andreas Leitgeb

Patricia Shanahan said:
I don't think I've ever seen a language with dynamic array creation that
has a lower bound on the length of an array a variable can reference.
I'm not sure how it would work, or how I would use it.

The particular approach may appear new, but it isn't much different
from the well-known object hierarchy. Except: the "sub-tree" starting
from an unbounded array type[] is just the linear chain of all bounded
arrays type[#], sorted by their bound value.
Patricia Shanahan said:
Usually, if I have a non-zero lower length bound it is 1 or 2, but most
of my array handling code is designed and tested for zero length and up.

As we're all used (and even "bound") to the current state (while doing
Java, at least), it's no surprise at all that you do such checks at runtime.

What do you do if the array is smaller than required?
Throw an exception? That should/could be the compiler's job.
You perform some other sensible task? Then bounded arrays
won't help, nor affect you.
Patricia Shanahan said:
Even when there is a lower bound, I'm not sure knowing it would help
much with compile time error detection. For example, a method to
calculate the sample mean of some observations requires at least one
element, but is going to iterate over the whole array.

Maybe I forgot to explain that this lower-bounding has no
impact on which indices are applicable:

    int[10] ia = getArray(); // defined as: int[12] getArray() { ... }
    ia[9]++;  // guaranteed *not* to throw an AIOOBE!
              // If the array were too small, then getArray()
              // couldn't have returned it.
    ia[15]++; // still perfectly legal. But it *may or may not* throw
              // an AIOOBE at runtime, depending on the actual array.

There is no general guarantee that some ia[i] will not throw an AIOOBE,
unless there's also a guarantee on i's range. The JIT might make use of
such extra guarantees, which do not depend on the actual array.

    double mean( double[1] values ) {
        // just go ahead and sum up, then divide by values.length,
        // as it will be non-zero. Statically ruling out "null"
        // is another day's work - or maybe it can already be done
        // using Java's annotations (?)
        ...
    }
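
For comparison, a current-Java mean() has to verify both properties at
runtime (an illustrative sketch; the explicit checks are mine, not from
the original post):

    // Illustrative current-Java version: both the null check and the
    // minimum-length check happen at runtime instead of compile time.
    static double mean(double[] values) {
        java.util.Objects.requireNonNull(values, "values");
        if (values.length < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length; // length known to be non-zero here
    }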
 

Andreas Leitgeb

Joshua Cranmer said:
Joshua Cranmer said:
On 8/17/2011 10:04 AM, Andreas Leitgeb wrote:
I believe such a change would be source-compatible, but might require
some severe changes at the bytecode/class-file-format level, to allow
such a lower bound to be specified in a type signature. Such a change
should be possible in a way that old class-files can still be used, e.g.:
get9th:(]10I)I (currently without bounds: get9th:([I)I )
You could do it by adding a few new annotations,
I must admit that I've so far mostly ignored annotations for my
tasks. When you write "adding a few annotations", do you just mean
adding certain standard annotations in the respective places, or did
you mean defining a custom annotation and writing some code for a
compiler plug-in that would then deal with one's new annotations?
I was actually referring to class file annotations, not Java annotations.

They are called "annotations", too? I thought they were called "attributes".

But anyway, yes, that might also be a way to implement it.
(That's also how signatures of generic methods etc. are handled.)
 

Andreas Leitgeb

Robert Klemme said:
Andreas Leitgeb said:
The point of the idea is that array variables would gain a static
bit of information, namely a lower bound (0 by default) on the length
of any array that can legally be stored in that variable.
I believe such a change would be source-compatible, but might require
some severe changes at the bytecode/class-file-format level, to allow
such a lower bound to be specified in a type signature. Such a change
should be possible in a way that old class-files can still be used, e.g.:
get9th:(]10I)I (currently without bounds: get9th:([I)I )
Methods could overload others just for different length-bounds.
The one with the largest provably satisfiable length-bound would be
chosen among those with an otherwise matching signature.
Not sure whether this would work - especially since binding of the
signature occurs at compile time while you might only learn the real
length of a parameter at runtime.

That was the point: the resolution would still happen at compile time,
based on the particular statically known length bound, not the actual
length.

e.g.:

    void foo(int[] ia)   { ... }
    void foo(int[10] ia) { ... }
    void foo(int[20] ia) { ... }
    ...
    foo(new int[25]);        // -> third
    int[5] ia = new int[15];
    foo(ia);                 // -> first, not second!

That would be entirely consistent with

    void foo(Object o)     { ... }
    void foo(List l)       { ... }
    void foo(ArrayList al) { ... }
    ...
    foo(new ArrayList() { ... }); // (subclass of ArrayList) -> third
    Collection c = new Vector();
    foo(c);                       // -> calls first, not second.
Robert Klemme said:
Also there are probably a whole lot of issues with java.lang.reflect.

Hmm, the Class-instances representing arrays e.g. for a call to
Class.getMethod() likely would need to be specific for each bound.
There'd need to be a way to get a class-object for, say, int[42].
May be tricky, but doesn't seem impossible to me.
If you know of a specific difficulty here, then I beg you to speak up!
Robert Klemme said:
I think this is closer to the real reason why this is probably
not such a good idea.

I don't refute this. I just like to think it through in principle,
anyway - maybe until some show-stopper "that can't ever work" is
brought up.
Robert Klemme said:
What you propose would be useful only if the added
property of arrays allowed for *compile time* checking -

Well, that's the intention behind it - to some degree.
Robert Klemme said:
... because Java has runtime checks already. For compile time checks
to be reasonable you would need to forbid

    int[] a1 = expr;
    int[10] a2 = a1;

I already wrote that in my original post. It requires an explicit
cast in the second line, and the cast means that a runtime check
on the length occurs - in principle the same story as with object-type
casts.
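
Side by side (a sketch; the length-cast uses the thread's hypothetical
syntax and is not legal Java today, so it is kept in comments):

    // Object-type cast in current Java: accepted at compile time,
    // checked at runtime.
    Object o = "some string";
    String s = (String) o;          // may throw ClassCastException

    // The proposed length-cast, by analogy (hypothetical syntax):
    //     int[]   a1 = getSomeArray();
    //     int[10] a2 = (int[10]) a1; // would check a1.length >= 10 here
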
Robert Klemme said:
... because depending on the complexity of expr it might not be possible
for the compiler to determine the guaranteed min length of a1.

The guaranteed min length of a1 is 0 - because the brackets are empty.
Robert Klemme said:
I have the feeling that
people would use this as a lazy shortcut for defining a class with n
integer fields.

That would be an admittedly entirely unwanted dead giveaway, but just
like the imbalance of cost vs. practical applicability, I'd like to
keep that out of the technical discussion. It would be relevant if it
were intended as a serious proposal, but that isn't the case here.
 

Roedy Green

Andreas Leitgeb said:
That should/could be the compiler's job.

If you are the only person who has this problem, then it should
definitely not be the compiler's job. You can't put features in
languages unless 99% of programmers can understand them. That rules
out quite a number of esoteric ideas.

I exaggerate somewhat, but designing computer languages is a bit like
designing control panels for elevators. They have to be obvious.

Your idea is not as hairy as generics, but it is not something that
can be understood even with two readings.

--
Roedy Green Canadian Mind Products
http://mindprod.com
The modern conservative is engaged in one of man's oldest exercises in moral philosophy; that is,
the search for a superior moral justification for selfishness.
~ John Kenneth Galbraith (born: 1908-10-15 died: 2006-04-29 at age: 97)
 

Andreas Leitgeb

Robert Klemme said:
e.g.:
void foo(int[] ia) { ... }
void foo(int[10] ia) { ... }
void foo(int[20] ia) { ... }
How likely is it that you want to have vastly different implementations
of foo() with different array lengths?

Very unlikely :) (see later in this posting about benefits)
Andreas Leitgeb said:
Hmm, the Class-instances representing arrays e.g. for a call to
Class.getMethod() likely would need to be specific for each bound.
There'd need to be a way to get a class-object for, say, int[42].
May be tricky, but doesn't seem impossible to me.
If you know of a specific difficulty here, then I beg you to speak up!
Robert Klemme said:
As far as I can see you would need a Class instance for every
combination of base type and min length. Plus you would need to ensure
that type compatibility check methods return the appropriate boolean
values. So far so good.
If you try to make them inherit (i.e. int[4] "is a" int[3]) things get
really crazy, because int[4] is also an int[2] and an int[1] and an int[0].
That doesn't fit Java's inheritance model well. Now if you only had
int[4] and int[1] in the system, the superclass of int[4] would be
different from when you also had int[2] in the system. Now add dynamic
class loading into the mix...

If, for using (say) int[4200000], the JVM really needed all the
4200000 other array "superclasses" in memory at once, then this might
just be one of those show-stoppers that I searched for. Thanks :)

Maybe it could be fixed with some special case where the superclasses
of a bounded array class do not really need to be in memory, but this
would surely require a much deeper change than what I anticipated.
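
One shape such a special case might take (purely an illustrative sketch,
not part of the proposal): the "is-a" relation between bounded array
types reduces to comparing the two bounds, so a VM could answer
assignability directly, without materializing any intermediate
"superclasses":

    // Illustrative sketch: assignability between hypothetical bounded
    // array types is an element-type check plus an integer comparison
    // on the bounds; no int[k] "superclass" objects are needed.
    static boolean assignable(Class<?> targetElem, int targetMinLen,
                              Class<?> sourceElem, int sourceMinLen) {
        return targetElem == sourceElem && sourceMinLen >= targetMinLen;
    }
    // e.g. assignable(int.class, 10, int.class, 20) -> true  (int[20] fits int[10])
    //      assignable(int.class, 10, int.class, 5)  -> false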

Robert Klemme said:
Why only to "some degree"? Compile time type safety would be the only
benefit compared to the current situation. We do have runtime checks
already.

"To some degree":
The compiler would only check assignment to bounded arrays. (including
the assignment that kind-of-happens to the formal parameters of the called
method)

Thus, the compiler could make sure, that a method taking a bounded array
would never get called with a smaller one (but could still get null, btw).

The compiler would however *not* check the particular indexing operations,
(therefore I wrote: "to some degree") as the array might be larger and
indices higher than the lower bound might just be legal.
Robert Klemme said:
Why do you want to keep that out of the discussion? When considering
technical solutions, evaluating cost vs. benefit is always an important
part of the evaluation.

Sometimes these two aspects are evaluated separately. The benefit has
already been evaluated as near zero, because most arrays in practical
use (such as those used internally by certain collections) just do not
benefit from a lower bound on length.
The only use cases so far were algorithms on arrays that require a
particular minimum length (such as obtaining the mean value), and
the anti-pattern of misusing arrays to store separate fields.

Now I'm evaluating the cost, and it is already higher than what I
expected (due to the JVM requiring complete chains of classes up to
Object).
The quest to find further costs would be less verbose if the
question about the benefit weren't always re-raised.
Robert Klemme said:
I think it's clear that what you propose can be solved technically -

There are changes that (would) take a straightforward effort to
implement, and there are changes that require some kind of wider
refactoring. I'm curious about the latter kind - the ones that would
be necessary for implementing that pseudo-proposal. Just for the sake
of the discussion, actually, as I already pointed out in the first
post of this thread.
 

Andreas Leitgeb

Roedy Green said:
If you are the only person who has this problem, then it should
definitely not be the compiler's job. You can't put features in
languages unless 99% of programmers can understand them. That rules
out quite a number of esoteric ideas.

I entirely agree with you. However, if you didn't really read the
first few lines of the original post, then you're probably just
wasting your time if you read or answer the other posts of this
thread. (You're of course free to do just that - namely waste your
time - if you're in that mood :)

This thread is neither about a problem to be solved, nor about a
serious proposal for Java - even though certain formulations (picked
out of context) may make it seem like it was.
 

Lew

Andreas said:
I entirely agree with you. However, if you didn't really read the
first few lines of the original post, then you're probably just
wasting your time if you read or answer the other posts of this
thread. (You're of course free to do just that - namely waste your
time - if you're in that mood :)

This thread is neither about a problem to be solved, nor about a
serious proposal for Java - even though certain formulations (picked
out of context) may make it seem like it was.

Discussions like this one are incredibly valuable for the evolution of the Java language, and for computer languages in general.
 

Robert Klemme

Discussions like this one are incredibly valuable for the evolution
of the Java language, and for computer languages in general.

... and probably also for the understanding of programming languages.

Cheers

robert
 

Michal Kleczek

Patricia Shanahan said:
I don't think I've ever seen a language with dynamic array creation that
has a lower bound on the length of an array a variable can reference.
I'm not sure how it would work, or how I would use it.

Wouldn't any language with dependent typing have it? There are some
academic languages implementing it (actually only parts of it, since
such typechecking is undecidable).
 
