Philosophical question on collection reference types

  • Thread starter Patricia Shanahan
  • Start date
P

Patricia Shanahan

I habitually use the interface types for java.util collection
references:

List<SomeClass> myList = new ArrayList<SomeClass>();

However, in writing some code I started questioning this. The code uses
frequent indexed access to lists with thousands of elements. I wrote it
with the intention of using ArrayList, for fast random access.

Would it perhaps be more appropriate to declare:

ArrayList<SomeClass> myList = new ArrayList<SomeClass>();

to indicate that the code is designed with an ArrayList in mind, and may
need to be rewritten if the List class changes? The code would function,
given sufficient patience, with any List implementation, but I would
never design it that way if I were going to use a LinkedList.

Patricia
 
C

Chris Uppal

Patricia said:
I habitually use the interface types for java.util collection
references:

List<SomeClass> myList = new ArrayList<SomeClass>();

However, in writing some code I started questioning this. The code uses
frequent indexed access to lists with thousands of elements. I wrote it
with the intention of using ArrayList, for fast random access.

It's an interesting question, and in this case following the established dogma
does work against making the code as communicative as possible. It is an
example of hiding /too much/ information.

Perhaps you could define:

interface IndexedList
extends
java.util.List,
java.util.RandomAccess
{
}

and then write your code to require/promise an IndexedList instead of a List.

If that doesn't work for you, then I think you have to choose which is the more
likely of two possible futures.

1) If you think you are likely to want a better List implementation than
ArrayList (e.g. one that keeps its elements in fixed-size chunks rather than in
one huge contiguous array), then use List. On this subject, note that the
objects returned by Arrays.toList() and Collections.unmodifiableList() are
Lists but are not ArrayLists (nor IndexedLists come to that).

2) If you think you (or anyone) is likely to supply, or assume, an
inappropriate implementation of List, such as LinkedList, then use ArrayList
explicitly.

At a guess, and only judging from what you've said about your work, (1) seems
more likely to me than (2). So, if my guess is right, I'd use List. And, if I
wanted to clarify the implicit assumptions my code was making, I would add an
explanatory comment. Or even add:
assert (list instanceof RandomAccess);

-- chris
 
M

Mark Space

Patricia said:
I habitually use the interface types for java.util collection
references:

List<SomeClass> myList = new ArrayList<SomeClass>();

However, in writing some code I started questioning this. The code uses

The first thing that occurs to me is that trying too hard to anticipate
future requirements is likely to result in Gold Plating -- complexity
that your application doesn't need and won't ever use.

OTOH, if you think it's likely that your big data object may need to be
implemented differently in the future, using an abstract type like List
seems to make sense. List is "free" (somebody has already done the work
to define it) and works with a variety of existing tools. Implementing
your own List based on, for example, a Strategy Pattern and using memory
mapped files, sounds easier than trying to do the same with an ArrayList.

My two cents.
 
P

Patricia Shanahan

Mark said:
The first thing that occurs to me is that trying too hard to anticipate
future requirements is likely to result in Gold Plating -- complexity
that your application doesn't need and won't ever use.

I don't see the complexity difference between the two alternatives,
declaring myList as a List or as an ArrayList.
OTOH, if you think it's likely that your big data object may need to be
implemented differently in the future, using an abstract type like List
seems to make sense. List is "free" (somebody has already done the work
to define it) and works with a variety of existing tools. Implementing
your own List based on, for example, a Strategy Pattern and using memory
mapped files, sounds easier than trying to do the same with an ArrayList.

I think it is VERY unlikely that I'll ever use anything other than
ArrayList. ArrayList is fast enough for what I'm doing. Indeed, that is
the point of my question - since it is an ArrayList, and is very
unlikely to ever be anything other than an ArrayList, without major
changes to the code that uses it, why not call it an ArrayList in the
reference type declaration?

This is primarily a question about programming as human-human
communication, to either another programmer or my future self. I can
change the declared type, in either direction, with very little
programming effort.

Patricia
 
D

Daniel Pitts

I don't see the complexity difference between the two alternatives,
declaring myList as a List or as an ArrayList.




I think it is VERY unlikely that I'll ever use anything other than
ArrayList. ArrayList is fast enough for what I'm doing. Indeed, that is
the point of my question - since it is an ArrayList, and is very
unlikely to ever be anything other than an ArrayList, without major
changes to the code that uses it, why not call it an ArrayList in the
reference type declaration?

This is primarily a question about programming as human-human
communication, to either another programmer or my future self. I can
change the declared type, in either direction, with very little
programming effort.

Patricia

Returning a List lets you keep the implementation abstract. If you
document the fact that "the List returned will be a RandomAccess
list", users of your class can feel safe using random access without
worrying about it being an ArrayList, Arrays.toList(), or some other
tbd list.

I think its overkill to add implementation complexity to the names of
things. You run the risk of not knowing where to stop:

public interface ListWithConstantTimeLookup<T> {
T getInConstantTime(int offset);
void putInConstantTime(T o, int offset);
}

Alternatively, users who need random access but can't assume its
available can do this:

List<Foo> list = getAllFoo();
if (!(list instanceof RandomAccess)) {
list = Arrays.asList(list.toArray(new Foo[0]);
// or list = new ArrayList(list);
}

or,
List<Foo> list = getAllFoo();
assert list instanceof RandomAccess;

Although, I have to say that in most cases I'm able to deal with lists
through iteration rather than by lookup. Often times I'll use
Collection as the return type or parameter type, if it really doesn't
matter whats going on.
 
P

Patricia Shanahan

Daniel said:
Returning a List lets you keep the implementation abstract. If you
document the fact that "the List returned will be a RandomAccess
list", users of your class can feel safe using random access without
worrying about it being an ArrayList, Arrays.toList(), or some other
tbd list.

In this case the List is not being returned, but used locally to
implement an algorithm that makes random accesses.
Although, I have to say that in most cases I'm able to deal with lists
through iteration rather than by lookup. Often times I'll use
Collection as the return type or parameter type, if it really doesn't
matter whats going on.

I tend to use List or Set most of the time. Usually, there is some
possibility that I'll change my mind about the implementing class
without rewriting the code that uses it. This time, that is very unlikely.

Patricia
 
D

Daniel Pitts

In this case the List is not being returned, but used locally to
implement an algorithm that makes random accesses.
If its a locally used collection, I can't think of an argument for
either abstract or concrete, except for consistency sake. "She always
returns List, but declares ArrayList inside her methods. That's
confusing." I've seen people do this too (return List, but declare
ArrayList for local variables)

More to the point, my point was more about finding an alternative
algorithm which can utilize iteration over random access. I realize
that this isn't always feasible or even always possible, but for the
majority of things that I come across, it is often possible.
I tend to use List or Set most of the time. Usually, there is some
possibility that I'll change my mind about the implementing class
without rewriting the code that uses it. This time, that is very unlikely.

Patricia

Well, even so, since the use is local, you can probably easily change
the type anyway as long as the API remains the same. You wouldn't
even need to worry about "interface" or "class". It quacks like a duck
and walks like a duck. This is one argument for weakly-typed
languages. An interface simply becomes what is defined by an object.
It could be constructed, restructured, or even retrofitted at
runtime.

I guess my point is that really the only factor that I can see that
really tips the scales is the "consistency" factor. Using the most
generic/abstract type possible in most situations (where it DOES
decouple classes), but in one case using a concrete type just because
it makes something seem more clear to you*, versus being consistent
all the way through. Humans like consistency.

* I've been learning lately that even the best designed code can seem
opaque to someone, and the worst design code can make perfect sense to
someone. Its dangerous to assume that someone (even yourself) will
understand your "perfect" design sometime in the future. :)

Anyway, good luck,
Daniel.
 
P

Piotr Kobzda

Patricia said:
In this case the List is not being returned, but used locally to
implement an algorithm that makes random accesses.

You might express that using generic method with a type parameter in
form of intersection type.

<E, L extends List<E> & RandomAccess> void processList(L list) {
// implement your algorithm here ...
}

Intersection type may be used as well for method's return type, or even
for local (or filed of generic class) declaration.


However, there are some restrictions imposed by intersection types.

For example, in processList() implementation you can not do that:

L l = new ArrayList<E>();

Compiler complains with: "Type mismatch: cannot convert from
ArrayList<E> to L".

That's confusing, especially in front of compiler's acceptance (without
any warning) of:

processList(new ArrayList<String>());


I presume the reason is (as usually is such a cases) type erasure --
what I wont even try to ensure...

Simple workaround for that is to use unchecked cast:

L l = (L) new ArrayList<E>();


I tend to use List or Set most of the time. Usually, there is some
possibility that I'll change my mind about the implementing class
without rewriting the code that uses it. This time, that is very unlikely.

Than try with intersection type. (Honestly, I've never tried that, so
I'm far from recommending it).


piotr
 
L

Lew

Chris said:
It's an interesting question, and in this case following the established dogma
does work against making the code as communicative as possible. It is an
example of hiding /too much/ information.

As I understand it, the established dogma is to use the widest type that does
what you want. List is too wide for this, so use ArrayList.

Like many other ideas, the oversimplification seems to stick in people's minds
that one should "always use the interface". The real principle is to use the
widest type that fits.

-- Lew
 
L

Lew

ArrayList promises to be Serializable, List does not.

There are times when you should specify the implementation.

Use the type that makes the promises you need. Don't ask for generality when
you want specificity.

-- Lew
 
C

Chris Uppal

Lew said:
As I understand it, the established dogma is to use the widest type that
does what you want. List is too wide for this, so use ArrayList.

Note of terminological difference: what you're calling wide is what I call the
narrowest type -- the one which conveys the least information.

Like many other ideas, the oversimplification seems to stick in people's
minds that one should "always use the interface". The real principle is
to use the widest type that fits.

In this case, though, Patricia /can/ use List, since it supports all the
necessary operations. ArrayList would be overspecifying (or might be). The
information that is, or isn't, being communicated isn't really type-related[*]
at all, but is a quality-of-service aspect. It isn't enough just to be able to
do <something>, but the object must be able to do <something> with certain
performance guarantees.

-- chris

[*] "type" here being construed in the sense of Java's type system. It's
possible to imagine a type system where asymptotic performance bounds were part
of methods' type signatures.
 
L

Lew

Chris said:
Note of terminological difference: what you're calling wide is what I call the
narrowest type -- the one which conveys the least information.

So noted, and I was not being precise. My sense of "wide" means "encompasses
the largest number of subtypes". By rights I should have said "highest"
instead of "widest", thus conveying the usual depiction of top-rooted
inheritance hierarchies and conforming to the notion of "upcasting".

Good call.

-- Lew
 
P

Piotr Kobzda

Chris said:
Lew wrote:

Or simply -- use the least specialized type that fits.

In this case, though, Patricia /can/ use List, since it supports all the
necessary operations. ArrayList would be overspecifying (or might be). The
information that is, or isn't, being communicated isn't really type-related[*]
at all, but is a quality-of-service aspect. It isn't enough just to be able to
do <something>, but the object must be able to do <something> with certain
performance guarantees.
[*] "type" here being construed in the sense of Java's type system. It's
possible to imagine a type system where asymptotic performance bounds were part
of methods' type signatures.

Agreed. However, I see no problem with type system defined like that.
A lot of (less or more useful) "typologies" coexists in Java's type
system (are they all, for example Serializable semantics, really
type-related?), why then not to add that another one (i.e. performance
aspect) too? Especially when that matters, and may become a source of
not only poor quality of service, but even of denial of service (e.g.
system not responding in "reasonable" time...).

That type can be simply defined (the IndexedList from your previous post
is an example). But it holds obvious disadvantage, classes like
ArralList must additionally implement it to become useful.

We can also define that type "virtually" using type intersection --
which I can't point out important disadvantages for.


What's then the argument against using intersection type here?


BTW -- I've just explored a bit deeper my advice from previous post, and
it results in conclusion, that intersection type can be used even
simpler in that case, e.g. that way:

<L extends List<?> & RandomAccess> void processList(L list)

or, of course, when a concrete list elements type is know, that way:

<L extends List<String> & RandomAccess> void processList(L list)


Nevertheless, my previous parameterization with two type variables, may
still appear useful when we need additional constrains on list's element
types, or when exact return type of list is unknown, and we don't want
to lose any information about it from method's return type, e.g.:

<E, L extends List<E> & RandomAccess> L filter(L list) {
// ...
return list;
}

Although, usage of such a method is probably slightly limited (instead
of method-chaining and the like), it's /at least/ worth to be noted as
an alternative declaration.

Inconvenience here is when we want to have, for example, a generic
factory method like that:

<E, L extends List<E> & RandomAccess> L createRandomAccessList() {
return (L) new ArrayList<E>(); // unchecked cast
}

Which, despite of that ugly unchecked cast, seems to be useful at first,
and it even works as expected, but holds a small problem... We can not
directly pass its result as an input for another method which it should
be compatible with -- the following won't compile:

processList(createRandomAccessList());

That's because type inference takes place here, so without any
additional advice compiler fails with that.

Hopeless, while Java generics is implemented using type erasure (and
capture conversion, and so on...), we (I) can't even workaround it, i.e.
to make it working without explicitly provided type argument...


But let's beck to the original problem...

Don't you think, that in Patricia's particular case, /possibly/ neither
the use of List nor ArrayList is better than using intersection type of
List and RandomAccess?


piotr
 
P

Piotr Kobzda

Piotr said:
Inconvenience here is when we want to have, for example, a generic
factory method like that:

<E, L extends List<E> & RandomAccess> L createRandomAccessList() {
return (L) new ArrayList<E>(); // unchecked cast
}

Which, despite of that ugly unchecked cast, seems to be useful at first,
and it even works as expected, but holds a small problem...

Well. It's not a small problem here (and it's not /that/ problem in
fact), it's rather bigger one... Here is a mistake in my treatment of
second type parameter, not a generics implementation nor a compiler
fault. The ArrayList<E> is not valid substitute for all possible
We can not
directly pass its result as an input for another method which it should
be compatible with -- the following won't compile:

processList(createRandomAccessList());

That's because type inference takes place here, so without any
additional advice compiler fails with that.

Hopeless, while Java generics is implemented using type erasure (and
capture conversion, and so on...), we (I) can't even workaround it, i.e.
to make it working without explicitly provided type argument...

However, it can be done correctly using factory object implementing the
following-like interface:

interface RandomAccessListFactory<E, L extends List<E> &
RandomAccess> {
L newList();
}

We can then create its implementation like that:

static <E> RandomAccessListFactory<E, ? extends List<E>>
defaultListFactory() {
return new RandomAccessListFactory<E, ArrayList<E>>() {
public ArrayList<E> newList() {
return new ArrayList<E>();
}
};
}

and then use it, that way:

processList(defaultListFactory().newList());

The compiler nicely infers all types as needed. (Notice no need for
explicit declaration of ArrayList beside the factory scope).


The above is wide extension, and possibly very far from, the point of
the original question. It simply shows, that working with intersection
types is not as straight forward as working with the regular Java types
(more, it still appears to me as a little abuse of the intersection
types), but it seams to be possible.

Of course, I'm still far from recommending it -- just wondering, how to
efficient work with that, and if it all is worth the effort?...


piotr
 
C

Chris Uppal

Piotr said:
That type can be simply defined (the IndexedList from your previous post
is an example). But it holds obvious disadvantage, classes like
ArralList must additionally implement it to become useful.

We can also define that type "virtually" using type intersection --
which I can't point out important disadvantages for.


What's then the argument against using intersection type here?

Right now, I have no real argument against it -- the reason I didn't mention it
myself is that it simply didn't occur to me that you could express the dual
requirement in that way.

Besides the fact (unfortunate but true) that I tend to forget that intersection
types are available at all, I don't feel confident enough in my "feel" for what
knock-on effects a given declaration will have (especially a relatively
"advanced" one), to be able to adocate using or avoiding that approach. I
would be a good deal happier if intersection types could be declared as
"top-level" constuctions -- the way they are resticted to use in type bounds
makes me uneasy, but I don't yet know whether that unease is justified.

I want think about this some more, and do some experiments -- it's a good
chance to learn a bit...

-- chris
 
D

Daniel Pitts

Right now, I have no real argument against it -- the reason I didn't mention it
myself is that it simply didn't occur to me that you could express the dual
requirement in that way.

Besides the fact (unfortunate but true) that I tend to forget that intersection
types are available at all, I don't feel confident enough in my "feel" for what
knock-on effects a given declaration will have (especially a relatively
"advanced" one), to be able to adocate using or avoiding that approach. I
would be a good deal happier if intersection types could be declared as
"top-level" constuctions -- the way they are resticted to use in type bounds
makes me uneasy, but I don't yet know whether that unease is justified.

I want think about this some more, and do some experiments -- it's a good
chance to learn a bit...

-- chris

It seems that it would be a useful Java syntax to allow type
intersections in regular variable declarations.

public <E> void myFunction(List<E>&RandomAccess randomAccessList)
{
}

It would be especially useful for this type of use:
public void myFunction() {
Runnable&Comparable<?> myRunner = getNextRunner();
myRunner.run();
return myRunner.compareTo(getNextRunner());
}

Note that I use methods in both the Runnable interface and Comparable
interface.

I'm not sure I like the syntax, but it could be worse :).

My examples may not seem useful, but imagine the power with custom
interfaces, rather than built in ones.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,011
Latest member
AjaUqq1950

Latest Threads

Top