setSize ArrayList, when will it come?

J

Jan Burse

Jan said:
I guess Larry will give me some money, when he sees that
with the new setSize() his servers will be more green. There
should be a market for energy efficiency nowadays.

Look see: Migrating from Vector to ArrayList is all
about energy saving. ArrayList does not use synchronized,
therefore it does use less CPU. But in my case the
migration is not perfect, since setSize() is missing.

Migrating Hashtable to HashMap didn't cause me any pain
so far. I saw the following speed-up and thus
energy saveing in some sense:

32-bit JDK Synchronized: 12'898 ms (*)
32-bit JDK Un-Schronized: 12'438 ms
64-bit JDK Synchronized: 7'769 ms
64-bit JDK Un-Synchronized: 7'308 ms

Now I am in the progress of migrating the Vectors to
ArrayList. But because ArrayList is mindlessly lacking
setSize() I am stuck. The bug report shows that I
am not the only one who has experienced this problem
so far.

Oracle is probably less pronounced in being green
compared to IBM. IBM does it under the cover of
its smarter planet initiative.
http://www.eweek.com/c/a/IT-Infrastructure/5-Steps-to-Green-IT/
For oracle I found also something:
http://www.oracle.com/us/products/applications/green/061929.html

Best Regards

(*)
http://www.facebook.com/photo.php?fbid=177557452312810
 
A

Andreas Leitgeb

Jan Burse said:
I guess Larry will give me some money, when he sees that
with the new setSize() his servers will be more green.

I'm all in favor of setSize() for easing mechanical Vector->
ArrayList transition, but I'm sure, some of the workarounds
for its lack could lead to even "greener" code:

1.) a good estimate on what size you'll end up using, and
create the ArrayList with that size. Eating up a few
kilobytes (or even up to a megabyte) of nulls too much
is probably still greener than re-allocating the buffer
each time a new largest-so-far index shows up. If the
original estimation turns out to be too small, then grow
it with addAll(...nCopies(...)) - it wouldn't be done
all that often, so the extra effort wouldn't matter.

2.) if no such limit can be reasonably predicted, then
chances are that a real sparse structure really fits
the bill better. (I didn't quite understand your argument
against using a Map - it didn't make sense to me when you
wrote it)

The ideal (semi-)sparse structure wouldn't even need a setSize(),
as it would grow the internal buffer as needed to allow .set() to
work with any (non-negative) index, and let .get() just return null
for indices beyond the current end instead of throwing
IndexOutOfBounds-Exceptions.
 
J

Joshua Cranmer

No you didn't get it. I need a fast specialized
setSize() as argued in the bug reference. According
to the collections documentation, the nCopies method
does create an extra object:

http://download.oracle.com/javase/1...llections.html#nCopies(int, java.lang.Object)

... The newly allocated data object is tiny (it contains
a single reference to the data object). ...

But it is not only that this tiny object of type
Collections.CopiesList will be created. For addAll
call an Iterator AbstractList.Itr will be
created I guess, not sure.

Okay, if you're so annoyed about the creation of several objects, you
can create your own version of a CopiesList that implements both
Collection and Iterator and implements everything on itself, for a
single overhead of around 12 bytes. If that is still too much, run a
garbage collection. You now have a few hundred extra kilobytes of data.
If you're still complaining, you're using the wrong language.
 
J

Joshua Cranmer

Look see: Migrating from Vector to ArrayList is all
about energy saving. ArrayList does not use synchronized,
therefore it does use less CPU. But in my case the
migration is not perfect, since setSize() is missing.

We have given *several* ways to re-emulate setSize using ArrayList.
Since moving from Vector to ArrayList causes savings in loss of
synchronized, perhaps the speed slowdown in your emulations is more than
made up for.
 
J

Jan Burse

Andreas said:
1.) a good estimate on what size you'll end up using, and
create the ArrayList with that size. Eating up a few
kilobytes (or even up to a megabyte) of nulls too much
is probably still greener than re-allocating the buffer
each time a new largest-so-far index shows up. If the
original estimation turns out to be too small, then grow
it with addAll(...nCopies(...)) - it wouldn't be done
all that often, so the extra effort wouldn't matter.
2.) if no such limit can be reasonably predicted, then
chances are that a real sparse structure really fits
the bill better. (I didn't quite understand your argument
against using a Map - it didn't make sense to me when you
wrote it)

I never placed an argument against map. Where was this?
Just assume that the original code had setSize() for
whatever reason, and that the redesign of the original
code is not at stake.

But you are right, a redesign could of course be an option.
The ideal (semi-)sparse structure wouldn't even need a setSize(),
as it would grow the internal buffer as needed to allow .set() to
work with any (non-negative) index, and let .get() just return null
for indices beyond the current end instead of throwing
IndexOutOfBounds-Exceptions.

Actually in the current application the Vector/ArrayList need
not necessarely be sparse. It could be, but it does not have
to be, both scenarios should possible coexist, sparse and non-sparse.
The current application has a high frequency of creating for an
initially Vector/ArrayList of size 0.

Then upon some event (imagine also high frequency) the size
might jump to n, and the element at position n-1 is set to
non null. And then upon some other event the size might jump
to m, and the element at position m-1 is set to non null. But
there is no guarantee that n and m are close. Also it can
happen that an event j arrives, and the element at position
j is set to non null, j being somewhere between 0 and m-1. So
that the array might become gradually less sparse.

Imagine the above process to last a very short time, until the
livecycle of the array ends. Before the livecycle ends, the
array might contain up to 100 or more non-null positions. But
there is no guarantee that they are close together or not.
But when there are 100 entries their indexes might span the
range of 1000. But it is very important that the access to
the elements is O(1).

The sparseness is only mentioned in the oracle bug. But actually
I think the sparseness is irrelevant. Just assume you have an
existing application with Vector that happily uses setSize(). So
what do you do?

Bye
 
J

Jan Burse

Joshua said:
We have given *several* ways to re-emulate setSize using ArrayList.
Since moving from Vector to ArrayList causes savings in loss of
synchronized, perhaps the speed slowdown in your emulations is more than
made up for.

If the emulation eats up the speed gain of the remove the
synchronization, then there is no use in doing the migration
at all. The synchronization primitive in Java is already
highly optimized, it has a fastpath in the non-competitive
case. Removing synchronization gives some performance gain,
but it is in the range of 5-20% only nowadays. It depends of
course on how much synchronization the original application used.

So the chances are very high, that an emulation eats up
exactly these 5-20% percents.

Bye
 
J

Jan Burse

Joshua said:
Okay, if you're so annoyed about the creation of several objects, you
can create your own version of a CopiesList that implements both
Collection and Iterator and implements everything on itself, for a
single overhead of around 12 bytes. If that is still too much, run a
garbage collection. You now have a few hundred extra kilobytes of data.
If you're still complaining, you're using the wrong language.

The language is fine, since it could have an efficient setSize().
Its not a problem of the Java language at all. And of course
I could create also a clone of ArrayList and add setSize there.

But I don't like the idea of copy/pasting the ArrayList implementation.
Once you start implementing your own primitive classes your project
reaches another dimension. But yes of course this is an option, if
the out of the box "collections" do not fit.

Bye
 
J

Jan Burse

Patricia said:
On 8/9/2011 3:36 PM, Jan Burse wrote:
...
...

How about supplying some numbers, such as the proportion of nulls and
the frequency distribution of adding various sizes of null blocks
relative to other activity. Without numbers, it is rarely possible to
get a performance issue.

Maybe even a benchmark showing typical activity for one of your
sparse-ish arrays?

Patricia

You can assume the constraint that the element access
should stay at O(1). This narrows down the possible
data structures a little bit.

Bye
 
A

Andreas Leitgeb

Jan Burse said:
But when there are 100 entries their indexes might span the
range of 1000. But it is very important that the access to
the elements is O(1).

In *that* case, I'd use a plain array, sized to say 2048 and
just index into it.
Just assume you have an existing application with Vector
that happily uses setSize(). So what do you do?

I'd try to find out, whether the current .size() is relevant to the
business logic.
If not, then most of the repeated setSize()ing maybe was just a
waste of cpu-time&energy, anyway.
If it is, then I might keep it in a separate variable or field,
and keep the array or ArrayList at some sufficiently large size.
If it is, and the collection itself gets passed to external
interfaces that care about the size ... or similar misfortunes,
then I'd give up and stay with Vector. (Yes, if ArrayList had
..setSize(), then this particular leaf of the decision tree would
fare better.)

Based on that you mentioned "no money (no time) for a rewrite", I
might just as well give up on conversion, anyway and stay with Vector.
If that is too slow, then *they* should spend some money (or time)
on making it faster.
 
J

Joshua Cranmer

So the chances are very high, that an emulation eats up
exactly these 5-20% percents.

I honestly doubt that. But instead of arguing about it, why not *try*
implementing it and measuring the performance implications to verify for
certain?
 
J

Jan Burse

Andreas said:
In*that* case, I'd use a plain array, sized to say 2048 and
just index into it.

Actually this is a very good idea. But it only
works when the application did not use the ArrayList
as a point of reference for sharing, so that
by pointing to the ArrayList object all clients of
this object would automatically see any changes
on the object. This is not possible with an array.

So have to check how the ArrayList is used in
the application. There are chances that the ArrayList
itself sits in an object and that this parent object
can be used as a point for sharing instead of the
ArrayList. And then I could replace the ArrayList
by an array. If there is no such parent object, then not.

If there is no such parent object, I would need
to wrap the array in some object, and would end
up re-implementing ArrayList. Lets see.

But you are also right that if I know an the size
of the array, and could make the size fixed, then
no change of the array itself would happen. I would
not need to bother about having a shared reference.
But I guess the fixed size assumption is not possible
here. The array might or might not filled at all.

Bye
 
J

Jan Burse

Joshua said:
I honestly doubt that. But instead of arguing about it, why not *try*
implementing it and measuring the performance implications to verify for
certain?

I don't exclude doing this some time ahead.
Currently I was not yet able to do this
given the time and cost constraints around.

Bye
 
J

Jan Burse

Patricia said:
Does that include accesses that require a size increase?

Patricia

I don't have yet frequency figures, and thus cannot
give you much non-functional requirements on
setSize(). But you are right, if no fast path is
available, in some circumstances a semi-fast path
or even a slow path might also do.

Sorry for not being able to provide you more information.

Bye
 
A

Andreas Leitgeb

Jan Burse said:
But I guess the fixed size assumption is not possible
here. The array might or might not filled at all.

Is that now about an odd chance that the array/ArrayList might
grow beyond the initial estimate? Or is that about other code
(that sees the ArrayList) that may be relying on that the last
non-null element is at position "size()-1" ?

If the former: for an odd-chance event, doing
addAll(...nCopies(...))
is good enough.

If the latter: stick to Vector ;-/ Even if ArrayList.setSize()
was ever added, that would surely happen too late for your project.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top