How to iterate a collection fast and memory-efficient?

  • Thread starter Dobieslaw Wroblewski
  • Start date
D

Dobieslaw Wroblewski

Hi.

I took a look at the Sun's tutorial on collections, but it does not say much
about complexity of various operations, and... javadoc says even less :-/.

Please help:

I am using a Vector of elements, but it turned out that Vector is not
suitable anymore and I need SortedMap instead. One of the operations that I
am going to do very frequently is to iterate objects stored in the map in
the key's increasing order.

The only class implementing SortedMap is TreeMap, and in order to iterate, I
believe I have to use TreeMap.values(). Here come 2 questions:

1.
Will I get what I want? They say "collection's iterator will return the
values in the order
that their corresponding keys appear in the tree" in javadoc.

2.
Will this be as cheap in terms of memory and complexity, as in the case of
Vector?
-----------------------------
// v is Vector()
SomeType sth;
int tmpSize = v.size();
for ( int i = 0; i < tmpSize; i++ ) {
sth = (SomeType)v.get(i);
...
}
-----------------------------
Or maybe it will use more memory and time? The problem is that I am going to
iterate frequently, so I do not want the heap to grow fast before garbage
collector does its job.

I simply do not know what TreeMap.values() does actually...

It seems that a new object is created (this means some memory and time)...
luckily, the elements seem not to be copied ("The collection is backed by
this TreeMap instance, so changes to this map are reflected in the
collection, and vice-versa"). I could preserve the reference between
iterations and so save time and space if I iterate frequently. But if no
object is really created, and the method just returns a reference to the
member, then this is useless effort.

Well, then I have to get Collection.iterator()... this time I cannot
preserve its value between iterations because it is forward-only. Will this
call create a new object on each iteration or maybe it will return a
reference to some existing object?

Finally, if the collection and map base on the same elements, then I have no
idea of how fast Iterator.next() is going to be :-/.


Well maby something else than TreeMap?

My requirements:
a. fast access to elements by key,
b. keys are integers, but not a contiguous range [0..n];
they might start above zero and many keys might be missing
in the range,
c. frequent iterating of the elements must be fast and memory efficient,
d. iteration must return elements in order of keys or in order
they were added to the structure,
e. variable number of elements (but adding and removal needs
not to be extremely fast)

Thanks for any help in advance.

DW.
 
D

Dimitri Maziuk

Dobieslaw Wroblewski sez:
My requirements:
a. fast access to elements by key,

You'll need a Map of some kind. Key-based access in a map
is supposed to be O(1).
b. keys are integers, but not a contiguous range [0..n];
they might start above zero and many keys might be missing
in the range,

OK as long as there are no duplicate keys. Note that map key
must be an object (Integer), you can't use ints.
c. frequent iterating of the elements must be fast and memory efficient,

Iteration is O(n). If you're looking for a specific element
you may be able to do binary search on sorted list and cut
it down to O(log n). If you really need to iterate over
every element, it's O(n) no matter what you do.

You'll be creating 2 temp. objects for iteration: a
collection (e.g. values()) and an iterator over it.
Everything else should be references to existing objects.
d. iteration must return elements in order of keys or in order
they were added to the structure,

You'll need a sorted map for the former. However, if you also
want to keep track original order, you'll have to add a List
-- map loses original order. (And before you use Vector for
that, look up ArrayList in the API.)
e. variable number of elements (but adding and removal needs
not to be extremely fast)

OK as long as n is relatively small. With milions of data
points you'll start running into scalability issues.

Dima
 
D

Dobieslaw Wroblewski

You'll be creating 2 temp. objects for iteration: a
collection (e.g. values()) and an iterator over it.

Do you think they are light objects? I am concerned about memory if the
iterations are to be repeated frequently, e.g. in a loop.
I re-checked the docs and asked some other people... I thik I will use
entrySet() instead of values(). Does this also mean an object creation?

I am also thinking of using LinkedHashMap instead of TreeMap.

OK as long as n is relatively small. With milions of data
points you'll start running into scalability issues.

I really doubt there will be more than 100...

DW.
 
F

Frank

Dobieslaw said:
Do you think they are light objects?

Read the javadocs. Map.values() is backed by the original collection,
which means it won't add a lot of overhead.
I am concerned about memory if the
iterations are to be repeated frequently, e.g. in a loop.
I re-checked the docs and asked some other people... I thik I will use
entrySet() instead of values().

You need the keys along with values?
Does this also mean an object creation?
I am also thinking of using LinkedHashMap instead of TreeMap.

You'll lose out on having your data sorted by key in that case.
I really doubt there will be more than 100...

You could do what dimitri mentions at O(log N) as well, at the cost of
an extra reference stored for each element. But I wouldn't worry about
that in this case.. :)


One little word of advice; Don't worry about creating temporary objects,
current garbage collectors make allocation/deallocation very efficient.

Use iterators to walk through your lists, for instance.
The additional overhead from using Vector over ArrayList is probably
several orders of magnitude higher than gc'ing an iterator every now and
then. Plus you won't have to rewrite if you switch to a linked list.
 
F

Frank

Frank said:
You could do what dimitri mentions at O(log N) as well, at the cost of
an extra reference stored for each element. But I wouldn't worry about
that in this case.. :)

In the worst case, that is. You don't mention how you'll be picking
which elements get removed.
 
C

Chris Smith

Dobieslaw Wroblewski said:
I took a look at the Sun's tutorial on collections, but it does not say much
about complexity of various operations, and... javadoc says even less :-/.

Incidentally, I'm not sure why you went from Vector to TreeMap because
of an ordering requirement. Shouldn't you be talking about TreeSet
instead? In any case, the rest of this post applies to either.

If the API specification (i.e., the JavaDocs) doesn't say anything about
asymptotic complexity, then it isn't specified. However, there are
reasonable expectations you could make about a quality implementation.
Those expectations include:

1. TreeMap uses O(n) space for data storage.
2. TreeMap uses O(log n) time for get() and put() operations.
3. TreeMap uses O(n log n) time for complete traversal.
4. TreeMap uses O(n log n) time to build an instance from existing data.

Creating an Iterator and various Set and Collection views is negligible
in time -- meaning O(1) -- and not worth your time to worry.

You said later that you expect 100 elements. If that's true, can't you
find something more useful to worry about than the performance of your
tree traversals... like, say, the probability of world annihilation due
to asteroid collision?

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
C

Chris Smith

Dimitri Maziuk said:
You'll need a Map of some kind. Key-based access in a map
is supposed to be O(1).

Where do you get that from? You certainly didn't get it from the API
specification, which is the only reasonable basis for such a statement.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
D

Dobieslaw Wroblewski

You said later that you expect 100 elements. If that's true, can't you
find something more useful to worry about than the performance of your
tree traversals... like, say, the probability of world annihilation due
to asteroid collision?

:))).

OK, thanks for your help... anyway... this collection will be small, but it
is going to be iterated repeatedly - I was just concerned about the
overhead, as the iterators are going to be created quite often. I am new to
Java and I am not aware of how much memory a simple lightweight class can
take from the heap, and I do not really know how the gc copes with creating
new objects so often.

Consider such an example:

//-----------------------------
public class A { int i; }

A a;
while (true) {
a = new A();
}
//-----------------------------

Will this code kill the system because gc will not free memory on time, or
not?

My code will not be SO BAD, but there might be moments of similar behaviour.

DW.
 
D

Dobieslaw Wroblewski

Read the javadocs. Map.values() is backed by the original collection,
which means it won't add a lot of overhead.

I know that.

You need the keys along with values?

Not really, but the return value of entrySet() is cached inside TreeMap and
according to javadoc the elements are returned in the order I want.

You'll lose out on having your data sorted by key in that case.

No problem. I create new keys in the same order as I insert the elements.

DW.
 
D

Dobieslaw Wroblewski

Well, I just tried such a code:

//------------------
int i = 0;

Integer ii;
while ( i < 5 ) {
ii = new Integer(4);
}

//------------------

It immediately takes about 12 MB of RAM, but also immediately stabilises and
does not take any more... so I am impressed - the gc is really eficient!

OK... so what about this asteroid? :)))))

DW.
 
C

Chris Smith

Dobieslaw Wroblewski said:
OK, thanks for your help... anyway... this collection will be small, but it
is going to be iterated repeatedly - I was just concerned about the
overhead, as the iterators are going to be created quite often.

Fine. Yet you asked about aymptotic complexity. That's only relevant
for large data sets. For a set of 100, the difference between O(n) and
O(log n) is only a constant factor of about 10 to 20. It's not at all
unusual for constant performance factors between algorithms to be at
about that level. So the constant factors are probably going to
overwhelm the theoretical complexity of operations at that point.

Once you realize that you care more about constant factors than
asymptotic complexity, then all of the advice I've written below
applies.
Well, I just tried such a code:

//------------------
int i = 0;

Integer ii;
while ( i < 5 ) {
ii = new Integer(4);
}

//------------------

It immediately takes about 12 MB of RAM, but also immediately stabilises and
does not take any more... so I am impressed - the gc is really eficient!

Garbage collectors do something called generational collection, which
means they isolate new objects and garbage collect them separately.
They then arrange so that garbage collection of just the newly created
objects is very efficient. In general, you don't need to worry about
creating temporary objects in terms of garbage collection. In fact,
attempts to reduce creation of temporary objects are often about as
likely to hurt things as make improvements; if objects survive the first
few garbage collections, they may cause as much as 50 times more garbage
collector work to dispose of them on average.

This is all an application of the fundamental principles of
optimization:

1. Don't do it.
2. (Advanced) Don't do it yet.

And, to add more practical advice to the classical list:

3. Profile first (which implies that you have a complete piece of
software to profile BEFORE you begin optimizing).

4. After each change, run the profiler to see if the change helped or
not. Back out the change if it didn't.

Human beings have a notoriously poor innate sense of what makes their
software slow. That's especially true when you combine a modern CPU
(which does all sorts of branch prediction, pipelining, and memory
access reordering) with three levels of cache, virtual memory and disk
buffering, a JIT compiler with adaptive optimizations, and a garbage
collector. All of those things can play havoc with what's fast and what
isn't, and you can't juggle them all in your mind at once. Don't even
try to predict your performance problems. Instead, finish your code and
then observe the problems.
OK... so what about this asteroid? :)))))

I didn't have a specific one in mind. That's what telescopes are for!

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
D

Dobieslaw Wroblewski

1. TreeMap uses O(n) space for data storage.
2. TreeMap uses O(log n) time for get() and put() operations.
3. TreeMap uses O(n log n) time for complete traversal.
4. TreeMap uses O(n log n) time to build an instance from existing data.

BTW... where did you get these figures from? I found only (2) in Javadoc. It
is surprising for me that the traversal is O(n log n)... I thought of
O(n)... (by 'complete traversal' you mean iteration through all elements,
don't you?).

DW.
 
A

Alan Krueger

Chris said:
Where do you get that from? You certainly didn't get it from the API
specification, which is the only reasonable basis for such a statement.

Plus, it isn't true in general. It might be true of a HashMap with a
good hashing distribution, but TreeMap lookup will likely be O(log N).
 
D

Dobieslaw Wroblewski

Oh, gosh :-(.

I decided to use LinkedHashMap, but then I realised I need also iteration in
a reverse order... how to do that?

DW.
 
A

Alan Krueger

Dobieslaw said:
BTW... where did you get these figures from? I found only (2) in Javadoc. It
is surprising for me that the traversal is O(n log n)... I thought of
O(n)... (by 'complete traversal' you mean iteration through all elements,
don't you?).

The JDK documentation (at least in 1.3 through 1.5, which I have in
front of me) claims TreeMap is a red-black tree, thus a binary search
tree (BST).

http://en.wikipedia.org/wiki/Red-black_tree
http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree

Traversal of a BST is usually O(N), though, not O(N log N).
 
C

Chris Smith

Dobieslaw Wroblewski said:
It is surprising for me that the traversal is O(n log n)... I thought of
O(n)... (by 'complete traversal' you mean iteration through all elements,
don't you?).

You're right; I'm wrong. I wasn't thinking, and should have said O(n).
BTW... where did you get these figures from?

From understanding of the family of data structures that are implied by
the name TreeSet. As I said, there probably aren't guarantees of many
of them; but practically, you'd expect a good-quality general-purpose
implementation of a balanced tree to exhibit those characteristics.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
D

Dobieslaw Wroblewski

From understanding of the family of data structures that are implied by
the name TreeSet. As I said, there probably aren't guarantees of many
of them; but practically, you'd expect a good-quality general-purpose
implementation of a balanced tree to exhibit those characteristics.

Thanks for the answer... now I have another problem. Now I realised that I
need also iteration in the reverse order... I am sure that the internal
representation of LinkedHashMap can be iterated backwards (I saw the
source - there's a double linked list inside) but it seems that it is
impossible via the public methods :-(.

DW.
 
C

Chris Smith

Dobieslaw Wroblewski said:
Thanks for the answer... now I have another problem. Now I realised that I
need also iteration in the reverse order... I am sure that the internal
representation of LinkedHashMap can be iterated backwards (I saw the
source - there's a double linked list inside) but it seems that it is
impossible via the public methods :-(.

LinkedHashMap is should be treated as simply a convenience class. You
can build your own by encapsulating a HashMap and a LinkedList in the
same object. If you do so, you can traverse in the reverse direction by
calling listIterator(int), and then using previous() and hasPrevious()
instead of next() and hasNext().

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
D

Dimitri Maziuk

Chris Smith sez:
Where do you get that from? You certainly didn't get it from the API
specification, which is the only reasonable basis for such a statement.

From Algorithms and Data Structures 101. There's a reason I wrote
"supposed to be" rather than "is" -- this is the bound for a hash
with a proper hash function. IRL YMMV of course.

Dima
 
C

Chris Smith

Dimitri Maziuk said:
Chris Smith sez:

From Algorithms and Data Structures 101. There's a reason I wrote
"supposed to be" rather than "is" -- this is the bound for a hash
with a proper hash function. IRL YMMV of course.

The Map interface doesn't imply hashing, though. Implementations like
TreeMap (based on a red-black tree) have different efficiencies. It is
true that key-based access for a HashMap should be O(1); but it is *not*
true in general that key-based access for a Map should be O(1).

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,042
Latest member
icassiem

Latest Threads

Top