Immutable Datastructures with good Sharing

Jan Burse · Nov 6, 2011

Jan said:
Here is some explanation how an immutable set would work,
based on some tree where cloning is only performed along
the insert or delete path:

http://stackoverflow.com/questions/3233473/immutable-data-structures-performance

Could I derive a queue from this?

Bye

Ah, this is already better. Immutable Vectors with Sharing!
(Seems that another terminology for Immutable+Sharing is
persistent, but I would get a conflict with DB terminology)

http://www.codecommit.com/blog/scala/more-persistent-vectors-performance-analysis

I guess I will manage to go from a Vector to a Queue.

Bye

Giovanni Azua · Nov 7, 2011

I guess it requires JDK 1.7 or so for the parameter
type inference. Not sure, picked something up like
that...

Nop, it doesn't need 1.7. I can confirm it works just fine with JDK 1.6.x

Giovanni Azua · Nov 7, 2011

Hi Lew,

I'm not saying anything against Guava, but I fail to see the advantage of that
particular idiom. 'Lists.newArrayList' vs. 'new ArrayList<>' - eh, mezza
mezz'.

I don't blame you. I had the same resistant first reaction when I was
proposed to use Guava. Then something magical happened

I realized that
when you use new with Collections, generally you have to tell the compiler
two times what you want namely the generic parameter. I think that having to
do the same thing two times in development is always a bad thing: two times
harder to maintain, two times the amount of possible mistakes and in short,
two times the amount of work.

// two times having to type RequestData ... no big deal
List<RequestData> requests = new ArrayList<RequestData>();

// making the problem more explicit ...
List<Map<TpchWorkload,Map<String,List<Object>>>> parameters = new
ArrayList<Map<TpchWorkload,Map<String,List<Object>>>>();

// simply beautiful
List<Map<TpchWorkload,Map<String,List<Object>>>> parameters =
Lists.newArrayList();

Now tell me, how much time have you spent in your life fixing the small
mistakes that originate from getting the LHS out of sync with the RHS? This
alone is one of those small details that make your daily development work
enjoyable rather than painful.

Guava rocks!

Arne Vajhøj · Nov 7, 2011

I don't blame you. I had the same resistant first reaction when I was
proposed to use Guava. Then something magical happened I realized that
when you use new with Collections, generally you have to tell the compiler
two times what you want namely the generic parameter. I think that having to
do the same thing two times in development is always a bad thing: two times
harder to maintain, two times the amount of possible mistakes and in short,
two times the amount of work.

// two times having to type RequestData ... no big deal
List<RequestData> requests = new ArrayList<RequestData>();

// making the problem more explicit ...
List<Map<TpchWorkload,Map<String,List<Object>>>> parameters = new
ArrayList<Map<TpchWorkload,Map<String,List<Object>>>>();

// simply beautiful
List<Map<TpchWorkload,Map<String,List<Object>>>> parameters =
Lists.newArrayList();

Now tell me, how much time have you spent in your life fixing the small
mistakes that originate from getting the LHS out of sync with the RHS? This
alone is one of those small details that make your daily development work
enjoyable rather than painful.

Guava rocks!

That particular argument is somewhat obsolete with Java 1.7.

And I am not so sure that avoiding typing is a good goal in
programming, but that is a different discussion.

Arne

Arved Sandstrom · Nov 7, 2011

Ah, this is already better. Immutable Vectors with Sharing!
(Seems that another terminology for Immutable+Sharing is
persistent, but I would get a conflict with DB terminology)

Well, a few of us have already mentioned "persistence" in this thread,
meaning persistence in the data structure sense. The definition features
right at the beginning of one of Okasaki's main initial sections.

Don't take this the wrong way, but I find it disconcerting when a poster
with questions confesses that they're not reading a bunch of the
answers. And how could you possibly have gotten an impression of
Okasaki's work if you missed this particular definition?

Just sayin'.

And, no, "persistence" != Immutable + Sharing.

[ SNIP ]

AHS

Tom Anderson · Nov 7, 2011

Somebody here, who could throw a spot light on how they do a Queue?

I can tell you how i do an O(1), well-sharing queue:

public final class ImmutableQueue<E> {

private static class Link<E> {

public final E element;
public final Link<E> next;

public Link(E element, Link<E> next) {
this.element = element;
this.next = next;
}

}

private final Link<E> head;
private final Link<E> tail;
private final Link<Link<E>> tuft;

private ImmutableQueue(Link<E> head, Link<E> tail, Link<Link<E>> tuft) {
assert (tuft == null) || (tuft.element.next == tail);
this.head = head;
this.tail = tail;
this.tuft = tuft;
}

private ImmutableQueue(Link<E> element) {
this(element, element, null);
}

public ImmutableQueue() {
this(null);
}

public boolean isEmpty() {
return head == null;
}

public int size() {
if (isEmpty()) return 0;
int size = 1;
Link<E> link = head;
while (link != tail) {
++size;
link = link.next;
}
return size;
}

public ImmutableQueue<E> enqueue(E element) {
Link<E> enqueued = new Link<E>(element, head);
if (isEmpty()) return new ImmutableQueue<E>(enqueued);
else return new ImmutableQueue<E>(enqueued, tail, null);
}

public ImmutableQueue<E> dequeue(E[] holder) {
ImmutableQueue<E> queue;

if (head == null) {
throw new NoSuchElementException();
}
else if (head == tail) {
queue = new ImmutableQueue<E>();
}
else {
Link<Link<E>> tuft = this.tuft;
if (tuft == null) {
tuft = backcomb();
assert tuft.element.next == tail;
}
queue = new ImmutableQueue<E>(head, tuft.element, tuft.next);
}

holder[0] = tail.element;
return queue;
}

private Link<Link<E>> backcomb() {
Link<Link<E>> tuft = null;
Link<E> link = head;
while (link != tail) {
tuft = new Link<Link<E>>(link, tuft);
link = link.next;
}
return tuft;
}

}

I'm sure it's obvious what's going on here.

But if not: the core of it is a normal immutable linked list, growing at
the enqueue end. Dequeue is handled by tracking a tail pointer, and
pretending that links past the tail don't exist (rather than treating a
null next pointer as defining the end as usual). There is then an
antiparallel linked list (rooted in 'tuft' - extending the 'tail' metaphor
a bit), starting just before the tail and running back to the head; this
list essentially turns the list into a doubly-linked list, and so allows
efficient dequeues.

Keeping that list accurate at all times would be expensive, so instead, it
is constructed when needed, ie when we need to dequeue and we don't have
it. Note that when it is constructed, it reaches all the way back to the
head, but as more elements are enqueued, the head moves away from its
tail, leaving it only partially covering the queue. That's fine: we only
use it to find the link before the tail, so we don't need it to go far.
When we've dequeued enough elements to have worn it down completely, we
rebuild it.

Enqueueing is obviously O(1). Dequeueing is O(1) if you don't need to
rebuild the backwards list. If you do need to rebuild the backwards list,
it's O(n); you will then not need to rebuild for another n dequeues, which
makes it amortatively O(1).

In terms of sharing, all the forward links are shared. Backwards links
will be shared too, but if two separate lineages of queues both have to
rebuild the backwards list, the links will not be shared, even though
they are identical. Consider:

ImmutableQueue noah; // has one backward link remaining
ImmutableQueue shem = noah.dequeue(holder);
ImmutableQueue ham = noah.dequeue(holder);
// shem and ham are identical
ImmutableQueue elam = shem.dequeue(holder);
ImmutableQueue cush = ham.dequeue(holder);
// elam and cush are identical, but have separate backward lists

In terms of packratting, backwards lists become garbage, but the links in
forward lists never do. You could always walk from any head right back yea
even unto the first tail. I think that's hard to avoid in an immutable
structure.

Also, i think i've just grokked what that C# dude's half-backwards list is
all about. It's like this, but throwing away the forwards list when you
make the backwards list, which saves memory and lets forward list links
become garbage. Clever stuff.

tom

Giovanni Azua · Nov 7, 2011

If you had read the whole thread, or indeed just the OP's initial post,
you would understand that these classes are not helpful. They are

Indeed it was a lucky shot but read on ...

To recap, Jan wants to be able to say:

ImmutableStack<String> a = new ImmutableStack<String>(); // empty
ImmutableStack<String> b = a.push("foo");
assert a.isEmpty();
assert b.size() == 1;

That is, the quasi-mutation creates a new instance exhibiting the change,
rather than changing an existing instance.

The Guava Immutable* classes do not have such methods.

Would this check the box?

ImmutableMap<String, Integer> a = ImmutableMap.of("A", 1, "B", 2, "C", 3);
ImmutableMap<String, Integer> b =
new ImmutableMap.Builder<String, Integer>().
putAll(a).put("D", 4).build();

I see absolutely no improvement in the code here, only unhelpful
obfuscation.

When you say it like this, any function can be labeled as "unhelpful
obfuscation"

Giovanni

Giovanni Azua · Nov 7, 2011

That particular argument is somewhat obsolete with Java 1.7.

I haven't looked into Java 1.7 yet, enlighten me please?

And I am not so sure that avoiding typing is a good goal in
programming, but that is a different discussion.

I was referring to doing/specifying the same thing twice which is not the
same thing as avoiding typing.

The whole point is not avoid typing but:
1) avoid specifying the same thing twice
2) avoid making mistakes
3) while at it, having the code a lot more readable.
4) easier to maintain e.g. changing N declarations once rather than 2*N

Arne Vajhøj · Nov 7, 2011

I haven't looked into Java 1.7 yet, enlighten me please?

Java 1.7 support:

List<String> list = new ArrayList<>();
Map<String,String> map = new HashMap<>();

Arne

Lew · Nov 7, 2011

Tom said:
I can tell you how i do an O(1), well-sharing queue:

public final class ImmutableQueue<E> {

private static class Link<E> {

Side note: One might observe that this 'E' does not correspond to the 'E' of the outer class's generic parameter. One might also observe that this does no harm whatsoever.

Side note 2: TABs? Really?

public final E element;
public final Link<E> next;

public Link(E element, Link<E> next) {
this.element = element;
this.next = next;
}

}

private final Link<E> head;
private final Link<E> tail;
private final Link<Link<E>> tuft;

private ImmutableQueue(Link<E> head, Link<E> tail, Link<Link<E>> tuft) {
assert (tuft == null) || (tuft.element.next == tail);
this.head = head;
this.tail = tail;
this.tuft = tuft;
}

private ImmutableQueue(Link<E> element) {
this(element, element, null);
}

public ImmutableQueue() {
this(null);
}

public boolean isEmpty() {
return head == null;
}

public int size() {
if (isEmpty()) return 0;
int size = 1;
Link<E> link = head;
while (link != tail) {
++size;
link = link.next;
}

Alternative form if you like for loops:

for (Link<E> link = head; link != tail; link = link.next) {
++size;
}

return size;
}

public ImmutableQueue<E> enqueue(E element) {
Link<E> enqueued = new Link<E>(element, head);
if (isEmpty()) return new ImmutableQueue<E>(enqueued);
else return new ImmutableQueue<E>(enqueued, tail, null);
}

public ImmutableQueue<E> dequeue(E[] holder) {
ImmutableQueue<E> queue;

if (head == null) {
throw new NoSuchElementException();
}
else if (head == tail) {
queue = new ImmutableQueue<E>();
}
else {
Link<Link<E>> tuft = this.tuft;
if (tuft == null) {
tuft = backcomb();
assert tuft.element.next == tail;
}
queue = new ImmutableQueue<E>(head, tuft.element, tuft.next);
}

holder[0] = tail.element;
return queue;
}

private Link<Link<E>> backcomb() {
Link<Link<E>> tuft = null;
Link<E> link = head;
while (link != tail) {
tuft = new Link<Link<E>>(link, tuft);
link = link.next;
}
return tuft;
}

}

I'm sure it's obvious what's going on here.

But if not: the core of it is a normal immutable linked list, growing at
the enqueue end. Dequeue is handled by tracking a tail pointer, and
pretending that links past the tail don't exist (rather than treating a
null next pointer as defining the end as usual). There is then an
antiparallel linked list (rooted in 'tuft' - extending the 'tail' metaphor
a bit), starting just before the tail and running back to the head; this
list essentially turns the list into a doubly-linked list, and so allows
efficient dequeues.

Keeping that list accurate at all times would be expensive, so instead, it
is constructed when needed, ie when we need to dequeue and we don't have
it. Note that when it is constructed, it reaches all the way back to the
head, but as more elements are enqueued, the head moves away from its
tail, leaving it only partially covering the queue. That's fine: we only
use it to find the link before the tail, so we don't need it to go far.
When we've dequeued enough elements to have worn it down completely, we
rebuild it.

Enqueueing is obviously O(1). Dequeueing is O(1) if you don't need to
rebuild the backwards list. If you do need to rebuild the backwards list,
it's O(n); you will then not need to rebuild for another n dequeues, which
makes it amortatively O(1).

In terms of sharing, all the forward links are shared. Backwards links
will be shared too, but if two separate lineages of queues both have to
rebuild the backwards list, the links will not be shared, even though
they are identical. Consider:

ImmutableQueue noah; // has one backward link remaining
ImmutableQueue shem = noah.dequeue(holder);
ImmutableQueue ham = noah.dequeue(holder);
// shem and ham are identical
ImmutableQueue elam = shem.dequeue(holder);
ImmutableQueue cush = ham.dequeue(holder);
// elam and cush are identical, but have separate backward lists

In terms of packratting, backwards lists become garbage, but the links in
forward lists never do. You could always walk from any head right back yea
even unto the first tail. I think that's hard to avoid in an immutable
structure.

Also, i think i've just grokked what that C# dude's half-backwards list is
all about. It's like this, but throwing away the forwards list when you
make the backwards list, which saves memory and lets forward list links
become garbage. Clever stuff.

It's a sweet piece of work. Thanks for sharing it.

Lew · Nov 7, 2011

Giovanni said:
I don't blame you. I had the same resistant first reaction when I was
proposed to use Guava. Then something magical happened I realized that
when you use new with Collections, generally you have to tell the compiler
two times what you want namely the generic parameter. I think that having to
do the same thing two times in development is always a bad thing: two times
harder to maintain, two times the amount of possible mistakes and in short,
two times the amount of work.

The diamond operator mitigates that, and the two characters of it are smaller than the overhead of pulling in a whole other class.

I think pulling in a whole utility class at runtime because you're too lazy to type two extra characters, or even a dozen, is silly. Roofers are laughing at your idea of too much work.

But that's just me.

Jan Burse · Nov 7, 2011

Arved said:
answers. And how could you possibly have gotten an impression of
Okasaki's work if you missed this particular definition?

Just sayin'.

And, no, "persistence" != Immutable + Sharing.

I missed it, what is persistence?

Jan Burse · Nov 7, 2011

Giovanni said:
Would this check the box?

ImmutableMap<String, Integer> a = ImmutableMap.of("A", 1, "B", 2, "C", 3);
ImmutableMap<String, Integer> b =
new ImmutableMap.Builder<String, Integer>().
putAll(a).put("D", 4).build();

I guess the putAll() results in cloning, therefore
no good sharing, therefore not a solution to the
problem at hand.

But I might be wrong.

Jan Burse · Nov 7, 2011

Jan said:
I guess the putAll() results in cloning, therefore
no good sharing, therefore not a solution to the
problem at hand.

But I might be wrong.

Yep it is of no use.

First we have behind newArrayList the following:

public static <E> ArrayList<E> newArrayList() {
return new ArrayList<E>();
}

http://code.google.com/p/google-col...runk/src/com/google/common/collect/Lists.java

Then we have behind put and putAll():

public static class Builder<K, V> {
final List<Entry<K, V>> entries = Lists.newArrayList();

public Builder<K, V> put(K key, V value) {
entries.add(entryOf(key, value));
return this;
}

public Builder<K, V> putAll(Map<? extends K, ? extends V> map) {
for (Entry<? extends K, ? extends V> entry : map.entrySet()) {
put(entry.getKey(), entry.getValue());
}
return this;
}

http://www.docjar.com/html/api/com/google/inject/internal/ImmutableMap.java.html

So the whole thing, in version 1.0, doesn't make
much sense, except that it has the label
Google on it.

Bye

Giovanni Azua · Nov 7, 2011

The diamond operator mitigates that, and the two characters of it are smaller
than the overhead of pulling in a whole other class.

Excellent, same idea but built in the language doesn't seem such a silly
concept after all.

I think pulling in a whole utility class at runtime because you're too lazy to
type two extra characters, or even a dozen, is silly. Roofers are laughing at
your idea of too much work.

When I start using 1.7 then the utility is not needed anymore but you failed
to realize the advantage and then you laugh? I laughed too but more about
your ignorance

Andreas Leitgeb · Nov 7, 2011

Giovanni Azua said:
Nop, it doesn't need 1.7. I can confirm it works just fine with JDK 1.6.x

Generic type inference has been done in two steps:
- for static methods in 1.6
- for instance creation (i.e. "new ...") in 1.7

If you're stuck to 1.6 now, there might be some temptation to resort
to tricks for gaining generic type inference even for the second, but
be aware, that this is not exactly a "future investment" for the
maintainability of your code.

Giovanni Azua · Nov 7, 2011

If you're stuck to 1.6 now, there might be some temptation to resort
to tricks for gaining generic type inference even for the second, but
be aware, that this is not exactly a "future investment" for the
maintainability of your code.

Am I stuck to 1.6? static methods with generics in Java 1.6 is a "trick" and
obsolete in 1.7?

Andreas Leitgeb · Nov 7, 2011

Giovanni Azua said:
Am I stuck to 1.6?

I'd expect you to know. ;-) (surely, "if X ..." doesn't mean to imply X)
If you're not, then my advice would be to use 1.7 and use

new Foobar said:
static methods with generics in Java 1.6 is a "trick" and
obsolete in 1.7?

Static methods that do nothing but wrap a "new ..." for the
sole reason of saving retyping of generic parameters, that is.
Yes, I'd call those obsolete in 1.7 - "obsolete" in the same way
as an eventually self-written type-safe IntegerList back in 1.4 days
is meanwhile obsoleted by 1.5's generic List.

Silvio Bierman · Nov 7, 2011

I missed it, what is persistence?

Basically a persistent data structure is one that somehow retains its
previous value upon a modification.

Although not necessarily so this is usually realized in the form of a
data structure that implements modifications solely as operations that
return the modified version leaving the current version unchanged,
effectively resulting in an immutable data structure. For this to
perform well, both space and time wise, sharing is often necessary.

A singly linked list is probably the simplest example. As long as
existing nodes in the list are never modified (both its value as its
link to the next element) additions can only be done at the head of the
list and common tails can always be shared.

Perhaps looking into the standard libraries of Scala or Clojure would be
a good idea. Both provide immutable lists, trees, vectors, hashmaps etc.

Silvio

Daniel Pitts · Nov 7, 2011

Dear All,

Value objects are some well known immutable
datastructures, when an operation is applied,
typically simply a new object is created. So
for example doing:

3 = 1 + 2

With java.lang.Integer objects would involve
creating a new Integer with value left.intValue()
+ right.intValue().

But instead of call the constructor, one can also call
Integer.valueOf(), and there you have some sharing.
Namely the class Integer caches the value objects for
small values (at least the last time I looked into
the Oracle source, it was there...).

More elaborate sharing is seen in the String class
by the substring() operation. This operation does
reuse the original character array and only encapsulates
the offset and the length in a new object. I wouldnt
recomende that to get one character from a 1 MB
long string, since it will prevent the original
character array from GC. But in some situation this
is quite handy.

I am now looking for immutable datastructures with
a similar sharing. In the String class example, the
operation was substring() that produced a new immutable
object. I am looking for:

Some stack class, where: (Easy)
pop() creates a new immutable stack
push() creates a new immutable stack
With good sharing.

Some queue class, where: (Hard?)
enqueue() creates a new immutable queue
dequeue() creates a new immutable queue
With good sharing.

What is your favorite datastructure?

Bye

Linked list will definitely work well as a "shared" stack object,
because the "tail" of any given stack doesn't change with the operations.

As far as the queue goes, you run the risk of packratting if you share
the underlying data structure. AFAIK, there are three common ways to
implement a straight (as opposed to priority) queue.

1. Linked lists with a head and tail reference
2. Array list with a tail reference, and dequeue shifts all data.
3. Array list with a head and tail reference. Known as a ring deque.

With the array approaches, you need to clear release the head reference
on dequeue to avoid packratting. This happens automatically with the
linked list. This means they can't be "immutable" as dequeue must be
mutative.

For the linked list approach, enqueue is the problematic operation. A
new tail must be appended and the old tail pointer made to point to it.
That operation changes the "next" value in the old tail.

So, you can decide "share on enqueue, copy on dequeue" or "copy on
enqueue, share on dequeue". Based on requirements of your use-case, you
can choose the appropriate alternative. You can't have both, but if you
can get halfway there.

Strings...immutable?	125	Mar 18, 2007
Immutable Arrays?	5	Apr 20, 2008
Immutable cyclic graph with dependency injection	8	Jun 29, 2009
Immutable and Mutable Types	16	Mar 17, 2008
Immutable object thread-safety	4	Oct 26, 2008
If String is immutable, then why does this work?	7	May 5, 2006
Immutable Geometry Types	2	Dec 17, 2007
Address of an immutable object	6	May 30, 2010

Immutable Datastructures with good Sharing

Jan Burse

Giovanni Azua

Giovanni Azua

Arne Vajhøj

Arved Sandstrom

Tom Anderson

Giovanni Azua

Giovanni Azua

Arne Vajhøj

Lew

Lew

Jan Burse

Jan Burse

Jan Burse

Giovanni Azua

Andreas Leitgeb

Giovanni Azua

Andreas Leitgeb

Silvio Bierman

Daniel Pitts

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads