Returns a canonical representation for the string object.
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
What is a canonical representation for the string object? I understand
that canonical means the simplest form of String.
Not quite. In this context the word canonical means the /single/ special
object which represents that value.
It's simpler if we forget about Strings for a second. Say we have a class
Point which represents (x, y) pairs. Now we can clearly have two different
Point instances which represent the point (1, 2) -- they are equal but
distinct. And that might not be a problem at all. But suppose that it /is/ a
problem for some reason (we might want to save space, or we might want to use
the clearer and faster == to compare Points rather than calling equals() all
the time). That's when we'd want to use canonical representations.
We need to ensure that each point is represented by no more than one instance
of Point. To do that we have to keep a registry somewhere of existing Points,
and when someone needs a new one, instead of just creating an instance, they
have to ask the registry for "the" instance corresponding to, say, (22, 55).
In this case, all Point instances are canonical -- because we don't allow any
other way of "getting hold" of an instance.
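
As a rough sketch of that "registry only" design (the class name UniquePoint,
the factory method of(), and the packed long key are my own inventions for
illustration, not anything from the class library), it might look like this:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: every instance is canonical because the constructor
// is private and the registry is the only way to get hold of one.
final class UniquePoint {
    // Registry of existing points, keyed by x and y packed into one long.
    private static final Map<Long, UniquePoint> REGISTRY = new HashMap<>();

    final int x;
    final int y;

    private UniquePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Ask the registry for "the" instance corresponding to (x, y),
    // creating it only if it does not exist yet.
    static synchronized UniquePoint of(int x, int y) {
        long key = ((long) x << 32) | (y & 0xFFFFFFFFL);
        return REGISTRY.computeIfAbsent(key, k -> new UniquePoint(x, y));
    }
}
```

With this, UniquePoint.of(22, 55) == UniquePoint.of(22, 55) always holds, at
the cost of a registry lookup on every request. Note that this simple registry
never forgets a point, so it only suits cases where that is acceptable.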
We might relax that: instead of insisting that /all/ Points be unique, we
allow client code to create any Points that it likes, but provide a way to
convert any old Point into the single "special" Point that is held in our
registry for that (x,y) pair. That operation is called interning. By
separating it from the logic for creating points we avoid the potentially
expensive bottleneck of a lookup in our registry whenever a Point is created.
In this picture we would allow:
Point p1 = new Point(1, 2);
Point p2 = new Point(1, 2);
and p1 and p2 would be distinct objects. p1 != p2 even though p1.equals(p2).
Now if client code wanted to work with canonical Points instead (for efficiency
or for clarity), then it would say:
p1 = p1.intern();
p2 = p2.intern();
and then it is guaranteed that p1 == p2.
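
A minimal sketch of such a Point, assuming int coordinates and a simple
HashMap-based registry (both are assumptions of mine for illustration; a real
implementation would probably want weak references so that unused Points can
be garbage collected), could be:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Point class: anyone may create instances, and intern()
// maps any instance to the single canonical one for its (x, y) pair.
final class Point {
    // Registry mapping each distinct value to its canonical instance.
    private static final Map<Point, Point> REGISTRY = new HashMap<>();

    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    // Returns the canonical instance equal to this one. The first Point
    // interned for a given value becomes the canonical instance.
    Point intern() {
        synchronized (REGISTRY) {
            Point existing = REGISTRY.putIfAbsent(this, this);
            return existing != null ? existing : this;
        }
    }
}
```

The registry lookup relies on equals() and hashCode(), so the expensive
comparison happens once, at interning time, and every later comparison between
interned Points can use ==.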
All that would make more sense if Points were big objects that were expensive
in space, and which took a long time to compare with equals(). You can imagine
that we are dealing with millions of Points in 27-dimensional space, with
coordinates represented as thousand-digit BigIntegers, if you like ;-)
Anyway, getting back to Strings. The picture there is almost exactly the same.
When you create a new String, it is not guaranteed to be == to other Strings
with the same value. Normally that is not a problem, but in some programs it's
a /big/ problem. So the class library provides a way to reduce a String to the
single special instance representing a given character sequence.
The reason I took a digression via Points is that there's an additional twist
in the tale when it comes to Strings. The Java language guarantees that any
String instance which is the value of a string literal in the source program
will already have been interned. So if you write:
String s1 = "hello";
String s2 = "hello";
String tmp = new String(new char[] { 'h', 'e', 'l', 'l', 'o' } );
String s3 = tmp.intern();
then all three variables, s1, s2, and s3, will end up referring to the same
String instance -- the canonical instance representing "hello". (tmp itself
is still a separate, non-canonical object: tmp != s1 even though
tmp.equals(s1).)
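
If you want to see that for yourself, here is a small test program built
around the same four lines (the class name InternDemo and the helper method
identityChecks() are just names I made up for the demonstration):

```java
import java.util.Arrays;

final class InternDemo {
    // Returns { s1 == s2, tmp == s1, s3 == s1 }.
    static boolean[] identityChecks() {
        String s1 = "hello";
        String s2 = "hello";
        String tmp = new String(new char[] { 'h', 'e', 'l', 'l', 'o' });
        String s3 = tmp.intern();
        return new boolean[] { s1 == s2, tmp == s1, s3 == s1 };
    }

    public static void main(String[] args) {
        // Prints [true, false, true]
        System.out.println(Arrays.toString(identityChecks()));
    }
}
```

The middle "false" is the interesting one: tmp has the same characters as the
literal but is a different object until you replace it with the result of
intern().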
-- chris