String intern() question

higgledy

If anyone can help me with this, I'd appreciate it.

String t = "cat";
String s = "dog";
From the API:
Returns a canonical representation for the string object.

It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.

What is a canonical representation for the string object? I realize
that canonical means the simplest form of String.

Why would someone use intern() if equals() does the same thing?
 
James McGill

If anyone can help me with this, I'd appreciate it.

String t = "cat";
String s = "dog";

intern()-ing won't matter until you have other strings that are either
"cat" or "dog".
Why would someone use intern() if equals() does the same thing?

They don't do the same thing. equals() compares the contents of two
strings; intern() uses that comparison to decide which strings should
share a single instance.

intern() makes duplicate strings into the same reference. It's a
benefit of the assurance that strings are immutable. Do you see why
it's possible?

In theory, equals() is faster on strings that have been interned,
because String.equals() checks for reference identity before it compares
characters. Also, you can use much less storage if your data happens to
have a lot of duplicate strings.
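
A minimal sketch of the storage point, for illustration only. The class and method names (InternDedupDemo, makeDuplicates, internAll, distinctInstances) are made up for this example. It builds many separate String objects holding "cat" and "dog", then counts distinct objects by reference before and after interning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternDedupDemo {
    // Counts distinct String objects by reference (==), not by value.
    static int distinctInstances(List<String> strings) {
        Set<String> byIdentity = Collections.newSetFromMap(new IdentityHashMap<>());
        byIdentity.addAll(strings);
        return byIdentity.size();
    }

    // Builds `copies` separate objects per value, as if they had been read from input.
    static List<String> makeDuplicates(int copies) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            out.add(new String("cat")); // new String(...) always creates a fresh object
            out.add(new String("dog"));
        }
        return out;
    }

    // Replaces every element with its canonical (interned) instance.
    static List<String> internAll(List<String> strings) {
        List<String> out = new ArrayList<>(strings.size());
        for (String s : strings) {
            out.add(s.intern());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = makeDuplicates(1000);
        System.out.println(distinctInstances(raw));            // 2000 separate objects
        System.out.println(distinctInstances(internAll(raw))); // 2 canonical objects
    }
}
```

Once the raw list is dropped, only the two canonical instances remain reachable, which is where the saving comes from.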
 
Roedy Green

What is a canonical representation for the string object? I realize
that canonical means the simplest form of String.

If you use String.intern, Java looks up the string in its intern pool.
If it is already there, it returns you the reference to the official
one; if not, it adds your string to the pool and returns that.

The advantages of interning are:

1. no duplicate strings.
2. you can compare quickly with == rather than equals.

The disadvantages are:
1. time to intern
2. possibly strings don't get garbage collected. I'm not sure what the
latest scoop on that is.

see http://mindprod.com/jgloss/interned.html
 
Chris Uppal

Returns a canonical representation for the string object.

It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.

What is a canonical representation for the string object? I realize
that canonical means the simplest form of String.

Not quite. In this context the word canonical means the /single/ special
object which represents that value.

It's simpler if we forget about Strings for a second. Say we have a class
Point which represents (x, y) pairs. Now we can clearly have two different
Point instances which represent the point (1, 2) -- they are equal but
distinct. And that might not be a problem at all. But suppose that it /is/ a
problem for some reason (we might want to save space, or we might want to use
the clearer and faster == to compare Points rather than calling equals() all
the time). That's when we'd want to use canonical representations.

We need to ensure that each point is represented by no more than one instance
of Point. To do that we have to keep a registry somewhere of existing Points,
and when someone needs a new one, instead of just creating an instance, they
have to ask the registry for "the" instance corresponding to, say, (22, 55).
In this case, all Point instances are canonical -- because we don't allow any
other way of "getting hold" of an instance.

We might relax that, and instead of insisting that /all/ Points be unique, we
allow client code to create any Points that it likes, but for there to be a way
to convert any old Point into the single "special" Point that is held in our
registry for that (x,y) pair. That operation is called interning. By
separating it from the logic for creating points we avoid the potentially
expensive bottleneck of a lookup in our registry whenever a Point is created.
In this picture we would allow:
Point p1 = new Point(1, 2);
Point p2 = new Point(1, 2);
and p1 and p2 would be distinct objects. p1 != p2 even though p1.equals(p2).
Now if client code wanted to work with canonical Points instead (for efficiency
or for clarity), then it would say:
p1 = p1.intern();
p2 = p2.intern();
and then it is guaranteed that p1 == p2.

All that would make more sense if Points were big objects that were expensive
in space, and which took a long time to compare with equals(). You can imagine
that we are dealing with millions of Points in 27-dimensional space, with
coordinates represented as thousand-digit BigIntegers, if you like ;-)
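
A rough sketch of the relaxed scheme described above, under stated assumptions: Point here is a hypothetical two-int value class, and the registry is a plain static map that holds its entries forever (a real one would probably want weak references so unused Points can be collected).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class Point {
    // Hypothetical registry: maps each value to its one canonical instance.
    private static final Map<Point, Point> REGISTRY = new ConcurrentHashMap<>();

    private final int x;
    private final int y;

    // Construction stays cheap: no registry lookup happens here.
    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Returns the canonical instance equal to this one, registering this
    // instance as canonical if none exists yet.
    Point intern() {
        Point existing = REGISTRY.putIfAbsent(this, this);
        return existing != null ? existing : this;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);
        System.out.println(p1 == p2);                   // false: distinct objects
        System.out.println(p1.equals(p2));              // true: same value
        System.out.println(p1.intern() == p2.intern()); // true: same canonical object
    }
}
```

The registry depends on Point being immutable; if x or y could change after interning, the canonical instance would no longer match its key.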

Anyway, getting back to Strings. The picture there is almost exactly the same.
When you create a new String, it is not guaranteed to be == to other Strings
with the same value. Normally that is not a problem, but in some programs it's
a /big/ problem. So the class library provides a way to reduce a String to the
single special instance representing a given character sequence.

The reason I took a digression via Points is that there's an additional twist
in the tale when it comes to Strings. The Java language guarantees that any
String instance which is the value of a string literal in the source program,
will already have been interned. So if you write:
String s1 = "hello";
String s2 = "hello";
String tmp = new String(new char[] { 'h', 'e', 'l', 'l', 'o' } );
String s3 = tmp.intern();
then all three variables, s1, s2, and s3, will end up referring to the same
String instance -- to the canonical instance representing 'hello'.
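
The snippet above can be wrapped into something runnable to check the claim. The class name LiteralInternDemo and the helper fromChars are invented for this sketch.

```java
class LiteralInternDemo {
    // Builds "hello" at run time, so the result is a fresh, non-interned object.
    static String fromChars() {
        return new String(new char[] { 'h', 'e', 'l', 'l', 'o' });
    }

    public static void main(String[] args) {
        String s1 = "hello";
        String s2 = "hello";
        String tmp = fromChars();
        String s3 = tmp.intern();

        System.out.println(s1 == s2);       // true: both literals are the interned instance
        System.out.println(s2 == s3);       // true: intern() returns that same instance
        System.out.println(tmp == s1);      // false: tmp is a separate object
        System.out.println(tmp.equals(s1)); // true: same character sequence
    }
}
```

Note that tmp itself is not changed by the call; only the value returned by intern() is the canonical instance.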

-- chris
 
higgledy

Thank you for all the responses. Now I understand canonical and intern(). I
REALLY appreciate everyone helping me.
 
Dimitri Maziuk

Chris Uppal sez:
....
The reason I took a digression via Points is that there's an additional twist
in the tale when it comes to Strings. The Java language guarantees that any
String instance which is the value of a string literal in the source program,
will already have been interned.

There is another one: not only does it eliminate duplicate strings,
it can deal with substrings of already interned strings (presumably
by storing offset & length). The resulting magic appears to be tuned
for the case of "mostly unique" strings, i.e. interning duplicate
strings does not save as much space as you'd expect. Or at least it
didn't back when I played with it.

Dima
 
