String intern() question

Discussion in 'Java' started by higgledy@gmail.com, Mar 22, 2006.

  1. Guest

    If anyone can help me with this, I'd appreciate it.

    String t = "cat";
    String s = "dog";

    From the API:

    Returns a canonical representation for the string object.

    It follows that for any two strings s and t, s.intern() == t.intern()
    is true if and only if s.equals(t) is true.

    What is a canonical representation for the string object? I realize
    that canonical means the simplest form of String.

    Why would someone use intern() if equals() does the same thing?
     
    , Mar 22, 2006
    #1

  2. James McGill Guest

    On Tue, 2006-03-21 at 18:49 -0800, wrote:
    > If anyone can help me with this, I'd appreciate it.
    >
    > String t = "cat";
    > String s = "dog";


    intern()-ing won't matter until you have other strings that are either
    "cat" or "dog".

    > Why would someone use intern()? If equals() does the same thing?


    They don't do the same thing. equals() is the condition that intern()
    relies on: two strings intern to the same object exactly when they are
    equals().

    intern() makes duplicate strings into the same reference. It's a
    benefit of the assurance that strings are immutable. Do you see why
    it's possible?

    In theory, equals() is faster on strings that have been interned,
    because it can return as soon as it sees that both references are the
    same object. Also, you can use much less storage if your data happens
    to have a lot of duplicate strings.
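
    A minimal sketch of the difference, assuming the strings are built at
    run time rather than written as literals. The class and method names
    (InternDemo, buildAtRuntime) are made up for illustration:

```java
public class InternDemo {
    // Builds "cat" from a char array, so each call yields a brand-new String object.
    static String buildAtRuntime() {
        return new String(new char[] { 'c', 'a', 't' });
    }

    public static void main(String[] args) {
        String a = buildAtRuntime();
        String b = buildAtRuntime();

        System.out.println(a.equals(b));               // true: same characters
        System.out.println(a == b);                    // false: two distinct objects
        System.out.println(a.intern() == b.intern());  // true: one canonical object
    }
}
```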
     
    James McGill, Mar 22, 2006
    #2

  3. Roedy Green Guest

    On 21 Mar 2006 18:49:42 -0800, wrote, quoted or
    indirectly quoted someone who said :

    >What is a canonical representation for the string object. I realize
    >that canonical means the simplest form of String.


    If you use String.intern, Java does a lookup of the string in its
    intern pool. If it is already there, it returns you the address of
    the official one; if not, it adds your string to the pool and
    returns that.

    The advantages of interning are:

    1. no duplicate strings.
    2. you can compare quickly with == rather than equals.

    The disadvantages are:
    1. time to intern
    2. possibly strings don't get garbage collected. I'm not sure what the
    latest scoop on that is.
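
    The lookup-or-add behaviour described above can be sketched with an
    ordinary map. This is only an illustration: SimplePool is a hypothetical
    stand-in, not how the JVM implements its pool, and it is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

public class SimplePool {
    // Hypothetical stand-in for the intern pool; maps each value to its "official" copy.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the official copy of s, adding s to the pool first if it is new.
    public String intern(String s) {
        String official = pool.get(s);
        if (official == null) {
            pool.put(s, s);
            official = s;
        }
        return official;
    }

    public static void main(String[] args) {
        SimplePool p = new SimplePool();
        String first = new String("dog");
        String second = new String("dog");

        System.out.println(p.intern(first) == first);   // true: first becomes the official copy
        System.out.println(p.intern(second) == first);  // true: the duplicate maps to the same copy
    }
}
```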

    see http://mindprod.com/jgloss/interned.html
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Mar 22, 2006
    #3
  4. Chris Uppal Guest

    wrote:

    > > From the API:

    > Returns a canonical representation for the string object.
    >
    > It follows that for any two strings s and t, s.intern() == t.intern()
    > is true if and only if s.equals(t) is true.
    >
    > What is a canonical representation for the string object. I realize
    > that canonical means the simplest form of String.


    Not quite. In this context the word canonical means the /single/ special
    object which represents that value.

    It's simpler if we forget about Strings for a second. Say we have a class
    Point which represents (x, y) pairs. Now we can clearly have two different
    Point instances which represent the point (1, 2) -- they are equal but
    distinct. And that might not be a problem at all. But suppose that it /is/ a
    problem for some reason (we might want to save space, or we might want to use
    the clearer and faster == to compare Points rather than calling equals() all
    the time). That's when we'd want to use canonical representations.

    We need to ensure that each point is represented by no more than one instance
    of Point. To do that we have to keep a registry somewhere of existing Points,
    and when someone needs a new one, instead of just creating an instance, they
    have to ask the registry for "the" instance corresponding to, say, (22, 55).
    In this case, all Point instances are canonical -- because we don't allow any
    other way of "getting hold" of an instance.
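
    One way such a registry might look, as a rough sketch. The class name
    CanonicalPoint, the factory method of(), and the packed long key are all
    assumptions made for this example:

```java
import java.util.HashMap;
import java.util.Map;

public final class CanonicalPoint {
    // The registry of existing points, keyed by (x, y) packed into one long.
    private static final Map<Long, CanonicalPoint> REGISTRY = new HashMap<>();

    public final int x;
    public final int y;

    // Private constructor: the registry is the only way to get hold of an instance.
    private CanonicalPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Returns "the" instance for (x, y), creating it on first request.
    public static synchronized CanonicalPoint of(int x, int y) {
        long key = ((long) x << 32) | (y & 0xFFFFFFFFL);
        return REGISTRY.computeIfAbsent(key, k -> new CanonicalPoint(x, y));
    }

    public static void main(String[] args) {
        System.out.println(CanonicalPoint.of(22, 55) == CanonicalPoint.of(22, 55)); // true
        System.out.println(CanonicalPoint.of(22, 55) == CanonicalPoint.of(55, 22)); // false
    }
}
```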

    We might relax that, and instead of insisting that /all/ Points be unique, we
    allow client code to create any Points that it likes, but for there to be a way
    to convert any old Point into the single "special" Point that is held in our
    registry for that (x,y) pair. That operation is called interning. By
    separating it from the logic for creating points we avoid the potentially
    expensive bottleneck of a lookup in our registry whenever a Point is created.
    In this picture we would allow:
    Point p1 = new Point(1, 2);
    Point p2 = new Point(1, 2);
    and p1 and p2 would be distinct objects. p1 != p2 even though p1.equals(p2).
    Now if client code wanted to work with canonical Points instead (for efficiency
    or for clarity), then it would say:
    p1 = p1.intern();
    p2 = p2.intern();
    and then it is guaranteed that p1 == p2.
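
    A sketch of what such a Point class could look like. The pool is a plain
    static map here, which is an assumption of this example; the point is only
    that intern() needs equals() and hashCode() to find the existing instance.

```java
import java.util.HashMap;
import java.util.Map;

public final class Point {
    // Pool of canonical instances; each canonical Point maps to itself.
    private static final Map<Point, Point> POOL = new HashMap<>();

    private final int x;
    private final int y;

    // Public constructor: client code may create as many equal Points as it likes.
    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    // Converts any Point into the single canonical Point for its (x, y) pair.
    public Point intern() {
        synchronized (POOL) {
            Point existing = POOL.putIfAbsent(this, this);
            return existing == null ? this : existing;
        }
    }

    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);
        System.out.println(p1 == p2);       // false: distinct objects
        System.out.println(p1.equals(p2));  // true: same value

        p1 = p1.intern();
        p2 = p2.intern();
        System.out.println(p1 == p2);       // true: both are the canonical instance
    }
}
```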

    All that would make more sense if Points were big objects that were expensive
    in space, and which took a long time to compare with equals(). You can imagine
    that we are dealing with millions of Points in 27-dimensional space, with
    coordinates represented as thousand-digit BigIntegers, if you like ;-)

    Anyway, getting back to Strings. The picture there is almost exactly the same.
    When you create a new String, it is not guaranteed to be == to other Strings
    with the same value. Normally that is not a problem, but in some programs it's
    a /big/ problem. So the class library provides a way to reduce a String to the
    single special instance representing a given character sequence.

    The reason I took a digression via Points is that there's an additional twist
    in the tale when it comes to Strings. The Java language guarantees that any
    String instance which is the value of a string literal in the source program,
    will already have been interned. So if you write:
    String s1 = "hello";
    String s2 = "hello";
    String tmp = new String(new char[] { 'h', 'e', 'l', 'l', 'o' } );
    String s3 = tmp.intern();
    then all three variables, s1, s2, and s3, will end up referring to the same
    String instance -- to the canonical instance representing 'hello'.
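
    The same example as a runnable class, with the comparisons printed. The
    class name LiteralInternDemo and the helper fromChars() are made up for
    this sketch:

```java
public class LiteralInternDemo {
    // Builds "hello" at run time; the result is a fresh object, not the literal.
    static String fromChars() {
        return new String(new char[] { 'h', 'e', 'l', 'l', 'o' });
    }

    public static void main(String[] args) {
        String s1 = "hello";
        String s2 = "hello";
        String tmp = fromChars();
        String s3 = tmp.intern();

        System.out.println(s1 == s2);        // true: both literals are the interned instance
        System.out.println(s1 == s3);        // true: intern() returns that same instance
        System.out.println(tmp == s1);       // false: tmp itself is a separate object
        System.out.println(tmp.equals(s1));  // true: same characters
    }
}
```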

    -- chris
     
    Chris Uppal, Mar 22, 2006
    #4
  5. Guest

    Thank you for all the responses. Now I understand canonical and intern().
    I REALLY appreciate everyone helping me.
     
    , Mar 22, 2006
    #5
  6. Dimitri Maziuk Guest

    Chris Uppal sez:
    ....
    > The reason I took a digression via Points is that there's an additional twist
    > in the tale when it comes to Strings. The Java language guarantees that any
    > String instance which is the value of a string literal in the source program,
    > will already have been interned.


    There is another one: not only does it eliminate duplicate strings,
    it can also deal with substrings of already interned strings (presumably
    by storing offset & length). The resulting magic appears to be tuned
    for the case of "mostly unique" strings, i.e. interning duplicate
    strings does not save as much space as you'd expect. Or at least it
    didn't back when I played with it.

    Dima
    --
    Sufficiently advanced incompetence is indistinguishable from malice.
     
    Dimitri Maziuk, Mar 22, 2006
    #6