String reuse

Bob Rivers

Hi,

I was searching comp.lang.java and didn't find a "definitive"
answer about this. Looking in the Java Glossary, I found that:

"Strings are immutable. Therefore they can be reused indefinitely, and
they can be shared for many purposes. When you assign one String
variable to another, no copy is made. Even when you take a substring
there is no new String created. New Strings are created when: you
concatenate. you do reads...." (I didn't understand "you do reads".
Sorry for my English.)

Ok, there is no doubt that when I concatenate strings (+=) Java will
generate a new object. But if I only assign a new value to a String,
will a new String be created?

For example:

String str = "Hi";
str = "Hi Bob!";

In the example above, how many strings did I create?

TIA,

Bob - Brazil
 
Eric Sosman

Bob said:
"... New Strings are created when: you
concatenate. you do reads...." (I didn't understand "you do reads".
Sorry for my English.)

English is notoriously irregular and idiom-ridden,
and I sometimes wonder how anyone can learn it -- Americans
certainly do not! In this context "you do reads" is taken
to mean

you = you
do = perform
reads = input operations

For example:

String str = "Hi";
str = "Hi Bob!";

In the example above, how many strings did I create?

Two: one for "Hi" and one for "Hi Bob!". It is possible
that these two Strings share some storage, so that there is
only one 'H' and only one 'i' in the JVM's memory -- but
there are two distinct String objects nonetheless.

It is important to understand that str is not a String;
it is a *reference* to a String. Here is another piece of
code that uses two String objects, not four:

String s1 = "Hi";
String s2 = "Hi";
String s3 = "Hi";
String str = "Hi Bob!";

The three distinct references s1, s2, s3 all refer to the same
String object.
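A minimal sketch of Eric's point, assuming the standard behaviour that equal string literals resolve to the same interned instance (JLS 3.10.5). The class and method names are made up for illustration; == compares references, so it tests object identity rather than contents.

```java
public class LiteralSharing {
    // Three variables initialised from the same literal; returns true if
    // they all refer to one String object.
    static boolean sameLiteralIsShared() {
        String s1 = "Hi";
        String s2 = "Hi";
        String s3 = "Hi";
        return s1 == s2 && s2 == s3; // == compares references, not contents
    }

    public static void main(String[] args) {
        String str = "Hi Bob!";
        System.out.println(sameLiteralIsShared()); // true
        System.out.println("Hi" == str);           // false: a different literal, a different object
    }
}
```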
 
Matt SDF Smith

Java creates a String constant pool where all String literals live. So, in your
example, there would be two String literals living in the pool: "Hi" and "Hi
Bob!". Your program has one String instance, str, and it would start by
pointing at "Hi" but then is changed to point to "Hi Bob!". So, to answer the
question: There is one String instance and two String literals.

Why have a String literal pool? Often String literals end up being longer than
just a few bytes. Keeping a single copy saves heap space by not needlessly
duplicating the String literal. Also the program can be faster as it doesn't
need to copy literals when they are assigned.

For example:
import java.util.Arrays;
byte[] chars = new byte[10240];
Arrays.fill( chars, (byte) 'A' ); // cast needed: fill(byte[], byte)
String longString = new String( chars );
String anotherInstance = longString;
String yetAnotherInstance = anotherInstance;

This example creates a 10K String literal with 3 String instances pointing to
it. If each instance got its own copy of the literal then lots of memory would
be wasted.
 
Joona I Palaste

Bob Rivers said:
String str = "Hi";
str = "Hi Bob!";

In the example above, how many strings did I create?

Usually, two. In the general case, assigning a variable to refer to
another object has nothing whatsoever to do with the object the
variable previously referred to.
This is usually the case with Strings, too. If we modify your code
slightly:

String str = "Hi";
str = "Bob!";

then the answer I have given above applies fully. However, in your
original code, the string "Hi" is a substring of the string "Hi Bob!"
so the compiler might optimise the code to reuse a portion of the same
character storage. None of this optimisation has any effects visible to
your code, though - it happens entirely within the JVM.
Note also that when you assign a variable to refer to another object,
the previously referenced object becomes eligible for garbage collection
once no chain of references from any live thread reaches it any more.
 
Thomas Fritsch

Bob said:
String str = "Hi";
str = "Hi Bob!";

In the example above, how many strings did I create?

You created 2 string objects: "Hi" and "Hi Bob!". At the end the
variable str references the object "Hi". The object "Hi Bob!" is no
longer referenced anywhere (and hence it will be garbage-collected some
time later).
Note the subtle distinction between variables (referencing an object)
and the objects themselves.
 
Joona I Palaste

You created 2 string objects: "Hi" and "Hi Bob!". At the end the
variable str references the object "Hi". The object "Hi Bob!" is no
longer referenced anywhere (and hence it will be garbage-collected some
time later).

You got these two backwards.
 
Thomas Fritsch

Thomas said:
You created 2 string objects: "Hi" and "Hi Bob!". At the end the
variable str references the object "Hi". The object "Hi Bob!" is no
Stupid me! It is the other way round, of course!
 
Oscar kind

Bob Rivers said:
String str = "Hi";
str = "Hi Bob!";

In the example above, how many strings I created?

Two String objects, and one reference to a String.
 
Lasse Reichstein Nielsen

Matt SDF Smith said:
Java creates a String constant pool where all String literals live.
So, in your example, there would be two String literals living in
the pool: "Hi" and "Hi Bob!".

Correct. To be completely technical, String *literals* are a concept of Java
source code. When compiled, they generate entries in the class file's constant
pool, and code that obtains a String instance for such a constant.
Your program has one String instance, str, and it would start by
pointing at "Hi" but then is changed to point to "Hi Bob!".

"str" is not a String instance. It is a variable of type String.
It does point to different instances of the String class at different
points of the execution.
So, to answer the question: There is one String
instance and two String literals.

There are two String literals. A string literal is a source expression
of type String, so it evaluates to a String instance. Since the two
literals are different, the String instances are also different, so
there are *two* String instances (referred to in turn by *one* String
variable).
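To see Lasse's distinction between one variable and two instances, here is a small sketch (class and method names are hypothetical) that keeps a second reference to the first object before the assignment, so both objects can be inspected afterwards:

```java
public class OneVariableTwoStrings {
    // Returns {object str referred to first, object str refers to afterwards}.
    static String[] rebind() {
        String str = "Hi";
        String first = str;   // a second reference to the same "Hi" object
        str = "Hi Bob!";      // rebinds the variable; the "Hi" object itself is unchanged
        return new String[] { first, str };
    }

    public static void main(String[] args) {
        String[] r = rebind();
        System.out.println(r[0]);          // Hi
        System.out.println(r[1]);          // Hi Bob!
        System.out.println(r[0] == r[1]);  // false: two distinct String instances
    }
}
```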

/L
 
Will Hartung

Matt SDF Smith said:
Why have a String literal pool? Often String literals end up being longer than
just a few bytes. Keeping a single copy saves heap space by not needlessly
duplicating the String literal. Also the program can be faster as it doesn't
need to copy literals when they are assigned.


At runtime, there is a different behavior.

As the original post mentioned ("you do reads"), when you get data from a
file, a database, or wherever, all of the Strings that are created are brand
new, even if you've seen them before.

For example, if I have a file with 5 lines:
1234567890
1234567890
1234567890
1234567890
1234567890

and I call BufferedReader#readLine() for each line, I will get 5 new
Strings that, while all .equals(), each have their own storage and
therefore are not == (i.e. not the same object).

The String class, however, has a rarely used method named "intern", which
returns a canonical instance: for any two Strings s and t, s.intern() ==
t.intern() is true if and only if s.equals(t).

So, I could do this (given the contrived file above):
String s1, s2;

s1 = br.readLine();
s2 = br.readLine();

boolean f = s1 == s2; // False;

s1 = s1.intern();
s2 = s2.intern();

f = s1 == s2; // True.
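A runnable version of Will's example, with one assumption: a StringReader holding two identical lines stands in for the file, so no file is needed. The class name ReadAndIntern is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadAndIntern {
    // Returns {equals before intern, == before intern, == after intern}.
    static boolean[] compare() {
        // Assumption: a StringReader stands in for the file from the example.
        try (BufferedReader br = new BufferedReader(
                new StringReader("1234567890\n1234567890\n"))) {
            String s1 = br.readLine();
            String s2 = br.readLine();

            boolean equalContents = s1.equals(s2); // same characters
            boolean sameBefore = s1 == s2;         // each readLine() returns a new String

            s1 = s1.intern();
            s2 = s2.intern();
            boolean sameAfter = s1 == s2;          // both now refer to the pooled copy
            return new boolean[] { equalContents, sameBefore, sameAfter };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false true
    }
}
```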

This can come into play if you're loading, say, a large XML document. There
are a lot of common tags in an XML document, and if you take the effort to
intern the tag names, you'll save memory overall. If I load 100
<text>Some text</text> tags, then interning the tag names as they are read
makes the other 99 new "text" tag Strings eligible for garbage collection.
For all I know, the XML parsers DO do this (it's more important in a DOM
situation than with something like SAX), but it's a simple example where you
can easily benefit from taking this extra step, and if Strings were not
immutable, you'd not be able to take advantage of something like this.
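A rough sketch of the XML idea, assuming a hypothetical parser that builds a fresh String for every tag name it reads (simulated here with new String("text")). It counts distinct objects by identity with an IdentityHashMap-backed set, so the effect of intern() becomes visible:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternTagNames {
    // Counts distinct objects (by identity, not by equals()) in a list.
    static int distinctInstances(List<String> names) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        seen.addAll(names);
        return seen.size();
    }

    // Hypothetical stand-in for a parser: new String(...) mimics the fresh
    // String a parser would build from its input buffer for each tag name.
    static List<String> readTagNames(int count, boolean intern) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String name = new String("text");
            names.add(intern ? name.intern() : name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(readTagNames(100, false))); // 100
        System.out.println(distinctInstances(readTagNames(100, true)));  // 1
    }
}
```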

Regards,

Will Hartung
 
steph

On 19/10/2004 19:14, Matt SDF Smith wrote:
For example:
byte[] chars = new byte[10240];
Arrays.fill( chars, (byte) 'A' );
String longString = new String( chars );
String anotherInstance = longString;
String yetAnotherInstance = anotherInstance;

This example creates a 10K String literal with 3 String instances pointing to
it. If each instance got its own copy of the literal then lots of memory would
be wasted.

In my view, that's not really true.
This code will consume 20k of heap memory, because "new String(chars)" will
duplicate the array of characters.
Indeed, since you can do
byte[] chars = new byte[10240];
Arrays.fill( chars, (byte) 'A' );
String longString = new String( chars );
chars[334]='B';
and because Strings are immutable, the constructor has to copy the data.
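steph's point can be checked directly with a sketch that mutates the byte array after constructing the String. It assumes US-ASCII decoding via StandardCharsets (the original relies on the platform default charset); the class name is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringCopiesBytes {
    static String buildThenMutate() {
        byte[] chars = new byte[10240];
        Arrays.fill(chars, (byte) 'A');
        // The constructor decodes the bytes into the String's own storage.
        String longString = new String(chars, StandardCharsets.US_ASCII);
        chars[334] = 'B';   // changing the array afterwards...
        return longString;  // ...does not affect the String
    }

    public static void main(String[] args) {
        String s = buildThenMutate();
        System.out.println(s.length());    // 10240
        System.out.println(s.charAt(334)); // A
    }
}
```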
 
Matt SDF Smith

You are correct in stating that 20K will be used. 10K by the byte array and
10K by the Strings. However, the question being asked was how much memory was
being eaten by the Strings. The byte array was an example of creating a large
String. Also, as you pointed out, changing the byte array does not affect the
Strings in any way.


 
Tony Morris

Matt SDF Smith said:
Java creates a String constant pool where all String literals live.

This comment is potentially misleading.
The concept of a "String literal pool" is in fact a very abstract one, that
should be used only to understand how String compile-time constants (JLS 2e
15.28) are handled at runtime. That is, there is no 'actual pool'.

To demonstrate this point, consider the following:

class X
{
    void m()
    {
        {
            String s = "s";
        }

        {
            String s = "s";
        }
    }
}

How many String objects exist during the execution of m()?
Most of you will say 1, but this is not true.
Good luck.
 
Yogo

How many String objects exist during the execution of m()?
Most of you will say 1, but this is not true.

Hmm, I think there is only 1 String object. I tried the following and both
references point to the same String object ("One String object" is
displayed).

public class StringPool {
    public static void main (String s[]) {
        new X().m();
    }
}

class X {
    void m() {
        RefObject o1 = null, o2 = null;
        {
            String s = "s";
            o1 = new RefObject(s);
        }
        {
            String s = "s";
            o2 = new RefObject(s);
        }
        if (o1.equals(o2))
            System.out.println("One String object");
        else
            System.out.println("Two String objects");
    }
}

class RefObject {
    String ref;
    RefObject (String ref) {
        this.ref = ref;
    }
    // Compares the held references by identity (==), not by contents.
    // Note: this overloads, rather than overrides, Object.equals(Object).
    public boolean equals(RefObject obj) {
        return (this.ref == obj.ref);
    }
}


Yogo
 
Tony Morris

Yogo said:
Hmm, I think there is only 1 String object. I tried the following and both
references point to the same String object ("One String object" is
displayed).


Yes - in your case, you are right - there is only one instance (I only
skimmed it, but I noticed you held references).
However, note that your code and my code differ substantially.

Here's the catch: prove that my code can quite potentially use 2 String
instances.
Or better, prove that a String literal is not necessarily the same instance
throughout the execution of a VM (despite JLS 2e 3.10.5, which is ambiguous).

It can be done.
 
Lee Fesperman

Tony said:
This comment is potentially misleading.
The concept of a "String literal pool" is in fact a very abstract one, that
should be used only to understand how String compile-time constants (JLS 2e
15.28) are handled at runtime. That is, there is no 'actual pool'.

Why do you say that? java.lang.String#intern() states that it uses the pool (of unique
strings.)
 
John C. Bollinger

Lee said:
Why do you say that? java.lang.String#intern() states that it uses the pool (of unique
strings.)

The class String maintains a pool of unique Strings, to which the
String.intern() method may add a String when invoked. References to
compile-time constant Strings are guaranteed to refer to an interned
String. However, as I was recently made aware, otherwise unreferenced
String instances can be GC'd from the intern pool ("can" in the sense
that recent Sun JVMs can be observed to do so), so it is not in general
safe to assume that the String instance representing a particular
compile-time constant String at one point in your program's execution
is the same instance that represents the same or an equal String at some
other point in your program's execution. Whether this is a reasonable
behavior is subject to debate, but the fact that Sun's own JVM exhibits
it makes the question rather moot.
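The guarantee John mentions, that compile-time constant Strings refer to an interned String, can be observed directly. A minimal sketch (class name made up) contrasting a constant expression with a concatenation computed at run time, per JLS 3.10.5 and 15.28:

```java
public class ConstantInterning {
    // Returns {constant expression ==, run-time concatenation ==, after intern() ==}.
    static boolean[] check() {
        final String a = "a"; // final + constant initializer: a constant variable
        String b = "a";       // not final, so b + "b" is evaluated at run time

        boolean constantFolded = (a + "b") == "ab";          // compile-time constant, interned
        boolean runtimeConcat  = (b + "b") == "ab";          // newly created at run time
        boolean afterIntern    = (b + "b").intern() == "ab"; // intern() returns the pooled instance
        return new boolean[] { constantFolded, runtimeConcat, afterIntern };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false true
    }
}
```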


John Bollinger
 
Lee Fesperman

John said:
... However, as I was recently made aware, otherwise unreferenced
String instances can be GC'd from the intern pool ("can" in the sense
that recent Sun JVMs can be observed to do so) ...

Of course, my article was just to point out to Tony that the pool does exist. However,
you bring up a good issue for discussion. I had assumed that the string pool never
shrank.

Thinking about it, it seems to be a good optimization for the JVM to make. Generally, an
application couldn't even tell that unreferenced strings were dropped from the pool. The
only ways to detect it would be to use WeakReferences, etc., or to retain the
identityHashCode of an interned string for later checking. The latter is not
guaranteed to work, though.
 
John C. Bollinger

Lee said:
... The only ways to detect it would be to use WeakReferences, etc., or to
retain the identityHashCode of an interned string for later checking. The
latter is not guaranteed to work, though.

FWIW, the test code I have seen that demonstrates the Sun VM behavior in
fact uses the identityHashCode technique.

I was surprised, too, to find that interned Strings were being GC'd. My
initial reading of the JLS led me to think that they would not be GC'd,
but the behavior does not seem to be forbidden. As it takes rather
extraordinary means to detect the situation, I'm not losing any sleep
over it.


John Bollinger
 
