String: how does substring work?

Guest

The question is about JVM.

I have a
string = new String("aa...aa")
with
string.length() == 10000

When I call
string2 = string.substring(0, 9000)
does string2 allocate another block of memory, or does it simply point to the same
block of memory with private startIndex and endIndex?

I ask because I want to use this function (substring) in a time-critical
loop. If substring allocates extra memory, that is not acceptable.

I forgot: after the substring call, no other changes are made to string2.
 
steepyirl

Yes, it DOES allocate another chunk of memory. As far as I know, String
objects are treated as immutable classes by the JVM; therefore, every
time you perform a string operation that modifies a string, it actually
creates a copy and modifies that, instead of modifying the original.
For the purposes of your application, consider looking into
StringBuffers instead - this is a class that will allow its contents to
be modified without creating a new instance.
 
Eric Sosman

After the above, memory will contain

1: A String object for the literal "aa...aa", consisting
of the usual bookkeeping for an Object plus the fields
that are unique to the String class. One of those
fields will be a reference to

2: A char[] array containing all those 'a's. The array
itself is an Object, of course, and contains some
special fields of its own.

3: Another String object created by `new' (this is the
object that `string' refers to). Its array reference
will refer to

4: Another char[] array containing a second copy of all
those 'a's, also created by the `new'.

5: A third String object created by substring() (this
is the object that `string2' refers to). The array
reference will point to the array mentioned as #4
above; some private fields in the #5 String object
indicate which portion of the array it "owns."

In short, items #3 and #4 exist only because the
String(String) constructor made copies (effectively) of
#1 and #2. The substring() method allocates memory, but
only for the #5 item; the char[] array that holds the
data characters is shared by both #3 and #5.
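
The difference between item #1 (the literal) and item #3 (the object made by `new') can be checked from ordinary code. The following is only a small sketch: the class and method names are hypothetical, and it demonstrates object identity alone. Whether the two String objects share a char[] is an implementation detail that the public API does not expose.

```java
// Illustrative sketch; LiteralVersusNew and its methods are hypothetical names.
final class LiteralVersusNew {

    /** True if new String(literal) is a different object with equal contents. */
    static boolean newCreatesDistinctObject() {
        String literal = "aaaa";            // refers to the interned String for this literal
        String copy = new String(literal);  // `new` always creates a fresh String object
        return copy != literal && copy.equals(literal);
    }

    /** True if two occurrences of the same literal refer to the same object. */
    static boolean literalsAreShared() {
        String first = "aaaa";
        String second = "aaaa";
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println("new gives a distinct object: " + newCreatesDistinctObject());
        System.out.println("equal literals are shared:   " + literalsAreShared());
    }
}
```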
 
googmeister

Guest said:
string2 allocates another block of memory

No, it does not create a copy of the 10,000 characters as the
previous poster claims.

Guest said:
simply points to the same
block of memory with private startIndex and endIndex?

Yes, because Strings are immutable, it is able to create a
new String object with a new start and end index, and
index into the same character array as the original
string.

There is still overhead for each String object, but it
is independent of the length of the string.
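
If even that fixed per-object overhead is unwelcome in a time-critical loop, one option is to skip substring() and pass index bounds to the code that needs the region. A minimal sketch, assuming the loop only needs to read characters; RegionScan and countInRange are hypothetical names chosen for illustration.

```java
// Illustrative sketch; RegionScan and countInRange are hypothetical names.
final class RegionScan {

    /** Counts occurrences of target in text[start, end) without creating a substring. */
    static int countInRange(CharSequence text, int start, int end, char target) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IndexOutOfBoundsException();
        }
        int hits = 0;
        for (int i = start; i < end; i++) {
            if (text.charAt(i) == target) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "banana";
        // Scans only the region [2, 4) of the original string; no new String is made.
        System.out.println(countInRange(text, 2, 4, 'a'));
    }
}
```

The trade-off is that every method in the loop has to accept the bounds, so this fits best when the region is consumed in one or two places.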
 
Thomas G. Marshall

Eric Sosman coughed up:
After the above, memory will contain

1: A String object for the literal "aa...aa", consisting
of the usual bookkeeping for an Object plus the fields
that are unique to the String class. One of those
fields will be a reference to

2: A char[] array containing all those 'a's. The array
itself is an Object, of course, and contains some
special fields of its own.

I'm not sure that this is technically the case. It /probably/ is, but I
wouldn't rule out the compiler making allowances for the constructor when
called with a literal. That is, it might well make one string, and assign
it. Unless of course it is spelled out in the JLS that it actually goes
through the extra steps of creating the object for the literal first.

 
Thomas Kellerer

It's pretty easy to find out by looking at the source code that comes with the JDK.

substring(beginIndex, endIndex) basically calls:

new String(offset + beginIndex, endIndex - beginIndex, value)

(with value being the private char[] storage), and that constructor simply does
the following:

String(int offset, int count, char value[]) {
    this.value = value;   // same array, no copy
    this.offset = offset;
    this.count = count;
}

So no memory is allocated beyond the new String object itself; the char[]
reference is copied, not the characters.
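
To make the layout concrete, here is a minimal sketch of the same idea in a stand-alone class. SharedSlice is a hypothetical name and is not java.lang.String; it only models the offset/count/value fields quoted above, so it behaves the same way whatever the JDK's own String does internally.

```java
// Hypothetical model of the layout described above; this is not java.lang.String.
final class SharedSlice {
    private final char[] value; // backing storage, possibly shared with other slices
    private final int offset;   // index of this slice's first character in value
    private final int count;    // number of characters this slice "owns"

    SharedSlice(String text) {
        this(0, text.length(), text.toCharArray());
    }

    // Mirrors the quoted constructor: only a reference and two ints are stored.
    private SharedSlice(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Allocates one small object; the characters themselves are not copied. */
    SharedSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        return new SharedSlice(offset + beginIndex, endIndex - beginIndex, value);
    }

    int length() {
        return count;
    }

    boolean sharesStorageWith(SharedSlice other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice whole = new SharedSlice("hello world");
        SharedSlice part = whole.substring(6, 11);
        System.out.println(part + " shares storage: " + part.sharesStorageWith(whole));
    }
}
```

Sharing is safe here only because nothing ever writes to the array after construction, which is the same immutability argument made for String.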

Thomas
 
Tor Iver Wilhelmsen

Eric Sosman said:
4: Another char[] array containing a second copy of all
those 'a's, also created by the `new'.

Why do you think it needs to? It can just point to the same char
array: A String is immutable - only a constructor that takes a mutable
type (e.g. char[]) needs to copy.

Going by Sun's source, the char array is only copied if the array is
larger than necessary. So a copy would only be made in a case such as

new String(someString.substring(a,b))

because substring() reuses the char array for speed.
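
The rule just described (share the array when it is exactly the right size, copy only the needed part when it is larger) can be sketched as a small helper. This is a hypothetical illustration of the rule as stated here, not the JDK source; CopyIfOversized and storageForCopy are made-up names.

```java
import java.util.Arrays;

// Hypothetical helper modelling the rule described above; not the JDK source.
final class CopyIfOversized {

    /**
     * Returns the storage a copied string would use: the same array when the
     * slice covers it exactly, otherwise a trimmed copy of just the slice.
     */
    static char[] storageForCopy(char[] value, int offset, int count) {
        if (value.length > count) {
            // The array carries unused characters, so keep only the slice.
            return Arrays.copyOfRange(value, offset, offset + count);
        }
        // Exactly sized already; sharing is safe as long as nobody writes to it.
        return value;
    }

    public static void main(String[] args) {
        char[] full = "abcdef".toCharArray();
        System.out.println(storageForCopy(full, 0, 6) == full); // whole array: shared
        System.out.println(storageForCopy(full, 2, 3) == full); // slice: trimmed copy
    }
}
```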
 
Eric Sosman

It looks like you're right: The pre-existing char[]
is not copied unless it's too long. So the final total
of memory chunks in the O.P.'s situation is four: the
String objects for the literal, for the new String
constructed from the literal, and for the substring,
and a single char[] referred to by all three. What
the O.P. seemed concerned about was memory consumption
by the substring method, and on that point my answer
was right (for a change ...): substring does in fact
allocate memory, but does not copy part of the char[].

Thanks for the correction.
 
Steve Horsley

I would like to disagree here. I haven't checked the JLS, but it
is my belief that new String(String) always allocates a fresh
char[]. I have used this feature as a way of reducing memory
usage, like this:

while((line = myBufferedReader.readLine()) != null) {
    // find a short extract from the log file line...
    s = line.substring(blah, blah);
    // myList.add(s); // uses lots of memory
    myList.add(new String(s)); // uses much less memory
}

Since substring keeps the same char[] as the original line
string, my short substrings always come with lots of unused
char[], and hoarding these substrings can waste lots of memory.
Using new String(string) not only saved memory, it made my app
run twice as fast, I guess because of the reduced garbage
collection needed. This was on a Sun 1.3 JVM.
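
For anyone who wants to try the pattern, here is a self-contained sketch of it. ExtractCollector and collectPrefixes are hypothetical names, and the fixed-width prefix stands in for whatever extract is actually wanted. Whether the extra new String(s) saves memory depends on how the JDK in use implements substring, so treat the saving as something to measure rather than assume.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; ExtractCollector and collectPrefixes are hypothetical names.
final class ExtractCollector {

    /** Keeps the first `width` characters of each line as an independent String. */
    static List<String> collectPrefixes(BufferedReader reader, int width) throws IOException {
        List<String> extracts = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String s = line.substring(0, Math.min(width, line.length()));
            // Ask for a String that need not keep the whole line's storage alive.
            extracts.add(new String(s));
        }
        return extracts;
    }

    /** Convenience overload for in-memory text, mainly for trying the idea out. */
    static List<String> collectPrefixes(String text, int width) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return collectPrefixes(reader, width);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectPrefixes("alpha one\nbeta two\nxy", 4));
    }
}
```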

Steve
 
timjowers

Run this with a profiler and see. Then let's see what's correct. I
think Java 1.4.2 no longer allocates until touched (the old
StringBuffer better than String argument is superfluous in most cases).

I'm interested to hear what you find. By "touched" is it "read" or
"written"? Does substring touch it? For an old StringBuffer versus
String example I used HP's Jtune or jmeter to read the output of
-Xrunhprof and see that Strings are not allocated when init'ed like in
prior JVM versions and as I'd expected.

TimJowers
 
Thomas Schodt

Eric said:
The pre-existing char[]
is not copied unless it's too long.

Steve said:
I haven't checked the JLS, but it is my
belief that new String(String) always allocates a fresh char[].

It will not allocate a new char[] if the char[] backing the new String is
the same size as the original. Which is exactly what Eric said.
 
Antti S. Brax

I would like to disagree here. I haven't checked the JLS, but it
is my belief that new String(String) always allocates a fresh
char[].

If the String given as a parameter contains baggage (the array
size is larger than what length() returns) then a new array is
created.

And let's also note explicitly that this is a useful trick only
if the original String is discarded right after the new String
is created.
 
Tor Iver Wilhelmsen

Steve Horsley said:
I would like to disagree here. I haven't checked the JLS, but it is my
belief that new String(String) always allocates a fresh char[].

It's not a question of belief: Just check java/lang/String.java in
src.zip.

Steve Horsley said:
s = line.substring(blah, blah);
// myList.add(s); // uses lots of memory
myList.add(new String(s)); // uses much less memory

... because in this particular case a new char[] is created that is
shorter.

Steve Horsley said:
Since substring keeps the same char[] as the original line string, my
short substrings always come with lots of unused char[], and hoarding
these substrings can waste lots of memory.

Yes, because in your code example you are not interested in keeping
the whole String you read, so it makes sense to trim.

In the OP's case that is not an issue.
 
Eric Sosman

Eric Sosman wrote:
The pre-existing char[]
is not copied unless it's too long.

Steve said:
I haven't checked the JLS, but it is my
belief that new String(String) always allocates a fresh char[].

Thomas said:
It will not allocate a new char[] if the char[] backing the new String is
the same size as the original. Which is exactly what Eric said.

Credit where it's due: That's exactly what I said
after being corrected by Tor Iver Wilhelmsen. "You
learn something new every day."
 
