String Theory

MrFredBloggs

If I have a string:

String parentString = new String("This is a string");

and I call:

String childString = parentString.substring(10, 16); //==> "string"

I understand that I have created a new instance of String called
childString.

However, does childString now internally contain the char[] array:

private char[] data = new char[] {'s', 't', 'r', 'i', 'n', 'g'};

or does it still refer to the char[] array in parentString?


Regards,

Fred.
 
Fred

It refers to the char[] array in the parent string. Take a look at the
code from the java.lang.String class below:

public String substring(int beginIndex, int endIndex) {
.............
return ((beginIndex == 0) && (endIndex == count)) ? this :
new String(offset + beginIndex, endIndex - beginIndex, value);
}

/* ----------------- Note the comment
--------------------------------------- */
// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
this.value = value;
this.offset = offset;
this.count = count;
}


This shouldn't be all too troubling, though, given that a String is
immutable, i.e. you'd be hard pressed to find a case where anything you
do with parentString will affect your childString, because the shared
characters never change after construction.
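To make the sharing concrete, here is a minimal sketch of the same scheme. SharedText, compact(), sharesArrayWith() and backingLength() are names I made up for illustration; this is not the real java.lang.String, only a model of the offset/count idea from the source quoted above.

```java
import java.util.Arrays;

// Hypothetical, simplified model of the sharing scheme quoted above.
// SharedText is NOT java.lang.String; all names here are invented.
final class SharedText {
    private final char[] value;  // backing array, possibly shared with a parent
    private final int offset;    // index of the first character inside value
    private final int count;     // number of characters that belong to this text

    SharedText(String s) {
        this(0, s.length(), s.toCharArray());
    }

    // Mirrors the package-private constructor: keeps the array, copies nothing.
    private SharedText(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedText substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return (beginIndex == 0 && endIndex == count)
                ? this
                : new SharedText(offset + beginIndex, endIndex - beginIndex, value);
    }

    // Copies only the characters in use into a fresh, exactly sized array.
    SharedText compact() {
        return new SharedText(0, count, Arrays.copyOfRange(value, offset, offset + count));
    }

    boolean sharesArrayWith(SharedText other) {
        return value == other.value;
    }

    int backingLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

With this model, new SharedText("This is a string").substring(10, 16) reads as "string" but still holds the full 16-char array, while compact() on the result holds only 6 chars.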

-Fred
 
Ingo R. Homann

Hi,
It refers to the char[] array in the parent string....
This shouldn't be all too troubling, though,...

Well, it *might* cause trouble with the garbage collection.

Some time ago, I read a huge log file with a Java application, doing
something like this:

String line;
ArrayList smallPieces=new ArrayList();
while((line=in.readLine())!=null) {
smallPieces.add(line.substring(0,1));
}

One might wonder why the free memory was getting low. Of course, at the
end of the loop, all characters in the whole file are still
"reachable" and cannot be GC'ed! (Although they cannot really be reached
by the program...)

Of course it would be clever to use a StringBuffer in that case or
explicitly copy the part of the char[] from the original String...

Ciao,
Ingo
 
Tor Iver Wilhelmsen

Ingo R. Homann said:
Of course it would be clever to use a StringBuffer in that case or
explicitly copy the part of the char[] from the original String...

Yes, or new String(the.substring()) which does create a small char
array.
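Applied to Ingo's loop, that could look roughly like the sketch below. FirstCharCollector and firstChars are made-up names, the generics assume Java 5 or later, and the empty-line check is my addition (substring(0, 1) would throw on an empty line). Whether the copy actually saves memory depends on the String implementation you run on; it does on the one quoted in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named only for this example.
final class FirstCharCollector {

    // Collects the first character of every non-empty line as its own String.
    static List<String> firstChars(BufferedReader in) throws IOException {
        List<String> smallPieces = new ArrayList<String>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.length() > 0) {
                // new String(...) asks for an independent copy, so on an
                // array-sharing implementation the long line can be collected.
                smallPieces.add(new String(line.substring(0, 1)));
            }
        }
        return smallPieces;
    }
}
```

The list contents are the same as in the original loop; only what each element keeps reachable differs.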
 
Thomas G. Marshall

Ingo R. Homann coughed up:
Hi,
It refers to the char[] array in the parent string....
This shouldn't be all too troubling, though,...

Well, it *might* cause trouble with the garbage collection.

Some time ago, I read a huge log file with a Java application, doing
something like this:

String line;
ArrayList smallPieces=new ArrayList();
while((line=in.readLine())!=null) {
smallPieces.add(line.substring(0,1));
}

One might wonder why the free memory was getting low. Of course, at
the end of the loop, all characters in the whole file are still
"reachable" and cannot be GC'ed! (Although they cannot really be
reached by the program...)

Of course it would be clever to use a StringBuffer in that case


Ah... careful. StringBuffer changed in one significant way from 1.4 to 1.5.
In 1.4.x (and prior), toString() hands the buffer's internal char array to
the new String, so you would still need to do the extra new to trim it down:

String myShorterString = new String(myStringBuffer.toString());

But in 1.5, a (StringBuffer).toString() creates a new string with a new char
array.


....[rip]...
 
Dale King

If I have a string:

String parentString = new String("This is a string");

and I call:

String childString = parentString.substring(10, 16); //==> "string"

I understand that I have created a new instance of String called
childString.

To be pedantic, instances of objects are not called anything. They have
references, not names. You have created a variable called childString
that holds the reference to the new String instance.

However, does childString now internally contain the char[] array:

private char[] data = new char[] {'s', 't', 'r', 'i', 'n', 'g'};

or does it still refer to the char[] array in parentString?

As others have said it will usually reuse the same array, but that is
not in fact guaranteed.
 
Thomas G. Marshall

Dale King coughed up:
If I have a string:

String parentString = new String("This is a string");

and I call:

String childString = parentString.substring(10, 16); //==> "string"

I understand that I have created a new instance of String called
childString.

To be pedantic, instances of objects are not called anything. They
have references, not names. You have created a variable called
childString that holds the reference to the new String instance.

However, does childString now internally contain the char[] array:

private char[] data = new char[] {'s', 't', 'r', 'i', 'n', 'g'};

or does it still refer to the char[] array in parentString?

As others have said it will usually reuse the same array, but that is
not in fact guaranteed.


When is it not guaranteed? (String).substring(int, int) invokes the
String(int, int, char[]) package scoped constructor:

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
this.value = value;
this.offset = offset;
this.count = count;
}

Which keeps the internal array for reuse in the new string.


--
Puzzle: You are given a deck of cards all face down
except for 10 cards mixed in which are face up.
If you are in a pitch black room, how do you divide
the deck into two piles (may be uneven) that each
contain the same number of face-up cards?
Answer (rot13): Sebz naljurer va gur qrpx, qrny bhg
gra pneqf naq syvc gurz bire.
 
Dale King

Thomas said:
Dale King coughed up:


When is it not guaranteed?

When running on a different VM. What I was saying is that the contract
for the method does not specify that it must work that way. It usually
does, but it ain't guaranteed.
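As a rough illustration of what "not guaranteed" means, here are two strategies that return the same characters for the same arguments, one sharing the array and one copying it. Slicer, SharingSlicer and CopyingSlicer are names I invented for this sketch; neither is how any particular VM implements String.

```java
import java.nio.CharBuffer;

// Hypothetical names throughout; this only illustrates that two different
// method bodies can satisfy the same documented behaviour.
interface Slicer {
    CharSequence slice(char[] source, int begin, int end);
}

// Strategy 1: wrap the existing array. No copy is made, but the whole
// source array stays reachable for as long as the slice does.
final class SharingSlicer implements Slicer {
    public CharSequence slice(char[] source, int begin, int end) {
        return CharBuffer.wrap(source, begin, end - begin);
    }
}

// Strategy 2: copy just the requested range into a new String.
final class CopyingSlicer implements Slicer {
    public CharSequence slice(char[] source, int begin, int end) {
        return new String(source, begin, end - begin);
    }
}
```

A caller that only reads the result sees "string" for slice("This is a string".toCharArray(), 10, 16) from either class; the difference is purely in what memory stays reachable.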
 
Thomas G. Marshall

Dale King coughed up:
When running on a different VM. What I was saying is that the contract
for the method does not specify that it must work that way. It usually
does, but it ain't guaranteed.


You'll need to explain this to me then. It's not the contract.

The method body /itself/ dictates this.

It's possible that some compilers and vm's would take liberties with the
string class, since it is /very/ tightly coupled to the functionality of the
language proper. (I've pondered this myself in usenet before). But not
honoring the source would indicate a broken compiler/vm.

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
this.value = value;
this.offset = offset;
this.count = count;
}

What about the above might change?
 
Christian Gudrian

Thomas said:
The method body /itself/ dictates this.

The method body is not part of the contract. A different VM might
implement that particular method differently.

Christian
 
Roland

Dale King coughed up:
When running on a different VM. What I was saying is that the contract
for the method does not specify that it must work that way. It usually
does, but it ain't guaranteed.



You'll need to explain this to me then. It's not the contract.

The method body /itself/ dictates this.

It's possible that some compilers and vm's would take liberties with the
string class, since it is /very/ tightly coupled to the functionality of the
language proper. (I've pondered this myself in usenet before). But not
honoring the source would indicate a broken compiler/vm.

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
this.value = value;
this.offset = offset;
this.count = count;
}

What about the above might change?

Sun, in a future release, or maybe even the Apache Harmony project, may
decide to implement the constructor differently, e.g. initializing the
instance using a native method:

String(int offset, int count, char value[]) {
init(offset, count, value);
}
private native void init(int offset, int count, char value[]);

In this case you cannot tell whether the new instance shares the value
array with its originator.
--
Regards,

Roland de Ruiter
___ ___
/__/ w_/ /__/
/ \ /_/ / \
 
Tor Iver Wilhelmsen

Christian Gudrian said:
The method body is not part of the contract. A different VM might
implement that particular method differently.

Yes, but trimming to a char[] is the only rational reason for having
the String(String) constructor in the first place. (Java does not use
copy constructors the way C++ does.)
 
Thomas G. Marshall

Christian Gudrian coughed up:
The method body is not part of the contract.

No kidding. That's what I said. Here:

Thomas G. Marshall:
It's not the contract.
The method body itself dictates this.

Wait. Does this mean that a VM only has to honor the contract for all the
java classes, and not honor the actual classes themselves (as written by
sun) ? Are they allowed to, say, rewrite ArrayList.java to whatever they
feel like?

(The terms "has to" and "allowed" would be relating to what sun legally
requires for a vm to be able to call itself "java", obviously not what they
are physically able to do).

A different VM might
implement that particular method differently.

By changing the String.java / String.class ?
 
Thomas G. Marshall

Tor Iver Wilhelmsen coughed up:
Christian Gudrian said:
The method body is not part of the contract. A different VM might
implement that particular method differently.

Yes, but trimming to a char[] is the only rational reason for having
the String(String) constructor in the first place. (Java does not use
copy constructors the way C++ does.)


The constructor I mentioned is package scope only, and is used by the
substring() method for quick creation of a new string (by reusing the
char[]). This whole thread was about whether or not the substring() method
keeps or creates a new char array. The source shows that it keeps the one
as is.

The question AISI is now to what degree are manufacturers of VM's allowed to
change the source?
 
Chris Uppal

Thomas said:
Wait. Does this mean that a VM only has to honor the contract for all the
java classes, and not honor the actual classes themselves (as written by
sun) ? Are they allowed to, say, rewrite ArrayList.java to whatever they
feel like?

The stuff in src.zip is not part of the contract. It is not even /necessarily/
the source that Sun compiled to create the corresponding edition of the Java
platform. Other vendors may (and probably do) compile their version of the
platform from their version of the source.

The stuff in the bytecodes is not part of the contract either. In part because
it will necessarily include implementation artefacts that are not intended to
be stable over releases, but also because (IIRC) the Sun licence prohibits
decompilation.

Lastly, it has been fairly well established that the Sun JVM /does/ as a matter
of fact ignore the bytecode implementation of some of the core methods (mostly
to do with the core numeric functions). I find this faintly disturbing myself,
although it is no big deal in practise.

-- chris
 
Tor Iver Wilhelmsen

Thomas G. Marshall said:
The constructor I mentioned is package scope only, and is used by
the substring() method for quick creation of a new string (by
reusing the char[]). This whole thread was about whether or not the
substring() method keeps or creates a new char array. The source
shows that it keeps the one as is.

Sorry, I misinterpreted stuff: Yes, in that case the *speedy* thing is
to keep the char[] and not copy it.

But in a case where you take the substring of a longish String which
is then thrown away (e.g. extracting data from a text file), this is
wasteful.

It's indeed a problem that neither the JLS nor the javadoc for
substring() mention that the implementation can keep the original
char[] around, but instead claim it returns a "new String" which can
be interpreted as creating a new, shorter char[].

It has been reported to Sun:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4637640

They have closed it, because they consider it a duplicate of

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4546734

... which is strange; that bug deals only with StringBuffer. Possibly
someone at Sun got hung up on the mention of StringBuffer in the first
and didn't consider the actual behavior reported (which was that of
String).

However, another bug report is still in progress:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513622

That one has a reference to a freeware library with apparently a
better implementation of String.
 
Thomas G. Marshall

Tor Iver Wilhelmsen coughed up:
"Thomas G. Marshall"
The constructor I mentioned is package scope only, and is used by
the substring() method for quick creation of a new string (by
reusing the char[]). This whole thread was about whether or not the
substring() method keeps or creates a new char array. The source
shows that it keeps the one as is.

Sorry, I misinterpreted stuff: Yes, in that case the *speedy* thing is
to keep the char[] and not copy it.

But in a case where you take the substring of a longish String which
is then thrown away (e.g. extracting data from a text file), this is
wasteful.

It's indeed a problem that neither the JLS nor the javadoc for
substring() mention that the implementation can keep the original
char[] around, but instead claim it returns a "new String" which can
be interpreted as creating a new, shorter char[].

It has been reported to Sun:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4637640

They have closed it, because they consider it a duplicate of

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4546734

... which is strange; that bug deals only with StringBuffer. Possibly
someone at Sun got hung up on the mention of StringBuffer in the first
and didn't consider the actual behavior reported (which was that of
String).

I've mentioned here more than once the business with StringBuffer. It
changed from 1.4.mumble to 1.5, where a new string with stripped char[] is
created. See my first post within this thread, and several other places
around these parts...
 
Thomas G. Marshall

Chris Uppal coughed up:
The stuff in src.zip is not part of the contract.

This term "contract" has multiple meanings here in OO, legal, and legal-OO
land. We're best avoiding it.

It is not even
/necessarily/ the source that Sun compiled to create the
corresponding edition of the Java platform. Other vendors may (and
probably do) compile their version of the platform from their version
of the source.

The stuff in the bytecodes is not part of the contract either. In
part because it will necessarily include implementation artefacts
that are not intended to be stable over releases, but also because
(IIRC) the Sun licence prohibits decompilation.

Lastly, it has been fairly well established that the Sun JVM /does/
as a matter of fact ignore the bytecode implementation of some of the
core methods (mostly to do with the core numeric functions). I find
this faintly disturbing myself, although it is no big deal in
practise.


Ah, ok, asked and answered, thanks Chris.

One would have happily thought that the entire point of the .classes would
be not to just provide something for your code to reference, but to actually
supply the code that is executed as blessed from sun.

I would have hoped that it was this way. I would have hoped that it was a
requirement of the licensing (to be able to call your product a "java" VM).

What /does/ make it a big deal is that it means that you cannot go to the
.java source *NOR even to the java .class* to find authoritative answers.

You're saying that *nothing* in the java source and bytecodes can be taken
for granted from JVM to JVM. That's a shame, and in my opinion, a
borderline-fiasco.
 
