Substring changes (JDK 1.7)

Jan Burse

Dear All,
Recent versions of the JDK do not reuse the backing char[].
The reason is that the offset and length fields have been
removed from String to save memory.

Did this affect some of your code?

Bye
 

Joshua Cranmer

Dear All,
Recent versions of the JDK do not reuse the backing char[].
The reason is that the offset and length fields have been
removed from String to save memory.

Did this affect some of your code?

Bye


Wrong on both counts. Where did you read this nonsense?

<http://hg.openjdk.java.net/jdk7/jdk7-gate/jdk/file/tip/src/share/classes/java/lang/String.java>

<http://hg.openjdk.java.net/jdk8/jdk8-gate/jdk/rev/2c773daa825d>
suggests differently...
 

Lars Enderin

2013-01-10 18:22, markspace wrote:
That's 8, not 7. If you're going to ask about JDK 8, don't put "JDK
1.7" in your subject title.
The only question was in the OP. Jan Burse set the title, not Joshua.
 

Jan Burse

Jan said:
Dear All,
Recent versions of the JDK do not reuse the backing char[].
The reason is that the offset and length fields have been
removed from String to save memory.

Did this affect some of your code?

Bye

It's from JDK 1.7 Update 10.

See for yourself:

C:\Users\Jan Burse>java -version
java version "1.7.0_10"
Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

From java.lang.String in rt.jar:

public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

-- and --

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

-- and --

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
 

Jan Burse

Hi,

It was originally observed in a Scala newsgroup:

why is String grouped() so slow?
https://groups.google.com/forum/?fromgroups=#!topic/scala-user/D1qmblInfyg

Bye


Roedy Green

Did this affect some of your code?

If this change happens, you would no longer consider using
new String(String) to unencumber a substring.

You no longer have to worry about a tiny substring holding a base
string of a megabyte or more in memory.
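The idiom Roedy refers to looked roughly like the sketch below. Before JDK 7u6, substring() shared the parent's char[], and new String(String) trimmed the backing array when it was larger than the string, so wrapping a small substring let the large original be collected. The helper name firstLine is hypothetical; on 7u6 and later the extra wrap is redundant but harmless.

```java
public class Unencumber {
    // Hypothetical helper: keep only the first line of a potentially huge text.
    static String firstLine(String bigText) {
        int nl = bigText.indexOf('\n');
        String line = (nl == -1) ? bigText : bigText.substring(0, nl);
        // Before JDK 7u6, 'line' shared bigText's char[]; new String(String)
        // trimmed the backing array, so the result no longer kept bigText's
        // array reachable. On 7u6+ substring() already copies, making this
        // extra step redundant.
        return new String(line);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("header\n");
        for (int i = 0; i < 1_000_000; i++) {
            sb.append('x'); // simulate a large "file image"
        }
        String head = firstLine(sb.toString());
        System.out.println(head); // prints header
    }
}
```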
 

Robert Klemme

If this change happens, you would no longer consider using
new String(String) to unencumber a substring.

You no longer have to worry about a tiny substring holding a base
string of a megabyte or more in memory.

Instead you have to worry about tons of substrings drawn from the same
input String occupying a lot more memory and slowing down GC. Trade-offs,
trade-offs...

Cheers

robert
 

Roedy Green

Instead you have to worry about tons of substrings drawn from the same
input String occupying a lot more memory and slowing down GC. Trade-offs,
trade-offs...

I wonder if it could work like this.

Perhaps the GC could notice a giant string encumbered by a few small
substrings, do the new String copy for you, and collect the base string.

If you don't need the base string itself, I think most of the time you
are best off making the new String copy.

For what I do, I am peeling off small strings from a big string which
represents a file image. I keep the big string until the last minute.
Encumbering works well for me.
 

Stefan Ram

Robert Klemme said:
Instead you have to worry about tons of substrings drawn from the same
input String occupying a lot more memory and slowing down GC. Trade-offs,
trade-offs...

But this is more natural; it fulfills the expectation of non-expert
programmers. Expert programmers can implement a custom string class
with the previous behaviour, or, possibly better, a custom
implementation of CharSequence (if only more APIs would use
CharSequence instead of String!).
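Stefan's suggestion can be sketched as a small CharSequence that, like the pre-7u6 String, holds a shared char[] plus an offset and a length. The class name SharedSlice is hypothetical and not a library type. subSequence() runs in constant time because it only creates a new view over the same buffer; characters are copied only in toString().

```java
// Hypothetical class: an immutable view over a shared char[] (offset + length),
// mimicking the pre-7u6 substring sharing without touching String itself.
public final class SharedSlice implements CharSequence {
    private final char[] buf; // shared between slices, never mutated
    private final int offset;
    private final int length;

    public SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length()); // one copy up front
    }

    private SharedSlice(char[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        return buf[offset + index];
    }

    // O(1): the new slice reuses the same buffer; only offset/length differ.
    @Override
    public SharedSlice subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ", " + end);
        }
        return new SharedSlice(buf, offset + start, end - start);
    }

    // Copies only when a real String is needed, e.g. for String-only APIs.
    @Override
    public String toString() {
        return new String(buf, offset, length);
    }

    public static void main(String[] args) {
        SharedSlice path = new SharedSlice("usr/local/bin/java");
        CharSequence bin = path.subSequence(10, 13);
        System.out.println(bin); // prints bin
    }
}
```

The trade-off Robert mentions still applies to this design: a tiny slice keeps the whole buffer reachable.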
 

Robert Klemme

But this is more natural; it fulfills the expectation of non-expert
programmers.

But it would be a significant change. There is so much software written
under the assumption of the old implementation. That change might
actually break existing programs (break in the sense of worse performance
or new GC issues).

Then again it might be that there are just not that many programs which
make use of that knowledge. Who knows?
Expert programmers can implement a custom string class
with the previous behaviour,

Well, shouldn't such a basic thing be part of the standard library?
or, possibly better, a custom
implementation of CharSequence (if only more APIs would use
CharSequence instead of String!).

I agree. But unfortunately public classes and APIs are set in stone.

Kind regards

robert
 

Roedy Green

Well, shouldn't such a basic thing be part of the standard library?

String is final and many things take a String parm and nothing else.
You can create something similar and use it like String.
 

Jan Burse

Roedy said:
If this change happens, you would no longer consider using
new String(String) to unencumber a substring.

You no longer have to worry about a tiny substring holding a base
string of a megabyte or more in memory.

I have to sift through my code and check
every line that uses substring() to see whether
there is a better solution.

For example, I caught myself doing things like:

int k = path.lastIndexOf('/');
while (k != -1) {
    String name = path.substring(k + 1);
    /* do something with name */
    path = path.substring(0, k);
    k = path.lastIndexOf('/');
}

I guess the compiler cannot eliminate the copying
in the last substring(0,k), since String does not
have a length field anymore.

It would need to introduce an extra index variable
in the code. That is also how I would rewrite the
code, using the two-argument variant of lastIndexOf.
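A minimal sketch of the rewrite Jan describes, using the two-argument lastIndexOf(int ch, int fromIndex) and an index variable for the logical end of the path. The class and method names are hypothetical. Only the names themselves are copied, so the total work stays linear in the path length even when substring() copies.

```java
import java.util.ArrayList;
import java.util.List;

public class PathSplit {
    // Hypothetical helper: same traversal as the original loop, but the
    // current end of the path is tracked in an index variable instead of
    // shrinking the String with path = path.substring(0, k).
    static List<String> namesRightToLeft(String path) {
        List<String> names = new ArrayList<>();
        int end = path.length();                 // logical end of the remaining path
        int k = path.lastIndexOf('/', end - 1);  // search only before 'end'
        while (k != -1) {
            names.add(path.substring(k + 1, end)); // copies just the name
            end = k;                               // replaces path.substring(0, k)
            k = path.lastIndexOf('/', end - 1);    // a negative fromIndex yields -1
        }
        // Like the original loop, the segment before the first '/' is not
        // added; it is path.substring(0, end) if needed.
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesRightToLeft("usr/local/bin/java"));
        // prints [java, bin, local]
    }
}
```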

But I guess the JIT cannot do it automatically,
or can it? Has anyone seen a tool that shows the
JIT-compiled assembly code?

Bye

P.S.: I also wonder how well java.io.File
performs now.
 

Jan Burse

Chris said:
And what's wrong with that ? Seems a sensible approach to me.

If you mean that it's suddenly /significantly/ slower, then I don't believe
you. (Though I freely admit that there will be a tiny few cases where it
/does/ matter -- in which cases I will be wrong.)

-- chris

With the sharing semantics, its complexity is O(n+m), where
n is the length of the string and m is the number of
slashes. The m term counts the shared String shells
that get created.

Without the sharing semantics, when substring() copies, its
complexity is O(n^2), assuming m is not too small. In each
of the m iterations you no longer just create a String shell;
instead, in the following statement

path = path.substring(0,k);

you copy a fair amount of path. JDK 1.7 Update 10 no longer
has the sharing semantics, so when m is not too small, it's
probably a good idea to rewrite the code.

Bye
 

Jan Burse

Jan said:
Without the sharing semantics, when substring() copies, its
complexity is O(n^2), assuming m is not too small. In each
of the m iterations you no longer just create a String shell;
instead, in the following statement

I guess a better estimate would be O(m^2 * n/m) = O(m * n).
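Jan's refined estimate can be spelled out as below. This assumes, beyond what the post states, that the m slashes are spaced roughly evenly, so each path segment is about n/m characters long; the sum counts the characters copied by path = path.substring(0,k) over the m iterations. When m grows in proportion to n (very short segments), the result degenerates to the earlier O(n^2) estimate.

```latex
\[
\underbrace{m}_{\text{iterations}} \cdot
\underbrace{m \cdot \frac{n}{m}}_{\text{chars copied per iteration, at most}}
= m\,n,
\qquad
\sum_{i=1}^{m} \Bigl( n - i\,\frac{n}{m} \Bigr)
= n\,m - \frac{n}{m} \cdot \frac{m(m+1)}{2}
= \frac{n\,(m-1)}{2} \in O(m\,n).
\]
```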
 

Robert Klemme

String is final and many things take a String parm and nothing else.
You can create something similar and use it like String.

There is no reason in what you say that it should not be part of the std
lib.

Kind regards

robert
 

markspace

There is no reason in what you say that it should not be part of the std
lib.

javax.swing.text.Segment preserves the semantics of a shared buffer.
It's not a drop-in replacement for String (many of the methods differ or
are absent). But Segment is extensible, so critical missing methods
could be added.
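A small sketch of the Segment behaviour markspace describes: two Segments created over one char[] that differ only in their offset and count fields. Segment implements CharSequence (as of Java 6), so length(), charAt() and toString() are available; toString() is the point where characters are finally copied into a String. The class name SegmentDemo is just for illustration, and javax.swing is part of the desktop JDK, not Android.

```java
import javax.swing.text.Segment;

public class SegmentDemo {
    public static void main(String[] args) {
        char[] text = "usr/local/bin/java".toCharArray();

        // Two views into the same array: only offset and count differ.
        Segment local = new Segment(text, 4, 5);
        Segment bin = new Segment(text, 10, 3);

        System.out.println(local.array == bin.array); // prints true
        System.out.println(bin.length());             // prints 3
        System.out.println(bin.charAt(0));            // prints b
        // toString() copies the viewed characters into a new String.
        System.out.println(local);                    // prints local
    }
}
```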

I wonder if the best way to go would be to cheat and have String
extended into a SharedString with the old implementation. This would
violate the finality of String, but it's possible to synthesize these
sorts of things if one has control of the JVM. Obviously, this needs to
come from Oracle.
 

Jan Burse

markspace said:
javax.swing.text.Segment preserves the semantics of a shared buffer.
It's not a drop-in replacement for String (many of the methods differ or
are absent). But Segment is extensible, so critical missing methods
could be added.

Not available on Android, I guess. :-(
 

markspace

Not available on Android, I guess. :-(


Or you could just write your own from scratch. It's not hard. But
again I kind of doubt anyone is doing enough heavy string processing on
a small embedded device like Android where this kind of thing is going
to affect actual performance.

Didn't your original complaint come from the Scala group? What does
Scala have to do with Android?
 
