Substring

  • Thread starter Dirk Bruere at NeoPax

dimka

Some time ago I had a problem with strings. In the profiler I saw that
more than 70% of all memory was used by Strings. If you have the same
problem, you can use a library to optimize this (this is the true Java
way :)). I used Javolution (http://javolution.org/), for example. I
replaced all Strings with javolution.text.Text:
This class has the same methods as Java String and .NET String with
the following benefits:
--No need for an intermediate StringBuffer/StringBuilder in order to
manipulate textual documents (insertion, deletion or concatenation).
--Bug free. They are not plagued by the String.substring(int) memory
leak bug (when small substrings prevent memory from larger string from
being garbage collected).
--More flexible as they allow for search and comparison with any
java.lang.String or CharSequence.
--Support custom allocation policies (instances allocated on the
"stack" when executing in a StackContext).
 

Eric Sosman

Dirk said:
I assume that "" is not the same as a null string?
What is it like in memory?

A String of zero length is an empty String. It is a
perfectly valid String object, and you can do with it all
the same things you can do with any other String. You
can query its length, you can compute its hashCode, you
can compare it for equality or for order with other Strings,
you can concatenate it with other Strings, ... It's a String.

`null' is a value that a reference variable may have,
signifying that it refers to no object at all.
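
A minimal sketch of the distinction described above. The class and method names (EmptyVersusNull, describe) are invented for illustration and are not from the thread; only standard java.lang.String methods are used.

```java
// Illustrative only: EmptyVersusNull and describe() are hypothetical names.
class EmptyVersusNull {

    // Reports what a reference points at without ever dereferencing null.
    static String describe(String s) {
        if (s == null) {
            return "null reference (no object)";
        }
        return "String object of length " + s.length();
    }

    public static void main(String[] args) {
        String empty = "";   // a real String object with zero characters
        String none = null;  // a reference that refers to no object

        // Everything you can do with any other String works on the empty one.
        System.out.println(empty.length());              // 0
        System.out.println(empty.hashCode());            // 0
        System.out.println(empty.equals(""));            // true
        System.out.println(empty.compareTo("a") < 0);    // true
        System.out.println(empty.concat("abc"));         // abc

        System.out.println(describe(empty));
        System.out.println(describe(none));
        // Calling none.length() here would throw a NullPointerException.
    }
}
```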
 

Dirk Bruere at NeoPax

Eric said:
A String of zero length is an empty String. It is a
perfectly valid String object, and you can do with it all
the same things you can do with any other String. You
can query its length, you can compute its hashCode, you
can compare it for equality or for order with other Strings,
you can concatenate it with other Strings, ... It's a String.

`null' is a value that a reference variable may have,
signifying that it refers to no object at all.

Got it.
And to test for an empty string...

myString.equals(""); ?

Dirk

--
Dirk

http://www.transcendence.me.uk/ - Transcendence UK
http://www.theconsensus.org/ - A UK political party
http://www.onetribe.me.uk/wordpress/?cat=5 - Our podcasts on weird stuff
 

Mark Space

Dirk said:
And to test for an empty string...

myString.equals(""); ?


Yes, almost. "myString" may be a null reference, so you might get a
NullPointerException here. For this reason, it's common to see this:

"".equals( myString )

You can use your way if you like, but if you see this, don't get
confused. It's the same, the author just didn't want to check for
myString == null separately.
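
To see the difference between the two spellings, here is a small sketch. The class and method names are made up for this example; the only library call is String.equals.

```java
// Illustrative only: the class and method names are hypothetical.
class NullSafeEmptyCheck {

    // Literal first: "" is never null, so this cannot throw.
    static boolean literalFirst(String myString) {
        return "".equals(myString);
    }

    // Variable first: throws NullPointerException when myString is null.
    static boolean variableFirst(String myString) {
        return myString.equals("");
    }

    public static void main(String[] args) {
        System.out.println(literalFirst(""));     // true
        System.out.println(literalFirst("abc"));  // false
        System.out.println(literalFirst(null));   // false

        System.out.println(variableFirst(""));    // true
        try {
            variableFirst(null);
        } catch (NullPointerException e) {
            System.out.println("variableFirst(null) threw NullPointerException");
        }
    }
}
```

Note that the literal-first form treats null as "not empty", which may or may not be what the surrounding logic wants.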
 

Mark Space

Dirk said:
Yes, well I am trying it in a prog that doesn't work.
Hence the question.

You can make a second program to test this easily. Then also you can
read the source code for substring(). Most of the Java API is pretty
readable.
 

Dirk Bruere at NeoPax

Mark said:
Yes, almost. "myString" may be a null reference, so you might get a
NullPointerException here. For this reason, it's common to see this:

"".equals( myString )

You can use your way if you like, but if you see this, don't get
confused. It's the same, the author just didn't want to check for
myString == null separately.

Neat!

 

Lew

Eric said:
     If `myString' has the value `null', both of these tests will
throw exceptions.  For this reason, in some circumstances it may
be preferable to write `"".equals(myString)', which will return
`true' or `false' but will never throw up.

It is stronger to test 'myString' for 'null' explicitly, or to set up
an invariant (enforced with 'assert' clauses) that it cannot be.

If the logic of the algorithm requires non-nullity, an explicit check
documents that.
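
One way this could look, as a sketch only. The names (ExplicitNullCheck, isEmptyField, isEmptyInternal) are hypothetical, and choosing IllegalArgumentException for the public check is an assumption of this example, not something the thread prescribes.

```java
// Illustrative only: class and method names are hypothetical.
class ExplicitNullCheck {

    // Public entry point: the explicit check documents that null is not allowed.
    static boolean isEmptyField(String myString) {
        if (myString == null) {
            throw new IllegalArgumentException("myString must not be null");
        }
        return isEmptyInternal(myString);
    }

    // Internal helper: non-nullity is an invariant here, stated with 'assert'.
    // Assertions only run when the JVM is started with -ea.
    private static boolean isEmptyInternal(String myString) {
        assert myString != null : "caller must have checked for null";
        return myString.length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(isEmptyField(""));     // true
        System.out.println(isEmptyField("abc"));  // false
        try {
            isEmptyField(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```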
 

Tom Anderson

The nil string ("null string" is just too confusing) would only "hold on" to
the character array because it was used by some other String expression,

It might be, it might not be. For instance:

String x = readMassiveFile();
x = x.substring(23, 23);

You're now holding the massive file's characters in memory despite there
being no way to use any of them. Note that this is just a special case of
the more general problem of string packratting, where you start off with a
big string, chop any combination of smaller bits out, and throw away the
big one, which leaves the big string's characters held in memory. Of
course, java's designers knew about this, and decided the tradeoff was
still worthwhile; i'm sure they're right in the general case, although i'd
love to see some measurements.

However, while the buffer-sharing approach may make sense in general, for
the empty string, it doesn't. It would have been very easy to put a guard
clause at a suitable point in substring that did:

if (beginIndex == endIndex) return "";

That would return an empty string from the constant pool, which would not
hold the character array from 'this' (or any other non-constant-pool
string, i assume) in memory. Plus, since constant pool strings are
interned, it would mean that all empty strings returned from substring
would be identical, which would occasionally speed up comparisons. And it
would avoid constructing a new object for empty substrings. All this would
cost just one extra integer comparison in substring, so would surely
(famous last words) be worth it.
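
Since String itself cannot be patched from user code, the guard clause above can only be tried out in a wrapper. The following is a sketch under that assumption: GuardedSubstring is a made-up helper, and the bounds check is an addition of this example so that bad indexes still fail as they would in substring.

```java
// Illustrative only: GuardedSubstring is a hypothetical wrapper, not JDK code.
class GuardedSubstring {

    static String substring(String s, int beginIndex, int endIndex) {
        if (beginIndex == endIndex) {
            // Keep rejecting indexes that the real substring would reject.
            if (beginIndex < 0 || beginIndex > s.length()) {
                throw new StringIndexOutOfBoundsException(beginIndex);
            }
            // The literal "" comes from the constant pool, so it holds no
            // reference to the characters of 's'.
            return "";
        }
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String x = "pretend this is a massive file";
        String e1 = substring(x, 23, 23);
        String e2 = substring(x, 5, 5);

        System.out.println(e1.length());   // 0
        System.out.println(e1 == e2);      // true: both are the interned ""
        System.out.println(substring(x, 0, 7));  // pretend
    }
}
```

Whether a plain x.substring(23, 23) shares or copies the underlying array depends on the JDK version, so the wrapper only shows the idea rather than measuring any saving.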

tom
 

Arved Sandstrom

Tom said:
It might be, it might not be. For instance:

String x = readMassiveFile();
x = x.substring(23, 23);

You're now holding the massive file's characters in memory despite there
being no way to use any of them. Note that this is just a special case
of the more general problem of string packratting, where you start off
with a big string, chop any combination of smaller bits out, and throw
away the big one, which leaves the big string's characters held in
memory. Of course, java's designers knew about this, and decided the
tradeoff was still worthwhile; i'm sure they're right in the general
case, although i'd love to see some measurements.
[ SNIP ]

As suggested in 2001 and 2005 Sun bug reports regarding this very
behaviour of substring(...), if the implementation of substring() does
not change it would be helpful if the Javadoc for the method points out
this behaviour. I don't believe that the current behaviour is what most
users would intuitively expect.

Having said that, I don't think it's a bad default behaviour. There are
so many different permutations of how substrings are manufactured and
used that I'm doubtful that the other approach, copying the relevant
part of the original character array, is going to be better across all
use cases.

Judging by the defect report comments there are as many folks who like
the current behaviour as those who don't. There seems to be a general
acknowledgment though that the Javadocs should point out the
consequences of the implementation, and also how to get around it by
using the String(String) ctor, as Mark Space pointed out.
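
A sketch of the String(String) workaround mentioned above. The helper name detachedSubstring is invented here. The assumption is a JDK of this thread's era, where substring shares the parent's char[]; on newer JDKs (as far as I know, from 7u6 on) substring already copies, so the extra copy is harmless but redundant.

```java
// Illustrative only: DetachedSubstring and detachedSubstring are hypothetical names.
class DetachedSubstring {

    // Returns the requested substring as a fresh String object, so the result
    // does not depend on whatever array the intermediate substring used.
    static String detachedSubstring(String big, int beginIndex, int endIndex) {
        return new String(big.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        String big = "header: value, followed by a great deal of other text";
        String small = detachedSubstring(big, 8, 13);
        big = null;  // the large string is now eligible for collection

        System.out.println(small);  // value
    }
}
```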

AHS
 

Mark Space

Tom said:
if (beginIndex == endIndex) return "";

Interestingly, the no-argument constructor for String returns a new,
empty String instead of a shared instance:

public String() {
    this.offset = 0;
    this.count = 0;
    this.value = new char[0];
}


I wonder why?
 

Mike Schilling

Mark said:
Tom said:
if (beginIndex == endIndex) return "";

Interestingly, the no-argument constructor for String returns a new,
empty String instead of a shared instance:

public String() {
    this.offset = 0;
    this.count = 0;
    this.value = new char[0];
}


I wonder why?

Show us the constructor syntax that would return a shared instance. :)
(Admittedly, the zero-length char array could be shared.)
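
The point is easy to check: a 'new' expression always produces a distinct object, so no constructor body could hand back a shared one. A small sketch (the class name ConstructorIdentity is made up):

```java
// Illustrative only: ConstructorIdentity is a hypothetical name.
class ConstructorIdentity {

    public static void main(String[] args) {
        String a = new String();
        String b = new String();

        System.out.println(a.equals(b));       // true: both are empty
        System.out.println(a == b);            // false: two separate objects
        System.out.println(a == "");           // false: not the pooled literal
        System.out.println(a.intern() == "");  // true: intern() yields the pooled ""
    }
}
```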
 

Mark Space

Mike said:
Show us the constructor syntax that would return a shared instance. :)
(Admittedly, the zero-length char array could be shared.)


....


... frik ding blast Java won't let me put a "return" statement in a
ctor ... what is the matter with this language?

ok, ok, I guess I didn't think about that comment much.
 

Lew

...

  ... frik ding blast Java won't let me put a "return" statement in a
ctor ... what is the matter with this language?

ok, ok, I guess I didn't think about that comment much.

"Return" as a normal English verb applies just fine, if informally, to
the result of a ctor. In that context, one simply knows that it
doesn't mean to put a 'return' in the ctor itself, but refers to the
result of a 'new' expression with that ctor.

I guess if you want to be nitpicky, you could say the ctor "yields" an
instance, or "constructs" an instance, but really, why fret over
saying that it "returns" an instance?
 

Seamus MacRae

Peter said:
That's a false dichotomy, since most of the time, the fast algorithm
also uses less memory.

A false dichotomy would have been if he'd said programmers ALWAYS have
to choose between fast algorithms that use more memory and slow ones
that use less memory. He did not. He said they OFTEN have to, and so far
as I am aware, that much is true.
 

Seamus MacRae

Arved said:
Having said that, I don't think it's a bad default behaviour. There are
so many different permutations of how substrings are manufactured and
used that I'm doubtful that the other approach, copying the relevant
part of the original character array, is going to be better across all
use cases.

Far more important, the current design gives the programmer the choice:
they can use substring, or use new String(substring) (and maybe also
substring.intern()).

If substring() made a copy, the reverse would not be true, since Java
doesn't allow direct access to the char[] in a String*.

Judging by the defect report comments there are as many folks who like
the current behaviour as those who don't. There seems to be a general
acknowledgment though that the Javadocs should point out the
consequences of the implementation, and also how to get around it by
using the String(String) ctor, as Mark Space pointed out.

Yes, the Javadocs should do so.

* It is possible to meddle with the char[] using reflection, except in
an unsigned applet, but this is slow, awkward, and hazardous. It is also
possible to roll your own String-like class, implementing it with
whatever optimizations you choose, but it won't interoperate with code
that expects java.lang.String objects.
 

Seamus MacRae

Lew said:
"Return" as a normal English verb applies just fine, if informally, to
the result of a ctor. In that context, one simply knows that it
doesn't mean to put a 'return' in the ctor itself, but refers to the
result of a 'new' expression with that ctor.

I guess if you want to be nitpicky, you could say the ctor "yields" an
instance, or "constructs" an instance, but really, why fret over
saying that it "returns" an instance?

You're usually the one nitpicking over precision of language.

Personally, I'd have designed the language to make constructors
automatically private. Static factory methods, with actual NAMES, would
be the way to return new instances. Strings could share an instance,
and inefficient instance creation would rarely become locked in, because
the implementation of an existing static factory method can change
without breaking the API, whereas changing a ctor to a static factory
method does break it.
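
In today's Java the same effect can be had by hand. A minimal sketch, with a made-up Label class standing in for a String-like type; none of these names come from the thread or from the JDK.

```java
// Illustrative only: Label is a hypothetical String-like value class.
final class Label {

    // One shared instance for the empty case.
    private static final Label EMPTY = new Label("");

    private final String text;

    // Private ctor: callers must go through the named factories below.
    private Label(String text) {
        this.text = text;
    }

    // The factory is free to return a shared instance; a ctor never could.
    static Label of(String text) {
        if (text == null) {
            throw new IllegalArgumentException("text must not be null");
        }
        return text.length() == 0 ? EMPTY : new Label(text);
    }

    static Label empty() {
        return EMPTY;
    }

    String text() {
        return text;
    }

    public static void main(String[] args) {
        System.out.println(Label.of("") == Label.empty());  // true: shared
        System.out.println(Label.of("abc").text());         // abc
    }
}
```

If of() later started caching other values as well, callers would not need to change, which is the API-stability argument made above.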
 
