impl a collection

Chris Smith

Roedy said:
Hmm. The String class code has changed. The new code avoids pinning
large blocks of RAM, but it makes more System.arraycopy calls.

Nope, not the case, unless you're thinking of something different from
me. The issue here has always been that String.substring resulted in
saving the entire character data from the original String, and that
still happens. The code you quoted:
    public String substring(int beginIndex, int endIndex) {
        [argument checking]
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

Note that, assuming this isn't the trivial case of a no-op substring,
the functionality comes down to the constructor String(int,int,char[]).
Perhaps you're confusing that with String(char[],int,int) -- but they
are really quite different. The former constructor, which is used here,
does not copy the data at all. It merely reuses the same array and
creates a new String with a different offset and length, so the entire
original character array stays referenced.

So the same problem still exists.
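
For contrast, here is a minimal sketch using only the public API. The package-private String(int,int,char[]) constructor cannot be called from user code, but the public String(char[],int,int) constructor is documented to copy its input, and that can be observed by changing the source array afterwards. The class and method names are made up for illustration.

```java
final class CopyingConstructorDemo {
    /** Builds a String from part of an array via the public, copying constructor. */
    static String copyOf(char[] source, int offset, int count) {
        return new String(source, offset, count);
    }

    public static void main(String[] args) {
        char[] data = {'h', 'a', 'm', 'b', 'u', 'r', 'g', 'e', 'r'};
        String s = copyOf(data, 4, 4);   // characters at indexes 4..7
        data[4] = 'X';                   // change the source array afterwards
        System.out.println(s);           // still "urge": the String holds its own copy
    }
}
```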

The constructor that you quote, on the other hand, does make a copy if
the underlying data is larger than what is desired. That's a special
feature of that constructor, not something that's inherent to all
constructors.

The question now is, when did it change?

And the answer, I think, is that it never did.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Roedy Green

Roedy: I hate to mention this, but you're one of those people.
You recently ranted against XML. We can now justify using a text-
based interchange format precisely because the costs of memory
and bandwidth have dropped.

Bandwidth may be cheap, but it is nowhere near fast enough for
psychological comfort. We need to stop washing our feet in bandwidth
and treat it like a precious resource, the way we used to treat RAM,
for at least another 5 years.

Even if you feel comfy with a T1 most of your customers are getting by
with much less.
 
Sudsy

Roedy said:
Now there's another asinine idea -- ASCII armouring on 8-bit
transparent channels.

Why? It avoids many of the pitfalls associated with vendor-specific
protocol implementations. Does a file termination flag consist of
^Z, a NULL, a null byte, or none of the above?
Would you prefer a mix of text and binary? Even MIME uses encoding
rules. Do you wish to revert to the days when you couldn't attach
data in any format but text?
PKC using asymmetric keys makes perfect sense to me.
I guess I just don't understand your dismissal of XML. While you
rail against it, you unintentionally provide justification for its
popularity.
Quite obviously YMMV.
 
Roedy Green

XML is flagrant conspicuous waste. See Veblen's Theory of the Leisure
Class.
http://www.amazon.com/exec/obidos/ASIN/0140187952/canadianmindprod

XML is waste for the sake of waste.

I find it morally repugnant.

I have got to put together a proper XML logo. It will be one of those
grossly obese Hawaiian women who were kept in huts and fed huge
amounts of poi to make them fat as possible. They were so fat they
could not move and had to be hauled about on slings. Remember Alii Nui
Queen Malama in Hawaii?

I tease the XML creators with a purported photo of the inventors on my
website at http://mindprod.com/jgloss/xml.html
 
Tony Morris

Note that new String does not necessarily make a copy, though it does
create a new String object.

Of course it does.
String is immutable (JLS 3.10.5) (almost:
http://www.xdweb.net/~dibblego/java/trivia/answers.html#q1)
hence the ability to immediately deduce that the original statement was very
blatantly incorrect.

--
Tony Morris
(BInfTech, Cert 3 I.T.)
Software Engineer
(2003 VTR1000F)
Sun Certified Programmer for the Java 2 Platform (1.4)
Sun Certified Developer for the Java 2 Platform
 
Roedy Green

return ((beginIndex == 0) && (endIndex == count)) ? this :
new String(offset + beginIndex, endIndex - beginIndex,

This means a technique of coding I often use that used to be very
efficient is now very costly.

I would process a giant string in chunks and keep carving a tiny
chunk off the head in a loop, like this:


rest = rest.substring( smalloffset );


Now I will instead have to keep track of where I am in the giant
string and compute every offset relative to the very start.

I was wondering why it was doing GC so frequently.
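
The two coding styles described here can be sketched as follows. This is only an illustration: the class name, method names, and fixed chunk size are assumptions, not code from the thread. Both methods produce the same chunks. Which one is cheaper depends on whether substring shares or copies the underlying char[] in the String implementation being used.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkDemo {
    /** Style 1: repeatedly replace 'rest' with its own tail via substring. */
    static List<String> carveWithSubstring(String text, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<String> chunks = new ArrayList<>();
        String rest = text;
        while (!rest.isEmpty()) {
            int n = Math.min(chunkSize, rest.length());
            chunks.add(rest.substring(0, n));
            rest = rest.substring(n);          // carve the chunk off the head
        }
        return chunks;
    }

    /** Style 2: keep one index into the original string and never re-slice the tail. */
    static List<String> carveWithIndex(String text, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<String> chunks = new ArrayList<>();
        for (int pos = 0; pos < text.length(); pos += chunkSize) {
            int end = Math.min(pos + chunkSize, text.length());
            chunks.add(text.substring(pos, end));  // offsets relative to the very start
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(carveWithSubstring("abcdefgh", 3));  // [abc, def, gh]
        System.out.println(carveWithIndex("abcdefgh", 3));      // [abc, def, gh]
    }
}
```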
 
Roedy Green

Why? It avoids many of the pitfalls associated with vendor-specific
protocol implementations.

The current scheme is a protocol everyone has to use, and it is
inefficient in terms of bandwidth, and is clumsy to process. You
could simply send the data down the wire in binary without farting
around with layers of wrapping. If you have an error-correcting
protocol it is not as if you need to scan for start of record.

We are using protocols that are solving problems that don't exist
anymore, e.g. 7 bit channels, and lost data characters.

You could treat sockets almost as casually as files.
 
Roedy Green

Of course it does.
String is immutable (JLS 3.10.5) (almost:
http://www.xdweb.net/~dibblego/java/trivia/answers.html#q1)
hence the ability to immediately deduce that the original statement was very
blatantly incorrect.

Not so. Substring USED to simply record an offset into the base
immutable char array. It did not need to make a copy. This is part of
the beauty of Java. Sun was able to change the way this worked without
me noticing.

This all worked because Strings are immutable.
 
Chris Smith

Roedy said:
This means a technique of coding I often use that used to be very
efficient is now very costly.

Roedy, you may want to read my response before you go change all your
code. There has been no change in this implementation detail.

 
Roedy Green

Roedy, you may want to read my response before you go change all your
code. There has been no change in this implementation detail.

Since which release are you sure it has not changed? I clearly recall
studying the code and figuring this all out. I have been around a long
time. This may be way back to Java 1.0.1 or 1.0.2
 
Tony Morris

Not so. Substring USED to simply record an offset into the base
immutable char array. It did not need to make a copy. This is part of
the beauty of Java. Sun was able to change the way this worked without
me noticing.

This all worked because Strings are immutable.

First, I always assume that we are talking about the Java programming
language (isn't that the topic of this forum?).
This language is strictly defined by several specifications, including a
language specification, an API specification, and a VM specification.
These specifications make it clear (by omission) that the original statement
is incorrect - I can't see how this can be argued with.
The statement might better (but still not good) have been rephrased as "The
Sun Java implementation ..." (based on your assumption) since otherwise,
people like me, assume that we are talking about Java, and conclude that the
statement is quite obviously incorrect.

Writing code that relies on some implementation detail is generally poor form.
Making recommendations without qualifying which implementation detail it is
that you are relying on is even worse.
Making recommendations that begin with the statement "start by mutating your
strings" is again, even worse and discredits the remainder of the
recommendation.

Given the level of inaccuracy of the statements made (all of them), the
general conclusion is that the recommendation is incorrect.

One thing that I might also point out is that an "immutable char array"
simply does not exist - never has - Arrays are always mutable.

 
Chris Smith

Roedy said:
Since which release are you sure it has not changed? I clearly recall
studying the code and figuring this all out. I have been around a long
time. This may be way back to Java 1.0.1 or 1.0.2

I'm sure that in 1.4.2 and build 1.5.0-beta-b31, String.substring does
not copy any of the character data of the original String. It's
possible that some minor aspect of the way this is done has changed,
because I didn't go off in search of older code to analyse... but in any
case, your recent statements (that modern Java does copy String contents
on substring) are not accurate, and your code that slices off pieces of
a larger String using substring is just as efficient as it always was.

 
Roedy Green

I'm sure that in 1.4.2 and build 1.5.0-beta-b31, String.substring does
not copy any of the character data of the original String. It's
possible that some minor aspect of the way this is done has changed,
because I didn't go off in search of older code to analyse... but in any
case, your recent statements (that modern Java does copy String contents
on substring) are not accurate, and your code that slices off pieces of
a larger String using substring is just as efficient as it always was.

I checked in 1.4.2_04 and discovered I am indeed wrong. On the other
hand I distinctly recall looking through the code for String, intern,
StringBuffer etc to learn how they all worked. I was surprised to find
they use this pinning mechanism. It seems unlikely the pinning
mechanism is something I conjured up in a dream. What I would like to
find out is WHEN it switched.
 
Chris Smith

Roedy said:
I checked in 1.4.2_04 and discovered I am indeed wrong. On the other
hand I distinctly recall looking through the code for String, intern,
StringBuffer etc to learn how they all worked. I was surprised to find
they use this pinning mechanism. It seems unlikely the pinning
mechanism is something I conjured up in a dream. What I would like to
find out is WHEN it switched.

I'm still not sure we're communicating. The "pinning mechanism" *does*
exist. That's what I'm trying to say. All this stuff that you're
wondering when you dreamed up is really there, and as far as I know it's
always been there.

I suspect you read the code correctly some time back, but are missing
something now. What's incorrect here is your belief that it's gone now.
When you first said it was gone, you posted code that calls one
constructor, and the source code to a completely different (and
unrelated) constructor as evidence. I believe you've just made a simple
mistake in tracing the code.

Unless, of course, I don't understand what you mean by "pinning
mechanism" in the first place...

 
Roedy Green

Writing code that relies on some implementation detail is generally poor form.

Java is designed so you CAN'T write code based on some implementation
detail. With the pinning issue at hand, there are two ways to write code.
One works faster with a pinning substring implementation, the other
with the new non-pinning implementation. You might as well pick the
style that fits you current JVM. What point would there be in picking
the slow style if both are roughly the same difficulty to code?
 
Roedy Green

One thing that I might also point out is that an "immutable char array"
simply does not exist - never has - Arrays are always mutable.

Not the char[] array inside a String. There is no method that exists to
modify it and no way to add one.

They are as immutable as the primitive ints inside an Integer, which
normally also are mutable.
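
A minimal sketch of that idea, with a made-up class name: the array itself remains an ordinary mutable Java array, but because it is private, defensively copied, and exposed through no mutating method, callers cannot change it through the normal API. Reflection is ignored here.

```java
final class FixedChars {
    private final char[] value;   // a mutable array that never escapes this class

    FixedChars(char[] source) {
        this.value = source.clone();   // defensive copy: later changes to 'source' are not seen
    }

    char charAt(int index) {
        return value[index];
    }

    int length() {
        return value.length;
    }

    /** Hands out a copy, so callers cannot reach the internal array. */
    char[] toCharArray() {
        return value.clone();
    }

    public static void main(String[] args) {
        char[] source = {'a', 'b', 'c'};
        FixedChars fixed = new FixedChars(source);
        source[0] = 'X';                 // does not affect 'fixed'
        fixed.toCharArray()[1] = 'Y';    // modifies only the returned copy
        System.out.println(fixed.charAt(0) + "" + fixed.charAt(1));
    }
}
```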
 
Roedy Green

Unless, of course, I don't understand what you mean by "pinning
mechanism" in the first place...

I have taken another look. You are right. I did not make this up.
Base strings are STILL pinned.


When I took a detailed look at this a long time ago, I discovered that
substring worked NOT by copying a string of characters and creating a
new String object with its own internal char[]. Instead it just
pointed to the BASE string's char[] object, and recorded an offset and
length.

If you then took a short substring of a giant string, and dropped all
references to the giant string, the giant String (or at least its
char[] guts) was pinned in RAM by the internal reference of the new
substring.

You could break this pinning/encumbrance by doing a new String on the
substring, which forces a copy and frees it from ongoing dependence on
the base.

New String is smart enough to avoid the copy if the substring and the
full string are identical. There is no benefit in breaking the pin,
and a penalty of creating a duplicate.
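
As a rough sketch of the idiom, with hypothetical class and method names: wrap the substring in new String(...) before dropping the reference to the giant string. Whether this is needed depends on the String implementation. Where substring shares the base array it releases the base, and where substring already copies it is merely redundant.

```java
final class DetachDemo {
    /** Returns the first 'length' characters of 'big' as a String requested via new String(...). */
    static String detachedPrefix(String big, int length) {
        return new String(big.substring(0, length));
    }

    public static void main(String[] args) {
        // Build a large string purely for demonstration.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000000; i++) {
            sb.append('x');
        }
        String giant = sb.toString();

        String small = detachedPrefix(giant, 10);
        giant = null;   // the giant string is no longer referenced from here
        System.out.println(small.length());
    }
}
```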


The code in there TODAY for substring looks like this:

    /**
     * Returns a new string that is a substring of this string. The
     * substring begins at the specified <code>beginIndex</code> and
     * extends to the character at index <code>endIndex - 1</code>.
     * Thus the length of the substring is <code>endIndex-beginIndex</code>.
     * <p>
     * Examples:
     * <blockquote><pre>
     * "hamburger".substring(4, 8) returns "urge"
     * "smiles".substring(1, 5) returns "mile"
     * </pre></blockquote>
     *
     * @param beginIndex the beginning index, inclusive.
     * @param endIndex the ending index, exclusive.
     * @return the specified substring.
     * @exception IndexOutOfBoundsException if the
     *            <code>beginIndex</code> is negative, or
     *            <code>endIndex</code> is larger than the length of
     *            this <code>String</code> object, or
     *            <code>beginIndex</code> is larger than
     *            <code>endIndex</code>.
     */
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }


There is no sign of the old pinning logic there.

Further one-parm substring is defined in terms of that with:

    public String substring(int beginIndex) {
        return substring(beginIndex, count);
    }

Again no sign of the alleged pinning.


However, if you look at the hidden String constructor used internally,
you see:

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

There you are! Creating the substring with a pin!!! Just as I
remembered. So things have not changed, perhaps just been rearranged a
little. Substring DOES pin the base String's char[] array in RAM. The
pinning is done in the hidden String constructor used by substring.
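
To make the mechanism concrete, here is a small stand-alone model of the same offset/count design. The Slice class and its methods are hypothetical and are not the JDK source. Its substring shares the base array the way the quoted constructor does, and compact() shows the explicit copy that releases the base.

```java
import java.util.Arrays;

final class Slice {
    private final char[] value;   // possibly shared with other Slice objects
    private final int offset;
    private final int count;

    Slice(String s) {
        this(0, s.length(), s.toCharArray());
    }

    // Mirrors the sharing constructor quoted above: stores the array, copies nothing.
    private Slice(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    Slice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return (beginIndex == 0 && endIndex == count)
                ? this
                : new Slice(offset + beginIndex, endIndex - beginIndex, value);
    }

    /** Copies just the visible characters, so the big base array is no longer referenced. */
    Slice compact() {
        return new Slice(0, count, Arrays.copyOfRange(value, offset, offset + count));
    }

    /** Size of the backing array this slice keeps reachable. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        Slice small = new Slice("hamburger").substring(4, 8);
        System.out.println(small + " retains " + small.retainedChars());
        System.out.println(small.compact() + " retains " + small.compact().retainedChars());
    }
}
```

In this model the four-character slice keeps all nine characters of the base array reachable until compact() is called.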
 
Tony Morris

Roedy Green said:

Java is designed so you CAN'T write code based on some implementation
detail.

Not entirely accurate - since, you certainly can.
"Java has a design intention of ..." is more accurate.
With the pinning issue at hand, there are two ways to write code.
One works faster with a pinning substring implementation, the other
with the new non-pinning implementation. You might as well pick the
style that fits you current JVM. What point would there be in picking
the slow style if both are roughly the same difficulty to code?

That's the point - there is no concept of "you [sic] current JVM" anywhere
in this thread, except for an assumption with no real basis.

I assumed Java - correct me if I am wrong, but that is the topic of this
forum.

 
