How are strings actually stored in the Java virtual machine?

markspace

... prints

Foo.HELLO == "Hello": true

... on my Java 1.7.0_51 system. You might also want to re-read
Section 3.10.5 of the Java Language Specification; for Java 7 and 8
it reads (in part)


Well, that I did not expect. No wonder it takes so long to load a class
if each class has to be parsed for string literals and each one added to
a hash map someplace.

Grrr, I'm actually mad about that. Strings are *required* to be
intern'd?! That's really funky.
 
Roedy Green

And certainly if you read a string from a file or the network, the VM
does not call String.intern() for you. That would be silly.

1. It is a fairly substantial operation, like a HashMap put.
2. IIRC it suppresses GC.
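
A minimal sketch of the point above, assuming a string assembled at run time stands in for one read from a file or the network. The class and method names are made up for illustration. Such a string is a distinct object from an equal literal until intern() is called explicitly:

```java
// Hypothetical demo class; not from the original thread.
class RuntimeStringDemo {
    // Builds "Hello" at run time, the way data from a file or socket would arrive.
    static String buildAtRuntime() {
        return new String(new char[] {'H', 'e', 'l', 'l', 'o'});
    }

    public static void main(String[] args) {
        String literal = "Hello";
        String built = buildAtRuntime();
        System.out.println(built == literal);          // false: two different objects
        System.out.println(built.equals(literal));     // true: same characters
        System.out.println(built.intern() == literal); // true: intern() returns the pooled instance
    }
}
```

Whether the explicit intern() call is worth its cost depends on the workload; equals() works either way.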
--
Roedy Green Canadian Mind Products http://mindprod.com
"A great lathe operator commands several times the wage of an average lathe
operator, but a great writer of software code is worth 10,000 times the
price of an average software writer."
~ Bill Gates
 
Roedy Green

Well, we do now have a couple of java.lang.String.join() methods.
And a java.util.StringJoiner class.

String message = String.join("-", "Java", "is", "cool");
// message returned is: "Java-is-cool"

There is also a version that takes an Iterable.


The String "[George:Sally:Fred]" may be constructed as follows:


StringJoiner sj = new StringJoiner( ":", "[", "]" );
sj.add( "George" ).add( "Sally" ).add( "Fred" );
String desiredString = sj.toString();

StringJoiner lives in java.util not java.lang like String.
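
For the Iterable version mentioned above, a small sketch, assuming Java 8 or later. The class and method names are invented for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical demo class; not from the original thread.
class JoinDemo {
    // String.join(CharSequence, Iterable) accepts any Iterable of CharSequence.
    static String joinWithColon(List<String> names) {
        return String.join(":", names);
    }

    // The same data with a prefix and suffix, via StringJoiner.
    static String bracketed(List<String> names) {
        StringJoiner sj = new StringJoiner(":", "[", "]");
        for (String name : names) {
            sj.add(name);
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("George", "Sally", "Fred");
        System.out.println(joinWithColon(names)); // George:Sally:Fred
        System.out.println(bracketed(names));     // [George:Sally:Fred]
    }
}
```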
 
Jan Burse

Jan said:
The bottom line could be, that we are not in position to
say what the above program does alone from the Java
Language Specification, since it is JVM implementation
dependent. But I might be wrong.

I am wrong, see also other poster who looked up JLS.

Test results so far, for a slightly modified code:

public static void main(String[] args) {
    String t1 = A.getT();
    String t2 = B.getT();
    System.out.println(t1 == t2);
}

jdk1.5.0_22: true
jdk1.6.0_45: true
jdk1.7.0_51: true
jdk1.8.0: true
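
The classes A and B are not shown in the post. A self-contained sketch of such a test, assuming each getT() simply returns the same string literal (the literal "text" and the demo class name are my own choices), might look like this:

```java
// Assumed shape of the two classes; the original bodies are not in the thread.
class A {
    static String getT() { return "text"; }
}

class B {
    static String getT() { return "text"; }
}

// Hypothetical driver class.
class LiteralIdentityDemo {
    // Literals with equal contents refer to one String instance (JLS 3.10.5),
    // even when they appear in different classes.
    static boolean sameInstance() {
        return A.getT() == B.getT();
    }

    public static void main(String[] args) {
        System.out.println(sameInstance()); // true
    }
}
```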
 
Eric Sosman

Well, that I did not expect. No wonder it takes so long to load a class
if each class has to be parsed for string literals and each one added to
a hash map someplace.

There's surely no parsing to speak of. The string literals'
data arrives in the class' constant pool, along with a lot of
other stuff, and is pretty easily located therein. I suspect
that the time required to fish them out and intern() them is
peanuts compared to the time spent on the rest of class loading
and initialization (but I have absolutely no data to support my
suspicion). And then there's the JIT ...

Grrr, I'm actually mad about that. Strings are *required* to be
intern'd?! That's really funky.

I think so, too. It feels like an implementation detail
that might better have been left unspecified. Similarly, I
feel that the span of pooled Integer and suchlike objects ought
not to be part of the language definition, but should have been
left to the discretion of a JVM's implementors.
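
On the point about literals arriving through the constant pool: constant expressions of type String get the same treatment as plain literals, while concatenation done at run time produces a new object. A rough sketch, with class and member names invented for illustration:

```java
// Hypothetical demo class; not from the original thread.
class ConstantExpressionDemo {
    // A final field initialized with a literal is a constant variable.
    static final String LO = "lo";

    // "Hel" + LO is a compile-time constant expression, so the result is
    // the same interned instance as the literal "Hello".
    static boolean foldedAtCompileTime() {
        return ("Hel" + LO) == "Hello";
    }

    // A parameter is not a constant, so this concatenation happens at run
    // time and yields a newly created String.
    static boolean concatenatedAtRunTime(String suffix) {
        return ("Hel" + suffix) == "Hello";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());       // true
        System.out.println(concatenatedAtRunTime("lo")); // false
    }
}
```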
 
taqmcg

....
I think so, too. It feels like an implementation detail
that might better have been left unspecified.

But that would mean that the behavior of programs that compared constant strings was subject to subtle implementation dependencies, anathema to the spirit behind the development of Java. I'd suggest the fact that the designers spelled this out is a testament to the concern they had for making Java a really portable language and to the thoroughness of their analysis.

Tom McGlynn
 
Eric Sosman

But that would mean that the behavior of programs that compared constant strings was subject to subtle implementation dependencies, anathema to the spirit behind the development of Java. I'd suggest the fact that the designers spelled this out is a testament to the concern they had for making Java a really portable language and to the thoroughness of their analysis.

Yes, it would change the behavior -- of programs that relied
on the dubious practice of comparing "value" objects with == rather
than with equals(). All I'm saying is that I think it would have
been better not to encourage the dubious practice. Too late now,
of course.

As for "subtle implementation dependencies" -- Well, Java is
certainly not free of them. It has fewer such dependencies than
many other languages I've used, but still has enough to make life
interesting.
 
taqmcg

Yes, it would change the behavior -- of programs that relied
on the dubious practice of comparing "value" objects with == rather
than with equals(). All I'm saying is that I think it would have
been better not to encourage the dubious practice. Too late now,
of course.

As for "subtle implementation dependencies" -- Well, Java is
certainly not free of them. It has fewer such dependencies than
many other languages I've used, but still has enough to make life
interesting.

Personally I've been very impressed with how carefully consistency of execution is addressed in the Java design. The only places I've seen real implementation dependency are in the interaction with the world external to the Java environment (notably the file system) and in multithreading. Not sure that there's really anything one can do about the first. I don't use multithreaded tasks much, but I gather the technology was not sufficiently developed to fully specify this. Regardless, my sense is that most of the variability in multithreaded applications is not so much differences in the implementation of threading as in the interactions between the threads and the scheduler, which again might be thought of as something external.

I'd be interested in understanding other areas where Java programs can legally give different answers absent some difference in external inputs. The only one I'm aware of is the non-strict handling of floating point. They tried to enforce consistency there originally -- strict was the original requirement, but the performance cost there was simply too great and the tiny degree of implementation dependency that's allowed there now affects a minuscule fraction of calculations.

My sense of the spec was not that they wanted to encourage users to use == to compare constant strings -- nor to discourage it -- but they recognized that it was a legal operation and so that it needed a defined result. From that perspective they might have chosen that constants in different classes would use different instances. I don't know that that would be a better or worse choice, but either would -- from the perspective of Write-once, run many times -- be better than leaving it unspecified. I used the word anathema above, and I think that was appropriate for their (i.e., the designers of Java) view of leaving something undefined as a mechanism to discourage its use.


Regards,
Tom McGlynn
 
Eric Sosman

Personally I've been very impressed with how carefully consistency of execution is addressed in the Java design. The only places I've seen real implementation dependency are in the interaction with the world external to the Java environment (notably the file system) and in multithreading. Not sure that there's really anything one can do about the first. I don't use multithreaded tasks much, but I gather the technology was not sufficiently developed to fully specify this. Regardless, my sense is that most of the variability in multithreaded applications is not so much differences in the implementation of threading as in the interactions between the threads and the scheduler, which again might be thought of as something external.

I'd be interested in understanding other areas where Java programs can legally give different answers absent some difference in external inputs. The only one I'm aware of is the non-strict handling of floating point. They tried to enforce consistency there originally -- strict was the original requirement, but the performance cost there was simply too great and the tiny degree of implementation dependency that's allowed there now affects a minuscule fraction of calculations.

My sense of the spec was not that they wanted to encourage users to use == to compare constant strings -- nor to discourage it -- but they recognized that it was a legal operation and so that it needed a defined result. From that perspective they might have chosen that constants in different classes would use different instances. I don't know that that would be a better or worse choice, but either would -- from the perspective of Write-once, run many times -- be better than leaving it unspecified. I used the word anathema above, and I think that was appropriate for their (i.e., the designers of Java) view of leaving something undefined as a mechanism to discourage its use.

As I said, Java behaves more consistently across platforms than
other languages in my experience. "Write Once, Run Anywhere" was a
goal of the language, but like "Don't Be Evil" it's not a goal that
was attained in perfection. A few examples of variability:

- Integer.valueOf(int): "This method will always cache values in
the range -128 to 127, inclusive, and *may* [emphasis mine]
cache other values outside of this range." Hence, a test
like `System.identityHashCode(Integer.valueOf(200)) ==
System.identityHashCode(Integer.valueOf(200))' may yield
either true or false, depending on the implementation. Much
the same holds for other primitive wrapper classes, too.

- HashMap: "This class makes no guarantees as to the order of
the map; in particular, it does not guarantee that the order
will remain constant over time." Also, "Note that the fail-fast
behavior of an iterator cannot be guaranteed."

- Map: "The behavior of a map is not specified if the value of
an object is changed in a manner that affects equals comparisons
while the object is a key in the map." Also, "Implementations
are free to implement optimizations whereby the equals invocation
is avoided, for example, by first comparing the hash codes of the
two keys," so you might *or might not* hit a breakpoint (etc.)
in an equals() method when searching or inserting.

- Class.getMethods(): "The elements in the array returned are
not sorted and are not in any particular order." You might
get differently-ordered arrays for the same class from
different JVM's, and the same goes for getConstructors()
and so on, too.

- Evaluate `Object.class.hashCode() > System.class.hashCode()'.
Discuss.

I'll grant that a "sane" program would not be affected by these
or similar implementation dependencies. It seems to me, though,
that a "sane" program wouldn't compare String instances with `=='.
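
A rough sketch of how a program can stay clear of two of the dependencies listed above (HashMap iteration order and the order of Class.getMethods()), assuming the caller only needs some stable order rather than a particular one. The class and method names are invented for the example:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical demo class; not from the original thread.
class DeterministicOrderDemo {
    // HashMap promises no iteration order, so sort the keys before relying on it.
    static List<String> sortedKeys(Map<String, Integer> map) {
        List<String> keys = new ArrayList<>(map.keySet());
        Collections.sort(keys);
        return keys;
    }

    // getMethods() returns an unordered array; collect the distinct names in sorted order.
    static List<String> sortedMethodNames(Class<?> type) {
        TreeSet<String> names = new TreeSet<>();
        for (Method m : type.getMethods()) {
            names.add(m.getName());
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("b", 2);
        map.put("a", 1);
        map.put("c", 3);
        System.out.println(sortedKeys(map));                   // [a, b, c]
        System.out.println(sortedMethodNames(Runnable.class)); // [run]
    }
}
```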
 
taqmcg

On 3/28/2014 9:58 AM, taqmcg@...
As I said, Java behaves more consistently across platforms than
other languages in my experience. "Write Once, Run Anywhere" was a
goal of the language, but like "Don't Be Evil" it's not a goal that
was attained in perfection. A few examples of variability:
.... examples delete ...
Thanks Eric. The first is pretty close to the situation with strings. But it's interesting to note that these are all (I think) in the implementation of methods so that the documentation of these system dependencies is not in the language standard itself, but in the Javadocs for the standard library. No methods are involved in the == comparison of strings, so if it were to be system dependent, then presumably that would have to be noted in the JLS itself. Are there examples where the system dependencies are explicitly called out in the JLS, i.e., similar to allowing non-strict math results?

Regards,
Tom McGlynn
 
Eric Sosman

... examples delete ...
Thanks Eric. The first is pretty close to the situation with strings. But it's interesting to note that these are all (I think) in the implementation of methods so that the documentation of these system dependencies is not in the language standard itself, but in the Javadocs for the standard library. No methods are involved in the == comparison of strings, so if it were to be system dependent, then presumably that would have to be noted in the JLS itself. Are there examples where the system dependencies are explicitly called out in the JLS, i.e., similar to allowing non-strict math results?

I dunno. In Original Java the == operator behaved the same way for
all reference types (it asked "Do these two references denote the same
instance?"), so the results of using == on two Integer references, for
example, could differ depending on how many values the particular JVM
chose to cache. But then they changed the meaning of == for the
primitive wrappers, so that difference went away (and that's why I
mentioned using System.identityHashCode() in the comparison).

It strikes me, though, that you seem to argue Java's platform
independence by progressively whittling away the dependencies that
arise. You began with "We'll just ignore strictfp" and "Oh, let's not
worry about anything involving the host environment," and now it seems
"Methods don't count, either." With an approach like this, I might
well argue that all birds are crows: We'll just agree not to consider
peacocks, and emus don't fly so they scarcely count, and finches aren't
big enough to worry with, and ... eventually I'll have proved universal
crowhood! ;-)
 
taqmcg

I dunno. In Original Java the == operator behaved the same way for
all reference types (it asked "Do these two references denote the same
instance?"), so the results of using == on two Integer references, for
example, could differ depending on how many values the particular JVM
chose to cache. But then they changed the meaning of == for the
primitive wrappers, so that difference went away (and that's why I
mentioned using System.identityHashCode() in the comparison).


It strikes me, though, that you seem to argue Java's platform
independence by progressively whittling away the dependencies that
arise. You began with "We'll just ignore strictfp" and "Oh, let's not
worry about anything involving the host environment," and now it seems
"Methods don't count, either." With an approach like this, I might
well argue that all birds are crows: We'll just agree not to consider
peacocks, and emus don't fly so they scarcely count, and finches aren't
big enough to worry with, and ... eventually I'll have proved universal
crowhood! ;-)

Maybe! I'm not trying to weasel out of this. I think it's useful to understand the limits of the WORM mantra.

Non-strict math is certainly an example where one is allowed to have system-dependent results based solely upon the language of the current JLS. But in terms of the design process for the language, that wasn't because they didn't want to eliminate that dependency. Indeed the original spec precluded non-strict math. It was only when they faced the issue that the spec was not feasibly implemented (in the sense that this requirement drastically affected floating point performance on most existing systems) that they changed the spec. And even then they did not generally allow extended precision calculations, which would have potentially affected large numbers of results, but only calculations that used extended ranges in the exponents.

So I think a reasonable question that's not a "who is a real Scotsman?" is "Are there areas where the JLS itself allows a system dependency that was overlooked in the design of the language?" Non-strict math wasn't overlooked. It was explicitly added for feasibility. A second question would be "What system dependencies are explicitly permitted by the JLS?" That would include non-strict math, at least for versions > 1.0.

With autoboxing one now has the same issue with constant Integer fields that we had with String (without involving any methods):

class A {
    // static, so that A.xxx can be referenced without an instance
    public static Integer xxx = 1111;
}

class B {
    public static Integer yyy = 1111;
}

I hadn't checked but would have presumed we had a defined result for

A.xxx == B.yyy

regardless of its magnitude.

But that's not the case. The actual language in the JLS that I see (5.1.7) is:
If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, this formulation disallows any assumptions about the identity of the boxed values on the programmer's part. This would allow (but not require) sharing of some or all of these references.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.
<<<End quote

So somewhat to my surprise, the result is not well defined, but it's pretty clear that this is not an unconsidered system dependency, but a second case that the designers were fully cognizant of and explicitly decided to permit for expediency but were not especially happy about.
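
A small sketch of what the quoted 5.1.7 text means in practice. The class and field names are made up (BoxA and BoxB stand in for two unrelated classes), and no output is claimed for the comparison that the JLS leaves open:

```java
// Hypothetical holder classes; not from the original thread.
class BoxA {
    static Integer small = 100;   // inside the -128..127 range
    static Integer large = 1111;  // outside the range
}

class BoxB {
    static Integer small = 100;
    static Integer large = 1111;
}

// Hypothetical driver class.
class BoxingIdentityDemo {
    // Guaranteed by JLS 5.1.7: boxing 100 twice yields the same reference.
    static boolean smallSameReference() {
        return BoxA.small == BoxB.small;
    }

    // Value comparison is always well defined, whatever the cache does.
    static boolean largeEqualValues() {
        return BoxA.large.equals(BoxB.large);
    }

    public static void main(String[] args) {
        System.out.println(smallSameReference()); // true
        System.out.println(largeEqualValues());   // true
        // Left unspecified by the JLS: may print true or false depending on the JVM.
        System.out.println(BoxA.large == BoxB.large);
    }
}
```

Comparing with equals() (or unboxing with intValue()) sidesteps the question of which values a given implementation caches.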

I'd be interested in understanding other areas in the JLS where either the language is unintentionally underspecified, or where there is an explicit deferral to the system.

Regards,
Tom McGlynn
 
