Roedy Green
I was curious how new String(byte[], encoding) could guess the
correct size of the buffer to convert into a String.
It makes an estimate based on the number of bytes times the max number
of chars per byte, an attribute of the encoding. This will be slightly
on the high side if there are any multibyte chars, but accurate for
Latin-1. It then decodes, and trims via System.arraycopy to get a
char[] of the right size. The new String then does another
System.arraycopy.
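The estimate-then-trim sequence can be sketched with the public NIO API. This is a simplification of what the JDK does internally; the class and method names here are mine:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeEstimate {
    // Sketch of the estimate-then-trim decode path described above.
    static char[] decodeAndTrim(byte[] bytes, Charset cs) {
        CharsetDecoder dec = cs.newDecoder();
        // Worst-case estimate: byte count times maxCharsPerByte.
        int estimate = (int) Math.ceil(bytes.length * dec.maxCharsPerByte());
        char[] buf = new char[estimate];
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.wrap(buf);
        dec.decode(in, out, true);
        int actual = out.position();
        // The "trim": copy into a char[] of exactly the right size.
        return actual == estimate ? buf : Arrays.copyOf(buf, actual);
    }

    public static void main(String[] args) {
        // UTF-8: the 3-byte sequence for the euro sign decodes to a
        // single char, so the estimate (3) exceeds the actual size (1)
        // and a trimming copy is needed.
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        char[] chars = decodeAndTrim(euro, StandardCharsets.UTF_8);
        System.out.println(chars.length);  // 1
    }
}
```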
You leave in your wake the original byte[], two char[] and the string.
Going the other way, String -> byte[] uses similar logic, but the
buffer size estimate is not so fortunate. For UTF-8 it makes the
conservative assumption that each char might need 3 bytes, making the
buffer 3 times bigger than it needs to be in the ordinary case.
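That worst-case factor is visible through CharsetEncoder.maxBytesPerChar(). A small sketch (class name is mine) comparing the estimate against the actual encoded size for an ordinary ASCII string:

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeEstimate {
    public static void main(String[] args) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        // The worst case the encoder plans for: 3 bytes per char.
        System.out.println(enc.maxBytesPerChar());  // 3.0

        // An ASCII string actually needs only 1 byte per char, so the
        // initial buffer is 3x larger than the final byte[].
        String s = "hello";
        int estimate = (int) (s.length() * enc.maxBytesPerChar());
        byte[] actual = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(estimate + " vs " + actual.length);  // 15 vs 5
    }
}
```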
Sun could streamline these operations to cut out the intermediate
objects.
Here's an idea. Why not allow strings and char arrays etc to
temporarily be too big. They are logically sized. Only on the next GC
do the objects get pruned to size if need be. You would save a lot of
copying and object creation just to get arrays to exactly the right
size. There would be a method to prune an array to size that just
logically chopped it and marked it for later true pruning. Most of the
time though such objects will soon be discarded, and you then get away
without ever doing the copy.
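Java arrays cannot actually be resized in place, so here is a library-level sketch of the idea: the backing char[] may be over-allocated, a length field records the logical size, and the physical trim (the copy) is deferred until actually demanded. All names here are hypothetical:

```java
import java.util.Arrays;

// Hypothetical sketch of a "logically sized" array: trimLogically is
// the cheap prune described above; compact is the deferred true prune
// that a GC pass (or a lazy caller) would perform only if needed.
public class LogicalCharArray {
    private char[] data;  // may be larger than the logical size
    private int length;   // logical size

    public LogicalCharArray(int capacity) {
        data = new char[capacity];
        length = capacity;
    }

    // "Prune" cheaply: record the logical size, no copy.
    public void trimLogically(int newLength) {
        length = newLength;
    }

    public int length() {
        return length;
    }

    // The real prune: copy only when the exact-size array is demanded.
    public char[] compact() {
        if (data.length != length) data = Arrays.copyOf(data, length);
        return data;
    }
}
```

If the object is discarded before compact() is ever called, the copy never happens, which is the saving the proposal is after.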