Try using a direct byte buffer which has been pre-allocated. Direct
buffers are allocated outside the Java heap (using malloc()?), so the
allocation cost is high. They only really provide a performance boost
when re-used.
That of course depends on the usage scenario, e.g. the frequency of
allocation. If you serve long-lasting connections, the cost of
allocating and freeing a DirectByteBuffer is negligible, and other
reasons may carry more weight in the decision between a direct and a
heap buffer (e.g. whether access to the backing byte[] can make things
faster, as I assume is the case in the decoding tests I posted upthread).
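To make the reuse pattern concrete, here is a minimal sketch of a pre-allocated direct buffer being cleared and refilled across calls (class name, method name and the 8 KiB capacity are my own choices for illustration, not from this thread):

```java
import java.nio.ByteBuffer;

public class DirectBufferReuse {
    // Allocate the expensive direct buffer once; reuse it for every I/O call.
    private static final ByteBuffer BUF = ByteBuffer.allocateDirect(8192);

    public static int fillAndDrain(byte[] src) {
        BUF.clear();                                      // reset position/limit for reuse
        BUF.put(src, 0, Math.min(src.length, BUF.capacity()));
        BUF.flip();                                       // switch from filling to draining
        int n = BUF.remaining();
        // ... here one would hand BUF to channel.write(BUF) or similar ...
        return n;
    }

    public static void main(String[] args) {
        System.out.println(fillAndDrain(new byte[100])); // prints 100
    }
}
```

The point is that the malloc-like allocation cost is paid once; clear()/flip() only reset indices.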
Also, if you dig into the internals you will find that a read into a
heap buffer from a file or socket will copy the bytes through a direct
buffer anyway, and that Java does its own internal pooling/re-allocation
of direct buffers.
Can you point me to more information about this? Or are you referring
to OpenJDK's source code?
Often benchmarks will say heap buffers are faster, because they
allocate a buffer, read some data, and then let the buffer be garbage
collected.
I think heap byte buffers were faster in my tests (see upthread) not
because of allocation and GC (these were not included in the time
measurement) but because data crossed the boundary between non-heap
memory (where the bytes arrive from the OS) and the Java heap less
frequently, thanks to the larger batches. If you have to fetch
individual bytes from an off-heap ByteBuffer, you make that transition
much more frequently.
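The two access patterns I mean can be illustrated like this (class and method names are made up for this sketch): per-byte get() hops the off-heap boundary once per byte, while a single bulk get(byte[]) copies the batch once and then works on the heap.

```java
import java.nio.ByteBuffer;

public class BulkVsSingleGet {
    // One boundary crossing per byte.
    public static long sumPerByte(ByteBuffer buf) {
        buf.rewind();
        long sum = 0;
        while (buf.hasRemaining()) sum += buf.get();
        return sum;
    }

    // One bulk copy into a heap byte[], then pure on-heap work.
    public static long sumBulk(ByteBuffer buf) {
        buf.rewind();
        byte[] batch = new byte[buf.remaining()];
        buf.get(batch);                                   // single bulk transfer
        long sum = 0;
        for (byte b : batch) sum += b;
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {1, 2, 3, 4}).flip();
        System.out.println(sumPerByte(direct) == sumBulk(direct)); // prints true
    }
}
```

Both return the same result; only the number of off-heap accesses differs.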
In the heap buffer case, the internal direct buffer pool
is being used. In the direct buffer case, a new one is being
allocated each time, which is slow.
Can you point me to any writeup about that internal byte buffer pool in
the JRE? I could not find anything.
I may be wrong but... are the byte get()/put() calls not trapped by
some compiler intrinsics and optimized away?
DirectByteBuffer.get() contains a native call to fetch the byte, and I
don't think the JIT will optimize native calls away. The JRE simply has
no insight into what a JNI call does.
Allocation costs
and memory copying need to be avoided as much as possible.
While I agree with the general tendency of that statement ("allocation
costs"), I believe one needs to be very careful with it nowadays. For
example, if you share immutable data structures across threads, which
requires copying during manipulation, the allocation and GC cost of
that copying may well be smaller than the cost of locking in a more
traditional approach. The correct answer is "it depends" all too often -
which is disappointing but ultimately more helpful.
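A rough sketch of the trade-off I mean, assuming a compare-and-swap on an immutable list instead of a lock (all names here are my own, purely illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class CowIntList {
    // Readers share the immutable list freely; writers allocate a copy
    // and swap it in with CAS instead of taking a lock.
    private final AtomicReference<List<Integer>> ref =
            new AtomicReference<>(List.of());

    public void add(int value) {
        ref.updateAndGet(old -> {
            List<Integer> copy = new ArrayList<>(old);
            copy.add(value);
            return List.copyOf(copy);   // short-lived garbage, no lock held
        });
    }

    public List<Integer> snapshot() {
        return ref.get();               // lock-free, always consistent
    }

    public static void main(String[] args) {
        CowIntList l = new CowIntList();
        l.add(42);
        System.out.println(l.snapshot()); // prints [42]
    }
}
```

Whether the copying beats a lock depends on read/write ratio and list size - "it depends" again.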
That said, zero copy IO is still largely a myth.
I guess the best you can get is reading from a memory-mapped file and
writing those bytes directly to another channel, i.e. without the bytes
ever entering the Java heap. Of course, only a limited set of use cases
fits this model.
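That model can be sketched like this: map the source file and hand the mapped region straight to the destination channel, so no heap byte[] is involved (class and method names are hypothetical):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedCopy {
    // Map the source file read-only and write the mapped region to the
    // destination channel; the bytes never pass through a heap array.
    public static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            MappedByteBuffer map = in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
            long written = 0;
            while (map.hasRemaining()) written += out.write(map);
            return written;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("map", ".src");
        Path dst = Files.createTempFile("map", ".dst");
        Files.write(src, "hello zero copy".getBytes());
        System.out.println(copy(src, dst)); // prints 15
    }
}
```

FileChannel.transferTo() covers a similar case and may use sendfile() on some platforms, but how close either gets to true "zero copy" depends on the OS.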
Cheers
robert