Implementing high-performance cache in Java


skvarenina.peter

Hello,

I would like to ask for your expert opinion regarding high-performance
cache implementation in Java.

My goal is to create a cache implementation that can handle a large
number of concurrent reads (preferably without any locking) and deals
with concurrent writes as well as possible. Updating records in place
is not a concern; all updates are versioned and therefore stored as
separate records in the cache.

I noticed that the JDK's ConcurrentHashMap seems to fit this
objective very well: the class supports full concurrency of
retrievals without locking and an adjustable expected concurrency
level for updates, which would be a good basis for a cache. The same
could also be achieved with a custom structure using the
read-copy-update (RCU) approach.
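
A minimal sketch of the kind of core I have in mind, built directly
on ConcurrentHashMap (the VersionedCache/VersionedKey names and the
methods are placeholders I made up for illustration, not part of any
library):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class VersionedCache<K, V> {

        // Composite key: since every update is versioned, a (key, version)
        // pair is immutable once inserted, so readers never see stale data.
        public static final class VersionedKey<K> {
            final K key;
            final long version;

            public VersionedKey(K key, long version) {
                this.key = key;
                this.version = version;
            }

            @Override
            public boolean equals(Object o) {
                if (!(o instanceof VersionedKey)) {
                    return false;
                }
                VersionedKey<?> other = (VersionedKey<?>) o;
                return version == other.version && key.equals(other.key);
            }

            @Override
            public int hashCode() {
                return 31 * key.hashCode() + (int) (version ^ (version >>> 32));
            }
        }

        // Retrievals do not block; updates only contend within one segment.
        private final ConcurrentMap<VersionedKey<K>, V> map =
                new ConcurrentHashMap<VersionedKey<K>, V>();

        public V get(K key, long version) {
            return map.get(new VersionedKey<K>(key, version));
        }

        public V putIfAbsent(K key, long version, V value) {
            V previous = map.putIfAbsent(new VersionedKey<K>(key, version), value);
            return previous != null ? previous : value;
        }
    }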

The eviction algorithm could be based on a concurrent priority
queue, either skip-list or heap based, or even some simple threshold
algorithm, with priorities assigned according to a chosen policy
(FIFO/LRU/LFU etc.). There is of course a problem with memory
management: it is not easy to estimate the in-memory size of a Java
object, and therefore the exact point at which a record should be
removed if the cache is memory bound. Soft references are an option,
but they provide no way to guarantee that records are removed in the
order the retention policy dictates. One possibility is to split the
cache content into two tiers (e.g. into two halves, or according to
a usage threshold): records with higher priorities (i.e. the ones
that should be retained) would be held by hard references, while the
ones with lower priorities would be held by soft references, with
promotion and demotion between the tiers driven by access statistics
(a rough sketch of this idea follows below). The hard-referenced
tier would still need to be bounded by some estimate of record size
(e.g. a maximum record size), and unfortunately I don't have the
luxury of precomputing the in-memory size of a Java object from its
marshaled state.
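
Here is the rough sketch of the two-tier idea, with a bounded
hard-referenced "hot" tier and a soft-referenced "cold" tier (the
class name, the hit counter and the promotion threshold are
illustrative assumptions, not a finished eviction policy):

    import java.lang.ref.SoftReference;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class TwoTierCache<K, V> {

        private static final int PROMOTION_THRESHOLD = 3; // illustrative only

        private final int maxHotEntries;
        // Hot tier: hard references, never reclaimed by the GC.
        private final ConcurrentMap<K, V> hot = new ConcurrentHashMap<K, V>();
        // Cold tier: soft references the GC may clear under memory pressure.
        private final ConcurrentMap<K, SoftReference<V>> cold =
                new ConcurrentHashMap<K, SoftReference<V>>();
        private final ConcurrentMap<K, AtomicLong> hits =
                new ConcurrentHashMap<K, AtomicLong>();

        public TwoTierCache(int maxHotEntries) {
            this.maxHotEntries = maxHotEntries;
        }

        public V get(K key) {
            V value = hot.get(key);
            if (value != null) {
                return value;
            }
            SoftReference<V> ref = cold.get(key);
            value = (ref != null) ? ref.get() : null;
            if (value == null) {
                if (ref != null) {
                    cold.remove(key, ref); // the GC cleared this reference
                }
                return null;
            }
            // Promote frequently accessed records to the hot tier; the size
            // check is racy, which is acceptable for a sketch.
            if (recordHit(key) >= PROMOTION_THRESHOLD && hot.size() < maxHotEntries) {
                hot.putIfAbsent(key, value);
                cold.remove(key);
            }
            return value;
        }

        public void put(K key, V value) {
            // New records start in the cold tier and earn promotion via hits.
            cold.put(key, new SoftReference<V>(value));
        }

        // Demotion: push an entry back to the soft-referenced tier, e.g.
        // when its priority drops or the hot tier grows too large.
        public void demote(K key) {
            V value = hot.remove(key);
            if (value != null) {
                cold.put(key, new SoftReference<V>(value));
            }
        }

        private long recordHit(K key) {
            AtomicLong counter = hits.get(key);
            if (counter == null) {
                AtomicLong fresh = new AtomicLong();
                AtomicLong existing = hits.putIfAbsent(key, fresh);
                counter = (existing != null) ? existing : fresh;
            }
            return counter.incrementAndGet();
        }
    }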

My question is: how should such a cache be constructed? Do you think
the ideas proposed above are worth exploring? What is the current
state of the art in cache implementations?

I would love to hear your opinions and would be very thankful for
any references to papers on the problems outlined here!

Thank you and kind regards,
Peter Skvarenina
 

Arne Vajhøj

I think you should start by looking at existing Java cache products:
OSCache
Ehcache
JBoss Cache
etc.

And then look at JSR 107, which is supposed to end up as a standard
caching API for Java (note that it is not a fast-moving standard!).
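
For reference, a minimal sketch of what code written against the
javax.cache API that JSR 107 defines could look like (this assumes a
finished JCache provider such as Ehcache is on the classpath; the
cache name and types are arbitrary):

    import javax.cache.Cache;
    import javax.cache.CacheManager;
    import javax.cache.Caching;
    import javax.cache.configuration.MutableConfiguration;

    public class JCacheExample {
        public static void main(String[] args) {
            // Obtain the default provider's CacheManager.
            CacheManager manager = Caching.getCachingProvider().getCacheManager();

            MutableConfiguration<String, String> config =
                    new MutableConfiguration<String, String>()
                            .setTypes(String.class, String.class)
                            .setStoreByValue(false); // keep references, no copies

            Cache<String, String> cache = manager.createCache("records", config);
            cache.put("key-1", "value-1");
            System.out.println(cache.get("key-1"));

            manager.close();
        }
    }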

Arne
 

skvarenina.peter

Many thanks Arne & Mark!

I guess the easiest way is to explore the existing open source
caches and see what ideas they offer for my particular case. I was
looking at cache4j and Ehcache before writing my question; a plain
ConcurrentHashMap with soft references performed about the same as
cache4j in my testing scenario.
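
For reference, a rough sketch of that kind of baseline, i.e. a
ConcurrentHashMap whose values are SoftReferences so the GC can
reclaim entries under memory pressure (the class name is just for
illustration; this is not cache4j's implementation):

    import java.lang.ref.SoftReference;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class SoftValueCache<K, V> {

        private final ConcurrentMap<K, SoftReference<V>> map =
                new ConcurrentHashMap<K, SoftReference<V>>();

        public V get(K key) {
            SoftReference<V> ref = map.get(key);
            if (ref == null) {
                return null;
            }
            V value = ref.get();
            if (value == null) {
                map.remove(key, ref); // drop entries the GC has cleared
            }
            return value;
        }

        public void put(K key, V value) {
            map.put(key, new SoftReference<V>(value));
        }
    }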

Best regards,
Peter Skvarenina
 
