advice on loading and searching large map in memory


Tom Anderson

Does it make more sense to repeatedly query small repeatable numbers of
parameters rather than an arbitrary number of parameters because of the
saving on not having to re-compile the prepared statement?

That's the thinking.
In relation to the cache: the size of the cache would be 1.5 GB

Certainly small enough to consider keeping it all in memory, but big
enough not to do it without making sure it was a good idea. That would be
1.5 GB you then can't use for anything else.

Since your keys and query patterns are simple, you might consider a NoSQL
key-value store of some sort, like Tokyo Cabinet. Let the filesystem cache
be your cache.

tom
 

eunever32

You say, "Right," then propose to go in the opposite direction.  Interesting ...

Thanks to everyone who replied.

I tested a relational query with where key in (a, b, c, ... x1000)

The system responded in less than 1 second, so that's an acceptable
response time. Because of the speedy response I see no need to
introduce a web service with a memory cache.
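A query like that can be sketched with a JDBC PreparedStatement whose IN-list placeholders are generated for a given batch size (table and column names here are made up for illustration; binding and execution would follow the usual JDBC pattern):

```java
import java.util.Collections;

public class InClauseBuilder {
    // Build "SELECT ... WHERE key IN (?, ?, ..., ?)" with n placeholders.
    // Table/column names are illustrative, not from the thread.
    static String buildQuery(int n) {
        String placeholders = String.join(", ", Collections.nCopies(n, "?"));
        return "SELECT payload FROM lookup_table WHERE key IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(3));
        // SELECT payload FROM lookup_table WHERE key IN (?, ?, ?)
    }
}
```

As Tom suggested earlier in the thread, rounding batches up to a few fixed sizes (say 100 and 1000, padding with a repeated key) keeps the number of distinct SQL strings small, so the driver's prepared-statement cache can be reused instead of re-compiling a new statement for every parameter count.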

Thanks everyone for taking the time to reply.

Cheers.
 

Roedy Green

Can people recommend an approach?

You might want to see if there is a SQL engine that does this for you.

When the data does not change, it is possible to construct perfect
hash lookups that have no collisions, i.e. are very fast.

It sounds like you may be doing a relatively simple lookup, one that
does not require the full power of a database. You might be able to
take advantage of specific features of your lookup, e.g. compressing
the payload which is not needed for lookup.

Your DBMS engine will already be multithreaded. Your part should be too.
Perhaps just throwing cores and RAM at it may be the cheapest
solution.
--
Roedy Green Canadian Mind Products
http://mindprod.com
Refactor early. If you procrastinate, you will have
even more code to adjust based on the faulty design.
 

Jim Janney

Hi

We have a requirement to query across two disparate systems. Both
systems are read-only, so there is no need for updates, and once
loaded there is no need to check for changes. I would plan to reload
the data afresh each day. Records on both systems map one-to-one and
each has 7 million records.

The first system is legacy and I am reluctant to redevelop (C code).
The second is standard Java/tomcat/SQL

The non-relational query can return up to 1000 records.

This could therefore result in 1000 queries to the relational system
(just one table) before returning to the user.

To avoid 1000 relational queries I was planning to "cache" the entire
relational table in memory. I was planning to have a web service which
would load the entire relational table into memory. The web service,
running in a separate tomcat could then be queried 1000 times or maybe
get a single request with 1000 values and return all results in one
go. Having a separate tomcat process would help to isolate any memory
issues, e.g. JVM heap size.

Can people recommend an approach?

Because the entire set of records would always be in memory does that
make using something like ehcache pointless?

Issues I would anticipate:
- time to load 7m records each morning
- memory issues
- best Java collection to hold the map (HashMap?); the map would be
  (int, int) -> Object
- any suggestions regarding a specialized cache utility, e.g. EhCache

Thanks in advance.

I'm late to this and I see you've already found a better solution, but
for future reference I will mention that

(int, int) -> Object

can be implemented as

long -> Object

and that

http://trove4j.sourceforge.net/

includes a TLongObjectHashMap that looks promising (I haven't actually
tried it).
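The (int, int) -> long packing Jim describes can be sketched like this (a minimal example: the high 32 bits hold the first int, the low 32 bits the second):

```java
public class IntPairKey {
    // Pack two ints into one long: high 32 bits hold a, low 32 bits hold b.
    // Masking b avoids sign extension polluting the high half.
    static long pack(int a, int b) {
        return ((long) a << 32) | (b & 0xFFFFFFFFL);
    }

    // Recover the original pair from the packed key.
    static int unpackA(long key) { return (int) (key >> 32); }
    static int unpackB(long key) { return (int) key; }

    public static void main(String[] args) {
        long key = pack(7, -3);
        System.out.println(unpackA(key) + " " + unpackB(key)); // 7 -3
    }
}
```

With keys packed this way, a primitive-keyed map such as Trove's TLongObjectHashMap avoids allocating a wrapper object per entry; a plain HashMap<Long, Object> would also work but boxes every key.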
 
