advice on loading and searching large map in memory


Tom Anderson

Does it make more sense to repeatedly query small repeatable numbers of
parameters rather than an arbitrary number of parameters because of the
saving on not having to re-compile the prepared statement?

That's the thinking.
In relation to the cache: the size of the cache would be 1.5 GB

Certainly small enough to consider keeping it all in memory, but big
enough not to do it without making sure it was a good idea. That would be
1.5 GB you then can't use for anything else.

Since your keys and query patterns are simple, you might consider a NoSQL
key-value store of some sort, like Tokyo Cabinet. Let the filesystem cache
be your cache.

tom
 

eunever32

You say, "Right," then propose to go in the opposite direction.  Interesting ...

Thanks to everyone who replied.

I tested a relational query with where key in (a, b, c, ... x1000)

The system responded in less than 1 second, so that's an acceptable
response time. Because of the speedy response I see no need to
introduce a web service with a memory cache.
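A query like that can be sketched with a JDBC PreparedStatement whose IN-list placeholders are generated for a given batch size (table and column names here are made up for illustration; binding and execution would follow the usual JDBC pattern):

```java
import java.util.Collections;

public class InClauseBuilder {
    // Build "SELECT ... WHERE key IN (?, ?, ..., ?)" with n placeholders.
    // Table/column names are illustrative, not from the thread.
    static String buildQuery(int n) {
        String placeholders = String.join(", ", Collections.nCopies(n, "?"));
        return "SELECT payload FROM lookup_table WHERE key IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(3));
        // SELECT payload FROM lookup_table WHERE key IN (?, ?, ?)
    }
}
```

As Tom suggested earlier in the thread, rounding batches up to a few fixed sizes (say 100 and 1000, padding with a repeated key) keeps the number of distinct SQL strings small, so the driver's prepared-statement cache can be reused instead of re-compiling a new statement for every parameter count.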

Thanks everyone for taking the time to reply.

Cheers.
 

Roedy Green

Can people recommend an approach?

You might want to see if there is a SQL engine that does this for you.

When the data does not change, it is possible to construct perfect
hash lookups that have no collisions, i.e. are very fast.

It sounds like you may be doing a relatively simple lookup, one that
does not require the full power of a database. You might be able to
take advantage of specific features of your lookup, e.g. compressing
the payload which is not needed for lookup.

Your DBMS engine will already be multithreaded. Your part should be too.
Perhaps just throwing cores and RAM at it may be the cheapest
solution.
--
Roedy Green Canadian Mind Products
http://mindprod.com
Refactor early. If you procrastinate, you will have
even more code to adjust based on the faulty design.
 

Jim Janney

Hi

We have a requirement to query across two disparate systems. Both
systems are read-only, so there is no need for updates, and once
loaded there is no need to check for changes. I would plan to reload
the data afresh each day. Records on both systems map one-to-one and
each has 7 million records.

The first system is legacy and I am reluctant to redevelop (C code).
The second is standard Java/tomcat/SQL

The non-relational query can return up to 1000 records.

This could therefore result in 1000 queries to the relational system
(just one table) before returning to the user.

To avoid 1000 relational queries I was planning to "cache" the entire
relational table in memory. I was planning to have a web service which
would load the entire relational table into memory. The web service,
running in a separate tomcat could then be queried 1000 times or maybe
get a single request with 1000 values and return all results in one
go. Having a separate tomcat process would help to isolate any memory
issues, e.g. JVM heap size.

Can people recommend an approach?

Because the entire set of records would always be in memory does that
make using something like ehcache pointless?

Issues I would anticipate:
- time to load 7m records each morning
- memory issues
- best Java collection to hold the map (HashMap?); the map would be
  (int, int) -> Object
- any suggestions regarding a specialized cache utility, e.g. EhCache

Thanks in advance.

I'm late to this and I see you've already found a better solution, but
for future reference I will mention that

(int, int) -> Object

can be implemented as

long -> Object

and that

http://trove4j.sourceforge.net/

includes a TLongObjectHashMap that looks promising (I haven't actually
tried it).
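The (int, int) -> long packing Jim describes can be sketched like this (a minimal example: the high 32 bits hold the first int, the low 32 bits the second):

```java
public class IntPairKey {
    // Pack two ints into one long: high 32 bits hold a, low 32 bits hold b.
    // Masking b avoids sign extension polluting the high half.
    static long pack(int a, int b) {
        return ((long) a << 32) | (b & 0xFFFFFFFFL);
    }

    // Recover the original pair from the packed key.
    static int unpackA(long key) { return (int) (key >> 32); }
    static int unpackB(long key) { return (int) key; }

    public static void main(String[] args) {
        long key = pack(7, -3);
        System.out.println(unpackA(key) + " " + unpackB(key)); // 7 -3
    }
}
```

With keys packed this way, a primitive-keyed map such as Trove's TLongObjectHashMap avoids allocating a wrapper object per entry; a plain HashMap<Long, Object> would also work but boxes every key.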
 
