Searching a disk-backed Map

Arne Vajhøj · Aug 22, 2009

Roedy said:
I rolled my own something similar. It is not that much code to handle
the random quotations you see on my website.

What you could do is write a class that uses a HashMap internally just
to map the keys to offsets into a sequential file.

When you build the Map, you write the objects out with writeUTF or
writeObject, and record the size/offset of the stream before the
write.

Then, to look something up in the Map, you look up the key, get the
offset, seek and do a read. You don't even need to know the length.
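
A minimal sketch of the scheme described above, assuming String values
stored with writeUTF. The class name DiskBackedMap and its methods are
hypothetical, not anything from the post, and the sketch always starts
from an empty file:

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: keys and file offsets stay in memory, values live on disk.
final class DiskBackedMap implements Closeable {
    private final Map<String, Long> offsets = new HashMap<>();
    private final RandomAccessFile file;

    DiskBackedMap(File backingFile) throws IOException {
        file = new RandomAccessFile(backingFile, "rw");
        file.setLength(0); // assumption: this sketch discards any existing content
    }

    // Append the value and remember the offset where its record starts.
    void put(String key, String value) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.writeUTF(value); // writes a 2-byte length prefix, then the encoded bytes
        offsets.put(key, offset);
    }

    // Look up the offset, seek, and read; readUTF takes the length from the prefix.
    String get(String key) throws IOException {
        Long offset = offsets.get(key);
        if (offset == null) {
            return null;
        }
        file.seek(offset);
        return file.readUTF();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Two limits of this sketch: writeUTF rejects strings whose encoded form
exceeds 65535 bytes, and putting the same key twice leaves the old
record behind as dead space in the file.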

This is pretty fast, especially when the drive/OS does read caching.
If you wanted to make it even faster, you could put the objects in a
NIO memory mapped file, but that limits your file size. You also
might put the file on a fast flash drive.
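
The memory-mapped variant might look roughly like this. It is only a
sketch: MappedRecordReader and readAt are hypothetical names, and it
assumes the file holds records written with writeUTF and fits in a
single mapping.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: reads writeUTF records through a read-only memory mapping.
final class MappedRecordReader {
    private final MappedByteBuffer buffer;

    MappedRecordReader(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // A single mapping cannot exceed Integer.MAX_VALUE bytes,
            // which is where the file size limit comes from.
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } // the mapping stays valid after the channel is closed
    }

    // Decode the record that starts at the given byte offset.
    String readAt(int offset) throws IOException {
        // writeUTF stores an unsigned 2-byte big-endian length before the bytes.
        int length = buffer.getShort(offset) & 0xFFFF;
        byte[] record = new byte[2 + length];
        for (int i = 0; i < record.length; i++) {
            record[i] = buffer.get(offset + i); // absolute gets leave the position alone
        }
        // Let DataInputStream decode the modified UTF-8 that writeUTF produced.
        return new DataInputStream(new ByteArrayInputStream(record)).readUTF();
    }
}
```

The offsets passed to readAt would come from the same in-memory key
table; only the seek-and-read step is replaced by reads from the mapping.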

I would write such a beast to your specs for $50 US.

I think Stefan is capable of writing his own code.

Arne

Arne Vajhøj · Aug 22, 2009

Stefan said:
I have this crazy idea to write applications that are based
on Java SE only and not require any additional library.
I know that this idea might not be very pragmatic or reasonable.

/If/ Derby would finally be included in Java SE, I would
love to use it.

It has been since 1.6: the JDK bundles it as Java DB.

Arne

Arne Vajhøj · Aug 22, 2009

Tom said:
And if you don't believe me - how about Oracle?

http://www.oracle.com/technology/products/berkeley-db/je/index.html

Relational databases are the most sophisticated tool available to the
developer for data storage and analysis. Most persisted object data is
never analyzed using ad-hoc SQL queries; it is usually simply retrieved
and reconstituted as Java objects. The overhead of using a sophisticated
analytical storage engine is wasted on this basic task of object
retrieval. The full analytical power of the relational model is not
required to efficiently persist Java objects. In many cases, it is
unnecessary overhead. In contrast, Berkeley DB Java Edition does not have
the overhead of an ad-hoc query language like SQL, and so does not incur
this penalty.

The result is faster storage, lower CPU and memory requirements, and a
more efficient development process.

That software is freeware; if i was going to implement a disk-backed
map, it's where i'd start.

I am not sure that I agree with the argument.

It is very common to:
- do SQL-based reporting on data stored via ORM
- load objects not by id but by criteria on other fields

Arne

Tom Anderson · Aug 25, 2009

Arne said:
I am not sure that I agree with the argument.

It is very common to:
- do SQL-based reporting on data stored via ORM
- load objects not by id but by criteria on other fields

Neither of which is required to implement a disk-backed map.

tom

Arne Vajhøj · Aug 29, 2009

Tom said:
Neither of which is required to implement a disk-backed map.

No.

But I was commenting on the big block of text that you chose
not to quote.

When you remove the text people are commenting on, it becomes very
hard to understand the replies.

Arne
