Chris
I posted this a couple of days ago, but got no responses and it didn't show
up in Google groups, so it might have gotten lost along the way. If not,
sorry for the redundancy...
I need extremely fast random access to an extremely large file. The file
could be 100 GB in size; the machine will have at most 2 or 4 GB of
memory. Certain parts of the file will be accessed more frequently than
others. The question is: what is the most efficient way to handle caching?
I could organize the file in pages, and write some kind of LRU cache to hold
the most active pages in JVM memory. I'm thinking, though, that it might be
simpler and more efficient to let the operating system handle it through
normal disk caching.
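For the roll-your-own route, a minimal LRU page cache can be sketched with a `LinkedHashMap` in access order; the `PageCache` class name and the two-page capacity here are just made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical page cache: page number -> page bytes, evicting the
// least recently used page once maxPages is exceeded.
class PageCache extends LinkedHashMap<Long, byte[]> {
    private final int maxPages;

    PageCache(int maxPages) {
        super(16, 0.75f, true);   // accessOrder = true gives LRU ordering
        this.maxPages = maxPages;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxPages; // called after each put()
    }

    public static void main(String[] args) {
        PageCache cache = new PageCache(2);
        cache.put(0L, new byte[]{1});
        cache.put(1L, new byte[]{2});
        cache.get(0L);                    // touch page 0, so page 1 is now eldest
        cache.put(2L, new byte[]{3});     // evicts page 1
        System.out.println(cache.containsKey(1L)); // prints false
        System.out.println(cache.containsKey(0L)); // prints true
    }
}
```

On a cache miss you'd read the page from the file and `put()` it; `removeEldestEntry` then handles eviction for free.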
The advantages of this scheme would be: 1) I won't have to write any caching
code; 2) the cache memory will be owned by the operating system rather than
the JVM, so the user won't have to worry about memory configuration; 3) the
operating system can release disk-cache memory when it's needed by some
other process; and 4) the operating system's disk-caching algorithms are
probably much smarter than anything I could write.
The disadvantages are: 1) every access to the file still has to go through
Java I/O calls, which will be slow compared to direct memory access, and
2) I don't know whether this really works in practice.
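For the OS-caching route, `java.nio` memory mapping hands paging entirely to the kernel and avoids most per-call I/O overhead. This is only a sketch: the class name and the 1 MB window size are arbitrary, and note that a single `MappedByteBuffer` is limited to 2 GB, so a 100 GB file needs multiple mappings (a real version would keep a table of mappings rather than remap on every read):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedRead {
    // Map a 1 MB window around pos and read one byte through it.
    // The OS page cache decides which parts of the file stay in RAM.
    static byte readAt(FileChannel ch, long pos) throws Exception {
        long windowStart = pos & ~0xFFFFFL;   // align down to a 1 MB boundary
        long windowSize = Math.min(1 << 20, ch.size() - windowStart);
        MappedByteBuffer buf =
            ch.map(FileChannel.MapMode.READ_ONLY, windowStart, windowSize);
        return buf.get((int) (pos - windowStart));
    }

    public static void main(String[] args) throws Exception {
        // Demo on a small temp file standing in for the 100 GB one.
        Path tmp = Files.createTempFile("demo", ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
            raf.setLength(4096);
            raf.seek(1234);
            raf.writeByte(42);
            System.out.println(readAt(raf.getChannel(), 1234)); // prints 42
        }
        Files.delete(tmp);
    }
}
```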
Thoughts?
Also -- does anyone know how Linux allocates memory to the disk cache? Does
it use all available memory? Or is there some way to configure it?
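As far as I know, Linux simply lets the page cache grow into whatever RAM is otherwise free and reclaims it under memory pressure; there's no fixed allocation. You can watch its current size in `/proc/meminfo`, and a few related knobs live under `/proc/sys/vm` (paths below are standard procfs, nothing project-specific assumed):

```shell
# How much RAM the kernel is currently using for caching:
grep -E '^(Cached|Buffers):' /proc/meminfo

# One related tunable: how aggressively the kernel swaps application
# pages out in favor of cache (0-100, distro default is often 60):
cat /proc/sys/vm/swappiness
```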