"Virtual memory" framework for Java

zeus

Hi all,
I am dealing with very large in-memory data structures. The data
structures are accessed at different frequencies, varying between every
30 seconds and every few minutes. My first try was to zip the information
to a file when it was not needed, but that caused very high I/O wait on a
Solaris machine. What I really want is something like virtual memory: I
want to be oblivious to where portions of the data structure are stored
(in memory or on disk); whenever I need to access the data, I want it to
be loaded into memory.
Do you know of such a framework for "virtual memory" in Java?

Thanks
 
Thomas Hawtin

zeus said:
I am dealing with very large in-memory data structures. The data
structures are accessed at different frequencies, varying between every
30 seconds and every few minutes. My first try was to zip the information
to a file when it was not needed, but that caused very high I/O wait on a
Solaris machine. What I really want is something like virtual memory: I
want to be oblivious to where portions of the data structure are stored
(in memory or on disk); whenever I need to access the data, I want it to
be loaded into memory.
Do you know of such a framework for "virtual memory" in Java?

Use java.nio to memory map a file.
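For example, a minimal sketch (the file name and mapping size here are
just placeholders, not anything from your setup):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MapDemo {
    public static void main(String[] args) throws Exception {
        // Map the first 16 MiB of a scratch file; the OS pages the
        // contents in and out for you, which is exactly the "virtual
        // memory" behaviour you described.
        try (RandomAccessFile raf = new RandomAccessFile("data.bin", "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, 16 * 1024 * 1024);
            buf.putInt(0, 42);                 // write through the mapping
            System.out.println(buf.getInt(0)); // read it back
        }
    }
}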

Be careful what you map though. If the file goes away or is truncated,
you can get an asynchronous exception (InternalError), and on Linux the
JVM actually crashes.

Tom Hawtin
 
Ingo R. Homann

Hi,

Thomas said:
Use java.nio to memory map a file.

Which class of the package do you mean exactly? I can find nothing appropriate.

My idea would be to use the java.lang.ref package to achieve some kind
of virtual memory.

Ciao,
Ingo
 
zeus

Thomas said:
Use java.nio to memory map a file.

Be careful what you map though. If the file goes away or is truncated,
you can get an asynchronous exception (InternalError), and on Linux the
JVM actually crashes.

Tom Hawtin

Can you elaborate a bit on how a memory-mapped file would be used for
virtual memory as I described?
 
Ingo R. Homann

Hi zeus,
Can you elaborate a bit on how a memory-mapped file would be used for
virtual memory as I described?

Again: I'm not sure the MappedByteBuffer is appropriate for what you
want to do, since AFAICS it only maps bytes (and other primitive types)
and not complete objects!

I think something like java.lang.ref will work better, since it lets you
serialize and deserialize complete objects.

Ciao,
Ingo
 
Antti S. Brax

Again: I'm not sure if the MappedByteBuffer is apropriate for what you
want to do, since AFAICS it only maps bytes (and other native types) and
not complete Objects!

Objects can be serialized to bytes.
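For example, a round trip through a byte array (the bytes are what you
would then put into the mapped buffer):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class SerDemo {
    public static void main(String[] args) throws Exception {
        // Serialize any Serializable object to a byte array...
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject("any Serializable object");
        }
        byte[] bytes = bos.toByteArray();

        // ...and deserialize it again from those bytes.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes))) {
            System.out.println(in.readObject());
        }
    }
}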
 
Antti S. Brax

I am dealing with very large in-memory data structures.

What does very large mean?
The data structures are accessed at different frequencies, varying
between every 30 seconds and every few minutes. My first try was to zip
the information to a file when it was not needed, but that caused very
high I/O wait on a Solaris machine.

Did it cause _I/O wait_ or wait on the _compression algorithm_
used in ZIP files? Do you have several ZIP files or one large one?
How did you organize the ZIP file? Did you try different
directory hierarchies? Have you tried plain files instead of
ZIP files?

You are not giving us enough information to help you
efficiently.
 
Ingo R. Homann

Hi Antti,
Objects can be serialized to bytes.

OK, but isn't this overhead? With a MappedByteBuffer, an object must be
deserialized *every time* you want to access it, not only when it is
stored to disk!

Ciao,
Ingo
 
Antti S. Brax

Hi Antti,


OK, but isn't this overhead? With a MappedByteBuffer, an object must be
deserialized *every time* you want to access it, not only when it is
stored to disk!

Of course it is. But I was just pointing out an error in your
statement. Anyway, this discussion is useless since the original
poster did not specify that he is serializing objects (his use
of ZIP files suggests otherwise).
 
Ingo R. Homann

Hi,
Of course it is. But I was just pointing out an error in your
statement.

I meant that it is not adequate, not that it is impossible.

(-; Note that we are both "Erbsenzähler" (*), as we say in Germany. ;-)
Anyway, this discussion is useless since the original
poster did not specify that he is serializing objects (his use
of ZIP files suggests otherwise).

Well, I guess zipping a serialized object is more straightforward than
zipping a non-serialized object, which would have to be serialized
before zipping... :)

Anyway, you are right, the discussion is useless as long as the OP
doesn't tell us a bit more about his problem...

Ciao,
Ingo

(*) That is a nicer word for "Korinthenkacker", which (according to
http://dict.leo.org) means "nitpicker", which - to my ears -
doesn't sound so nice... or am I wrong? What do you say in Finland?
 
zeus

A lot to comment on.
1. When I say huge data structures, I mean many structures of varying
size, from very small up to a few megabytes for the larger ones. And
when I say a lot, I mean thousands of such separate structures.
2. I don't see how weak (or other) references can help here. As long as
the structures are referenced they remain in memory, and when they lose
their last reference they will be removed by the GC anyway. What I want
is to load and unload them from memory as needed.
3. When I say some are accessed frequently, I mean every 30 seconds, and
this applies mainly to the small data structures. The larger structures
are accessed every 10-15 minutes.
4. I tried the zipping on the larger, less frequently accessed data
structures, and it caused the OS to get stuck in I/O wait while zipping
and unzipping (GZIPInputStream/GZIPOutputStream).
5. Serialization and deserialization are very expensive; is there any
clean way to avoid them?
 
Ingo R. Homann

Hi zeus,
A lot to comment on.
1. When I say huge data structures, I mean many structures of varying
size, from very small up to a few megabytes for the larger ones. And
when I say a lot, I mean thousands of such separate structures.

OK. A "good" solution should be able to deal with this.
2. I don't see how weak (or other) references can help here. As long as
the structures are referenced they remain in memory, and when they lose
their last reference they will be removed by the GC anyway. What I want
is to load and unload them from memory as needed.

Well, AFAIK there *are* ways to get informed when a WeakRef is
'removed'. IIRC, ReferenceQueue does this (though working with finalize,
which works much better than it is said to, may work as well).
So when the WeakRef is removed, you store the object to disk, and
when someone tries to access it afterwards, you reload it. I think this
should work!
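A rough sketch of that idea. One caveat: by the time a weak or soft
reference is enqueued its referent is already unreachable, so it can no
longer be serialized at that point; this sketch therefore writes each
object through to disk on put and keeps the in-memory copy only behind a
SoftReference, reloading on a cache miss. The class name and file layout
are made up purely for illustration:

import java.io.*;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftCache<V extends Serializable> {
    private final Map<String, SoftReference<V>> cache = new HashMap<>();
    private final File dir;

    public SoftCache(File dir) {
        this.dir = dir;
        dir.mkdirs();
    }

    // Write-through: persist immediately, keep the hot copy softly reachable.
    public void put(String key, V value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream(new File(dir, key)))) {
            out.writeObject(value);
        }
        cache.put(key, new SoftReference<V>(value));
    }

    // Reload from disk if the GC has cleared the soft reference.
    @SuppressWarnings("unchecked")
    public V get(String key) throws IOException, ClassNotFoundException {
        SoftReference<V> ref = cache.get(key);
        V v = (ref != null) ? ref.get() : null;
        if (v == null) {
            File f = new File(dir, key);
            if (!f.exists()) {
                return null;
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new FileInputStream(f))) {
                v = (V) in.readObject();
            }
            cache.put(key, new SoftReference<V>(v));
        }
        return v;
    }
}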
3. When I say some are accessed frequently, I mean every 30 seconds, and
this applies mainly to the small data structures. The larger structures
are accessed every 10-15 minutes.

It should be no great problem to count how often an object is accessed,
and to preferentially suspend to disk the objects that are accessed at a
low frequency.
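One cheap way to approximate this uses recency of access as a stand-in
for frequency: a LinkedHashMap in access order evicts the least recently
used entry first. The suspendToDisk hook here is hypothetical; it could
delegate to the write-through cache sketched above:

import java.util.LinkedHashMap;
import java.util.Map;

class LruIndex<K, V> extends LinkedHashMap<K, V> {
    private final int maxInMemory;

    LruIndex(int maxInMemory) {
        super(16, 0.75f, true); // true = iterate in access order
        this.maxInMemory = maxInMemory;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxInMemory) {
            suspendToDisk(eldest.getKey(), eldest.getValue());
            return true; // drop the in-memory copy
        }
        return false;
    }

    private void suspendToDisk(K key, V value) {
        // hypothetical hook: persist the entry before it is evicted
    }
}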
4. I tried the zipping on the larger, less frequently accessed data
structures, and it caused the OS to get stuck in I/O wait while zipping
and unzipping (GZIPInputStream/GZIPOutputStream).

That problem should not occur with such a self-implemented solution.
5. Serialization and deserialization are very expensive; is there any
clean way to avoid them?

You can implement your own methods for this, but I cannot say to what
extent that would be faster than Java's built-in serialization.
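For example, java.io.Externalizable lets you hand-roll the wire format
yourself, which can beat default serialization for simple, field-heavy
classes. A sketch (the Point class is just an illustration, not anything
from your problem):

import java.io.*;

public class Point implements Externalizable {
    private int x, y;

    public Point() {}                       // required public no-arg constructor
    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // write exactly the two ints, no class metadata per field
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        x = in.readInt();
        y = in.readInt();
    }
}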

Ciao,
Ingo
 
