Using a certain percentage of the JVM storage

Stefan Ram

An application should use (approximately) a certain percentage
of the data storage area of its JVM for output file buffers.
If the buffers ever hold more data than this, they are to be
flushed to their respective files. The buffer size should not
be fixed, so that when the user starts the JVM on the command
line with a large maximum memory size, the application will
use it, and when a small memory size is specified, the
application will still run.

Is there a best practice to achieve this?
 
Stefan Ram

Eric Sosman said:
from totalMemory() to estimate "memory in use," and subtract that
from maxMemory() to estimate "memory potentially available." Deduct
a fudge factor for classes yet to be loaded and objects yet to be

Thanks. I wonder whether it would make sense to use file
buffers that are only weakly referenced. I have no experience
with weak references, but it would help to know that I can
rely on their finalizer being called when they are disposed.
Then I could flush the buffers to their files whenever one of
them is disposed by the memory reclaimer or when the
application ends.
 
Stefan Ram

Eric Sosman said:
contemplating the forest? What is your purpose in postponing
the actual writes? Are you, maybe, bouncing around in a big

I contemplate a kind of blog or CMS software with »fine tags«.
So there will be thousands of different tags, for example
»( isabout Java )AND( isa tutorial )«.

All blog entries are to be appended to their corresponding
list (table, category). The above example will be appended to
two lists: the sublist of tutorials within the list of works about
Java and the sublist of works about Java within the list of
tutorials. By a kind of »combinatorial explosion« for more
tags, there will be even more target lists.

I might start holding the many target lists in memory, and
eventually write them out to HTML documents. But when there
are many entries, the lists will not fit in memory anymore.
So I will have to use files for the lists.

Now I see thousands of little files. For each entry, the program
will open a dozen of these files to append something to them.
Therefore, I imagine that the program might spend a lot of
time seeking to the end of files. I thought I might speed
this up by buffering as many files in memory as possible.

It would not hurt if weakly referenced buffers were never
reclaimed during the lifetime of the program, because I can
flush all of them when the program terminates. I just need
to be sure that my finalizer is called just before the
object becomes unreachable, i.e., just before the weak
reference is cleared by the memory reclaimer. This way, every
buffer would be flushed either by its finalizer or by the
flush method at the end of the program.
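For the end-of-program half of that plan, a shutdown hook is more dependable than finalizers, which the JVM does not guarantee to run before exit. A minimal sketch (the registry and class name are invented for illustration; note that a hook only fires on normal termination, not after a crash):

```java
import java.io.Flushable;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class FlushOnExit {
    // Every open file buffer registers itself here (invented registry).
    static final List<Flushable> BUFFERS = new CopyOnWriteArrayList<>();

    /** Flush every registered buffer; errors are swallowed so that one
        bad file does not prevent the others from being flushed. */
    public static void flushAll() {
        for (Flushable f : BUFFERS) {
            try { f.flush(); } catch (IOException e) { /* log and continue */ }
        }
    }

    static {
        // Runs on normal JVM termination -- not after a crash or kill -9.
        Runtime.getRuntime().addShutdownHook(new Thread(FlushOnExit::flushAll));
    }
}
```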
 
Tom Anderson

It is the idiomatic pain of Java that programmers have to pay very close
attention to resource release.

I wonder if the solution is thus not to do the buffering in java. The OS
does buffering, right? And it's in a very good position to balance
buffering against RAM demand - i know that doing that well was something
that made FreeBSD fast back in the 90s, and i assume everyone does it now.

So, how about doing a minimal amount of buffering on the java side (just
up to a few KB) and then just writing everything, and letting the OS
handle the buffering?
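That proposal is essentially the standard idiom; a minimal sketch (the 8 KB figure is the ballpark suggested above, and the helper name is made up):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SmallBuffer {
    /** A few KB of Java-side buffering; the OS page cache handles the rest. */
    public static OutputStream openForAppend(String path) throws IOException {
        return new BufferedOutputStream(
                new FileOutputStream(path, /* append = */ true),
                8 * 1024); // small fixed buffer, as proposed above
    }
}
```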

Alternatively, how about using memory-mapped files? I have absolutely no
idea how those behave in terms of buffering.
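For what it's worth, mapping a file with NIO looks roughly like this. (A sketch only: the class name is made up, and the caching and durability behaviour of the mapping is indeed platform-dependent.)

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedAppend {
    /** Append data by mapping a region at the current end of the file. */
    public static void writeMapped(String path, byte[] data) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel ch = raf.getChannel()) {
            // Mapping beyond the current end grows the file to fit the region.
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, ch.size(), data.length);
            buf.put(data);
            buf.force(); // ask the OS to write the dirty pages out
        }
    }
}
```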

tom
 
John W Kennedy

Tom said:
I wonder if the solution is thus not to do the buffering in java. The OS
does buffering, right? And it's in a very good position to balance
buffering against RAM demand - i know that doing that well was something
that made FreeBSD fast back in the 90s, and i assume everyone does it now.

So, how about doing a minimal amount of buffering on the java side (just
up to a few KB) and then just writing everything, and letting the OS
handle the buffering?

Bad idea. From a portability viewpoint, you don't know how well or how
much the OS buffers, and, from a performance viewpoint, you're making
far more trips through the dispatcher than you otherwise would.

And, pragmatically, I can tell you that you get something like a 9,000%
performance degradation. Mapped files, Buffered streams or
readers/writers, and hand-made buffers are all much better than relying
on OS buffers.
--
John W. Kennedy
"The whole modern world has divided itself into Conservatives and
Progressives. The business of Progressives is to go on making mistakes.
The business of the Conservatives is to prevent the mistakes from being
corrected."
-- G. K. Chesterton
 
Tom Anderson

John said:
Bad idea. From a portability viewpoint, you don't know how well or how much
the OS buffers,

Realistically, how much does that vary?

John said:
and, from a performance viewpoint, you're making far more trips through
the dispatcher than you otherwise would.

True. But you're also using less memory.

John said:
And, pragmatically, I can tell you that you get something like a 9,000%
performance degradation.

A remarkably specific number!

I wasn't advocating no buffering, for precisely that reason. Using a
buffer of, say, 8 kB would mean you weren't doing a syscall every time you
wrote a byte. But it would also mean you weren't using tons of memory. 8
kB might not be the right size - but i can't believe that arbitrarily
large buffers are the right solution here.

John said:
Mapped files, Buffered streams or readers/writers, and hand-made buffers
are all much better than relying on OS buffers.

Even in the context of Stefan's problem?

tom
 
John W Kennedy

Tom said:
Realistically, how much does that vary?

Tom said:
True. But you're also using less memory.

Tom said:
A remarkably specific number!

What part of "something like" didn't you understand?

Tom said:
I wasn't advocating no buffering, for precisely that reason. Using a
buffer of, say, 8 kB would mean you weren't doing a syscall every time
you wrote a byte. But it would also mean you weren't using tons of
memory. 8 kB might not be the right size - but i can't believe that
arbitrarily large buffers are the right solution here.

Tom said:
Even in the context of Stefan's problem?

At least one of them is.

However, the /real/ solution to his problem may be to use a database. In
almost 100 out of 99 cases, that's the correct answer.
 
EJP

Stefan said:
Therefore, I imagine that the program might spend a lot of
time seeking to the end of files.

Why? Opening a file for append isn't a time-consuming operation, and
specifically it isn't O(size of the file) - there is no actual 'seeking
to the end of the file' at all.
Stefan also said:
I thought I might speed
this up by buffering as many files in memory as possible.

You will. But the benefits of buffering are hyperbolic. There's a huge
benefit from a buffer of two bytes, as you're cutting out 50% of the
system calls. Doubling that saves you another 25%. Doubling that,
another 12.5%. And so forth. You can work out for yourself where the
returns have diminished asymptotically to zero. It isn't far away. I
usually use buffers of 8192 myself, which is also the default for
BufferedOutputStream, but I have a suspicion this is twice or even 8
times as big as necessary. This sort of thing used to depend on the disk
cluster size, but intelligent controllers have made that sort of
consideration redundant.
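That halving argument can be verified without touching a disk by counting how many write() calls reach the underlying stream (a toy measurement, not a benchmark; the class and method names are made up):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class DiminishingReturns {
    /** Count how many write() calls reach the underlying stream
        when totalBytes are written one byte at a time. */
    public static long callsFor(int bufferSize, int totalBytes) throws IOException {
        final long[] calls = {0};
        OutputStream sink = new OutputStream() {
            @Override public void write(int b) { calls[0]++; }
            @Override public void write(byte[] b, int off, int len) { calls[0]++; }
        };
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < totalBytes; i++) {
                out.write(i); // one logical write per byte
            }
        } // close() flushes whatever is left in the buffer
        return calls[0];
    }

    public static void main(String[] args) throws IOException {
        // Each doubling of the buffer halves the underlying calls:
        // exactly the 50%, 25%, 12.5% ... progression described above.
        for (int size = 2; size <= 8192; size *= 2) {
            System.out.println(size + " bytes -> " + callsFor(size, 1 << 20) + " calls");
        }
    }
}
```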

But actually I agree with the poster who said this is a job for a
database. A CMS actually.
 
Tom Anderson

John said:
What part of "something like" didn't you understand?

Any of it, actually. Does that mean +/- 10%? Same order of magnitude? Any
string of digits ending with a percent sign?

I don't see how you can possibly put a number on the general case of
'using a smaller buffer', when we haven't defined how big the original
buffer is, how big the smaller buffer is, what the usage pattern is, what
the OS is, or any one of a dozen other factors that will determine how
fast the two approaches work. I can well believe that the 9000% comes from
a specific case or family of cases you've looked at, but i don't see how
you can generalise that to all cases.

tom
 
Arne Vajhøj

John said:
What part of "something like" didn't you understand?

I find it remarkable even with "something like".

"something like" does not mean "randomly chosen" - it means
"in the area of".

Arne
 
Arne Vajhøj

John said:
And, pragmatically, I can tell you that you get something like a 9,000%
performance degradation. Mapped files, Buffered streams or
readers/writers, and hand-made buffers are all much better than relying
on OS buffers.

It is nonsense to expect a single number across all apps
and operating systems.

Some operating systems are actually very good at caching; others
are very bad.

Arne
 
Arne Vajhøj

Stefan said:
An application should use (approximately) a certain percentage
of the data storage area of its JVM for output file buffers.
If the buffers ever hold more data than this, they are to be
flushed to their respective files. The buffer size should not
be fixed, so that when the user starts the JVM on the command
line with a large maximum memory size, the application will
use it, and when a small memory size is specified, the
application will still run.

Is there a best practice to achieve this?

The code will be a bit complex, but I cannot see any
problem in doing it.

You can get maxMemory. You can allocate some buffers
(byte arrays) of that size. You write to those buffers
instead of to the file. When a write would overflow a full
buffer, the data gets flushed to disk. Just like almost
any other cache in the universe.

I assume that you can live with the app losing the
buffered data if it crashes!
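That scheme might look roughly like this. (A hypothetical sketch: the class name, the write-through rule for oversized records, and the capacity choice are assumptions added here, not anything from this thread.)

```java
import java.io.FileOutputStream;
import java.io.IOException;

/** One in-memory buffer per output file, spilled to disk when it fills. */
public class FileBuffer {
    private final String path;
    private final byte[] buf;
    private int count = 0;

    public FileBuffer(String path, int capacity) {
        this.path = path;
        this.buf = new byte[capacity]; // in practice, derived from maxMemory()
    }

    public void write(byte[] data) throws IOException {
        if (count + data.length > buf.length) {
            flush();                      // buffer full: spill to disk first
        }
        if (data.length > buf.length) {
            append(data, data.length);    // record larger than the buffer: write through
        } else {
            System.arraycopy(data, 0, buf, count, data.length);
            count += data.length;
        }
    }

    public void flush() throws IOException {
        if (count > 0) {
            append(buf, count);
            count = 0;
        }
    }

    private void append(byte[] data, int len) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path, true)) { // append mode
            out.write(data, 0, len);
        }
    }
}
```

As noted above, anything still sitting in `buf` when the app crashes is lost.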

Arne
 
John W Kennedy

Arne said:
It is nonsense to expect a single number across all apps
and operating systems.

Some operating systems are actually very good at caching; others
are very bad.

There's more involved than caching. In many cases, the actual caching is
fine, but there is a terrible price to pay in additional trips through
the dispatcher.

But this thread is over a month old, anyway....

--
John W. Kennedy
"There are those who argue that everything breaks even in this old
dump of a world of ours. I suppose these ginks who argue that way hold
that because the rich man gets ice in the summer and the poor man gets
it in the winter things are breaking even for both. Maybe so, but I'll
swear I can't see it that way."
-- The last words of Bat Masterson
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,071
Latest member
MetabolicSolutionsKeto

Latest Threads

Top