memory and speed

zwasdl

Hi friends,
I have a class, IntVector (see code below), with functionality
similar to Vector's, but it handles only ints, to save memory and
simplify operations.
An IntVector can grow quite large -- to 50000 elements, say --
although it isn't always that big.
Please comment on my code with regard to speed and memory usage;
any suggestions are welcome:
1. Overall structure: is there a better way to model it?
2. doubleSize(): I don't want it to run too often, since it needs
a complete copy of the contents. But if I use a large growth
factor n, it may allocate too much memory that never gets used.
3. How do I prompt Java to release the memory of the old iArray
after the size has doubled? I know I can't control garbage
collection, but is there any way to make it release memory faster?

Sometimes my code takes over 1GB of memory, and a lot of time is
spent on memory allocation and GC, so any improvement would help.

Your help is deeply appreciated,

Wei



class IntVector {
    private int[] iArray;
    private int size;
    private int limit;

    public IntVector() {
        limit = 8;
        iArray = new int[limit];
        size = 0;
    }

    public boolean add(int a) {
        // if size out of limit,
        // create new int iArray with double the limit
        // and copy the original elements into new iArray
        if (size == limit) doubleSize();

        iArray[size++] = a;

        return true;
    }

    private void doubleSize() {
        int n = 2;
        int[] newArray = new int[limit * n];
        limit *= n;
        System.arraycopy(iArray, 0, newArray, 0, size);
        iArray = newArray;
    }

    // ...
}
 
Eric Sosman

zwasdl said:
1. Overall structure: is there a better way to model it?

Instead of keeping all the values in one array whose
size doubles as needed (which requires copying all the old
values each time), you might consider keeping an array of
array references. The first array would hold elements 0-7,
the next would hold the sixteen elements 8-23, the next
would hold thirty-two elements 24-55, and so on. This makes
access by index a little harder (but the arithmetic isn't
hard, because of the regular pattern), but saves a lot of
copying. As things now stand, by the time your IntVector has
grown to 50000 values it has copied 8+16+...+32768=65528 values.
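
A minimal sketch of that layout (my illustration, not code from
the thread): chunk k holds 8 * 2^k elements and starts at overall
index 8 * (2^k - 1), so mapping an index to a chunk is a couple of
arithmetic operations and the payload is never copied.

    class ChunkedIntVector {
        private static final int BASE = 8;        // size of chunk 0
        private int[][] chunks = new int[1][];    // the small "spine"
        private int chunkCount = 0;
        private int size = 0;

        public void add(int a) {
            int k = chunkIndex(size);
            if (k == chunkCount) {                 // first element of a new chunk
                if (k == chunks.length)            // grow the spine; it stays tiny
                    chunks = java.util.Arrays.copyOf(chunks, chunks.length * 2);
                chunks[k] = new int[BASE << k];    // 8, 16, 32, ... elements
                chunkCount++;
            }
            chunks[k][size + BASE - (BASE << k)] = a;
            size++;
        }

        public int get(int i) {
            int k = chunkIndex(i);
            return chunks[k][i + BASE - (BASE << k)];
        }

        // Chunk k starts at overall index 8*(2^k - 1), so index i
        // lives in chunk floor(log2((i + 8) / 8)).
        private static int chunkIndex(int i) {
            return 31 - Integer.numberOfLeadingZeros((i + BASE) / BASE);
        }
    }

Only the spine of references is occasionally reallocated; the int
payload itself is never moved.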

An even simpler approach might be to add another constructor
that accepts an "initial capacity" argument, and/or to add an
ensureCapacity() method. With these, a caller that had a
reasonably good idea of how large a particular IntVector would
grow could communicate that knowledge to the class and thereby
avoid a lot of the doubling and copying.
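
For instance, the additions might look like this (a sketch using
the field names from Wei's class, not code from the thread):

    public IntVector(int initialCapacity) {
        limit = Math.max(initialCapacity, 8);
        iArray = new int[limit];
        size = 0;
    }

    // Grow at most once, directly to the requested capacity.
    public void ensureCapacity(int minCapacity) {
        if (minCapacity > limit) {
            int[] newArray = new int[minCapacity];
            System.arraycopy(iArray, 0, newArray, 0, size);
            iArray = newArray;
            limit = minCapacity;
        }
    }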

zwasdl said:
2. doubleSize(): I don't want it to run too often, since it needs
a complete copy of the contents. But if I use a large growth
factor n, it may allocate too much memory that never gets used.

You need to study the actual size distributions to
decide whether to start out larger, grow by more than a
factor of two, or do something else. Something as simple
as an array of counters would do it, where count[n] is the
number of IntVectors that doubled n or more times (hence
count[n]-count[n+1] is the number that doubled exactly n
times). In doubleSize() you'd use the old limit value
to determine n and then increment the appropriate counter.
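
In code, that instrumentation might look like this (my sketch; the
array is static so it aggregates over all IntVector instances):

    private static final int[] count = new int[32];

    private void doubleSize() {
        // limit still holds the old value, 8 * 2^(doublings so far),
        // so this doubling's ordinal n is log2(limit / 8) + 1.
        int n = 32 - Integer.numberOfLeadingZeros(limit / 8);
        count[n]++;        // count[n] = vectors that doubled n or more times
        int[] newArray = new int[limit * 2];
        System.arraycopy(iArray, 0, newArray, 0, size);
        iArray = newArray;
        limit *= 2;
    }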

zwasdl said:
3. How do I prompt Java to release the memory of the old iArray
after the size has doubled? I know I can't control garbage
collection, but is there any way to make it release memory faster?

The old array becomes eligible for garbage collection as
soon as you execute `iArray = newArray'. Since the old array
must still exist in the arraycopy() call immediately before
that, I don't see how you can expect to get rid of it any
sooner.

As for when "eligible for collection" turns to "collected,"
you can't do anything very useful about that. That is, you
can (probably) do something about it, but it won't be very
useful! The JVM will run GC when it determines that memory is
becoming tight, and running GC before that time -- when memory
is plentiful -- is seldom a good idea. (There are exceptions,
but they are rare and becoming rarer as GC technology grows
more sophisticated.) Usually, the JVM has a much better notion
of the "memory pressure" than you can have, and is in a better
position to make decisions about managing it.
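
(The "something" is presumably System.gc(), which is defined as no
more than a hint that the JVM is free to ignore:

    iArray = newArray;   // the old array is now unreachable
    System.gc();         // suggests a collection; the JVM may do nothing

For the reasons above, it is normally better left out.)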

zwasdl said:
Sometimes my code takes over 1GB of memory, and a lot of time is
spent on memory allocation and GC, so any improvement would help.

Your 50000-place array (it's really a 65536-place array)
occupies 256KB, plus another 128KB for the old array during the
doubling step. That's 384KB, which is only about 0.04% of your
1GB. Are you sure you're looking in the right place?

As for the time, I understand how you can estimate the
time taken by GC but I'm unfamiliar with measuring how
much is used by memory allocation. Again, some instrumentation
around the doublings would probably be illuminating.
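
A cheap way to put a number on the allocation-plus-copy cost (my
sketch, in the same spirit as the counters above):

    private static long growNanos = 0;   // total time spent growing

    private void doubleSize() {
        long t0 = System.nanoTime();
        int[] newArray = new int[limit * 2];
        System.arraycopy(iArray, 0, newArray, 0, size);
        iArray = newArray;
        limit *= 2;
        growNanos += System.nanoTime() - t0;
    }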
 
zwasdl

Eric, thanks for the reply. Please see my follow-ups below.

Eric said:
Instead of keeping all the values in one array whose
size doubles as needed (which requires copying all the old
values each time), you might consider keeping an array of
array references. The first array would hold elements 0-7,
the next would hold the sixteen elements 8-23, the next
would hold thirty-two elements 24-55, and so on. This makes
access by index a little harder (but the arithmetic isn't
hard, because of the regular pattern), but saves a lot of
copying. As things now stand, by the time your IntVector has
grown to 50000 values it has copied 8+16+...+32768=65528 values.

I know copying the values is inefficient, so your suggestion is
good in that sense, but I need to get at those values frequently.
As I understand it, your approach makes get, search, and indexing
much harder.

An even simpler approach might be to add another constructor
that accepts an "initial capacity" argument, and/or to add an
ensureCapacity() method. With these, a caller that had a
reasonably good idea of how large a particular IntVector would
grow could communicate that knowledge to the class and thereby
avoid a lot of the doubling and copying.

I thought about that, but scenarios may vary, so I'd hesitate to
hard-code it in the caller.

Eric said:
Your 50000-place array (it's really a 65536-place array)
occupies 256KB, plus another 128KB for the old array during the
doubling step. That's 384KB, which is only about 0.04% of your
1GB. Are you sure you're looking in the right place?

That's just the size of one object; I'll have a 2D array
[150][40000] of these objects, and I have a few of those arrays.

Eric said:
As for the time, I understand how you can estimate the
time taken by GC but I'm unfamiliar with measuring how
much is used by memory allocation. Again, some instrumentation
around the doublings would probably be illuminating.

I didn't estimate GC time; I just watch Task Manager and see the
memory grow and drop while the CPU stays mostly idle.
Your help is deeply appreciated,

Wei



 
Eric Sosman

zwasdl said:
That's just the size of one object; I'll have a 2D array
[150][40000] of these objects, and I have a few of those arrays.

150 * 40000 = six million IntVector objects. If there's
nothing else at all in your 1GB of memory, these objects are
occupying only about 168 bytes apiece, on the average. Each
IntVector probably takes about 24 bytes for itself plus, oh,
perhaps 16+4*N bytes for an N-element array, so the average N
is around (168-24-16)/4 = 32 values per IntVector. Less, of
course, if "a few" of those arrays also exist (and perhaps
much more if the arrays are mostly full of null references
and there are far fewer than six million IntVectors in play).

Again, I ask whether you're looking in the wrong place.
It might be better to try to improve the "containing" data
structure than to worry about large IntVectors -- the back-
of-the-envelope calculation above suggests that very few
of them can be "large."

Let's take a step back: Why are you storing six million
(times "a few") vectors of integer values in the first place?
What do they represent, and what are you trying to do with
them? Describe the outer problem instead of the inner one,
and maybe someone will have a better idea.
 
Robert Klemme

Eric Sosman said:
[...]
Let's take a step back: Why are you storing six million
(times "a few") vectors of integer values in the first place?
What do they represent, and what are you trying to do with
them? Describe the outer problem instead of the inner one,
and maybe someone will have a better idea.

Definitely agreed.

As a side note: a BitSet can also be viewed as a set of integers.
If set-like behavior is required and iteration doesn't occur too
often, it is a good (i.e. memory-savvy) alternative.
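
For illustration (my sketch, not Robert's code): one BitSet per
node, where bit j being set means node j is reachable, costs about
5KB for 40000 bits and has no per-element object overhead:

    import java.util.BitSet;

    BitSet reachable = new BitSet(40000);
    reachable.set(17);                   // mark node 17 as reachable
    boolean hit = reachable.get(17);     // membership test
    for (int j = reachable.nextSetBit(0); j >= 0;
             j = reachable.nextSetBit(j + 1)) {
        // visits each reachable node j in ascending order
    }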

Kind regards

robert
 
yufeng.yao

Eric said:
[...]
150 * 40000 = six million IntVector objects. If there's
nothing else at all in your 1GB of memory, these objects are
occupying only about 168 bytes apiece, on the average. [...]

I have a 2D array: 150 networks, each containing 40000 nodes.
Each node's IntVector stores the entries from that node to the
other nodes reachable from it. I also have a FloatVector of the
same size, which stores the arc costs corresponding to the
IntVector. Will FloatVector use more memory? I ask because the
'out of memory' error occurs when the program reaches the point
of creating and filling this 2D array.
 
Eric Sosman

yufeng.yao said:
I have a 2D array: 150 networks, each containing 40000 nodes.
Each node's IntVector stores the entries from that node to the
other nodes reachable from it. I also have a FloatVector of the
same size, which stores the arc costs corresponding to the
IntVector. Will FloatVector use more memory? I ask because the
'out of memory' error occurs when the program reaches the point
of creating and filling this 2D array.

An int[N] array and a float[N] array will take the same
amount of memory (on any reasonable JVM). But let's return
to the back of the envelope, with this new information:

150 * 40000 = six million IntVector objects plus six
million FloatVector objects of matching sizes, for twelve
million "IntVector equivalents." If these account for all
of your 1GB, each occupies only ~90 bytes, on the average.
With the same assumptions about overhead space as before,
the average number of elements per vector is (90-24-16)/4
or between twelve and thirteen. You've got about fifty
bytes of "payload" and forty bytes of "overhead" per vector.
(Perhaps a little different; those assumptions about the
"overhead" size are just plucked from thin air and might
not hold for your JVM.)

Given the range of the indices, you could save some
memory by changing IntVector to ShortVector (you'd need a
small amount of care about negative values, though, since
40000 doesn't fit in a signed short). This would cut the
payload in half, but since the payload amounts to something
like 55% of the original grand total the overall savings
would be around 27% -- possibly enough, possibly not.
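
For concreteness, the usual unsigned trick looks like this (my
sketch of hypothetical ShortVector internals, not code from the
thread); values 0..65535 survive the round trip even though
short is signed:

    private short[] sArray;

    public boolean add(int a) {
        if (size == limit) doubleSize();
        sArray[size++] = (short) a;   // the cast keeps the low 16 bits
        return true;
    }

    public int get(int i) {
        return sArray[i] & 0xFFFF;    // the mask undoes sign extension
    }

Equivalently, a char[] could be used: char is Java's one unsigned
16-bit type.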

But (I'm sounding like a broken record, I know): Are
you looking in the right place? What is it that you're
trying to do with all these arc costs? How do the 150
networks interact that requires you to keep them all in
memory simultaneously? How "sparse" or "dense" is the
connectivity in these networks? Do you really need the
ability to represent multiple different-cost arcs between
the same pair of nodes, or is there only (at most) one
arc? Are the arcs directed, or symmetrical (so that the
cost of A-to-B equals the cost of B-to-A, or possibly its
negative)?

Again: What problem are you trying to solve? Don't
describe the techniques that are failing to solve it,
tell us about the problem itself. Have you ever heard
the saying "He can't see the forest for the trees"? Put
a little distance between yourself and the trees, and
tell us about the forest.
 
