Serialize You Cannot Use for Object Size

moleskyca1

One thread here said you can serialize an object and count the serialized
bytes to get the size of the object.
This is incorrect. I serialized a small class with one boolean and 2
chars and got something crazy like 637 bytes for the size. I serialized
to a file and you can see the problem:

¼φ ♣{sr java.io.NotSerializableException(Vx τå▬5☻ xr
↔java.io_ObjectStreamExce
ptiond├Σkì9√▀☻ xr ‼java.io.IOExceptionlÇsde%≡½☻ xr ‼
java.lang.Exception╨²▼>

There are many bytes serialized, so you can't use this to count the size of
an object in bytes.
 
Manish Pandit

moleskyca1 said:
One thread here said you can serialize an object and count the serialized
bytes to get the size of the object.

That is never the way to get an object's in-memory size. What if the
class declares all the fields as transient?
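
To illustrate the point about transient fields, here is a minimal sketch. The classes `Plain` and `AllTransient` and the helper `serializedSize` are hypothetical names made up for this example. Both classes would hold the same four longs in memory, but the transient version writes none of them to the stream, so the byte count says little about the in-memory footprint.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class TransientSizeDemo {
    // Hypothetical class: every field value is written to the stream.
    static class Plain implements Serializable {
        private static final long serialVersionUID = 1L;
        long a = 1, b = 2, c = 3, d = 4;
    }

    // Same fields, but all transient, so none of the values are written.
    static class AllTransient implements Serializable {
        private static final long serialVersionUID = 1L;
        transient long a = 1, b = 2, c = 3, d = 4;
    }

    // Number of bytes produced by serializing obj on its own stream.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("Plain:        " + serializedSize(new Plain()) + " bytes");
        System.out.println("AllTransient: " + serializedSize(new AllTransient()) + " bytes");
    }
}
```

The exact numbers depend on the class and field names, so only the difference between the two figures is meaningful here.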

The output you pasted indicates that one or more of the attributes in
the class that you serialized are non-serializable (do not implement
java.io.Serializable).

-cheers,
Manish
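
A sketch of how the pasted output could have come about, assuming the class had a field whose type is not serializable. `Small` and `Helper` are hypothetical stand-ins for the original class, which was not posted. When `writeObject` fails, `ObjectOutputStream` records the exception itself in the stream, so the bytes that reach the file describe the failure rather than the object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class FailedSerializationDemo {
    // Hypothetical field type that does not implement Serializable.
    static class Helper { }

    // Hypothetical class resembling the one described: a boolean, two chars,
    // and one field whose type is not serializable.
    static class Small implements Serializable {
        private static final long serialVersionUID = 1L;
        boolean flag = true;
        char c1 = 'a', c2 = 'b';
        Helper helper = new Helper();
    }

    // Attempts to serialize obj and returns whatever bytes reached the stream,
    // even if writeObject failed part-way through.
    static byte[] attempt(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (NotSerializableException e) {
            System.out.println("Serialization failed: " + e);
        } catch (IOException e) {
            System.out.println("I/O problem: " + e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = attempt(new Small());
        String text = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(data.length + " bytes written");
        System.out.println("Stream mentions the exception: "
                + text.contains("java.io.NotSerializableException"));
    }
}
```

If the write had been wrapped in a catch block that swallowed the exception, the only visible symptom would be an unexpectedly large file like the one shown.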
 
Roedy Green

moleskyca1 said:
One thread here said you can serialize an object and count the serialized
bytes to get the size of the object.
This is incorrect. I serialized a small class with one boolean and 2
chars and got something crazy like 637 bytes for the size. I serialized
to a file and you can see the problem:

Dump the stream to a file and have a look at it with a hex editor. A
serialized stream has quite a bit of overhead for the first object,
namely the fully qualified names of all the types of all the fields
used and the field names. Try dumping several objects and compare the
streams. You will see the incremental size is quite reasonable.
Further, you normally GZIP these streams. They compact nicely. See
http://mindprod.com/applet/fileio.html
for sample code.

Further, objects pointed to and their descriptors go in the stream too.
Often much more crud than you imagine gets dragged along. Check it
out with a hex editor to make sure you have not inadvertently dragged
along the kitchen sink.
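
The comparison suggested above might be sketched like this. `Point` and `streamSize` are hypothetical names used only for illustration. The sketch writes several distinct objects to one stream, so the class descriptor is paid for once, and optionally wraps the stream in a `GZIPOutputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

class StreamOverheadDemo {
    // Hypothetical sample class, used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Size of one stream holding `count` distinct Point objects, optionally gzipped.
    static int streamSize(int count, boolean gzip) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            OutputStream sink = gzip ? new GZIPOutputStream(bytes) : bytes;
            // Closing the object stream also finishes the gzip stream, if any.
            try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
                for (int i = 0; i < count; i++) {
                    out.writeObject(new Point(i, -i));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int one = streamSize(1, false);
        int two = streamSize(2, false);
        System.out.println("first object:      " + one + " bytes");
        System.out.println("each extra object: " + (two - one) + " bytes");
        System.out.println("1000 objects raw:  " + streamSize(1000, false) + " bytes");
        System.out.println("1000 objects gzip: " + streamSize(1000, true) + " bytes");
    }
}
```

The first object carries the stream header and the class description. Later objects of the same class only refer back to that description, which is why the incremental figure is much smaller.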
 
Guest

moleskyca1 said:
One thread here said you can serialize an object and count the serialized
bytes to get the size of the object.
This is incorrect. I serialized a small class with one boolean and 2
chars and got something crazy like 637 bytes for the size. I serialized
to a file and you can see the problem:

¼φ ♣{sr java.io.NotSerializableException(Vx τå▬5☻ xr
↔java.io_ObjectStreamExce
ptiond├Σkì9√▀☻ xr ‼java.io.IOExceptionlÇsde%≡½☻ xr ‼
java.lang.Exception╨²▼>

There are many bytes serialized, so you can't use this to count the size of
an object in bytes.

Correct.

Try something like:

public class SizeOf2 {
    private final static int N = 1000000;

    // In-use heap after requesting a collection; System.gc() is only a hint.
    public static long mem() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long m1 = mem();
        int[] ia = new int[N];
        long m2 = mem();
        // Heap growth divided by the element count approximates bytes per element.
        System.out.println("sizeof int = " + (m2 - m1)*1.0/N);
        ia = null;
        long m3 = mem();
        double[] xa = new double[N];
        long m4 = mem();
        System.out.println("sizeof double = " + (m4 - m3)*1.0/N);
        xa = null;
    }
}


Arne
 
Manivannan Palanichamy

moleskyca1 said:
One thread here said you can serialize an object and count the serialized
bytes to get the size of the object.
This is incorrect. I serialized a small class with one boolean and 2
chars and got something crazy like 637 bytes for the size. I serialized
to a file and you can see the problem:

¼φ ♣{sr java.io.NotSerializableException(Vx τå▬5☻ xr
↔java.io_ObjectStreamExce
ptiond├Σkì9√▀☻ xr ‼java.io.IOExceptionlÇsde%≡½☻ xr ‼
java.lang.Exception╨²▼>

There are many bytes serialized, so you can't use this to count the size of
an object in bytes.

First of all, what is meant by an object's size? This is not C/C++. In
C and C++, the object size is calculated by summing up the member
variables. But in Java, it depends on the implementation. For example,
the Java Language Specification just says that a boolean takes
either 'true' or 'false'. It doesn't place any constraints on the
implementation, such as requiring the boolean size to be 1 or 10 bytes.
One implementation might represent a boolean variable in a single byte.
Another implementation may represent the boolean variable in 2
bytes or more. So size is entirely implementation-specific.

One more thing: 'object construction' is also implementation-specific.
Assume you declare 5 integers in a serializable class. You might
think that the instance size for the class will be (5 * 4) =
20 bytes. But that can't always be the case, because the JVM might add
some internal fields to support serialization, or it
might do some trick when constructing the particular instance. So there
is no guarantee that your measured 'size' will be accurate.

I would suggest not talking about *size* in Java. Talk about memory.
 
moleskyca1

Manivannan Palanichamy said:
First of all, what is meant by an object's size? This is not C/C++. In
C and C++, the object size is calculated by summing up the member
variables. But in Java, it depends on the implementation. For example,
the Java Language Specification just says that a boolean takes
either 'true' or 'false'. It doesn't place any constraints on the
implementation, such as requiring the boolean size to be 1 or 10 bytes.
One implementation might represent a boolean variable in a single byte.
Another implementation may represent the boolean variable in 2
bytes or more. So size is entirely implementation-specific.

One more thing: 'object construction' is also implementation-specific.
Assume you declare 5 integers in a serializable class. You might
think that the instance size for the class will be (5 * 4) =
20 bytes. But that can't always be the case, because the JVM might add
some internal fields to support serialization, or it
might do some trick when constructing the particular instance. So there
is no guarantee that your measured 'size' will be accurate.

I would suggest not talking about *size* in Java. Talk about memory.

So what is the way to compute the memory consumed by an object? It is
hard to work with a language that doesn't support this. Can anyone post
some code that works? Say you have this class; what is the total memory
for each instance:

public class Goo implements Serializable {
    public int one;
    public boolean two;
    public boolean three;
    public double x;
}

This is something that a programmer should be able to do in any
language. I cannot do it in Java, but I am new. Can anyone do it?
 
Lew

moleskyca1 said:
So what is the way to compute the memory consumed by an object? It is
hard to work with a language that doesn't support this. Can anyone post
some code that works? Say you have this class; what is the total memory
for each instance:

public class Goo implements Serializable {
    public int one;
    public boolean two;
    public boolean three;
    public double x;
}

This is something that a programmer should be able to do in any
language. I cannot do it in Java, but I am new. Can anyone do it?

AFAIK there is no general answer to "how large is an object" in Java, unless
one explicitly accounts for the time element.

"Memory consumed" makes most sense in a runtime context. Others have alluded
to the difficulty, for example, of correlating the size of a serialized
representation to any runtime impact. Let's grant that what we care about is
amount of memory consumed by an instance at runtime.

But runtime is an interval - a program is loaded, runs for a while then ends.
The envelope of that varies according to the complexity of the program, its
usage patterns, whether it's a server process and so on. During that
interval, the shape of a program varies wildly due to Java's dynamic nature.

Even individual objects of a class could be implemented differently at
different times during runtime. For that matter, the same instance can change
its memory footprint during its lifetime. Are you interested in the
instantaneous memory footprint, the mean memory consumption, its maximum?
Over a single instance's lifetime or aggregated for the lifetime of the class?

For that matter, the class itself might be garbage collected altogether during
the program's run. If it's something used only during program initialization,
it might have an instantaneous footprint that is egregious but has no negative
impact on the program's performance during normal operation after it's been
collected. Even during the init phase, hotspotting might inline the whole
thing and it would essentially disappear even while in use.

These factors make it difficult to give any kind of simple answer to your
question.
 
Lasse Reichstein Nielsen

moleskyca1 said:
So what is the way to compute the memory consumed by an object? It is
hard to work with a language that doesn't support this.

No it's not. I have yet to need it for anything in Java.

It all depends on how you think about memory. In C, you need to count
bytes and do pointer arithmetic. You need to know how many bytes you
use, because you are doing memory management manually. In Java, you
don't care about the exact size of an object. You care about how many
objects you create and how long they live, but whether an object has 4
or 8 bytes of overhead is completely irrelevant.

moleskyca1 said:
Can anyone post some code that works? Say you have this class; what
is the total memory for each instance:

public class Goo implements Serializable {
    public int one;
    public boolean two;
    public boolean three;
    public double x;
}

As others have said, neither the Java Language Specification nor the
Java Virtual Machine Specification gives requirements on how large
object implementations must be. Different JVM implementations can,
and probably do, differ.

moleskyca1 said:
This is something that a programmer should be able to do in any
language. I cannot do it in Java, but I am new. Can anyone do it?

Can you say what you need it for? Curiosity is fine, but any algorithm
that cares about the physical size of an object, i.e., deals with
objects on the byte level, is likely to be less portable than one that
deals with objects at the object level.

/L
 
Patricia Shanahan

Lasse said:
No it's not. I have yet to need it for anything in Java.

It all depends on how you think about memory. In C, you need to count
bytes and do pointer arithmetic. You need to know how many bytes you
use, because you are doing memory management manually. In Java, you
don't care about the exact size of an object. You care about how many
objects you create and how long they live, but whether an object has 4
or 8 bytes of overhead is completely irrelevant.

Huh? Here's a specific problem. Suppose I have an application that uses
a lot of memory. The size of a problem can be expressed in terms of a
few parameters. I know the numbers of certain types of objects that will
be created, as functions of those problem size parameters.

For simplicity, let's assume a single basic size parameter N. However,
for real problems there may be more size parameters.

I would like to run a problem with N=10,000. I know, by experiment, that
it does not run on any machine to which I currently have access. If I
ask my academic adviser (or my manager if I were working in industry)
for access to a bigger memory, the inevitable question is "How big?".

How should I go about answering that question, without caring about
object sizes?

Patricia
 
Lasse Reichstein Nielsen

Patricia said:
Huh? Here's a specific problem. Suppose I have an application that uses
a lot of memory. The size of a problem can be expressed in terms of a
few parameters. I know the numbers of certain types of objects that will
be created, as functions of those problem size parameters.

For simplicity, let's assume a single basic size parameter N. However,
for real problems there may be more size parameters.

I would like to run a problem with N=10,000. I know, by experiment, that
it does not run on any machine to which I currently have access. If I
ask my academic adviser (or my manager if I were working in industry)
for access to a bigger memory, the inevitable question is "How big?".

I admit I was overgeneralizing. However, I don't think it was by much :)

So the "How big?" question is a good one, but hard to answer.
If you know that on the current version of the VM, OS and hardware you
plan to use, the objects take exactly 24 bytes, how much memory will
you need then? What if it was 32 bytes?

If 99% of the heap will be occupied by the data objects, which
will all stay live during the entire computation, then the poor garbage
collector won't have much to do. If not, you need to know also how many
objects are alive at a time (this might not be linear in the problem
size). You should consider how a generational garbage collector interacts
with all these objects, and tweak VM parameters to match.

If you are cutting it so close that the overhead of the object
implementation counts, then I would go for a language with manual
memory management instead of Java. I wouldn't rely on automatic,
garbage collected, memory management for something with so much
simultaneous live data.
I'm certain not everybody agrees with such heresy :)

Patricia said:
How should I go about answering that question, without caring about
object sizes?

Object sizes alone won't be enough. It's the path of bit-fiddling and
platform-specific tweaking to get there, and then Java has already
lost much of its advantage.

/L
 
Patricia Shanahan

Lasse said:
I admit I was overgeneralizing. However, I don't think it was by much :)

So the "How big?" question is a good one, but hard to answer.
If you know that on the current version of the VM, OS and hardware you
plan to use, the objects take exactly 24 bytes, how much memory will
you need then? What if it was 32 bytes?

The technique I would use to answer the question is to measure, on the
JVM I intend to use but on a system to which I have access, the in-use
memory (totalMemory() - freeMemory() immediately after a System.gc()
call) with controlled numbers of objects in existence. Given that
result, and the relationships between problem size and object creation,
I can project out the memory for larger problem sizes.

Lasse said:
If you are cutting it so close that the overhead of the object
implementation counts, then I would go for a language with manual
memory management instead of Java. I wouldn't rely on automatic,
garbage collected, memory management for something with so much
simultaneous live data.
I'm certain not everybody agrees with such heresy :)

There is too much that is convenient about Java for me to throw it out
just because memory size estimation requires some effort. Note that I
have seen far more consistency than you seem to expect in things like
array and object overhead. Of course, switching to a 64 bit JVM does
make a difference.

Patricia
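
The measure-and-project technique described above could be sketched as follows for the `Goo` class from earlier in the thread. The class name `GooFootprint`, the sample size, and the helper `projectBytes` are assumptions made for this example. The projection is a simple linear model, and `System.gc()` is only a hint, so the per-object figure is an estimate that holds for one JVM and one set of settings.

```java
import java.io.Serializable;

class GooFootprint {
    // The class from the question earlier in the thread.
    static class Goo implements Serializable {
        public int one;
        public boolean two;
        public boolean three;
        public double x;
    }

    // In-use heap right after requesting a collection. System.gc() is only a hint.
    static long usedMemory() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Linear projection: a fixed baseline plus a per-object cost times the object count.
    static long projectBytes(long baselineBytes, double bytesPerObject, long objectCount) {
        return baselineBytes + Math.round(bytesPerObject * objectCount);
    }

    public static void main(String[] args) {
        final int n = 1_000_000;          // sample size, an arbitrary choice
        Goo[] keep = new Goo[n];          // allocated before measuring, so it is not counted
        long before = usedMemory();
        for (int i = 0; i < n; i++) {
            keep[i] = new Goo();
        }
        long after = usedMemory();
        double perObject = (after - before) / (double) n;
        System.out.println("approx. bytes per Goo on this JVM: " + perObject);
        System.out.println("projected for 50,000,000 instances: "
                + projectBytes(before, perObject, 50_000_000L) + " bytes");
        // Use the array after the second measurement so it stays reachable.
        System.out.println("instances held: " + keep.length);
    }
}
```

Running it a few times with different values of `n` gives a feel for how stable the estimate is before trusting the projection.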
 
Guest

moleskyca1 said:
So what is the way to compute the memory consumed by an object? It is
hard to work with a language that doesn't support this. Can anyone post
some code that works? Say you have this class; what is the total memory
for each instance:

public class Goo implements Serializable {
    public int one;
    public boolean two;
    public boolean three;
    public double x;
}

This is something that a programmer should be able to do in any
language. I cannot do it in Java, but I am new. Can anyone do it?

I posted a solution yesterday.

Arne
 
Arne Vajhøj

Patricia said:
The technique I would use to answer the question is to measure, on the
JVM I intend to use but on a system to which I have access, the in-use
memory (totalMemory() - freeMemory() immediately after a System.gc()
call) with controlled numbers of objects in existence. Given that
result, and the relationships between problem size and object creation,
I can project out the memory for larger problem sizes.

And I actually posted a code snippet doing so yesterday.

Arne
 
Twisted

moleskyca1 said:
One thread here said you can serialize an object and count the serialized
bytes to get the size of the object.
This is incorrect. I serialized a small class with one boolean and 2
chars and got something crazy like 637 bytes for the size.

This has all kinds of problems.

First, objects may be larger in memory than serialized, due to
transient fields. As others noted, measuring size in memory is best
done by something like

Runtime rt = Runtime.getRuntime();
System.gc();
// totalMemory() and freeMemory() return long values
long usage = rt.totalMemory() - rt.freeMemory();
MyObject myObject = new MyObject(args);
System.gc();
long size = rt.totalMemory() - rt.freeMemory() - usage;

This still will vary from VM to VM, and may not work perfectly
(System.gc() doesn't guarantee the gc runs, so it can err high if
transient objects are made and discarded by the MyObject constructor
and the second System.gc() does nothing, and it can err low if the
first System.gc() does nothing and the second does and collects some
garbage).

Measuring the size of serialized objects can be done more reliably,
since for an identical object it will be identical on all platforms
given the same version of the object's class. It may vary from
instance to instance depending on what objects it references or
contains via its member variables though. Still it will give you an
idea of how much disk space or bandwidth serialized instances will
consume in bulk.

But serialized output contains overhead; this will be most of your 637
bytes. I'd serialize an N-element array of MyObjects and an N+1-
element array of MyObjects, both with every array cell containing a
MyObject (rather than null), and look at the difference in their file
sizes. I'd make the fields that would tend to reference shared objects
reference a single instance from all these MyObjects so that their
referents "don't count" in the final calculation, and the fields that
would tend to reference "owned" objects or "contained" ones reference
separate ones for each instance so that their referents do count. This
will give the best idea of scaling behavior when a large number of
MyObjects are serialized on a single stream. Your original figure of
637 bytes is, on the other hand, unfortunately exactly what you can
expect if each one is serialized on a separate stream.
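
A rough sketch of the N versus N+1 array comparison described above. `MyObject` here is a hypothetical stand-in with one primitive field and one reference to a hypothetical `Payload` type; the `shared` flag switches between one referent shared by every element and a separate "owned" referent per element.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ArrayDeltaDemo {
    // Hypothetical referent type.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        long a, b;
    }

    // Hypothetical stand-in for MyObject: one primitive field and one reference.
    static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;
        int id;
        Payload payload;
        MyObject(int id, Payload payload) { this.id = id; this.payload = payload; }
    }

    // Serialized size of an n-element array with every cell filled.
    // If shared is true, all elements reference one Payload; otherwise each owns its own.
    static int arraySize(int n, boolean shared) {
        Payload common = new Payload();
        MyObject[] array = new MyObject[n];
        for (int i = 0; i < n; i++) {
            array[i] = new MyObject(i, shared ? common : new Payload());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(array);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Bytes added to the stream by the (n+1)-th element.
    static int perElement(int n, boolean shared) {
        return arraySize(n + 1, shared) - arraySize(n, shared);
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println("per element, shared referent: " + perElement(n, true) + " bytes");
        System.out.println("per element, owned referent:  " + perElement(n, false) + " bytes");
    }
}
```

With a shared referent, each extra element only adds a back-reference to the object already in the stream, while an owned referent is written out in full every time.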

Note that the suggested method of measurement should end up summing
the MyObject size as the size of its fields, with fields of reference
type being the size of a pointer or some equivalent, plus the size of
the objects an instance references with such fields and actually
"owns".
 
