Efficient hashmap serialization?

Sn0tters · Sep 2, 2005

Hi,

I have a hash map full of objects that contain references to other
objects in that map, grouped in a threaded message fashion. There's a
linked list style previous messages and next message.

I serialize this map to send it over RMI, but it seems quite slow.

Does anyone have any recommendations on efficiently serializing it? I
know I will have to write a function to serialize the objects manually
but not much beyond that.

Thanks
Wil

jan V · Sep 2, 2005

I have a hash map full of objects that contain references to other
objects in that map, grouped in a threaded message fashion. There's a
linked list style previous messages and next message.

I serialize this map to send it over RMI, but it seems quite slow.

Does anyone have any recommendations on efficiently serializing it? I
know I will have to write a function to serialize the objects manually
but not much beyond that.

Can you give us the types of the keys and values of your Map? Are the
messages plain Strings, or are you using a proper message abstraction type?

If you understand what Serialization does for you, maybe you will change
your opinion on it being quite slow... it's got a lot to do, and it does
everything through reflection.

If you really, really need to speed things up, then consider writing custom
private readObject()/writeObject() methods for your Map's value objects...
though you should profile the serialization step to see what really is
causing the whole thing to eat time.
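
For illustration, a minimal sketch of what such custom methods might look like. The Post class and its three fields are hypothetical stand-ins, not the actual Message class from this thread, and whether this is any faster than default serialization would still have to be measured:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

final class CustomSerializationSketch {

    /** Hypothetical stand-in for a map value; not the Message class from this thread. */
    static final class Post implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient fields are skipped by default serialization and written by hand below
        private transient double postNumber;
        private transient String subject; // assumed non-null and short enough for writeUTF
        private transient Date date;      // assumed non-null

        Post(double postNumber, String subject, Date date) {
            this.postNumber = postNumber;
            this.subject = subject;
            this.date = date;
        }

        double postNumber() { return postNumber; }
        String subject() { return subject; }
        Date date() { return date; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();      // handles any non-transient fields (none here)
            out.writeDouble(postNumber);
            out.writeUTF(subject);
            out.writeLong(date.getTime()); // a long instead of a full Date object
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            postNumber = in.readDouble();  // must read in exactly the order written
            subject = in.readUTF();
            date = new Date(in.readLong());
        }
    }

    /** Serializes to memory and back, returning the reconstructed copy. */
    static Post roundTrip(Post original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Post) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Post copy = roundTrip(new Post(42.0, "Re: hashmap", new Date(0L)));
        System.out.println(copy.postNumber() + " " + copy.subject() + " " + copy.date().getTime());
    }
}
```

The methods must be private and have exactly these signatures for the serialization machinery to pick them up. Fields that reference other objects in the graph would still go through out.writeObject(), so the shared-reference handling is kept.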

Sn0tters · Sep 2, 2005

I told a slight lie, it's a Hashtable

private Hashtable<Double, Message> Messages = new Hashtable<Double, Message>();

The Message object has a load of fields:

private double postNumber = 0;
private LinkTag postName;
private ImageTag postIcon;
private Text subject = null;
private List<AbstractNode> text = null;
private List<Tag> postAppeal = null;
private Identity poster = null;
private Date date = null;
private HashMap<Double,RawMessage> nextInThread = new HashMap<Double,RawMessage>();
private RawMessage previousInThread = null;

My main worry is that each object is being serialized many times, think
of this example:

Objects A & B are in the hashtable
Both objects reference each other.
When A is serialized the reference to B serializes B
When B is serialized the reference to A serializes A

Is this a valid assumption?

Custom readObject and writeObject methods are the number one
optimization I'm thinking of.

Thanks
Wil

jan V · Sep 2, 2005

My main worry is that each object is being serialized many times, think
of this example:

Is this a valid assumption?

Nope. Serialization deals with this (intelligently, I may add).

Sn0tters · Sep 2, 2005

That's good to know!

I'll look further into implementing those methods then.

Thanks!
Wil

Roedy Green · Sep 2, 2005

Objects A & B are in the hashtable
Both objects reference each other.
When A is serialized the reference to B serializes B
When B is serialized the reference to A serializes A

Is this a valid assumption?

Nope. Each object appears at most once in the ObjectStream. This
causes a problem sometimes since object fields can be updated and you
only have the old values recorded.
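
Both points can be checked with a small sketch. The Node class and the method names are made up for the demonstration. The first method writes two mutually referencing objects to one stream and reads them back. The second writes the same object twice with a change in between, optionally calling ObjectOutputStream.reset() so the stream forgets what it has already written:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SharedReferenceSketch {

    /** Hypothetical node type with a single reference to another node. */
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String label;
        Node other;
        Node(String label) { this.label = label; }
    }

    /** Writes A and B (which reference each other) to one stream and reads both back. */
    static Node[] roundTripPair() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.other = b;
        b.other = a;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(a); // writes A and, through A, B
                out.writeObject(b); // only a back-reference to the B already written
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return new Node[] { (Node) in.readObject(), (Node) in.readObject() };
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes one object twice, changing its label in between; returns both labels as read. */
    static String[] writeTwice(boolean resetBetween) {
        Node n = new Node("old");
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(n);
                n.label = "new";
                if (resetBetween) {
                    out.reset(); // discard the table of already-written objects
                }
                out.writeObject(n);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Node first = (Node) in.readObject();
                Node second = (Node) in.readObject();
                return new String[] { first.label, second.label };
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After roundTripPair() the two copies still point at each other, so neither was duplicated. writeTwice(false) reads back "old" both times, which is the stale-value problem; writeTwice(true) reads back "old" and then "new", at the cost of writing the object in full a second time.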

Further consider a long array of ints. Basically it is recorded just
as efficiently as if you had used DataOutputStream. ObjectStreams futz
around getting started, but once they get going they pick up steam
since they don't redundantly record information.
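
A rough way to check the array claim is to write the same int[] both ways into memory and compare byte counts. This is only a sketch with invented method names; note that the DataOutputStream version here writes the raw ints only, with no length prefix:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class ArrayOverheadSketch {

    /** Number of bytes produced by ObjectOutputStream.writeObject for the array. */
    static int objectStreamSize(int[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Number of bytes produced by writing each element with DataOutputStream.writeInt. */
    static int dataStreamSize(int[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int value : data) {
                out.writeInt(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        System.out.println("ObjectOutputStream: " + objectStreamSize(data) + " bytes");
        System.out.println("DataOutputStream:   " + dataStreamSize(data) + " bytes");
    }
}
```

The difference between the two numbers is the fixed cost of the stream header and the array's class description, which does not grow with the array length.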

I like serialised streams mainly for three reasons:

1. you can read/write arbitrarily complex datastructures with a line
of code. You don't have to maintain some hideous bug-prone mapping.

2. for long arrays, they have little more overhead than using a
DataOutputStream

3. For Applets they let you predigest data with the most complicated
processing and parsing you choose, then hand it off in very compact
form to the Applet that can read it with no extra downloaded classes
and no extra application parsing classes.

There are downsides. See my essay
http://mindprod.com/jgloss/serialization.

Roedy Green · Sep 3, 2005

There are downsides. See my essay
http://mindprod.com/jgloss/serialization.

oops
http://mindprod.com/jgloss/serialization.html

jan V · Sep 3, 2005

I like serialised streams mainly for three reasons:

1.
2.
[3.]

Did you get those? These are bloody strong arguments, so if you really
want to start overriding standard serialization, you'd better have even
stronger arguments.

Raymond DeCampo · Sep 4, 2005

I told a slight lie, it's a Hashtable

Have you considered the possibility that the synchronized nature of
Hashtable is the cause of the performance?

HTH,
Ray

Chris Uppal · Sep 4, 2005

I serialize this map to send it over RMI, but it seems quite slow.

I'm not sure how RMI interacts with serialisation, but I suspect that there may
be a problem here.

As jan V says, within any given serialised stream, objects are only represented
once, even if they are part of a complicated object network. However, this may
not be the case when you are using RMI, unless you are driving it in such a way
that RMI can "see" that it is getting several references to the same object in
different requests. As I say, I'm not clear on exactly how RMI manages such
things, so I could be completely wrong, but it sounds as if what may be
happening is that your entire object-network is being serialised on every
request. If that's the case then it should be easy enough to diagnose (fixing
it is different ;-) since each RMI request would generate network traffic
(easily visible with any handy network monitor) that is of the same order of
size as your datastructure when serialised out to disk.

Incidentally, even if I am guessing wrong here, I think that it would be a good
idea to try serialising your data to file before worrying about RMI. If that's
slow too, then RMI isn't the problem and can be ignored while you focus on the
serialisation alone.

-- chris

jan V · Sep 4, 2005

Chris Uppal said:
I'm not sure how RMI interacts with serialisation, but I suspect that there may
be a problem here.

Unfortunately, RMI doesn't keep ObjectOutputStreams around across remote
method calls. This means that passing the same object as argument to two
consecutive RMI calls will result in two complete serialization steps. This
could be hugely expensive for "tip of the iceberg" objects. But the OP seems
to imply that he transmits his Map as one object in one call, so this
problem doesn't seem to apply.
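
The cost can be imitated without RMI. The sketch below is only a simulation under that assumption: it does not touch java.rmi at all, and the class and method names are invented. It compares sending an argument twice over one long-lived ObjectOutputStream with sending it over a fresh stream per "call", which is the behaviour described above:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class PerCallStreamSketch {

    /** Bytes written when the same argument goes twice through ONE stream. */
    static int bytesWithSharedStream(Serializable arg) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(arg); // full serialization
            out.writeObject(arg); // just a back-reference to the first copy
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Bytes written when each "call" gets its own stream (simulating two remote calls). */
    static int bytesWithStreamPerCall(Serializable arg) {
        int total = 0;
        for (int call = 0; call < 2; call++) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(arg); // full serialization every time
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            total += bytes.size();
        }
        return total;
    }

    public static void main(String[] args) {
        int[] payload = new int[10_000];
        System.out.println("one stream, two writes: " + bytesWithSharedStream(payload));
        System.out.println("two streams, one write each: " + bytesWithStreamPerCall(payload));
    }
}
```

With a fresh stream per call the payload is paid for in full each time, so the second figure comes out at roughly double the first.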

once, even if they are part of a complicated object network. However, this may
not be the case when you are using RMI, unless you are driving it in such a way
that RMI can "see" that it is getting several references to the same object in
different requests.

RMI isn't that clever (I'm relying on what I can read in "java.rmi - The
Remote Method Invocation Guide" by Pitt & McNiff). RMI apparently only does
clever caching of connections (sockets).

Incidentally, even if I am guessing wrong here, I think that it would be a good
idea to try serialising your data to file before worrying about RMI. If that's
slow too, then RMI isn't the problem and can be ignored while you focus on the
serialisation alone.

Even better: serialize to a ByteArrayOutputStream which would additionally
hide any filing system/disk performance interference.
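
Something along these lines would do as a starting point. It is a rough sketch with hypothetical names, and the map in main is filler data rather than the OP's Message objects; a single System.nanoTime() measurement is crude, so repeat it a few times before believing the number:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Hashtable;

final class SerializationTimingSketch {

    /** Serializes the whole graph into memory, so no disk or network is involved. */
    static byte[] serialize(Serializable graph) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Elapsed wall-clock nanoseconds for one in-memory serialization pass. */
    static long timeSerializationNanos(Serializable graph) {
        long start = System.nanoTime();
        serialize(graph);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Filler data standing in for the real Hashtable<Double, Message>.
        Hashtable<Double, String> map = new Hashtable<>();
        for (int i = 0; i < 10_000; i++) {
            map.put((double) i, "message " + i);
        }
        for (int run = 0; run < 5; run++) { // early runs include JIT warm-up
            long nanos = timeSerializationNanos(map);
            System.out.println("run " + run + ": " + (nanos / 1_000_000.0) + " ms, "
                    + serialize(map).length + " bytes");
        }
    }
}
```

The byte count is worth printing alongside the time: if the serialized size is far larger than expected, something unintended is probably being dragged into the graph.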

Chris Uppal · Sep 4, 2005

jan said:
Unfortunately, RMI doesn't keep ObjectOutputStreams around across remote
method calls. This means that passing the same object as argument to two
consecutive RMI calls will result in two complete serialization steps.

Thanks.

This could be hugely expensive for "tip of the iceberg" objects. But the
OP seems to imply that he transmits his Map as one object in one call, so
this problem doesn't seem to apply.

I get the impression that there's a fairly large network of objects that he's
iterating over. From the first post:

There's a linked list style previous messages
and next message.

which I take to mean that the design is for the bulk of the data to remain at
the "server" and for the "client" to fetch data piecemeal by traversing the
object graph in small steps. If each step requires (re)transmitting the entire
graph then there's a problem.

But I really ought to go refresh my memory of how RMI actually works before
speculating further.

-- chris

E.J. Pitt · Sep 5, 2005

jan said:
RMI isn't that clever (I'm relying on what I can read in "java.rmi - The
Remote Method Invocation Guide" by Pitt & McNiff). RMI apparently only does
clever caching of connections (sockets).

Yup. If it did reuse the ObjectOutputStreams so that changed values
weren't propagated, it would raise the whole question of whether this is
the correct semantic for a remote method call. There are arguments both
ways: for example, this is part of the reason why an
RMIClientSocketFactory has to have a class-based equals() method, but on
the whole if you think about it RMI wouldn't be much use if it did fail
to propagate argument & result changes - you'd have to have some way of
turning that off, etc etc etc.

Esmond Pitt

jan V · Sep 5, 2005

E.J. Pitt said:
Yup. If it did reuse the ObjectOutputStreams so that changed values
weren't propagated, it would raise the whole question of whether this is
the correct semantic for a remote method call. There are arguments both
ways: for example, this is part of the reason why an
RMIClientSocketFactory has to have a class-based equals() method, but on
the whole if you think about it RMI wouldn't be much use if it did fail
to propagate argument & result changes - you'd have to have some way of
turning that off, etc etc etc.

Esmond Pitt

That's pretty cool. At least two Java book authors contributing to this
thread. I wonder if any other thread can do better? ;-)

Sn0tters · Sep 7, 2005

Hi,

I got an early version of my app going, and while the performance of
transferring the Hashtable isn't lightning fast, it only takes a few
seconds, which is bearable enough.

Ideally I would like to optimize it, but I don't think I will at this
point as I could well fall into a quagmire!

Cheers!
Wil

Sn0tters · Sep 7, 2005

Once RMI gets its mitts on the Hashtable I presume it holds the lock
exclusively for the duration of the transmission, so I don't think that
would be a problem.

I've seen in my implementation that concurrency does not seem to be an
issue (so far!)

Wil

Sn0tters · Sep 7, 2005

It's a relatively large network but not overly.

The design is like jan said, initially the whole Hashtable is
transferred, which is obviously the bit that takes time. Then iterative
updates are done to each object that changes.

If the time for the initial update becomes a problem, I will do as
you say and throw it out to a file to see how long that takes.

Out of interest on debugging the server there seems to be recursive
calls to

ObjectOutputStream.writeObject0
ObjectOutputStream.writeOrdinaryObject
ObjectOutputStream.writeSerialData
ObjectOutputStream.defaultWriteFields
