Efficient hashmap serialization?

Discussion in 'Java' started by Sn0tters@yahoo.co.uk, Sep 2, 2005.

  1. Guest

    Hi,

    I have a hash map full of objects that contain references to other
    objects in that map, grouped in a threaded message fashion. There's a
    linked list style previous messages and next message.

    I serialize this map to send it over RMI, but it seems quite slow.

    Does anyone have any recommendations on efficiently serializing it? I
    know I will have to write a function to serialize the objects manually
    but not much beyond that.

    Thanks
    Wil
     
    , Sep 2, 2005
    #1

  2. jan V Guest

    > I have a hash map full of objects that contain references to other
    > objects in that map, grouped in a threaded message fashion. There's a
    > linked list style previous messages and next message.
    >
    > I serialize this map to send it over RMI, but it seems quite slow.
    >
    > Does anyone have any recommendations on efficiently serializing it? I
    > know I will have to write a function to serialize the objects manually
    > but not much beyond that.


    Can you give us the types of the keys and values of your Map? Are the
    messages plain Strings, or are you using a proper message abstraction type?

    If you understand what Serialization does for you, maybe you will change
    your opinion on it being quite slow... it's got a lot to do, and it does
    everything through reflection.

    If you really, really need to speed things up, then consider writing custom
    private readObject()/writeObject() methods for your Map's value objects...
    though you should profile the serialization step to see what really is
    causing the whole thing to eat time.
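
A minimal sketch of what such custom methods could look like. `SketchMessage` and its three fields are hypothetical stand-ins, not the poster's actual `Message` class; the point is only the shape of the private `writeObject()`/`readObject()` pair and the requirement that both sides agree on field order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

// Hypothetical stand-in for a map value type; not the poster's actual Message class.
class SketchMessage implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by default serialization, written by hand below.
    private transient double postNumber;
    private transient String subject; // may be null
    private transient Date date;      // may be null

    SketchMessage(double postNumber, String subject, Date date) {
        this.postNumber = postNumber;
        this.subject = subject;
        this.date = date;
    }

    double getPostNumber() { return postNumber; }
    String getSubject() { return subject; }
    Date getDate() { return date; }

    // Invoked by ObjectOutputStream when it serializes an instance of this class.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // handles any non-transient fields (none here)
        out.writeDouble(postNumber);
        out.writeBoolean(subject != null);
        if (subject != null) {
            out.writeUTF(subject); // writeUTF is limited to 65535 encoded bytes
        }
        out.writeBoolean(date != null);
        if (date != null) {
            out.writeLong(date.getTime());
        }
    }

    // Must read exactly what writeObject wrote, in the same order.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        postNumber = in.readDouble();
        subject = in.readBoolean() ? in.readUTF() : null;
        date = in.readBoolean() ? new Date(in.readLong()) : null;
    }

    // Serializes to memory and back, to check that the two methods agree.
    static SketchMessage roundTrip(SketchMessage message) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(message);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchMessage) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchMessage copy = roundTrip(new SketchMessage(1.0, "Re: hello", new Date(0L)));
        System.out.println(copy.getPostNumber() + " " + copy.getSubject());
    }
}
```

Whether this is any faster than the default mechanism depends on the data, which is why profiling first is the safer order of work.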
     
    jan V, Sep 2, 2005
    #2

  3. Guest

    I told a slight lie, it's a Hashtable

    private Hashtable<Double, Message> Messages = new Hashtable<Double, Message>();

    The Message object has a load of fields:

    private double postNumber = 0;
    private LinkTag postName;
    private ImageTag postIcon;
    private Text subject = null;
    private List<AbstractNode> text = null;
    private List<Tag> postAppeal = null;
    private Identity poster = null;
    private Date date = null;
    private HashMap<Double,RawMessage> nextInThread = new HashMap<Double,RawMessage>();
    private RawMessage previousInThread = null;

    My main worry is that each object is being serialized many times, think
    of this example:

    Objects A & B are in the hashtable
    Both objects reference each other.
    When A is serialized the reference to B serializes B
    When B is serialized the reference to A serializes A

    Is this a valid assumption?


    Custom readObject and writeObject methods are the number one
    optimization I'm thinking of.

    Thanks
    Wil
     
    , Sep 2, 2005
    #3
  4. jan V Guest

    > My main worry is that each object is being serialized many times, think
    > of this example:


    > Is this a valid assumption?


    Nope. Serialization deals with this (intelligently, I may add)
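
A small check of that claim, as a sketch: two hypothetical `Node` objects reference each other, both sit in a `Hashtable`, and the table goes through one `ObjectOutputStream`. The class and method names here are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Hashtable;

class CycleDemo {
    // Hypothetical node type: a name plus a reference to a peer node.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node peer;
        Node(String name) { this.name = name; }
    }

    // Serializes the table to memory and reads it back.
    @SuppressWarnings("unchecked")
    static Hashtable<String, Node> roundTrip(Hashtable<String, Node> table) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(table);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Hashtable<String, Node>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if A and B still point at each other (same instances) after the round trip.
    static boolean cycleSurvives() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.peer = b;
        b.peer = a;
        Hashtable<String, Node> table = new Hashtable<>();
        table.put("A", a);
        table.put("B", b);

        Hashtable<String, Node> copy = roundTrip(table);
        Node a2 = copy.get("A");
        Node b2 = copy.get("B");
        return a2.peer == b2 && b2.peer == a2;
    }

    public static void main(String[] args) {
        System.out.println("cycle preserved: " + cycleSurvives());
    }
}
```

The identity comparison (`==`) is the interesting part: the copy contains one A and one B, not a fresh B hanging off A and a fresh A hanging off B.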
     
    jan V, Sep 2, 2005
    #4
  5. Guest

    That's good to know!

    I'll look further into implementing those methods then.

    Thanks!
    Wil
     
    , Sep 2, 2005
    #5
  6. Roedy Green Guest

    On 2 Sep 2005 11:50:38 -0700, ""
    <> wrote or quoted :

    >Objects A & B are in the hashtable
    >Both objects reference each other.
    >When A is serialized the reference to B serializes B
    >When B is serialized the reference to A serializes A
    >
    >Is this a valid assumption?


    Nope. Each object appears at most once in the ObjectStream. This
    causes a problem sometimes since object fields can be updated and you
    only have the old values recorded.
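
The stale-value effect can be reproduced with a short sketch. `Counter` is a hypothetical class; `ObjectOutputStream.reset()` is the standard call that makes the stream forget which objects it has already written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StaleWriteDemo {
    // Hypothetical mutable object.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes the same object twice to one stream, changing it in between,
    // and returns the two values seen by the reader.
    static int[] writeTwice(boolean resetBetweenWrites) {
        try {
            Counter counter = new Counter();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                counter.value = 1;
                out.writeObject(counter);
                counter.value = 2;
                if (resetBetweenWrites) {
                    out.reset(); // discard the stream's memory of written objects
                }
                out.writeObject(counter); // without reset: only a back-reference
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Counter first = (Counter) in.readObject();
                Counter second = (Counter) in.readObject();
                return new int[] { first.value, second.value };
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] stale = writeTwice(false);
        int[] fresh = writeTwice(true);
        System.out.println("no reset:   " + stale[0] + ", " + stale[1]);
        System.out.println("with reset: " + fresh[0] + ", " + fresh[1]);
    }
}
```

Without `reset()` the second write carries no field data at all, so the reader never sees the update; with `reset()` the object is written again in full.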

    Further consider a long array of ints. Basically it is recorded just
    as efficiently as if you had used DataOutputStream. ObjectStreams futz
    around getting started, but once they get going they pick up steam
    since they don't redundantly record information.

    I like serialised streams mainly for three reasons:

    1. you can read/write arbitrarily complex datastructures with a line
    of code. You don't have to maintain some hideous bug-prone mapping.

    2. for long arrays, they have little more overhead than using a
    DataOutputStream

    3. For Applets they let you predigest data with the most complicated
    processing and parsing you choose, then hand it off in very compact
    form to the Applet that can read it with no extra downloaded classes
    and no extra application parsing classes.

    There are downsides. See my essay
    http://mindprod.com/jgloss/serialization.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
     
    Roedy Green, Sep 2, 2005
    #6
  7. Roedy Green Guest

    Roedy Green, Sep 3, 2005
    #7
  8. jan V Guest

    > I like serialised streams mainly for three reasons:

    1.
    2.
    [3.]

    Did you get those? These are bloody strong arguments, so if you really
    want to start overriding standard serialization, you'd better have even
    stronger arguments.
     
    jan V, Sep 3, 2005
    #8
  9. Raymond DeCampo Guest

    wrote:
    > I told a slight lie, it's a Hashtable
    >


    Have you considered the possibility that the synchronized nature of
    Hashtable is the cause of the poor performance?

    HTH,
    Ray

    --
    XML is the programmer's duct tape.
     
    Raymond DeCampo, Sep 4, 2005
    #9
  10. Chris Uppal Guest

    wrote:

    > I serialize this map to send it over RMI, but it seems quite slow.


    I'm not sure how RMI interacts with serialisation, but I suspect that there may
    be a problem here.

    As jan V says, within any given serialised stream, objects are only represented
    once, even if they are part of a complicated object network. However, this may
    not be the case when you are using RMI, unless you are driving it in such a way
    that RMI can "see" that it is getting several references to the same object in
    different requests. As I say, I'm not clear on exactly how RMI manages such
    things, so I could be completely wrong, but it sounds as if what may be
    happening is that your entire object-network is being serialised on every
    request. If that's the case then it should be easy enough to diagnose (fixing
    it is different ;-) since each RMI request would generate network traffic
    (easily visible with any handy network monitor) that is of the same order of
    size as your datastructure when serialised out to disk.

    Incidentally, even if I am guessing wrong here, I think that it would be a good
    idea to try serialising your data to file before worrying about RMI. If that's
    slow too, then RMI isn't the problem and can be ignored while you focus on the
    serialisation alone.

    -- chris
     
    Chris Uppal, Sep 4, 2005
    #10
  11. jan V Guest

    "Chris Uppal" <-THIS.org> wrote in message
    news:431ad0bf$0$38044$...
    > wrote:
    >
    > > I serialize this map to send it over RMI, but it seems quite slow.

    >
    > I'm not sure how RMI interacts with serialisation, but I suspect that
    > there may be a problem here.


    Unfortunately, RMI doesn't keep ObjectOutputStreams around across remote
    method calls. This means that passing the same object as argument to two
    consecutive RMI calls will result in two complete serialization steps. This
    could be hugely expensive for "tip of the iceberg" objects. But the OP seems
    to imply that he transmits his Map as one object in one call, so this
    problem doesn't seem to apply.
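
The cost difference can be illustrated without RMI at all. The sketch below is only an analogy, assuming (as stated above) that each remote call starts with a fresh stream: it compares writing the same object twice on one `ObjectOutputStream` with writing it once on each of two new streams. `PerCallCostDemo` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

class PerCallCostDemo {
    // Writes the same object `writes` times to ONE stream and returns the byte count.
    // After the first write, each further write is just a short back-reference.
    static int bytesOnSharedStream(Serializable payload, int writes) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (int i = 0; i < writes; i++) {
                    out.writeObject(payload);
                }
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Uses a brand-new stream for every write, so the full graph is written each time.
    static int bytesOnFreshStreams(Serializable payload, int writes) {
        int total = 0;
        for (int i = 0; i < writes; i++) {
            total += bytesOnSharedStream(payload, 1);
        }
        return total;
    }

    // Illustrative payload: a list of boxed integers.
    static ArrayList<Integer> samplePayload(int size) {
        ArrayList<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        ArrayList<Integer> payload = samplePayload(1000);
        System.out.println("one stream, two writes:  " + bytesOnSharedStream(payload, 2) + " bytes");
        System.out.println("two streams, one write each: " + bytesOnFreshStreams(payload, 2) + " bytes");
    }
}
```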

    > once, even if they are part of a complicated object network. However,
    > this may not be the case when you are using RMI, unless you are driving it
    > in such a way that RMI can "see" that it is getting several references to
    > the same object in different requests.


    RMI isn't that clever (I'm relying on what I can read in "java.rmi - The
    Remote Method Invocation Guide" by Pitt & McNiff). RMI apparently only does
    clever caching of connections (sockets).

    > Incidentally, even if I am guessing wrong here, I think that it would be a
    > good idea to try serialising your data to file before worrying about RMI. If
    > that's slow too, then RMI isn't the problem and can be ignored while you focus
    > on the serialisation alone.


    Even better: serialize to a ByteArrayOutputStream which would additionally
    hide any filing system/disk performance interference.
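
A rough sketch of such a measurement. `SerializationTimer` is a hypothetical helper, and the `Hashtable<Double, String>` in `main` is a placeholder for the real data; a single timing like this is only indicative, since JIT warm-up and garbage collection affect the numbers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Hashtable;

class SerializationTimer {
    // Returns { serialized size in bytes, elapsed nanoseconds } for one in-memory pass.
    static long[] measure(Serializable graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            long start = System.nanoTime();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(graph);
            }
            long elapsed = System.nanoTime() - start;
            return new long[] { bytes.size(), elapsed };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder data; substitute the real map here.
        Hashtable<Double, String> table = new Hashtable<>();
        for (int i = 0; i < 10_000; i++) {
            table.put((double) i, "message " + i);
        }

        // A few warm-up passes so the timed pass is less dominated by class loading and JIT.
        for (int i = 0; i < 5; i++) {
            measure(table);
        }
        long[] result = measure(table);
        System.out.println(result[0] + " bytes in " + (result[1] / 1_000_000.0) + " ms");
    }
}
```

Comparing the byte count with the traffic seen on the wire also gives a quick answer to the question of how much is being sent per RMI call.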
     
    jan V, Sep 4, 2005
    #11
  12. Chris Uppal Guest

    jan V wrote:

    > > I'm not sure how RMI interacts with serialisation, but I suspect that
    > > there may be a problem here.

    >
    > Unfortunately, RMI doesn't keep ObjectOutputStreams around across remote
    > method calls. This means that passing the same object as argument to two
    > consecutive RMI calls will result in two complete serialization steps.


    Thanks.


    > This could be hugely expensive for "tip of the iceberg" objects. But the
    > OP seems to imply that he transmits his Map as one object in one call, so
    > this problem doesn't seem to apply.


    I get the impression that there's a fairly large network of objects that he's
    iterating over. From the first post:

    > There's a linked list style previous messages
    > and next message.


    which I take to mean that the design is for the bulk of the data to remain at
    the "server" and for the "client" to fetch data piecemeal by traversing the
    object graph in small steps. If each step requires (re)transmitting the entire
    graph then there's a problem.

    But I really ought to go refresh my memory of how RMI actually works before
    speculating further.

    -- chris
     
    Chris Uppal, Sep 4, 2005
    #12
  13. E.J. Pitt Guest

    jan V wrote:
    > RMI isn't that clever (I'm relying on what I can read in "java.rmi - The
    > Remote Method Invocation Guide" by Pitt & McNiff). RMI apparently only does
    > clever caching of connections (sockets).


    Yup. If it did reuse the ObjectOutputStreams so that changed values
    weren't propagated, it would raise the whole question of whether this is
    the correct semantic for a remote method call. There are arguments both
    ways: for example, this is part of the reason why an
    RMIClientSocketFactory has to have a class-based equals() method, but on
    the whole if you think about it RMI wouldn't be much use if it did fail
    to propagate argument & result changes - you'd have to have some way of
    turning that off, etc etc etc.

    Esmond Pitt
     
    E.J. Pitt, Sep 5, 2005
    #13
  14. jan V Guest

    "E.J. Pitt" <> wrote in message
    news:BYTSe.23709$...
    > jan V wrote:
    > > RMI isn't that clever (I'm relying on what I can read in "java.rmi - The
    > > Remote Method Invocation Guide" by Pitt & McNiff). RMI apparently only does
    > > clever caching of connections (sockets).

    >
    > Yup. If it did reuse the ObjectOutputStreams so that changed values
    > weren't propagated, it would raise the whole question of whether this is
    > the correct semantic for a remote method call. There are arguments both
    > ways: for example, this is part of the reason why an
    > RMIClientSocketFactory has to have a class-based equals() method, but on
    > the whole if you think about it RMI wouldn't be much use if it did fail
    > to propagate argument & result changes - you'd have to have some way of
    > turning that off, etc etc etc.
    >
    > Esmond Pitt


    That's pretty cool. At least two Java book authors contributing to this
    thread. I wonder if any other thread can do better? ;-)
     
    jan V, Sep 5, 2005
    #14
  15. Guest

    Hi,

    I got an early version of my app going, and while the performance of
    transferring the Hashtable isn't lightning fast, it only takes a few
    seconds, which is bearable enough.

    Ideally I would like to optimize it, but I don't think I will at this
    point, as I could well fall into a quagmire!

    Cheers!
    Wil
     
    , Sep 7, 2005
    #15
  16. Guest

    Once RMI gets its mitts on the Hashtable I presume it keeps the lock
    exclusively, so for the transmission I don't think that would be a
    problem.

    I've seen in my implementation that concurrency does not seem to be an
    issue (so far!)

    Wil
     
    , Sep 7, 2005
    #16
  17. Guest

    It's a relatively large network but not overly.

    The design is like jan said, initially the whole Hashtable is
    transferred, which is obviously the bit that takes time. Then iterative
    updates are done to each object that changes.

    If the time for the initial update becomes a problem, I will do as
    you say and write it out to a file to see how long that takes.

    Out of interest, on debugging the server there seem to be recursive
    calls to

    ObjectOutputStream.writeObject0
    ObjectOutputStream.writeOrdinaryObject
    ObjectOutputStream.writeSerialData
    ObjectOutputStream.defaultWriteFields
     
    , Sep 7, 2005
    #17