Container functionality and marshalling

Discussion in 'C++' started by coal@mailvault.com, Jan 28, 2008.

  1. Guest

    I've been thinking about implementing some of the ideas
    discussed in this thread on clc++m.
    http://preview.tinyurl.com/2l8cnh
    Mainly how to go about calculating the total message
    length and using it in a header before sending the payload.

    In some cases like vector<int>, it is easy to multiply the size
    of the vector by the size of an int and determine how many bytes are
    involved. If it is a vector<string> though, I have to add up the
    lengths of all of the strings.
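
    A minimal sketch of the two cases just described. The name
    payload_size is hypothetical, and the sketch counts only element
    bytes, not any per-element length prefix a wire format might add.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bytes needed for the elements of a vector<int>: one multiplication.
std::size_t payload_size(const std::vector<int>& v) {
    return v.size() * sizeof(int);
}

// Bytes needed for the characters of a vector<string>: every string
// has to be visited, so the cost grows with the number of elements.
std::size_t payload_size(const std::vector<std::string>& v) {
    std::size_t total = 0;
    for (const std::string& s : v) {
        total += s.length();
    }
    return total;
}
```

    The second overload is the per-call walk that a byte-tracking
    container would try to avoid.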

    I've wondered whether it would be helpful to have containers
    that tracked the total number of bytes they are managing
    rather than going through this calculation each time.

    For example, if a set<string> has thousands of elements
    and only a handful of changes occur to the set between
    uses of the set as a marshalling parameter, the work to
    count up everything from scratch seems like a waste
    compared to just making a few additions/subtractions to
    a count.

    Any thoughts on the utility and design of containers
    like that?

    Brian Wood
    Ebenezer Enterprises
    www.webebenezer.net
    , Jan 28, 2008
    #1

  2. Guest

    On Jan 29, 6:57 am, wrote:
    > I've been thinking about implementing some of the ideas
    > discussed in this thread on clc++m.
    > http://preview.tinyurl.com/2l8cnh
    > Mainly how to go about calculating the total message
    > length and using it in a header before sending the payload.
    >
    > In some cases like vector<int>, it is easy to multiply the size
    > of the vector by the size of an int and determine how many bytes are
    > involved. If it is a vector<string> though, I have to add up the
    > lengths of all of the strings.
    >
    > I've wondered whether it would be helpful to have containers
    > that tracked the total number of bytes they are managing
    > rather than going through this calculation each time.
    >
    > For example, if a set<string> has thousands of elements
    > and only a handful of changes occur to the set between
    > uses of the set as a marshalling parameter, the work to
    > count up everything from scratch seems like a waste
    > compared to just making a few additions/subtractions to
    > a count.
    >
    > Any thoughts on the utility and design of containers
    > like that?
    >
    > Brian Wood
    > Ebenezer Enterprises
    > www.webebenezer.net


    you can implement a derived class of set, and overload
    all the methods changing total bytes, count up the total size
    in those methods and call the same method of base class.
    , Jan 29, 2008
    #2

  3. Guest

    wrote:

    >
    > you can implement a derived class of set, and overload
    > all the methods changing total bytes, count up the total size
    > in those methods and call the same method of base class.


    That will be a little tricky: You could change the erase and insert
    functions to update the count. However, many functions allow client code to
    change elements by returning references (e.g., dereferencing non-const
    iterators). Since these functions return references and not
    smart-references (which we could only have if the dot-operator was
    overloadable), there is no hook to sneak in update code.

    Probably it would be easier to implement a container like class with a
    minimal interface (just enough for the application) that uses a set
    internally. Then one can enforce client code to go through functions that
    update the byte count.
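
    One way this could look, as a rough sketch only. The class name
    CountedStringSet and its members are hypothetical, and it assumes
    the application needs no more than insert, erase and read-only
    iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical wrapper: a set of strings that keeps a running total of
// the characters it holds. Elements are only reachable by const
// reference, so client code cannot change a length behind its back.
class CountedStringSet {
public:
    // Returns true if the string was added (it was not already present).
    bool insert(const std::string& s) {
        bool added = items_.insert(s).second;
        if (added) bytes_ += s.length();
        return added;
    }

    // Returns true if the string was present and has been removed.
    bool erase(const std::string& s) {
        bool removed = items_.erase(s) == 1;
        if (removed) bytes_ -= s.length();
        return removed;
    }

    std::size_t size() const { return items_.size(); }
    std::size_t total_bytes() const { return bytes_; }

    // Read-only iteration for the marshalling code.
    std::set<std::string>::const_iterator begin() const { return items_.begin(); }
    std::set<std::string>::const_iterator end() const { return items_.end(); }

private:
    std::set<std::string> items_;
    std::size_t bytes_ = 0;
};
```

    Because the set is a private member rather than a base class,
    every change has to pass through insert or erase, which is where
    the count is kept up to date.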


    Best

    Kai-Uwe Bux
    , Jan 29, 2008
    #3
  4. James Kanze Guest

    On Jan 28, 11:57 pm, wrote:
    > I've been thinking about implementing some of the ideas
    > discussed in this thread on clc++m.
    > http://preview.tinyurl.com/2l8cnh
    > Mainly how to go about calculating the total message
    > length and using it in a header before sending the payload.


    The usual solution is to set it to 0, and fill it in after
    you've finished marshalling. Much easier, and it avoids all
    sorts of problems. (For example, a machine with IBM floats
    might choose to serialize them double, rather than float, since
    the range of an IBM float is greater than that of an IEEE
    float.)
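
    A sketch of that approach, under assumptions that are not from
    this thread: a 4-byte length header, 4-byte counts and string
    lengths, host byte order, and the hypothetical names append_raw
    and marshal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void append_raw(std::vector<unsigned char>& buf, const T& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Marshal a vector<string> behind a 4-byte length header. The header is
// written as 0 first and patched once the payload size is known.
std::vector<unsigned char> marshal(const std::vector<std::string>& items) {
    std::vector<unsigned char> buf;
    append_raw(buf, std::uint32_t(0));  // placeholder header
    const std::size_t header_size = buf.size();

    append_raw(buf, static_cast<std::uint32_t>(items.size()));
    for (const std::string& s : items) {
        append_raw(buf, static_cast<std::uint32_t>(s.length()));
        buf.insert(buf.end(), s.begin(), s.end());
    }

    // Fill in the real payload length (host byte order in this sketch).
    const std::uint32_t payload =
        static_cast<std::uint32_t>(buf.size() - header_size);
    std::memcpy(buf.data(), &payload, sizeof(payload));
    return buf;
}
```

    The length is whatever was actually written, so the header cannot
    disagree with the payload, at the cost of buffering the whole
    message before any of it can be sent.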

    --
    James Kanze (GABI Software)
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jan 30, 2008
    #4
  5. Guest

    On Jan 29, 3:47 am, wrote:
    > wrote:
    > > you can implement a derived class of set, and overload
    > > all the methods changing total bytes, count up the total size
    > > in those methods and call the same method of base class.

    >
    > That will be a little tricky: You could change the erase and insert
    > functions to update the count. However, many functions allow client code to
    > change elements by returning references (e.g., dereferencing non-const
    > iterators). Since these functions return references and not
    > smart-references (which we could only have if the dot-operator was
    > overloadable), there is no hook to sneak in update code.
    >


    So even if a mutex is associated with the container and inserts and
    erases are synchronized, a sum of the lengths may be out of date
    before it is finished. Even the simple way can't guarantee much.

    Brian Wood
    , Jan 30, 2008
    #5
  6. Guest

    On Jan 30, 2:57 am, James Kanze wrote:

    >
    > The usual solution is to set it to 0, and fill it in after
    > you've finished marshalling.  Much easier, and it avoids all
    > sorts of problems.  


    I think it avoids the possibility of saying the total length
    is one thing and it being something else. However, that
    approach has some drawbacks. You have to (probably) copy
    everything as you go and if there is a maximum message
    size that winds up being exceeded, you have done a bunch of
    copying for nothing. You also cannot start sending anything
    until everything has been marshalled. And it also means you
    have to have buffers as big as the max msg size. I don't
    think buffer sizes should be tied to that parameter.

    Given the current containers, though, perhaps what you suggest
    is necessary. I hope, though, that container technology will
    mature and permit a more efficient approach here.

    > (For example, a machine with IBM floats
    > might choose to serialize them double, rather than float, since
    > the range of an IBM float is greater than that of an IEEE
    > float.)
    >


    OK

    Brian Wood
    , Jan 30, 2008
    #6
  7. Guest

    On Jan 30, 12:11 pm, wrote:
    > On Jan 30, 2:57 am, James Kanze wrote:
    >
    > > The usual solution is to set it to 0, and fill it in after
    > > you've finished marshalling.  Much easier, and it avoids all
    > > sorts of problems.  

    >
    > I think it avoids the possibility of saying the total length
    > is one thing and it being something else.  However, that
    > approach has some drawbacks.  You have to (probably) copy
    > everything as you go and if there is a maximum message
    > size that winds up being exceeded, you have done a bunch of
    > copying for nothing.  You also cannot start sending anything
    > until everything has been marshalled.  And it also means you
    > have to have buffers as big as the max msg size.  I don't
    > think buffer sizes should be tied to that parameter.
    >


    Over here,
    http://www.gamedev.net/community/forums/topic.asp?topic_id=480778
    "Antheus" says, "Also, the longer the message, the smaller the
    overhead. Serializing large packets will yield higher throughput."

    I think there is some truth to that. That puts a little pressure
    on you to have larger messages and possibly exceed the max msg
    size. And since the messages are relatively large, you're holding
    more back from heading on its way.

    Brian Wood
    , Jan 30, 2008
    #7
  8. James Kanze Guest

    On Jan 30, 7:11 pm, wrote:
    > On Jan 30, 2:57 am, James Kanze wrote:
    > > The usual solution is to set it to 0, and fill it in after
    > > you've finished marshalling. Much easier, and it avoids all
    > > sorts of problems.


    > I think it avoids the possibility of saying the total length
    > is one thing and it being something else. However, that
    > approach has some drawbacks. You have to (probably) copy
    > everything as you go and if there is a maximum message
    > size that winds up being exceeded, you have done a bunch of
    > copying for nothing. You also cannot start sending anything
    > until everything has been marshalled. And it also means you
    > have to have buffers as big as the max msg size. I don't
    > think buffer sizes should be tied to that parameter.


    I said it was the usual solution, not the perfect solution. On
    modern hardware, I suspect that it is also the most appropriate
    solution in most cases.

    > Given the current containers, though, perhaps what you suggest
    > is necessary. I hope, though that container technology will
    > mature and permit a more efficient approach here.


    If the containers are all in memory, it's generally a pretty
    efficient approach as well. It does become problematic when
    you're returning disk-based data, however: for example, if you're
    using the keep-alive option in an HTTP server and are serving up a
    dynamically generated page which can be several gigabytes
    (hopefully not to someone using a dial-up connection :).

    James Kanze, Jan 31, 2008
    #8
  9. Guest

    On Jan 31, 1:50 pm, James Kanze wrote:
    > On Jan 30, 7:11 pm, wrote:
    >
    > > On Jan 30, 2:57 am, James Kanze wrote:
    > > > The usual solution is to set it to 0, and fill it in after
    > > > you've finished marshalling.  Much easier, and it avoids all
    > > > sorts of problems.

    > > I think it avoids the possibility of saying the total length
    > > is one thing and it being something else.  However, that
    > > approach has some drawbacks.  You have to (probably) copy
    > > everything as you go and if there is a maximum message
    > > size that winds up being exceeded, you have done a bunch of
    > > copying for nothing.  You also cannot start sending anything
    > > until everything has been marshalled.  And it also means you
    > > have to have buffers as big as the max msg size.  I don't
    > > think buffer sizes should be tied to that parameter.

    >
    > I said it was the usual solution, not the perfect solution.  On
    > modern hardware, I suspect that it is also the most appropriate
    > solution in most cases.
    >


    I'm having second thoughts. I forgot that I make the assumption
    that the arguments passed to a marshalling function are fixed
    while the function is executing.* When marshalling a list<int>,
    first the size() is marshalled and then the elements. If another
    thread inserts elements into the list between those two steps it
    will lead to undefined behaviour. I think Boost Serialization
    makes this assumption also. Probably I forgot about that because
    it isn't documented anywhere.

    I don't think I'm going to try to factor IBM floats into the
    equation at this point. To start off with I'd be happy to
    support a total message length for ints, containers/string and
    IEEE floats. Given the above assumption I'm not aware of any
    reason why I shouldn't calculate the total length upfront.

    * It may be possible to refine that to "fixed until the
    argument has been marshalled."

    Brian Wood
    Ebenezer Enterprises
    , Feb 1, 2008
    #9
  10. James Kanze Guest

    On Feb 1, 10:02 pm, wrote:

    [...]
    > I'm having second thoughts. I forgot that I make the assumption
    > that the arguments passed to a marshalling function are fixed
    > while the function is executing.* When marshalling a list<int>,
    > first the size() is marshalled and then the elements. If another
    > thread inserts elements into the list between those two steps it
    > will lead to undefined behaviour.


    If any thread can modify the list, then all threads accessing
    it must use some sort of lock. Unsynchronized access to a
    container (or any other object) is only allowed if no thread is
    modifying it.

    > I think Boost Serialization makes this assumption also.
    > Probably I forgot about that because it isn't documented
    > anywhere.


    It's a basic rule of thread safety. Individual objects only
    have to document it when they don't follow the rule.

    James Kanze, Feb 1, 2008
    #10
  11. Guest

    On Feb 1, 5:30 pm, James Kanze wrote:
    >
    > If any thread can modify the list, then all threads accessing
    > it must use some sort of lock.  Unsynchronized access to a
    > container (or any other object) is only allowed if no thread is
    > modifying it.
    >


    I agree. The marshalling code doesn't know if a lock is needed
    or not, but if it is, it's the caller's responsibility to get a
    lock on the container before calling a marshalling function.
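
    A small sketch of that division of responsibility. SharedList,
    marshal and marshal_locked are hypothetical names, and std::mutex
    is only one possible choice of lock.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <vector>

// The marshalling function takes no lock; it assumes the list does not
// change while it runs. It writes the element count, then the elements.
std::vector<int> marshal(const std::list<int>& values) {
    std::vector<int> out;
    out.push_back(static_cast<int>(values.size()));
    out.insert(out.end(), values.begin(), values.end());
    return out;
}

// Hypothetical shared state: a list and the mutex that guards it.
struct SharedList {
    std::mutex mtx;
    std::list<int> values;
};

// The caller knows other threads may modify the list, so it holds the
// lock for the whole call. Writers must lock the same mutex.
std::vector<int> marshal_locked(SharedList& shared) {
    std::lock_guard<std::mutex> guard(shared.mtx);
    return marshal(shared.values);
}
```

    Holding the lock across both the size() step and the element loop
    is what keeps the marshalled count consistent with the elements
    that follow it.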


    Below is the output I get (this isn't available online yet)
    when the input is:
    Msgs
    (list<int>, deque<string>) @MSGID_1
    }

    I snipped the Receive function as it isn't ready yet.
    I'm considering moving the code that calculates the
    total message size into a separate function called
    CalculateMarshallingSize; it might be useful by itself.

    These files are included below
    http://home.seventy7.com/misc/Buffer.hh
    http://home.seventy7.com/misc/Counter.hh
    http://home.seventy7.com/misc/ErrorWordsShepherd.hh

    Also, I assume that these are defined elsewhere.
    const unsigned int MSGID_1 = 4000;
    const unsigned int MAX_MSGLENGTH = 100000;

    The first argument to SetErrorWords sometimes reflects the
    marshalling argument, but other times it is useless.


    // computer-generated output
    #include <deque>
    #include <list>
    #include <string>
    #include <Counter.hh>
    #include <Buffer.hh>


    struct Msgs
    {
      inline
      Msgs() {}
      inline
      ~Msgs() {}

      inline
      int
      Send(Buffer* buf, const list<int>& about1, const deque<string>& about2)
      {
        unsigned int headCount = 0;
        unsigned int slen = 0;
        if (!buf->Receive(&MSGID_1, sizeof(MSGID_1))) {
          buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
          return 0;
        }
        // Determine total length of the message.
        Counter cntr(MAX_MSGLENGTH);
        if (!cntr.Add(sizeof(int))) {
          buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
          return 0;
        }
        if (!cntr.MultiplyAndAdd(about1.size(), sizeof(int))) {
          buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
          return 0;
        }

        if (!cntr.Add(sizeof(int))) {
          buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
          return 0;
        }
        if (!cntr.MultiplyAndAdd(about2.size(), sizeof(int))) {
          buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
          return 0;
        }
        deque<string >::const_iterator mediator1 = about2.begin();
        deque<string >::const_iterator omega1 = about2.end();
        for (; mediator1 != omega1; ++mediator1) {
          if (!cntr.Add((*mediator1).length())) {
            buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
            return 0;
          }
        }

        if (!buf->Receive(&cntr.value_, sizeof(cntr.value_))) {
          buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
          return 0;
        }

        headCount = about1.size();
        if (!buf->Receive(&headCount, sizeof(int))) {
          buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
          return 0;
        }
        list<int >::const_iterator mediator2 = about1.begin();
        list<int >::const_iterator omega2 = about1.end();
        for (; mediator2 != omega2; ++mediator2) {
          if (!buf->Receive(&(*mediator2), sizeof(int))) {
            buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
            return 0;
          }
        }

        headCount = about2.size();
        if (!buf->Receive(&headCount, sizeof(int))) {
          buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
          return 0;
        }
        deque<string >::const_iterator mediator3 = about2.begin();
        deque<string >::const_iterator omega3 = about2.end();
        for (; mediator3 != omega3; ++mediator3) {
          slen = (*mediator3).length();
          if (!buf->Receive(&slen, sizeof(slen))) {
            buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
            return 0;
          }
          if (!buf->Receive((*mediator3).c_str(), slen)) {
            buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
            return 0;
          }
        }

        if (!buf->SendStoredData()) {
          buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
          return 0;
        }
        return 1;
      }
    // end of computer-generated output


    I've compiled it but that's about it.
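
    As a rough sketch, the size calculation at the top of Send could
    be split out into the CalculateMarshallingSize function mentioned
    earlier in this post. The Counter below is a hypothetical stand-in,
    since the real Counter.hh is not shown here; its limit check, the
    value() accessor and the std::size_t types are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <string>

// Hypothetical stand-in for the Counter in Counter.hh: a running total
// that refuses to go past a fixed limit.
class Counter {
public:
    explicit Counter(std::size_t limit) : limit_(limit), value_(0) {}

    // Adds n; returns false and leaves the total unchanged if that
    // would exceed the limit.
    bool Add(std::size_t n) {
        if (n > limit_ - value_) return false;
        value_ += n;
        return true;
    }

    // Adds count * size, rejecting products that would pass the limit
    // (which also rules out overflow in the multiplication).
    bool MultiplyAndAdd(std::size_t count, std::size_t size) {
        if (size != 0 && count > limit_ / size) return false;
        return Add(count * size);
    }

    std::size_t value() const { return value_; }

private:
    std::size_t limit_;
    std::size_t value_;
};

// Computes the message length for (list<int>, deque<string>) using the
// same layout as Send: a count, the ints, a count, then a length and
// the characters for each string. Returns false if the limit is hit.
bool CalculateMarshallingSize(const std::list<int>& about1,
                              const std::deque<std::string>& about2,
                              std::size_t limit, std::size_t& total) {
    Counter cntr(limit);
    if (!cntr.Add(sizeof(int))) return false;
    if (!cntr.MultiplyAndAdd(about1.size(), sizeof(int))) return false;
    if (!cntr.Add(sizeof(int))) return false;
    if (!cntr.MultiplyAndAdd(about2.size(), sizeof(int))) return false;
    for (const std::string& s : about2) {
        if (!cntr.Add(s.length())) return false;
    }
    total = cntr.value();
    return true;
}
```

    Checking against the limit at every step means an oversized
    message is rejected before anything has been copied into a buffer.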

    Brian Wood
    Ebenezer Enterprises
    , Feb 3, 2008
    #11
  12. Guest

    On Feb 3, 2:07 pm, wrote:
    > // computer-generated output
    > // end of computer-generated output
    >
    > I've compiled it but that's about it.
    >


    Whoops. It looks like I accidentally caught the }; that corresponds to
    struct Msgs
    {

    when I snipped the Receive function.

    Brian Wood
    , Feb 3, 2008
    #12