Container functionality and marshalling

coal

I've been thinking about implementing some of the ideas
discussed in this thread on clc++m.
http://preview.tinyurl.com/2l8cnh
Mainly how to go about calculating the total message
length and using it in a header before sending the payload.

In some cases like vector<int>, it is easy to multiply the size
of the vector by the size of an int and determine how many bytes are
involved. If it is a vector<string> though, I have to add up the
lengths of all of the strings.
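
A minimal sketch of the two cases just described, assuming C++11 and counting only the element bytes (no length prefixes). The function name payload_size is made up for illustration and does not come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bytes needed for the elements of a vector<int>: one multiplication.
std::size_t payload_size(const std::vector<int>& v)
{
    return v.size() * sizeof(int);
}

// Bytes needed for the characters of a vector<string>: every string
// has to be visited, so the cost grows with the number of elements.
std::size_t payload_size(const std::vector<std::string>& v)
{
    std::size_t total = 0;
    for (const std::string& s : v) {
        total += s.length();
    }
    return total;
}
```

The first overload is constant time; the second is linear in the number of strings, which is the cost the rest of the post is concerned with.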

I've wondered whether it would be helpful to have containers
that tracked the total number of bytes they are managing
rather than going through this calculation each time.

For example, if a set<string> has thousands of elements
and only a handful of changes occur to the set between
uses of the set as a marshalling parameter, the work to
count up everything from scratch seems like a waste
compared to just making a few additions/subtractions to
a count.

Any thoughts on the utility and design of containers
like that?

Brian Wood
Ebenezer Enterprises
www.webebenezer.net
 
zhangyw80


you can implement a derived class of set, and overload
all the methods changing total bytes, count up the total size
in those methods and call the same method of base class.
 
jkherciueh

you can implement a derived class of set, and overload
all the methods changing total bytes, count up the total size
in those methods and call the same method of base class.

That will be a little tricky: You could change the erase and insert
functions to update the count. However, many functions allow client code to
change elements by returning references (e.g., dereferencing non-const
iterators). Since these functions return references and not
smart-references (which we could only have if the dot-operator was
overloadable), there is no hook to sneak in update code.

Probably it would be easier to implement a container-like class with a
minimal interface (just enough for the application) that uses a set
internally. Then one can force client code to go through functions that
update the byte count.
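
One way the suggestion above might look, as a rough sketch assuming C++11. The class name ByteCountedStringSet and its members are hypothetical; the point is only that every mutating path goes through a function that adjusts the count, and iteration is read-only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical wrapper: exposes only what it can keep consistent.
class ByteCountedStringSet
{
public:
    // Returns true if the string was newly inserted.
    bool insert(const std::string& s)
    {
        const bool added = items_.insert(s).second;
        if (added) {
            bytes_ += s.length();
        }
        return added;
    }

    // Returns true if the string was present and has been removed.
    bool erase(const std::string& s)
    {
        const bool removed = items_.erase(s) == 1;
        if (removed) {
            bytes_ -= s.length();
        }
        return removed;
    }

    std::size_t size() const { return items_.size(); }

    // Sum of the lengths of all stored strings, maintained incrementally.
    std::size_t total_bytes() const { return bytes_; }

    // Read-only iteration, so elements cannot change behind the count.
    std::set<std::string>::const_iterator begin() const { return items_.begin(); }
    std::set<std::string>::const_iterator end() const { return items_.end(); }

private:
    std::set<std::string> items_;
    std::size_t bytes_ = 0;
};
```

Marshalling code could then read total_bytes() instead of walking the set, at the price of giving up the full std::set interface.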


Best

Kai-Uwe Bux
 
James Kanze

I've been thinking about implementing some of the ideas
discussed in this thread on clc++m.
http://preview.tinyurl.com/2l8cnh
Mainly how to go about calculating the total message
length and using it in a header before sending the payload.

The usual solution is to set it to 0, and fill it in after
you've finished marshalling. Much easier, and it avoids all
sorts of problems. (For example, a machine with IBM floats
might choose to serialize them as double, rather than float, since
the range of an IBM float is greater than that of an IEEE
float.)
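
A sketch of that "write 0, then patch" approach, assuming C++11, an in-memory byte buffer, and a made-up header layout of [id][length] in host byte order. The names put_u32 and marshal are illustrative only; a real protocol would also have to fix the byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append the raw bytes of a 32-bit value in host byte order.
void put_u32(std::vector<unsigned char>& out, std::uint32_t v)
{
    unsigned char raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    out.insert(out.end(), raw, raw + sizeof v);
}

// Marshal a message whose header is [id][total length], writing 0 for
// the length first and patching it once the payload has been appended.
std::vector<unsigned char> marshal(std::uint32_t msg_id,
                                   const std::vector<std::string>& strings)
{
    std::vector<unsigned char> out;
    put_u32(out, msg_id);

    const std::size_t length_pos = out.size();
    put_u32(out, 0);  // placeholder for the length

    put_u32(out, static_cast<std::uint32_t>(strings.size()));
    for (const std::string& s : strings) {
        put_u32(out, static_cast<std::uint32_t>(s.length()));
        out.insert(out.end(), s.begin(), s.end());
    }

    // Back-patch: the length counts everything after the length field.
    const std::uint32_t total = static_cast<std::uint32_t>(
        out.size() - length_pos - sizeof(std::uint32_t));
    std::memcpy(&out[length_pos], &total, sizeof total);
    return out;
}
```

Nothing has to be known about the encoded size in advance, but the whole message sits in the buffer before any of it can be sent.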
 
coal

That will be a little tricky: You could change the erase and insert
functions to update the count. However, many functions allow client code to
change elements by returning references (e.g., dereferencing non-const
iterators). Since these functions return references and not
smart-references (which we could only have if the dot-operator was
overloadable), there is no hook to sneak in update code.

So if a mutex is associated with the container and inserts and erases
are synchronized, a sum of the lengths may be incorrect before it is
finished. Even the simple way can't guarantee much.

Brian Wood
 
coal

The usual solution is to set it to 0, and fill it in after
you've finished marshalling.  Much easier, and it avoids all
sorts of problems.  

I think it avoids the possibility of saying the total length
is one thing and it being something else. However, that
approach has some drawbacks. You have to (probably) copy
everything as you go and if there is a maximum message
size that winds up being exceeded, you have done a bunch of
copying for nothing. You also cannot start sending anything
until everything has been marshalled. And it also means you
have to have buffers as big as the max msg size. I don't
think buffer sizes should be tied to that parameter.

Given the current containers, though, perhaps what you suggest
is necessary. I hope, though, that container technology will
mature and permit a more efficient approach here.

(For example, a machine with IBM floats
might choose to serialize them double, rather than float, since
the range of an IBM float is greater than that of an IEEE
float.)

OK

Brian Wood
 
coal

I think it avoids the possibility of saying the total length
is one thing and it being something else.  However, that
approach has some drawbacks.  You have to (probably) copy
everything as you go and if there is a maximum message
size that winds up being exceeded, you have done a bunch of
copying for nothing.  You also cannot start sending anything
until everything has been marshalled.  And it also means you
have to have buffers as big as the max msg size.  I don't
think buffer sizes should be tied to that parameter.

Over here,
http://www.gamedev.net/community/forums/topic.asp?topic_id=480778
"Antheus" says, "Also, the longer the message, the smaller the
overhead. Serializing large packets will yield higher throughput."

I think there is some truth to that. That puts a little pressure
on you to have larger messages and possibly exceed the max msg
size. And since the messages are relatively large, you're holding
more back from heading on its way.

Brian Wood
 
James Kanze

I think it avoids the possibility of saying the total length
is one thing and it being something else. However, that
approach has some drawbacks. You have to (probably) copy
everything as you go and if there is a maximum message
size that winds up being exceeded, you have done a bunch of
copying for nothing. You also cannot start sending anything
until everything has been marshalled. And it also means you
have to have buffers as big as the max msg size. I don't
think buffer sizes should be tied to that parameter.

I said it was the usual solution, not the perfect solution. On
modern hardware, I suspect that it is also the most appropriate
solution in most cases.

Given the current containers, though, perhaps what you suggest
is necessary. I hope, though, that container technology will
mature and permit a more efficient approach here.

If the containers are all in memory, it's generally a pretty
efficient approach as well. It does become problematic when
you're returning disk-based data, however: for example, if you're
using the keep-alive option in an HTTP server and serving up a
dynamically generated page which can be several gigabytes
(hopefully not to someone using a dial-up connection :).
 
coal

I said it was the usual solution, not the perfect solution.  On
modern hardware, I suspect that it is also the most appropriate
solution in most cases.

I'm having second thoughts. I forgot that I make the assumption
that the arguments passed to a marshalling function are fixed
while the function is executing.* When marshalling a list<int>,
first the size() is marshalled and then the elements. If another
thread inserts elements into the list between those two steps it
will lead to undefined behaviour. I think Boost Serialization
makes this assumption also. Probably I forgot about that because
it isn't documented anywhere.

I don't think I'm going to try to factor IBM floats into the
equation at this point. To start off with I'd be happy to
support a total message length for ints, containers/string and
IEEE floats. Given the above assumption I'm not aware of any
reason why I shouldn't calculate the total length upfront.

* It may be possible to refine that to "fixed until the
argument has been marshalled."

Brian Wood
Ebenezer Enterprises
 
James Kanze

On Feb 1, 10:02 pm, (e-mail address removed) wrote:

[...]
I'm having second thoughts. I forgot that I make the assumption
that the arguments passed to a marshalling function are fixed
while the function is executing.* When marshalling a list<int>,
first the size() is marshalled and then the elements. If another
thread inserts elements into the list between those two steps it
will lead to undefined behaviour.

If any thread can modify the list, then all threads accessing
it must use some sort of lock. Synchronous access to a
container (or any other object) is only allowed if no thread is
modifying it.

I think Boost Serialization makes this assumption also.
Probably I forgot about that because it isn't documented
anywhere.

It's a basic rule of thread safety. Individual objects only
have to document it when they don't follow the rule.
 
coal

If any thread can modify the list, then all threads accessing
it must use some sort of lock.  Synchronous access to a
container (or any other object) is only allowed if no thread is
modifying it.

I agree. The marshalling code doesn't know if a lock is needed
or not, but if it is, it's the caller's responsibility to get a
lock on the container before calling a marshalling function.
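
As a sketch of that division of responsibility, assuming C++11 threading facilities: the marshalling function takes the container as fixed, and the caller holds a lock around the whole call. SharedInts, marshal and marshal_locked are hypothetical names, and a vector<int> stands in for the real buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <vector>

// Hypothetical shared state: a list plus the mutex that guards it.
struct SharedInts
{
    std::mutex mtx;
    std::list<int> values;
};

// Marshalling code: assumes the list does not change while it runs.
// It writes the element count first and then the elements.
void marshal(const std::list<int>& values, std::vector<int>& out)
{
    out.push_back(static_cast<int>(values.size()));
    out.insert(out.end(), values.begin(), values.end());
}

// Caller: holds the lock for the whole marshalling call, so size()
// and the elements are taken from the same state of the list.
std::vector<int> marshal_locked(SharedInts& shared)
{
    std::lock_guard<std::mutex> lock(shared.mtx);
    std::vector<int> out;
    marshal(shared.values, out);
    return out;
}
```

Any thread that inserts into or erases from the list would have to take the same mutex for this to help.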


Below is the output I get (this isn't available online yet)
when the input is:
Msgs
(list<int>, deque<string>) @MSGID_1
}

I snipped the Receive function as it isn't ready yet.
I'm considering moving the code that calculates the
total message size into a separate function called
CalculateMarshallingSize; it might be useful by itself.
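
A rough sketch of what such a separate function could look like for the (list<int>, deque<string>) message in this post, assuming C++11. BoundedCounter is a hypothetical stand-in written for this example; it is not the contents of Counter.hh, which are not shown in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <string>

// Hypothetical stand-in for Counter: a running total that refuses to
// go past a fixed limit (not the actual contents of Counter.hh).
class BoundedCounter
{
public:
    explicit BoundedCounter(std::size_t limit) : limit_(limit), value_(0) {}

    // Adds n; returns false and leaves the total unchanged if the
    // limit would be exceeded.
    bool Add(std::size_t n)
    {
        if (n > limit_ - value_) {
            return false;
        }
        value_ += n;
        return true;
    }

    // Adds count * size, checking the limit without overflowing.
    bool MultiplyAndAdd(std::size_t count, std::size_t size)
    {
        if (size != 0 && count > (limit_ - value_) / size) {
            return false;
        }
        value_ += count * size;
        return true;
    }

    std::size_t value() const { return value_; }

private:
    std::size_t limit_;
    std::size_t value_;
};

// Same arithmetic as the size computation in the generated Send: an int
// count per container, the ints, and an int length plus bytes per string.
bool CalculateMarshallingSize(const std::list<int>& a,
                              const std::deque<std::string>& b,
                              BoundedCounter& cntr)
{
    if (!cntr.Add(sizeof(int))) return false;
    if (!cntr.MultiplyAndAdd(a.size(), sizeof(int))) return false;
    if (!cntr.Add(sizeof(int))) return false;
    if (!cntr.MultiplyAndAdd(b.size(), sizeof(int))) return false;
    for (const std::string& s : b) {
        if (!cntr.Add(s.length())) return false;
    }
    return true;
}
```

Because the function gives up as soon as the limit is passed, an oversized message is rejected before anything is copied into a buffer.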

These files are included below
http://home.seventy7.com/misc/Buffer.hh
http://home.seventy7.com/misc/Counter.hh
http://home.seventy7.com/misc/ErrorWordsShepherd.hh

Also, I assume that these are defined elsewhere.
const unsigned int MSGID_1 = 4000;
const unsigned int MAX_MSGLENGTH = 100000;

The first argument to SetErrorWords sometimes reflects the
marshalling argument, but other times it is useless.


// computer-generated output
#include <deque>
#include <list>
#include <string>
#include <Counter.hh>
#include <Buffer.hh>


struct Msgs
{
inline
Msgs() {}
inline
~Msgs() {}

inline
int
Send(Buffer* buf, const list<int>& about1, const deque<string>& about2)
{
  unsigned int headCount = 0;
  unsigned int slen = 0;
  if (!buf->Receive(&MSGID_1, sizeof(MSGID_1))) {
    buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
    return 0;
  }
  // Determine total length of the message.
  Counter cntr(MAX_MSGLENGTH);
  if (!cntr.Add(sizeof(int))) {
    buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
    return 0;
  }
  if (!cntr.MultiplyAndAdd(about1.size(), sizeof(int))) {
    buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
    return 0;
  }

  if (!cntr.Add(sizeof(int))) {
    buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
    return 0;
  }
  if (!cntr.MultiplyAndAdd(about2.size(), sizeof(int))) {
    buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
    return 0;
  }
  deque<string >::const_iterator mediator1 = about2.begin();
  deque<string >::const_iterator omega1 = about2.end();
  for (; mediator1 != omega1; ++mediator1) {
    if (!cntr.Add((*mediator1).length())) {
      buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
      return 0;
    }
  }

  if (!buf->Receive(&cntr.value_, sizeof(cntr.value_))) {
    buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
    return 0;
  }

  headCount = about1.size();
  if (!buf->Receive(&headCount, sizeof(int))) {
    buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
    return 0;
  }
  list<int >::const_iterator mediator2 = about1.begin();
  list<int >::const_iterator omega2 = about1.end();
  for (; mediator2 != omega2; ++mediator2) {
    if (!buf->Receive(&(*mediator2), sizeof(int))) {
      buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
      return 0;
    }
  }

  headCount = about2.size();
  if (!buf->Receive(&headCount, sizeof(int))) {
    buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
    return 0;
  }
  deque<string >::const_iterator mediator3 = about2.begin();
  deque<string >::const_iterator omega3 = about2.end();
  for (; mediator3 != omega3; ++mediator3) {
    slen = (*mediator3).length();
    if (!buf->Receive(&slen, sizeof(slen))) {
      buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
      return 0;
    }
    if (!buf->Receive((*mediator3).c_str(), slen)) {
      buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
      return 0;
    }
  }

  if (!buf->SendStoredData()) {
    buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
    return 0;
  }
  return 1;
}
// end of computer-generated output


I've compiled it but that's about it.

Brian Wood
Ebenezer Enterprises
 
coal

Whoops. It looks like I accidentally caught the }; that corresponds to
struct Msgs
{

when I snipped the Receive function.

Brian Wood
 
