On 12-02-07 07:38 PM, BGB wrote:
On 2/7/2012 11:11 AM, jebblue wrote:
[ SNIP ]
I recommend using sockets.
in general, I agree (sockets generally make the most sense), though
there are cases where file-based communication can make sense, just
probably not in the form described in the OP.
another issue (besides how to pass messages), is what sort of form to
pass messages in.
usually, in my case, if storing data in files, I tend to prefer
ASCII-based formats.
usually, for passing messages over sockets, I have used "compact"
specialized binary formats, typically serialized data from some other
form (such as XML nodes or S-Expressions). although "magic byte value"
based message formats are initially simpler, they tend to be harder to
expand later (whereas encoding/decoding some more generic form, though
initially more effort, can turn out to be easier to maintain and extend
later).
note: this does not mean SOAP or CORBA or some other "standardized"
messaging system, rather just that one initially builds and processes
the messages in some form that is more high-level than spitting out
bytes, and processing everything via a loop and a big "switch()" or
similar (although this can be an initially fairly simple option, so has
some merit due to ease of implementation).
the main reason for picking a binary message-serialization format (for
something like S-Expressions or XML nodes) would mostly be that the
serialized data may go over the internet: a textual format can be a bit
bulkier (and thus slower to transmit over a slower connection), and is
typically slower to decode as well (a sanely designed binary message
format can be unpacked much more quickly than a textual format can be
parsed).
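A minimal sketch of what such a binary serialization might look like,
with nested Python lists standing in for S-Expressions. The tag values
and field widths are made up for illustration; this is not any
particular existing format.

```python
import struct

# Hypothetical type tags for this sketch; not any standard format.
T_LIST, T_INT, T_SYM = 0x01, 0x02, 0x03

def encode(node):
    """Serialize nested lists of ints and strings into tagged binary."""
    if isinstance(node, list):
        body = b"".join(encode(child) for child in node)
        return struct.pack(">BH", T_LIST, len(node)) + body
    if isinstance(node, int):
        return struct.pack(">Bi", T_INT, node)
    if isinstance(node, str):
        raw = node.encode("utf-8")
        return struct.pack(">BH", T_SYM, len(raw)) + raw
    raise TypeError(f"cannot encode {type(node).__name__}")

def decode(buf, pos=0):
    """Return (node, position just past the node)."""
    tag = buf[pos]
    if tag == T_LIST:
        (count,) = struct.unpack_from(">H", buf, pos + 1)
        pos += 3
        items = []
        for _ in range(count):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if tag == T_INT:
        (value,) = struct.unpack_from(">i", buf, pos + 1)
        return value, pos + 5
    if tag == T_SYM:
        (size,) = struct.unpack_from(">H", buf, pos + 1)
        start = pos + 3
        return buf[start:start + size].decode("utf-8"), start + size
    raise ValueError(f"unknown tag 0x{tag:02x} at offset {pos}")
```

Adding a new node type later means adding one tag and one branch on
each side, which is the extensibility argument made above.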
sending text over sockets may have merits as well, and is generally
preferable for "open" protocols.
or such...
I've done a fair bit with sockets myself, including recently, in fact
including on a current gig. Some of the message formats have been
designed by others, some by me. A few of them are specialized industry
standards, some are very custom and bespoke.
A few of the formats have been binary: fixed-length blocks of data with
fields at various offsets. Works well enough if it suits the data.
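As a rough illustration of the fixed-length style, here is a sketch
using Python's struct module. The record layout (field names, widths,
byte order) is invented for the example.

```python
import struct

# Hypothetical layout: 4-byte message id, 2-byte status, 10-byte ASCII
# name, big-endian with no padding (16 bytes total).
RECORD = struct.Struct(">IH10s")

def pack_record(msg_id, status, name):
    # struct pads the name field with NUL bytes if the name is shorter
    return RECORD.pack(msg_id, status, name.encode("ascii"))

def unpack_record(block):
    msg_id, status, raw_name = RECORD.unpack(block)
    return msg_id, status, raw_name.rstrip(b"\0").decode("ascii")
```

Every record is exactly RECORD.size bytes, so the receiver always knows
how much to read before decoding.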
A bunch of others have been text and line-oriented: a fixed number of
lines of data in known order, so that line 10 is always the data for a
particular field.
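A small sketch of that line-oriented style, assuming a made-up message
with three fields in a fixed order:

```python
# Hypothetical field order; line N of a message is always FIELD_ORDER[N].
FIELD_ORDER = ["command", "account", "amount"]

def encode_message(fields):
    # one field per line, always in the agreed order
    return "".join(f"{fields[name]}\n" for name in FIELD_ORDER)

def decode_message(text):
    lines = text.splitlines()
    if len(lines) != len(FIELD_ORDER):
        raise ValueError(f"expected {len(FIELD_ORDER)} lines, got {len(lines)}")
    return dict(zip(FIELD_ORDER, lines))
```

This only works if no field value can itself contain a newline.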
Other things to consider: JAXB, JSON etc. Minimum coding fuss at the
endpoints if that's what's appropriate for constructing message payloads.
I like text-based protocols, for some simple situations, that behave
like SMTP or POP. But it obviously depends on what you expect your
client and server to do, it's just another approach to be aware of.
well, text need not be all that limiting.
You may have misunderstood something I said if you got that impression
from me, that text is all that limiting.
[ SNIP ]
ok.
it came off that you were implying that text only really worked well for
simple protocols, like SMTP, POP, HTTP, ...
I haven't been able to completely avoid using the DOM, but I loathe the
API. If I'm using XML at all, and JAXB suits, I'll use JAXB. More
generally I'll use SAX or StAX.
I have rarely done things for which SAX has made sense...
usually in cases where SAX would make sense, I end up using
line-oriented text formats instead (because there is often little
obvious reason for why XML syntax would make much sense).
I almost never encounter a situation where DOM is called for, simply
because no random access to the document is called for. When I send XML
back and forth as a payload, the entire thing is meant to be used, and
it makes sense to do the immediate and complete conversion into real
information rather than storing it into an opaque and kludgy DOM
representation.
often, I use a DOM for things like compiler ASTs, where it competes
somewhat with S-Expressions (the trees are produced by the main parser,
worked on, and then later converted into bytecode or similar).
typically, one works by walking the tree, and potentially
rebuilding/rewriting a new tree in the process, or maybe adding
annotations to the existing tree.
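A toy sketch of that kind of tree walk, using nested lists as the tree
and rebuilding it with constant additions folded. The "add" node name
is just an assumption for the example.

```python
def fold(node):
    """Walk the tree bottom-up and return a rewritten copy."""
    if not isinstance(node, list) or not node:
        return node                      # leaf (number, symbol) or empty list
    head, *args = node
    args = [fold(arg) for arg in args]   # rewrite children first
    if head == "add" and args and all(isinstance(a, int) for a in args):
        return sum(args)                 # replace the node with its value
    return [head, *args]
```

For instance, fold(["mul", ["add", 1, 2], "x"]) gives ["mul", 3, "x"],
and the input tree is left unmodified.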
a recent case where I did consider using XML as a message-passing
protocol, I ended up opting for S-Expressions (or, more properly,
Lisp-style lists) instead, mostly because they are a lot easier to build
and process, and much less painful than working with a DOM-style API
(and also because S-Expressions tend to perform better and use less
memory in my case as well...).
typically, the messages are tree-structured data of some sort (in the
recent example, it was being used for scene-graph delta messages, which
basically update the status of various objects in the scene, as well as
passing other events for things "going on", like sound-effects being
heard, updates to camera location and status, ...).
it is also desirable to keep the serialized representation small, since
a lot may be going on (in real time), and it would be annoying (say, to
players) if the connection got needlessly bogged down sending lots of
overly verbose update messages (more so if one has stuff like
network-synchronized rag-dolls or similar, where a ragdoll may send
position updates for nearly every bone for every frame).
say:
(bonedelta 499 (bone 0 (org ...) (rot ...)) (bone 1 (org ...) (rot ...))
....)
(bonedelta 515 ...)
....
hence, it may make a little sense to employ a compressed binary format.
I also personally dislike schemas or similar concepts, as they tend to
make things brittle (both the transmitter and receiver need a correct
and up-to-date schema, creating a higher risk of version issues), and
typically don't really compress all that much better (and are
potentially worse) than what a decent adaptive coding can do.
("on the wire", S-Exps and XML are not all that drastically different,
the main practical differences are more in terms of how one may work
with them in-program).
granted, yes, text+deflate also works OK if one is feeling lazy (since
IME Deflate will typically reduce textual XML or S-Exps to around
10%-25% of their original size, vs say the 5%-10% one might get with a
specialized binary format).
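The lazy text+deflate route can be sketched with Python's zlib module
(which wraps the deflate stream in a small zlib header and checksum).
The bone values below are placeholders made up for the example, and the
actual ratio will depend heavily on the data.

```python
import zlib

def pack_text(message):
    # level 9 = best compression; deflate does the real work here
    return zlib.compress(message.encode("utf-8"), 9)

def unpack_text(blob):
    return zlib.decompress(blob).decode("utf-8")

# Placeholder message: 64 bones with made-up origin/rotation values.
bones = " ".join(f"(bone {i} (org 0 0 {i}) (rot 0 0 0 1))" for i in range(64))
message = f"(bonedelta 499 {bones})"
packed = pack_text(message)
```

Comparing len(packed) with len(message) on real traffic is an easy way
to decide whether a specialized binary format is worth the effort.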
there is also the tradeoff of designing a binary format to be standalone
(say, including its own Huffman compressor), or to be used in
combination with deflate (at which point one tries to design the format
to instead produce data which deflate can utilize efficiently).
in the latter option, there is the secondary concern of external deflate
(assuming that the data will probably be sent in a compressed channel or
stored in a ZIP file or similar), or using deflate internally (like in
PNG or similar).
there are many tradeoffs...
For a lot of situations, not just message passing between endpoints, I
have backed away from XML anyway. For configuration files I have gotten
newly enthused by .properties files, because so often they fit the bill
much better than XML configuration files. And I mentioned JSON
previously, I prefer that to XML in many situations now.
I typically use line-oriented text formats for most of these purposes...
never really did understand why someone would use XML for things like
configuration files (it neither makes them easier to process, nor does
it help anything with users trying to edit them).
as-is, my configuration format consists of "console commands", which may
in turn set "cvars" or issue key-binding commands, ...
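A rough sketch of a "console command" style config reader. The "set"
and "bind" command names and the "//" comment syntax are assumptions
for the example, not a description of the actual format.

```python
import shlex

def run_config(text):
    """Execute config lines; return (cvars, key bindings)."""
    cvars, binds = {}, {}
    for line in text.splitlines():
        # naive comment stripping: also cuts a "//" inside a quoted string
        args = shlex.split(line.split("//", 1)[0])
        if not args:
            continue                      # blank or comment-only line
        if args[0] == "set" and len(args) == 3:
            cvars[args[1]] = args[2]
        elif args[0] == "bind" and len(args) == 3:
            binds[args[1]] = args[2]
        else:
            raise ValueError(f"bad config line: {line!r}")
    return cvars, binds
```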
for another (more serious) system, I am using a format which is
partially a hybrid of INI and REG files (it is for a registry-like
hierarchical database). I have on/off considered switching to a binary
database format, but never got around to it.
some amount of other data is stored in formats similar to the Quake map
format, or other special-purpose text formats.
[ SNIP ]
yeah, but this applies to programming in general, so message-passing is
likely nothing special here.
That's true, but it's maybe a bit more of an art form with messages.
Your message producer may be Java and produce beautiful exceptions in
your carefully designed exception hierarchy, but your clients may very
well not be Java at all, in which case you may end up with an error
message sub-protocol that borrows ideas from HTTP status codes.
A lot of Java programmers these days maybe have never really dealt with
return codes, because we sort of tell them not to use them in Java, but
in the case of implementation-neutral status codes (including ones for
errors) that's really the design mindset that you need to be in: status
codes.
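A sketch of what that might look like at the server edge: internal
exceptions are caught and turned into a language-neutral status line.
The numeric codes and reply texts here are invented, loosely modeled on
HTTP.

```python
# Hypothetical status codes, loosely modeled on HTTP; not any standard.
OK, BAD_REQUEST, NOT_FOUND, SERVER_ERROR = 200, 400, 404, 500

def status_line(code, text):
    return f"{code} {text}\n"

def dispatch(handlers, command, argument):
    """Run a handler and map any exception to a status code reply."""
    try:
        handler = handlers[command]
    except KeyError:
        return status_line(NOT_FOUND, "unknown command")
    try:
        return status_line(OK, handler(argument))
    except ValueError:
        return status_line(BAD_REQUEST, "bad argument")
    except Exception:
        # never leak an implementation-specific exception to the client
        return status_line(SERVER_ERROR, "internal error")
```

The client only ever needs to parse the leading number, whatever
language it happens to be written in.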
granted, I am actually primarily a C and C++ programmer, but
message-passing isn't particularly language-specific. granted, yes, the
lack of "standard" exceptions is an annoyance in C, where typically one
either needs to not use exceptions, or end up using non-portable
exception mechanisms, and there is no particularly good way to "build
one's own", although some people have done some fairly "creative"
things with macros...
one issue maybe special to sockets though
is the matter of whether or not the whole message has been received,
often resulting in some annoying code to basically read messages from
the socket and not decode them until the entire message has been received.
There is that. Although I find that once you've worked through one or
two socket implementations that you tend to devise some pretty re-usable
code for handling the incomplete message situations.
[ SNIP ]
yep.
one can always tag messages and then give them a length.
{ tag, length, data[length] }
message is then not processed until entire data region is received.
typically, this is plenty sufficient.
likewise, a PPP/HDLC style system (message start/end codes) could also
be used.
depending on other factors, one can also do things like in JPEG or MPEG,
and use a special escape-code for messages and control-codes.
this can allow a top-level message format like:
{ escape-code, tag [ length, data[length] ... ] }
typically, in the cases I have seen, there is a way to escape the
escape-code, for when it appears by chance in the data. this in turn
adds the annoyance of having to escape every escape-code in the
payload data.
some others have partly worked around the above by making the escape
code fairly long (32 or 48 bits or more) and very unlikely to appear by
chance, and likely involving "sanity checks" to try to rule out false
positives.
say: { escape-magic, tag, length, data[length], checksum }
with the assumption that chance is very unlikely to lead to all of:
an escape magic, a valid tag value, a sane length, and a valid checksum.
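A sketch of the long-magic approach with sanity checks. The magic
value, tag set, length limit and the use of CRC-32 as the checksum are
all assumptions for the example.

```python
import struct
import zlib

MAGIC = b"\x7fMSG"              # hypothetical 32-bit escape magic
VALID_TAGS = {1, 2, 3}          # hypothetical tag values
MAX_LEN = 65536                 # hypothetical "sane" length limit
HEAD = struct.Struct(">4sBI")   # magic, tag, length

def build(tag, data):
    head = HEAD.pack(MAGIC, tag, len(data))
    return head + data + struct.pack(">I", zlib.crc32(head + data))

def scan(buf):
    """Find messages in buf (bytes), skipping false-positive magics."""
    messages, pos = [], 0
    while True:
        start = buf.find(MAGIC, pos)
        if start < 0 or len(buf) - start < HEAD.size:
            break
        _, tag, length = HEAD.unpack_from(buf, start)
        if tag not in VALID_TAGS or length > MAX_LEN:
            pos = start + 1              # not a real header; keep searching
            continue
        end = start + HEAD.size + length
        if len(buf) < end + 4:
            break                        # message not fully received yet
        (crc,) = struct.unpack_from(">I", buf, end)
        if crc != zlib.crc32(buf[start:end]):
            pos = start + 1              # checksum mismatch: false positive
            continue
        messages.append((tag, buf[start + HEAD.size:end]))
        pos = end + 4
    return messages
```

One weakness of this sketch: a false positive with a valid tag and a
plausible length makes the scanner wait for data that never arrives, so
a real implementation would probably want a timeout or resync rule.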
depending, the escape-magic and tag can be the same value.
for example:
the byte 0x7E is magic;
7E,00 escapes 7E (or maybe 7E,7E)
7E,01 Start Of Message (followed by message data)
7E,02 End Of Message (maybe, followed by checksum)
others: reserved for link-control messages.
then one can pass encoded messages over the link.
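A sketch of the 0x7E scheme listed above, using 7E,00 as the escaped
form of 7E. It leaves out the optional checksum after End Of Message,
and simply ignores the reserved link-control codes.

```python
ESC = 0x7E
SOM = bytes([ESC, 0x01])        # Start Of Message
EOM = bytes([ESC, 0x02])        # End Of Message

def encode(payload):
    # escape every 7E in the payload as 7E,00
    body = payload.replace(bytes([ESC]), bytes([ESC, 0x00]))
    return SOM + body + EOM

def decode_stream(data):
    """Return the list of complete messages found in data."""
    messages, current, i = [], None, 0
    while i < len(data):
        byte = data[i]
        if byte != ESC:
            if current is not None:      # bytes outside a message are dropped
                current.append(byte)
            i += 1
            continue
        if i + 1 >= len(data):
            break                        # escape pair split across reads
        code = data[i + 1]
        if code == 0x00 and current is not None:
            current.append(ESC)          # escaped literal 7E
        elif code == 0x01:
            current = bytearray()        # start a new message
        elif code == 0x02 and current is not None:
            messages.append(bytes(current))
            current = None
        # other codes: reserved for link control, ignored in this sketch
        i += 2
    return messages
```

Since a bare 7E never appears inside an encoded payload, a receiver
that joins mid-stream can resynchronize at the next 7E,01.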
typically, I have not tried parsing incomplete messages, as trying to
make a message decoder deal gracefully with truncated data is a bit more
of a hassle.
depending on other factors (say, if one is using Huffman), then one can
also use special markers to transmit the Huffman tables and other things.
say:
7E,03: Stream Reset (possibly followed by a stream/protocol ID magic)
7E,04-07: Huffman Tables 0-3
7E,08: End Of Huffman Table
....