Which data serialization format?

Discussion in 'Perl Misc' started by Tobias Nissen, Aug 8, 2011.

  1. Hi,

    I am unsure which data serialization format to use. I need a format that
    also supports serialization of binary data.

    A complete RPC-implementation, as in the case of Thrift, would be a
    nice add-on. On the other hand I could do it myself and would have some
    more control over what's happening under the hood.

    Since it's pretty standard nowadays, I'd really like to use JSON, but
    sending binary data over it could become quite awkward.

    What I am looking for is a stable, simple, proven and tested system,
    whose Perl(!) implementation is in widespread use.

    That's basically my question for this post. If you'd like to read on,
    here are my findings so far...


    I have made some tests with the official Perl implementation of Thrift⁰
    and have not yet discovered any major issues that would prevent me from
    using it. However, it lacks proper documentation and I am not entirely
    convinced that the user-mailing list would be especially helpful, once
    I dig deeper into it.

    Protocol Buffers OTOH has robust implementations in some languages, but
    apparently Perl's not one of them. There's Google::ProtocolBuffers¹
    which hasn't had a release since 2008 and is stuck at version 0.08. Then
    there's protobuf-perl² -- which is dead. Last but not least there's
    protobuf-perlxs³ which received two bug reports this year and had its
    last commit in August last year.

    BSON could be an alternative, but its Perl implementation is, well,
    minimal⁴.

    BTW, I want a format with a large Perl user base in particular.

    Best regards and TIA,
    Tobias

    _________
    ⁰ http://thrift.apache.org/
    ¹ http://search.cpan.org/~gariev/Google-ProtocolBuffers-0.08/lib/Google/ProtocolBuffers.pm
    ² http://code.google.com/p/protobuf-perl/
    ³ http://code.google.com/p/protobuf-perlxs/
    ⁴ http://search.cpan.org/~minimal/BSON-0.03/lib/BSON.pm
     
    Tobias Nissen, Aug 8, 2011
    #1

  2. * Tobias Nissen wrote in comp.lang.perl.misc:
    >I am unsure which data serialization format to use. I need a format that
    >also supports serialization of binary data.
    >
    >A complete RPC-implementation, as in the case of Thrift, would be a
    >nice add-on. On the other hand I could do it myself and would have some
    >more control over what's happening under the hood.
    >
    >Since it's pretty standard nowadays, I'd really like to use JSON, but
    >sending binary data over it could become quite awkward.
    >
    >What I am looking for is a stable, simple, proven and tested system,
    >whose Perl(!) implementation is in widespread use.


    That would seem to be Storable.pm, which is a Core module since v5.7.3;
    it supports pretty much everything you can reasonably represent in Perl
    including, say, circular references, which formats like JSON are unable
    to support without adding awkward indirection. There are a couple of RPC
    related modules on CPAN that use Storable for marshalling. Main problem
    would be that there may be version incompatibilities.
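
To illustrate the circular-reference point, here is a minimal sketch using only core Storable. The two-node structure and its field names are made up for the example; `nfreeze` is chosen because it writes network byte order, which helps across machines but does not remove the version caveat mentioned above.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical structure: two nodes that point at each other.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;                 # closes the cycle

# nfreeze serializes in network byte order (the portable variant of freeze).
my $frozen = nfreeze($parent);
my $copy   = thaw($frozen);

# The cycle survives: following child -> parent leads back to the same hash.
print "cycle preserved\n" if $copy->{child}{parent} == $copy;
```

A plain JSON encoder would refuse or recurse on this structure unless the cycle is first replaced by some kind of ID-based indirection.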
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
    25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
     
    Bjoern Hoehrmann, Aug 8, 2011
    #2

  3. Bjoern Hoehrmann wrote:
    > * Tobias Nissen wrote in comp.lang.perl.misc:
    >> I am unsure which data serialization format to use. I need a format
    >> that also supports serialization of binary data.
    >>
    >> A complete RPC-implementation, as in the case of Thrift, would be a
    >> nice add-on. On the other hand I could do it myself and would have
    >> some more control over what's happening under the hood.
    >>
    >> Since it's pretty standard nowadays, I'd really like to use JSON, but
    >> sending binary data over it could become quite awkward.
    >>
    >> What I am looking for is a stable, simple, proven and tested system,
    >> whose Perl(!) implementation is in widespread use.

    >
    > That would seem to be Storable.pm, which is a Core module since
    > v5.7.3; it supports pretty much everything you can reasonably
    > represent in Perl including, say, circular references, which formats
    > like JSON are unable to support without adding awkward indirection.


    Sorry, I forgot to mention it, but I like the idea of the format being
    programming-language independent. Thrift e.g. seems to officially
    support 14 different languages, which was one of the reasons why I
    picked it for my experiments.

    I also dislike the absence of basic types (like String, Int, Bool, ...)
    when using Storable. Or the fact that I'd have to write a type
    checker myself. Also there are no schemas and hence no automatic code
    generation. It's all too dynamic for my use case.

    (No, XML (in whatever form) is not an option :) )

    > There are a couple of RPC related modules on CPAN that use Storable
    > for marshalling. Main problem would be that there may be version
    > incompatibilities.


    That, too, is something that both Thrift and Protocol Buffers try to
    address.
     
    Tobias Nissen, Aug 8, 2011
    #3
  4. * Tobias Nissen wrote in comp.lang.perl.misc:
    >Sorry, I forgot to mention it, but I like the idea of the format being
    >programming-language independent. Thrift e.g. seems to officially
    >support 14 different languages, which was one of the reasons why I
    >picked it for my experiments.
    >
    >I also dislike the absence of basic types (like String, Int, Bool, ...)
    >when using Storable. Or the fact that I'd have to write a type
    >checker myself. Also there are no schemas and hence no automatic code
    >generation. It's all too dynamic for my use case.


    The types are there just as they exist in Perl, and you can use any tool
    that is compatible with the type system for things like validation. It's
    just bits on the disk, but with all serialization formats you get things
    like "dictionary with key string example and value number 1" in memory;
    JSON::Schema for instance does not require you to pass in actual JSON.

    The popular formats with good tool support in Perl are YAML and JSON and
    for both you'd have to use an encoding like Base64 to reliably serialize
    binary data (YAML though allows you to tag values as Base64-encoded, so
    in theory the support for binary data is better there, but tool support
    for that is a bit lacking).

    Note that Perl itself does not distinguish between text and binary, you
    will have to include logic for that in the code regardless of the format
    you pick, if you actually need to tell those cases apart (if you do not
    care, note that U+0000 through U+00FF are perfectly valid characters and
    can be represented easily in either format; in that sense both support
    binary data quite well).
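
The Base64 route described above could look roughly like this. It is a sketch using only core modules (JSON::PP and MIME::Base64); the field names `name` and `data` and the sample bytes are assumptions for illustration, not part of any established convention.

```perl
use strict;
use warnings;
use JSON::PP;                                        # core since Perl 5.14
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical payload: arbitrary bytes, including NUL and high-bit bytes.
my $blob = pack('C*', 0x00, 0xFF, 0x10, 0x80, 0x7F);

# Encode: wrap the bytes as Base64 text. The empty second argument
# suppresses the line breaks encode_base64 would otherwise insert.
my $json = JSON::PP->new->canonical->encode({
    name => 'example.bin',
    data => encode_base64($blob, ''),
});

# Decode: parse the JSON, then undo the Base64 step.
my $decoded = JSON::PP->new->decode($json);
my $bytes   = decode_base64($decoded->{data});

print "round trip ok\n" if $bytes eq $blob;
```

The receiving side has to know out of band that `data` is Base64, which is the extra logic the paragraph above refers to.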
    --
    Björn Höhrmann · mailto: · http://bjoern.hoehrmann.de
    Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
    25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
     
    Bjoern Hoehrmann, Aug 8, 2011
    #4
  5. On Mon, 8 Aug 2011 17:25:05 +0200 Tobias Nissen <> wrote:

    TN> I have made some tests with the official Perl implementation of Thrift⁰
    TN> and have not yet discovered any major issues that would prevent me from
    TN> using it. However, it lacks proper documentation and I am not entirely
    TN> convinced that the user-mailing list would be especially helpful, once
    TN> I dig deeper into it.

    I suffered quite a bit using Thrift from Perl for the
    Net::Cassandra::Easy module. It was frustrating. I think Chip
    Salzenberg had a similar experience and ended up routing around the
    Thrift Perl modules, using the C or C++ Thrift interfaces, last time I
    talked to him. Thrift in Perl is slow, slow, slow.

    The Thrift developers asked me to rewrite their Perl support to make it
    more efficient. I don't have the cycles to do it, but if anyone wants
    to do it... feel free.

    There's also Avro, which at one time was a contender to replace Thrift
    for Cassandra. It didn't, but it's a pretty nice protocol, similar to
    Thrift and probably better for Perl support.

    Today, I would either use JSON-encoded binary data or I would find a
    way to use Google Protocol Buffers through a C/C++ embedded library,
    depending on the complexity of the work.

    Ted
     
    Ted Zlatanov, Aug 8, 2011
    #5
  6. Bjoern Hoehrmann <> writes:

    > That would seem to be Storable.pm, which is a Core module since v5.7.3;
    > it supports pretty much everything you can reasonably represent in Perl
    > including, say, circular references, which formats like JSON are unable
    > to support without adding awkward indirection.


    On the other hand, supporting pretty much everything has a cost even
    if you don't use cyclic structures. For one benchmark[0] I measured
    JSON::XS to be about 4 times faster going from Perl to the serialized
    format and 25% faster the other way too.

    The JSON::XS output was even smaller than the Storable output...

    0) https://github.com/pmakholm/benchmark-serialize-perl/blob/master/README


    But in the end it all depends on your needs. If you need support for
    non-treeish data JSON is a no-go. If you need direct support for blessed
    references JSON needs to be wrapped (I have never benchmarked this). If
    you need an open door for non-Perl languages Storable is a no-go.

    If you think it is worth discussing (that is, in your case it will not be
    a microoptimization), then come up with some example structures and feed
    them to Benchmark::Serialize. This gives you numbers - and then we can
    discuss the relevance of these numbers afterwards.
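
As a rough idea of what such a measurement looks like, here is a sketch that uses the core Benchmark module rather than Benchmark::Serialize, so it runs without anything from CPAN. The sample structure is invented, and JSON::PP (pure Perl) stands in for JSON::XS, so the numbers it prints will not match the figures quoted above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Storable qw(nfreeze thaw);
use JSON::PP;

# Hypothetical sample structure; replace it with data shaped like your own.
my $data = {
    id     => 42,
    tags   => [ map { "tag$_" } 1 .. 50 ],
    nested => { a => 1, b => [ 1, 2, 3 ] },
};
my $json = JSON::PP->new;

# Compare serialized sizes first, since size matters on the wire too.
printf "sizes: storable=%d json=%d bytes\n",
    length(nfreeze($data)), length($json->encode($data));

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, {
    storable => sub { thaw(nfreeze($data)) },
    json_pp  => sub { $json->decode($json->encode($data)) },
});
```

Swapping in JSON::XS, or any other encoder you are considering, only means adding another entry to the hash passed to `cmpthese`.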


    > There are a couple of RPC related modules on CPAN that use Storable
    > for marshalling. Main problem would be that there may be version
    > incompatibilities.


    Yeah, I have some bad experiences. Version incompatibilities. 32 bit/64
    bit incompatibilities. The "3 years later we want to interface with the
    system from this Lua-scriptable C++ project" incompatibilities.

    But hey, used right it is at least endian agnostic. Not that I remember
    the last time I deployed my in-house developed project on anything not
    from the x86 family of endianness...

    Even if I needed the extra features I would not easily go for Storable.

    //Makholm
     
    Peter Makholm, Aug 8, 2011
    #6
  7. Ted Zlatanov wrote:
    [...]
    > Today, I would either use JSON-encoded binary data or I would find a
    > way to use Google Protocol Buffers through a C/C++ embedded library,
    > depending on the complexity of the work.


    Since the whole thing is going to do RPC I want something that does not
    process messages not conforming to some kind of schema. JSON::Schema¹
    does not seem to be in a "production grade" state.
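
For a sense of what rejecting non-conforming messages involves without a schema module, here is a hand-rolled sketch. The three fields (`method`, `id`, `params`) and the `validate` helper are hypothetical; this is not how JSON::Schema, Thrift or Protocol Buffers do it, just the kind of check that would otherwise have to be written by hand.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical schema: field name => predicate the value must satisfy.
my %schema = (
    method => sub { defined($_[0]) && !ref($_[0]) && length($_[0]) > 0 },
    id     => sub { defined($_[0]) && !ref($_[0]) && $_[0] =~ /\A[0-9]+\z/ },
    params => sub { ref($_[0]) eq 'ARRAY' },
);

# Returns a list of problems; an empty list means the message is acceptable.
sub validate {
    my ($msg) = @_;
    return ('message is not an object') unless ref($msg) eq 'HASH';
    my @errors;
    for my $field (sort keys %schema) {
        if (!exists $msg->{$field}) {
            push @errors, "missing field '$field'";
        }
        elsif (!$schema{$field}->($msg->{$field})) {
            push @errors, "bad value for '$field'";
        }
    }
    # Unknown fields are rejected too, so both sides stay on a common set.
    push @errors, map { "unknown field '$_'" }
                  grep { !exists $schema{$_} } sort keys %$msg;
    return @errors;
}

my @ok  = validate(decode_json('{"method":"ping","id":1,"params":[]}'));
my @bad = validate(decode_json('{"method":"ping","id":"x","extra":true}'));
print "first message: ", scalar(@ok), " problem(s)\n";
print "second message: ", join('; ', @bad), "\n";
```

Generated code from an IDL gives you the same guarantee without maintaining the predicate table yourself.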

    I'm not quite sure whether protoxs² validates messages/requests/calls
    (whatever you want to call it) upon receipt. But at least the generated
    code forces programmers on the client side and on the server side to a
    common set of fields.

    I want to use Moose throughout the code. Is there some RPC framework
    that is more suited for use with Moose (and protobuf) than others?
    There are so many to pick from!

    ¹ http://search.cpan.org/dist/JSON-Schema/
    ² http://code.google.com/p/protobuf-perlxs/
     
    Tobias Nissen, Aug 9, 2011
    #7
  8. Tobias Nissen wrote:
    [...]
    > I'm not quite sure whether protoxs² validates messages/requests/calls
    > (whatever you want to call it) upon receipt.


    Ah please ignore that, protobuf doesn't have an RPC-implementation, I
    confused it with Thrift.
     
    Tobias Nissen, Aug 9, 2011
    #8
  9. On Tue, 9 Aug 2011 09:24:21 +0200 Tobias Nissen <> wrote:

    TN> Ted Zlatanov wrote:
    TN> [...]
    >> Today, I would either use JSON-encoded binary data or I would find a
    >> way to use Google Protocol Buffers through a C/C++ embedded library,
    >> depending on the complexity of the work.


    TN> Since the whole thing is going to do RPC I want something that does not
    TN> process messages not conforming to some kind of schema. JSON::Schema¹
    TN> does not seem to be in a "production grade" state.

    I don't know what a schema will buy you, because I don't know your
    specific needs. If you *really* need a schema, maybe you need a
    database that will enforce it for you. But anyhow, my point was that
    the Thrift Perl bindings are probably too slow.

    TN> I want to use Moose throughout the code. Is there some RPC framework
    TN> that is more suited for use with Moose (and protobuf) than others?

    Do not jump into Moose if you need speed, unless it's for managing just
    the connections. Moose may be too slow to manage your data structures.
    At least benchmark it before you commit to using it.

    Ted
     
    Ted Zlatanov, Aug 9, 2011
    #9
  10. Ted Zlatanov wrote:
    >>> Today, I would either use JSON-encoded binary data or I would find a
    >>> way to use Google Protocol Buffers through a C/C++ embedded library,
    >>> depending on the complexity of the work.


    Speaking of JSON-encoded binary, MessagePack may be decent according to
    this blog entry:

    http://www.igvita.com/2011/08/01/protocol-buffers-avro-thrift-messagepack/

    He also mentions Avro, Thrift, and Protocol Buffers obviously, so these
    four seem to be a fairly solid set of choices nowadays.

    Ted
     
    Ted Zlatanov, Aug 9, 2011
    #10