Which data serialization format?

Tobias Nissen

Hi,

I am unsure which data serialization format to use. I need a format that
also supports serialization of binary data.

A complete RPC-implementation, as in the case of Thrift, would be a
nice add-on. On the other hand I could do it myself and would have some
more control over what's happening under the hood.

Since it's pretty standard nowadays, I'd really like to use JSON, but
sending binary data over it could become quite awkward.

What I am looking for is a stable, simple, proven and tested system
whose Perl(!) implementation is in widespread use.

That's basically my question for this post. If you'd like to read on,
here are my findings so far...


I have made some tests with the official Perl implementation of Thrift⁰
and have not yet discovered any major issues that would prevent me from
using it. However, it lacks proper documentation and I am not entirely
convinced that the user mailing list would be especially helpful once
I dig deeper into it.

Protocol Buffers OTOH has robust implementations in some languages, but
apparently Perl's not one of them. There's Google::ProtocolBuffers¹
which hasn't had a release since 2008 and is stuck at version 0.08. Then
there's protobuf-perl² -- which is dead. Last but not least there's
protobuf-perlxs³ which received two bug reports this year and had its
last commit in August last year.

BSON could be an alternative, but its Perl implementation is, well,
minimal⁴.

BTW, I want a format with a large Perl user base in particular.

Best regards and TIA,
Tobias

_________
⁰ http://thrift.apache.org/
¹ http://search.cpan.org/~gariev/Google-ProtocolBuffers-0.08/lib/Google/ProtocolBuffers.pm
² http://code.google.com/p/protobuf-perl/
³ http://code.google.com/p/protobuf-perlxs/
⁴ http://search.cpan.org/~minimal/BSON-0.03/lib/BSON.pm
 
Bjoern Hoehrmann

* Tobias Nissen wrote in comp.lang.perl.misc:
> I am unsure which data serialization format to use. I need a format that
> also supports serialization of binary data.
>
> A complete RPC-implementation, as in the case of Thrift, would be a
> nice add-on. On the other hand I could do it myself and would have some
> more control over what's happening under the hood.
>
> Since it's pretty standard nowadays, I'd really like to use JSON, but
> sending binary data over it could become quite awkward.
>
> What I am looking for is a stable, simple, proven and tested system
> whose Perl(!) implementation is in widespread use.

That would seem to be Storable.pm, which is a Core module since v5.7.3;
it supports pretty much everything you can reasonably represent in Perl
including, say, circular references, which formats like JSON are unable
to support without adding awkward indirection. There are a couple of RPC
related modules on CPAN that use Storable for marshalling. Main problem
would be that there may be version incompatibilities.
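
A minimal sketch of the Storable round trip described above, assuming nothing beyond the core Storable module. The structure and its field names are made up for illustration; nfreeze() is used instead of freeze() because it writes network byte order.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# A structure that refers back to itself; plain JSON cannot express this.
# The field names are hypothetical.
my $node = { name => 'root', payload => "\x00\xff\x10binary" };
$node->{self} = $node;

# nfreeze() serializes in network byte order, which sidesteps endianness
# differences between hosts. It does not remove version incompatibilities.
my $frozen = nfreeze($node);
my $copy   = thaw($frozen);

# Both the cycle and the raw bytes survive the round trip.
print "cycle preserved\n"   if $copy->{self} == $copy;
print "payload preserved\n" if $copy->{payload} eq $node->{payload};
```

The frozen string is only meaningful to another Perl process with a compatible Storable, which is the limitation raised in the paragraph above.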
 
Tobias Nissen

Bjoern said:
> That would seem to be Storable.pm, which is a Core module since
> v5.7.3; it supports pretty much everything you can reasonably
> represent in Perl including, say, circular references, which formats
> like JSON are unable to support without adding awkward indirection.

Sorry, I forgot to mention it, but I would like the format to be
independent of the programming language. Thrift, for example, seems to
officially support 14 different languages, which was one of the reasons
why I picked it for my experiments.

I also dislike the absence of basic types (like String, Int, Bool, ...)
when using Storable, and the fact that I'd have to write a type
checker myself. Also there are no schemas and hence no automatic code
generation. It's all too dynamic for my use case.

(No, XML (in whatever form) is not an option :) )

> There are a couple of RPC related modules on CPAN that use Storable
> for marshalling. Main problem would be that there may be version
> incompatibilities.

That, too, is something that both Thrift and Protocol Buffers try to
address.
 
Bjoern Hoehrmann

* Tobias Nissen wrote in comp.lang.perl.misc:
> Sorry, I forgot to mention it, but I like the idea of the format to be
> programming language independent. Thrift e.g. seems to officially
> support 14 different languages, which was one of the reasons why I
> picked it for my experiments.
>
> I also dislike the absence of basic types (like String, Int, Bool, ...)
> when using Storable. Or the fact that I'd have to write a type
> checker myself. Also there are no schemas and hence no automatic code
> generation. It's all too dynamic for my use case.

The types are there just as they exist in Perl, and you can use any tool
that is compatible with the type system for things like validation. It's
just bits on the disk, but with all serialization formats you get things
like "dictionary with key string example and value number 1" in memory;
JSON::Schema for instance does not require you to pass in actual JSON.

The popular formats with good tool support in Perl are YAML and JSON and
for both you'd have to use an encoding like Base64 to reliably serialize
binary data (YAML though allows you to tag values as Base64-encoded, so
in theory the support for binary data is better there, but tool support
for that is a bit lacking).

Note that Perl itself does not distinguish between text and binary; you
will have to include logic for that in the code regardless of the format
you pick, if you actually need to tell those cases apart. (If you do not
care, note that U+0000 through U+00FF are perfectly valid characters and
can be represented easily in either format; in that sense both support
binary data quite well.)
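
The Base64 route for JSON can be sketched roughly as follows. This assumes only the core modules JSON::PP and MIME::Base64; the field names `filename` and `data` are hypothetical, and the receiver has to know by convention which field carries Base64, since nothing in the JSON itself says so.

```perl
use strict;
use warnings;
use JSON::PP;
use MIME::Base64 qw(encode_base64 decode_base64);

# Every possible octet value, as a stand-in for arbitrary binary data.
my $bytes = join '', map { chr } 0 .. 255;

my $json = JSON::PP->new->canonical;

# Sender: wrap the octets in Base64 so the JSON text stays plain ASCII.
my $message = $json->encode({
    filename => 'blob.bin',                  # hypothetical field names
    data     => encode_base64($bytes, ''),   # '' suppresses line breaks
});

# Receiver: decode the JSON, then undo the Base64 step on the agreed field.
my $decoded  = $json->decode($message);
my $restored = decode_base64($decoded->{data});

print length($restored), " bytes restored\n";
print "round trip ok\n" if $restored eq $bytes;
```

The cost is the usual Base64 overhead of roughly one third more bytes on the wire, plus the out-of-band agreement on which fields are encoded.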
 
Ted Zlatanov

TN> I have made some tests with the official Perl-Implementation of Thrift⁰
TN> and have not yet discovered any major issues that would prevent me from
TN> using it. However, it lacks proper documentation and I am not entirely
TN> convinced that the user-mailing list would be especially helpful, once
TN> I dig deeper into it.

I suffered quite a bit using Thrift from Perl for the
Net::Cassandra::Easy module. It was frustrating. I think Chip
Salzenberg had a similar experience and ended up routing around the
Thrift Perl modules, using the C or C++ Thrift interfaces, last time I
talked to him. Thrift in Perl is slow, slow, slow.

The Thrift developers asked me to rewrite their Perl support to make it
more efficient. I don't have the cycles to do it, but if anyone wants
to do it... feel free.

There's also Avro, which at one time was a contender to replace Thrift
for Cassandra. It didn't, but it's a pretty nice protocol, similar to
Thrift and probably better for Perl support.

Today, I would either use JSON-encoded binary data or I would find a
way to use Google Protocol Buffers through a C/C++ embedded library,
depending on the complexity of the work.

Ted
 
Peter Makholm

Bjoern Hoehrmann said:
> That would seem to be Storable.pm, which is a Core module since v5.7.3;
> it supports pretty much everything you can reasonably represent in Perl
> including, say, circular references, which formats like JSON are unable
> to support without adding awkward indirection.

On the other hand, supporting pretty much everything has a cost even
if you don't use cyclic structures. For one benchmark[0] I measured
JSON::XS to be about 4 times faster going from Perl to the serialized
format and 25% faster the other way too.

The JSON::XS output was even smaller than the Storable output...

0) https://github.com/pmakholm/benchmark-serialize-perl/blob/master/README


But in the end it all depends on your needs. If you need support for
non-treeish data JSON is a no-go. If you need direct support for blessed
references JSON needs to be wrapped (I have never benchmarked this). If
you need an open door for non-Perl languages Storable is a no-go.

If you think it is worth discussing (that is, in your case it will not be
a microoptimization), then come up with some example structures and feed
them to Benchmark::Serialize. This gives you numbers - and then we can
discuss the relevance of these numbers afterwards.
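
As a rough stand-in for that exercise, here is a sketch that uses only core modules: Benchmark in place of Benchmark::Serialize, and the pure-Perl JSON::PP in place of JSON::XS. The example structure is hypothetical, and because JSON::PP is far slower than JSON::XS, the timings it prints say nothing about the JSON::XS figures quoted above; it only shows the shape of the comparison.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use Storable qw(nfreeze thaw);
use JSON::PP;

# Hypothetical example structure; replace it with data shaped like yours.
my $data = {
    id    => 42,
    tags  => [ map { "tag$_" } 1 .. 20 ],
    attrs => { map { ("key$_" => $_) } 1 .. 20 },
};

my $json = JSON::PP->new->canonical;

# Compare the size of the serialized output first.
my %size = (
    storable => length(nfreeze($data)),
    json     => length($json->encode($data)),
);
printf "%-8s %d bytes\n", $_, $size{$_} for sort keys %size;

# Then time a full round trip, running each for at least one CPU second.
my $results = timethese(-1, {
    storable => sub { thaw(nfreeze($data)) },
    json     => sub { $json->decode($json->encode($data)) },
});
```

Swapping in JSON::XS, if it is installed, should only require changing the constructor, since it offers the same encode/decode interface.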

> There are a couple of RPC related modules on CPAN that use Storable
> for marshalling. Main problem would be that there may be version
> incompatibilities.

Yeah, I have some bad experiences. Version incompatibilities. 32 bit/64
bit incompatibilities. The "3 years later we want to interface with the
system from this Lua-scriptable C++ project" incompatibilities.

But hey, used right it is at least endian agnostic. Not that I remember
the last time I deployed my in-house developed project on anything not
from the x86 family of endianness...

Even if I needed the extra features I would not easily go for Storable.

//Makholm
 
Tobias Nissen

Ted Zlatanov wrote:
> [...]
> Today, I would either use JSON-encoded binary data or I would find a
> way to use Google Protocol Buffers through a C/C++ embedded library,
> depending on the complexity of the work.

Since the whole thing is going to do RPC, I want something that refuses
to process messages that do not conform to some kind of schema.
JSON::Schema¹ does not seem to be in a "production grade" state.

I'm not quite sure whether protoxs² validates messages/requests/calls
(whatever you want to call it) upon receipt. But at least the generated
code forces programmers on the client side and on the server side to a
common set of fields.

I want to use Moose throughout the code. Is there some RPC framework
that is more suited for use with Moose (and protobuf) than others?
There are so many to pick from!

¹ http://search.cpan.org/dist/JSON-Schema/
² http://code.google.com/p/protobuf-perlxs/
 
Tobias Nissen

Tobias Nissen wrote:
> [...]
> I'm not quite sure whether protoxs² validates messages/requests/calls
> (whatever you want to call it) upon receipt.

Ah, please ignore that. Protobuf doesn't have an RPC implementation; I
confused it with Thrift.
 
Ted Zlatanov

TN> Ted Zlatanov wrote:
TN> [...]
TN> Since the whole thing is going to do RPC I want something that does not
TN> process messages not conforming to some kind of schema. JSON::Schema¹
TN> does not seem to be in a "production grade" state.

I don't know what a schema will buy you, because I don't know your
specific needs. If you *really* need a schema, maybe you need a
database that will enforce it for you. But anyhow, my point was that
the Thrift Perl bindings are probably too slow.

TN> I want to use Moose throughout the code. Is there some RPC framework
TN> that is more suited for use with Moose (and protobuf) than others?

Do not jump into Moose if you need speed, unless it's for managing just
the connections. Moose may be too slow to manage your data structures.
At least benchmark it before you commit to using it.

Ted
 
