A neat trick to serialize arrays and hashes

Discussion in 'Perl Misc' started by J. Romano, Jun 18, 2004.

  1. J. Romano

    J. Romano Guest

    Dear Perl community,

    Today I invented a neat new trick that I thought I'd share with
    everyone here.

    But before I continue, I'd like to point out to anyone out there
    who thinks that my trick is "obvious to everyone but inexperienced
    programmers" or that "it's not worth knowing because better approaches
    exist" that some people enjoy learning a new simple trick, even if
    they never get a chance to apply it. Besides, sharing a trick that
    was just discovered (even if most programmers already know about it)
    has the benefit of educating any programmer who, for some reason or
    another, happens to not be aware of that particular technique. So if
    you really must reply saying that you already knew this trick, instead
    of saying how it didn't help you at all, how about sharing something
    else that might be useful to someone in the Perl community? That
    would be much appreciated.

    Anyway, now that I'm off my soap box, here is what I discovered
    this morning:

    The pack string "(w/a*)*" is useful for serializing arrays and
    hashes -- that is, it can pack and unpack arrays and hashes to and
    from a string. Let me explain in more detail:

    I have an array, which holds the names of some animals:

    @a = ("dog", "cat", "bird", "camel", "giraffe");

    I might want to serialize @a into a string so that I can store it in
    a file and retrieve it later. Well, I could use the Data::Dumper
    module to create a string (and later eval that string to get back a
    reference whose contents I can then assign to the array), but that
    can get complicated if I don't have much experience using the
    Data::Dumper module.

    Well, using the pack string "(w/a*)*" I can easily serialize the
    array into a string like so:

    $string = pack("(w/a*)*", @a);

    Now $string contains all the encoded information needed to reconstruct
    the @a array. So if I want to use $string to create a @b array that
    is identical to the @a array, I can use unpack() with the same pack
    string:

    @b = unpack("(w/a*)*", $string);
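To make the format less mysterious, here is a minimal sketch of the same round trip that also peeks at the first few packed bytes. It assumes a Perl whose pack() accepts the group syntax "(...)"; the variable names simply follow the post.

```perl
use strict;
use warnings;

my @a = ("dog", "cat", "bird", "camel", "giraffe");

# Each element is stored as a BER-compressed length ("w") followed by
# that many bytes ("a*"); the outer "(...)*" repeats this per element.
my $string = pack("(w/a*)*", @a);

# For short strings the length fits in one byte, so the first record
# is the byte 0x03 followed by "dog".
printf "%vd\n", substr($string, 0, 4);   # prints 3.100.111.103

my @b = unpack("(w/a*)*", $string);
print "@b\n";                            # prints dog cat bird camel giraffe
```

Because every element carries its own length prefix, no separator character is needed and none has to be escaped.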

    Neat, doncha think? This same technique also works with hashes:

    $string = pack("(w/a*)*", %ENV);
    %wow = unpack("(w/a*)*", $string);
    # The %wow hash is now an exact copy of %ENV

    Now that we have a string representation of an array or hash, we
    can save the string to a file, send it over a socket, or even encrypt
    it using some encryption algorithm.

    This approach can even handle arrays (and hashes) whose scalars
    contain newlines, null-bytes, and other unprintable characters!
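As a quick check of that claim, the following sketch round-trips a hash whose values contain a newline, NUL bytes, control characters, and an empty string. The hash contents are made up for illustration.

```perl
use strict;
use warnings;

my %h = (
    multiline => "line one\nline two",
    binary    => "a\0b\0c",
    empty     => "",
    control   => "\x01\x02\x7f",
);

my $string = pack("(w/a*)*", %h);
my %copy   = unpack("(w/a*)*", $string);

# Compare the copy with the original, key by key
my $same = keys(%copy) == keys(%h);
for my $k (keys %h) {
    $same &&= exists($copy{$k}) && $copy{$k} eq $h{$k};
}
print $same ? "round trip ok\n" : "mismatch\n";
```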

    There are a few important items to point out:

    1. The serialized string will most likely contain
    non-printable characters, which may include some
    newline characters, even if no scalar in the
    original array/hash contains a "\n" character.
    Because of this, you should use the binmode()
    function on any filehandle you plan to print the
    string out to.

    2. If the array or hash contains any numbers, they
    will be converted to their string representation.

    3. This technique only handles simple arrays and hashes.
    In other words, multi-dimensional arrays and hashes,
    lists of lists, and references are not handled
    correctly. If you really want to serialize a
    complex structure such as one of these, I recommend
    using another approach, like taking advantage of
    the Data::Dumper module. You CAN, however, create
    an array of these serialized arrays, and serialize
    that array!

    4. The "w" in the pack string "(w/a*)*" allows for the
    encoding of any arbitrary-length string, even if it
    is longer than 0xffffffff bytes (4,294,967,295
    bytes). But since "w" is only used for encoding
    non-negative integers, the "(w/a*)*" pack string
    cannot be used to encode arrays or hashes
    containing negative-length strings. Fortunately,
    that's never been a problem for me. :)

    5. I do not know if this trick can handle arrays
    and hashes containing Unicode strings. My guess
    is that it can, but I haven't tested it so I can't
    say for sure.
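Items 1 and 2 above can be tried out with a short sketch that writes the packed string to a file and reads it back. File::Temp is used here only to get a scratch file; the sample data is invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @a = ("dog", "cat\n", 42, "x" x 300);

# tempfile() returns an open handle and the name of the file
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                        # item 1: the data is binary
print {$out} pack("(w/a*)*", @a);
close($out) or die "close failed: $!";

open(my $in, '<', $path) or die "cannot open $path: $!";
binmode($in);
my $string = do { local $/; <$in> };  # slurp the whole file
close($in);

my @b = unpack("(w/a*)*", $string);
print scalar(@b), " elements restored\n";   # prints 4 elements restored
print "third element: $b[2]\n";             # item 2: 42 comes back as "42"
```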

    Anyway, that's my trick that I thought I would share with the rest
    of you. Have fun with it!

    -- Jean-Luc Romano
     
    J. Romano, Jun 18, 2004
    #1

  2. Matt Garrish

    Matt Garrish Guest

    "J. Romano" <> wrote in message
    news:...
    >


    <snip explanation of packing>

    >
    > There are a few important items to point out:
    >
    > 1. The serialized string will most likely contain
    > non-printable characters, which may include some
    > newline characters, even if no scalar in the
    > original array/hash contains a "\n" character.
    > Because of this, you should use the binmode()
    > function on any filehandle you plan to print the
    > string out to.
    >
    > 2. If the array or hash contains any numbers, they
    > will be converted to their string representation.
    >
    > 3. This technique only handles simple arrays and hashes.
    > In other words, multi-dimensional arrays and hashes,
    > lists of lists, and references are not handled
    > correctly. If you really want to serialize a
    > complex structure such as one of these, I recommend
    > using another approach, like taking advantage of
    > the Data::Dumper module. You CAN however, create
    > an array of these serialized arrays, and serialize
    > that array!
    >
    > 4. The "w" in the pack string "(w/a*)*" allows for the
    > encoding of any arbitrary-length string, even if it
    > is longer than 0xffffffff bytes (4,294,967,295
    > bytes). But since "w" is only used for encoding
    > non-negative integers, the "(w/a*)*" pack string
    > cannot be used to encode arrays or hashes
    > containing negative-length strings. Fortunately,
    > that's never been a problem for me. :)
    >
    > 5. I do not know if this trick can handle arrays
    > and hashes containing Unicode strings. My guess
    > is that it can, but I haven't tested it so I can't
    > say for sure.
    >


    Sorry to rain on your parade, but with all the caveats don't you think it
    would be better just to use the Storable module, especially since it's part
    of the core distribution? Better techniques are worth noting for the simple
    reason that they're better...
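For comparison, a minimal sketch of the Storable route, using its freeze() and thaw() functions on a nested structure. The data is made up for illustration, and this assumes Storable is installed.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %data = (
    animals => [ "dog", "cat", "bird" ],
    counts  => { dog => 2, cat => 1 },
);

my $frozen = freeze(\%data);   # takes a reference, returns a byte string
my $copy   = thaw($frozen);    # returns a reference to a new structure

print join(",", @{ $copy->{animals} }), "\n";  # prints dog,cat,bird
print $copy->{counts}{dog}, "\n";              # prints 2
```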

    Matt
     
    Matt Garrish, Jun 18, 2004
    #2

  3. Ben Morrow

    Ben Morrow Guest

    Quoth (J. Romano):
    > 3. This technique only handles simple arrays and hashes.
    > In other words, multi-dimensional arrays and hashes,
    > lists of lists, and references are not handled
    > correctly. If you really want to serialize a
    > complex structure such as one of these, I recommend
    > using another approach, like taking advantage of
    > the Data::Dumper module. You CAN however, create
    > an array of these serialized arrays, and serialize
    > that array!


    ...however, you can't then unserialize it, as the references have been
    stringified and can't be converted back to refs. Yet another reason to
    use Storable, which does this right...
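A small sketch that contrasts the two cases at issue here: packing array references directly (they are stringified, as described above) versus packing strings that were themselves produced by pack. The array names are invented for the example.

```perl
use strict;
use warnings;

my @inner1 = ("dog", "cat");
my @inner2 = ("bird", "camel", "giraffe");

# Packing references: each one is stringified to something like ARRAY(0x...)
my @broken = unpack("(w/a*)*", pack("(w/a*)*", \@inner1, \@inner2));
print ref($broken[0]) ? "still a ref\n" : "plain string: $broken[0]\n";

# Packing strings that are already serialized arrays
my $outer = pack("(w/a*)*",
    map { pack("(w/a*)*", @$_) } \@inner1, \@inner2);
my @restored = map { [ unpack("(w/a*)*", $_) ] } unpack("(w/a*)*", $outer);
print "@{ $restored[1] }\n";   # prints bird camel giraffe
```

The second form only works because the reader knows in advance that each element is itself a packed array; nothing in the string records that.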

    Ben

    --
    perl -e'print map {/.(.)/s} sort unpack "a2"x26, pack "N"x13,
    qw/1632265075 1651865445 1685354798 1696626283 1752131169 1769237618
    1801808488 1830841936 1886550130 1914728293 1936225377 1969451372
    2047502190/' #
     
    Ben Morrow, Jun 18, 2004
    #3
  4. Tassilo v. Parseval

    Also sprach J. Romano:

    [...]

    > Well, using the pack string "(w/a*)*" I can easily serialize the
    > array into a string like so:
    >
    > $string = pack("(w/a*)*", @a);
    >
    > Now $string contains all the encoded information needed to reconstruct
    > the @a array. So if I wanted to use $string to create a @b array that
    > was identical to the @a array, I can use unpack() with the same pack
    > string:
    >
    > @b = unpack("(w/a*)*", $string);
    >
    > Neat, doncha think? This same technique also works with hashes:
    >
    > $string = pack("(w/a*)*", %ENV);
    > %wow = unpack("(w/a*)*", $string);
    > # The %wow hash is now an exact copy of %ENV
    >
    > Now that we have a string representation of an array or hash, we
    > can save the string to a file, send it over a socket, or even encrypt
    > it using some encryption algorithm.


    [...]

    > 5. I do not know if this trick can handle arrays
    > and hashes containing Unicode strings. My guess
    > is that it can, but I haven't tested it so I can't
    > say for sure.


    It can, but there's a slight drawback: You'll lose the UTF-8 flag when
    unpacking the string:

    $ perl -MDevel::Peek -Mcharnames=:full
    Dump((unpack "(w/a*)*", pack "(w/a*)*", "\N{EURO-CURRENCY SIGN}123")[0]);
    ^D
    SV = PV(0x8139efc) at 0x8144cc4
    REFCNT = 1
    FLAGS = (TEMP,POK,pPOK)
    PV = 0x8140020 "\342\202\240123"\0
    CUR = 6
    LEN = 7
    $ perl -MDevel::Peek -Mcharnames=:full
    Dump("\N{EURO-CURRENCY SIGN}123");
    ^D
    SV = PV(0x8174280) at 0x814891c
    REFCNT = 1
    FLAGS = (POK,READONLY,pPOK,UTF8)
    PV = 0x81494a8 "\342\202\240123"\0 [UTF8 "\x{20a0}123"]
    CUR = 6
    LEN = 7

    That's a bit of a problem because you can't tell whether
    "\342\202\240123" is just a sequence of bytes or whether it happens to
    be a Unicode string.
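One way to sidestep the ambiguity, sketched here under the assumption that every element is known to be text rather than raw bytes, is to encode each string to UTF-8 explicitly before packing and decode it again after unpacking. This uses the Encode module's encode() and decode(); it is a workaround, not something the pack template does for you.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my @text = ("\x{20a0}123", "plain ascii");

# Encode each character string to UTF-8 bytes before packing...
my $string = pack("(w/a*)*", map { encode('UTF-8', $_) } @text);

# ...and decode the bytes back to character strings after unpacking
my @copy = map { decode('UTF-8', $_) } unpack("(w/a*)*", $string);

print length($copy[0]), "\n";                           # prints 4
print $copy[0] eq $text[0] ? "same\n" : "different\n";  # prints same
```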

    > Anyway, that's my trick that I thought I would share with the rest
    > of you. Have fun with it!


    Very nice, thank you. I'm quite a fan of pack/unpack and so I love every
    trick involving those.

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Jun 18, 2004
    #4
  5. J. Romano

    J. Romano Guest

    "Matt Garrish" <> wrote in message news:<tFtAc.36736$>...
    >
    > Sorry to rain on your parade, but with
    > all the caveats don't you think it
    > would be better just to use the Storable
    > module, especially since it's part
    > of the core distribution? Better
    > techniques are worth noting for the simple
    > reason that they're better...


    I remember learning about Storable a while back, but I didn't think
    it was part of the core distribution. When I run:

    perl -e "use Storable"

    I get: Can't locate Storable.pm in @INC (...)

    In case you're wondering, the first line of my "perl -v" output is:

    This is perl, v5.6.1 built for i386-linux

    It could be that I'm using an old version of Perl. So those who
    are using a version too old to have the Storable module (and, for
    some reason, are unable to install that module) can still use
    the "(w/a*)*" serialization trick in a pinch.

    -- Jean-Luc
     
    J. Romano, Jun 18, 2004
    #5
