"pack" perldoc error?

Discussion in 'Perl Misc' started by J. Romano, Nov 6, 2003.

  1. J. Romano

    J. Romano Guest

    Dear All,

    I recently got burned with some code that I wrote, so I thought I
    might share it with you all (at least to raise awareness):

    I was packing a string with a line like:

    $packedString = pack("a10",$text);

    and unpacking with a line like:

    $text2 = unpack("A10",$packedString);

    I packed the text with "a" because I wanted to pad the remaining
    spaces with null characters, and I unpacked the same string with "A"
    because, according to the documentation, "A" strips trailing spaces
    and nulls.

    So as long as the $text variable doesn't have any trailing spaces
    (and assuming it's not longer than 10 characters), $text2 should
    always be the same as $text, right?

    Well, no, it's not. Apparently, if $text ends with any number of
    newlines (like "Hello!\n\n\n\n"), the newlines will get stripped off
    when it is unpacked with "A", even though the documentation only says
    that trailing spaces and nulls will be stripped off (while making no
    mention of other types of whitespace). I tried this on Linux and
    Win32, and they both behave the same way. It took me a while to
    figure this out when I was hunting down a bug in one of my scripts.
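
A minimal sketch of the behaviour described above, assuming a reasonably recent Perl 5. The sample string "Hi\n" and the variable names are illustrative only. It packs a short string ending in a newline into a 10-byte field with "a", then unpacks the same bytes with "A", "Z", and "a" to compare what each one strips.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text   = "Hi\n";              # 3 characters, ending in a newline
my $packed = pack('a10', $text);  # "Hi\n" followed by 7 null bytes

my $with_A = unpack('A10', $packed);  # drops trailing nulls and whitespace
my $with_Z = unpack('Z10', $packed);  # keeps everything before the first null
my $with_a = unpack('a10', $packed);  # drops nothing

printf "A10 returns %d characters\n", length($with_A);  # 2:  "Hi"
printf "Z10 returns %d characters\n", length($with_Z);  # 3:  "Hi\n"
printf "a10 returns %d characters\n", length($with_a);  # 10: "Hi\n" plus padding
```

Only the "A" result loses the newline, which matches the symptom reported here: the stripping applies to trailing whitespace in general, not just to the space character.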

    Another thing I found that was not correct was in the perldoc for
    "pack". It explicitly says: 'When packing, "a", and "Z" are
    equivalent.' However, I found (at least on Linux and
    Win32-ActiveState) that "Z" will always null-terminate a string, even
    if it means an extra character of the string gets cut off.

    Here's a short Perl script to demonstrate:

    #!/usr/bin/perl -w
    use strict;

    my $text = "1234567890";

    my $a_packed = pack('a5', $text);
    my $Z_packed = pack('Z5', $text);

    my $a_len = length($a_packed);
    my $Z_len = length($Z_packed);

    print "\$a_packed: \"$a_packed\" (length = $a_len)\n";
    print "\$Z_packed: \"$Z_packed\" (length = $Z_len)\n";

    __END__

    This script prints:

    $a_packed: "12345" (length = 5)
    $Z_packed: "1234" (length = 5)

    According to the perldoc, "a" and "Z" are equivalent when used with
    "pack", yet apparently they are not here, or else $a_packed and
    $Z_packed would be identical strings. Apparently "Z" always
    null-terminates the string, even if the string is longer than the
    length of the field to pack it in.

    So this is either a bug with the pack function or else the perldoc
    documentation is a little wrong. I would like to use packing with "Z"
    in one of my scripts, but I'm a little hesitant to do so until I know
    how it's supposed to work (and that it's working correctly).

    Does anybody know what the correct behavior of pack("Z5",$text)
    should be?

    (And is unpacking with "A" supposed to strip off any trailing
    newlines or whitespace?)

    Thanks.

    -- J.
     
    J. Romano, Nov 6, 2003
    #1

  2. J. Romano

    Anno Siegel Guest

    J. Romano <> wrote in comp.lang.perl.misc:

    [...]

    > $packedString = pack("a10",$text);
    > $text2 = unpack("A10",$packedString);
    >
    > I packed the text with "a" because I wanted to pad the remaining
    > spaces with null characters, and I unpacked the same string with "A"
    > because, according to the documentation, "A" strips trailing spaces
    > and nulls.
    >
    > So as long as the $text variable doesn't have any trailing spaces
    > (and assuming it's not longer than 10 characters), $text2 should
    > always be the same as $text, right?
    >
    > Well, no, it's not. Apparently, if $text ends with any number of
    > newlines (like "Hello!\n\n\n\n"), the newlines will get stripped off


    ...and also tabs, and other white-space characters. Apparently, "space"
    is used in the sense of "any white-space character" here. That could
    be made clearer, but isn't unheard-of.

    > Another thing I found that was not correct was in the perldoc for
    > "pack". It explicitly says: 'When packing, "a", and "Z" are
    > equivalent.' However, I found (at least on Linux and
    > Win32-ActiveState) that "Z" will always null-terminate a string, even
    > if it means an extra character of the string gets cut off.


    Yes, the equivalence seems to be restricted to the case where both
    formats are long enough to allow at least one trailing zero. If it
    isn't, "Z" forces a null byte, but "a" doesn't. This certainly
    deserves explicit mention, though it *is* described elsewhere in the
    doc.
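
To make the boundary case concrete, here is a small sketch that packs three strings into a 5-byte field with both "a" and "Z". The helper `show` is hypothetical and exists only to make null bytes visible; the sample strings are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render each null byte as a visible "\0".
sub show {
    my ($bytes) = @_;
    (my $visible = $bytes) =~ s/\0/\\0/g;
    return $visible;
}

# One string shorter than the field, one exactly as long, one longer.
for my $text ('abc', '12345', '1234567890') {
    my $a_packed = pack('a5', $text);   # truncates or pads with nulls
    my $Z_packed = pack('Z5', $text);   # same, but byte 5 is always a null
    my $verdict  = $a_packed eq $Z_packed ? 'same' : 'different';
    printf "%-12s a5=%-9s Z5=%-9s %s\n",
        $text, show($a_packed), show($Z_packed), $verdict;
}
```

The two formats agree only for 'abc', where the field has room for at least one padding null. For the other two strings, "a5" keeps five characters while "Z5" keeps four and spends the fifth byte on the terminator.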

    Anno
     
    Anno Siegel, Nov 6, 2003
    #2

  3. J. Romano

    J. Romano Guest

    -berlin.de (Anno Siegel) wrote in message news:<bod6f1$n78$-Berlin.DE>...

    > Yes, the equivalence seems to be restricted to the case where both
    > formats are long enough to allow at least one trailing zero. If it
    > isn't, "Z" forces a null byte, but "a" doesn't. This certainly
    > deserves explicit mention, though it *is* described elsewhere in the
    > doc.


    Thanks for your response, Anno.

    It looks like you're absolutely right about that... If I had read
    just one more paragraph, I would have seen the part where it mentions
    that "Z" "...always packs a trailing null byte under all
    circumstances."

    Oh, well... had I not asked the question, I may have been more
    confused if I had read that paragraph after all (since the statements
    appear to slightly contradict each other).

    So correct me if I'm wrong here:

    When I want to pack and unpack strings to/from a C structure that
    requires that its strings be NULL-terminated, I should pack and unpack
    with "Z".

    But when I want to pack and unpack strings to/from a record for use
    with a Perl script (that doesn't require a NULL-terminator), I should
    pack with "a" (to pad with null bytes) and unpack with "Z" (to strip
    off trailing null bytes). Although I unpack with "Z", packing with
    "a" would allow me to make use of the extra character that would
    otherwise get used as a null terminator.

    This will only work, of course, if I have no nulls in my string.
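
A rough sketch of that pack-with-"a", unpack-with-"Z" round trip, assuming a fixed 10-byte field. The `round_trip` helper and the test strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pack into a 10-byte field with "a", read back with "Z".
sub round_trip {
    my ($text) = @_;
    return unpack('Z10', pack('a10', $text));
}

my @samples = ("abc", "Hello!\n\n\n\n", "abc  ", "1234567890", "ab\0cd");

for my $text (@samples) {
    my $status = round_trip($text) eq $text ? 'survives' : 'is altered';
    # Show non-printable characters as hex escapes so the output is readable.
    (my $visible = $text) =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord $1)/ge;
    print "\"$visible\" $status\n";
}
```

The first four strings come back unchanged, including the trailing newlines, the trailing spaces, and the string that fills all 10 bytes. The last one comes back as "ab", because "Z" stops at the first null, embedded or not.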

    -- J.
     
    J. Romano, Nov 8, 2003
    #3
  4. J. Romano

    Anno Siegel Guest

    J. Romano <> wrote in comp.lang.perl.misc:
    > -berlin.de (Anno Siegel) wrote in message
    > news:<bod6f1$n78$-Berlin.DE>...


    [...]

    > So correct me if I'm wrong here:
    >
    > When I want to pack and unpack strings to/from a C structure that
    > requires that its strings be NULL-terminated, I should pack and unpack
    > with "Z".
    >
    > But when I want to pack and unpack strings to/from a record for use
    > with a Perl script (that doesn't require a NULL-terminator), I should
    > pack with "a" (to pad with null bytes) and unpack with "Z" (to strip
    > off trailing null bytes). Although I unpack with "Z", packing with
    > "a" would allow me to make use of the extra character that would
    > otherwise get used as a null terminator.


    First off, your equivalence is wrong. Not every string in C needs a
    trailing zero, and on the other hand, zero-bytes can be meaningful in
    Perl strings as well.

    I'm not sure against what eventualities you are trying to guard yourself.
    If you don't need padding, use 'a' for both directions, and don't generate
    any padding while pack()-ing. If we're talking fixed-length fields
    (sorry if that has been clear in the thread, I'm not sure right now),
    the a/Z combo looks about right, because then padding *will* happen for
    short strings.
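
One possible reading of the "no padding" case, sketched with the "a*" template so that the field is exactly as long as the data. The choice of "a*" is an assumption here, not something stated in the thread, and the sample string is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A string with an embedded null, a trailing space, and a trailing newline.
my $text = "ab\0cd \n";

my $packed = pack('a*', $text);      # "*" takes the whole string, so no padding
my $back   = unpack('a*', $packed);  # "a" strips nothing on the way out

print length($packed) == length($text)
    ? "no padding added\n"
    : "padding added\n";
print $back eq $text
    ? "round trip is exact\n"
    : "round trip altered the string\n";
```

With nothing padded and nothing stripped, nulls and whitespace pass through untouched. The trade-off is that the field length is no longer fixed, so a record holding several such fields needs some other way to tell where each one ends.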

    Anno
     
    Anno Siegel, Nov 8, 2003
    #4
