"pack" perldoc error?

J. Romano

Dear All,

I recently got burned with some code that I wrote, so I thought I
might share it with you all (at least to raise awareness):

I was packing a string with a line like:

$packedString = pack("a10",$text);

and unpacking with a line like:

$text2 = unpack("A10",$packedString);

I packed the text with "a" because I wanted to pad the remaining
spaces with null characters, and I unpacked the same string with "A"
because, according to the documentation, "A" strips trailing spaces
and nulls.

So as long as the $text variable doesn't have any trailing spaces
(and assuming it's not longer than 10 characters), $text2 should
always be the same as $text, right?

Well, no, it's not. Apparently, if $text ends with any number of
newlines (like "Hello!\n\n\n\n"), the newlines will get stripped off
when it is unpacked with "A", even though the documentation only says
that trailing spaces and null will be stripped off (while making no
mention of other types of whitespace). I tried this on Linux and
Win32, and they both behave the same way. It took me a while to
figure this out when I was hunting down a bug in one of my scripts.
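The behavior described above can be reproduced with a short sketch like the following. The variable names are illustrative, and the "Z10" and "a10" lines are included only for comparison:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Text with trailing newlines, packed into a 10-byte, null-padded field.
my $text   = "Hello!\n\n";
my $packed = pack('a10', $text);     # "Hello!\n\n" plus two null bytes

my $with_A = unpack('A10', $packed); # strips trailing nulls AND whitespace
my $with_Z = unpack('Z10', $packed); # keeps everything before the first null
my $with_a = unpack('a10', $packed); # returns the field unchanged

printf "A10: %2d characters\n", length $with_A;   # 6
printf "Z10: %2d characters\n", length $with_Z;   # 8
printf "a10: %2d characters\n", length $with_a;   # 10
```

Only the "A" template drops the newlines; "Z" returns the original text, and "a" returns it with the null padding still attached.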

Another thing I found that was not correct was in the perldoc for
"pack". It explicitly says: 'When packing, "a", and "Z" are
equivalent.' However, I found (at least on Linux and
Win32-ActiveState) that "Z" will always null-terminate a string, even
if it means an extra character of the string gets cut off.

Here's a short Perl script to demonstrate:

#!/usr/bin/perl -w
use strict;

my $text = "1234567890";

my $a_packed = pack('a5', $text);
my $Z_packed = pack('Z5', $text);

my $a_len = length($a_packed);
my $Z_len = length($Z_packed);

print "\$a_packed: \"$a_packed\" (length = $a_len)\n";
print "\$Z_packed: \"$Z_packed\" (length = $Z_len)\n";

__END__

This script prints:

$a_packed: "12345" (length = 5)
$Z_packed: "1234" (length = 5)

According to the perldoc, "a" and "Z" are equivalent when used with
"pack", yet apparently they are not here, or else $a_packed and
$Z_packed would be identical strings. Apparently "Z" always
null-terminates the string, even if the string is longer than the
length of the field to pack it in.

So this is either a bug with the pack function or else the perldoc
documentation is a little wrong. I would like to use packing with "Z"
in one of my scripts, but I'm a little hesitant to do so until I know
how it's supposed to work (and that it's working correctly).

Does anybody know what the correct behavior of pack("Z5",$text)
should be?

(And is unpacking with "A" supposed to strip off any trailing
newlines or whitespace?)

Thanks.

-- J.
 
Anno Siegel

[...]
> $packedString = pack("a10",$text);
> $text2 = unpack("A10",$packedString);
>
> I packed the text with "a" because I wanted to pad the remaining
> spaces with null characters, and I unpacked the same string with "A"
> because, according to the documentation, "A" strips trailing spaces
> and nulls.
>
> So as long as the $text variable doesn't have any trailing spaces
> (and assuming it's not longer than 10 characters), $text2 should
> always be the same as $text, right?
>
> Well, no, it's not. Apparently, if $text ends with any number of
> newlines (like "Hello!\n\n\n\n"), the newlines will get stripped off

...and also tabs, and other white-space characters. Apparently, "space"
is used in the sense of "any white-space character" here. That could
be made clearer, but isn't unheard-of.

> Another thing I found that was not correct was in the perldoc for
> "pack". It explicitly says: 'When packing, "a", and "Z" are
> equivalent.' However, I found (at least on Linux and
> Win32-ActiveState) that "Z" will always null-terminate a string, even
> if it means an extra character of the string gets cut off.

Yes, the equivalence seems to be restricted to the case where both
formats are long enough to allow at least one trailing zero. If it
isn't, "Z" forces a null byte, but "a" doesn't. This certainly
deserves explicit mention, though it *is* described elsewhere in the
doc.
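
A small sketch of where the two formats agree and where they part ways. The show() helper and the sample strings are illustrative, added only to make null bytes visible in the output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Render null bytes as a visible \0 (helper name is illustrative).
sub show { my ($s) = @_; $s =~ s/\0/\\0/g; return qq{"$s"}; }

my $short = "abc";     # leaves room for at least one trailing null
my $exact = "12345";   # fills a 5-byte field completely

print "a5 short: ", show(pack('a5', $short)), "\n";  # "abc\0\0"
print "Z5 short: ", show(pack('Z5', $short)), "\n";  # "abc\0\0"
print "a5 exact: ", show(pack('a5', $exact)), "\n";  # "12345"
print "Z5 exact: ", show(pack('Z5', $exact)), "\n";  # "1234\0"
```

With the short string both formats produce the same bytes; once the string fills the field, "Z" gives up the last character to make room for the null.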

Anno
 
J. Romano

> Yes, the equivalence seems to be restricted to the case where both
> formats are long enough to allow at least one trailing zero. If it
> isn't, "Z" forces a null byte, but "a" doesn't. This certainly
> deserves explicit mention, though it *is* described elsewhere in the
> doc.

Thanks for your response, Anno.

It looks like you're absolutely right about that... If I had read
just one more paragraph, I would have seen the part where it mentions
that "Z" "...always packs a trailing null byte under all
circumstances."

Oh, well... had I not asked the question, I may have been more
confused if I had read that paragraph after all (since the statements
appear to slightly contradict each other).

So correct me if I'm wrong here:

When I want to pack and unpack strings to/from a C structure that
requires that its strings be NULL-terminated, I should pack and unpack
with "Z".

But when I want to pack and unpack strings to/from a record for use
with a Perl script (that doesn't require a NULL-terminator), I should
pack with "a" (to pad with null bytes) and unpack with "Z" (to strip
off trailing null bytes). Although I unpack with "Z", packing with
"a" would allow me to make use of the extra character that would
otherwise get used as a null terminator.

This will only work, of course, if I have no nulls in my string.
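
A minimal sketch of that pack-with-"a", unpack-with-"Z" round trip, assuming a fixed field width of 10. The width, the round_trip() helper, and the sample strings are all illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $width = 10;   # illustrative field width

# Pack into a null-padded field with "a", read it back with "Z".
sub round_trip {
    my ($text) = @_;
    my $field = pack("a$width", $text);
    return unpack("Z$width", $field);
}

for my $text ("Hello!\n\n", "1234567890", "ab\0cd") {
    my $back = round_trip($text);
    (my $shown = $text) =~ s/\0/\\0/g;   # make nulls visible
    $shown =~ s/\n/\\n/g;                # make newlines visible
    printf "%-14s %s\n", $shown, ($back eq $text ? "survives" : "is altered");
}
```

The first two strings come back unchanged, including the one that uses all ten characters of the field. The third is cut off at its embedded null, which is the caveat mentioned above.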

-- J.
 
Anno Siegel

J. Romano said:

[...]

> So correct me if I'm wrong here:
>
> When I want to pack and unpack strings to/from a C structure that
> requires that its strings be NULL-terminated, I should pack and unpack
> with "Z".
>
> But when I want to pack and unpack strings to/from a record for use
> with a Perl script (that doesn't require a NULL-terminator), I should
> pack with "a" (to pad with null bytes) and unpack with "Z" (to strip
> off trailing null bytes). Although I unpack with "Z", packing with
> "a" would allow me to make use of the extra character that would
> otherwise get used as a null terminator.

First off, your equivalence is wrong. Not every string in C needs a
trailing zero, and on the other hand, zero-bytes can be meaningful in
Perl strings as well.

I'm not sure against what eventualities you are trying to guard yourself.
If you don't need padding, use 'a' for both directions, and don't generate
any padding while pack()-ing. If we're talking fixed-length fields
(sorry if that has been clear in the thread, I'm not sure right now),
the a/Z combo looks about right, because then padding *will* happen for
short strings.
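
For the no-padding case, a sketch using 'a*' in both directions (the sample string is illustrative, chosen to contain an embedded null and trailing white space):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 'a*' takes the string as-is: no padding on pack, no stripping on unpack.
my $text   = "ab\0cd \n";
my $packed = pack('a*', $text);
my $back   = unpack('a*', $packed);

print length($packed) == length($text)
    ? "no padding added\n" : "padding added\n";
print $back eq $text
    ? "round trip is exact\n" : "round trip changed the text\n";
```

Because nothing is padded or stripped, embedded nulls and trailing white space both come through untouched.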

Anno
 
