pack 'C3U*' not same as pack 'C3(xC)*'

Alexander Farber

Hi,

I have a small card game. The clients are Java-applets and the
server is written in C, mostly forwarding data from applet to applet.

The message format is:

Byte 1: Number of Unicode chars (see below)
Byte 2: Player number
Byte 3: Event id
Up to 510 bytes: A Java Unicode string

Now I'm trying to rewrite my C server in Perl, because that way
it's easier to add features (syslog, auth against an SQL database, etc.)

I have trouble understanding what the best pack format for
my messages would be. I have read "perldoc -f pack" numerous times and
also the many O'Reilly books I have, but the best I've come up with is

pack "C3(xC)*", length $ascii_str, $num, $id, unpack "C*", $ascii_str;

for the cases when I need to send an ASCII string (like an IP address
string) from the server to the Java applet and thus have to pad each
ASCII character with a zero upper byte (that's why the "x" above).

I wonder, why doesn't pack "C3U*" do the same? Here is a demo:

# perl -e '$str=pack "C3(xC)*", 4, 0, 14, unpack "C*", "test"; \
print join " ", unpack "C*", $str'

4 0 14 0 116 0 101 0 115 0 116

# perl -e '$str=pack "C3U*", 4, 0, 14, unpack "C*", "test"; \
print join " ", unpack "C*", $str'

4 0 14 116 101 115 116

As you can see, the stuffing zeros are missing in the second output.
But why? Doesn't "perldoc -f pack" say:

If you don't want this [UTF8] to happen, you can
begin your pattern with "C0" (or anything else) to force
Perl not to UTF8 encode your string, and then follow
this with a "U*" somewhere in your pattern.

Regards
Alex

PS: Also I wonder, if there are any nicer ways to communicate
Java-strings to Perl. "perldoc -f pack" mentions "n/..."
for Java-Strings, but doesn't elaborate. Is it "n/U*" ?
 
Mark

Alexander said:
Hi,

I have a small card game. The clients are Java-applets and the
server is written in C, mostly forwarding data from applet to applet.

The message format is:

Byte 1: Number of Unicode chars (see below)
Byte 2: Player number
Byte 3: Event id
Up to 510 bytes: A Java Unicode string

Now I'm trying to rewrite my C server in Perl, because that way
it's easier to add features (syslog, auth against an SQL database, etc.)

PS: Also I wonder, if there are any nicer ways to communicate
Java-strings to Perl. "perldoc -f pack" mentions "n/..."
for Java-Strings, but doesn't elaborate. Is it "n/U*" ?

I'd be tempted to use XML as the data format; in fact, I'd probably use
SOAP.

Mark
 
Ilmari Karonen

Alexander Farber said:
The message format is:

Byte 1: Number of Unicode chars (see below)
Byte 2: Player number
Byte 3: Event id
Up to 510 bytes: A Java Unicode string

Your "Java unicode string" is presumably in (big-endian) UCS-2, which
is the representation used internally by Java (strictly UTF-16 in newer
Java versions, which is identical for characters up to U+FFFF). This is
not how perl normally encodes Unicode strings.

Alexander Farber said:
I have trouble understanding what the best pack format for
my messages would be. I have read "perldoc -f pack" numerous times and
also the many O'Reilly books I have, but the best I've come up with is

pack "C3(xC)*", length $ascii_str, $num, $id, unpack "C*", $ascii_str;

This is indeed a perfectly good way to convert ASCII (or ISO Latin 1)
text to UCS-2. If you want to handle characters above 255 as well,
may I suggest something like:

pack "C3n*", length($string), $num, $id, unpack "U*", $string;
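
A minimal sketch of that suggestion as a round trip, assuming every character fits in 16 bits (code points above U+FFFF would need surrogate pairs, which this does not handle). The helper names pack_message and unpack_message are made up for illustration, and ord is used here in place of unpack "U*" to obtain the code points:

```perl
use strict;
use warnings;

# Hypothetical helper: build one message in the layout from the question
# (char count, player number, event id, then one 16-bit big-endian value
# per character).
sub pack_message {
    my ($num, $id, $string) = @_;
    my @code_points = map { ord($_) } split //, $string;
    return pack "C3n*", scalar(@code_points), $num, $id, @code_points;
}

# Hypothetical inverse: assumes $bytes holds exactly one complete message.
sub unpack_message {
    my ($bytes) = @_;
    my ($count, $num, $id, @units) = unpack "C3n*", $bytes;
    my $string = join "", map { chr($_) } @units[0 .. $count - 1];
    return ($num, $id, $string);
}

my $msg = pack_message(0, 14, "test");
print join(" ", unpack "C*", $msg), "\n";
```

For "test" this prints 4 0 14 0 116 0 101 0 115 0 116, the same bytes as the "C3(xC)*" demo in the question, while a character such as U+263A would come out as the two bytes 38 58 instead of being cut down to one byte.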

Alexander Farber said:
I wonder, why doesn't pack "C3U*" do the same? Here is a demo:

Because pack("U*") encodes the characters in UTF-8, not in UCS-2.
UTF-8 is a variable-length format which encodes ASCII characters in a
single byte and other characters in 2 or more bytes. So if your
original string only contains ASCII characters, it makes no difference
whether you use "U*" or "C*".
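
To make the difference between the two encodings visible, here is a small sketch that uses the core Encode module rather than pack. The helper name encoded_bytes is hypothetical, and "UCS-2BE" is the Encode name for big-endian UCS-2:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the bytes of $string in the given encoding,
# as a space-separated list of decimal values.
sub encoded_bytes {
    my ($encoding, $string) = @_;
    return join " ", unpack "C*", encode($encoding, $string);
}

# "test" is pure ASCII; U+263A lies outside Latin-1.
for my $string ("test", "\x{263A}") {
    print "UTF-8:   ", encoded_bytes("UTF-8",   $string), "\n";
    print "UCS-2BE: ", encoded_bytes("UCS-2BE", $string), "\n";
}
```

For "test" the UTF-8 bytes are 116 101 115 116 and the UCS-2BE bytes are 0 116 0 101 0 115 0 116. For U+263A the UTF-8 form takes three bytes (226 152 186) and the UCS-2BE form two (38 58).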

UTF-8 is also the format used by perl to store Unicode strings
internally, although perl hides this fact reasonably well -- in
theory, at least. As perl's Unicode support matures, practice is
gradually starting to approach theory here.

For more information, try googling for UTF-8 and UCS-2.
 
