pack 'C3U*' not same as pack 'C3(xC)*'

Discussion in 'Perl Misc' started by Alexander Farber, Jun 23, 2005.

  1. Hi,

    I have a small card game. The clients are Java applets and the
    server is written in C, mostly forwarding data from applet to applet.

    The message format is:

    1st byte: Number of Unicode chars (see below)
    2nd byte: Player number
    3rd byte: Event id
    up to 510 bytes: A Java Unicode string

    Now I'm trying to rewrite my C server in Perl, because that way
    it's easier to add features (syslog, auth against an SQL DB, etc.)

    I have trouble understanding what the best pack format for
    my messages would be. I have read "perldoc -f pack" numerous times and also
    the many O'Reilly books I have, but the best I've come up with is

    pack "C3(xC)*", length $ascii_str, $num, $id, unpack "C*",
    $ascii_str;

    for the cases where I need to send an ASCII string (like an IP address
    string) from the server to the Java applet and thus have to pad the
    upper byte of each ASCII character with a zero (that's why the "x" above).

    I wonder, why doesn't pack "C3U*" do the same? Here is a demo:

    # perl -e '$str=pack "C3(xC)*", 4, 0, 14, unpack "C*", "test";
    print join " ", unpack "C*", $str'

    4 0 14 0 116 0 101 0 115 0 116

    # perl -e '$str=pack "C3U*", 4, 0, 14, unpack "C*", "test";
    print join " ", unpack "C*", $str'

    4 0 14 116 101 115 116

    As you can see, the padding zeros are missing from the second output.
    But why? Doesn't "perldoc -f pack" say:

    If you don't want this [UTF8] to happen, you can
    begin your pattern with "C0" (or anything else) to force
    Perl not to UTF8 encode your string, and then follow
    this with a "U*" somewhere in your pattern.

    Regards
    Alex

    PS: I also wonder whether there are any nicer ways to pass
    Java strings to Perl. "perldoc -f pack" mentions "n/..."
    for Java strings, but doesn't elaborate. Is it "n/U*"?
    Alexander Farber, Jun 23, 2005

  2. Alexander Farber wrote:
    > Hi,
    >
    > I have a small card game. The clients are Java applets and the
    > server is written in C, mostly forwarding data from applet to applet.
    >
    > The message format is:
    >
    > 1st byte: Number of Unicode chars (see below)
    > 2nd byte: Player number
    > 3rd byte: Event id
    > up to 510 bytes: A Java Unicode string
    >
    > Now I'm trying to rewrite my C server in Perl, because that way
    > it's easier to add features (syslog, auth against an SQL DB, etc.)

    <snip>

    > PS: I also wonder whether there are any nicer ways to pass
    > Java strings to Perl. "perldoc -f pack" mentions "n/..."
    > for Java strings, but doesn't elaborate. Is it "n/U*"?


    I'd be tempted to use XML as the data format; in fact, I'd probably use
    SOAP.

    Mark
    Mark, Jun 23, 2005

  3. Alexander Farber wrote on 23.06.2005:
    >
    > The message format is:
    >
    > 1st byte: Number of Unicode chars (see below)
    > 2nd byte: Player number
    > 3rd byte: Event id
    > up to 510 bytes: A Java Unicode string


    Your "Java Unicode string" is presumably in big-endian UCS-2 (strictly
    speaking UTF-16, which is identical to UCS-2 for all characters up to
    U+FFFF), which is the representation used internally by Java. This is
    not how perl normally encodes Unicode strings.
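
For the receiving direction, a minimal sketch of how the server might split one such message into its fields. It assumes the layout quoted above (count, player number, event id, then big-endian 16-bit units) and that every character fits in 16 bits. The name parse_message is hypothetical and not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: split one raw message into its fields.
# Assumed layout: 3 single bytes, then big-endian 16-bit code units.
sub parse_message {
    my ($raw) = @_;
    my ($count, $num, $id, @units) = unpack 'C3n*', $raw;

    # The first byte announces how many 16-bit characters follow.
    die "length byte says $count chars, got " . scalar(@units) . "\n"
        if $count != @units;

    # Turn each 16-bit value back into a Perl character.
    my $string = join '', map { chr } @units;
    return ($num, $id, $string);
}

# The same bytes as in the demo output from the question.
my $raw = pack 'C*', 4, 0, 14, 0, 116, 0, 101, 0, 115, 0, 116;
my ($num, $id, $string) = parse_message($raw);
print "$num $id $string\n";    # prints "0 14 test"
```

The check against the count byte is an extra safeguard added here, not something the original format description requires.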

    > I have trouble understanding what the best pack format for
    > my messages would be. I have read "perldoc -f pack" numerous times and also
    > the many O'Reilly books I have, but the best I've come up with is
    >
    > pack "C3(xC)*", length $ascii_str, $num, $id, unpack "C*", $ascii_str;


    This is indeed a perfectly good way to convert ASCII (or ISO Latin 1)
    text to UCS-2. If you want to handle characters above 255 as well,
    may I suggest something like:

    pack "C3n*", length($string), $num, $id, unpack "U*", $string;
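
The "C3n*" suggestion can be tried out with a small sketch like the one below. The helper name build_message and the two range checks are assumptions added for illustration. The code points are collected with ord instead of unpack "U*", which should give the same numbers for a character string but does not depend on how a particular perl version treats "U".

```perl
use strict;
use warnings;

# Hypothetical helper: build one message (count, player, event, 16-bit chars).
sub build_message {
    my ($num, $id, $string) = @_;

    # One code point per character of the string.
    my @code_points = map { ord } split //, $string;

    # The count must fit in one byte, and "n" only holds 16 bits.
    die "string too long\n" if @code_points > 255;
    die "character outside the 16-bit range\n"
        if grep { $_ > 0xFFFF } @code_points;

    return pack 'C3n*', scalar(@code_points), $num, $id, @code_points;
}

my $msg = build_message(0, 14, "test");
print join(' ', unpack 'C*', $msg), "\n";
# prints "4 0 14 0 116 0 101 0 115 0 116"
```

For plain ASCII input this gives the same bytes as the "C3(xC)*" version from the question, while a character such as U+20AC would be sent as the two bytes 32 and 172.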

    > I wonder, why doesn't pack "C3U*" do the same? Here is a demo:


    Because pack("U*") encodes the characters in UTF-8, not in UCS-2.
    UTF-8 is a variable-length format which encodes ASCII characters in a
    single byte and other characters in 2 or more bytes. So if your
    original string only contains ASCII characters, it makes no difference
    whether you use "U*" or "C*".
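
To see the difference between the two encodings byte by byte, here is a rough sketch using the core Encode module. The helper byte_list and the sample strings are made up for this illustration. UTF-16BE is used as the encoding name, which yields the same bytes as big-endian UCS-2 for characters up to U+FFFF.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show a byte string as decimal byte values.
sub byte_list { return join ' ', unpack 'C*', $_[0] }

for my $text ("test", "caf\x{E9}") {
    printf "UTF-8:    %s\n", byte_list(encode('UTF-8',    $text));
    printf "UTF-16BE: %s\n", byte_list(encode('UTF-16BE', $text));
}
# UTF-8:    116 101 115 116
# UTF-16BE: 0 116 0 101 0 115 0 116
# UTF-8:    99 97 102 195 169
# UTF-16BE: 0 99 0 97 0 102 0 233
```

Only the second string shows a multi-byte UTF-8 sequence (195 169 for the e-acute), whereas the 16-bit form always uses exactly two bytes per character here.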

    UTF-8 is also the format used by perl to store Unicode strings
    internally, although perl hides this fact reasonably well -- in
    theory, at least. As perl's Unicode support matures, practice is
    gradually starting to approach theory here.

    For more information, try googling for UTF-8 and UCS-2.

    --
    Ilmari Karonen
    To reply by e-mail, please replace ".invalid" with ".net" in address.
    Ilmari Karonen, Jun 23, 2005
