utf8, length and syswrite are killing me

Discussion in 'Perl Misc' started by A. Farber, Feb 17, 2010.

  1. A. Farber

    A. Farber Guest

    Hello,

    I have a Russian card game at
    http://apps.facebook.com/video-preferans/
    which I've recently moved from using urlencoded data
    to XML data in UTF-8. Since then it often hangs
    for the users and I suspect that my subroutine:

    sub enqueue {
        my $child = shift;
        my $data = shift;
        my $fh = $child->{FH};
        my $response = $child->{RESPONSE};

        # flash.net.Socket.readUTF() expects 16-bit prefix in network order
        my $prefix = pack 'n', length $data;

        # append to the end of the outgoing queue
        push @{$response}, $prefix . $data;
    }

    packs the wrong number of bytes for Cyrillic messages.

    I'm using perl v5.10.0 at OpenBSD 4.5 and
    "perldoc -tf length" suggests using
    length(Encoding::encode_utf8(EXPR))

    But when I put the line:

    use Encode::Encoding;
    ....
    my $prefix = pack 'n', length(Encoding::encode_utf8($data));

    then it borks with

    Undefined subroutine &Encoding::encode_utf8 called at Child.pm line 229.

    Any help please?

    I should also mention that when users chat
    in Russian, my server just passes their Cyrillic
    messages around (with sysread - poll - syswrite).

    But for the Cyrillic words in my program (I "use utf8;")
    I have to call utf8::encode($cyrillic_word) before I can
    write them out with syswrite, or it dies ("wide char").

    I've tried moving utf8::encode($data) into the
    enqueue subroutine above but it doesn't allow me to
    (maybe because parts of $data are not utf8??)

    Regards
    Alex
    A. Farber, Feb 17, 2010
    #1

  2. A. Farber

    Guest

    On Wed, 17 Feb 2010 10:28:59 -0800 (PST), "A. Farber" <> wrote:

    >Hello,
    >
    >I have a russian card game at
    >http://apps.facebook.com/video-preferans/
    >which I've recently moved from using urlencoded data
    >to XML data in UTF-8. Since then it often hangs
    >for the users and I suspect, that my subroutine:
    >
    >sub enqueue {
    > my $child = shift;
    > my $data = shift;
    > my $fh = $child->{FH};
    > my $response = $child->{RESPONSE};
    >
    > # flash.net.Socket.readUTF() expects 16-bit prefix in network order
    > my $prefix = pack 'n', length $data;
    >
    > # append to the end of the outgoing queue
    > push @{$response}, $prefix . $data;
    >}
    >
    >packs wrong number of bytes for cyrillic messages.
    >

    If '$data' is still a Perl character string,
    I would encode() it to UTF-8 octets and then
    push @outarray, pack ('n a*', length($octets), $octets);
    But you could do it a couple of different ways. Basically,
    you want the length of the encoded data, not the length
    of the Perl string (which counts characters, not bytes).
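
    As a minimal sketch of that advice applied to the enqueue() from the
    first post: it assumes $data arrives as a character string, and the
    65535-byte check is an added assumption based on the 16-bit prefix.
    Note that encode_utf8 lives in the Encode package (loaded with
    "use Encode;"), not in a package called Encoding.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical reworking of enqueue(); assumes $data is a Perl
# character string that has not been encoded yet.
sub enqueue {
    my ($child, $data) = @_;

    # Turn characters into UTF-8 octets before measuring anything.
    my $octets = Encode::encode_utf8($data);

    # 'n' is an unsigned 16-bit big-endian integer, so the payload
    # has to fit in 65535 bytes (assumed limit, not from the thread).
    die "message too long for a 16-bit prefix\n" if length($octets) > 0xFFFF;

    # Prefix and payload are both bytes now, so the lengths agree.
    push @{ $child->{RESPONSE} }, pack('n a*', length($octets), $octets);
    return;
}

my $child = { RESPONSE => [] };
enqueue($child, "\x{43F}\x{440}\x{438}\x{432}\x{435}\x{442}");   # 6 Cyrillic characters

my ($len, $body) = unpack 'n a*', $child->{RESPONSE}[0];
print "prefix=$len body_bytes=", length($body), "\n";   # prefix=12 body_bytes=12
```

    Six Cyrillic characters become twelve octets, and the prefix now
    says twelve.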

    You really don't want to push '$prefix . $data' if $data is
    not yet encoded as UTF-8. If it is already encoded as UTF-8, then
    the length will be correct because it's already bytes (octets),
    not characters.
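
    To illustrate the difference between the two kinds of string, and
    one possible reason why encoding inside enqueue() misbehaves when
    $data mixes raw chat bytes with "use utf8" literals, here is a small
    sketch. The variable names are made up for the example, and whether
    this is what actually happens in the server above is only a guess.

```perl
use strict;
use warnings;
use Encode ();

my $chars  = "\x{436}";                      # one character, U+0436
my $octets = Encode::encode_utf8($chars);    # two bytes: 0xD0 0xB6

print length($chars), " ", length($octets), "\n";   # 1 2

# Mixing: when bytes are concatenated with a character string, Perl
# treats each byte as a Latin-1 character, so encoding the result
# encodes the original bytes a second time.
my $mixed = $octets . $chars;                # 3 characters
my $wrong = Encode::encode_utf8($mixed);     # 6 bytes, double-encoded

# One way out: decode incoming bytes to characters first, then
# encode exactly once on the way out.
my $text  = Encode::decode('UTF-8', $octets) . $chars;   # 2 characters
my $right = Encode::encode_utf8($text);                  # 4 bytes

print length($wrong), " ", length($right), "\n";    # 6 4
```

    The alternative is to keep everything as octets and encode each
    literal before it is concatenated; either way, a given string
    should be all characters or all bytes, never a mix.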

    You should read the Unicode docs: perluniintro, perlunicode, unicode, etc.
    Each has links to the others.

    Below are some examples of a couple of ways to do it. See what works
    for you.

    -sln

    ----------------------
    use strict;
    use warnings;
    use Encode;

    binmode (STDOUT, ':encoding(UTF-8)');

    ##
    my $perlstring = "This is a string <\x{2100}>...";
    my $utf8octets = encode('UTF-8', $perlstring);
    my $packd_string = pack('n', length($utf8octets));
    my $unpackd_string = unpack('n', $packd_string);
    print "** Perl string : '$perlstring', length = ", length($perlstring),"\n\n";
    print "UTF-8 octets: '$utf8octets', length = ", length($utf8octets),"\n\n";
    print "Packed length of encoded string is $unpackd_string\n\n";

    ##
    my $len_plus_octets = $packd_string . $utf8octets;
    print "Length.UTF-8 octets: '$len_plus_octets'\n\n";

    ##
    my $packd_all = pack ('n a*', length($utf8octets), $utf8octets);
    print "Packed all : '$packd_all', length = ",length($packd_all),"\n\n";

    ##
    my ($len,$octets) = unpack ('n a*', $packd_all);
    print "Unpacked all : '$octets', length = ",length($octets),"\n";
    print " : read packed length = $len\n\n";
    my $decoded_string = decode('UTF-8', $octets);
    print "** Perl string : '$decoded_string', length = ", length($decoded_string), "\n\n";
    if ($decoded_string eq $perlstring) {
        print "** Perl strings are equal.\n";
    }
    else {
        print "** Perl strings are not equal.\n";
    }
    __END__
    ** Perl string : 'This is a string <GäÇ>...', length = 23

    UTF-8 octets: 'This is a string <+ó-ä-Ç>...', length = 25

    Packed length of encoded string is 25

    Length.UTF-8 octets: ' ?This is a string <+ó-ä-Ç>...'

    Packed all : ' ?This is a string <+ó-ä-Ç>...', length = 27

    Unpacked all : 'This is a string <+ó-ä-Ç>...', length = 25
    : read packed length = 25

    ** Perl string : 'This is a string <GäÇ>...', length = 23

    ** Perl strings are equal.
    , Feb 17, 2010
    #2

  3. A. Farber

    A. Farber Guest

    Thank you! I've ended up with encode($data), and after that
    length() gives me the number of bytes for the syswrite (I hope).
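
    For reference, a rough sketch of writing such a frame with syswrite
    once it is all bytes. The write_all helper is hypothetical and
    assumes a blocking handle; a poll-based server would more likely
    keep the unwritten remainder in its queue instead of looping.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: write every byte of $octets to $fh,
# retrying after short writes. Assumes a blocking handle.
sub write_all {
    my ($fh, $octets) = @_;
    my $offset = 0;
    while ($offset < length $octets) {
        # syswrite returns the number of bytes actually written,
        # which may be less than requested.
        my $written = syswrite($fh, $octets, length($octets) - $offset, $offset);
        die "syswrite failed: $!\n" unless defined $written;
        $offset += $written;
    }
    return $offset;
}

# A pipe stands in for the client socket in this example.
pipe(my $reader, my $writer) or die "pipe failed: $!\n";

my $octets = Encode::encode_utf8("\x{434}\x{430}");    # 2 characters, 4 bytes
my $frame  = pack('n a*', length($octets), $octets);   # 2-byte prefix + payload

my $sent = write_all($writer, $frame);
close $writer;

sysread($reader, my $buf, 1024);
my ($len, $body) = unpack 'n a*', $buf;
print "sent=$sent prefix=$len\n";   # sent=6 prefix=4
```

    Because the frame is already octets, length() and the syswrite
    return value count the same thing, and no "wide char" error is
    possible.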
    A. Farber, Feb 18, 2010
    #3