"Wide character in syswrite" in writing an HTML form.

Discussion in 'Perl Misc' started by Ben Bullock, May 14, 2006.

  1. Ben Bullock

    Ben Bullock Guest

    I have written a Perl script which accesses a WWW form, gets the text, does
    some editing and then sends it back. I'm encountering a problem. I'm using
    the latest version of Perl, 5.8.8, with the libwww and HTML::Form modules. I
    keep getting the above error message "Wide character in syswrite" when my
    code tries to update the page. It is in UTF8. Also, I have had some
    characters mangled. I have tried extensive searches of Google about how to
    solve this problem, with no luck so far. One thing I have tried is "use open
    ':utf8';", but it doesn't work. Can anyone here suggest some way to solve
    this problem? It is very exasperating since otherwise the script is working
    perfectly. Thanks.
    Ben Bullock, May 14, 2006
    #1

  2. On Sun, 14 May 2006, Ben Bullock wrote:

    > I have written a Perl script which accesses a WWW form, gets the
    > text, does some editing and then sends it back. I'm encountering a
    > problem. I'm using the latest version of Perl, 5.8.8, with the
    > libwww and HTML::Form modules. I keep getting the above error
    > message "Wide character in syswrite" when my code tries to update
    > the page. It is in UTF8.


    I don't know the answer, but it's an area that's of interest to me...

    When you know the answer, maybe you'd be in a position to update this
    bug: http://rt.cpan.org/Ticket/Display.html?id=17249 - SCNR. I found
    that with a simple google - odd that you didn't mention it yourself.

    I hadn't seen it before, so I can't really say what it implies yet.
    Just how close would you say that report is to your own problem?

    > Also, I have had some characters mangled.


    Sorry, but this is not a useful report!

    To get a worthwhile result, you need to boil the problem down into a
    simple test case that we can reproduce for ourselves. Web forms
    submission is particularly fraught with pitfalls and hurdles. Just
    saying the equivalent of "it doesn't work" gets us no further.

    > have tried extensive searches of Google about how to solve this
    > problem, with no luck so far.


    Be explicit! Otherwise, people trying to help you are just going to
    repeat the things you already found.

    good luck

    --

    Debugging package - may contain traces
    Alan J. Flavell, May 14, 2006
    #2

  3. Ben Bullock

    Ben Bullock Guest

    Alan J. Flavell wrote:
    > On Sun, 14 May 2006, Ben Bullock wrote:
    >
    > > I have written a Perl script which accesses a WWW form, gets the
    > > text, does some editing and then sends it back. I'm encountering a
    > > problem. I'm using the latest version of Perl, 5.8.8, with the
    > > libwww and HTML::Form modules. I keep getting the above error
    > > message "Wide character in syswrite" when my code tries to update
    > > the page. It is in UTF8.

    >
    > I don't know the answer, but it's an area that's of interest to me...
    >
    > When you know the answer, maybe you'd be in a position to update this
    > bug: http://rt.cpan.org/Ticket/Display.html?id=17249 - SCNR. I found
    > that with a simple google - odd that you didn't mention it yourself.


    I found that one, it looks similar to my situation. I don't know why
    you think that it's odd that I didn't mention it though - I found
    buckets of hits on Google for similar-looking things, and I don't know
    which one is relevant.

    > I hadn't seen it before, so I can't really say what it implies yet.
    > Just how close would you say that report is to your own problem?
    >
    > > Also, I have had some characters mangled.

    >
    > Sorry, but this is not a useful report!


    Hmm? Some non-ascii UTF8 characters got mangled into non-UTF8
    compliant characters going out from my program back to the WWW form.

    > To get a worthwhile result, you need to boil the problem down into a
    > simple test case that we can reproduce for ourselves. Web forms
    > submission is particularly fraught with pitfalls and hurdles. Just
    > saying the equivalent of "it doesn't work" gets us no further.


    I don't have a complete test program I can show you here,
    unfortunately. I've tracked the bug to the following lines in my code:

    use LWP::UserAgent;
    use HTML::Form;

    sub replace_text_in_form
    {
        my $ua = $_[0];      # user agent
        my $form = $_[1];    # already-parsed form from HTML::Form
        my $newtext = $_[2]; # update the textbox with this new text
        $form->value ("textbox", $newtext);
        my $request = $form->click;
        my $response = $ua->request($request);
        return $response;
    }

    Sometimes I get a value in "$response->status_line" of "500 Wide
    character in syswrite" error and it fails, and sometimes it works, but
    either time some of the non-ascii characters get mangled. I checked
    and the characters are mangled after they go out: they are OK going in
    to the above.

    > > have tried extensive searches of Google about how to solve this
    > > problem, with no luck so far.

    >
    > Be explicit! Otherwise, people trying to help you are just going to
    > repeat the things you already found.


    Yeah, well, people often say things like that on Usenet, but then you
    give them more details to work on, and after all that you often find
    they don't know the answer anyway :). Have a nice day.
    Ben Bullock, May 14, 2006
    #3
  4. Ben Bullock wrote:
    > Alan J. Flavell wrote:
    >> On Sun, 14 May 2006, Ben Bullock wrote:
    >> > I have written a Perl script which accesses a WWW form, gets the
    >> > text, does some editing and then sends it back. I'm encountering a
    >> > problem. I'm using the latest version of Perl, 5.8.8, with the
    >> > libwww and HTML::Form modules. I keep getting the above error
    >> > message "Wide character in syswrite" when my code tries to update
    >> > the page. It is in UTF8.

    [...]
    >> To get a worthwhile result, you need to boil the problem down into a
    >> simple test case that we can reproduce for ourselves. Web forms
    >> submission is particularly fraught with pitfalls and hurdles. Just
    >> saying the equivalent of "it doesn't work" gets us no further.

    >
    > I don't have a complete test program I can show you here,
    > unfortunately. I've tracked the bug to the following lines in my code:
    >
    > use LWP::UserAgent;
    > use HTML::Form;
    >
    > sub replace_text_in_form
    > {
    > my $ua = $_[0]; # user agent
    > my $form = $_[1]; # already-parsed form from HTML::Form
    > my $newtext = $_[2]; # update the textbox with this new text
    > $form->value ("textbox", $newtext);


    $form->value("textbox", encode($charset, $newtext));

    where $charset must be the charset of the page containing the form (If
    you know that's UTF-8 you can hardcode it in your script, but it is
    probably safer to get it from the original page).
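
    A minimal sketch of how the whole subroutine might look with this
    change, assuming the page is UTF-8 unless told otherwise. The optional
    $charset argument is a hypothetical addition for illustration; encode()
    comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical variant of replace_text_in_form that accepts the page's
# charset as an optional fourth argument.
sub replace_text_in_form {
    my ($ua, $form, $newtext, $charset) = @_;

    # Assumption: the form's page is UTF-8 when no charset is passed in.
    $charset = 'UTF-8' unless defined $charset;

    # Turn Perl's character string into the octets the server expects,
    # so that no character above 255 ever reaches the socket.
    $form->value('textbox', encode($charset, $newtext));

    my $request = $form->click;       # build the request from the form
    return $ua->request($request);    # send it and hand back the response
}
```

    It would be called as before, replace_text_in_form($ua, $form, $newtext),
    with $ua an LWP::UserAgent and $form an HTML::Form object.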

    >> Be explicit! Otherwise, people trying to help you are just going to
    >> repeat the things you already found.

    >
    > Yeah, well, people often say things like that on Usenet, but then you
    > give them more details to work on, and after all that you often find
    > they don't know the answer anyway :). Have a nice day.


    They can't know if they know the answer if they don't even know the
    question!

    hp

    --
    Peter J. Holzer | Sysadmin WSR/LUGA | http://www.hjp.at/
    "One could also skip [the discussion] if one simply skipped it."
    -- Ralph Angenendt in dang 2006-04-15
    Peter J. Holzer, May 14, 2006
    #4
  5. John Bokma

    John Bokma Guest

    Ben Bullock wrote:

    > Alan J. Flavell wrote:


    [ .. ]

    >> Be explicit! Otherwise, people trying to help you are just going to
    >> repeat the things you already found.

    >
    > Yeah, well, people often say things like that on Usenet, but then you
    > give them more details to work on, and after all that you often find


    ^^^^

    That is a very good choice of wording: *you* give them *work* and several
    people try to do that work, for *free*.

    > they don't know the answer anyway :). Have a nice day.


    Even if people pay me to do work, if they don't give me enough details I
    can't answer a very important question: can I help in the first place.

    --
    John Bokma Freelance software developer
    &
    Experienced Perl programmer: http://castleamber.com/
    John Bokma, May 14, 2006
    #5
  6. Ben Bullock

    Ben Bullock Guest

    "John Bokma" wrote in message
    news:Xns97C38891F701Acastleamber@130.133.1.4...
    > Ben Bullock wrote:
    >
    >> Alan J. Flavell wrote:

    >
    > [ .. ]
    >
    >>> Be explicit! Otherwise, people trying to help you are just going to
    >>> repeat the things you already found.

    >>
    >> Yeah, well, people often say things like that on Usenet, but then you
    >> give them more details to work on, and after all that you often find

    >
    > ^^^^
    >
    > That is a very good choice of wording: *you* give them *work* and several
    > people try to do that work, for *free*.


    *I* only saw *one* post when I replied there. *Thanks* to the several other
    people who tried to do the work for *free*.

    Thanks also to *John* *Bokma* for all the asterisks.
    Ben Bullock, May 15, 2006
    #6
  7. Ben Bullock

    Ben Bullock Guest

    "Peter J. Holzer" wrote in message
    news:...
    > Ben Bullock wrote:
    >> I don't have a complete test program I can show you here,
    >> unfortunately. I've tracked the bug to the following lines in my code:
    >>
    >> use LWP::UserAgent;
    >> use HTML::Form;
    >>
    >> sub replace_text_in_form
    >> {
    >> my $ua = $_[0]; # user agent
    >> my $form = $_[1]; # already-parsed form from HTML::Form
    >> my $newtext = $_[2]; # update the textbox with this new text
    >> $form->value ("textbox", $newtext);

    >
    > $form->value("textbox", encode($charset, $newtext));
    >
    > where $charset must be the charset of the page containing the form (If
    > you know that's UTF-8 you can hardcode it in your script, but it is
    > probably safer to get it from the original page).


    No, I know that it's utf-8. Thanks very much for this tip. Surprisingly (to
    me at least) it worked, so today's Perl superhero is Peter J. Holzer. Call
    me ignorant (preferably with some added asterisks) but I don't really
    understand why it wasn't working before, or what the above is doing. Anyway,
    thanks. You're a lifesaver. In case some people don't know about "encode",
    it was also necessary to write

    use Encode;

    at the top of the script. After reading "perldoc Encode" I found that
    there is another function called "encode_utf8", which I used in the end to
    avoid having to supply the $charset variable in the above.
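
    A small sketch of that shortcut, assuming well-formed text:
    encode_utf8() should give the same octets as encode('UTF-8', ...). The
    sample string (a pound sign and a one-half sign) is only an
    illustration, not data from the real form.

```perl
use strict;
use warnings;
use Encode qw(encode encode_utf8);

# Illustrative text: POUND SIGN (U+00A3) followed by ONE HALF (U+00BD).
my $text = "\x{A3}\x{BD}";

my $via_encode   = encode('UTF-8', $text);   # charset named explicitly
my $via_shortcut = encode_utf8($text);       # no charset argument needed

# Show the resulting octets as hex for inspection.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $via_shortcut;
print "octets: $hex\n";
print "same result: ", ($via_encode eq $via_shortcut ? "yes" : "no"), "\n";
```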
    Ben Bullock, May 15, 2006
    #7
  8. John Bokma

    John Bokma Guest

    "Ben Bullock" wrote:

    > *I* only saw *one* post when I replied there.


    That's Usenet. It's not an instant help desk.

    > *Thanks* to the several
    > other people who tried to do the work for *free*.


    :-D.

    > Thanks also to *John* *Bokma* for all the asterisks.


    You're welcome :-D I often test my shift key, because I need it a lot with
    Perl programming :-D

    --
    John Bokma Freelance software developer
    &
    Experienced Perl programmer: http://castleamber.com/
    John Bokma, May 15, 2006
    #8
  9. On Mon, 15 May 2006, Ben Bullock wrote:

    > "Peter J. Holzer" wrote in message
    > news:...


    > > $form->value("textbox", encode($charset, $newtext));


    [...]

    > Thanks very much for this tip. Surprisingly (to me
    > at least) it worked, so today's Perl superhero is Peter J. Holzer.


    Indeed.

    > I don't really understand why it wasn't working before, or what the
    > above is doing. Anyway, thanks. You're a lifesaver. In case some
    > people don't know about "encode", it was also necessary to write
    >
    > use Encode;
    >
    > at the top of the script. After reading "perldoc -f Encode"

    [...]

    Well, let's read the documentation of Encode to see what light it
    throws on our understanding. (I think I learned something from this,
    anyway).

    $octets = encode(ENCODING, $string [, CHECK])

    Encodes a string from Perl's internal form into ENCODING and returns
    a sequence of octets.

    What that says is that you feed it a "string" (i.e. of characters
    represented in Perl's internal format, which might include "wide"
    Unicode characters, and in this case actually did so), and it returns
    a sequence of octets as they might be expected in the outside world.
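
    To make the character/octet distinction concrete, here is a short
    sketch using the core Encode module. The sample string (a one-half sign
    plus a euro sign) is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two characters; the euro sign (U+20AC) is "wide", i.e. above 255.
my $string = "\x{BD}\x{20AC}";
my $octets = encode('UTF-8', $string);

my $char_count  = length $string;   # counted in characters
my $octet_count = length $octets;   # counted in octets

# After encoding, every element fits in one byte.
my ($largest) = sort { $b <=> $a } map { ord($_) } split //, $octets;

printf "%d characters became %d octets; largest octet value is 0x%02X\n",
    $char_count, $octet_count, $largest;
```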

    The complaint you were getting was that a "wide character" had been
    fed to syswrite (which was open to a socket). If I take a look at the
    documentation for syswrite, then towards the end it says:

    Note that if the filehandle has been marked as :utf8, Unicode
    characters are written instead of bytes (the LENGTH, OFFSET, and the
    return value of syswrite() are in UTF-8 encoded Unicode characters).
    The :encoding(...) layer implicitly introduces the :utf8 layer.

    It seems to me that the observed symptoms are saying that the
    filehandle had not, in fact, been "marked as :utf8", yet it was
    finding itself being fed with (Perl's internal representation of)
    unicode data. By feeding it instead with a sequence of binary "octets"
    - the output from encode() - we are smuggling our utf8-encoded data
    into the syswrite() without Perl being explicitly aware of it ("as
    binary data", if you will). HTTP is defined to be 8-bit clean, so I
    guess this is OK. I interpret this as the approach mentioned under
    binmode as "raw".
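
    The situation can be tried out without a socket. This is only a sketch
    under assumptions: a temporary file stands in for the socket, and the
    outcome of the first syswrite (on the Perl versions I am aware of it
    raises a "Wide character in syswrite" exception) may differ between
    versions, so the code records what happened instead of relying on it.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

my $text = "\x{BD}\x{20AC}";    # one-half sign plus euro sign (wide)

# A temporary file stands in for the socket; no :utf8 layer is applied.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);

# Attempt 1: hand the character string straight to syswrite.
my $wide_ok    = eval { syswrite($fh, $text); 1 } ? 1 : 0;
my $wide_error = $wide_ok ? '' : $@;

# Attempt 2: encode first, so syswrite only ever sees octets.
my $octets  = encode('UTF-8', $text);
my $written = syswrite($fh, $octets);
close($fh) or die "close failed: $!";

# Read the file back as raw bytes to see what actually arrived.
open(my $in, '<:raw', $path) or die "cannot reopen $path: $!";
my $contents = do { local $/; <$in> };
close($in);

print "character string: ", ($wide_ok ? "accepted" : "rejected: $wide_error"), "\n";
print "encoded octets written: $written\n";
```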

    It's an open question whether this was the module author's intention.

    At least, that's how I'm rationalising it - feel free to shoot this
    down. It all seems to make coherent sense when the data is
    represented in Perl's internal (utf8-based) form. Presumably some
    different approach is needed when the web form in question is wanted
    to be in one of the traditional 8-bit encodings (iso-8859-2,
    windows-1251, whatever).

    Does anyone have contact with the module author (Gisle Aas) - this
    seems like something that could/should be explained in the module
    documentation?

    best
    Alan J. Flavell, May 15, 2006
    #9
  10. Ben Bullock

    Ben Bullock Guest

    "Alan J. Flavell" wrote in message
    news:p...
    > It seems to me that the observed symptoms are saying that the
    > filehandle had not, in fact, been "marked as :utf8", yet it was
    > finding itself being fed with (Perl's internal representation of)
    > unicode data. By feeding it instead with a sequence of binary "octets"
    > - the output from encode() - we are smuggling our utf8-encoded data
    > into the syswrite() without Perl being explicitly aware of it ("as
    > binary data", if you will).


    This helped me to understand what's going on, so thank you very much. As to
    whether it's a bug or a feature, I'll leave such weighty matters for others
    to decide. It was certainly "unexpected behaviour" from my point of view.
    The funny thing is that I've been using that script since last July to send
    utf8 encoded Japanese characters, and hadn't had a problem with it. The
    mangled stuff was things like pound signs and unicode half signs.
    Ben Bullock, May 16, 2006
    #10
  11. On Tue, 16 May 2006, Ben Bullock wrote:

    > funny thing is that I've been using that script since last July to
    > send utf8 encoded Japanese characters, and hadn't had a problem with
    > it.


    Interesting.

    > The mangled stuff was things like pound signs and unicode half
    > signs.


    You didn't say that before, and there might be something significant
    in the detail. Witness this earlier discussion:

    |> > Also, I have had some characters mangled.
    |
    |> Sorry, but this is not a useful report!
    |
    |Hmm? Some non-ascii UTF8 characters got mangled [...]

    and compare it with the new information which you now provided.

    On the basis of that new information, I'd say there's a possibility
    that the code does not realise that it needs to use the utf8
    representation, unless the characters are above 255.

    --

    luser asked for an agronomic keyboard...
    Alan J. Flavell, May 16, 2006
    #11
  12. Alan J. Flavell wrote:
    > On Tue, 16 May 2006, Ben Bullock wrote:
    >> funny thing is that I've been using that script since last July to
    >> send utf8 encoded Japanese characters, and hadn't had a problem with
    >> it.

    >
    > Interesting.
    >
    >> The mangled stuff was things like pound signs and unicode half
    >> signs.

    >
    > You didn't say that before, and there might be something significant
    > in the detail.

    [...]
    > On the basis of that new information, I'd say there's a possibility
    > that the code does not realise that it needs to use the utf8
    > representation, unless the characters are above 255.


    Right. By default perl streams are in "byte-mode": Every character of a
    perl string is written as one byte. This is for backwards-compatibility
    with older versions of perl. So, if you write the string "½£" to a
    stream, it will be printed as two bytes: 0xBD 0xA3. But if you try to
    write "½€", perl would have to write 0xBD 0x20AC. Since there is no way
    that perl can stuff the value 0x20AC into 8 bits, it converts the whole
    string into UTF-8 and prints that instead: That's now 5 Bytes: 0xC2 0xBD
    0xE2 0x82 0xAC. Since that may not be what you wanted, it also gives you
    the "Wide character in syswrite" warning.
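
    The two cases can be compared with a short sketch. It uses print to a
    temporary file (an assumption made so the bytes can be read back; the
    original problem involved syswrite on a socket), and the helper name
    bytes_printed is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Print $string to a fresh byte-mode handle, then return the bytes that
# reached the file (as hex) together with any warning that was raised.
sub bytes_printed {
    my ($string) = @_;
    my $warning = '';
    local $SIG{__WARN__} = sub { $warning .= $_[0] };

    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh);                          # no encoding layer
    print {$fh} $string;
    close($fh) or die "close failed: $!";

    open(my $in, '<:raw', $path) or die "cannot reopen $path: $!";
    my $bytes = do { local $/; <$in> };
    close($in);

    my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
    return ($hex, $warning);
}

my ($narrow_hex, $narrow_warning) = bytes_printed("\x{BD}\x{A3}");     # one-half, pound
my ($wide_hex,   $wide_warning)   = bytes_printed("\x{BD}\x{20AC}");   # one-half, euro

print "all characters below 256: $narrow_hex\n";
print "with a wide character:    $wide_hex\n";
print "warning: $wide_warning" if $wide_warning;
```

    The first string goes out as single bytes that are not valid UTF-8 on
    their own, which would fit the mangled pound and one-half signs.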

    hp

    --
    Peter J. Holzer | Sysadmin WSR/LUGA | http://www.hjp.at/
    "One could also skip [the discussion] if one simply skipped it."
    -- Ralph Angenendt in dang 2006-04-15
    Peter J. Holzer, May 20, 2006
    #12
