LibXML UTF8 - Input is not proper UTF-8, indicate encoding !

Discussion in 'Perl' started by Vlajko Knezic, Mar 5, 2005.

  1. I am not sure what is going on here, but it has something to do with the
    way UTF-8 is handled in Perl and/or LibXML.



    The script below:

    - accepts a value from a form text field;
    - builds an XML document around it;
    - serializes the document to a string using toString();
    - parses the string back into an XML document using parse_string();
    - transforms the XML document into an HTML document using an XSL
      transformation.



    Everything works well until a non-ASCII character is entered in the text
    field (for example é). In that case the call to parse_string() dies with
    the message:

    =====================================================================
    :2: parser error : Input is not proper UTF-8, indicate encoding !
    <test><test_text>abcé</test_text></test>
    ^
    :2: error: Bytes: 0xE9 0x3C 0x2F 0x74
    <test><test_text>abcé</test_text></test>
    ^
     at C:/_work/vsurvey/site/test1.cgi line 24
    =====================================================================



    I know that the code below does not make much sense on its own, but it is
    an abstraction of much more complex code. The environment is Perl 5.8,
    Apache, Windows XP.



    Hints and/or an explanation of what was coded wrong and how it should be
    fixed would be very much appreciated.



    Vlajko Knezic,

    Toronto, Ontario



    ---------------------------------------------------------------------------------------------------------------------

    test.cgi



    #! c:/Perl/bin/Perl.exe

    use CGI;
    use XML::LibXML;
    use XML::LibXSLT;
    use CGI::Carp qw( fatalsToBrowser );
    use Encode;

    my $mDocument = XML::LibXML::Document->new();
    my $parser = XML::LibXML->new();

    $mDocument->setEncoding("UTF8");
    my $mCGI = new CGI;
    print $mCGI->header;
    my $mTest_text = $mCGI->param('test');;

    # Build <test><test_text>...</test_text></test> around the form value
    my $mTest = $mDocument->createElement("test");
    my $mTestText = $mDocument->createElement("test_text");
    $mTestText->appendTextNode($mTest_text);
    $mTest->appendChild($mTestText);
    $mDocument->setDocumentElement( $mTest );
    $mDocument->setEncoding("UTF8");

    # Serialize the document, then parse the resulting string again
    my $mTestXML = $mDocument->toString();
    my $mParsedTestXML = $parser->parse_string($mTestXML);

    # Load the stylesheet and transform the parsed document into HTML
    my $mParsedXMLXSL = $parser->parse_file('test.xsl');
    my $mParserXSL = XML::LibXSLT->new();
    my $mParsedXSL = $mParserXSL->parse_stylesheet($mParsedXMLXSL);
    my $mPageHTML = $mParsedXSL->transform($mParsedTestXML);
    my $mPrintPageHTML = $mParsedXSL->output_string($mPageHTML);
    print $mPrintPageHTML;



    test.xsl



    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0">
      <xsl:output method="html" encoding="UTF-8" indent="yes"
          omit-xml-declaration="yes"/>
      <xsl:template match="//test">
        <head>
          <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
        </head>
        <html>
          <body>
            <xsl:value-of select="test_text"/>
            <form name="test" type="post" target="_self">
              <input type="text" name="test" /><input type="submit" name="button"/>
            </form>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>
     
    Vlajko Knezic, Mar 5, 2005
    #1

  2. Joe Smith, Guest

    Vlajko Knezic wrote:

    > $mDocument->setEncoding("UTF8");
    > my $mCGI = new CGI;
    > my $mTest_text = $mCGI->param('test');;


    This is the point: you need to convert $mTest_text to UTF-8
    before doing anything with that string. You have promised the
    XML library that you will be working with UTF-8, so it is up
    to you to ensure that everything really is UTF-8 (not
    ISO-8859-1). The 0xE9 byte in your error message is é in
    ISO-8859-1; in UTF-8 the same character is the two bytes
    0xC3 0xA9.
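
    A minimal sketch of that conversion, assuming the browser submitted the
    field as ISO-8859-1 (which the 0xE9 byte in the error message suggests).
    The literal "abc\xE9" stands in for $mCGI->param('test'), and the names
    $raw_bytes, $text and $utf8_bytes are illustrative only, not part of the
    original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the form value arrives as ISO-8859-1 bytes.
# This literal stands in for $mCGI->param('test').
my $raw_bytes = "abc\xE9";

# Decode the bytes into a Perl character string (4 characters).
my $text = decode('ISO-8859-1', $raw_bytes);

# Encode the characters as UTF-8 bytes (5 bytes, since é takes two).
my $utf8_bytes = encode('UTF-8', $text);

# Show the resulting bytes in the same style as the libxml error message.
print join(' ', map { sprintf('0x%02X', ord($_)) } split(//, $utf8_bytes)), "\n";
# prints: 0x61 0x62 0x63 0xC3 0xA9
```

    The decoded $text is what would then be handed to appendTextNode().
    Whether XML::LibXML expects a decoded character string or raw UTF-8
    bytes at that point has differed between releases, so it is worth
    checking the documentation of the installed version.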

    Any further questions should be posted to comp.lang.perl.misc
    and not this newsgroup (comp.lang.perl is defunct).
    -Joe
     
    Joe Smith, Mar 6, 2005
    #2
