LibXML element->toString vs document->toString

Discussion in 'Perl Misc' started by Fergus McMenemie, Jul 12, 2012.

  1. Hi, I have been driven mad by the following, which took ages to track
    down. What is going on? It appears it is invalid to use toString on the
    document object.


    #! /usr/local/bin/perl -w
    use strict;
    use warnings;
    use utf8;
    use Encode;
    use XML::LibXML;
    binmode(STDOUT, ":utf8");

    my $src= join("",<DATA>);
    print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
    my $parser = XML::LibXML->new();
    my $x = $parser->parse_string($src)->documentElement();
    my $str=$x->toString(1);
    print "$str\n";
    print "string 1 is invalid \n" unless ( Encode::is_utf8($str,1) );

    $x = $parser->parse_string($src);
    $str=$x->toString(1);
    print "$str\n";
    print "string 2 is invalid \n" unless ( Encode::is_utf8($str,1) );

    __DATA__
    <?xml version="1.0" encoding="utf-8" standalone="no"?>
    <plugin name="\xc5\x81"></plugin>
     
    Fergus McMenemie, Jul 12, 2012
    #1

  2. Agreed, the warnings are there. However, it did appear to make the
    issue clearer. This example is rather goofy, and posting it to Usenet
    added a few more wrinkles. My original code and the real program
    contained the actual characters. However, my Usenet reader would not
    let me post the real chars. Hence the octets.

    My issue is that document->toString does not appear to work. Please
    ignore the use of is_utf8.
    I have to pass references to DOM objects around all over the
    place. I find I am having to make use of either documentElement()
    or ownerDocument() depending on what I am doing. I would like to have
    a consistent "pattern" for doing this. I would like to settle on
    passing the document object around, but it is annoying that I can't then
    use toString.
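
    For illustration, a minimal sketch of one possible "pattern". It assumes
    the behaviour described in the XML::LibXML documentation: toString on a
    document returns a byte string in the document's encoding, while toString
    on any other node returns a Perl character string. The helper name
    node_to_chars is hypothetical and not part of XML::LibXML.

```perl
use strict;
use warnings;
use Encode ();
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): serialise either a
# document or an element and always hand back a character string.
sub node_to_chars {
    my ($node, $format) = @_;
    my $out = $node->toString($format // 0);
    if ($node->isa('XML::LibXML::Document')) {
        # Assumption: the document serialises to encoded bytes, so decode
        # them using the encoding the document reports for its output.
        my $enc = $node->actualEncoding() || 'UTF-8';
        $out = Encode::decode($enc, $out);
    }
    return $out;
}

# Byte string: C5 81 is the UTF-8 encoding of U+0141.
my $xml = qq{<?xml version="1.0" encoding="utf-8"?>\n<plugin name="\xc5\x81"/>};
my $doc = XML::LibXML->new->parse_string($xml);

my $from_doc  = node_to_chars($doc);
my $from_elem = node_to_chars($doc->documentElement);

binmode(STDOUT, ':encoding(UTF-8)');
print $from_doc, "\n", $from_elem, "\n";
```

    With something like this the document object can be passed around
    everywhere and the caller never needs to know which kind of node it
    holds. Note that the decoded document string still carries the original
    encoding declaration.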
     
    Fergus McMenemie, Jul 13, 2012
    #2

  3. Thanks for the tip. My code now reads:-

    use strict;
    use warnings;
    use Encode;
    use XML::LibXML;
    binmode(STDOUT, ":utf8");

    my $src= join("",<DATA>);
    $src =~ s/\\x([0-9a-f][0-9a-f])/chr hex $1/egi;
    $src = Encode::decode "utf8", $src;
    print "LibXML VERSION=$XML::LibXML::VERSION\n";
    print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
    my $parser = XML::LibXML->new();
    my $x = $parser->parse_string($src)->documentElement();
    my $str=$x->toString(1);
    print "$str\n";
    print "string 1 is invalid \n" unless ( Encode::is_utf8($str,1) );

    $x = $parser->parse_string($src);
    $str=$x->toString(1);
    print "$str\n";
    print "string 2 is invalid \n" unless ( Encode::is_utf8($str,1) );

    __DATA__
    <?xml version="1.0" encoding="utf-8" standalone="no"?>
    <plugin name="\xef\xbd\xb1\xef\xbd\xb2\xef\xbd\xb3\xef\xbd\xb4\xef\xbd\xb5"></plugin>


    And it fails on my Mac running OS X Snow Leopard. The 'real' version is
    running with perl 5.12 on CentOS and also fails there. Not sure about the
    version of LibXML.

    Does it work for you?
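
    As a side note on what the is_utf8 checks in the script actually
    measure, here is a small sketch using only the core Encode module. It
    assumes nothing about LibXML; the variable names are made up for the
    example. The point is that the same text gives different answers
    depending on whether it is held as characters or as encoded octets.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);

# Two characters, U+FF71 and U+FF72 (the first two from the test data).
my $chars = "\x{FF71}\x{FF72}";

# The same text as UTF-8 octets: EF BD B1 EF BD B2.
my $bytes = encode('UTF-8', $chars);

printf "chars: length %d, is_utf8 %s\n",
    length($chars), is_utf8($chars, 1) ? 'yes' : 'no';
printf "bytes: length %d, is_utf8 %s\n",
    length($bytes), is_utf8($bytes, 1) ? 'yes' : 'no';

# Decoding the octets gives back the original character string.
my $round_trip = decode('UTF-8', $bytes);
print "round trip ok\n" if $round_trip eq $chars;
```

    So a "string is invalid" message from a check like this does not by
    itself mean the data is corrupt; it may only mean the string is still
    encoded octets.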
     
    Fergus McMenemie, Jul 14, 2012
    #3
  4.  
    Fergus McMenemie, Jul 14, 2012
    #4
  5. My newsreader does not properly support UTF-8. I guess lots of others
    still don't either.

    MacSoup - my soups gone off!
     
    Fergus McMenemie, Jul 17, 2012
    #5
  6. Duh!
    Thanks, I don't know how I managed to miss that bit.
     
    Fergus McMenemie, Jul 17, 2012
    #6