Perl, XML::LibXML: encoding problems while changing attributes on an XML string

Discussion in 'XML' started by kellner, Jul 23, 2006.

    Hello,

    I'm parsing a chunk of XML and would like to add attribute values
    to individual elements where they are missing. This is with Perl 5.8.6,
    libxml2 2.6.17, and XML::LibXML 1.58.

    Basically, I have the parser add the attribute values to the respective
    nodes and then use the toString method of XML::LibXML::Document to
    write the modified text to a scalar. Both the original and the modified
    text pass my decode_utf8 check, but the modified text doesn't print
    properly on the console, nor is it stored as UTF-8 in a MySQL
    database.

    I don't really understand what's going on, and on what level the
    error(s) could be located (console encoding, perl encoding, XML
    encoding), and would appreciate any help I can get ...

    Here's the code:
    ------------------------------------------------

    #!/usr/bin/perl

    use strict;
    use XML::LibXML;
    use Encode 'decode_utf8';
    use vars qw ($parser $p);
    $parser = XML::LibXML->new();
    my $version = XML::LibXML::LIBXML_DOTTED_VERSION;
    print "libxml2 $version\n-------------\nXML::LibXML $XML::LibXML::VERSION\n-------------------\n";


    $p->{text} = qq|
    <p>
    <q who="Blabla">pramāṇavārttikasvavṛttiṭīkā</q> And this is
    some further text.<br/>And even more text.<br/>And more.
    <q who="Blabla2">The second quotation!</q>.
    pramāṇavārttikasvavṛttiṭīkā.
    </p>|;

    my $a = &validate_text($p->{text});
    print "$a \n";

    sub validate_text {
        my $text = shift;
        if (decode_utf8($text)) { print "TEXT is utf8\n"; } else { print "is not utf8\n"; }
        print "TESTING $text\n";
        my $id = 1;
        my $doc = $parser->parse_string($text);
        my $root = $doc->getDocumentElement;

        my @quotations = $root->findnodes('q');
        foreach my $q (@quotations) {
            unless ($q->hasAttribute('id')) {
                print "NO ID\n";
                $q->setAttribute('id', "$id");
                ++$id;
            }
            else { print "HAS ID\n"; }
            my $id_new = $q->getAttribute('id');
            print "NEW ID: $id_new\n";
        }

        my $newtext = $root->toString;
        if (decode_utf8($newtext)) { print "NEW TEXT is utf8\n"; } else { print "is not utf8\n"; }
        return ($newtext);
    }
    ------------------------------------------------------------
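
    A note on the utf8 check in the code above: decode_utf8 returns a true
    value for any non-empty input (malformed bytes are replaced rather than
    rejected), so that test cannot tell valid UTF-8 from anything else. The
    following is only a minimal sketch of a stricter diagnostic using the
    core Encode module; the helper name describe_string is hypothetical and
    not part of the original script.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (an assumption for illustration): reports which kind
# of string a scalar holds, rather than trusting decode_utf8's return value.
sub describe_string {
    my ($s) = @_;

    # Flagged strings are already decoded text as far as Perl is concerned.
    return 'characters' if utf8::is_utf8($s);

    # Otherwise treat it as bytes and attempt a strict decode.
    # FB_CROAK dies on malformed input; LEAVE_SRC keeps $s unmodified.
    my $ok = eval { decode('UTF-8', $s, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'valid UTF-8 bytes' : 'bytes, not valid UTF-8';
}

# Escapes keep the demo independent of the encoding of this source file.
my @samples = (
    [ 'UTF-8 bytes for U+0101' => "pram\xC4\x81" ],
    [ 'decoded U+0101'         => "pram\x{101}"  ],
    [ 'Latin-1 byte 0xE4'      => "\xE4bc"       ],
);

for my $sample (@samples) {
    my ($label, $string) = @$sample;
    print "$label: ", describe_string($string), "\n";
}
```

    Running this on both $text and $newtext would show whether the two
    scalars are the same kind of string (bytes versus decoded characters),
    which is one plausible place for the console and database output to
    diverge.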

    I know that I can set a document encoding by creating a new $doc
    altogether, but I don't want to do this in this case, as the
    createDocument method prepends an xml version string to the created
    document, and this messes up the routines which process the code
    afterwards.
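
    For reference, a sketch of the approach without createDocument:
    serialize only the root element (which carries no XML declaration) and
    encode explicitly before output. This rests on an assumption that may
    not hold for XML::LibXML 1.58: that toString on an element can return a
    decoded character string, as documented for newer releases. The sample
    input is shortened and written with byte escapes for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use Encode qw(encode);

# Shortened sample input as UTF-8 bytes; "\xC4\x81" is U+0101 and
# "\xE1\xB9\x87" is U+1E47, so the source file's encoding does not matter.
my $xml = qq|<p><q who="Blabla">pram\xC4\x81\xE1\xB9\x87a</q> more text</p>|;

my $doc  = XML::LibXML->new->parse_string($xml);
my $root = $doc->documentElement;

# Add an id to every <q> child that lacks one.
my $id = 1;
for my $q ($root->findnodes('q')) {
    $q->setAttribute('id', $id++) unless $q->hasAttribute('id');
}

# Serializing the element, not the document, emits no <?xml ...?> line.
my $fragment = $root->toString;

# Assumption: $fragment may be a character string. Encode it only in that
# case, so that a byte string is not encoded a second time.
my $bytes = utf8::is_utf8($fragment)
    ? encode('UTF-8', $fragment)
    : $fragment;

# Write the bytes unchanged; no extra encoding layer on STDOUT.
binmode(STDOUT, ':raw');
print $bytes, "\n";
```

    The same $bytes scalar is what would be handed to the database layer,
    provided the connection itself is configured for UTF-8.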

    Thanks in advance,

    Birgit Kellner
     
