XML::Twig - writing UTF-8

Discussion in 'Perl Misc' started by miletwo@gmail.com, May 25, 2006.

  1. Guest

    I'm trying to read an XML file and rewrite it as RSS using the
    following script. The problem is that the output is not UTF-8 no
    matter what I do. Any help appreciated.

    ***********************
    #!/bin/perl -w
    #use strict;
    use XML::Twig;
    use utf8;

    use open OUT => ":utf8";
    use open IN => ":utf8";

    my $shownum = 10;
    my $thisyear = '2006';
    my $field= 'releasedate';
    my $twig= new XML::Twig( keep_encoding=> 1);

    open(INFILE, "directorylist.xml");
    $twig->parse(\*INFILE);

    my $root= $twig->root;
    my @releases= $root->children;

    my $output = "";

    $output .= '<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">' . "\n";
    $output .= '<channel>' . "\n\n";
    $output .= <<EOT;
    <title>scrubbed Incorporated - Recent News</title>
    <link>http://www.scrubbed.com/press/</link>
    <description>Visit the scrubbed Press Center where you will find
    many resources, including press releases, corporate information,
    technology overviews, executive bios and photos, the scrubbed logo and
    more.<br />If you are a member of the media and are not able to find
    what you are looking for in the Press Center, please send an email to
    corpcomm\@scrubbed.com.</description>
    <language>en-us</language>

    EOT

    for(my $i=0; $i < $shownum; $i++){
    $output .= "\t" . '<item>' . "\n";
    $output .= "\t\t" . '<title>' .
    $releases[$i]->first_child('headline')->text . '</title>' . "\n";
    $output .= "\t\t" . '<link>http://www.scrubbed.com/press/releases/' .
    $thisyear . '/' . $releases[$i]->att('name') . '.html</link>' . "\n";
    $output .= "\t\t" . '<description>' .
    $releases[$i]->first_child('subheader')->text . '</description>' .
    "\n";
    $output .= "\t\t" . '<dc:date>' .
    $releases[$i]->first_child('releasedate')->text . '</dc:date>' . "\n";
    $output .= "\t" . '</item>';
    $output .= "\n\n";
    }

    $output .= "</channel>\n</rss>";
    Encode::_utf8_on($output);

    open(FILEWRITE,">:utf8", "press.rss");
    binmode FILEWRITE, ":utf8";
    print FILEWRITE $output;
     
    , May 25, 2006
    #1

  2. wrote:

    > I'm trying to read an XML file and rewrite it as RSS using the
    > following script. The problem is that the output is not UTF-8 no
    > matter what I do. Any help appreciated.


    Your script works for me. Please provide a complete example that
    demonstrates the error. Your script tries to read a file named
    directorylist.xml, but you didn't provide that file. I had to read your
    script to find out what that file should contain, and write one myself.
    Maybe there is an error in your input file.
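
One way to rule out a badly encoded input file is to check that its bytes decode as strict UTF-8. The following is a minimal sketch using only the core Encode module; the helper names `is_valid_utf8` and `file_is_valid_utf8` are hypothetical and do not come from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode() may modify its argument
    # Strict "UTF-8" rejects malformed sequences, and FB_CROAK makes
    # decode() die instead of substituting replacement characters.
    return eval { decode('UTF-8', $bytes, FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: read a file as raw bytes and check them.
sub file_is_valid_utf8 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;            # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close $fh;
    return is_valid_utf8($bytes);
}
```

Calling `file_is_valid_utf8('directorylist.xml')` before parsing, and again on the generated `press.rss`, would show on which side of the script the encoding goes wrong.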

    Also you didn't provide any information about the system you are using.
    I tested it with Debian Sarge (perl 5.8.4, XML::Twig 3.17).

    hp

    --
    Peter J. Holzer | Sysadmin WSR/LUGA | http://www.hjp.at/
    "One could also spare oneself [the discussion] by simply sparing
    oneself it." -- Ralph Angenendt in dang 2006-04-15
     
    Peter J. Holzer, May 25, 2006
    #2

  3. wrote:
    > I'm trying to read an XML file and rewrite it as RSS using the
    > following script. The problem is that the output is not UTF-8 no
    > matter what I do. Any help appreciated.


    Wow! You sure want to make sure you get UTF-8 on output! Except of
    course that the keep_encoding option tells XML::Twig to output the same
    encoding as you got in the input (which you did not show us, as
    mentioned by the previous poster).

    If you want to output utf-8, the best way is NOT to do anything: by
    default the parser will convert anything into utf-8, and the output will
    be in that encoding.

    Did you try your code without the various utf8-related instructions
    peppered through it? What was the result?
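
A minimal sketch of that suggestion, assuming the text is still assembled and printed by hand as in the original script: the twig is created without keep_encoding, and the only encoding-related instruction left is a single explicit layer on the output handle. The helper names `headlines` and `write_utf8` are hypothetical, and the `file` and `headline` element names are taken from the script's apparent input format.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse an XML file and return its headline texts.
# No keep_encoding here, so the parser converts the text to UTF-8
# whatever encoding the input file declares.
sub headlines {
    my ($xml_path) = @_;
    my $twig = XML::Twig->new;
    $twig->parsefile($xml_path);
    return map { $_->first_child_text('headline') }
           $twig->root->children('file');
}

# Hypothetical helper: write lines through one explicit UTF-8 layer.
sub write_utf8 {
    my ($out_path, @lines) = @_;
    open my $out, '>:encoding(UTF-8)', $out_path
        or die "Cannot write $out_path: $!";
    print {$out} "$_\n" for @lines;
    close $out or die "Cannot close $out_path: $!";
}
```

With this shape there is no need for `use utf8`, `use open`, `binmode` or `Encode::_utf8_on`; stacking several of them makes it hard to tell which one is responsible for the bytes that end up in the file.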

    --
    mirod
     
    Michel Rodriguez, May 26, 2006
    #3
  4. Guest

    Here's directorylist.xml. I'm on Mac OS X but also tried running this
    on my Solaris box and it does the same thing. I've also tried it with
    and without keep_encoding, so I don't think that's it.

    Thanks for replies.
    <?xml version="1.0" encoding="UTF-8"?>
    <directory>
    <file name="060525_brings_custom_user">
    <releasedate>05-25-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX Brings Custom User-Interface
    Capabilities to U.S. Cellular's easyedgeSM with the uiOne
    Solution]]></headline>
    <subheader><![CDATA[]]></subheader>
    <division>Corp, QIS</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
    </file>
    <file name="060524_initiates_patent_infringement">
    <releasedate>05-24-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX Initiates Patent Infringement
    Proceedings in the UK against Nokia]]></headline>
    <subheader><![CDATA[]]></subheader>
    <division>Corp</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
    </file>
    <file name="060518_takes_XXSCRUBBEDXX_2006">
    <releasedate>05-18-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX Takes XXSCRUBBEDXX 2006 to the
    Next Level with Addition of Telecom Italia and XXSCRUBBEDXX to an
    Already Impressive XXSCRUBBEDXX 2006 Conference Agenda]]></headline>
    <subheader><![CDATA[Premiere Players in the Industry Showcase
    Advanced Data Capabilities at XXSCRUBBEDXX 2006 Conference in San Diego
    May 31-June 2]]></subheader>
    <division>Corp, QIS</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
    </file>
    <file name="060518_averitt_selects_omnitracs">
    <releasedate>05-18-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[AVERITT Selects XXSCRUBBEDXX's OmniTRACS®
    and OmniExpress® Mobile Communication Systems for Entire Fleet and
    Service Centers]]></headline>
    <subheader><![CDATA[Leading Freight and Supply Chain Management
    Provider with International Reach One of First to Implement End-to-End
    Solution for Improved Fleet Communications]]></subheader>
    <division>Corp, QWBS</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
    </file>
    <file name="060517_clears_up_misunderstandings">
    <releasedate>05-17-2006</releasedate>
    <releasetime>12:36 PM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX Clears Up Misunderstandings
    Regarding the ITC Staff Attorney Briefing]]></headline>
    <subheader><![CDATA[]]></subheader>
    <division>Corp</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
    </file>
    <file name="060512_hospital_democratic_republic">
    <releasedate>05-12-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[Hospital in the Democratic Republic of Congo to
    Be Outfitted with CDMA2000 1xEV-DO to Help Improve Healthcare in
    Africa]]></headline>
    <subheader><![CDATA[XXSCRUBBEDXX Pledges Donation and Technology
    to the Dikembe Mutombo Foundation, First Hospital Built in the Congo in
    Nearly 40 Years]]></subheader>
    <division>Corp</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
    </file>
    <file name="060509_british_sky_broadcasting">
    <releasedate>05-09-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[XXSCRUBBEDXX and British Sky Broadcasting
    Announce Intent to Conduct XXSCRUBBEDXX™ Technology Trial in United
    Kingdom]]></headline>
    <subheader><![CDATA[Joint Exercise Expected to be Europe's First
    Technical Trial of Open, Network-Agnostic FLO Technology]]></subheader>
    <division>Corp</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
    </file>
    <file name="060509_application_downloads_XXSCRUBBEDXX">
    <releasedate>05-09-2006</releasedate>
    <releasetime>04:30 AM</releasetime>
    <timezone>America/Los_Angeles</timezone>
    <headline><![CDATA[Application Downloads with XXSCRUBBEDXX's
    XXSCRUBBEDXX® Solution Surpass Three Million in Thailand on Hutch's
    Advanced CDMA2000 1X Network]]></headline>
    <subheader><![CDATA[Active Hutchison CAT Customers Have Downloaded
    an Average of 10 Applications Each Since XXSCRUBBEDXX Launched, Numbers
    Continue to Grow]]></subheader>
    <division>Corp, QIS</division>
    <categories></categories>
    <document></document>
    <exclude></exclude>
    </file>
    </directory>
     
    , May 26, 2006
    #4
  5. wrote:
    > Here's directorylist.xml. I'm on Mac OS X but also tried running this
    > on my Solaris box and it does the same thing. I've also tried it with
    > and without keep_encoding, so I don't think that's it.


    This file contains only 8 <file/> elements. Your script crashes with

    Can't call method "first_child" on an undefined value at ./miletwo line 40.

    if the root element has fewer than 10 children, before it even
    opens the output file. So with this file, your script doesn't write
    anything. How do you determine whether a non-existent file is UTF-8 or
    not?
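
The crash described above comes from indexing past the end of the list of children. One possible way to avoid it, sketched here with core Perl only, is to cap the number of items at whatever the input actually contains; `first_n` is a hypothetical helper, not something from the original script.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: return at most $limit items from the front of a
# list, so a short input never yields undefined entries.
sub first_n {
    my ($limit, @items) = @_;
    return () unless @items && $limit > 0;
    my $last = min($limit, scalar @items) - 1;
    return @items[0 .. $last];
}

# In the original script this could replace the index-based loop, e.g.:
#   for my $release (first_n($shownum, $root->children('file'))) { ... }
```

Checking the return value of each `open` with `or die "...: $!"` would also make a missing input or output file show up immediately.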

    hp

     
    Peter J. Holzer, May 26, 2006
    #5