Ripping out parts of a DOM using XML::XSLT

NiallBCarter

Hey folks

I have what I think is a simple task but one that I am struggling
with.
I have this KML (Slightly abridged)

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0" xmlns:gml="http://www.opengeospatial.org/standards/gml">
<Document>
<open>0</open>
<name>Gazetteer for Scotland</name>
<Placemark>
<name> Troon </name>
<description>A resort town on the coast of Kyle in Ayrshire, Troon
lies at the north end of Ayr Bay on a headland that extends into the
</description>
<Point id="t476">
<coordinates>-4.6562,55.54207,0</coordinates>
</Point>
<Link>http://www.geo.ed.ac.uk/scotgaz/towns/townfirst476.html
</Link>
</Placemark>
<Placemark>
<name> Niall Home </name>
<description>Not a lovely little home in the western isles
</description>
<Point id="t576">
<coordinates>-5.3454,53.46532,0</coordinates>
</Point>
<Link>http://www.geo.ed.ac.uk/scotgaz/towns/townfirst576.html
</Link>
</Placemark>
<ExtendedData xmlns:GforS="http://www.geo.ed.ac.uk/scotgaz/">
<GforS:Copyright>
&lt;p&gt;All Images and Text are Copyright (c) The Gazetteer for
Scotland 1995-2008 &lt;/p&gt;
</GforS:Copyright>
</ExtendedData>
</Document>
</kml>

This KML is generated by a Perl script hosted on the web, so I am
able to 'get' it and parse it into a DOM using the
script below:



#!/usr/bin/perl

use XML::DOM;
use Data::Dumper;
use LWP::Simple;
use XML::DOM::XPath;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

#Gets the contents of the URLs
my $sg_get = $ua->get('http://www.geo.ed.ac.uk/scotgaz/cgi_bin/mid/module2/scotgaz.pl?xmin=232095&ymin=630744&xmax=232695&ymax=631344');

# Creates an instance of XML::DOM::Parser, used to build the DOM object
my $parser = XML::DOM::Parser->new;

# Parses the contents of the 'got' URL
my $sg_dom = $parser->parse ($sg_get->content);

#Prints the contents of a DOM::parsed document to the screen
#print Dumper($sg_dom);

#Saves the content of the DOM to a file
#$sg_dom->printToFile ("out.kml");

$sg_dom->dispose;



What I would actually like to do is use an XSL to pull out only the
Placemarks (including the Placemark tags).
The reason for this is that I actually have four of these KML files
and I want to rip out all the Placemarks from each KML and then, using
DOM (I know how to do that), put all the Placemarks into one single KML and
serve this out to the user.

So how far have I got?
Well, the XSL file I am using is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" version="1.0" omit-xml-declaration="no"/>

<xsl:template match="/Document/Placemark">
<xsl:for-each select="kml/document/Placemark">
<Placemark>
<name><xsl:for-each select="kml/document/Placemark/name"></name>
</Placemark>
</xsl:for-each>
);


As you can see, it only gets the name from each Placemark, and it
doesn't work! When I use it with this Perl script:

#!/usr/local/bin/perl

use XML::XSLT;
use strict;
use warnings;

my $xsl="xsl.xsl";
my $xml="out.kml";

# Create an instance of XSLT
my $xslt = eval { XML::XSLT->new($xsl, warnings => 1, debug => 0) };

print $xslt->to_dom;

# Free the memory
$xslt->dispose();

It says that it cannot create an instance of the XSL! For the moment I
have written my KML to a file to ease things up a little.

So can any of you people help? In the past I have found you all really
helpful and so I hope you can help me with this one! Sorry for the
long post but I wanted to try to get all the info down!

Cheers,

Niall
 
Bjoern Hoehrmann

* NiallBCarter wrote in comp.lang.perl.misc:
I have this KML (Slightly abridged)

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0" ...

These two specify the name of the element, that is, the tuple

{'http://earth.google.com/kml/2.0', 'kml'}

The parts are referred to as namespace name and local name.

<xsl:template match="/Document/Placemark">
<xsl:for-each select="kml/document/Placemark">

Here the 'kml' refers to just

{'', 'kml'}

Which is a different tuple. Also, the root element in your document is
this kml element, while your template match attribute looks for a root
element called 'Document'. To get the names right you have to either
declare a namespace prefix and use it in your XPath expressions ala

/kml:kml/kml:Document/...

or use predicates with the namespace-uri() and local-name() functions:

/*[local-name() = 'kml' and namespace-uri() = 'http://...']/...

Do note that there are more sophisticated XML/XSLT/XPath modules on
CPAN, XML::LibXSLT for example.
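
To make the namespace point concrete, here is a minimal sketch using XML::LibXML (the parser that XML::LibXSLT builds on), assuming it is installed. The cut-down KML string, the `kml` prefix, and the variable names are made up for illustration; only the namespace URI comes from the document in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $kml_ns = 'http://earth.google.com/kml/2.0';

# Cut-down stand-in for the KML in the question (hypothetical sample data)
my $kml = <<'KML';
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
<Document>
<name>Gazetteer for Scotland</name>
<Placemark><name> Troon </name></Placemark>
<Placemark><name> Niall Home </name></Placemark>
</Document>
</kml>
KML

my $doc = XML::LibXML->load_xml(string => $kml);

# Bind a prefix of our own choosing to the KML namespace name
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(kml => $kml_ns);

# Unprefixed names mean {'', 'kml'} etc., so nothing matches
my @unprefixed = $xpc->findnodes('/kml/Document/Placemark');

# Prefixed names mean {'http://earth.google.com/kml/2.0', 'kml'} etc.
my @prefixed = $xpc->findnodes('/kml:kml/kml:Document/kml:Placemark');

# The same selection with predicates instead of a declared prefix
my @by_function = $xpc->findnodes(
    "/*[local-name() = 'kml' and namespace-uri() = '$kml_ns']"
  . "/*[local-name() = 'Document']"
  . "/*[local-name() = 'Placemark']"
);

printf "unprefixed: %d, prefixed: %d, predicates: %d\n",
    scalar @unprefixed, scalar @prefixed, scalar @by_function;
```

In an XSLT stylesheet the equivalent of `registerNs` is declaring the prefix on `xsl:stylesheet`, for example `xmlns:kml="http://earth.google.com/kml/2.0"`, and then using `kml:` in the match and select expressions. Whether a given XSLT processor handles prefixed names correctly is something to verify for that processor.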
 
NiallBCarter

So would I be right in saying that it is not as simple as I thought?


I am aware that there are more sophisticated modules, but I run into
trouble with each one I try. It was suggested that I stick to
XML::XSLT for the moment and try to come up with a workaround rather
than hassle the IT staff to install more modules on the
managed computer I work on.

Essentially all I want to do is strip out the <kml>, <Document> parts
and be left with a DOM containing <Placemark> and all contents </Placemark>
Is there a simple way of doing this?

rgds,

Niall
 
MSwanberg

Have you tried the DOM method "getElementsByTagName"?

You could also use some XQL to parse out only the elements you want to
see.

You will still have to manipulate the nodes manually (i.e. you won't
be using XSL like you want to), but it's not too tough to create a new
DOM Document and appendChild each node you step through.

-Mike
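
A rough sketch of the DOM-only approach described above, using nothing beyond XML::DOM. The inline KML strings are hypothetical stand-ins for the fetched files, and wrapping the result in an empty `<kml><Document/></kml>` skeleton is an assumption about what the merged output should look like. XML::DOM is not namespace-aware, so `getElementsByTagName('Placemark')` matches on the tag name as written and the default namespace does not get in the way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::DOM;

# Stand-ins for the fetched KML files (hypothetical, cut down from the thread)
my @sources = (
    '<kml xmlns="http://earth.google.com/kml/2.0"><Document>'
  . '<name>Gazetteer for Scotland</name>'
  . '<Placemark><name> Troon </name></Placemark>'
  . '<Placemark><name> Niall Home </name></Placemark>'
  . '</Document></kml>',
    '<kml xmlns="http://earth.google.com/kml/2.0"><Document>'
  . '<Placemark><name> Somewhere Else </name></Placemark>'
  . '</Document></kml>',
);

my $parser = XML::DOM::Parser->new;

# Empty skeleton that will receive every Placemark
my $merged = $parser->parse(
    '<kml xmlns="http://earth.google.com/kml/2.0"><Document/></kml>');
my $target = $merged->getElementsByTagName('Document')->item(0);

for my $xml (@sources) {
    my $doc = $parser->parse($xml);

    # In list context this returns a plain Perl list of elements
    for my $placemark ($doc->getElementsByTagName('Placemark')) {
        my $copy = $placemark->cloneNode(1);   # deep copy, children included
        $copy->setOwnerDocument($merged);      # re-home the copy in $merged
        $target->appendChild($copy);
    }

    $doc->dispose;   # break circular references in the source document
}

my $out = $merged->toString;
print $out, "\n";

$merged->dispose;
```

The `setOwnerDocument` call is an XML::DOM extension rather than part of the DOM spec. It is there because a node belongs to the document that created it, and appending a node from one document to another without re-homing it raises a wrong-document error.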
 
