Ripping out parts of a DOM using XML::XSLT

NiallBCarter · Jun 18, 2008

Hey folks

I have what I think is a simple task but one that I am struggling
with.
I have this KML (Slightly abridged)

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0" xmlns:gml="http://www.opengeospatial.org/standards/gml">
<Document>
<open>0</open>
<name>Gazetteer for Scotland</name>
<Placemark>
<name> Troon </name>
<description>A resort town on the coast of Kyle in Ayrshire, Troon
lies at the north end of Ayr Bay on a headland that extends into the
</description>
<Point id="t476">
<coordinates>-4.6562,55.54207,0</coordinates>
</Point>
<Link>http://www.geo.ed.ac.uk/scotgaz/towns/townfirst476.html
</Link>
</Placemark>
<Placemark>
<name> Niall Home </name>
<description>Not a lovely little home in the western isles
</description>
<Point id="t576">
<coordinates>-5.3454,53.46532,0</coordinates>
</Point>
<Link>http://www.geo.ed.ac.uk/scotgaz/towns/townfirst576.html
</Link>
</Placemark>
<ExtendedData xmlns:GforS="http://www.geo.ed.ac.uk/scotgaz/">
<GforS:Copyright>
<p>All Images and Text are Copyright (c) The Gazetteer for
Scotland 1995-2008 </p>
</GforS:Copyright>
</ExtendedData>
</Document>
</kml>

This KML is generated by a Perl script hosted on the web, so I am
able to 'get' it from the web and parse it into a DOM using the
script below:

#!/usr/bin/perl

use XML::DOM;
use Data::Dumper;
use LWP::Simple;
use XML::DOM::XPath;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

#Gets the contents of the URLs
my $sg_get = $ua->get('http://www.geo.ed.ac.uk/scotgaz/cgi_bin/mid/module2/scotgaz.pl?xmin=232095&ymin=630744&xmax=232695&ymax=631344');

#Creates an instance of XML::DOM::Parser and uses it to make a new DOM object
my $parser = XML::DOM::Parser->new;

# Parses the contents of the 'got' URL
my $sg_dom = $parser->parse ($sg_get->content);

#Prints the contents of the DOM-parsed document to the screen
#print Dumper($sg_dom);

#Saves the content of the DOM to a file
#$sg_dom->printToFile ("out.kml");

$sg_dom->dispose;

What I would actually like to do is use an XSL to pull out only the
Placemarks (including the Placemark tags).
The reason is that I actually have four of these KML files
and I want to rip out all the Placemarks from each one and then, using
DOM (I know how to), put them all into one single KML and
serve this out to the user.

So how far have I got?
Well, the XSL file I am using is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" version="1.0" omit-xml-declaration="no"/>

<xsl:template match="/Document/Placemark">
<xsl:for-each select="kml/document/Placemark">
<Placemark>
<name><xsl:for-each select="kml/document/Placemark/name"></name>
</Placemark>
</xsl:for-each>
);

As you can see it is only getting the name from each Placemark and it
doesn't work! When used with the Perl script:

#!/usr/local/bin/perl

use XML::XSLT;
use strict;
use warnings;

my $xsl="xsl.xsl";
my $xml="out.kml";

# Create an instance of XSLT
my $xslt = eval { XML::XSLT->new($xsl, warnings => 1, debug => 0) };

print $xslt->to_dom;

# Free the memory
$xslt->dispose();

It says that it cannot create an instance of the XSL! For the moment I
have written my KML to a file to ease things up a little.
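
For reference, the synopsis in the XML::XSLT documentation uses the call sequence new, transform, toString, dispose. The script above never calls transform, and the eval swallows whatever error new raised. A minimal sketch of that sequence, with the error surfaced, might look like this; the helper name run_xslt is made up for illustration and the code has not been run against the files in this post:

```perl
use strict;
use warnings;
use XML::XSLT;

# Hypothetical helper: applies a stylesheet to an XML source and returns
# the result as a string. Dies with the underlying error instead of hiding it.
sub run_xslt {
    my ($xsl, $xml) = @_;

    # eval traps the exception; $@ then holds the reason new() failed
    my $xslt = eval { XML::XSLT->new($xsl, warnings => 1, debug => 0) };
    die "Could not create XSLT instance from $xsl: $@" unless $xslt;

    $xslt->transform($xml);          # apply the stylesheet to the source
    my $result = $xslt->toString;    # serialise the result tree
    $xslt->dispose();                # free the memory
    return $result;
}
```

With the file names from the script above, it would be called as print run_xslt('xsl.xsl', 'out.kml'); a stylesheet that is not well-formed XML should then show up as a parse error in the die message.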

So can any of you people help? In the past I have found you all really
helpful and so I hope you can help me with this one! Sorry for the
long post but I wanted to try to get all the info down!

Cheers,

Niall

Bjoern Hoehrmann · Jun 18, 2008

* NiallBCarter wrote in comp.lang.perl.misc:

> I have this KML (Slightly abridged)
>
> <?xml version="1.0" encoding="UTF-8"?>
> <kml xmlns="http://earth.google.com/kml/2.0" ...

These two specify the name of the element, that is, the tuple

{'http://earth.google.com/kml/2.0', 'kml'}

The parts are referred to as namespace name and local name.

> <xsl:template match="/Document/Placemark">
> <xsl:for-each select="kml/document/Placemark">

Here the 'kml' refers to just

{'', 'kml'}

Which is a different tuple. Also, the root element in your document is
this kml element, while your template match attribute looks for a root
element called 'Document'. To get the names right you have to either
declare a namespace prefix and use it in your XPath expressions ala

/kml:kml/kml:Document/...

or use predicates with the namespace-uri() and local-name() functions:

/*[local-name() = 'kml' and namespace-uri() = 'http://...']/...

Do note that there are more sophisticated XML/XSLT/XPath modules on
CPAN, XML::LibXSLT for example.
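
As a sketch of the first option (a declared prefix), a standard XSLT 1.0 stylesheet could look like the following. The kml prefix and the Placemarks wrapper element are choices made for this illustration only, and it has not been verified here how much of this XML::XSLT itself supports, so treat it as untested with that module.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:kml="http://earth.google.com/kml/2.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Start at the document root and copy every Placemark, children included.
       "Placemarks" is a hypothetical wrapper so the output has a single root. -->
  <xsl:template match="/">
    <Placemarks>
      <xsl:copy-of select="kml:kml/kml:Document/kml:Placemark"/>
    </Placemarks>
  </xsl:template>
</xsl:stylesheet>
```

The prefix in the stylesheet does not have to match anything in the KML file; only the namespace URI it is bound to has to be identical to the one on the kml element.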

NiallBCarter · Jun 18, 2008

> {'', 'kml'}
>
> Which is a different tuple. Also, the root element in your document is
> this kml element, while your template match attribute looks for a root
> element called 'Document'. To get the names right you have to either
> declare a namespace prefix and use it in your XPath expressions ala
>
> /kml:kml/kml:Document/...
>
> or use predicates with the namespace-uri() and local-name() functions:
>
> /*[local-name() = 'kml' and namespace-uri() = 'http://...']/...

So would I be right in saying that it is not as simple as I thought?

> Do note that there are more sophisticated XML/XSLT/XPath modules on
> CPAN, XML::LibXSLT for example.

I am aware that there are more sophisticated modules but I experience
troubles with each one I try. It was suggested that I stick to
XML::XSLT for the moment and try to come up with a work around rather
than have to hassle the IT staff to install more modules on the
managed computer I work on.

Essentially all I want to do is strip out the <kml>, <Document> parts
and be left with a DOM containing <Placemark> and all contents
</Placemark>
Is there a simple way of doing this?

rgds,

Niall

MSwanberg · Jun 23, 2008

Have you tried the DOM method "getElementsByTagName"?

You could also use some XQL to parse out only the elements you want to
see.

You will still have to manipulate the nodes manually (i.e. you won't
be using XSL like you want to), but it's not too tough to create a new
DOM Document and appendChild each node you step through.

-Mike
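
A minimal sketch of the getElementsByTagName approach, using only XML::DOM, could look like this. The function name merge_placemarks and the skeleton document (including its name element) are assumptions for illustration; the namespace URI is copied from the KML in the first post.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: collects every <Placemark> from the given KML strings
# into one new document and returns that document serialised as a string.
sub merge_placemarks {
    my @kml_strings = @_;
    my $parser = XML::DOM::Parser->new;

    # Skeleton for the merged output; the <name> text is a placeholder.
    my $out = $parser->parse(
        '<kml xmlns="http://earth.google.com/kml/2.0">'
      . '<Document><name>Merged placemarks</name></Document></kml>'
    );
    my ($target) = $out->getElementsByTagName('Document');

    for my $kml (@kml_strings) {
        my $in = $parser->parse($kml);
        # In list context getElementsByTagName returns a plain Perl list.
        for my $placemark ($in->getElementsByTagName('Placemark')) {
            my $copy = $placemark->cloneNode(1);   # deep copy, children included
            $copy->setOwnerDocument($out);         # XML::DOM extension, not W3C DOM
            $target->appendChild($copy);
        }
        $in->dispose;
    }

    my $result = $out->toString;
    $out->dispose;
    return $result;
}
```

Each fetched document would be passed in as a string, for example merge_placemarks($sg_get->content, ...) with the response object from the first script. XML::DOM is not namespace-aware, so the lookup matches the literal tag name Placemark; that works here because the KML uses a default namespace without prefixes.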
