extract plain-text from xml-file / remove all tags

peter pilsl

To feed the content of an XML file to a search indexer, I need to
remove all tags and extract the plain text from the file.

I use the null XSL stylesheet

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

</xsl:stylesheet>

to remove all tags.
Problem now is that, while it actually works, I found out that removing
all tags is not exactly what I want, because I ended up with all content
in one string without any whitespace in between. So actually what I want
is to replace all tags with a space.

I use Linux, xsltproc, and Perl, and I am definitely no master of XML. I
have rarely used it until now, and while I do quite fine in Perl, I cannot
solve this simple XML problem on my own.

Thanks for any help
peter
 
Richard Tobin

peter pilsl said:
Problem now is that, while it actually works, I found out that removing
all tags is not exactly what I want, because I ended up with all content
in one string without any whitespace in between. So actually what I want
is to replace all tags with a space.

As usual, you should start with an identity stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>

Then add a rule that matches all elements and replaces each one with a
space, then the result of processing its children, then another space.
In this case you need to specify a priority, because "*" has the same
default priority as "node()".

<xsl:template match="*" priority="1">
<xsl:text> </xsl:text>
<xsl:apply-templates select="node()"/>
<xsl:text> </xsl:text>
</xsl:template>
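
Put together, the two pieces above might look like the following sketch. The xsl:output line is an assumption that is not part of the snippets above: it asks for plain-text output, so that no XML declaration is written and only text nodes reach the result.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumption (not in the original snippets): emit plain text only -->
<xsl:output method="text"/>

<!-- Identity template: copies any node or attribute it is applied to -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>

<!-- Elements: a space, the processed children, then another space.
     priority="1" lets this rule win over the identity template. -->
<xsl:template match="*" priority="1">
<xsl:text> </xsl:text>
<xsl:apply-templates select="node()"/>
<xsl:text> </xsl:text>
</xsl:template>

</xsl:stylesheet>
```

Saved under a hypothetical name such as strip.xsl, it could be run as "xsltproc strip.xsl input.xml". Attribute values do not appear in the output, because the element template only processes child nodes.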

It would be slightly simpler to write a template for elements and a
template for text, but I wanted to illustrate the generally useful
method of starting with an identity stylesheet and adding templates.

-- Richard
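
The simpler variant mentioned above, with one template for elements and one for text, might look like this sketch. It is an illustration rather than Richard's own code, and the xsl:output line is again an assumption added for plain-text output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumption: emit plain text only, without an XML declaration -->
<xsl:output method="text"/>

<!-- Elements: surround the processed children with spaces -->
<xsl:template match="*">
<xsl:text> </xsl:text>
<xsl:apply-templates/>
<xsl:text> </xsl:text>
</xsl:template>

<!-- Text nodes: output the text unchanged -->
<xsl:template match="text()">
<xsl:value-of select="."/>
</xsl:template>

</xsl:stylesheet>
```

No priority attribute is needed here, since no template matching "node()" competes with the one for "*". Comments and processing instructions are dropped by the built-in template rules.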
 
