Extracting data from one XHTML file into another

chris_huh

I think those headline elements are deep down inside the table so you
either have to spell out the complete path or use
         <xsl:value-of select="descendant::xhtml:headline"/>

Oh, so I need to declare every single thing. I will have a go tomorrow
and see if that works.
Thanks a lot.
 
chris_huh

Ah yes that works great. I've got all the headlines coming up in a
ul.

Is it possible to extract the href value too? I was thinking something
like <a href="{descendant::xhtml:href}"> but that doesn't work.

Thanks
 
Martin Honnen

chris_huh said:
Is it possible to extract the href value too? I was thinking something
like <a href="{descendant::xhtml:href}"> but that doesn't work.

XHTML does not have any 'href' element and I don't see any 'href'
elements in the markup you posted.
In XHTML the 'a' elements have an 'href' attribute so you could try
   descendant::xhtml:a/@href
if that is what you are looking for, but in the sample you posted earlier
you only have
   <a href="#">
so I am not sure that is what you are looking for.
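
As a rough sketch of how that path could sit inside an attribute value
template: the stylesheet below is only an illustration. The xhtml:tr
row used as the context and the xhtml:headline element are assumptions
about the input, not something confirmed by the posted markup.

```xml
<?xml version="1.0"?>
<!-- Sketch only: xhtml:tr and xhtml:headline are assumed names for the
     row and headline elements in the input document. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <ul>
      <!-- one list item per table row that contains a headline -->
      <xsl:for-each select="//xhtml:tr[descendant::xhtml:headline]">
        <li>
          <!-- The braces form an attribute value template. If several
               a elements match, XSLT 1.0 takes the href of the first. -->
          <a href="{descendant::xhtml:a/@href}">
            <xsl:value-of select="descendant::xhtml:headline"/>
          </a>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

The xhtml prefix has to be bound to http://www.w3.org/1999/xhtml in the
stylesheet, otherwise the prefixed paths will not match anything.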
 
chris_huh

Yes, sorry, that is what I meant: the href attribute of the a element.

And that worked great.

Thanks for all your help, I think everything is working as I wanted
now. There is one more thing, although it is more of a cherry-on-top
sort of thing. At the moment I have to use an ASP page to create the
output. If I then try to include that in an .shtml file using an SSI
include, it does not work, because the .shtml file isn't an ASP page,
so I can't use .shtml files.

Is there a way with XML to make it write an external file? So you
would have an input file, an XSL file, a file that does all the
transforming, and then an output file (which could be .sssi). This
would just save me having to use .ASP pages for just this one thing.
But if that isn't possible, it's no problem.
 
Martin Honnen

chris_huh said:
Is there a way with XML that you can force it to make an external
file. So you have an input file, an XSL file, a file that does all the
transforming and then an output file (which could be .sssi). This
would just save me having to use .ASP pages for just this one thing.

Your ASP uses MSXML objects with VBScript. VBScript can also be used in
Windows Script Host (WSH) files so instead of embedding your script code
in ASP you could write a .vbs file (e.g. prog.vbs) and execute that with
WSH (e.g. by doing 'cscript prog.vbs' at a command prompt). The main
change would be to use
   Set xml = CreateObject(...)
instead of
   Set xml = Server.CreateObject(...)
and then obviously you would need to write files instead of
Response.Writing stuff to the browser.
See MSDN for WSH:
http://msdn.microsoft.com/en-us/library/9bbdkx3k(VS.85).aspx

If you need help with that then I suggest you find a VBScript newsgroup
on news.microsoft.com

Other options obviously would be to not use script languages but rather
more modern approaches like the .NET framework and its XML classes/APIs
to solve the problems. There are free Visual Studio Express editions for
VB.NET and C# where you would have the advantage of getting IDE support
like Intellisense to write your programs.
 
chris_huh

Thanks, I might look into that, although it might be a bit overkill.

Thanks a lot.
 
Peter Flynn

chris_huh said:
Is there a way to extract data from one xhtml file and create another
one with it. I want to create a basic file with all the headlines from
a news page listed in it (like an rss feed).

Pass it through XML Tidy to ensure it becomes XHTML, then use an XML
processor to extract the bits you want with XPath statements. This is a
form of screen-scraping and it is used quite extensively to extract
information like headlines to create RSS feeds and the like.

Suppose you test the document, and after being Tidy'd you find that the
headlines are all in H4 elements inside a DIV whose class is "news". In
an XSLT transformation you could write something like

<xsl:template match="/">
   <html>
     <head><title>Copied headlines</title></head>
     <body>
       <ul>
         <xsl:apply-templates select="//div[@class='news']/h4"/>
       </ul>
     </body>
   </html>
</xsl:template>

<xsl:template match="h4">
   <li>
     <xsl:value-of select="."/>
   </li>
</xsl:template>

///Peter
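
One thing to watch with the templates above: if Tidy's XHTML output
carries xmlns="http://www.w3.org/1999/xhtml" on the root element,
unprefixed names like div and h4 will not match in XSLT 1.0. A minimal
sketch of a complete stylesheet that binds a prefix for that namespace
might look like this. The class value "news" comes from the example
above; the xhtml prefix name and the choice of HTML output are
assumptions.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the Tidy'd input is in the XHTML namespace. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <!-- Write plain HTML; the result elements are in no namespace. -->
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <html>
      <head><title>Copied headlines</title></head>
      <body>
        <ul>
          <!-- Prefixed names select elements in the XHTML namespace. -->
          <xsl:apply-templates
              select="//xhtml:div[@class='news']/xhtml:h4"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- Each headline becomes one list item holding its text. -->
  <xsl:template match="xhtml:h4">
    <li>
      <xsl:value-of select="."/>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

If the input turns out not to be in a namespace, the unprefixed paths
from the original example are the ones to use instead.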
 
chris_huh

That's what I had thought about doing, but I wasn't too sure of the
steps. I have got it working using a separate ASP file now, so I think
everything is fine.

Thanks
 
