XSLT: converting HTML from an external website to XML

Arne Pagel

Dear all,

I am currently looking for a way to import information from an external website into my
XSLT/PHP-based lunch order system.
My current idea is to filter this external website with XSLT and convert the necessary information
into an XML file.

At the moment I am trying to import the weekly changing menu from a restaurant.
The problem is that the restaurant's website is probably maintained through a web-based CMS,
which means that the quality and consistency of the page's markup are not very high.

The main problem is that one important delimiter is the line break <BR> within normal text.
I am stuck on how to react to <BR> tags inside the text of a node.

Below you can find an extract of the original website.

The current XSLT template only filters out empty nodes:

- - -
<xsl:template match="table/tr/td/div">
<xsl:if test=". != ''">
DIV:<xsl:value-of select="." /> <br/>
</xsl:if>
</xsl:template>
- - -

Now I want to add the following functionality:
- The template should only apply to a table that contains the phrase "Mittagstischkarte"
- The line breaks <br> within the text should be identified
- The menu items are only clearly separated by the price, so
a number in the format X,XX (as in 5,50) should be identified
- Rows with only formatting content and no real text (A-Z a-z 0-9) should be ignored
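
To make the four points above more concrete, here is a rough XSLT 1.0 sketch of how they might be approached. It is untested against the real page and rests on several assumptions: the output element names menu, line and price are made up for illustration, the parser turns every piece of text between two <br> tags into its own text node, &nbsp; arrives as the character U+00A0, and prices use a decimal comma as in the extract. XSLT 1.0 has no regular expressions, so the price check masks all digits with translate() and looks for the pattern #,## instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Characters that count as "real text" (umlauts added as an assumption) -->
  <xsl:variable name="alnum"
      select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÄÖÜäöüß0123456789'"/>

  <!-- Only look at divs in a table whose text contains "Mittagstischkarte" -->
  <xsl:template match="/">
    <menu>
      <xsl:apply-templates
          select="//table[contains(., 'Mittagstischkarte')]/tr/td/div"/>
    </menu>
  </xsl:template>

  <xsl:template match="div">
    <!-- Text separated by <br> or other tags ends up in separate text nodes -->
    <xsl:for-each select=".//text()">
      <!-- Turn non-breaking spaces into normal spaces, then trim and collapse -->
      <xsl:variable name="t"
          select="normalize-space(translate(., '&#160;', ' '))"/>
      <!-- Replace every digit with # so a price shows up as #,## -->
      <xsl:variable name="mask"
          select="translate($t, '0123456789', '##########')"/>
      <xsl:choose>
        <!-- Text node contains something shaped like a price -->
        <xsl:when test="contains($mask, '#,##')">
          <price><xsl:value-of select="$t"/></price>
        </xsl:when>
        <!-- Removing letters and digits changed the string, so real text is present -->
        <xsl:when test="translate($t, $alnum, '') != $t">
          <line><xsl:value-of select="$t"/></line>
        </xsl:when>
        <!-- Everything else (empty, underscores, dashes) is dropped -->
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With this approach the heading lines of the table would also come out as line elements, and a text node such as "Tagessuppe 1,50" would be reported as a single price element including the dish name. Grouping the line elements that precede each price into one menu item is not covered here and would probably be easier in a second pass.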

Do you think this can all be done with XSLT?
It would also be possible to split this across several templates with separate calls from PHP, or to add some PHP
post-processing or intermediate processing.


Here is the extract of the original website (sorry, the content is German):
- - -
<table width="100%" border="0" cellpadding="0" cellspacing="0">
<tr>
<td width="30" height="552"></td>
<td width="529" valign="top">
<div align="center"><font size="4"><b>Mittagstischkarte</b></font><br><br><font
size="4"><font size="3">Unser wöchentlich wechselnder Mittagstisch</font></font> <br><font
size="4"><font size="3">von 12.00 bis 14.00 Uhr</font></font></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font>&nbsp;</div>
<div align="center"><b><font size="3">"Eintopf der Woche"</font></b><br>Linseneintopf mit
Bockwurst<br>€ 5,50 <br></div>
<div align="center"><font size="3"></font>&nbsp;</div>
<div align="center"><font size="6">Tagessuppe &nbsp; 1,50 €<br><br></font>&nbsp;<br></div>
<div align="center"><font size="3">Kasseler mit Sauerkraut und Kartoffelpüree<br><b>5,50
€</b><br></font><br><font size="4"><font size="2">__________</font></font><font size="4"><br>kl.
Schnitzel mit Sauce nach Wahl,<br>Bratkartoffeln und Gemüse<br><br></font><font size="4"><b>5,50
€<br><br></b></font></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"></div>
<div align="center">______________</div>
<div align="center"></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"><font size="4"></font></div>
<div align="center"><font size="4"></font></div><div align="center"><font size="4"
color="#f0f090">frische Bratwurst<br>mit Bratkartoffeln und Gemüse<br></font></div><div
align="center"></div>
<div align="center"><font size="4"></font></div>
<div align="center"><font size="4"></font></div>

<div align="center"><font size="4">5,50 €</font></div>
<div align="center">____________</div>
<div align="center"><font size="4"></font></div>
<div align="center"><font size="4">fruchtiges Hähnchengeschnetzeltes<br>im Reisrand mit
Salat<br>5,50 €<br>---------<br></font></div>
<div align="center"></div>
<div align="center"><font size="4">2 Spiegeleier<br>&nbsp;mit Salzkartoffeln und Blattspinat<br>5,50
€</font></div>
<div align="center"><font size="4"></font></div>
<div align="center"></div>
<div align="center"><font size="4">_________</font><br><font
size="5"><br>Dessert&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1,50
€</font><br><br><br><br></div><font size="4"><br></font>
<div align="center"><font size="4"><font size="4"></font></font></div><font
size="4">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br><br></font>
<div align="center"></div>
<div align="center"><font size="3"></font></div>
<div align="center"></div> </td>
<td width="30"></td>
</tr>
</table>

This page is loaded via the PHP DOM function loadHTMLFile.

- - -
Regards, Arne
 
