hawat.thufir
I'm trying to do some "screen scraping", and am using
<http://www.oreilly.com/catalog/xmlhks/> for inspiration.
First I'd like to convert XHTML to XML, or extract XML from XHTML; I'm
not sure how to phrase it.
"Use Cocoon to Create a Well-Formed View of a Web Page, Then Scrape It
for Data"
<http://hacks.oreilly.com/pub/h/2125>
is what I'd like to do down the line, but for now I'm working on
something simpler.
First,
"Convert an HTML Document to XHTML with HTML Tidy"
<http://hacks.oreilly.com/pub/h/2054>
Instead of Tidy, I went with TagSoup
<http://mercury.ccil.org/~cowan/XML/tagsoup/>.
Then I'd like to go from XHTML to XML in order to:
"Generate an XSLT Identity Stylesheet with Relaxer"
<http://hacks.oreilly.com/pub/h/2069>
How do I get the XML from the XHTML, please?
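TagSoup's output (in the session that follows) begins with an XML declaration and puts every element in the XHTML namespace. That means google.xhtml is probably already well-formed XML, and there may be nothing left to "extract": any namespace-aware XML parser should be able to read it as-is. The minimal sketch below uses only Python's standard `xml.etree.ElementTree`. The `SAMPLE` string is a hypothetical, trimmed stand-in for google.xhtml. The `strip_namespace` helper is also hypothetical. It is included only in case the later Relaxer step wants element names without a namespace, which is an assumption that has not been checked against Relaxer.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, trimmed stand-in for google.xhtml (note the XML declaration
# and the default XHTML namespace that TagSoup emitted).
SAMPLE = """<?xml version="1.0" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Google
Directory</title></head><body><b><a shape="rect"
href="/Top/Arts/">Arts</a></b><br clear="none"></br></body></html>"""


def strip_namespace(root: ET.Element, ns: str = XHTML_NS) -> ET.Element:
    """Drop the XHTML namespace from every element tag, modifying the tree in place."""
    for elem in root.iter():
        # Only plain element tags are strings; skip anything else defensively.
        if isinstance(elem.tag, str) and elem.tag.startswith(ns):
            elem.tag = elem.tag[len(ns):]
    return root


# An ordinary XML parser accepts TagSoup-style output directly.
root = ET.fromstring(SAMPLE)
print(root.tag)  # {http://www.w3.org/1999/xhtml}html

strip_namespace(root)
print(root.tag)  # html
print(ET.tostring(root, encoding="unicode"))
```

For the real file, `ET.parse("google.xhtml").getroot()` should work in place of `ET.fromstring(SAMPLE)`. One side effect is worth knowing: ElementTree's default parser discards comments. As a result, the `<!-- ... -->` blocks TagSoup left inside `<style>` and `<script>` would not survive a parse/serialize round trip.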
Here's what I have:
[thufir@arrakis tagSoup]$
[thufir@arrakis tagSoup]$ date
Sun Aug 14 23:34:13 IST 2005
[thufir@arrakis tagSoup]$ pwd
/home/thufir/Desktop/tagSoup
[thufir@arrakis tagSoup]$ ll
total 60
-rw-rw-r-- 1 thufir thufir 7662 Aug 13 22:08 google.html
-rw-rw-r-- 1 thufir thufir 42207 Aug 14 23:32 tagsoup.jar
[thufir@arrakis tagSoup]$ java -jar tagsoup.jar --files google.html
src: google.html dst: google.xhtml
[thufir@arrakis tagSoup]$ ll
total 76
-rw-rw-r-- 1 thufir thufir 7662 Aug 13 22:08 google.html
-rw-rw-r-- 1 thufir thufir 10568 Aug 14 23:34 google.xhtml
-rw-rw-r-- 1 thufir thufir 42207 Aug 14 23:32 tagsoup.jar
[thufir@arrakis tagSoup]$ cat google.xhtml -n
1 <?xml version="1.0" standalone="yes"?>
2
3 <html version="-//W3C//DTD HTML 4.01 Transitional//EN"
xmlns="http://www.w3.org/1999/xhtml"><head><title>Google
Directory</title><style><!--
4 body,td,a,p,.h{font-family: arial,sans-serif;}
.h{color:#008000}
.q{text-decoration:none; color:#0000cc;}
5 //--></style><script>
6 <!--
7 function sf(){document.f.q.focus();}
8 // -->
9 </script></head><body bgcolor="#ffffff" text="#000000"
link="#3300cc" vlink="#660066" alink="#ff0000" onload="sf();">
10 <center>
11 <table cellpadding="0" cellspacing="0" border="0"><tr><td
align="right" colspan="1" rowspan="1" valign="bottom"><img
src="http://www.google.com/images/hp0.gif" width="158" height="78"
alt="Google Directory"></img></td><td colspan="1" rowspan="1"
valign="bottom"><img src="http://www.google.com/images/hp1.gif"
width="50" height="78" alt=""></img></td><td colspan="1" rowspan="1"
valign="bottom"><img src="http://www.google.com/images/hp2.gif"
width="68" height="78" alt=""></img></td></tr><tr><td align="right"
colspan="1" rowspan="1" valign="top" class="h"><b>Directory</b></td><td
colspan="1" rowspan="1" valign="top"><img
src="http://www.google.com/images/hp3.gif" width="50" height="32"
alt=""></img></td><td colspan="1" rowspan="1" valign="top"
class="h"></td></tr></table><br clear="none"></br><table border="0"
cellspacing="0" cellpadding="0"><tr><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="0" bgcolor="#efefef" width="95"><a shape="rect"
class="q" id="0a" href="http://www.google.com/webhp?hl=en"><font
size="-1">Web</font></a></td><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="1" bgcolor="#efefef" width="95"><a shape="rect"
class="q" id="1a" href="http://www.google.com/imghp?hl=en"><font
size="-1">Images</font></a></td><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="2" bgcolor="#efefef" width="95"><a shape="rect"
class="q" id="2a" href="http://www.google.com/grphp?hl=en"><font
size="-1">Groups</font></a></td><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="3" bgcolor="#008000" width="95"><font color="#ffffff"
size="-1"><b>Directory</b></font></td><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="4" bgcolor="#efefef" width="95"><a shape="rect"
class="q" id="4a" href="http://www.google.com/nwshp?hl=en"><font
size="-1">News</font></a></td><td colspan="1" rowspan="1"
width="15"> </td><td colspan="1" rowspan="1"
width="15"> </td></tr><tr><td colspan="12" rowspan="1"
bgcolor="#008000"><img width="1" height="1"
alt=""></img></td></tr></table><br clear="none"></br><form
enctype="application/x-www-form-urlencoded" method="get"
action="http://www.google.com/search" name="f"><table cellpadding="0"
cellspacing="0"><tr align="middle" valign="center"><td colspan="1"
rowspan="1" width="150"> </td><td colspan="1" rowspan="1"><input
maxlength="256" type="text" name="q" size="40"
value=""></input><script>document.f.q.focus();</script><input
type="submit" name="btnG" value="Google Search"></input><input
type="hidden" name="hl" value="en"></input><input type="hidden"
name="cat" value="gwd/Top"></input></td><td align="left" colspan="1"
rowspan="1" width="150"><font size="-2"> • <a
shape="rect" href="http://www.google.com/dirhelp.html">Directory
Help</a></font></td></tr></table></form><p><font color="#008000"><b>The
web organized by topic into categories.</b></font></p><p></p><table
align="center" width="1%" border="0" cellspacing="7"
cellpadding="0"><tr><td colspan="4" rowspan="1" bgcolor="#008000"><img
width="1" height="1" alt=""></img></td></tr><tr><td colspan="1"
rowspan="1"> </td><td colspan="1" nowrap="nowrap" rowspan="1">
12 <b><a shape="rect" href="/Top/Arts/">Arts</a></b><br
clear="none"></br>
13 <font size="-1"><a shape="rect"
href="/Top/Arts/Movies/">Movies</a>, <a shape="rect"
href="/Top/Arts/Music/">Music</a>, <a shape="rect"
href="/Top/Arts/Television/">Television</a>, ...</font><p>
14 <b><a shape="rect" href="/Top/Business/">Business</a></b><br
clear="none"></br>
15 <font size="-1"><a shape="rect"
href="/Top/Business/Major_Companies/">Companies</a>, <a shape="rect"
href="/Top/Business/Financial_Services/">Finance</a>, <a shape="rect"
href="/Top/Business/Employment/">Jobs</a>, ...</font></p><p>
16 <b><a shape="rect" href="/Top/Computers/">Computers</a></b><br
clear="none"></br>
17 <font size="-1"><a shape="rect"
href="/Top/Computers/Internet/">Internet</a>, <a shape="rect"
href="/Top/Computers/Hardware/">Hardware</a>, <a shape="rect"
href="/Top/Computers/Software/">Software</a>, ...</font></p><p>
18 <b><a shape="rect" href="/Top/Games/">Games</a></b><br
clear="none"></br>
19 <font size="-1"><a shape="rect"
href="/Top/Games/Board_Games/">Board</a>, <a shape="rect"
href="/Top/Games/Roleplaying/">Roleplaying</a>, <a shape="rect"
href="/Top/Games/Video_Games/">Video</a>, ...</font></p><p>
20 <b><a shape="rect" href="/Top/Health/">Health</a></b><br
clear="none"></br>
21 <font size="-1"><a shape="rect"
href="/Top/Health/Alternative/">Alternative</a>, <a shape="rect"
href="/Top/Health/Fitness/">Fitness</a>, <a shape="rect"
href="/Top/Health/Medicine/">Medicine</a>, ...</font></p><p>
22 </p></td><td colspan="1" nowrap="nowrap" rowspan="1">
23 <b><a shape="rect" href="/Top/Home/">Home</a></b><br
clear="none"></br>
24 <font size="-1"><a shape="rect"
href="/Top/Home/Consumer_Information/">Consumers</a>, <a shape="rect"
href="/Top/Home/Homeowners/">Homeowners</a>, <a shape="rect"
href="/Top/Home/Family/">Family</a>, ...</font><p>
25 <b><a shape="rect" href="/Top/Kids_and_Teens/">Kids and
Teens</a></b><br clear="none"></br>
26 <font size="-1"><a shape="rect"
href="/Top/Kids_and_Teens/Computers/">Computers</a>, <a shape="rect"
href="/Top/Kids_and_Teens/Entertainment/">Entertainment</a>, <a
shape="rect" href="/Top/Kids_and_Teens/School_Time/">School</a>,
...</font></p><p>
27 <b><a shape="rect" href="/Top/News/">News</a></b><br
clear="none"></br>
28 <font size="-1"><a shape="rect"
href="/Top/News/Media/">Media</a>, <a shape="rect"
href="/Top/News/Newspapers/">Newspapers</a>, <a shape="rect"
href="/Top/News/Current_Events/">Current Events</a>, ...</font></p><p>
29 <b><a shape="rect"
href="/Top/Recreation/">Recreation</a></b><br
clear="none"></br>
30 <font size="-1"><a shape="rect"
href="/Top/Recreation/Food/">Food</a>, <a shape="rect"
href="/Top/Recreation/Outdoors/">Outdoors</a>, <a shape="rect"
href="/Top/Recreation/Travel/">Travel</a>, ...</font></p><p>
31 <b><a shape="rect" href="/Top/Reference/">Reference</a></b><br
clear="none"></br>
32 <font size="-1"><a shape="rect"
href="/Top/Reference/Education/">Education</a>, <a shape="rect"
href="/Top/Reference/Libraries/">Libraries</a>, <a shape="rect"
href="/Top/Reference/Maps/">Maps</a>, ...</font></p><p>
33 </p></td><td colspan="1" nowrap="nowrap" rowspan="1">
34 <b><a shape="rect" href="/Top/Regional/">Regional</a></b><br
clear="none"></br>
35 <font size="-1"><a shape="rect"
href="/Top/Regional/Asia/">Asia</a>, <a shape="rect"
href="/Top/Regional/Europe/">Europe</a>, <a shape="rect"
href="/Top/Regional/North_America/">North America</a>, ...</font><p>
36 <b><a shape="rect" href="/Top/Science/">Science</a></b><br
clear="none"></br>
37 <font size="-1"><a shape="rect"
href="/Top/Science/Biology/">Biology</a>, <a shape="rect"
href="/Top/Science/Social_Sciences/Psychology/">Psychology</a>, <a
shape="rect" href="/Top/Science/Physics/">Physics</a>,
...</font></p><p>
38 <b><a shape="rect" href="/Top/Shopping/">Shopping</a></b><br
clear="none"></br>
39 <font size="-1"><a shape="rect"
href="/Top/Shopping/Vehicles/Autos/">Autos</a>, <a shape="rect"
href="/Top/Shopping/Clothing/">Clothing</a>, <a shape="rect"
href="/Top/Shopping/Gifts/">Gifts</a>, ...</font></p><p>
40 <b><a shape="rect" href="/Top/Society/">Society</a></b><br
clear="none"></br>
41 <font size="-1"><a shape="rect"
href="/Top/Society/Issues/">Issues</a>, <a shape="rect"
href="/Top/Society/People/">People</a>, <a shape="rect"
href="/Top/Society/Religion_and_Spirituality/">Religion</a>,
...</font></p><p>
42 <b><a shape="rect" href="/Top/Sports/">Sports</a></b><br
clear="none"></br>
43 <font size="-1"><a shape="rect"
href="/Top/Sports/Basketball/">Basketball</a>, <a shape="rect"
href="/Top/Sports/Football/">Football</a>, <a shape="rect"
href="/Top/Sports/Soccer/">Soccer</a>, ...</font></p><p>
44 </p></td></tr><tr><td colspan="1" rowspan="1"> </td><td
colspan="3" rowspan="1"><b><a shape="rect"
href="/Top/World/">World</a></b><br clear="none"></br>
45 <font size="-1"><a shape="rect"
href="/Top/World/Deutsch/">Deutsch</a>, <a shape="rect"
href="/Top/World/Espa%C3%B1ol/">Español</a>, <a shape="rect"
href="/Top/World/Fran%C3%A7ais/">Français</a>, <a shape="rect"
href="/Top/World/Italiano/">Italiano</a>, <a shape="rect"
href="/Top/World/Japanese/">Japanese</a>, <a shape="rect"
href="/Top/World/Korean/">Korean</a>, <a shape="rect"
href="/Top/World/Nederlands/">Nederlands</a>, <a shape="rect"
href="/Top/World/Polska/">Polska</a>, <a shape="rect"
href="/Top/World/Svenska/">Svenska</a>, ...</font><p>
46 </p></td></tr><tr><td colspan="1" rowspan="1"> </td><td
colspan="1" nowrap="nowrap" rowspan="1"><font
size="-1"> </font></td></tr><tr><td colspan="4" rowspan="1"
bgcolor="#008000"><img width="1" height="1"
alt=""></img></td></tr></table><br clear="none"></br><font size="-1"><a
shape="rect"
href="http://www.google.com/ads/">Advertise with Us</a> - <a
shape="rect"
href="http://www.google.com/about.html">Jobs, Press, Cool Stuff...</a></font><p><font
face="arial,sans-serif" size="-1"> ©2004 Google</font></p><br
clear="none"></br><table align="center" border="0" bgcolor="#336600"
cellpadding="3" cellspacing="0"><tr><td colspan="1" rowspan="1"> <table
width="100%" cellpadding="2" cellspacing="0" border="0"><tr
align="center"><td colspan="1" rowspan="1"><font face="sans-serif,
Arial, Helvetica" size="2" color="#ffffff">Help build the largest
human-edited directory on the web.</font></td></tr><tr align="center"
bgcolor="#cccccc"><td colspan="1" rowspan="1"><font face="sans-serif,
Arial, Helvetica" size="2">
47 <a shape="rect" href="http://dmoz.org/add.html">
48 Submit a Site</a> - <a shape="rect"
href="http://dmoz.org/about.html"><b>Open Directory Project</b></a> -
49 <a shape="rect" href="http://dmoz.org/cgi-bin/apply.cgi">Become
an Editor</a> </font>
50 </td></tr></table>
51 </td></tr></table>
52 </center></body></html>
53
[thufir@arrakis tagSoup]$ date
Sun Aug 14 23:34:57 IST 2005
[thufir@arrakis tagSoup]$
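Once the page parses as XML, the eventual "scrape it for data" step could be a handful of namespace-qualified path queries. Below is a minimal sketch, again using Python's standard `xml.etree.ElementTree` with a hypothetical, trimmed excerpt of google.xhtml. It makes three assumptions about the page rather than relying on any documented rule:

- top-level categories are the `<a>` elements inside `<b>`;
- sub-categories are the `<a>` elements inside `<font>`;
- each sub-category URL starts with its parent category's URL, which holds for every entry in the output above.

`scrape_directory` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Prefix mapping for ElementTree's limited XPath support.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical, trimmed excerpt of google.xhtml. For the real file use
# ET.parse("google.xhtml").getroot() instead of ET.fromstring(SAMPLE).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table><tr><td>
<b><a shape="rect" href="/Top/Arts/">Arts</a></b><br clear="none"></br>
<font size="-1"><a shape="rect" href="/Top/Arts/Movies/">Movies</a>, <a shape="rect"
href="/Top/Arts/Music/">Music</a>, ...</font><p>
<b><a shape="rect" href="/Top/Business/">Business</a></b><br clear="none"></br>
<font size="-1"><a shape="rect" href="/Top/Business/Employment/">Jobs</a>, ...</font></p>
</td></tr></table></body></html>"""


def link_text(a: ET.Element) -> str:
    """All text inside a link, with runs of whitespace collapsed to one space."""
    return " ".join("".join(a.itertext()).split())


def scrape_directory(root: ET.Element) -> Dict[str, List[str]]:
    """Map each top-level directory category to its listed sub-categories."""
    # Top-level categories: links wrapped in <b>, keyed by their href.
    categories = {
        a.get("href"): link_text(a)
        for a in root.iterfind(".//h:b/h:a", NS)
        if a.get("href")
    }
    result: Dict[str, List[str]] = {name: [] for name in categories.values()}
    # Sub-categories: links inside <font>, attached to the category whose
    # href is a prefix of theirs (an assumption about this page's URLs).
    for a in root.iterfind(".//h:font/h:a", NS):
        href = a.get("href", "")
        for cat_href, cat_name in categories.items():
            if href != cat_href and href.startswith(cat_href):
                result[cat_name].append(link_text(a))
                break
    return result


root = ET.fromstring(SAMPLE)
for category, subs in scrape_directory(root).items():
    print(category, "->", ", ".join(subs))
# Arts -> Movies, Music
# Business -> Jobs
```

On the full page, links such as "Directory Help" or the dmoz.org links also sit inside `<font>`. They match no `/Top/...` prefix, so this sketch skips them.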
Thanks,
Thufir