hawat.thufir
Given an XHTML file, how can I "export" the data to plain text? That is,
I want:
google www.google.com
Whereas, if I copy and paste what the browser shows, I lose the URL and
end up with:
google
The idea is that I want to import the data into MySQL using the mysqlimport
command, but mysqlimport requires plain text. The XHTML file in question:
[thufir@localhost Desktop]$ cat raw.xhtml -n
1 <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
2 <html xmlns="http://www.w3.org/1999/xhtml"><head><meta
http-equiv="content-type" content="text/html; charset=utf-8" /><title
/><meta name="generator" content="StarOffice/OpenOffice.org XSLT
(http://xml.openoffice.org/sx2ml)" /><meta name="created"
content="2006-02-07T15:19:17" /><meta name="changed"
content="2006-02-07T15:36:55" /><base href="." /><style type="text/css">
3 @page { }
4 table { border-collapse:collapse; border-spacing:0;
empty-cells:show }
5 td, th { vertical-align:top; }
6 h1, h2, h3, h4, h5, h6 { clear:both }
7 ol, ul { padding:0; }
8 * { margin:0; }
9 *.ta1 { }
10 *.ce1 { font-family:Courier; color:#000000;
font-size:10pt; font-style:normal; text-shadow:none; font-weight:normal; }
11 *.ce2 { font-family:Courier; color:#000000; }
12 *.Default { font-family:'Bitstream Vera Sans'; }
13 *.Heading { font-family:'Bitstream Vera Sans';
text-align:center ! important; font-size:16pt; font-style:italic;
font-weight:bold; }
14 *.Heading1 { font-family:'Bitstream Vera Sans';
text-align:center ! important; font-size:16pt; font-style:italic;
font-weight:bold; }
15 *.Result { font-family:'Bitstream Vera Sans';
font-style:italic; font-weight:bold; text-decoration:underline; }
16 *.Result2 { font-family:'Bitstream Vera Sans';
font-style:italic; font-weight:bold; text-decoration:underline; }
17 *.co1 { width:0.8925in; }
18 *.ro1 { height:0.1756in; }
19 *.ro2 { height:0.1681in; }
20 </style></head><body dir="ltr"><table border="0"
cellspacing="0" cellpadding="0" class="ta1"><colgroup><col width="99"
/></colgroup><tr class="ro1"><td style="text-align:left;width:0.8925in; "
class="ce1"><p><a href="http://www.google.com/">google
</a>&#160;&#160;</p></td></tr><tr class="ro2"><td
style="text-align:left;width:0.8925in; " class="ce2" /></tr><tr
class="ro2"><td style="text-align:left;width:0.8925in; " class="ce2"
/></tr></table></body></html>[thufir@localhost Desktop]$ date
Tue Feb 7 15:52:34 EST 2006
[thufir@localhost Desktop]$
thanks,
Thufir
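One possible way to get that output, as a rough sketch rather than a definitive answer: parse the file as XML with Python's standard library, pick out every a element, and write the link text and the URL's host separated by a tab, which is the default field separator mysqlimport expects. The function names extract_links and to_tsv are made up for illustration, and reducing the href to its host (www.google.com instead of http://www.google.com/) is an assumption based on the desired output shown above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Elements in raw.xhtml live in the XHTML namespace declared on <html>.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_links(xhtml_bytes):
    """Return a list of (link text, host) pairs, one per <a href="...">."""
    root = ET.fromstring(xhtml_bytes)
    rows = []
    for anchor in root.iter(XHTML_NS + "a"):
        href = anchor.get("href")
        if not href:
            continue  # skip anchors that carry no URL
        # Join all text inside the anchor and collapse whitespace/newlines.
        text = " ".join("".join(anchor.itertext()).split())
        # Assumption: only the host is wanted; fall back to the raw href
        # when the URL has no host part (e.g. a relative link).
        host = urlsplit(href).netloc or href
        rows.append((text, host))
    return rows


def to_tsv(rows):
    """Format rows as tab-separated lines, one record per line."""
    return "".join(f"{text}\t{url}\n" for text, url in rows)
```

For the file above, something like to_tsv(extract_links(open("raw.xhtml", "rb").read())) could be written to a text file named after the target table, since mysqlimport derives the table name from the file name. Note that this parser would reject a file using named entities such as &nbsp; unless they are declared, so it only suits well-formed XHTML like this export.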