xhtml -> database


hawat.thufir

cross-posted to: mailing.database.myodbc,comp.text.xml

I have an XHTML file whose data I'd like to import into MySQL.
Unfortunately, mysqlimport only works with text files. Mixed in with
the text are some links (URLs) that I'd also like to import into the
database. For the most part, copying and pasting into a plain-text file
would do the trick, but the links get lost in the process.

How do I grab the links?

Here's a snippet of the xhtml:

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta
http-equiv="content-type" content="text/html; charset=utf-8" /><title
/><meta name="generator" content="StarOffice/OpenOffice.org XSLT
(http://xml.openoffice.org/sx2ml)" /><meta name="created"
content="2006-02-07T04:18:53" /><meta name="changed"
content="2006-02-07T04:19:36" /><base href="." /><style
type="text/css">
@page { }
table { border-collapse:collapse; border-spacing:0;
empty-cells:show }
td, th { vertical-align:top; }
h1, h2, h3, h4, h5, h6 { clear:both }
ol, ul { padding:0; }
* { margin:0; }
*.ta1 { }
*.ce1 { font-family:'Nimbus Roman No9 L'; font-size:10pt;
font-style:normal; text-shadow:none; font-weight:normal; }
*.Default { font-family:'Bitstream Vera Sans'; }
*.Heading { font-family:'Bitstream Vera Sans';
text-align:center ! important; font-size:16pt; font-style:italic;
font-weight:bold; }
*.Heading1 { font-family:'Bitstream Vera Sans';
text-align:center ! important; font-size:16pt; font-style:italic;
font-weight:bold; }
*.Result { font-family:'Bitstream Vera Sans';
font-style:italic; font-weight:bold; text-decoration:underline; }
*.Result2 { font-family:'Bitstream Vera Sans';
font-style:italic; font-weight:bold; text-decoration:underline; }
*.co1 { width:0.8925in; }
*.ro1 { height:0.4146in; }
*.ro2 { height:0.2173in; }
*.ro3 { height:0.611in; }
*.ro4 { height:0.8083in; }
*.ro5 { height:0.1681in; }
....
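Whatever tool ends up pulling the links out, mysqlimport still wants a plain text file. By default it reads tab-separated fields, one row per line, with backslash escapes (the same defaults as LOAD DATA INFILE), and it takes the table name from the file name minus its extension. A minimal Python sketch, assuming a hypothetical two-column table `links` (link text, URL); the function names are illustrative only:

```python
import sys
from typing import Iterable, Tuple


def escape_field(value: str) -> str:
    """Escape one value for mysqlimport's default format (backslash escapes)."""
    # Escape backslashes first so the escapes added below are not doubled.
    value = value.replace("\\", "\\\\")
    # Tab, newline and carriage return would otherwise break the row structure.
    return value.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")


def to_tsv(rows: Iterable[Tuple[str, str]]) -> str:
    """Render (link_text, url) pairs as tab-separated lines, one row per line."""
    return "".join(
        escape_field(text) + "\t" + escape_field(url) + "\n" for text, url in rows
    )


if __name__ == "__main__":
    # Hypothetical sample row; real rows would come from the XHTML file.
    sample = [("OpenOffice.org XSLT", "http://xml.openoffice.org/sx2ml")]
    sys.stdout.write(to_tsv(sample))
```

Redirect the output to `links.txt` and load it with something like `mysqlimport --local yourdb links.txt`, where `yourdb` is a placeholder database name and the server must permit local loading.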



thanks,

Thufir
 

Soren Kuula


If you are looking for URLs *ANYWHERE* in the doc, irrespective of where
they are and what they point to, I suggest you treat the XML as plain
text and use a regular-expression extractor. Unix geeks have sed, grep,
etc., or you can code it in .NET, Java or Perl pretty easily.
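A rough Python version of the regular-expression approach. The pattern is an assumption (a scheme, then "://", then anything up to whitespace, a quote or an angle bracket), not a full URL grammar, and trailing punctuation such as a closing parenthesis is trimmed. Run against the snippet in the original post it would also pick up the DTD and namespace URIs, which is exactly the "anywhere" behaviour described.

```python
import re
import sys
from typing import List

# Deliberately loose: scheme, "://", then a run of characters that are not
# whitespace, quotes or angle brackets.
URL_RE = re.compile(r"""(?:https?|ftp)://[^\s"'<>]+""")


def find_urls(text: str) -> List[str]:
    """Return every URL-looking substring in text, in document order."""
    urls = []
    for match in URL_RE.finditer(text):
        # Trim punctuation that usually ends a sentence or closes a parenthesis.
        urls.append(match.group(0).rstrip(".,;:)"))
    return urls


if __name__ == "__main__":
    # Each command-line argument is the path of a file to scan as plain text.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for url in find_urls(f.read()):
                print(url)
```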

Soren
 

hawat.thufir


I think I'll give it a go with Saxon, hopefully this weekend. For this
particular example, yes, the URLs are "anywhere", but that might not be
the case down the road.
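Saxon would handle this with XSLT or XQuery. As a stand-in for the same structure-aware idea, here is a rough Python sketch using the standard library's xml.etree.ElementTree instead of Saxon. It assumes the file is well-formed XHTML in the http://www.w3.org/1999/xhtml namespace, as the snippet's root element declares, and it only collects `href` attributes of `<a>` elements, so DTD and namespace URIs are ignored. One caveat: ElementTree does not load the external DTD, so named entities such as `&nbsp;` would make the parse fail, while numeric character references are fine.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Elements in an XHTML document live in this namespace, so ElementTree
# reports an <a> tag as "{http://www.w3.org/1999/xhtml}a".
XHTML_A = "{http://www.w3.org/1999/xhtml}a"


def extract_links(root: ET.Element) -> List[Tuple[str, str]]:
    """Return (link_text, href) for every <a> element that has an href."""
    links = []
    for a in root.iter(XHTML_A):
        href = a.get("href")
        if href:
            # itertext() includes text inside nested tags such as <b>;
            # split/join collapses runs of whitespace to single spaces.
            text = " ".join("".join(a.itertext()).split())
            links.append((text, href))
    return links


if __name__ == "__main__":
    # Each command-line argument is the path of an XHTML file.
    for path in sys.argv[1:]:
        for text, href in extract_links(ET.parse(path).getroot()):
            print(f"{text}\t{href}")
```

The printed lines are tab-separated, but link text containing tabs or newlines would still need escaping before the output goes to mysqlimport.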


thanks,

Thufir
 
