xhtml -> database

Discussion in 'XML' started by hawat.thufir@gmail.com, Feb 7, 2006.

  1. Guest

    cross-posted to: mailing.database.myodbc,comp.text.xml

    I have an XHTML file whose data I'd like to import into MySQL.
    Unfortunately, mysqlimport only works with text files. Mixed in with
    the text are some links (URLs) that I'd also like to import into the
    database. For the most part, copying and pasting into a plain-text file
    would do the trick, but the links get lost in the process.

    How do I grab the links?

    Here's a snippet of the XHTML:

    <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml"><head><meta
    http-equiv="content-type" content="text/html; charset=utf-8" /><title
    /><meta name="generator" content="StarOffice/OpenOffice.org XSLT
    (http://xml.openoffice.org/sx2ml)" /><meta name="created"
    content="2006-02-07T04:18:53" /><meta name="changed"
    content="2006-02-07T04:19:36" /><base href="." /><style
    type="text/css">
    @page { }
    table { border-collapse:collapse; border-spacing:0;
    empty-cells:show }
    td, th { vertical-align:top; }
    h1, h2, h3, h4, h5, h6 { clear:both }
    ol, ul { padding:0; }
    * { margin:0; }
    *.ta1 { }
    *.ce1 { font-family:'Nimbus Roman No9 L'; font-size:10pt;
    font-style:normal; text-shadow:none; font-weight:normal; }
    *.Default { font-family:'Bitstream Vera Sans'; }
    *.Heading { font-family:'Bitstream Vera Sans';
    text-align:center ! important; font-size:16pt; font-style:italic;
    font-weight:bold; }
    *.Heading1 { font-family:'Bitstream Vera Sans';
    text-align:center ! important; font-size:16pt; font-style:italic;
    font-weight:bold; }
    *.Result { font-family:'Bitstream Vera Sans';
    font-style:italic; font-weight:bold; text-decoration:underline; }
    *.Result2 { font-family:'Bitstream Vera Sans';
    font-style:italic; font-weight:bold; text-decoration:underline; }
    *.co1 { width:0.8925in; }
    *.ro1 { height:0.4146in; }
    *.ro2 { height:0.2173in; }
    *.ro3 { height:0.611in; }
    *.ro4 { height:0.8083in; }
    *.ro5 { height:0.1681in; }
    ....
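mysqlimport is a command-line front end to MySQL's `LOAD DATA INFILE`, and by default it expects tab-separated fields, newline-terminated lines, and backslash escapes. Whatever method is used to pull the links out, they have to end up in a file in that format. Below is a minimal Python sketch that writes such a file. It assumes the data has already been extracted as rows of strings, and the function names are hypothetical.

```python
def escape_field(value: str) -> str:
    """Escape one value for LOAD DATA's default format (tab-separated, backslash escapes)."""
    # Double backslashes first, so the escapes added below are not doubled again.
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_tsv(rows) -> str:
    """Join each row's escaped fields with tabs and end every row with a newline."""
    return "".join("\t".join(escape_field(f) for f in row) + "\n" for row in rows)


def write_tsv(path: str, rows) -> None:
    """Write rows to `path` as UTF-8; newline="" stops Windows from emitting \\r\\n."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(to_tsv(rows))
```

mysqlimport takes the table name from the file name minus its extension. Assuming a hypothetical two-column table `links` already exists, `write_tsv("links.txt", rows)` followed by `mysqlimport --local yourdb links.txt` should load the rows into it. (`yourdb` is a placeholder database name.)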



    thanks,

    Thufir
     
    , Feb 7, 2006

  2. Soren Kuula Guest

    wrote:
    > cross-posted to: mailing.database.myodbc,comp.text.xml
    >
    > I have an xhtml file whose data I'd like to import to MySQL.
    > Unfortunately, mysqlimport will only work with text files. Mixed in
    > with text are some links, URL's, which I'd like to import to the
    > database. For the most part, a copy/paste into a plain-text file would
    > do the trick, but the links get lost in the process.


    If you are looking for URLs *ANYWHERE* in the document, no matter where
    they are or what they point at, I suggest you treat the XML as plain
    text and use a regular-expression extractor. Unix geeks have sed, grep,
    etc., or you can code it pretty easily in .NET, Java, or Perl.
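Here is a minimal Python sketch of that regex approach, assuming only http and https links matter. The file is scanned as raw text rather than parsed, so the sketch also picks up the namespace and DTD URLs in the document header. It uses `html.unescape` to turn entity-encoded characters in attribute values, such as `&amp;`, back into `&`. The pattern is an illustrative assumption, not a complete URL grammar.

```python
import html
import re

# Loose pattern: "http://" or "https://" followed by a run of characters,
# stopping at whitespace, quotes, or angle brackets (URL delimiters in markup).
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def find_urls(text: str) -> list[str]:
    """Return every http(s) URL found anywhere in `text`, in document order.

    Entity references such as &amp; are decoded, because the file is scanned
    as raw markup rather than parsed as XML.
    """
    return [html.unescape(match) for match in URL_PATTERN.findall(text)]
```

Calling `find_urls(open("export.xhtml", encoding="utf-8").read())` (the file name is a placeholder) returns the URLs in the order they appear. A URL that ends a sentence in running text keeps its trailing period, which is one reason a structure-aware parse is safer once the links are only in specific places.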

    Soren
     
    Soren Kuula, Feb 10, 2006

  3. Guest

    Soren Kuula wrote:
    > wrote:
    > > cross-posted to: mailing.database.myodbc,comp.text.xml
    > >
    > > I have an xhtml file whose data I'd like to import to MySQL.
    > > Unfortunately, mysqlimport will only work with text files. Mixed in
    > > with text are some links, URL's, which I'd like to import to the
    > > database. For the most part, a copy/paste into a plain-text file would
    > > do the trick, but the links get lost in the process.
    >
    > If you are looking for URLs *ANYWHERE* in the doc, irrespective of where
    > they are and what they are pointing at, I suggest you think of the XML
    > as just a text, and use a regular expression extractor thing. Unix geeks
    > have sed, grep etc., or you can code it in .NET or Java or Perl pretty
    > easily.
    >
    > Soren


    I think I'll give it a go with Saxon, hopefully this weekend. For this
    particular example, yes, the URLs are "anywhere", but that might not be
    the case down the road.
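The structure-aware route is what an XSLT processor such as Saxon would do with an XPath like `//xhtml:a/@href`, with the `xhtml` prefix bound to the XHTML namespace. Below is a minimal sketch of the same idea using Python's standard `xml.etree.ElementTree` in place of Saxon. It assumes the links are `<a href="...">` elements in the XHTML namespace declared in the snippet above. It also assumes the file uses only numeric character references or XML's five predefined entities: ElementTree does not load the external DTD, so a named entity such as `&nbsp;` would raise a parse error.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <html> element of the XHTML file.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_links(xhtml: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for each <a> element that has an href."""
    root = ET.fromstring(xhtml)
    links = []
    # With a default namespace, ElementTree names elements
    # "{namespace}localname", so search for the qualified tag.
    for anchor in root.iter(f"{{{XHTML_NS}}}a"):
        href = anchor.get("href")
        if href is None:
            continue  # e.g. <a name="top">, which is a target, not a link
        # itertext() also collects text inside nested tags such as <a><b>..</b></a>;
        # split/join collapses line breaks and runs of whitespace.
        text = " ".join("".join(anchor.itertext()).split())
        links.append((text, href))
    return links
```

For a file on disk, `ET.parse("export.xhtml").getroot()` returns the same kind of root element (the file name is a placeholder). Selecting elements by name, rather than matching text, means the header URLs are ignored and only real links are returned.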


    thanks,

    Thufir
     
    , Feb 10, 2006
