HTML to XML

Discussion in 'XML' started by 2peachy, Jan 14, 2004.

  1. 2peachy

    2peachy Guest

    2peachy, Jan 14, 2004
    #1

  2. 2peachy wrote:
    > hello... I am brand new to this...
    > I did a search with no results...
    >
    > how do you convert an html page into an xml page ?


    For valid HTML documents you can use sx from OpenSP. Or use tidy to
    output XHTML.
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
    Johannes Koch, Jan 14, 2004
    #2

  3. 2peachy

    Andy Dingley Guest

    On Tue, 13 Jan 2004 20:29:34 -0600, 2peachy
    <4designers.com> wrote:

    >how do you convert an html page into an xml page ?


    How long is a piece of string ?


    How many pages are you dealing with ? Is this a one-off "I want to
    convert my site" or a regular "I want to scrape stock prices from
    another site and make them into an XML feed" ?

    What's "HTML" ? Is this well-coded valid HTML 3.2 / 4.0, XHTML or
    some tag-soup written by a M$oft tool ? What happens if it's not
    valid ? Can your code crash, abandon the page, scream for human help,
    or must it make a best-attempt ?

    Can you avoid this altogether ? Can you obtain the content by some
    friendlier means, such as RSS, direct access to the database, or some
    other source ?

    Why do you want to do it ? There are no "XML pages", there are only
    XML documents. If you want to end up with "a web page" at the end of
    it, then raw XML isn't enough of a finishing point, you need to take
    it further.

    What is "XML" ? What DTD or Schema are you aiming at ?


    For one-offs, use Dave Raggett's Tidy (easily obtained via HTMLKit).
    Even if you're not looking for an XHTML output, Tidy can be an
    excellent pre-processor for sorting out ugly Tag Soup.
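
    As a rough sketch of the one-off case, Tidy can be driven from a short
    script. This assumes a `tidy` executable is on the PATH; the function
    names and file names are made up for illustration. The flags used
    (-quiet, -asxhtml, -numeric, -output) are standard Tidy command-line
    options, but check them against the version you have installed.

```python
import subprocess


def build_tidy_command(src, dst):
    """Return the argument list for converting src (HTML) to dst (XHTML)."""
    return [
        "tidy",
        "-quiet",        # suppress the non-essential banner output
        "-asxhtml",      # emit XHTML, i.e. well-formed XML
        "-numeric",      # write numeric entities instead of named ones
        "-output", dst,  # write the result to dst instead of stdout
        src,
    ]


def tidy_to_xhtml(src, dst):
    """Run Tidy; raise if it reports errors it could not repair."""
    result = subprocess.run(
        build_tidy_command(src, dst), capture_output=True, text=True
    )
    # Tidy exits with 0 (clean), 1 (warnings only) or 2 (errors).
    if result.returncode >= 2:
        raise RuntimeError("tidy failed:\n" + result.stderr)
    return result.returncode
```

    Numeric entities are chosen here so that a plain XML parser, which knows
    nothing about HTML's named entities, can read the output.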

    For screen-scrapes, use your favourite scripting language (Perl is
    always a good start, but you could use Python or even JavaScript) and
    use someone else's HTML parser.
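
    A minimal sketch of the "someone else's parser" approach, here in Python
    with only the standard library (html.parser and ElementTree). The class
    name, the function name and the <document> wrapper element are
    assumptions for illustration. It always produces well-formed XML, but it
    makes no attempt to repair nesting the way a browser or Tidy would: an
    unclosed <li> simply swallows whatever follows it.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SoupToXml(HTMLParser):
    """Builds an ElementTree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get their name as value.
        clean = {k: (v if v is not None else k) for k, v in attrs}
        el = ET.SubElement(self.stack[-1], tag, clean)
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        cur = self.stack[-1]
        if len(cur):  # text after a child element goes into that child's tail
            cur[-1].tail = (cur[-1].tail or "") + data
        else:
            cur.text = (cur.text or "") + data


def html_to_xml(html):
    parser = SoupToXml()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")
```

    Feeding it something like '<ul><li>one<li>two</ul><img src=a.png>' gives
    back a document that any XML parser will accept, which is usually enough
    to start pulling values out with XPath or XSLT.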

    RSS 1.0 is a good XML Schema to target for generic screen scraping,
    even if you don't think your content is "relevant" to a newsfeed (but
    RSS 0.92 isn't)
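
    To show what targeting RSS 1.0 might look like, here is a hedged sketch
    that wraps scraped items in an RSS 1.0 (RDF) document using Python's
    ElementTree. The function name, its parameters and the dictionary keys
    are invented for the example; the two namespace URIs are the ones RSS 1.0
    defines. Optional parts of the format (image, textinput, modules) are
    left out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("", RSS)  # serialize RSS elements without a prefix


def _child(parent, name, text):
    """Append an RSS-namespaced child element holding text."""
    el = ET.SubElement(parent, "{%s}%s" % (RSS, name))
    el.text = text
    return el


def build_rss10(feed_url, site_url, title, description, items):
    """items: list of dicts with 'link', 'title' and 'description' keys."""
    root = ET.Element("{%s}RDF" % RDF)

    channel = ET.SubElement(root, "{%s}channel" % RSS,
                            {"{%s}about" % RDF: feed_url})
    _child(channel, "title", title)
    _child(channel, "link", site_url)
    _child(channel, "description", description)

    # The channel lists its items as an rdf:Seq table of contents.
    toc = ET.SubElement(channel, "{%s}items" % RSS)
    seq = ET.SubElement(toc, "{%s}Seq" % RDF)

    for item in items:
        ET.SubElement(seq, "{%s}li" % RDF,
                      {"{%s}resource" % RDF: item["link"]})
        el = ET.SubElement(root, "{%s}item" % RSS,
                           {"{%s}about" % RDF: item["link"]})
        _child(el, "title", item["title"])
        _child(el, "link", item["link"])
        _child(el, "description", item["description"])

    return ET.tostring(root, encoding="unicode")
```

    Each scraped record becomes one item, identified by its URL, so the same
    structure works whether the records are news stories or stock prices.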

    --
    Do whales have krillfiles ?
    Andy Dingley, Jan 14, 2004
    #3