Parse an HTML file as an XML file

Discussion in 'ASP .Net' started by Stan SR, Jan 19, 2008.

  1. Stan SR


    Hi,

    I need to read an HTML file and parse it as an XML file.

    All my HTML files have this structure:
    <html>
    <head>
    <title>
    </title>
    <script language="javascript">
    </script>
    </head>
    <body>
    </body>
    </html>

    My code has to read some sections (title, script, body).
    Everything works when the script (JavaScript) section contains little or no
    code, but sometimes parsing fails on characters that are special in XML,
    such as < and & (especially in "for" statements).
    To make it work, I had to "decorate" the script section with
    <![CDATA[ ]]>, so it looks like this:

    <script language="javascript">
    <![CDATA[

    ]]>
    </script>
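A minimal sketch of the failure and the CDATA workaround, using Python's standard xml.etree.ElementTree as a stand-in for the XML parser in the original (unshown) ASP.NET code. The page content and the read_script helper are hypothetical. The < in "i < 3" is what makes the page not well-formed XML; inside a CDATA section the parser reads it as plain text.

```python
import xml.etree.ElementTree as ET

# A page shaped like the one in the question. The for loop uses "<",
# which an XML parser treats as the start of a tag.
PLAIN = """<html>
<head>
<title>Demo</title>
<script language="javascript">
for (var i = 0; i < 3; i++) { total += i; }
</script>
</head>
<body></body>
</html>"""

# The same page with the script body wrapped in a CDATA section.
WRAPPED = PLAIN.replace(
    '<script language="javascript">\n',
    '<script language="javascript">\n<![CDATA[\n',
).replace("\n</script>", "\n]]>\n</script>")


def read_script(markup):
    """Parse markup as XML and return the script text, or None on error."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        print("XML parse failed:", err)
        return None
    return root.find("head/script").text.strip()


print(read_script(PLAIN))    # fails: "i < 3" is not well-formed XML
print(read_script(WRAPPED))  # succeeds: CDATA content is read as plain text
```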

    Is there a way to parse the file without using a <![CDATA[ ]]> section?

    Stan
    Stan SR, Jan 19, 2008
    #1

  2. Try wrapping the script in <!-- and -->, which is standard practice. I
    imagine some parsers will still choke on this approach, but it should
    solve the main issue.

    Can you solve this without changing anything? Probably not. It is the
    nature of free-form sections: XML parsers do not handle them the way HTML
    parsers do, because XML's rules are stricter.
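A rough Python illustration of the comment approach, again with xml.etree.ElementTree standing in for the original parser (an assumption). Wrapping the code in a comment makes the page parse, but the code becomes a comment node rather than text, and ElementTree discards comments unless asked to keep them (the insert_comments flag of TreeBuilder, Python 3.8+). Note also that XML forbids "--" inside a comment, so a loop using i-- would still break the parse.

```python
import xml.etree.ElementTree as ET

# The script body is hidden inside a comment, as suggested above.
PAGE = """<html>
<head>
<title>Demo</title>
<script language="javascript"><!--
for (var i = 0; i < 3; i++) { total += i; }
--></script>
</head>
<body></body>
</html>"""

# Default parsing now succeeds, but ElementTree drops comments,
# so no script text survives under <script>.
plain_root = ET.fromstring(PAGE)
print(repr(plain_root.find("head/script").text))

# Asking the tree builder to keep comments (Python 3.8+) recovers the code.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(PAGE, parser=parser)
comment = root.find("head/script")[0]  # the comment node inside <script>
print(comment.tag is ET.Comment)
print(comment.text.strip())
```

So the comment trick gets past the parser, but the code that reads the script section has to look for a comment node instead of element text.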

    --
    Gregory A. Beamer
    MVP, MCP: +I, SE, SD, DBA
    Cowboy (Gregory A. Beamer), Jan 19, 2008
    #2

  3. You could try Simon Mourier's HtmlAgilityPack, which can be found on
    codeplex.com.
    Its HtmlDocument class parses the HTML of the page into an XPath-compliant
    document object that works "just like" XmlDocument.
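HtmlAgilityPack itself is a .NET library, so the sketch below does not use it. As a loose Python analogue of the same idea (use an HTML parser instead of an XML parser), it relies on the standard html.parser module, which treats <script> content as raw text, so no CDATA or comment wrapping is needed. The SectionReader class and the sample page are hypothetical, for illustration only.

```python
from html.parser import HTMLParser


class SectionReader(HTMLParser):
    """Collect the text of <title>, <script> and <body> from an HTML page.

    html.parser treats <script> content as raw text, so characters such as
    "<" and "&" in JavaScript need no CDATA or comment wrapping.
    """

    SECTIONS = ("title", "script", "body")

    def __init__(self):
        super().__init__()
        self.sections = {name: "" for name in self.SECTIONS}
        self._open = []  # stack of the sections we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag in self.SECTIONS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text goes to the innermost open section; anything else is ignored.
        if self._open:
            self.sections[self._open[-1]] += data


PAGE = """<html>
<head>
<title>Demo</title>
<script language="javascript">
for (var i = 0; i < 3; i++) { if (i > 0 && i < 2) total += i; }
</script>
</head>
<body><p>Hello &amp; welcome</p></body>
</html>"""

reader = SectionReader()
reader.feed(PAGE)
reader.close()
for name, text in reader.sections.items():
    print(name, "->", text.strip())
```

The script keeps its < and && untouched, while ordinary body text has entities such as &amp; decoded, which matches how browsers treat the two kinds of content.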
    -- Peter
    Site: http://www.eggheadcafe.com
    UnBlog: http://petesbloggerama.blogspot.com
    MetaFinder: http://www.blogmetafinder.com

    Peter Bromberg [C# MVP], Jan 19, 2008
    #3