Parse an HTML file as an XML file

Stan SR

Hi,

I need to read an HTML file and parse it as an XML file.

All my HTML files have this structure:
<html>
<head>
<title>
</title>
<script language="javascript">
</script>
</head>
<body>
</body>
</html>

My code has to read some sections (title, script, body).
Everything works when the script (JavaScript) section contains little or no
code, but parsing sometimes fails when the code contains characters such as <
or & (especially in "for" statements like for (i = 0; i < n; i++)).
To make it work, I had to "decorate" the script section with
<![CDATA[ ]]>, so it looks like this:

<script language="javascript">
<![CDATA[

]]>
</script>
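To make the failure concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module (an assumption: the thread does not say which parser or language is in use). The sample page and the read_sections helper are hypothetical. It shows that a raw < in a for loop breaks the XML parse, while the same script wrapped in CDATA parses and comes back as plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the loop condition "i < 3" contains a raw "<".
RAW = """<html>
<head>
<title>Demo</title>
<script language="javascript">
for (var i = 0; i < 3; i++) { }
</script>
</head>
<body></body>
</html>"""

# The same page with the script body wrapped in a CDATA section.
WRAPPED = RAW.replace(
    '<script language="javascript">\n',
    '<script language="javascript">\n<![CDATA[\n',
).replace("\n</script>", "\n]]>\n</script>")


def read_sections(markup):
    """Parse markup as XML and return (title, script, body) text, or None on failure."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    title = root.findtext("head/title")
    script = root.findtext("head/script")
    body = root.findtext("body")
    return title, script, body


print(read_sections(RAW))      # None: "< 3" is read as the start of a tag
print(read_sections(WRAPPED))  # the script text comes back with "i < 3" intact
```

Inside CDATA the parser reports the characters verbatim, so the script text needs no unescaping after parsing.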

Is there a way to parse the file without using a <![CDATA[ ]]> section?
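One way to avoid editing the files by hand is to add the CDATA wrapper in memory just before parsing. The sketch below is a hypothetical Python helper using xml.etree.ElementTree (wrap_scripts_in_cdata and parse_html_as_xml are illustrative names). It assumes each script body never contains the literal text </script> and that the rest of the page is already well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# Matches a <script ...> element and captures its opening tag, body and closing tag.
# Assumption: script bodies never contain the literal text "</script>".
SCRIPT_RE = re.compile(r"(<script\b[^>]*>)(.*?)(</script>)", re.DOTALL | re.IGNORECASE)


def wrap_scripts_in_cdata(markup):
    """Return markup with every script body wrapped in a CDATA section."""
    def wrap(match):
        open_tag, body, close_tag = match.groups()
        if "<![CDATA[" in body:
            return match.group(0)  # already decorated by hand; leave as-is
        # "]]>" would end the CDATA section early, so split it across two sections.
        body = body.replace("]]>", "]]]]><![CDATA[>")
        return f"{open_tag}<![CDATA[{body}]]>{close_tag}"

    return SCRIPT_RE.sub(wrap, markup)


def parse_html_as_xml(markup):
    """Wrap script bodies in CDATA, then parse the result as XML."""
    return ET.fromstring(wrap_scripts_in_cdata(markup))


page = """<html><head><title>Demo</title>
<script language="javascript">if (a < b && c) { x = 1; }</script>
</head><body></body></html>"""

root = parse_html_as_xml(page)
print(root.findtext("head/script"))  # if (a < b && c) { x = 1; }
```

The regular expression only touches the script elements, so the title and body are parsed exactly as before.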

Stan
 
Cowboy (Gregory A. Beamer)

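Try wrapping the script code in <!-- and -->, which is standard practice. I
imagine some parsers will still puke on this approach, but it should solve the
main issue. Be aware that an XML parser will then treat the code as a comment
rather than as the text of the script element, and that -- (as in i--) is not
allowed inside an XML comment.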

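Can you solve this without changing anything? Probably not. Script blocks are
free-form text: HTML parsers treat the contents of <script> as raw text, but an
XML parser applies its stricter rules there too, so characters like < and &
have to be escaped or hidden inside CDATA or a comment.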
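A minimal Python sketch of the comment approach, again assuming xml.etree.ElementTree as the parser and hypothetical sample markup. It illustrates two side effects: by default the parser discards comments, so the code has to be recovered from comment nodes (TreeBuilder's insert_comments option, available since Python 3.8), and a decrement such as i-- makes the file ill-formed, because XML does not allow -- inside a comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical page using the classic <!-- ... //--> wrapper around the script.
COMMENTED = """<html><head>
<script language="javascript"><!--
if (a < b && c) { x = 1; }
//--></script>
</head><body></body></html>"""

# The same wrapper around a loop that uses i--, which XML forbids inside comments.
DECREMENT = """<html><head>
<script language="javascript"><!--
for (var i = 3; i > 0; i--) { }
//--></script>
</head><body></body></html>"""


def parses(markup):
    """Return True if markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def script_comments(markup):
    """Parse as XML, keeping comments, and return the comment texts inside <script>."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(markup, parser=parser)
    script = root.find("head/script")
    return [node.text for node in script if node.tag is ET.Comment]


print(parses(COMMENTED))                                        # True
print(repr(ET.fromstring(COMMENTED).findtext("head/script")))   # '' (comment dropped)
print(script_comments(COMMENTED))                               # the JavaScript, as comment text
print(parses(DECREMENT))                                        # False: "--" inside a comment
```

So the comment trick gets the file past the parser, but the CDATA wrapper is the safer choice when the script text itself needs to be read back.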
 
