Function or approach to test if XML is well-formed? Design advice?

Ken Fine

I periodically receive a 5+ MB XML document that I hand-load into SQL Server
using SQLXML running under a DTS process.

Unfortunately, the document is human-created, and (very unfortunately) often
has invalid elements, which breaks the bulk load. I've been managing this
problem by loading the document into Visual Studio, using it to identify
the offending lines, and fixing them by hand. Given my truly minimal
coding skills when I designed the app way back when, this was the best
approach.

I'd like to automate this crummy work now.

Using ASP, is there a way to generically test whether a given snippet of XML
is well-formed?

I've mapped out a very crude approach that involves walking through the
document with ASP's regex matching abilities, assigning the contents between
<item> and </item> to a variable, testing that section for well-formedness,
parsing out the pieces of information I want, inserting these pieces to the
database, and looping as necessary to finish the job.

If there are better approaches that can accommodate the occasionally broken
incoming XML, I'd love to hear suggestions. I do not have a formal CS
background and sometimes what should be obvious is not.

Thanks,
-KF
 
Ken Fine

FYI, I was using the colloquial meaning of "invalid" when I wrote
"...Unfortunately, the document is human-created, and (very unfortunately)
often has invalid elements..."

What I meant was that sometimes a given section of the XML is not well-formed.
Validating against an XML schema isn't the important thing here.

-KF
 
Ken Fine

I believe I may have found the answer to my own question. You can employ the
MSXML/XMLDOM parser to test, as follows:

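<%@LANGUAGE="VBSCRIPT" CODEPAGE="1252"%>
<%
Dim mydoc, strXML
Set mydoc = Server.CreateObject("Microsoft.XMLDOM")

strXML = "<book><author>author1</author><title>title1</title></book>"

' loadXML parses the string; on failure it fills in mydoc.parseError
mydoc.loadXML strXML

If mydoc.parseError.errorCode <> 0 Then
    ' parseError.reason and parseError.line say why and where the parse failed
    Response.Write "failure: " & Server.HTMLEncode(mydoc.parseError.reason) & _
        " (line " & mydoc.parseError.line & ")"
    ' error handling code, jump out of script
Else
    Response.Write "success"
    ' proceed with DB insert, etc...
End If
%>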

4guys has a good article about all this:
http://www.4guysfromrolla.com/webtech/101200-1.shtml
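
The same parser can also check the whole file and report where it gave up, which is the job Visual Studio was doing by hand. What follows is only a sketch under assumptions: the file name "incoming.xml" is a made-up placeholder, and the parser stops at the first error it hits, so a document with several broken spots has to be fixed and re-checked one error at a time.

```vbscript
<%
' Sketch: test a whole file for well-formedness and report the first error.
' "incoming.xml" is a hypothetical file name, not one from the original post.
Dim doc, pe
Set doc = Server.CreateObject("Microsoft.XMLDOM")
doc.async = False             ' make load() block until parsing has finished
doc.validateOnParse = False   ' well-formedness only, no DTD/schema validation

If doc.load(Server.MapPath("incoming.xml")) Then
    Response.Write "well-formed"
Else
    Set pe = doc.parseError
    Response.Write "Not well-formed: " & Server.HTMLEncode(pe.reason) & "<br>"
    Response.Write "Line " & pe.line & ", column " & pe.linepos & "<br>"
    Response.Write "Source: " & Server.HTMLEncode(pe.srcText)
End If
%>
```

load() returns False when the parse fails, so the return value can be tested directly instead of comparing parseError.errorCode to 0.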

Still curious if there's a better design approach for this. I'm wondering if
splitting the valid XML nodes into an array would be more efficient than
looping, but I don't know if it's possible to do the tests for
well-formedness if I try to generate an array.

-KF
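
On the array-versus-loop question, one possible arrangement is sketched below. It is not a tested solution. It assumes the records are non-nested <item>...</item> blocks, that the file is small enough to read into a string, and that the file is plain ANSI text (OpenTextFile does not decode UTF-8). The names IsWellFormed and "incoming.xml" are invented for the example. The RegExp Matches collection already behaves like the array in question, and each entry can still be tested on its own with loadXML.

```vbscript
<%
' Hypothetical helper: True when xmlText parses as a well-formed document.
Function IsWellFormed(xmlText)
    Dim d
    Set d = Server.CreateObject("Microsoft.XMLDOM")
    d.async = False
    IsWellFormed = d.loadXML(xmlText)   ' loadXML returns True on success
End Function

Dim fso, ts, rawXml, re, matches, m, goodCount, badCount

' Read the raw file as text; "incoming.xml" is a placeholder name.
Set fso = Server.CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile(Server.MapPath("incoming.xml"), 1)   ' 1 = ForReading
rawXml = ts.ReadAll
ts.Close

Set re = New RegExp
re.Global = True
' [\s\S] matches across line breaks; *? stops at the nearest </item>
re.Pattern = "<item[\s>][\s\S]*?</item>"

Set matches = re.Execute(rawXml)
goodCount = 0
badCount = 0
For Each m In matches
    If IsWellFormed(m.Value) Then
        goodCount = goodCount + 1
        ' parse out the wanted fields and insert them into the database here
    Else
        badCount = badCount + 1
        ' log m.Value (and m.FirstIndex) so the record can be repaired by hand
    End If
Next

Response.Write goodCount & " good, " & badCount & " bad"
%>
```

Two caveats with this design: an item that is missing its closing tag gets merged with the next one by the regex, so both are reported as a single bad record, and fragments that rely on namespace prefixes or entities declared elsewhere in the document will fail the check even though they are fine in context.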
 
