XPath, XMLHttpRequest and parsing DOM

Xandor Leahte

Hey there,

I would like to describe a problem I have run into while working with
JavaScript and XPath.

Let r be an XMLHttpRequest object. I want to request a web page inside
my own domain (so there is no security problem). With r I can use
r.responseText and r.responseXML, but sometimes I can't use responseXML
because the document's syntax is not valid, so I have to use
responseText. I create the DOM document like this:

var doc = new DOMParser().parseFromString(r.responseText, "text/xml")

Then I try to evaluate an XPath expression on doc, like this:

doc.evaluate(query, doc, null, 0, null)

where query is a valid XPath expression. Here is the problem: a query
like "//*[@id='foo']" or "//*" works perfectly, but a query like
"/html/body" or "/ol/li/a", or anything else that does not include the
wildcard *, makes the evaluate function return null. I can't understand
why the query fails when I don't use the wildcard. (Note: the same query
works if I evaluate it in a "document" context, as in the Firebug/JS
console, where my page is the "document" object.)

I think the problem is in how the response is parsed, but I don't know
any other way to do it. Maybe I could use a hidden iframe, but that is
not very elegant. I would like to know whether you know anything about
this problem; maybe it is related to DOM parsing or something like
that...

Thanks for any replies, and sorry for my English; I hope you can
forgive me!

Sincerely,
X.
 
Martin Honnen

Xandor said:
> Hey there,
>
> I wish to introduce you to a problem that i get working on Javascript
> and XPath.
>
> Be r an XMLHttpRequest object; i want to make a request through a
> webpage inside my domain (so no security problem); with r i can handle
> r.responseText and r.responseXML: sometimes i can't use responseXML
> cause of no valid syntax of the document, so I've to use responseText.
> So, creating the DOM document like this way:
>
> var doc = new DOMParser().parseFromString(r.responseText, "text/xml")

I don't see why parseFromString on responseText would work when
responseXML could not be built.

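
Whether the parse actually succeeded can be checked instead of guessed.
The following is only a minimal sketch, assuming a browser whose
DOMParser reports malformed XML by putting a parsererror element into
the returned document (as Mozilla-based browsers do). The helper name
hasParseError is hypothetical and does not come from this thread.

```javascript
// Hypothetical helper: returns true if a document produced by
// DOMParser.parseFromString(..., "text/xml") contains a <parsererror>
// element, which is how the assumed browsers signal malformed XML.
function hasParseError(doc) {
  return doc.getElementsByTagName('parsererror').length > 0;
}

// Browser-only usage; skipped where DOMParser is not available.
if (typeof DOMParser !== 'undefined') {
  var broken = new DOMParser().parseFromString(
    '<ol><li>unclosed</ol>', 'text/xml');
  console.log(hasParseError(broken));
}
```

If the check reports an error for responseText, responseXML could not
have been built from the same markup either.
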
> Then I can try to evaluate a XPath expression on doc, like:
>
> doc.evaluate(query, doc, null, 0, null)
>
> where query is a valid XPath expression. There's the problem: if I
> make a query like "//*[@id='foo']" or "//*" it works perfectly;
> otherwise if i make a query like "/html/body" or "/ol/li/a" or
> something without wildcard * included, the evaluate function returns
> null. I can't understand why if i dont use the wildcard query doesn't
> work (see: query works if I try to evaluate it in a "document" contest
> like in firebug/js console where my page is the "document" object).

Post a sample of the XML markup you parse with DOMParser. I suspect it
is a namespace problem, i.e. you have
<html xmlns="http://www.w3.org/1999/xhtml">...</html>
in your responseText. When you parse that with DOMParser, an XML DOM
document is built in which all elements belong to the XHTML namespace.
In that case you need to pass a namespace resolver to doc.evaluate and
use a prefix in the expression, e.g.
doc.evaluate('xhtml:html/xhtml:body', doc, function (prefix) {
  if (prefix === 'xhtml') return 'http://www.w3.org/1999/xhtml';
  else return null;
}, 0, null);
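
The reason the wildcard queries work is that in XPath 1.0 an unprefixed
name such as html only matches elements in no namespace, while * matches
elements in any namespace. The resolver idea can be generalized a
little. This is only a sketch, assuming a plain object that maps
prefixes to namespace URIs; makeResolver is a hypothetical name, not
part of any DOM API.

```javascript
// Hypothetical helper: builds a namespace resolver function from a
// prefix -> namespace URI map, usable as the third argument of evaluate().
function makeResolver(namespaces) {
  return function (prefix) {
    // Only own properties count, so inherited names like "toString"
    // are not mistaken for declared prefixes.
    return Object.prototype.hasOwnProperty.call(namespaces, prefix)
      ? namespaces[prefix]
      : null;
  };
}

var resolver = makeResolver({ xhtml: 'http://www.w3.org/1999/xhtml' });

// Browser-only usage; skipped where DOMParser is not available.
if (typeof DOMParser !== 'undefined') {
  var doc = new DOMParser().parseFromString(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>',
    'text/xml');
  var body = doc.evaluate('/xhtml:html/xhtml:body', doc, resolver,
    XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
  console.log(body ? body.localName : null);
}
```

The prefix used in the expression is arbitrary; it only has to match a
key in the map passed to the helper.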
 
Thomas 'PointedEars' Lahn

Martin said:
> Xandor said:
>> Be r an XMLHttpRequest object; i want to make a request through a
>> webpage inside my domain (so no security problem); with r i can handle
>> r.responseText and r.responseXML: sometimes i can't use responseXML
>> cause of no valid syntax of the document, so I've to use responseText.
>> So, creating the DOM document like this way:
>>
>> var doc = new DOMParser().parseFromString(r.responseText, "text/xml")
>
> I don't see why parseFromString on responseText would work when
> responseXML could not be built.

ACK

>> Then I can try to evaluate a XPath expression on doc, like:
>>
>> doc.evaluate(query, doc, null, 0, null)
>>
>> where query is a valid XPath expression. There's the problem: if I
>> make a query like "//*[@id='foo']" or "//*" it works perfectly;
>> otherwise if i make a query like "/html/body" or "/ol/li/a" or
>> something without wildcard * included, the evaluate function returns
>> null. I can't understand why if i dont use the wildcard query doesn't
>> work (see: query works if I try to evaluate it in a "document" contest
>> like in firebug/js console where my page is the "document" object).
>
> Post a sample of the XML markup you parse with DOMParser. I suspect it
> is a namespace problem i.e. you have
> <html xmlns="http://www.w3.org/1999/xhtml">...</html>
> in your responseText and then you parse that with DOMParser a XML DOM
> document is built with the elements all belonging to the XHTML
> namespace. In that case with doc.evaluate you need to pass in a
> namespace resolver and use a prefix e.g.
> doc.evaluate('xhtml:html/xhtml:body', doc, function(prefix) { if
> (prefix === 'xhtml') return 'http://www.w3.org/1999/xhtml'; else return
> null; }, 0, null);

FYI: The (experimental) jsx.xpath object makes this easier and the
programming more flexible. For example, the above code can be written as

jsx.xpath.evaluate('_xhtml:html/_xhtml:body', doc,
jsx.xpath.createCustomNSResolver({
_xhtml: 'http://www.w3.org/1999/xhtml'
}));

(where you might want to alias jsx.xpath or the used methods, or
jsx._import(jsx.xpath, …) them in order to increase runtime efficiency.)

This should work with implementations of DOM Level 3 XPath and MSXML alike.
The only dependency for xpath.js, which defines that object, is object.js.

<http://PointedEars.de/websvn/filedetails.php?repname=JSX&path=/trunk/xpath.js>


PointedEars
 
Xandor Leahte

> I don't see why parseFromString on responseText would work when
> responseXML could not be built.

Hey there! Thanks for the reply! Sometimes responseXML cannot be built
because of the content type of the response; right now I am trying to
force XMLHttpRequest to ask for a specific content type
(using .setRequestHeader()).

> Post a sample of the XML markup you parse with DOMParser. I suspect it
> is a namespace problem i.e. you have
>    <html xmlns="http://www.w3.org/1999/xhtml">...</html>
> in your responseText and then you parse that with DOMParser a XML DOM
> document is built with the elements all belonging to the XHTML
> namespace. In that case with doc.evaluate you need to pass in a
> namespace resolver and use a prefix e.g.
>    doc.evaluate('xhtml:html/xhtml:body', doc, function(prefix) { if
> (prefix === 'xhtml') return 'http://www.w3.org/1999/xhtml';else return
> null; }, 0, null);

This is a sample of the page that I have to parse: http://pastebin.com/njtdvcLH
I am working on an information extraction module and I have to process
the page using XPath.
I tried it in a shell like Firebug, where the document is the
Document object itself, and the XPath queries work. A sample of my code
is here: http://pastebin.com/QdXzhDba

Thanks a lot for the reply!
 
Martin Honnen

Xandor said:
> Hey there! Thanks for reply! Sometimes responseXML cannot be build
> cause of content/type of request; im handling right now to force
> XMLHttpRequest to ask a defined content/type
> (using .setRequestHeader()).

In Firefox/Mozilla you can handle that case by calling
overrideMimeType("application/xml") on the request object:
https://developer.mozilla.org/en/xmlhttprequest#overrideMimeType()_Non-standard
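
As an illustration, the call has to be made before send(). This is a
minimal sketch, not the original poster's code: requestXml and its
createXhr parameter are hypothetical names, and the createXhr argument
exists only so that a stand-in request object can be supplied outside a
browser.

```javascript
// Hypothetical helper: requests `url` and asks the browser to treat the
// response as XML, so that responseXML can be built even when the server
// sends a different Content-Type.
function requestXml(url, onDocument, createXhr) {
  var r = createXhr ? createXhr() : new XMLHttpRequest();
  r.open('GET', url, true);
  // overrideMimeType is not available in every implementation and
  // must be called before send().
  if (typeof r.overrideMimeType === 'function') {
    r.overrideMimeType('application/xml');
  }
  r.onreadystatechange = function () {
    if (r.readyState === 4 && r.status === 200) {
      // May be null or an error document if the body is not well-formed.
      onDocument(r.responseXML);
    }
  };
  r.send(null);
  return r;
}
```

Note that setRequestHeader() only changes what the request sends; it does
not change how the response is interpreted, which is what responseXML
depends on.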

> This is a sample of the page that i've to parse: http://pastebin.com/njtdvcLH
> Im just working on a information extraction module and i've to handle
> the page using XPath.
> I just tried it on a shell like Firebug when the document is the
> Document object itself and XPath queries work. A sample of my code is
> here: http://pastebin.com/QdXzhDba

Well, if you have an XML DOM document and want to run XPath against
XHTML where the elements are in the XHTML default namespace, then
createNSResolver(doc.documentElement) does not help. You will need to
implement your own namespace resolver, which is as easy as using a
function expression:
function (prefix) {
  if (prefix === 'x') {
    return 'http://www.w3.org/1999/xhtml';
  }
  else {
    return null;
  }
}
Then you have to use the chosen prefix (e.g. 'x') in your path
expressions (as in /x:html/x:body//x:a/@href).
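
Put together, the resolver and the example path can be used to collect
the link targets. This is only a sketch under the assumptions of this
thread (an XML DOM document whose elements are in the XHTML namespace);
collectHrefs is a hypothetical name.

```javascript
// Hypothetical helper: returns the href values of all XHTML <a> elements
// below /html/body in a namespace-aware XML DOM document.
function collectHrefs(doc) {
  var resolver = function (prefix) {
    return prefix === 'x' ? 'http://www.w3.org/1999/xhtml' : null;
  };
  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
  var result = doc.evaluate('/x:html/x:body//x:a/@href', doc, resolver, 7, null);
  var hrefs = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    // Each snapshot item is an attribute node; nodeValue is its value.
    hrefs.push(result.snapshotItem(i).nodeValue);
  }
  return hrefs;
}
```

In a browser, doc would be the document returned by
new DOMParser().parseFromString(r.responseText, "text/xml") or
r.responseXML.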
 
