XPath, XMLHttpRequest and parsing DOM

Xandor Leahte · Aug 8, 2011

Hey there,

I would like to describe a problem that I ran into while working with
JavaScript and XPath.

Let r be an XMLHttpRequest object. I want to request a web page inside
my domain (so there is no security problem). With r I can use
r.responseText and r.responseXML; sometimes I can't use responseXML
because the document is not syntactically valid, so I have to use
responseText. So I create the DOM document like this:

var doc = new DOMParser().parseFromString(r.responseText, "text/xml");

Then I can try to evaluate an XPath expression on doc, like:

doc.evaluate(query, doc, null, 0, null)

where query is a valid XPath expression. Here is the problem: if I
make a query like "//*[@id='foo']" or "//*", it works perfectly;
but if I make a query like "/html/body" or "/ol/li/a", or anything
that does not include the wildcard *, the evaluate function returns
null. I can't understand why the query doesn't work when I don't use
the wildcard (note: the same query works if I evaluate it in a
"document" context, such as the Firebug/JS console, where my page is
the "document" object).

I think it's a problem with parsing the response, but I don't know how
else to do it; maybe I could use a hidden iframe, but that's not so
elegant. I would like to know if you know anything about this problem;
maybe it is a problem with DOM parsing or something like that...

Thanks for all your replies and sorry for my English, I hope you can
forgive me!

Sincerely,
X.

Martin Honnen · Aug 8, 2011

Xandor said:
Hey there,

I would like to describe a problem that I ran into while working with
JavaScript and XPath.

Let r be an XMLHttpRequest object. I want to request a web page inside
my domain (so there is no security problem). With r I can use
r.responseText and r.responseXML; sometimes I can't use responseXML
because the document is not syntactically valid, so I have to use
responseText. So I create the DOM document like this:

var doc = new DOMParser().parseFromString(r.responseText, "text/xml");

I don't see why parseFromString on responseText would work when
responseXML could not be built.

Then I can try to evaluate an XPath expression on doc, like:

doc.evaluate(query, doc, null, 0, null)

where query is a valid XPath expression. Here is the problem: if I
make a query like "//*[@id='foo']" or "//*", it works perfectly;
but if I make a query like "/html/body" or "/ol/li/a", or anything
that does not include the wildcard *, the evaluate function returns
null. I can't understand why the query doesn't work when I don't use
the wildcard (note: the same query works if I evaluate it in a
"document" context, such as the Firebug/JS console, where my page is
the "document" object).

Post a sample of the XML markup you parse with DOMParser. I suspect it
is a namespace problem, i.e. you have
<html xmlns="http://www.w3.org/1999/xhtml">...</html>
in your responseText, and when you parse that with DOMParser, an XML DOM
document is built with all the elements belonging to the XHTML
namespace. In that case you need to pass a namespace resolver to
doc.evaluate and use a prefix in the expression, e.g.

doc.evaluate('xhtml:html/xhtml:body', doc, function (prefix) {
  if (prefix === 'xhtml') return 'http://www.w3.org/1999/xhtml';
  else return null;
}, 0, null);
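
To make the difference concrete, here is a minimal sketch of the two cases. The helper name createPrefixResolver and the sample markup are made up for illustration; only DOMParser and doc.evaluate come from the discussion above. In XPath 1.0 an unprefixed name such as html only matches elements in no namespace, while * matches elements in any namespace, which would explain why the wildcard queries work.

```javascript
// Hypothetical helper (not part of any library): build a namespace
// resolver function from a plain prefix -> namespace URI map.
function createPrefixResolver(prefixes) {
  return function (prefix) {
    return Object.prototype.hasOwnProperty.call(prefixes, prefix)
      ? prefixes[prefix]
      : null; // unknown prefix
  };
}

var resolver = createPrefixResolver({
  xhtml: "http://www.w3.org/1999/xhtml"
});

// The DOM part only runs where a DOM is available (e.g. in a browser).
if (typeof DOMParser !== "undefined") {
  // Assumed sample markup, standing in for r.responseText.
  var markup = '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>';
  var doc = new DOMParser().parseFromString(markup, "text/xml");

  // 9 is the value of XPathResult.FIRST_ORDERED_NODE_TYPE.
  var plain = doc.evaluate("/html/body", doc, null, 9, null);
  var prefixed = doc.evaluate("/xhtml:html/xhtml:body", doc, resolver, 9, null);

  console.log(plain.singleNodeValue);    // null: no "html" element in no namespace
  console.log(prefixed.singleNodeValue); // the body element
}
```

The prefix used in the expression is arbitrary; it only has to be one the resolver maps to the XHTML namespace URI.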

Thomas 'PointedEars' Lahn · Aug 8, 2011

Martin said:
Xandor said:

Let r be an XMLHttpRequest object. I want to request a web page inside
my domain (so there is no security problem). With r I can use
r.responseText and r.responseXML; sometimes I can't use responseXML
because the document is not syntactically valid, so I have to use
responseText. So I create the DOM document like this:

var doc = new DOMParser().parseFromString(r.responseText, "text/xml");

I don't see why parseFromString on responseText would work when
responseXML could not be built.
ACK

Then I can try to evaluate an XPath expression on doc, like:

doc.evaluate(query, doc, null, 0, null)

where query is a valid XPath expression. Here is the problem: if I
make a query like "//*[@id='foo']" or "//*", it works perfectly;
but if I make a query like "/html/body" or "/ol/li/a", or anything
that does not include the wildcard *, the evaluate function returns
null. I can't understand why the query doesn't work when I don't use
the wildcard (note: the same query works if I evaluate it in a
"document" context, such as the Firebug/JS console, where my page is
the "document" object).

Post a sample of the XML markup you parse with DOMParser. I suspect it
is a namespace problem, i.e. you have
<html xmlns="http://www.w3.org/1999/xhtml">...</html>
in your responseText, and when you parse that with DOMParser, an XML DOM
document is built with all the elements belonging to the XHTML
namespace. In that case you need to pass a namespace resolver to
doc.evaluate and use a prefix in the expression, e.g.

doc.evaluate('xhtml:html/xhtml:body', doc, function (prefix) {
  if (prefix === 'xhtml') return 'http://www.w3.org/1999/xhtml';
  else return null;
}, 0, null);

FYI: The (experimental) jsx.xpath object makes this easier and the
programming more flexible. For example, the above code can be written as

jsx.xpath.evaluate('_xhtml:html/_xhtml:body', doc,
  jsx.xpath.createCustomNSResolver({
    _xhtml: 'http://www.w3.org/1999/xhtml'
  }));

(where you might want to alias jsx.xpath or the used methods, or
jsx._import(jsx.xpath, …) them in order to increase runtime efficiency.)

This should work with implementations of DOM Level 3 XPath and MSXML alike.
The only dependency for xpath.js, which defines that object, is object.js.

<http://PointedEars.de/websvn/filedetails.php?repname=JSX&path=/trunk/xpath.js>

PointedEars

Xandor Leahte · Aug 10, 2011

I don't see why parseFromString on responseText would work when
responseXML could not be built.

Hey there! Thanks for the reply! Sometimes responseXML cannot be built
because of the content type of the request; right now I'm trying to
force XMLHttpRequest to ask for a defined content type
(using .setRequestHeader()).

Post a sample of the XML markup you parse with DOMParser. I suspect it
is a namespace problem, i.e. you have
<html xmlns="http://www.w3.org/1999/xhtml">...</html>
in your responseText, and when you parse that with DOMParser, an XML DOM
document is built with all the elements belonging to the XHTML
namespace. In that case you need to pass a namespace resolver to
doc.evaluate and use a prefix in the expression, e.g.

doc.evaluate('xhtml:html/xhtml:body', doc, function (prefix) {
  if (prefix === 'xhtml') return 'http://www.w3.org/1999/xhtml';
  else return null;
}, 0, null);

This is a sample of the page that I have to parse: http://pastebin.com/njtdvcLH
I'm working on an information extraction module and I have to handle
the page using XPath.
I tried it in a shell like Firebug, where the document is the
Document object itself, and the XPath queries work. A sample of my code
is here: http://pastebin.com/QdXzhDba

Thanks a lot for the reply!

Martin Honnen · Aug 12, 2011

Xandor said:
Hey there! Thanks for the reply! Sometimes responseXML cannot be built
because of the content type of the request; right now I'm trying to
force XMLHttpRequest to ask for a defined content type
(using .setRequestHeader()).

You can handle that case with Firefox/Mozilla with
overrideMimeType("application/xml")
https://developer.mozilla.org/en/xmlhttprequest#overrideMimeType()_Non-standard
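
A minimal sketch of how that could look, assuming a same-origin URL. The function name fetchAsXml and its createXhr parameter are invented for this example (the parameter exists only so the sketch can be tried without a browser). overrideMimeType has to be called before send(), and since not every implementation provides it, the sketch feature-tests first.

```javascript
// Hypothetical helper: request `url` and pass the parsed XML document
// (xhr.responseXML) to `callback` once the request has completed.
// `createXhr` is an optional factory, used here only for testing.
function fetchAsXml(url, callback, createXhr) {
  var xhr = createXhr ? createXhr() : new XMLHttpRequest();
  xhr.open("GET", url, true);

  // Ask the browser to treat the response as XML regardless of the
  // Content-Type header sent by the server. Must happen before send().
  if (typeof xhr.overrideMimeType === "function") {
    xhr.overrideMimeType("application/xml");
  }

  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) {
      // responseXML is only usable if the body is well-formed XML.
      callback(xhr.responseXML);
    }
  };
  xhr.send(null);
  return xhr;
}
```

Note that this only changes how the response is interpreted; if the markup is not well-formed XML, responseXML will still not be a usable document.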

This is a sample of the page that I have to parse: http://pastebin.com/njtdvcLH
I'm working on an information extraction module and I have to handle
the page using XPath.
I tried it in a shell like Firebug, where the document is the
Document object itself, and the XPath queries work. A sample of my code
is here: http://pastebin.com/QdXzhDba

Well, if you have an XML DOM document and want to run XPath against XHTML
where the elements are in the XHTML default namespace, then using
createNSResolver(doc.documentElement) does not help. You will need to
implement your own namespace resolver, which is as easy as using a
function expression:

function (prefix) {
  if (prefix === 'x') {
    return 'http://www.w3.org/1999/xhtml';
  }
  else {
    return null;
  }
}

Then you have to use the chosen prefix (e.g. 'x') in your path
expressions (as in /x:html/x:body//x:a/@href).
