XPath, XMLHttpRequest and parsing DOM

Discussion in 'Javascript' started by Xandor Leahte, Aug 8, 2011.

  1. Hey there,

    I'd like to describe a problem I've run into while working with
    JavaScript and XPath.

    Let r be an XMLHttpRequest object. I want to request a web page
    inside my own domain (so there is no security problem). With r I can
    use r.responseText and r.responseXML; sometimes I can't use
    responseXML because the document's syntax is not valid, so I have to
    use responseText. So I create the DOM document like this:

    var doc = new DOMParser().parseFromString(r.responseText, "text/xml");

    Then I can try to evaluate an XPath expression on doc, like:

    doc.evaluate(query, doc, null, 0, null)

    where query is a valid XPath expression. Here's the problem: if I
    make a query like "//*[@id='foo']" or "//*" it works perfectly;
    but if I make a query like "/html/body" or "/ol/li/a", or anything
    that does not include the wildcard *, the evaluate function returns
    null. I can't understand why the query doesn't work when I don't use
    the wildcard (note: the same query works if I evaluate it in a
    "document" context, like in the Firebug/JS console, where my page is
    the "document" object).

    I think it's a problem with how the response is parsed, but I don't
    know another way to do it; maybe I could use a hidden iframe, but
    that's not very elegant. I'd like to know if you know anything about
    this problem, maybe something to do with DOM parsing...

    Thanks for all your replies and sorry for my English, I hope you can
    forgive me!

    Sincerely,
    X.
     
    Xandor Leahte, Aug 8, 2011
    #1
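A possible explanation for the behaviour described in the post above: the wildcard * matches elements in any namespace, while a bare name such as html only matches elements in no namespace. As a rough sketch (not something the original poster used), a namespace-agnostic path can be built with the standard XPath function local-name(). The helper name localNamePath is hypothetical.

```javascript
// Hypothetical helper: builds an absolute XPath that matches elements by
// local name only, whatever namespace they are in.
// Only handles a plain list of element names (no predicates, no axes).
function localNamePath(names) {
  return names.map(function (name) {
    return "/*[local-name()='" + name + "']";
  }).join('');
}

var query = localNamePath(['html', 'body']);
// query is "/*[local-name()='html']/*[local-name()='body']"

// Browser-only usage (DOMParser and evaluate do not exist in plain Node.js):
// var doc = new DOMParser().parseFromString(r.responseText, 'text/xml');
// var body = doc.evaluate(query, doc, null,
//     XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
```

This trades precision for convenience: it would also match an element named body from an unrelated namespace, so an explicit namespace resolver is usually the stricter choice.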

  2. I don't see why parseFromString on responseText would work when
    responseXML could not be built.
    Post a sample of the XML markup you parse with DOMParser. I suspect it
    is a namespace problem, i.e. you have
    <html xmlns="http://www.w3.org/1999/xhtml">...</html>
    in your responseText, and when you parse that with DOMParser, an XML
    DOM document is built with all elements belonging to the XHTML
    namespace. In that case you need to pass a namespace resolver to
    doc.evaluate and use a prefix, e.g.
    doc.evaluate('xhtml:html/xhtml:body', doc, function (prefix) {
      if (prefix === 'xhtml') return 'http://www.w3.org/1999/xhtml';
      else return null;
    }, 0, null);
     
    Martin Honnen, Aug 8, 2011
    #2
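The namespace resolver suggested in the reply above can be sketched as a small reusable function. This is only an illustration under assumptions: makeNSResolver and selectBody are hypothetical names, and the selectBody part only runs in a browser that provides DOMParser and DOM Level 3 XPath.

```javascript
// Hypothetical helper: turns a plain { prefix: namespaceURI } map into the
// resolver function that evaluate() expects. Unknown prefixes give null.
function makeNSResolver(map) {
  return function (prefix) {
    return Object.prototype.hasOwnProperty.call(map, prefix)
      ? map[prefix]
      : null;
  };
}

var XHTML_NS = 'http://www.w3.org/1999/xhtml';
var resolver = makeNSResolver({ xhtml: XHTML_NS });

// Browser-only part: parse the text as XML and select the XHTML body.
// Returns the body element, or null if nothing matches.
function selectBody(xmlText) {
  var doc = new DOMParser().parseFromString(xmlText, 'text/xml');
  var result = doc.evaluate('/xhtml:html/xhtml:body', doc, resolver,
      XPathResult.FIRST_ORDERED_NODE_TYPE, null);
  return result.singleNodeValue;
}
```

The prefix used in the path (xhtml here) is arbitrary; it only has to match a key in the map, not anything written in the parsed document.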

  3. FYI: The (experimental) jsx.xpath object makes this easier and the
    programming more flexible. For example, the above code can be written as

    jsx.xpath.evaluate('_xhtml:html/_xhtml:body', doc,
      jsx.xpath.createCustomNSResolver({
        _xhtml: 'http://www.w3.org/1999/xhtml'
      }));

    (where you might want to alias jsx.xpath or the used methods, or
    jsx._import(jsx.xpath, …) them in order to increase runtime efficiency.)

    This should work with implementations of DOM Level 3 XPath and MSXML alike.
    The only dependency for xpath.js, which defines that object, is object.js.

    <http://PointedEars.de/websvn/filedetails.php?repname=JSX&path=/trunk/xpath.js>


    PointedEars
     
    Thomas 'PointedEars' Lahn, Aug 8, 2011
    #3
  4. Hey there! Thanks for the reply! Sometimes responseXML cannot be
    built because of the Content-Type of the response; right now I'm
    trying to make XMLHttpRequest ask for a specific Content-Type
    (using .setRequestHeader()).
    This is a sample of the page that I have to parse: http://pastebin.com/njtdvcLH
    I'm working on an information extraction module and I have to process
    the page using XPath.
    I just tried it in a shell like Firebug, where the document is the
    Document object itself, and the XPath queries work. A sample of my
    code is here: http://pastebin.com/QdXzhDba

    Thanks a lot for the reply!
     
    Xandor Leahte, Aug 10, 2011
    #4
  5. In Firefox/Mozilla you can handle that case with
    overrideMimeType("application/xml")
    https://developer.mozilla.org/en/xmlhttprequest#overrideMimeType()_Non-standard

    If you have an XML DOM document and want to run XPath against XHTML
    where the elements are in the XHTML default namespace, then
    createNSResolver(doc.documentElement) does not help. You will need to
    implement your own namespace resolver, which is as easy as using a
    function expression:
    function (prefix) {
      if (prefix === 'x') {
        return 'http://www.w3.org/1999/xhtml';
      }
      else {
        return null;
      }
    }
    Then you have to use the chosen prefix (e.g. 'x') in your path
    expressions (as in /x:html/x:body//x:a/@href).
     
    Martin Honnen, Aug 12, 2011
    #5
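Putting the two suggestions from the reply above together, a request could look roughly like the sketch below. It assumes a browser with XMLHttpRequest and DOM Level 3 XPath; fetchHrefs and its callback signature are hypothetical, and overrideMimeType is feature-tested because the reply only mentions it for Firefox/Mozilla.

```javascript
var XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Custom resolver: maps the chosen prefix 'x' to the XHTML namespace.
function xhtmlResolver(prefix) {
  return prefix === 'x' ? XHTML_NS : null;
}

// Browser-only sketch: fetch a page, have it parsed as XML, and collect
// the href values of all XHTML links. callback(error, hrefs).
function fetchHrefs(url, callback) {
  var r = new XMLHttpRequest();
  r.open('GET', url, true);
  // Treat the response as XML regardless of the Content-Type the server sends.
  if (r.overrideMimeType) {
    r.overrideMimeType('application/xml');
  }
  r.onreadystatechange = function () {
    if (r.readyState !== 4) return;
    var doc = r.responseXML;
    if (r.status !== 200 || !doc) {
      callback(new Error('No XML document available'), []);
      return;
    }
    var result = doc.evaluate('/x:html/x:body//x:a/@href', doc,
        xhtmlResolver, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    var hrefs = [];
    for (var i = 0; i < result.snapshotLength; i++) {
      hrefs.push(result.snapshotItem(i).value); // attribute node value
    }
    callback(null, hrefs);
  };
  r.send(null);
}
```

Overriding the MIME type only helps if the response is well-formed XML; markup that is not well-formed will still not produce a usable document.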