java xerces xpath fails with namespace

jacksu

I have a simple program that runs an XPath expression with Xerces 1_2_7:

XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();


XPathExpression xp = xPath.compile(strXpr);
System.out.println(xp.evaluate(new InputSource(new
FileInputStream("a.xml"))));

If a.xml is
<?xml version="1.0"?><root><parent><son>theTextValue</son></parent></root>

and strXpr is /root/parent/son/text(),
I get the correct value back: "theTextValue".

But if the parent has a namespace, as in a SOAP message, it always fails,
e.g.:
<?xml version="1.0"?><soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soap:Body><son>theValue</son>
</soap:Body></soap:Envelope>

strXpr is /soap:Envelope/soap:Body/son/text()

An empty string is returned.

Any suggestions are welcome.

Thanks.
 
Joe Kesselman

XPath is namespace-sensitive. To correctly search a namespaced document,
you must use prefixes in your XPath and provide bindings from those
prefixes to the appropriate namespace URIs.

If you really insist on doing a namespace-insensitive search, it's
possible by using kluge-arounds such as node()[name()="foo"] ... but
REALLY not recommended. Namespaces are used because they're a meaningful
distinction. Don't attempt to ignore or bypass them.
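
To illustrate why the namespace-insensitive workaround is risky, here is a minimal sketch. It assumes a made-up document in which two namespaces (urn:a and urn:b) each define an item element. It uses local-name() rather than the name() test quoted above, because name() also includes any prefix. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class LocalNameWorkaround {
    // Hypothetical document: two namespaces that both define an <item> element.
    static final String XML =
        "<root xmlns:a='urn:a' xmlns:b='urn:b'>"
        + "<a:item>from a</a:item><b:item>from b</b:item></root>";

    // Counts elements by local name only, ignoring which namespace they belong to.
    static double countByLocalName(String localName) throws XPathExpressionException {
        XPath xPath = XPathFactory.newInstance().newXPath();
        String expr = "count(//*[local-name()='" + localName + "'])";
        return (Double) xPath.evaluate(
            expr, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Both elements are counted, although they belong to different vocabularies.
        System.out.println(countByLocalName("item"));
    }
}
```

The search cannot tell a:item from b:item, which is exactly the distinction the namespaces were meant to preserve.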
 
Joe Kesselman

jacksu said:
and my xpath is:
//soap:Envelope/soap:Body/text()
or
//Envelope/Body/text()

The second won't work. The first will, *if* you've told your XPath
processor that the soap: prefix maps to
"http://schemas.xmlsoap.org/soap/envelope/"

How you do that depends on the processor. If you're using the XPath
within an XSLT stylesheet, you just need to make sure soap: has been
properly declared as a namespace at a point where it will be inherited
by the statement which is executing the XPath; the usual practice is to
define most namespaces all the way up at the top-level xsl:stylesheet
element to make sure they're available throughout the document.
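
As a rough illustration of the stylesheet case, the sketch below runs a small XSLT 1.0 stylesheet from Java through the standard javax.xml.transform API. The stylesheet text, the class name and the trimmed SOAP message are assumptions made for the example. The point is only that soap: is declared on xsl:stylesheet, so the XPath in the select attribute inherits it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetNamespaceExample {
    // SOAP message modelled on the one in the original question (trimmed).
    static final String XML =
        "<soap:Envelope xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>"
        + "<soap:Body><son>theValue</son></soap:Body></soap:Envelope>";

    // The soap prefix is declared once, on the top-level xsl:stylesheet element,
    // so every XPath expression in the stylesheet can use it.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='/soap:Envelope/soap:Body/son'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML and returns the text output.
    static String transform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XML));
    }
}
```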

If you're using an XPath API of some sort, check its docs to find out
how to tell it the mapping between prefixes and namespace URIs.
 
Soren Kuula

jacksu wrote:

Hi
//soap:Envelope/soap:Body/text()
or
//Envelope/Body/text()

Neither returns a result.

Hmm... It should fail with a loud bang if you used a prefix in the XPath
but did not bind it to a namespace.

Try looking around in the documentation for that XPath evaluation thing (I
don't know it) for something called a namespace environment, context or
something like that. Once you have found it, you will need to call something
like bind("soap", "http://namespaceURI/of/SOAP") on it before evaluating.


Soren
 
Martin Honnen

jacksu said:
I have a simple program to run xpath with xerces 1_2_7

XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();


XPathExpression xp = xPath.compile(strXpr);
System.out.println(xp.evaluate(new InputSource(new
FileInputStream("a.xml"))));

That looks to me like Java using the JAXP XPath API from Java 1.5.
Xerces-Java 2.6 or 2.7 might implement that, but surely not 1_2_7.

If you want to use namespaces then you need to pass in an object
implementing javax.xml.namespace.NamespaceContext that implements the
methods to resolve prefixes to namespace URIs and the other way round.
Then call
xPath.setNamespaceContext(yourObjectImplementingNamespaceContext);
on the XPath object before you compile or evaluate expressions.
 
Joe Kesselman

Working example for the Apache/Xalan code:

public static void main(String[] args) {
    String strXpr = "/a:foo/a:bar";
    XPathFactory factory = XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    try {
        // Anonymous hardcoded Namespace Context:
        NamespaceContext MyNSC=new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                if (prefix.equals("a")) return "urn:a";
                else return XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String namespace) {
                return null; // Just dummied out; Xalan doesn't need it.
            }
            public Iterator getPrefixes(String namespace) {
                return null; // Just dummied out; Xalan doesn't need it.
            }
        };

        xPath.setNamespaceContext(mynsc);
        XPathExpression xp = xPath.compile(strXpr);
        System.out.println(xp.evaluate(
            new InputSource(new FileInputStream("a.xml"))));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
 
Joe Kesselman

Whups. Typo crept in while recopying this into the newsgroup; obviously,
MyNSC and mynsc were supposed to be the same variable. That's what I get
for trying to simplify the example on the fly.
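
With the variable name made consistent, and applied to the SOAP message from the original question, the example might look like the sketch below. The class name, the inlined XML string and the evaluate helper are assumptions for illustration. The reverse-lookup methods are filled in minimally rather than returning null.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class SoapXPathExample {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // The SOAP message from the original question, inlined so the sketch is self-contained.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">"
        + "<soap:Body><son>theValue</son></soap:Body></soap:Envelope>";

    // Evaluates the expression against the XML with the soap prefix bound.
    static String evaluate(String expression, String xml) throws XPathExpressionException {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Hardcoded context that knows only the soap prefix.
        NamespaceContext nsc = new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("prefix is null");
                return "soap".equals(prefix) ? SOAP_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String namespaceURI) {
                return SOAP_NS.equals(namespaceURI) ? "soap" : null;
            }
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singleton(prefix).iterator();
            }
        };
        // The context must be set on the XPath object before compiling.
        xPath.setNamespaceContext(nsc);
        XPathExpression xp = xPath.compile(expression);
        return xp.evaluate(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(evaluate("/soap:Envelope/soap:Body/son/text()", XML));
    }
}
```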
 
jacksu

Thanks a lot.

It works fine when every element is prefixed, but there seems to be a
problem when prefixed and unprefixed elements are mixed,
such as:
<?xml version='1.0' encoding='UTF-8'?><soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soap:Body><mynode
xmlns="http://mynamespace">mytext</mynode></soap:Body></soap:Envelope>

I tried:
//soap:Envelope/soap:Body/mynode/text()

If I give mynode a prefix, then everything works fine.

Any more suggestions?

Thanks.
 
Martin Honnen

jacksu said:
<?xml version='1.0' encoding='UTF-8'?><soap:Envelope
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soap:Body><mynode
xmlns="http://mynamespace">mytext</mynode></soap:Body></soap:Envelope>

I tried:
//soap:Envelope/soap:Body/mynode/text()

If I give mynode a prefix, then everything works fine.

You need to use a prefix in the XPath expression. There is no need to
change the input XML, but the XPath needs a prefix bound to
http://mynamespace to select those elements.
See
<http://www.faqts.com/knowledge_base/view.phtml/aid/34022/fid/616>
Simply make sure you use a prefix, e.g.
pf1:mynode
and that your NamespaceContext returns the URI http://mynamespace for that
prefix.
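
A minimal sketch of that advice, assuming the JAXP XPath API used earlier in the thread. The prefix pf1 exists only in the XPath and in the NamespaceContext, and the document itself is unchanged (apart from dropping the unused xsd and xsi declarations). The class name and helper method are hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class DefaultNamespaceExample {
    // Mixed document: soap: is prefixed, mynode sits in a default namespace.
    static final String XML =
        "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<soap:Body><mynode xmlns=\"http://mynamespace\">mytext</mynode></soap:Body>"
        + "</soap:Envelope>";

    static String evaluate(String expression) throws XPathExpressionException {
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                if ("soap".equals(prefix)) return "http://schemas.xmlsoap.org/soap/envelope/";
                // pf1 is an arbitrary prefix chosen for the XPath; the document never uses it.
                if ("pf1".equals(prefix)) return "http://mynamespace";
                return XMLConstants.NULL_NS_URI;
            }
            // Reverse lookups are stubbed out in this sketch.
            public String getPrefix(String namespaceURI) {
                return null;
            }
            public Iterator<String> getPrefixes(String namespaceURI) {
                return Collections.<String>emptyIterator();
            }
        });
        return xPath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Prefixed step: selects the element in http://mynamespace.
        System.out.println(evaluate("//soap:Envelope/soap:Body/pf1:mynode/text()"));
        // Unprefixed step: looks for mynode in no namespace and finds nothing.
        System.out.println(evaluate("//soap:Envelope/soap:Body/mynode/text()"));
    }
}
```

In XPath 1.0 an unprefixed name test always means "no namespace", so the default namespace declared in the document is never picked up by the expression.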
 
Greg

I believe that I have prefixed my xpath properly, but I get an
XPathExpressionException when I evaluate it.

The xpath looks like this (note the "xhtml" prefix):


/xhtml:html/xhtml:body//xhtml:div[@class='reviewlist']


The source XML document is a garden-variety XHTML web page whose root
html element declares the document's default namespace as being
http://www.w3.org/1999/xhtml.


<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">


As I understand it, writing the html element like this declares that it
belongs to the http://www.w3.org/1999/xhtml namespace. So, to evaluate
an xpath for this document, my javax.xml.xpath.XPath must have a
namespace context set. Here is my implementation of the
javax.xml.namespace.NamespaceContext interface (note this
implementation accommodates multiple namespaces by use of a
java.util.HashMap - a tip from
http://www.onjava.com/pub/a/onjava/2005/01/12/xpath.html - lest a part
of the web page be in another language and its element has, say,
xml:lang="fr").


import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import javax.xml.namespace.NamespaceContext;

public class NamespaceContextImpl implements NamespaceContext {

private Map map;

/**
* A constructor that instantiates a new java.util.HashMap in which
* prefixes will be mapped to namespace URIs.
*/
public NamespaceContextImpl() {
map = new HashMap();
}

/**
* Adds a prefix and namespace URI pair to this
* NamespaceContextImpl's HashMap.
*
* This method is not inherited from the implemented
* NamespaceContext interface.
*/
public void setNamespaceURI(String prefix, String namespaceURI) {
map.put(prefix, namespaceURI);
}

/**
* Gets the namespace URI mapped to the given
* prefix from this NamespaceContextImpl's HashMap.
*
* This method is inherited from the implemented NamespaceContext
* interface.
*/
public String getNamespaceURI(String prefix) {
return (String)map.get(prefix);
}

/**
* Gets the prefix to which the given namespace
* URI is mapped in this NamespaceContextImpl's
* HashMap.
*
* This method is inherited from the implemented
* NamespaceContext interface.
*/
public String getPrefix(String namespaceURI) {

Set keys = map.keySet();

// Loop through the prefixes until one is found
// whose corresponding namespace URI matches
// the namespace URI passed to this method.
// Return that prefix.
for(Iterator i = keys.iterator(); i.hasNext(); ) {
String prefix = (String)i.next();
String uri = (String)map.get(prefix);
if(uri.equals(namespaceURI)) return prefix;
}

// If no prefix is found with a namespace URI matching the
// namespace URI passed to this method, return null.
return null;
}

/**
* This method is inherited from the implemented
* NamespaceContext interface.
*/
public Iterator getPrefixes(String namespaceURI) {
return null;
}

}
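
For comparison, here is a hedged sketch of the same idea using generics and following the NamespaceContext contract a little more closely: unbound prefixes yield XMLConstants.NULL_NS_URI instead of null, and getPrefixes returns an iterator instead of null. The class name MapNamespaceContext and the bind method are made up for this example, and the reserved xml and xmlns prefixes are not special-cased.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

class MapNamespaceContext implements NamespaceContext {
    // Maps each prefix to its namespace URI.
    private final Map<String, String> prefixToUri = new HashMap<>();

    // Registers a binding; returns this so that calls can be chained.
    public MapNamespaceContext bind(String prefix, String namespaceURI) {
        prefixToUri.put(prefix, namespaceURI);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        // The interface asks for NULL_NS_URI ("") rather than null for unbound prefixes.
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        // Any one of the matching prefixes will do; null if there is none.
        Iterator<String> it = getPrefixes(namespaceURI);
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) {
                matches.add(e.getKey());
            }
        }
        return matches.iterator();
    }
}
```

It would be installed with something like xpath.setNamespaceContext(new MapNamespaceContext().bind("xhtml", "http://www.w3.org/1999/xhtml"));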


I then use this NamespaceContext like this:


javax.xml.xpath.XPathFactory factory = XPathFactory.newInstance();
javax.xml.xpath.XPath xpath = factory.newXPath();

NamespaceContextImpl nsctx = new NamespaceContextImpl();
nsctx.setNamespaceURI("xml", "http://www.w3.org/XML/1998/namespace");
nsctx.setNamespaceURI("xhtml", "http://www.w3.org/1999/xhtml");
xpath.setNamespaceContext(nsctx);


For the evaluate method, I need an org.xml.sax.InputSource, which I get
like this:


java.net.URL url = new
java.net.URL("http://www.mywebpage.com/index.html");
java.net.HttpURLConnection huc =
(java.net.HttpURLConnection)url.openConnection();
org.xml.sax.InputSource ins = new
org.xml.sax.InputSource(huc.getInputStream());


Now I can evaluate the xpath and, I hope, get an org.w3c.dom.NodeList:


org.w3c.dom.NodeList nl = (
org.w3c.dom.NodeList)xpath.evaluate(
"/xhtml:html/xhtml:body//xhtml:div[@class='reviewlist']",
ins,
javax.xml.xpath.XPathConstants.NODESET
);



The result I get from calling this method, though, is a
javax.xml.xpath.XPathExpressionException, I think. At least, when I
catch the exception and call its toString() method, that's what it says
it is. When I call its getCause() method, though, I get:

java.net.ConnectException: Connection timed out.


Its stack trace looks like this:


com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:475)
my.package.WebsiteHandler.startElement(Unknown Source)
org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
org.cochrane.sitebuilder.servlet.WebsiteBuilder.parseWebSiteLayoutXML(Unknown Source)
org.cochrane.sitebuilder.servlet.WebsiteBuilder.service(Unknown Source)
javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:81)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
org.jboss.web.tomcat.security.CustomPrincipalValve.invoke(CustomPrincipalValve.java:39)
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:153)
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:59)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744)
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
org.apache.tomcat.util.net.MasterSlaveWorkerThread.run(MasterSlaveWorkerThread.java:112)
java.lang.Thread.run(Thread.java:595)




Is it an XPathExpressionException? Is it a java.net.ConnectException?
What's going on?
 
Joe Kesselman

Greg said:
java.net.ConnectException: Connection timed out.

That's not a namespace problem, or shouldn't be. Namespaces are just
strings in URI format; the system never attempts to retrieve anything
from that URI. (At least, not unless you're getting involved in the
Semantic Web world, which is a different set of issues.) Hence,
namespaces don't have connections and don't time out.

It looks like the problem is during your attempt to retrieve your source
document, since it's reporting that the problem is in the parser.
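
One way to make that distinction visible is to parse the document yourself and hand the resulting DOM to the XPath API, so that retrieval or parsing failures surface as IOException or SAXException instead of being wrapped in an XPathExpressionException. The sketch below assumes that approach. The class name, method names and the tiny in-memory page standing in for the network stream are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class ParseThenEvaluate {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Step 1: retrieval and parsing. Stream and well-formedness problems
    // surface here as IOException or SAXException.
    static Document parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for prefixed XPath steps to match
        return dbf.newDocumentBuilder().parse(in);
    }

    // Step 2: XPath evaluation over the already-parsed DOM; no I/O happens here.
    static NodeList reviewLists(Document doc) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "xhtml" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return XHTML_NS.equals(uri)
                    ? Collections.singleton("xhtml").iterator()
                    : Collections.<String>emptyIterator();
            }
        });
        return (NodeList) xpath.evaluate(
            "/xhtml:html/xhtml:body//xhtml:div[@class='reviewlist']",
            doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for huc.getInputStream(): a small in-memory page (an assumption).
        String page = "<html xmlns='" + XHTML_NS + "'><body>"
            + "<div class='reviewlist'>r1</div><div class='other'>x</div></body></html>";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        System.out.println(reviewLists(parse(in)).getLength());
    }
}
```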
 
Greg

It looks like the problem is during your attempt to retrieve your source
document, since it's reporting that the problem is in the parser.
So there's something wrong with this source:

http://www.cochrane.org/reviews/en/topics/60_new.html

?

I certainly have no trouble retrieving that page in a browser (without
any noticeable connection delay). And I'm told by
http://validator.w3.org that it's "valid" XHTML, so I'm under the
impression that it's a well-formed XML document.
 
Greg

Thanks for your response and suggestions, Joe.

The sum of the networking code, I believe, is just that which produces
an org.xml.sax.InputSource for the javax.xml.xpath.XPath's
evaluate(String expression, InputSource source, QName returnType)
method:

java.net.URL url = new
java.net.URL("http://www.exampleurl.com/index.html");
java.net.HttpURLConnection huc =
(java.net.HttpURLConnection)url.openConnection();
org.xml.sax.InputSource ins = new
org.xml.sax.InputSource(huc.getInputStream());

I'm not a master of networking code. The above code is something which
I more or less copy-and-pasted and which has "just worked" (with
several different URLs) for weeks up until a few days ago.

Just fyi, an update:

Every now and again the thrown XPathExpressionException's getCause()
method returns:

org.xml.sax.SAXParseException: Premature end of file

instead of

java.net.ConnectException: Connection timed out.

It's interesting (and frustrating) given that I can detect no change in
my code or in the XML source document between when this was working
last week and a few days ago when this XPathExpressionException started
showing up.

Here's the stack trace of the org.xml.sax.SAXParseException returned
from the getCause() method of the XPathExpressionException:

org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:468)
org.cochrane.sitebuilder.contenthandler.WebsiteHandler.startElement(Unknown Source)
org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
org.cochrane.sitebuilder.servlet.WebsiteBuilder.parseWebSiteLayoutXML(Unknown Source)
org.cochrane.sitebuilder.servlet.WebsiteBuilder.service(Unknown Source)
javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:81)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
org.jboss.web.tomcat.security.CustomPrincipalValve.invoke(CustomPrincipalValve.java:39)
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:153)
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:59)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744)
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
org.apache.tomcat.util.net.MasterSlaveWorkerThread.run(MasterSlaveWorkerThread.java:112)
java.lang.Thread.run(Thread.java:595)
 
Joe Kesselman

Greg said:
I'm not a master of networking code. The above code is something which
I more or less copy-and-pasted and which has "just worked" (with
several different URLs) for weeks up until a few days ago.

It may be a malfunction on the server's end rather than yours, then, or in
the network. Timeouts and premature end of file could be caused by either.
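
Whichever end is at fault, the client can at least bound how long it waits and close the stream deterministically. The following is only a sketch under those assumptions: the class name, the timeout value and the command-line URL are made up, and it does not diagnose the failure seen in this thread. setConnectTimeout and setReadTimeout are standard java.net.URLConnection methods.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class TimedFetch {
    // Opens the URL with explicit timeouts and parses the response into a DOM.
    static Document fetch(URL url, int timeoutMillis)
            throws IOException, SAXException, ParserConfigurationException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis); // give up if the connection cannot be opened in time
        conn.setReadTimeout(timeoutMillis);    // give up if the response stalls
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // try-with-resources closes the stream once parsing has finished or failed.
        try (InputStream in = conn.getInputStream()) {
            return dbf.newDocumentBuilder().parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: the URL comes from the command line, not from the thread.
        Document doc = fetch(URI.create(args[0]).toURL(), 10_000);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The returned Document can then be passed to xpath.evaluate in place of an InputSource.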
 
Greg

Well, the problem has gone away as mysteriously as it appeared.

For posterity, I can report only that the cause of the
XPathExpressionException was also occasionally "java.io.IOException:
stream is closed".

For now, given the variety of the causes, I'm pinning the ultimate
cause on my unkempt code, which I have now tidied up and documented
substantially.

Thanks for your help, Joe.
 
