Sven
Dear all,
I'm trying to extract data from HTML using XPath in Java.
Unfortunately, the text content of a node may contain <br/> tags, which
are not interpreted the way I need.
For example, a <p> node may contain this text:
<p>
Test1<br/>
Test2<br/>
Test3
</p>
The XPath query returns this as "Test1Test2Test3", but I need
it as "Test1\nTest2\nTest3" or "Test1 Test2 Test3".
Here's example code (Java 6):
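import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class Example {
    private static final String html = "<html><body><p>Test1<br/>Test2<br/>Test3</p></body></html>";

    public static void main( String[] args ) throws Exception {
        final XPathFactory xPathFactory = XPathFactory.newInstance();

        // String value of the first <p> element: all descendant text, concatenated
        XPath xPath = xPathFactory.newXPath();
        String value = (String)xPath.evaluate(
            "//p",
            new InputSource( new StringReader( html ) ),
            XPathConstants.STRING );
        System.out.println( value );

        // Text-node children of <p>; as a STRING only the first one is returned
        xPath = xPathFactory.newXPath();
        value = (String)xPath.evaluate(
            "//p/text()",
            new InputSource( new StringReader( html ) ),
            XPathConstants.STRING );
        System.out.println( value );

        // All child nodes of <p>; again only the first one is converted to a string
        xPath = xPathFactory.newXPath();
        value = (String)xPath.evaluate(
            "//p/node()",
            new InputSource( new StringReader( html ) ),
            XPathConstants.STRING );
        System.out.println( value );
    }
}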
This code returns:
Test1Test2Test3
Test1
Test1
Is there any way (an XPath function, etc.) to return the contents
as desired?
Thank you!
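
One possible direction, offered only as a sketch and not as a definitive answer: the javax.xml.xpath implementation bundled with the JDK is XPath 1.0, which converts a node-set to a string by taking only the first node in document order and has no string-join() function. That would explain the "Test1" results above. A workaround is to request XPathConstants.NODESET and join the text nodes in Java. The class name JoinTextNodes and the helper joinText are hypothetical names made up for this illustration, and the markup is assumed to be well-formed XML, as in the question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JoinTextNodes {
    // Same markup as in the question; assumed to be well-formed XML.
    static final String HTML =
        "<html><body><p>Test1<br/>Test2<br/>Test3</p></body></html>";

    // Hypothetical helper: joins the direct text-node children of the
    // first <p> element, putting the given separator between them.
    static String joinText(String markup, String separator) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // NODESET returns every matching text node, not only the first one.
        NodeList nodes = (NodeList) xPath.evaluate(
            "(//p)[1]/text()",
            new InputSource(new StringReader(markup)),
            XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            String text = nodes.item(i).getNodeValue().trim();
            if (text.isEmpty()) {
                continue; // skip whitespace-only nodes between tags
            }
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(text);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(joinText(HTML, " "));  // space-separated
        System.out.println(joinText(HTML, "\n")); // one line per text node
    }
}
```

Note that this sketch only looks at text nodes directly under <p>; text nested inside inline elements such as <b> would be skipped, and "(//p)[1]//text()" might be a better fit in that case.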