How do you get a class attribute from a span tag?

Daryn

Hi,

I am using org.w3c.dom to extract values from some HTML.

The HTML contains a span tag like this: <SPAN class='item c123'>-</SPAN>

My code is like this:

StringReader reader = new StringReader(html());
InputSource inputSource = new InputSource(reader);
SAX2DOM sax2dom = new SAX2DOM();

Parser tagSoupParser = new Parser();
tagSoupParser.setContentHandler(sax2dom);
tagSoupParser.setFeature(Parser.namespacesFeature, false);
tagSoupParser.parse(inputSource);

Document document = (Document) sax2dom.getDOM();
NodeList trElements = document.getElementsByTagName("span");

Node node = trElements.item(0);


I would like to do something like this:
((Element)node).getAttributes().getNamedItem("class")

But that throws a ClassCastException: "com.sun.org.apache.xerces.internal.dom.TextImpl
cannot be cast to org.w3c.dom.Element".

How can I get the value of the class attribute in that span tag?

Thanks in advance!
 
Daniele Futtorovic

Sounds fishy. Make sure that the code you're running is the same as what
you've posted, and that the exception occurs where you suggest it
occurs. Furthermore, I'd suggest printing out the elements of the
returned list. You can also check the type of a node by comparing the
value returned by getNodeType() against the constants defined in the
org.w3c.dom.Node interface.
Also, why aren't you using a DocumentBuilder if what you need is a DOM?
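
The node-type check suggested above could be sketched roughly as follows. This is only an illustration, not the original poster's code: the class name NodeTypeCheck and its helper methods are made up for the example, and it builds the DOM with a plain DocumentBuilder from well-formed input instead of TagSoup, so that it runs with the JDK alone.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
class NodeTypeCheck {

    // Returns the "class" attribute if the node is an element, otherwise null.
    // Checking getNodeType() first avoids the ClassCastException on text nodes.
    static String classAttributeOf(Node node) {
        if (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
            return ((Element) node).getAttribute("class");
        }
        return null;
    }

    // Human-readable description of a node, for printing a NodeList.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "ELEMENT <" + node.getNodeName() + ">";
            case Node.TEXT_NODE:
                return "TEXT \"" + node.getNodeValue() + "\"";
            default:
                return "type " + node.getNodeType() + " (" + node.getNodeName() + ")";
        }
    }

    // Parses well-formed markup into a DOM document (assumption: input is valid XML).
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<div><span class='item c123'>-</span></div>");
        NodeList spans = doc.getElementsByTagName("span");
        // Print every node in the returned list together with its type.
        for (int i = 0; i < spans.getLength(); i++) {
            Node n = spans.item(i);
            System.out.println(i + ": " + describe(n) + ", class = " + classAttributeOf(n));
        }
    }
}
```

If the list really contained a text node, describe() would report it as TEXT and classAttributeOf() would return null rather than throwing.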
 
Arne Vajhøj

This:

// Parse the snippet as XML; tag-name lookups are case-sensitive here.
String xml = "<SPAN class='item c123'>bla bla</SPAN>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));
Element elm = (Element) doc.getElementsByTagName("SPAN").item(0);
// Text of the first child node, then the attribute value.
System.out.println("content = " + elm.getFirstChild().getNodeValue());
System.out.println("class = " + elm.getAttribute("class"));

works here.

Arne
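
For anyone who wants to run the snippet above as-is, a self-contained variant might look like the sketch below. The class name SpanClassDemo and the method classOfFirst are hypothetical names chosen for this example. It also shows one thing worth keeping in mind: with an XML DocumentBuilder the tag name passed to getElementsByTagName must match the case used in the document, and the input has to be well-formed XML, so this is not a drop-in replacement for parsing arbitrary HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical wrapper class, named here only for illustration.
class SpanClassDemo {

    // Returns the "class" attribute of the first element with the given tag name,
    // or null if no element with that exact (case-sensitive) name exists.
    static String classOfFirst(String xml, String tagName) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        NodeList matches = doc.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return null;
        }
        // getElementsByTagName only returns elements, so this cast is safe.
        return ((Element) matches.item(0)).getAttribute("class");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SPAN class='item c123'>bla bla</SPAN>";
        System.out.println("SPAN -> " + classOfFirst(xml, "SPAN")); // prints "SPAN -> item c123"
        System.out.println("span -> " + classOfFirst(xml, "span")); // prints "span -> null"
    }
}
```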
 
