Extracting text (cross platform)

Debbie

Is there a standard way to extract text from a web page without using
innerText/innerHTML?

It's an academic exercise, and we've been advised that we can't use
Internet Explorer DOM extensions that are not part of the W3C DOM.

Thanks,

Debbie
 
Martin Honnen

Debbie said:
Is there a standard way to extract text from a web page without using
innerText/innerHTML?

It's an academic exercise, and we've been advised that we can't use
Internet Explorer DOM extensions that are not part of the W3C DOM.

Then use the W3C DOM. Text sits in text nodes, which are leaf nodes of
the DOM tree, and each text node has a nodeValue property that gives you
its text. The data property returns the same string.

If you want the text in an element, you have to walk its child nodes and
concatenate their text, recursing down the tree until you reach the text
nodes. Alternatively, depending on your requirements, you can use the
W3C DOM Level 3 property textContent, which Mozilla has supported for
quite some time and which Opera now supports too.

The W3C DOM Level 2 Range API also lets you get the text in a range:
position the range on an element node and call toString on it, e.g.
var range = document.createRange();
range.selectNodeContents(someNode);
var text = range.toString();
Mozilla and Opera 8 and later support the Range API.
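
The recursive walk over child nodes mentioned above could look roughly like this. It is only a sketch: getText is a made-up helper name, not part of any DOM specification, and it relies solely on the DOM Core properties nodeType, nodeValue and childNodes. It assumes you want the text of CDATA sections included and comments skipped.

```javascript
// Hypothetical helper (not a DOM method): collect the text below a node
// using only W3C DOM Core properties.
function getText(node) {
  // nodeType 3 is a text node, 4 is a CDATA section; both carry text in nodeValue
  if (node.nodeType === 3 || node.nodeType === 4) {
    return node.nodeValue;
  }
  var text = '';
  // Guard against objects without a childNodes list (assumption for safety)
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    // Recurse until the text nodes at the leaves are reached
    text += getText(children[i]);
  }
  return text;
}
```

In a browser you would call it with an element node, for instance getText(document.body). Comments and other non-text leaf nodes contribute nothing because they have no child nodes.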
 
