Obtaining the textNode from within multiple elements.

Daz

Hi everyone.

Is there a simple way for me to get the value of the textNodes from
this piece of HTML, without iterating through the whole thing?

<table>
<tbody>
<tr>
<td>
<i><b>example text</b></i>
</td>
<td>
example text
</td>
<td>
<font color="blue">example text</font>
</td>
</tr>
<tr>
<td>
<b>example text</b>
</td>
<td>
<font color="green"><u><b>example text</b></u></font>
</td>
<td>
<b>example text</b>
</td>
</tr>
</tbody>
</table>

Please note the format of the text is different in each cell, and that
the code I need to obtain the textNodes from is not mine, so I cannot
change that format. I am simply using JavaScript to make a browser
extension that will do useful things with the page.

Many thanks.

Daz.
 
RobG

Daz said:
Hi everyone.

Is there a simple way for me to get the value of the textNodes from
this piece of HTML, without iterating through the whole thing?

You can use a number of strategies based on feature detection. First
try textContent; if that is not supported, try innerText. If neither
is supported, you have a choice between taking innerHTML and stripping
out the tags, or recursively iterating over all the nodes and grabbing
just the text.

There are some functions posted here:

<URL:
http://groups.google.com/group/comp...=innertext+textcontent#29f5c61c0ce91bfe >

Copies are included below.

[...]
Please note the format of the text is different in each cell, and that
the code I need to obtain the textNodes from is not mine, so I cannot
change that format. I am simply using JavaScript to make a browser
extension that will do useful things with the page.

It's probably better if you say what you want the script to do; simply
getting all the text may not be what you really need.


Posted functions:

Using fallback to innerHTML and a regular expression to remove tags:

function getText(el)
{
  if (el.textContent) return el.textContent;
  if (el.innerText) return el.innerText;
  return el.innerHTML.replace(/<[^>]+>/g,'');
}

A better regular expression might be:

.replace( /<[^<>]+>/g, '' )
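
As a rough illustration of the tag-stripping fallback, the sketch below applies that second expression to strings shaped like the cells in the original question. The name stripTags is made up for this example. It assumes reasonably well-formed markup: entities such as &amp; are not decoded, and a literal < or > inside text or an attribute value would confuse it.

```javascript
// Hypothetical helper: remove anything that looks like a tag.
function stripTags(html) {
  return html.replace(/<[^<>]+>/g, '');
}

var cells = [
  '<i><b>example text</b></i>',
  'example text',
  '<font color="green"><u><b>example text</b></u></font>'
];

for (var i = 0; i < cells.length; i++) {
  console.log(stripTags(cells[i])); // each line prints: example text
}

// Entities are left as they are by this approach.
console.log(stripTags('<b>fish &amp; chips</b>')); // fish &amp; chips
```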

Suggested by Mike Winter:
<URL:
http://groups.google.com.au/group/c...gexp+remove+html+tags&rnum=5#3e06dda8f672ef5f >

To avoid issues with regular expressions, use recursion - it will be
slower but that may not matter:

function getText(el)
{
  if (el.textContent) return el.textContent;
  if (el.innerText) return el.innerText;

  // If both fail, use recursion
  return getText2(el);

  // Recursive inner function (hoisted, so it can be called above)
  function getText2(el) {
    var x = el.childNodes;
    var txt = '';
    for (var i=0, len=x.length; i<len; ++i){
      if (3 == x[i].nodeType) {
        // Text node: append its character data
        txt += x[i].data;
      } else if (1 == x[i].nodeType){
        // Element node: descend into its children
        txt += getText2(x[i]);
      }
    }

    // Collapse whitespace before returning
    return txt.replace(/\s+/g,' ');
  }
}
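
To see how the recursive walk behaves without a browser, here is a minimal sketch that runs it over plain objects standing in for DOM nodes. The helpers text, elem and collectText are invented for this example, and the stand-ins assume the walk only ever reads nodeType, data and childNodes. Each child is indexed as kids[i] inside the loop, and whitespace is collapsed once at the end rather than at every level.

```javascript
// Minimal stand-ins for DOM nodes (assumption: only nodeType, data and
// childNodes are needed by the walk).
function text(data) { return { nodeType: 3, data: data, childNodes: [] }; }
function elem(children) { return { nodeType: 1, childNodes: children }; }

// Concatenate text nodes and descend into element nodes.
function collectText(node) {
  var kids = node.childNodes, out = '';
  for (var i = 0, len = kids.length; i < len; ++i) {
    if (kids[i].nodeType === 3) {
      out += kids[i].data;
    } else if (kids[i].nodeType === 1) {
      out += collectText(kids[i]);
    }
  }
  return out;
}

// Models: <td>\n<font><u><b>example text</b></u></font>\n</td>
var td = elem([
  text('\n'),
  elem([elem([elem([text('example text')])])]),
  text('\n')
]);

var raw = collectText(td);
console.log(JSON.stringify(raw));                      // "\nexample text\n"
console.log(JSON.stringify(raw.replace(/\s+/g, ' '))); // " example text "
```

The leading and trailing spaces in the second result come from the line breaks around the markup inside the cell, so a final trim may be wanted as well.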
 
Daz

All very good ideas. I tried innerText, which isn't supported by
Firefox, so I was considering recursion but hoped there may have been a
better way. I would imagine that textContent is the key that just might
help me out. As I am designing XPIs for Firefox, I don't need to worry
about other browsers not working with the code.
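
For the Firefox-only case, something along these lines might be enough. It is only a sketch: cellTexts is a made-up name, and fakeTable is a stand-in object so the code can be run outside a browser. In an extension, the argument would be the real table element.

```javascript
// Hypothetical helper: return the tidied text of every cell in a table.
// Assumes `table` supports getElementsByTagName and that its cells
// support textContent, as they do in Firefox.
function cellTexts(table) {
  var cells = table.getElementsByTagName('td');
  var result = [];
  for (var i = 0; i < cells.length; i++) {
    // Collapse runs of whitespace, then drop a leading/trailing space.
    result.push(cells[i].textContent.replace(/\s+/g, ' ').replace(/^ | $/g, ''));
  }
  return result;
}

// Stand-in for a real table so the sketch runs without a DOM.
var fakeTable = {
  getElementsByTagName: function (name) {
    return name === 'td'
      ? [{ textContent: '\nexample text\n' }, { textContent: '\n example  text \n' }]
      : [];
  }
};

console.log(cellTexts(fakeTable)); // two entries, both 'example text'
```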

Many thanks again.

Daz.
 