Obtaining the textNode from within multiple elements.

Daz · Dec 10, 2006

Hi everyone.

Is there a simple way for me to get the value of the textNodes from
this piece of HTML, without iterating through the whole thing?

<table>
<tbody>
<tr>
<td>
<i><b>example text</b></i>
</td>
<td>
example text
</td>
<td>
<font color="blue">example text</font>
</td>
</tr>
<tr>
<td>
<b>example text</b>
</td>
<td>
<font color="green"><u><b>example text</b></u></font>
</td>
<td>
<b>example text</b>
</td>
</tr>
</tbody>
</table>

Please note the format of the text is different in each cell, and that
the code I need to obtain the textNodes from is not mine, so I cannot
change that format. I am simply using JavaScript to make a browser
extension that will do useful things with the page.

Many thanks.

Daz.

RobG · Dec 11, 2006

Daz said:
Hi everyone.

Is there a simple way for me to get the value of the textNodes from
this piece of HTML, without iterating through the whole thing?

You can use a number of strategies based on feature detection: first
try textContent; if that is not supported, try innerText. If neither
is supported, you can either read innerHTML and strip out the tags, or
recursively iterate over all the child nodes and collect just the
text.

There are some functions posted here:

<URL:
http://groups.google.com/group/comp...=innertext+textcontent#29f5c61c0ce91bfe
>

Copies are included below.

[...]

Please note the format of the text is different in each cell, and that
the code I need to obtain the textNodes from is not mine, so I cannot
change that format. I am simply using JavaScript to make a browser
extension that will do useful things with the page.

It's probably better if you say what you want the script to do; simply
getting all the text may not be what you really need.

Posted functions:

Using fallback to innerHTML and a regular expression to remove tags:

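function getText(el)
{
  // W3C DOM 3 property (Firefox and others)
  if (el.textContent) return el.textContent;
  // IE property
  if (el.innerText) return el.innerText;
  // Last resort: strip anything that looks like a tag
  return el.innerHTML.replace(/<[^>]+>/g,'');
}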

A better regular expression might be:

.replace( /<[^<>]+>/g, '' )

Suggested by Mike Winter:
<URL:
http://groups.google.com.au/group/c...gexp+remove+html+tags&rnum=5#3e06dda8f672ef5f
>
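
As a rough illustration of how the two patterns differ, the sketch below runs both on a made-up string that contains a stray, unescaped '<'. The variable names and the sample string are assumptions for the example only; real innerHTML output normally has '<' in text serialized as &lt;, so this case should be uncommon in practice.

```javascript
// The two tag-stripping patterns quoted in this post.
var loose  = /<[^>]+>/g;   // from '<' up to the next '>', whatever lies between
var strict = /<[^<>]+>/g;  // same, but will not run across another '<'

// Hypothetical input: the first '<' is plain text, not the start of a tag.
var sample = '1 < 2 <b>is true</b>';

var looseResult  = sample.replace(loose, '');
var strictResult = sample.replace(strict, '');

console.log(looseResult);   // 1 is true
console.log(strictResult);  // 1 < 2 is true
```

The first pattern treats "< 2 <b>" as one tag and removes part of the text with it; the second leaves the stray '<' alone and removes only the real tags.
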
To avoid issues with regular expressions, use recursion - it will be
slower but that may not matter:

function getText(el)
{
  if (el.textContent) return el.textContent;
  if (el.innerText) return el.innerText;

  // If both fail, use recursion
  return getText2(el);

  // Recursive inner function (declarations are hoisted, so the call
  // above works)
  function getText2(el) {
    var x = el.childNodes;
    var txt = '';
    for (var i=0, len=x.length; i<len; ++i){
      if (3 == x[i].nodeType) {
        // Text node: append its character data
        txt += x[i].data;
      } else if (1 == x[i].nodeType){
        // Element node: recurse into its children
        txt += getText2(x[i]);
      }
    }

    // Collapse whitespace before returning
    return txt.replace(/\s+/g,' ');
  }
}
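
To see what a recursive walk of this kind returns for one of the cells in the original question without opening a browser, here is a minimal sketch using plain objects in place of DOM nodes. The helpers text() and elem() and the name collectText are hypothetical; they mimic only the nodeType, data and childNodes properties that the walk reads.

```javascript
// Hypothetical stand-ins for DOM nodes: only the properties the walk uses.
function text(data) { return { nodeType: 3, data: data }; }
function elem() {
  return { nodeType: 1, childNodes: Array.prototype.slice.call(arguments) };
}

// Recursive walk: append text nodes, descend into element nodes.
function collectText(node) {
  var kids = node.childNodes, txt = '';
  for (var i = 0, len = kids.length; i < len; ++i) {
    if (kids[i].nodeType == 3) {
      txt += kids[i].data;
    } else if (kids[i].nodeType == 1) {
      txt += collectText(kids[i]);
    }
  }
  // Collapse runs of whitespace to a single space
  return txt.replace(/\s+/g, ' ');
}

// Mock of: <td>\n<i><b>example text</b></i>\n</td>
var cell = elem(text('\n'), elem(elem(text('example text'))), text('\n'));
var cellText = collectText(cell);

console.log(JSON.stringify(cellText)); // " example text "
```

Note that the line breaks around the inner elements survive as a leading and a trailing space, so the caller may still want to trim the result.
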

Daz · Dec 11, 2006

All very good ideas. I tried innerText, which isn't supported by
Firefox, so I was considering recursion but hoped there may have been a
better way. I would imagine that textContent is the key that just might
help me out. As I am designing XPIs for Firefox, I don't need to worry
about other browsers not working with the code.

Many thanks again.

Daz.
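
For the Firefox-only case described above, a minimal sketch of reading every cell through textContent might look like the following. The function name cellTexts is hypothetical, and it assumes the table (or any ancestor of the cells) is passed in as root.

```javascript
// Hypothetical helper: trimmed text of every <td> found under `root`.
function cellTexts(root) {
  var cells = root.getElementsByTagName('td');
  var out = [];
  for (var i = 0; i < cells.length; ++i) {
    // textContent concatenates all descendant text nodes, so the nested
    // <b>, <i>, <u> and <font> wrappers do not matter.
    out.push(cells[i].textContent.replace(/^\s+|\s+$/g, ''));
  }
  return out;
}
```

In a page it could be called as cellTexts(document.getElementsByTagName('table')[0]). There is still a loop over the cells, but none over the formatting elements inside them.
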
