getElementsByTagName 'n such

Discussion in 'Javascript' started by norfleet, May 28, 2004.

  1. norfleet

    norfleet Guest

    hi folks,

    OK, so let's say, for example, I have a bit of HTML that looks like
    this:

    <td class="regular1b" valign="top">
    <a href="notfound.html"><span class="list5"><b>Lecture
    V</b></span></a>
    </td>

    And I want to save all the text ("all" meaning the tags and
    everything) between the <td> and </td>. Using JavaScript, I was able
    to isolate the <td></td> by doing:

    var w = myTable.getElementsByTagName("TD");

    So then I have an IF statement within a FOR loop that looks like:

    if (w.item(i).className == "regular1b")
    alert(w.childNodes[0].nodeValue);

    The ALERT() is just a place holder to make sure things are working.
    The thing is, nodeValue returns NULL because there's no actual text
    within the <td></td> tags; the only thing there is more HTML code, and
    the text between the <span></span> apparently isn't considered part of
    the <td></td> tags.

    I guess I'm wondering if there's another way to go about getting the
    text from in between the <td></td> tags short of just doing a
    brute-force text search on the whole darn page. Any help would be
    much appreciated...

    Fleet
    norfleet, May 28, 2004
    #1

  2. Ron

    Ron Guest

    Heya Fleet,
    Unless the document is normalized, childNodes[0] may be a whitespace
    text node. You might want to normalize your TD before reading from it.
    In addition, nodeValue is supposed to return null for any element node:
    http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-1950641247
    Unfortunately, the best (possibly only) current way to get what you
    want is to use the non-standard innerHTML property of your TD object. It
    is implemented in the latest versions of IE and Gecko-based browsers.
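
    For illustration, a minimal sketch of the innerHTML approach. It assumes
    a table reference like myTable from the original post; the helper name
    cellMarkupByClass is hypothetical. It collects the markup inside every
    cell that carries the given class:

```javascript
// Hypothetical helper: returns the innerHTML of every TD inside `table`
// whose className equals `cls`.
function cellMarkupByClass(table, cls) {
  var cells = table.getElementsByTagName("td");
  var result = [];
  for (var i = 0; i < cells.length; i++) {
    // innerHTML holds the cell's content as markup, tags included.
    if (cells[i].className === cls) {
      result.push(cells[i].innerHTML);
    }
  }
  return result;
}
```

    Calling cellMarkupByClass(myTable, "regular1b") on the markup above
    should give one string holding the <a>...</a> content, with whatever
    whitespace and tag casing the browser chooses when it serializes.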
    Ron, May 28, 2004
    #2

  3. Thomas 'PointedEars' Lahn

    norfleet wrote:

    > OK, so let's say, for example, I have a bit of HTML that looks like
    > this:
    >
    > <td class="regular1b" valign="top">
    > <a href="notfound.html"><span class="list5"><b>Lecture
    > V</b></span></a>
    > </td>
    >
    > And I want to save all the text ("all" meaning the tags and
    > everything) between the <td> and </td>. Using JavaScript, I was able
    > to isolate the <td></td> by doing:
    >
    > var w = myTable.getElementsByTagName("TD");
    >
    > So then I have an IF statement within a FOR loop that looks like:
    >
    > if (w.item(i).className == "regular1b")


    As you have seen, there is no need to call the item() method explicitly
    when accessing the DOM with an ECMAScript implementation. Using the square
    bracket property accessor syntax, that method or the namedItem() method
    is called implicitly, depending on the type of the operand.

    <http://www.w3.org/TR/DOM-Level-2-HTML/ecma-script-binding.html>
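
    As a sketch, the loop from the question could be written with bracket
    access as follows. The function name firstChildValues is hypothetical.
    Note that childNodes belongs to each cell, not to the list returned by
    getElementsByTagName(), so it has to be read from cells[i]:

```javascript
// Hypothetical helper: for every cell with the given class, record the
// nodeValue of its first child (null when that child is an element node).
function firstChildValues(cells, cls) {
  var values = [];
  for (var i = 0; i < cells.length; i++) {
    var cell = cells[i]; // in a DOM NodeList, the same node as cells.item(i)
    if (cell.className === cls && cell.firstChild) {
      values.push(cell.firstChild.nodeValue);
    }
  }
  return values;
}
```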

    > alert(w.childNodes[0].nodeValue);
    >
    > The ALERT() is just a place holder to make sure things are working.
    > The thing is, nodeValue returns NULL because there's no actual text
    > within the <td></td> tags;


    It returns `null' (ECMAScript is case-sensitive) because the first child
    node is an element node. This is documented and standards compliant
    behavior. Think of the contents of the "td" element as a subtree where
    nested content is a child node. Provided that the whitespace after the
    start tag of the "td" element and before the start tag of the "a" element
    is not considered a text node (proprietary behavior!), this subtree looks like

    ...
    '- TD class="regular1b" valign="top"
    |  |
    |  '- A href="notfound.html"
    |     |
    |     '- SPAN class="list5"
    |        |
    |        '- B
    |           |
    |           '- TEXT "Lecture V"
    |
    |- ...
    ...

    (The "Show parse tree" feature of the W3C Validator
    <http://validator.w3.org/> provides a similar presentation.)

    You see that childNodes[0] or firstChild refers to an element node.

    Standards-compliant parsing would result in

    ...
    '- TD class="regular1b" valign="top"
    |  |
    |  |- TEXT "\n\t"
    |  |
    |  '- A href="notfound.html"
    |     |
    |     '- SPAN class="list5"
    |        |
    |        '- B
    |           |
    |           '- TEXT "Lecture V"
    |
    |- ...
    ...

    so in Mozilla/5.0 (Mozilla, Netscape 6+, Firefox, Camino, ...) you get
    "\n\t" for childNodes[0].nodeValue.

    That is why it was suggested to normalize the document, such as

    <td class="regular1b" valign="top"><a
    href="notfound.html"
    ><span class="list5"
    ><b>Lecture V</b></span></a></td>
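
    An alternative to rewriting the markup is to skip whitespace-only text
    nodes in script. Note that the DOM method Node.normalize() only merges
    adjacent text nodes and removes empty ones; it does not remove
    whitespace-only text nodes. A minimal sketch, with the hypothetical
    helper name firstSignificantChild:

```javascript
// Hypothetical helper: returns the first child that is not a
// whitespace-only text node, or null if there is none.
function firstSignificantChild(node) {
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    // nodeType 3 is a text node; /\S/ matches any non-whitespace character.
    var isBlankText = child.nodeType === 3 && !/\S/.test(child.nodeValue);
    if (!isBlankText) {
      return child;
    }
  }
  return null;
}
```

    With the TD above, this should return the A element in both kinds of
    browser, whether or not the leading whitespace became a text node.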


    > the only thing there is more HTML code, and the text between the
    > <span></span> apparently isn't considered part of the <td></td> tags.


    That misconception is the main cause of your problem.

    > I guess I'm wondering if there's another way to go about getting the
    > text from in between the <td></td> tags short of just doing a
    > brute-force text search on the whole darn page. Any help would be
    > much appreciated...


    There is. The "innerHTML" property has been suggested. But since it is
    proprietary, and you are using the standards compliant DOM, you should
    rather serialize the subtree, traversing it. Depending on the UA's DOM,
    there are predefined serializer objects, such as XMLSerializer in the
    Gecko DOM. But you can code your own serializer as well.
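
    A hand-written serializer might look like the sketch below. It relies
    only on nodeType, nodeName, nodeValue, childNodes and attributes; the
    function names are hypothetical. It handles element and text nodes
    only, always writes an end tag (so it is not suitable for empty
    elements such as BR or IMG), and assumes the attributes collection
    lists only the attributes actually specified in the markup:

```javascript
// Escape characters that would otherwise be read as markup.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

// Serialize one node: text nodes (type 3) and element nodes (type 1).
function serializeNode(node) {
  if (node.nodeType === 3) {
    return escapeText(node.nodeValue);
  }
  if (node.nodeType !== 1) {
    return ""; // comments and other node types are ignored in this sketch
  }
  var name = node.nodeName.toLowerCase();
  var out = "<" + name;
  var attrs = node.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    out += " " + attrs[i].name + '="' + escapeAttr(attrs[i].value) + '"';
  }
  return out + ">" + serializeChildren(node) + "</" + name + ">";
}

// Serialize everything inside `node`, excluding the node's own tags.
function serializeChildren(node) {
  var out = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    out += serializeNode(node.childNodes[i]);
  }
  return out;
}
```

    serializeChildren(td) corresponds to what the question asks for, the
    content between <td> and </td>. Where XMLSerializer is available,
    new XMLSerializer().serializeToString(node) serializes a single node
    without any custom code.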


    PointedEars
    Thomas 'PointedEars' Lahn, May 28, 2004
    #3
  4. DU

    DU Guest

    norfleet wrote:

    > hi folks,
    >
    > OK, so let's say, for example, I have a bit of HTML that looks like
    > this:
    >
    > <td class="regular1b" valign="top">
    > <a href="notfound.html"><span class="list5"><b>Lecture
    > V</b></span></a>
    > </td>
    >
    > And I want to save all the text ("all" meaning the tags and
    > everything) between the <td> and </td>. Using JavaScript, I was able
    > to isolate the <td></td> by doing:
    >
    > var w = myTable.getElementsByTagName("TD");
    >
    > So then I have an IF statement within a FOR loop that looks like:
    >
    > if (w.item(i).className == "regular1b")
    > alert(w.childNodes[0].nodeValue);
    >
    > The ALERT() is just a place holder to make sure things are working.
    > The thing is, nodeValue returns NULL because there's no actual text
    > within the <td></td> tags; the only thing there is more HTML code, and
    > the text between the <span></span> apparently isn't considered part of
    > the <td></td> tags.
    >
    > I guess I'm wondering if there's another way to go about getting the
    > text from in between the <td></td> tags





    There is. The textContent attribute in the Node interface (DOM 3 Core)
    is supported by Mozilla 1.5+. I tried it with your specific markup code
    (with all the white-space, line feeds, etc.) and it worked without a
    problem. I tried it with a more complex subtree and it worked as expected.

    Bug 210451: Implement Node.textContent
    http://bugzilla.mozilla.org/show_bug.cgi?id=210451

    http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Node3-textContent

    For other browsers not supporting the DOM 3 Node interface, you can write
    a subtree traversal function to collect the text, or use the
    non-standard innerHTML attribute.
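
    A minimal sketch of such a function, which uses textContent where it
    exists and otherwise walks the subtree and concatenates the text nodes.
    The name getText is hypothetical:

```javascript
// Hypothetical helper: returns the text inside `node`, without any tags.
function getText(node) {
  // Prefer the DOM 3 attribute when the browser provides it.
  if (typeof node.textContent === "string") {
    return node.textContent;
  }
  // Text node: its value is the text itself.
  if (node.nodeType === 3) {
    return node.nodeValue;
  }
  // Otherwise concatenate the text of all descendants, in document order.
  var result = "";
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    result += getText(kids[i]);
  }
  return result;
}
```

    Note that this returns the text only. To keep the tags as well, the
    subtree has to be serialized or read through innerHTML.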

    DU



    > short of just doing a
    > brute-force text search on the whole darn page. Any help would be
    > much appreciated...
    >
    > Fleet
    DU, Jun 6, 2004
    #4
  5. DU

    DU Guest

    norfleet wrote:

    > hi folks,
    >
    > OK, so let's say, for example, I have a bit of HTML that looks like
    > this:
    >
    > <td class="regular1b" valign="top">
    > <a href="notfound.html"><span class="list5"><b>Lecture
    > V</b></span></a>
    > </td>
    >
    > And I want to save all the text ("all" meaning the tags and
    > everything) between the <td> and </td>. Using JavaScript, I was able
    > to isolate the <td></td> by doing:
    >
    > var w = myTable.getElementsByTagName("TD");
    >
    > So then I have an IF statement within a FOR loop that looks like:
    >
    > if (w.item(i).className == "regular1b")
    > alert(w.childNodes[0].nodeValue);
    >
    > The ALERT() is just a place holder to make sure things are working.
    > The thing is, nodeValue returns NULL because there's no actual text
    > within the <td></td> tags; the only thing there is more HTML code, and
    > the text between the <span></span> apparently isn't considered part of
    > the <td></td> tags.
    >


    I suggest you get accustomed to using Mozilla's DOM Inspector. You can
    install it on Netscape 7.1 and Firefox 0.8 as well. This is how I
    personally noticed that white-space between nodes is treated as
    anonymous text nodes. What you say above is not true (the misconception
    is very common); this is explained in
    Whitespace in the DOM
    http://www.mozilla.org/docs/dom/technote/whitespace/

    DU

    > I guess I'm wondering if there's another way to go about getting the
    > text from in between the <td></td> tags short of just doing a
    > brute-force text search on the whole darn page. Any help would be
    > much appreciated...
    >
    > Fleet
    DU, Jun 6, 2004
    #5
