Extracting Data from IE

Discussion in 'Javascript' started by chris_j_adams@hotmail.com, Oct 30, 2006.

  1. Guest

    Hi,

    I'm slowly discovering the world of JavaScript, so I'm not sure I'm
    attacking this problem in the right manner. If I'm in the wrong
    newsgroup, my apologies.

    What I'm trying to do is extract some news items from a web site. To
    do this, I'm using Microsoft Word VBA and using the following bit of
    script:

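    '// Open web site
    IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"

    '// Wait for the page to finish loading. READYSTATE_COMPLETE is 4; it is
    '// only predefined if the project references Microsoft Internet Controls.
    Do
        DoEvents
    Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

    '// Find text to extract
    txtTitle = IeApp.Document.getElementById("a2title").innerHTML
    txt = IeApp.Document.getElementById("a2copy").innerHTML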

    When extracting the text (i.e. "txt") I seem to get more than just the
    body text that I'm after, and the resulting junk is difficult to
    remove. I've looked at the object model but I'm not really sure what I
    should be looking for, so I'm wondering if anyone here can spare a bit
    of time to provide a pointer. For example, is there a tag that would
    refer more easily to the required text?

    Many thanks in advance if you can share some advice or guidance.
    Regards,
    Chris Adams
    chris_j_adams@hotmail.com, Oct 30, 2006
    #1

  2. Martin Honnen

    chris_j_adams@hotmail.com wrote:

    > I'm slowly discovering the world of JavaScript, so I'm not sure I'm
    > attacking this problem in the right manner. If I'm in the wrong
    > newsgroup, my apologies.
    >
    > What I'm trying to do is extract some news items from a web site. To
    > do this, I'm using Microsoft Word VBA and using the following bit of
    > script:
    >
    > '// Open web site
    > IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
    > '// Wait for the page to finish loading (READYSTATE_COMPLETE = 4)
    > Do
    >     DoEvents
    > Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
    >
    > '// Find text to extract
    > txtTitle = IeApp.Document.getElementById("a2title").innerHTML
    > txt = IeApp.Document.getElementById("a2copy").innerHTML
    >
    > When extracting the text (i.e. "txt") I seem to get more than just the
    > body text that I'm after, and the resulting junk is difficult to
    > remove.


    So you are not using JavaScript at all; you are automating Internet
    Explorer with VBA. The IE object model for HTML documents is documented
    here:
    <http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/dhtml_reference_entry.asp>

    You might be after the |innerText| property instead of the |innerHTML|
    property of element objects. Or you might want to look at specific child
    or descendant nodes of an element you have found with getElementById.
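
    To make the difference between the two properties concrete, here is a
    minimal sketch rather than a definitive implementation. It assumes
    Internet Explorer is installed and that the page still has an element
    with the id "a2copy" from the original post. The procedure name
    CompareInnerHtmlAndInnerText and the 200-character preview length are
    made up for illustration.

    ```vba
    Option Explicit

    ' Sketch only: prints the start of innerHTML and innerText for one element
    ' so the two can be compared in the Immediate window.
    Sub CompareInnerHtmlAndInnerText()
        ' Numeric value of IE's READYSTATE_COMPLETE, declared locally so the
        ' code also works with late binding (no reference to Internet Controls).
        Const READYSTATE_COMPLETE As Long = 4
        Dim IeApp As Object
        Dim copyDiv As Object

        Set IeApp = CreateObject("InternetExplorer.Application")
        IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"

        ' Wait for the page to finish loading, yielding to Word in the meantime.
        Do
            DoEvents
        Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

        Set copyDiv = IeApp.Document.getElementById("a2copy")
        If copyDiv Is Nothing Then
            Debug.Print "No element with id a2copy found"
        Else
            ' innerHTML includes the markup; innerText is the rendered text only.
            Debug.Print "innerHTML: "; Left$(copyDiv.innerHTML, 200)
            Debug.Print "innerText: "; Left$(copyDiv.innerText, 200)
        End If

        IeApp.Quit
    End Sub
    ```

    If the innerText output is already close to what is wanted, swapping
    the property in the original script may be all that is needed.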

    To look at child or descendant nodes, for instance,
    IeApp.Document.getElementById("a2copy")
    gives you a div element object which in turn has other nodes (e.g. a
    table element) as child nodes. Once you have an element node you can
    access its |firstChild|, |lastChild| and |childNodes| collection, or
    you can call |getElementsByTagName| on the element to find descendant
    elements with a certain tag name.
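
    The node-walking approach could look roughly like the following
    sketch. It assumes the wanted text sits in p elements inside the
    container, which is a guess about that page's markup and may need
    adjusting. The names TextOfParagraphs and ListChildNodes are
    hypothetical helpers, not part of the IE object model.

    ```vba
    Option Explicit

    ' Sketch only: joins the text of every <p> descendant of an element.
    ' The tag name "p" is an assumption about the page's markup.
    Function TextOfParagraphs(ByVal container As Object) As String
        Dim paragraphs As Object
        Dim i As Long
        Dim result As String

        Set paragraphs = container.getElementsByTagName("p")
        For i = 0 To paragraphs.Length - 1
            result = result & paragraphs.Item(i).innerText & vbCrLf
        Next i
        TextOfParagraphs = result
    End Function

    ' Sketch only: prints the node name of each direct child, which helps
    ' to see where the unwanted content (e.g. a table) comes from.
    Sub ListChildNodes(ByVal container As Object)
        Dim i As Long

        For i = 0 To container.childNodes.Length - 1
            Debug.Print i; container.childNodes.Item(i).nodeName
        Next i
    End Sub
    ```

    With the original script this might be called as
    txt = TextOfParagraphs(IeApp.Document.getElementById("a2copy")),
    after running ListChildNodes on the same element to check which tags
    actually hold the story text.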

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Oct 30, 2006
    #2
