Extracting Data from IE

chris_j_adams

Hi,

I'm slowly discovering the world of JavaScript, so I'm not sure I'm
attacking this problem in the right manner. If I'm in the wrong
newsgroup, my apologies.

What I'm trying to do is extract some news items from a web site. To
do this, I'm using Microsoft Word VBA with the following bit of
script:

'// Open web site
IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
'// Wait for the page to finish loading; DoEvents keeps Word responsive
Do: DoEvents: Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

'// Find text to extract
txtTitle = IeApp.Document.GetElementByID("a2title").innerhtml
txt = IeApp.Document.GetElementByID("a2copy").innerhtml

When extracting the text (i.e. "txt") I seem to get more than just the
text of the body that I'm after, and the resulting junk is difficult to
remove. I've looked at the object model but I'm not really sure what I
should be looking for, so I'm wondering if anyone here can spare a bit of
time to provide a pointer. For example, is there a tag that would more
easily refer to the required text?

Many thanks in advance if you can share some advice or guidance.
Regards,
Chris Adams
 
Martin Honnen

So you are not using JavaScript at all but you are automating Internet
Explorer with VBA. The IE object model for HTML documents is documented
here:
<http://msdn.microsoft.com/library/d...hor/dhtml/reference/dhtml_reference_entry.asp>

You might be after the |innerText| property instead of the |innerHTML|
property of element objects. Or you might want to look at specific child
or descendant nodes of an element you have found with getElementById.

For instance,
IeApp.Document.getElementById("a2copy")
gives you a div element object which then has other nodes (e.g. a table
element) as child nodes. Once you have an element node, you can access
its |firstChild|, |lastChild|, and |childNodes| collection, or call
|getElementsByTagName| on the element to find descendant elements of a
certain tag name.
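
As a rough sketch of both suggestions in VBA: the routine below first reads |innerText| from the "a2copy" div, then collects text from descendant elements via |getElementsByTagName|. The routine name ExtractStoryText, the late-bound CreateObject call, and the assumption that the article body sits in p elements inside that div are illustrative guesses, not something confirmed for this page, so adjust the tag name to whatever the markup actually uses.

```vba
Sub ExtractStoryText()
    ' Late binding, so no reference to Microsoft Internet Controls is needed;
    ' the constant is therefore declared locally.
    Const READYSTATE_COMPLETE As Long = 4

    Dim IeApp As Object, copyDiv As Object, paras As Object
    Dim i As Long, txt As String

    Set IeApp = CreateObject("InternetExplorer.Application")
    IeApp.Navigate "http://www.radioaustralia.net.au/francais/stories/s1776501.htm"
    Do
        DoEvents
    Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

    Set copyDiv = IeApp.Document.getElementById("a2copy")
    If copyDiv Is Nothing Then
        Debug.Print "Element a2copy not found"
        IeApp.Quit
        Exit Sub
    End If

    ' Option 1: innerText gives the rendered text with the markup stripped.
    txt = copyDiv.innerText
    Debug.Print txt

    ' Option 2: keep only the text of descendant p elements.
    ' ASSUMPTION: the story paragraphs are p elements; this is not verified.
    Set paras = copyDiv.getElementsByTagName("p")
    txt = ""
    For i = 0 To paras.Length - 1
        txt = txt & paras.Item(i).innerText & vbCrLf
    Next i
    Debug.Print txt

    IeApp.Quit
End Sub
```

If option 1 still returns unwanted text (for example from a nested table), option 2 is the more selective route, since it only reads the descendants you ask for.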
 
