HTML Screen Scraping Q

George Durzi

I'd like to screen-scrape company news from cbsmarketwatch. Consider this
URL as an example:
http://cbs.marketwatch.com/tools/quotes/news.asp?symb=MSFT
When you browse there, there are two sections: 1. News Headlines for
Microsoft Corporation, and 2. Press Releases about Microsoft Corporation.

I've already written the code to post to the page and grab the HTML into a
string. If you browse the source of the above linked webpage, here's an
excerpt of how the news headlines would look:

<TABLE WIDTH="100%" CELLPADDING="0" CELLSPACING="0" border="0" ID="Table1">
<?xml version="1.0" encoding="UTF-16" ?>
<TR class="tb01">
<TD COLSPAN="4" height="20">
<A class="lk03"
href="/tools/quotes/news.asp?siteid=mktw&symb=MSFT&amp;property=sid&amp;value=3140&amp;doctype=2006">News Headlines for Microsoft Corporation (MSFT)</A>
</TD>
</TR>
<TR>
<TD NOWRAP="TRUE" width="110" valign="top">12:58pm 02/13/04</TD>
<TD valign="top">
<A class="lk01"
HREF="/news/story.asp?guid=%7B01470A47%2D936B%2D444D%2DB6FC%2DD111A9E61EE4%7D&amp;siteid=mktw&amp;">Market Snapshot</A>
</td>
</TR>
</TABLE>

What I'd like to do is create a dataset (or anything else I can bind to a
datagrid) containing the news items.
I noticed that the news items are enclosed in a table that contains
<?xml version="1.0" encoding="UTF-16" ?>
Would this give me an easy way to navigate this HTML?
What tools can I use to do this? Regular expressions?

Any tips are greatly appreciated.
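
To make the goal concrete, here is a rough sketch of the kind of extraction I mean, written in Python only for illustration (the real app is .NET). It walks the excerpt above with a tolerant HTML parser rather than an XML parser: the stray <?xml ... ?> line does not make the markup well-formed XML, since an XML declaration is only valid at the very start of a document and tags like <TD> ... </td> differ in case. The names HeadlineParser, parse_headlines, SAMPLE and BASE_URL are made up for this sketch, and the assumption that a news row has exactly two cells (time, then linked headline) comes only from the excerpt, not from the full page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Host taken from the example URL in the post; used to resolve relative links.
BASE_URL = "http://cbs.marketwatch.com"

# The excerpt from the post, with the wrapped href lines rejoined.
SAMPLE = """
<TABLE WIDTH="100%" CELLPADDING="0" CELLSPACING="0" border="0" ID="Table1">
<?xml version="1.0" encoding="UTF-16" ?>
<TR class="tb01">
<TD COLSPAN="4" height="20">
<A class="lk03"
href="/tools/quotes/news.asp?siteid=mktw&symb=MSFT&amp;property=sid&amp;value=3140&amp;doctype=2006">News Headlines for Microsoft Corporation (MSFT)</A>
</TD>
</TR>
<TR>
<TD NOWRAP="TRUE" width="110" valign="top">12:58pm 02/13/04</TD>
<TD valign="top">
<A class="lk01"
HREF="/news/story.asp?guid=%7B01470A47%2D936B%2D444D%2DB6FC%2DD111A9E61EE4%7D&amp;siteid=mktw&amp;">Market Snapshot</A>
</td>
</TR>
</TABLE>
"""


class HeadlineParser(HTMLParser):
    """Collects one record per table row that looks like a news item."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished records
        self._cells = None     # text of each <td> in the current <tr>
        self._href = None      # first link target seen in the current row
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_cell = True
        elif tag == "a" and self._in_cell and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            cells = [c.strip() for c in self._cells]
            # Assumption: a news item is a two-cell row (time, linked headline).
            # The one-cell section header row is skipped by this check.
            if len(cells) == 2 and self._href:
                self.rows.append({
                    "time": cells[0],
                    "headline": cells[1],
                    "url": urljoin(BASE_URL, self._href),
                })
            self._cells = None


def parse_headlines(html_text):
    """Return a list of dicts with 'time', 'headline' and 'url' keys."""
    parser = HeadlineParser()
    parser.feed(html_text)
    return parser.rows


rows = parse_headlines(SAMPLE)
for row in rows:
    print(row["time"], "|", row["headline"], "|", row["url"])
```

The resulting list of records is the shape I would want to load into a dataset and bind to the grid. A regular expression could probably pull out the same fields, but it would be tied much more tightly to the exact attribute order and whitespace in the page.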
 
George Durzi

I hear ya ... What I was trying to do at first was find an RSS feed which
took in a stock ticker as a parameter and gave me back some news headlines.
All I could find when I chased that one down was a beta RSS feed that a
developer at Yahoo had created. Unfortunately, it's no longer around, and
there's nothing like it.

This is gonna be used for an intranet application, and the news headlines
are gonna link to the actual cbsmarketwatch pages. I will also be crediting
the source of the newsfeed.

Hopefully that should cover it.
 
