Screen scraping in ASP.NET

Jim Giblin

I need to scrape specific information from another website, specifically the
prices of precious metals from several different vendors. While I will
credit the vendors as the data source, I do not want to use the format of
their pages, and want the information consolidated into a single page of my
design.

I did something like this for a client a couple of years ago in ASP, but it
was complex, and I do not have access to the code. A colleague advised me
that ASP.NET could accomplish this task much more easily, but I have little
experience with it.

Can anyone guide me in the right direction?

Thanks,
J. Giblin
 
Jens Christian Mikkelsen

Retrieving the HTML is done using the WebRequest class, but you probably
already knew that.
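
For anyone who has not used it, a minimal sketch of fetching a page with System.Net.WebRequest might look like the following. The module and function names (PageFetcher, GetPage) are made up for illustration, and UTF-8 is an assumption; a real page may declare a different encoding.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

Module PageFetcher
    ' Hypothetical helper: downloads the resource at sUrl and returns its body as text.
    Public Function GetPage(ByVal sUrl As String) As String
        Dim oRequest As WebRequest = WebRequest.Create(sUrl)
        ' GetResponse blocks until the server answers and throws WebException on failure.
        Using oResponse As WebResponse = oRequest.GetResponse()
            ' Assumption: the body is UTF-8 encoded.
            Using oReader As New StreamReader(oResponse.GetResponseStream(), Encoding.UTF8)
                Return oReader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

Calling PageFetcher.GetPage with a vendor's URL would then give you the raw HTML as a single string, ready for parsing.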

For getting the data from the HTML, I would recommend using regular
expressions with named capturing groups. It is a very reliable and flexible
way of implementing screen scraping.
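
As a rough illustration of named capturing groups, here is a sketch that pulls metal names and prices out of a table row. The markup it expects (<td>Gold</td><td>$1,234.50</td>) is invented for the example, as are the names PriceParser and ExtractPrices; a real vendor page will need its own pattern.

```vbnet
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Module PriceParser
    ' Assumed markup: <td>Gold</td><td>$1,234.50</td>
    ' (?<metal>...) and (?<price>...) are named groups, read back by name below.
    Private Const Pattern As String = "<td>(?<metal>[A-Za-z]+)</td>\s*<td>\$(?<price>[\d,]+\.\d{2})</td>"

    ' Hypothetical helper: returns one entry per metal found in the HTML.
    Public Function ExtractPrices(ByVal sHtml As String) As Dictionary(Of String, Decimal)
        Dim oPrices As New Dictionary(Of String, Decimal)
        Dim oRegex As New Regex(Pattern, RegexOptions.IgnoreCase Or RegexOptions.ExplicitCapture)
        For Each oMatch As Match In oRegex.Matches(sHtml)
            Dim sMetal As String = oMatch.Groups("metal").Value
            ' NumberStyles.Number accepts the thousands separator and the decimal point.
            Dim dPrice As Decimal = Decimal.Parse(oMatch.Groups("price").Value, NumberStyles.Number, CultureInfo.InvariantCulture)
            oPrices(sMetal) = dPrice
        Next
        Return oPrices
    End Function
End Module
```

The named groups keep the calling code readable: you ask for Groups("price") rather than counting parentheses, so the pattern can change without breaking group indexes.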

Try googling "screen scraping" together with "regex" or "regular expressions";
there are several articles on this.

/Jens
 
Jim Giblin

Jens,

I was aware of the WebRequest class, as well as the WebClient.DownloadData
method, for pulling in the HTML, but I had no direction on searching the HTML
or parsing out the individual values once I have the text in memory.

REGEX was the key! I did search Google for "screen scraping", but was
referred to several products like ASPTear, which perform the same function
as the DownloadData method, just bundled in a class.
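
For reference, the DownloadData route can be sketched roughly as below. WebClient.DownloadData returns the raw bytes, so they still have to be decoded into a string before any searching can happen. The names RawDownloader and DownloadAsText are made up for illustration, and the caller is assumed to know the page's encoding.

```vbnet
Imports System.Net
Imports System.Text

Module RawDownloader
    ' Hypothetical helper: downloads the raw bytes and decodes them with the given encoding.
    Public Function DownloadAsText(ByVal sUrl As String, ByVal oEncoding As Encoding) As String
        Using oClient As New WebClient()
            Dim aBytes As Byte() = oClient.DownloadData(sUrl)
            ' Assumption: oEncoding matches the encoding the server actually used.
            Return oEncoding.GetString(aBytes)
        End Using
    End Function
End Module
```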

If anyone has any code examples, I would really appreciate seeing a different
implementation of this class.

Thanks,
Jim
 
Jens Christian Mikkelsen

Jim Giblin said:
If anyone has any code examples, I would really appreciate seeing a different
implementation of this class.

Hi Jim,

Here is an example, which gets the latest news from a Danish travel news
site, generates an RSS feed from it and writes it to the ASP.NET Response
output stream.



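' These go at the top of the code-behind file.
Imports System.Net
Imports System.Text.RegularExpressions
Imports System.Xml

' The rest goes inside a page method such as Page_Load.
Dim sPage As String
Dim oWriter As XmlTextWriter

' The original post called WebRequest.GetPage here. That is not a member of
' System.Net.WebRequest (presumably a custom helper), so WebClient.DownloadString
' is used instead.
Using oClient As New WebClient()
    sPage = oClient.DownloadString("http://www.standby.dk/")
End Using

' Each news item is an arrow image followed by a link; capture the link's URL and title.
Dim sPattern As String
sPattern = "<img src=fileadmin/tmpl/standby_pil_red.jpeg border=0>"
sPattern &= "<A HREF=""(?<url>[^""]+)"">"
sPattern &= "(?<title>[^<]+)</a>"

' ExplicitCapture: only the named groups are captured.
Dim oRegex As New Regex(sPattern, RegexOptions.ExplicitCapture)

Dim oMatches As MatchCollection
oMatches = oRegex.Matches(sPage)

' Write the RSS feed straight to the response.
oWriter = New XmlTextWriter(Response.OutputStream, System.Text.Encoding.UTF8)

oWriter.WriteStartElement("rss")
oWriter.WriteAttributeString("version", "2.0")
oWriter.WriteStartElement("channel")
oWriter.WriteElementString("title", "STAND BY")
oWriter.WriteElementString("link", "http://www.standby.dk")
oWriter.WriteElementString("description", "The Scandinavian Travel Trade Journal")
oWriter.WriteElementString("language", "da")
For Each oMatch As Match In oMatches
    ' One <item> per matched link; the captured URLs are relative to the site root.
    oWriter.WriteStartElement("item")
    oWriter.WriteElementString("title", oMatch.Groups("title").Value)
    oWriter.WriteElementString("link", "http://www.standby.dk/" & oMatch.Groups("url").Value)
    oWriter.WriteEndElement() ' item
Next
oWriter.WriteEndElement() ' channel
oWriter.WriteEndElement() ' rss
oWriter.Flush()
oWriter.Close()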
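
For the original question (several vendors, one consolidated page), the same idea could be applied once per vendor. The sketch below is only an outline under some assumptions: the VendorRule class, the Consolidate function, and the convention that every pattern has a named "price" group are all invented for illustration. It takes already-downloaded HTML so the parsing can be tried without network access, and it reports a vendor whose pattern stops matching instead of silently dropping it, since a pattern like the one above breaks whenever the site changes its markup.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module VendorScraper
    ' Hypothetical rule: a vendor's display name plus a regex with a named "price" group.
    Public Class VendorRule
        Public Name As String
        Public Pattern As String
        Public Sub New(ByVal sName As String, ByVal sPattern As String)
            Name = sName
            Pattern = sPattern
        End Sub
    End Class

    ' oPages maps vendor name to that vendor's downloaded HTML.
    ' Returns one "vendor: price" line per rule, in rule order.
    Public Function Consolidate(ByVal oRules As IEnumerable(Of VendorRule), ByVal oPages As IDictionary(Of String, String)) As List(Of String)
        Dim oLines As New List(Of String)
        For Each oRule As VendorRule In oRules
            Dim oMatch As Match = Regex.Match(oPages(oRule.Name), oRule.Pattern, RegexOptions.IgnoreCase)
            If oMatch.Success Then
                oLines.Add(oRule.Name & ": " & oMatch.Groups("price").Value)
            Else
                ' The page layout has probably changed; make that visible.
                oLines.Add(oRule.Name & ": price not found")
            End If
        Next
        Return oLines
    End Function
End Module
```

Keeping the patterns in data rather than in code means adding a vendor is a matter of adding one rule, and a layout change on one site only affects that site's line.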



/Jens
 
