Example Script to parse web page links and extract data?

livin

I'm hoping someone knows of an example script I can see to help me build
mine.

I'm looking for an easy way to automate the below web site browsing and pull
the data I'm searching for.
Here are the steps it needs to accomplish...

1) Log in to the site (a Windows dialog appears when hitting the web page) *optional*

2) Choose a menu link from the ASP page (a script shows/hides menu items depending
on mouseover) *optional*

3) Open the Basic Search form and enter a zip code or city to pull all the data.

4) After search, table shows many links (hundreds sometimes) to the actual
data I need.
Links are this format... <a href="javascript:GetAgent('AA059')">

5) Each link opens new window with table providing required data.
The URL that each href opens looks like this...
http://armls.marketlinx.com/Roster/Scripts/Member.asp?PubID=AA059 where the
PubID is the record I need.

Table format looks like this:

<tr>
<td bgcolor="#C0C0C0" align="center">
<a href="javascript:GetAgent('MA142')">
<font face="Arial" size="2">6</font></a></td>
<td><font face="Arial" size="2">
<a href="javascript:GetAgent('MA142')">Alaze</a><br></font></td>
<td><font face="Arial" size="2">Mark <br></font></td>
<td><font face="Arial" size="2">MA142</font><br>
</td>
<td><font face="Arial" size="2">
<a href="javascript:GetBroker('COLD56')">Banker Success
Realty</a><br></font></td>
<td>COLD56</td>
<td><font face="Arial" size="2"><script LANGUAGE="javascript">
<!--
writePhoneNumber('480-999-9999');
//--></script></td>
</tr>
 

Steven Bethard

livin said:
I'm looking for an easy way to automate the below web site browsing and pull
the data I'm searching for.

This is a task that BeautifulSoup[1] is usually good for.

livin said:
4) After search, table shows many links (hundreds sometimes) to the actual
data I need.
Links are this format... <a href="javascript:GetAgent('AA059')">

5) Each link opens new window with table providing required data.
The URL that each href opens looks like this...
http://armls.marketlinx.com/Roster/Scripts/Member.asp?PubID=AA059 where the
PubID is the record I need.

I'm not entirely sure I got your problem description right, but I think
points 4 and 5 would look something like:

import re
import urllib
import BeautifulSoup

base_url = 'http://armls.marketlinx.com/.../Member.asp?PubID=AA059'
html = urllib.urlopen(base_url).read()
soup = BeautifulSoup.BeautifulSoup(html)

# match hrefs such as javascript:GetAgent('AA059')
link_matcher = re.compile(r"javascript:GetAgent\('[^']*'\)")
for link_elem in soup('a', {'href': link_matcher}):
    ...
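
For the step from point 4 to point 5 (turning each GetAgent link into the Member.asp URL), here is a rough standard-library sketch, assuming Python 3 and that the results page looks like the table row quoted in the question. The names SAMPLE_HTML, agent_ids and member_url are made up for illustration, and the sample markup is a trimmed copy of that row, not a real download:

```python
import re
from urllib.parse import urlencode

# Trimmed copy of the table row from the question (illustrative only).
SAMPLE_HTML = """
<td><a href="javascript:GetAgent('MA142')"><font face="Arial" size="2">6</font></a></td>
<td><a href="javascript:GetAgent('MA142')">Alaze</a></td>
<td><a href="javascript:GetBroker('COLD56')">Banker Success Realty</a></td>
"""

# Page that each GetAgent link opens, as given in the question.
MEMBER_URL = "http://armls.marketlinx.com/Roster/Scripts/Member.asp"

# Capture the ID inside javascript:GetAgent('...'); GetBroker links are ignored.
AGENT_LINK = re.compile(r"javascript:GetAgent\('([^']*)'\)")


def agent_ids(html):
    """Return the agent IDs found in html, without duplicates, in page order."""
    seen = []
    for pub_id in AGENT_LINK.findall(html):
        if pub_id not in seen:
            seen.append(pub_id)
    return seen


def member_url(pub_id):
    """Build the Member.asp URL for one agent ID."""
    return MEMBER_URL + "?" + urlencode({"PubID": pub_id})


ids = agent_ids(SAMPLE_HTML)
urls = [member_url(pub_id) for pub_id in ids]
print(urls)
```

Each row links to the same agent twice, which is why the IDs are de-duplicated before building the URLs. Fetching and parsing each member page would still be a separate step.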

HTH,

STeVe

[1] http://www.crummy.com/software/BeautifulSoup/
 
