Regex Help

Support Desk

Does anybody know of a good regex to parse HTML links from HTML code? The one I
am currently using seems to cut off the last letter of some links,
returning links like

http://somesite.co

or http://somesite.ph

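The code I am using is:


import re
import urllib

regex = r'<a href=["|\']([^"|\']+)["|\']>'

# Fetch the page and read its body into a string
page_text = urllib.urlopen('http://somesite.com')
page_text = page_text.read()

# Search the downloaded text (page_text) for links
links = re.findall(regex, page_text, re.IGNORECASE)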
 
Miki

Hello,
Anybody know of a good regex to parse html links from html code?
BeautifulSoup is *the* library to handle HTML

from BeautifulSoup import BeautifulSoup
from urllib import urlopen

soup = BeautifulSoup(urlopen("http://python.org/"))
for a in soup("a"):
    print a["href"]
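
For readers on Python 3 who do not have BeautifulSoup installed, roughly the same loop can be sketched with the standard library's html.parser module. The names LinkCollector and extract_links, and the sample markup, are made up for illustration. This is only a sketch and it is less forgiving of broken markup than BeautifulSoup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all href values found in <a> tags, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample markup, not taken from any real page.
sample = (
    '<p><A HREF="http://somesite.com">one</A> '
    "<a class=\"x\" href='http://somesite.php'>two</a></p>"
)
print(extract_links(sample))
```

Because the parser looks at attributes by name, the position of href inside the tag and the quote style used around it should not matter here.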

HTH,
 
Lawrence D'Oliveiro

Can you post some example HTML sequences that this regexp is not handling
correctly?
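
One way to answer that kind of question is to run the pattern over a handful of small anchors and see which ones it picks up. The sketch below uses the regex exactly as posted. The sample anchors are invented for illustration and are not from the original poster's page.

```python
import re

# The pattern exactly as posted in the thread. Note that inside [...]
# the | is a literal pipe character, not an "or".
regex = r'<a href=["|\']([^"|\']+)["|\']>'

# Hypothetical sample anchors, invented for illustration.
samples = [
    '<a href="http://somesite.com">plain</a>',
    '<A HREF=\'http://somesite.php\'>single quotes, upper case</A>',
    '<a href="http://somesite.com" class="nav">extra attribute</a>',
    '<a class="nav" href="http://somesite.com">href not first</a>',
    '<a href="http://somesite.com/a|b">pipe in URL</a>',
]

# One list of captured URLs per sample, in the same order as samples.
results = [re.findall(regex, s, re.IGNORECASE) for s in samples]

for s, found in zip(samples, results):
    print(found, "<-", s)
```

On these samples the pattern either captures the whole URL (the first two) or finds nothing at all (the last three). None of them comes back with its last letter missing.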
 
Support Desk

Thanks for the reply, I found out the problem was occurring later on in the
script. The regexp works well.

-----Original Message-----
From: Lawrence D'Oliveiro (e-mail address removed)
Sent: Tuesday, September 23, 2008 6:51 PM
To: (e-mail address removed)
Subject: Re: Regex Help

Can you post some example HTML sequences that this regexp is not handling
correctly?
 
