Regex Help

Support Desk · Sep 22, 2008

Anybody know of a good regex to parse html links from html code? The one I
am currently using seems to be cutting off the last letter of some links,
and returning links like

http://somesite.co

or http://somesite.ph

the code I am using is

import re
import urllib

# note: inside [...] the '|' is a literal character, not "or"
regex = r'<a href=["|\']([^"|\']+)["|\']>'

page_text = urllib.urlopen('http://somesite.com')
page_text = page_text.read()

links = re.findall(regex, page_text, re.IGNORECASE)
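The question asks for a regex, so here is a sketch of a more forgiving pattern. It is written for Python 3, while the thread's own code is Python 2. It assumes href values are quoted with " or '. Unquoted values, <area> tags and attribute values containing > are out of scope. LINK_RE and find_links are illustrative names, not an established API.

```python
import re

# A sketch of a more forgiving link pattern; still not a real HTML parser.
# Assumption: href values are quoted with " or '; unquoted values are skipped.
LINK_RE = re.compile(
    r"""<a\s            # the tag name followed by whitespace
        (?:[^>]*?\s)?   # optionally, other attributes before href
        href\s*=\s*     # the attribute name, allowing spaces around '='
        (["'])          # group 1: the opening quote (no '|' needed in [...])
        (.*?)           # group 2: the URL itself
        \1              # the same quote character closes the value
    """,
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)


def find_links(html):
    """Return the href value of every <a> tag LINK_RE recognizes, in order."""
    return [m.group(2) for m in LINK_RE.finditer(html)]


sample = """<a href="http://somesite.com">one</a>
<A HREF='http://somesite.ph/page'>two</A>
<a class="nav" href="http://example.org/x">three</a>
<a href = "http://example.net" title="t">four</a>"""

print(find_links(sample))
# ['http://somesite.com', 'http://somesite.ph/page', 'http://example.org/x', 'http://example.net']
```

The pattern copies whatever sits between the quotes unchanged, so entities such as &amp; are not decoded. For real pages an HTML parser is still the safer choice.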

Miki · Sep 23, 2008

Hello,

> Anybody know of a good regex to parse html links from html code?

BeautifulSoup is *the* library to handle HTML

from BeautifulSoup import BeautifulSoup
from urllib import urlopen

soup = BeautifulSoup(urlopen("http://python.org/"))
# href=True skips <a> tags without an href (e.g. <a name="top">)
for a in soup("a", href=True):
    print a["href"]
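If installing BeautifulSoup is not an option, the standard library can do the same job. The sketch below is in Python 3 and uses html.parser.HTMLParser. It is not what Miki posted. LinkCollector, extract_links and links_from_url are illustrative names, and links_from_url assumes the fetched page is UTF-8.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names and passes attrs as
        # (name, value) pairs; value is None for a bare attribute like <a href>.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Parse an HTML string and return its <a href> values in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_from_url(url):
    """Fetch a page and extract its links (assumes the body is UTF-8)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html)


print(extract_links("""<A HREF="/a">x</A> <a name="top"></a> <a class="c" href='/b'/>"""))
# ['/a', '/b']
```

Unlike a regex, the parser handles attribute order, upper-case tags, extra whitespace and entities such as &amp; inside the URL without any special handling.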

HTH,

Lawrence D'Oliveiro · Sep 24, 2008

> Anybody know of a good regex to parse html links from html code? The one I
> am currently using seems to be cutting off the last letter of some links,
> and returning links like
>
> http://somesite.co
>
> or http://somesite.ph
>
> the code I am using is
>
> regex = r'<a href=["|\']([^"|\']+)["|\']>'

Can you post some example HTML sequences that this regexp is not handling
correctly?
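Lawrence's question can be answered empirically with a small harness that runs the pattern exactly as posted over hand-written tags and reports the ones it misses. The snippets in CASES are invented examples, not the OP's actual pages, and check is an illustrative helper. The harness is written in Python 3.

```python
import re

# The pattern exactly as posted in the thread.
ORIGINAL = r'<a href=["|\']([^"|\']+)["|\']>'

# Hypothetical test cases: tag -> the href we would expect to get back.
CASES = {
    '<a href="http://somesite.com">': "http://somesite.com",
    "<a href='http://somesite.ph'>": "http://somesite.ph",
    '<a class="nav" href="http://somesite.com">': "http://somesite.com",
    '<a href="http://somesite.com" title="home">': "http://somesite.com",
    '<a  href="http://somesite.com">': "http://somesite.com",
    '<a href=http://somesite.com>': "http://somesite.com",
    '<a href="http://somesite.com/?q=a|b">': "http://somesite.com/?q=a|b",
}


def check(pattern, cases):
    """Return the snippets for which pattern does not yield exactly the expected href."""
    failures = []
    for snippet, expected in cases.items():
        found = re.findall(pattern, snippet, re.IGNORECASE)
        if found != [expected]:
            failures.append(snippet)
    return failures


for snippet in check(ORIGINAL, CASES):
    print("missed:", snippet)
```

With these cases, the original pattern misses the last five. The causes are another attribute before href, another attribute after the closing quote, a doubled space, an unquoted value, and a | inside the URL (the [...] class treats | as one more quote character). In every case the tag is skipped entirely rather than shortened. That fits the OP's later finding that the truncation came from elsewhere in the script.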

Support Desk · Sep 24, 2008

Thanks for the reply. I found out the problem was occurring later on in the
script; the regexp works well.

-----Original Message-----
From: Lawrence D'Oliveiro
Sent: Tuesday, September 23, 2008 6:51 PM
To: (e-mail address removed)
Subject: Re: Regex Help

Lawrence D'Oliveiro · Sep 25, 2008

> Thanks for the reply ...

A: The vulture doesn't get Frequent Poster miles.
Q: What's the difference between a top-poster and a vulture?

Similar Threads

Problems with using event handlers for button and textarea input	1	Nov 29, 2021
Working on mobile css menu with plenty of frustration!	2	Dec 29, 2022
need some debug-infos on a simple regex	3	Nov 13, 2010
I need help fixing my website	2	Oct 15, 2023
Parsing multiple lines from text file using regex	0	Oct 27, 2013
Help with code	0	Jun 12, 2022
Help with my responsive home page	2	Dec 14, 2022
Convert AWK regex to Python	6	May 16, 2011
