Regular Expression not working

Fritz Bayer · Jun 15, 2005

Hello,

I'm trying to extract urls from a document.

The following code does not work correctly:

while ($content =~ m!$(<p class=(["']?)g\2>.*?>.*?<a.*?href=(["'])?(http://([^\3]+)))!ig)
{
print "1 $1\n";
print "2 $2\n";
print "3 $3\n";
print "4 $4\n";
print "5 $5\n";
}

The problem is that

([^\3]+)

also matches the " or ' character captured by the third group,
even though it should NOT.

It matches them not because the third capturing group is empty (neither
" nor '), but because \3 apparently can't be used inside a [...] block.

Why is that, and what's the workaround?

Fritz

Greg Bacon · Jun 15, 2005

: I'm trying to extract urls from a document.
: [...]
: Why is that and whats the workaround for this?

The workaround is to use the HTML::LinkExtor module from CPAN:

http://search.cpan.org/dist/HTML-Parser/lib/HTML/LinkExtor.pm

I also see brian d foy's HTML::SimpleLinkExtor, but I haven't
used it.
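
If you still want a regex for a quick one-off: inside a bracketed
character class, \3 is not a backreference. Perl reads it there as the
octal escape for chr(3), so [^\3] excludes only that control character
and happily matches quotes. A common substitute is a negative lookahead
that checks the backreference before each character. What follows is
only a rough sketch under my own assumptions: the sample input is made
up, the pattern is simplified so that the quote lands in group 1, and
it handles only quoted href values.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original document.
my $content = <<'HTML';
<p class="g"><a href="http://example.com/a'b">one</a></p>
<p class=g><a class=x href='http://example.org/two'>two</a></p>
HTML

# Inside [...], \3 means chr(3), so a double quote is NOT excluded here.
my $quote_allowed = ('"' =~ /\A[^\3]\z/) ? 1 : 0;
print "[^\\3] matches a double quote: $quote_allowed\n";

# Workaround: (?:(?!\1).)+ consumes one character at a time for as long
# as the opening quote captured in group 1 does not come next.
my @urls;
while ($content =~ m{<a\b[^>]*?\bhref=(["'])(http://(?:(?!\1).)+)\1}ig) {
    push @urls, $2;
}

print "$_\n" for @urls;
```

Unquoted href values would need a separate alternative, which is one
more reason to prefer a real parser.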

Hope this helps,
Greg

John W. Krahn · Jun 15, 2005

Greg Bacon wrote:
<snip>

I see that you haven't posted the weekly statistics for a while. Have you
given up on that?

John

Greg Bacon · Jun 15, 2005

: I see that you haven't posted the weekly statistics for a while. Have
: you given up on that?

A while back, I realized I'd missed a week but also noticed the
absence of any clamor over it.

Greg
