Regular Expression not working

Fritz Bayer

Hello,

I'm trying to extract URLs from a document.

The following code does not work correctly:

while ($content =~ m!$(<p class=(["']?)g\2>.*?>.*?<a.*?href=(["'])?(http://([^\3]+)))!ig)
{
    print "1 $1\n";
    print "2 $2\n";
    print "3 $3\n";
    print "4 $4\n";
    print "5 $5\n";
}
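
A side note on the pattern as posted: it starts with `$(`. Inside a pattern Perl deliberately does not interpolate `$(`, `$)` or `$|`, because they look like end-of-string tests. The `$` is therefore an end-of-line anchor, and `<p` can never come directly after it, so if that `$` really is in the script the loop should never run at all. A minimal check, using a made-up sample line (not the poster's document):

```perl
use strict;
use warnings;

# Made-up sample line, for illustration only.
my $content = qq{<p class="g"><a href="http://example.com/">link</a>\n};

# Perl leaves "$(" uninterpolated in a pattern, so "$" is an end-of-line
# anchor here and "<p" can never come right after it: this always fails.
my $anchored = $content =~ m!$(<p)! ? 1 : 0;

# Without the stray "$", the same group finds the opening tag.
my $plain = $content =~ m!(<p)! ? 1 : 0;

print "with \$: $anchored\n";       # prints "with $: 0"
print "without \$: $plain\n";       # prints "without $: 1"
```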

The problem is that

([^\3]+)

is also matching the character " or ' from the third capturing group,
even though it should NOT.
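
The symptom can be reproduced in isolation. Perl does not support backreferences inside a bracketed character class; there, a backslash followed by a digit is read as an octal escape. So `[^\3]` means "any character except chr(3)" and has nothing to do with the third capturing group. A minimal sketch, using `\1` and a made-up string so that only one group is needed (the same thing happens with `\3`):

```perl
use strict;
use warnings;

# Capture an opening quote, then try to "exclude" it with [^\1]+.
my $s = q{"abc"def};
my ($quote, $rest) = $s =~ /^(["'])([^\1]+)/;
print "rest: $rest\n";    # prints: rest: abc"def  (the closing quote was consumed)

# Inside [...], "\1" is the octal escape for chr(1), not a backreference,
# so the class only excludes that control character.
my $excludes_ctrl  = ("\x01" =~ /^[^\1]\z/) ? 0 : 1;   # 1: chr(1) is excluded
my $excludes_quote = ('"'    =~ /^[^\1]\z/) ? 0 : 1;   # 0: '"' is not excluded
print "excludes chr(1): $excludes_ctrl, excludes quote: $excludes_quote\n";
```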

It matches them not because the third capturing group is empty (instead of
holding " or '), but because somehow \3 can't be used inside a [...] block.

Why is that, and what's the workaround for this?
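
One common workaround is to write "anything but the captured quote" as a negative lookahead on the backreference, `(?:(?!\2).)+`; lookaheads, unlike character classes, can contain backreferences. A second pitfall is that `(["'])?` leaves the group unset when the href is unquoted, and in Perl a backreference to a group that did not participate fails to match, so the sketch below handles quoted and unquoted values in separate alternatives. This is a minimal sketch under those assumptions: the sample input and the simplified pattern (including the variable names `$re` and `@urls`) are hypothetical, not the poster's actual document or code.

```perl
use strict;
use warnings;

# Made-up sample input: quoted (" and ') and unquoted href values.
my $content = <<'HTML';
<p class=g><a href="http://example.com/a">A</a></p>
<p class='g'><a href='http://example.com/b'>B</a></p>
<p class="g"><a href=http://example.com/c>C</a></p>
HTML

my $re = qr{
    <p \s+ class=(["']?) g \1 >            # group 1: optional quote around the class value
    .*? <a \s [^>]*? href=
    (?:
        (["']) (http://(?:(?!\2).)+) \2    # groups 2-3: quoted URL, stop at the same quote
      | (http://[^\s>]+)                   # group 4: unquoted URL
    )
}xis;

my @urls;
while ($content =~ /$re/g) {
    # Exactly one of $3 (quoted) and $4 (unquoted) is defined per match.
    push @urls, defined $3 ? $3 : $4;
}
print "$_\n" for @urls;
```

With the sample input this prints the three URLs `http://example.com/a`, `http://example.com/b` and `http://example.com/c`, one per line, without any trailing quote characters.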

Fritz
 
John W. Krahn

Greg Bacon wrote:
<snip>

I see that you haven't posted the weekly statistics for a while. Have you
given up on that?


John
 
Greg Bacon

: I see that you haven't posted the weekly statistics for a while. Have
: you given up on that?

A while back, I realized I'd missed a week but also noticed the
absence of any clamor over it.

Greg