Regular Expression help - href replacement


Saya

Hi,

I have a text string looking something like this:

my $test = "<a bookmark></a> en href=\"http://www.google.com\" dette
er en test for at checke <a target=\"_new\"
href=\"http://www.google.dk\" prop=\"saya\"> NNIT</a> link test og
lidt mere tekst til :) <a href=\"www.saya.dk\"
target=\"new\">saya</a> den replacer garanteret også <a
href=\"/saya.asp\">ASP</a>";

The string above contains four candidate links:
1. <a bookmark></a> en href=\"http://www.google.com\" - not a real
link, should not be processed.
2. <a target=\"_new\" href=\"http://www.google.dk\" prop=\"saya\">
should become <a target="_new"
href="/redirect.asp?forwardURL=http://www.google.dk" prop="saya">
3. <a href=\"www.saya.dk\" target=\"new\"> should become <a
href="/redirect.asp?forwardURL=www.saya.dk" target="new">
4. <a href=\"/saya.asp\"> - should not change, since it is an
"internal" link.

So far I have come up with this expression:

$test =~ s/(<a.*?)(href=\")(http|www)/$1$2\/redirect.asp?forwardURL=$3/gi;

but it also rewrites link 1, which should not change.
What am I doing wrong here? Is there a better way to achieve this?

Any help will be greatly appreciated.

Regards
Saya
 

Ben Morrow

> What am I doing wrong here?

Trying to parse HTML with a regex.
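
To make that concrete: the non-greedy .*? only keeps the match as short as possible from where it starts, and the engine starts at the leftmost <a it can find, which for link 1 is <a bookmark>. From there .*? runs past that tag's closing >, through </a> en, until it reaches href="http, so the "link" in plain text gets rewritten. A minimal regex-only sketch, assuming the input is as simple as the sample (double-quoted href values, no > inside attribute values), swaps .*? for [^>]*? so the search cannot leave the <a ...> tag. Unlike ., the class [^>] also matches newlines, so tags split across lines are still handled. The test string is a trimmed version of the one in the question, with </a> closing tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed test data modeled on the question: </a> closing tags,
# and line breaks inside two of the tags.
my $test = qq{<a bookmark></a> en href="http://www.google.com" dette\n}
         . qq{<a target="_new"\nhref="http://www.google.dk" prop="saya"> NNIT</a>\n}
         . qq{<a href="www.saya.dk"\ntarget="new">saya</a>\n}
         . qq{<a\nhref="/saya.asp">ASP</a>};

# [^>]*? cannot run past the '>' that closes a tag, so href=" must be
# found inside the same <a ...> tag. [^>] also matches "\n", so an
# attribute on the next line is still found (plain '.' would not be).
# $1 = everything up to and including href="
# $2 = the external URL (starting with http or www) plus its closing quote
$test =~ s{(<a\b[^>]*?\bhref=")((?:http|www)[^"]*")}
          {$1/redirect.asp?forwardURL=$2}gi;

print $test, "\n";
```

Links 2 and 3 now point at /redirect.asp?forwardURL=..., while link 1 (no enclosing tag) and link 4 (href starts with /) are left alone. It is still a regex over HTML, so single-quoted or unquoted href values, or a > inside an attribute value, will defeat it.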
> Is there a better way to achieve this?

Use one of the HTML::* modules from CPAN.
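
Along those lines, here is a minimal sketch using HTML::Parser from CPAN (HTML::Entities ships in the same distribution). It assumes, as the question does, that an external link is an href starting with http or www, and reuses the /redirect.asp?forwardURL= prefix from the question. Because the parser only calls the start-tag handler for real tags, the href= in link 1's plain text is never treated as an attribute. Every tag that is not rewritten, plus all text, end tags and comments, is copied through unchanged by the default handler.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(encode_entities);

# Redirect prefix taken from the question.
my $redirect = '/redirect.asp?forwardURL=';

my $html = qq{<a bookmark></a> en href="http://www.google.com" dette\n}
         . qq{<a target="_new"\nhref="http://www.google.dk" prop="saya"> NNIT</a>\n}
         . qq{<a href="www.saya.dk"\ntarget="new">saya</a>\n}
         . qq{<a\nhref="/saya.asp">ASP</a>};

my $out = '';

# Called for every start tag. Only <a> tags whose href looks external
# are rebuilt; all other start tags are copied byte-for-byte.
sub start_tag {
    my ($tagname, $attr, $attrseq, $text) = @_;
    if ($tagname eq 'a'
        && defined $attr->{href}
        && $attr->{href} =~ /^(?:http|www)/i)
    {
        $attr->{href} = $redirect . $attr->{href};
        # Rebuild the tag in the original attribute order; attribute
        # values arrive entity-decoded, so re-encode them on output.
        my $tag = '<a';
        for my $name (@$attrseq) {
            $tag .= sprintf ' %s="%s"', $name,
                encode_entities($attr->{$name}, '<>&"');
        }
        $out .= $tag . '>';
    }
    else {
        $out .= $text;
    }
}

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&start_tag, 'tagname, attr, attrseq, text' ],
    default_h   => [ sub { $out .= shift }, 'text' ],  # text, end tags, comments
);
$p->parse($html);
$p->eof;    # flush any buffered text

print $out, "\n";
```

The rewritten tags are rebuilt from the parsed attributes, so their original whitespace and quoting are normalised (the line break inside link 2's tag becomes a single space, for example).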

Ben
 
