Regexp question + HTML::Parser question on the side

boris bass

i am trying to scan the html file for all the anchor tags and the
following regex


while ( $content =~ /<\s*a\s+.*?>/s )


doesn't seem to work , what is wrong here?

angle brackets don't need to be escaped, do they?

obviously, the string i am trying to match is

<a href="something">

but i get no match


ps. this could probably be done via HTML::Parser module and not
through the regular expressions.

the task i am trying to accomplish: find

<a href="something">

and change it to

<a href="">

i.e. delete a link to whatever and leave an empty string in place of
it.

if somebody could post a code snippet how to do it would also be
appreciated. doesn't have to be tested, just to point me at the right
direction. i looked at HTML::Parser doc page, but i haven't figured it
out on my own


thanks,


boris
 
Anno Siegel

boris bass said:
i am trying to scan the html file for all the anchor tags and the
following regex


while ( $content =~ /<\s*a\s+.*?>/s )


doesn't seem to work , what is wrong here?

angle brackets don't need to be escaped, do they?

obviously, the string i am trying to match is

<a href="something">

but i get no match

I do. *shrug*

boris bass said:
ps. this could probably be done via HTML::Parser module and not
through the regular expressions.

Yes.

Anno
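
For the task in the original post (blanking every href), a rough sketch with HTML::Parser's version 3 handler API might look like the following. The sample $html string, the $out buffer and the start() sub are made up for illustration. It also assumes that rebuilding the tag with double-quoted attribute values is acceptable; the parser hands over attribute values with entities already decoded, and this sketch does not re-encode them.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input; a real script would read the file instead.
my $html = 'some html <a href="something">link</a> and <A HREF="other" class="x">more</A>';
my $out  = '';

my $parser = HTML::Parser->new(
    api_version => 3,
    # Start tags: lowercased tag name, attribute hash, attribute order,
    # and the original source text of the tag.
    start_h   => [ \&start, 'tagname, attr, attrseq, text' ],
    # Everything else (text, end tags, comments, ...) is copied unchanged.
    default_h => [ sub { $out .= shift }, 'text' ],
);

sub start {
    my ($tag, $attr, $attrseq, $text) = @_;
    if ($tag eq 'a' && exists $attr->{href}) {
        $attr->{href} = '';
        # Rebuild the tag, keeping the attributes in their original order.
        $text = "<$tag" . join('', map { qq( $_="$attr->{$_}") } @$attrseq) . '>';
    }
    $out .= $text;
}

$parser->parse($html);
$parser->eof;
print "$out\n";
```

Only anchor tags that carry an href are rewritten; every other piece of the document goes through the default handler untouched.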
 
David K. Wall

boris bass said:
if somebody could post a code snippet how to do it would also be
appreciated. doesn't have to be tested, just to point me at the
right direction. i looked at HTML::Parser doc page, but i haven't
figured it out on my own

If HTML::Parser seems weird, try HTML::TokeParser. It may seem more
intuitive.
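
As a minimal sketch of the "scan for all the anchor tags" part, HTML::TokeParser's get_tag can be used to skip straight from one <a> start tag to the next. The sample $html string and the @hrefs array are assumptions for illustration only.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input; pass a file name instead of a scalar ref
# to parse a file.
my $html = '<p><a href="something">one</a> and <A HREF="other">two</A></p>';

my $p = HTML::TokeParser->new(\$html)
    or die "cannot set up parser: $!";

my @hrefs;
# get_tag('a') skips ahead to the next <a ...> start tag and returns
# [ $tagname, \%attr, \@attrseq, $source_text ], or undef at end of input.
while (my $tag = $p->get_tag('a')) {
    my $href = $tag->[1]{href};
    push @hrefs, $href if defined $href;
}

print "found: $_\n" for @hrefs;
```

Tag and attribute names come back lowercased, so <A HREF=...> is picked up without any extra work.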
 
Bob Walton

boris said:
i am trying to scan the html file for all the anchor tags and the
following regex


while ( $content =~ /<\s*a\s+.*?>/s )


doesn't seem to work , what is wrong here?


Mostly you need the g and the i switches. The g switch so each pass
through the loop continues after the previous match instead of matching
the first tag forever, and the i switch so you match something like
<A href="xxx">. Example:

use strict;
use warnings;

my $content = do { local $/; <DATA> };   # slurp everything after __END__
while ( $content =~ /<\s*a\s+.*?>/sgi ) {
    print "matched $&\n";
}
__END__
some html <a href="sdflkj"> and <A href="sflkjwer"> and< a
href="werlkj" > and <a href="sdlfkj">

Note that this will not work perfectly, because the HTML may contain
stuff like <img src="xxx" alt="<a b c d>">, where a ">" or something
that looks like a tag sits inside an attribute value. For reliable
results, use one of the HTML parsers.
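
If the regex route is good enough for the input at hand, the substitution asked about in the original post could be sketched like this. The sample string and the $blanked variable are made up for illustration, and the same caveat about ">" inside attribute values applies.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $content = q{some html <a href="sdflkj"> and <A HREF = 'x' class="c">};

# Capture everything from "<a" up to and including "href=", then replace
# the value (double-quoted, single-quoted or bare) with an empty string.
(my $blanked = $content) =~
    s{(<\s*a\s[^>]*?\bhref\s*=\s*)(?:"[^"]*"|'[^']*'|[^\s>]+)}{$1""}gi;

print "$blanked\n";
```

The \s right after the "a" keeps tags such as <abbr> or <area> from matching, and the i switch covers upper-case <A HREF=...>.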


....
 
