regular expression

dj

Hi,
I am writing a script that parses an HTML file (which has been retrieved as
a scalar by LWP::UserAgent). The script looks for everything between the
first <P> tag and the last </P> tag, with any number of <P> and </P> tags in
between. I am sure I have done something like this before, but for the life
of me I can't remember how... (maybe I did it before in lex). Anyone got
any neato suggestions?

Thanks for any help,
Drew
 
Nicholas Knight

Are you looking for the /s modifier? It makes '.' match newlines, too. I'd
probably do it like this (the 'i' is to ignore case, as some people
capitalize all tags and some don't):

/<p>(.*)<\/p>/si
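
For anyone who wants to try this, here is a minimal sketch of the same pattern wrapped in a small script. The sample HTML and the helper name between_paragraphs are made up for illustration and are not from the original post; in the real script the text would come from the LWP::UserAgent response. The pattern is written with m{...} delimiters only so the slash in </p> needs no escaping.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the page fetched with LWP::UserAgent.
my $html = <<'HTML';
<html><body>
<h1>Title</h1>
<P>first paragraph</P>
<p>second
paragraph</p>
<div>footer</div>
</body></html>
HTML

# Hypothetical helper: returns everything between the first <p> and the
# last </p>, or undef when the pattern does not match.
sub between_paragraphs {
    my ($text) = @_;
    # /s lets '.' match newlines; /i ignores the case of the tags.
    # The greedy (.*) extends to the LAST </p> in the text.
    my ($inner) = $text =~ m{<p>(.*)</p>}si;
    return $inner;
}

my $inner = between_paragraphs($html);
print defined $inner ? $inner : "no match", "\n";
```

Because (.*) is greedy, the capture starts after the first <P> and keeps the inner </P> and <p> tags. Note that this only matches bare <p> tags; an opening tag with attributes, such as <p align="center">, would not be matched by this pattern.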
 
dj

Hi Nicholas,

Yep, I had something along these lines:

while ($_ =~ s/.+<P>(.+)<\/P>.+/$1/gsi) {
print;
}

but no substitution occurs. I have tried a few combinations, but no match :)
 
Martien Verbruggen

A _very_ simpleminded approach could do this:

my ($stuff) = /<P>(.*)<\/P>/i;

Addition: You also need the /s flag to match newlines. But again, I
wouldn't use it.

Martien
 
