RegEx with multiple occurrences

Mike

Hi again.

I'm trying to strip all script blocks from HTML, and am using the
following re to do it:

p = re.compile("(\<script.*>*\</script>)",re.IGNORECASE | re.DOTALL)
m = p.search(data)

The problem is that I'm getting everything from the 1st script's start
tag to the last script's end tag in one group - so it seems like it
parses the string from both ends therefore removing far more from that
data than I want. What am I doing wrong?
 
Tim Chase

Mike said:
p = re.compile("(\<script.*>*\</script>)",re.IGNORECASE | re.DOTALL)
m = p.search(data)

First, I presume you didn't copy & paste your expression, as
it looks like you're missing a period before the second
asterisk. Otherwise, all you'd get is any number of
greater-than signs followed by a closing "</script>" tag.

Second, you're likely getting some foobar results because
you're not using a "real" string of the form

r'(\<script.*>*\</script>)'
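
To illustrate the raw-string point, here is a small sketch. The sample text and variable names are made up for illustration. Since "\<" is not an escape sequence Python recognizes, the effect is easier to see with "\b", which means backspace in an ordinary string but a word boundary to the re module:

```python
import re

# Hypothetical sample text, not from the original post.
text = "a script tag"

# Ordinary string literal: "\b" is converted to a single backspace
# character before the re module ever sees the pattern.
plain = "\bscript\b"

# Raw string literal: the backslash survives, so the re module sees
# \b and treats it as a word boundary.
raw = r"\bscript\b"

plain_hit = re.search(plain, text)  # looks for literal backspace characters
raw_hit = re.search(raw, text)      # looks for the whole word "script"
```

Here plain_hit is None, because the text contains no backspace characters, while raw_hit matches "script". Writing every pattern as a raw string avoids having to remember which escapes Python itself interprets.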

Mike said:
The problem is that I'm getting everything from the 1st
script's start tag to the last script's end tag in one
group - so it seems like it parses the string from both
ends therefore removing far more from that data than I
want. What am I doing wrong?

Looks like you want the non-greedy modifier to the "*"
described at

http://docs.python.org/lib/re-syntax.html

(searching the page for "greedy" should turn up the
paragraph on the modifiers)

You likely want something more like:

r'<script[^>]*>.*?</script>'

In the first atom, you're looking for the remainder of the
script tag (as much stuff that isn't a ">" as possible).
Then you close the tag with the ">", and then take as little
as possible (".*?") of anything until you find the closing
"</script>" tag.
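
As a rough sketch of the difference, the snippet below runs a greedy and a non-greedy version of that pattern over a made-up HTML string. The sample data and the variable names are assumptions for illustration; the flags are the ones from the original question. It uses findall() and sub() rather than search(), since search() only reports the first match and the goal is to strip every script block:

```python
import re

# Hypothetical sample input with two separate script blocks.
data = (
    "<p>one</p>"
    "<script type='text/javascript'>var a = 1;</script>"
    "<p>two</p>"
    "<SCRIPT>\nvar b = 2;\n</SCRIPT>"
    "<p>three</p>"
)

# Same flags as in the question: ignore case, and let "." match newlines.
flags = re.IGNORECASE | re.DOTALL

# Greedy: ".*" grabs as much as it can, so it runs to the LAST closing tag.
greedy = re.compile(r"<script[^>]*>.*</script>", flags)

# Non-greedy: ".*?" stops at the FIRST closing tag it can reach.
lazy = re.compile(r"<script[^>]*>.*?</script>", flags)

greedy_blocks = greedy.findall(data)  # one oversized match
lazy_blocks = lazy.findall(data)      # one match per script block

# Remove every script block, leaving the surrounding markup alone.
stripped = lazy.sub("", data)
```

With this input the greedy pattern returns a single match that swallows the "<p>two</p>" between the scripts, while the non-greedy one returns two matches, and stripped ends up as "<p>one</p><p>two</p><p>three</p>". A regexp like this is only a quick fix for reasonably tidy markup; oddly nested or malformed HTML may still trip it up.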

HTH,

-tkc
 
Tim Chase

Mike said:
Tim - you're a legend. Thanks.

A leg-end? I always knew something was a-foot. Sorry to
make myself the butt of such joking. :)

My pleasure...glad it seems to be working for you.

-tkc (not much of a legend at all...just a regexp wonk)
 
