How about this case? It is a similar problem, but this time the start
and end to match are not just single characters in a string.
Yes, I'd guessed your real problem might be of this nature, which is
why I didn't provide a character class based solution.
$text = '<script language="javascript">functionA( );</script>'
      . '<script language="javascript">functionB( );</script>'
      . '<script language="javascript">functionC( );</script>';
I want to extract the shortest string that starts with '<script' and
ends with '</script>', with functionB in between.
Again, to get the globally shortest match you need to find all
candidates and select the shortest.
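
A minimal sketch of that idea, assuming the delimiters and the text in
between are plain literal strings. The helper name shortest_span is
made up for illustration. For each possible start it only builds the
shortest candidate that begins there, which is enough to find the
globally shortest one:

```perl
use strict;
use warnings;

# Hypothetical helper: return the shortest substring of $text that starts
# with $open, ends with $close, and contains $needle in between.
# Returns undef if no such substring exists.
sub shortest_span {
    my ($text, $open, $close, $needle) = @_;
    my $best;
    my $start = -1;
    # Try every position where $open occurs.
    while (($start = index($text, $open, $start + 1)) >= 0) {
        # The shortest candidate for this start uses the first $needle
        # after $open and the first $close after that $needle.
        my $mid = index($text, $needle, $start + length $open);
        last if $mid < 0;    # no $needle here or after any later start
        my $end = index($text, $close, $mid + length $needle);
        last if $end < 0;    # no $close after the $needle
        my $cand = substr($text, $start, $end + length($close) - $start);
        $best = $cand if !defined $best || length($cand) < length($best);
    }
    return $best;
}

my $text = join '',
    map { qq{<script language="javascript">function$_( );</script>} } qw(A B C);

my $shortest = shortest_span($text, '<script', '</script>', 'functionB');
print "$shortest\n";
```

Note that this still treats the HTML as flat text, so it shares the
limitations of any plain string matching.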
So what I want to get is the shortest match
'<script language="javascript">functionB( );</script>' from the $text.
Code:
$text =~ /(<script.+?functionB.+?<\/script>)/;
But $1 will be the longest match
Not necessarily.
Consider
$text='<script>functionB</script><script>longer! functionB</script>';
Your regex does _not_ find the _longest_ match. It finds the match
that starts in the leftmost position.
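
To see this on the original three-block text, here is a small sketch
(the variable names are mine). The match begins at the first '<script'
and the non-greedy quantifiers stop it as early as possible from there,
so it covers the A and B blocks: neither the longest possible match
(all three blocks) nor the shortest (the B block alone).

```perl
use strict;
use warnings;

# Build the three-block text from the question as one unbroken string.
my @blocks = map { qq{<script language="javascript">function$_( );</script>} } qw(A B C);
my $text   = join '', @blocks;

# The pattern from the question, unchanged.
my ($match) = $text =~ /(<script.+?functionB.+?<\/script>)/;

# The match starts at the leftmost '<script' and ends at the first
# '</script>' that follows functionB, so it spans blocks A and B.
print "matched ", length($match), " of ", length($text), " characters\n";
print "$match\n";
```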
I suspect you are not thinking hard enough about what you want. By a
literal interpretation of your description of what you want, the
following would be an OK match:
'<script></script>functionB<script></script>'.
Somehow I suspect (based on domain knowledge) that you wouldn't want
this to be a match, but unfortunately computers don't have that
knowledge and tend to be a bit literal.
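
If what you actually mean is "a single script element whose body
mentions functionB", one way to say that to the regex engine is to
forbid the match from running across a closing tag. The following is
only a sketch under that assumption, and the helper name
script_block_containing is made up. It is still a pattern match on
HTML, with all the fragility that implies.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first <script ...>...</script> block
# whose contents include $name, or undef if there is none.
sub script_block_containing {
    my ($text, $name) = @_;
    # (?:(?!<\/script>).)*? consumes one character at a time, but only
    # while no '</script>' starts at the current position, so the match
    # cannot extend past the end of the element it started in.
    my ($block) = $text =~ m{
        ( <script
          (?: (?! </script> ) . )*?
          \Q$name\E
          (?: (?! </script> ) . )*?
          </script>
        )
    }xs;
    return $block;
}

my $text = join '',
    map { qq{<script language="javascript">function$_( );</script>} } qw(A B C);

my $block = script_block_containing($text, 'functionB');
print defined $block ? "$block\n" : "no match\n";
```

With this pattern the string
'<script></script>functionB<script></script>' does not match, because
functionB is not inside either element.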
For parsing HTML you really should consider using an HTML parser. Any
simple pattern match will fail sooner or later.