Non-greedy matching problem

jason.yfho · Feb 3, 2007

Hello!

I want to get the shortest string that starts with "i" and ends with
"s" from the string "iiiidssss" using a regular expression, that is,
"ids". Any idea? My result is not non-greedy enough.

$text = "iiiidssss";
$text =~ m/(i.+?s)/;
$1 is "iiiids", but I want to get "ids". How?

Thank you!

Rgds,
Jason

Dave Slayton · Feb 3, 2007

how about this instead?
$text =~ /(i[^is]*?s)/;

now you're getting an 'i', followed by the minimum possible number of
anything that's not an 'i' or 's', followed by an 's'....

Dr.Ruud · Feb 3, 2007

The greediness is of no importance here, so /(i[^is]*s)/ is better, or
/(i[^is]+s)/ if at least one character should be between i and s.

If more than one such string can occur in the main string, you'll need
to capture them all, then sort by length and take the shortest.

perl -Mstrict -wle'
    # for each argument: collect every match, sort by length, print the shortest
    print +(sort {length $a <=> length $b}
        /i[^is]+s/g)[0]
        for @ARGV;
' iiiiabcdssssiiiiabssssiiiiabcssss isiasiabs

Output:
iabs
ias

kens · Feb 3, 2007

Please do not top-post - makes it harder to follow the conversation.

A minor point: the question mark (?) is not necessary in your regular
expression, because the character class already excludes 's':

$text =~ /(i[^is]*?s)/;

If 's' is not included in the character class, it is needed:
$text =~ /(i[^i]*?s)/;
Ken
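
To make the difference concrete, here is a small sketch that runs all four variants against the sample string from the original question. The variable names are made up for illustration only.

```perl
use strict;
use warnings;

my $text = 'iiiidssss';

# Character class excludes "s": greedy and lazy give the same result.
my ($greedy_is) = $text =~ /(i[^is]*s)/;
my ($lazy_is)   = $text =~ /(i[^is]*?s)/;

# Character class allows "s": only the lazy form stops at the first "s".
my ($greedy_i)  = $text =~ /(i[^i]*s)/;
my ($lazy_i)    = $text =~ /(i[^i]*?s)/;

print "$greedy_is $lazy_is $greedy_i $lazy_i\n";   # prints: ids ids idssss ids
```

With [^is] the class itself cannot run past an "s", so the quantifier has nothing to be lazy about; with [^i] the greedy form swallows every "s" it can and only gives back the last one.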

Dave Slayton · Feb 3, 2007

Sorry. Won't happen again.

Also, maybe my solution isn't optimal. He said he wanted 'the shortest
length of string that starts with "i" and ends with "s" from string
"iiiidssss" using regular expression'. Well, it works on *that*
string, but not if he wanted the shortest possible such substring from
*any* string. For the string "iiiiiiiiunderstandingssssssssss" my regex
gets "iunders" and not "ings", because "iunders" is the leftmost valid
match, so it wins even though it's longer. Not sure how to make it get
"ings". The solution offered by A. Sinan Unur has the same "problem".

Brian McCauley · Feb 3, 2007

Greediness only affects where a match ends. The regex engine always
takes the leftmost position at which a match can start, so a
non-greedy quantifier cannot move the start to the right.

You can put something greedy in front, but this will always find the
match that starts last.

$text =~ m/.*(i.+?s)/;

It still won't find the globally shortest match. In "iiassiibbss" it
will find "ibbs".

To find the globally shortest match you need to find all the matches
and then find the shortest. I could show you how (later) if you want
but I'm just on my way out.
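
A minimal sketch of the "find all the matches, then find the shortest" idea, not necessarily what the poster had in mind. It assumes overlapping candidates should count and that at least one character sits between the "i" and the "s" (as in the original i.+?s). The helper name shortest_i_to_s is hypothetical. The trick is to wrap the pattern in a lookahead, so the match itself consumes nothing and the engine tries every start position.

```perl
use strict;
use warnings;

# Hypothetical helper: shortest substring that starts with "i", ends with
# "s", and has at least one character in between. Returns undef if none.
sub shortest_i_to_s {
    my ($text) = @_;
    # The lookahead is zero-width, so /g retries at every position; the
    # lazy .+? captures the shortest candidate for each starting "i".
    my @candidates = $text =~ /(?=(i.+?s))/g;
    my ($shortest) = sort { length $a <=> length $b } @candidates;
    return $shortest;
}

my $demo = 'iiassiibbss';
my ($leftmost) = $demo =~ /(i.+?s)/;      # match with the leftmost start
my ($last)     = $demo =~ /.*(i.+?s)/;    # match with the last start
print "$leftmost $last ", shortest_i_to_s($demo), "\n";   # prints: iias ibbs ias

print shortest_i_to_s('iiiidssss'), "\n";                         # prints: ids
print shortest_i_to_s('iiiiiiiiunderstandingssssssssss'), "\n";   # prints: ings
```

If several candidates tie for the shortest length, which one comes back depends on the sort, so add a tie-break if that matters.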

jason.yfho · Feb 3, 2007

Thank you very much for all replies.

How about this case? A similar problem, but this time it is not just
a single character that marks the start or end of the string to match.

$text = '<script language="javascript">functionA( );</script>'
      . '<script language="javascript">functionB( );</script>'
      . '<script language="javascript">functionC( );</script>';

I want to extract the shortest string that starts with '<script', ends
with '</script>', and has functionB in between.

So what I want to get is the shortest match,
'<script language="javascript">functionB( );</script>', from $text.

Code:
$text =~ /(<script.+?functionB.+?<\/script>)/;
But $1 will be the longest match

Thank you!

Rgds,
Jason

Brian McCauley · Feb 4, 2007

jason.yfho wrote:
> How about this case? a similar problem, but this time not just to
> match one single character as start or end in a string.

Yes, I'd guessed your real problem might be of this nature, which is
why I didn't provide a character-class-based solution.

> Want to extract the shortest string with '<script' as start and
> '</script>' as the end with functionB in-between.

Again, to get the globally shortest you need to find all candidates and
select the shortest.

> $text =~ /(<script.+?functionB.+?<\/script>)/;
> But $1 will be the longest match

Not necessarily.

Consider

$text='<script>functionB</script><script>longer! functionB</script>';

Your regex does _not_ find the _longest_ match. It finds the match
that starts in the leftmost position.

I suspect you are not thinking hard enough about what you want. By a
literal interpretation of your description of what you want, the
following would be an OK match:
'<script></script>functionB<script></script>'.
Somehow I suspect (based on domain knowledge) that you wouldn't want
this to be a match, but unfortunately computers don't have knowledge
and tend to be a bit literal.

For parsing HTML you really should consider using an HTML parser. Any
simple pattern match will fail sooner or later.
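
For the sample $text only, and not as a substitute for a real HTML parser, here is a rough sketch of the "find all candidates and select the shortest" approach. It assumes script blocks are never nested and that the closing tag is always spelled exactly </script>. The (?:(?!</script>).)* construct is my own choice here, not something proposed in the thread; it keeps each candidate from running past the first closing tag.

```perl
use strict;
use warnings;

my $text = '<script language="javascript">functionA( );</script>'
         . '<script language="javascript">functionB( );</script>'
         . '<script language="javascript">functionC( );</script>';

# Collect every <script ...>...</script> block. The negative lookahead is
# checked before each character, so a block ends at its first </script>.
my @blocks = $text =~ m{(<script\b[^>]*>(?:(?!</script>).)*</script>)}gs;

# Keep only the blocks that mention functionB, then take the shortest one.
my ($hit) = sort { length $a <=> length $b } grep { /functionB/ } @blocks;

print "$hit\n" if defined $hit;
# prints: <script language="javascript">functionB( );</script>
```

This also rejects the '<script></script>functionB<script></script>' case mentioned above, because functionB has to sit inside a single block.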

Peter J. Holzer · Feb 4, 2007

(e-mail address removed) wrote:

It seems a perfect match for index and rindex:

#!/usr/bin/perl

use strict;
use warnings;

my $s = 'iiiidssss';
# Alternative input: here the last "i" comes after the first "s",
# so the check below fails and nothing is printed.
# my $s = 'iiiidssssi';

my $start = rindex $s, 'i';    # position of the last "i"
my $end   = index $s, 's';     # position of the first "s"

if ( $start > -1 and $start < $end ) {
    print substr( $s, $start, $end - $start + 1 ), "\n";
}

__END__

Hint to the OP: If you provide only one example string and no rules for
how it was constructed, we can only provide solutions that work for this
string, but not for similar strings, because we don't know what "similar"
means.

hp
