Non-greedy matching problem

jason.yfho

Hello!

I want to get the shortest length of string that starts with "i" and
ends with "s" from string "iiiidssss" using regular expression, that
is "ids". Any idea? My result is not non-greedy enough.

$text = "iiiidssss";
$text =~ m/(i.+?s)/;
$1 is "iiiids", but I want to get "ids". How?

Thank you!

Rgds,
Jason
 
Dave Slayton

how about this instead?
$text =~ /(i[^is]*?s)/;

now you're getting an 'i', followed by the minimum possible number of
anything that's not an 'i' or 's', followed by an 's'....
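
To see the difference between the two patterns on the original string, here is a minimal sketch. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $text = 'iiiidssss';

# Non-greedy .+? still starts at the leftmost 'i', so the match begins there.
my ($lazy_dot) = $text =~ /(i.+?s)/;

# Excluding 'i' and 's' from the middle means the match can only start at
# the last 'i' before the first 's'.
my ($negated_class) = $text =~ /(i[^is]*?s)/;

print "$lazy_dot\n";        # iiiids
print "$negated_class\n";   # ids
```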
 
Dr.Ruud

Dave Slayton wrote:
(e-mail address removed):
I want to get the shortest length of string that starts with "i" and
ends with "s" from string "iiiidssss" using regular expression, that
is "ids".

how about this instead?
$text =~ /(i[^is]*?s)/;

now you're getting an 'i', followed by the minimum possible number of
anything that's not an 'i' or 's', followed by an 's'....

The greediness is of no importance here, so /(i[^is]*s)/ is better, or
/(i[^is]+s)/ if at least one character should be between i and s.

If more than one of such string can be in the main string, you'll need
to capture them all, then sort on length and take the shortest.

perl -Mstrict -wle'
print +(sort {length $a <=> length $b}
/i[^is]+s/g)[0]
for @ARGV;
' iiiiabcdssssiiiiabssssiiiiabcssss isiasiabs
iabs
ias
 
kens

how about this instead?
$text =~ /(i[^is]*?s)/;

now you're getting an 'i', followed by the minimum possible number of
anything that's not an 'i' or 's', followed by an 's'....


I want to get the shortest length of string that starts with "i" and
ends with "s" from string "iiiidssss" using regular expression, that
is "ids". Any idea? My result is not non-greedy enough.
$text = "iiiidssss";
$text =~ m/(i.+?s)/;
$1 is "iiiids", but I want to get "ids". How?
Thank you!
Rgds,
Jason

Please do not top-post - makes it harder to follow the conversation.

A minor point - the question mark (?) is not necessary in your regular
expression
$text =~ /(i[^is]*?s)/;
If 's' is not included in the character class, it is needed:
$text =~ /(i[^i]*?s)/;
Ken
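
A small sketch of that point, run against the string from the original question. It assumes nothing beyond core Perl, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = 'iiiidssss';

# With 's' excluded from the class, the class has to stop at the first 's',
# so the greedy and non-greedy quantifiers give the same result.
my ($greedy_is) = $text =~ /(i[^is]*s)/;
my ($lazy_is)   = $text =~ /(i[^is]*?s)/;

# With only 'i' excluded, the class can swallow 's' characters, so the
# quantifier's greediness decides where the match ends.
my ($greedy_i) = $text =~ /(i[^i]*s)/;
my ($lazy_i)   = $text =~ /(i[^i]*?s)/;

print "$greedy_is $lazy_is\n";   # ids ids
print "$greedy_i $lazy_i\n";     # idssss ids
```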
 
Dave Slayton

kens said:
how about this instead?
$text =~ /(i[^is]*?s)/;

now you're getting an 'i', followed by the minimum possible number of
anything that's not an 'i' or 's', followed by an 's'....


I want to get the shortest length of string that starts with "i" and
ends with "s" from string "iiiidssss" using regular expression, that
is "ids". Any idea? My result is not non-greedy enough.
$text = "iiiidssss";
$text =~ m/(i.+?s)/;
$1 is "iiiids", but I want to get "ids". How?
Thank you!
Rgds,
Jason

Please do not top-post - makes it harder to follow the conversation.

A minor point - the question mark (?) is not necessary in your regular
expression
$text =~ /(i[^is]*?s)/;
If 's' is not included in the character class, it is needed:
$text =~ /(i[^i]*?s)/;
Ken

Sorry. Won't happen again.

Also, maybe my solution isn't optimal. He said he wanted 'the shortest
length of string that starts with "i" and ends with "s" from string
"iiiidssss" using regular expression'. Well, it works on *that*
string, but not if he wanted the shortest possible such substring from
*any* string. For the string "iiiiiiiiunderstandingssssssssss" my regex
gets "iunders" and not "ings", because "iunders" is the leftmost valid match, so
it wins even though it's longer. I'm not sure how to make it get "ings". The
solution offered by A. Sinan Unur has the same "problem".
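
One way to get "ings" for that string is the approach suggested earlier in the thread: collect every match and keep the shortest. The sketch below assumes the character-class pattern is what is wanted; `shortest_i_to_s` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the shortest substring that starts with 'i',
# ends with 's', and has no 'i' or 's' in between (undef if there is none).
sub shortest_i_to_s {
    my ($text) = @_;
    my $best;
    # In list context, m//g without capture groups returns every match.
    for my $candidate ($text =~ /i[^is]*s/g) {
        $best = $candidate
            if !defined($best) || length($candidate) < length($best);
    }
    return $best;
}

print shortest_i_to_s('iiiiiiiiunderstandingssssssssss'), "\n";   # ings
```

When two candidates have the same length, this keeps the one found first.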
 
Brian McCauley

I want to get the shortest length of string that starts with "i" and
ends with "s" from string "iiiidssss" using regular expression, that
is "ids". Any idea? My result is not non-greedy enough.

$text = "iiiidssss";
$text =~ m/(i.+?s)/;
$1 is "iiiids", but I want to get "ids". How?

Greediness only affects where a match ends. The regex engine still
starts at the leftmost position where a match is possible.

You can put something greedy in front, but this will always find the
last match.

$text =~ m/.*(i.+?s)/;

It still won't find the globally shortest match. In "iiassiibbss" it
will find "ibbs".

To find the globally shortest match you need to find all the matches
and then find the shortest. I could show you how (later) if you want
but I'm just on my way out.
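
A minimal sketch of "find all the matches, then take the shortest" for the dot-based pattern follows. It is not necessarily what the poster had in mind. It assumes that wrapping the pattern in a capturing lookahead is acceptable, and `shortest_match` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: shortest substring that starts with 'i', ends with
# 's', and has at least one character in between (as in i.+?s).
sub shortest_match {
    my ($text) = @_;
    my $best;
    # The lookahead consumes nothing, so /g retries at every position and
    # the capture holds the non-greedy match that starts there.
    while ($text =~ /(?=(i.+?s))/g) {
        my $candidate = $1;
        $best = $candidate
            if !defined($best) || length($candidate) < length($best);
    }
    return $best;
}

print shortest_match('iiassiibbss'), "\n";   # ias
print shortest_match('iiiidssss'), "\n";     # ids
```

For this simple pattern the non-greedy match from a given start is the shortest one from that start, so the minimum over all starting positions is the globally shortest match.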
 
jason.yfho

Thank you very much for all replies.

How about this case? a similar problem, but this time not just to
match one single character as start or end in a string.

$text = '<script language="javascript">functionA( );</script>'
      . '<script language="javascript">functionB( );</script>'
      . '<script language="javascript">functionC( );</script>';

Want to extract the shortest string with '<script' as start and
'</script>' as the end with functionB in-between.

So what I want to get is the shortest match
'<script language="javascript">functionB( );</script>' from the $text.

Code:
$text =~ /(<script.+?functionB.+?<\/script>)/;
But $1 will be the longest match

Thank you!

Rgds,
Jason
 
Brian McCauley

How about this case? a similar problem, but this time not just to
match one single character as start or end in a string.

Yes, I'd guessed your real problem might be of this nature, which is
why I didn't provide a character class based solution.
$text = '<script language="javascript">functionA( );</script>'
      . '<script language="javascript">functionB( );</script>'
      . '<script language="javascript">functionC( );</script>';

Want to extract the shortest string with '<script' as start and
'</script>' as the end with functionB in-between.

Again, to get the globally shortest you need to find all candidates and
select the shortest.

So what I want to get is the shortest match
'<script language="javascript">functionB( );</script>' from the $text.

Code:
$text =~ /(<script.+?functionB.+?<\/script>)/;
But $1 will be the longest match

Not necessarily.

Consider

$text='<script>functionB( );</script><script>longer! functionB( );</script>';

Your regex does _not_ find the _longest_ match. It finds the match
that starts in the leftmost position.
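
A short sketch of the leftmost-match behaviour. The first string is of the shape discussed here, with `( );` after each function name so that the second `.+?` has at least one character to match. The second string is the one from the question. Variable names are illustrative.

```perl
use strict;
use warnings;

my $re = qr/(<script.+?functionB.+?<\/script>)/;

# Two candidates, shorter one first: the leftmost match is also the shortest.
my $text = '<script>functionB( );</script>'
         . '<script>longer! functionB( );</script>';
my ($match) = $text =~ $re;
print "$match\n";      # <script>functionB( );</script>

# The string from the question: the match starts at the first <script and
# runs to the first </script> after functionB, so it spans two elements.
my $op_text = '<script language="javascript">functionA( );</script>'
            . '<script language="javascript">functionB( );</script>'
            . '<script language="javascript">functionC( );</script>';
my ($op_match) = $op_text =~ $re;
print "$op_match\n";
```

In the second case the result is neither the shortest candidate (the functionB element alone) nor the longest possible one (the whole string).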

I suspect you are not thinking hard enough about what you want. By a
literal interpretation of your description of what you want, the following
would be an OK match: '<script></script>functionB<script></script>'.
Somehow I suspect (based on domain knowledge) that you wouldn't want
this to be a match, but unfortunately computers don't have knowledge
and tend to be a bit literal.

For parsing HTML you really should consider using an HTML parser. Any
simple pattern match will fail sooner or later.
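
If a regex is used anyway for a quick one-off, one possible sketch is to forbid a closing tag between the start and end of the match with a negative lookahead. This technique is swapped in here for illustration and does not come from the thread. It is still a simple pattern match, so the caveat above applies: it will break on, for example, a literal `</script>` inside a JavaScript string.

```perl
use strict;
use warnings;

my $text = '<script language="javascript">functionA( );</script>'
         . '<script language="javascript">functionB( );</script>'
         . '<script language="javascript">functionC( );</script>';

# Any run of characters that does not contain a closing </script>, so the
# match cannot spill over into a neighbouring element.
my $inside = qr/(?:(?!<\/script>).)*/s;

my ($match) = $text =~ /(<script\b[^>]*>${inside}functionB${inside}<\/script>)/;
print "$match\n";   # <script language="javascript">functionB( );</script>
```

With this pattern the string '<script></script>functionB<script></script>' no longer matches, because functionB is not inside a single element.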
 
Peter J. Holzer

(e-mail address removed) wrote in @v45g2000cwv.googlegroups.com:


It seems a perfect match for index and rindex:

#!/usr/bin/perl

use strict;
use warnings;

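my $s = 'iiiidssss';

# Alternative input: here rindex finds the trailing 'i' (offset 9) after
# the first 's' (offset 5), so the test below fails and nothing is printed.
# my $s = 'iiiidssssi';

my $start = rindex $s, 'i';   # offset of the last 'i'
my $end = index $s, 's';      # offset of the first 's'

if ( $start > -1 and $start < $end ) {
    print substr( $s, $start, $end - $start + 1), "\n";
}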

__END__

Hint to the OP: If you provide only one example string and no rules for how
it was constructed, we can only provide solutions which work for this
string, but not for similar strings, because we don't know what "similar"
means.

hp
 
