Regexp greediness.

adamomitcheney

Hi there Perl gurus,

I'm using (trying to use) a regexp to extract a path and a comment from
the output of a 'describe' command in clearcase. I suspect I'm being
daft, so please go easy on me... I have read what I think is the
appropriate perldoc (perldoc -q greedy - "What does it mean that
regexes are greedy? How can I get around it? greedy greediness"), but
I'm already doing what it suggests - that is, reducing the greediness
of the '.+' expression with a '?'. I guess I must have missed
something..

The input ($hl) should look something like this:
->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text"

I'm trying to get at the path and the comment:

if ($hl =~ m%^[->|<-]%)
{
$hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;
$comment = $2;
}
else
{
$hl = 0;
}
print "Target is $hl\n";
print "Comment is \"$comment\"\n";

Produces the following output:
Target is ->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text"
Comment is ""

I've also tried escaping the '@@' thus '\@\@' but that hasn't made any
difference.

I realise I could probably use split to do this and then substitute out
the -> or <-, but I'm quite keen to understand what I'm doing wrong.

Cheers - Adam...
 
Paul Lalli

I'm using (trying to use) a regexp to extract a path and a comment from
the output of a 'describe' command in clearcase. I suspect I'm being
daft, so please go easy on me... I have read what I think is the
appropriate perldoc (perldoc -q greedy - "What does it mean that
regexes are greedy? How can I get around it? greedy greediness"), but
I'm already doing what it suggests - that is, reducing the greediness
of the '.+' expression with a '?'. I guess I must have missed
something..

Yes... the documentation for what the special characters do in a
regexp... ;-)
The input ($hl) should look something like this:

In general "something like this" is not good enough. When composing
your post, you should endeavor to give *actual* sample input, output,
and desired output.
->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text"

I'm trying to get at the path and the comment:

if ($hl =~ m%^[->|<-]%)

Are you under the impression that this is looking for either a '->' or
a '<-'?
{
$hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;

Is there a space after ] and before (? I'd guess that's (at least part
of) your problem. You're looking for the beginning of the string,
exactly one of -, >, |, or <, followed by a space. No such sequence
exists.
$comment = $2;

Never use the $1, $2, $3 etc. variables without verifying that the
match succeeded:

if ($hl =~ s/<your pattern here>/<your replacement here>/){
print "s/// succeeded\n";
$comment = $2;
} else {
warn "s/// failed\n";
}
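A minimal sketch of why that check matters, assuming a made-up earlier match ("colour=blue") and a hypothetical helper named extract_comment, neither of which is from the original post. A failed match leaves $1 untouched, so it can still hold a value from some unrelated earlier success:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the quoted comment, or undef when the
# line does not have the expected shape.
sub extract_comment {
    my ($line) = @_;
    if ($line =~ /^(?:->|<-)\s+.+\@\@\s+"(.+)"/) {
        return $1;    # safe: the match just succeeded
    }
    return undef;     # no match, so $1 must not be trusted
}

# An earlier, unrelated match that succeeds and sets $1 to 'blue'.
my $earlier = "colour=blue" =~ /=(\w+)/;

# This match fails, and a failed match does not reset $1.
my $bad_line = 'no arrow here';
my $matched  = $bad_line =~ /^(?:->|<-)\s+.+\@\@\s+"(.+)"/;
print "matched? ", ($matched ? "yes" : "no"), "; \$1 is still '$1'\n";

# Going through the checked helper avoids picking up the stale value.
my $comment = extract_comment($bad_line);
print defined $comment ? "comment: $comment\n" : "no comment found\n";
```

Testing the return value of m// or s/// before touching $1 or $2 is what keeps the stale 'blue' out of the result.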
}
else
{
$hl = 0;
}
print "Target is $hl\n";
print "Comment is \"$comment\"\n";

Produces the following output:
Target is ->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text"
Comment is ""

I've also tried escaping the '@@' thus '\@\@' but that hasn't made any
difference.

The "throw it at the wall and see what sticks" method is rarely a good
way of programming.
I realise I could probably use split to do this

I doubt it. split() uses regexps too, so you'd probably copy over your
error.
and then substitute out
the -> or <-, but I'm quite keen to understand what I'm doing wrong.

[ ] make up a character class. They look for any ONE character within
their contained class. You were trying to do an alternation and
cluster the two alternates together. That is accomplished with
parentheses and the |, like so:

(?:->|<-)

The ?: prevents these parentheses from being treated as a capturing
group, so they do not set one of the $1, $2, etc. variables.

perldoc perlretut
perldoc perlre
perldoc perlreref
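
To make the difference concrete, here is a small sketch. The helper names and the test strings are invented for illustration; only the two patterns come from the thread:

```perl
use strict;
use warnings;

# Character class: matches exactly ONE character, any of - > | <
sub starts_with_class {
    my ($s) = @_;
    return $s =~ /^[->|<-]/ ? 1 : 0;
}

# Non-capturing group with alternation: matches the two-character
# sequence '->' or the two-character sequence '<-'
sub starts_with_arrow {
    my ($s) = @_;
    return $s =~ /^(?:->|<-)/ ? 1 : 0;
}

for my $s ('-> path', '<- path', '|oops', '>x', 'plain') {
    printf "%-9s class=%d arrow=%d\n",
        $s, starts_with_class($s), starts_with_arrow($s);
}
```

The class accepts '|oops' and '>x' because each starts with one of its four characters, while the group only accepts strings that really begin with an arrow.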

Paul Lalli
 
adamomitcheney

I'd just like to start with a D'Oh!
Yes... the documentation for what the special characters do in a
regexp... ;-)

No, I was just being daft - I wasn't even looking there (initially
using '[' when I should have been grouping with '()'), and I should have
known better.
[ ] make up a character class. They look for any ONE character within
their contained class. You were trying to do an alternation and
cluster the two alternates together. That is accomplished with
parentheses and the |, like so:

(?:->|<-)

The ?: prevents these parentheses from being treated as a capturing
group, so they do not set one of the $1, $2, etc. variables.

Aye, that was what I was missing.

Thanks for taking the time to point it out.

Adam...
 
Gunnar Hjalmarsson

I'm using (trying to use) a regexp to extract a path and a comment from
the output of a 'describe' command in clearcase. I suspect I'm being
daft, so please go easy on me... I have read what I think is the
appropriate perldoc (perldoc -q greedy - "What does it mean that
regexes are greedy? How can I get around it? greedy greediness"), but
I'm already doing what it suggests - that is, reducing the greediness
of the '.+' expression with a '?'. I guess I must have missed
something..

Well, your problem has nothing to do with greediness.
The input ($hl) should look something like this:
->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text"

I'm trying to get at the path and the comment:

if ($hl =~ m%^[->|<-]%)

You use the notation for a character class, but you probably just want
to match either of the two arrows:

if ($hl =~ m%^(?:->|<-)%)
{
$hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;
------------------^-----^^-------^

1. See the above comment
2. A blank doesn't match a newline
3. No need to make them non-greedy (even if that doesn't hurt...)

In other words, this line should do it:

$hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

If you haven't already, please also study "perldoc perlre".
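
As a rough, runnable sketch of the corrected pattern: the helper name parse_hyperlink is made up here, and the match is done with m// in list context rather than s///, which is a substitution of technique rather than what was posted. The sample data is the line from the original question, tried both on one line and split over three:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns (path, comment),
# or an empty list when the line does not have the expected shape.
sub parse_hyperlink {
    my ($line) = @_;
    if ($line =~ /^(?:->|<-)\s+(.+)\@\@\s+"(.+)"/) {
        return ($1, $2);
    }
    return;
}

my $path_part = 'M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@';
my $text_part = '"This is the to-text"';

# The same data laid out on one line and on three lines; \s+ covers both.
for my $hl ("-> $path_part $text_part", "->\n$path_part\n$text_part") {
    if (my ($path, $comment) = parse_hyperlink($hl)) {
        print "Target is $path\n";
        print "Comment is \"$comment\"\n";
    }
    else {
        warn "no match\n";
    }
}
```

Because \s+ matches a newline as well as a blank, the pattern does not care which of the two layouts the real 'describe' output uses.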
 
adamomitcheney

1. See the above comment
2. A blank doesn't match a newline

No, but wasn't intended to - I should have specified that the input I
posted was all one line, but posting it on groups.google munged it a
bit.
3. No need to make them non-greedy (even if that doesn't hurt...)

No, quite. I understand that now.
In other words, this line should do it:

$hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

If you haven't already, please also study "perldoc perlre".

I have done, but in this case it was a near-terminal case of stupidity
brought on, I think, by tiredness.

Thanks Gunnar.

Adam...
 
robic0

I didn't read the whole thread yet, but just a note on this line..

$hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

This non-capture grouping '(?:->|<-)' will only match ->- or -<-
If that's what's needed then the grouping is not really necessary,
->|<- does the same thing.

It might be possible (?:->)|(?:<-) was intended, in this case it
will match -> or <-
 
robic0

Gunnar Hjalmarsson said:
In other words, this line should do it:

$hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

This '(.+)' or '(.*)' should never be used without '?' unless you intend to capture
all until the end of the line (or restriction), otherwise (.+?) must be used mid-string
when real content follows in the match requirement.

In this case the '"' would be captured and result in a failed match. Be especially
careful when the intention of use is mid-string.

Why would you allow any character but a newline here: "(.+)"%$1% ?
Use the 's' modifier here.. $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%s;

Possible new regex:

$hl =~ s%^(?:->)|(?:<-)\s+(.+)@@\s+"(.+?)"%$1%s;

-good luck-
 
robic0

Gunnar Hjalmarsson said:
In other words, this line should do it:

$hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

If you haven't already, please also study "perldoc perlre".

Haven't studied it, but I'm not seeing the need for the special '%' delimiters; maybe you could
instruct me.

Final note, this needs to be done globally. Just as an exercise, to do this en masse,
assuming from a file...

$hl = join ('', <DATA>);
$hl =~ s/^(?:->)|(?:<-)\s+(.+)@@\s+"(.+?)"/$1/sg;

Should you be doing this perpetually...

$RxHl = qr/^(?:->)|(?:<-)\s+(.+)@@\s+"(.+?)"/;
while ($hl = <DATA>) {
$hl =~ s/$RxHl/$1/g;
}
 
Gunnar Hjalmarsson

robic0 said:
I didn't read the whole thread yet,

That's a breach of netiquette. OTOH, in your case it probably
wouldn't have made a difference.
but just a note on this line..

$hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

This non-capture grouping '(?:->|<-)' will only match ->- or -<-

Wrong, but what else could we expect from that robic0 character?
If thats whats needed then the grouping is not really necessary,
->|<- does the same thing.

There is more in the regex than those arrows, so your discussion is out
of context and thus irrelevant.
It might be possible (?:->)|(?:<-) was intended, in this case it
will match -> or <-
Sigh.

This '(.+)' or '(.*)' should never be used without '?' unless you intend to capture
all until the end of the line (or restriction), otherwise (.+?) must be used mid-string
when real content follows in the match requirement.

In this case the '"' would be captured and result in a failed match. Be especially
careful when the intention of use is mid-string.

More BS statements. Fact is that greediness _never_ affects whether a
regex matches or not.
Why would you allow any character but a newline here: "(.+)"%$1% ?
Use the 's' modifier here.. $hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%s;

LOL, robic0 commenting on the /s modifier again. Maybe you could explain
how it would make a difference in this case? (Second thought: Please
don't!!)
 
robic0

Gunnar Hjalmarsson said:
In other words, this line should do it:

$hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

If you haven't already, please also study "perldoc perlre".

Didn't see the other '(.+)', yes that needs a '?' as well...
$hl =~ s/^(?:->)|(?:<-)\s+(.+?)@@\s+"(.+?)"/$1/sg; # should be done globally unless you know otherwise

You should avoid repetitive substitution if processing large data strings.
In this case the qr// will do no good since it will
not pre-compile the regexp with substitution unknowns.

The faster alternative when processing large strings is to capture and continue...
(this is just my opinion)

while ($hl =~ /(?:(?:->)|(?:<-)\s+(.+?)@@\s+"(.+?)")|(.*?)/sg) {
# capture groups: $1 = path, $2 = comment, $3 = any other text
if (defined ($1)) {
$hl_new .= $2;
} else {$hl_new .= $3;}
}

I'm sure I've made mistakes in the other posts (actually 1)
-good luck-
 
robic0

Here's just something to bust Gunnar's balls, its the
anti-greedy formula, if you can understand it...

$_ =
qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ELEMENT(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;

... Gunnar, float some iceburgs ...
 
Lukas Mai

robic0 said:
$hl = join ('', <DATA>);

$hl =~ s/^(?:->)|(?:<-)\s+(.+)@@\s+"(.+?)"/$1/sg;

This regex doesn't make sense. It's parsed as:

( ^-> ) | ( <-\s+(.+)@@\s+"(.+?)" )

because | has very low precedence. (?:->) by itself is always the same
as -> alone. This also means $1 is undef if the first part succeeds.

Please read perldoc perlretut and perldoc perlre.
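
A short sketch of that precedence point. The sub names and the sample line are hypothetical; the first pattern is the one quoted above, the second groups the arrows the way the earlier replies did:

```perl
use strict;
use warnings;

# Pattern as quoted: the top-level | splits the whole regex into
# "^->" OR "<- followed by everything else".
sub captures_ungrouped {
    my ($s) = @_;
    return unless $s =~ /^(?:->)|(?:<-)\s+(.+)\@\@\s+"(.+?)"/s;
    return [ $1, $2 ];
}

# Alternation confined to the arrows by a single non-capturing group.
sub captures_grouped {
    my ($s) = @_;
    return unless $s =~ /^(?:->|<-)\s+(.+)\@\@\s+"(.+?)"/s;
    return [ $1, $2 ];
}

# Hypothetical sample line, not taken from the thread.
my $line = '-> M:\some\file.txt@@ "a comment"';

my $u = captures_ungrouped($line);
my $g = captures_grouped($line);

# The ungrouped pattern succeeds via its left branch, so $1 is undef.
printf "ungrouped: \$1 is %s\n", defined $u->[0] ? $u->[0] : 'undef';
printf "grouped:   \$1 is %s\n", defined $g->[0] ? $g->[0] : 'undef';
```

With the ungrouped pattern the match stops after the leading '->', which is why nothing after the arrow is ever captured for such lines.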

HTH, Lukas
 
robic0

On the greedy issue, given

"this string \"some other\"" =~ m/"(.+)"/;

would match and $1 would equal <this string "some other">

it matches the very first double quote and the very last double quote.

To match 'this string ' use m/"(.+?)"/
This is preferred since greedy is rarely the intention and it prevents
over-match on imperfect (unknown) text sample data.

As a general rule, always tack on a '?' (anti-greedy) when using wildcard-like constructs that
could match multiple characters, or a large set of characters, that would blur or distort a
specifically intended match construct.

Some examples:
..+?
..*?
[^']*?

etc...
 
Paul Lalli

robic0 said:
On the greedy issue, given

"this string \"some other\"" =~ m/"(.+)"/;

would match and $1 would equal <this string "some other">

No it wouldn't.
it matches the very first double quote and the very last double quote.

Yes. There are only two double-quote characters in that string. One
before 'some' and one after 'other'. The other " characters that you
typed delimit the string, and are not a part of it.
To match 'this string ' use m/"(.+?)"/

Nope. That would match the exact same thing the greedy version
matched.
This is preferred since greedy is rarely the intention

It's the intention when it is the intention. There is no general rule.
Regexps do what they're needed to do when they're needed to do it.
As a general rule, always tack on a '?' (anti-greedy) when using wildcard like
constructs that could match multiple characters, or a large set of characters,
that would blur or distort

No, as a general rule, write the right regexp for the given situation.
If you need greediness, use greediness. If you don't, don't.
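
For anyone who wants to try it, a small sketch. The sub names and the second test string are made up; the first string and both patterns are the ones discussed above:

```perl
use strict;
use warnings;

# Each helper returns the first capture, or undef when nothing matches.
sub quoted_greedy { my ($s) = @_; my ($m) = $s =~ /"(.+)"/;  return $m }
sub quoted_lazy   { my ($s) = @_; my ($m) = $s =~ /"(.+?)"/; return $m }

# The string from the post: it contains only two double-quote characters.
my $two_quotes = "this string \"some other\"";

# A made-up string with four double-quote characters.
my $four_quotes = 'say "one" and "two"';

for my $s ($two_quotes, $four_quotes, 'no quotes at all') {
    my ($g, $l) = (quoted_greedy($s), quoted_lazy($s));
    printf "%-28s greedy=%-16s lazy=%s\n", $s,
        defined $g ? "<$g>" : 'no match',
        defined $l ? "<$l>" : 'no match';
}
```

With two quote characters both versions capture the same text. With four they capture different text, but either both match or neither does, so the choice is about what you want captured, not about whether the regex succeeds.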

Paul Lalli
 
Lukas Mai

robic0 said:
Here's just something to bust Gunnar's balls, its the
                                              ^ it's
anti-greedy formula, if you can understand it...
$_ =
qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ELEMENT(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;

OK, let's see:
The last (.+?) doesn't make sense because it's not followed by any
pattern, which means +? will never backtrack to consume more. It should
be equivalent to (.).
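
A quick check of that claim, as a sketch with invented sub names and test strings: a lazy quantifier with nothing after it stops at its minimum, and only grows when something later in the pattern forces it to.

```perl
use strict;
use warnings;

# Each helper returns the first capture of its pattern.
sub lazy_tail     { my ($s) = @_; my ($m) = $s =~ /(.+?)/s;   return $m }
sub single_dot    { my ($s) = @_; my ($m) = $s =~ /(.)/s;     return $m }
sub lazy_anchored { my ($s) = @_; my ($m) = $s =~ /(.+?)\z/s; return $m }

for my $s ('abc', 'xyz') {
    printf "input=%s lazy_tail=%s single_dot=%s\n",
        $s, lazy_tail($s), single_dot($s);
}

# With \z after it, the lazy group has to keep growing to reach the end.
print "lazy_anchored('abc') = ", lazy_anchored('abc'), "\n";
```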

The whole thing looks like a horribly broken regex for HTML parsing. It
produces weird results for input like '<META content=">foo">' or '<img
alt="foo"> this is not part of "foo">'. The last one is due to
inappropriate greediness.
.. Gunnar, float some iceburgs ...
I don't understand that but it's "icebergs".

HTH, Lukas
 
Tad McClellan

I'm using (trying to use) a regexp to extract a path and a comment from
the output of a 'describe' command in clearcase.
I have read what I think is the
appropriate perldoc (perldoc -q greedy


Greediness has no application to the problem you specify.

I guess I must have missed
something..


The "Using character classes" section in:

perldoc perlretut

The input ($hl) should look something like this:


Developing a regular expression requires an *exact* understanding
of the format of the string to be matched against.

In what ways can your data be different from what we have
been shown?

->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text"


You should speak Perl whenever possible.

Have you seen the Posting Guidelines that are posted here frequently?

if ($hl =~ m%^[->|<-]%)


That is exactly equivalent to:

if ($hl =~ m%^[<>|-]%)

Your string does start with a hyphen, so that part should be matching OK.

{
$hl =~ s%^[->|<-] (.+?)@@ "(.+?)"%$1%;


Your pattern matches when the string starts with one of the four
characters in the char class, followed by a space.

Your string does not start with one of those characters followed
by a space, so the pattern fails to match.

You probably wanted grouping rather than a character class:

$hl =~ s%^(->|<-) (.+?)@@ "(.+?)"%$2%;

or, if you don't want to mess up the numbering of the captures:

$hl =~ s%^(?:->|<-) (.+?)@@ "(.+?)"%$1%;

But your string will _still_ not match because the pattern requires
a space following the arrow, but your data above has a newline
following the arrow.

but I'm quite keen to understand what I'm doing wrong.


Using [brackets] instead of (parentheses).



This short and complete program that you can run may help:

------------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my $hl = '-> M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen'
. '\testforccase\TestTwo.PRM@@ "This is the to-text"';

my($path, $comment) = $hl =~ m/^-[><] (.+)@@ "(.+)"/; # m// in list context

print qq(path="$path"\n);
print qq(comment="$comment"\n);
 
Xicheng

My suggestion for you is to use something more specific, ([^@]*) and ([^"]*),
instead of always swaying between greedy (.*) and non-greedy (.*?) things.
For your case, if you can make sure that there is not any '@' in
your pathname, you can do it this way:
=============================
$hl =q(->
M:\adam_admin\eq-gla_playpen\eq-cc-gla_playpen\testforccase\TestTwo.PRM@@
"This is the to-text");

if ($hl =~ m%^(?:->|<-)%)
{
$hl =~ s%^(?:->|<-)\s*([^@]*)@@\s*"([^"]*)"%$1%x;
$comment = $2;
}

else
{
$hl = 0;
}

print "Target is $hl\n";
print "Comment is \"$comment\"\n";
=============================
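A short sketch of the trade-off behind that "if": the negated class is strict about '@', so a path containing a single '@' makes the whole match fail, where a lazy (.+?) would still find the '@@' terminator. The sub names and both sample lines are hypothetical:

```perl
use strict;
use warnings;

# Negated character classes, as suggested above.
sub parse_negated {
    my ($s) = @_;
    return $s =~ /^(?:->|<-)\s*([^\@]*)\@\@\s*"([^"]*)"/ ? ($1, $2) : ();
}

# Lazy dot, for comparison.
sub parse_lazy {
    my ($s) = @_;
    return $s =~ /^(?:->|<-)\s*(.+?)\@\@\s*"([^"]*)"/ ? ($1, $2) : ();
}

# Hypothetical sample lines; the second path contains a lone '@'.
my @lines = (
    '-> M:\dir\file.txt@@ "plain path"',
    '-> M:\dir\a@b.txt@@ "path with an at-sign"',
);

for my $line (@lines) {
    my @n = parse_negated($line);
    my @l = parse_lazy($line);
    printf "negated: %-18s lazy: %s\n",
        @n ? $n[0] : '(no match)',
        @l ? $l[0] : '(no match)';
}
```

Which behaviour is right depends on the data: if a lone '@' can never legitimately appear in a path, failing on it may be exactly what is wanted.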
Best,
Xicheng
 
Tad McClellan

robic0 said:
$hl =~ s%^(?:->|<-)\s+(.+)@@\s+"(.+)"%$1%;

This non-capture grouping '(?:->|<-)' will only match ->- or -<-


No it won't.

perl -le 'print "matched" if "->no hyphen" =~ /(?:->|<-)/'

(prints "matched")

If thats whats needed then the grouping is not really necessary,
->|<- does the same thing.


No it doesn't.

perl -le 'print "matched" if "->" =~ /(?:->|<-)\s+/'

(makes no output)

perl -le 'print "matched" if "->" =~ /->|<-\s+/'

(prints "matched")

It might be possible (?:->)|(?:<-) was intended,


It was not possible that that was intended, as there is more
stuff to match after the right side of the alternation that
needn't be there for the left side to match.



If you haven't already, please also study "perldoc perlre"
 