order of evaluation

F

Flo

Hello

Consider these two regular expressions
1) .*.*
2) .*?.*?

which are equivalent to
1) (.*)(.*)
2) (.*?)(.*?)

Where the * is the greedy 0-or-more quantifier and *? the lazy 0-or-
more quantifier. In the following \1 is a backreference to the match
inside the first parenthesis, \2 likewise for the 2nd parenthesis.

As I understand it, most flavours of regular expressions have the
following behaviour for the two regexes above
1) \1 returns the whole target string, \2 returns the empty string
2) \1 returns the empty string, \1 returns the whole target string

Is that statement correct at all, i.e. do indeed most flavours have
that behaviour?

Which rules dictates that behaviour? I would say
"for regexes, order of evaluation is left to right"
But I am not sure since I never saw such a statement.

Remember that "order of evaluation" and "precedence" are *not* the
same thing, at least not in general. See also
http://groups.google.com/group/comp.lang.c/browse_thread/thread/5bc23...

Flo
 
P

Paul Lalli

Consider these two regular expressions
1) .*.*
2) .*?.*?

which are equivalent to
1) (.*)(.*)
2) (.*?)(.*?)

Where the * is the greedy 0-or-more quantifier and *? the lazy 0-or-
more quantifier. In the following \1 is a backreference to the match
inside the first parenthesis, \2 likewise for the 2nd parenthesis.

As I understand it, most flavours of regular expressions have the
following behaviour for the two regexes above
1) \1 returns the whole target string, \2 returns the empty string
Yes.

2) \1 returns the empty string, \2 returns the whole target string

No. They both return the empty string. They're both non-greedy.
There is nothing forcing the second one to return anything more than 0
characters.

You can check this for yourself:
$ perl -le'"Foo Bar" =~ /(.*)(.*)/; print qq{"$1"-"$2"}'
"Foo Bar"-""
$ perl -le'"Foo Bar" =~ /(.*?)(.*?)/; print qq{"$1"-"$2"}'
""-""

Now, if you had anchored the patterns to the start/end of the string,
then you would be correct:
$ perl -le'"Foo Bar" =~ /^(.*)(.*)$/; print qq{"$1"-"$2"}'
"Foo Bar"-""
$ perl -le'"Foo Bar" =~ /^(.*?)(.*?)$/; print qq{"$1"-"$2"}'
""-"Foo Bar"
Is that statement correct at all, i.e. do indeed most flavours have
that behaviour?

I have no idea what you mean by "flavors". I'm talking about Perl
regular expressions. If you're asking about regular expressions in
some other language, you'd have to ask a group devoted to that
language.
Which rules dictates that behaviour? I would say
"for regexes, order of evaluation is left to right"
But I am not sure since I never saw such a statement.

perldoc perlre
Alternatives are tried from left to right, so the first
alternative found for which the entire expression matches,
is the one that is chosen.

Paul Lalli
 
F

Flo

Can you also refer me to a rule within the documentation of Perl why
in the first example, where you answered with yes, why it is like
that. I'd like to know the name of the rule which states why the first
and not the second star * greedely matches all. As I said, the fact is
not determined by precedence. Precedence defines other things.
 
X

xhoster

Flo said:
Can you also refer me to a rule within the documentation of Perl why
in the first example, where you answered with yes, why it is like
that.

"Combining pieces together" in perldoc perlre.

Xho
 
B

Brian McCauley

Can you also refer me to a rule within the documentation of Perl why
in the first example, where you answered with yes, why it is like
that. I'd like to know the name of the rule which states why the first
and not the second star * greedely matches all.

I'm guessing that this is a reply to Paul's post.

You are not quoting enough context for this to make sense. You stated
_yourself_ that the behaviour would be explained by a left-to-right
rule but said you couldn't find such a statement in perlre.

Paul found the phrase "left to right" in perlre.

What are you still having trouble with?
 
F

Flo

Paul found the phrase "left to right" in perlre.
What are you still having trouble with?

Because that phrase is about alternation, i.e. the alternation
operator |. It is thus not applicable to my problem.

Flo
 
P

Paul Lalli

Because that phrase is about alternation, i.e. the alternation
operator |. It is thus not applicable to my problem.

You're right. My mistake. I quoted the wrong passage. Instead, how
about this one, from `perldoc perlretut`?
o Principle 0: Taken as a whole, any regexp will be
matched at the earliest possible position in the string.

o Principle 1: In an alternation "a|b|c...", the leftmost
alternative that allows a match for the whole regexp
will be the one used.

o Principle 2: The maximal matching quantifiers "?", "*",
"+" and "{n,m}" will in general match as much of the
string as possible while still allowing the whole regexp
to match.

o Principle 3: If there are two or more elements in a
regexp, the leftmost greedy quantifier, if any, will
match as much of the string as possible while still
allowing the whole regexp to match. The next leftmost
greedy quantifier, if any, will try to match as much of
the string remaining available to it as possible, while
still allowing the whole regexp to match. And so on,
until all the regexp elements are satisfied.


Pay close attentions to Principles 0 and 3.

Paul Lalli
 
J

John W. Krahn

X

Xiong Changnian

Flo said:
I'd like to know the name of the rule which states why the first
and not the second star * greedely matches all.

Um, I'm not sure it's directly stated. Others have posted sources that
the regexp is evaluated left-to-right. By the time the second * comes
up, there isn't anything "remaining" of the target.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top