regex (?!..) problem

Wolfgang Rohdewald · Oct 5, 2009

Hi,

I want to match a string only if a word (C1 in this example) appears
at most once in it. This is what I tried:
('C1',)

but this should not have matched. Why is the .*? behaving greedy
if followed by (?!.*C1)? I would have expected that re first
evaluates (.*?C1) before proceeding at all.

I also tried:
C1C2C3C4').groups()
('C1b1b1b1 b3b3b3b3 C1',)

with the same problem.

How could this be done?

n00m · Oct 5, 2009

Why not check it simply by "count()"?

Carl Banks · Oct 5, 2009

Hi,

I want to match a string only if a word (C1 in this example) appears
at most once in it. This is what I tried:

('C1b1b1b1 b3b3b3b3 C1', '')>>> re.match(r'(.*?C1)','C1b1b1b1 b3b3b3b3 C1C2C3').groups()

('C1',)

but this should not have matched. Why is the .*? behaving greedy
if followed by (?!.*C1)?

It's not.

I would have expected that re first
evaluates (.*?C1) before proceeding at all.

It does.

What you're not realizing is that if a regexp search comes to a dead
end, it won't simply return "no match". Instead it'll throw away part
of the match, and backtrack to a previously-matched variable-length
subexpression, such as ".*?", and try again with a different length.

That's what happened above. At first the group "(.*?C1)" non-greedily
matched the substring "C1", but it couldn't find a match under those
circumstances, so it backtracked to the ".*?". and looked a longer
match, which it found.

Here's something to keep in mind: except for a few corner cases,
greedy versus non-greedy will not affect the substring matched, it'll
only affect the groups.

I also tried:

C1C2C3C4').groups()
('C1b1b1b1 b3b3b3b3 C1',)

with the same problem.

How could this be done?

Can't be done with regexps.

How you would do this kind of depends on your overall goals, but your
first look should be toward the string methods. If you share details
with us we can help you choose a better strategy.

Carl Banks

Wolfgang Rohdewald · Oct 5, 2009

What you're not realizing is that if a regexp search comes to a
dead end, it won't simply return "no match". Instead it'll throw
away part of the match, and backtrack to a previously-matched
variable-length subexpression, such as ".*?", and try again with a
different length.

well, that explains it. This is contrary to what the documentation
says, though. Should I fill a bug report?
http://docs.python.org/library/re.html

Now back to my original problem: Would you have any idea how
to solve it?

count() is no solution in my case, I need re.search to either
return None or a match.

Stefan Behnel · Oct 5, 2009

Wolfgang said:
I want to match a string only if a word (C1 in this example) appears
at most once in it.

def match(s):
if s.count("C1") > 1:
return None
return s

If this doesn't fit your requirements, you may want to provide some more
details.

Stefan

Wolfgang Rohdewald · Oct 5, 2009

def match(s):
if s.count("C1") > 1:
return None
return s

If this doesn't fit your requirements, you may want to provide some
more details.

Well - the details are simple and already given: I need re.search
to either return None or a match. But I will try to state it
differently:

I have a string representing the results for a player of a board
game (Mah Jongg - not the solitaire but the real one, played by
4 players), and I have a list of scoring rules. Those rules
can be modified by the user, he can also add new rules. Mah Jongg
is played with very different rulesets worldwide.

The rules are written as regular expressions. Since what they
do varies greatly I do not want do treat some of them in a special
way. That would theoretically be possible but it would really
complificate things.

For each rule I simply need to check whether it applies or not.
I do that by calling re.search(rule, gamestring) and by checking
the result against None.

Here you can look at all rules I currently have.
http://websvn.kde.org/trunk/playground/games/kmj/src/predefined.py?view=markup
The rule I want to rewrite is called "Robbing the Kong". Of
course it is more complicated than my example with C1.

Here you can find the documentation for the gamestring:
http://websvn.kde.org/trunk/playground/games/doc/kmj/index.docbook?revision=1030476&view=markup
(get HTML files with "meinproc index.docbook")

Carl Banks · Oct 5, 2009

well, that explains it. This is contrary to what the documentation
says, though. Should I fill a bug report?http://docs.python.org/library/re.html

If you're referring to the section where it explains greedy
qualifiers, it is not wrong per se. re.match does exactly what the
documentation says: it matches as few characters as possible to the
non-greedy pattern.

However, since it's easy to misconstrue that if you don't know about
regexp backtracking, perhaps a little mention of backtracking is is
warranted. IMO it's not a documentation bug, so if you want to file a
bug report I'd recommend filing as a wishlist item.

I will mention that my followup contained an error (which you didn't
quote). I said greedy versus non-greedy doesn't affect the substring
matched. That was wrong, it does affect the substring matched; what
it doesn't affect is whether there is a match found.

Now back to my original problem: Would you have any idea how
to solve it?

count() is no solution in my case, I need re.search to either
return None or a match.

Why do you have to use a regexp at all?

In Python we recommend using string operations and methods whenever
reasonable, and avoiding regexps unless you specifically need their
extra power. String operations can easily do the examples you posted,
so I see no reason to use regexps.

Depending on what you want to do with the result, one of the following
functions should be close to what you need. (I am using "word" to
refer to the string being matched against, "token" to be the thing you
don't want to appear more than once.)

def token_appears_once(word,token):
return word.count(token) == 1

def parts(word,token):
head,sep,tail = word.partition("C1")
if sep == "" or "C1" in tail:
return None
return (head,sep,tail)

If you really need a match object, you should do a search, and then
call the .count method on the matched substring to see if there is
more than one occurrence, like this:

def match_only_if_token_appears_once(pattern,wotd,token):
m = re.search(pattern,word)
if m.group(0).count("C1") != 1:
m = None
return m

Carl Banks

Wolfgang Rohdewald · Oct 5, 2009

Why do you have to use a regexp at all?

not one but many with arbitrary content.

please read my answer to Stefan. Take a look
at the regexes I am using:
http://websvn.kde.org/trunk/playground/games/kmj/src/predefined.py?view=markup

moreover they are saved in a data base. I would
not want to do that with python code.

Actually I already did that - I defined Python classes
which made it possible to write a rule as (example)

self.mjRules.append(Rule('Fourfold Plenty', 'PKong()*4 + Pair()',
limits=1))

see
http://websvn.kde.org/trunk/playground/games/kmj/src/rulesets.py?view=markup&pathrev=998607

this was then executed with eval. But eval seems unsafe
and some rules were simply not writable that way without
specialized hard wired python code. That is not what I
envision. Also regex turned out to have about the same
speed IIRC

Hans Mulder · Oct 6, 2009

Stefan said:
def match(s):
if s.count("C1") > 1:
return None
return s

If this doesn't fit your requirements, you may want to provide some more
details.

That will return a false value if s is the empty string.

How about:

def match(s):
return s.count("C1") <= 1

-- HansM

Idk need help in editing this source code	0	Nov 5, 2022
Problem with static data members in templated class	11	Oct 4, 2010
Why is regex so slow?	21	Jun 18, 2013
How can I remove the specific error message after regex validation is done?	1	Jan 25, 2023
My regex kung-fu is not strong =(	0	Apr 4, 2020
regex walktrough	4	Dec 8, 2012
Question about Tashaphyne package in python	1	Mar 3, 2013
parsing nested unbounded XML fields with ElementTree	6	Nov 25, 2013

regex (?!..) problem

Wolfgang Rohdewald

n00m

Carl Banks

Wolfgang Rohdewald

Stefan Behnel

Wolfgang Rohdewald

Carl Banks

Wolfgang Rohdewald

Hans Mulder

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads