Match First Sequence in Regular Expression?


Roger L. Cauvin

Peter Hansen said:
Roger said:
Roger L. Cauvin wrote:
"xyz123aaabbab" accept
"xyz123aabbaaab" reject
"xayz123aaabab" accept
"xaaayz123abab" reject
"xaaayz123aaabab" accept


This passes your tests. I haven't closely followed the thread for other
requirements:

pattern = ".*?(?<![a+b])aaab" #look for aaab not preceded by any a+b

Very interesting. I think you may have solved the problem. The key
seems to be the "not preceded by" part. I'm unfamiliar with some of the
notation. Can you explain what "[a+b]" and the "(?<!" do?
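
As a hedged aside on the notation asked about here: in Python's re module, "(?<!...)" is a negative lookbehind, and square brackets form a character class, so "[a+b]" matches a single character that is 'a', '+', or 'b' rather than "one or more a's followed by b". A minimal sketch follows; the strings "x+aaab" and "xbaaab" are made up for illustration and do not come from the thread.

```python
import re

# Inside square brackets, '+' is a literal character, not a quantifier.
char_class = re.compile(r"[a+b]")
print(bool(char_class.fullmatch("+")))   # the class matches a literal '+'
print(bool(char_class.fullmatch("ab")))  # a class matches exactly one character

# (?<!X) is a negative lookbehind: the position must NOT be preceded by X.
# Here X is one character from the class, i.e. 'a', '+', or 'b'.
pattern = re.compile(r".*?(?<![a+b])aaab")

samples = ["xyz123aaabbab", "xyz123aaaababaaabab", "x+aaab", "xbaaab"]
results = {s: pattern.match(s) is not None for s in samples}
print(results)
```

Note that under this reading "xbaaab" is rejected because the 'aaab' is preceded by a 'b', even though its first run of 'a' directly before a 'b' has length 3, so the lookbehind may be stricter than the stated requirement.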

I think you might need to add a test case involving a pattern of aaaab
prior to another aaab. From what I gather (not reading too closely), you
would want this to be rejected. Is that true?

xyz123aaaababaaabab

Adding that test would be a good idea. You're right; I would want that
string to be rejected, since in that string the first sequence of 'a'
directly preceding a 'b' is of length 4 instead of 3.
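
One way to pin the requirement down is a small reference check that is independent of any candidate pattern. This is only a sketch of the rule as stated above (the first run of 'a' directly preceding a 'b' must have length 3); the names accepts and cases are hypothetical and not from the thread.

```python
import re

def accepts(s):
    """Return True if the first run of 'a' directly followed by 'b' has length 3."""
    # The leftmost match of (a+)b always starts at the beginning of a maximal
    # run of 'a', so group(1) is the whole run.
    m = re.search(r"(a+)b", s)
    return m is not None and len(m.group(1)) == 3

# Test strings from the thread, with the expected verdicts given there.
cases = {
    "xyz123aaabbab": True,
    "xyz123aabbaaab": False,
    "xayz123aaabab": True,
    "xaaayz123abab": False,
    "xaaayz123aaabab": True,
    "xyz123aaaababaaabab": False,  # first run before a 'b' has length 4
}

for s, expected in cases.items():
    print(s, accepts(s), expected)
```

A candidate regular expression can then be compared against accepts on any number of extra strings.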

Thanks for the solution!

--
Roger L. Cauvin
(e-mail address removed) (omit the "nospam_" part)
Cauvin, Inc.
Product Management / Market Research
http://www.cauvin-inc.com
 

Alex Martelli

Christoph Conrad said:
Hello Alex,
r = re.compile("[^a]*a{3}b+(a+b*)*")
matches = [s for s in listOfStringsToTest if r.match(s)]
Unfortunately, the OP's spec is even more complex than this, if we are
to take to the letter what you just quoted; e.g. aazaaab SHOULD match,

Then it's again "a{3}b", isn't it?

Except that this one would also match aazaaaaab, which it shouldn't.
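
A quick sketch of that objection, assuming the pattern is used unanchored with re.search (the variable name loose is made up here):

```python
import re

# A bare a{3}b only requires that 'aaab' occurs somewhere in the string.
loose = re.compile(r"a{3}b")

good = loose.search("aazaaab") is not None    # run of 3 before 'b': should match
bad = loose.search("aazaaaaab") is not None   # run of 5 before 'b': should not match
print(good, bad)
```

Both searches succeed, because the last three 'a' characters of the longer run plus the 'b' still form 'aaab'.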


Alex
 

Scott David Daniels

How about:
pattern = re.compile('^([^a]|(a+[^ab]))*aaab')

Which basically says, "precede with arbitrarily many non-'a' characters,
or runs of 'a' ending in a character that is neither 'a' nor 'b', then
require three 'a's followed by a 'b'."

cases = ["xyz123aaabbab", "xayz123aaabab", "xaaayz123aaabab",
"xyz123aaaababaaabab", "xyz123aabbaaab", "xaaayz123abab"]
[re.search(pattern, case) is not None for case in cases]
[True, True, True, False, False, False]
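
As a further check, the same pattern can be run on the strings raised elsewhere in the thread. This is a sketch only; extra_cases is a hypothetical name, and "xbaaab" is an added example that does not appear in the thread.

```python
import re

pattern = re.compile(r'^([^a]|(a+[^ab]))*aaab')

# 'aazaaab' should be accepted and 'aazaaaaab' rejected, per the thread.
# 'xbaaab' is an extra string with a 'b' right before the run of three 'a's.
extra_cases = ["aazaaab", "aazaaaaab", "xbaaab"]
extra_results = [pattern.search(case) is not None for case in extra_cases]
print(extra_results)
```

The run "aa" followed by 'z' is consumed by the a+[^ab] alternative, which is what lets "aazaaab" through while the run of five 'a's in "aazaaaaab" still fails.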

--Scott David Daniels
(e-mail address removed)
 

Armin Steinhoff

Alex said:
Hello Roger,

since the length of the first sequence of the letter 'a' is 2. Yours
accepts it, right?

Yes, I misunderstood your requirements. So it must be modified
essentially to what Tim Chase wrote:

m = re.search('^[^a]*a{3}b', 'xyz123aabbaaab')


...but that rejects 'aazaaab' which should apparently be accepted.
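
For reference, a small sketch of how the anchored pattern quoted above behaves on the strings under discussion (the name anchored is made up here):

```python
import re

# ^[^a]* allows only non-'a' characters before the run of three 'a's.
anchored = re.compile(r'^[^a]*a{3}b')

checks = {
    s: anchored.search(s) is not None
    for s in ['xyz123aabbaaab', 'aazaaab', 'xyz123aaabbab']
}
print(checks)
```

Because no 'a' may appear before the matched run, "aazaaab" fails on its leading "aa", which is the rejection described above.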

... and that is OK. That was the request:
 
