a simple regex question

John Salerno

Ok, I'm stuck on another Python challenge question. Apparently what you
have to do is search through a huge group of characters and find a
single lowercase character that has exactly three uppercase characters
on either side of it. Here's what I have so far:

pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
print re.search(pattern, mess).groups()

Not sure if 'groups' is necessary or not.

Anyway, this returns one matching string, but when I put this letter in
as the solution to the problem, I get a message saying "yes, but there
are more", so assuming this means that there is more than one character
with three caps on either side, is my RE written correctly to find them
all? I didn't have the parentheses or + sign at first, but I added them
to find all the possible matches, but still only one comes up.

Thanks.
 
John Salerno

A quick note: I found nine more matches by using findall() instead of
search(), but I'm still curious how to write the RE so that it works
with search, especially since findall wouldn't have returned overlapping
matches. I guess I didn't write it to properly check multiple times.
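To make the search()/findall() difference concrete, here is a minimal sketch. The sample string `mess` below is made up for illustration (the real challenge data is not shown in this thread), and the pattern is the plain one without the parentheses and the +. It avoids print so it should run unchanged on Python 3.

```python
import re

# Plain pattern: lowercase, three uppercase, lowercase, three uppercase, lowercase.
pattern = r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]'

# Made-up sample data, NOT the challenge text.
mess = 'aBCDeFGHi--jKLMnOPQr'

# search() stops at the first match it finds.
first = re.search(pattern, mess).group()

# findall() returns every non-overlapping match, left to right.
every = re.findall(pattern, mess)

# finditer() gives match objects, so the positions are available too.
spans = [(m.start(), m.group()) for m in re.finditer(pattern, mess)]
```

Here `first` is only 'aBCDeFGHi', while `every` holds both matches. A single search() call never reports more than one match, whatever quantifier is wrapped around the pattern.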
 
Justin Azoff

I don't believe you _need_ the parentheses or the + in that usage...

Have a look at http://docs.python.org/lib/node115.html

It should be obvious which method you need to use to "find them all"
 
John Salerno

Justin said:
I don't believe you _need_ the parentheses or the + in that usage...

Have a look at http://docs.python.org/lib/node115.html

It should be obvious which method you need to use to "find them all"

But would findall return this match: aMNHiRFLoDLFb ??

There are actually two matches there, but they overlap. So how would
you write an RE that catches them both?
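A quick check of that string, as a sketch using the plain seven-letter-plus-ends pattern from earlier in the thread, suggests the answer is no:

```python
import re

pattern = r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]'

# findall() resumes scanning after the end of each match, so the second
# candidate ('iRFLoDLFb'), which starts inside the first one, is skipped.
found = re.findall(pattern, 'aMNHiRFLoDLFb')
```

`found` contains just 'aMNHiRFLo', because the characters "iRFLo" were already consumed by the first match.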
 
Dennis Lee Bieber

John said:
Ok, I'm stuck on another Python challenge question. Apparently what you
have to do is search through a huge group of characters and find a
single lowercase character that has exactly three uppercase characters
on either side of it. Here's what I have so far:

pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
print re.search(pattern, mess).groups()

I don't do REs, but what exactly are you supposed to return? A
count, the index where such a match occurred, or the seven characters
themselves?

I'd probably do something very simplistic:

.... if c[x-3:x-1].isupper() and c[x].islower() and c[x+1:x+3].isupper():
....     print "=> ", c[x-3:x+3]
....
=> STRiNG
=> VAIlAB

Needs a bit more work, since it doesn't exclude having MORE than
three uppercase on a side... Testing -4 and +4 for lowercase would do
most of it... But that ends up making the start and end of data special
cases...
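For anyone who wants to try the slicing idea, here is one possible sketch. Note that a slice such as c[x-3:x-1] covers only two characters, so this version uses c[x-3:x] and c[x+1:x+4] to get three on each side. The function name `guarded_letters` is made up, the loop bounds are an assumption (the loop line is not shown above), and the code assumes plain ASCII letters. The "exactly three" check looks at the characters at x-4 and x+4 and treats the start and end of the data as acceptable neighbours.

```python
from string import ascii_lowercase, ascii_uppercase


def guarded_letters(text):
    """Return (index, char) for each lowercase letter with exactly
    three uppercase letters on each side (hypothetical helper)."""
    hits = []
    # Start at 3 and stop 3 before the end so both slices are full length.
    for x in range(3, len(text) - 3):
        left, right = text[x - 3:x], text[x + 1:x + 4]
        if text[x] not in ascii_lowercase:
            continue
        # Check every character; str.isupper() on a whole slice would
        # also accept mixed input such as 'A1B'.
        if not all(ch in ascii_uppercase for ch in left + right):
            continue
        # "Exactly three": the next character outward must not be uppercase.
        # Running off either end of the data counts as fine.
        before_ok = x - 4 < 0 or text[x - 4] not in ascii_uppercase
        after_ok = x + 4 >= len(text) or text[x + 4] not in ascii_uppercase
        if before_ok and after_ok:
            hits.append((x, text[x]))
    return hits
```

Because each index is tested independently, overlapping candidates are found without any extra work: `guarded_letters('aMNHiRFLoDLFb')` reports both the 'i' and the 'o'.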
 
Roel Schroeven

John Salerno wrote:
pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
print re.search(pattern, mess).groups()

Anyway, this returns one matching string, but when I put this letter in
as the solution to the problem, I get a message saying "yes, but there
are more", so assuming this means that there is more than one character
with three caps on either side, is my RE written correctly to find them
all? I didn't have the parentheses or + sign at first, but I added them
to find all the possible matches, but still only one comes up.

Thanks.

A quick note: I found nine more matches by using findall() instead of
search(), but I'm still curious how to write the RE so that it works
with search, especially since findall wouldn't have returned overlapping
matches. I guess I didn't write it to properly check multiple times.

It seems to me you should be able to find all matches with search(). Not
with the pattern you mention above: that will only find matches if they
come right after each other, as in
xXXXxXXXxyYYYyYYYyzZZZzZZZz

You'll need something more like
pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]+)+'
so that it will find matches that are further apart from each other.

That said, I think findall() is a better solution for this problem. I
don't think search() will find overlapping matches either, so that's no
reason not to use findall(), and the pattern is simpler with findall();
I solved this challenge with findall() and this regular expression:

pattern = r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]'
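One detail worth checking when weighing the two approaches: in Python's re module a capturing group that repeats keeps only its last repetition, so groups() on a `(...)+` pattern still reports a single string even when the whole run matches. A small sketch using the adjacent-matches example string from above (the variable names are made up):

```python
import re

unit = r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]'
adjacent = 'xXXXxXXXxyYYYyYYYyzZZZzZZZz'

m = re.search('(' + unit + ')+', adjacent)

# group(0) is the whole run of back-to-back matches...
whole = m.group(0)

# ...but the repeated group itself only remembers its final repetition.
last_only = m.groups()

# findall() with the bare pattern returns each unit separately.
each = re.findall(unit, adjacent)
```

So `last_only` is ('zZZZzZZZz',) while `each` lists all three units, which is one more reason the findall() version is easier to work with here.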
 
Paddy

John said:
But would findall return this match: aMNHiRFLoDLFb ??

There are actually two matches there, but they overlap. So how would
you write an RE that catches them both?

I remembered the 'non-consuming' match (?=...) and a minute of
experimentation gave the following:

>>> import re
>>> s = "aMNHiRFLoDLFb"
>>> re.findall(r'[A-Z]{3}([a-z])(?=[A-Z]{3})', s)
['i', 'o']

- Paddy.
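Since the original question asks for exactly three uppercase characters on each side, the lookahead idea can be extended with negative lookarounds. The following is only a sketch: the name `EXACT`, the helper `guarded`, and the test strings are made up for illustration, and it assumes ASCII letters.

```python
import re

# (?<![A-Z])  : the character before the left trio is not uppercase
# [A-Z]{3}    : left trio (consumed)
# ([a-z])     : the letter we want (captured)
# (?=...)     : right trio, checked without consuming it, and
# (?![A-Z])   : the character after the right trio is not uppercase
EXACT = re.compile(r'(?<![A-Z])[A-Z]{3}([a-z])(?=[A-Z]{3}(?![A-Z]))')


def guarded(text):
    """Return the lowercase letters with exactly three capitals per side."""
    return EXACT.findall(text)


# For comparison: the simpler lookahead pattern accepts longer runs too.
loose = re.findall(r'[A-Z]{3}([a-z])(?=[A-Z]{3})', 'aXXXXbYYYYc')
```

With this, `guarded('aMNHiRFLoDLFb')` still gives both 'i' and 'o', because the right-hand trio is never consumed, while `guarded('aXXXXbYYYYc')` comes back empty. The simpler pattern finds 'b' in that second string, as `loose` shows.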
 
