a simple regex question

Discussion in 'Python' started by John Salerno, Apr 1, 2006.

  1. John Salerno

    John Salerno Guest

    Ok, I'm stuck on another Python challenge question. Apparently what you
    have to do is search through a huge group of characters and find a
    single lowercase character that has exactly three uppercase characters
    on either side of it. Here's what I have so far:

    pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
    print re.search(pattern, mess).groups()

    Not sure if 'groups' is necessary or not.

    Anyway, this returns one matching string, but when I put this letter in
    as the solution to the problem, I get a message saying "yes, but there
    are more", so assuming this means that there is more than one character
    with three caps on either side, is my RE written correctly to find them
    all? I didn't have the parentheses or + sign at first, but I added them
    to find all the possible matches, but still only one comes up.

    Thanks.
    John Salerno, Apr 1, 2006
    #1

  2. John Salerno

    John Salerno Guest

    John Salerno wrote:
    > Ok, I'm stuck on another Python challenge question. Apparently what you
    > have to do is search through a huge group of characters and find a
    > single lowercase character that has exactly three uppercase characters
    > on either side of it. Here's what I have so far:
    >
    > pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
    > print re.search(pattern, mess).groups()
    >
    > Not sure if 'groups' is necessary or not.
    >
    > Anyway, this returns one matching string, but when I put this letter in
    > as the solution to the problem, I get a message saying "yes, but there
    > are more", so assuming this means that there is more than one character
    > with three caps on either side, is my RE written correctly to find them
    > all? I didn't have the parentheses or + sign at first, but I added them
    > to find all the possible matches, but still only one comes up.
    >
    > Thanks.


    A quick note: I found nine more matches by using findall() instead of
    search(), but I'm still curious how to write the RE so that it works
    with search, especially since findall wouldn't have returned overlapping
    matches. I guess I didn't write it to properly check multiple times.
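
    A minimal sketch of the difference between search() and findall() on this
    kind of pattern. The string `mess` below is a made-up stand-in, not the real
    challenge data, and the code assumes Python 3 (print as a function).

```python
import re

# Hypothetical stand-in for the challenge text (an assumption, not the real data).
mess = "xxaBCDeFGHikLMNoPQRsyypTUVqWXYrzz"

pattern = r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]'

# search() stops at the first place the pattern matches.
first = re.search(pattern, mess).group()

# findall() keeps scanning and returns every non-overlapping match.
every = re.findall(pattern, mess)

# Wrapping the pattern in (...)+ only chains matches that are directly
# adjacent, and groups() keeps just the LAST repetition of the group.
repeated = re.search('(' + pattern + ')+', mess).groups()

print(first)     # the first match only
print(every)     # all three matches
print(repeated)  # one-element tuple: the last adjacent repetition
```

    Here the repeated group swallows the first two adjacent matches and reports
    only the second, and it never reaches the third. That would explain why
    adding the parentheses and the + still showed a single result.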
    John Salerno, Apr 1, 2006
    #2

  3. Justin Azoff

    Justin Azoff Guest

    John Salerno wrote:
    > Ok, I'm stuck on another Python challenge question. Apparently what you
    > have to do is search through a huge group of characters and find a
    > single lowercase character that has exactly three uppercase characters
    > on either side of it. Here's what I have so far:
    >
    > pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
    > print re.search(pattern, mess).groups()
    >
    > Not sure if 'groups' is necessary or not.
    >
    > Anyway, this returns one matching string, but when I put this letter in
    > as the solution to the problem, I get a message saying "yes, but there
    > are more", so assuming this means that there is more than one character
    > with three caps on either side, is my RE written correctly to find them
    > all? I didn't have the parentheses or + sign at first, but I added them
    > to find all the possible matches, but still only one comes up.
    >
    > Thanks.


    I don't believe you _need_ the parentheses or the + in that usage...

    Have a look at http://docs.python.org/lib/node115.html

    It should be obvious which method you need to use to "find them all"

    --
    - Justin
    Justin Azoff, Apr 1, 2006
    #3
  4. John Salerno

    John Salerno Guest

    Justin Azoff wrote:
    > John Salerno wrote:
    >> Ok, I'm stuck on another Python challenge question. Apparently what you
    >> have to do is search through a huge group of characters and find a
    >> single lowercase character that has exactly three uppercase characters
    >> on either side of it. Here's what I have so far:
    >>
    >> pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
    >> print re.search(pattern, mess).groups()
    >>
    >> Not sure if 'groups' is necessary or not.
    >>
    >> Anyway, this returns one matching string, but when I put this letter in
    >> as the solution to the problem, I get a message saying "yes, but there
    >> are more", so assuming this means that there is more than one character
    >> with three caps on either side, is my RE written correctly to find them
    >> all? I didn't have the parentheses or + sign at first, but I added them
    >> to find all the possible matches, but still only one comes up.
    >>
    >> Thanks.

    >
    > I don't believe you _need_ the parenthesis or the + in that usage...
    >
    > Have a look at http://docs.python.org/lib/node115.html
    >
    > It should be obvious which method you need to use to "find them all"
    >


    But would findall return this match: aMNHiRFLoDLFb ??

    There are actually two matches there, but they overlap. So how would
    you write an RE that catches them both?
    John Salerno, Apr 1, 2006
    #4
  5. Dennis Lee Bieber

    On Fri, 31 Mar 2006 18:39:43 -0500, John Salerno declaimed the
    following in comp.lang.python:

    > Ok, I'm stuck on another Python challenge question. Apparently what you
    > have to do is search through a huge group of characters and find a
    > single lowercase character that has exactly three uppercase characters
    > on either side of it. Here's what I have so far:
    >
    > pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
    > print re.search(pattern, mess).groups()
    >

    I don't do REs, but what exactly are you supposed to return? A
    count, the index where such a match occurred, or the 7 characters
    themselves?

    I'd probably do something very simplistic:

    >>> c = "A long STRiNGS testing is aVAIlABLe"
    >>> for x in range(3, len(c) - 3):
    ...     # three uppercase on each side of a lowercase character
    ...     if c[x-3:x].isupper() and c[x].islower() and c[x+1:x+4].isupper():
    ...         print("=> " + c[x-3:x+4])
    ...
    => STRiNGS
    => VAIlABL
    >>>


    Needs a bit more work since it doesn't exclude having MORE than
    three uppercase on a side... Testing -4 and +4 for lowercase would do
    most of it... But that ends up making the start and end of data special
    cases...
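
    One way the extra checks might look, as a rough sketch only. The function
    name `find_exact` is made up for illustration, and it tests the fourth
    character on each side for "not uppercase" rather than "lowercase", so that
    spaces and the ends of the string need no special-casing. That choice is an
    assumption about what the challenge wants.

```python
def find_exact(text):
    """Return (index, letter) pairs for each lowercase letter that has
    exactly three uppercase letters on each side (hypothetical helper)."""
    hits = []
    for x in range(3, len(text) - 3):
        left, right = text[x-3:x], text[x+1:x+4]
        # isalpha() rules out slices such as " ST", which isupper() accepts.
        if not (text[x].islower()
                and left.isalpha() and left.isupper()
                and right.isalpha() and right.isupper()):
            continue
        # A fourth uppercase letter on either side disqualifies the hit.
        # The bounds checks treat the string edges as "not uppercase".
        if x - 4 >= 0 and text[x-4].isupper():
            continue
        if x + 4 < len(text) and text[x+4].isupper():
            continue
        hits.append((x, text[x]))
    return hits


print(find_exact("A long STRiNGS testing is aVAIlABLe"))
```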
    --
    > ============================================================== <
    > | Wulfraed Dennis Lee Bieber KD6MOG <
    > | Bestiaria Support Staff <
    > ============================================================== <
    > Home Page: <http://www.dm.net/~wulfraed/> <
    > Overflow Page: <http://wlfraed.home.netcom.com/> <
    Dennis Lee Bieber, Apr 1, 2006
    #5
  6. Roel Schroeven

    John Salerno wrote:
    >> pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
    >> print re.search(pattern, mess).groups()
    >>
    >> Anyway, this returns one matching string, but when I put this letter in
    >> as the solution to the problem, I get a message saying "yes, but there
    >> are more", so assuming this means that there is more than one character
    >> with three caps on either side, is my RE written correctly to find them
    >> all? I didn't have the parentheses or + sign at first, but I added them
    >> to find all the possible matches, but still only one comes up.
    >>
    >> Thanks.

    >
    > A quick note: I found nine more matches by using findall() instead of
    > search(), but I'm still curious how to write the RE so that it works
    > with search, especially since findall wouldn't have returned overlapping
    > matches. I guess I didn't write it to properly check multiple times.


    It seems to me you should be able to find all matches with search(). Not
    with the pattern you mention above: that will only find matches if they
    come right after each other, as in
    xXXXxXXXxyYYYyYYYyzZZZzZZZz

    You'll need something more like
    pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]+)+'
    so that it will also match occurrences separated by a run of extra
    lowercase letters (the group itself still only holds the last repetition).

    That said, I think findall() is a better solution for this problem. I
    don't think search() will find overlapping matches either, so that's no
    reason not to use findall(), and the pattern is simpler with findall();
    I solved this challenge with findall() and this regular expression:

    pattern = r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]'
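
    A small variation on that pattern, sketched on the assumption that only the
    middle letter is wanted: when a pattern has exactly one capture group,
    findall() returns the group's text instead of the whole match. The sample
    string is invented for illustration and the code assumes Python 3.

```python
import re

# Same shape as the pattern above, with a capture group around the middle letter.
pattern = re.compile(r'[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]')

sample = "qqaBCDeFGHizzkLMNoPQRsyy"  # made-up text, not the challenge data

letters = pattern.findall(sample)  # one entry per match: the captured letter
answer = ''.join(letters)
print(letters, answer)
```

    Note that this pattern insists on a lowercase letter at both ends, so it
    would skip an occurrence sitting at the very start or end of the text.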


    --
    If I have been able to see further, it was only because I stood
    on the shoulders of giants. -- Isaac Newton

    Roel Schroeven
    Roel Schroeven, Apr 1, 2006
    #6
  7. Paddy

    Paddy Guest

    John Salerno wrote:
    > But would findall return this match: aMNHiRFLoDLFb ??
    >
    > There are actually two matches there, but they overlap. So how would
    > your write an RE that catches them both?


    I remembered the 'non-consuming' match (?=...) and a minute of
    experimentation gave the following.

    >>> import re
    >>> s = "aMNHiRFLoDLFb"
    >>> re.findall(r'[A-Z]{3}([a-z])(?=[A-Z]{3})', s)
    ['i', 'o']
    >>>


    - Paddy.
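
    A possible refinement of the session above, offered as a sketch rather than
    a tested solution to the challenge: the lookahead version accepts four or
    more capitals on either side. Adding a negative lookbehind and a negative
    lookahead (both additions are assumptions about what "exactly three" should
    mean) rejects those cases while still catching overlapping matches.

```python
import re

# (?<![A-Z])  no capital immediately before the left-hand three
# (?=...)     right-hand three are checked but not consumed, so they can
#             serve as the left-hand three of the next match
# (?![A-Z])   no fourth capital after the right-hand three
exact = re.compile(r'(?<![A-Z])[A-Z]{3}([a-z])(?=[A-Z]{3}(?![A-Z]))')

overlapping = exact.findall("aMNHiRFLoDLFb")   # the overlapping example
too_many_left = exact.findall("aXMNHiRFLb")    # four capitals on the left
at_edges = exact.findall("MNHiRFL")            # match touching both ends
print(overlapping, too_many_left, at_edges)
```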
    Paddy, Apr 2, 2006
    #7