Python's regular expression help

Discussion in 'Python' started by goldtech, Apr 29, 2010.

  1. goldtech

    goldtech Guest

    Hi,
    Trying to start out with simple things, but apparently there are some
    basics I need help with. This works OK:
    >>> import re
    >>> p = re.compile('(ab*)(sss)')
    >>> m = p.match('absss')
    >>> m.group(0)
    'absss'
    >>> m.group(1)
    'ab'
    >>> m.group(2)
    'sss'
    ...
    But two questions:

    How can I apply a regex to a string variable?
    I'm doing something wrong here:

    >>> f = r'abss'
    >>> f
    'abss'
    >>> m = p.match(f)
    >>> m.group(0)
    Traceback (most recent call last):
      File "<pyshell#15>", line 1, in <module>
        m.group(0)
    AttributeError: 'NoneType' object has no attribute 'group'

    How do I apply a regex to a multiline string? I thought this
    might work, but there's a problem:

    >>> p = re.compile('(ab*)(sss)', re.S)
    >>> m = p.match('ab\nsss')
    >>> m.group(0)
    Traceback (most recent call last):
      File "<pyshell#26>", line 1, in <module>
        m.group(0)
    AttributeError: 'NoneType' object has no attribute 'group'
    >>>


    Thanks for the newbie regex help, Lee
    goldtech, Apr 29, 2010
    #1

  2. Dodo

    Dodo Guest

    On 29/04/2010 20:00, goldtech wrote:
    > How do I implement a regex on a multiline string? I thought this
    > might work but there's problem:


    For multiline strings, I use re.DOTALL (re.S is simply its short name).

    I don't know much about match(); findall() is pretty efficient:
    >>> import re
    >>> my = "<a href=\"hello world.com\">LINK</a>"
    >>> res = re.findall(">(.*?)<", my)
    >>> res
    ['LINK']
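A small sketch of why findall() succeeds here while match() would not, reusing the example string above: match() only tries at position 0, whereas findall() and search() scan the whole string. The variable names are illustrative only.

```python
import re

html = '<a href="hello world.com">LINK</a>'
pattern = r">(.*?)<"

# findall() scans the whole string and returns every non-overlapping match;
# with one capture group it returns just the captured text.
links = re.findall(pattern, html)
print(links)  # ['LINK']

# match() only tries at position 0, and this string starts with '<', not '>'.
anchored = re.match(pattern, html)
print(anchored)  # None

# search() scans for the first match anywhere and returns a match object.
first = re.search(pattern, html)
print(first.group(1))  # LINK
```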

    Dorian
    Dodo, Apr 29, 2010
    #2

  3. MRAB

    MRAB Guest

    goldtech wrote:
    > Hi,
    > Trying to start out with simple things but apparently there's some
    > basics I need help with. This works OK:
    >>>> import re
    >>>> p = re.compile('(ab*)(sss)')
    >>>> m = p.match( 'absss' )
    >>>> m.group(0)
    > 'absss'
    >>>> m.group(1)
    > 'ab'
    >>>> m.group(2)
    > 'sss'
    > ...
    > But two questions:
    >
    > How can I operate a regex on a string variable?
    > I'm doing something wrong here:
    >
    >>>> f=r'abss'
    >>>> f
    > 'abss'
    >>>> m = p.match( f )
    >>>> m.group(0)
    > Traceback (most recent call last):
    > File "<pyshell#15>", line 1, in <module>
    > m.group(0)
    > AttributeError: 'NoneType' object has no attribute 'group'
    >

    Look closely: the regex contains 3 letter 's', but the string referred
    to by f has only 2.
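A minimal sketch of the point above: match() returns None when the pattern fails, and calling .group() on None is what raises the AttributeError. Checking the result first makes the failure visible; the `results` dict is just an illustrative way to collect the outcomes.

```python
import re

p = re.compile(r"(ab*)(sss)")

results = {}
for text in ["absss", "abss"]:
    m = p.match(text)
    # match() returns None when the pattern does not match, so check for
    # that before calling .group(); calling it on None raises AttributeError.
    if m is None:
        print(f"{text!r}: no match")
        results[text] = None
    else:
        print(f"{text!r}: groups {m.groups()}")
        results[text] = m.groups()
```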

    > How do I implement a regex on a multiline string? I thought this
    > might work but there's problem:
    >
    >>>> p = re.compile('(ab*)(sss)', re.S)
    >>>> m = p.match( 'ab\nsss' )
    >>>> m.group(0)
    > Traceback (most recent call last):
    > File "<pyshell#26>", line 1, in <module>
    > m.group(0)
    > AttributeError: 'NoneType' object has no attribute 'group'
    >
    > Thanks for the newbie regex help, Lee


    The string contains a newline between the 'b' and the 's', but the regex
    isn't expecting any newline (or any other character) between the 'b' and
    the 's', hence no match.
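To illustrate: re.S (the same flag as re.DOTALL) only changes what '.' matches, and the original pattern contains no '.', so the flag has nothing to act on. A sketch, assuming the goal is to allow anything (including a newline) between the two groups:

```python
import re

text = "ab\nsss"

# re.S is the short alias of re.DOTALL: the same flag.
print(re.S == re.DOTALL)  # True

# The flag only changes what '.' matches. This pattern has no '.', so
# re.S has no effect and the newline after 'ab' still prevents a match.
plain = re.match(r"(ab*)(sss)", text, re.S)
print(plain)  # None

# Give the pattern a '.*?' between the groups; with re.S it may cross the newline.
dotall = re.match(r"(ab*).*?(sss)", text, re.S)
print(dotall.groups())  # ('ab', 'sss')

# Without re.S the same pattern fails, because '.' excludes newlines by default.
no_flag = re.match(r"(ab*).*?(sss)", text)
print(no_flag)  # None
```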
    MRAB, Apr 29, 2010
    #3
  4. Tim Chase

    Tim Chase Guest

    On 04/29/2010 01:00 PM, goldtech wrote:
    > Trying to start out with simple things but apparently there's some
    > basics I need help with. This works OK:
    >>>> import re
    >>>> p = re.compile('(ab*)(sss)')
    >>>> m = p.match( 'absss' )
    >
    >>>> f=r'abss'
    >>>> f
    > 'abss'
    >>>> m = p.match( f )
    >>>> m.group(0)
    > Traceback (most recent call last):
    > File "<pyshell#15>", line 1, in <module>
    > m.group(0)
    > AttributeError: 'NoneType' object has no attribute 'group'


    'absss' != 'abss'

    Your regexp looks for 3 "s", your "f" contains only 2. So the
    regexp object doesn't, well, match. Try

    f = 'absss'

    and it will work. As an aside, using raw-strings for this text
    doesn't change anything, but if you want, you _can_ write it as

    f = r'absss'

    if it will make you feel better :)
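A quick sketch confirming both points: a string held in a variable is matched exactly like a literal, and the r prefix makes no difference when the text has no backslashes. The names `f` and `g` are just for illustration.

```python
import re

p = re.compile(r"(ab*)(sss)")

f = "absss"   # a string held in a variable works just like a literal
g = r"absss"  # the r prefix changes nothing here: there are no backslashes
print(f == g)               # True
print(p.match(f).group(0))  # absss

# Raw strings only matter once backslashes appear:
# r"\n" is two characters (backslash, n), "\n" is one newline character.
print(len(r"\n"), len("\n"))  # 2 1
```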

    > How do I implement a regex on a multiline string? I thought this
    > might work but there's problem:
    >
    >>>> p = re.compile('(ab*)(sss)', re.S)
    >>>> m = p.match( 'ab\nsss' )
    >>>> m.group(0)
    > Traceback (most recent call last):
    > File "<pyshell#26>", line 1, in <module>
    > m.group(0)
    > AttributeError: 'NoneType' object has no attribute 'group'


    Well, it depends on what you want to do -- regexps are fairly
    precise, so if you want to allow whitespace between the two, you
    can use

    r = re.compile(r'(ab*)\s*(sss)')

    If you want to allow whitespace anywhere, it gets uglier, and
    your capture/group results will contain that whitespace:

    r'(a\s*b*)\s*(s\s*s\s*s)'

    Alternatively, if you don't want to allow arbitrary whitespace
    but only newlines, you can use "\n*" instead of "\s*"
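The three patterns above can be tried side by side. A sketch; the sample strings such as "a b\nss s" are made up to exercise each case:

```python
import re

text = "ab\nsss"

# Whitespace (including newlines) allowed only between the two groups.
between = re.compile(r"(ab*)\s*(sss)")
print(between.match(text).groups())  # ('ab', 'sss')

# Whitespace allowed anywhere; it then ends up inside the captured groups.
anywhere = re.compile(r"(a\s*b*)\s*(s\s*s\s*s)")
print(anywhere.match("a b\nss s").groups())  # ('a b', 'ss s')

# Only newlines allowed between the groups: a space no longer matches.
newlines_only = re.compile(r"(ab*)\n*(sss)")
print(newlines_only.match(text).groups())  # ('ab', 'sss')
print(newlines_only.match("ab sss"))       # None
```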

    -tkc
    Tim Chase, Apr 29, 2010
    #4
  5. goldtech

    goldtech Guest

    On Apr 29, 11:49 am, Tim Chase <> wrote:
    > Well, it depends on what you want to do -- regexps are fairly
    > precise, so if you want to allow whitespace between the two, you
    > can use
    >
    >    r = re.compile(r'(ab*)\s*(sss)')

    Yes, most of my problem is w/my patterns not w/any python re syntax.

    I thought re.S would take a multiline string with any spaces or
    newlines and make it appear as one line to the regex. Make "\n" be
    ignored in a way...still playing w/it. Thanks for the help!
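For reference, none of the flags makes the regex skip newlines in the string being searched. re.S lets '.' match a newline, re.M makes '^' and '$' match at every line boundary, and re.X (re.VERBOSE) ignores whitespace in the pattern, not in the subject string. A sketch contrasting re.S and re.M; stripping the newlines before matching is only one possible workaround:

```python
import re

text = "ab\nsss"

# re.S / re.DOTALL: '.' may also match a newline. Literal characters are unaffected.
no_s = re.findall(r"b.s", text)
with_s = re.findall(r"b.s", text, re.S)
print(no_s, with_s)  # [] ['b\ns']

# re.M / re.MULTILINE: '^' and '$' also match at each line boundary.
no_m = re.findall(r"^\w+$", text)
with_m = re.findall(r"^\w+$", text, re.M)
print(no_m, with_m)  # [] ['ab', 'sss']

# To really ignore newlines, one option is to strip them before matching.
flat = text.replace("\n", "")
print(re.match(r"(ab*)(sss)", flat).groups())  # ('ab', 'sss')
```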
    goldtech, Apr 29, 2010
    #5

Similar Threads
  1. VSK
    Replies: 2, Views: 2,269
  2. =?iso-8859-1?B?bW9vcJk=?=
    Matching abitrary expression in a regular expression
    =?iso-8859-1?B?bW9vcJk=?=, Dec 1, 2005, in forum: Java
    Replies: 8, Views: 831 (Alan Moore, Dec 2, 2005)
  3. GIMME
    Replies: 3, Views: 11,921 (vforvikash, Dec 29, 2008)
  4. pekka niiranen
    Replies: 5, Views: 504 (Paul McGuire, Oct 20, 2004)
  5. Gabriel Genellina
    Re: python regular expression help
    Gabriel Genellina, Apr 12, 2007, in forum: Python
    Replies: 4, Views: 257 (7stud, Apr 12, 2007)