make RE more clever to avoid inappropriate: sre_constants.error: redefinition of group name

Discussion in 'Python' started by aspineux, Mar 29, 2007.

  1. aspineux

    aspineux Guest

    I want to parse

    'foo@bar' or '<foo@bar>' and get the email address foo@bar

    the regex is

    r'<\w+@\w+>|\w+@\w+'

    now, I want to give it a name

    r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'

    sre_constants.error: redefinition of group name 'email' as group 2;
    was group 1
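
    The error can be reproduced with a few lines. This is only a sketch,
    assuming a Python 3 interpreter, where the same exception is available
    as re.error; the helper name compile_error is made up for illustration.

```python
import re

# The pattern from the post: the name 'email' appears in both alternatives.
DUPLICATE_NAME = r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'


def compile_error(pattern):
    """Return the compile error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


if __name__ == '__main__':
    # The check happens at compile time, before any string is matched.
    print(compile_error(DUPLICATE_NAME))
```

    The pattern is rejected when it is compiled, so the alternation never
    gets a chance to show that only one branch can match.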

    BUT because I use a | , only one of the two alternatives can match,
    so I will only ever get one group named 'email' !

    Any comment ?

    PS: I know the solution for this case is to use
    r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
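
    As a rough check of that conditional pattern, here is a small sketch.
    The names EMAIL_RE and extract_email are hypothetical, chosen only for
    this example.

```python
import re

# Optional '<' captured as 'lt'; the closing '>' is required only if 'lt' matched.
EMAIL_RE = re.compile(r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)')


def extract_email(text):
    """Return the address without brackets, or None when nothing matches."""
    match = EMAIL_RE.search(text)
    return match.group('email') if match else None


if __name__ == '__main__':
    for sample in ('foo@bar', '<foo@bar>'):
        print(extract_email(sample))
```

    Note that with search() an input such as '<foo@bar' still yields
    foo@bar, because the search can simply start after the bracket.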
    aspineux, Mar 29, 2007
    #1

  2. aspineux

    Guest

    On Mar 29, 7:22 am, "aspineux" <> wrote:
    > I want to parse
    >
    > 'foo@bar' or '<foo@bar>' and get the email address foo@bar
    >
    > the regex is
    >
    > r'<\w+@\w+>|\w+@\w+'
    >
    > now, I want to give it a name
    >
    > r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
    >
    > sre_constants.error: redefinition of group name 'email' as group 2;
    > was group 1
    >
    > BUT because I use a | , I will get only one group named 'email' !
    >
    > Any comment ?
    >
    > PS: I know the solution for this case is to use
    > r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'




    Regular expressions, alternation, named groups ... oh my!

    It tends to get quite complex, especially if you need
    to reject cases where the string contains a left bracket
    and not the right, or vice versa.

    >>> import re
    >>> # Accept an address with both brackets, or a bare one with neither.
    >>> pattern = re.compile(r'(?P<email><\w+@\w+>|(?<!<)\b\w+@\w+\b(?!>))')
    >>> for email in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
    ...     matched = pattern.search(email)
    ...     if matched is not None:
    ...         print(matched.group('email'))
    ...
    foo@bar
    <foo@bar>


    I suggest you try some other solution (maybe pyparsing).

    --
    Hope this helps,
    Steven
    , Mar 29, 2007
    #2

  3. aspineux

    aspineux Guest

    On 29 mar, 16:22, "aspineux" <> wrote:
    > I want to parse
    >
    > 'foo@bar' or '<foo@bar>' and get the email address foo@bar
    >
    > the regex is
    >
    > r'<\w+@\w+>|\w+@\w+'
    >
    > now, if I want to give it a name
    >
    > r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
    >
    > sre_constants.error: redefinition of group name 'email' as group 2;
    > was group 1
    >
    > BUT because I use a | , I will get only one group named 'email' !


    THEN my regex is meaningful, the error is meaningless, and something
    should be changed in 're'.

    But maybe I'm wrong ?

    >
    > Any comment ?


    I'm trying to start a discussion about something that can be improved
    in 're', not looking for a solution for email parsing :)


    >
    > PS: I know the solution for this case is to use
    > r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
    aspineux, Mar 29, 2007
    #3
  4. aspineux

    Paddy Guest

    On Mar 29, 3:22 pm, "aspineux" <> wrote:
    > I want to parse
    >
    > 'foo@bar' or '<foo@bar>' and get the email address foo@bar
    >
    > the regex is
    >
    > r'<\w+@\w+>|\w+@\w+'
    >
    > now, I want to give it a name
    >
    > r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
    >
    > sre_constants.error: redefinition of group name 'email' as group 2;
    > was group 1
    >
    > BUT because I use a | , I will get only one group named 'email' !
    >
    > Any comment ?
    >
    > PS: I know the solution for this case is to use
    > r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'


    Use two group names, one for each alternative, and if you don't care
    which one matched, do something like the following:

    >>> import re
    >>> s1 = 'foo@bare'
    >>> s2 = '<foo@bare>'
    >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
    >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
    'foo@bare'
    >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
    >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
    'foo@bare'
    >>>
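
    A possible variation on the session above, offered as a sketch rather
    than the only way to do it: since exactly one of the two groups takes
    part in any match, the match object's lastgroup attribute names it.
    PATTERN and matched_email are illustrative names.

```python
import re

PATTERN = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')


def matched_email(text):
    """Return whichever of email1/email2 took part in the match, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Exactly one of the two groups is set per match, so lastgroup names it.
    return match.group(match.lastgroup)


if __name__ == '__main__':
    for sample in ('foo@bare', '<foo@bare>'):
        print(matched_email(sample))
```

    This avoids spelling out every group name in an "or" chain, at the cost
    of relying on the pattern having no other capturing groups.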


    - Paddy.
    Paddy, Mar 29, 2007
    #4
  5. aspineux

    aspineux Guest

    On 30 mar, 00:13, "Paddy" <> wrote:
    > On Mar 29, 3:22 pm, "aspineux" <> wrote:
    >
    > > I want to parse
    > >
    > > 'foo@bar' or '<foo@bar>' and get the email address foo@bar
    > >
    > > the regex is
    > >
    > > r'<\w+@\w+>|\w+@\w+'
    > >
    > > now, I want to give it a name
    > >
    > > r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
    > >
    > > sre_constants.error: redefinition of group name 'email' as group 2;
    > > was group 1
    > >
    > > BUT because I use a | , I will get only one group named 'email' !
    > >
    > > Any comment ?
    > >
    > > PS: I know the solution for this case is to use
    > > r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
    >
    > use two group names, one for each alternate form and if you are not
    > concerned with whichever matched do something like the following:
    >

    The problem is the way I create this regex :)

    regex={}
    regex['email']=r'(?P<email1>\w+@\w+)'

    path=r'<%(email)s>|%(email)s' % regex

    Once more, the original question is:
    Is it normal to get an error when the same id is used on both sides
    of a | ?
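
    One way to keep building the regex from a reusable fragment while
    staying within what 're' accepts is to number the group name each time
    the fragment is used. This is only a sketch under that assumption;
    make_email_fragment, build_pattern and get_email are hypothetical
    helpers, not part of 're'.

```python
import re
from itertools import count


def make_email_fragment(counter):
    """Return the email sub-pattern with a fresh group name (email1, email2, ...)."""
    return r'(?P<email%d>\w+@\w+)' % next(counter)


def build_pattern():
    """Build '<email>|email' with a distinct group name per occurrence."""
    counter = count(1)
    first = make_email_fragment(counter)
    second = make_email_fragment(counter)
    return re.compile(r'<%s>|%s' % (first, second))


def get_email(match):
    """Return the value of whichever emailN group took part in the match."""
    for name, value in match.groupdict().items():
        if name.startswith('email') and name[5:].isdigit() and value is not None:
            return value
    return None


if __name__ == '__main__':
    pattern = build_pattern()
    for sample in ('foo@bar', '<foo@bar>'):
        print(get_email(pattern.search(sample)))
```

    The caller still sees a single logical 'email' value, but the lookup
    logic has to live outside the regex, which is the inconvenience the
    question is about.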

    > >>> s1 = 'foo@bare'
    > >>> s2 = '<foo@bare>'
    > >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
    > >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
    > 'foo@bare'
    > >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
    > >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
    > 'foo@bare'
    >
    > - Paddy.
    aspineux, Mar 30, 2007
    #5
  6. aspineux

    Paddy Guest

    On Mar 30, 1:44 pm, "aspineux" <> wrote:
    > On 30 mar, 00:13, "Paddy" <> wrote:
    >
    > > On Mar 29, 3:22 pm, "aspineux" <> wrote:
    > >
    > > > I want to parse
    > > >
    > > > 'foo@bar' or '<foo@bar>' and get the email address foo@bar
    > > >
    > > > the regex is
    > > >
    > > > r'<\w+@\w+>|\w+@\w+'
    > > >
    > > > now, I want to give it a name
    > > >
    > > > r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
    > > >
    > > > sre_constants.error: redefinition of group name 'email' as group 2;
    > > > was group 1
    > > >
    > > > BUT because I use a | , I will get only one group named 'email' !
    > > >
    > > > Any comment ?
    > > >
    > > > PS: I know the solution for this case is to use
    > > > r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
    > >
    > > use two group names, one for each alternate form and if you are not
    > > concerned with whichever matched do something like the following:
    >
    > The problem is the way I create this regex :)
    >
    > regex={}
    > regex['email']=r'(?P<email1>\w+@\w+)'
    >
    > path=r'<%(email)s>|%(email)s' % regex
    >
    > Once more, the original question is:
    > Is it normal to get an error when the same id is used on both sides
    > of a | ?
    >
    > > >>> s1 = 'foo@bare'
    > > >>> s2 = '<foo@bare>'
    > > >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
    > > >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
    > > 'foo@bare'
    > > >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
    > > >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
    > > 'foo@bare'
    > >
    > > - Paddy.


    Groups are numbered left-to-right irrespective of the expression
    contents. I am quite happy with the names being merely a pseudonym for
    the positional group number, and I don't see a problem with not allowing
    multiple occurrences of the same group name.

    I did see an article about REs and their speed. It seems that if Python's
    RE package distinguished between 'grep style' REs and the full set of
    Python REs, then there are much faster and more efficient algorithms
    available for the grep-style subset.
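
    The point about names being aliases for positional numbers can be
    checked with the groupindex attribute of a compiled pattern. A small
    sketch follows; the variable names are illustrative only.

```python
import re

pattern = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')

# groupindex maps each group name to its positional number,
# assigned left to right regardless of the alternation.
name_to_number = dict(pattern.groupindex)

match = pattern.search('foo@bar')
# Only the second alternative matched, so group 1 is None and group 2 is set.
by_number = (match.group(1), match.group(2))
by_name = (match.group('email1'), match.group('email2'))

if __name__ == '__main__':
    print(name_to_number)
    print(by_number == by_name)
```

    Looking a group up by name and by number gives the same result, which
    is why a single name cannot stand for two different positions.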

    - Paddy.
    Paddy, Mar 30, 2007
    #6