make RE more clever to avoid inappropriate: sre_constants.error: redefinition of group name

aspineux · Mar 29, 2007

I want to parse

'foo@bar' or '<foo@bar>' and get the email address foo@bar

the regex is

r'<\w+@\w+>|\w+@\w+'

now, I want to give it a name

r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'

sre_constants.error: redefinition of group name 'email' as group 2; was group 1

BUT because I use a |, I will only ever get one group named 'email'!

Any comments?

PS: I know the solution for this case is to use
r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
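
For readers who want to reproduce this, here is a minimal sketch, assuming a current Python 3 where the exception is exposed as re.error. It shows the compile-time failure and then the conditional pattern from the PS. The helper name extract_email is hypothetical, and fullmatch is used here so that lopsided brackets are rejected rather than partially matched.

```python
import re

# The alternation that reuses the name 'email' is rejected when compiling.
compile_failed = False
try:
    re.compile(r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)')
except re.error as exc:
    compile_failed = True
    print('compile failed:', exc)

# Conditional pattern from the PS: '>' is required only if '<' was matched.
pattern = re.compile(r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)')


def extract_email(text):
    """Return the address, or None if text is not 'a@b' or '<a@b>'."""
    m = pattern.fullmatch(text)
    return m.group('email') if m else None


for sample in ('foo@bar', '<foo@bar>', '<foo@bar', 'foo@bar>'):
    print(repr(sample), '->', extract_email(sample))
```

With search instead of fullmatch, the unbalanced inputs would still yield an address, because the engine can skip the stray bracket.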

attn.steven.kuo · Mar 29, 2007

Regular expressions, alternation, named groups ... oh my!

It tends to get quite complex, especially if you need
to reject cases where the string contains a left bracket
and not the right, or vice versa.
...     matched = pattern.search(email)
...     if matched is not None:
...         print matched.group('email')
...
foo@bar
<foo@bar>

I suggest you try some other solution (maybe pyparsing).

aspineux · Mar 29, 2007

I want to parse

'foo@bar' or '<foo@bar>' and get the email address foo@bar

the regex is

r'<\w+@\w+>|\w+@\w+'

now, if I want to give it a name

r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'

sre_constants.error: redefinition of group name 'email' as group 2; was group 1

BUT because I use a |, I will only ever get one group named 'email'!

THEN my regex is meaningful, the error is meaningless, and
something should be changed in 're'.

But maybe I'm wrong?

Any comments?

I'm trying to start a discussion about something that can be improved
in 're', not looking for a solution to email parsing.

Paddy · Mar 29, 2007

Use two group names, one for each alternate form, and if you are not
concerned with which one matched, do something like the following:

>>> import re
>>> s1 = 'foo@bare'
>>> s2 = '<foo@bare>'
>>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
>>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
'foo@bare'
>>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
>>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
'foo@bare'
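
The same idea can be wrapped in a small function. This is only a sketch, assuming Python 3; the names PATTERN and find_email are made up for illustration. Instead of chaining with `or`, it relies on the match object's lastgroup attribute, which holds the name of the last group that took part in the match. Since only one branch of the alternation can match, that is the group holding the address.

```python
import re

# Two distinct names, one per branch of the alternation.
PATTERN = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')


def find_email(text):
    """Return the first address found in text, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Only one branch matched, so lastgroup is 'email1' or 'email2'.
    return m.group(m.lastgroup)


print(find_email('foo@bare'))
print(find_email('<foo@bare>'))
```

Unlike the `or` version, this does not depend on the captured text being truthy, though with \w+@\w+ that can never be an issue anyway.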

- Paddy.

aspineux · Mar 30, 2007

The problem is the way I create this regex:

regex = {}
regex['email'] = r'(?P<email1>\w+@\w+)'

path = r'<%(email)s>|%(email)s' % regex

Once more, the original question is:
is it normal to get an error when the same id is used on both sides of a |?
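
One possible way around this when the pattern is assembled from reusable pieces, offered only as a sketch (Python 3 assumed): keep the fragment unnamed and attach a numbered name each time it is used. The helpers numbered and get_named are hypothetical, not part of 're'.

```python
import itertools
import re

EMAIL = r'\w+@\w+'  # reusable fragment, deliberately left unnamed


def numbered(name, fragment, counter):
    """Wrap fragment in a named group such as email1, email2, ..."""
    return '(?P<%s%d>%s)' % (name, next(counter), fragment)


counter = itertools.count(1)
pattern = re.compile('<%s>|%s' % (numbered('email', EMAIL, counter),
                                   numbered('email', EMAIL, counter)))


def get_named(match, name):
    """Return the first non-None group called name followed by digits."""
    wanted = re.compile(re.escape(name) + r'\d+')
    for key, value in sorted(match.groupdict().items()):
        if wanted.fullmatch(key) and value is not None:
            return value
    return None


print(pattern.pattern)
print(get_named(pattern.search('<foo@bar>'), 'email'))
```

This does not answer whether 're' should accept the duplicate name; it only keeps the template approach usable with the current behaviour.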

Paddy · Mar 30, 2007

Groups are numbered left-to-right irrespective of the expression
contents. I am quite happy with the names being merely a pseudonym for
the positional group number and don't see a problem with not allowing
multiple occurrences of the same group name.

I did see some article about REs and their speed. It seems that if
Python's RE package distinguished between 'grep style' REs and the full
set of Python REs, then there are much faster and more efficient
algorithms available for the grep style subset.
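
To illustrate the point about names being aliases for positional numbers, here is a short sketch (Python 3 assumed) using the compiled pattern's groupindex attribute, which maps each name to its group number. The variable names are illustrative only.

```python
import re

pattern = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')

# groupindex maps each group name to its left-to-right position.
name_to_number = dict(pattern.groupindex)

match = pattern.search('foo@bare')
by_name = match.group('email2')
by_number = match.group(name_to_number['email2'])
unmatched = match.group('email1')  # this branch was not taken

print(name_to_number)
print(by_name, by_number, unmatched)
```

Because each name resolves to exactly one number, a name that appeared twice would have to point at two different groups, which is where the redefinition error comes from.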

- Paddy.
