Bug? Concatenating a number to a backreference: re.sub(r'(zzz:)xxx', r'\1'+str(4444), somevar)

abdulet

Is this normal? I want to concatenate a number to a
backreference in a regular expression. I'm working on a multiprocess
script, so at first I thought the error was in the multiprocessing
logic. But what a surprise! After some time debugging I arrived at
this:

import re
aa = "zzz:xxx"
re.sub(r'(zzz:).*',r'\1'+str(3333),aa)
'[33'

?!?! Well, let's put a : after the backreference:

aa = "zzz:xxx"
re.sub(r'(zzz).*',r'\1:'+str(3333),aa)
'zzz:3333'

Now it's the expected result. So should I expect Python to
concatenate the string to the backreference before substituting the
backreference, or is this a bug?

tested on:
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32
Python 2.5.2 (r252:60911, Jan 4 2009, 17:40:26) [GCC 4.3.2] on linux2

with the same result

Cheers!
 
Peter Otten

abdulet said:
import re
aa = "zzz:xxx"
re.sub(r'(zzz:).*',r'\1'+str(3333),aa)
'[33'

If you perform the addition you get r"\13333". How should the regular
expression engine interpret that? As a backreference to group 1, 13, ...
or 13333? It picks something completely different, "[33", because "\133" is
the octal escape sequence for "[":

"\133"
'['
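
To make the parsing visible step by step, here is a small sketch (the variable names template and result are only for illustration, not from the original post):

```python
import re

text = "zzz:xxx"

# Concatenation happens before re.sub() ever sees the string,
# so the replacement template is the six characters \13333.
template = r'\1' + str(3333)
print(template)                 # \13333

# The template parser reads \133 as an octal escape (0o133 == 91, "["),
# then copies the remaining "33" literally.
print(chr(0o133))               # [
result = re.sub(r'(zzz:).*', template, text)
print(result)                   # [33
```
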

You can avoid the ambiguity with

extra = str(number)
extra = re.escape(extra)
re.sub(expr, r"\g<1>" + extra, text)

The re.escape() step is not necessary here, but a good idea in the general
case when extra is an arbitrary string.
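
A runnable sketch of the \g<1> form applied to the example from the question. One caveat, as far as I can tell: re.escape() is designed for patterns, and in a replacement string it can leave stray backslashes in the output for characters such as ".". So the second half shows a function replacement as one possible alternative for arbitrary text. The value of extra there is a made-up awkward case.

```python
import re

text = "zzz:xxx"
number = 3333

# \g<1> names group 1 explicitly, so the digits that follow stay literal.
fixed = re.sub(r'(zzz:).*', r'\g<1>' + str(number), text)
print(fixed)                    # zzz:3333

# For arbitrary text, a function replacement skips template parsing:
# whatever the function returns is inserted as is.
extra = r'a.b\1'                # hypothetical awkward value
safe = re.sub(r'(zzz:).*', lambda m: m.group(1) + extra, text)
print(safe)                     # zzz:a.b\1
```
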

Peter
 
abdulet

Aha! Nice, thanks. I hadn't seen that part of the re module
documentation, and it was right in front of my eyes :(( As always it's
something silly, hehe. Thanks again, and yes, it's a good idea to escape
the variable ;)

cheers
 
