trouble with regex with escaped metachars (URGENT please O:-)

Fernando Rodriguez

Hi,

I have a file whose contents look like this:

Compression=bzip/9
OutputBaseFilename=$<OutputFileName>
OutputDir=$<OutputDir>
LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt

The tokens $<...> must be substituted by some user-provided string. The
problem is that those user-provided strings might contain metacharacters, so I
escape them. And that's where I get into trouble.

Here's the code I'm using:

def substitute(name, value, cts):
    """
    Finds all the occs in cts of $<name>
    and replaces them with value
    """

    pat = re.compile("\$<" + name + ">", re.IGNORECASE)

    return pat.sub(val, cts) # this line causes the error (see below)

def escapeMetachars( s ):
    """
    All metacharacters in the user provided substitution must
    be escaped
    """
    meta = r'\.^$+*?{[|()'
    esc = ''

    for c in s:
        if c in meta:
            esc += '\\' + c
        else:
            esc += c

    return esc

cts = """Compression=bzip/9
OutputBaseFilename=$<OutputFileName>
OutputDir=$<OutputDir>
LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt"""

name = 'OutputDir'
value = "c:\\apps\\whatever\\" # contains the backslash metachar

print substitute( escapeMetachars(name), value, cts)

I get this error:
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in -toplevel-
    pat.sub(s,cts)
  File "C:\ARCHIV~1\python23\Lib\sre.py", line 257, in _subx
    template = _compile_repl(template, pattern)
  File "C:\ARCHIV~1\python23\Lib\sre.py", line 244, in _compile_repl
    raise error, v # invalid expression
error: bogus escape (end of line)

What on earth is this? O:)

PS: I can't use string.replace() for the substitution, because it must be
case-insensitive: the user might enter OUTPUTDIR, and it should still work.
 
anton muhin

The following works:

import re

cts = """Compression=bzip/9
OutputBaseFilename=$<OutputFileName>
OutputDir=$<OutputDir>
LicenseFile=Z:\\apps\\easyjob\\main\\I18N\\US\\res\\license.txt"""

name = 'OutputDir'

pat = re.compile("\$<" + name + ">", re.IGNORECASE)

value = "c:\\apps\\whatever\\"

def escape(s):
    return s.replace('\\', '\\\\')

print pat.sub(escape(value), cts)

Note that you should double the \ in cts too (at least my snake prints some
garbage otherwise).
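
The "garbage" can be seen in a small sketch (written for Python 3; the variable names are only illustrative). In an ordinary string literal, \a is read as the BEL control character, while a raw literal or a doubled backslash keeps the backslash as text:

```python
# "\a" in an ordinary literal is the BEL control character (0x07),
# so the backslash and the "a" collapse into a single character.
plain = "Z:\apps"

# A raw literal keeps the backslash as it is written.
raw = r"Z:\apps"

# Doubling the backslash in an ordinary literal gives the same text.
doubled = "Z:\\apps"

print(len(plain), len(raw), len(doubled))
print(raw == doubled)
```

A raw string cannot end in a single backslash, so doubling is still needed for a value such as the OutputDir one in this thread.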

regards,
anton.
 
Roel Mathys

It's the value of "value" that gives trouble (ending with a "bogus" \
followed by an (invisible) end-of-line).
This little patch appears to do the trick:

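def substitute(name, value, cts):
    pat = re.compile("\$<" + name + ">", re.IGNORECASE)
    # split off a trailing backslash so the template does not end in one
    if value[-1:] == '\\' :
        value , suffix = value[:-1] , '\\'
    else :
        suffix = ''
    # value is already trimmed above, so it is not sliced again here;
    # note that suffix lands at the end of the whole text
    return pat.sub(value, cts) + suffix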

can't explain it though :)

you could try this as well:

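def substitute2( name , value , cts ) :
    ucts = cts.upper()
    uname = name.upper()
    parts = ucts.split( r'$<' + uname + '>' )
    # handles exactly one occurrence of the token
    if len( parts ) != 2 :
        raise ValueError( 'Something' )
    head = cts[:len(parts[0])]
    # count from the front so an empty tail stays empty
    tail = cts[len(cts) - len(parts[1]):]
    return value.join( [ head , tail ] )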


bye,
rm
 
Duncan Booth

Here's the code I'm using:

def substitute(name, value, cts):
    """
    Finds all the occs in cts of $<name>
    and replaces them with value
    """

    pat = re.compile("\$<" + name + ">", re.IGNORECASE)

    return pat.sub(val, cts) # this line causes the error (see below)

def escapeMetachars( s ):
    """
    All metacharacters in the user provided substitution must
    be escaped
    """
    meta = r'\.^$+*?{[|()'
    esc = ''

    for c in s:
        if c in meta:
            esc += '\\' + c
        else:
            esc += c

    return esc

cts = """Compression=bzip/9
OutputBaseFilename=$<OutputFileName>
OutputDir=$<OutputDir>
LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt"""

You forgot to double some backslashes here, not that this is relevant to
your problem.
name = 'OutputDir'
value = "c:\\apps\\whatever\\" # contains the backslash metachar

print substitute( escapeMetachars(name), value, cts)

I get this error:
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in -toplevel-
    pat.sub(s,cts)

Strangely, this line doesn't appear in the code you said you were using.

  File "C:\ARCHIV~1\python23\Lib\sre.py", line 257, in _subx
    template = _compile_repl(template, pattern)
  File "C:\ARCHIV~1\python23\Lib\sre.py", line 244, in _compile_repl
    raise error, v # invalid expression
error: bogus escape (end of line)

What on earth is this? O:)

Whatever it is, it's not what I get when I copy and paste your code.
I get a "NameError: global name 'val' is not defined" when I use the code
you posted. So, I deduce that you modified the code before posting it and
all bets are off.

However, it would appear that your main problem is that pat.sub will try to
unescape any backslashes in the replacement string, so you want to double
them all before using.
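
A minimal sketch of that behaviour, written for Python 3 (where the exception is re.error and the message wording differs from the 2.3 traceback). The short value "out\\" and the one-line text are made up for illustration:

```python
import re

pat = re.compile(r"\$<OutputDir>", re.IGNORECASE)
text = "OutputDir=$<OUTPUTDIR>"
value = "out\\"   # four characters: o, u, t, backslash

# A replacement *string* is parsed as a template, so the lone trailing
# backslash is an incomplete escape and the re module raises an error.
try:
    pat.sub(value, text)
    failed = False
except re.error:
    failed = True

# Doubling every backslash makes the template yield literal backslashes.
fixed = pat.sub(value.replace("\\", "\\\\"), text)

print(failed)
print(fixed)
```

The doubling only concerns the replacement template; the pattern side is a separate matter.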

Another minor point is that your escapeMetachars looks to be a poor-man's
version of re.escape, so you might prefer to use re.escape instead.
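
As a rough illustration of why the name needs escaping at all (Python 3 syntax; the name "Output.Dir" and the sample text are invented for the example), an unescaped dot matches any character, while re.escape makes it literal:

```python
import re

name = "Output.Dir"   # hypothetical name containing a metacharacter
text = "$<OutputXDir> $<output.dir>"

# Without escaping, "." matches any character, so both tokens match.
unescaped = re.compile(r"\$<" + name + ">", re.IGNORECASE)

# With re.escape, the dot only matches a literal dot.
escaped = re.compile(r"\$<" + re.escape(name) + ">", re.IGNORECASE)

print(unescaped.findall(text))
print(escaped.findall(text))
```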

Putting that all together gives:

import re

def substitute(name, value, cts):
    """
    Finds all the occs in cts of $<name>
    and replaces them with value
    """
    # double the backslashes so pat.sub leaves them alone
    value = value.replace('\\', '\\\\')
    name = re.escape(name)
    pat = re.compile("\$<" + name + ">", re.IGNORECASE)

    return pat.sub(value, cts)

The result of the substitution is:

Compression=bzip/9
OutputBaseFilename=$<OutputFileName>
OutputDir=c:\apps\whatever\
LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt


That will work fine so long as (a) you only have a few strings to
substitute, and (b) none of the replacements contain sequences looking like
substitution strings.
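
Caveat (b) can be shown with a short sketch (Python 3; the names First and Second and the sample text are hypothetical). When placeholders are replaced one call at a time, a value that happens to contain something shaped like a placeholder gets expanded by a later call:

```python
import re

def substitute(name, value, cts):
    # One placeholder per call, as in the single-name approach above.
    pat = re.compile(r"\$<" + re.escape(name) + ">", re.IGNORECASE)
    return pat.sub(value.replace("\\", "\\\\"), cts)

cts = "A=$<First>\nB=$<Second>"

# The first value happens to contain text that looks like a placeholder.
step1 = substitute("First", "literal $<Second>", cts)
step2 = substitute("Second", "two", step1)

print(step2)
```

The A line ends up as "literal two" rather than keeping the text the user typed, which is the situation a single-pass substitution avoids.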

If you have more strings to substitute you might try making a dict of
replacements and applying them all at the same time:

def substitute(mapping, cts):
    """
    Finds all the occs in cts of $<name>
    and replaces them with value
    """
    pat = re.compile("\$<([^>]+)>", re.IGNORECASE)

    def repl(match):
        # mapping keys are expected in lower case
        key = match.group(1).lower()
        return mapping[key]

    return pat.sub(repl, cts)

Compression=bzip/9
OutputBaseFilename=C:\output.txt
OutputDir=c:\apps\whatever\
LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt
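
For reference, here is a self-contained sketch of the dict-based idea in Python 3. The function name substitute_all, the mapping contents (inferred from the output above) and the choice to leave unknown placeholders untouched are all assumptions, not part of the original post:

```python
import re

def substitute_all(mapping, cts):
    """Replace every $<name> in cts using a dict with lower-case keys."""
    pat = re.compile(r"\$<([^>]+)>")

    def repl(match):
        key = match.group(1).lower()
        # Assumption: unknown placeholders are left untouched.
        return mapping.get(key, match.group(0))

    # A function's return value is inserted verbatim, so the
    # backslashes in the values need no doubling here.
    return pat.sub(repl, cts)

cts = r"""Compression=bzip/9
OutputBaseFilename=$<OutputFileName>
OutputDir=$<OUTPUTDIR>
LicenseFile=$<NoSuchKey>"""

mapping = {
    "outputfilename": "C:\\output.txt",
    "outputdir": "c:\\apps\\whatever\\",
}

result = substitute_all(mapping, cts)
print(result)
```

Because the text is scanned only once, replacement values are never rescanned for further placeholders.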
 
Fernando Rodriguez

Roel Mathys said:
it's the value of "value" that gives trouble (ending with a "bogus" \
followed by an (invisible) end-of-line).

But the final backslash is escaped, it shouldn't be a problem... :-(
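
A small sketch of the two layers of escaping involved (Python 3; the variable name template is illustrative). The doubling in the source literal is consumed by the Python parser, so the string that reaches the re module ends in a single backslash:

```python
value = "c:\\apps\\whatever\\"

# The Python parser has already collapsed each "\\" into one backslash.
print(len(value))           # 17 characters
print(value.count("\\"))    # 3 backslashes
print(value.endswith("\\"))

# For use as a replacement template, each backslash is doubled again.
template = value.replace("\\", "\\\\")
print(template.count("\\"))
```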
 
Roel Mathys

Fernando said:
But the final backslash is escaped, it shouldn't be a problem... :-(

like I said, don't know
even stranger, with this value it does work

=> value = "c:\\apps\\whatever\\\\"

but like I showed in the previous post, you don't really need a regex
for this.

eg:

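def substitute2( name , value , cts ) :
    ucts = cts.upper()
    uname = name.upper()
    parts = ucts.split( r'$<' + uname + '>' )
    # handles exactly one occurrence of the token
    if len( parts ) != 2 :
        raise ValueError( 'Something' )
    head = cts[:len(parts[0])]
    # count from the front so an empty tail stays empty
    tail = cts[len(cts) - len(parts[1]):]
    return value.join( [ head , tail ] )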

will do the job just as fine
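
If the token can occur more than once, the same regex-free idea can be extended with a find loop. This is only a sketch in Python 3; the function name substitute_no_regex is made up, and it assumes lower() does not change the length of the text, which holds for ASCII input like the file in this thread:

```python
def substitute_no_regex(name, value, cts):
    """Case-insensitively replace every $<name> in cts with value."""
    token = "$<" + name.lower() + ">"
    # Assumption: lower() keeps the text the same length (true for ASCII),
    # so positions found in the lowered copy are valid in the original.
    lowered = cts.lower()
    pieces = []
    pos = 0
    while True:
        hit = lowered.find(token, pos)
        if hit == -1:
            pieces.append(cts[pos:])
            break
        pieces.append(cts[pos:hit])
        pieces.append(value)
        pos = hit + len(token)
    return "".join(pieces)

result = substitute_no_regex(
    "OutputDir", "c:\\apps\\whatever\\", "A=$<OUTPUTDIR>;B=$<outputdir>"
)
print(result)
```

Since no regex template is involved, the backslashes in value need no extra escaping.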

bye,
rm
 
