Replace various regex

Martin · Feb 12, 2010

Hi,

I am trying to come up with a more generic scheme to match and replace
the values in a series of lines, which look something like this...

19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
5.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)

Ideally I would match the tag to the right of the "!" sign (e.g. lai)
and then replace one or all of the corresponding numbers on that line.
So far I have a rather unsatisfactory solution; any suggestions would
be appreciated...

The file read in is an ascii file.

import re

f = open(fname, 'r')
s = f.read()
f.close()

if CANHT:
    # escape the dots so they match a literal '.', not any character
    s = re.sub(r"\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+ ! canht_ft",
               CANHT, s)

where CANHT might be

CANHT = '115.01,16.38,0.79,1.26,1.00 ! canht_ft'

But this means I have to pass in the entire replacement line.

Thanks.

Martin

McColgst · Feb 12, 2010

If I understand correctly, there are a couple of ways to do it.
One is to use .split() and split on the '!' sign, given that you won't
have more than one '!' on a line. This returns a list of the pieces on
either side of the delimiter, in this case '!', so you should get back
['19.01,16.38,0.79,1.26,1.00 ', ' canht_ft(1:npft)'] and you can do
whatever replacements you want using the list.

check out split: http://docs.python.org/library/stdtypes.html#str.split

Another option is to match the first or second part of the line in your
regular expression by specifying where the '!' is. If you want to match
the part after the '!' I would do something like r"[^! canht_ft]", or
something similar (I'm not particularly up to date with my regex
syntax, but I think you get the idea).

I hope I understood correctly, and I hope that helps.

-sean

Martin · Feb 12, 2010

Hi, I like the second suggestion. So this wouldn't rely on me having to
match the numbers, only the string canht_ft for example, but would
still allow me to replace the whole line; is that what you mean?

I tried it and the expression seemed to replace the entire file, so
perhaps I am doing something wrong. But in principle I think that
might be a better scheme than my current one. I tried

if CANHT:
    #s = re.sub(r"\d+.\d+,\d+.\d+,\d+.\d+,\d+.\d+,\d+.\d+ ! canht_ft", CANHT, s)
    s = re.sub(r"[^! canht_ft]", CANHT, s)

MRAB · Feb 12, 2010

McColgst said:
If I understand correctly, there are a couple of ways to do it.
One is to use .split() and split on the '!' sign, given that you won't
have more than one '!' on a line. This returns a list of the pieces on
either side of the delimiter, in this case '!', so you should get back
['19.01,16.38,0.79,1.26,1.00 ', ' canht_ft(1:npft)'] and you can do
whatever replacements you want using the list.

check out split: http://docs.python.org/library/stdtypes.html#str.split

The .split method is the best way if you process the file a line at a
time. The .split method, incidentally, accepts a maxsplit argument so
that you can split a line no more than once.
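
As a rough sketch of the line-at-a-time approach, the function below splits one line into its values and its comment. The name split_param_line is made up for this example, and the sketch assumes every line passed in contains at least one '!':

```python
def split_param_line(line):
    """Split 'values ! comment' into (list of value strings, comment)."""
    # maxsplit=1: any further '!' stays inside the comment part
    values_part, comment = line.split("!", 1)
    values = [v.strip() for v in values_part.split(",")]
    return values, comment.strip()

values, comment = split_param_line("5.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)")
print(values)
print(comment)
```

A line without any '!' would raise ValueError at the unpacking step, so such lines need to be filtered out or handled separately.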

McColgst said:
Another option is to match the first or second part of the line in your
regular expression by specifying where the '!' is. If you want to match
the part after the '!' I would do something like r"[^! canht_ft]", or
something similar (I'm not particularly up to date with my regex
syntax, but I think you get the idea).

The regex would be r"(?m)^[^!]*(!.*)" to capture the '!' and the rest of
the line.
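
If the whole file is held in one string, the same idea can be narrowed to a single tag so that only the matching line is touched. The sketch below is one possible variation, not the pattern given above: the function name replace_tagged_line and the sample text are assumptions, and [^!\n] is used instead of [^!] because a negated class also matches newlines and could otherwise swallow preceding lines that contain no '!'.

```python
import re

def replace_tagged_line(text, tag, new_values):
    """Replace everything left of '!' on the line whose comment starts with tag."""
    pattern = re.compile(
        r"^[^!\n]*(!\s*" + re.escape(tag) + r"\b.*)$", re.MULTILINE
    )
    # a function as replacement avoids surprises with backslashes in new_values
    return pattern.sub(lambda m: new_values + " " + m.group(1), text)

text = "0.46 ! snow_grnd\n5.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)\n"
new_text = replace_tagged_line(text, "lai", "200.0, 4.0, 2.0, 4.0, 1.0")
print(new_text)
```

Only the lai line changes here; the snow_grnd line is left alone because the pattern requires the tag directly after the '!'.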

Martin · Feb 12, 2010

I guess I could read the file a line at a time and try splitting it,
though I thought it would be better to read it all at once and then
search for the various patterns I need to match and replace?

I am not sure that regex helps, as it would match and replace every
line that has a "!". Perhaps I should explain more thoroughly.

So the input file looks something like this...

9*0.0 ! canopy(1:ntiles)
12.100 ! cs
0.0 ! gs
9*50.0 ! rgrain(1:ntiles)
0.749, 0.743, 0.754, 0.759 ! stheta(1:sm_levels)(top to bottom)
9*0.46 ! snow_tile(1:ntiles)
0.46 ! snow_grnd
276.78,277.46,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)
9*276.78 ! tstar_tile(1:ntiles)
19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
200.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)

So for each of the strings following the "!" I may potentially want to
match them and replace some of the numbers. That is, I might search for
the tag snow_grnd with the intention of replacing 0.46 with another
number. What I came up with was a way to match all the numbers and
pass in the replacement string.

Jean-Michel Pichavant · Feb 15, 2010

I removed all lines containing things like 9*0.0 from your file, because
I don't know what they mean or how to handle them. These are not numbers.
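
On the 9*0.0 entries: if the file is read with Fortran list-directed or namelist input (an assumption, since the thread never says which program consumes it), the form r*c is a repeat count meaning r copies of the value c. A minimal sketch that expands such entries, with expand_repeats as a made-up helper name:

```python
def expand_repeats(values_part):
    """Expand Fortran-style 'r*c' repeat counts into individual value strings."""
    expanded = []
    for item in values_part.split(","):
        item = item.strip()
        if not item:
            continue
        if "*" in item:
            # '9*0.46' means nine copies of '0.46'
            count, value = item.split("*", 1)
            expanded.extend([value.strip()] * int(count))
        else:
            expanded.append(item)
    return expanded

print(expand_repeats("9*0.46"))
print(expand_repeats("0.749, 0.743"))
```

Expanding first would let a position-based replacement treat these lines like any other list of numbers, at the cost of writing them back in the longer form.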

import re

replace = {
    'snow_grnd': (1, '99.99,'),  # replace the 1st number by 99.99
    't_soil': (2, '88.8,'),      # replace the 2nd number by 88.8
}

testBuffer = """
0.749, 0.743, 0.754, 0.759 ! stheta(1:sm_levels)(top to bottom)
0.46 ! snow_grnd
276.78,277.46,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)
19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
200.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)
"""

outputBuffer = ''
for line in testBuffer.split('\n'):
    for key, (index, repl) in replace.items():
        if key in line:
            parameters = {
                # given your example you have to change this one,
                # I don't know what 9*0.0 means in your file
                'n': r'[\d\.]+',
                'index': index - 1,
            }
            # the following pattern silently matches the numbers before the
            # <index>th number, and uses a capturing parenthesis for that one
            pattern = (r'(\s*(?:(?:%(n)s)[,\s]+){0,%(index)s})'
                       r'(?:(%(n)s)[,\s]+)(.*!.*)' % parameters)
            # regexps are sometimes a nightmare to read
            line = re.sub(pattern, r'\1 ' + repl + r'\3', line)
            break
    outputBuffer += line + '\n'

print(outputBuffer)
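
For comparison, replacing the n-th number does not strictly need a regex. The sketch below is an alternative under stated assumptions, not the approach above: replace_nth_value is a hypothetical name, the line is assumed to contain a '!' and plain comma-separated values, and the spacing between values is normalised to ", " on output.

```python
def replace_nth_value(line, index, new_value):
    """Return line with its index-th (1-based) comma-separated value replaced."""
    values_part, sep, comment = line.partition("!")
    values = [v.strip() for v in values_part.split(",")]
    values[index - 1] = new_value
    return ", ".join(values) + " " + sep + comment

line = "276.78,277.46,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)"
new_line = replace_nth_value(line, 2, "88.8")
print(new_line)
```

An out-of-range index raises IndexError here, which may be preferable to a regex that silently leaves the line unchanged.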

Martin · Feb 15, 2010

Thanks, I will take a look. I think perhaps I was having a very slow
day when I posted; I have since realised I could solve the original
problem more efficiently, and that the problem wasn't quite as I first
perceived it. It is enough to match the tag to the right of the "!"
sign and use this to adjust what lies to the left of it. Currently I
have this. If anyone thinks there is a neater solution I am happy to
hear it. Many thanks.

variable_tag = 'lai'
variable = [200.0, 60.030, 0.060, 0.030, 0.030]

# generate adjustment string
variable = ",".join(["%s" % i for i in variable]) + ' ! ' + variable_tag

# call func to adjust input file
adjustStandardPftParams(variable, variable_tag, in_param_fname,
                        out_param_fname)

and the inside of this func looks like this

import re

def adjustStandardPftParams(variable, variable_tag, in_fname, out_fname):

    f = open(in_fname, 'r')
    of = open(out_fname, 'w')
    pattern_found = False

    while True:
        line = f.readline()
        if not line:
            break
        # look for the tag in the comment to the right of the "!"
        pattern = re.findall(r"!\s+" + variable_tag, line)
        if pattern:
            print('yes')
            # write the replacement line instead of the original one
            of.write("%s\n" % variable)
            pattern_found = True

        if pattern_found:
            pattern_found = False
        else:
            of.write(line)

    f.close()
    of.close()

    return

Jean-Michel Pichavant · Feb 15, 2010

Are you sure a simple

    if variable_tag in line:
        # do some stuff

is not enough?

People will usually prefer to write

    for line in open(in_fname, 'r'):

instead of your ugly while loop ;-)

JM

Martin · Feb 15, 2010

My while loop is suitably offended. I have changed it as you
suggested, though if I use "if variable_tag in line" as you suggested,
in my example I would correctly pick up the tag lai but also one
called dcatch_lai, which I wouldn't want. No doubt there is an
obvious solution I am again missing!

of = open(out_fname, 'w')
pattern_found = False

for line in open(in_fname, 'r'):
    pattern = re.findall(r"!\s+" + variable_tag, line)
    if pattern:
        # write the replacement line instead of the original one
        of.write("%s\n" % variable)
        pattern_found = True

    if pattern_found:
        pattern_found = False
    else:
        of.write(line)

of.close()
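
One way to separate lai from dcatch_lai is to pull out the first word after the "!" and compare it for equality rather than searching for a substring. This is only a sketch: line_has_tag and TAG_RE are names invented here, and it assumes tags consist of letters, digits and underscores.

```python
import re

# capture the identifier that follows '!' (letters, digits, underscore)
TAG_RE = re.compile(r"!\s*(\w+)")

def line_has_tag(line, tag):
    """True only if the first word after '!' equals tag exactly."""
    match = TAG_RE.search(line)
    return match is not None and match.group(1) == tag

print(line_has_tag("200.0, 4.0 ! lai(1:npft)", "lai"))   # True
print(line_has_tag("0.5 ! dcatch_lai", "lai"))           # False
```

The pattern r"!\s+" + variable_tag used above already rejects dcatch_lai, because the tag has to follow the whitespace directly, but it would still accept a longer tag that merely starts with lai (a hypothetical lai_max, say); the equality test rules that out as well.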

Many Thanks.
