Replace various regex

Martin

Hi,

I am trying to come up with a more generic scheme to match and replace
values on a series of lines, which look something like this...

19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
5.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)

Ideally I would match the tag to the right of the "!" sign (e.g. lai)
and then be able to replace one or all of the corresponding numbers on
that line. So far I have a rather unsatisfactory solution; any
suggestions would be appreciated.

The file being read is an ASCII file.

import re

f = open(fname, 'r')
s = f.read()

if CANHT:
    s = re.sub(r"\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+\s*!\s*canht_ft",
               CANHT, s)

where CANHT might be

CANHT = '115.01,16.38,0.79,1.26,1.00 ! canht_ft'

But this involves me passing the entire string.

Thanks.

Martin
 
McColgst

If I understand correctly, there are a couple of ways to do it.
One is to use .split() and split on the '!' sign, given that you won't
have more than one '!' on a line. This returns a list of the pieces
on either side of the delimiter, so you should get back
['19.01,16.38,0.79,1.26,1.00 ', ' canht_ft(1:npft)'] (with the
surrounding whitespace still attached) and you can do whatever
replacements you want using the list.

check out split: http://docs.python.org/library/stdtypes.html#str.split
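
As a minimal sketch of the split approach (the variable names here are illustrative and not from the original post), one line can be taken apart, edited, and put back together like this:

```python
line = "19.01,16.38,0.79,1.26,1.00   !  canht_ft(1:npft)"

# Split once on '!': values on the left, tag/comment on the right.
values_part, tag_part = line.split("!", 1)

# Strip the padding that split leaves attached to each piece.
values = [v.strip() for v in values_part.split(",")]
tag = tag_part.strip()

# Replace the first value and rebuild the line with the original spacing.
values[0] = "115.01"
new_line = ",".join(values) + "   !  " + tag
```

This assumes exactly one '!' per line, as stated above; the spacing around '!' is hard-coded here only to mirror the sample line.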

Another is, in your regular expression, to match the first or second
part of the string by specifying where the '!' is. If you want to match
the part after the '!' I would do something like r"[^! canht_ft]", or
something similar (I'm not particularly up-to-date with my regex syntax,
but I think you get the idea).

I hope I understood correctly, and I hope that helps.

-sean
 
Martin

Hi, I like the second suggestion. So this wouldn't rely on me having to
match the numbers, only the string canht_ft for example, but would still
allow me to replace the whole line. Is that what you mean?

I tried it and the expression seemed to replace the entire file, so
perhaps I am doing something wrong. But in principle I think that
might be a better scheme than my current one. I tried

if CANHT:
    #s = re.sub(r"\d+.\d+,\d+.\d+,\d+.\d+,\d+.\d+,\d+.\d+ ! canht_ft", CANHT, s)
    s = re.sub(r"[^! canht_ft]", CANHT, s)
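
A likely explanation for the whole file being rewritten is that square brackets in a regex define a character class, so the expression above matches single characters rather than the text "! canht_ft". The small sketch below (sample data shortened, replacement string "X" chosen only for illustration) shows the effect:

```python
import re

s = "0.46 ! snow_grnd\n19.01,16.38 ! canht_ft(1:npft)\n"

# [^! canht_ft] is a negated character class: it matches any ONE character
# that is not '!', ' ', 'c', 'a', 'n', 'h', 't', '_' or 'f'.
hits = re.findall(r"[^! canht_ft]", s)

# re.sub replaces each of those single characters with the whole
# replacement string, so almost every character in the file is touched.
result = re.sub(r"[^! canht_ft]", "X", s)
```

Here every digit, comma, and newline is among the hits, which is why substituting a full line for each match mangles the entire file.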
 
MRAB

McColgst said:
If I understand correctly, there are a couple of ways to do it.
One is to use .split() and split by the '!' sign, given that you won't
have more than one '!' on a line. This will return a list of the words
split by the delimiter, in this case being '!', so you should get back
(19.01,16.38,0.79,1.26,1.00 , canht_ft(1:npft) ) and you can do
whatever replace functions you want using the list.

check out split: http://docs.python.org/library/stdtypes.html#str.split

The .split method is the best way if you process the file a line at a
time. The .split method, incidentally, accepts a maxsplit argument so
that you can split a line no more than once.

McColgst said:
Another, is in your regular expression, you can match the first part
or second part of the string by specifying where the '!' is,
if you want to match the part after the '!' I would do something like
r"[^! canht_ft]", or something similar (I'm not particularly up-to-date
with my regex syntax, but I think you get the idea.)

The regex would be r"(?m)^[^!]*(!.*)" to capture the '!' and the rest of
the line.
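
To illustrate, here is a short sketch of that regex on two sample lines, followed by one possible way of narrowing it to a single tag. The narrowed pattern and the variable names are assumptions added for illustration, not something posted in the thread.

```python
import re

text = (
    "0.46                         !  snow_grnd\n"
    "19.01,16.38,0.79,1.26,1.00   !  canht_ft(1:npft)\n"
)

# (?m) makes ^ match at the start of every line; [^!]* consumes the values,
# and the group captures '!' plus the rest of that line.
comments = re.findall(r"(?m)^[^!]*(!.*)", text)

# An adaptation restricted to one tag: replace the values on the canht_ft
# line only and keep the captured comment unchanged.
pattern = r"(?m)^[^!\n]*(!\s*canht_ft\b.*)$"
new_text = re.sub(pattern, r"115.01,16.38,0.79,1.26,1.00   \1", text)

# str.split with maxsplit=1 gives the same two pieces for a single line.
left, right = "0.46 ! snow_grnd".split("!", 1)
```

The unrestricted regex matches every line containing a '!', so some tag-specific piece like the one above is needed before it can be used for a targeted replacement.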
 
Martin

I guess I could read the file a line at a time and try splitting it,
though I thought it would be better to read it all at once and then
search for the various patterns I need to match and replace?

I am not sure that regex helps, as it would match and replace every
line which has a "!". Perhaps if I explain more thoroughly?

So the input file looks something like this...

9*0.0 ! canopy(1:ntiles)
12.100 ! cs
0.0 ! gs
9*50.0 ! rgrain(1:ntiles)
0.749, 0.743, 0.754, 0.759 ! stheta(1:sm_levels)(top to bottom)
9*0.46 ! snow_tile(1:ntiles)
0.46 ! snow_grnd
276.78,277.46,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)
9*276.78 ! tstar_tile(1:ntiles)
19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
200.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)

So for each of the strings following the "!" I may potentially want to
match them and replace some of the numbers. That is, I might search for
the expression snow_grnd with the intention of substituting another
number for 0.46. What I came up with was a way to match all the numbers
and pass the replacement string.
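
One way this could be made generic while still reading the file in one go is sketched below. The helper name replace_values and the shortened sample are hypothetical; the idea is to key the substitution on the tag after the "!" and rebuild only the left-hand side.

```python
import re


def replace_values(text, tag, new_values):
    """Replace everything left of '!' on the line whose tag is exactly `tag`.

    Hypothetical helper for illustration; not part of the original thread.
    """
    # The tag must directly follow "!" plus optional blanks and must not be
    # followed by another identifier character.
    pattern = re.compile(
        r"^[^!\n]*(![ \t]*" + re.escape(tag) + r"(?![A-Za-z0-9_]).*)$",
        re.MULTILINE,
    )
    values = ",".join(str(v) for v in new_values)
    # A function replacement avoids backslash surprises in `values`.
    return pattern.sub(lambda m: values + " " + m.group(1), text)


sample = (
    "12.100 ! cs\n"
    "0.46 ! snow_grnd\n"
    "9*0.46 ! snow_tile(1:ntiles)\n"
    "200.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)\n"
)

updated = replace_values(sample, "snow_grnd", [0.5])
updated = replace_values(updated, "lai", [5.0, 4.0, 2.0, 4.0, 1.0])
```

This sketch replaces the whole value list for a tag; replacing a single number by position would need the values split on commas first, and entries such as 9*0.46 would need separate handling.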
 
Jean-Michel Pichavant

I removed all lines containing things like 9*0.0 from your file, because I
don't know what they mean or how to handle them. These are not numbers.

import re

replace = {
    'snow_grnd': (1, '99.99,'),  # replace the 1st number by 99.99
    't_soil': (2, '88.8,'),      # replace the 2nd number by 88.8
}

testBuffer = """
0.749, 0.743, 0.754, 0.759 ! stheta(1:sm_levels)(top to bottom)
0.46 ! snow_grnd
276.78,277.46,278.99,282.48 ! t_soil(1:sm_levels)(top to bottom)
19.01,16.38,0.79,1.26,1.00 ! canht_ft(1:npft)
200.0, 4.0, 2.0, 4.0, 1.0 ! lai(1:npft)
"""

outputBuffer = ''
for line in testBuffer.split('\n'):
    for key, (index, repl) in replace.items():
        if key in line:
            parameters = {
                # given your example you have to change this one;
                # I don't know what 9*0.0 means in your file
                'n': r'[\d\.]+',
                'index': index - 1,
            }
            # the following pattern silently matches the numbers before the
            # <index>th one, and uses a capturing group for that last number
            # (regexps are sometimes a nightmare to read)
            pattern = (r'(\s*(?:(?:%(n)s)[,\s]+){0,%(index)s})'
                       r'(?:(%(n)s)[,\s]+)(.*!.*)' % parameters)
            line = re.sub(pattern, r'\1 ' + repl + r'\3', line)
            break
    outputBuffer += line + '\n'

print(outputBuffer)
 
Martin

Thanks, I will take a look. I think perhaps I was having a very slow
day when I posted; I have realised I could solve the original problem
more efficiently, and the problem wasn't quite as I first perceived. It
is enough to match the tag to the right of the "!" sign and use this to
adjust what lies on the left of the "!" sign. Currently I have
this. If anyone thinks there is a neater solution I am happy to hear
it. Many thanks.

variable_tag = 'lai'
variable = [200.0, 60.030, 0.060, 0.030, 0.030]

# generate adjustment string
variable = ",".join(["%s" % i for i in variable]) + ' ! ' + variable_tag

# call func to adjust input file
adjustStandardPftParams(variable, variable_tag, in_param_fname,
                        out_param_fname)

and the inside of this func looks like this

def adjustStandardPftParams(self, variable, variable_tag, in_fname,
                            out_fname):

    f = open(in_fname, 'r')
    of = open(out_fname, 'w')
    pattern_found = False

    while True:
        line = f.readline()
        if not line:
            break
        pattern = re.findall(r"!\s+" + variable_tag, line)
        if pattern:
            print('yes')
            # write the replacement line in place of the matched one
            of.write(variable + '\n')
            pattern_found = True

        if pattern_found:
            pattern_found = False
        else:
            of.write(line)

    f.close()
    of.close()

    return
 
Jean-Michel Pichavant

Are you sure a simple

if variable_tag in line:
    # do some stuff

is not enough?

People will usually prefer to write

for line in open(in_fname, 'r'):

instead of your ugly while loop ;-)


JM
 
Martin

My while loop is suitably offended. I have changed it as you
suggested. However, if I test "if variable_tag in line" as you
suggested, I would in my example correctly pick up the tag lai, but also
one called dcatch_lai, which I wouldn't want. No doubt there is an
obvious solution I am again missing!

of = open(out_fname, 'w')
pattern_found = False

for line in open(in_fname, 'r'):
    pattern = re.findall(r"!\s+" + variable_tag, line)
    if pattern:
        of.write(variable + '\n')
        pattern_found = True

    if pattern_found:
        pattern_found = False
    else:
        of.write(line)

of.close()

Many Thanks.
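
One possible tidy-up, offered as a sketch rather than a definitive answer: anchor the tag directly after the "!" and forbid a trailing identifier character, so that lai matches neither dcatch_lai nor a longer name that merely starts with lai. The function name adjust_param and its parameters are hypothetical.

```python
import re


def adjust_param(in_fname, out_fname, variable_tag, new_line):
    """Copy in_fname to out_fname, swapping the line tagged variable_tag.

    Hypothetical rewrite of the loop above; names are illustrative only.
    """
    # "!" + optional whitespace + the exact tag, not followed by another
    # letter, digit or underscore.
    tag_re = re.compile(
        r"!\s*" + re.escape(variable_tag) + r"(?![A-Za-z0-9_])"
    )
    # The with statement closes both files even if an error occurs.
    with open(in_fname, "r") as f, open(out_fname, "w") as of:
        for line in f:
            if tag_re.search(line):
                of.write(new_line + "\n")
            else:
                of.write(line)
```

Because the match decides directly which line gets written, the pattern_found flag is no longer needed, and re.escape guards against tags that contain regex metacharacters.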
 
