Regular expression help

nclbndk759

Hello,

I am new to Python, with a background in scientific computing. I'm
trying to write a script that will take a file with lines like

c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
3pv=0

extract the values of afrac and etot and plot them. I'm really
struggling with getting the values of afrac and etot. So far I have
come up with (small snippet of script just to get the energy, etot):

def get_data_points(filename):
    file = open(filename,'r')
    data_points = []
    while 1:
        line = file.readline()
        if not line: break
        energy = get_total_energy(line)
        data_points.append(energy)
    return data_points

def get_total_energy(line):
    rawstr = r"""(?P<key>.*?)=(?P<value>.*?)\s"""
    p = re.compile(rawstr)
    return p.match(line,5)

What is being stored in energy is '<_sre.SRE_Match object at
0x2a955e4ed0>', not '-11.020107'. Why? I've been struggling with
regular expressions for two days now, with no luck. Could someone
please put me out of my misery and give me a clue as to what's going
on? Apologies if it's blindingly obvious or if this question has been
asked and answered before.

Thanks,

Nicole
 
Brad

Why not just split them out instead of using REs?

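# Read the file and print every key=value pair found on each line.
with open("test.txt") as fp:
    lines = fp.readlines()

for line in lines:
    split = line.split()
    for pair in split:
        pair_split = pair.split("=")
        # Tokens without exactly one '=' (such as the leading 'c') are skipped.
        if len(pair_split) == 2:
            print(pair_split[0], "is", pair_split[1])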

Results:

afrac is .7
mmom is 0
sev is -9.56646
erep is 0
etot is -11.020107
emad is -3.597647
3pv is 0
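
To get from the printed pairs to numbers that can be plotted, one option is to build a dictionary per line and convert the two wanted values with float(). This is only a sketch, assuming every data line contains both afrac and etot; the helper name parse_line and the second sample line are made up for illustration and are not part of the original script:

```python
def parse_line(line):
    """Return a dict of key -> string value for every key=value token."""
    values = {}
    for pair in line.split():
        pair_split = pair.split("=")
        if len(pair_split) == 2:
            values[pair_split[0]] = pair_split[1]
    return values

# Sample lines in the format from the question (the second one is invented).
lines = [
    "c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0",
    "c afrac=.8 mmom=0 sev=-9.5 erep=0 etot=-11.5 emad=-3.6 3pv=0",
]

points = []
for line in lines:
    values = parse_line(line)
    # float() accepts '.7' and '-11.020107' exactly as written in the file.
    points.append((float(values["afrac"]), float(values["etot"])))

print(points)
```

The resulting list of (afrac, etot) tuples can then be handed to whatever plotting library you prefer.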
 
Gerard flanagan

1. Consider using the 'split' method on each line rather than regexes.
2. Your code compiles the regex for every line in the file. Lift it
out of the 'get_total_energy' function so that the compilation is
done only once.
3. 'match' returns a Match object, not a string. Its 'group' and
'groups' methods are what you need to retrieve the data, e.g.
group('value').
4. Also look at the findall method:

import re

data = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0 '

rx = re.compile(r'(\w+)=(\S+)')

data = dict(rx.findall(data))

print(data)
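
Points 2 and 3 can be sketched roughly as follows: the pattern is compiled once at module level, and the string is pulled out of the Match object with group(). Note that this sketch swaps the original match(line, 5) for search() with a pattern specific to etot, which is a substitution for illustration rather than the only way to do it, and it assumes that returning None for lines without an etot field is acceptable:

```python
import re

# Compiled once, outside the function, so it is not rebuilt for every line.
ETOT_RX = re.compile(r"\betot=(?P<value>\S+)")

def get_total_energy(line):
    """Return etot as a float, or None if the line has no etot field."""
    m = ETOT_RX.search(line)
    if m is None:
        return None
    # group('value') yields the captured string; the Match object itself does not.
    return float(m.group("value"))

line = "c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0"
print(get_total_energy(line))
print(get_total_energy("no energy here"))
```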

hth

G.
 
nclbndk759

I think you're over-complicating this. I'm assuming that you're going to
do a line graph of some sort, and each new line of the file contains a
new set of data.

The problem you mentioned with your regex returning a match object
rather than a string is because match() returns a Match object, not
the matched text. re.findall() returns the captured strings directly.
That being said, here is working code to mine data from your file.

Code:
import re

line = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0'

energypat = r'\betot=(-?\d*?[.]\d*)'

#Note: To change the data grabbed from the line, you can change the
#'etot' to 'afrac' or 'emad' or anything that doesn't contain a regex
#special character.

energypat = re.compile(energypat)

energypat.findall(line)  # returns a LIST of strings: ['-11.020107']

This returns a list of strings. Take the first element and convert it
with float() (not int(), since the values are decimals). After that,
you can data_points.append() to your heart's content. Good luck
with your work.
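
The caveat about regex special characters in the key can be sidestepped with re.escape(). The following is a sketch only: extract_value is a hypothetical helper, and its number pattern is deliberately looser than the one above (a change made here for illustration) so that integer values such as mmom=0 also match. It does not handle exponent notation.

```python
import re

def extract_value(key, line):
    """Return the value of key=... in line as a float, or None if absent."""
    # re.escape() makes keys containing special characters safe to embed.
    # The lookarounds require the key=value token to be whitespace-delimited.
    # The number part accepts forms like '0', '.7' and '-9.56646'.
    pattern = r"(?<!\S)" + re.escape(key) + r"=(-?(?:\d+\.?\d*|\.\d+))(?!\S)"
    found = re.findall(pattern, line)
    return float(found[0]) if found else None

line = "c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0"
print(extract_value("afrac", line))
print(extract_value("etot", line))
```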





Thanks guys :)
 
Marc 'BlackJack' Rintsch

values = {}
for expression in line.split(" "):
    if "=" in expression:
        name, val = expression.split("=")
        values[name] = val
[…]

And when you get to be a really hard-core Pythonista, you could write
the whole routine above in one line, but this seems clearer. ;-)

I know it's a matter of taste but I think the one liner is still clear
(enough)::

values = dict(s.split('=') for s in line.split() if '=' in s)
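
One caveat with the one-liner: dict() needs exactly two items per token, so a token containing two '=' signs raises ValueError. A sketch that tolerates this by splitting at the first '=' only; the sample line with note=a=b is invented for illustration and does not come from the original data:

```python
line = "c afrac=.7 etot=-11.020107 note=a=b"

# split('=', 1) splits at the first '=' only, so each token yields exactly two parts.
values = dict(s.split("=", 1) for s in line.split() if "=" in s)

print(values)
```

For the data format in the question, where each token has a single '=', both forms give the same dictionary.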

Ciao,
Marc 'BlackJack' Rintsch
 
