extract text from log file using re

Fabian Braennstroem · Sep 13, 2007

Hi,

I would like to delete a region on a log file which has this
kind of structure:

#------flutest------------------------------------------------------------
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

#------------------------------------------------------------------

I have a small script, which removes lines starting with
'(re)versed', '(i)teration' and '(t)urbulent' and put the
rest into an array:

# -- plot residuals ----------------------------------------
import re
filename="flutest"
reversed_flow=re.compile('^\ re')
turbulent_viscosity_ratio=re.compile('^\ tu')
iteration=re.compile('^\ \ i')

begin_of_res=re.compile('>\ \ \ i')
end_of_res=re.compile('^\ ad')

begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')

end_number=0
begin_number=0

n = 0
for line in open(filename).readlines():
n = n + 1
if begin_of_res.match(line):
begin_number=n+1
print "Line Number (begin): " + str(n)

if end_of_res.match(line):
end_number=n
print "Line Number (end): " + str(n)

if begin_of_writing.match(line):
begin_w=n+1
print "BeginWriting: " + str(n)
print "HALLO"

if end_of_writing.match(line):
end_w=n+1
print "EndWriting: " +str(n)

if n > end_number:
end_number=n
print "Line Number (end): " + str(end_number)

n = 0
array = []
array_dummy = []
array_mapped = []

mapped = []
mappe = []

n = 0
for line in open(filename).readlines():
n = n + 1
if (begin_number <= n) and (end_number > n):
# if (begin_w <= n) and (end_w > n):
if not reversed_flow.match(line) and not
iteration.match(line) and not
turbulent_viscosity_ratio.match(line):
m=(line.strip().split())
print m
if len(m) > 0:
# print len(m)
laenge_liste=len(m)
# print len(m)
mappe.append(m)

#--end plot
residuals-------------------------------------------------

This works fine ; except for the region with the writing
information:

#-----writing information
-----------------------------------------
Writing "/home/fb/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
# -------end writing information -------------------------------

Does anyone know, how I can this 'writing' stuff too? The
matchingIt occurs a lot :-(

Regards!
Fabian

Peter Otten · Sep 14, 2007

Fabian said:
I would like to delete a region on a log file which has this
kind of structure:

#------flutest------------------------------------------------------------
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

#------------------------------------------------------------------

I have a small script, which removes lines starting with
'(re)versed', '(i)teration' and '(t)urbulent' and put the
rest into an array:

# -- plot residuals ----------------------------------------
import re
filename="flutest"
reversed_flow=re.compile('^\ re')
turbulent_viscosity_ratio=re.compile('^\ tu')
iteration=re.compile('^\ \ i')

begin_of_res=re.compile('>\ \ \ i')
end_of_res=re.compile('^\ ad')

The following regular expressions have some extra backslashes
which change their meaning:

begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')

But I don't think you need regular expressions at all.
Also, it's better to iterate over the file just once because
you don't need to remember the position of regions to be skipped.
Here's a simplified demo:

def skip_region(items, start, end):
items = iter(items)
while 1:
for line in items:
if start(line):
break
yield line
else:
break
for line in items:
if end(line):
break
else:
break

def begin(line):
return line.strip() == "Writing"

def end(line):
return line.strip() == "Done."

# --- begin demo setup (remove to test with real data) ---
def open(filename):
from StringIO import StringIO
return StringIO("""\
iteration # to be ignored
alpha
beta
reversed # to be ignored
Writing
to
be
ignored
Done.
gamma
delta

""")
# --- end demo setup ---

if __name__ == "__main__":
filename = "fluetest"
for line in skip_region(open(filename), begin, end):
line = line.strip()
if line and not line.startswith(("reversed", "iteration")):
print line

skip_region() takes a file (or any iterable) and two functions
that test for the begin/end of the region to be skipped.
You can nest skip_region() calls if you have regions with different
start/end conditions.

Peter

Paul McGuire · Sep 14, 2007

Hi,

I would like to delete a region on a log file which has this
kind of structure:

How about just searching for what you want. Here are two approaches,
one using pyparsing, one using the batteries-included re module.

-- Paul

# -*- coding: iso-8859-15 -*-
data = """\
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
"""

print "search using pyparsing"
from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
scireal = Regex(r"\d*\.\d*e\-\d\d").setParseAction(lambda
t:float(t[0]))
time = Regex(r"\d\d:\d\d:\d\d")

logline = (integer("testNum") +
And([scireal]*7)("data") +
time("testTime") +
integer("result"))

for tRes in logline.searchString(data):
print "Test#:",tRes.testNum
print "Data:", tRes.data
print "Time:", tRes.testTime
print "Output:", tRes.result
print

print
print "search using re's"
import re
integer = r"\d*"
scireal = r"\d*\.\d*e\-\d\d"
time = r"\d\d:\d\d:\d\d"
ws = r"\s*"

namedField = lambda reStr,n: "(?P<%s>%s)" % (n,reStr)
logline = re.compile(
namedField(integer,"testNum") + ws +
namedField( (scireal+ws)*7,"data" ) +
namedField(time,"testTime") + ws +
namedField(integer,"result") )
for m in logline.finditer(data):
print "Test#:",int(m.group("testNum"))
print "Data:", map(float,m.group("data").split())
print "Time:", m.group("testTime")
print "Output:", int(m.group("result"))
print

Prints:

search using pyparsing
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

search using re's
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Re for Apache log file format	4	Oct 8, 2013
Php combine identical lines in text file	4	Oct 11, 2023
extract certain values from file with re	8	Oct 6, 2006
Using re to get data from text file	6	Sep 10, 2004
How to match literal backslashes read from a text file using regular expressions?	2	Jul 12, 2005
Extract range of lines from a text file	29	Apr 9, 2006
HOWTO: Parsing email using Python part2	1	Jul 15, 2011
search and replace in a file :: newbie help	3	Sep 9, 2006

extract text from log file using re

Fabian Braennstroem

Peter Otten

Paul McGuire

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads