extract text from log file using re

Discussion in 'Python' started by Fabian Braennstroem, Sep 13, 2007.

  1. Hi,

    I would like to delete a region on a log file which has this
    kind of structure:


    #------flutest------------------------------------------------------------
    498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
    8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
    499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
    8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
    reversed flow in 1 faces on pressure-outlet 35.

    Writing
    "/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
    5429199 mixed cells, zone 29, binary.
    11187656 mixed interior faces, zone 30, binary.
    20004 triangular wall faces, zone 31, binary.
    1104 mixed velocity-inlet faces, zone 32, binary.
    133638 triangular wall faces, zone 33, binary.
    14529 triangular wall faces, zone 34, binary.
    1350 mixed pressure-outlet faces, zone 35, binary.
    11714 mixed wall faces, zone 36, binary.
    1232141 nodes, binary.
    1232141 node flags, binary.
    Done.


    Writing
    "/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
    Done.

    500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
    8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

    reversed flow in 2 faces on pressure-outlet 35.
    501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
    8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

    #------------------------------------------------------------------

    I have a small script which removes lines starting with
    '(re)versed', '(i)teration' and '(t)urbulent' and puts the
    rest into an array:

    # -- plot residuals ----------------------------------------
    import re
    filename="flutest"
    reversed_flow=re.compile('^\ re')
    turbulent_viscosity_ratio=re.compile('^\ tu')
    iteration=re.compile('^\ \ i')

    begin_of_res=re.compile('>\ \ \ i')
    end_of_res=re.compile('^\ ad')

    begin_of_writing=re.compile('^\Writing')
    end_of_writing=re.compile('^\Done')

    end_number=0
    begin_number=0


    n = 0
    for line in open(filename).readlines():
        n = n + 1
        if begin_of_res.match(line):
            begin_number=n+1
            print "Line Number (begin): " + str(n)

        if end_of_res.match(line):
            end_number=n
            print "Line Number (end): " + str(n)

        if begin_of_writing.match(line):
            begin_w=n+1
            print "BeginWriting: " + str(n)
            print "HALLO"

        if end_of_writing.match(line):
            end_w=n+1
            print "EndWriting: " +str(n)

    if n > end_number:
        end_number=n
        print "Line Number (end): " + str(end_number)





    n = 0
    array = []
    array_dummy = []
    array_mapped = []

    mapped = []
    mappe = []

    n = 0
    for line in open(filename).readlines():
        n = n + 1
        if (begin_number <= n) and (end_number > n):
            # if (begin_w <= n) and (end_w > n):
            if (not reversed_flow.match(line) and not
                    iteration.match(line) and not
                    turbulent_viscosity_ratio.match(line)):
                m=(line.strip().split())
                print m
                if len(m) > 0:
                    # print len(m)
                    laenge_liste=len(m)
                    # print len(m)
                    mappe.append(m)


    #--end plot residuals-------------------------------------------------

    This works fine, except for the region with the writing
    information:

    #-----writing information -----------------------------------------
    Writing "/home/fb/fluent-0500.cas"...
    5429199 mixed cells, zone 29, binary.
    11187656 mixed interior faces, zone 30, binary.
    20004 triangular wall faces, zone 31, binary.
    1104 mixed velocity-inlet faces, zone 32, binary.
    133638 triangular wall faces, zone 33, binary.
    14529 triangular wall faces, zone 34, binary.
    1350 mixed pressure-outlet faces, zone 35, binary.
    11714 mixed wall faces, zone 36, binary.
    1232141 nodes, binary.
    1232141 node flags, binary.
    Done.
    # -------end writing information -------------------------------

    Does anyone know how I can skip this 'writing' stuff too?
    It occurs a lot :-(

    Regards!
    Fabian
     
    Fabian Braennstroem, Sep 13, 2007

  2. Peter Otten

    Fabian Braennstroem wrote:

    > I would like to delete a region on a log file which has this
    > kind of structure:
    >
    > [...]
    >
    > I have a small script, which removes lines starting with
    > '(re)versed', '(i)teration' and '(t)urbulent' and put the
    > rest into an array:
    >
    > # -- plot residuals ----------------------------------------
    > import re
    > filename="flutest"
    > reversed_flow=re.compile('^\ re')
    > turbulent_viscosity_ratio=re.compile('^\ tu')
    > iteration=re.compile('^\ \ i')
    >
    > begin_of_res=re.compile('>\ \ \ i')
    > end_of_res=re.compile('^\ ad')


    The following regular expressions have some extra backslashes
    which change their meaning:

    > begin_of_writing=re.compile('^\Writing')
    > end_of_writing=re.compile('^\Done')
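
To make the point about the backslashes concrete, here is a small sketch (written for Python 3, not part of the original script). In a regular expression `\W` matches one non-word character and `\D` matches one non-digit character, so the two quoted patterns do not look for the literal words "Writing" and "Done". The `fixed_*` names are only illustrative.

```python
import re

# Raw strings pass the backslashes through to the regex engine unchanged.
broken_begin = re.compile(r'^\Writing')  # \W: one non-word char, then "riting"
broken_end = re.compile(r'^\Done')       # \D: one non-digit char, then "one"

# Without the stray backslashes the patterns match the literal words.
fixed_begin = re.compile(r'^Writing')
fixed_end = re.compile(r'^Done')

# 'W' is a word character, so the broken pattern never sees "Writing".
print(broken_begin.match('Writing') is None)

# 'D' happens to be a non-digit, so "Done." matches by accident ...
print(broken_end.match('Done.') is not None)
# ... but so does any other non-digit followed by "one".
print(broken_end.match('gone') is not None)

print(fixed_begin.match('Writing') is not None)
print(fixed_end.match('Done.') is not None)
```

This would explain why the start of the writing block is never detected in the original script, while the end marker appears to work.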


    But I don't think you need regular expressions at all.
    Also, it's better to iterate over the file just once because
    you don't need to remember the position of regions to be skipped.
    Here's a simplified demo:

    def skip_region(items, start, end):
        items = iter(items)
        while 1:
            # pass lines through until a region starts
            for line in items:
                if start(line):
                    break
                yield line
            else:
                break
            # swallow lines until the region ends
            for line in items:
                if end(line):
                    break
            else:
                break

    def begin(line):
        return line.strip() == "Writing"

    def end(line):
        return line.strip() == "Done."

    # --- begin demo setup (remove to test with real data) ---
    def open(filename):
        from StringIO import StringIO
        return StringIO("""\
    iteration # to be ignored
    alpha
    beta
    reversed # to be ignored
    Writing
    to
    be
    ignored
    Done.
    gamma
    delta

    """)
    # --- end demo setup ---

    if __name__ == "__main__":
        filename = "flutest"
        for line in skip_region(open(filename), begin, end):
            line = line.strip()
            if line and not line.startswith(("reversed", "iteration")):
                print line

    skip_region() takes a file (or any iterable) and two functions
    that test for the begin/end of the region to be skipped.
    You can nest skip_region() calls if you have regions with different
    start/end conditions.
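
As a rough illustration of the nesting idea, here is a sketch ported to Python 3. The `starts_with` helper and the "Reading"/"Finished." region are invented for this example and do not come from the log in the thread. The prefix test is also one way to handle a log where the file name follows "Writing" on the same line, in which case an exact comparison with "Writing" would not fire.

```python
def skip_region(items, start, end):
    """Yield items, dropping each region from a start() hit through the next end() hit."""
    items = iter(items)
    while True:
        for line in items:
            if start(line):
                break
            yield line
        else:
            break  # input exhausted outside a region
        for line in items:
            if end(line):
                break
        else:
            break  # input exhausted inside a region


def starts_with(prefix):
    """Hypothetical helper: build a predicate for lines beginning with prefix."""
    return lambda line: line.strip().startswith(prefix)


# Made-up sample; the second region type exists only to show nesting.
sample = [
    "alpha",
    'Writing "/home/fb/fluent-0500.cas"...',
    "5429199 mixed cells, zone 29, binary.",
    "Done.",
    "beta",
    "Reading",
    "to be ignored",
    "Finished.",
    "gamma",
]

# The inner call removes the Writing...Done. block, the outer one Reading...Finished.
without_writing = skip_region(sample, starts_with("Writing"), starts_with("Done"))
kept = list(skip_region(without_writing, starts_with("Reading"), starts_with("Finished")))
print(kept)
```

Because both calls are generators, the input is still read only once, line by line.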

    Peter
     
    Peter Otten, Sep 14, 2007

  3. Paul McGuire

    On Sep 13, 4:09 pm, Fabian Braennstroem wrote:
    > Hi,
    >
    > I would like to delete a region on a log file which has this
    > kind of structure:
    >


    How about just searching for what you want? Here are two approaches,
    one using pyparsing, one using the batteries-included re module.

    -- Paul


    # -*- coding: iso-8859-15 -*-
    data = """\
    498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
    3.8560e-03 4.8384e-02 11:40:01 499
    499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
    3.8560e-03 4.8384e-02 11:40:01 499
    reversed flow in 1 faces on pressure-outlet 35.

    Writing
    "/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
    5429199 mixed cells, zone 29, binary.
    11187656 mixed interior faces, zone 30, binary.
    20004 triangular wall faces, zone 31, binary.
    1104 mixed velocity-inlet faces, zone 32, binary.
    133638 triangular wall faces, zone 33, binary.
    14529 triangular wall faces, zone 34, binary.
    1350 mixed pressure-outlet faces, zone 35, binary.
    11714 mixed wall faces, zone 36, binary.
    1232141 nodes, binary.
    1232141 node flags, binary.
    Done.

    Writing
    "/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
    Done.


    500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
    3.8545e-03 1.3315e-01 11:14:10 500


    reversed flow in 2 faces on pressure-outlet 35.
    501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
    3.8560e-03 4.8384e-02 11:40:01 499
    """

    print "search using pyparsing"
    from pyparsing import *

    integer = Word(nums).setParseAction(lambda t:int(t[0]))
    scireal = Regex(r"\d*\.\d*e\-\d\d").setParseAction(lambda t:float(t[0]))
    time = Regex(r"\d\d:\d\d:\d\d")

    logline = (integer("testNum") +
               And([scireal]*7)("data") +
               time("testTime") +
               integer("result"))

    for tRes in logline.searchString(data):
        print "Test#:",tRes.testNum
        print "Data:", tRes.data
        print "Time:", tRes.testTime
        print "Output:", tRes.result
        print

    print
    print "search using re's"
    import re
    integer = r"\d*"
    scireal = r"\d*\.\d*e\-\d\d"
    time = r"\d\d:\d\d:\d\d"
    ws = r"\s*"

    namedField = lambda reStr,n: "(?P<%s>%s)" % (n,reStr)
    logline = re.compile(
        namedField(integer,"testNum") + ws +
        namedField( (scireal+ws)*7,"data" ) +
        namedField(time,"testTime") + ws +
        namedField(integer,"result") )
    for m in logline.finditer(data):
        print "Test#:",int(m.group("testNum"))
        print "Data:", map(float,m.group("data").split())
        print "Time:", m.group("testTime")
        print "Output:", int(m.group("result"))
        print

    Prints:

    search using pyparsing
    Test#: 498
    Data: [0.0010085999999999999, 0.00024607999999999997,
    9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
    0.0038560000000000001, 0.048384000000000003]
    Time: 11:40:01
    Output: 499

    Test#: 499
    Data: [0.0010085999999999999, 0.00024607999999999997,
    9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
    0.0038560000000000001, 0.048384000000000003]
    Time: 11:40:01
    Output: 499

    Test#: 500
    Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
    0.00014865000000000001, 0.00083913, 0.0038544999999999999,
    0.13314999999999999]
    Time: 11:14:10
    Output: 500

    Test#: 501
    Data: [0.0010085999999999999, 0.00024607999999999997,
    9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
    0.0038560000000000001, 0.048384000000000003]
    Time: 11:40:01
    Output: 499


    search using re's
    Test#: 498
    Data: [0.0010085999999999999, 0.00024607999999999997,
    9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
    0.0038560000000000001, 0.048384000000000003]
    Time: 11:40:01
    Output: 499

    Test#: 499
    Data: [0.0010085999999999999, 0.00024607999999999997,
    9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
    0.0038560000000000001, 0.048384000000000003]
    Time: 11:40:01
    Output: 499

    Test#: 500
    Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
    0.00014865000000000001, 0.00083913, 0.0038544999999999999,
    0.13314999999999999]
    Time: 11:14:10
    Output: 500

    Test#: 501
    Data: [0.0010085999999999999, 0.00024607999999999997,
    9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
    0.0038560000000000001, 0.048384000000000003]
    Time: 11:40:01
    Output: 499
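
For readers on Python 3, the re-based search might be sketched as follows. This is a variation, not the code above. It assumes seven residual columns per record, as in the sample. It also tightens the patterns slightly: `\d+` cannot match an empty string the way `\d*` can, and `[+-]` accepts positive exponents as well. The shortened sample and the name `clock` are choices made for this example.

```python
import re

# Shortened sample in the thread's log format; values copied from the post.
data = """\
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10 500
"""

integer = r"\d+"                # at least one digit
scireal = r"\d+\.\d+e[+-]\d+"   # e.g. 1.0086e-03
clock = r"\d\d:\d\d:\d\d"       # e.g. 11:40:01

# One record: iteration number, seven residuals, time, trailing counter.
logline = re.compile(
    rf"\b(?P<testNum>{integer})\s+"
    rf"(?P<data>(?:{scireal}\s+){{7}})"
    rf"(?P<testTime>{clock})\s+"
    rf"(?P<result>{integer})\b"
)

rows = [
    (
        int(m.group("testNum")),
        [float(x) for x in m.group("data").split()],
        m.group("testTime"),
        int(m.group("result")),
    )
    for m in logline.finditer(data)
]

for test_num, values, test_time, result in rows:
    print("Test#:", test_num)
    print("Data:", values)
    print("Time:", test_time)
    print("Output:", result)
    print()
```

Since the pattern only matches complete residual records, the "Writing ... Done." blocks and the "reversed flow" lines are skipped without any explicit filtering.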
     
    Paul McGuire, Sep 14, 2007
