Re: Convert AWK regex to Python

Discussion in 'Python' started by J, May 16, 2011.

  1. J

    J Guest

    Hello Peter, Angelico,

    OK, let's see. My aim is to filter out several fields from a log file and write them to a new log file. The current log file, as I mentioned previously, has thousands of lines like this:
    2011-05-16 09:46:22,361 [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_ CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004 Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >

    All the lines in the log file are similar and they all have the same length (the same number of fields). Most of the fields are separated by spaces, except for a couple of them which I am processing with AWK (removing "<G_" from the string, for example). So, in essence, what I want to do is evaluate each line in the log file and break it down into fields which I can call individually and write to a new log file (for example, selecting only fields 1, 2 and 3).

    I hope this is clearer now.

    Regards,

    Junior
    J, May 16, 2011
    #1

  2. On Mon, 16 May 2011 03:57:49 -0700, J wrote:

    > Most of the fields are separated by
    > spaces except for couple of them which I am processing with AWK
    > (removing "<G_" from the string for example). So in essence what I want
    > to do is evaluate each line in the log file and break them down into
    > fields which I can call individually and write them to a new log file
    > (for example selecting only fields 1, 2 and 3).


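    # split() with no argument splits on any run of whitespace, as AWK does
    fields = line.split()
    # Python counts from 0, so AWK's $1, $2 and $3 are fields[0] to fields[2]
    output.write(fields[0] + ' ')
    output.write(fields[1] + ' ')
    output.write(fields[2] + '\n')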
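
    For completeness, here is a minimal sketch of the whole loop. It assumes that "fields 1, 2 and 3" are meant in the AWK sense (1-based), so they map to Python indexes 0, 1 and 2. The helper name select_fields and the in-memory sample are made up for illustration; in practice you would pass two open file objects.

```python
import io

def select_fields(infile, outfile, indexes=(0, 1, 2)):
    """Copy the whitespace-separated fields at `indexes` from each input line."""
    for line in infile:
        # split() with no argument splits on runs of whitespace
        # and discards the trailing newline
        fields = line.split()
        if len(fields) <= max(indexes):
            continue  # assumption: silently skip lines that are too short
        outfile.write(" ".join(fields[i] for i in indexes) + "\n")

# Shortened stand-in for one line of the log file
sample = io.StringIO(
    "2011-05-16 09:46:22,361 [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_ rest\n"
)
result = io.StringIO()
select_fields(sample, result)
print(result.getvalue(), end="")
```

    Any object that can be iterated line by line and any object with a write() method will do, so the same function works with files opened via open().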



    --
    Steven
    Steven D'Aprano, May 16, 2011
    #2

  3. Peter Otten

    Peter Otten Guest

    J wrote:

    > Hello Peter, Angelico,
    >
    > Ok lets see, My aim is to filter out several fields from a log file and
    > write them to a new log file. The current log file, as I mentioned
    > previously, has thousands of lines like this:- 2011-05-16 09:46:22,361
    > [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_
    > CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
    > - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004
    > Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >
    >
    > All the lines in the log file are similar and they all have the same
    > length (same amount of fields). Most of the fields are separated by
    > spaces except for couple of them which I am processing with AWK (removing
    > "<G_" from the string for example). So in essence what I want to do is
    > evaluate each line in the log file and break them down into fields which I
    > can call individually and write them to a new log file (for example
    > selecting only fields 1, 2 and 3).
    >
    > I hope this is clearer now


    Not much :(

    It doesn't really matter whether there are 100, 1000, or a million lines in
    the file; the important information is the structure of the file. You may be
    able to get away with a quick and dirty script consisting of just a few
    regular expressions, e.g.:

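    import re

    filename = ...

    # Each helper returns the first captured group and raises AttributeError
    # if its pattern is not found in the line.
    def get_service(line):
        # first word following an opening parenthesis
        return re.compile(r"[(](\w+)").search(line).group(1)

    def get_command(line):
        # word characters following "<G_"
        return re.compile(r"<G_(\w+)").search(line).group(1)

    def get_status(line):
        # digits following "Status:"
        return re.compile(r"Status:\s+(\d+)").search(line).group(1)

    with open(filename) as infile:
        for line in infile:
            print("{} {} {}".format(get_service(line), get_command(line), get_status(line)))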

    but there is no guarantee that there isn't data in your file that breaks the
    implied assumptions. Also, from the shell hackery it looks like your
    ultimate goal is a kind of frequency table, which could be built
    along these lines:

    freq = {}
    with open(filename) as infile:
        for line in infile:
            service = get_service(line)
            command = get_command(line)
            status = get_status(line)
            key = command, service, status
            freq[key] = freq.get(key, 0) + 1

    # print one line per distinct key, in sorted order
    for key, occurrences in sorted(freq.items()):
        print("Service: {}, Command: {}, Status: {}, Occurrences: {}".format(
            *key + (occurrences,)))
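
    If some lines might not match, a more defensive variant could look roughly like this. It is only a sketch: it reuses the three patterns from above, compiles them once, counts with collections.Counter, and skips lines where any pattern is missing instead of raising AttributeError. The names parse_line and count_lines and the sample data are invented for illustration.

```python
import re
from collections import Counter

# The same three patterns as above, compiled once at import time
G_NAME = re.compile(r"<G_(\w+)")         # word characters after "<G_"
PAREN_WORD = re.compile(r"[(](\w+)")     # first word after an opening parenthesis
STATUS = re.compile(r"Status:\s+(\d+)")  # digits after "Status:"

def parse_line(line):
    """Return (g_name, paren_word, status), or None if any pattern is missing."""
    matches = [pattern.search(line) for pattern in (G_NAME, PAREN_WORD, STATUS)]
    if not all(matches):
        return None
    return tuple(m.group(1) for m in matches)

def count_lines(lines):
    """Count each parsed key and also report how many lines were skipped."""
    freq = Counter()
    skipped = 0
    for line in lines:
        key = parse_line(line)
        if key is None:
            skipped += 1
        else:
            freq[key] += 1
    return freq, skipped

# Shortened stand-ins for real log lines, plus one line that does not match
good = ("2011-05-16 09:46:22,361 [Thread-4847133] PDU D "
        "<G_CC_SMS_SERVICE_51408_656.O_ x - OUT - "
        "(submit_resp: (pdu: L: 53 ID: 80000004 Status: 0 SN: 25866) ) >")
sample_lines = [good, good, "a line without the expected fields"]

freq, skipped = count_lines(sample_lines)
for key, occurrences in sorted(freq.items()):
    print(key, occurrences)
print("skipped:", skipped)
```

    Reporting the number of skipped lines gives a quick check on how well the implied assumptions hold for a particular file.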
    Peter Otten, May 16, 2011
    #3
  4. J

    J Guest

    Thanks for the suggestions, Peter. I will give them a try.

    J, May 16, 2011
    #4