search for messages in large files

Discussion in 'Perl Misc' started by Jman, Jun 25, 2003.

  1. Jman

    Jman Guest

    I am working with files that grow to a size of 1-2 mb each day
    of the month. The file is closed at the end of each month.
    The format of the messages is:

    aaaaaaaa YY-MN-DY HR:MN:SC MSG1 BBBB
    qqqq wwww eeee rrrr tttt
    yyyyyyy uuuuuuuuu
    iiii

    and

    aaaaaaaa yy-mn-dy hr:mn:sc MSG2 BBBB
    zzzz cccc
    kkkkkkkk

    lllllllll mmmmm nnnn

    I want to do a search of the files each day for some previous days messages.
    The important data in the message to me is the date (YY-MN-DY),
    and the MSG1 (actually MSG[1-50]). Some of the messages have data
    in every line (MSG1), and some messages have lines that are blank followed
    by lines with data. Is there a good, or simple way to gather into a new
    file
    all of the previous days MSGs that I want? Hope my question makes sense.
    Thanks
     
    Jman, Jun 25, 2003
    #1

  2. "Martien Verbruggen" <> wrote in message
    news:...
    > On Tue, 24 Jun 2003 19:27:20 -0700,
    > Jman <> wrote:
    > > I am working with files that grow to a size of 1-2 mb each day
    > > of the month. The file is closed at the end of each month.
    > > The format of the messages is:
    > >

    snip
    >
    > It would be better to include _real_ data from your log file, and even
    > better to show more than one record, so we can see whether there is
    > anything between records/messages that can be used.
    >
    > > I want to do a search of the files each day for some previous days
    > > messages.
    > > The important data in the message to me is the date (YY-MN-DY),
    > > and the MSG1 (actually MSG[1-50]). Some of the messages have data
    > > in every line (MSG1), and some messages have lines that are blank
    > > followed
    > > by lines with data. Is there a good, or simple way to gather into a new
    > > file
    > > all of the previous days MSGs that I want? Hope my question makes
    > > sense.
    >
    > Maybe something like (untested):
    >
    > my $yesterday = "03-06-25"; # assuming that that is the format
    > open F, "mylogfile" or die $!;
    > while (<F>)
    > {
    >     if (/$yesterday.*MSG(\d\d?)/)
    >     {
    >         # We now have the message number in $1
    >         # Since you're only interested in yesterday, you already know
    >         # the date. No need to capture it.
    >         print;
    >     }
    > }
    > close F;
    >
    > I am assuming that none of the other lines have that pattern. I'm also
    > assuming that the BBBB bits above don't contain anything matching
    > 'MSG\d\d?', or if it does that it's actually the correct number as
    > well.
    >
    > Hard to tell whether this is sufficient. You give us very little
    > information about what exactly you're having trouble with. Next time,
    > apart from showing real data, also show us what you have tried (real
    > code), and which bit exactly you're having trouble with.
    >
    > Martien
    > --
    > |
    > Martien Verbruggen | True seekers can always find something to
    > Trading Post Australia | believe in.
    > |


    Below is an example of the data that I was trying to reflect.
    There is a CTRL-M at the end of each line after the line that starts "-----New",
    and there is a CTRL-Y on the line prior to the line that starts "-----New".
    I am obviously new at this. My original approach was to delete the data that
    I don't need, to try to group the messages into paragraphs:
    #!/usr/bin/perl -w
    while (<>) {
        s/^M|^Y|^-.*//;
        print;
    }

    Then I pipe that to another program:
    #!/usr/bin/perl -w
    $/ = "";
    while (<>) {
    print if / 03-06-01 /;
    }
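
One thing worth checking with the paragraph-mode script above: `$/ = ""` ends a record at every run of empty lines, so a message that has a blank line inside it comes back as more than one paragraph, and only the piece holding the date matches. The sketch below is only an illustration under assumptions: it uses the MAINT record from the sample data in this thread, assumes the CTRL-M characters have already been stripped, and reads from an in-memory filehandle instead of a real log file. The helper name `paragraphs` is made up for the example.

```perl
use strict;
use warnings;

# Split a string into records the same way "$/ = ''" (paragraph mode) does.
sub paragraphs {
    my ($text) = @_;
    local $/ = "";    # paragraph mode: runs of empty lines end a record
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @paras = <$fh>;
    close $fh;
    return @paras;
}

# The MAINT message from the sample data; note the blank line in the middle.
my $msg = join "",
    "S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF\n",
    "M REPT AUDSTAT COMPLETED\n",
    "\n",
    "ROUTINE AUDIT SCHEDULING IS ALLOWED\n";

my @paras = paragraphs($msg);
printf "%d paragraph(s)\n", scalar @paras;

# Only the paragraph that contains the date is printed; the trailing
# "ROUTINE AUDIT ..." line is a separate paragraph and is not selected.
print "matched: $_" for grep { / 03-06-01 / } @paras;
```

If that matters for the messages being collected, splitting on the "-----New" separator lines or on CTRL-Y keeps each message in one piece.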

    Here is some file data:

    -----New Message Received on 06-01-2003 at 00:00:03 -----

    S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
    REPT TIME 03-06-01 00:00:03

    -----New Message Received on 06-01-2003 at 00:00:06 -----

    S570-58785830 03-06-01 00:00:06 611262 SLC SANF
    * REPT RT SID=2050 DNUSRT=2-0-60 MINOR FAR END EVENT=12489

    -----New Message Received on 06-01-2003 at 02:47:03 -----

    S570-58785830 03-06-01 02:47:03 612603 MDIIMON SANF
    A REPT MDII CVN SIGTYPE ISUP TKGMN 303-4 SZ 168 OOS 0 ID
    SUPRVSN TIME 02:47:03 NEN=2-0-0-1-1-4-3-4 TRIAL 1 CARRFLAG NC
    OGT NORMAL CALL CALLED-NO 1288 CALLING-NO 9033
    DISCARD 0
    OPC 123083056 DPC 456041003 CIC 3004

    -----New Message Received on 06-01-2003 at 02:53:01 -----

    S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF
    M REPT AUDSTAT COMPLETED

    ROUTINE AUDIT SCHEDULING IS ALLOWED

    -----New Message Received on 06-01-2003 at 02:54:01 -----

    S570-58785830 03-06-01 02:54:01 612619 TRCE SANF
    A TRC IPCT EVENT 2621

    DN=9759 TERM=3-H'329f DIALED


    DN=5551212
    TIME 02:54:01
     
    Jim McTiernan, Jun 25, 2003
    #2

  3. Jman

    Jman Guest

    Actually, as I completely understand the contents of the file, and you do
    not, I am trying to explain what the contents of the file look like, and I
    have not changed my mind on anything. I was attempting to show how I have
    tried to handle my task, like you requested. I thought that it would be
    better to remove the control characters first; maybe this isn't necessary.
    The "MSG" data that I mentioned in my original posting are the second to
    last words on the first line of each message, e.g. TIME, SLC, MDIIMON, etc.
    Let's say I want to retrieve all of the MAINT messages from 03-06-13:
    what is the best way to do it? Using my style I end up creating large
    files, against which I run another script, creating another large file,
    and running another script against it, until I finally get the data I want.
    I would like to be able to run one script, looking for any day of the month
    with a particular MSG.
    If you can offer anything, thanks; if not, thanks anyway.
    I am doing my best to explain
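
For reference, the single-pass request above (one script, one date, one message type) could be sketched roughly as follows. This is not code posted by anyone in the thread; it is a minimal sketch that assumes the layout of the sample data: a "-----New Message Received" line before every message, CTRL-M at line ends, CTRL-Y on a line of its own, the date as the second word of the first message line and the type as its second-to-last word. The sub name `select_messages` and the in-memory sample are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every message whose first line has the wanted date as its second
# word and the wanted type as its second-to-last word.
sub select_messages {
    my ($fh, $date, $type) = @_;
    my (@found, @record);

    # Examine the record collected so far, keep it if it matches, then reset.
    my $flush = sub {
        # Drop blank lines at both ends; blank lines inside a message stay.
        shift @record while @record && $record[0]  !~ /\S/;
        pop   @record while @record && $record[-1] !~ /\S/;
        if (@record) {
            my @words = split ' ', $record[0];
            push @found, join("\n", @record) . "\n"
                if @words >= 3 && $words[1] eq $date && $words[-2] eq $type;
        }
        @record = ();
    };

    while (my $line = <$fh>) {
        $line =~ s/[\r\n\cY]//g;    # strip CTRL-M, newline and CTRL-Y
        if ($line =~ /^-----New Message Received on /) {
            $flush->();             # a separator line ends the previous record
            next;
        }
        push @record, $line;
    }
    $flush->();                     # the last record has no separator after it
    return @found;
}

# Two records from the sample data, with control characters as described.
my $sample = join "",
    "-----New Message Received on 06-01-2003 at 02:53:01 -----\n",
    "\r\n",
    "S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF\r\n",
    "M REPT AUDSTAT COMPLETED\r\n",
    "\r\n",
    "ROUTINE AUDIT SCHEDULING IS ALLOWED\r\n",
    "\cY\n",
    "-----New Message Received on 06-01-2003 at 02:54:01 -----\n",
    "\r\n",
    "S570-58785830 03-06-01 02:54:01 612619 TRCE SANF\r\n",
    "A TRC IPCT EVENT 2621\r\n",
    "\cY\n";

open my $in, '<', \$sample or die "cannot open in-memory handle: $!";
print select_messages($in, '03-06-01', 'MAINT');
close $in;
```

Against a real log, the in-memory handle would be replaced by an ordinary `open` on the monthly file, with the date and type taken from the command line. Because the file is read one line at a time, nothing larger than a single message is held in memory and no intermediate files are needed.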


    "Martien Verbruggen" <> wrote in message
    news:...
    > On Wed, 25 Jun 2003 11:53:37 -0700,
    > Jim McTiernan <> wrote:
    > >
    > > Below is an example of the data that I was trying to reflect.
    > > There is a CTRL M at end of each line after the line that starts
    > > "-----New".
    > > and there is a CTRL Y on the line prior to the line that starts
    > > "-----New".
    > > I am new at this obviously, my original approach was to delete the data
    > > that
    > > I don't need to try to group the messages into paragraphs:
    > > #!/usr/bin/perl -w
    > > while (<>) {
    > > s/^M|^Y|^-.*//;
    > > print;
    > > }

    >
    > So... You're removing any initial M or Y, or anything in a line that
    > initially starts with -?
    >
    > > Then I pipe that to another program:
    > > #!/usr/bin/perl -w
    > > $/ = "";
    > > while (<>) {
    > > print if / 03-06-01 /;
    > > }

    >
    > And now you print "paragraphs" that contain that date.
    >
    > > Here is some file data:
    > >
    > > -----New Message Received on 06-01-2003 at 00:00:03 -----
    > >
    > > S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
    > > REPT TIME 03-06-01 00:00:03
    > >

    >
    > Well.. That data doesn't look at all like what you described in your
    > original post. In your OP, you were talking about being interested in
    > some message number, and the date only. I don't see any message
    > number.
    >
    > Given that ctrl-Y seems to be the record separator, or terminator, I'd
    > probably set $/ to ctrl-Y, and then process the file message by
    > message, selecting on whichever criteria you want, and I'm more
    > confused now about what you do and don't want. I'll just make up
    > something, and leave it up to you to change it. You're not clear on
    > whether all of the dates in those messages can be used, or whether it
    > has to be one in the capitalised bits. I'll simply select on that
    > first line, because it's easier.
    >
    >
    > #!/usr/local/bin/perl
    > use strict;
    > use warnings;
    >
    > # Set record separator to ctrl-Y followed by a newline
    > $/ = "\cY\n";
    > my $target_date = "06-01-2003";
    >
    > while (<DATA>)
    > {
    >     chomp;
    >
    >     # We're only interested in records that contain our target date
    >     next unless /Received on $target_date at/;
    >
    >     # Remove any M or Y following a newline (Just following your code,
    >     # I think)
    >     s/\n(M|Y)/\n/g;
    >
    >     # Remove that first line. We are not interested in it.
    >     s/\A.*--\n//;
    >
    >     # Print what's left
    >     print;
    > }
    >
    > __DATA__
    > -----New Message Received on 06-01-2003 at 00:00:03 -----
    >
    > S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
    > REPT TIME 03-06-01 00:00:03
    > 
    > -----New Message Received on 06-01-2003 at 00:00:06 -----
    >
    > S570-58785830 03-06-01 00:00:06 611262 SLC SANF
    > * REPT RT SID=2050 DNUSRT=2-0-60 MINOR FAR END EVENT=12489
    > 
    > -----New Message Received on 06-01-2003 at 02:47:03 -----
    >
    > S570-58785830 03-06-01 02:47:03 612603 MDIIMON SANF
    > A REPT MDII CVN SIGTYPE ISUP TKGMN 303-4 SZ 168 OOS 0 ID
    > SUPRVSN TIME 02:47:03 NEN=2-0-0-1-1-4-3-4 TRIAL 1 CARRFLAG NC
    > OGT NORMAL CALL CALLED-NO 1288 CALLING-NO 9033
    > DISCARD 0
    > OPC 123083056 DPC 456041003 CIC 3004
    > 
    > -----New Message Received on 06-01-2003 at 02:53:01 -----
    >
    > S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF
    > M REPT AUDSTAT COMPLETED
    >
    > ROUTINE AUDIT SCHEDULING IS ALLOWED
    > 
    > -----New Message Received on 06-01-2003 at 02:54:01 -----
    >
    > S570-58785830 03-06-01 02:54:01 612619 TRCE SANF
    > A TRC IPCT EVENT 2621
    >
    > DN=9759 TERM=3-H'329f DIALED
    >
    >
    > DN=5551212
    > TIME 02:54:01
    > 
    >
    > Martien
    > --
    > |
    > Martien Verbruggen | Useful Statistic: 75% of the people make up
    > Trading Post Australia | 3/4 of the population.
    > |
     
    Jman, Jun 26, 2003
    #3
  4. Sam Holden

    Sam Holden Guest

    On Wed, 25 Jun 2003 20:59:07 -0700, Jman <> wrote:
    > Actually, as I completely understand the contents of the file, and you do
    > not,
    > I am trying to explain what the contents of the file looks like, and have
    > not
    > changed my mind on anything. I was attempting to show how I have
    > tried to handle my task, like you requested. I thought that it would be
    > better to remove the control characters first, maybe this isn't necessary.
    > The "MSG" data that I mentioned in my original posting are the second to
    > last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...


    Of course, all the readers are psychic and knew that when you said "actually
    MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
    TIME, SLC, MDIIMON, etc...

    How foolish of those of us who can't read minds.

    --
    Sam Holden
     
    Sam Holden, Jun 26, 2003
    #4
  5. [Don't top post]


    On Wed, 25 Jun 2003 20:59:07 -0700,
    Jman <> wrote:
    > Actually, as I completely understand the contents of the file, and you do
    > not,
    > I am trying to explain what the contents of the file looks like, and have
    > not
    > changed my mind on anything. I was attempting to show how I have
    > tried to handle my task, like you requested. I thought that it would be
    > better to remove the control characters first, maybe this isn't necessary.
    > The "MSG" data that I mentioned in my original posting are the second to
    > last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...
    > Let's say I want to retrieve all of the MAINT messages from 03-06-13,
    > what is the best way to do it. Using my style I end up creating large
    > files,


    How are we supposed to know that? You initially said something totally
    different from what is in the actual data that you finally posted.
    Your data does NOT contain any MSG followed by a number between 1 and
    50 at all, but that is what you originally stated. I provided some
    code to find that.

    Then you post actual data that looks completely different, and I again
    do my best to interpret what it is you mean from your half-arsed
    specification (including modifying the data according to your
    instructions), and again provide some code for you to start with.

    All you do is whinge that you're not getting a complete solution to
    your underspecified problem, instead of trying to clarify the
    confusion that you, yourself, created in the first place.

    > against which I run another script against, creating another large file,
    > and running another script against it, until I finally get the data I want.
    > I would like to be able to run one script, looking for any day of the month
    > with a particular MSG.
    > If you can offer anything, thanks, if not thanks anyway
    > I am doing my best to explain


    What was wrong with the suggestions I posted already? If you answer,
    please realise that I will not be reading it anymore.

    *plonk*

    [SNIP of TOFU]

    Martien
    --
    |
    Martien Verbruggen | Never hire a poor lawyer. Never buy from a
    Trading Post Australia | rich salesperson.
    |
     
    Martien Verbruggen, Jun 26, 2003
    #5
  6. "Sam Holden" <> wrote in message
    news:...
    > On Wed, 25 Jun 2003 20:59:07 -0700, Jman <> wrote:
    > > Actually, as I completely understand the contents of the file, and you
    > > do
    > > not,
    > > I am trying to explain what the contents of the file looks like, and
    > > have
    > > not
    > > changed my mind on anything. I was attempting to show how I have
    > > tried to handle my task, like you requested. I thought that it would be
    > > better to remove the control characters first, maybe this isn't
    > > necessary.
    > > The "MSG" data that I mentioned in my original posting are the second to
    > > last words on the first line of each message, e.g. TIME, SLC MDIIMON,
    > > etc...
    >
    > Of course, all the readers are psychic and knew that when you said
    > "actually
    > MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
    > TIME, SLC, MDIIMON, etc...
    >
    > How foolish of those of us who can't read minds.

    I didn't think that it was that hard to understand.
    I attempted to recreate the format manually in my first posting.
    Sorry this bothered you.
    I am thru with this thread.
    >
    > --
    > Sam Holden
    >
     
    Jim McTiernan, Jun 26, 2003
    #6
  7. "Martien Verbruggen" <> wrote in message
    news:...
    > [Don't top post]
    >
    >
    > On Wed, 25 Jun 2003 20:59:07 -0700,
    > Jman <> wrote:
    > > Actually, as I completely understand the contents of the file, and you
    > > do
    > > not,
    > > I am trying to explain what the contents of the file looks like, and
    > > have
    > > not
    > > changed my mind on anything. I was attempting to show how I have
    > > tried to handle my task, like you requested. I thought that it would be
    > > better to remove the control characters first, maybe this isn't
    > > necessary.
    > > The "MSG" data that I mentioned in my original posting are the second to
    > > last words on the first line of each message, e.g. TIME, SLC MDIIMON,
    > > etc...
    > > Let's say I want to retrieve all of the MAINT messages from 03-06-13,
    > > what is the best way to do it. Using my style I end up creating large
    > > files,

    >
    > How are we supposed to know that? You initially said something totally
    > different from what is in the actual data that you finally posted.
    > Your data does NOT contain any MSG followed by a number between 1 and
    > 50 at all, but that is what you originally stated. I provided some
    > code to find that.
    >
    > Then you post actual data that looks completely different, and I again
    > do my best to interpret what it is you mean from your half-arsed
    > specification (including modifying the data according to your
    > instructions), and again provide some code for you to start with.

    You seem to be a little thick, you can't even see that I was using
    substitution in the original post for the actual data. In retrospect
    I would not do that again, it leads to a whole lot of complaining.
    >
    > All you do is whinge that you're not getting a complete solution to
    > your underspecified problem, instead of trying to clarify the
    > confusion that you, yourself, created in the first place.

    Where did I whinge that I am not getting a complete solution?
    I attempted to adjust my explanation to your crankiness.
    >
    > > against which I run another script against, creating another large file,
    > > and running another script against it, until I finally get the data I
    > > want.
    > > I would like to be able to run one script, looking for any day of the
    > > month
    > > with a particular MSG.
    > > If you can offer anything, thanks, if not thanks anyway
    > > I am doing my best to explain

    >
    > What was wrong with the suggestions I posted already? If you answer,
    > please realise that I will not be reading it anymore.

    The only thing wrong is your annoying attitude, goodbye.
    >
    > *plonk*
    >
    > [SNIP of TOFU]
    >
    > Martien
    > --
    > |
    > Martien Verbruggen | Never hire a poor lawyer. Never buy from a
    > Trading Post Australia | rich salesperson.
    > |
     
    Jim McTiernan, Jun 26, 2003
    #7
  8. Jim McTiernan <> wrote:
    > "Sam Holden" <> wrote in message
    > news:...
    >> On Wed, 25 Jun 2003 20:59:07 -0700, Jman <> wrote:


    >> > Actually, as I completely understand the contents of the file, and you
    >> > do
    >> > not,



    Right. So it is *your* responsibility to convey what you know to us
    if we are to be able to help you.


    >> Of course, all the readers are psychic and knew that when you said
    >> "actually
    >> MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
    >> TIME, SLC, MDIIMON, etc...
    >>
    >> How foolish of those of us who can't read minds.


    > I didn't think that it was that hard to understand.



    That is irrelevant, since you were not explaining it to yourself.

    When writing, what matters is the _reader's_ perception, not
    the author's perception.


    > Sorry this bothered you.
    > I am thru with this thread.



    I am through with this poster.

    *plonk*


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Jun 26, 2003
    #8
  9. Sam Holden

    Sam Holden Guest

    On Thu, 26 Jun 2003 08:14:23 -0700,
    Jim McTiernan <> wrote:
    >
    > "Martien Verbruggen" <> wrote in message
    >>
    >> What was wrong with the suggestions I posted already? if you answer,
    >> please realise that i will not be reading it anymore.

    > The only thing wrong is your annoying attitude, goodbye.


    Let's hope you don't have any future perl problems/questions/issues since
    the 'experts' of the group (of which I am not one, obviously) aren't going
    to be reading them here...

    --
    Sam Holden
     
    Sam Holden, Jun 26, 2003
    #9
  10. "Sam Holden" <> wrote in message
    news:...
    > On Thu, 26 Jun 2003 08:14:23 -0700,
    > Jim McTiernan <> wrote:
    > >
    > > "Martien Verbruggen" <> wrote in message
    > >>
    > >> What was wrong with the suggestions I posted already? if you answer,
    > >> please realise that i will not be reading it anymore.

    > > The only thing wrong is your annoying attitude, goodbye.

    >
    > Let's hope you don't have any future perl problems/questions/issues since
    > the 'experts' of the group (of which I am not one, obviously) aren't going
    > to be reading them here...

    That's fine. I just won't be able to learn anything else about perl,
    or get to be part of these lively conversations.
    >
    > --
    > Sam Holden
    >
     
    Jim McTiernan, Jun 26, 2003
    #10
