search for messages in large files


Jman

I am working with files that grow to a size of 1-2 mb each day
of the month. The file is closed at the end of each month.
The format of the messages is:

aaaaaaaa YY-MN-DY HR:MN:SC MSG1 BBBB
qqqq wwww eeee rrrr tttt
yyyyyyy uuuuuuuuu
iiii

and

aaaaaaaa yy-mn-dy hr:mn:sc MSG2 BBBB
zzzz cccc
kkkkkkkk

lllllllll mmmmm nnnn

I want to search the files each day for some of the previous day's messages.
The important data in the message to me is the date (YY-MN-DY),
and the MSG1 (actually MSG[1-50]). Some of the messages have data
in every line (MSG1), and some messages have lines that are blank followed
by lines with data. Is there a good, or simple, way to gather into a new
file all of the previous day's MSGs that I want? Hope my question makes sense.
Thanks
 

Jim McTiernan

Martien Verbruggen said:
I am working with files that grow to a size of 1-2 mb each day
of the month. The file is closed at the end of each month.
The format of the messages is:
snip

It would be better to include _real_ data from your log file, and even
better to show more than one record, so we can see whether there is
anything between records/messages that can be used.
I want to do a search of the files each day for some previous days messages.
The important data in the message to me is the date (YY-MN-DY),
and the MSG1 (actually MSG[1-50]). Some of the messages have data
in every line (MSG1), and some messages have lines that are blank followed
by lines with data. Is there a good, or simple way to gather into a new
file
all of the previous days MSGs that I want? Hope my question makes
sense.

Maybe something like (untested):

my $yesterday = "03-06-25"; # assuming that that is the format
open my $fh, '<', "mylogfile" or die "Cannot open mylogfile: $!";
while (<$fh>)
{
    if (/$yesterday.*MSG(\d\d?)/)
    {
        # We now have the message number in $1
        # Since you're only interested in yesterday, you already know
        # the date. No need to capture it.
        print;
    }
}
close $fh;

I am assuming that none of the other lines have that pattern. I'm also
assuming that the BBBB bits above don't contain anything matching
'MSG\d\d?', or if it does that it's actually the correct number as
well.

Hard to tell whether this is sufficient. You give us very little
information about what exactly you're having trouble with. Next time,
apart from showing real data, also show us what you have tried (real
code), and which bit exactly you're having trouble with.

Martien
--
|
Martien Verbruggen | True seekers can always find something to
Trading Post Australia | believe in.
|

Below is an example of the data that I was trying to reflect.
There is a CTRL M at the end of each line after the line that starts "-----New",
and there is a CTRL Y on the line prior to the line that starts "-----New".
I am new at this, obviously. My original approach was to delete the data that
I don't need, to try to group the messages into paragraphs:
#!/usr/bin/perl -w
while (<>) {
    s/^M|^Y|^-.*//;
    print;
}

Then I pipe that to another program:
#!/usr/bin/perl -w
$/ = "";
while (<>) {
    print if / 03-06-01 /;
}

Here is some file data:

-----New Message Received on 06-01-2003 at 00:00:03 -----

S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
REPT TIME 03-06-01 00:00:03

-----New Message Received on 06-01-2003 at 00:00:06 -----

S570-58785830 03-06-01 00:00:06 611262 SLC SANF
* REPT RT SID=2050 DNUSRT=2-0-60 MINOR FAR END EVENT=12489

-----New Message Received on 06-01-2003 at 02:47:03 -----

S570-58785830 03-06-01 02:47:03 612603 MDIIMON SANF
A REPT MDII CVN SIGTYPE ISUP TKGMN 303-4 SZ 168 OOS 0 ID
SUPRVSN TIME 02:47:03 NEN=2-0-0-1-1-4-3-4 TRIAL 1 CARRFLAG NC
OGT NORMAL CALL CALLED-NO 1288 CALLING-NO 9033
DISCARD 0
OPC 123083056 DPC 456041003 CIC 3004

-----New Message Received on 06-01-2003 at 02:53:01 -----

S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF
M REPT AUDSTAT COMPLETED

ROUTINE AUDIT SCHEDULING IS ALLOWED

-----New Message Received on 06-01-2003 at 02:54:01 -----

S570-58785830 03-06-01 02:54:01 612619 TRCE SANF
A TRC IPCT EVENT 2621

DN=9759 TERM=3-H'329f DIALED


DN=5551212
TIME 02:54:01
 

Jman

Actually, as I completely understand the contents of the file, and you do not,
I am trying to explain what the contents of the file look like, and have not
changed my mind on anything. I was attempting to show how I have
tried to handle my task, like you requested. I thought that it would be
better to remove the control characters first; maybe this isn't necessary.
The "MSG" data that I mentioned in my original posting are the second-to-last
words on the first line of each message, e.g. TIME, SLC, MDIIMON, etc.
Let's say I want to retrieve all of the MAINT messages from 03-06-13:
what is the best way to do it? Using my style I end up creating large files,
against which I run another script, creating another large file,
and running another script against it, until I finally get the data I want.
I would like to be able to run one script, looking for any day of the month
with a particular MSG.
If you can offer anything, thanks; if not, thanks anyway.
I am doing my best to explain.


Martien Verbruggen said:
On Tue, 24 Jun 2003 19:27:20 -0700,
I am working with files that grow to a size of 1-2 mb each day
of the month. The file is closed at the end of each month.
The format of the messages is:
snip

It would be better to include _real_ data from your log file, and even
better to show more than one record, so we can see whether there is
anything between records/messages that can be used.

I want to do a search of the files each day for some previous days messages.
The important data in the message to me is the date (YY-MN-DY),
and the MSG1 (actually MSG[1-50]). Some of the messages have data
in every line (MSG1), and some messages have lines that are blank followed
by lines with data. Is there a good, or simple way to gather into a new
file
all of the previous days MSGs that I want? Hope my question makes sense.

Maybe something like (untested):

my $yesterday = "03-06-25"; # assuming that that is the format
open my $fh, '<', "mylogfile" or die "Cannot open mylogfile: $!";
while (<$fh>)
{
    if (/$yesterday.*MSG(\d\d?)/)
    {
        # We now have the message number in $1
        # Since you're only interested in yesterday, you already know
        # the date. No need to capture it.
        print;
    }
}
close $fh;

I am assuming that none of the other lines have that pattern. I'm also
assuming that the BBBB bits above don't contain anything matching
'MSG\d\d?', or if it does that it's actually the correct number as
well.

Hard to tell whether this is sufficient. You give us very little
information about what exactly you're having trouble with. Next time,
apart from showing real data, also show us what you have tried (real
code), and which bit exactly you're having trouble with.

Below is an example of the data that I was trying to reflect.
There is a CTRL M at end of each line after the line that starts "-----New".
and there is a CTRL Y on the line prior to the line that starts "-----New".
I am new at this obviously, my original approach was to delete the data that
I don't need to try to group the messages into paragraphs:
#!/usr/bin/perl -w
while (<>) {
    s/^M|^Y|^-.*//;
    print;
}

So... You're removing any initial M or Y, or anything in a line that
initially starts with -?
Then I pipe that to another program:
#!/usr/bin/perl -w
$/ = "";
while (<>) {
    print if / 03-06-01 /;
}

And now you print "paragraphs" that contain that date.
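
One caveat with paragraph mode on this data: some messages in the posted sample contain blank lines of their own (the MAINT message, for instance), so `$/ = ""` would cut such a message into several paragraphs and only the first would carry the date. A minimal sketch of that effect, using one message copied from the sample and an in-memory filehandle purely for illustration:

```perl
use strict;
use warnings;

# One message from the posted sample; note the blank line inside it.
my $message = <<'END';
S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF
M REPT AUDSTAT COMPLETED

ROUTINE AUDIT SCHEDULING IS ALLOWED
END

# Read the string through a filehandle in paragraph mode ($/ = "").
open my $fh, '<', \$message or die "Cannot open in-memory file: $!";
my @paragraphs = do { local $/ = ""; <$fh> };
close $fh;

# The single message comes back as two paragraphs.
printf "%d paragraphs\n", scalar @paragraphs;

# Only the paragraph containing the date survives the filter, so the
# "ROUTINE AUDIT ..." line is lost.
my @kept = grep { / 03-06-01 / } @paragraphs;
print "kept: $_" for @kept;
```

Splitting on something that only occurs between messages (the ctrl-Y terminator or the "-----New Message" header line) avoids this.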
Here is some file data:

-----New Message Received on 06-01-2003 at 00:00:03 -----

S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
REPT TIME 03-06-01 00:00:03

Well.. That data doesn't look at all like what you described in your
original post. In your OP, you were talking about being interested in
some message number, and the date only. I don't see any message
number.

Given that ctrl-Y seems to be the record separator, or terminator, I'd
probably set $/ to ctrl-Y, and then process the file message by
message, selecting on whichever criteria you want, and I'm more
confused now about what you do and don't want. I'll just make up
something, and leave it up to you to change it. You're not clear on
whether all of the dates in those messages can be used, or whether it
has to be one in the capitalised bits. I'll simply select on that
first line, because it's easier.


#!/usr/local/bin/perl
use strict;
use warnings;

# Set record separator to ctrl-Y followed by a newline
$/ = "\cY\n";
my $target_date = "06-01-2003";

while (<DATA>)
{
    chomp;

    # We're only interested in records that contain our target date
    next unless /Received on $target_date at/;

    # Remove any M or Y following a newline (Just following your code,
    # I think)
    s/\n(M|Y)/\n/g;

    # Remove that first line. We are not interested in it.
    s/\A.*--\n//;

    # Print what's left
    print;
}

__DATA__
-----New Message Received on 06-01-2003 at 00:00:03 -----

S21D-685375656 03-06-01 00:00:03 611259 TIME SANF
REPT TIME 03-06-01 00:00:03

-----New Message Received on 06-01-2003 at 00:00:06 -----

S570-58785830 03-06-01 00:00:06 611262 SLC SANF
* REPT RT SID=2050 DNUSRT=2-0-60 MINOR FAR END EVENT=12489

-----New Message Received on 06-01-2003 at 02:47:03 -----

S570-58785830 03-06-01 02:47:03 612603 MDIIMON SANF
A REPT MDII CVN SIGTYPE ISUP TKGMN 303-4 SZ 168 OOS 0 ID
SUPRVSN TIME 02:47:03 NEN=2-0-0-1-1-4-3-4 TRIAL 1 CARRFLAG NC
OGT NORMAL CALL CALLED-NO 1288 CALLING-NO 9033
DISCARD 0
OPC 123083056 DPC 456041003 CIC 3004

-----New Message Received on 06-01-2003 at 02:53:01 -----

S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF
M REPT AUDSTAT COMPLETED

ROUTINE AUDIT SCHEDULING IS ALLOWED

-----New Message Received on 06-01-2003 at 02:54:01 -----

S570-58785830 03-06-01 02:54:01 612619 TRCE SANF
A TRC IPCT EVENT 2621

DN=9759 TERM=3-H'329f DIALED


DN=5551212
TIME 02:54:01
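
The record-at-a-time idea above can be narrowed to the request made elsewhere in the thread: all messages of one type (say MAINT) for one date, in a single pass. What follows is only a sketch under stated assumptions: `select_messages` is a hypothetical helper name, records are split on the "-----New Message Received" header line rather than on ctrl-Y (so it also works on the sample as pasted), the date is taken from the second word of the first data line, and the message type from its second-to-last word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the messages whose first data line has
# the given date (YY-MM-DD) as its second word and the given message
# type (e.g. MAINT) as its second-to-last word.
sub select_messages {
    my ($text, $date, $type) = @_;

    # Drop carriage returns (ctrl-M) and ctrl-Y wherever they occur.
    $text =~ s/[\r\cY]//g;

    my @hits;
    # Assumption: every record starts with a "-----New Message Received" line.
    for my $record (split /^-----New Message Received.*\n/m, $text) {
        # The first non-blank line holds id, date, time, sequence, type, office.
        my ($first) = $record =~ /^\s*(\S.*)$/m;
        next unless defined $first;

        my @words = split ' ', $first;
        next unless @words >= 3;
        next unless $words[1] eq $date && $words[-2] eq $type;

        $record =~ s/\A\s+//;      # trim leading blank lines
        $record =~ s/\s+\z/\n/;    # trim trailing blank lines
        push @hits, $record;
    }
    return @hits;
}

# Two records copied from the sample data in this thread.
my $sample = <<'END';
-----New Message Received on 06-01-2003 at 02:53:01 -----

S32C-942407807 03-06-01 02:53:01 612617 MAINT SANF
M REPT AUDSTAT COMPLETED

ROUTINE AUDIT SCHEDULING IS ALLOWED

-----New Message Received on 06-01-2003 at 02:54:01 -----

S570-58785830 03-06-01 02:54:01 612619 TRCE SANF
A TRC IPCT EVENT 2621
END

# Prints the MAINT message only, including its internal blank line.
print for select_messages($sample, '03-06-01', 'MAINT');
```

To run this against a real log, the heredoc could be replaced by slurping the file (`my $text = do { local $/; <> };`) and taking the date and type from `@ARGV`. Slurping assumes the monthly file fits comfortably in memory.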


Martien
--
|
Martien Verbruggen | Useful Statistic: 75% of the people make up
Trading Post Australia | 3/4 of the population.
|
 

Sam Holden

Actually, as I completely understand the contents of the file, and you do
not,
I am trying to explain what the contents of the file looks like, and have
not
changed my mind on anything. I was attempting to show how I have
tried to handle my task, like you requested. I thought that it would be
better to remove the control characters first, maybe this isn't necessary.
The "MSG" data that I mentioned in my original posting are the second to
last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...

Of course, all the readers are psychic and knew that when you said "actually
MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
TIME, SLC, MDIIMON, etc...

How foolish of those of us who can't read minds.
 

Martien Verbruggen

[Don't top post]


Actually, as I completely understand the contents of the file, and you do
not,
I am trying to explain what the contents of the file looks like, and have
not
changed my mind on anything. I was attempting to show how I have
tried to handle my task, like you requested. I thought that it would be
better to remove the control characters first, maybe this isn't necessary.
The "MSG" data that I mentioned in my original posting are the second to
last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...
Let's say I want to retrieve all of the MAINT messages from 03-06-13,
what is the best way to do it. Using my style I end up creating large
files,

How are we supposed to know that? You initially said something totally
different from what is in the actual data that you finally posted.
Your data does NOT contain any MSG followed by a number between 1 and
50 at all, but that is what you originally stated. I provided some
code to find that.

Then you post actual data that looks completely different, and I again
do my best to interpret what it is you mean from your half-arsed
specification (including modifying the data according to your
instructions), and again provide some code for you to start with.

All you do is whinge that you're not getting a complete solution to
your underspecified problem, instead of trying to clarify the
confusion that you, yourself, created in the first place.
against which I run another script against, creating another large file,
and running another script against it, until I finally get the data I want.
I would like to be able to run one script, looking for any day of the month
with a particular MSG.
If you can offer anything, thanks, if not thanks anyway
I am doing my best to explain

What was wrong with the suggestions I posted already? If you answer,
please realise that I will not be reading it anymore.

*plonk*

[SNIP of TOFU]

Martien
 

Jim McTiernan

Sam Holden said:
Actually, as I completely understand the contents of the file, and you do
not,
I am trying to explain what the contents of the file looks like, and have
not
changed my mind on anything. I was attempting to show how I have
tried to handle my task, like you requested. I thought that it would be
better to remove the control characters first, maybe this isn't necessary.
The "MSG" data that I mentioned in my original posting are the second to
last words on the first line of each message, e.g. TIME, SLC MDIIMON,
etc...

Of course, all the readers are psychic and knew that when you said "actually
MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
TIME, SLC, MDIIMON, etc...

How foolish of those of us who can't read minds.
I didn't think that it was that hard to understand.
I attempted to recreate the format manually in my first posting.
Sorry this bothered you.
I am thru with this thread.
 

Jim McTiernan

Martien Verbruggen said:
[Don't top post]


Actually, as I completely understand the contents of the file, and you do
not,
I am trying to explain what the contents of the file looks like, and have
not
changed my mind on anything. I was attempting to show how I have
tried to handle my task, like you requested. I thought that it would be
better to remove the control characters first, maybe this isn't necessary.
The "MSG" data that I mentioned in my original posting are the second to
last words on the first line of each message, e.g. TIME, SLC MDIIMON, etc...
Let's say I want to retrieve all of the MAINT messages from 03-06-13,
what is the best way to do it. Using my style I end up creating large
files,

How are we supposed to know that? You initially said something totally
different from what is in the actual data that you finally posted.
Your data does NOT contain any MSG followed by a number between 1 and
50 at all, but that is what you originally stated. I provided some
code to find that.

Then you post actual data that looks completely different, and I again
do my best to interpret what it is you mean from your half-arsed
specification (including modifying the data according to your
instructions), and again provide some code for you to start with.
You seem to be a little thick, you can't even see that I was using
substitution in the original post for the actual data. In retrospect
I would not do that again, it leads to a whole lot of complaining.
All you do is whinge that you're not getting a complete solution to
your underspecified problem, instead of trying to clarify the
confusion that you, yourself, created in the first place.
Where did I whinge that I am not getting a complete solution?
I attempted to adjust my explanation to your crankiness.
What was wrong with the suggestions I posted already? If you answer,
please realise that I will not be reading it anymore.
The only thing wrong is your annoying attitude, goodbye.
*plonk*

[SNIP of TOFU]

Martien
--
|
Martien Verbruggen | Never hire a poor lawyer. Never buy from a
Trading Post Australia | rich salesperson.
|
 

Tad McClellan

Right. So it is *your* responsibility to convey what you know to us
if we are to be able to help you.

Of course, all the readers are psychic and knew that when you said "actually
MSG[1-50]", you didn't mean MSG1, MSG2, ..., MSG50 but of course meant
TIME, SLC, MDIIMON, etc...

How foolish of those of us who can't read minds.
I didn't think that it was that hard to understand.


That is irrelevant, since you were not explaining it to yourself.

When writing, what matters is the _reader's_ perception, not
the author's perception.

Sorry this bothered you.
I am thru with this thread.


I am through with this poster.

*plonk*
 

Sam Holden

The only thing wrong is your annoying attitude, goodbye.

Let's hope you don't have any future perl problems/questions/issues since
the 'experts' of the group (of which I am not one, obviously) aren't going
to be reading them here...
 

Jim McTiernan

Sam Holden said:
Let's hope you don't have any future perl problems/questions/issues since
the 'experts' of the group (of which I am not one, obviously) aren't going
to be reading them here...
That's fine. I just won't be able to learn anything else about perl,
or get to be part of these lively conversations.
 
