Continue to Search file after matching a value

deadpickle

This is what I want the program to do:
1. Read in a file containing:
KBQP 071845Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KBQP 071905Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KHLR 071856Z AUTO 19010KT 10SM CLR 22/13 A3007 RMK AO2 SLP179
KBQP 071925Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
2. Search for strings beginning with K at the beginning
EXAMPLE: next unless $obs =~ m/^(K)/;
3. When they are found load them into an array
EXAMPLE: my @a = split(" ", $obs);
4. Next, continue to search the file. But instead of searching for any
string beginning with K, it instead searches for any string beginning
with the first four letters of the station ID.
EXAMPLE: First time through finds that $a[0] = KBQP
Continues through the file until it finds
another KBQP
5. After it has found another station ID with the same name as shown
in #4, it then checks the next value in BOTH arrays and compares them.
The object is to see which string is the newest.
6. The newer string gets written to the array and the program continues
to search for the same ID.
7. If no other similar IDs are found, go back to step 1.

The flow should look something like this:
Search for "K"
Found "KBQP" -> Load into array
Search for "KBQP"
Found "KBQP" -> comparing times of the observations
Second "KBQP" found is newer -> replacing the array with newer "KBQP"
Search for "KBQP"
If No newer "KBQP" found -> Search For "K"

I hope this is clear. My problem is that I have no clue how to do this
after step 3. Any help would be appreciated.

Code So far:
==============================================================
use strict;
use warnings;
$\ = "\n";
my $wmo = "07020719f.wmo";
open OUT,'>', 'sub.txt' or die "cannot open 'sub.txt' $!";
open WMO, '<', $wmo;
while (my $obs = <WMO>) {
next unless $obs =~ m/^(K)/;
my @a = split(" ", $obs);


}
close OUT;
 
xhoster

deadpickle said:
This is what I want the program to do:
1. Read in a file containing:
KBQP 071845Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KBQP 071905Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KHLR 071856Z AUTO 19010KT 10SM CLR 22/13 A3007 RMK AO2 SLP179
KBQP 071925Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
2. Search for strings beginning with K at the beginning

In your example, that is all of them.
EXAMPLE: next unless $obs =~ m/^(K)/;
3. When they are found load them into an array
EXAMPLE: my @a = split(" ", $obs);
4. Next, continue to search the file. But instead of searching for any
string beginning with K, it instead searches for any string beginning
with the first four letters of the station ID.

So then "last" out of the loop and start a different loop.
EXAMPLE: First time through finds that $a[0] = KBQP
Continues through the file until it finds
another KBQP
5. After it has found another station ID with the same name as shown
in #4, it then checks the next value in BOTH arrays and compares them.
The object is to see which string is the newest.
6. The newer string gets written to the array and the program continues
to search for the same ID.
7. If no other similar IDs are found, go back to step 1.

"similar" ne "same". Which do you want?

At this point, you've reached the end of the file, so how does one go back
to step 1? There is nothing left to process. Do you have to rewind in the
file to some previously remembered landmark? If so, see "tell" and "seek".
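If rewinding really were needed, a minimal sketch of "tell" and "seek" might look like the following. The in-memory file and the variable names ($data, $mark, $again) are made up for illustration and are not part of the original program.

```perl
use strict;
use warnings;

# Made-up sample data, read through an in-memory filehandle
my $data = "KBQP 071845Z\nKHLR 071856Z\nKBQP 071905Z\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $first = <$fh>;          # read the first line
my $mark  = tell $fh;       # remember the position just after it
my @rest  = <$fh>;          # read on to the end of the file

seek $fh, $mark, 0 or die "cannot seek: $!";   # whence 0 = from start of file
my $again = <$fh>;          # re-reads the line that follows the landmark
close $fh;
```

Every rewind means re-reading part of the file, which is the cost the hash approach avoids.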

But really, if you would need to rewind, I think you are going about this
fundamentally the wrong way. Use a hash on the station name, and store
in it the "newest" string encountered so far. At the end, print out all
these station/string pairs.
Code So far:
==============================================================
use strict;
use warnings;
$\ = "\n";
my $wmo = "07020719f.wmo";
open OUT,'>', 'sub.txt' or die "cannot open 'sub.txt' $!";
open WMO, '<', $wmo;

my %station;
while (my $obs = <WMO>) {
next unless $obs =~ m/^(K)/;
my @a = split(" ", $obs);

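chomp $obs; # drop the newline, since $\ adds one back on output
my ($station, $other) = split / /, $obs,2;
# newer_than() still has to be written: it should return true when
# its first argument is the newer observation
if (not exists $station{$station} or
newer_than($other,$station{$station}) )
{ $station{$station}=$other; };
}
close OUT;

while (my ($k,$v)=each %station) {
print "$k\t$v"; #or whatever format you want
};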
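
The loop above calls newer_than(), which is not defined anywhere in this thread. One possible sketch follows. It assumes each argument starts with a DDHHMMZ time group (as in the sample data) and that all reports fall in the same month. The sub body is a guess at what is wanted, not a definitive implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the first report is newer than the second.
# Each argument is the rest of a line after the station ID,
# e.g. "071905Z AUTO 35003KT ...".
# Assumption: no rollover from the end of one month to the start of the next.
sub newer_than {
    my ($new, $old) = @_;
    my ($new_time) = $new =~ /^(\d{6})Z/ or return 0;  # no time group: not newer
    my ($old_time) = $old =~ /^(\d{6})Z/ or return 1;  # old one unusable: replace it
    return $new_time > $old_time;
}
```

If reports can span a month boundary, the day-of-month alone is not enough and the date has to come from somewhere else, such as the file name.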

Xho
 
usenet

4. Next, continue to search the file. But instead of searching for any
string beginning with K, it instead searches for any string beginning
with the first four letters of the station ID. ....
7. If no other similar IDs are found, go back to step 1.

Gah! That's a convoluted mess! Run away, run away!

Just use a hash to keep track of the highest found value for each
identifier as you iterate over the file (once), such as:

#!/usr/bin/perl
use strict; use warnings;

my %info;
while (<DATA>) {
my $letters = (split)[0];
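# keep the line if this is the first one seen for the station, or if it
# sorts later string-wise (same station ID, so the time group decides)
$info{$letters} = $_ if !exists $info{$letters} or $info{$letters} lt $_;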
}
print sort values %info;

__DATA__
KBQP 071845Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KBQP 071905Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KHLR 071856Z AUTO 19010KT 10SM CLR 22/13 A3007 RMK AO2 SLP179
KBQP 071925Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
 
Tad McClellan

deadpickle said:
This is what I want the program to do:


Please see the Posting Guidelines that are posted here frequently.

1. Read in a file containing:
KBQP 071845Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KBQP 071905Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KHLR 071856Z AUTO 19010KT 10SM CLR 22/13 A3007 RMK AO2 SLP179
KBQP 071925Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=


You should use the __DATA__ token for providing file data.

If your aim is to find the "newest" one, then your sample data
should probably NOT be sorted oldest-to-newest...

2. Search for strings beginning with K at the beginning
EXAMPLE: next unless $obs =~ m/^(K)/;
3. When they are found load them into an array
EXAMPLE: my @a = split(" ", $obs);


Is it OK if you can accomplish what you want without loading
it into an array?

4. Next, continue to search the file. But instead of searching for any
string beginning with K, it instead searches for any string beginning
with the first four letters of the station ID.
EXAMPLE: First time through finds that $a[0] = KBQP
Continues through the file until it finds
another KBQP
5. After it has found another station ID with the same name as shown
in #4, it then checks the next value in BOTH arrays and compares them.
The object is to see which string is the newest.


You have not mentioned how to interpret what is "newest", so I'll
just go with the greatest string-wise.
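
A small sketch of why string-wise comparison is enough here, assuming the time group is always the fixed-width, zero-padded DDHHMMZ form shown in the sample data. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Time groups taken from the sample data
my @times = ('071905Z', '071845Z', '071925Z');

# Fixed-width, zero-padded strings sort in the same order as the times
my ($newest) = sort { $b cmp $a } @times;

# Caveat: across a month boundary the 1st compares as "older" than the 31st
my $rollover_ok = ('010005Z' gt '312355Z') ? 1 : 0;
```

So 'gt' works as long as all reports come from the same month.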

6. The newer string gets written to the array and the program continues
to search for the same ID.
7. If no other similar IDs are found, go back to step 1.


Blech!

If there are 50 radio stations your algorithm will read the same
file 50 times?

That is just toooo wasteful.

The flow should look something like this:


Why do you care what the flow looks like?

Shouldn't you instead care about whether or not it makes the
correct output, even if it uses a different flow?

I hope this is clear.


You want the lines with the greatest (newest) time for each
radio station that appears in the data.

Right?

My problem is that I have no clue how to do this
after step 3. Any help would be appreciated.

Code So far:
==============================================================
use strict;
use warnings;


Good. Very good.

Thank you.

$\ = "\n";
open OUT,'>', 'sub.txt' or die "cannot open 'sub.txt' $!";


Your code never makes any output, so those two lines are not
necessary to illustrate your problem.

If you choose an appropriate data structure, the algorithm gets
quite simple:

---------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my %stations;
while ( <DATA> ) {
next unless /^(K[A-Z]+)\s+(\S+)/; # skip lines without a leading "K" station ID and a time

if ( not exists $stations{$1}
or
$2 gt $stations{$1}{time}) {
$stations{$1}{time} = $2;
$stations{$1}{line} = $_;
}
}

foreach my $station ( keys %stations ) {
print $stations{$station}{line};
}


__DATA__
KBQP 071845Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KBQP 071905Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KHLR 071900Z AUTO 19010KT 10SM CLR 22/13 A3007 RMK AO2 SLP179
KBQP 071925Z AUTO 35003KT 7SM OVC012 14/12 A3018 RMK AO2=
KHLR 071856Z AUTO 19010KT 10SM CLR 22/13 A3007 RMK AO2 SLP179
 
Tad McClellan

deadpickle said:
open OUT,'>', 'sub.txt' or die "cannot open 'sub.txt' $!";
open WMO, '<', $wmo;


You should always, yes *always*, check the return value from open().

You are already checking the 1st one, why stop when you got to the 2nd one?
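
A sketch of what a checked open might look like. The helper open_or_die() and the file name 'no-such-file.wmo' are made up for illustration; in the original program a plain "or die" after the second open would do the same job.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): open a file or die
# with the file name and the system error in the message.
sub open_or_die {
    my ($mode, $path) = @_;
    open my $fh, $mode, $path
        or die "cannot open '$path' (mode '$mode'): $!\n";
    return $fh;
}

# 'no-such-file.wmo' is assumed not to exist, to show the failure path
my $fh = eval { open_or_die('<', 'no-such-file.wmo') };
my $error = $@;
print "caught: $error" unless $fh;
```

Without the check, a failed open only shows up later as a read loop that silently processes nothing.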
 
