Here is a sample of my data:
Field 1 Field 2 Field 3 Field 4 Field 5
123 5675.68 5/24/03 Misc Misc2
E4678 345.76 6/23/02 Test Test2
123A 8756.67 7/3/03 Code Code2
0234 10456.45 6/4/02 Man Man2
234 456.34 10/5/02 Talk Talk2
675-02 1045.45 3/5/03 Level Level1
etc...
I would like to isolate only the records whose "Field 1" value is
"one-off" from another record's "Field 1" value. The results from the
above sample would look like this:
Field 1 Field 2 Field 3 Field 4 Field 5
123 5675.68 5/24/03 Misc Misc2
123A 8756.67 7/3/03 Code Code2
and
0234 10456.45 6/4/02 Man Man2
234 456.34 10/5/02 Talk Talk2
because the "Field 1" values in each pair are what I am calling "one-off."
You mean one character shorter, by either taking the first or
the last character off?
Or should 123 and 12X3 be "one off" too?
I'll assume the former.
The characters in "Field 1" can be anything.
I will assume "anything except whitespace" below.
Any help would be greatly appreciated.
--------------------------------------------------------
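#!/usr/bin/perl
use strict;
use warnings;

# Index each line by its first whitespace-delimited field ("Field 1").
my %seen;
while ( <DATA> ) {
    $seen{$1} = $_ if /^(\S+)/;
}

# For each key, look up the key minus its last and minus its first character.
my %reported;
foreach my $f1 ( sort keys %seen ) {
    foreach my $shorter ( substr($f1, 0, -1), substr($f1, 1) ) {
        # %reported keeps a pair from being printed twice when both
        # substrings are equal (e.g. "aaa" gives "aa" either way).
        if ( $seen{$shorter} and not $reported{ "$seen{$shorter}:$f1" }) {
            $reported{ "$seen{$shorter}:$f1" } = 1;
            print $seen{$shorter}, $seen{$f1}, "\n";
        }
    }
}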
__DATA__
123 5675.68 5/24/03 Misc Misc2
E4678 345.76 6/23/02 Test Test2
123A 8756.67 7/3/03 Code Code2
0234 10456.45 6/4/02 Man Man2
234 456.34 10/5/02 Talk Talk2
675-02 1045.45 3/5/03 Level Level1
--------------------------------------------------------
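If 123 and 12X3 should count as "one-off" too (one character removed
anywhere, not only at either end), the same hash-lookup idea still
works: generate every string that is one character shorter and check
whether it was seen. What follows is only a minimal sketch under that
assumption. The names one_shorter and one_off_pairs are made up here,
and the key list repeats the sample's "Field 1" values with 12X3 added
purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every distinct string obtained by
# deleting exactly one character from $key.
sub one_shorter {
    my ($key) = @_;
    my %variants;
    for my $i ( 0 .. length($key) - 1 ) {
        my $copy = $key;
        substr( $copy, $i, 1, '' );    # delete the character at position $i
        $variants{$copy} = 1;          # hash keys remove duplicates ("aa" from "aaa")
    }
    return sort keys %variants;
}

# Hypothetical helper: return [shorter, longer] pairs of keys where the
# shorter key is the longer key with one character deleted.
sub one_off_pairs {
    my %seen = map { $_ => 1 } @_;
    my @pairs;
    for my $key ( sort keys %seen ) {
        for my $shorter ( one_shorter($key) ) {
            push @pairs, [ $shorter, $key ] if $seen{$shorter};
        }
    }
    return @pairs;
}

# Illustrative keys only: the sample's "Field 1" values plus 12X3.
print "$_->[0] $_->[1]\n"
    for one_off_pairs(qw(123 E4678 123A 0234 234 675-02 12X3));
```

This only pairs up the keys. To print whole records as in the program
above, store each line in a hash keyed by "Field 1" and look the two
keys of every pair up there.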
[ snip TOFU.
Please learn the proper way of formatting followups.
]