Mike said:
What I have is 24,000 lines of ICD9 index entries with appended codes
that used to be stored in a format like below and processed one time
per year by an OS390 (then printed out). These are index entries for
ICD9 codes that now have been moved to a web service. The web services
are all "code-centric" and the index entries are just properties of the
codes now. The gist of the problem is that I need to format each entry
(for example that of ablation) into a tree-view that users can peruse
for codes. I believe I can do this recursively be reading each string
until the first word changes then pulling the repeating first and
subsequent words (which are then the root). Then, remove the root and
process what is left of each substring in turn until the base case is
hit (no repeating substrings).
The owner of the webservices is not willing to change the format of the
data. I may be able to do a one-time process (Perl) or on-demand
depending on performance.
A tree view would look like
Ablation
Endometrial (Hysteroscopic) 68.23
Heart (Conduction Defect) 27.33/2
With Catheter 37.34/2
Inner Ear (Cryosurgery) (Ultrasound) 20.79/4
By Injection 20.72
Lesion Heart
By Peripherally Inserted Catheter 37.34
etc etc.....
The raw index entries would look like below.
ABLATION ENDOMETRIAL (HYSTEROSCOPIC) 68.23
ABLATION HEART (CONDUCTION DEFECT) 37.33/2
ABLATION HEART (CONDUCTION DEFECT) WITH CATHETER 37.34/2
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) 20.79/4
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) BY INJECTION 20.72
ABLATION LESION HEART BY PERIPHERALLY INSERTED CATHETER 37.34
ABLATION LESION HEART ENDOVASCULAR APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) ENDOVASCULAR
APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) OPEN (TRANS-THORACIC)
APPROACH 37.33
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) TRANS-THORACIC
APPROACH 37.33
ABLATION PITUITARY 7.69
ABLATION PITUITARY BY COBALT-60 92.32
ABLATION PITUITARY BY IMPLANTATION (STRONTIUM-YTTRIUM) (Y) NEC 92.39
ABLATION PITUITARY BY PROTON BEAM (BRAGG PEAK) 92.33
ABLATION PROSTATE (ANAT = 59.02) BY LASER, TRANSURETHRAL 60.21
ABLATION PROSTATE (ANAT = 59.02) BY RADIOFREQUENCY THERMOTHERAPY 60.97
ABLATION PROSTATE (ANAT = 59.02) BY TRANSURETHRAL NEEDLE ABLATION
(TUNA) 60.97
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY CRYOABLATION 60.62
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.62
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL BY LASER 60.21
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL CRYOABLATION 60.29
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.29
ABLATION TISSUE HEART - SEE ABLATION, LESION, HEART 0
ABLATION VESICLE NECK (ANAT = 60.02) 57.91
Ho hum. I was bored so I farted around with this for a bit.
Not particularly elegant or fast but...
use warnings;
use strict;
my @file;
my $lastline = '';
my $partline = '';
while ( my $line = <DATA> ) {
chomp $line;
$line = "$partline $line" if length $partline;
if ( $line =~ /\D$/ ) {
$partline = $line;
next;
}
else {
$partline = '';
}
$line =~ s/(\w+('\w+)?)/\u\L$1/g;
push @file, $line;
}
my $level = 0;
my $prefix = '';
my @step;
my $prev = shift @file;
my $tab = ' ';
while (@file) {
( $level, $prefix, $prev ) =
buildtree( $level, $prefix, shift @file, $prev );
}
buildtree( $level, $prefix, $prefix, $prev );
sub buildtree {
my ( $level, $prefix, $next, $prev ) = @_;
my $common = join ' ', greatest_common_prefix( $prev, $next );
if ( $common eq $prefix ) {
$prev =~ s/^\Q$prefix\E\s*//;
print $tab x $level, $prev, "\n";
}
elsif ( length $common > length $prefix ) {
$prev =~ s/^\Q$common\E\s*//;
my $trim = $common;
$common =~ s/^\Q$prefix\E\s*//;
push @step, $common;
print $tab x $level, $common;
if ( $prev !~ /[ \p{Alpha}]/ ) {
print "\t$prev\n";
$level = @step;
}
else {
$level = @step;
print "\n", $tab x $level, $prev, "\n";
}
$prefix = $trim;
}
elsif ( length $common < length $prefix ) {
$prev =~ s/^\Q$prefix\E\s*//;
print $tab x $level, $prev, "\n";
my @newstep;
my $test = $next;
for (@step) {
last unless $test =~ s/\Q$_\E\s*//;
push @newstep, $_;
}
@step = @newstep;
$level = @step;
$prefix = $common;
}
return ( $level, $prefix, $next );
}
sub greatest_common_prefix {
no warnings 'uninitialized';
my ( $first, $second ) = @_;
my @first = split ' ', $first;
my @second = split ' ', $second;
my @gcp;
for (@first) {
if ( $_ eq shift @second ) {
push @gcp, $_;
}
else {
last;
}
}
return @gcp;
}
__DATA__
ABLATION ENDOMETRIAL (HYSTEROSCOPIC) 68.23
ABLATION HEART (CONDUCTION DEFECT) 37.33/2
ABLATION HEART (CONDUCTION DEFECT) WITH CATHETER 37.34/2
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) 20.79/4
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) BY INJECTION 20.72
ABLATION LESION HEART BY PERIPHERALLY INSERTED CATHETER 37.34
ABLATION LESION HEART ENDOVASCULAR APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) ENDOVASCULAR
APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) OPEN (TRANS-THORACIC)
APPROACH 37.33
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) TRANS-THORACIC
APPROACH 37.33
ABLATION PITUITARY 7.69
ABLATION PITUITARY BY COBALT-60 92.32
ABLATION PITUITARY BY IMPLANTATION (STRONTIUM-YTTRIUM) (Y) NEC 92.39
ABLATION PITUITARY BY PROTON BEAM (BRAGG PEAK) 92.33
ABLATION PROSTATE (ANAT = 59.02) BY LASER, TRANSURETHRAL 60.21
ABLATION PROSTATE (ANAT = 59.02) BY RADIOFREQUENCY THERMOTHERAPY 60.97
ABLATION PROSTATE (ANAT = 59.02) BY TRANSURETHRAL NEEDLE ABLATION
(TUNA) 60.97
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY CRYOABLATION 60.62
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.62
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL BY LASER 60.21
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL CRYOABLATION 60.29
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.29
ABLATION TISSUE HEART - SEE ABLATION, LESION, HEART 0
ABLATION VESICLE NECK (ANAT = 60.02) 57.91