TonyShirt
I'm trying to parse a file with tag=value relationships. The file
looks something like this:
PRIMER_SEQUENCE_ID=1568502
SEQUENCE=GAGCCCCCAGCCTCTGCACTTTCCACCAGCTCAGTCTCTAGGGCTTTATCTTTCTCTGTTCATTGTTACCCGTTGCCAGCTTTAGCTCAGTGTTGTGTGAGCCACACTTCTTCATCATTGAGGTGTTCCTGTTGTCAGTGGGCTAGCTAAGGGCAGAAGGGCATTCGTGGGATTTTAAAGAATTTATGGGACCAACATTCTTTCCGCCTTCAGCAGATACCGATTATGTTTCCAGGAGGTGGGATGTGCCAGAAGCCGTCACCTCTTTTTGTTTCTCCCCTGCCTGCCTTCTTTCTCTTTTCCTCTTTCTCAATAAACAGATACTGTCTGTGTGTCTGCCTCACCTAATCTAACCCTCAGATTGCAGACAGTGCTTTATTTAGACCCAAAG
TTATGAGTCCTGATTGTGTTTTCCTGCTGGTCCCATCTGCTGTCTGTCTTTCAGTGGGCATCCACCGTTGTGGACCCAGGGATGGTTATGGGAAGCAAAACGTCTCCCTTAATCATAAACAGTGTCTACCAGTGGAAGCCCATCGACCGAGGGATCAGAGGCCTCTCAGTAGTATTGTTTATTGCAGTTCCTTGGCAACATTGCAGAGAGGCAGTCAGGTTCTGAAATACAACTGAGGTTATTGGCAGGCTGAGGCCCTGGCACAGGCACCTTCTAGAATATCAGCTAGTGTCTTGGCTTTCCTCTGGGGGGATCCCGTTGCTGTTGTGTTACAGAAATGGTAGTTGTTTACTCCAACAGTCTGGATGACCGCATAGAGGAACTATTTCARTAGTGACTG
CATCATTTTTTTTTTAACCTCGTAAACCTTTCACAGTTCAGGGGCCTTGGATCTTATTTTGAAGACAGGTGCAAATTGGAAATAGCATTTGAATATGACCCGGCAAAGCATGATTGCTTCTTAAGCTCAAGTATGAGATCTGTTTTGCAATCAGCTTGTCCAAGATGGTTATCTCTTCACTGTCAAATCAAAGTGCTCTGCATGGTGTTTAGAGATTGGGATGGTGAGGAGAGAGCAAGCCTGGGTATGTGCATGCATCTGTTTATTCTAGGCTTCGTGCCTCCAGGAGCTTGGAGGTCAGCTTGTAGTATAATAAAATAGAAAACTATACAGCCGGGGAGAACAGAAGCAGAATAGAAGGATAAAGTGTTGTTCATATCTCTCGGGCAAATTTTACCCA
ATTCTGACGAGCAGTTACTGCACAAGCAACAACAAAGGGACCTAGAGTGTGTTCATTGCCAATTCTGTCCATTTGGCTACATAACTACTGTGAACCAATACAAAAAGATATGTATAAAC
PRIMER_LIBERAL_BASE=1
TARGET=782,1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=45
PRIMER_OPT_TM=58.0
PRIMER_MIN_TM=52
PRIMER_MAX_TM=90
PRIMER_MAX_DIFF_TM=10.0
PRIMER_MIN_GC=25.0
PRIMER_MAX_GC=70.0
....
PRIMER_INTERNAL_OLIGO_4_SELF_END=1.00
PRIMER_LEFT_4_END_STABILITY=7.9000
PRIMER_RIGHT_4_END_STABILITY=7.3000
PRIMER_PAIR_4_COMPL_ANY=6.00
PRIMER_PAIR_4_COMPL_END=0.00
PRIMER_PRODUCT_SIZE_4=145
=
PRIMER_SEQUENCE_ID=2273703
SEQUENCE=ACAGAAAAGAGTCTATGAAAGCATGGAATTCCATAAAAATAATTTCTGAATGTTCAGTGTSACTTCCATATGTGCTCAGCAGTCCAGCAAGGTGTACCTGAGCTCACTTCCTCTGTCACCC
PRIMER_LIBERAL_BASE=1
TARGET=60,1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=45
PRIMER_OPT_TM=58.0
PRIMER_MIN_TM=52
PRIMER_MAX_TM=90
PRIMER_MAX_DIFF_TM=10.0
PRIMER_MIN_GC=25.0
PRIMER_MAX_GC=70.0
PRIMER_NUM_NS_ACCEPTED=2
PRIMER_PRODUCT_SIZE_RANGE=61-175
....
and so on. Each record starts with the PRIMER_SEQUENCE_ID tag and ends
with =\n.
If you're familiar with BioPerl, it's the BoulderIO format. I'm aware of
the BioPerl modules, but they have too much detail and too little
documentation for this project.
I built a class to implement my parser sub:
package P3Wrapper;
use strict;
use Carp;
#Class constructor.
sub new{
my $self ={};
my $class = shift;
bless($self,$class);
return $self;
}#End New
sub ParseBIO{
my $self = shift;
my $BIOfile = shift;
my $tag = "";
my $value = "";
my $SeqID = "";
my %DataHash = ();
eval{
open(IN, $BIOfile) || croak "Can not open Boulder file $BIOfile. $!";
while(<IN>){
if ($_ =~ /(.+)=(.+)/){
$tag = $1;
$value = $2;
}
if ($tag eq "PRIMER_SEQUENCE_ID"){
$SeqID = $value;
}elsif($SeqID ne ""){
$DataHash{$SeqID} = {$tag => $value}; #I think the problem is here!
}
if ($_ =~ /=\n/){ #Reset, End of Record
$SeqID = "";
}
}#END while
close IN;
};#END EVAL
if ($@){
return (0,$@,undef);
}else{
return (1,"ok",\%DataHash);
}
}#End ParseBIO
and I built a little Driver to test it:
#use strict;
use diagnostics;
use P3Wrapper;
use Data::Dumper;
my $outputBIO = "Design_test/test3.P3OUT";
my $P3 = P3Wrapper->new("Design_test/");
my ($rc,$msg,$Hash) = $P3 ->ParseBIO($outputBIO);
print Dumper $Hash;
The idea for the parser was that it shouldn't matter which tags are
used; they are all recorded in the hash with their values. Notice the
commented-out pragma. I found that I was using symbolic references
quite by accident, and I had to take out the strict pragma. The problem
is that I am capturing only one tag per key. See below.
This is probably due to the way I'm adding each tag to the hash (the
"problem" comment). I'm guessing, but I think I'm only getting the last
tag in each sub-hash because of the implementation of the symbolic
references.
Here is what I get when I run the driver:
$VAR1 = {
'235546' => {
'PRIMER_PRODUCT_SIZE_4' => '140'
},
'1799929' => {
'PRIMER_ERROR' => 'INCLUDED_REGION length < min PRIMER_PRODUCT_SIZE_RANGE'
},
'5499' => {
'PRIMER_PRODUCT_SIZE_4' => '85'
}
};
Is there a way to incrementally add to the hash using symbolic tag
references? All the examples I looked at show the hash being
initialized all at once. Any help would be appreciated.
Thanks
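For reference, a minimal sketch of adding tags to the inner hash one at a
time. It rests on the observation that `$DataHash{$SeqID} = {$tag => $value}`
stores a brand-new one-element anonymous hash on every line, so only the last
tag of each record survives. Assigning to a single key of the inner hash
avoids that. The helper name `parse_boulder` and the in-memory sample data are
assumptions made for illustration; they are not part of P3Wrapper or BioPerl.
No symbolic references are involved, so the sketch runs under `use strict`.

```perl
use strict;
use warnings;

# Hypothetical helper: parse Boulder-IO style lines from a filehandle into
# a hash of hashes keyed by PRIMER_SEQUENCE_ID.
sub parse_boulder {
    my ($fh) = @_;
    my (%data, $seq_id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq '=') {        # a lone "=" ends the current record
            undef $seq_id;
            next;
        }
        # Split on the first "=" only, so a value may itself contain "=".
        my ($tag, $value) = $line =~ /^([^=]+)=(.*)$/ or next;
        if ($tag eq 'PRIMER_SEQUENCE_ID') {
            $seq_id = $value;
        }
        elsif (defined $seq_id) {
            # Add one key to the inner hash instead of replacing the whole
            # hash; Perl creates the inner hash on first use.
            $data{$seq_id}{$tag} = $value;
        }
    }
    return \%data;
}

# Usage with made-up sample data read from an in-memory filehandle.
my $sample = "PRIMER_SEQUENCE_ID=5499\nTARGET=60,1\nPRIMER_OPT_SIZE=18\n=\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $data = parse_boulder($fh);
print join(',', sort keys %{ $data->{5499} }), "\n";
```

In the original ParseBIO the equivalent one-line change would be
`$DataHash{$SeqID}{$tag} = $value;` in place of the line marked with the
"problem" comment.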