Hashes and symbolic references

TonyShirt

I'm trying to parse a file with tag=value relationships. The file
looks something like this:

PRIMER_SEQUENCE_ID=1568502
SEQUENCE=GAGCCCCCAGCCTCTGCACTTTCCACCAGCTCAGTCTCTAGGGCTTTATCTTTCTCTGTTCATTGTTACCCGTTGCCAGCTTTAGCTCAGTGTTGTGTGAGCCACACTTCTTCATCATTGAGGTGTTCCTGTTGTCAGTGGGCTAGCTAAGGGCAGAAGGGCATTCGTGGGATTTTAAAGAATTTATGGGACCAACATTCTTTCCGCCTTCAGCAGATACCGATTATGTTTCCAGGAGGTGGGATGTGCCAGAAGCCGTCACCTCTTTTTGTTTCTCCCCTGCCTGCCTTCTTTCTCTTTTCCTCTTTCTCAATAAACAGATACTGTCTGTGTGTCTGCCTCACCTAATCTAACCCTCAGATTGCAGACAGTGCTTTATTTAGACCCAAAG
TTATGAGTCCTGATTGTGTTTTCCTGCTGGTCCCATCTGCTGTCTGTCTTTCAGTGGGCATCCACCGTTGTGGACCCAGGGATGGTTATGGGAAGCAAAACGTCTCCCTTAATCATAAACAGTGTCTACCAGTGGAAGCCCATCGACCGAGGGATCAGAGGCCTCTCAGTAGTATTGTTTATTGCAGTTCCTTGGCAACATTGCAGAGAGGCAGTCAGGTTCTGAAATACAACTGAGGTTATTGGCAGGCTGAGGCCCTGGCACAGGCACCTTCTAGAATATCAGCTAGTGTCTTGGCTTTCCTCTGGGGGGATCCCGTTGCTGTTGTGTTACAGAAATGGTAGTTGTTTACTCCAACAGTCTGGATGACCGCATAGAGGAACTATTTCARTAGTGACTG
CATCATTTTTTTTTTAACCTCGTAAACCTTTCACAGTTCAGGGGCCTTGGATCTTATTTTGAAGACAGGTGCAAATTGGAAATAGCATTTGAATATGACCCGGCAAAGCATGATTGCTTCTTAAGCTCAAGTATGAGATCTGTTTTGCAATCAGCTTGTCCAAGATGGTTATCTCTTCACTGTCAAATCAAAGTGCTCTGCATGGTGTTTAGAGATTGGGATGGTGAGGAGAGAGCAAGCCTGGGTATGTGCATGCATCTGTTTATTCTAGGCTTCGTGCCTCCAGGAGCTTGGAGGTCAGCTTGTAGTATAATAAAATAGAAAACTATACAGCCGGGGAGAACAGAAGCAGAATAGAAGGATAAAGTGTTGTTCATATCTCTCGGGCAAATTTTACCCA
ATTCTGACGAGCAGTTACTGCACAAGCAACAACAAAGGGACCTAGAGTGTGTTCATTGCCAATTCTGTCCATTTGGCTACATAACTACTGTGAACCAATACAAAAAGATATGTATAAAC
PRIMER_LIBERAL_BASE=1
TARGET=782,1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=45
PRIMER_OPT_TM=58.0
PRIMER_MIN_TM=52
PRIMER_MAX_TM=90
PRIMER_MAX_DIFF_TM=10.0
PRIMER_MIN_GC=25.0
PRIMER_MAX_GC=70.0
....
PRIMER_INTERNAL_OLIGO_4_SELF_END=1.00
PRIMER_LEFT_4_END_STABILITY=7.9000
PRIMER_RIGHT_4_END_STABILITY=7.3000
PRIMER_PAIR_4_COMPL_ANY=6.00
PRIMER_PAIR_4_COMPL_END=0.00
PRIMER_PRODUCT_SIZE_4=145
=
PRIMER_SEQUENCE_ID=2273703
SEQUENCE=ACAGAAAAGAGTCTATGAAAGCATGGAATTCCATAAAAATAATTTCTGAATGTTCAGTGTSACTTCCATATGTGCTCAGCAGTCCAGCAAGGTGTACCTGAGCTCACTTCCTCTGTCACCC
PRIMER_LIBERAL_BASE=1
TARGET=60,1
PRIMER_OPT_SIZE=18
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=45
PRIMER_OPT_TM=58.0
PRIMER_MIN_TM=52
PRIMER_MAX_TM=90
PRIMER_MAX_DIFF_TM=10.0
PRIMER_MIN_GC=25.0
PRIMER_MAX_GC=70.0
PRIMER_NUM_NS_ACCEPTED=2
PRIMER_PRODUCT_SIZE_RANGE=61-175
....

and so on. Each record starts with the SEQUENCE tag and ends with
=\n.
If you're familiar with BioPerl, it's the BoulderIO format. I'm aware of
the BioPerl modules, but they have too much detail and too little
documentation for this project.


I built a class to implement my parser sub:

package P3Wrapper;

use strict;
use Carp;

#Class constructor.
sub new{

    my $self ={};
    my $class = shift;
    bless($self,$class);
    return $self;

}#End New

sub ParseBIO{
    my $self = shift;
    my $BIOfile = shift;
    my $tag = "";
    my $value = "";
    my $SeqID = "";
    my %DataHash = ();

    eval{
        open(IN, $BIOfile) || croak "Can not open Boulder file $BIOfile. $!";

        while(<IN>){

            if ($_ =~ /(.+)=(.+)/){

                $tag = $1;
                $value = $2;
            }

            if ($tag eq "PRIMER_SEQUENCE_ID"){

                $SeqID = $value;

            }elsif($SeqID ne ""){

                $DataHash{$SeqID} = {$tag => $value}; #I think the problem is here!
            }

            if ($_ =~ /=\n/){ #Reset, End of Record
                $SeqID = "";

            }

        }#END while

        close IN;
    };#END EVAL

    if ($@){
        return (0,$@,undef);
    }else{
        return (1,"ok",\%DataHash);
    }

}#End ParseBIO

And I built a little driver to test it:

#use strict;
use diagnostics;
use P3Wrapper;
use Data::Dumper;

my $outputBIO = "Design_test/test3.P3OUT";

my $P3 = P3Wrapper->new("Design_test/");
my ($rc,$msg,$Hash) = $P3 ->ParseBIO($outputBIO);

print Dumper $Hash;

The idea for the parser was that it didn't matter which tags were
used; they were recorded in the hash with their values. Notice the
commented-out pragma. I found that I was using symbolic references
quite by accident, and I had to take out the strict pragma. The problem
is that I am capturing only one tag per key (see below). This is
probably due to the way I'm adding each tag to the hash (the "problem"
comment). I'm guessing, but I think I'm only getting the last tag in
each sub-hash because of the implementation of the symbolic references.
Here is what I get when I run the driver:

$VAR1 = {
          '235546' => {
                        'PRIMER_PRODUCT_SIZE_4' => '140'
                      },
          '1799929' => {
                         'PRIMER_ERROR' => 'INCLUDED_REGION length < min PRIMER_PRODUCT_SIZE_RANGE'
                       },
          '5499' => {
                      'PRIMER_PRODUCT_SIZE_4' => '85'
                    }
        };

Is there a way to incrementally add to the hash using symbolic tag
references? All the examples I looked at show the hash being
initialized all at once. Any help would be appreciated.

Thanks
 

Sam Holden

I'm trying to parse a file with tag=value relationships. The file
looks something like this:

PRIMER_SEQUENCE_ID=1568502
SEQUENCE=GAGCCCCCAGCCTCTGCACTTTCCACCAGCTCAGTCTCTAGGGCTTTATCTTTCTCTGTTCATTGTTACCCGTTGCCAGCTTTAGCTCAGTGTTGTGTGAGCCACACTTCTTCATCATTGAGGTGTTCCTGTTGTCAGTGGGCTAGCTAAGGGCAGAAGGGCATTCGTGGGATTTTAAAGAATTTATGGGACCAACATTCTTTCCGCCTTCAGCAGATACCGATTATGTTTCCAGGAGGTGGGATGTGCCAGAAGCCGTCACCTCTTTTTGTTTCTCCCCTGCCTGCCTTCTTTCTCTTTTCCTCTTTCTCAATAAACAGATACTGTCTGTGTGTCTGCCTCACCTAATCTAACCCTCAGATTGCAGACAGTGCTTTATTTAGACCCAAAG
TTATGAGTCCTGATTGTGTTTTCCTGCTGGTCCCATCTGCTGTCTGTCTTTCAGTGGGCATCCACCGTTGTGGACCCAGGGATGGTTATGGGAAGCAAAACGTCTCCCTTAATCATAAACAGTGTCTACCAGTGGAAGCCCATCGACCGAGGGATCAGAGGCCTCTCAGTAGTATTGTTTATTGCAGTTCCTTGGCAACATTGCAGAGAGGCAGTCAGGTTCTGAAATACAACTGAGGTTATTGGCAGGCTGAGGCCCTGGCACAGGCACCTTCTAGAATATCAGCTAGTGTCTTGGCTTTCCTCTGGGGGGATCCCGTTGCTGTTGTGTTACAGAAATGGTAGTTGTTTACTCCAACAGTCTGGATGACCGCATAGAGGAACTATTTCARTAGTGACTG

[snip more of the same]

I don't see any handling of multi-line fields in the code you posted.

Or was that just Usenet wrapping?

[snip more key=value lines]

[snip code]
        while(<IN>){

            if ($_ =~ /(.+)=(.+)/){

                $tag = $1;
                $value = $2;
            }

            if ($tag eq "PRIMER_SEQUENCE_ID"){

                $SeqID = $value;

            }elsif($SeqID ne ""){

                $DataHash{$SeqID} = {$tag => $value}; #I think the problem is here!

That assigns a reference to a brand new hash, containing just that one
key/value pair, to $DataHash{$SeqID}. You want something like:

$DataHash{$SeqID}{$tag} = $value;

That just adds another key/value pair to the existing hash (and
auto-vivifies the hash reference the first time through).
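
To make the difference concrete, here is a minimal sketch that runs both
assignments side by side. The record ID and the three tag/value pairs are
taken from the sample data above; the variable names (%overwritten,
%accumulated, @pairs) are made up for illustration only.

```perl
use strict;
use warnings;

# Illustrative tag/value pairs for a single record.
my @pairs = (
    [ 'PRIMER_OPT_SIZE', 18 ],
    [ 'PRIMER_MIN_SIZE', 15 ],
    [ 'PRIMER_MAX_SIZE', 45 ],
);

my (%overwritten, %accumulated);
my $id = '1568502';

for my $pair (@pairs) {
    my ($tag, $value) = @$pair;

    # Replaces the whole inner hash on every pass, so only the last pair survives.
    $overwritten{$id} = { $tag => $value };

    # Adds one key to the inner hash, creating it on first use (autovivification).
    $accumulated{$id}{$tag} = $value;
}

printf "overwritten: %d tag(s)\n", scalar keys %{ $overwritten{$id} };
printf "accumulated: %d tag(s)\n", scalar keys %{ $accumulated{$id} };
```

The first hash ends up with one tag per ID, which matches the Dumper
output in the question; the second keeps all three.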

[snip code]
And I built a little driver to test it:

#use strict;
use diagnostics;
use P3Wrapper;
use Data::Dumper;

my $outputBIO = "Design_test/test3.P3OUT";

my $P3 = P3Wrapper->new("Design_test/");
my ($rc,$msg,$Hash) = $P3 ->ParseBIO($outputBIO);

print Dumper $Hash;

The idea for the parser was that it didn't matter which tags were
used; they were recorded in the hash with their values. Notice the
commented-out pragma. I found that I was using symbolic references
quite by accident, and I had to take out the strict pragma. The problem
is that I am capturing only one tag per key (see below). This is
probably due to the way I'm adding each tag to the hash (the "problem"
comment). I'm guessing, but I think I'm only getting the last tag in
each sub-hash because of the implementation of the symbolic references.
Here is what I get when I run the driver:

There are no symbolic references that I can see. I ran the code with
use strict without any problems.
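
For comparison, a small sketch of what a symbolic reference actually
looks like next to the hard references a hash of hashes uses. The
variable names ($colour, $name, %data) are invented for the example and
are not from the posted code.

```perl
use strict;
use warnings;

our $colour = 'red';      # package variable, reachable by its name
my $name = 'colour';

# Symbolic reference: a plain string used as a variable name.
# "use strict" forbids this, so strict refs is switched off just for this block.
my $via_symbolic = do { no strict 'refs'; ${$name} };

# Hard reference: a real reference value. A hash of hashes stores these,
# and they work fine under "use strict".
my %data  = ( '1568502' => { TARGET => '782,1' } );
my $inner = $data{'1568502'};
my $via_hard = $inner->{TARGET};

print "symbolic: $via_symbolic\n";
print "hard: $via_hard (", ref($inner), ")\n";
```

Using a string such as $SeqID or $tag as a hash key is neither of these;
it is an ordinary hash lookup.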

That leaves handling those long lines, which is easy enough: when you
see a continuation line, you append it to the value of the last key you
used. That requires keeping track of the last key used, but that's no
biggy; in fact your code already does, because $tag is scoped outside
the loop. Speaking of that, you should scope things in the smallest
scope they can be in. You don't currently use $tag and $value outside
the loop, so they should be declared with my inside the loop.
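
A rough sketch of that idea, not a drop-in replacement for ParseBIO. It
assumes a line without "=" is a continuation of the previous value and
that continuation lines never contain "=" themselves. The sample data is
shortened and read from an in-memory filehandle, and the names %data,
$seq_id and $last_tag are invented for the example.

```perl
use strict;
use warnings;

# Shortened, made-up sample with a wrapped SEQUENCE value.
my $sample = <<'END';
PRIMER_SEQUENCE_ID=1568502
SEQUENCE=GAGCCC
TTATGA
TARGET=782,1
=
PRIMER_SEQUENCE_ID=2273703
SEQUENCE=ACAGAA
=
END

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";

my %data;
my ($seq_id, $last_tag);    # these must survive from one line to the next

while (my $line = <$in>) {
    chomp $line;

    if ($line eq '=') {                       # end of record: reset state
        ($seq_id, $last_tag) = (undef, undef);
    }
    elsif ($line =~ /^([^=]+)=(.*)$/) {       # ordinary tag=value line
        my ($tag, $value) = ($1, $2);         # only needed inside the loop
        if ($tag eq 'PRIMER_SEQUENCE_ID') {
            $seq_id = $value;
        }
        elsif (defined $seq_id) {
            $data{$seq_id}{$tag} = $value;
            $last_tag = $tag;
        }
    }
    elsif (defined $seq_id && defined $last_tag) {
        $data{$seq_id}{$last_tag} .= $line;   # continuation of previous value
    }
}
close $in;

print "$data{1568502}{SEQUENCE}\n";
```

Here $tag and $value live inside the loop, while the record ID and the
last tag seen stay outside because a continuation line needs them.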
 
