Cheez
Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:
>gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]
I want the regex to:
1. capture the 7-digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".
I am a little overwhelmed by all the m// and s/// modifiers. Any nudge
in the right direction about developing a regex would be greatly
appreciated.
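
As a nudge on the modifiers: for a line like this, the /x modifier is probably the only one needed, because it lets the pattern be spread over several lines with a comment on each piece. The following is only a sketch, assuming every header has exactly the gi|number|ref|accession| layout shown above and ends with [SC]. The variable names ($line, $gi, @words) are made up for illustration, the leading ">" is treated as optional, and \d+ could be tightened to \d{7} if the number really is always 7 digits.

```perl
use strict;
use warnings;

# Sample header line from the post (the leading '>' is treated as optional).
my $line = 'gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';

my ($gi, @words);
if ($line =~ m{
        ^>?gi\|        # optional '>' then literal 'gi|' (the pipe is escaped)
        (\d+)\|        # capture 1: the GI number
        ref\|[^|]+\|   # skip the 'ref|NP_...|' accession field
        \s*(.*?)\s*    # capture 2: the description, without outer spaces
        \[SC\]         # stop at the literal '[SC]'
    }x) {
    $gi    = $1;
    @words = split /\W+/, $2;   # break the description into "words"
    print "$gi => @words\n";
}
```

Splitting on /\W+/ rather than on a single space avoids the empty fields that appear when punctuation such as ";" sits next to a space.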
I will post my code but it's really lame!
Thanks,
Cheez
=====================
use strict;
use warnings;

my $flatfile = "I.faa";
my $parsed   = "parsed.txt"; # output file name (was undefined in the die message)
open(my $in, '<', $flatfile) || die "Can't open '$flatfile': $!\n";
my @test2 = <$in>;
close($in);
my @newtest;
foreach (@test2) {
    chomp;
    s/\W/ /g; # getting rid of non word chunks..not sure it helps
    push @newtest, split(/ /);
}
open(my $out, '>', $parsed) || die "Can't open '$parsed': $!\n";
print $out "$_\n" for @newtest;
close($out);
print scalar(@newtest), "\n"; # checking that the array is populated
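
One possible way to get from the script above to the hash asked about is to match each header line as it is read instead of flattening every line into words first. This is a sketch under assumptions, not a definitive solution: parse_headers and %words_for are hypothetical names, the header layout is assumed to be exactly the one in the sample line, and the second line of the demo input is a made-up placeholder standing in for sequence data.

```perl
use strict;
use warnings;

# Hypothetical helper: read header lines from a filehandle and return a hash
# mapping each GI number to an array ref of the description words.
sub parse_headers {
    my ($fh) = @_;
    my %words_for;
    while (my $line = <$fh>) {
        chomp $line;
        # Skip sequence lines and anything else that is not a matching header.
        next unless $line =~ /^>?gi\|(\d+)\|ref\|[^|]+\|\s*(.*?)\s*\[SC\]/;
        $words_for{$1} = [ split /\W+/, $2 ];
    }
    return %words_for;
}

# Demo on an in-memory filehandle; for real data, open 'I.faa' instead.
my $sample = ">gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]\n"
           . "MVKLTSIAAG\n";   # made-up placeholder sequence line
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!\n";
my %words_for = parse_headers($fh);
close($fh);

for my $gi (sort keys %words_for) {
    print "$gi: @{ $words_for{$gi} }\n";
}
```

Reading line by line with while also avoids holding the whole flatfile in an array, which may matter if the file is large.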