string capture regex

Cheez

Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:
gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".

I am a little overwhelmed by all the m// and s/// modifiers. Any nudge
in the right direction about developing a regex would be greatly
appreciated.

I will post my code but it's really lame!

Thanks,
Cheez

=====================
$flatfile = "I.faa";

open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";

@test2 = <FILE>;

close(FILE);

foreach (@test2) {

chomp;

$_ =~ s/\W/ /g; # getting rid of non word chunks..not sure it helps

push @newtest, split(/ /);

}

open (FILE, ">parsed.txt") || die "Can't open 'parsed.txt': $!\n";

print FILE "$_\n" for @newtest;

close(FILE);

print scalar(@newtest); # checking that the array is populated
 
Sam Holden

Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:
gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".

my %hash;
while (<>) {
chomp;
my (undef, $number, undef, undef, $words) = split /\|/;
$words=~s/\s*\[SC\]$//;
$hash{$number} = $words;
}

I am a little overwhelmed by all the m// and s/// modifiers. Any nudge
in the right direction about developing a regex would be greatly
appreciated.

Just use split (which does use a regex but a very simple one).

[snip code]
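
To see what the split approach produces on the sample line, here is a minimal, self-contained sketch. The sub name parse_gi_line is hypothetical, and the extra trim of leading whitespace is an addition that is not in the code above (the field after the last "|" starts with a space). If the real file lines begin with ">gi|" rather than "gi|", the first field simply becomes ">gi" and the rest is unaffected.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (number, words) for one header line,
# or an empty list if the line does not have five '|'-separated fields.
sub parse_gi_line {
    my ($line) = @_;
    chomp $line;
    # Fields: gi, number, ref, accession, description
    my (undef, $number, undef, undef, $words) = split /\|/, $line;
    return unless defined $words;
    $words =~ s/^\s+//;            # drop the space after the last '|'
    $words =~ s/\s*\[SC\]\s*$//;   # drop the trailing "[SC]" tag
    return ($number, $words);
}

my %hash;
my $sample = 'gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';
my ($number, $words) = parse_gi_line($sample);
$hash{$number} = $words if defined $number;

print "$_ => $hash{$_}\n" for sort keys %hash;
```

The description is stored as one string here; splitting it further into separate words is a separate step if you need it.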
 
Gunnar Hjalmarsson

Cheez said:
Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:
gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".

$flatfile = "I.faa";

open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";

Yet another variant - from here you might want to do something like:

my %hash = ();
while (<FILE>) {
if ( /^gi\|(\d+)\S+\s+(\w+)\s+(\w+);\s+(\w+)/ ) {
$hash{$1} = [ $2, $3, $4 ];
}
}
close FILE;

use Data::Dumper;
print Dumper \%hash;   # pass a reference so the hash prints as one structure
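
For anyone working backwards through that pattern, here is the same regex as a sketch spread out with the /x modifier so each piece can carry a comment. It runs against the sample line from the question instead of a file. The optional leading ">" is an assumption, added because the question mentions ">gi|" while the sample line shows plain "gi|".

```perl
use strict;
use warnings;
use Data::Dumper;

my $sample = 'gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';
my %hash;

if ( $sample =~ /
        ^>?gi\|     # line starts with "gi|" (optional ">" is an assumption)
        (\d+)       # capture 1: the gi number
        \S+         # rest of the pipe-separated header, up to the first space
        \s+ (\w+)   # capture 2: first word
        \s+ (\w+);  # capture 3: second word, followed by a semicolon
        \s+ (\w+)   # capture 4: third word
    /x ) {
    # Store the three words as an array reference keyed by the number.
    $hash{$1} = [ $2, $3, $4 ];
}

$Data::Dumper::Sortkeys = 1;   # stable key order in the dump
print Dumper(\%hash);
```

Note that this pattern expects exactly the "word word; word" shape of the sample. Lines with a different number of words will not match and are silently skipped.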
 
Cheez

I am really thankful for all of these suggestions. I will try some of
these regexes and get back to you all. Looking at a regex that works
helps me to work backwards (deconstructionist?) to see *why* it worked.
This will be very helpful for not only this task but many more in the
future.

Thanks again everyone,
Cheez

Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:
[snip]
 
Kris Jenkins

Sam said:
Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:

gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".


my %hash;
while (<>) {
chomp;
my (undef, $number, undef, undef, $words) = split /\|/;
$words=~s/\s*\[SC\]$//;
$hash{$number} = $words;
}

Just as another option, you could replace:

my (undef, $number, undef, undef, $words) = split /\|/;

With:

my ( $number, $words) = ( split /\|/ )[1,4];

Cheers,
Kris
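
A short sketch of why the two forms are equivalent: a list slice indexes the list that split returns, so [1,4] picks the second and fifth fields directly. The variable names @fields, $number2 and $words2 are only for illustration.

```perl
use strict;
use warnings;

my $line = 'gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';

# split returns five fields: gi, number, ref, accession, description.
# The slice [1, 4] picks the second and fifth of them.
my @fields = split /\|/, $line;
my ($number, $words) = @fields[1, 4];

# Same thing without the temporary array.
my ($number2, $words2) = ( split /\|/, $line )[1, 4];

print "number: $number\n";
print "words:$words\n";    # the field keeps its leading space and "[SC]"
```

Either way, the description field still needs the s/// cleanup from the loop before it goes into the hash.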
 
