string capture regex

Cheez · Jan 7, 2004

Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:

gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]

I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".

I am a little overwhelmed by all the m// and s/// modifiers. Any nudge
in the right direction about developing a regex would be greatly
appreciated.

I will post my code but it's really lame!

Thanks,
Cheez

=====================
use strict;
use warnings;

my $flatfile = "I.faa";

open(my $in, '<', $flatfile) or die "Can't open '$flatfile': $!\n";
my @test2 = <$in>;
close($in);

my @newtest;
foreach (@test2) {
    chomp;
    s/\W/ /g;                 # getting rid of non-word chunks (not sure it helps)
    push @newtest, split ' '; # split on runs of whitespace, no empty fields
}

my $parsed = "parsed.txt";
open(my $out, '>', $parsed) or die "Can't open '$parsed': $!\n";
print $out "$_\n" for @newtest;
close($out);

print scalar(@newtest), "\n"; # checking that the array is populated
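On the m// versus s/// question: a minimal sketch that runs both operators on the sample line alone, hard-coded rather than read from I.faa. It assumes the line looks exactly as pasted (no leading ">"); for FASTA headers that really start with ">gi|", the anchor would become ^>gi. The /g modifier on the last match collects every capture instead of stopping at the first one.

```perl
use strict;
use warnings;

# The sample header line from the question (assumed to have no leading ">").
my $line = 'gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';

# m// (match): parentheses capture; in list context the match returns them.
# \| is a literal pipe, since an unescaped | means "or" in a regex.
my ($number) = $line =~ m/^gi\|(\d+)\|/;

# s/// (substitute): work on a copy so $line itself is left unchanged.
# Greedy .* runs to the LAST pipe, so this strips the whole ID block.
(my $words = $line) =~ s/^.*\|\s*//;
$words =~ s/\s*\[SC\]$//;    # then drop the trailing " [SC]"

# m//g in list context returns every (\w+) match, one per word.
my @w = $words =~ m/(\w+)/g;

print "$number => $words\n";
print join(', ', @w), "\n";
```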

Sam Holden · Jan 7, 2004

Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:

gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]


I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".

my %hash;
while (<>) {
    chomp;
    my (undef, $number, undef, undef, $words) = split /\|/;
    $words =~ s/\s*\[SC\]$//;
    $hash{$number} = $words;
}
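A self-contained variant of the split approach, for trying it without a file. The @lines array is a hypothetical stand-in for the flatfile, and the extra s/^\s+// is an addition not in the code above: without it, the fifth field keeps the space that follows the last pipe.

```perl
use strict;
use warnings;

# Hypothetical in-memory input standing in for the flatfile; real code
# would read with while (<>) or from an opened filehandle instead.
my @lines = (
    "gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]\n",
);

my %hash;
for (@lines) {
    chomp;
    # Fields after splitting on "|": 0 = "gi", 1 = number, 2 = "ref",
    # 3 = accession, 4 = the descriptive words.
    my (undef, $number, undef, undef, $words) = split /\|/;
    next unless defined $words;     # skip lines without a fifth field
    $words =~ s/^\s+//;             # drop the space after the last pipe
    $words =~ s/\s*\[SC\]$//;       # drop the trailing " [SC]"
    $hash{$number} = $words;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```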

I am a little overwhelmed by all the m// and s/// modifiers. Any nudge
in the right direction about developing a regex would be greatly
appreciated.

Just use split (which does use a regex but a very simple one).
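The "very simple" regex here is \|, and the backslash matters: an unescaped | is alternation, so /|/ matches the empty string and split then breaks the string into single characters. A small sketch comparing the two, using a shortened piece of the sample line:

```perl
use strict;
use warnings;

my $s = 'gi|6319248|ref';

# Escaped pipe: split on the literal "|" character.
my @fields = split /\|/, $s;    # ('gi', '6319248', 'ref')

# Unescaped pipe: /|/ is an alternation of two empty patterns, so it
# matches the empty string everywhere and splits between every character.
my @chars = split /|/, $s;

print scalar(@fields), " fields vs ", scalar(@chars), " pieces\n";
```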

[snip code]

Matt Garrish · Jan 7, 2004

Cheez said:
Howdy, newbie to Perl.

You're best not asking the newbie for help. He just doesn't get it... : )

Matt

Gunnar Hjalmarsson · Jan 7, 2004

Cheez said:
Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:

gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]


I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".

$flatfile = "I.faa";

open(FILE, "$flatfile") || die "Can't open '$flatfile': $!\n";

Yet another variant - from here you might want to do something like:

my %hash = ();
while (<FILE>) {
    if ( /^gi\|(\d+)\S+\s+(\w+)\s+(\w+);\s+(\w+)/ ) {
        $hash{$1} = [ $2, $3, $4 ];
    }
}
close FILE;

use Data::Dumper;
print Dumper \%hash;
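The same match written with the /x modifier, which allows whitespace and # comments inside the pattern, may be easier to work backwards through. The optional >? for FASTA-style headers is an assumption, not part of the original, and the hard-coded @lines stands in for reading FILE. Comments inside a /x pattern are still interpolated like a double-quoted string, so they avoid $-variables.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for lines read from FILE.
my @lines = ("gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]\n");

my %hash;
for my $line (@lines) {
    if ($line =~ m{
            ^ >? gi \|     # optional ">" (an assumption), then "gi|"
            (\d+)          # capture 1: the gi number
            \S+ \s+        # rest of the ID block, then whitespace
            (\w+) \s+      # capture 2: first word
            (\w+) ; \s+    # capture 3: second word, then a semicolon
            (\w+)          # capture 4: third word
        }x) {
        $hash{$1} = [ $2, $3, $4 ];
    }
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%hash);
```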

Cheez · Jan 7, 2004

I am really thankful for all of these suggestions. I will try some of
these regexes and get back to you all. Looking at a regex that works
helps me to work backwards (deconstructionist?) to see *why* it worked.
This will be very helpful for not only this task but many more in the
future.

Thanks again everyone,
Cheez

Kris Jenkins · Jan 8, 2004

Sam said:
Howdy, newbie to Perl. I want to make a regex that will process a
particular line of text from a large flatfile:

gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]


I want the regex to:
1. capture the 7 digit number that always follows >gi|
2. then associate that number (in a hash?) with the "words"
Hypothetical, ORF, Yal069wp. These "words" always follow the
"NP_009331.1|" format and end before the "[SC]".


my %hash;
while (<>) {
    chomp;
    my (undef, $number, undef, undef, $words) = split /\|/;
    $words =~ s/\s*\[SC\]$//;
    $hash{$number} = $words;
}

Just as another option, you could replace:

my (undef, $number, undef, undef, $words) = split /\|/;

With:

my ( $number, $words) = ( split /\|/ )[1,4];

Cheers,
Kris
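Taking the list slice one step further, in case separate words are wanted rather than one string (requirement 2 in the question): a sketch that splits the fifth field on whitespace and semicolons and stores the words as an array reference. The delimiter pattern /[\s;]+/ is an assumption based only on the single sample line; grep { length } drops the empty leading field that the space after the last pipe would otherwise produce.

```perl
use strict;
use warnings;

my $line = 'gi|6319248|ref|NP_009331.1| Hypothetical ORF; Yal069wp [SC]';

# List slice: (LIST)[1, 4] keeps only elements 1 and 4 of split's result.
my ($number, $words) = (split /\|/, $line)[1, 4];

# Strip " [SC]", then break the text into words (assumed delimiters:
# whitespace and semicolons); grep removes the empty leading field.
$words =~ s/\s*\[SC\]$//;
my @words = grep { length } split /[\s;]+/, $words;

# Hash of arrays: number => reference to its list of words.
my %hash = ($number => \@words);
print "$number: @{ $hash{$number} }\n";
```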

Similar Threads

Problem Splitting Text String	2	Dec 29, 2022
Regex help	2	Sep 3, 2010
Push regex search result into hash with multiple values	14	May 19, 2014
File size too big for perl processing	5	Jun 30, 2008
Help with dynamic regex	14	Mar 7, 2012
Regex resetting the capture buffer	3	Jun 21, 2007
Clickable link conversion regex?	0	Nov 30, 2012
Need to capture Log of each thread --Parallel::ForkManager;	0	Jun 30, 2010