January Weiner
Hello,
I am matching a regexp with an a priori unknown number of groups. I would
like to loop over all groups that were matched. For example:
/(\w+)\s(\w+)/ ;
#or
/(\w+)\s(\w+)\s(\w+)/ ;
# or something else
@groups = ...???
for( @groups ) {
process_match( $_ ) ;
}
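In list context, a successful match returns all captured substrings ($1, $2, ...), however many groups the pattern contains. That means the group count does not have to be known in advance. A minimal sketch; process_match() here is a hypothetical stand-in that only records its argument:

```perl
use strict;
use warnings;

# Hypothetical stand-in for process_match(): just records each group.
my @seen;
sub process_match { push @seen, $_[0] }

my $line = 'alpha beta gamma';

# In list context a successful match returns ($1, $2, ...) for every
# capturing group in the pattern, so the group count need not be known.
my @groups = $line =~ /(\w+)\s(\w+)\s(\w+)/;

for (@groups) {
    process_match($_);
}

print "@seen\n";    # prints "alpha beta gamma"
```

A failed match returns an empty list, so the loop simply does nothing. One caveat: a pattern with no capturing groups at all returns (1) on success.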
Of course, the example above simplifies reality and could be handled with
split(). Here are more details on the problem:
I am processing protein sequence files in the FASTA format. Depending on
the database, the FASTA headers may look like this:
O81231 (Q81999) Dehydrogenase alpha subunit
or like this
O81231 123 Q81999
or
gi|O81231||li|Q81999
or, possibly,
O81231; synonyms: Q81999, P89812, O77781
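For the four sample headers, one possible approach is a separate pattern per format, all fed through the same list-context match. The patterns below are hypothetical illustrations, not a general solution: the first three capture a fixed number of groups, while the last has a single group repeated with /g, so any number of synonyms is collected.

```perl
use strict;
use warnings;

# Hypothetical patterns, one per sample header format.
my @cases = (
    [ 'O81231 (Q81999) Dehydrogenase alpha subunit', qr/^(\S+) \((\S+)\)/ ],
    [ 'O81231 123 Q81999',                           qr/^(\S+) \d+ (\S+)/ ],
    [ 'gi|O81231||li|Q81999',                        qr/^gi\|([^|]+)\|\|li\|([^|]+)/ ],
    [ 'O81231; synonyms: Q81999, P89812, O77781',    qr/([A-Z]\d{5})/ ],
);

my @results;
for my $case (@cases) {
    my ($header, $re) = @$case;
    # List context + /g: the captures of every successive match.
    # The anchored (^) patterns can only match once per header.
    my @ids = $header =~ /$re/g;
    push @results, join(',', @ids);
}

print "$_\n" for @results;
```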
or, basically, anything else. As you might guess, I'm interested in the
"O81231" or "Q81999" part. The idea is that my utility can take an
optional "regexp" string that matches the type of headers found in
a given database; while looping through the database, the regexp is
matched, and an entry is made for each of the synonymous identifiers found
in one header.
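A user-supplied pattern string can be compiled once with qr// before the database loop. Wrapping the compilation in eval turns a syntax error into a readable message instead of a crash halfway through. A minimal sketch; compile_header_regexp() is a hypothetical helper name, and the pattern shown stands in for whatever arrives on the command line:

```perl
use strict;
use warnings;

# Hypothetical helper: compile the user's pattern string once,
# reporting a syntax error clearly instead of dying mid-loop.
sub compile_header_regexp {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    die "Invalid header regexp '$pattern': $@" unless defined $re;
    return $re;
}

# Example: the pattern as it might arrive from a command-line option.
my $re = compile_header_regexp('^(\S+) \((\S+)\)');

my @ids;
for my $header ('O81231 (Q81999) Dehydrogenase alpha subunit') {
    # List context: all groups the user's pattern defines, however many.
    push @ids, $header =~ $re;
}
print "@ids\n";    # prints "O81231 Q81999"
```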
Currently, I assume that there will never be more than four synonyms, and
I do the following:
for( $1, $2, $3, $4 ) {
last unless $_ ;
process_match( $_ ) ;
}
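A closer analogue to the loop over $1 .. $4, but without a fixed upper limit, can use the match variables documented in perlvar. $#+ gives the number of capturing groups in the last successful match's pattern. @- and @+ hold each group's start and end offsets. A group that did not take part in the match has an undefined offset and is skipped. This is a sketch with a hypothetical pattern (the optional third group is there only to show the skip) and the same hypothetical process_match() stand-in; the matched string must not be modified before the substr() calls.

```perl
use strict;
use warnings;

my @seen;
sub process_match { push @seen, $_[0] }    # hypothetical stand-in

my $header = 'O81231 (Q81999) Dehydrogenase alpha subunit';

if ($header =~ /^(\S+) \((\S+)\)(?: \[(\S+)\])?/) {
    # $#+ = number of groups in the pattern; @-/@+ = their offsets.
    for my $i (1 .. $#+) {
        next unless defined $-[$i];    # group did not take part in the match
        process_match(substr($header, $-[$i], $+[$i] - $-[$i]));
    }
}
print "@seen\n";    # prints "O81231 Q81999"
```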
...which is, of course, crap.
Thanks in advance,
January
P.S. No, ([A-Z]\d{5}) would not match every identifier; the id format can
differ as well. Sometimes it is something like HBA_HUMAN.
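When ids come in more than one style, an alternation with one group per style still works with a list-context /g match. Each match returns every group, and the branch that did not match leaves undef, which grep then drops. The pattern and header below are hypothetical illustrations of the two styles mentioned, not a complete description of either format.

```perl
use strict;
use warnings;

# Hypothetical pattern accepting two id styles: accession-like
# ("Q81999") or entry-name-like ("HBA_HUMAN"). Only one branch matches
# each time, so the other group comes back as undef.
my $id_re = qr/\b(?:([A-Z]\d{5})|([A-Z0-9]+_[A-Z]+))\b/;

my $header = 'HBA_HUMAN; synonyms: Q81999, P89812';

# /g in list context returns both groups for every match; grep drops
# the undef entries left by the branch that did not match.
my @ids = grep { defined } $header =~ /$id_re/g;

print "@ids\n";    # prints "HBA_HUMAN Q81999 P89812"
```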
--