Extracting the match found


tony

Hi,

I am searching for alternative patterns and have a problem with
extracting the matched pattern. Here is the code.

$string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

while ($string=~
/((GAA|GAG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))|((CGT|CGC|CGA|CGG|AGA|AGG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))/g){
$where = pos($string);
print $1." ".$where,"\n";
}

Ideally it should print the matched entry and the corresponding
position of occurrence. However, the match CGTTTTTCT is not printed.

Output

13
GAGTTCAGT 23


I know it is very simple, but I am not seeing where I am making the
mistake.
I would really appreciate it if somebody could help me.

Thanks,

tony
 

Uri Guttman

t> $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

t> while ($string=~
t> /((GAA|GAG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))|((CGT|CGC|CGA|CGG|AGA|AGG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))/g){

Use the /x modifier to make that easier to read:

while ($string =~ m{
    (
        (GAA|GAG)
        (TTT|TTC)
        (TCT|TCC|TCA|TCG|AGT|AGC)
    )
    |
    (
        (CGT|CGC|CGA|CGG|AGA|AGG)
        (TTT|TTC)
        (TCT|TCC|TCA|TCG|AGT|AGC)
    )
}gx ) {


t> $where = pos($string);
t> print $1." ".$where,"\n";

print "$1 $where\n";

Much cleaner.

t> }

t> Ideally it should print the matched entry and the corresponding
t> position of occurence. However the match CGTTTTTCT is not printed

t> Output

t> 13
t> GAGTTCAGT 23

First off, you are using grabbing (capturing) parens everywhere when you
probably want grouping instead. Use (?:...) to group without grabbing
(see perlre for details).
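A minimal sketch of the difference, run on the single string CGTTTTTCT with only a shortened subset of the OP's codon alternatives (the subset and the variable names are just for illustration). In list context a successful match returns one element per capturing group, so plain parens hand back each codon separately, while (?:...) groups with one outer capture hand back only the whole match:

```perl
use strict;
use warnings;

# One 9-mer from the OP's string; only a subset of each codon list is used.
my $seq = "CGTTTTTCT";

# Plain parens capture: the list gets one element per group.
my @grabbing = $seq =~ /(CGT|CGC)(TTT|TTC)(TCT|TCC)/;

# (?:...) only groups; the single outer pair captures the whole match.
my @grouping = $seq =~ /((?:CGT|CGC)(?:TTT|TTC)(?:TCT|TCC))/;

print scalar(@grabbing), ": @grabbing\n";    # 3: CGT TTT TCT
print scalar(@grouping), ": @grouping\n";    # 1: CGTTTTTCT
```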

But the real problem is that the regex is working as written. It tries to
match the first group in the top-level alternation, and it does so by
finding GAGTTCAGT. But then the cursor is past the CGTTTTTCT part of the
string, so it won't match it. You can reverse the order of the alternates
and then it will find both, but the problem will happen again if the
matches are found in the other order.

I don't know of any easy general way to force all matches like that in
one regex. A possible way is to use multiple regexes, one for each
top-level alternative.
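A rough sketch of the multiple-regex approach, assuming the two top-level alternatives of the OP's pattern are split into separate qr// objects (@patterns and @hits are illustrative names). Each pattern scans the string on its own, and the hits are sorted by start offset before printing:

```perl
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

# One pattern per top-level alternative of the OP's regex.
my @patterns = (
    qr/(?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/,
    qr/(?:CGT|CGC|CGA|CGG|AGA|AGG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/,
);

my @hits;    # each hit: [ matched text, start offset, end offset ]
for my $re (@patterns) {
    # A failed //g match resets pos(), so every pattern starts at offset 0.
    while ($string =~ /($re)/g) {
        push @hits, [ $1, $-[1], pos($string) ];
    }
}

# Print in string order rather than pattern order.
for my $hit (sort { $a->[1] <=> $b->[1] } @hits) {
    print "$hit->[0] $hit->[2]\n";
}
```

Because each pattern scans independently, this can also report matches that overlap each other, which a single combined regex with //g never does.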

uri
 

Uri Guttman

A> Uri Guttman wrote on MMMMCXXXII September MCMXCIII:
A> <>
A> <> but the real problem is that the regex is working. it tries to match the
A> <> first group in the top level alternation. and it does by finding
A> <> GAGTTCAGT. but then the cursor is past the CGTTTTTCT part of the string
A> <> so it won't match it. you can reverse the order of the alternates and
A> <> then it will find both but the problem will happen again if the matches
A> <> are found in the other order.

A> Rubbish. A regex always matches leftmost in the string first. So, if
A> both CGTTTTTCT and GAGTTCAGT match, and CGTTTTTCT is left of GAGTTCAGT,
A> CGTTTTTCT will be found, regardless of how they are ordered in the alternation.

You're right.
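A quick way to check the leftmost-first rule, assuming the two alternatives are factored into qr// objects ($gag, $cgt and @results are illustrative names): running the same //g loop with both orders of the alternation finds the same two matches. Order only decides between alternatives that can match at the same starting position.

```perl
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

# The OP's two top-level alternatives, with non-capturing groups.
my $gag = qr/(?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/;
my $cgt = qr/(?:CGT|CGC|CGA|CGG|AGA|AGG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/;

# Run the same //g loop with both orders of the alternation.
my @results;
for my $re (qr/($gag|$cgt)/, qr/($cgt|$gag)/) {
    my @found;
    while ($string =~ /$re/g) {
        push @found, $1;
    }
    push @results, "@found";
}

print "$_\n" for @results;    # CGTTTTTCT GAGTTCAGT, printed twice
```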

A> And the OP's program does produce two lines of output. His problem is one
A> of paren placing.

You may not believe it, but I first started to say that when I composed
my reply. Then I must have thought the parens were OK (and I didn't even
study my /x version, which showed it), so I went in the wrong direction
instead.
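A small sketch of the paren-placing issue that keeps the OP's regex unchanged: the second top-level alternative captures into $5, not $1, so the loop falls back to $5 when $1 is undefined (@lines and $match are illustrative names).

```perl
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

my @lines;
# The OP's regex, unchanged: the first alternative fills $1..$4,
# the second alternative fills $5..$8.
while ($string =~
    /((GAA|GAG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))|((CGT|CGC|CGA|CGG|AGA|AGG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))/g) {
    my $where = pos $string;
    # Only the alternative that matched has defined captures.
    my $match = defined $1 ? $1 : $5;
    push @lines, "$match $where";
}

print "$_\n" for @lines;    # CGTTTTTCT 13, then GAGTTCAGT 23
```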

uri
 

Anno Siegel

[...]
Either you test for $5, or, perhaps better, put in an extra set of
parentheses (and make the existing parens non-capturing):


#!/usr/bin/perl

use strict;
use warnings;
no warnings qw /syntax/;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

while ($string =~
/((?:(?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)) |
(?:(?:CGT|CGC|CGA|CGG|AGA|AGG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)))
/xg){
my $where = pos $string;
print "$1 $where\n";
}

__END__
CGTTTTTCT 13
GAGTTCAGT 23

In addition, the OP may want to use $-[1] instead of pos(). pos()
points one character past the end of the match; $-[1] is the offset of
the first character of the match.
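A minimal sketch comparing the offsets, assuming the regex from the program above is split into two qr// pieces for width ($gag, $cgt and @report are illustrative names). @- holds start offsets and @+ end offsets per group, so $-[1] is where the match begins, while $+[1] and pos() both point just past its end, since group 1 spans the whole match here:

```perl
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

# Same pattern as the program above, split into two qr// pieces.
my $gag = qr/(?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/;
my $cgt = qr/(?:CGT|CGC|CGA|CGG|AGA|AGG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/;

my @report;
while ($string =~ /($gag|$cgt)/g) {
    # $-[1]: offset of the first character of group 1.
    # $+[1]: offset just past its last character; equal to pos() here,
    # because group 1 spans the whole match.
    push @report, join(' ', $1, 'start=' . $-[1], 'end=' . $+[1],
                       'pos=' . pos($string));
}

print "$_\n" for @report;
```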

Anno
 
