extracting match found

Discussion in 'Perl Misc' started by tony, Dec 23, 2004.

  1. tony

    Hi,

    I am searching for alternative patterns and have a problem with
    extracting the matched pattern. Here is the code:

    $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

    while ($string=~
    /((GAA|GAG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))|((CGT|CGC|CGA|CGG|AGA|AGG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))/g){
    $where = pos($string);
    print $1." ".$where,"\n";
    }

    Ideally it should print each matched entry and the corresponding
    position of occurrence. However, the match CGTTTTTCT is not printed.

    Output

    13
    GAGTTCAGT 23


    I know it is probably very simple, but I can't see where I am making
    the mistake.
    I would really appreciate it if somebody could help me.

    Thanks,

    tony
    tony, Dec 23, 2004

  2. Uri Guttman

    >>>>> "t" == tony writes:

    t> $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

    t> while ($string=~
    t> /((GAA|GAG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))|((CGT|CGC|CGA|CGG|AGA|AGG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))/g){

    Use the /x modifier to make that easier to read:

    while ( $string =~ m{
        (
            (GAA|GAG)
            (TTT|TTC)
            (TCT|TCC|TCA|TCG|AGT|AGC)
        )
        |
        (
            (CGT|CGC|CGA|CGG|AGA|AGG)
            (TTT|TTC)
            (TCT|TCC|TCA|TCG|AGT|AGC)
        )
    }gx ) {


    t> $where = pos($string);
    t> print $1." ".$where,"\n";

    print "$1 $where\n" ;

    Much cleaner.

    t> }

    t> Ideally it should print the matched entry and the corresponding
    t> position of occurrence. However, the match CGTTTTTCT is not printed.

    t> Output

    t> 13
    t> GAGTTCAGT 23

    First off, you are using capturing (grabbing) parens everywhere when you
    probably want grouping instead. Use (?:...) to group without capturing
    (see perlre for details).
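    To make the grabbing-versus-grouping distinction concrete, here is a
    minimal sketch, assuming a single sample string "GAGTTCAGT" chosen for
    illustration. In list context a match returns every capture group, so
    counting the returned values shows how many groups each pattern creates.

```perl
use strict;
use warnings;

my $dna = "GAGTTCAGT";    # sample input chosen for this example

# Capturing parens: the outer group is $1, the three inner groups are $2..$4.
my @captured = $dna =~ /((GAA|GAG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))/;

# Non-capturing inner groups: only the outer group is numbered, so it is $1.
my @grouped = $dna =~ /((?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC))/;

print scalar(@captured), " captures: @captured\n";
print scalar(@grouped), " capture: @grouped\n";
```

    With (?:...) the match text always lands in $1, no matter how many
    alternatives are grouped inside.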

    But the real problem is that the regex is working as written. It tries to
    match the first group in the top-level alternation, and it does so by
    finding GAGTTCAGT. But then the cursor is past the CGTTTTTCT part of the
    string, so it won't match it. You can reverse the order of the alternates,
    and then it will find both, but the problem will happen again if the
    matches are found in the other order.

    I don't know of any easy general way to force all matches like that in
    one regex. A possible way is to use multiple regexes, one for each
    top-level alternative.
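    A minimal sketch of the one-regex-per-alternative approach, assuming the
    OP's sample string. Each pattern is scanned separately with //g (a failed
    scalar //g match resets pos(), so each scan starts from the beginning),
    and the hits are sorted by offset afterwards to restore string order.
    The names @patterns and @hits are introduced only for this example.

```perl
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

# One pattern per top-level alternative; inner groups are non-capturing.
my @patterns = (
    qr/(?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/,
    qr/(?:CGT|CGC|CGA|CGG|AGA|AGG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/,
);

my @hits;    # each entry: [ matched text, offset just past the match ]
for my $re (@patterns) {
    # A failed scalar //g match resets pos(), so each scan starts at 0.
    while ($string =~ /($re)/g) {
        push @hits, [ $1, pos($string) ];
    }
}

# Sort by position so the report follows the order in the string.
@hits = sort { $a->[1] <=> $b->[1] } @hits;
print "$_->[0] $_->[1]\n" for @hits;
```

    One design consequence: because each pattern scans the whole string
    independently, this version can also report a match that overlaps a
    match from the other pattern, which a single alternation with //g never
    does.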

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Dec 23, 2004

  3. Uri Guttman

    >>>>> "A" == Abigail writes:

    A> Uri Guttman wrote on MMMMCXXXII September MCMXCIII:
    A> <>
    A> <> but the real problem is that the regex is working. it tries to match the
    A> <> first group in the top level alternation. and it does by finding
    A> <> GAGTTCAGT. but then the cursor is past the CGTTTTTCT part of the string
    A> <> so it won't match it. you can reverse the order of the alternates and
    A> <> then it will find both but the problem will happen again if the matches
    A> <> are found in the other order.

    A> Rubbish. A regex always matches leftmost in the string first. So, if
    A> both CGTTTTTCT and GAGTTCAGT match, and CGTTTTTCT is left of GAGTTCAGT,
    A> CGTTTTTCT will be found, regardless of how they are ordered in the
    A> alternation.

    You're right.
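    A quick way to check the leftmost rule, as a sketch: build the two
    alternatives as qr// objects (the names $gag and $cgt are made up here)
    and try them in both orders against the OP's string. Because the engine
    tries every alternative at one starting position before moving further
    right, both orders should report the same first match.

```perl
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";

# The two top-level alternatives from the OP's regex, without inner captures.
my $gag = qr/(?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/;
my $cgt = qr/(?:CGT|CGC|CGA|CGG|AGA|AGG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)/;

# Same alternatives, listed in both orders; only the first match is taken.
my ($first_a) = $string =~ /($gag|$cgt)/;
my ($first_b) = $string =~ /($cgt|$gag)/;

print "$first_a\n$first_b\n";    # both lines: CGTTTTTCT
```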

    A> And the OP's program does produce two lines of output. His problem is one
    A> of paren placing.

    You may not believe it, but I first started to say that when I composed
    my reply. Then I must have thought the parens were OK (and I didn't even
    study my /x version, which showed it), so I went in the wrong direction
    instead.
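    To see the paren-placing problem concretely, here is a minimal sketch
    that runs the OP's original regex unchanged and records both $1 and $5
    for each match. The string 'undef' is only a display placeholder chosen
    for this example.

```perl
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";
my @report;

while ($string =~
    /((GAA|GAG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))|((CGT|CGC|CGA|CGG|AGA|AGG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))/g) {
    # $1 is only set by the first alternative; the second one fills $5.
    push @report, sprintf "%s %s %d",
        defined $1 ? $1 : 'undef',
        defined $5 ? $5 : 'undef',
        pos($string);
}

print "$_\n" for @report;
# undef CGTTTTTCT 13
# GAGTTCAGT undef 23
```

    So the first match is found; it just lands in $5, which the OP never
    prints.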

    uri

    Uri Guttman, Dec 23, 2004
  4. Anno Siegel

    Abigail wrote in comp.lang.perl.misc:

    [...]

    > Either you test for $5, or perhaps better, put in an extra set of
    > parentheses (and make the existing parens non-capturing):
    >
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    > no warnings qw /syntax/;
    >
    > my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";
    >
    > while ($string =~
    > /((?:(?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)) |
    > (?:(?:CGT|CGC|CGA|CGG|AGA|AGG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)))
    > /xg){
    > my $where = pos $string;
    > print "$1 $where\n";
    > }
    >
    > __END__
    > CGTTTTTCT 13
    > GAGTTCAGT 23
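    Abigail's first option, testing for $5, might look like the following
    sketch. It keeps the OP's regex as-is and takes whichever outer group
    matched. The // (defined-or) operator assumes Perl 5.10 or later; on
    older perls, defined $1 ? $1 : $5 does the same job.

```perl
use 5.010;    # for the // (defined-or) operator
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";
my @lines;

while ($string =~
    /((GAA|GAG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))|((CGT|CGC|CGA|CGG|AGA|AGG)(TTT|TTC)(TCT|TCC|TCA|TCG|AGT|AGC))/g) {
    # Exactly one of $1 / $5 is defined, depending on which alternative matched.
    my $match = $1 // $5;
    push @lines, "$match " . pos($string);
}

print "$_\n" for @lines;
```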


    In addition, the OP may want to use $-[1] instead of pos(). pos()
    points one character past the end of the match, while $-[1] is the
    offset of the first character of the match.
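    A short sketch of Anno's suggestion, using Abigail's single-capture
    regex: the @- and @+ arrays hold the start and end offsets of each group
    for the last successful match. Offsets are 0-based, so for the OP's
    string CGTTTTTCT should start at 4 and GAGTTCAGT at 14, with $+[1] equal
    to pos() in both cases.

```perl
use strict;
use warnings;

my $string = "AAAACGTTTTTCTTGAGTTCAGTTTTTAnTC";
my @offsets;

while ($string =~ /((?:(?:GAA|GAG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC))
                   |(?:(?:CGT|CGC|CGA|CGG|AGA|AGG)(?:TTT|TTC)(?:TCT|TCC|TCA|TCG|AGT|AGC)))/xg) {
    # $-[1]: offset of the first character of group 1
    # $+[1]: offset just past its last character (same as pos() here)
    push @offsets, [ $1, $-[1], $+[1], pos($string) ];
}

printf "%s start=%d end=%d pos=%d\n", @$_ for @offsets;
```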

    Anno
    Anno Siegel, Dec 23, 2004
