String matching and alignment?

Discussion in 'Perl Misc' started by Bryan, Jun 9, 2004.

  1. Bryan

    Bryan Guest

    If I have some sequence data:
    my $seq = "AGCCTCAAAGTTCGG";

    and some subset:
    my $subset = "CAAAGTTC";

    I want to first, match the pattern to see if $subset is found in $seq
    (which is fine), but then I also want to know the starting and end
    positions of the match in the original sequence, i.e. start = 6, end =
    13, is there something (like backreferences) from the regex that will
    give me this info? Or....?

    thanks,
    B
     
    Bryan, Jun 9, 2004
    #1

  2. gnari

    gnari Guest

    "Bryan" wrote in message
    news:y7Mxc.68823$...
    > If I have some sequence data:
    > my $seq = "AGCCTCAAAGTTCGG";
    >
    > and some subset:
    > my $subset = "CAAAGTTC";
    >
    > I want to first, match the pattern to see if $subset is found in $seq
    > (which is fine), but then I also want to know the starting and end
    > positions of the match in the original sequence, i.e. start = 6, end =
    > 13, is there something (like backreferences) from the regex that will
    > give me this info? Or....?


    perldoc -f index

    if you really want to use regexes:

    perldoc perlvar (see @-)
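
For the index route, a minimal sketch might look like the one below. It assumes the 1-based, inclusive positions from the question (start = 6, end = 13); the names $offset, $start and $end are only illustrative.

```perl
use strict;
use warnings;

my $seq    = "AGCCTCAAAGTTCGG";
my $subset = "CAAAGTTC";

# index() returns the 0-based offset of the first occurrence, or -1 if absent.
my $offset = index($seq, $subset);

my ($start, $end);
if ($offset >= 0) {
    $start = $offset + 1;                 # convert to a 1-based start
    $end   = $offset + length($subset);   # 1-based, inclusive end
    print "start = $start, end = $end\n";
}
else {
    print "no match\n";
}
```

Since $subset is a literal string here rather than a pattern, index avoids the regex engine altogether.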

    gnari
     
    gnari, Jun 10, 2004
    #2

  3. Simon Taylor

    Simon Taylor Guest

    Bryan wrote:
    > If I have some sequence data:
    > my $seq = "AGCCTCAAAGTTCGG";
    >
    > and some subset:
    > my $subset = "CAAAGTTC";
    >
    > I want to first, match the pattern to see if $subset is found in $seq
    > (which is fine), but then I also want to know the starting and end
    > positions of the match in the original sequence, i.e. start = 6, end =
    > 13, is there something (like backreferences) from the regex that will
    > give me this info? Or....?


    Try the following:


    #!/usr/bin/perl

    use strict;
    use warnings;

    my $seq = "AGCCTCAAAGTTCGG";
    my $subset = "CAAAGTTC";

    if ($seq =~ m/$subset/g) {
        print "offset where last m//g match left off: " . pos($seq) . "\n";
        print "everything before matched string: $`\n";
        print "everything after matched string: $'\n";
        print "The entire matched string: $&\n\n";
    }

    Which gives me the following output:

    offset where last m//g match left off: 13
    everything before matched string: AGCCT
    everything after matched string: GG
    The entire matched string: CAAAGTTC


    Hope this helps.

    - Simon Taylor
    --
    Unisolve Pty Ltd - Melbourne, Australia
     
    Simon Taylor, Jun 10, 2004
    #3
  4. Sisyphus

    Sisyphus Guest

    Bryan wrote:
    > If I have some sequence data:
    > my $seq = "AGCCTCAAAGTTCGG";
    >
    > and some subset:
    > my $subset = "CAAAGTTC";
    >
    > I want to first, match the pattern to see if $subset is found in $seq
    > (which is fine), but then I also want to know the starting and end
    > positions of the match in the original sequence, i.e. start = 6, end =
    > 13, is there something (like backreferences) from the regex that will
    > give me this info? Or....?
    >


    Start = length($`) + 1
    End = length($`) + length($&)
    Alternatively, end = length($`) + length($subset)

    See perldoc perlvar for documentation on $`, $' and $&.
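
As a rough sketch, those formulas could be wired into a script like this. The \Q...\E quoting is an extra precaution and an assumption on top of the formulas, so that any regex metacharacters in $subset are matched literally.

```perl
use strict;
use warnings;

my $seq    = "AGCCTCAAAGTTCGG";
my $subset = "CAAAGTTC";

my ($start, $end);
# \Q...\E makes the contents of $subset match literally.
if ($seq =~ /\Q$subset\E/) {
    $start = length($`) + 1;            # characters before the match, plus one
    $end   = length($`) + length($&);   # prematch length plus match length
    print "start = $start, end = $end\n";
}
```

Both values are 1-based, matching the start = 6, end = 13 in the question.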

    Cheers,
    Rob


    --
    To reply by email u have to take out the u in kalinaubears.
     
    Sisyphus, Jun 10, 2004
    #4
  5. Joe Smith

    Joe Smith Guest

    Bryan wrote:

    > If I have some sequence data:
    > my $seq = "AGCCTCAAAGTTCGG";
    >
    > and some subset:
    > my $subset = "CAAAGTTC";
    >
    > I want to first, match the pattern to see if $subset is found in $seq
    > (which is fine), but then I also want to know the starting and end
    > positions of the match in the original sequence, i.e. start = 6, end =
    > 13, is there something (like backreferences) from the regex that will
    > give me this info?


    Use the magic arrays @- and @+.
    (Do not use $`, $&, and $' as they will just slow you down.)

    my $seq = "AGCCTCAAAGTTCGG";
    my $subset = "CAAAGTTC";
    if ($seq =~ /($subset)(.?)/) {
        print "Overall match starts at $-[0] and ends just before $+[0]\n";
        print " 1st () match starts at $-[1] and ends just before $+[1]\n";
        print " 2nd () match starts at $-[2] and ends just before $+[2]\n";
    }
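
Note that @- and @+ hold 0-based offsets, while the question asks for 1-based positions (start = 6, end = 13). A small sketch of the conversion, assuming that convention, might be as follows; the \Q...\E quoting and the names $start and $end are illustrative additions.

```perl
use strict;
use warnings;

my $seq    = "AGCCTCAAAGTTCGG";
my $subset = "CAAAGTTC";

my ($start, $end);
if ($seq =~ /\Q$subset\E/) {
    $start = $-[0] + 1;   # $-[0] is the 0-based offset where the match begins
    $end   = $+[0];       # $+[0] is the 0-based offset just past the match,
                          # which equals the 1-based inclusive end
    print "start = $start, end = $end\n";
}
```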
     
    Joe Smith, Jun 10, 2004
    #5