String matching and alignment?

Bryan

If I have some sequence data:
my $seq = "AGCCTCAAAGTTCGG";

and some subset:
my $subset = "CAAAGTTC";

I want to first match the pattern to see if $subset is found in $seq
(which is fine), but then I also want to know the start and end
positions of the match in the original sequence, i.e. start = 6, end =
13. Is there something (like backreferences) from the regex that will
give me this info? Or....?

thanks,
B
 
gnari

perldoc -f index

if you really want to use regexes:

perldoc perlvar (see @-)
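
For what it's worth, here is a minimal sketch of the index approach, assuming $subset is a plain literal string rather than a pattern. The names $offset, $start and $end are only illustrative. index returns a 0-based offset (or -1 when nothing is found), so 1 is added to get the 1-based positions used in the question.

```perl
use strict;
use warnings;

my $seq    = "AGCCTCAAAGTTCGG";
my $subset = "CAAAGTTC";

# index() returns the 0-based offset of the first occurrence, or -1 if absent.
my $offset = index($seq, $subset);

my ($start, $end);
if ($offset >= 0) {
    $start = $offset + 1;                 # 1-based start position
    $end   = $offset + length($subset);   # 1-based, inclusive end position
    print "start = $start, end = $end\n";
}
else {
    print "not found\n";
}
```

For the data in the question this should print start = 6, end = 13.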

gnari
 
Simon Taylor

Try the following:


#!/usr/bin/perl

use strict;
use warnings;

my $seq = "AGCCTCAAAGTTCGG";
my $subset = "CAAAGTTC";

# \Q...\E quotes $subset so it is matched as a literal string
if ($seq =~ m/\Q$subset\E/g) {
    # pos() is only set because the match uses /g in scalar context
    print "offset where last m//g match left off: " . pos($seq) . "\n";
    print "everything before matched string: $`\n";
    print "everything after matched string: $'\n";
    print "The entire matched string: $&\n\n";
}

Which gives me the following output:

offset where last m//g match left off: 13
everything before matched string: AGCCT
everything after matched string: GG
The entire matched string: CAAAGTTC


Hope this helps.

- Simon Taylor
 
Sisyphus

Start = length($`) + 1
End = length($`) + length($&)
Alternatively, end = length($`) + length($subset)

See perldoc perlvar for documentation on $`, $' and $&.
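
Put together as a small script, the arithmetic above might look like the sketch below. It assumes $subset should be matched literally (hence \Q...\E), and $start and $end are just illustrative names.

```perl
use strict;
use warnings;

my $seq    = "AGCCTCAAAGTTCGG";
my $subset = "CAAAGTTC";

my ($start, $end);
if ($seq =~ /\Q$subset\E/) {
    $start = length($`) + 1;            # characters before the match, plus one
    $end   = length($`) + length($&);   # 1-based position of the last matched character
    print "start = $start, end = $end\n";
}
```

Here the prematch is "AGCCT" (5 characters) and the match is 8 characters long, so this should give start = 6 and end = 13.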

Cheers,
Rob
 
Joe Smith

Use the magic arrays @- and @+.
(Do not use $`, $&, and $' as they will just slow you down.)

my $seq = "AGCCTCAAAGTTCGG";
my $subset = "CAAAGTTC";
# Offsets are 0-based; $+[N] is the offset just past the end of group N
if ($seq =~ /($subset)(.?)/) {
    print "Overall match starts at $-[0] and ends just before $+[0]\n";
    print " 1st () match starts at $-[1] and ends just before $+[1]\n";
    print " 2nd () match starts at $-[2] and ends just before $+[2]\n";
}
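
If the subset can occur more than once, the same arrays can be read inside a while loop with /g. The following is only a sketch: match_positions is a hypothetical helper name, and it assumes a non-empty, literal $subset and non-overlapping matches. It converts the 0-based offsets to the 1-based, inclusive positions asked for in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [start, end] pairs (1-based, inclusive)
# for every non-overlapping occurrence of $subset in $seq.
sub match_positions {
    my ($seq, $subset) = @_;
    my @hits;
    while ($seq =~ /\Q$subset\E/g) {
        # $-[0] is the 0-based start; $+[0] is the offset just past the
        # match, which equals the 1-based inclusive end.
        push @hits, [ $-[0] + 1, $+[0] ];
    }
    return @hits;
}

for my $hit (match_positions("AGCCTCAAAGTTCGG", "CAAAGTTC")) {
    print "start = $hit->[0], end = $hit->[1]\n";
}
```

With the sequence from the question there is a single hit, so this should print start = 6, end = 13.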
 
