String::Approx 'aindex' help

Puri

Hello,

I am trying to write a simple program using the String-Approx module
that returns the indexes of multiple matches in a single string. Here
is my code:
use String::Approx 'aindex';

$seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
$seq2 = "ag";
$seq2length = length($seq2);

print "$seq1\n\n";

$indexcount = 0;
#$index[0] = aindex($seq2, ["i 0"], $seq1);
$initial = 0;

until ($index[$indexcount-1] == -1) {
    $index[$indexcount] = aindex($seq2, ["i 0 initial_position=$initial"],
                                 $seq1);
    $initial = ($index[$indexcount] + $seq2length);
    $indexcount++;
}

pop @index;

print "@index";

##End code

This returns the following array @index: 1 10 35

However, this is incorrect, because the third "ag" is found starting at
the 11th character, not the 10th. Is there something wrong with the
code, or an easier way to use the "aindex" function of String-Approx to
search for multiple matches within one string?

Thanks in advance,
Puri
 
jl_post

Puri said:
I am trying to write a simple program using the String-Approx
module that returns the indexes of multiple matches in a
single string. Here is my code:
use String::Approx 'aindex';


Dear Puri,

Out of curiosity, couldn't you just use the index() function? If
you don't know how to use it, you can learn by reading "perldoc -f
index". I suggest this because it doesn't look to me like you are
trying to find an approximation, but rather an exact match.
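
For exact matches, the index() suggestion above could look something like the sketch below. It uses only core Perl; find_all is a hypothetical helper name chosen for illustration, not part of String::Approx, and it skips past each match so that matches do not overlap.

```perl
use strict;
use warnings;

# Hypothetical helper (not from String::Approx): returns the zero-based
# start positions of every non-overlapping exact occurrence of $needle.
sub find_all {
    my ($haystack, $needle) = @_;
    my @positions;
    return @positions if $needle eq '';    # avoid an endless loop on an empty pattern
    my $from = 0;
    while ((my $pos = index($haystack, $needle, $from)) != -1) {
        push @positions, $pos;
        $from = $pos + length $needle;     # resume the search after this match
    }
    return @positions;
}

my $seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
my $seq2 = "ag";

my @index = find_all($seq1, $seq2);
print "@index\n";    # prints "1 11 35"
```

The third argument to index() is the position at which to start searching, which plays the same role as initial_position in the aindex call.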

$seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
$seq2 = "ag"; [code snipped]

This returns the following array @index: 1 10 35

However, this is incorrect, because the third "ag" is
found starting at the 11th character, not the 10th.
Is there something wrong with the code,


That doesn't seem wrong to me. Normally, positions in Perl start at
zero, which means that 0 signifies the first character (and that 10
signifies the 11th character). This property of certain programming
languages can be confusing to those who aren't familiar with it.

I hope this helps, Puri.

-- Jean-Luc
 
Puri

Jean-Luc,

Thank you for the quick reply; however, I don't think that answers my
question. I am using String-Approx (version 3.25 by the way) because I
hope to eventually be finding approximate matches within sequences as
well. However, I wanted to check to make sure that I could get the
indexing to work properly first with perfect matches (hence the 0 in
the modifiers).

As for Perl positions starting at zero, I understand this, and this is
exactly why I think there is a problem. If you look at the array, it
says there are matches starting at positions 1, 10, and 35. However,
if you look at the first sequence I have entered ($seq1), the matches
should be at 1 (literally the second character entered in my string),
11 (character#12) and 35. 1 and 35 are correct, but the indexing seems
to be getting confused with the middle match, possibly because of the
'a' in front of the 'ag'.

Any other suggestions will be greatly appreciated.

-Puri
 
John W. Krahn

Puri said:
As for Perl positions starting at zero, I understand this, and this is
exactly why I think there is a problem. If you look at the array, it
says there are matches starting at positions 1, 10, and 35. However,
if you look at the first sequence I have entered ($seq1), the matches
should be at 1 (literally the second character entered in my string),
11 (character#12) and 35. 1 and 35 are correct, but the indexing seems
to be getting confused with the middle match, possibly because of the
'a' in front of the 'ag'.

Any other suggestions will be greatly appreciated.

You should ask the author of that module whether that is a bug or the correct
behavior.


John
 
xhoster

Puri said:
$seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
$seq2 = "ag";
$seq2length = length($seq2);

print "$seq1\n\n";

$indexcount = 0;
#$index[0] = aindex($seq2, ["i 0"], $seq1);
$initial = 0;

until ($index[$indexcount-1] == -1) {
$index[$indexcount] = aindex($seq2, ["i 0 initial_position=$initial"],
$seq1);
$initial=($index[$indexcount]+$seq2length);
$indexcount++;
}

pop @index;

print "@index";

##End code

This returns the following array @index: 1 10 35

At first I thought it was a counting problem, but now I see that the problem
is that 'aa' is matching. If you change the code so that overlapping matches
are allowed, then it returns 1 10 11 35. I don't know why aa matches, and
I don't know enough about String::Approx to know if this is a bug or if it
is correct for some reason I don't understand.

Xho
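
One way to allow overlapping matches, as described above, is to advance the start position by one character after each hit instead of by the pattern length. The sketch below is an assumption about how that change might be written, not the exact code that was run; the aindex call and its modifier string are copied from the original post, and the extra guard against a position before $initial is a precaution added here.

```perl
use strict;
use warnings;
use String::Approx 'aindex';

my $seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
my $seq2 = "ag";

my @index;
my $initial = 0;
while ($initial < length $seq1) {
    # Modifier string taken unchanged from the original post.
    my $pos = aindex($seq2, ["i 0 initial_position=$initial"], $seq1);

    # Stop when nothing more is found; the second check is a defensive
    # assumption so the loop cannot revisit an earlier position.
    last if $pos == -1 || $pos < $initial;

    push @index, $pos;
    $initial = $pos + 1;    # step by one character, so matches may overlap
}

print "@index\n";
```

With this kind of change the reported result is 1 10 11 35, which shows the real match at 11 alongside the unexpected one at 10.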
 
