String::Approx 'aindex' help

Discussion in 'Perl Misc' started by Puri, Aug 24, 2005.

  1. Puri

    Puri Guest

    Hello,

    I am trying to write a simple program using the String-Approx module
    that returns the indexes of multiple matches in a single string. Here
    is my code:
    use String::Approx 'aindex';

    $seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
    $seq2 = "ag";
    $seq2length = length($seq2);

    print "$seq1\n\n";

    $indexcount = 0;
    #$index[0] = aindex($seq2, ["i 0"], $seq1);
    $initial = 0;

    until ($index[$indexcount-1] == -1) {
    $index[$indexcount] = aindex($seq2, ["i 0 initial_position=$initial"],
    $seq1);
    $initial=($index[$indexcount]+$seq2length);
    $indexcount++;
    }

    pop @index;

    print "@index";

    ##End code

    This returns the following array @index: 1 10 35

    However, this is incorrect, because the second "ag" is found starting at
    the 11th character, not the 10th. Is there something wrong with the
    code, or an easier way to use the "aindex" function of String-Approx to
    search for multiple matches within one string?

    Thanks in advance,
    Puri
     
    Puri, Aug 24, 2005
    #1

  2. Jean-Luc

    Guest

    Puri wrote:
    >
    > I am trying to write a simple program using the String-Approx
    > module that returns the indexes of multiple matches in a
    > single string. Here is my code:
    > use String::Approx 'aindex';



    Dear Puri,

    Out of curiosity, couldn't you just use the index() function? If
    you don't know how to use it you can learn by reading "perldoc -f
    index". I suggest this because it doesn't look to me as though you are
    trying to find an approximation, but rather an exact match.
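
    A minimal sketch of the index() approach suggested above, assuming
    only exact, non-overlapping matches are wanted. The sub name
    find_all is made up for illustration; only the core index() and
    length() functions are used:

```perl
use strict;
use warnings;

my $seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
my $seq2 = "ag";

# find_all is a hypothetical helper, not part of any module.
# It returns the zero-based start of every exact, non-overlapping match.
sub find_all {
    my ($pattern, $text) = @_;
    return () if $pattern eq '';          # an empty pattern would never advance
    my @positions;
    my $from = 0;
    while ((my $pos = index($text, $pattern, $from)) != -1) {
        push @positions, $pos;
        $from = $pos + length $pattern;   # resume right after this match
    }
    return @positions;
}

my @index = find_all($seq2, $seq1);
print "@index\n";    # prints "1 11 35"
```

    Advancing $from by 1 instead of the pattern length would allow
    overlapping matches; for this input the result is the same.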


    > $seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
    > $seq2 = "ag";

    [code snipped]
    >
    > This returns the following array @index: 1 10 35
    >
    > However, this is incorrect, because the second "ag" is
    > found starting at the 11th character, not the 10th.
    > Is there something wrong with the code,



    That doesn't seem wrong to me. Normally, positions in Perl start at
    zero, which means that 0 signifies the first character (and that 10
    signifies the 11th character). This property of certain programming
    languages can be confusing to those who aren't familiar with it.

    I hope this helps, Puri.

    -- Jean-Luc
     
    Jean-Luc, Aug 24, 2005
    #2

  3. Puri

    Puri Guest

    Jean-Luc,

    Thank you for the quick reply; however, I don't think that answers my
    question. I am using String-Approx (version 3.25 by the way) because I
    hope to eventually be finding approximate matches within sequences as
    well. However, I wanted to check to make sure that I could get the
    indexing to work properly first with perfect matches (hence the 0 in
    the modifiers).

    As for Perl positions starting at zero, I understand this, and this is
    exactly why I think there is a problem. If you look at the array, it
    says there are matches starting at positions 1, 10, and 35. However,
    if you look at the first sequence I have entered ($seq1), the matches
    should be at 1 (literally the second character entered in my string),
    11 (character#12) and 35. 1 and 35 are correct, but the indexing seems
    to be getting confused with the middle match, possibly because of the
    'a' in front of the 'ag'.
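
    The positions claimed above can be checked without String::Approx.
    This is only a sketch using the core substr() function; the hash
    name %at is arbitrary:

```perl
use strict;
use warnings;

my $seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";

# Look at the two characters starting at each zero-based offset in question.
my %at = map { $_ => substr($seq1, $_, 2) } (1, 10, 11, 35);

for my $pos (sort { $a <=> $b } keys %at) {
    print "$pos: $at{$pos}\n";
}
# prints:
# 1: ag
# 10: aa
# 11: ag
# 35: ag
```

    Offset 10 holds "aa" and offset 11 holds "ag", so an exact match
    for "ag" should be reported at 11.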

    Any other suggestions will be greatly appreciated.

    -Puri
     
    Puri, Aug 24, 2005
    #3
  4. John W. Krahn

    Puri wrote:
    >
    > As for Perl positions starting at zero, I understand this, and this is
    > exactly why I think there is a problem. If you look at the array, it
    > says there are matches starting at positions 1, 10, and 35. However,
    > if you look at the first sequence I have entered ($seq1), the matches
    > should be at 1 (literally the second character entered in my string),
    > 11 (character#12) and 35. 1 and 35 are correct, but the indexing seems
    > to be getting confused with the middle match, possibly because of the
    > 'a' in front of the 'ag'.
    >
    > Any other suggestions will be greatly appreciated.


    You should ask the author of that module whether that is a bug or the correct
    behavior.


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Aug 25, 2005
    #4
  5. Xho

    Guest

    "Puri" wrote:
    ....
    > $seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
    > $seq2 = "ag";
    > $seq2length = length($seq2);
    >
    > print "$seq1\n\n";
    >
    > $indexcount = 0;
    > #$index[0] = aindex($seq2, ["i 0"], $seq1);
    > $initial = 0;
    >
    > until ($index[$indexcount-1] == -1) {
    > $index[$indexcount] = aindex($seq2, ["i 0 initial_position=$initial"],
    > $seq1);
    > $initial=($index[$indexcount]+$seq2length);
    > $indexcount++;
    > }
    >
    > pop @index;
    >
    > print "@index";
    >
    > ##End code
    >
    > This returns the following array @index: 1 10 35


    At first I thought it was a counting problem, but now I see that the
    problem is that 'aa' is matching. If you change the code so that
    overlapping matches are allowed, then it returns 1 10 11 35. I don't know
    why aa matches, and I don't know enough about String::Approx to know if
    this is a bug or if it is correct for some reason I don't understand.
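
    One way the "overlapping matches allowed" variant might look, as a
    sketch only: the search restarts one character after each hit
    instead of skipping the pattern length. The modifier string is
    copied from the original post; the helper name
    overlapping_positions and the run-time loading of String::Approx
    are assumptions made here so the script still runs when the module
    is missing. The output depends on the installed String::Approx
    version; 1 10 11 35 is the result reported above, not one verified
    here.

```perl
use strict;
use warnings;

# String::Approx is a CPAN module; load it at run time so the script
# still runs (and says so) when it is not installed.
my $have_approx = eval { require String::Approx; 1 };

my $seq1 = "cagtttgtgtaagtgatcacgtnngatttacatatagccatcg";
my $seq2 = "ag";

# overlapping_positions is a hypothetical helper name.
# It restarts the search one character after each hit, so matches may overlap.
sub overlapping_positions {
    my ($pattern, $text) = @_;
    return () unless $have_approx;
    my @found;
    my $start = 0;
    while ($start < length $text) {
        # Modifier string copied from the original post.
        my $pos = String::Approx::aindex($pattern,
            ["i 0 initial_position=$start"], $text);
        # -1 means no match; also stop if the result lies before $start,
        # so the loop can never move backwards or spin forever.
        last if !defined $pos || $pos < $start;
        push @found, $pos;
        $start = $pos + 1;
    }
    return @found;
}

if ($have_approx) {
    my @index = overlapping_positions($seq2, $seq1);
    print "@index\n";
} else {
    print "String::Approx is not installed\n";
}
```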

    Xho

     
    Xho, Aug 25, 2005
    #5
