How to find all overlapping pattern matches?

Discussion in 'Perl Misc' started by Peng Yu, Feb 7, 2011.

  1. Peng Yu

    Peng Yu Guest

    $string="abcabcabc";
    @findall = $string =~ /abcabc/g;
    print scalar(@findall), "\n";

    The above commands will print 1 rather than 2. Because there are two
    overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
    correct way to find all overlapping regexes. (Note that I gave
    'abcabc' as an example, but it could be any complex regex) Thanks!
     
    Peng Yu, Feb 7, 2011
    #1

  2. Peng Yu

    Guest

    On Feb 7, 8:31 am, Peng Yu <> wrote:
    > $string="abcabcabc";
    > @findall = $string =~ /abcabc/g;
    > print scalar(@findall), "\n";
    >
    > The above commands will print 1 rather than 2. Because there are two
    > overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
    > correct way to find all overlapping regexes. (Note that I gave
    > 'abcabc' as an example, but it could be any complex regex) Thanks!



    Dear Peng Yu,

    Here's one way to do it:

    while ($string =~ m/(abcabc)/g)
    {
        push @findall, $1;
        pos($string) = $-[0] + 1;
    }

    If you prefer to implement it in one line of code, you can do this:

    push(@findall, $1) and pos($string) = $-[0] + 1
        while $string =~ m/(abcabc)/g;

    Here's the explanation of what is happening: Normally, m//g and
    s///g both look for the next match at (or after) the end of the
    previous match, meaning that you can't directly use them to find
    overlapping patterns. However, inside a while ($string =~ m//g) loop
    you can assign to pos($string) to force m//g to begin looking
    wherever you want -- or in your case, one character after the start
    of the last match. (You have to start at least one character after
    it, because if you started at or before the start of the last match,
    the loop would never end.)

    As for the $-[0] variable, that's the first element of the @-
    array, which you can look up with "perldoc -v @-". $-[0] is the
    offset of the start of the last successful match, so ($-[0] + 1) is
    the earliest position from which you would want to continue your
    search for overlapping matches.
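
To make the offsets visible, here is a minimal sketch of the same loop that records where each match starts instead of the matched text. The variable name @starts is only for illustration and is not part of the code above.

```perl
use strict;
use warnings;

my $string = "abcabcabc";
my @starts;   # hypothetical name: start offsets of every match

while ($string =~ /abcabc/g) {
    push @starts, $-[0];         # offset where this match began
    pos($string) = $-[0] + 1;    # resume one character after that offset
}

print "match starts: @starts\n";
```

With this input the loop should record the offsets 0 and 3, one for each of the two overlapping occurrences.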

    I hope this helps, Peng Yu.

    Cheers,

    -- Jean-Luc
     
    , Feb 7, 2011
    #2

  3. Peng Yu

    ccc31807 Guest

    On Feb 7, 10:31 am, Peng Yu <> wrote:
    > $string="abcabcabc";
    > @findall = $string =~ /abcabc/g;
    > print scalar(@findall), "\n";
    >
    > The above commands will print 1 rather than 2. Because there are two
    > overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
    > correct way to find all overlapping regexes. (Note that I gave
    > 'abcabc' as an example, but it could be any complex regex) Thanks!


    You don't have to use a regular expression in a case like this. You
    can use index($string, $substring, $position) in a loop, ending the
    loop when the returned position is less than zero (index() returns
    -1 when nothing is found). This is how you might do it in a language
    like C.
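
A minimal sketch of that index() loop, assuming a fixed substring rather than a general pattern. The names $substring and @starts are illustrative only.

```perl
use strict;
use warnings;

my $string    = "abcabcabc";
my $substring = "abcabc";
my @starts;   # hypothetical name: offsets of each occurrence

# index() returns -1 when the substring is not found at or after $position
my $position = index($string, $substring, 0);
while ($position >= 0) {
    push @starts, $position;
    # step forward by one character so overlapping occurrences are seen
    $position = index($string, $substring, $position + 1);
}

print scalar(@starts), "\n";
```

This only covers literal substrings; it does not help when the thing being searched for is a real regex.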

    Sometimes, the simpler way is better.

    CC.
     
    ccc31807, Feb 7, 2011
    #3
  4. On 2011-02-07, Peng Yu <> wrote:
    > $string="abcabcabc";
    > @findall = $string =~ /abcabc/g;
    > print scalar(@findall), "\n";
    >
    > The above commands will print 1 rather than 2. Because there are two
    > overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
    > correct way to find all overlapping regexes. (Note that I gave
    > 'abcabc' as an example, but it could be any complex regex) Thanks!


    Do not use RExes which "move the match point too far" (i.e., match
    more than one character). In some situations a 0-length match may
    cause a problem (non-intuitive semantics), but if the REx is ALWAYS
    matching a 0-length substring, the match rules are intuitive again.

    So use /(?=(abcabc))/g.
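
As a rough illustration of that pattern in a loop: the look-ahead itself consumes nothing, and the inner parentheses still capture the text, so $1 and $-[0] are both available for each hit. The array name @found is just for this sketch.

```perl
use strict;
use warnings;

my $string = "abcabcabc";
my @found;   # hypothetical name: one entry per overlapping match

# The whole regex matches a 0-length substring; (abcabc) inside the
# look-ahead captures the text that follows the current position.
while ($string =~ /(?=(abcabc))/g) {
    push @found, "$1 at $-[0]";
}

print "$_\n" for @found;
```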

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Feb 7, 2011
    #4
  5. Peng Yu

    C.DeRykus Guest

    On Feb 7, 8:46 am, ccc31807 <> wrote:
    > On Feb 7, 10:31 am, Peng Yu <> wrote:
    >
    > > $string="abcabcabc";
    > > @findall = $string =~ /abcabc/g;
    > > print scalar(@findall), "\n";

    >
    > > The above commands will print 1 rather than 2. Because there are two
    > > overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
    > > correct way to find all overlapping regexes. (Note that I gave
    > > 'abcabc' as an example, but it could be any complex regex) Thanks!

    >
    > You don't have to use a regular expression in a case like this. You
    > can use index($string, $substring, $position) in a loop, ending the
    > loop which $position is less than zero. This is how you might do it in
    > a language like C.
    >
    > Sometimes, the simpler way is better.
    >


    True in some cases but, IMO, a regex
    is shorter and arguably much easier
    here:


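    $_ = "abcabcabc";
    ($count, $pos) = (0, 0);


    # regex: the zero-width look-ahead matches at every position where
    # 'abcabc' starts, so overlapping occurrences are all counted
    $count++ while /(?=abcabc)/g and ++$pos;

    vs.

    # index (reset $count and $pos to 0 first if run after the regex version)
    while ($pos != -1) {
        $pos = index($_, 'abcabc', $pos);
        $count++, $pos++ unless $pos == -1;
    }

    # and a trap lurks with this alternative: $count++ returns the old
    # value, 0 on the first hit, so "and" skips $pos++ and the first
    # match is found and counted twice
    while ($pos != -1) {
        $pos = index($_, 'abcabc', $pos);
        $count++ and $pos++ unless $pos == -1;
    }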

    --
    Charles DeRykus
     
    C.DeRykus, Feb 8, 2011
    #5
  6. On 2011-02-07 16:46, ccc31807 <> wrote:
    > On Feb 7, 10:31 am, Peng Yu <> wrote:
    >> $string="abcabcabc";
    >> @findall = $string =~ /abcabc/g;
    >> print scalar(@findall), "\n";
    >>
    >> The above commands will print 1 rather than 2. Because there are two
    >> overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
    >> correct way to find all overlapping regexes. (Note that I gave
                                                     ^^^^^^^^^^^^^^^^
    >> 'abcabc' as an example, but it could be any complex regex) Thanks!
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    >
    > You don't have to use a regular expression in a case like this. You
    > can use index($string, $substring, $position) in a loop,


    You did read what the OP wrote, did you?

    hp
     
    Peter J. Holzer, Feb 8, 2011
    #6
  7. Peng Yu

    ccc31807 Guest

    On Feb 8, 10:09 am, "Peter J. Holzer" <> wrote:
    > You did read what the OP wrote, did you?


    I did, and I thought about it. Several times in the past few weeks,
    I've had problems with REs acting poorly, and used other means to do
    what I needed to do, primarily index() and substr().

    My point was not that an RE can always be replaced by built-in
    functions, but that an RE can sometimes be replaced by built-in
    functions.

    CC.
     
    ccc31807, Feb 8, 2011
    #7
  8. Peng Yu

    Guest

    > On Feb 7, 8:31 am, Peng Yu <> wrote:
    >
    > > $string="abcabcabc";
    > > @findall = $string =~ /abcabc/g;
    > > print scalar(@findall), "\n";

    >
    > > The above commands will print 1 rather than 2. Because there are two
    > > overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
    > > correct way to find all overlapping regexes. (Note that I gave
    > > 'abcabc' as an example, but it could be any complex regex) Thanks!



    On Feb 7, 9:02 am, "" <>
    replied:
    >
    >    Here's one way to do it:
    >
    >       while ($string =~ m/(abcabc)/g)
    >       {
    >          push @findall, $1;
    >          pos($string) = $-[0] + 1;
    >       }



    Hmmm... after reading the other replies, I think that:

    @findall = $string =~ /(?=abcabc)/g;

    (which uses a positive look-ahead) is probably the cleaner solution.
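
One detail worth checking with the look-ahead form: without capturing parentheses, each match is 0 characters long, so the list holds empty strings. The count is still right, but the matched text is only available if the pattern captures it, as in /(?=(abcabc))/g. A small sketch of the difference (the array names are illustrative):

```perl
use strict;
use warnings;

my $string = "abcabcabc";

# No capture group: each list element is the 0-length matched text
my @no_capture = $string =~ /(?=abcabc)/g;

# Capture group inside the look-ahead: each element is the captured text
my @with_capture = $string =~ /(?=(abcabc))/g;

print scalar(@no_capture), " matches, first is '", $no_capture[0], "'\n";
print scalar(@with_capture), " matches, first is '", $with_capture[0], "'\n";
```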

    Just my opinion.

    -- Jean-Luc
     
    , Feb 8, 2011
    #8
  9. Peng Yu

    Guest

    On Mon, 7 Feb 2011 22:24:25 +0000 (UTC), Ilya Zakharevich <> wrote:

    >On 2011-02-07, Peng Yu <> wrote:
    >> $string="abcabcabc";
    >> @findall = $string =~ /abcabc/g;
    >> print scalar(@findall), "\n";
    >>
    >> The above commands will print 1 rather than 2. Because there are two
    >> overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
    >> correct way to find all overlapping regexes. (Note that I gave
    >> 'abcabc' as an example, but it could be any complex regex) Thanks!

    >
    >Do not use RExes which "move the match point too far" (i.e., match
    >more than one character). In some situations 0-length match may cause
    >a problem (non-intuitive semantic), but if the REx is ALWAYS matching
    >0-length substring, the match rules are intuitive again.
    >
    >So use /(?=(abcabc))/g.
    >


    s/ALWAYS/ONLY/

    Nice, and the behavior should be the same if quantifiers and/or
    assertions are added.
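
A quick sketch with a quantifier inside the look-ahead, using (?:abc)+ as a stand-in for "any complex regex" (that pattern is my own choice, not one from the thread):

```perl
use strict;
use warnings;

my $string = "abcabcabc";

# The captured group is greedy, so each start position contributes
# the longest run of "abc" that begins there.
my @findall = $string =~ /(?=((?:abc)+))/g;

print "$_\n" for @findall;
```

Note that this reports at most one match per starting position. With a greedy quantifier that is the longest one, so shorter alternatives beginning at the same offset (such as a lone "abc" at offset 0) are not listed separately.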

    -sln
     
    , Feb 8, 2011
    #9