How to make Perl's regex engine "halt" after a match

Discussion in 'Perl Misc' started by Dominic van der Zypen, Feb 18, 2006.

  1. Hello!

    I'd like to do the following:

    Given just one line of text, match every occurrence of a double letter
    and push those double letters on @my_stack. Say, if we are given

    $line = "This line contains some occurrences of double letters,
    hoorray";

    then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
    "rr";

    Everything I tried so far just put the first occurrence of double
    letters (in this example, "cc") on the stack, even if I used the /g
    option for my match. I suppose the regex engine matched every
    occurrence, but somehow it didn't "halt" and "care to push" every single
    one of them onto @my_stack.

    So... how should I go about the above problem?

    Many thanks for your help!! Dominic
    Dominic van der Zypen, Feb 18, 2006
    #1

  2. Dominic van der Zypen wrote:
    > Hello!
    >
    > I'd like to do the following:
    >
    > Given just one line of text, match every occurrence of a double letter
    > and push those double letters on @my_stack. Say, if we are given
    >
    > $line = "This line contains some occurrences of double letters,
    > hoorray";
    >
    > then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
    > "rr";
    >
    > Everything I tried so far just put the first occurrence of double
    > letters (in this example, "cc") on the stack, even if I used the /g
    > option for my match. I suppose the regex engine matched every
    > occurrence, but somehow it didn't "halt" and "care to push" every single
    > one of them onto @my_stack.
    >
    > So... how should I go about the above problem?


    use strict; use warnings;

    my @stack;

    my $string = 'This line contains some occurrences of double letters, hoorray';

    while ( $string =~ m/((\w)\2)/g ) {
        push @stack, $1;
    }

    print $_, "\n" for @stack;
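
    A note on why the loop above collects every pair: in scalar context,
    m//g finds one match per evaluation and remembers where it stopped, so
    each pass of the while loop resumes from that point. The sketch below
    is only an illustration; the @ends array is added here to record pos()
    and is not part of the posted code.

```perl
use strict;
use warnings;

my $string = 'This line contains some occurrences of double letters, hoorray';
my @stack;
my @ends;    # offsets just past each match, recorded for illustration only

# In scalar context, each evaluation of m//g finds the next match
# and resumes from where the previous one ended.
while ( $string =~ m/((\w)\2)/g ) {
    push @stack, $1;              # $1 is the whole pair, $2 the single letter
    push @ends,  pos($string);    # pos() reports where the next search starts
}

print "@stack\n";    # cc rr tt oo rr
print "@ends\n";
```

    Without the loop, a single scalar-context match stops after the first
    pair, which would explain seeing only "cc".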
    it_says_BALLS_on_your_forehead, Feb 18, 2006
    #2

  3. Dominic van der Zypen

    robic0 Guest

    On 18 Feb 2006 07:48:30 -0800, "Dominic van der Zypen" wrote:

    >Hello!
    >
    >I'd like to do the following:
    >
    >Given just one line of text, match every occurrence of a double letter
    >and push those double letters on @my_stack. Say, if we are given
    >
    >$line = "This line contains some occurrences of double letters,
    >hoorray";
    >
    >then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
    >"rr";
    >
    >Everything I tried so far just put the first occurrence of double
    >letters (in this example, "cc") on the stack, even if I used the /g
    >option for my match. I suppose the regex engine matched every
    >occurrence, but somehow it didn't "halt" and "care to push" every single
    >one of them onto @my_stack.
    >
    >So... how should I go about the above problem?
    >
    >Many thanks for your help!! Dominic


    This is trivial. Why would you need this?
    I would consider this a waste of my time to even read such a proposition.
    If you can't post a real world problem/question then don't post here...
    robic0, Feb 18, 2006
    #3
  4. Dominic van der Zypen

    Wes Groleau Guest

    it_says_BALLS_on_your_forehead wrote:
    > Dominic van der Zypen wrote:
    >>Given just one line of text, match every occurrence of a double letter
    >>and push those double letters on @my_stack. Say, if we are given
    >>
    >>$line = "This line contains some occurrences of double letters,
    >>hoorray";
    >>
    >>then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
    >>"rr";
    >>
    >>Everything I tried so far just put the first occurrence of double
    >>letters (in this example, "cc") on the stack, even if I used the /g
    >>option for my match. I suppose the regex engine matched every
    >>occurrence, but somehow it didn't "halt" and "care to push" every single
    >>one of them onto @my_stack.
    >>
    >>So... how should I go about the above problem?

    >
    >
    > use strict; use warnings;
    >
    > my @stack;
    >
    > my $string = 'This line contains some occurrences of double letters,
    > hoorray';
    >
    > while ( $string =~ m/((\w)\2)/g ) {
    > push @stack, $1;
    >
    > }
    >
    > print $_, "\n" for @stack;
    >


    The above works (I tried it). Perl Cookbook 6.0 suggested
    something else, but it didn't work:

    Graphite:~ wgroleau$ perl -e '
    > use strict; use warnings;
    > my @stack;
    > my $string = "This line contains some occurrences of double letters, hoorray";
    > @stack = $string =~ /((\w)\2)/g;
    > print "Stack: @stack\n";
    > '

    Stack: cc c rr r tt t oo o rr r
    Graphite:~ wgroleau$

    What did I miss?

    --
    Wes Groleau

    Answer not a fool according to his folly,
    lest thou also be like unto him.
    Answer a fool according to his folly,
    lest he be wise according to his own conceit.
    -- Solomon

    Are you saying there's no good way to answer a fool?
    -- Groleau
    Wes Groleau, Feb 19, 2006
    #4
  5. Dominic van der Zypen

    DJ Stunks Guest

    Wes Groleau wrote:
    > it_says_BALLS_on_your_forehead wrote:
    > > use strict; use warnings;
    > >
    > > my @stack;
    > >
    > > my $string = 'This line contains some occurrences of double letters,
    > > hoorray';
    > >
    > > while ( $string =~ m/((\w)\2)/g ) {
    > > push @stack, $1;
    > >
    > > }
    > >
    > > print $_, "\n" for @stack;
    > >

    >
    > The above works (I tried it). Perl Cookbook 6.0 suggested
    > something else, but it didn't work:
    >
    > Graphite:~ wgroleau$ perl -e '
    > > use strict; use warnings;
    > > my @stack;
    > > my $string = "This line contains some occurrences of double letters, hoorray";
    > > @stack = $string =~ /((\w)\2)/g;
    > > print "Stack: @stack\n";
    > > '

    > Stack: cc c rr r tt t oo o rr r
    > Graphite:~ wgroleau$
    >
    > What did I miss?


    I don't have a copy of that, but from perldoc perlop:

    The /g modifier specifies global pattern matching--that is,
    matching as many times as possible within the string. How it
    behaves depends on the context. In list context, it returns a
    list of the substrings matched by any capturing parentheses
    in the regular expression.

    Since there are two sets of capturing parentheses list context returns
    both values: the cc (from $1) AND the c (from $2).

    Unless one of the local grandmasters steps in, I'd say there's no way
    to perform this match all at once in list context. Instead one must
    step through in scalar context as Mr. BALLS has.
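
    One way to stay in list context, offered as a sketch rather than as
    the accepted idiom: take the flat list that m//g returns and keep every
    other element with an array slice. The names @all and @pairs are made
    up for this example.

```perl
use strict;
use warnings;

my $string = 'This line contains some occurrences of double letters, hoorray';

# List context: every capture group of every match, flattened as
# ($1, $2, $1, $2, ...).
my @all = $string =~ /((\w)\2)/g;

# Keep the elements at even indices, i.e. the $1 values.
my @pairs = @all[ grep { $_ % 2 == 0 } 0 .. $#all ];

print "@all\n";      # cc c rr r tt t oo o rr r
print "@pairs\n";    # cc rr tt oo rr
```

    Selecting by index does not depend on what the captured strings look
    like, so it keeps working if the pattern changes, as long as there are
    still exactly two capture groups.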

    -jp
    DJ Stunks, Feb 19, 2006
    #5
  6. Samwyse wrote:
    > DJ Stunks wrote:
    > > Wes Groleau wrote:
    > >
    > >>it_says_BALLS_on_your_forehead wrote:
    > >>
    > >>>use strict; use warnings;
    > >>>
    > >>>my @stack;
    > >>>
    > >>>my $string = 'This line contains some occurrences of double letters,
    > >>>hoorray';
    > >>>
    > >>>while ( $string =~ m/((\w)\2)/g ) {
    > >>> push @stack, $1;
    > >>>
    > >>>}
    > >>>
    > >>>print $_, "\n" for @stack;
    > >>>
    > >>
    > >>The above works (I tried it). Perl Cookbook 6.0 suggested
    > >>something else, but it didn't work:
    > >>
    > >>Graphite:~ wgroleau$ perl -e '
    > >> > use strict; use warnings;
    > >> > my @stack;
    > >> > my $string = "This line contains some occurrences of double letters,
    > >>hoorray";
    > >> > @stack = $string =~ /((\w)\2)/g;
    > >> > print "Stack: @stack\n";
    > >> > '
    > >>Stack: cc c rr r tt t oo o rr r
    > >>Graphite:~ wgroleau$
    > >>
    > >>What did I miss?

    > >
    > > Since there are two sets of capturing parentheses list context returns
    > > both values: the cc (from $1) AND the c (from $2).
    > >
    > > Unless one of the local grandmasters steps in, I'd say there's no way
    > > to perform this match all at once in list context.

    >
    > My knee-jerk reaction was to use non-capturing parentheses, but that
    > would just break everything. What you really need is to only capture
    > the odd-numbered values. Filtering values from a list makes me think of
    > using map. Note that map can transform individual values into lists,
    > not just new values, and those lists are then concatenated together to
    > form a result. So, we need to return an empty list for the values we
    > don't care about. This should work:
    >
    > @stack = map {length == 2 ? $_ : ()} ($string =~ /((\w)\2)/g);


    why use map when grep is more appropriate? it seems that you're forcing
    map to discard elements via the empty list, but grep is better suited
    to selection from a list...

    my @t = grep {/\w\w/} ( $string =~m/((\w)\2)/g );
    print $_, "\n" for @t;
    it_says_BALLS_on_your_forehead, Feb 19, 2006
    #6

  7. > >
    > > @stack = map {length == 2 ? $_ : ()} ($string =~ /((\w)\2)/g);

    >
    > why use map when grep is more appropriate? it seems that you're forcing
    > map to discard elements via the empty list, but grep is better suited
    > to selection from a list...
    >
    > my @t = grep {/\w\w/} ( $string =~m/((\w)\2)/g );
    > print $_, "\n" for @t;


    actually, inside the block, length == 2 is probably more efficient, but
    /\w\w/ is shorter :).
    it_says_BALLS_on_your_forehead, Feb 19, 2006
    #7
  8. Dominic van der Zypen

    Samwyse Guest

    it_says_BALLS_on_your_forehead wrote:
    >>>@stack = map {length == 2 ? $_ : ()} ($string =~ /((\w)\2)/g);

    >>
    >>why use map when grep is more appropriate? it seems that you're forcing
    >>map to discard elements via the empty list, but grep is better suited
    >>to selection from a list...
    >>
    >>my @t = grep {/\w\w/} ( $string =~m/((\w)\2)/g );
    >>print $_, "\n" for @t;


    Hmmm, ahhh, I just wanted to see if you were paying attention? ;-)

    > actually, inside the block, length == 2 is probably more efficient, but
    > /\w\w/ is shorter :).


    /../ is even shorter.
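
    For the sample line, the three filters mentioned so far select the
    same elements. A quick check, assuming the sample string from the
    original question (the @by_* names exist only in this sketch):

```perl
use strict;
use warnings;

my $string = 'This line contains some occurrences of double letters, hoorray';
my @all    = $string =~ /((\w)\2)/g;    # cc c rr r tt t oo o rr r

my @by_length = grep { length == 2 } @all;    # length defaults to $_
my @by_word   = grep { /\w\w/ } @all;         # two word characters in a row
my @by_dots   = grep { /../ } @all;           # any two characters in a row

print "@by_length\n";    # cc rr tt oo rr
print "@by_word\n";      # cc rr tt oo rr
print "@by_dots\n";      # cc rr tt oo rr
```

    They agree here because every captured element is either one or two
    characters long. Which one is fastest was not measured in this thread.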
    Samwyse, Feb 19, 2006
    #8
  9. Dominic van der Zypen

    Anno Siegel Guest

    DJ Stunks wrote in comp.lang.perl.misc:
    > Wes Groleau wrote:
    > > it_says_BALLS_on_your_forehead wrote:
    > > > use strict; use warnings;
    > > >
    > > > my @stack;
    > > >
    > > > my $string = 'This line contains some occurrences of double letters,
    > > > hoorray';
    > > >
    > > > while ( $string =~ m/((\w)\2)/g ) {
    > > > push @stack, $1;
    > > >
    > > > }
    > > >
    > > > print $_, "\n" for @stack;
    > > >

    > >
    > > The above works (I tried it). Perl Cookbook 6.0 suggested
    > > something else, but it didn't work:
    > >
    > > Graphite:~ wgroleau$ perl -e '
    > > > use strict; use warnings;
    > > > my @stack;
    > > > my $string = "This line contains some occurrences of double letters, hoorray";
    > > > @stack = $string =~ /((\w)\2)/g;
    > > > print "Stack: @stack\n";
    > > > '

    > > Stack: cc c rr r tt t oo o rr r
    > > Graphite:~ wgroleau$
    > >
    > > What did I miss?

    >
    > I don't have a copy of that, but from perldoc perlop:
    >
    > The /g modifier specifies global pattern matching--that is,
    > matching as many times as possible within the string. How it
    > behaves depends on the context. In list context, it returns a
    > list of the substrings matched by any capturing parentheses
    > in the regular expression.
    >
    > Since there are two sets of capturing parentheses list context returns
    > both values: the cc (from $1) AND the c (from $2).
    >
    > Unless one of the local grandmasters steps in, I'd say there's no way
    > to perform this match all at once in list context. Instead one must
    > step through in scalar context as Mr. BALLS has.


    If you are happy with capturing only the first letter of each pair,
    this will do:

    my @stack = $line =~ /(.)(?=\1)/g;
    print "@stack\n";

    c r t o r

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
    Anno Siegel, Feb 20, 2006
    #9
  10. Anno Siegel wrote:
    > DJ Stunks <> wrote in comp.lang.perl.misc:
    >>
    >>Unless one of the local grandmasters steps in, I'd say there's no way
    >>to perform this match all at once in list context. Instead one must
    >>step through in scalar context as Mr. BALLS has.

    >
    > If you are happy with capturing only the first letter of each pair,
    > this will do:
    >
    > my @stack = $line =~ /(.)(?=\1)/g;
    > print "@stack\n";
    >
    > c r t o r


    Easy enough to "fix".

    my @stack = map $_ x 2, $line =~ /(.)(?=\1)/g;


    :)

    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Feb 20, 2006
    #10
  11. Anno Siegel writes:

    > If you are happy with capturing only the first letter of each pair,
    > this will do:
    >
    > my @stack = $line =~ /(.)(?=\1)/g;
    > print "@stack\n";
    >
    > c r t o r


    That was my idea: keep the regex simple by only having one capture,
    and then double them:

    my @stack = $line =~ /(\w)\1/g;
    $_ x= 2 for @stack;

    Not sure whether that would be faster than the other solutions. It
    makes the regex simpler, but adds a foreach loop instead of the maps
    and greps of the other solutions.
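
    The two single-capture patterns are not quite interchangeable. The
    lookahead version consumes only one character per match, so a run of
    three identical characters yields two hits, and "." also matches
    spaces and punctuation. A small sketch with a made-up test string
    ($text, @lookahead and @consumed are names invented for this example):

```perl
use strict;
use warnings;

my $text = 'aaa  bb';    # hypothetical input: a triple, two spaces, a double

# Lookahead: consumes one character per match, so matches can overlap,
# and "." accepts the spaces as well.
my @lookahead = $text =~ /(.)(?=\1)/g;

# Backreference: consumes both characters of the pair, and \w skips spaces.
my @consumed = $text =~ /(\w)\1/g;

print scalar(@lookahead), "\n";    # 4: 'a', 'a', ' ', 'b'
print scalar(@consumed),  "\n";    # 2: 'a', 'b'
```

    On the sample line from the original question both patterns return
    c r t o r, since it contains no triples and no doubled spaces.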


    --
    Aaron --
    http://360.yahoo.com/aaron_baugher
    Aaron Baugher, Feb 20, 2006
    #11
  12. Dominic van der Zypen

    Wayne M. Poe Guest

    [This is a reply to a thread from earlier this year
    Reply generated from source post with full headers
    from groups.google.com]

    robic0 wrote:
    > On 18 Feb 2006 07:48:30 -0800, "Dominic van der Zypen"
    > <> wrote:
    >
    >> Hello!
    >>
    >> I'd like to do the following:
    >>
    >> Given just one line of text, match every occurrence of a double
    >> letter and push those double letters on @my_stack. Say, if we are
    >> given
    >>
    >> $line = "This line contains some occurrences of double letters,
    >> hoorray";
    >>
    >> then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
    >> "rr";
    >>
    >> Everything I tried so far just put the first occurrence of double
    >> letters (in this example, "cc") on the stack, even if I used the /g
    >> option for my match. I suppose the regex engine matched every
    >> occurrence, but somehow it didn't "halt" and "care to push" every
    >> single one of them onto @my_stack.
    >>
    >> So... how should I go about the above problem?
    >>
    >> Many thanks for your help!! Dominic

    >
    > This is trivial. Why would you need this?
    > I would consider this a waste of my time to even read such a
    > proposition.
    > If you can't post a real world problem/question then don't post
    > here...


    I was reading this on google groups archives and I just had to reply to
    it. I understand I'm a few months late, but being in the hospital at
    that time fighting cancer I hope is a good enough reason.

    I'm actually surprised no one responded to this post at the time it was
    originally posted.

    Since when can one not post a simplified version of the problem to make
    it easier to troubleshoot? Isn't that what you are SUPPOSED to do?
    Rather than posting a longer code snippet where one would have to sift
    through the code to find the real problem?

    Or maybe that's just me.
    Wayne M. Poe, Nov 17, 2006
    #12
  13. Dominic van der Zypen

    Jim Gibson Guest

    Wayne M. Poe wrote:

    > [This is a reply to a thread from earlier this year
    > Reply generated from source post with full headers
    > from groups.google.com]
    >
    > robic0 wrote:
    > > On 18 Feb 2006 07:48:30 -0800, "Dominic van der Zypen"
    > > <> wrote:
    > >


    [OP snipped]

    > >
    > > This is trivial. Why would you need this?
    > > I would consider this a waste of my time to even read such a
    > > proposition.
    > > If you can't post a real world problem/question then don't post
    > > here...

    >
    > I was reading this on google groups archives and I just had to reply to
    > it. I understand I'm a few months late, but being in the hospital at
    > that time fighting cancer I hope is a good enough reason.
    >
    > I'm actually surprised no one responded to this post at the time it was
    > originally posted.
    >
    > Since when can one not post a simplified version of the problem to make
    > it easier to troubleshoot? Isn't that what you are SUPPOSED to do?
    > Rather than posting a longer code snippet where one would have to sift
    > through the code to find the real problem?
    >
    > Or maybe that's just me.


    robic0 is a known troll. Many or most of the regulars here simply
    ignore his posts, for good reason.
    Jim Gibson, Nov 17, 2006
    #13
  14. Wayne M. Poe wrote:
    > [This is a reply to a thread from earlier this year
    > Reply generated from source post with full headers
    > from groups.google.com]
    >
    > robic0 wrote:



    > I was reading this on google groups archives and I just had to reply to
    > it.



    Please do not feed the troll.


    > I'm actually surprised no one responded to this post at the time it was
    > originally posted.



    Because not feeding a troll is how you make them go elsewhere.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Nov 18, 2006
    #14
  15. Dominic van der Zypen

    Wayne M. Poe Guest

    Jim Gibson wrote:
    > In article <>, Wayne M. Poe
    > <> wrote:
    >
    > > [This is a reply to a thread from earlier this year
    > > Reply generated from source post with full headers
    > > from groups.google.com]
    > >
    > > robic0 wrote:
    > > > On 18 Feb 2006 07:48:30 -0800, "Dominic van der Zypen"
    > > > <> wrote:
    > > >

    >
    > [OP snipped]
    >
    > > >
    > > > This is trivial. Why would you need this?
    > > > I would consider this a waste of my time to even read such a
    > > > proposition.
    > > > If you can't post a real world problem/question then don't post
    > > > here...

    > >
    > > I was reading this on google groups archives and I just had to reply
    > > to it. I understand I'm a few months late, but being in the hospital
    > > at that time fighting cancer I hope is a good enough reason.
    > >
    > > I'm actually surprised no one responded to this post at the time it
    > > was originally posted.
    > >
    > > Since when can one not post a simplified version of the problem to
    > > make it easier to troubleshoot? Isn't that what you are SUPPOSED to
    > > do? Rather than posting a longer code snippet where one would have
    > > to sift through the code to find the real problem?
    > >
    > > Or maybe that's just me.

    >
    > robic0 is a known troll. Many or most of the regulars here simply
    > ignore his posts, for good reason.


    So noted.
    Wayne M. Poe, Nov 18, 2006
    #15