How to make Perl's regex engine "halt" after a match

Dominic van der Zypen · Feb 18, 2006

Hello!

I'd like to do the following:

Given just one line of text, match every occurrence of a double letter
and push those double letters on @my_stack. Say, if we are given

$line = "This line contains some occurrences of double letters, hoorray";

then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
"rr";

Everything I tried so far just put the first occurrence of double
letters (in this example, "cc") on the stack, even when I used the /g
option for my match. I suppose the regex engine matched every
occurrence, but somehow it didn't "halt" and "care to push" every single
one of them onto @my_stack.

So... how should I go about the above problem?

Many thanks for your help!! Dominic

it_says_BALLS_on_your_forehead · Feb 18, 2006

use strict;
use warnings;

my @stack;

my $string = 'This line contains some occurrences of double letters, hoorray';

# In scalar context, each /g match resumes where the previous one ended.
while ( $string =~ m/((\w)\2)/g ) {
    push @stack, $1;
}

print $_, "\n" for @stack;
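
A minimal sketch of why the loop above collects every pair: in scalar context, each evaluation of a /g match resumes from where the previous one ended (the offset that pos() reports), so `while` visits the matches one at a time. The `if` variant is only a guess at what might produce the "first match only" symptom, since the original attempt was not posted; the names @with_while and @with_if are illustrative.

```perl
use strict;
use warnings;

my $string = 'This line contains some occurrences of double letters, hoorray';

# while: the condition is re-evaluated, and each scalar-context /g match
# resumes at pos($string), so every pair is visited in turn.
my @with_while;
while ( $string =~ /((\w)\2)/g ) {
    printf "found %s; next search starts at offset %d\n", $1, pos($string);
    push @with_while, $1;
}

# if: the condition is evaluated only once, so only the first pair is pushed.
# (Assumed cause of the symptom; the original code was not shown.)
pos($string) = undef;    # make sure the search starts from the beginning
my @with_if;
if ( $string =~ /((\w)\2)/g ) {
    push @with_if, $1;
}

print "while: @with_while\n";
print "if:    @with_if\n";
```

If the original attempt used `if` (or assigned the match to a single scalar), switching to `while` is probably all that is needed.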

robic0 · Feb 18, 2006

This is trivial. Why would you need this?
I would consider this a waste of my time to even read such a proposition.
If you can't post a real world problem/question then don't post here...

Wes Groleau · Feb 19, 2006

it_says_BALLS_on_your_forehead said:
use strict;
use warnings;

my @stack;

my $string = 'This line contains some occurrences of double letters, hoorray';

while ( $string =~ m/((\w)\2)/g ) {
    push @stack, $1;
}

print $_, "\n" for @stack;

The above works (I tried it). Perl Cookbook 6.0 suggested
something else, but it didn't work:

Graphite:~ wgroleau$ perl -e '

use strict; use warnings;
my @stack;
my $string = "This line contains some occurrences of double letters, hoorray";
@stack = $string =~ /((\w)\2)/g;
print "Stack: @stack\n";
'

Stack: cc c rr r tt t oo o rr r
Graphite:~ wgroleau$

What did I miss?

--
Wes Groleau

Answer not a fool according to his folly,
lest thou also be like unto him.
Answer a fool according to his folly,
lest he be wise according to his own conceit.
-- Solomon

Are you saying there's no good way to answer a fool?
-- Groleau

DJ Stunks · Feb 19, 2006

Wes said:
The above works (I tried it). Perl Cookbook 6.0 suggested
something else, but it didn't work:

Graphite:~ wgroleau$ perl -e '
Stack: cc c rr r tt t oo o rr r
Graphite:~ wgroleau$

What did I miss?

I don't have a copy of that, but from perldoc perlop:

The /g modifier specifies global pattern matching--that is,
matching as many times as possible within the string. How it
behaves depends on the context. In list context, it returns a
list of the substrings matched by any capturing parentheses
in the regular expression.

Since there are two sets of capturing parentheses, list context returns
both values for every match: the cc (from $1) AND the c (from $2).

Unless one of the local grandmasters steps in, I'd say there's no way
to perform this match all at once in list context. Instead one must
step through in scalar context as Mr. BALLS has.

-jp
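
A small sketch of the list-context behaviour described in the perlop excerpt, plus one possible way to keep only the pairs. It assumes the same sample line; @all and @pairs are illustrative names. With two capture groups, each match contributes ($1, $2) to a single flat list, so the wanted values sit at the even indexes.

```perl
use strict;
use warnings;

my $line = 'This line contains some occurrences of double letters, hoorray';

# Two capture groups: each match contributes ($1, $2), flattened into one list.
my @all = $line =~ /((\w)\2)/g;

# Keep the elements at even indexes (0, 2, 4, ...), i.e. the $1 of each match.
my @pairs = @all[ grep { $_ % 2 == 0 } 0 .. $#all ];

print "all:   @all\n";
print "pairs: @pairs\n";
```

The slice relies only on position, so it does not care what the captured text looks like; it does assume the pattern has exactly two capture groups.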

it_says_BALLS_on_your_forehead · Feb 19, 2006

Samwyse said:
My knee-jerk reaction was to use non-capturing parentheses, but that
would just break everything. What you really need is to only capture
the odd-numbered values. Filtering values from a list makes me think of
using map. Note that map can transform individual values into lists,
not just new values, and those lists are then concatenated together to
form a result. So, we need to return an empty list for the values we
don't care about. This should work:

@stack = map {length == 2 ? $_ : ()} ($string =~ /((\w)\2)/g);

why use map when grep is more appropriate? it seems that you're forcing
map to discard elements via the empty list, but grep is better suited
to selection from a list...

my @t = grep {/\w\w/} ( $string =~m/((\w)\2)/g );
print $_, "\n" for @t;

it_says_BALLS_on_your_forehead · Feb 19, 2006

why use map when grep is more appropriate? it seems that you're forcing
map to discard elements via the empty list, but grep is better suited
to selection from a list...

my @t = grep {/\w\w/} ( $string =~m/((\w)\2)/g );
print $_, "\n" for @t;

actually, inside the block, length == 2 is probably more efficient, but
/\w\w/ is shorter.

Samwyse · Feb 19, 2006

Hmmm, ahhh, I just wanted to see if you were paying attention? ;-)

actually, inside the block, length == 2 is probably more efficient, but
/\w\w/ is shorter.

/../ is even shorter.

Anno Siegel · Feb 20, 2006

DJ Stunks said:
I don't have a copy of that, but from perldoc perlop:

The /g modifier specifies global pattern matching--that is,
matching as many times as possible within the string. How it
behaves depends on the context. In list context, it returns a
list of the substrings matched by any capturing parentheses
in the regular expression.

Since there are two sets of capturing parentheses list context returns
both values: the cc (from $1) AND the c (from $2).

Unless one of the local grandmasters steps in, I'd say there's no way
to perform this match all at once in list context. Instead one must
step through in scalar context as Mr. BALLS has.

If you are happy with capturing only the first letter of each pair,
this will do:

my @stack = $line =~ /(.)(?=\1)/g;
print "@stack\n";

c r t o r

Anno

John W. Krahn · Feb 20, 2006

Anno said:
If you are happy with capturing only the first letter of each pair,
this will do:

my @stack = $line =~ /(.)(?=\1)/g;
print "@stack\n";

c r t o r

Easy enough to "fix".

my @stack = map $_ x 2, $line =~ /(.)(?=\1)/g;

John
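
One caveat worth a quick sketch: the lookahead pattern does not consume the second character, so on runs of three or more identical characters it finds overlapping pairs, whereas a pattern that consumes both characters does not. The input 'zoooom' is a made-up example, not from the thread. Note also that `.` matches spaces and punctuation while `\w` does not.

```perl
use strict;
use warnings;

my $text = 'zoooom';    # hypothetical input with four o's in a row

# Lookahead: only one character is consumed per match, so pairs may overlap.
my @lookahead = map { $_ x 2 } $text =~ /(.)(?=\1)/g;

# Consuming both characters: each letter belongs to at most one pair.
my @consuming = map { $_ x 2 } $text =~ /(\w)\1/g;

print "lookahead: @lookahead\n";    # oo oo oo
print "consuming: @consuming\n";    # oo oo
```

On the sample line from the original question the two patterns give the same five pairs, so the difference only matters for longer runs.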

Aaron Baugher · Feb 20, 2006

If you are happy with capturing only the first letter of each pair,
this will do:

my @stack = $line =~ /(.)(?=\1)/g;
print "@stack\n";

c r t o r

That was my idea: keep the regex simple by only having one capture,
and then double them:

my @stack = $line =~ /(\w)\1/g;
$_ x= 2 for @stack;

Not sure whether that would be faster than the other solutions. It
makes the regex simpler, but adds a foreach loop instead of the maps
and greps of the other solutions.
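
The speed question could be checked with the core Benchmark module. The following is only a sketch under assumptions: the labels in %candidates are made up, the sample line is far too short to be representative, and results will vary by machine and Perl version. It first confirms that all four approaches from the thread return the same list, then compares them.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = 'This line contains some occurrences of double letters, hoorray';

# Each candidate returns an array reference holding the pairs it found.
my %candidates = (
    while_loop => sub {
        my @s;
        push @s, $1 while $line =~ /((\w)\2)/g;
        return \@s;
    },
    grep_length => sub {
        my @s = grep { length($_) == 2 } $line =~ /((\w)\2)/g;
        return \@s;
    },
    map_double => sub {
        my @s = map { $_ x 2 } $line =~ /(.)(?=\1)/g;
        return \@s;
    },
    foreach_double => sub {
        my @s = $line =~ /(\w)\1/g;
        $_ x= 2 for @s;
        return \@s;
    },
);

# Sanity check: print what each approach returns before timing anything.
my %result = map { $_ => join( ' ', @{ $candidates{$_}->() } ) } keys %candidates;
print "$_: $result{$_}\n" for sort keys %result;

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese( -1, \%candidates );
```

For a string this short, any difference is likely to be dominated by call overhead, so readability is probably the better tie-breaker.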

Wayne M. Poe · Nov 17, 2006

[This is a reply to a thread from earlier this year
Reply generated from source post with full headers
from groups.google.com]

This is trivial. Why would you need this?
I would consider this a waste of my time to even read such a
proposition.
If you can't post a real world problem/question then don't post
here...

I was reading this on google groups archives and I just had to reply to
it. I understand I'm a few months late, but being in the hospital at
that time fighting cancer I hope is a good enough reason.

I'm actually surprised no one responded to this post at the time it was
originally posted.

Since when can one not post a simplified version of the problem to make
it easier to troubleshoot? Isn't that what you are SUPPOSED to do?
Rather than posting a longer code snippet where one would have to sift
through the code to find the real problem?

Or maybe that's just me.

Jim Gibson · Nov 17, 2006

Wayne M. Poe said:
[This is a reply to a thread from earlier this year
Reply generated from source post with full headers
from groups.google.com]

On 18 Feb 2006 07:48:30 -0800, "Dominic van der Zypen"

[OP snipped]

I was reading this on google groups archives and I just had to reply to
it. I understand I'm a few months late, but being in the hospital at
that time fighting cancer I hope is a good enough reason.

I'm actually surprised no one responded to this post at the time it was
originally posted.

Since when can one not post a simplified version of the problem to make
it easier to trouble shoot? Isn't that what you are SUPPOSED to do?
Rather than posting a longer code snippet where one would have to sift
through the code to find the real problem?

Or maybe that's just me.

robic0 is a known troll. Many or most of the regulars here simply
ignore his posts, for good reason.

Tad McClellan · Nov 18, 2006

Wayne M. Poe said:
[This is a reply to a thread from earlier this year
Reply generated from source post with full headers
from groups.google.com]

robic0 wrote:

I was reading this on google groups archives and I just had to reply to
it.

Please do not feed the troll.

I'm actually surprised no one responded to this post at the time it was
originally posted.

Because not feeding a troll is how you make them go elsewhere.

Wayne M. Poe · Nov 18, 2006

Jim said:
Wayne M. Poe said:

[This is a reply to a thread from earlier this year
Reply generated from source post with full headers
from groups.google.com]

On 18 Feb 2006 07:48:30 -0800, "Dominic van der Zypen"

[OP snipped]

I was reading this on google groups archives and I just had to reply
to it. I understand I'm a few months late, but being in the hospital
at that time fighting cancer I hope is a good enough reason.

I'm actually surprised no one responded to this post at the time it
was originally posted.

Since when can one not post a simplified version of the problem to
make it easier to trouble shoot? Isn't that what you are SUPPOSED to
do? Rather than posting a longer code snippet where one would have
to sift through the code to find the real problem?

Or maybe that's just me.

robic0 is a known troll. Many or most of the regulars here simply
ignore his posts, for good reason.

So noted.
