How to make Perl's regex engine "halt" after a match

  • Thread starter Dominic van der Zypen

Dominic van der Zypen

Hello!

I'd like to do the following:

Given just one line of text, match every occurrence of a double letter
and push those double letters on @my_stack. Say, if we are given

$line = "This line contains some occurrences of double letters, hoorray";

then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
"rr";

Everything I tried so far just put the first occurrence of double
letters (in this example, "cc") on the stack, even if I used the /g
option for my match. I suppose the regex engine matched every
occurrence, but somehow it didn't "halt" and "care to push" every single
one of them on @my_stack.

So... how should I go about the above problem?

Many thanks for your help!! Dominic
 
it_says_BALLS_on_your_forehead

Dominic said:
Hello!

I'd like to do the following:

Given just one line of text, match every occurrence of a double letter
and push those double letters on @my_stack. Say, if we are given

$line = "This line contains some occurrences of double letters,
hoorray";

then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
"rr";

Everything I tried so far just put the first occurrence of double
letters (in this example, "cc") on the stack, even if I used the /g
option for my match. I suppose the regex engine matched every
occurrence, but somehow it didn't "halt" and "care to push" every single
one of them on @my_stack.

So... how should I go about the above problem?

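use strict;
use warnings;

my @stack;

my $string = 'This line contains some occurrences of double letters, hoorray';

# In scalar context, each /g match resumes where the previous one ended.
while ( $string =~ m/((\w)\2)/g ) {
    push @stack, $1;    # $1 is the whole pair; $2 is the single letter
}

print $_, "\n" for @stack;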
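
The failing code was not posted, so the following is an assumption: a likely reason only "cc" was collected is that the match was evaluated once, for example in an if. In scalar context, each evaluation of m//g finds a single match and remembers where it stopped (see pos), so it takes a loop to walk the whole string. A minimal sketch of the difference:

```perl
use strict;
use warnings;

my $string = 'This line contains some occurrences of double letters, hoorray';

# Evaluated once: a scalar-context /g match finds only the first pair.
my @once;
if ( $string =~ /((\w)\2)/g ) {
    push @once, $1;
}

pos($string) = undef;    # reset the search position before starting over

# Evaluated repeatedly: each pass resumes where the previous match ended.
my @all;
while ( $string =~ /((\w)\2)/g ) {
    push @all, $1;
}

print "once: @once\n";    # once: cc
print "all:  @all\n";     # all:  cc rr tt oo rr
```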
 
robic0

Hello!

I'd like to do the following:

Given just one line of text, match every occurrence of a double letter
and push those double letters on @my_stack. Say, if we are given

$line = "This line contains some occurrences of double letters,
hoorray";

then I want @my_stack to end up containing "cc", "rr", "tt", "oo",
"rr";

Everything I tried so far just put the first occurrence of double
letters (in this example, "cc") on the stack, even if I used the /g
option for my match. I suppose the regex engine matched every
occurrence, but somehow it didn't "halt" and "care to push" every single
one of them on @my_stack.

So... how should I go about the above problem?

Many thanks for your help!! Dominic

This is trivial. Why would you need this?
I would consider this a waste of my time to even read such a proposition.
If you can't post a real world problem/question then don't post here...
 
Wes Groleau

it_says_BALLS_on_your_forehead said:
use strict;
use warnings;

my @stack;

my $string = 'This line contains some occurrences of double letters, hoorray';

while ( $string =~ m/((\w)\2)/g ) {
    push @stack, $1;
}

print $_, "\n" for @stack;

The above works (I tried it). Perl Cookbook 6.0 suggested
something else, but it didn't work:

Graphite:~ wgroleau$ perl -e '
use strict; use warnings;
my @stack;
my $string = "This line contains some occurrences of double letters, hoorray";
@stack = $string =~ /((\w)\2)/g;
print "Stack: @stack\n";
'
Stack: cc c rr r tt t oo o rr r
Graphite:~ wgroleau$

What did I miss?

--
Wes Groleau

Answer not a fool according to his folly,
lest thou also be like unto him.
Answer a fool according to his folly,
lest he be wise according to his own conceit.
-- Solomon

Are you saying there's no good way to answer a fool?
-- Groleau
 
DJ Stunks

Wes said:
The above works (I tried it). Perl Cookbook 6.0 suggested
something else, but it didn't work:

Graphite:~ wgroleau$ perl -e '
Stack: cc c rr r tt t oo o rr r
Graphite:~ wgroleau$

What did I miss?

I don't have a copy of that, but from perldoc perlop:

The /g modifier specifies global pattern matching--that is,
matching as many times as possible within the string. How it
behaves depends on the context. In list context, it returns a
list of the substrings matched by any capturing parentheses
in the regular expression.

Since there are two sets of capturing parentheses, list context returns
both values for every match: the cc (from $1) AND the c (from $2).

Unless one of the local grandmasters steps in, I'd say there's no way
to perform this match all at once in list context. Instead one must
step through in scalar context as Mr. BALLS has.

-jp
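
To make the perlop passage concrete, here is a small sketch (variable names are illustrative, not from the thread). It shows that list-context /g flattens every capture group of every match into one list, so five matches with two groups give ten elements. With no parentheses at all, list context returns the whole matched strings instead.

```perl
use strict;
use warnings;

my $string = 'This line contains some occurrences of double letters, hoorray';

# Two capture groups: list context returns $1 and $2 for every match.
my @captures = $string =~ /((\w)\2)/g;
print scalar(@captures), " elements: @captures\n";
# 10 elements: cc c rr r tt t oo o rr r

# No capture groups: list context returns each complete match.
my @o_words = $string =~ /\bo\w*/g;
print scalar(@o_words), " elements: @o_words\n";
# 2 elements: occurrences of
```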
 
it_says_BALLS_on_your_forehead

Samwyse said:
My knee-jerk reaction was to use non-capturing parentheses, but that
would just break everything. What you really need is to only capture
the odd-numbered values. Filtering values from a list makes me think of
using map. Note that map can transform individual values into lists,
not just new values, and those lists are then concatenated together to
form a result. So, we need to return an empty list for the values we
don't care about. This should work:

@stack = map {length == 2 ? $_ : ()} ($string =~ /((\w)\2)/g);

why use map when grep is more appropriate? it seems that you're forcing
map to discard elements via the empty list, but grep is better suited
to selection from a list...

my @t = grep {/\w\w/} ( $string =~m/((\w)\2)/g );
print $_, "\n" for @t;
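
The length-based filters work here because every full match is two characters and every inner capture is one. A position-based filter, which is closer to the quoted idea of keeping only the odd-numbered values, does not rely on that. A sketch using an array slice (the @captures name is made up for illustration):

```perl
use strict;
use warnings;

my $string = 'This line contains some occurrences of double letters, hoorray';

my @captures = $string =~ /((\w)\2)/g;    # $1, $2, $1, $2, ...

# Keep the 1st, 3rd, 5th, ... elements, i.e. the even array indices.
my @stack = @captures[ grep { $_ % 2 == 0 } 0 .. $#captures ];

print "@stack\n";    # cc rr tt oo rr
```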
 
it_says_BALLS_on_your_forehead

why use map when grep is more appropriate? it seems that you're forcing
map to discard elements via the empty list, but grep is better suited
to selection from a list...

my @t = grep {/\w\w/} ( $string =~m/((\w)\2)/g );
print $_, "\n" for @t;

actually, inside the block, length == 2 is probably more efficient, but
/\w\w/ is shorter :).
 
Samwyse

Hmmm, ahhh, I just wanted to see if you were paying attention? ;-)
actually, inside the block, length == 2 is probably more efficient, but
/\w\w/ is shorter :).

/../ is even shorter.
 
Anno Siegel

DJ Stunks said:
I don't have a copy of that, but from perldoc perlop:

The /g modifier specifies global pattern matching--that is,
matching as many times as possible within the string. How it
behaves depends on the context. In list context, it returns a
list of the substrings matched by any capturing parentheses
in the regular expression.

Since there are two sets of capturing parentheses list context returns
both values: the cc (from $1) AND the c (from $2).

Unless one of the local grandmasters steps in, I'd say there's no way
to perform this match all at once in list context. Instead one must
step through in scalar context as Mr. BALLS has.

If you are happy with capturing only the first letter of each pair,
this will do:

my @stack = $line =~ /(.)(?=\1)/g;
print "@stack\n";

c r t o r

Anno
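
One difference worth knowing about the lookahead version: because (?=\1) does not consume the second character, a run of more than two identical characters produces overlapping matches, and the dot also pairs up non-letters such as doubled spaces. A small sketch with a made-up test string:

```perl
use strict;
use warnings;

my $text = 'zoooom  ok';    # hypothetical input: four o's and a doubled space

# Lookahead: only the first character is consumed, so matches can overlap.
my @lookahead = $text =~ /(.)(?=\1)/g;

# Consuming both characters: each character belongs to at most one pair.
my @consumed = $text =~ /(\w)\1/g;

print scalar(@lookahead), " lookahead matches\n";    # 4 lookahead matches
print scalar(@consumed),  " consuming matches\n";    # 2 consuming matches
```

For the sample line in this thread both patterns find the same five pairs, so the difference only matters for inputs with longer runs or repeated punctuation.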
 
John W. Krahn

Anno said:
If you are happy with capturing only the first letter of each pair,
this will do:

my @stack = $line =~ /(.)(?=\1)/g;
print "@stack\n";

c r t o r

Easy enough to "fix".

my @stack = map $_ x 2, $line =~ /(.)(?=\1)/g;


:)

John
 
Aaron Baugher

If you are happy with capturing only the first letter of each pair,
this will do:

my @stack = $line =~ /(.)(?=\1)/g;
print "@stack\n";

c r t o r

That was my idea: keep the regex simple by only having one capture,
and then double them:

my @stack = $line =~ /(\w)\1/g;
$_ x= 2 for @stack;

Not sure whether that would be faster than the other solutions. It
makes the regex simpler, but adds a foreach loop instead of the maps
and greps of the other solutions.
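
Whether the loop is faster can be measured rather than guessed. The sketch below uses the core Benchmark module to compare variants of three approaches from this thread. The sub names are made up for illustration, and the results depend on the Perl version and the input, so no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = 'This line contains some occurrences of double letters, hoorray';

# Hypothetical names for variants of approaches posted in this thread.
my %approach = (
    while_loop => sub {
        my @stack;
        push @stack, $1 while $line =~ /((\w)\2)/g;
        return @stack;
    },
    map_double => sub {
        return map { $_ x 2 } $line =~ /(\w)\1/g;
    },
    for_double => sub {
        my @stack = $line =~ /(\w)\1/g;
        $_ x= 2 for @stack;
        return @stack;
    },
);

# Sanity check first: all three should produce the same list.
my %result = map { $_ => join ' ', $approach{$_}->() } keys %approach;
print "$_: $result{$_}\n" for sort keys %result;

# Run each approach for at least one CPU second and print a comparison table.
cmpthese( -1, \%approach );
```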
 
Wayne M. Poe

[This is a reply to a thread from earlier this year
Reply generated from source post with full headers
from groups.google.com]
This is trivial. Why would you need this?
I would consider this a waste of my time to even read such a
proposition.
If you can't post a real world problem/question then don't post
here...

I was reading this on google groups archives and I just had to reply to
it. I understand I'm a few months late, but being in the hospital at
that time fighting cancer I hope is a good enough reason.

I'm actually surprised no one responded to this post at the time it was
originally posted.

Since when can one not post a simplified version of the problem to make
it easier to troubleshoot? Isn't that what you are SUPPOSED to do,
rather than posting a longer code snippet where one would have to sift
through the code to find the real problem?

Or maybe that's just me.
 
Jim Gibson

Wayne M. Poe said:
[This is a reply to a thread from earlier this year
Reply generated from source post with full headers
from groups.google.com]
On 18 Feb 2006 07:48:30 -0800, "Dominic van der Zypen"

[OP snipped]
I was reading this on google groups archives and I just had to reply to
it. I understand I'm a few months late, but being in the hospital at
that time fighting cancer I hope is a good enough reason.

I'm actually surprised no one responded to this post at the time it was
originally posted.

Since when can one not post a simplified version of the problem to make
it easier to trouble shoot? Isn't that what you are SUPPOSED to do?
Rather than posting a longer code snippet where one would have to sift
through the code to find the real problem?

Or maybe that's just me.

robic0 is a known troll. Many or most of the regulars here simply
ignore his posts, for good reason.
 
Tad McClellan

Wayne M. Poe said:
[This is a reply to a thread from earlier this year
Reply generated from source post with full headers
from groups.google.com]

robic0 wrote:

I was reading this on google groups archives and I just had to reply to
it.


Please do not feed the troll.

I'm actually surprised no one responded to this post at the time it was
originally posted.


Because not feeding a troll is how you make them go elsewhere.
 
Wayne M. Poe

Jim said:
Wayne M. Poe said:
[This is a reply to a thread from earlier this year
Reply generated from source post with full headers
from groups.google.com]
On 18 Feb 2006 07:48:30 -0800, "Dominic van der Zypen"

[OP snipped]
I was reading this on google groups archives and I just had to reply
to it. I understand I'm a few months late, but being in the hospital
at that time fighting cancer I hope is a good enough reason.

I'm actually surprised no one responded to this post at the time it
was originally posted.

Since when can one not post a simplified version of the problem to
make it easier to trouble shoot? Isn't that what you are SUPPOSED to
do? Rather than posting a longer code snippet where one would have
to sift through the code to find the real problem?

Or maybe that's just me.

robic0 is a known troll. Many or most of the regulars here simply
ignore his posts, for good reason.

So noted.
 
