regex @a = m / | /g and captures?

Bill

Hello, I've got a regex question.

In the following, the use of () in an 'or' type regex causes @a to
hold both captures, so for each pass through the regex, one capture
and one undef is stored.

Can this be prevented and still use () captures and '|' in the regex?


my $s = '1 2 {3, 3, 3} 4';

my @a = $s =~ m/\{[^\}]+\}|\d/g;

print "\nWithout captures:\n", join "\n", @a;

@a = $s =~ m/(\{[^\}]+\})|(\d)/g;
foreach (@a) { $_ = 'undef' unless defined $_; }  # test definedness, so a captured '0' is not relabelled
print "\n\nNow with captures:\n", join "\n", @a;

<<<<<<<<<<<
 
Steve Grazzini

Bill said:
In the following, the use of () in an 'or' type regex causes @a to
hold both captures, so for each pass through the regex, one capture
and one undef is stored.

Can this be prevented and still use () captures and '|' in the regex?

@a = $s =~ m/(\{[^\}]+\})|(\d)/g;

Put the parentheses around the entire expression:

/( { [^}]+ } | \d )/xg;

But (as you already know) you don't need the parens at all in
this case.
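
To make the single-group version concrete, here is a minimal sketch using the sample string from the original post. The braces are escaped here as a precaution, on the assumption that newer Perls are stricter about a bare `{` in a pattern; otherwise it is the same regex.

```perl
use strict;
use warnings;

my $s = '1 2 {3, 3, 3} 4';

# One capture group wraps the whole alternation, so each successful
# match contributes exactly one element to the returned list.
my @a = $s =~ m/( \{ [^}]+ \} | \d )/xg;

# Expect four elements and no undefs.
print join("\n", @a), "\n";
```

With only one group in the pattern there is nothing left over to come back as undef, whichever alternative matched.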
 
Tad McClellan

Bill said:
In the following, the use of () in an 'or' type regex causes @a to
hold both captures, so for each pass through the regex, one capture
and one undef is stored.

Can this be prevented and still use () captures and '|' in the regex?

@a = $s =~ m/(\{[^\}]+\})|(\d)/g;

grep() is handy when you need to filter a list:

my @a = grep defined, $s =~ m/(\{[^\}]+\})|(\d)/g;
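
A small sketch of what the filter is doing, again with the string from the original post. The variable name @raw is only for illustration.

```perl
use strict;
use warnings;

my $s = '1 2 {3, 3, 3} 4';

# Two groups: in list context each match returns ($1, $2), and the
# group that did not take part in the match comes back as undef.
my @raw = $s =~ m/(\{[^}]+\})|(\d)/g;

# Keep only the group that actually matched on each pass.
my @a = grep { defined $_ } @raw;

printf "%d raw elements, %d after filtering\n", scalar @raw, scalar @a;
print join("\n", @a), "\n";
```

Testing with defined rather than plain truth matters here: a captured '0' is false but still defined, so it survives the filter.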
 
Bill

Steve Grazzini said:
Put the parentheses around the entire expression.
@a = $s =~ m/(\{[^\}]+\})|(\d)/g;

/( { [^}]+ } | \d )/xg;

Oh yes, of course! Cool.
But I think that I simplified the code I was revising too far.

What about this (we want the numbers not the separators):
my $s = '1; 2; {3, 3, 3}; 4;';

my @a = $s =~ m/\{[^\}]+\};|\d;/g;

print "\nWithout captures:\n", join "\n", @a;

@a = $s =~ m/(\{[^\}]+\});|(\d);/g;
foreach (@a) { $_ = 'undef' unless defined $_; }  # test definedness, so a captured '0' is not relabelled
print "\n\nNow with captures:\n", join "\n", @a;

<<<<<<<<<<<

It seems that either I have to chop the answers here or filter undefs,
as Tad suggests?
 
Quantum Mechanic

Bill said:
What about this (we want the numbers not the separators):
my $s = '1; 2; {3, 3, 3}; 4;';

my @a = $s =~ m/\{[^\}]+\};|\d;/g;

Then move the common elements (semi-colon) out of the alternation. In
this case, they can be moved out of the capture as well:

/( { [^}]+ } | \d );/xg;

But you haven't stated whether the semi-colons are always there, or
meaningful. If they have no meaning, you can go with the previous
version:
/( { [^}]+ } | \d )/xg;
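
As a rough sketch of the first suggestion, assuming every item really is followed by a semi-colon as in the sample string (braces escaped as a precaution for newer Perls):

```perl
use strict;
use warnings;

my $s = '1; 2; {3, 3, 3}; 4;';

# The semi-colon is required for a match but sits outside the single
# capture group, so it is consumed without being returned.
my @a = $s =~ m/( \{ [^}]+ \} | \d );/xg;

print join("\n", @a), "\n";
```

The separator still anchors each match; it just never reaches @a.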

-QM
 
Bill

Quantum Mechanic said:
Then move the common elements (semi-colon) out of the alternation. In
this case, they can be moved out of the capture as well:

/( { [^}]+ } | \d );/xg;

But you haven't stated whether the semi-colons are always there, or
meaningful. If they have no meaning, you can go with the previous
version:
/( { [^}]+ } | \d )/xg;

-QM

So, I guess the answer in general is just to find a way to rewrite the
regex so that there is only one capture. It's good that regexes are so
flexible. Thanks
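
For cases where the regex cannot easily be rewritten around a single group, one option not raised in the thread is the branch-reset construct (?|...), available from Perl 5.10. Inside it, each alternative numbers its captures from the same starting point, so both groups below are $1. A minimal sketch, reusing the semi-colon example:

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10 or later

my $s = '1; 2; {3, 3, 3}; 4;';

# Each alternative has its own parentheses, but inside (?|...) they
# share capture number 1, so every match returns a single element.
my @a = $s =~ m/(?| (\{ [^}]+ \}) ; | (\d) ; )/xg;

print join("\n", @a), "\n";
```

This keeps '()' and '|' exactly where they are in the two-group version and still avoids the undefs.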
 
