Regex with a varying number of captures

Joe Gottman · Jun 18, 2005

I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all of the
values? My first pass was
/^ \s* keyword \s* : (?: \s* (\w+) \b)+/x

but this only captures the last value.

Joe Gottman

Jürgen Exner · Jun 18, 2005

Joe said:
I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all
of the values?

Well, why do you want to use a regexp? A simple
my ($keyword, undef, @values) = split / /,$line;
should do the job much more easily and quickly.

jue
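Jürgen's split / / assumes exactly one space between fields. Here is a sketch comparing it with Perl's special split ' ' form on a made-up line with uneven spacing. This variant is an editorial illustration, not code from the thread:

```perl
use strict;
use warnings;

# Hypothetical line with uneven spacing between the fields.
my $line = 'keyword :  value1   value2 value3';

# split / / splits on every single space, so runs of spaces
# produce empty fields in the result.
my (undef, undef, @single) = split / /, $line;

# split ' ' (a one-space string) is special-cased: it splits on runs of
# whitespace and skips leading whitespace, as awk does.
my ($keyword, undef, @values) = split ' ', $line;

print scalar(@single), "\n";      # 6 (three of them are empty strings)
print "$keyword -> @values\n";    # keyword -> value1 value2 value3
```

Neither form checks the keyword, and both assume whitespace around the colon: with split ' ', a line such as keyword: value1 value2 would yield "keyword:" as the first field and silently drop value1 into the undef slot.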

Brian McCauley · Jun 18, 2005

Joe said:
I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all of the
values? My first pass was
/^ \s* keyword \s* : (?: \s* (\w+) \b)+/x

The easiest way is not to try. Do it in two steps.

It can be done in one step using (?{}) but that's way more complex.
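Brian doesn't show the (?{}) version. One possible sketch, assuming Perl 5.8 or later (for $^N): the result of each code block lands in $^R, which perlre documents as being restored when the engine backtracks, so an array reference can be rebuilt for each value and copied out once the whole line has matched. The helper name values_via_code_block and the package variable @found are made up for illustration:

```perl
use strict;
use warnings;

our @found;   # package variable, filled in by the final code block

# Hypothetical helper: returns every value on a "keyword : v1 v2 ..." line,
# or an empty list if the whole line does not have that shape.
sub values_via_code_block {
    my ($line) = @_;
    @found = ();
    return unless $line =~ /
        ^ \s* keyword \s* :
        (?{ [] })                      # start with an empty array ref
        (?: \s* (\w+) \b
            (?{ [ @{ $^R }, $^N ] })   # previous list plus the capture just closed
        )+
        \s* $
        (?{ @found = @{ $^R } })       # whole line matched, so copy the list out
    /x;
    return @found;
}

print join(',', values_via_code_block('keyword : value1 value2 value3')), "\n";
```

Threading the list through $^R rather than pushing onto an array avoids keeping values from match attempts that were later backtracked.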

In this specific case there are alternative ways if you are willing to
presume (or have already verified) that the input conforms.

E.g.

my (undef, @values) = /(\w+)/g;  # list context returns every \w+ run; discard the first (the keyword)
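Brian's one-liner relies on list-context //g returning every match. A minimal sketch under his stated assumption that the input already conforms; the helper name values_after_keyword and the sample lines are hypothetical. The second call shows why that assumption matters:

```perl
use strict;
use warnings;

# Hypothetical helper. It does NOT check the line's shape; it assumes that
# was verified beforehand.
sub values_after_keyword {
    my ($line) = @_;
    # List-context //g returns every \w+ run; the first one is the keyword.
    my (undef, @values) = $line =~ /(\w+)/g;
    return @values;
}

my @ok  = values_after_keyword('keyword : value1 value2');
my @odd = values_after_keyword('keyword = value1, value2');   # wrong shape

print "@ok\n";    # value1 value2
print "@odd\n";   # value1 value2 (the = and the comma go unnoticed)
```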

John W. Krahn · Jun 18, 2005

Jürgen Exner said:
Well, why do you want to use a regexp? A simple
my ($keyword, undef, @values) = split / /,$line;
should do the job much more easily and quickly.

That *does* use a regexp.

John

Jürgen Exner · Jun 18, 2005

John said:
That *does* use a regexp.

Hmmm, guilty as charged ;-)
But at least not for capturing the desired values.

jue

Bart Lateur · Jun 19, 2005

Joe said:
I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all of the
values? My first pass was
/^ \s* keyword \s* : (?: \s* (\w+) \b)+/x

but this only captures the last value.

That's indeed an annoying feature (IMO) of Perl regular expressions: when
you match with a repeat modifier, you either capture the lot or only the
last value.
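A minimal sketch of the behaviour Bart describes. The sample line is made up; the two patterns are Joe's original and the capture-the-lot form, run side by side:

```perl
use strict;
use warnings;

# Hypothetical sample line in the format from the question.
my $line = 'keyword : value1 value2 value3';

# Capture INSIDE the repeated group: every iteration overwrites $1,
# so only the final value is left when the match succeeds.
my ($last) = $line =~ /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x;

# Capture AROUND the repeated group: one string holding the whole list.
my ($lot) = $line =~ /^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x;

print "last: $last\n";    # last: value3
print "lot: [$lot]\n";    # lot: [ value1 value2 value3]
```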

The only solution that I think works reasonably well is a two-step
approach: first match the whole list, then split the match into its
parts. For example, like this (though there are other approaches, for
example using split):

if(/^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
@parts = $1 =~ /\w+/g;
}

Yes, that is indeed making perl do the same match twice. Double work,
but I know of no one-step method.
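Bart mentions split as another way to do the second step. A sketch of that variant, reusing his first-step pattern; the helper name parse_values is made up:

```perl
use strict;
use warnings;

# Hypothetical helper following Bart's two steps, with split for step two.
sub parse_values {
    my ($line) = @_;
    # Step 1: match the whole list and capture it as one string.
    return unless $line =~ /^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x;
    # Step 2: split the captured string; split ' ' ignores the leading space.
    return split ' ', $1;
}

print join('|', parse_values('keyword : value1 value2 value3')), "\n";
# value1|value2|value3
```

Like the pattern it reuses, this has no end-of-line anchor, so a line such as keyword : a b, c still returns a and b.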

Brian McCauley · Jun 19, 2005

Bart said:

The only solution that I think works reasonably well is a two-step
approach: first match the whole list, then split the match into its
parts. For example, like this (though there are other approaches, for
example using split):

if(/^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
@parts = $1 =~ /\w+/g;
}

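It is worth mentioning that rather than capturing and reprocessing $1
you can take advantage of the behaviour of //g in a scalar context: a
successful match leaves pos() just after the colon, and a \G-anchored
//g match can carry on from there.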

if(/^ \s* keyword \s* :/gx) {
@parts = /\G \s* (\w+)/gx;
}

Note: although I think this technique is worth mentioning, I probably
wouldn't use it here. It is equivalent to Bart's solution, but I would
actually prefer to see an end-of-line anchor in Bart's solution:

if(/^ \s* keyword \s* : ([\s\w]*)$/x) {
@parts = $1 =~ /\w+/g;
}
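The same pos()-driven idea can also be written as a scalar-context loop with /gc, which makes it easy to add the end-of-line check Brian prefers. A sketch; the helper name parse_with_gc is made up, and this is a variant rather than code posted in the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: a scalar-context //gc variant of the \G approach,
# with an explicit check that nothing but whitespace follows the values.
sub parse_with_gc {
    local $_ = shift;
    my @parts;
    # A successful scalar //g match leaves pos() just after the colon.
    return unless /^ \s* keyword \s* :/gcx;
    # Each match resumes at pos(); /c keeps pos() when the last try fails.
    while (/\G \s* (\w+)/gcx) {
        push @parts, $1;
    }
    # Acts like an end-of-line anchor: reject trailing junk.
    return unless /\G \s* \z/gcx;
    return @parts;
}

print join('|', parse_with_gc("keyword : value1 value2\n")), "\n";   # value1|value2
```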

Yes, that is indeed making perl do the same match twice.

Of course. But as I show above the first match can actually be somewhat
simpler.

If you are feeling particularly obscure you can combine the two
techniques by using lookahead to set pos() to the middle of a pattern match.

if(/^ \s* keyword \s* : (?=[\s\w]*$)/gx) {
@parts = /\w+/g;
}

This avoids copying the matched list into $1, at the cost of being
rather harder to comprehend.
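To see the lookahead trick working end to end, here is a small wrapper around Brian's two lines (the helper name parse_lookahead and the sample lines are made up). It relies on list-context //g starting its search at the current pos():

```perl
use strict;
use warnings;

# Hypothetical wrapper around the lookahead trick.
sub parse_lookahead {
    local $_ = shift;
    # The lookahead checks the rest of the line without consuming it,
    # so pos() ends up just after the colon.
    return unless /^ \s* keyword \s* : (?=[\s\w]*$)/gx;
    # List-context //g starts from pos(), so the keyword is not picked up.
    return /\w+/g;
}

print join('|', parse_lookahead('keyword : value1 value2')), "\n";   # value1|value2
```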

