Regex with a varying number of captures

Joe Gottman

I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all of the
values? My first pass was
/^ \s* keyword \s* : (?: \s* (\w+) \b)+/x

but this only captures the last value.

Joe Gottman
 
Jürgen Exner

Joe said:
I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all
of the values?

Well, why do you want to use a regexp? A simple
my ($keyword, undef, @values) = split / /, $line;
should do the job much more easily and faster.
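
A minimal sketch of the split idea, for anyone who wants to try it. The sample line and the helper name split_values are made up for illustration. It uses split ' ' (a string holding a single space) rather than / /, on the assumption that the input may contain leading whitespace or runs of spaces, which / / would turn into empty fields. It still assumes the colon is surrounded by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: return the values from a "keyword : v1 v2 ..." line.
sub split_values {
    my ($line) = @_;
    # split ' ' is the special awk-like mode: it skips leading whitespace
    # and splits on runs of whitespace. The second field is the colon.
    my ($keyword, undef, @values) = split ' ', $line;
    return @values;
}

my @values = split_values("  keyword :  value1 value2   value3\n");
print "@values\n";    # value1 value2 value3
```

Note that a line written as "keyword: value1" (no space before the colon) would lose its first value with this approach.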

jue
 
Brian McCauley

Joe said:
I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all of the
values? My first pass was
/^ \s* keyword \s* : (?: \s* (\w+) \b)+/x

The easiest way is not to try. Do it in two steps.

It can be done in one step using (?{}) but that's way more complex.

In this specific case there are alternative ways if you are willing to
presume (or have already verified) that the input conforms.

E.g.

/(\w+)/g; # Then discard the first
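
Spelled out as a runnable sketch (the sample line and the helper name values_after_keyword are hypothetical), the "match every word, then discard the first" idea looks roughly like this. As noted above, it does no validation at all, so it is only reasonable when the line is already known to have the expected shape.

```perl
use strict;
use warnings;

# Hypothetical helper: grab every run of word characters, drop the keyword.
sub values_after_keyword {
    my ($line) = @_;
    # List-context //g returns every capture; the first one is the keyword.
    my ($keyword, @values) = $line =~ /(\w+)/g;
    return @values;
}

my @values = values_after_keyword("keyword : value1 value2 value3");
print "@values\n";    # value1 value2 value3
```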
 
John W. Krahn

Jürgen Exner said:
Well, why do you want to use a regexp? A simple
my ($keyword, undef, @values) = split / /,$line;
should do the job much easier and faster.

That *does* use a regexp. :)


John
 
Bart Lateur

Joe said:
I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all of the
values? My first pass was
/^ \s* keyword \s* : (?: \s* (\w+) \b)+/x

but this only captures the last value.

That's indeed an annoying feature (IMO) of Perl regular expressions: you
either capture the lot, or you capture the last value, when you match
with a repeat modifier.

The only solution that I think works reasonably well is a two-step
approach: first match the whole list, and second split up the match into
its parts. For example, like this (though there are other approaches, for
example using split):

if (/^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
    @parts = $1 =~ /\w+/g;
}

Yes, that is indeed making perl do the same match twice. Double work,
but I know of no one-step method.
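
To make both halves of this concrete, here is a small sketch with a made-up sample line. The first match shows a repeated capture group keeping only its last iteration; the hypothetical helper two_step then applies the two-step approach. Copying $1 into a named variable before the second match is just a cautious choice on my part.

```perl
use strict;
use warnings;

my $line = "keyword : value1 value2 value3";

# A capture group inside a repeated group keeps only its final iteration.
my ($last) = $line =~ /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x;
print "$last\n";    # value3

# Hypothetical helper: capture the whole list first, then pull it apart.
sub two_step {
    my ($text) = @_;
    my @parts;
    if ($text =~ /^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
        my $list = $1;              # keep the captured list in a named variable
        @parts = $list =~ /\w+/g;   # list-context //g returns every match
    }
    return @parts;
}

print join(",", two_step($line)), "\n";    # value1,value2,value3
```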
 
Brian McCauley

Bart said:
The only solution that I think works reasonably well is a two-step
approach: first match the whole list, and second split up the match into
its parts. For example, like this (though there are other approaches, for
example using split):

if (/^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
    @parts = $1 =~ /\w+/g;
}

It is worth mentioning that rather than capturing and reprocessing $1
you can take advantage of the behaviour of //g in a scalar context.

if (/^ \s* keyword \s* :/gx) {
    # /x is needed here too, since this pattern also uses spaces for layout
    @parts = /\G \s* (\w+)/gx;
}
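
A self-contained sketch of the same //g plus \G idea, matching against a named variable instead of $_. The helper name values_via_pos and the sample lines are hypothetical. Both patterns carry /x here because both use literal spaces purely for layout.

```perl
use strict;
use warnings;

# Hypothetical helper: let the first match set pos(), then continue from it.
sub values_via_pos {
    my ($line) = @_;
    my @parts;
    # Scalar-context //g: on success, pos($line) sits just after the colon.
    if ($line =~ /^ \s* keyword \s* :/gx) {
        # List-context //g resumes at pos($line). \G pins each match to the
        # end of the previous one, so scanning stops at the first gap.
        @parts = $line =~ /\G \s* (\w+)/gx;
    }
    return @parts;
}

print join(",", values_via_pos("keyword : value1 value2 value3")), "\n";
# value1,value2,value3

print join(",", values_via_pos("keyword : value1 value2 ; junk")), "\n";
# value1,value2
```

The second call shows the effect of \G: the scan stops at the ";" instead of skipping ahead to "junk".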

Note: although I say this technique is worth mentioning, I probably
wouldn't use it here. It is equivalent to Bart's solution, but
I would actually prefer to see an end-of-line anchor in Bart's solution.

if (/^ \s* keyword \s* : ([\s\w]*)$/x) {
    @parts = $1 =~ /\w+/g;
}

Bart said:
Yes, that is indeed making perl do the same match twice.

Of course. But as I show above the first match can actually be somewhat
simpler.

If you are feeling particularly obscure you can combine the two
techniques by using lookahead to set pos() to the middle of a pattern match.

if (/^ \s* keyword \s* : (?=[\s\w]*$)/gx) {
    @parts = /\w+/g;
}

This saves the expense of performing the string copy at the expense of
being rather harder to comprehend.
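
For completeness, a runnable sketch of the lookahead variant, again against a named variable. The helper name values_via_lookahead and the sample lines are hypothetical. The lookahead validates the rest of the line without consuming it, so pos() is left just after the colon and the second match only sees the values.

```perl
use strict;
use warnings;

# Hypothetical helper: validate the tail with a lookahead, then scan it.
sub values_via_lookahead {
    my ($line) = @_;
    my @parts;
    # The lookahead checks that only whitespace and word characters follow,
    # but consumes nothing, so pos($line) ends up right after the colon.
    if ($line =~ /^ \s* keyword \s* : (?=[\s\w]*$)/gx) {
        # List-context //g starts from pos($line), so "keyword" is skipped.
        @parts = $line =~ /\w+/g;
    }
    return @parts;
}

print join(",", values_via_lookahead("keyword : value1 value2 value3")), "\n";
# value1,value2,value3

# A stray ";" makes the lookahead fail, so nothing is returned.
print scalar(my @none = values_via_lookahead("keyword : value1 ; value2")), "\n";
# 0
```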
 
