Regex with a varying number of captures

Discussion in 'Perl Misc' started by Joe Gottman, Jun 18, 2005.

  1. Joe Gottman

    Joe Gottman Guest

    I am parsing a file with several lines of the form
    keyword : value1 value2 ... valueN

    What is the easiest way for me to write a regex that will capture all of the
    values? my first pass was
    /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x

    but this only captures the last value.

    Joe Gottman
     
    Joe Gottman, Jun 18, 2005
    #1

  2. Joe Gottman wrote:
    > I am parsing a file with several lines of the form
    > keyword : value1 value2 ... valueN
    >
    > What is the easiest way for me to write a regex that will capture all
    > of the values?


    Well, why do you want to use a regexp? A simple
    my ($keyword, undef, @values) = split / /,$line;
    should do the job much easier and faster.
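
    A minimal sketch of the split approach, assuming the fields are separated by whitespace and the colon stands alone as its own field. The sample line and variable names are illustrative only. It uses split ' ' (the special awk-style mode) instead of / /, since that also tolerates leading whitespace and runs of spaces or tabs:

```perl
use strict;
use warnings;

# Hypothetical sample line; note the leading spaces and the run of spaces.
my $line = "  keyword : value1   value2 value3";

# split ' ' (a string holding a single space) is the special awk-style mode:
# it skips leading whitespace and splits on any run of whitespace.
# The second field (the lone ':') is discarded by assigning it to undef.
my ($keyword, undef, @values) = split ' ', $line;

print "$keyword => @values\n";
```

    Note that this does not check that the line really has the "keyword : ..." shape; it only cuts it into fields.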

    jue
     
    Jürgen Exner, Jun 18, 2005
    #2

  3. Joe Gottman wrote:

    > I am parsing a file with several lines of the form
    > keyword : value1 value2 ... valueN
    >
    > What is the easiest way for me to write a regex that will capture all of the
    > values? my first pass was
    > /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x


    The easiest way is not to try. Do it in two steps.

    It can be done in one step using (?{}) but that's way more complex.

    In this specific case there are alternative ways if you are willing to
    presume (or have already verified) that the input conforms.

    E.g.

    /(\w+)/g; # Then discard the first
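
    Spelled out as a runnable sketch, under the stated assumption that the line has already been checked to conform (the sample line and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical input that is assumed to be well-formed already.
my $line = "keyword : value1 value2 value3";

# In list context, //g returns every match of the capture group.
# The first word is the keyword itself, so it is peeled off separately.
my ($keyword, @values) = $line =~ /(\w+)/g;

print "$keyword => @values\n";
```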
     
    Brian McCauley, Jun 18, 2005
    #3
  4. Jürgen Exner wrote:
    > Joe Gottman wrote:
    >
    >>I am parsing a file with several lines of the form
    >> keyword : value1 value2 ... valueN
    >>
    >>What is the easiest way for me to write a regex that will capture all
    >>of the values?

    >
    > Well, why do you want to use a regexp? A simple
    > my ($keyword, undef, @values) = split / /,$line;
    > should do the job much easier and faster.


    That *does* use a regexp. :)


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Jun 18, 2005
    #4
  5. John W. Krahn wrote:
    > Jürgen Exner wrote:
    >> Joe Gottman wrote:
    >>
    >>> I am parsing a file with several lines of the form
    >>> keyword : value1 value2 ... valueN
    >>>
    >>> What is the easiest way for me to write a regex that will capture
    >>> all of the values?

    >>
    >> Well, why do you want to use a regexp? A simple
    >> my ($keyword, undef, @values) = split / /,$line;
    >> should do the job much easier and faster.

    >
    > That *does* use a regexp. :)


    Hmmm, guilty as charged ;-)
    But at least not for capturing the desired values.

    jue
     
    Jürgen Exner, Jun 18, 2005
    #5
  6. Bart Lateur

    Bart Lateur Guest

    Joe Gottman wrote:

    >I am parsing a file with several lines of the form
    > keyword : value1 value2 ... valueN
    >
    >What is the easiest way for me to write a regex that will capture all of the
    >values? my first pass was
    > /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x
    >
    >but this only captures the last value.


    That's indeed an annoying feature (IMO) of Perl regular expressions: you
    either capture the lot, or you capture the last value, when you match
    with a repeat modifier.

    The only solution that I think works reasonably well, is a two step
    approach: first match the whole list, and second split up the match into
    its parts. For example, like this (though there are other approaches, for
    example using split):

    if(/^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
    @parts = $1 =~ /\w+/g;
    }

    Yes, that is indeed making perl do the same match twice. Double work,
    but I know of no one step method.
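
    The (?{}) construct mentioned earlier in the thread is one candidate for a single-pass version. The following is only a sketch, assuming a reasonably recent Perl (5.18 or later) and a pattern that does not backtrack over an already-recorded value; the sample line and @values are illustrative names:

```perl
use strict;
use warnings;

# Hypothetical sample line.
my $line = "keyword : value1 value2 value3";
my @values;

# $^N holds the text of the most recently closed capture group, so the
# embedded code block records each value as the repetition advances.
if ($line =~ /^ \s* keyword \s* : (?: \s* (\w+) \b (?{ push @values, $^N }) )+ /x) {
    print "@values\n";
}
```

    The pushes are side effects and are not undone if the engine later backtracks, which is one reason this is harder to get right than the two step version.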

    --
    Bart.
     
    Bart Lateur, Jun 19, 2005
    #6
  7. Bart Lateur wrote:

    > Joe Gottman wrote:
    >
    >
    >>I am parsing a file with several lines of the form
    >> keyword : value1 value2 ... valueN
    >>
    >>What is the easiest way for me to write a regex that will capture all of the
    >>values? my first pass was
    >> /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x
    >>
    >>but this only captures the last value.

    >
    > The only solution that I think works reasonably well, is a two step
    > approach: first match the whole list, and second split up the match into
    > its parts. For example, like this (though there are other approaches, for
    > example using split):
    >
    > if(/^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
    > @parts = $1 =~ /\w+/g;
    > }


    It is worth mentioning that rather than capturing and reprocessing $1
    you can take advantage of the behaviour of //g in a scalar context.

    if(/^ \s* keyword \s* :/gx) {
    @parts = /\G \s* (\w+)/gx;
    }

    Note: although this technique is worth mentioning, I probably wouldn't
    use it here. It is equivalent to Bart's solution, but I would actually
    prefer to see an end-of-line anchor in Bart's solution.

    if(/^ \s* keyword \s* : ([\s\w]*)$/x) {
    @parts = $1 =~ /\w+/g;
    }

    > Yes, that is indeed making perl do the same match twice.


    Of course. But as I show above the first match can actually be somewhat
    simpler.

    If you are feeling particularly obscure you can combine the two
    techniques by using lookahead to set pos() to the middle of a pattern match.

    if(/^ \s* keyword \s* : (?=[\s\w]*$)/gx) {
    @parts = /\w+/g;
    }

    This saves the cost of the string copy, at the expense of being rather
    harder to comprehend.
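
    For anyone who wants to try the pos() variant, here is a small self-contained sketch. The sample lines and the @results array are made up for illustration; the regexes are the ones shown above:

```perl
use strict;
use warnings;

# Hypothetical sample input.
my @lines = (
    "keyword : value1 value2 value3",
    "keyword : bad-value",    # rejected: '-' is neither \s nor \w
    "other : x y",            # rejected: wrong keyword
);

my @results;
for (@lines) {
    # Scalar-context //g leaves pos() just after the colon on success;
    # the lookahead validates the rest of the line without consuming it.
    if (/^ \s* keyword \s* : (?=[\s\w]*$)/gx) {
        # List-context //g resumes from pos(), so the keyword is skipped.
        push @results, [ /\w+/g ];
    }
}

print "@$_\n" for @results;
```

    A failed scalar-context //g match resets pos(), so a rejected line does not affect how the next line is matched.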
     
    Brian McCauley, Jun 19, 2005
    #7