Regexp *-operator and multiple elements

Discussion in 'Perl Misc' started by Martijn Houtman, Dec 29, 2003.

  1. Hello,

    I have an issue parsing a string with a regular expression. Here's a small
    example:

    @foobar = ("foobarbarbarfoo" =~ m/(foo)(bar)*(foo)/g);

    this makes the array foobar contain:
    {"foo", "bar", "foo"}
    while I want it to be
    {"foo", "bar", "bar", "bar", "foo"}

    The *-operator seems to 'forget' the first few elements and just returns the
    last element, which is stored in the $2 variable. Is there a way to make it
    return the full list of elements?

    I have been suggested to split the string into three pieces first, and then
    parse them separately, but I'd still like to do it with a single regular
    expression.

    Thanks in advance!
    Regards,
    --
    tinus.
    Martijn Houtman, Dec 29, 2003
    #1

  2. Martijn Houtman wrote:
    >
    > @foobar = ("foobarbarbarfoo" =~ m/(foo)(bar)*(foo)/g);
    >
    > this makes the array foobar contain:
    > {"foo", "bar", "foo"}
    > while I want it to be
    > {"foo", "bar", "bar", "bar", "foo"}
    >
    > The *-operator seems to 'forget' the first few elements and just
    > returns the last element, which is stored in the $2 variable. Is
    > there a way to make it return the full list of elements?


    This may be something in the right direction:

    @foobar = 'foobarbarbarfoo' =~ /(foo)((?:bar)*)(foo)/;

    It distinguishes between clustering, (?:...), and capturing, (...): the
    '*' now sits inside the capturing group, so the whole run of 'bar's is
    captured. The result is:

    ('foo', 'barbarbar', 'foo')

    Of course, to get an array with five elements you can do:

    @foobar = 'foobarbarbarfoo' =~ /(foo|bar)/g;

    But that matches much more. ;-)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Dec 29, 2003
    #2
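
A minimal sketch of how the capture-the-whole-run idea can be extended to the five-element list asked for. It assumes the input has the foo, bar..., foo shape from the question; the names $head, $run and $tail are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

my $str = 'foobarbarbarfoo';
my @foobar;

# Capture the whole run of 'bar's as one group, using (?:...) for the repetition.
my ($head, $run, $tail) = $str =~ /(foo)((?:bar)*)(foo)/;

if (defined $head) {
    # A /g match in list context with no capture groups returns every match,
    # so the run 'barbarbar' is broken back into its individual 'bar's here.
    @foobar = ($head, $run =~ /bar/g, $tail);
}

print join(', ', @foobar), "\n";
```

This is technically two matches rather than one, but the first pattern still checks the overall shape, so stray text around the 'bar's is not picked up.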

  3. "Martijn Houtman" <> wrote in message
    news:pL_Hb.75652$...
    > Hello,
    >
    > I have an issue parsing a string with a regular expression. Here's a small
    > example:
    >
    > @foobar = ("foobarbarbarfoo" =~ m/(foo)(bar)*(foo)/g);
    >
    > this makes the array foobar contain:
    > {"foo", "bar", "foo"}
    > while I want it to be
    > {"foo", "bar", "bar", "bar", "foo"}
    >
    > The *-operator seems to 'forget' the first few elements and just returns
    > the last element, which is stored in the $2 variable.

    this is because the * is outside the capture brackets

    > Is there a way to make it
    > return the full list of elements?


    yes
    this can be done with a combination of look-ahead/behind assertions, along
    with the \G assertion

    >
    > I have been suggested to split the string into three pieces first, and
    > then parse them separately, but I'd still like to do it with a single regular
    > expression.


    why not:
    @foobar = ("foo", ("foobarbarbarfoo" =~ /foo((?:bar)*)foo/ &&
    $1 =~ /(bar)/g), "foo");

    if you still want to do it with the assertions:
    @foobar = ("foobarbarbarfoo" =~
    /(foo(?=(?:bar)*foo)|\Gbar(?=(?:bar)*foo)|(?<=foo(?:bar))\Gfoo)/g;

    gnari

    >
    > Thanks in advance!
    > Regards,
    > --
    > tinus.
    Ragnar Hafstað, Dec 29, 2003
    #3
  4. "Ragnar Hafstað" <> wrote in message
    news:bspvvd$45c$...
    > @foobar = ("foobarbarbarfoo" =~
    > /(foo(?=(?:bar)*foo)|\Gbar(?=(?:bar)*foo)|(?<=foo(?:bar))\Gfoo)/g;


    ooops, the cut-and-paste failed to include the closing parens
    @foobar = ("foobarbarbarfoo" =~
    /(foo(?=(?:bar)*foo)|\Gbar(?=(?:bar)*foo)|(?<=foo(?:bar))\Gfoo)/g);

    gnari
    Ragnar Hafstað, Dec 29, 2003
    #4
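
For readers who find the assertion-heavy pattern hard to follow, here is a simpler sketch of the \G idea, not the poster's exact method. It assumes it is acceptable to validate the overall shape first and then tokenize; \G forces each match to start exactly where the previous one ended. Note that the lookbehind (?<=foo(?:bar)) in the pattern above is fixed-width and only looks back over a single 'foobar', so it is worth checking that pattern against inputs with several 'bar's.

```perl
use strict;
use warnings;

my $str = 'foobarbarbarfoo';
my @foobar;

# Step 1: check the overall shape (foo, any number of bar, foo).
if ($str =~ /\Afoo(?:bar)*foo\z/) {
    pos($str) = 0;    # start scanning from the beginning

    # Step 2: walk the string token by token. \G anchors each attempt at the
    # end of the previous match; /c keeps pos() intact when the scan ends.
    while ($str =~ /\G(foo|bar)/gc) {
        push @foobar, $1;
    }
}

print join(', ', @foobar), "\n";
```

Because every match is anchored with \G, the loop cannot skip over unexpected text the way a plain /(foo|bar)/g would.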
  5. Ragnar Hafstað wrote:

    > "Ragnar Hafstað" <> wrote in message
    > news:bspvvd$45c$...
    >> @foobar = ("foobarbarbarfoo" =~
    >> /(foo(?=(?:bar)*foo)|\Gbar(?=(?:bar)*foo)|(?<=foo(?:bar))\Gfoo)/g;

    >
    > ooops, the cut-and-paste failed to include the closing parens
    > @foobar = ("foobarbarbarfoo" =~
    > /(foo(?=(?:bar)*foo)|\Gbar(?=(?:bar)*foo)|(?<=foo(?:bar))\Gfoo)/g);


    Thanks, Gnari and Gunnar, for your suggestions. I fail to see what exactly
    happens in the above example, though. I had hoped the answer would be a
    bit simpler.

    The problem is, the above might work for the above example, but my "real
    life" situation is a bit more complex. Take a look at this url, if you're
    interested: http://tinus.ath.cx/temp/form.txt. It's the code I currently
    have.

    It's meant to be a .java-file parser. The idea of this uni assignment is to
    have the script count a few certain keywords, like 'private', 'class',
    'new' etc. in the .java-file. Now, @imports is supposed to catch the bits
    surrounded by '( )' in the regexps. It does, but where the
    multiplier-operator * is used, it just counts the last, as explained in my
    previous, smaller example.

    I know there might be a better way to count the keywords, but I would still
    like to finish the parser as it is. Suggestions are very welcome.

    Thanks again. Kind regards,
    --
    tinus.
    Martijn Houtman, Dec 29, 2003
    #5
  6. "Martijn Houtman" <> wrote in message
    news:B%2Ib.80619$...

    > The problem is, the above might work for the above example, but my "real
    > life" situation is a bit more complex. Take a look at this url, if you're
    > interested: http://tinus.ath.cx/temp/form.txt. It's the code I currently
    > have.
    >
    > It's meant to be a .java-file parser. The idea of this uni assignment is
    > to have the script count a few certain keywords, like 'private', 'class',
    > 'new' etc. in the .java-file. Now, @imports is supposed to catch the bits
    > surrounded by '( )' in the regexps. It does, but where the
    > multiplier-operator * is used, it just counts the last, as explained in my
    > previous, smaller example.
    >
    > I know there might be a better way to count the keywords, but I would
    > still like to finish the parser as it is. Suggestions are very welcome.


    you might want to look at constructs like

    $string=~s/somepattern_with_capture/func($1)/ge;

    where func() is a sub that does your counting and optionally more
    operations.
    for example:
    sub func {
        my ($item) = @_;
        $counters{$item}++ if $countable{$item};  # count selected keywords
        return '' if $deletable{$item};           # drop items marked deletable
        return $item;                             # otherwise keep unchanged
    }

    hashes like %countable and %deletable would be preset to
    control what action to take.

    gnari
    Ragnar Hafstað, Dec 30, 2003
    #6
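
A self-contained sketch of the s///ge suggestion, under stated assumptions: the contents of %countable and %deletable, the sample Java line, and the pattern \b(\w+)\b are all made up for illustration and are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical control tables: which words to count, which to strip.
my %countable = map { $_ => 1 } qw(private class new);
my %deletable = map { $_ => 1 } qw(import);
my %counters;

sub func {
    my ($item) = @_;
    $counters{$item}++ if $countable{$item};
    return '' if $deletable{$item};
    return $item;
}

my $string = 'private class Foo { private Bar b = new Bar(); }';

# /e evaluates the replacement as code, /g applies it to every word.
# Work on a copy so the original text is left untouched.
(my $copy = $string) =~ s/\b(\w+)\b/func($1)/ge;

print "$_: $counters{$_}\n" for sort keys %counters;
```

Since func() is called once per match, every occurrence is counted, which sidesteps the problem of a repeated capture group keeping only its last match.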
  7. In article <pL_Hb.75652$>,
    Martijn Houtman <> wrote:
    >Hello,
    >
    >I have an issue parsing a string with a regular expression. Here's a small
    >example:
    >
    >@foobar = ("foobarbarbarfoo" =~ m/(foo)(bar)*(foo)/g);
    >
    >this makes the array foobar contain:
    > {"foo", "bar", "foo"}
    >while I want it to be
    > {"foo", "bar", "bar", "bar", "foo"}
    >
    >The *-operator seems to 'forget' the first few elements and just returns the
    >last element, which is stored in the $2 variable. Is there a way to make it
    >return the full list of elements?
    >
    >I have been suggested to split the string into three pieces first, and then
    >parse them separately, but I'd still like to do it with a single regular
    >expression.


    Another possibility:

    if ( "foobarbarbarfoo" =~ /^foo(.*?)foo$/g and
         (my $match = $1) =~ /^(?:bar)+$/ )
    {
        @foobar = ('foo', $match =~ /(bar)/g, 'foo');
        print join "\n", @foobar;
    }


    hth,
    --
    Charles DeRykus
    Charles DeRykus, Dec 30, 2003
    #7
  8. Martijn Houtman <> wrote:
    > Hello,
    >
    > I have an issue parsing a string with a regular expression. Here's a small
    > example:
    >
    > @foobar = ("foobarbarbarfoo" =~ m/(foo)(bar)*(foo)/g);
    >
    > this makes the array foobar contain:
    > {"foo", "bar", "foo"}
    > while I want it to be
    > {"foo", "bar", "bar", "bar", "foo"}
    >
    > The *-operator seems to 'forget' the first few elements and just returns
    > the last element, which is stored in the $2 variable. Is there a way to
    > make it return the full list of elements?


    Others have given some solutions, but I don't think you understand why it
    does what it does. The mapping between capturing parentheses and capture
    variables is lexical, not dynamic. $2 holds the match of the capturing
    parentheses which are lexically the second to be opened in the regex.
    If that set matches multiple times, the $2 variable holds the last one of
    these matches.

    Xho

    >
    > I have been suggested to split the string into three pieces first, and
    > then parse them separately, but I'd still like to do it with a single
    > regular expression.
    >
    > Thanks in advance!
    > Regards,


    , Jan 2, 2004
    #8
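
A small illustration of the lexical numbering described above. The input string and the ba[rz] pattern are assumptions chosen so that the two iterations of the second group are distinguishable; they are not from the thread.

```perl
use strict;
use warnings;

# Three pairs of capturing parentheses in the pattern, so there are exactly
# three capture slots, no matter how often the second group repeats.
my @caps = 'foobarbazfoo' =~ /(foo)(ba[rz])*(foo)/;
my $last = $2;

# Group 2 matched 'bar' and then 'baz'; only the final iteration is kept.
print scalar(@caps), " captures: @caps\n";   # prints "3 captures: foo baz foo"
```

The number of values returned is fixed by the pattern text, which is why collecting every repetition needs either a capture around the whole run or a second /g match.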