Re: Capture only first match in regular expression

Discussion in 'Perl' started by Peter Tuente, Apr 17, 2009.

  1. Peter Tuente

    Peter Tuente Guest

    Hi Zapanaz,

    By default, quantifiers in regular expressions are "greedy": they match
    as much as possible. To make a quantifier "non-greedy", put a single
    question mark "?" right after it. That may sound a bit complicated, but I
    hope you see what I mean ;-)
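
    To make the difference concrete, here is a minimal sketch on a made-up
    string (the string 'aXbXc' and the variable names are only for
    illustration, they are not from your script):

```perl
use strict;
use warnings;

my $text = 'aXbXc';

# Greedy: .* grabs as much as possible, so the match ends at the LAST X.
my ($greedy) = $text =~ /(a.*X)/;

# Non-greedy: .*? grabs as little as possible, so the match ends at the FIRST X.
my ($lazy) = $text =~ /(a.*?X)/;

print "greedy: $greedy\n";   # prints "greedy: aXbX"
print "lazy:   $lazy\n";     # prints "lazy:   aX"
```

    The only change between the two patterns is the "?" after the "*".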

    In your case the following should be sufficient:

    # old:
    # if ($content =~ /.*(<a.*<\/a>).*/i) {
    #     $anchorContent = $1;
    # }

    # new:
    if ($content =~ /.*?(<a.*?<\/a>).*/i) {
        $anchorContent = $1;
    }

    The effect is that the first ".*" is no longer greedy. Before, it ate as
    many characters as possible, including every earlier "<a", so only the
    last anchor in the line was left for the capture. The same goes for the
    second ".*", which now stops at the first "</a>" instead of the last one.
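
    If you want to try it in isolation, something like the following should
    do. It is only a sketch: the helper name first_anchor and the sample HTML
    are made up for this example, and I have dropped the leading and trailing
    ".*" because an unanchored pattern finds the leftmost match on its own.
    The /s modifier and the \b are my additions, for anchors that span
    several lines and to avoid matching tags such as <abbr>.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first <a>...</a> element in $content,
# or undef if there is none.
sub first_anchor {
    my ($content) = @_;
    # .*? is non-greedy, so the capture ends at the first closing </a>.
    # /i ignores case, /s lets . match newlines as well.
    return $content =~ /(<a\b.*?<\/a>)/is ? $1 : undef;
}

# Made-up sample input with two anchors.
my $content = '<p><a href="/one">first</a> and <a href="/two">second</a></p>';

my $anchorContent = first_anchor($content);
print "$anchorContent\n";   # prints: <a href="/one">first</a>
```

    For anything beyond quick page scraping, a real HTML parser is usually
    more robust than a regular expression.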

    Hope this helps ;-)

    Bye.

    PiT

    "Zapanaz" <http://joecosby.com/code/mail.pl> wrote in message
    news:...
    > Excuse the cross-post, my server doesn't carry comp.lang.perl.misc but
    > it looks like there is more activity there.
    >
    >
    > The answer to this is probably staring me in the face ...
    >
    > I am parsing/page scraping some HTML. I know the first anchor tag <a>
    > contains information I want.
    >
    > So I do this:
    >
    > if($content =~ /.*(<a.*<\/a>).*/i){
    > $anchorContent = $1;
    >
    > This basically works the way I want, it matches an anchor tag and
    > captures the content of it.
    >
    > But there are multiple anchor tags in the HTML. What I want is the
    > first one, but what I get is the last one.
    >
    > I think I should be using one of these
    >
    > * Match 0 or more times
    > + Match 1 or more times
    > ? Match 1 or 0 times
    > {n} Match exactly n times
    > {n,} Match at least n times
    > {n,m} Match at least n but not more than m times
    >
    > To be honest, I really don't know how (n) is actually supposed to
    > look. Would I actually use /a(1)/ to match "a" only one time?
    >
    >
    >
    > --
    > Zapanaz
    > International Satanic Conspiracy
    > Customer Support Specialist
    > http://joecosby.com/
    > Despite the strange appearance of the scooters, the Chinese ant-terror
    > police are lethal in action.
    >
    > :: Currently listening to No 21 in C major K467 Allegro maestoso, 1785, by
    > Mozart, from "Piano Concertos - Vladimir Ashkenazy"
     
    Peter Tuente, Apr 17, 2009
    #1