Re: Capture only first match in regular expression

Discussion in 'Perl' started by Jürgen Exner, Apr 12, 2009.

  1. Zapanaz <http://joecosby.com/code/mail.pl> wrote:
    >
    >The answer to this is probably staring me in the face ...
    >
    >I am parsing/page scraping some HTML. I know the first anchor tag <a>
    >contains information I want.
    >
    >So I do this:
    >
    > if($content =~ /.*(<a.*<\/a>).*/i){
    > $anchorContent = $1;
    >
    >This basically works the way I want, it matches an anchor tag and
    >captures the content of it.
    >
    >But there are multiple anchor tags in the HTML. What I want is the
    >first one, but what I get is the last one.


    Drop that .* at the beginning of your RE. It doesn't do you any good:
    it is greedy, so it eats up as much as it can while still letting the
    rest of the RE match, which is why you end up at the last anchor.
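To make the effect of the greedy prefix concrete, here is a minimal sketch. The sample `$content` and the helper `first_capture` are made up for illustration and are not from the original post. It also tries a variant with the inner `.*` made non-greedy, since dropping the prefix alone still lets the inner `.*` run on to the last `</a>`.

```perl
use strict;
use warnings;

# Hypothetical sample input with two anchors (not from the original post).
my $content = '<p><a href="/one">first</a> and <a href="/two">second</a></p>';

# Illustrative helper: returns the first capture group of $re in $text, or undef.
sub first_capture {
    my ($text, $re) = @_;
    return $text =~ $re ? $1 : undef;
}

# Original pattern: the greedy leading .* pushes the match to the last <a.
my $original   = first_capture($content, qr/.*(<a.*<\/a>).*/i);

# Leading .* dropped: starts at the first <a, but the inner greedy .*
# still runs on to the last </a>.
my $no_leading = first_capture($content, qr/(<a.*<\/a>)/i);

# Inner .* made non-greedy: stops at the first </a> after the first <a.
my $non_greedy = first_capture($content, qr/(<a.*?<\/a>)/i);

print "original:   $original\n";
print "no leading: $no_leading\n";
print "non-greedy: $non_greedy\n";
```

On this input only the last variant captures just the first anchor. Note that `<a` would also match the start of tags such as `<abbr>`, which is one more reason to treat this as a quick fix for known input only.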

    Having said that, unless your HTML is in some fixed format, you really
    should be using an HTML parser to parse HTML. HTML is not a regular
    language and therefore cannot be parsed using pure regular
    expressions.
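As one possible way to follow the parser advice, the sketch below uses HTML::TokeParser. That module is my choice for illustration, not something named in the post, and it assumes the HTML-Parser distribution from CPAN is installed. The sample `$content` and the sub name `first_anchor` are likewise hypothetical.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # from the HTML-Parser distribution on CPAN

# Illustrative helper: returns (href, link text) for the first <a> element
# in $html, or an empty list if the document contains no anchor.
sub first_anchor {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot create parser";

    # get_tag skips ahead to the next <a> start tag; for a start tag the
    # returned array ref holds the attribute hash at index 1.
    my $tag = $parser->get_tag('a') or return;
    my $href = $tag->[1]{href};

    # Collect the text up to the matching </a>, with whitespace trimmed.
    my $text = $parser->get_trimmed_text('/a');
    return ($href, $text);
}

# Hypothetical sample input (not from the original post).
my $content = '<p><a href="/one">first</a> and <a href="/two">second</a></p>';
my ($href, $text) = first_anchor($content);
print "href=$href text=$text\n";
```

Because the parser tokenizes the markup, attribute order, letter case and line breaks inside the tag do not affect the result the way they can with a hand-written RE.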

    >I think I should be using one of these
    >
    >* Match 0 or more times
    >+ Match 1 or more times
    >? Match 1 or 0 times
    >{n} Match exactly n times
    >{n,} Match at least n times
    >{n,m} Match at least n but not more than m times


    If anything, you could use ? to make the * non-greedy, as in .*?, but
    at the start of the RE that is pointless because it can match the
    empty string at any position.
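A quick check of that point, as a sketch on made-up input (the sample `$content` and the variable names are illustrative only): a non-greedy `.*?` prefix leaves the capture exactly as it would be with no prefix at all.

```perl
use strict;
use warnings;

# Hypothetical sample input with two anchors (not from the original post).
my $content = '<p><a href="/one">first</a> and <a href="/two">second</a></p>';

# A match in list context returns the capture groups.
my ($with_lazy_prefix) = $content =~ /.*?(<a.*<\/a>).*/i;
my ($without_prefix)   = $content =~ /(<a.*<\/a>)/i;

# The lazy prefix consumes only what is needed to reach the first <a,
# which the RE engine does anyway when it scans for a match position.
print "same capture\n" if $with_lazy_prefix eq $without_prefix;
print "$with_lazy_prefix\n";
```

Both patterns capture from the first `<a` through the last `</a>` here, because the inner `.*` is still greedy in each of them.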

    jue
    Jürgen Exner, Apr 12, 2009