Regex pattern problem

Discussion in 'Java' started by Ted Hopp, Nov 13, 2006.

  1. Ted Hopp

    I was writing a quick-and-dirty regex to search html text and pull out the
    source url from IMG tags. I first tried:

    Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"([^\"]*)\"");

    (I know that this pattern makes all kinds of unwarranted assumptions about
    the html, but that's another topic.) The problem I was having was that
    although this pattern matches, it only results in one capture group--group
    0. I was expecting the parens after src= to give me the url in capture group
    1, but no such luck. It's only when I double the parens:

    Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"(([^\"]*))\"");

    that the src value is captured.

    So my question is: why do I need to double the parens?

    Thanks,

    Ted Hopp
    Ted Hopp, Nov 13, 2006
    #1

  2. Ted Hopp writes:

    > I was writing a quick-and-dirty regex to search html text and
    > pull out the source url from IMG tags. I first tried:
    >
    > Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"([^\"]*)\"");
    >
    > (I know that this pattern makes all kinds of unwarranted
    > assumptions about the html, but that's another topic.) The
    > problem I was having was that although this pattern matches,
    > it only results in one capture group--group 0. I was
    > expecting the parens after src= to give me the url in capture
    > group 1, but no such luck. It's only when I double the
    > parens:
    >
    > Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"(([^\"]*))\"");
    >
    > that the src value is captured.
    >
    > So my question is: why do I need to double the parens?


    You don't need to double the parens. You need to provide a
    short program that demonstrates the problem. The following is
    longer than needed, but it fails to fail in the way that you
    describe: it has single parens in the pattern, accesses group
    1, and prints here.be.it/1 and here.be.it/2 as expected:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Roska {
        public static void main(String[] args) {
            String t1 = "left <img stuff src=\"here.be.it/1\" etc.>";
            String t2 = " then left <img src=\"here.be.it/2\" etc.>";
            // Single capturing parens: the src value lands in group 1.
            Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"([^\"]*)\"");
            Matcher m = p.matcher(t1 + t2);
            // find() must succeed before group(1) can be read.
            while (m.find()) {
                System.out.println(m.group(1));
            }
        }
    }
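
    One way to check the group numbering directly is to ask the matcher how
    many capturing groups each pattern defines. The sketch below is only an
    illustration: the class name GroupDemo, the constants SINGLE and DOUBLED,
    and the helper extractSrc are made up for this example and are not from
    the original post. It assumes the same two patterns and test strings as
    above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; all names here are illustrative assumptions.
class GroupDemo {
    // Single-paren pattern from the original question.
    static final Pattern SINGLE =
            Pattern.compile("<img (?:[^>]* )?src=\"([^\"]*)\"");
    // Doubled-paren variant from the original question.
    static final Pattern DOUBLED =
            Pattern.compile("<img (?:[^>]* )?src=\"(([^\"]*))\"");

    // Collects group 1 from every match of p in html.
    static List<String> extractSrc(Pattern p, String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "left <img stuff src=\"here.be.it/1\" etc.>"
                + " then left <img src=\"here.be.it/2\" etc.>";

        // groupCount() excludes group 0; (?:...) does not count.
        System.out.println(SINGLE.matcher(html).groupCount());  // prints 1
        System.out.println(DOUBLED.matcher(html).groupCount()); // prints 2

        // Both patterns put the src value in group 1.
        System.out.println(extractSrc(SINGLE, html));  // [here.be.it/1, here.be.it/2]
        System.out.println(extractSrc(DOUBLED, html)); // [here.be.it/1, here.be.it/2]
    }
}
```

    If groupCount() reports 1 for the single-paren pattern in your own
    program too, the pattern itself is probably not the culprit, and the
    difference is more likely in the surrounding code.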
    Jussi Piitulainen, Nov 13, 2006
    #2
