[RegExp] Making non-greedy; Escaping parentheses?

Discussion in 'Javascript' started by Jane Doe, Sep 12, 2003.

  1. Jane Doe

    Jane Doe Guest

    Hello,

    I need to browse a list of hyperlinks, each followed by an
    author, and remove the links only for certain authors.

    1. I searched the archives on Google, but didn't find how to tell the
    RegExp object to be non-greedy as using the ? quantifier doesn't seem
    to work.

    --------- SAMPLE ----------------
    var items = new Array("johndoe","janedoe"
    // Add parentheses to match any item in items()
    var list = '('
    list += items.join("|")
    list += ')'

    //Example: <A href="dummy.php?page=10#934569">TITLE
    </A>, AUTHOR, April 12, 2003<br>

    pattern = '<A href=".+?#[0-9]+?">.+?</A>, '
    pattern += list
    pattern += ',.+?<br>'

    var input = new RegExp(temp,"gi");
    var output = 'TROLL<br>'
    document.body.innerHTML = body.replace(input,output);
    --------- SAMPLE ----------------

    Does somebody know how to do this?

    2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
    value is copied into one of the $x variables. In this particular case,
    I don't need this.
    Is there a way to escape parentheses to tell RegEx _not_ to put this
    item into a variable? I tried "\(" and "((", to no avail.

    Thank you very much for any help
    JD.
    Jane Doe, Sep 12, 2003
    #1

  2. Jane Doe wrote:


    >
    > 2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
    > value is copied into one of the $x variables. In this particular case,
    > I don't need this.
    > Is there a way to escape parentheses to tell RegEx _not_ to put this
    > item into a variable? I tried "\(" and "((", to no avail.
    >


    I think you are looking for non-capturing parentheses, e.g.
    /(?:john|jane)doe/
    but that is only supported in IE5.5+ and Netscape 6+

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Sep 12, 2003
    #2

  3. Jane Doe writes:

    > I need to browse a list of hyperlinks, each followed by an
    > author, and remove the links only for certain authors.
    >
    > 1. I searched the archives on Google, but didn't find how to tell the
    > RegExp object to be non-greedy as using the ? quantifier doesn't seem
    > to work.


    It should, if the browser is sufficiently new. The improved regular
    expressions (non-greedy +,*,? and {}, non-capturing brackets and
    lookahead) are part of Javascript 1.5 and ECMAScript, not the earlier
    Javascript versions.
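
To illustrate the difference, here is a minimal sketch comparing a greedy and a non-greedy quantifier on the same input. The sample string and variable names are made up for illustration, and it assumes an engine that supports the non-greedy syntax described above.

```javascript
// Hypothetical sample input with two bold runs.
var text = '<b>one</b> and <b>two</b>';

// Greedy: ".+" takes as much as it can, so the match runs to the LAST </b>.
var greedy = text.match(/<b>.+<\/b>/)[0];

// Non-greedy: ".+?" takes as little as it can, so the match ends at the FIRST </b>.
var lazy = text.match(/<b>.+?<\/b>/)[0];

console.log(greedy); // <b>one</b> and <b>two</b>
console.log(lazy);   // <b>one</b>
```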

    > --------- SAMPLE ----------------
    > var items = new Array("johndoe","janedoe"


    Missing end parenthesis (and semicolon! Always end your statements
    with a semicolon.).

    > // Add parentheses to match any item in items()
    > var list = '('
    > list += items.join("|")
    > list += ')'
    >
    > //Example: <A href="dummy.php?page=10#934569">TITLE
    > </A>, AUTHOR, April 12, 2003<br>


    Is the entire string always on one line?
    As a stupid convention, the regular expression "." matches
    all non-EOL characters, but there is no shorthand for matching
    any character. If the text contains newlines, you may need to
    change "." to, e.g., "[\s\S]".
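
A small sketch of that point, using a made-up entry whose TITLE is followed by a line break as in the sample comment. The input string and variable names are assumptions for illustration only.

```javascript
// Hypothetical entry with a newline between TITLE and </A>.
var html = '<A href="x.php#1">TITLE\n</A>, johndoe, April 12, 2003<br>';

// "." does not match the newline, so this pattern cannot reach "</A>".
var dotPattern = /<A href=".+?">.+?<\/A>/;

// "[\s\S]" matches any character, newlines included.
var anyPattern = /<A href="[\s\S]+?">[\s\S]+?<\/A>/;

var dotMatches = dotPattern.test(html);
var anyMatches = anyPattern.test(html);

console.log(dotMatches); // false
console.log(anyMatches); // true
```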

    > pattern = '<A href=".+?#[0-9]+?">.+?</A>, '
    > pattern += list
    > pattern += ',.+?<br>'


    If your code is inside a script tag, and not in an external file,
    you should escape your "</"'s as "<\/". Most browsers are forgiving.
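
As a quick check that the escaping is harmless: in a JavaScript string literal, "\/" is simply "/", so the escaped spelling builds exactly the same pattern text. The variable names below are illustrative.

```javascript
// The same pattern fragment, written plainly and with the slash escaped.
var plain = '</A>, ';
var escaped = '<\/A>, ';

// Both literals produce the identical six-character string.
var same = (plain === escaped);

console.log(same);           // true
console.log(escaped.length); // 6
```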

    > var input = new RegExp(temp,"gi");


    Do you mean "pattern" instead of "temp"?

    > var output = 'TROLL<br>'
    > document.body.innerHTML = body.replace(input,output);
    > --------- SAMPLE ----------------
    >
    > Does somebody know how to do this?


    One problem is that a minimal match will still start as early as possible.
    If you have two entries in a row, and the second has an author on your
    hit-list, it will find a match starting at the first "<A". It finds
    the minimal match starting there, which includes both entries, so
    both are replaced.
    To avoid this, you can restrict the .'s so they can't match too far:

    pattern = '<A href="[^"]+?#\\d+?">[^<]+?</A>, ';
    pattern += list;
    pattern += ',[^>]+?<br>';

    This prevents the match from extending further than we want. If there
    are tags inside the TITLE or in the date after the author name, then
    "[^<]" and "[^>]" are too strict and you will need a different restriction.
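
The effect can be sketched with two made-up entries, where only the second author is on the hit-list. The entry text, the "gooduser" name and the variable names are assumptions for illustration; the two patterns are the ones quoted and suggested above.

```javascript
var items = ['johndoe', 'janedoe'];
var list = '(' + items.join('|') + ')';

// Two hypothetical entries in a row; only the second author is listed.
var first = '<A href="a.php?page=1#111">First</A>, gooduser, April 12, 2003<br>';
var second = '<A href="a.php?page=2#222">Second</A>, johndoe, April 13, 2003<br>';
var html = first + second;

// Original pattern: "." may run across the first entry into the second.
var loose = new RegExp('<A href=".+?#[0-9]+?">.+?</A>, ' + list + ',.+?<br>', 'gi');

// Restricted pattern: character classes keep each part inside one entry.
var strict = new RegExp('<A href="[^"]+?#\\d+?">[^<]+?</A>, ' + list + ',[^>]+?<br>', 'gi');

var looseResult = html.replace(loose, 'TROLL<br>');
var strictResult = html.replace(strict, 'TROLL<br>');

console.log(looseResult === 'TROLL<br>');          // true: both entries were swallowed
console.log(strictResult === first + 'TROLL<br>'); // true: only the second was replaced
```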

    > 2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
    > value is copied into one of the $x variables. In this particular case,
    > I don't need this.
    > Is there a way to escape parentheses to tell RegEx _not_ to put this
    > item into a variable? I tried "\(" and "((", to no avail.


    Yes.
    (?: ... )
    This pair of parentheses is purely for grouping, and the match won't be
    remembered.
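
A short sketch of how the two kinds of parentheses differ in what they remember. The input string and the second group (\d+) are made up for illustration.

```javascript
var input = 'janedoe, 2003';

// Capturing group: the author name takes slot 1, the digits take slot 2.
var capturing = /(johndoe|janedoe), (\d+)/.exec(input);

// Non-capturing group: the alternation still groups, but the digits are now slot 1.
var grouping = /(?:johndoe|janedoe), (\d+)/.exec(input);

console.log(capturing[1], capturing[2]); // janedoe 2003
console.log(grouping[1]);                // 2003
console.log(grouping.length);            // 2 (whole match plus one group)
```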

    /L
    --
    Lasse Reichstein Nielsen -
    Art D'HTML: <URL:http://www.infimum.dk/HTML/randomArtSplit.html>
    'Faith without judgement merely degrades the spirit divine.'
    Lasse Reichstein Nielsen, Sep 12, 2003
    #3
  4. Jane Doe

    Jane Doe Guest

    On 12 Sep 2003 18:35:18 +0200, Lasse Reichstein Nielsen wrote:
    >It should, if the browser is sufficiently new. The improved regular
    >expressions (non-greedy +,*,? and {}, non-capturing brackets and
    >lookahead) are part of Javascript 1.5 and ECMAScript, not the earlier
    >Javascript versions.


    Thank you very much Martin and Lasse :) Finally got it working thanks
    to you. I didn't know non-greedy regexes were so recent in JS.

    Thanks again
    JD.
    Jane Doe, Sep 13, 2003
    #4