[RegExp] Making non-greedy; Escaping parentheses?

Jane Doe

Hello,

I need to browse a list of hyperlinks, each followed by an
author, and remove the links only for certain authors.

1. I searched the archives on Google, but didn't find how to tell the
RegExp object to be non-greedy as using the ? quantifier doesn't seem
to work.

--------- SAMPLE ----------------
var items = new Array("johndoe","janedoe"
// Add parentheses to match any item in items()
var list = '('
list += items.join("|")
list += ')'

//Example: <A href="dummy.php?page=10#934569">TITLE
</A>, AUTHOR, April 12, 2003<br>

pattern = '<A href=".+?#[0-9]+?">.+?</A>, '
pattern += list
pattern += ',.+?<br>'

var input = new RegExp(temp,"gi");
var output = 'TROLL<br>'
document.body.innerHTML = body.replace(input,output);
--------- SAMPLE ----------------

Does somebody know how to do this?

2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
value is copied into one of the $x variables. In this particular case,
I don't need this.
Is there a way to escape parentheses to tell RegEx _not_ to put this
item into a variable? I tried "\(" and "((", to no avail.

Thank you very much for any help
JD.
 
Martin Honnen

Jane Doe wrote:

2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
value is copied into one of the $x variables. In this particular case,
I don't need this.
Is there a way to escape parentheses to tell RegEx _not_ to put this
item into a variable? I tried "\(" and "((", to no avail.

I think you are looking for non-capturing parentheses, e.g.
/(?:john|jane)doe/
but that is only supported in IE 5.5+ and Netscape 6+.
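
A small sketch of the difference, assuming an engine that supports the (?: ) syntax. The test string "janedoe" and the variable names are made up for illustration:

```javascript
// Capturing parentheses: the alternative that matched is stored as group 1.
var capturing = /(john|jane)doe/.exec("janedoe");
// capturing[0] is "janedoe", capturing[1] is "jane"

// Non-capturing parentheses: same match, but nothing is stored.
var nonCapturing = /(?:john|jane)doe/.exec("janedoe");
// nonCapturing[0] is "janedoe", and there is no nonCapturing[1]
```

The result array has one element per capturing group plus the whole match, so its length is 2 in the first case and 1 in the second.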
 
Lasse Reichstein Nielsen

Jane Doe said:
I need to browse a list of hyperlinks, each followed by an
author, and remove the links only for certain authors.

1. I searched the archives on Google, but didn't find how to tell the
RegExp object to be non-greedy as using the ? quantifier doesn't seem
to work.

It should, if the browser is sufficiently new. The improved regular
expressions (non-greedy +, *, ? and {}, non-capturing brackets, and
lookahead) are part of JavaScript 1.5 and ECMAScript, not the earlier
JavaScript versions.
--------- SAMPLE ----------------
var items = new Array("johndoe","janedoe"

Missing end parenthesis (and semicolon! Always end your statements
with a semicolon.).
// Add parentheses to match any item in items()
var list = '('
list += items.join("|")
list += ')'

//Example: <A href="dummy.php?page=10#934569">TITLE
</A>, AUTHOR, April 12, 2003<br>

Is the entire string always on one line?
As a stupid convention, the regular expression "." matches
all non-EOL characters, but there is no shorthand for matching
any character. If the text contains newlines, you may need to
change "." to, e.g., "[\s\S]".
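
A quick sketch of that effect, using the example entry from the post with the newline kept between TITLE and </A>. The author name in the string and the variable names are assumptions for illustration:

```javascript
// The sample entry, with a line break inside the link text.
var entry = '<A href="dummy.php?page=10#934569">TITLE\n</A>, johndoe, April 12, 2003<br>';

// "." refuses to cross the newline, so this pattern cannot reach "</A>".
var dotPattern = /<A href=".+?">.+?<\/A>/;

// "[\s\S]" matches any character at all, newlines included.
var anyCharPattern = /<A href="[\s\S]+?">[\s\S]+?<\/A>/;

var dotMatches = dotPattern.test(entry);         // false
var anyCharMatches = anyCharPattern.test(entry); // true
```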
pattern = '<A href=".+?#[0-9]+?">.+?</A>, '
pattern += list
pattern += ',.+?<br>'

If your code is inside a script tag, and not in an external file,
you should escape your "</"'s as "<\/". Most browsers are forgiving.
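
A minimal sketch of why the escape is harmless: in a string literal "\/" is just "/", so the escaped and unescaped forms build the same pattern. The variable names and the test input are made up for illustration:

```javascript
// Both literals contain the same four characters: < / A >
var plainClose = '</A>';
var escapedClose = '<\/A>';
var sameString = (plainClose === escapedClose); // true

// The pattern built from the escaped form still matches a closing tag.
var linkPattern = new RegExp('<A href="[^"]+">[^<]+<\/A>', "i");
var linkMatches = linkPattern.test('<a href="x.php">title</a>'); // true
```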
var input = new RegExp(temp,"gi");

Do you mean "pattern" instead of "temp"?
var output = 'TROLL<br>'
document.body.innerHTML = body.replace(input,output);
--------- SAMPLE ----------------

Does somebody know how to do this?

One problem is that a minimal match will still start as early as possible.
If you have two entries in a row, and the second has an author on your
hit-list, it will find a match starting at the first "<A". It finds
the minimal match starting there, which includes both entries, so
both are replaced.
To avoid this, you can restrict the .'s so they can't match too far:

pattern = '<A href="[^"]+?#\\d+?">[^<]+?</A>, ';
pattern += list;
pattern += ',[^>]+?<br>';

This prevents the match from extending further than we want. If there
are tags inside the TITLE or in the date after the author name, then
"[^<]" won't work as a restriction, since it stops at the first tag.
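
To make the difference concrete, here is a minimal sketch comparing the original pattern with the restricted one. The two-entry input string and the author name "alice" are made up for illustration, and the author group is written as non-capturing:

```javascript
// Two entries in a row; only the second author is on the hit-list.
var html =
  '<A href="dummy.php?page=10#1">First</A>, alice, April 12, 2003<br>' +
  '<A href="dummy.php?page=10#2">Second</A>, johndoe, April 12, 2003<br>';

var items = ["johndoe", "janedoe"];
var list = "(?:" + items.join("|") + ")";

// Original pattern: non-greedy, but the match still starts at the first "<A".
var loose = new RegExp('<A href=".+?#[0-9]+?">.+?</A>, ' + list + ',.+?<br>', "gi");

// Restricted pattern: the character classes cannot run past a quote or a tag.
var strict = new RegExp('<A href="[^"]+?#\\d+?">[^<]+?</A>, ' + list + ',[^>]+?<br>', "gi");

var looseResult = html.replace(loose, "TROLL<br>");
// looseResult is "TROLL<br>": both entries were swallowed by one match.

var strictResult = html.replace(strict, "TROLL<br>");
// strictResult keeps the first entry and replaces only the second.
```

In a page, the same replace would be applied to document.body.innerHTML instead of the html string used here.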
2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
value is copied into one of the $x variables. In this particular case,
I don't need this.
Is there a way to escape parentheses to tell RegEx _not_ to put this
item into a variable? I tried "\(" and "((", to no avail.

Yes.
(?: ... )
This pair of parentheses is purely for grouping, and the match won't be
remembered.

/L
 
Jane Doe

It should, if the browser is sufficiently new. The improved regular
expressions (non-greedy +, *, ? and {}, non-capturing brackets, and
lookahead) are part of JavaScript 1.5 and ECMAScript, not the earlier
JavaScript versions.

Thank you very much Martin and Lasse :) Finally got it working thanks
to you. I didn't know non-greedy regexes were so recent in JS.

Thanks again
JD.
 
