Regular expression for this?

stevewy · Jun 10, 2010

I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

I thought

onclick*\>

would work, but it doesn't.

Basically it needs to Find the word onclick, then select all the text
up to >. Sort of like an extended search.

The wildcard "*" symbol selects "the previous token", not "all and
anything" like I am used to.

What am I doing wrong?

Steve

Joe Nine · Jun 10, 2010

I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

I thought

onclick*\>

would work, but it doesn't.

Basically it needs to Find the word onclick, then select all the text
up to >. Sort of like an extended search.

The wildcard "*" symbol selects "the previous token", not "all and
anything" like I am used to.

What am I doing wrong?

Steve

I don't know the right regexp but I do notice that you're making an
assumption that the onclick is always going to be last, before the >
character. It might not be.

Gabriel Gilini · Jun 10, 2010

Joe said:
I don't know the right regexp but I do notice that you're making an
assumption that the onclick is always going to be last, before the >
character. It might not be.

No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".

What I have failed to understand is how the OP issue correlates with
Javascript.

Thomas 'PointedEars' Lahn · Jun 10, 2010

Stefan said:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

If the regular expressions used by Notepad++ are similar to those in
JavaScript, you could try

onclick.*?> or onclick[^>]*>

The .*? in the first variant matches anything, in a non-greedy way (as
little as possible).

The [^>]* in the second variant matches any number of characters other
than ">".
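
As a quick check in JavaScript itself, here is a minimal sketch of the two variants. The sample tag is made up for illustration and is not taken from the original poster's files. On single-line input like this, both patterns stop at the first ">" after "onclick".

```javascript
// Hypothetical sample markup, invented for this illustration.
const html = '<input type="button" onclick="check(this)" value="Go"> more text >';

// Variant 1: lazy quantifier, takes as little as possible before a ">".
const lazy = html.match(/onclick.*?>/)[0];

// Variant 2: negated character class, cannot run past a ">".
const negated = html.match(/onclick[^>]*>/)[0];

console.log(lazy);
console.log(negated);
```

Both calls log the text from "onclick" through the tag's closing ">", including the value attribute that follows the handler.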

One thing that every programmer should know is that SGML-based markup
languages like HTML, and programming languages, are usually not regular
languages (they are of the Correct Bracket Language type: context-free but
not regular), so they cannot be parsed with one regular expression alone (a
false positive for your suggestion has already been mentioned). With
further constraints such as the one described here it is sometimes possible to
parse them with one application of a regular expression if the regular
expression grammar supports alternation and other special features.

The other thing is: What does this have to do with ECMAScript-based
scripting languages, other than that the text being replaced constitutes such
code? IMNSHO, this question is quite off-topic here, and not likely to
be answered in a way that is helpful to OP anyway, since the flavor of
regular expressions that their text editor supports is unknown.

PointedEars

Thomas 'PointedEars' Lahn · Jun 10, 2010

Gabriel said:
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".

What I have failed to understand is how the OP issue correlates with
Javascript.

Add me.

PointedEars

Joe Nine · Jun 10, 2010

Gabriel said:
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".

Yes technically that's what he said. I was reading between the lines and
deducing that it's probably not what he wants. I suspect he wants the
contents of the onclick string. Here's an example where he gets more
than that.

< ...onclick="something()" onmouseover="somethingelse()">
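
A sketch of that difference in JavaScript, with the elided start of the tag filled in by a made-up element name. The second pattern assumes the attribute value is double-quoted and contains no double quote itself.

```javascript
// The example tag from above; "input" stands in for the elided part (an assumption).
const tag = '<input onclick="something()" onmouseover="somethingelse()">';

// "onclick, then anything up to >" also swallows the onmouseover attribute.
const upToBracket = tag.match(/onclick[^>]*>/)[0];

// Matching only the quoted value stops at the closing double quote.
const attributeOnly = tag.match(/onclick="[^"]*"/)[0];

console.log(upToBracket);
console.log(attributeOnly);
```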

Gabriel Gilini · Jun 10, 2010

Thomas said:
<… onclick="if (2 > 1) window.alert(42);">…</…>

Yes, I know that, given the context, that is a faulty request. I was
just stating OP's request.
The short answer would be: Don't.

Thomas 'PointedEars' Lahn · Jun 10, 2010

Gabriel said:
Thomas said:

Gabriel said:

Joe Nine wrote:
(e-mail address removed) wrote:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".
[...]

Basically it needs to Find the word onclick, then select all the text
up to >. Sort of like an extended search.

The wildcard "*" symbol selects "the previous token", not "all and
anything" like I am used to.

What am I doing wrong?
I don't know the right regexp but I do notice that you're making an
assumption that the onclick is always going to be last, before the >
character. It might not be.
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".

<… onclick="if (2 > 1) window.alert(42);">…</…>

Yes, I know that, given the context, that is a faulty request.

There is nothing faulty about this.

I was just stating OP's request.

You have (mis)interpreted it in your favor.

The short answer would be: Don't.

Nonsense.

PointedEars

Gabriel Gilini · Jun 10, 2010

Thomas said:
Gabriel said:

Thomas said:

Gabriel Gilini wrote:
Joe Nine wrote:
(e-mail address removed) wrote:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".
[...]

Basically it needs to Find the word onclick, then select all the text
up to >. Sort of like an extended search.

The wildcard "*" symbol selects "the previous token", not "all and
anything" like I am used to.

What am I doing wrong?
I don't know the right regexp but I do notice that you're making an
assumption that the onclick is always going to be last, before the >
character. It might not be.
No, he isn't. Read his post again, he wants to match everything that
goes after the string "onclick" until the first appearance of ">".
<… onclick="if (2 > 1) window.alert(42);">…</…>

Yes, I know that, given the context, that is a faulty request.

There is nothing faulty about this.

I think you misunderstood me. What I tried to say is that trying to
match everything after an onclick attribute up to the end of the opening
tag with Regular Expressions in HTML isn't something that could be
relied upon, as you so technically put in your reply to OP.

You have (mis)interpreted it in your favor.

I don't think so.

| (e-mail address removed) wrote:
| > I'm just trying to work out (if what I want is at all possible), a
| > regular expression that will search for and select (in a text editor
| > that supports regexps, like Notepad++) the word "onclick", then any
| > text at all, up to and including ">".

This is exactly what I said.

Nonsense.

Now you're confusing me. Do you think that what OP is trying to
accomplish with Regular Expressions should be done or not?

Gabriel Gilini · Jun 10, 2010

Joe said:
Yes technically that's what he said. I was reading between the lines and
deducing that it's probably not what he wants. I suspect he wants the
contents of the onclick string. Here's an example where he gets more
than that.

< ...onclick="something()" onmouseover="somethingelse()">

That's one way of deducing what he wants, but it's nothing but an
exercise in futility. OP didn't give enough information for us to know
exactly what he wants.

Either way, this probably doesn't belong in c.l.js.

Thomas 'PointedEars' Lahn · Jun 10, 2010

Gabriel said:
I think you misunderstood me. What I tried to say is that trying to
match everything after an onclick attribute up to the end of the opening
tag with Regular Expressions in HTML isn't something that could be
relied upon, as you so technically put in your reply to OP.

You need to read my explanation more carefully. It is quite possible to do
what was intended with regular expressions reliably, just not with any
flavor of regular expressions.

Now you're confusing me. Do you think that what OP is trying to
accomplish with Regular Expressions should be done or not?

I do not see why it should not be done if it is done properly. For example,
I have frequently used Java regular expressions in Eclipse, and sometimes
GNU-BREs, EREs and PCREs in shell scripts, for efficient search-and-replace,
including in HTML documents. With regard to JS/ES and the DOM, using
regular expressions is also the first step in writing an efficient
`innerHTML' replacement.

So there certainly is value in knowing how to use flavors of regular
expressions to solve the parsing problem of context-free languages.

HTH

PointedEars

Thomas 'PointedEars' Lahn · Jun 10, 2010

Stefan said:
Thomas said:

Stefan said:

(e-mail address removed) wrote:
I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

If the regular expressions used by Notepad++ are similar to those in
JavaScript, you could try

onclick.*?> or onclick[^>]*>

The .*? in the first variant matches anything, in a non-greedy way (as
little as possible).

The [^>]* in the second variant matches any number of characters other
than ">".

One thing that every programmer should know is that SGML-based markup
languages like HTML, and programming languages, are usually not regular
languages (they are of the Correct Bracket Language type: context-free
but not regular), so they cannot be parsed with one regular expression
alone

And I never said they could.

You are misunderstanding my followup as an attempt at complete rebuttal of
your arguments.

Besides, it would depend on the type of regular expression used. For
example, take Perl's (?{...}) and (??{...}) constructs, which can be used
to embed Perl code in regexes. Same thing goes for the /e modifier in Perl
substitutions. Voila, Turing complete regular expressions. (yeah, I know
that's cheating ;-)

(?R…) suffices with PCRE.

Stefan said:
The first (highest rated) comment on this page is a good indication of
what happens when you think too hard about parsing HTML with regular
expressions:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Yes, cluelessness is a widespread disease, and especially common at
stackoverflow. You can parse HTML with regular expressions, just not
with a (non-PCRE) regular expression alone.

(a false positive for your suggestion has already been mentioned).

A false positive for what? The OP wanted to match...

| the word "onclick", then any text at all, up to and including ">".

...which is just what the proposed expressions do. [...]

No, think again.

PointedEars

Thomas 'PointedEars' Lahn · Jun 10, 2010

Stefan said:
Thomas said:

Stefan said:

Thomas 'PointedEars' Lahn wrote:
(a false positive for your suggestion has already been mentioned).

A false positive for what? The OP wanted to match...

| the word "onclick", then any text at all, up to and including ">".

...which is just what the proposed expressions do. [...]

No, think again.

I'm curious. Do you mean that "any text at all" should exclude the empty
string as an edge case? [...]

I mean that it should include "any text" to begin with. Granted, the OP's
request is ambiguous to a large degree, but I would not assume "any text" to
exclude `>' characters. So if there is a correct answer to this "question"
it should, IMO, be more like

onclick.*>

(Not that this would likely be overly useful, of course.)
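
To make the difference concrete, here is a small JavaScript sketch that runs the lazy and the greedy pattern over the "2 > 1" example from earlier in the thread. The element name and link text are made up to fill the elided parts.

```javascript
// The earlier example, with "a" and "link" substituted for the elided parts (assumptions).
const html = '<a onclick="if (2 > 1) window.alert(42);">link</a>';

// Lazy: stops at the first ">", which here sits inside the attribute value.
const lazy = html.match(/onclick.*?>/)[0];

// Greedy: runs to the last ">" on the line, past the end of the opening tag.
const greedy = html.match(/onclick.*>/)[0];

console.log(lazy);
console.log(greedy);
```

Neither result ends exactly at the ">" that closes the opening tag, which is the ambiguity being argued about.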

PointedEars

stevewy · Jun 11, 2010

If the regular expressions used by Notepad++ are similar to those in
JavaScript, you could try

onclick.*?> or onclick[^>]*>

Well, it seems onclick="[^"]*" matches what I want (which is up to the ">"
symbol, but not including it as I erroneously stated in my original
post), but unfortunately does not include newlines. To be
comprehensive, it would need to include any newline characters between
onclick and the " symbol.

So, it needs to select from onclick, all the way through the onclick
statement till it finds the closing " mark. Across several lines if
necessary.
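
For comparison, in JavaScript a negated class such as [^"] also matches line breaks, so the same pattern already spans several lines there. This is only a sketch with made-up markup, and it says nothing about how Notepad++'s own engine treats newlines.

```javascript
// Hypothetical multi-line tag, invented for this illustration.
const html = [
  '<input type="radio" name="q1"',
  '       onclick="validate(this);',
  '                return true;">',
].join('\n');

// [^"]* accepts any character except a double quote, newlines included.
const m = html.match(/onclick="[^"]*"/);

console.log(m[0]);
```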

The reason I am needing this rather odd thing done, is that at work I
deal with putting client-side validation into questionnaires, that are
churned out by a survey application. The client-side validation
relies, of course, on Javascript and has a lot of onClick statements.
Later on in the life-cycle of the questionnaire, the validation needs
to be stripped out. I am using Notepad++ that accommodates regexps in
its find & replace feature.

Being that onClick statements are of the form onClick=" [JS
statements] " and each onClick is placed inside a form element tag
of the HTML (like <INPUT>), I thought it would save time to use the
find & replace feature of Notepad++ to select onClick statements and
replace them with nothing, thus removing them.
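
The find-and-replace described here could be sketched in JavaScript roughly as follows. The function name stripOnclick and the sample markup are hypothetical, and the pattern assumes every onClick value is double-quoted and contains no double quote itself.

```javascript
// Hypothetical helper: removes onclick="..." attributes from a string of HTML.
function stripOnclick(html) {
  // \s+ also removes the whitespace in front of the attribute,
  // "i" covers onclick/onClick/ONCLICK, "g" replaces every occurrence.
  return html.replace(/\s+onclick\s*=\s*"[^"]*"/gi, '');
}

// Made-up sample input; the second handler spans two lines.
const before =
  '<INPUT type="text" name="q1" onClick="check(this)">\n' +
  '<INPUT type="radio" onClick="a();\n b();" name="q2">';

const after = stripOnclick(before);
console.log(after);
```

Handlers written with single quotes, or containing an escaped double quote, would need a different pattern.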

I realise, Thomas, that this is more a regexp query and not
exclusively Javascript, although I am using it in a JS task.

Does this help in figuring out the regexp I would need to accomplish
this? At the moment, I am "so near and yet so far"....

Steve

stevewy · Jun 11, 2010

Given the extra information supplied above, I did a more targeted
Google search about my problem, and found this article:
http://blog.microugly.com/2009/10/notepad-linebreaks-in-regular.html,
which would indicate Notepad++ does not have very good support of
regular expressions anyway. Other text editors have an option to
specify whether "." includes newlines or not, but Notepad++ does not.
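
For reference, JavaScript has the same limitation by default and two common ways around it. The sketch below uses a made-up input string. The "s" (dotAll) flag was added in ES2018, so older engines will not accept the last pattern.

```javascript
// Made-up input with a line break inside the handler.
const text = 'onclick="a();\nb();" >';

// By default "." does not match a line terminator, so this finds nothing.
const dotOnly = /onclick.*?>/.test(text);

// [\s\S] means "whitespace or non-whitespace", in other words any character.
const anyChar = /onclick[\s\S]*?>/.test(text);

// With the "s" flag (ES2018), "." matches line terminators as well.
const dotAll = /onclick.*?>/s.test(text);

console.log(dotOnly, anyChar, dotAll);
```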

And so, other than following the tip supplied in the article, it does
not seem very likely that a regexp could be found to accomplish
exactly what I need, not in Notepad++ at any rate.

Anyway, thank you for the responses supplied to my initial query.

Steve

SAM · Jun 11, 2010

On 6/11/10 11:41 AM, (e-mail address removed) wrote:

If the regular expressions used by Notepad++ are similar to those in
JavaScript, you could try

onclick.*?> or onclick[^>]*>

Well, it seems onclick="[^"]*" matches what I want (which is up to the ">"
symbol, but not including it as I erroneously stated in my original
post), but unfortunately does not include newlines.

maybe :

onclick="([^"]|\s)*"

I am using Notepad++ that accommodates regexps in
its find & replace feature.

I don't know Notepad++, sorry.

Being that onClick statements are of the form onClick=" [JS
statements] " and each onClick is placed inside a form element tag
of the HTML (like <INPUT>), I thought it would save time to use the
find & replace feature of Notepad++ to select onClick statements and
replace them with nothing, thus removing them.

search :
(onclick=")([^"]|\s)*
replace :
\1"
or ?
$1"

stevewy · Jun 11, 2010

On 11 June, 11:55, SAM (e-mail address removed) wrote:
maybe :

onclick="([^"]|\s)*"

No, it doesn't do anything with that string. But thanks for the
input.

Steve

SAM · Jun 11, 2010

On 6/11/10 1:00 PM, (e-mail address removed) wrote:

On 11 June, 11:55, SAM (e-mail address removed) wrote:
maybe :

onclick="([^"]|\s)*"

No, it doesn't do anything with that string.

Sorry,
that works fine in my text editor.

That leaves using a JS tool?

<form onsubmit="return doIt(this)">
  <div>
    Enter your code here :<br>
    <textarea name="txt" cols=80 rows=16></textarea><br>
    Search: <input name="fSearch"><br>
    Replace: <input name="fReplace"><br>
    <input type="submit" value="replace all">
    <input type="reset" onclick="restitue(this)">
  </div>
</form>
<script type="text/javascript">
// Original text, saved before the first replacement so that reset can restore it
var memoriz = '';

// Replace every match of the search pattern in the textarea
function doIt(where) {
  var f = where.fSearch.value,
      r = where.fReplace.value,
      t = where.txt;
  if (memoriz == '') memoriz = t.value;
  var rg = new RegExp(f, 'ig'); // case-insensitive, all matches
  t.value = t.value.replace(rg, r);
  return false; // keep the form from being submitted
}

// Put the saved text back once the browser has finished resetting the form
function restitue(what) {
  setTimeout(function () {
    if (memoriz != '')
      what.form.txt.value = memoriz;
    memoriz = '';
  }, 10);
}
</script>

stevewy · Jun 11, 2010

That leaves using a JS tool?

Thanks, I'll keep that code and try it if the Notepad++ semi-solution
proves too tedious.

Steve

Dr J R Stockton · Jun 12, 2010

In comp.lang.javascript message
<b60bba22-da8f-4437-baca-ffbe23c8b5e9@z10g2000yqb.googlegroups.com>,
Thu, 10 Jun 2010 09:14:52, (e-mail address removed) posted:

I'm just trying to work out (if what I want is at all possible), a
regular expression that will search for and select (in a text editor
that supports regexps, like Notepad++) the word "onclick", then any
text at all, up to and including ">".

How "like" must it be?

MiniTrue will do it, at least at the XP 32-bit command line (CMD.EXE) :

PROMPT>mtr $1.htm "onclick[^^]*>"

The character ^ is a command-line escape, so only the second one counts;
[^] thus means to search for not nothing, which, at least in MiniTrue,
is a more potent "anything" than a mere dot is.

You could see if that works in Notepad++. Or in JavaScript.
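
On the JavaScript side, a minimal sketch with made-up input: there the empty negated class [^] matches any character, line breaks included. This does not carry over to every flavor. In PCRE and POSIX-style engines a "]" directly after "[^" is taken as a literal member of the class, so the pattern would mean something else or fail to compile.

```javascript
// Made-up input with a line break inside the handler.
const text = 'onclick="a();\nb();">';

// [^] is "not nothing", so it accepts every character; *? keeps the match short.
const m = text.match(/onclick[^]*?>/);

console.log(m[0] === text);
```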

MiniTrue is not a full interactive editor, but it can do substitutions.
