Regular expression help

  • Thread starter Christoph Boget

Christoph Boget

I'm trying to get a regular expression to work in JS. It appears to be
working everywhere else I'm testing it (an app called Regex Coach and php)
but I can't seem to get it to work in JS. What the regex is supposed to do
is:

<p></p>
OR
<br>
OR
<br/>
OR
<br />
OR
<p></p>
OR
<p><br></p>
OR
<p><br/></p>
OR
<p><br /></p>
OR
<p>[multiple of any of the above br's]</p>

taking into account any number of interlaced spaces. The regex I came up
with is:

^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$

which, as I said, seems to work elsewhere. However, no matter how I try to
use it in JS, using the test() method against it returns true against text
when it should be returning false. The things I've tried are as follows:

var re = /^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$/;
re.test( MyStringValue );

var re = /^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$/gim;
re.test( MyStringValue );

var re = new RegExp(
'^\<p\s*\>[\<br\s*\/{0,1}\>|\s]*\<\/p\>|[\<br\s*\/{0,1}\>|\s]*$' );
re.test( MyStringValue );

var re = new RegExp(
'^\<p\s*\>[\<br\s*\/{0,1}\>|\s]*\<\/p\>|[\<br\s*\/{0,1}\>|\s]*$', 'gim' );
re.test( MyStringValue );

But it's failing the test (returning true) on things like

<p>
<br>
<br/>lskadfakjsdf;lja <br>
</p>

for example. What gives? Am I doing something wrong? It seems like it's
working elsewhere, just not in JS...

thnx,
Christoph
 

Thomas 'PointedEars' Lahn

Christoph said:
I'm trying to get a regular expression to work in JS. It appears to be
working everywhere else I'm testing it (an app called Regex Coach and php)

I recommend using QuickREx instead:

but I can't seem to get it to work in JS. What the regex is supposed to do
is:

<p></p>
^^^^^^^
OR
<br>
OR
<br/>
OR
<br />
OR
<p></p>
^^^^^^^
ISTM there is a duplicate.
OR
<p><br></p>
OR
<p><br/></p>
OR
<p><br /></p>
OR
<p>[multiple of any of the above br's]</p>

taking into account any number of interlaced spaces.

You probably mean intervening *whitespace* instead, as that is what \s matches.

The regex I came up with is:

^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$

which, as I said, seems to work elsewhere.

It will not work in JScript before version 5.5 (by default: MSHTML before
version 5.5) because of the non-capturing parentheses.

However, no matter how I try to use it in JS, using the test() method
against it returns true against text when it should be returning false.
The things I've tried are as follows:
(1)
var re = /^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$/;
re.test( MyStringValue );

It should be `myStringValue' as the identifier does not denote a constructor.

(2)
var re = /^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$/gim;
re.test( MyStringValue );

The `m' modifier means that `^' matches the start-of-line rather than the
start-of-input, and `$' matches the end-of-line rather than the end-of-input.
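A minimal sketch of what the multiline flag changes, using a made-up two-line string (the variable names and the input are illustrative, not from the original post):

```javascript
// Hypothetical two-line input, for illustration only.
const text = 'first\nsecond';

// Without `m`, ^ and $ anchor only at the start and end of the whole input.
const wholeInput = /^second$/.test(text);

// With `m`, ^ also matches right after a line terminator and $ right before one.
const perLine = /^second$/m.test(text);

console.log(wholeInput, perLine); // false true
```

This is why adding `m` to a pattern that ends in `$` gives it many more places to match in a multi-line string.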

(3)
var re = new RegExp(
'^\<p\s*\>[\<br\s*\/{0,1}\>|\s]*\<\/p\>|[\<br\s*\/{0,1}\>|\s]*$' );
re.test( MyStringValue );
(4)
var re = new RegExp(
'^\<p\s*\>[\<br\s*\/{0,1}\>|\s]*\<\/p\>|[\<br\s*\/{0,1}\>|\s]*$', 'gim' );
re.test( MyStringValue );

You don't need to escape `<' or `>' in string literals. Instead, you must
escape all backslashes in the expression as currently you are passing the
equivalent of

"^<ps*>[<brs*/{0,1}>|s]*</p>|[<brs*/{0,1}>|s]*$"
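One way to check this is to print the string before handing it to the RegExp constructor. A small sketch, with illustrative variable names and a shortened pattern:

```javascript
// In a string literal, a backslash before a character that has no escape
// meaning is dropped, so "\s" becomes "s" and "\<" becomes "<".
const asPosted = '^\<p\s*\>';
// Doubling the backslash keeps it in the string that RegExp receives.
const doubled = '^<p\\s*>';

console.log(asPosted); // ^<ps*>
console.log(doubled);  // ^<p\s*>

const fromString = new RegExp(doubled);
const fromLiteral = /^<p\s*>/;
console.log(fromString.source === fromLiteral.source); // true

// Only the doubled version treats \s as whitespace.
console.log(fromString.test('<p  >'), new RegExp(asPosted).test('<p  >')); // true false
```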

Unlike PHP, there is no difference in ECMAScript implementations with
single-quoted and double-quoted strings. And unlike PHP, ECMAScript
implementations do not support Perl-Compatible Regular Expressions (PCRE).
That said, I am pretty sure `[' and `]' are not used for grouping in PCRE
either.

But it's failing the test (returning true) on things like

<p>
<br>
<br/>lskadfakjsdf;lja <br>
</p>

for example. What gives?

(1) /^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$/
matches the empty word at the end of input
with /(?:<br\s*\/?>|\s)*\s*$/

(2) /^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$/gim
first matches "<br>" in line 2, then " <br>" at the end of line 3,
with /(?:<br\s*\/?>|\s)*\s*$/gim

(3) /^<ps*>[<brs*/{0,1}>|s]*</p>|[<brs*/{0,1}>|s]*$/
matches ">" at the end of input (in the last line),
with /[<brs*/{0,1}>|s]*$/

(4) /^<ps*>[<brs*/{0,1}>|s]*</p>|[<brs*/{0,1}>|s]*$/gim
first matches ">" in the first line (containing "<p>"),
with /[<brs*/{0,1}>|s]*$/gim

Am I doing something wrong?

Obviously.

It seems like it's working elsewhere, just not in JS...

That's highly unlikely. You have not considered how greedy matching works
(/x*/ matches the empty word, maybe you were looking for /x+/ instead) and
you have been using fantasy syntax (`[...]' denotes a character class
instead). Furthermore, you failed to observe that anchors are part of the
operand of the alternation: /^ab|cd$/ matches either "ab" at the beginning
of input or "cd" at the end of input, not an input consisting of either "ab"
or "cd" (that would have to be /^(ab|cd)$/ or /^(?:ab|cd)$/).

<http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:RegExp>
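The three points above (anchors belonging to one alternative, `*` matching the empty word, and `[...]` being a character class) can each be checked in a few lines. This is only a sketch with made-up test strings:

```javascript
// Anchors are part of the alternatives unless the alternation is grouped.
const loose = /^ab|cd$/;
const grouped = /^(?:ab|cd)$/;
console.log(loose.test('abXYZ'), grouped.test('abXYZ')); // true false
console.log(loose.test('XYZcd'), grouped.test('cd'));    // true true

// x* is satisfied by zero occurrences, so it matches the empty word
// at the end of any input; x+ requires at least one "x".
console.log(/x*$/.test('abc'), /x+$/.test('abc')); // true false

// [<br>] is a character class: any single one of "<", "b", "r", ">".
// (?:<br>) is a group that matches the literal sequence "<br>".
console.log(/^[<br>]+$/.test('rb><'), /^(?:<br>)+$/.test('rb><')); // true false
```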


HTH

PointedEars
 

Thomas 'PointedEars' Lahn

Thomas said:
Christoph said:
But it's failing the test (returning true) on things like

<p>
<br>
<br/>lskadfakjsdf;lja <br>
</p>

for example. What gives?

[...]
(2) /^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$/gim
first matches "<br>" in line 2, then " <br>" at the end of line 3,
with /(?:<br\s*\/?>|\s)*\s*$/gim

Correction: It matches the substring consisting of the newline before "<br>"
(because of the /(...|\s)*/gim alternation) followed by "<br>" followed by

(3) /^<ps*>[<brs*/{0,1}>|s]*</p>|[<brs*/{0,1}>|s]*$/
matches ">" at the end of input (in the last line),
with /[<brs*/{0,1}>|s]*$/

It also matches the empty word at the end of input because of /[...]*$/.

(4) /^<ps*>[<brs*/{0,1}>|s]*</p>|[<brs*/{0,1}>|s]*$/gim
first matches ">" in the first line (containing "<p>"),
with /[<brs*/{0,1}>|s]*$/gim

It also matches:

- the empty word after "<p>" because of /[...]*$/
- "<br>" in line 2 because of /[...<br...>...]*$/
- the empty word after "<br>" in line 2 because of /[...>...]*$/
- "<br>" in line 3 because of /[<br...>]*$/
- the empty word after "<br>" in line 3 because of /[...>...]*$/
- ">" in line 4 because of /[...>...]*$/
- the empty word after "</p>" in line 4 because of /[...]*$/
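The full list of matches for attempt (4) can be reproduced without a regex tool. The sketch below assumes the sample text uses plain "\n" line breaks with no trailing whitespace, and relies on String.prototype.matchAll, which needs a reasonably recent engine; the variable names are illustrative:

```javascript
// The sample input from the thread, assuming "\n" line breaks.
const input4 = '<p>\n<br>\n<br/>lskadfakjsdf;lja <br>\n</p>';

// The string exactly as posted in attempt (4); the single backslashes are
// dropped by the string literal, which is the point being demonstrated.
const pattern4 = '^\<p\s*\>[\<br\s*\/{0,1}\>|\s]*\<\/p\>|[\<br\s*\/{0,1}\>|\s]*$';

// Collect the text of every match, in order.
const matches4 = Array.from(
  input4.matchAll(new RegExp(pattern4, 'gim')),
  (m) => m[0]
);

console.log(JSON.stringify(matches4));
// [">","","<br>","","<br>","",">",""]
```

Since test() only reports whether there is at least one match, any of these eight is enough for it to return true.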

(Did I already mention QuickREx rules? :))


PointedEars
 

Lasse Reichstein Nielsen

Christoph Boget said:
I'm trying to get a regular expression to work in JS. It appears to
be working everywhere else I'm testing it (an app called Regex Coach
and php) but I can't seem to get it to work in JS. What the regex is
supposed to do is:

If I understand correctly:
either a <br>, optionally with whitespace after the "br" and/or a slash
before the ">",
or zero or more of these br's interleaved with optional whitespace
and optionally flanked by <p> and </p>, with optional surrounding whitespace.

From that I construct the following:

/^\s*(?:

The regex I came up with is:

^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$

The middle "|" separates the entire regexp into two parts. That means
that the ^ and $ anchors, as well as their adjacent whitespace, are in
separate alternatives. You probably want to group the alternatives
so that the anchors and whitespace apply to both.
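One possible grouping, as a sketch only: it is not necessarily the expression posted in this thread, the name blankMarkup is made up, and it assumes that <p> carries no attributes and that an empty or whitespace-only string should also count as a match.

```javascript
// Both alternatives sit inside one non-capturing group, so the leading
// ^\s* and the trailing \s*$ apply to whichever alternative is taken.
const blankMarkup = /^\s*(?:<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*)\s*$/;

console.log(blankMarkup.test('<p></p>'));               // true
console.log(blankMarkup.test('<br />'));                // true
console.log(blankMarkup.test(' <p> <br/> <br> </p> ')); // true

// The sample from the question now fails, as intended.
const withText = '<p>\n<br>\n<br/>lskadfakjsdf;lja <br>\n</p>';
console.log(blankMarkup.test(withText)); // false

// Assumption: an empty string is treated as "blank" too.
console.log(blankMarkup.test('')); // true
```

If an empty string should not match, the second alternative would need `+` instead of `*`.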
.....
But it's failing the test (returning true) on things like

<p>
<br>
<br/>lskadfakjsdf;lja <br>
</p>

That's because its second alternative, anchored only at the
end, allows matching the empty string.

If you change "test" to "exec", you can see what was matched.
In the above case, it's a zero-length string (unless there
is some whitespace hidden after the </p>).

If you also provide the "g" flag, you can check the regexp's
"lastIndex" property after the match, to see the index of the
first character after the matched string. In the above case,
it's at the end of the string.

I.e., it matches the zero-length string at the end.

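A short sketch of that check, assuming the sample text uses "\n" line breaks and has no whitespace after the </p> (the variable names are illustrative):

```javascript
// The sample input from the question.
const sample = '<p>\n<br>\n<br/>lskadfakjsdf;lja <br>\n</p>';

// The regex as originally posted, with only the "g" flag added so that
// lastIndex is updated after a successful match.
const posted = /^\s*<p\s*>(?:<br\s*\/?>|\s)*<\/p>|(?:<br\s*\/?>|\s)*\s*$/g;

const found = posted.exec(sample);

console.log(JSON.stringify(found[0]));           // ""
console.log(found.index === sample.length);      // true
console.log(posted.lastIndex === sample.length); // true
```

The match is empty and sits at the very end of the input, which is why test() returns true for this string.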
for example. What gives? Am I doing something wrong? It seems like
it's working elsewhere, just not in JS...

"seems" like it's working "elsewhere". To channel Chandler Bing:
Could it BE any less precise? :)

Your Javascript is working according to specification.

/L
 
