higabe said:
Three questions
1)
I have a string function that works perfectly, but according to the W3C
web site it is syntactically flawed because it contains the characters </
in sequence. So how am I supposed to write this function?
String.replace(/</g,'&lt;');
Hmm, I can see that I have some of those too, the most recent of them
written today. Bummer. I never noticed that there was a </-sequence in
that.
Try
String.replace(/[<]/g,'&lt;');
or
String.replace(RegExp("<","g"),'&lt;');
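A quick sketch to check the two suggestions, assuming the intent is to escape "<" as the entity "&lt;" (the sample text and variable names are made up). Both give the same result as the original literal; the difference is only in the source text, where "[<]" puts "]" after the "<" and the RegExp form uses an ordinary string.

```javascript
// Illustrative input; any text containing "<" would do.
var text = 'if (a < b) { x = "<i>"; }';

// Original literal: "<" followed by the closing "/" forms "</".
var viaLiteral = text.replace(/</g, '&lt;');

// Character class: "<" is followed by "]", so "</" never appears.
var viaClass = text.replace(/[<]/g, '&lt;');

// Constructor: the pattern is an ordinary string literal "<".
var viaConstructor = text.replace(RegExp("<", "g"), '&lt;');

console.log(viaClass);   // if (a &lt; b) { x = "&lt;i>"; }
console.log(viaClass === viaLiteral && viaConstructor === viaLiteral);  // true
```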
2)
While I'm on the subject, anyone know why they implemented replace using
a slash delimiter instead of quotes? I know it's how it's done in Perl
but why is it done that way?
They didn't implement "replace" with slash-delimiters. They
implemented *regular expressions* with slash-delimiters. You can use
regular expressions in many other ways than just string-replace.
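For instance, a sketch with a made-up sample string, using one regular expression with several methods other than replace:

```javascript
// No "g" flag here, so each method looks at the first match only.
var lt = /[<]/;
var s = "a<b<c";

console.log(lt.test(s));              // true  (s contains a "<")
console.log(s.search(lt));            // 1     (index of the first "<")
console.log(lt.exec(s).index);        // 1     (same position, via exec)
console.log(s.split(lt).join(" "));   // a b c
```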
You could also write
var myRegExp = /[<]/g;
String.replace(myRegExp,'&lt;');
These are equivalent uses of regular expressions and strings:
/a*b/i.exec("caabc")
and
"caabc".match(/a*b/i)
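A small check of that equivalence: without the "g" flag both calls return the same match and the same position. With "g", match behaves differently and returns all matched substrings instead.

```javascript
var viaExec  = /a*b/i.exec("caabc");
var viaMatch = "caabc".match(/a*b/i);

console.log(viaExec[0], viaExec.index);    // aab 1
console.log(viaMatch[0], viaMatch.index);  // aab 1

// With the "g" flag, match returns every matched substring instead.
console.log("caabcab".match(/a*b/gi).join(" "));  // aab ab
```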
3)
One last regexp question:
is it possible to do something like this:
String.replace(/<(.*?)>(.*?)</$1>/ig,'<$1>$2</$1>');
Yes, but you need to escape the slash in "</", and it's "\1" instead of
"$1" inside the pattern. Also, you will only want to capture the tag name,
not the attributes, and your pattern contains no letters, so the "i" flag
is not necessary. And don't call a variable "String", since it conflicts
with the global variable holding the constructor of String objects.
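To make the "\1" versus "$1" difference concrete, a small sketch with a made-up sentence: "\1" inside the pattern must match the same text as group 1 again, while "$1" in the replacement string inserts that text. It also uses a lower-case variable name rather than "String".

```javascript
var string = "this is is a test test";

// \b(\w+) matches a word; " \1\b" requires the same word to follow it.
// "$1" in the replacement puts back a single copy of that word.
var fixed = string.replace(/\b(\w+) \1\b/g, "$1");

console.log(fixed);  // this is a test
```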
So, this should do what you wanted:
string.replace(/<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g,
               '<$1$2>$3</$1>');
It gets confused if ">" occurs inside an attribute value, e.g. <tag
attr="foo>bar">. Just don't do that.
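A sketch of what the three groups capture, on made-up input: for an ordinary element they hold the tag name, the attributes (with their leading space) and the content, while a ">" inside a quoted attribute value makes the lazy attribute group stop too early.

```javascript
var re = /<\s*(\w+)\b(.*?)>(.*?)<\/\1>/;

// Ordinary element: name, attributes (leading space kept), content.
var ok = re.exec('<b class="x">bold</b>');
console.log(ok[1], ok[3]);              // b bold
console.log(ok[2] === ' class="x"');    // true

// ">" inside an attribute value: group 2 stops at that ">".
var bad = re.exec('<tag attr="foo>bar">text</tag>');
console.log(bad[2] === ' attr="foo');   // true
console.log(bad[3]);                    // bar">text
```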
It doesn't handle nested tags either. That is still beyond the power
of regular expressions, even with backreferences.
There are ways around that, though, using a function as the second
argument of replace, which allows us to use recursion:
function tagify(string) {
  return string.replace(/<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g,
    function(match,sub1,sub2,sub3) {
      return "<"+sub1+sub2+">" +
             tagify(sub3) +
             "</"+sub1+">";
    });
}
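Since this tagify rebuilds each element essentially unchanged, the effect of the recursion is hard to see. Here is a sketch of a hypothetical variant, upcaseTags (not from the original), that upper-cases tag names: a single non-recursive pass only reaches the outer element, while the recursive version also reaches the element inside it. An element nested inside one of the same name is still paired with the wrong end tag.

```javascript
// Hypothetical variant of tagify: same regex and recursion, but it
// upper-cases tag names so you can see which elements were reached.
function upcaseTags(string) {
  return string.replace(/<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g,
    function (match, sub1, sub2, sub3) {
      var name = sub1.toUpperCase();
      return "<" + name + sub2 + ">" +
             upcaseTags(sub3) +        // recurse into the element content
             "<\/" + name + ">";       // "<\/" is "</" without "</" in the source
    });
}

// One non-recursive pass, for comparison: inner elements are skipped.
function upcaseOnce(string) {
  return string.replace(/<\s*(\w+)\b(.*?)>(.*?)<\/\1>/g,
    function (match, sub1, sub2, sub3) {
      var name = sub1.toUpperCase();
      return "<" + name + sub2 + ">" + sub3 + "<\/" + name + ">";
    });
}

var input = "<p>x <b>y</b> z</p>";
console.log(upcaseOnce(input));   // <P>x <b>y</b> z</P>
console.log(upcaseTags(input));   // <P>x <B>y</B> z</P>

// Same-name nesting still pairs the wrong tags:
console.log(upcaseTags("<b><b>x</b></b>"));  // <B><b>x</B></b>
```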
This still fails for elements with no closing tag. It could probably
be made to work for XHTML, where every element has an end tag (sometimes
abbreviated by ending the start tag with "/>"):
/<\s*(\w+)\b(|.*?[^/])(?:\/>|>(.*?)<\/\1>)/g
 ^start tag
  ^optional whitespace
     ^tagname
            ^optional attributes, not ending in /
                      ^either >content</tagname> or just />
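A quick sketch (made-up inputs) of what the groups hold in three cases: a self-closing tag without attributes, a self-closing tag with attributes, and an element with content. Group 3 is undefined when the "/>" branch matched.

```javascript
var re = /<\s*(\w+)\b(|.*?[^/])(?:\/>|>(.*?)<\/\1>)/;

// "/>" branch, no attributes: group 2 is empty, group 3 unmatched.
var a = re.exec("<br/>");
console.log(a[1], a[2] === "", a[3] === undefined);   // br true true

// "/>" branch with attributes: group 2 ends just before the "/".
var b = re.exec('<img src="x.png" />');
console.log(b[1], b[2] === ' src="x.png" ');          // img true

// ">content</tagname>" branch: group 3 holds the content.
var c = re.exec('<p class="k">text</p>');
console.log(c[1], c[2] === ' class="k"', c[3]);       // p true text
```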
The XHTML parser would then be:
function tagify(string) {
  return string.replace(
    /<\s*(\w+)\b(|.*?[^/])(?:\/>|>(.*?)<\/\1>)/g,
    function(match,sub1,sub2,sub3) {
      return "<"+sub1+" "+sub2+
             (sub3 !== undefined ?
                ">" + tagify(sub3) +
                "</"+sub1+">" :
                "/>");
    });
}
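A usage sketch on a made-up XHTML fragment, repeating the function so the block runs on its own. Because the callback always inserts a space after the tag name, "<p>" comes back as "<p >" and "<br/>" as "<br />"; both are still well-formed. The `sub3 !== undefined` check relies on the engine passing undefined for a group that did not take part in the match, as ECMAScript specifies.

```javascript
function tagify(string) {
  return string.replace(
    /<\s*(\w+)\b(|.*?[^/])(?:\/>|>(.*?)<\/\1>)/g,
    function (match, sub1, sub2, sub3) {
      return "<" + sub1 + " " + sub2 +
             (sub3 !== undefined ?
                ">" + tagify(sub3) + "</" + sub1 + ">" :  // element with content
                "/>");                                    // self-closing element
    });
}

console.log(tagify("<p>a<br/>b</p>"));  // <p >a<br />b</p>
```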
Hmm. I feel stupid, considering the much larger parser for XHTML that
I made some time ago. Oh well, at least it handled ">" inside
attribute values.
This is just an example where a sub-match used earlier in a regular
expression must match again, exactly as it did the first time, later in
the same string.
It works in recent versions of JavaScript/ECMAScript. Earlier ones didn't
have non-greedy matches (*?) or backreferences (\1).
But I don't know how to do that in a regexp although it seems
like it should be possible.
It is, and you were close.
Adding backreferences to regular expressions gives them more power than
"real" regular expressions, i.e., they can be used to match something that
is not a regular language. Example:
/^(11+)\1+$/
This regular expression matches any string of 1's that can be written
as two or more repetitions of two or more 1's. That is, it matches the
unary representations of the composite numbers.
!/^(11+)\1+$/.test("--string of n 1's--")
is a test for whether n is prime, for n >= 2 (but not a very efficient one).
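A sketch of that test as a function (the name isPrime is made up here). It builds the unary string with String.prototype.repeat, available since ECMAScript 2015, and adds an n >= 2 guard, because the strings for 0 and 1 do not match the pattern either, yet those numbers are not prime.

```javascript
function isPrime(n) {
  // Unary representation: n copies of "1".
  var unary = "1".repeat(n);
  // The regex matches exactly the composite lengths; 0 and 1 are excluded
  // separately because they are neither prime nor composite.
  return n >= 2 && !/^(11+)\1+$/.test(unary);
}

var primes = [];
for (var n = 0; n < 30; n++) {
  if (isPrime(n)) primes.push(n);
}
console.log(primes.join(" "));  // 2 3 5 7 11 13 17 19 23 29
```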
/L