A robust way to remove white spaces (RegExp)

VK

If this has already been answered somewhere, I'd be glad to be pointed
to it (after the necessary comments on my search abilities :)

I need as bulletproof a way as possible to strip whitespace from
between tag borders in the source code.

1) The left border is defined by the gt sign >
2) The right border is defined by the lt sign <
3) If the content between the left and right borders consists only of
whitespace, it has to be removed.
4) Content consists of whitespace only if it contains only \n, \r,
\t, \f, space (\u0020) in any amount and any combination.
Note: NON-BREAKING SPACE (nbsp, \u00A0) is /not/ a whitespace
character.

Thus, say, the outcome from:

<foo>
<bar>Foobar</bar>
</foo>

will be:

<foo><bar>Foobar</bar></foo>

Does anyone know of a similar RegExp?
 
Evertjan.

VK wrote on 22 May 2006 in comp.lang.javascript:
Does anyone know of a similar RegExp?

myResult = myString.replace(/\s/g,'')


\s matches [ \f\n\r\t\v] and, in ECMAScript, also other Unicode
whitespace such as \u00A0, \u2028 and \u2029
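
A caveat on this suggestion, shown as a minimal sketch: replacing every \s match removes whitespace inside text content as well as between tags, and in ECMAScript \s also matches NO-BREAK SPACE, which the question explicitly excludes. The sample string and variable names below are illustrative only and do not come from the thread.

```javascript
// Hypothetical sample input; the text "Foo bar" contains an ordinary space.
var source = "<foo>\n  <bar>Foo bar</bar>\n</foo>";

// /\s/g removes every whitespace character, including the space
// inside the element text, not just the runs between > and <.
var stripped = source.replace(/\s/g, "");

// \s also matches NO-BREAK SPACE (U+00A0) in ECMAScript.
var matchesNbsp = /\s/.test("\u00A0");
```

Here the text "Foo bar" loses its space along with the inter-tag line breaks, so this pattern alone probably does not satisfy rules 3 and 4 of the question.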
 
Dr John Stockton

JRS: In article <[email protected]>, dated Mon, 22 May 2006 07:23:40
remote, seen in news:comp.lang.javascript said:
Does anyone know of a similar RegExp?

Seems easy

var S = "<foo>\n <bar>Foobar</bar>\n</foo>";

// Remove any run of whitespace that sits between > and <
alert(S.replace(/>\s+</g, "><"));
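
Since \s in ECMAScript matches more than the five characters listed in the question (NO-BREAK SPACE among them), a stricter variant could spell the character class out. The following is a sketch under that assumption; the function name stripInterTagWhitespace is hypothetical, and the pattern makes no attempt to handle a > or < inside attribute values, comments, CDATA sections, or whitespace-sensitive elements such as pre.

```javascript
// Hypothetical helper name; not taken from the thread.
function stripInterTagWhitespace(markup) {
  // Only space (U+0020), \t, \n, \r and \f count as whitespace here,
  // as in rule 4 of the question, so U+00A0 between tags is kept.
  return markup.replace(/>[ \t\n\r\f]+</g, "><");
}

var sample = "<foo>\n <bar>Foobar</bar>\n</foo>";
var result = stripInterTagWhitespace(sample);

// A lone NO-BREAK SPACE between tags is not treated as whitespace.
var withNbsp = stripInterTagWhitespace("<a>\u00A0</a>");

// Mixed content is untouched because it is not whitespace-only.
var mixed = stripInterTagWhitespace("<p> text </p>");
```

With the sample from the question, result should come out as the single-line form given there, while withNbsp and mixed are returned unchanged.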

Note : \u0020 is not whitespace, but it is a representation of unit
whitespace in common Unicode characters.
 
Thomas 'PointedEars' Lahn

Dr John Stockton said:
Note : \u0020 is not whitespace, but it is a representation of unit
whitespace in common Unicode characters.

It is the Unicode escape sequence representation of _one_ Unicode SPACE
character (U+0020).
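
A quick way to check this, as a small sketch with illustrative variable names: the escape sequence and a literal space produce the same one-character string.

```javascript
var escaped = "\u0020"; // Unicode escape sequence in a string literal
var literal = " ";      // the same character typed directly

var same = escaped === literal;        // identical one-character strings
var codeUnit = escaped.charCodeAt(0);  // 32, i.e. 0x20
var length = escaped.length;           // exactly one character
```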


PointedEars
 
