FAQ Topic - How do I trim whitespace? (2009-12-10)

FAQ server

-----------------------------------------------------------------------
FAQ Topic - How do I trim whitespace?
-----------------------------------------------------------------------

A regular expression can be used:

function trimString(s) {
    return s.replace(/^\s+|\s+$/g,'');
}

Implementations are inconsistent with ` \s `. For example,
some implementations do not match ` \xA0 ` (no-break space),
among others.
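
One way to see the inconsistency on a given implementation is to probe `\s` against individual characters. The following is only a sketch: `findUnmatchedWhitespace` is a hypothetical helper name, and the candidate list is an assumption taken from the code points mentioned later in this thread, not a complete whitespace table.

```javascript
// Hypothetical helper: report which candidate whitespace characters
// the running implementation's \s does NOT match.
function findUnmatchedWhitespace() {
  // Assumed candidate list (code points named elsewhere in this thread).
  var candidates = [
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x00A0, 0x1680,
    0x180E, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0xFEFF
  ];
  var missed = [];
  for (var i = 0; i < candidates.length; i++) {
    var ch = String.fromCharCode(candidates[i]);
    if (!/\s/.test(ch)) {
      // Format as U+XXXX, zero-padded to four hex digits.
      var hex = candidates[i].toString(16).toUpperCase();
      missed.push("U+" + ("000" + hex).slice(-4));
    }
  }
  return missed;
}
```

The returned list differs between implementations and versions, which is exactly the point; an empty array means every candidate was matched.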

A more consistent approach would be to create a character class
that defines the characters to trim.

ECMAScript 5 defines ` String.prototype.trim `, but this is
not yet widely supported.

https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp

http://thinkweb2.com/projects/prototype/whitespace-deviations/

https://developer.mozilla.org/en/Firefox_3.1_for_developers

http://docs.sun.com/source/816-6408-10/regexp.htm

http://msdn.microsoft.com/en-us/library/6wzad2b2(VS.85).aspx

http://groups.google.com/group/comp...39217600c3/31092c5eb99625d0?#31092c5eb99625d0

http://unicode.org/Public/UNIDATA/PropList.txt


The complete comp.lang.javascript FAQ is at
http://jibbering.com/faq/
 
Dr J R Stockton

In comp.lang.javascript message <[email protected]>, Thu, 10 Dec 2009 00:00:02, FAQ server <[email protected]> posted:

A regular expression can be used:

function trimString(s) {
    return s.replace(/^\s+|\s+$/g,'');
}

Or something like

String.prototype.myTrim =
function() { return this.replace(/^\s+|\s+$/g,'') }

with a brief explanation of the effect of prototype?
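
As a rough illustration of that effect: a function assigned to a property of `String.prototype` becomes callable on every string value, because property lookup on a string falls through to that object. The sketch below reuses the `myTrim` name from the post; the variable names are made up for the example.

```javascript
// myTrim is the name used in the post above; it is not a built-in method.
String.prototype.myTrim = function () {
  return this.replace(/^\s+|\s+$/g, "");
};

// Primitive strings and String objects both find myTrim via the prototype.
var fromPrimitive = "  foo bar  ".myTrim();
var fromObject = new String("\tbaz\n").myTrim();

// Strings created before the assignment see it too: lookup happens at call time.
var early = " x ";
var fromEarly = early.myTrim();
```

Note that a property added this way is visible to all code sharing the same global environment, which is why the later posts guard the assignment with a feature test.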
 
Michael Haufe (\TNO\)

-----------------------------------------------------------------------
FAQ Topic - How do I trim whitespace?
-----------------------------------------------------------------------

A regular expression can be used:

function trimString(s) {
    return s.replace(/^\s+|\s+$/g,'');
}

Implementations are inconsistent with ` \s `. For example,
some implementations do not match ` \xA0 ` (no-break space),
among others.

A more consistent approach would be to create a character class
that defines the characters to trim.

I think this should cover all whitespace:

if(typeof String.prototype.trim === "undefined"){
    String.prototype.trim = (function(){
        var strWhitespace = "\\u0009\\u000b\\u000c\\u0020\\u00a0\\u1680" +
                            "\\u180e\\u2000-\\u200a\\u2028\\u2029\\u202f" +
                            "\\u205f\\u3000\\ufeff",
            reTrim = new RegExp("^[" + strWhitespace + "]*" +
                                "([^" + strWhitespace + "]*)" +
                                "[" + strWhitespace + "]*$");
        return function(){
            return this.replace(reTrim,"$1");
        }
    })();
};

Line terminators aren't included, but can be added with the following:
(?:\u000D\u000A)|[\u000D\u000A\u2028\u2029])
 
Thomas 'PointedEars' Lahn

Michael said:
-----------------------------------------------------------------------
FAQ Topic - How do I trim whitespace?
-----------------------------------------------------------------------

A regular expression can be used:

function trimString(s) {
    return s.replace(/^\s+|\s+$/g,'');
}

Implementations are inconsistent with ` \s `. For example,
some implementations do not match ` \xA0 ` (no-break space),
among others.

A more consistent approach would be to create a character class
that defines the characters to trim.

I think this should cover all whitespace:

if(typeof String.prototype.trim === "undefined"){
    String.prototype.trim = (function(){
        var strWhitespace = "\\u0009\\u000b\\u000c\\u0020\\u00a0\\u1680" +
                            "\\u180e\\u2000-\\u200a\\u2028\\u2029\\u202f" +
                            "\\u205f\\u3000\\ufeff",
            reTrim = new RegExp("^[" + strWhitespace + "]*" +
                                "([^" + strWhitespace + "]*)" +
                                "[" + strWhitespace + "]*$");
        return function(){
            return this.replace(reTrim,"$1");
        }
    })();
};

Line terminators aren't included, but can be added with the following:
(?:\u000D\u000A)|[\u000D\u000A\u2028\u2029])

I presume the wording in the FAQ was chosen intentionally to point out that
it depends on the use-case which characters need to be considered for
trimming and that a general solution like his might neither be necessary nor
recommended.

The main error in your approach, though, that the original approach does not
have, is that you are matching optional leading whitespace and optional
trailing whitespace with either no other character or no whitespace
whatsoever in-between. For example, your expression will not allow

" foo bar "

to be trimmed to

"foo bar"

as expected, because the space (\u0020) in-between prevents the match (the
expression is isomorphic to /^\s*(\S*)\s*$/). It will also needlessly match
the empty string because of the `*' quantifiers. And the trailing `;'
following the /Block/ statement is superfluous and potentially confusing.
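
The difference can be checked directly with the two `\s`-based shapes. This is a sketch using the simplified expressions named above rather than the full Unicode character class; the variable names are illustrative only.

```javascript
var input = " foo bar ";

// Same shape as the criticized expression: the captured part may not
// contain any whitespace, so inner spaces make the whole match fail.
var anchoredCapture = /^\s*(\S*)\s*$/;

// The FAQ's shape: remove a leading run or a trailing run, globally.
var alternation = /^\s+|\s+$/g;

// No match at all, so replace() returns the input unchanged.
var viaCapture = input.replace(anchoredCapture, "$1");

// Both ends are stripped; the inner space is untouched.
var viaAlternation = input.replace(alternation, "");

// The `*' quantifiers also let the first expression match the empty string.
var matchesEmpty = anchoredCapture.test("");
```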

However, if the original approach were modified to include your
character classes, that would work:

if (typeof String.prototype.trim === "undefined")
{
  String.prototype.trim = (function() {
    var
      strWhitespace =
        "\\u0009\\u000b\\u000c\\u0020\\u00a0\\u1680"
        + "\\u180e\\u2000-\\u200a\\u2028\\u2029\\u202f"
        + "\\u205f\\u3000\\ufeff",
      // The "g" flag is needed so that one replace() call strips both ends.
      reTrim = new RegExp("^[" + strWhitespace + "]+"
        + "|[" + strWhitespace + "]+$", "g");

    return function() {
      return this.replace(reTrim, "");
    };
  })();
}

Also, if I understand the CharacterClass production in the ECMAScript
Specification (at least Edition 5, section 15.10.1) correctly, escaping the
escape sequences should not be necessary here (as they do not specify one of
`\', `]', or `-', and it stands to reason that if an implementation supports
Unicode RegExp escape sequences, it would support Unicode String escape
sequences, too; at least that is what my research shows so far.)
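
A small check of that claim, as a sketch only: it builds the same character class once with doubled backslashes (so the RegExp parser sees the escape sequences) and once with single backslashes (so the string literal already contains the characters themselves). The shortened class and the sample string are assumptions for the example.

```javascript
// Escaped: the pattern text contains the six characters \u00a0 etc.
var reEscaped = new RegExp("^[\\u0020\\u00a0\\u2000-\\u200a]+");

// Unescaped: the string literal yields the characters directly,
// and the hyphen between two of them still forms a range.
var reLiteral = new RegExp("^[\u0020\u00a0\u2000-\u200a]+");

// NO-BREAK SPACE, EM SPACE (inside the range), SPACE, then text.
var sample = "\u00a0\u2003 text";

var viaEscaped = sample.replace(reEscaped, "");
var viaLiteral = sample.replace(reLiteral, "");
```

Both results should be the same string here. The doubled form is still required for anything the class itself treats specially, such as a literal backslash.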

For JavaScript, though, your solution apparently comes a bit too late (and,
forgive me for saying so, it does not strike me as particularly
original anyway; we have discussed this several times before): ECMAScript
Edition 5 specifies String.prototype.trim(), and JavaScript 1.8.1 as of
Firefox 3.5 implements it (whether exactly as specified remains to be tested).
So this serves only older JavaScript versions and other implementations that
do not yet implement this feature of ES 5, where "only" is not meant
to diminish the approach's value, as the important JScript would be included.


PointedEars
 
Garrett Smith

Dr said:
In comp.lang.javascript message <[email protected]

Or something like

String.prototype.myTrim =
function() { return this.replace(/^\s+|\s+$/g,'') }

with a brief explanation of the effect of prototype?

| ECMAScript 5 defines String.prototype.trim. Where not implemented, it
| can be added:
|
| if(!String.prototype.trim) {
|     String.prototype.trim = function() {
|         return this.replace(/^\s+|\s+$/g,'');
|     };
| }
|
| Implementations are inconsistent with \s. For example, some
| implementations do not match \xA0 (no-break space), among others.
|
| A more consistent approach would be to create a character class that
| defines the characters to trim.
 
Thomas 'PointedEars' Lahn

Garrett said:
| ECMAScript 5 defines String.prototype.trim.

5 is not the version of ECMAScript, but the Edition of the Specification.
I would also add an empty argument list to indicate that it is a method:

ECMAScript Edition 5 defines String.prototype.trim().

Maybe "specifies" instead of "defines".
| Where not implemented, it
| can be added:
|
| if(!String.prototype.trim) {
^
Michael's feature test is better although it, too, lacks the pretty-printing
distinction between method calls and statements.


PointedEars
 
Dr J R Stockton

In comp.lang.javascript message <05085e2c-9481-4de5-a2d0-ba50565c3be9@19g2000vbq.googlegroups.com>, Sat, 12 Dec 2009 20:00:44, "Michael Haufe
if(typeof String.prototype.trim === "undefined"){
    String.prototype.trim = (function(){
        var strWhitespace = "\\u0009\\u000b\\u000c\\u0020\\u00a0\\u1680" +
                            "\\u180e\\u2000-\\u200a\\u2028\\u2029\\u202f" +
                            "\\u205f\\u3000\\ufeff",
            reTrim = new RegExp("^[" + strWhitespace + "]*" +
                                "([^" + strWhitespace + "]*)" +
                                "[" + strWhitespace + "]*$");
        return function(){
            return this.replace(reTrim,"$1");
        }
    })();
};

That will take a fair bit of explaining if it is to be understood
reasonably easily by the target readership of the FAQ.

The FAQ contains no other example of 'new RegExp' (and 7.1 uses, without
explanation, a RegExp literal).

The FAQ contains no other example of '\\' or '\u'.

The FAQ contains no other example of
'<string>.replace(RegExp, <string>)'.

The FAQ contains no other explanation of '.prototype.'.

The FAQ's only existing instance of ')(' is in an unrelated part, and is
not explained.

It is worth saying that the whitespace set is browser-dependent; but the
set [ 0009 000a 000b 000c 000d 0020 ] common to 'all' browsers is
sufficient for almost all needs. Using anything but '\s' to refer to
the whitespace set is going too far in a general-purpose trim routine.
Few will care about \u180e, and most of those probably won't read
English. It's just bloat. Commonly, space and tab suffice, or those
and CR LF.
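
A trim restricted to that common set could look like the sketch below. The name `trimCommon` is hypothetical, and the class is spelled out explicitly instead of relying on `\s`, so the result does not depend on the implementation's whitespace set.

```javascript
// Hypothetical name. Trims only U+0009, U+000A, U+000B, U+000C,
// U+000D and U+0020 from both ends of the string.
function trimCommon(s) {
  return String(s).replace(/^[\t\n\x0B\f\r ]+|[\t\n\x0B\f\r ]+$/g, "");
}

var trimmed = trimCommon("\t foo bar \r\n");

// Characters outside the set, such as NO-BREAK SPACE, are left alone.
var untouched = trimCommon("\u00a0x");
```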

Perhaps the FAQ should be accompanied by a library, containing well-
tested code with instructions for use but no consideration of how easily
it can be understood.

This .trim takes no argument. It could be given a string argument to be
used, if present, instead of strWhitespace.
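
A sketch of that idea, written as a standalone function rather than a prototype method to keep the example self-contained. `trimChars` and `DEFAULT_CHARS` are hypothetical names, and the default set is an assumption (the common six characters). The argument is treated as a plain list of characters, so range syntax such as "a-z" is not supported.

```javascript
// Assumed default: the six characters common to all implementations.
var DEFAULT_CHARS = "\t\n\x0B\f\r ";

// Hypothetical helper: trim any of the characters in `chars` from both ends.
function trimChars(s, chars) {
  if (typeof chars !== "string" || chars === "") {
    chars = DEFAULT_CHARS;
  }
  // Escape the characters that are special inside a character class.
  var cls = "[" + chars.replace(/[\\\]\^\-]/g, "\\$&") + "]";
  var re = new RegExp("^" + cls + "+|" + cls + "+$", "g");
  return String(s).replace(re, "");
}

var byDefault = trimChars("  x  ");
var dashes = trimChars("--x-y--", "-");
var brackets = trimChars("[]x][", "[]");
```

Building the RegExp on every call keeps the sketch short; a version meant for heavy use would probably cache the expression per character set.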
 
Garrett Smith

Thomas said:
5 is not the version of ECMAScript, but the Edition of the Specification.
I would also add an empty argument list to indicate that it is a method:

ECMAScript Edition 5 defines String.prototype.trim().

Maybe "specifies" instead of "defines".

ECMAScript Edition 5 specifies String.prototype.trim().

^
Michael's feature test is better although it, too, lacks the pretty-printing
distinction between method calls and statements.

Better in what way? This way is a little shorter.
 
Thomas 'PointedEars' Lahn

RobG said:
There is an interesting post on the speed of various trim functions
here:

Faster JavaScript Trim
<URL: http://blog.stevenlevithan.com/archives/faster-trim-javascript >

As usual with ECMAScript, less is not always more and "best" is
context dependent. :)

A two-year-old article providing test results for Fx 2.0 and IE 6 is hardly
relevant these days. Anyhow, you miss the point, see the underlined part.

And trim your quotes, please.


PointedEars
 
RobG

RobG said:
Garrett said:
Thomas 'PointedEars' Lahn wrote:
Garrett Smith wrote: [...]
| if(!String.prototype.trim) {
^
Michael's feature test is better although it, too, lacks the
^^^^^^^^^^^^
pretty-printing distinction between method calls and statements.
Better in what way? This way is a little shorter.
There is an interesting post on the speed of various trim functions
here:
As usual with ECMAScript, less is not always more and "best" is
context dependent. :)

A two-year-old article providing test results for Fx 2.0 and IE 6 is hardly
relevant these days.

It is to me as it:

1. Provides a number of different algorithms that can easily be tested
in current browsers for updated comparisons

2. Makes a general case that what seems fast in one browser may not be
in another

3. Shows that which algorithm is faster may depend on characteristics
of the string being trimmed

If none of that is relevant to you, fine - ignore it. I can't imagine
why you think I need to know that; my reply was neither directly to a post
of yours nor to a point you had made.
Anyhow, you miss the point, see the underlined part.

No, I didn't. I understand that you prefer typeof tests to type
conversion tests. My reply was not in regard to that specific point.
Garrett's inference was that the type conversion test might be
considered better because it was shorter (i.e. less code). As Richard
Cornford wrote:

"It is extremely rare that 'less typing' has been the reason for doing
(or not doing) anything. That is a recent excuse, and only for things
that could never be otherwise justified at all."
<URL: http://groups.google.com/group/comp.lang.javascript/msg/14cdc71533e1c2ba >

One of the points highlighted by the post I linked to is that less
code does not necessarily mean faster performance and therefore may
not be "better" by that criterion.

And trim your quotes, please.

I included perhaps 6 lines of unnecessary text, which is not much more
than the length of your signature. Hopefully I have been sufficiently
concise this time. This song is a suitable reflection of my attitude:

<URL: http://en.wikipedia.org/wiki/Oxford_Comma_(song) >

You might call an Oxford comma a Harvard comma, but who cares about
that.
 
Garrett Smith

Thomas said:
Testing type, not value.


But is it also more reliable or efficient?

Efficient?

Not measurably. On a 2 GHz Dell Latitude running Windows Vista, looping
over that |if| statement 1000 times takes no measurable time.

javascript: var d = new Date;for(var i = 0; i < 1000;
i++)if(!String.prototype.trim);var r= new Date-d; alert(r);

Results in IE7, FF 3.5:
0

Increasing the loop to 10000 iterations:
javascript: var d = new Date;for(var i = 0; i < 10000;
i++)if(!String.prototype.trim);var r= new Date-d; alert(r);

IE7: 13
FF3.5: 0

Using typeof check with 10000 iterations:
javascript: var d = new Date;for(var i = 0; i < 10000; i++)if(typeof
String.prototype.trim !== "function");var r= new Date-d; alert(r);

IE7: 17
FF3.5:

The typeof test is actually a little slower in IE. However, the difference
is not noticeable with up to 1000 iterations. Using typeof adds a few
extra bytes. The costs/benefits do not seem significant.
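
The same comparison can be written as an ordinary script instead of bookmarklets. This is a sketch: `timeTest` is a hypothetical helper, and wrapping each check in a function adds call overhead that the inline loops above do not have, so the absolute numbers are not comparable with the ones quoted.

```javascript
// Hypothetical helper: run testFn `iterations` times, return elapsed ms.
function timeTest(testFn, iterations) {
  var start = new Date().getTime();
  for (var i = 0; i < iterations; i++) {
    testFn();
  }
  return new Date().getTime() - start;
}

// Boolean conversion test.
var booleanMs = timeTest(function () {
  return !String.prototype.trim;
}, 10000);

// typeof test.
var typeofMs = timeTest(function () {
  return typeof String.prototype.trim !== "function";
}, 10000);
```

With millisecond timer resolution, results of 0 are to be expected for loops this short, which matches the observation that the difference is hard to measure.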

As for being more reliable, I cannot see how boolean conversion could
fail here. Can you explain?
 
Garrett Smith

RobG said:
RobG said:
Garrett Smith wrote:
Thomas 'PointedEars' Lahn wrote:
Garrett Smith wrote: [...]
| if(!String.prototype.trim) {
^
Michael's feature test is better although it, too, lacks the ^^^^^^^^^^^^

pretty-printing distinction between method calls and statements.
Better in what way? This way is a little shorter.
[snip about blog entry]
No, I didn't. I understand that you prefer typeof tests to type
conversion tests. My reply was not in regard to that specific point.
Garrett's inference was that the type conversion test might be
considered better because it was shorter (i.e. less code). As Richard
Cornford wrote:

"It is extremely rare that 'less typing' has been the reason for doing
(or not doing) anything. That is a recent excuse, and only for things
that could never be otherwise justified at all."
<URL: http://groups.google.com/group/comp.lang.javascript/msg/14cdc71533e1c2ba >

Nobody is arguing about typing. The argument was that boolean conversion
was shorter, but that the difference did not seem significant. The other
implication is readability.

It seems simpler and easier to see:

if(!String.prototype.trim) { }

Than:

if(typeof String.prototype.trim === "undefined") { }

The former is shorter and puts the negation first.
 
Michael Haufe (\TNO\)

Nobody is arguing about typing. The argument was that boolean conversion
was shorter, but that the difference did not seem significant. The other
implication is readability.

It seems simpler and easier to see:

   if(!String.prototype.trim) { }

Than:

   if(typeof String.prototype.trim === "undefined") { }

The former is shorter and puts the negation first.

With the shorter approach, one risks overwriting an explicitly set
falsy property. Whether that is desirable behavior is questionable.
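
To make the risk concrete, here is a sketch on a plain object rather than on String.prototype, so that nothing global is modified. The object shape and the function names `patchWithBoolean` and `patchWithTypeof` are made up for the example.

```javascript
// Boolean-conversion guard: any falsy value counts as "missing".
function patchWithBoolean(o) {
  if (!o.trim) {
    o.trim = function () { return "patched"; };
  }
  return o;
}

// typeof guard: only an undefined value counts as "missing".
function patchWithTypeof(o) {
  if (typeof o.trim === "undefined") {
    o.trim = function () { return "patched"; };
  }
  return o;
}

// A property explicitly set to a falsy value (0 here).
var viaBoolean = patchWithBoolean({ trim: 0 });
var viaTypeof = patchWithTypeof({ trim: 0 });

var booleanOverwrote = typeof viaBoolean.trim === "function"; // value 0 was replaced
var typeofKept = viaTypeof.trim === 0;                         // value 0 was kept
```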
 
Dmitry A. Soshnikov

With the shorter approach, one risks overwriting an explicitly set
falsy property. Whether that is desirable behavior is questionable.

The same holds for an explicitly set `undefined' value in both
approaches (if it is exactly the explicitly set value that worries you) ;)
So, for educational purposes, the shorter variant looks more elegant.

/ds
 
Garrett Smith

Michael said:
With the shorter approach, one risks overwriting an explicitly set
falsy property. Whether that is desirable behavior is questionable.

It depends on where in the source order the falsy property was set. Did
String.prototype.trim = 0 occur in source order before
String.prototype.trim = function(){...} ? If so, then the value would be
a function.

Setting String.prototype.trim to a falsy value seems pointless. Setting
String.prototype.trim to a falsy value, and then setting
String.prototype.trim = function(){...} would be confusing and pointless.
 
Garrett Smith

Dmitry said:
[...]
With the shorter approach, one risks overwriting an explicitly set
falsy property. Whether that is desirable behavior is questionable.

The same will be for an explicitly set `undefined' value in both
approaches (if you're afraid of exactly explicitly set value fact) ;)
So, in educational purpose shorter variant looks more elegant.
Right. ES$ allows:-

this.undefined = Function.prototype

Pointless thing to do.
 
Dmitry A. Soshnikov

Dmitry said:
On Dec 17, 11:55 am, Garrett Smith <[email protected]> wrote:
[...]
With the shorter approach, one risks overwriting an explicitly set
falsy property. Whether that is desirable behavior is questionable.
The same will be for an explicitly set `undefined' value in both
approaches (if you're afraid of exactly explicitly set value fact) ;)
So, in educational purpose shorter variant looks more elegant.

Right. ES$ allows:-

   this.undefined = Function.prototype

Pointless thing to do.

Yep, right, the `undefined' property of the global object can be changed,
but regarding Michael Haufe (TNO)'s post I meant the following:

String.prototype.trim = undefined;

and if so (if it is exactly a user-set value that Michael Haufe (TNO) is
worried about), there's no difference between checking with (!...) and
with (typeof ... != 'undefined'). And since it is used for educational
purposes, the short variant can be better (so as not to overload the
example), although both variants are acceptable.
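
A quick sketch of that point, again on plain objects so that String.prototype is left alone. The `in` comparison at the end is an addition for contrast and is not something proposed in the thread.

```javascript
var explicit = { trim: undefined }; // property exists, value is undefined
var absent = {};                    // property does not exist

// Both guards give the same answer for the two objects.
var booleanSame = (!explicit.trim) === (!absent.trim);
var typeofSame =
  (typeof explicit.trim === "undefined") === (typeof absent.trim === "undefined");

// Only a property-existence check can tell them apart.
var inDiffers = ("trim" in explicit) !== ("trim" in absent);
```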

/ds
 
Asen Bozhilov

Garrett said:
It seems simpler and easier to see:

   if(!String.prototype.trim) { }

Than:

   if(typeof String.prototype.trim === "undefined") { }

The former is shorter and puts the negation first.

In ECMAScript Edition 5, section 15.5.4.20, String.prototype.trim refers to
a function object whose internal [[Prototype]] is Function.prototype and
which has an internal [[Call]] method.

Because that is a feature of ES5, I make a feature test before I call it:

if (typeof str.trim === 'function')
{
    //call trim
}

But if I explicitly define a trim function and assign it to
String.prototype.trim when it is missing, I don't need that test before
every call to `trim'. For that I will use:

if (typeof String.prototype.trim !== 'function')
{
    String.prototype.trim = function()
    {
        //do trim
    };
}

So in my code I can directly use:

str.trim();

without a feature test every time I call the `trim' method.
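
A runnable sketch of this define-once pattern. The body of the fallback is an assumption: it fills the placeholder with the FAQ's `\s`-based expression, which inherits the implementation differences discussed earlier in the thread.

```javascript
// Feature test once, at load time.
if (typeof String.prototype.trim !== "function") {
  // Assumed fallback body: the FAQ's expression.
  String.prototype.trim = function () {
    return this.replace(/^\s+|\s+$/g, "");
  };
}

// From here on, callers need no test of their own.
var str = "  foo bar  ";
var result = str.trim();
```

Where a native `trim` exists, the guard leaves it in place and the fallback is never installed.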
 
