Peter Michaux wrote:
<snip>
So in this case we have an interface that consists of a global Identifier
named 'serializeForm' that refers to a function object, which takes a
form element as its single argument and returns a string that represents
the form control's values in a (near) application/x-www-form-urlencoded
form.
Indeed it is only "near".
http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
"a b" should be encoded as "a+b"
but
encodeURIComponent('a b') // "a%20b"
"a\nb" should be encoded as "a%0D%0Ab"
but
encodeURIComponent('\n') // "%0A"
In practice, it seems developers don't complain about this so their
server-side toolkits that decode a response must know to do the right
thing.
I wonder why encodeURIComponent doesn't encode the whitespace
correctly.
Is replicating the spec for spaces and new lines necessary or should
it simply be documented that there is a difference?
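If matching the spec turns out to matter, the difference could be closed with a small wrapper around encodeURIComponent. The following is only a sketch: the name encodeFormComponent is hypothetical and is not part of the code under review, and it only addresses spaces and line breaks.

```javascript
// Hypothetical helper: encodes one name or value so that spaces and
// line breaks follow HTML 4.01 section 17.13.4.1.
function encodeFormComponent(value) {
  // Normalise every line break (CRLF, lone CR, lone LF) to CRLF first.
  var text = String(value).replace(/\r\n|\r|\n/g, '\r\n');
  // encodeURIComponent gives %20 for a space; the form encoding wants "+".
  // A literal "+" in the input has already become %2B at this point.
  return encodeURIComponent(text).replace(/%20/g, '+');
}

encodeFormComponent('a b');  // "a+b"
encodeFormComponent('a\nb'); // "a%0D%0Ab"
```

Whether this is worth the extra bytes probably depends on how strict the receiving software is.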
My first quibble would be the name.
I agree.
(To a certain degree, over time I've been losing energy to fight for
this aesthetic quality simply because I work with a server-side legacy
system that has a hodgepodge of names. Also, the browser scripting
world is not known for a clean API. I just get used to looking up the
correct name. We should try for a nice naming scheme, however.)
Where global Identifiers are to be
used (and I have no objection to their use here (better than the
pointless runtime overheads of one of those silly 'namespace' schemes))
I don't think one level of namespace is silly but would like to focus
on correct algorithms. If an automatic build system converts a
collection of these multiple implementations into a library, then the
build system could introduce the namespacing. Correct algorithms are
far more time consuming to develop than repackaging the code for use
with a namespace.
the Identifier should be as unambiguous as possible, and certainly be
fairly explicit about what the object identified is for.
Form serialisation may take many forms, including (increasingly these
days) JSON and XML serialisation. It may even include (given a suitable
Intranet application with very restricted browser/browser configuration
support and local file system access) multipart/form-data serialization
for context involving file upload.
It would be better if the interface name stated how it was going to
serialize the form. Something like - urlSerializeForm - or -
serializeFormUrlEncoded -.
To me, descriptive but still short is better. I like the idea that
serializing the form is the primary concern and the format is
secondary. "serializeFormUrl" and "serializeFormJson" have the feel of
the ES4 type annotations that are coming down the pipe.
This description should be more explicit/precise about what this
implementation actually does. It returns a string consisting of a '&'
separated sequence of '=' separated name/value pairs where the names and
values have been encoded using javascript's - encodeURIComponent -. The
result being an approximation of application/x-www-form-urlencoded, but
not including the transformation of line breaks into "CR LF" pairs
(which is only likely to be an issue with TEXTAREA fields on non-Windows
OSs, and then only if the receiving software cares or where consistency
is expected in the storage medium).
The documentation does need improvement.
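As a point of reference for the description above, the behaviour could be sketched roughly as follows. This is not the code under review, only an approximation of what is described: it assumes every control in form.elements has a name, that the form holds no button-type controls, and it ignores checked state and multiple selects.

```javascript
// Sketch only: '&' separated sequence of '=' separated name/value
// pairs, each part passed through encodeURIComponent.
function serializeFormUrlEncoded(form) {
  var pairs = [];
  var elements = form.elements;
  for (var i = 0, ilen = elements.length; i < ilen; i++) {
    var e = elements[i];
    pairs[pairs.length] =
      encodeURIComponent(e.name) + '=' + encodeURIComponent(e.value);
  }
  return pairs.join('&');
}

// A plain object stands in for a form element here (an assumption so
// the sketch can run outside a browser).
var fakeForm = {
  elements: [
    {name: 'q', value: 'a b'},
    {name: 'note', value: 'x&y'}
  ]
};
serializeFormUrlEncoded(fakeForm); // "q=a%20b&note=x%26y"
```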
There is a text wrapping issue in this presentation of the code. If this
code is to be critiqued on Usenet (as it always should given the
intention) it would be a good idea to apply/impose the formatting rule
that all code and comments should be manually wrapped at (or before),
say, 72 characters.
I noticed this also. I only looked briefly and thought it was the
google groups software wrapping after 71 characters. If Usenet wraps
after 72 characters then perhaps we should set the limit to 70 so that
at least one reply can be made without causing wrapping in the reply.
TEST
123456789 123456789 123456789 123456789 123456789 123456789 1 3 5 7 9
1 3 5 7 9
123456789 123456789 123456789 123456789 123456789 1234567890 2 4 6 8 0
2 4 6 8 0
(I'm posting through the google groups interface.)
This is a statement of intended use masquerading as a statement of
actual behaviour. This particular version will also serialize <input
type="button">, <input type="submit">, and so on, in a way that would be
unexpected in any submitted form data. In practice this version is only
really intended for use with forms that consist _only_ of the elements
listed (and that all such controls _must_ have name attributes), and
that restriction should be very clearly stated.
Good point.
The conditions determining the creation of the function object are the
existence of a - push - method of arrays (so JScript 5.5+, or emulated
on IE (<=5.0)) and the ECMAScript 3rd Ed. encodeURIComponent function
(or an emulation). When these conditions can be known to be met then
there is no need for testing prior to use. Thus the true conditions
should be stated here so that the test/don't test decision can be made
on an informed basis.
So you are in favor of stating the requirements in documentation
rather than including the feature tests in the code? I would think
that a version with and a version without the tests could account for
two of the multiple implementations. The one with tests being
applicable to the general web.
There is also the possibility of defining a 'default' function in an
else branch for the creation test that just returns - null - and having
the calling code test the return value for the success of the serialize
function call (no string, including the empty string, equals - null - by
type-converting (or, obviously, strict) equality). That design would
allow error states within the serialization function to also be
signalled with a - null - return value.
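A rough sketch of that 'default function' design, assuming the serializeFormUrl name discussed earlier and a simplified body (describeResult is a hypothetical caller added only for illustration):

```javascript
var serializeFormUrl;
if (typeof encodeURIComponent != 'undefined') {
  serializeFormUrl = function(form) {
    var c = [], n = 0, els = form.elements;
    for (var i = 0; i < els.length; i++) {
      c[n++] = encodeURIComponent(els[i].name) + '=' +
               encodeURIComponent(els[i].value);
    }
    return c.join('&');
  };
} else {
  // Default: always signals "could not serialize".
  serializeFormUrl = function() {
    return null;
  };
}

// Hypothetical calling code. An empty form gives "", and "" does not
// equal null, so the test cannot mistake it for a failure.
function describeResult(form) {
  var data = serializeFormUrl(form);
  if (data == null) {
    return 'unsupported';
  }
  return 'ok:' + data;
}
```

The cost is that every caller has to test the return value, even in environments where the conditions are known to be met.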
I've been thinking about this lately as a result of some other posts
about feature detection.
I like the idea of not defining a function if feature detection
determines the function cannot be supported. It would be nice if
browser features that exist were bug free and always worked. This allows
for a nice programming pattern where functions are only defined if the
dependencies exist. For example, a tabbed pane widget creation
function wouldn't even exist if the browser's events module is not
present and functioning adequately. This seems like a safer way to
program. The situation where a section of JavaScript starts executing
and manipulating the DOM but fails half way through execution seems to
be a concern that occupies my mind frequently. This technique of only
defining a function when the dependencies are available is not
possible in general. Scroll reporting is an example as the body
element may need to be present. In scroll reporting checking the
return value is important. I'd like to try to define functions only
when dependencies are available and fall back to checking return
values only when necessary. I'm not completely committed one way or the
other, however.
Another technique I tried for a while was to have an isSupported
function:
function serializeForm() {
  // ...
}
serializeForm.isSupported = function() {
  // ...
};
// use
if (serializeForm.isSupported()) {
  serializeForm();
}
This would be better expressed as "Features assumed to exist"
Yes.
There has got to be a better way of expressing this.
Indeed. Those were very rough notes.
While
JavaScript(tm) 1.0 did have a notion of 'client side' and 'core'
JavaScript(tm) that notion has long gone in practice, and the echoes of
the idea do no more now than introduce misconceptions. We (as a group)
should do nothing to promote any ongoing perception of the browser
object model as being a part of the javascript language.
Agreed.
Martin Honnen has already corrected that second test, though it could
still be a type-converting test rather than - typeof - (-
window.encodeURIComponent - , or some form of -
global.encodeURIComponent - with - var global - explicitly set (perhaps
elsewhere)).
One thing that really bothers me about the mainstream libraries is
that there is some sort of foundation file upon which all other files
depend. I would like to avoid this and so would rather not have a file
with "var global=this;". It is a slippery slope and that foundation
file has always grown in size in the libraries I've watched.
In ES4 there will be a global identifier called "global" that is
equivalent to "window" so avoiding "global" as a global identifier may
be a good idea although the clobbering may not make any difference.
I am not opposed to having a "global" identifier at times. For example
this pattern seems handy...
var foo = (function() {
var global = this;
return function() {
// some use of global
}
})();
I don't think we should use "window" as a global reference in case the
code could be used outside a browser.
Regardless of those nit picks, there is a problem changing the test
from "typeof encodeURIComponent != 'undefined'" to
"window.encodeURIComponent" or "global.encodeURIComponent". The
problem is that the test is not the same as the use in the function.
If the use was also changed to "global.encodeURIComponent" it would be
ok. I'm thinking about an automatic build system that inserts the
serializeFormUrl source code into a scope (perhaps using some
templating system to make the build JavaScript files)
(function() {
  // emulation supplied by the build
  function encodeURIComponent(){}
  // begin template inserted code
  if (window.encodeURIComponent) {
    var serializeFormUrl = function() {
      encodeURIComponent();
    };
  }
  // end template inserted code
})();
If the above code runs in a browser without a global
encodeURIComponent then it will not define serializeFormUrl even
though it would have been possible to do so. For this reason I think
Martin Honnen's suggestion is a good one to stick with.
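The mismatch between the two styles of test can be shown in a form that runs anywhere. In this sketch the variable named global is only a stand-in for a host whose global object lacks encodeURIComponent, and the local function is a dummy emulation; both are assumptions made for illustration.

```javascript
var outcome = (function() {
  // Stand-in for a global object without encodeURIComponent.
  var global = {};

  // Emulation placed in the enclosing scope by the build.
  function encodeURIComponent(s) {
    return 'emulated:' + s;
  }

  return {
    // Property test looks only at the global object: misses the emulation.
    propertyTest: !!global.encodeURIComponent,
    // typeof test resolves the identifier through the scope chain,
    // exactly as the later call does.
    typeofTest: typeof encodeURIComponent != 'undefined',
    callResult: encodeURIComponent('x')
  };
})();
// outcome.propertyTest is false, outcome.typeofTest is true
```

The typeof test agrees with the call because both resolve the same identifier in the same scope.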
The use of - push - is avoidable with another local variable initialised
to zero and post incremented at assignment:-
var n = 0;
...
c[n++] = (
encodeURIComponent(e.name) +
"=" +
encodeURIComponent(e.value)
);
- as the output array is initialised to an empty array.
In principle keeping the dependencies low is a good idea. I'd feel
very compelled to make this change if there exists a browser that
supports encodeURIComponent and does not have Array.prototype.push.
Otherwise I really don't have much opinion. If you or others think
removing the "push" dependency is that important it is fine with me.
[snip about code conventions: whitespace, single letter identifiers
and loop identifiers]
I agree these issues are important and should be addressed. It is very
difficult to achieve agreement. Because agreement is so difficult to
achieve, I will state that my primary concern is that local variables
in short functions are short. It doesn't really bother me that lower
case "ell" and uppercase "Oh" are a problem in some fonts. Programmers
really should have picked a good font for their editor. I'm not so
concerned if loop identifiers are "i", "j", "k" with lengths "ilen",
"jlen", "klen" or if they are "c", "d", "e". I'd like to stick with
"i", "j", "k" because they are traditional and I've used them for a
long time. I do use "e" for element and event and "f" for form and
function. However these functions are short so any identifiers really
are ok for me. And after the functions are written my plan is not to
look inside them frequently so it is not a big concern for me.
Download size does matter, however. Perhaps we can otherwise finish
the first function and then make a decision about this just for some
momentum.
----
I've always seen the "trick" to avoid Array.prototype.push written as
c[c.length] = ...
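For comparison, here are the three ways of appending side by side. The array and variable names are made up for this sketch.

```javascript
var values = ['a=1', 'b=2', 'c=3'];
var viaPush = [], viaCounter = [], viaLength = [];
var n = 0;

for (var i = 0; i < values.length; i++) {
  viaPush.push(values[i]);                 // needs Array.prototype.push
  viaCounter[n++] = values[i];             // separate counter variable
  viaLength[viaLength.length] = values[i]; // no extra variable
}
// All three arrays join('&') to "a=1&b=2&c=3".
```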
On the general interface design question, may I suggest an optional
second argument that would be the name of a submit button (of one sort
or another). It would not be used in this implementation (where submit
buttons are not being handled at all, and so should be absent from the
form) but in more elaborate implementations, while filtering out <input
type=submit>, <input type=image>, <button type=submit> and <button>
elements, the name value pair for any whose name corresponded with that
second argument (if provided) could be inserted into the serialization.
That could be useful for emulating types of form submission where the
name/value pair of the submit button clicked determined the actions of
the server script.
Maybe not a complete solution as people often use many like-named submit
buttons with differing values. And maybe not a necessary addition as the
code retrieving the serialized form could always append such a name
value pair to the serialized form.
In a previous XHR library I wrote I did accommodate the button
name/value problem. For these many implementations I was thinking it would
be better to let the calling function append.
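Letting the caller append could look something like this. The helper name appendPair is hypothetical, and it uses plain encodeURIComponent to stay consistent with the serialization being discussed.

```javascript
// Hypothetical helper for calling code: adds one name/value pair
// (for example the clicked submit button) to a serialized form.
function appendPair(serialized, name, value) {
  var pair = encodeURIComponent(name) + '=' + encodeURIComponent(value);
  // Avoid a leading '&' when the form serialized to an empty string.
  return serialized ? serialized + '&' + pair : pair;
}

appendPair('a=1', 'action', 'Save draft'); // "a=1&action=Save%20draft"
appendPair('', 'action', 'Delete');        // "action=Delete"
```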
Generally these interface designs deserve some kicking about before
fixing on the final design so that they can be flexible enough to
support a good range of possible/likely underlying implementations
beyond the one situation that first prompts their creation.
I think that when designing an interface that is intended to grow,
having the last argument be an object (to be used as a hash) is a good
idea. That way any implementation could require extra arguments:
function serializeFormUrl(form, options) {}
For one implementation, options could be
{
  button: {name:'foo', value:'bar'}
}
or another format like the following (which could suffer from the
for-in DontEnum enumeration problem in IE):
{
  extraParams: {foo:'bar'}
}
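A sketch of how an implementation might read such an options object. The helper name pairsFromOptions is hypothetical, and the two option formats are the ones shown above. The hasOwnProperty filter guards against properties added to Object.prototype. It does not work around the IE issue, where own properties with names like toString are skipped by for-in.

```javascript
// Hypothetical helper: turns the optional last argument into extra
// encoded name=value pairs.
function pairsFromOptions(options) {
  var pairs = [], name, extra;
  options = options || {};
  if (options.button) {
    pairs[pairs.length] = encodeURIComponent(options.button.name) + '=' +
                          encodeURIComponent(options.button.value);
  }
  extra = options.extraParams || {};
  for (name in extra) {
    if (Object.prototype.hasOwnProperty.call(extra, name)) {
      pairs[pairs.length] = encodeURIComponent(name) + '=' +
                            encodeURIComponent(extra[name]);
    }
  }
  return pairs;
}

pairsFromOptions({button: {name: 'foo', value: 'bar'}}).join('&'); // "foo=bar"
pairsFromOptions({extraParams: {foo: 'bar'}}).join('&');           // "foo=bar"
```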
Thanks for the thoughts, Richard.
Peter