Delimiter for attributes in an element

Tim Streater

Take, for example, this:

<a href="http://www.example.com" target="_blank">Click here</a>

Now, according to section 3.2.2 of:

<http://www.w3.org/TR/html401/intro/sgmltut.html>

the attributes are separated by spaces. Does it really mean this, or is
any whitespace allowed? IOW, could my example above be legally written
as:

<a href="http://www.example.com"
target="_blank">Click here</a>

I'd like to think the answer is no, it has to be space(s) (it would be
very irritating to find that any white space is permitted).
 
Beauregard T. Shagnasty

Tim said:
Take, for example, this:

<a href="http://www.example.com" target="_blank">Click here</a>

I realize that is just an example, but note that "Click here" as link
anchor text shouldn't be used in the real world.

Now, according to section 3.2.2 of:

<http://www.w3.org/TR/html401/intro/sgmltut.html>

the attributes are separated by spaces. Does it really mean this, or is
any whitespace allowed? IOW, could my example above be legally written
as:

<a href="http://www.example.com"
target="_blank">Click here</a>

I'd like to think the answer is no, it has to be space(s) (it would be
very irritating to find that any white space is permitted).

After a short test, it seems to work (at least in Firefox 12), but might
be dependent on the browser's error correction. Best to use regular
spaces between the attributes. It also makes maintenance easier (to read).
 
Tim Streater

Beauregard T. Shagnasty said:
I realize that is just an example, but note that "Click here" as link
anchor text shouldn't be used in the real world.

<http://www.hallaminternet.com/blog/2011/07/the-click-here-problem.html>

Of course not. That's why I always put "Click there".

Beauregard T. Shagnasty said:
After a short test, it seems to work (at least in Firefox 12), but might
be dependent on the browser's error correction. Best to use regular
spaces between the attributes. It also makes maintenance easier (to read).

Well quite. That's why I didn't bother testing it.

I was enquiring because I do some elementary sanitising of html before
shoving it in an iframe. Changing all target attributes to _blank is
just one such, and I'd rather have to search for ' target'.
 
Jukka K. Korpela

Now, according to section 3.2.2 of:

<http://www.w3.org/TR/html401/intro/sgmltut.html>

the attributes are separated by spaces. Does it really mean this,

No. The section is sloppy language even for a tutorial, and it is
normatively relevant. Normatively, HTML 4.01 has been defined as an
application of SGML (though never really implemented that way - but it's
still *mostly* implemented in a roughly SGML-like manner, including this
issue).

In SGML, attribute specifications (that's what people normally mean by
"attribute" as a syntactic construct) *may* be separated by any number of
whitespace characters. Technically, this is not defined in terms of
separators but so that an attribute specification list is just a
sequence of attribute specifications, and an attribute specification is
syntactically defined as starting and ending with s*, which means any
(possibly empty) sequence of whitespace.

Browsers play by these rules, though probably more by coincidence than
by willingness to implement SGML rules.

or is any whitespace allowed?

Yes, including none.

IOW, could my example above be legally written as:

<a href="http://www.example.com"
target="_blank">Click here</a>

Certainly, and also as
<a href="http://www.example.com"target="_blank">Click here</a>
(Few, if any, people would recommend that. But it is valid, as you can
check using a validator, and accepted by browsers.)

I'd like to think the answer is no, it has to be space(s) (it would be
very irritating to find that any white space is permitted).

I would find it irritating to find that any whitespace is *not*
permitted. In modern standards (largely, anything from the 1990s or
newer), the SPACE character seldom has a special role as a separator; it
is more common to define things in terms of whitespace characters.

In XHTML, things are different, but only in the sense that at least one
whitespace character is required between attribute specifications.
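
As a rough illustration of the HTML rule described above (attribute specifications may be separated by any amount of whitespace, including none), here is a JavaScript sketch that lists the attributes found in a single start tag. The function name listAttributes and the pattern it uses are assumptions made for this example; they are not taken from the HTML 4.01 or SGML specifications. It handles quoted values only and is not a substitute for a real parser.

```javascript
// Sketch only: list the attribute specifications in one start tag,
// allowing zero or more whitespace characters between them.
// Assumes every attribute value is quoted; unquoted or valueless
// attributes are simply not reported.
function listAttributes(startTag) {
  const attrs = [];
  // Drop the leading "<" and the element name.
  const body = startTag.replace(/^<[A-Za-z][\w:.-]*/, "");
  // name, optional whitespace, "=", optional whitespace, quoted value
  const pattern = /([A-Za-z][\w:.-]*)\s*=\s*("[^"]*"|'[^']*')/g;
  let match;
  while ((match = pattern.exec(body)) !== null) {
    attrs.push({
      name: match[1].toLowerCase(),
      value: match[2].slice(1, -1), // strip the surrounding quotes
    });
  }
  return attrs;
}
```

Called on the no-whitespace form of the example link, or on the form with a line break between the attributes, it reports both href and target. A plain text search for ' target' would miss the first of those.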
 
Tim Streater

Jukka K. Korpela said:
No. The section is sloppy language even for a tutorial, and it is
normatively relevant. Normatively, HTML 4.01 has been defined as an
application of SGML (though never really implemented that way - but it's
still *mostly* implemented in a roughly SGML-like manner, including this
issue).

In SGML, attribute specifications (that's what people normally mean by
"attribute" as a syntactic construct) *may* be separated by any number of
whitespace characters. Technically, this is not defined in terms of
separators but so that an attribute specification list is just a
sequence of attribute specifications, and an attribute specification is
syntactically defined as starting and ending with s*, which means any
(possibly empty) sequence of whitespace.

Browsers play by these rules, though probably more by coincidence than
by willingness to implement SGML rules.


Yes, including none.


Certainly, and also as
<a href="http://www.example.com"target="_blank">Click here</a>
(Few, if any, people would recommend that. But it is valid, as you can
check using a validator, and accepted by browsers.)

Mmmm, I was afraid of this - and in particular the no-whitespace as in
your example. I *really* don't want to have to look at full-strength
parsing of the html I receive, just so I can ensure that all <a> contain
target="_blank".

Alternatively, is there a way to ensure that links in html in an iframe
always open in a new window?
 
Jukka K. Korpela

Mmmm, I was afraid of this - and in particular the no-whitespace as in
your example. I *really* don't want to have to look at full-strength
parsing of the html I receive, just so I can ensure that all <a> contain
target="_blank".

I don't quite see the point... if you receive HTML from somewhere, can't
you make your HTML provider ensure that your rules are enforced?

Alternatively, is there a way to ensure that links in html in an iframe
always open in a new window?

Put <base target="_blank"> into the <head> part. This will set a default
target attribute on all links (and can be overridden with target
attributes in individual links, such as target="_self").
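
A minimal sketch of that suggestion in JavaScript, assuming the HTML to be shown is a body fragment that still has to be wrapped in a document before being written into the iframe. The function name wrapWithBaseTarget is invented for this example.

```javascript
// Sketch: wrap an HTML fragment in a document whose <base> element
// makes "_blank" the default target for its links.
// Assumes `fragment` is body content, not a complete HTML document.
function wrapWithBaseTarget(fragment) {
  return [
    "<html>",
    "<head>",
    '<base target="_blank">',
    "</head>",
    "<body>",
    fragment,
    "</body>",
    "</html>",
  ].join("\n");
}
```

A link inside the fragment that carries its own target attribute still uses that value, since <base> only supplies the default.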
 
Tim Streater

Jukka K. Korpela said:
I don't quite see the point... if you receive HTML from somewhere, can't
you make your HTML provider ensure that your rules are enforced?

:)

The html is the bodies of emails. Need I say more?

Put <base target="_blank"> into the <head> part. This will set a default
target attribute on all links (and can be overridden with target
attributes in individual links, such as target="_self").

Yes, I thought of <base>. Unfortunately all that does is supply a
default. If a link has an explicit target, that is the one used.

I think I'm going to try a slightly different approach. At present, the
sequence is:

SQLite db -> PHP sanitisation -> ajax -> JavaScript write to iframe.

I'll look at moving the sanitisation from PHP (where I have to deal with
the raw html) to after it's written into the iframe. That way I let the
browser make sense of the html first, after which I can fiddle with the
DOM.
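
One way the DOM step could look, as a sketch rather than the code actually used: once the browser has parsed the HTML, set target on every link element, whatever the original markup said. The function name forceBlankTargets is hypothetical, and `doc` is assumed to be the iframe's document (for example iframe.contentDocument), reachable from the page's own script.

```javascript
// Sketch: force every link in an already-parsed document to open
// in a new window by overwriting its target attribute.
// `doc` is assumed to be the iframe's document object.
function forceBlankTargets(doc) {
  // <a> and <area> elements with an href are the ones that act as links.
  const links = doc.querySelectorAll("a[href], area[href]");
  for (const link of links) {
    link.setAttribute("target", "_blank");
  }
  return links.length; // number of links rewritten
}
```

Because the browser has already made sense of the markup, the spacing between attributes in the source no longer matters, and an explicit target such as "_self" is overwritten rather than left in place.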
 
