Extending the XHTML transitional XSD

Kidogg

Hi all,
I'm attempting to write a validator for some email template files we
use as part of our e-commerce application (incidentally in C#) and I've
run into a problem as I'm not a huge user of XML.

Essentially an email template can contain any valid XHTML, but also has
tags such as <membershipcardheader> and <membershipcardfooter> that can
appear anywhere in the body of the HTML.

I've managed to download the XSD for XHTML transitional from the W3C
and can validate plain XHTML files fine, and can do the same with
simple XML containing my tags - but I have no idea how to get the .NET
parser to validate a template against both XSDs together.

Or can I somehow inherit my XSD from the XHTML transitional one? I've
looked at the xsd:redefine tag but have no idea where to start.

Can anyone assist?

Cheers,
Kieran
 
Andy Dingley

Kidogg said:
Essentially an email template can contain any valid XHTML, but also has
tags such as <membershipcardheader> and <membershipcardfooter> that can
appear anywhere in the body of the HTML.

HTML email is an abomination, XHTML doubly so.

Although the "XHTML route" to adding these elements is obvious (look at
modularized XHTML) you're heading further and further away from what's
a realistically usable version of HTML for today's clients. I'd
strongly recommend that you stick with HTML 4.01 for any "public" use of
"HTML".

There's also a serious question as to why you need to include new tags
in publicly published content. If they're new and unknown, then
what's the client expected to do with them? If they're unusable, why
include them?

Assuming that they're significant for your in-house CMS, then I'd be
strongly inclined to use them internally with namespaced XHTML
(probably just XHTML 1.0 Strict), then strip them out with a final
"transcode for publication" step, possibly in XSLT, that also converts
it to HTML 4.01. This is also a usable route to plaintext email.
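Andy suggests doing the "transcode for publication" step in XSLT. Purely to illustrate the stripping half of that step, here is a minimal sketch using Python's standard ElementTree instead (a swapped-in technique, not Andy's code). It assumes the template-only tags live in a hypothetical namespace, http://example.com/template, removes each of them with its subtree, and keeps the XHTML and text around them. Conversion to HTML 4.01 is not attempted.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
TPL_NS = "http://example.com/template"  # hypothetical namespace for template-only tags

# Serialize XHTML elements with a default xmlns rather than an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)


def strip_template_elements(xml_text: str) -> str:
    """Remove every element (and its subtree) in TPL_NS, keeping the text after it."""
    root = ET.fromstring(xml_text)
    # Snapshot the parents first, because children are removed inside the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if not child.tag.startswith("{" + TPL_NS + "}"):
                continue
            live = list(parent)  # current children, after any earlier removals
            i = live.index(child)
            tail = child.tail or ""
            # Text following the removed element moves to the previous sibling's
            # tail, or to the parent's own text if it was the first child.
            if i == 0:
                parent.text = (parent.text or "") + tail
            else:
                live[i - 1].tail = (live[i - 1].tail or "") + tail
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    template = ('<html xmlns="http://www.w3.org/1999/xhtml" xmlns:t="http://example.com/template">'
                '<body><p>Hi</p><t:membershipcardheader/>Text<p>Bye</p></body></html>')
    print(strip_template_elements(template))
```

Moving the tail text matters because ElementTree stores the text that follows an element on that element, so a plain remove() would silently delete it too.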

PS - send HTML email to me and it goes straight into /dev/null. You
_are_ going to support plaintext, aren't you?
 
kieranbenton

Andy said:
HTML email is an abomination, XHTML doubly so.

Justify yourself. Just because we provide HTML email templates does not
mean that they are massively bloated or horrendously formatted. Some of
our clients simply request that they can have some control over the
formatting of their outputted email.

Andy said:
Although the "XHTML route" to adding these elements is obvious (look at
modularized XHTML) you're heading further and further away from what's
a realistically usable version of HTML for today's clients. I'd
strongly recommend that you stick with HTML 4.01 for any "public" use of
"HTML".

Our webpages are fully XHTML strict compliant and we have no issues -
practically full compatibility across the board in terms of browsers
(including mobile devices). What issues are you aware of regarding
using XHTML as opposed to HTML 4.01?

Andy said:
There's also a serious question as to why you need to include new tags
in publicly published content. If they're new and unknown, then
what's the client expected to do with them? If they're unusable, why
include them?

The point is that they are not included in the final email that is
sent. They are part of a templating system: we have XML email templates
that I wish to validate to stop our clients shooting themselves in the
foot with badly formed XML.
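Full XSD validation needs a schema-aware parser (in .NET, typically an XmlSchemaSet assigned to XmlReaderSettings.Schemas with ValidationType.Schema), and Python's standard library has no XSD validator. The sketch below is therefore only a much weaker first check for badly formed templates. It assumes a hypothetical template namespace, http://example.com/template, and treats the two tags named in the thread as the only legal template elements. It rejects templates that are not well-formed and flags any element that is neither XHTML nor a known template tag; it says nothing about where elements may appear.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
TPL_NS = "http://example.com/template"  # hypothetical template namespace
KNOWN_TEMPLATE_TAGS = {"membershipcardheader", "membershipcardfooter"}


def split_tag(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def check_template(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    This is NOT schema validation: it ignores where elements appear
    and which attributes they carry.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    for el in root.iter():
        ns, local = split_tag(el.tag)
        if ns == XHTML_NS or (ns == TPL_NS and local in KNOWN_TEMPLATE_TAGS):
            continue
        problems.append(f"unexpected element {local!r} in namespace {ns or '(none)'}")
    return problems
```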

Andy said:
Assuming that they're significant for your in-house CMS, then I'd be
strongly inclined to use them internally with namespaced XHTML
(probably just XHTML 1.0 Strict), then strip them out with a final
"transcode for publication" step, possibly in XSLT, that also converts
it to HTML 4.01. This is also a usable route to plaintext email.

That is what we already do to get our plaintext output. It does not,
however, solve the problem of validating the templates in the first
place.

Andy said:
PS - send HTML email to me and it goes straight into /dev/null. You
_are_ going to support plaintext, aren't you?

Firstly, as above, of course we do, and secondly, you are not
necessarily the typical market demographic for recipients of emails from
our system - just because you prefer plain text does not hold true for
all users.

Regards,
Kieran
 
kieranbenton

monique said:
Hi,

take a look at XHTML modularization,
it seems a useful way to extend the XHTML specs with custom tags.

There's something in the XHTML section of the official W3C site, like this:

http://www.w3.org/MarkUp/Guide/xhtml-m12n-tutorial/

Thanks for that Monique, it does look like the right track for me. I've
also considered the possibility of just making changes to the
transitional XSD and renaming it / changing the namespace. I take it
that is considered bad practice?

Cheers,
Kieran
 
Ben Newsam

kieranbenton wrote:

Firstly, as above, of course we do, and secondly, you are not
necessarily the typical market demographic for recipients of emails from
our system - just because you prefer plain text does not hold true for
all users.

Throwing HTML email in the bin is a good way of getting rid of
mountains of unwanted mail/pictures/adverts/unsolicited stuff. I have
used the technique successfully for years. Anyone who really needs to
contact me should be able to send in plain text, and if they
can't yet, they have to learn.
 
monique

Hi

I'm afraid that renaming/changing an existing standard is really bad
practice.

It isn't a safe choice,
even if you need a quick and (really!!!) dirty prototype ... :)

I've seen examples of the modularization in a custom cms
where new tags act as placeholders to tell the engine
where the dynamic contents need to be placed.

All the markers are replaced by the engine with the contents
in a single XSLT transformation.

So the output is XHTML 1.0 Strict compliant.
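monique's engine does the substitution in a single XSLT transformation. Python's standard library has no XSLT processor, so the sketch below swaps in ElementTree to show the same placeholder idea: each element in a hypothetical template namespace (http://example.com/template) is replaced by an XHTML fragment looked up by the placeholder's local name. The namespace, the function name and the fragment mapping are assumptions for illustration, and each fragment is assumed to be exactly one element.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
TPL_NS = "http://example.com/template"  # hypothetical placeholder namespace


def fill_placeholders(xml_text: str, fragments: dict[str, str]) -> ET.Element:
    """Replace each placeholder element in TPL_NS with the XHTML fragment named
    by its local name. A missing fragment raises KeyError instead of silently
    leaving the marker in the output."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):  # snapshot: children are replaced below
        for child in list(parent):
            if not child.tag.startswith("{" + TPL_NS + "}"):
                continue
            name = child.tag.split("}", 1)[1]
            # Parse the fragment inside a wrapper so it lands in the XHTML namespace.
            wrapper = ET.fromstring(f'<wrap xmlns="{XHTML_NS}">{fragments[name]}</wrap>')
            if len(wrapper) != 1:
                raise ValueError(f"fragment {name!r} must be exactly one element")
            replacement = wrapper[0]
            replacement.tail = child.tail  # keep any text that followed the marker
            index = list(parent).index(child)
            parent.remove(child)
            parent.insert(index, replacement)
    return root
```

Failing loudly on an unknown placeholder is a deliberate choice in this sketch: a marker that silently survived would otherwise reach the recipient.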

That's all.

Bye ;-)

kieranbenton wrote:
 
Andy Dingley

Ben said:
Throwing HTML email in the bin is a good way of getting rid of
mountains of unwanted mail/pictures/adverts/unsolicited stuff.

We probably shouldn't talk too loudly about that, as it's one of the
most useful spam filters around.

Phishing usually falls flat in plaintext as the phishing sites become
obvious when viewed that way. For this reason, most phishfood is pure
HTML and deliberately without a plaintext version. They're not just
being inaccessible here, they're doing it deliberately. Now I've no
real problem with someone who sends me competent HTML _and_ a plaintext
version, but HTML alone just smells so much like spam it's going
straight in the bin.
 
Andy Dingley

kieranbenton said:
Justify yourself.

What MIME type are you using when you send them out? XHTML is not
widely processable as XML out in "the real world", so you're dependent
on Appendix C of XHTML 1.0 (the HTML Compatibility Guidelines) to get
it to work. Assuming that you're going to go with HTML for email, then
it ought to be HTML and not XHTML. There is no reason to send it with
a pure-XML MIME type on it.
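The MIME question ties in with Andy's earlier point about sending competent HTML _and_ a plaintext version. A common shape for that is a multipart/alternative message with the text/plain part first and the HTML part labelled text/html. A minimal sketch using Python's standard email package; the addresses and text are placeholders, and actually sending the message is left out:

```python
from email.message import EmailMessage


def build_message(sender: str, to: str, subject: str, plain: str, html: str) -> EmailMessage:
    """Build a multipart/alternative message carrying plaintext and HTML bodies."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    # Plaintext first: RFC 2046 orders alternatives by increasing preference,
    # so mail readers that can render HTML pick the last part.
    msg.set_content(plain)
    # The HTML part is labelled text/html, not an XML media type.
    msg.add_alternative(html, subtype="html")
    return msg


if __name__ == "__main__":
    message = build_message(
        "shop@example.com", "customer@example.com", "Your membership card",
        "Welcome, Gold member.", "<p>Welcome, <b>Gold</b> member.</p>")
    print(message.get_content_type())
    for part in message.iter_parts():
        print(" ", part.get_content_type())
```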
 
C. M. Sperberg-McQueen

Kidogg said:
...
I've managed to download the XSD for XHTML transitional from
the W3C and can validate plain XHTML files fine, and can do the
same with simple XML containing my tags - but I have no idea
how to proceed and get the .NET parser to validate from either
of the XSD's?
Or can I somehow inherit my XSD from the XHTML transitional
one? I've looked at the xsd:redefine tag but have no idea where
to start?

Unless you're doing something rather unusual you probably won't
need to use xsd:redefine. All you need to do is induce the
validator to create a single schema containing (a) the components
you want from the schema for XHTML transitional, and (b) the
components from your schema for your stuff. Many validators will
take run-time parameters indicating a set of schema documents
they should read and build schema components from; many (most?)
will automatically read and process any schema document mentioned
in a schemaLocation hint in the XML instance, unless you
explicitly tell them not to do so.
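In an instance document those hints are carried by xsi:schemaLocation, whose value is a whitespace-separated list of pairs: a namespace name followed by the URI of a schema document for it. As a small illustration only (Python standard library; it reads the hints but fetches and validates nothing), the pairs can be collected like this. The file names in the example are made up.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location_hints(xml_text: str) -> dict[str, str]:
    """Collect xsi:schemaLocation hints (namespace -> schema URI) from all elements.

    Only reads the hints; nothing is downloaded or validated.
    """
    root = ET.fromstring(xml_text)
    hints: dict[str, str] = {}
    for el in root.iter():
        value = el.get(f"{{{XSI_NS}}}schemaLocation")
        if value is None:
            continue
        tokens = value.split()
        if len(tokens) % 2:
            raise ValueError(f"schemaLocation needs namespace/URI pairs: {value!r}")
        # Even positions are namespace names, odd positions are schema URIs.
        hints.update(zip(tokens[0::2], tokens[1::2]))
    return hints


if __name__ == "__main__":
    doc = ('<html xmlns="http://www.w3.org/1999/xhtml"'
           ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
           ' xsi:schemaLocation="http://www.w3.org/1999/xhtml xhtml-transitional.xsd'
           ' http://www.example.com/myschema myschema.xsd"/>')
    for ns, uri in schema_location_hints(doc).items():
        print(ns, "->", uri)
```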

It probably won't be essential, but it may be convenient to
create a driver file: a single schema document that imports both
the XHTML schema documents you want and your own stuff. It might
look like this:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation>
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p>Simple driver file for schema with XHTML and my-stuff.</p>
        <p>Just import what we know we need. This provides a single
        schema document to start out from.</p>
      </div>
    </xsd:documentation>
  </xsd:annotation>

  <xsd:import
    namespace="http://www.example.com/myschema"
    schemaLocation="http://www.example.com/myschema.xsd"/>

  <xsd:import
    namespace="http://www.w3.org/1999/xhtml"
    schemaLocation="http://www.example.com/mycache/xhtml-transitional.xsd"/>

</xsd:schema>

Strictly speaking, the schemaLocation attribute on xsd:import
is a hint, not an instruction; you'll want to consult the
documentation for the validator you're using to see whether
it honors the hint (almost all current schema processors
seem to -- it can be a problem to tell them NOT to do so,
if for some reason you want to prevent it).

Once you succeed in getting the processor to build a schema
with components for both namespaces you're interested in,
you may discover that one or the other of the schema authors
has failed to indicate any place where elements in other
namespaces are legal. It's been a while since I read the
XHTML schema documents, so I don't remember whether they
have wildcards for other namespaces at appropriate locations
or not. If they do, and your stuff does, too, then you're
done. (From your description, I infer that your declaration
for a template will allow as content any element from the
XHTML namespace; if it doesn't, either I've misunderstood
your design or you may want to make a change.)

If the XHTML schema documents have no wildcards, then you'll need
to do something about it, to make sure your stuff (e.g. your
membershipcardheader and membershipcardfooter elements) can occur
at appropriate locations in an HTML context. The easiest way is
to declare your elements as substitutable for appropriate XHTML
elements.

If, for example, membershipcardheader should be able to
appear pretty much wherever the XHTML 'p' element can
appear, then you'd write

<xsd:element name="membershipcardheader"
             substitutionGroup="xhtml:p"
             ... >
  ...
</xsd:element>

One concern here is that the type of membershipcardheader
needs to be the same as that of xhtml:p, or derived from
it by extension or restriction. Be sure you're using the
schema documents from the Modularization of XHTML document,
not the ones intended for stand-alone use in "XHTML 1.0
in XML Schema".

I hope this helps.

--C. M. Sperberg-McQueen
World Wide Web Consortium
 
Helio Miranda

Hi,

This thread was very useful. I've been having a similar problem that
maybe you could help with. I've extended some XHTML tags to add
client-side form validation (implemented in JavaScript). I've created
a set of attributes that indicate what kind of validation check a form
field needs before it is submitted to the server. Like this:

<input type="text" v:type="email" />
<input type="text" v:type="name" />

Then a JavaScript function processes all the tags and validates the form
according to the v:type attribute.
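One detail to check before any schema work: read as XML, the snippet above is only namespace-well-formed if the v prefix is declared with an xmlns:v attribute somewhere in scope. The sketch below, in Python as a stand-in for the JavaScript (which isn't shown), reads the v:type rules with a namespace-aware parser. The namespace URI bound to v and the name attributes are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"
V_NS = "http://example.com/validation"  # hypothetical URI bound to the "v" prefix


def validation_rules(xml_text: str) -> list[tuple[Optional[str], str]]:
    """Return (name, v:type) for every XHTML input element carrying v:type."""
    root = ET.fromstring(xml_text)  # raises ParseError if "v" is not declared
    rules = []
    for inp in root.iter(f"{{{XHTML_NS}}}input"):
        # A namespace-aware parser sees v:type as the qualified name {V_NS}type.
        rule = inp.get(f"{{{V_NS}}}type")
        if rule is not None:
            rules.append((inp.get("name"), rule))
    return rules
```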

Everything was fine. All the modern desktop browsers supported the
extension, but that's not enough for the project I've been working on.
The code must pass some XHTML validators on the web; validation is
required for "quality of the product" reasons. These validators
complain about the invalid/unspecified attribute used in the
<input /> tag.

So I've created a schema for the "v" namespace. The problem is
extending the XHTML schema so that the document is valid against it
(the input element must allow an optional v:type attribute). I found
that a hard task to finish.

So, can you suggest a painless way to do it? And which validation
engines support such schema extensions?

Thanks!
 
C. M. Sperberg-McQueen

Helio Miranda said:
...
So, can you give me suggestions of a painless way for doing it? And what
validation engines support such schema extension?

Painless? I doubt it - judging by your description, pain
is part of the structure of the situation.

HTML validators that use the standard HTML DTDs are unlikely
to accept your attribute, because DTDs have no construct for
saying "any attribute in any namespace other than the HTML
namespace is OK here". The HTML DTDs are heavily parameterized,
so in theory you may be able to define a parameter entity that
makes the attribute v:type legal in the input element. But
it won't be much fun.

(X)HTML validators that use XSD schemas to validate will, or
won't, accept your attribute as legal, depending on whether the
schema they are using does or does not have an attribute wildcard
on the input element. There are several sets of XSD
schema documents for XHTML on the Web, some of which are
attempting to mimic the behavior of the DTDs as exactly as
possible (so they don't have attribute wildcards), and
some of which are attempting to match the spec more
closely (so they do). Schemas of the latter kind should
accept your v:type attribute whether you provide a
schema for your namespace or not (because the wildcards
in question are lax).

Whether a validation service supports extensions or
modifications to the schema being used depends on the
purpose and policy of the service. If the point is to
provide a running validator, so that users of the service
don't need to install one, then it's plausible to
allow the user to specify what schema is to be used.
But if the point is to provide some more or less
objective test of the conformance of a document to
a specified schema, then the service won't typically
want to use the schema you provide -- it will want to
use the schema it already knows about from trusted
sources. You aren't a trusted source - if you were,
your document wouldn't need validating, would it?
So I wouldn't expect most HTML validators to allow you
to specify your own copy of the relevant schema
documents.

Whether that puts you irrevocably at loggerheads with those
who want validation of the document, I can't tell.

If you absolutely must make the HTML valid against
the DTD, or against a schema that doesn't allow for
out-of-namespace attributes, then I can think of
one possible way out: use the built-in semantic
extension provided by the 'class' attribute. So instead
of v:type="email", you write class="email". It's
not even tag abuse -- that kind of thing is (as far as
I understand) what class was introduced for.

I hope this helps.

--C. M. Sperberg-McQueen
World Wide Web Consortium
 
