cdata and javascript

Andy Dingley

I find that most of the people who object to XHTML don't
want to spend the time learning a new process for coding pages.

Yes, I'm sure that when people like Jukka can just learn how to use
their copies of FrontPage properly, they'll stop advocating HTML over
XHTML.

XHTML uses modern practices like external style sheets and box design for
layouts instead of tables and inline formatting.

Trivially obvious as an observation, but what's your point? HTML does
all of that, _exactly_ as XHTML does it.
 
Harlan Messinger

Jeff said:
Why? It's a self closing tag and it works in every browser. The trend
is toward closing every tag you open.

In HTML, <br> is one of the tags (along with <input>, <link>, <meta>,
and so forth) that doesn't *have* a closing tag, so it's equally
meaningless to make the opening tag a "self-closing" tag. Moreover, the
slash has another meaning in SGML (HTML is to SGML as XHTML is to XML)
as follows:

<title/My Page/

is equivalent to

<title>My Page</title>

So

<br />This sentence has a / (slash) character.

would be treated in a correctly performing HTML user agent as

<br>>This sentence has a </br> (slash) character.

which is certainly not what you want. The reason

<br />

works in real browsers is that they haven't implemented that detail of
SGML--in other words, it relies on a browser deficiency. Instead they
wind up treating the slash as something that simply doesn't belong
there, and they handle it the same way they handle anything that doesn't
belong there--they pretend it isn't there. That doesn't mean it's
valid--it isn't. It's just being ignored.
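Python's standard `html.parser` module is a lenient HTML tokenizer, not an SGML parser, so it is a handy stand-in for the browser behaviour described above (treating it as representative of browsers is an assumption). The sketch records what it reports for the `<br />` example: one empty `br` tag followed by the sentence as plain text, with neither slash read as an SGML null end tag. The `EventRecorder` class and `html_events` helper are illustrative names.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record the tokenizer events html.parser produces for a snippet."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_startendtag(self, tag, attrs):
        # Called for "<br />": the slash is taken as an XHTML-style
        # empty-tag marker, not as an SGML null end tag.
        self.events.append(("startendtag", tag))

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def html_events(markup: str) -> list[tuple[str, str]]:
    recorder = EventRecorder()
    recorder.feed(markup)
    recorder.close()  # flush any buffered text
    return recorder.events


events = html_events("<br />This sentence has a / (slash) character.")
print(events)
```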
 
Harlan Messinger

A-OK-SITE said:
See what I was talking about? It is the same old arguments with no
basis in reality (see previous post). A large percent of
knowledgeable web designers use XHTML and serve it as HTML with no
problems. I find that most of the people who object to XHTML don't
want to spend the time learning a new process for coding pages. XHTML
uses modern practices like external style sheets and box design for
layouts instead of tables and inline formatting. There is absolutely
no doubt that XHTML provides a much cleaner and uncluttered page.

This demonstrates that you aren't even aware of what does and doesn't
distinguish XHTML from HTML 4.01. That being the case, you lack a basis
for judging the relative merits of using either of them. For your
information, XHTML adds nothing to HTML 4.01 in the way that external
style sheets and box design are used in preference to markup to specify
presentation for a page.
 
cwdjrxyz

Scripsit Jeff:
I have a bit of javascript that I'd like to hide from the validator.
Consider learning what a validator is before trying to fool it.
Then read the validator's FAQ list when you run into problems.
And if you use JavaScript, just put any bulky code into an external
file, and any validation issues with it vanish in a puff of logic.
Should I be using XHTML [...]?
No, especially since you asked.

Everybody seems to have a preference as to which doc type they prefer,
html or xhtml, and most of it has no basis in reality. It is like the
old Ford vs Chevy argument in which both are good but for some reason
people just seem to like one more than the other.
XHTML is a cleaner code with features like self-closing tags (et al).
The XHTML is almost always served and interpreted as HTML with the
main difference being the syntax only. The new HTML 5.0 and XHTML 2.0,
which are soon to be released, are bringing the two types even closer
together based on preliminary information. It is also somewhat like
the difference between strict and transitional and both will render
the page the same with only minute differences in the way the page is
coded.
So in summary pick the language you feel the most comfortable with and
use it. They are both valid and fully functional, and all modern
browsers will render the code just fine. It is just my humble opinion,
but I prefer XHTML; then again, I always preferred a Chevy and a
Budweiser too.

Unfortunately no IE browser, including IE7, can render any xhtml if it
is served properly as mime type application/xhtml+xml. All you get is
an error message. Many mis-serve xhtml as text/html and use an
extension .html. Although this often works for IE browsers, there is
no point in writing xhtml code in the first place if you are not going
to serve it as xhtml. Since the extension .html usually is associated
with the mime type text/html on the server, you have to use
another extension, such as .xhtml, and assign it to the xhtml mime
type application/xhtml+xml on the server. When you then serve xhtml
properly, IE browsers fail, but other modern browsers such as
Firefox, Opera, Seamonkey, and Safari for Windows handle true xhtml.
However, the code is then parsed as xml, and an xml parser must be
much more strict than an html parser. The least little mistake, such
as a single unclosed tag, gives a fatal parse error that results in
an error message rather than a view of the page, whereas the same
page often works with little problem as html.
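The difference in strictness is easy to see with Python's standard library, used here as a stand-in for a browser's XML and HTML parsers (an assumption; real browser parsers are more elaborate). A single unclosed `<p>` makes the XML parser give up, while the tolerant HTML parser still recovers the text of both paragraphs. The names `xml_accepts`, `TextCollector` and `html_text` are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# One unclosed <p>: the "least little mistake" described above.
BROKEN = "<div><p>First paragraph<p>Second paragraph</div>"
FIXED = "<div><p>First paragraph</p><p>Second paragraph</p></div>"


def xml_accepts(markup: str) -> bool:
    """An XML parser must reject markup that is not well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class TextCollector(HTMLParser):
    """Collect the text an error-tolerant HTML parser recovers."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_text(markup: str) -> list[str]:
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks


print(xml_accepts(BROKEN), xml_accepts(FIXED))  # False True
print(html_text(BROKEN))  # ['First paragraph', 'Second paragraph']
```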

If you want to serve true xhtml, you have to provide html for IE by
using separate pages, header/browser exchange, and rewriting the page
as html for browsers that do not indicate they can handle the mime type
for xhtml, etc. The main reason for all of this trouble is that
Microsoft cannot or will not write their browsers to handle modern
xhtml properly. Hopefully, now that Vista is out after much delay,
Microsoft will have time to bring their browser up to date. IE7 was
just a minor change from IE6. It did correct some bugs and might be a
bit more secure. However it was outmoded at the moment it was
released.
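The "header/browser exchange" above is HTTP content negotiation on the browser's Accept request header. A minimal sketch of that decision, assuming the rule "send application/xhtml+xml only when the browser names it explicitly with a non-zero q-value"; `parse_accept` and `choose_media_type` are hypothetical helpers. Unlike a plain substring test, this also honours an explicit `q=0`. A real server would usually also send `Vary: Accept` so that caches keep the two versions apart.

```python
from typing import Dict, Optional


def parse_accept(header: str) -> Dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs: Dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue  # skip empty entries such as a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_media_type(accept_header: Optional[str]) -> str:
    """Pick the response Content-Type for an XHTML page.

    Only an explicit application/xhtml+xml entry with q > 0 counts; a bare
    */* wildcard (which IE sends) is deliberately not treated as support.
    """
    prefs = parse_accept(accept_header or "")
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"


print(choose_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_media_type("image/gif, image/jpeg, */*"))
```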

The important thing to remember is that you have xhtml only if both
the code is written in xhtml and it is served as application/xhtml+xml,
not text/html. The W3C validator only validates the code as html or
xhtml. It does not validate that the code is being served properly.
However, in the most detailed setting of the validator, it will tell
you if the page is being served as text/html or application/xhtml+xml.
You will find very few pages being served properly as xhtml when you
check them.

A web page written and served properly as xhtml can be all html, all
xml, or a combination of both. When a page is written in xhtml that
includes only code that is part of html, then of course it serves no
useful purpose to write it in xhtml instead of html, but it also does
no harm if you know xhtml well and are properly set up to serve it.
However if the page includes some xml, then you have to use xhtml,
unless you can use some tricks to make the xml part show up. PCs are
now just a small part of the many computer devices out there. Many of
the modern small and portable devices are xml devices. Many pages will
work properly on small xml devices if properly written in xhtml. An
xhtml page requires parsing as the very strict xml, because it may
contain some xml. Devices that are xml do not allow many errors that
an html device will tolerate. One of the most serious errors you can
make in xml is to not close something that should be closed, for
example. Some think of xhtml as a transition language, and that in the
future all code will be in a language that is closer to xml than html.
I am not a prophet, and time will tell.
 
A-OK-SITE

This demonstrates that you aren't even aware of what does and doesn't
distinguish XHTML from HTML 4.01. That being the case, you lack a basis
for judging the relative merits of using either of them. For your
information, XHTML adds nothing to HTML 4.01 in the way that external
style sheets and box design are used in preference to markup to specify
presentation for a page.

You know that is not what I was talking about. I didn't feel like
writing a book, but in your case I guess I should have. You lack the
basis for saying anything I care to hear about. I don't ask for help
in these groups, so your opinion has no meaning to me. People
like you are just picking posts apart in an attempt to look
intelligent, but that is a lost cause. So in summary go fly a kite
or .......

Daniel
 
Harlan Messinger

A-OK-SITE said:
You know that is not what I was talking about.

I don't know that and I don't see how it's possible unless you were
using words that expressed something completely different from what you
meant. Perhaps you should reread your own words to find out what it was
you said, as opposed to what it was you think you said.
I didn't feel like
writing a book, but in your case I guess I should have. You lack the
basis for saying anything I care to hear about.

That's fine, proceed in ignorance, but just do everyone a favor and
don't presume to "inform" others.
 
Jeff

Harlan said:
In HTML, <br> is one of the tags (along with <input>, <link>, <meta>,
and so forth) that doesn't *have* a closing tag, so it's equally
meaningless to make the opening tag a "self-closing" tag. Moreover, the
slash has another meaning in SGML (HTML is to SGML as XHTML is to XML)
as follows:

<title/My Page/

is equivalent to

<title>My Page</title>

Hmm, that's pretty wild but I think I've seen something like that in
xml/xslt.

What happens if you have the space before the slash? Does SGML ignore
whitespace there? I know an earlier version of NS needed the whitespace.
So

<br />This sentence has a / (slash) character.

would be treated in a correctly performing HTML user agent as

<br>>This sentence has a </br> (slash) character.


So, this:

<p />does this/

<p>does this</p>

That implies to me that / should join the list of entities that should
be escaped as "<" and "&" are.
which is certainly not what you want. The reason

<br />

works in real browsers is that they haven't implemented that detail of
SGML--in other words, it relies on a browser deficiency.

Well, I realize that <br /> is not valid in html 4. From what I can tell
it is required in xhtml, which I have no desire to use since it gives me
nothing tangible over html, but what I was thinking was whether this is
going to be needed in html 5 which will get here someday. Or not.

Instead they
wind up treating the slash as something that simply doesn't belong
there, and they handle it the same way they handle anything that doesn't
belong there--they pretend it isn't there. That doesn't mean it's
valid--it isn't. It's just being ignored.

On a side note, I notice an ever increasing number of pages with an
xhtml doctype.


Jeff
 
Jeff

cwdjrxyz said:
Scripsit Jeff:
I have a bit of javascript that I'd like to hide from the validator.
Consider learning what a validator is before trying to fool it.
Then read the validator's FAQ list when you run into problems.
And if you use JavaScript, just put any bulky code into an external
file, and any validation issues with it vanish in a puff of logic.
Should I be using XHTML [...]?
No, especially since you asked.
--
Jukka K. Korpela ("Yucca") http://www.cs.tut.fi/~jkorpela/
Jeff,
[...]
PCs are now just a small part of the many computer devices out there.
Many of the modern small and portable devices are xml devices.


Now that is something I'm interested in.

I had thought that since everything is CMS driven that I can just
create something like an RSS feed, that would have a bit of html in it
(like "strong" or "i" or br) and serve that depending on accept type.

Now, it's not hard to generate well formed RSS but those html strays
in there are another story. Will those bits then also have to be in
correct xhtml? In other words: <br> or <br />?

Jeff




 
Harlan Messinger

Jeff said:
Hmm, that's pretty wild but I think I've seen something like that in
xml/xslt.

What happens if you have the space before the slash? Does SGML ignore
whitespace there? I know an earlier version of NS needed the whitespace.


So, this:

<p />does this/

<p>does this</p>

That implies to me that / should join the list of entities that should
be escaped as "<" and "&" are.

Well, yeah, if you were writing for user agents that followed this
treatment correctly, and you wanted the equivalent of

<p>The / is called a "slash"</p>

you would need to escape it when writing it in this form:

<p/The / is called a "slash"/
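Harlan's point can be modelled with a deliberately tiny toy. This is not an SGML parser: the `expand_net` helper and its regular expression are assumptions that handle only the NET-enabling form `<name/content/`, treating the first following `/` as the null end tag. It shows why a literal slash cuts the content short, and why escaping it as a character reference such as `&#47;` keeps it as text.

```python
import re

# Toy model only: "<name/" opens a NET-enabled element and the next "/"
# is its null end tag, so the content is everything between the two.
NET_ELEMENT = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/]*)/")


def expand_net(markup: str) -> str:
    """Rewrite <name/content/ into <name>content</name>."""
    return NET_ELEMENT.sub(r"<\1>\2</\1>", markup)


print(expand_net("<title/My Page/"))
# <title>My Page</title>
print(expand_net('<p/The / is called a "slash"/'))
# <p>The </p> is called a "slash"/
print(expand_net('<p/The &#47; is called a "slash"/'))
# <p>The &#47; is called a "slash"</p>
```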
 
Neredbojias

Well bust mah britches and call me cheeky, on Mon, 28 Jan 2008 17:56:50
GMT Jeff scribed:
Now that is something I'm interested in.

I recently updated my home page to xhtml where, via php, the same page is
served as application/xhtml+xml to any browser which can handle it and
text/html to IE. It validates and works fine, but there are one or two
things I'm not sure about. Here is the (relevant) code:

<?php
// Send application/xhtml+xml only to browsers that list it in their
// Accept header; everything else (e.g. IE) gets text/html.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
if (stristr($accept, "application/xhtml+xml")) {
  header("Content-Type: application/xhtml+xml; charset=UTF-8");
  echo '<?xml version="1.0" encoding="UTF-8"?>';
  echo "\r\n";
  echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">';
  echo "\r\n";
} else {
  // text/html branch: XHTML 1.0 Strict doctype
  header("Content-Type: text/html; charset=UTF-8");
  echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">';
  echo "\r\n";
}
?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>The Neredbojias Website</title>
<meta name="author" content="Neredbojias" />
<link rel="icon" href="nerbo.ico" type="image/x-icon" />
.... (no other meta tags)
</head>

What I don't know is whether I need the <?xml version="1.0" encoding="UTF-8"?>
part or not, and what to make of the disparity between xml version="1.0" and XHTML 1.1.
(xml version="1.1" gives an error.) There's also a question about
needing the plain lang="en" designation. (The xml one _is_ needed.) As
I said, though, the code above works and returns "valid" from the w3c
markup validator. If you wish to see it in action:

http://www.neredbojias.com/
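On the first question: the XML 1.0 specification makes the XML declaration optional, and a document with neither a declaration nor a byte order mark is read as UTF-8. The `version` in the declaration is the version of XML itself, which is independent of the XHTML 1.1 named in the DOCTYPE. A quick check with Python's expat-based ElementTree, standing in for a browser's XML parser (an assumption), parses the same UTF-8 page identically with or without the declaration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
BODY = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    "<head><title>Caf\u00e9</title></head><body><p>Hello</p></body></html>"
)

# The same UTF-8 bytes, with and without the optional XML declaration.
with_decl = ('<?xml version="1.0" encoding="UTF-8"?>\n' + BODY).encode("utf-8")
without_decl = BODY.encode("utf-8")

for document in (with_decl, without_decl):
    root = ET.fromstring(document)
    title = root.find(f"{XHTML_NS}head/{XHTML_NS}title")
    # With no declaration and no byte order mark, the parser reads UTF-8.
    print(title.text)
```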
 
Andy Dingley

I had thought that since everything is CMS driven

_Inside_ a CMS, there are strong arguments for using XHTML (or at
least, some XML that shares the same XML schema).

On publishing from a CMS, it's usually easier to serve the document to
the web as HTML than it is as XHTML (usable XHTML, meeting Appendix C
and the requirements of good web practice). This is certainly the case
for XSLT-based output. It's hard to achieve Appendix C from XSLT -
XSLT wants to serve it as "XML standards compliant" content, which
isn't appropriate for IE. There's a simple switch to flip it into
"HTML output", but no similar switch for "Appendix C XHTML".

that I can just create something like an RSS feed, that would have a bit of html in it
(like "strong" or "i" or br) and serve that depending on accept type.

Have you read the infamous article on this, "Myth of RSS version
compatibility" from Dive Into Mark? You ought to.

Now, it's not hard to generate well formed RSS

There's never anything simple about non-trivial RSS, because RSS 2.0
doesn't have a competent specification. There's no clear way to embed
HTML in it. Practical experience favours escaping through entity
encoding. This is different to using CDATA sections, but similar in
meaning. Both are a way to embed "<" safely, but both do it by
embedding "<" as a mere character, stripping away all implication that
it might be marking the start of a HTML tag.
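The two approaches can be compared with Python's ElementTree, assumed here as a stand-in for whatever generates the feed. Setting an element's text makes the serialiser entity-encode `&`, `<` and `>`; a CDATA section carrying the same characters parses back to exactly the same text. In both cases the result has no child elements: the fragment arrives as characters, not as HTML tags. (A CDATA section also cannot contain the sequence `]]>`, so it cannot carry every possible string unaltered.)

```python
import xml.etree.ElementTree as ET

fragment = "Read <strong>this</strong> & that"

# Entity encoding: ElementTree escapes &, < and > when serialising text.
description = ET.Element("description")
description.text = fragment
escaped = ET.tostring(description, encoding="unicode")
print(escaped)
# <description>Read &lt;strong&gt;this&lt;/strong&gt; &amp; that</description>

# A CDATA section carrying the same characters.
cdata = "<description><![CDATA[" + fragment + "]]></description>"

# Both parse back to the same character data and contain no elements.
for xml_text in (escaped, cdata):
    parsed = ET.fromstring(xml_text)
    print(parsed.text == fragment, len(parsed))  # True 0 (both times)
```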

Your RSS reader _might_ later decide to assume that any "words" that
are "wrapped" in angle brackets should thus be treated as HTML tags.
This works (and it's how it's done), but it's far from robust. It has
several drawbacks:

* Such embedded HTML content can't be validated as being valid HTML,
outside of a final RSS tool that knows about this assumption.

* How do you publish a HTML tutorial that is marked up in plain text,
not HTML? What does this mean:
<item>
<title>HTML elements Introduction Course</title>
<description>Today we'll meet the &lt;BR&gt; element!</description>
</item>

* It's hardly rare to use this style of markup in plain text either:
<description>Set the value of the &lt;customer-identifier&gt;
field</description>
In this case, that isn't a HTML tag at all.
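The drawbacks in the last two bullets can be made concrete. A sketch, assuming a naive reader (the hypothetical `naive_reader_tags`) that unescapes the description text and then hands it to an HTML tokenizer: both the tutorial's `<BR>` and the field name `<customer-identifier>` come out as tags, even though the author meant them as literal text.

```python
import html
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # html.parser lower-cases tag names


def naive_reader_tags(escaped_description: str) -> list[str]:
    """Unescape RSS description text, then (wrongly) assume it is HTML."""
    collector = TagCollector()
    collector.feed(html.unescape(escaped_description))
    collector.close()
    return collector.tags


print(naive_reader_tags("Today we'll meet the &lt;BR&gt; element!"))
# ['br']
print(naive_reader_tags("Set the value of the &lt;customer-identifier&gt; field"))
# ['customer-identifier']
```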


Will those bits then also have to be in
correct xhtml? In other words: <br> or <br />?

Just about the only constant for embedding (X)HTML into RSS is that
it's not done through XML or XML namespacing. This is for two reasons:

* RSS (2.0 specs) doesn't grok namespacing, as it's not defined to be
XML (in a compliant sense).

* XML namespacing requires balanced tags and closed elements. This is
a restrictive thing to impose upon embedding HTML fragments. Consider
this:

<rss:description>
<html:p>A honking great list, which we've truncated for display.
<html:ul>
<html:li>Foo</html:li>
<html:li>Bar</html:li>
<html:li>Bat</html:li>
[...]
</rss:description>

Now that's a reasonable fragment to want to embed, but it's
impractical by namespacing.
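The balanced-tags requirement can be checked with a namespace-aware XML parser. The sketch binds the `rss:` and `html:` prefixes (the `urn:example:rss` URI is made up for the example) and shows Python's ElementTree rejecting the fragment until the open `html:ul` and `html:p` are explicitly closed. Note that the closed version is merely well-formed XML; a list inside a paragraph is still not valid XHTML.

```python
import xml.etree.ElementTree as ET

# The fragment with its prefixes bound so a namespace-aware parser can
# read it; "urn:example:rss" is a made-up URI used only in this sketch.
FRAGMENT = """<rss:description xmlns:rss="urn:example:rss"
    xmlns:html="http://www.w3.org/1999/xhtml">
<html:p>A honking great list, which we've truncated for display.
<html:ul>
<html:li>Foo</html:li>
<html:li>Bar</html:li>
<html:li>Bat</html:li>
</rss:description>"""

# The same fragment with the open <html:ul> and <html:p> closed explicitly.
BALANCED = FRAGMENT.replace(
    "</rss:description>", "</html:ul></html:p>\n</rss:description>"
)


def is_well_formed(xml_text: str) -> bool:
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        print("rejected:", err)  # e.g. a "mismatched tag" error
        return False
    return True


print(is_well_formed(FRAGMENT))
print(is_well_formed(BALANCED))
```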
 
Jeff

Andy said:
_Inside_ a CMS, there are strong arguments for using XHTML (or at
least, some XML that shares the same XML schema)

On publishing from a CMS, it's usually easier to serve the document to
the web as HTML than it is as XHTML

That has been my thinking. To create it as xhtml and serve it as html.
That leaves me with either fixing the stray bits like <br /> or just
ignoring them as the browsers do anyway. I suppose I should fix them...


(usable XHTML, meeting Appendix C
and the requirements of good web practice). This is certainly the case
for XSLT-based output. It's hard to achieve Appendix C from XSLT -
XSLT wants to serve it as "XML standards compliant" content, which
isn't appropriate for IE. There's a simple switch to flip it into
"HTML output", but no similar switch for "Appendix C XHTML".

I'll have to read through that.
Have you read the infamous article on this, "Myth of RSS version
compatibility" from Dive Into Mark? You ought to.

Nope, hadn't seen that. I had noticed when I was learning about RSS that
it was a bit of a mess. That article dates to 2004. If you have a more
current resource on usability I'd like to read it.
There's never anything simple about non-trivial RSS, because RSS 2.0
doesn't have a competent specification. There's no clear way to embed
HTML in it. Practical experience favours escaping through entity
encoding. This is different to using CDATA sections, but similar in
meaning. Both are a way to embed "<" safely, but both do it by
embedding "<" as a mere character, stripping away all implication that
it might be marking the start of a HTML tag.

That seems to be what I've done. It looks like I added a handful of
other elements (like "&"). The RSS reader reconstructs these as what
they are. Frankly, when I wrote my RSS generator I fixed the errors I
found. Perhaps something to do with the widely varying specs I saw and
not really understanding them.
Your RSS reader _might_ later decide to assume that any "words" that
are "wrapped" in angle brackets should thus be treated as HTML tags.
This works (and it's how it's done), but it's far from robust. It has
several drawbacks:

* Such embedded HTML content can't be validated as being valid HTML,
outside of a final RSS tool that knows about this assumption.

Now, that makes sense!
* How do you publish a HTML tutorial that is marked up in plain text,
not HTML? What does this mean:
<item>
<title>HTML elements Introduction Course</title>
<description>Today we'll meet the &lt;BR&gt; element!</description>
</item>

* It's hardly rare to use this style of markup in plain text either:
<description>Set the value of the &lt;customer-identifier&gt;
field</description>
In this case, that isn't a HTML tag at all.


Will those bits then also have to be in
correct xhtml? In other words: <br> or <br />?

Just about the only constant for embedding (X)HTML into RSS is that
it's not done through XML or XML namespacing. This is for two reasons:

* RSS (2.0 specs) doesn't grok namespacing, as it's not defined to be
XML (in a compliant sense).

* XML namespacing requires balanced tags and closed elements. This is
a restrictive thing to impose upon embedding HTML fragments. Consider
this:

<rss:description>
<html:p>A honking great list, which we've truncated for display.
<html:ul>
<html:li>Foo</html:li>
<html:li>Bar</html:li>
<html:li>Bat</html:li>
[...]
</rss:description>

Now that's a reasonable fragment to want to embed, but it's
impractical by namespacing.


Ah, now I understand! Sort of! Fortunately I'm not doing that.

Thanks,
Jeff
 
Andy Dingley

That has been my thinking. To create it as xhtml and serve it as html.
That leaves me with either fixing the stray bits like <br /> or just
ignoring them as the browsers do anyway. I suppose I should fix them...

In general, you just shouldn't even think about writing output
routines!
Really. Stop it right now.

Why do you need to write a "serialiser" at all? You're probably
working with XML, from a well-known language. In which case, there's a
range of XML DOMs to choose from and they're already written for you.
In particular, skilled people have worked hard to write standards-
compliant serialiser methods on them, including support for varying
character encodings. These things work. They work better than most
people have the skill to duplicate. They take more time to write again
than the competent people can afford.

It's like cryptography. There's only half-a-dozen people who should
ever be allowed to write the components, one of them's insane, one's
Finnish, two are academics with scary hair, and the others are kept in
a locked cupboard by the NSA. The rest of us should only ever re-use
these components, not re-invent them.

So your serialiser should understand the difference between SGML and
XML, or at least HTML and XML. Then you tell it which, and it just
works. If it doesn't, it's broken (so why trust it to get anything
right?).

If you aren't using some sort of intermediate DOM (i.e. direct
document.write()s) then fix that first. _Especially_ for anything
resembling XML.
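The advice in miniature, using Python's standard `xml.etree.ElementTree` as the ready-made serialiser (the choice of library is an assumption; any mature DOM or tree API would do). The content is built once as a tree, the `&` is escaped automatically, and the `<br>` versus `<br />` question is settled by the `method` argument rather than by hand-written output code or a regex.

```python
import xml.etree.ElementTree as ET

# Build the content once as a tree, never as hand-concatenated strings.
para = ET.Element("p")
para.text = "First line"
br = ET.SubElement(para, "br")
br.tail = "Second line & more"  # text that follows the <br>

# Then tell the serialiser which syntax you want.
print(ET.tostring(para, encoding="unicode", method="xml"))
# <p>First line<br />Second line &amp; more</p>
print(ET.tostring(para, encoding="unicode", method="html"))
# <p>First line<br>Second line &amp; more</p>
```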


Here's a spare Clue: If you come to my project again and tell me that
"We need to write a new serialiser from scratch, because the standard
one doesn't work, because Our Project Am Spesshull", then you'll get
a swift dose of Clueiron justice. I've had this imposed on me three
times now, all incorrectly. First time was dumb, second time was dumb,
a big project and disastrous, third time I just blew the bastard thing
clean out of the source repository (and we all lived happily ever
afterwards).

Mostly people think this in the first place because they don't grok
character encodings.
 
Jeff

Andy said:
In general, you just shouldn't even think about writing output
routines!
Really. Stop it right now.

Why do you need to write a "serialiser" at all? You're probably
working with XML, from a well-known language. In which case, there's a
range of XML DOMs to choose from and they're already written for you.
In particular, skilled people have worked hard to write standards-
compliant serialiser methods on them, including support for varying
character encodings. These things work. They work better than most
people have the skill to duplicate. They take more time to write again
than the competent people can afford.

It's like cryptography. There's only half-a-dozen people who should
ever be allowed to write the components, one of them's insane, one's
Finnish, two are academics with scary hair, and the others are kept in
a locked cupboard by the NSA. The rest of us should only ever re-use
these components, not re-invent them.

So your serialiser should understand the difference between SGML and
XML, or at least HTML and XML. Then you tell it which, and it just
works. If it doesn't, it's broken (so why trust it to get anything
right?).

Well, we seem to have diverged somewhere. I'm not converting html to xml
or vice versa.

All I'm doing is taking the CMS data and outputting it as html. That's
pretty easy as the CMS is nothing but a collection of heading, paragraph,
image, list, class... objects. One set of those after another. That's
all html is anyways. It has to write correct XHTML style html, because
the author is not creating paragraphs and lists but merely filling them
in. I understand that many CMS's have an editor component that edits
like a "word" doc. I've always thought that was wrong.

There's two issues that arise and the first is what to do with
linebreaks in paragraphs and headings. Generally the author expects to
see those as newlines. So I convert those to <br>'s with an option not to.

The other is what to do with extraneous markup the author adds. In
RSS this is no problem as they are escaped. Otherwise it's not hard to
ensure that tags are nested correctly and closed properly. It's just a
couple of greedy regexes that check each outside pair of tags to see
that they match. That only leaves any single tags such as <br> that the
author may add. And that is roughly what I was asking. But it's at
worst just another regex to remove trailing "/"'s in tags.


Jeff
 
Andy Dingley

All I'm doing is taking the CMS data and outputting it as html. That's
pretty easy as the CMS is nothing but a collection of heading, paragraph,
image, list, class... objects. One set of those after another. That's
all html is anyways.

Are you familiar with the MVC pattern (Model View Controller)? It
sounds as if your system here is very far from it, which isn't a good
thing.

Now MVC has most to offer in an interactive context, and for simple
RESTful view-only apps then you really don't gain much benefit from
separating out the Controller. You do however still benefit from
separable Model & View.

Separating Model & View begins by recognising that a good CMS has
quite different data models for "in" and "out" content. The way that
content editors work with content is quite different to how the
eventual pages operate. Editors care about "document
structure" (paras, headers) and may even use HTML as a suitable markup
language for it. They also care a great deal about metadata such as
indexing terms, authoring audit trail and editorial control. The
final page may embed the document content relatively unchanged, but it
will build page wrappers and navigation structures from different
routes, such as "site skins" and queries that build navigation lists
from the index metadata across a number of pages. To allow easy
editing you need a content structure that models what the editors
need. To allow flexible querying (you'll re-write this a _lot_ over a
site's lifetime) you need a clearly separate view that isn't
tightly limited by the underlying DB structure.

There are two ways to build a non-MVC CMS. One treats both portions as
"Model", one treats both as "View". Both have merged things that
shouldn't be merged, which becomes a serious limitation long-term,
once you try and do the inevitable maintenance changes over the
project's lifetime.

A "pure View" architecture stores chunks of HTML in the CMS database
and spits these chunks back out on request. Its characteristics are
that page authoring is hard (content authors are still having to work
in HTML) and it's not "smart" for manipulating the content DB as
_content_, rather than as its final presentation.

A "pure Model" architecture stores abstract content, then applies a
hard-coded view process through scripting. This is probably the more
common, especially for page-scripting languages like ASP or JSP with
Scriptlets. It may be quite a powerful system internally, with good
editing features and smart querying or index-generation. The downside
is that the "view" layer is unclear (probably hard-coded scripts) and
is inflexible to modify.

Classic ways to break Model-View separation are to allow "objects" or
"lists" to be embedded _inside_ pages. If you have any sort of
structure that links content objects together, make sure that they're
stored outside of these objects in another structure, then embed them
within objects only at view time.
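A minimal sketch of the separation described above, assuming a toy in-memory store; the `Article` model and the `related_articles` and `render_page` functions are hypothetical names, not part of any real CMS. Articles hold only document structure and index terms. The navigation list is never stored inside an article: it is derived from the index metadata when the page is rendered, so the view can be rewritten without touching the content.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Article:
    """Model: document structure plus index metadata, no presentation."""
    slug: str
    title: str
    paragraphs: list[str]
    index_terms: set[str] = field(default_factory=set)


def related_articles(target: Article, store: dict[str, Article]) -> list[Article]:
    """Links are not stored inside articles; they are queried from metadata."""
    return sorted(
        (a for a in store.values()
         if a.slug != target.slug and a.index_terms & target.index_terms),
        key=lambda a: a.title,
    )


def render_page(target: Article, store: dict[str, Article]) -> str:
    """View: the page wrapper and navigation are assembled at render time."""
    page = ET.Element("div")
    ET.SubElement(page, "h1").text = target.title
    for text in target.paragraphs:
        ET.SubElement(page, "p").text = text
    related = related_articles(target, store)
    if related:
        nav = ET.SubElement(page, "ul")
        for article in related:
            item = ET.SubElement(nav, "li")
            link = ET.SubElement(item, "a", href="/" + article.slug)
            link.text = article.title
    return ET.tostring(page, encoding="unicode", method="html")


store = {
    a.slug: a
    for a in (
        Article("rss", "RSS feeds", ["Feeds & readers."], {"rss", "xml"}),
        Article("xhtml", "Serving XHTML", ["Use the right MIME type."], {"xml"}),
        Article("css", "Style sheets", ["Layout without tables."], {"css"}),
    )
}
print(render_page(store["rss"], store))
```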
I understand that many CMS's have an editor component that edits
like a "word" doc. I've always thought that was wrong.

A CMS really should allow "content" to be edited in any damn way you
please. Requiring this to be HTML isn't so bad (HTML is an OK format
to use and Word is a bit paper-centric. Consider DocBook too for some
cases). Where it goes wrong is when you allow _content_ authors to
start hard-coding view-related issues that shouldn't be generated
until query-time on the CMS DB.

There's two issues that arise and the first is what to do with
linebreaks in paragraphs and headings. Generally the author expects to
see those as newlines. So I convert those to <br>'s with an option not to.

Options are bad. Options give the content editor a way to start hard-
coding (or at least implying) the final format after the view
operation.

Give your content editors a "para" and a "linebreak" structure (even a
page or section break too). Most word processors support this,
although few typists are trained to appreciate the difference. Render
this at view-time appropriately, accurately and consistently.
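A sketch of that idea, assuming a hypothetical content model in which a document is a list of paragraphs and each paragraph is a list of lines; the `render` function is illustrative. Paragraph and line-break structure are stored as data, and whether a break comes out as `<br>` or `<br />` is decided once, consistently, by the serialiser at view time.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: a document is a list of paragraphs and each
# paragraph is a list of lines. "Para" and "linebreak" are structure here,
# not characters (or <br> tags) typed by the author.
Document = list[list[str]]


def render(document: Document, method: str = "html") -> str:
    """Render paragraphs and line breaks at view time, the same way every time."""
    root = ET.Element("div")
    for lines in document:
        para = ET.SubElement(root, "p")
        if lines:
            para.text = lines[0]
        for line in lines[1:]:
            ET.SubElement(para, "br").tail = line  # text after each break
    # method="html" writes <br>; method="xml" writes <br />
    return ET.tostring(root, encoding="unicode", method=method)


doc: Document = [["Line one", "Line two"], ["A new paragraph & more"]]
print(render(doc))
print(render(doc, method="xml"))
```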

The other is what to do with extraneous markup the author adds.

Stop them doing it - it's extraneous, after all. The authors should be
given an adequate set of content markup for what they need to describe
and they should be strongly discouraged from using anything else as
well, particularly slipping in little HTML fragments. A necessary
condition to allow this is providing them an adequate content markup
to begin with, and extending this as necessary.

If editors can insert HTML they'll do so, and they'll do it with badly-
understood, invalid HTML 3.2 snippets that they've scraped up from
some cargo-cult website. Or maybe HTML 5.

You might know how to control whitespace, but your users will do it by
inserting repeated <br>.

Users aren't always smart, but they are persistent. If it's possible
to mis-use the system, they'll do so. Your only hope of avoiding
"wrong" use of it isn't by trying to stamp it out, it's by giving them
a "right" alternative, making it good enough to be useful, and
training them so that they use it.
 
