My XML buried in some HTML which is all wrapped up in my XML?

Grant Robertson

Here I am again with my off-the-wall questions about what is even
possible.

Up until now, I have been thinking about the standard I have been
creating as a mere wrapper around a snippet of HTML or XHTML. A sort of
envelope or container with lots of additional metadata in which to place
a small piece of HTML (about 1/4 page worth each). I just realized that I
need to be able to mark up text *within* the HTML snippet with some of
the tags from my XML standard. A large set of metadata about the snippet
of HTML as a whole will not be adequate. Users need to be able to tag
specific parts of the HTML snippet with tags from my standard.

I know that the frequent use of the "any" wildcard in the XHTML schema
means that XHTML content creators can use XML from any standard to insert
additional tags within their XHTML content.

My first question is:
If my tags are used within the XHTML snippet, which is then wrapped up in
an element of my XML standard with lots of my metadata above and below
it, will a standard web browser engine be able to display that document -
ignoring all of the XML from my standard? Or, will I need to create my
own app to even be able to display the basic content?

2:
Even though that XHTML content is wrapped up in my XML metadata, will the
main part of the XML and the tags within the XHTML be considered part of
the same namespace? Will they be part of the same namespace but perhaps
have a different prefix depending on how the XHTML content creator chose
to write their headers? (The content may be created by different people,
and even different software, from those who add the metadata later.) If
they have different prefixes, will a standard XML parser still figure out
that both are of the same namespace? Will this matter to the program that
is actually doing something with this document?

3:
What about plain HTML? Do things work the same as XHTML in this context
or will they be completely different? I have a sinking feeling that HTML
will not allow my XML tags at all. Would I have to completely write my
own web engine to enable this to work at all?
 
Joe Kesselman

Grant said:
> Even though that XHTML content is wrapped up in my XML metadata, will the
> main part of the XML and the tags within the XHTML be considered part of
> the same namespace?

Entirely up to you. This depends on what your document's namespace
declarations are and how you do or don't use prefixes. Say what you mean.

> What about plain HTML? Do things work the same as XHTML in this context
> or will they be completely different?

Different. If you want to intermix HTML and other markup, use XHTML.
That's why it was created.
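
As a rough illustration of the namespace answer above: a namespace-aware parser identifies an element by its namespace URI plus local name, never by its prefix. The sketch below uses Python's standard xml.etree.ElementTree. The wrapper namespace URI and the element names (item, meta, term) are made up for the example and do not come from any real standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the wrapper vocabulary (an assumption).
WRAPPER_NS = "http://example.org/wrapper"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# The outer metadata uses the prefix "w"; the embedded XHTML, perhaps written
# by someone else, binds the same URI to the prefix "mine".
DOC = """\
<w:item xmlns:w="http://example.org/wrapper">
  <w:meta>outer metadata</w:meta>
  <body xmlns="http://www.w3.org/1999/xhtml"
        xmlns:mine="http://example.org/wrapper">
    <p>Some <mine:term>tagged</mine:term> text.</p>
  </body>
</w:item>
"""


def local_names_in(root, namespace):
    """Return the local names of all elements in `namespace`, in document order."""
    marker = "{" + namespace + "}"
    return [el.tag[len(marker):] for el in root.iter() if el.tag.startswith(marker)]


root = ET.fromstring(DOC)
wrapper_names = local_names_in(root, WRAPPER_NS)  # elements written as w:... or mine:...
xhtml_names = local_names_in(root, XHTML_NS)      # elements in the XHTML namespace
print(wrapper_names)
print(xhtml_names)
```

Here w:item, w:meta and mine:term all land in the same namespace because both prefixes resolve to the same URI. The prefixes themselves never reach the application code.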
 
Andy Dingley

> Here I am again with my off-the-wall questions about what is even
> possible.

I'd pay more attention if I suspected you'd bothered to read what
other people had already written on that wall. You're re-inventing
lots of wheels here.
 
Grant Robertson

> I'd pay more attention if I suspected you'd bothered to read what
> other people had already written on that wall. You're re-inventing
> lots of wheels here.

I'm working as hard as I can to learn XML Schema but the available books
aren't much help. At the same time, I am trying to invent a standard to
help educate the world AND finish college. So, sometimes I just need to
know if something is possible or not so I can make some major decisions
as to the direction of some part of my standard. Since there is no one on
my campus who knows anything about XML at all, I am forced to turn to the
newsgroups. However, I just don't have the time or understanding right
now to read through all of the posts in this newsgroup. If you can tell
me how to do a search for something like this, I would appreciate it.
However, I'm sure you would admit that doing such a search when you don't
already know the terms would be rather fruitless.

I am trying to avoid reinventing as many wheels as possible. But I have
to learn about all the different types of wheels and how they fit on all
the different types of axles first. That is what I am trying to do.
 
Grant Robertson

keshlam- said:
> Different. If you want to intermix HTML and other markup, use XHTML.
> That's why it was created.

Thanks. This tells me that I will need to adjust my standard to require
all content to be in XHTML instead of HTML. Since a lot of the content
that will be put into this system will originally be in HTML format, this
tells me that I will need to incorporate a feature to automatically
convert HTML to XHTML when people cut and paste content into the
authoring software.
 
Joseph Kesselman

> tells me that I will need to incorporate a feature to automatically
> convert HTML to XHTML when people cut and paste content into the
> authoring software.

That's definitely the simplest solution. The W3C makes a tool available
("tidy") which can do that cleanup; it also incorporates a lot of the
repair guesswork that's needed when people (inevitably!) want you to
process files that aren't even correct HTML.

(Eliminating that guesswork stage is Yet Another reason the W3C would
really like to see everyone moving toward XHTML.)
 
Joseph Kesselman

Forgot to say: Another possible solution would be to use a parser like
NekoHTML (derived from the Apache Xerces XML parser), which reads HTML,
applies the guesswork, and presents it to the rest of the application as
if it had been XHTML.
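
To give a feel for what such a conversion step involves, here is a minimal Python sketch built only on the standard library's html.parser. It does not call tidy or NekoHTML and is not how either tool works internally; the names XhtmlWriter and html_to_xhtml_fragment are invented for the example. It lowercases names, quotes attributes, self-closes void elements, and closes anything left open. It makes no attempt at the deeper content-model repairs those tools perform (for example, a p opened inside another p).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never have content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class XhtmlWriter(HTMLParser):
    """Re-emit parsed HTML events as well-formed XML text (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of output text
        self.open_tags = []  # stack of elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        # Valueless attributes (e.g. "checked") get their own name as the value.
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(name if value is None else value))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore end tags for void elements and stray end tags.
        if tag in VOID or tag not in self.open_tags:
            return
        # Close everything opened since the matching start tag.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))


def html_to_xhtml_fragment(html):
    writer = XhtmlWriter()
    writer.feed(html)
    writer.close()
    # Close whatever the author never closed.
    while writer.open_tags:
        writer.out.append("</%s>" % writer.open_tags.pop())
    return "".join(writer.out)


sample = '<P>Fish &amp; chips<BR><IMG SRC=a.png ALT="x"><b>bold'
xhtml = html_to_xhtml_fragment(sample)
parsed = ET.fromstring(xhtml)  # raises ParseError if the result is not well-formed
print(xhtml)
```

The output carries no namespace declaration. A real pipeline would still need to put the result in the XHTML namespace before embedding it in a wrapper document.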
 
Peter Flynn

Grant said:
> Thanks. This tells me that I will need to adjust my standard to require
> all content to be in XHTML instead of HTML. Since a lot of the content
> that will be put into this system will originally be in HTML format, this
> tells me that I will need to incorporate a feature to automatically
> convert HTML to XHTML when people cut and paste content into the
> authoring software.

Although it's possible, in general mixing XHTML and other markup is A
Bad Idea. Use a properly-defined XML vocabulary to store the
information, and convert it to plain XHTML to serve it.

///Peter
 
Joseph Kesselman

Peter said:
> Although it's possible, in general mixing XHTML and other markup is A
> Bad Idea. Use a properly-defined XML vocabulary to store the
> information, and convert it to plain XHTML to serve it.

That's also a good point... except when the point of your XML is to
wrap and store fragments of XHTML, which does sometimes happen. (To
take an obvious example, a stylesheet which generates XHTML will often
contain fragments of XHTML which get copied into its output.)
 
Grant Robertson

> Although it's possible, in general mixing XHTML and other markup is A
> Bad Idea. Use a properly-defined XML vocabulary to store the
> information, and convert it to plain XHTML to serve it.

Not all XML is "served" out to a web browser. The primary use of
documents produced using my standard will be within special software that
is part web display engine and part organizational system. The content
will have been downloaded directly to the user's computer and then used
from their "cache", rather than served up from a web server each time
they use it.

However, I do want people to be able to view the content directly from a
web server using a web browser if that is all they've got. I will have a
"properly-defined XML vocabulary" which is the wrapper that contains the
piece of XHTML content. The content must be in XHTML (or HTML) so that
people can use existing tools to create and format the content. Once that
is created - or found in archives or repositories - it will be pasted
into specialized authoring software which will surround that XHTML (or
HTML) content with tags from my XML vocabulary in order to attach all the
metadata to the content. (No, I don't want to just make the content an
XHTML page with metatags in the header. That won't work for my system.)

Since viewing in a standard web browser is not the primary purpose of
this content, do you think it would really do much harm to put my own
tags within the XHTML content? Would most web browsers choke on that?

I guess what I could do is create an XSL transform that strips off the XML
wrapper and strips out the internal XML tags and creates a separate
XHTML-only version, just for direct viewing with a plain web browser. It
could be a one-time thing that is run against the XML file and then both
files would be added to the directory on the server. One would be a .XML
file and the other would be a .HTM file with the same base name. That
wouldn't really be too hard to do and would still enable the server to be
a plain-jane server with no real server side code.
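
The one-time conversion described above does not have to be XSLT. As a sketch only, the same idea in Python with the standard library: find the XHTML snippet inside the wrapper, unwrap any non-XHTML elements inside it (keeping their text), and write a .htm file next to the .xml file. It assumes the first XHTML element in document order is the root of the snippet; the function names and the sample wrapper vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XHTML_NS = "http://www.w3.org/1999/xhtml"
XHTML_MARK = "{" + XHTML_NS + "}"

# Serialize XHTML as the default namespace (no prefix). This is global state.
ET.register_namespace("", XHTML_NS)


def _plain_attrs(element):
    # Simplification: drop every namespaced attribute (this includes xml:lang).
    return {k: v for k, v in element.attrib.items() if not k.startswith("{")}


def _append_text(parent, last_child, text):
    """Attach text after last_child, or as parent's leading text if there is none."""
    if not text:
        return
    if last_child is None:
        parent.text = (parent.text or "") + text
    else:
        last_child.tail = (last_child.tail or "") + text


def _copy_content(src, dst, last_child=None):
    """Copy src's text and children into dst, unwrapping non-XHTML elements.

    Returns the last child appended to dst so far (or None)."""
    _append_text(dst, last_child, src.text)
    for child in src:
        if child.tag.startswith(XHTML_MARK):
            new = ET.SubElement(dst, child.tag, _plain_attrs(child))
            _copy_content(child, new)
            last_child = new
        else:
            # Foreign element: keep its content, drop the tag itself.
            last_child = _copy_content(child, dst, last_child)
        _append_text(dst, last_child, child.tail)
    return last_child


def extract_xhtml(wrapper_xml):
    """Return an XHTML-only copy of the snippet (raises StopIteration if none)."""
    root = ET.fromstring(wrapper_xml)
    snippet = next(el for el in root.iter() if el.tag.startswith(XHTML_MARK))
    clean = ET.Element(snippet.tag, _plain_attrs(snippet))
    _copy_content(snippet, clean)
    return clean


def write_html_copy(xml_path):
    """Write name.htm next to name.xml and return the new path."""
    xml_path = Path(xml_path)
    clean = extract_xhtml(xml_path.read_bytes())
    html_path = xml_path.with_suffix(".htm")
    html_path.write_text(ET.tostring(clean, encoding="unicode"), encoding="utf-8")
    return html_path


SAMPLE = """\
<w:item xmlns:w="http://example.org/wrapper"
        xmlns="http://www.w3.org/1999/xhtml">
  <w:meta><w:title>Lesson 1</w:title></w:meta>
  <div>
    <p>Water is <w:term>H2O</w:term>, a <em>molecule</em>.</p>
  </div>
</w:item>
"""

clean = extract_xhtml(SAMPLE)
paragraph = clean.find(XHTML_MARK + "p")
paragraph_text = "".join(paragraph.itertext())
print(ET.tostring(clean, encoding="unicode"))
```

Running write_html_copy once per file would leave both versions side by side on a plain static server, as in the plan above. The result is a bare XHTML fragment, so a complete page would still need html, head and body elements around it.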
 
Grant Robertson

> That's also a good point... except when the point of your XML is to
> wrap and store fragments of XHTML, which does sometimes happen. (To
> take an obvious example, a stylesheet which generates XHTML will often
> contain fragments of XHTML which get copied into its output.)

To further expound on Peter's concerns: if someone took that stylesheet
with the fragments of XHTML in it and opened it directly in a plain web
browser, would they just see the XHTML or would they see all the
stylesheet tags too? I need users to only see the XHTML content. If this
isn't possible then I may just use a modified version of what Peter
suggested. Once it is automated it won't be any trouble at all and will
only marginally increase the total data store size. (Yes, the metadata
will likely consume far more space than the actual content. No one said
XML was the most efficient means of tagging data.)
 
Grant Robertson

> There's nowt on the newsgroups, so try Wikipedia and Google instead.
> You really need to have read at least these (and their background
> links) before you start

Well, I have already started so it is too late for that. ;^)

But thanks for the links. I will add them to the 300MB of notes and links
that I have already accumulated in my research for this project. Not to
be flippant, but I have done considerable research into existing
standards. Unfortunately, just reading a spec - and I have read dozens -
does not give one the full information. I am still trying to figure out
how to properly use the Dublin Core citation tags within the Learning
Object Metadata (LOM) standard and then use that within my own XML
documents, AND express all of that in an XML Schema. It is kind of slow
going when I don't have real people to ask questions of. I am almost
ready to move to Boston so I can sit on TBL's front porch and ask him to
splain all this to me.
 
Andy Dingley

> I am almost
> ready to move to Boston so I can sit on TBL's front porch and ask him to
> splain all this to me.

Worked for me. I'm in Bristol. I just have to wander into the pub and
I keep bumping into various bits of the W3
 
Joe Kesselman

In addition to searching specs, you may want to search
articles/tutorials, which might have advice that's more specific to your
needs. Herewith my standard pointer to the large collection at
http://www.ibm.com/xml

(The specs are deliberately written by experts for experts and are
prescriptive rather than descriptive, so interpolating from them to
actual applications can be a slow process. Real examples/discussions may
be more useful when you're trying to find out "how" rather than "why and
why not".)
 
Joe Kesselman

Grant said:
> I need users to only see the XHTML content.

Usual solution, then, is to put the document through an XSLT stylesheet
or XQuery which describes how to extract and present only those portions
of the document. Obviously you can do this with hand-coding as well.
 
Grant Robertson

> Worked for me. I'm in Bristol. I just have to wander into the pub and
> I keep bumping into various bits of the W3

Is this Bristol in the UK or somewhere in Massachusetts? Either way, I am
pretty envious.
 
