XML for multilingual technical documentation - seven questions

David Winter

As a technical author and translator, I am highly interested in single
source/multi format publishing. Meaning: I'd like to keep manuals,
technical specifications etc. in multiple languages (English, French)
in a *single* repository (<- files or database) and generate documents
in the various languages and target formats (XHTML, PDF, HTML Help,
Text) on demand.

I am not a programmer, though, and can't develop my own tools, but of
course I am willing to invest money and spend time learning.

I understand that I could use an existing XML Schema such as DocBook or
cook my own and then use XSLT to generate the various output formats.

Since I'm not keen on reinventing the wheel, I'd like to ask you what
would be a good (proven) way to achieve the following. I am looking for
a set of tools and technologies that will work together reliably, and I
assume others have solved these problems before. I'd be grateful if
someone could answer a few of the following questions.


1) Authoring tool?

I guess using a native XML editor from the start would be a better
approach than exporting from some proprietary format such as
FrameMaker. I have considered <oxygen/> and the XML Mind Editor. Are
these good editors for daily work on big, complex documents? What other
products would you recommend for a user fluent with plain text editors,
Frame and Dreamweaver? (A *cough* WYSIWYG environment (using some CSS)
would be appreciated.)


2) Appropriate XML Schema/DTD: DocBook or ..?

DocBook is impressive, but - forgive my blasphemy - seems a bit baroque
while missing pieces I would need for certain clients/technologies. Now
this may seem a bit megalomaniac, but if I wanted to build my own XML
Schema - what tools should I use? The Altova product suite seems
professional, but maybe overkill for a freelancer. What would you
suggest?


3) XSLT

I understand the XSLT processor does most of the magic that turns XML
into target formats. Assuming you'd want XHTML, pretty PDFs and HTML
Help - what would be my weapon of choice as a non-programmer? I'd like
to be able to modify PDF and HTML output, so a "blackbox" app is out of
the question.


4) Multilingual documents

To prevent version drift, I would like to keep the text for all
languages in the same file. I.e. the (imaginary) <head1> tag should
hold both the English "Introduction" and the French "Préliminaire".
What's the best approach to achieve this? I can hardly have two <head1
lang="FOO"> tags when my DTD/Schema allows only one. Namespaces?


5) Index/TOC/Document outline

A (multi-level) Index, Table of Contents and maybe a (collapsible)
outline view of a document - does XSLT take care of these? Are there
e.g. sample XSLT stylesheets that can generate a hyperlinked outline of
an XML document in HTML?


6) Conditional Text

What I mean here is text that can be filtered out when generating
target formats. Assuming I want to do something like "Only generate the
digest version of the manual" - does DocBook allow me to tag sections
as "Only for Digest Version"? What would be the generic approach to do
this in XML, and how can I combine them on rendering ("Only for PDF"
AND "Digest")?


7) CAT translation

Integration of Translation Memory Tools: Is there an easy way to feed
XML (e.g. DocBook) documents into CAT tools? Ideally, this would accept
<para lang="EN">Source</para> and generate <para
lang="FR">Target</para> from a TU database.


Thank you for helping.
 
Peter Flynn

David said:
As a technical author and translator, I am highly interested in single
source/multi format publishing. Meaning: I'd like to keep manuals,
technical specifications etc. in multiple languages (English, French)
in a *single* repository (<- files or database) and generate documents
in the various languages and target formats (XHTML, PDF, HTML Help,
Text) on demand.

Yep. Common requirement.

I am not a programmer, though, and can't develop my own tools, but of
course I am willing to invest money and spend time learning.

I understand that I could use an existing XML Schema such as DocBook or
cook my own and then use XSLT to generate the various output formats.

DocBook is excellent for computer documentation. It may be overkill for
technical documentation in other fields (eg maintenance manuals for
washing machines) or may simply not provide what is needed in those
fields. It's a popular misconception that DocBook is for *any* technical
documentation, computing or not. And yes, XSLT can be used to transform
your XML.

Since I'm not keen on reinventing the wheel, I'd like to ask you what
would be a good (proven) way to achieve the following. I am looking for
a set of tools and technologies that will work together reliably, and I
assume others have solved these problems before. I'd be grateful if
someone could answer a few of the following questions.


1) Authoring tool?

I guess using a native XML editor from the start would be a better

Essential.

approach than exporting from some proprietary format such as
FrameMaker. I have considered <oxygen/> and the XML Mind Editor. Are
these good editors for daily work on big, complex documents? What other
products would you recommend for a user fluent with plain text editors,
Frame and Dreamweaver? (A *cough* WYSIWYG environment (using some CSS)
would be appreciated.)

Don't be fooled by WYSIWYG. Unless it provides *all* your formatting needs,
it may be more of a hindrance than a help. An editor sold on the spurious
basis that it can use fonts and colour does not IMHO qualify as WYSIWYG.

Plaintext: Emacs with psgmls and nsgmls is free and runs on all platforms.

High-end: XML Spy and EPIC are excellent but to do *all* your formatting
you will almost certainly need to start programming them internally.

2) Appropriate XML Schema/DTD: DocBook or ..?

DocBook is impressive, but - forgive my blasphemy - seems a bit baroque

Quod scripsi scripsi (ut supra).

while missing pieces I would need for certain clients/technologies. Now
this may seem a bit megalomaniac, but if I wanted to build my own XML
Schema - what tools should I use? The Altova product suite seems
professional, but maybe overkill for a freelancer. What would you
suggest?

I write DTDs in Emacs with tdtd-mode, and I'll let you into a secret:
most of the other DTD and Schema writers I know do the same -- eventually.
Graphical structure-design programs are excellent to get the thing up
and running in outline, though.

3) XSLT

I understand the XSLT processor does most of the magic that turns XML
into target formats. Assuming you'd want XHTML, pretty PDFs and HTML
Help - what would be my weapon of choice as a non-programmer? I'd like
to be able to modify PDF and HTML output, so a "blackbox" app is out of
the question.

Don't even think of trying to modify PDF. It's an end-of-line format and
is not designed to be modified, just recreated afresh. In fact, don't try
to modify the HTML either. Always fix the problem in the XSLT (or the XML,
depending on what the problem is) and then recreate the output.

XSL-FO will create PDF directly, but at the expense of having to reinvent all
the formatting wheels -- by hand. I prefer to use XSLT to create LaTeX, and
rely on LaTeX because it already knows more about document formatting than
anything else. But it does mean learning some LaTeX (not hard, just
different).
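
To give a feel for what "XML in, LaTeX out" involves, here is a minimal sketch. It is written in Python rather than XSLT purely so that it is short and runnable. The element names (manual, section, title, para) are invented for the illustration and do not come from any particular DTD, and it only handles flat sections containing plain paragraphs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
SOURCE = """\
<manual>
  <section>
    <title>Costs &amp; fees</title>
    <para>100% tested.</para>
  </section>
</manual>
"""

# Characters that have a special meaning in LaTeX and must be escaped.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def to_latex(root):
    """Turn flat <section>/<title>/<para> markup into a LaTeX document."""
    lines = [r"\documentclass{article}", r"\begin{document}", r"\tableofcontents"]
    for section in root.iter("section"):
        title = tex_escape(section.findtext("title", default=""))
        lines.append(r"\section{" + title + "}")
        for para in section.findall("para"):
            # A trailing newline gives the blank line LaTeX uses between paragraphs.
            lines.append(tex_escape(para.text or "") + "\n")
    lines.append(r"\end{document}")
    return "\n".join(lines)


latex = to_latex(ET.fromstring(SOURCE))
print(latex)
```

Note that the only thing the sketch does about the table of contents is emit \tableofcontents; LaTeX builds the rest itself.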

4) Multilingual documents

To prevent version drift, I would like to keep the text for all
languages in the same file. I.e. the (imaginary) <head1> tag should
hold both the English "Introduction" and the French "Préliminaire".
What's the best approach to achieve this? I can hardly have two <head1
lang="FOO"> tags when my DTD/Schema allows only one. Namespaces?

Possibly. Or maybe <head lang="fr">Préliminaire</head> and
<head lang="en">Introduction</head>. These are a form of "effectivities"
(ie they come into effect only when picked up by your XSLT when you
specify "use lang='fr' this time"). Many DTDs do allow precisely this
kind of thing, specifically for this purpose (and more commonly, text
applicable to related but different product lines).
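
As a rough illustration of picking one language at build time, here is a small sketch of the selection step. It is in Python rather than XSLT only to keep it self-contained. The document structure, the "lang" attribute name and the rule that elements without "lang" are language-neutral are all assumptions made for the example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical bilingual source; element and attribute names are assumptions.
SOURCE = """\
<doc>
  <section>
    <head lang="en">Introduction</head>
    <head lang="fr">Préliminaire</head>
    <para lang="en">Switch the unit on.</para>
    <para lang="fr">Mettez l'appareil sous tension.</para>
    <figure src="panel.png"/>
  </section>
</doc>
"""


def select_language(root, lang):
    """Return a copy of the tree that keeps only the elements for `lang`.

    Elements without a lang attribute are treated as language-neutral
    and are always kept (an assumption of this sketch).
    """
    result = copy.deepcopy(root)
    # Snapshot the element list first so removals do not disturb iteration.
    for parent in list(result.iter()):
        for child in list(parent):
            child_lang = child.get("lang")
            if child_lang is not None and child_lang != lang:
                parent.remove(child)
    return result


french = select_language(ET.fromstring(SOURCE), "fr")
heads = [h.text for h in french.iter("head")]
print(heads)
```

In XSLT the same decision would typically be a parameter passed to the stylesheet and a test on the attribute in the relevant templates.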

The alternative is to use a translating editor, if you can find one. There
was a superb one put out by CITEC years ago, for SGML, which displayed your
source language in the top window, and in the bottom window it put the
exact same elements, only empty, ready to fill in the target language
(subelements in mixed content were omitted, of course, as they would
likely occur in different sequences in a target language). But this has
long since disappeared, alas, and I've never seen a replacement.

5) Index/TOC/Document outline

A (multi-level) Index, Table of Contents and maybe a (collapsible)
outline view of a document - does XSLT take care of these? Are there
e.g. sample XSLT stylesheets that can generate a hyperlinked outline of
an XML document in HTML?

You can program these in XSLT very easily. There are indeed sample XSLT
stylesheets for (eg) DocBook doing exactly this.
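
To show roughly what such an outline generator does, here is a minimal sketch of the recursion: walk the nested sections and emit a nested, hyperlinked list. It is Python, not one of the DocBook stylesheets, and the element names and "id" attributes are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical document with nested sections; names are assumptions.
SOURCE = """\
<book>
  <section id="intro"><title>Introduction</title>
    <section id="scope"><title>Scope</title></section>
  </section>
  <section id="install"><title>Installation</title></section>
</book>
"""


def outline(parent):
    """Return nested <ul> markup linking to each child <section> of `parent`."""
    items = []
    for sec in parent.findall("section"):
        sec_id = escape(sec.get("id", ""))
        title = escape(sec.findtext("title", default=""))
        # Recurse so that subsections become a nested list inside this item.
        items.append(f'<li><a href="#{sec_id}">{title}</a>{outline(sec)}</li>')
    if not items:
        return ""
    return "<ul>" + "".join(items) + "</ul>"


toc_html = outline(ET.fromstring(SOURCE))
print(toc_html)
```

A collapsible view would be a matter of styling or scripting the resulting nested list in the browser.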

6) Conditional Text

What I mean here is text that can be filtered out when generating
target formats. Assuming I want to do something like "Only generate the
digest version of the manual" - does DocBook allow me to tag sections
as "Only for Digest Version"? What would be the generic approach to do
this in XML, and how can I combine them on rendering ("Only for PDF"
AND "Digest")?

These are effectivities as above. DocBook has attributes to identify
conditionality and many other metadata features. So do many other DTDs.

Combining them would be something you do in the XSLT.
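
A minimal sketch of that combining logic, in Python rather than XSLT so it can be run as it stands: an element is kept only if every conditional attribute it carries matches the requested build, which gives the "PDF AND Digest" behaviour. The attribute names "condition" and "outputformat" are modelled on DocBook's profiling attributes, but treat them (and the semicolon-separated value lists) as assumptions to check against your own DTD and stylesheets.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source; attribute names and values are assumptions.
SOURCE = """\
<manual>
  <para>Shown in every edition.</para>
  <para condition="digest">Short summary for the digest.</para>
  <para condition="full">Long background discussion.</para>
  <para condition="digest" outputformat="pdf">Digest note, print only.</para>
  <para condition="digest" outputformat="html">Digest note, web only.</para>
</manual>
"""


def profile(root, **wanted):
    """Keep an element only if every conditional attribute it carries matches.

    An attribute may list several values separated by semicolons. An element
    with none of the requested attributes is unconditional and always kept.
    """
    result = copy.deepcopy(root)
    for parent in list(result.iter()):
        for child in list(parent):
            for name, value in wanted.items():
                marked = child.get(name)
                if marked is not None and value not in marked.split(";"):
                    parent.remove(child)
                    break  # one failed condition is enough to drop it
    return result


pdf_digest = profile(ET.fromstring(SOURCE), condition="digest", outputformat="pdf")
kept = [p.text for p in pdf_digest.iter("para")]
print(kept)
```

Calling profile() with only condition="digest" would keep both the print and the web notes, since nothing is then said about the output format.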

7) CAT translation

Integration of Translation Memory Tools: Is there an easy way to feed
XML (e.g. DocBook) documents into CAT tools? Ideally, this would accept
<para lang="EN">Source</para> and generate <para
lang="FR">Target</para> from a TU database.

I don't know what tools exist in this area. The localisation business was
very slow to take up XML, but it is gathering speed now. The nexus of
knowledge in this area is probably Dublin, which has a huge localisation
industry.

///Peter
 
David Winter

Hello Peter,

thank you for your comments - highly appreciated!

Well, it seems I'll bite the bullet and finally learn Emacs. :/

Don't even think of trying to modify PDF.

Sorry; I didn't express myself correctly here. I do not want to fiddle
with the HTML and PDF output, but change the XSLT or - in the case of
PDF - the XSL-FO generating the output. I still have no concept of
XSL-FO, i.e. how to set up various templates for cover and TOC pages,
multi-column pages etc. I had hoped for a handy GUI, but I can live
with some code tweaking. I'll finally take a closer look at LaTeX,
too.

Or maybe <head lang="fr">Préliminaire</head> and
<head lang="en">Introduction</head>.
Many DTDs do allow precisely this kind of thing,
specifically for this purpose (and more commonly, text
applicable to related but different product lines).

What DTD would you personally suggest for this (i.e. writing/maintaining
long technical manuals in various languages and product versions)?
So far, I keep separate documents for each language, but having to
apply structure changes several times is a PITA.

Thank you again.
 
Peter Flynn

David said:
Hello Peter,

thank you for your comments - highly appreciated!

Well, it seems I'll bite the bullet and finally learn Emacs. :/

:) It's a life skill. I can't count the number of times it's saved my neck
when other systems have failed to produce the goodies.

Sorry; I didn't express myself correctly here. I do not want to fiddle
with the HTML and PDF output, but change the XSLT or - in the case of
PDF - the XSL-FO generating the output. I still have no concept of
XSL-FO, i.e. how to set up various templates for cover and TOC pages,
multi-column pages etc. I had hoped for a handy GUI, but I can live
with some code tweaking. I'll finally take a closer look at LaTeX,
too.

There are several experiments ongoing at creating XSLT GUIs but none of
them do anything useful outside simple 1:1 transformations (eg <para> to
<p>).

Cover pages (unless purely typographic) are often done by a designer as
a separate job. I don't know how your organisation handles these.

The reason behind recommending LaTeX over FO is simply that LaTeX has
all the stuff for automation (eg ToC, multi-columns, etc) already
written. I hate reinventing wheels in a production job.

What DTD would you personally suggest for this (i.e. writing/maintaining
long technical manuals in various languages and product versions)?

Are they computer manuals or some other technology? For computer doc
I would always recommend DocBook as I've never found anything to beat it,
but if it's some other area, there may be industry-specific DTDs already
available (ask the relevant industrial consortiums and representative
bodies). Otherwise you can always write your own, but it's easier to
steal^H^H^H^H^Hplagia^H^H^H^H^H^Hborrow from another DTD where possible.

Get a copy of Eve Maler and Jeanne El Andaloussi's "Developing SGML DTDs:
From Text to Model to Markup" (ignore the "SGML" in the title: 99% of
everything in the book applies to XML as well). This is THE book on writing
DTDs, and it covers the non-technical side of consulting with users,
colleagues, etc, document modelling, document analysis, and all the
organisational aspects.

Doing it yourself is not hard, but needs foresight and hindsight as well
as inside knowledge of the document type.

So far, I keep separate documents for each language, but having to
apply structure changes several times is a PITA.

All multilingual work is a PITA to keep in synch unless you have a large-
scale production publishing workflow system. Actually you probably could
do something like it in Cocoon, but that would be a BIG task.

My gut feeling is to use separate documents, and have a CVS or RCS or other
document check-out/check-in system that will do something sensible with
the "this paragraph changed last time" attributes when a document is
checked out for editing (ie zap them), and then do some kind of diff on
the document when it's checked back in, and see if the diffs have all
been flagged with the relevant "updated" or "deleted" attribute, and
then enforce an interlock on publishing it until corresponding language
versions have been brought up to date. That would be a little tricky to
write, but it would help keep stuff in synch.
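
A very rough sketch of the two checks, assuming every paragraph carries a stable "id" attribute and a "status" attribute that is set to "updated" when its text changes. Both attributes are hypothetical, as are the sample documents; deleted paragraphs and the version-control hooks themselves are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical revisions of the English source and the current French file.
OLD_EN = ('<doc><para id="p1">Press the red button.</para>'
          '<para id="p2">Close the lid.</para></doc>')
NEW_EN = ('<doc><para id="p1" status="updated">Press the green button.</para>'
          '<para id="p2">Close the lid firmly.</para></doc>')
FR = ('<doc><para id="p1">Appuyez sur le bouton rouge.</para>'
      '<para id="p2">Fermez le couvercle.</para></doc>')


def paras_by_id(xml_text):
    """Map each <para> id to its element."""
    return {p.get("id"): p for p in ET.fromstring(xml_text).iter("para")}


def unflagged_changes(old_xml, new_xml):
    """Check-in test: ids whose text changed without status="updated"."""
    old, new = paras_by_id(old_xml), paras_by_id(new_xml)
    problems = []
    for pid, para in new.items():
        before = old.get(pid)
        changed = before is None or before.text != para.text
        if changed and para.get("status") != "updated":
            problems.append(pid)
    return problems


def stale_translations(source_xml, target_xml):
    """Publishing test: ids updated in the source but not yet in the target."""
    source, target = paras_by_id(source_xml), paras_by_id(target_xml)
    return [pid for pid, para in source.items()
            if para.get("status") == "updated"
            and (pid not in target or target[pid].get("status") != "updated")]


unflagged = unflagged_changes(OLD_EN, NEW_EN)
stale = stale_translations(NEW_EN, FR)
can_publish = not unflagged and not stale
print(unflagged, stale, can_publish)
```

Here the check-in test complains about p2 (edited but not flagged) and the publishing test holds back the French version because p1 has not been brought up to date.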

///Peter
 
David Winter

Peter,

once again thank you for your advice. The ideas on a multi-lingual
workflow sound interesting, but since I am a freelancer, I will have to
come up with some kind of home-cooked, affordable solution or wait for
an Open Source project (right now, everyone and their grandmother seems
to focus on building yet another generic CMS/Blog tool).

BTW, AuthorIT (http://www.authorit.com/) does what I have in mind (and
more), but at least the Localization Manager is out of my price range.
I guess I'll go with DocBook and use the opportunity to learn
something. :)
 
