XML for multilingual technical documentation - seven questions

Discussion in 'XML' started by David Winter, Jan 8, 2005.

  1. David Winter

    David Winter Guest

    As a technical author and translator, I am highly interested in single
    source/multi format publishing. Meaning: I'd like to keep manuals,
    technical specifications etc. in multiple languages (English, French)
    in a *single* repository (<- files or database) and generate documents
    in the various languages and target formats (XHTML, PDF, HTML Help,
    Text) on demand.

    I am not a programmer, though, and can't develop my own tools, but of
    course I am willing to invest money and spend time learning.

    I understand that I could use an existing XML Schema such as DocBook or
    cook my own and then use XSLT to generate the various output formats.

    Since I'm not keen on reinventing the wheel, I'd like to ask you what
    would be a good (proven) way to achieve the following. I am looking for
    a set of tools and technologies that will work together reliably, and I
    assume others have solved these problems before. I'd be grateful if
    someone could answer a few of the following questions.


    1) Authoring tool?

    I guess using a native XML editor from the start would be a better
    approach than exporting from some proprietary format such as
    FrameMaker. I have considered <oxygen/> and the XML Mind Editor. Are
    these good editors for daily work on big, complex documents? What other
    products would you recommend for a user fluent with plain text editors,
    Frame and Dreamweaver? (A *cough* WYSIWYG environment (using some CSS)
    would be appreciated.)


    2) Appropriate XML Schema/DTD: DocBook or ..?

    DocBook is impressive, but - forgive my blasphemy - seems a bit baroque
    while missing pieces I would need for certain clients/technologies. Now
    this may seem a bit megalomaniac, but if I wanted to build my own XML
    Schema - what tools should I use? The Altova product suite seems
    professional, but maybe overkill for a freelancer. What would you
    suggest?


    3) XSLT

    I understand the XSLT processor does most of the magic that turns XML
    into target formats. Assuming you'd want XHTML, pretty PDFs and HTML
    Help - what would be my weapon of choice as a non-programmer? I'd like
    to be able to modify PDF and HTML output, so a "blackbox" app is out of
    the question.


    4) Multilingual documents

    To prevent version drift, I would like to keep the text for all
    languages in the same file. I.e. the (imaginary) <head1> tag should
    hold both the English "Introduction" and the French "Préliminaire".
    What's the best approach to achieve this? I can hardly have two <head1
    lang="FOO"> tags when my DTD/Schema allows only one. Namespaces?


    5) Index/TOC/Document outline

    A (multi-level) Index, Table of Contents and maybe a (collapsible)
    outline view of a document - does XSLT take care of these? Are there
    e.g. sample XSLT stylesheets that can generate a hyperlinked outline of
    an XML document in HTML?


    6) Conditional Text

    What I mean here is text that can be filtered out when generating
    target formats. Assuming I want to do something like "Only generate the
    digest version of the manual" - does DocBook allow me to tag sections
    as "Only for Digest Version"? What would be the generic approach to do
    this in XML, and how can I combine them on rendering ("Only for PDF"
    AND "Digest")?


    7) CAT translation

    Integration of Translation Memory Tools: Is there an easy way to feed
    XML (e.g. DocBook) documents into CAT tools? Ideally, this would accept
    <para lang="EN">Source</para> and generate <para
    lang="FR">Target</para> from a TU database.


    Thank you for helping.
     
    David Winter, Jan 8, 2005
    #1

  2. Peter Flynn

    Peter Flynn Guest

    David Winter wrote:

    >
    > As a technical author and translator, I am highly interested in single
    > source/multi format publishing. Meaning: I'd like to keep manuals,
    > technical specifications etc. in multiple languages (English, French)
    > in a *single* repository (<- files or database) and generate documents
    > in the various languages and target formats (XHTML, PDF, HTML Help,
    > Text) on demand.


    Yep. Common requirement.

    > I am not a programmer, though, and can't develop my own tools, but of
    > course I am willing to invest money and spend time learning.
    >
    > I understand that I could use an existing XML Schema such as DocBook or
    > cook my own and then use XSLT to generate the various output formats.


    DocBook is excellent for computer documentation. It may be overkill for
    technical documentation in other fields (eg maintenance manuals for
    washing machines) or may simply not provide what is needed in those
    fields. It's a popular misconception that DocBook is for *any* technical
    documentation, computing or not. And yes, XSLT can be used to transform
    your XML.

    > Since I'm not keen on reinventing the wheel, I'd like to ask you what
    > would be a good (proven) way to achieve the following. I am looking for
    > a set of tools and technologies that will work together reliably, and I
    > assume others have solved these problems before. I'd be grateful if
    > someone could answer a few of the following questions.
    >
    >
    > 1) Authoring tool?
    >
    > I guess using a native XML editor from the start would be a better


    Essential.

    > approach than exporting from some proprietary format such as
    > FrameMaker. I have considered <oxygen/> and the XML Mind Editor. Are
    > these good editors for daily work on big, complex documents? What other
    > products would you recommend for a user fluent with plain text editors,
    > Frame and Dreamweaver? (A *cough* WYSIWYG environment (using some CSS)
    > would be appreciated.)


    Don't be fooled by WYSIWYG. Unless it provides *all* your formatting needs,
    it may be more of a hindrance than a help. An editor sold on the spurious
    basis that it can use fonts and colour does not IMHO qualify as WYSIWYG.

    Plaintext: Emacs with psgmls and nsgmls is free and runs on all platforms.

    High-end: XML Spy and EPIC are excellent but to do *all* your formatting
    you will almost certainly need to start programming them internally.

    > 2) Appropriate XML Schema/DTD: DocBook or ..?
    >
    > DocBook is impressive, but - forgive my blasphemy - seems a bit baroque


    Quod scripsi scripsi (ut supra).

    > while missing pieces I would need for certain clients/technologies. Now
    > this may seem a bit megalomaniac, but if I wanted to build my own XML
    > Schema - what tools should I use? The Altova product suite seems
    > professional, but maybe overkill for a freelancer. What would you
    > suggest?


    I write DTDs in Emacs with tdtd-mode, and I'll let you into a secret:
    most of the other DTD and Schema writers I know do the same -- eventually.
    Graphical structure-design programs are excellent to get the thing up
    and running in outline, though.

    > 3) XSLT
    >
    > I understand the XSLT processor does most of the magic that turns XML
    > into target formats. Assuming you'd want XHTML, pretty PDFs and HTML
    > Help - what would be my weapon of choice as a non-programmer? I'd like
    > to be able to modify PDF and HTML output, so a "blackbox" app is out of
    > the question.


    Don't even think of trying to modify PDF. It's an end-of-line format and
    is not designed to be modified, just recreated afresh. In fact, don't try
    to modify the HTML either. Always fix the problem in the XSLT (or the XML,
    depending on what the problem is) and then recreate the output.

    XSL-FO will create PDF direct, but at the expense of having to reinvent all
    the formatting wheels -- by hand. I prefer to use XSLT to create LaTeX, and
    rely on LaTeX because it already knows more about document formatting than
    anything else. But it does mean learning some LaTeX (not hard, just
    different).
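
    A minimal sketch of the XML-to-LaTeX route described above. It is written
    in Python with the standard library rather than XSLT so that it runs as
    is; the element names (manual, section, title, para) and the function
    names are made up for illustration and belong to no particular DTD. The
    point is only that the converter emits structure and leaves the table of
    contents and layout to LaTeX.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SOURCE = """\
<manual>
  <section>
    <title>Costs &amp; fees</title>
    <para>Save 50% on item_7.</para>
  </section>
</manual>
"""

# Characters that have a special meaning in LaTeX and must be escaped.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def to_latex(root):
    """Turn top-level <section> elements into a small LaTeX article."""
    lines = [r"\documentclass{article}", r"\begin{document}", r"\tableofcontents"]
    for section in root.findall("section"):
        title = section.findtext("title", default="")
        lines.append(r"\section{" + tex_escape(title) + "}")
        for para in section.findall("para"):
            lines.append(tex_escape(para.text or ""))
            lines.append("")  # a blank line ends a LaTeX paragraph
    lines.append(r"\end{document}")
    return "\n".join(lines)


latex = to_latex(ET.fromstring(SOURCE))
print(latex)
```

    Running LaTeX over the result (twice, so the ToC is filled in) gives the
    PDF; nothing in the generated file is meant to be edited by hand.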

    > 4) Multilingual documents
    >
    > To prevent version drift, I would like to keep the text for all
    > languages in the same file. I.e. the (imaginary) <head1> tag should
    > hold both the English "Introduction" and the French "Préliminaire".
    > What's the best approach to achieve this? I can hardly have two <head1
    > lang="FOO"> tags when my DTD/Schema allows only one. Namespaces?


    Possibly. Or maybe <head lang="fr">Préliminaire</head> and
    <head lang="en">Introduction</head>. These are a form of "effectivities"
    (ie they come into effect only when picked up by your XSLT when you
    specify "use lang='fr' this time"). Many DTDs do allow precisely this
    kind of thing, specifically for this purpose (and more commonly, text
    applicable to related but different product lines).
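
    A minimal sketch of the lang-attribute selection described above, in
    Python with the standard library rather than XSLT so that it runs as is.
    The sample document and the select_language function are hypothetical;
    elements that carry no lang attribute are assumed to be language-neutral
    and are kept in every output.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SOURCE = """\
<manual>
  <section>
    <head lang="en">Introduction</head>
    <head lang="fr">Préliminaire</head>
    <para lang="en">Read this first.</para>
    <para lang="fr">À lire en premier.</para>
    <para>Model XR-200</para>
  </section>
</manual>
"""


def select_language(root, lang):
    """Return a copy of the tree without elements tagged for another language.

    Elements that have no lang attribute are treated as language-neutral.
    """
    result = copy.deepcopy(root)
    # Snapshot the element list first so removals do not disturb iteration.
    for parent in list(result.iter()):
        for child in list(parent):
            child_lang = child.get("lang")
            if child_lang is not None and child_lang != lang:
                parent.remove(child)
    return result


french = select_language(ET.fromstring(SOURCE), "fr")
print(ET.tostring(french, encoding="unicode"))
```

    The schema then has to allow one head per language instead of exactly
    one head, which is the change to the content model the question is
    worried about.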

    The alternative is to use a translating editor, if you can find one. There
    was a superb one put out by CITEC years ago, for SGML, which displayed your
    source language in the top window, and in the bottom window it put the
    exact same elements, only empty, ready to fill in the target language
    (subelements in mixed content were omitted, of course, as they would
    likely occur in different sequences in a target language). But this has
    long since disappeared, alas, and I've never seen a replacement.

    > 5) Index/TOC/Document outline
    >
    > A (multi-level) Index, Table of Contents and maybe a (collapsible)
    > outline view of a document - does XSLT take care of these? Are there
    > e.g. sample XSLT stylesheets that can generate a hyperlinked outline of
    > an XML document in HTML?


    You can program these in XSLT very easily. There are indeed sample XSLT
    stylesheets for (eg) DocBook doing exactly this.
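
    To show how little is involved, here is a minimal sketch of a hyperlinked
    outline built by recursing over nested sections. It is Python rather than
    XSLT so that it runs with the standard library alone, and it is not taken
    from the DocBook stylesheets; the element names, the id attributes and
    build_toc are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample document with nested sections identified by id.
SOURCE = """\
<manual>
  <section id="intro"><title>Introduction</title>
    <section id="scope"><title>Scope</title></section>
  </section>
  <section id="install"><title>Installation</title></section>
</manual>
"""


def build_toc(parent):
    """Return nested <ul> markup linking to each child <section> by its id."""
    sections = parent.findall("section")
    if not sections:
        return ""
    items = []
    for sec in sections:
        title = escape(sec.findtext("title", default=""))
        target = escape(sec.get("id", ""))
        link = f'<a href="#{target}">{title}</a>'
        # Recurse so that subsections become a nested list inside this item.
        items.append(f"<li>{link}{build_toc(sec)}</li>")
    return "<ul>" + "".join(items) + "</ul>"


toc = build_toc(ET.fromstring(SOURCE))
print(toc)
```

    A collapsible outline is the same nested list with some CSS or script
    applied to it in the HTML output.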

    > 6) Conditional Text
    >
    > What I mean here is text that can be filtered out when generating
    > target formats. Assuming I want to do something like "Only generate the
    > digest version of the manual" - does DocBook allow me to tag sections
    > as "Only for Digest Version"? What would be the generic approach to do
    > this in XML, and how can I combine them on rendering ("Only for PDF"
    > AND "Digest")?


    These are effectivities as above. DocBook has attributes to identify
    conditionality and many other metadata features. So do many other DTDs.

    Combining them would be something you do in the XSLT.
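
    A minimal sketch of combining two conditions ("Digest" AND "PDF") at
    rendering time, in Python with the standard library rather than XSLT. The
    attribute names edition and output, the space-separated value lists and
    both functions are assumptions for this example and are not DocBook's own
    attribute names. An element survives only if every condition attribute it
    carries allows the requested value; an element with no condition
    attributes always survives.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample document; "edition" and "output" are made-up attributes.
SOURCE = """\
<manual>
  <para>Always shown.</para>
  <para edition="digest">Short summary.</para>
  <para edition="full">Long explanation.</para>
  <para edition="digest full" output="pdf">Print-only note.</para>
  <para output="html">Click the link.</para>
</manual>
"""


def matches(element, profile):
    """True if each condition attribute on the element allows the profile value."""
    for attr, wanted in profile.items():
        allowed = element.get(attr)
        if allowed is not None and wanted not in allowed.split():
            return False
    return True


def apply_profile(root, profile):
    """Return a copy of the tree with non-matching elements removed."""
    result = copy.deepcopy(root)
    for parent in list(result.iter()):
        for child in list(parent):
            if not matches(child, profile):
                parent.remove(child)
    return result


digest_pdf = apply_profile(
    ET.fromstring(SOURCE), {"edition": "digest", "output": "pdf"}
)
print([p.text for p in digest_pdf.iter("para")])
```

    Filtering first and formatting second keeps the two jobs apart: the same
    filtered tree can be fed to whichever stylesheet produces the PDF.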

    > 7) CAT translation
    >
    > Integration of Translation Memory Tools: Is there an easy way to feed
    > XML (e.g. DocBook) documents into CAT tools? Ideally, this would accept
    > <para lang="EN">Source</para> and generate <para
    > lang="FR">Target</para> from a TU database.


    I don't know what tools exist in this area. The localisation business was
    very slow to take up XML, but it is gathering speed now. The nexus of
    knowledge in this area is probably Dublin, which has a huge localisation
    industry.
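
    For what it is worth, the mechanism asked about (source paragraph in,
    target paragraph out of a TU database) can be sketched in a few lines.
    This is only an illustration in Python with the standard library, not how
    any particular CAT tool works: the translation memory is a plain
    dictionary of exact matches, pretranslate is a made-up name, and
    paragraphs with inline subelements (mixed content) are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation memory: exact source segment -> target segment.
MEMORY = {"Press the red button.": "Appuyez sur le bouton rouge."}

SOURCE = """\
<manual>
  <para lang="en">Press the red button.</para>
  <para lang="en">Wait ten seconds.</para>
</manual>
"""


def pretranslate(root, memory, source_lang="en", target_lang="fr"):
    """Insert a target-language <para> after each source <para> found in memory.

    Returns the source segments that had no exact match, in document order.
    """
    untranslated = []
    # Snapshot the element list so inserted elements are not visited again.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "para" or child.get("lang") != source_lang:
                continue
            text = (child.text or "").strip()
            if text not in memory:
                untranslated.append(text)
                continue
            target = ET.Element("para", {"lang": target_lang})
            target.text = memory[text]
            target.tail = child.tail  # keep the surrounding whitespace layout
            parent.insert(list(parent).index(child) + 1, target)
    return untranslated


doc = ET.fromstring(SOURCE)
todo = pretranslate(doc, MEMORY)
print(ET.tostring(doc, encoding="unicode"))
print("Still to translate:", todo)
```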

    ///Peter
    --
    "The cat in the box is both a wave and a particle"
    -- Terry Pratchett, introducing quantum physics in _The Authentic Cat_
     
    Peter Flynn, Jan 8, 2005
    #2

  3. David Winter

    David Winter Guest

    Hello Peter,

    thank you for your comments - highly appreciated!

    Well, it seems I'll bite the bullet and finally learn Emacs. :/


    > Don't even think of trying to modify PDF.


    Sorry; I didn't express myself correctly here. I do not want to fiddle
    with the HTML and PDF output, but change the XSLT or - in the case of
    PDF - the XSL-FO generating the output. I still have no concept of
    XSL-FO, i.e. how to set up various templates for cover and TOC pages,
    multi-column pages etc. I had hoped for a handy GUI, but I can live
    with some code tweaking. I'll finally take a closer look at LaTeX,
    too.


    > Or maybe <head lang="fr">Préliminaire</head> and
    > <head lang="en">Introduction</head>.
    > Many DTDs do allow precisely this kind of thing,
    > specifically for this purpose (and more commonly, text
    > applicable to related but different product lines).


    What (DTD) would you personally suggest for this (= Writing/maintaining
    long technical manuals (various languages, various product versions))?
    So far, I keep separate documents for each language, but having to
    apply structure changes several times is a PITA.

    Thank you again.
     
    David Winter, Jan 9, 2005
    #3
  4. Peter Flynn

    Peter Flynn Guest

    David Winter wrote:

    > Hello Peter,
    >
    > thank you for your comments - highly appreciated!
    >
    > Well, it seems I'll bite the bullet and finally learn Emacs. :/


    :) It's a life skill. I can't count the number of times it's saved my neck
    when other systems have failed to produce the goodies.

    >> Don't even think of trying to modify PDF.

    >
    > Sorry; I didn't express myself correctly here. I do not want to fiddle
    > with the HTML and PDF output, but change the XSLT or - in the case of
    > PDF - the XSL-FO generating the output. I still have no concept of
    > XSL-FO, i.e. how to set up various templates for cover and TOC pages,
    > multi-column pages etc. I had hoped for a handy GUI, but I can live
    > with some code tweaking. I'll finally take a closer look at LaTeX,
    > too.


    There are several experiments ongoing at creating XSLT GUIs but none of
    them do anything useful outside simple 1:1 transformations (eg <para> to
    <p>).

    Cover pages (unless purely typographic) are often done by a designer as
    a separate job. I don't know how your organisation handles these.

    The reason behind recommending LaTeX over FO is simply that LaTeX has
    all the stuff for automation (eg ToC, multi-columns, etc) already
    written. I hate reinventing wheels in a production job.

    > What (DTD) would you personally suggest for this (= Writing/maintaining
    > long technical manuals (various languages, various product versions))?


    Are they computer manuals or some other technology? For computer doc
    I would always recommend DocBook as I've never found anything to beat it,
    but if it's some other area, there may be industry-specific DTDs already
    available (ask the relevant industrial consortiums and representative
    bodies). Otherwise you can always write your own, but it's easier to
    steal^H^H^H^H^Hplagia^H^H^H^H^H^Hborrow from another DTD where possible.

    Get a copy of Eve Maler and Jeanne El Andaloussi's "Developing SGML DTDs:
    From Text to Model to Markup" (ignore the "SGML" in the title: 99% of
    everything in the book applies to XML as well). This is THE book on
    writing DTDs, and it covers the non-technical side of consulting with
    users, colleagues, etc, document modelling, document analysis, and all
    the organisational aspects.

    Doing it yourself is not hard, but needs foresight and hindsight as well
    as inside knowledge of the document type.

    > So far, I keep separate documents for each language, but having to
    > apply structure changes several times is a PITA.


    All multilingual work is a PITA to keep in synch unless you have a large-
    scale production publishing workflow system. Actually you probably could
    do something like it in Cocoon, but that would be a BIG task.

    My gut feeling is to use separate documents, and have a CVS or RCS or other
    document check-out/check-in system that will do something sensible with
    the "this paragraph changed last time" attributes when a document is
    checked out for editing (ie zap them), and then do some kind of diff on
    the document when it's checked back in, and see if the diffs have all
    been flagged with the relevant "updated" or "deleted" attribute, and
    then enforce an interlock on publishing it until corresponding language
    versions have been brought up to date. That would be a little tricky to
    write, but it would help keep stuff in synch.
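
    A much-simplified sketch of the publishing interlock, in Python with the
    standard library. It does not implement the "updated"/"deleted" attribute
    bookkeeping described above; it only compares the structure of two
    language versions and refuses to publish when they differ. The id
    attributes, the sample documents and the function names are assumptions
    made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical English and French versions; the French one lags behind.
ENGLISH = """\
<manual>
  <section id="intro"><para id="p1">Hello.</para></section>
  <section id="setup"><para id="p2">Plug it in.</para></section>
</manual>
"""
FRENCH = """\
<manual>
  <section id="intro"><para id="p1">Bonjour.</para></section>
</manual>
"""


def skeleton(root):
    """List (depth, tag, id) for every element in document order, ignoring text."""
    out = []

    def walk(element, depth):
        out.append((depth, element.tag, element.get("id")))
        for child in element:
            walk(child, depth + 1)

    walk(root, 0)
    return out


def missing_ids(source_root, target_root):
    """Return (ids only in source, ids only in target), each sorted."""
    src = {el.get("id") for el in source_root.iter() if el.get("id")}
    tgt = {el.get("id") for el in target_root.iter() if el.get("id")}
    return sorted(src - tgt), sorted(tgt - src)


def can_publish(source_root, target_root):
    """Allow publishing only when both versions have the same structure."""
    return skeleton(source_root) == skeleton(target_root)


en, fr = ET.fromstring(ENGLISH), ET.fromstring(FRENCH)
print("OK to publish:", can_publish(en, fr))
print("Missing in French / extra in French:", missing_ids(en, fr))
```

    Hooked into a check-in script, a check like this catches structure
    changes that were applied to one language only; it cannot tell whether
    the text inside a matching paragraph is still an up-to-date translation.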

    ///Peter
    --
    "The cat in the box is both a wave and a particle"
    -- Terry Pratchett, introducing quantum physics in _The Authentic Cat_
     
    Peter Flynn, Jan 9, 2005
    #4
  5. David Winter

    David Winter Guest

    Peter,

    once again thank you for your advice. The ideas on a multi-lingual
    workflow sound interesting, but since I am a freelancer, I will have to
    come up with some kind of home-cooked, affordable solution or wait for
    an Open Source project (right now, everyone and their grandmother seems
    to focus on building yet another generic CMS/Blog tool).

    BTW, AuthorIT (http://www.authorit.com/) does what I have in mind (and
    more), but at least the Localization Manager is out of my price range.
    I guess I'll go with DocBook and use the opportunity to learn
    something. :)
     
    David Winter, Jan 10, 2005
    #5
