An often asked question - document consistency

Discussion in 'XML' started by Geico Caveman, Jul 26, 2007.

  1. Hello,

    I am a long-time user of LaTeX on the Linux platform. I have occasionally
    used OpenOffice.org Writer and Microsoft Word 2003 (using Crossover Linux)
    to satisfy a few people who insist on putting habit over quality.

    However, now I am faced with a situation that is probably familiar to some
    of you. I have a document that needs to be available as PDF, as LaTeX
    source code (not least for myself), and unfortunately, as a DOC file at
    the same time. The first two are easy to arrange, and I have been using
    pdflatex for years to produce high-quality PDFs. The last is the problem.
    For reasons that are obvious, and need not be discussed, doc is kind of a
    stand alone format, refusing to play nice with anything else.

    I have been looking at XML as a possible way out of this mess. Is it
    possible for me to convert LaTeX to XML (TeXML claims to do this), and then
    have Microsoft Word 2003 read this somehow? I fear that, true to form,
    Microsoft Office 2003 XML might be inconsistent in some fashion with the
    output of that process (it would be too standard otherwise for Word).

    The other option seems to be to use mk4ht/oolatex to convert the document
    to odt and then save it as doc using OpenOffice.org. I do not like that
    approach because, as I know from personal experience, OpenOffice.org's doc
    export is not perfect and becomes increasingly deficient for more
    complicated documents. It's a miracle that the doc export works to the
    extent it does, but it's not acceptable for my documents, which are often
    very complicated. Export to Microsoft Office 2003 XML has its own problems:
    Word 2003 sometimes fails to read the documents generated.

    Any suggestions (short of asking me to maintain two versions manually, one
    in LaTeX, and the other in Word) would be very welcome.

    Thanks.
     
    Geico Caveman, Jul 26, 2007
    #1

  2. Correct: "doc is kind of a stand alone format, refusing to play nice
    with anything else" -- this is the root of the problem.
    Most likely this is the case. Just because MS-Word 2003 uses XML does
    not mean very much.
    There really is no *good* (read: perfect and totally automated) way of
    doing what you want. There are many reasons for this (and you seem to
    be aware of all/most of them).
     
    Robert Heller, Jul 26, 2007
    #2

  3. Hello,

    In such cases, I use an OpenOffice.org (or Word) document as the source
    document. I format it strictly with styles, without any manual formatting,
    so that:

    * I can save the document as raw XML, then
    * an XSLT program converts the raw XML to XML with a logical
    structure, and
    * TeXML plus Consodoc produce LaTeX and PDF.
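
    The middle step of the list above (flat, style-tagged paragraphs turned
    into a logical structure) can be sketched roughly as follows. This is only
    an illustration in Python rather than XSLT, and it is not the poster's
    actual stylesheet. The input format, the style names "Heading 1" and
    "Text Body", and the output element names article/section/title/para are
    all assumptions made for the example; a real OpenOffice.org or Word export
    is considerably more verbose.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a word processor's "raw" XML export:
# every paragraph is a <p> whose style attribute names its paragraph style.
RAW = """<doc>
  <p style="Heading 1">Introduction</p>
  <p style="Text Body">First paragraph.</p>
  <p style="Heading 1">Methods</p>
  <p style="Text Body">Second paragraph.</p>
</doc>"""


def to_logical(raw_xml):
    """Group flat styled paragraphs into <section> elements with a <title>."""
    article = ET.Element("article")
    section = None
    for p in ET.fromstring(raw_xml).iter("p"):
        if p.get("style") == "Heading 1":
            # A heading paragraph opens a new logical section.
            section = ET.SubElement(article, "section")
            ET.SubElement(section, "title").text = p.text
        else:
            # Paragraphs before the first heading attach to the article itself.
            parent = section if section is not None else article
            ET.SubElement(parent, "para").text = p.text
    return article


logical = to_logical(RAW)
print(ET.tostring(logical, encoding="unicode"))
```

    The grouping only works because every paragraph carries a meaningful
    style name, which is why manual formatting has to be avoided in the
    source document.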
     
    Oleg Paraschenko, Jul 27, 2007
    #3
  4. In reply to Geico Caveman:

    A common position.
    Is this a document (or are these documents) you author yourself, or are
    they written by someone else over whose software you have no control?
    The .doc format is valueless, as you clearly understand. It is also now
    obsolescent.
    Correct choice. Unfortunately the XML created by Word tends to be just
    as valueless as .doc, as all it does is provide an XML-readable
    expression of the visual appearance, unless you are rigorously using a
    very carefully-designed stylesheet.
    Yes, but with some difficulty, and not a lot of reliability unless the
    document structure and markup are very simple.
    Not at all easily.
    Word is capable of reading a non-WordML XML document but it requires a
    Schema and some massaging. Not a path I would choose to tread.
    That about sums it up. Other wordprocessor conversions to Word format
    have similar restrictions.
    Ah. Can you give a minimal example?
    The canonical solution to this is to author the documents in a suitable
    document type in XML to start with, and then convert them to your output
    targets using XSLT. Store the master in XML, generate what you need. In
    general, I avoid using anything other than XML for the master version of
    a document.

    Transformation using XSLT to LaTeX is similar to transformation to HTML,
    and not difficult to achieve. Making the generated LaTeX source look
    pretty is more difficult (if that is a requirement), but if all you need
    is LaTeX code that compiles without error, it's straightforward, modulo
    the complexity of your document structure.
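
    To give a feel for how little is needed when the goal is only LaTeX that
    compiles, here is a minimal sketch of the same idea in Python instead of
    XSLT. The master document type (article/section/title/para) is invented
    for the example, and the escaping table covers only the common LaTeX
    special characters.

```python
import xml.etree.ElementTree as ET

# Characters that LaTeX treats specially, with replacements that print them.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape character by character so replacements are never re-escaped."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in (text or ""))


def to_latex(xml_source):
    """Turn the (hypothetical) article/section/title/para master into LaTeX."""
    root = ET.fromstring(xml_source)
    lines = [r"\documentclass{article}", r"\begin{document}"]
    for section in root.findall("section"):
        lines.append(r"\section{%s}" % latex_escape(section.findtext("title")))
        for para in section.findall("para"):
            lines.append("")  # a blank line separates LaTeX paragraphs
            lines.append(latex_escape(para.text))
    lines.append(r"\end{document}")
    return "\n".join(lines)


# Hypothetical master document used only for illustration.
MASTER = """<article>
  <section>
    <title>Costs &amp; benefits</title>
    <para>Savings were 50% in year_1.</para>
  </section>
</article>"""

latex = to_latex(MASTER)
print(latex)
```

    An XSLT stylesheet would express the same mapping as one template per
    element; the escaping of special characters is the part that usually
    needs the most care in either language.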

    Transformation to WordML and similar is also possible but much more
    complex, because there is a vast amount of redundancy to include, and
    there are multiple ways of achieving the same result. As another writer
    has asked, do these Word documents need to be editable? Without knowing
    the level of complexity, it's hard to be more specific.

    A shortcut which may work is to transform the document to very carefully
    constructed XHTML with an embedded <style> header element, and then
    rename the output file to end in .doc. Word is undiscriminating, and
    seems to open such files in native (.doc) mode without complaint, but I
    have only used this method for relatively simple documents.
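
    A minimal sketch of that shortcut in Python, assuming plain-text
    paragraphs as input. The function name xhtml_as_doc and the CSS rules are
    made up for the example, and whether a particular version of Word opens
    the result cleanly is not something the code can guarantee.

```python
from html import escape
from pathlib import Path

# XHTML skeleton with an embedded <style> element in the header.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>%s</title>
<style type="text/css">
h1 { font-family: Arial; font-size: 16pt; }
p { font-family: "Times New Roman"; font-size: 12pt; }
</style>
</head>
<body>
<h1>%s</h1>
%s
</body>
</html>
"""


def xhtml_as_doc(title, paragraphs, path):
    """Write simple XHTML to `path`, which is expected to end in .doc."""
    # Escape &, < and > so the text cannot break the markup.
    body = "\n".join("<p>%s</p>" % escape(p) for p in paragraphs)
    page = TEMPLATE % (escape(title), escape(title), body)
    Path(path).write_text(page, encoding="utf-8")
    return page
```

    Calling xhtml_as_doc("Report", ["First paragraph."], "report.doc") would
    write a file that is XHTML in content but carries the .doc extension.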

    ///Peter
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Jul 30, 2007
    #4