Convert Word .doc to Acrobat .pdf files

Discussion in 'Python' started by kbperry, Mar 23, 2006.

  1. kbperry

    kbperry Guest

    Hi all,

    Background:
    I need some help. I am trying to streamline a process for one of our
    technical writers. He uses Perforce (a version control system) and is
    constantly changing his Word documents, which he then converts to
    both .pdf and "Web page" format (to publish to the web). He has a
    licensed copy of Adobe Acrobat Professional (7.x).

    Questions:
    Does Acrobat Pro have some way to interface with it from the command
    line (I tried searching, but couldn't find anything)? Is there any
    other good way to script Word-to-PDF conversion?

    Note: The word documents do contain images, and lots of stuff besides
    just text.
    kbperry, Mar 23, 2006
    #1

  2. I wrote a script which uses OpenOffice. It can
    convert and read a lot of formats.

    #!/usr/bin/env python
    #Old: !/optlocal/OpenOffice.org/program/python
    # (c) 2003-2006 Thomas Guettler http://www.tbz-pariv.de/

    # OpenOffice1.1 comes with its own python interpreter.
    # This script needs to be run with the python from OpenOffice1:
    #   /opt/OpenOffice.org/program/python
    # Start the Office before connecting:
    #   soffice "-accept=socket,host=localhost,port=2002;urp;"
    #
    # With OpenOffice2 you can use the default Python interpreter (at least on SuSE)
    #

    # Python Imports
    import os
    import re
    import sys
    import getopt

    default_path="/usr/lib/ooo-2.0/program"
    sys.path.insert(0, default_path)

    # pyUNO Imports
    try:
        import uno
        from com.sun.star.beans import PropertyValue
    except ImportError:
        print("This script needs to be run with the python from OpenOffice.org")
        print("Example: /opt/OpenOffice.org/program/python %s" % (
            os.path.basename(sys.argv[0])))
        print("Or you need to insert the right path at the top, where uno.py is.")
        print("Default: %s" % default_path)
        # Re-raise so the original import error stays visible
        raise



    extension=None
    format=None

    def usage():
        scriptname=os.path.basename(sys.argv[0])
        print("""Usage: %s [--extension pdf --format writer_pdf_Export] files
    By default, all files or directories will be converted to HTML.

    You must start the office with this line before starting
    this script:
    soffice "-accept=socket,host=localhost,port=2002;urp;"

    If you want to export to something else, you need to give the extension *and*
    the format.

    For a list of possible export formats see
    http://framework.openoffice.org/files/documents/25/897/filter_description.html

    or

    /opt/OpenOffice.org/share/registry/data/org/openoffice/Office/TypeDetection.xcu

    or

    grep -ri MYEXTENSION /usr/lib/ooo-2.0/share/registry/modules/org/openoffice/TypeDetection/
    the format is <node oor:name="FORMAT" ...

    Attention: Calc (.xls) needs a different export format than Writer (.doc)
    Example: calc_pdf_Export instead of writer_pdf_Export
    """ % scriptname)

    def do_dir(dir, desktop):
        # Convert a single file, or walk a directory recursively
        dir=os.path.abspath(dir)
        if os.path.isfile(dir):
            files=[dir]
        else:
            files=os.listdir(dir)
            files.sort()
        for file in files:
            if file.startswith("."):
                continue
            file=os.path.join(dir, file)
            if os.path.isdir(file):
                do_dir(file, desktop)
            else:
                do_file(file, desktop)

    def do_file(file, desktop):
        file_l=file.lower()

        global format
        if extension=="html":
            # Pick the HTML filter that matches the input type
            if file_l.endswith(".xls"):
                format="HTML (StarCalc)"
            elif file_l.endswith(".doc"):
                format="HTML (StarWriter)"
            else:
                print("%s: unknown extension" % file)
                return

        assert(format)
        assert(extension)

        file_save="%s.%s" % (file, extension)
        # Load File (hidden, so no window pops up)
        properties=[]
        p=PropertyValue()
        p.Name="Hidden"
        p.Value=True
        properties.append(p)
        doc=desktop.loadComponentFromURL(
            "file://%s" % file, "_blank", 0, tuple(properties))
        if not doc:
            print("Failed to open '%s'" % file)
            return
        # Save File
        properties=[]
        p=PropertyValue()
        p.Name="Overwrite"
        p.Value=True
        properties.append(p)
        p=PropertyValue()
        p.Name="FilterName"
        p.Value=format
        properties.append(p)
        try:
            doc.storeToURL(
                "file://%s" % file_save, tuple(properties))
            print("Created %s" % file_save)
        except ValueError:
            import traceback
            print("Failed while writing: '%s'" % file_save)
            print(traceback.format_exc())
        doc.dispose()

    def init_openoffice():
        # Init: Connect to running soffice process
        context = uno.getComponentContext()
        resolver=context.ServiceManager.createInstanceWithContext(
            "com.sun.star.bridge.UnoUrlResolver", context)
        try:
            ctx = resolver.resolve(
                "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
        except:
            print("Could not connect to running openoffice.")
            usage()
            sys.exit(1)
        smgr=ctx.ServiceManager
        desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop",ctx)
        return desktop

    def main():
        try:
            opts, args = getopt.getopt(sys.argv[1:], "", [
                "extension=", "format="])
        except getopt.GetoptError as e:
            print(e)
            usage()
            sys.exit(1)

        global extension
        global format
        for o, a in opts:
            if o=="--extension":
                extension=a
                assert(not extension.startswith("."))
            elif o=="--format":
                format=a
            else:
                raise Exception("Internal Error, undone option: %s %s" % (
                    o, a))
        # Either give both options, or neither (defaults to HTML)
        if (not extension) and (not format):
            extension="html"
        elif extension and format:
            pass
        else:
            print("You need to set format and extension.")
            usage()
            sys.exit(1)

        if not args:
            usage()
            sys.exit(1)

        desktop=init_openoffice()
        for file in args:
            do_dir(file, desktop)

    if __name__=="__main__":
        main()
    Thomas Guettler, Mar 23, 2006
    #2
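
A note on the script above: it builds URLs with "file://%s" % file, which breaks once a path contains spaces or other characters that need escaping, and its usage text points out that spreadsheets need a different export filter than text documents. The sketch below shows one possible way to handle both, assuming Python 3. The helper names to_file_url and pick_filter are made up for illustration, and the filter table only holds the names mentioned in the script. When running under OpenOffice's own interpreter, pyUNO's uno.systemPathToFileUrl could probably replace the first helper.

```python
import os
from pathlib import Path

# Filter names taken from the script above; extend the table as needed.
FILTERS = {
    ("writer", "pdf"): "writer_pdf_Export",
    ("calc", "pdf"): "calc_pdf_Export",
    ("writer", "html"): "HTML (StarWriter)",
    ("calc", "html"): "HTML (StarCalc)",
}

# Which OpenOffice module handles which input extension (illustrative subset).
MODULES = {".doc": "writer", ".xls": "calc"}


def to_file_url(path):
    """Return an escaped file:// URL for a local path (hypothetical helper)."""
    return Path(path).absolute().as_uri()


def pick_filter(path, target_extension):
    """Return the export filter name for this input file, or None if unknown."""
    module = MODULES.get(os.path.splitext(path)[1].lower())
    return FILTERS.get((module, target_extension.lower()))


if __name__ == "__main__":
    print(to_file_url("/tmp/my report.doc"))
    print(pick_filter("budget.XLS", "pdf"))
```

In do_file, the two "file://%s" strings would become to_file_url(file) and to_file_url(file_save), and the extension check could call pick_filter instead of the if/elif chain.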

  3. Duncan Booth

    Duncan Booth Guest

    kbperry wrote:

    > Hi all,
    >
    > Background:
    > I need some help. I am trying to streamline a process for one of our
    > technical writers. He is using Perforce (version control system), and
    > is constantly changing his word documents, and then converts them to
    > both .pdf and "Web page" format (to publish to the web). He has a
    > licensed copy of Adobe Acrobat Professional (7.x).
    >
    > Questions:
    > Does Acrobat Pro, have some way to interface with it command-line (I
    > tried searching, but couldn't find anything)? Is there any other good
    > way to script word to pdf conversion?


    As I remember, Acrobat monitors a directory and converts anything it
    finds there, so you don't need to script Acrobat at all, just script
    printing the documents. However, it sounds as though you are talking
    about running Acrobat on a server and his license probably doesn't
    permit that.

    Alternatively use OpenOffice: it will convert word documents to
    pdf or html and can be scripted in Python.
    Duncan Booth, Mar 23, 2006
    #3
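
For readers on a current LibreOffice (an assumption; the OpenOffice releases discussed in this thread predate it), the listener-plus-script setup may be unnecessary, because soffice accepts --headless together with --convert-to. A minimal sketch, assuming soffice is on the PATH; build_convert_command and convert are hypothetical helper names.

```python
import subprocess


def build_convert_command(source, target_format="pdf", outdir=".", soffice="soffice"):
    """Build the argument list for a headless LibreOffice conversion."""
    return [soffice, "--headless", "--convert-to", target_format,
            "--outdir", outdir, source]


def convert(source, target_format="pdf", outdir="."):
    """Run the conversion; raises CalledProcessError if soffice exits non-zero."""
    subprocess.run(build_convert_command(source, target_format, outdir), check=True)


if __name__ == "__main__":
    # Only prints the command, so the sketch runs without LibreOffice installed.
    print(build_convert_command("manual.doc", "pdf", "out"))
```

Calling convert("manual.doc", "html", "out") would cover the "Web page" half of the workflow in the same way.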
  4. kbperry

    kbperry Guest

    Thanks for the replies!

    I need to stick with Word (not my choice, but I would rather keep
    everything like he has it).

    Duncan,
    I was just trying the printing thing. When installing Adobe Acrobat,
    it installs a printer called "Adobe PDF," and I have been trying to
    print to there, but the "Save" window keeps popping up. I need to
    figure out a way to keep it in the background.
    kbperry, Mar 23, 2006
    #4
  5. Duncan Booth

    Duncan Booth Guest

    kbperry wrote:

    > Thanks for the replys!
    >
    > I need to stick with Word (not my choice, but I would rather keep
    > everything like he has it).


    That shouldn't be a problem: you can stick with Word for editing the
    documents and just use OpenOffice to do the conversion.

    >
    > Duncan,
    > I was just trying the printing thing. When installing Adobe Acrobat,
    > it installs a printer called "Adobe PDF," and I have been trying to
    > print to there, but the "Save" window keeps popping up. I need to
    > figure out a way to keep it in the background.
    >

    I'm afraid it's been a while since I used Acrobat to generate PDF files.
    I think there are configuration options to tell it to do the conversion
    automatically and not prompt you, but I can't remember where.
    Duncan Booth, Mar 23, 2006
    #5
  6. kbperry

    kbperry Guest

    Thanks again Duncan!

    I will use the OpenOffice solution as a last resort. It isn't the
    standard office suite at my corp. I would like the code to be as
    portable as possible, and it would seem like a pain in the arse to have
    the end user install OpenOffice just to run my script. Sure it would
    just be a one time deal, but I can hear the groans already.


    If you happen to come across a way to suppress the save window when
    printing, please let me know.
    kbperry, Mar 23, 2006
    #6
  7. ## this creates a postscript file which you can then convert to PDF
    ## using Acrobat Distiller
    ##
    ## BTW, I spent an hour trying to get this working with
    ## win32com.client.Dispatch
    ## (the save file dialog still appeared)
    ## Then I remembered win32com.client.dynamic.Dispatch
    ##
    ## Can somebody please explain why this happened using
    ## win32com.client.Dispatch?
    import win32com.client.dynamic

    if __name__ == '__main__':
        printer = "Adobe PDF on NE03:"
        docpath = r"E:\Documents and Settings\justin\Desktop\test.doc"
        pspath = r"E:\Documents and Settings\justin\Desktop\test.ps"

        word = win32com.client.dynamic.Dispatch("Word.Application")
        try:
            document = word.Documents.Open(docpath, 0, -1)
            try:
                # Switch to the PDF printer, restoring the old one afterwards
                remember = word.ActivePrinter
                word.ActivePrinter = printer
                try:
                    document.PrintOut(Background=0, OutputFileName=pspath,
                                      PrintToFile=-1)
                finally:
                    word.ActivePrinter = remember
            finally:
                document.Close(0)
                del document
        finally:
            word.Quit(0)
            del word
    Justin Ezequiel, Mar 24, 2006
    #7
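
The script above handles one hard-coded document. To run it over a whole workspace folder, something like the following could collect the .doc files and pair each with a .ps path next to it. This is only a sketch: plan_conversions is a hypothetical helper, and the actual printing would still be done by the COM code above, once per pair.

```python
import os


def plan_conversions(root, source_ext=".doc", target_ext=".ps"):
    """Return sorted (source, target) path pairs for every matching file under root."""
    pairs = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            base, ext = os.path.splitext(name)
            # Skip Word's "~$..." owner files and anything that is not a .doc.
            if ext.lower() != source_ext or name.startswith("~$"):
                continue
            source = os.path.join(dirpath, name)
            pairs.append((source, os.path.join(dirpath, base + target_ext)))
    return sorted(pairs)


if __name__ == "__main__":
    for source, target in plan_conversions("."):
        print("%s -> %s" % (source, target))
```

Each pair would then supply docpath and pspath for one pass through the Open / PrintOut / Close sequence, ideally with a single Word instance kept open for the whole batch.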
  8. kbperry

    kbperry Guest

    Justin,
    While I was salivating when reading your post, it doesn't work for me,
    but I am not sure why.

    I keep getting an error:

    Titled: Adobe PDF
    "When you create a PostScript file you have to send the host fonts.
    Please go to the printer properties, "Adobe PDF Settings" page and turn
    OFF the option "Do not send fonts to Distiller".

    Keith
    www.301labs.com
    kbperry, Mar 28, 2006
    #8
  9. Rune Strand

    Rune Strand Guest

    kbperry wrote:
    > Questions:
    > Does Acrobat Pro, have some way to interface with it command-line (I
    > tried searching, but couldn't find anything)? Is there any other good
    > way to script word to pdf conversion?
    >
    > Note: The word documents do contain images, and lots of stuff besides
    > just text.


    Acrobat Distiller installs (or can install) a Word VBA macro which
    allows Word to Save As .PDF. It's easy to call from Python:

    import win32com.client

    doc = "somefile.doc"

    # Create COM-object
    wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")

    wordapp.Documents.Open(doc)
    # the name of the macro for Acrobat 6.0
    wordapp.Run("'!CreatePDFAndCloseDoc")
    wordapp.ActiveDocument.Close()
    wordapp.Quit()

    You'll probably wrap this in more logic, but it works.
    Rune Strand, Mar 29, 2006
    #9
  10. "When you create a PostScript file you have to send the host fonts.
    Please go to the printer properties, "Adobe PDF Settings" page and turn
    OFF the option "Do not send fonts to Distiller".

    kbperry,

    sorry about that.
    go to "Printers and Faxes"
    go to properties for the "Adobe PDF" printer
    go to the "General" tab, "Printing Preferences" button
    "Adobe PDF Settings" tab
    uncheck the "Do not send fonts..." box



    rune,

    >> "'!CreatePDFAndCloseDoc"


    I had Adobe 6 once and recalled it had Word macros you could call.
    However, when I installed Adobe 7, I could not find the macros.
    Perhaps it has something to do with the naming.
    Thanks for the post. Will check it out.
    Justin Ezequiel, Mar 29, 2006
    #10
  11. kbperry

    kbperry Guest

    Wow...thanks again for the replies!

    I will try both of these out at work tomorrow. (I only work 3 days a
    week because of school).

    Thanks,

    Keith

    www.301labs.com
    kbperry, Mar 29, 2006
    #11
  12. kbperry

    kbperry Guest

    Justin,
    Your way appeared to work great, but I just realized that printing
    this way destroys the table of contents and bookmark links.

    Rune's way would be perfect, but I don't see a macro created like that.
    I tried to create one from scratch, but it didn't work.

    I am now trying to see if there is a way to call PDFMWord.dll through a
    command-line using rundll32.exe.
    kbperry, Mar 30, 2006
    #12
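
On newer Word versions (2007 with the PDF add-in, or 2010 and later; not the setup described in this thread), Word can write PDF itself through Document.ExportAsFixedFormat, and its CreateBookmarks argument asks for bookmarks built from the headings. Whether that preserves the table-of-contents links in a particular document would need testing. A sketch under those assumptions, with pywin32 installed; export_arguments and word_to_pdf are hypothetical names, and the numeric constants are Word's wdExportFormatPDF and wdExportCreateHeadingBookmarks.

```python
import os

WD_EXPORT_FORMAT_PDF = 17               # Word's wdExportFormatPDF
WD_EXPORT_CREATE_HEADING_BOOKMARKS = 1  # Word's wdExportCreateHeadingBookmarks


def export_arguments(docpath):
    """Build the keyword arguments for ExportAsFixedFormat, PDF next to the .doc."""
    pdfpath = os.path.splitext(os.path.abspath(docpath))[0] + ".pdf"
    return {
        "OutputFileName": pdfpath,
        "ExportFormat": WD_EXPORT_FORMAT_PDF,
        "CreateBookmarks": WD_EXPORT_CREATE_HEADING_BOOKMARKS,
    }


def word_to_pdf(docpath):
    """Open docpath in Word and export it as PDF (Windows with Word only)."""
    import win32com.client  # imported here so the helper above works anywhere

    word = win32com.client.Dispatch("Word.Application")
    try:
        document = word.Documents.Open(os.path.abspath(docpath))
        try:
            document.ExportAsFixedFormat(**export_arguments(docpath))
        finally:
            document.Close(0)  # 0 = do not save changes
    finally:
        word.Quit(0)
```

This route bypasses the "Adobe PDF" printer entirely, so the printer's save dialog and font settings do not come into play.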
  13. If you can find some API documentation for PDFMWord.dll, you can call
    its methods with the ctypes python module.

    Caleb
    Caleb Hattingh, Mar 30, 2006
    #13
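
As an illustration of the ctypes approach: the usual pattern is to load the library, declare the argument and return types of an exported function, then call it. Since no API for PDFMWord.dll is known here, the sketch below calls strlen from the C runtime instead; it only stands in for whatever functions that DLL might export. ctypes works most directly with plain exported C functions, so it may not help if the DLL exposes only COM or C++ interfaces.

```python
import ctypes
import ctypes.util
import sys


def load_c_runtime():
    """Load the C runtime: msvcrt on Windows, the default C library elsewhere."""
    if sys.platform.startswith("win"):
        return ctypes.CDLL("msvcrt")
    # find_library may return None; CDLL(None) then exposes the process's own symbols.
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()

# Declare the signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

name_length = libc.strlen(b"PDFMWord.dll")
print(name_length)
```

For a real DLL the same three steps apply, but the function names, argument types, and calling convention have to come from the vendor's documentation or header files.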
  14. kbperry

    kbperry Guest

    The question is where is the API?
    kbperry, Mar 31, 2006
    #14