Convert Word .doc to Acrobat .pdf files

kbperry

Hi all,

Background:
I need some help. I am trying to streamline a process for one of our
technical writers. He is using Perforce (version control system),
constantly changes his Word documents, and then converts them to
both .pdf and "Web page" format (to publish to the web). He has a
licensed copy of Adobe Acrobat Professional (7.x).

Questions:
Does Acrobat Pro have some way to interface with it from the command
line (I tried searching, but couldn't find anything)? Is there any other
good way to script Word-to-PDF conversion?

Note: The Word documents do contain images, and lots of stuff besides
just text.
 
Thomas Guettler

I wrote a script which uses OpenOffice. It can
convert and read a lot of formats.

#!/usr/bin/env python
#Old: !/optlocal/OpenOffice.org/program/python
# (c) 2003-2006 Thomas Guettler http://www.tbz-pariv.de/

# OpenOffice1.1 comes with its own python interpreter.
# This Script needs to be run with the python from OpenOffice1:
# /opt/OpenOffice.org/program/python
# Start the Office before connecting:
# soffice "-accept=socket,host=localhost,port=2002;urp;"
#
# With OpenOffice2 you can use the default Python-Interpreter (at least on SuSE)
#

# Python Imports
import os
import re
import sys
import getopt

default_path="/usr/lib/ooo-2.0/program"
sys.path.insert(0, default_path)

# pyUNO Imports
try:
    import uno
    from com.sun.star.beans import PropertyValue
except ImportError:
    print("This Script needs to be run with the python from OpenOffice.org")
    print("Example: /opt/OpenOffice.org/program/python %s" % (
        os.path.basename(sys.argv[0])))
    print("Or you need to insert the right path at the top, where uno.py is.")
    print("Default: %s" % default_path)
    # Re-raise so the original import error stays visible.
    raise



extension=None
format=None

def usage():
    scriptname=os.path.basename(sys.argv[0])
    print("""Usage: %s [--extension pdf --format writer_pdf_Export] files
By default, all files or directories will be converted to HTML.

You must start the office with this line before starting
this script:
soffice "-accept=socket,host=localhost,port=2002;urp;"

If you want to export to something else, you need to give both the
extension *and* the format.

For a list of possible export formats see
http://framework.openoffice.org/files/documents/25/897/filter_description.html

or

/opt/OpenOffice.org/share/registry/data/org/openoffice/Office/TypeDetection.xcu

or

grep -ri MYEXTENSION /usr/lib/ooo-2.0/share/registry/modules/org/openoffice/TypeDetection/
the format is <node oor:name="FORMAT" ...

Attention: Calc (.xls) needs a different export format than Writer (.doc)
Example: calc_pdf_Export instead of writer_pdf_Export
""" % (scriptname))

def do_dir(dir, desktop):
    # Load File
    dir=os.path.abspath(dir)
    if os.path.isfile(dir):
        files=[dir]
    else:
        files=os.listdir(dir)
        files.sort()
    for file in files:
        if file.startswith("."):
            continue
        file=os.path.join(dir, file)
        if os.path.isdir(file):
            do_dir(file, desktop)
        else:
            do_file(file, desktop)

def do_file(file, desktop):
    file_l=file.lower()

    global format
    if extension=="html":
        # The HTML filter depends on the kind of source document.
        if file_l.endswith(".xls"):
            format="HTML (StarCalc)"
        elif file_l.endswith(".doc"):
            format="HTML (StarWriter)"
        else:
            print("%s: unknown extension" % file)
            return

    assert(format)
    assert(extension)

    file_save="%s.%s" % (file, extension)
    # Open the document without showing a window.
    properties=[]
    p=PropertyValue()
    p.Name="Hidden"
    p.Value=True
    properties.append(p)
    doc=desktop.loadComponentFromURL(
        "file://%s" % file, "_blank", 0, tuple(properties))
    if not doc:
        print("Failed to open '%s'" % file)
        return
    # Save File
    properties=[]
    p=PropertyValue()
    p.Name="Overwrite"
    p.Value=True
    properties.append(p)
    p=PropertyValue()
    p.Name="FilterName"
    p.Value=format
    properties.append(p)
    try:
        doc.storeToURL(
            "file://%s" % file_save, tuple(properties))
        print("Created %s" % file_save)
    except ValueError:
        import traceback
        print("Failed while writing: '%s'" % file_save)
        print(traceback.format_exc())
    doc.dispose()

def init_openoffice():
    # Init: Connect to running soffice process
    context = uno.getComponentContext()
    resolver=context.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", context)
    try:
        ctx = resolver.resolve(
            "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
    except Exception:
        print("Could not connect to running openoffice.")
        usage()
        sys.exit(1)
    smgr=ctx.ServiceManager
    desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop",ctx)
    return desktop

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "", [
            "extension=", "format="])
    except getopt.GetoptError as e:
        print(e)
        usage()
        sys.exit(1)

    global extension
    global format
    for o, a in opts:
        if o=="--extension":
            extension=a
            assert(not extension.startswith("."))
        elif o=="--format":
            format=a
        else:
            raise ValueError("Internal Error, undone option: %s %s" % (
                o, a))
    if (not extension) and (not format):
        # Neither option given: fall back to HTML export.
        extension="html"
    elif extension and format:
        pass
    else:
        print("You need to set format and extension.")
        usage()
        sys.exit(1)

    if not args:
        usage()
        sys.exit(1)

    desktop=init_openoffice()
    for file in args:
        do_dir(file, desktop)

if __name__=="__main__":
    main()
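
The usage text above points out that the export filter depends on both the source document type and the target extension. A minimal sketch of that lookup follows, using only the four filter names that appear in the script. The function name pick_filter and the table layout are illustrative assumptions, not part of the script.

```python
import os

# Filter names are the ones quoted in the script above; any other
# source type would need an entry looked up in TypeDetection.xcu.
FILTERS = {
    (".doc", "pdf"): "writer_pdf_Export",
    (".xls", "pdf"): "calc_pdf_Export",
    (".doc", "html"): "HTML (StarWriter)",
    (".xls", "html"): "HTML (StarCalc)",
}


def pick_filter(path, target_extension):
    """Return the export filter name for path, or None if it is unknown."""
    source_extension = os.path.splitext(path)[1].lower()
    return FILTERS.get((source_extension, target_extension.lower()))


if __name__ == "__main__":
    print(pick_filter("manual.doc", "pdf"))
```

With a table like this, the --format option would only be needed for source types the table does not cover.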
 
Duncan Booth

kbperry said:
Hi all,

Background:
I need some help. I am trying to streamline a process for one of our
technical writers. He is using Perforce (version control system),
constantly changes his Word documents, and then converts them to
both .pdf and "Web page" format (to publish to the web). He has a
licensed copy of Adobe Acrobat Professional (7.x).

Questions:
Does Acrobat Pro have some way to interface with it from the command
line (I tried searching, but couldn't find anything)? Is there any other
good way to script Word-to-PDF conversion?

As I remember, Acrobat monitors a directory and converts anything it
finds there, so you don't need to script Acrobat at all, just script
printing the documents. However, it sounds as though you are talking
about running Acrobat on a server and his license probably doesn't
permit that.

Alternatively use OpenOffice: it will convert word documents to
pdf or html and can be scripted in Python.
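
The watched-directory idea can be sketched as a small hand-off: drop a PostScript file into the input side of the watched folder, then poll the output side for the PDF. This is only a sketch under assumptions: the "In" and "Out" subfolder names are the layout I would expect from a Distiller watched folder and are left as parameters, and both function names are made up for illustration.

```python
import os
import shutil
import time


def submit_to_watched_folder(ps_path, watched_dir, in_name="In", out_name="Out"):
    """Copy a PostScript file into the watched folder's input directory.

    Returns the path where the converted PDF is expected to appear.
    The In/Out folder names are assumptions; adjust them to your setup.
    """
    in_dir = os.path.join(watched_dir, in_name)
    if not os.path.isdir(in_dir):
        raise ValueError("input directory does not exist: %s" % in_dir)
    shutil.copy(ps_path, in_dir)
    base = os.path.splitext(os.path.basename(ps_path))[0]
    return os.path.join(watched_dir, out_name, base + ".pdf")


def wait_for_file(path, timeout=60.0, poll=1.0):
    """Poll until path exists; return True if it appeared in time."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    return os.path.exists(path)
```

A caller would pass the result of submit_to_watched_folder to wait_for_file and treat a False return as a failed conversion.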
 
kbperry

Thanks for the replies!

I need to stick with Word (not my choice, but I would rather keep
everything like he has it).

Duncan,
I was just trying the printing thing. When installing Adobe Acrobat,
it installs a printer called "Adobe PDF," and I have been trying to
print to there, but the "Save" window keeps popping up. I need to
figure out a way to keep it in the background.
 
Duncan Booth

kbperry said:
Thanks for the replies!

I need to stick with Word (not my choice, but I would rather keep
everything like he has it).

That shouldn't be a problem: you can stick with Word for editing the
documents and just use OpenOffice to do the conversion.

kbperry said:
Duncan,
I was just trying the printing thing. When installing Adobe Acrobat,
it installs a printer called "Adobe PDF," and I have been trying to
print to there, but the "Save" window keeps popping up. I need to
figure out a way to keep it in the background.

I'm afraid it's a while since I used Acrobat to generate PDF files. I think
there are configuration options to tell it to do the conversion
automatically and not prompt you, but I can't remember where.
 
kbperry

Thanks again Duncan!

I will use the OpenOffice solution as a last resort. It isn't the
standard office suite at my corp. I would like the code to be as
portable as possible, and it would seem like a pain in the arse to have
the end user install OpenOffice just to run my script. Sure it would
just be a one time deal, but I can hear the groans already.


If you happen to come across a way to suppress the save window when
using the print option, please let me know.
 
Justin Ezequiel

## this creates a postscript file which you can then convert to PDF
## using Acrobat Distiller
##
## BTW, I spent an hour trying to get this working with
## win32com.client.Dispatch
## (the save file dialog still appeared)
## Then I remembered win32com.client.dynamic.Dispatch
##
## Can somebody please explain why this happened using
## win32com.client.Dispatch?
import win32com.client.dynamic

if __name__ == '__main__':
    printer = "Adobe PDF on NE03:"
    docpath = r"E:\Documents and Settings\justin\Desktop\test.doc"
    pspath = r"E:\Documents and Settings\justin\Desktop\test.ps"

    word = win32com.client.dynamic.Dispatch("Word.Application")
    try:
        document = word.Documents.Open(docpath, 0, -1)
        try:
            remember = word.ActivePrinter
            word.ActivePrinter = printer
            try:
                document.PrintOut(Background=0, OutputFileName=pspath,
                                  PrintToFile=-1)
            finally:
                word.ActivePrinter = remember
        finally:
            document.Close(0)
            del document
    finally:
        word.Quit(0)
        del word
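
For more than one document, the same calls could be wrapped in a function that takes an already-created Word.Application object, so Word is started only once. This is a sketch, not a tested Windows recipe: the helper names ps_path_for and print_to_postscript are invented here, and the COM calls simply mirror the ones in the script above.

```python
import os


def ps_path_for(doc_path):
    """Return the .ps path next to doc_path (test.doc -> test.ps)."""
    return os.path.splitext(doc_path)[0] + ".ps"


def print_to_postscript(word, doc_path, printer):
    """Print one document to a PostScript file through `printer`.

    `word` is a Word.Application COM object, for example the result of
    win32com.client.dynamic.Dispatch("Word.Application").
    """
    ps_path = ps_path_for(doc_path)
    document = word.Documents.Open(doc_path, 0, -1)
    try:
        remember = word.ActivePrinter
        word.ActivePrinter = printer
        try:
            document.PrintOut(Background=0, OutputFileName=ps_path,
                              PrintToFile=-1)
        finally:
            # Always restore the user's printer, even if printing fails.
            word.ActivePrinter = remember
    finally:
        document.Close(0)
    return ps_path
```

The caller stays responsible for creating the Word object and for calling word.Quit(0) in a finally block once all documents are done.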
 
kbperry

Justin,
While I was salivating when reading your post, it doesn't work for me,
but I am not sure why.

I keep getting an error:

Titled: Adobe PDF
"When you create a PostScript file you have to send the host fonts.
Please go to the printer properties, "Adobe PDF Settings" page and turn
OFF the option "Do not send fonts to Distiller".

Keith
www.301labs.com
 
Rune Strand

kbperry said:
Questions:
Does Acrobat Pro have some way to interface with it from the command
line (I tried searching, but couldn't find anything)? Is there any other
good way to script Word-to-PDF conversion?

Note: The Word documents do contain images, and lots of stuff besides
just text.

The Acrobat Distiller installs (or can install) a Word VBA macro which
allows Word to Save As .PDF. It's easy to call from Python:

doc = "somefile.doc"
import win32com.client

# Create COM-object
wordapp = win32com.client.gencache.EnsureDispatch("Word.Application")

wordapp.Documents.Open(doc)
# the name of the macro for Acrobat 6.0
wordapp.Run("'!CreatePDFAndCloseDoc")
wordapp.ActiveDocument.Close()
wordapp.Quit()

You'll probably wrap this in more logic, but it works.
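
One possible shape for that extra logic is sketched below: walk a directory tree, run the macro on each .doc file, and collect the paths that failed instead of stopping at the first error. The function names are hypothetical, the macro name has to be whatever your Acrobat version actually installs, and wordapp is assumed to be the COM object created as shown above.

```python
import os


def find_word_documents(root):
    """Yield absolute paths of .doc files under root.

    Hidden files and Word's "~$" lock files are skipped.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.startswith(".") or name.startswith("~$"):
                continue
            if name.lower().endswith(".doc"):
                yield os.path.abspath(os.path.join(dirpath, name))


def run_macro_on(wordapp, doc_path, macro_name):
    """Open doc_path in Word and run the named macro on it."""
    wordapp.Documents.Open(doc_path)
    wordapp.Run(macro_name)


def convert_tree(wordapp, root, macro_name):
    """Run the macro on every document; return the paths that failed."""
    failed = []
    for doc_path in find_word_documents(root):
        try:
            run_macro_on(wordapp, doc_path, macro_name)
        except Exception:
            failed.append(doc_path)
    return failed
```

Calling wordapp.Quit() in a finally block around convert_tree keeps a stray Word process from lingering after an error.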
 
Justin Ezequiel

"When you create a PostScript file you have to send the host fonts.
Please go to the printer properties, "Adobe PDF Settings" page and turn
OFF the option "Do not send fonts to Distiller".

kbperry,

sorry about that.
go to "Printers and Faxes"
go to properties for the "Adobe PDF" printer
go to the "General" tab, "Printing Preferences" button
"Adobe PDF Settings" tab
uncheck the "Do not send fonts..." box



rune,

I had Adobe 6 once and recalled it had Word macros you could call.
However, when I installed Adobe 7, I could not find the macros.
Perhaps it has something to do with the naming.
Thanks for the post. Will check it out.
 
kbperry

Justin,
Your way appeared to work great, but now I just realized that going
through the printer destroys the table of contents and bookmark links.

Rune's way would be perfect, but I don't see a macro created like that.
I tried to create one from scratch, but it didn't work.

I am now trying to see if there is a way to call PDFMWord.dll through a
command-line using rundll32.exe.
 
Caleb Hattingh

If you can find some API documentation for PDFMWord.dll, you can call
its methods with the ctypes python module.

Caleb
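
The exported functions of PDFMWord.dll are not documented anywhere in this thread, so the sketch below does not guess at them. It shows the general ctypes pattern instead, with the C runtime's strlen as a stand-in: load the library, declare the argument and return types, then call the function. The helper names are made up for illustration.

```python
import ctypes
import ctypes.util
import os


def load_c_runtime():
    """Load the platform's C runtime as a stand-in for a vendor DLL."""
    if os.name == "nt":
        return ctypes.CDLL("msvcrt")
    # find_library may return None; CDLL(None) then exposes the symbols
    # already loaded into the current process, which include libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))


def c_strlen(text):
    """Call C strlen through ctypes with declared argument/return types."""
    strlen = load_c_runtime().strlen
    strlen.argtypes = [ctypes.c_char_p]
    strlen.restype = ctypes.c_size_t
    return strlen(text)


if __name__ == "__main__":
    print(c_strlen(b"PDFMWord.dll"))
```

For a real vendor DLL the same steps would apply with ctypes.WinDLL or ctypes.CDLL, depending on the calling convention its documentation specifies.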
 
