using python to edit a word file?

John Salerno

I figured my first step is to install the win32 extension, which I did,
but I can't seem to find any documentation for it. A couple of the links
on Mark Hammond's site don't seem to work.

Anyway, all I need to do is search in the Word document for certain
strings and either delete them or replace them. Easy enough, if only I
knew which function, etc. to use.

Hope someone can push me in the right direction.

Thanks.
 
Rob Wolfe

Maybe this will be helpful:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/279003
 
John Salerno

But if I save the file to text, won't it lose its formatting?

More specifically, here's what I have: a four-page calendar, each page
with three months on it. The months are in tables, which is why I don't
think making a text file will help me here, because I'll lose all that.
What I need to do is renumber all the dates, basically replacing a
number with itself minus 1. So it's not a simple find/replace task, and
there doesn't seem to be a way to do this in Word's find/replace feature
(but if there is, please let me know!)
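
A minimal sketch of the "number minus 1" idea in Python, assuming the dates can be reached as plain text: re.sub accepts a function as the replacement, so each match can be computed rather than fixed. The name decrement_numbers is made up for illustration.

```python
import re

def decrement_numbers(text):
    """Replace every standalone run of digits in text with its value minus 1."""
    def minus_one(match):
        # match.group(0) is the matched digits, e.g. "10"
        return str(int(match.group(0)) - 1)
    return re.sub(r'\b\d+\b', minus_one, text)

print(decrement_numbers("Mon 2  Tue 3  Wed 10"))  # Mon 1  Tue 2  Wed 9
```

Note that a 1 becomes 0 under this rule, so the first day of each month would still need separate handling.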
 
Gerhard Fiedler

When Word is installed, you have a few COM interfaces to Word. I'm not sure
how to access these with Python (but documentation about using COM with
Python should help you here), and I'm not sure whether what you want is
available (but the Word COM documentation should help you with that).

Gerhard
 
John Salerno

John said:
But if I save the file to text, won't it lose its formatting?

It looks like I can save it as an XML file and it will retain all the
formatting. Now I just need to decipher where the dates are in all that
mess and replace them, just using a normal text file! :)
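
One way to see where the dates sit in the XML is to list every text run that contains nothing but digits. This is only a sketch: the sample string below is a made-up fragment in the style of Word's XML, and it assumes each date is alone inside a plain <w:t> element.

```python
import re

# Hypothetical fragment; a real file saved from Word is far larger.
sample = (
    '<w:tc><w:p><w:r><w:t>August</w:t></w:r></w:p></w:tc>'
    '<w:tc><w:p><w:r><w:t>14</w:t></w:r></w:p></w:tc>'
    '<w:tc><w:p><w:r><w:t>15</w:t></w:r></w:p></w:tc>'
)

def find_numeric_runs(xml_text):
    """Return the digits-only contents of <w:t> elements, in document order."""
    return re.findall(r'<w:t>(\d+)</w:t>', xml_text)

print(find_numeric_runs(sample))  # ['14', '15']
```

Running this over the real file first gives a quick check that only calendar dates match before anything is rewritten.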
 
John Henry

The easiest way for me to do things like this is to do it in Word and
record a VB Macro. For instance you will see something like this:

Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
    .Text = "save it"
    .Replacement.Text = "dont save it"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchByte = False
    .CorrectHangulEndings = False
    .MatchAllWordForms = False
    .MatchSoundsLike = False
    .MatchWildcards = False
    .MatchFuzzy = False
End With
Selection.Find.Execute Replace:=wdReplaceAll

and then hand translate it to Win32 Python, like:

from win32com.client import Dispatch, constants

wordApp = Dispatch("Word.Application")
# Add() creates a new document based on the named file
wordDoc = wordApp.Documents.Add(...some word file name...)
# Put the cursor at the start of the document; Select() returns nothing
wordDoc.Range(0, 0).Select()
sel = wordApp.Selection
sel.Find.ClearFormatting()
sel.Find.Replacement.ClearFormatting()
sel.Find.Text = "save it"
sel.Find.Replacement.Text = "dont save it"
sel.Find.Forward = True
sel.Find.Wrap = constants.wdFindContinue
sel.Find.Format = False
sel.Find.MatchCase = False
sel.Find.MatchWholeWord = False
sel.Find.MatchByte = False
sel.Find.CorrectHangulEndings = False
sel.Find.MatchAllWordForms = False
sel.Find.MatchSoundsLike = False
sel.Find.MatchWildcards = False
sel.Find.MatchFuzzy = False
# Execute is a method of Find itself
sel.Find.Execute(Replace=constants.wdReplaceAll)
wordDoc.SaveAs(...some word file name...)

I can't say that this works as typed, because I haven't tried it myself,
but it should give you a good start.

Make sure you run the makepy.py program in the
\python23\lib\site-packages\win32com\client directory and install the
"MS Word 11.0 Object Library (8.3)" (or something equivalent). On my
computers, this is not installed automatically and I have to remember
to do it myself or else things won't work.
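
As a possible alternative to running makepy.py by hand, pywin32's gencache.EnsureDispatch generates the same support on demand. The sketch below is untested here and assumes Windows with Word and pywin32 installed; the function name replace_all_in_word and its parameters are made up for illustration.

```python
def replace_all_in_word(doc_path, find_text, replace_text):
    """Replace every occurrence of find_text in the Word file at doc_path."""
    # Imported inside the function so this file still loads where pywin32 is absent.
    from win32com.client import constants, gencache

    # EnsureDispatch builds the makepy wrappers if needed, which fills `constants`.
    word = gencache.EnsureDispatch("Word.Application")
    doc = word.Documents.Open(doc_path)  # assumption: doc_path is an absolute path
    try:
        find = doc.Content.Find  # search the whole document body
        find.ClearFormatting()
        find.Replacement.ClearFormatting()
        find.Execute(
            FindText=find_text,
            ReplaceWith=replace_text,
            Forward=True,
            Wrap=constants.wdFindContinue,
            Replace=constants.wdReplaceAll,
        )
        doc.Save()
    finally:
        doc.Close()
        # Caution: this also ends a Word session the user already had open.
        word.Quit()
```

Working on doc.Content rather than the Selection avoids moving the cursor, which is a design choice of this sketch and not something the recorded macro does.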

Good Luck.
 
Anthra Norell

John,

I have a notion about translating stuff in a mess and could help you with the translation. But it may be that the conversion
from DOC to formatted text is a bigger problem. Loading the files into Word and saving them in a different format may not be a
practical option if you have many files to do. Googling for batch converters from DOC to RTF, I couldn't find anything.
If you can solve the conversion problem, pass me a sample file. I'll solve the translation problem for you.

Frederic


----- Original Message -----
From: "John Salerno"
Newsgroups: comp.lang.python
Sent: Thursday, August 10, 2006 9:08 PM
Subject: Re: using python to edit a word file?
 
John Salerno

What I ended up doing was just saving the Word file as an XML file, and
then writing a little script to process the text file. Then when it
opens back in Word, all the formatting remains. The script isn't ideal,
but it did the bulk of changing the numbers, and then I did a few things
by hand. I love having Python for these chores! :)



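import re

# Read the calendar that Word saved as XML
xml_file = open('calendar.xml')
xml_data = xml_file.read()
xml_file.close()

# A text run whose entire content is a number, i.e. a date cell
pattern = re.compile(r'<w:t>(\d+)</w:t>')

def subtract(match_obj):
    # Rebuild the run with the date reduced by one
    date = int(match_obj.group(1)) - 1
    return '<w:t>%s</w:t>' % date

new_data = re.sub(pattern, subtract, xml_data)

# Write the result to a new file so the original stays untouched
new_file = open('calendar2007.xml', 'w')
new_file.write(new_data)
new_file.close()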
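
A variation on the script above, offered only as a sketch: it wraps the substitution in a function so it can be tried on a short string first, and it also accepts <w:t> tags that carry attributes. That some runs look like <w:t xml:space="preserve"> is an assumption about the file, and shift_dates is a made-up name.

```python
import re

# Opening <w:t> tag (with optional attributes), the digits, and the closing tag.
PATTERN = re.compile(r'(<w:t(?:\s[^>]*)?>)(\d+)(</w:t>)')

def shift_dates(xml_text, offset=-1):
    """Add offset to every digits-only <w:t> run, leaving the tags unchanged."""
    def repl(match):
        new_value = int(match.group(2)) + offset
        return '%s%d%s' % (match.group(1), new_value, match.group(3))
    return PATTERN.sub(repl, xml_text)

sample = '<w:t>15</w:t><w:t xml:space="preserve">8</w:t><w:t>May</w:t>'
print(shift_dates(sample))
# <w:t>14</w:t><w:t xml:space="preserve">7</w:t><w:t>May</w:t>
```

Any other purely numeric run, such as a year, would be shifted too, so some cleanup by hand may still be needed.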
 
Anthra Norell

No one could do it any better. Good for you! - Frederic

----- Original Message -----
From: "John Salerno"
Newsgroups: comp.lang.python
Sent: Friday, August 11, 2006 4:08 PM
Subject: Re: using python to edit a word file?
 
