Opening MS Word files via Python

Fazer

Here comes another small question from me :)

I am curious how I should approach this. For now I just want to
parse simple text, and perhaps tables in the future. Would I have to
save the Word file as plain text and open it in a text editor? That
would kind of... suck. Has anyone else tackled this?

Thanks,
 
Rob Nikander

The win32 extensions for Python (pywin32) let you get at the COM objects of
applications like Word, and that would give you access to the text and tables.
Google: win32 python.

import win32com.client

word = win32com.client.Dispatch('Word.Application')
word.Documents.Open('C:\\myfile.doc')

But I don't know the best way to find out the methods and properties of
the "word" object.

Rob
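One way to explore the object from Python itself, sketched under some assumptions: if the wrappers are generated with `win32com.client.gencache.EnsureDispatch` (early binding) rather than plain `Dispatch`, the resulting Python classes should expose their methods to `dir()` and keep their readable properties in a `_prop_map_get_` dictionary. The names `list_com_members` and `explore_word` are made up for illustration, and the live part needs Windows with Word and pywin32 installed.

```python
def list_com_members(obj):
    """Return (methods, properties) for a makepy-generated COM wrapper.

    Assumption: with early binding, generated classes define COM methods as
    ordinary Python methods and list readable properties in _prop_map_get_.
    """
    cls = type(obj)
    methods = sorted(
        name for name in dir(cls)
        if not name.startswith("_") and callable(getattr(cls, name, None))
    )
    properties = sorted(getattr(obj, "_prop_map_get_", {}))
    return methods, properties


def explore_word():
    """Print the members of Word's Application object (Windows + Word only)."""
    # Imported here so that list_com_members stays usable on any platform.
    import win32com.client

    word = win32com.client.gencache.EnsureDispatch("Word.Application")
    try:
        methods, properties = list_com_members(word)
        print("Methods:", methods)
        print("Properties:", properties)
    finally:
        # Shut down the Word instance this function was talking to.
        word.Quit()
```

The same helper can be pointed at any other wrapper obtained this way, for example an opened document, to see what it offers.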
 
jmdeschamps

You can use the VBA documentation for Word and, using dot notation and the
normal Pythonic way of calling functions, play with its various
objects, methods and attributes.
Here's some pretty straightforward code along these lines:
#************************
import os
import win32com.client
from tkinter import filedialog  # this module was tkFileDialog in Python 2

# Launch Word, keeping its window hidden
MSWord = win32com.client.Dispatch("Word.Application")
MSWord.Visible = False
# Ask the user for a file (normalised to backslashes) and open it
myWordDoc = os.path.normpath(filedialog.askopenfilename())
doc = MSWord.Documents.Open(myWordDoc)
# Get the textual content (Content is a Range; .Text is its string)
docText = doc.Content.Text
# Get the collection of tables
listTables = doc.Tables
#************************

Happy parsing,

Jean-Marc
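Going one step further toward the tables mentioned in the original question, here is a rough sketch of turning one Word table into plain Python lists. It relies on the Word object model's `Table.Rows.Count`, `Table.Columns.Count` and `Table.Cell(row, column).Range.Text`, all 1-based, and it assumes a regular grid; tables with merged cells may raise errors for some coordinates. The function names are invented for illustration.

```python
CELL_END = "\r\x07"  # Word ends every cell's text with CR + BEL


def clean_cell_text(raw):
    """Strip Word's end-of-cell marker and surrounding whitespace."""
    if raw.endswith(CELL_END):
        raw = raw[:-len(CELL_END)]
    return raw.strip()


def table_to_rows(table):
    """Convert a Word Table COM object into a list of rows of strings.

    Assumes a regular grid (no merged cells); Word indexes from 1.
    """
    rows = []
    for r in range(1, table.Rows.Count + 1):
        row = []
        for c in range(1, table.Columns.Count + 1):
            row.append(clean_cell_text(table.Cell(r, c).Range.Text))
        rows.append(row)
    return rows
```

Given a document object as returned by `Documents.Open`, something like `[table_to_rows(t) for t in document.Tables]` should then yield one list of rows per table.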
 
Fazer

That is awesome! Thanks!

How would I save something in Word format? I am guessing
MSWord.Documents.Save(myWordDoc) or something along those lines? Where can I
find more documentation? Thanks.
 
anon

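Open MS Word and press Alt+F11 to open the Visual Basic editor, then F2 to
open the Object Browser, which lists Word's objects with their methods and
properties.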
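On the saving question itself, a minimal sketch: in the Word object model, `Save` on a document takes no file name and just saves under the current one, so saving under a new name or format goes through the document's `SaveAs` method instead. The helper name `save_as_word` and the constant name are made up here; the value 0 corresponds to Word's `wdFormatDocument` (the classic .doc format).

```python
import os

WD_FORMAT_DOCUMENT = 0  # Word's wdFormatDocument constant (binary .doc)


def save_as_word(doc, path, file_format=WD_FORMAT_DOCUMENT):
    """Save an open Word Document COM object under a new file name.

    `doc` is expected to be the object returned by Documents.Open or
    Documents.Add. Returns the absolute path that was passed to Word.
    """
    # Word resolves relative paths against its own working directory,
    # so hand it an absolute path to avoid surprises.
    full_path = os.path.abspath(path)
    doc.SaveAs(full_path, file_format)
    return full_path
```

After saving, calling `Close()` on the document and `Quit()` on the application object keeps a hidden Word process from lingering in the background.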
 
