MS Word parser

kenicheema

Hi all,
I'm currently using antiword to extract content from MS Word files.
Is there another way to do this without relying on any command prompt
application?
 
Tim Golden

Well you haven't given your environment, but is there
anything to stop you from controlling Word itself via
COM? I'm no Word expert, but looking around, this
seems to work:

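<code>
import win32com.client

# Drive Word itself through COM and pull out the document's text
word = win32com.client.Dispatch("Word.Application")
doc = word.Documents.Open("c:/temp/temp.doc")
text = doc.Range().Text
doc.Close(False)  # close without saving
word.Quit()  # don't leave a Word process running

# Open in binary mode because we write the UTF-8 encoded bytes ourselves
with open("c:/temp/temp.txt", "wb") as f:
    f.write(text.encode("UTF-8"))
</code>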

TJG
 
kenicheema

Tim,
I'm on Linux (RedHat) so using Word is not an option for me. Any
other suggestions?
 
Ben C

There is OpenOffice, which has a Python API (called UNO). But
piping through antiword is probably easier.
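
For what it's worth, piping through antiword from Python could look roughly like the sketch below. It assumes antiword is installed and on the PATH, and that its output can be decoded as UTF-8 (that depends on how antiword is configured, so undecodable bytes are replaced rather than raising). The function name `extract_text` and its `command` parameter are made up for illustration.

```python
import subprocess


def extract_text(path, command=("antiword",)):
    """Run a converter command on `path` and return what it prints as text.

    `command` defaults to plain antiword; it is a parameter only so that a
    different converter (or extra options) can be swapped in.
    """
    result = subprocess.run(
        list(command) + [path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the converter fails
    )
    # Assumption: output is UTF-8; bad bytes are replaced, not fatal
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("c:/temp/temp.doc")` (or any other path) would then hand back the text as a Python string, so the rest of the program never has to deal with the command line directly.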
 
Josiah Carlson

There is also wvWare (http://wvware.sourceforge.net/), but it is also
generally used as a command-line application. Both antiword and wvWare
are open source, so you could (with a bit of work) wrap them with SWIG
or Pyrex to access them directly from Python.

- Josiah
 
