Analysing Word documents (slow): what's wrong with this code, please?

jmdeschamps

Does anyone have a hint on how else to get faster results?
(This is to find out what was bold in the document, in order to grab
documents produced in Word and generate HTML (web pages) and XML
(straight data) versions.)

# START ========================
import win32com.client
import tkFileDialog, time

# Launch Word
MSWord = win32com.client.Dispatch("Word.Application")

myWordDoc = tkFileDialog.askopenfilename()

MSWord.Documents.Open(myWordDoc)

boldRanges = []  # list of bold ranges
boldStart = -1
boldEnd = -1
t1 = time.clock()
for i in range(len(MSWord.Documents[0].Content.Text)):
    if MSWord.Documents[0].Range(i, i+1).Bold:  # testing for bold property
        if boldStart == -1:
            boldStart = i
        else:
            boldEnd = i
    else:
        if boldEnd != -1:
            boldRanges.append((boldStart, boldEnd))
        boldStart = -1
        boldEnd = -1
t2 = time.clock()
MSWord.Quit()

print boldRanges  # see what we got
print "Analysed in ", t2-t1
# END =====================================

Thanks in advance
 
Daniel Dittmar

jmdeschamps said:
Does anyone have a hint on how else to get faster results?
(This is to find out what was bold in the document, in order to grab
documents produced in Word and generate HTML (web pages) and XML
(straight data) versions.) [...]
for i in range(len(MSWord.Documents[0].Content.Text)):
    if MSWord.Documents[0].Range(i, i+1).Bold:  # testing for bold

Perhaps you can search for bold text; the Word search dialog allows this.
If you use Word's keyboard macro recording feature, you can probably
figure out how to drive that search feature from Python.

Daniel
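
For readers wondering what the search-based approach might look like from Python, here is a minimal sketch. It assumes pywin32 and Word are installed, uses Python 3 syntax, and takes an already opened Document object; the function name find_bold_ranges and the two constants written out as numbers (wdFindStop and wdCollapseEnd, both 0 in Word's object model) are choices made for this illustration, not something from the thread. Treat it as a starting point rather than a tested recipe for any particular Word version.

```python
WD_FIND_STOP = 0      # Word's wdFindStop: stop at the end of the range
WD_COLLAPSE_END = 0   # Word's wdCollapseEnd: collapse a range to its end


def find_bold_ranges(doc):
    """Return (start, end) pairs for each bold run, using Word's Find object.

    `doc` is assumed to be a Word Document COM object, for example the value
    returned by win32com.client.Dispatch("Word.Application").Documents.Open(path).
    """
    rng = doc.Content
    find = rng.Find
    find.ClearFormatting()
    find.Text = ""            # no text criterion, formatting only
    find.Font.Bold = True     # look for bold formatting
    find.Format = True
    find.Forward = True
    find.Wrap = WD_FIND_STOP

    ranges = []
    last_end = -1
    while find.Execute():
        # Guard against a match that does not advance (can happen at the
        # very end of a document), which would otherwise loop forever.
        if rng.End <= last_end:
            break
        ranges.append((rng.Start, rng.End))
        last_end = rng.End
        rng.Collapse(WD_COLLAPSE_END)   # continue searching after this match
    return ranges
```

Each Execute call lets Word jump straight to the next bold run, so the number of COM round trips grows with the number of bold runs rather than with the number of characters.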
 
Eric Brunel

jmdeschamps said:
Does anyone have a hint on how else to get faster results?
(This is to find out what was bold in the document, in order to grab
documents produced in Word and generate HTML (web pages) and XML
(straight data) versions.)

# START ========================
import win32com.client
import tkFileDialog, time

# Launch Word
MSWord = win32com.client.Dispatch("Word.Application")

myWordDoc = tkFileDialog.askopenfilename()

MSWord.Documents.Open(myWordDoc)

boldRanges = []  # list of bold ranges
boldStart = -1
boldEnd = -1
t1 = time.clock()
for i in range(len(MSWord.Documents[0].Content.Text)):
    if MSWord.Documents[0].Range(i, i+1).Bold:  # testing for bold property

Knowing vaguely how pythoncom works, you'd do well to avoid asking for
MSWord.Documents[0] at each loop step: pythoncom dynamically fetches the COM
objects corresponding to every attribute and method you ask for, and that may
cost a lot of time. So doing:

doc = MSWord.Documents[0]
for i in range(len(doc.Content.Text)):
    if doc.Range(i, i+1).Bold: ...

may greatly improve performance.
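
To make the idea concrete, here is a small sketch of the character-by-character loop with the document reference and the text length looked up once. It is Python 3, and the function name bold_ranges is hypothetical. It also records single-character runs and a run that reaches the end of the text, and it returns end positions that are exclusive, which differs slightly from the original loop. The function only assumes that `doc` offers Content.Text and Range(start, end).Bold, as a Word Document object does.

```python
def bold_ranges(doc):
    """Return (start, end) pairs, end exclusive, for each run of bold characters.

    `doc` is assumed to be a Word Document COM object (or anything with the
    same Content.Text and Range(start, end).Bold interface).
    """
    length = len(doc.Content.Text)   # fetched once, not on every iteration
    ranges = []
    start = None                     # start of the current bold run, if any
    for i in range(length):
        if doc.Range(i, i + 1).Bold:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:            # bold run that reaches the end of the text
        ranges.append((start, length))
    return ranges
```

This still makes one COM call per character, so it will stay slower than a search-based approach on long documents; it only removes the repeated Documents[0] and Content.Text lookups.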
 
jmdeschamps

Daniel Dittmar said:
jmdeschamps said:
Does anyone have a hint on how else to get faster results?
(This is to find out what was bold in the document, in order to grab
documents produced in Word and generate HTML (web pages) and XML
(straight data) versions.) [...]
....
Perhaps you can search for bold text; the Word search dialog allows this.
If you use Word's keyboard macro recording feature, you can probably
figure out how to drive that search feature from Python.

Daniel

Thanks. Paul Prescod suggested this also, and it works great!

Jean-Marc
 
jmdeschamps

Eric Brunel said:
jmdeschamps said:
Anyone has a hint how else to get faster results?
(This is to find out what was bold in the document, in order to grab
documents ptoduced in word and generate html (web pages) and xml
(straight data) versions)

# START ========================
import win32com.client
import tkFileDialog, time

# Launch Word
MSWord = win32com.client.Dispatch("Word.Application")

myWordDoc = tkFileDialog.askopenfilename()

MSWord.Documents.Open(myWordDoc)

boldRanges=[] #list of bold ranges
boldStart = -1
boldEnd = -1
t1= time.clock()
for i in range(len(MSWord.Documents[0].Content.Text)):
if MSWord.Documents[0].Range(i,i+1).Bold : # testing for bold
property

Knowing vaguely how pythoncom works, you'd do well to avoid asking for
MSWord.Documents[0] at each loop step: pythoncom dynamically fetches the COM
objects corresponding to every attribute and method you ask for, and that may
cost a lot of time. So doing:

doc = MSWord.Documents[0]
for i in range(len(doc.Content.Text)):
    if doc.Range(i, i+1).Bold: ...

may greatly improve performance.
....
Thanks, it does! And so does using the built-in Find object.

Jean-Marc
 
