McKirahan said:
results
The following script may get you started.
Option Explicit
Const cVBS = "Word_doc.vbs"
Const cDOC = "Word_doc.doc"
Dim objMSW
Set objMSW = CreateObject("Word.Application.8")
Dim objDOC
Set objDOC = objMSW.Documents.Open(cDOC)
Dim strDOC
strDOC = objDOC.Content
objDOC.Close False
objMSW.Application.Quit True
Set objDOC = Nothing
Set objMSW = Nothing
WScript.Echo strDOC
Since the above script is slow, I'd suggest running it as a
preprocessing step: extract the text of each of your MS-Word
documents once, store the results in a database or text file,
and then run your searches against that.
Here's a script that will process a list of MS-Word documents.
It will generate a tab-separated values file with a header row of:
Document Line Text
The file can optionally be opened in MS-Excel for analysis
or review (via Data + Get External Data + Import Text File...).
Watch for word wrap.
Option Explicit
'*
'* Declare Constants
'*
Const cVBS = "Document.vbs"
Const cTXT = "Document.txt"
Const cCSV = "Document.csv"
'*
'* Declare Variables
'*
Dim arrDOC
Dim intDOC
Dim strDOC
Dim intINS
Dim strOTF
Dim intTOT(1)
intTOT(0) = 0
intTOT(1) = 0
Dim strTOT
strTOT = "# Documents; ## Lines"
Dim arrTXT
Dim intTXT
Dim strTXT
'*
'* Declare Objects
'*
Dim objDOC
Dim objFSO
Set objFSO = CreateObject("Scripting.FileSystemObject")
Dim objMSW
Set objMSW = CreateObject("Word.Application.8")
Dim objOTF
'*
'* Read list of documents
'*
Set objOTF = objFSO.OpenTextFile(cTXT,1)
strOTF = objOTF.ReadAll
Set objOTF = Nothing
'*
'* Documents, Lines
'*
Set objOTF = objFSO.OpenTextFile(cCSV,2,True)
objOTF.WriteLine("Document" & vbTab & "Line" & vbTab & "Text")
arrDOC = Split(strOTF,vbCrLf)
For intDOC = 0 To UBound(arrDOC)
strDOC = arrDOC(intDOC)
If InStr(LCase(strDOC),".doc") > 0 Then
intINS = InStr(strDOC,":")
If intINS > 0 Then strDOC = Mid(strDOC,intINS-1)
intTOT(0) = intTOT(0) + 1
objOTF.WriteLine(intTOT(0) & vbTab & "0" & vbTab & strDOC)
Set objDOC = objMSW.Documents.Open(strDOC)
strTXT = objDOC.Content
arrTXT = Split(strTXT,vbCr)
For intTXT = 0 To UBound(arrTXT)
If Trim(arrTXT(intTXT)) <> "" Then
intTOT(1) = intTOT(1) + 1
objOTF.WriteLine(intTOT(0) & vbTab & intTOT(1) & vbTab & arrTXT(intTXT))
End If
Next
objDOC.Close False
Set objDOC = Nothing
End If
Next
'*
'* Quit Word only after the last document has been processed
'*
objMSW.Application.Quit True
objOTF.Close
Set objOTF = Nothing
'*
'* Destroy Objects
'*
Set objMSW = Nothing
Set objFSO = Nothing
'*
'* Finish
'*
strTOT = Replace(strTOT,"##",FormatNumber(intTOT(1),0))
strTOT = Replace(strTOT,"#",FormatNumber(intTOT(0),0))
MsgBox strTOT,vbInformation,cVBS
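One thing the script above doesn't guard against, as far as I can tell, is a paragraph that itself contains a tab: it would be split across extra columns when the file is imported. Below is a minimal sketch of a cleanup helper. The name TsvField is made up for illustration, and treating Chr(11) as Word's manual line break is my assumption about how Word returns the text.

```vbscript
Option Explicit

' Hypothetical helper (not part of the original script): make a string
' safe to write as a single field of a tab-separated line.
Function TsvField(strIn)
    Dim strOut
    strOut = Replace(strIn, vbTab, " ")      ' a tab would start a new column
    strOut = Replace(strOut, vbLf, " ")      ' a stray line feed would start a new row
    strOut = Replace(strOut, Chr(11), " ")   ' assumed: Word's manual line break
    TsvField = Trim(strOut)
End Function
```

To try it, wrap the text field in the inner WriteLine call, i.e. write TsvField(arrTXT(intTXT)) instead of arrTXT(intTXT).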
The input file ("Document.txt") can be generated via the MS-DOS
command "attrib". To list all MS-Word documents on a drive,
run the following from a Command prompt:
attrib \*.doc /s > Document.txt
Alternatively, you can just enter the filenames of the documents
that you're interested in into a text file one per line; for example:
C:\My Documents\Document1.doc
C:\My Documents\Document2.doc
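Both kinds of input line can be reduced to a plain path the same way: look for the colon after the drive letter and keep everything from the drive letter onward. Here is a rough sketch of that step as a standalone function. PathFromListLine is a name I made up, and the exact column layout of the "attrib" output is an assumption (attribute flags, some spaces, then the full path).

```vbscript
Option Explicit

' Hypothetical helper: pull the document path out of one line of the
' input list. Returns "" for lines that do not mention a .doc file.
Function PathFromListLine(strLine)
    Dim intPos
    PathFromListLine = ""
    If InStr(LCase(strLine), ".doc") = 0 Then Exit Function
    intPos = InStr(strLine, ":")
    If intPos > 1 Then
        ' Keep everything from the drive letter (one character before ":")
        PathFromListLine = Trim(Mid(strLine, intPos - 1))
    Else
        ' No drive letter found: use the line as typed
        PathFromListLine = Trim(strLine)
    End If
End Function
```

The intPos > 1 check is there because Mid raises an error when asked to start at position 0, which would happen if a line began with a colon.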
An example of the output follows:
Document Line Text
1 0 C:\MYDOCU~1\Document1.doc
1 1 First line
1 2 Last line
2 0 C:\MYDOCU~1\Document2.doc
2 1 line number one
2 2 line number two
2 3 line number three
To reduce space, the document's filename is identified only once.
The filename is always on a "Line" of "0". Any questions?
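If it helps, here is a rough sketch of how the output could be searched afterwards. It relies only on the layout described above: a header row, three tab-separated columns, and the filename on a "Line" of "0". The function name FindInOutput and the case-insensitive match are my own choices, not anything the script above requires.

```vbscript
Option Explicit

' Hypothetical helper: scan the tab-separated output for strTerm and
' return one "filename <tab> line <tab> text" row per hit.
Function FindInOutput(strTSV, strTerm)
    Dim arrROW, arrCOL, intROW, strFIL, strHIT
    strFIL = ""
    strHIT = ""
    arrROW = Split(strTSV, vbCrLf)
    For intROW = 1 To UBound(arrROW)              ' row 0 is the header
        arrCOL = Split(arrROW(intROW), vbTab, 3)  ' at most 3 fields
        If UBound(arrCOL) = 2 Then
            If arrCOL(1) = "0" Then
                strFIL = arrCOL(2)                ' "Line" 0 carries the filename
            ElseIf InStr(1, arrCOL(2), strTerm, vbTextCompare) > 0 Then
                strHIT = strHIT & strFIL & vbTab & arrCOL(1) & _
                         vbTab & arrCOL(2) & vbCrLf
            End If
        End If
    Next
    FindInOutput = strHIT
End Function

' Example use (assumes Document.csv exists in the current folder):
'   Dim objFSO
'   Set objFSO = CreateObject("Scripting.FileSystemObject")
'   WScript.Echo FindInOutput(objFSO.OpenTextFile("Document.csv", 1).ReadAll, "last")
```

Because each hit is reported with the most recent "Line 0" filename, the input has to stay in the order the script wrote it.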