Use ASP to search text in Word docs on intranet

Discussion in 'ASP General' started by Carstonio, Aug 17, 2006.

  1. Carstonio

    Carstonio Guest

    I use ASP to display links to Word documents on an intranet. Is there a way
    in ASP to do text searches on the documents' contents? I want the results to
    have the link to the Word document plus two or three lines from the document
    that include the search terms.
    Carstonio, Aug 17, 2006
    #1

  2. McKirahan

    McKirahan Guest

    "Carstonio" <> wrote in message
    news:...
    > I use ASP to display links to Word documents on an intranet. Is there a way
    > in ASP to do text searches on the documents' contents? I want the results to
    > have the link to the Word document plus two or three lines from the document
    > that include the search terms.


    The following script may get you started.

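    Option Explicit
    Const cVBS = "Word_doc.vbs"
    ' A full path is safer here: Word may not look in the script's folder.
    Const cDOC = "Word_doc.doc"
    Dim objMSW
    ' Version-independent ProgID; "Word.Application.8" targets Word 97 specifically.
    Set objMSW = CreateObject("Word.Application")
    Dim objDOC
    Set objDOC = objMSW.Documents.Open(cDOC)
    Dim strDOC
    strDOC = objDOC.Content.Text    ' full document text; paragraphs end in vbCr
    objDOC.Close False              ' close without saving
    objMSW.Quit 0                   ' 0 = wdDoNotSaveChanges
    Set objDOC = Nothing
    Set objMSW = Nothing
    WScript.Echo strDOC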

    Since the above script is slow, I might suggest that
    you have a process that uses the above to preprocess
    each of your MS-Word documents and stores the
    result in a database or text file then use that for your
    searches.
    McKirahan, Aug 17, 2006
    #2

  3. McKirahan

    McKirahan Guest

    Here's a script that will process a list of MS-Word documents.

    It generates a tab-separated file with a header row of:
    Document Line Text

    Optionally, the file can be opened in MS-Excel for analysis
    or review (via Data + Get External Data + Import Text File...).

    Watch for word wrap; long lines may have been broken in posting.

    Option Explicit
    '*
    '* Declare Constants
    '*
    Const cVBS = "Document.vbs"
    Const cTXT = "Document.txt"   ' input: list of documents, one per line
    Const cCSV = "Document.csv"   ' output: tab-separated despite the extension
    '*
    '* Declare Variables
    '*
    Dim arrDOC
    Dim intDOC
    Dim strDOC
    Dim intINS
    Dim intLIN                    ' line number within the current document
    Dim strOTF
    Dim intTOT(1)                 ' (0) = documents, (1) = lines in all documents
    intTOT(0) = 0
    intTOT(1) = 0
    Dim strTOT
    strTOT = "# Documents; ## Lines"
    Dim arrTXT
    Dim intTXT
    Dim strTXT
    '*
    '* Declare Objects
    '*
    Dim objDOC
    Dim objFSO
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Dim objMSW
    ' Version-independent ProgID; "Word.Application.8" targets Word 97 specifically.
    Set objMSW = CreateObject("Word.Application")
    Dim objOTF
    '*
    '* Read list of documents
    '*
    Set objOTF = objFSO.OpenTextFile(cTXT,1)
    strOTF = objOTF.ReadAll
    objOTF.Close
    Set objOTF = Nothing
    '*
    '* Documents, Lines
    '*
    Set objOTF = objFSO.OpenTextFile(cCSV,2,True)
    objOTF.WriteLine("Document" & vbTab & "Line" & vbTab & "Text")
    arrDOC = Split(strOTF,vbCrLf)
    For intDOC = 0 To UBound(arrDOC)
        strDOC = arrDOC(intDOC)
        If InStr(LCase(strDOC),".doc") > 0 Then
            ' "attrib" output puts attribute flags before each path;
            ' keep everything from the drive letter onward.
            intINS = InStr(strDOC,":")
            If intINS > 1 Then strDOC = Mid(strDOC,intINS-1)
            intTOT(0) = intTOT(0) + 1
            objOTF.WriteLine(intTOT(0) & vbTab & "0" & vbTab & strDOC)
            Set objDOC = objMSW.Documents.Open(strDOC)
            strTXT = objDOC.Content.Text
            arrTXT = Split(strTXT,vbCr)
            intLIN = 0            ' restart numbering for each document
            For intTXT = 0 To UBound(arrTXT)
                If Trim(arrTXT(intTXT)) <> "" Then
                    intLIN = intLIN + 1
                    intTOT(1) = intTOT(1) + 1
                    objOTF.WriteLine(intTOT(0) & vbTab & intLIN & vbTab & _
                        arrTXT(intTXT))
                End If
            Next
            objDOC.Close False    ' close without saving
            Set objDOC = Nothing
        End If
    Next
    objOTF.Close
    Set objOTF = Nothing
    '*
    '* Destroy Objects
    '*
    ' Quit Word once, after the loop; quitting inside the loop
    ' would break the next Documents.Open call.
    objMSW.Quit 0                 ' 0 = wdDoNotSaveChanges
    Set objMSW = Nothing
    Set objFSO = Nothing
    '*
    '* Finish
    '*
    strTOT = Replace(strTOT,"##",FormatNumber(intTOT(1),0))
    strTOT = Replace(strTOT,"#",FormatNumber(intTOT(0),0))
    MsgBox strTOT,vbInformation,cVBS


    The input file ("Document.txt") can be generated via the MS-DOS
    command "attrib". To identify all MS-Word documents on a drive,
    run the following from a command prompt:
    attrib \*.doc /s > Document.txt

    Alternatively, you can just enter the filenames of the documents
    that you're interested in into a text file one per line; for example:
    C:\My Documents\Document1.doc
    C:\My Documents\Document2.doc


    An example of the output follows:
    Document Line Text
    1 0 C:\MYDOCU~1\Document1.doc
    1 1 First line
    1 2 Last line
    2 0 C:\MYDOCU~1\Document2.doc
    2 1 line number one
    2 2 line number two
    2 3 line number three

    To reduce space, the document's filename is identified only once.
    The filename is always on a "Line" of "0". Any questions?
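
    As a rough sketch of how the generated file could then be searched: the
    function below reads the tab-separated file and collects, per document,
    up to a given number of lines containing the search term. The names
    SearchIndex, strFILE, strTERM and intMAX are hypothetical, and the
    matching is a plain case-insensitive substring test, nothing smarter.

    ```vbscript
    Option Explicit

    ' Search the tab-separated file strFILE for strTERM (case-insensitive).
    ' Returns a Dictionary: key = document path, item = up to intMAX matching
    ' lines joined with vbCrLf. intMAX is assumed to be at least 1.
    Function SearchIndex(strFILE, strTERM, intMAX)
        Dim objFSO, objOTF, objHIT, objCNT
        Dim arrCOL, strDOC
        Set objFSO = CreateObject("Scripting.FileSystemObject")
        Set objHIT = CreateObject("Scripting.Dictionary")   ' path -> lines
        Set objCNT = CreateObject("Scripting.Dictionary")   ' path -> count
        Set objOTF = objFSO.OpenTextFile(strFILE, 1)
        strDOC = ""
        If Not objOTF.AtEndOfStream Then objOTF.SkipLine    ' header row
        Do Until objOTF.AtEndOfStream
            ' Limit the split to 3 parts so tabs inside the text survive.
            arrCOL = Split(objOTF.ReadLine, vbTab, 3)
            If UBound(arrCOL) = 2 Then
                If arrCOL(1) = "0" Then
                    strDOC = arrCOL(2)      ' a "Line" of 0 carries the filename
                ElseIf InStr(1, arrCOL(2), strTERM, vbTextCompare) > 0 Then
                    If Not objHIT.Exists(strDOC) Then
                        objHIT.Add strDOC, arrCOL(2)
                        objCNT.Add strDOC, 1
                    ElseIf objCNT(strDOC) < intMAX Then
                        objHIT(strDOC) = objHIT(strDOC) & vbCrLf & arrCOL(2)
                        objCNT(strDOC) = objCNT(strDOC) + 1
                    End If
                End If
            End If
        Loop
        objOTF.Close
        Set SearchIndex = objHIT
    End Function

    ' Example use from a .vbs file ("budget" is a made-up search term):
    '   Dim objRES, strKEY
    '   Set objRES = SearchIndex("Document.csv", "budget", 3)
    '   For Each strKEY In objRES.Keys
    '       WScript.Echo strKEY & vbCrLf & objRES(strKEY)
    '   Next
    ```

    In an ASP page the file path would more likely come from Server.MapPath,
    and the results would go out through Response.Write (HTML-encoded) as a
    link to each document followed by its matching lines.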
    McKirahan, Aug 18, 2006
    #3
  4. McKirahan

    McKirahan Guest

    "McKirahan" <> wrote in message
    news:...

    [snip]

    > If Trim(arrTXT(intTXT)) <> "" Then


    [snip]

    Logic needs to be added before the above line to remove
    unprintable characters. Let me know if you need it.
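
    A minimal sketch of what that logic might look like. CleanLine is a
    hypothetical helper name, and treating every character code below 32
    (plus 127) as unprintable is an assumption; it also turns tabs into
    spaces, which keeps them from breaking the tab-separated columns.

    ```vbscript
    Option Explicit

    ' Replace control characters (codes 0-31 and 127) with a space,
    ' then trim the result. Hypothetical helper, not in the original script.
    Function CleanLine(strIN)
        Dim intPOS, intCHR, strOUT
        strOUT = ""
        For intPOS = 1 To Len(strIN)
            ' AscW can return a negative value for high code points,
            ' so the lower bound is checked explicitly.
            intCHR = AscW(Mid(strIN, intPOS, 1))
            If (intCHR >= 0 And intCHR < 32) Or intCHR = 127 Then
                strOUT = strOUT & " "
            Else
                strOUT = strOUT & Mid(strIN, intPOS, 1)
            End If
        Next
        CleanLine = Trim(strOUT)
    End Function
    ```

    It would be applied to each element of arrTXT before the test for an
    empty string, and the cleaned value would be the one written out.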
    McKirahan, Aug 18, 2006
    #4
  5. Carstonio

    Carstonio Guest

    Thanks for your help. Your scripts appear to be written in VB, as opposed to
    VBScript for an ASP-generated Web page. Our Intranet has at least 200 Word
    files that are updated every month or so.

    I like your idea of preprocessing. Is there a way to do that automatically
    when the ASP search page loads, so the page creates a new text file if the
    old file is a week old or missing?
    Carstonio, Aug 18, 2006
    #5
  6. McKirahan

    McKirahan Guest

    "Carstonio" <> wrote in message
    news:...
    > Thanks for your help. Your scripts appear to be written in VB, as opposed to
    > VBScript for an ASP-generated Web page. Our Intranet has at least 200 Word
    > files that are updated every month or so.
    >
    > I like your idea of preprocessing. Is there a way to do that automatically
    > when the ASP search page loads, so the page creates a new text file if the
    > old file is a week old or missing?


    VBScript is the same whether it's in a .vbs or a .asp file.
    There are a few minor differences; among them:
    a) CreateObject is Server.CreateObject
    b) WScript.Echo (et al.) is not supported; use Response.Write instead

    My suggestion was that this be done offline using a .vbs
    file rather than an .asp page though you could convert it.

    It could be scheduled nightly so the file should be current.
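
    If the page itself should notice a missing or week-old file, a check
    along these lines might do. This is only a sketch: IsIndexStale and its
    parameters are hypothetical names, and the age is measured from the
    file's last-modified date in whole calendar days.

    ```vbscript
    Option Explicit

    ' True when strFILE is missing or was last modified at least
    ' intDAYS calendar days ago. Hypothetical helper.
    Function IsIndexStale(strFILE, intDAYS)
        Dim objFSO
        Set objFSO = CreateObject("Scripting.FileSystemObject")
        If Not objFSO.FileExists(strFILE) Then
            IsIndexStale = True
        Else
            IsIndexStale = (DateDiff("d", _
                objFSO.GetFile(strFILE).DateLastModified, Now) >= intDAYS)
        End If
        Set objFSO = Nothing
    End Function

    ' Possible use in an ASP page:
    '   If IsIndexStale(Server.MapPath("Document.csv"), 7) Then
    '       ' warn that the index is out of date, or trigger the rebuild
    '   End If
    ```

    The check is cheap, but the rebuild is not. Microsoft's guidance
    discourages automating Office from server-side code such as ASP, which
    is another reason to leave the rebuild to a scheduled .vbs job and have
    the page only report that the file is out of date.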
    McKirahan, Aug 19, 2006
    #6