Scan Microsoft Office files

Will Fawcett

I am trying to put together a script that will allow me to scan
Microsoft Office files and store "keywords" for those files so they
are searchable by content not just title.

If you open a Word file with Perl and look at the actual source, it is
basically a text file with a bunch of bogus code mixed in. I was hoping
someone here might have heard of a module that can strip out the
extraneous code and store just the plain-text words. Or is a regex my
only option?

-Will
 
wfsp

An example:

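#!/bin/perl5
use strict;
use warnings;
use Win32::OLE qw(in);    # 'in' is not exported by default

# Attach to the Word instance that is already running.
my $w = Win32::OLE->GetActiveObject('Word.Application')
    or die "Word is not running\n";
my $d = $w->ActiveDocument
    or die "No document is open\n";
my $paras = $d->Paragraphs;

# Print each paragraph's style name and text, tab-separated.
foreach my $para ( in $paras ) {
    my $style = $para->Style->{NameLocal};
    my $text  = $para->Range->{Text};
    print "$style\t$text\n";
}
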
This assumes Word is running and a document is open. The VBA help files
document all the methods and properties. A search on Win32::OLE will bring
up many tutorials and references.
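
For the "keywords" part of the question, a minimal sketch (not part of the original reply) might look like the following. The stop-word list, the three-character minimum, and the name extract_keywords are assumptions made up for illustration. It needs only plain text, such as the $text value printed above.

```perl
use strict;
use warnings;

# Hypothetical stop-word list; extend it to suit your documents.
my %STOP = map { $_ => 1 }
    qw(a an and are as at be by for in is it of on or the to with);

# Return a hash ref of keyword => count for a block of plain text.
sub extract_keywords {
    my ($text) = @_;
    my %count;
    # Lower-case the text and pull out runs of letters
    # (apostrophes are allowed after the first letter).
    for my $word ( lc($text) =~ /[a-z][a-z']*/g ) {
        next if length($word) < 3;    # skip very short tokens
        next if $STOP{$word};         # skip common words
        $count{$word}++;
    }
    return \%count;
}

# Example: list keywords, most frequent first, ties in alphabetical order.
my $kw = extract_keywords(
    "The quarterly report covers the budget. Budget figures are final.");
print "$_\t$kw->{$_}\n"
    for sort { $kw->{$b} <=> $kw->{$a} || $a cmp $b } keys %$kw;
```

Calling extract_keywords on each paragraph's text and merging the counts would give one keyword table per document, which could then be stored against the file name.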
 
