nodata
How can I extract Metadata from a range of Office documents, on a Linux box?
How can I extract Metadata from a range of Office documents, on a
Linux box?
Ben Morrow said (in reply to nodata):
If you have access to a Win32 box over the network, it wouldn't be too
hard to write a Perl script for that box which would receive a
document, open it in Office using Win32::OLE, save it as HTML and send
it back.
a recent version of Office on a Windows box. New (since 2k-ish) versions
of Office actually produce XML, with pretty much everything in the
original file intact (and, of course, the file is about one tenth the
size...). You can then parse this with, say, XML::LibXML and get out the
data you need.
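
For what it's worth, here is a rough sketch of the parsing step once you have the XML on the Linux side. It uses Python's standard xml.etree.ElementTree rather than XML::LibXML, purely for illustration. It assumes the metadata sits in a DocumentProperties element under the urn:schemas-microsoft-com:office:office namespace, which is how Office 2003-style XML is usually laid out; check a real export before relying on it. The SAMPLE document and the extract_properties name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: Office 2003-style XML keeps document metadata in this namespace.
OFFICE_NS = "urn:schemas-microsoft-com:office:office"


def extract_properties(xml_text):
    """Return {property name: text} from the first DocumentProperties element."""
    root = ET.fromstring(xml_text)
    props = root.find(f".//{{{OFFICE_NS}}}DocumentProperties")
    if props is None:
        return {}
    result = {}
    for child in props:
        # Tags look like "{namespace}Author"; keep only the local name.
        name = child.tag.split("}", 1)[-1]
        result[name] = (child.text or "").strip()
    return result


# Hand-written stand-in for a file saved as XML from Word (hypothetical values).
SAMPLE = """
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:o="urn:schemas-microsoft-com:office:office">
  <o:DocumentProperties>
    <o:Title>Quarterly report</o:Title>
    <o:Author>nodata</o:Author>
  </o:DocumentProperties>
  <w:body/>
</w:wordDocument>
"""

if __name__ == "__main__":
    for name, value in extract_properties(SAMPLE).items():
        print(f"{name}: {value}")
```

The same idea carries over to XML::LibXML: register the namespace, find the DocumentProperties node, and walk its children.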