How to convert MS doc to plain text using Perl on unix

A. Sinan Unur

If you're on Windows and have Word, Win32::OLE

Once again, the perils of putting your entire question in the subject
line are demonstrated.

The OP needs this on Unix.

One alternative is to take a look at word2x (google for it).

On the other hand, if all one wants to do is, say, index the contents
of a Word file, the following would work to a certain extent:

#! /usr/bin/perl

use strict;
use warnings;

use File::Slurp;

my $word_file = shift;
my $doc = read_file($word_file, binmode => ':raw');

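# Delete every byte that is not CR, LF, tab, or printable ASCII
# (space through tilde); whatever remains is roughly the text.
$doc =~ s/[^\015\012\011\040-\176]//g;
write_file(\*STDOUT, $doc);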

__END__
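
Deleting bytes in place, as above, can glue a word onto whatever
printable junk sat next to it. A possible variation, in the spirit of
the Unix strings utility, is to keep only runs of printable characters
of some minimum length. This is only a rough sketch under a few
assumptions: printable_runs and its default minimum of 4 are made up
for illustration, and dropping NUL bytes first assumes that text stored
as UTF-16LE shows up as ASCII bytes separated by NULs. It uses only
core Perl, so File::Slurp is not needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return the runs of
# printable ASCII that are at least $min_len characters long.
sub printable_runs {
    my ($bytes, $min_len) = @_;
    $min_len = 4 unless defined $min_len;

    # Assumption: UTF-16LE text appears as ASCII bytes separated by NULs,
    # so removing the NULs first rejoins those characters.
    $bytes =~ tr/\0//d;

    # In list context, /g with one capture group returns every match.
    return $bytes =~ /([\011\040-\176]{$min_len,})/g;
}

# Act as a script only when run directly with a file name argument.
if (!caller() && @ARGV) {
    my $word_file = shift @ARGV;
    open my $fh, '<:raw', $word_file
        or die "Cannot open $word_file: $!";
    my $doc = do { local $/; <$fh> };    # slurp the whole file as bytes
    close $fh;
    print "$_\n" for printable_runs($doc);
}
```

Short runs are the ones most likely to be binary noise, so raising the
minimum length trades a few lost short words for a cleaner index.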

Sinan
 
