Doc to PDF using Perl

Scott R. Prelewicz

Can anyone suggest a good solution to convert MS Word docs to PDF
programmatically, using Perl?

I tried installing and using AbiWord, but I am having too many issues with
the program, and I am looking for a different solution.
I searched CPAN and can only find Win32 modules for doing something close,
but I'm not on Windows.

I am using FreeBSD. The .docs are uploaded resumes, and the client wants
these to be converted to PDF automatically and stored on the server.

Thanks in advance,

Scott
 
Ben Morrow

Quoth "Scott R. Prelewicz":
Can anyone suggest a good solution to convert MS Word docs to PDF
programmatically, using Perl?

I tried installing and using AbiWord, but I am having too many issues with
the program, and I am looking for a different solution.
I searched CPAN and can only find Win32 modules for doing something close,
but I'm not on Windows.

I don't know of any program which can read Word files except Word (this
is one of the main reasons I hate M$ :) ). If you have access to a Win32
machine Word is (relatively) easy to script using Win32::OLE. Otherwise,
find some program that can read the files decently and we can probably
find a way for you to drive it in Perl (that is, the currently hard part
of your problem is not a Perl problem... sorry).

Ben
 
Alan J. Flavell

I don't know of any program which can read Word files except Word

Hmmm. I've several times encountered MS Word files which could not be
read by the then-current version of MS Word, but were happily read by
openoffice.

(this is one of the main reasons I hate M$ :) ).

...and all those who insist on putting their substantive content into
MS Word email attachments, without offering any clue to its content in
the Subject header (which typically reads "Important", "Please read",
or similar vacuous sentiments), nor in their covering plaintext note
(which typically contains instructions on how to read MS Word
documents, implying that the misbegotten sender is already aware that
there is a problem, but hasn't got the sense to solve that problem)...

If you have access to a Win32 machine Word is (relatively) easy to
script using Win32::OLE.

Indeed. MS recommend RTF as a more portable format, by the way.

all the best
 
Dr.Ruud

Scott R. Prelewicz wrote:
Can anyone suggest a good solution to convert MS Word docs to PDF
programmatically, using Perl?

Why use Perl? I would use a PostScript printer that prints to file, and
then ps2pdf.
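
If the second half of that pipeline still needs to be driven from Perl, a minimal sketch might look like the following. It assumes Ghostscript's ps2pdf is on the PATH and that the PostScript files have already been produced by printing to file; the helper names pdf_name_for and ps_to_pdf are made up for illustration.

```perl
use strict;
use warnings;

# Derive the output name: resume.ps -> resume.pdf (hypothetical helper).
sub pdf_name_for {
    my ($ps_file) = @_;
    (my $base = $ps_file) =~ s/\.ps\z//i;
    return "$base.pdf";
}

# Run ps2pdf on one PostScript file and return the path of the PDF.
sub ps_to_pdf {
    my ($ps_file) = @_;
    my $pdf_file = pdf_name_for($ps_file);
    # List form of system() bypasses the shell, so odd file names are safe.
    system('ps2pdf', $ps_file, $pdf_file) == 0
        or die "ps2pdf failed for $ps_file (exit status $?)\n";
    return $pdf_file;
}

# Convert every file named on the command line; does nothing without arguments.
print ps_to_pdf($_), "\n" for @ARGV;
```

This only covers the PostScript-to-PDF step. Getting from the Word file to PostScript still needs something that can read and print the document.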
 
Ben Morrow

Quoth "Alan J. Flavell":
Hmmm. I've several times encountered MS Word files which could not be
read by the then-current version of MS Word, but were happily read by
openoffice.

Fair enough... I presume you mean old versions of Word files in newer
versions of Word? I was assuming the files were all from Word 2k/2k+3,
as I don't know of anyone who doesn't use one of those two now.

<rant snipped>

:)

Indeed. MS recommend RTF as a more portable format, by the way.

They do; but as of Word2k I would recommend Word's 'HTML' export, which
(while it is not HTML, or XHTML) is a perfectly decent XML format
(...plus M$ conditional comments, IIRC, but those can be stripped fairly
easily, or ignored) which preserves all but everything in the original
Word document.
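
For what "stripped fairly easily" could look like in Perl, here is a rough sketch. It assumes the export only uses the two simplest conditional comment forms: the hidden kind (`<!--[if ...]> ... <![endif]-->`), which is dropped whole, and the revealed kind (`<![if ...]> ... <![endif]>`), where only the markers are removed. The function name is invented for this example, and other variants (for instance the `<!--[if !mso]><!-->` form) are not handled correctly.

```perl
use strict;
use warnings;

# Remove conditional comments from a string of Word-exported markup.
sub strip_conditional_comments {
    my ($html) = @_;
    # Hidden form: drop the markers and everything between them.
    $html =~ s/<!--\[if\b[^\]]*\]>.*?<!\[endif\]-->//gs;
    # Revealed form: drop only the markers, keep the enclosed content.
    $html =~ s/<!\[if\b[^\]]*\]>//g;
    $html =~ s/<!\[endif\]>//g;
    return $html;
}

# Filter each file named on the command line to STDOUT.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $html = do { local $/; <$fh> };
    close $fh;
    print strip_conditional_comments($html);
}
```

A regex pass like this is only a stopgap; anything that needs to be robust should go through a real parser.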

I read the OP as having a collection of Word files on a Unix machine and
no way to proceed from there... not a situation I envy.

Ben
 
Alan J. Flavell

Fair enough... I presume you mean old versions of Word files in
newer versions of Word?

Right.

I was assuming the files were all from Word 2k/2k+3,

We have at least one academic who insists on using an obsolete Mac
version of MS Word on his obsolete Mac...

As far as he's concerned, it ain't broke, so he sees no reason to fix
it. His usage of Mac Symbol fonts produces some quite exotic displays
on Windows.[0]

They do; but as of Word2k I would recommend Word's 'HTML' export,
which (while it is not HTML, or XHTML) is a perfectly decent XML
format (...plus M$ conditional comments, IIRC, but those can be
stripped fairly easily, or ignored) which preserves all but
everything in the original Word document.

Fair comment. Pity they try to call it HTML, though.

I read the OP as having a collection of Word files on a Unix machine
and no way to proceed from there... not a situation I envy.

In earlier times, I would pipe the .doc attachment to "strings"[1]

But with current Word versions, it doesn't seem to work at all.
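
For anyone who wants the same trick from Perl rather than the strings utility, a rough equivalent is sketched below. The second function is a guess at why the plain approach stops working: it assumes the text is stored as 16-bit little-endian characters, so each ASCII letter is followed by a NUL byte. That is an assumption, not something verified against the file format, and both function names are invented here.

```perl
use strict;
use warnings;

# Return runs of at least $min_len printable ASCII bytes, like strings(1).
sub printable_runs {
    my ($bytes, $min_len) = @_;
    $min_len ||= 4;
    my @runs = $bytes =~ /([\x20-\x7E\t]{$min_len,})/g;
    return @runs;
}

# Same idea for text assumed to be UTF-16LE: ASCII byte, then a NUL byte.
sub utf16le_runs {
    my ($bytes, $min_len) = @_;
    $min_len ||= 4;
    my @runs = $bytes =~ /((?:[\x20-\x7E]\x00){$min_len,})/g;
    tr/\x00//d for @runs;    # drop the NUL half of each character
    return @runs;
}

# Dump both kinds of runs for each file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    my $data = do { local $/; <$fh> };
    close $fh;
    print "$_\n" for printable_runs($data), utf16le_runs($data);
}
```

Either way this recovers raw text only, with no formatting, so it is a way to peek at a document rather than convert it.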

Sorry, this is drifting off-topic - I guess I should stop.

best regards

[0] Word still seems to make no attempt to replace Symbol font
characters by their proper Unicode equivalents, meaning that their
resulting HTML is rubbish on specification-conforming browsers. But I
guess you've seen me ranting about that before (since around 1997-8,
at least!).

[1] which sometimes had the side effect of revealing interesting
snippets that they had deleted before sending the attachment.
 
David Squire

Alan said:
I read the OP as having a collection of Word files on a Unix machine
and no way to proceed from there... not a situation I envy.

In earlier times, I would pipe the .doc attachment to "strings"[1]

But with current Word versions, it doesn't seem to work at all.

I use antiword by Adri van Os, which seems to cope just fine with all
.doc files I have thrown at it.
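
Since the original question was about doing this from Perl, here is a minimal sketch of driving antiword for the uploaded resumes. It assumes an antiword build that supports the -a option (PDF output for a given paper size) and that 'letter' is the wanted paper size; the names pdf_path_for_doc and doc_to_pdf are hypothetical.

```perl
use strict;
use warnings;

# Derive the output name: resume.doc -> resume.pdf (hypothetical helper).
sub pdf_path_for_doc {
    my ($doc_file) = @_;
    (my $base = $doc_file) =~ s/\.doc\z//i;
    return "$base.pdf";
}

# Convert one .doc file with antiword and return the path of the PDF.
sub doc_to_pdf {
    my ($doc_file, $paper) = @_;
    $paper ||= 'letter';    # assumed default paper size
    my $pdf_file = pdf_path_for_doc($doc_file);

    # List-form pipe open avoids the shell, so uploaded file names are safe.
    open my $in, '-|', 'antiword', '-a', $paper, $doc_file
        or die "Cannot run antiword: $!\n";
    binmode $in;
    my $pdf = do { local $/; <$in> };
    # close() on a pipe fails if the command exited with a non-zero status.
    close $in or die "antiword failed for $doc_file (exit status $?)\n";

    open my $out, '>:raw', $pdf_file or die "Cannot write $pdf_file: $!\n";
    print {$out} $pdf;
    close $out or die "Cannot finish writing $pdf_file: $!\n";
    return $pdf_file;
}

# Convert every file named on the command line; does nothing without arguments.
print doc_to_pdf($_), "\n" for @ARGV;
```

The output is read fully before anything is written, so a failed conversion does not leave a half-written PDF on the server. Layout fidelity is whatever antiword gives you, which may be too plain for heavily formatted resumes.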

DS
 
Brad Baxter

Alan said:
...and all those who insist on putting their substantive content into
MS Word email attachments, without offering any clue to its content in
the Subject header (which typically reads "Important", "Please read",
or similar vacuous sentiments), nor in their covering plaintext note
(which typically contains instructions on how to read MS Word
documents, implying that the misbegotten sender is already aware that
there is a problem, but hasn't got the sense to solve that problem)...

</rant>

<emphatic>Hear hear!</emphatic> And it sends me through the
roof to discover that the document contains a single paragraph.
The day I saw Google mail's "View as HTML" I did the Herman
Munster hop (high fives not being my thing).
 
