Is there an MS Office to PDF conversion library?

Eeby

My boss asked me to research this question. He wants me to write a
script / program that will convert a directory of Word, Excel, and
PowerPoint documents into PDFs. I did some Google searching and found
this:

http://www.activepdf.com/

However it is Windows-only. We have Linux servers.

Does anyone know of a Java library that I could use for this? Or any
library in any language? PHP? Perl?

Any help or advice would be greatly appreciated.

Thanks,

E
 
Thomas Kellerer

Eeby, 14.01.2008 14:31:
My boss asked me to research this question. He wants me to write a
script / program that will convert a directory of Word, Excel, and
PowerPoint documents into PDFs. I did some Google searching and found
this:

http://www.activepdf.com/

However it is Windows-only. We have Linux servers.

Does anyone know of a Java library that I could use for this? Or any
library in any language? PHP? Perl?

Any help or advice would be greatly appreciated.

OpenOffice can generate PDF, can read MS Office and has an integration
with Java. Maybe that could be a way for you.

Thomas
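
A minimal sketch of the OpenOffice route suggested above, not a tested production script. It assumes a LibreOffice or OpenOffice-derived `soffice` binary on the PATH that accepts the `--headless --convert-to pdf --outdir` options (current LibreOffice releases do; older OpenOffice builds may need the Java integration mentioned above instead). The function names and the extension list are hypothetical, chosen for illustration.

```python
import subprocess
from pathlib import Path

# Extensions treated as MS Office documents (an assumption; extend as needed).
OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def find_office_files(source_dir):
    """Return the Office documents directly inside source_dir, sorted by path."""
    return sorted(
        p for p in Path(source_dir).iterdir()
        if p.is_file() and p.suffix.lower() in OFFICE_SUFFIXES
    )


def build_command(files, out_dir, soffice="soffice"):
    """Build the headless conversion command without running it."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), *map(str, files)]


def convert_directory(source_dir, out_dir, soffice="soffice"):
    """Convert every Office document in source_dir to PDF in out_dir."""
    files = find_office_files(source_dir)
    if not files:
        return []
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(build_command(files, out_dir, soffice), check=True)
    return [Path(out_dir) / (f.stem + ".pdf") for f in files]
```

Calling `convert_directory("docs", "pdf")` (both directory names are placeholders) would write one PDF per input file. Two inputs sharing a stem, such as `report.doc` and `report.xls`, would map to the same `report.pdf`, so a real script would need to handle that case.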
 
Andrew Thompson

My boss asked me to research this question.

Did you dare to ask the point of this exercise?
(Or are you just taking the money?*)

..He wants me to write a
script / program that will convert a directory of Word, Excel, and
PowerPoint documents into PDFs.

That seems relatively pointless and stupid.
- About all that PDFs are good for is page layout.
- Few 'something else' -> PDF converters will do
any intelligent thing with the page layout in
the conversion process.
- If you can get a program that parses and reads
the documents, you might as well just dump them
direct to printer, without the file clutter of
ever creating the PDF.

Which brings me back to..

What is the point of this exercise?
(* And no - you ain't payin' me enough for me
to 'settle for the money - no questions asked'.)
 
AL

Andrew said:
Did you dare to ask the point of this exercise?
(Or are you just taking the money?*)


I'm curious why you would consider this any of your business?
Maybe the OP's boss is one of those guys who is always thinking and
wondering about stuff like, "gee, I wonder if there's a way to..., hey
OP, how 'bout checking something out for me..." Once upon a time I had
a boss like that and the diversity of assignments was incredibly
satisfying, and educational. So, I guess *your* response would be, "go
to hell, you don't pay me enough to do that crap without a 30 page
RFI..." Oh, what a stellar employee you must be.

That seems relatively pointless and stupid.

As does your response...



- About all that PDFs are good for is page layout.

What about sharing documents with others without having to consider
which version of MS Office they may be running or whether they even have
Office running or whether their version of Open Office can read the
newest Word document? What if the "boss" is planning to publish these
documents on a website - wouldn't PDF be a preferred format for
downloading?

http://www.adobe.com/products/acrobat/adobepdf.html

- Few 'something else' -> PDF converters will do
any intelligent thing with the page layout in
the conversion process.


The OP didn't indicate that "any intelligent thing" was required - just
conversion.




- If you can get a program that parses and reads
the documents, you might as well just dump them
direct to printer, without the file clutter of
ever creating the PDF.


The OP didn't indicate printing to be the primary objective.


Which brings me back to..

What is the point of this exercise?
(* And no - you ain't payin' me enough for me
to 'settle for the money - no questions asked'.)




Which leads me to wonder, what was the point of your response???

It may be that the OP asked a legitimate question you didn't have a clue
how to answer (intelligently) so you chose to slap them around. Once
upon a time I had a boss like that too - we had a name for him, bet I
can guess yours...

AL
 
Martin Gregorie

AL said:
What if the "boss" is planning to publish these
documents on a website - wouldn't PDF be a preferred format for
downloading?

No, not unless there's a requirement to make the document somewhat
unmodifiable: even a PDF can be cracked into and changed if you're
determined enough.

HTML is better. It's smaller and faster to load, even if it's MS
Office-generated HTML. Save the same document as a PDF and as HTML. Compare the
file sizes with each other and with the original MS Office document.
HTML < MS Office doc < PDF.
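
The comparison described above is easy to script. A minimal sketch, assuming the three exports already exist on disk; the function name and the file names in the usage note are hypothetical. It only reports sizes and makes no claim about which format wins, since that depends on the document.

```python
from pathlib import Path


def sizes_by_format(paths):
    """Return (suffix, size in bytes) pairs, smallest file first."""
    pairs = [(Path(p).suffix.lower(), Path(p).stat().st_size) for p in paths]
    return sorted(pairs, key=lambda pair: pair[1])
```

For example, `sizes_by_format(["report.doc", "report.html", "report.pdf"])` would list the three exports of one document in order of increasing size.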
 
Joshua Cranmer

AL said:
I'm curious why you would consider this any of your business?

There is an implicit requirement on Usenet--we, the responders, have
full rights to criticize the methodology of any poster.

It also happens that, fairly often, the root problem can be more easily
solved by a different methodology than the OP wants to use. This comes
up quite frequently in the case of reflection in Java: most of the time,
the best answer is to use something else.

Maybe the OP's boss is one of those guys who is always thinking and
wondering about stuff like, "gee, I wonder if there's a way to..., hey
OP, how 'bout checking something out for me..." Once upon a time I had
a boss like that and the diversity of assignments was incredibly
satisfying, and educational.

Is this the case with the OP right now?

For future reference, we only know what you tell us about the problem,
and must therefore assume the rest. The proper response for "I need XXX
to be done in YYY way" is going to be different than "Is YYY a suitable
way to do XXX?"

What about sharing documents with others without having to consider
which version of MS Office they may be running or whether they even have
Office running or whether their version of Open Office can read the
newest Word document?

I would recommend RTFs, but OOo tends to quickly munge these documents.
In general, a Word 95 document should be supported by anyone who cares.
Hell, MS even has the reference for one of its early Word file formats!

Which leads me to wonder, what was the point of your response???

To point out that there might be other means to solve the unstated core
problem than the way the OP has asked for.

I read once in a guideline for asking questions that the second of these
two questions is preferred:

"Hi, I think I have a hairline crack on my motherboard; how would I check?"

"Hi, I am having a problem with my computer. I am getting random memory
errors, [etc.]. What may be causing these problems, and how would I check?"

The question the OP asked was in the style of the former, that is,
assuming the answer and asking it. I suspect that Andrew was attempting
to glean the sort of information provided in the latter style.
 
Joshua Cranmer

Martin said:
No, not unless there's a requirement to make the document somewhat
unmodifiable: even a PDF can be cracked into and changed if you're
determined enough.

HTML is better. It's smaller and faster to load, even if it's MS
Office-generated HTML. Save the same document as a PDF and as HTML. Compare the
file sizes with each other and with the original MS Office document.
HTML < MS Office doc < PDF.

PDF is an extremely rigid, final-proof-centric format. HTML is extremely
loose and, even taking into account CSS through all current WDs (and
thus exiting the world of even niche-browser support), resistant to
certain concepts like pagination and final format designs.
 
Andrew Thompson

PDF is an extremely rigid, final-proof-centric format. HTML is extremely
loose and, even taking into account CSS through all current WDs
...

WD? That's a new one on me!

War Department? Word Disparity? Will Dated? ..What?
 
AL

Joshua said:
There is an implicit requirement on Usenet--we, the responders, have
full rights to criticize the methodology of any poster.


I recognize that right and freely exercise it myself.





Is this the case with the OP right now?


The OP's exact circumstances are not known, so the sarcasm about "just
taking the money" is pointless.




For future reference, we only know what you tell us about the problem,
and must therefore assume the rest. The proper response for "I need XXX
to be done in YYY way" is going to be different than "Is YYY a suitable
way to do XXX?"

No argument there. However, in the event the boss has already determined
this to be the suitable way, (shall we also interrogate the boss to
determine his/her qualifications to make that determination?), the OP's
assignment is to find out how to get it done - some assignments are like
that.





I would recommend RTFs, but OOo tends to quickly munge these documents.

You just identified an incompatibility that PDFs avoid.



In general, a Word 95 document should be supported by anyone who cares.

So, your advice is put it out there in that format and damn those who
"don't care" ? I can see the OP going back to the boss saying "just
put it out there in Word, Excel, PowerPoint format and f*** 'em if they
can't take a joke."






To point out that there might be other means to solve the unstated core
problem than the way the OP has asked for.


Why can't it be accepted that *maybe* the alternatives have been weighed
and this is what the client needs?






I read once in a guideline for asking questions that the second of these
two questions is preferred:

"Hi, I think I have a hairline crack on my motherboard; how would I check?"

"Hi, I am having a problem with my computer. I am getting random memory
errors, [etc.]. What may be causing these problems, and how would I check?"



Or maybe, "Hi, I've diagnosed a problem with my computer and determined
I need a new motherboard, can you advise me the best way to replace it?"


AL
 
Eeby

Thanks for the replies. That's very helpful. The reason I'm asked to
research PDF conversion: the organization I work for posts documents
on its website in MS Office formats. Management would like to post
PDFs instead.

E
 
AL

Eeby said:
Thanks for the replies. That's very helpful. The reason I'm asked to
research PDF conversion: the organization I work for posts documents
on its website in MS Office formats. Management would like to post
PDFs instead.

E


FWIW, I agree with management.

AL
 
Andrew Thompson

Thanks for the replies. That's very helpful. The reason I'm asked to
research PDF conversion: the organization I work for posts documents
on its website in MS Office formats. Management would like to post
PDFs instead.

That does not explain *why*.

Why would management prefer to put PDFs
(which are higher bandwidth than the
equivalent MS Doc.) on the site?
 
Steve Sobol

Why would management prefer to put PDFs
(which are higher bandwidth than the
equivalent MS Doc.) on the site?

So people without Microsoft Office can read them. Yes, MS has free Office
document viewers, but plenty of people already have Acrobat Reader or another
PDF viewer installed. Plus, if you don't run Windows you may be SOL if you
need to view the document (maybe, maybe not on a Mac, definitely on other
platforms). PDF is pretty ubiquitous and viewers are available for every
common computing platform.
 
Arne Vajhøj

AL said:
I'm curious why you would consider this any of your business?

If the OP wants help with no questions asked then he should
hire a consultant for 100 USD/h (or whatever).

If the OP wants free help he will have to accept that people
will ask questions - maybe to better understand the problem, maybe
because they have a similar problem, maybe because they are
just curious.

Maybe the OP's boss is one of those guys who is always thinking and
wondering about stuff like, "gee, I wonder if there's a way to..., hey
OP, how 'bout checking something out for me..." Once upon a time I had
a boss like that and the diversity of assignments was incredibly
satisfying, and educational. So, I guess *your* response would be, "go
to hell, you don't pay me enough to do that crap without a 30 page
RFI..." Oh, what a stellar employee you must be.

The lack of applicability of your analogy to the situation here
says a bit about you as an employee.

Arne
 
Arne Vajhøj

Andrew said:
That does not explain *why*.

Why would management prefer to put PDFs
(which are higher bandwidth than the
equivalent MS Doc.) on the site?

There are 3 good reasons to put PDFs instead of DOCs up:

1) read-only (not fully true, but it does not open up
in a program capable of modifying it)
2) in general works better on non-Windows platforms
3) does not contain "extra information" (*)

Arne

*) There was a little incident in Denmark a couple of years ago
where the prime minister sent a speech to the press in DOC format.
And the press looked at the document and could see that the DOC
originally came from a man working in an industrial association. The
IT department decided that all future speeches sent to the press
would be in PDF format.
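
To illustrate the kind of "extra information" point 3 refers to, here is a hedged sketch that reads the author fields stored inside a document. It is an assumption-laden example: the incident above involved the older binary DOC format, whereas this sketch reads the newer `.docx` format, which is a zip package whose `docProps/core.xml` part holds the core properties. The function name is hypothetical. Whether a given converter carries these fields into the PDF is not something the thread establishes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an OOXML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def docx_authors(path):
    """Return the creator and last-modified-by fields of a .docx file."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    # Missing fields come back as empty strings rather than None.
    creator = root.findtext("dc:creator", default="", namespaces=NS)
    modifier = root.findtext("cp:lastModifiedBy", default="", namespaces=NS)
    return {"creator": creator, "lastModifiedBy": modifier}
```

Running `docx_authors("speech.docx")` (a placeholder file name) on a document before publishing it shows what a recipient could read from the same fields.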
 
Arne Vajhøj

Eeby said:
My boss asked me to research this question. He wants me to write a
script / program that will convert a directory of Word, Excel, and
PowerPoint documents into PDFs. I did some Google searching and found
this:

http://www.activepdf.com/

However it is Windows-only. We have Linux servers.

Does anyone know of a Java library that I could use for this? Or any
library in any language? PHP? Perl?

I would go for whatever Microsoft and Adobe have to do this.

Sure you can find a Perl script somewhere that can convert
95% of the docs to readable but not very good looking PDF.
And it will break with the next Word version. And the author
is no longer maintaining it.

Arne
 
Lew

Arne said:
3) does not contain "extra information" (*)

Arne

*) There was a little incident in Denmark a couple of years ago
where the prime minister sent a speech to the press in DOC format.
And the press looked at the document and could see that the DOC
originally came from a man working in an industrial association. The
IT department decided that all future speeches sent to the press
would be in PDF format.

Curious, that the Microsoft format would actually provide more transparency
and greater knowledge of others' attempts to obfuscate than another format.

I suspect OpenOffice docs would have that advantage over PDF as well.

Perhaps the Danes should demand that their leaders publish only in formats
that provide such "extra information". Shoot, I'd love it if we could
identify those for whom our politicians are mouthpieces where I live, too.
 
Lew

Arne said:
If the OP wants help with no questions asked then he should
hire a consultant for 100 USD/h (or whatever).

If the OP wants free help he will have to accept that people
will ask questions - maybe to better understand the problem, maybe
because they have a similar problem, maybe because they are
just curious.

One thing about free advice - no matter how bad it is, it's worth what you
paid for it.

If one doesn't like Andrew's or anyone else's answers here, they're welcome to
demand a refund of what they paid for them.
 
