osiceanu said:
Hello,
I have an ASP.NET application that stores pdf files and Word documents
in a database. The problem appears when I try to show a preview of a
document on the aspx page, that is, when converting the document to html
or text. Is there a way to do this? Keeping the images or the formatting
of the document is not necessary.
If that is not possible, maybe an image preview of the document (i.e.
its first page) would be more suitable and easier.
Thanks in advance!
Perhaps not an "optimal" solution in terms of resource usage on the
server, but could you use the Office 2007 COM objects for this?
A PDF document you can easily embed into a page.
A Word document you could, on the server, load into the Word
application, save as a temporary pdf file, and then embed that into the
page.
If doing this on demand takes too much of a toll on the server, you
could tag new documents in the database as "must be rendered to pdf",
and then run a job at intervals that does the same thing, i.e. loads the
Word document into Word, saves it as pdf, and then uploads the pdf to
the database as an alternate representation of the Word document.
You mention that you want to convert it to html or text. Is this a
must-have criterion? Because if it is, you either need a server
component that can output html from both pdf and Word files (Word 2007
can do this for the Word file), or you need to do a similar
interval-based rendering of the files to html.
Third-party class libraries exist that do either, and while I don't
know the current state of pdf libraries that would fit, I do know that
the only way to support all the features of a Word document is to use
Word itself.
As for only showing the text, you can probably use such third-party
libraries. TX Text Control can be used to grab the text from a Word
file, and there are probably similar things for pdf. Do keep in mind,
though, that pdf is a format meant for printing. I've seen badly formed
pdf files where the words are not placed on the page line by line or
sentence by sentence, but are more or less just thrown onto the page in
the right spots. Text grabbed from such a document would most likely
not look good.
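
To show why such files are awkward, and one way to cope, here is a rough Python sketch that rebuilds lines from words that only carry a position. It assumes some pdf library (not shown) has already given you (x, y, text) tuples, and that y grows downward. Both are assumptions, as are the function name and the y_tolerance parameter. Many pdf tools report y growing upward, in which case the sort order needs flipping.

```python
from typing import List, Tuple

# (x, y, text); y is assumed to grow downward from the top of the page.
Word = Tuple[float, float, str]


def words_to_lines(words: List[Word], y_tolerance: float = 2.0) -> List[str]:
    # Group words whose y is close to that of the first word in the
    # current line, then order each group left to right.
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[1]):
        if lines and abs(word[1] - lines[-1][0][1]) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [
        " ".join(text for _, _, text in sorted(line, key=lambda w: w[0]))
        for line in lines
    ]


if __name__ == "__main__":
    scattered = [
        (50.0, 100.5, "world"),
        (10.0, 100.0, "Hello"),
        (10.0, 120.0, "Second"),
        (60.0, 119.4, "line"),
    ]
    print(words_to_lines(scattered))
```

This only handles a single column with fairly even baselines. Multi-column layouts or rotated text would need more than this, which is part of why the extracted text tends to look bad.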