Suggestion for converting PDF files to HTML/txt files

srinivasan srinivas · Aug 11, 2008

Could someone suggest ways to convert PDF files to HTML files?
Does Python have any modules to do that job?

Thanks,
Srini

brad · Aug 11, 2008

srinivasan said:
Could someone suggest ways to convert PDF files to HTML files?
Does Python have any modules to do that job?

Thanks,
Srini

Unless there has been some recent development, the answer is no, it's not
possible. Getting text out of a PDF is difficult (to say the least) and at
times impossible; for example, a PDF can be just an image that contains some text.

alex23 · Aug 12, 2008

srinivasan said:
Could someone suggest ways to convert PDF files to HTML files?
Does Python have any modules to do that job?

PDFMiner is a set of CLI tools written in Python, one of which
converts PDF to text, HTML and more:
http://www.unixuser.org/~euske/python/pdfminer/index.html

brad · Aug 12, 2008

alex23 said:
PDFMiner is a set of CLI tools written in Python, one of which
converts PDF to text, HTML and more:
http://www.unixuser.org/~euske/python/pdfminer/index.html

Very neat program. Would be cool if it could easily integrate into other
py apps instead of being a standalone CLI tool.

alex23 · Aug 12, 2008

brad said:
Very neat program. Would be cool if it could easily integrate into other
py apps instead of being a standalone CLI tool.

Perhaps, but I think you could get a long way using os.system().

brad · Aug 12, 2008

alex23 said:
Perhaps, but I think you could get a long way using os.system().

Yes, that is possible, but unfortunately there's a lot of overhead in doing that.
Also, if os.system() is the answer, then one could just use the xpdf
pdftotext program. A native Python solution that could be called naturally
from other Python apps would be awesome.
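For reference, here is a minimal sketch of the "shell out to pdftotext" route discussed above. It assumes an xpdf- or Poppler-style `pdftotext` binary is on PATH and uses `subprocess.run` (the standard-library successor to `os.system()` for this kind of call) so the text comes back as a Python string instead of a file; passing `-` as the output file tells pdftotext to write to stdout. The helper names `pdf_to_text` and `text_to_html` are hypothetical, and the HTML step is a naive one-`<p>`-per-paragraph wrapper for illustration, not the layout-aware HTML that PDFMiner produces.

```python
import html
import shutil
import subprocess


def pdf_to_text(pdf_path, tool="pdftotext"):
    """Return the text of pdf_path by running an external pdftotext binary.

    Assumption: `tool` is an xpdf/Poppler-style pdftotext. "-enc UTF-8"
    selects the output encoding and "-" as the output file sends the
    extracted text to stdout, which we capture.
    """
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool!r} not found on PATH")
    result = subprocess.run(
        [tool, "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # don't crash on odd bytes in the output
        check=True,        # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def text_to_html(text, title="Converted PDF"):
    """Wrap plain text in a minimal HTML page (hypothetical helper).

    Blank-line-separated blocks become <p> elements; all text is
    HTML-escaped so characters like < and & survive intact.
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    body = "\n".join(f"<p>{html.escape(b)}</p>" for b in blocks)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Usage would be something like `text_to_html(pdf_to_text("report.pdf"))`, where `report.pdf` is a placeholder file name. Note that this still spawns a separate process per file, so it does not remove the overhead brad mentions, and a PDF that is only a scanned image will yield little or no text.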

What's the best approach for converting OST to PST files?	5	Feb 10, 2025
How to export Outlook PST files to a different format?	5	Dec 31, 2024
Can EML files be converted to PST?	2	Dec 26, 2024
How to export MBOX files into other formats?	4	Feb 10, 2025
Python one-liner??	2	Aug 21, 2008
What are the steps to convert Outlook PST files to various formats?	6	Dec 26, 2024
How do I change OST files into PST format?	4	Dec 26, 2024
How to convert MBOX to HTML for email backup?	1	Mar 7, 2026
