Suggestion for converting PDF files to HTML/txt files

Discussion in 'Python' started by srinivasan srinivas, Aug 11, 2008.

  1. srinivasan srinivas, Aug 11, 2008
    #1

  2. brad Guest

    srinivasan srinivas wrote:
    > Could someone suggest me ways to convert PDF files to HTML files??
    > Does Python have any modules to do that job??
    >
    > Thanks,
    > Srini


    Unless there is some recent development, the answer is no, it's not
    possible. Getting text out of PDF is difficult (to say the least) and at
    times impossible; e.g. a PDF can be just an image that contains some text.
     
    brad, Aug 11, 2008
    #2
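Brad's point about image-only PDFs matters in practice: a scanned document yields almost no text from any extractor, and would need OCR instead. A minimal sketch of a check you might run after extraction, assuming you already have each page's text as a string from whatever tool you use. The function name `looks_image_only` and the 20-characters-per-page threshold are hypothetical choices, not part of any library:

```python
def looks_image_only(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is image-only from its extracted page texts.

    Heuristic (an assumption, not a standard): if the average number of
    non-whitespace characters per page is below min_chars_per_page, the
    pages probably hold scanned images rather than real text.
    """
    if not page_texts:
        return True  # nothing was extracted at all
    # Count only visible characters; str.split() drops spaces, newlines, form feeds.
    visible = sum(len("".join(text.split())) for text in page_texts)
    return visible / len(page_texts) < min_chars_per_page


if __name__ == "__main__":
    print(looks_image_only(["", "  \n", "\x0c"]))  # True
    print(looks_image_only(["Chapter 1\nIt was a dark and stormy night."]))  # False
```

The threshold is deliberately loose: a page with only a header or page number still counts as "empty", which is usually what you want when deciding whether to fall back to OCR.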

  3. alex23 Guest

    srinivasan srinivas wrote:
    > Could someone suggest me ways to convert PDF files to HTML files??
    > Does Python have any modules to do that job??


    PDFMiner is a set of CLI tools written in Python, one of which
    converts PDF to text, HTML and more:
    http://www.unixuser.org/~euske/python/pdfminer/index.html
     
    alex23, Aug 12, 2008
    #3
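A minimal sketch of driving PDFMiner's converter from a Python script, assuming the modern pdfminer.six package, whose `pdf2txt.py` script accepts `-t` (text, html, xml or tag) to pick the output format and `-o` for the output file. The 2008 release linked above may use different options, and the helper names `pdf2txt_command` and `convert_pdf` are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path

# Output formats accepted by pdfminer.six's pdf2txt.py -t option (assumption
# based on pdfminer.six; older PDFMiner releases may differ).
OUTPUT_TYPES = ("text", "html", "xml", "tag")


def pdf2txt_command(pdf_path: str, out_path: str, output_type: str = "html") -> list[str]:
    """Build the pdf2txt.py argument list for one conversion."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unsupported output type: {output_type}")
    return ["pdf2txt.py", "-t", output_type, "-o", out_path, pdf_path]


def convert_pdf(pdf_path: str, out_path: str, output_type: str = "html") -> Path:
    """Run pdf2txt.py and return the path of the file it wrote."""
    if shutil.which("pdf2txt.py") is None:
        raise FileNotFoundError("pdf2txt.py not found on PATH (try: pip install pdfminer.six)")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(pdf2txt_command(pdf_path, out_path, output_type), check=True)
    return Path(out_path)
```

For example, `convert_pdf("report.pdf", "report.html")` would write `report.html` from a (hypothetical) `report.pdf`; pass `output_type="text"` for a .txt file instead.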
  4. brad Guest

    alex23 wrote:

    > PDFMiner is a set of CLI tools written in Python, one of which
    > converts PDF to text, HTML and more:
    > http://www.unixuser.org/~euske/python/pdfminer/index.html


    Very neat program. Would be cool if it could easily integrate into other
    py apps instead of being a standalone CLI tool.
     
    brad, Aug 12, 2008
    #4
  5. alex23 Guest

    On Aug 12, 11:13 pm, brad <> wrote:
    > Very neat program. Would be cool if it could easily integrate into other
    > py apps instead of being a standalone CLI tool.


    Perhaps, but I think you could get a long way using os.system().
     
    alex23, Aug 12, 2008
    #5
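os.system() works, but it only returns the exit status and passes the command string through a shell, so file names with spaces need quoting. A sketch of the same idea with the standard `subprocess` module, which takes an argument list (no shell) and can capture the tool's output as a string. `run_tool` is a hypothetical helper name; the self-test runs the current Python interpreter rather than a PDF tool so it works anywhere:

```python
import subprocess
import sys


def run_tool(args: list[str]) -> str:
    """Run an external command without a shell and return its stdout as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Self-test with the current interpreter instead of a PDF converter.
    print(run_tool([sys.executable, "-c", "print('hello from a subprocess')"]), end="")
```

With xpdf's (or poppler's) pdftotext on PATH, `run_tool(["pdftotext", "my file.pdf", "-"])` would return the extracted text, since `-` as the output file sends it to stdout. The process-startup overhead brad mentions is still there; this only removes the shell quoting and gives you the output directly.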
  6. brad Guest

    alex23 wrote:
    > On Aug 12, 11:13 pm, brad <> wrote:
    >> Very neat program. Would be cool if it could easily integrate into other
    >> py apps instead of being a standalone CLI tool.
    >
    > Perhaps, but I think you could get a long way using os.system().


    Yes, that is possible, but unfortunately there's a lot of overhead when
    doing that. Also, if using os.system() is the answer, then one could
    just use the xpdf pdftotext program. A native Python solution that could
    be called naturally from other Python apps would be awesome.
     
    brad, Aug 12, 2008
    #6
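On the wish for a native solution: pdfminer.six, a later community-maintained fork of PDFMiner installable with pip, exposes a library API, so the conversion can run in-process with no subprocess at all. A sketch assuming pdfminer.six is installed; `extract_text` and `extract_text_to_fp` come from its `pdfminer.high_level` module, while the wrapper names `pdf_to_html_string` and `pdf_to_text_string` are hypothetical:

```python
from io import StringIO


def pdf_to_html_string(pdf_path: str) -> str:
    """Convert a PDF to an HTML string in-process (requires pdfminer.six)."""
    # Imported lazily so this module still loads where pdfminer.six is absent.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    output = StringIO()
    with open(pdf_path, "rb") as fin:
        # codec=None makes the converter write str, which StringIO accepts.
        extract_text_to_fp(fin, output, laparams=LAParams(),
                           output_type="html", codec=None)
    return output.getvalue()


def pdf_to_text_string(pdf_path: str) -> str:
    """Extract plain text from a PDF in-process (requires pdfminer.six)."""
    from pdfminer.high_level import extract_text

    return extract_text(pdf_path)
```

Either function can be called from another Python app like any other library call; image-only PDFs will still come back mostly empty, since no text layer exists to extract.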