PDF to HTML conversion program?

Discussion in 'HTML' started by Cliff R., Jan 31, 2004.

  1. Cliff R.

    Hi, can anyone recommend a good program that converts PDF files to
    HTML? I've tried one called PDF to HTML Converter Pro, but the code it
    creates isn't what I'm looking for. I really just need it to convert
    to basic HTML, keeping bold, italics, paragraph breaks, etc., NOT
    styled text that reproduces the original line breaks exactly. With
    this one, every single line has this sort of code at the beginning: <div
    id="_506:9699" style="position:absolute;top:9699;left:506"><span
    id="_11" style="font-size:11px;font-family:Helvetica;color=#000000">
    etc., so the code is huge and unnecessarily complicated.

    Any ideas on what to use to create clean, basic HTML from mostly
    text-based PDFs?

    Thanks.
     
    Cliff R., Jan 31, 2004
    #1

  2. Cliff R. wrote:
    > Hi, can anyone recommend a good program that converts PDF files to
    > HTML?


    rm -f *.pdf
    nano foo.html
     
    Leif K-Brooks, Jan 31, 2004
    #2

  3. Terry

    Leif K-Brooks wrote:

    > Cliff R. wrote:
    >
    >> Hi, can anyone recommend a good program that converts PDF files to
    >> HTML?
    >
    > rm -f *.pdf
    > nano foo.html


    tsk... and he asked so politely too!
     
    Terry, Jan 31, 2004
    #3
  4. Cliff R. wrote:

    > Any ideas of what to use to create clean, basic HTML of mostly
    > text-based PDF's?


    I dunno about that, but I can go one step better. Ghostscript includes a
    tool, "ps2ascii", that can convert PDF and PostScript files to plain text.
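
    As a rough sketch of how that plain text could be turned back into
    basic HTML: the Python below runs ps2ascii and wraps each
    blank-line-separated block in a <p> element. The function names
    text_to_html and pdf_to_basic_html are made up for illustration, and
    the sketch assumes ps2ascii is on the PATH and that paragraphs in its
    output are separated by blank lines, which will not hold for every PDF.

```python
import html
import re
import subprocess


def text_to_html(text: str) -> str:
    """Wrap blank-line-separated blocks of plain text in <p> elements."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join hard-wrapped lines so the browser can reflow the paragraph.
        joined = " ".join(
            line.strip() for line in block.splitlines() if line.strip()
        )
        if joined:
            # Escape &, <, > and quotes so the text is safe inside HTML.
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(paragraphs)


def pdf_to_basic_html(pdf_path: str) -> str:
    """Hypothetical helper: run ps2ascii on pdf_path, return basic HTML."""
    # With only an input file given, ps2ascii writes the text to stdout.
    result = subprocess.run(
        ["ps2ascii", pdf_path], capture_output=True, text=True, check=True
    )
    return text_to_html(result.stdout)
```

    Bold and italics do not survive the trip through plain text, so they
    would still have to be marked up again by hand.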

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me - http://www.goddamn.co.uk/tobyink/?page=132
     
    Toby A Inkster, Jan 31, 2004
    #4
  5. Terry wrote:
    >>> Hi, can anyone recommend a good program that converts PDF files to
    >>> HTML?

    >> rm -f *.pdf
    >> nano foo.html

    > tsk... and he asked so politely too!


    It's what I would do. PDF is a (mostly?) presentational format; HTML is
    structural. Anything short of true AI won't be able to convert between
    them well.
     
    Leif K-Brooks, Feb 1, 2004
    #5