Re: Using python to convert PDF document to MSWord documents

Discussion in 'Python' started by Timothy Grant, Sep 28, 2004.

  1. ----- Original Message -----
    From: JEET <>
    Date: Tue, 28 Sep 2004 17:13:17 +0100 (BST)
    Subject: Using python to convert PDF document to MSWord documents
    To:




    Hello All,

    Can anyone please tell me whether there are any Python modules available
    to convert PDF documents to MS Word documents? If not, can you please
    suggest how I can achieve this?

    Many thanks in advance,

    Regards
    Deb

    ======

    What you ask is quite difficult. My understanding is that PDF files
    are simply PostScript files with some special wrapping. Depending on
    the nature of the PDF (is it encrypted, are there other special
    provisions?), you may be able to strip the raw text from the file and
    create an RTF file from it. However, you will lose all formatting in
    this case. If the formatting is "standard" across all the PDFs, you may
    be able to infer from the text something that will allow you to
    restore some or all of it.
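
A minimal sketch of the "create an RTF file from the raw text" step, assuming the text has already been extracted from the PDF by some other tool. The function names `text_to_rtf` and `_rtf_escape` and the choice of Courier New are illustrative assumptions, not part of any existing module. As noted above, no formatting survives; each newline simply becomes a paragraph break.

```python
def _rtf_escape(text):
    """Escape plain text for use inside an RTF document body."""
    out = []
    for ch in text.replace("\r\n", "\n"):
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN per UTF-16 code unit (signed 16-bit),
            # each followed by "?" as the fallback character.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def text_to_rtf(text):
    """Wrap plain text in a minimal single-font RTF document."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Courier New;}}\\f0\\fs20 "
    return header + _rtf_escape(text) + "}"


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    with open("output.rtf", "w", encoding="ascii") as fh:
        fh.write(text_to_rtf("First paragraph.\nSecond paragraph."))
```

The output is pure ASCII, so it can be written with any encoding, and Word should open the resulting .rtf file directly.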







    --
    Stand Fast,
    tjg.
     
    Timothy Grant, Sep 28, 2004
    #1

  2. > From: JEET <>
    > Can anyone please tell me whether there are any Python modules available
    > to convert PDF documents to MS Word documents? If not, can you please
    > suggest how I can achieve this?


    No Python modules, but:
    - feeding the subject line to Google brings up some sponsored links that
    claim to solve your problem
    - http://www.quiss.org/swftools/ has a tool to convert PDF to Flash, so
    there must be some code to detect text, fonts, etc.

    Daniel
     
    Daniel Dittmar, Sep 28, 2004
    #2

  3. In article <>,
    Timothy Grant <> wrote:
    .
    .
    .
    >Can anyone please tell me whether there are any Python modules available
    >to convert PDF documents to MS Word documents? If not, can you please
    >suggest how I can achieve this?

    .
    .
    .
    <URL: http://phaseit.net/claird/comp.text.pdf/PDF_converters.html >
     
    Cameron Laird, Sep 28, 2004
    #3
  4. >> From: JEET <>
    >> Can anyone please tell me whether there are any Python modules available
    >> to convert PDF documents to MS Word documents? If not, can you please
    >> suggest how I can achieve this?

    >
    > No Python modules, but:
    > - feeding the subject line to Google brings up some sponsored links that
    > claim to solve your problem
    > - http://www.quiss.org/swftools/ has a tool to convert PDF to Flash,
    > so there must be some code to detect text, fonts, etc.
    >


    pdf2swf is based on xpdf (http://www.foolabs.com/xpdf).
    Another tool that is also based on xpdf is pdftohtml
    (http://pdftohtml.sourceforge.net/). It can convert PDF to HTML (using
    absolute CSS positioning) or to XML. I don't know if there are any RTF
    or Word writers in Python, but in my previous VB life I wrote a
    simple Word macro that would open an HTML page and save it as a .doc
    document. It was the easiest way to get all images embedded and the
    formatting done correctly. I don't know, however, how it will handle
    absolute positioning.
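
A rough sketch of driving pdftohtml from Python, assuming the `pdftohtml` executable is installed and on the PATH. The helper names `pdftohtml_command` and `run_pdftohtml` are hypothetical, and the flags (`-c` for absolutely positioned HTML, `-xml` for XML output) are as I understand them; check `pdftohtml -h` on your system before relying on them.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path, out_base, xml=False):
    """Build the pdftohtml argument list (flags assumed; see pdftohtml -h)."""
    mode_flag = "-xml" if xml else "-c"
    return ["pdftohtml", mode_flag, pdf_path, out_base]


def run_pdftohtml(pdf_path, out_base, xml=False):
    """Run pdftohtml, failing early if the tool is not installed."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH")
    subprocess.run(pdftohtml_command(pdf_path, out_base, xml), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    run_pdftohtml("input.pdf", "output")
```

The resulting HTML could then be opened in Word and saved as .doc, as with the macro described above; that step is not shown here.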

    Another possible option is to convert the PDF to PS format and then use
    pstoedit (http://www.pstoedit.net/pstoedit) with the shareware RTF plugin
    mentioned on that page. I don't have any experience with this option.

    Ksenia.
     
    Ksenia Marasanova, Sep 28, 2004
    #4
  5. Jan Gregor wrote:

    > Can anyone please tell me whether there are any Python modules available
    > to convert PDF documents to MS Word documents? If not, can you please
    > suggest how I can achieve this?


    I think there is no specification of the .doc format. PDF and .doc are
    also different classes of format. So you can extract the text (with the
    Ghostscript front end ps2ascii, hoping for the right encoding) and the
    pictures. Typesetting the Word document is your job.

    Maybe converting the PDF to HTML and importing the HTML into Word would be
    a better way, but again, you go from a stronger language to a weaker one.


    Jan
     
    Jan Gregor, Oct 2, 2004
    #5
  6. Steve Holden wrote:

    Timothy Grant wrote:

    > ----- Original Message -----
    > From: JEET <>
    > Date: Tue, 28 Sep 2004 17:13:17 +0100 (BST)
    > Subject: Using python to convert PDF document to MSWord documents
    > To:
    >
    >
    >
    >
    > Hello All,
    >
    > Can anyone please tell me whether there are any Python modules available
    > to convert PDF documents to MS Word documents? If not, can you please
    > suggest how I can achieve this?
    >
    > Many thanks in advance,
    >

    One of the problems with such a module would be that PDF is primarily a
    display format, and so the structure of the file doesn't necessarily
    conform with the structure of the document.
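
To illustrate the point, here is a small sketch of what a converter has to do once it has text fragments with page coordinates: guess the reading order from positions alone. The `Fragment` type, the `fragments_to_lines` function, and the 2-unit tolerance are all assumptions made for this example; `top` is taken to be measured from the top of the page.

```python
from collections import namedtuple

# One positioned run of text, as a display-oriented format might store it.
Fragment = namedtuple("Fragment", "left top text")


def fragments_to_lines(fragments, tolerance=2):
    """Group fragments into text lines by vertical position.

    Fragments whose `top` is within `tolerance` of the first fragment on
    the current line are treated as the same line; each line is then read
    left to right.
    """
    lines = []  # list of (line_top, [fragments on that line])
    for frag in sorted(fragments, key=lambda f: (f.top, f.left)):
        if lines and abs(frag.top - lines[-1][0]) <= tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.top, [frag]))
    return [
        " ".join(f.text for f in sorted(frags, key=lambda f: f.left))
        for _, frags in lines
    ]


if __name__ == "__main__":
    # Fragments deliberately stored out of reading order.
    page = [
        Fragment(100, 50, "world"),
        Fragment(10, 51, "Hello"),
        Fragment(10, 80, "Second"),
        Fragment(70, 80, "line"),
    ]
    print(fragments_to_lines(page))  # ['Hello world', 'Second line']
```

This recovers lines only. Paragraphs, columns, headings, and tables would each need further heuristics, which is why a general converter is hard to write.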

    regards
    Steve
     
    Steve Holden, Oct 3, 2004
    #6
  7. Guest wrote:

    You can use "PDF to Word". It batch-converts PDF files to Word or text
    while keeping the source layout. It is standalone software: MS Word,
    Adobe Acrobat, and Reader are NOT required. You can get more
    information from
    http://www.convertzone.com/net/cz-PDF to Word-1-1.htm.

    ConvertZone Support team
    ConvertZone Software Co,.ltd
    http://www.convertzone.com


    ************************************************************
    ConvertZone provides office(PDF, Word, Excel, PowerPoint, AutoCAD etc),
    video(DVD, VCD, SVCD etc), audio(MP3, WAV, MIDI etc), image(JPG, GIF,
    TIF, BMP etc) file converter.
    ************************************************************
     
    , Jan 4, 2005
    #7