PDF Parser?

Discussion in 'Python' started by Miki Tebeka, Jul 7, 2003.

  1. Miki Tebeka

    Miki Tebeka

    Hello All,

    I'm looking for a PDF parser.
    Any pointers?

    Thanks.
    Miki
     
    Miki Tebeka, Jul 7, 2003
    #1

  2. Miki Tebeka

    John Hunter

    >>>>> "Miki" == Miki Tebeka <> writes:

    Miki> Hello All, I'm looking for a PDF parser. Any pointers?

    A little more info would be helpful: do you need access to all the PDF
    structures or just the text? AFAIK, there is no full PDF parser in
    Python. The subject has come up several times before, so check the
    google.groups archives:

    http://groups.google.com/groups?q=p...n*&ie=UTF-8&oe=UTF-8&hl=en&btnG=Google Search
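To make the "PDF structures" question concrete: a structure-level reader usually starts by finding the cross-reference section, whose byte offset a conforming PDF records after the `startxref` keyword near the end of the file, just before `%%EOF`. The following is a minimal sketch of only that first step, not a parser; it assumes an undamaged file, and the name `find_startxref` is hypothetical.

```python
import re


def find_startxref(pdf_bytes: bytes) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword.

    A conforming PDF ends with 'startxref', the offset of the
    cross-reference section, and '%%EOF'. This sketch only scans the
    tail of the file and makes no attempt to recover damaged files.
    """
    tail = pdf_bytes[-1024:]  # the trailer lives near the end of the file
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        raise ValueError("no startxref keyword found")
    # Incrementally updated files append new trailers; the last one wins.
    return int(matches[-1])
```

Usage would be along the lines of `find_startxref(open(path, "rb").read())`; everything past this point (reading the xref table, resolving objects, decompressing streams) is where a real parser's work begins.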

    Things people have suggested before:

    1) use pdftotext and parse its text output;
    2) wrap xpdf's parser.

    For example, if you have pdftotext installed, the following will give you
    a Python file-like handle to the extracted text:

    import os

    def pdf2txt(fname):
        return os.popen('pdftotext -raw -ascii7 "%s" -' % fname)
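The `os.popen` one-liner works, but a variant using `subprocess` avoids the shell entirely (so odd file names are safe) and makes the "parse the text" step easier by splitting the output into pages. pdftotext writes to stdout when the output file is `-` and ends each page with a form feed. This is a sketch that assumes a pdftotext (xpdf 3.x or poppler) on `PATH` that accepts `-raw` and `-enc UTF-8`; check `pdftotext -h` on your system. `pdf_pages` and `split_pages` are hypothetical helper names.

```python
import subprocess


def split_pages(text):
    """Split pdftotext output on form feeds, dropping the empty tail."""
    pages = text.split("\f")
    if pages and pages[-1].strip() == "":
        pages.pop()  # the final form feed leaves an empty trailing chunk
    return pages


def pdf_pages(fname):
    """Run pdftotext on fname and return a list with one string per page.

    Assumes the pdftotext binary is installed; passing a list of
    arguments means no shell is involved.
    """
    result = subprocess.run(
        ["pdftotext", "-raw", "-enc", "UTF-8", fname, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    text = result.stdout.decode("utf-8", errors="replace")
    return split_pages(text)
```

Calling `pdf_pages("paper.pdf")` (hypothetical file) returns a list of page strings that you can then search or tokenize.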

    If you just want to search and index PDFs, see
    http://pdfsearch.sourceforge.net.
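If you would rather roll a small search yourself, the core is an inverted index from words to the pages that contain them. A minimal sketch, assuming the page texts come from pdftotext output split on form feeds; it is not a description of how pdfsearch itself works, and `build_index`/`search` are hypothetical names.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the sorted 1-based page numbers containing it."""
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_no)
    return {word: sorted(nums) for word, nums in index.items()}


def search(index, query):
    """Return the pages that contain every word of the query (AND semantics)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))  # keep only pages matching all words
    return sorted(hits)
```

For anything beyond a handful of documents, you would persist the index (for example with `shelve`) instead of rebuilding it on every run.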

    John Hunter
     
    John Hunter, Jul 7, 2003
    #2

  3. "John Hunter" <>

    > A little more info would be helpful: do you need access to all the pdf
    > structures or just the text? AFAIK, there is no full pdf parser in
    > python.


    If you need to access the graphical elements, you can use pstoedit to
    convert the PDF into SVG (Scalable Vector Graphics); pstoedit relies on
    Ghostscript to interpret the PDF, so you need that installed too. Since SVG
    is XML, you can then parse the data with any Python-based XML toolkit.
    http://www.pstoedit.net/pstoedit
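Once pstoedit has produced an SVG file (the exact output-format name depends on how your pstoedit was built, so check its documentation), the standard-library ElementTree module is enough to walk the drawing elements. A minimal sketch; `list_shapes` and the particular set of tags it collects are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
SHAPE_TAGS = {"path", "line", "rect", "circle", "ellipse",
              "polyline", "polygon", "text"}


def list_shapes(svg_text):
    """Return (tag, attributes) pairs for basic SVG drawing elements.

    svg_text may be str or bytes; elements are returned in document order.
    """
    root = ET.fromstring(svg_text)
    shapes = []
    for elem in root.iter():
        tag = elem.tag.replace(SVG_NS, "")  # strip the SVG namespace, if present
        if tag in SHAPE_TAGS:
            shapes.append((tag, dict(elem.attrib)))
    return shapes
```

Path geometry stays in the `d` attribute as a string, so turning it into coordinates is a separate parsing step.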

    Adam
     
    Adam Twardoch, Jul 15, 2003
    #3

Similar Threads
  1. Bernd Oninger. Replies: 0, Views: 789; last post: Bernd Oninger, Jun 9, 2004
  2. XML Parser VS HTML Parser (ZOCOR, Oct 3, 2004, in forum: Java). Replies: 11, Views: 844; last post: Paul King, Oct 5, 2004
  3. Bernd Oninger. Replies: 0, Views: 841; last post: Bernd Oninger, Jun 9, 2004
  4. Ricardo Pog. Replies: 1, Views: 487; last post: Austin Ziegler, Mar 26, 2008
  5. Sean Nakasone. Replies: 1, Views: 427; last post: Farrel Lifson, Apr 14, 2008