PDF Parser?

Miki Tebeka · Jul 7, 2003

Hello All,

I'm looking for a PDF parser.
Any pointers?

10x.
Miki

John Hunter · Jul 7, 2003

Miki> Hello All, I'm looking for a PDF parser. Any pointers?

A little more info would be helpful: do you need access to all the PDF
structures or just the text? AFAIK, there is no full PDF parser in
Python. The subject has come up several times before, so check the
Google Groups archives:

http://groups.google.com/groups?q=p...n*&ie=UTF-8&oe=UTF-8&hl=en&btnG=Google+Search

Things people have suggested before:

1) use pdftotext and parse the text
2) wrap xpdf's parser.

For example, if you have pdftotext, the following will give you a
Python file-like handle to the extracted text:

    import os

    def pdf2txt(fname):
        return os.popen('pdftotext -raw -ascii7 %s -' % fname)
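The one-liner above goes through the shell, so a file name containing spaces or quotes would break it. A minimal sketch of the same idea with the subprocess module, assuming pdftotext is on the PATH and accepts `-raw` and `-` (write to stdout); the helper name pdftotext_command is made up for illustration, and the `-ascii7` flag is left out because not every pdftotext build may accept it:

```python
import subprocess

def pdftotext_command(fname):
    """Build the argument list; the trailing '-' sends the text to stdout."""
    # Passing a list (no shell) means fname needs no quoting or escaping.
    return ["pdftotext", "-raw", fname, "-"]

def pdf2txt(fname):
    """Return the text pdftotext extracts from a PDF, as a string."""
    result = subprocess.run(
        pdftotext_command(fname),
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError if pdftotext fails
    )
    # The output encoding depends on the pdftotext build; decode leniently.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling pdf2txt("report.pdf") (a hypothetical file) returns the whole text at once rather than a file-like handle, which is usually easier to feed to a regular expression or a line splitter.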

If you just want to search and index pdf, see
http://pdfsearch.sourceforge.net.

John Hunter

Adam Twardoch · Jul 15, 2003

John Hunter said:
A little more info would be helpful: do you need access to all the PDF
structures or just the text? AFAIK, there is no full PDF parser in
Python.

If you need to access the graphical elements, you can use pstoedit to
convert the PDF into SVG (Scalable Vector Graphics). Since SVG is XML, you
can then use any Python-based XML toolkit to parse the data.
http://www.pstoedit.net/pstoedit
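
Once you have the SVG, reading it could look roughly like the sketch below. It uses the standard library's xml.etree.ElementTree and assumes the converter emits elements in the standard SVG namespace; the sample document and the helper name list_paths are invented for illustration, and real pstoedit output will likely be more involved.

```python
import xml.etree.ElementTree as ET

# SVG elements live in this XML namespace, so tag lookups must include it.
SVG_NS = "{http://www.w3.org/2000/svg}"

def list_paths(svg_text):
    """Return the 'd' attribute (the drawing commands) of every <path>."""
    root = ET.fromstring(svg_text)
    # iter() walks the whole tree, so paths nested inside groups are found too.
    return [el.get("d") for el in root.iter(SVG_NS + "path")]

# A tiny hand-written SVG standing in for converter output.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <g><path d="M 0 0 L 10 10"/></g>
  <path d="M 5 5 L 20 5"/>
</svg>"""

for d in list_paths(sample):
    print(d)
```

The same loop works for other element types (rect, text, and so on) by changing the tag name passed to iter().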

Adam
