PDF to text

Lord0

Hi there,

I'm looking for a Java library that will extract the text from a PDF while
retaining whitespace formatting, i.e. paragraphs, newlines etc.

I've looked at, and tested, PDFBox, which does extract the text; however,
it does not preserve, or insert, paragraphs or newlines in its output.
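
For illustration, the kind of output being asked for could be sketched roughly as below. This is only a minimal sketch under assumptions: it supposes the extractor can hand back each line of text together with its vertical position on the page, and the `Line` type, the `rebuild` method and the `gapFactor` parameter are made-up names for this example, not part of PDFBox or any other library. Lines separated by the usual spacing get a newline; a noticeably larger gap is treated as a paragraph break.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ParagraphRebuilder {

    /** Hypothetical input: one line of extracted text and its distance from the top of the page. */
    public static final class Line {
        final String text;
        final double y;

        public Line(String text, double y) {
            this.text = text;
            this.y = y;
        }
    }

    /**
     * Joins lines top to bottom. A vertical gap larger than gapFactor times
     * the typical gap becomes a blank line (paragraph break); any other gap
     * becomes a single newline.
     */
    public static String rebuild(List<Line> lines, double gapFactor) {
        if (lines.isEmpty()) {
            return "";
        }
        // Work on a copy so the caller's list is left untouched.
        List<Line> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingDouble(l -> l.y));

        // Vertical distance between each pair of neighbouring lines.
        double[] gaps = new double[sorted.size() - 1];
        for (int i = 1; i < sorted.size(); i++) {
            gaps[i - 1] = sorted.get(i).y - sorted.get(i - 1).y;
        }

        // Use the (lower) median gap as the "typical" line spacing.
        double[] ordered = gaps.clone();
        Arrays.sort(ordered);
        double typical = ordered.length == 0 ? 0.0 : ordered[(ordered.length - 1) / 2];

        StringBuilder out = new StringBuilder(sorted.get(0).text);
        for (int i = 1; i < sorted.size(); i++) {
            boolean newParagraph = gaps[i - 1] > typical * gapFactor;
            out.append(newParagraph ? "\n\n" : "\n").append(sorted.get(i).text);
        }
        return out.toString();
    }
}
```

Something like `ParagraphRebuilder.rebuild(lines, 1.5)` would then give text with single newlines inside a paragraph and a blank line between paragraphs. The median-gap heuristic is just one possible choice and would likely need tuning for multi-column or mixed-font pages.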

I've looked at iText but according to the FAQ this will not extract the
text from the PDF.

I'd rather not use an external program like "pdftotext"; a pure-Java,
library-based solution would be better.

Any ideas?

Cheers

Lord0
 
