[Half-off] How to get textboxes (text blocks) from ps/pdf files?

durumdara

Hi!

I need to get the textboxes/text blocks from PDF files. I can convert them
to PS.
Does anyone know of a method, trick, or routine for getting the textboxes
from PS or PDF?
(I need a Pythonic, COM, or command-line solution.)

I need to redraw them in my application so that the user can reorder them,
and then I concatenate all the text to process it.

I need this information for each box:
x, y, w, h, text

Example:
page1
textbox1{x:100,y:100;w:600;h:27;text:"TextBox1 /xfc /xfa"}
textbox2{x:100,y:180;w:600;h:27;text:"TextBox2"}
page2
textbox1{x:100,y:100;w:600;h:27;text:"TextBox1"}
textbox2{x:100,y:180;w:600;h:27;text:"TextBox2"}
....
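
For illustration, the layout above could be held in a structure like the following. This is only a sketch: the `TextBox` class and the `concat_in_order` helper are hypothetical names made up here, not part of any library, and the reordering is simulated with a plain list of indices.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """One text block: position, size and content (hypothetical structure)."""
    x: int
    y: int
    w: int
    h: int
    text: str


def concat_in_order(boxes, order):
    """Join the box texts following a user-chosen order of indices."""
    return "\n".join(boxes[i].text for i in order)


# Boxes of one page, taken from the example in the post.
page1 = [
    TextBox(100, 100, 600, 27, "TextBox1"),
    TextBox(100, 180, 600, 27, "TextBox2"),
]

# Suppose the user moved the second box above the first.
combined = concat_in_order(page1, [1, 0])
```

Here `combined` holds the text of the second box followed by the text of the first, which is what would be passed on for processing.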

Any solution?

Thanks for it!
dd

ps1:
I tried every pdf2text and pdf2html application. All of them failed the
test.
Only one, pdftohtml, gave good information, because it
produces divs with absolute position and size along with the text.
But this program does not handle ISO-8859-2 characters, so I lost them.
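
One route that might be worth trying, sketched under assumptions: pdftohtml has an `-xml` output mode that writes `<page>` elements containing `<text>` elements with `top`, `left`, `width` and `height` attributes. The function below parses that kind of XML with the standard library. The `parse_pdf2xml` name and the `SAMPLE` string are invented for this example, the sample is hand-written rather than real tool output, and whether the XML mode (or the tool's `-enc` option, where available) keeps the ISO-8859-2 characters intact has not been verified here.

```python
import xml.etree.ElementTree as ET


def parse_pdf2xml(xml_data):
    """Return {page_number: [box, ...]} from pdftohtml-style XML.

    Each box is a dict with the keys x, y, w, h and text.
    """
    root = ET.fromstring(xml_data)
    pages = {}
    for page in root.iter("page"):
        boxes = []
        for elem in page.iter("text"):
            boxes.append({
                # int(float(...)) tolerates values written as "100" or "100.0".
                "x": int(float(elem.get("left"))),
                "y": int(float(elem.get("top"))),
                "w": int(float(elem.get("width"))),
                "h": int(float(elem.get("height"))),
                # itertext() also collects text inside nested <b>/<i> tags.
                "text": "".join(elem.itertext()),
            })
        pages[int(page.get("number"))] = boxes
    return pages


# Hand-written sample in the assumed format (not real tool output).
SAMPLE = """<pdf2xml>
<page number="1" top="0" left="0" height="1188" width="918">
<text top="100" left="100" width="600" height="27" font="0">TextBox1 <b>ő ű</b></text>
<text top="180" left="100" width="600" height="27" font="0">TextBox2</text>
</page>
<page number="2" top="0" left="0" height="1188" width="918">
<text top="100" left="100" width="600" height="27" font="0">TextBox1</text>
</page>
</pdf2xml>"""

pages = parse_pdf2xml(SAMPLE)
```

The result maps each page number to its list of boxes, which matches the x, y, w, h, text layout asked for above. For a real file, the XML would be read as bytes from the tool's output file so that the parser can honour the encoding declared in it.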

ps2:
The program must run under Windows XP, so an OS-specific solution is fine.
 
