find text location (in pixels) using python (pyPdf)

Chris Curvey · Mar 9, 2010

Has anyone ever tried to find the pixel (or point) location of text in
a PDF using Python? I've been using the pyPdf library for other
things, and it seems to me that if I can find the bounding box for
the text, I should be able to calculate its location.
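
To make the "calculate the location" step concrete, here is a rough sketch of the arithmetic, assuming a bounding box in PDF points has already been obtained somehow (how to get one out of pyPdf is the open question). The function name and the sample box are made up for illustration. It also assumes the page is not rotated and its origin is the bottom-left corner at (0, 0), with the default 72 points per inch.

```python
import math

POINTS_PER_INCH = 72  # default PDF user space unit is 1/72 inch


def pdf_box_to_pixels(box_pt, page_height_pt, dpi):
    """Convert a PDF bounding box (points, origin bottom-left) to a
    pixel box (origin top-left) for a page rasterized at `dpi`.

    box_pt is (x0, y0, x1, y1) with y measured up from the page bottom.
    Returns (left, top, right, bottom) as integers, rounded outward so
    the pixel box fully covers the original box.
    """
    x0, y0, x1, y1 = box_pt

    def to_px(value_pt):
        # Multiply before dividing to limit floating-point drift.
        return value_pt * dpi / POINTS_PER_INCH

    left = math.floor(to_px(min(x0, x1)))
    right = math.ceil(to_px(max(x0, x1)))
    # Flip the y axis: PDF y grows upward, image y grows downward.
    top = math.floor(to_px(page_height_pt - max(y0, y1)))
    bottom = math.ceil(to_px(page_height_pt - min(y0, y1)))
    return left, top, right, bottom


# Hypothetical box for one customer block on a US Letter page (612 x 792 pt).
print(pdf_box_to_pixels((72, 500, 300, 560), page_height_pt=792, dpi=150))
```

The dpi value has to match whatever resolution is used when the PDF is converted to an image, otherwise the boxes will be off by a constant scale factor.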

What I want to do is take a PDF of one of our vendor invoices and blur
everything in it except the block that's related to a single
customer. So if I have an invoice that looks like:

Alfred Annoying
123 Elm St
Somewhere, NJ
$100

Barbie Bonehead
456 Pine St
Elsewhere, NJ
$125

Charlie Clueless
789 Beech St.
Everywhere, NJ
$150

I want to show Barbie just her section of the invoice (with the header
intact, so that she can tell it's a real invoice) but with Alfred's and
Charlie's information blurred out. I was going to convert the PDF to
a JPG or PNG and do the blurring with ImageMagick/PythonMagick. But
that requires me to know the pixel locations of the regions that I want
blurred and of the ones I want left alone.
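
Once the pixel rows to keep are known (the header and one customer's block), the rows to blur are simply whatever is left over. A minimal sketch of that step, assuming the invoice is laid out as horizontal bands; the function name and the sample numbers are hypothetical:

```python
def bands_to_blur(page_height_px, keep_bands):
    """Return the (top, bottom) pixel bands NOT covered by keep_bands.

    keep_bands is a list of (top, bottom) row ranges to leave sharp,
    for example the invoice header and one customer's block.
    """
    blur = []
    cursor = 0  # first row not yet assigned to a keep or blur band
    for top, bottom in sorted(keep_bands):
        # Clamp each keep band to the page.
        top = max(top, 0)
        bottom = min(bottom, page_height_px)
        if top > cursor:
            blur.append((cursor, top))
        cursor = max(cursor, bottom)
    if cursor < page_height_px:
        blur.append((cursor, page_height_px))
    return blur


# Hypothetical page 1650 rows tall: keep a 200-row header and rows 483-609.
print(bands_to_blur(1650, [(0, 200), (483, 609)]))
```

Each returned band, combined with the full page width, gives a rectangle that can be handed to whatever tool does the blurring.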

I'm also open to other ideas if I'm going about this the hard way...
