finding blocks in black-and-white images (efficiently)

Axel Etzold

Dear all,

I have a number of black-and-white scanned pages. To prepare them for OCR,
I have to split them into columns and rows. In addition, there are pictures
scattered among the text, which also need to be separated out.

So, in a page that might look like this:

Text1 Text4 Text6

Text2 Pict1 Text7

Text3 Text5 Pict2

I'd like to find the largest blocks of white which separate the texts and pictures, both horizontally
and vertically.

Right now, I would use RMagick with export_pixels_to_str and then regular expressions to find the
zeros, but I am not sure whether there's a more efficient way to do this.
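
To make the goal concrete, here is a minimal sketch of one alternative to the regex idea: sum the ink in every row and every column (a "projection profile") and look for the longest runs where the sum is zero. It is only a sketch under assumptions: the page is taken to be already converted into an array of rows holding 0 for white and 1 for ink (how the exported pixel string maps to that is not shown), and the helper names white_runs, row_profile, column_profile and widest_inner_gap are made up for illustration.

```ruby
# Assumption: `bitmap` is an array of rows, each an array of 0 (white) / 1 (ink).

# Returns [start, length] pairs for every maximal run of zeros in `profile`.
def white_runs(profile)
  runs = []
  start = nil
  profile.each_with_index do |count, i|
    if count.zero?
      start ||= i                  # a white run begins (or continues)
    elsif start
      runs << [start, i - start]   # ink found: close the current run
      start = nil
    end
  end
  runs << [start, profile.length - start] if start
  runs
end

# Amount of ink in each row.
def row_profile(bitmap)
  bitmap.map(&:sum)
end

# Amount of ink in each column.
def column_profile(bitmap)
  bitmap.transpose.map(&:sum)
end

# Widest white run that does not touch the page edge, or nil if there is none.
def widest_inner_gap(profile)
  white_runs(profile)
    .reject { |start, len| start.zero? || start + len == profile.length }
    .max_by { |_, len| len }
end

# Tiny hypothetical page: ink on the left and right, white in the middle.
page = [
  [1, 1, 0, 0, 0, 1, 1],
  [1, 0, 0, 0, 0, 1, 1],
  [0, 0, 0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0, 0, 0],
  [1, 1, 0, 0, 0, 0, 1]
]

vertical_gap   = widest_inner_gap(column_profile(page)) # [start column, width]
horizontal_gap = widest_inner_gap(row_profile(page))    # [start row, height]
```

A whole-page profile only finds gaps that run across the entire page, so for a layout like the one above, where pictures sit inside the columns, the page would presumably have to be cut at the widest gap first and the same search repeated inside each piece.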

Do you have any suggestions?

Thank you very much,

Best regards,

Axel
 
