Axel Etzold
Dear all,
I have a number of black-and-white scanned pages. To prepare them for OCR,
I have to split them into columns and rows. In addition, there are pictures
scattered among the text, which also need to be separated.
So, in a page that might look like this:
Text1 Text4 Text6
Text2 Pict1 Text7
Text3 Text5 Pict2
I'd like to find the largest white regions that separate the text blocks and pictures, both horizontally
and vertically.
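One common alternative to regular expressions is a projection profile: count the black pixels in every row and every column, and treat runs of rows (or columns) whose count is zero as white gaps. This is a minimal sketch, assuming the page has already been converted into a 2D array of 0/1 values (1 = black ink, 0 = white). The method names `white_runs`, `row_profile` and `column_profile` and the `min_gap` threshold are hypothetical, chosen only for illustration.

```ruby
# Sketch: locate white gaps via projection profiles.
# Assumption: `grid` is an array of rows, each row an array of 0/1,
# where 1 = black (ink) and 0 = white. All names here are hypothetical.

# Returns inclusive [start, end] index pairs for runs where the profile
# is zero, keeping only runs that are at least `min_gap` entries long.
def white_runs(profile, min_gap)
  runs = []
  start = nil
  profile.each_with_index do |count, i|
    if count.zero?
      start ||= i                      # a white run begins (or continues)
    elsif start
      runs << [start, i - 1] if i - start >= min_gap
      start = nil                      # ink found, the run is over
    end
  end
  # A run that reaches the last row/column is closed here.
  runs << [start, profile.size - 1] if start && profile.size - start >= min_gap
  runs
end

# Number of black pixels in each row (used to find horizontal gaps).
def row_profile(grid)
  grid.map(&:sum)
end

# Number of black pixels in each column (used to find vertical gaps).
def column_profile(grid)
  grid.transpose.map(&:sum)
end

page = [
  [1, 1, 0, 0, 1, 1],
  [1, 1, 0, 0, 1, 1],
  [0, 0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0, 0],
  [1, 1, 0, 0, 1, 1],
]

p white_runs(row_profile(page), 2)    # prints [[2, 3]] (white rows 2..3)
p white_runs(column_profile(page), 2) # prints [[2, 3]] (white columns 2..3)
```

A whole-page profile only finds gaps that run all the way across the page. For layouts where a gap spans only part of the page, one would typically apply the same step recursively to each block it produces, alternating between horizontal and vertical splits (the approach usually called a recursive X-Y cut).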
My current plan is to use RMagick's export_pixels_to_str and then use regular expressions to find the
zeros, but I am not sure whether there is a more effective way to do this.
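If the pixels are already available as a byte string, they can be turned into a 0/1 grid with plain Ruby (`String#bytes` and `each_slice`) instead of regular expressions, and that grid can then be summed per row and per column. This is a sketch under an assumption: that the string holds one unsigned byte per pixel in row-major order, as RMagick's `export_pixels_to_str` should produce with the intensity map `"I"` and its default 8-bit pixel type (please check the RMagick documentation for your version). The method name `to_binary_grid` and the threshold of 128 are hypothetical.

```ruby
# Sketch: convert an intensity byte string into a 0/1 grid without regexes.
# Assumption: one unsigned byte (0..255) per pixel, row-major, e.g. from
# something like img.export_pixels_to_str(0, 0, img.columns, img.rows, "I").
# `to_binary_grid` and the default threshold are hypothetical.
def to_binary_grid(pixels, width, threshold = 128)
  pixels.bytes.each_slice(width).map do |row|
    # Dark pixel -> 1 (ink), light pixel -> 0 (white).
    row.map { |v| v < threshold ? 1 : 0 }
  end
end

# Two rows of three pixels: white, black, white / black, black, white.
raw = [255, 0, 255, 0, 0, 255].pack("C*")
p to_binary_grid(raw, 3) # prints [[0, 1, 0], [1, 1, 0]]
```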
Do you have any suggestions?
Thank you very much,
Best regards,
Axel