Ruby + OCR

Zbigniew Kowalski · Nov 17, 2011

Hi!
I need to create a service that will OCR a simple image of a
document sent via e-mail to a server.

Can you recommend any libraries worth using for this?

Thank you.
Z

Fester · Nov 21, 2011

Hello,

I was recently digging into the same area in Python and came to the
following conclusions:
1. You must choose between the fairly expensive, proprietary ABBYY
command-line OCR SDK and the free Tesseract OCR. ABBYY's product is
great at recognition but has a very restrictive license, while
Tesseract is great and trainable but has very poor layout analysis.
2. I am not aware of any existing wrapper for either of these
products. Writing a basic wrapper won't be a real problem, though,
since basic interaction with them is limited to forking an external
process. Additionally, Tesseract has API bindings for Python, and it
seems that implementing them for Ruby would be an easy task too.

Tesseract will work for you if you have evenly formatted blocks
of text. Otherwise you would have to implement an image layout analysis
engine on your own. Also, you had better use the SVN trunk of Tesseract,
because it contains many changes compared to the last packaged
version.
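
To illustrate the "fork an external process" approach from point 2, here is a minimal Ruby sketch, not a finished wrapper. It assumes the `tesseract` binary is on the PATH and uses its basic command-line form `tesseract IMAGE OUTPUT_BASE -l LANG`, which writes the text to `OUTPUT_BASE.txt`. The method names `tesseract_command` and `ocr_image` are made up for this example.

```ruby
require "open3"
require "tmpdir"

# Build the argument list for the Tesseract CLI.
# Tesseract itself appends ".txt" to output_base.
def tesseract_command(image_path, output_base, lang: "eng")
  ["tesseract", image_path, output_base, "-l", lang]
end

# Hypothetical helper: run Tesseract on one image and return the text.
# Assumes the tesseract binary is installed and on the PATH.
def ocr_image(image_path, lang: "eng")
  Dir.mktmpdir("ocr") do |dir|
    output_base = File.join(dir, "result")
    command = tesseract_command(image_path, output_base, lang: lang)
    _stdout, stderr, status = Open3.capture3(*command)
    raise "tesseract failed: #{stderr.strip}" unless status.success?
    File.read("#{output_base}.txt")
  end
end
```

Calling `ocr_image("scan.png")` would then return the recognized text as a string. Passing the arguments as an array keeps file names from e-mail attachments away from the shell.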

Zbigniew Kowalski · Nov 21, 2011

Hi,
Thank you for this info! I really appreciate it. It will be very useful.
I guess it won't be a big problem because, as far as I saw in the sample
pages, the text uses clear fonts (mostly fixed-width), like old-fashioned
typewriters.

Regards

Fester · Nov 22, 2011

Hello,

I think it will be fine unless you have large amounts of noise. Also,
you can check out Tesseract's training abilities, and probably even
get better recognition of a well-known font face by retraining
the OCR.
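
If noise does turn out to be a problem, one option is to clean up the image before handing it to the OCR. The following is only a rough sketch, assuming ImageMagick's `convert` command is installed. The method names are made up for this example, and the chosen options (`-colorspace Gray`, `-normalize`, `-despeckle`) are a starting point to experiment with, not tuned settings.

```ruby
require "open3"

# Build an ImageMagick command that converts the image to grayscale,
# stretches the contrast and applies a despeckle filter.
def cleanup_command(input_path, output_path)
  ["convert", input_path,
   "-colorspace", "Gray", "-normalize", "-despeckle",
   output_path]
end

# Hypothetical helper: write a cleaned copy of the image and return its path.
# Assumes the convert binary is installed and on the PATH.
def cleanup_image(input_path, output_path)
  _stdout, stderr, status = Open3.capture3(*cleanup_command(input_path, output_path))
  raise "convert failed: #{stderr.strip}" unless status.success?
  output_path
end
```

Whether this actually helps depends on the images, so it is worth comparing OCR results with and without the cleanup step.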

nikolaykhl · Nov 23, 2011

Hello there,
My name is Nikolay Khlebinsky, and I work at ABBYY.

As Fester mentioned, ABBYY provides the most accurate OCR (for
example, have a look at http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison).

We are currently launching a new cloud-based OCR SDK suitable for
small businesses and single developers. It has a well-composed
developer guide, a good set of code samples (including Python), and
it’s free for the testing period.

Would you like to participate in the closed beta testing program of
our OCR SDK?

All you have to do to participate is sign up at www.ocrsdk.com, fill
in a short form and start developing your application. Please fill the
“Where did you hear about ABBYY Cloud OCR SDK?” field with “Nikolay
invite”.

Feel free to contact me if you have any questions.
Best regards, Nikolay Khlebinsky.
(e-mail address removed)

Zbigniew Kowalski · Nov 25, 2011

@Fester: Most of the images will be taken with cheap cameras (in the
beginning), so I guess it needs to be trained to cope with camera
noise and "shakes".

@Nikolay:
I would like to participate in the beta test to compare which solution
would be more suitable for my idea. I will send the info this weekend.

Thanks and regards
Z

nikolaykhl · Nov 28, 2011

Hi Zbigniew,

I haven't seen you sign up for the beta testing program. Did you
face any difficulties with registration?
Feel free to contact me if you have any questions.

Best regards, Nikolay.
