Poor man's OCR: need performance improvement tips

Discussion in 'Python' started by qvx, Sep 24, 2005.

  1. qvx

    qvx Guest

    Hi all,

    I have a performance problem in my app. It is a poor man's version of
    OCR app. I started this app as a prototype before implementing it in
    C++. But now, after I have a working copy in Python, I don't feel like
    doing the job again in C++. A speed improvement of at least 5 times
    would be necessary (I have to process several hundreds of thousands of
    images).

    The application works like this:

    1. Load a raster image of alphabet. This image is accompanied with xml
    description of that image (positions of each letter etc.). No problems
    here. Using this information load raster data (data[][]) for each
    letter.

    2. Load image which must be recognized and transform it into 1 bit
    image (to avoid palette issues). No problems here.

    3. Detect lines of text in input picture. No problems here.

    4. Process each line: compare pixels of each letter of alphabet with
    corresponding pixels in line of input picture. This consists of loops
    comparing pixel by pixel. This is my performance bottleneck.

    I'm using PIL for initial image processing. But then I use plain Python
    loops for pixel matrix comparision. One nice optimization was to call
    PIL.Image.getdata() at the begining and then use data[y*w+x] instead of
    PIL.Image.getpixel(xy). I would like to compare each character raster
    with corresponding image pixels in a "single operation" and avoid
    (Python) loops.

    Oh, one more thing. Letter raster matrices have varying width and
    constant height (due to proportional width font which is used). This
    compare function should signal if there is a single different pixel.

    Any library that can do that?


    Here is how I expected to solve this problem in C++. Each line of text
    (and letter) has height below 16 pixels. It can be artificially made
    into 16 pixels. I planned to linearize each letter's matrix by columns.
    Imagine leter with pixel indices numbered like this:

    00 10 20
    01 11 21
    02 12 22
    03 13 23
    .. .. ..
    0f 1f 2f

    I would convert it into 00 01 02 03 04 05 ... 2e 2f. Since each pixel
    is one bit wide, each column would be 2 octets long. I would do the
    same to the line of text of input picture. Then i would have to compare
    two buffers of length equal to the length of character. After
    successfull match, I would advance "input" stream by that number of
    bytes.

    Thanx
    qvx
    qvx, Sep 24, 2005
    #1
    1. Advertising

  2. qvx

    D.Hering Guest

    Hi Take a look at ADaM's image processing functionality. I'd also
    suggest seeing if Numarray of Scipy can be utilized.

    Here's ADaM's site. I'm sure your familiar with the others mentioned.

    http://datamining.itsc.uah.edu/adam/

    hth,
    Dieter
    D.Hering, Sep 24, 2005
    #2
    1. Advertising

  3. On 24 Sep 2005, at 19:14, qvx wrote:

    > Hi all,


    <snip>

    >
    >
    > 4. Process each line: compare pixels of each letter of alphabet with
    > corresponding pixels in line of input picture. This consists of loops
    > comparing pixel by pixel. This is my performance bottleneck.
    >
    > I'm using PIL for initial image processing. But then I use plain
    > Python
    > loops for pixel matrix comparision. One nice optimization was to call
    > PIL.Image.getdata() at the begining and then use data[y*w+x]
    > instead of
    > PIL.Image.getpixel(xy). I would like to compare each character raster
    > with corresponding image pixels in a "single operation" and avoid
    > (Python) loops.
    >
    > Oh, one more thing. Letter raster matrices have varying width and
    > constant height (due to proportional width font which is used). This
    > compare function should signal if there is a single different pixel.
    >
    > Any library that can do that?
    >
    >
    > Here is how I expected to solve this problem in C++. Each line of text
    > (and letter) has height below 16 pixels. It can be artificially made
    > into 16 pixels. I planned to linearize each letter's matrix by
    > columns.
    > Imagine leter with pixel indices numbered like this:
    >
    > 00 10 20
    > 01 11 21
    > 02 12 22
    > 03 13 23
    > .. .. ..
    > 0f 1f 2f
    >
    > I would convert it into 00 01 02 03 04 05 ... 2e 2f. Since each pixel
    > is one bit wide, each column would be 2 octets long. I would do the
    > same to the line of text of input picture. Then i would have to
    > compare
    > two buffers of length equal to the length of character. After
    > successfull match, I would advance "input" stream by that number of
    > bytes.


    Presumably you don't care about alignment and kerning and other
    things currently.

    If you haven't tried Psyco yet, try it.
    If you read the image in rotated 90 degrees then the data is
    linearised how you want it already. You could then just pack it into
    an integer and compare that, or look it up in a dictionary even.

    e.g.
    char = lookup[data[n:n+2]]

    where n is the left (or bottom, we rotated in 90 degrees remember?)
    and 2 is me assuming PIL will not encode each pixel as entire byte in
    a 1bpp image. I would of thought that would be pretty quick as long
    as you could get the alignment reliable enough.

    I hope this makes some actual sense, I have 0 OCR experience tbh.
    Alex Stapleton, Sep 24, 2005
    #3
  4. qvx

    qvx Guest

    I also have 0 OCR experience, but the case is simple enough.

    I know about scipy but I have 0 experience with it. I was actually
    hoping somebody who knows about it might have some recipe.

    I also tried psyco, but unfortunetly, the speedup is only few percent.

    I will check out ADaM's site.

    I was hoping to replace matrix comparison with something more efficient
    (minimal code change). I like the idea of dictionary lookup but it
    requires more code to be changed. If nothing else comes up I will
    probably try this method. Of course I will have to check the wider
    characters first so there will be presumably several lookups for each
    position. The only problem here is how to efficiently transform
    portions of input picture into suitable format (some kind of
    list/buffer).

    Thanks.
    qvx, Sep 24, 2005
    #4
  5. qvx

    D.Hering Guest

    I'm working on essentially the same thing for a real-time context. No
    formal schema developed yet, but I know that I'll be using some
    combination of the following: ADaM, ESML (Binary data format
    unification...xml config), Numarray/Scipy, Pytables (DataBase), PIL,
    Cairo (svg), and MatPlotlib.
    D.Hering, Sep 24, 2005
    #5
  6. qvx

    D.Hering Guest

    I thought I should correct and clarify the use of EMSL (even though
    this isn't specific toward your request, qvx.)

    http://esml.itsc.uah.edu/index.jsp

    It's more for data format recognition and conversion. Here's a
    description from the site. The sw was developed for earth science, but
    is very useful for any domain.

    ESML is an interchange technology that enables data (both structural
    and semantic) interoperability with applications without enforcing a
    standard format. Users can write external files using ESML schema to
    describe the structure of the data file. Applications can utilize the
    ESML Library to parse this description file and decode the data format.
    As a result, software developers can now build data format independent
    applications utilizing the ESML technology. Furthermore, semantic tags
    can be added to the ESML files by linking different domain ontologies
    to provide a complete machine understandable data description. This
    ESML description file allows the development of intelligent
    applications that can now understand and "use" the data.
    D.Hering, Sep 24, 2005
    #6
  7. qvx

    John J. Lee Guest

    "qvx" <> writes:
    [...]
    > 4. Process each line: compare pixels of each letter of alphabet with
    > corresponding pixels in line of input picture. This consists of loops
    > comparing pixel by pixel. This is my performance bottleneck.
    >
    > I'm using PIL for initial image processing. But then I use plain Python
    > loops for pixel matrix comparision. One nice optimization was to call
    > PIL.Image.getdata() at the begining and then use data[y*w+x] instead of
    > PIL.Image.getpixel(xy). I would like to compare each character raster
    > with corresponding image pixels in a "single operation" and avoid
    > (Python) loops.

    [...]

    I don't know what exactly "compare" means here, so no Numeric code,
    but GIYF when it comes to the PIL<-->Numeric conversion (I imagine
    numarray is almost identical here, though I've not used it in anger):

    http://effbot.org/zone/pil-numpy.htm


    Since you mention C++, scipy.weave may also be of interest to you.


    John
    John J. Lee, Sep 24, 2005
    #7
  8. qvx

    Guest

    Imagine a large matrix with dimensions [W,H], and a lots of smaller
    matrices with dimensions [p,q1], [p,q1], [p,q2], [p,q1], ... I have to
    slide a small window [p,q] horizontally over a larger matrix. After
    each slide I have to compare smaller matrices with the data from larger
    matrix (as defined by sliding window).

    I'm currently trying to use other kinds of optimizations (linearize
    data by columns), but the program no longer works, and it is so hard to
    debug. But it works very fast :)

    Here is an example of linearization by columns that i'm currently using
    :

    # setup: convert to 1 bit image
    img = Image.open(file_name)
    img2 = img.point([0]*255 + [1], "1")

    # find ocr lines, and for each do ...

    # extract OCR line
    region = img2.crop((0, ocrline.base_y-13, width, ocrline.base_y+3)) #
    h=16
    region.load()

    # clean up upper two lines which should be empty but
    # sometimes contain pixels from other ocr line directly above
    draw = ImageDraw.Draw(region)
    draw.line((0,0,width,0), fill=1)
    draw.line((0,1,width,1), fill=1)

    # transpose data so I get columns as rows
    region = region.transpose(Image.FLIP_LEFT_RIGHT)
    region = region.transpose(Image.ROTATE_90)
    ocrline.data = region.tostring() # packs 8 pixels into 1 octet

    I do the same for known letters/codes (alphabet). Then I compare like
    this:

    def recognize (self, ocrline, x):
    for code_len in self.code_lengths: # descending order
    code = ocrline.data[x:x+code_len]
    ltr = self.letter_codes.get(code, None)
    if ltr is not None:
    return ltr, code_len # So I can advance x

    This currently "works" two orders of magnitude faster.
    , Sep 25, 2005
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Anjali Lourda
    Replies:
    0
    Views:
    446
    Anjali Lourda
    Feb 4, 2004
  2. Replies:
    9
    Views:
    491
    Benji
    Nov 23, 2005
  3. Andrea Crotti

    Poor man unit testing

    Andrea Crotti, Jul 22, 2010, in forum: C Programming
    Replies:
    7
    Views:
    316
    Andrea Crotti
    Jul 29, 2010
  4. pete
    Replies:
    12
    Views:
    711
    Stephen Sprunk
    Dec 27, 2010
  5. Replies:
    21
    Views:
    231
    Martijn Lievaart
    Mar 8, 2010
Loading...

Share This Page