Extracting text from .png images

Discussion in 'Python' started by Henrik Berg Nielsen, Oct 1, 2003.

  1. Hi group!

    I need to extract some text (well, numbers actually) from a bunch of
    similar-looking .png images. After extraction the numbers will be fed to a
    Python script for further processing. Any good ideas on how to go about
    this? I have no idea whatsoever how to extract the numbers from the
    images...

    Thanks in advance,

    Henrik
     
    Henrik Berg Nielsen, Oct 1, 2003
    #1

  2. "Henrik Berg Nielsen" writes:

    > I need to extract some text (well, numbers actually) from a bunch of
    > similar-looking .png images. After extraction the numbers will be fed to a
    > Python script for further processing. Any good ideas on how to go about
    > this? I have no idea whatsoever how to extract the numbers from the
    > images...


    OCR is the TLA you're looking for ("Optical Character Recognition").

    Dunno if there are any good free OCR engines. With these sorts of
    hard algorithms, you tend to get what you pay for.


    John
     
    John J. Lee, Oct 1, 2003
    #2
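
A note on the OCR suggestion in #2: because the images are described as similar-looking, a full OCR engine may not be needed if the digits are always drawn in the same fixed bitmap font. The sketch below is only an illustration of that idea (nearest-template matching), not something proposed in the thread. It assumes the image has already been loaded and thresholded into rows of '#' (ink) and '.' (background), which would need an imaging library, and the 3x5 glyph templates, the one-pixel gap, and all names are hypothetical.

```python
# Hypothetical 3x5 templates for a fixed bitmap font; '#' is ink, '.' is background.
GLYPH_W, GAP = 3, 1

TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": ("..#", "..#", "..#", "..#", "..#"),
    "2": ("###", "..#", "###", "#..", "###"),
    "3": ("###", "..#", "###", "..#", "###"),
    "4": ("#.#", "#.#", "###", "..#", "..#"),
    "5": ("###", "#..", "###", "..#", "###"),
    "6": ("###", "#..", "###", "#.#", "###"),
    "7": ("###", "..#", "..#", "..#", "..#"),
    "8": ("###", "#.#", "###", "#.#", "###"),
    "9": ("###", "#.#", "###", "..#", "###"),
}


def split_glyphs(rows):
    """Cut a bitmap of fixed-width glyphs into one tuple of rows per glyph."""
    step = GLYPH_W + GAP
    return [
        tuple(row[x:x + GLYPH_W] for row in rows)
        for x in range(0, len(rows[0]), step)
    ]


def distance(a, b):
    """Count the pixels in which two equally sized glyphs differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def read_number(rows):
    """Return the digit string whose templates best match each glyph."""
    return "".join(
        min(TEMPLATES, key=lambda digit: distance(TEMPLATES[digit], glyph))
        for glyph in split_glyphs(rows)
    )


# A tiny hand-made bitmap showing the digits 4 and 2 side by side.
IMAGE_42 = [
    "#.#.###",
    "#.#...#",
    "###.###",
    "..#.#..",
    "..#.###",
]

print(read_number(IMAGE_42))  # prints 42
```

Picking the template with the fewest differing pixels, rather than requiring an exact match, tolerates a stray pixel or two. It will not cope with scaled, rotated, or deliberately distorted text; that is where a real OCR engine comes in.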

  3. Henrik Berg Nielsen spake thusly:
    >
    > I need to extract some text (well, numbers actually) from a bunch of
    > similar-looking .png images. After extraction the numbers will be fed
    > to a Python script for further processing. Any good ideas on how to go
    > about this? I have no idea whatsoever how to extract the
    > numbers from the images...
    >

    This might help you out...
    http://www.pricelessware.org/2003/PL2003TEXT.htm#Convert-OCR

    I'm not sure if it does PNG; you might have to convert the files to TIFF or
    BMP first.


    --
    Audio Bible Online:
    http://www.audio-bible.com/
     
    Indigo Moon Man, Oct 1, 2003
    #3
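
On the suggestion in #3 to convert the files to TIFF or BMP: a minimal sketch of a batch conversion, assuming the Pillow imaging library is installed. Pillow is not mentioned in the thread, and the function names and folder layout here are hypothetical.

```python
from pathlib import Path


def tiff_path_for(png_path):
    """Return the sibling .tiff path for a given .png path."""
    return Path(png_path).with_suffix(".tiff")


def convert_png_to_tiff(png_path):
    """Write a TIFF copy of png_path next to it and return the new path."""
    # Assumption: Pillow is installed. It is imported here so that
    # tiff_path_for stays usable without it.
    from PIL import Image

    out_path = tiff_path_for(png_path)
    with Image.open(png_path) as img:
        img.save(out_path, format="TIFF")
    return out_path


def convert_folder(folder):
    """Convert every .png file directly inside folder; return the new paths."""
    return [convert_png_to_tiff(p) for p in sorted(Path(folder).glob("*.png"))]
```

Calling convert_folder on the directory that holds the images would leave a .tiff file next to each .png, ready to hand to an OCR tool that does not read PNG.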
  4. In article <wbDeb.2223$2net.dk>, Henrik Berg Nielsen wrote:
    > Hi group!
    >
    > I need to extract some text (well, numbers actually) from a bunch of
    > similar-looking .png images. After extraction the numbers will be fed to a
    > Python script for further processing. Any good ideas on how to go about
    > this? I have no idea whatsoever how to extract the numbers from the
    > images...
    >



    http://www.claraocr.org/
     
    Lee Harr, Oct 1, 2003
    #4
  5. John> OCR is the TLA you're looking for ("Optical Character Recognition").

    John> Dunno if there are any good free OCR engines. With these sorts of
    John> hard algorithms, you tend to get what you pay for.

    Which often means there's a piece of free software out there which works
    better than the most expensive commercial solutions. <wink>

    A little googling suggests this might be a candidate:

    http://www.claraocr.org/

    I have no idea if there's an exported library and/or a Python wrapper, but
    it's probably worth a look.

    Skip
     
    Skip Montanaro, Oct 1, 2003
    #5
  6. "Henrik Berg Nielsen" wrote:
    >
    >I need to extract some text (well, numbers actually) from a bunch of
    >similar-looking .png images. After extraction the numbers will be fed to a
    >Python script for further processing. Any good ideas on how to go about
    >this? I have no idea whatsoever how to extract the numbers from the
    >images...


    Are you hoping to extract the "password" characters from the pictures
    presented by the whois checks? If so, you should give up now, because
    those images are SPECIFICALLY designed to make them almost impervious to
    automated recognition.
    --
    - Tim Roberts,
    Providenza & Boekelheide, Inc.
     
    Tim Roberts, Oct 2, 2003
    #6
  7. Henrik Berg Nielsen wrote:
    > Hi group!
    >
    > I need to extract some text (well, numbers actually) from a bunch of
    > similar-looking .png images. After extraction the numbers will be fed to a
    > Python script for further processing. Any good ideas on how to go about
    > this? I have no idea whatsoever how to extract the numbers from the
    > images...


    Hi,
    I'm dealing with a similar problem now. My pictures are very complicated
    (construction drawings). I am trying to use Gamera
    (http://dkc.jhu.edu/gamera/) for OCR and it seems very promising.

    --
    -- Lukas
     
    Lukas Ccenovsky, Oct 2, 2003
    #7
  8. On Wed, 01 Oct 2003 20:25:45 -0700, Tim Roberts wrote:

    >"Henrik Berg Nielsen" wrote:
    >>
    >>I need to extract some text (well, numbers actually) from a bunch of
    >>similar-looking .png images. After extraction the numbers will be fed to a
    >>Python script for further processing. Any good ideas on how to go about
    >>this? I have no idea whatsoever how to extract the numbers from the
    >>images...

    >
    >Are you hoping to extract the "password" characters from the pictures
    >presented by the whois checks? If so, you should give up now, because
    >those images are SPECIFICALLY designed to make them almost impervious to
    >automated recognition.

    Sounds interesting as a problem, but I wouldn't want to create a skeleton key
    for any bad guys ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Oct 2, 2003
    #8