Java OCR?

Soefara

Is it just me, or does no Java OCR package exist?

I've seen one reference, www.javaocr.com, but if you download the
demo, it's actually a 33KB .jar file which in turn calls a 220KB DLL
(which will run on Windows only). Maybe I'm misunderstanding
something, but this looks a bit misleading.

Even PHP apparently has OCR packages, according to SourceForge. How
can it be that Java does not?

Soefara
 
Soefara

Thank you for the reply Marco.
Good OCR is hard and requires a lot of research and experience.
Finereader has an SDK that works under Windows and Linux:
<http://www.abbyy.com/developer_toolkits1.asp?param=28807&from=topcom2>.
Maybe it can be interfaced from Java?

I'm not sure how I'd go about "interfacing" that. However,
there do seem to be quite a few open source and Linux OCR
packages, some of which can be driven from the command line,
the most prominent of which is Clara
(see http://www.claraocr.org/faq.html).

Is there any danger in executing an external program (such
as Clara) from within a Java servlet using something like this ?

Runtime.getRuntime().exec("/full/path/to/program [optional-arguments]");


Soefara
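
On the question of danger: the usual risks when a servlet launches an external program are building the command from request data (command injection), the child process blocking because nobody reads its output, and the child hanging and tying up a request thread. Runtime.exec(String) also splits its argument on whitespace, so paths containing spaces break. The sketch below is one possible way to guard against these, not a definitive recipe. The names ExternalTool and run, and the idea of capturing output in a temporary file, are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of any library.
final class ExternalTool {

    /**
     * Runs a command given as separate arguments (no shell, no whitespace splitting)
     * and returns everything it wrote to stdout and stderr.
     */
    static String run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Capture output in a file so the child can never block on a full pipe.
        File captured = File.createTempFile("tool-output", ".txt");
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true);   // merge stderr into stdout
            pb.redirectOutput(captured);
            Process p = pb.start();
            p.getOutputStream().close();    // the child gets no input from us

            // Give up instead of letting a hung program hold the thread forever.
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("Timed out after " + timeoutSeconds + "s: " + command);
            }
            if (p.exitValue() != 0) {
                throw new IOException("Exit code " + p.exitValue() + ": " + command);
            }
            return new String(Files.readAllBytes(captured.toPath()), StandardCharsets.UTF_8);
        } finally {
            captured.delete();
        }
    }
}
```

A call would look like ExternalTool.run(Arrays.asList("/full/path/to/program", "optional-argument"), 30). Keeping the program path fixed in the code, and never passing request parameters through unchecked, matters more than any of the plumbing above.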
 
Copied from java.sun, hope it can help some lost soul in need of OCR :)

Greetings,

I know these posts have been here for quite a long time, but I found them while looking for an OCR solution in Java, and I would like to share the FREE answer I have created.

I browsed lots of posts while searching for OCR in Java, and they all linked to Asprise / javaocr, but those are unaffordable for a non-commercial project.

So I searched for OCR software written in any language, with the aim of interfacing it with Java.

-I discovered GOCR (http://jocr.sourceforge.net/), which is a command-line OCR program. It was a start ^^ I downloaded and used the Windows version. After a few tests I figured out how to use it, but I have to feed it PPM images.

-Here comes the second program, nconvert (http://pagesperso-orange.fr/pierre.g/xnview/fr_nconvert.html), which can convert images to PPM.
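
As an aside, the PPM step could probably also be done in plain Java instead of through nconvert, since binary PPM ("P6") is just a short text header followed by raw RGB bytes. The sketch below is a swapped-in alternative, not what the post uses. PpmWriter is a hypothetical name, and it assumes the GOCR build in use accepts binary P6 files.

```java
import java.awt.image.BufferedImage;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes a BufferedImage as a binary PPM (P6) file.
final class PpmWriter {

    static void write(BufferedImage img, File target) throws IOException {
        int w = img.getWidth();
        int h = img.getHeight();
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            // Header: magic number, width and height, maximum channel value.
            out.write(("P6\n" + w + " " + h + "\n255\n").getBytes(StandardCharsets.US_ASCII));
            // Pixel data: rows from top to bottom, three bytes (R, G, B) per pixel.
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    int rgb = img.getRGB(x, y);
                    out.write((rgb >> 16) & 0xFF); // red
                    out.write((rgb >> 8) & 0xFF);  // green
                    out.write(rgb & 0xFF);         // blue
                }
            }
        }
    }
}
```

With something like this, PpmWriter.write(img, new File("text.ppm")) would take the place of both the PNG write and the nconvert call.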

So I wrote 2 static classes to act as the OCR.

The main part is the class OCR, which takes a screenshot of the screen, sets the proper colors (I've only made GOCR work with black letters on a white background), writes the image to disk and then calls nconvert and GOCR.

By parsing the output stream of the GOCR process you should get your recognized text. There is the "replace" step in the return statement because I work on numbers and GOCR makes some mistakes with 1-l and O-0 ^^

It's not a strong OCR facility, but it can help with small applications. Hope it helps, and many thanks to nconvert and GOCR ;)

Code:
package t3x.tnn.utility;

import java.awt.Color;
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class OCR {
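	// hg is the top-left corner and bd the bottom-right corner of the screen region to capture.
	// If isColorEcriture is true, color is the text color; otherwise it is the background color.
	static public String recognize(Point hg, Point bd, Color color, boolean isColorEcriture){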
		String res = null;
		File fImg = new File("screenshot.png");
		while(res == null){
			BufferedImage img = ScreenHandler.getScreen(hg, bd);
			if(isColorEcriture)
				img = changeWithColorEcriture(img, color);
			else
				img = changeWithColorFond(img, color);
			try {
				ImageIO.write(img, "PNG", fImg);
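				// Convert the PNG to PPM, the format GOCR is fed here.
				Process p = Runtime.getRuntime().exec("nconvert -out ppm -o text.ppm screenshot.png");
				p.waitFor();
				p.destroy();
				// Run GOCR on the PPM file; the recognized text arrives on its standard output.
				p = Runtime.getRuntime().exec("gocr045 text.ppm");
				p.waitFor();
				// IOHandler is not imported, so it must be another class in this package
				// (t3x.tnn.utility); its source is not included in this post.
				if(p.getInputStream().available()>0)
					res = IOHandler.getResponse(p.getInputStream());
				p.destroy();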
			}catch (InterruptedException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
		if(fImg.exists())
			fImg.delete();
		File texte = new File("text.ppm");
		if(texte.exists())
			texte.delete();
		return res.replace("l", "1").replace("O", "0").trim();
	}

	private static BufferedImage changeWithColorEcriture(BufferedImage bi, Color ecriture) {
		if (bi != null) {                       
			int w = bi.getWidth();
			int h = bi.getHeight();
			int pixel;
			BufferedImage bitmp = new BufferedImage(w, h, bi.getType());
			BufferedImage biOut = new BufferedImage(w, h, bi.getType());

			for (int x = 0; x < w; x++) {
				for (int y = 0; y < h; y++) {
					pixel = bi.getRGB(x, y);
					if(pixel != ecriture.getRGB())
						pixel = Color.BLUE.getRGB();
					else
						pixel = Color.BLACK.getRGB();
					bitmp.setRGB(x, y, pixel); 
				}
			}

			for (int x = 0; x < w; x++) {
				for (int y = 0; y < h; y++) {
					pixel = bitmp.getRGB(x, y);
					if(pixel == Color.BLUE.getRGB())
						pixel = Color.WHITE.getRGB();
					biOut.setRGB(x, y, pixel);
				}
			}

			return biOut;
		} else {
			return bi;
		}
	}
	
	private static BufferedImage changeWithColorFond(BufferedImage bi, Color fond) {
		if (bi != null) {                       
			int w = bi.getWidth();
			int h = bi.getHeight();
			int pixel;
			BufferedImage bitmp = new BufferedImage(w, h, bi.getType());
			BufferedImage biOut = new BufferedImage(w, h, bi.getType());

			for (int x = 0; x < w; x++) {
				for (int y = 0; y < h; y++) {
					pixel = bi.getRGB(x, y);
					if(pixel == fond.getRGB())
						pixel = Color.BLUE.getRGB();
					else
						// Anything that is not the background color is text, so paint it black.
						pixel = Color.BLACK.getRGB();
					bitmp.setRGB(x, y, pixel); 
				}
			}

			for (int x = 0; x < w; x++) {
				for (int y = 0; y < h; y++) {
					pixel = bitmp.getRGB(x, y);
					if(pixel == Color.BLUE.getRGB())
						pixel = Color.WHITE.getRGB();
					biOut.setRGB(x, y, pixel);
				}
			}

			return biOut;
		} else {
			return bi;
		}
	}
}

Code:
package t3x.tnn.utility;

import java.awt.AWTException;
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;

public class ScreenHandler {

	public static Color getPixelColor(Point p){
		return getPixelColor(p.x, p.y);
	}

	public static BufferedImage getScreen(Point hg, Point bd){
		checkNano();
		return nano.createScreenCapture(new Rectangle(hg, new Dimension(bd.x-hg.x, bd.y-hg.y)));
	}
	
	public static boolean areImagesEqual(BufferedImage img1, BufferedImage img2){
		int[] timg1 = getPixels(img1);
		int[] timg2 = getPixels(img2);
		for(int i = 0 ; i < timg1.length; i++){
			if(timg1[i]!=timg2[i]){
				return false;
			}
		}
		return true;
	}
	
	public static Color analyse(Point depart, int deviation, Color fond){
		for(int i= depart.x; i < depart.x+deviation; i++){
			Color col = ScreenHandler.getPixelColor(i, depart.y);
			if(!col.equals(fond))
				return col;
		}
		//IOHandler.abort("[ScreenHandler.analyse] : No game color found");
		return null;
	}
///////////////////////////////////////////////////////////////////////////////////
	private static Robot nano;
	
	private static Color getPixelColor(int x, int y){
		checkNano();
		return nano.getPixelColor(x, y);
	}
	
	private static int[] getPixels(BufferedImage img){
		return img.getRaster().getPixels(img.getRaster().getMinX(), img.getRaster().getMinY(),  img.getRaster().getWidth(), img.getRaster().getHeight(), new int[ img.getRaster().getWidth()*img.getRaster().getHeight()*10]);
	}
	
	private static void checkNano(){
		if(nano == null)
			try {
				nano = new Robot();
			} catch (AWTException e) {
				e.printStackTrace();
			}
	}
}
 
Trying to use your solution but...

Hi Ayesh,

We've developed a web app which indexes documents; you can see it at nootes dot org

What we want now is to build a Swing app that lets me scan documents and run OCR on them, so they can be uploaded to my web app through web services (already developed).

Your solution looks perfect for my needs, but when I tried to use it in the NetBeans IDE I got an error on the following line:

res = IOHandler.getResponse(p.getInputStream());

What package do I need in order to use this function?

Thanks for your help and your time.
 
I am aware that this thread is rather old, but I am in need of help! I used Java quite extensively a few years back but unfortunately am a little rusty with it. I am trying to make an OCR program and think that the method posted here, using GOCR and nconvert, is a good way to avoid Asprise OCR, which has to be paid for...

Anyway, using BlueJ, I am having the same problem as the poster above: "cannot find symbol - variable IOHandler". I tried it in NetBeans too, just to make sure it wasn't a BlueJ quirk, but got the same error message.

From what I can gather, IOHandler hasn't been defined in the OCR class, but I am unsure what type of variable to declare it as so that it can use the getResponse() method. Does anyone have any idea?

I have searched high and low for a solution, but to no avail. I really hope someone can point me in the right direction.

Thanks.

Ewen
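
For anyone hitting the same error: IOHandler is not a JDK class or a variable. It is called as a static helper and is not imported, so it appears to be another class from the poster's own t3x.tnn.utility package that was never included in the thread. The class below is a hypothetical stand-in, written only from the way OCR.recognize calls it (an InputStream in, a String out), and is not the original author's code. To use it with the posted classes, add the line "package t3x.tnn.utility;" at the top.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical replacement for the missing helper class.
final class IOHandler {

    /** Reads the stream to its end and returns the text, each line followed by '\n'. */
    static String getResponse(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // The platform default charset is assumed to match the external program's output.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

The trailing newline is harmless here because OCR.recognize trims the result. One caveat with the posted OCR class: it calls p.waitFor() before reading, which can stall if GOCR prints more than the pipe buffer holds, so reading the stream first and waiting afterwards is the safer order.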
 
