Hello all,
Here is a very basic OpenCV-Python program for Optical Character Recognition (OCR).
First, let’s look at what resources already come with the default OpenCV build.
There is a file called letter_recognition.data that comes with the OpenCV samples. Each record contains a letter, along with 16 features of that letter. These 16 features are explained in the paper “Letter Recognition Using Holland-Style Adaptive Classifiers”.
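To make that layout concrete, here is a minimal sketch of reading one record. It assumes each line is comma-separated, with the letter first and the 16 integer features after it. The helper name parse_record and the sample line are made up for illustration; the values are not copied from the real file.

```python
def parse_record(line):
    """Split one record into (letter, list of 16 integer features)."""
    fields = line.strip().split(',')
    letter = fields[0]
    features = [int(v) for v in fields[1:]]
    if len(features) != 16:
        raise ValueError("expected 16 features, got %d" % len(features))
    return letter, features

# Illustrative line only; the feature values are invented for this example.
sample_line = "A,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0"
letter, features = parse_record(sample_line)
print(letter, len(features))
```

A loop over the file with this helper would give one list of labels and one 16-column feature table, which is the same shape of data we build by hand in Stage I.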
We’ll perform the classification in two stages:
Stage I: Preparing the Data
Stage II: Training and Testing the Data
Stage I: Preparing the Data
For now, we’ll use the following image for our training data:
To prepare the data for training, we’ll write code that does the following:
A) Loads the image
B) Selects the digits (by finding contours and applying constraints on the area and height of each contour to avoid false detections)
C) Draws a bounding rectangle around one digit and waits for a key press. We then press the digit key that matches the digit in the box
D) Once the digit key is pressed, resizes the box to 10×10 and saves the 100 pixel values in one array and the manually entered digit in another array
E) Saves both arrays in separate text files
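Steps D and E are just array bookkeeping, so here is a small NumPy-only sketch of them. The 10×10 block and the label 7 are stand-ins I made up, and the demo file names are hypothetical; the real pixel values come from the resized box in the image.

```python
import os
import tempfile
import numpy as np

# Stand-in for one resized digit: a 10x10 block of pixel values (made up here).
roismall = np.arange(100, dtype=np.uint8).reshape((10, 10))
label = 7  # the digit we would have typed for this box

samples = np.empty((0, 100))
responses = []

# Step D: flatten to one row of 100 values and record the label alongside it.
samples = np.append(samples, roismall.reshape((1, 100)), 0)
responses.append(label)

# Step E: write both arrays as text, then read them back to check the shapes.
responses_arr = np.array(responses, np.float32).reshape((len(responses), 1))
tmpdir = tempfile.mkdtemp()
samples_path = os.path.join(tmpdir, 'samples_demo.data')      # hypothetical name
responses_path = os.path.join(tmpdir, 'responses_demo.data')  # hypothetical name
np.savetxt(samples_path, samples)
np.savetxt(responses_path, responses_arr)

loaded_samples = np.loadtxt(samples_path, np.float32, ndmin=2)
loaded_responses = np.loadtxt(responses_path, np.float32, ndmin=2)
print(loaded_samples.shape, loaded_responses.shape)
```

Each labeled digit adds one row to samples and one entry to responses, so row i of the first file always belongs to label i of the second.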
Once we have gone through them all, every digit in the training image has been labeled by hand, and the image will look like this:
Training Code Below:
import sys
import numpy as np
import cv2

im = cv2.imread('train_image.png')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Same call as adaptiveThreshold(blur,255,1,1,11,2), with the two constants named
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)

################# Now finding Contours ###################
# [-2:] keeps (contours, hierarchy) whether findContours returns 2 or 3 values
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]

samples = np.empty((0, 100))
responses = []
keys = [i for i in range(48, 58)]  # key codes for '0' to '9'

for cnt in contours:
    if cv2.contourArea(cnt) > 50:          # skip small specks
        [x, y, w, h] = cv2.boundingRect(cnt)
        if h > 28:                         # skip boxes too short to be a digit
            cv2.rectangle(im, (x, y), (x+w, y+h), (0, 0, 255), 2)
            roi = thresh[y:y+h, x:x+w]
            roismall = cv2.resize(roi, (10, 10))
            cv2.imshow('norm', im)
            key = cv2.waitKey(0)
            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1, 100))
                samples = np.append(samples, sample, 0)

responses = np.array(responses, np.float32)
responses = responses.reshape((responses.size, 1))
print("training complete")
np.savetxt('generalsamples.data', samples)
np.savetxt('generalresponses.data', responses)
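A quick note on the key handling: 48 to 57 are the ASCII codes for '0' to '9', which is why int(chr(key)) gives back the digit we pressed. Here is a tiny sketch of that mapping on its own. The helper name key_to_digit is made up, and the & 0xFF mask is my assumption for builds where waitKey returns extra flag bits above the low byte; it is not part of the code above.

```python
ESC = 27
DIGIT_KEYS = list(range(48, 58))  # ASCII codes for '0' to '9'

def key_to_digit(key):
    """Return the digit for a waitKey-style key code, or None for any other key."""
    key = key & 0xFF  # assumption: keep only the low byte in case flag bits are set
    if key in DIGIT_KEYS:
        return int(chr(key))
    return None

print(key_to_digit(ord('7')))
print(key_to_digit(ESC))
```

Any key that is neither Escape nor a digit is simply ignored, so a wrongly detected box can be skipped by pressing something like the space bar.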
Stage II: Training and Testing the Data
Now that the labeled data is saved, we can train the classifier and then test it.
For testing, we’ll use the following image:
For training, we do the following:
A) Load the text files we saved earlier
B) Create an instance of the classifier we are using (KNearest)
C) Use the KNearest.train function to train on the data
For testing, we do the following:
A) Load the image used for testing
B) Process the image as before and extract each digit using contour methods
C) Draw a bounding box around it, then resize it to 10×10 and store its pixel values in an array as before
D) Use the KNearest.find_nearest() function to find the training item nearest to the one we gave (if lucky, it recognizes the correct digit 🙂)
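import cv2
import numpy as np

####### training part ###############
samples = np.loadtxt('generalsamples.data', np.float32)
responses = np.loadtxt('generalresponses.data', np.float32)
responses = responses.reshape((responses.size, 1))

# This is the OpenCV 2.4 API. In OpenCV 3 and later the equivalents are
# cv2.ml.KNearest_create(), model.train(samples, cv2.ml.ROW_SAMPLE, responses)
# and model.findNearest(roismall, k=1).
model = cv2.KNearest()
model.train(samples, responses)

############################# testing part #########################
im = cv2.imread('input.png')
out = np.zeros(im.shape, np.uint8)
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
# Same call as adaptiveThreshold(gray,255,1,1,11,2), with the two constants named
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)
# [-2:] keeps (contours, hierarchy) whether findContours returns 2 or 3 values
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]

for cnt in contours:
    if cv2.contourArea(cnt) > 50:
        [x, y, w, h] = cv2.boundingRect(cnt)
        if h > 28:
            cv2.rectangle(im, (x, y), (x+w, y+h), (0, 255, 0), 2)
            roi = thresh[y:y+h, x:x+w]
            # Same 10x10 -> 1x100 float32 layout as the training samples
            roismall = cv2.resize(roi, (10, 10))
            roismall = roismall.reshape((1, 100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k=1)
            string = str(int((results[0][0])))
            # Write the recognized digit at the same position in the output image
            cv2.putText(out, string, (x, y+h), 0, 1, (0, 255, 0))

cv2.imshow('im', im)
cv2.imshow('out', out)
cv2.waitKey(0)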
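If you are wondering what a nearest-neighbour lookup with k = 1 boils down to, here is a small NumPy sketch of the idea: compare the query row with every training row and return the label of the closest one. The function name nearest_label and the toy arrays are made up for illustration, and I am assuming squared Euclidean distance; this is a sketch of the concept, not OpenCV’s implementation.

```python
import numpy as np

def nearest_label(train_samples, train_responses, query):
    """Return (label, squared distance) of the training row closest to query."""
    diffs = train_samples - query            # broadcast the query over every row
    dists = np.sum(diffs * diffs, axis=1)    # squared Euclidean distance per row
    best = int(np.argmin(dists))
    return float(train_responses[best, 0]), float(dists[best])

# Toy data: three 100-value "images", labeled 0, 1 and 2.
train_samples = np.zeros((3, 100), np.float32)
train_samples[1, :50] = 255
train_samples[2, :] = 255
train_responses = np.array([[0], [1], [2]], np.float32)

query = np.full((1, 100), 250, np.float32)   # nearly all white, so closest to row 2
label, dist = nearest_label(train_samples, train_responses, query)
print(int(label), dist)
```

With only one neighbour, a single badly labeled training digit can flip the answer, which is one more reason to label carefully in Stage I.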
Here are the results:
And it worked, wohoooooo !!!
Enjoy !!