Hand-writing Chinese Character Recognition System

ET Handwriting Recognizer is a handwritten character recognition system built by the ExceedTech Team. It includes two modules, On-line Recognition and Off-line Recognition, and both Chinese and English are supported. Both modules are invariant to scaling, translation and rotation. The main aim of the system is to provide a platform for testing image processing and recognition algorithms.

Figure 1: On-line writing example

On-line Recognition:

You can use a mouse or another pointing device to draw characters in the writing area. Then right-click, and the system shows the recognition result in the text box on the right side, along with candidate characters for correction.

If a recognition error occurs, you can tell the system the correct character and click the “学习” (Learn) button so that the system can improve its performance next time. In addition, the system can check for spelling errors, as follows:

Off-line Recognition:

There are two input methods for off-line recognition: selecting a picture file or using your PC camera.

For testing purposes, the off-line recognition module currently supports only 20 Chinese characters.

Recognition Technology

On-line Recognition Technology

In the image processing phase, the system regularizes each small stroke segment to one of four angles (0, 45, 135, 180) and calculates the barycenter of the segments. It then normalizes all coordinate values so that the relative location of each stroke is independent of the character size.
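The three steps in this paragraph can be sketched roughly as follows. This is an illustrative Python sketch, not the system's own C# code. The function names are hypothetical, and two details are assumptions: directions are folded into the range 0 to 180 degrees by ignoring the sign of the vertical offset, and normalization scales the whole character into a unit square.

```python
import math

ALLOWED_ANGLES = (0.0, 45.0, 135.0, 180.0)  # the four angles listed in the text


def snap_angle(p, q, allowed=ALLOWED_ANGLES):
    """Snap the direction of the segment p -> q to the nearest allowed angle.

    Assumption: the direction is folded into [0, 180] degrees by ignoring
    the sign of dy; the original system may measure angles differently.
    Ties go to the angle listed first.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    theta = math.degrees(math.atan2(abs(dy), dx))
    return min(allowed, key=lambda a: abs(a - theta))


def barycenter(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def normalize(strokes):
    """Shift and scale all points into the unit square, keeping aspect ratio."""
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    min_x, min_y = min(xs), min(ys)
    # Use the larger side as the scale; fall back to 1.0 for a single point.
    size = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [[((x - min_x) / size, (y - min_y) / size) for x, y in s]
            for s in strokes]
```

With this kind of normalization, the same character drawn at twice the size yields the same coordinates, which is what makes the later comparison independent of character size.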

In the analysis phase, the system compares the stroke information of the input character with the pattern information in the pattern information DB, which contains the information shown as follows:

After the comparisons, the 10 best-matching patterns are kept and output.
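Keeping the 10 best matches can be sketched as below, in Python for illustration only. The pattern DB layout (a mapping from character to a flat feature list) and the squared Euclidean distance are assumptions; the page does not say which stroke information is stored or how patterns are scored.

```python
import heapq


def top_matches(input_features, patterns, k=10):
    """Return the k pattern labels closest to the input, best match first.

    `patterns` is a hypothetical DB: label -> feature list of the same
    length as `input_features`. Squared Euclidean distance is a stand-in
    for whatever comparison the real system uses.
    """
    def distance(label):
        return sum((a - b) ** 2
                   for a, b in zip(input_features, patterns[label]))

    return heapq.nsmallest(k, patterns, key=distance)
```

The first entry of the returned list would be shown as the recognition result, and the remaining entries as the candidate characters for correction.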

Off-line Recognition Technology

There are two independent modules in Off-line Recognition: Preprocessing and Recognition.

Preprocessing: Preprocessing is necessary because all input samples come from a cheap PC camera and the characters are written with ordinary pen and paper, so noise, shadows and other interference are inevitable in our samples.

The preprocessing flow is shown as follows:

For example, the original input is the figure on the left. With our preprocessing algorithm, the system identifies the character in the picture correctly and also filters out the noise.

A more complicated case is a character formed from several components.

To handle this challenge, we use minimal contour and intersection detection. More details can be found in the Detailed Design Document (Chinese).

Recognition:

Feature Extraction: Extract a 128-dimensional feature vector from the image.

The first 48 features are obtained by grid ratio: the 30×40 normalized image is divided into 5×5 grids, and the ratio of black pixels in each grid is calculated, so the 48 grids (6 × 8) generate 48 ratio values.
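A minimal Python sketch of the grid ratio computation is given below for illustration. It assumes the normalized image is already binarized and stored as 40 rows of 30 values, with 1 for a black pixel; the function name and the row-by-row ordering of the features are hypothetical.

```python
def grid_ratio_features(image, cell=5):
    """Ratio of black pixels in each cell x cell grid, scanned row by row.

    `image` is a list of rows, each a list of 0/1 values (1 = black).
    Assumes both image dimensions are multiples of `cell`.
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            black = sum(image[r][c]
                        for r in range(top, top + cell)
                        for c in range(left, left + cell))
            features.append(black / (cell * cell))
    return features
```

For a 30×40 image this produces 8 rows of 6 grids, that is, 48 values, each between 0 and 1.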

The next 80 features are obtained by surrounding feature computation: the 30×40 normalized image is divided into 10×10 grids. In the first round, the algorithm calculates the ratio of area changing from white to black in the first column, the first row, the last column and the last row, which together form a rectangle around the edge of the image. In the second round, the algorithm does the same in the second column, the second row, the second-to-last column and the second-to-last row. In the end, the algorithm obtains another 80 features.

Classifier:

The current version of the system uses a 3-layer back-propagation artificial neural network. The input layer contains 128 input units, the hidden layer contains 20 hidden units, and the output layer contains 20 linear units.
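The forward pass of a network with this shape can be sketched as follows, in Python for illustration only. The sigmoid activation in the hidden layer, the random initial weights and all function names are assumptions; the page specifies only the layer sizes and that the output units are linear. Training by back propagation is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, hidden, output):
    """128 inputs -> 20 sigmoid hidden units (assumed) -> 20 linear outputs."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
         for row, b in zip(w_h, b_h)]
    return [sum(w * v for w, v in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]


rng = random.Random(0)
hidden = init_layer(128, 20, rng)
output = init_layer(20, 20, rng)
scores = forward([0.0] * 128, hidden, output)  # one 128-dimensional feature vector
best_class = max(range(len(scores)), key=scores.__getitem__)
```

The index of the largest output score would then be mapped to one of the supported characters.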

Operating System: Microsoft Windows NT/2000/XP

Runtime Environment: Microsoft .NET Framework 2.0 Runtime

Development Environment: Microsoft .NET Framework 2.0 SDK, Visual Studio 2005

Development Language: C# .NET

Software download: ET OCR Installation

Handwriting Samples: zi.rar
