Hand-writing Chinese Character Recognition System
ET Handwriting Recognizer is a handwritten character recognition system built by the ExceedTech Team. It includes two modules, On-line Recognition and Off-line Recognition, and both support Chinese and English. Recognition in both modules is invariant to scaling, translation and rotation. The main aim of the system is to provide a platform for testing image processing and recognition algorithms.
Figure 1 On-line Writing example
You can use the mouse or another pointing device to draw characters in the writing area. When you right-click, the system shows the recognition result in the text box on the right, along with candidate characters for correction.
If a recognition error occurs, you can tell the system the correct character and click the “学习” (Learn) button so that the system performs better next time. In addition, the system can check for spelling errors, as follows:
There are two ways to provide input for off-line recognition: select a picture file or use your PC camera.
For testing purposes, the off-line recognition module currently supports only 20 Chinese characters.
In the image processing phase, the system regularizes each small stroke to one of four angles (0, 45, 135, 180) and calculates the barycenter of each stroke. It then normalizes all coordinate values so that the relative location of each stroke is independent of the character size.
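The image processing steps above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the function names, the representation of a stroke as a list of (x, y) points, and the choice to scale by the larger side of the bounding box are all assumptions. The angle set is copied from the text as written.

```python
import math

# Angle set exactly as listed in the text; kept as a parameter.
ANGLES = (0, 45, 135, 180)

def quantize_angle(p, q, angles=ANGLES):
    """Snap the direction of segment p -> q to the nearest allowed angle (degrees)."""
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360
    if angle > 180:
        angle -= 180  # fold into the 0..180 range (assumption)
    return min(angles, key=lambda a: abs(a - angle))

def barycenter(points):
    """Mean position of the points of one stroke."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def normalize(strokes):
    """Map all coordinates into the unit square (hypothetical scheme).

    The bounding box of the whole character is shifted to the origin and
    scaled by its larger side, so the result does not depend on the size
    or position of the drawing.
    """
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    min_x, min_y = min(xs), min(ys)
    size = max(max(xs) - min_x, max(ys) - min_y) or 1
    return [[((x - min_x) / size, (y - min_y) / size) for x, y in s]
            for s in strokes]
```

With this scheme, the same character drawn at two different sizes yields identical normalized strokes, which is the property the text describes.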
In the analysis phase, the system compares the stroke information of the input character with the pattern information in the pattern information DB, which contains the information shown as follows:
After the comparisons, the 10 best-matching patterns are retained and output.
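The retention of the best matches can be illustrated with a short sketch. The text does not say how two sets of stroke information are compared, so the squared Euclidean distance used here is only a stand-in, and `top_matches` and the dictionary-shaped `pattern_db` are hypothetical names.

```python
import heapq

def top_matches(input_features, pattern_db, k=10):
    """Return the labels of the k patterns closest to the input.

    pattern_db maps a character label to a list of numbers describing
    its strokes (assumed layout). Squared Euclidean distance is an
    assumed similarity measure, not the system's documented one.
    """
    def distance(label):
        pattern = pattern_db[label]
        return sum((a - b) ** 2 for a, b in zip(input_features, pattern))

    # nsmallest returns the labels ordered from best to worst match.
    return heapq.nsmallest(k, pattern_db, key=distance)
```

The returned list is ordered from best to worst, so its first entry could serve as the recognition result and the rest as candidate characters.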
There are two independent modules in Off-line Recognition: Preprocessing and Recognition.
The preprocessing flow is shown as follows:
For example, the original input is the figure on the left. With our preprocessing algorithm, the system correctly identifies the character in the picture and also filters out the noise.
A more complicated case is a character formed from several components.
To handle this challenge, we use minimal contour and intersection detection. More details can be found in the Detailed Design Document (Chinese).
Feature Extraction: Extract a 128-dimensional feature vector from the image.
The first 48 features are obtained by grid ratio: the 30×40 normalized image is divided into grids of 5×5 pixels, and the ratio of black pixels in each grid is calculated. The 6×8 = 48 grids thus generate 48 ratio values.
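The grid ratio computation can be sketched as below. This is an illustration under stated assumptions: the image is taken to be a list of 40 rows of 30 binary pixels with 1 meaning black, and `grid_ratio_features` is a hypothetical name. Swapping width and height would still give 48 grids.

```python
def grid_ratio_features(image, cell=5):
    """Fraction of black pixels in each cell-by-cell grid, row by row.

    image is a list of rows, each a list of 0/1 values (1 = black).
    Its height and width are assumed to be multiples of `cell`.
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            black = sum(image[r][c]
                        for r in range(top, top + cell)
                        for c in range(left, left + cell))
            features.append(black / (cell * cell))
    return features
```

For a 30×40 image this returns 48 values between 0 and 1, one per grid.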
The next 80 features are obtained by surrounding feature computation: the 30×40 normalized image is divided into grids of 10×10 pixels. In the first round, the algorithm calculates the ratio of area changing from white to black in the first column, the first row, the last column and the last row, which together form a rectangle surrounding the edge of the image. In the second round, the algorithm does the same for the second column, the second row, the second-to-last column and the second-to-last row. In this way, the algorithm obtains another 80 features.
Classifier:
The current version of the system uses a 3-layer back-propagation artificial neural network. The input layer contains 128 input units, the hidden layer contains 20 hidden units, and the output layer contains 20 linear units.
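To make the layer sizes concrete, here is a minimal sketch of a forward pass through a 128-20-20 network. Only the layer sizes and the linear output units come from the text. The sigmoid hidden activation, the random initial weights, and the names `make_layer` and `forward` are assumptions, and back-propagation training itself is not shown.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """One weight row per unit; the last entry of each row is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def weighted_sum(weights, inputs):
    """Dot product of the inputs with the weights, plus the bias."""
    return sum(w * x for w, x in zip(weights[:-1], inputs)) + weights[-1]

def forward(features, hidden_layer, output_layer):
    """Return the 20 output scores for a 128-dimensional feature vector."""
    # Sigmoid in the hidden layer is an assumption; the text does not name it.
    hidden = [1.0 / (1.0 + math.exp(-weighted_sum(w, features)))
              for w in hidden_layer]
    # Output units are linear, as stated in the text.
    return [weighted_sum(w, hidden) for w in output_layer]

rng = random.Random(0)
hidden_layer = make_layer(128, 20, rng)
output_layer = make_layer(20, 20, rng)
scores = forward([0.0] * 128, hidden_layer, output_layer)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

Here `predicted` is the index of the highest-scoring output unit. One natural reading is that each of the 20 outputs corresponds to one of the 20 supported characters, though the text does not state this mapping.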
Operating System: Microsoft Windows NT/2000/XP
Running Environment: Microsoft .NET Framework 2.0 Runtime
Development Environment: Microsoft .NET Framework 2.0 SDK, Visual Studio 2005
Development Language: C#
Software download: ET OCR
Installation
Handwriting Samples: zi.rar