[IEEE 2010 International Conference on Frontiers in Handwriting Recognition (ICFHR) - Kolkata, India (2010.11.16-2010.11.18)] 2010 12th International Conference on Frontiers in Handwriting Recognition




Improving HMM-Based Chinese Handwriting Recognition Using Delta Features and Synthesized String Samples

Tong-Hua Su, Cheng-Lin Liu
National Laboratory of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences
95 Zhongguancun East Road, Beijing 100190, P.R. China

{thsu, liucl}@nlpr.ia.ac.cn

Abstract

The HMM-based segmentation-free strategy for Chinese handwriting recognition has the advantage of training without annotation of character boundaries. However, recognition performance has been limited by the small number of string samples. In this paper, we explore two techniques to improve performance. First, Delta features are added to the static ones to alleviate the conditional independence assumption of HMMs. We then investigate techniques for synthesizing string samples from isolated character images. We show that synthesizing linguistically natural string samples makes insufficient use of the isolated samples. Instead, we draw character samples without replacement and concatenate them into string images through between-character gaps. Our experimental results demonstrate that both Delta features and synthesized string samples significantly improve recognition performance. Combined with a bigram language model, recognition accuracy increases by 36∼38% compared to our previous system.

1. Introduction

The hidden Markov model (HMM) is a powerful tool that has been widely applied to sequence analysis tasks such as speech recognition and handwriting recognition. Equipped with efficient learning and inference algorithms such as the Baum-Welch and Viterbi algorithms, the HMM scales well to large data sets. HMM-based handwriting recognition has the merit that the model parameters can be trained with string samples without annotation of character boundaries.

We have applied HMMs to Chinese handwriting recognition to build a segmentation-free recognizer [1, 2, 3], where each character is modeled by a continuous-density HMM. Our previous practice suggests that besides modeling techniques, carefully deriving strong features [2] and reasonably alleviating data sparseness [3] are critical to recognition performance. However, due to the unavailability of a large data set of string samples, the recognition accuracy of our previous system has been limited to a low level.

In this paper, we improve the performance of HMM-based Chinese handwriting recognition by exploring Delta features and synthesized string samples. The Delta feature was first proposed in speech recognition [4] and is widely adopted there, but it has rarely been used in handwriting recognition. We use such features to express the local feature slope across neighboring windows, alleviating the conditional independence assumption of HMMs.

Considering the large number of Chinese character categories and the severe sparseness of handwritten string image data, we synthesize string images from existing isolated character samples for HMM training. Synthesizing string samples has been practiced in English word recognition. In [5], training samples were expanded by distorting realistic text lines. In [6], individual characters are concatenated using a ligature-joining algorithm for natural-shape handwriting. We investigate techniques for synthesizing Chinese string samples from isolated character samples; our emphasis is on how to effectively utilize these samples rather than on generating linguistically natural handwriting. In light of this principle, we approximate the distribution of between-character gaps and draw character samples without replacement. By combining Delta features with training on synthetic string samples, we achieve a large improvement of recognition accuracy in Chinese handwriting recognition.

The rest of this paper is organized as follows. Section 2 outlines the HMM-based Chinese handwriting recognition system; Section 3 introduces the Delta features; Section 4 describes the techniques of string sample synthesis; Section 5 presents the experimental results; and Section 6 provides concluding remarks.

2010 12th International Conference on Frontiers in Handwriting Recognition
978-0-7695-4221-8/10 $26.00 © 2010 IEEE
DOI 10.1109/ICFHR.2010.18

Figure 1. System architecture. (Training: text line → normalization → feature extraction → B-W algorithm → character HMMs, with an LM built from a text corpus. Testing: text line → normalization → feature extraction → Viterbi algorithm → character string, evaluated against ground truth.)

2. System Overview

The architecture of our HMM-based handwriting recognizer [3] is shown in Fig. 1. The input text line image is converted to a sequence of feature vectors (observations) O = o_1, ..., o_m by extracting features in a sliding window along the writing direction. Currently, o_i is a 64-dimensional vector of enhanced four-plane features (enFPF) [2]. To overcome the undulation of text lines, only the body zone (enclosing the topmost and bottommost foreground pixels), rather than the whole window, is used for feature extraction (cf. [1]).

Each character HMM has a left-to-right topology; the transitions between states intuitively reflect spatial variability. The state emission probability of o_i is expressed by a Gaussian mixture density function with diagonal covariance matrices. The character HMMs are estimated by the embedded Baum-Welch algorithm (B-W algorithm) on text line samples, and the language models are produced from a text corpus. No attempt is made to segment the text line samples into characters for training, but we have provided ground truth for the test strings to evaluate character segmentation and recognition performance.
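A left-to-right topology can be made concrete as a transition matrix in which each state either self-loops or advances to the next state. The sketch below is illustrative only (the state count and stay probability are assumptions, not values from the paper):

```python
import numpy as np

def left_to_right_transitions(n_states, p_stay=0.6):
    """Build a left-to-right HMM transition matrix: each state may
    stay (self-loop, modeling horizontal stretching of a character)
    or advance to the next state; backward jumps are forbidden."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        A[s, s] = p_stay          # self-loop
        A[s, s + 1] = 1 - p_stay  # advance
    A[-1, -1] = 1.0               # final state absorbs (exit handled externally)
    return A

A = left_to_right_transitions(4)
```

The zero lower triangle is what enforces the left-to-right constraint during Viterbi decoding.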

In testing, the character string class S = s_1, ..., s_n of O is identified according to the maximum a posteriori (MAP) probability P(S|O).

Recently, we have used three techniques to renew the above baseline recognizer. First, variance truncation (VT) is used to regularize the Gaussian mixture density: the variance of each dimension is truncated to the average variance if it is smaller than that value. Second, the Box-Cox transformation (BCT) is applied to the enFPF features to make their values more Gaussian-like, which benefits recognition; we empirically selected a power of 0.4. In addition, we normalize the height of each text line to reduce the within-class variability of character samples. The height of a text line is estimated from connected components by bounding consecutive M blocks and averaging the heights of the bounding boxes [7]; the text line image is then re-scaled to a standard height stdH.
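The two per-dimension operations above can be sketched directly; this is a minimal illustration, not the authors' implementation (only the 0.4 power comes from the paper):

```python
import numpy as np

BCT_POWER = 0.4  # power empirically selected in the paper

def box_cox(x, lam=BCT_POWER):
    """Box-Cox transform for positive feature values (lam != 0):
    (x**lam - 1) / lam, making skewed values more Gaussian-like."""
    x = np.asarray(x, dtype=float)
    return (np.power(x, lam) - 1.0) / lam

def truncate_variances(var):
    """Variance truncation (VT): any dimension whose variance falls
    below the average variance is raised to that average, preventing
    near-singular Gaussian components."""
    var = np.asarray(var, dtype=float)
    return np.maximum(var, var.mean())
```

For example, with variances [1, 2, 3] the average is 2, so the first dimension is raised to 2 while the others are unchanged.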

3. Delta Features

Delta features are used in most speech recognition systems for their ability to capture temporal dependence. They are the first-order linear regression coefficients derived from the static features:

\Delta o_t = \frac{\sum_{i=-n}^{n} i \cdot o_{t+i}}{\sum_{i=-n}^{n} i^2} = \frac{\sum_{i=1}^{n} i \cdot (o_{t+i} - o_{t-i})}{2 \sum_{i=1}^{n} i^2},   (1)

where o_t is the vector of static features. The Delta feature involves 2n+1 windows for regression. In handwriting images, it reflects the slope in each feature dimension. The derivative features used in [8, 9] can be viewed as special cases of Eq. (1). Delta features are dynamically derived and can express the dependence among consecutive observations, compensating HMMs for the loss resulting from the conditional independence assumption.
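Eq. (1) can be computed over a whole observation sequence as a sketch; edge padding (an assumption here, unspecified in the paper) gives every frame a full 2n+1 window:

```python
import numpy as np

def delta_features(O, n=8):
    """Delta (first-order regression) features per Eq. (1).
    O: (T, D) matrix of static features; frames are edge-padded so
    every frame has n neighbours on each side."""
    T, D = O.shape
    Opad = np.pad(O, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, n + 1))
    delta = np.zeros_like(O, dtype=float)
    for i in range(1, n + 1):
        # i * (o_{t+i} - o_{t-i}), vectorized over all t
        delta += i * (Opad[n + i:n + i + T] - Opad[n - i:n - i + T])
    return delta / denom
```

For a feature that increases linearly by 1 per frame, the interior Delta values equal exactly 1, the local slope, which matches the "slope in each feature dimension" interpretation above.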

4. Synthesizing String Samples

Large databases of isolated Chinese character samples exist for training classifiers, but the number of text line images is very small. Isolated character samples can be used directly to train character HMMs, but the collocation between characters cannot be modeled by training with isolated characters. To alleviate the sparseness of text line data, we synthesize string image samples from isolated character images.

String samples are generated by concatenating existing isolated character samples of the same image resolution. Given K selected character samples, the task is to concatenate them into a string image in a way that utilizes the isolated samples effectively. To synthesize a text line, the following problems must be solved: how to draw a character class, how to select a sample for a given class, and how to join a sequence of character samples together. Since the samples in each class are usually assumed to be of equal importance, we draw samples uniformly given the character class.

79

Page 3: [IEEE 2010 International Conference on Frontiers in Handwriting Recognition (ICFHR) - Kolkata, India (2010.11.16-2010.11.18)] 2010 12th International Conference on Frontiers in Handwriting

Figure 2. Accumulative gap histogram (accumulated probability versus gap in pixels, over roughly −20 to 80 pixels).

Character samples are concatenated through between-character gaps. The gap between two samples is randomly generated from a gap histogram. To approximate the gaps, we first mark the left and right boundaries of each character in the training text lines. The difference between the left position of the current character and the right position of the previous one is then collected to derive a gap histogram. If characters overlap, the gap may be negative. The accumulative gap histogram is illustrated in Fig. 2. Binary search is used to locate the gap corresponding to a random number.
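The binary-search lookup described above is inverse-CDF sampling over the accumulated histogram. A minimal sketch follows; the bin edges and cumulative probabilities are illustrative placeholders, not values measured from the HIT-MW training lines:

```python
import bisect
import random

# hypothetical accumulated gap histogram: bin values (pixels) and
# their cumulative probabilities (must end at 1.0)
GAP_BINS = [-20, -10, 0, 10, 20, 40, 80]
CUM_PROB = [0.02, 0.08, 0.30, 0.70, 0.90, 0.98, 1.00]

def sample_gap(u=None, rng=random):
    """Draw a gap by inverse-CDF sampling: generate u in [0, 1) and
    binary-search for the first bin whose cumulative probability
    exceeds u. Negative gaps model overlapping characters."""
    if u is None:
        u = rng.random()
    idx = bisect.bisect_right(CUM_PROB, u)
    return GAP_BINS[min(idx, len(GAP_BINS) - 1)]
```

Because the search runs over the cumulative distribution, each bin is drawn with probability equal to its histogram mass.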

Algorithm 1 gives the skeleton for producing a text line. Steps 2∼6 repeatedly draw samples. All selected class names and samples in the current text line are temporarily recorded. Steps 4 and 5 express a stratified sampling paradigm: we first draw a class from a lexicon, then dispatch a sample instance to that class. Each class may carry a weight indicating its probability of being sampled in Step 4; we consider uniform (zero-gram), unigram and bigram statistics in Sect. 5, respectively. Steps 7 and 8 normalize the current text line to a standard height. Step 9 concatenates the sample sequence with random gaps; the x_{kj}'s are aligned on the vertical middle line. The last two steps produce feature observations and the corresponding ground truth. Algorithm 1 can be called as many times as desired, or until no isolated character samples remain.

Sampling can be done with or without replacement. In our system, isolated character samples are drawn without replacement: this deliberately avoids choosing any sample more than once, a class whose samples are exhausted is never picked again, and the process stops when the samples of all classes are exhausted. In contrast, when random sampling is done with replacement, even in a Bootstrap manner, only about 63.21% of the samples are expected to be picked.
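The 63.21% figure is the Bootstrap coverage limit 1 − 1/e ≈ 0.6321: drawing n times with replacement from n items touches that fraction of distinct items in expectation. A quick simulation (illustrative, not from the paper) confirms it:

```python
import random

def coverage_with_replacement(n, trials=200, seed=0):
    """Average fraction of distinct items seen when drawing n times
    with replacement from n items; expected value is 1 - 1/e."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        seen = {rng.randrange(n) for _ in range(n)}
        total += len(seen) / n
    return total / trials

frac = coverage_with_replacement(1000)
```

Sampling without replacement, by contrast, eventually uses every isolated sample exactly once.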

Algorithm 1 Synthesizing a text line
Input:
  {(L_i, X_i)}: character samples;
  H_c: distribution of the length of strings;
  H_g: distribution of the gaps;
  stdH: standard height;
  K: the number of characters in this line;
Output:
  f: feature matrix of the string sample;
  l: ground truth of the string sample;
1: k ← 0;
2: while (k ≤ K) do
3:   k ← k + 1;
4:   Draw a character class L_k and save it;
5:   Draw a sample x_{kj} (∈ X_k) and save it;
6: end while
7: Estimate the height of the text line estH based on the maximum height of each consecutive M samples;
8: Scale each x_{kj} by a factor of stdH/estH;
9: Generate K − 1 gaps, through which the x_{kj}'s are linked together;
10: Concatenate the L_k's as l;
11: Extract enFPF and Delta features, f;
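The drawing loop of Algorithm 1 can be sketched as follows. The data structures and names here are hypothetical (images as dicts with a height field, a caller-supplied gap sampler), and Step 7's height estimate is simplified to a single maximum rather than the paper's average over consecutive M samples:

```python
import random

def synthesize_text_line(samples, K, std_h, draw_gap, rng=None):
    """Sketch of Algorithm 1's sampling stage. `samples` maps a class
    label to its remaining isolated images; classes are drawn uniformly
    and samples are removed once used (without replacement), so an
    exhausted class is never picked again."""
    rng = rng or random.Random()
    labels, images = [], []
    available = [c for c, xs in samples.items() if xs]
    for _ in range(K):
        if not available:                   # all classes exhausted
            break
        c = rng.choice(available)           # Step 4: draw a class
        xs = samples[c]
        x = xs.pop(rng.randrange(len(xs)))  # Step 5: draw a sample
        if not xs:
            available.remove(c)
        labels.append(c)
        images.append(x)
    # Steps 7-8: scale toward the standard height
    # (simplified: the paper averages the max height of consecutive M samples)
    est_h = max(img["h"] for img in images)
    scale = std_h / est_h
    # Step 9: K-1 gaps link consecutive samples
    gaps = [draw_gap() for _ in range(len(images) - 1)]
    return "".join(labels), scale, gaps
```

Feature extraction (Steps 10-11) would then run on the assembled image, exactly as for a real text line.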

5. Experiments

The proposed methods are evaluated on a test set from the HIT-MW database [10]. Six Gaussian components are used for the state emission probability. The setup is the same as in [2] and [3] unless stated otherwise.

The database and the criteria for performance evaluation are presented in Section 5.1, and the renewed baseline system is then evaluated. In Sections 5.3 and 5.4, experiments on Delta features and on the string sample synthesis algorithm are conducted, respectively.

5.1. Database and evaluation criteria

The HIT-MW database can be seen as a representative subset of real Chinese handwriting [10]. It includes 853 legible Chinese handwriting samples from more than 780 participants, collected with a systematic philosophy. Portions of the text lines are partitioned into a training set (953 text lines), a validation set (189 text lines), and a test set (383 text lines) by random sampling, reproducing a realistic scenario where the handwritten text lines in the test set have not been seen before. Since Chinese character recognition involves a large number of categories, many classes have few samples in these realistic string samples; we therefore also synthesize text lines to alleviate the severe data sparseness (cf. [3]). This paper utilizes 100 sets of isolated



character samples in the CASIA database [11].

The output of the recognizer is compared with the reference transcription, and two metrics, the correct rate (CR) and accurate rate (AR), are calculated to evaluate the results. Given the numbers of substitution errors (S_e), deletion errors (D_e), and insertion errors (I_e), CR and AR are defined as

CR = (N_t - D_e - S_e) / N_t,
AR = (N_t - D_e - S_e - I_e) / N_t,   (2)

where N_t is the total number of characters in the reference transcription.
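Eq. (2) reduces to two divisions once the error counts are known; a minimal helper (names are our own) makes the difference between the metrics explicit:

```python
def cr_ar(n_total, subs, dels, ins):
    """Correct rate and accurate rate per Eq. (2): CR ignores
    insertions, while AR additionally penalizes them, so AR <= CR."""
    cr = (n_total - dels - subs) / n_total
    ar = (n_total - dels - subs - ins) / n_total
    return cr, ar

# e.g. 100 reference characters with 10 substitutions, 5 deletions, 3 insertions
cr, ar = cr_ar(100, 10, 5, 3)
```

The error counts themselves would come from an edit-distance alignment of the recognizer output against the reference transcription.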

5.2. Evaluation of renewed baseline

The techniques used to renew the baseline are evaluated in Table 1. As seen from the table, VT increases CR/AR by 8∼10%, BCT yields 1.1∼1.7% improvement, and HN (M = 6, stdH = 80 pixels) adds about 1%. In total, the renewed baseline yields at least 11% improvement over the previous results. Note that the results of [2] presented here differ slightly from the original paper, since the training iterations are fixed in this paper instead of being tuned on the validation set.

5.3. Evaluation of Delta features

We append (first-order) Delta features for demonstration purposes; the static features are enFPF. The parameter n in Eq. (1) is set to 8 through validation. Compared with the baseline, about 3% improvement is observed (see Table 2). Due to space limitations, we merely outline the further investigations: 1) Delta features help classes with many samples, without loss for those with few samples; 2) Delta features (first two orders) show advantages over the derivative features of [8, 9] when dimensionality reduction is carried out.

5.4. Evaluation of synthetic string samples

We evaluate the efficacy of synthetic string samples through recognition results. Before retraining the HMMs for three passes, we append the synthetic string samples and halve the variance threshold used to truncate each Gaussian component. The recognition results are given in Table 3 and Fig. 3; a recognition example is shown in Fig. 4. First, the effect of a bigram language model is considered; the bigram statistics are derived from the People's Daily corpus of 79,509,778 characters. Second, the sampling methods, with and without replacement, are evaluated separately; sampling with replacement draws the same number of character samples as sampling without replacement. Third, the process of drawing samples is guided by three kinds of linguistic context: the stronger the linguistic context, the more linguistically natural the synthetic text line looks. In addition, results produced by character training are also provided.

From Table 3 and Fig. 3, we highlight three points. First, sampling without replacement performs better. With replacement, if the underlying text is generated by bigram statistics, the synthetic image looks linguistically natural; however, the HMM parameters can hardly be learned robustly, since samples of some classes may be chosen multiple times while portions of the samples are never accessed (cf. Fig. 3(a) and Fig. 3(b)).

Second, using synthetic text lines has a clear advantage over character training for learning HMM parameters when recognition is coupled with a bigram language model. The most fundamental distinction between text line training and character training is that adjacent characters in a text line compete with each other for the parameter estimation of their classes. Such competition benefits postprocessing with a bigram language model.

Third, under the without-replacement strategy, the uniform, unigram and bigram-guided sampling techniques perform comparably well. Though sampling can be guided by linguistic statistics, the underlying text of the synthetic images is biased heavily toward a uniform distribution, since the same sets of isolated characters are used. Further inspection (cf. Fig. 3(c) and Fig. 3(d)) shows that the uniform-guided technique performs slightly better when a bigram language model is used in recognition, while the unigram and bigram-guided techniques are slightly better without it. With the uniform-guided technique, a character class can follow any other class, so each character HMM has an equal chance to compete with the others. In contrast, with the unigram or bigram-guided techniques, high-frequency character classes are picked with large probability and their samples are exhausted at an early stage; as a consequence, the competition between high-frequency and low-frequency classes is much more limited.

What about fixed between-character gaps instead of histogram-based ones? We evaluate two kinds of fixed gaps: 1 pixel and one fifth of stdH (16 pixels), with character sampling done uniformly without replacement. The average recognition rates are 70.77%/65.22% and 73.01%/69.51% (CR/AR), respectively; the histogram gap yields at least a 0.88% improvement. From Fig. 3(c) and Fig. 3(d), synthesizing string samples with fixed gaps shares some properties with character training.

A few related works exist in offline recognition of Chinese handwriting. Strictly, they are not comparable, since they are not trained on the same data, or



Table 1. Construction of the renewed baseline, incrementally (%). Variance truncation (VT), Box-Cox transformation (BCT) and height normalization (HN) are applied to the previous system.

                        Digit          Punctuation    Chinese character  Average
                        CR     AR      CR     AR      CR     AR          CR     AR
ICPR2008                55.65  45.65   44.19  39.77   36.33  32.43       37.59  33.48
VT                      52.17  45.65    6.69   5.18   49.98  46.80       45.95  42.83
BCT                     62.61  52.17   21.47  18.43   49.99  46.51       47.62  44.00
HN (Renewed baseline)   67.83  53.48   27.53  25.25   50.73  46.62       48.98  44.77

Table 2. Evaluation of Delta features (%).

                        Digit          Punctuation    Chinese character  Average
                        CR     AR      CR     AR      CR     AR          CR     AR
HN (Renewed baseline)   67.83  53.48   27.53  25.25   50.73  46.62       48.98  44.77
Adding Delta features   59.13  38.26   43.18  40.40   52.90  48.70       52.11  47.60

they are developed from different motivations, and they are even tested on different test sets. As possible references for readers, we list them below. In [8], the BBN recognition system based on HMMs is presented, where the best AR is less than 38% on their in-house Chinese handwriting databases. In [12], a segmentation-recognition-integrated strategy is presented; it recognizes 58% of the characters without language models, and the best recognition rate is about 78% after incorporating the scores from a restricted Gaussian classifier and language models.

6. Conclusion

Two techniques are presented to improve our Chinese handwriting recognition system. First, Delta features and the static enFPF features are combined to compensate for the conditional independence assumption in HMM modeling. We further investigate techniques to synthesize string samples to alleviate data sparseness: character samples are drawn without replacement, and a histogram of between-character gaps is used to join them. The recognition performance on Chinese handwriting is significantly improved after integrating these strategies.

Acknowledgements

This work is supported by the Key Program of the National Natural Science Foundation of China (No. 60933010) and the China Postdoctoral Science Foundation (No. 20090450623).

References

[1] T.-H. Su, T.-W. Zhang, H.-J. Huang, Y. Zhou. HMM-based recognizer with segmentation-free strategy for unconstrained Chinese handwritten text. In: 9th ICDAR, 2007: 133-137.

[2] T.-H. Su, T.-W. Zhang, H.-J. Huang, C.-L. Liu. Segmentation-free recognizer based on enhanced four plane feature for realistic Chinese handwriting. In: 19th ICPR, 2008.

[3] T.-H. Su, T.-W. Zhang, H.-J. Huang, D.-J. Guan. Off-line recognition of realistic Chinese handwriting using segmentation-free strategy. Pattern Recognition, 2009, 42(1): 167-182.

[4] S. Furui. Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. ASSP, 1986, 34(1): 52-59.

[5] T. Varga. Off-line Cursive Handwriting Recognition Using Synthetic Training Data. PhD Thesis, University of Bern, 2006.

[6] A. O. Thomas, A. I. Rusu, V. Govindaraju. Synthetic handwritten CAPTCHAs. Pattern Recognition, 2009, 42(12): 3365-3373.

[7] C.-L. Liu, M. Koga, H. Fujisawa. Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading. IEEE Trans. PAMI, 2002, 24(11): 1425-1437.

[8] P. Natarajan, S. Saleem, R. Prasad, E. MacRostie, K. Subramanian. Multi-lingual offline handwriting recognition using hidden Markov models: a script-independent approach. Lecture Notes in Computer Science, 2008, 4768: 231-250.

[9] Y. Ge, Q. Huo. A comparative study of several modeling approaches for large vocabulary offline recognition of handwritten Chinese characters. In: 16th ICPR, 2002.

[10] T.-H. Su, T.-W. Zhang, D.-J. Guan. Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text. IJDAR, 2007, 10: 27-38.

[11] R. Dai, C. Liu, B. Xiao. Chinese character recognition: history, status and prospects. Frontiers of Computer Science in China, 2007, 1: 126-136.

[12] Q.-F. Wang, F. Yin, C.-L. Liu. Integrating language model in handwritten Chinese text recognition. In: 10th ICDAR, 2009: 1036-1040.



Figure 3. Chinese character CR over different sample sizes: (a) sample drawing with replacement & recognition without bigram; (b) sample drawing with replacement & recognition with bigram; (c) sample drawing without replacement & recognition without bigram; (d) sample drawing without replacement & recognition with bigram. (Each panel plots average CR (%) against the sample size of the character class in the realistic training set, for Uniform, Unigram, Bigram and Char-training curves; panels (c) and (d) also include Marg=16pix and Marg=1pix curves.)

Table 3. Evaluation of synthesis methods (%).

Recognition without bigram language model
Draw methods             Digit          Punctuation    Chinese character  Average
                         CR     AR      CR     AR      CR     AR          CR     AR
Uniform & with rep       59.13  50.00   41.92  39.27   61.51  58.43       59.55  56.36
Unigram & with rep       60.00  47.83   43.56  40.15   61.29  57.92       59.54  55.93
Bigram & with rep        58.26  50.43   37.50  35.86   61.16  58.58       58.81  56.18
Uniform & without rep    56.09  47.39   41.79  40.03   63.97  61.06       61.58  58.67
Unigram & without rep    57.39  48.70   41.29  39.39   64.10  61.45       61.73  58.99
Bigram & without rep     55.22  46.96   39.02  37.37   64.18  61.52       61.52  58.81
Char training            62.17  47.39   47.85  41.41   63.54  60.38       61.97  58.20

Recognition with bigram language model
Uniform & with rep       70.44  65.22   59.97  49.49   72.31  69.62       71.04  67.56
Unigram & with rep       72.61  69.13   57.95  48.48   72.73  70.29       71.27  68.16
Bigram & with rep        70.00  67.39   57.20  49.24   72.90  70.42       71.29  68.29
Uniform & without rep    71.74  66.52   60.99  51.77   75.41  73.05       73.89  70.81
Unigram & without rep    70.87  65.65   60.98  52.02   75.20  72.89       73.68  70.67
Bigram & without rep     71.74  66.52   60.35  50.88   75.12  72.78       73.57  70.49
Char training            70.43  64.35   62.37  41.29   68.80  65.93       68.18  63.52

Figure 4. Recognition results coupling a bigram: (a) isolated character training; (b) synthetic text line training (uniform sampling, without replacement). The correct outputs are underlined.
