m.j. castro-bleda, joan pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... ·...

18
Handwritten Text Recognition M.J. Castro-Bleda, Joan Pastor Universidad Polit´ ecnica de Valencia Spain Roma, May 2013 Text recognition TRABHCI Roma, May 2013 1 / 32

Upload: others

Post on 21-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Handwritten Text Recognition

M.J. Castro-Bleda, Joan Pastor

Universidad Politecnica de ValenciaSpain

Roma, May 2013

Text recognition TRABHCI Roma, May 2013 1 / 32

Page 2: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

The problem: Handwriting recognition

⇤ Handwriting recognition: o✏ine and online handwriting recognition.An o✏ine handwriting recognitionsystem extracts the informationfrom previously scanned text im-ages

... whereas online systems re-ceive information captured whilethe text is being written (stylusand sensitive tablets).

O✏ine systems are applicable toa wider range of tasks, given thatonline recognition require the dataacquisition to be made with spe-cific equipment.

Online systems are more reliabledue to the additional informationavailable, such as the order, direc-tion and velocity of the strokes.

Text recognition TRABHCI Roma, May 2013 2 / 32

Page 3: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

The problem: Handwriting recognition

⇤ Recognition performance of current automatic o✏ine handwritingtranscription systems: far from being perfect.! Growing interest in assisted transcription systems, which are moree�cient than correcting by hand an automatic transcription.

⇤ A recent approach to interactive transcription involves multi-modalrecognition, where the user can supply an online transcription of some ofthe words: state system.

⇤ Bimodal recognition.

Text recognition TRABHCI Roma, May 2013 3 / 32

Page 4: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

The state system

http://state.dlsi.uji.es/state/Home.htmlText recognition TRABHCI Roma, May 2013 4 / 32

Page 5: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

The state system

http://state.dlsi.uji.es/state/Home.htmlText recognition TRABHCI Roma, May 2013 5 / 32

Page 6: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Part I: O✏ine Printed text: Nearest Neighbors

Our Nearest Neighbor back-end works in a way reminiscent of thetwo-level architecture used in several speech recognizers. It can be globallydescribed as a sequence of two steps: an initial classification of promisingsegments and a search for the optimum sequence using those segments.The initial classification can also be split in two phases. The first onecreates a heuristic over-segmentation of the line by analyzing the gray levelof the columns and searching for local extrema, which define plausiblesegment marks. The second phase searches for the best glyphcorresponding to each segment of an appropriate length. First, thesegment is processed to eliminate character overlaps and to correctbaseline deviation and then its Nearest Neighbor among the trainingsamples is found.Once each segment is so classified, the optimal transcription is found usinga dynamic programming two level algorithm. Finally, white spaces areadded to the transcription.

Text recognition TRABHCI Roma, May 2013 6 / 32

Page 7: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Part II: Online Handwritten Character Recognition

⇤ STATE accepts pen input.⇤ From our point of view, pen input comes as glyphs:

A glyph is a sequence of strokes representing a character.

A stroke is a sequence of points.

A point is represented by its coordinates: (x , y).

⇤ The problem: given

a set of system prototypes (labelled glyphs) and

an unlabelled glyph,

the new glyph must be classified (labelled).

Text recognition TRABHCI Roma, May 2013 7 / 32

Page 8: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Parametrization

0 250 500 750 1000 1250

250

500

750

1000

1250

1500

1750

2000

È È È

Original Normalized Resampled

È È

0

02

0

00

00

0

00

1

00

01

0

00

1

00

10

1

00

0

00

00

1

00

0

00

00

0

00

0

00

00

0

01

1

00

11

0

01

0

00

00

0

11

0

00

00

Point/angle Region/direction pairs Histogram

Text recognition TRABHCI Roma, May 2013 10 / 32

Page 9: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Baseline system+MLP

⇤ Classification error on TS and running time (system prototypes whenneeded È TR + VA):

UJIpenchars2 Pendigits

Error Time Error Timerate (ms/glyph) rate (ms/glyph)

Microsoft Tablet PC SDK 1.7 engine 8.5% 0.6 1.89% 0.5Baseline VIP engine 8.5% 23.6 0.60% 32.4MLP guesses the most probable class 14.2% 1.0 3.63% 0.8MLP screener + VIP engine 8.2% 12.4 0.80% 14.8

Text recognition TRABHCI Roma, May 2013 11 / 32

Page 10: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Part III: O✏ine Handwritten Recognition

⇤ A preprocessed text line image can be considered a sequence of featurevectors to be generated by a statistical model, as is done in SpeechRecognition:

S = argmaxS2⌦?

p(S |X ) = argmaxS2⌦?

p(X |S)p(S) .

⇤ This work proposes a handwriting recognition system based on

• MLPs for preprocessing

• hybrid HMM/ANN models, to perform opticalcharacter modeling

• statistical or connectionist n-gram language models:words or characters

Text recognition TRABHCI Roma, May 2013 12 / 32

Page 11: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Preprocessing: Image cleaning

⇤ MLP to enhance and clean images

Text recognition TRABHCI Roma, May 2013 13 / 32

Page 12: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Preprocessing

⇤ Slope and slant removal, and size normalization

Original

Cleaned

Contour

Lower baseline

Text recognition TRABHCI Roma, May 2013 14 / 32

Page 13: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Desloped

Desloped and deslanted

Reference lines

Size normalization

Text recognition TRABHCI Roma, May 2013 15 / 32

Page 14: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Preprocessing

⇤ Feature extraction

Final image

Feature extraction

Frames with 60 features

� grid of 20 square cells

� horizontal and vertical derivatives

Text recognition TRABHCI Roma, May 2013 16 / 32

Page 15: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Optical models

⇤ Hybrid HMM/ANN models: emission probabilities estimated by ANNs• A MLP estimates p(q|x) for every

state q given the frame x . Emissionprobability p(x |q) computed withBayes’ theorem.

• Trained with EM algorithm: MLPbackpropagation and forced Viterbialignment of lines are alternated.

• Advantages:

each class trained with alltraining samplesnot necessary to assume an apriori distribution for the datalower computational costcompared to Gaussian mixtures

• 7-state HMM/ANN using a MLPwith two hidden layers of sizes 192and 128

Text recognition TRABHCI Roma, May 2013 17 / 32

Page 16: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Corpora for optical modeling

⇤ Lines from the IAM Handwriting Database version 3.0

657 di↵erent writers

a subset of 6,161 training, 920 validation and 2,781 test lines

87,967 instances of 11,320 distinct words (training, validation, andtest sets)

Text recognition TRABHCI Roma, May 2013 18 / 32

Page 17: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Corpora for language modeling

⇤ Three di↵erent text corpora: LOB, Brown and Wellington

Corpora Lines Words CharsLOB + IAM Training 174K 2.3M 11MBrown 114K 1.1M 12MWellington 114K 1.1M 11MTotal 402K 4.5M 34M

Text recognition TRABHCI Roma, May 2013 19 / 32

Page 18: M.J. Castro-Bleda, Joan Pastor - unizar.esdihana.cps.unizar.es/~eduardo/trabhci/doc/2013/... · M.J. Castro-Bleda, Joan Pastor Universidad Polit´ecnica de Valencia Spain Roma, May

Testing the system

⇤ Error Rate of the HMMs and the hybrid HMM/ANN models on the testset. Language models estimated with the three corpora and an opendictionary are used.

Results of Test (%)Best model WER CER

8-state HMMs 38.8 ±1.0 18.6 ±0.67-state HMMs, MLP 192-128 22.4 ±0.8 9.8 ±0.4

Text recognition TRABHCI Roma, May 2013 20 / 32