OPTICAL CHARACTER RECOGNITION USING HIDDEN MARKOV MODELS
Jan Rupnik
OUTLINE
- HMMs: model parameters, left-right models, problems
- OCR: idea, symbolic example, training, prediction, experiments
HMM
Discrete Markov model: a probabilistic finite state machine.

Random process: a memoryless random walk on a graph whose nodes are called states.

Parameters:
- Set of states S = {S1, ..., Sn} that form the nodes. Let qt denote the state the system is in at time t.
- Transition probabilities between states, which form the edges: aij = P(qt = Sj | qt-1 = Si), 1 ≤ i, j ≤ n
- Initial state probabilities: πi = P(q1 = Si), 1 ≤ i ≤ n
MARKOV PROCESS
[Diagram: a six-state Markov chain S1, ..., S6 with initial probabilities π1, ..., π6 and transition edges a12, a21, a23, a26, a14, a41, a34, a35, a61, a65]
MARKOV PROCESS
[Diagram: the six-state Markov chain, with S2 highlighted]

Pick q1 according to the distribution π.
q1 = S2
MARKOV PROCESS
[Diagram: the six-state Markov chain, with S3 highlighted]

Move to a new state according to the transition probabilities aij.
q2 = S3
MARKOV PROCESS
[Diagram: the six-state Markov chain, with S5 highlighted]

Move to a new state according to the transition probabilities aij.
q3 = S5
HIDDEN MARKOV PROCESS

A hidden Markov process is a partially observable random process.
- Hidden part: the Markov process qt.
- Observable part: a sequence of random variables vt over a common domain, where, conditioned on qi, the distribution of vi is independent of every other variable, for all i = 1, 2, ...

Parameters:
- Markov process parameters πi and aij, which govern the generation of qt.
- Observation symbols V = {V1, ..., Vm}. Let vt denote the observation symbol emitted at time t.
- Observation emission probabilities bi(k) = P(vt = Vk | qt = Si).

Each state defines its own distribution over observation symbols.
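The generative procedure the next slides walk through (pick q1 from π, then repeatedly transition and emit) can be sketched in code. This is a generic illustration, not part of the original presentation; the helper `draw` samples an index from a discrete distribution.

```python
import random

def sample_hmm(pi, A, B, T, seed=0):
    """Generate (states, observations) of length T from an HMM.

    pi[i]   -- initial probability of state i
    A[i][j] -- transition probability from state i to state j
    B[i][k] -- probability that state i emits symbol k
    """
    rng = random.Random(seed)

    def draw(dist):
        # sample an index from a discrete distribution
        r, acc = rng.random(), 0.0
        for i, p in enumerate(dist):
            acc += p
            if r < acc:
                return i
        return len(dist) - 1

    q = draw(pi)                      # pick q1 according to pi
    states, obs = [q], [draw(B[q])]   # emit v1 according to b_{q1}
    for _ in range(T - 1):
        q = draw(A[q])                # transition according to row q of A
        states.append(q)
        obs.append(draw(B[q]))        # emit according to b_q
    return states, obs
```

With a deterministic toy model (π = [1, 0], A swapping the two states, each state emitting its own symbol) the walk alternates 0, 1, 0, 1, ... regardless of the seed.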
HIDDEN MARKOV PROCESS
[Diagram: the six-state chain where each state Si additionally emits a symbol from {V1, V2, V3} with probabilities bi(1), bi(2), bi(3)]
HIDDEN MARKOV PROCESS
[Diagram: the hidden Markov process, with S3 highlighted]

Pick q1 according to the distribution π.
q1 = S3
HIDDEN MARKOV PROCESS

[Diagram: the hidden Markov process, with S3 highlighted and emitting V1]

Pick v1 according to the distribution b3.
v1 = V1
HIDDEN MARKOV PROCESS
[Diagram: the hidden Markov process, with S4 highlighted]

Pick q2 according to the distribution aij.
q2 = S4
HIDDEN MARKOV PROCESS

[Diagram: the hidden Markov process, with S4 highlighted and emitting V3]

Pick v2 according to the distribution b4.
v2 = V3
LEFT-RIGHT HMM

Sparse and easy to train. Parameters:
- π1 = 1, πi = 0 for i > 1
- aij = 0 if i > j

Left-right HMM example: S1 → S2 → ... → Sn
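The parameter structure of a left-right model can be made concrete with a small sketch. The 0.5 self-loop / 0.5 advance split is an illustrative assumption; the slides only fix which entries are zero.

```python
def left_right_params(n, self_loop=0.5):
    """Build (pi, A) for a left-right HMM: always start in S1, and from
    each state either stay (self_loop) or advance to the next state.
    The last state simply loops on itself here."""
    pi = [1.0] + [0.0] * (n - 1)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i] = self_loop          # stay in the same state
        A[i][i + 1] = 1.0 - self_loop  # advance to the next state
    A[n - 1][n - 1] = 1.0
    return pi, A
```

Note that every entry below the diagonal is zero, which is exactly the sparsity that makes these models easy to train.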
PROBLEMS
Problem 1: given a sequence of observations v1, ..., vT, find the sequence of hidden states q1, ..., qT that most likely generated it.
Solution: the Viterbi algorithm (dynamic programming, complexity O(Tn²)).

Problem 2: how do we determine the model parameters πi, aij, bi(k)?
Solution: the Expectation Maximization (EM) algorithm, which finds a local maximum of the likelihood, starting from a set of initial parameters πi^0, aij^0, bi(k)^0.
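A minimal Viterbi implementation along the lines of the O(Tn²) dynamic program mentioned above; log-probabilities are used to avoid numerical underflow. The example model in the test is a standard textbook toy, not the word model from these slides.

```python
import math

def viterbi(pi, A, B, obs):
    """Most likely state path for an observation sequence obs.

    pi, A, B as usual (initial, transition, emission probabilities);
    obs is a list of symbol indices. Dynamic programming, O(T * n^2).
    """
    n, T = len(pi), len(obs)

    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[i]: log-probability of the best path ending in state i
    delta = [lg(pi[i]) + lg(B[i][obs[0]]) for i in range(n)]
    back = []  # backpointers, one list per time step t = 2..T
    for t in range(1, T):
        ptr, nxt = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + lg(A[i][j]))
            ptr.append(best_i)
            nxt.append(delta[best_i] + lg(A[best_i][j]) + lg(B[j][obs[t]]))
        back.append(ptr)
        delta = nxt
    # recover the best path by walking the backpointers
    q = max(range(n), key=lambda i: delta[i])
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    return path[::-1]
```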
OPTICAL CHARACTER RECOGNITION
Input for training of the OCR system: pairs of word images and their textual strings (e.g. “THE”, “revenue”, “of”, ...).

Input for the recognition process: a word image (“?”).
IDEA
The generation of character images is modelled as the generation of sequences of thin vertical images (segments).
Build a HMM for each character separately (model the generation of images of character ‘a’, ‘b’,... )
Merge the character HMMs into the word HMM (model the generation of sequences of characters and their images)
Given a new image of a word, use the word HMM to predict the most likely sequence of characters that generated the image.
SYMBOLIC EXAMPLE
Example of word images generation for a two character alphabet {‘n’, ‘u’}.
- Set S = {Su1, ..., Su5, Sn1, ..., Sn5}
- Set V = {V1, V2, V3, V4}
- Assign to each Vi a thin vertical image.
- The word model (for words like ‘unnunuuu’) is constructed by joining two left-right character models.
WORD MODEL

Word model state transition architecture:

[Diagram: the character ‘n’ model (Sn1 → Sn2 → Sn3 → Sn4 → Sn5, with self-loops ASn1Sn1, ASn2Sn2, ... and forward transitions ASn1Sn2, ASn2Sn3, ...) and the character ‘u’ model (Su1 → ... → Su5), joined by the character-transition edges Ann := ASn5Sn1, Auu := ASu5Su1, Aun := ASu5Sn1, Anu := ASn5Su1]
WORD MODEL

Transition matrix A (rows and columns Sn1, ..., Sn5, Su1, ..., Su5; only nonzero entries):
- each of Sn1-Sn4 and Su1-Su4: 0.5 on the self-loop, 0.5 to the next state
- Sn5 and Su5: 0.33 each on the self-loop and on the transitions to the two character-start states Sn1 and Su1

Emission matrix B over V1, ..., V4 (nonzero entries per state): Sn1: 1; Sn2: 0.05, 0.95; Sn3: 1; Sn4: 0.05, 0.95; Sn5: 1; Su1: 1; Su2: 0.05, 0.95; Su3: 1; Su4: 0.05, 0.95; Su5: 1.
V1 V4 V4 V2 V2 V4 V4 V1
A sequence of observation symbols that corresponds to an “image” of a word.
EXAMPLE OF IMAGES OF GENERATED WORDS
Example: word 'nunnnuuun'
Example: word ‘uuuunununu’
RECOGNITION

Find best matching patterns:

V1 V4 V4 V2 V4 V1 V1 V1 V4 V2 V2 V2 V4 V4 V1 V1 V1 V4 V2 V4 V4 V4 V4 V4 V4 V1 V1

Viterbi (1-1 correspondence between observations and states):

Sn1 Sn2 Sn2 Sn3 Sn4 Sn5 Sn1 Sn1 Sn2 Sn3 Sn3 Sn3 Sn4 Sn4 Sn5 ...
... Sn1 Sn1 Sn2 Sn3 Sn4 Sn4 Sn4 Sn4 Sn4 Sn4 Sn5 Sn1 Sn1

Predict: ‘nnn’

Look at transitions of the type Sx5 → Sy1 to find the transitions from character to character.
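This last step, scanning the decoded state path for Sx5 → Sy1 transitions, could look like the sketch below. Representing each state as a (character, index) pair is a choice made here for illustration only.

```python
def states_to_word(path, n_states=5):
    """Collapse a decoded state path into a string of characters.

    path is a list of (char, index) pairs, e.g. ('n', 3) for Sn3.
    A new character starts at the beginning of the path and at every
    transition from an index-n_states state to an index-1 state.
    """
    word = [path[0][0]]
    for prev, cur in zip(path, path[1:]):
        if prev[1] == n_states and cur[1] == 1:
            word.append(cur[0])
    return "".join(word)
```

For example, a path that runs through the ‘n’ model once and the ‘u’ model once collapses to "nu".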
FEATURE EXTRACTION, CLUSTERING, DISCRETIZATION
Discretization: if we have a set of basic patterns (thin images of the observation symbols), we can transform any sequence of thin images into a sequence of symbols (previous slide) – the input for our HMM.
We do not deal with images of thin slices directly but rather with some feature vectors computed from them (and then compare vectors instead of matching images).
The basic patterns (feature vectors) can be found with k-means clustering (from a large set of feature vectors).
FEATURE EXTRACTION
• Transformation of the image into a sequence of 20-dimensional feature vectors.
• Thin overlapping rectangles are split into 20 vertical cells.
• The feature vector for each rectangle is computed by taking the average luminosity of each cell.
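A sketch of this feature extraction with NumPy. The window width and step are assumptions, since the slides specify only overlapping rectangles and 20 cells per rectangle.

```python
import numpy as np

def extract_features(img, win=4, step=2, cells=20):
    """Slide a thin window (win columns wide, advanced by step) across a
    grayscale word image; split each window into `cells` vertical cells
    and take the mean luminosity of each cell, producing one
    cells-dimensional feature vector per window position."""
    h, w = img.shape
    assert h % cells == 0, "image height must divide evenly into cells"
    feats = []
    for x in range(0, w - win + 1, step):
        window = img[:, x:x + win]
        # mean over each of the `cells` horizontal bands of the window
        vec = window.reshape(cells, h // cells, win).mean(axis=(1, 2))
        feats.append(vec)
    return np.array(feats)
```

Adjacent windows overlap whenever step < win, matching the overlapping rectangles described above.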
CLUSTERING
Take a large set of feature vectors (100k) extracted from the training set of images and compute a k-means clustering with 512 clusters.
Eight of the typical feature vectors:
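The clustering and the subsequent discretization step can be sketched as follows: a plain Lloyd's-algorithm k-means standing in for whatever implementation was actually used (the real system used 512 clusters on roughly 100k vectors; the tiny numbers below are for illustration).

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: returns k centroid vectors for the rows of X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every vector to its nearest centroid, then re-estimate
        labels = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

def discretize(feats, C):
    """Map each feature vector to the index of its nearest centroid,
    turning a sequence of vectors into HMM observation symbols."""
    return np.linalg.norm(feats[:, None, :] - C[None, :, :], axis=2).argmin(axis=1)
```

`discretize` is exactly the step that turns a sequence of thin-slice feature vectors into the symbol sequence fed to the word HMM.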
TRAINING

Pipeline steps, each with its inputs and outputs:
- GATHERING OF TRAINING EXAMPLES — Input: images of words. Output: instances of character images.
- FEATURE EXTRACTION — Input: instances of character images. Output: a sequence of 20-dimensional vectors per character instance.
- CLUSTERING — Input: all feature vectors computed in the previous step, number of clusters. Output: a set of centroid vectors C.
- DISCRETIZATION — Input: a sequence of feature vectors for every character instance, C. Output: a sequence of discrete symbols for every character instance.
- CHARACTER MODELS (‘a’ HMM, ‘b’ HMM, ..., ‘z’ HMM) — Input: a sequence of observation symbols for each character instance. Output: separately trained character HMMs for each character.
- GET CHARACTER TRANSITION PROBABILITIES — Input: large corpus of text. Output: character transition probabilities.
- WORD MODEL — Input: character transition probabilities, character HMMs. Output: word HMM.
PREDICTION

Starting point: an image of a word that is to be recognized.
- FEATURE EXTRACTION — Input: image of a word. Output: sequence of 20-dimensional vectors.
- DISCRETIZATION — Input: sequence of feature vectors for the image of a word, the set of 20-dimensional centroid vectors C computed in the training phase. Output: a sequence of symbols from a finite alphabet for the image of a word.
- VITERBI DECODING — Input: the word HMM computed in the training phase and the sequence of observation symbols. Output: the sequence of states that most likely emitted the observations.
- PREDICTION — Input: sequence of states. Output: sequence of characters.

Result: ‘c’ ‘o’ ‘n’ ‘t’ ‘i’ ‘n’ ‘u’ ‘i’ ‘t’ ‘y’
EXPERIMENTS
Book on the French Revolution by John Emerich Edward Dalberg-Acton (1910) (source: archive.org).
Test word error rate on the words containing lower-case letters only.
Approximately 22k words on the first 100 pages
EXPERIMENTS – DESIGN CHOICES
- Extract 100 images of each character.
- Use 512 clusters for discretization.
- Build 14-state models for all characters, except for ‘i’, ‘j’ and ‘l’, where we used 7-state models, and ‘m’ and ‘w’, where we used 28-state models.
- Use character transition probabilities from [1].
- Word error rate: 2%
[1] Michael N. Jones and D.J.K. Mewhort, “Case-sensitive letter and bigram frequency counts from large-scale English corpora”, Behavior Research Methods, Instruments, & Computers, Vol. 36, No. 3, August 2004, pp. 388-396.
TYPICAL ERRORS
Predicted: proposled. We see that there is a short strip between 'o' and 's' where the system predicted an 'l'.
Predicted: inifluenoes. The system interpreted the character 'c' and the beginning of character 'e' as an 'o', and it predicted an extra ‘i'.
Predicted: dennocratic. The character 'm' was predicted as a sequence of two 'n' characters.
Places where the system predicted transitions between character models
QUESTIONS?