Ashok Popat, Sep 03, 2015
OCR for Most of the World’s Languages
Ashok Popat
Research Scientist, Google Inc.
Sep 3, 2015
Outline
1. Optical Character Recognition
2. Approach
3. Reflections and comments
Optical Character Recognition
Examples from Google Books
Multiple scripts / languages on a page:
Examples from Google Books (cont.)
Per-word script and language variation:
Examples from Google Books (cont.)
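What’s a “character”?

| Result | Unicode | Transliteration |
|---|---|---|
| र | 0930 | ra |
| र् | 0930 094d | r |
| र्द | 0930 094d 0926 | rda |
| र्द् | 0930 094d 0926 094d | rd |
| र्द्व | 0930 094d 0926 094d 0935 | rdva |
| र्द्वि | 0930 094d 0926 094d 0935 093f | rdvi |
| र्द्विक | 0930 094d 0926 094d 0935 093f 0915 | rdvika |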
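The build-up on this slide can be reproduced from the code points alone. The sketch below uses only Python's standard `unicodedata` module to construct each prefix and report the name and general category of every code point; the helper names `prefixes` and `describe` are illustrative and not from the talk.

```python
import unicodedata

# Code points from the slide, in logical (typing) order.
CODE_POINTS = [0x0930, 0x094D, 0x0926, 0x094D, 0x0935, 0x093F, 0x0915]


def prefixes(code_points):
    """Return every non-empty prefix of the sequence as a string."""
    return ["".join(chr(cp) for cp in code_points[:n])
            for n in range(1, len(code_points) + 1)]


def describe(text):
    """List (hex code point, Unicode name, general category) per code point."""
    return [(f"{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
            for ch in text]


if __name__ == "__main__":
    for prefix in prefixes(CODE_POINTS):
        cp, name, cat = describe(prefix)[-1]
        print(f"{prefix}\t{len(prefix)} code point(s), last added: U+{cp} {name} ({cat})")
```

Each added code point changes how the whole cluster is drawn, which is why counting code points, or counting glyphs, does not give a stable notion of "character". How many visual units appear depends on the font and shaping engine.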
Bidirectional issues
123
U+0028 - open parenthesis
U+0029 - close parenthesis
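One reason parentheses and digits cause trouble in bidirectional text is that they carry different Unicode bidirectional classes from the surrounding letters, and parentheses are additionally flagged as mirrored. The sketch below inspects these properties with the standard `unicodedata` module. The sample string (a Hebrew letter followed by a parenthesized number) is an assumption made for illustration, not an example from the slide.

```python
import unicodedata


def bidi_profile(text):
    """Return (char, bidi class, mirrored flag) for each character."""
    return [(ch, unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch)))
            for ch in text]


# Logical (storage) order: Hebrew alef, open paren, digits, close paren.
SAMPLE = "\u05D0(123)"

if __name__ == "__main__":
    for ch, bidi_class, mirrored in bidi_profile(SAMPLE):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"class={bidi_class}, mirrored={mirrored}")
```

An OCR system sees the visual order on the page but must emit the logical order, so it has to account for these properties when deciding which code point a parenthesis-shaped glyph stands for.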
Connected scripts
Naskh style:
Nastaliq style:
Part 2: Approach
Optical character recognition as text-line decoding
[Diagram: Input Document → Preprocessing / Layout Analysis → Text Line Recognition → Digital (Unicode) Text. Additional labels: Style, Size, Position, Font, Weight; Script, Language]
Goal: universal, accurate OCR
● Universal
○ Omni-script
○ Omni-language
○ Omni-setting
● Accuracy and Speed
○ Best-in-world, approaching human accuracy
○ Speed comparable to commercial engines
Inspiration: Markov-model-based approaches
● Document image decoding [Kopec and Chou, 1994]
○ Explicit model of typesetting process: seek to invert
○ Influenced by speech recognition methods
○ Extremely high accuracy when models match the data
● BBN Byblos system [Schwartz et al., 1996]
○ Treat text line like a speech waveform
○ Built on existing speech recognition system
○ First successful Arabic OCR
Underlying both: noisy channel model
● Communication theory perspective
○ Source produces a message m according to P(m)
○ Channel (noisily) renders observed image x according to P(x|m)
○ OCR task: given x, produce an estimate of m
○ Goal: choose m’ to minimize error rate:
$$m' = \arg\max_m P(m \mid x) = \arg\max_m P(x \mid m)\,P(m)$$
● Challenges
○ Nobody tells us what P(m|x) is (modeling task)
○ Even if we knew P(m|x), how to compute $\arg\max_m$?
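The decision rule can be illustrated on a toy candidate set. In the sketch below, the prior P(m) and the likelihood P(x|m) are made-up numbers for one observed image x, and the candidate strings are hypothetical; a real decoder cannot enumerate all messages this way, which is the second challenge listed on the slide.

```python
import math

# Hypothetical toy numbers: P(m) from a language model and P(x | m) from an
# optical model, for a single observed image x.
PRIOR = {"cat": 0.6, "cot": 0.3, "c4t": 0.1}
LIKELIHOOD = {"cat": 0.20, "cot": 0.25, "c4t": 0.30}


def map_decode(prior, likelihood):
    """Return argmax_m P(x | m) P(m), computed in the log domain."""
    def log_joint(m):
        return math.log(likelihood[m]) + math.log(prior[m])
    return max(prior, key=log_joint)


def posterior(prior, likelihood):
    """Normalize P(x | m) P(m) over the candidate set to obtain P(m | x)."""
    joint = {m: likelihood[m] * prior[m] for m in prior}
    total = sum(joint.values())
    return {m: p / total for m, p in joint.items()}
```

With these numbers the optical model alone prefers `c4t`, while the product with the prior selects `cat`.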
Component models
● Language models
○ Character- and Word N-grams with appropriate smoothing (ProdLM)
● Likelihood component
○ Speech, BBN OCR: GMMs, DNNs for HMM state-conditional densities, optimized for held-out likelihood
○ DID: Learned probabilistic character templates (foreground, background, “don’t-care”)
○ Ours: Sliding window / deep network / HMMs
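As a minimal illustration of the language-model component, the sketch below trains a character bigram model with add-k smoothing. The slide does not say which smoothing method is used, so add-k is an assumption chosen for brevity, and the function names and the value of `k` are illustrative.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def train_bigram(corpus):
    """Count character contexts and bigrams over a list of strings."""
    contexts, bigrams, vocab = Counter(), Counter(), {EOS}
    for line in corpus:
        symbols = [BOS] + list(line) + [EOS]
        vocab.update(symbols[1:])
        for prev, cur in zip(symbols, symbols[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab


def log_prob(text, model, k=0.5):
    """Add-k smoothed log P(text) under the character bigram model.

    Characters outside the training vocabulary get no special handling
    in this sketch.
    """
    contexts, bigrams, vocab = model
    symbols = [BOS] + list(text) + [EOS]
    total = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        numerator = bigrams[(prev, cur)] + k
        denominator = contexts[prev] + k * len(vocab)
        total += math.log(numerator / denominator)
    return total
```

Trained on a few English strings, such a model scores `the` above `teh`, which is the kind of evidence the decoder combines with the optical score.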
Generalization of the noisy channel model
● Speech approach
● Generalize to multiple feature functions
● Learn {λ} via minimum error-rate training
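The equations on this slide did not survive extraction. A common way to write the generalization, which may differ from the slide's exact form, is to replace the product P(x|m) P(m) with a weighted sum of feature functions, m' = argmax over m of Σ_i λ_i f_i(m, x), where the optical log-likelihood and the language-model log-probability are two of the features. The sketch below scores hypotheses this way and picks the weights by exhaustive grid search on a tiny development set. The grid search is a crude stand-in for minimum error-rate training, and all data and names are hypothetical.

```python
import itertools

# Hypothetical dev set: each item has a reference transcription and candidate
# hypotheses with features (optical log-likelihood, LM log-probability).
DEV_SET = [
    {"ref": "cat", "cands": {"c4t": (-1.5, -6.0), "cat": (-2.0, -1.0)}},
    {"ref": "dog", "cands": {"d0g": (-2.6, -5.0), "dog": (-3.0, -1.2)}},
    {"ref": "x1", "cands": {"xl": (-1.8, -3.5), "x1": (-1.0, -4.0)}},
]


def decode(cands, weights):
    """Pick the hypothesis maximizing sum_i lambda_i * f_i(m, x)."""
    def score(m):
        return sum(w * f for w, f in zip(weights, cands[m]))
    return max(cands, key=score)


def error_count(dev_set, weights):
    """Number of dev items whose top hypothesis differs from the reference."""
    return sum(decode(item["cands"], weights) != item["ref"] for item in dev_set)


def tune(dev_set, grid):
    """Try every weight pair on the grid; return the first with fewest errors."""
    candidates = itertools.product(grid, repeat=2)
    return min(candidates, key=lambda w: error_count(dev_set, w))
```

On this toy data, using only the optical feature or only the language-model feature leaves errors, while a balanced weighting removes them. Practical minimum error-rate training uses a more efficient search than a full grid.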
Principles
● Minimize language-specific engineering
● Prefer integrated, holistic decisions to pipelined steps
● Take advantage of data (labeled, unlabeled)
● Take advantage of advances in other areas (MT, Speech, NLP, CV,...)
Accuracy and Speed over time
● More and more accurate
● Faster and faster
Technical advances in the past few years
● Optical model
○ GMM -> DNN
○ DNN -> LSTM
○ Sequential discriminative training of DNN/LSTM
● Language model
○ N-gram -> RNN-LM
● Decoding
○ Pruning algorithms designed for OCR
○ Automatic decoding parameter optimization
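The slide does not describe the OCR-specific pruning algorithms. For orientation only, the sketch below shows two generic pruning steps familiar from speech decoding: beam pruning (drop hypotheses scoring too far below the best) and histogram pruning (cap the number of survivors). The function name and parameters are assumptions, not the talk's method.

```python
import heapq


def prune(hypotheses, beam_width, max_active):
    """Keep hypotheses within `beam_width` of the best log score,
    then cap the survivors at the `max_active` highest-scoring ones."""
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    survivors = {h: s for h, s in hypotheses.items() if s >= best - beam_width}
    top = heapq.nlargest(max_active, survivors.items(), key=lambda kv: kv[1])
    return dict(top)
```

For example, `prune({"cat": -1.0, "cot": -2.5, "cut": -3.0, "c4t": -9.0}, beam_width=5.0, max_active=2)` keeps only `cat` and `cot`. Both parameters trade accuracy against speed, which is one reason automatic tuning of decoding parameters is useful.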
Script and Language Identification
● Some parameters usefully considered piecewise stationary latent processes
○ Font
○ Style (bold, italic,...)
○ Point size
○ Script
○ Language
○ Topic
● Most of these have low information rate → exploit!
Script ID approach 1: re-use OCR engine
● Script class seen as evolving as a hidden Markov process
● Pretend all letters of a given script are different glyph instances of the same “letter” (script class label)
● Do OCR with a very small vocabulary
● Reasonably accurate, significant hit on processing time
● Details: Genzel et al., “HMM-based script identification for OCR,” 2013
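The idea of a script class evolving as a hidden Markov process with rare switches can be sketched with a plain Viterbi decoder over per-frame script scores. This is a generic illustration, not the system of Genzel et al.: the emission scores, the `stay_prob` value, and the two-script state set used in the usage note are all hypothetical.

```python
import math


def viterbi(log_emissions, states, stay_prob=0.95):
    """Most likely state sequence under a sticky first-order Markov chain.

    log_emissions: list of dicts mapping state -> log P(observation | state).
    Requires at least two states.
    """
    switch_prob = (1.0 - stay_prob) / (len(states) - 1)
    log_stay, log_switch = math.log(stay_prob), math.log(switch_prob)

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform initial distribution over script classes.
    score = {s: -math.log(len(states)) + log_emissions[0][s] for s in states}
    backpointers = []
    for frame in log_emissions[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + transition(p, s))
            new_score[s] = score[prev] + transition(prev, s) + frame[s]
            pointers[s] = prev
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Because switching is penalized, a single frame that weakly favors another script is smoothed over, while a sustained run of evidence triggers a switch. This is one way to exploit the low information rate of the script label.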
Alternative approach (Li et al., 2015)
Countries we don’t cover
Part 3: Reflections and Comments
Unicode: a Godsend for OCR
● Defining the goal requires specifying representation space
● Duality
○ Synthetic data
○ Document Image Decoding
○ Noisy Channel Formulation
● Internationalization libraries and resources, BiDi
● Corollary: OCR could not have been solved when it was most worked on
Changing styles, orthographies
Then and now
Academia and Industry
● Strengths
● Evolving roles
● Cooperation
Can OCR finally be a “solved problem”?
● Available to anyone, anywhere, ideally free-of-charge
● Network / cloud not required, keep your documents
● All languages, scripts, typefaces
● Quasi-linguistic: math, diagrams
● Regional libraries, cultural preservation efforts
● Newspapers, manuscripts, magazines, books
OCR in Google Products