a discriminative semi-markov model for robust …weinman/files/nescai08poster.pdfa discriminative...
TRANSCRIPT
A Discriminative Semi-Markov Model for Robust Scene Text RecognitionJerod J. Weinman, Erik Learned-Miller, and Allen R. Hanson
Computer Vision Laboratory, Dept. of Computer Science, University of Massachusetts Amherst
Problem and MotivationOur goal: Read text from images of the everyday world.
Applications include:•Aid to visually impaired•Photo annotation•Portable translation
Many new challenges are atypical of OCR page readers:•Small sample (<50 chars) •Perspective distortion
•Uneven lighting •Unusual fonts
•Word segmentation •Character segmentation
Previous work has assumed word boundaries, charactersegmentations, and/or known lexicon words. Theconditions above make these assumptions problematic.We propose a model that integrates lexical decision andboth word and character segmentation with recognition.
Semi-Markov Model
!
p y x;r " ( ) =
1
Z x( )exp U y, x;
r " ( ){ }
We use a discriminatively trained semi-Markov model forrecognition. Like a CRF, it models dependencies betweenstates. It has the additional property of modeling theduration of a state, i.e., character.
Model parameterscan be learned fromlabeled training data.
!
r "
The exponent (U) is composed of functions that score asegmentation and labeling (y) of a text image (x) using:
•Appearance
•Bigrams
•Lexicon
( )1000
39English|TH =P ( )
1000
4.1English|QU =P ( )
1000
0001.English|QA =P
aAaberg
M
zymurgyZyuganov
M
!
L
Parse ScoringA text image is parsed, or divided intolabeled segments. Parses may differ in thenumber of segments, and the states yi maydiffer in width.y1 y2 y3 y4
y1 y2 y3
AppearanceEvery possible parse segment isscored by a discriminant UA forcharacter appearance and width.
!
U A yi , x( )rti
RkK
y1 y4
!
UA
y1, x( )
!
UA
y4, x( )
BigramsEach pair of neighboring segments get a bigram score UB.
!
U B yi , yi+1( )
LexiconA character invariant UL replaces UB in sequences forminglexicon words to promote known word recognition.
!
UL
Parse Gap/OverlapNeighboring states may have a gap or anoverlap, as in the case of ligatures. Theseare also scored using image features.
!
UP yi , yi+1, x( )
y1 y2 y3
y1 y2 y3
Interpretation GraphAll possible parses can be represented by a weightedgraph. Dynamic programming can be used to find theoptimal path, or the best interpretation of the image.
ExperimentsTraining•Appearance: Synthetic images generated from 934 fonts•Bigrams: 82 books from Project Gutenberg•Lexicon: 50th frequency percentile words from SCOWL
Evaluation85 sign images of 1,144 characters at ~12 pixel x-height.
Char. Error RateNo Lex.
Forced Lex.Mixed
Allowing words both in and out ofthe lexicon reduces error by 13%.
23.5%OmniPage
15.0%Semi-Markov
16.6%OmniPage +Binarized
CharErr.System
Above: Ourmodel reduceserror by 10%.
Left: Examples.
Low ResolutionIntegrated segmentation and recognition allow our modelto recognize images at lower resolutions than trained on.
End-to-End
InputImage
MultiscaleDetection
Text RegionCandidates
MultipleRecognitions
Top-ScoringOutput
Ï I tawAim 1717ïïïï Ñalti7Tir-TWITE.A
OmniPageDetection
OmniPageOutput
LLOYDS BANK
0
MILL ANTIQUM
MILL ANTIQUESJ L7:"? 4,
Fleet
43 010 Fleet ri
FIREE DELIVEp
Ma OS
With a textd e t e c t o r ,we have ac o m p l e t er e a d i n gsystem.
Sem
i-Mar
kov
Omni
Page
With support from:
uucL,ass
LIBRFIRY
A),1HERbTTAVF
Free ehe in
EMIJUONEYFLO.010NIK EY
COFFEEMaus
Ella
OriginalImage
BinaryImage
OmniPageOutput
Douglass
LIBRARY
AMHERSTTAVERN
Free checking
CHURCHMONEYFLOMONKEY
COFFEEHOUSE
First
Fleet
Semi-MarkovModel Output
Resumes
WM/ I WS
krauts11.-4her
ResumesResumesResumesResumesResumestttA wms
Resumes
Image Semi-Markov OmniPage
Je eryAmherstJefferyAmherstJONryAmhent
.WlervAmMnt
Jeffery AmherstJeffery AmherstJeffery AmherstLit wryA An memoAn wrrA Am wme
Image Semi-Markov OmniPage
GkilaryAmhent