© Beta Systems Software AG 2008 1
UN Workshop on Data Capture, Dar es Salaam
Session 7
Data Capture
Richard Lang, International Manager
Agenda
OCR: Optical Character Recognition
ICR: Intelligent Character Recognition
DFR: Dynamic Form Recognition
OCR = Optical Character Recognition
The technology was first invented in 1929
Gustav Tauschek obtained a patent on OCR in Germany
Mechanical device that used templates
First commercial system was installed at Reader's Digest in 1955
Years later it was donated to the Smithsonian Institution
Today
Recognition of machine-written text is now considered largely a solved problem
Accuracy rates exceed 99%
OCR
Beta Systems has extensive experience with these recognition engines in banks
Germany: OCR A
  ⑁ Chair
  ⑀ Hook
  ⑂ Fork
Austria: OCR B+ Plus
ICR Intelligent Character Recognition
The technique is far ahead of OCR because of the ongoing development of ICR
Handwriting recognition system
Allows a computer to learn different styles of handwriting before or during processing, to improve accuracy and recognition rates
ICR Process:
Capturing the image with scanners
Processing by ICR and/or OCR
Segmentation is a very important step
Decision whether homogeneous regions belong to the foreground or to the background
Human editors can make this decision from the context
Compare computed tomography: from X-ray measurements taken at different angles, the computer can reconstruct the picture
The first step yields only a suitable starting point (sets of pixels)
The growing process then links all neighbouring pixels (computation of valleys and peaks with a high degree of confidence)
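
The two segmentation steps above (a starting set of pixels, then a growing process that links nearby pixels) can be sketched as thresholding followed by region growing. This is a minimal illustration, not the engine's actual method; the threshold of 128, the 4-neighbour rule, and the function names are assumptions.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Mark pixels darker than the (assumed) threshold as foreground."""
    return [[value < threshold for value in row] for row in gray]

def grow_regions(mask):
    """Group 4-connected foreground pixels into regions of (row, col) pairs."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Seed pixel: the starting point of a new region.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                # Link every unvisited foreground neighbour to this region.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions
```

Each returned region is a candidate character or stroke that later steps can crop and classify.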
ICR Process:
Pre-processing
Deskew
Shift, rotate
Stretch
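
Deskewing can be sketched as estimating the tilt of a text baseline and rotating it back. The example below works on point coordinates rather than a full image to stay short; the least-squares estimate and all function names are assumptions, not the product's algorithm.

```python
import math

def estimate_skew(points):
    """Angle (radians) of the least-squares line through (x, y) baseline points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.atan2(sxy, sxx)

def rotate(points, angle, center=(0.0, 0.0)):
    """Rotate points by angle (radians) around center."""
    cx, cy = center
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in points
    ]

def deskew(points):
    """Rotate the points so that their fitted baseline becomes horizontal."""
    angle = estimate_skew(points)
    return rotate(points, -angle), angle
```

The same rotation would be applied to every pixel of the scanned page; shifting and stretching are analogous coordinate transforms.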
ICR Process:
Enhance
Less / More Contrast
Clean up (de-noise, halftone removal)
to enable the recognition engine to give best results
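
A minimal sketch of the two enhancement ideas above, contrast adjustment and de-noising, on a grayscale image stored as a list of rows. Linear contrast stretching and a 3x3 median filter are assumed here as common choices; the slide does not say which filters the engine uses.

```python
def stretch_contrast(gray):
    """Linearly rescale pixel values so the darkest becomes 0 and the brightest 255."""
    flat = [value for row in gray for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in gray]  # flat image: nothing to stretch
    return [[round((value - lo) * 255 / (hi - lo)) for value in row] for row in gray]

def median_filter(gray):
    """Replace each interior pixel by the median of its 3x3 neighbourhood."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]  # border pixels are left unchanged
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = sorted(
                gray[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            )
            out[r][c] = window[4]  # middle of the 9 sorted values
    return out
```

The median filter removes isolated specks while keeping stroke edges sharper than a simple average would.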
ICR Process:
Feature extraction
Data reduction
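
One common way to reduce a character image to a few numbers is zoning: split the glyph into a grid and keep only the ink density of each zone. This sketch assumes a binary glyph (1 = ink) and a 2x2 grid; the slide does not name the features the engine actually uses.

```python
def zone_features(glyph, zones=2):
    """Return the ink density of each cell in a zones x zones grid over the glyph."""
    height, width = len(glyph), len(glyph[0])
    features = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * height // zones, (zr + 1) * height // zones
            c0, c1 = zc * width // zones, (zc + 1) * width // zones
            cells = [glyph[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cells) / len(cells))
    return features

# A 4x4 glyph with a vertical stroke: 16 pixels are reduced to 4 features.
stroke = [[0, 0, 1, 0] for _ in range(4)]
print(zone_features(stroke))  # [0.0, 0.5, 0.0, 0.5]
```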
ICR Process:
Classification
Example: a "1" was written
90% = 1
8% = 7
2% = 4
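
The ranked alternatives above can be produced by normalising the classifier's raw scores so that they sum to 100%. The raw scores below are hypothetical values chosen only to reproduce the percentages on the slide, and the function name is an assumption.

```python
def rank_candidates(scores):
    """Turn non-negative raw scores into confidences, sorted best first."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(label, score / total) for label, score in ranked]

# Hypothetical raw scores for a handwritten "1".
for label, confidence in rank_candidates({"7": 4, "1": 45, "4": 1}):
    print(f"{confidence:.0%} = {label}")
```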
ICR Algorithm:
Neural Network
Using kNN (k-Nearest Neighbour)
SVM (Support Vector Machine)
SVMs simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers
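
As an illustration of the k-nearest-neighbour idea, the sketch below labels a feature vector by majority vote among its k closest training samples and reports the vote share as a confidence. The training data, k=3, and the use of Euclidean distance are assumptions made for the example.

```python
from collections import Counter

def knn_classify(sample, training, k=3):
    """Return (label, vote share) from the k training samples nearest to sample.

    training is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], sample)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)

# Hypothetical two-dimensional features for handwritten "1" and "7".
training = [
    ((0.0, 0.5), "1"), ((0.1, 0.5), "1"), ((0.0, 0.4), "1"),
    ((0.9, 0.1), "7"), ((0.8, 0.2), "7"),
]
print(knn_classify((0.05, 0.45), training))  # ('1', 1.0)
```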
ICR Process:
After the different classification alternatives, the appropriate confidence is provided
Recognition can be limited to the most probable characters
e.g. if only the characters 3, 6, 0 are possible, the engine can be limited to this set and the results are much better
Voting machine usability: security, efficiency and accuracy
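
Both ideas on this slide can be sketched in a few lines: restricting the candidates to an allowed character set, and voting across several recognition engines. It is assumed here that voting means summing the confidences each engine reports; the slide does not describe the actual voting rule, and all names and numbers are illustrative.

```python
from collections import defaultdict

def restrict(confidences, allowed):
    """Drop candidates outside the allowed set and renormalise the rest."""
    kept = {label: p for label, p in confidences.items() if label in allowed}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {label: p / total for label, p in kept.items()}

def vote(engine_results):
    """Sum confidences per label across engines; return (winner, share of total)."""
    totals = defaultdict(float)
    for result in engine_results:
        for label, p in result.items():
            totals[label] += p
    grand_total = sum(totals.values())
    if not grand_total:
        return None, 0.0
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / grand_total

# Unrestricted, "8" would win; limited to {3, 6, 0}, the answer becomes "3".
raw = {"3": 0.40, "8": 0.45, "6": 0.10, "0": 0.05}
limited = restrict(raw, {"3", "6", "0"})
print(max(limited, key=limited.get))
```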
Dynamic Field Recognition
No fixed position is required
If only ½ of the form is available, that ½ is still readable
No special forms are required
No timing tracks are necessary on the forms for OMR, yet results are available at the same time
No cleaning of LEDs in the scanner is necessary
Robust against vertical / horizontal stretching or shrinking (e.g. different printers)
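
Robustness against shifting and stretching can be illustrated by mapping field positions from the form template onto the scanned page, using two landmarks found on both. This is only a sketch under assumptions: the landmark coordinates are made up, the helper names are hypothetical, and rotation is not handled.

```python
def fit_axis(template_a, template_b, scan_a, scan_b):
    """Scale and offset that map two template coordinates onto two scan coordinates."""
    scale = (scan_b - scan_a) / (template_b - template_a)
    offset = scan_a - scale * template_a
    return scale, offset

def locate_field(field, template_marks, scan_marks):
    """Map a (left, top, right, bottom) template rectangle into scan coordinates."""
    (tx0, ty0), (tx1, ty1) = template_marks
    (sx0, sy0), (sx1, sy1) = scan_marks
    scale_x, offset_x = fit_axis(tx0, tx1, sx0, sx1)
    scale_y, offset_y = fit_axis(ty0, ty1, sy0, sy1)
    left, top, right, bottom = field
    return (
        scale_x * left + offset_x,
        scale_y * top + offset_y,
        scale_x * right + offset_x,
        scale_y * bottom + offset_y,
    )

# Hypothetical landmarks: the scan is shifted and printed 5% larger than the template.
template_marks = ((100, 100), (500, 700))
scan_marks = ((120, 90), (540, 720))
print(locate_field((200, 300, 300, 340), template_marks, scan_marks))
```

Because the mapping is computed per page from what is actually found on it, no timing tracks or fixed positions are needed in this scheme.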
Dynamic Field Recognition
Recognizes:
features (word as pixel cloud)
boxes, lines and symbols
Hardware / Software Requirements
Hardware
  Scanner
  PC
  Network
  Disc storage: only necessary if images are needed for audit purposes
Software
  Scan software
  One recognition and voting software for OMR, OCR, ICR, barcode
OMR
Cost comparison in general (OMR from image / dedicated OMR scanner)
Forms design: same
Forms production: - / up to 50% more
Enumerator training: - / up to double the cost
Scanners: - / up to double the cost
PC: low-cost PC
PC operators: same
Servers: same
Cost of more/new flexibility: low / high
ICR Advantages
Better than:
Manual keying
  90% (plus) correct keys
  Manual = higher substitution rate than automated recognition
  Time consuming
  Deliberate manipulation possible
OMR, because OMR is space consuming
OCR, because OCR only reads machine-written text and is therefore of limited use
ICR Advantages
Clear accuracy for OMR because of dirt removal by software, depending on the mark size and shape
Can detect lines and ignore dirt
Clear result
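
Reading a mark from the image while ignoring dirt can be sketched as removing isolated pixels and then checking how much of the mark box is filled. The 8-neighbour rule and the 20% fill threshold are assumptions for illustration; a real engine would tune them to the mark size and shape.

```python
NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def remove_specks(mask):
    """Drop foreground pixels that have no foreground pixel among their 8 neighbours."""
    height, width = len(mask), len(mask[0])

    def has_neighbour(r, c):
        return any(
            0 <= r + dr < height and 0 <= c + dc < width and mask[r + dr][c + dc]
            for dr, dc in NEIGHBOURS
        )

    return [
        [bool(mask[r][c]) and has_neighbour(r, c) for c in range(width)]
        for r in range(height)
    ]

def is_marked(mask, min_fill=0.2):
    """Decide whether a mark box is filled, after removing isolated dirt pixels."""
    cleaned = remove_specks(mask)
    filled = sum(sum(row) for row in cleaned)
    return filled / (len(mask) * len(mask[0])) >= min_fill

dirty_box = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
marked_box = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
print(is_marked(dirty_box), is_marked(marked_box))  # False True
```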
ICR Advantages
Barcode, OCR, OMR, and ICR recognition with one software
ICR Advantages
Pro:
Only rejected characters/fields need correction
Rest of the form remains untouched
Open for the future with new technologies: faster, better quality
With standardized correction mode
Handwriting of the corresponding country will be recognized
The previously mentioned advantages do not have to be repeated here again
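
The first point on this slide, that only rejected characters or fields need correction, can be sketched as a confidence threshold that routes fields either to the output or to a correction queue. The 0.9 threshold, the field names, and the function name are hypothetical.

```python
def split_for_correction(fields, min_confidence=0.9):
    """Separate recognised fields into accepted values and ones needing manual correction.

    fields maps a field name to a (value, confidence) pair.
    """
    accepted, rejected = {}, {}
    for name, (value, confidence) in fields.items():
        target = accepted if confidence >= min_confidence else rejected
        target[name] = value
    return accepted, rejected

# Hypothetical recognition results for one form.
form = {"age": ("34", 0.98), "sex": ("2", 0.95), "village_code": ("10?", 0.41)}
accepted, rejected = split_for_correction(form)
print(rejected)  # {'village_code': '10?'}
```

Only the rejected field is shown to an operator; the accepted fields are passed through unchanged.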
Thank you for your attention