Page 1: UN  Workshop on Data Capture, Dar es Salaam  Session 7 Data Capture

© Beta Systems Software AG 2008 1

UN Workshop on Data Capture, Dar es Salaam

Session 7

Data Capture

Richard Lang International Manager


Agenda

OCR: Optical Character Recognition

ICR: Intelligent Character Recognition

DFR: Dynamic Form Recognition


OCR = optical character recognition

Technology was first invented in 1929

Gustav Tauschek obtained a patent on OCR in Germany

Mechanical device that used templates

First commercial system was installed at Reader's Digest in 1955

Years later donated to the Smithsonian Institution

Today

Recognition of machine-written text is now considered largely a solved problem

Accuracy rates exceed 99%


OCR

Beta Systems is well experienced with these recognition engines in banks

Germany: OCR-A

⑁ Chair ⑀ Hook ⑂ Fork

Austria: OCR B+ (Plus)


ICR Intelligent Character Recognition

The technique is far ahead of OCR because of the ongoing development of ICR

Handwriting recognition system

Allows different styles of handwriting to be learned by a computer during / before processing to improve accuracy and recognition rates


ICR Process:

Capturing the image with scanners

Processing by ICR and/or OCR

Segmentation is a very important step

Decision whether regions that are homogeneous by some criterion belong to the foreground or to the background

Human editors can do that depending on the context

Compare also computed tomography: from X-ray measurements taken at different angles, the computer can reconstruct the picture

The first step yields only a suitable starting point (sets of pixels)

The growing process then links all nearby pixels (computation of valleys and peaks with a high degree of confidence)
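
The two segmentation steps on this slide (a per-pixel foreground/background decision, then linking nearby pixels) can be illustrated with a minimal sketch. It assumes a grayscale image given as a list of rows, dark ink on light paper, and a fixed threshold of 128; the function name `segment` and these choices are illustrative assumptions, not the method used by any particular engine.

```python
from collections import deque

def segment(gray, threshold=128):
    """Label connected foreground regions in a grayscale image.

    gray is a list of rows of 0..255 intensities. Pixels darker than
    threshold are treated as foreground (an assumption: dark ink on
    light paper). Returns a list of regions, each a set of (row, col).
    """
    rows, cols = len(gray), len(gray[0])
    # Step 1: per-pixel foreground/background decision gives the seed pixels.
    foreground = [[gray[r][c] < threshold for c in range(cols)]
                  for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not foreground[r][c] or seen[r][c]:
                continue
            # Step 2: grow the region by linking 8-connected neighbours.
            region, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and foreground[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(region)
    return regions
```

Each returned region is a candidate character or stroke; a real system would typically choose the threshold adaptively rather than fixing it.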


ICR Process:

Pre-processing

Deskew

Shift, rotate

Stretch
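
As a rough illustration of deskewing, the sketch below estimates the skew of a text line from its foreground pixel coordinates with a least-squares fit and then rotates the points back. The helper names `estimate_skew` and `deskew` are hypothetical, and the approach assumes a single, roughly horizontal line of ink; production systems often use other estimators such as projection profiles.

```python
import math

def estimate_skew(points):
    """Angle (radians) of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.atan2(sxy, sxx)

def deskew(points, angle):
    """Rotate points about their centroid by -angle to undo the skew."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    return [(cx + (x - cx) * cos_a - (y - cy) * sin_a,
             cy + (x - cx) * sin_a + (y - cy) * cos_a)
            for x, y in points]
```

After `deskew(points, estimate_skew(points))`, points that lay on a slanted baseline share the same y coordinate.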


ICR Process:

Enhance

Less / More Contrast

Clean up (de-noise, halftone removal)

to enable the recognition engine to give best results
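
Two of the enhancement steps named on this slide, contrast adjustment and de-noising, can be sketched as follows. This is a minimal illustration assuming a grayscale image as a list of rows; linear contrast stretching and a 3x3 median filter are common choices but are assumptions here, and halftone removal is not covered.

```python
from statistics import median

def stretch_contrast(gray):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        # Flat image: nothing to stretch, return a copy.
        return [row[:] for row in gray]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]

def median_denoise(gray):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    rows, cols = len(gray), len(gray[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [gray[y][x]
                      for y in range(max(0, r - 1), min(rows, r + 2))
                      for x in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(int(median(window)))
        out.append(out_row)
    return out
```

The median filter removes isolated specks (salt-and-pepper noise) while keeping stroke edges sharper than an averaging filter would.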


ICR Process:

Feature extraction

Data reduction
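
One simple way to picture feature extraction as data reduction is zoning: the character image is divided into a small grid and each cell is summarised by its ink density. The sketch below is an assumed example of that idea (the name `zoning_features` is hypothetical), not a description of the features a specific engine uses.

```python
def zoning_features(glyph, zones=2):
    """Reduce a binary glyph (rows of 0/1) to zones*zones ink densities.

    Assumes the glyph has at least `zones` rows and columns.
    """
    rows, cols = len(glyph), len(glyph[0])
    features = []
    for zr in range(zones):
        r0, r1 = zr * rows // zones, (zr + 1) * rows // zones
        for zc in range(zones):
            c0, c1 = zc * cols // zones, (zc + 1) * cols // zones
            cells = [glyph[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            # Fraction of ink pixels in this zone.
            features.append(sum(cells) / len(cells))
    return features
```

A 4x4 glyph with 16 pixels is reduced to 4 numbers here; the classifier then works on this short feature vector instead of the raw pixels.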


ICR Process:

Classification

A one was written

90 % = 1

8 % = 7

2 % = 4
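
The ranked alternatives on this slide (90 % for 1, 8 % for 7, 2 % for 4) can be thought of as normalised classifier scores. The sketch below assumes the classifier emits one raw score per class and uses a softmax to turn them into percentages; both the scores and the function name `ranked_confidences` are illustrative assumptions.

```python
import math

def ranked_confidences(scores):
    """Turn raw per-class scores into ranked (label, probability) pairs."""
    top = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return sorted(((label, e / total) for label, e in exps.items()),
                  key=lambda pair: pair[1], reverse=True)
```

The first entry is the recognition result; the remaining entries are the alternatives a later voting or correction step can fall back on.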


ICR Algorithm:

Neural Network

Using kNN (k-Nearest Neighbour)

SVM (Support Vector Machine)

Simultaneously minimizes the empirical classification error and maximizes the geometric margin; hence SVMs are also known as maximum-margin classifiers
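
Of the algorithms listed on this slide, k-nearest neighbour is the easiest to show in a few lines. The sketch below is a minimal version assuming feature vectors of equal length, Euclidean distance and a simple majority vote; the name `knn_classify` and the training-data layout are hypothetical.

```python
import math
from collections import Counter

def knn_classify(sample, training, k=3):
    """Classify a feature vector by majority vote among its k nearest neighbours.

    training is a list of (feature_vector, label) pairs.
    Returns (label, share of the k votes that went to that label).
    """
    nearest = sorted(training,
                     key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)
```

The vote share can serve as a crude confidence value for the result.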


ICR Process:

For the different classification alternatives, the corresponding confidence is provided

Recognition can be limited to the most probable characters

e.g. if only the characters 3, 6, 0 are possible, the engine can be limited to this set and the results are much better

Voting Machine

Usability: security, efficiency and accuracy
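
Limiting recognition to a permitted character set and then voting across engines can be sketched as below. The sketch assumes each engine returns a dictionary of candidate characters with confidences; the functions `restrict` and `vote` and the agreement threshold are hypothetical illustrations, not the actual voting software.

```python
from collections import Counter

def restrict(candidates, allowed):
    """Keep only allowed characters and renormalise their confidences."""
    kept = {ch: p for ch, p in candidates.items() if ch in allowed}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {ch: p / total for ch, p in kept.items()}

def vote(engine_results, min_agreement=2):
    """Majority vote over each engine's top character; None means reject."""
    tops = [max(r, key=r.get) for r in engine_results if r]
    if not tops:
        return None
    winner, count = Counter(tops).most_common(1)[0]
    return winner if count >= min_agreement else None
```

For example, if a field may only contain 3, 6 or 0, a candidate list led by "8" is reduced to those three characters before the engines' top choices are compared.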


Dynamic Field Recognition

No fixed position is required

If only ½ of the form is available, that ½ is still readable

No special forms are required

No timing tracks are necessary on the forms for OMR, but results are available at the same time; no cleaning of LEDs in the scanner is necessary

Robust against vertical / horizontal stretching or shrinking (e.g. different printers)
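
One way to picture robustness against stretching and shifting is to locate fields relative to landmarks found on the page rather than at fixed coordinates. The sketch below assumes two landmarks (for example two printed words) have been found both on the form template and on the scanned image, and derives an independent scale and offset per axis. The functions `axis_transform` and `locate_field` are hypothetical, and the actual dynamic recognition is certainly more elaborate.

```python
def axis_transform(template_a, template_b, scanned_a, scanned_b):
    """Scale and offset mapping template coordinates to scanned ones on one axis."""
    scale = (scanned_b - scanned_a) / (template_b - template_a)
    offset = scanned_a - scale * template_a
    return scale, offset

def locate_field(field_box, template_anchors, scanned_anchors):
    """Map a template field box (x0, y0, x1, y1) into the scanned image.

    Each anchors argument holds two (x, y) landmark positions; the two
    landmarks must differ in both x and y on the template.
    """
    (tx0, ty0), (tx1, ty1) = template_anchors
    (sx0, sy0), (sx1, sy1) = scanned_anchors
    kx, ox = axis_transform(tx0, tx1, sx0, sx1)
    ky, oy = axis_transform(ty0, ty1, sy0, sy1)
    x0, y0, x1, y1 = field_box
    return (kx * x0 + ox, ky * y0 + oy, kx * x1 + ox, ky * y1 + oy)
```

Because the mapping is recomputed for every page, a form printed 5 % larger on a different printer still has its fields found in the right place.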


Dynamic Field Recognition

Recognizes:

features (word as pixel cloud)

boxes, lines and symbols


Hardware / Software Requirements

Hardware

Scanner

PC

Network

Disc storage only necessary if images are needed for audit purposes

Software

Scan software

One recognition and voting software for OMR, OCR, ICR, barcode


OMR

Cost comparison in general (OMR from image / dedicated OMR scanner)

Forms design: same

Forms production: - / up to 50% more

Enumerator training: - / up to double the cost

Scanners: - / up to double the cost

PC: low-cost PC

PC operators: same

Servers: same

Cost of more/new flexibility: low / high


ICR Advantages

Better than:

Manual keying

90% (plus) correct keys

Manual = higher substitution rate than automated recognition

Time consuming

Deliberate manipulation possible

OMR, because OMR is space consuming

OCR, because OCR is limited to machine-written text and therefore of limited use


ICR Advantages

Clear accuracy for OMR because of dirt removal by software, depending on the mark size and shape

Can detect lines and ignore dirt

Clear result
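
Ignoring dirt based on mark size can be illustrated with a very small decision rule. The sketch assumes an earlier step has already measured the pixel count of each dark blob inside an OMR box; the function `is_marked` and its thresholds are hypothetical values chosen only for illustration.

```python
def is_marked(blob_sizes, box_area, min_blob=6, min_fill=0.15):
    """Decide whether an OMR box is marked, ignoring small specks of dirt.

    blob_sizes: pixel counts of the dark blobs found inside the box.
    box_area:   total number of pixels in the box.
    """
    # Blobs smaller than min_blob pixels are treated as dirt and dropped.
    ink = sum(size for size in blob_sizes if size >= min_blob)
    return ink / box_area >= min_fill
```

With a plain dark-pixel count, four 5-pixel specks in a 100-pixel box would pass a 15 % fill threshold; filtering by blob size first rejects them.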


ICR Advantages

Barcode, OCR, OMR, and ICR recognition with one software


ICR Advantages

Pro:

Only rejected characters/fields need correction

Rest of the form untouched

Open for future technologies: faster, better quality

With standardized correction mode

Handwriting of the corresponding country will be recognized

The previously mentioned advantages need not be repeated here
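
The first point on this slide, that only rejected characters or fields go to manual correction, could be sketched as a confidence filter. The data layout (a list of character/confidence pairs per field), the function name `fields_needing_correction` and the 0.85 threshold are assumptions made for this example.

```python
def fields_needing_correction(fields, min_confidence=0.85):
    """Return the names of fields whose weakest character is below the threshold.

    fields maps a field name to a list of (character, confidence) pairs.
    Empty fields are also returned so that an operator can check them.
    """
    rejected = []
    for name, chars in fields.items():
        if not chars or min(conf for _, conf in chars) < min_confidence:
            rejected.append(name)
    return rejected
```

Fields not in the returned list are accepted as recognised and are never shown to an operator.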


Thank you for your attention