UNSD Census Workshop Day 2 - Session 7 Data Capture: Intelligent Character Recognition Andy Tye – International Manager DRS are Worldwide specialists in Census data capture www.drs.co.uk

Upload: ann-parker

Post on 05-Jan-2016



UNSD Census Workshop Day 2 - Session 7

Data Capture: Intelligent Character Recognition

Andy Tye – International Manager

DRS are Worldwide specialists in Census data capture www.drs.co.uk


Data Capture

Intelligent Character Recognition (ICR)

Elements

• Form design

• Hardware/Software requirements

– Scanners

– Computer infrastructure

• Workflow

• Accuracy

• Advantages

• Disadvantages


Data Capture - ICR

Forms design

• Typical stock grade paper (90GSM)

• Corner Stones advised

• Dropout colour is recommended but not essential


Data Capture - ICR

Hardware requirements

• Image Scanners

– TWAIN or ISIS

• Database Server (full redundancy)

• Storage Server – Terabytes

– (RAID 5, mirrored, etc.)

• Network (Gb preferred)

• Administrator PC

• CSPro PCs

• Key correction PCs (Verification)

• Character Inspection PCs

– (Mass verification - optional)

• Scanner PCs

• Automatic data capture PCs

Software requirements

• MS-SQL or other database

• Data Storage, Archive and Retrieval

• Backup Software

• Software for Administrator PC

• CSPro for analysis and reporting PCs

• Software for Key correction PCs

• Software for Character inspection PCs

• Software for Scanner PCs

• Software for automatic data capture


Data Capture - ICR

Typical Workflow

ICR


Data Capture - ICR

Typical Workflow

Paper Movement – Processing Centre/s


Data Capture - ICR

Typical Workflow

Receiving


Data Capture - ICR


Typical Workflow

Logging/Checking

• Open Batch

• Verify Contents

• Register Batch

Data Capture - ICR

Typical Workflow

Sifting

• Orientation

• Other Forms

Data Capture - ICR

Typical Workflow

Spine removal

• Cut Booklets

• 30,000/day

Data Capture - ICR

Typical Workflow

Scanning

• Double Sided

• High Speed

• Double Detection

• Ease of Use

Data Capture - ICR

Typical Workflow

Scanning/sorting

• Automatic Identification

• Data Capture

Data Capture - ICR

Typical Workflow

Storage

• Conditions

• Retrieval

• Space


Data Capture - ICR

Typical Workflow

Image Movement/Data Extraction – Processing Centre/s


Data Capture - ICR


Typical Workflow

Image interpretation

• Automated Process

• Background Task

• Page Identification

• De-skew

• Image Clean up

• Pre-defined Areas

Data Capture - ICR

Typical Workflow

Character inspection

• Tiling

• High Confidence

• Operator Decision

• Field Context

• Tall to Short

Data Capture - ICR

Typical Workflow

Key correction

• Low Confidence

• Operator Decision

• From Context

• External Verification
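
The split between character inspection (high confidence) and key correction (low confidence) can be sketched as a simple routing rule. This is only an illustration, not the supplier's implementation: the `Recognised` record, the `route` function, the queue names and the 0.90 threshold are all assumptions made for the example, since the slides give no threshold value.

```python
from dataclasses import dataclass

# Hypothetical threshold: the slides do not state a value.
HIGH_CONFIDENCE = 0.90


@dataclass
class Recognised:
    field: str         # name of the form field the character came from
    char: str          # character proposed by the ICR engine
    confidence: float  # engine confidence in the range 0.0 to 1.0


def route(item: Recognised, threshold: float = HIGH_CONFIDENCE) -> str:
    """Return the name of the queue a recognised character is sent to."""
    if item.confidence >= threshold:
        # Checked in bulk by an operator viewing tiles of like characters.
        return "character_inspection"
    # Re-keyed by an operator who sees the character in its field context.
    return "key_correction"


results = [
    Recognised("age", "4", 0.99),
    Recognised("age", "7", 0.62),
    Recognised("surname", "B", 0.91),
]
queues = [route(r) for r in results]
print(queues)
```

In practice the threshold would likely be tuned per field type, because a lower threshold sends less work to key correction but lets more substitutions through to inspection.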

Data Capture - ICR

Typical Workflow

Key Correction

• ASCII File

• CSV Format

• 1 Line/Form

• CSPro Import
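
The output format described above (an ASCII file in CSV format, one line per form) might look like the following sketch. The field names and sample values are hypothetical, and whether a header row is wanted depends on how the CSPro import is configured, so treat the header as an assumption too.

```python
import csv
import io

# Hypothetical field names; a real layout follows the questionnaire
# and the data dictionary used for the CSPro import.
FIELDS = ["form_id", "household_size", "dwelling_type"]

forms = [
    {"form_id": "000001", "household_size": "4", "dwelling_type": "2"},
    {"form_id": "000002", "household_size": "1", "dwelling_type": "5"},
]


def export_csv(records, fields):
    """Write a header row plus one CSV line per form; return the text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


output = export_csv(forms, FIELDS)
print(output)
```

Writing to an in-memory buffer keeps the example self-contained; a production export would write to a file opened with an ASCII-compatible encoding.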


Data Capture - ICR

Typical Workflow

ICR


Data Capture - ICR

Accuracy

This is always the first question

Handprint

• Numeric only in isolated fields 98%

• Numeric only in semi constrained fields 95-96%

• Alpha upper case only 90%

• Alpha lower case only 85-87%

• Alpha mixed case 75-80%

• Alpha/Numeric mixed case 50% or less

– reduce by 5% if there are special characters (anything other than a-z and 0-9)

The accuracy level after data correction (i.e. the final output accuracy) should be 100% (subject to good operators)
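
To get a feel for what these rates mean for operator workload, the sketch below estimates how many characters per batch would be misrecognised. It assumes the percentages are per-character rates (the slide does not say whether they are per character or per field), takes the lower end of each range, and uses made-up character counts.

```python
# Recognition rates in percent, lower end of each range on the slide.
# Treating them as per-character rates is an assumption.
RATES_PCT = {
    "numeric_isolated": 98,
    "numeric_semi_constrained": 95,
    "alpha_upper": 90,
    "alpha_lower": 85,
    "alpha_mixed": 75,
    "alphanumeric_mixed": 50,
}


def expected_errors(char_counts, rates_pct=RATES_PCT):
    """Expected number of misrecognised characters for each field type."""
    return {
        kind: count * (100 - rates_pct[kind]) / 100
        for kind, count in char_counts.items()
    }


# Hypothetical batch: character counts by field type.
batch = {"numeric_isolated": 40_000, "alpha_upper": 30_000}
errors = expected_errors(batch)
print(errors)
```

Even with fewer characters, the upper-case alpha fields in this made-up batch produce several times more expected errors than the numeric fields, which is one reason to restrict free-text responses where possible.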


Data Capture - ICR

Accuracy continued…

The accuracy of all modern ICR engines is broadly comparable

The major differences between suppliers' solutions are the methods and workflow used in each offering

False positive detection takes 10 times longer than entry of characters recognised with low confidence – false positives (substitutions) are the most expensive errors
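
The 10-to-1 ratio can be turned into a rough operator-time estimate. In this sketch only the factor of 10 comes from the slide; the two seconds per low-confidence character and the error counts are hypothetical figures chosen for illustration.

```python
# Hypothetical keying time for one low-confidence character, in seconds.
SECONDS_PER_LOW_CONFIDENCE = 2
# From the slide: finding a false positive takes 10 times longer.
FALSE_POSITIVE_FACTOR = 10


def operator_seconds(low_confidence, false_positives,
                     base=SECONDS_PER_LOW_CONFIDENCE):
    """Total operator time spent on both kinds of error."""
    keying = low_confidence * base
    detection = false_positives * base * FALSE_POSITIVE_FACTOR
    return keying + detection


# 100 substitutions cost as much time as 1,000 low-confidence characters.
total = operator_seconds(low_confidence=1000, false_positives=100)
print(total)
```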


Data Capture - ICR

Accuracy continued…

Accuracy can be improved by:

• Restricting the responses to any given question

• Using external verification

• Using multiple ICR engines to ‘vote’ (which is expensive)

• Training your ICR engines on local handwriting styles (if possible)
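
Voting between engines can be sketched as a strict-majority rule: accept a character only when more than half of the engines agree, otherwise pass it to an operator. The `vote` function and the sample engine outputs are illustrative assumptions; real products may weight engines by confidence instead.

```python
from collections import Counter


def vote(candidates):
    """Return the strict-majority character, or None if there is none.

    `candidates` holds one proposed character per ICR engine. None means
    the character should go to an operator for a decision.
    """
    if not candidates:
        return None
    char, count = Counter(candidates).most_common(1)[0]
    return char if count > len(candidates) / 2 else None


agreed = vote(["7", "7", "1"])    # two of three engines agree
split = vote(["7", "1", "I"])     # no agreement
tied = vote(["7", "7", "1", "1"])  # tie between two readings
print(agreed, split, tied)
```

Each extra engine adds licence and processing cost, which is the expense the slide refers to.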


Data Capture - ICR

Advantages

• No specialist hardware required

• An image archive can be automatically produced of every form

• Very high speed scanning can be achieved

• Both OMR and ICR can be interpreted using ICR software

• Forms designed for ICR are relatively easy to fill in. Locally printed forms can be used.

• Allows capturing much more complex data than with OMR alone


Data Capture - ICR

Disadvantages

• Significant hardware/software and trained IT staff will be required

• Accuracy dependent on manual intervention

• High calibre IT staff are required to support the ICR system

• More complex cost/benefit analysis than with OMR alone


Data Capture - ICR

Indicative Costs

For a census of a population of 65 million (20M single-sided A4 household forms)

Processing period of 12 Weeks (8 hours/day 5 days/week)

• Hardware $800k-$1M in total

• Software $700k-$1.3M in total

Total Indicative Costs are $1.5M to $2.3M

• No. of Staff 100-190 in total

– 6-10 Managers

– 94-180 PC Operators
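
The figures above imply a required throughput and a per-form cost, worked through in the sketch below. All inputs come from the slide; the per-form cost covers hardware and software only, since the slide gives no staff costs.

```python
FORMS = 20_000_000
WEEKS, DAYS_PER_WEEK, HOURS_PER_DAY = 12, 5, 8

working_days = WEEKS * DAYS_PER_WEEK             # 60 days
forms_per_day = FORMS // working_days            # whole forms per day
forms_per_hour = forms_per_day // HOURS_PER_DAY  # whole forms per hour

# Hardware plus software totals from the slide, in US dollars.
COST_LOW = 800_000 + 700_000
COST_HIGH = 1_000_000 + 1_300_000
cost_per_form_low = COST_LOW / FORMS
cost_per_form_high = COST_HIGH / FORMS

print(working_days, forms_per_day, forms_per_hour)
print(cost_per_form_low, cost_per_form_high)
```

That is roughly 333,000 forms per day, or about 42,000 per hour across the whole operation, at an indicative 7.5 to 11.5 US cents per form for hardware and software.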


Data Capture - ICR

Summary

ICR offers considerable flexibility at the cost of requiring more highly skilled IT personnel

The single most important factor for timely and accurate data capture is to make sure

‘the forms are filled in correctly and are returned in good condition’


UNSD Census Workshop Day 2 - Session 7

Thank you for listening

Andy Tye – International Manager
