Optical Data Capture: Optical Character Recognition (OCR), Intelligent Character Recognition (ICR), Intelligent Recognition

UNSD Regional Workshop on Census Data Processing for the English speaking African Countries: Contemporary technologies for data capture, methodology and practice of data editing

Dar es Salaam, Tanzania, 9-13 June 2008





Page 2: Summary

Concept/Definition; Forms Design; Scanners & Software; Storage; Accuracy; OCR/ICR Advantages and Disadvantages; Intelligent Recognition (IR); Commercial Suppliers

Page 3: Definition/Concept of OCR

OCR gives scanning and imaging systems the ability to turn images of machine-printed characters into machine-readable characters.

Images of the machine-printed characters are extracted from a bitmap of the scanned image.
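As a rough illustration of turning a bitmap into machine-readable characters, the sketch below cuts fixed-width glyphs out of a tiny black-and-white bitmap and matches each one against stored templates. The 3x5 templates, the fixed glyph spacing, and all function names are assumptions made for this example; commercial recognition engines use far richer models.

```python
# Hypothetical 3x5 glyph templates ('#' = ink, '.' = background).
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}
GLYPH_W, GAP, GLYPH_H = 3, 1, 5  # assumed fixed glyph size and spacing


def render(text):
    """Build a demo bitmap by placing template glyphs side by side."""
    return [("." * GAP).join(TEMPLATES[c][r] for c in text) for r in range(GLYPH_H)]


def split_glyphs(bitmap):
    """Extract each character image from the bitmap of the scanned line."""
    step = GLYPH_W + GAP
    count = (len(bitmap[0]) + GAP) // step
    return [[row[i * step:i * step + GLYPH_W] for row in bitmap] for i in range(count)]


def hamming(a, b):
    """Count the pixels that differ between two glyphs of equal size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph):
    """Return the template character closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


def read_line(bitmap):
    """Turn a bitmap of printed characters into a machine-readable string."""
    return "".join(recognize(g) for g in split_glyphs(bitmap))


print(read_line(render("107")))
```

Nearest-template matching tolerates a few flipped pixels, which is why a slightly noisy glyph is still read correctly in this toy setting.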

Page 4: Definition/Concept of ICR

ICR gives scanning and imaging systems the ability to turn images of handwritten characters into machine-readable characters.

Images of the handwritten characters are extracted from a bitmap of the scanned image.

Page 5: OCR and ICR Differences

OCR is less accurate than OMR but more accurate than ICR.

ICR output will require editing to achieve high data coverage.
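One common way to organize such editing is to accept recognized characters whose confidence is high and send the rest to an operator. The sketch below assumes a recognition engine that reports a confidence between 0 and 1 for each character; the `CharResult` type, the field names, and the 0.90 threshold are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class CharResult:
    field: str         # questionnaire field the character belongs to
    value: str         # character proposed by the engine
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical engine


def route(results, threshold=0.90):
    """Split results into auto-accepted ones and ones needing manual editing."""
    accepted, to_edit = [], []
    for r in results:
        (accepted if r.confidence >= threshold else to_edit).append(r)
    return accepted, to_edit


results = [
    CharResult("age", "3", 0.99),
    CharResult("age", "8", 0.62),  # doubtful: could be a 3, 6 or 8
    CharResult("sex", "F", 0.95),
]
accepted, to_edit = route(results)
print(len(accepted), "accepted,", len(to_edit), "sent to editing")
```

Raising the threshold sends more characters to editing and lowers the risk of accepting a wrong one; the right balance has to be found by testing on real forms.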

Page 6: Forms

OCR/ICR has less strict form design requirements than OMR:

No timing tracks

Registration marks are used

ICR requires hand-printed entries in boxes, with one alphanumeric character per box.
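Because each character sits in its own box, a field can be cut into single-character images using the box geometry from the form definition. The following is a minimal sketch under that assumption; the coordinates, box sizes, and function names are made up for illustration.

```python
def box_rects(x0, y0, box_w, box_h, count, gap=0):
    """Return (left, top, right, bottom) for each box in a row of boxes."""
    rects = []
    for i in range(count):
        left = x0 + i * (box_w + gap)
        rects.append((left, y0, left + box_w, y0 + box_h))
    return rects


def crop(bitmap, rect):
    """Cut one box out of a bitmap stored as a list of pixel rows."""
    left, top, right, bottom = rect
    return [row[left:right] for row in bitmap[top:bottom]]


# A hypothetical 3-digit "age" field: boxes 8 pixels wide, 12 high, 2 apart.
for rect in box_rects(x0=10, y0=20, box_w=8, box_h=12, count=3, gap=2):
    print(rect)
```

Each cropped box can then be passed to the recognition engine on its own, which is what makes the one-character-per-box rule so helpful for handprint.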

Page 7: OCR Forms

OCR/ICR is more flexible since:

No timing tracks are required

The image can float on a page

The use of drop-out color reduces the size of the scanner's output and enhances accuracy.

ICR/OCR technology often uses registration marks on the four corners of a document in the recognition of an image.
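The reason registration marks let the image float on the page can be sketched as follows: once the marks are located in the scan, their positions give the shift, rotation and scale needed to map field coordinates from the form template onto the scanned image. This example uses only two of the four corner marks and assumes their positions have already been detected; the function name and the coordinates are hypothetical.

```python
import math


def make_mapper(exp_tl, exp_tr, found_tl, found_tr):
    """Build a function mapping template coordinates to scan coordinates.

    exp_*   : expected positions of the top-left / top-right marks (template)
    found_* : positions where those marks were detected in the scan
    """
    ex, ey = exp_tr[0] - exp_tl[0], exp_tr[1] - exp_tl[1]
    fx, fy = found_tr[0] - found_tl[0], found_tr[1] - found_tl[1]
    angle = math.atan2(fy, fx) - math.atan2(ey, ex)     # page rotation
    scale = math.hypot(fx, fy) / math.hypot(ex, ey)     # resolution change
    c, s = math.cos(angle) * scale, math.sin(angle) * scale

    def to_scan(pt):
        dx, dy = pt[0] - exp_tl[0], pt[1] - exp_tl[1]
        return (found_tl[0] + c * dx - s * dy, found_tl[1] + s * dx + c * dy)

    return to_scan


# Page shifted by (5, 7) pixels with no rotation.
to_scan = make_mapper((0, 0), (100, 0), (5, 7), (105, 7))
print(to_scan((10, 20)))
```

In practice the remaining two marks could serve as a consistency check, for example to detect a folded or badly fed sheet.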


Page 9: OCR/ICR Scanners and Software

Forms are scanned through a scanner, and then the recognition engine of the OCR/ICR system interprets the images and turns images of handwritten or printed characters into ASCII data (machine-readable characters).

Users can scan forms without performing the OCR.

Speeds range from 85 to 160 sheets/min (dependent on the recognition engine).
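To see what this speed range means for planning, the sketch below converts a number of sheets into scanner hours at the two ends of the range. The figure of 1,000,000 sheets is a made-up example, and the calculation assumes the rated speed is sustained, which real operations (batch handling, jams, rescans) would likely not achieve.

```python
def scanning_hours(total_sheets, sheets_per_min, scanners=1):
    """Hours needed to scan total_sheets at a sustained rated speed."""
    return total_sheets / (sheets_per_min * scanners) / 60


sheets = 1_000_000  # hypothetical workload, not from the presentation
for speed in (85, 160):
    print(speed, "sheets/min:", round(scanning_hours(sheets, speed), 1), "hours")
```

For this hypothetical workload a single scanner needs roughly 196 hours at 85 sheets/min and roughly 104 hours at 160 sheets/min, so the choice of recognition engine and scanner count directly affects the processing schedule.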

Page 10: OCR/ICR Storage Characteristics

Storage/Retrieval:

Images are scanned, stored and maintained electronically.

There is no need to store the paper forms as long as the electronic files are safeguarded.

With OCR/ICR technologies, images can be scanned, indexed, and written to optical media.

Page 11: Ideal OCR/ICR Accuracy Thresholds

Accuracy:

The accuracy achieved by data entry clerks (~99.5%) is approximately equal to that of OCR/ICR in perfect tuning (~99.5%).

Up to 99.9% accuracy with editing (like OMR)

The recognition engine must be tuned, tested and validated very carefully.
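The difference between 99.5% and 99.9% looks small but matters at census scale. The short calculation below shows the expected number of wrong characters at each accuracy level; the volume of 10,000,000 captured characters is an assumed figure chosen only to make the arithmetic concrete.

```python
def expected_errors(total_chars, accuracy):
    """Expected number of wrongly captured characters at a given accuracy."""
    return total_chars * (1.0 - accuracy)


chars = 10_000_000  # hypothetical volume, not from the presentation
for accuracy in (0.995, 0.999):
    print(f"{accuracy:.1%} accuracy: about {round(expected_errors(chars, accuracy)):,} errors")
```

Under this assumption, editing that lifts accuracy from 99.5% to 99.9% cuts the expected errors from about 50,000 to about 10,000, a five-fold reduction.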

Page 12: OCR/ICR Advantages

Recognition engines used with imaging can capture highly specialized data sets.

OCR/ICR recognizes machine-printed or hand-printed characters.

Scanning and recognition allow efficient management and planning for the rest of the processing workload.

Quick retrieval for editing and reprocessing

Page 13: OCR/ICR Disadvantages

Technology is costly

May require significant manual intervention

Additional workload for data collectors

ICR has severe limitations when it comes to human handwriting:

Characters must be hand-printed or machine-printed as separate characters in boxes

It is ineffective when dealing with cursive characters

Page 14: OMR-OCR/ICR Compared

Page 15: OCR/ICR Challenges/Issues

Has issues corresponding to those of OMR

Algorithm development (preparation of memory dictionary)

Processing time considerations due to the recognition engine

Development costs

Page 16: Definition/Concept of IR

State-of-the-art recognition technology

Gives scanning and imaging systems the ability to turn images of handwritten and cursive characters into machine-readable characters

Images of the handwritten and cursive characters are extracted from a bitmap of the scanned image

The ability to capture cursive makes this method unique

Page 17: Definition/Concept of IR

The eight elements that make up the trajectories of all cursive letters (figure 1)

Photo: Parascript LLC

Page 18: Definition/Concept of IR

Intelligent Recognition dynamically uses context:

Context is used during the recognition process, improving the accuracy of results

Context helps to identify letters where the symbol segmentation of an image is ambiguous

Photo: Parascript LLC
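The role of context when segmentation is ambiguous can be illustrated with a toy example: in cursive, the strokes of "rm" can also be segmented as "nn", so the engine holds several readings of the same image and lets a list of valid answers decide. This is only a sketch of the general idea, not the method of any particular vendor; the hypotheses, scores, and lexicon below are invented.

```python
def pick_with_context(hypotheses, lexicon):
    """Choose the best-scoring reading that is valid in context.

    hypotheses: list of (text, score) pairs from different segmentations
    lexicon:    set of values allowed for the field (the context)
    Falls back to the best raw reading if none is in the lexicon.
    """
    in_context = [(score, text) for text, score in hypotheses if text in lexicon]
    pool = in_context or [(score, text) for text, score in hypotheses]
    return max(pool)[1]


# Three readings of one handwritten word in a hypothetical "occupation" field.
hypotheses = [("fanner", 0.45), ("farmer", 0.40), ("famer", 0.15)]
occupations = {"farmer", "teacher", "driver"}

print(pick_with_context(hypotheses, set()))        # no context available
print(pick_with_context(hypotheses, occupations))  # context applied
```

Without context the highest-scoring segmentation wins even though it is not a valid answer; with the field's lexicon, the slightly lower-scoring but valid reading is chosen.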

Page 19: Technology Evolution

[Figure: technology evolution from OCR to ICR to Intelligent Recognition, shown against text styles and form types]

Text styles: Machine Print; Constrained Handprint; Unconstrained Handprint; Bad quality machine print; Cursive

Form types: Specially designed for automatic recognition; Constraining boxes or combs; Drop-out ink for preprinted text & boxes

Form types: No special form design; No constraining boxes or combs; Condensed strings; Dirty & noisy forms; Bad quality paper; Legacy forms

Illustration: Conference on Technology Options for 2011 Census

Page 20: Major Commercial Suppliers

Top Image Systems (TIS) (http://www.topimagesystems.com)

ReadSoft (http://www.readsoft.com)

Teleform (http://www.intelliscan.com/TeleForm1.htm)

Scanner suppliers: Fujitsu, Canon, Bell & Howell, Kodak

Page 21: THANK YOU!