![Page 1: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/1.jpg)
Optical Character
Recognition
( OCR )
Karan Panjwani
T.E – B , 68
Guided By :
Prof. Shalini Wankhade
![Page 2: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/2.jpg)
Contents
Definition
Introduction To OCR
Problem Overview
Uses
Types
Steps in OCR
Accuracy
Software Implementation
Pros and Cons
Research
![Page 3: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/3.jpg)
Definition
Optical Character Recognition (OCR) is the
mechanical or electronic conversion of images of
typewritten or printed text into machine-encoded
text.
![Page 4: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/4.jpg)
Introduction to OCR
1 2 3 4 5 6 7 8 9 0
![Page 5: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/5.jpg)
Problem overview
Humans are bound to make errors at some point, especially while
continuously performing mundane, repetitive tasks such as digitization or security.
Often we are unable to perceive certain digits because of factors such as
motion, lack of digit clarity, and illumination.
These are the problems that led us to delve into this topic.
![Page 6: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/6.jpg)
USES
It is widely used as a form of data entry from printed
paper records, whether passport documents,
invoices, bank statements, business cards, mail, or other
documents.
It is a common method of digitizing printed texts so that
they can be electronically edited, searched, stored more
compactly, displayed online, and used in machine
processes such as machine translation, text-to-speech,
key data extraction, and text mining.
![Page 7: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/7.jpg)
TYPES
1) Optical Character Recognition (OCR) -
Targets typewritten text, one glyph or character at a time.
2) Optical Word Recognition (OWR) -
Targets typewritten text, one word at a time (for languages that use a space as a word divider).
3) Intelligent Character Recognition (ICR) -
Targets handwritten print script or cursive text, one glyph or character at a time, usually involving machine learning.
![Page 8: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/8.jpg)
TYPES (contd.)
4) Intelligent Word Recognition (IWR) -
Targets handwritten print script or cursive text, one
word at a time.
This is especially useful for languages where glyphs
are not separated in cursive script.
![Page 9: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/9.jpg)
Steps in OCR
![Page 10: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/10.jpg)
Steps in OCR
![Page 11: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/11.jpg)
Pre-processing
• Deals with improving the quality of the image for better recognition by the system. OCR software often "pre-processes" images to improve the chances of successful recognition.
Techniques include:
• De-skew
• Despeckle
• Binarization
• Line removal
• Zoning
• Line and word detection
• Script recognition
• Segmentation
• Normalize aspect ratio and scale
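As an illustration of one technique from the list above, binarization, here is a minimal sketch in Python. It uses Otsu's method to pick a global threshold, which is an assumption made for this example; the slides do not say which binarization method is intended. The names `otsu_threshold` and `binarize` are hypothetical, and the image is a plain list of grayscale rows rather than a real scan.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0..255 (assumed format)


def otsu_threshold(image: Image) -> int:
    """Pick the threshold t that maximizes the between-class variance.

    Pixels with value <= t are treated as one class (dark), the rest as the other.
    """
    hist = [0] * 256
    for row in image:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of intensities of those pixels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet, variance undefined
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image: Image) -> List[List[int]]:
    """Return a 0/1 image where 1 marks dark (ink) pixels and 0 marks background."""
    t = otsu_threshold(image)
    return [[1 if px <= t else 0 for px in row] for row in image]


if __name__ == "__main__":
    # A tiny hypothetical "scan": a dark diagonal stroke on a light page.
    scan = [
        [250, 250, 20],
        [240, 30, 245],
        [10, 250, 250],
    ]
    for row in binarize(scan):
        print(row)
```

A global threshold like this tends to work when illumination is even; unevenly lit pages usually need a locally adaptive threshold instead.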
![Page 12: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/12.jpg)
Character Recognition
There are two basic types of core OCR algorithm, which may produce a ranked list of candidate characters.
• Matrix matching involves comparing an image to a stored glyph on a pixel-by-pixel basis; it is also known as “pattern matching”. This relies on the input glyph being correctly isolated from the rest of the image, and on the stored glyph being in a similar font and at the same scale. This technique works best with typewritten text and does not work well when new fonts are encountered.
• Feature extraction decomposes glyphs into “features” like lines, closed loops, line direction, and line intersections. Feature extraction serves two purposes: the first is to extract properties that can identify a character uniquely; the second is to extract properties that can differentiate between similar characters.
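Matrix matching, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3x5 glyph templates in `TEMPLATES` are made up for the example, the input glyph is assumed to be already isolated and scaled to the template size, and the function names `match_score` and `rank_candidates` are hypothetical.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]  # rows of '#' (ink) and '.' (background)

# Hypothetical stored glyphs; a real system would hold one or more per character and font.
TEMPLATES: Dict[str, Glyph] = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def match_score(glyph: Glyph, template: Glyph) -> float:
    """Fraction of pixels that agree when comparing pixel by pixel."""
    if len(glyph) != len(template) or any(
        len(g_row) != len(t_row) for g_row, t_row in zip(glyph, template)
    ):
        raise ValueError("glyph and template must have the same size")
    pairs = [
        (g, t)
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    ]
    return sum(g == t for g, t in pairs) / len(pairs)


def rank_candidates(
    glyph: Glyph, templates: Dict[str, Glyph] = TEMPLATES
) -> List[Tuple[str, float]]:
    """Return (character, score) pairs, best match first."""
    scores = [(label, match_score(glyph, tmpl)) for label, tmpl in templates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A "T" with one stray ink pixel in the fourth row.
    noisy_t: Glyph = ("###", ".#.", ".#.", "##.", ".#.")
    for label, score in rank_candidates(noisy_t):
        print(f"{label}: {score:.2f}")
```

The size check reflects the limitation noted above: the comparison is only meaningful when the input glyph and the stored glyph are at the same scale.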
![Page 13: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/13.jpg)
![Page 14: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/14.jpg)
Post-processing
OCR accuracy can be increased if the output is
constrained by a lexicon – a list of words that are
allowed to occur in a document. This might be, for
example, all the words in the English language, or a
more technical lexicon for a specific field. This
technique can be problematic if the document contains
words not in the lexicon, like proper nouns. Tesseract
uses its dictionary to influence the character
segmentation step, for improved accuracy.
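The lexicon idea can be sketched as follows. This is an illustrative post-correction pass and not how Tesseract uses its dictionary; the function name `constrain_to_lexicon`, the tiny lexicon, and the similarity cutoff of 0.75 are all assumptions chosen for the example. It relies on `difflib.get_close_matches` from the Python standard library.

```python
import difflib
from typing import Iterable, List


def constrain_to_lexicon(
    words: Iterable[str], lexicon: List[str], cutoff: float = 0.75
) -> List[str]:
    """Replace each word that is not in the lexicon with its closest lexicon entry.

    Words with no sufficiently similar entry are left unchanged, so that
    out-of-lexicon words such as proper nouns are not forced into a wrong match.
    """
    allowed = set(lexicon)
    corrected = []
    for word in words:
        if word in allowed:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


if __name__ == "__main__":
    lexicon = ["optical", "character", "recognition", "text"]
    ocr_output = ["optica1", "charactcr", "recognition", "Tesseract"]
    print(constrain_to_lexicon(ocr_output, lexicon))
```

The cutoff is the trade-off mentioned above: set it too low and a word missing from the lexicon, such as a proper noun, gets replaced by an unrelated entry; set it too high and genuine recognition errors go uncorrected.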
![Page 15: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/15.jpg)
Accuracy
Recognition of Latin-script, typewritten text is still not 100% accurate even where clear imaging is available. One study based on recognition of 19th- and early 20th-century newspaper pages concluded that character-by-character OCR accuracy for commercial OCR software varied from 81% to 99%; total accuracy can be achieved by human review or Data Dictionary Authentication.
Other areas, including recognition of hand printing, cursive handwriting, and printed text in other scripts, are still the subject of active research.
![Page 16: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/16.jpg)
Accuracy (contd.)
Accuracy rates can be measured in several ways, and
how they are measured can greatly affect the reported
accuracy rate.
For example, if word context (basically a lexicon of
words) is not used to correct output that contains
non-existent words, a character error rate of 1% (99%
accuracy) may result in an error rate of 5% (95%
accuracy) or worse when the measurement is based on
whether each whole word was recognized with no
incorrect letters.
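The jump from a 1% character error rate to roughly a 5% word error rate can be checked with a short calculation. The sketch below assumes that character errors are independent and that a typical word is five characters long; both are simplifying assumptions introduced here, not figures from the slides. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(char_error_rate: float, word_length: int) -> float:
    """Probability that a word contains at least one wrong character.

    Assumes each character is misrecognized independently with the same probability.
    """
    if not 0.0 <= char_error_rate <= 1.0:
        raise ValueError("char_error_rate must be between 0 and 1")
    if word_length < 0:
        raise ValueError("word_length must be non-negative")
    return 1.0 - (1.0 - char_error_rate) ** word_length


if __name__ == "__main__":
    # 1% character error rate, five-character words (assumed average length).
    rate = word_error_rate(0.01, 5)
    print(f"word error rate: {rate:.1%}")
```

Under these assumptions a word is correct only if all five characters are correct, which happens with probability 0.99^5, about 0.951, so about 4.9% of words contain an error. Longer words push the word error rate higher still.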
![Page 17: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/17.jpg)
Use of FreeOCR software
![Page 18: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/18.jpg)
![Page 19: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/19.jpg)
![Page 20: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/20.jpg)
Pros and Cons
OCR reduces the time needed to process data
from a large number of forms.
Done manually, the same work is prone to human error and takes
much more time.
Recognition of cursive text is an active area of research,
with recognition rates even lower than those for
hand-printed text.
Higher rates of recognition of general cursive script will
likely not be possible without the use of contextual or
grammatical information.
![Page 21: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/21.jpg)
Research
Recognition of cursive text is an active area
of research, with recognition rates even lower
than those for hand-printed text.
Higher rates of recognition of general cursive
script will likely not be possible without the
use of contextual or grammatical information.
For example, recognizing entire words from a
dictionary is easier than trying to parse
individual characters from script.
![Page 22: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/22.jpg)
Conclusion
• OCR technology provides fast, automated
data capture, which can save organisations
considerable time and labour costs.
• The system has advantages such as
automation of mundane tasks, low time
complexity, a very small database, and high
adaptability to untrained inputs, with only
a small number of features to calculate.
![Page 23: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/23.jpg)
![Page 24: Optical Character Recognition( OCR )](https://reader031.vdocuments.mx/reader031/viewer/2022012322/55a60f0f1a28abcb418b46e7/html5/thumbnails/24.jpg)
THANK YOU