Improve OCR Accuracy, Clean Up and Enhance Scanned Images
Post on 20-Oct-2014
DESCRIPTION
See ways to improve OCR accuracy on document scans. Cleaning and enhancing images can greatly improve the accuracy of OCR interpretations on your documents. Learn about sophisticated automatic adaptive thresholding, text smoothing, and more. Add field validation plus preview and testing features for optimal OCR interpretation.
TRANSCRIPT
![Page 1: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/1.jpg)
Improving OCR Accuracy
Clean Up and Enhance Scanned Images
![Page 2: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/2.jpg)
Cleaner Image = More Accurate OCR
![Page 3: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/3.jpg)
Your acceptable level of OCR accuracy may depend on your application
![Page 4: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/4.jpg)
Healthcare and Legal applications have
high OCR accuracy requirements.
![Page 5: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/5.jpg)
Optimizing for the highest OCR accuracy is generally divided into two phases:
• Pre-Scanning
• During Scanning
![Page 6: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/6.jpg)
Form Design
• adequate white space
• limited lines
Font Selection
• monospace fonts like Courier or sans serif fonts like Helvetica
• at least 10-13 points
Color Selection
• limited use of color
Set pre-processing standards and procedures
![Page 7: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/7.jpg)
During scanning…
Scan at a minimum of 300 dpi
and CLEAN.
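As a rough illustration of the 300 dpi guideline, the sketch below estimates a scan's effective resolution from its pixel dimensions and the physical page size. The function names and the idea of checking the lower of the two axes are assumptions made for this example; they are not part of any capture product.

```python
def effective_dpi(width_px, height_px, width_in, height_in):
    """Return the lower of the horizontal and vertical resolutions."""
    return min(width_px / width_in, height_px / height_in)


def meets_minimum_dpi(width_px, height_px, width_in, height_in, minimum=300):
    """True when the scan reaches the minimum resolution on both axes."""
    return effective_dpi(width_px, height_px, width_in, height_in) >= minimum


# A US Letter page (8.5 x 11 inches) scanned to 2550 x 3300 pixels.
print(effective_dpi(2550, 3300, 8.5, 11))
print(meets_minimum_dpi(1700, 2200, 8.5, 11))
```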
![Page 8: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/8.jpg)
Most capture applications include basic cleaning features.
![Page 9: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/9.jpg)
Go beyond the basics with DocuFi’s
![Page 10: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/10.jpg)
Adaptive thresholding assists in cleaning “dirty” documents or
documents with a colored background which interferes with the
foreground data.
Adaptive Thresholding
![Page 11: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/11.jpg)
Most scanner and capture software can apply basic thresholding
technology.
![Page 12: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/12.jpg)
Adaptive Thresholding
ImageRamp uses Adaptive Thresholding with advanced algorithms
and Sensitivity settings allowing you to optimize the thresholding for
your documents.
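To make the idea concrete, here is a minimal sketch of one common form of adaptive thresholding: each pixel is compared with the mean brightness of its local neighborhood rather than with a single page-wide cutoff. This is not ImageRamp's algorithm, which the slides do not describe. The `window` and `sensitivity` parameters, and the representation of the image as a list of rows of 0-255 values, are assumptions chosen for illustration.

```python
def adaptive_threshold(gray, window=5, sensitivity=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel becomes foreground (1) when it is darker than the mean of its
    local window by more than `sensitivity`; otherwise it is background (0).
    """
    height, width = len(gray), len(gray[0])

    # Integral image: integral[y][x] holds the sum of gray[0..y-1][0..x-1],
    # so any window sum can be read with four lookups.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            local_mean = total / ((y1 - y0) * (x1 - x0))
            if gray[y][x] < local_mean - sensitivity:
                binary[y][x] = 1
    return binary


# A page whose background darkens from left to right, with two "ink" pixels.
page = [[200 - 4 * x for x in range(20)] for _ in range(20)]
page[5][5] = 60
page[10][16] = 40
result = adaptive_threshold(page)
print(sum(map(sum, result)))  # number of pixels kept as foreground
```

On this sample a single global cutoff of 128 would also mark the darkest background column as ink, whereas the local comparison keeps only the two ink pixels. Raising `sensitivity` makes the filter stricter about what counts as foreground.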
![Page 13: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/13.jpg)
This option smooths the edging of text. Smoothing text fills small
pits in the edges of a character and removes small bumps on the
edges. This improves legibility and reduces storage needs.
Smooth Text
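One simple way to approximate this behavior is a neighborhood majority filter on a black-and-white image, sketched below. It is an assumed stand-in, not the product's actual method: a pixel is flipped when at least `min_disagree` of its eight neighbors have the opposite color. The image format (rows of 0 for white, 1 for black) and the parameter name are illustrative.

```python
def smooth_text(binary, min_disagree=5):
    """Fill one-pixel pits and remove one-pixel bumps on character edges."""
    height, width = len(binary), len(binary[0])

    def pixel(y, x):
        # Pixels outside the image are treated as white background.
        return binary[y][x] if 0 <= y < height and 0 <= x < width else 0

    smoothed = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            black_neighbors = sum(
                pixel(y + dy, x + dx)
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            if binary[y][x] == 0 and black_neighbors >= min_disagree:
                smoothed[y][x] = 1  # fill a pit
            elif binary[y][x] == 1 and 8 - black_neighbors >= min_disagree:
                smoothed[y][x] = 0  # remove a bump
    return smoothed


# A solid stroke with one pit in its top edge and one bump above it.
stroke = [[0] * 9 for _ in range(7)]
for y in range(2, 5):
    for x in range(1, 8):
        stroke[y][x] = 1
stroke[2][4] = 0  # pit
stroke[1][2] = 1  # bump
cleaned = smooth_text(stroke)
print(cleaned[2][4], cleaned[1][2])
```

A filter this simple also rounds sharp outer corners by one pixel, which is the usual trade-off of majority smoothing.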
![Page 14: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/14.jpg)
Dither Form Fills
Black and white printed images may use dithering, often called dot
shading, to simulate shades of gray by varying the patterns of dots.
The Dither Form Fills feature removes areas of dot shading from an
image. This function is used to make a black and white TIFF image
appear as black and white and not a grayscale image.
![Page 15: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/15.jpg)
This option finds the outermost raster data (pixels) on the page and
resizes the document to fit that content.
Reset Margins
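The margin reset can be pictured as a bounding-box crop. The sketch below assumes a black-and-white image stored as rows of 0 and 1 and a hypothetical `margin` parameter for leaving some white space around the content; both are illustrative choices.

```python
def reset_margins(binary, margin=0):
    """Crop the page to the bounding box of its black pixels."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    if not rows:
        return [row[:] for row in binary]  # nothing to crop on an empty page
    width = len(binary[0])
    cols = [x for x in range(width) if any(row[x] for row in binary)]

    top = max(0, rows[0] - margin)
    bottom = min(len(binary), rows[-1] + 1 + margin)
    left = max(0, cols[0] - margin)
    right = min(width, cols[-1] + 1 + margin)
    return [row[left:right] for row in binary[top:bottom]]


page = [[0] * 8 for _ in range(6)]
page[2][3] = 1
page[4][5] = 1
print(reset_margins(page))
```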
![Page 16: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/16.jpg)
Using detected text as the basis for alignment, this tool is designed
to work with scanned office documents and eliminate rescans.
Deskew or Straighten Page
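The slides do not say how the skew is measured. A widely used approach, shown here purely as an assumed illustration, is the projection profile: try a range of candidate angles, shear the black pixels back along each one, and keep the angle at which the pixels pile up into the sharpest rows. The function name, angle range, and step size are hypothetical.

```python
import math


def estimate_skew(binary, max_angle=5.0, step=0.5):
    """Estimate the slope of text lines in degrees.

    Positive values mean the lines run downward toward the right
    (image rows are numbered from the top).
    """
    points = [(x, y) for y, row in enumerate(binary)
              for x, value in enumerate(row) if value]
    steps = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1.0
    for i in range(-steps, steps + 1):
        angle = i * step
        slope = math.tan(math.radians(angle))
        profile = {}
        for x, y in points:
            # Undo the candidate slope, then count pixels per row.
            row = round(y - x * slope)
            profile[row] = profile.get(row, 0) + 1
        # Well-aligned text concentrates pixels in few rows: high score.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Two synthetic "text lines" drawn with a 2 degree skew.
slope = math.tan(math.radians(2.0))
page = [[0] * 200 for _ in range(70)]
for baseline in (20, 50):
    for x in range(200):
        page[round(baseline + x * slope)][x] = 1
print(estimate_skew(page))
```

Once the angle is known, the page would be rotated by the opposite amount with an imaging library of your choice.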
![Page 17: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/17.jpg)
This selection detects and removes lines which may interfere with
OCR interpretation.
Remove Lines
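A minimal sketch of line removal, assuming a black-and-white image stored as rows of 0 and 1: any unbroken horizontal run of black pixels at least `min_length` long is treated as a ruled line and erased. The threshold and function name are illustrative, and vertical lines could be handled the same way on the transposed image.

```python
def remove_horizontal_lines(binary, min_length=50):
    """Erase horizontal black runs that are too long to be part of a character."""
    cleaned = [row[:] for row in binary]
    for y, row in enumerate(binary):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_length:
                    for i in range(start, x):
                        cleaned[y][i] = 0
            else:
                x += 1
    return cleaned


form = [[0] * 60 for _ in range(5)]
for x in range(10, 14):
    form[0][x] = 1  # a short character stroke
for x in range(5, 55):
    form[2][x] = 1  # a ruled form line
result = remove_horizontal_lines(form)
print(sum(result[0]), sum(result[2]))
```

This simple version also erases the pixels where a character crosses the line, so a production tool would need to repair those strokes afterward.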
![Page 18: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/18.jpg)
Whether the scan picked up contamination or the original was in poor
condition, this option removes extraneous black specks and fills in
white holes on black areas of an image.
Remove Noise or Despeckle
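Both halves of this cleanup can be sketched with connected components: black blobs below a size limit are specks, and white blobs below the same limit are holes. The code below is an assumed illustration using 4-connectivity and a hypothetical `max_size` parameter, not the product's implementation.

```python
def remove_small_components(binary, max_size, target):
    """Flip every 4-connected group of `target` pixels of at most max_size pixels."""
    height, width = len(binary), len(binary[0])
    result = [row[:] for row in binary]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if seen[y][x] or binary[y][x] != target:
                continue
            # Flood fill to collect the whole component.
            stack, component = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and binary[ny][nx] == target):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) <= max_size:
                for cy, cx in component:
                    result[cy][cx] = 1 - target
    return result


def despeckle(binary, max_size=2):
    """Remove small black specks, then fill small white holes."""
    without_specks = remove_small_components(binary, max_size, target=1)
    return remove_small_components(without_specks, max_size, target=0)


page = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        page[y][x] = 1
page[3][3] = 0  # white hole inside a black area
page[0][7] = 1  # stray black speck
cleaned = despeckle(page)
print(cleaned[3][3], cleaned[0][7])
```

The size limit needs care at low resolutions, where periods and the dots on "i" and "j" can be only a few pixels across.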
![Page 19: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/19.jpg)
Auto Rotate automatically evaluates orientation based on the text
and rotates misoriented pages. Optionally, select a degree of
rotation for ImageRamp to rotate all pages based on the selection.
Auto Rotate and Rotate Pages
![Page 20: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/20.jpg)
This can be used to eliminate unnecessary blank pages in a
document and make the file size smaller. Blank page detection can
also play a role in file splitting. Many users divide documents in a
scanning stack with blank pages and ImageRamp can be set to split
the stack of documents into multiple files when blanks are detected.
Remove Blank Pages
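The splitting workflow described above can be sketched as follows. Blank detection here is a simple ink-ratio test with an assumed threshold (`max_ink_ratio`); real capture software is likely more tolerant of noise and bleed-through, and the function names are hypothetical.

```python
def is_blank(page, max_ink_ratio=0.005):
    """Treat a page as blank when almost none of its pixels are black."""
    total = sum(len(row) for row in page)
    ink = sum(sum(row) for row in page)
    return total == 0 or ink / total <= max_ink_ratio


def split_on_blank_pages(pages, max_ink_ratio=0.005):
    """Drop blank pages and start a new document after each one."""
    documents, current = [], []
    for page in pages:
        if is_blank(page, max_ink_ratio):
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


blank = [[0] * 10 for _ in range(10)]
text = [[1] * 10 for _ in range(2)] + [[0] * 10 for _ in range(8)]
stack = [text, text, blank, text, blank, blank, text]
print([len(document) for document in split_on_blank_pages(stack)])
```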
![Page 21: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/21.jpg)
Besides cleaning and enhancing the image, ImageRamp has other ways
to improve OCR accuracy.
![Page 22: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/22.jpg)
OCR with validation during processing is a very powerful way to catch entries that do not meet a specific format rule.
For instance, if an inventory item number should contain three alpha characters followed by five numbers, any document whose OCR result for that field does not match the pattern can be tagged for manual inspection before further processing is done.
Field Validation Improves Accuracy.
PEN21096
CAP36581
INV98453
PA568793
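The rule from this slide (three alpha characters followed by five numbers) maps directly onto a regular expression. The sketch below is an illustration only: the helper name is hypothetical, and it assumes the letters are uppercase, as in the sample values above.

```python
import re

# Three uppercase letters followed by five digits (uppercase is an assumption).
ITEM_NUMBER = re.compile(r"[A-Z]{3}[0-9]{5}")


def needs_manual_review(values, pattern=ITEM_NUMBER):
    """Return the OCR results that do not match the expected field format."""
    return [value for value in values if not pattern.fullmatch(value.strip())]


ocr_results = ["PEN21096", "CAP36581", "INV98453", "PA568793"]
print(needs_manual_review(ocr_results))
```

Of the four sample values, only `PA568793` fails the rule, since its third character is a digit, so it is the one that would be routed to manual inspection.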
![Page 23: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/23.jpg)
ImageRamp offers extensive preview and testing options to fine-tune settings. Additionally, ImageRamp offers PDF or TIFF output, which may differ in OCR accuracy.
![Page 24: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/24.jpg)
Wrap up: 3 Ways to Improve OCR Accuracy
• Set Pre-Processing Standards
• Scan at 300+ dpi
• Capture with Clean-up
![Page 25: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/25.jpg)
Pre-Processing Standards
Good pre-processing can be as important as the scanning technologies.
Encourage accuracy by setting document procedures and guidelines to:
• Use adequate white space
• Limit lines and gridlines
• Limit the use of color
• Use OCR friendly fonts and sizes
![Page 26: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/26.jpg)
Use an Intelligent Capture Solution such as ImageRamp
![Page 27: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/27.jpg)
Learn More about Document Imaging and Capture
![Page 28: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/28.jpg)
For more on:
• Clean scans
• Ways to improve OCR scanning
• Cleaning documents for scanning
• Enhancing your images for improved OCR
• Watching folders
• Batch Processing
• Bulk scanning
• Split files with barcodes
• Barcode splitting
• Docufi
• Imageramp
• Watch folders
• Data capture
• Intelligent Data Capture
Contact Us
DocuFi, makers of ImageRamp, Document Management Capture Solution
30 years’ experience in the Document Imaging market.
ImageRamp Cleanup and Enhance for OCR
www.docufi.com
603-685-4033
Copyright ©2014
![Page 29: Improve OCR Accuracy, Clean Up and Enhance Scanned Images](https://reader033.vdocuments.mx/reader033/viewer/2022050804/5444efd3b1af9fdd748b45ff/html5/thumbnails/29.jpg)
Image Credits
• Tim Evanson, “Albert V Bryan Federal District Courthouse - Alexandria Va - 0014 - 2012-03-10”, http://bit.ly/1iGIBpF
• takacsi75, “Medicine 02”, http://bit.ly/1dtsIxK
• ToastyKen, “New Mophead”, http://bit.ly/1ijjkkD
• mjtmail (tiggy), “Day 307”, http://bit.ly/1g4G3Bw