transform text detection and recognition from natural...

Deep Text Recognition Text Detection / Localization Text Detection and Recognition from Natural Scene using Stroke Width Transform and Deep Feature Classification Ishtiak Zaman, David Crandall, School of Informatics and Computing, Indiana University Introduction ● In a natural scenery, there could be multiple instances of text that an agent may want to read. ● We detect and recognize text from an image. ● We use stroke width transform [1] with grouping and filtering to detect and localize characters. ● We extract the deep features of each character and classify the characters using a trained SVM [5] . ● All recognized characters are grouped together to get the text string. Stroke width transform [1] Edge Map Each character is a single group of pixels ● Deep feature: Overfeat library [3] extracts the deep features of the training images. ● Training: We trained a multiclass SVM [5] with the extracted deep features. ● Classify: Overfeat library [3] to extract the features and trained SVM [5] model to classify. ● Dataset: char74k datasets [2] with 7705 natural images with 62 classes (0-9, A-Z, a-z). Detected Text String: Input Image: Summary & Future Work ● Able to detect all the characters most of the time. ● Recognizes English numbers and letters correctly. ● False positive for foliage/texture similar to text. ● Cannot detect cursive text. ● Detects dark text on light background only. ● Future Work: To recognize the characters using deep learning which would eliminate the false positives. ● Research on cursive and light text on dark background. References 1. B. Epshtein, E. Ofek, and Y.Wexler. Detecting text in natural scenes with stroke width transform. In CVPR, 2010. 2. http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/ 3. http://cilvr.nyu.edu/doku.php?id=software:overfeat:start 4. J. Canny, “A Computational Approach To Edge Detection”, IEEE Trans, 1986. 5. https://www.cs.cornell.edu/people/tj/svm_light/svm_multiclass.html 7 layers of filtering filters out inconsistent items Text have the most consistent stroke width ● We generate edge map with Canny edge detector [4] . ● For each edge pixel p, we search in the gradient direction of p for another edge pixel q. If gradient direction of q is opposite of p, all the pixels within the search ray has width of |p-q|. ● Text strokes have consistent width. Stroke Width Transform [1] p q w=|p-q| w Fig 1: Fig 2: Typical Text Stroke Figure Reprinted from: http://www.fts4buses.com/vehicles/2002-international-bus-8135 Figure Reprinted from: http://www.stonemascots.com/v/vspfiles/photos/2997-4.jpg UNIVERSITY INDIANA 6 BUS SCHOOL Fig: Sample Char74k dataset

Upload: others

Post on 25-May-2020

16 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: transform Text Detection and Recognition from Natural ...vision.soic.indiana.edu/b657/sp2016/projects/izaman/poster.pdfText Detection and Recognition from Natural Scene using Stroke

Deep Text RecognitionText Detection / Localization

Text Detection and Recognition from Natural Scene using Stroke Width Transform and Deep Feature ClassificationIshtiak Zaman, David Crandall, School of Informatics and Computing, Indiana University

Introduction

● In a natural scenery, there could be multiple instances of text that an agent may want to read.

● We detect and recognize text from an image.● We use stroke width transform[1] with grouping and

filtering to detect and localize characters.● We extract the deep features of each character and

classify the characters using a trained SVM[5].● All recognized characters are grouped together to

get the text string.

Stroke width transform[1]

Edge Map

Each character is a single group of pixels

● Deep feature: Overfeat library[3]

extracts the deep features of the training images.● Training: We trained a multiclass SVM[5] with the

extracted deep features.● Classify: Overfeat library[3] to extract the features

and trained SVM[5] model to classify.

● Dataset: char74k datasets[2] with 7705 natural images with 62 classes (0-9, A-Z, a-z).

Detected Text String:Input Image:Summary & Future Work

● Able to detect all the characters most of the time. ● Recognizes English numbers and letters correctly.● False positive for foliage/texture similar to text.● Cannot detect cursive text.● Detects dark text on light background only.● Future Work: To recognize the characters using deep

learning which would eliminate the false positives.● Research on cursive and light text on dark background.

References1. B. Epshtein, E. Ofek, and Y.Wexler. Detecting text in natural scenes with

stroke width transform. In CVPR, 2010.2. http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/3. http://cilvr.nyu.edu/doku.php?id=software:overfeat:start4. J. Canny, “A Computational Approach To Edge Detection”, IEEE Trans, 1986.5. https://www.cs.cornell.edu/people/tj/svm_light/svm_multiclass.html

7 layers of filtering filters out

inconsistent items

Text have the most consistent stroke width

● We generate edge map with Canny edge detector[4].

● For each edge pixel p, we search in the gradient direction of p for another edge pixel q. If gradient direction of q is opposite of p, all the pixels within the search ray has width of |p-q|.

● Text strokes have consistent width.

Stroke Width Transform[1]

w=|p-q|

Fig 1:

Fig 2:

Typical Text Stroke

Figure Reprinted from: http://www.fts4buses.com/vehicles/2002-international-bus-8135

Figure Reprinted from: http://www.stonemascots.com/v/vspfiles/photos/2997-4.jpg

UNIVERSITY

INDIANA

BUS

SCHOOL