www.ijsret.org
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278-0882, Volume 4, Issue 5, May 2015
Scene Text Detection Using Machine Learning Classifiers
Nafla C.N.1, Sneha K.2, Divya K.P.3
1 (Department of CSE, RCET, Akkikkvu, Thrissur)
2 (Department of CSE, RCET, Akkikkvu, Thrissur)
3 (Department of CSE, RCET, Akkikkvu, Thrissur)
ABSTRACT
In this paper we present an efficient method of scene text detection using two machine learning classifiers: one for generating candidate word regions and the other for classifying components as text or nontext. First, we extract connected components with the maximally stable extremal region algorithm. The resulting components are partitioned into clusters by an AdaBoost classifier based on their adjacency relationship. We then extract features from the clusters, and a support vector machine classifier labels each block as a text or nontext component.
Keywords - Connected component (CC), maximally stable extremal region (MSER), optical character recognition (OCR), support vector machine (SVM).
I. INTRODUCTION
Due to the wide availability of mobile devices with high-quality digital cameras, research areas related to these devices have received growing attention in recent decades. Text detection and extraction is one of the most important and interesting of these areas. Text present in camera-captured images is a strong source of information about the image and about the place or situation in which it was captured. Text detection and extraction from images therefore has many valuable applications.

Text present in an image or video can be classified as scene text or caption text. Scene text exists in the image naturally. Caption text refers to text that is added manually by the user. Scene text overlaps with the background, so detecting and extracting it is more difficult than detecting caption text. Compared to scanned document images, text extraction from natural scenes is not easy because the text appears in arbitrary orientations and different sizes and suffers from background interference. Examples of scene text include street signs, display boards on shops, text on vehicles and advertisement boards. Fig 1 shows examples of text in natural scene images.
Text string detection and extraction have a variety of useful applications. People who travel to foreign countries may find it difficult to understand the text on display boards. They then rely either on guides or on intelligent handheld devices to translate the information written on the boards, and text detection is an important part of such devices. Text detection can also play a crucial role in content-based visual information retrieval and content-based image retrieval, which apply computer vision techniques to the problem of image retrieval in huge databases. Another important application of scene text extraction is helping people with visual disabilities: a computerized system that conveys the text information present on objects and locations would be a great help to them. License plate detection is another important area where text detection plays a central role. It is crucial for monitoring traffic at customs checkpoints, tracking stolen cars, etc. Other significant applications of scene text detection and extraction are robotic navigation and automatic geocoding.
Fig 1: Examples of natural images with scene text
OCR is one of the technologies that can extract text characters, by identifying the corners. This can be done only if the characters are correctly separated from the background. Background interference and degradation in images decrease the performance of OCR, so its performance is comparatively low on natural scene images. Texture analysis and topic-based partition are other detection methods, but they work correctly only on document images. Text detection and extraction from natural images is not a simple task. Text may appear against a complex background, and the chances of degradation are high in natural images. As a result, text extraction from natural images involves many complexities.
The paper is organized as follows. Section II surveys existing methods of scene text detection. Section III provides details of the proposed method. Section IV concludes the paper.
II. LITERATURE SURVEY
This section covers existing scene text detection methods, which can be categorized as texture based methods, connected component based methods and hybrid methods.
2.1 Texture based methods
Texture based methods consider text as a special kind of texture and identify it using properties such as wavelet features, filter responses and local intensities.

Angadi et al. [1] described a method that uses a high pass filter working in the DCT domain to suppress the background, and uses texture properties such as homogeneity and contrast to detect text. The method comprises five phases: removal of the background in the DCT domain, derivation of the feature matrix D, block classification, merging of the blocks for text area extraction, and finally refinement of the text region.

Kim et al. [2] described a method that combines CAMSHIFT and SVM for detection and extraction of text. The raw pixel intensities that form the textural pattern are given as input to the SVM. After texture extraction, text identification is performed using CAMSHIFT.

Gllavata et al. [3] described a method that uses the distribution of high frequency wavelet coefficients, obtained by applying a wavelet transform to the image, to separate text and non text areas. Text area classification is then done by k-means clustering, and text extraction is performed by an OCR engine that takes the segmented binary text image as input.
2.2 Connected component based methods
In connected component based methods, the image is first divided and candidate text components are extracted. Non text elements are then eliminated in various ways. Connected component based methods make use of geometrical properties. They work properly on images that contain text with many variations, such as changes in orientation and font.

Epshtein et al. [4] described a method that uses stroke width to extract text components. A stroke is a contiguous part of an image that forms a band of approximately constant width. Constant stroke width is one of the important features that separate text from other components of a scene. The method uses a logical operator together with geometrical reasoning to find places with the same stroke width and thereby identify regions containing text.

Yi et al. [5] described a method that uses gradient features and color homogeneity of character components to extract candidate text regions. Character candidates are then grouped to detect text strings, on the basis of structural features of the characters in a text string such as differences in character size, distances between neighboring characters and alignment of characters.

Gatos et al. [6] described a methodology for text detection in natural scene images based on an efficient binarization and enhancement technique followed by a connected component analysis procedure. Starting from the original image, the method produces a binary image and an inverted binary image. Connected components are then extracted from these complementary images. Text verification is conducted at character level and word level on the candidate connected components. Finally, the text regions localized in the two images are refined and merged in post-processing.
2.3 Hybrid based methods
Hybrid based methods combine texture based and connected component based methods.

Pan et al. [7] described a hybrid approach. First, a text region detector generates a text estimation map, which helps in segmenting text components by local binarization. Non text components are then filtered by a conditional random field model. Finally, the text components are grouped into text lines by a learning based energy minimization method.

Liu et al. [8] described a hybrid method based on the assumption that characters have closed contours and that a character string contains characters lying in a straight line. This method extracts the text
region by extracting closed contours and searching for their neighbors.
III. PROPOSED METHOD
This section describes the techniques used in the proposed methodology.

3.1 Overview of proposed method
The block diagram of our system is shown in Fig 2.

Fig 2: Overview of proposed system

As shown in the diagram, the method consists mainly of the following steps: connected component extraction, clustering with the help of an AdaBoost classifier, feature extraction for SVM classification, and classification of clusters into text and nontext components. For CC extraction we use the MSER algorithm. An AdaBoost classifier that works on the basis of the adjacency relationship between the CCs is used for clustering. We then extract features from the clusters and classify them as text or nontext components with an SVM classifier.
3.2 Connected component extraction
Although there are many CC extraction methods, we use the MSER algorithm because of its low computation cost and high performance. The MSER algorithm extracts the parts of the image where local binarization is stable over a wide range of thresholds. This property helps us extract most of the text components in the image.
Fig 3: input image
The MSER algorithm finds connected components that are brighter or darker than their surroundings. Fig 4 shows the result of MSER extraction on the input image shown in Fig 3.
Fig 4: Result of MSER extraction
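
The stability idea behind MSER can be illustrated with a small sketch. This is not the implementation used in the paper, and a practical system would rely on an existing MSER implementation. The function names, the toy image `IMG` and the parameter `delta` are assumptions introduced only for illustration. The sketch grows the dark region around a seed pixel at three thresholds and measures how much its area changes; a region whose area barely changes over a wide threshold range is a candidate extremal region.

```python
from collections import deque

def component_area(img, seed, thresh):
    """Area of the 4-connected region of pixels <= thresh that contains seed."""
    rows, cols = len(img), len(img[0])
    if img[seed[0]][seed[1]] > thresh:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and img[nr][nc] <= thresh:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)

def stability(img, seed, thresh, delta):
    """Relative area change of the seed's region between thresh - delta
    and thresh + delta. Lower values mean a more stable region."""
    area = component_area(img, seed, thresh)
    if area == 0:
        return float("inf")
    smaller = component_area(img, seed, thresh - delta)
    larger = component_area(img, seed, thresh + delta)
    return (larger - smaller) / area

# Toy grayscale image: a dark 2x3 "stroke" on a bright background.
IMG = [
    [200, 200, 200, 200, 200],
    [200,  20,  20,  20, 200],
    [200,  20,  20,  20, 200],
    [200, 200, 200, 200, 200],
]

# Far from the background intensity the region does not change at all.
print(stability(IMG, (1, 1), 100, 50))
# Near the background intensity the region floods the whole image.
print(stability(IMG, (1, 1), 190, 20))
```

The sketch handles regions darker than their surroundings; brighter regions can be found the same way on the inverted image.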
3.3 Clustering of CCs
Clustering groups CCs based on their adjacency relationship with the help of an AdaBoost classifier.

3.3.1 Building of training sets
Our classifier is based on the pairwise adjacency relationship between connected components extracted using MSER. To build the training set for the classifier, we obtain a collection of CCs by applying MSER extraction to the set of training images. Then, for every pair of extracted CCs, we check whether they are adjacent and whether they belong to the text component set. From these pairs we build a set of positive and a set of negative examples. The positive set
contains pairs that are adjacent and both belong to the text component set. Negative samples are constructed from pairs of CCs in which one CC belongs to the text component set and the other belongs to the nontext set.
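
The construction of the positive and negative sets can be sketched as follows. The data layout (dictionaries with `id`, `is_text`, `word` and `pos` keys) and the ground-truth adjacency test `same_word_neighbors` are hypothetical and are not specified in the paper. Pairs that match neither rule, such as two non-adjacent text CCs, are simply skipped here, which is also an assumption.

```python
from itertools import combinations

def build_pair_samples(ccs, is_adjacent):
    """Split CC pairs into positive (+1) and negative (-1) training samples.

    Positive: both CCs are text and the pair is adjacent.
    Negative: exactly one CC of the pair is text.
    Other pairs are not used (an assumption of this sketch).
    """
    positives, negatives = [], []
    for a, b in combinations(ccs, 2):
        if a["is_text"] and b["is_text"]:
            if is_adjacent(a, b):
                positives.append((a["id"], b["id"], +1))
        elif a["is_text"] != b["is_text"]:
            negatives.append((a["id"], b["id"], -1))
    return positives, negatives

def same_word_neighbors(a, b):
    """Hypothetical ground truth: consecutive characters of the same word."""
    return a["word"] == b["word"] and abs(a["pos"] - b["pos"]) == 1

# Three characters of one word plus one background component.
ccs = [
    {"id": 0, "is_text": True, "word": 0, "pos": 0},
    {"id": 1, "is_text": True, "word": 0, "pos": 1},
    {"id": 2, "is_text": True, "word": 0, "pos": 2},
    {"id": 3, "is_text": False, "word": None, "pos": None},
]

positives, negatives = build_pair_samples(ccs, same_word_neighbors)
print(positives)  # [(0, 1, 1), (1, 2, 1)]
print(negatives)  # [(0, 3, -1), (1, 3, -1), (2, 3, -1)]
```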
3.3.2 AdaBoost learning and clustering of CCs
With the collected samples, we train an AdaBoost classifier that tells us whether two given CCs are adjacent or not. To train the classifier we use one color based property and four geometrical properties of CCs. First we construct a bounding box on each CC and record its height and width. For each pair of CCs, we estimate the vertical overlap, horizontal overlap and horizontal distance between the bounding boxes, denoted by $vo_{ij}$, $ho_{ij}$ and $d_{ij}$ respectively.

[Equations (1) to (3), defining the geometric features, are not recoverable from the source text.]

The remaining feature is the color distance between the two CCs. We calculate these features for both positive and negative samples and train an AdaBoost classifier on them. We set the output of the AdaBoost classifier to +1 for CCs that are adjacent and -1 for CCs that are not adjacent. We check this adjacency for all pairs of CCs extracted using MSER. Then we cluster the CCs with the union-find set algorithm.
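
The clustering step can be sketched as below. The exact feature definitions are not available in the text, so `pair_geometry` uses plain pixel overlaps and gaps between `(x, y, w, h)` bounding boxes as an assumption. `toy_adjacent` is a hypothetical hand-written rule that stands in for the trained AdaBoost classifier and ignores the color distance. Only the union-find grouping of pairs labelled +1 follows the description above directly.

```python
def pair_geometry(a, b):
    """Vertical overlap, horizontal overlap and horizontal distance (pixels)
    between two bounding boxes given as (x, y, w, h). Assumed definitions."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    vo = max(0, min(ay + ah, by + bh) - max(ay, by))
    ho = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    d = max(0, max(ax, bx) - min(ax + aw, bx + bw))
    return vo, ho, d

def toy_adjacent(a, b):
    """Hypothetical stand-in for the AdaBoost output: +1 if the boxes share
    rows and the horizontal gap is at most the smaller height, else -1."""
    vo, _, d = pair_geometry(a, b)
    return 1 if vo > 0 and d <= min(a[3], b[3]) else -1

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri

def cluster(boxes, adjacent):
    """Merge every pair the classifier labels +1 and return index groups."""
    uf = UnionFind(len(boxes))
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if adjacent(boxes[i], boxes[j]) == 1:
                uf.union(i, j)
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())

# Three character-like boxes on one line and one far-away box.
boxes = [(0, 0, 10, 20), (12, 2, 10, 20), (24, 0, 10, 20), (100, 100, 10, 20)]
print(pair_geometry(boxes[0], boxes[1]))  # (18, 0, 2)
print(cluster(boxes, toy_adjacent))       # [[0, 1, 2], [3]]
```

Union-find keeps the grouping transitive: if CC 0 is adjacent to CC 1 and CC 1 to CC 2, all three land in one cluster even when the pair (0, 2) is labelled -1.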
Fig 5: Result of clustering on input image
3.4 Feature extraction
After clustering we obtain a set of clusters that includes text as well as non text regions. To classify them as text or nontext components we use an SVM classifier, which requires features extracted from the clusters. We divide each cluster into overlapping square blocks and extract features from each block. Each square block is divided into 4 vertical and horizontal sub-blocks, and features are extracted from each. For a horizontal block, we compute

a) number of white pixels,
b) number of vertical white-black transitions,
c) number of vertical black-white transitions

as features, and the features for a vertical block are defined similarly.

3.5 SVM classification
To train the SVM, we first apply our connected component extraction, clustering and feature extraction steps to the training images, and then train a support vector machine classifier to classify each square block as a text or nontext component. For a test image, we carry out all of the above steps and finally integrate the decisions for all the square blocks of a cluster. If the number of square blocks classified as text is greater than the number classified as non text, the cluster is classified as a text component.
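
The block features and the cluster-level vote described above can be sketched as follows. This is an illustration under assumptions: a block is a list of rows with 1 for white and 0 for black pixels, a "vertical" transition is read as a change between vertically neighboring pixels, and the per-block labels (+1 for text, -1 for nontext) are assumed to come from the trained SVM, which is not shown. The function names are introduced here only for illustration.

```python
def horizontal_block_features(block):
    """Return (white pixels, white-to-black transitions, black-to-white
    transitions), scanning every column of the block from top to bottom."""
    white = sum(sum(row) for row in block)
    white_to_black = 0
    black_to_white = 0
    for c in range(len(block[0])):
        for r in range(1, len(block)):
            prev, cur = block[r - 1][c], block[r][c]
            if prev == 1 and cur == 0:
                white_to_black += 1
            elif prev == 0 and cur == 1:
                black_to_white += 1
    return white, white_to_black, black_to_white

def cluster_is_text(block_labels):
    """Majority vote over per-block labels: a cluster is text only if
    strictly more blocks are labelled text (+1) than nontext."""
    text_blocks = sum(1 for label in block_labels if label == 1)
    return text_blocks > len(block_labels) - text_blocks

block = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
]
print(horizontal_block_features(block))  # (4, 2, 1)
print(cluster_is_text([1, 1, -1]))       # True
print(cluster_is_text([1, -1]))          # False (a tie is not a majority)
```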
Fig 6: Text region detected from input image

IV. CONCLUSION
Due to complicated backgrounds and unpredictable text appearances, scene text detection is still a challenging problem. In this paper we have presented an improved scene text detection method that makes use of two machine learning classifiers: one for identifying the text components and the other for classifying text and non text
components. Our method is designed to work correctly on images with horizontally arranged text strings. Our future work will focus on developing an efficient learning based algorithm that extracts text from complex backgrounds and text of arbitrary orientation.
ACKNOWLEDGEMENTS
Every success stands as a testimony not only to the hardship but also to the hearts behind it. Likewise, the present work has been undertaken and completed with direct and indirect help from many people, and I would like to acknowledge all of them for the same.
REFERENCES
[1] S. A. Angadi and M. M. Kodabagi, Text region extraction from low resolution natural scene images using texture features, 2nd International Advance Computing Conference, IEEE, 2010, pp. 121-128.
[2] K. I. Kim, K. Jung, and J. H. Kim, Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm, IEEE Trans. PAMI, vol. 25, no. 12, pp. 1631-1639, 2003.
[3] J. Gllavata, R. Ewerth, and B. Freisleben, Text Detection in Images Based on Unsupervised Classification of High-Frequency Wavelet Coefficients, Proc. of Int'l Conf. on Pattern Recognition, Cambridge, UK, 2004, pp. 425-428, ICPR.2004.1334146.
[4] B. Epshtein, E. Ofek, and Y. Wexler, Detecting text in natural scenes with stroke width transform, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2010, pp. 2963-2970, CVPR.2010.5540041.
[5] Yingli Tian and Chucai Yi, Text string detection from natural scenes by structure based partition and grouping, IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2594-2605, 2011.
[6] B. Gatos, I. Pratikakis, and S. J. Perantonis, Towards text recognition in natural scene images, in Proceedings of Int. Conf. Automation and Technology, 2005, pp. 354-359.
[7] Yi-Feng Pan, Xinwen Hou, and Cheng-Lin Liu, Text Localization in Natural Scene Images Based on Conditional Random Field, ICDAR, 2009, pp. 6-10.
[8] Y. Liu, S. Goto, and T. Ikenaga, A contour-based robust algorithm for text detection in color images, IEICE Trans. Inf. Syst., vol. E89-D, no. 3, pp. 1221-1230, 2006.
[9] H. Koo and D. Kim, Scene text detection via connected component clustering and non-text filtering, IEEE Trans. Image Proc., vol. 22, no. 6, pp. 2296-2305, 2013.
[10] P. Shivakumara, T. Q. Phan, L. Shijian, and C. L. Tan, Gradient Vector Flow and Grouping Based Method for Arbitrarily-Oriented Scene Text Detection in Video Images, IEEE Trans. CSVT, 2013, pp. 1729-1739.