
www.ijsret.org

International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 – 0882, Volume 4, Issue 5, May 2015

Scene Text Detection Using Machine Learning Classifiers

Nafla C.N.1, Sneha K.2, Divya K.P.3

1 (Department of CSE, RCET, Akkikkvu, Thrissur)
2 (Department of CSE, RCET, Akkikkvu, Thrissur)
3 (Department of CSE, RCET, Akkikkvu, Thrissur)

ABSTRACT

In this paper we present an efficient method of scene text detection using two machine learning classifiers: one for generating candidate word regions and the other for classifying components as text or nontext. First, we extract connected components with the maximally stable extremal region algorithm. The resulting components are partitioned into clusters by an AdaBoost classifier based on their adjacency relationship. We then extract classification features from the clusters. Finally, a support vector machine classifier labels each block as a text or nontext component.

Keywords - Connected component (CC), maximally stable extremal region (MSER), optical character recognition (OCR), support vector machine (SVM).

I. INTRODUCTION

Due to the wide availability of mobile devices with high quality digital cameras, research areas related to these devices have received growing attention over the last few decades. Text detection and extraction is one of the most important and interesting of these areas. Text present in camera-captured images is a strong source of information about the image and about the place or situation in which it was captured. Text detection and extraction from images therefore has many useful applications.

Text present in an image or video can be classified as scene text or caption text. Scene text exists in the image naturally. Caption text refers to text that is added manually by the user. Scene text overlaps with the background, so scene text detection and extraction are more difficult than the detection of caption text. Compared with scanned document images, text extraction from natural scenes is not easy because the text appears in arbitrary orientations and different sizes and is subject to background interference. Examples of scene text include signs on streets, display boards on shops, text on vehicles and advertisement boards. Fig 1 shows examples of text in natural scene images.

Text string detection and extraction have a variety of useful applications. As people travel to different places for various purposes, they may find it difficult to understand the text on display boards in foreign countries. In this case people look for the help of either guides or intelligent handheld devices to translate the information written on display boards, and text detection is an important part of such devices. Text detection can play a crucial role in content-based visual information retrieval and content-based image retrieval, which apply computer vision techniques to the problem of image retrieval in huge database applications. Another important application of scene text extraction is helping people with visual disabilities. A computerized system that can convey the text information present on objects and locations would be a great help to them. License plate detection is another important area where text detection plays a central role. License plate detection has a crucial role in monitoring traffic at customs checkpoints, tracking stolen cars, etc. Other significant applications of scene text detection and extraction are robotic navigation, automatic geocoding, etc.

Fig 1: Examples of natural images with scene text





OCR is one of the technologies that can extract text characters, by identifying the corners. This can be done only if the characters are correctly separated from the background. Background interference and degradation in images decrease the performance of OCR, so its performance is comparatively low on natural scene images. Texture analysis and topic based partition are other methods of detection, but they work correctly on document images rather than on natural scenes. Text detection and extraction from natural images is not a simple task. Text may exist against a complex background, and the chances of degradation are high in natural images. As a result, text extraction from natural images involves many complexities.

The paper is organized as follows. Section II surveys existing methods of scene text detection. Section III provides details of the proposed method. Section IV presents the conclusion.

II. LITERATURE SURVEY

This section covers existing scene text detection methods. Existing methods can be categorized as texture based methods, connected component based methods and hybrid methods.

2.1 Texture based methods

Texture based methods consider text as a special kind of texture and identify it using properties such as wavelet features, filter responses and local intensities.

Angadi et al [1] described a method that uses a high pass filter working in the DCT domain to suppress the background, and uses texture properties such as homogeneity and contrast to detect text. The method comprises five main phases: removal of the background in the DCT domain, derivation of the feature matrix D, block classification, merging of blocks for text area extraction and, finally, refinement of the text region.

Kim et al [2] described a method that uses a combination of CAMSHIFT and SVM for detection and extraction of text. The raw pixel intensities that form the textural pattern are given as input to the SVM. After texture extraction, text identification is performed using CAMSHIFT.

Gllavata et al [3] described a method that uses the distribution of high frequency wavelet coefficients, obtained by applying a wavelet transform to the image, to separate text and nontext areas. Text area classification is then done by k-means clustering. Text extraction is then performed by an OCR engine that takes the segmented binary text image as input.

2.2 Connected component based methods

In connected component based methods, the image is first divided and candidate text components are extracted. Nontext elements are then eliminated in various ways. Connected component based methods make use of geometrical properties. They work properly on images that contain text with many variations, such as changes in orientation and font.

Epshtein et al [4] described a method that uses stroke width to extract text components. A stroke is a contiguous part of an image that forms a band of approximately constant width. Constant stroke width is one of the important features that separate text from other components of a scene. The method uses a local operator together with geometrical reasoning to identify places that have the same stroke width, and thereby regions that contain text.

Yi et al [5] described a method that uses gradient features and color homogeneity of character components to extract candidate text regions. Character candidates are then grouped to detect text strings. The grouping is based on structural features of characters in a text string, such as differences in character size, distances between neighboring characters and alignment of characters.

Gatos et al [6] described a methodology for text detection in natural scene images based on an efficient binarization and enhancement technique followed by a connected component analysis procedure. Starting from the original image, the method produces a binary image and an inverted binary image. Connected components are then extracted from these complementary images. Text verification is then conducted at character level and word level on the candidate connected components. Finally, the text regions localized in the two images are refined and merged in post-processing.

2.3 Hybrid based methods

Hybrid based methods combine texture based and connected component based methods.

Pan et al [7] described a hybrid approach. First, a text region detector generates a text estimation map. This helps in segmenting text components by local binarization. Nontext components are then filtered out by a conditional random field model. Finally, text components are grouped into text lines by a learning based energy minimization method.

Liu et al [8] described a hybrid based method. It is based on the assumption that characters have closed contours and that a character string contains characters lying in a straight line. The method extracts the text region by extracting closed contours and searching their neighbors.

III PROPOSED METHOD

This section describes the techniques used in the proposed methodology.

3.1 Overview of proposed method

The block diagram of our system is shown in Fig 2.

Fig 2: Overview of proposed system

As shown in the diagram, the method consists mainly of the following steps: connected component extraction, clustering with the help of an AdaBoost classifier, feature extraction for SVM classification, and classification of clusters into text and nontext components. For CC extraction we use the MSER algorithm. An AdaBoost classifier that works on the basis of the adjacency relationship between CCs is used for clustering. We then extract features and classify the clusters as text or nontext components with an SVM classifier.

3.2 Connected component extraction

Although there are many CC extraction methods, we use the MSER algorithm because of its low computation cost and high performance. The MSER algorithm extracts the parts of the image where local binarization is stable over a wide range of thresholds. This property helps us extract most of the text components in the image.

Fig 3: Input image

The MSER algorithm finds connected components that are brighter or darker than their surroundings. Fig 4 shows the result of MSER extraction on the input image shown in Fig 3.

Fig 4: Result of MSER extraction
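The stability property described above can be illustrated with a minimal sketch. It is not the MSER algorithm itself and not our implementation: it thresholds a tiny grayscale image at several levels, measures the area of the dark connected region that contains a seed pixel, and reports how much that area changes around a given threshold. The function names, the 4-connectivity and the `delta` parameter are assumptions made for illustration.

```python
from collections import deque

def component_area(img, seed, t):
    """Area of the 4-connected region of pixels <= t that contains seed."""
    rows, cols = len(img), len(img[0])
    r0, c0 = seed
    if img[r0][c0] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and img[nr][nc] <= t:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)

def stability(img, seed, t, delta):
    """Relative area change around threshold t (smaller means more stable)."""
    area = component_area(img, seed, t)
    if area == 0:
        return float("inf")
    grown = component_area(img, seed, t + delta)
    shrunk = component_area(img, seed, t - delta)
    return (grown - shrunk) / area

# A dark T-shaped "stroke" (value 10) on a bright background (value 200).
image = [
    [200, 200, 200, 200, 200],
    [200,  10,  10,  10, 200],
    [200, 200,  10, 200, 200],
    [200, 200,  10, 200, 200],
    [200, 200, 200, 200, 200],
]
areas = [component_area(image, (1, 1), t) for t in (50, 100, 150)]
stable_score = stability(image, (1, 1), 100, 50)
unstable_score = stability(image, (1, 1), 180, 30)
```

In this toy image the region keeps the same area for every threshold between the stroke and background intensities, so its score there is zero, whereas a threshold close to the background intensity makes the region spill into the whole image.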

3.3 Clustering of CCs

Clustering groups CCs based on their adjacency relationship with the help of an AdaBoost classifier.

3.3.1 Building of training sets

Our classifier is based on the pairwise adjacency relationship between connected components extracted using MSER. To build the training set for the classifier, we obtain a collection of CCs by applying MSER extraction to the set of training images. Then, for every pair of extracted CCs, we check whether they are adjacent and whether they belong to the text component set. From these checks we build a set of positive and negative examples. The positive set contains pairs that are adjacent and both belong to the text component set. Negative samples are constructed from pairs of CCs in which one CC belongs to the text component set and the other belongs to the nontext set.
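A minimal sketch of how such positive and negative pair lists could be assembled is given below. It assumes that ground-truth annotation supplies, for each CC, a text or nontext label and, for text CCs, the set of adjacent pairs. The function name `build_pair_samples` and both input formats are hypothetical.

```python
from itertools import combinations

def build_pair_samples(is_text, adjacent_pairs):
    """Return (positives, negatives) as lists of index pairs (i, j), i < j.

    is_text: list of booleans, one per CC (assumed ground-truth label).
    adjacent_pairs: set of frozensets {i, j} marking adjacent CCs
                    (assumed ground-truth annotation).
    """
    positives, negatives = [], []
    for i, j in combinations(range(len(is_text)), 2):
        if is_text[i] and is_text[j]:
            # Positive only when both CCs are text and they are adjacent.
            if frozenset((i, j)) in adjacent_pairs:
                positives.append((i, j))
        elif is_text[i] != is_text[j]:
            # Negative when exactly one CC of the pair is text.
            negatives.append((i, j))
    return positives, negatives

# Four CCs: 0, 1 and 2 are characters of one word, 3 is background clutter.
labels = [True, True, True, False]
adjacent = {frozenset((0, 1)), frozenset((1, 2))}
pos, neg = build_pair_samples(labels, adjacent)
```

Pairs in which both CCs are text but not adjacent, and pairs in which both are nontext, are left out of both sets in this sketch, since the description above names only the two cases.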

3.3.2 AdaBoost learning and clustering of CCs

With the collected samples, we train an AdaBoost classifier that tells us whether two given CCs are adjacent or not. To train the classifier we use one color based property and four geometrical properties of CCs. First we construct a bounding box on each CC and measure its height and width. For each pair of CCs, we estimate the vertical overlap, horizontal overlap and horizontal distance between the bounding boxes, denoted by $vo_{ij}$, $ho_{ij}$ and $d_{ij}$ respectively.

, (1)

, (2)

× (3)

The color based property is the color distance between the two CCs. We calculate these features for both positive and negative samples and train an AdaBoost classifier on them. We set the output of the AdaBoost classifier to +1 for CCs that are adjacent and -1 for CCs that are not adjacent. We check this adjacency for all pairs of CCs extracted using MSER. We then cluster the CCs with the union-find set algorithm.
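The pairwise learning step can be sketched as follows, under several stated assumptions. Boxes are `(x, y, w, h)` tuples with the origin at the top-left corner, the three geometric quantities are used as raw pixel values without any normalization, the color distance is omitted, and the classifier is a small discrete AdaBoost over threshold stumps written from scratch. None of these choices is claimed to match the features or the training setup of the proposed method.

```python
import math

def pair_geometry(a, b):
    """Vertical overlap, horizontal overlap and horizontal distance of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    vo = max(0, min(ay + ah, by + bh) - max(ay, by))
    ho = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    d = max(0, max(ax, bx) - min(ax + aw, bx + bw))
    return vo, ho, d

def stump(x, f, thr, pol):
    """Weak learner: returns pol when feature f is <= thr, otherwise -pol."""
    return pol if x[f] <= thr else -pol

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with threshold stumps; y holds +1 / -1 labels."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(w for x, t, w in zip(X, y, weights)
                              if stump(x, f, thr, pol) != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Raise the weight of misclassified samples, lower the others.
        weights = [w * math.exp(-alpha * t * stump(x, f, thr, pol))
                   for x, t, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """+1 (adjacent) or -1 (not adjacent) from the weighted stump votes."""
    score = sum(alpha * stump(x, f, thr, pol) for alpha, f, thr, pol in model)
    return 1 if score >= 0 else -1

# Toy bounding boxes: a, b, c sit on one text line; the other two do not.
boxes = {"a": (0, 0, 10, 20), "b": (12, 0, 10, 20), "c": (24, 2, 10, 18),
         "far": (100, 0, 10, 20), "below": (0, 50, 10, 20)}
pairs = [("a", "b", 1), ("b", "c", 1), ("a", "far", -1), ("a", "below", -1)]
X = [pair_geometry(boxes[p], boxes[q]) for p, q, _ in pairs]
y = [label for _, _, label in pairs]
model = train_adaboost(X, y, rounds=3)
predictions = [predict(model, x) for x in X]
```

No single threshold on one feature separates these four toy pairs, which is why the sketch uses three boosting rounds.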

Fig 5: Result of clustering on input image
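The clustering step, which merges CCs judged adjacent into groups, can be sketched with a union-find (disjoint-set) structure. The function name `cluster_components` and the input format, a list of index pairs for which the adjacency classifier returned +1, are assumptions made for illustration.

```python
def cluster_components(n, adjacent_pairs):
    """Group n components into clusters given the pairs judged adjacent."""
    parent = list(range(n))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in adjacent_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Pairs for which the adjacency classifier returned +1 (hypothetical output).
adjacent = [(0, 1), (1, 2), (4, 5)]
groups = cluster_components(6, adjacent)
```

A CC that is never judged adjacent to any other one, such as index 3 here, stays in a cluster of its own.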

3.3 Feature extraction

Clustering yields a set of clusters that includes text as well as nontext regions. To classify clusters as text or nontext components, we use an SVM classifier, for which we have to extract features from the clusters. We divide each cluster into overlapping square blocks and extract features from each block. Each square block is divided into 4 vertical and horizontal blocks, and features are extracted from each of them. For a horizontal block, we use

a) number of white pixels,
b) number of vertical white-black transitions,
c) number of vertical black-white transitions

as features; the features for a vertical block are defined similarly.
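The block features listed above can be sketched as follows. The sketch rests on one reading of the description, which is an assumption: a binary square block (1 for white, 0 for black) is cut into 4 horizontal strips and 4 vertical strips, transitions in a horizontal strip are counted while scanning each column from top to bottom, and transitions in a vertical strip are counted while scanning each row from left to right. The block side is assumed to be divisible by 4, and the function names are hypothetical.

```python
def strip_features(strip, scan_down):
    """White-pixel count and white-black / black-white transition counts.

    strip: list of rows of 0 (black) / 1 (white) pixels.
    scan_down: True scans each column top to bottom,
               False scans each row left to right.
    """
    lines = [list(col) for col in zip(*strip)] if scan_down else strip
    white = sum(sum(row) for row in strip)
    white_to_black = black_to_white = 0
    for line in lines:
        for prev, cur in zip(line, line[1:]):
            if prev == 1 and cur == 0:
                white_to_black += 1
            elif prev == 0 and cur == 1:
                black_to_white += 1
    return [white, white_to_black, black_to_white]

def block_features(block):
    """24 features: 3 per strip for 4 horizontal and 4 vertical strips."""
    step = len(block) // 4
    features = []
    for k in range(4):
        horizontal = block[k * step:(k + 1) * step]
        features += strip_features(horizontal, scan_down=True)
    for k in range(4):
        vertical = [row[k * step:(k + 1) * step] for row in block]
        features += strip_features(vertical, scan_down=False)
    return features

# 8x8 block with alternating white and black rows, starting with white.
block = [[1 if r % 2 == 0 else 0] * 8 for r in range(8)]
features = block_features(block)
```

Each horizontal strip of this block holds one white row above one black row, so it contributes 8 white pixels and 8 white-black transitions, while each vertical strip sees constant rows and therefore no transitions.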

3.4 SVM classification

To train the SVM, we first apply our connected component extraction, clustering and feature extraction steps to the training images, and then train a support vector machine classifier to classify each square block as a text or nontext component. For a test image, we perform all the above steps and finally integrate the decision results of all the square blocks of a cluster. If the number of square blocks classified as text is greater than the number classified as nontext, the cluster is classified as a text component.
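The integration of block decisions amounts to a majority vote, sketched below. The label convention (+1 for a text block, -1 for a nontext block) and the function name `classify_cluster` are assumptions; the per-block decisions would come from the trained SVM, which is not reproduced here.

```python
def classify_cluster(block_labels):
    """Return 'text' when text blocks outnumber nontext blocks, else 'nontext'.

    block_labels: per-block decisions, +1 for text and -1 for nontext
                  (assumed output convention of the block classifier).
    """
    text_votes = sum(1 for label in block_labels if label == 1)
    nontext_votes = len(block_labels) - text_votes
    return "text" if text_votes > nontext_votes else "nontext"

majority_text = classify_cluster([1, 1, -1])
tie = classify_cluster([1, -1])
```

Because the rule requires strictly more text blocks than nontext blocks, a tie is classified as nontext in this sketch.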

Fig 6: Text region detected from input image

IV CONCLUSION

Due to complicated backgrounds and unpredictable text appearances, scene text detection is still a challenging problem. In this paper we have presented an improved scene text detection method that makes use of two machine learning classifiers: one for identifying the text components and the other for classifying text and nontext components. Our method is designed to work correctly on images in which text strings are arranged horizontally. Our future work will focus on developing an efficient learning based algorithm that extracts text from complex backgrounds and text of arbitrary orientation.

ACKNOWLEDGEMENTS

Every success stands as a testimony not only to the hardship but also to the hearts behind it. Likewise, the present work has been undertaken and completed with direct and indirect help from many people, and I would like to acknowledge all of them for the same.

REFERENCES

[1] S. A. Angadi and M. M. Kodabagi, Text region extraction from low resolution natural scene images using texture features, 2nd International Advance Computing Conference, IEEE, 2010, pp. 121-128.

[2] K. I. Kim, K. Jung, and J. H. Kim, Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm, IEEE Trans. PAMI, vol. 25, no. 12, pp. 1631-1639, 2003.

[3] J. Gllavata, R. Ewerth, and B. Freisleben, Text detection in images based on unsupervised classification of high-frequency wavelet coefficients, Proc. of Int'l Conf. on Pattern Recognition, Cambridge, UK, pp. 425-428, 2004, ICPR.2004.1334146.

[4] B. Epshtein, E. Ofek, and Y. Wexler, Detecting text in natural scenes with stroke width transform, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2963-2970, 2010, CVPR.2010.5540041.

[5] Yingli Tian and Chucai Yi, Text string detection from natural scenes by structure based partition and grouping, IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2594-2605, 2011.

[6] B. Gatos, I. Pratikakis, and S. J. Perantonis, Towards text recognition in natural scene images, in Proceedings of Int. Conf. Automation and Technology, pp. 354-359, 2005.

[7] Yi-Feng Pan, Xinwen Hou, and Cheng-Lin Liu, Text localization in natural scene images based on conditional random field, ICDAR, pp. 6-10, 2009.

[8] Y. Liu, S. Goto, and T. Ikenaga, A contour-based robust algorithm for text detection in color images, IEICE Trans. Inf. Syst., vol. E89-D, no. 3, pp. 1221-1230, 2006.

[9] H. Koo and D. Kim, Scene text detection via connected component clustering and non-text filtering, IEEE Trans. Image Proc., vol. 22, no. 6, pp. 2296-2305, 2013.

[10] P. Shivakumara, T. Q. Phan, L. Shijian, and C. L. Tan, Gradient vector flow and grouping based method for arbitrarily-oriented scene text detection in video images, IEEE Trans. CSVT, 2013, pp. 1729-1739.
