
NEW FEATURE DESCRIPTORS FOR IMAGE RETRIEVAL,

OBJECT TRACKING AND SHOT DETECTION

Ph. D. THESIS

by

MANISHA VERMA

DEPARTMENT OF MATHEMATICS

INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

ROORKEE- 247 667 (INDIA)

DECEMBER, 2015

NEW FEATURE DESCRIPTORS FOR IMAGE RETRIEVAL,

OBJECT TRACKING AND SHOT DETECTION

A THESIS

Submitted in partial fulfilment of the

requirements for the award of the degree

of

DOCTOR OF PHILOSOPHY

in

MATHEMATICS

by

MANISHA VERMA

DEPARTMENT OF MATHEMATICS

INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

ROORKEE- 247 667 (INDIA)

DECEMBER, 2015

©INDIAN INSTITUTE OF TECHNOLOGY ROORKEE, ROORKEE-2015

ALL RIGHTS RESERVED

INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

ROORKEE

CANDIDATE’S DECLARATION

I hereby certify that the work which is being presented in the thesis entitled

“NEW FEATURE DESCRIPTORS FOR IMAGE RETRIEVAL, OBJECT TRACKING

AND SHOT DETECTION” in partial fulfilment of the requirements for the award of

the Degree of Doctor of Philosophy and submitted in the Department of Mathematics of

the Indian Institute of Technology Roorkee, Roorkee is an authentic record of my own

work carried out during a period from July, 2012 to December, 2015 under the supervision of

Dr. R. Balasubramanian, Associate Professor, Department of Computer Science and Engineering,

Indian Institute of Technology Roorkee, Roorkee.

The matter presented in this thesis has not been submitted by me for the award of any

other degree of this or any other Institute.

(MANISHA VERMA)

This is to certify that the above statement made by the candidate is correct to the best of

my knowledge.

(R. Balasubramanian)

Supervisor

The Ph.D. Viva-Voce Examination of MANISHA VERMA, Research Scholar, has been held on ….…….………, 2016.

Chairman SRC External Examiner

This is to certify that the student has made all the corrections in the thesis.

(R. Balasubramanian)

Supervisor Head of the Department

Date: ……………….

Abstract

Image retrieval has been a popular research area due to the extensive online and offline image databases. Content based image retrieval (CBIR) has served well in the areas of education, multimedia, medical diagnosis, art collections, scientific databases, etc. Feature extraction and similarity detection are major aspects of a CBIR system. Similarly, object tracking and shot boundary detection are standard computer vision applications which require proficient feature extraction methods. This research work develops and integrates feature extraction methods for CBIR, object tracking and shot boundary detection applications. Chapters 2 to 6 address content based image retrieval systems for different databases, Chapter 7 targets an object tracking problem, and finally a shot boundary detection problem is solved in Chapter 8.

Chapter 2 proposes two techniques using the discrete wavelet transform and local feature descriptors. Local patterns utilize the neighboring pixels to capture the local information of the image. The discrete wavelet transform (DWT) is first applied to acquire the subband images, and then direction based local patterns, namely the local extrema pattern (LEP) and the directional local extrema pattern (DLEP), are used to extract local directional information from the DWT subband images. Both patterns work in four specific directions. In the first method, LEP is uniformly applied to all the subband images, whereas in the second method, the DLEP of the corresponding direction is applied based on the direction information of the wavelet coefficients. Wavelets have proven directional significance, which helps LEP and DLEP create more oriented features.


In Chapters 3 and 4, local information is extracted using local patterns, and that information is further organized into a feature vector using the co-occurrence of pixel pairs in the pattern map. Most of the local patterns proposed by researchers use only the occurrence of each pattern value in the pattern map. In this work, by contrast, pixels are analyzed through the occurrence of pattern value pairs, and feature vectors are formed on the basis of these occurrence values. In Chapter 3, the HSV color space is used: color information is extracted using histograms of the hue and saturation components, and LEP is extracted from the value component. Further, to extract co-occurrence information, a gray level co-occurrence matrix (GLCM) is derived from the LEP map. In Chapter 4, the co-occurrence matrix is utilized at different directions and distances to obtain more local directional information. In this chapter, the center symmetric local binary pattern (CSLBP) is employed to acquire the local information, and GLCMs of 0◦, 45◦, 90◦ and 135◦ orientations at distances one and two are applied to the CSLBP map. Different combinations are analyzed for performance in the CBIR application and results are reported accordingly.

Two novel local patterns, based on pixel directions and the mutual relationship of neighboring pixels, are proposed in Chapters 5 and 6. The local tri-directional pattern (LTriDP) for texture features is proposed in Chapter 5. It extracts information about each neighboring pixel relative to the center pixel in three specific directions. On the basis of thresholding each neighboring pixel against three other neighboring pixels, a ternary pattern (0, 1 or 2) is assigned to the corresponding pixel. Additionally, one magnitude pattern is extracted using the same pixels. Both patterns are combined, called the local tri-directional pattern, and used as a feature descriptor of a CBIR system. In Chapter 6, the local neighborhood difference pattern (LNDP) is proposed, which deals with the mutual relationship of neighboring pixels. The relationship of each neighboring pixel with two other adjacent neighboring pixels is calculated and a pattern map is created. In feature extraction, LNDP is combined with LBP, as they complement each other: LBP extracts information regarding the center and neighboring pixel relationship, and LNDP extracts the mutual relationship of neighboring pixels. The combined feature is applied to textural and natural image databases for image retrieval.

Chapters 7 and 8 are based on the video problems of object tracking and shot boundary detection. In Chapter 7, a new texture feature called the local rhombus pattern is proposed and combined with HSV color histograms. The local rhombus pattern creates a local pattern using the four neighboring pixels of each center pixel in the image. Feature extraction is performed using color and texture information of objects in the video, and the mean shift tracking algorithm is used for tracking the object. In Chapter 8, a hierarchical approach is applied to extract shot boundaries. A two step approach is implemented using an RGB color histogram and the local binary pattern (LBP). The hierarchical method using global and local features helps in reducing the extra number of keyframes from repeated shots in a video sequence.


Acknowledgements

First and foremost, I would like to thank God for his countless blessings throughout my life, and even more during my research.

I would like to express my deepest gratitude to my supervisor Dr. Balasubramanian Raman for his continuous support during my Ph.D. study and related research, and for his patience, knowledge, and immense motivation. His guidance helped me throughout the research and the writing of this thesis. I could not have imagined having a better advisor and mentor for my Ph.D. study. He is a very helpful person, an admirable teacher and a wonderful supervisor.

Besides my advisor, I would like to thank Prof. V.K. Katiyar, Head of the Department, for providing facilities to carry out my research work. I extend my thanks to the members of my student research committee, Prof. Kusum Deep, Dr. Sanjeev Malik, Prof. R.P. Maheshwari and Dr. Partha Pratim Roy, for their insightful comments and encouragement, and also for the hard questions which motivated me to widen my research from various perspectives. My special thanks to Dr. Subrahmanyam Murala for technical discussions, advice, motivation and for providing the source codes of his algorithms. I would also like to thank Prof. Mridula Garg, formerly of the University of Rajasthan, who showed me the path of higher education and IIT Roorkee.

I thank the Mathematics Department, IIT Roorkee for the infrastructure and all necessary facilities for my Ph.D. I also thank the Computer Science & Engineering Department and the Computer Center, IIT Roorkee for providing computing and lab facilities for my research work. I would like to acknowledge all the teachers, from school to my research career, who motivated me in education, research and life. I thank all the staff members of the Mathematics Department for all necessary help.

I thank all my seniors and labmates Dr. Sanoj Kumar, Dr. Anil Gonde, Dr. Himanshu Agarwal, Dr. Asha Rani, Pushpendra Kumar, Tasneem Ahmed, Shitala Prasad, Naresh Atri, Bhavik Patel, Deepak Murugan, Arun Pundir, Priyanka Singh, Anjali Gautam and many more for their support and advice in research. I would like to thank all my friends and juniors Garima, Niyati, Shivani, Arachna, Reenu, Neha, Divya, Priyanka, Geetika, Queeny, Rupali, Urvashi, Vanita, Abhijeet and Sudhakar for their constant support and help.

I acknowledge the Ministry of Human Resource Development (MHRD) and the Student's Career Development Fund, IITR Alumni Affairs, for providing financial assistance during my Ph.D.

Last but not least, I would like to thank my family: my paternal grandparents, Sh. Shiv Ram Verma and Smt. Chandravati Verma; my maternal grandparents, Late Sh. Mithan Lal Kumawat and Smt. Chota Devi; my parents, Sh. Vijesh Ku. Verma and Smt. Pushpa Verma; my uncle and aunt, Sh. Satish Ku. Verma and Smt. Sumita Verma; and my brothers, sisters and sister-in-law, Rahul, Rohit, Rohan, Gunjan, Nikita and Anjali, for supporting me spiritually throughout my Ph.D. and my life in general.


Table of Contents

Abstract i

Acknowledgements v

Table of Contents vii

List of Figures xiii

List of Tables xvii

List of Abbreviations xix

1 Introduction 1
1.1 Motivation 1
1.2 Content based image retrieval 2
1.2.1 Image database 4
1.2.2 Query image 11
1.2.3 Feature extraction 12
1.2.4 Similarity measure 12
1.2.5 Evaluation measure 13
1.2.6 Relevance feedback 15
1.3 Object tracking 15
1.4 Shot boundary detection 16
1.5 Literature survey 16
1.5.1 Color features 16
1.5.2 Texture features 17
1.5.3 Local features 18
1.5.4 Biomedical image retrieval 19
1.5.5 Object tracking 20
1.5.6 Shot detection 21
1.6 Objective 22
1.7 Organization of the thesis 23

2 CBIR System using Discrete Wavelet Transform and Local Patterns 27
2.1 Preliminaries 28
2.1.1 Discrete wavelet transform 28
2.1.2 Local extrema pattern 29
2.1.3 Directional local extrema pattern 30
2.2 Proposed methods 31
2.2.1 Proposed method 1 32
2.2.2 Proposed method 2 33
2.3 Experimental results and discussion 35
2.3.1 Experiment 1 36
2.3.2 Experiment 2 36
2.4 Conclusion 38

3 Local Extrema Co-occurrence Pattern for Image Retrieval 41
3.1 Preliminaries 42
3.1.1 Color space 42
3.1.2 Gray level co-occurrence matrix 42
3.2 Proposed method 44
3.3 Proposed system framework 45
3.4 Experimental results and discussion 46
3.4.1 Experiment 1 47
3.4.2 Experiment 2 48
3.4.3 Experiment 3 52
3.4.4 Experiment 4 53
3.4.5 Experiment 5 54
3.4.6 Experiment results with different distance measure 55
3.4.7 Proposed method with different quantization levels 56
3.4.8 Computational complexity 57
3.5 Conclusion 59

4 Center Symmetric Local Binary Co-occurrence Pattern for CBIR 61
4.1 Preliminaries 62
4.1.1 Center symmetric local binary pattern 62
4.1.2 Gray level co-occurrence matrix 62
4.2 Proposed method 63
4.3 Proposed system framework 66
4.3.1 Feature extraction 66
4.3.2 Similarity measure 67
4.3.3 Feature matching 67
4.4 Experimental results and discussion 68
4.4.1 Experiment 1 68
4.4.2 Experiment 2 69
4.4.3 Experiment 3 70
4.4.4 Experiment 4 70
4.4.5 Proposed method using different directions and distances in GLCM 73
4.4.6 Proposed system using different distance measure 74
4.4.7 Feature vector length and computation time 75
4.5 Conclusion 77

5 Local Tri-Directional Patterns: A New Feature Descriptor 79
5.1 Preliminaries 80
5.1.1 Local binary pattern 80
5.2 Proposed method 81
5.3 Proposed system framework 85
5.3.1 Algorithm 85
5.3.2 Similarity measure 85
5.4 Experimental results and discussion 86
5.4.1 Experiment 1 86
5.4.2 Experiment 2 91
5.4.3 Experiment 3 92
5.5 Conclusion 95

6 Local Neighborhood Difference Pattern: A New Feature Descriptor 97
6.1 Preliminaries 98
6.1.1 Local binary pattern 98
6.1.2 Local ternary pattern 98
6.2 Proposed method 99
6.3 Proposed system framework 101
6.3.1 Feature extraction 101
6.3.2 Algorithm 102
6.4 Experimental results and discussion 103
6.4.1 Experiment 1 104
6.4.2 Experiment 2 108
6.5 Conclusion 112

7 Object Tracking using Joint Histogram of Color and Local Rhombus Pattern 117
7.1 Local rhombus pattern 118
7.2 Framework of proposed algorithm 119
7.2.1 Target object representation 119
7.2.2 Algorithm 120
7.3 Experimental results and discussion 120
7.4 Conclusion 123

8 A Hierarchical Shot Boundary Detection Algorithm 125
8.1 Hierarchical clustering for shot detection and key frame selection 126
8.2 Proposed system framework 128
8.2.1 Algorithm 128
8.3 Experimental results 130
8.4 Conclusions 132

9 Conclusions and Future Scope 133
9.1 Conclusions 133
9.2 Future scope 136

Appendix 139

Bibliography 145

Author’s Publications 167

List of Figures

1.1 CBIR system architecture 3
1.2 Corel 1k sample images [1] 5
1.3 Corel 5k image samples (one image per category) [2] 6
1.4 Sample images from Corel-10k database [2] 6
1.5 Sample images from urban and natural scene database, MIT [4] 7
1.6 MIT VisTex color texture database image samples [3] 8
1.7 MIT VisTex database sample images [3] 8
1.8 Sample images from Brodatz texture database [127] 9
1.9 STex color texture database sample images [65] 10
1.10 OASIS Database sample images [78] 10
1.11 ORL Database sample images [5] 11
2.1 1-level discrete wavelet transform example 28
2.2 2-dimensional filter bank and downsampling process for 2d-DWT 29
2.3 Local Extrema Pattern 29
2.4 Directional Local Extrema Pattern 31
2.5 Block diagram of the proposed method 33
2.6 Block diagram of the proposed system 34
2.7 Corel-5k database (a) precision and (b) recall, with number of images retrieved, and (c) precision and (d) recall, with image database category 37
2.8 Corel-5k database (a) precision and (b) recall, with number of images retrieved, and (c) precision and (d) recall, with image database category 39
3.1 Gray level co-occurrence matrix computation example 43
3.2 Proposed system block diagram 45
3.3 Results of precision and recall with number of images retrieved of Corel-1k database 47
3.4 Precision-recall curve and F-measure curve for Corel-1k database 48
3.5 Corel-5k plots of (a) precision and (b) recall, with number of images retrieved, and (c) precision and (d) recall, with category number 49
3.6 (a) Precision-recall curve and (b) F-measure curve for Corel-5k database 50
3.7 Graphs of Corel-10k database (a) precision and images retrieved (b) recall and images retrieved from database (c) precision and category number (d) recall and category number 51
3.8 (a) Precision-recall curve and (b) F-measure curve for Corel-10k database 52
3.9 MIT VisTex database results of (a) average precision and (b) average recall 53
3.10 (a) Precision-recall curve and (b) F-measure curve for MIT VisTex database 54
3.11 STex database results of (a) average precision and (b) average recall 55
3.12 (a) Precision-recall curve and (b) F-measure curve for STex database 56
4.1 Center symmetric local binary pattern computation example 62
4.2 Different combinations of (d, θ) used for feature vector computation in GLCM 63
4.3 Proposed method feature vector computation for sample image 66
4.4 Proposed algorithm block diagram 66
4.5 Block diagram of the proposed system 68
4.6 (a) Average precision and (b) recall graph for MIT VisTex database 69
4.7 Query image retrieval in MIT VisTex texture image database 70
4.8 (a) Average precision and (b) recall graph for Brodatz texture database 71
4.9 (a) Average precision and (b) recall graph for ORL face database 72
4.10 Query image retrieval in ORL face image database 73
4.11 Query image retrieval in ORL face image database for all methods 74
4.12 Average precision and group precision graph for OASIS medical image database 75
4.13 Query image retrieval in OASIS medical image database 76
5.1 Local binary pattern example 80
5.2 Sample window example of the proposed method 81
5.3 Block diagram of the proposed method 86
5.4 Precision and recall with number of images retrieved for database 1 87
5.5 (a) Precision and (b) recall of proposed methods for database 1 88
5.6 (a) Precision and (b) recall with number of images retrieved for database 2 89
5.7 (a) Precision and (b) recall of the proposed methods for database 2 90
5.8 (a) Precision and (b) recall with number of images retrieved for database 3 91
5.9 (a) Precision and (b) recall of proposed methods for database 3 93
5.10 ORL database query example 94
6.1 Local ternary pattern calculation (a) a window example (b) difference of neighboring and center pixel (c) ternary pattern for t=3 (d) ternary pattern divided in two binary patterns (e) weights (f) weights multiplied by binary patterns and summed up to pattern value 98
6.2 Local neighborhood difference pattern calculation (a) pixel presentation (b) a window example (f-m) pattern calculation for each neighboring pixel (c) binary values assigned to each neighboring pixel (d) weights (e) weights multiplied by LNDP pattern and summed up to pattern value 99
6.3 (a) LBP features (b) LNDP features (c) Concatenation of LBP and LNDP 101
6.4 Block diagram of the proposed system 102
6.5 (a) Precision vs number of images retrieved (b) Recall vs number of images retrieved in Database 1 103
6.6 Comparison between LBP, LNDP and fusion method in Database 1 104
6.7 Query image example of Brodatz database images 105
6.8 (a) Precision vs number of images retrieved (b) Recall vs number of images retrieved in Database 2 106
6.9 Comparison between LBP, LNDP and fusion method in Database 2 107
6.10 (a) Precision vs number of images retrieved (b) Recall vs number of images retrieved in Database 3 109
6.11 (a) Precision vs image category (b) Recall vs image category in Database 3 110
6.12 Comparison between LBP, LNDP and fusion method in Database 3 111
6.13 (a) Precision vs number of images retrieved (b) Recall vs number of images retrieved in Database 4 112
6.14 (a) Precision vs image category (b) Recall vs image category in Database 4 113
6.15 Comparison between LBP, LNDP and fusion method in Database 4 114
6.16 Query image example of urban and natural scene database, MIT 115
7.1 Local rhombus pattern sample window example 118
7.2 Object tracking in road traffic video using (a) LBPriu2 RGB (b) LEP RGB and (c) LRP HSV 121
7.3 Results of a player tracking in football video of (a) LBPriu2 RGB (b) LEP RGB and (c) LRP HSV 122
8.1 Consecutive frames and shot boundary of a video 126
8.2 Distance measure calculation in 2nd phase 128
8.3 Video 1: (a) Initial stage keyframes (b) Final stage keyframes 131
8.4 Video 2: (a) Initial stage keyframes (b) Final stage keyframes 132

List of Tables

1.1 Image databases 4
1.2 MRI data acquisition details [78] 9
2.1 Applied DLEP on wavelet coefficient 33
2.2 Precision and Recall percentage for all methods 38
2.3 Feature vector length of different methods 40
3.1 Values of θ1 and θ2 corresponding to θ in GLCM 43
3.2 Abbreviation of all methods 48
3.3 Results of Corel-1k, Corel-5k and Corel-10k in precision (for n=10) and recall (for n=100) 57
3.4 Average retrieval rate (ARR) for both MIT VisTex and STex database 57
3.5 Experimental results of the proposed method with different distance measure 57
3.6 Precision and recall of the proposed method with different quantization schemes for all databases 58
3.7 Feature vector (F.V.) length, feature extraction (F.E.) and image retrieval (I.R.) time of different method 58
4.1 Results of previous methods and the proposed method for all databases 72
4.2 Proposed method with different direction and distance in GLCM 76
4.3 Results of all databases with different distance metrics 77
4.4 Computation time and feature vector length of all methods 77
5.1 Average retrieval rate of all databases 87
5.2 Average normalized modified retrieval rank of different methods and databases 92
5.3 Feature vector length of different methods 95
6.1 Average retrieval rate for STex and Brodatz databases 108
6.2 Results of precision and recall for all methods 110
6.3 Feature vector length of different methods 113
7.1 Feature vector length and process time of proposed method and previous methods 123
8.1 Video details 130
8.2 Number of keyframes extracted in both phases 131

List of Abbreviations

AMORE Advanced Multimedia Oriented Retrieval Engine

ANMRR Averaged Normalized Modified Retrieval Rate

APR Average Precision Rate

ARR Average Retrieval Rate

BLK LBP Block based Local Binary Pattern

CBVQ Content-Based Visual Query

CCM Color Co-occurrence Matrix

CHKM Color Histogram for K-mean

CSLBCoP Center Symmetric Local Binary Co-occurrence Pattern

CSLBP Center Symmetric Local Binary Pattern

CT Computed Tomography

db4 Daubechies-4

DBPSP Difference Between Pixels of Scan Pattern

DLEP Directional Local Extrema Pattern

DSLR Digital Single-lens Reflex camera

DWT Discrete Wavelet Transform

GLCM Gray Level Co-occurrence Matrix

HSV Hue; Saturation; Value

LBP Local Binary Pattern

LBPriu2 Rotation Invariant Uniform Local Binary Pattern


LDP Local Derivative Pattern

LECoP Local Extrema Co-occurrence Pattern

LEP Local Extrema Pattern

LEPINV Local Edge Pattern for Image Retrieval

LEPSEG Local Edge Pattern for Segmentation

LMEBP Local Maximum Edge Binary Pattern

LMeP Local Mesh Pattern

LMePVEP Local Mesh Peak Valley Edge Patterns

LNDP Local Neighborhood Difference Pattern

LRP Local Rhombus Pattern

LTCoP Local Ternary Co-occurrence Patterns

LTP Local Ternary Pattern

LTriDP Local Tri-Directional Pattern

LTriDPmag Local Tri-directional Pattern Magnitude

LTrP Local Tetra Patterns

MIT VisTex Massachusetts Institute of Technology Vision Texture

MRI Magnetic Resonance Images

MRR Modified Retrieval Rank

NMRR Normalized Modified Retrieval Rank

OASIS Open Access Series of Imaging Studies

ORL Olivetti Research Ltd

PM Proposed Method

PM1 Proposed Method 1

PM2 Proposed Method 2

PVEP Peak Valley Edge Pattern

RGB Red; Green; Blue

SLR Single-lens Reflex camera

STex Salzburg Texture Image Database

YCbCr Luminance; Chroma blue; Chroma red


Chapter 1

Introduction

1.1 Motivation

The expansion of online and offline images in various areas, e.g., education, news, entertainment, etc., makes retrieval of images both fascinating and important. From birthday parties to professional conferences, people take digital images and save them for the future; therefore, image collections are increasing rapidly. Advances in social media, e.g., Facebook, Twitter, Instagram, GooglePlus, etc., have increased the online database of images, as people upload their photos for social activities on these social networking sites. In addition, high quality digital imaging devices, e.g., the single-lens reflex camera (SLR), the digital single-lens reflex camera (DSLR), the camcorder, etc., have entered the market. Nowadays, not only professional photographers but also ordinary people own these devices; therefore, image and video databases have grown. Similarly, there is a huge database of biomedical images for disease diagnosis. Biomedical images exist in different formats, such as magnetic resonance images (MRI), computed tomography (CT), X-ray, etc.

Image retrieval or searching can be performed in two ways: text-based and content-based. In text based image retrieval, a textual query is involved that helps in extracting similar images related to the query text. This is a traditional image retrieval method based on meta data such as captions, keywords, etc. of images, and is used by Google Images, Yahoo Image Search, Bing Images, etc. It involves manual or automatic annotation of images, and it is neither efficient nor effective since it is laborious and time consuming. Also, annotation is subjective, and it can be difficult to understand what the user wants. On the other hand, content based image retrieval has been popular since the 1990s and is still an active research problem. Many image retrieval systems, e.g., AltaVista Photofinder, AMORE (Advanced Multimedia Oriented Retrieval Engine), Berkeley Digital Library Project, Blobworld, C-bird (Content-Based Image Retrieval from Digital libraries), CBVQ (Content-Based Visual Query), DrawSearch, etc., have been proposed by researchers [149].

The work presented in this thesis is related to feature extraction methods for content based image retrieval, object tracking and shot boundary detection problems. Comprehensive and extensive surveys of content based image retrieval and object tracking techniques have been presented by researchers [57, 70, 167, 134, 81]. Mainly, image features fall into two categories, i.e., low level and high level features. Low level features represent visual image features, e.g., color, texture, shape, etc., whereas high level features are semantic features which can be obtained using textual annotation or complex visual feature maps.

1.2 Content based image retrieval

Content-based image retrieval (CBIR) is an application of computer vision techniques, and it involves the problem of searching for digital images in large databases. “Content-based” means that the search analyzes the contents of the image rather than the meta data such as keywords, tags, or descriptions associated with the image. The term “content” in this context might refer to color information, textural distribution information, object shapes, an object’s spatial orientation or any other information that can be derived from the image itself. Content based image retrieval is a hybrid research area, which needs knowledge of both mathematics and computer science for an efficient image retrieval system. Image retrieval is based on image matching, and image matching is performed by feature matching.


Figure 1.1: CBIR system architecture

The system architecture of a content based image retrieval system is demonstrated in Fig. 1.1. A typical CBIR system involves the following key items:

• Image database

• Query image

• Feature extraction method

• Similarity matching

• Evaluation measures

• Relevance feedback


Table 1.1: Image databases

Category                     Image database
Natural image database       Corel 1k; Corel 5k; Corel 10k; MIT natural and urban scene image database
Texture image database       MIT VisTex color database; MIT VisTex gray scale database; Brodatz database; STex database
Biomedical image database    OASIS MRI database
Facial image database        ORL face database

1.2.1 Image database

A CBIR system retrieves similar images from an existing image database. Many databases are available freely on the web, or one can make one's own database. In the presented work, four kinds of databases are used, i.e., natural, textural, medical and face image databases, as shown in Table 1.1. An explanation of each database according to its category is given below:

Database 1

Database 1 is the Corel-1k database [1], which consists of 1,000 natural images in 10 categories, with 100 images in each category. It includes images of Africans, beaches, buildings, dinosaurs, elephants, flowers, buses, hills, mountains and food. The size of the images in this database is either 384 × 256 or 256 × 384. Some sample images from Database 1 are shown in Fig. 1.2, in which 3 images per category are shown.

Figure 1.2: Corel 1k sample images [1]

Database 2

The Corel-5k database [2] is Database 2, and it has 5,000 images of random categories. It involves images of animals, e.g., bear, fox, lion, tiger, etc., as well as humans, natural scenes, buildings, paintings, fruits, cars, etc. It is a collection of 5,000 images in total, from 50 categories with 100 images per category. Sample images from Database 2 are shown in Fig. 1.3, with one image taken from each category of the Corel-5k database.

Database 3

Database 3 is a continuation of the Corel-5k database [2]. An extra 5,000 images are appended to the Corel-5k database to make it bigger and more versatile. Hence, it has 10,000 images of 100 types, with 100 images in each type. In addition to the Corel-5k database, it has images of ships, buses, food, textures, airplanes, furniture, army, ocean, cats, fishes, etc. Sample images from the Corel-10k database are shown in Fig. 1.4.

Database 4

The fourth database in the natural image category is taken from the Computational Visual Cognition Laboratory, MIT [4]. It contains a few hundred images of urban and natural scenes, e.g., coast & beach, forest, highway, city center, mountain, open country, streets and tall buildings. Each image in this database is of size 256×256. For experimental purposes, 200 images per category are selected. Sample images from each category are shown in Fig. 1.5.

Figure 1.3: Corel 5k image samples (one image per category) [2]

Figure 1.4: Sample images from Corel-10k database [2]

Figure 1.5: Sample images from urban and natural scene database, MIT [4]

Database 5

Database 5 is collected from the MIT VisTex database [3]. This database contains a large number of colored texture images, of which 40 textures are selected for the experiment. The size of each image is 512 × 512. For retrieval purposes, each of the 40 images is divided into 16 block images of size 128 × 128; hence, 16 images belong to each category, giving 40 categories with 640 images in total. Sample images from this database are shown in Fig. 1.6.

Figure 1.6: MIT VisTex color texture database image samples [3]
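Splitting each texture into non-overlapping sub-images is a simple array operation. The following minimal Python/NumPy sketch (function name illustrative, not from the thesis) shows how a 512 × 512 VisTex texture yields the 16 tiles of 128 × 128 described above:

```python
import numpy as np

def split_into_blocks(image, block=128):
    """Split a square texture (e.g., 512x512) into non-overlapping
    block x block sub-images, as done for the texture databases."""
    h, w = image.shape[:2]
    return [image[r:r + block, c:c + block]
            for r in range(0, h - block + 1, block)
            for c in range(0, w - block + 1, block)]

# a 512x512 image yields 16 sub-images of size 128x128
tiles = split_into_blocks(np.zeros((512, 512), dtype=np.uint8))
assert len(tiles) == 16
```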

Database 6

This database is a gray scale version of the MIT VisTex color database. It has images of the same size and scale as Database 5. Sample images are presented in Fig. 1.7.

Figure 1.7: MIT VisTex database sample images [3]

Database 7

Database 7 contains the Brodatz texture database [127]. It has 112 images in total, each of size 640×640. For retrieval purposes, each image is divided into 25 sub-images of size 128 × 128. Hence, 112 × 25, i.e., 2800 images exist in the database for the experiment. It is comparatively larger than the MIT VisTex database. Some images from the Brodatz database are shown in Fig. 1.8.

Database 8

Database 8 is the Salzburg Texture Image Database (STex), a big collection of texture images [65]. It contains 476 images in total, and each image is divided into 16 non-overlapping sub-images. In total, 7616 images are obtained from this database, in 476 categories. Some sample images from the STex database are given in Fig. 1.9.


Figure 1.8: Sample images from Brodatz texture database [127]

Database 9

The Open Access Series of Imaging Studies (OASIS) [78] is a publicly available dataset for research and study. It is a series of magnetic resonance imaging (MRI) scans. This database includes a cross-sectional collection of 421 subjects aged between 18 and 96 years. The MRI acquisition details are given in Table 1.2. These MRI images are grouped into four categories (124, 102, 89, and 106 images) based on the shape of the ventricles. Hence, this database contains 421 images in total, in 4 categories.

Table 1.2: MRI data acquisition details [78]

Sequence               MP-RAGE
TR (msec)              9.7
TE (msec)              4.0
Flip angle (°)         10
TI (msec)              20
TD (msec)              200
Orientation            Sagittal
Thickness, gap (mm)    1.25, 0
Resolution (pixels)    176 × 208


Figure 1.9: STex color texture database sample images [65]

Figure 1.10: OASIS Database sample images [78]


Database 10

The Olivetti Research Ltd (ORL) database of faces was created by AT&T Laboratories, Cambridge [5]. The images in this database were taken between April 1992 and April 1994. It contains images of 40 subjects, with 10 images per subject. For some subjects, the images were taken at different times, with different facial expressions, varying lighting, and with or without glasses. The size of each image in this database is 92 × 112. Sample images from each category are shown in Fig. 1.11.

Figure 1.11: ORL Database sample images [5]

1.2.2 Query image

The query image represents a sample of the kind of images the user wants to retrieve from the existing database. The query image can be any random image used to retrieve similar images; this is called query by example. To evaluate the performance of a CBIR system, a database image itself can be used as the query. A query image can also be formed from a sketch.


1.2.3 Feature extraction

Feature extraction is an essential step in image retrieval, and its importance depends on how precisely the feature extraction technique suits the image database in question. There are two types of features, called low level and high level features. Color, shape, texture, etc. are low level features, while conceptual and text descriptors are high level features. Low level features may be local or global descriptors.

The feature extraction methods proposed in this work are as follows, and are described in further chapters:

1. Wavelet based local features

2. Color-texture feature

3. Integration of two texture features

4. Local information based texture features

5. Hierarchical color-texture feature

1.2.4 Similarity measure

Feature extraction is performed for all the images of the database and for the query image, and a feature vector database is constructed for the full image database. After the feature extraction process, similarity matching is performed for the query image. The following distance measures have been used for the similarity match.

d1 distance
\[ D(db_k, q) = \sum_{m=1}^{L} \left| \frac{F_{db_k}(m) - F_q(m)}{1 + F_{db_k}(m) + F_q(m)} \right| \tag{1.1} \]

Euclidean distance
\[ D(db_k, q) = \left( \sum_{m=1}^{L} \left( F_{db_k}(m) - F_q(m) \right)^2 \right)^{\frac{1}{2}} \tag{1.2} \]

Manhattan distance
\[ D(db_k, q) = \sum_{m=1}^{L} \left| F_{db_k}(m) - F_q(m) \right| \tag{1.3} \]

Canberra distance
\[ D(db_k, q) = \sum_{m=1}^{L} \left| \frac{F_{db_k}(m) - F_q(m)}{F_{db_k}(m) + F_q(m)} \right| \tag{1.4} \]

Chi-square distance
\[ D(db_k, q) = \frac{1}{2} \sum_{m=1}^{L} \frac{\left( F_{db_k}(m) - F_q(m) \right)^2}{F_{db_k}(m) + F_q(m)} \tag{1.5} \]

where $D(db_k, q)$ measures the distance between the $k$-th database image $db_k$ and the query image $q$. The length of the feature vector is denoted by $L$, and $F_{db_k}$ and $F_q$ are the feature vectors of the $k$-th database image and the query image, respectively.
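These five measures translate directly into code. Below is a minimal NumPy sketch of Eqs. (1.1)-(1.5), assuming the feature vectors are nonnegative 1-D arrays (e.g., histograms); the small eps guard against empty bins is an implementation detail added here for illustration, not part of the definitions:

```python
import numpy as np

def d1_distance(f_db, f_q):
    """d1 distance (Eq. 1.1); features assumed nonnegative."""
    return np.sum(np.abs((f_db - f_q) / (1.0 + f_db + f_q)))

def euclidean_distance(f_db, f_q):
    """Euclidean distance (Eq. 1.2)."""
    return np.sqrt(np.sum((f_db - f_q) ** 2))

def manhattan_distance(f_db, f_q):
    """Manhattan distance (Eq. 1.3)."""
    return np.sum(np.abs(f_db - f_q))

def canberra_distance(f_db, f_q, eps=1e-12):
    """Canberra distance (Eq. 1.4); eps guards empty bins."""
    return np.sum(np.abs((f_db - f_q) / (f_db + f_q + eps)))

def chi_square_distance(f_db, f_q, eps=1e-12):
    """Chi-square distance (Eq. 1.5)."""
    return 0.5 * np.sum((f_db - f_q) ** 2 / (f_db + f_q + eps))
```

Ranking the database by any of these distances, smallest first, gives the retrieved images for a query.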

1.2.5 Evaluation measure

Precision and recall are used to observe the performance of the CBIR system. The precision of the system represents the ratio of the number of relevant images among the retrieved images to the total number of images retrieved from the database. In the same manner, recall gives the proportion of the number of relevant images among the retrieved images to the total number of relevant images in the database. For a given query image $i$, if $n$ images in total are retrieved, then precision and recall can be calculated as:

\[ P(i, n) = \frac{\text{Number of relevant images retrieved}}{n} \tag{1.6} \]

\[ R(i, n) = \frac{\text{Number of relevant images retrieved}}{N_{ic}} \tag{1.7} \]

where $N_{ic}$ indicates the total number of relevant images in the database, i.e., the number of images in each category of the database. Average precision and average recall are formulated as:

\[ P_{avg}(j, n) = \frac{1}{N_{ic}} \sum_{i=1}^{N_{ic}} P(i, n) \tag{1.8} \]

\[ R_{avg}(j, n) = \frac{1}{N_{ic}} \sum_{i=1}^{N_{ic}} R(i, n) \tag{1.9} \]

where $j$ denotes the category number. Finally, total precision and total recall for the whole database are calculated as:

\[ P_{total}(n) = \frac{1}{N_c} \sum_{j=1}^{N_c} P_{avg}(j, n) \tag{1.10} \]

\[ R_{total}(n) = \frac{1}{N_c} \sum_{j=1}^{N_c} R_{avg}(j, n) \tag{1.11} \]

where $N_c$ is the total number of categories in the database. Precision and recall are strong evaluation measures, and the F-measure combines them in a harmonic mean. The F-measure is defined as a relation between precision and recall, and it gets larger when both precision and recall are large. The F-measure is calculated from precision and recall as follows:

\[ F = \frac{2 \times precision \times recall}{precision + recall} \tag{1.12} \]
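A minimal sketch of Eqs. (1.6), (1.7) and (1.12) for a single query, assuming retrieved images are judged relevant when they share the query's category label (function names are illustrative, not from the thesis):

```python
import numpy as np

def precision_recall(retrieved_labels, query_label, n_relevant):
    """Precision (Eq. 1.6) and recall (Eq. 1.7) for one query.
    retrieved_labels: category labels of the n retrieved images;
    n_relevant: images in the query's category, i.e. N_ic."""
    hits = np.sum(np.asarray(retrieved_labels) == query_label)
    precision = hits / len(retrieved_labels)
    recall = hits / n_relevant
    return precision, recall

def f_measure(precision, recall):
    """Harmonic combination of precision and recall (Eq. 1.12)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```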

The average normalized modified retrieval rank (ANMRR) is used by the MPEG group to evaluate the performance of a system [77]. For a given query image $Q$, the total number of relevant images in the database (ground-truth values) is $N_g(Q)$. The rank of each ground-truth image $i$ for query $Q$ is defined as $Rank_1(i)$, i.e., the position of the ground-truth image $i$ among the retrieved images. Moreover, a variable $K(Q) > N_g(Q)$ is defined as a limit on the ranks. Among the retrieved images, a ground-truth image that has a rank greater than $K(Q)$ is considered a miss, and a new rank $Rank(i)$ is defined as follows:

\[ Rank(i) = \begin{cases} Rank_1(i) & \text{if } Rank_1(i) \leq K(Q) \\ 1.25 \times K(Q) & \text{if } Rank_1(i) > K(Q) \end{cases} \tag{1.13} \]

\[ K(Q) = \min\left( 4 \times N_g(Q),\; 2 \times \max(N_g(Q), \forall Q) \right) \tag{1.14} \]

The average rank (AVR) can be defined as:

\[ AVR(Q) = \frac{1}{N_g(Q)} \sum_{i=1}^{N_g(Q)} Rank(i) \tag{1.15} \]

The modified retrieval rank (MRR) and normalized modified retrieval rank (NMRR) for different ground-truth values are defined as:

\[ MRR(Q) = AVR(Q) - 0.5 \times [1 + N_g(Q)] \tag{1.16} \]

\[ NMRR(Q) = \frac{MRR(Q)}{1.25 \times K(Q) - 0.5 \times (1 + N_g(Q))} \tag{1.17} \]

The average normalized modified retrieval rank (ANMRR) is the average of NMRR over different queries:

\[ ANMRR = \frac{1}{N_Q} \sum_{q=1}^{N_Q} NMRR(q) \tag{1.18} \]

where $N_Q$ is the number of query images. The ANMRR value lies between 0 and 1, and an ANMRR value closer to 0 indicates that more ground-truth results were found in retrieval. Further explanation of ANMRR can be found in [77].
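Eqs. (1.13)-(1.18) can be computed from the retrieval ranks of the ground-truth images alone. The following sketch is an illustration under that reading, with $K(Q)$ computed per Eq. (1.14) and 1-based ranks assumed:

```python
import numpy as np

def nmrr(ground_truth_ranks, ng_max):
    """NMRR for one query (Eqs. 1.13-1.17).
    ground_truth_ranks: 1-based retrieval ranks of the query's
    ground-truth images; ng_max: max of Ng(Q) over all queries."""
    ranks = np.asarray(ground_truth_ranks, dtype=float)
    ng = len(ranks)
    k = min(4 * ng, 2 * ng_max)                        # Eq. 1.14
    penalized = np.where(ranks <= k, ranks, 1.25 * k)  # Eq. 1.13
    avr = penalized.mean()                             # Eq. 1.15
    mrr = avr - 0.5 * (1 + ng)                         # Eq. 1.16
    return mrr / (1.25 * k - 0.5 * (1 + ng))           # Eq. 1.17

def anmrr(per_query_ranks):
    """Average NMRR over all queries (Eq. 1.18)."""
    ng_max = max(len(r) for r in per_query_ranks)
    return np.mean([nmrr(r, ng_max) for r in per_query_ranks])
```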

1.2.6 Relevance feedback

The CBIR system provides results based on feature extraction and feature matching. Relevance feedback is a technique that takes user feedback and improves the results. It is a supervised learning technique that helps in upgrading the performance of the system. It works as a mapping between low level features and conceptual features based on user requirements. Low level features are directly related to image contents, e.g., color, shape, texture. Feedback from the user leads a CBIR system from low level features to high level semantics. In the relevance feedback process, the query image is modified based on user feedback, and the retrieval technique is run again for better results.

1.3 Object tracking

Object tracking is a crucial issue in the field of pattern recognition and computer vision. It mainly finds applications in the areas of vehicle navigation, traffic monitoring, face tracking, etc. Object tracking includes tasks such as object detection in a frame, object feature extraction and object tracking using features.

Object detection is the process of finding notable instances of real-world objects, such as cars on a road, faces in a crowd, planes in the sky, and buildings, in images or videos. Object detection algorithms use image features and learning algorithms to detect instances of an object category. The next task is feature extraction for the detected object, and it depends on the category of the video and the detected objects. Color, object shape, texture, object orientation and other related features can be utilized in this step as the video requires. Tracking is the key step of the object tracking process. The tracking algorithm chases the object of user interest through subsequent frames. In the presented thesis work, a problem of object tracking is solved and a novel texture feature is proposed.


1.4 Shot boundary detection

A video is four dimensional data: a collection of images with a temporal relation between sequential images. A video scene is made of shots, and shots contain similar images. A keyframe is a frame which is assumed to contain most of the information of a shot. There may be one or more keyframes according to the requirement of the system. Shot detection and key frame selection are the initial stages of a video retrieval system. It is nearly impossible to process a video for retrieval or analysis tasks without key frame detection. Key frame detection reduces a large amount of data from the video, which makes further processing easier. In this work, a shot boundary detection problem is solved using a hierarchical approach with color and texture features. The hierarchical technique is used as a two step approach to remove the redundant information of keyframes.
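As an illustration of the first, global step of such a two step scheme, the sketch below flags candidate boundaries where the RGB histogram difference between consecutive frames exceeds a threshold. The threshold value and function names are assumptions for illustration; a second pass with a local feature such as LBP would then refine the candidates:

```python
import numpy as np

def rgb_histogram(frame, bins=16):
    """Global RGB color histogram of a frame (H x W x 3, uint8)."""
    hist = [np.histogram(frame[..., ch], bins=bins, range=(0, 256))[0]
            for ch in range(3)]
    return np.concatenate(hist).astype(float)

def candidate_boundaries(frames, threshold=0.3):
    """First (global) pass of a hierarchical scheme: flag frame
    pairs whose normalized histogram difference is large.
    threshold is a hypothetical value, to be tuned per video."""
    bounds = []
    for i in range(1, len(frames)):
        h1 = rgb_histogram(frames[i - 1])
        h2 = rgb_histogram(frames[i])
        diff = np.abs(h1 - h2).sum() / max(h1.sum(), 1.0)
        if diff > threshold:
            bounds.append(i)
    return bounds
```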

1.5 Literature survey

Numerous methods for low level and high level image feature extraction have been proposed by researchers to enhance the accuracy and reduce the computation in image retrieval. The proposed work in this thesis is related to low level feature extraction in different applications; hence, a brief literature survey is given which describes visual descriptors.

1.5.1 Color features

Color is a captivating feature of an image and very eye-catching for humans. Features for color images extract information regarding the color distribution. Different color spaces (RGB, HSV, YCbCr, etc.) retain different kinds of color distribution that can be used to extract a variety of color features. Swain and Ballard presented the idea of the color histogram, and a distance measure for image matching via histograms [141]. Two new schemes were presented by Stricker and Orengo for color indexing, in which the first holds the complete color distribution and the second contains only major features instead of the full distribution [139]. For both color and texture information, the standard wavelet transform and the Gabor wavelet transform were combined with the color histogram and applied to image retrieval [86]. Further, new color features have been proposed using co-occurrence and clustering. Lin et al. proposed three features, namely the color co-occurrence matrix (CCM), the difference between pixels of scan pattern (DBPSP) and the color histogram for K-mean (CHKM), in which CCM and DBPSP are related to color and texture, and CHKM corresponds to the color feature [68]. An integrated color and intensity co-occurrence matrix has been proposed for color and texture features; a composition of color and texture features is computed in it rather than a separation. Instead of RGB, the HSV color space is used for color representation, and this method is applied to image retrieval in large, labeled and unlabeled image databases [147].

The color histogram considers the frequency of each intensity, but it does not handle the spatial correlation of colors. To overcome this issue, the color correlogram was proposed, which considers the spatial correlation of color intensities in the image [45]. Again, the color correlogram was used for the feature vector, and a relevance feedback technique was applied for supervised learning in two ways, the first improving the query image and the second learning the distance metric, yielding improved results in image retrieval [44]. The color coherence vector was introduced for image retrieval, which uses the coherence and incoherence of image pixel colors, and was compared with the color histogram for image retrieval [113].
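As a concrete illustration of histogram based color indexing in the spirit of [141], the following sketch computes a normalized joint histogram over the three channels of an HSV image; the bin counts are illustrative choices, not values from the cited works:

```python
import numpy as np

def quantized_color_histogram(hsv_image, bins=(8, 8, 4)):
    """Normalized joint HSV histogram; hsv_image is an
    H x W x 3 array with 8-bit channels. Bin counts (8, 8, 4)
    are an illustrative quantization, not from the thesis."""
    hist, _ = np.histogramdd(
        hsv_image.reshape(-1, 3).astype(float),
        bins=bins, range=((0, 256), (0, 256), (0, 256)))
    hist = hist.ravel()
    return hist / hist.sum()  # normalize for scale invariance
```

Such a histogram can be compared with any of the distance measures of Section 1.2.4.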

1.5.2 Texture features

Texture is a prominent feature of an image, and it has been useful in many pattern recognition applications. Texture is defined by small repeated patterns in an image. The gray level co-occurrence matrix (GLCM) was first introduced by Haralick, and it is a very popular method for extracting statistical features of an image [40]. The GLCM is a matrix that depends on the co-occurrence of every pair of pixels in the image. Haralick calculated statistical features of the GLCM for texture feature extraction. The GLCM was applied directly to the image to calculate the features, but Zhang et al. used edge images to extract more precise information using the GLCM in texture images [177]. They applied the Prewitt edge detector in four directions, calculated the GLCM of the edge images, and used statistical features of the co-occurrence matrices for texture image retrieval. The GLCM was extended to single and multi-channel co-occurrence matrices for the RGB and LUV color channels, and applied to color texture image retrieval [108]. Partio et al. used the gray level co-occurrence matrix with statistical features for rock texture image retrieval [111]. Gaussian smoothing and pyramid representation were utilized for extracting multi-scale images, the GLCM was applied to the obtained multi-scale images, and statistical features were calculated for image retrieval by Siqueira et al. [123]. Further, the GLCM has been used broadly in different applications [12, 62, 28].
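As an illustration, a GLCM for one displacement can be accumulated with a few NumPy operations. The sketch below counts co-occurrences of gray levels at offset (dx, dy); it is a simplified rendering of the construction in [40], not code from the cited works:

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=256):
    """Gray level co-occurrence matrix for one displacement
    (dx, dy): entry (i, j) counts how often gray level i has a
    pixel of gray level j at that offset. Assumes gray levels
    lie in [0, levels)."""
    img = np.asarray(image, dtype=np.intp)
    h, w = img.shape
    ref = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    nbr = img[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)
    return m
```

Haralick's statistics (contrast, energy, homogeneity, etc.) are then scalar functions of the normalized matrix.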

1.5.3 Local features

Local features provide each pixel's local information, which is useful for detecting texture patterns in images. Ojala et al. presented the local binary pattern (LBP), which has proved its excellence as a standard feature descriptor in many areas [105]. The local binary pattern was modified into the uniform and rotation invariant local binary pattern [106]. A translation, rotation and scale invariant method using color and edges has been proposed for color-texture and natural image retrieval [168]. LBP compares all neighboring pixels with the center pixel, but Heikkilä et al. presented the center symmetric local binary pattern (CSLBP), which computes the difference in four directions [42]. Tan and Triggs proposed the local ternary pattern (LTP), which compares neighboring pixels and the center pixel with a threshold interval and assigns a ternary pattern (1, 0, -1). Further, it is converted into two binary patterns (0, 1), and this method was applied to face recognition [144]. LBP and LTP are based on all neighboring pixels evenly. A direction based method called the directional local extrema pattern (DLEP) has been proposed for directional edge information in the 0◦, 45◦, 90◦ and 135◦ directions, and applied to image retrieval [95]. The local extrema pattern has been proposed by Murala et al., and a joint histogram of color and LEP has been applied to object tracking [93].
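For reference, the basic 3 × 3 LBP of [105] can be sketched as follows: each of the eight neighbors is thresholded against the center pixel, the resulting bits are weighted by powers of two, and the histogram of the codes serves as the texture feature (a minimal illustration, not the thesis implementation):

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 local binary pattern: threshold the eight
    neighbors against the center and weight by powers of two."""
    g = np.asarray(gray, dtype=np.int32)
    center = g[1:-1, 1:-1]
    # neighbor offsets, clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        nbr = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code += (nbr >= center).astype(np.int32) << bit
    return code

def lbp_histogram(gray):
    """256-bin histogram of LBP codes used as the feature vector."""
    return np.bincount(lbp_image(gray).ravel(),
                       minlength=256).astype(float)
```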

A moment based local binary pattern has been proposed, in which LBP is derived from momentgrams, and the momentgrams are constructed from moment invariants of the original image [109]. Zhang et al. proposed the local derivative pattern (LDP) [175], which is a higher order local binary pattern, and used it for face recognition. Local ternary co-occurrence patterns (LTCoP) have been proposed for medical image retrieval, utilizing the properties of LTP and LDP [87]. A method based on edge distribution using a local pattern was proposed, called the local maximum edge binary pattern (LMEBP). It is obtained by considering the magnitudes of the local differences between the center pixel and the reference eight neighborhood pixels in descending order, and the LMEBP is obtained for all eight neighboring pixels. LMEBP was applied to image retrieval and object tracking [85]. Further, LMEBP was extended by Jasmine and Kumar [49], in which only the first three uniform and rotation invariant LMEBPs were considered as the feature vector; an HSV color histogram was also used for the feature vector, and finally a joint histogram was constructed for image retrieval. After the local binary pattern and local ternary pattern, Murala et al. proposed the local tetra pattern (LTrP), which takes advantage of the vertical and horizontal directional neighborhoods of each pixel and constructs a tetra pattern, which is again converted into binary patterns [96]. They combined it with the Gabor transform and applied it to image retrieval. Jacob et al. extended local tetra patterns to the RGB color channels: for each center pixel of a particular color channel, the other color channels were used for the horizontal and vertical direction pixels, and this was applied to image retrieval [51].

1.5.4 Biomedical image retrieval

Content based image retrieval can be beneficial in medical imaging for handling large image databases. It can be very useful for medical students and interns, who can learn about diseases by retrieving images similar to a particular image. Medical image retrieval has been performed using an open source system (GNU Image Finding Tool) with some improvements using histograms and Gabor filters [83]. The discrete sine transform has been used for feature extraction, and a boosting method has been applied to increase the accuracy of the system [63]. Image retrieval has been performed using the wavelet transform with Daubechies, Haar and Gabor wavelets, and statistical features have been extracted for magnetic resonance image retrieval [146]. The directional binary wavelet pattern has been proposed for face and biomedical image retrieval using the binary wavelet and the local binary pattern [94]. Felipe et al. proposed medical image retrieval using the gray level co-occurrence matrix in the 0◦, 45◦, 90◦ and 135◦ directions at distances 1, 2, 3, 4 and 5, with the feature vector obtained from the GLCM [34].

Murala et al. proposed the local mesh pattern (LMeP) for biomedical image retrieval and indexing, which creates a local pattern using a mesh of neighboring pixels [90]. Peak valley edge patterns, which extract directional edge information using the first order derivative, were proposed for medical image retrieval [88]. Local mesh patterns and peak valley edge patterns were then combined into local mesh peak valley edge patterns, proposed for MRI and CT image indexing and retrieval [91].

1.5.5 Object tracking

Object tracking of non-rigid objects with a moving camera has been performed with the mean shift tracking algorithm, with dissimilarity measured by a distance derived from the Bhattacharyya coefficient [21]. For better object tracking, shadow detection and suppression have been carried out using the HSV color information of moving objects [25]. Kernel based object tracking has been employed for non-rigid objects using the histogram as a feature space [22]. Shape features have also been utilized together with the HSV color histogram, using edge histograms in different directions, and applied to object tracking [131]. An interest point based tracking algorithm was proposed by Babu and Parate [11]. Texture recognition has been applied in the temporal domain for dynamic sequences using local binary patterns on three orthogonal planes [179]. A modified LBP robust to illumination variation was proposed and applied to detect moving objects in a video sequence [41]. Takala et al. used the color histogram, color correlogram and local binary pattern for color and texture features; motion features were extracted using trajectories and applied to object tracking in indoor and outdoor videos [143].

Object tracking under illumination, occlusion and object/camera motion conditions has been proposed using local features [115]. A two-layer feature learning module has been proposed using a neural network, with pre-learned features adopted in tracking mode on video sequences [157]. A joint color-texture histogram, created from LBP and the RGB color channels, has been used for feature extraction, with the mean shift algorithm applied for object tracking [102]. A novel method called the spatial extended center-symmetric local binary pattern was proposed for background subtraction from image sequences [164]. The local maximum edge binary pattern (LMEBP) has been proposed, and the rotation invariant uniform LMEBP has been applied for object tracking using the mean shift tracking algorithm [85]. Dash et al. proposed a method based on local binary patterns and Ohta color features instead of RGB, and employed it for object tracking [27]. Multiple object tracking in long sports videos was proposed by Liu et al. using the short-term activity of each player in the game [69].


1.5.6 Shot detection

A video shot transition happens in two ways: abrupt and gradual. An abrupt transition occurs as a sudden cut, whereas gradual transitions include shot dissolves and fades. Many algorithms have been proposed to detect abrupt and gradual shot transitions in video sequences [15]. A hierarchical shot detection algorithm was proposed that handles abrupt and gradual transitions in different stages [16]. Wolf and Yu presented a method for hierarchical shot detection based on the analysis of different shot transitions, using multi-resolution analysis; their hierarchical approach detects different shot transitions, e.g., cut, dissolve, wipe-in, wipe-out, etc. [172]. Local and global feature descriptors have been used for feature extraction in shot boundary detection. Apostolidis et al. used local SURF features and global HSV color histograms to detect gradual and abrupt transitions for shot segmentation [9].

Images are still, and generally spatial information is extracted for analysis purposes. For video analysis, however, temporal information should be considered along with spatial information. Temporal information describes the activity and the transition from one frame to another. Rui et al. proposed a keyframe detection algorithm using a color histogram and an activity measure: spatial information was analyzed using the color histogram, the activity measure was used for temporal information, and similar shots were grouped later for better segmentation [125]. A two stage video segmentation technique was proposed using a sliding window: a segment of frames is used to detect shot boundaries in the first stage, and in the second stage the 2-D segments are propagated across the window of frames in both the spatial and temporal directions [119]. Tippaya et al. proposed a shot detection algorithm using the RGB histogram and the edge change ratio, with three different dissimilarity measures used to compute the difference between frame feature vectors [145].

Event detection and video content analysis have been performed based on shot detection and keyframe selection algorithms [23]. Similar scene detection has been carried out using a clustering approach, and a story line has been constructed from a long video [169]. Event detection in sports videos has been analyzed using long, medium and close-up shots, and play breaks are extracted for video summarization [32]. A shot detection technique has been implemented based on the visual and audio content of video, with the wavelet transform domain utilized for feature extraction [97].


1.6 Objective

The main objective of this thesis is to introduce feature extraction methods for computer vision applications that include content based image retrieval, object tracking and shot boundary detection. Many methods for low level features have been proposed by researchers, as explained in the literature survey section. This thesis work concentrates on local feature extraction methods using the neighboring intensities of image pixels. Extended versions of the traditional LBP are proposed with respect to the orientation of pixels, the co-occurrence of pixel pairs, the mutual relationship of neighboring pixels, etc. Targeting better feature extraction methods with respect to accuracy, the objectives of this work are as follows:

• Traditional LBP and its extended versions extract a local pattern relating neighboring pixels to the center pixel and convert the pattern map into a histogram to create a feature vector. The local pattern map contains more information than a histogram can summarize. To extract this additional information, the co-occurrence of pixel pairs is used in this work. Co-occurrence captures the mutual occurrence of pixel pairs instead of the occurrence of each individual pattern (histogram) in the pattern map.

• Local information based on pixels in different directions can give more detailed features than traditional LBP. The design of a direction based feature extraction method for image retrieval is targeted in this work.

• Most local patterns proposed in the literature use the relationship of the neighboring pixels with the center pixel. There is a need for a local pattern that can extract the mutual relationship between neighboring pixels. This problem is solved using a novel local pattern in this thesis.

• The problem of tracking an object is addressed using a novel local pattern. The proposed local pattern aims to extract directional information using fewer pixels, which results in a reduced feature vector length.

• Shot boundary detection and keyframe extraction aim to reduce a video to a few frames so that it can be used for further processing. A video scene may contain many repeating shots in a nonconsecutive manner, and a direct approach to keyframe extraction may lead to redundant keyframes in this scenario. In this work, the main aim in solving the shot boundary detection problem is to reduce redundant information in the video summary using a hierarchical approach.

1.7 Organization of the thesis

This thesis presents novel low level image descriptors and the integration of different image features. The whole work is organized in nine chapters. Chapter 2 presents two new feature descriptors using the discrete wavelet transform (DWT) and local patterns. Murala et al. proposed the local extrema pattern (LEP) [93] and the directional local extrema pattern (DLEP) [95] for object tracking and image retrieval respectively. In the proposed work, two techniques have been established to extract local features from the wavelet transform domain. In the first method, a two level DWT is applied to the original image, and seven sub-band images are obtained; to extract local features, local extrema patterns are computed and histograms of all LEP maps are created. In the second method, a one-level DWT is applied to the image, and four sub-band images (approximation, horizontal, vertical and detail) are obtained. DLEP is a directional method and works in four directions, i.e., 0◦, 45◦, 90◦ and 135◦. The four DLEPs are applied to the DWT sub-band images in such a way that maximum directional information is obtained. Both methods are tested on the Corel-5k and Corel-10k databases [2], and precision and recall are obtained to verify the performance of the presented methods. Both techniques are compared with some existing local patterns, and it has been observed that both techniques are better than the others.

Chapter 3 focuses on a feature extraction method that handles color and texture information together. In this chapter, we propose a novel feature descriptor called the local extrema co-occurrence pattern. Each image in the database is converted from the RGB to the HSV color space, since the HSV color space gives information about hue, saturation and value separately. Quantized color histograms of the hue and saturation components are calculated for the color information of the image, and the local extrema co-occurrence pattern is applied to the value component to extract the texture information.


Further, this method is applied to three natural image databases and two texture image databases: Corel-1k [1], Corel-5k [2], Corel-10k [2], the MIT VisTex color-texture database [3] and the STex database [65] are used to determine the performance of the method. Moreover, this method is compared with some other local patterns combined with color histograms. Evaluation measures, e.g., precision, recall and F-measure, are used to validate the performance of the proposed method against the other methods.

Chapter 4 focuses on the problem of designing a multi-purpose feature descriptor for databases of different categories. Heikkilä et al. proposed CSLBP, which utilizes only center-symmetric pixel pairs to create the local pattern [42]. They used a histogram to create the CSLBP feature vector; a histogram, however, only captures the number of occurrences of each pattern value in the pattern map. In the proposed method, the co-occurrence of every pixel pair is observed in different directions, and the feature vector is built accordingly. CSLBP is applied to the original image to obtain the pattern map. GLCMs with two different distances and four different directions are applied to the pattern map and combined in different ways; the final feature vector is obtained by combining four GLCMs of different directions and distances. This method is applied to two texture databases (the MIT VisTex database [3] and the Brodatz database [127]), one face database (the ORL face database [5]) and one MRI image database (the OASIS MRI database [78]). The method is compared with existing local patterns for performance measurement, and the precision and recall curves demonstrate the accuracy of the proposed method.

Chapter 5 discusses the problem of texture and face image retrieval. A novel feature descriptor called the local tri-directional pattern (LTriDP) is proposed, which extracts information for each pixel in the image using its neighboring pixels. The nearest neighborhood of 8 pixels is considered for pattern creation, and the three most adjacent pixels of each neighboring pixel are taken for pattern formation. Based on their differences with the corresponding neighboring pixel, a tri-directional pattern is formed, which is further converted into binary patterns. For additional information, a magnitude pattern is also combined with LTriDP, and the histograms of both patterns are concatenated. This algorithm is applied to texture and face image retrieval using the Brodatz texture database [127], the MIT VisTex database [3] and the ORL face database [5].

Chapter 6 presents a novel feature extraction method called the local neighborhood difference pattern (LNBD). LBP analyzes the relationship of the center pixel with its neighboring pixels; in the proposed method, the mutual relationship of neighboring pixels is considered. The relationship of each neighboring pixel with two other neighboring pixels is observed and converted into a binary pattern map. For additional information, LBP (center-neighborhood pixel relationship) and LNBD (neighboring pixels' mutual relationship) are combined into one feature vector. The proposed method is applied to the Corel-10k database [2], the MIT natural scene database [4], the Brodatz texture database [127] and the STex database [65] for image retrieval. Precision and recall curves are measured for the proposed method and for other existing methods, and the evaluation curves show that the proposed method outperforms the others.

Chapter 7 discusses the problem of object tracking in a video sequence using color and texture information. A novel texture descriptor, the local rhombus pattern (LRP), is proposed in this work. LRP considers the four neighboring pixels that fall on the rhombus around the center pixel. The local relationship of these four pixels with the other four neighboring pixels is obtained using differences, and a binary pattern is constructed. The HSV color histogram is acquired for color information, and a joint histogram of HSV color and LRP is obtained. The proposed method is tested on traffic and sports videos.

Chapter 8 presents the problem of shot detection in video sequences. The main motivation is to eliminate redundant shots and extract keyframes. In the proposed work, a hierarchical shot detection algorithm is developed in two stages. The first stage extracts the temporal information of the video, detects the initial shot boundaries and extracts keyframes for each shot. In the second stage, the spatial information of the keyframes extracted in the first stage is analyzed, and redundant keyframes are excluded. Keyframe extraction is performed using the entropy of each frame in the video. Experiments have been conducted on news, movie clip and TV advertisement videos, and the results show a major difference between the numbers of keyframes extracted in the first and final stages of the process.

Chapter 9 concludes the work presented in all the above chapters. It summarizes the performance of the proposed methods over existing methods in terms of accuracy. Future work using the proposed methods in different applications is also described.


Chapter 2

CBIR System using Discrete Wavelet Transform and Local Patterns

Content based image retrieval is a pressing need of the present digital imaging world. Modern advances in technology and the growth of digital image collections demand robust and methodical systems for searching and retrieving images, and content based image retrieval is a solution to this challenging problem. Many methods based on statistics, transformations and local patterns have been proposed to achieve this task.

In 1982, Jean Morlet initiated the idea of the wavelet transform. The discrete wavelet transform (DWT) decomposes a signal into sub-bands obtained by applying low pass and high pass filters. It is used in signal processing, image denoising, fingerprint verification and speech recognition, among other applications [133]. Among feature extraction methods, local patterns have established themselves because of their efficiency and simplicity. However, most local patterns have been extracted from the original image. The main motivation of the presented work is to extract local patterns from the transform domain to obtain more information.


In this chapter, two new content based image retrieval schemes are discussed. Local features are acquired from the transform domain: the discrete wavelet transform is used to obtain sub-band images, and local features using the local extrema pattern (LEP) [93] and the directional local extrema pattern (DLEP) [95] are extracted from the DWT domain. DLEP extracts features in four directions. Local feature extraction from the transform domain gives more robust features than extraction from the original image, as demonstrated in the experimental section of this chapter. These methods are tested on large natural image databases for image retrieval and compared with several local feature extraction methods to validate their accuracy.

2.1 Preliminaries

2.1.1 Discrete wavelet transform

Figure 2.1: 1-level discrete wavelet transform example

The wavelet transform is a modified and improved version of the Fourier transform. The Fourier transform is generated from sinusoid functions, whereas the wavelet transform is generated from localized wave functions called wavelets. The two dimensional discrete wavelet transform decomposes an image into four parts using low and high pass filters. The filters are first applied along one dimension (row-wise) and then along the other (column-wise), as shown in Fig. 2.2. After the filtering process, down-sampling is performed to reduce the computation. In this process, four sub-band images are obtained, called the approximation, horizontal, vertical and detail parts. This is called one level decomposition; for the next level, the same process is applied again to the 1-level approximation part.

Figure 2.2: 2-dimensional filter bank and down-sampling process for the 2D-DWT (rows filtered with L(x)/H(x) and columns with L(y)/H(y), each followed by down-sampling by 2, yielding the approximation, horizontal, vertical and diagonal sub-bands)
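As an illustration of this decomposition, the following is a minimal sketch using the PyWavelets library (an assumption; the thesis does not name an implementation), producing the seven sub-band images that proposed method 1 later uses:

```python
import numpy as np
import pywt

# a stand-in gray scale image; any 2-D array works
image = np.random.rand(256, 256)

# 2-level 2D DWT with Daubechies-4 wavelets (db4), as in proposed method 1
coeffs = pywt.wavedec2(image, wavelet='db4', level=2)
cA2, (cH2, cV2, cD2), (cH1, cV1, cD1) = coeffs

# one approximation + two sets of (horizontal, vertical, detail) sub-bands
subbands = [cA2, cH2, cV2, cD2, cH1, cV1, cD1]
print([s.shape for s in subbands])  # seven sub-band images
```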

2.1.2 Local extrema pattern

Figure 2.3: Local extrema pattern (a worked 3 × 3 example: neighbor-center differences, the sign test in the four directions, and the resulting 4-bit pattern value)

The local extrema pattern is a local feature descriptor in which the pattern value depends on the neighboring pixels in particular directions [93]. The LEP operator finds the difference between the center pixel and its neighboring pixels and, based on the values of the differences in four directions (0◦, 45◦, 90◦ and 135◦), decides the value of the pattern: if both differences have the same sign it gives 1, and if they have different signs it gives 0. The LEP of the center pixel is obtained as follows:

$$I'_k = I_k - I_c; \quad k = 1, 2, \ldots, 8 \tag{2.1}$$

$$I'_k(\theta) = F_1(I'_j, I'_{j+4}); \quad j = (1 + \theta/45^\circ), \quad \forall\, \theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ \tag{2.2}$$

$$F_1(I'_j, I'_{j+4}) = \begin{cases} 1 & I'_j \times I'_{j+4} \ge 0 \\ 0 & \text{else} \end{cases}$$

$$LEP(I_c) = \sum_{\theta} 2^{\theta/45^\circ} \times I'_k(\theta), \quad \forall\, \theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ \tag{2.3}$$

where $I_c$ and $I_k$ are the center and neighboring pixels, and $\theta$ denotes the direction angle of the LEP. The feature vector is obtained from the histogram of the LEP map:

$$\mathrm{Hist}(L)\,|_{LEP} = \sum_{j_1=1}^{m} \sum_{j_2=1}^{n} F_2\big(LEP(j_1, j_2), L\big); \quad L \in [0, 15] \tag{2.4}$$

$$F_2(x_1, x_2) = \begin{cases} 1 & x_1 = x_2 \\ 0 & \text{else} \end{cases} \tag{2.5}$$

where $L$ is an intensity in the LEP map. The illustration of LEP is given in Fig. 2.3.
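For concreteness, the following is a minimal sketch of Eqs. 2.1–2.3 (the function name lep and the vectorized NumPy formulation are illustrative choices); it works on float arrays so that it can also be applied to wavelet coefficients later in this chapter:

```python
import numpy as np

def lep(img):
    """A minimal sketch of the local extrema pattern (Eqs. 2.1-2.3)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:h-1, 1:w-1]
    # neighbors I_1..I_8 ordered so that indices j and j+4 are opposite,
    # giving the 0, 45, 90 and 135 degree direction pairs
    offs = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
            (0, -1), (1, -1), (1, 0), (1, 1)]
    diff = [img[1+dy:h-1+dy, 1+dx:w-1+dx] - center for dy, dx in offs]
    out = np.zeros(center.shape, dtype=np.uint8)
    for bit in range(4):                                  # theta = bit * 45
        same_sign = (diff[bit] * diff[bit + 4]) >= 0      # F_1 of Eq. 2.2
        out |= (same_sign.astype(np.uint8) << bit)        # 2^(theta/45) weight
    return out  # LEP map with values in [0, 15]; its 16-bin histogram is Eq. 2.4
```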

2.1.3 Directional local extrema pattern

The directional local extrema pattern (DLEP) was proposed by Murala et al. for content based image retrieval [95]. The DLEP operator compares the neighboring pixels in the four directions (0◦, 45◦, 90◦, 135◦) with the center pixel, and assigns a binary pattern that depends on the relationship of the pixels in the 0◦, 45◦, 90◦ and 135◦ directions. The mathematical formulation of DLEP is given below:

$$F_\theta(I_c) = F_1(I'_j, I'_{j+4}); \quad j = (1 + \theta/45^\circ), \quad \forall\, \theta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$$

$$DLEP_{pat}(I_c)|_\theta = \{F_\theta(I_c);\, F_\theta(I_1);\, F_\theta(I_2);\, \ldots;\, F_\theta(I_8)\}$$

$$DLEP(I_c)|_\theta = \sum_{k=0}^{8} 2^k \times DLEP_{pat}(I_c)|_\theta(k) \tag{2.6}$$

where $I'_j$ is defined as in Eq. 2.1, $I_c$ and $I_k$ are the center and neighborhood pixels, and $\theta$ denotes the direction angle of the DLEP. A histogram is used to build the feature vector.


Figure 2.4: Directional local extrema pattern (a worked 5 × 5 example showing, for one direction, the sign test at the center pixel and at its eight neighbors, giving the 9-bit DLEP pattern 011111110)

The illustration of the DLEP calculation is shown in Fig. 2.4, and further details about DLEP can be found in [95]. The histogram of each directional DLEP map is computed as

$$\mathrm{Hist}(L)\,|_{DLEP(\theta)} = \sum_{i_1=1}^{m} \sum_{i_2=1}^{n} F_2\big(DLEP(i_1, i_2)|_\theta, L\big); \quad L \in [0, 511]$$

2.2 Proposed methods

Two new schemes are proposed in this work using local patterns and the 2D-DWT. In both methods, local patterns are extracted from the wavelet transform domain.


2.2.1 Proposed method 1

This work presents a new multi-scale content based image retrieval system that leverages the multi-resolution property of the discrete wavelet transform (DWT) and the local information attribute of local extrema patterns (LEPs). A two level discrete wavelet transform with Daubechies-4 wavelet filters is applied to the original image, and seven sub-band images are obtained. The local extrema pattern is computed on these seven sub-band images, which gives seven local extrema pattern maps. Multi-resolution analysis using the DWT helps in enhancing the features of the image: it extracts the information of the image in different directions and yields sub-band images that carry more directional information than the original image with respect to feature extraction. LEP works with local intensity and is also direction based. Consequently, applying LEP to the wavelet sub-band images makes it possible to extract directional and textural information that is harder to obtain from the original image alone.

A histogram is calculated for each map; since the maximum intensity in each map is 15, the length of each histogram is 16. A joint histogram is obtained by combining all seven histograms, so the resultant length of the joint histogram is 16 × 7. The feature vector length of the proposed method is small compared with other existing methods (LBP, BLK LBP, etc.). Hence, the proposed method is well suited to pattern recognition applications such as image retrieval, face recognition, etc.

Algorithm

The block diagram of proposed method 1 is illustrated in Fig. 2.5.

Input: Query image.

Output: Similar images.

1. Upload the query image of size m × n.

2. Convert it into a gray scale image if it is a color image.

3. Apply the 2-level DWT to the image obtained in step 2 and get seven sub-band images.

4. Apply LEP to the seven acquired sub-band images.

5. Construct the histogram for all seven LEP maps obtained in step 4.


Figure 2.5: Block diagram of the proposed method (query image → 2-level DWT → local extrema patterns → histograms → joint histogram feature vector → similarity measure against the feature vector database → retrieved images)

6. Concatenate all histograms with equal weights.

7. Compare the histogram of the query image with that of each database image using the distance formula given in Eq. 1.1 to get the distance measure.

8. Sort the distance measures and return the images with minimum distance as the best results.
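Putting the pieces together, the following is a minimal sketch of steps 3–6, reusing the pywt and lep() sketches above (pm1_feature is an illustrative name):

```python
import numpy as np
import pywt

def pm1_feature(gray):
    """A minimal sketch of proposed method 1: 2-level db4 DWT -> seven
    sub-bands -> LEP map per sub-band -> concatenated 16-bin histograms,
    giving the 16 x 7 feature vector. Reuses lep() from Section 2.1.2."""
    cA2, (cH2, cV2, cD2), (cH1, cV1, cD1) = pywt.wavedec2(gray, 'db4', level=2)
    feats = []
    for subband in (cA2, cH2, cV2, cD2, cH1, cV1, cD1):
        hist, _ = np.histogram(lep(subband), bins=16, range=(0, 16))
        feats.append(hist)
    return np.concatenate(feats).astype(float)
```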

2.2.2 Proposed method 2

In the proposed method 2, the two dimensional discrete wavelet transform is applied to the original image, and four sub-band images (approximation, horizontal, vertical and detail parts) are extracted. These four sub-images carry low pass, 0 degree, 90 degree, and ±45 degree information respectively, as shown in Fig. 2.1. Daubechies-4 (db4) wavelets are used for the decomposition, and the DLEP algorithm is applied in different directions to the four sub-images, as listed in table 2.1. Both DWT and DLEP are helpful in the

Table 2.1: Applied DLEP directions on wavelet coefficients

Approximation part   4-direction DLEP (0◦, 45◦, 90◦, 135◦)
Detail part          2-direction DLEP (0◦, 90◦)
Horizontal part      3-direction DLEP (45◦, 90◦, 135◦)
Vertical part        3-direction DLEP (0◦, 45◦, 135◦)


extraction of directional features. With the help of the wavelet transform, we extract sub-images with different directional information, and by applying DLEPs of different directions, the feature vector is extracted. For example, on the detail part only the 0◦ and 90◦ DLEPs are applied, because the 45◦ and −45◦ information has already been extracted by the wavelet transform; on the horizontal part, DLEPs of 45◦, 90◦ and 135◦ are applied, since the 0◦ information has already been captured by the wavelet transform. The same reasoning is applied to the other sub-images, and DLEPs of the corresponding directions are obtained. In this way, additional directional information is extracted.

After applying DLEP to the four sub-images, twelve DLEP maps are extracted and their corresponding histograms are obtained. A joint histogram is prepared by concatenating all 12 histograms one after another. The DLEP of each direction gives a feature vector of length 512. In the proposed method, twelve DLEP maps (4+2+3+3) over the four orientations (0◦, 45◦, 90◦, 135◦) are obtained, so the total length of the joint histogram is 12 × 512.

Figure 2.6: Block diagram of the proposed system (query image → 1-level DWT → approximation, horizontal, vertical and detail sub-bands → DLEPs of the directions listed in table 2.1 → four histograms → joint histogram feature vector → similarity measure → retrieved images)

Algorithm

Input: Query image

Output: Retrieved similar images

1. Upload the input image of size m × n.


2. If it is a color image, convert it into gray scale.

3. Apply 1-level DWT on the obtained gray scale image.

(a) Apply 0◦, 45◦, 90◦, 135◦ DLEP on approximation part.

(b) Apply 45◦, 90◦, 135◦ DLEP on horizontal part.

(c) Apply 0◦, 45◦, 135◦ DLEP on vertical part.

(d) Apply 0◦, 90◦ DLEP on detail part.

4. Get histogram of all 12 (4+3+3+2) DLEP maps and concatenate all histograms.

5. Compare the histogram of the query image with that of each database image using the distance formula to get the distance measure.

6. Sort the distance measure and get the minimum distance as the best result.
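The direction selection of table 2.1 can be summarized in code; below is a minimal sketch reusing the dlep() sketch from Section 2.1.3 (pm2_feature is an illustrative name):

```python
import numpy as np
import pywt

def pm2_feature(gray):
    """A minimal sketch of proposed method 2: 1-level db4 DWT, then DLEPs of
    selected directions per sub-band (table 2.1), 12 maps -> 12 x 512 vector."""
    cA, (cH, cV, cD) = pywt.dwt2(gray, 'db4')
    plan = [(cA, (0, 45, 90, 135)),   # approximation: all four directions
            (cH, (45, 90, 135)),      # horizontal: 0 deg captured by the DWT
            (cV, (0, 45, 135)),       # vertical: 90 deg captured by the DWT
            (cD, (0, 90))]            # detail: +/-45 deg captured by the DWT
    feats = [np.histogram(dlep(sb, t), bins=512, range=(0, 512))[0]
             for sb, dirs in plan for t in dirs]
    return np.concatenate(feats).astype(float)
```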

Both methods proposed in this chapter operate in the wavelet transform domain. Wavelet coefficients at different scales give directional information, and wavelets have long been used for feature extraction by researchers. In this work, the wavelet coefficients are used to extract local features, which is advantageous because the local features (LEP and DLEP) also use directional information for feature extraction.

2.3 Experimental results and discussion

To demonstrate the effectiveness of both methods, the Corel-5k and Corel-10k databases have been used; details of these databases are given in Chapter 1, Section 1.2.1. Each image of a database has been used as a query image, and the results have been evaluated using precision and recall (Chapter 1, Section 1.2.5). The proposed methods have been compared with the following methods:

DWT : Discrete wavelet transform

LEP : Local extrema pattern

CS LBP : Center-symmetric local binary pattern

LEPSEG : Local edge pattern for segmentation

LEPINV : Local edge pattern for image retrieval

BLK LBP : Block based local binary pattern


LBP : Local binary pattern

DLEP : Directional local extrema pattern

Proposed methods 1 and 2 are abbreviated as PM1 and PM2 respectively.
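For reference, the following is a minimal sketch of the evaluation protocol, assuming the standard definitions of precision and recall from Chapter 1, Section 1.2.5 (the function name and arguments are illustrative):

```python
import numpy as np

def precision_recall(ranked_labels, query_label, n_retrieved, category_size):
    """Precision = relevant retrieved / number retrieved;
    recall = relevant retrieved / number of relevant images in the database."""
    top = np.asarray(ranked_labels[:n_retrieved])
    relevant = int(np.count_nonzero(top == query_label))
    return relevant / n_retrieved, relevant / category_size

# e.g. precision at 10 retrieved images and recall at 100, as in table 2.2
```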

2.3.1 Experiment 1

Results for the Corel-5k database are demonstrated in this section. Two kinds of plots are presented for each of precision and recall: one based on the number of images retrieved, and the other according to the category of images. Graphs for both proposed methods are demonstrated and compared with other existing methods. In the experiments, 10, 20, ..., 100 images are retrieved in turn, and precision and recall are obtained for performance measurement.

Figs. 2.7(a) and 2.7(b) plot precision and recall against the total number of images retrieved. Proposed method 1 is the combination of DWT and LEP, and it performs better than both DWT and LEP individually. Similarly, proposed method 2 is derived from DWT and DLEP, and it shows higher performance than both DWT and DLEP. Both methods also outperform the other methods in terms of accuracy. In Figs. 2.7(c) and 2.7(d), precision and recall are plotted for each category of Corel-5k. Both proposed methods give better precision and recall than the other methods in most of the categories and, above all, PM2 (proposed method 2) is better than PM1 (proposed method 1).

2.3.2 Experiment 2

In the second experiment, the Corel-10k database has been used. As in the previous experiment, precision and recall curves have been plotted against the number of images retrieved, as shown in Fig. 2.8, which confirms that the proposed methods outperform the other methods. Although PM2 beats PM1 in terms of precision and recall, PM2 has a larger feature vector length than PM1, which leads to higher time complexity in image retrieval, as detailed in table 2.3. Fig. 2.8 also presents the change in recall and precision with the category of images.


Figure 2.7: Corel-5k database: (a) precision and (b) recall against the number of images retrieved, and (c) precision and (d) recall per image database category (methods: DWT, LEP, CS LBP, LEPSEG, LEPINV, BLK LBP, LBP, DLEP, PM1, PM2)


The graphs clearly validate the efficiency of the proposed methods on the large Corel-10k image database; the proposed methods outperform the other existing methods in most of the categories.

Total precision and recall for a fixed number of retrieved images are shown in table 2.2: 10 retrieved images for precision and 100 for recall, for all methods including the proposed ones. Proposed method 2 has the best performance among these methods.

The feature vector length directly affects the image retrieval time of the system; a longer feature vector takes more time to retrieve images. Table 2.3 lists the feature vector lengths of all methods. The evaluation measures have already shown that both proposed methods perform better than the others, and that proposed method 2 is more accurate than proposed method 1; however, the feature vector length of PM1 is smaller than that of PM2.

Table 2.2: Precision and recall percentage for all methods

Method      Corel-5k              Corel-10k
            Precision   Recall    Precision   Recall
DWT         29.06       12.77     23.13        8.96
LEP         41.52       18.23     33.40       13.26
CS LBP      32.97       14.00     26.43       10.15
LEPSEG      41.56       18.31     34.01       13.81
LEPINV      35.19       14.84     28.93       11.22
BLK LBP     45.76       20.30     38.13       15.34
LBP         43.63       19.22     37.62       14.98
DLEP        48.73       21.05     39.99       15.71
PM1         49.32       23.32     40.68       17.91
PM2         51.41       24.12     43.13       18.26

2.4 Conclusion

This chapter proposes two image retrieval methods that extract features in the wavelet transform domain, where multi-resolution analysis is performed. In proposed method 1, the discrete wavelet transform is applied to the image, and then local


Figure 2.8: Corel-10k database: (a) precision and (b) recall against the number of images retrieved, and (c) precision and (d) recall per image database category


Table 2.3: Feature vector length of different methods

Method     Feature vector length
DWT        20
LEP        16
CS LBP     16
LEPSEG     512
LEPINV     72
BLK LBP    256 × 6
LBP        256
DLEP       512 × 4
PM1        16 × 7
PM2        512 × 12

extrema patterns are extracted from the wavelet coefficients. In total, seven local extrema pattern maps are obtained and their histograms are generated. The wavelet transform captures low frequency as well as high frequency features, which helps the local extrema pattern create more direction-oriented features.

In the second method, a one level discrete wavelet transform is applied to the image, and DLEPs of different directions are obtained from the sub-band images. This method utilizes the directional information of both DWT and DLEP and combines them in a directed way, so that most of the detail can be extracted from the image for constructing the feature vector. The effectiveness of the proposed methods is measured by testing on the Corel-5k and Corel-10k image databases. The proposed methods are compared with previous results, and the evaluation measures show that they are more accurate than past methods.


Chapter 3

Local Extrema Co-occurrence Pattern for Image Retrieval

In this chapter, we propose a new image retrieval technique, the local extrema co-occurrence pattern (LECoP), using the HSV color space. The HSV color space is used to exploit the color, intensity and brightness of images. Local extrema patterns are applied to capture the local information of the image, and the gray level co-occurrence matrix is used to obtain the co-occurrence of LEP map pixels. The local extrema co-occurrence pattern extracts local directional information from the local extrema pattern and converts it into a well-structured feature vector with the help of the gray level co-occurrence matrix. Many local patterns for image retrieval have been proposed by researchers, but most of them record the frequency of each pattern in the image and treat it as a feature descriptor using a histogram. Frequency, however, only describes the occurrence of each pattern; it reveals nothing about the mutual occurrence of patterns in the image. The mutual occurrence of patterns is considered in this work. Earlier work with local patterns and color features treated texture patterns and color information as separate features; in this work, the texture feature of the local pattern is extracted from a color space component.


The presented method is tested on five standard databases: Corel-1k, Corel-5k, Corel-10k, the MIT VisTex color-texture database and the STex database. The algorithm is also compared with previously proposed methods, and results in terms of precision and recall are shown in this chapter.

3.1 Preliminaries

3.1.1 Color space

In general, images are of three types. The first is the binary image, which contains only two intensities, black and white. The second is the gray scale image, which can have a range of intensities, but only in one band. The third is the color image, which has multiple bands, each containing a range of intensities. The most commonly used RGB images have three bands, red, green and blue, containing information about the respective colors; hence this representation is called the RGB color space. Another representation is the HSV color space, which stands for hue, saturation and value.

In HSV, the hue component is directly related to the color, which a human eye can distinguish. Hue is defined as an angle from 0 to 360 degrees, with different angles corresponding to different colors. Saturation represents the brightness and lightness of the color component and ranges from 0 to 1; as it goes from low to high, the intensity of the color increases. Value, which also varies from 0 to 1, shows the intensity of a color decoupled from its color information. Many researchers have shown that the HSV color space is more appropriate than RGB, since in HSV the color, intensity and brightness can be extracted individually from images [147, 140, 173]. In this work, images are converted from the RGB to the HSV color space.

3.1.2 Gray level co-occurrence matrix

The gray level co-occurrence matrix transforms an image into a matrix that corresponds to the relationships of pixels in the original image. It calculates the mutual occurrence


Figure 3.1: Gray level co-occurrence matrix computation example (a 5 × 5 image matrix and its 4 × 4 GLCM for distance 1 and angle 0◦)

of pixel pairs for a specific distance and in a particular direction. For an input image, the GLCM is calculated as

$$G^\theta_d(i, j) = \#\{((x, y), (a, b)) : I(x, y) = i,\; I(a, b) = j\}, \quad (x, y), (a, b) \in N_x \times N_y \tag{3.1}$$

$$(a, b) = (x + d \times \theta_1,\; y + d \times \theta_2)$$

where $G^\theta_d$ is the gray level co-occurrence matrix of distance $d$ and angle $\theta$, $I(x, y)$ and $I(a, b)$ are the pixel intensities at positions $(x, y)$ and $(a, b)$, and $\#$ denotes the count of pixel pairs $((x, y), (a, b))$ satisfying the condition in Eq. 3.1. $N_x$ and $N_y$ are the horizontal and vertical spatial domains. The values of $\theta_1$ and $\theta_2$ depend on $\theta$ and are shown in table 3.1.

Table 3.1: Values of θ1 and θ2 corresponding to θ in the GLCM

θ       θ1    θ2
0◦       0     1
45◦     -1     1
90◦     -1     0
135◦    -1    -1

An example of the GLCM calculation is shown in Fig. 3.1, where the first matrix is the image matrix and the second is the GLCM. The pixel pair (2, 2) with distance 1 and angle 0◦ occurs three times in the original matrix; accordingly, the number 3 appears at position (2, 2) in the GLCM. The GLCM is computed similarly for the other pixel pairs.
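A minimal sketch of Eq. 3.1 follows (the function name glcm and the levels parameter are illustrative choices); for the LECoP of this chapter it is applied to a 16-level LEP map with d = 1 and θ = 0◦:

```python
import numpy as np

def glcm(img, d=1, theta=0, levels=16):
    """A minimal sketch of the gray level co-occurrence matrix (Eq. 3.1):
    counts of intensity pairs at distance d in direction theta (table 3.1)."""
    t1, t2 = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}[theta]
    g = np.zeros((levels, levels), dtype=np.int64)
    h, w = img.shape
    for x in range(h):
        for y in range(w):
            a, b = x + d * t1, y + d * t2
            if 0 <= a < h and 0 <= b < w:
                g[img[x, y], img[a, b]] += 1
    return g
```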


3.2 Proposed method

A new texture feature called the local extrema co-occurrence pattern (LECoP) is presented in this work, which extracts the co-occurrence of local extrema pattern (LEP) values. LEP is a texture feature that was proposed by Murala et al. for object tracking [93] and is explained in detail in Chapter 2, Section 2.1.2.

The proposed image retrieval method utilizes both the color and texture information of images, which are both salient features. The HSV color space is used to extract the details of the image as hue, saturation and value parts. Initially, the RGB image is converted into the HSV color space [135]. Hue corresponds to the color component and varies from 0◦ to 360◦, with each value corresponding to a different color. In the proposed work, three different quantization levels of the hue component, i.e., 18, 36 and 72 bins, have been used, and the performance of the proposed method has been observed for each. All three quantization schemes divide the colors into different sections so that optimal color information can be obtained. Similarly, saturation is quantized into 10 and 20 bins for reasonable information extraction. All possible combinations of hue and saturation quantization have been used, and the performance has been observed on the different datasets in Section 3.4.7. The color histogram has proven its effectiveness in image retrieval [141], and histograms are constructed for both the hue and saturation parts.
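A minimal sketch of this color feature follows, assuming OpenCV for the color conversion (the function name hs_histograms and the default bin counts are illustrative choices):

```python
import numpy as np
import cv2

def hs_histograms(bgr, hue_bins=72, sat_bins=20):
    """Quantized hue and saturation histograms used as global color features
    (hue into 18/36/72 bins, saturation into 10/20 bins)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    hue = hsv[:, :, 0].astype(float) / 180.0  # OpenCV 8-bit hue is in [0, 180)
    sat = hsv[:, :, 1].astype(float) / 256.0
    h_hist, _ = np.histogram(hue, bins=hue_bins, range=(0.0, 1.0))
    s_hist, _ = np.histogram(sat, bins=sat_bins, range=(0.0, 1.0))
    return h_hist, s_hist
```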

In the HSV color space, the value component is close to the gray scale conversion of the RGB image; therefore, the value component is used for texture extraction. Hue and saturation are used to extract global information regarding color and brightness via histograms. The local information of each pixel corresponds to the texture feature of the image, and it is extracted using local extrema patterns: LEP is applied to the value channel of the original image, giving a LEP map of the same size as the image with entries from 0 to 15. A histogram records the frequency of each intensity, which only captures the occurrence of every pattern in the whole image and neglects the co-occurrence of pixels. The gray level co-occurrence matrix, by contrast, reveals the relative occurrence of intensity pairs in the image, so the co-occurrence of every pixel pair in the LEP map can be captured in matrix form. Hence, instead of a histogram, the GLCM is calculated for the LEP map; the GLCM at 0◦ with distance one is used in this work.


Intensities in the LEP map vary from 0 to 15; hence the size of the GLCM in this case is 16 × 16, and each position in the GLCM represents the occurrence of the corresponding pixel pair. For histogram concatenation, the GLCM is converted into a single vector, and a combined histogram is formed from the quantized hue and saturation bins and the GLCM vector. The total length of the feature vector depends on the quantization numbers of the hue and saturation parts. The feature vector can be normalized by a factor n chosen according to the database images. In the proposed work, all databases used for the experiments are benchmark databases, so the images within a particular database have the same size. The normalization factor n can vary if the image sizes differ within a database. Accordingly, in the experiment section the normalization factor is chosen as 1000 for database 1 (Corel-1k) and 100 for the other databases (Corel-5k, Corel-10k, MIT VisTex and STex), since the images in database 1 are bigger than those in the other databases.

3.3 Proposed system framework

Figure 3.2: Proposed system block diagram (query image → RGB to HSV; hue → 18/36/72-bin quantized histogram; saturation → 10/20-bin quantized histogram; value → local extrema pattern → gray level co-occurrence matrix → vector; concatenation → feature vector → similarity match against the feature vector database → results)

The algorithm of the system is given below, and a block diagram of the presented work is given in Fig. 3.2.

Input: Query image.


Output: Retrieved images from the database.

1. Upload the input image.

2. Convert it from RGB to HSV color space.

3. Quantize the hue and saturation parts into 18/36/72 and 10/20 bins respectively (according to the requirements of the database), and construct histograms for both hue and saturation.

4. Apply LEP to the value part of the HSV color space, and obtain the LEP map.

5. Construct the GLCM of the LEP map.

6. Convert the GLCM into vector form.

7. Concatenate the histograms obtained in step 3 with the GLCM vector of step 6, and construct the final histogram as the feature vector.

8. Use the similarity distance measure to compare the query feature vector with the feature vectors in the existing feature database.

9. Sort the distance measures, and return the images corresponding to the best matching vectors as the final results.
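Combining the sketches given so far, a minimal end-to-end LECoP feature sketch might look as follows (lecop_feature is an illustrative name; lep() and glcm() are the sketches from Chapter 2, Section 2.1.2 and Section 3.1.2, and hs_histograms() from Section 3.2):

```python
import numpy as np
import cv2

def lecop_feature(bgr, hue_bins=72, sat_bins=20, n=100):
    """A minimal sketch of the LECoP pipeline: quantized hue and saturation
    histograms + the flattened GLCM of the LEP map of the value channel."""
    h_hist, s_hist = hs_histograms(bgr, hue_bins, sat_bins)
    value = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[:, :, 2]
    co = glcm(lep(value), d=1, theta=0, levels=16)  # 16 x 16 matrix
    feature = np.concatenate([h_hist, s_hist, co.ravel()]).astype(float)
    return feature / n  # n: database-dependent normalization factor
```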

3.4 Experimental results and discussion

For experimental purposes, natural and texture databases of color images have been utilized in this chapter: three natural color image databases (Corel-1k, Corel-5k and Corel-10k) and two color-texture image databases (the MIT VisTex database and the STex database). In the experiments, for each query image, different numbers of images are retrieved, and evaluation measures are calculated for each group of retrieved images. Precision, recall and F-measure (Chapter 1, Section 1.2.5) are used as evaluation measures for all databases and methods. The proposed method is compared with several existing methods, whose abbreviations are given in table 3.2.


Figure 3.3: (a) Precision and (b) recall against the number of images retrieved for the Corel-1k database

3.4.1 Experiment 1

The first experiment is conducted on the Corel-1k database; details of this database are given in Chapter 1, Section 1.2.1.

Fig. 3.3 shows the performance of the presented method as a function of the number of images retrieved. The precision-recall curve and the F-measure curve are shown in Fig. 3.4, and they indicate that the proposed method is better than the other methods. Table 3.3 gives the numerical results for precision and recall, in percent, along with the other methods. In terms of precision, the accuracy of the proposed method is improved over Wavelet+colorhist, CS LBP+colorhist, Joint LEP colorhist, Joint colorhist, LEPINV+colorhist and LEPSEG+colorhist by up to 16.26%, 3.56%, 4.59%, 4.56%, 8.43% and 3.67% respectively.


Table 3.2: Abbreviation of all methods

CS LBP+colorhist : Center-symmetric local binary pattern [42] + RGB color histogram

LEPSEG+colorhist : Local edge pattern for segmentation [168] + RGB color histogram

LEPINV+colorhist : Local edge pattern for image retrieval [168] + RGB Color histogram

Wavelet+colorhist : Discrete wavelet transform + RGB Color histogram [86]

Joint LEP colorhist : Joint histogram of color and LEP [93]

Joint colorhist : Joint histogram of RGB color

PM : Proposed method

Figure 3.4: Precision-recall curve and F-measure curve for the Corel-1k database

3.4.2 Experiment 2

The Corel-5k database has been used in the second experiment; details of this database are given in Chapter 1, Section 1.2.1.


Figure 3.5: Corel-5k plots of (a) precision and (b) recall against the number of images retrieved, and (c) precision and (d) recall per category


Figure 3.6: (a) Precision-recall curve and (b) F-measure curve for the Corel-5k database

Average precision and average recall are obtained using Eqs. 1.6-1.11, and the F-measure is calculated using Eq. 1.12. Precision and recall results are shown in Fig. 3.5, both category-wise and as a function of the number of images retrieved. The precision-recall curve and the F-measure curve for the Corel-5k database are plotted in Fig. 3.6. Table 3.3 lists the retrieval results in terms of the evaluation measures for the Corel-5k database. The table and figures clearly indicate that the performance of the presented technique is significantly better than that of the other techniques. The accuracy of the proposed method is significantly improved over Wavelet+colorhist, CS LBP+colorhist, Joint LEP colorhist, Joint colorhist, LEPINV+colorhist and


Figure 3.7: Corel-10k database: (a) precision and (b) recall against the number of images retrieved, and (c) precision and (d) recall per category


LEPSEG+colorhist by up to 20.72%, 14.10%, 12.37%, 12.44%, 12.90% and 5.90% respectively.

3.4.3 Experiment 3

The third database in the color-natural category is the Corel-10k database; details of this database are given in Chapter 1, Section 1.2.1.

Figure 3.8: (a) Precision-recall curve and (b) F-measure curve for the Corel-10k database

As for the previous database, precision, recall and F-measure are computed using Eqs. 1.6-1.12. Figs. 3.7 and 3.8 show the Corel-10k results regarding precision, recall and F-measure in comparison with other methods, and table 3.3 indicates that the presented technique outperforms the other existing methods. The precision of the proposed method is considerably improved over Wavelet+colorhist, CS LBP+colorhist, Joint LEP colorhist, Joint colorhist, LEPINV+colorhist and


LEPSEG+colorhist by up to 20.72%, 14.10%, 12.37%, 12.44%, 12.90% and 5.90% respectively.

3.4.4 Experiment 4

Figure 3.9: MIT VisTex database results of (a) average precision and (b) average recall

For color-texture image retrieval, the MIT VisTex database is used; further details about the database are given in Chapter 1, Section 1.2.1.

Results for precision, recall and F-measure as a function of the number of images retrieved are shown in Figs. 3.9 and 3.10, and table 3.4 gives the total precision and recall percentages. Retrieval performance is presented against the number of top matches for each image in the database. The average retrieval rate of the proposed method is improved over Wavelet+colorhist, CS LBP+colorhist, Joint LEP colorhist, Joint colorhist, LEPINV+colorhist and LEPSEG+colorhist by up to 22.99%, 14.48%, 12.66%, 12.51%, 15.19% and 16.35% respectively. It is observed that the presented method is more advantageous than the others in terms of accuracy.

Figure 3.10: (a) Precision-recall curve and (b) F-measure curve for the MIT VisTex database

3.4.5 Experiment 5

This experiment is conducted on the STex database; more information regarding the STex database is given in Chapter 1, Section 1.2.1. Average precision and recall have been calculated over all images in the database. Fig. 3.11 plots precision and recall against the number of images retrieved. The precision-recall curve and


Figure 3.11: STex database results of (a) average precision and (b) average recall

F-measure curve are shown in Fig. 3.12. The average recall rates of the presented method and the other methods are shown in table 3.4, which clearly shows that the ARR of the proposed method is greater than the others'. The average retrieval rate of the proposed method is improved over Wavelet+colorhist, CS LBP+colorhist, Joint LEP colorhist, Joint colorhist, LEPINV+colorhist and LEPSEG+colorhist by up to 64.50%, 39.03%, 23.79%, 23.79%, 54.15% and 59.90% respectively.

Figure 3.12: (a) Precision-recall curve and (b) F-measure curve for STex database

3.4.6 Experimental results with different distance measures

Four different distance measures, d1, Canberra, Manhattan and Euclidean (Eq. 1.1-1.4), have been used for measuring the similarity between images. A comparison of all four distance measures is shown in table 3.5 for the five databases in terms of average retrieval rate (ARR) and average precision rate (APR). The experiments show that the d1 distance measure gives the best results.
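For reference, the four distance measures can be sketched in Python as follows. This is an illustrative sketch only: the exact definitions used in this thesis are those of Eq. 1.1-1.4, and the d1 form shown here is an assumption based on the form commonly used with local-pattern features.

    import numpy as np

    def d1(q, t):
        # d1 distance (assumed common form; the thesis defines it in Eq. 1.1)
        return np.sum(np.abs(q - t) / (1.0 + q + t))

    def canberra(q, t):
        return np.sum(np.abs(q - t) / (np.abs(q) + np.abs(t) + 1e-12))

    def manhattan(q, t):
        return np.sum(np.abs(q - t))

    def euclidean(q, t):
        return np.sqrt(np.sum((q - t) ** 2))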

3.4.7 Proposed method with different quantization levels

In HSV color space, hue, saturation and value have their own importance. The pro-

posed method is analyzed with different quantization levels of hue and saturation com-

ponents for all the databases in table 3.6. Performance with different quantization

methods differ in different databases, e.g., hue 72 and saturation 20 provide the best

result for Corel 1k database, on the other hand it is not better for other databases.

In the same manner, performance has been observed for other quantization schemes.
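As an illustration of such a quantization scheme, the following Python sketch builds hue and saturation histograms for a chosen number of bins. The channel ranges assumed here (hue in [0, 360), saturation in [0, 1]) are an assumption; the value channel keeps its 256 levels for the LEP/GLCM part, as described earlier.

    import numpy as np

    def hs_histograms(h, s, h_bins=36, s_bins=20):
        # Quantize hue and saturation into the chosen numbers of bins
        # (assumed ranges: h in [0, 360), s in [0, 1]).
        h_idx = (h.ravel() / 360.0 * h_bins).astype(int) % h_bins
        s_idx = np.minimum((s.ravel() * s_bins).astype(int), s_bins - 1)
        return (np.bincount(h_idx, minlength=h_bins),
                np.bincount(s_idx, minlength=s_bins))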


Table 3.3: Results of Corel-1k, Corel-5k and Corel-10k in precision (for n=10) and recall (for n=100)

Method                 Corel-1k            Corel-5k            Corel-10k
                       Precision  Recall   Precision  Recall   Precision  Recall
CS LBP+colorhist       75.88      48.14    54.39      25.47    44.08      18.57
LEPSEG+colorhist       75.80      36.15    43.66      17.62    35.58      13.48
LEPINV+colorhist       72.47      38.56    50.41      20.44    41.25      15.74
Wavelet+colorhist      67.59      40.65    52.15      24.43    42.28      17.34
Joint LEP colorhist    75.13      37.90    53.89      22.85    44.14      16.77
Joint colorhist        75.15      37.90    53.64      22.71    43.96      16.66
PM                     78.58      51.87    62.96      31.16    52.50      23.29

Table 3.4: Average retrieval rate (ARR) for both MIT VisTex and STex databases

Database     CS LBP+    LEPSEG+    LEPINV+    Wavelet+   Joint LEP   Joint       PM
             colorhist  colorhist  colorhist  colorhist  colorhist   colorhist
MIT VisTex   81.23      79.92      80.73      75.61      82.54       82.65       92.99
STex         53.33      46.37      48.10      45.08      59.90       59.90       74.15

Table 3.5: Experimental results of the proposed method with different distance measures

Distance     Corel 1k       Corel 5k       Corel 10k      MIT VisTex     STex
             APR    ARR     APR    ARR     APR    ARR     APR    ARR     APR    ARR
d1           78.58  51.87   62.96  31.16   52.50  23.29   92.99  99.22   74.15  90.03
Canberra     75.90  46.77   60.48  27.23   49.84  21.57   91.05  98.76   70.61  87.34
Manhattan    72.86  48.78   52.47  24.62   42.25  18.04   82.80  97.00   65.87  84.95
Euclidean    64.55  41.43   43.61  19.66   34.65  14.32   71.81  92.72   54.18  75.92

The performance of the method depends on the color distribution and the texture of the images present in the database.

3.4.8 Computational complexity

The speed of retrieving images similar to the query depends on the feature vector length. A longer feature vector takes more time when computing the difference between the query image and the database images.


Table 3.6: Precision and recall of the proposed method with different quantization schemes for all databases

Scheme            Corel 1k       Corel 5k       Corel 10k      MIT VisTex     STex
                  APR    ARR     APR    ARR     APR    ARR     APR    ARR     APR    ARR
HSV(18 10 256)    78.32  50.58   62.96  31.16   52.50  23.29   92.54  99.03   72.63  88.80
HSV(18 20 256)    77.98  51.35   63.10  30.61   52.47  22.93   92.99  99.22   73.25  89.43
HSV(36 10 256)    78.50  50.70   61.56  30.27   51.18  22.60   92.14  99.08   73.37  89.44
HSV(36 20 256)    78.66  51.72   62.89  30.84   52.52  23.05   92.95  99.23   74.15  90.03
HSV(72 10 256)    78.02  50.87   61.23  29.53   51.22  22.13   91.52  99.08   73.32  89.36
HSV(72 20 256)    78.58  51.87   60.46  28.72   50.86  22.21   92.18  99.26   74.01  89.90

Table 3.7: Feature vector (F.V.) length, feature extraction (F.E.) and image retrieval (I.R.) time of different methods

Method                F.V. length                F.E. time (sec)   I.R. time (sec)
CS LBP+colorhist      16+24 = 40                 0.1216            0.51
LEPSEG+colorhist      512+24 = 536               0.0243            0.59
LEPINV+colorhist      72+24 = 96                 0.0709            0.52
Wavelet+colorhist     24+192 = 216               0.0757            0.53
Joint LEP colorhist   16 × 8 × 8 × 8 = 8192      0.1676            2.52
Joint colorhist       8 × 8 × 8 = 512            0.0360            0.58
LECoP(H18S10V256)     18+10+256 = 284            0.2407            0.54
LECoP(H18S20V256)     18+20+256 = 294            0.2414            0.54
LECoP(H36S10V256)     36+10+256 = 302            0.2418            0.54
LECoP(H36S20V256)     36+20+256 = 312            0.2422            0.55
LECoP(H72S10V256)     72+10+256 = 338            0.2427            0.56
LECoP(H72S20V256)     72+20+256 = 348            0.2449            0.56

A comparison of the feature vectors of the proposed method and the other methods is given in table 3.7 for speed evaluation. The feature extraction time for one image is also given in table 3.7 for all the methods, including the proposed method; for the proposed method, the feature extraction time is reported for all quantization levels.

As demonstrated in the table, the feature vector length of the proposed method is

greater than CS LBP+colorhist, LEPINV+colorhist and Wavelet+colorhist, however,


it outperforms these methods in terms of accuracy, as shown in the experiments on the different databases. Also, LEPSEG+colorhist, Joint LEP colorhist and Joint colorhist have longer feature vectors and take more time than the proposed method to retrieve the images similar to the query.

3.5 Conclusion

A novel feature descriptor, LECoP, is proposed for color and texture image retrieval. It utilizes the properties of local patterns and the co-occurrence matrix in the HSV color space. The method extracts local directional information of each pixel in terms of LEP, and constructs a GLCM to obtain the co-occurrence of each pixel pair in the LEP map. The HSV color space is used for the color feature; in particular, hue and saturation histograms capture color and brightness respectively, while the value component is used to compute the LEP-based features. The combined feature vector is evaluated on the benchmark Corel, MIT VisTex and STex databases. The results for the proposed method and the previous methods are presented in graphs using the evaluation measures, and they show that the proposed method outperforms the other methods.


Chapter 4

Center Symmetric Local Binary Co-occurrence Pattern for CBIR

Image feature extraction, tailored to the user requirement and the database images, is a difficult task in the present scenario. In this chapter, a new image retrieval technique is introduced that is useful for different kinds of datasets. In the proposed method, the center symmetric local binary pattern (CSLBP) is extracted from the original image to obtain the local information. The co-occurrence of pixel pairs in the local pattern map is observed in different directions and at different distances using the gray level co-occurrence matrix. Earlier methods used a histogram to extract the frequency information of the local pattern map, but the co-occurrence of pixel pairs is more robust than the frequency of patterns. The proposed method is tested on three different categories of images, i.e., texture, face and medical image databases, and compared with some state-of-the-art local patterns.


4.1 Preliminaries

4.1.1 Center symmetric local binary pattern


Figure 4.1: Center symmetric local binary pattern computation example

Center symmetric local binary pattern (CSLBP) was proposed by Heikkila [42]. It extracts a local pattern for every pixel of the input image region. In CSLBP, a local pattern is extracted from the input image pixels based on center symmetric pixel differences. Each pixel is considered as the center pixel, and the differences of center symmetric pixel pairs are calculated, which are independent of the center pixel. Based on these differences, four binary numbers are assigned to the center pixel, which are then multiplied by four weights and summed to one value, called the center symmetric local binary pattern value. A histogram of the CSLBP map is created as the feature vector. CSLBP is explained mathematically in the following equations.

$$CSLBP_{P,R,T} = \sum_{s=0}^{(P/2)-1} 2^s \times F_3\left(I_s - I_{s+(P/2)}\right) \qquad (4.1)$$

$$F_3(a) = \begin{cases} 1 & a > T \\ 0 & \text{else} \end{cases}$$

$$Hist(L)\,\big|_{CSLBP} = \sum_{x_1=1}^{m}\sum_{x_2=1}^{n} F_2(CSLBP(x_1, x_2), L); \quad L \in [0, 15] \qquad (4.2)$$

where T is the threshold parameter and its value is set to 1% of the highest intensity

in the image. Function F2(x, y) is defined as given in Eq. 2.5. An example of CSLBP

is explained in Fig. 4.1. More details about CSLBP can be found in [42].
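A minimal Python sketch of the CSLBP operator follows, assuming 8 neighbors at radius 1. The neighbor ordering is an assumed convention, chosen so that offsets s and s + 4 are center symmetric as Eq. 4.1 requires; border pixels are skipped.

    import numpy as np

    def cslbp(img, T):
        # CSLBP map for a grayscale image (P=8, R=1); values range over [0, 15].
        img = img.astype(np.int32)
        h, w = img.shape
        out = np.zeros((h - 2, w - 2), dtype=np.int32)
        offs = [(0, 1), (1, 1), (1, 0), (1, -1),
                (0, -1), (-1, -1), (-1, 0), (-1, 1)]
        for s in range(4):
            dy1, dx1 = offs[s]
            dy2, dx2 = offs[s + 4]          # the center-symmetric partner
            a = img[1 + dy1:h - 1 + dy1, 1 + dx1:w - 1 + dx1]
            b = img[1 + dy2:h - 1 + dy2, 1 + dx2:w - 1 + dx2]
            out += (1 << s) * ((a - b) > T)  # bit s of Eq. 4.1
        return out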

4.1.2 Gray level co-occurrence matrix

Complete details about gray level co-occurrence matrix (GLCM) are given in Chapter

3, Section 3.1.2.


4.2 Proposed method


Figure 4.2: Different combinations of (d, θ) used for feature vector computation in

GLCM

In earlier local patterns, features were extracted in the form of histograms that express the frequency of each pattern; information regarding the mutual occurrence of patterns was not considered. The proposed method addresses this issue and extracts features in an improved manner.

In the proposed method, CSLBP has been chosen for local pattern extraction of the original image. Center symmetric local binary patterns have some important properties that encouraged us to adopt them as the image pattern. The patterns obtained from CSLBP range from 0 to 15; hence, the feature vector length stays small in comparison with other local patterns. CSLBP considers all neighboring pixels and observes the relation between center symmetric pixels. In the proposed method,


the CSLBP pattern is extracted from the input image. The closest 8 neighboring pixels with radius 1 are considered for the pattern. This provides a pattern map of the same size as the input image, with intensity values ranging from 0 to 15.

A gray level co-occurrence matrix is used for feature extraction from the CSLBP pattern map. The GLCM can be obtained from the image in different directions and with different distances, as explained in Chapter 3, Section 3.1.2. In the presented work, different combinations of distances and angles have been observed and combined accordingly. The four combinations that have been used are demonstrated in Fig. 4.2 and explained below.

1. In Fig. 4.2(a), four GLCMs of distance 1 and angles 0°, 45°, 90° and 135° are obtained.

2. In Fig. 4.2(b), four GLCMs of distance 2 and angles 0°, 45°, 90° and 135° are calculated.

3. In Fig. 4.2(c), two GLCMs of distance 1 with 0° and 45° directions, and two GLCMs of distance 2 with 0° and 45° directions are obtained.

4. In Fig. 4.2(d), two GLCMs of distance 1 with 0° and 90° angles, and two GLCMs of distance 2 with 0° and 90° angles are obtained.

It can be easily seen that all four of the above combinations consider the most adjacent and closest neighboring pixels for GLCM formation. For local analysis, the closest neighboring pixels are more important than far pixels, as they reveal more neighborhood relationship information. Hence, these four combinations have been used to collect the GLCMs: by using different directions and distances, all four combinations gather the pixel-pair co-occurrences between the closest pixels, which is more informative.

Four matrices are obtained in each of the above combinations. All four matrices are converted into vectors, and the vectors are concatenated into a single vector, called the final feature vector. All four combinations have been evaluated in the experimental section. The CSLBP pattern map has pixel intensities in the range [0, 15] (16 intensities in total); hence, the size of each GLCM is 16 × 16, and since four GLCMs are combined at a time, the final length of the feature vector is 4 × (16 × 16) = 1024.
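This construction can be sketched in Python as follows. Here glcm() is a hypothetical helper accumulating co-occurrences of the 16-level CSLBP map for one (distance, angle) pair; the angle-to-offset mapping is an assumed convention, and fv4() assembles the combination of Eq. 4.8 below.

    import numpy as np

    # Assumed convention: 0° is a horizontal pixel pair, etc.
    OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

    def glcm(pmap, d, angle, levels=16):
        # Co-occurrence counts of a 16-level pattern map for one (d, theta).
        dy, dx = OFFSETS[angle]
        dy, dx = dy * d, dx * d
        h, w = pmap.shape
        g = np.zeros((levels, levels), dtype=np.float64)
        y0, y1 = max(0, -dy), min(h, h - dy)
        x0, x1 = max(0, -dx), min(w, w - dx)
        src = pmap[y0:y1, x0:x1]
        dst = pmap[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
        np.add.at(g, (src.ravel(), dst.ravel()), 1)
        return g.ravel()

    def fv4(pmap):
        # FV4: distances 1 and 2 with angles 0° and 90°, 4 x 16 x 16 = 1024 values.
        return np.concatenate([glcm(pmap, 1, 0), glcm(pmap, 1, 90),
                               glcm(pmap, 2, 0), glcm(pmap, 2, 90)])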


Mathematically, CSLBCoP can be formulated for an M×N size image as explained

below:

$$I = \sum_{x_1=1}^{M}\sum_{x_2=1}^{N} CSLBP_{8,1,2.6}(x_1, x_2) \qquad (4.3)$$

$$GLCM_{d}^{\theta}(I) = G_{d}^{\theta}(i, j) \quad \forall\, (i, j) \in I \qquad (4.4)$$

where I is the CSLBP map of the image. The four combinations of GLCMs explained above are used to create four different feature vectors.

$$FV_1(I) = \left[\, GLCM_{1}^{0^\circ} \;\; GLCM_{1}^{45^\circ} \;\; GLCM_{1}^{90^\circ} \;\; GLCM_{1}^{135^\circ} \,\right] \qquad (4.5)$$

$$FV_2(I) = \left[\, GLCM_{2}^{0^\circ} \;\; GLCM_{2}^{45^\circ} \;\; GLCM_{2}^{90^\circ} \;\; GLCM_{2}^{135^\circ} \,\right] \qquad (4.6)$$

$$FV_3(I) = \left[\, GLCM_{1}^{0^\circ} \;\; GLCM_{1}^{45^\circ} \;\; GLCM_{2}^{0^\circ} \;\; GLCM_{2}^{45^\circ} \,\right] \qquad (4.7)$$

$$FV_4(I) = \left[\, GLCM_{1}^{0^\circ} \;\; GLCM_{1}^{90^\circ} \;\; GLCM_{2}^{0^\circ} \;\; GLCM_{2}^{90^\circ} \,\right] \qquad (4.8)$$

As explained, CSLBP gives a local pattern for each pixel, extracting its local information and transforming the whole image into a CSLBP map with 16 intensity levels in total. Further, the GLCM gives the co-occurrence between pixel pairs in an image. The following reasons influenced us to combine these two methods:

• CSLBP extracts local information and has a significantly shorter feature vector.

• Earlier methods used the histogram of the local pattern map, which captures the frequency distribution but misses the mutual occurrence of pixel pairs.

• GLCM extracts the spatial information between pixels. Applying GLCMs of different directions and distances to the CSLBP map makes it possible to extract spatial information along with the frequency distribution from the transformed local map.

An example of feature extraction is shown in Fig. 4.3: the original image is shown in Fig. 4.3(a), its CSLBP map is computed in 4.3(b), the gray level co-occurrence matrices are obtained and converted into vector form in Fig. 4.3(c), and the joint vector obtained by concatenating all GLCM vectors is shown in Fig. 4.3(d).


Figure 4.3: Proposed method feature vector computation for sample image

4.3 Proposed system framework

4.3.1 Feature extraction

Figure 4.4: Proposed algorithm block diagram


Feature computation has been depicted in the block diagram which is shown in Fig.

4.4. The algorithm for the same is given below.

Input : Image

Output : Feature vector

1. Obtain the gray scale image of the input image.

2. Apply CSLBP and obtain CSLBP map of image.

3. Apply GLCM with distance 1 and directions 0°, 45°, 90° and 135°, obtaining 4 matrices, one per direction.

4. Convert the four matrices obtained in step 3 into vectors.

5. Concatenate the four vectors obtained in step 4 into a single feature vector (a sketch of this construction is given below).
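Assuming the cslbp() and glcm() helpers sketched earlier in this chapter (illustrative names, not from the thesis), steps 2-5 reduce to a few lines:

    import numpy as np

    def cslbcop_features(img, T):
        pmap = cslbp(img, T)                                  # steps 1-2
        parts = [glcm(pmap, 1, a) for a in (0, 45, 90, 135)]  # step 3
        return np.concatenate(parts)                          # steps 4-5: 4 x 256 values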

4.3.2 Similarity measure

In this chapter, five distances, i.e., d1, Euclidean, Manhattan, Canberra and Chi-square

(Eq. 1.1-1.5) have been used to compute the distance between query and database

image feature vectors. Performance of the proposed method has been observed with

these five distance measures.

4.3.3 Feature matching

Block diagram of the whole system has been shown in Fig. 4.5, and algorithm of the

same has been given below.

Input : Query Image

Output : Retrieved Images

1. Extract the features of the query image using the proposed algorithm.

2. Compute the similarity index between the query image feature vector and the feature vector of each database image.

3. Sort the similarity indices.

4. Retrieve as final results the images corresponding to the smallest distances (a sketch of this matching procedure is given below).
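A sketch of this matching procedure, assuming the feature vectors are stored row-wise in an array and dist is any of the distance functions discussed in this thesis:

    import numpy as np

    def retrieve(query_fv, db_fvs, dist, top_n=10):
        # Distance of the query to every database image; smaller = more similar.
        scores = np.array([dist(query_fv, fv) for fv in db_fvs])
        order = np.argsort(scores)
        return order[:top_n], scores[order[:top_n]]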


Figure 4.5: Block diagram of the proposed system

4.4 Experimental results and discussion

Experiments have been conducted on texture, face and MRI images. In each experi-

ment, precision and recall (Chapter 1, Section 1.2.5) have been calculated, and the performance of the proposed method is compared with CSLBP [42], LEPINV [168], LEPSEG

[168], DLEP [95], LMEBP [85] and LBP [105]. The proposed method is abbreviated

as CSLBCoP.

4.4.1 Experiment 1

The MIT VisTex database (Chapter 1, Section 1.2.1) is used in the first experiment. Every image of the database is used as a query and results are retrieved; for each query, images are retrieved in groups of 16, 32, 48, .., 112. Average precision and recall have

been shown with number of images retrieved in Fig. 4.6. Precision and recall for other

methods are also calculated in the same manner and plotted in Fig. 4.6. It has been

observed that the proposed method’s average retrieval rate (90.08%) is greater than

CSLBP (79.23%), LEPINV (72.10%), LEPSEG (80.25%), DLEP (80.47%), LMEBP

(87.77%) and LBP (85.84%). Results are also shown in table 4.1. Some query examples are shown in Fig. 4.7, where the query images appear in the first column and the images retrieved by the proposed method are shown next to them.


16 32 48 64 80 960

20

40

60

80

100

Number of images retrieved

Pre

cisi

on

CSLBPLEPINVLEPSEGLBPLMEBPDLEPCSLBCoP

(b)

16 32 48 64 80 9670

75

80

85

90

95

100

Number of images retrieved

Rec

all

CSLBPLEPINVLEPSEGLBPLMEBPDLEPCSLBCoP

(a)

Figure 4.6: (a) Average precision and (b) recall graph for MIT VisTex database

4.4.2 Experiment 2

In the second experiment, the Brodatz database (Chapter 1, Section 1.2.1) has been used. For the Brodatz database, images are retrieved in groups of 25, 30, 35, .., 70, since each category holds 25 images. Final results are shown in Fig. 4.8: the graph of precision versus the number of images retrieved is presented in 4.8(a), and the graph of recall versus the number of images retrieved in 4.8(b). The average recall rate (ARR) of all methods is given in table 4.1. The results demonstrated in the graphs clearly indicate that the proposed method is better than the other local patterns and is a better texture descriptor.


Figure 4.7: Query image retrieval in MIT VisTex texture image database

4.4.3 Experiment 3

In this experiment, the ORL face database (Chapter 1, Section 1.2.1) has been utilized. In the ORL face database experiment, images are retrieved in groups of 1, 2, 3, .., 10, since face images are very sensitive and almost similar to each other, differing only in minor changes. Precision and recall versus the number of images retrieved are shown in Figs. 4.9(a) and 4.9(b). Fig. 4.9 and table 4.1 clearly imply that the proposed method (64.15%) outperforms the others. A query example of face image retrieval is shown in Fig. 4.10, with query images and the corresponding retrieved images. In Fig. 4.11, a query example for all the methods is shown: for the proposed method all retrieved images belong to the same category, whereas the other methods retrieve some false images.

4.4.4 Experiment 4

The final experiment in this chapter is conducted on the OASIS MRI image database (Chapter 1, Section 1.2.1). For performance measurement, the average precision rate (APR) has been calculated for each method, including the proposed method.

Figure 4.8: (a) Average precision and (b) recall graph for Brodatz texture database

In Fig. 4.12(a), the APR with the

number of images retrieved has been plotted, and the proposed method outperforms

other methods. Also, group precision for each category has been calculated, and shown

in Fig. 4.12(b). It indicates that the performance of the proposed method is much better for group 2 and group 4 images, while for group 1 and group 3 it is slightly below a few methods. The average group precision (48.81%) is higher than that of the other methods, as shown in table 4.1. A query example from the OASIS database is demonstrated in Fig. 4.13.


Figure 4.9: (a) Average precision and (b) recall graph for ORL face database

Table 4.1: Results of previous methods and the proposed method for all databases

Method     MIT VisTex   Brodatz   ORL Face   OASIS MRI
           ARR          ARR       ARR        Group Precision
CSLBP      79.23        56.41     54.38      37.98
LEPINV     72.10        56.83     26.63      38.13
LEPSEG     82.23        64.77     37.13      39.15
LBP        85.84        70.06     42.13      42.04
LMEBP      87.77        74.23     46.38      45.69
DLEP       80.47        71.39     51.35      44.80
CSLBCoP    90.08        75.51     64.15      48.81


Figure 4.10: Query image retrieval in ORL face image database

4.4.5 Proposed method using different directions and distances in GLCM

In the proposed method, the gray level co-occurrence matrix is obtained from the CSLBP map of the original image. Different combinations of GLCM distances and angles have been observed in this work and analyzed on all databases; all combinations used are explained in Fig. 4.2. Results on the different datasets in terms of precision and recall are shown in Table 4.2.


Figure 4.11: Query image retrieval in ORL face image database for all methods

4.4.6 Proposed system using different distance measures

The similarity index plays a major role in a retrieval system. Hence, the performance of the proposed method has been analyzed with the five similarity measures mentioned in Section 4.3.2 and defined in Eq. 1.1-1.5. Results for all databases with the different distance measures are shown in table 4.3. It is observed that the d1 distance gives better results than the others in all cases. The d1 distance associates a weight with each pattern difference, which enhances small variations in local patterns; hence it performs best. In the literature [84, 91, 153], the d1 distance has also worked well as a distance measure for local patterns.


Figure 4.12: Average precision and group precision graph for OASIS medical image

database

4.4.7 Feature vector length and computation time

Run times of feature extraction and image retrieval are given in table 4.4. The feature extraction time of an image depends on the complexity of the algorithm. On the other


Figure 4.13: Query image retrieval in OASIS medical image database

Table 4.2: Proposed method with different directions and distances in GLCM

GLCM combination                      MIT VisTex   Brodatz   ORL Face   OASIS MRI
                                      ARR          ARR       ARR        Group Precision
D=1, θ=0°,45° and D=2, θ=0°,45°       89.85        75.10     62.03      45.59
D=1, θ=0°,90° and D=2, θ=0°,90°       90.08        75.46     61.68      48.81
D=1, θ=0°,45°,90°,135°                89.30        74.69     60.63      46.56
D=2, θ=0°,45°,90°,135°                90.05        75.51     64.15      47.13

hand, the image retrieval time depends on the length of the feature vector. The feature extraction time of the proposed method is less than that of all other methods except CSLBP and LBP, while its accuracy is better. The image retrieval times of all the algorithms are approximately equal except DLEP and LMEBP, whose feature vector lengths are greater than those of all other methods. The feature vector length of each method is also given in table 4.4.


Table 4.3: Results of all databases with different distance metrics

Distance     MIT VisTex   Brodatz   ORL Face   OASIS MRI
             ARR          ARR       ARR        Group Precision
d1           90.08        75.51     64.15      48.81
Euclidean    81.06        65.77     55.83      42.44
Manhattan    87.14        73.43     61.93      45.08
Canberra     87.34        61.12     55.48      44.12
Chi-square   88.38        74.42     63.20      47.83

Table 4.4: Computation time and feature vector length of all methods

Method     F.E. time (sec)   I.R. time (sec)   F.V. length
CSLBP      0.0196            0.0322            16
LEPINV     0.0720            0.0323            72
LEPSEG     0.0371            0.0330            512
LBP        0.0192            0.0325            256
LMEBP      0.0900            0.0390            4608
DLEP       0.0380            0.0350            2048
CSLBCoP    0.0320            0.0335            1024

F.E. = Feature extraction, I.R. = Image retrieval, F.V. = Feature vector.

4.5 Conclusion

In this chapter, a new image retrieval method has been developed for multi-purpose image datasets. The method combines the center symmetric local binary pattern and the gray level co-occurrence matrix. Local features are obtained using CSLBP, and the co-occurrence of pixel intensities in the CSLBP map is observed using the GLCM. Gray level co-occurrence matrices of different angles and distances have been computed and combined into one feature vector. Instead of a histogram, the GLCM of the pattern map has proved to be a more robust feature descriptor: it captures the mutual occurrence of pixels in different directions, which helps in obtaining stronger local information from the CSLBP map. This method has been tested on the MIT VisTex


texture database, the Brodatz texture database, the ORL face image database and the OASIS MRI medical image database. The effectiveness of the proposed method has been demonstrated experimentally and proved by comparison with other local patterns.


Chapter 5

Local Tri-Directional Patterns: A New Feature Descriptor

In image retrieval, local features extract the information regarding local objects in

the image or local intensity of pixels. Local patterns consider the neighboring pixels

to extract the local information in the image. Most of the local patterns proposed by researchers treat all neighboring pixels uniformly; very few patterns utilize pixel information based on direction. The main objective of this work is to develop a direction-based local pattern that provides better features than uniform local patterns.

In this chapter, a new texture feature descriptor has been developed that uses the local intensity of pixels in three directions of the neighborhood; it is named the local tri-directional pattern (LTriDP). Further, a magnitude pattern is merged for

better feature extraction. The proposed method has been tested on three databases: the first two, the Brodatz and MIT VisTex databases, are texture image databases, and the third is the ORL face database. Further, the effectiveness of the proposed method is proven by comparing it with existing algorithms for the image retrieval application.


5.1 Preliminaries

5.1.1 Local binary pattern


Figure 5.1: Local binary pattern example

Local binary patterns (LBPs) were proposed by Ojala et al. to capture local information of pixels in an image. In this method, every pixel of the image is considered as a center pixel, and local information is extracted for it based on its neighboring pixels. The center pixel is subtracted from each neighboring pixel, and a binary number is assigned to each neighboring pixel depending on the sign of this difference. These binary numbers construct the local binary pattern for the center pixel. The binary pattern is then multiplied by weights and summed to a single value, called the local binary pattern value of the center pixel. For a center pixel Ic and neighboring pixels In (n = 1, 2, .., 8), LBP can be computed as follows:

$$LBP_{P,R} = \sum_{n=0}^{P-1} 2^n \times F_4(I_n - I_c) \qquad (5.1)$$

$$F_4(x) = \begin{cases} 1 & x \geq 0 \\ 0 & \text{else} \end{cases}$$

where P and R are the number of neighboring pixels and the radius respectively. The histogram of the LBP map is calculated using Eq. 5.2, where m × n is the size of the image and F2(x, y) is defined in Eq. 2.5. A sample window example of the LBP pattern is shown in Fig. 5.1.

$$Hist(L)\,\big|_{LBP} = \sum_{a=1}^{m}\sum_{b=1}^{n} F_2(LBP(a, b), L); \quad L \in [0, (2^P - 1)] \qquad (5.2)$$
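A minimal sketch of the per-pixel LBP computation of Eq. 5.1 follows; the neighbor ordering is an assumed convention.

    def lbp_value(nbrs, center):
        # nbrs: the 8 neighboring intensities I1..I8; bit n is set when In+1 >= Ic.
        return sum(1 << n for n, v in enumerate(nbrs) if v >= center)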


5.2 Proposed method


Figure 5.2: Sample window example of the proposed method

Local tri-directional pattern is an extension of LBP. Instead of a uniform relationship with all neighboring pixels, LTriDP considers relationships based on different directions. Each center pixel has neighboring pixels within a particular radius: the closest neighborhood consists of 8 pixels around the center pixel, the next radius contains 16 pixels, and so on. The closest neighboring pixels are fewer in number and give more related information, as they are nearest to the center pixel; hence, we consider the 8-neighborhood pixels for pattern creation. Each neighborhood pixel is considered in turn and compared with the center pixel and with its two most adjacent neighborhood pixels. These two neighborhood pixels are either vertical or horizontal


pixels, as they are closest to the considered neighboring pixel. The pattern formation is demonstrated in Fig. 5.2 and explained mathematically as follows. Consider a center pixel Ic and its 8-neighborhood pixels I1, I2, .., I8. First, we calculate the differences of each neighborhood pixel with its two most adjacent pixels and with the center pixel:

$$D_1 = I_i - I_{i-1}, \quad D_2 = I_i - I_{i+1}, \quad D_3 = I_i - I_c \quad \forall\, i = 2, 3, .., 7 \qquad (5.3)$$

$$D_1 = I_i - I_8, \quad D_2 = I_i - I_{i+1}, \quad D_3 = I_i - I_c \quad \text{for } i = 1 \qquad (5.4)$$

$$D_1 = I_i - I_{i-1}, \quad D_2 = I_i - I_1, \quad D_3 = I_i - I_c \quad \text{for } i = 8 \qquad (5.5)$$

For each neighborhood pixel, we have three differences D1, D2 and D3, and the pattern value is assigned as

$$f(D_1, D_2, D_3) = \{\#(D_k < 0)\} \bmod 3, \quad k = 1, 2, 3 \qquad (5.6)$$

where #(Dk < 0) denotes the number of the Dk that are less than 0, for k = 1, 2, 3; it ranges from 0 to 3. To calculate each pattern value, #(Dk < 0) is taken modulo 3: e.g., when all Dk < 0 (k = 1, 2, 3), #(Dk < 0) is 3 and #(Dk < 0) mod 3 is 0; similarly, if no Dk < 0, the value of #(Dk < 0) mod 3 is also 0. In this way, #(Dk < 0) mod 3 takes the values 0, 1 and 2. Further explanation of the pattern value calculation, using an example, is given at the end of this section (Fig. 5.2). For each neighborhood pixel i = 1, 2, .., 8, pattern values fi(D1, D2, D3) are calculated using Eq. 5.6, and the tri-directional pattern is obtained as

$$LTriDP(I_c) = \{f_1, f_2, .., f_8\} \qquad (5.7)$$

Hence, we get a ternary pattern for each center pixel and convert this into two binary

patterns as shown below,

$$LTriDP_1(I_c) = \{F_5(f_1), F_5(f_2), .., F_5(f_8)\}, \quad F_5(x) = \begin{cases} 1, & x = 1 \\ 0, & \text{else} \end{cases} \qquad (5.8)$$

$$LTriDP_2(I_c) = \{F_6(f_1), F_6(f_2), .., F_6(f_8)\}, \quad F_6(x) = \begin{cases} 1, & x = 2 \\ 0, & \text{else} \end{cases} \qquad (5.9)$$


$$LTriDP(I_c)\,\big|_{i=1,2} = \sum_{l=0}^{7} 2^l \times LTriDP_i(I_c)(l + 1) \qquad (5.10)$$

After obtaining the pattern maps, histograms are calculated for both binary patterns using Eq. 5.11, where Pattern is LTriDP|i (i = 1, 2).

$$Hist(L)\,\big|_{Pattern} = \sum_{a=1}^{m}\sum_{b=1}^{n} F_2(Pattern(a, b), L); \quad L \in [0, 255] \qquad (5.11)$$

The tri-directional pattern extracts most of the local information; however, it has been shown that a magnitude pattern is also helpful in creating a more informative feature vector [38, 96]. We therefore also employ a magnitude pattern based on the center pixel, the neighborhood pixel and its two most adjacent pixels. The magnitude pattern is created as follows:

$$M_1 = \sqrt{(I_{i-1} - I_c)^2 + (I_{i+1} - I_c)^2}, \quad M_2 = \sqrt{(I_{i-1} - I_i)^2 + (I_{i+1} - I_i)^2} \quad \forall\, i = 2, 3, .., 7 \qquad (5.12)$$

$$M_1 = \sqrt{(I_8 - I_c)^2 + (I_{i+1} - I_c)^2}, \quad M_2 = \sqrt{(I_8 - I_i)^2 + (I_{i+1} - I_i)^2} \quad \text{for } i = 1 \qquad (5.13)$$

$$M_1 = \sqrt{(I_{i-1} - I_c)^2 + (I_1 - I_c)^2}, \quad M_2 = \sqrt{(I_{i-1} - I_i)^2 + (I_1 - I_i)^2} \quad \text{for } i = 8 \qquad (5.14)$$

Values of M1 and M2 are calculated for each neighborhood pixel and according to these

values, a magnitude pattern value is assigned to each neighborhood pixel.

$$Mag_i(M_1, M_2) = \begin{cases} 1, & M_1 \geq M_2 \\ 0, & \text{else} \end{cases} \qquad (5.15)$$

$$LTriDP_{mag}(I_c) = \{Mag_1, Mag_2, .., Mag_8\} \qquad (5.16)$$

$$LTriDP(I_c)\,\big|_{mag} = \sum_{l=0}^{7} 2^l \times LTriDP_{mag}(I_c)(l + 1) \qquad (5.17)$$

Similarly, the histogram of the magnitude pattern is created by Eq. 5.11, where Pattern is LTriDP|mag, and the three histograms are concatenated into one:

$$Hist = \left[\, Hist\,\big|_{LTriDP_1}, \; Hist\,\big|_{LTriDP_2}, \; Hist\,\big|_{LTriDP_{mag}} \,\right] \qquad (5.18)$$

An example of the pattern calculation is shown in Fig. 5.2 through windows (a)-(j). Window (a) shows the center pixel Ic and the neighborhood pixels I1, I2, .., I8. The center pixel is marked in red in windows (b)-(j). In window (c), the first neighborhood pixel I1 is marked in blue, and its two most adjacent pixels are marked in yellow. First, we compare the blue pixel with the yellow pixels and the red pixel, and assign a '0' or '1' value to each of the three comparisons. For example, in window (c), I1 is compared with I8, I2 and Ic; since I1 > I8, I1 < I2 and I1 > Ic, the comparison pattern for I1 is 101, and according to Eq. 5.6 the pattern value for I1 is 1. In the same way, pattern values for the other neighboring pixels are obtained in windows (d)-(j). Finally, the local

tri-directional pattern for the center pixel is obtained by merging all the neighborhood pixel pattern values. For the magnitude pattern, the magnitudes of the center pixel and the neighborhood pixel are obtained and compared. In the presented example, '6' is the center pixel and I1 is '8'. In window (c), the magnitude of the center pixel '6' is 5.8 and the magnitude of '8' is 7.1, both computed with respect to '1' and '9'. Since the magnitude of the center pixel is less than that of the neighborhood pixel, a '0' pattern value is assigned here. The magnitude pattern values for the remaining neighborhood pixels are calculated in the same way, shown in windows (d)-(j), and the magnitude

patterns of all neighborhood pixels are merged into one pattern, which is the magnitude pattern for the center pixel. Boundary pixels are skipped, and LTriDP is calculated for all remaining pixels.
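The per-pixel computation can be sketched in Python as follows (the neighbors I1..I8 are assumed to be given in the circular order of Fig. 5.2). On the worked example above, ltridp_codes() reproduces the tri-directional pattern 10020000, and ltridp_mag_bits() gives 0 for I1.

    import numpy as np

    def ltridp_codes(nbrs, center):
        # Eqs. 5.3-5.6: compare each neighbor with its two adjacent neighbors
        # and the center; count the negative differences modulo 3.
        vals = []
        for i in range(8):
            d1 = nbrs[i] - nbrs[i - 1]        # wraps: I1 is compared with I8
            d2 = nbrs[i] - nbrs[(i + 1) % 8]  # wraps: I8 is compared with I1
            d3 = nbrs[i] - center
            vals.append(sum(d < 0 for d in (d1, d2, d3)) % 3)
        return vals  # ternary values f1..f8, split by Eqs. 5.8-5.10

    def ltridp_mag_bits(nbrs, center):
        # Eqs. 5.12-5.15: bit i is 1 when M1 (center w.r.t. the two pixels
        # adjacent to neighbor i) is at least M2 (neighbor i itself).
        bits = []
        for i in range(8):
            prev, nxt = nbrs[i - 1], nbrs[(i + 1) % 8]
            m1 = np.hypot(prev - center, nxt - center)
            m2 = np.hypot(prev - nbrs[i], nxt - nbrs[i])
            bits.append(1 if m1 >= m2 else 0)
        return bits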

Local patterns use the local intensity of pixels to capture information and create a pattern accordingly. Local binary patterns compare each neighborhood pixel with the center pixel and assign a pattern to the center pixel. In the proposed work, additional relationships among local pixels are observed: along with the center-neighborhood relationship, the mutual relationships of adjacent neighboring pixels are obtained, and local information based on three directions is examined. This method gives more information than LBP and other local patterns, as it captures center-neighboring pixel information along with mutual neighboring pixel information. The nearest neighbors give most of the information; hence, the pattern is calculated using the most adjacent neighboring pixels for each pattern value. In addition, a magnitude pattern is introduced, which provides information regarding the intensity weight of each pixel. LTriDP and the magnitude pattern capture different information, and their concatenation provides a better feature descriptor.


5.3 Proposed system framework

A block diagram of the presented method is shown in Fig. 5.3, and the algorithm is demonstrated below. The algorithm has two parts, given in Section 5.3.1: part 1 explains the feature vector construction, and part 2 presents the image retrieval system.

5.3.1 Algorithm

Part 1: Feature vector construction

Input: Image.

Output: Feature vector.

1. Upload the image and convert it into gray scale if it is a color image.

2. Compute the tri-directional patterns and construct histogram.

3. Evaluate the magnitude patterns and make histogram.

4. Concatenate the histograms calculated in steps 2 and 3 (a sketch of this construction is given after the algorithm).

Part 2: Image retrieval

Input: Query image.

Output: Similar images to the query.

1. Enter the query image.

2. Calculate the feature vector as shown in part 1.

3. Compute the similarity index of query image feature vector with every database

image feature vector.

4. Sort similarity indices and produce images corresponding to minimum similarity

indices as results.
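A sketch of the feature vector construction of part 1 follows, reusing the per-pixel sketches ltridp_codes() and ltridp_mag_bits() from Section 5.2 (the names are illustrative, not from the thesis). The resulting length, 3 × 256 = 768, matches the feature vector length reported for the proposed method later in this chapter.

    import numpy as np

    def ltridp_feature_vector(img):
        img = img.astype(np.int32)
        h, w = img.shape
        maps = np.zeros((3, h - 2, w - 2), dtype=np.uint8)
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                # Neighbors I1..I8 in the circular order of Fig. 5.2.
                nbrs = [img[y, x + 1], img[y + 1, x + 1], img[y + 1, x],
                        img[y + 1, x - 1], img[y, x - 1], img[y - 1, x - 1],
                        img[y - 1, x], img[y - 1, x + 1]]
                f = ltridp_codes(nbrs, img[y, x])
                mag = ltridp_mag_bits(nbrs, img[y, x])
                maps[0, y - 1, x - 1] = sum(1 << l for l, v in enumerate(f) if v == 1)
                maps[1, y - 1, x - 1] = sum(1 << l for l, v in enumerate(f) if v == 2)
                maps[2, y - 1, x - 1] = sum(1 << l for l, v in enumerate(mag) if v)
        # Three 256-bin histograms concatenated, as in Eq. 5.18.
        return np.concatenate([np.bincount(m.ravel(), minlength=256) for m in maps])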

5.3.2 Similarity measure

The similarity between the query image and each database image is measured using the d1 distance measure (Eq. 1.1).


Figure 5.3: Block diagram of the proposed method

5.4 Experimental results and discussion

The proposed method has been tested on two texture database and one face image

database for validation. The capability of the presented method for image retrieval is

shown on the basis of precision, recall [82] and average normalized modified retrieval

rank (ANMRR) [77]. Precision, recall and ANMRR are explained in Chapter 1, Section

1.2.5. In the process of retrieving images, for a given query, many images are retrieved.

In those images, some are relevant to query image and some are non-relevant results

which do not match query image. Every image of the database is treated as a query

image, and for each image, precision, recall and normalized modified retrieval rank

(NMRR) are calculated. The proposed method is compared with CS LBP, LEPINV,

LEPSEG, LBP, LMEBP and DLEP in the following experiments.
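For one query, precision and recall can be sketched as follows. The standard definitions are assumed here; the exact forms used in this thesis are those of Chapter 1, Section 1.2.5.

    def precision_recall(retrieved_labels, query_label, n_relevant):
        # hits: retrieved images from the same category as the query.
        hits = sum(1 for lbl in retrieved_labels if lbl == query_label)
        return hits / len(retrieved_labels), hits / n_relevant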

5.4.1 Experiment 1

In the first experiment, the MIT VisTex database [3] of gray scale images is used; the details of this database are given in Chapter 1, Section 1.2.1.

Precision and recall for the presented method and other methods are calculated

and demonstrated through graphs. In Fig. 5.4, plots of precision and recall are shown.


Figure 5.4: Precision and recall with number of images retrieved for database 1

Table 5.1: Average retrieval rate of all databases

Method    Brodatz Database   MIT VisTex Database   ORL Face Database
CS LBP    54.74              74.39                 44.35
LEPINV    56.83              72.10                 26.63
LEPSEG    64.76              80.25                 37.13
LBP       70.06              82.27                 42.13
LMEBP     74.23              87.77                 46.38
DLEP      71.39              80.47                 51.35
PM        76.45              88.62                 55.10


Figure 5.5: (a) Precision and (b) recall of proposed methods for database 1


Figure 5.6: (a) Precision and (b) recall with number of images retrieved for database 2

Fig. 5.4(a) presents the variation of precision with the number of images retrieved, and Fig. 5.4(b) shows the graph of recall versus the number of images retrieved. Both graphs clearly show that the presented method is better than the others in terms of precision and recall. In terms of average retrieval rate, the proposed method improves on CS LBP, LEPINV, LEPSEG, LBP, LMEBP and DLEP by 39.66%, 34.52%, 18.05%, 9.12%, 2.99% and 7.09% respectively. The ANMRR for every method is calculated and presented in table 5.2. The ANMRR of the proposed method is closest to zero, which indicates that the most ground-truth results are achieved using the proposed method.


Figure 5.7: (a) Precision and (b) recall of the proposed methods for database 2

In addition, the proposed variants are compared with each other. In Fig. 5.5, a comparison of LTriDP and LTriDPmag is shown in terms of precision and recall versus the number of images retrieved. It is clearly visible that LTriDP is more precise than LTriDPmag.


5.4.2 Experiment 2


Figure 5.8: (a) Precision and (b) recall with number of images retrieved for database 3

In the second experiment, the Brodatz textures [127] have been used for testing. Details about the Brodatz database are given in Chapter 1, Section 1.2.1.

The results of the proposed algorithm in the form of precision and recall are presented in graphs. Initially 25 images are retrieved for each query; the number is then incremented by 5, up to 70 retrieved images. Plots of precision and recall versus the number of retrieved images are shown in Fig. 5.6, where the advantage of the proposed method over the other methods is clearly visible. Moreover, ANMRR results are shown in table 5.2, implying that more relevant images are retrieved using the proposed method than with the other methods.


Table 5.2: Average normalized modified retrieval rank of different methods and databases

Method    Brodatz Database   MIT VisTex Database   ORL Face Database
CS LBP    0.3664             0.1696                0.4607
LEPINV    0.3437             0.1876                0.6638
LEPSEG    0.2704             0.1198                0.5398
LBP       0.2278             0.0817                0.4833
LMEBP     0.1944             0.0738                0.4422
DLEP      0.1685             0.1278                0.3918
PM        0.1742             0.0679                0.3570

Fig. 5.7 shows the plots of LTriDP versus LTriDPmag. On the basis of precision and recall, LTriDP is more effective than LTriDPmag; however, the combination of both enhances the image information, as shown in Fig. 5.6.

5.4.3 Experiment 3

In the third experiment, the ORL face database [5] is used for face image retrieval. Results for database 3 are presented in Figs. 5.8 and 5.9. In the experimental setup, images are retrieved in groups of 1, 2, .., 10, and precision and recall are calculated for each group and shown in Fig. 5.8. The performance measures clearly depict that the proposed method (55.10%) outperforms DLEP (51.35%), LMEBP (46.38%), LBP (42.13%), LEPSEG (37.13%), LEPINV (26.63%) and CS LBP (44.35%). The average normalized modified retrieval rank for this database is shown in table 5.2; for the proposed method it is closest to zero, hence the proposed method is the most promising in terms of accurate retrieval. Further, Fig. 5.9 shows the comparison between LTriDP and LTriDPmag.

The feature vector length of each method is shown in table 5.3. The feature vector of the proposed method is much shorter than those of LMEBP and DLEP, while its performance is better. The feature vector of the proposed method is longer than those of CS LBP,

LEPINV, LEPSEG and LBP, but the performance is considerably better, as shown in the different database results.

Figure 5.9: (a) Precision and (b) recall of proposed methods for database 3


Figure 5.10: ORL database query example

A demonstration of the proposed method is shown in Fig. 5.10, where similar face images are retrieved for five query images: the first image in each row is the query image, and the next three images are retrieved using the proposed method. Table 5.1 lists the average retrieval rate (ARR) on the two texture databases and the face image database for all compared methods and the proposed method. The final ARR results clearly verify that the proposed algorithm outperforms the others.


Table 5.3: Feature vector length of different methods

Method    Feature vector length
CS LBP    16
LEPINV    72
LEPSEG    512
LBP       256
LMEBP     4096
DLEP      2048
PM        768

5.5 Conclusion

A novel method, named the local tri-directional pattern (LTriDP), has been proposed in this chapter. Each pixel in the neighborhood is compared with its most adjacent pixels and the center pixel for local information extraction. In most previous local patterns, only the center pixel is considered for pattern formation; in the proposed method, information related to each neighborhood pixel is extracted, so the method gives more expressive features. A magnitude pattern is also incorporated, based on the same pixels used in LTriDP. All methods are tested on the MIT VisTex texture database, the Brodatz texture database and the ORL face image database. Precision and recall show that the proposed system is more proficient and accurate than the others. Further, the feature vector length of the proposed algorithm is more acceptable than those of LMEBP and DLEP.


Chapter 6

Local Neighborhood Difference Pattern: A New Feature Descriptor

A new image retrieval technique called the local neighborhood difference pattern (LNDP) has been proposed for local feature extraction. The conventional local binary pattern (LBP) transforms every pixel of an image into a binary pattern based on its relationship with its neighboring pixels. The proposed feature descriptor differs from the local binary pattern in that it transforms the mutual relationships of all neighboring pixels into a binary pattern. LBP and LNDP are complementary, as they extract different information from local pixel intensities.

In the proposed method, LBP and LNDP features are combined to extract the maximum information. To prove the effectiveness of the proposed method, experiments have been conducted on four different databases of texture images and natural images. The performance has been measured using the well-known evaluation measures precision and recall, and compared with some state-of-the-art local patterns. The comparison shows a significant improvement of the proposed method over existing methods.


6.1 Preliminaries

6.1.1 Local binary pattern

The complete details of the local binary pattern are given in Chapter 5, Section 5.1.1.

6.1.2 Local ternary pattern


Figure 6.1: Local ternary pattern calculation (a) a window example (b) difference of

neighboring and center pixel (c) ternary pattern for t=3 (d) ternary pattern divided in

two binary patterns (e) weights (f) weights multiplied by binary patterns and sum up

to pattern value

Local ternary pattern (LTP) is an extension of LBP for noisy images. In this local pattern, the thresholding of the neighboring and center pixels is performed with an interval instead of a single value. LTP can be explained mathematically as follows:

$$LTP_{p,r,t} = \sum_{n=0}^{p-1} 2^n \times F_7(I_n - I_c, t) \qquad (6.1)$$

$$F_7(x, t) = \begin{cases} +1 & \text{if } x \geq t \\ -1 & \text{if } x \leq -t \\ 0 & \text{if } -t < x < t \end{cases} \qquad (6.2)$$

where p, r and t are the number of neighboring pixels, the radius and the threshold interval value respectively. The parameter t depends on the maximum intensity and the noise in the image. The histogram of the LTP map is created using Eq. 6.3, where the function F2 is defined in Eq. 2.5.

$$Hist(L)\,\big|_{LTP} = \sum_{a=1}^{m}\sum_{b=1}^{n} F_2(LTP(a, b), L); \quad L \in [0, 511] \qquad (6.3)$$

A 3×3 window example of LTP is explained in Fig. 6.1. More details about LTP can

be found in [144].
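A minimal sketch of the per-pixel ternary coding of Eq. 6.2 follows; the split of the ternary string into two binary patterns before histogramming follows the division shown in Fig. 6.1(d).

    def ltp_codes(nbrs, center, t):
        # One ternary code (+1, -1 or 0) per neighbor, per Eq. 6.2.
        codes = []
        for v in nbrs:
            d = v - center
            codes.append(1 if d >= t else (-1 if d <= -t else 0))
        return codes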

6.2 Proposed method


Figure 6.2: Local neighborhood difference pattern calculation (a) pixel presentation

(b) a window example (f-m) pattern calculation for each neighboring pixel (c) binary

values assigned to each neighboring pixel (d) weights (e) weights multiplied by LNDP

pattern and sum up to pattern value


A new feature extraction method called the local neighborhood difference pattern (LNDP) is proposed in this chapter. As the name suggests, the method extracts local features based on neighborhood pixel differences and forms a binary pattern to represent each pixel in the image. For each pixel, the 8 neighboring pixels are considered, and for each neighboring pixel its two most adjacent pixels are chosen. The relationships of these two pixels with the neighboring pixel are computed, and a binary number is assigned; in this way, a binary number is obtained for each neighboring pixel. The pattern of these binary numbers represents each pixel, and finally a histogram is constructed to represent the image in the form of LNDP. For neighboring pixels In (n = 1, 2, .., 8) of a center pixel Ic, LNDP is computed by the following procedure:

$$k_1^n = I_8 - I_n, \quad k_2^n = I_{n+1} - I_n \quad \text{for } n = 1 \qquad (6.4)$$

$$k_1^n = I_{n-1} - I_n, \quad k_2^n = I_{n+1} - I_n \quad \forall\, n = 2, 3, .., 7 \qquad (6.5)$$

$$k_1^n = I_{n-1} - I_n, \quad k_2^n = I_1 - I_n \quad \text{for } n = 8 \qquad (6.6)$$

The differences of each neighborhood pixel with its two adjacent neighborhood pixels are obtained in $k_1^n$ and $k_2^n$. Based on these two differences, a binary number is assigned to each neighboring pixel.

$$F_1(k_1^n, k_2^n) = \begin{cases} 1 & k_1^n \times k_2^n \geq 0 \\ 0 & \text{else} \end{cases} \qquad (6.7)$$

For the center pixel Ic, LNDP can be computed using the above binary values as follows:

$$LNDP(I_c) = \sum_{n=1}^{8} 2^{n-1} \times F_1(k_1^n, k_2^n) \qquad (6.8)$$

The histogram of the LNDP map is obtained as follows:

$$Hist(L)\,\big|_{LNDP} = \sum_{x=1}^{m}\sum_{y=1}^{n} F_2(LNDP(x, y), L); \quad L \in [0, (2^8 - 1)]$$

In Fig. 6.2, an example of the LNDP calculation is demonstrated. Windows (a) and (b) show the numbering of the neighborhood pixels and their intensities, and windows (f)-(m) present the pattern calculation. For example, in window (f), pixel I1 is considered, and the differences of I1 with I8 and I2 are obtained using Eq. 6.4,


and these are '1' and '5' respectively. Since both differences are positive, the pattern value '1' is assigned to pixel I1 (using Eq. 6.7). Pattern values for the other pixels are obtained similarly and shown in window (c). The pattern values are multiplied by the weights shown in window (d), and the LNDP value is obtained by summing the weighted pattern values in window (e).
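The per-pixel computation of Eqs. 6.4-6.8 can be sketched as follows (the neighbors I1..I8 are assumed to be given in the circular order of Fig. 6.2); on the window above, it assigns '1' to I1 exactly as described.

    def lndp_value(nbrs):
        # Bit n is 1 when the two differences of neighbor n+1 with its adjacent
        # neighbors share the same sign (k1 * k2 >= 0), per Eq. 6.7.
        code = 0
        for n in range(8):
            k1 = nbrs[n - 1] - nbrs[n]        # wraps: k1 of I1 uses I8
            k2 = nbrs[(n + 1) % 8] - nbrs[n]  # wraps: k2 of I8 uses I1
            if k1 * k2 >= 0:
                code |= 1 << n                # weight 2^(n-1) of Eq. 6.8
        return code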

6.3 Proposed system framework

6.3.1 Feature extraction


Figure 6.3: (a) LBP features (b) LNDP features (c) Concatenation of LBP and LNDP

In the proposed system, features are extracted using local intensities from the closest neighborhood of a center pixel. Both LBP [105] and the proposed LNDP extract local features: LBP is based on the center-neighboring pixel relationship, whereas the LNDP operator extracts the relationships among the neighboring pixels themselves. The two operators are complementary to each other; hence, in the proposed work, both are employed to capture most of the information (Fig. 6.3). For both LBP and LNDP, the pattern map is computed for every pixel of the image except the boundary pixels. The histograms of LBP and LNDP are concatenated into the final feature vector of each image.


6.3.2 Algorithm


Figure 6.4: Block diagram of the proposed system

Block diagram of the presented method has been demonstrated in Fig. 6.4 and

algorithm for the same is given below:

1. Upload the image, and convert it into a gray scale image if it is a color image.

2. Compute local binary pattern of the image.

3. Compute local neighborhood difference pattern of the image.

4. Create histograms of LBP and LNDP maps.

5. Concatenate both histograms as feature descriptor.

6. Compute the distance of the query image feature vector with all database image

feature vectors using Eq. 1.1.

7. Sort the distances, and produce the set of images with the smallest distances as the retrieved similar images (a sketch of the joint feature construction of steps 2-5 is given below).
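A sketch of the joint feature construction (steps 2-5), assuming the LBP and LNDP maps have already been computed as 8-bit pattern images:

    import numpy as np

    def lbp_lndp_features(lbp_map, lndp_map):
        h1 = np.bincount(lbp_map.ravel(), minlength=256)   # histogram of LBP map
        h2 = np.bincount(lndp_map.ravel(), minlength=256)  # histogram of LNDP map
        return np.concatenate([h1, h2]).astype(np.float64) # joint 512-bin descriptor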


Figure 6.5: (a) Precision vs number of images retrieved (b) Recall vs number of images

retrieved in Database 1

6.4 Experimental results and discussion

To prove the effectiveness of the proposed method, four databases have been used in the experiments. In each experiment, every image of the database is used as a query image, and precision and recall are measured over the whole database. The same process is applied to several well-known local patterns for comparison. The proposed algorithm is compared with CSLBP, LEPINV, LEPSEG, LBP, DLEP and LTrP, since these also depend on the local intensity of the image.


6.4.1 Experiment 1

Figure 6.6: Comparison between LBP, LNDP and fusion method in Database 1

In the first experiment, image retrieval is performed on texture image databases. Two texture databases, Brodatz and STex, are used, and the results are presented in the following subsections.


Database 1

Details about the Brodatz database are given in Chapter 1, Section 1.2.1. Precision and recall have been calculated for the Brodatz database images using Eqs. (1.6-1.11). Images are retrieved in groups of 25, 30, 35, .., 70 to measure the performance of the system for different numbers of retrieved images. The graph of precision versus the number of images retrieved is shown in Fig. 6.5(a), and the graph of recall versus the number of images retrieved in Fig. 6.5(b). The graphs clearly demonstrate that, compared to the other methods, the proposed method is better in terms of both precision and recall. The average retrieval rate (ARR) of all methods is presented in Table 6.1; the ARR of the proposed method is improved over CSLBP, LEPINV, LEPSEG, LBP, DLEP and LTrP by up to 35.46%, 34.45%, 17.98%, 9.06%, 7.03% and 3.09% respectively. A query example is presented in Fig. 6.7, where (a) and (b) are query images shown along with the similar images retrieved by the proposed method. Comparisons of LBP and LNDP individually are shown in Fig. 6.6. The performance of LNDP is better than that of LBP, and the fusion of both methods gives much better results.

Figure 6.7: Query image example of Brodatz database images


Database 2

STex database is used for experiment. Details about STex database are given in

Chapter 1, Section 1.2.1. Each image of the 7,616-image database is treated as a query image, to analyze the performance without discrimination.

Figure 6.8: (a) Precision vs number of images retrieved (b) Recall vs number of images retrieved in Database 2

Precision and recall

graphs are shown in Fig. 6.8(a) and 6.8(b). The performance has been significantly improved over CSLBP, LEPINV, LEPSEG, LBP, DLEP and LTrP. During the experiment, fixed numbers of images (16, 32, .., 112) have been retrieved, as shown in Fig. 6.8, to observe the performance over different numbers of retrieved images. The ARR of the presented method has been significantly raised over CSLBP, LEPINV, LEPSEG, LBP, DLEP and LTrP, by up to 52.95%, 116.33%, 61.08%, 7.85%, 10.56% and 10.06% respectively. Precision and recall of LBP and LNDP separately are demonstrated in Fig. 6.9. LNDP outperforms LBP, and the fusion of both methods further improves the accuracy.

Figure 6.9: Comparison between LBP, LNDP and fusion method in Database 2


Table 6.1: Average retrieval rate for STex and Brodatz databases

Method              CSLBP   LEPINV   LEPSEG   LBP     DLEP    LTrP    PM
Brodatz database    56.41   56.83    64.77    70.06   71.39   74.12   76.41
STex database       37.31   26.38    35.43    52.92   51.62   51.85   57.07

6.4.2 Experiment 2

In the next experiment, natural image databases have been chosen. Two datasets of

10,000 and 1600 images have been selected.

Database 3

The Corel-10k database is used as a natural image database. Details about Corel-10k are given in Chapter 1, Section 1.2.1.

In Corel-10k, database images are retrieved in groups of 10, 20, .., 100 during the experiment. For every experiment, precision and recall have been computed using Eqs. (1.6-1.11). Plots of precision and recall against the number of images retrieved are shown in Fig. 6.10(a) and 6.10(b). The graphs clearly demonstrate that the proposed method outperforms the others. In terms of precision/recall, the performance of the proposed method is significantly improved over CSLBP, LEPINV, LEPSEG, LBP, DLEP and LTrP, by up to 61.96%/67.6%, 47.96%/51.57%, 5.88%/23.18%, 13.8%/13.65%, 7.06%/8.26% and 12.88%/6.14% respectively.

The performance of the system with respect to each category is also demonstrated using graphs: in Fig. 6.11(a) and 6.11(b), precision and recall for each category are shown. A comparison of LBP and LNDP is shown in Fig. 6.12. Table 6.2 lists the overall precision and recall for every database, and the proposed method outperforms the others.

Database 4

In the fourth experiment, the MIT urban and natural scene database [4] is used; further details about this database are given in Chapter 1, Section 1.2.1. Each class of this database contains 200 images. Hence, groups of 10, 20, .., 200 images are retrieved in each experiment for every image of the database. In Fig.

6.13, precision and recall are shown for every group of retrieved images.

Figure 6.10: (a) Precision vs number of images retrieved (b) Recall vs number of images retrieved in Database 3

The

performance in terms of precision and recall is better than that of the other methods, as visible in the graphs. In terms of precision/recall, the performance of the proposed method is significantly improved over CSLBP, LEPINV, LEPSEG, LBP, DLEP and LTrP, by up to 25.84%/24.26%, 21.01%/18.73%, 13.19%/9.98%, 3.06%/13.08%, 5.29%/13.94% and 3.09%/0.45% respectively.

Figure 6.11: (a) Precision vs image category (b) Recall vs image category in Database 3

Table 6.2: Results of precision and recall for all methods

          Corel-10k database                   MIT natural scene
Method    Precision (n=10)   Recall (n=100)    Precision (n=10)   Recall (n=200)
CSLBP     26.43              10.15             55.82              31.38
LEPINV    28.93              11.22             58.04              32.85
LEPSEG    34.01              13.81             62.06              35.46
LBP       37.62              14.97             68.16              34.49
DLEP      39.99              15.71             66.71              34.23
LTrP      37.93              16.03             68.14              38.82
PM        42.81              17.01             70.24              39.00

Figure 6.12: Comparison between LBP, LNDP and fusion method in Database 3

Precision and recall graphs for every category are presented in Fig. 6.14. In most of the categories, the proposed method outperforms the other methods. The performance of LBP and LNDP is compared in Fig. 6.15; it is clearly visible from precision and recall that LNDP outperforms LBP and that the fusion of both methods is better. A query

image example is also demonstrated in Fig. 6.16. Two query images are denoted as (a) and (b), and the eight most similar retrieved images are shown as the results. The feature vector length of each method is given in Table 6.3.

Figure 6.13: (a) Precision vs number of images retrieved (b) Recall vs number of images retrieved in Database 4

6.5 Conclusion

In this chapter, a novel local feature descriptor called the local neighborhood difference pattern (LNDP) has been proposed. LNDP is complementary to LBP: it extracts the relationship among the neighboring pixels by comparing them mutually, whereas LBP computes the relationship of the neighboring pixels with the center pixel.

Figure 6.14: (a) Precision vs image category (b) Recall vs image category in Database 4

Table 6.3: Feature vector length of different methods

Method      Feature vector length
CSLBP       16
LEPINV      72
LEPSEG      512
LBP         256
DLEP        2048
LTrP        767
LNDP        256
LNDP+LBP    512

Figure 6.15: Comparison between LBP, LNDP and fusion method in Database 4

Figure 6.16: Query image example of the MIT urban and natural scene database

In the proposed feature extraction method, LBP and LNDP are combined, as they complement each other in local feature extraction. The proposed method has been applied to image retrieval on texture and natural image datasets; two texture datasets and two natural image datasets were chosen for the experiments. The performance of the proposed method has been observed

using precision and recall graphs, and compared with some existing local patterns.

Evaluation measures clearly demonstrate that the proposed method outperforms other

methods in terms of accuracy.


Chapter 7

Object Tracking using Joint Histogram of Color and Local Rhombus Pattern

Object tracking is a crucial issue in the fields of pattern recognition and computer vision. It finds applications mainly in vehicle navigation, traffic monitoring, face tracking, etc. Object tracking has two major tasks: first, feature extraction of the target object in the video sequence, and second, tracking the target object through the video sequence using those features.

In this chapter, a feature extraction method named the local rhombus pattern (LRP) is proposed. It differs from the conventional local binary pattern in that it extracts the mutual relationship among the neighboring pixels themselves instead of their relationship with the center pixel. The proposed method is combined with an HSV (hue, saturation and value) quantized histogram, and is applied to object tracking using the mean shift tracking algorithm. Experiments are carried out on road traffic and sports videos, using the joint histogram of LRP and the HSV color space, and compared to two state-of-the-art approaches. The experimental results show the effectiveness of the proposed method over the existing methods.


7.1 Local rhombus pattern

Figure 7.1: Local rhombus pattern sample window example

In the proposed method, features are derived from the neighboring pixels by evaluating the mutual relationships among them instead of their relationship with the center pixel. Four neighborhood pixels, two each in the vertical and horizontal directions, are used for pattern formation. For each of these four pixels, two adjacent neighboring pixels are considered, and relationships based on their comparison are extracted.

As shown in Fig. 7.1(a), I1, I3, I5 and I7 are considered for pattern formation. A sample window of the image is given in Fig. 7.1(b), and the steps involved in the pattern creation process are demonstrated in Fig. 7.1(c-f). Fig. 7.1(g-i) shows how the pattern values are obtained from the local rhombus pattern. In Fig. 7.1(c), the pixel I1 is subtracted from pixels I2 and I8, and based on both difference values, a pattern is assigned to I1. If the two difference values are of different signs, i.e., one positive and one negative, then '0' is assigned, and if both difference values are of the same sign, i.e., both positive or both negative, then '1' is assigned to that pixel. Hence, in this example, the values 0, 1, 1 and 1 are assigned to I1, I3, I5 and I7, respectively. These values are then multiplied by the weights given in Fig. 7.1(h) and summed to a single pattern value as shown in Fig. 7.1(i). The four pixels used for pattern creation form a rhombus around the center pixel; hence, the method is named the local rhombus pattern.

For a pixel (x, y), LRP is formulated as follows:

$$T_1^n = I_{n-1} - I_n, \qquad T_2^n = I_{n+1} - I_n, \qquad \forall\, n = 3, 5, 7 \tag{7.1}$$

$$T_1^n = I_8 - I_n, \qquad T_2^n = I_{n+1} - I_n, \qquad \text{for } n = 1 \tag{7.2}$$

$$F_1(T_1^n, T_2^n) = \begin{cases} 1, & T_1^n \times T_2^n \ge 0 \\ 0, & \text{else} \end{cases} \tag{7.3}$$

$$LRP(x, y) = \sum_{i=0}^{3} 2^i \times F_1\left(T_1^{2i+1}, T_2^{2i+1}\right) \tag{7.4}$$

where $I_n$, $n = 1, 2, \ldots, 8$, are the neighboring pixel positions shown in Fig. 7.1(a), and $LRP(x, y)$ is the local rhombus pattern value of the pixel $(x, y)$ in the image.
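Eqs. 7.1-7.4 translate directly into array operations. The following is a minimal sketch, assuming 8-bit grayscale input and the clockwise I1-I8 ordering of Fig. 7.1(a); the function name is our own.

```python
import numpy as np

def lrp_map(img):
    """Local rhombus pattern (Eqs. 7.1-7.4); values in [0, 15]."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    # Neighbors I1..I8, clockwise from the right, per Fig. 7.1(a).
    offs = [(0, 1), (1, 1), (1, 0), (1, -1),
            (0, -1), (-1, -1), (-1, 0), (-1, 1)]
    nbr = [img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] for dy, dx in offs]
    out = np.zeros((h - 2, w - 2), dtype=np.int32)
    for i, n in enumerate([1, 3, 5, 7]):           # I1, I3, I5, I7
        t1 = nbr[(n - 2) % 8] - nbr[n - 1]         # T1: I_(n-1) - I_n (I8 for n=1)
        t2 = nbr[n % 8] - nbr[n - 1]               # T2: I_(n+1) - I_n
        out += ((t1 * t2) >= 0).astype(np.int32) << i   # Eqs. 7.3 and 7.4
    return out
```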

The proposed feature descriptor is motivated by the conventional local binary pattern. LBP is a very strong feature descriptor, but its dimension is too high for joint histogram purposes. The modified rotation-invariant uniform patterns [106] have a shorter feature vector, but they lose information in the reduction. The proposed local rhombus pattern (LRP) is based on the mutual relationship among neighboring pixels rather than the center-neighbor relationship; it natively has a short feature vector, so no further reduction is required and no extra information is lost in reducing the feature vector length.

7.2 Framework of proposed algorithm

7.2.1 Target object representation

The proposed method is inspired by the local binary pattern, which extracts local information based on the neighboring pixels and the center pixel [105]. Ning et al. used a joint histogram of LBP and RGB color channels for object tracking [102].

In the proposed work, the HSV color space is used for the color information of the target object. It separates the color component (hue), color purity (saturation) and brightness (value), such that individual information regarding hue, saturation and value can be extracted. The hue, saturation and value components are quantized into 18, 3 and 3 bins respectively, in order to reduce the complexity of the algorithm. Texture information of the object is captured by the LRP, and a joint histogram of LRP and the hue, saturation and value components is generated. The local rhombus pattern has a total of 16 bins, and hue, saturation and value have 18, 3 and 3 bins respectively [171]; hence, the total length of the histogram is 16 × 18 × 3 × 3. The target object is tracked in subsequent frames using the mean shift tracking algorithm [22]. The algorithm for the proposed system is given below, followed by a short code sketch of the joint histogram:

7.2.2 Algorithm

Input: Video sequence with location of the target object in the first frame.

Output: Tracked object in full video.

1. Upload the video and select the target object in the first frame for tracking.

2. Compute LRP of the target object in the first and next frame.

3. Convert the current and next frame from RGB to HSV color space, and quantize

hue, saturation and value bins to 18, 3 and 3 respectively.

4. Create joint histogram of hue, saturation, value and LRP for the target object

in the current and next frame.

5. Track the target object in the next frame using mean shift tracking algorithm

with joint HSV and LRP histogram.

6. Repeat the process from step 2 to 5 till the end frame.
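The target model of steps 2-4 is just a 16 × 18 × 3 × 3 joint histogram. A minimal sketch is given below, assuming OpenCV (`cv2`) for the color conversions and reusing the `lrp_map` sketch from Section 7.1; the bin edges are illustrative.

```python
import numpy as np
import cv2  # assumed available for color conversion

def joint_histogram(patch_bgr):
    """16 x 18 x 3 x 3 joint histogram of LRP and quantized H, S, V."""
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)   # OpenCV: H in [0, 179]
    hue = (hsv[..., 0].astype(int) * 18) // 180        # 18 hue bins
    sat = (hsv[..., 1].astype(int) * 3) // 256         # 3 saturation bins
    val = (hsv[..., 2].astype(int) * 3) // 256         # 3 value bins
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    t = lrp_map(gray)                                  # 16 texture bins
    # LRP is defined only for interior pixels; crop the color bins to match.
    hue, sat, val = hue[1:-1, 1:-1], sat[1:-1, 1:-1], val[1:-1, 1:-1]
    idx = ((t * 18 + hue) * 3 + sat) * 3 + val         # flatten the 4-D index
    hist = np.bincount(idx.ravel(), minlength=16 * 18 * 3 * 3).astype(float)
    return hist / hist.sum()
```

In practice, this model histogram would drive the mean shift iterations of [22], which are not reproduced here.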

7.3 Experimental results and discussion

In this work, two experiments on different videos have been conducted, and the proposed algorithm is compared with the following two algorithms:

LBPriu2 RGB : rotation-invariant uniform local binary pattern + RGB color histogram [102]

LEP RGB : local extrema pattern + RGB color histogram [93]

The proposed method is abbreviated as LRP HSV. In each experiment, the target object is selected manually and marked with a red box. Our algorithm first extracts features, and then tracks the object in the subsequent frames.

Figure 7.2: Object tracking in road traffic video using (a) LBPriu2 RGB, (b) LEP RGB and (c) LRP HSV

In the first experiment, a video of nearly similar moving cars is used. The video sequence comprises 201 frames of size 640 × 480. A car is selected as the target object

for tracking, and marked with a red box, as shown in Fig. 7.2. In subsequent frames, the target object is tracked and shown in red. Results of LBPriu2 RGB, LEP RGB and LRP HSV are shown in Fig. 7.2(a), (b) and (c) respectively. It has been observed that up to frame 63, LBPriu2 RGB and LEP RGB tracked the correct object, whereas they lost track of the object shortly after frame 63. The failure can be attributed to the fact that, near frame 63, another car crossed the target object, and both methods could not identify the correct car between the two. On the contrary, the proposed method handled this issue, and tracked the correct object till the end as

shown in Fig. 7.2(c).

In the second experiment, a video of a football game is employed for the purpose of tracking one player.

Figure 7.3: Results of a player tracking in football video of (a) LBPriu2 RGB, (b) LEP RGB and (c) LRP HSV

The tracking results of all three methods have been

demonstrated in Fig. 7.3. At the beginning of the video, all three algorithms worked equally well and tracked the correct object, because the target object was almost in isolation and no disturbing objects were present nearby. Towards the end frames, however, LBPriu2 RGB and LEP RGB missed the target object and started tracking other, spurious objects. The reason for the incorrect tracking is that, towards the final frames, other players with a similar appearance come close to the target object, and both methods fail to distinguish the target object from the others. In the proposed method, on the contrary, the target object is tracked correctly till the end of the video, as shown in Fig. 7.3(c).

The feature vector length and tracking time of all three methods are given in Table 7.1. The computation time depends on the complexity of the feature extraction method as well as on the feature vector length. The feature vector length of LRP HSV is considerably less than that of LBPriu2 RGB and LEP RGB; hence, its computation time is also lower.

Table 7.1: Feature vector length and process time of proposed method and previous methods

Method         Feature vector length       Time taken
LBPriu2 RGB    10 × 8 × 8 × 8 = 5120       1:08
LEP RGB        16 × 8 × 8 × 8 = 8192       1:15
LRP HSV        16 × 18 × 3 × 3 = 2592      1:02

7.4 Conclusion

A novel feature extraction algorithm for object tracking has been proposed. The proposed LRP extracts the local relationships among neighboring pixels as texture features, and the HSV color space is used for color features; a joint histogram is then constructed from the color-texture features. The method is applied to object tracking, performed on two video sequences, a car traffic video and a football video, using the mean shift tracking algorithm. The experimental results prove that the proposed algorithm has an important advantage over the other methods: it is able to keep tracking the correct object among similar-looking objects in the video. It can be very useful in traffic monitoring and other tracking applications.

Chapter 8

A Hierarchical Shot Boundary Detection Algorithm

A video is high-dimensional data which is tedious to process. Shot detection and keyframe selection are activities that reduce the redundant data in a video and summarize it in a few images. Researchers have worked in this area diligently. Basic shot detection schemes provide shot boundaries in a video sequence, and keyframes are selected from each shot. In a conversation video, and often in general, shots repeat after one or more other shots; in that case, a basic shot detection scheme yields redundant keyframes from the same video. In this work, we propose a hierarchical shot detection and keyframe selection scheme which removes a considerable number of redundant keyframes. For temporal analysis and abrupt transition detection, a color histogram has been used. After shot detection, spatial analysis has been done using local features, with local binary patterns utilized for local feature extraction. The proposed scheme is applied to three video sequences: a news video, a movie clip and a TV advertisement.

In a video, many shots may be visually similar, since shots often repeat after one or more other shots. Shot boundary detection algorithms usually classify all shots into different clusters irrespective of such redundancy. In the proposed method, a hierarchical shot detection algorithm is developed in two stages. The first stage extracts temporal information from the video, detects the initial shot boundaries and extracts keyframes from each shot. In the second stage, spatial information of the keyframes extracted in the first stage is analyzed, and redundant keyframes are excluded.

Figure 8.1: Consecutive frames and shot boundary of a video

8.1 Hierarchical clustering for shot detection and key frame selection

The shot detection problem is very common in video processing. Processing a full video at once and extracting shot boundaries may yield similar shots. Frames of a video are shown in Fig. 8.1. There are ten different shots in the video, in which shots 3, 5 and 7, and shots 4, 6 and 8, are of a similar kind. Hence, keyframes extracted from these shots would be similar, and redundant information would be extracted from the video. This is a small example, but the same happens in large videos. To solve this problem, a hierarchical scheme has been adopted for keyframe extraction.

For abrupt shot boundary detection, we have used the RGB color histogram, which provides the global distribution of the three color bands in RGB space. A quantized histogram of 8 bins per color channel has been created: each color channel is first quantized into 8 intensity levels, and the histogram is generated using the following equation.

$$Hist_C(L) = \sum_{a=1}^{m} \sum_{b=1}^{n} F(I(a, b, C), L), \qquad C = 1, 2, 3 \text{ for } R, G, B \text{ color bands} \tag{8.1}$$

$$F(a, b) = \begin{cases} 1, & \text{if } a = b \\ 0, & \text{else} \end{cases} \tag{8.2}$$

where the size of the image is m × n and L is the bin index, ranging from 0 to 7; I(a, b, C) is the quantized intensity of color channel C at position (a, b).

For the temporal information in a video sequence, each frame of the video is extracted and its RGB color histogram is generated. The difference of each frame to the next frame is computed using the following distance measure:

$$Dis(I_1, I_2) = \sum_{s=1}^{l} \left|F_{I_1}(s) - F_{I_2}(s)\right| \tag{8.3}$$

where Dis(I_1, I_2) is the distance between frames I_1 and I_2, F_{I_1} and F_{I_2} are their feature vectors, and l is the feature vector length. If the measured distance between two frames is greater than a fixed threshold value, the frames are separated into different clusters. This process is applied to each consecutive pair of frames in the video sequence, producing clusters of similar frames. After obtaining the clusters, one keyframe is extracted from each cluster: the entropy of every frame in a cluster is calculated using Eq. 8.4, and the maximum-entropy frame is chosen as the keyframe for that cluster.

$$Ent(I) = -\sum_{i} p_i \log_2(p_i) \tag{8.4}$$

where p_i is the normalized intensity histogram of the frame I. A code sketch of this first phase is given below.
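The following sketch combines the pieces above: per-channel 8-bin RGB histograms (Eq. 8.1), the L1 frame-to-frame distance (Eq. 8.3), a fixed threshold to cut shot boundaries, and maximum-entropy keyframe selection (Eq. 8.4). It assumes frames are 8-bit RGB arrays; the threshold and function names are our own.

```python
import numpy as np

def rgb_hist(frame):
    """Eq. 8.1: an 8-bin histogram per R, G, B channel, concatenated."""
    q = frame // 32                        # quantize 0..255 into 8 levels
    return np.concatenate(
        [np.bincount(q[..., c].ravel(), minlength=8) for c in range(3)]
    ).astype(float)

def entropy(gray):
    """Eq. 8.4: Shannon entropy of the intensity histogram."""
    p = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

def phase1_keyframes(frames, th1):
    """Cut a shot wherever Eq. 8.3 exceeds th1; keep the max-entropy frame."""
    feats = [rgb_hist(f) for f in frames]
    grays = [f.mean(axis=2).astype(np.uint8) for f in frames]
    keyframes, shot = [], [0]
    for i in range(1, len(frames)):
        if np.abs(feats[i] - feats[i - 1]).sum() > th1:   # shot boundary
            keyframes.append(max(shot, key=lambda j: entropy(grays[j])))
            shot = []
        shot.append(i)
    keyframes.append(max(shot, key=lambda j: entropy(grays[j])))
    return keyframes
```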

During this process, consecutive keyframes will not hold similar information. However, two or more non-consecutive clusters may still contain similar types of frames, as a video sequence may hold similar shots at non-consecutive positions. Due to this, many keyframes may hold redundant information. To overcome this issue, a hierarchical process is adopted in this work. The local binary pattern (LBP) is a well-known texture feature descriptor [106] which computes the relation of each pixel with its neighboring pixels; LBP is explained in detail in Chapter 5, Section 5.1.1. LBP is extracted from each of the keyframes obtained from the above process.

Now, the distance between every pair of keyframes is calculated using Eq. 8.3, as shown in Fig. 8.2: the distance of frame 1 is calculated with frames 2, 3, up to n; the distance of frame 2 with frames 3, 4, up to n; and so on, until the distance of frame n − 1 is calculated with frame n. An upper triangular matrix is created from all the distance measures. If the distance between two or more frames is less than a fixed threshold, all those frames are grouped into one cluster. In this process, even non-consecutive similar keyframes are clustered, and non-redundant clusters are obtained. Again, the entropy of each frame in every cluster is calculated, and the maximum-entropy frame is retained as a final keyframe. Finally, we get a reduced number of final keyframes without redundant information.

Figure 8.2: Distance measure calculation in the 2nd phase

8.2 Proposed system framework

8.2.1 Algorithm

Phase 1:
Input: Video clip
Output: Initial keyframes

Upload the video and extract all frames (n1 = total number of frames).
for i = 1 : n1 - 1
    Calculate the RGB histograms of frames i and i+1.
    Calculate Dist(i, i+1).
    if Dist(i, i+1) > Th1
        Put frames i and i+1 in different clusters.
    end
end
Calculate the entropy of each frame in the different clusters.
Select the maximum-entropy frame of each cluster as a keyframe.

Phase 2:
Input: Initial keyframes (n2 = number of keyframes extracted in Phase 1)
Output: Selected final keyframes

Load all keyframes extracted in Phase 1 and calculate the LBP histogram of each.
Compute the distances as explained in Fig. 8.2 and build a distance matrix D for all frames.
Initialize a zero vector key_array of size n2.
for i = 1 : n2
    if key_array(i) == 0
        Assign key_array(i) = 1.
        Initialize a stack S of size n2 and push the first element i.
        while S is not empty
            t1 = pop an element from S
            Put every frame whose distance to t1 is less than Th2 into the cluster t2.
            Mark the new elements of t2 in key_array and push them onto S.
        end
        Delete redundant frames from the cluster, if any.
    end
end
Calculate the entropy of each frame in the different clusters.
Select the maximum-entropy frame of each cluster as a final keyframe.

(A runnable sketch of Phase 2 follows.)
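The stack-based grouping of Phase 2 is, in effect, a flood fill over the thresholded distance matrix. The following is a minimal sketch under that reading, with our own names: `lbp_hists` is a NumPy array of the Phase 1 keyframes' LBP histograms, and `entropies` holds each keyframe's entropy from Eq. 8.4.

```python
import numpy as np

def phase2_keyframes(lbp_hists, entropies, th2):
    """Cluster similar keyframes transitively; return one index per cluster."""
    n2 = len(lbp_hists)
    # Pairwise L1 distances (Eq. 8.3) between all keyframe histograms.
    D = np.abs(lbp_hists[:, None, :] - lbp_hists[None, :, :]).sum(axis=2)
    visited = np.zeros(n2, dtype=bool)
    finals = []
    for i in range(n2):
        if visited[i]:
            continue
        cluster, stack = [], [i]
        visited[i] = True
        while stack:                        # flood fill over the distance graph
            t1 = stack.pop()
            cluster.append(t1)
            for j in range(n2):
                if not visited[j] and D[t1, j] < th2:
                    visited[j] = True
                    stack.append(j)
        # the maximum-entropy frame represents the cluster
        finals.append(max(cluster, key=lambda k: entropies[k]))
    return finals
```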


8.3 Experimental results

For experimental purposes, we have used three different videos: a news video, an advertisement and a movie clip. General details about all three videos are given in Table 8.1; they differ in duration and frame size.

Table 8.1: Video details

Video           Time (min.)   Frame size     Frames/sec
News video      02:55         1280 × 720     30
Movie clip      00:30         720 × 384      29
Advertisement   01:00         1920 × 1080    25

In the news video, an anchor and a guest are present at first. The camera moves from anchor to guest and back many times in the video; hence, in shot detection, many shots contain similar kinds of frames (either the anchor or the guest). Further, other events are shown in the video repeatedly, one after another shot. All these redundant shots are separated initially and keyframes are selected. In the second phase of the algorithm, redundant keyframes are clustered, and the keyframes of maximum entropy are extracted as final keyframes. Initially, 63 keyframes are extracted, and after applying the hierarchical process only 12 keyframes remain. The hierarchical process has thus removed a significant number of redundant keyframes from further processing.

The second video for the experiment is a small clip of an animated movie called 'Ice Age'. In this video, conversations between animated characters (a lion, an elephant and other animals) are shown, and the camera moves between the different kinds of frames. The same hierarchical process is applied to the clip. Initially, 11 keyframes are extracted, as shown in Fig. 8.3(a). However, many of them are visibly similar, and by using LBP for spatial information, 6 final non-redundant keyframes are extracted, as demonstrated in Fig. 8.3(b).

Figure 8.3: Video 2: (a) initial stage keyframes (b) final stage keyframes

The third video taken for the experiment is a Tata Sky advertisement. The proposed method is applied to the video and keyframes from both phases are collected. The keyframes of phases one and two are shown in Fig. 8.4. It is clearly visible that, using the hierarchical method, the number of keyframes has been reduced significantly and redundant keyframes have been removed. Information regarding the keyframes extracted in phases one and two is given in Table 8.2. The summary of reduced keyframes shows that the proposed algorithm has removed the repeated frames from the keyframes detected by the color histogram method. Further, in phase two, using LBP, we have obtained an optimal number of keyframes which summarize the video effectively.

Table 8.2: Number of keyframes extracted in both phases

Video           Keyframes in Phase 1   Keyframes in Phase 2
News video      63                     12
Movie clip      11                     6
Advertisement   34                     11

Figure 8.4: Video 3: (a) initial stage keyframes (b) final stage keyframes

8.4 Conclusions

In the proposed work, the shot boundary detection problem has been addressed and keyframes have been obtained. A hierarchical approach is adopted for final keyframe selection, which helps in reducing similar keyframes from non-consecutive shots. Initially, a color histogram technique is used for temporal analysis, and abrupt transitions are obtained. Based on the abrupt transitions, shots are separated and keyframes are selected. Then, spatial analysis of the obtained keyframes is done using the local binary pattern, and redundant keyframes are removed. In this process, a significant number of redundant keyframes are eliminated. The proposed method is applied to three videos, a news reading, a movie clip and a TV advertisement, and the experiments show that the proposed algorithm helps in removing redundant keyframes.

Chapter 9

Conclusions and Future Scope

9.1 Conclusions

In this work, we have presented texture features for different pattern recognition applications, including content based image retrieval, object tracking and shot boundary detection. Combinations of image features (color and texture) have also been explored to enhance the feature description. The proposed methods are demonstrated on publicly available databases.

In content based image retrieval, feature extraction is a dominant step that can lead the whole retrieval system in a positive or negative direction. Feature extraction for image retrieval depends strongly on the image database against which the query image is matched. Texture is a noticeable image property that can be found in most kinds of images; it can be viewed as a repeated pattern in the image and is extracted well using local features.

An attempt has been made in Chapter 2 to extract local features in the wavelet domain. The wavelet domain gives subband images which contain directional information; local patterns were then used to extract local features from those subband images. Local extrema patterns (LEP) [93] and directional local extrema patterns (DLEP) [95] were proposed by Murala et al.; both are local feature descriptors that extract local information based on four directions. The discrete wavelet transform (DWT) captures the low-frequency and high-frequency features and helps LEP and DLEP to create more detailed features. Experiments were done on the Corel-5k and Corel-10k databases, and both methods were compared with some existing local patterns; both performed better than the other methods. The performance of proposed method 2 (DWT+DLEP) is better than that of proposed method 1 (DWT+LEP); on the other hand, the feature vector of proposed method 1 is shorter, hence proposed method 1 is computationally faster.

The image retrieval problem is addressed using color and texture features in Chapter 3. The HSV color space is used for the color descriptor and combined with a texture descriptor. The hue and saturation components are used for color information, and the value component is used to extract texture features with the help of the GLCM and LEP. The GLCM extracts pixel-pair relations in terms of co-occurrence in the image. Initially, LEP is extracted from the value component for local pattern information; then, for each pixel pair, information is collected using the GLCM. Since the proposed method uses co-occurrence information of the local extrema pattern, it is called the local extrema co-occurrence pattern (LECoP). To combine color and texture features, a joint histogram is created from hue, saturation and LECoP. The method's effectiveness is tested on three natural (Corel-1k, Corel-5k and Corel-10k) and two color-texture (MIT VisTex and STex) image databases. The proposed technique is compared with several color-texture features and proved its excellence in terms of precision, recall and F-measure curves. The performance (precision %, recall %) of the proposed method (52.50, 23.29) is improved over CS LBP+colorhist (44.08, 18.57), LEPSEG+colorhist (35.58, 13.48), LEPINV+colorhist (41.25, 15.74), Wavelet+colorhist (42.28, 17.34), Joint LEP colorhist (44.14, 16.77) and Joint colorhist (43.96, 16.66) in the Corel-10k database. Similarly, in the STex texture database, the ARR (%) of the proposed method (74.15) is improved over CS LBP+colorhist (53.33), LEPSEG+colorhist (46.37), LEPINV+colorhist (48.10), Wavelet+colorhist (45.08), Joint LEP colorhist (59.90) and Joint colorhist (59.90). Further, the performance of the proposed method is evaluated using four distance measures, of which the d1 distance proved the best. The main contribution of this work is to exploit pixel-pair information in local patterns, and it is noticed that this works effectively.

Chapter 4 is also based on a co-occurrence pattern using pixel-pair information. In Chapter 4, the center symmetric local binary pattern (CSLBP) is extracted

from the original gray scale image. A co-occurrence matrix is computed from the CSLBP map over different directions and distances. The performance of the system is measured on different combinations of directions and distances, and four combinations are chosen to collect the features: distances 1 and 2 with directions 0°, 45°, 90° and 135° are used to extract features, which are integrated into one feature vector. With this, we obtain co-occurrence information in different directions. The method can be treated as a rich texture feature descriptor, as it contains local pattern information with co-occurrence at different distances and directions. The descriptor is tested on textural (MIT VisTex, Brodatz), facial (ORL face database) and bio-medical (OASIS MRI) image databases and compared with existing local feature descriptors. The proposed method proved its significance on all these types of databases; hence, it can serve as a texture feature in different pattern recognition applications.

In Chapter 5, a novel feature descriptor called the local tri-directional pattern has been proposed. This local pattern is an extended version of LBP which extracts local information based on the differences of neighboring pixels in three directions. Using the same three directions, a magnitude pattern is also extracted, and both patterns are combined to extract features for a CBIR system. The proposed feature descriptor is applied to textural (MIT VisTex and Brodatz) and facial (ORL face database) image databases.

A new feature extraction method named the local neighborhood difference pattern (LNDP) is proposed in Chapter 6. The descriptor transforms the mutual relationships of all neighboring pixels into a binary pattern. LBP and LNDP are complementary, as they obtain different information from the local pixels; in the proposed CBIR system, both LBP and LNDP features are combined. To prove the excellence of the proposed method, four databases of textural images (STex and Brodatz) and natural images (Corel-10k and the MIT natural and urban scene database) are used. Performance is analyzed using precision and recall for all databases and compared with some existing local patterns. The performance of the proposed method (70.24, 39.00) in terms of (precision %, recall %) is improved over CSLBP (55.82, 31.38), LEPINV (58.04, 32.85), LEPSEG (62.06, 35.46), LBP (68.16, 34.49), DLEP (66.71, 34.23) and LTrP (68.14, 38.82) in the MIT natural and urban scene database.

In Chapter 7, the local rhombus pattern is proposed for texture features, combined with an HSV color histogram, and applied to object tracking. Object tracking requires many processing steps; in this work, only the feature extraction method, a very crucial step, is proposed. To track the object, the mean shift tracking algorithm is used. The proposed method is tested on football sport and car traffic video sequences and compared with existing LBP [102] and LEP [93] based tracking algorithms. Visual results have been shown frame by frame, and it is observed that the proposed method works well when two similar objects are near or crossing each other, whereas the earlier approaches fail to recognize the actual object.

A shot detection problem is solved to reduce repetitive keyframes in a video. Shot detection is a common problem in video analysis. After shot detection, keyframe extraction needs to be done, which prepares a video for further processing as it reduces a huge amount of data. However, many similar keyframes may still exist, which slow down further processing. A hierarchical approach is proposed using a color histogram and the local binary pattern, and tested on three video sequences. Initial keyframe detection is performed using the color histogram; final keyframes are extracted from the set of initial keyframes using LBP. Experimental results show that the hierarchical approach reduces a huge amount of data.

9.2 Future scope

The work presented in this thesis leaves some scope for extension to other computer vision applications. Some possibilities are as follows:

1. The proposed features can be utilized in secure image retrieval using encryption techniques.

2. The proposed features are mainly based on texture, and a few are integrated with color features. Integration of shape features with the proposed techniques might be used to enhance image feature extraction.

3. In Chapter 6, features are extracted using the closest neighborhood (radius one) in LNDP and LBP. An extended neighborhood can be used for feature extraction: as LBP has proved to give better features in extended neighborhoods of radius two and three, LNDP can be utilized in a similar way along with LBP.

4. The proposed feature descriptors are based on texture and are rich in texture information. They can be utilized for video retrieval and image-based video retrieval.

5. The proposed hierarchical shot detection approach can be utilized for a video retrieval system.

Appendix

The proposed methods in this thesis are compared with some existing methods. A few of them are already explained in the thesis chapters, since they are required as prior knowledge for the proposed techniques. The local binary pattern (LBP), local ternary pattern (LTP), local extrema pattern (LEP), directional local extrema pattern (DLEP) and center symmetric local binary pattern (CSLBP) are explained in detail in previous chapters. For LBP, the values of P and R are taken as 8 and 1 respectively (nearest neighboring pixels). For CSLBP, the values of P, R and T are taken as 8, 1 and 2.6 respectively. (Refer to the previous chapters for details.)

The rest of the techniques used in the comparison are explained below.

Local tetra pattern (LTrP)

Murala et al. proposed the local tetra pattern (LTrP) [96]. Given an image I, the first-order derivatives along the 0° and 90° directions are denoted as $I^1_\theta|_{\theta=0°,90°}$. Let $g_c$ denote the center pixel in I, and let $g_h$ and $g_v$ denote the horizontal and vertical neighbors of $g_c$, respectively. Then, the first-order derivatives at the center pixel can be written as

$$I^1_{0°}(g_c) = I(g_h) - I(g_c) \tag{9.1}$$

$$I^1_{90°}(g_c) = I(g_v) - I(g_c) \tag{9.2}$$

and the direction of the center pixel can be calculated as

$$I^1_{Dir.}(g_c) = \begin{cases} 1, & I^1_{0°}(g_c) \ge 0 \text{ and } I^1_{90°}(g_c) \ge 0 \\ 2, & I^1_{0°}(g_c) < 0 \text{ and } I^1_{90°}(g_c) \ge 0 \\ 3, & I^1_{0°}(g_c) < 0 \text{ and } I^1_{90°}(g_c) < 0 \\ 4, & I^1_{0°}(g_c) \ge 0 \text{ and } I^1_{90°}(g_c) < 0 \end{cases} \tag{9.3}$$

From Eq. 9.3, it is evident that the possible direction for each center pixel is 1, 2, 3 or 4; eventually, the image is converted into four values, i.e., directions (a short code sketch follows).
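The following is a sketch of Eqs. 9.1-9.3, assuming $g_h$ and $g_v$ are the right and bottom neighbors (the equations do not fix which side is taken); the function name is our own.

```python
import numpy as np

def ltrp_directions(img):
    """Eqs. 9.1-9.3: map each interior pixel to a direction in {1, 2, 3, 4}."""
    img = np.asarray(img, dtype=np.int32)
    d0 = img[1:-1, 2:] - img[1:-1, 1:-1]    # I^1_0 : horizontal neighbor - center
    d90 = img[2:, 1:-1] - img[1:-1, 1:-1]   # I^1_90: vertical neighbor - center
    dirs = np.empty(d0.shape, dtype=np.uint8)
    dirs[(d0 >= 0) & (d90 >= 0)] = 1
    dirs[(d0 < 0) & (d90 >= 0)] = 2
    dirs[(d0 < 0) & (d90 < 0)] = 3
    dirs[(d0 >= 0) & (d90 < 0)] = 4
    return dirs
```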

The second-order $LTrP^2(g_c)$ is defined as

$$LTrP^2(g_c) = \left\{ f_1\!\left(I^1_{Dir.}(g_c), I^1_{Dir.}(g_1)\right), f_1\!\left(I^1_{Dir.}(g_c), I^1_{Dir.}(g_2)\right), \ldots, f_1\!\left(I^1_{Dir.}(g_c), I^1_{Dir.}(g_P)\right) \right\}\Big|_{P=8} \tag{9.4}$$

$$f_1\!\left(I^1_{Dir.}(g_c), I^1_{Dir.}(g_p)\right) = \begin{cases} 0, & I^1_{Dir.}(g_c) = I^1_{Dir.}(g_p) \\ I^1_{Dir.}(g_p), & \text{else} \end{cases} \tag{9.5}$$

From Eqs. 9.4 and 9.5, an 8-digit tetra pattern is obtained for each center pixel. The patterns are then separated into four parts based on the direction of the center pixel, and the tetra pattern for each part (direction) is converted into three binary patterns. Let the direction of the center pixel, $I^1_{Dir.}(g_c)$, obtained using Eq. 9.3 be 1; then $LTrP^2$ can be defined by segregating it into three binary patterns as follows:

$$LTrP^2\big|_{Direction=2,3,4} = \sum_{p=1}^{P} 2^{p-1} \times f_2\!\left(LTrP^2(g_c)\right)\Big|_{Direction=2,3,4} \tag{9.6}$$

$$f_2\!\left(LTrP^2(g_c)\right)\Big|_{Direction=\phi} = \begin{cases} 1, & \text{if } LTrP^2(g_c) = \phi \\ 0, & \text{else} \end{cases} \tag{9.7}$$

where φ = 2, 3, 4.

Similarly, the tetra patterns for the remaining three directions (parts) of the center pixels are converted into binary patterns, giving 12 (4 × 3) binary patterns in total. A magnitude pattern (LP) is obtained using the following equations:

$$M_{I^1}(g_p) = \sqrt{\left(I^1_{0°}(g_p)\right)^2 + \left(I^1_{90°}(g_p)\right)^2} \tag{9.8}$$

$$LP = \sum_{p=1}^{P} 2^{p-1} \times f_3\!\left(M_{I^1}(g_p) - M_{I^1}(g_c)\right)\Big|_{P=8} \tag{9.9}$$

$$f_3(x) = \begin{cases} 1, & \text{if } x \ge 0 \\ 0, & \text{else} \end{cases} \tag{9.10}$$

To reduce the feature vector length, uniform patterns have been used. The uniform

pattern refers to the uniform appearance pattern that has limited discontinuities in the

circular binary representation. In this method, those patterns that have less than or

equal to two discontinuities in the circular binary representation are referred to as the

uniform patterns, and the remaining patterns are referred to as nonuniform. Thus, the

distinct uniform patterns for a given query image would be P (P − 1) + 2.

Histogram of LTrP is calculated as follows:

$$H_{LTrP}(l) = \sum_{j=1}^{N_1} \sum_{k=1}^{N_2} f\!\left(LTrP(j, k), l\right), \qquad l \in [0, 58] \tag{9.11}$$

$$f(x, y) = \begin{cases} 1, & x = y \\ 0, & \text{else} \end{cases} \tag{9.12}$$

For each pattern, the histogram length is 59 (pattern values 0 to 58). In LTrP there are 13 histograms in total (12 direction patterns and one magnitude pattern); hence, the total length of the histogram is 59 × 13.

Local maximum edge binary pattern (LMEBP)

LMEBP was proposed by Murala et al. In this method, for a given image the first

maximum edge is obtained by the magnitude of local difference between the center

pixel and its eight neighbors as shown below:

$$I'(g_i) = I(g_i) - I(g_c), \qquad i = 1, 2, \ldots, 8 \tag{9.13}$$

$$i_1 = \arg\max_i \left( |I'(g_1)|, \ldots, |I'(g_8)| \right) \tag{9.14}$$

where max(·) returns the maximum value of its arguments. If this edge is positive, '1' is assigned to the center pixel, otherwise '0':

$$I_{new}(g_c) = f_4\!\left(I'(g_{i_1})\right) \tag{9.15}$$

$$f_4(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{else} \end{cases} \tag{9.16}$$

The LMEBP is defined as

$$LMEBP(I(g_c)) = \left\{ I_{new}(g_c);\ I_{new}(g_1);\ I_{new}(g_2);\ \ldots;\ I_{new}(g_8) \right\} \tag{9.17}$$

Eventually, the given image is converted to an LMEBP image with values ranging from 0 to 511. After calculation of the LMEBP, the whole image is represented by building a histogram:

$$H_{LMEBP}(l) = \sum_{j=1}^{N_1} \sum_{k=1}^{N_2} f\!\left(LMEBP(j, k), l\right), \qquad l \in [0, 511] \tag{9.18}$$

where the size of the input image is N_1 × N_2. Similarly, the remaining seven LMEBPs are evaluated using the remaining seven maximum edges (second maximum edge to eighth maximum edge), yielding eight LMEBP histograms in total. Hence, the feature vector length of this method is 8 × 512. A code sketch of this construction is given below.
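The following is a sketch of Eqs. 9.13-9.17 under the reading above, taking the k-th maximum edge per pixel for k = 1, ..., 8; the bit packing of Eq. 9.17 is one possible convention, and all names are our own.

```python
import numpy as np

# Neighbor offsets g1..g8, clockwise from the right (an assumed ordering).
OFFS = [(0, 1), (1, 1), (1, 0), (1, -1),
        (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def lmebp_maps(img):
    """Sketch of Eqs. 9.13-9.17: eight LMEBP images with values in [0, 511]."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    # Eq. 9.13: differences to the eight neighbors, for all interior pixels.
    diffs = np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - img[1:-1, 1:-1]
                      for dy, dx in OFFS])
    order = np.argsort(-np.abs(diffs), axis=0)   # edges by descending magnitude
    maps = []
    for k in range(8):                           # k-th maximum edge
        kth = np.take_along_axis(diffs, order[k:k + 1], axis=0)[0]
        inew = np.pad(kth >= 0, 1).astype(np.int32)   # Eqs. 9.14-9.16, re-padded
        # Eq. 9.17: pack I_new of the center (bit 8) and neighbors (bits 0-7);
        # the bit assignment is one possible convention.
        lmebp = inew[2:-2, 2:-2] << 8
        for b, (dy, dx) in enumerate(OFFS):
            lmebp |= inew[2 + dy:h - 2 + dy, 2 + dx:w - 2 + dx] << b
        maps.append(lmebp)
    return maps
```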

Local edge pattern (LEPSEG and LEPINV)

Local edge patterns (LEP) were proposed by Yao and Chen in 2003 [168]. To compute the LEP value, an edge image must be obtained first; it is obtained by applying the Sobel edge detector to the gray level image.

$$LEPSEG(n, m) = \sum_{i, j \in I} k_e(i, j) \times e(n, m), \qquad k_e = \begin{bmatrix} 1 & 2 & 4 \\ 128 & 256 & 8 \\ 64 & 32 & 16 \end{bmatrix} \tag{9.19}$$

where e(n, m) denotes the binary edge image (obtained using the Sobel operator), I is a 3 × 3 neighborhood, $k_e(i, j)$ is the LEP mask, and LEPSEG(n, m) is the output LEPSEG value at the pixel located at (n, m). Accordingly, the LEPSEG histogram $h'_e$ for a texture region R is obtained using the following equation:

$$h'_e(i) = \frac{n'_i}{N}, \qquad i = 0, 1, 2, \ldots, 511 \tag{9.20}$$

where $n'_i$ is the number of pixels with LEPSEG value i and N is the total number of pixels in R.

The LEPSEG value LEPSEG(n, m) specified by Eq. 9.19 can be expressed as a binary string $b_8 b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0$. After the most significant bit, which corresponds to the central pixel, is excluded, binary shifts are applied to the remaining 8-bit string $b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0$ until the value represented by the bit string is the least value. After this processing, only 36 different least values can be derived from an 8-bit binary string. These 36 values are rotation invariant, since only the cyclic sequence of the bit string matters rather than its starting point; for example, the bit strings 00001111, 00011110, 00111100, 01111000, 11110000, 11100001, 11000011 and 10000111 all have the same least value, 15. However, the 36 values do not describe whether or not the central pixel is an edge pixel; thus, if the central pixel is an edge pixel, 36 is added, leading to the LEPROT value. The LEPROT values are thereby divided into two parts depending on whether or not the central pixel of the neighborhood is an edge pixel. In this way, two LEPINV histograms, $h_e^{(0)}$ and $h_e^{(1)}$, can be obtained for a texture region R using the following equations:

$$h_e^{(0)}(i) = \frac{n_i}{N^{(0)}} \tag{9.21}$$

$$h_e^{(1)}(i) = \frac{n_{i+36}}{N - N^{(0)}} \tag{9.22}$$

where $n_i$ is the number of pixels with LEPROT value i and $N^{(0)}$ is the total number of non-edge pixels in R.


Bibliography

[1] "Corel-1k database," available online: http://wang.ist.psu.edu/docs/related/ (last accessed on 11/12/2015).

[2] "Corel-5k and Corel-10k database," available online: http://www.ci.gxnu.edu.cn/cbir/ (last accessed on 8/10/2014).

[3] "MIT vision and modeling group, Cambridge, vision texture," available online: http://vismod.media.mit.edu/pub/ (last accessed on 11/12/2015).

[4] "Urban and natural scene categories, computational visual cognition laboratory, Massachusetts Institute of Technology," available online: http://cvcl.mit.edu/database.htm (last accessed on 11/12/2015).

[5] "The AT&T database of faces, AT&T laboratories Cambridge," available online: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html, 2002 (last accessed on 11/12/2015).

[6] A. Ahmadian and A. Mostafa, "An efficient texture classification algorithm using Gabor wavelet," in Proceedings of the 25th Annual International Conference of Engineering in Medicine and Biology Society, vol. 1, Cancun, Mexico: IEEE, 2003, pp. 930-933.

[7] T. Ahonen, A. Hadid, and M. Pietikainen, "Face recognition with local binary patterns," in Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic: Springer, 2004, pp. 469-481.

[8] P. Anantharatnasamy, K. Sriskandaraja, V. Nandakumar, and S. Deegalla, "Fusion of colour, shape and texture features for content based image retrieval," in Proceedings of the 8th International Conference on Computer Science & Education (ICCSE), Colombo, Sri Lanka: IEEE, 2013, pp. 422-427.

[9] E. Apostolidis and V. Mezaris, "Fast shot segmentation combining global and local visual descriptors," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy: IEEE, 2014, pp. 6583-6587.

[10] J. Baber, S. Satoh, N. Afzulpurkar, and M. Bakhtyar, "Q-CSLBP: compression of CSLBP descriptor," in Advances in Multimedia Information Processing–PCM, Singapore: Springer, 2012, pp. 513-521.

[11] R. V. Babu and P. Parate, "Robust tracking with interest points: A sparse representation approach," Image and Vision Computing, vol. 33, pp. 44-56, 2015.

[12] A. Baraldi and F. Parmiggiani, "An investigation of the textural characteristics associated with gray level cooccurrence matrix statistical parameters," IEEE Transactions on Geoscience and Remote Sensing, vol. 33, no. 2, pp. 293-304, 1995.

[13] A. C. Bovik, Handbook of Image and Video Processing. Academic Press, 2010.

[14] S. Brandt, J. Laaksonen, and E. Oja, "Statistical shape features for content-based image retrieval," Journal of Mathematical Imaging and Vision, vol. 17, no. 2, pp. 187-198, 2002.

[15] R. Brunelli, O. Mich, and C. M. Modena, "A survey on the automatic indexing of video data," Journal of Visual Communication and Image Representation, vol. 10, no. 2, pp. 78-112, 1999.

[16] G. Camara-Chavez, F. Precioso, M. Cord, S. Phillip-Foliguet, and A. de A. Araujo, "Shot boundary detection by a hierarchical supervised approach," in Proceedings of the 14th International Workshop on Systems, Signals and Image Processing, 2007 and the 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services, Maribor, Slovenia: IEEE, 2007, pp. 197-200.

[17] T. Celik and T. Tjahjadi, "Multiscale texture classification using dual-tree complex wavelet transform," Pattern Recognition Letters, vol. 30, no. 3, pp. 331-339, 2009.

[18] M. Chen and A. Hauptmann, "Searching for a specific person in broadcast news video," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, Quebec, Canada: IEEE, 2004, pp. iii-1036-1039.

[19] J. Choi, Z. Wang, S. C. Lee, and W. J. Jeon, "A spatio-temporal pyramid matching for video retrieval," Computer Vision and Image Understanding, vol. 117, no. 6, pp. 660-669, 2013.

[20] W. W. Chu, C. C. Hsu, A. F. Cardenas, and R. K. Taira, "Knowledge-based image retrieval with spatial and temporal constructs," IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 6, pp. 872-888, 1998.

[21] D. Comaniciu, V. Ramesh, and P. Meer, "Real-time tracking of non-rigid objects using mean shift," in Proceedings of the International Conference on Computer Vision and Pattern Recognition, vol. 2, Hilton Head Island, South Carolina: IEEE, 2000, pp. 142-149.

[22] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, 2003.

[23] C. Cotsaces, N. Nikolaidis, and I. Pitas, "Video shot detection and condensed representation: a review," IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 28-37, 2006.

[24] A. Csillaghy, H. Hinterberger, and A. O. Benz, “Content-based image retrieval in astronomy,” Information Retrieval, vol. 3, no. 3, pp. 229–241, 2000.

[25] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, and S. Sirotti, “Improving shadow suppression in moving object detection with HSV color information,” in Proceedings of Conference on Intelligent Transportation Systems. Oakland, California: IEEE, 2001, pp. 334–339.

[26] M. M. H. Daisy, S. T. Selvi, and J. S. G. Mol, “Combined texture and shape features for content based image retrieval,” in Proceedings of International Conference on Circuits, Power and Computing Technologies (ICCPCT). Nagercoil, India: IEEE, 2013, pp. 912–916.

[27] P. P. Dash, D. Patra, and S. K. Mishra, “Local binary pattern as a texture feature descriptor in object tracking algorithm,” in Intelligent Computing, Networking, and Informatics. Raipur, India: Springer, 2014, pp. 541–548.

[28] L. S. Davis, S. A. Johns, and J. K. Aggarwal, “Texture analysis using generalized co-occurrence matrices,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 1, no. 3, pp. 251–259, 1979.

[29] P. De Rivaz and N. Kingsbury, “Complex wavelet features for fast texture image retrieval,” in Proceedings of International Conference on Image Processing, vol. 1. Kobe, Japan: IEEE, 1999, pp. 109–113.

[30] M. Dey, B. Raman, and M. Verma, “A novel colour- and texture-based image retrieval technique using multi-resolution local extrema peak valley pattern and RGB colour histogram,” Pattern Analysis and Applications, pp. 1–21, 2015.

[31] F. Dufaux, “Key frame selection to represent a video,” in Proceedings of International Conference on Image Processing, vol. 2. Vancouver, BC, Canada: IEEE, 2000, pp. 275–278.

[32] A. Ekin, “Generic play-break event detection for summarization and hierarchical sports video analysis,” in Proceedings of International Conference on Multimedia and Expo (ICME), vol. 1. Baltimore, Maryland: IEEE, 2003, pp. I-169–172.


[33] J. Fehr, “Rotational invariant uniform local binary patterns for full 3D volume texture analysis,” in Finnish Signal Processing Symposium (FINSIG), Oulu, Finland, 2007.

[34] J. C. Felipe, A. J. M. Traina, and C. Traina Jr, “Retrieval by content of medical images using texture for tissue identification,” in Proceedings of 16th IEEE Symposium on Computer-Based Medical Systems. New York, USA: IEEE, 2003, pp. 175–180.

[35] T. Gevers and A. W. M. Smeulders, “PicToSeek: Combining color and shape invariant features for image retrieval,” Image Processing, IEEE Transactions on, vol. 9, no. 1, pp. 102–119, 2000.

[36] A. B. Gonde, R. Maheshwari, and R. Balasubramanian, “Modified curvelet transform with vocabulary tree for content based image retrieval,” Digital Signal Processing, vol. 23, no. 1, pp. 142–150, 2013.

[37] X. C. Guo and D. Hatzinakos, “Content based image hashing via wavelet and Radon transform,” in Advances in Multimedia Information Processing – PCM 2007. Hong Kong, China: Springer, 2007, pp. 755–764.

[38] Z. Guo, L. Zhang, and D. Zhang, “A completed modeling of local binary pattern operator for texture classification,” Image Processing, IEEE Transactions on, vol. 19, no. 6, pp. 1657–1663, 2010.

[39] R. Gupta, H. Patil, and A. Mittal, “Robust order-based methods for feature description,” in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, California: IEEE, 2010, pp. 334–341.

[40] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, “Textural features for image classification,” Systems, Man and Cybernetics, IEEE Transactions on, vol. 3, no. 6, pp. 610–621, 1973.

[41] M. Heikkila and M. Pietikainen, “A texture-based method for modeling the background and detecting moving objects,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 4, pp. 657–662, 2006.


[42] M. Heikkila, M. Pietikainen, and C. Schmid, “Description of interest regions with center-symmetric local binary patterns,” in Computer Vision, Graphics and Image Processing. Madurai, India: Springer, 2006, pp. 58–69.

[43] L. Houam, A. Hafiane, A. Boukrouche, E. Lespessailles, and R. Jennane, “One dimensional local binary pattern for bone texture characterization,” Pattern Analysis and Applications, vol. 17, no. 1, pp. 179–193, 2014.

[44] J. Huang, S. R. Kumar, and M. Mitra, “Combining supervised learning with color correlograms for content-based image retrieval,” in Proceedings of the 5th ACM International Conference on Multimedia. New York, USA: ACM, 1997, pp. 325–334.

[45] J. Huang, S. R. Kumar, M. Mitra, W. J. Zhu, and R. Zabih, “Image indexing using color correlograms,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Juan, Puerto Rico: IEEE, 1997, pp. 762–768.

[46] P. W. Huang and S. K. Dai, “Image retrieval by texture similarity,” Pattern Recognition, vol. 36, no. 3, pp. 665–679, 2003.

[47] R. M. Jacob and D. Narmadha, “A literature analysis of object tracking and interactive modeling in videos for augmented reality,” International Journal of Engineering Research & Technology, vol. 3, no. 1, pp. 879–884, 2014.

[48] A. K. Jain and A. Vailaya, “Image retrieval using color and shape,” Pattern Recognition, vol. 29, no. 8, pp. 1233–1244, 1996.

[49] K. P. Jasmine and P. R. Kumar, “Integration of HSV color histogram and LMEBP joint histogram for multimedia image retrieval,” in Intelligent Computing, Networking, and Informatics. Raipur, India: Springer, 2014, pp. 753–762.

[50] U. Jayaraman, S. Prakash, and P. Gupta, “An efficient color and texture based iris image retrieval technique,” Expert Systems with Applications, vol. 39, no. 5, pp. 4915–4926, 2012.


[51] I. Jeena Jacob, K. G. Srinivasagan, and K. Jayapriya, “Local oppugnant color texture pattern for image retrieval system,” Pattern Recognition Letters, vol. 42, pp. 72–78, 2014.

[52] S. Jeong, “Histogram-based color image retrieval,” Psych221/EE362 Project Report, 2001.

[53] S. Jeong, C. S. Won, and R. M. Gray, “Image retrieval using color histograms generated by Gauss mixture vector quantization,” Computer Vision and Image Understanding, vol. 94, no. 1, pp. 44–66, 2004.

[54] N. Jhanwar, S. Chaudhuri, G. Seetharaman, and B. Zavidovique, “Content based image retrieval using motif cooccurrence matrix,” Image and Vision Computing, vol. 22, no. 14, pp. 1211–1220, 2004.

[55] B. F. Jones, G. Schaefer, and S. Y. Zhu, “Content-based image retrieval for medical infrared images,” in Proceedings of 26th Annual International Conference on Medicine and Biology Society (IEMBS), vol. 1. San Francisco, California: IEEE, 2004, pp. 1186–1187.

[56] H. B. Kekre and S. D. Thepade, “Color based image retrieval using amendment of block truncation coding with YCbCr color space,” International Journal of Imaging and Robotics, vol. 2, no. A09, pp. 2–14, 2009.

[57] M. L. Kherfi, D. Ziou, and A. Bernardi, “Image retrieval from the world wide web: Issues, techniques, and systems,” ACM Computing Surveys (CSUR), vol. 36, no. 1, pp. 35–67, 2004.

[58] M. Kokare, P. K. Biswas, and B. N. Chatterji, “Texture image retrieval using new rotated complex wavelet filters,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 35, no. 6, pp. 1168–1178, 2005.

[59] M. Kokare, P. K. Biswas, and B. N. Chatterji, “Rotation-invariant texture image retrieval using rotated complex wavelet filters,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 36, no. 6, pp. 1273–1282, 2006.


[60] M. Kokare, P. K. Biswas, and B. N. Chatterji, “Texture image retrieval using rotated wavelet filters,” Pattern Recognition Letters, vol. 28, no. 10, pp. 1240–1249, 2007.

[61] D. Koubaroulis, J. Matas, and J. Kittler, “Colour-based image retrieval from video sequences,” in 3rd UK Conference on Image Retrieval, 2000, pp. 1–12.

[62] V. Kovalev and M. Petrou, “Multidimensional co-occurrence matrices for object recognition and matching,” Graphical Models and Image Processing, vol. 58, no. 3, pp. 187–197, 1996.

[63] M. S. Kumar and Y. S. Kumaraswamy, “A boosting framework for improved content based image retrieval,” Indian Journal of Science and Technology, vol. 6, no. 4, pp. 4312–4316, 2013.

[64] M. Kuzu, M. S. Islam, and M. Kantarcioglu, “Efficient similarity search over encrypted data,” in Proceedings of 28th International Conference on Data Engineering (ICDE). Washington, DC, US: IEEE, 2012, pp. 1156–1167.

[65] R. Kwitt and P. Meerwald, “Salzburg texture image database,” Sep 2012, available online: http://www.wavelab.at/sources/STex (last accessed on 11/12/2015).

[66] A. Laine and J. Fan, “Texture classification by wavelet packet signatures,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 11, pp. 1186–1191, 1993.

[67] S. Liao, M. W. K. Law, and A. C. S. Chung, “Dominant local binary patterns for texture classification,” Image Processing, IEEE Transactions on, vol. 18, no. 5, pp. 1107–1118, 2009.

[68] C. H. Lin, R. T. Chen, and Y. K. Chan, “A smart content-based image retrieval system based on color and texture feature,” Image and Vision Computing, vol. 27, no. 6, pp. 658–665, 2009.

[69] J. Liu, P. Carr, R. T. Collins, and Y. Liu, “Tracking sports players with context-conditioned motion models,” in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR). Portland, Oregon: IEEE, 2013, pp. 1830–1837.

[70] Y. Liu, D. Zhang, G. Lu, and W. Y. Ma, “A survey of content-based image retrieval with high-level semantics,” Pattern Recognition, vol. 40, no. 1, pp. 262–282, 2007.

[71] E. Loupias, N. Sebe, S. Bres, and J. M. Jolion, “Wavelet-based salient points for image retrieval,” in Proceedings of International Conference on Image Processing, vol. 2. Vancouver, BC, Canada: IEEE, 2000, pp. 518–521.

[72] W. Lu, A. Swaminathan, A. L. Varna, and M. Wu, “Enabling search over encrypted multimedia databases,” in IS&T/SPIE Electronic Imaging, vol. 7254, 725418. San Jose, California: International Society for Optics and Photonics, February 2009.

[73] W. Lu, A. L. Varna, A. Swaminathan, and M. Wu, “Secure image retrieval through feature protection,” in IEEE International Conference on Acoustics, Speech and Signal Processing. Taipei, Taiwan: IEEE, 2009, pp. 1533–1536.

[74] W. Lu, A. L. Varna, and M. Wu, “Confidentiality-preserving image search: A comparative study between homomorphic encryption and distance-preserving randomization,” Access, IEEE, vol. 2, pp. 125–141, 2014.

[75] W. Y. Ma and B. S. Manjunath, “Texture-based pattern retrieval from image databases,” Multimedia Tools and Applications, vol. 2, no. 1, pp. 35–51, 1996.

[76] B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 18, no. 8, pp. 837–842, 1996.

[77] B. S. Manjunath, P. Salembier, and T. Sikora, Introduction to MPEG-7: Multimedia Content Description Interface. John Wiley & Sons, 2002, vol. 1.

[78] D. S. Marcus, T. H. Wang, J. Parker, J. G. Csernansky, J. C. Morris, and R. L. Buckner, “Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle-aged, nondemented, and demented older adults,” Journal of Cognitive Neuroscience, vol. 19, no. 9, pp. 1498–1507, 2007.


[79] B. M. Mehtre, M. S. Kankanhalli, and W. F. Lee, “Shape measures for content based image retrieval: a comparison,” Information Processing & Management, vol. 33, no. 3, pp. 319–337, 1997.

[80] S. Moore and R. Bowden, “Local binary patterns for multi-view facial expression recognition,” Computer Vision and Image Understanding, vol. 115, no. 4, pp. 541–558, 2011.

[81] H. Muller, N. Michoux, D. Bandon, and A. Geissbuhler, “A review of content-based image retrieval systems in medical applications: clinical benefits and future directions,” International Journal of Medical Informatics, vol. 73, no. 1, pp. 1–23, 2004.

[82] H. Muller, W. Muller, D. M. Squire, S. Marchand-Maillet, and T. Pun, “Performance evaluation in content-based image retrieval: overview and proposals,” Pattern Recognition Letters, vol. 22, no. 5, pp. 593–601, 2001.

[83] H. Muller, A. Rosset, J. P. Vallee, and A. Geissbuhler, “Comparing feature sets for content-based image retrieval in a medical-case database,” in Medical Imaging. San Diego, California: International Society for Optics and Photonics, 2004, pp. 99–109.

[84] S. Murala, Q. Jonathan Wu, R. P. Maheshwari, and R. Balasubramanian, “Modified color motif co-occurrence matrix for image indexing and retrieval,” Computers & Electrical Engineering, vol. 39, no. 3, pp. 762–774, 2013.

[85] S. Murala, R. P. Maheshwari, and R. Balasubramanian, “Local maximum edge binary patterns: a new descriptor for image retrieval and object tracking,” Signal Processing, vol. 92, no. 6, pp. 1467–1479, 2012.

[86] S. Murala, A. B. Gonde, and R. P. Maheshwari, “Color and texture features for image indexing and retrieval,” in International Advance Computing Conference (IACC). Patiala, India: IEEE, 2009, pp. 1411–1416.

[87] S. Murala and Q. Jonathan Wu, “Local ternary co-occurrence patterns: A new feature descriptor for MRI and CT image retrieval,” Neurocomputing, vol. 119, no. 6, pp. 399–412, 2013.


[88] S. Murala and Q. Jonathan Wu, “Peak valley edge patterns: A new descriptor for biomedical image indexing and retrieval,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Portland, Oregon: IEEE, 2013, pp. 444–449.

[89] S. Murala and Q. Jonathan Wu, “Expert content-based image retrieval system using robust local patterns,” Journal of Visual Communication and Image Representation, vol. 25, no. 6, pp. 1324–1334, 2014.

[90] S. Murala and Q. Jonathan Wu, “Local mesh patterns versus local binary patterns: biomedical image indexing and retrieval,” Biomedical and Health Informatics, IEEE Journal of, vol. 18, no. 3, pp. 929–938, 2014.

[91] S. Murala and Q. Jonathan Wu, “MRI and CT image indexing and retrieval using local mesh peak valley edge patterns,” Signal Processing: Image Communication, vol. 29, no. 3, pp. 400–409, 2014.

[92] S. Murala and Q. Jonathan Wu, “Spherical symmetric 3D local ternary patterns for natural, texture and biomedical image indexing and retrieval,” Neurocomputing, vol. 149, pp. 1502–1514, 2015.

[93] S. Murala, Q. Jonathan Wu, R. Balasubramanian, and R. P. Maheshwari, “Joint histogram between color and local extrema patterns for object tracking,” in IS&T/SPIE Electronic Imaging, vol. 8663, 86630T. Burlingame, California: International Society for Optics and Photonics, March 2013.

[94] S. Murala, R. P. Maheshwari, and R. Balasubramanian, “Directional binary wavelet patterns for biomedical image indexing and retrieval,” Journal of Medical Systems, vol. 36, no. 5, pp. 2865–2879, 2012.

[95] S. Murala, R. P. Maheshwari, and R. Balasubramanian, “Directional local extrema patterns: a new descriptor for content based image retrieval,” International Journal of Multimedia Information Retrieval, vol. 1, no. 3, pp. 191–203, 2012.

[96] S. Murala, R. P. Maheshwari, and R. Balasubramanian, “Local tetra patterns: a new feature descriptor for content-based image retrieval,” Image Processing, IEEE Transactions on, vol. 21, no. 5, pp. 2874–2886, 2012.


[97] J. Nam and A. H. Tewfik, “Speaker identification and video analysis for hierarchical video shot classification,” in Proceedings of International Conference on Image Processing, vol. 2. Santa Barbara, California: IEEE, 1997, pp. 550–553.

[98] L. Nanni, A. Lumini, and S. Brahnam, “Local binary patterns variants as texture descriptors for medical image analysis,” Artificial Intelligence in Medicine, vol. 49, no. 2, pp. 117–125, 2010.

[99] F. Nian, T. Li, X. Wu, Q. Gao, and F. Li, “Efficient near-duplicate image detection with a local-based binary representation,” Multimedia Tools and Applications, pp. 1–18, 2015.

[100] A. Nigam, V. Krishna, A. Bendale, and P. Gupta, “Iris recognition using block local binary patterns and relational measures,” in Proceedings of IEEE International Joint Conference on Biometrics (IJCB). Clearwater, Florida: IEEE, 2014, pp. 1–6.

[101] S. Nigam and A. Khare, “Multiresolution approach for multiple human detection using moments and local binary patterns,” Multimedia Tools and Applications, vol. 74, no. 17, pp. 1–26, 2014.

[102] J. Ning, L. Zhang, D. Zhang, and C. Wu, “Robust object tracking using joint color-texture histogram,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 07, pp. 1245–1263, 2009.

[103] R. Nosaka, Y. Ohkawa, and K. Fukui, “Feature extraction based on co-occurrence of adjacent local binary patterns,” in Advances in Image and Video Technology. Gwangju, South Korea: Springer, 2012, pp. 82–91.

[104] R. Nosaka, C. H. Suryanto, and K. Fukui, “Rotation invariant co-occurrence among adjacent LBPs,” in Computer Vision – ACCV Workshops. Daejeon, Korea: Springer, 2013, pp. 15–25.

[105] T. Ojala, M. Pietikainen, and D. Harwood, “A comparative study of texture measures with classification based on featured distributions,” Pattern Recognition, vol. 29, no. 1, pp. 51–59, 1996.


[106] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 7, pp. 971–987, 2002.

[107] P. Over, T. Ianeva, W. Kraaij, and A. F. Smeaton, “TRECVID 2005 - an overview,” in TRECVid - Text REtrieval Conference TRECVID Workshop. Gaithersburg, Maryland: NIST, November 2005.

[108] C. Palm, “Color texture classification by integrative co-occurrence matrices,” Pattern Recognition, vol. 37, no. 5, pp. 965–976, 2004.

[109] G. A. Papakostas, D. E. Koulouriotis, E. G. Karakasis, and V. D. Tourassis, “Moment-based local binary patterns: A novel descriptor for invariant pattern recognition applications,” Neurocomputing, vol. 99, pp. 358–371, 2013.

[110] S. S. Park, K. K. Seo, and D. S. Jang, “Expert system based on artificial neural networks for content-based image retrieval,” Expert Systems with Applications, vol. 29, no. 3, pp. 589–597, 2005.

[111] M. Partio, B. Cramariuc, M. Gabbouj, and A. Visa, “Rock texture retrieval using gray level co-occurrence matrix,” in Proceedings of the 5th Nordic Signal Processing Symposium, vol. 75. Citeseer, 2002.

[112] S. Parui and A. Mittal, “Similarity-invariant sketch-based image retrieval in large databases,” in Proceedings of 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014, pp. 398–414.

[113] G. Pass, R. Zabih, and J. Miller, “Comparing images using color coherence vectors,” in Proceedings of the 4th International Conference on Multimedia. New York, USA: ACM, 1997, pp. 65–73.

[114] L. Paulhac, P. Makris, and J.-Y. Ramel, “Comparison between 2D and 3D local binary pattern methods for characterisation of three-dimensional textures,” in Image Analysis and Recognition. Póvoa de Varzim, Portugal: Springer, 2008, pp. 670–679.


[115] F. Pernici and A. Del Bimbo, “Object tracking by oversampling local features,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 36, no. 12, pp. 2538–2551, 2014.

[116] M. Petkovic, “Content-based video retrieval,” in Proceedings of 7th Conference on Extending DataBase Technology, Ph.D. Workshop. Konstanz, Germany: University of Konstanz, 2000, pp. 74–77.

[117] K. H. Phyu, A. Kutics, and A. Nakagawa, “Self-adaptive feature extraction scheme for mobile image retrieval of flowers,” in Proceedings of 8th International Conference on Signal Image Technology and Internet Based Systems (SITIS). Naples, Italy: IEEE, 2012, pp. 366–373.

[118] M. Pietikainen, T. Ojala, and Z. Xu, “Rotation-invariant texture classification using feature distributions,” Pattern Recognition, vol. 33, no. 1, pp. 43–52, 2000.

[119] S. Piramanayagam, E. Saber, N. D. Cahill, and D. Messinger, “Shot boundary detection and label propagation for spatio-temporal video segmentation,” in IS&T/SPIE Electronic Imaging, vol. 9405, 94050D. San Francisco, California: International Society for Optics and Photonics, 2015.

[120] X. Qian, X. S. Hua, P. Chen, and L. Ke, “PLBP: An effective local binary patterns texture descriptor with pyramid representation,” Pattern Recognition, vol. 44, no. 10, pp. 2502–2515, 2011.

[121] P. V. B. Reddy and A. R. M. Reddy, “Content based image indexing and retrieval using directional local extrema and magnitude patterns,” AEU - International Journal of Electronics and Communications, vol. 68, no. 7, pp. 637–643, 2014.

[122] P. Reungjitranon and O. Chitsobhuk, “Weather map image retrieval using connected color region,” in International Symposium on Communications and Information Technologies (ISCIT). Vientiane, Laos: IEEE, 2008, pp. 464–467.

[123] F. Roberti de Siqueira, W. Robson Schwartz, and H. Pedrini, “Multi-scale gray level co-occurrence matrices for texture description,” Neurocomputing, vol. 120, pp. 336–345, 2013.


[124] Y. Rui, T. S. Huang, and S.-F. Chang, “Image retrieval: Current techniques, promising directions, and open issues,” Journal of Visual Communication and Image Representation, vol. 10, no. 1, pp. 39–62, 1999.

[125] Y. Rui, T. S. Huang, and S. Mehrotra, “Exploring video structure beyond the shots,” in Proceedings of International Conference on Multimedia Computing and Systems. Austin, Texas: IEEE, 1998, pp. 237–240.

[126] P. R. Sabbu, U. Ganugula, S. Kannan, and B. Bezawada, “An oblivious image retrieval protocol,” in IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA). Biopolis, Singapore: IEEE, 2011, pp. 349–354.

[127] A. Safia and D. He, “New Brodatz-based image databases for grayscale color and multiband texture analysis,” ISRN Machine Vision, pp. 1–14, 2013, available online: http://multibandtexture.recherche.usherbrooke.ca/ (last accessed on 11/12/2015).

[128] C. Shan, S. Gong, and P. W. McOwan, “Robust facial expression recognition using local binary patterns,” in Proceedings of International Conference on Image Processing, vol. 2. Genova, Italy: IEEE, 2005, pp. II-370–373.

[129] C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on local binary patterns: A comprehensive study,” Image and Vision Computing, vol. 27, no. 6, pp. 803–816, 2009.

[130] S. Sharma and P. Khanna, “ROI segmentation using local binary image,” in Proceedings of International Conference on Control System, Computing and Engineering (ICCSCE). Penang, Malaysia: IEEE, 2013, pp. 136–141.

[131] K. She, G. Bebis, H. Gu, and R. Miller, “Vehicle tracking using on-line fusion of color and shape features,” in Proceedings of the 7th International Conference on Intelligent Transportation Systems. Washington, DC, US: IEEE, 2004, pp. 731–736.


[132] P. Shih and C. Liu, “Comparative assessment of content-based face image retrieval in different color spaces,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 07, pp. 873–893, 2005.

[133] M. Sifuzzaman, M. R. Islam, and M. Z. Ali, “Application of wavelet transform and its advantages compared to Fourier transform,” Journal of Physical Sciences, vol. 13, pp. 121–134, 2009.

[134] A. W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, no. 12, pp. 1349–1380, 2000.

[135] A. R. Smith, “Color gamut transform pairs,” in ACM SIGGRAPH Computer Graphics, vol. 12, no. 3, New York, USA, 1978, pp. 12–19.

[136] D. A. Socolinsky, A. Selinger, and J. D. Neuheisel, “Face recognition with visible and thermal infrared imagery,” Computer Vision and Image Understanding, vol. 91, no. 1, pp. 72–114, 2003.

[137] D. A. Socolinsky, L. B. Wolff, J. D. Neuheisel, and C. K. Eveland, “Illumination invariant face recognition using thermal infrared imagery,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1. Kauai, Hawaii: IEEE, 2001, pp. I-527–534.

[138] S. Srivastava and S. Agarwal, “Rotation invariant texture based image indexing and retrieval,” in 5th IEEE International Conference on Advanced Computing and Communication Technologies, Haryana, India, 2011, pp. 139–142.

[139] M. A. Stricker and M. Orengo, “Similarity of color images,” in IS&T/SPIE’s Symposium on Electronic Imaging: Science & Technology. San Jose, California: International Society for Optics and Photonics, 1995, pp. 381–392.

[140] S. Sural, G. Qian, and S. Pramanik, “Segmentation and histogram generation using the HSV color space for image retrieval,” in Proceedings of International Conference on Image Processing, vol. 2. Rochester, NY, USA: IEEE, 2002, pp. II-589–592.


[141] M. J. Swain and D. H. Ballard, “Indexing via color histograms,” in Active Perception and Robot Vision. Springer, 1992, vol. 83, pp. 261–273.

[142] V. Takala, T. Ahonen, and M. Pietikainen, “Block-based methods for image retrieval using local binary patterns,” in Image Analysis. Joensuu, Finland: Springer, 2005, pp. 882–891.

[143] V. Takala and M. Pietikainen, “Multi-object tracking using color, texture and motion,” in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR). Minneapolis, Minnesota: IEEE, 2007, pp. 1–7.

[144] X. Tan and B. Triggs, “Enhanced local texture feature sets for face recognition under difficult lighting conditions,” in Analysis and Modeling of Faces and Gestures. Rio de Janeiro, Brazil: Springer, 2007, pp. 168–182.

[145] S. Tippaya, S. Sitjongsataporn, T. Tan, and K. Chamnongthai, “Abrupt shot boundary detection based on averaged two-dependence estimators learning,” in Proceedings of 14th International Symposium on Communications and Information Technologies (ISCIT). Incheon, South Korea: IEEE, 2014, pp. 522–526.

[146] A. J. M. Traina, C. A. B. Castanon, and C. Traina Jr, “MultiWaveMed: a system for medical image retrieval through wavelets transformations,” in Proceedings of the 16th Symposium on Computer-Based Medical Systems. New York, USA: IEEE, 2003, pp. 150–155.

[147] A. Vadivel, S. Sural, and A. K. Majumdar, “An integrated color and intensity co-occurrence matrix,” Pattern Recognition Letters, vol. 28, no. 8, pp. 974–983, 2007.

[148] J. C. Van Gemert, C. J. Veenman, and J. M. Geusebroek, “Episode-constrained cross-validation in video concept retrieval,” Multimedia, IEEE Transactions on, vol. 11, no. 4, pp. 780–786, 2009.

[149] R. C. Veltkamp and M. Tanase, “Content-based image retrieval systems: A survey,” Dept. of Computing Science, Utrecht University, Technical Report UU-CS-2000-34, 2000.


[150] M. Verma, B. Raman, and S. Murala, “Multi-resolution local extrema patterns using discrete wavelet transform,” in Proceedings of 7th International Conference on Contemporary Computing (IC3). Noida, India: IEEE, 2014, pp. 577–582.

[151] M. Verma, B. Raman, and S. Murala, “Wavelet based directional local extrema patterns for image retrieval on large image database,” in 2nd International Conference on Advances in Computing and Communication Engineering. Dehradun, India: IEEE, 2015, pp. 649–654.

[152] M. Verma and B. Raman, “Center symmetric local binary co-occurrence pattern for texture, face and bio-medical image retrieval,” Journal of Visual Communication and Image Representation, vol. 32, pp. 224–236, 2015.

[153] M. Verma, B. Raman, and S. Murala, “Local extrema co-occurrence pattern for color and texture image retrieval,” Neurocomputing, vol. 165, pp. 255–269, 2015.

[154] S. K. Vipparthi and S. K. Nagar, “Multi-joint histogram based modelling for image indexing and retrieval,” Computers & Electrical Engineering, vol. 40, no. 8, pp. 163–173, 2014.

[155] M. Visser, “Feature fusion for efficient content-based video retrieval,” Ph.D. dissertation, TU Delft, Delft University of Technology, 2013.

[156] J. Z. Wang, G. Wiederhold, O. Firschein, and S. X. Wei, “Content-based image indexing and searching using Daubechies’ wavelets,” International Journal on Digital Libraries, vol. 1, no. 4, pp. 311–328, 1998.

[157] L. Wang, T. Liu, G. Wang, K. L. Chan, and Q. Yang, “Video tracking using learned hierarchical features,” Image Processing, IEEE Transactions on, vol. 24, no. 4, pp. 1424–1435, 2015.

[158] X. Wang, H. Gong, H. Zhang, B. Li, and Z. Zhuang, “Palmprint identification using boosting local binary pattern,” in Proceedings of 18th International Conference on Pattern Recognition (ICPR), vol. 3. Hong Kong, China: IEEE, 2006, pp. 503–506.


[159] Y. Wang and D. Hatzinakos, “Random translational transformation for changeable face verification,” in Proceedings of 16th International Conference on Digital Signal Processing. Santorini, South Aegean: IEEE, 2009, pp. 1–6.

[160] Y. Wang, Z. C. Mu, and H. Zeng, “Block-based and multi-resolution methods for ear recognition using wavelet transform and uniform local binary patterns,” in Proceedings of 19th International Conference on Pattern Recognition (ICPR). Tampa, Florida: IEEE, 2008, pp. 1–4.

[161] W. Wolf, “Key frame selection by motion analysis,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2. Atlanta, Georgia: IEEE, 1996, pp. 1228–1231.

[162] Y. Xia, S. Wan, and L. Yue, “Local spatial binary pattern: A new feature descriptor for content-based image retrieval,” in Proceedings of 5th International Conference on Graphic and Image Processing, vol. 9069, 90691K. Hong Kong, China: International Society for Optics and Photonics, 2014.

[163] Y. Xia, S. Wan, and L. Yue, “A new texture direction feature descriptor and its application in content-based image retrieval,” in Proceedings of the 3rd International Conference on Multimedia Technology (ICMT). Guangzhou, China: Springer, 2014, pp. 143–151.

[164] G. Xue, J. Sun, and L. Song, “Dynamic background subtraction based on spatial extended center-symmetric local binary pattern,” in Proceedings of International Conference on Multimedia and Expo (ICME). Suntec City, Singapore: IEEE, 2010, pp. 1050–1054.

[165] L. Yang, Y. Cai, A. Hanjalic, X. S. Hua, and S. Li, “Video-based image retrieval,” in Proceedings of the 19th ACM International Conference on Multimedia. New York, USA: ACM, 2011, pp. 1001–1004.

[166] L. Yang, X.-S. Hua, and Y. Cai, “Searching for images by video,” US Patent Application US 2012/0294477 A1 (App. 13/110,708), May 18, 2012.

[167] M. Yang, K. Kpalma, and J. Ronsin, “A survey of shape feature extraction techniques,” in Pattern Recognition, P.-Y. Yin, Ed. IN-TECH, 2008, pp. 43–90.


[168] C. H. Yao and S. Y. Chen, “Retrieval of translated, rotated and scaled color textures,” Pattern Recognition, vol. 36, no. 4, pp. 913–929, 2003.

[169] M. Yeung, B.-L. Yeo, and B. Liu, “Extracting story units from long programs for video browsing and navigation,” in Proceedings of 3rd International Conference on Multimedia Computing and Systems. Hiroshima, Japan: IEEE, 1996, pp. 296–305.

[170] A. Yilmaz, O. Javed, and M. Shah, “Object tracking: A survey,” ACM Computing Surveys (CSUR), vol. 38, no. 4, p. 13, 2006.

[171] H.-W. Yoo, S.-H. Jung, D.-S. Jang, and Y.-K. Na, “Extraction of major object features using VQ clustering for content-based image retrieval,” Pattern Recognition, vol. 35, no. 5, pp. 1115–1126, 2002.

[172] H. H. Yu and W. Wolf, “A hierarchical multiresolution video shot transition detection scheme,” Computer Vision and Image Understanding, vol. 75, no. 1, pp. 196–213, 1999.

[173] H. Yu, M. Li, H. J. Zhang, and J. Feng, “Color texture moments for content-based image retrieval,” in Proceedings of International Conference on Image Processing, vol. 3. Rochester, NY, USA: IEEE, 2002, pp. 929–932.

[174] F. Yuan, “Rotation and scale invariant local binary pattern based on high order directional derivatives for texture classification,” Digital Signal Processing, vol. 26, pp. 142–152, 2014.

[175] B. Zhang, Y. Gao, S. Zhao, and J. Liu, “Local derivative pattern versus local binary pattern: face recognition with high-order local pattern descriptor,” Image Processing, IEEE Transactions on, vol. 19, no. 2, pp. 533–544, 2010.

[176] D. Zhang, A. Wong, M. Indrawan, and G. Lu, “Content-based image retrieval using Gabor texture features,” in IEEE Pacific-Rim Conference on Multimedia, Sydney, Australia, 2000, pp. 392–395.

[177] J. Zhang, G. L. Li, and S. W. He, “Texture-based image retrieval by edge detection matching GLCM,” in Proceedings of 10th International Conference on High Performance Computing and Communications (HPCC). Dalian, China: IEEE, 2008, pp. 782–786.

[178] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, “Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition,” in Proceedings of 10th International Conference on Computer Vision (ICCV), vol. 1. Beijing, China: IEEE, 2005, pp. 786–791.

[179] G. Zhao and M. Pietikainen, “Local binary pattern descriptors for dynamic texture recognition,” in Proceedings of 18th International Conference on Pattern Recognition (ICPR), vol. 2. Hong Kong, China: IEEE, 2006, pp. 211–214.

[180] G. Zhao and M. Pietikainen, “Dynamic texture recognition using local binary patterns with an application to facial expressions,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 29, no. 6, pp. 915–928, 2007.


Author’s Publications

International Journals

1. Manisha Verma, Balasubramanian Raman and Subrahmanyam Murala, “Local Extrema Co-occurrence Pattern for Color and Texture Image Retrieval,” Neurocomputing (Elsevier), vol. 165, pp. 255−269, 2015 (IF 2.005).

2. Manisha Verma and Balasubramanian Raman, “Center Symmetric Local Binary Co-occurrence Pattern for Texture, Face and Bio-medical Image Retrieval,” Journal of Visual Communication and Image Representation (Elsevier), vol. 32, pp. 224−236, 2015 (IF 1.218).

3. Manisha Verma and Balasubramanian Raman, “Local Tri-Directional Patterns: A New Feature Descriptor for Texture and Face Image Retrieval,” Digital Signal Processing (Elsevier), vol. 51, pp. 62−72, 2016 (IF 1.256).

4. Madhumanti Dey, Balasubramanian Raman and Manisha Verma, “A novel colour and texture based image retrieval technique using multi-resolution local extrema peak valley pattern and RGB colour histogram,” Pattern Analysis and Applications (Springer), pp. 1−21, 2015 (IF 0.646).

5. Manisha Verma and Balasubramanian Raman, “Local Neighborhood Difference Pattern: A New Feature Descriptor for Large Scale Natural and Texture Image Retrieval,” Pattern Analysis and Applications (Springer). (First revision submitted)

International Conferences

6. Manisha Verma and Balasubramanian Raman, “Object Tracking using Joint Histogram of Color and Local Rhombus Pattern,” IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pp. 77−82, October 19−21, 2015, Kuala Lumpur, Malaysia. (Best student paper award)

7. Manisha Verma, Balasubramanian Raman and Subrahmanyam Murala, “Multi-resolution local extrema patterns using discrete wavelet transform,” in Proceedings of 7th IEEE International Conference on Contemporary Computing (IC3), pp. 577−582, August 7−9, 2014, Noida, India.

8. Manisha Verma, Balasubramanian Raman and Subrahmanyam Murala, “Wavelet Based Directional Local Extrema Patterns for Image Retrieval on Large Image Database,” in 2nd IEEE International Conference on Advances in Computing and Communication Engineering (ICACCE), pp. 649−654, May 1−2, 2015, Dehradun, India.

9. Asha Rani, Manisha Verma and Balasubramanian Raman, “Fusion of Submanifold and Local Texture Features for Palmprint Authentication,” IEEE International Conference on Visual Communications and Image Processing (VCIP), December 13−16, 2015, Singapore. (In press)

10. Manisha Verma and Balasubramanian Raman, “A Hierarchical Shot Boundary Detection Algorithm using Global and Local Features,” International Conference on Computer Vision and Image Processing (CVIP), February 26−28, 2016, Roorkee, India. (In press)

11. Manisha Verma, Nitakshi Sood, Partha Pratim Roy and Balasubramanian Raman, “Script identification in natural scene images: A dataset and texture-feature based performance evaluation,” International Conference on Computer Vision and Image Processing (CVIP), February 26−28, 2016, Roorkee, India. (In press)
