
Automation of Preprocessing and Recognition of

Historical Document Images

A Thesis submitted to

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Belgaum

for the award of degree of

Doctor of Philosophy in Computer Science & Engineering

by

B Gangamma

Department of Computer Science & Engineering,

P E S Institute of Technology - Bangalore South Campus (formerly P E S School of Engineering), Bangalore, Karnataka, India.

2013


Department of Computer Science & Engineering,

P E S Institute of Technology - Bangalore South Campus

(formerly P E S School of Engineering),

Bangalore, Karnataka, India.

CERTIFICATE

This is to certify that B Gangamma has worked under my supervision for her doctoral thesis titled “Automation of Preprocessing and Recognition of Historical Document Images”. I also certify that the work is original and has not been submitted to any other University, wholly or in part, for any other degree.

Dr. Srikanta Murthy K

Professor & Head,

Department of Computer Science & Engineering,

P E S Institute of Technology - Bangalore South Campus

(formerly P E S School of Engineering),

Bangalore, Karnataka, India.


Department of Computer Science & Engineering,

P E S Institute of Technology - Bangalore South Campus

(formerly P E S School of Engineering),

Bangalore, Karnataka, India

DECLARATION

I hereby declare that the entire work embodied in this doctoral thesis has been carried out by me at the Research Centre, Department of Computer Science & Engineering, P E S Institute of Technology - Bangalore South Campus (formerly P E S School of Engineering), under the supervision of Dr. Srikanta Murthy K. This thesis has not been submitted in part or full for the award of any diploma or degree of this or any other University.

B Gangamma

Research scholar

Department of Computer Science & Engineering

P E S Institute of Technology - Bangalore South Campus,

(formerly P E S School of Engineering), Bangalore.


Acknowledgements

Any accomplishment requires the efforts of many people, and this work is no exception. I would be failing in my duty if I did not express my gratitude to those who have helped in my endeavor.

With deep gratitude and reverence, I express my sincere thanks to my research supervisor Dr. Srikanta Murthy K, Professor & Head, Department of Computer Science & Engineering, P E S Institute of Technology - Bangalore South Campus, Bangalore, for his constant and untiring efforts to guide me throughout the research work. His tremendous enthusiasm, inspiration, and constant support have encouraged me to complete this dissertation. His wide knowledge, logical way of thinking, and detailed, constructive comments have provided a good basis for the research work and thesis. I would like to thank Dr. J Suryaprasad, Principal & Director, P E S Institute of Technology - Bangalore South Campus, Bangalore, for his constant support.

I owe special thanks and sincere gratitude to Mrs. Shylaja S S, Professor & Head, Department of Information Science & Engineering, P E S Institute of Technology, Bangalore, for motivating, encouraging and providing the support necessary to complete the research and thesis work. I am also thankful to Dr. S Natarajan, Professor, Department of Information Science and Engineering, P E S Institute of Technology, Bangalore, for providing proper direction to my research work. I wish to express my warm and sincere thanks to Dr. K. N. Balasubramanya Murthy, Principal & Director, P E S Institute of Technology, Bangalore, for inspiring me to take up research and work towards a doctoral degree. I would like to express sincere thanks to the P E S management for providing motivation and a platform to carry out the research.

I wholeheartedly thank Mr. Jayasimha, Mythic Society of India, Bangalore, for providing me the scanned copies of the palm leaf manuscripts. My warm thanks are due to Mr. M P Shelva Thirunarayana and R Narayana Iyangar, Academy of Sanskrit Research Center, Melukote, and Sri. S N Cheluvanarayana, Principal, Sanskrit College, Melukote, Karnataka, for sharing their knowledge about historical documents along with sample paper and palm leaf manuscripts.

I sincerely thank Dr. Veeresh Badiger, Professor, Kannada University Hampi, Karnataka, for providing information about resources and guiding my research work, and I would like to extend special thanks to him for providing digitized samples of palm leaf manuscripts.

I would like to thank Dr. G. Hemantha Kumar, Professor & Chairman, Department of Studies in Computer Science, University of Mysore, for his valuable suggestions and directions during the pre-Ph.D. viva voce. I wholeheartedly thank Dr. M Ashwath Kumar, Professor, Department of Information Science & Engineering, M S R Institute of Technology, Bangalore, for the valuable directions he gave during the pre-Ph.D. viva voce. I warmly thank Dr. Bhanumathi, Reader at Manasa Gangothri, Mysore, for providing useful information about palm leaf manuscripts. Detailed discussions about manuscripts and interesting explorations with her have been very helpful for my work.

I wish to thank Dr. Suryakantha Gangashetty, Assistant Professor, IIIT Hyderabad, for his suggestions, and Dhananjaya, Archana Ramesh and Dilip, research scholars at IIIT Hyderabad, for their valuable discussions. I am grateful to Dr. Basavaraj Anami, Principal, K L E Institute of Technology, Hubli, for his guidance and wonderful interactions, which helped me in shaping my research work properly.

I would like to express my heartfelt thanks to Dr. Punitha P Swamy, Professor & Head, Department of Master of Computer Application, P E S Institute of Technology, Bangalore, for her detailed review, constructive criticism and excellent advice throughout my research work and during the preparation of the thesis. My sincere thanks to Dr. Avinash N, Professor, Department of Information Science & Engineering, P E S Institute of Technology, Bangalore, for his valuable discussions during the thesis write-up.

I owe my most sincere thanks to my brother-in-law Dr. Mallikarjun Holi, Professor & Head, Department of Bio-medical Engineering, Bapuji Institute of Engineering & Technology, Davanagere, for reviewing my thesis and giving valuable suggestions.

I owe my loving thanks to my husband Suresh Holi and my children Anish and Trisha, who have extended constant support in completing my work. Without their encouragement and understanding it would have been impossible for me to finish this work. I express my deepest sense of gratitude to my father-in-law Prof. S. M. Holi, whose inspiring and encouraging nature stimulated me to take up research. I would like to express my heartfelt thanks to my mother-in-law, Mrs. Rudramma Holi, for her loving support. I also extend my sincere thanks to my sisters-in-law Dr. Prema S Badami, Mrs. Shivaleela S Patil and Sharanu Holi, brother-in-law Mr. Sanganna Holi, and their families for giving me moral support.

I express my heartfelt thanks to my parents Mr. Somaraya Biradar and Mrs. Shivalingamma Biradar for encouraging and helping me in my activities. I would like to place on record my gratitude to my sisters Mrs. Nirmala Marali and Suvarna Patil, and my brothers Manjunath Biradar and Vishwanath Biradar, along with their families, for providing moral support during my research work.

During this work, I have collaborated with many colleagues for whom I have great regard, and I wish to extend my warmest thanks to all the faculty colleagues of the Department of Information Science and Engineering, P E S Institute of Technology, Bangalore. I wish to thank my teammates Mr. Arun Vikas, Jayashree, Mamatha H R and Karthik S, and my friends Sangeetha J, Suvarna Nandyal and Srikanth H R, for their support. Lastly, and most importantly, I am indebted to my faculty colleagues for providing a stimulating and healthy environment to learn and grow. It is a pleasure to thank the many people who have helped me directly or indirectly and who made this thesis possible. I also place on record my sincere gratitude to the external reviewers for their critical comments, which significantly helped in improving the standard of the thesis. I take this opportunity to thank the VTU e-learning center for providing the LaTeX template used to prepare my doctoral thesis.

B Gangamma


DEDICATED TO MY FAMILY, MENTORS AND WELL-WISHERS


Abstract

Historical documents are the priceless property of any country; they provide insight and information about ancient culture and civilization. These documents are found in the form of inscriptions on a variety of hard and fragile materials such as stone, pillars, rocks, metal plates, palm leaves, birch leaves, cloth and paper. Most of these documents are nearing the end of their natural lifetime and suffer from various problems due to climatic conditions, methods of preservation, the materials used to inscribe them, etc. Some of the problems arise from the worn-out condition of the material: brittleness, strains and stains, sludge and smudges, fossil deposits, fungus attack, dust accumulation, wear and tear, and broken or damaged material. These damages create problems in processing the historical documents, make the inscriptions illegible and render the documents indecipherable.

Although preservation through digitization is in progress at various organizations, deciphering the documents is very difficult and demands the expertise of paleographers and epigraphists. Since such experts are few in number and may disappear in the near future, there is a need to automate the process of deciphering these document images. The problems and complexities posed by these documents have led to the design of a robust system that automates the processing and deciphering of these document images, and this in turn demands thorough preprocessing algorithms to enhance them.


The accuracy of a recognition system always depends on the segmented characters and their extracted features. Historical document images usually exhibit uneven line spacing, inscriptions along curved lines, overlapping text lines, etc., making segmentation of the document difficult. In addition, the documents pose challenges such as low contrast, dark and uneven backgrounds, and blotched (stained) characters, usually referred to as noise. The presence of noise also leads to erroneous segmentation of the document image. Therefore there is a need for thorough preprocessing techniques to eliminate the noise and enhance the document image. To decipher documents belonging to various eras, we need the character set pertaining to each era; this warrants a recognition system that can recognize the era of a character.

In this context, this research work focuses on developing algorithms to preprocess and enhance historical document images of Kannada, a South Indian language; to eliminate noise; to segment the enhanced document image into lines and characters; and to predict the era of the scripts.

To preprocess the noisy document images, three image enhancement algorithms in the spatial domain and two in the frequency domain are proposed. Of the spatial domain methods, the first utilizes the morphological reconstruction technique to eliminate the dark, uneven, noisy background. This algorithm is used as the background elimination step in the other four algorithms proposed for image enhancement. Although the gray scale morphological operations eliminate the noisy dark background, this method fails to enhance severely degraded document images and is unable to preserve sharp edges.

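The background elimination idea can be sketched as follows. This is a minimal pure-Python illustration, not the thesis implementation: it assumes dark ink on a brighter, uneven background, estimates the background with a grayscale morphological closing (a flat square window larger than the stroke width), and subtracts the original image from it. The window radius and pixel values are hypothetical.

```python
def erode(img, r):
    # grayscale erosion with a flat (2r+1)x(2r+1) square window: local minimum
    h, w = len(img), len(img[0])
    return [[min(img[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]

def dilate(img, r):
    # grayscale dilation with the same window: local maximum
    h, w = len(img), len(img[0])
    return [[max(img[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]

def remove_background(img, r=1):
    # closing (dilation then erosion) fills dark strokes narrower than the
    # window, yielding an estimate of the uneven background; subtracting the
    # image from it leaves the ink as bright values on a near-zero field
    bg = erode(dilate(img, r), r)
    return [[bg[i][j] - img[i][j] for j in range(len(img[0]))]
            for i in range(len(img))]
```

On a real manuscript the window radius must exceed the stroke width; the thesis additionally combines such morphological steps with adaptive histogram equalization and Gaussian filtering.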

To enhance the image by eliminating noise without smoothing the edges, a second algorithm is developed using the bilateral filter, which combines domain and range filtering. The third algorithm, proposed to denoise the document images, is a non local means filter based on a similarity measure between non local windows.
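A minimal sketch of the bilateral filter named above, assuming a grayscale image stored as a list of rows; the kernel radius and the two sigmas are illustrative defaults, not the thesis's parameters. Each output pixel is a weighted average whose weights decay with both spatial distance (domain filtering) and intensity difference (range filtering), so flat regions are smoothed while strong edges survive.

```python
import math

def bilateral_filter(img, radius=2, sigma_d=2.0, sigma_r=25.0):
    # combined weight = spatial Gaussian * intensity-difference Gaussian,
    # so only nearby pixels of similar gray value contribute strongly
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num = den = 0.0
            for y in range(max(0, i - radius), min(h, i + radius + 1)):
                for x in range(max(0, j - radius), min(w, j + radius + 1)):
                    d2 = (y - i) ** 2 + (x - j) ** 2          # spatial distance
                    r2 = (img[y][x] - img[i][j]) ** 2         # intensity distance
                    wgt = math.exp(-d2 / (2 * sigma_d ** 2)
                                   - r2 / (2 * sigma_r ** 2))
                    num += wgt * img[y][x]
                    den += wgt
            out[i][j] = num / den
    return out
```

The non local means filter replaces the purely spatial weight with a similarity measure between whole patches inside a larger search window, which handles texture-like noise better at higher computational cost.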

Frequency domain transforms and their variants are used in image denoising, feature extraction, compression and reconstruction. An algorithm based on the wavelet transform is developed to analyze and restore the degraded document images. The wavelet transform works well in handling point discontinuities but fails to handle curve discontinuities. To overcome this problem, a curvelet transform based approach is proposed, which provides better results than the wavelet based approach. The performance of all the image enhancement techniques is compared using Peak Signal-to-Noise Ratio (PSNR), computational time and human visual perception.
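As a toy illustration of wavelet-domain denoising and of the PSNR metric used for comparison: below is a 1-D, single-level Haar transform with soft thresholding, not the 2-D transform used in the thesis, and the threshold value is hypothetical.

```python
import math

def haar_denoise(signal, thresh):
    # single-level Haar transform (even-length signal assumed):
    # approximation = scaled pairwise sums, detail = scaled pairwise
    # differences; soft-threshold the details, then invert
    s = math.sqrt(2.0)
    approx = [(a + b) / s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / s for a, b in zip(signal[0::2], signal[1::2])]
    soft = [math.copysign(max(abs(d) - thresh, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, soft):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def psnr(ref, test, peak=255.0):
    # Peak Signal-to-Noise Ratio in dB; higher means closer to the reference
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return float('inf') if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Small, noise-like detail coefficients fall below the threshold and are zeroed, while large coefficients (genuine structure) are merely shrunk, which is why PSNR improves after denoising.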

Two segmentation algorithms have been developed to address the problem of segmenting historical document images: one based on a piecewise projection profile method and the other based on morphological closing and connected component analysis (CCA). The first method addresses uneven line spacing by dividing the image into vertical strips, extracting each line from each strip and then combining the lines across all strips. The second method addresses both uneven spacing and touching (overlapping) lines using the closing operation and CCA.
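The first method's core steps can be sketched as follows, assuming a binarized image stored as rows of 0/1 values. The number of strips is an illustrative parameter, and the merging of per-strip lines into full page lines is omitted.

```python
def horizontal_profile(binary):
    # row-wise count of foreground (1) pixels
    return [sum(row) for row in binary]

def profile_to_lines(profile):
    # contiguous runs of non-zero rows are text lines;
    # returns (start, end) row indices, end exclusive
    lines, start = [], None
    for r, v in enumerate(profile):
        if v > 0 and start is None:
            start = r
        elif v == 0 and start is not None:
            lines.append((start, r))
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines

def segment_lines_piecewise(binary, n_strips):
    # piecewise variant: split the page into vertical strips and segment
    # each strip independently, tolerating uneven line spacing across the page
    w = len(binary[0])
    step = max(1, w // n_strips)
    return [profile_to_lines(horizontal_profile([row[s:s + step]
                                                 for row in binary]))
            for s in range(0, w, step)]
```

The second method would instead close small gaps morphologically and label connected components, which also copes with lines that touch.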


Document skew may be introduced during image capture and needs to be corrected. Since historical documents usually contain uneven spacing between lines, correcting a single global skew will not segment the handwritten document image correctly: uneven line spacing usually causes multiple skews within the document. To correct the skew within individual document lines, an extended version of the second segmentation algorithm is developed.
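One simple way to estimate the skew of an individual extracted line (a generic sketch, not the thesis's CCA-based extension) is to fit a straight line through the column-wise centroids of its foreground pixels and take the angle of the slope:

```python
import math

def estimate_skew_degrees(binary):
    # fit y = a + b*x through the column-wise centroids of the foreground
    # pixels by least squares; the slope b gives the line's skew angle
    pts = []
    for x in range(len(binary[0])):
        ys = [y for y in range(len(binary)) if binary[y][x]]
        if ys:
            pts.append((x, sum(ys) / len(ys)))
    n = len(pts)
    if n < 2:
        return 0.0  # not enough ink to estimate a skew
    sx = sum(p[0] for p in pts)
    sy = sum(p[1] for p in pts)
    sxx = sum(p[0] ** 2 for p in pts)
    sxy = sum(p[0] * p[1] for p in pts)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return math.degrees(math.atan(b))
```

Rotating each extracted line by the negative of its own estimated angle corrects multiple skews independently, which a single page-level rotation cannot do.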

To predict the era of a script/character, a curvelet transform based algorithm is designed to extract characteristic features, and a minimum distance classifier is employed to recognize the era of the characters. To sum up, in this research work: three spatial domain techniques and two frequency domain approaches have been implemented for denoising and enhancing the degraded historical document images; two segmentation algorithms have been designed to segment lines and characters from the document images; one algorithm is designed to detect and correct multiple skews within a document; and another algorithm is presented to predict the era of a segmented character, so that the character set belonging to that particular era can be consulted in order to decipher the documents.
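The minimum distance classifier itself is straightforward: assign a sample to the era whose mean feature vector is nearest. In the thesis the feature vectors come from curvelet coefficients; the two-dimensional features and era labels below are purely hypothetical.

```python
import math

def class_means(samples):
    # samples: {era_label: [feature vectors]}; returns one mean vector per era
    means = {}
    for label, vecs in samples.items():
        n = len(vecs)
        means[label] = [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]
    return means

def predict_era(means, vec):
    # nearest class mean under Euclidean distance
    def dist(m):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(m, vec)))
    return min(means, key=lambda label: dist(means[label]))
```

Despite its simplicity, this classifier needs only one pass over the training data and no parameter tuning, which suits the small labelled datasets typical of historical scripts.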


Contents

1 Preface 1
 1.1 Preamble . . . 1
 1.2 Historical Documents . . . 3
  1.2.1 Kannada Scripts/Character . . . 6
 1.3 Motivation for the Research Work . . . 7
  1.3.1 Data Collection . . . 7
  1.3.2 Enhancement/Preprocessing . . . 10
  1.3.3 Segmentation . . . 12
  1.3.4 Feature Extraction and Recognition . . . 13
 1.4 Contribution . . . 14
 1.5 Organization of the Thesis . . . 16

2 Literature Survey 17
 2.1 Computer Vision . . . 17
 2.2 Preprocessing and Segmentation . . . 18
  2.2.1 Enhancement of Historical Document Image . . . 24
  2.2.2 Segmentation of Historical Documents . . . 26
 2.3 Character Recognition . . . 28
 2.4 Summary . . . 34

3 Enhancement of Degraded Historical Documents: Spatial Domain Techniques 35
 3.1 Introduction . . . 35
 3.2 Gray Scale Morphological Reconstruction (MR) Based Approach . . . 37
  3.2.1 Overview of Mathematical Morphology . . . 38
  3.2.2 Adaptive Histogram Equalization (AHE) . . . 42
  3.2.3 Gaussian Filter . . . 42
  3.2.4 Proposed Methodology . . . 43
  3.2.5 Results and Discussion . . . 48
 3.3 Bilateral Filter (BF) Based Approach . . . 54
  3.3.1 Overview of Bilateral Filter . . . 55
  3.3.2 Proposed Methodology . . . 56
  3.3.3 Results and Discussion . . . 59
 3.4 Non Local Means Filter (NLMF) Based Approach . . . 66
  3.4.1 Overview of Non Local Means Filter . . . 67
  3.4.2 Proposed Algorithm . . . 68
  3.4.3 Results and Discussion . . . 73
 3.5 Discussion of Three Spatial Domain Techniques . . . 77
 3.6 Summary . . . 82

4 Enhancement of Degraded Historical Documents: Frequency Domain Techniques 84
 4.1 Introduction . . . 84
 4.2 Wavelet Transform (WT) Based Approach . . . 85
  4.2.1 Overview of Wavelet Transform . . . 86
  4.2.2 Denoising Method . . . 88
   4.2.2.1 Thresholding Algorithms . . . 88
  4.2.3 Proposed Methodology . . . 90
   4.2.3.1 Stage 1: Mathematical Reconstruction . . . 92
   4.2.3.2 Stage 2: Denoising by Wavelet Transform . . . 93
   4.2.3.3 Stage 3: Postprocessing . . . 94
   4.2.3.4 Algorithm . . . 94
  4.2.4 Results and Discussions . . . 94
 4.3 Curvelet Transform (CT) Based Approach . . . 98
  4.3.1 Overview of Curvelet Transform . . . 100
  4.3.2 Proposed Method . . . 104
   4.3.2.1 Denoising Using Curvelet Transform . . . 104
   4.3.2.2 Algorithm . . . 104
  4.3.3 Results and Discussions . . . 106
 4.4 Summary . . . 108
 4.5 Discussion on Enhancement Algorithms . . . 108

5 Segmentation of Document Images 116
 5.1 Introduction . . . 116
 5.2 Proposed Methodologies . . . 117
 5.3 Method 1: Piece-wise Horizontal Projection Profile Based Approach . . . 118
  5.3.1 Division into Vertical Strips . . . 120
  5.3.2 Horizontal Projection Profile of a Strip . . . 120
  5.3.3 Reconstruction of the Line Using Vertical Strips . . . 120
  5.3.4 Character Extraction . . . 122
  5.3.5 Algorithm for Document Image Segmentation . . . 122
  5.3.6 Results and Discussion . . . 125
 5.4 Method 2: Mathematical Morphology and Connected Component Analysis (CCA) Based Approach . . . 126
  5.4.1 Morphological Closing Operation . . . 128
  5.4.2 Line Extraction Using Connected Components Analysis . . . 129
  5.4.3 Finding the Height of Each Line and Checking the Touching Lines . . . 130
  5.4.4 Character Extraction . . . 130
  5.4.5 Algorithm for Segmentation of the Document Image into Lines . . . 131
  5.4.6 Results and Discussion . . . 132
 5.5 Discussion on Method 1 and Method 2 . . . 133
 5.6 Skew Detection and Correction Algorithm . . . 135
  5.6.1 Skew Angle Detection . . . 137
  5.6.2 Skew Correction . . . 138
  5.6.3 Algorithm for Deskewing . . . 140
  5.6.4 Results and Discussion . . . 140
 5.7 Summary . . . 144

6 Prediction of Era of Character Using Curvelet Transform Based Approach 146
 6.1 Introduction . . . 146
 6.2 Related Literature . . . 147
 6.3 Proposed Method . . . 151
  6.3.1 Data Set Creation . . . 152
  6.3.2 Preprocessing . . . 152
  6.3.3 Feature Extraction using FDCT . . . 153
  6.3.4 Classification . . . 153
  6.3.5 Algorithm for Era Prediction . . . 153
 6.4 Experimentation and Results . . . 154
  6.4.1 Experimentation 1 . . . 154
  6.4.2 Experimentation 2 . . . 155
  6.4.3 Experimentation 3 . . . 156
  6.4.4 Discussion . . . 157
 6.5 Summary . . . 159

7 Conclusion and Future Work 160
 7.1 Conclusion . . . 160
 7.2 Future Work . . . 164

A Palm Leaf Images 167
B Paper Images 170
C Stone Inscription Images 174
D Author's Publications 178

List of Figures

1.1 6th Century Ganga Dynasty Inscription. . . . . . . . . 4

1.2 13th Century Hoysala Dynasty Inscription. . . . . . . . 5

1.3 Inscriptions on palm leaf belonging to 16th − 18th century. 6

1.4 Stone inscription belonging to 3rd century BC. . . . . . 7

3.1 (a) Input image. (b) Result of binary morphological

dilation operation. (c) Result of binary morphological

erosion operation. . . . . . . . . . . . . . . . . . . . . . 39

3.2 (a) Input image. (b) Result of binary morphological

opening operation. (c) Result of binary morphological

closing operation. . . . . . . . . . . . . . . . . . . . . . 40

3.3 (a) Original Gray scale image. (b) Result of gray scale

dilate operation on image. (c) Result of gray scale ero-

sion operation on image. . . . . . . . . . . . . . . . . . 41

3.4 (a) Original Gray scale image. (b) Result of gray scale

closing operation on image. (c) Result of gray scale

opening operation on image. . . . . . . . . . . . . . . . 41

3.5 Noisy palm leaf document image belonging to 16th century. 43

3.6 Binarized noisy images of Figure(3.5). . . . . . . . . . . 43

3.7 Original image of palm leaf script belonging to 16th century. 44

3.8 Binarized noisy image of Figure(3.7). . . . . . . . . . . 44

vi

Page 21: Automation of Preprocessing and Recognition of Historical Document Images

3.9 Flow chart for MR based method. . . . . . . . . . . . . 45

3.10 AHE result on images shown in Figure(3.5) and Figure(3.7) 46

3.11 Result of stage 2. (a), (b) are results of opening opera-

tion on images shown in Figure(3.10)(a), (b). and (c),

(d) are results of reconstruction technique. . . . . . . . 47

3.12 Result of stage 3. (a), (b) Results of closing operation

on stage 2 output images shown in Figure(3.11)(a), (b).

(c), (d) Subtraction of R1 from R4. (e), (f) Subtraction

of result of previous step from R2. . . . . . . . . . . . . 47

3.13 (a), (b) Results of Gaussian filter on images shown in

Figure(3.12((e), (f). . . . . . . . . . . . . . . . . . . . . 48

3.14 Morphological reconstruction technique on images shown

in Figure(3.13)(a), (b). . . . . . . . . . . . . . . . . . . 48

3.15 Binarized images of Figure(3.14)(a),(b). . . . . . . . . 49

3.16 (a), (b), (c), (d) Results of MR based method paper im-

ages shown in Appendix 1 Figure(B.1), Figure(B.2), Fig-

ure(B.3) and Figure(B.4) belonging to nineteenth and

beginning of twentieth century. . . . . . . . . . . . . . 51

3.17 (a), (b) Results of MR based method on image of palm

leaf shown in Appendix 1 Figure(A.1) and (A.3) belong-

ing to 16th to 18th century . . . . . . . . . . . . . . . . 52

3.18 Result of MR based method on sample image taken from

Belur temple inscriptions Figure(C.2) belonging to 17th

century AD. . . . . . . . . . . . . . . . . . . . . . . . . 53

3.19 (a), (b) Result of MR based method on stone inscriptions

shown in Appendix 1 Figure(C.1), Figures(C.3) belong-

ing to 14− 17th century. . . . . . . . . . . . . . . . . . 53

vii

Page 22: Automation of Preprocessing and Recognition of Historical Document Images

3.20 Comparison of proposed method with Gaussian, Aver-

age and Median filter. Figures (a), (b), (c), (d) show the

result of respective methods and figures (e), (f), (g), (h)

show the binarized images of (a), (b), (c), (d). . . . . . 54

3.21 Flow chart for BF based method. . . . . . . . . . . . . 57

3.22 (a) Input image of the palm leaf manuscript belonging

to 18th century. (b) Its binarized version. . . . . . . . . 58

3.23 (a) Filtered image using BF method. (b) Final result

of the BF method. (c) Binarized version of enhanced

image. . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.24 (a), (b),(c),(d) Results of BF based method on input

paper images in Figure(B.1), Figure(B.2), Figure(B.3)

and Figure(B.4) respectively. . . . . . . . . . . . . . . . 62

3.25 (a), (b) Results of BF based method Figure(A.4 and

Figure(A.5. . . . . . . . . . . . . . . . . . . . . . . . . 63

3.26 (a) Input image of palm leaf manuscript. (b) Result of

MR based method. (b) Enhanced image using BF based

method. . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3.27 (a) (b) are results of BF based method on input image

in Figure(A.2) and Figure(3.7). . . . . . . . . . . . . . 64

3.28 Result of BF based method on image Figure(A.6) . . . 64

3.29 (a), (b) Results of BF based method on image in Fig-

ure(C.1) and Figure(C.3). . . . . . . . . . . . . . . . . 65

3.30 Result of BF based method on Figure(C.2) Belur temple

inscriptions belonging to 17th century AD. . . . . . . . 65

viii

Page 23: Automation of Preprocessing and Recognition of Historical Document Images

3.31 Non Local Mean Filter Approach. Small patch of size

2p + 1 by 2p + 1 centred at x is the candidate pixel, y

and y′ are the non local patch within search window size

2k + 1 by 2k + 1. . . . . . . . . . . . . . . . . . . . . . 66

3.32 Input palm script image with low contrast. . . . . . . . 68

3.33 Result of NLMF method with residual image on Fig-

ure(3.32). . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.34 (a) Result of NLMF based method on image shown in

Figure(3.32). (b) Binarized image. . . . . . . . . . . . . 70

3.35 Flow chart for NLMF based method. . . . . . . . . . . 71

3.36 (a) Original image. (b) Filtered image using NLMF. (c)

Binarized image of the proposed NLMF method. (d)

Binarized noisy image using Otsu method. . . . . . . . 72

3.37 Results of NLMF based method on input images in Ap-

pendix 1 Figure(B.1), Figure(B.2), Figure(B.3) and Fig-

ure(B.4) . . . . . . . . . . . . . . . . . . . . . . . . . . 75

3.38 (a) Result of MR based method, (b) enhanced image of

using BF based method, and (c) result of NLMF based

method on input image shown in Figure(3.26). . . . . . 76

3.39 (a) and (b) Results of NLMF based method on input

images shown in Figure(A.2) and Figure(A.1). . . . . . 76

3.40 Result of NLMF based method on input image in Fig-

ure(A.6). . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3.41 Results of NLMF nased method on images Figure (C.1

and Figure(C.3). . . . . . . . . . . . . . . . . . . . . . 77

3.42 (a), (b) Results of NLMF based method on images shown

in Figure(C.2) and Figure(C.4). . . . . . . . . . . . . 78

ix

Page 24: Automation of Preprocessing and Recognition of Historical Document Images

4.1 Comparison of all thresholding methods . . . . . . . . 92

4.2 (a) Paper manuscript image-3 of previous century. (b)

Enhanced image using WT based approach. . . . . . . 95

4.3 Enhanced images using WT based approach on (a) Pa-

per manuscript image of shown in Appendix 1 (a) Fig-

ure(B.2) and (b) Figure(B.3 . . . . . . . . . . . . . . . 96

4.4 (a) Palm leaf manuscript image belonging to 16th - 18th

century. (b) Enhanced image using WT based approach. 96

4.5 (a) Palm leaf manuscript image belonging to 18th cen-

tury. (b) Enhanced image using WT based approach. . 97

4.6 (a) Palm leaf manuscript image belonging to 18th cen-

tury. (b) Enhanced image using WT based approach. . 98

4.7 (a) Palm leaf manuscript image belonging to 18th cen-

tury. (b) Enhanced image using WT based approach. . 99

4.8 (a) Stone inscription image belonging to seventeenth cen-

tury. (b) Result of WT based approach. . . . . . . . . 100

4.9 (a) and (c) Stone inscription images belonging to 14th -

17th century. (b) and (d) Results of WT based approach. 101

4.10 Result of WT based approach on stone inscription be-

longing to seventeenth century shown in Appendix 1 Fig-

ure (C.2). . . . . . . . . . . . . . . . . . . . . . . . . . 102

4.11 (a)Wrapping data, initially inside a parallelogram, into

a rectangle by periodicity(Figures reproduced from pa-

per [172]). The shaded region represents trapezoidal

wedge.(b) Discrete curvelet frequency tiling. . . . . . . 102

4.12 (a), (c) and (e) Input images paper, palm leaf and stone.

(b), (d) and (f) Result of CT based approach. . . . . . 103

x

Page 25: Automation of Preprocessing and Recognition of Historical Document Images

4.13 (a)-(b) Input images. (c)-(d) Results of the first and second stages of the curvelet based approach. (e)-(f) Result of the last stage (image 15-49). . . . 105
4.14 (a) Palm leaf manuscript image belonging to 16th - 18th century. (b) Enhanced image using WT based approach. (c) Result of CT based approach. . . . 106
4.15 (a) Input image of palm script. (b) Result of WT based method. (c) Result of CT method. . . . 107
4.16 (a) Input image of palm script. (b) Result of WT based method. (c) Result of CT method. . . . 107
4.17 (a) Result of WT based approach. (b) Result of CT based approach on the image shown in Figure(4.8)(a). . . . 108
4.18 Results of WT based method shown in (a), (c) and results of CT based method shown in (b), (d) for the stone inscription images shown in Figure(4.9)(a) and (c). . . . 109
5.1 (a) Handwritten Kannada document image. (b) Horizontal projection profile of the handwritten document image. . . . 118
5.2 Handwritten Kannada document image. . . . 119
5.3 Horizontal projection profile of the input image Figure(5.2). . . . 119
5.4 Non-Zero Rows (NZRs) and rows labelled NZR1 and NZR2. . . . 121
5.5 Horizontal projection profile of a strip. . . . 121
5.6 Extracted text lines. . . . 122
5.7 Character extraction from a line. . . . 123
5.8 (a), (c), (e) Extracted lines. (b), (d), (f) Characters extracted from lines (a), (c), (e). . . . 123


5.9 Input handwritten image and extracted lines. . . . 124
5.10 Extracted characters. . . . 124
5.11 Input image with uneven spacing between lines. . . . 126
5.12 Result of method 1 on the image shown in Figure(5.11). . . . 126
5.13 Result of closing operation. . . . 127
5.14 Extracted text lines. . . . 127
5.15 (a) Line and extracted characters from line (a). . . . 128
5.16 Input image. . . . 128
5.17 Result of closing operation. . . . 130
5.18 Result of extraction of connected components (lines). . . . 131
5.19 Result of binarization operation. . . . 133
5.20 Result of closing operation. . . . 133
5.21 Result of extraction of connected components and corresponding lines. . . . 134
5.22 (a) Touching line portion. (b) Result of closing and opening operations. . . . 135
5.23 Extraction of lines. . . . 135
5.24 Input skewed image. . . . 137
5.25 Horizontal projection profile of the input image (5.24). . . . 138
5.26 Result of closing operation. . . . 139
5.27 Skew angle calculation from a single connected component. . . . 139
5.28 Result of deskewing. . . . 141
5.29 Reconstructed image of Figure(5.24). . . . 142
5.30 (a) Input image. (b) Deskewed image. . . . 143
5.31 Input skewed image. . . . 143
5.32 Deskewed image. . . . 144
6.1 Sample epigraphical characters belonging to different eras. . . . 148


6.2 Prediction Rate for Gabor, Zernike and proposed method. . . . 157

A.1 Original image of palm leaf script of 18th century. . . . 167
A.2 Input images of palm leaf document belonging to 17th century. . . . 168
A.3 Palm leaf image belonging to 18th century: noisy input image. . . . 168
A.4 Input image of palm leaf document belonging to 17th century. . . . 168
A.5 Input image of palm leaf document belonging to 17th century. . . . 169
A.6 Input images of palm leaf document belonging to 17th century. . . . 169
B.1 Sample paper image belonging to the previous century. . . . 170
B.2 Original paper image-1 belonging to the nineteenth and beginning of the twentieth century. . . . 171
B.3 Original paper image-2 belonging to the nineteenth and beginning of the twentieth century. . . . 172
B.4 Original paper image-3 belonging to the nineteenth and beginning of the twentieth century. . . . 173
C.1 Stone inscription image belonging to 14th - 17th century. . . . 174
C.2 Digitized image of Belur temple inscription belonging to 17th century AD. . . . 175
C.3 Digitized image of Belur temple inscriptions belonging to 17th century AD. . . . 176
C.4 Digitized image of Shravanabelagola temple inscriptions belonging to 14th century AD. . . . 177


List of Tables

1.1 Evolution of Kannada Character. . . . 8
3.1 Comparison of PSNR values and execution time for three spatial domain methods to enhance paper document images of 512 × 512 size. . . . 79
3.2 Comparison of PSNR values and execution time for three spatial domain methods to enhance palm leaf document images of 512 × 512 size. . . . 80
3.3 Comparison of PSNR values and execution time for three spatial domain methods to enhance stone inscription images of 512 × 512 size. . . . 81
4.1 Comparison of various wavelet thresholding methods for five images along with PSNR values. . . . 91
4.2 PSNR values obtained from five different thresholding methods for a few images. . . . 93
4.3 Result of Curvelet Transform based approach. . . . 111
4.4 Comparison of PSNR values and execution time for Wavelet and Curvelet Transform based methods on paper images. . . . 112


4.5 Comparison of PSNR values and execution time for Wavelet and Curvelet Transform based methods on palm leaf images. . . . 113
4.6 Comparison of PSNR values and execution time for Wavelet and Curvelet Transform based methods on stone inscription images. . . . 114
4.7 Comparison of PSNR values of two frequency domain based approaches. . . . 115
5.1 Result of skew detection and correction. . . . 141
5.2 Skew angle detected for each line in the document image. . . . 145
6.1 Confusion Matrix and Recognition Rate (RR) for character image size 100 × 50. . . . 155
6.2 Confusion Matrix and Recognition Rate (RR) for character image size 40 × 40 with first scale. . . . 156
6.3 Recognition Rate (RR) of the data set 64 × 64 and Confusion Matrix for character image size 64 × 64 with first scale. . . . 156
6.4 Comparison of the Recognition Rates (RR) for various character image sizes 40 × 40, 64 × 64, 100 × 50. . . . 157


Chapter 1

Preface

1.1 Preamble

Documents are a major source of data, information and knowledge; they are written, printed, circulated and stored for future use. Nowadays computers are ubiquitous: they are used virtually everywhere to store information from handwritten as well as printed documents, and also to produce printed documents [1], [2]. The oft-repeated slogan of the paperless office remains out of reach for most organizations because, to achieve it, information must first be entered into a computer manually. Given the substantial amount of labor this requires, the only practical solution is to make computers capable of reading paper documents efficiently, without the intervention of human operators. There is therefore massive scope for research in the field of document image processing, particularly in the conversion of document images into editable forms [3].

For the past few years, many ambitious large-scale projects have been proposed to make all written material available online in digital form. Universities initiated the Million Book Project, and industry initiated projects such as Google Books Library, in order to make this goal achievable; many challenges still need to be handled in the processing of these documents [4]. The main purpose of a digital library is to consolidate documents that are spread across the globe and enable access to their digital contents. Optical Character Recognition (OCR) technology has helped in converting document images into machine-editable form. Even though OCR systems adequately recognize printed documents, the recognition of handwritten documents is not completely reliable and is still an open challenge for researchers. Inaccurate recognition is due to many factors, such as scanning errors, lighting conditions and the quality of the documents. Further inaccuracies stem from the age of these documents and the condition of the materials they are inscribed upon. Operations that can be performed on document images include: pre-processing of the noisy image, enhancement of the low-contrast image, de-blurring of the blurred image, estimation of the skew introduced during image acquisition, segmentation of the document image into lines, words and characters, and recognition of the characters.

Historical documents contain vital information about our ancestors, encompassing every aspect of their lives: religion, education and more. They are inscribed or printed on a variety of materials and differ substantially from the documents prevalent today, mainly because of major differences in layout structure. Due to this variable structure, extraction of the contents of historical documents is a complicated task. Additional complexity is posed by the various states of degradation in which historical documents are found, the primary causes being aging, faint typing, ink seepage and bleeding, holes, spots, ornamentation and seals. Historical documents exhibit further abnormalities, such as narrowly spaced lines (with overlapping and touching components) and the unusual, varying shapes of characters and words arising from differences in writing techniques and from variations in the location and the particular period in which they were drafted. These problems also complicate segmenting the document image into lines, words and characters, which is required to extract characteristic features for recognition purposes. Thus, the removal of noise from the input document image and the segmentation of the document image into lines, words and characters are important factors in improving the efficiency of OCR. Since the processing of degraded documents plays a significant role in deciding the overall result of the recognition system, it is essential that it be handled effectively. With this background, in this thesis we explore efficient image enhancement algorithms to enhance degraded historical document images, segment the enhanced images into lines, words and characters, and predict the era to which the documents belong. In this thesis, the terms document images and documents are used interchangeably to refer to historical document images.

In the subsequent section, we present a brief introduction to historical documents, their relevance and the need for their preservation. In the section after that, we present the motivation for the research work, with a brief introduction to the document image processing stages: data acquisition/collection, pre-processing, segmentation, feature extraction and recognition. The contribution of the research work and the organization of the thesis are presented in the last two sections.

1.2 Historical Documents

Written scripts have been a primary mode of communication and information storage for millennia. Prehistoric humans inscribed on stones, rocks and cave walls. While some of these inscriptions were used as a means of communication, others served a more religious or ceremonial purpose. Over the ages, evolving from primitive media like stones and rocks, materials like palm or birch leaves, cloth and paper became the prevalent mediums for information storage. In later centuries, they were more predominantly used to record information about education, religion, health and socio-political advancement. These ancient artifacts are conventionally referred to as historical documents and are a crucial part of any nation's cultural heritage. Among the sample images, Figure(1.1) and Figure(1.2) show stone inscriptions of the 6th and 13th centuries, and Figure(1.3) shows a palm leaf document.

According to Sircar [5], it has been confidently estimated that about 80 percent of all knowledge of Indian history before the 10th century A.D. has been derived from inscriptional sources. Inscriptions are commonly found on the walls of caves, on pillars, large rocks, metal plates, coins etc. The remarkable durability of these materials led our ancestors to record on them information imperative for future generations. Many of these inscriptions were made to preserve accounts of battles and to recognize acts of bravery and courage. They include edicts of rulers, recording their achievements; eulogies, given in praise of individuals; and commemorative inscriptions, which in turn have five sub-categories: donatory inscriptions, hero stones, Sathi stones, epitaphs (inscriptions on tombs) and miscellaneous.

Figure 1.1: 6th Century Ganga Dynasty Inscription.

These inscriptions not only comprise text/characters, but also contain paintings and carvings of humans, animals, nature and spiritual deities. An expert is required to study and decipher their contents in the context in which they were envisioned in a particular era. The study of such inscriptions is known as Epigraphy, and an expert involved in deciphering inscriptions is known as an Epigraphist. The inscriptions on rocks, stones, caves and metals are vital resources which enlighten the present generation about our past [6].

Stones, rocks and metals were also used to inscribe significant community messages. Detailed information and stories, however, could not be inscribed on materials like rock and stone, so early ancestors used palm leaves and birch leaves as a medium for imparting such information. These documents comprise mythological stories, spiritual teachings, and knowledge spanning a plethora of fields: science, education, politics, law, medicine, literature etc. It has been estimated that India has more than a hundred lakh (ten million) palm and birch leaf documents available in various conditions. The literature reveals that the earliest use of paper discovered through excavations was in China, from the 2nd century BC [7]. People in India started writing on paper during the 17th century. As these documents contain vital information pertaining to our past and are reminiscent of our cultural integrity, there is a dire need to preserve them and prevent any further degradation.

Figure 1.2: 13th Century Hoysala Dynasty Inscription.

It is rightly said that a nation or society which does not know its heritage cannot fully comprehend its present and hence is unable to lead its future. This heritage encompasses almost every aspect of human inquiry, be it culture, spirituality, philosophy, astronomy, medicine, religion, literature or education, that prevailed during different ages [8]. The majority of the details about a civilization can be obtained from its ancient scriptures, which help in understanding the past. Since these documents have degraded due to various factors, such as weather conditions, fossil deposition, fungus attacks, wear and tear, strain and stain, brittleness due to dry weather, ink seepage, bleed-through and scratches, they cannot be preserved in their original form for a prolonged duration. Therefore, automated tools are required to capture the documents, enhance the document images, recognize the era to which they belong and finally convert them into digital form for long-term preservation. In our research work, we have considered Kannada historical document images for experimentation; information about the Kannada script and its evolution is therefore provided in the next section.

Figure 1.3: Inscriptions on palm leaf belonging to 16th - 18th century.

1.2.1 Kannada Scripts/Character

In South East Asia and East Asia, including India, inscriptions are found in one of three scripts, namely Indus Valley, Brahmi and Kharosthi. The Kannada script, a South Indian language script, is one among the many evolved versions of Brahmi; an instance of such script inscribed during the 3rd century BC is shown in Figure(1.4). The evolution of the script has brought changes in its structure and shape, mainly due to factors like writing materials, writing tools, the method of inscribing and the background of the inscriber [9], [10], [11].

The Kannada script has a history of more than 2000 years and has taken shape from the early Brahmi script to the present Kannada, as shown in Table(1.1). It has undergone various changes and modifications during the dynasties of the Satavahanas (2nd century A.D.), Kadambas (4th - 5th century A.D.), Gangas (6th century A.D.), Badami Chalukyas (6th century A.D.), Rashtrakutas (9th century A.D.), Kalyani Chalukyas (11th century A.D.), Hoysalas (13th century A.D.), Vijayanagara (15th century A.D.) and Mysore (18th century A.D.). Since experts are few in number and fast decreasing, it is the need of the hour to preserve these inscriptions and to automate the process of deciphering them.

Figure 1.4: Stone inscription belonging to 3rd century BC.

1.3 Motivation for the Research Work

Historical documents are national treasures and provide valuable insight into past cultures and civilizations, the significance of which has been extensively discussed in the previous sections. The preservation of these documents is of vital importance and is being strenuously carried out with the help of an assortment of advanced tools and technologies. Such documents are being digitized, processed and preserved using a noteworthy set of image processing and pattern recognition techniques. The major steps involved in processing an image are: image acquisition/collection, preprocessing, segmentation, feature extraction and recognition [12], [13]. These and other related works are discussed in the following sub-sections.

1.3.1 Data Collection

The historical documents considered for this research work were collected from various libraries and universities across Karnataka, one of the prominent South Indian states. These digitized documents are inscribed/written in Kannada, the regional and official language of Karnataka. About 2700 digitized document images were considered for our study. The majority of these are palm leaf documents; the rest are paper and stone inscriptions. Together they span different eras, from the 13th to the 19th century.

Table 1.1: Evolution of Kannada Character

Character 'a' | Century
Ashoka, 3rd Century B.C.
Saathavahana, 2nd Century A.D.
Kadamba, 4th - 5th Century A.D.
Ganga, 6th Century A.D.
Badami Chalukya, 6th Century A.D.
Rashtrakuta, 9th Century A.D.
Kalyani Chalukya, 11th Century A.D.
Hoysala, 13th Century A.D.
Vijayanagara, 15th Century A.D.
Mysore, 18th Century A.D.

Since these images were collected using different setups, i.e. either a camera or a scanner, the particular resolution details are unavailable. Differences in setup cause significant variations in image size and resolution and introduce complexities in setting parameter values for experimentation. Therefore, each image set had to be manually inspected and adjusted to obtain a suitable image and character size. The image set consists of documents inscribed by different individuals, and the length of the palm leaves used also varies across the collection.

Paper documents are categorized into two groups: Good-quality images and Noisy images. Unevenly illuminated, brown-colored and low-contrast paper images without spots, stains, smears etc. are grouped under Good-quality images. Images with spots, stains, smears or smudges, varying amounts of background noise, wrinkles due to humidity, illumination variation, ink seeping through from the other side of the page, oily pages, thin pen strokes, breaks, dark lines due to folding, de-coloring, etc. are grouped under Noisy images. Approximately 200 documents were collected, with varying resolutions. During experimentation, each image is divided into parts depending on its overall size, yielding 512 × 512 sized images: higher resolution images are re-sized and then divided into smaller images, while lower resolution images are divided without re-sizing. Large images cannot be processed as a whole due to hardware constraints, so images have to be divided into smaller images. More than 500 images were created from the 200 originals.
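The division step above can be sketched as follows (a hypothetical helper, assuming NumPy and a grayscale image held as a 2-D array; this illustrates the idea of tiling a page into fixed-size blocks, not the exact procedure used in this work). Ragged right and bottom edges are padded by replicating border pixels so every tile has the same shape:

```python
import numpy as np

def tile_image(img, tile=512):
    """Split a grayscale image into non-overlapping tile x tile blocks.

    The right/bottom edges are padded by edge replication so that every
    returned block has exactly the same (tile, tile) shape.
    """
    h, w = img.shape
    padded = np.pad(img, ((0, (-h) % tile), (0, (-w) % tile)), mode="edge")
    return [padded[r:r + tile, c:c + tile]
            for r in range(0, padded.shape[0], tile)
            for c in range(0, padded.shape[1], tile)]
```

A 1040 × 600 page, for example, would be padded to the next multiple of 512 in each dimension before being cut into uniform tiles.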

Palm leaves are classified into two groups, viz. Degraded and Severely Degraded. Leaves with low contrast due to the repeated application of preservatives (oil), stains due to uneven application of oils, accumulation of dust, and holes introduced by tying the leaves together are classified as Degraded. Leaves with dark and brown colored lines introduced by cracks, strains, breaks, wear and tear, and noise due to scanning errors are grouped under Severely Degraded; these documents are hard to enhance and segment. About 1000 palm leaf documents were collected, with sizes varying from 2 cm to 24 cm in length and 2 cm to 6 cm in width. The lengthy images (more than 10 cm in length) were re-sized and divided into smaller images, based on the document size and the character size within the document; such images were divided into two to three segments and used for subsequent experimentation. Approximately 2000 images were obtained from the 1000 originals.

The degree of degradation was found to be significantly higher in the earlier stone inscriptions, particularly those from the 3rd century BC to the 13th century AD. Capturing stone inscriptions under different lighting conditions creates illumination and intensity problems; combined with scratches, cracks, breaks and characters erased by wear and tear, this makes stone inscriptions more severely degraded than palm leaves and paper, and it is therefore difficult to enhance the entire image. Approximately 200 digitized images of stone inscriptions were collected. Even though more than 400 images were created out of these 200, we considered only 200 resized samples for our study.

Some sample paper, palm leaf and stone inscription images used for experimentation are shown in Appendix A, Appendix B and Appendix C.

1.3.2 Enhancement/Preprocessing

The primary objective of pre-processing is to improve image quality by suppressing unwanted distortions and enhancing those image features that are important for further processing. Even though we have a myriad of advanced photography and scanning equipment at our disposal, natural aging and perpetual deterioration have rendered many historical document images thoroughly unreadable. Aging has led to the deterioration of the writing media employed, through influences like seepage of ink, smearing along cracks, damage to the leaf from the holes used for binding the manuscript leaves, and other extraneous factors such as dirt and discoloration.

In order to suitably preserve these fragile materials, digital images are predominantly captured using High Definition (HD) digital cameras in the presence of an appropriate light source, instead of platen scanners. Digitizing palm leaf and birch manuscripts poses a variety of problems: the leaves cannot be forced flat, the light source used with digital cameras is usually uneven, and the very process of capturing a digital image of the leaf introduces many complications. These factors lead to poor contrast between the background and the foreground text. Therefore, innovative digital image processing techniques are necessary to improve the legibility of the manuscripts. To sum up, historical document images pose several challenges to preprocessing algorithms, namely low contrast, non-uniform illumination, noise, scratches, holes, etc.

It has been observed from the literature that many spatial linear, nonlinear and spectral filters are used to denoise such images [14], [15], [16], [17]. Gatos et al. [18] proposed an adaptive binarization method that employs Wiener filtering for noise reduction. Unsharp masking has been proposed to enhance edge detail in degraded documents. Smoothing filters eliminate noise but blur the image [19]. In degraded documents, the text information is crucial for the subsequent character recognition stages, so losing textual detail while smoothing is unacceptable. A suitable algorithm is therefore required to eliminate noise without losing much of the textual content.
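The tension between smoothing and edge preservation can be illustrated with classical unsharp masking, which adds the high-frequency residual (original minus blurred copy) back to the image. The sketch below is a minimal NumPy-only illustration with hypothetical function names; it uses a simple separable box blur in place of the Gaussian or Wiener filters discussed above, and is not the enhancement method developed in this thesis:

```python
import numpy as np

def box_blur(img, k=5):
    """Separable k x k box blur built from two 1-D moving averages."""
    kernel = np.ones(k) / k
    out = np.apply_along_axis(np.convolve, 1, img.astype(float), kernel, mode="same")
    out = np.apply_along_axis(np.convolve, 0, out, kernel, mode="same")
    return out

def unsharp_mask(img, k=5, amount=1.0):
    """Sharpen by adding back the high-frequency residual
    (original minus blurred); output clipped to the 8-bit range."""
    img = img.astype(float)
    return np.clip(img + amount * (img - box_blur(img, k)), 0.0, 255.0)
```

Flat regions are left untouched (the residual is zero there), while intensity edges such as pen strokes are overshot and thus visually sharpened, which is exactly why unsharp masking can amplify noise along with text if applied to a badly degraded image.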

The literature survey reveals that very little work has been reported on Indian historical document processing, owing to the fact that preservation of ancient physical resources has gained priority only recently. India has a vast cultural heritage and is one of the largest repositories of cultural artifacts in the world, housing an estimated 5 million ancient manuscripts in various archives and museums throughout the country. The preservation of these resources was never a priority in the past, so many have either vanished or left the country, and even the ones which have survived have undergone massive degradation. Preservation of this historical heritage through digitization is therefore of utmost importance. However, any degradation in the original document is transferred directly to its digitized version, rendering it illegible. To improve the legibility of such a document, its images have to be pre-processed to obtain an enhanced copy. This warrants the development of novel image processing algorithms to preprocess the digitized images.


1.3.3 Segmentation

Image segmentation is the process of splitting a digital image into multiple groups of pixels, each assigned a unique label, so that pixels with the same label share certain visual characteristics. In general terms, it can be considered as simplifying the representation of an image into something more meaningful and easier to analyze. Image segmentation is typically used to trace objects, boundaries and regions of interest. In the case of document images, segmentation refers to the extraction of lines, words and characters from the given document. Segmentation of a document image into text lines and words is a critical phase in moving towards unconstrained handwritten document recognition. Extracting lines from handwritten documents is complicated, as these documents contain non-uniform line spacing, narrow spacing between lines, scratches, holes and the other defects elaborated in the previous section on historical documents. Apart from variations of the skew angle between text lines or along the same text line, the existence of overlapping or touching lines, uneven character size and non-Manhattan layout pose considerable challenges to text line extraction.

Due to inconsistency in writing styles, scripts, etc., methods that do not use any prior knowledge but instead adapt to the properties of the document image, such as those proposed here, tend to be more robust. Line extraction techniques may be categorized as projection based, grouping based, smearing based and Hough transform based [20]. Global projection based approaches are very effective for machine-printed documents but cannot handle text lines with variable skew angles; however, they can be applied for skew correction in documents with a constant skew angle [21]. Hough transform based methods handle documents with variation in the skew angle between text lines, but are not very effective when the skew of a text line varies along its width [22].
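The projection based idea is simple enough to sketch in a few lines. The function below (a NumPy sketch with a hypothetical name, illustrating the general technique rather than the specific algorithms of chapter 5) sums the ink pixels of a binarized image along each row and treats every run of non-zero rows as one text-line band:

```python
import numpy as np

def extract_line_bands(binary, min_height=2):
    """Find text-line bands from the horizontal projection profile of a
    binary image (text pixels = 1): runs of non-zero rows become lines."""
    profile = binary.sum(axis=1)          # ink pixels per row
    rows = profile > 0
    bands, start = [], None
    for i, on in enumerate(rows):
        if on and start is None:
            start = i                      # band opens
        elif not on and start is not None:
            if i - start >= min_height:
                bands.append((start, i))   # half-open row range [start, i)
            start = None
    if start is not None and len(rows) - start >= min_height:
        bands.append((start, len(rows)))
    return bands
```

On a clean, unskewed page each returned `(top, bottom)` pair delimits one line; the failure modes discussed above (variable skew, touching lines) show up here as bands that merge or never reach zero between lines.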

The best known of these segmentation algorithms are the following: X-Y cuts or projection profile based methods [23], the Run Length Smoothing Algorithm (RLSA) [24], component grouping [25], document spectrum [26], constrained text lines [27], Hough transform [28], [29], and scale space analysis [30]. All of the above segmentation algorithms are mainly devised for present-day documents. For historical and handwritten document segmentation, projection profiles [31], the Run Length Smoothing Algorithm [32], the Hough transform [33] and scale space analysis algorithms [34] are mainly used. As segmentation of historical document images is another focus of our research work, a detailed literature survey is given in the next chapter, and the algorithms developed for line segmentation are detailed in chapter 5.
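The projection profile also yields a simple global skew estimator: shear the ink coordinates by each candidate angle and keep the angle whose profile is "peakiest", since correctly deskewed text concentrates its ink into a few sharp rows. The sketch below (assuming NumPy; an illustrative baseline, not the skew detection method developed in chapter 5) scores candidates by the sum of squared row counts:

```python
import numpy as np

def estimate_skew(binary, angles=None):
    """Estimate the global skew angle (degrees) of a binary text image
    by maximizing the peakiness of the sheared projection profile."""
    if angles is None:
        angles = np.linspace(-10, 10, 41)   # -10..10 degrees, 0.5 degree steps
    ys, xs = np.nonzero(binary)             # coordinates of ink pixels
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        # shear each ink pixel back by the candidate angle
        rows = np.round(ys - xs * np.tan(np.radians(a))).astype(int)
        _, counts = np.unique(rows, return_counts=True)
        score = float((counts.astype(float) ** 2).sum())  # peaky profile wins
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle
```

This brute-force search inherits the limitation noted above for projection methods: it assumes a single constant skew for the whole page, so per-line skew (as handled in chapter 5) needs the same scoring applied to each extracted component separately.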

1.3.4 Feature Extraction and Recognition

Feature extraction involves reducing the amount of resources required to describe a large set of data accurately. When performing analysis of complex data, one of the major problems stems from the number of variables involved. Analysis involving a large number of variables generally requires a large amount of memory and computation power, and/or a classification algorithm which over-fits the training sample and generalizes poorly to new samples.

Feature extraction is a general term for methods that construct combinations of the variables to get around these problems while still describing the data with sufficient accuracy. Features are used as input to classifiers in order to classify and recognize objects; to recognize a character, features have to be extracted from the segmented document. The literature reveals a wide array of creative work in the diverse field of document image processing and recognition. Many authors have developed efficient algorithms for segmentation of documents into lines, words and characters [35], [36], and for feature extraction and classification of characters [37]. Feature extraction and recognition form an important part of the recognition system. The major feature extraction algorithms are based on structural features, statistical features and spectral methods. Structural features are based on topological and geometrical characteristics such as maxima and minima, reference lines, ascenders, descenders, strokes and their direction between two points, horizontal curves at top or bottom, cross points, end points, branch points, etc. [38]. A detailed literature survey on the enhancement, segmentation and recognition stages is presented in the next chapter.

Although significant efforts have been made to digitize historical content, understanding these documents is beyond the reach of the common man. The


underlying reason for this is that the character set has evolved and changed from ancient times to what it is now. The scripts/characters used to inscribe the contents are no longer prevalent. Hence expert knowledge is required to decipher these documents. In the present scenario, expert epigraphists are few in number and fast decreasing, which could lead to a major problem in deciphering these precious resources in the future. Hence there is a need to develop supplementary tools that recognize the era of a character, which in turn helps in referring to the corresponding character set to understand the document through applications of computer vision techniques.

Only a few authors have attempted to recognize Brahmi scripts and predict the corresponding era. Fewer still have worked on deciphering South Indian Kannada epigraphical (stone inscription) scripts and proposed algorithms for predicting the era of the script [8]. In our research work, palm leaf and paper manuscripts belonging to various eras are considered to predict the era of the document, and the algorithms devised for era prediction are provided in chapter 6.

1.4 Contribution

In this research work, the severe degradation of the documents has been addressed by developing spatial and frequency domain based algorithms. In the spatial domain, three algorithms have been designed, based on 1) Gray Scale Morphological Reconstruction (MR); 2) Bilateral Filtering; and 3) Non-Local Means filtering in combination with morphological operations. In the frequency domain, two algorithms have been devised using the wavelet and curvelet transforms.

In the spatial domain, a gray scale morphological reconstruction technique is devised using gray scale opening and closing operations. Gray scale opening is applied to compensate for non-uniform background intensity and suppress bright details smaller than the structuring element, while the closing operation suppresses the darker details. This algorithm is further used as a background elimination method in combination with the remaining algorithms in this thesis. This method works well for images


with less degradation. Severely degraded images are handled using a Bilateral Filter (BF) in combination with the gray scale morphological reconstruction technique. The bilateral filter based method, along with the MR algorithm, is employed to eliminate noise, enhance the contrast and eliminate the dark background. The bilateral filter is a non-linear filter which uses a combination of range filtering and domain filtering. A combination of the Non-Local Means Filter (NLMF) and the MR technique is employed in designing an enhancement algorithm to de-noise the documents based on a similarity measure between non-local windows.
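The background-elimination idea behind these spatial-domain methods can be sketched in a few lines. This is a minimal illustration using SciPy's grey-scale morphology, not the thesis implementation: grey-scale closing removes dark details (the strokes) smaller than the structuring element, yielding a background estimate that is then divided out. The structuring-element size is an assumed value.

```python
import numpy as np
from scipy import ndimage

def eliminate_background(img, size=15):
    """Normalise non-uniform illumination via grey-scale closing.

    Closing suppresses dark details smaller than the structuring
    element, leaving an estimate of the slowly varying background;
    dividing the image by it flattens the illumination."""
    img = img.astype(float)
    background = ndimage.grey_closing(img, size=(size, size))
    return np.clip(255.0 * img / np.maximum(background, 1.0), 0, 255)
```

On a page with dark text over an uneven bright background, the output maps clean background near white while the strokes stay dark.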

Since simple spatial domain techniques cannot handle all types of degradation, it becomes necessary to transform the problem into another domain to get better results. An attempt has been made to eliminate the noise using frequency domain based methods. An algorithm based on the wavelet transform is devised to analyse and enhance the image. Since the wavelet transform is unable to handle curve discontinuities, an extension of the wavelet transform known as the curvelet transform is used to design the second algorithm to enhance the degraded documents.
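The wavelet-domain enhancement idea can be illustrated with a single-level Haar decomposition and soft thresholding of the detail sub-bands. This is a generic sketch (the thesis does not specify the wavelet or the threshold rule used); the threshold value is an assumed parameter.

```python
import numpy as np

def haar2(img):
    """One level of the 2-D Haar transform (image sides must be even)."""
    a = (img[0::2] + img[1::2]) / 2.0      # row averages
    d = (img[0::2] - img[1::2]) / 2.0      # row details
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0   # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0   # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0   # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0   # diagonal detail
    return ll, lh, hl, hh

def ihaar2(ll, lh, hl, hh):
    """Exact inverse of haar2."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    out = np.empty((2 * a.shape[0], a.shape[1]))
    out[0::2], out[1::2] = a + d, a - d
    return out

def wavelet_denoise(img, thresh=10.0):
    """Soft-threshold the detail sub-bands, keep the approximation."""
    ll, lh, hl, hh = haar2(img.astype(float))
    soft = lambda c: np.sign(c) * np.maximum(np.abs(c) - thresh, 0.0)
    return ihaar2(ll, soft(lh), soft(hl), soft(hh))
```

With a zero threshold the round trip reconstructs the image exactly, which is a convenient sanity check on the transform pair.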

Due to the presence of uneven spaces, curved lines and touching lines in a historical document, the segmentation of the document becomes quite complicated. To address this problem, two segmentation algorithms have been proposed. The first algorithm, based on a piecewise projection profile, is suitable for extracting curved lines but fails to segment touching lines. Therefore a second algorithm, based on mathematical morphology and Connected Component Analysis (CCA), is developed to segment the touching lines; it handles touching as well as curved lines. An extended version of this algorithm, a combined approach of morphology and CCA, is designed to detect and correct skewed lines within the document. Handwritten documents usually contain uneven spacing, causing skewed lines. Detecting and correcting the skew of individual lines makes the segmentation task simpler.
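A minimal sketch of the morphology-plus-CCA idea: horizontally dilating the binary page merges the characters of each line into one blob, and connected-component labelling then isolates the lines even when they are curved. The smearing length is an assumed parameter, not a value from the thesis.

```python
import numpy as np
from scipy import ndimage

def segment_lines(binary, smear=25):
    """Label text lines in a binary page image (True = ink).

    A horizontal dilation merges the characters of each line into a
    single blob; connected-component analysis then returns one labelled
    component (and bounding box) per text line."""
    merged = ndimage.binary_dilation(binary, structure=np.ones((1, smear)))
    labels, n_lines = ndimage.label(merged)
    return ndimage.find_objects(labels), n_lines
```

Each returned slice pair is the bounding box of one detected line, which downstream stages can crop for character segmentation.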

The segmented characters are used in further stages of image processing viz. feature

extraction, recognition and classification. To recognize and classify the characters,

features of the individual characters have to be extracted and used as input to the


classifiers. In this research work, recognizing the era of the character is taken up so that the character set belonging to that era can be used to decipher the document. Hence, algorithms for era prediction of the segmented characters are devised using the curvelet transform.

1.5 Organization of the Thesis

The thesis is organized into seven chapters. Chapter one provides an introduction to

historical document image processing, motivation for the research and contribution

of the thesis. Chapter two presents the literature survey. Chapter three provides

the algorithms which are designed based on spatial domain techniques. Chapter four

explains the algorithms developed to enhance the historical document images based

on frequency (wavelet) domain techniques. Chapter five presents the algorithms

developed for segmentation of handwritten documents into lines and characters, along with skew detection and correction algorithms. Chapter six deals with the development of algorithms for feature extraction and recognition of the era of the character. Chapter seven provides the conclusion and future scope of the work.


Chapter 2

Literature Survey

2.1 Computer Vision

The visual system has been the greatest source of information for all living things since the beginning of history. To interact effectively with the world, the vision system must be able to extract, process, and recognize a large variety of visual structures from the captured images [1]. "One picture is worth a thousand words" is a well known saying describing the importance of visual data. Visual information transmitted in the form of digital images is becoming the major method of communication in the present scenario. This has resulted in a new field of computer technology known as Computer Vision [2]. It is a rapidly growing field with increasing applications in science and engineering, and it takes on the task of developing machines that can perform the visual functions of the eye. It is mainly concerned with modeling and replicating human vision using computer software and hardware [12], [13]. It combines knowledge from all fields of engineering in order to understand and simulate the operation of the human vision system.

Computer vision finds applications in various fields such as the military, medicine, remote sensing, forensic science, transportation, etc. Some of these applications are: content based image retrieval, automated image and video annotation, semantics retrieval, document image processing, data mining and warehousing, augmented reality, biometrics, non-photorealistic rendering, and knowledge extraction. These applications involve various sub-fields of Computer Vision such as Digital Image Processing, Pattern Classification and/or Object Recognition, Video Processing, Data Mining and Artificial Intelligence. These sub-fields are required to process the image/video data in various combinations to get the desired output.

One sub-field of computer vision is Document Image Analysis and Recognition (DIAR), which aims to develop techniques to automatically read and understand the contents of documents through machines. The DIAR system consists of four major stages: document image acquisition, image preprocessing, feature extraction and recognition. Document image acquisition deals with capturing the document image using scanners and cameras. Image preprocessing mainly deals with noise elimination, restoration and segmentation. Feature extraction deals with the extraction of the characteristic features of the segmented character (document) for recognition of the character. Pattern recognition or classification is mainly used to recognize the object/pattern in the image using the extracted features. In our research work, algorithms for image enhancement of historical documents, segmentation of the document and prediction/recognition of the era of the document are presented, and a detailed literature survey of these topics is given in the following sections.

2.2 Preprocessing and Segmentation

Often, degraded documents create problems in acquiring better quality images. In large-volume document digitization projects, the main challenge is to automatically decide on the correct and proper enhancement technique. Image enhancement techniques may adversely influence image quality if applied to the wrong image. Boutros [39] proposed a prototype which can automate the image enhancement process. It is clear that the quality of image acquisition affects the later stages of document image processing. Hence proper image preprocessing algorithms are needed.

Ideally, text line extraction would segment document images free of background noise and non-textual elements. In practice, it is very difficult to get document images without noise, so some preprocessing techniques need to be performed before segmentation.

Non-textual elements around the text such as book bindings, book sides, and parts of


fingers should be removed. On the document itself, holes and stains may be removed

by high-pass filtering. Other non-textual elements (stamps, seals), as well as ornamentation and decorated initials, can be removed using knowledge about the shape, the color

or the position of these elements. Extracting text from figures (text segmentation)

can also be performed on texture grounds [40], [41], or by morphological filters.

Intensive research work can be found on the development of algorithms for text line distortion [42], [43], [44]. These methods are aimed at solving the nonlinear folding of documents. Folding (warping) can sometimes become serious, rendering the contents of the document unreadable. Fan et al. [45] proposed a hybrid method combining two cropping algorithms, the first based on line detection and the second based on text region growing, to achieve robust cropping.

Jayadevan et al. [46] presented a survey on bank cheque processing. The work covers many aspects of document image processing. Almost all documents which are part of any organization, viz. business letters, newspapers, technical reports, legal documents and bank cheques, need to be processed to extract information. The authors discussed various aspects of cheque processing techniques. As cheques are scanned under various conditions, low contrast, slant and tilt are common problems. Cheques may also contain scratches, lines and overwritten ink marks on the cheque leaf. These create problems in recognizing the correct date, account number, amount, cheque number, etc. Cheque writers also often cross the text lines and write above the text line.

Suen et al. [47] proposed a method to process bank cheques in which the image was first smoothed using a mean filter and the background was then eliminated through iterative thresholding. Madasu and Lovell [48] proposed a bank cheque processing method based on gradient and Laplacian values, which are used to decide whether an image pixel belongs to the background or the foreground. The binarization approach proposed in [49] was based on Tsallis entropy to find the best threshold value, and histogram specification was adopted for preprocessing some images. To eliminate the background from the cheque image in [51], a stored background sample image was subtracted from the skew-corrected test image. This background subtraction method


was adapted to extract written information from Indian bank cheques. Erosion and

dilation operations were used to eliminate the background residual noise. Logical

smearing was applied with the help of end-point co-ordinates of detected lines to deal

with broken lines in [57].

Binarization of images is a very important step in any recognition system. A lot of work on finding a suitable threshold value for binarization can be found in the literature. Sahoo et al. [52] compared the performance of more than 20 global thresholding algorithms using uniformity or shape measures. The comparison showed that Otsu's class separability method [53] performed best.
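Otsu's criterion picks the threshold that maximises the between-class variance of the resulting foreground/background split. A compact NumPy version of the standard formulation (an illustrative sketch, not tied to any cited implementation):

```python
import numpy as np

def otsu_threshold(img):
    """Return the grey level maximising the between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                 # class-0 probability at each t
    mu = np.cumsum(p * np.arange(256))   # cumulative first moment
    mu_t = mu[-1]                        # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0  # empty classes get no score
    return int(np.argmax(sigma_b))
```

Pixels above the returned level form one class and pixels at or below it the other, which is exactly the class-separability split the criterion optimises.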

Sezgin and Sankur [54] discussed various thresholding techniques in their survey paper. The binarization algorithm proposed in [55] defines an initial threshold value using the percentage of the desired density of black pixels to appear in the final binarized image. To improve the efficiency of the algorithm, a cubic function was used to establish a relationship between the initial threshold value and the final one. In [56], the binarization of the grey-scale image was done with a threshold value calculated dynamically based on the number of connected components in the area of the courtesy amount.

Slant/skew is the deviation of handwritten strokes from the vertical direction (Y-axis) due to different writing styles. Skew may be introduced while scanning the documents and can be detected by finding the angle that the baseline makes with the horizontal direction. It has to be detected and corrected for successful segmentation and recognition of handwritten user input. Skew correction is done by simply rotating the image in the opposite direction by an angle equal to the inclination of the guidelines. A comprehensive survey of different skew detection techniques was reported in [50]. Due to the presence of guidelines, the histogram bin with the longest peak corresponds to the skew of the image. To correct the rotation and translation that occur during the image acquisition process, a method based on the projection profile has been used in [51].
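A common projection-profile formulation of this idea tries a range of candidate rotations and keeps the angle whose horizontal projection profile is sharpest, since text rows aligned with the image axis give the peakiest profile. The angle range and step below are assumed values, not taken from the cited work.

```python
import numpy as np
from scipy import ndimage

def estimate_skew(img, angles=np.arange(-10.0, 10.5, 0.5)):
    """Return the candidate rotation that best aligns text rows."""
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        rot = ndimage.rotate(img.astype(float), a, reshape=False, order=1)
        score = rot.sum(axis=1).var()  # peaky profile -> large variance
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```

Applying `ndimage.rotate(img, estimate_skew(img), reshape=False)` then performs the correction, i.e. a rotation opposite to the detected inclination.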


Kim and Govindaraju [58] proposed a chain code representation for calculating

the slant angle of handwritten information. In [59] and [60], the average slant of

a word was determined by an algorithm based on the analysis of slanted vertical

histograms [61]. The heuristic for finding the average slant was to search for the greatest positive derivative over all the slanted histograms; the slant was then corrected through a shear transformation in the opposite direction. Also, in [62] and [63], the slant of

handwritten information was computed using the histogram of the directions of the

contour pixels.

Many techniques have been developed for the page segmentation of printed documents, viz. newspapers, scientific journals, magazines and business letters produced with modern editing tools [64], [65], [66], [26]. The segmentation of handwritten documents has also been addressed through the segmentation of address blocks on envelopes and mail pieces [68], [67], [69], [70] and for authentication or recognition purposes [71], [72].

There are various methods available for text line extraction. One of the fundamental methods is the projection profile method, which is used for printed documents and handwritten documents with proper spacing between lines. The vertical projection profile is obtained by summing pixel values along the horizontal axis for each y value. The profile curve can be smoothed by a Gaussian or median filter to eliminate local maxima [34]. The profile curve is then analyzed to find its maxima and minima. There are two drawbacks: short lines will produce low peaks, and very narrow lines, as well as those including many overlapping components, will not produce significant peaks. In the case of skew or moderate fluctuations of the text lines, the image may be divided into vertical stripes and profiles sought inside each stripe [73]. These piecewise projections are thus a means of adapting to local fluctuations within a more global scheme.
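The profile-based extraction just described can be sketched as follows, with Gaussian smoothing standing in for the smoothing filter; the smoothing width and the peak-rejection factor are assumed values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def line_separators(binary, sigma=2.0):
    """Return candidate row indices separating text lines.

    The profile sums ink pixels per row; Gaussian smoothing removes
    local maxima; interior local minima that are well below the
    strongest peak become candidate cuts between lines."""
    profile = gaussian_filter1d(binary.sum(axis=1).astype(float), sigma)
    i = np.arange(1, len(profile) - 1)
    is_cut = ((profile[i] <= profile[i - 1]) &
              (profile[i] <= profile[i + 1]) &
              (profile[i] < 0.5 * profile.max()))
    return i[is_cut]
```

The same routine applied per vertical stripe gives the piecewise variant that tolerates moderate skew and curvature.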

In the work of Shapiro et al. [74], the global orientation or skew angle of a handwritten page was first found by applying a Hough transform to the entire image. Once this skew angle was obtained, projections were computed along this angle. The number of maxima of the profile gives the number of lines. Low maxima were discarded based on their value, which was compared to the highest maximum. Lines were delimited by strips, searching for the minima of projection profiles around each maximum.

In the work of Antonacopoulos and Karatzas [75], each minimum of the profile curve

was a potential segmentation point. Potential points were then scored according to

their distance to adjacent segmentation points. The reference distance was obtained

from the histogram of distances between adjacent potential segmentation points. The

highest scored segmentation point was used as an anchor to derive the remaining ones.

The method was applied to printed records of the Second World War which have

regularly spaced text lines. The logical structure was used to derive the text regions

where the names of interest can be found. The RXY cuts method applied in He and

Downton [31] uses alternating projections along the X and Y axes. This results in a

hierarchical tree structure. Cuts were found within white spaces. Thresholds were

necessary to derive inter-line or inter-block distances. This method can be applied

to printed documents (which are assumed to have these regular distances) or well-

separated handwritten lines.

For printed and binarized documents, smearing methods such as the Run-Length Smoothing Algorithm [76] can be applied. Consecutive black pixels along the horizontal direction are smeared: the white space between them is filled with black pixels if their distance is within a predefined threshold. The bounding boxes of the connected components in the smeared image then enclose text lines. A variant of this method, adapted to gray level images and applied to printed books from the sixteenth century, consists of accumulating the image gradient along the horizontal direction [77]. This method has also been adapted to old printed documents within the Debora project [78]; for this purpose, numerous adjustments were made concerning the tolerance for character alignment and line justification. Shi and Govindaraju [79] proposed a method for text line separation using a fuzzy run length, which imitates an extended running path through a pixel of a document image.
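The horizontal smearing rule of the RLSA can be written directly from its description: fill any white gap shorter than a threshold between two black pixels in a row. The threshold below is an assumed value; real systems tune it to the expected inter-character gap.

```python
import numpy as np

def rlsa_horizontal(binary, threshold=6):
    """Run-Length Smoothing along rows: white gaps shorter than
    `threshold` between two black pixels are filled with black."""
    out = binary.copy()
    for row in out:
        black = np.flatnonzero(row)
        for a, b in zip(black[:-1], black[1:]):
            if 1 < b - a <= threshold:
                row[a:b] = 1  # smear the gap
    return out
```

Labelling the connected components of the smeared image then yields one blob per text line, as the text above describes.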

The Hough transform [28] is a very popular technique for finding straight lines in images. It can also be applied to fluctuating lines in handwritten drafts [80]. An approach based on attractive-repulsive forces was presented by Oztop et al. [81]. It works directly on gray level images and consists of iteratively adapting the y position of a predefined number of baseline units. Baselines are constructed one by one from the top of the image to the bottom. Pixels of the image act as attractive forces for baselines, and already extracted baselines act as repulsive forces. Tseng and Lee [82] presented a method based on a probabilistic Viterbi algorithm, which derives non-linear paths between overlapping text lines. In the method of Likforman-Sulem et al. [33], touching and overlapping components are detected using the Hough transform. Pal and Datta [35] proposed a line segmentation method based on the piecewise projection profile.
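The standard Hough accumulation over (rho, theta) space, of which the cited methods are variants, can be sketched as follows (a minimal voting scheme without peak post-processing):

```python
import numpy as np

def hough_lines(binary, n_theta=180):
    """Vote in (rho, theta) space; accumulator peaks correspond to
    straight lines x*cos(theta) + y*sin(theta) = rho."""
    ys, xs = np.nonzero(binary)
    thetas = np.deg2rad(np.arange(n_theta))       # 0..179 degrees
    diag = int(np.ceil(np.hypot(*binary.shape)))  # largest |rho|
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    for j, t in enumerate(thetas):
        rho = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int) + diag
        np.add.at(acc, (rho, j), 1)               # one vote per ink pixel
    return acc, thetas, diag
```

A horizontal text line produces a peak at theta = 90 degrees with rho equal to its row index, which is why profile-free methods can read line positions straight off the accumulator.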

Some solutions for separating units belonging to several text lines can be found in the literature on recognition. In Bruzzone and Coffetti's method [83], the contact point between ambiguous strokes is detected and processed from their external border. An accurate analysis of the contour near the contact point is performed in order to separate the strokes according to two registered configurations: a loop in contact with a stroke, or two loops in contact. Khandelwal et al. [84] presented a methodology based on comparing neighboring connected components to check whether text components belong to the same line or not. Components less than the average height are ignored and addressed later in postprocessing.

A new algorithm for the segmentation of overlapping lines and multi-touching components has been proposed by Zahour et al. [85] using a block covering method with three steps. The first step classifies the document using fractal analysis and the fuzzy C-means algorithm. The second step classifies the blocks using a statistical analysis of block heights. The last step is a neighborhood analysis for constructing text lines. Fractal analysis and a fuzzy C-means classifier were used to determine the type of the document with high accuracy.

Bloomberg’s [87] text line segmentation algorithm was specially designed for sep-

arating text and halftone image from a document image. But it was unable to

discriminate between text and drawing type non-text components and therefore fails

23

Page 53: Automation of Preprocessing and Recognition of Historical Document Images

to separate them from each other. Hence Syed et al [88] presented a method to over-

come the Bloomberg’s algorithm and was able to separate text and non text regions

properly including halftones, drawings, map, graphs etc.

Bansal and Sinha [89] proposed an algorithm based on the structural properties of the Devanagari script. It was implemented in two passes: 1) words were segmented into characters/composite characters; 2) the height and width of the character box were used to check whether a segmented character is single or composite. Ashkan et al. [90] proposed a skew estimation algorithm using an eigenvalue technique to detect and correct skew in the document.

2.2.1 Enhancement of Historical Document Images

Ancient and historical documents differ strongly from recent documents because their layout structure is completely different. As these documents have a variable structure, extraction of the contents is complicated. Besides, historical documents are degraded in nature due to ageing, faint typing, ink seepage and bleed-through. They include various disturbing artifacts like holes, spots, ornamentation or seals. Handwritten pages include narrowly spaced lines with overlapping and touching components. Characters and words have unusual and varying shapes, depending on the writer, the period and the place.

Relatively good progress can be found in the area of historical document image processing. Shi and Govindaraju [91] proposed a method for the enhancement of degraded historical document images using background light normalization. In their work, the background intensity is captured with the help of a best-fit linear function and the image is normalized with respect to this approximation. Shi and Govindaraju [92] also proposed a method for the segmentation of historical document images using background light intensity normalization. Yan and Leedham [93] proposed a thresholding technique for the binarization of historical documents which uses local feature vectors for analysis.

Gatos et al. [18] presented a new adaptive approach for the binarization and enhancement of degraded documents. The proposed method does not require any parameter tuning by the user and can deal with degradations due to shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain. It consists of several distinct steps: a pre-processing procedure using a low-pass Wiener filter; rough estimation of foreground regions; background surface calculation by interpolating neighboring background intensities; thresholding by combining the calculated background surface with the original image while incorporating image up-sampling; and finally a post-processing step to improve the quality of text regions and preserve stroke connectivity.

Gatos et al. [95] presented a new approach for document image binarization. The

proposed method was mainly based on the combination of several state-of-the-art bi-

narization methodologies as well as on the efficient incorporation of the edge details

of the gray scale image. An enhancement step based on mathematical morphology operations was also involved in order to produce a high quality result while preserving stroke information. The proposed method demonstrated superior performance

against six well-known techniques on numerous degraded handwritten and machine

printed documents.

Shi et al. [96] proposed methods for enhancing digital images of palm leaf and

other historical manuscripts. They have approximated the background of a gray-

scale image using piece-wise linear and nonlinear models. Normalization algorithms

are used on the color channels of the palm leaf image to obtain an enhanced gray-

scale image. Experimental results have shown significant improvement in readability.

An adaptive local connectivity map has been used to segment lines of text from the enhanced images, with the objective of enabling techniques such as keyword spotting or partial OCR and thereby making it possible to index these documents for retrieval from a digital library.

A probabilistic model for text extraction from degraded document images has been presented in [86]. The document image is considered a mixture of Gaussian densities corresponding to the groups of pixels belonging to the foreground and background of the document image. The Expectation Maximization (EM) algorithm is used to estimate the parameters of the Gaussian mixture. Using these parameters, the image is divided into two classes, text foreground and background, using a maximum likelihood approach.
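The EM estimation just described can be sketched for 1-D pixel intensities with two Gaussian components; the initialisation and iteration count below are assumed choices, not taken from [86].

```python
import numpy as np

def em_two_gaussians(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture over intensities:
    component 0 models (dark) text, component 1 models background."""
    x = x.astype(float).ravel()
    mu = np.array([x.min(), x.max()])   # crude init: darkest / brightest
    var = np.full(2, x.var() + 1e-3)
    pi = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: responsibility of each component for each pixel
        pdf = pi / np.sqrt(2 * np.pi * var) * \
              np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        r = pdf / pdf.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood parameter updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / x.size
    labels = r[:, 1] > r[:, 0]          # ML class assignment per pixel
    return labels, mu
```

The final comparison of responsibilities is exactly the maximum-likelihood two-class split described in the text.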

2.2.2 Segmentation of Historical Documents

Louloudis et al. [94] presented a new text line detection method for unconstrained handwritten documents. The proposed technique is based on a strategy consisting of three distinct steps. The first step includes preprocessing for image enhancement, connected component extraction and average character height estimation. In the second step, a block-based Hough transform is used for the detection of potential text lines, while the third step is used to correct possible false alarms. The performance of the proposed methodology was measured with a consistent and concrete evaluation technique that relies on the comparison between the text line detection result and the corresponding ground truth annotation.

Surinta and Chamchong [36] presented a paper on the image segmentation of historical handwriting in palm leaf manuscripts. The process is composed of the following steps: background elimination to separate text and background by Otsu's algorithm, followed by line segmentation and character segmentation using the histogram of the image.

Shi et al. [97] presented a new text line extraction method for handwritten Arabic documents. The proposed technique is based on a generalized adaptive local connectivity map computed using a steerable directional filter. The algorithm was designed to solve particularly complex problems seen in handwritten documents, such as fluctuating, touching or crossing text lines.

Nikolaou et al. [98] presented a method towards the development of efficient techniques to segment document pages resulting from the digitization of historical machine-printed sources. To address the problems posed by degraded documents, they implemented an algorithm with the following steps: first, an Adaptive Run Length Smoothing Algorithm (ARLSA) handles the problem of dense and complex document layout; second, noise areas and punctuation marks that are usually present in historical machine-printed documents are detected; third, possible obstacles created from background areas are detected in order to separate neighboring text columns or text lines; and the last step performs segmentation using segmentation paths in order to isolate possibly connected characters.

The enhancement of documents with ink bleed-through using a recursive unsupervised classification technique has been proposed by Fadoua et al. [99]. The method recursively performs the K-means algorithm on the degraded image together with a principal component analysis of the document image. The cluster values are then back-projected onto the space. The iterative method finds a logarithmic histogram and separates background and foreground using the K-means algorithm until a clear separation of the background and foreground of the document is achieved.

Kishore and Rege [19] used unsharp masking to enhance the edge detail information in degraded documents. Gatos et al. [100] proposed a method mainly based on the combination of several state-of-the-art binarization methodologies as well as on the efficient incorporation of the edge information of the gray scale source image. An enhancement step based on mathematical morphology operations was also involved in order to produce a high quality result while preserving stroke information.

Halabi and Zaid [101] presented an enhancement system for degraded old documents. The developed system was able to deal with degradations which occur due to shadows, non-uniform illumination, low contrast and noise. Ferhat et al. [102] proposed image restoration using Singular Value Decomposition, restoring even blurred images.

Lu and Tan [14] proposed a technique which estimates the document background surface using an iterative polynomial smoothing procedure. Various types of document degradation are then compensated by using the estimated background surface intensity. Using the L1-norm image gradient, text stroke edges are detected from the compensated document image. Finally, the document text is segmented by a local threshold that is estimated based on the detected text stroke edges. Ntogas and Ventzas [15] proposed a binarization procedure consisting of five discrete steps in image processing for different classes of document images.
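The polynomial background-surface idea can be illustrated by fitting a low-order polynomial to each row and dividing it out. This reduces the iterative smoothing procedure of Lu and Tan to a single non-iterative pass, so it is only a sketch, and the polynomial degree is an assumed value.

```python
import numpy as np

def polynomial_background(img, degree=3):
    """Estimate a smooth background surface row by row with a low-order
    polynomial fit, then compensate the image by dividing it out."""
    img = img.astype(float)
    xs = np.arange(img.shape[1])
    bg = np.empty_like(img)
    for i, row in enumerate(img):
        bg[i] = np.polyval(np.polyfit(xs, row, degree), xs)
    compensated = np.clip(255.0 * img / np.maximum(bg, 1.0), 0, 255)
    return compensated, bg
```

On a page whose illumination varies smoothly across each row, the compensated output is nearly flat, which is the precondition for the subsequent local thresholding step described above.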


Badekas and Papamarkos [103] proposed a new method that estimates the best parameter values for each of the document binarization techniques and also selects the best binarization result among all the techniques. Laurence Likforman-Sulem et al. [16] presented a novel method for document enhancement which combines two recent powerful noise-reduction steps: the first is based on the total variation framework and the second on Non-Local Means. The computational complexity of the Non-Local Means filter depends on the size of the patch and the search window. Layout analysis is

required to extract text lines and identify the reading order properly which provides

proper input to classifiers.

A generic layout analysis for a variety of typed, handwritten and ancient Arabic document images has been proposed in [104]. The proposed system performs text and non-text separation, then text line detection, and lastly reading order determination. This method can be combined with an efficient OCR engine for digitization of documents. A considerable amount of work on segmentation of historical documents can be found in [105]. Hanault et al. [106] proposed a method based on the linear level set concept for binarizing degraded documents. This method takes advantage of

the local probabilistic models and flexible active contour scheme. In the next section,

we present detailed literature survey on character recognition.

2.3 Character Recognition

The history of character recognition can be traced back as far as 1940, when the Russian scientist Tyurin attempted to develop an aid for the visually handicapped [107]. The first character recognizers appeared in the mid-1940s with the development of

digital computers. Early work on automatic recognition of characters concentrated

either upon machine printed content or on a small set of well distinguished handwrit-

ten texts or symbols. Machine printed OCR systems in that period generally used

template matching in which an image is compared to a library of images. For hand-

written text, low level image processing techniques were used on the binary images to

extract feature vectors, which are then fed to statistical classifiers. With the explo-

sion of information technology, the previously developed methodologies found a very


fertile environment for rapid growth in many application areas as well as OCR sys-

tems development [108], [109]. Structural approaches were initiated in many systems

in addition to statistical methods [110], [111].

The character recognition research was focused basically on the shape recognition

techniques without using any semantic information. This led to an upper limit in the

recognition rate, which was not sufficient in many practical applications. A historical review of OCR research and development during this period, covering both offline and online cases, can be found in [112].

Stubberud et al. [113] proposed a method to improve the performance of an optical

character recognition (OCR) system, by using an adaptive technique that restores

touching or broken character images. By using the output from an OCR system

and a distorted text image, this technique trains an adaptive restoration filter and

then applies the filter to the distorted text image that the OCR system could not

recognize.

Indian language character recognition systems are still in the research stage. Most

of the research work is concerned with characters of the Devanagari and Bangla scripts, used by the two most popular languages in India. Research work on Bangla character recognition

started in the early 90s. Chaudhuri and Pal [114] have discussed different works done

for Indian script identification. They have also discussed the various steps needed to

improve Indian script OCR development and have developed a complete OCR system

for printed Bangla script. This approach involved skew correction, segmentation

and removal of noise. A technique with feature and template matching has been

implemented for recognition. A higher recognition rate was achieved in this method.

Sural and Das [115] have proposed a Hough transform based fuzzy feature ex-

traction method for Bangla script recognition. Some studies are reported on the

recognition of other languages like Tamil, Telugu, Oriya, Kannada, Punjabi, Gu-

jrathi, etc. Pal et al. [116] presented an OCR with error detection and correction

technique for a highly inflectional Indian language, Bangla. The technique was based

on morphological parsing where using two separate lexicons of root words and suffixes,

29

Page 59: Automation of Preprocessing and Recognition of Historical Document Images

the candidate root-suffix pairs of each input string, are detected, their grammatical

agreement was tested and the root/suffix part in which the error occurred was noted.

The correction was made to the corresponding error part of the input string by means

of a fast dictionary access technique.

Pal and Chaudhuri [117] have proposed a system for classification of machine

printed and hand written text line. They have used a method based on structural and

statistical features of the machine printed and handwritten text lines. They achieved

a score of 98.6% in recognition. This technique used string features extracted through

row and column wise scanning of character matrix.

Pal et al. [118] proposed a new method for automatic segmentation of touching numerals using the water reservoir concept. A reservoir is a metaphor for the region where numerals touch, obtained by considering the accumulation of water poured from the top or from the bottom of the numerals. The touching position (top, middle or bottom) can be decided by considering reservoir location and size.

Next, analyzing the reservoir boundary, touching position and topological features of

the touching pattern, the best cutting point can be determined. By combining with

morphological structural features the cutting path was generated for segmentation.

Tree classifiers and neural network classifiers based on structural and topological features have been used for most of the Indian languages [119].

Some work on recognition of Telugu characters can be traced in the literature. Elastic matching using eigen-deformations for handwritten character recognition was proposed by Uchida and Sakoe [120]. The accuracy of recognition was found to be

99.47%. The deformations within each character category are of intrinsic nature

and can be estimated by the principal component analysis of the actual deformation

automatically collected by the elastic matching.

Pujari et al. [121] have proposed an algorithm for Telugu character recognition that

uses wavelet multi resolution analysis to extract features and associative memory

model to accomplish the recognition tasks. Multifont Telugu character recognition

algorithm was proposed by Rasanga et al. [122] using spatial feature of histogram

30

Page 60: Automation of Preprocessing and Recognition of Historical Document Images

of orientation(HOG). Sastry et al. [123] implemented a methodology to extract and

recognize the Telugu character from palm leaf using decision tree concept.

Human machine interaction using optical character recognition for Devanagari

scripts has been designed by [124]. Shelke and Apte [125] proposed a novel method to recognize handwritten characters using feature extraction based on structural features

and the classification was done using their parameters. The final stage of feature

extraction was done by radon transform and classification was carried out with the

combination of Euclidean distance, feed forward and back propagation neural net-

works. The extended version of their paper, feature extraction, employs generation

of kernels using wavelet transform [126] and Neural networks [127]. Malayalam char-

acter recognition was proposed by John et al. [128] using Haar wavelet transform as

feature extraction approach and support vector machine as classifier. Pal et al. [129]

proposed a method to recognize unconstrained handwritten Malayalam numerals using the reservoir method. The main reservoir based features used were the number of reservoirs,

positions of reservoirs with respect to bounding box of the touching pattern, height

and width of the reservoirs, water flow direction, etc. Topological and structural features were also used along with the water reservoir features.

Nagabhushan and Pai [130] have worked on Kannada Character Recognition area.

They proposed a method for the recognition of Kannada characters, which can have

spread in vertical and horizontal directions. The method uses a standard sized rect-

angle which can circumscribe standard sized characters. This rectangle can be interpreted as a 2-dimensional 3×3 structure of nine parts, defined as bricks. This

structure was also interpreted as consecutively placed three row structures of three

bricks each or adjacently placed three column structures of three bricks each. The

recognition has been done based on an optimal depth logical decision tree developed

during the Learning phase and did not require any mathematical computation.

A printed Kannada character recognition system was designed by Ashwin and Sastry [131] using a zonal approach and a support vector machine (SVM). In their zonal approach, the character image is divided into a number of circular tracks and sectors.


Since Kannada characters are round in appearance, text (ON) pixels in the radial and angular directions are effective in capturing the shape of the characters. The number of ON pixels in each zone is taken as the feature set for recognition. They claimed that their method was eight times faster than the Zernike moment based method.

Kunte and Samuel [37] presented an OCR system developed for the recognition of basic characters (vowels and consonants) in printed Kannada text, which can handle different font sizes and font types. Hu's invariant moments and Zernike moments, which have been progressively used in pattern recognition, are used in the system to extract the features of printed Kannada characters. Neural classifiers have

been effectively used for the classification of characters based on moment features.

An encouraging recognition rate of 96.8% has been obtained.

Chaudhuri and Bera [132] have proposed a method for text line identification of

handwritten Indian scripts, especially of Bangla, as well as English, Hindi, Malay-

alam, etc. They have used a new dual method based on the interdependency between text-line and inter-line gap. The method draws curves simultaneously through the

text and inter-line gap points found from strip-wise histogram peaks and inter-peak

valleys. The approach worked well on text of different scripts with various geometric

layouts, including poetry. Lakshmi and Patvardhan [133] developed a Telugu OCR system. Kokku and Chakravarthy [134] have developed a complete OCR system for Tamil magazine documents which uses an RBF neural network for text identification and

character recognition. Shashikiran et al. [135] implemented a method to compare

the results of HMM (Hidden Markov Model) and Statistical Dynamic Time Warping

(SDTW) as classifier for Tamil on-line handwritten character recognition and have

shown that SDTW was better than other methods.

Hirabara et al. [136] presented a two-level character recognition method based on a dynamic zoning selection scheme. Features are extracted from each zone for character recognition, and a neural network with a look-up table was employed to find the best zoning scheme for an unknown English character. Zoning

is a simple way to obtain local information and it has been used for extraction of

topological information from patterns [137]. The goal of zoning is to obtain local


characteristics as opposed to global characteristics. The resulting partitions make it possible to determine the position of specific features of the pattern to be recognized [138], using fixed or symmetrical zoning [139], [140], [141].

Online handwriting recognition of Kannada characters was implemented by combining the Direction based Stroke Density (DSD) principle with a Kohonen Neural Network. The DSD principle forms the basis for feature selection, whereas the subsequent classification stage is carried out by K-nearest neighbor [142]. Another work on

online handwritten Kannada character recognition proposed by Prasad et al. [143],

used the divide and conquer technique to reduce the number of combinations in compound characters, as they contain more than one consonant combined with vowels. The

structural and the dynamic features are used to segment the compound Kannada

characters into 282 distinct symbols. This reduction has helped to overcome the

huge data collection problem and also reduced the computational complexity. In the

next step, these symbols are further divided into three distinct sets of stroke groups,

thus, further reducing the search space for the recognition engine. Combining one

or more of these stroke groups will usually form thousands of Kannada compound

characters. PCA was used as dimensionality reduction method. The subspace fea-

tures of distinct stroke groups are fed to the respective classifiers in an order and the

output of these classifiers are combined to get the unicode of the recognized akshara.

The proposed work was the first attempt for the Kannada language that considers all possible combinations of symbols, including Kannada numerals.

Kunte and Samuel [144] have proposed another algorithm to address the problem of vatthaksharas (consonant conjuncts) using connected component analysis (CCA) and projection profiles. Initially CCA was used to extract the individual characters, and then the vertical projection profile was employed to extract the remaining characters. The authors claimed that the

proposed method works well for other languages. Urolagin et al. [145] proposed a method for Braille translation of Kannada characters. They have employed a decision

tree and three modular classifiers. Similar shaped characters were grouped and then

partitioned into categories at various levels to effectively create a decision tree. The

Braille equivalent of Kannada characters was obtained by using translation rules.

The authors claimed 93.8% accuracy in classification and translation. Sheshadri et al. [146] proposed a method for segmenting Kannada characters by decomposing each character into components from three base classes; the K-means clustering technique was then employed to recognize the character.

Dandra et al. [147] proposed a method for recognition of handwritten Kannada

and English characters using zonal method. Each character was divided into 64 zones

and pixel densities were calculated for normalized character image size of 32 × 32.

Two different classifiers were employed to classify the characters and their performance was compared. The authors claimed that their system works for non-thinned and slanted characters.
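The 64-zone pixel-density feature described above (a 32 × 32 character image split into an 8 × 8 grid of 4 × 4 zones) can be sketched as follows; the function name is illustrative, not from [147].

```python
import numpy as np

def zonal_density(char_img):
    """Split a 32x32 binary character image into an 8x8 grid of 4x4 zones
    and return the 64 per-zone pixel densities (a sketch of the zonal
    feature scheme described in the text)."""
    assert char_img.shape == (32, 32)
    # reshape rows and cols into (block, offset) pairs, then group blocks
    zones = char_img.reshape(8, 4, 8, 4).swapaxes(1, 2)   # (8, 8, 4, 4)
    return zones.reshape(64, 16).mean(axis=1)             # density per zone

img = np.zeros((32, 32))
img[0:4, 0:4] = 1            # fill exactly the top-left zone
feat = zonal_density(img)
```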

The recognition of Indian and Arabic handwriting is drawing increasing attention

in recent years. To test the promise of existing handwritten numeral recognition

methods and provide new benchmarks for future research, Chang et al. [148] pre-

sented some results of handwritten Bangla and Farsi numeral recognition on binary

and gray-scale images. For recognition of gray-scale images, they have proposed a

method with proper image pre-processing and feature extraction. Experiments on

three databases, ISI Bangla numerals, CENPARMI Farsi numerals, and IFHCDB

Farsi numerals have achieved very high accuracies using various recognition meth-

ods.

2.4 Summary

In this chapter, we have given a detailed literature survey on image enhancement,

segmentation, skew detection and correction, feature extraction and recognition. In

the next chapter, we present image enhancement algorithms using spatial domain

techniques to enhance historical document images.


Chapter 3

Enhancement of Degraded

Historical Documents : Spatial

Domain Techniques


3.1 Introduction

A digital image is an image f(x, y) that has been discretized in both spatial coordi-

nates and intensity. Both spatial coordinate and intensity value constitute the pixel

or picture element. Processing of the image is performed through pixel operations

1Some of the material of this chapter appeared in the following research papers

1. B. Gangamma, Srikanta Murthy K, “Enhancement of Degraded Historical Kannada Documents”, International Journal of Computer Applications (0975–8887), Volume 29, No. 11, pages 1-6, September 2011.

2. B. Gangamma, Srikanta Murthy K, “An Effective Technique using Non Local Means and Morphological

Operations to Enhance Degraded Historical Document”, International Journal of Electrical, Electronics and

Computer Systems, Volume 4, Issue 2, pages 1-10, 2011.

3. B. Gangamma, Srikanta Murthy K, “Enhancement of Historical Document Image using Non Local Means

Filtering Technique”, IEEE International Conference on Computational Intelligence and Computing Research

(ICCIC), Kanyakumari, pages 1264-1267, 2011.

4. B. Gangamma, Srikanta Murthy K , Arun Vikas Singh, “Hybrid Approach Using Bilateral Filter and Set

Theory for Enhancement of Degraded Historical Document Image”, CiiT International Journal of Digital

Image Processing, DOI: DIP052012012, Volume 4, No 9, Issue May 2012, pages 488-496, 2012.


and any change in the pixel intensity or spatial coordinate values changes the input

image. In general, image processing operations can be divided into four categories:

pixel operations, local operations, global operations, and geometric operations[149].

Pixel operations operate on individual pixels. Some of the examples are image in-

tensity addition, subtraction, contrast stretching, inversion of an image, logarithmic

and power-law transformation. In local operations, a pixel's value is influenced by its neighboring pixel values. The size of the neighborhood depends on the type of ap-

plications. Morphological filters, convolution, edge detection, smoothing filters and

sharpening filters are operations that fall under this category. Global operations take the entire image into consideration when processing a pixel. To name a few:

distance transformation of an image, histogram equalization and specification, im-

age warping, Hough transform, spatial-frequency domain transforms, and connected

components analysis, etc. Geometric operations take only the required set of pixels, calculated by a geometric transformation, and change the values of specified pixels.

However, usage of these techniques for enhancing historical documents is not straightforward. As discussed in chapter 1, Kannada historical documents pose various problems such as low contrast, uneven background, noise accumulation, broken, erased and blotched characters, cracks, and holes (palm leaf and paper); the enhancement of such degraded document images is a real challenge to the research community. Hence there is a dire need for image processing algorithms that enhance these degraded document images by eliminating noise and uneven backgrounds, and also enhance the characters [149] for further recognition. In this chapter, we present three spatial domain preprocessing techniques to enhance the degraded historical Kannada document images.

It has been observed from the literature survey that a reasonable amount of work

has been reported in the area of historical document image processing [18], [95], [15],

[16]. Noticeable amount of work can also be found in the area of historical document

processing of Indic scripts [4], [91], [92], [96], [19], [8]. A few authors have worked towards noise elimination, segmentation and era prediction of Kannada stone inscriptions. However, the literature survey reveals that not much work has been carried

out on Kannada documents inscribed on the palm leaf and paper. In this research

work, we consider all three types: stone inscriptions, palm and paper documents


under one heading as historical documents. In this chapter, we present three spatial

domain techniques to enhance the degraded documents.

The remaining part of this chapter is organized into the following sections. In section 3.2, we explain the methodology based on the gray scale morphological reconstruction technique; in section 3.3, we present another image enhancement method based on the bilateral filter; in section 3.4, we explain the Non-Local Means filter based approach for image enhancement. Experimental results are

compared for the three techniques in section 3.5 and the summary of the work is

provided in section 3.6.

3.2 Gray Scale Morphological Reconstruction (MR)

Based Approach

The basic goal of any document image enhancement technique is to enhance the im-

age for binarization so that binarized image can be segmented into two classes: one

is the foreground containing text and the other is a clear background. Global thresholding algorithms work well for clean images. Local thresholding methods such as Sauvola and Niblack work well for low contrast and unevenly illuminated images. However, these methods cannot be used directly on degraded documents. The results of binarization (thresholding) using a global thresholding method on the noisy images shown in Figure(3.5) and Figure(3.7) are shown in Figure(3.6) and Figure(3.8). These bi-

narized images are not suitable for segmentation of document image into lines, words

and characters which are further used to recognize the character. Any recognition sys-

tem is completely dependent on the output of its previous stages. So, there is a need for image enhancement techniques, or a combination of such techniques, to enhance the image so that the enhanced image can be binarized directly using global thresholding methods. In this section, we present an algorithm for degraded document image enhancement using a combination of Adaptive Histogram Equalization (AHE), the MR technique and a Gaussian filter. AHE is used for contrast enhancement of low contrast images. Gray scale morphological operations, opening and closing, are used for elimination of uneven background and for suppressing finer details. The Gaussian filter


is employed to suppress the background and to normalize the background intensity

along with smoothing. In the following subsections, we present a brief explanation of the techniques used in this method.
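The global-thresholding step discussed above can be sketched with Otsu's method, which picks the gray level that maximizes the between-class variance of the intensity histogram; local methods such as Sauvola instead compute a threshold per window. This is a self-contained sketch, not the exact implementation used in this work.

```python
import numpy as np

def otsu_threshold(gray):
    """Global (Otsu) threshold: choose the level maximizing between-class
    variance. Works well on clean bimodal images; degraded documents
    usually need enhancement first."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()        # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0   # class means
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2           # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# clean bimodal image: ink at 40, background at 200
img = np.full((10, 10), 200, dtype=np.uint8)
img[3:5, 2:8] = 40
t = otsu_threshold(img)
binary = img < t          # True where text
```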

3.2.1 Overview of Mathematical Morphology

Image-processing techniques have developed exponentially in the past five decades

and among them, mathematical morphology has received a great deal of interest be-

cause it provides a quantitative description of geometric structure and shape while

also providing a mathematical description of algebra, topology, probability, and in-

tegral geometry [150]. Very few authors have used mathematical morphology for

document image enhancement. Ye et al. [151] proposed a method for extraction of

bank check items using morphological operations. Shetty and Sridhar [152] proposed

a method for background elimination of bank checks using gray scale morphological

operations. Mengucci and Granado [153] implemented a method for separating text

and figures from the book using morphological operations.

Mathematical morphology is a tool used for extracting image components that

are useful for representation and description of region of shape, such as boundaries,

skeletons and the convex hull. It can also be used as a tool for pre or post processing

such as, morphological filtering, thinning, and pruning [154].

The two basic morphological set transformations are erosion and dilation. These

transformations involve the interaction between an image A (the object of interest)

and a structuring set B, called the structuring element. Typically the structuring

element B is a circular disc or rectangle in the plane, but it can be of any shape and

any dimension.

Dilation: With A and B as sets in Z² (the set of integers), the dilation of A by B, denoted A ⊕ B, is defined as

A ⊕ B = {z | (B̂)z ∩ A ≠ ⊘} (3.1)

where B̂ is the reflection of B about its origin and (B̂)z is its translation by z.

Erosion: With A and B as sets in Z², the erosion of A by B, denoted A ⊖ B, is defined as

A ⊖ B = {z | (B)z ⊆ A} (3.2)



Figure 3.1: (a) Input image. (b) Result of binary morphological dilation operation.

(c) Result of binary morphological erosion operation.

where A in equations (3.1) and (3.2) is the object image and B is a structuring element of any size, not larger than A. Dilation expands the region of the

object. Erosion shrinks or thins objects in an image. Figure(3.1) shows the result of

dilation and erosion on object image A by structuring element B.
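Equations (3.1) and (3.2) can be demonstrated on a small binary image; the sketch below uses SciPy's binary morphology routines with a 3 × 3 structuring element, and the sizes are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

A = np.zeros((7, 7), dtype=bool)
A[2:5, 2:5] = True                    # a 3x3 square object
B = np.ones((3, 3), dtype=bool)       # 3x3 structuring element

dilated = binary_dilation(A, structure=B)   # object grows to 5x5
eroded = binary_erosion(A, structure=B)     # object shrinks to 1 pixel
```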

Opening and Closing: Erosion and dilation can be used in a variety of ways,

in parallel and series, to give other transformations including thickening, thinning,

skeletonization and many others. Opening and closing are two very important trans-

formations. Opening generally smooths the contour of an object, breaking narrow isthmuses and eliminating thin protrusions. Closing also tends to smooth sections of contours but, as opposed to opening, it fuses narrow breaks and long thin gulfs, eliminates small holes, and fills gaps in contours [154]. The opening of image A by structuring element B, denoted by

A ◦B, is given by the erosion by B, followed by the dilation by B, that is

A ◦B = (A⊖ B)⊕ B (3.3)

Opening is like rounding from the inside of an object/structure. The opening of

A by B is obtained by taking the union of all translates of B that fit inside A; parts of A that are smaller than B are removed. Closing is the dual operation of

opening and is denoted by A •B. It is produced by the dilation of A by B, followed

by the erosion by B:

A •B = (A⊕ B)⊖ B (3.4)

Figure(3.2) shows the application of opening and closing operation on input image.

The closing operation smooths the object from the outside: holes are filled in and narrow



Figure 3.2: (a) Input image. (b) Result of binary morphological opening operation.

(c) Result of binary morphological closing operation.

valleys are closed. Because opening suppresses bright details smaller than the structuring element and closing suppresses dark details, they are

used in combination for image smoothing and noise removal. Opening can be used

to compensate for non-uniform background illumination. Also, subtracting an opened image from the original image (the top-hat operation) produces an even background [154].

In binary morphology, both the image A and the structuring element B are binary images, and the operations applied to the two sets are logical operations such as AND, OR, and COMPLEMENT. The output is also a binary image. Gray scale morphology operations are based on finding local maxima and minima in a specified window. In many cases, gray scale morphological

processing adopts symmetrical structuring elements so as to reduce computational

complexity [150]. The erosion of A by structuring element B at any location(x,y) is

defined as the minimum value of the image in the specified local region centred at

(x,y). Gray scale erosion is defined by following equation

[A⊖B](x, y) = min(s,t)∈B{f(x+ s, y + t)} (3.5)

Gray scale erosion computes the minimum intensity value of A in every local region, so the eroded image will be darker than the original image. Noise features smaller than the structuring element are eliminated. The gray scale dilation of A by B is defined by finding the maximum value of the image in the window outlined by B and is given by

[A⊕ B](x, y) = max(s,t)∈B{f(x− s, y − t)} (3.6)



Figure 3.3: (a) Original gray scale image. (b) Result of gray scale dilation operation on the image. (c) Result of gray scale erosion operation on the image.


Figure 3.4: (a) Original Gray scale image. (b) Result of gray scale closing operation

on image. (c) Result of gray scale opening operation on image.

Results of the gray scale dilation and erosion operations on an input image are shown in Figure(3.3).

Gray scale Opening and Closing: The formulae for opening and closing in gray scale morphology are the same as in binary morphology, as specified in equations (3.3) and (3.4). Figure(3.4) shows the result of gray scale closing and opening operations. In order to enhance the degraded document, the gray scale opening and closing operations are used with the reconstruction technique. Before applying these operations, the contrast of the input image is enhanced using adaptive histogram equalization.
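The use of gray scale closing to estimate and remove an uneven background, as in the method above, can be sketched as follows; the sizes and intensities are illustrative. Closing with an element larger than the stroke removes the dark text, leaving an estimate of the background surface, which can then be subtracted.

```python
import numpy as np
from scipy.ndimage import grey_closing

# dark text stroke (intensity 50) on a bright, uneven background
x = np.linspace(150, 250, 32)
img = np.tile(x, (32, 1))              # background brightens left to right
img[10:12, 4:28] = 50                  # thin horizontal stroke

# closing with a 5x5 element (larger than the stroke height) removes the
# dark text and estimates the background surface
background = grey_closing(img, size=(5, 5))
flattened = background - img           # text bright, background near zero
```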

Morphological image processing also deals with one more concept, called Morphological Reconstruction, which builds on the dilation, erosion, opening and closing operations. Morphological reconstruction is based on two images, a marker and a mask, rather than an image and a structuring element. Processing is


completely based on the concept of connectivity, rather than a structuring element.

In our method, we use the morphological reconstruction technique based on opening and closing. The main aim is to obtain a clear background by suppressing the noise.
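Morphological reconstruction by dilation can be sketched as iterated geodesic dilation of the marker, clipped to the mask, until stability; the sketch below is a straightforward, unoptimized illustration, not the exact procedure of the proposed method.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def reconstruct_by_dilation(marker, mask):
    """Morphological reconstruction: iteratively dilate the marker and
    clip it to the mask until the result stops changing (a sketch)."""
    prev = marker.copy()
    while True:
        cur = np.minimum(grey_dilation(prev, size=(3, 3)), mask)
        if np.array_equal(cur, prev):
            return cur
        prev = cur

# opening by reconstruction: erode, then rebuild surviving structures
img = np.zeros((9, 9))
img[2:7, 2:7] = 100.0                  # large object: survives
img[0, 8] = 100.0                      # isolated speck (noise): does not
marker = grey_erosion(img, size=(3, 3))
opened_rec = reconstruct_by_dilation(marker, img)
```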

3.2.2 Adaptive Histogram Equalization(AHE)

Histogram equalization is a technique used to adjust the image intensities in order

to enhance the contrast. Sometimes, the overall histogram of an image may have a

wide distribution, while the histogram of local regions is highly skewed towards one

end of the gray spectrum. In such cases, it is often desirable to enhance the contrast

of these local regions, rather than entire region using global histogram equalization.

To enhance the contrast effectively, the AHE method is employed. In AHE, the image is divided into a number of regions, and different regions of the image are processed differently depending on their local properties.
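A simplified tile-wise AHE can be sketched as below; real AHE/CLAHE implementations additionally interpolate between neighboring tile mappings to avoid block artifacts, so this is only an illustration of the per-region idea.

```python
import numpy as np

def equalize(tile):
    """Plain histogram equalization of one uint8 region."""
    hist = np.bincount(tile.ravel(), minlength=256)
    cdf = hist.cumsum() / tile.size
    return (cdf[tile] * 255).astype(np.uint8)

def adaptive_equalize(gray, tiles=4):
    """Simplified AHE: equalize each tile of the image independently."""
    out = np.empty_like(gray)
    h, w = gray.shape
    th, tw = h // tiles, w // tiles
    for i in range(tiles):
        for j in range(tiles):
            r, c = i * th, j * tw
            out[r:r+th, c:c+tw] = equalize(gray[r:r+th, c:c+tw])
    return out

# a dark, low-contrast image: AHE spreads each region over the full range
img = (np.arange(64, dtype=np.uint8).reshape(8, 8) // 4 + 60)
enh = adaptive_equalize(img, tiles=2)
```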

3.2.3 Gaussian Filter

Smoothing filters are used to smooth a noisy image; smoothing eliminates noise but also blurs the image. The Gaussian smoothing filter (operator) is a 2-D convolution operator that is used to blur images and remove detail and noise. It is like a mean filter, but it uses a different kernel that represents

the shape of a Gaussian (bell-shaped) hump. The Gaussian distribution in 1-D has

the form:

G(x) = (1/(√(2π) σ)) e^(−x²/(2σ²)) (3.7)

where σ is the standard deviation of the distribution of the Gaussian kernel. In

2-D, an isotropic (i.e. circularly symmetric) Gaussian has the form:

G(x, y) = (1/(2πσ²)) e^(−(x² + y²)/(2σ²)) (3.8)
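Sampling equation (3.8) on a discrete grid and normalizing the weights gives the convolution kernel used in practice; a minimal sketch with an illustrative σ and radius:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Sample the 2-D Gaussian of equation (3.8) on a grid and normalize
    the weights so they sum to 1."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()

k = gaussian_kernel(sigma=1.0, radius=2)   # 5x5 kernel, peak at center
```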

The degree of smoothing is determined by the standard deviation of the Gaussian kernel. The Gaussian outputs a weighted average of each pixel's neighborhood, with the average weighted more towards the values of the central pixels.

Because of this, a Gaussian provides gentler smoothing and preserves edges better


than a similarly sized mean filter. In the next subsection, we present the proposed methodology, which enhances the degraded document using a combination of AHE, gray scale morphological operations and a Gaussian filter to normalize the background intensity.

Figure 3.5: Noisy palm leaf document image belonging to 16th century.


Figure 3.6: Binarized noisy images of Figure(3.5).

3.2.4 Proposed Methodology

The proposed method consists of four major stages, shown in the flow chart in Figure(3.9), and detailed in the following paragraphs.

Stage 1: The degraded noisy original color image with low contrast and uneven illumination is taken as input. The color image is converted to a gray scale image using


Figure 3.7: Original image of palm leaf script belonging to 16th century.

Figure 3.8: Binarized noisy image of Figure(3.7).

equation(3.9).

Y = 0.2126 R + 0.7152 G + 0.0722 B (3.9)

where Y is the gray scale image and R, G, B are the red, green and blue components of the color image. Processing a color image is complex and is not necessary for our work, as we need to binarize the document image for segmentation and recognition. Also, we are concentrating on the foreground (character) and background pixel intensities.
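Equation (3.9) is a single dot product per pixel, as the following minimal NumPy sketch shows:

```python
import numpy as np

def to_gray(rgb):
    """Luma conversion of equation (3.9):
    Y = 0.2126 R + 0.7152 G + 0.0722 B.
    `rgb` is an (H, W, 3) array, uint8 or float; returns float64."""
    coeffs = np.array([0.2126, 0.7152, 0.0722])
    return rgb[..., :3].astype(np.float64) @ coeffs
```

Since the three coefficients sum to 1.0, a pure white pixel (255, 255, 255) maps to 255, so the gray image keeps the same intensity range as the input channels.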

Then AHE is applied on the gray scale image to get a histogram equalized and contrast enhanced image. AHE calculates multiple local histograms and equalizes the image intensity. The result, referred to as R1, is shown in Figure(3.10)(a) and (b) for the images shown in Figure(3.5) and Figure(3.7).


Figure 3.9: Flow chart for MR based method.


(a) (b)

Figure 3.10: AHE result on images shown in Figure(3.5) and Figure(3.7)

Stage 2: This stage consists of two steps: first, morphological gray scale opening, and second, a reconstruction step. In the first step, the morphological gray scale opening operation is applied on the stage 1 output, i.e., R1, to get the opened image R2. In the second step, the result of the opening operation is added to the adaptive histogram equalized image (R1) to get image R3, with a clear background, which is further used as input to the next step. Here the concept of morphological reconstruction is applied, with the opened image acting as the marker and the histogram equalized image acting as the mask. Since the opening operation removes small bright features, the result of the opening operation will be darker than the original image. Opening can also be used to compensate for non uniform background illumination. Addition of R1 and R2 produces a bright background image. This intermediate result R3 is an enhanced image with good contrast, showing clear separation of text and background. The images shown in Figure(3.11)(a), (b) are the results of the opening operation on R1, and Figure(3.11)(c), (d) are the reconstructed images.

Stage 3: This stage also contains two steps: first, morphological closing on R3, the output of stage 2; the second part consists of reconstruction steps. As the closing operation suppresses dark details, it is used to remove darker pixels smaller than the structuring element. The result is shown in Figure(3.12)(a), (b). The reconstruction step then consists of subtracting R4, the closed image, from R1, giving the intermediate reconstructed image R5. Again, R5 is subtracted from the opened image R2 to get R6. A combination of opening and closing operations is most suitable for image smoothing and noise removal. The results of stage 3, with uniform background intensity and smoothing, are shown in Figure(3.12)(a)-(f).


(a) (b)

(c) (d)

Figure 3.11: Result of stage 2. (a), (b) are results of the opening operation on images shown in Figure(3.10)(a), (b); (c), (d) are results of the reconstruction technique.

(a) (b)

(c) (d)

(e) (f)

Figure 3.12: Result of stage 3. (a), (b) Results of closing operation on stage 2 output

images shown in Figure(3.11)(a), (b). (c), (d) Subtraction of R1 from R4. (e), (f)

Subtraction of result of previous step from R2.


Stage 4: The enhanced image is further subjected to Gaussian filtering to eliminate noise larger than the structuring element, as it provides gentler smoothing and preserves edges better than a similarly sized mean filter. The Gaussian filter on R6 produces the smoothed image R7. The reconstruction technique is applied on R7 and R1 with an addition operation to get the enhanced image R8, having a clear bright background. Lastly, thresholding is applied on R8 to get the binarized image, using the global thresholding method of Otsu [53]. The results of the Gaussian filter, the reconstruction step and binarization are shown in Figure(3.13)(a), (b), Figure(3.14)(a), (b) and Figure(3.15)(a), (b). The proposed method is very fast, as it uses simple linear and non linear filters; its computational complexity/cost is very low, almost equal to M × N, where M and N represent the size of the image.
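As a rough illustration of stages 2-3, grey-scale opening and closing with a flat square structuring element can be written directly in NumPy. This is a sketch under stated assumptions, not the thesis implementation: the structuring-element size k is a placeholder, and the add/subtract sequence follows the reading of the stages given above.

```python
import numpy as np

def _local(img, k, fn):
    """Grey-scale erosion (fn=np.min) / dilation (fn=np.max) with a
    flat k x k structuring element, via padded sliding windows."""
    p = k // 2
    pad = np.pad(img, p, mode='edge')
    h, w = img.shape
    windows = np.stack([pad[i:i + h, j:j + w]
                        for i in range(k) for j in range(k)])
    return fn(windows, axis=0)

def opening(img, k=3):  return _local(_local(img, k, np.min), k, np.max)
def closing(img, k=3):  return _local(_local(img, k, np.max), k, np.min)

def mr_enhance(r1, k=3):
    """Sketch of stages 2-3 as described in the text; r1 is the AHE
    output as a float array, k is a placeholder element size."""
    r2 = opening(r1, k)      # stage 2: suppress small bright noise
    r3 = r1 + r2             # reconstruction: brighten the background
    r4 = closing(r3, k)      # stage 3: suppress small dark noise
    r5 = r1 - r4             # intermediate reconstruction
    r6 = r2 - r5             # smoothed, background-normalized result
    return r6
```

Opening (erosion then dilation) removes bright specks smaller than the element, while closing (dilation then erosion) removes dark specks, which is why their combination smooths both noise polarities before the final Gaussian filtering and thresholding.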

(a) (b)

Figure 3.13: (a), (b) Results of Gaussian filter on images shown in Figure(3.12)(e), (f).

(a) (b)

Figure 3.14: Morphological reconstruction technique on images shown in Fig-

ure(3.13)(a), (b).

3.2.5 Results and Discussion

Experimentation is conducted on the Kannada historical document data set contain-

ing 2700 images of palm, paper and stone inscriptions having various sizes. Enhanced


(a)

(b)

Figure 3.15: Binarized images of Figure(3.14)(a),(b).

images have a clear background, due to which proper binarized images are produced. Paper documents pose problems like discoloring, bleed-through, ink seepage, stains, holes, etc. The proposed method has been applied to enhance the degraded paper documents. The images shown in Appendix 1, Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4), are paper manuscripts that are nearly one and a half centuries old. The dark brown color, stains, and dust accumulation have resulted in low contrast images. The results of the proposed method on these documents are shown in Figure(3.16)(a), (b), (c), (d) respectively.

Palm scripts become dark brown after repeated application of disinfectant to prolong their life. These scripts usually have low contrast, with characters submerged in the background, making the smoothing operation difficult. This dark background also makes the script illegible, so these scripts need thorough preprocessing before preservation. The proposed method works well even for these documents and enhances the images by eliminating the noisy background, producing a clear image with

almost white background. The palm leaves are taken from a collection of manuscripts from the 16th to 18th centuries. The images shown in Figure(A.1) and (A.3) are palm leaf scripts with low contrast and noise. The enhanced images are shown in Figure(3.17)(a), (b).

Experimentation has also been performed on stone inscription images. Results of the MR based method on stone inscription images are shown in Figure(3.19)(a) and (b) and Figure(3.18), along with the input images of stone inscriptions belonging to the 14th - 17th centuries shown in Appendix 1, Figure(C.1), Figure(C.3) and Figure(C.2). However, this method is unable to enhance the images shown in Figure(C.1) and Figure(C.2) properly. The stone inscriptions are severely degraded, and it is difficult to process them and extract the information. Therefore improving the algorithm is a real challenge.

The proposed method is compared with average, median and Gaussian filters. These are well known smoothing algorithms used to eliminate noise. But sometimes these methods also smooth out the edge information, causing a kind of blurring effect. If the filter size is larger, the smoothing algorithm produces a more blurred image and makes the task of binarization very difficult. Binarized images are used as the criterion for evaluating the performance of the enhancement technique. The binarized images of the three smoothing filters are compared against the binarized image of the MR method; the results are shown in Figure(3.20), and MR outperforms the three compared smoothing filters.

The proposed MR method is able to enhance low contrast and noisy documents with reasonable complexity, but fails to eliminate noise completely from severely degraded document images. The MR method requires proper selection of the size of the structuring element and filter mask. The selection of these parameters is difficult due to the various factors causing degradation. Also, the size of the inscriptions, the size of the characters, the condition of the inscribed material and the age of the document make the enhancement process much more complex. The resolution of the camera with which the document was captured and the lighting conditions during image acquisition also contribute to severe


(a) (b)

(c) (d)

Figure 3.16: (a), (b), (c), (d) Results of MR based method on paper images shown in Appendix 1 Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4), belonging to the nineteenth and beginning of the twentieth century.


(a)

(b)

Figure 3.17: (a), (b) Results of MR based method on images of palm leaves shown in Appendix 1 Figure(A.1) and (A.3), belonging to the 16th to 18th century.


Figure 3.18: Result of MR based method on sample image taken from Belur temple

inscriptions Figure(C.2) belonging to 17th century AD.

(a) (b)

Figure 3.19: (a), (b) Result of MR based method on stone inscriptions shown in Appendix 1 Figure(C.1), Figure(C.3), belonging to the 14th - 17th century.

degradation. Hence there is a need for an improved technique which addresses a few of these problems and also attempts to enhance the severely degraded documents. In the next section, we present one more method, based on a bilateral filter in combination with morphological operations and a Gaussian filter.


(a) (e)

(b) (f)

(c) (g)

(d) (h)

Figure 3.20: Comparison of the proposed method with Gaussian, average and median filters. Figures (a), (b), (c), (d) show the results of the respective methods, and figures (e), (f), (g), (h) show the binarized images of (a), (b), (c), (d).

3.3 Bilateral Filter (BF) Based Approach

The second spatial domain technique, based on bilateral filtering in combination with the morphological reconstruction of the first method, has been developed to address severely degraded documents. Bilateral filtering was introduced by Tomasi and Manduchi [155] as a nonlinear filter which combines domain and range filtering. Bilateral filtering is widely used to eliminate noise without losing edge information.


Many authors have used the bilateral filter in various applications. Barash [156] implemented a common framework for nonlinear diffusion, adaptive smoothing, bilateral filtering, and the mean shift paradigm. Hamarneh and Hradsky [157] extended the well-known scalar image bilateral filtering method to diffusion tensor (DT) magnetic resonance images: the scalar version was extended to perform edge-preserving smoothing of diffusion tensors, and the authors applied bilateral DT filtering in the Log-Euclidean framework to guarantee valid output tensors. Bazan and Blomgren [158] proposed a new image smoothing and edge detection technique by combining nonlinear diffusion and bilateral filtering, and developed a simple diffusion criterion function based on the correlation between the noisy image and the filtered image. Many authors have used bilateral filters directly, indirectly, or in combination with other techniques to denoise noisy images and to enhance degraded images. The following section provides a brief overview of the bilateral filter.

3.3.1 Overview of Bilateral Filter

The bilateral filter is a non linear filter in the spatial domain which performs averaging without smoothing the edges. The bilateral filter takes a weighted sum of the pixels in a local neighborhood; the weights depend on both the spatial distance and the intensity distance. Each weight is the product of two Gaussian weights, one corresponding to the spatial distance and the other to the intensity difference. Hence no smoothing occurs when one of the weights is close to 0. This means the product becomes negligible in regions where intensity changes rapidly, which usually represent sharp edges. As a result, the bilateral filter preserves sharp edges. Mathematically, at a pixel location p of image I, the output of a bilateral filter is calculated as follows.

BF[I](p) = (1 / Wp) Σ_{q ∈ S} Gσr(|I(q) − I(p)|) Gσs(‖p − q‖) I(q) (3.10)

where the normalization factor Wp ensures that the pixel weights sum to 1:

Wp = Σ_{q ∈ S} Gσr(|I(q) − I(p)|) Gσs(‖p − q‖) (3.11)


where Gσ is a Gaussian given by the equation

Gσ(x) = (1 / (2πσ²)) exp(−x² / 2σ²) (3.12)

Gaussian filtering is a weighted average of the intensities of adjacent positions, with a weight decreasing with the spatial distance to the center position p. The weight for pixel q is defined by the Gaussian Gσ(‖p − q‖), where σ is a parameter defining the neighborhood size.

The parameters σs and σr control the fall-off of the weights in the spatial and intensity domains, respectively. σs controls the amount of standard Gaussian spatial filtering, while σr controls the discrimination between true features and noise. It has been observed that a large σs causes more smoothing [155]. Bilateral filtering behaves like a normal low pass filter for too large a σr, and performs no filtering for too small a σr; when σr is infinite, the bilateral filter reduces to a normal low pass filter. Therefore, the choice of these two parameters is essential, as it affects the performance of bilateral filtering. The time complexity of this filter is O(N ∗ M ∗ D²), where N ∗ M is the size of the image and D is the diameter of the filter window over which the product of the two Gaussian weights, spatial and intensity, is evaluated.
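Equations (3.10)-(3.11) translate directly into a deliberately naive O(N·M·D²) implementation, sketched below. The parameter values are illustrative defaults, not the settings used in the thesis experiments.

```python
import numpy as np

def bilateral(img, sigma_s=2.0, sigma_r=30.0, radius=3):
    """Direct transcription of equations (3.10)-(3.11): each output
    pixel is a normalized sum over the (2*radius+1)^2 window, with
    weight = spatial Gaussian (sigma_s) * range Gaussian (sigma_r)."""
    img = img.astype(np.float64)
    h, w = img.shape
    pad = np.pad(img, radius, mode='edge')
    num = np.zeros_like(img)          # accumulates weighted intensities
    den = np.zeros_like(img)          # accumulates weights (Wp)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy: radius + dy + h,
                          radius + dx: radius + dx + w]
            g_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            g_r = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            wgt = g_s * g_r
            num += wgt * shifted
            den += wgt
    return num / den
```

With a small σr, pixels on the far side of an intensity edge receive near-zero range weight, so the edge is left intact while same-side pixels are averaged; this is the edge-preserving behavior described above.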

3.3.2 Proposed Methodology

We present the second enhancement method, based on a combination of the bilateral filter, mathematical morphology reconstruction techniques and Gaussian smoothing. The flow chart for the proposed method, which consists of three stages, is shown in Figure(3.21) and is detailed in the following paragraphs.

Stage 1: The color image is converted into a gray scale image, shown in Figure(3.22)(a), using equation(3.9). The bilateral filter is applied on the gray scale image to get the filtered image R1. As mentioned in subsection 3.3.1, the bilateral filter denoises the image without smoothing out the edges. The result of stage 1 is shown in Figure(3.23)(a).


Figure 3.21: Flow chart for BF based method.


(a)

(b)

Figure 3.22: (a) Input image of the palm leaf manuscript belonging to 18th century.

(b) Its binarized version.


Stage 2: Morphological gray scale opening is applied on R1, the output of stage 1, with a disk structuring element, producing image R2. The output of the opening operation, R2, is subjected to a closing operation to get image R3. R1 and R3 are then added to get R4 as the morphological reconstruction step. As explained for the previous MR method, the opening and closing operations are suitable for suppressing noise, bridging gaps between objects and filling holes, and are effective in reconstruction of the objects.

Stage 3: Intensity normalization is done to suppress the unevenly illuminated background, using a Gaussian filter with a large window size and standard deviation. The reconstructed image R4 is blurred, and the blurred image R5 is subtracted from the bilateral filtered image R1 to get R6. The resulting image R6 is then added to the bilateral filtered image R1 and the reconstructed image in order to get the enhanced image R7. Morphological dilation is applied to smooth the edges, giving R8. The result image R8 is binarized using Otsu's global thresholding method [53]. The results of stages 2 and 3 are shown in Figure(3.23)(a) and (b).
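The final binarization step uses Otsu's method [53], which picks the threshold maximizing the between-class variance of the grey-level histogram. The sketch below is a generic, standard formulation of that method, not the thesis code:

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's global threshold: choose t maximizing the between-class
    variance of foreground/background in the 256-bin histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                        # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))          # class-0 mean * omega
    mu_t = mu[-1]                               # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)            # empty classes -> 0 variance
    return int(np.argmax(sigma_b))

def binarize(img):
    """Threshold an 8-bit grey image into a 0/255 binary image."""
    return (img > otsu_threshold(img)).astype(np.uint8) * 255
```

Because the enhancement stages separate text from background into two well-spaced intensity modes, a single global Otsu threshold is sufficient here instead of per-region adaptive thresholding.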

3.3.3 Results and Discussion

The bilateral filter (BF) based approach is applied to enhance Kannada historical document images. Experimentation has been performed on the three types of digitized images, both of varying size and of 512×512 size. The proposed method enhances the degraded image by eliminating noise and the uneven background. The binarized image of the preprocessed image can be further used to segment the document into lines, words and characters for recognition purposes, so the preprocessing stage plays a very important role in pattern recognition. The accuracy of the recognition system depends completely on the features extracted; feature extraction in turn depends on the segmentation of characters from the binarized image. A preprocessed image thus gives better binarized images and improves the recognition rate of any recognition system. Therefore preprocessing algorithms are required to enhance the degraded document image.


(a)

(b)

(c)

Figure 3.23: (a) Filtered image using BF method. (b) Final result of the BF method.

(c) Binarized version of enhanced image.


The result images of the proposed BF method are shown in Figure(3.23): the input noisy image in Figure(3.22)(a), the result of BF filtering in Figure(3.23)(a), and the enhanced image using morphological operations in Figure(3.23)(b). The binarized version of the enhanced image is shown in Figure(3.23)(c), and the binarized version of the noisy image in Figure(3.22)(b). Results of the BF method on the paper document images shown in Appendix 1 Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4) are shown in Figure(3.24)(a), (b), (c), (d) respectively.

The palm leaf document images are taken from a collection of manuscripts from the 16th to 18th centuries. The results of the BF method on palm leaf images are shown in Figure(3.25)(a) and (b) for the input palm leaf images shown in Figure(A.2) and Figure(A.5). The MR and BF method results are compared in Figure(3.26). Experimentation on Figure(A.2), Figure(3.7) and Figure(A.6) is shown in Figure(3.27)(a), (b) and Figure(3.28) respectively.

The experimentation on stone inscription images is shown in Figure(3.29)(a), (b) and Figure(3.30)(a), which are the enhanced images of Figure(C.1), Figure(C.3) and Figure(C.2) respectively. The results of the BF method are better than those of the MR method; the edges are sharper. BF enhances severely degraded images while preserving sharp edges. The performance of the proposed method depends completely on the parameters σs and σr, the controlling parameters in the spatial and intensity domains, respectively. Experimentation has been conducted using a Gaussian window size of 4, σs = 7 and σr = 0.3; these values were selected based on experimentation. This choice of the two parameters produced good results compared to the first method. However, this method is unable to handle all types of degradations and fails to enhance some degraded documents. To address these, another algorithm is developed in the next section, using a Non Local Means filter based approach.


(a) (b)

(c) (d)

Figure 3.24: (a), (b), (c), (d) Results of BF based method on input paper images in Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4) respectively.


(a)

(b)

Figure 3.25: (a), (b) Results of BF based method on Figure(A.4) and Figure(A.5).

(a) (b) (c)

Figure 3.26: (a) Input image of palm leaf manuscript. (b) Result of MR based method. (c) Enhanced image using BF based method.


(a)

(b)

Figure 3.27: (a), (b) Results of BF based method on input images in Figure(A.2) and Figure(3.7).

Figure 3.28: Result of BF based method on image Figure(A.6).


(a) (b)

Figure 3.29: (a), (b) Results of BF based method on image in Figure(C.1) and

Figure(C.3).

Figure 3.30: Result of BF based method on Figure(C.2) Belur temple inscriptions

belonging to 17th century AD.


Figure 3.31: Non Local Means filter approach. A small patch of size 2p + 1 by 2p + 1 is centred at the candidate pixel x; y and y′ are non local patches within the search window of size 2k + 1 by 2k + 1.

3.4 Non Local Means Filter (NLMF) Based Approach

Smoothing algorithms are usually employed for noise elimination, but they blur the image in the process. Isotropic, median and mean filters average the pixel values in the direction of contours; they tend to preserve straight contours but fail to preserve corners. Wavelet denoising is one technique widely used for filtering. Wavelet filters try to separate the input image into a true image and a noise image by removing the higher frequency components from the lower frequency components, assuming that high frequencies correspond to noise. When the high frequencies are removed, the high frequency content of the true image is removed along with the high frequency noise, because the method cannot categorize components as belonging to the noise or to the true image [17], resulting in a loss of finer detail in the denoised image. Noise in the low frequency components is left unaddressed after filtering. To address this loss of detail in the filtered image, Antoni Buades et al. [17] developed the non-local means filter.


3.4.1 Overview of Non Local Means Filter

By taking advantage of similar sub-windows in the same image, Antoni Buades et al. [17] introduced the Non-Local Means filter for image denoising. This filter utilizes spatial correlation in the entire image for noise removal and adjusts each pixel value with a weighted average of neighborhood pixels that have a similar geometrical structure. Even movie denoising can be achieved by employing the Non Local Means Filter (NLMF) [159]. Given a discrete noisy image S = {S(x) | x ∈ I}, where I represents the set of pixel positions, the computed value NL{S(x)} for pixel x is given by

NL{S(x)} = Σ_{y ∈ N(x)} w(x, y) S(y) (3.13)

where x is the candidate pixel and y is a pixel in the neighborhood N(x). The similarity between two patches is computed using a Gaussian-weighted Euclidean distance, and the weight function is given by

w(x, y) = (1 / C(x)) exp(−‖S(x) − T(y)‖²_{2,a} / h²) (3.14)

where a > 0 is the standard deviation of the Gaussian kernel, h is the parameter which controls the degree of filtering, and C(x) is the normalizing constant given by

C(x) = Σ_{y ∈ N(x)} exp(−‖S(x) − T(y)‖²_{2,a} / h²) (3.15)

The size of the patch window shown in Figure(3.31) should be 2p + 1 by 2p + 1, and the size of the search window, 2k + 1 by 2k + 1, should be greater than the patch window.

Figure(3.32) shows a palm script image with low contrast and dark letters. The document image has to be preprocessed before performing segmentation on it. The non local means filter technique works well for denoising noisy pixel values using the similarity measure over non local neighbor windows. The complexity of the non local filter algorithm is K² ∗ P² ∗ N ∗ M, where P = 2p + 1, K = 2k + 1, and N ∗ M is the total number of pixels in the image. The center pixel in Figure(3.31) is the candidate pixel for evaluation within the search window of size 2k + 1 by 2k + 1, and the other surrounding pixels are the centers of the neighboring patches. A new value is calculated for the


Figure 3.32: Input palm script image with low contrast.

candidate pixel based on the similarity measure between the candidate pixel's patch and the surrounding patches. The pixel value is then replaced using the average of the non local windows with the highest similarity values: patches having a stronger influence on the denoising of pixel x contribute more to the average. Eq(3.14) is used to find the similarity measure between two patches, and the new value of the candidate pixel is calculated by Eq(3.13). The parameter h controls the filtering process and gives prominent results in the range of 10 to 15 in the case of document images. For smaller documents with small font sizes, k and p should be between 4 to 5 and 2 to 4 respectively, to preserve edge details and sharp continuity.
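Equations (3.13)-(3.15) can be transcribed literally, as below. Two simplifying assumptions are made for illustration: the patch distance uses a plain (unweighted) L2 norm, dropping the Gaussian weighting a of the ‖·‖_{2,a} norm, and the small default p and k keep the K²·P²·N·M cost manageable; they are not the parameter values recommended above.

```python
import numpy as np

def nlm(img, p=1, k=2, h=10.0):
    """Naive Non-Local Means per equations (3.13)-(3.15): each pixel
    is replaced by a weighted average over the (2k+1)^2 search window,
    weighted by (2p+1)^2-patch similarity with filtering strength h."""
    img = img.astype(np.float64)
    padded = np.pad(img, p + k, mode='reflect')
    out = np.zeros_like(img)
    h2 = h * h
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            ci, cj = i + p + k, j + p + k            # centre, padded coords
            ref = padded[ci - p: ci + p + 1, cj - p: cj + p + 1]
            num = den = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    yi, yj = ci + di, cj + dj
                    patch = padded[yi - p: yi + p + 1, yj - p: yj + p + 1]
                    w = np.exp(-np.sum((ref - patch) ** 2) / h2)  # eq (3.14)
                    num += w * padded[yi, yj]        # eq (3.13) numerator
                    den += w                         # eq (3.15) C(x)
            out[i, j] = num / den
    return out
```

An isolated bright speck is pulled towards its neighbors (no other patch matches it well, but every weight stays positive), while a pixel in a repeated texture keeps its value because many highly similar patches dominate the average.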

3.4.2 Proposed Algorithm

The proposed algorithm for enhancement of the degraded document image uses a combination of mathematical morphology and the Non Local Means filter (NLMF). It employs the gray scale mathematical morphological operations opening and closing in combination with the NLMF technique. The method consists of three stages, explained in detail in the following paragraphs.

Stage 1: The color image is converted into a gray scale image R using equation(3.9). As discussed for the MR method, the opening and closing operations are helpful in removing noise around characters, using suitable structuring elements, and in bridging gaps between strokes of the characters, as shown in Figure(3.33). The gray scale morphological opening operation is applied on the input image R to get the R1


Figure 3.33: Result of NLMF method with residual image on Figure(3.32).


(a)

(b)

Figure 3.34: (a) Result of NLMF based method on image shown in Figure(3.32). (b)

Binarized image.


Figure 3.35: Flow chart for NLMF based method.


(a) (b)

(c) (d)

Figure 3.36: (a) Original image. (b) Filtered image using NLMF. (c) Binarized image

of the proposed NLMF method. (d) Binarized noisy image using Otsu method.

image. Morphological reconstruction is performed to get R2, by adding R1 and R. A gray scale closing operation is then applied to get R3. Subtraction of R3 from the input image R reconstructs the image R4.

Stage 2: The dilation operation on image R4 dilates the boundaries of the characters so that gaps between broken characters are bridged. The result is image R5, shown in Figure. NLMF is then applied to remove the noise present in the image, by replacing each pixel value with the value of a similar geometrical structure in the neighborhood window.

Stage 3: Postprocessing is performed to eliminate the background, by applying opening on the filtered image and adding the result back to the NLM filtered image. The main advantage of NLMF is that denoising is performed only if the image contains noise; otherwise the image is not altered, as no maximally similar patch is found in the neighboring window. Finally, the Otsu method is applied to get the binarized image. The proposed algorithm is given below.

3.4.3 Results and Discussion

Experimentation has been conducted on images of palm scripts, paper and stone inscriptions of variable size and resolution. The parameters for the patch window size p, the search window size k and the filter controlling parameter h are selected by observing the character size. Usually the values of k, p, and h can be 4 to 5, 2 to 4 and 10 to 15 respectively. Initially the values for k, p and h are taken as 4, 2, and 10 respectively and experimentation is carried out. The result of the proposed method, along with the original image, is shown in Figure(3.33). An enlarged version of the result of the NLMF method on the image in Figure(3.32) is shown in Figure(3.34)(a), and the postprocessed image in Figure(3.34)(b). Experimentation has also been carried out to verify the results for various combinations of k, p, and h values, with k set to 4, p set to 2 and 3, and h set to 10 and 15 respectively. As the search window size k, the patch size p, and the degree of filtering h increase, the image becomes blurred and the computational time also increases. For documents with larger characters, k has been set to 5 and p to 3 to get a sharp edged image.

The major problem with the NLMF based method is the selection of the sizes of the patch and search windows. If the search window is large, the image becomes over smoothed; if it is small, the smoothing effect is not noticeable. The time taken to denoise depends directly on the square of the patch window size and the square of the search window size. If both window sizes are small and the image is very large, then more iterations are required to denoise the image and the method takes more time, as the time complexity is more than quadratic, as explained earlier.

Experimentation has been carried out using the NLMF based approach to enhance palm leaf document images 3 to 5 centuries old. The image shown in Figure(3.36)(a) is the input noisy palm leaf image. Figure(3.36)(b) shows the filtered image using the NLMF method, without eliminating the background. Figure(3.36)(c) and Figure(3.36)(d) are the binarized image of Figure(3.36)(a) and the binarized image without enhancement, respectively.

Experimentation has been carried out on paper documents of the previous century in digital form, and the results are shown in Figure(3.37)(a), (b), (c), (d) for the input images shown in Appendix 1 Figure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4). The NLMF method performs better, enhancing paper document images better than the MR and BF methods.

Some more results of the NLMF method on palm document images are shown in Figure(3.38), along with the results of the previous methods. Experimentation on the palm leaf images shown in Figure(3.5) and Appendix 1 Figure(A.1) gives the results shown in Figure(3.39) and Figure(3.40) respectively.

The images shown in Figure(3.41)(a), (b) and Figure(3.42) are the results of the NLMF method on the stone inscription images shown in Figure(C.1), Figure(C.3) and Figure(C.2) respectively. The proposed method has enhanced the images properly, but is unable to eliminate noise completely.


Figure 3.37: Results of NLMF based method on input images in Appendix 1 Fig-

ure(B.1), Figure(B.2), Figure(B.3) and Figure(B.4)


Figure 3.38: (a) Result of MR based method, (b) enhanced image using BF based method, and (c) result of NLMF based method on input image shown in Figure(3.26).


Figure 3.39: (a) and (b) Results of NLMF based method on input images shown in

Figure(A.2) and Figure(A.1).


Figure 3.40: Result of NLMF based method on input image in Figure(A.6).

Figure 3.41: Results of NLMF based method on images in Figure(C.1) and Figure(C.3).

3.5 Discussion of Three Spatial Domain Techniques

The performance of denoising and compression techniques is usually measured using the Peak Signal to Noise Ratio (PSNR), given by

PSNR = 10 log10( MAX_I² / MSE )    (3.16)


Figure 3.42: (a), (b) Results of NLMF based method on images shown in Figure(C.2)

and Figure(C.4).


Table 3.1: Comparison of PSNR values and execution time for three spatial domain

methods to enhance the paper document images of 512× 512 size.

PSNR in dB Time in seconds

S. No MR BF NLMF MR BF NLMF

1 23.7489 37.7467 37.9339 1.3809 4.1930 75.5781

2 23.0110 37.2470 37.3458 0.8599 3.9922 75.0680

3 21.7989 36.4558 36.6667 0.7258 3.6989 74.9188

4 27.9908 38.9503 49.2923 0.9177 4.2560 75.2277

5 21.6121 31.8781 49.0835 0.5370 3.3715 75.5161

6 32.9706 36.5272 42.0805 0.7368 3.8580 80.6541

7 23.3884 31.8267 36.7855 0.5735 3.4506 77.8826

8 24.3084 33.9875 38.0856 0.6086 3.5448 77.3176

9 21.7715 38.6051 37.2253 0.8701 4.2231 74.8562

10 18.8089 38.4077 38.7401 0.8700 4.2708 75.0643

11 27.5821 38.5781 42.3505 0.8114 4.1390 73.8697

12 24.7502 34.3364 39.4167 0.6264 3.6351 74.9942

13 28.3042 36.3690 41.1888 0.6785 3.7626 74.8971

14 26.0303 33.3616 37.1162 0.6218 3.5696 76.4342

15 22.0632 32.7488 36.3494 0.5879 3.4778 74.6217

where MAX_I represents the maximum intensity in the gray scale image and MSE is the Mean Square Error, given by

MSE = (1/(mn)) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [ s(i, j) − g(i, j) ]²    (3.17)

where s is the input noisy image, g is the enhanced output image, and m × n is the image size.
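As a concrete illustration of Eqs. (3.16) and (3.17), a small Python sketch (assuming 8-bit images, so MAX_I = 255):

```python
import math
import numpy as np

def mse(s, g):
    """Mean square error between two images, Eq. (3.17)."""
    s, g = np.asarray(s, dtype=float), np.asarray(g, dtype=float)
    return float(np.mean((s - g) ** 2))

def psnr(s, g, max_i=255.0):
    """Peak signal to noise ratio, Eq. (3.16)."""
    m = mse(s, g)
    if m == 0.0:
        return float("inf")  # identical images: MSE = 0, PSNR unbounded
    return 10.0 * math.log10(max_i ** 2 / m)
```

Identical images give MSE = 0, so the function returns infinity; a larger mean intensity difference gives a lower PSNR.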

The performances of the three spatial methods in this chapter are measured using the PSNR value, execution time and human interpretation. The PSNR value is a quantitative measure of performance based on the intensity difference between the input and output images, expressed through the mean square error. The MSE is zero (and the PSNR infinite) if the two images are identical. A high PSNR value signifies a small difference in the


Table 3.2: Comparison of PSNR values and execution time for three spatial domain

methods to enhance the palm leaf document images of 512× 512 size.

PSNR in dB Time in seconds

S. No MR BF NLMF MR BF NLMF

1 26.4524 42.2957 47.9365 0.5969 3.4360 74.9510

2 26.2149 43.1558 51.8227 0.6293 3.5938 75.0180

3 28.0176 36.1681 42.8645 0.5465 3.3711 74.6831

4 37.7392 43.9641 50.7459 0.6046 3.5207 74.4475

5 36.5024 45.4206 53.4520 0.6582 3.7449 74.3839

6 35.1399 42.8637 55.7108 0.6127 3.6044 73.0795

7 36.3590 39.1353 36.7321 0.5780 3.3639 72.8865

8 26.1604 43.5994 53.8239 0.7203 3.7519 73.1462

9 35.3636 35.9913 40.6100 0.7166 3.3461 73.6913

10 28.9509 34.3862 42.0768 0.5607 3.3302 73.8049

11 36.1223 36.2438 48.1984 0.5725 3.2981 73.7186

12 35.7097 47.4333 68.1295 0.7405 3.8808 74.3674

13 25.1643 43.6651 52.2960 0.6647 3.5375 71.8197

14 31.6850 40.2207 54.0870 0.6022 3.4445 74.0001

15 33.1248 36.5673 41.2524 0.5893 3.3587 73.7465

16 29.8257 35.5526 46.7688 0.5671 3.3819 71.9199

17 29.8723 40.3023 46.9310 0.7836 3.8665 70.9062

18 29.8161 38.2893 46.0705 0.6401 3.7022 71.4467

19 31.4859 37.2854 43.0458 0.5720 3.4136 72.5182

20 27.0621 44.8477 66.3683 0.6410 3.6374 71.6499

21 28.8259 34.1294 41.2640 0.5590 3.4780 71.0915

22 31.4629 39.4394 70.8347 0.6927 3.8897 71.2999

23 27.7132 42.3511 53.9032 0.7264 3.7395 70.8606

24 33.4980 36.5833 48.6999 0.5746 3.3302 71.0554

25 23.1851 41.2683 49.8445 0.6972 3.8921 71.5517

intensity of the input and output images, and is considered a good measure for obtaining better results. However, it is very difficult to prove that the method having the highest


Table 3.3: Comparison of PSNR values and execution time for three spatial domain

methods to enhance the stone inscription images of 512× 512 size.

PSNR in dB Time in seconds

S. No MR BF NLMF MR BF NLMF

1 27.5511 34.5556 55.3837 0.7176 7.0044 72.7901

2 24.1430 37.8203 59.7441 0.9204 6.8328 74.8493

3 32.3258 46.0989 49.5172 0.9360 6.7236 72.9305

4 29.3418 42.6803 51.0786 1.1076 7.0200 71.9633

5 24.7338 34.7385 41.8901 0.7020 6.7236 71.9165

6 27.2829 37.3984 37.2663 0.7644 6.8328 72.2753

7 27.8336 34.1124 38.5350 0.6240 6.8172 72.8993

8 28.2949 37.7342 34.9951 0.6552 6.9108 72.2285

9 21.6276 29.5958 34.9956 0.7332 6.7860 73.4609

10 21.7366 32.9754 43.9985 0.7644 6.5988 73.4921

PSNR value will give better results. Human interpretation is required along with the PSNR value to judge the quality of the output of the proposed system. The performance of each algorithm is also measured using execution time. All of these methods were implemented on an Intel Core i5-560M processor running at 2.66 GHz. The algorithm with the lowest computational time would be considered the better method. Although computational time is not a standard way of evaluating the performance of an algorithm, we have included it to demonstrate the actual time taken, in a real scenario, to enhance a 512 × 512 image. Therefore all three parameters are used to analyze the quality of the output image. The PSNR values and execution times in seconds for the three methods are calculated and tabulated in three tables: results on paper document images, palm leaf images and stone inscription images of size 512 × 512 are given in Table(3.1), Table(3.2) and Table(3.3) respectively. A detailed discussion of all the enhancement algorithms is given in the last section of the next chapter.

The PSNR values for the MR method provided in Table(3.1) show that this method works well for document images with clean and moderate degradation. It properly enhances low resolution, low contrast, stained and discolored paper and palm leaf images, and some of the stone inscriptions. It is, however, unable to enhance severely degraded historical document images, including stone inscription images, and its PSNR values are lower than the BF and NLMF values. The

execution time is very low, as the MR method is based entirely on simple set theory operations and requires O(M × N) operations. The BF method enhances the paper, palm leaf and stone inscription images better than the MR method. The time complexity of the BF method is O(M × N × D²), as discussed in sub section (3.3.1); the time taken by the BF method is therefore more than that of the MR method but less than that of the NLMF method. The PSNR values and the enhanced appearance of the images, along with the execution time, show that the BF method provides better results for severely degraded document images. The

NLMF based method performs better in denoising the noisy documents and preserves smoothness based on the similarity measure of the non-local neighborhood (window) means. The time complexity of the NLMF based method is O(M × N × K² × P²). This method enhances paper documents, some of the palm leaf documents, and stone inscriptions. Its PSNR values prove to be better than those of the previous two methods, but its main drawback lies in its computational time, which is about ten times that of the BF method.

Enhancement of such images is a challenging task and demands the development of suitable algorithms. Another drawback of these three methods is the selection of proper values for the structuring element size and for the controlling parameters of the BF and NLMF methods; it is very difficult to select parameter values that address all types of degradation. The time required to enhance large documents is very high in the case of the NLMF based method. However, the NLMF based method enhances the contrast of palm leaf document images, and a well formed binarized image can be obtained without background elimination. As enhancement results play a vital role in the subsequent segmentation and recognition stages of document processing, the results of these three methods can be used.

3.6 Summary

Three spatial domain techniques are implemented to enhance degraded historical documents, which usually exhibit low contrast, uneven background and noise. These


methods effectively handle the problems present in the images due to the various factors mentioned in previous chapters. The first method uses adaptive histogram equalization for contrast enhancement followed by morphological operations, as these are simple and powerful tools to eliminate noise, remove the background, and produce an enhanced image with uniform background intensity and clear text content. This simple and computationally efficient method is used as a background elimination technique and is also used for character enhancement in the other two enhancement algorithms. However, this method is unable to handle all types of degraded document images. Therefore a second method is developed to address these problems.

The second method uses a simple and computationally efficient bilateral filter approach in combination with set theory techniques. The bilateral filter is an efficient technique for eliminating noise without smoothing the edges, while mathematical morphology is used to eliminate the background and enhance the characters; these operations are very useful in bridging the gaps between the broken parts of a character. This method enhances the stone inscription images properly and performs better than the MR method. It takes more time than the MR method, but less time than the NLMF method.

The third method is developed using mathematical morphology and the NLMF technique, both of which are powerful in maintaining edge and contour details. NLMF is powerful in addressing the noise present in the low frequency components of the image. The proposed hybrid approach is compared with the results of the MR and BF methods, and it outperforms them by preserving the edge details and eliminating noise. However, the NLMF method takes more time than the MR and BF methods, and it is also unable to enhance the stone inscription images properly.

The limitations of these methods lie in properly selecting the size of the structuring element and the parameter values, as discussed earlier. These limitations have motivated us to further explore frequency domain approaches, which are explained in the next chapter.


Chapter 4

Enhancement of Degraded Historical Documents : Frequency Domain Techniques

4.1 Introduction

In the previous chapter, spatial domain approaches were implemented and experimented with on historical document images. However, the limitations of spatial domain techniques explained there lead us to explore approaches in a different domain. Some complex operations and measurements are carried out better in the frequency domain than in the spatial domain. An image in the spatial domain can be transformed into the frequency domain by the Fourier transform. The signal can then be analyzed for its frequency content, because the Fourier coefficients of the transformed function represent the contribution of each sine and

1Some of the material of this chapter appeared in the following research papers

1. B. Gangamma, Srikanta Murthy K , Priyanka Chandra G C, Shishir Kaushik, Saurabh Kumar, “A Combined

Approach for Degraded Historical Documents Denoising Using Curvelet and Mathematical Morphology”,

IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, India,

pages 824-829, 2010.


cosine function at each frequency. The Fourier transform is a powerful tool for analyzing the components of a stationary signal, whose properties do not change over time, but it cannot analyze and process non-stationary signals, whose properties do change. An extension of the Fourier transform, the Short Time Fourier Transform (STFT), is used to analyze the frequency content at different times, but it cannot measure the signal at different scales (frequencies). Wavelet theory has been developed to address the problem of analyzing the properties of both varying and stationary signals.

Wavelets allow complex information such as music, speech, images and patterns

to be decomposed into elementary forms at different positions and scales and subse-

quently reconstructed with high precision. Wavelet transforms enable us to represent

signals with a high degree of sparsity. Due to its excellent localization property,

wavelet transform has rapidly become an essential signal and image processing tool

for a variety of applications, including denoising, reconstruction, feature extraction

and compression. Wavelet transforms are widely used in character recognition of various languages [160]. Wavelet transform based denoising methods attempt to remove the noise present in the signal while preserving the signal characteristics, regardless of its frequency content. In our research work, a wavelet transform based approach is presented to enhance the degraded documents; a brief introduction to the wavelet transform and to thresholding algorithms is given in the following sections.

4.2 Wavelet Transform (WT) Based Approach

As historical document images are degraded in nature, enhancement of such images becomes important both to obtain a well formed image for preservation and for the further stages of image processing. The wavelet transform based denoising method, a widely used technique for denoising noisy images, employs thresholding to remove the noise present in the image [161]. Thresholding is applied to each part of the decomposed image to obtain the denoised image. Using the wavelet transform, a 2D image is decomposed into four sets of coefficients: approximation, horizontal, vertical and diagonal. The approximation coefficients contain the low frequency components of the image, which usually carry the useful information; the detail coefficients are contained in the remaining three sets. Thresholding is applied either to the diagonal coefficients alone or to all three detail sets. The thresholding may be hard or soft. Hard thresholding sets to zero the coefficients whose absolute values are lower than the threshold. Soft thresholding sets to zero the coefficients whose absolute values are lower than the threshold and then shrinks the remaining non-zero coefficients towards zero; this eliminates the discontinuity at the threshold. The inverse wavelet transform is then applied to reconstruct the decomposed image. Various authors have proposed algorithms for finding the threshold value using the available information.
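The two thresholding rules can be written compactly; the following is a small NumPy sketch (illustrative, not the thesis code):

```python
import numpy as np

def hard_threshold(c, t):
    """Hard thresholding: zero every coefficient with |c| < t, keep the rest."""
    c = np.asarray(c, dtype=float)
    return np.where(np.abs(c) < t, 0.0, c)

def soft_threshold(c, t):
    """Soft thresholding: zero small coefficients, shrink the rest towards zero."""
    c = np.asarray(c, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)
```

Note that soft thresholding maps a coefficient just above t to a value just above zero, so the output is continuous in the input, which is the discontinuity elimination mentioned above.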

4.2.1 Overview of Wavelet Transform

The 1-Dimensional Discrete Wavelet Transform (1D DWT) coefficients of a function f(x) are given by

W_φ(j0, k) = (1/√M) Σ_n f(n) φ_{j0,k}(n)    (4.1)

W_ψ(j, k) = (1/√M) Σ_n f(n) ψ_{j,k}(n),  for j ≥ j0    (4.2)

where φ_{j0,k}(n) and ψ_{j,k}(n) in these equations are sampled versions of the basis functions φ_{j0,k}(x) and ψ_{j,k}(x).

The inverse discrete wavelet transform is applied to reconstruct the signal using the following equation:

f(n) = (1/√M) Σ_k W_φ(j0, k) φ_{j0,k}(n) + (1/√M) Σ_{j=j0}^{∞} Σ_k W_ψ(j, k) ψ_{j,k}(n)    (4.3)

The wavelet transform for a 2D signal is defined by a scaling function φ(x, y) and three two-dimensional wavelets ψ^H(x, y), ψ^V(x, y) and ψ^D(x, y), each the product of two one-dimensional functions. Excluding products that produce one-dimensional results, like φ(x)ψ(x), the four remaining products produce the separable scaling function

φ(x, y) = φ(x)φ(y)    (4.4)

and the separable, "directionally sensitive" wavelets

ψ^H(x, y) = ψ(x)φ(y)    (4.5)

ψ^V(x, y) = φ(x)ψ(y)    (4.6)

ψ^D(x, y) = ψ(x)ψ(y)    (4.7)

These wavelets measure functional variations, i.e. intensity variations for images, along different directions: ψ^H measures variations along columns (for example, horizontal edges), ψ^V responds to variations along rows (vertical edges), and ψ^D responds to variations along diagonals [154].
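As an illustration of the four sub bands, here is a minimal single-level separable Haar decomposition in NumPy (an illustrative sketch for even-sized images, not the thesis code; sub band labels follow the H/V/D description above, though naming conventions differ between references):

```python
import numpy as np

def haar_dwt2(f):
    """One-level separable Haar DWT of an even-sized 2D image.

    Returns (cA, cH, cV, cD): the approximation plus the three
    directional detail sub bands, each half-size."""
    f = np.asarray(f, dtype=float)
    # lowpass / highpass along the row direction (adjacent column pairs)
    L = (f[:, 0::2] + f[:, 1::2]) / np.sqrt(2.0)
    D = (f[:, 0::2] - f[:, 1::2]) / np.sqrt(2.0)
    # repeat along the column direction (adjacent row pairs)
    cA = (L[0::2, :] + L[1::2, :]) / np.sqrt(2.0)  # approximation
    cH = (L[0::2, :] - L[1::2, :]) / np.sqrt(2.0)  # variations along columns
    cV = (D[0::2, :] + D[1::2, :]) / np.sqrt(2.0)  # variations along rows
    cD = (D[0::2, :] - D[1::2, :]) / np.sqrt(2.0)  # diagonal variations
    return cA, cH, cV, cD
```

An image containing only a horizontal stripe produces energy only in the column-variation sub band, while the other two detail sub bands stay zero.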

Given separable two-dimensional scaling and wavelet functions, the extension of the 1-D DWT to two dimensions uses the scaled and translated basis functions

φ_{j,m,n}(x, y) = 2^{j/2} φ(2^j x − m, 2^j y − n)    (4.8)

ψ^i_{j,m,n}(x, y) = 2^{j/2} ψ^i(2^j x − m, 2^j y − n),  i = H, V, D    (4.9)

where the index i indicates the directional wavelets in Eqs. (4.5), (4.6) and (4.7). The discrete wavelet transform of an image f(x, y) of size M × N is then

W_φ(j0, m, n) = (1/√(MN)) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) φ_{j0,m,n}(x, y)    (4.10)

W^i_ψ(j, m, n) = (1/√(MN)) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) ψ^i_{j,m,n}(x, y),  i = H, V, D    (4.11)

As in the one-dimensional case, j0 is an arbitrary starting scale and the W_φ(j0, m, n) coefficients define an approximation of f(x, y) at scale j0. The W^i_ψ(j, m, n) coefficients add horizontal, vertical and diagonal details for scales j ≥ j0. Normally j0 is set to 0 and N = M = 2^J, so that j = 0, 1, 2, ..., J − 1 and m = n = 0, 1, 2, ..., 2^j − 1. Given the W_φ and W^i_ψ of Eqs. (4.10) and (4.11), f(x, y) is obtained via the inverse discrete wavelet transform

f(x, y) = (1/√(MN)) Σ_m Σ_n W_φ(j0, m, n) φ_{j0,m,n}(x, y)
        + (1/√(MN)) Σ_{i=H,V,D} Σ_{j=j0}^{∞} Σ_m Σ_n W^i_ψ(j, m, n) ψ^i_{j,m,n}(x, y)    (4.12)


4.2.2 Denoising Method

A general wavelet transform procedure for denoising an image is as follows:

1. Select a wavelet and a number of levels (scales) P for the decomposition, then compute the discrete wavelet transform of the noisy image.

2. Apply a threshold to the detail coefficients from scales J−1 down to J−P. This can be accomplished by hard thresholding or by soft thresholding; soft thresholding eliminates the discontinuity.

3. Compute the inverse wavelet transform using the original approximation coefficients at level J−P and the modified detail coefficients for levels J−1 to J−P.
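The steps above can be sketched for a 1-D signal with the Haar wavelet (an illustrative NumPy toy, not the thesis implementation; it assumes the signal length is divisible by 2^levels, and the 2-D case applies the same thresholding to each detail sub band):

```python
import numpy as np

def haar_denoise_1d(x, t, levels=1):
    """Decompose with the Haar wavelet, soft-threshold the details,
    then reconstruct (steps 1-3 of the procedure above)."""
    x = np.asarray(x, dtype=float)
    approx, details = x, []
    for _ in range(levels):                              # step 1: decompose
        a = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
        d = (approx[0::2] - approx[1::2]) / np.sqrt(2.0)
        # step 2: soft-threshold the detail coefficients
        details.append(np.sign(d) * np.maximum(np.abs(d) - t, 0.0))
        approx = a
    for d in reversed(details):                          # step 3: reconstruct
        rec = np.empty(2 * approx.size)
        rec[0::2] = (approx + d) / np.sqrt(2.0)
        rec[1::2] = (approx - d) / np.sqrt(2.0)
        approx = rec
    return approx
```

With a zero threshold the signal is reconstructed exactly; with a very large threshold every detail is discarded and only the local averages survive.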

4.2.2.1 Thresholding Algorithms

Various thresholding algorithms have been proposed for denoising images using the wavelet transform. Based on the literature survey, five thresholding algorithms were selected and implemented, and their results were compared using PSNR values. The thresholding method with the highest PSNR value is selected for the wavelet transform based approach to denoise the degraded historical document images. The following sections provide brief explanations of the five thresholding algorithms.

1. Bayes Shrink

Chang et al. [162] proposed Bayes Shrink, an adaptive, data-driven threshold for image denoising that uses soft thresholding. The aim of the method is to minimize the Bayesian risk, hence the name Bayes Shrink. The Bayes threshold λ_s is defined as

λ_s = σ_n² / σ_x    (4.13)

where σ_n² is the estimated noise variance, found from the median of the absolute values of the diagonal detail coefficients on the finest level (sub band HH1):

σ_n = median( |X_ij| : X_ij ∈ HH1 ) / 0.6745    (4.14)

and σ_x is the estimated signal standard deviation on the sub band:

σ_x = √( max(σ_y² − σ_n², 0) )    (4.15)

where σ_y², the estimated variance of the observations, is given by

σ_y² = (1/N_s) Σ_{k=1}^{N_s} W_k²    (4.16)

where N_s is the number of wavelet coefficients W_k in the selected sub band. The value 0.6745 is the median absolute deviation of a normal distribution with zero mean and unit variance.
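Eqs. (4.13)-(4.16) translate directly into code; the following NumPy sketch computes the BayesShrink threshold for one detail sub band (the fallback when σ_x = 0 — returning the largest coefficient magnitude so the whole sub band is zeroed — is a common convention, not from the text):

```python
import numpy as np

def bayes_shrink_threshold(hh1, subband):
    """BayesShrink threshold for one detail sub band, per Eqs. (4.13)-(4.16)."""
    hh1 = np.asarray(hh1, dtype=float)
    subband = np.asarray(subband, dtype=float)
    sigma_n = np.median(np.abs(hh1)) / 0.6745               # Eq. (4.14)
    sigma_y2 = float(np.mean(subband ** 2))                 # Eq. (4.16)
    sigma_x = np.sqrt(max(sigma_y2 - sigma_n ** 2, 0.0))    # Eq. (4.15)
    if sigma_x == 0.0:
        # pure noise: zero out the entire sub band
        return float(np.max(np.abs(subband)))
    return sigma_n ** 2 / sigma_x                           # Eq. (4.13)
```

The threshold returned here is then applied with soft thresholding to the sub band's coefficients.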

2. Visu Shrink

VisuShrink applies the universal threshold proposed by Donoho and Johnstone [163]:

t = σ √(2 log M)    (4.17)

where σ is the noise standard deviation and M is the number of pixels in the image. For denoising images, VisuShrink is found to yield an overly smoothed estimate.
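In code, Eq. (4.17) is a one-liner once σ is estimated; the MAD-based estimate from the finest diagonal sub band is used below as an assumption, since the text does not specify the estimator for VisuShrink:

```python
import math
import numpy as np

def universal_threshold(hh1, num_pixels):
    """VisuShrink universal threshold t = sigma * sqrt(2 log M), Eq. (4.17).

    sigma is estimated with the median absolute deviation of the finest
    diagonal sub band hh1 (an assumed, commonly used estimator)."""
    sigma = np.median(np.abs(np.asarray(hh1, dtype=float))) / 0.6745
    return float(sigma * math.sqrt(2.0 * math.log(num_pixels)))
```

Because the threshold grows with log M, larger images are thresholded more aggressively, which contributes to the over-smoothing noted above.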

3. SURE Shrink

Donoho and Johnstone proposed a threshold chooser based on the concept of Stein's Unbiased Risk Estimator (SURE), known as SureShrink. It combines the features of the universal threshold and the SURE threshold [164][165], and suggests a level-dependent threshold value for each resolution level. The goal of SureShrink is to minimize the mean squared error [166], defined as

MSE = (1/(mn)) Σ_{x=1}^{m} Σ_{y=1}^{n} ( Z(x, y) − S(x, y) )²    (4.18)

where Z(x, y) is the output image, S(x, y) is the original image without noise, and m × n is the size of the image. SureShrink suppresses noise by thresholding the empirical wavelet coefficients. The SureShrink threshold t* is defined as

t* = min( t, σ √(2 log N) )    (4.19)

where t is the value that minimizes Stein's Unbiased Risk Estimator, σ is the noise standard deviation and N is the size of the image. The method is smoothness adaptive: if the unknown function contains abrupt changes or boundaries, the reconstructed image contains them as well [167][168].

4. Norm Shrink

The threshold, which depends on the sub band characteristics of the transform, is given by

T_N = β σ² / σ_y    (4.20)

where the scale parameter β is computed once for each scale, using

β = √( log(L_k / K) )    (4.21)

where L_k is the length of the sub band at the Kth scale and σ² is the noise variance [163], which can be estimated from the diagonal detail sub band.

5. Universal Shrink

The universal shrinkage method was introduced [169] to denoise noisy images using the wavelet transform. The threshold T is defined as

T = σ √(2 log N)    (4.22)

where N is the size of the image and σ is the local noise standard deviation in the sub band, whose variance is calculated as

σ² = (1/N) Σ_{j=0}^{N−1} X_j²    (4.23)

4.2.3 Proposed Methodology

The proposed method uses a discrete wavelet transform based approach for denoising the noisy document image. It uses the Bayes shrink method for thresholding, and Daubechies wavelets, selected on the basis of experimentation on 500 images. Figure(4.1) shows the results of the five thresholding algorithms. The results of the five thresholding algorithms for five images are shown in Table(4.1), and PSNR values for 20


Table 4.1: Comparison of various wavelet thresholding methods for five images along with PSNR values (in dB).

Image No →    1        2        3        4        5
Bayes       27.5350  28.6793  28.5738  26.1720  24.8080
SURE soft   27.4348  28.1360  28.1144  25.9753  24.4478
Visu soft   27.4597  28.1038  28.1196  25.9671  24.4574
Norm soft   27.5298  28.6736  28.5581  26.1685  24.8017
Univ soft   27.4433  27.9932  28.0337  25.9598  24.4888


Figure 4.1: Comparison of all thresholding methods.

images are shown in Table(4.2). The proposed method also uses adaptive histogram equalization to enhance the contrast of the image and mathematical morphology for background suppression; post processing is done by applying the bottom hat morphological operation, which is equivalent to subtracting the input image from the result of a morphological closing on the input image. The algorithm is explained in detail in the following subsections.

4.2.3.1 Stage 1: Mathematical Reconstruction

1. R1 ← Apply adaptive histogram equalization on noisy input image.

2. R2 ← Perform gray scale opening on R1.

3. R3 ← Add R1 and R2.

4. R4 ← Perform morphological closing on R3.

5. R5 ← Reconstruct the image by subtracting closed image R4 from R3.


Table 4.2: PSNR values obtained from five different thresholding methods for few

images.

S. No  BayesSoft  SURESoft  VisuSoft  NormSoft  UniSoft

1 24.4761 24.3609 24.3372 24.4711 24.4690

2 24.6531 24.4182 24.4140 24.6449 24.4513

3 24.8936 24.4925 24.5077 24.8842 24.5345

4 26.4264 26.2376 26.2288 26.4264 26.2450

5 26.1738 25.9753 25.9671 26.1685 25.9598

6 25.1758 24.5030 24.5185 25.1532 24.5213

7 24.8116 24.4478 24.4574 24.8017 24.4888

8 25.2862 24.7684 24.7735 25.2426 24.7675

9 24.6178 24.3007 24.3324 24.6081 24.3688

10 24.5663 24.3703 24.4560 24.5513 24.4721

11 25.5967 24.9730 24.9772 25.5880 24.9895

12 24.7232 24.2524 24.2662 24.7137 24.2689

13 26.1956 25.7236 25.6325 26.1956 25.7726

14 28.9752 28.7342 28.7704 28.9520 28.7376

15 28.7500 28.3716 28.3746 28.7398 28.2950

16 28.4078 28.2830 28.2827 28.4225 28.2398

17 28.9265 28.4545 28.4810 28.9007 28.3533

18 27.9564 27.4455 27.4085 27.9569 27.3642

19 29.0562 28.6055 28.5946 29.0443 28.5176

20 27.5408 27.4348 27.4597 27.5298 27.4433

4.2.3.2 Stage 2: Denoising by Wavelet Transform

Apply the wavelet transform denoising method on the reconstructed image R5 from subsection (4.2.3.1) to obtain the denoised image R6. The Daubechies wavelet is used, and thresholding is applied to the detail (vertical, horizontal and diagonal) coefficients of both the first and second level decompositions. Bayes Shrink is employed for thresholding, as it denoises better than the other four methods explained in subsection (4.2.2.1).

1. R6 ← Apply wavelet transform based denoising to R5.

4.2.3.3 Stage 3: Postprocessing

1. R7 ← Add output of 2nd stage(4.2.3.2)(R6) to R1 of 1st stage(4.2.3.1).

2. R8 ← Apply bottom hat operation (the difference between the closing of the

original image and the original image) to R7.

3. R9 ← Reconstruct the image by adding complemented versions of R8 and R7.

4.2.3.4 Algorithm

This algorithm takes the noisy degraded document and produces the output image by adjusting the contrast, enhancing the characters, eliminating the noise, and removing the uneven background.

Input: Degraded historical document image.

Output: Enhanced image.

begin

1. Perform stage 1 (sub section[4.2.3.1]) operations on input noisy image to get

partially enhanced image.

2. Perform stage 2 (sub section[4.2.3.2]) operation on output of stage 1.

3. Perform postprocessing operations mentioned in stage 3 (sub section[4.2.3.3])

on output of stage 2.

end.
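The three stages can be sketched end to end in NumPy. This is a simplified, illustrative rendering, not the thesis implementation: a plain min/max rank filter stands in for gray scale morphology with a square structuring element, a global contrast stretch stands in for adaptive histogram equalization, and the `denoise` argument is a pluggable stand-in for the stage 2 wavelet step.

```python
import numpy as np

def _rank_filter(img, k, fn):
    """Apply min or max over a k x k neighborhood (edge-padded)."""
    r = k // 2
    p = np.pad(img, r, mode="edge")
    windows = [p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
               for dy in range(k) for dx in range(k)]
    return fn(windows, axis=0)

def grey_open(img, k=3):   # erosion followed by dilation
    return _rank_filter(_rank_filter(img, k, np.min), k, np.max)

def grey_close(img, k=3):  # dilation followed by erosion
    return _rank_filter(_rank_filter(img, k, np.max), k, np.min)

def enhance(img, denoise=lambda x: x, k=3):
    """Stages 1-3 of the proposed algorithm (simplified sketch)."""
    img = np.asarray(img, dtype=float)
    rng = float(img.max() - img.min()) or 1.0
    r1 = (img - img.min()) / rng * 255.0       # stage 1 (stand-in for AHE)
    r2 = grey_open(r1, k)
    r3 = np.clip(r1 + r2, 0.0, 255.0)
    r4 = grey_close(r3, k)
    r5 = r3 - r4                               # subtract closed image
    r6 = denoise(r5)                           # stage 2: wavelet denoising step
    r7 = np.clip(r6 + r1, 0.0, 255.0)          # stage 3
    r8 = grey_close(r7, k) - r7                # bottom hat
    r9 = np.clip((255.0 - r8) + (255.0 - r7), 0.0, 255.0)
    return r9
```

Any of the wavelet denoisers discussed earlier can be passed in as `denoise`; the identity default makes the morphological stages testable on their own.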

4.2.4 Results and Discussions

Experimentation has been conducted on the entire data set using the wavelet transform in combination with morphological operations to enhance degraded documents. To select a suitable thresholding algorithm, experimentation was initially carried out on more than 500 images using the five wavelet thresholding methods, and PSNR values were calculated. Of the 500 results, only 20 values are tabulated in Table(4.2). Table(4.1) shows the result images of the thresholding algorithms and the corresponding PSNR values. Based on the PSNR values of the denoised images and on human visual perception, the Bayes shrink method gives the best results and is thus selected as the thresholding algorithm to denoise the degraded historical documents. Soft thresholding has been employed since it gives a smoothing effect to contours and edges.


Figure 4.2: (a) Paper manuscript image-3 of previous century. (b) Enhanced image

using WT based approach.

Experimentation is carried out on paper documents from a private collection, belonging to the nineteenth and the beginning of the twentieth century. The result of the WT based approach on the input image of Figure(4.2)(a) is shown in Figure(4.2)(b). Figure(4.3) shows the results of the WT based approach on the input images shown in Appendix 1, (a) Figure(B.2) and (b) Figure(B.3). The proposed method enhances these images reasonably well, but again the selection of the proper size of the structuring element is not automated: each image set must be manually inspected and normalized so that a proper structuring element size can be used. However, compared to the NLMF method, the WT based approach takes less time and is not governed by a controlling parameter as in the BF and NLM methods. The proposed method is also tested on palm leaf manuscripts belonging to various centuries, from the 16th to the 18th. Input images and results


Figure 4.3: Enhanced images using WT based approach on paper manuscript images shown in Appendix 1: (a) Figure(B.2) and (b) Figure(B.3).

are shown in (a) and (b) of Figure(4.4), Figure(4.5), Figure(4.6) and Figure(4.7). The method performs better on the palm leaf images than on the paper images.

Figure 4.4: (a) Palm leaf manuscript image belonging to 16th - 18th century. (b)

Enhanced image using WT based approach.


Figure 4.5: (a) Palm leaf manuscript image belonging to 18th century. (b) Enhanced

image using WT based approach.

Results of the WT based approach on the stone inscription images shown in Figure(4.8)(a), Figure(4.9)(a), (c) and Appendix 1 Figure(C.2) are presented in Figure(4.8)(b), Figure(4.9)(b), (d) and Figure(4.10)(b). The method enhances some of the stone inscriptions properly, but is unable to enhance the severely degraded ones. The limitation of the proposed method is that the wavelet is unable to handle curve discontinuities; hence a curvelet transform based approach is implemented to address this issue, as explained in the next section.


Figure 4.6: (a) Palm leaf manuscript image belonging to 18th century. (b) Enhanced

image using WT based approach.

4.3 Curvelet Transform (CT) Based Approach

Remarkable efforts of researchers have produced significant contributions in the field of the spectral domain. Even though intense research work has been carried out in the wavelet field, the wavelet transform is suitable only for addressing point discontinuities and fails to address edge and curve discontinuities. Apart from the edge discontinuity problem, the discrete wavelet transform uses only three directional wavelets (horizontal, vertical and diagonal) to capture image information. The wavelet spectral domain is therefore unable to represent images which contain a high level of directionality. Because of this limitation of the discrete wavelet transform, researchers have introduced spectral approaches that capture more directional information in an image. This resulted in the

development of ridgelet and curvelet transforms [170]. The curvelet transform has been developed to overcome the limitations of wavelets and Gabor filters.

Figure 4.7: (a) Palm leaf manuscript image belonging to 18th century. (b) Enhanced image using WT based approach.

The multiple orientation approach of Gabor filters has proved to be better than the wavelet transform

in representing textures and retrieving images. However, Gabor filters are unable to provide complete spectral information, because of which they cannot be effectively used to represent images; this degrades classifier performance. Hence curvelet transforms are a promising method for capturing spectral information and can be employed in denoising, reconstruction and feature extraction problems. The curvelet transform also provides flexibility in the degree of localisation in orientation, which varies with scale. Fine scale basis functions in the curvelet are long ridges, and the shape of the basis functions at scale j is given by 2^j × 2^(j/2).


Figure 4.8: (a) Stone inscription image belonging to seventeenth century. (b) Result

of WT based approach.

4.3.1 Overview of Curvelet Transform

The curvelet transform implemented using the ridgelet transform proved to be less efficient [171], because of the complex nature of the ridgelet transform. Candes et al. [172] proposed two new curvelet transforms based on the Fast Fourier Transform (FFT), referred to as Fast Discrete Curvelet Transforms (FDCT). The first form is based on the Unequally-Spaced Fast Fourier Transform (USFFT) and the second is wrapping based. The wrapping based curvelet transform is faster to compute and more robust than the ridgelet transform and the USFFT based curvelet transform [171]. The curvelet transform based on wrapping of Fourier samples takes a 2-D image as input in the form of a Cartesian array f[m, n], 0 ≤ m < M, 0 ≤ n < N, and generates a number of curvelet coefficients indexed by a scale j, an orientation l and two spatial location parameters k1, k2. The discrete curvelet coefficients are defined by:

C^D(j, l, k1, k2) = ∑_{0≤m<M, 0≤n<N} f[m,n] ϕ^D_{j,l,k1,k2}[m,n]    (4.24)

where ϕ^D_{j,l,k1,k2}[m,n] is a digital curvelet waveform.

Wrapping based FDCT [172] is a multi scale transform with a pyramid structure and includes several subbands at different scales in the frequency domain. The orientations and positions of the subbands at high frequencies are different from those at low


Figure 4.9: (a) and (c) Stone inscription images belonging to 14th - 17th century. (b)

and (d) Results of WT based approach.

frequency. The curvelet waveform looks like a needle shaped element at high scales, whereas it is non-directional at the coarsest scale. Curvelets become finer and smaller at high scales and address curved edges more sensitively. The FDCT uses an effective parabolic scaling approach on the subbands in the frequency domain to capture curved edges within an image more effectively. Since the curvelet effectively captures the curves in an image, curved singularities can be well approximated. The best results are achieved in the frequency domain: both the curvelet and the image are transformed to the Fourier frequency domain and then multiplied. The frequency response of the curvelet transform is a trapezoidal wedge, shown in Figure(4.11)(a). This wedge data cannot be accommodated directly in a rectangle of size 2^j × 2^(j/2). To


Figure 4.10: Result of WT based approach on stone inscription belonging to seven-

teenth century shown in Appendix 1 Figure (C.2).


Figure 4.11: (a) Wrapping data, initially inside a parallelogram, into a rectangle by periodicity (figures reproduced from paper [172]); the shaded region represents the trapezoidal wedge. (b) Discrete curvelet frequency tiling.

overcome this problem, Candes et al. [172] have implemented wrapping based FDCT, where a parallelogram with sides 2^j and 2^(j/2) is chosen as a support for the wedge data. The wrapping procedure is applied by periodically tiling the spectrum inside the wedge and collecting the rectangular coefficient area in the centre. Figure(4.11)(b) shows the wrapping of the data into a rectangular tile. Taking the inverse FFT then gives the curvelet coefficients in the spatial domain. The fastest curvelet transform currently available is curvelets via wrapping [173], [174], [175], which is used in our work.


4.3.2 Proposed Method

The curvelet transform is used to eliminate noise and enhance the degraded noisy image. The mathematical morphological operators opening and closing are used to eliminate the background of the document image. The method consists of the following steps.

4.3.2.1 Denoising Using Curvelet Transform

The Curvelet toolbox is used to extract curvelet coefficients as explained in subsection (4.3.1). A threshold value is applied to normalize the curvelet coefficients, and the inverse curvelet transform is then applied to obtain the output image in the spatial domain. Three levels (scales) of decomposition are applied in the enhancement process.

1. OutputImage1 ← Extract curvelet coefficients using the curvelet transform, normalize the curvelet coefficients by applying thresholding, and take the inverse curvelet transform.
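The extract–threshold–invert pattern above can be sketched as follows. Since the CurveLab toolbox used in this work is MATLAB-only, this illustrative Python sketch substitutes a 2-D FFT for the curvelet transform; the three-step structure (forward transform, coefficient thresholding, inverse transform) is the same, and the `keep_fraction` parameter is a hypothetical stand-in for the threshold value used in the thesis.

```python
import numpy as np

def transform_threshold_denoise(img, keep_fraction=0.1):
    """Sketch of the transform-domain denoising step: forward transform,
    threshold the coefficients, inverse transform.  The 2-D FFT stands in
    for the curvelet transform here; the three-step structure is the same."""
    coeffs = np.fft.fft2(img.astype(float))
    mags = np.abs(coeffs)
    # Hard threshold: zero out all but the strongest coefficients.
    thresh = np.quantile(mags, 1.0 - keep_fraction)
    coeffs[mags < thresh] = 0.0
    # Return to the spatial domain.
    return np.real(np.fft.ifft2(coeffs))
```

With a curvelet implementation in place of the FFT, only the transform pair changes; the thresholding step is unchanged.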

4.3.2.2 Algorithm

This algorithm takes the noisy degraded document and produces the output image by adjusting the contrast, enhancing the characters, eliminating the noise and eliminating the uneven background.

Input: Degraded historical document image.

Output: Enhanced image.

begin

1. Perform stage 1 of sub section[4.2.3.1] operations on input noisy image to get

partially enhanced image.

2. Perform stage 2 of sub section[4.3.2.1] operation on output of stage 1.

3. Perform postprocessing operations mentioned in stage 3 of sub section[4.2.3.3]

on output of stage 2.

end


Figure 4.13: (a)-(b) Input images. (c)-(d) Results of the first and second stages of the curvelet based approach. (e)-(f) Results of the last stage (image 15-49).


Figure 4.14: (a) Palm leaf manuscript image dating from between the 16th and 18th centuries. (b) Enhanced image using WT based approach. (c) Result of CT based approach.

4.3.3 Results and Discussions

The experimentation has been carried out using the Matlab Curvelet Toolbox downloaded from Curvelab.org [176]. The proposed method has been tested on historical documents in the Kannada language and enhances severely degraded images by eliminating the dark background. Figure(4.12) shows the results of the curvelet transform based approach on paper, palm leaf and stone inscription images. Figure(4.13) shows the results of the CT based approach, along with the intermediate results, on palm leaf manuscript images.

The results of the CT based approach for 5 images, which include palm leaf and paper document images, are given in Table(4.3). Results on some more palm leaf images are shown in Figure(4.14), Figure(4.15) and Figure(4.16). The proposed method also enhances the stone inscriptions; the results are shown in Figure(4.17) and Figure(4.18).


Figure 4.17: (a) Result of WT based approach, (b) result of CT based approach on

image shown in Figure(4.8)(a).

4.4 Summary

Two frequency domain based approaches have been developed and experimentation has been performed on historical Kannada document images. The first method is based on the wavelet transform and the second on the curvelet transform. The curvelet transform is better suited to handling curve discontinuities and gives smoother curves than the wavelet transform. The low contrast and uneven background intensity have been handled using mathematical reconstruction techniques. Both frequency domain methods are compared using PSNR values, execution time and human visual perception. The curvelet transform based approach outperforms the wavelet transform based approach with respect to visual appearance by human interpretation and PSNR values, but takes slightly more time than the WT based method; these results are given in Table(4.7). Table(4.4), Table(4.5) and Table(4.6) show the PSNR values and execution times for paper document images, palm leaf document images and stone inscriptions respectively.
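For reference, the PSNR figures reported in Tables 4.4–4.6 follow the standard definition below. This is an illustrative NumPy sketch assuming 8-bit images with peak value 255, not the thesis implementation:

```python
import numpy as np

def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB.  Higher values indicate the
    processed image is closer to the reference image."""
    diff = np.asarray(reference, float) - np.asarray(processed, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```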

4.5 Discussion on Enhancement Algorithms

In the current and the previous chapter, we have presented five enhancement algo-

rithms and experimented on large data sets containing approximately 2700 images.


Figure 4.18: Results of the WT based method shown in (a), (c) and results of the CT based method shown in (b), (d) for the stone inscription images shown in Figure(4.9)(a) and (c).

The proposed methods are able to enhance the degraded documents and produce better results in terms of giving quality binary images. The proposed methods are compared using basic filtering/denoising techniques only. The proposed methods are tested on Kannada historical document images and also work well on documents in other languages. However, in our research work we concentrate on era prediction for characters belonging to the Kannada language only, because the inclusion of other languages creates further complexities and demands additional algorithms for identifying the language. We are unable to compare the proposed methods with state-of-the-art methods, because we have used our own data sets to conduct experiments. We could


not completely implement state-of-the-art methods and experiment with them on our data sets. We wanted to implement simple techniques to enhance Kannada documents. Performance evaluation parameters such as PSNR, SSIM and MSE are not standard: PSNR is a quantitative measurement, and a high PSNR value may not always signify an enhanced image. The Structural Similarity Index Measure (SSIM) is applicable only when original ground truth images are available, since similarity is measured between the original image and the restored image. In our research work, original images are not available and therefore SSIM cannot be used to measure performance. We cannot compare the results of our method with state-of-the-art methods, even if those methods were implemented and used for experimentation, because the data sets used to compare their results are different from our data set. One more evaluation criterion is to use the enhanced binarized image for segmentation. The segmentation algorithm should segment the document image into lines, words and characters properly; if it does so, then the performance of the enhancement methods can be said to be satisfactory. The output of the enhancement techniques is used as input to the segmentation algorithms explained in the next chapter, which segment the document image into lines, words and characters.


Table 4.4: Comparison of PSNR Values and execution time for Wavelet and Curvelet

Transform based methods on paper images.

S.No  PSNR (WT Based)  PSNR (CT Based)  Time in Sec (WT Based)  Time in Sec (CT Based)

1 25.8798 35.5900 3.0985 5.7283

2 25.7759 35.2788 2.4512 5.0381

3 24.9246 33.8267 2.1122 4.5498

4 27.3457 37.8142 2.3272 4.5129

5 24.6156 30.0907 1.9518 4.5297

6 25.1462 34.0457 2.1306 4.5294

7 25.6768 29.4456 1.9917 4.5357

8 24.4999 31.7419 1.9671 4.5847

9 24.5966 36.1923 2.2618 4.5223

10 25.0807 36.1255 2.5894 4.5129

11 24.3833 37.0358 2.5262 4.5744

12 24.5652 32.2564 2.3669 4.5386

13 26.7434 34.2467 2.3850 4.5189

14 25.3168 31.2967 2.3612 4.5554

15 24.9640 30.2086 2.3910 4.5514


Table 4.5: Comparison of PSNR Values and execution time for Wavelet and Curvelet

Transform based methods on palm leaf images.

S.No  PSNR (WT Based)  PSNR (CT Based)  Time in Sec (WT Based)  Time in Sec (CT Based)

1 24.2724 35.6228 2.3424 4.6382

2 24.2116 36.6294 2.3034 4.6146

3 24.3478 31.7468 2.2459 4.5703

4 24.2225 36.0736 1.9751 4.5886

5 24.6479 36.7832 2.0093 4.6487

6 24.7061 36.0069 1.9547 4.6589

7 24.4497 30.2781 1.8661 4.8125

8 24.2690 37.7530 2.0014 4.5264

9 24.3078 30.6214 1.8734 4.5395

10 24.4159 31.3910 1.8745 4.5329

11 24.3806 31.4680 1.8825 4.4984

12 24.1759 38.9257 2.0604 4.7344

13 24.2226 36.1537 1.9738 4.5569

14 24.3169 34.9385 1.9947 4.5872

15 24.4767 31.6551 1.8945 4.5200

16 24.3498 32.2874 1.9326 4.5080

17 24.3430 36.9334 2.0311 4.5991

18 24.3146 33.2810 1.9806 4.5533

19 24.3899 31.9061 1.8903 4.5287

20 24.2313 37.4947 1.9488 4.6408

21 24.4372 31.0819 1.9853 4.5355

22 24.1572 39.0294 2.3300 4.7449

23 24.3488 38.2007 2.0846 4.6863

24 24.5303 31.5589 1.9545 4.6738

25 24.3335 37.4303 2.0544 4.6256


Table 4.6: Comparison of PSNR Values and execution time for Wavelet and Curvelet

Transform based methods on stone inscription images.

S.No  PSNR (WT Based)  PSNR (CT Based)  Time in Sec (WT Based)  Time in Sec (CT Based)

1 24.4136 31.1071 2.7333 5.3786

2 24.4923 37.5022 2.1230 4.9018

3 24.5634 35.9363 2.1270 4.8408

4 24.6208 39.2543 2.2215 7.5410

5 25.1249 32.9747 1.9283 6.8818

6 24.9396 33.1379 1.9560 6.7669

7 24.6928 30.6739 1.8946 6.8962

8 24.3710 31.7174 1.9159 6.9271

9 24.9423 29.7077 1.8633 7.7404

10 25.0671 33.1831 1.9341 6.8797


Chapter 5

Segmentation of Document Images


5.1 Introduction

In the previous two chapters, enhancement algorithms have been developed to en-

hance the degraded historical documents using spatial and frequency domain tech-

niques. In this chapter, document image segmentation algorithms have been pre-

sented. Document image segmentation is the process of segmenting the document

image into lines, words and characters. Segmented characters are used further in the

classification and recognition stages. Efficiency of the classifier is completely depen-

dent on the character features extracted, which in turn depends on the segmentation

of the characters. Hence the development of efficient segmentation algorithms to ex-

tract lines, words and characters is very important. Extracting lines from printed doc-

uments is comparatively simpler than extracting lines from handwritten documents,

as lines in a handwritten document usually contain nonuniform spacing between

1Some of the material of this chapter appeared in the following research papers

1. B. Gangamma, Srikanta Murthy K, Hemanth Kumar G, Riddhi J Shah, Swati D V, Sandhya B, "Text Line Extraction from Kannada Handwritten Document", IEEE International Conference on Computer Engineering and Technology, Jodhpur, India, pages E 8-11, November 2010.

2. B. Gangamma, Srikanta Murthy K, Riddhi J. Shah, Swati D V, "Text Line Extraction from Palm Script Documents Using Morphological Approach", International Conference on Computer Engineering and Applications, Dubai, UAE, pages 1452-1455, January 29-31, 2012.


them. Apart from this, historical document images which are inscribed or written usually pose uneven line spacing, inscriptions over curved lines, overlapping text lines etc., making segmentation of the document difficult. Therefore, there is a need for the development of efficient segmentation algorithms to address these problems.

This chapter deals with the segmentation of the document image into lines. The chapter is organized into six sections. Section one provides a brief introduction, the second section gives information about the proposed methodologies, sections 3, 4 and 5 detail the three different segmentation algorithms, and the last section provides a summary of all the methods.

5.2 Proposed Methodologies

Tremendous efforts have been expended to address the segmentation problem. But the existing algorithms address only a specific set of problems and are unable to address all segmentation problems. It is still an open challenge for the research community to devise a suitable algorithm for the segmentation problem. It is noticed from the literature survey that only a small number of researchers have addressed the segmentation of historical documents, and not much work can be traced on South Indian language documents. This has motivated us to design suitable segmentation algorithms to segment Kannada language historical documents into lines and characters and to extract character features to recognize the era of the characters.

The segmentation algorithm requires a binarized image as input. Binarization is the process of separating (segmenting) the document into foreground and background groups. This process also requires thorough preprocessing methods to enhance the documents, as there is significant degradation in these documents due to various factors which are discussed in the previous chapters. Hence there is a need to develop efficient preprocessing algorithms. In this chapter, an attempt is made towards developing efficient segmentation algorithms to segment the historical document image. The segmentation algorithm requires a well formed binary image. The results


Figure 5.1: (a) Handwritten Kannada document image. (b) Horizontal projection

profile of handwritten document image.

of the preprocessing algorithms are considered as input to the segmentation process and are binarized using the global thresholding method of Otsu [53].
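Otsu's method selects the grey level that maximises the between-class variance between foreground and background. A minimal NumPy sketch of that selection rule (an illustration, not the thesis implementation, which uses MATLAB):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's global threshold: pick the level t that maximises the
    between-class variance of the pixels below and above t."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)  # sum of all pixel values
    best_t, best_var = 0, 0.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += hist[t]            # pixels in the background class
        if w0 == 0:
            continue
        w1 = total - w0          # pixels in the foreground class
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0           # background mean
        m1 = (sum_all - sum0) / w1  # foreground mean
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Pixels above the returned level are treated as one class and the rest as the other, giving the binary image the segmentation stage expects.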

Two algorithms have been developed for the extraction of text lines and characters from the historical document images, and an algorithm is developed to detect and correct skew in the document. All of these are explained in detail in the following sections.

5.3 Method 1: Piece-wise Horizontal Projection Profile Based Approach

The global horizontal projection profile is the most widely used method to segment lines. This method is well suited to printed documents where the spacing between lines is prominent. Individual lines are segmented based on the valley points in the histogram. It is also used to segment handwritten documents into lines when there is sufficient spacing between lines. The document image shown in Figure(5.1)(a) is a handwritten Kannada document with uniform spacing between lines and Figure(5.1)(b) is its horizontal projection profile. The projection profile is used to extract individual lines from the document. The gap between two valleys is used to separate the lines. This


method works well for images with uniform spacing between lines. However, not all handwritten documents possess this uniformity. Lines in a document are often skewed or curved and pose uneven spacing between lines. The global projection profile method fails to segment the lines in such situations. A sample input text document with uneven spacing between lines is shown in Figure(5.2) and its projection profile is shown in Figure(5.3). The proposed method is devised to address the segmentation of document images with uneven line spacing into lines and characters.

This method consists of four stages: the first stage divides the document into vertical strips, the second stage obtains the horizontal projection profile of the individual strips, the third stage reconstructs the lines using the vertical strips and the last stage deals with extraction of the characters. The following subsections explain the algorithm steps in detail.

Figure 5.2: Handwritten Kannada document image.

Figure 5.3: Horizontal projection profile of the input image Figure(5.2).


5.3.1 Division into Vertical Strips

In this approach, the image is divided into vertical strips of equal width W as shown in Figure(5.4). The value of W is chosen in multiples of 100, depending on the size of the image: if the image width is less than 500, then 100 is a better value, and if it is more than 1000, then 200 is better. The more the number of pieces, the better the extraction, but the arrangement of the pieces into blocks of lines becomes very imprecise, so the number of vertical strips should reasonably be between 5 and 10.

To calculate the number of strips N, the total number of columns C in the image is divided by W, giving N = ⌊C/W⌋. If C is exactly divisible by W, then all strips are of equal size. Otherwise there are N + 1 strips: N strips of width W and an (N + 1)th strip whose width is C − N·W.
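The strip computation above can be sketched as follows; the last strip absorbs the C − N·W remainder columns when the image width is not an exact multiple of W. This is an illustrative NumPy sketch, not the thesis code:

```python
import numpy as np

def vertical_strips(image, W=100):
    """Divide a page image into vertical strips of width W.  When the
    number of columns C is not an exact multiple of W, the final strip
    holds the remaining C - N*W columns."""
    C = image.shape[1]
    edges = list(range(0, C, W)) + [C]  # strip boundaries
    return [image[:, a:b] for a, b in zip(edges[:-1], edges[1:])]
```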

5.3.2 Horizontal Projection Profile of a Strip

For each strip obtained using the above method, the horizontal projection profile is calculated and the text pixel count of each row is stored in a pixel count array. The plot of pixel count versus row number yields a projection profile which contains clear peaks and valleys. Valley points on the plot represent rows with zero text pixels and peak points represent rows with the maximum number of text pixels. Rows with zero text pixels are denoted Zero Rows (ZR) and rows with non-zero pixel counts Non Zero Rows (NZR). Keeping only one ZR between two NZRs and eliminating the other ZRs makes extraction of the lines easier. Figure(5.4) depicts the vertical division, the zero pixel count rows and the non-zero pixel count rows. The horizontal projection profile of a strip is shown in Figure(5.5).
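A sketch of the per-strip profile and the ZR/NZR bookkeeping, assuming a binarized strip where text pixels are 1 and background pixels are 0 (illustrative only):

```python
import numpy as np

def projection_profile(strip):
    """Count the text (ON) pixels in each row of a binarized strip."""
    return strip.sum(axis=1)

def nonzero_runs(profile):
    """Return (start_row, end_row) pairs of consecutive non-zero rows,
    i.e. the NZR1/NZR2 pairs that bound each text line in a strip."""
    runs, start = [], None
    for r, count in enumerate(profile):
        if count > 0 and start is None:
            start = r                      # first NZR of a run
        elif count == 0 and start is not None:
            runs.append((start, r - 1))    # previous row closed the run
            start = None
    if start is not None:                  # run reaches the last row
        runs.append((start, len(profile) - 1))
    return runs
```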

5.3.3 Reconstruction of the Line Using Vertical Strips

Scanning each row of the profile array h, search for the first NZR and store the row number in Rn as the starting row number (NZR1), where n represents the strip number. Continue to scan until the next row with zero text pixel count is found. Store the previous row


Figure 5.4: Non-Zero Rows (NZRs) and rows labelled NZR1 and NZR2.

Figure 5.5: Horizontal projection profile of a strip.

as the ending row number (NZR2) of the first line. Continue to scan the profile array until the next NZR is found, store this row number as NZR3, and proceed scanning until the ending row number of the second line is found, as discussed for the first line. This process is repeated until all the lines are scanned and all the strips are processed.

Once all the potential NZRs are extracted, the distance between each pair of consecutive potential NZRs is calculated. The average distance is used to check whether these NZRs represent the start and end of a single line or not. The average height of all the lines is used as the threshold value. If the difference between the corresponding NZRs of the adjacent strip is less than the threshold, then both the NZR1 and NZR2 values are considered; otherwise they are ignored for the current line. To extract the first line, the contents between the first pair of NZRs are extracted from each strip and

Page 151: Automation of Preprocessing and Recognition of Historical Document Images

joined. This method is applied repeatedly to all pairs of NZRs from each strip in

order to extract all the lines.


Figure 5.6: Extracted text lines.

5.3.4 Character Extraction

The extracted lines are used to extract words and characters using connected component analysis (CCA). The vertical projection profile is suitable for extracting words from paper document images, but is unable to separate words in palm leaf document images because of improper spacing between words and characters. Also, prediction of the era requires individual characters, so extraction of the characters is performed. However, reconstruction of characters from segmented pieces and broken characters is not performed, as it is out of the scope of our research work.

5.3.5 Algorithm for Document Image Segmentation.

Input: Input document image.

Output: Segmented lines and characters.

1. Binarize the input image using Otsu[53] method.

2. Divide the image into vertical strips of size W.



Figure 5.9: Input handwritten image and extracted Lines.


Figure 5.10: Extracted characters.

3. For each strip, obtain the horizontal projection profile h, that is the array

containing number of text pixels in each row.

4. Find the potential NZR and store the corresponding row numbers in separate

array Rn where R represents an array to store row number and n represents

the strip number.

5. Repeat steps 3 and 4 for all the strips.


6. Extract the first pair of row numbers from Rn, join the corresponding strip contents to make the first line, and store it as a separate line image.

7. Extract the characters from the extracted line using CCA.

8. Repeat steps 6 and 7 until all the lines are formed.

5.3.6 Results and Discussion

Experimentation has been performed on 200 different Kannada language handwritten documents and some of the results are shown here. Two sets of documents are considered for experimentation, one containing paper documents and the other containing palm leaf documents. Figure(5.2) shows a handwritten paper document image. Its projection profile, shown in Figure(5.3), does not have distinct gaps between two valleys, so separation of the lines from the global profile is not possible. Dividing the document into vertical strips allows each vertical strip to be processed separately. From each vertical strip, individual line pieces are extracted and stored separately. Each piece of a line from each strip is then taken and joined to form a single line. The extracted lines from the document image are shown in Figure(5.6). The vertical projection profile is employed to extract individual characters from each line; the extracted characters are shown in Figure(5.7). Figure(5.8) shows the extracted characters for all the lines in the document. One more experimental result is shown in Figure(5.9) for the extraction of lines, and Figure(5.10) shows the characters extracted from Figure(5.9).

Compared to the global projection profile method and the Hough transform method, the proposed method works well for document images with uneven spacing between lines and words. If the gap between two lines is very small and the lines are very curved, then the strip size has to be reduced so that the projection of each individual strip contains prominent gaps between two lines. Dividing the image into too many vertical strips makes the extraction and joining process complicated, and the time taken for reconstruction is also more. Further, the reconstruction of the line will not be proper because of too many pieces. Therefore, dividing the document image into


too many pieces reduces the quality of the line construction. Therefore this method cannot be used to segment touching lines or lines with too much curvature. One such image, shown in Figure(5.11), is subjected to the proposed method; the result of segmentation is shown in Figure(5.12). The 1st, 2nd, 3rd and 6th vertical strips in Figure(5.12) have uneven spacing between lines, which makes the segmentation of the row pieces much more difficult. Therefore, the present research work proposes another algorithm to segment touching lines in the document image, explained in the next section.

Figure 5.11: Input image with uneven spacing between lines

Figure 5.12: Result of method 1 on the image shown in Figure(5.11).

5.4 Method 2: Mathematical Morphology and Connected Component Analysis (CCA) Based Approach

As the piece-wise projection profile method is unable to handle skewed and touching lines, a second method is devised to segment lines from curved-line documents which have uneven spacing between lines and touching lines. The proposed method uses mathematical morphology and connected component analysis. The following sections explain the procedure used to segment an image having uneven spacing between lines.

Figure 5.13: Result of closing operation.


Figure 5.14: Extracted text lines.

The proposed method requires a binarized image of the historical document. Degraded historical document images are enhanced using the methods explained in the previous chapters, 3 and 4; the outputs of the Curvelet transform and Non Local Means filter techniques are used to obtain the enhanced images. The efficient global thresholding algorithm of Otsu [53] is used to binarize the enhanced image.


Figure 5.15: (a) Extracted line. (b) Characters extracted from line (a).

Figure 5.16: Input image.

5.4.1 Morphological Closing Operation

The morphological closing operation is applied to the binarized image, mainly to connect and merge the characters in a line. A line structuring element with length L and angle zero degrees is used for the closing operation, where the value of L can be between 20 and 60. The value of L is selected based on the length of the palm script; a higher value of L bridges the gaps between words and fills the holes that are prominent in palm scripts, tying the scripts of a line together. If the gap between the words is large, then a smaller value of L will create many components. To avoid this, the length of the line structuring element is chosen carefully: the result of the closing operation should yield a single component for each line. The result of the closing operation on the sample input image Figure(5.11) is shown in Figure(5.13).
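A minimal NumPy sketch of closing with a horizontal line structuring element (dilation followed by erosion), assuming a binary image with text pixels set to 1. The thesis uses MATLAB's morphological operators, so this is only illustrative:

```python
import numpy as np

def _shift_cols(img, s):
    """Shift the columns of a binary image by s, zero-filling the edge."""
    if s == 0:
        return img.copy()
    out = np.zeros_like(img)
    if s > 0:
        out[:, s:] = img[:, :-s]
    else:
        out[:, :s] = img[:, -s:]
    return out

def close_horizontal(img, L=21):
    """Morphological closing with a 1 x L horizontal line structuring
    element: gaps narrower than L between characters are bridged, so each
    text line tends to become a single connected component."""
    half = L // 2
    padded = np.pad(img.astype(bool), ((0, 0), (half, half)))
    # Dilation: a pixel is ON if any pixel within +-half columns is ON.
    dil = np.zeros_like(padded)
    for s in range(-half, half + 1):
        dil |= _shift_cols(padded, s)
    # Erosion: a pixel stays ON only if all pixels within +-half are ON.
    ero = np.ones_like(padded)
    for s in range(-half, half + 1):
        ero &= _shift_cols(dil, s)
    return ero[:, half:-half] if half else ero
```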


5.4.2 Line Extraction Using Connected Component Analysis

Once the closing operation is performed on the image, connected component analysis is used to extract the connected components. The following steps are followed to extract lines from the connected component image.

1. Scan the row from the beginning until text(ON) pixel is found.

2. Store this row number as the starting row in an array called Row Numbers (RN). Copy the pixel values to another array called the Extracted Connected Component Line (ECCL).

3. Continue to scan each row and copy the pixels to the ECCL array until all the pixels belonging to one connected component are copied. Store the end row of the connected component in RN.

4. Repeat steps 2 and 3 for all connected components. The line components and the row pair for each connected component line are extracted and stored in the ECCL and RN arrays respectively; the ECCL for each component is maintained separately.

Once the connected components are extracted, the original lines have to be extracted from the image as follows:

• Select the pair of connected component coordinates from the RN array and extract the pixels between the pair of RN values (the starting and ending row numbers) from the original document image. Since the lines are not straight, the content extracted between two rows will usually also contain pixels from the next line. The corresponding connected line component ECCL is therefore taken and a logical AND operation is performed with the extracted pixels of the original document.

• Extracted lines are then stored separately as a line segment. This line segment

can then be used to extract the words and characters for further processing in

recognition system.
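The steps above can be sketched with `scipy.ndimage.label` standing in for the connected component analysis. The toy page below assumes each text line is already one blob after closing; in the method proper, the AND is taken against the original (pre-closing) document image:

```python
import numpy as np
from scipy.ndimage import label

# Toy binarized page: two text lines (True = ink), each already merged into
# a single blob by the closing step.
page = np.zeros((20, 30), dtype=bool)
page[3:6, 2:28] = True     # first line
page[12:16, 4:25] = True   # second line

labels, n = label(page)    # connected component analysis
lines = []
for comp in range(1, n + 1):
    rows = np.where((labels == comp).any(axis=1))[0]
    start, end = int(rows[0]), int(rows[-1])       # the RN pair for this line
    # AND the component mask with the strip between the RN pair, so pixels
    # of a neighbouring line falling inside the strip are excluded.
    strip = page[start:end + 1] & (labels[start:end + 1] == comp)
    lines.append((start, end, strip))

print(n, [(s, e) for s, e, _ in lines])   # 2 [(3, 5), (12, 15)]
```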


5.4.3 Finding the Height of Each Line and Checking for Touching Lines

Using the starting and ending row numbers stored in the RN array for each line, the height of the line can be calculated: the difference between the starting and ending rows gives the height of the connected line component, which is also the height of the line. The average height of all connected components is then calculated. This is required to check whether each component contains one line or more than one line. If a component's height is greater than the average height, the component contains more than one line; this occurs when two or more lines are touching, and the connectivity has to be broken. If the height of the component is less than the average height, the lines are extracted directly. Otherwise, the extracted touching-line component is given as input to the opening operation to break the connectivity between the lines. The same operations of extracting connected components and calculating line heights are then performed again, repeating from the beginning until all single lines are extracted.
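The average-height test can be illustrated with made-up component heights (the values below are hypothetical, not from the thesis's data set):

```python
import numpy as np

# Hypothetical heights (end row - start row) of extracted line components;
# the third one looks like two touching lines merged into one component.
heights = np.array([18, 20, 41, 19])

avg = heights.mean()                 # average component height
touching = heights > avg             # taller-than-average components need opening
print(avg, touching.tolist())        # 24.5 [False, False, True, False]
```

Components flagged here would be re-opened and re-labelled until every component holds a single line.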

5.4.4 Character Extraction

Simple CCA is used to extract the individual characters from the extracted line. A vertical projection profile may not yield proper results because of spacing problems between characters. Since the Kannada script contains vattus and matras, it is difficult to segment the characters together with their vattus and matras properly; the CCA method proved better for character extraction. As mentioned in the previous section, reconstruction of broken characters is not performed.

Figure 5.17: Result of closing operation.


(a) First Line.

(b) Second Line.

(c) Third Line.

Figure 5.18: Result of extraction of connected components (lines).

5.4.5 Algorithm for Segmentation of the Document Image into Lines

Input: Binarized image obtained from the first stage of the proposed method.

Output: Segmented lines of the document image.

1. Apply mathematical morphological closing with a line structuring element.

2. Connected component analysis is applied and components are extracted.

3. Calculate the height of all connected components.

4. Find the average height of the connected components.

5. Check the height of each connected component: if the height of the connected component is less than the average height, then the original line is extracted using the method explained above. If the connected component height is greater than the average height, then the mathematical opening operation is applied to break the touching lines; repeat steps 2 to 5.

6. Use CCA to extract the characters from the extracted line.


Each extracted line component is painted with a different color to show the different components, as shown in Figure (5.14). The characters are then extracted using the CCA method; the CCA is applied to the extracted lines and the results are shown in Figure (5.15).

5.4.6 Results and Discussion

Experiments have been conducted on 200 historical document images of palm leaf and paper scripts with varying spacing between lines; only a few results are presented here. More than 50% of the lines are extracted in the first iteration and the remaining lines in subsequent iterations. If the touching regions between lines are narrow, not more than two iterations are required to extract all the lines.

The same segmentation procedure can be used for word extraction with a smaller value of L for the structuring element. Lines connected to the lines above and below may lose some information; this is addressed when each character is extracted. Some of the segmentation results are shown in the following figures: the input image is shown in Figure (5.16) and its closed image in Figure (5.17). Extracted lines are given in Figure (5.18). One more experimental result, for the image in Figure (5.19), is shown in Figure (5.20), which is the closed (painted) image. The result of line extraction is shown in Figure (5.21).

Touching lines (two lines) in the image are shown in red (the second red line) in Figure (5.22), together with the opened image. Applying the opening operation again on a touching-line component segments it into distinct lines. Results of segmentation of touching lines are shown in Figure (5.22) and Figure (5.23). If the lines touch over a minimal width (a narrow joint), the opening operation can be applied with the same structuring element. If the touching width is larger, segmentation of the lines becomes very difficult and the structuring element size has to be changed. Setting the structuring element size depends entirely on the portion of the line that is touching; development of an algorithm for automatic selection of the structuring element size remains a challenging task for researchers.


Figure 5.19: Result of binarization operation.

Figure 5.20: Result of closing operation.

5.5 Discussion on Method 1 and Method 2

The two algorithms presented in the previous sections are designed to segment and extract lines from the document image. The performance of a segmentation algorithm is usually measured using parameters such as the number of lines extracted, the number of characters extracted, and the number of characters recognized correctly by an OCR system. The performance of a segmentation algorithm also depends completely on the samples present in the data set and on the language OCR. In this research work, data sets containing historical Kannada language documents, inscribed on palm and paper, are considered. OCRs for such characters (old Kannada, middle Kannada) are not available, so it is practically impossible to measure the performance of the segmentation algorithms based on OCR performance. Almost all state-of-the-art methods are tested on standard data sets and the performance of such algorithms can be measured

using the number of lines extracted and the number of characters recognized.

(a) First Line.

(b) Second Line.

(c) Third Line.

(d) Fourth Line.

(e) Fifth Line.

Figure 5.21: Result of extraction of connected components and corresponding lines.

The proposed algorithms are again based on the state-of-the-art methodologies available for segmenting the document image into lines, words and characters

and will work for any language. As the reconstruction of segmented pieces of characters and broken characters is not within the scope of the present research work, only whole/complete characters are used in the recognition stage, which is described in the next chapter. In this research work, an attempt has been made to design simple and efficient algorithms that address some of the issues present in the existing methods. We also wanted to use the segmentation algorithm as another parameter for measuring the performance of the enhancement algorithms presented in the previous chapters. Therefore, the proposed segmentation algorithms cannot be directly compared with the state-of-the-art methods. However, we have tested the performance of the algorithms on a small portion of

our data set which has a clear background. This selection was made manually.

Figure 5.22: (a) Touching line portion. (b) Result of closing and opening operation.

Figure 5.23: Extraction of lines.

In the next section, one more algorithm to find the skew within the document and correct

the skew is explained. The motivation behind the development of this algorithm is to reconstruct the lines after correcting the skew, so that a simple and efficient global projection profile algorithm can be applied to segment the document into lines.

5.6 Skew Detection and Correction Algorithm

Document skew is a common problem that occurs during the digitization process using scanners or cameras. Skew or tilt in the images is caused by incorrect positioning of the documents on the scanner. It may also be introduced


while capturing the photograph. The skew angle in a digital document can be defined as the angle made by the text lines of the document with the direction of the x-axis of the coordinate system.

Skew may cause problems in text line, word and character extraction, and incorrect segmentation leads to incorrect classification. Therefore, it is often necessary to determine the skew angle and correct the skew before proceeding to the subsequent steps, i.e. segmentation, feature extraction, classification, document layout analysis and representation, in order to make recognition in the document image analysis stage more accurate. Hence, skew angle detection is a major and fundamental step in document image analysis.

From the literature survey, it is observed that document skew correction is applied to printed and handwritten documents for the whole page. All the above mentioned methods work well for a whole document with a single skew, but none of the authors has addressed line-wise skew detection and correction in handwritten documents. Skew can also be introduced by the author while writing, and handwritten documents as regular as printed documents are hard to find. Each line may be skewed or slanted upwards or downwards at a different angle, as shown in Figure (5.26) for the image in Figure (5.24); the horizontal profile is shown in Figure (5.25). Such a document can be viewed as multi-skewed, and there is a need to deskew each line separately. If the skew correction is done on each line properly and the document is reconstructed, then the simple and efficient horizontal projection profile method can be used to segment the lines accurately.

The proposed method is based on a line smearing approach. The binarized document image is subjected to mathematical closing, and each line is smeared (painted) by merging all the words and characters into a single block per line. Connected component analysis is used to extract the components and their boundary values. The Upper Left Corner (ULC) point and Lower Right Corner (LRC) point are used to find the skew angle. The pixel value of f(x′, y′) is then obtained by copying the pixel value of f(x, y), i.e. f(x′, y′) = f(x, y).


Figure 5.24: Input skewed image.

The proposed method is explained in detail in the following subsections. Once the binarized image is obtained, the size of the image is calculated; the image length (column size) is used to determine the size of the line structuring element. The next two steps are the same as in the second line extraction algorithm: the morphological closing operation is applied to the binarized document to merge the text lines, and the CCA method is used to extract the connected blocks of lines.

5.6.1 Skew Angle Detection

The skew angle is calculated using the two opposite corners of the connected block, as shown in Figure (5.27). The Upper Left Corner (ULC) and Lower Right Corner (LRC) points are used to calculate the length and width of the connected block of a line.

Figure 5.25: Horizontal projection profile of the input image (5.24).

The skew angle can be calculated using the simple formula

tan(θ) = R/C (5.1)

where R is the row difference, given by R2 − R1, and C is the column difference, given by C2 − C1; R1 and R2 are the row values of the ULC and LRC, and C1 and C2 their column values. Once the skew angle is calculated, new points are obtained in the skew correction step.
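Equation (5.1) amounts to a two-line computation. The corner coordinates below are made up for illustration:

```python
import math

# Hypothetical bounding-box corners of a merged line block (row, col).
R1, C1 = 10, 5       # Upper Left Corner (ULC)
R2, C2 = 30, 205     # Lower Right Corner (LRC)

# Eq. (5.1): tan(theta) = R / C with R = R2 - R1, C = C2 - C1.
theta = math.degrees(math.atan2(R2 - R1, C2 - C1))
print(round(theta, 3))   # about 5.711 degrees
```

Using `atan2` rather than a plain division keeps the sign of the angle correct for lines skewed downwards as well as upwards.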

5.6.2 Skew Correction

Using the calculated skew angle, the actual line is rotated using the formulas given in Eq. (5.2) and Eq. (5.3):

x′ = x cos θ − y sin θ (5.2)

y′ = x sin θ + y cos θ (5.3)

Segmentation of the line has to be done to get the actual line. Using the second line extraction algorithm discussed in the previous subsection, the skewed lines are extracted.
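Applied to the set of ON-pixel coordinates of a skewed line, Eqs. (5.2) and (5.3) with angle −θ flatten the line. A NumPy sketch on synthetic coordinates:

```python
import numpy as np

def deskew_points(points, theta):
    """Undo a skew of theta radians by rotating (x, y) points through -theta,
    using x' = x cos t - y sin t, y' = x sin t + y cos t (Eqs. 5.2, 5.3)."""
    t = -theta
    x, y = points[:, 0], points[:, 1]
    return np.stack([x * np.cos(t) - y * np.sin(t),
                     x * np.sin(t) + y * np.cos(t)], axis=1)

# Ink-pixel coordinates of a line skewed by theta: y = x * tan(theta).
theta = np.arctan(0.1)
x = np.arange(100, dtype=float)
skewed = np.stack([x, x * np.tan(theta)], axis=1)

flat = deskew_points(skewed, theta)
print(float(np.ptp(flat[:, 1])))   # y-spread collapses to (numerically) zero
```

In practice the rotated real-valued coordinates are rounded back onto the pixel grid when the deskewed line is written into the output image.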


Figure 5.26: Result of closing operation.

Figure 5.27: Skew angle calculation from single connected component.

Skew correction is now applied to the actual text line to get the deskewed line. Corrected lines are then transferred to another image to reconstruct the complete document image with uniform spacing between lines. These steps result in the deskewed line shown in Figure (5.28), and they are repeated for all the lines in the entire document.


5.6.3 Algorithm for Deskewing

Input: Binarized document image.

Output: Deskewed image.

1. Calculate the line structuring element value w, equal to one tenth of the column width of the image: w = C/10, where C represents the number of columns in the image.

2. Apply morphological closing with a line element of width w and zero angle.

3. Extract the connected block of merged line using CCA.

4. Find the width c and height of the block r using ULC and LRC values of the

connected block.

5. Calculate the skew angle using the formula θ = tan−1(r/c), consistent with Eq. (5.1).

6. Rotate the image by the skew angle detected in the previous step, using the formulas

x′ = x cos θ − y sin θ (5.4)

y′ = x sin θ + y cos θ (5.5)

7. Append the deskewed lines to another image.

8. Repeat steps 3 to 7 until all the lines are deskewed.
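Steps 1 to 5 can be sketched as follows. A toy slanted stroke stands in for a real document line, and 8-connectivity is used when labelling so that the diagonal steps of the stroke stay in one component (a detail the thesis does not specify):

```python
import numpy as np
from scipy.ndimage import binary_closing, label, find_objects

def line_skew_angles(binary):
    """Close with a horizontal line element of width w = C/10, extract the
    merged line blocks with CCA, and estimate each block's skew angle (in
    degrees) from the ULC/LRC corners of its bounding box."""
    C = binary.shape[1]
    w = max(1, C // 10)
    closed = binary_closing(binary, structure=np.ones((1, w), dtype=bool))
    labels, n = label(closed, structure=np.ones((3, 3), dtype=int))
    angles = []
    for sl in find_objects(labels):
        r = sl[0].stop - sl[0].start - 1    # row difference (block height)
        c = sl[1].stop - sl[1].start - 1    # column difference (block width)
        angles.append(float(np.degrees(np.arctan2(r, max(c, 1)))))
    return angles

# Toy page containing one line of ink with roughly 5.7 degrees of skew.
page = np.zeros((40, 200), dtype=bool)
cols = np.arange(20, 180)
page[(35 - cols * 0.1).astype(int), cols] = True
angles = line_skew_angles(page)
print([round(a, 1) for a in angles])
```

Each returned angle would then drive the rotation of steps 6 and 7 for the corresponding line.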

5.6.4 Results and Discussion

To substantiate the efficiency of the proposed methodology, several experiments have

been conducted on document images of various scripts with different skews of his-

torical documents. Out of them, only a few results are presented here. Historical

documents of Kannada language which are in the form of palm leaf image and paper

images are considered. Since the handwritten documents usually have curved and

140

Page 170: Automation of Preprocessing and Recognition of Historical Document Images

Extracted connected component of a line

Exacted document line

Deskewed line.

Figure 5.28: Result of deskewing.

Table 5.1: Result of skew detection and correction.Merged Line Extracted Line Deskewed Line Skew angle

3.697

4.454

5.434

3.727

5.327

8.045

6.295

7.002

6.545

5.228

6.702

4.618

141

Page 171: Automation of Preprocessing and Recognition of Historical Document Images

Figure 5.29: Reconstructed image of Figure(5.24).

skewed lines, detecting skew angle and correcting the skew is a major challenge. The

documents are scanned using flat bed scanner at a resolution of 300 dpi. The results

obtained for Kannada document images scanned at different orientations are shown

in the Table(5.1). Final reconstructed image is shown in Figure(5.29). Results on

some more input images shown in Figure(5.30)(a) and Figure(5.31) are de-skewed

and the results are shown in Figure(5.30)(b) and Figure(5.32).


Figure 5.30: (a) Input Image. (b) Deskewed image.

Figure 5.31: Input skewed image.

Experiments conducted on original historical documents with varying skew are listed in Table (5.2). The corrected skew lines can be used further for word and character segmentation, and for feature extraction and classification. These deskewed lines can also be used to reconstruct documents with sufficient inter-line spacing.

In the proposed method, there is no need to thin or skeletonize the connected component, as the two opposite corner values are sufficient to calculate the skew angle. Once the skew angle is obtained, the proposed algorithm works well irrespective of the type of script, even for a wide range of skew angles within ±90°.


Figure 5.32: Deskewed image.

5.7 Summary

Segmentation of the document image is essential, as the recognition of any character in the document image is carried out by segmenting the document into lines, words and characters. Segmentation of a handwritten document depends completely on the way each line is written: uneven spacing between lines, curved and touching lines, and touching characters all create problems for proper segmentation. In this chapter, two efficient algorithms for segmenting the document image into lines and words, and one skew detection and correction algorithm, are presented. The first algorithm works well for curved lines but fails to address the touching-lines problem. A second algorithm has been developed to address the touching-lines problem, and it segments touching lines properly. Another algorithm efficiently

detects skews within the lines and suitably corrects them. The segmented characters are used to extract characteristic features and subsequently to recognize the extracted characters. In the next chapter, we propose an era prediction algorithm to identify the era of the characters, so that the character set corresponding to that particular era can be referred to in order to decipher the contents of the document.

Table 5.2: Skew angle detected for each line in the document image.

No.  Line 1  Line 2  Line 3  Line 4  Line 5  Line 6  Line 7  Line 8
1    -4.433  -0.122  -1.959
2    -9.047  -5.526  -6.431
3    -1.581  -1.453   0.251
4    -0.554  -0.368   0.114
5     9.335   9.659  11.085
6    12.980   7.812   8.506
7     0.407   0.623   0.470   3.807
8     7.625   8.595   7.762   8.665   8.002   8.458
9     2.023   2.443   1.959   2.632   2.066   2.952
10   -7.362  -8.084  -7.094  -8.054  -7.618  -7.447
11    5.429   6.219   5.928   5.993   5.023   5.238   5.526   5.702
12    3.697   4.454   5.434   3.727   5.327   8.045   6.295   7.002


Chapter 6

Prediction of Era of Character Using Curvelet Transform Based Approach


6.1 Introduction

Recognition of characters in handwritten document images and categorizing them into various classes is one of the major challenges in document image analysis and recognition. Characters written on paper depend on the author's mood, style and the materials used for writing, so the extraction of characters from handwritten document images is a profoundly complex task. Historical documents, inscribed on a variety of materials, usually pose many challenges to researchers, particularly in the pre-processing, segmentation and feature extraction stages. In order to recognize the characters and decipher a given document, there is a prerequisite to know the period

1Some of the material in this chapter appears in the following research papers:

1. B. Gangamma, Srikanta Murthy K, Punitha P, "Curvelet Transform Based Approach for Prediction of Era of the Epigraphical Scripts", IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, pages 636-641, 2012.


of the character, so that the character set pertaining to that era can be used to appropriately decipher the document. Hence there is a pressing need to predict the era of the character efficiently. In this chapter, we present an algorithm for predicting the era of a script based on the curvelet transform, which is effective in handling curve features [172]. In this research work, a Fast Discrete Curvelet Transform (FDCT) based algorithm is designed to predict the era of the character/script. The characters are extracted using the segmentation techniques discussed in the previous chapter and used as input to the algorithm to predict the era of the script.

Writing or inscribing on hard materials was the usual practice in early days. Our ancestors used both hard and soft materials, such as rocks, metal plates and palm leaves, to inscribe information. The current practice is to decipher these documents manually. Expert epigraphists use a few characters (a, e, ka, cha, la) as standard characters for predicting the era of a script; these key characters have distinct shapes and structural variations, as shown in Figure (6.1). In this research, an attempt has been made to develop a method for predicting the era of various scripts of Kannada, a South Indian language, so that deciphering can be done by selecting the character set belonging to that particular era.

This chapter is organized as follows: Section 6.2 deals with related work; the proposed methodology is discussed in Section 6.3; experimental results are provided in Section 6.4; and Section 6.5 provides a summary of the proposed method.

6.2 Related Literature

Research efforts in the field of character recognition have grown exponentially, and a substantial number of articles have been published during the last few decades. Designing an OCR system is one of the most fascinating and challenging areas of pattern recognition, and it can contribute immensely to the advancement of automation processes. OCR is one of the most important components of pattern recognition and has many applications in automatic document processing.


Figure 6.1: Sample epigraphical characters belonging to different eras.

Handwriting identification and recognition are of great practical interest for extracting discriminating and invariant information from a handwritten specimen. One of the major difficulties in offline word recognition originates from the variation in writing, whether of the same writer over time or across different writers. There is no perfect mathematical model that can describe such extreme variations, and hence it is extremely difficult to find characteristic features that are invariant across writing styles.

The literature survey reveals that an enormous amount of work has been done in the area of document image processing and recognition. Many authors have developed efficient algorithms for the enhancement of degraded documents, segmentation of the document into lines, words and characters, and subsequently for feature extraction and classification of the characters [190], [35], [36], [37]. Feature extraction and recognition are crucial steps in any recognition system. Almost all feature extraction algorithms are based on spectral, statistical and structural features, the latter derived from the topological and geometrical characteristics of the character [38].


The most widely used statistical methods are zonal and projection profile methods. Local and global image statistics such as the mean, variance and deviation are used as feature sets along with other methods. Zonal methods count the number of ON pixels in different image zones, while horizontal and vertical profiling methods count the number of ON pixels in each row and column respectively. Dholakia et al. [191] proposed an algorithm to recognize Gujarati printed text using a zonal method. The method deals with the identification of various zones in text regions: zones in the image are identified by the slope of the lines created by the upper left corners of the rectangles bounding the connected components. They attempted to simplify the task of OCR design by developing algorithms for character zone extraction.

Desai [193] suggested a multi-layered feed-forward neural network classifier for the recognition of Gujarati digits using a zonal profile method. The author took four different profiles of the digits, two in the diagonal directions and the other two in the horizontal and vertical directions. The method was applied to isolated characters after thinning and skew correction, and the author claimed an admirable recognition rate of 82%. Gatos and Kesidis [194] presented the idea of adaptive zoning features based on local pattern information: each zone's pixel density is extracted after adjusting the position of the zone to maximize the local pixel density around it. Khanale and Chitnis [195] proposed a method for the recognition of Devanagari characters using a directional plane method, where the character image is decomposed into directional planes, each plane is partitioned into equal zones, and the sum of the pixels in each zone is used as a feature value. A texture based method was employed by Murthy et al. [196] to extract epigraphical character features; the era of the characters was adequately predicted using a template matching method.

A modified algorithm with scale and translation invariance properties was designed by Amayeh et al. [192]. Image normalization is performed differently, by rescaling the coordinates of the image instead of the usual technique of re-sampling it, and the ratio of the image area to the area of the unit disk is set to a constant value. The authors claimed that their algorithm resulted in faster computation and yielded a higher recognition rate.


Kan et al. [197] presented a novel approach combining two different invariant moment methods, Orthogonal Fourier-Mellin moments [198] and Zernike moments [199], for the recognition of alphanumeric characters; the combined method was useful for characterizing images with large variability. Kunte and Samuel [200] developed an OCR system for the recognition of basic printed Kannada characters that works for different font sizes and styles; Hu invariant moments [201] and Zernike moments were used to extract invariant moments serving as feature vectors. These methods work well for low-order moments, which represent less information, but higher-order moments are prone to noise. Zernike moments are used as efficient shape descriptors for images that cannot be defined by a single contour; they have rotation invariance and noise robustness properties, but lack the scale and translation invariance required for efficient shape recognition algorithms. They represent global information accurately, but smaller images are represented with comparatively less accuracy.

Spectral methods consider images in the frequency domain and locate the Fourier components. These methods are invariant to rotation, translation and scale, and adequately address the recognition of characters with these variations. The wavelet transform of the input coordinates and the angle were used as feature sets for classification of Malayalam characters with a simplified Fuzzy ARTMAP network, which takes comparatively little time for training; it also supports incremental learning, making it suitable for practical implementation [202]. In the last two decades, extensive

research has taken place in the fields of mathematics and computational tools based on multi-resolution analysis. This research has led to the design of newer tools for analysing information. The development of wavelets and related transforms has provided many methods for addressing problems related to large data sets, such as image compression, de-noising and the reconstruction of objects. Various efforts have been made, including simple ideas like thresholding the orthogonal wavelet coefficients of noisy data followed by reconstruction; as a further improvement, translation invariance was achieved using the undecimated wavelet transform [173].


The literature survey reveals that sufficient work has been done in the spectral domain for various applications. Although enormous work exists on the wavelet transform, wavelets are suitable only for point discontinuities and fail to address edge and curve discontinuities. Apart from the edge discontinuity problem, the discrete wavelet transform uses only three directional wavelets (horizontal, vertical and diagonal) to capture image information, and it is unable to represent images that contain high levels of directionality. These limitations have led researchers to look for other spectral approaches that capture more directional information in an image, and attempts to overcome the disadvantages of wavelets have resulted in the development of the ridgelet and curvelet transforms [175], [170]. Furthermore, the Curvelet Transform overcomes the limitations of Gabor filters: even though the multiple-orientation approach of Gabor filters gives better results than the wavelet transform in representing textures and retrieving images, it is unable to provide complete spectral information, so Gabor filters cannot be used effectively to represent images, which degrades classifier performance. Hence Curvelet Transforms are used for feature extraction. The Curvelet Transform also provides flexibility in the degree of localisation in orientation, which varies with scale: the fine-scale basis functions are long ridges, and the shape of the basis functions at scale j is given by 2^j × 2^(j/2) [170]. A brief explanation of the Curvelet Transform is provided in Section 4.3.1.

6.3 Proposed Method

The proposed method uses the Curvelet Transform to extract character features and a minimum distance classifier to predict the era of the character. The proposed model comprises four stages: 1) data set generation, 2) preprocessing, 3) feature extraction and 4) classification.

1) Data set generation step deals with the collection of sample characters belong-

ing to various eras.

2) Preprocessing step explains the method of binarization of the scanned documents

151

Page 181: Automation of Preprocessing and Recognition of Historical Document Images

and the segmentation and extraction of individual characters.

3) Curvelet coefficients are extracted using Fast Discrete Curvelet Transform(FDCT)

for each of the segmented characters in the feature extraction step.

4) Nearest Neighbour Method is employed in the classification step, to predict the

era of the character.

6.3.1 Data Set Creation

The database of era characters contains 4145 samples belonging to 6 different eras, extracted from various documents. A minimum of 13 and a maximum of 19 characters are considered from the 6 eras, and 40 samples are collected for each character. Out of the 4145 samples, 2600 characters were taken for training and 1545 were kept for testing. These characters vary in translation, scale and rotation, and come in different image sizes.

6.3.2 Preprocessing

Characters are extracted from documents having uniform intensity. Character images with pixel value 1 as the foreground and a plain background with pixel value 0 have been considered. Several binarization techniques are available for binarizing document images; Otsu's method [53] has been used to binarize the documents containing epigraphical characters. Once the binarized image is obtained, characters are extracted using Connected Component Analysis. Characters are normalized to size 100 × 50 to maintain an aspect ratio of 1:2, since many characters have rectangular rather than square shapes. Characters belonging to the same era are labeled with era numbers, referred to as class labels. Two more data sets have been created from the same collection: the samples are preprocessed, skeletonized, dilated once and normalized to 40 × 40 and 64 × 64, to study the curvelet transform response for skeletonized and dilated character images with equal aspect ratio.
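The binarization and normalization steps above can be sketched as follows. This is a minimal NumPy illustration with a hand-rolled Otsu threshold and a nearest-neighbour resize; the function names are illustrative, and the connected component extraction step is omitted for brevity.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum_w = np.cumsum(hist)                    # class-0 pixel counts up to t
    cum_m = np.cumsum(hist * np.arange(256))   # class-0 intensity mass up to t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0, w1 = cum_w[t], total - cum_w[t]
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_m[t] / w0
        m1 = (cum_m[-1] - cum_m[t]) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2  # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    """Foreground (ink) = 1, background = 0, assuming dark ink on light paper."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)

def normalize(char_img, out_h=100, out_w=50):
    """Nearest-neighbour resize of a binary character image to out_h x out_w."""
    h, w = char_img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return char_img[rows][:, cols]
```

In practice a library routine (e.g. scikit-image's `threshold_otsu` and `measure.label`) would replace these helpers; the sketch only shows the sequence of operations the thesis describes.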


6.3.3 Feature Extraction using FDCT

In this stage, FDCT is used to extract the features of the segmented characters. The curvelet coefficients of a character image f(m, n), 0 ≤ m < M, 0 ≤ n < N, where M × N is the size of the character image, are calculated using equation (4.24). This equation computes an array of coefficients at scale j and orientation l, along with the location parameters (k1, k2), as explained in section 4.3.1. These coefficients are used as the representative feature vector of the segmented character. Such features are computed for all the characters in the database.
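Curvelet implementations such as CurveLab return the coefficients as nested arrays, one per scale and orientation; before classification these are concatenated into a single feature vector. The sketch below assumes that nested output shape; the mock coefficient sizes are illustrative, chosen so the total matches the 2521 coefficients the experiments report for the 40 × 40 images (169 in scale 1 plus 2352 in scale 2).

```python
import numpy as np

def flatten_coefficients(coeffs):
    """Concatenate nested curvelet coefficient arrays (scales -> orientation
    bands) into a single 1-D feature vector for classification."""
    parts = []
    for scale in coeffs:                  # one entry per scale
        for band in scale:                # one array per orientation
            parts.append(np.asarray(band, dtype=float).ravel())
    return np.concatenate(parts)

# Mock coefficients shaped like an FDCT output for a 40 x 40 image:
# scale 1 (coarse): a single 13 x 13 approximation band (169 coefficients);
# scale 2: 16 orientation bands of 7 x 21 each (2352 coefficients).
mock = [
    [np.zeros((13, 13))],
    [np.ones((7, 21)) for _ in range(16)],
]
vec = flatten_coefficients(mock)          # length 2521
```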

6.3.4 Classification

Characters belonging to various eras are predicted in the classification stage. The test characters are subjected to the same preprocessing and feature extraction steps: the curvelet coefficients of the test character are extracted using FDCT and used as its feature vector. This feature vector is compared with the feature vectors stored in the database. A Minimum Distance Classifier is employed to predict the era of the characters: the distance between the test feature vector and each database feature vector is computed using the Euclidean distance, the minimum distance is selected, and the era is predicted from the index of the matched character using the class label table. The algorithm for era prediction is given below.

6.3.5 Algorithm for Era Prediction

Input: Set of test character images belonging to different eras.

Output: Classification of the test characters based on their eras.

1. Curvelet coefficients are extracted using the curvelet transform, forming the feature vector of the test character.

2. The feature vector of the test character is compared with the database. A Euclidean distance classifier is used to find the match. It calculates the distance between the test feature vector and each trained feature vector using the equation

   d(p, q) = d(q, p) = √((p₁ − q₁)² + (p₂ − q₂)² + · · · + (pₙ − qₙ)²)

   where p is the training feature vector and q is the test feature vector. The index of the minimum distance is used to find the era of the character.


3. Repeat steps 1 and 2 for the entire set of test characters.
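The prediction loop above amounts to a 1-nearest-neighbour search over the training feature vectors. A minimal NumPy sketch (the array names are illustrative, not from the thesis):

```python
import numpy as np

def predict_era(test_vec, train_feats, train_labels):
    """1-NN minimum distance classifier: return the class label of the
    training feature vector closest (in Euclidean distance) to test_vec."""
    diffs = train_feats - test_vec            # broadcast over training rows
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    return train_labels[int(np.argmin(dists))]

# Toy example: three training vectors from two "eras".
train_feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
train_labels = np.array([1, 1, 2])            # era class labels
era = predict_era(np.array([4.5, 5.2]), train_feats, train_labels)
```

With real data, `train_feats` would be the 2600 × 561 training matrix and `test_vec` one 561-coefficient feature vector.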

6.4 Experimentation and Results

The characters belonging to the 6 different eras are shown in Fig. 6.1. Experimentation was conducted on the normalized character image size 100 × 50 and compared with 40 × 40 and 64 × 64 character images.

Experimentation has been conducted using the curvelet toolbox CurveLab-2.1.2 [176] to extract the curvelet coefficients for feature extraction. Experiments were performed to understand the curvelet transform at different scales. Curvelet coefficients at the coarsest level are good at capturing the low level approximation of the function; the remaining scales give the finer details, especially those corresponding to edges. Curvelet coefficients of the first scale, the second scale, and both scales together with 16 orientations are extracted. The size of the feature vector depends on the size of the image as well as on the number of scales considered. For an image of size 40 × 40, scale 1 contains 169 coefficients and scale 2 contains 2352 coefficients; scales 1 and 2 together contain 2521 coefficients. For a 64 × 64 character image, scale 1 consists of 441 coefficients, and scales 1 and 2 together contain 6425 coefficients. A character image of size 100 × 50 has 3 scales, with 561 coefficients in scale 1 and 4131 coefficients in scales 1 and 2 together.

6.4.1 Experimentation 1

Experimentation on image size 100 × 50 has been performed and feature vectors were obtained with scale 1, scale 2 and both scales. Table 6.1 shows the recognition results and confusion matrix for image size 100 × 50 with 1 scale having 561 coefficients. The feature vector size is 1 × 561 and the training data set is a 2600 × 561 coefficient matrix. The testing set contains 1545 images, each with 561 coefficients in its feature vector, giving 1545 × 561 coefficients in total. The recognition accuracy of the 3rd century B.C. characters is 267 out of 285 (93.68%), as these characters have distinct shapes and features and stand out as significantly different from the characters of other eras. The error rate is 6.32%, because 6 to 7 characters have similar structure and shape. A few of the initial characters are similar to the characters of the first three eras, so there is a chance of predicting the era incorrectly. The second set of characters, belonging to the 1st century A.D., is classified correctly at 77.41% and wrongly at 22.59%: the 1st century A.D. characters evolved from those of the previous century and most of them have similar structure, so the characters with such similarity are misclassified. The 5th century A.D. characters overlap with those of the previous and next centuries, because of which the recognition rate decreases to 78.15%. The recognition rates of the 6th, 9th and 11th century A.D. are 88.72%, 90.42% and 86.32% respectively.

Table 6.1: Confusion Matrix and Recognition Rate (RR) for character image size 100 × 50.

Era    BC3  AD1  AD5  AD6  AD9  AD11  Total  RR in %
BC3    267   10    3    4    0     1    285    93.68
AD1     31  209   18    5    5     2    270    77.41
AD5     13   13  211   11    8     7    265    78.15
AD6      1    0   12  173    4     5    195    88.72
AD9      1    2    5    4   217   11    240    90.42
AD11     2    1    4    9    23  246    285    86.32
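The per-era recognition rate in Table 6.1 is the diagonal entry of the confusion matrix divided by the row total. A small helper (illustrative, not from the thesis) reproduces, for example, the BC3 figure:

```python
import numpy as np

def recognition_rates(confusion, totals):
    """Per-class recognition rate (%): diagonal entry / row total * 100."""
    return 100.0 * np.diag(confusion) / np.asarray(totals, dtype=float)

# Rows/columns ordered BC3, AD1, AD5, AD6, AD9, AD11, as in Table 6.1.
cm = np.array([
    [267,  10,   3,   4,   0,   1],
    [ 31, 209,  18,   5,   5,   2],
    [ 13,  13, 211,  11,   8,   7],
    [  1,   0,  12, 173,   4,   5],
    [  1,   2,   5,   4, 217,  11],
    [  2,   1,   4,   9,  23, 246],
])
totals = [285, 270, 265, 195, 240, 285]
rr = recognition_rates(cm, totals)   # rr[0] is the BC3 rate, about 93.68
```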

6.4.2 Experimentation 2

In the second experimentation, the images were normalized to the square size 40 × 40, features were extracted from the training data set, and the test images were subjected to classification. This training set has 2600 × 169 coefficients and the testing set has 1545 × 169. The recognition rates for the various era characters are shown in Table 6.2. Some era characters have geometrical structure similar to characters of the next and previous centuries, causing misclassification. In addition, normalizing images to square dimensions causes many characters to lose significant features; therefore normalizing images to a square size deteriorates the recognition rate.


Table 6.2: Confusion Matrix and Recognition Rate (RR) for character image size 40 × 40 with first scale.

40 × 40  BC3  AD1  AD5  AD6  AD9  AD11  Samples  RR in %
BC3      243   17   16    6    0     3      285   85.263
AD1       41  190   25    8    2     4      270   70.370
AD5       13   18  216   15    3     5      270   80.000
AD6        3    3   14  165    5     5      195   84.615
AD9        3    1    6    8   205   17      240   85.417
AD11       3    1    7    8    15  251      285   88.070

6.4.3 Experimentation 3

Experimentation on image size 64 × 64 is shown in Table 6.3, which gives the classification of the era characters along with the misclassified characters. As per the observations, the era prediction rate decreased with the increase in square size from 40 to 64. The recognition rates for the 3rd century B.C. and the 1st, 5th, 6th, 9th and 11th century A.D. were 89.825%, 65.614%, 75.439%, 56.491%, 69.123% and 81.404% respectively.

Table 6.3: Confusion Matrix and Recognition Rate (RR) for character image size 64 × 64 with first scale.

64 × 64  BC3  AD1  AD5  AD6  AD9  AD11  Samples  RR in %
BC3      256   13   10    4    1     1      285   89.825
AD1       49  187   24    4    1     5      270   65.614
AD5       16   15  215    7    7    10      270   75.439
AD6        4    2   17  161    7     4      195   56.491
AD9        9    0    9    6   197   19      240   69.123
AD11       8    4   15    6    20  232      285   81.404


Table 6.4: Comparison of the Recognition Rates (RR) for character image sizes 40 × 40, 64 × 64 and 100 × 50.

Era      40 × 40 RR in %  64 × 64 RR in %  100 × 50 RR in %
BC3               85.263           89.825             93.68
AD1               70.370           65.614             77.41
AD5               80.000           75.439             78.15
AD6               84.615           56.491             88.72
AD9               85.417           69.123             90.42
AD11              88.070           81.404             86.32
Average           82.289           72.982             85.78

Figure 6.2: Prediction Rate for Gabor, Zernike and proposed method.

6.4.4 Discussion

The proposed method was implemented to find the era to which a character belongs. In this thesis, the prediction of the era has been implemented using a synthetic data set generated by various persons. The following assumptions have been made: 1) isolated characters are selected and collected manually from various persons; 2) these characters are free from noise; 3) some of the characters are taken from the results of the segmentation methods explained in the previous chapter; 4) these characters have non uniform resolution and dots per inch, as the documents were acquired from different scanners at different resolutions.

The characters contained in the data set are rotated/slanted and translated, and have non uniform character sizes. The characters also have variable resolutions, as the images were acquired at different resolutions under various conditions. Some characters were created synthetically and some were extracted from the enhanced and segmented historical documents. The features extracted from these characters should capture the unique characteristics of the individual characters; therefore the feature extraction method should be efficient enough that the era can be predicted properly. The main focus of the work is to select a suitable feature extraction technique to address these variations and classify the character eras. As discussed in section 4.3.1, the Curvelet Transform is used to extract the features, as it is more efficient in handling curve details and provides complete spectral information about the image.

The distinguishing patterns in the image provide the unique features used as representative feature vectors. Gabor filters are suitable for extracting features from objects at different scales and orientations, but there is a prerequisite to understand and analyze the Gabor wavelet bank in detail. There are in total 40 filter banks, and they are selected based on the features that need to be extracted; the selection of the proper scale and the number of filter banks also plays a vital role. Furthermore, Gabor filters are unable to represent complete spectral information or handle curve discontinuities. The characters have only a uniform background after preprocessing, which tends to deteriorate the recognition rate. Therefore the recognition rate of the Gabor based approach is comparatively low.

Popular moment based methods such as Hu and Zernike moments are usually used to extract shape descriptors and are employed in the recognition of characters. These moments work fine for low order moments carrying less information, but higher order moments are prone to noise. Hu and Zernike moments are invariant to rotation but are not naturally scale invariant. Digital images must be mapped onto the unit disk before the Zernike moments can be calculated; the scaling invariance must be provided by this mapping, so the correct mapping of objects into the unit disk is a crucial step. Hence the recognition of characters of variable scale is not possible using Zernike moments alone.

6.5 Summary

The shapes of the character set in historical scripts have evolved over the centuries. Hence, in order to competently understand a script, it is necessary to know the corresponding era in order to use its character set. Lines and curves are the dominating features in these character sets. Since the curvelet transform is effective in handling these features [172], a Fast Discrete Curvelet Transform (FDCT) based model was designed to predict the era of the script. Experiments were conducted on data sets comprising 4145 images belonging to six different eras. The resultant recognition rate of the proposed method was 85.78%. The proposed method was compared with Gabor filter and Zernike moment based approaches. The results showed that the proposed method had, on average, 20% to 25% better accuracy than the Gabor filter and Zernike moment based approaches in predicting the era of the epigraphical scripts.


Chapter 7

Conclusion and Future Work

7.1 Conclusion

Historical documents are immeasurably crucial resources which provide valuable information about our past. It is necessary to preserve these resources for posterity's sake in a suitable format. There are various issues which need to be adequately addressed during the preservation and processing of these documents. One of the major issues is the legibility of the document content, which is impacted by the numerous factors that have affected the physical condition of these materials. Since these issues are carried over when the documents are transferred into digital form, they need to be handled appropriately.

Therefore, there is a dire need to address these issues using appropriate image processing techniques. The literature survey reveals that many admirable works exist in the field of Kannada historical document image processing. In our research work we have presented several image enhancement algorithms to enhance the quality of degraded historical Kannada documents. These algorithms satisfactorily de-noise the input document image and subsequently binarize it for further processing. The resultant enhanced image, with sharp edges, is further used to segment the document and predict the era of the script.


To enhance the degraded document images, five image enhancement algorithms have been developed: three in the spatial domain and two in the frequency domain. In the spatial domain, morphological reconstruction (MR) techniques have been used to develop an algorithm to eliminate dark, uneven, noisy backgrounds. MR has also been used in combination with the remaining four algorithms as a background elimination technique. However, the MR method was unable to handle severely degraded noisy images effectively, and furthermore it could not address all types of problems posed by degraded documents. Therefore a Bilateral Filter (BF) based approach, combining domain and range filtering, was devised to de-noise severely degraded images without smoothing the edges, and it proved to be quite proficient at this. The BF method performed better than the MR method and was also found to enhance stone inscription images. However, the computational time of the BF method turned out to be somewhat higher than that of the MR method. The MR method has complexity proportional to the size of the image, but was unable to enhance all types of degraded document images. The major limitation of the MR method is the selection of the controlling parameter value and the structuring element size.

Furthermore, an algorithm based on the Non Local Means Filter (NLMF) was implemented to de-noise document images using a similarity measure between non local windows. This method adequately addressed severely degraded document images by eliminating noise while preserving edge information. Although the NLMF method proved to be a better solution than the previous two methods, its computational cost was very high. Moreover, the need for proper selection of the search and patch window sizes complicates matters further, because the computational time grows with the search window size.

The performances of the above three spatial methods were measured using PSNR value, execution time, human interpretation and the binarized image after enhancement. The PSNR value is a quantitative measure of performance based on the intensity difference between the input image and the output image; a high PSNR value signifies a small mean squared difference between the two images and is commonly taken to indicate a better result. However, it is very difficult to prove that the method with the highest PSNR value will always give the best result. Although computational time alone cannot measure the quality of an algorithm's output, we have considered this parameter to convey the duration taken by the different methods from a practical point of view.
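The PSNR between two images can be computed as follows (the standard definition, shown here for illustration; in this work it is computed between the input and the enhanced output):

```python
import numpy as np

def psnr(img_a, img_b, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized images.
    A higher PSNR means a smaller mean squared difference."""
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")               # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```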

Computational time was therefore included as one of the evaluation parameters, since time complexity plays a major role in judging the practicality of any algorithm. Image enhancement algorithms need to produce images which are not only good in quality but also visually appealing, irrespective of their PSNR values and computational time; hence human interpretation, along with the binarized output, PSNR and computational time, is required. The performance of an image enhancement algorithm is also measured through the segmentation algorithm, which tells us how well the binarized image gets segmented into lines, words and characters. These, therefore, are the deciding parameters used to evaluate the performance of an image enhancement algorithm. While the PSNR values of the BF method are higher than those of the MR method, the PSNR values of the NLMF method are the highest of all. The execution time of the MR method is considerably less than that of the other two methods; the BF method takes about ten times longer than the MR method, and the NLMF method, although it satisfactorily enhances the image, consumes approximately ten times more time than the BF method. The NLMF and BF methods enhance the image well in terms of PSNR values and human visual interpretation, but the running time of the NLMF method is its major drawback.

Frequency domain based transforms were also employed in preprocessing the documents. An algorithm based on the Wavelet Transform (WT) was developed to analyze and restore the degraded document images. Since the wavelet transform handles only point discontinuities and not curve discontinuities, another algorithm based on the Curvelet Transform (CT) was devised, which proved to be better than the other preprocessing algorithms developed in this research work. The major advantage of the frequency domain based approaches lies in their computational efficiency and in the fact that they do not require the selection of parameters that control the output of the filter. However, the selection of the structuring element for the accompanying morphological operations is completely dependent on the size of the characters in the document image. The wavelet transform can be applied to images of variable size, whereas the curvelet transform requires the image to be square; this is another limitation of the curvelet transform based approach, in addition to the selection of the structuring element size. The two frequency domain based methods were compared using PSNR values, computational time and human visual perception. The CT based method takes slightly more time than the WT based method, but gives a better enhanced image in terms of PSNR values and human visual interpretation.

To segment the historical document image, two segmentation algorithms have been developed: one based on a piecewise projection profile method, the other on morphological operations and Connected Component Analysis (CCA). The first method addresses uneven spacing between lines by dividing the image into vertical stripes; it extracts each line segment from each stripe and combines them into a full line. Although this method segments document images with uneven spacing between lines, it is unable to segment touching lines. The second method was developed to address both uneven line spacing and touching lines using morphological operations and connected component analysis.
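The piecewise projection profile idea can be sketched as follows: within each vertical stripe of a binary image (ink = 1), text lines appear as runs of rows whose ink count exceeds a threshold. This is a minimal illustration; the function names and the threshold are illustrative, and the merging of per-stripe bands into full lines is omitted.

```python
import numpy as np

def line_bands(stripe, min_ink=1):
    """Return (start_row, end_row) bands of text lines in one vertical stripe
    of a binary image, using its horizontal projection profile."""
    profile = stripe.sum(axis=1)            # ink pixels per row
    rows = profile >= min_ink
    bands, start = [], None
    for r, ink in enumerate(rows):
        if ink and start is None:
            start = r                       # a text band begins
        elif not ink and start is not None:
            bands.append((start, r - 1))    # the band ends
            start = None
    if start is not None:
        bands.append((start, len(rows) - 1))
    return bands

def segment_lines(binary, n_stripes=4):
    """Piecewise segmentation: split the page into vertical stripes and
    collect the per-stripe line bands (to be merged into full lines)."""
    stripes = np.array_split(binary, n_stripes, axis=1)
    return [line_bands(s) for s in stripes]
```

Working stripe by stripe lets each stripe find its own line boundaries, which is what makes the method tolerant of unevenly spaced and mildly skewed lines.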

Skew is a common error introduced during the image acquisition process, whether performed with a camera or a scanner, and needs to be corrected. Handwritten documents typically contain multiple skewed lines, commonly due to uneven spacing between lines, where each line gets skewed individually. Global skew correction algorithms were not helpful in segmenting such handwritten documents correctly. To address the skew problem within the document lines, an extended version of the second segmentation algorithm, based on gray scale morphology and connected component analysis, was developed. To recognize the segmented characters, the character set pertaining to the particular era needs to be identified; therefore the prediction of the era of the script necessitates knowing its character set. To predict the period of the script/character, a recognition algorithm based on the curvelet transform has been implemented: the curvelet transform is employed to extract the character features, and a minimum distance classifier is used to classify the characters according to their eras.


7.2 Future Work

The present research work implemented various preprocessing techniques, such as the elimination of noise, segmentation of lines and characters, and skew detection and correction, which are essential for historical document image processing. Historical documents are usually plagued by low contrast, noise and broken characters, and are typically found in worn out condition. In this research work, several algorithms have been developed for the preprocessing of low contrast and noisy documents. Although the methods developed in this work are suitably efficient, they were not extended to address all types of degradation.

Some issues in historical documents, like ink bleed-through, create double contours for characters, which makes the binarization task much more difficult. The extraction of characters from documents affected by ink bleed-through is a limitation of our research work and can be taken up as future work. Folding of the paper introduces unwanted lines and may create distortions if a fold appears in the middle of characters; suitable elimination of these lines will also be future work. Even though the enhancement techniques improved legibility, they failed to eliminate the noise completely. Therefore post processing algorithms are required to eliminate the residual noise and reconstruct the broken characters and words; these could prove to be grounds for future work.

Extracting useful information from severely degraded stone inscriptions and palm leaf documents poses an array of challenges. The image acquisition task has to handle the issues related to capturing and scanning the document. Furthermore, several additional complexities are introduced during the image acquisition process by the lighting conditions, image resolution, document size, character size, paper/palm leaf position, blurring, illumination and so on. Elimination of these problems can also be taken up in future research work.

Documents also contain calligraphic and ornamental styles and carvings of animals, birds and the like, typically inscribed during or after the writing process. These pose significant challenges, not only to document processing but also to the extraction of characters for recognition purposes. The algorithms developed in our research have not been extended to address these issues; the elimination of such styles creates more avenues for the research community.

Palm leaf documents are available in large volumes and need to be efficiently deciphered. Digitizing and enhancing these documents requires fully automated systems. As palm leaves vary in size, acquiring proper images from their raw form requires technical expertise, and the scanning process introduces a variety of challenges that make the enhancement task much more difficult. In this work, we have considered already digitized document images and subsequently processed them; therefore, the problems that occur during image acquisition have not been addressed. This could be a motivation for researchers to take up.

Stone inscriptions pose a wide range of problems, starting from image acquisition all the way up to recognition. In our research work, we have enhanced a few images which were available in digital format, but most of these images posed a multitude of problems even after enhancement. These problems need to be handled effectively and can be taken up in future work.

Handwritten documents are typically very hard to segment into lines, words and characters, as they contain very narrow spaces between lines, touching characters, and skewed and curved lines. Efficient algorithms need to be designed to extract lines appropriately, even when they are curved. The algorithms developed in this research work handled touching lines by breaking their connectivity; the reconstruction of the resulting broken characters has not been handled and has been deferred to our future work, where we will attempt to reconstruct the broken characters in their entirety. Efforts were made to use morphological operations for reconstruction during enhancement, but they were not proficient at that task.

As mentioned earlier, we have used a myriad of digitized images throughout our research work, converted from their raw form using different setups. The methods implemented in this research work required that the analysis of the document for character size be done manually, and that the resolution of the image acquisition device be estimated approximately in order to select optimal values for the controlling parameters and the size of the structuring element.

This could be an effective motivation for further work on the automation of the deciphering of historical document images. Since stone inscriptions pose several unprecedented challenges, they provide many compelling avenues for future work. Future work can also be directed towards effectively handling broken and touching characters, and towards the complete automation of the deciphering process.

The field of digitization of historical documents has immense potential, with applications spanning numerous fields and domains. Research in historical document processing remains continually intriguing and offers fertile ground for researchers. Since research in this field has mostly remained passive, there is not only wide scope but also a dire need to bring it into mainstream focus through a number of lively studies.


Appendix A

Palm Leaf Images

Figure A.1: Original image of palm leaf script of 18th century.


Figure A.2: Input images of palm leaf document belonging to 17th century.

Figure A.3: Noisy input image of a palm leaf document belonging to the 18th century.

Figure A.4: Input image of palm leaf document belonging to 17th century.


Figure A.5: Input image of palm leaf document belonging to 17th century.

Figure A.6: Input images of palm leaf document belonging to 17th century.


Appendix B

Paper Images

Figure B.1: Sample paper image belonging to the previous century.


Figure B.2: Original paper image 1, belonging to the nineteenth or early twentieth century.


Figure B.3: Original paper image 2, belonging to the nineteenth or early twentieth century.


Figure B.4: Original paper image 3, belonging to the nineteenth or early twentieth century.


Appendix C

Stone Inscription Images

Figure C.1: Stone inscription image belonging to the 14th to 17th century.


Figure C.2: Digitized image of a Belur temple inscription belonging to the 17th century AD.


Figure C.3: Digitized image of Belur temple inscriptions belonging to the 17th century AD.


Figure C.4: Digitized image of Shravanabelagola temple inscriptions belonging to the 14th century AD.


Appendix D

Author’s Publications

List of Publications in Journal

1. B. Gangamma, Srikanta Murthy K, Arun Vikas Singh, “Hybrid Approach Using Bilateral Filter and Set Theory for Enhancement of Degraded Historical Document Image”, CiiT International Journal of Digital Image Processing, Volume 5, May issue, pages 488-496, 2012.

2. B. Gangamma, Srikanta Murthy K, Arun Vikas Singh, “Restoration of De-

graded Historical Document Image”, Journal of Emerging Trends in Computing

and Information Sciences, Volume 3, No. 5, pages 792-798, May 2012.

3. B. Gangamma, Srikanta Murthy K, “An Effective Technique using Non Local

Means and Morphological Operations to Enhance Degraded Historical Docu-

ment”, International Journal of Electrical, Electronics and Computer Systems,

Volume 4, Issue 2, pages 1-10, 2011.

4. B. Gangamma, Srikanta Murthy K, “Enhancement of Degraded Historical Kan-

nada Documents”, International Journal of Computer Applications (0975-8887),

Volume 29, No. 11, pages 1-6, September 2011.

5. B. Gangamma, Srikanta Murthy K, “A Collective Approach for Enhancement and Segmentation of Historical Document Image using Mathematical Morphology and Non Local Means”, communicated to International Journal on Image and Graphics, World Scientific Publications.

List of Publications in Conferences

1. B. Gangamma, Srikanta Murthy K, Punitha P, “Curvelet Transform Based Approach for Prediction of Era of the Epigraphical Scripts”, IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, pages 636-641, 2012.

2. B. Gangamma, Srikanta Murthy K, Riddhi J Shah, Swati D V, “Extraction

of Text Lines from Historical Documents using Mathematical Morphology”,

National Conference on Indian Language Computing, Cochin, pages 1-4, 2012.

3. B. Gangamma, Srikanta Murthy K, Riddhi J. Shah, Swati D V, “Text Line Extraction from Palm Script Documents Using Morphological Approach”, International Conference on Computer Engineering and Applications, Dubai, UAE, pages 1452-1455, January 29-31, 2012.

4. B. Gangamma, Srikanta Murthy K, “Enhancement of Historical Document Im-

age using Non Local Means Filtering Technique”, IEEE International Confer-

ence on Computational Intelligence and Computing Research, Kanyakumari,

pages 1264-1267, 2011.

5. B. Gangamma, Srikanta Murthy K, Priyanka Chandra G C, Shishir Kaushik,

Saurabh Kumar, “A Combined Approach for Degraded Historical Documents

Denoising Using Curvelet and Mathematical Morphology”, IEEE International

Conference on Computational Intelligence and Computing Research, Coimbat-

ore, pages 824-829, 2010.

6. B. Gangamma, Srikanta Murthy K, Hemanth Kumar G, Riddhi J Shah, Swati D V, Sandhya B, “Text Line Extraction from Kannada Handwritten Document”, IEEE International Conference on Computer Engineering and Technology, Jodhpur, India, pages E8-11, 2010.


7. B. Gangamma, Srikanta Murthy K, Priyanka Chandra G C, Shishir Kaushik, Saurabh Kumar, “Degraded Historical Documents Enhancement Using Curvelet and Mathematical Morphology”, IEEE International Conference on Computer Engineering and Technology, Jodhpur, pages E105-111, 2010.


Bibliography

[1] Yap K. H., Guan L., Perry S. W., Wong H. S., “Adaptive Image Processing: A Computational Intelligence Perspective”, CRC Press, Taylor and Francis Group, 2010.

[2] Umbaugh S. E., “Digital Image Processing and Analysis: Human and Computer Vision Applications with CVIP tools”, CRC Press, Taylor & Francis Group, Second Edition, 2010.

[3] Cheriet M., Kharma N., Liu C. L., “Character Recognition System: A Guide

for Students and Practitioners”, John Wiley & Sons Publications, 2007.

[4] Govindaraju V., Setlur S., “Guide to OCR for Indic Scripts: Document Recog-

nition and Retrieval”, Springer-Verlag London Ltd, 2009.

[5] Sircar D C, “Indian Epigraphy”, Motilal Banarsidass Publications, Delhi, 1996.

[6] Narasimhacharya R, “History of Kannada Literature”, Madras: Asian Educa-

tional Services, 1988.

[7] Tsien, Tsuen-Hsuin, “Paper and Printing”, Joseph Needham, Science and Civil-

isation in China, Chemistry and Chemical Technology, Cambridge University

Press, Volume 5, part 1, 1985.

[8] Murthy K. S., “Transformation of Epigraphical Objects into Machine Recog-

nizable Image Patterns”, Ph.D Thesis, University of Mysore, 2005.

[9] Murthy A. V. N., “Kannada Lipiya Ugama mattu Vikasa”, Kannada Ad-

hyayana Samsthe, Mysore University, 1968.


[10] Parpola A., “The Indus Script: a Challenging Puzzle”, World Archaeology,

Volume 17, No. 3, pages 399-419, Feb 1986.

[11] www.ancientscripts.com

[12] Fisher R., Ken D. H., Fitzgibbon A., Robertson C., Trucco E., “Dictionary

of Computer Vision and Image Processing”, John Wiley & Sons, Publications,

2005.

[13] Haralick R. M., Shapiro L. G., “Glossary of Computer Vision Terms”, Pattern Recognition, Volume 24, Issue 1, pages 69-93, 1991.

[14] Lu S., Tan C. L., “Binarization of Badly Illuminated Document Images through Shading Estimation and Compensation”, Ninth International Conference on Document Analysis and Recognition, pages 312-316, 2007.

[15] Ntogas, N., Ventzas D., “A Binarization Algorithm for Historical Manuscripts”,

12th WSEAS International Conference on Communications, Greece, July 23-25,

pages 41-51, 2008.

[16] Likforman-Sulem L., Darbon J., Smith E. H. B., “Enhancement of Historical Printed Document Images by Combining Total Variation Regularization and Non Local Means Filtering”, Image and Vision Computing, Volume 29, Issue 5, pages 351-363, April 2011.

[17] Buades A., Coll B., Morel J. M., “A Non-Local Algorithm for Image Denois-

ing”, Proceedings IEEE Computer Society Conference on Computer Vision and

Pattern Recognition, Volume 2, pages 60-65, 2005.

[18] Gatos B., Pratikakis I., Perantonis S. J., “Adaptive Degraded Document Image

Binarization”, Journal of Pattern Recognition, Volume 39, pages 317-327, 2005.

[19] Kishore N. K., Rege P. P., “Adaptive Enhancement of Historical Document

Images”, IEEE International Symposium on Signal Processing and Information

Technology, pages 983-88, 2007.

[20] Razak Z., Zulkiflee K., Idris M. Y. I., Tamil E. M., Noor M. N. M., Salleh R., Yusof M. Y. M., Yaacob M., “Off-line Handwriting Text Line Segmentation: A Review”, International Journal of Computer Science and Network Security, Volume 8, Issue 7, pages 12-20, 2008.

[21] Yanikoglu B., Sandon P. A., “Segmentation of Off-line Cursive Handwriting Using Linear Programming”, Pattern Recognition, Volume 31, Issue 12, pages 1825-1833, 1998.

[22] Louloudis G., Gatos B., Halatsis C., “Text Line Detection in Unconstrained Handwritten Documents using a Block-Based Hough Transform Approach”, Proceedings of International Conference on Document Analysis and Recognition, pages 599-603, 2007.

[23] Nagy G., Seth S., ”Hierarchical Representation of Optically Scanned Docu-

ments”, Seventh International Conference on Pattern Recognition, pages 347-

349, 1984.

[24] Wahl F. M., Wong K.Y., Casey R. G., “Block Segmentation and Text Extrac-

tion in Mixed Text/Image Documents”, Computer Graphics and Image Pro-

cessing, pages 375-390, 1982.

[25] Feldbach M., Tonnies K. D., “Line Detection and Segmentation in Historical Church Registers”, Proceedings of the 6th International Conference on Document Analysis and Recognition, pages 743-747, 2001.

[26] O’Gorman L., “The Document Spectrum for Page Layout Analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 15, Issue 11, pages 1162-1173, 1993.

[27] Breuel T. M., “Two Geometric Algorithms for Layout Analysis”, Proceedings of the 5th International Workshop on Document Analysis Systems V, pages 188-199, 2002.

[28] Hough P. C. V., “Methods and Means for Recognizing Complex Patterns”, US

Patent 3069654, 1962.


[29] Duda R. O., Hart P. E., “Use of the Hough Transform to Detect Lines and Curves in Pictures”, Communications of the ACM, Volume 15, Issue 1, pages 11-15, 1972.

[30] Manmatha R., Rothfeder J. L., “A scale space approach for automatically seg-

menting words from historical handwritten documents”, IEEE Transactions on

Pattern Analysis and Machine Intelligence, Volume 27, Issue 8, pages 1212-

1225, 2005.

[31] He J., Downton A. C., ”User-assisted archive document image analysis for

digital library construction”, Seventh International Conference on Document

Analysis and Recognition, pages 498-502, 2003.

[32] Shi Z., Govindaraju V., “Line Separation for Complex Document Images using Fuzzy Runlength”, Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL 2004), pages 306-312, 2004.

[33] Likforman-Sulem L., Hanimyan A., Faure C., “A Hough based algorithm for

extracting text lines in handwritten document” Proceedings of International

Conference on Document Analysis and Recognition, pages 774-777, 1995.

[34] Manmatha R., Srimal N., “Scale space technique for word segmentation in

handwritten manuscripts”, Proceedings 2nd International Conference on Scale

Space Theories in Computer Vision, pages 22-33, 1999.

[35] Pal U., Datta S., “Segmentation of Bangla Unconstrained Handwritten Text”,

Proceedings of Seventh International Conference on Document Analysis and

Recognition, pages 1128-1132, 2003.

[36] Surinta O., Chamchong R., “Image Segmentation of Historical Handwriting

from Palm Leaf Manuscripts”, Intelligent Information Processing IV series,

pages 182-189, 2008.

[37] Kunte R. S., Samuel R. D. S., “A Simple and Efficient Optical Character Recog-

nition System for Basic Symbols in Printed Kannada Text”, Sadhana, Volume

32, pages 521-533, 2007.


[38] Arica N., Vural F. Y., “An Overview of Character Recognition Focused on

Off-line Handwriting”, IEEE Transactions on Systems, Man, and Cybernetics,

Part C: Applications and Reviews, Volume 31, Issue 2, pages 216-233, 2001.

[39] Boutros G., “Automating Degraded Image Enhancement Processing”, Sympo-

sium on Document Image Understanding Technology, College Park, Maryland,

2005.

[40] Jain A., Bhattacharjee S., “Text Segmentation using Gabor Filters for Auto-

matic Document Processing”, MVA 5, pages 169-184, 1992.

[41] Mello C. A. B., Cavalcanti C. S. V. C., Carvalho C., “Colorizing Paper Texture

of Green-Scale Image of Historical Documents”, In: Proceedings of the 4th

IASTED Conference on Visualization, Imaging and Image Processing, 2004.

[42] Ulges A., Lampert C. H, Breuel T. M., “Document Image Dewarping using Ro-

bust Estimation of Curled Text Lines”, Proceedings of 8th International Con-

ference on Document Analysis and Recognition, pages 1001-1005, 2005.

[43] Zhang L., Tan C. L, “Warped Image Restoration with Applications to Digital

Libraries”, Proceedings of 8th International Conference on Document Analysis

and Recognition, pages 192-196, 2005.

[44] Cao H., Ding X., Liu C., “Rectifying the bound document image captured by

the camera: a model Based Approach”, Proceedings 7th International Confer-

ence on Document Analysis and Recognition, pages 71-75, 2003.

[45] Fan J., Lin X., Simske S, “A Comprehensive Image Processing Suite for Book

Re-mastering”, Proceedings of 8th International Conference on Document Anal-

ysis and Recognition, pages 447-451, 2005.

[46] Jayadevan R., Kolhe S. R., Patil P. M., Pal U., “Automatic Processing of Handwritten Bank Cheque Images: A Survey”, International Journal on Document Analysis and Recognition, Volume 15, Issue 4, pages 267-297, July 2011.

[47] Suen C. Y., Lam L., Guillevic D., Strathy N. W., Cheriet M., Said J. N., Fan

R., “Bank check processing system”, International Journal on Imaging Systems

Technology, Volume 7, pages 392-403, 1996.


[48] Madasu V. K., Lovell B. C., “Automatic Segmentation and Recognition of Bank

Cheque Fields”, Proceedings of the Digital Imaging Computing: Techniques and

Applications, pages 33-38, 2005.

[49] Neves R. F. P., Mello C. A. B., Silva M. S., Bezerra B. L. D., “A New Algorithm

to Threshold the Courtesy Amount of Brazilian Bank Checks”, Proceedings of

IEEE International Conference on Systems Man and Cybernetics, pages 1226-

1230, 2008.

[50] Hull J. J., “Document Image Skew Detection: Survey and Annotated Bibliogra-

phy”, Document Analysis Systems II, pages 40-64. World Scientific, Singapore,

1998.

[51] Lee L. L., Lizarraga M.G, Gomes N.R, Koerich A.L, “A Prototype for Brazil-

ian Bank Check Recognition”, International Journal on Pattern Recognition

Artificial Intelligence, Volume 11, Issue 4, pages 549-570, 1997.

[52] Sahoo P. K., Soltani S., Wong A. K. C, Chen Y. C., “A Survey of Thresholding

Techniques”, Computer Vision Graphics and Image Processing, Volume 41,

Issue 2, pages 233-260, 1988.

[53] Otsu N., “A Threshold Selection Method from Gray Level Histograms”, IEEE

Transaction on Systems Man and Cybernetics, Volume 9, Issue 1, pages 62-66,

1979.

[54] Sezgin M, Sankur B, “Survey Over Image Thresholding Techniques and Quan-

titative Performance Evaluation”, Journal on Electronic Imaging, Volume 13,

Issue 1, pages 317-327, 2004.

[55] Mello C. A. B, Bezerra B. L. D., Zanchettin C., Macario V., “An efficient thresh-

olding algorithm for Brazilian bank checks”, Proceedings of 9th International

Conference on Document Analysis and Recognition, pages 193-197, 2007.

[56] Palacios R, Gupta A, “A System for Processing Handwritten Bank Checks

Automatically”, Image and Vision Computing Journal, Volume 26, Issue 10,

pages 1297-1313, October, 2008.


[57] Chandra L., Gupta R., Kumar P., Ganotra D., “Automatic Courtesy Amount

Recognition for Indian Bank Checks”, Proceedings of IEEE Region 10 Confer-

ence, pages 1-5, 2008.

[58] Kim G., Govindaraju V., “Bank Check Recognition using Cross Validation Be-

tween Legal and Courtesy Amounts”, International Journal on Pattern Recog-

nition Artificial Intelligence, Volume 11, Issue 4, pages 657-674, 1997.

[59] Guillevic D., Suen C. Y., “Recognition of Legal Amounts on Bank Cheques”,

International Journal on Pattern Analysis and Application, Volume 1, Issue 1,

pages 28-41, 1998.

[60] Guillevic D., Suen C. Y., “Cursive Script Recognition Applied to the Processing

of Bank cheques”, Proceedings of 3rd International Conference on Document

Analysis and Recognition, pages 11-14, 1995.

[61] Guillevic D., “Unconstrained Handwriting Recognition Applied to the Recog-

nition of Bank Cheques”, PhD thesis, Concordia University, 1995.

[62] Kaufmann G., Bunke H., “Automated Reading of Cheque Amounts”, Interna-

tional Journal on Pattern Analysis and Application, Volume 3, pages 132-141,

2000.

[63] Kimura F., Tsuruoka S., Miyake Y., Shridhar M., “A Lexicon Directed Algo-

rithm for Recognition of Unconstrained Handwritten Words”, IEICE Transac-

tions on Information Systems, pages 785-793, 1994.

[64] Antonacopoulos A., “Flexible Page Segmentation using The Background”, Pro-

ceedings of the 12th International Conference on Pattern Recognition, Volume

2, pages 339-344, 1994.

[65] Fletcher L. A., Kasturi R., “Text String Segmentation From Mixed Text/Graphics Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 10, Issue 3, pages 910-918, 1988.

[66] Nagy G., Seth S., “Hierarchical Representation Of Optically Scanned Docu-

ments”, 7th International Conference on Pattern Recognition, pages 347-349,

1984.


[67] Downton A., Leedham C. G., “Preprocessing And Presorting of Envelope Im-

ages for Automatic Sorting using OCR”, International Journal on Pattern

Recognition, Volume 23, Issue 3-4, pages 347-362, 1990.

[68] Cohen E., Hull J., Srihari S., “Understanding Handwritten Text in a Structured

Environment: Determining Zip Codes From Addresses”, International Journal

on Pattern Recognition, Volume 5, Issue 1-2, pages 221-264, 1991.

[69] Govindaraju V., Srihari S. N., “Handwritten Text Recognition”, Proceedings of

Document Analysis Systems, pages 157-171, 1994.

[70] Seni G., Cohen E., “External Word Segmentation of Off-line Handwritten Text

Lines”, Journal of Pattern Recognition, Volume 27, Issue 1, pages 41-52, 1994.

[71] Srihari S., Kim G., “Penman: A System for Reading Unconstrained Handwrit-

ten Page Image”, Symposium on Document Image Understanding Technology,

pages 142-153, 1997.

[72] Zhang B., Srihari S. N., Huang C., “Word Image Retrieval Using Binary Fea-

tures”, SPIE Conference on Document Recognition and Retrieval XI, pages

18-22, January 2004.

[73] Zahour A., Taconet B., Mercy P., Ramdane S., “Arabic Handwritten Text-

Line Extraction”, Proceedings of the 6th International Conference on Document

Analysis and Recognition, Seattle, pages 281-285, 2001.

[74] Shapiro V., Gluhchev G., Sgurev V., “Handwritten Document Image Segmen-

tation and Analysis”, Pattern Recognition Letters, Volume 14, pages 71-78,

1993.

[75] Antonacopoulos A., Karatzas D., “Document Image Analysis for World War II Personal Records”, First International Workshop on Document Image Analysis for Libraries, DIAL’04, pages 336-341, 2004.

[76] Wong K., Casey R., Wahl F., “Document Analysis Systems”, IBM J. Res. Dev.

Volume 26, Issue 6, pages 647-656, 1982.


[77] LeBourgeois F., “Robust Multifont OCR System From Gray Level Images”, 4th International Conference on Document Analysis and Recognition, Volume 1, pages 1-5, 1997.

[78] LeBourgeois F., Emptoz H., Trinh E., Duong J., “Networking Digital Document

Images”, 6th International Conference on Document Analysis and Recognition,

pages 379-383, 2001.

[79] Shi Z., Govindaraju V.,“Line Separation for Complex Document Images using

Fuzzy Runlength”, Proceedings of International Workshop on Document Image

Analysis for Libraries, pages 23-24, January 2004.

[80] Pu Y., Shi Z., “A natural learning algorithm based on Hough transform for

text lines extraction in handwritten documents”, In: Proceedings of the 6th

International Workshop on Frontiers in Handwriting Recognition, Korea, pages

637-646, 1998.

[81] Oztop E., Mulayim A. Y., Atalay V., Yarman-Vural F., “Repulsive Attrac-

tive Network for Baseline Extraction on Document Images”, Signal Processing,

Volume 75, pages 1-10, 1999.

[82] Tseng Y. H., Lee H. J., “Recognition-Based Handwritten Chinese Character

Segmentation using a Probabilistic Viterbi Algorithm”, Pattern Recognition

Letters, Volume 20, Issue 8, pages 791-806, 1999.

[83] Bruzzone E., Coffetti M. C., “An Algorithm for Extracting Cursive Text Lines”, Proceedings of International Conference on Document Analysis and Recognition, pages 749-752, 1999.

[84] Khandelwal A., Choudhury P., Sarkar R., Basu S., Nasipuri M., Das N., “Text

Line Segmentation for Unconstrained Handwritten Document Images Using

Neighborhood Connected Component Analysis”, Pattern Recognition and Ma-

chine Intelligence, pages 369-374, 2009.


[85] Zahour A., Taconet B., Likforman-Sulem L., Boussellaa W., “Overlapping and

Multi-Touching Text-Line Segmentation by Block Covering Analysis”, Interna-

tional Journal on Pattern Analysis and Applications, Volume 12, Issue 4, pages

335-351, 2009.

[86] Boussellaa W., Bougacha A., Zahour A., Abed H. E, Alimi A., “Enhanced Text

Extraction from Arabic Degraded Document Images using EM Algorithm”,

International Conference on Document Analysis and Recognition, pages 743-

747, 2009.

[87] Bloomberg D. S., “Multiresolution Morphological Approach to Document Im-

age Analysis”, Proceedings of International Conference on Document Analysis

and Recognition, pages 963-971, 1991.

[88] Bukhari S. S., Shafait F., Breuel T. M., “Improved Document Image Segmentation Algorithm using Multiresolution Morphology”, IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, pages 78740D-78740D, January 24, 2011.

[89] Bansal V., Sinha R. M. K., “Segmentation of Touching and Fused Devanagari Characters”, Pattern Recognition, Volume 35, Number 4, pages 875-893, 2002.

[90] Ashkan M.Y., Guru D. S., Punitha P., “Small Eigen Value Based Skew Es-

timation in Persian Digitized Documents”, Proceedings of the International

Conference on Computer Graphics, Imaging and Visualisation, pages 64-70,

2006.

[91] Shi Z., Govindaraju V., “Historical Document Image Enhancement Using Back-

ground Light Intensity Normalization”, 17th International Conference on Pat-

tern Recognition, Volume 1, pages 473-476, 2004.

[92] Shi Z., Govindaraju V., “Historical Handwritten Document Image Segmentation Using Background Light Intensity Normalization”, Proceedings of SPIE 5676, Document Recognition and Retrieval XII, pages 167-174, 2005.


[93] Yan C, Leedham G., “Decompose Threshold Approach to Handwriting Extrac-

tion in Degraded Historical Document”, Proceedings of the 9th International

Workshop on Frontiers in Handwritten Recognition, pages 239-244, 2004.

[94] Louloudis G., Gatos B., Pratikakis I., Halatsis K., “A Block-Based Hough

Transform Mapping for Text Line Detection in Handwritten Documents”, Pro-

ceedings of the Tenth International Workshop on Frontiers in Handwriting

Recognition, pages 515-520, 2006.

[95] Gatos B., Pratikakis I., Perantonis S. J., “Efficient Binarization of Historical

and Degraded Document Images”, 8th International Workshop on Document

Analysis Systems(DAS’08), pages 447-454, Japan, 2008.

[96] Shi Z., Setlur S., Govindaraju V., “Digital Image Enhancement of Indic His-

torical Manuscripts, Guide to OCR for Indic Scripts”, Advances in Pattern

Recognition, Springer-Verlag London Ltd, pages 249-267, 2009.

[97] Shi Z., Setlur S., Govindaraju V., “A Steerable Directional Local Profile Tech-

nique for Extraction of Handwritten Arabic Text Lines”, Proceedings of the

10th International Conference on Document Analysis and Recognition, pages

176-180, 2009.

[98] Nikolaou N., Makridis M., Gatos B., Stamatopoulos N., Papamarkos N.,

“Segmentation of Historical Machine-Printed Documents using Adaptive Run

Length Smoothing and Skeleton Segmentation Paths”, Image and Vision Com-

puting Journal, Volume 28, Issue 4, pages 590-604, 2010.

[99] Fadoua D., Bourgeois F. L., Emptoz H., “Restoring Ink Bleed-Through Degraded Document Images Using a Recursive Unsupervised Classification Technique”, Springer-Verlag Berlin Heidelberg, DAS LNCS 3872, pages 38-49, 2006.

[100] Gatos B., Pratikakis I., Perantonis S.J., “Improved Document Image Bina-

rization by Using a Combination of Multiple Binarization Techniques and

Adapted Edge Information”, 19th International Conference on Pattern Recog-

nition, pages 1-4, 2008.


[101] Halabi Y. S., Zaid S. A., “Modeling Adaptive Degraded Document Image Bi-

narization and Optical Character System”, European Journal of Scientific Re-

search, Volume 28, No. 1, pages 14-32, 2009.

[102] Fillali F., Benmahammed K, Abid G., “Image Restoration using SVD and

Adaptive Regularization”, Journal Automation and Systems Engineering Vol-

ume 4, Issue 3, pages 173-181, 2010.

[103] Badekas E., Papamarkos N., “Estimation of Appropriate Parameter Values

for Document Binarization Techniques”, International Journal of Robotics and

Automation, Volume 24, No. 1, pages 66-78, 2009.

[104] Bukhari S. S., Shafait F., Breuel T. M., “Layout Analysis of Arabic Script

Documents”, Book Chapter 2, Guide to OCR for Arabic Scripts, pages 35-53,

Springer-Verlag London, 2012.

[105] Asi A., Saabni R., Sana J. E., “Text Line Segmentation for Gray Scale Historical Document Images”, Proceedings of the 2011 Workshop on Historical Document Imaging and Processing, Beijing, China, pages 120-126, 2011.

[106] Rivest-Henault D., Moghaddam R. F., Cheriet M., “A Local Linear Level Set Method for the Binarization of Degraded Historical Document Images”, International Journal on Document Analysis and Recognition, Volume 15, Issue 2, pages 101-124, June 2012.

[107] Mantas J., “An Overview of Character Recognition Methodologies”, Pattern Recognition, Volume 19, No. 6, pages 425-430, 1986.

[108] Govindan V. K., Shivaprasad A. P., “Character Recognition - A Review”, Pattern Recognition, Volume 23, No. 7, pages 701-709, 1990.

[109] Tian Q., Peng Z., Thomas A., Yongmin K., “Survey: Omni Font Printed Char-

acter Recognition”, Proceedings of SPIE Visual Communication and Image

Processing, Volume 1606, pages 260-268, 1991.

[110] Belaid A., Haton J. P., “A Syntactic Approach for Handwritten Mathematical Formula Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 1, pages 105-111, 1984.


[111] Sridhar M., Badreldin A., “High Accuracy Syntactic Recognition Algorithm for

Handwritten Numerals”, IEEE Transaction on Systems Man and Cybernetics,

Volume 15, Issue 1, pages 152-158, 1985.

[112] Tappert C.C., Suen C.Y., Wakahara T., “The State of Art in on-line Handwrit-

ing Recognition”, IEEE Transaction on Pattern Analysis and Machine Intelli-

gence, Volume 12, No. 8, pages 787-808, 1990.

[113] Stubberud P, Kanai J., Kalluri V., “Adaptive Image Restoration of Text Im-

ages That Contain Touching or Broken Characters”, Proceedings of the Third

International Conference on Document Analysis and Recognition, pages 778-

781, 1995.

[114] Chaudhuri B. B., Pal U., “A Complete Printed Bangla OCR System”, Pattern Recognition, Volume 31, Issue 5, pages 531-549, 1998.

[115] Sural S., Das P. K., “Fuzzy Hough Transform, Linguistic Sets and Soft Decision MLP for Character Recognition”, Proceedings of Fifth International Conference on Soft Computing and Information/Intelligent Systems, pages 975-978, 1998.

[116] Pal U., Kundu P. K., Chaudhuri B. B., “OCR Error Correction of Inflectional

Indian Language using Morphological Parsing”, Journal of Information Science

and Engineering, Volume 16, pages 903-922, 2000.

[117] Pal U., Chaudhuri B. B., “Machine Printed and Handwritten Text Lines Identification”, Pattern Recognition Letters, Volume 22, pages 431-441, 2001.

[118] Pal U., Belaid A., Choisy Ch., “Touching Numeral Segmentation using Water

Reservoir Concept”, Pattern Recognition Letters, Volume 24, pages 261-272,

2003.

[119] Pal U., Chaudhuri B. B., “Indian Script Character Recognition: A Survey”, Journal of Pattern Recognition, Volume 37, Issue 9, pages 1887-1899, 2004.

[120] Uchida S., Sakoe H., “Eigen Deformations for Elastic Matching Based Hand-

written Character Recognition”, Pattern Recognition, Volume 36, pages 2031-

2040, 2003.


[121] Pujari A. K., Naidu C. D., Jinaga B. C., “An Intelligent Character Recognizer

for Telugu Scripts using Multiresolution Analysis and Associative Memory”,

Image and Vision Computing, Volume 22, Issue 14, pages 1221-1227, 2004.

[122] Rasagna V., Jinesh K. J., Jawahar C. V., “On Multifont Character Classification in Telugu”, Information Systems for Indian Languages, Communications in Computer and Information Science, Volume 139, pages 86-91, 2011.

[123] Sastry P. N., Krishnan R., Ram B. V. S., “Classification and Identification

of Telugu Handwritten Characters Extracted from Palm leaves Using Decision

Tree Approach”, ARPN Journal of Engineering and Applied Sciences, Volume

5, Issue 3, pages 22-32, 2010.

[124] Goyal P., Diwakar S., Agrawal A., “Devanagari Character Recognition towards Natural Human-Computer Interaction”, India HCI, Interaction Design and International Development, Indian Institute of Technology, pages 20-24, 2010.

[125] Shelke S., Apte S., “A Novel Multi-feature Multi-Classifier Scheme for Uncon-

strained Handwritten Devanagari Character Recognition”, Proceedings of 12th

International Conference on Frontiers in Handwriting Recognition, Kolkata,

pages 215-219, 2010.

[126] Shelke S., Apte S., “A Novel Multistage Classification and Wavelet Based Ker-

nel Generation For Handwritten Marathi Compound Character Recognition”,

Proceedings of International Conference on Communications and Signal Pro-

cessing, pages 193-197, 2011.

[127] Shelke S., Apte S., “Multistage Handwritten Marathi Compound Character

Recognition Using Neural Networks”, Journal of Pattern Recognition Research,

Volume 2, pages 253-268, 2011.

[128] John J., Pramod K. V., Balakrishnan K., “Unconstrained Handwritten Malay-

alam Character Recognition using Wavelet Transform and Support vector Ma-

chine Classifier”, Procedia Engineering, Volume 30, pages 598-605, 2012.


[129] Pal U., Kundu S., Ali Y., Islam H., Tripathy N., “Recognition of Unconstrained Malayalam Handwritten Numeral”, Proceedings of the Fourth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), pages 423-428, 2004.

[130] Nagabhushan P., Pai R. M., “Modified Region Decomposition Method and Op-

timal Depth Decision Tree in the Recognition of Non-uniform Sized Characters-

An Experimentation with Kannada Characters”, Pattern Recognition Letters,

Volume 20, pages 1467-1475, 1999.

[131] Ashwin T. V., Sastry P. S, “A Font and Size Independent OCR Systems for

Printed Kannada Documents using Support Vector Machines”, Sadhana, Vol-

ume 27, pages 35-58, 2002.

[132] Chaudhuri B. B., Bera S., “Handwritten Text Line Identification In Indian

Scripts”, 10th International Conference on Document Analysis and Recognition,

pages 636-640, 2009.

[133] Lakshmi C. V., Patvardhan C., “An optical character recognition system for

printed Telugu text”, Journal on Pattern Analysis and Application, Volume 7,

Issue 2, pages 190-204, 2004.

[134] Kokku A., Chakravarthy S., “A Complete OCR System for Tamil Magazine

Documents, A Guide to OCR for Indic Scripts”, Springer Verlag London lim-

ited, pages 147-162, 2009.

[135] Shashikiran K., Kolli S. P., Kunwar R., Ramakrishnan A. G., “Comparison of

HMM and SDTW for Tamil Handwritten Character Recognition”, 2010 Inter-

national Conference on Signal Processing and Communications, pages 1-4.

[136] Hirabara L. Y., Aires S. B. K., Freitas C. O. A., “Dynamic Zoning Selection for

Handwritten Character Recognition”, Progress in Pattern Recognition, Image

Analysis, Computer Vision, and Applications, LNCS 7042, pages 507-514, 2011.

[137] DiLecce V., Dimauro G., Guerriero A., Impedovo S., Pirlo G., Salzo A., “Zoning Design for Handwritten Numeral Recognition”, 7th International Workshop on Frontiers in Handwriting Recognition, pages 583-588, 2000.


[138] Freitas C. O. A., Oliveira L. E. S., Bortolozzi F., Aires S. B. K., “Handwritten Character Recognition using Non-Symmetrical Perceptual Zoning”, International Journal of Pattern Recognition and Artificial Intelligence, Volume 21, Issue 1, pages 1-21, 2007.

[139] Poisson E., Gaudin V. C., Lallican P.M, “Multi-Modular Architecture Based

on Convolutional Neural Networks for Online Handwritten Character Recogni-

tion”, In:International Conference on Neural Information Processing, Volume

5, pages 2444-2448, 2002.

[140] Tay Y. H., Lallican P. M., Khalid M., Gaudin C. V., Knerr S., “An Offline Cur-

sive Handwritten Word Recognition System”, In: Proceedings of IEEE Region

10 International Conference on Electrical and Electronic Technology, Volume

2, pages 519-524, 2001.

[141] Avila S. D., Matos L., Freitas C., Carvalho J. M. D., “Evaluating a Zon-

ing Mechanism and Class-Modular Architecture for Handwritten Characters

Recognition”, CIARP’07 Proceedings of the Congress on pattern recognition,

12th Iberomerican Conference on Progress in pattern recognition, image analy-

sis and applications, pages 515-524, 2007.

[142] Vishwaas M., Arjun M. M., Dinesh. R., “Handwritten Kannada Character

Recognition Based on Kohonen Neural Network”, International Conference on

Recent Advances in Computing and Software Systems, pages 91-97, 2012.

[143] Prasad M. M., Sukumar M., Ramakrishnan A. G., “Divide and Conquer Tech-

nique in Online Handwritten Kannada Character Recognition”, Proceedings of

the International Workshop on Multilingual OCR, Article No. 11, ACM New

York, 2009.

[144] Kunte R. S., Samuel R. D. S., “A two-stage Character Segmentation Tech-

nique for Printed Kannada Text”, GVIP Special Issue on Image Sampling and

Segmentation, pages 1-8, 2006.

[145] Urolagin S., Prema K. V., Reddy N. V. S., “Kannada Alphabets Recognition

with Application to Braille Translation”, International Journal on Image and

Graphics, Volume 11, No. 3, pages 293-314, 2011.


[146] Sheshadri K., Ambekar P., Prasad D. P., Kumar R. P., "An OCR System for Printed Kannada using K-Means Clustering", IEEE International Conference on Industrial Technology, pages 183-187, 2010.

[147] Dhandra B. V., Mukarambi G., Hangarge M., "A Recognition System for Handwritten Kannada and English Characters", International Journal of Computational Vision and Robotics, Volume 2, No. 4, pages 290-301, 2011.

[148] Liu C. L., Suen C. Y., "A New Benchmark on the Recognition of Handwritten Bangla and Farsi Numeral Characters", Pattern Recognition, Volume 42, pages 3287-3295, 2009.

[149] Sonka M., Hlavac V., Boyle R., "Image Processing, Analysis, and Machine Vision", Brooks and Cole Publishing, 1998.

[150] Shih F., "Image Processing and Mathematical Morphology: Fundamentals and Applications", Wiley Publications, IEEE Press, 2010.

[151] Ye X., Cheriet M., Suen C. Y., Liu K., "Extraction of Bank Check Items by Mathematical Morphology", International Journal on Document Analysis and Recognition, Volume 2, No. 2, pages 53-66, 1999.

[152] Shetty S., Sridhar M., "Background Elimination in Bank Cheques using Gray Scale Morphology", Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition, pages 83-92, 2000.

[153] Mengucci M., Granado I., "Morphological Segmentation of Text and Figures in Renaissance Books (XVI Century)", in Goutsias J., Vincent L., Bloomberg D. (eds.), Mathematical Morphology and its Applications to Image Processing, pages 397-404, 2000.

[154] Gonzalez R. C., Woods R. E., "Digital Image Processing", PHI Publication, Third Edition, 2008.

[155] Tomasi C., Manduchi R., "Bilateral Filtering for Gray and Color Images", Proceedings of the IEEE International Conference on Computer Vision, Bombay, India, pages 839-846, 1998.


[156] Barash D., "A Fundamental Relationship Between Bilateral Filtering, Adaptive Smoothing, and the Nonlinear Diffusion Equation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 24, No. 6, pages 844-847, 2002.

[157] Hamarneh G., Hradsky J., "Bilateral Filtering of Diffusion Tensor Magnetic Resonance Images", IEEE Transactions on Image Processing, Volume 16, No. 10, pages 2463-2475, October 2007.

[158] Bazan C., Blomgren P., "Image Smoothing and Edge Detection by Nonlinear Diffusion and Bilateral Filter", Research Report CSRCR, Volume 21, pages 2-15, 2007.

[159] Buades A., Coll B., Morel J. M., "Nonlocal Image and Movie Denoising", International Journal of Computer Vision, Volume 76, Issue 2, pages 123-139, 2008.

[160] Chacko B. P., Krishnan V. R. V., Raju G., Anto P. B., "Handwritten Character Recognition using Wavelet Energy and Extreme Learning Machine", International Journal of Machine Learning and Cybernetics, Volume 3, No. 2, pages 149-161, 2012.

[161] Tan C. L., Cao R., Shen P., "Restoration of Archival Documents using a Wavelet Technique", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 24, Issue 10, pages 1399-1404, 2002.

[162] Chang S. G., Yu B., Vetterli M., "Adaptive Wavelet Thresholding for Image Denoising and Compression", IEEE Transactions on Image Processing, Volume 9, No. 9, pages 1532-1546, September 2000.

[163] Donoho D. L., Johnstone I. M., "Adapting to Unknown Smoothness via Wavelet Shrinkage", Journal of the American Statistical Association, Volume 90, pages 1200-1224, 1995.

[164] Luisier F., Blu T., Unser M., "A New SURE Approach to Image Denoising: Interscale Orthonormal Wavelet Thresholding", IEEE Transactions on Image Processing, Volume 16, No. 3, pages 593-606, March 2007.


[165] Zhang X. P., Desai M., "Adaptive Denoising Based on SURE Risk", IEEE Signal Processing Letters, Volume 5, Issue 10, pages 265-267, 1998.

[166] Rao R. M., Bopardikar A. S., "Wavelet Transforms: Introduction to Theory and Applications", Addison-Wesley, page 126, 2001.

[167] Blu T., Luisier F., "The SURE-LET Approach to Image Denoising", IEEE Transactions on Image Processing, Volume 16, Issue 11, pages 2778-2786, 2007.

[168] Chipman H. A., Kolaczyk E. D., McCulloch R. E., "Adaptive Bayesian Wavelet Shrinkage", Journal of the American Statistical Association, Volume 92, Issue 440, pages 1413-1421, December 1997.

[169] Donoho D. L., "De-noising by Soft-Thresholding", IEEE Transactions on Information Theory, Volume 41, Issue 3, pages 613-627, May 1995.

[170] Zhang B., Zhang Y., Lu W., Han G., "Phenotype Recognition by Curvelet Transform and Random Subspace Ensemble", Journal of Applied Mathematics and Bioinformatics, Volume 1, Issue 1, pages 79-103, 2011.

[171] Fadili M. J., Starck J. L., "Curvelets and Ridgelets", in Encyclopedia of Complexity and Systems Science, Volume 3, pages 1718-1738, 2007.

[172] Candes E., Demanet L., Donoho D., Ying L., "Fast Discrete Curvelet Transforms", Multiscale Modeling and Simulation, Volume 5, No. 3, pages 861-899, 2006.

[173] Starck J. L., Candès E. J., Donoho D. L., "The Curvelet Transform for Image Denoising", IEEE Transactions on Image Processing, Volume 11, No. 6, pages 670-684, 2002.

[174] Starck J. L., Murtagh F., Candès E. J., Donoho D. L., "Gray and Color Image Contrast Enhancement by the Curvelet Transform", IEEE Transactions on Image Processing, Volume 12, Issue 6, pages 706-717, 2003.


[175] Sumana I., Islam M., Zhang D., Lu G., "Content Based Image Retrieval using Curvelet Transform", IEEE 10th Workshop on Multimedia Signal Processing, pages 11-16, 2008.

[176] http://www.curvelet.org/software.html, last updated 24 August 2007.

[177] Louloudis G., Gatos B., Pratikakis I., Halatsis C., "Text Line and Word Segmentation of Handwritten Documents", Pattern Recognition, Volume 42, pages 3169-3183, 2009.

[178] Basu S., Chaudhuri C., Kundu M., Nasipuri M., Basu D. K., "Text Line Extraction from Multi-Skewed Handwritten Documents", Pattern Recognition, Volume 40, Issue 6, pages 1825-1839, 2007.

[179] Kennard D. J., Barrett W. A., "Separating Lines of Text in Free-Form Handwritten Historical Documents", Second International Conference on Document Image Analysis for Libraries, 2006.

[180] Likforman-Sulem L., Faure C., "Extracting Text Lines in Handwritten Documents by Perceptual Grouping", Advances in Handwriting and Drawing: A Multidisciplinary Approach, Paris, pages 21-38, 1994.

[181] Likforman-Sulem L., Hanimyan A., Faure C., "A Hough Based Algorithm for Extracting Text Lines in Handwritten Documents", Proceedings of the Third International Conference on Document Analysis and Recognition, Volume 2, pages 774-777, 1995.

[182] Aradhya V. N. M., Kumar G. H., Shivakumara P., "Skew Detection Technique for Binary Document Images Based on Hough Transform", International Journal of Information Technology, Volume 3, Issue 3, pages 194-200, 2006.

[183] Nandini N., Murthy K. S., Kumar G. H., "Estimation of Skew Angle in Binary Document Images Using Hough Transform", World Academy of Science, Engineering and Technology, Volume 42, pages 44-49, 2008.

[184] Chaudhuri B. B., Bera S., "Handwritten Text Line Identification in Indian Scripts", 10th International Conference on Document Analysis and Recognition, pages 636-640, 2009.


[185] Papavassiliou V., Katsouros V., Carayannis G., "A Morphological Approach for Text-Line Segmentation in Handwritten Documents", International Conference on Frontiers in Handwriting Recognition, pages 19-24, 2010.

[186] Papavassiliou V., Stafylakis T., Katsouros V., Carayannis G., "Handwritten Document Image Segmentation into Text Lines and Words", Pattern Recognition, Volume 43, pages 369-377, 2010.

[187] Rashid S. F., Shafait F., Breuel T. M., "Scanning Neural Network for Text Line Recognition", 10th IAPR Workshop on Document Analysis Systems, Gold Coast, Australia, pages 105-109, March 2012.

[188] Dani A. H., "Indian Palaeography", Munshiram Manoharlal Publishers, ISBN-10: 8121500281, 1997.

[189] http://www.indianetzone.com/7/kannada.html, last updated 01/01/2009.

[190] Buades A., Coll B., Morel J. M., "Nonlocal Image and Movie Denoising", International Journal of Computer Vision, Volume 76, Issue 2, pages 123-139, 2008.

[191] Dholakia J., Negi A., Mohan S. R., "Zone Identification in the Printed Gujarati Text", Proceedings of the Eighth International Conference on Document Analysis and Recognition, pages 272-276, 2005.

[192] Amayeh G., Kasaei S., Tavakkoli A., "A Modified Algorithm to Obtain Translation, Rotation and Scale Invariant Zernike Moment Shape Descriptors", International Workshop on Computer Vision, Tehran, April 2004.

[193] Desai A., "Gujarati Handwritten Numeral Optical Character Reorganization Through Neural Network", Pattern Recognition, Volume 43, pages 2582-2589, 2010.

[194] Gatos B., Kesidis A. L., Papandreou A., "Adaptive Zoning Features for Character and Word Recognition", International Conference on Document Analysis and Recognition, pages 1160-1164, 2011.


[195] Khanale R. R., Chitnis S. D., "Handwritten Devanagari Character Recognition using Artificial Neural Network", Journal of Artificial Intelligence, Volume 4, Issue 1, pages 55-62, 2011.

[196] Murthy K. S., Doreswamy, Kumar G. H., Nagabhushan P., "Texture Features for the Prediction of Period of an Epigraphical Script", Proceedings of the National Workshop on Document Analysis and Recognition, P E S College of Engineering, Mandya, India, pages 192-196, 2003.

[197] Kan C., Srinath M. D., "Invariant Character Recognition with Zernike and Orthogonal Fourier-Mellin Moments", Pattern Recognition, Volume 35, Issue 1, pages 143-154, 2002.

[198] Sheng Y., Shen L., "Orthogonal Fourier-Mellin Moments for Invariant Pattern Recognition", Journal of the Optical Society of America, Volume 11, Issue 6, pages 1748-1757, 1994.

[199] Khotanzad A., Hong Y. H., "Invariant Image Recognition by Zernike Moments", IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 12, Issue 5, pages 489-497, 1990.

[200] Kunte S. R., Samuel R. D. S., "Hu's Invariant Moments and Zernike Moments Approach for the Recognition of Basic Symbols in Printed Kannada Text", Sadhana, Volume 32, Issue 5, pages 521-533, 2007.

[201] Hu M. K., "Visual Pattern Recognition by Moment Invariants", IRE Transactions on Information Theory, Volume 8, pages 179-187, 1962.

[202] Primekumar K. P., Idiculla S. M., "On-line Malayalam Handwritten Character Recognition Using Wavelet Transform and SFAM", 3rd International Conference on Electronics Computer Technology, pages 49-53, 2011.
