

ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES

APPROVED BY SUPERVISING COMMITTEE:

Artyom Grigoryan, Ph.D., Chair

Walter Richardson, Ph.D.

David Akopian, Ph.D.

Accepted: Dean, Graduate School


Copyright 2014 John Jenkinson

All rights reserved.


DEDICATION

To my family.


ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES

by

JOHN JENKINSON, M.S.

DISSERTATION

Presented to the Graduate Faculty of

The University of Texas at San Antonio

In Partial Fulfillment

Of the Requirements

For the Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT SAN ANTONIO

College of Engineering

Department of Electrical and Computer Engineering

December 2014


All rights reserved

INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted.

In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

UMI Number: 1572687

Published by ProQuest LLC (2015). Copyright in the Dissertation held by the Author.

Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346


ACKNOWLEDGEMENTS

My most sincere regard is given to Dr. Artyom Grigoryan for giving me the opportunity to learn

to research and for being here for the students, to Dr. Walter Richardson, Jr. for teaching complex

topics from the ground up and leading this horse of a student to mathematical waters applicable

to my research, to Dr. Mihail Tanase for being the study group that I have never had, and to Dr.

Azima Mottaghi for constant motivation, support and the remark, "You can finish it all in one day."

Additionally, this work progressed through discussions with Mehdi Hajinoroozi, Skei, hftf, and pavonia. I also acknowledge the UTSA Mexico Center for their support of this research.

December 2014


ENHANCEMENT CLASSIFICATION OF GALAXY IMAGES

John Jenkinson, B.S.

The University of Texas at San Antonio, 2014

Supervising Professor: Artyom Grigoryan, Ph.D., Chair

With the advent of astronomical imaging technology developments and the increased capacity of digital storage, the production of photographic atlases of the night sky has begun to generate volumes of data which need to be processed autonomously. As part of the Tonantzintla Digital Sky Survey construction, the present work involves software development for the digital image processing of astronomical images, in particular operations that preface feature extraction and classification. Recognition of galaxies in these images is the primary objective of the present work. Many galaxy images have poor resolution or contain faint galaxy features, resulting in the misclassification of galaxies. An enhancement of these images by the method of the Heap transform is proposed, and experimental results are provided which demonstrate that the enhancement improves the presence of faint galaxy features, thereby improving classification accuracy. Feature extraction was performed using morphological features that have been widely used in previous automated galaxy investigations. Principal component analysis was applied to the original and enhanced data sets for a performance comparison between the original and reduced feature spaces. Classification was performed by the Support Vector Machine learning algorithm.


TABLE OF CONTENTS

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Galaxy Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Hubble Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.2 de Vaucouleurs Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Digital Data Volumes in Modern Astronomy . . . . . . . . . . . . . . . . . . . . . 12

1.2.1 Digitized Sky Surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.2 Problem Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3 Problem Description and Proposed Solution . . . . . . . . . . . . . . . . . . . . . 14

1.4 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.4.1 Survey of Automated Galaxy Classification . . . . . . . . . . . . . . . . . 15

1.4.2 Survey of Support Vector Machines . . . . . . . . . . . . . . . . . . . . . 17

1.4.3 Survey of Enhancement Methods . . . . . . . . . . . . . . . . . . . . . . 18

Chapter 2: Morphological Classification and Image Analysis . . . . . . . . . . . . . . . 20

2.1 Astronomical Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2 Image enhancement measure (EME) . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.3 Spatial domain image enhancement . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.3.1 Negative Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.3.2 Logarithmic Transformation . . . . . . . . . . . . . . . . . . . . . . . . . 28


2.3.3 Power Law Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.3.4 Histogram Equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.3.5 Median Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.4 Transform-based image enhancement . . . . . . . . . . . . . . . . . . . . . . . . 37

2.4.1 Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.4.2 Enhancement methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2.5 Image Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.5.1 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

2.5.2 Rotation, Shifting and Resizing . . . . . . . . . . . . . . . . . . . . . . . 53

2.5.3 Canny Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

2.6 Data Mining and Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

2.6.1 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

2.6.2 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . 64

2.6.3 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

2.7 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

2.8 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

Appendix A: Project Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

A.1 Preprocessing and Feature Extraction codes . . . . . . . . . . . . . . . . . . . . . 85

A.2 SVM Classification codes with data . . . . . . . . . . . . . . . . . . . . . . . . . 92

A.2.1 Original data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

A.2.2 Enhanced data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

Vita


LIST OF TABLES

Table 1.1 Hubble’s Original Classification of Nebulae Table . . . . . . . . . . . . . . 3

Table 2.1 Morphological Feature Descriptions . . . . . . . . . . . . . . . . . . . . . 64

Table 2.2 Feature Values Per Class . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Table 2.3 Galaxy list and relation between NED classification and current project

classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Table 2.4 Summary of classification results for original and enhanced data. Accuracy

improved by 12.924% due to enhancement. . . . . . . . . . . . . . . . . . 81


LIST OF FIGURES

Figure 1.1 Hubble Tuning Fork Diagram. Image from http://www.physast.uga.edu/rls/astro1020/ch20/ch26_fig26_9.jpg . . . . . . 2

Figure 1.2 Plate scan of Elliptical and Irregular Nebulae from Mount Wilson Observatory, originally included in Hubble's paper, Extra-galactic Nebulae . . . . . . 4

Figure 1.3 Plate scan of Spiral and Barred Spiral Nebulae from Mount Wilson Observatory, originally included in Hubble's paper, Extra-galactic Nebulae . . . . . . 6

Figure 1.4 A plane projection of the revised classification scheme. . . . . . . . . . . . 10

Figure 1.5 A 3-Dimensional representation of the revised classification volume and

notation system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Figure 1.6 Sloan Digital Sky Survey coverage map. http://www.sdss.org/sdss-surveys/ . . . . . . 13

Figure 2.1 Schmidt Camera of Tonantzintla. Permission to use image from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) . . . . . . 20

Figure 2.2 Plate Sky Coverage. Permission to use image from the Instituto Nacional

de Astrofísica, Óptica y Electrónica (INAOE). . . . . . . . . . . . . . . . . 21

Figure 2.3 Digitized plate AC8431 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Figure 2.4 Marked plate scan AC8431 . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Figure 2.5 Plate scan AC8409 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Figure 2.6 Marked plate scan AC8409 . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Figure 2.7 Cropped galaxies from plate scans AC8431 and AC8409 read left to right

and top to bottom: NGC 4251, 4274, 4278, 4283, 4308, 4310, 4314, 4393,

4414, 4448, 4559, 3985, 4085, 4088, 4096, 4100, 4144, 4157, 4217, 4232,

4218, 4220, 4346, 4258. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Figure 2.8 Negative, log and power transformations. . . . . . . . . . . . . . . . . . . 28


Figure 2.9 Top to bottom: Galaxy NGC4258 and its Negative Image. . . . . . . . . . . 29

Figure 2.10 Logarithmic and nth root transformations. . . . . . . . . . . . . . . . . . . 30

Figure 2.11 γ-power transformation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Figure 2.12 Galaxy NGC 4217 power law transformations. . . . . . . . . . . . . . . . . 32

Figure 2.13 Histogram processing to enhance Galaxy NGC 6070. . . . . . . . . . . . . 34

Figure 2.14 Top to Bottom: Histogram of original and enhanced image. . . . . . . . . . 35

Figure 2.15 Illustration of the median of a set of points in different dimensions. . . . . . 36

Figure 2.16 Signal-flow graph of determination of the five-point transformation by a

vector x = (x0, x1, x2, x3, x4)′. . . . . . . . . . . . . . . . . . . . . . . . . 43

Figure 2.17 Network of the x-induced DsiHT of the signal z. . . . . . . . . . . . . . . . 44

Figure 2.18 Intensity values and spectral coefficients of Galaxy NGC 4242. . . . . . . . 46

Figure 2.19 Butterworth lowpass filtering performed in the Fourier (frequency) domain. 47

Figure 2.20 α-rooting enhancement of Galaxy NGC 4242. . . . . . . . . . . . . . . . . 47

Figure 2.21 Top: Galaxy PIA 14402, Bottom: NGC 5194, both processed by Heap

transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Figure 2.22 Computational scheme for galaxy classification. . . . . . . . . . . . . . . . 49

Figure 2.23 Background subtraction of Galaxy NGC 4274 by manual and Otsu's thresholding . . . . . . 52

Figure 2.24 Morphological opening for star removal from Galaxy NGC 5813. . . . . . 54

Figure 2.25 Rotation of Galaxy image NGC 4096 by galaxy second moment defined

angle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Figure 2.26 Resizing of Galaxy NGC 4220. . . . . . . . . . . . . . . . . . . . . . . . . 59

Figure 2.27 Canny edge detection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Figure 2.28 PCA rotation of axes for a bivariate Gaussian distribution. . . . . . . . . . 65

Figure 2.29 Pictorial representation of the development of the geometric margin. . . . . 69

Figure 2.30 Maximum geometric margin. . . . . . . . . . . . . . . . . . . . . . . . . . 70

Figure 2.31 SVM applied to galaxy data. . . . . . . . . . . . . . . . . . . . . . . . . . 73


Figure 2.32 Classification iteration class pairs. . . . . . . . . . . . . . . . . . . . . . . 77

Figure 2.33 PCA feature space iteration 1 classification. . . . . . . . . . . . . . . . . . 78

Figure 2.34 PCA feature space iteration 2 classification. . . . . . . . . . . . . . . . . . 79

Figure 2.35 PCA feature space iteration 3 classification. . . . . . . . . . . . . . . . . . 79

Figure 2.36 PCA feature space iteration 4 classification. . . . . . . . . . . . . . . . . . 80

Figure 2.37 PCA feature space iteration 1 classification of enhanced data. . . . . . . . . 81

Figure 2.38 PCA feature space iteration 2 classification of enhanced data. . . . . . . . . 82

Figure 2.39 PCA feature space iteration 3 classification of enhanced data. . . . . . . . . 82

Figure 2.40 PCA feature space iteration 4 classification of enhanced data. . . . . . . . . 83


Chapter 1: INTRODUCTION

1.1 Galaxy Classification

Why classify galaxies? It is an inherent characteristic of man to classify objects. Our country's government classifies families according to annual income to establish tax laws. Medical doctors classify our blood types, making successful transfusions possible. Organic genes are classified by genetic engineers so that freeze-resistant DNA from a fish can be used to "infect" a tomato cell, making the tomato less susceptible to cold. Words in the English language are assigned to the categories noun, verb, adjective, adverb, pronoun, preposition, conjunction, determiner, and exclamation, allowing for the structured composition of sentences. Differential equations are classified as ordinary (ODEs) and partial (PDEs), with ODEs having sub-categories (linear homogeneous, exact differential equations, n-th order equations, etc.), which allows ease of study and lets solution methods be developed for certain classes, such as the method of undetermined coefficients for ordinary linear differential equations with constant coefficients. If we say that a system is linear, there is no need to mention that the system's input-output relationship is observed to be additive and homogeneous. Classification pervades every industry, and enables improved communication, organization and operation within society. For galaxy classification in particular, astrophysicists think that to understand the formation and subsequent evolution of galaxies one must first distinguish between the two main morphological classes of massive systems: spirals and early-type systems, which are also called ellipticals. Galaxies with spiral arms, for example, are normally rotating disks of stars, dust and gas with plenty of fuel for future star formation. Ellipticals, however, are normally more mature systems which long ago finished forming stars. The galaxies' histories are also revealed; dust-lane early-type galaxies are starburst systems formed in gas-rich mergers of smaller spiral galaxies. A galaxy's classification can reveal information about its environment. A morphology-density relationship has been observed in many studies; spiral galaxies tend to be located in low-density environments and ellipticals in more dense environments [1, 2, 3].


There are many physical parameters of galaxies that are useful for their classification, but this paper considers the classification of galaxies by their morphology, a word derived from the Greek morphē, meaning shape or form.

1.1.1 Hubble Scheme

Hubble's scheme was visually popularized by the "tuning fork" diagram, which displays examples of each nebula class, described in this section, in the transition sequence from early-type elliptical to late-type spiral. The tuning fork diagram is shown in Figure 1.1.

Figure 1.1: Hubble Tuning Fork Diagram. Image from http://www.physast.uga.edu/rls/astro1020/ch20/ch26_fig26_9.jpg.

While the basic classification of galaxy morphology assigns members to the categories of elliptical and spiral, the most prominent classification scheme was introduced by Edwin Hubble in his 1926 paper, "Extra-galactic Nebulae." This classification scheme is based on galaxy structure. The individual members of a class differ only in apparent size and luminosity. Originally, Hubble stated that the forms divide themselves naturally into two groups: those found in or near the Milky Way and those in moderate


or high galactic latitudes. This paper, along with Hubble's classification scheme, will only consider the extra-galactic division. Table 1.1 shows that this scheme contains two main divisions,

Table 1.1: Hubble's Original Classification of Nebulae

Type                                      Symbol   Example (N.G.C.)
A. Regular:
  1. Elliptical                           En
     (n = 1, 2, ..., 7 indicates the
     ellipticity of the image)            E0       3379
                                          E2       221
                                          E5       4621
                                          E7       2117
  2. Spirals:
     a) Normal spirals                    S
        (1) Early                         Sa       4594
        (2) Intermediate                  Sb       2841
        (3) Late                          Sc       5457
     b) Barred spirals                    SB
        (1) Early                         SBa      2859
        (2) Intermediate                  SBb      3351
        (3) Late                          SBc      7479
B. Irregular                              Irr      4449

regular and irregular galaxies. Within the regular division, three main classes exist: ellipticals, spirals, and barred spirals. The terms nebulae and galaxies are used interchangeably, with a brief discussion of the rationale for this at the end of this subsection. N.G.C. and U.G.C. are acronyms for the New General Catalogue and Uppsala General Catalogue, respectively, and are designations for deep sky objects.

Elliptical galaxies range in shape from circular through flattening ellipses to a limiting lenticular figure in which the ratio of axes is about 1 to 3 or 4. They contain no apparent structure except for their luminosity distribution, which is maximum at the center of the galaxy and decreases to unresolved edges. The degree to which an elliptical nebula is flattened is determined by the criterion elongation, defined as (a − b)/a, where a and b are the semi-major and semi-minor axes, respectively, of an ellipse fitted to the nebula. The elongation mentioned here is different from, and not to be confused with, the morphic feature elongation that is introduced later in this paper. Elliptical nebulae are designated by the symbol "E," followed by the numerical value of ellipticity.


The complete series is E0, E1, . . . , E7, the last representing a definite limiting figure which marks the junction with spirals. Examples of nebulae with differing ellipticities are shown in Figure 1.2.

Figure 1.2: Plate scan of Elliptical and Irregular Nebulae from Mount Wilson Observatory, originally included in Hubble's paper, Extra-galactic Nebulae.
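As a worked reading of this notation (the factor of ten is the standard convention for Hubble's En classes, stated here for clarity; it is not spelled out in the table above):

$$n = 10\,\frac{a-b}{a}\,; \qquad \text{e.g., } \frac{b}{a} = \frac{1}{2} \;\Rightarrow\; n = 10\left(1 - \frac{1}{2}\right) = 5, \text{ i.e., class E5.}$$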

All regular nebulae with ellipticities greater than about E7 are spirals, and no spirals are known


with ellipticity less than this limit. Spirals are designated by the symbol "S." The classification criteria for spiral nebulae are: (1) relative size of the unresolved nuclear region; (2) extent to which the arms are unwound; (3) degree of resolution in the arms. The relative size of the nucleus decreases as the arms of the spiral open more widely. The stages of this transition of spiral galaxies are designated as "a" for early types, "b" for intermediate types, and "c" for late types. Nebulae intermediate between E7 and Sa are occasionally designated as S0, or lenticular.

Barred spirals are a class of spirals which have a bar of nebulosity extending diametrically across the nucleus. This class is designated by the symbol "SB," with a sequence which parallels that of the normal spirals, leading to the subdivisions of barred spirals designated "SBa," "SBb," and "SBc" for early, intermediate and late type barred spirals, respectively. Examples of normal and barred spirals along with their subclasses are shown in Figure 1.3.

Irregular nebulae are extra-galactic nebulae that lack both discriminating nuclei and rotational symmetry. Individual stars may emerge from an unresolved background in these galaxies.

For any given imaging system, there is a limiting resolution beyond which classification cannot be made with any confidence. Hubble designated galaxies within this category by the letter "Q."

On the usage of nebulae versus galaxy: the astronomical term nebulae has come down through the centuries as the name for permanent, cloudy patches in the sky that are beyond the limits of the solar system. In 1958, the term nebulae was used for two types of astronomical bodies: clouds of dust and gas which are scattered among the stars of the galactic system (galactic nebulae), and the remaining objects, which are now recognized as independent stellar systems scattered through space beyond the limits of the galactic system (extra-galactic nebulae). Some astronomers considered that, since nebulae are now regarded as stellar systems, they should be designated by some other name which does not carry the connotation of clouds or mist. Today, those who adopt this consideration refer to other stellar systems as external galaxies. Since this paper only considers external galaxies, we will drop the adjective and employ the term galaxies for whole external stellar systems [4].


Figure 1.3: Plate scan of Spiral and Barred Spiral Nebulae from Mount Wilson Observatory, originally included in Hubble's paper, Extra-galactic Nebulae.


1.1.2 de Vaucouleurs Scheme

The de Vaucouleurs Classification system is an extension of the Hubble Classification system, and

is the most commonly used system. For this reason it is noted in this paper.

About 1935, Hubble undertook a systematic morphological study of the approximately 1000 brighter galaxies listed in the Shapley-Ames Catalogue, north of -30° declination, with a view to refining his original classification scheme. The main revisions include a) the introduction of the S0 and SB0 types, regarded as transition stages between ellipticals and spirals at the branching-off point of the tuning fork. S0, or lenticular, galaxies resemble spiral galaxies in luminosity but do not contain visible spiral arms. A visible lens surrounds these galaxies, bordered by a faint ring of nebulosity. Characteristic of lenticular galaxies is a bright nucleus in the center of a disc or lens. Near the perimeter of the galaxy there exists a faint rim or envelope with unresolved edges. Hubble separated the lenticulars into two groups, S0(1) and S0(2); these groups have a smooth lens and envelope, and some structure in the envelope in the form of a dark zone and ring, respectively. S0/a is the transition stage between S0 and Sa and shows apparent developing spiral structure in the envelope. SB0 objects are characterized by a bar through the central lens. Hubble distinguished three groups of SB0 objects: group SB0(1) have a bright lens with a broad, hazy bar and no ring, surrounded by a larger, fainter envelope, some being circular; group SB0(2) have a broad, weak bar across a primary ring, with faint outer secondary rings; and group SB0(3) have a well developed bar and ring pattern, with the bar stronger than the ring.

c) Harlow Shapley proposed an extension of the normal spiral sequence beyond Sc, designating galaxies showing a very small, bright nucleus and many knotty irregular arms by Sd. A parallel extension of the barred spiral sequence beyond the stage SBc was introduced by de Vaucouleurs in 1955, which may be denoted SBd or SBm [5, 6].

For irregular type galaxies related to the Magellanic Clouds, I(m), an important characteristic is their small diameter and low luminosity, which marks them as dwarf galaxies.

d) Shapley discovered the existence of dwarf ellipticals (dE) by observation of ellipticals with


very low surface brightness.

de Vaucouleurs noted that after all such types or variants have been assigned into categories,

there remains a hard core of "irregular" objects which do not seem to fit into any of the recognized

types. These outliers are presently discarded, and only isolated galaxies are considered in the

present article.

The coherent classification scheme proposed by de Vaucouleurs, which included most of the current revisions and additions to the standard classification, is described here. Classification and notation of the scheme are illustrated in Figure 1.4, which may be considered as a plane projection of the three-dimensional representation in Figure 1.5. Four Hubble classes are retained: ellipticals E, lenticulars S0, spirals S, irregulars I.

Within the lenticular and spiral classes, objects were re-designated as "ordinary" (A) and "barred" (B), giving SA and SB and allowing the compound symbol SAB for the transition stage between these two families. The symbol S alone is used when a spiral object cannot be more accurately classified as either SA or SB because of poor resolution, unfavorable tilt, etc.

Lenticulars were divided into two subclasses, denoted SA0 and SB0, where SB0 galaxies have

a bar structure across the lens and SA0 galaxies do not. SAB0 denotes objects with a very weak

bar. The symbol S0 is now used for a lenticular object which cannot be more precisely classified

as either SA0 or SB0; this is often the case for edgewise objects.

Two main varieties are recognized in each of the lenticular and spiral families: the "annular" or "ringed" type, denoted (r), and the "spiral" or "S-shaped" type, denoted (s). Intermediate types are noted (rs). In the "ringed" variety the structure includes circular (sometimes elliptical) arcs or rings (S0) or consists of spiral arms or branches emerging tangentially from an inner circular ring (S). In the "spiral" variety two main arms start at right angles from a globular or slightly elongated nucleus (SA) or from an axial bar (SB). The distinction between the two families A and B and between the two varieties (r) and (s) is most clearly marked at the transition stage S0/a between the S0 and S classes. It vanishes at the transition stage between E and S0 on the one hand, and at the transition stage between S and I on the other (cf. Figure 1.5).


Four sub-divisions or stages are distinguished along each of the four spiral sequences SA(r), SA(s), SB(r), SB(s), viz. "early," "intermediate" and "late," denoted a, b, c as in the standard classification, with the addition of a "very late" stage, denoted d. Intermediate stages are noted Sab, Sbc, Scd. The transition stage towards the magellanic irregulars (whether barred or not) is noted Sm, e.g., the Large Magellanic Cloud is SB(s)m. Along each of the non-spiral sequences the signs + and − are used to denote "early" and "late" subdivisions; thus E+ denotes a "late" E, the first stage of the transition towards the S0 class. In both the SA0 and SB0 sub-classes three stages, noted S0−, S0, S0+, are thus distinguished; the transition stage between S0 and Sa, noted S0/a by Hubble, may also be noted Sa−. Notations such as Sa+, Sb−, etc. may be used occasionally in the spiral sequences, but the distinction is so slight between, say, Sa+ and Sb−, that for statistical purposes it is convenient to group them together as Sab, etc. Experience shows that this makes the transition subdivisions, Sab, Sbc, etc., as wide as the main sub-divisions, Sa, Sb, etc.

Irregulars which do not clearly show the characteristic spiral structure are noted I(m).

Figure 1.4 shows a plane projection of the revised classification scheme; compare with Figure 1.5. The ordinary spirals SA are in the upper half of the figure, the barred spirals SB in the lower half. The ring types (r) are to the left, the spiral types (s) to the right. Ellipticals and lenticulars are near the center, magellanic irregulars near the rim. The main stages of the classification sequence from E to Im through S0−, S0, S0+, Sa, Sb, Sc, Sd, Sm are illustrated, approximately on the same scale, along each of the four main morphological series SA(r), SA(s), SB(r), SB(s). A few mixed or "intermediate" types SAB and S(rs) are shown along the horizontal and vertical diameters, respectively. This scheme is superseded by the slightly revised and improved system illustrated in Figure 1.5.

Figure 1.5 shows a 3-dimensional representation of the revised classification volume and notation system. From left to right are the four main classes: ellipticals E, lenticulars S0, spirals S, and irregulars I. Above are the ordinary families SA, below the barred families SB; on the near side


Figure 1.4: A plane projection of the revised classification scheme.

are the S-shaped varieties S(s), on the far side the ringed varieties S(r). The shape of the volume indicates that the separation between the various sequences SA(s), SA(r), SB(r), SB(s) is greatest at the transition stage S0/a between lenticulars and spirals and vanishes at E and Im. A central cross-section of the classification volume illustrates the relative location of the main types and the notation system. There is a continuous transition of mixed types between the main families and


varieties across the classification volume and between stages along each sequence; each point in the classification volume represents potentially a possible combination of morphological characteristics. For classification purposes this infinite continuum of types is represented by a finite number of discrete "cells" [5, 6, 7]. The classification scheme included here defers to [5, 6] for a complete description.

Figure 1.5: A 3-dimensional representation of the revised classification volume and notation system.


1.2 Digital Data Volumes in Modern Astronomy

1.2.1 Digitized Sky Surveys

Modern astronomy has produced massive volumes of data relative to those produced at the start of the 20th century. Digitized sky surveys attempt to construct a virtual photographic atlas of the universe through the identification and cataloging of observed celestial phenomena, for the purpose of understanding the large-scale structure of the universe, the origin and evolution of galaxies, the relationship between dark and luminous matter, and many other topics of research interest in astronomy. This idea is being realized through the efforts of multiple organizations and all-sky surveys. Notable surveys, their night sky coverage contributions and their data collections are mentioned here.

The Sloan Digital Sky Survey (SDSS) is the most prominent ongoing all-sky survey; in its seventh data release, almost 1 billion objects have been identified in approximately 35% of the night sky. Comprehensive data collection for the survey, which uses electronic light detectors for imaging, is projected at 15 terabytes [8]. An image from the SDSS displaying the current coverage of the sky in orange, with selected regions displayed in higher resolution, is shown in Figure 1.6.

The Galaxy Evolution Explorer (GALEX), a NASA mission led by Caltech, has used microchannel plate detectors in two bands to image 2/3 of the night sky from the GALEX satellite between 2003 and the present [9]. In 1969, the Two-Micron Sky Survey (TMSS) scanned 70% of the sky and detected approximately 5,700 celestial sources of infrared radiation [10]. With the advancement of infrared sensing technology, the Two Micron All-Sky Survey (2MASS) detected an 80,000-fold increase over the TMSS between 1997 and 2001. The 2MASS was conducted by two separate observatories, at Mount Hopkins, Arizona, and the Cerro Tololo Inter-American Observatory (CTIO), Chile, using 1.3-meter telescopes equipped with a 3-channel camera and a 256x256 electronic light detector. Each night of released data consisted of 250,000 point sources, 2,000 galaxies, and 5,000 images, weighing about 13.8 gigabytes per facility. The compiled catalog has over 1,000,000 galaxies, extracted from 99.998% sky coverage and 4,121,439 atlas images [11].


Figure 1.6: Sloan Digital Sky Survey coverage map. http://www.sdss.org/sdss-surveys/.

Sky coverage by the Space Telescope Science Institute's Guide Star Catalog 2 (GSC-2) survey, which occurred from 2000 to 2009, was 100%. The optical catalog produced by this survey used 1" resolution scans of 6.5x6.5 square degree photographic plates from the Palomar and UK Schmidt telescopes. Almost 1 billion point sources were imaged. Each plate was digitized using a modified microdensitometer with a pixel size of either 25 or 15 microns (1.7 or 1.0 arcsec, respectively). The digital images are 14000x14000 (0.4 GB) or 23040x23040 (1.1 GB) in size [12]. The Second Palomar Observatory Sky Survey (POSS2) imaged 897 plates between the early 1980s and 1999, covering the entire northern celestial hemisphere using the Oschin Schmidt telescope [13]. One of the main objectives of the ROSAT All-Sky Survey was to conduct the first all-sky survey in X-rays with an imaging telescope, leading to a major increase in sensitivity and source location


accuracy. ROSAT was conducted between 1990 and 1991, covering 99.7% of the sky [14]. The Faint Images of the Radio Sky at Twenty-centimeters (FIRST) project was designed to produce the radio equivalent of the Palomar Observatory Sky Survey over 10,000 square degrees of the North and South Galactic Caps. The survey began in 1993 and is currently active [15, 16]. The Deep Near Infrared Survey (DENIS) is a survey of the southern sky in two infrared and one optical band, conducted at the La Silla observatory of the European Southern Observatory in Chile. The survey ran from 1996 through 2001 and cataloged 355 million point sources [17]. The present work is part of the Tonantzintla Digital Sky Survey, which is discussed in Chapter 2.

1.2.2 Problem Motivation

The image quantity and data volume produced by digital sky surveys present human analysis with an impossible task. Therefore, source detection and classification in modern astronomy necessitate automation of image processing and analysis, providing the motivation for the present work. To address this problem, an algorithm for processing astronomical images to classify the galaxies contained therein is presented and implemented, followed by class discrimination of the detected galaxies according to the scheme mentioned in Section 1.1.1. Class discrimination is performed using extracted galaxy feature values, which experience varying accuracy with different methods of segmentation. Faint regions of galaxies can be lost during segmentation, leading to increased error during feature extraction and subsequent classification. Enhancement of the galaxy image by multiple methods is proposed and implemented to reduce data loss during segmentation and to improve the accuracy of feature extraction, implied through the increase of classification performance.

1.3 Problem Description and Proposed Solution

This project is part of the ongoing work within the Tonantzintla Digital Sky Survey. The present work focuses on automated astronomical image processing and classification. The final performance criterion is 100% classification into the categories E0, . . . , E7, S0, Sa, Sb, Sc, SBa, SBb, SBc, Irr, while the present work builds towards that goal by incremental improvement of classification performance


with the categories elliptical "E," spiral "S," lenticular "S0," barred spiral "SB," and irregular "Irr." The intent in this work is to partially or fully resolve the classification performance limitations within the galaxy segmentation, edge detection and feature extraction stages of the image processing pipeline, by enhancing the galaxy images by the method of the Heap transform to preserve the faint regions of the galaxies which may be lost during the processing of images without enhancement. Classification is performed by the supervised machine learning algorithm Support Vector Machines (SVM).

1.4 Previous Work

1.4.1 Survey of Automated Galaxy Classification

Morphological classification of galaxies into 5 broad categories was performed by an artificial neural network (ANN) machine learning algorithm with back-propagation, trained using 13 parameters, by Storrie-Lombardi in [18]. Odewahn classified galaxies from large sky surveys using ANNs in [35, 36, 37]. The development progress of an automatic star/galaxy classifier using Kohonen Self-Organizing Maps was presented in [38, 39], and using learning vector quantization and fuzzy classifiers with back-propagation-based neural networks in [39]. An automatic system to classify images of varying resolution based on morphology was presented in [40]. Owens, in [19], showed comparable performance between machine learning algorithms of oblique decision trees induced with different impurity measures and the artificial neural network used in [18], and showed that classification of the original data could be performed with less well-defined categories. In [20] an artificial neural network was trained on the features of galaxies that were defined as a galaxy class mean by 6 independent experts. The network performed comparably to the overall root mean square dispersion between the experts. A comparison of the classification performance of an artificial neural network machine learning algorithm to that of human experts for 456 galaxies, with their source being the SDSS in [20], was detailed in [21]. Lahav showed the classification performance of galaxy images and spectra by an unsupervised artificial neural network trained with galaxy spectra


de-noised and compressed by principal component analysis. A supervised artificial neural network was also trained with classes determined by human experts [22]. Folkes, Lahav and Maddox trained an artificial neural network using a small number of principal components selected from galaxy spectra with low signal-to-noise ratios characteristic of redshift surveys. Classification was then performed into 5 broad morphological classes. It was shown that artificial neural networks are useful in discriminating normal and unusual galaxy spectra [23]. The use of the galaxy parameters luminosity and color and the image-structure parameters size, image concentration, asymmetry and surface brightness to classify galaxy images into three classes was performed by Bershady, Jangren and Conselice. It was determined that the essential features for discrimination were a combination of spectral index, e.g., color, and concentration, asymmetry, and surface brightness [24]. A comparison using ensembles of classifiers for the classification methods Naive Bayes, back-propagation artificial neural network, and a decision-tree induction algorithm with pruning was performed by Bazell, which resulted in the artificial neural network producing the best results, and ensemble methods improving the performance of all classification methods [30]. A computational scheme to develop an automatic galaxy classifier using galaxy morphology was shown to provide robustness for classification using artificial neural networks in [26, 34]. Bazell derived 22 morphological features, including asymmetry, which were used to train an artificial neural network for the classification of galaxy images, to determine which features were most important [27]. Strateva used visual morphology and spectral classification to show that two peaks correspond roughly to early (E, S0, Sa) and late-type (Sb, Sc, Irr) galaxies. It was also shown that the color of galaxies correlates with their radial profile [28]. The Gini coefficient, a statistic commonly used in econometrics to measure the distribution of wealth among a population, was used to quantify galaxy morphology based on galaxy light distribution in [29]. In [31], an algorithm for preprocessing galaxy images for morphological classification was proposed. In addition, the classification performance between an artificial neural network, locally weighted regression and homogeneous ensembles of classifiers was compared for 2 and 3 galaxy classes. Lastly, compression and discrimination by principal component analysis was performed. The artificial neural network performed best under all con-


ditions. In [32], principal component analysis was applied to galaxy images, and a structural type estimator named "ZEST" used 5 nonparametric diagnostics to classify galaxy structure. Finally, Banerji presented morphological classification by artificial neural networks for 3 classes, yielding 90% accuracy in comparison to human classifications [33].

1.4.2 Survey of Support Vector Machines

This method of class segregation is performed by hyperplanes, which can be defined by a variety of functions, both linear and nonlinear. The development of this method is presented in Chapter 2. Support vector machines (SVMs) have been employed widely in the areas of pattern recognition and prediction. Here a limited survey of SVM applications is presented, which includes two surveys conducted by researchers in the field. Romano applied SVMs to photometric and geometric features computed from astronomical imagery for the identification of possible supernovae in [42]. M. Huertas-Company applied an SVM to 5 morphological features, luminosity and redshift calculated from galaxy images in [43]. Freed and Lee classified galaxies by morphological features into 3 classes using an SVM in [44]. Saybani conducted a survey of SVMs used in oil refineries in [45]. Xie proposed a method for predicting crude oil prices using an SVM in [90]. Petković used an SVM to predict the power level consumption of an oil refinery in [47]. Balabin performed near infrared spectroscopy for gasoline classification using nine different multivariate classification methods, including SVMs, in [48]. Byun and Lee conducted a comprehensive survey on applications of SVMs for pattern recognition and prediction in [41]. References contained therein are included here in support of the present survey. For classification with q classes (q > 2), classes are trained pairwise. The pairwise classifiers are arranged in trees, where each tree node represents an SVM. A bottom-up tree originally proposed for recognition of 2-D objects was applied to face recognition in [49, 50]. In contrast, an interesting approach was the top-down tree published in [51]. SVMs applied to improve the classification speed of face detection were presented in [63, 53]. Face detection from multiple views was presented in [56, 55, 54]. An SVM was applied to coarse eigenface detection for a fine detection in [57]. Frontal face detection using SVMs was discussed in [58]. [59] presented


SVMs for face and eye detection. Independent component analysis face features were input to the SVM in [60], orthogonal Fourier-Mellin Moments in [61], and an overcomplete wavelet decomposition in [62]. A myriad of other applications have been ventured using SVMs, including but not limited to 2-D and 3-D object recognition [64, 65, 66], texture recognition [66], people and pose recognition [67, 68, 69, 70, 71], moving vehicle detection [72], radar target recognition [73, 76], handwritten character and digit recognition [74, 75, 71, 77], speaker or speech recognition [78, 79, 80, 81], image retrieval [82, 83, 84, 85], prediction of financial time series [86], bankruptcy [87], and other classifications such as gender [88], fingerprints [89], bullet-holes for auto scoring [90], white blood cells [91], spam categorization [92], hyperspectral data [93], storm cells [94], and image classification [95].

1.4.3 Survey of Enhancement Methods

Image enhancement is the process of visually improving the quality of a region of an image, or of the entire image, with respect to some measure of quality, e.g., the image enhancement measure (EME) introduced in Chapter 2. Enhancement methods can be classified as either spatial domain or transform domain methods, depending on whether the manipulation of the image is performed directly on the pixels or on the spectral coefficients, respectively. Here, a survey of both spatial and transform domain methods is presented for the enhancement of astronomical images and images in general. Spatial domain methods are commonly referred to as contrast enhancement methods. The core of these methods are histogram equalization, logarithmic and inverse log transformations, negative and identity transformations, nth-power and nth-root transformations, histogram matching and local histogram processing. Adaptive histogram equalization, which uses local contrast stretching to calculate several histograms corresponding to distinct sections of the image, was applied after denoising to improve the contrast of astronomical images in [96, 99, 100, 34] and generic images in [106]. Traditional histogram equalization was applied to the Hale-Bopp comet image for enhancement in [98] and to other astronomical images in [97, 101, 103, 104, 105]. [102] included histogram equalization in the development of two algorithms for point extraction and matching for registration of infrared astronomical images. Astronomical images were logarithmically transformed for visualization in [108], and likewise generic images in [127]. Inverse log transformations, negative and identity transformations, nth-power and nth-root transformations, histogram matching and local histogram processing are introduced and applied to generic images in [107, 126, 127, 129]. At the core of transform domain methods for image enhancement are the discrete Fourier, Heap, α-rooting, Tensor, and Wavelet transforms. Astronomical image enhancement performed by the discrete Fourier transform was presented in [109, 111, 112], by the Wavelet transform in [110], by the Heap and α-rooting transforms in [113], and by the Curvelet transform in [114, 98]. The enhancement of generic images can be seen in [115, 127, 128, 129] by the discrete Fourier and Cosine transforms, in [116] by the Heap transform, in [117, 118, 127, 128] by α-rooting, in [119, 120, 121, 122] by the Tensor or Paired transform, in [123, 98, 124] by the Wavelet transform, and in [124, 125] by other methods of transform domain processing.


Chapter 2: MORPHOLOGICAL CLASSIFICATION AND IMAGE ANALYSIS

2.1 Astronomical Data Collection

Figure 2.1: Schmidt Camera of Tonantzintla. Permission to use image from the Instituto Nacional

de Astrofísica, Óptica y Electrónica (INAOE).


The Tonantzintla Schmidt camera was constructed in the Harvard Observatory shop under the guidance of Dr. Harlow Shapley, and started operation in 1942. The spherical mirror is 762 mm in diameter and coupled to a 660.4 mm correcting plate. The camera is shown in Figure 2.1. The 8x8 inch photographic plates cover a 5°x5° field with a plate scale of 95 arcsec/mm. The existing collection consists of a total of 14565 glass plates: 10445 taken in direct image mode, and 4120 through a 3.96° objective prism. Figure 2.2 shows the sky covered by the complete plate collection, marking the center of each observed field [130].

Figure 2.2: Plate Sky Coverage. Permission to use image from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE).

The plates are first digitized at the maximum optical resolution of the scanner, 4800 dots per inch (dpi), then rebinned by a factor of 3 for a final pixel size of ~15 μm (1.51 arcsec/pixel) and transformed to the transparency (positive) mode. Each image has 12470 x 12470 pixels (about 350 MB in 16-bit mode) and is stored in FITS format.
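As a quick consistency check on these figures (simple arithmetic on the numbers quoted above, not taken from [130]): the scanner resolves 4800/25.4 ≈ 189 pixels/mm, so rebinning by 3 gives a pixel of

$$3 \times \frac{25.4\ \text{mm}}{4800} \approx 15.9\ \mu\text{m}, \qquad 95\ \frac{\text{arcsec}}{\text{mm}} \times 0.0159\ \text{mm} \approx 1.51\ \text{arcsec/pixel},$$

in agreement with the quoted plate scale.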

The images in this project were received from the collection of digitized photographic plates at


the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE). The present data set consists of 6 plate scans. All 6 plates were marked to indicate the galaxies contained within the image. The goal is to process the digitized plates automatically, i.e., segmenting galaxies within the image, calculating their features and performing classification. In initial attempts at processing the plate scans in Matlab on an Alienware M14x with an Intel Core i7-3840QM 2.80GHz CPU and 12.0GB DDRAM5, e.g., applying the watershed algorithm for segmentation, memory consumption errors were experienced. Consequently, the galaxies within each plate scan were cropped and processed individually. Figures 2.3, 2.4, 2.5, 2.6, and 2.7 show the original digitized plates AC8431 and AC8409, their marked versions indicating captured galaxies, and the cropped galaxies from both plates. Upon performing automatic classification with the cropped images, one of the University of Texas at San Antonio's (UTSA) high performance computing clusters, SHAMU, will be used for the automatic classification of whole plate scans. SHAMU consists of twenty-two computational nodes and two high-end visualization nodes. Each computational node is powered by dual quad-core Intel Xeon E5345 2.33GHz processors (8M cache). SHAMU consists of twenty-three Sun Fire X4150 servers, four Penguin Relion 1800E servers, a DELL Precision R5400 and a DELL PowerEdge R5400. SHAMU utilizes the GlusterFS open-source file system over a high speed InfiniBand connection. A Sun StorageTek 2530 SAS array, fully populated with twelve 500GB hard drives, acts as SHAMU's physical storage in a RAID 5 configuration. SHAMU is networked together with two DELL PowerConnect Ethernet switches and one QLogic Silverstorm InfiniBand switch.

2.2 Image enhancement measure (EME)

To measure the quality of images and select optimal processing parameters, we consider the quantitative measure of image enhancement described in [131, 128], which relates to Weber's law of the human visual system. This measure can be used for selecting the best parameters for image enhancement by the Fourier transform, as well as by other unitary transforms. The measure is defined as follows. A discrete image {fn,m} of size N1 × N2 is divided into k1k2 blocks of size L1 × L2,


Figure 2.3: Digitized plate AC8431

where the integers Li = [Ni/ki], i = 1, 2. The quantitative measure of enhancement of the processed image, Ma : {fn,m} → {f̂n,m}, is defined by

$$\mathrm{EME}_a(\hat{f}) = \frac{1}{k_1 k_2}\sum_{k=1}^{k_1}\sum_{l=1}^{k_2} 20\log_{10}\!\left[\frac{\max_{k,l}(\hat{f})}{\min_{k,l}(\hat{f})}\right],$$

where max_{k,l}(f̂) and min_{k,l}(f̂) respectively are the maximum and minimum of the image f̂_{n,m} inside the (k, l)th block, and a is a parameter, or a vector parameter, of the enhancement algorithm.


Figure 2.4: Marked plate scan AC8431

EME_a(f̂) is called a measure of enhancement, or measure of improvement, of the image f. We define a parameter a0 such that EME(f) = EME_{a0}(f) to be the best (or optimal) Φ-transform-based image enhancement vector parameter. Experimental results show that the discrete Fourier transform can be considered optimal when compared with the cosine, Hartley, Hadamard, and other transforms. When Φ is the identity transformation I, the EME of f̂ = f is called the enhancement measure of the image f, i.e., EME(f) = EME_I(f). EME values of the enhanced galaxy images are presented in subsequent subsections.
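For concreteness, a minimal Matlab sketch of the block-based EME computation is given below. The function name eme, the choice of block counts k1 and k2, and the guard against zero block minima are assumptions of this sketch; the actual project codes are listed in Appendix A.

```matlab
% Sketch of the EME measure: divide the image into k1 x k2 blocks of
% size L1 x L2 and average 20*log10(max/min) over all blocks.
function e = eme(f, k1, k2)
    f = double(f);
    [N1, N2] = size(f);
    L1 = floor(N1 / k1);                     % block height, Li = [Ni/ki]
    L2 = floor(N2 / k2);                     % block width
    e = 0;
    for k = 1:k1
        for l = 1:k2
            blk = f((k-1)*L1+1 : k*L1, (l-1)*L2+1 : l*L2);
            mn = max(min(blk(:)), eps);      % avoid division by zero
            e = e + 20 * log10(max(blk(:)) / mn);
        end
    end
    e = e / (k1 * k2);
end
```

For example, eme(f, 8, 8) evaluates the measure over an 8 x 8 grid of blocks; the same call on the enhanced image allows the two EME values to be compared.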


Figure 2.5: Plate scan AC8409

2.3 Spatial domain image enhancement

Contrast enhancement is the process of improving image quality by manipulating the values of single pixels in an image. This processing is said to occur in the spatial domain, meaning that the image involved in processing is represented as a plane in 2-dimensional Euclidean space; this is why contrast enhancement methods are also called spatial domain methods. Contrast enhancement in the spatial domain is paralleled by transform-based methods, which operate in the frequency domain, as


Figure 2.6: Marked plate scan AC8409

is shown in the following subsections. The image enhancement is described by a transformation T:

T : f(x, y) → g(x, y) = T[f(x, y)]

where f(x, y) is the original image, g(x, y) is the processed image, and T is the enhancement operator. As a rule, T is considered to be a monotonic and invertible transformation.


Figure 2.7: Cropped galaxies from plate scans AC8431 and AC8409 read left to right and top to

bottom: NGC 4251, 4274, 4278, 4283, 4308, 4310, 4314, 4393, 4414, 4448, 4559, 3985, 4085,

4088, 4096, 4100, 4144, 4157, 4217, 4232, 4218, 4220, 4346, 4258.

2.3.1 Negative Image

This transformation is especially useful for processing binary images, e.g., text-document images,

and is described as

Tn : f(x, y) → g(x, y) = M − f(x, y)

27

Page 42: JJenkinson_Thesis

for every pixel (x, y) in the image plane. M is the maximum intensity in the image f(x, y). Figure 2.8 shows this transformation for an image 0 ≤ f(x, y) ≤ L − 1, where L is the number of intensity levels in the image. In the discrete case, M is the maximum level, M = L − 1, and Tn : r → s = L − 1 − r, where r is the original image intensity and s is the intensity mapped by the transformation. An example of an image negative is given in Figure 2.9.

[Plot: gray-level mapping curves for the identity, negative, 46·log(1+r), 16·sqrt(1+r), 40·(1+r)^(1/3), 0.004·r^2, and c·r^3 transformations over the range 0-255.]

Figure 2.8: Negative, log and power transformations.

2.3.2 Logarithmic Transformation

The logarithmic function is used in image enhancement because it is a monotonically increasing function. The transformation is described as

Tl : f(x, y) → g(x, y) = c0 log(1 + f(x, y))


Figure 2.9: Top to bottom: Galaxy NGC 4258 and its negative image.

where c0 is a constant calculated as c0 = M/log(1 + M) in order to preserve the gray-scale resolution of the enhanced image. For example, for a 256-gray-level image, c0 ≈ 46. Other versions of this transform are based on the use of nth roots instead of the log function, as


shown in Figure 2.8. For example,

T2 : f(x, y) → g(x, y) = c0 √(1 + f(x, y)),

where the constant c0 = 16 when processing a 256-level gray scale image. Examples of image enhancement by such transformations are given in Figure 2.10.

(a) Original image (b) log transformation

(c) square root transformation (d) 3rd root transformation

Figure 2.10: Logarithmic and nth root transformations.

2.3.3 Power Law Transformation

These transformations are parameterized by γ and described as

Tγ : f(x, y) → g(x, y) = cγ (1 + f(x, y))^γ


where γ > 0 is a constant which is selected by the user. The constant cγ is used to normalize the gray scale levels within [0, M].

For 0 ≤ γ ≤ 1, the transform maps a narrow range of dark samples of the image into a wide range of bright samples, and it smooths the differences between intensities of bright samples of the original image. The power law transformation is shown with γ = 0.05, 0.85, 1.65, 2.45, 3.25, 4.05, and 4.85 in Figure 2.11.

[Plot of the γ-power mappings over [0, 255] for the original (identity) and γ = 0.05, 0.85, 1.65, 2.45, 3.25, 4.05, 4.85.]

Figure 2.11: γ-power transformation.

Examples of image enhancement by power law transformations are given in Figure 2.12.
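A sketch of the power law mapping with the normalizing constant cγ chosen to keep the output within [0, 255] (γ = 0.3 matches panel (c) of Figure 2.12):

% Power law (gamma) transformation normalized to the gray scale.
f   = double(imread('galaxy.png'));   % hypothetical input image
gam = 0.3;                            % user-selected gamma
cg  = 255 / max((1 + f(:)).^gam);     % normalizing constant c_gamma
g   = uint8(cg * (1 + f).^gam);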

2.3.4 Histogram Equalization

Consider an image of size N × N as a random realization of a variable ξ that takes values r from a range [rmin, rmax], and let h(r) = fξ(r) be the probability density function of ξ. It is desirable to transform the image in such a way that the new image has a uniform distribution. This equates to a change of random variable, as follows.


(a) Original image (b) γ = 0.005

(c) γ = 0.3 (d) γ = 0.9

Figure 2.12: Galaxy NGC 4217 power law transformations.


The change of random variable

ξ → ξ̂ = w(ξ)   (w : r → s),

where w is a monotonically increasing function, is chosen such that

ĥ(s) = fξ̂(s) = 1/(w(rmax) − w(rmin)).

The following fact is well-known:

ĥ(s) = h(r) dr/ds,

or h(r) dr = ĥ(s) ds. Integrating this equality yields

∫_{w(rmin)}^{w(r)} 1/(w(rmax) − w(rmin)) ds = ∫_{rmin}^{r} h(a) da,

which, evaluated at s = w(r), yields

(w(r) − w(rmin)) / (w(rmax) − w(rmin)) = ∫_{rmin}^{r} h(a) da = F(r).

In the particular case, when rmin = 0 and w(rmin) = 0, the following result is obtained:

w(r) = w(rmax) F(r).

In the case of a digital image, where the image has been sampled and quantized, the discrete version of this transform has the representation

r → s = [M Σ_{k=1}^{r} h(k)],  if r = 1, 2, . . . , M − 1;   s = 0, if r = 0,

where r is the integer value of the original image, s is the quantized value of the transformed image, and h(k) is the normalized histogram of the image.
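A sketch of this discrete mapping for an 8-bit image (M = 255); the built-in routine histeq implements the same idea:

% Discrete histogram equalization: s = [M * sum_{k <= r} h(k)].
f = imread('galaxy.png');             % hypothetical uint8 image
h = imhist(f) / numel(f);             % normalized histogram h(k)
s = round(255 * cumsum(h));           % cumulative mapping r -> s
g = uint8(s(double(f) + 1));          % apply as a lookup table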


So, independent of the image intensity probability density function, the intensity density function of the processed image is uniform,

fξ̂(s) = 1/(w(rmax) − w(rmin)).

Histogram equalization applied to galaxy NGC 6070 is shown in Figure 2.13, with the corresponding original and enhanced image histograms shown in Figure 2.14. The histogram equalization destroys the details of the galaxy image, indicating that spatial methods of enhancement are not suitable for all images. This is part of the motivation for using α-rooting, the Heap transform, and other transform-based methods, which are described in the next section.

(a) Original image (b) Histogram equalization

Figure 2.13: Histogram processing to enhance Galaxy NGC 6070.

2.3.5 Median Filter

A noteworthy spatial domain filter is the Median filter, which is based on order statistics. Given a set of numbers S = {1, 2, 1, 4, 2, 5, 6, 7}, the values in S are rearranged in descending order, i.e., 7, 6, 5, 4, 2, 2, 1, 1, and labeled as order statistics, i.e., 7 is the 1st order statistic and the final 1 is the 8th order statistic. The 4 and adjacent 2 can both be considered


Figure 2.14: Top to Bottom: Histogram of original and enhanced image.


as the median here, and the selection is made at the discretion of the user. In general, the highest order statistic is regarded as the nth order statistic.

The Median filter comes from the following problem in probability: given a set of points S = {x1, x2, . . . , x7} containing the median point m, i.e., m ∈ S, which point in the set is closest to every other point in the set? Figure 2.15 illustrates this in two different ways.

The median m is found by minimization of the following function:

|m − x1| + |m − x2| + |m − x3| + · · · + |m − xn| = Σ_{k=1}^{n} |xk − m|.

In signal filtration, the Median filter preserves the range and edges of the original signal in contrast

to the mean filter which destroys the signal edges. For signals with many consecutive noisy points,

the length of the median filter must be extended to retain this behavior. The Median filter has the root property: after a certain number of filtration iterations, further filtering leaves the output unchanged. The Median filter is effective in removing salt-and-pepper noise.
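In Matlab, both the order-statistic view and the 2-D filter can be sketched as follows (medfilt2 is the built-in median filter):

% Median as the minimizer of sum_k |x_k - m|, and 2-D median filtering.
S = [1 2 1 4 2 5 6 7];
m = median(S);                        % returns 3, the mean of the pair 2 and 4
g = medfilt2(imread('galaxy.png'), [3 3]);  % 3x3 median filter window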

(a) median in the line  (b) median in space

Figure 2.15: Illustration of the median of a set of points in different dimensions.


2.4 Transform-based image enhancement

In parallel to directly processing image pixels in the spatial domain by contrast enhancement methods, transform-based methods of enhancement manipulate the spectral coefficients of an image in the domain of the transform. The primary benefits of these methods are low computational complexity and the usefulness of unitary transforms for filtering, coding, recognition, and restoration analysis in signal and image processing. First, the transforms themselves are introduced, followed by methods of enhancement in the transform domain.

2.4.1 Transforms

Each of the following transforms is presented in one dimension and can easily be extended to two dimensions, where it becomes useful for image processing.

Fourier Transform

The one dimensional discrete Fourier transform (1-D DFT) maps a real-valued signal in the time domain into the complex domain, transforming time domain signals into the frequency domain.

The direct and inverse transform pair are defined, for a discrete function xn, as

Fp = Σ_{n=0}^{N−1} xn [cos(2πnp/N) − j sin(2πnp/N)],

xn = (1/N) Σ_{p=0}^{N−1} Fp [cos(2πnp/N) + j sin(2πnp/N)],

where n = 0, 1, . . . , N − 1 represents discrete time points and p = 0, 1, . . . , N − 1 represents

discrete frequency points. The basis functions for this transform are complex exponentials. The

"real" and "imaginary" parts of this sum are considered as the sum of the cosine terms and the sum

of the sine terms, respectively, and are computed by the fast Fourier transform.


Hartley Transform

The Hartley transform is similar to the Fourier transform, but generates only real coefficients. This transform is defined in the one dimensional case as

Hp = Σ_{n=0}^{N−1} xn [cos(2πnp/N) + sin(2πnp/N)] = Σ_{n=0}^{N−1} xn cas(2πnp/N),

where the basis function cas(t) = cos(t) + sin(t). The inverse transform is calculated by

xn = (1/N) Σ_{p=0}^{N−1} Hp cas(2πnp/N).
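Both definitions can be checked against Matlab's fft; a small sketch, assuming an arbitrary test signal x (the Hartley coefficients follow from the real and imaginary parts of the DFT):

% 1-D DFT and Hartley transform of a test signal via the FFT.
x = rand(1, 8); N = numel(x);
F = fft(x);              % Fp = sum_n xn [cos(2*pi*n*p/N) - j*sin(2*pi*n*p/N)]
H = real(F) - imag(F);   % Hartley: Hp = sum_n xn cas(2*pi*n*p/N)
xr = ifft(F);            % inverse DFT, carrying the 1/N factor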

Cosine Transform

The cosine transform, or cosine transform of type 2, is determined by the following basis functions:

φp(n) = 1/√(2N), if p = 0;   φp(n) = (1/√N) cos(π(n + 1/2)p/N), if p ≠ 0,

for the p = 0 case as

Xc_0 = (1/√(2N)) Σ_{n=0}^{N−1} xn,

and for the p ≠ 0 case as

Xc_p = (1/√N) Σ_{n=0}^{N−1} xn cos(π(n + 1/2)p/N)
     = (1/√N) Σ_{n=0}^{N−1} xn [cos(πp/(2N)) cos(πnp/N) − sin(πp/(2N)) sin(πnp/N)],

where p = 1 : (N − 1).


Paired Transform

The one dimensional unitary discrete paired transform (DPT), also known as the Grigoryan transform, is described in the following way. The transform describes a frequency-time representation of the signal by a set of short signals which are called the splitting-signals. Each such signal is generated by a frequency and carries the spectral information of the original signal in a certain set of frequencies. These sets are disjoint. Therefore, the paired transform transfers the signal into a frequency-time space, which represents a source "bridge" between time and frequency. Consider the most interesting case, when the length of signals is N = 2^r, r > 1. Let p, t ∈ XN = {0, 1, . . . , N − 1}, and let χp,t(n) be the binary function

χp,t(n) = 1, if np = t mod N;  0, otherwise;   n = 0 : (N − 1).

Given a sample p ∈ XN and an integer t ∈ [0, N/2], the function

χ′p,t(n) = χp,t(n) − χp,t+N/2(n)

is called the 2-paired, or shortly the paired, function.

The complete set of these functions is defined for frequency points p = 2^k, k = 0, . . . , r − 1, and p = 0, and time points 2^k t. The binary paired functions can also be written as the following transformation of the cosine function:

χ′_{2^k, 2^k t}(n) = M(cos(2π(n − t)/2^{r−k})),   (χ′_{0,0}(n) ≡ 1),

where t = 0 : (2^{r−k−1} − 1). M(x) is the real function which is nonzero only on the bounds of the interval [−1, 1] and takes values M(−1) = −1 and M(1) = 1. The paired functions are determined by the extremal values of the cosine functions as they run through the interval with different frequencies.


The totality of the N paired functions

{χ′_{2^k, 2^k t};  k = 0 : (r − 1),  t = 0 : (2^{r−k−1} − 1)}, together with χ′_{0,0} ≡ 1,

is the complete and orthogonal set of functions [132, 134].

Haar Transform

The Haar transform was the first orthogonal transform found after the Fourier transform and is now widely used in wavelet theory and in image processing applications. In the case N = 2^r, r > 1, the transform is defined, without normalization, by the following matrices:

[HA2] = [1 1; 1 −1],

[HA4] = [[HA2] [HA2]; √2 I2  −√2 I2],

where I2 is the 2 × 2 unit matrix, and for k ≥ 2

[HA_{2^{k+1}}] = [[HA_{2^k}] [HA_{2^k}]; √(2^k) I_{2^k}  −√(2^k) I_{2^k}].
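A minimal Matlab sketch of this recursion (haar_matrix is a hypothetical helper name, not part of the thesis code):

% Unnormalized Haar matrix [HA_{2^r}] built by the recursion above.
function H = haar_matrix(r)
  H = [1 1; 1 -1];                              % [HA2]
  for k = 1:r-1
    n = 2^k;                                    % current matrix size
    H = [H, H; sqrt(n)*eye(n), -sqrt(n)*eye(n)];
  end
end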

Heap Transform

The discrete Heap transform is a new concept which was introduced by Artyom Grigoryan in 2006 [135]. The basis functions of the transformation represent certain waves which are propagated in the "field" associated with the signal generator. The composition of the N-point discrete heap transform, T, is based on the special selection of a set of parameters ϕ1, ..., ϕm, or angles, from the signal generator and given rules, where m ≥ N − 1. The transformation T is considered


separable, which means there exist transformations Tϕ1, Tϕ2, ..., Tϕm such that

T = Tϕ1,...,ϕm = Tϕ_{i(m)} · · · Tϕ_{i(2)} Tϕ_{i(1)},

where i(k) is a permutation of the numbers k = 1, 2, ..., m.

Consider the case when each transformation Tϕk changes only two components of the vector z = (z1, ..., zm)′. These two components may be chosen arbitrarily, and such a selection is defined by a path of the transform. Thus, Tϕk is represented as

Tϕk : z → (z1, ..., zk1−1, fk1(z, ϕk), zk1+1, ..., zk2−1, fk2(z, ϕk), zk2+1, ..., zm). (2.1)

Here the pair of numbers (k1, k2) is uniquely defined by k, and 1 ≤ k1 < k2 ≤ m. For simplicity of calculations, we assume that all first functions fk1(z, ϕ) in (2.1) are equal to a function f(z, ϕ), and all functions fk2(z, ϕ) are equal to a function g(z, ϕ). The transformation T = Tϕ1,...,ϕm is composed of the transformations

Tk1,k2(ϕk) : (zk1, zk2) → (f(zk1, zk2, ϕk), g(zk1, zk2, ϕk)).

The selection of parameters ϕk, k = 1 : m, is based on specified signal generators x, the num-

ber of which is defined through the given decision equations, to achieve a uniqueness of parameters

and desired properties of the transformation T. Consider the case of two decision equations with

one signal-generator.

Let f(x, y, ϕ) and g(x, y, ϕ) be functions of three variables; ϕ is referred to as the rotation parameter, such as an angle, and x and y as the coordinates of a point (x, y) on the plane. It is assumed that, for a specified set of numbers a, the equation g(x, y, ϕ) = a has a unique solution with respect to ϕ, for each point (x, y) on the plane or its chosen subset.


The system of equations

f(x, y, ϕ) = y0,   g(x, y, ϕ) = a

is called the system of decision equations [135]. First, the value of ϕ is calculated from the second equation, which we call the angular equation. Then, the value of y0 is calculated from the given input (x, y) as y0 = f(x, y, ϕ). It is also assumed that the two-point transformation

Tϕ : (z0, z1) → (z′0, z′1) = (f(z0, z1, ϕ), g(z0, z1, ϕ)),

which is derived from the given decision equations by Tϕ : (x, y) → (f(x, y, ϕ), a), is unitary. We call Tϕ the basic transformation.

Example 1: Consider the following functions that describe the elementary rotation:

f(x, y, ϕ) = x cos ϕ − y sin ϕ,

g(x, y, ϕ) = x sin ϕ + y cos ϕ.

Given a real number a, the basic transformation is defined as the rotation of the point (x, y) onto the horizontal line Y = a,

Tϕ : (x, y) → (x cos ϕ − y sin ϕ, a).

The rotation angle ϕ is calculated by

ϕ = arccos(a/√(x² + y²)) + arctan(y/x).

The first pair to be processed is (x0, x1),

(x0, x1) → (x0^(1), a),

the next is (y0, x2),

(x0^(1), x2) → (x0^(2), a),

with the new value x0 = x0^(2), and so on. The first component of the signal is renewed and participates in the calculation of all (N − 1) basic transformations Tk = Tϕk, k = 1 : (N − 1). Therefore, at stage k, the first component of the transform is y0 = x0^(k).

The complete transform of the signal-generator x is

T(x) = (y0, a1, a2, . . . , aN−1),   (y0 = x0^(N−1)).

The signal-flow graph of processing the five-point generator x is shown in Figure 2.16.

[Flow graph: Tk = T(φk), k = 1 : 4, with φk = r(y0, xk, ak).]

Figure 2.16: Signal-flow graph of determination of the five-point transformation by a vector x = (x0, x1, x2, x3, x4)′.

This transform is applied to the input signal zn in the same order, or path P, as the generator x. In the first stage, the first two components are processed,

Tϕ1 : (z0, z1) → (z0^(1), z1^(1)),

next,

Tϕ2 : (z0^(1), z2) → (z0^(2), z2^(1)),


[Two-level flow graph: angles φ1, . . . , φN−1 are generated from x in Level 1 and applied to z in Level 2.]

Figure 2.17: Network of the x-induced DsiHT of the signal z.

and so on. The result of the transform is

T[z] = (z0^(N−1), z1^(1), z2^(1), . . . , zN−1^(1)),   a = 0.

Now consider the case when all parameters ak = 0, i.e., when the whole energy of the vector x is collected in one heap and then transferred to the first component. In other words, we consider the Givens rotations of vectors, or points (y0, xk), onto the horizontal Y = 0. Figure 2.17 shows the transform-network of the transform of the signal z = (z0, z1, z2, ..., zN−1)′. The parameters (angles) of the transformation are generated by the signal-generator x. In the 1st level and the kth stage of the flow-graph, the angle ϕk is calculated from the inputs (x0^(k−1), xk), where k = 1 : (N − 1) and x0^(0) = x0. This angle is used in the basic transform Tk = Tϕk to define the next component x0^(k), as well as to perform the transform of the input signal z in the 2nd level. The full graph represents a coordinated network of transformation of the vector z under the action of x.
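A minimal Matlab sketch of this zero-target case (a = 0), writing the basic transformation as a Givens rotation in the [c s; −s c] convention and following the sequential path described above; heap_dsiht is a hypothetical function name:

% x-induced heap transform of a signal z, with all targets a_k = 0.
function [y, phi] = heap_dsiht(z, x)
  N = numel(x); phi = zeros(1, N-1);
  y0 = x(1);                          % the renewed first component
  for k = 2:N                         % Level 1: angles from the generator x
    phi(k-1) = atan2(x(k), y0);       % rotate (y0, x_k) onto the line Y = 0
    y0 = cos(phi(k-1))*y0 + sin(phi(k-1))*x(k);
  end
  y = z(:).';                         % Level 2: the same path applied to z
  for k = 2:N
    c = cos(phi(k-1)); s = sin(phi(k-1));
    t = c*y(1) + s*y(k);
    y(k) = -s*y(1) + c*y(k);
    y(1) = t;
  end
end

Applying heap_dsiht(x, x) collects the whole energy of the generator in the first component, y(1) = ||x||, with all remaining components zero.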

2.4.2 Enhancement methods

The common algorithm for image enhancement via a 2-D invertible transform consists of the three steps of Algorithm 2.1 below, and can be represented as

x → X = T(x) → O · X → T⁻¹[O(X)] = x̂.


Algorithm 2.1 Transform based image enhancement

1. Perform the 2-D unitary transform

2. Multiply the transform coefficients X(p, s) by some factor O(p, s)

3. Perform the 2-D inverse unitary transform

O is an operator which may be applied to the coefficients X(p, s) of the transform, or to its real and imaginary parts a_{p,s} and b_{p,s} if the transform is complex. For instance, the modified quantities could be X(p, s), a_{p,s}^α, b_{p,s}^α, or log_α a_{p,s}, log_α b_{p,s}. The cases of greatest interest are when O(X)(p, s) is an operator of magnitude and when O(X)(p, s) is performed separately on the coefficients.

Let X(p, s) be the transform coefficients and let the enhancement operator O be of the form X(p, s) · C(p, s), where the latter is a real function of the magnitude of the coefficients, i.e., C(p, s) = f(|X|)(p, s). C(p, s) must be real since only modification of the magnitude, and not of the phase information, is desired. The following possibilities are a subset of methods for modifying the magnitude coefficients within this framework:

1. C1(p, s) = γ|X(p, s)|^(α−1), 0 ≤ α < 1 (the so-called modified α-rooting);

2. C2(p, s) = log^β[|X(p, s)|^λ + 1], 0 ≤ β, 0 < λ;

3. C3(p, s) = C1(p, s) · C2(p, s).

α, λ, and β are the parameters of the enhancement, selected by the user to achieve the desired enhancement. Denoting by θ(p, s) ≥ 0 the phase of the transform coefficient X(p, s), the transform coefficient can be expressed as

X(p, s) = |X(p, s)| e^{jθ(p,s)},

where |X(p, s)| is the magnitude of the coefficient. The operator O applied to the moduli of the transform coefficients, instead of directly to the transform coefficients X(p, s),


will be performed as

O(X)(p, s) = O(|X|)(p, s) e^{jθ(p,s)}.

It is assumed that the enhancement operator O(|X|) takes one of the forms Ci(p, s)|X(p, s)|, i = 1, 2, 3, at every frequency point (p, s). Figure 2.18 shows Galaxy NGC 4242 in the time domain (pixel intensity values) and the frequency domain (spectral coefficients).

(a) intensity image (b) spectral coefficients

Figure 2.18: Intensity values and spectral coefficients of Galaxy NGC 4242.

Figure 2.19 shows Butterworth lowpass filtering of Galaxy UGC 7617 for n = 2 and D0 = 120. The transfer function of the filter of order n, with cutoff frequency at a distance D0 from the origin, is defined as

H(p, s) = 1 / (1 + [D(p, s)/D0]^{2n}).
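A sketch of this filtering in Matlab with n = 2 and D0 = 120, matching Figure 2.19 (galaxy.png is a hypothetical file name; the spectrum is centered with fftshift):

% Butterworth lowpass filtering in the Fourier domain (n = 2, D0 = 120).
f = double(imread('galaxy.png'));
[M, N] = size(f);
[u, v] = meshgrid(1:N, 1:M);
D  = hypot(u - (floor(N/2)+1), v - (floor(M/2)+1));  % distance D(p,s)
Hb = 1 ./ (1 + (D/120).^4);                          % 1/(1 + [D/D0]^{2n})
g  = real(ifft2(ifftshift(fftshift(fft2(f)) .* Hb)));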

α-rooting

Figure 2.20 shows the enhancement of Galaxy NGC 4242 by method C1(p, s) with α = 0.02.
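A sketch of α-rooting with the 2-D DFT and α = 0.02 (the zero-coefficient guard keeps the magnitude factor |X|^(α−1) finite; the file name is a placeholder):

% Alpha-rooting: multiply each coefficient by |X(p,s)|^(alpha - 1).
f = double(imread('galaxy.png'));
alpha = 0.02;
X = fft2(f);
Cmag = abs(X).^(alpha - 1);           % magnitude factor, 0 <= alpha < 1
Cmag(abs(X) == 0) = 0;                % guard zero coefficients
g = real(ifft2(X .* Cmag));
g = uint8(255 * mat2gray(g));         % rescale to [0, 255] for display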

Heap transform

Figure 2.21 shows the results of enhancing galaxy images PIA 14402 and NGC 5194 by the Heap

transform.


(a) original image (b) low pass filtering

Figure 2.19: Butterworth lowpass filtering performed in the Fourier (frequency) domain.

(a) original image (b) enhancement by α = 0.02

Figure 2.20: α-rooting enhancement of Galaxy NGC 4242.

2.5 Image Preprocessing

The steps taken to prepare the galaxy images for feature extraction are detailed in this section. The position, size, and orientation of the galaxy vary from image to image. Therefore, the preprocessing steps produce a training set that is invariant to galaxy position, scale, and orientation. Individual galaxies were cropped from the digitized photographic plates and processed manually by adjusting parameters at several stages in the pipeline. Automatic selection of these parameters is part of future work. Figure 2.22 shows the computational scheme for the classification pipeline.


Figure 2.21: Top: Galaxy PIA 14402, Bottom: NGC 5194, both processed by Heap transform.

2.5.1 Segmentation

Other than the object of interest, galaxy images contain stars, gas, dust, and artifacts induced during the imaging and scanning process. For a galaxy to be recognized, such content not belonging to the galaxy needs to be removed. In general, this process involves denoising and inpainting. Here, the background is subtracted via a single threshold or Otsu's method. Otsu's method is calculated in Matlab by the command graythresh. Otsu's method automatically selects a good threshold for images where there are few stars and the galaxy intensity varies greatly from the background. As the quantity and size of stars in the image increase, or when the background is close in intensity to the galaxy, Otsu's method does not perform well. After background subtraction by thresholding, stars and other artifacts are removed by the morphological opening operation with different size parameters using the Matlab function bwareaopen.

A grayscale image relates to a function f(x, y) that takes values from a finite interval [0, M ].

In the discrete case, M is considered to be a positive integer. Consider an image with only one


[Pipeline: Galaxy Images → Segmentation (Thresholding, Morphological Opening) → Feature Invariance (Rotation, Centering, Resizing) → Canny Edge Detection → Feature Extraction (Elongation, Form Factor, Convexity, Bounding-rectangle-to-fill-factor, Bounding-rectangle-to-perimeter, Asymmetry Index) → Support Vector Machine → Galaxy Classes.]

Figure 2.22: Computational scheme for galaxy classification.


object,

f(x, y) = 1, if (x, y) ∈ O ⊂ X;  0, otherwise,

where O is the set of pixels in the object, and X is the whole domain of the image. The function

f(x, y) represents a binary image. Any number can be used instead of 1, e.g., 255. Thresholding

is defined as the following procedure:

g(x, y) = gT(x, y) = 1, if f(x, y) ≥ T;  0, otherwise,

where T is a positive number from the interval [0, M ]. This number is called a threshold.

Otsu’s method begins by representing a grayscale image by L gray levels. ni represents the

number of pixels at level i, and the total number of pixels N = n1 + n2 + . . . + nL. The image

histogram is then described by a probability distribution

pi =ni

N, pi ≥ 0,

L∑i=1

pi = 1.

The intensity values are then separated into two classes C0 and C1 by a threshold k, where C0 represents the intensities [1, . . . , k] and C1 the intensities [k + 1, . . . , L]. The class occurrence probabilities and mean levels are respectively given by

w0 = Pr(C0) = Σ_{i=1}^{k} pi = w(k),

w1 = Pr(C1) = Σ_{i=k+1}^{L} pi = 1 − w(k),

and

μ0 = Σ_{i=1}^{k} i Pr(i|C0) = Σ_{i=1}^{k} i pi/w0 = μ(k)/w(k),

μ1 = Σ_{i=k+1}^{L} i Pr(i|C1) = Σ_{i=k+1}^{L} i pi/w1 = (μT − μ(k))/(1 − w(k)),

where w(k) and μ(k) are the zeroth- and first-order moments up to the kth level, respectively, and

μT = μ(L) = Σ_{i=1}^{L} i pi

is the total mean level of the original image. The following relationships are easily verified for any k:

w0 μ0 + w1 μ1 = μT,   w0 + w1 = 1. (2.2)

The class variances are given by

σ0² = Σ_{i=1}^{k} (i − μ0)² Pr(i|C0) = Σ_{i=1}^{k} (i − μ0)² pi/w0,

σ1² = Σ_{i=k+1}^{L} (i − μ1)² Pr(i|C1) = Σ_{i=k+1}^{L} (i − μ1)² pi/w1.

The following criteria to measure the effectiveness of the threshold k are introduced from discriminant analysis:

λ = σB²/σW²,   κ = σT²/σW²,   η = σB²/σT²,

where

σW² = w0 σ0² + w1 σ1²,

σB² = w0 (μ0 − μT)² + w1 (μ1 − μT)²,

and, from equation (2.2),

σT² = Σ_{i=1}^{L} (i − μT)² pi

are the within-class variance, the between-class variance, and the total variance of levels, respectively.


Through relationships between the criteria, the problem becomes finding the k that maximizes the criterion η, or equivalently σB², by

η(k) = σB²(k)/σT²

with

σB²(k) = [μT w(k) − μ(k)]² / (w(k)[1 − w(k)]),

and, as shown in [136], the optimal threshold k∗, restricted to the range S∗ = {k; 0 < w(k) < 1}, satisfies

σB²(k∗) = max_{1≤k<L} σB²(k).
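A short Matlab sketch of this maximization, next to the built-in graythresh (which returns the Otsu threshold normalized to [0, 1]); galaxy.png is a hypothetical file name:

% Otsu's threshold: maximize the between-class variance sigma_B^2(k).
f  = imread('galaxy.png');            % uint8 image, L = 256 levels
p  = imhist(f) / numel(f);            % probabilities p_i
w  = cumsum(p);                       % w(k), zeroth-order moments
mu = cumsum((1:256)' .* p);           % mu(k), first-order moments
muT = mu(end);                        % total mean level
sB2 = (muT*w - mu).^2 ./ (w.*(1 - w) + eps);  % sigma_B^2(k), guarded
[~, kstar] = max(sB2);                % optimal threshold k*
T  = 255 * graythresh(f);             % built-in equivalent, rescaled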

Figure 2.23 shows original images with subtracted backgrounds by different manual thresholds

and Otsu’s method.

(a) Original image (b) T = 60

(c) T = 74 (d) Otsu’s T = 85

Figure 2.23: Background subtraction of Galaxy NGC 4274 by manual and Otsu’s thresholding.

The average difference between single thresholds and thresholds by Otsu’s method for the

enhanced data set was 6.67 with a standard deviation of 11.21.

Mathematical morphology provides image processing with powerful nonlinear filters which


operate according to Minkowski's addition and subtraction. Given subsets X and B of Rⁿ, Minkowski's addition X ⊕ B of sets X and B is the set

X ⊕ B = ∪_{b∈B} {Xb},   Xb = {x + b; x ∈ X}.

For the set B̄ = {−b; b ∈ B}, symmetric to B with respect to the origin, the set X ⊕ B̄ is called a dilation of the set X by B. The set B is said to be a structuring element. So, in the symmetric case, if B̄ = B, Minkowski's addition of sets X and B and the dilation of X by B are the same concept.

The dual operation to Minkowski's addition of sets X and B is the subtraction X ⊖ B, which is defined as

X ⊖ B = (Xᶜ ⊕ B)ᶜ = ∩_{b∈B} {Xb},   Xb = {x + b; x ∈ X}.

The set X ⊖ B, dual to the dilation X ⊕ B, is called an erosion of the set X by B. By means of dilation and erosion of sets, the corresponding operations of opening, X ◦ B, and closing, X • B, can be defined as

X ◦ B = (X ⊖ B) ⊕ B = ∪{x + B; x + B ⊂ X},

X • B = (Xᶜ ◦ B)ᶜ = (X ⊕ B) ⊖ B.

Herewith, the operation of opening of X by B is dual to the operation of closing of X by B, i.e., X ◦ B = (Xᶜ • B)ᶜ. Figure 2.24 shows star and artifact removal from Galaxy NGC 5813 with minimum object size P = 64 pixels.
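A sketch of the segmentation step built from these operations, assuming the threshold kstar from the Otsu sketch above (the size parameter 64 matches Figure 2.24):

% Background subtraction followed by small-object removal.
bw  = f >= kstar;                     % thresholded binary image
bw2 = bwareaopen(bw, 64);             % drop components smaller than 64 pixels
seg = f;  seg(~bw2) = 0;              % keep only the galaxy pixels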

2.5.2 Rotation, Shifting and Resizing

To achieve invariance to orientation, position, and scale, the galaxies were shifted by their geomet-

rical center, rotated by the angle between their first principal component and the image x-axis, and

resized to a uniform size of 128x128 pixels, respectively.


(a) original image (b) thresholded image

(c) opened image

Figure 2.24: Morphological opening for star removal from Galaxy NGC 5813.


The geometrical center, or centroid, of an object in an image is the center of mass of the object.

The center is the point where one can concentrate the whole mass of the object without changing

the first moment relative to any axis. The first moment with respect to the x axis is defined by

μx ∫∫_X f(x, y) dx dy = ∫∫_X x f(x, y) dx dy.

The first moment with respect to the y axis is defined by

μy ∫∫_X f(x, y) dx dy = ∫∫_X y f(x, y) dx dy.

The coordinate of the object center is then (μx, μy).

In the discrete case, the first moment with respect to the x axis is defined by

μx Σ_n Σ_m f_{n,m} = Σ_n Σ_m n f_{n,m} = Σ_n n Σ_m f_{n,m},

and with respect to the y axis,

μy Σ_n Σ_m f_{n,m} = Σ_n Σ_m m f_{n,m} = Σ_m m Σ_n f_{n,m},

where the summation is performed over all pixels (n, m) of the object O.

The center of the object is defined as

(μx, μy) = ( Σ_n Σ_m n f_{n,m} / Σ_n Σ_m f_{n,m},  Σ_n Σ_m m f_{n,m} / Σ_n Σ_m f_{n,m} ).


In the discrete binary case, the center is defined as

(μx, μy) = ( Σ_{(n,m)∈O} n / card(O),  Σ_{(n,m)∈O} m / card(O) ),

where card(O) is the cardinality of the set O that defines the binary image.

To find the orientation of an object in an image, if such an orientation exists and is unique, consider the line along which the second moment is minimum. In other words, consider the integral

E = μ2(l) = ∫∫ r² f(x, y) dx dy, (2.3)

where r is the distance of the point (x, y) from the line l, i.e., the length of the perpendicular dropped from the point (x, y) to the line l. The line l is described by the equation

l : x sin θ − y cos θ + p = 0,

where p is the length of the perpendicular drawn from the origin (0, 0) to the line l. Therefore, (2.3) can be rewritten as

E = E(θ) = ∫∫ (x sin θ − y cos θ + p)² f(x, y) dx dy. (2.4)

The following denotations are made for the image coordinates shifted by the geometrical center of the object,

x′ = x − μx,   y′ = y − μy,

and the second moments of the shifted object are denoted

a = ∫∫ (x′)² f(x, y) dx′ dy′,   c = ∫∫ (y′)² f(x, y) dx′ dy′,   b = 2 ∫∫ x′ y′ f(x, y) dx′ dy′.


E(θ) can then be rewritten as

E(θ) = a sin²θ − b sinθ cosθ + c cos²θ,

or

E(θ) = (1/2)(a + c) − (1/2)(a − c) cos 2θ − (1/2) b sin 2θ.

Differentiating E with respect to θ and setting E′(θ) = 0 gives

tan(2θ) = b/(a − c)   (a ≠ c).

Therefore, the angle of the orientation line l(θ) is found from

sin(2θ) = ± b/√(b² + (a − c)²),   cos(2θ) = ± (a − c)/√(b² + (a − c)²).

The angle of the orientation line l(θ) was calculated for each galaxy image and then used to rotate the image with the Matlab function imrotate. Figure 2.25 shows this rotation for galaxy image NGC 4096 by the angle −64 degrees. Note that the image x-axis in Matlab is vertical, so the desired orientation, with the galaxy's first principal component collinear with the horizontal axis of the image, is achieved by rotating the galaxy an additional 90 degrees.
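A sketch of the centering and deskewing steps using regionprops, whose Centroid and Orientation properties correspond to the centroid and second-moment angle derived above (sign conventions depend on Matlab's image coordinate system, so the signs here are indicative; bw2 and f are the mask and image from the segmentation sketches):

% Center the galaxy and align its major axis with the horizontal axis.
props = regionprops(bw2, 'Centroid', 'Orientation');
c  = props(1).Centroid;               % geometric center (mu_x, mu_y)
sz = size(bw2);
fc = imtranslate(f, [sz(2)/2 - c(1), sz(1)/2 - c(2)]);
g  = imrotate(fc, -props(1).Orientation, 'bilinear', 'crop');
% (the text applies an additional 90 degree rotation for its axis convention)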

(a) segmented galaxy (b) rotated galaxy

Figure 2.25: Rotation of Galaxy image NGC 4096 by galaxy second moment defined angle.


Resizing an image involves either subsampling, if the desired image size is smaller than the original, or resampling, if the desired image size is greater than the original. Subsampling reduces the size of an image by creating a new image in which each pixel value a is calculated from the values of a neighborhood of pixels about a in the original image. Resampling from an image size of 128 × 128 to 256 × 256 replicates each pixel into a 2 × 2 block:

[· · · ·; · a b ·; · c d ·; · · · ·] → [· · · · · ·; · a a b b ·; · a a b b ·; · c c d d ·; · c c d d ·; · · · · · ·].

Another process of subsampling is defined by the calculation of means, as follows for the 2 × 2 subsampling example, where

a = (a1 + a2 + b1 + b2)/4,   b = (a3 + a4 + b3 + b4)/4,

c = (c1 + c2 + d1 + d2)/4,   d = (c3 + c4 + d3 + d4)/4.

Image resizing is performed in Matlab by the function imresize. Figure 2.26 shows an example of image resizing from size 138 × 197 to size 128 × 128 for galaxy image NGC 4220.
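A sketch of the two resampling rules above alongside imresize, which performs the general resizing used in the pipeline (f is the cropped galaxy image):

% Pixel replication (2x up), 2x2 block means (2x down), and imresize.
fd = double(f);
up = kron(fd, ones(2));               % each pixel becomes a 2x2 block
dn = conv2(fd, ones(2)/4, 'valid');   % local 2x2 means
dn = dn(1:2:end, 1:2:end);            % keep one mean per 2x2 block
g  = imresize(f, [128 128]);          % uniform 128 x 128 output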

2.5.3 Canny Edge Detection

The Canny edge detection method was developed by John Canny in 1986 to satisfy three performance criteria: (1) good detection, (2) good localization, and (3) only one response to a single edge. Good detection means reducing false positives (non-edges detected as edges) and false negatives (edges not detected). Good localization means that minimal error exists between identified edge points and true edge points. Only one response


(a) cropped image size 138× 197 (b) image resized to 128× 128

Figure 2.26: Resizing of Galaxy NGC 4220.

to a single edge ensures that the operator eliminates the multiple maxima output by the filter at step edges. Canny formulated each of these three criteria mathematically and found solutions through numerical optimization. The result is that the first derivative of a Gaussian is approximately the optimal edge detector with respect to the signal-to-noise ratio and localization, i.e., the first two criteria. The edge detection algorithm is presented below. Let f(x, y) denote the input image and G(x, y) the Gaussian function

G(x, y) = e^{−(x² + y²)/(2σ²)}.

The convolution of these two functions results in a smoothing of the input image and is written as

s(x, y) = f(x, y) ∗G(x, y),

where σ controls the degree of smoothing of the image.

First order finite difference approximations are used to compute the gradient of s(x, y), which is written as [sx, sy], where

sx = ∂s/∂x,   sy = ∂s/∂y.

The gradient magnitude and orientation (angle) are respectively computed by

M(x, y) = √(sx² + sy²)

and

α(x, y) = tan⁻¹[sy/sx].

The array of image magnitudes contains large values in the directions of greatest change. The array is then thinned so that only the magnitudes at the points of greatest local change remain; this procedure is called nonmaxima suppression. An example presents this notion. Consider a 3 × 3 grid where 4 possible orientations pass through the center point of the grid: horizontal, vertical, +45 degrees, and −45 degrees. All possible orientations are discretized into these 4 orientations by specifying a range of orientations that quantizes to each. The edge direction is determined by the edge normal, computed from the angle α(x, y) above.

Let dk, k = 1, 2, . . . , n, represent the discrete orientations, where n is the number of orientations. Using the 3 × 3 grid, the nonmaxima suppression scheme at every point (x, y) in α(x, y) can be formulated as Algorithm 2.2, where st(x, y) is the nonmaxima suppressed image.

Algorithm 2.2 Nonmaxima suppression algorithm

1. Find the orientation dk which is closest to α(x, y)

2. Set st(x, y) = 0 if M(x, y) is less than at least one of its two neighbors along dk; otherwise, set st(x, y) = M(x, y).

Finally, a hysteresis thresholding is applied to st(x, y) to reduce falsely detected edges. Two

thresholds are used here and are referred to as a weak (or low) threshold τ1 and a strong (or high)

threshold τ2. Too low of a threshold will retain false positives. Too high of a threshold will remove


correctly detected edges. The double threshold produces two new images written as

stw(x, y) = st(x, y) ≥ τ1

where stw(x, y) denotes the image created due to the weak threshold and

sts(x, y) = st(x, y) ≥ τ2

where sts(x, y) denotes the image created due to the strong threshold. Edges in sts(x, y) are

linked into contours by searching through an 8 pixel neighborhood in stw(x, y) for edges that can

be linked to the end of the current edge. The output of the algorithm is the image of all nonzero

points in stw(x, y) appended to sts(x, y). Canny edge detection was performed using the Matlab

function edge with τ1 = 0.3, τ2 = 0.9 and σ = 1.5. Figure 2.27 shows the Canny edge detector

for multiple galaxy images.
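The corresponding Matlab call is a one-line sketch with the parameters quoted above (f is a grayscale galaxy image):

% Canny edge detection with hysteresis thresholds and sigma = 1.5.
e = edge(f, 'canny', [0.3 0.9], 1.5); % [tau1 tau2], Gaussian sigma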

2.6 Data Mining and Classification

The canonical problem addressed in the field of data mining and classification is the following: given a very large family of vectors (signals, images, etc.), each of which lives in a high dimensional space, how can this data be effectively represented for storage and retrieval, for recognizing patterns within the images, and for classifying objects? In the subsequent sections, a small subset of the tools used in statistics, data mining, and machine learning in astronomy will be investigated to address the posed problem of the representation and classification of galaxy images.

2.6.1 Feature Extraction

A useful galaxy feature descriptor varies in value so that a classifier can discriminate between input galaxies and place each galaxy into one of several classes. The shape, or morphological, features used in this work are described in [26, 31, 137].


(a) NGC 6070 original (b) NGC 6070 canny edge

(c) NGC 4460 original (d) NGC 4460 canny edge

(e) NGC 4283 original (f) NGC 4283 canny edge

Figure 2.27: Canny edge detection.


They are Elongation (E), Form Factor (F), Convexity (C), Bounding-rectangle-to-fill-factor (BFF), Bounding-rectangle-to-perimeter (BP), and Asymmetry Index (AI). Table 2.2 gives the average values of the original data for these features.

Elongation has higher values for spiral and lenticular galaxies and lower values for irregular

and elliptical galaxies. This feature can be written as

E = (a − b)/(a + b),

where a is the major axis and b is the minor axis.

Form factor is useful in dividing spiral galaxies from other classes. This feature can be written as

F = A/P²,

where A is the number of pixels in the galaxy and P is the number of pixels in the galaxy edge found by Canny edge detection.

Convexity has larger values for spirals with open winding arms and lower values for compact galaxies such as those in the elliptical class. This feature can be written as

C = P/(2H + 2W),

where P is as defined above and H and W are the height and width of the minimum bounding rectangle of the galaxy.

Bounding-rectangle-to-fill-factor measures how completely the galaxy fills its minimum bounding rectangle, with higher values for compact elliptical and lenticular galaxies (Table 2.2). This feature is defined as

BFF = A/(HW),

where A, H, and W are as defined above.

Bounding-rectangle-to-perimeter shows a decreasing trend from compact and circular galaxies


Table 2.1: Morphological Feature Descriptions

Feature  Formula
E        (a − b)/(a + b)
F        A/P²
C        P/(2H + 2W)
BFF      A/(HW)
BP       HW/(2H + 2W)²
AI       Σ_{i,j} |I(i,j) − I180(i,j)| / Σ_{i,j} |I(i,j)|

Table 2.2: Feature Values Per Class

Feature Elliptical Lenticular Simple Spiral Barred Spiral Irregular

E 0.071 0.382 0.547 0.485 0.214

F 0.059 0.049 0.025 0.029 0.044

C 0.888 0.872 1.05 1.01 0.953

BFF 0.744 0.699 0.609 0.583 0.634

BP 0.062 0.052 0.043 0.048 0.059

AI 0.274 0.375 0.510 0.464 0.354

to open and edge-on galaxies. This feature can be written as

BP = HW/(2H + 2W)²,

where H and W are as defined above.

The asymmetry index tends towards zero when the image is invariant under a 180 degree rota-

tion. This feature can be written as

AI = Σ_{i,j} |I(i, j) − I180(i, j)| / Σ_{i,j} |I(i, j)|,

where I is the original image and I180 is the image rotated by 180 degrees.
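A Matlab sketch of the six descriptors, assuming bw is the segmented binary mask, e its Canny edge map, and fd the intensity image as a double array (variable names are illustrative):

% Six morphological features from the mask bw and edge map e.
props = regionprops(bw, 'MajorAxisLength', 'MinorAxisLength', 'BoundingBox');
a = props(1).MajorAxisLength;  b = props(1).MinorAxisLength;
W = props(1).BoundingBox(3);   H = props(1).BoundingBox(4);
A = nnz(bw);  P = nnz(e);             % galaxy area and edge pixel counts
E   = (a - b) / (a + b);              % elongation
F   = A / P^2;                        % form factor
C   = P / (2*H + 2*W);                % convexity
BFF = A / (H*W);                      % bounding-rectangle-to-fill-factor
BP  = (H*W) / (2*H + 2*W)^2;          % bounding-rectangle-to-perimeter
I180 = rot90(fd, 2);                  % image rotated by 180 degrees
AI  = sum(abs(fd(:) - I180(:))) / sum(abs(fd(:)));  % asymmetry index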

2.6.2 Principal Component Analysis

Data may be highly correlated yet represented on axes that are not aligned with the directions in which the data varies the most. A data set generated by N observations with K measurements


per observation lives in a K-dimensional space, each dimension, or axis, representing a feature of

the data. To represent the data in a more compact form, the axes can be rotated to be collinear with

the directions of maximum variance in the data, thereby discriminating between the data points. In

other words, this rotation results in the first feature being collinear with the direction of maximum

variance, the second feature being orthogonal to the first and maximizing the residual variance,

and so on. This dimensionality reduction technique is called Principal Component Analysis (PCA),

also known as the Karhunen-Loéve transform or Hotelling transform, and is depicted in Figure 2.28

for a bivariate Gaussian distribution. Consider the data set xi with N observations and K features

Figure 2.28: PCA rotation of axes for a bivariate Gaussian distribution.

written as the N × K matrix X. The covariance matrix of zero-mean data is estimated as

CX = (1/(N − 1)) Xᵀ X,


where N is the number of observations and division by N − 1 is necessary for CX to be an unbiased estimate of the covariance matrix. Nonzero off-diagonal entries represent correlation between the features, whereas zero entries represent uncorrelated data. PCA transforms the original data into equivalent uncorrelated data, so that the covariance matrix of the new data is diagonal with the diagonal entries decreasing from top to bottom. To achieve this, PCA attempts to find a nonsingular matrix R which transforms X into such an ideal matrix. The data transforms to Y = XR and its covariance estimate to

CY = (1/(N − 1)) Rᵀ Xᵀ X R = Rᵀ CX R.

The first column r1 of R is the first principal component and lies along the direction of maximum variance in the data. The columns of R, which are called principal components, form an orthonormal basis of the data space. The first principal component r1 can therefore be derived using Lagrange multipliers and the cost function

φ(r1, λ1) = r1ᵀ CX r1 − λ1(r1ᵀ r1 − 1).

Setting ∂φ(r1, λ1)/∂r1 = 0 then gives

CX r1 − λ1 r1 = 0, or CX r1 = λ1 r1.

This shows that λ1 is an eigenvalue of the covariance matrix CX, i.e., a root of det(CX − λ1 I) = 0. Taking λ1 = r1ᵀ CX r1 to be the largest eigenvalue of CX equates to maximizing the variance along the first principal component. The remaining principal components are derived in the same manner.

The matrix CY is the transformation of CX in the basis consisting of the columns of R, the eigenvectors of CX. Since CX is symmetric by definition, the Spectral Theorem guarantees that the eigenvectors of CX are orthogonal. These eigenvectors can be listed in any order and CY will remain diagonal. However, PCA requires listing them such that the diagonal entries of CY are in decreasing order, which fixes a unique order of the eigenvectors that make up the columns of R. This ordering of the components (or dimensions) is the so-called rank-order according to variance. With CX = R CY Rᵀ and the eigenvectors in this order, the set of principal components is defined.

The morphological feature data described in Section 2.6.1 was reduced in dimension from 6 to 2 by keeping the first two principal components, both to compare classification performance with compressed data and for visualization. All classification figures in the following sections were generated from the classification of PCA features.
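A sketch of this reduction for the N × 6 feature matrix X (X, R, and Y follow the notation above):

% PCA: rank-order the eigenvectors of C_X and keep two components.
Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % zero-mean features
Cx = (Xc' * Xc) / (size(X, 1) - 1);           % covariance estimate C_X
[R, D] = eig(Cx);
[~, idx] = sort(diag(D), 'descend');          % rank-order by variance
R = R(:, idx);                                % ordered principal components
Y = Xc * R(:, 1:2);                           % first two PCA features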

2.6.3 Support Vector Machines

The Support Vector Machine (SVM) learning algorithm captures the structure of a multi-class training data set in order to predict the class membership of unknown data correctly and with high decision confidence. Classes are divided by a decision boundary, or hyperplane; the minimum distance between the boundary and the nearest point in each class defines the margins of the boundary, which the SVM maximizes. Points that lie on the margin are called support vectors.

Consider a linear classifier for a binary classification problem with labels y, y ∈ {−1, 1}, and

features x. The classifier is written as

hw,b(x) = g(wT x + b),

and

g(z) = 1, if z ≥ 0;  −1, otherwise,


where w is the weight vector and b is the bias of the hyperplane. Given a training example (x(i), y(i)), the functional margin of (w, b) with respect to the training example is defined as

γ̂(i) = y(i)(wᵀ x(i) + b).

If y(i) = 1, then wᵀ x(i) + b needs to be a large positive number for a large functional margin; conversely, if y(i) = −1, then wᵀ x(i) + b needs to be a large negative number. A large functional margin represents a confident and correct prediction.

With the chosen g, if w and b are scaled by 2, the functional margin is scaled by a factor of 2. However, since g(wᵀx + b) = g(2wᵀx + 2b), no change occurs in hw,b(x). This shows that hw,b(x) depends only on the sign, and not the magnitude, of wᵀx + b.

Given a training set S = {(x(i), y(i)); i = 1, 2, . . . , m}, the functional margin of (w, b) with

respect to S is defined as the smallest functional margin of the individual training examples and is

written as

γ̂ = min_{i=1,...,m} γ̂(i).

Another type of margin is the geometric margin. Consider the training set in Figure 2.29. The hyperplane defined by (w, b) is shown, along with the vector w, which is normal to the hyperplane. Point A represents a positive training example x(i) with label y(i) = 1. The geometric margin of point A, γ(i), is the length of the line segment AB. Point B is defined by x(i) − γ(i) w/||w||. Since point B is on the decision boundary, which satisfies the equation wᵀx + b = 0, then

wᵀ(x(i) − γ(i) w/||w||) + b = 0.

Solving for γ(i) yields

γ(i) = (wᵀ x(i) + b)/||w|| = (w/||w||)ᵀ x(i) + b/||w||.

In general, the geometric margin of (w, b) with respect to any training example (x(i), y(i)) is given



Figure 2.29: Pictorial representation of the development of the geometric margin.

by

γ(i) = y(i)((w/||w||)ᵀ x(i) + b/||w||).

Note that if ||w|| = 1, then the geometric margin equals the functional margin. Additionally, the

geometric margin is invariant to scaling the parameters w and b.

Given a training set S = {(x(i), y(i)); i = 1, 2, . . . , m}, the geometric margin of (w, b) with

respect to S is defined as the smallest geometric margin of the individual training examples and is

written as

γ = min_{i=1,...,m} γ(i).

Assuming the training data is linearly separable, the problem of determining the decision boundary that maximizes the geometric margin is posed as the following optimization problem:

max_{γ,w,b} γ  subject to  y(i)(wᵀx(i) + b) ≥ γ, i = 1, 2, . . . , m, and ||w|| = 1.

The ||w|| = 1 constraint is non-convex. To work towards recasting the optimization problem as convex, first recall that γ = γ̂/||w||. With this relation, the problem can then be written as an



Figure 2.30: Maximum geometric margin.

optimization of the functional margin that achieves the geometric margin optimization:

max_{γ̂,w,b} γ̂/||w||  subject to  y(i)(wᵀx(i) + b) ≥ γ̂, i = 1, 2, . . . , m.

Again, the objective function γ̂/||w|| is non-convex, and the problem cannot be solved by standard optimization software.

Recall that w and b can be scaled without affecting the decision of the classifier. The scaling constraint that the functional margin of (w, b) with respect to the training set must be 1 is introduced, γ̂ = 1. Then γ̂/||w|| becomes 1/||w||, and since maximizing 1/||w|| is equivalent to minimizing ||w||, the geometric margin convex optimization problem is posed as

min_{w,b} (1/2)||w||²  subject to  y(i)(wᵀx(i) + b) ≥ 1, i = 1, 2, . . . , m,

which can be solved by commercial quadratic programming (QP) code. Figure 2.30 illustrates the maximum geometric margin for a training set.

Whereas the previous problem is referred to as the primal form, optimization theory tells of


a dual form for expressing the primal problem. Constructing the Lagrangian for the optimization

problem gives

L(w, b, α) = (1/2)||w||² − Σ_{i=1}^{m} αi [y(i)(wᵀ x(i) + b) − 1]. (2.5)

To find the dual form of the problem, L(w, b, α) is minimized with respect to w and b for fixed α. Setting the derivative of L with respect to w to zero gives

∇w L(w, b, α) = w − Σ_{i=1}^{m} αi y(i) x(i) = 0,

which implies that

w = Σ_{i=1}^{m} αi y(i) x(i). (2.6)

Taking the derivative with respect to b gives

(∂/∂b) L(w, b, α) = Σ_{i=1}^{m} αi y(i) = 0. (2.7)

Substituting the definition of w in (2.6) into the Lagrangian in (2.5) yields

L(w, b, α) = Σ_{i=1}^{m} αi − (1/2) Σ_{i,j=1}^{m} y(i) y(j) αi αj (x(i))ᵀ x(j) − b Σ_{i=1}^{m} αi y(i),

but from (2.7) the last term is equal to zero, which gives

L(w, b, α) = Σ_{i=1}^{m} αi − (1/2) Σ_{i,j=1}^{m} y(i) y(j) αi αj (x(i))ᵀ x(j).

From this result, along with the constraints αi ≥ 0 and (2.7), the following dual optimization problem is obtained:

max_α W(α) = Σ_{i=1}^{m} αi − (1/2) Σ_{i,j=1}^{m} y(i) y(j) αi αj ⟨x(i), x(j)⟩

subject to αi ≥ 0, i = 1, 2, . . . , m, and Σ_{i=1}^{m} αi y(i) = 0.


Suppose the model's parameters have been fit to a training set. The task now is to predict the class membership of a new input point x by calculating wᵀx + b and, if this quantity is greater than zero, predicting y = 1. Using the expression for w in (2.6), this calculation can be written as

wᵀx + b = (Σ_{i=1}^{m} αi y(i) x(i))ᵀ x + b (2.8)
        = Σ_{i=1}^{m} αi y(i) ⟨x(i), x⟩ + b, (2.9)

where the points x(i) for which αi ≠ 0 are the support vectors.

So far, the data has been assumed to be linearly separable. In application, this assumption is relaxed by the introduction of slack variables ξi, leading to the primal minimization formulation

min_{w,b} (1/2)||w||²  subject to  y(i)(wᵀx(i) + b) ≥ 1 − ξi, i = 1, 2, . . . , m,

with the following constraints limiting the amount of slack:

ξi ≥ 0 and Σ_i ξi ≤ C.

Therefore, the amount of misclassification is bounded by C.

Finally, the SVM optimization is equivalent to minimizing

Σ_{i=1}^{m} (1 − y(i) g(x(i)))₊ + λ||w||², (2.10)

where λ is related to the misclassification bound C and the index + indicates x₊ = max(0, x).

Figure 2.31 shows the SVM decision boundary computed for the data of 15 galaxies with class membership in either class Irregular or Regular. The SVM maps data from the input space Υ to a feature space F using a nonlinear map φ : Υ → F, whose inner product defines a kernel, so that the discriminant

Page 87: JJenkinson_Thesis

5 10 15 20 25 30−3

−2

−1

0

1

2

3

4

5

6

I (training)I (classified)R (training)R (classified)Support Vectors

Figure 2.31: SVM applied to galaxy data.

function becomes

hw,b(x) = wᵀ φ(x) + b. (2.11)

Many kernel functions are possible, and the present work has used the quadratic kernel (d = 2)

K(x, x′) = (xᵀ x′ + 1)^d. (2.12)
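A sketch of one train/validate iteration with the Statistics Toolbox routines named in Section 2.7; Xtrain, ytrain, Xtest, and ytest are hypothetical feature matrices and numeric label vectors:

% Binary SVM with the quadratic kernel, then single-fold validation.
model = svmtrain(Xtrain, ytrain, 'kernel_function', 'quadratic');
ypred = svmclassify(model, Xtest);
acc   = mean(ypred == ytest);         % validation accuracy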

2.7 Results and Discussion

The galaxy data used in this classification is listed in Table 2.3. The name of each galaxy is given

along with its corresponding classification obtained from the NASA/IPAC Extragalactic Database

(NED) and the relation between the NED classification and the scheme used in the present work.

Only the major galaxy classes Elliptical "E," Lenticular "S0," Spiral "S," Barred Spiral "SB," and


Irregular "Irr" were used in classification. All subclasses listed in the table below such as Sa, Sd,

SBm, etc... were, in the SVM training and validation, generalized to belong to their respective

major class. Galaxy NGC 4457 has NED classification SAB0/a(s), which is interpreted as either

S0 or SBa for compliance with the present scheme, and was judicially assigned to class barred

spiral (SB) due to similarities between the feature values of NGC 4457 and the SB class. Galaxy

NGC 4144 has NED classification SAB(s)cd? edge-on and was not used in classification since a

definite relation to the present classification scheme was unable to be determined.

Table 2.3: Galaxy list and relation between NED classification and current project classification

Galaxy name N.E.D. Class Present Work Class

NGC 4278 E1-2 E

NGC 4283 E0 E

NGC 4308 E? E

NGC 5813 E1-2 E

NGC 5831 E3 E

NGC 5846 E0-1 E

NGC 5846A compact E2+ E

NGC 4346 S0 edge-on S0

NGC 4460 SB0ˆ+(s)? edge-on S0

NGC 4251 SB0? edge-on S0

NGC 4220 SA0ˆ+(r) S0

NGC 4346 S0 edge-on S0

NGC 4324 SA0ˆ+(r) S0

NGC 5854 SB0 S0

NGC 5838 SA0ˆ- S0

NGC 5839 SAB0ˆ0?(rs) S0

NGC 5864 SB0ˆ0(s)? edge-on S0


Table 2.3: Continued

NGC 5865 SAB0ˆ- S0

NGC 5868 SAB0ˆ- S0

NGC 4310 SAB0ˆ+(r) S0

NGC 4218 Sa? Sa

NGC 4217 Sb edge-on Sb

NGC 4100 SA(rs)bc Sb/Sc

UGC 10288 Sc: edge-on Sc

NGC 6070 SA(s)cd Sc/Sd

UGC 07617 Sd Sd

NGC 4457 SAB0/a(s) (S0)/SBa

NGC 4314 SB(rs)a SBa

NGC 4274 SB(r)ab SBa/SBb

NGC 4448 SB(r)ab SBa/SBb

NGC 4157 SAB(s)b? edge-on SBb

NGC 5850 SB(r)b SBb

NGC 5806 SAB(s)b SBb

NGC 4232 SBb pec? SBb

NGC 4088 SAB(rs)bc SBb/SBc

NGC 4258 (Messier 106) SAB(s)bc SBb/SBc

NGC 4527 SAB(s)bc SBb/SBc

NGC 4389 SB(rs)bc pec? SBb/SBc

NGC 4496 SBc SBc

NGC 4085 SAB(s)c SBc

NGC 4096 SAB(rs)c SBc

NGC 4480 SAB(s)c SBc


Table 2.3: Continued

UGC 10133 SAB(r)c SBc

NGC 4559 SAB(rs)cd SBc/SBd

NGC 4242 SAB(s)dm SBd

NGC 4393 SABd SBd

NGC 4288 SB(s)dm SBd/SBm

NGC 3985 SB(s)m SBm

NGC 4449 IBm Irr

UGC 07408 IAm Irr

UGC 07577 Im Irr

UGC 07639 Im Irr

UGC 07690 Im Irr

NGC 4496B IB(s)m Irr

NGC 4144 SAB(s)cd? edge-on not used

The classification scheme used in this project is a subset of Hubble's classification scheme; galaxies are assigned to 1 of the 5 major classes: Elliptical "E," Lenticular "S0," Spiral "S," Barred Spiral "SB," and Irregular "Irr." Classification was performed two classes at a time using Support Vector Machines (SVM) in Matlab's Statistics Toolbox with both a linear and a quadratic kernel and default parameters. The Matlab functions svmtrain and svmclassify were used to train the classifiers and to perform validation, respectively. The class pairs used at each iteration are shown in Figure 2.32. The idea was to iteratively perform classification between the whole remaining set and a single class, removing the classified set from the remaining whole in the next iteration. The training and validation sets were separated such that approximately one third of the data was used for validation while the remainder was used for training. The extracted feature data was listed in a spreadsheet and sorted by class from elliptical to lenticular through barred spirals and


[Iteration 1: Irregular vs. Regular; Iteration 2: Elliptical vs. Not Elliptical; Iteration 3: Lenticular vs. Spiral; Iteration 4: Simple Spiral vs. Barred Spiral.]

Figure 2.32: Classification iteration class pairs.

irregular. The bottom one third of each class was reserved for validation, while the top two thirds were used for training. This process was a single-fold validation.

For Iteration 1, galaxies were assigned to the Irregular or Regular class. Training was per-

formed on a set of 40 galaxies consisting of 5 irregular and 35 regular, with class membership

ranging from elliptical to spiral and barred spiral. All 6 morphic features were used in the training.

The validation set contained 15 galaxies: 1 irregular and 14 regular. Of the validation set, 7/15

galaxies were classified correctly giving an accuracy of 46.6667%. Principal component analysis

(PCA) was applied to the training and validation sets, and the data was projected onto the first two

principal components. Using the reduced data as input the SVM yielded a classification accuracy

of 13.3333%. Using the quadratic kernel for the SVM classification yielded 13/15 (86.6667%) and

12/15 (80%) accuracy for 6 and 2 features, respectively. Figure 2.33 shows classification in the

PCA feature space for each kernel. The legend indicates the symbols Irregular (I) and Regular (R).

For all subsequent classification of un-enhanced galaxy images the irregular class was removed

from the training and validation sets.

The next pair of classes used for SVM training is Elliptical and Not Elliptical. The label vector


(a) linear kernel  (b) quadratic kernel

Figure 2.33: PCA feature space iteration 1 classification.

used was binary with entries 1 for Elliptical and 0 for Not Elliptical. The training set consisted of

34 galaxies: 5 elliptical and 29 galaxies belonging to classes lenticular, spiral and barred spiral.

The validation set contained 15 galaxies: 2 elliptical and 13 others. All 6 morphic features were

used. Classification accuracy was 13/15 (86.6667%). PCA was applied to the data set and the data

was projected onto the first two principal components. Classification in the reduced feature space

was 12/15 correctly classified galaxies or 80% accuracy. Using the quadratic kernel for the SVM

classification yielded 3/15 (20%) accuracy for both sets of 6 and 2 features. Figure 2.34 shows

classification in the PCA feature space for each kernel. The legend indicates the symbols Elliptical

(1) and Not Elliptical (0).

Elliptical galaxies were then removed from the training and test sets for all subsequent classi-

fication of un-enhanced galaxy images.

Lenticular and Spiral are the next two classes to be trained by the SVM. The training set con-

sisted of 9 lenticular galaxies and 20 spiral galaxies, while the test set consisted of 4 lenticular and

9 spiral galaxies. All 6 morphic features were used. Classification accuracy was 11/13 (84.6154%)

and 8/13 (61.5385%) with the linear and quadratic kernels, respectively. Classification accuracy of

the two PCA features with the linear and quadratic kernels respectively was 9/13 (69.2308%) and

3/13 (23.0769%). Figure 2.35 shows classification in the PCA feature space for each kernel. The


(a) linear kernel  (b) quadratic kernel

Figure 2.34: PCA feature space iteration 2 classification.

legend indicates the symbols Lenticular (1) and Spiral (0). Lenticular galaxies were then removed

(a) linear kernel  (b) quadratic kernel

Figure 2.35: PCA feature space iteration 3 classification.

from the training and test sets for all subsequent classification of un-enhanced galaxy images.

The final categories to be trained by SVM are Simple Spiral, referred to as spiral, and Barred Spiral. The training set contained 5 simple spirals and 15 barred spirals. Validation was performed with 2 simple and 7 barred spirals. All 6 morphic features were used. The SVM classified 7/9 galaxies correctly, giving 77.7778% accuracy. After PCA, 2/9 galaxies were classified

sified 7/9 galaxies correctly giving 77.7778% accuracy. After PCA, 2/9 galaxies were classified

correctly giving 22.2222% accuracy. Using the quadratic kernel for the SVM classification yielded


8/9 (88.8889%) and 2/9 (22.2222%) accuracy for 6 and 2 features, respectively. Figure 2.36 shows

classification in the PCA feature space for each kernel. The legend indicates the symbols Simple

Spiral (1) and Barred Spiral (0). Classification was then performed for the Heap transform en-

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

0 (training)0 (classified)1 (training)1 (classified)Support Vectors

(a) linear kernel

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

0 (training)0 (classified)1 (training)1 (classified)Support Vectors

(b) quadratic kernel

Figure 2.36: PCA feature space iteration 4 classification.

hanced galaxy image set. Table 2.4 summarizes the classification results for both the original and

enhanced data, 6 and 2 features, and linear and quadratic kernels. Figures 2.37, 2.38, 2.39 and 2.40 show the classification results in the PCA feature space for the enhanced data set. The total classification accuracy, averaged over the four class pairs, both kernels and both feature sets, was 51.570% for the original data and 64.494% for the enhanced data, an overall improvement in classification performance of 12.924% due to galaxy image enhancement.


Classification Results

Linear Kernel
  Original Data               6 Features          2 PCA Features
  Irregular/Regular           7/15 (46.6667%)     2/15 (13.3333%)
  Elliptical/Not Elliptical   13/15 (86.6667%)    3/15 (20%)
  Lenticular/Spiral           11/13 (84.6154%)    9/13 (69.2308%)
  Spiral/Barred Spiral        7/9 (77.7778%)      2/9 (22.2222%)

  Enhanced Data               6 Features          2 PCA Features
  Irregular/Regular           4/15 (26.6667%)     2/15 (13.3333%)
  Elliptical/Not Elliptical   11/15 (73.3333%)    10/15 (66.6667%)
  Lenticular/Spiral           11/13 (84.6154%)    9/13 (69.2308%)
  Spiral/Barred Spiral        8/9 (88.8889%)      7/9 (77.7778%)

Quadratic Kernel
  Original Data               6 Features          2 PCA Features
  Irregular/Regular           13/15 (86.6667%)    12/15 (80%)
  Elliptical/Not Elliptical   10/15 (66.6667%)    3/15 (20%)
  Lenticular/Spiral           8/13 (61.5385%)     3/13 (23.0769%)
  Spiral/Barred Spiral        4/9 (44.4444%)      2/9 (22.2222%)

  Enhanced Data               6 Features          2 PCA Features
  Irregular/Regular           12/15 (80%)         0/15 (0%)
  Elliptical/Not Elliptical   12/15 (80%)         13/15 (86.6667%)
  Lenticular/Spiral           11/13 (84.6154%)    9/13 (69.2308%)
  Spiral/Barred Spiral        6/9 (66.6667%)      6/9 (66.6667%)

Table 2.4: Summary of classification results for original and enhanced data. Accuracy improved by 12.924% due to enhancement.
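The total accuracies quoted above are the means of the sixteen per-experiment accuracies in Table 2.4. A minimal MATLAB check for the original data (the enhanced value is obtained the same way from the enhanced rows):

acc_original = [7/15 2/15 13/15 3/15 11/13 9/13 7/9 2/9 ...  % linear kernel
    13/15 12/15 10/15 3/15 8/13 3/13 4/9 2/9];               % quadratic kernel
total_accuracy = 100*mean(acc_original);   % approximately 51.570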

[Figure 2.37: PCA feature space iteration 1 classification of enhanced data. (a) linear kernel; (b) quadratic kernel. Legend: I (training), I (classified), R (training), R (classified), Support Vectors.]

[Figure 2.38: PCA feature space iteration 2 classification of enhanced data. (a) linear kernel; (b) quadratic kernel.]

[Figure 2.39: PCA feature space iteration 3 classification of enhanced data. (a) linear kernel; (b) quadratic kernel.]

[Figure 2.40: PCA feature space iteration 4 classification of enhanced data. (a) linear kernel; (b) quadratic kernel.]

2.8 Future Work

Several directions remain for future work: improve the segmentation scheme to capture the shape of the galaxies more accurately; extend the classification scheme to include the classes Sa, Sb, Sc, SBa, SBb, SBc, SBd and SBm and the elliptical subclasses E0, . . . , E7; use a sparse dictionary to perform classification of the image data; download a data set from the CDS Strasbourg to increase the size of the training and validation sets; perform 5-fold and 10-fold cross-validation for classification; implement the classification procedures in Python; and develop a graphical user interface for user-driven or automated classification software.


Appendix A: PROJECT SOFTWARE

A list and brief descriptions of the MATLAB scripts used in this work are given below.

• galaxy_processing.m: preprocessing and feature extraction as delineated in sections 2.5 and 2.6.1 (the six morphic features it produces can be collected as sketched after this list).

• centroid.m: calculates the center of brightness of the galaxy image, used for shifting the image by the centroid.

• galaxy_shift.m: shifts the galaxy image so that the center of brightness and the image center are coincident.

• secondmoment.m: calculates the second moments of the galaxy image and the angle between the first second moment and the vertical axis of the image.

• calculateEllipse.m: calculates and plots an ellipse defined by the centroid, ellipse axes and angle of rotation on the galaxy image.

• classification_Irr_Reg.m: original data classification for the classes irregular and regular (generated Figure 2.33).

• classification_E_NE.m: original data classification for the classes elliptical and not elliptical (generated Figure 2.34).

• classification_S0_S.m: original data classification for the classes lenticular and spiral (generated Figure 2.35).

• classification_S_SB.m: original data classification for the classes simple spiral and barred spiral (generated Figure 2.36).

• heap_classification_Irr_Reg.m: enhanced data classification for the classes irregular and regular (generated Figure 2.37).

• heap_classification_E_NE.m: enhanced data classification for the classes elliptical and not elliptical (generated Figure 2.38).

• heap_classification_S0_S.m: enhanced data classification for the classes lenticular and spiral (generated Figure 2.39).

• heap_classification_S_SB.m: enhanced data classification for the classes simple spiral and barred spiral (generated Figure 2.40).
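A minimal sketch (not one of the scripts above, and the column order is an assumption) of how the six morphic features computed by galaxy_processing.m can be collected into one row of the training and test matrices used in Appendix A.2:

features = [Elongation Formfactor Convexity BFF ...
    Bounding_rectangle_to_perimeter Asymmetry_index]; % one 1x6 row per galaxy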


A.1 Preprocessing and Feature Extraction codes

% call: galaxy_processing.m

%

% Background subtraction by thresholding. Threshold is determined

% by either manual inspection of threshold image iterations of the

% histogram levels or Otsu's method. Star/object removal by morphological

% opening. Shift image so galaxy centroid and image center are coincident.

% Galaxy rotation by angle between 2nd moment and vertical image axis.

% Crop and resize image to 128x128. Edge detection and calculate best fit

% ellipse for use in feature extraction by 6 morphological features.

%% Read image in

A=imread('AC8431_NGC3985.tif');

[N M L]=size(A);

A=A(:,:,1);

%% Find best threshold

H=imhist(A,65535); %imhist(A,65535) for uint16 images

figure;

subplot(2,2,1)

imshow(A)

subplot(2,2,[3,4])

plot(H)

% x1=1*10^4; % for uint16 images

for i=50:200

subplot(2,2,[3,4])

hold on;

T=i;

xx=[T T];

yy=[0 H(T)];

hline=line(xx,yy);

set(hline,'Color',[1 0 0]);

htext=text(T-5, H(T),'T');

set(htext,'Color',[1 0 0]);


Ab=(A>T);

subplot(2,2,2)

imshow(Ab)

ss=sprintf('Thresholding by %g',T);

stitle=title(ss);

pause(.1)

delete(htext); delete(hline);

end

% Thresholding

bw=im2bw(A,19200/65535);

bw=1-bw;

bw2=bwareaopen(bw,256);
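% Note: the header comment above mentions Otsu's method as an
% alternative to the manually chosen threshold; a hedged sketch using
% MATLAB's graythresh would be
%   T_otsu=graythresh(A);    % normalized Otsu threshold in [0,1]
%   bw=1-im2bw(A,T_otsu);    % same inversion as above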

% cc=bwconncomp(bw2); %use for more than 1 object

% L=labelmatrix(cc);

% L(L~=2)=0;

% L=double(L);

% imshow(L,[])

% X=double(A);

% g=L.*X;

imshow(bw2,[]); %colormap(gray(65535))

X=double(A);

g=bw2.*X;

imshow(g,[]); %colormap(gray(65535))

%% Shifting image by centroid

[xc,yc]=centroid(g,1);

Y=galaxy_shift(g,xc,yc);

%% Rotate image by angle defined by 2nd moments

[m11,m20,m02]=secondmoment(g);

theta=(1/2)*atan2(2*m11,m20-m02);

alpha=theta*(180/pi);

gr=imrotate(g,alpha); % rotate by the moment angle alpha (in degrees)

imshow(gr,[])


% Crop galaxy

% reduce size of rotated galaxy by size(gr)/n, n=1,2,...

% use the reduced size to compose a new image I which contains

% the galaxy.

I=imcrop(gr,[102 214 129 125]);

gs=imresize(I,[128 128]);

imshow(gs,[])

[N M L]=size(gs);

%% Calculating morphics features

bs=im2bw(gs);

p=regionprops(bs,'all');

p=p(1);

xc=p.Centroid(1);

yc=p.Centroid(2);

a=p.MajorAxisLength/2;

b=p.MinorAxisLength/2;

BBox=round(p.BoundingBox);

[X,Y]=calculateEllipse(p.Centroid(1),p.Centroid(2),a,b,0);

% Edge detection

[gCanny, gt]=edge(gs,'canny',[0.3 .9], 0.5);

imshow(gCanny)

[gy,gx]=find(gCanny>0); % row/column subscripts of the edge pixels

figure;

imshow(gs,[]); hold on;

plot(X,Y,'b*');

rectangle('Position',p.BoundingBox,'EdgeColor','r')

plot(gx,gy,'g.'); % overlay the edge pixels on the image

% Elongation: (a-b)/(b+a).

Elongation=(a-b)/(b+a)

% Form Factor: ratio of the area of the galaxy

% (number of pixels in the galaxy) to its perimeter

% (number of pixels in canny edge detection).


numpixels_galaxy=0;

for n=1:N

for m=1:M

if(gs(n,m)~=0)

numpixels_galaxy=numpixels_galaxy+1;

end

end

end

numpixels_perimeter=numel(find(gCanny>0));

Formfactor=numpixels_galaxy/numpixels_perimeter

% Convexity: ratio of the galaxy perimeter to the

% perimeter of the minimum bounding rectangle.

% imshow(A) %show bounding rectangle superimposed on galaxy.

% rectangle('position',[xmin ymin width height],'EdgeColor','r');

rectangle_perimeter=2*BBox(3)+2*BBox(4);

Convexity=numpixels_perimeter/rectangle_perimeter

%Bounding-rectangle-to-fill-factor (BFF): area of the bounding rectangle

%to the number of pixels within the rectangle.

rectangle_area=BBox(3)*BBox(4);

L1=BBox(1);

W1=BBox(2);

L=BBox(1)+BBox(3);

W=BBox(2)+BBox(4);

numpixels_bounding_box=0;

for n=L1:L

for m=W1:W

numpixels_bounding_box=numpixels_bounding_box+1;

end

end

BFF=rectangle_area/numpixels_bounding_box

% Bounding-rectangle-to-perimeter: area of the bounding rectangle

% to the number of pixels included in the perimeter.


Bounding_rectangle_to_perimeter=rectangle_area/rectangle_perimeter

% Asymmetry index: taking the difference between the galaxy image

% and the same image rotated 180 degrees about the center of the galaxy.

% The sum of the absolute value of the pixels in the difference image

% is divided by the sum of pixels in the original image to give the

% asymmetry parameter.

gs_rotated=imrotate(gs,180);

difference_image=gs-gs_rotated;

Asymmetry_index=sum(sum(abs(difference_image)))/sum(sum(gs))
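% In symbols, the asymmetry index computed above is
%   A = sum_{n,m} |gs(n,m) - gs_rotated(n,m)| / sum_{n,m} gs(n,m),
% the absolute difference between the image and its 180-degree rotation,
% normalized by the total image intensity.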

%===============================================================
% call: centroid.m

%

% calculate the first moment of an image. centroid(X,I) calculates

% the centroid for binary or grayscale image X. If X is binary, I=0.

% If X is intensity image, I=1.

% John Jenkinson, Dr. Artyom Grigoryan, ECE UTSA 2014.

function[xc,yc]=centroid(X,I)

[N M L]=size(X);

X=double(X(:,:,1));

xbar=0; ybar=0;

for n=1:N

for m=1:M

a=X(n,m);

xbar = xbar + n*a;

ybar = ybar + m*a;

end

end

if(I==1)
    ss=sum(X(:)); %faster than sum(sum(X)) for type double
elseif(I==0)
    ss=N*M;
end

xc=round(xbar/ss); yc=round(ybar/ss);

end
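% Note: centroid returns the first image moments; with S the total
% intensity sum(X(:)) (or N*M for a binary image),
%   xc = round( (1/S) * sum_{n,m} n*X(n,m) ),
%   yc = round( (1/S) * sum_{n,m} m*X(n,m) ).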

%===============================================================
% call: galaxy_shift.m

%

% Shift the center of brightness to the image center.

% John Jenkinson, ECE UTSA 2014.

function[Y]=galaxy_shift(g,xc,yc)
[N,M,L]=size(g);
Y=zeros(N,M);
dy=round(N/2-yc); % vertical offset that centers the brightness centroid
dx=round(M/2-xc); % horizontal offset
% Copy g into Y with offset (dy,dx), clipping at the image borders;
% all sign combinations, including a zero offset, are handled.
rows=max(1,1-dy):min(N,N-dy);
cols=max(1,1-dx):min(M,M-dx);
Y(rows+dy,cols+dx)=g(rows,cols);
end

%===============================================================
% call: secondmoment.m
%
% Computes the second moments m11, m20 and m02 of a galaxy image
% A(n,m) of size NxM, normalized by the total intensity (or by N*M
% for a binary image).
% by Art Grigoryan, edited by John Jenkinson
function [m11,m20,m02]=secondmoment(A)
[N,M]=size(A);

m11=0;

m20=0;

m02=0;

for n=0:N-1

n1=n+1;

for m=0:M-1

a=A(n1,m+1);

ma=m*a;

na=n*a;

m11=m11+n*ma;

m20=m20+n*na;

m02=m02+m*ma;

end

end

if(islogical(A)==1)

% normalization

ss=N*M;

m11=m11/ss;

m20=m20/ss;

m02=m02/ss;

else

% normalization

ss=sum(sum(A));

m11=round(m11/ss);

m20=round(m20/ss);

m02=round(m02/ss);

end

end
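% Note: the normalized second moments computed here,
%   m11 = (1/S) sum_{n,m} n*m*A(n,m),
%   m20 = (1/S) sum_{n,m} n^2*A(n,m),
%   m02 = (1/S) sum_{n,m} m^2*A(n,m),
% give the rotation angle used in galaxy_processing.m:
%   theta = (1/2)*atan2(2*m11, m20-m02).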

%===============================================================
% call: calculateEllipse.m

%

% calculate points to draw an ellipse


function [X,Y] = calculateEllipse(x, y, a, b, angle, steps)
% x, y  - center coordinates of the ellipse
% a     - semimajor axis
% b     - semiminor axis
% angle - rotation angle of the ellipse (in degrees)
% steps - number of boundary points to generate (optional, default 36)
narginchk(5, 6);
if nargin<6, steps = 36; end

beta = -angle * (pi / 180);

sinbeta = sin(beta);

cosbeta = cos(beta);

alpha = linspace(0, 360, steps)' .* (pi / 180);

sinalpha = sin(alpha);

cosalpha = cos(alpha);

X = x + (a * cosalpha * cosbeta - b * sinalpha * sinbeta);

Y = y + (a * cosalpha * sinbeta + b * sinalpha * cosbeta);

if nargout==1, X = [X Y]; end

end
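% Note: the points above are the standard parametrization of an
% ellipse rotated by beta = -angle*pi/180:
%   X(t) = x + a*cos(t)*cos(beta) - b*sin(t)*sin(beta)
%   Y(t) = y + a*cos(t)*sin(beta) + b*sin(t)*cos(beta),  t in [0,2*pi].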

A.2 SVM Classification codes with data
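A portability note for the scripts in this section (not part of the original code): svmtrain and svmclassify belong to the legacy Statistics Toolbox interface and have been removed from recent MATLAB releases. On such releases, a hedged equivalent of a quadratic-kernel run is

mdl = fitcsvm(training, Y, 'KernelFunction', 'polynomial', 'PolynomialOrder', 2);
group = predict(mdl, test);

with 'KernelFunction','linear' (the default) for the linear-kernel runs.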

A.2.1 Original data

training=[0.2379 0.031 1.1141 0.6371 0.0604 0.338

0.2066 0.0623 0.8261 0.7143 0.0595 0.7111

0.3681 0.0423 0.8803 0.586 0.0559 0.1604

0.1589 0.0275 1.1492 0.5895 0.0617 0.2602

0.2876 0.058 0.8281 0.6792 0.0586 0.3558

0.0577 0.059 0.8803 0.7329 0.0624 0.2386

0.0175 0.0585 0.9 0.7582 0.0624 0.1724

0.054 0.0497 0.9521 0.7206 0.0625 0.1144

0.0316 0.0767 0.7955 0.7769 0.0625 0.2979

0.1817 0.0733 0.7895 0.75 0.0609 0.303


0.5137 0.0393 0.8707 0.651 0.0458 0.4838

0.5666 0.0372 0.9038 0.6854 0.0444 0.1155

0.3609 0.0482 0.8878 0.6913 0.055 0.0932

0.6616 0.0284 0.9259 0.6455 0.0377 0.2113

0.3547 0.047 0.871 0.6524 0.0546 0.219

0.4334 0.0457 0.8917 0.6918 0.0525 0.1033

0.461 0.0428 0.8625 0.6395 0.0498 0.5098

0.2049 0.0629 0.84 0.74 0.06 0.2342

0.1287 0.0718 0.8158 0.7841 0.0609 0.2609

0.5203 0.032 0.9405 0.625 0.0454 0.64

0.1891 0.0412 1.011 0.6946 0.0606 0.1607

0.7442 0.0141 1.0774 0.5336 0.0306 0.5857

0.6239 0.0347 0.9126 0.7114 0.0406 0.4327

0.411 0.0415 0.9273 0.6789 0.0525 0.2029

0.7521 0.0145 1.0337 0.497 0.0312 1.0387

0.1306 0.0462 0.9239 0.6423 0.0614 0.3473

0.5882 0.0291 0.9153 0.5535 0.0441 0.1941

0.4012 0.0192 1.3214 0.6382 0.0527 0.4836

0.5257 0.0356 0.9414 0.7045 0.0448 0.1554

0.7802 0.0155 1 0.5871 0.0264 0.2839

0.5228 0.0184 1.2409 0.5717 0.0496 0.3557

0.505 0.0305 0.9783 0.5842 0.0499 0.3856

0.4325 0.0322 0.8718 0.4838 0.0506 0.443

0.5556 0.0282 0.8941 0.5242 0.0429 0.6762

0.521 0.0233 1.119 0.6573 0.0443 0.3256

0.453 0.044 0.8519 0.6118 0.0521 0.2796

0.7246 0.0227 0.9924 0.6446 0.0347 0.7385

0.5626 0.0248 1.115 0.6618 0.0466 0.9537

0.6077 0.0318 0.9091 0.657 0.04 0.5912

0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];

%NGC 4449,UGC 7408,UGC 7577,UGC 7639,UGC 7690

%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831


%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346

%NGC 4324,NGC 5854,NGC 5838,NGC 5839,NGC 5864

%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288

%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157

%NGC 5850,NGC 5806,NGC 4232,NGC 4088,NGC 4258

%NGC 4527,NGC 4389,NGC 4496,NGC 4085,NGC 4096

Y=['I'; 'I'; 'I'; 'I'; 'I'; 'R'; 'R'; 'R'; 'R'; 'R';...

'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...

'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...

'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R'];

coeff=pca(training);

reduced_training=training*coeff(:,1:2);

svmStruct=svmtrain(training,Y,'kernel_function',...

'quadratic','showplot',true);

test=[0.0284 0.044 0.9241 0.6016 0.0625 0.299

0.0474 0.0469 0.9684 0.705 0.0624 0.3738

0.1105 0.0548 0.9314 0.7682 0.0619 0.4194

0.1687 0.0637 0.8448 0.75 0.0606 0.1961

0.1563 0.0692 0.85 0.8333 0.06 0.2

0.4373 0.051 0.8421 0.7037 0.0514 1.6172

0.3642 0.0147 1.5035 0.603 0.055 0.5245

0.7489 0.0199 0.9471 0.5502 0.0324 0.6252

0.304 0.0258 1.1216 0.5528 0.0588 0.4888

0.2894 0.0161 1.3588 0.4942 0.06 0.6418

0.6478 0.0129 1.3286 0.5558 0.0411 0.6026

0.3865 0.0333 0.9158 0.5154 0.0541 0.4956

0.3934 0.0403 0.8406 0.5123 0.0556 0.4945

0.484 0.0319 0.9286 0.4949 0.0556 0.5979

0.2565 0.0618 0.7857 0.6361 0.06 0.3743];

%test set is first row irregular, remaining regular.

%NGC 4496B,NGC 5846,NGC 5846A,NGC 5865,NGC 5868,NGC 4310

%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242


%NGC 4393,NGC 4288,NGC 3985

coeff2=pca(test);

reduced_test=test*coeff2(:,1:2);

group = svmclassify(svmStruct,test,'showplot',true);

%===============================================================
% elliptical versus not elliptical

training=[0.0577 0.059 0.8803 0.7329 0.0624 0.2386

0.0175 0.0585 0.9 0.7582 0.0624 0.1724

0.054 0.0497 0.9521 0.7206 0.0625 0.1144

0.0316 0.0767 0.7955 0.7769 0.0625 0.2979

0.1817 0.0733 0.7895 0.75 0.0609 0.303

0.5137 0.0393 0.8707 0.651 0.0458 0.4838

0.5666 0.0372 0.9038 0.6854 0.0444 0.1155

0.3609 0.0482 0.8878 0.6913 0.055 0.0932

0.6616 0.0284 0.9259 0.6455 0.0377 0.2113

0.3547 0.047 0.871 0.6524 0.0546 0.219

0.4334 0.0457 0.8917 0.6918 0.0525 0.1033

0.461 0.0428 0.8625 0.6395 0.0498 0.5098

0.2049 0.0629 0.84 0.74 0.06 0.2342

0.1287 0.0718 0.8158 0.7841 0.0609 0.2609

0.1891 0.0412 1.011 0.6946 0.0606 0.1607

0.7442 0.0141 1.0774 0.5336 0.0306 0.5857

0.6239 0.0347 0.9126 0.7114 0.0406 0.4327

0.411 0.0415 0.9273 0.6789 0.0525 0.2029

0.7521 0.0145 1.0337 0.497 0.0312 1.0387

0.1306 0.0462 0.9239 0.6423 0.0614 0.3473

0.5882 0.0291 0.9153 0.5535 0.0441 0.1941

0.4012 0.0192 1.3214 0.6382 0.0527 0.4836

0.5257 0.0356 0.9414 0.7045 0.0448 0.1554

0.7802 0.0155 1 0.5871 0.0264 0.2839

0.5228 0.0184 1.2409 0.5717 0.0496 0.3557

0.505 0.0305 0.9783 0.5842 0.0499 0.3856


0.4325 0.0322 0.8718 0.4838 0.0506 0.443

0.5556 0.0282 0.8941 0.5242 0.0429 0.6762

0.521 0.0233 1.119 0.6573 0.0443 0.3256

0.453 0.044 0.8519 0.6118 0.0521 0.2796

0.7246 0.0227 0.9924 0.6446 0.0347 0.7385

0.5626 0.0248 1.115 0.6618 0.0466 0.9537

0.6077 0.0318 0.9091 0.657 0.04 0.5912

0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];

%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831,NGC 4346,NGC 4460

%NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854,NGC 5838

%NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288

%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850

%NGC 5806,NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527

%NGC 4389,NGC 4496,NGC 4085,NGC 4096

% 1 for Elliptical 0 for Not Elliptical

Y=[1 1 1 1 ...

1 0 0 0 0 0 ...

0 0 0 0 0 0 ...

0 0 0 0 0 0 ...

0 0 0 0 0 0 ...

0 0 0 0 0 0];

Y=Y';

coeff=pca(training);

reduced_training=training*coeff(:,1:2);

svmStruct=svmtrain(reduced_training,Y,'kernel_function',...

'quadratic','showplot',true);

% 'kernel_function','quadratic'

test=[0.0474 0.0469 0.9684 0.705 0.0624 0.3738

0.1105 0.0548 0.9314 0.7682 0.0619 0.4194

0.5203 0.032 0.9405 0.625 0.0454 0.64

0.1687 0.0637 0.8448 0.75 0.0606 0.1961

0.1563 0.0692 0.85 0.8333 0.06 0.2


0.4373 0.051 0.8421 0.7037 0.0514 1.6172

0.3642 0.0147 1.5035 0.603 0.055 0.5245

0.7489 0.0199 0.9471 0.5502 0.0324 0.6252

0.304 0.0258 1.1216 0.5528 0.0588 0.4888

0.2894 0.0161 1.3588 0.4942 0.06 0.6418

0.6478 0.0129 1.3286 0.5558 0.0411 0.6026

0.3865 0.0333 0.9158 0.5154 0.0541 0.4956

0.3934 0.0403 0.8406 0.5123 0.0556 0.4945

0.484 0.0319 0.9286 0.4949 0.0556 0.5979

0.2565 0.0618 0.7857 0.6361 0.06 0.3743];

%NGC 5846,NGC 5846A,NGC 5864,NGC 5865,NGC 5868,NGC 4310

%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242

%NGC 4393,NGC 4288,NGC 3985

coeff2=pca(test);

reduced_test=test*coeff2(:,1:2);

group = svmclassify(svmStruct,reduced_test,'showplot',true);

%===============================================================
% lenticular versus spiral

clear all; close all; clc

training=[0.5137 0.0393 0.8707 0.651 0.0458 0.4838

0.5666 0.0372 0.9038 0.6854 0.0444 0.1155

0.3609 0.0482 0.8878 0.6913 0.055 0.0932

0.6616 0.0284 0.9259 0.6455 0.0377 0.2113

0.3547 0.047 0.871 0.6524 0.0546 0.219

0.4334 0.0457 0.8917 0.6918 0.0525 0.1033

0.461 0.0428 0.8625 0.6395 0.0498 0.5098

0.2049 0.0629 0.84 0.74 0.06 0.2342

0.1287 0.0718 0.8158 0.7841 0.0609 0.2609

0.1891 0.0412 1.011 0.6946 0.0606 0.1607

0.7442 0.0141 1.0774 0.5336 0.0306 0.5857

0.6239 0.0347 0.9126 0.7114 0.0406 0.4327

0.411 0.0415 0.9273 0.6789 0.0525 0.2029


0.7521 0.0145 1.0337 0.497 0.0312 1.0387

0.1306 0.0462 0.9239 0.6423 0.0614 0.3473

0.5882 0.0291 0.9153 0.5535 0.0441 0.1941

0.4012 0.0192 1.3214 0.6382 0.0527 0.4836

0.5257 0.0356 0.9414 0.7045 0.0448 0.1554

0.7802 0.0155 1 0.5871 0.0264 0.2839

0.5228 0.0184 1.2409 0.5717 0.0496 0.3557

0.505 0.0305 0.9783 0.5842 0.0499 0.3856

0.4325 0.0322 0.8718 0.4838 0.0506 0.443

0.5556 0.0282 0.8941 0.5242 0.0429 0.6762

0.521 0.0233 1.119 0.6573 0.0443 0.3256

0.453 0.044 0.8519 0.6118 0.0521 0.2796

0.7246 0.0227 0.9924 0.6446 0.0347 0.7385

0.5626 0.0248 1.115 0.6618 0.0466 0.9537

0.6077 0.0318 0.9091 0.657 0.04 0.5912

0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];

%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854

%NGC 5838,NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288

%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806

%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527,NGC 4389

%NGC 4496,NGC 4085,NGC 4096

% 1 for Lenticular 0 for Spiral

Y=[1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];

Y=Y';

coeff=pca(training);

reduced_training=training*coeff(:,1:2);

svmStruct=svmtrain(reduced_training,Y,'showplot',true);

%,'kernel_function','quadratic'

test=[0.5203 0.032 0.9405 0.625 0.0454 0.64

0.1687 0.0637 0.8448 0.75 0.0606 0.1961

0.1563 0.0692 0.85 0.8333 0.06 0.2

0.4373 0.051 0.8421 0.7037 0.0514 1.6172


0.3642 0.0147 1.5035 0.603 0.055 0.5245

0.7489 0.0199 0.9471 0.5502 0.0324 0.6252

0.304 0.0258 1.1216 0.5528 0.0588 0.4888

0.2894 0.0161 1.3588 0.4942 0.06 0.6418

0.6478 0.0129 1.3286 0.5558 0.0411 0.6026

0.3865 0.0333 0.9158 0.5154 0.0541 0.4956

0.3934 0.0403 0.8406 0.5123 0.0556 0.4945

0.484 0.0319 0.9286 0.4949 0.0556 0.5979

0.2565 0.0618 0.7857 0.6361 0.06 0.3743];

%NGC 5864,NGC 5865,NGC 5868,NGC 4310,NGC 6070

%UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242

%NGC 4393,NGC 4288,NGC 3985

coeff2=pca(test);

reduced_test=test*coeff2(:,1:2);

group = svmclassify(svmStruct,reduced_test,'showplot',true);

%===============================================================
% simple spiral versus barred spiral

training=[0.1891 0.0412 1.011 0.6946 0.0606 0.1607

0.7442 0.0141 1.0774 0.5336 0.0306 0.5857

0.6239 0.0347 0.9126 0.7114 0.0406 0.4327

0.411 0.0415 0.9273 0.6789 0.0525 0.2029

0.7521 0.0145 1.0337 0.497 0.0312 1.0387

0.1306 0.0462 0.9239 0.6423 0.0614 0.3473

0.5882 0.0291 0.9153 0.5535 0.0441 0.1941

0.4012 0.0192 1.3214 0.6382 0.0527 0.4836

0.5257 0.0356 0.9414 0.7045 0.0448 0.1554

0.7802 0.0155 1 0.5871 0.0264 0.2839

0.5228 0.0184 1.2409 0.5717 0.0496 0.3557

0.505 0.0305 0.9783 0.5842 0.0499 0.3856

0.4325 0.0322 0.8718 0.4838 0.0506 0.443

0.5556 0.0282 0.8941 0.5242 0.0429 0.6762

0.521 0.0233 1.119 0.6573 0.0443 0.3256


0.453 0.044 0.8519 0.6118 0.0521 0.2796

0.7246 0.0227 0.9924 0.6446 0.0347 0.7385

0.5626 0.0248 1.115 0.6618 0.0466 0.9537

0.6077 0.0318 0.9091 0.657 0.04 0.5912

0.6071 0.0219 1.0536 0.5515 0.0441 0.3169];

%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288,NGC 4457

%NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806

%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527

%NGC 4389,NGC 4496,NGC 4085,NGC 4096

% 1 for Spiral 0 for Barred Spiral

Y=[1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];

Y=Y';

coeff=pca(training);

reduced_training=training*coeff(:,1:2);

svmStruct=svmtrain(reduced_training,Y,'kernel_function',...

'quadratic','showplot',true);

%,'kernel_function','quadratic'

test=[0.3642 0.0147 1.5035 0.603 0.055 0.5245

0.7489 0.0199 0.9471 0.5502 0.0324 0.6252

0.304 0.0258 1.1216 0.5528 0.0588 0.4888

0.2894 0.0161 1.3588 0.4942 0.06 0.6418

0.6478 0.0129 1.3286 0.5558 0.0411 0.6026

0.3865 0.0333 0.9158 0.5154 0.0541 0.4956

0.3934 0.0403 0.8406 0.5123 0.0556 0.4945

0.484 0.0319 0.9286 0.4949 0.0556 0.5979

0.2565 0.0618 0.7857 0.6361 0.06 0.3743];

%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242

%NGC 4393,NGC 4288,NGC 3985

coeff2=pca(test);

reduced_test=test*coeff2(:,1:2);

group = svmclassify(svmStruct,reduced_test,'showplot',true);

A.2.2 Enhanced data


training=[0.2028 0.0399 1.0698 0.7504 0.0608 1.0613

0.2187 0.0379 1.02 0.646 0.0611 0.4876

0.4311 0.0116 1.5897 0.5422 0.0541 1.4891

0.0873 0.0179 1.4145 0.5727 0.0625 0.2709

0.1025 0.0493 0.9038 0.6488 0.0621 0.2294

0.0616 0.0223 1.3416 0.6442 0.0623 0.2499

0.0439 0.0594 0.8462 0.6845 0.0621 0.2609

0.0498 0.0386 1.0259 0.6544 0.062 0.4055

0.066 0.0297 1.2147 0.7027 0.0624 0.297

0.1106 0.0612 0.8429 0.7007 0.062 0.1972

0.563 0.0361 0.9012 0.6811 0.043 0.3022

0.5703 0.0343 0.9 0.6169 0.045 0.1646

0.4029 0.0437 0.85 0.6012 0.0525 0.203

0.6352 0.0297 0.8824 0.5779 0.04 0.3413

0.5132 0.0402 0.8814 0.6323 0.0494 0.0874

0.4404 0.0377 0.9557 0.6677 0.0516 0.2233

0.4778 0.0393 0.9455 0.7083 0.0496 0.4565

0.2595 0.0531 0.89 0.7148 0.0589 0.1188

0.1686 0.0687 0.7857 0.6927 0.0612 0.2556

0.5027 0.0415 0.8942 0.6748 0.0492 0.1393

0.1871 0.0386 1.0429 0.6948 0.0603 0.2403

0.6458 0.0125 1.1555 0.4424 0.0377 0.6822

0.606 0.0331 0.9237 0.6895 0.0409 0.3285

0.3777 0.047 0.8939 0.6923 0.0542 0.23

0.7385 0.0209 0.9314 0.5232 0.0347 0.8889

0.2116 0.044 0.9107 0.6126 0.0596 1.8642

0.5645 0.0352 0.9286 0.6695 0.0454 0.2674

0.4421 0.0279 1.1424 0.7056 0.0516 0.3175

0.5088 0.0314 1.0037 0.6476 0.0489 0.1525

0.7701 0.0192 0.9634 0.6308 0.0283 0.2815

0.4965 0.0186 1.2083 0.4951 0.0548 0.4783

0.487 0.0362 0.9202 0.6272 0.0488 0.3401


0.3847 0.043 0.8571 0.55 0.0574 0.3939

0.5871 0.0271 0.9656 0.6006 0.042 0.5289

0.5288 0.0309 0.9645 0.6443 0.0446 0.3304

0.4683 0.0366 0.9167 0.6028 0.051 0.4332

0.4643 0.042 0.9167 0.672 0.0525 0.4112

0.5166 0.0175 1.297 0.5687 0.0518 0.4288

0.6254 0.0331 0.8987 0.6617 0.0404 0.5547

0.6314 0.031 0.9389 0.6861 0.0398 0.1997];

%NGC 4449,UGC 7408,UGC 7577,UGC 7639,UGC 7690

%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831

%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346

%NGC 4324,NGC 5854,NGC 5838,NGC 5839,NGC 5864

%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288

%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157

%NGC 5850,NGC 5806,NGC 4232,NGC 4088,NGC 4258

%NGC 4527,NGC 4389,NGC 4496,NGC 4085,NGC 4096

Y=['I'; 'I'; 'I'; 'I'; 'I'; 'R'; 'R'; 'R'; 'R'; 'R';...

'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...

'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R';...

'R'; 'R';'R';'R';'R';'R';'R';'R';'R';'R'];

coeff=pca(training);

reduced_training=training*coeff(:,1:2);

svmStruct=svmtrain(reduced_training,Y,'kernel_function',...

'quadratic','showplot',true);

%,'kernel_function','quadratic'

test=[0.1574 0.0321 1.1029 0.6243 0.0625 0.42

0.0188 0.0368 1.0887 0.6989 0.0625 0.3746

0.0763 0.0406 1.0671 0.7388 0.0625 0.9791

0.1338 0.0592 0.8514 0.6912 0.0621 0.2128

0.0194 0.0653 0.875 0.8 0.0625 0.3

0.4014 0.0365 0.9178 0.6007 0.0512 0.9695

0.3985 0.0389 0.9258 0.6263 0.0533 0.3471


0.7009 0.0124 1.1456 0.3995 0.0406 0.5291

0.2914 0.0296 1.067 0.577 0.0583 0.3457

0.246 0.0247 1.1345 0.5328 0.0597 0.5092

0.5692 0.0229 1.0414 0.5557 0.0447 0.5544

0.2172 0.0124 1.5533 0.4904 0.0611 0.5086

0.3022 0.0153 1.3596 0.4925 0.0576 0.7544

0.3798 0.029 1.0152 0.4938 0.0604 0.4923

0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];

%test set is first row irregular, remaining regular.

%NGC 4496B,NGC 5846,NGC 5846A,NGC 5865,NGC 5868,NGC 4310

%NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242

%NGC 4393,NGC 4288,NGC 3985

coeff2=pca(test);

reduced_test=test*coeff2(:,1:2);

GROUP = svmclassify(svmStruct,reduced_test,'showplot',true);

%===============================================================
% elliptical versus not elliptical

training=[0.0616 0.0223 1.3416 0.6442 0.0623 0.2499

0.0439 0.0594 0.8462 0.6845 0.0621 0.2609

0.0498 0.0386 1.0259 0.6544 0.062 0.4055

0.066 0.0297 1.2147 0.7027 0.0624 0.297

0.1106 0.0612 0.8429 0.7007 0.062 0.1972

0.563 0.0361 0.9012 0.6811 0.043 0.3022

0.5703 0.0343 0.9 0.6169 0.045 0.1646

0.4029 0.0437 0.85 0.6012 0.0525 0.203

0.6352 0.0297 0.8824 0.5779 0.04 0.3413

0.5132 0.0402 0.8814 0.6323 0.0494 0.0874

0.4404 0.0377 0.9557 0.6677 0.0516 0.2233

0.4778 0.0393 0.9455 0.7083 0.0496 0.4565

0.2595 0.0531 0.89 0.7148 0.0589 0.1188

0.1686 0.0687 0.7857 0.6927 0.0612 0.2556

0.1871 0.0386 1.0429 0.6948 0.0603 0.2403


0.6458 0.0125 1.1555 0.4424 0.0377 0.6822

0.606 0.0331 0.9237 0.6895 0.0409 0.3285

0.3777 0.047 0.8939 0.6923 0.0542 0.23

0.7385 0.0209 0.9314 0.5232 0.0347 0.8889

0.2116 0.044 0.9107 0.6126 0.0596 1.8642

0.5645 0.0352 0.9286 0.6695 0.0454 0.2674

0.4421 0.0279 1.1424 0.7056 0.0516 0.3175

0.5088 0.0314 1.0037 0.6476 0.0489 0.1525

0.7701 0.0192 0.9634 0.6308 0.0283 0.2815

0.4965 0.0186 1.2083 0.4951 0.0548 0.4783

0.487 0.0362 0.9202 0.6272 0.0488 0.3401

0.3847 0.043 0.8571 0.55 0.0574 0.3939

0.5871 0.0271 0.9656 0.6006 0.042 0.5289

0.5288 0.0309 0.9645 0.6443 0.0446 0.3304

0.4683 0.0366 0.9167 0.6028 0.051 0.4332

0.4643 0.042 0.9167 0.672 0.0525 0.4112

0.5166 0.0175 1.297 0.5687 0.0518 0.4288

0.6254 0.0331 0.8987 0.6617 0.0404 0.5547

0.6314 0.031 0.9389 0.6861 0.0398 0.1997];

%NGC 4278,NGC 4283,NGC 4308,NGC 5813,NGC 5831,NGC 4346,NGC 4460

%NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854,NGC 5838

%NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288

%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850

%NGC 5806,NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527

%NGC 4389,NGC 4496,NGC 4085,NGC 4096

% 1 for Elliptical 0 for Not Elliptical

Y=[1 1 1 1 ...

1 0 0 0 0 0 ...

0 0 0 0 0 0 ...

0 0 0 0 0 0 ...

0 0 0 0 0 0 ...

0 0 0 0 0 0];


Y=Y';

coeff=pca(training);

reduced_training=training*coeff(:,1:2);

svmStruct=svmtrain(training,Y,'showplot',true);

% 'kernel_function','quadratic'

test=[0.0188 0.0368 1.0887 0.6989 0.0625 0.3746

0.0763 0.0406 1.0671 0.7388 0.0625 0.9791

0.5027 0.0415 0.8942 0.6748 0.0492 0.1393

0.1338 0.0592 0.8514 0.6912 0.0621 0.2128

0.0194 0.0653 0.875 0.8 0.0625 0.3

0.4014 0.0365 0.9178 0.6007 0.0512 0.9695

0.3985 0.0389 0.9258 0.6263 0.0533 0.3471

0.7009 0.0124 1.1456 0.3995 0.0406 0.5291

0.2914 0.0296 1.067 0.577 0.0583 0.3457

0.246 0.0247 1.1345 0.5328 0.0597 0.5092

0.5692 0.0229 1.0414 0.5557 0.0447 0.5544

0.2172 0.0124 1.5533 0.4904 0.0611 0.5086

0.3022 0.0153 1.3596 0.4925 0.0576 0.7544

0.3798 0.029 1.0152 0.4938 0.0604 0.4923

0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];

%NGC 5846,NGC 5846A,NGC 5864,NGC 5865,NGC 5868,

%NGC 4310,NGC 6070,UGC 07617,NGC 4480,UGC 10133,NGC 4559,

%NGC 4242,NGC 4393,NGC 4288,NGC 3985

coeff2=pca(test);

reduced_test=test*coeff2(:,1:2);

group = svmclassify(svmStruct,test,'showplot',true);

%===============================================================
% lenticular versus spiral

training=[0.563 0.0361 0.9012 0.6811 0.043 0.3022

0.5703 0.0343 0.9 0.6169 0.045 0.1646

0.4029 0.0437 0.85 0.6012 0.0525 0.203

0.6352 0.0297 0.8824 0.5779 0.04 0.3413


0.5132 0.0402 0.8814 0.6323 0.0494 0.0874

0.4404 0.0377 0.9557 0.6677 0.0516 0.2233

0.4778 0.0393 0.9455 0.7083 0.0496 0.4565

0.2595 0.0531 0.89 0.7148 0.0589 0.1188

0.1686 0.0687 0.7857 0.6927 0.0612 0.2556

0.1871 0.0386 1.0429 0.6948 0.0603 0.2403

0.6458 0.0125 1.1555 0.4424 0.0377 0.6822

0.606 0.0331 0.9237 0.6895 0.0409 0.3285

0.3777 0.047 0.8939 0.6923 0.0542 0.23

0.7385 0.0209 0.9314 0.5232 0.0347 0.8889

0.2116 0.044 0.9107 0.6126 0.0596 1.8642

0.5645 0.0352 0.9286 0.6695 0.0454 0.2674

0.4421 0.0279 1.1424 0.7056 0.0516 0.3175

0.5088 0.0314 1.0037 0.6476 0.0489 0.1525

0.7701 0.0192 0.9634 0.6308 0.0283 0.2815

0.4965 0.0186 1.2083 0.4951 0.0548 0.4783

0.487 0.0362 0.9202 0.6272 0.0488 0.3401

0.3847 0.043 0.8571 0.55 0.0574 0.3939

0.5871 0.0271 0.9656 0.6006 0.042 0.5289

0.5288 0.0309 0.9645 0.6443 0.0446 0.3304

0.4683 0.0366 0.9167 0.6028 0.051 0.4332

0.4643 0.042 0.9167 0.672 0.0525 0.4112

0.5166 0.0175 1.297 0.5687 0.0518 0.4288

0.6254 0.0331 0.8987 0.6617 0.0404 0.5547

0.6314 0.031 0.9389 0.6861 0.0398 0.1997];

%NGC 4346,NGC 4460,NGC 4251,NGC 4220,NGC 4346,NGC 4324,NGC 5854

%NGC 5838,NGC 5839,NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288

%NGC 4457,NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806

%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527,NGC 4389

%NGC 4496,NGC 4085,NGC 4096

% 1 for Lenticular 0 for Spiral

Y=[1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];


Y=Y';

coeff=pca(training);

reduced_training=training*coeff(:,1:2);

svmStruct=svmtrain(training,Y,'showplot',true);

% 'kernel_function','quadratic',

test=[0.5027 0.0415 0.8942 0.6748 0.0492 0.1393

0.1338 0.0592 0.8514 0.6912 0.0621 0.2128

0.0194 0.0653 0.875 0.8 0.0625 0.3

0.4014 0.0365 0.9178 0.6007 0.0512 0.9695

0.3985 0.0389 0.9258 0.6263 0.0533 0.3471

0.7009 0.0124 1.1456 0.3995 0.0406 0.5291

0.2914 0.0296 1.067 0.577 0.0583 0.3457

0.246 0.0247 1.1345 0.5328 0.0597 0.5092

0.5692 0.0229 1.0414 0.5557 0.0447 0.5544

0.2172 0.0124 1.5533 0.4904 0.0611 0.5086

0.3022 0.0153 1.3596 0.4925 0.0576 0.7544

0.3798 0.029 1.0152 0.4938 0.0604 0.4923

0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];

%NGC 5864,NGC 5865,NGC 5868,NGC 4310,NGC 6070

%UGC 07617,NGC 4480,UGC 10133,NGC 4559,NGC 4242

%NGC 4393,NGC 4288,NGC 3985

coeff2=pca(test);

reduced_test=test*coeff2(:,1:2);

group = svmclassify(svmStruct,test,'showplot',true);

%===============================================================
% simple spiral versus barred spiral

training=[0.1871 0.0386 1.0429 0.6948 0.0603 0.2403

0.6458 0.0125 1.1555 0.4424 0.0377 0.6822

0.606 0.0331 0.9237 0.6895 0.0409 0.3285

0.3777 0.047 0.8939 0.6923 0.0542 0.23

0.7385 0.0209 0.9314 0.5232 0.0347 0.8889

0.2116 0.044 0.9107 0.6126 0.0596 1.8642


0.5645 0.0352 0.9286 0.6695 0.0454 0.2674

0.4421 0.0279 1.1424 0.7056 0.0516 0.3175

0.5088 0.0314 1.0037 0.6476 0.0489 0.1525

0.7701 0.0192 0.9634 0.6308 0.0283 0.2815

0.4965 0.0186 1.2083 0.4951 0.0548 0.4783

0.487 0.0362 0.9202 0.6272 0.0488 0.3401

0.3847 0.043 0.8571 0.55 0.0574 0.3939

0.5871 0.0271 0.9656 0.6006 0.042 0.5289

0.5288 0.0309 0.9645 0.6443 0.0446 0.3304

0.4683 0.0366 0.9167 0.6028 0.051 0.4332

0.4643 0.042 0.9167 0.672 0.0525 0.4112

0.5166 0.0175 1.297 0.5687 0.0518 0.4288

0.6254 0.0331 0.8987 0.6617 0.0404 0.5547

0.6314 0.031 0.9389 0.6861 0.0398 0.1997];

%NGC 4218,NGC 4217,NGC 4100,NGC 4414,UGC 10288,NGC 4457

%NGC 4314,NGC 4274,NGC 4448,NGC 4157,NGC 5850,NGC 5806

%NGC 4232,NGC 4088,NGC 4258 (Messier 106),NGC 4527

%NGC 4389,NGC 4496,NGC 4085,NGC 4096

% 1 for Spiral 0 for Barred Spiral

Y=[1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0];

Y=Y';

coeff=pca(training);

reduced_training=training*coeff(:,1:2);

svmStruct=svmtrain(training,Y,'kernel_function',...

'quadratic','showplot',true);

%'kernel_function','quadratic',

test=[0.3985 0.0389 0.9258 0.6263 0.0533 0.3471

0.7009 0.0124 1.1456 0.3995 0.0406 0.5291

0.2914 0.0296 1.067 0.577 0.0583 0.3457

0.246 0.0247 1.1345 0.5328 0.0597 0.5092

0.5692 0.0229 1.0414 0.5557 0.0447 0.5544

0.2172 0.0124 1.5533 0.4904 0.0611 0.5086


0.3022 0.0153 1.3596 0.4925 0.0576 0.7544

0.3798 0.029 1.0152 0.4938 0.0604 0.4923

0.2377 0.0559 0.8617 0.6898 0.0602 0.2997];

%NGC 6070,UGC 7617,NGC 4480,UGC 10133,NGC 4559,NGC 4242

%NGC 4393,NGC 4288,NGC 3985

coeff2=pca(test);

reduced_test=test*coeff2(:,1:2);

group = svmclassify(svmStruct,test,'showplot',true);


enhancement. Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Confer-

ence on, 565-570, 2009.

[126] Morrow, W.M. and Paranjape, R.B. and Rangayyan, R.M. and Desautels, J. E L. Region-

based contrast enhancement of mammograms. Medical Imaging, IEEE Transactions on, 11,

392-406, 1992.

[127] Agaian, S. S. and Silver, B. and Panetta, K. A. Transform Coefficient Histogram-Based

Image Enhancement Algorithms Using Contrast Entropy. Trans. Img. Proc., 16, 741-758,

2007.

123

Page 138: JJenkinson_Thesis

[128] Agaian, S.S. and Panetta, K. and Grigoryan, AM. Transform-Based Image Enhancement

Algorithms with Performance Measure. Image Processing, IEEE Transactions on, 10, 367-

382, 2001.

[129] Douglas F. Elliott. Handbook of Digital Signal Processing: Engineering Applications. Aca-

demic Press, Feb 1, 1988.

[130] Díaz-Hernández, R. and González, J. J. and Costero, R. and Guichard, J. Retrieval of spec-

troscopic information from the Tonantzintla Schmidt camera archival plates. Society of Photo-

Optical Instrumentation Engineers (SPIE) Conference Series, 8011, 2011.

[131] A.M. Grigoryan and S.S. Agaian. Multidimensional Discrete Unitary Transforms: Repre-

sentation, Partitioning and Algorithms. Marcel Dekker Inc., New York, 2003.

[132] A. Grigoryan and M. Grigoryan. Brief notes in advanced dsp: Fourier analysis with matlab.

CRC Press Taylor and Francis Group, 2009.

[133] Agaian, S.S. and Panetta, K. and Grigoryan, AM. Discrete unitary transforms generated by

moving waves. Proc. of the International Conference: Wavelets XII, SPIE: Optics+Photonics,

6701, 25, 2007.

[134] Grigoryan, AM. 2-D and 1-D multipaired transforms: frequency-time type wavelets. Signal

Processing, IEEE Transactions on, 49, 344-353, 2001.

[135] Grigoryan, AM. and Grigoryan, M.M., Nonlinear Approach Of Construction of Fast Unitary

Transforms. Information Sciences and Systems, 2006 40th Annual Conference on, 1073-1078,

2006.

[136] Nobuyuki Otsu. A Threshold Selection Method from Gray-Level Histograms. Systems,

Man and Cybernetics, IEEE Transactions on, 9, 62-66, 1979.

124

Page 139: JJenkinson_Thesis

[137] Abraham, R. G. and Valdes, F. and Yee, H. K. C. and van den Bergh, S. The morphologies

of distant galaxies. 1: an automated classification system. Astrophysical Journal, Part 1, 432,

75-90, 1994.

[138] Ivezic, Ž. and Connolly, A.J. and Vanderplas, J.T. and Gray, A. Statistics, Data Mining and

Machine Learning in Astronomy. Princeton University Press, Princeton, NJ 2014.

125

Page 140: JJenkinson_Thesis

VITA

John Jenkinson is from Austin, Texas. He received his Bachelor of Science degree from the University of Texas at San Antonio and is currently completing his Master of Science in Electrical Engineering degree at the University of Texas at San Antonio (UTSA). He plans to pursue a Ph.D. at UTSA.