dataminingincancerresearch ieee

5
14 IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | FEBRUARY 2010 1556-603X/10/$26.00©2010IEEE Paulo J.G. Lisboa, Liverpool John Moores University, UK Alfredo Vellido, Technical University of Catalonia, SPAIN Roberto Tagliaferri and Francesco Napolitano, University of Salerno, ITALY Michele Ceccarelli, University of Sannio, ITALY José D. Martín-Guerrero, University of Valencia, SPAIN Elia Biganzoli, University of Milano, ITALY Application Notes I. Introduction dvances in cancer medicine have traditionally come from detailed understanding of biological pro- cesses, later translated into therapeutic interventions, whose effectiveness is established by rigorous analysis of clinical trials. Over the last two decades the increasing throughput of data from microarray screening, spec- tral imaging and longitudinal studies are turning the under- standing of cancer pathology into as much a data-based as a biologically and clinically driven science, with potential to impact more strongly on evidence-based deci- sion support moving towards personal- ized medicine [1]. This article is not intended as a comprehensive survey of data mining applications in cancer. Rather, it provides starting points for further, more targeted, literature searches, by embarking on a guided tour of computational intelli- gence applications in cancer medicine, structured in increasing order of the physical scales of biological processes, as outlined in Table 1. II. Robust Clustering and Data Visualization Over the last decade, cancer has become a data-intensive area of research, with accelerating developments in data- acquisition technologies and specific methodologies, for example, next-generation sequencing for monitoring genetic changes in tumor cells as they progress from normal to invasive [2]. The first step in unrav- elling these processes is the study of co-occur- rences, leading to the identification of disease sub-types. This already has profound implications for the clinical management of patients, impacting on one of the most fundamental precepts of cancer med- icine – the histological characterisation of malignancies. Currently this is over- whelmingly done by measuring the dif- ferentiation between cancerous cells and the original normal cells from which they have evolved, marking them as well to poorly differentiated, meaning that the changes are linked with good to poor prognosis.Yet it is becoming apparent that the metabolism of cells is a key factor in uncontrolled cell proliferation, with a potentially greater influence on disease progression than cell differentiation alone. Clustering is the first line analysis methodology to mine data bases for use- ful research hypotheses to take onto fur- ther, more directed, study [3], [4]. From a mathematical perspective the objects of study, be they genes, patients, mode of action drugs or disease sub-types, are points in a multi-dimensional space where similarity is defined, typically trough a measure of the distance between object pairs. The main classes of cluster- ing approaches used in data mining for cancer are partitional, creating a simple partition of the data, or hierarchical, which create a tree-structured hierarchy of the data. Partitional approaches form a very Digital Object Identifier 10.1109/MCI.2009.935311 Data Mining in Cancer Research TABLE 1 Overview of healthcare interventions across physiological scales. BIOLOGICAL PROCESSES PHYSIOLOGICAL SCALE INDUSTRIAL SECTOR APPLICATION DOMAIN MOLECULAR BIOLOGY GENOME AND PHYSIOME BIOINFORMATICS PHARMACEUTICS SUSCEPTIBILITY TO DISEASE METABOLIC PATHWAYS TARGETS FOR INTERVENTION CYTOLOGY AND HISTOLOGY CELL AND TISSUE LEVELS SYSTEMS BIOLOGY LABORATORY TESTS DISEASE DIAGNOSIS MEDICAL IMAGING ORGAN LEVEL NEUROINFORMATICS MEDICAL INFORMATICS MEDICAL EQUIPMENT & INSTRUMENTATION PHARMACEUTICS ANATOMICAL AND PHYSIOLOGICAL MONITORING ELECTROPHYSIOLOGICAL MEASUREMENT CLINICAL SIGNS SYSTEM LEVEL DIAGNOSIS, PROGNOSIS AND SCREENING OUT-OF-HOSPITAL CARE LEVEL OF THE INDIVIDUAL AMBULATORY MONITORING FOR EARLY DIAGNOSIS LIFESTYLE FACTORS POPULATION LEVEL PUBLIC HEALTH INFORMATICS DISEASE PROTECTION AND PREVENTION © EYEWIRE A Authorized licensed use limited to: LIVERPOOL JOHN MOORES UNIVERSITY. Downloaded on February 12, 2010 at 05:38 from IEEE Xplore. Restrictions apply.

Upload: ajithakalyankrish

Post on 09-Feb-2016

10 views

Category:

Documents


0 download

DESCRIPTION

mb

TRANSCRIPT

Page 1: DataMiningInCancerResearch IEEE

14 IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | FEBRUARY 2010 1556-603X/10/$26.00©2010IEEE

Paulo J.G. Lisboa, Liverpool John Moores University, UK Alfredo Vellido, Technical University of Catalonia, SPAIN Roberto Tagliaferri and Francesco Napolitano, University of Salerno, ITALY Michele Ceccarelli, University of Sannio, ITALY José D. Martín-Guerrero, University of Valencia, SPAINElia Biganzoli, University of Milano, ITALY

Application Notes

I. Introductiondvances in cancer medicine have

traditionally come from detailed

understanding of biological pro-

cesses, later translated into therapeutic

interventions, whose effectiveness is

established by rigorous analysis of

clinical trials. Over the last

two decades the increasing

throughput of data from

microarray screening, spec-

tral imaging and longitudinal

studies are turning the under-

standing of cancer pathology

into as much a data-based as

a biologically and clinically

driven science, with potential to impact

more strongly on evidence-based deci-

sion support moving towards personal-

ized medicine [1].

This article is not intended as a

comprehensive survey of data mining

applications in cancer. Rather, it provides

starting points for further, more targeted,

literature searches, by embarking on a

guided tour of computational intelli-

gence applications in cancer medicine,

structured in increasing order of the

physical scales of biological processes, as

outlined in Table 1.

II. Robust Clustering and Data VisualizationOver the last decade, cancer has become

a data-intensive area of research, with

accelerating developments in data-

acquisition technologies and specific

methodologies, for example, next-generation

sequencing for monitoring genetic

changes in tumor cells as they progress

from normal to invasive [2].

The first step in unrav-

elling these processes is

the study of co-occur-

rences, leading to the

identification of disease

sub-types. This already has

profound implications for the

clinical management of patients,

impacting on one of the most

funda mental precepts of cancer med-

icine – the histological characterisation

of malignancies. Currently this is over-

whelmingly done by measuring the dif-

ferentiation between cancerous cells and

the original normal cells from which

they have evolved, marking them as well

to poorly differentiated, meaning that the

changes are linked with good to poor

prognosis. Yet it is becoming apparent that

the metabolism of cells is a key factor in

uncontrolled cell proliferation, with a

potentially greater influence on disease

progression than cell differentiation alone.

Clustering is the first line analysis

methodology to mine data bases for use-

ful research hypotheses to take onto fur-

ther, more directed, study [3], [4]. From

a mathematical perspective the objects of

study, be they genes, patients, mode of

action drugs or disease sub-types, are

points in a multi-dimensional space

where similarity is defined, typically

trough a measure of the distance between

object pairs. The main classes of cluster-

ing approaches used in data mining for

cancer are partitional, creating a simple

partition of the data, or hierarchical, which

create a tree-structured hierarchy of the

data. Partitional approaches form a very

Digital Object Identifier 10.1109/MCI.2009.935311

Data Mining in Cancer Research

TABLE 1 Overview of healthcare interventions across physiological scales.

BIOLOGICAL PROCESSES

PHYSIOLOGICAL SCALE

INDUSTRIAL SECTOR

APPLICATION DOMAIN

MOLECULAR BIOLOGYGENOME AND PHYSIOME

BIOINFORMATICSPHARMACEUTICS

SUSCEPTIBILITY TO DISEASE

METABOLIC PATHWAYSTARGETS FOR INTERVENTION

CYTOLOGY AND HISTOLOGY

CELL AND TISSUE LEVELS

SYSTEMS BIOLOGYLABORATORY TESTS

DISEASE DIAGNOSIS

MEDICAL IMAGING

ORGAN LEVEL

NEUROINFORMATICS

MEDICAL INFORMATICS

MEDICAL EQUIPMENT & INSTRUMENTATION

PHARMACEUTICS

ANATOMICAL AND PHYSIOLOGICAL MONITORING

ELECTROPHYSIOLOGICAL MEASUREMENT

CLINICAL SIGNSSYSTEM LEVEL DIAGNOSIS, PROGNOSIS

AND SCREENING

OUT-OF-HOSPITAL CARE

LEVEL OF THE INDIVIDUAL

AMBULATORY MONITORING FOR EARLY DIAGNOSIS

LIFESTYLE FACTORSPOPULATION LEVEL PUBLIC HEALTH

INFORMATICSDISEASE PROTECTION AND PREVENTION

© EYEWIRE

A

Authorized licensed use limited to: LIVERPOOL JOHN MOORES UNIVERSITY. Downloaded on February 12, 2010 at 05:38 from IEEE Xplore. Restrictions apply.

Page 2: DataMiningInCancerResearch IEEE

FEBRUARY 2010 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE 15

heterogeneous class, including techniques

based on Global Optimization, Vector

Quantization, Kernel functions, Fuzzy

sets, and Graph theory [5]. Hierarchical

approaches can be agglomerative or divisive

and are often used as for first-line clus-

tering of high-throughput data. How-

ever, high- dimensional data spaces are

by nature ultra-metric, each data point

appearing to be at the centre of a hollow

sphere, in contrast with the usual intui-

tive picture in low dimensions [6]. This

effect needs to be taken into account

especially with agglomerative methods

to avoid early stage errors that can arise,

resulting in sub-optimal solutions com-

pared with partition algorithms. A third

main class of algorithms is biclustering

that is to say clustering of both objects

and features within a single model [7].

The range of clustering techniques illus-

trates the complexity of the problem.

The fundamental concept underlying

the definition of clusters, “similarity”, is

complex in itself because of its ambigui-

ty. This difficulty is compounded when

the most relevant features are not known

a priori, yet the choice of different fea-

tures usually produces different solutions.

An open research question is feature selec-

tion operating with unsupervised train-

ing. This will normally involve

parametric modeling and the application

of Bayesian methods to derive the dis-

criminative features which best separate

between the clusters [8].

In addition to such intrinsic complex-

ity, purely algorithmic issues may give

rise to different solutions for the same

data set, even for the same algorithm.

This raises the possibility that cluster sta-

bilization methods including global stabil-

ity assessments [10], [11] and consensus

clustering [12], [13] methods may miss the

real complexity of the problem.

An alternative approach to clustering

assessment shifts the focus from the

search of the best solution to the search

of a set of equally plausible, though

different, solutions [14]. Recently, meta-

clustering [15] (see fig 1) has been pro-

posed as a clustering assessment tool. It is

the process of clustering the solutions i.e.,

the cluster partitions themselves, general-

izing the concept of global stability to

local stability (see Fig. 1), that is, the sta-

bility of a solution with respect to a

homogenous subset of the candidates.

The cluster composition then needs

to be explored in detail, for which den-

drograms provide a useful representation,

usually through sequential univariate

splitting of the data space. However, the

interpretation of geometrical relationships

in the data is often best derived from

direct spatial representations. The most

commonly used methods for data visual-

ization are Multi Dimensional Scaling,

Principal Component Analysis (PCA)

and Self-Organizing Maps [16], [17].

Linear methods are particularly inter-

esting since they enable axes projections

to be overlaid on the data. An efficient

methodology for cluster-based linear

visualization is to use a decomposition of

the covariance matrix which explicitly

takes into account cluster membership.

This is the natural tailoring of PCA to the

case where the data are partitioned into

labelled cohorts – an example from breast

cancer proteomics is shown in Fig. 2 [18].

III. Pathway ModelingThe study of the cell functions is mod-

elled from experimental data sets which

are both complex and dynamical [19],

[20]. This has spawned the field of Sys-

tems Biology to understand and describe

how molecular components interact to

manifest the phenotypic behaviour

which is the target of disease interven-

tion [21], illustrated in Fig. 3. In the sys-

tems biology literature there are two

main classes of approaches:

Inverse problems, which involves ❏

mining data from many perturbation

experiments to identify a compatible

structure of the signaling pathway.

Direct problems, to forecast and simu- ❏

late the forward evolution of dynami-

cal systems, using differential equations.

Several methods make the assump-

tion that the expression level of a given

gene can be considered as a random

variable and the mutual relationships

between them can be derived by statisti-

cal dependencies. The resulting system is

embedded in a probabilistic Graphical

Model [22] described for example by a

Bayesian Network [23], a Factor Graph

[24], a Gaussian Graphical Model [25]

or a Mutual Information derived Influ-

ence Network [26].

IV. Cancer Imaging Cancer imaging is a vast field focused

on non-invasive measurement for

1 2

3 4

5

6 7

8

9

10

11

12 13

14 15

16 17 18 19

20

21 22

23 24 25 26 27 28 29 30 31 32 33 34 35

36

37 38 39 40 41

42 43

44

45 46

47

48 49

50 51

52

53 54 55

56 57

58 59

60

61 62

63

64

65 66

67

68 69 70

71 72

73

74 75 76 77 78 79 80 81

82

83 84 85

86 87

88 89

90 91

92

93

94 95 96 97 98

99 100101102103104105 106107108

109110111

112113

114

115116

117

118119120

121

122

123

124

125126127128129

130131

132133

134

135

136137

138

139

140

141

142

143144

145

146

147148149150151

152153

154

155

156

157

158159160161

162

163

164165

166

167

168

169170

171

172

173

174

175

176

177

178

179180

181

182

183

184

185

186

187

188

189190

191

192

193

194195196

197

198

199 200

201

202203204

205

206

207

208 209

210

211

212

213

214

215

216

217

218219

220

221

222

223224225226

227

228229

230

231232

233

234

235

236

237

238

239240

241

242

243

244

245

246

247248 249

250

251

252253254255256

257

258

259260

261262263264265266

267

268

269

270271

272273

274

275

276

277

278

279

280

281282

283

284

285

286

287

288

289

290

291292

293

294

295

296297298

299

300

301

302

303

304

305306

307308

310311

312

313

314

315

316

317318

319

320

321

322

323

324

325

326

327328

329

330331332

333

335336

337

338339

340

341

342

343

344

345346347348

349

350351

352353

354

355

356357

358

359

360

361362363

364

365366367368

369370

371

−0.1 0 0.1 0.2

−0.2

−0.15

−0.1

−0.05

0

0.05

0.1

0.15

0.2

40.5

41

41.5

42

42.5

Distortion

309

334

FIGURE 1 An illustration of meta-clustering. Each point on the map represents a clustering solution and close points represent similar solutions. Colors indicate a measure of quality for the solutions and red circles highlight local stability [15].

Authorized licensed use limited to: LIVERPOOL JOHN MOORES UNIVERSITY. Downloaded on February 12, 2010 at 05:38 from IEEE Xplore. Restrictions apply.

Page 3: DataMiningInCancerResearch IEEE

16 IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | FEBRUARY 2010

cancer detection, diagnosis and grading,

and to monitor response to therapy by

measuring tumor size and detecting re-

growth. Having started out as mainly

anatomical measurement for anomaly

detection as an indicator of malignancy,

for instance using ultrasound and Mag-

netic Resonance Imaging (MRI), can-

cer imaging has developed into multiple

modalities of physiological imaging

including spectroscopy, with Magnetic

Resonance Spectroscopy (MRS) and

spatially localized Chemical Shift Imag-

es (CSI), with Positron-Emission

Tomography (PET) and Single-Photon

emission CT (SPECT) clinically widely

used to measure glucose metabolism

and therefore find regions with excep-

tionally high energetic rates, which are

characteristic of tumor growth. An

accessible review of machine learning

applications to detection and diagnosis

of disease may be found in [27].

A recent development is the use of

specific functional imaging for tumor

delineation, that is to say, to measure

the contour during surgical planning

and to identify tumor striations that

may not be apparent clinically even

during re-section. Initial efforts in this

direction applied blind signal separation

to single-voxel spectra to separate infil-

trating tumor mass compared with

necrotic tissue [28]. However, tumors

are not homogeneous, often mixing

different types of tissue and sometimes

with different grades of malignancy.

One way to resolve this mixture is to

segment the tumor composition, which

is called nosological imaging, of which

examples of diagnostic applications in

brain are reported in [29], [30]. A relat-

ed and very important clinical area is

follow-up imaging to determine tumor

progression and to differentiate recur-

rent tumor growth from treatment-in-

duced tissue changes [31], [32].

V. Prognostic OutcomeOutcome modeling usually refers to

the analysis of patterns of mortality and

recurrence in longitudinal cohort stud-

ies, where a specific groups of patients

followed-up for up to 20 years, record-

ing the occurrence of events of interest

which may be cancer specific mortality,

overall mortality or local and distal

recurrence. The value of these studies is

closely linked to the design of the fol-

low-up process, highlighting the need

to involve biostatistical expertise from

the very earliest stages in study design.

It is required to model the likeli-

hood of the event of interest, condi-

tional on the event not taking place at

the start of the time interval and taking

into account the loss of patients to fol-

low-up for reasons other than the event

of interest, termed censorship.

The application in medical statistics of

linear-in-the-parameters models to non-

linear effects in outcome data generally

involves discretising the data space to gen-

erate piecewise linear approximations to

the desired response surfaces. However,

this heuristic approach can have important

consequences in cancer research, making

it difficult to mine the hazard function for

insights into the dynamics of metastatic

spread [33] and inducing a categorisation

of the histological grade of the tumor,

which can give rise to important sub-

jective effects that impact on the choice

of therapy given to the patient [34].

Neural network models fit non-lin-

ear, time dependent estimates of the

hazard function, without the need to

make proportionality assumptions or to

discretize the state-space [35], [36] and

their accuracy has been established by

multi-centre, double-blind, evaluation

[37] and this framework has recently

been extended for estimating the joint

model for more than one risk, e.g. local

and distal recurrence [38].

VI. Conclusion Computational intelligence methodol-

ogy has been widely applied to data

–250

–200

–150

–100 –5

0 0 50 100

150

200

250

–350

–300

–250

–200

–150

–100

–50

0

50

100

150

8

24

1812219

11923

Projection onto Axis 1

(a)

–250

–200

–150

–100 –5

0 0 50 100

150

200

250

Projection onto Axis 1

(b)

1020255

Pro

jectio

n o

nto

Axis

2

–500

–400

–300

–200

–100

0

100

200

300

82418

1

2219

119

231020

255

Pro

jectio

n o

nto

Axis

3FIGURE 2 Two views of cluster-based visualization of 25-d protein expression date from breast cancer histology showing (a) the HER-2 positive cohort distinct from the rest and (b) rotating the 3-d view to illustrate the separation between the projections of the remaining seven clusters [18].

Authorized licensed use limited to: LIVERPOOL JOHN MOORES UNIVERSITY. Downloaded on February 12, 2010 at 05:38 from IEEE Xplore. Restrictions apply.

Page 4: DataMiningInCancerResearch IEEE

FEBRUARY 2010 | IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE 17

mining in cancer but open questions

remain before we can fully exploit the

potential of molecular biology, to inte-

grate this into routine imaging and

other non-invasive measurements for

screening and early diagnosis, also to

deliver consistency in laboratory mea-

surements especially in histopathology,

and to more accurately model the bal-

ance between benefit and risk of thera-

py for niche groups of patients, down

to the level of the individual.

New data mining methods need to

be developed to handle high-dimen-

sionality and large data volumes, to more

efficiently model the dynamics of deep

molecular pathways with high levels of

noise and small experimental sample

sizes, and also to robustly estimate haz-

ard functions with censored data with

multiple competing risks. Analytical

models need also to be integrated with

clinical expert knowledge and processes,

potentially by mapping into Boolean

filters [39].

The road ahead is long and bumpy.

This article is provided as a sign-post to

the nature of the key methodological

and application challenges in this grow-

ing and exciting field.

AcknowledgmentsThis work was supported in part by the

Biopattern NoE under FP6/2002/IST/1

and IST-2002-508803. Further support

was provided by MICINN research

project TIN2009-13895-C02-01.

References[1] P. J. G. Lisboa, A. F. G. Taktak, “The use of artif icial neural networks in decision support in cancer: A system-atic review,” Neural Netw., vol. 19, pp. 408–415, 2006. [2] P. J. Campbell, et al., “Identif ication of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing,” Nature Genet-ics, vol. 40, pp. 722–729, 2008. [3] N. Belacel, Q. C. Wang, and M. Cuperlovic-Culf, “Clustering methods for microarray gene expression data,” Omics, vol. 10, no. 4, pp. 507–531, 2006.[4] D. Jiang, C. Tang, and A. Zhang, “Cluster analysis for gene expression data: A survey,” IEEE Trans. Knowledge Data Eng., vol. 16, no. 11, pp. 1370–1386, 2004.[5] D. Wunsch and R. Xu, Clustering (IEEE Press Series on Computational Intelligence). Washington, DC: IEEE Com-puter Society Press, 2008. [6] F. Murtagh, G. Downs, and P. Contreras, “Hierarchi-cal clustering of massive, high dimensional data sets by

Thyroid Cancer

Normal Thyrocyte

Normal Thyrocyte

Follicular Adenoma

Follicular Carcinoma

Undifferentiated

Carcinoma

05216 3/31/09

© Kanehisa Laboratories

Papillary

Carcinoma

TRK

REDPTC

Ras

Ras

BRAF

BRAF

PPFP

PPARγ

RXR

BRAFMEK ERK

MEK ERK

Thyroid Follicular Epithelial Cell

Thyroid Follicular Epithelial Cell

+p

+p

+p +p

+p

DNA

ProliferationSurvival

DNA

ProliferationSurvival

DNA

Proliferation

Proliferation

DNA

DNA

DNA

Damage

GenomicInstability

Impaired GICycle Arrest

ReducedApoptosis

MAPK SignalingPathway

p53 SignalingPathway

PPAR SignalingPathway

Ligands

p53

c-Myc

CyclinD1TCF/LEFEACD β-Catenin

AdherensJunction

Wnt SignalingPathway

Downregulation

IncreasedInvasion andCell Motility

Genetic Alterations

Oncogenes: RET/PTC1, RET/PTC3,TRK (TRM3/NTRK1),TRK-T1, T2 (TPR/NTRK1),TRK-T3 (TFG/NTRK1),BRAF, Ras,

PPFP (PAX8/PPARγ), β-Catenin

Tumor Suppressors: p53

FIGURE 3 An excerpt of the papillary thyroid carcinoma pathway from the KEGG database.

Authorized licensed use limited to: LIVERPOOL JOHN MOORES UNIVERSITY. Downloaded on February 12, 2010 at 05:38 from IEEE Xplore. Restrictions apply.

Page 5: DataMiningInCancerResearch IEEE

18 IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | FEBRUARY 2010

exploiting ultrametric embedding,” SIAM J. Sci. Com-put., vol. 30, pp. 707–730, 2008. [7] S. C. Madeira and A. L. Oliveira, “Biclustering al-gorithms for biological data analysis: A survey,” IEEE/ACM Trans. Computat. Biol. Bioinform., vol. 1, no. 1, pp. 24–45, 2004.[8] L. Carrivick, et al., Identif ication of prognostic sig-natures in breast cancer microarray data using Bayesian techniques,” J. R. Soc. Interface, vol. 3, no. 8, pp. 367–381, 2006. [9]. A. Ben-Hur, A. Elisseeff, and I. Guyon, “A stabil-ity based method for discovering structure in clus-tered data,” in Biocomputing (Proc. Pacific Symp.), vol. 7, R. B. Altman and K. Lauderdalc, Eds. Kauai, Hawaii, USA, 2002, pp. 6–17.[10] M. K. Kerr and G. A. Churchill, “Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments,” Proc. Natl. Acad. Sci., vol. 98, pp. 8961–8965, 2001. [11] L. I. Kuncheva and D. P. Vetrov, “Evaluation of sta-bility of k-means cluster ensembles with respect to ran-dom initialization,” IEEE Trans. Pattern Anal. Machine Intell., vol. 28, no. 11, pp. 1798–1808, 2006.[12] A. Gionis, H. Mannila, and P. Tsaparas, “Cluster-ing aggregation,” ACM Trans. Knowl. Discov. Data, vol. 1, no. 1, 2007. [13] N. Nguyen and R. Caruana, “Consensus clustering,” in Proc. 7th IEEE Int. Conf. Data Mining (ICDM), 2007, pp. 607–612.[14] R. Caruana, M. Elhawary, N. Nguyen, and C. Smith, “Meta clustering,” in Proc. 6th IEEE Int. Conf. Data Mining (ICDM), 2006, pp. 107–118.[15] A. Ciaramella, et al., “Interactive data analysis and clustering of genomic data,” Neural Netw., vol. 21, pp. 368–378, 2008. [16] J. A. Lee and M. Verleysen, Nonlinear Dimensionality Reduction. New York: Springer-Verlag, 2007. [17] J. Vesanto, “SOM-based data visualization methods,” Intell. Data Anal., vol. 3, pp. 111–126, 1999.

[18] P. J. G. Lisboa, I. O. Ellis, A. R. Green, F. Ambrogi, and M. B. Dias, “Cluster-based visualisation with scat-ter matrices,” Pattern Recogn. Lett., vol. 29, no. 13, pp. 1814–1823, 2008.[19] J. Hasty, D. McMillen, F. Isaacs, and J. J. Collins, “Com-putational studies of gene regulatory networks: In numero molecular biology,” Nature Rev. Gene., vol. 2, pp. 268–279, 2001. [20] H. de Jong, “Modeling and simulation of genetic regulatory systems: A literature review,” J. Computat. Biol., vol. 9, no. 1, pp. 67–103, 2002.[21] E. J. Borowski and J. M. Borwein, Computational Modeling of Genetic and Biochemical Networks. Cambridge, MA: MIT Press, 2001.[22] N. Friedman, “Inferring cellular networks using prob-abilistic graphical models,” Science, vol. 303, p. 799, 2004. [23] C. Needham, J. Bradford, A. Bulpitt, and D. West-head, “A primer on learning in Bayesian networks for computational biology,” PLOS Computat. Biol., vol. 3, no. 8, pp. 1409–1416, 2007.[24] I. Gat-Viks, A. Tanay, D. Raijaman, and R. Shamir, “A probabilistic methodology for integrating knowledge and experiments on biological networks,” J. Computat. Biol., vol. 13, no. 2, pp. 165–181, 2006.[25] J. Schafer and K. Strimmer, “An empirical Bayes ap-proach to inferring large-scale gene association networks,” Bioinformatics, vol. 21, no. 6, pp. 754–764, 2005.[26] K. Basso, et al., “Reverse engineering of regulatory networks in human B cells,” Nature Gene., vol. 37, no. 4, pp. 382–290, 2005. [27] P. Sajda, “Machine learning for detection and di-agnosis of disease,” Annu. Rev. Biomed. Eng., vol. 8, pp. 537–565, 2006. [28] Y. Huang, P. J. G. Lisboa, and W. El-Deredy, “Tu-mour grading from magnetic resonance spectroscopy: A comparison of feature extraction with variable selection,” Stat. Med., vol. 22, no. 1, pp. 147–164, 2003.[29] F. Szabo De Edelenyi, et al., “A new approach for analyzing proton magnetic resonance spectroscopic im-

ages of brain tumors: Nosologic images,” Nature Med., vol. 6, pp. 1287–1289, 2000. [30] M. De Vos, T. Laudadio, A.W. Simonetti, A. Heerschap, and S. Van Huffel, “Fast nosologic imaging of the brain,” J. Magnet. Reson., vol. 184, no. 2, pp. 292–301, 2007.[31] A. H. Jacobs, et al., “Imaging in neurooncology,” J. Amer. Soc. Exp. NeuroTherapeu., vol. 2, pp. 333–347, 2005. [32] S. Cha, “Neuroimaging in neuro-oncology,” J. Amer. Soc. Exp. NeuroTherapeu., vol. 6, pp. 465–477, 2009. [33] R. Demicheli, P. Valagussa, and G. Bonadonna, “Double-peaked time distribution of mortality for breast cancer patients undergoing mastectomy,” Breast Cancer Res. Treat., vol. 75, pp. 127–134, 2002. [34] J. A. Woolgar, “The predictive value of detailed histolog-ical staging of surgical resection specimens in oral cancer,” in Outcome Prediction in Cancer, A. F. G. Taktak and A. C. Fisher, Eds. U.K.: Elsevier, 2007, pp. 3–26.[35] E. Biganzoli, P. Boracchi, L. Mariani, and E. Marubini, “Feed forward neural networks for the analysis of censored survival data: A partial logistic regression approach,” Stat. Med., vol. 17, pp. 1269–1206, 1998. [36] A. Eleuteri, R. Tagliaferri, L. Milano, S. De Placido, and M. De Laurentiis, “A novel neural network-based survival analysis model,” Neural Netw., vol. 16, no. 5–6, pp. 855–864, 2003.[37] A. Taktak, et al., “Double-blind evaluation and bench-marking of survival models in a multi-centre study,” Comput. Biol. Med., vol. 37, no. 8, pp. 1108–1120, 2007. [38] P. J. G. Lisboa, et al., “Partial logistic artif icial neural network for competing risks regularised with automatic relevance determination,” IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1403–1416, 2009. [39] T. Rögnvaldsson, T. A. Etchells, L. You, D. Garwicz, I. H. Jarman, and P. J. G. Lisboa, “How to find simple and accurate rules for viral protease cleavage specificities,” BMC Bioinform., vol. 10, pp. 149, 2009.

From Imagination to Market

IEEE Expert NowThe Best of IEEE Conferences and Short Courses

Free Trial!Experience IEEE – request a trial for your company.

www.ieee.org/expertnow

An unparalleled education resource that provides the latestin related technologies.

Keep up-to-date on the latest trends in related technologies Interactive content via easy-to-use player-viewer, audio and

video fi les, diagrams, and animations Increases overall knowledge beyond a specifi c discipline 1-hour courses accessible 24/7

“With IEEE, we have24/7 access to the technical informationwe need exactly whenwe need it.”

– Dr. Bin Zhao, Senior Manager, RF/Mixed Signal Design Engineering, Skyworks Solutions

07-PIM-0181f Expert Now Half.ind1 1 7/10/07 10:40:27 AM

Authorized licensed use limited to: LIVERPOOL JOHN MOORES UNIVERSITY. Downloaded on February 12, 2010 at 05:38 from IEEE Xplore. Restrictions apply.