Predicting Stroke Patient Recovery from Brain Images: A Machine Learning Approach


Uploaded by alastaircharlessmith on 22 November 2014


PREDICTING STROKE PATIENT RECOVERY FROM BRAIN IMAGES: A MACHINE LEARNING APPROACH

Alastair Smith

Supervised by Prof. Glyn Humphreys


Introduction

Objectives

Can machine learning techniques applied to Computed Tomography (CT) brain imaging data provide meaningful predictions of functional recovery in stroke patients?

Which of several machine learning techniques provides the most accurate predictions?

Which aspects of the images are used by the machine learning algorithms to inform their predictions?


Introduction

Stroke: The Consequences

Recovery & Rehabilitation:

Effects include physical disability, loss of cognitive and communication skills, and mental health problems.

Recovery programmes are specific to each patient's symptoms and commonly require intervention from physiotherapists, psychologists, occupational therapists, speech therapists, and specialist nurses and doctors.

A third of patients make a close to full physical recovery and are able to live independently, a third will require assistance with daily activities, and a third will die within a year. (http://www.nhs.uk)

Impact in the U.K. (National Stroke Strategy, 2007)

Every year approximately 110,000 people in England have a stroke, and over 900,000 people currently living in England have had a stroke.

Stroke is the single largest cause of adult disability: a third of people who have a stroke are left with long-term disability.

Stroke costs the NHS and the economy about £7 billion a year. Despite U.K. services being among the most expensive, outcomes for U.K. patients are comparatively poor, with unnecessarily long lengths of stay and high levels of avoidable disability and mortality.


Introduction

Machine Learning & Brain Imaging (1)

Machine Learning Techniques:

Increasingly influential in neuroscience and clinical medicine (Bellazzi & Zupan, 2008)

Used to inform individual patient management and to select appropriate treatments (Seker et al, 2003)

Brain Imaging Data:

Large number of features, small number of samples

Methods must therefore avoid the ‘overfitting’ problem
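A short synthetic sketch makes the features-versus-samples problem concrete. It is written in Python with NumPy. The data, the sizes and the minimum-norm least-squares classifier are illustrative assumptions, not the study's pipeline. There are 5,000 random features, only 40 training samples, and labels that are pure noise. The linear model still separates the training set perfectly, yet it performs at chance on new samples. That is the overfitting that methods for brain imaging data have to guard against.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 40, 1000, 5000  # illustrative sizes, not the study's

# Random "images" with labels that carry no real signal (+1 / -1)
X_train = rng.standard_normal((n_train, n_features))
y_train = rng.choice([-1.0, 1.0], size=n_train)
X_test = rng.standard_normal((n_test, n_features))
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Minimum-norm least-squares fit: with far more features than samples
# it can reproduce the training labels exactly
w = np.linalg.pinv(X_train) @ y_train

train_acc = np.mean(np.sign(X_train @ w) == y_train)
test_acc = np.mean(np.sign(X_test @ w) == y_test)
print(f"training accuracy: {train_acc:.2f}")  # 1.00: the model memorises noise
print(f"test accuracy:     {test_acc:.2f}")   # close to 0.50: no generalisation
```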


Introduction

Machine Learning & Brain Imaging (2)

MRI & fMRI

Support Vector Machine (SVM) applied to MRI data

Ecker et al (2010), Autistic Spectrum Disorder

Kloppel et al (2008), Alzheimer's Disease (acc = 96%, n = 68)

Detection of other diseases: Fan et al (2005), Kawasaki et al (2007)

SVM applied to fMRI data

Classifiers developed to distinguish between stimuli, mental states and behaviours, demonstrating that the data contain sufficient information to do so

For review see Norman et al (2006) and Haynes & Rees (2006)

Saur et al (2010): predicted recovery of stroke patients' language abilities after 6 months (acc = 76%, n = 21)

Relevance Vector Regression (RVR) applied to fMRI data

Stonnington et al (2010):

Predicted a continuous measure

Clinical measures of Alzheimer's Disease

Predicted and actual scores highly correlated (p<0.0001, n = 163)


Introduction

Machine Learning & Brain Imaging (2)

PET & RVM

Phillips et al (2011):

Distinguish between levels of consciousness

Acc = 100%, n = 58

Computed Tomography (CT)

Automated image segmentation, Li et al (2006)

Haemorrhage detection, Liu et al (2008)

Reid et al (2010):

CT-derived variables did not significantly improve multivariate logistic regression models' predictions of functional recovery in stroke patients


Method

Nottingham Extended ADL

Ranked assessment of patients' ability to complete activities of daily living (ADL) independently

Developed specifically for use with stroke patients (Nouri & Lincoln, 1987)

Completed by patient or carer via post or interview

Demonstrated to be a useful measure of outcome in stroke research

Gladman et al (1993)

Cited in 14 studies as a measure of stroke patient outcomes (Green et al, 2001)

Composed of 21 questions, split into 4 subsections:

Mobility, Kitchen, Domestic, Leisure

High scores indicate low disability

Maximum score = 21, Minimum score = 0


Method

Data Acquisition

Participants

Patients admitted to stroke units within the West Midlands area

Recruited as part of Birmingham University Cognitive Screen (BUCS) project

All patients selected for current study had suffered ischemic stroke


Inclusion Criteria:

• Informed Consent

• New Acute Stroke

• Alert

• Sufficient English Comprehension

Exclusion Criteria:

• Unwell

• Decline to participate

• Concentration span <35mins

Data set   Age     Time from stroke to scan (days)   Time from stroke to testing (days)   n
NEADL      69.54   1.79                              299.3                                155


Method

NEADL data sets

Data set             Score    n    Mean   SD
Good Recovery        >= 17    65   19.3   1.46
Poor Recovery        < 17     90   9.02   4.72
Very Good Recovery   >= 17    65   19.3   1.46
Very Poor Recovery   <= 12    65   14.5   1.24

[Figure: histogram of NEADL scores (x-axis: NEADL score, 0 to 21; y-axis: number of patients, 0 to 20), split into Good Recovery and Poor Recovery, with Very Good Recovery marked as the top 42% and Very Poor Recovery as the bottom 42% of scores]
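The two labelling schemes on this slide can be sketched in NumPy. The first is a threshold split at NEADL >= 17. The second is an "extremes" split that keeps only the top and bottom 42% of patients. The scores below are hypothetical, and the tie-breaking rule at the percentile boundaries is an assumption.

```python
import numpy as np

def threshold_labels(scores, cutoff=17):
    """Label each patient 'Good' (NEADL >= cutoff) or 'Poor' (NEADL < cutoff)."""
    scores = np.asarray(scores)
    return np.where(scores >= cutoff, "Good", "Poor")

def extreme_groups(scores, fraction=0.42):
    """Keep only the top and bottom `fraction` of patients by NEADL score.

    Returns (indices, labels); patients in the middle band are dropped.
    Ties at the boundaries are broken by sort order (an assumption).
    """
    scores = np.asarray(scores)
    n_keep = int(round(fraction * len(scores)))
    order = np.argsort(scores, kind="stable")  # ascending
    bottom = order[:n_keep]
    top = order[len(scores) - n_keep:]
    indices = np.concatenate([bottom, top])
    labels = np.array(["Very Poor"] * n_keep + ["Very Good"] * n_keep)
    return indices, labels

# Hypothetical NEADL scores for ten patients (0-21 scale)
scores = [3, 8, 12, 14, 15, 16, 17, 19, 20, 21]
print(threshold_labels(scores))
idx, lab = extreme_groups(scores)
print([(scores[i], l) for i, l in zip(idx, lab)])
```

With 155 patients, 42% of 155 rounds to 65 per group, which matches the group sizes in the table.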


Method

Data Acquisition

Computed Tomography (CT) images:

Capture density of tissue

In-plane resolution 0.5 × 0.5 mm², slice thickness 4-5 mm

Whole brain

Pre-processing & Image Compression

Images of poor quality (due to head movement or other imaging issues) removed from sample

Images normalised to an in-house CT template (Ashburner & Friston, 2003) using SPM8

Images segmented using SPM8 unified segmentation (Seghier et al, 2005) into Grey Matter, White Matter and Cerebrospinal Fluid images

A further Abnormal Tissue class was produced by adding an extra tissue probability map (Seghier et al, 2008)

Grey and White Matter images smoothed with a 12 mm (isotropic) FWHM Gaussian kernel
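Smoothing kernels are usually given by their full width at half maximum (FWHM). The Gaussian standard deviation is σ = FWHM / (2√(2 ln 2)) ≈ FWHM / 2.355, so a 12 mm FWHM corresponds to σ ≈ 5.1 mm.

The NumPy sketch below applies a separable 3D Gaussian to a volume. It is not SPM's own smoothing routine. The 2 mm isotropic voxel size is an assumption, because the voxel size after normalisation is not stated here.

```python
import numpy as np

def fwhm_to_sigma(fwhm_mm, voxel_mm):
    """Convert a kernel FWHM in mm to a Gaussian SD in voxels."""
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm

def gaussian_kernel_1d(sigma_vox):
    """Normalised 1D Gaussian sampled out to +/- 4 SD."""
    radius = int(np.ceil(4 * sigma_vox))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma_vox) ** 2)
    return k / k.sum()

def smooth_volume(vol, fwhm_mm=12.0, voxel_mm=2.0):
    """Separable 3D Gaussian smoothing with zero padding at the edges.

    Each axis must be at least as long as the kernel (2 * ceil(4 * sigma) + 1 voxels).
    """
    kernel = gaussian_kernel_1d(fwhm_to_sigma(fwhm_mm, voxel_mm))
    out = np.asarray(vol, dtype=float)
    for axis in range(3):  # smooth along x, then y, then z
        out = np.apply_along_axis(
            lambda line: np.convolve(line, kernel, mode="same"), axis, out)
    return out

vol = np.zeros((32, 32, 32))
vol[16, 16, 16] = 1.0                       # a single bright voxel
smoothed = smooth_volume(vol)               # 12 mm FWHM at an assumed 2 mm voxel size
print(round(fwhm_to_sigma(12.0, 1.0), 2))   # sigma in mm for a 12 mm FWHM: 5.1
print(smoothed.sum())                       # total intensity is preserved (about 1.0)
```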


Method

Training & Testing

Cross Validation

Applied in 5 folds

Data set(s) randomly divided into 5 equal test sets

In each fold

Model trained on all samples not present in test set

Model tested on ability to assign correct labels to test set
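The 5-fold procedure can be sketched in NumPy as follows. A nearest-class-mean classifier stands in for the SVM, SLR and RVM models so that the block is self-contained. It is a hypothetical choice, and the data are synthetic.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k (near-)equal test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def nearest_mean_fit(X, y):
    # Hypothetical stand-in model: one mean vector per class
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_mean_predict(model, X):
    classes = list(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

def cross_validate(X, y, k=5, seed=0):
    """Return per-fold test accuracy; the reported figure is their mean."""
    accuracies = []
    for test_idx in k_fold_indices(len(y), k, seed):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # train on every sample not in the test fold
        model = nearest_mean_fit(X[train_mask], y[train_mask])
        y_pred = nearest_mean_predict(model, X[test_idx])
        accuracies.append(np.mean(y_pred == y[test_idx]))
    return np.array(accuracies)

# Synthetic example: 155 samples, 50 features, a weak class difference
rng = np.random.default_rng(1)
y = np.array([1] * 65 + [0] * 90)
X = rng.standard_normal((155, 50)) + 0.3 * y[:, None]
fold_acc = cross_validate(X, y)
print(fold_acc.round(2), "mean:", fold_acc.mean().round(2))
```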

Measures of performance

Performance measures record mean performance across all 5 folds

Accuracy = Proportion of all samples correctly classified

Specificity = Proportion of ‘Bad’ (poor recovery) samples correctly classified as ‘Bad’

Sensitivity = Proportion of ‘Good’ (good recovery) samples correctly classified as ‘Good’

MCC = Matthews Correlation Coefficient (Matthews, 1975)

Common measure of classifier performance within the machine learning literature

Balanced measure that remains informative when class sizes are uneven

Equivalent to the phi coefficient between predicted and true labels

+1 = perfect prediction, 0 = no better than random, -1 = complete disagreement
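All four measures can be computed from the confusion matrix, treating ‘Good’ recovery as the positive class as the sensitivity definition implies. MCC = (TP·TN - FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The labels in this plain-Python sketch are hypothetical.

```python
import math

def classification_metrics(y_true, y_pred, positive="Good"):
    """Accuracy, sensitivity, specificity and MCC for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # 'Good' cases correctly classified
    specificity = tn / (tn + fp)   # 'Bad' cases correctly classified
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

# Hypothetical test-fold labels
y_true = ["Good"] * 4 + ["Bad"] * 6
y_pred = ["Good", "Good", "Good", "Bad", "Bad", "Bad", "Bad", "Bad", "Good", "Bad"]
print(classification_metrics(y_true, y_pred))
```

Here TP = 3, FN = 1, TN = 5 and FP = 1. That gives accuracy 0.8, sensitivity 0.75, specificity ≈ 0.83 and MCC = 14/24 ≈ 0.58.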


Method

Improving Efficiency

Recursive Feature Elimination (RFE):

Features with the smallest absolute weights attributed by the model are eliminated iteratively

On each iteration:

Feature with the lowest weight identified and eliminated from the training data

New model trained on the reduced training set

Training therefore becomes focused on voxels to which high weights are assigned
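The RFE loop can be sketched in NumPy as below. To stay self-contained, the sketch takes its weights from a ridge-regularised least-squares linear classifier. That classifier is a hypothetical stand-in for the SVM weights used in the study. As described above, the loop removes the single lowest-|w| feature per iteration and retrains. With tens of thousands of voxels, removing features in batches is a common practical shortcut.

```python
import numpy as np

def linear_weights(X, y, lam=1.0):
    """Ridge least-squares weights for labels y in {-1, +1} (stand-in for SVM weights)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly drop the feature with the smallest |weight| and retrain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = linear_weights(X[:, remaining], y)  # retrain on the surviving features
        weakest = int(np.argmin(np.abs(w)))
        remaining.pop(weakest)                  # eliminate it from the training data
    return remaining

# Synthetic data: only features 0 and 1 carry signal
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=200)
X = rng.standard_normal((200, 20))
X[:, 0] += 2.0 * y
X[:, 1] -= 1.5 * y
print(recursive_feature_elimination(X, y, n_keep=2))  # should keep [0, 1]
```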

Principal Component Analysis (PCA):

Reduces the dimensionality of the data set

Transforms a set of correlated variables into a smaller set of uncorrelated variables

[Figure: PCA applied to a 2D data set (Jehan, 2005)]
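A NumPy sketch of PCA via the singular value decomposition follows. It keeps the fewest components that explain at least 99% of the variance, which is one reading of the "99%" setting in the results. Inside cross-validation, the projection would normally be fitted on the training folds only and then applied to the test fold. The 2D data below are synthetic, in the spirit of the figure.

```python
import numpy as np

def pca_fit(X, variance_to_keep=0.99):
    """Fit PCA via SVD and keep the fewest components explaining >= variance_to_keep."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the retained, mutually uncorrelated components."""
    return (X - mean) @ components.T

# Two strongly correlated variables: nearly all variance lies along one direction
rng = np.random.default_rng(0)
t = rng.standard_normal(500)
X = np.column_stack([t, 0.5 * t + 0.05 * rng.standard_normal(500)])
mean, comps = pca_fit(X)
Z = pca_transform(X, mean, comps)
print(comps.shape, Z.shape)  # (1, 2) (500, 1): one component explains > 99%
```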


Method

Machine Learning Techniques

Support Vector Machine (Classifier):

Images treated as points in a high-dimensional space

SVM aims to identify a hyperplane that separates the two classes while maximising the margin, the distance between the hyperplane and the nearest images of each class

The hyperplane is defined by the set of images (support vectors) that lie on the maximal margin

Joachims (2002, 1999), based on Vapnik (1995)
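This sketch is not the solver used in the study. It is a compact, hypothetical stand-in that trains a linear soft-margin SVM in its primal form using a Pegasos-style stochastic sub-gradient method in NumPy. The data are two synthetic clusters. Points with margin ≤ 1 after training approximate the support vectors.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, n_iter=20000, seed=0):
    """Stochastic sub-gradient descent on the primal SVM objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w))), labels y in {-1, +1}.
    A constant column is appended so the last weight acts as a (regularised) bias."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(y))
        eta = 1.0 / (lam * t)            # decreasing step size
        margin = y[i] * (Xb[i] @ w)
        w *= (1.0 - eta * lam)           # gradient step for the regulariser
        if margin < 1.0:                 # hinge loss active: move towards the correct side
            w += eta * y[i] * Xb[i]
    return w

def svm_predict(w, X):
    return np.sign(np.hstack([X, np.ones((X.shape[0], 1))]) @ w)

# Two synthetic clusters, labels +1 and -1
rng = np.random.default_rng(1)
y = np.repeat([1.0, -1.0], 100)
X = rng.standard_normal((200, 2)) + 2.0 * y[:, None]
w = train_linear_svm(X, y)
acc = np.mean(svm_predict(w, X) == y)
margins = y * (np.hstack([X, np.ones((200, 1))]) @ w)
print(f"training accuracy: {acc:.2f}; points on or inside the margin: {np.sum(margins <= 1.0)}")
```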

Sparse Logistic Regression (Classifier):

Logistic regression method applied within Bayesian framework

A sparse, zero-mean Gaussian prior is assumed over the weights

Iterative algorithm in which the least informative features are pruned according to their assigned weights

Yamashita et al (2008)
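Yamashita et al's SLR is a Bayesian method and is too involved to sketch briefly. As a loosely analogous, non-Bayesian stand-in, this NumPy sketch fits L1-penalised logistic regression by proximal gradient descent (ISTA). Like SLR, it drives uninformative weights to exactly zero. The penalty value and the synthetic data are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l1_logistic_regression(X, y, lam=0.08, n_iter=2000):
    """L1-penalised logistic regression (labels y in {0, 1}) by proximal gradient descent.
    Not Yamashita et al's SLR; a non-Bayesian analogue that yields exact zeros."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    # Step 1/L, where L bounds the Lipschitz constant of the mean logistic loss on [X, 1]
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        w = w - step * (X.T @ (p - y) / n)
        b = b - step * np.mean(p - y)
        # Soft-thresholding: the proximal step for the L1 penalty (bias left unpenalised)
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w, b

# Synthetic data: only the first three of 30 features are informative
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 30))
true_w = np.zeros(30)
true_w[:3] = [2.0, -2.0, 1.5]
y = (rng.random(1000) < sigmoid(X @ true_w)).astype(float)
w, b = l1_logistic_regression(X, y, lam=0.08)
print("features with non-zero weight:", np.flatnonzero(w))
```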

Relevance Vector Machine (Classification & Regression)

Applies Bayesian techniques within a functional form similar to that of an SVM

Probabilistic model, and therefore able to indicate the probability of class membership

By altering the conditional distribution of the target variable, RVMs can be applied to both classification and regression problems

Tipping et al (2001, 2003)
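The regression form of this approach (used for RVR) can be sketched in NumPy with Tipping's sparse Bayesian re-estimation rules. Each weight has its own prior precision α. Weights whose α diverges are pruned, and β is the noise precision.

For brevity, the basis functions here are the raw features plus a bias. An RVM proper uses kernel basis functions K(x, xᵢ) centred on the training images, so the surviving weights pick out "relevance vectors". Passing `[1, K(X, X)]` as `Phi` gives that form. The initial values and the synthetic data are assumptions.

```python
import numpy as np

def sparse_bayes_regression(Phi, t, n_iter=1000, alpha_max=1e9, tol=1e-6):
    """Tipping-style sparse Bayesian regression (the RVM's evidence-maximisation loop).

    Phi: (n, m) design matrix of basis functions; t: (n,) continuous targets.
    Each weight w_j has a zero-mean Gaussian prior with its own precision alpha_j;
    weights whose alpha_j diverges are pruned. Returns (kept_columns, mu, beta)."""
    n, m = Phi.shape
    alpha = np.ones(m)
    beta = 10.0 / np.var(t)                  # initial noise precision (a heuristic)
    keep = np.arange(m)
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))     # posterior covariance
        mu = beta * Sigma @ P.T @ t                                        # posterior mean
        gamma = np.clip(1.0 - alpha[keep] * np.diag(Sigma), 1e-12, 1.0)  # how well-determined each weight is
        new_alpha = gamma / (mu ** 2 + 1e-12)
        beta = (n - gamma.sum()) / np.sum((t - P @ mu) ** 2)
        change = np.max(np.abs(np.log(new_alpha / alpha[keep])))
        alpha[keep] = new_alpha
        keep = keep[new_alpha < alpha_max]   # prune irrelevant basis functions
        if change < tol:
            break
    P = Phi[:, keep]                         # final posterior on the survivors
    Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
    mu = beta * Sigma @ P.T @ t
    return keep, mu, beta

# Synthetic regression: only features 0 and 1 matter
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
t = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(100)
Phi = np.hstack([np.ones((100, 1)), X])      # column 0 = bias, column j + 1 = feature j
keep, mu, beta = sparse_bayes_regression(Phi, t)
pred = Phi[:, keep] @ mu
r = np.corrcoef(pred, t)[0, 1]
rmse = np.sqrt(np.mean((pred - t) ** 2))
print("kept columns:", keep, f"r = {r:.2f}, RMSE = {rmse:.2f}")
```

Pearson's r and RMSE, computed on the last line, are the same two measures reported for RVR in the results.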


[Figure (SVM): optimal separating hyperplane defined by the set of support vectors]


Results

NEADL Results (SVM)

Configuration   Tissue type   Accuracy (max)   Accuracy (mean)   Sensitivity (max)   Specificity (max)   MCC (max)   p < (max)
Standard        UnG           65%              n/a               54%                 73%                 0.27        0.001
with PCA        AbT           69%              59%               46%                 87%                 0.30        0.001
with RFE        AbT           69%              62%               66%                 71%                 0.37        0.0001
99% Var         AbT           70%              60%               66%                 73%                 0.40        0.0001
Extremes        SmG           74%              65%               71%                 76%                 0.48        0.0001

UnG = unsmoothed grey matter; AbT = abnormal tissue; SmG = smoothed grey matter


[Figure: relevance maps shown in frontal section, horizontal plane and sagittal plane (R/L orientation marked). Maps thresholded at 90%: only voxels whose absolute weights attributed by the model fall in the top 10% are shown. Blue = negative weight; red = positive weight.]
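The 90% threshold on the relevance maps can be sketched in NumPy as follows. The sketch keeps only weights whose absolute value is at or above the 90th percentile and preserves their sign for red/blue display. The 20-element weight vector is hypothetical; in practice there is one weight per template voxel.

```python
import numpy as np

def threshold_relevance_map(weights, percentile=90.0):
    """Zero all voxel weights whose absolute value is below the given percentile,
    keeping the sign so positive (red) and negative (blue) weights can be shown separately."""
    weights = np.asarray(weights, dtype=float)
    cutoff = np.percentile(np.abs(weights), percentile)
    return np.where(np.abs(weights) >= cutoff, weights, 0.0)

# Hypothetical weights for 20 voxels
w = np.linspace(-1.0, 1.0, 20)
thresholded = threshold_relevance_map(w)
print(np.flatnonzero(thresholded > 0), np.flatnonzero(thresholded < 0))  # [19] [0]
```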


Results

NEADL Results (SVM, SLR, RVM & RVR)

Model  Configuration                                Tissue  Accuracy / Pearson's r (max)  Accuracy / Pearson's r (mean)  Sensitivity (max)  Specificity (max)  MCC (max) / RMSE (min)  p < (max)
SVM    Standard                                     UnG     65%                           n/a                            54%                73%                0.27                    0.001
SVM    with PCA                                     AbT     69%                           59%                            46%                87%                0.30                    0.001
SVM    with RFE                                     AbT     69%                           62%                            66%                71%                0.37                    0.0001
SVM    99% Var                                      AbT     70%                           60%                            66%                73%                0.40                    0.0001
SVM    Extremes                                     SmG     74%                           65%                            71%                76%                0.48                    0.0001
SLR    Standard                                     UnG     58%                           n/a                            50%                63%                0.13                    0.15
SLR    with PCA (99%) & RFE                         AbT     68%                           58%                            74%                62%                0.37                    0.0001
RVM    Standard                                     SmG     67%                                                          53%                76%                0.33                    0.0001
RVM    with PCA (99%) & RFE                         AbT     69%                           58%                            77%                62%                0.40                    0.0001
RVR    Standard                                     UnG     r = 0.28                      n/a                                                                      RMSE = 6.75             0.001
RVR    with PCA (99%), RFE & Standardised Scores    AbT     r = 0.39                      r = 0.35                                                                 RMSE = 0.76             0.0001


Discussion

Summary

Abnormal Tissue, Smoothed Grey Matter and Unsmoothed Grey Matter consistently outperform other tissue types

Application of PCA and RFE improves model performance

Best performance produced when the model is trained on the extreme samples within the data set

RVM, SVM & SLR classifiers predict patient recovery with significant levels of accuracy (p<0.001)

SVM & RVM produce similar levels of performance, and both outperform SLR

RVR predictions are significantly correlated with true scores (p<0.001; max r = 0.39)


Discussion

Wider Implications

Performance comparable to results in the literature

Saur et al (2010) predict language outcome 6 months after stroke with 76% accuracy using an SVM classifier

Stonnington et al (2010) find a significant correlation between predicted and actual clinical measures of Alzheimer's Disease (p<0.0001)

Stroke lesions are generally more heterogeneous than those typically found in Alzheimer's Disease patients

Few studies in the current literature apply machine learning to CT data to predict patient recovery


Discussion

Methodological Issues

Model evaluation and selection

Maximum values may partly reflect noise rather than true model performance

Accepted methods of evaluation and model selection:

Average across 100 trials with sample order randomised

Adapt the algorithm to select the point at which performance peaks

Compare against 100 random trials in which scores are randomly reassigned
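Comparing a model against trials in which scores are randomly reassigned is a permutation test. A minimal NumPy sketch follows. It assumes a user-supplied scoring function; here a hypothetical absolute correlation between one feature and the label stands in for mean cross-validated accuracy. The scoring function could itself average over reshuffled sample orders, as in the first method listed.

```python
import numpy as np

def permutation_p_value(score_fn, X, y, n_permutations=100, seed=0):
    """Compare the observed score with scores obtained after randomly reassigning labels.

    score_fn(X, y) -> float, higher is better (e.g. mean cross-validated accuracy).
    Returns (observed, null_scores, p), with p = (1 + #null >= observed) / (1 + n_permutations)."""
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    null_scores = np.array([score_fn(X, rng.permutation(y)) for _ in range(n_permutations)])
    p = (1 + np.sum(null_scores >= observed)) / (1 + n_permutations)
    return observed, null_scores, p

# Synthetic data in which feature 0 differs between classes
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.standard_normal((100, 3))
X[:, 0] += 1.0 * y
obs, null, p = permutation_p_value(
    lambda X, y: abs(np.corrcoef(X[:, 0], y)[0, 1]), X, y)
print(f"observed = {obs:.2f}, p = {p:.3f}")
```

With 100 permutations the smallest attainable p-value is 1/101 ≈ 0.0099. Resolving smaller p-values needs more permutations.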


Discussion

Future Study

Improving Performance:

Poor performance currently restricts application to patient management or assessment of intervention programs

Additional Variables – e.g. blood vessel affected

Isolate ROI:

Informed by literature (Saur et al, 2010)

Weight maps (Ecker et al, 2010)

Ensemble methods (Opitz, 1999):

Train on individual lobes

Bootstrap Aggregating
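Bootstrap aggregating (bagging) trains many models on resampled copies of the training set and combines their votes. The NumPy sketch below is generic. The nearest-class-mean base learner and the synthetic data are hypothetical stand-ins for the classifiers used in this study.

```python
import numpy as np

def bagging_predict(fit, predict, X_train, y_train, X_test, n_models=25, seed=0):
    """Fit n_models on bootstrap resamples of the training set and combine
    their {-1, +1} votes by majority (ties go to +1)."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # sample n cases with replacement
        model = fit(X_train[idx], y_train[idx])
        votes += predict(model, X_test)
    return np.where(votes >= 0, 1, -1)

def fit_nearest_mean(X, y):
    # Hypothetical base learner: class means for labels +1 and -1
    return X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)

def predict_nearest_mean(model, X):
    pos, neg = model
    closer_to_pos = np.linalg.norm(X - pos, axis=1) <= np.linalg.norm(X - neg, axis=1)
    return np.where(closer_to_pos, 1, -1)

rng = np.random.default_rng(0)
y_train = np.repeat([1, -1], 50)
X_train = rng.standard_normal((100, 5)) + 1.0 * y_train[:, None]
y_test = np.repeat([1, -1], 100)
X_test = rng.standard_normal((200, 5)) + 1.0 * y_test[:, None]
y_hat = bagging_predict(fit_nearest_mean, predict_nearest_mean, X_train, y_train, X_test)
print(f"bagged test accuracy: {np.mean(y_hat == y_test):.2f}")
```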

Predict improvement in ADL scores

Saur et al, 2010

Investigate role of weighted voxels


Discussion

Acknowledgments

Alan Meeson

Provided:

Original code for machine learning algorithms

Support and guidance throughout project

Vaia Lestou

Assisted in the design and analysis of current study
