Use of Active Learning for Selective Annotation of Training Data in a Supervised Classification System for Digitized Histology

Scott Doyle¹, Michael Feldman², John Tomaszewski², Anant Madabhushi¹

¹ Department of Biomedical Engineering, Rutgers, The State University of New Jersey
² Department of Surgical Pathology, University of Pennsylvania

http://lcib.rutgers.edu

Outline
- Background
  - Digital Prostate Histopathology
  - Supervised Classification
  - Active Learning
- Methodology
  - Active Learning
  - Data Description
  - Experimental Setup
- Experimental Results
- Concluding Remarks

Prostate Cancer Detection

- ~1 million biopsies per year in the USA
- 10-12 tissue samples per biopsy
- 80% of diagnoses are benign
- Large amount of data to analyze
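A rough back-of-the-envelope estimate of the data volume implied by these figures (approximate, since the inputs are approximate, and assuming the 80% figure refers to whole biopsies):

```latex
\begin{aligned}
10^{6}\ \tfrac{\text{biopsies}}{\text{year}} \times (10\text{--}12)\ \tfrac{\text{samples}}{\text{biopsy}} &\approx (1.0\text{--}1.2)\times 10^{7}\ \tfrac{\text{samples}}{\text{year}} \\
0.8 \times 10^{6}\ \tfrac{\text{biopsies}}{\text{year}} &= 8\times 10^{5}\ \text{benign biopsies per year}
\end{aligned}
```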

Computer-Aided Diagnosis

- Identifies regions of interest / suspicion
- Quantitative
- Automated
- Reduces variability
- Supervised classification system

Doyle, S., Feldman, M., Tomaszewski, J., Madabhushi, A. “A Hierarchical Computer-aided Classification Scheme for Automated Detection of Prostatic Adenocarcinoma from Digitized Histology,” APIII 2006

Supervised Classification

- Requires expert segmentation for training
- Histopathology:
  - Expensive and time-consuming to annotate
  - Cost per training sample is high

Supervised Classification

- Random selection of training samples is inefficient:
  - Possible redundancy with existing training data
  - No guarantee of improved accuracy

Solution: Active Learning

- Choose training samples intelligently, not randomly
  - Increased accuracy per training sample
  - Forced choice of training samples, maximized accuracy
- Useful where:
  - There is a large amount of unlabeled data
  - Annotations are expensive
- Ideally suited for histopathology data

Active Learning

[Figure: Classifier Performance, plotting Accuracy against # of Training Samples for Random Learning and Active Learning]

Previous Work

- Liu [2004], Vogiatzis and Tsapatsoulis [2006]: gene microarray data
- Yao, et al. [2008]: content-based image retrieval
- Little work done in histopathology with Active Learning

Active Learning Methodology

- Training data:
  - Labeled: obtained from a pathologist
  - Unlabeled
- Build a classifier from the labeled training data
- Classify the unlabeled training data as Cancer, Non-cancer, or Uncertain

Active Learning Methodology

- Identify informative regions:
  - Uncertain classification: informative samples. Obtain expert labels and combine them with the original training set
  - Certain classification: uninformative samples. Eliminate them, since labeling these adds no information

Active Learning Methodology

- Generate a new classifier from the new training set
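The loop on the methodology slides can be sketched as one round of uncertainty sampling. This is a minimal sketch under stated assumptions, not the authors' implementation: the slides do not define "uncertain", so the hypothetical `margin` band around a score of 0.5 is an assumption; a one-dimensional nearest-centroid classifier stands in for the real feature-image classifier; and `oracle` is a hypothetical function playing the pathologist's role.

```python
from typing import Callable, List, Tuple

# Hypothetical setup for illustration: each sample is one scalar feature value,
# label 1 = cancer, 0 = non-cancer. A nearest-centroid classifier stands in for
# the real classifier (assumption).
Labeled = List[Tuple[float, int]]
Model = Tuple[float, float]  # (non-cancer centroid, cancer centroid)


def train(labeled: Labeled) -> Model:
    """Stand-in classifier: the mean feature value of each class."""
    benign = [x for x, y in labeled if y == 0]
    cancer = [x for x, y in labeled if y == 1]
    return sum(benign) / len(benign), sum(cancer) / len(cancer)


def p_cancer(model: Model, x: float) -> float:
    """Soft score in [0, 1]; approaches 1 as x nears the cancer centroid."""
    mu_benign, mu_cancer = model
    d_benign, d_cancer = abs(x - mu_benign), abs(x - mu_cancer)
    if d_benign + d_cancer == 0:
        return 0.5
    return d_benign / (d_benign + d_cancer)


def active_learning_round(
    labeled: Labeled,
    unlabeled: List[float],
    oracle: Callable[[float], int],
    margin: float = 0.1,
) -> Tuple[Model, Labeled, List[float]]:
    """One round: classify the unlabeled pool, send only uncertain samples to the expert."""
    model = train(labeled)
    # Uncertain (informative): score within `margin` of 0.5 (hypothetical rule).
    # Confidently classified samples are not sent for annotation.
    informative = [x for x in unlabeled if abs(p_cancer(model, x) - 0.5) <= margin]
    # Obtain expert labels (modeled by `oracle`) and combine with the original set.
    new_training = labeled + [(x, oracle(x)) for x in informative]
    # Generate a new classifier from the new training set.
    return train(new_training), new_training, informative


labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
model, training, queried = active_learning_round(
    labeled, [0.2, 4.8, 5.0, 5.3, 9.9], oracle=lambda x: int(x >= 5.0)
)
print(queried)  # [4.8, 5.0, 5.3]
```

Only samples near the decision boundary cost an annotation; the confidently classified 0.2 and 9.9 are skipped.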

Feature Extraction

[Figure panels: Original Image, Cancer Region, Feature Images]
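The slide does not list which features produce the feature images. As an illustrative assumption only, the sketch below builds two simple first-order statistical feature images, local mean and local standard deviation over a square window clipped at the image border; `feature_image`, `radius`, and the choice of statistics are hypothetical.

```python
import math
from typing import Callable, List

Image = List[List[float]]


def _window(image: Image, r: int, c: int, radius: int) -> List[float]:
    """Pixel values in the (2*radius+1)^2 neighborhood of (r, c), clipped at the borders."""
    rows, cols = len(image), len(image[0])
    return [
        image[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def std(values: List[float]) -> float:
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def feature_image(image: Image, stat: Callable[[List[float]], float], radius: int = 1) -> Image:
    """Replace every pixel with a statistic of its neighborhood (hypothetical feature)."""
    rows, cols = len(image), len(image[0])
    return [[stat(_window(image, r, c, radius)) for c in range(cols)] for r in range(rows)]


image = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
mean_img = feature_image(image, mean)
std_img = feature_image(image, std)
```

Each feature image has the same size as the original, so every pixel gets one value per feature for the classifier.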

Classification

- Feature images → C4.5 decision trees
- "Random Forest" [Breiman, 2001]: majority voting determines the classification

Doyle, S., Madabhushi, A., Feldman, M., Tomaszewski, J.: A Boosting Cascade for Automated Detection of Prostate Cancer from Digitized Histology, MICCAI, Lecture Notes in Computer Science, Vol. 4191, pp. 504-511, 2006.
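Majority voting over an ensemble of trees can be sketched as follows. This is a simplified stand-in, not the system's classifier: one-split decision stumps replace full C4.5 trees, each stump is trained on a bootstrap sample using one randomly chosen feature, and all function names are hypothetical. The fraction of trees voting "cancer" also gives a soft score, which is one plausible (assumed) way to flag uncertain classifications.

```python
import random
from typing import List, Optional, Sequence, Tuple

# A sample is (feature vector, label) with label 1 = cancer, 0 = non-cancer.
Sample = Tuple[Sequence[float], int]
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Polarity 1: cancer when x[feature] >= threshold; polarity 0: cancer when below it."""
    feature, threshold, polarity = stump
    return int((x[feature] >= threshold) == bool(polarity))


def train_stump(data: List[Sample], feature: int) -> Stump:
    """Fit a one-split 'tree' on one feature, choosing the split with the fewest errors."""
    values = sorted({x[feature] for x, _ in data})
    # -inf gives a constant prediction; midpoints separate adjacent distinct values.
    candidates = [float("-inf")] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best: Optional[Stump] = None
    best_errors = len(data) + 1
    for threshold in candidates:
        for polarity in (1, 0):
            stump = (feature, threshold, polarity)
            errors = sum(stump_predict(stump, x) != y for x, y in data)
            if errors < best_errors:
                best, best_errors = stump, errors
    assert best is not None
    return best


def train_forest(data: List[Sample], n_trees: int = 11, seed: int = 0) -> List[Stump]:
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature.

    `data` must contain both classes, otherwise the redraw loop never ends.
    """
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]
        while len({y for _, y in boot}) < 2:  # redraw single-class bootstraps
            boot = [rng.choice(data) for _ in data]
        forest.append(train_stump(boot, rng.randrange(n_features)))
    return forest


def vote_fraction(forest: List[Stump], x: Sequence[float]) -> float:
    """Fraction of trees voting 'cancer'."""
    return sum(stump_predict(s, x) for s in forest) / len(forest)


def forest_predict(forest: List[Stump], x: Sequence[float]) -> int:
    """Majority vote; an odd number of trees avoids ties."""
    return int(vote_fraction(forest, x) > 0.5)
```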

Image Data Description

- 27 H&E stained digital biopsy samples
- Data breakdown:
  - Initial Training Set
  - Unlabeled Training Set
  - Testing Set
- Active Learning samples are drawn from the Unlabeled Training Set
- Groups are rotated so that all images are tested
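One way to rotate the three data groups so that every image is tested is sketched below. The slide gives neither the group sizes nor the assignment rule, so splitting the 27 images round-robin into three equal groups of 9 is an assumption, as are the image identifiers and the function name `rotate_groups`.

```python
from typing import Dict, List

ROLES = ("initial_training", "unlabeled_training", "testing")


def rotate_groups(image_ids: List[str]) -> List[Dict[str, List[str]]]:
    """Split images into three groups and rotate their roles so every group is tested once."""
    groups = [image_ids[g::3] for g in range(3)]  # round-robin split (assumption)
    folds = []
    for shift in range(3):
        # Shift which group plays which role; over three shifts each group is the test set once.
        folds.append({role: groups[(g + shift) % 3] for g, role in enumerate(ROLES)})
    return folds


image_ids = [f"img{i:02d}" for i in range(27)]  # hypothetical identifiers
folds = rotate_groups(image_ids)
```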

Classification

- Three training groups evaluated:
  - Initial set: Initial Training
  - Active Learning set: Initial Training + Active Learning
  - Random Learning set: Initial Training + Random Learning
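The three training groups can be assembled as sketched below. Giving the Random Learning set the same number of added samples as the Active Learning set (a matched annotation budget) is an assumption the slide does not state; `build_training_sets`, `informative_idx`, and `oracle` are hypothetical names, with `oracle` standing in for the pathologist.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Labeled = List[Tuple[float, int]]  # (feature value, label 1 = cancer / 0 = non-cancer)


def build_training_sets(
    initial: Labeled,
    unlabeled: Sequence[float],
    informative_idx: Sequence[int],
    oracle: Callable[[float], int],
    seed: int = 0,
) -> Dict[str, Labeled]:
    """Initial, Initial + Active Learning, and Initial + Random Learning training sets."""
    # Active Learning: label only the samples flagged as informative.
    active = [(unlabeled[i], oracle(unlabeled[i])) for i in informative_idx]
    # Random Learning: same number of samples, drawn uniformly without replacement.
    rng = random.Random(seed)
    random_idx = rng.sample(range(len(unlabeled)), len(informative_idx))
    randomly_chosen = [(unlabeled[i], oracle(unlabeled[i])) for i in random_idx]
    return {
        "initial": list(initial),
        "active_learning": initial + active,
        "random_learning": initial + randomly_chosen,
    }
```

Matching the budget keeps the comparison fair: any accuracy difference then reflects which samples were labeled, not how many.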

Results: Qualitative

[Figure panels: Original Image, Random Learning, Active Learning]

Results: Qualitative

[Figure panels: Random Learning, Active Learning]

Results: Qualitative

[Figure panels: Original Image, Random Learning, Active Learning]

Results: Qualitative

[Figure panels: Active Learning, Random Learning]

Quantitative Evaluation

Quantitative Evaluation

[Figure: Area Under the ROC Curve (AUC) for the Initial, Active Learning, and Random Learning training sets]
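AUC can be computed without tracing the ROC curve, using its equivalence to the Mann-Whitney statistic: the probability that a randomly chosen cancer sample receives a higher score than a randomly chosen non-cancer sample, with ties counted as one half. A minimal sketch, assuming per-sample scores in [0, 1] and labels 1 = cancer, 0 = non-cancer; the function name `auc` is hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random cancer sample outscores a random non-cancer one (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` returns 0.75, because three of the four cancer/non-cancer pairs are ranked correctly. The pairwise loop is quadratic; for image-scale data a sort-based rank computation gives the same value faster.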

Quantitative Evaluation

[Figure: Classification Accuracy for the Initial, Active Learning, and Random Learning training sets]
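Unlike AUC, accuracy depends on a decision threshold. A minimal sketch, assuming per-sample scores in [0, 1], labels 1 = cancer, 0 = non-cancer, and a hypothetical default threshold of 0.5 (the slides do not state the threshold used):

```python
from typing import Sequence


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = [int(s >= threshold) for s in scores]  # 1 = predicted cancer
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

For example, scores [0.9, 0.4, 0.6, 0.1] with labels [1, 1, 0, 0] give an accuracy of 0.5 at threshold 0.5 and 0.75 at threshold 0.3, even though their threshold-independent AUC is 0.75 in both cases.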

Concluding Remarks

- Maximize classification accuracy by choosing training samples intelligently
- Efficiently obtain annotations
- Make the most of the “training budget”
- Build Active Learning into clinical applications:
  - Online training correction / modification
  - User feedback

Acknowledgements

- The Coulter Foundation (WHCF 4-29368)
- New Jersey Commission on Cancer Research
- The National Cancer Institute (R21CA127186-01, R03CA128081-01)
- The US Department of Defense (427327)
- The Society for Medical Imaging and Informatics