Personalized Medicine: Analytics for Cancer Survival Curves Ran Qi, Shujia Zhou, Yelena Yesha June 13, 2013 IAB Meeting Research Report



Personalized Medicine: Analytics for Cancer Survival Curves

Ran Qi, Shujia Zhou, Yelena Yesha

June 13, 2013

IAB Meeting Research Report


Introduction: Cancer Staging (1)

• Cancer stage is an anatomic description of the character and extent of cancer spread (usually I to IV)
  – Prognostic factors
    • Tumor (T): size, location, local extent
    • Nodes (N): number, location of nodal metastases
    • Metastasis (M): presence of distant organ spread


Lung cancer staging (bin model)

Stage   T        N          M
I       T1       N0         M0
IIA     T1       N1         M0
        T2       N0         M0
IIB     T2       N1         M0
        T3       N0         M0
IIIA    T1, 2    N2         M0
        T3       N1, 2      M0
IIIB    T4       N0, 1, 2   M0
IIIC    Any T    N3         M0
IV      Any T    Any N      M1

Each T, N, M combination is a bin.
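The staging table on this slide can be read as a lookup from a (T, N, M) bin to a stage. The following is a minimal sketch of that reading only; the function name `lung_stage` and the integer encoding of T, N, and M are assumptions made for illustration, not part of the original slides.

```python
def lung_stage(t: int, n: int, m: int) -> str:
    """Map a (T, N, M) bin to the stage listed on the slide.

    Assumed encoding (hypothetical): T in 1..4, N in 0..3, M in 0..1.
    """
    if m == 1:
        return "IV"      # Any T, Any N, M1
    if n == 3:
        return "IIIC"    # Any T, N3, M0
    if t == 4:
        return "IIIB"    # T4, N0-2, M0
    table = {
        (1, 0): "I",
        (1, 1): "IIA", (2, 0): "IIA",
        (2, 1): "IIB", (3, 0): "IIB",
        (1, 2): "IIIA", (2, 2): "IIIA", (3, 1): "IIIA", (3, 2): "IIIA",
    }
    return table[(t, n)]


if __name__ == "__main__":
    for bin_ in [(1, 0, 0), (3, 2, 0), (4, 1, 0), (2, 3, 0), (1, 1, 1)]:
        print(bin_, "->", lung_stage(*bin_))
```

Several bins collapse into one stage, which is the point the bin model makes: the stage is a coarse label over many combinations.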


Lung cancer survival curves


A Bin Model

• Breast cancer: 5 T’s, 4 N’s, 2 M’s: 40 bins (5x4x2)
• Adding grades (3 levels): 120 bins (5x4x2x3)
• Adding ER (hormonal status, 2 levels): 240 bins (5x4x2x3x2)
• Thus, for additional variables, the number of bins that would have to be added to a stage would be enormous, and collapsing into a stage would become impractical.

• “Bin” is also called “combination”.
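The bin counts on this slide are products of the number of levels per factor. A short sketch of that arithmetic follows; the factor names used as dictionary keys are illustrative labels, and the level counts are the ones quoted on the slide.

```python
from itertools import product
from math import prod


def bin_count(levels: dict) -> int:
    """Number of bins = product of the number of levels of each factor."""
    return prod(levels.values())


def enumerate_bins(levels: dict) -> list:
    """List every combination (bin) as a dict of factor -> level index."""
    names = list(levels)
    ranges = (range(levels[name]) for name in names)
    return [dict(zip(names, combo)) for combo in product(*ranges)]


if __name__ == "__main__":
    factors = {"T": 5, "N": 4, "M": 2}
    print(bin_count(factors))        # 40
    factors["grade"] = 3
    print(bin_count(factors))        # 120
    factors["ER"] = 2
    print(bin_count(factors))        # 240
    print(len(enumerate_bins(factors)))  # 240
```

Each new factor multiplies the count by its number of levels, so the growth is exponential in the number of factors.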


Problems

• How to combine the growing number of prognostic factors into a small number of stages
  – Since the TNM staging system was announced in the 1950’s, many new prognostic factors have been identified.
  – By 1995, 76 predictive factors for breast cancer.
  – By 2002, 150 factors for lung cancer.

• Different prognostic factors have different levels of impact on the survival curves


Objectives

• Reduce the number of bins by grouping similar patients

• Find the relationship between prognostic factors and survival curves


Approaches

• Grouping cancer patients according to their similarity
• Ensemble Algorithm for Clustering Cancer Data (EACCD)
• Grouping Algorithm for Cancer Data (GACD)


The GACD work flow

[Figure: four-step workflow diagram. Labels: Cancer Patient Dataset (200,000 patients); Initialize groups of patients with cutoff; Combinations; Log-rank test; Dissimilarity matrix; Partitioning clustering + statistical calculations; Learnt dissimilarity matrix; Hierarchical clustering with dendrogram; New groups of patients; Kaplan-Meier Estimator; Survival curves. Annotations: MCMC (jump over local minimum); Weight (increase efficiency).]
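One piece of the workflow on this slide, turning pairwise log-rank tests between combinations into a dissimilarity matrix, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. It assumes each combination is a list of `(time, event)` pairs with `event = 1` for death and `0` for censoring, and it assumes the chi-square test statistic itself serves as the dissimilarity (the slides do not say whether the statistic or a p-value is used). The function names are hypothetical.

```python
def logrank_chi2(a, b):
    """Two-group log-rank chi-square statistic.

    a, b: lists of (time, event) pairs; event is 1 (death) or 0 (censored).
    """
    event_times = sorted({t for t, e in a + b if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for ti, _ in a if ti >= t)           # at risk in group a
        n2 = sum(1 for ti, _ in b if ti >= t)           # at risk in group b
        d1 = sum(1 for ti, e in a if ti == t and e)     # deaths in group a
        d2 = sum(1 for ti, e in b if ti == t and e)     # deaths in group b
        n, d = n1 + n2, d1 + d2
        if n < 2:
            continue  # variance term is undefined with one subject at risk
        observed_minus_expected += d1 - d * n1 / n
        variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def dissimilarity_matrix(groups):
    """Symmetric matrix of pairwise log-rank statistics between groups."""
    k = len(groups)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = logrank_chi2(groups[i], groups[j])
    return dist


if __name__ == "__main__":
    early = [(1, 1), (2, 1), (3, 0)]
    late = [(5, 1), (6, 1), (7, 0)]
    for row in dissimilarity_matrix([early, late, early]):
        print([round(x, 3) for x in row])
```

A larger statistic means the two survival experiences differ more, so combinations with similar survival end up close together in later clustering steps.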


GACD

• Features
  – A deterministic grouping method
  – Use weighted dissimilarity to improve the grouping efficiency
  – Use MCMC to avoid local minima

• Results
  – Found that grouping results are sensitive to the partitioning algorithms (e.g., PAM and Fuzzy)
  – Found that grouping results differ between local-minimum and global-minimum partitioning algorithms
  – Implemented weighted dissimilarity
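The slides name MCMC as a way to avoid local minima during partitioning but do not describe the scheme. As a hedged illustration of the general idea only, the sketch below runs a PAM-style medoid search in which a swap that raises the cost is still accepted with a Metropolis probability, so the search can leave a local minimum. The function names, `temperature`, `steps`, and `seed` are assumptions for this example and are not GACD's actual parameters.

```python
import math
import random


def total_cost(dist, medoids):
    """Sum over all items of the dissimilarity to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def metropolis_medoids(dist, k, steps=2000, temperature=1.0, seed=0):
    """Medoid search with Metropolis acceptance. Requires k < len(dist)."""
    rng = random.Random(seed)
    n = len(dist)
    current = rng.sample(range(n), k)
    current_cost = total_cost(dist, current)
    best, best_cost = list(current), current_cost
    for _ in range(steps):
        # Propose: replace one random medoid with one random non-medoid.
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(
            [i for i in range(n) if i not in current]
        )
        delta = total_cost(dist, candidate) - current_cost
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    best.sort()
    labels = [min(range(k), key=lambda c: dist[i][best[c]]) for i in range(n)]
    return best, labels, best_cost


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(p - q) for q in points] for p in points]
    print(metropolis_medoids(dist, k=2))
```

A purely greedy swap search stops at the first configuration with no improving swap; the acceptance rule here trades some extra iterations for the chance to escape such a configuration.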


Prognostic factors: size, node, age, race
Number of combinations: 59

Reduce 59 curves to 3
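Reducing many combinations to a few groups corresponds to cutting a dendrogram at a chosen number of clusters. The sketch below shows agglomerative clustering on a dissimilarity matrix, stopped when the requested number of groups remains. It is a toy illustration under assumptions: average linkage is chosen here for simplicity (the slides do not state the linkage), the function name is hypothetical, and the data are made up rather than the 59 combinations from the slide.

```python
def agglomerate(dist, n_groups):
    """Average-linkage agglomerative clustering on a dissimilarity matrix.

    Returns (clusters, merges); merges records each merge in order as
    (members_a, members_b, linkage_height).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []

    def linkage(a, b):
        # Average pairwise dissimilarity between two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        height, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merges.append((clusters[p], clusters[q], height))
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return clusters, merges


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12, 30, 31]
    dist = [[abs(p - q) for q in points] for p in points]
    groups, merges = agglomerate(dist, n_groups=3)
    print(sorted(sorted(g) for g in groups))
    print(len(merges), "merges")
```

The recorded merge order is what a dendrogram displays; stopping at three clusters is the same as cutting the tree just below the third-from-last merge.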


Evaluation Metric for Grouping Results

• The area enclosed by two Kaplan-Meier curves

• Linear correlation coefficient between the merging order of the dendrogram and the area enclosed by the Kaplan-Meier curves
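The first metric, the area enclosed by two Kaplan-Meier curves, can be sketched in pure Python as below. This is an illustration under assumptions rather than the authors' code: samples are `(time, event)` pairs with `event = 1` for death, the area is integrated up to a caller-chosen `horizon` (the slides do not specify a cutoff), and all function names are hypothetical.

```python
def kaplan_meier(samples):
    """Kaplan-Meier estimate as a list of (time, survival) steps from (0, 1)."""
    curve = [(0.0, 1.0)]
    survival = 1.0
    for t in sorted({t for t, e in samples if e}):
        at_risk = sum(1 for ti, _ in samples if ti >= t)
        deaths = sum(1 for ti, e in samples if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Value of the right-continuous step function at time t."""
    value = 1.0
    for ti, si in curve:
        if ti > t:
            break
        value = si
    return value


def area_between(curve_a, curve_b, horizon):
    """Integral of |S_a(t) - S_b(t)| over [0, horizon]."""
    cuts = sorted(
        {0.0, horizon} | {t for t, _ in curve_a + curve_b if 0 < t < horizon}
    )
    return sum(
        abs(survival_at(curve_a, lo) - survival_at(curve_b, lo)) * (hi - lo)
        for lo, hi in zip(cuts, cuts[1:])
    )


if __name__ == "__main__":
    group_a = kaplan_meier([(1, 1), (2, 1)])
    group_b = kaplan_meier([(3, 1), (4, 1)])
    print(area_between(group_a, group_b, horizon=4))
```

Because both curves are step functions, the integral is an exact sum over the intervals between consecutive event times; a larger area indicates better separated groups.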


Conclusion

• The expanded TNM system (e.g., EACCD and GACD) can analyze cancer survival with more prognostic factors.

• GACD improves the efficiency of the grouping algorithm by using weights.

• The area enclosed by two Kaplan-Meier curves appears to be useful for evaluating grouping results.


Acknowledgement

• This project is sponsored by NIST through NSF CHMPR. We would like to thank D. Chen, D. Henson, A. Schwartz, A. Dima, and M. Brady for the helpful discussions.