Post on 19-Aug-2020
Crowdsourcing Predictive Analytics to Enhance Clinical Trial Design
Scott A. Jelinsky, Ph.D.
Computational Precision Medicine
Inflammation and Immunity Research Area
Patient selection strategy in Chronic Obstructive Pulmonary Disease (COPD)
• 20%: exacerbation requiring emergency visit
  – No effective way to identify these patients
• 80%: stable or slow decline in disease
Source: http://www.nhlbi.nih.gov/health//dci/Diseases/Copd/Copd_WhatIs.html
Patient-Centric DataCommons
Pfizer Confidential │ 3
• Routine clinical trials collect an unprecedented amount of data
  – 1000s to millions of data points per patient
  – Diverse data types collected
• Advanced data analytics are needed to analyze this wealth of data
Genotype data
N=10,000
N=3.2 Billion
N=1 Million
• Genetics, a discipline of biology, is the science of heredity and variation in living organisms.
Imaging Lung Data
https://commons.wikimedia.org/wiki/File:Pulmon_fibrosis.PNG#/media/File:Pulmon_fibrosis.PNG
Med Image Comput Comput Assist Interv 2009; 12:690-8
Acad Radiol. 2012 Oct; 19(10): 1241–1251.
Data reduced to ~100 numerically derived fields
Integration of Diverse Biomolecular and Clinical Data
• Our goal: integrate our genetics, genomics, clinical, and text-mining efforts to enable powerful analyses
Open Sourcing
Internal Efforts
Externalization
Knowledge to help develop patient stratification and drive clinical development
Consortia
What is Crowdsourcing/Open innovation?
• From Wikipedia: Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from the online community, rather than from traditional employees or suppliers
• Crowdsourcing can apply to a wide range of activities
  – Microtasks: division of labor for tedious tasks (Wikipedia)
  – Macrotasks: finding a specific skill for a job (web design)
  – Crowdfunding: engagement of social networks to raise money
  – Crowd contests: a broad-based competition to identify the best solution for a particular question
Prize based contest approach for open innovation
• Prize-based contest approach
  – Generalize any life sciences problem into generic computer-science terms
    • Remove bias
  – Diverse group of programmers tackle the problem
  – Award prizes/milestone payments for best solutions
• Contests run on platforms including TopCoder.com and CrowdANALYTIX.com
• Community of over 500,000 developers and data scientists
Crowdsourced predictive algorithm development
Collect data
• Identify data/questions
• Obfuscate data to protect patient/data privacy
Predictive models
• Crowdsource custom predictive models
Visualization / Integration
• Crowdsource the visualization
Implementation
• Server support
• Update code
Question Formulation/Data collection
• Identify data/questions
  – Babbage Analytics and Innovation contracted to help formulate questions that would be suitable for a crowd-based contest
• Goal:
  – Create a model that will predict whether a patient with lung disease will experience worsening of symptoms
• Objectives:
  – Given baseline data and LFU outcomes, develop an algorithm to predict which patients are more likely to exhibit an exacerbation.
  – Uncover top variables which can help identify and monitor exacerbation of disease.
DataSets
• Observational study of past and current smokers
  – Clinical data (over 400 data points) for >10,000 subjects
  – High-resolution radiology data for >10,000 subjects
  – Genotype data for >10,000 subjects
  – Telephone follow-up (3-6 month intervals)
Data was obfuscated to protect patient privacy
• Obfuscate data to protect patient/data privacy
  – Contest to have the experts develop a software solution to obfuscate data
• MetaDataEngine
  – Python script developed through crowdsourcing contest
    • Patient IDs de-identified
    • All data labels stripped
    • All continuous and non-continuous data values normalized to values between 0.0 and 1.0
    • Columns/rows randomized and rearranged
  – $400 in prize money
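The four obfuscation steps listed above can be sketched in a few lines of Python. This is not the MetaDataEngine script itself, which the slides do not show. It is a minimal illustration that assumes the data arrive as a list of dictionaries; the function names (`min_max`, `encode`, `obfuscate`), the surrogate ID format and the sample records are all hypothetical.

```python
import random


def min_max(values):
    """Scale numbers to the range 0.0-1.0; a constant column maps to all 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def encode(values):
    """Pass numeric columns through; map categorical labels to integer codes."""
    if all(isinstance(v, (int, float)) for v in values):
        return [float(v) for v in values]
    codes = {label: i for i, label in enumerate(sorted({str(v) for v in values}))}
    return [float(codes[str(v)]) for v in values]


def obfuscate(records, id_field, seed=None):
    """Return (table, key): an unlabeled, normalized, shuffled table and a private ID key."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # randomize row order
    fields = [f for f in shuffled[0] if f != id_field]
    rng.shuffle(fields)  # randomize column order; the names are then discarded
    columns = [min_max(encode([r[f] for r in shuffled])) for f in fields]
    surrogate_ids = [f"S{i:05d}" for i in range(len(shuffled))]
    table = [[sid] + [col[i] for col in columns] for i, sid in enumerate(surrogate_ids)]
    # The key maps surrogate IDs back to real IDs and stays with the data owner.
    key = {sid: r[id_field] for sid, r in zip(surrogate_ids, shuffled)}
    return table, key


# Hypothetical records, for illustration only
records = [
    {"patient_id": "A17", "age": 61, "fev1": 1.9, "smoker": "current"},
    {"patient_id": "B02", "age": 70, "fev1": 1.2, "smoker": "former"},
    {"patient_id": "C33", "age": 55, "fev1": 2.4, "smoker": "current"},
]
table, key = obfuscate(records, "patient_id", seed=1)
```

Integer-coding a categorical column imposes an arbitrary order on its levels, so a real pipeline might prefer one indicator column per level. The sketch keeps a single column per variable for brevity.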
Predictive Analytics contest
• Contest run on CrowdANALYTIX to develop an algorithm to predict which patients are more likely to exhibit an exacerbation
  – 412 people registered
  – 101 people submitted solutions
Leader Board
Position  Solver               Score    Location
1         Guschin Alexander    0.8616   Israel
2         Andrey Shapulin      0.86052  Baghdad
3         Pietro Marini        0.86003  Amsterdam
4         Rohan Rao            0.85954  Kolkata
5         piotrek              0.85942  Amsterdam
6         Stanislav Semenov    0.85899  Baghdad
7         Roberto Abalde       0.85831  Buenos Aires
8         Sriram Sampathraman  0.85605  Kolkata
9         marios michailidis   0.85521
10        Marija Zoldin        0.85507  Amsterdam
11        Manuel Amunategui    0.85083  US
12        NimNid               0.85052  Kolkata
13        Giuseppe C.          0.85016  Amsterdam
Data Set: 400 clinical variables, 1000 genetic markers
Predictive models (Complete)
• Crowdsource custom predictive models
Crowd Sourced prediction contest
• 30 Day Contest
  – Obfuscated data
  – 101 unique submitters
  – ~2870 submissions
  – $9000 in prize money
• Winning solutions used a number of different approaches
  – Place 1. Random Forest Classifier
  – Place 2. XGBoost and Logistic Regression
  – Place 3. Ensemble of KNN, Logistic Regression and Random Forest
  – Place 4. Extremely Random Forest
  – Place 5. Extreme Gradient Boosting
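Several of the placed solutions combine more than one model. The slides do not say how the models were combined, so the sketch below shows just one common option, soft voting, in which each model's predicted probability is averaged per patient. The function name `soft_vote` and all probabilities are hypothetical.

```python
def soft_vote(probabilities, weights=None):
    """Average per-model probabilities for each patient (optionally weighted)."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_patients = len(probabilities[0])
    return [
        sum(w * model[i] for w, model in zip(weights, probabilities)) / total
        for i in range(n_patients)
    ]


# Hypothetical predicted exacerbation probabilities for four patients
knn = [0.2, 0.6, 0.9, 0.4]
logreg = [0.1, 0.7, 0.8, 0.5]
forest = [0.3, 0.5, 1.0, 0.3]

blended = soft_vote([knn, logreg, forest])
flagged = [p >= 0.5 for p in blended]  # patients predicted to exacerbate
```

Averaging tends to help when the individual models make different kinds of errors, which is one plausible reason for mixing a distance-based, a linear and a tree-based learner.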
Exploration of solution landscape
• Top 5 submissions had very similar performance
  – Over 2800 submissions
  – Sufficient test of the landscape
Predictive categories
Category                       MeanDecreaseGini
genScore                       18.9
Quality of life Questionnaire  7.1
Quality of life Questionnaire  6.1
Quality of life Questionnaire  5.7
Quality of life Questionnaire  5.1
Quality of life Questionnaire  5.1
Lung Function                  3.8
Lung Function                  3.2
Lung Function                  3.2
Lung Function                  3.1
Lung Function                  3.0
Lung Function                  2.9
Quality of life Questionnaire  2.9
Quality of life Questionnaire  2.8
Quality of life Questionnaire  2.6
21% increase in performance over current algorithm
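The MeanDecreaseGini values above rank variables by how much they reduce Gini impurity when a tree splits on them. The column name matches the importance measure reported by R's randomForest package, although the slides do not state which tool produced it. As a rough sketch of the quantity for a single split on binary labels, with hypothetical function names and data:

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting at values <= threshold, weighted by child size."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(labels) - children


# Hypothetical normalized variable and exacerbation labels (1 = exacerbation)
labels = [0, 0, 1, 1]
informative = gini_decrease([0.1, 0.2, 0.8, 0.9], labels, threshold=0.5)
uninformative = gini_decrease([0.1, 0.8, 0.2, 0.9], labels, threshold=0.5)
```

A random forest accumulates such decreases over every split that uses a variable and averages them across trees, so variables that separate exacerbators from non-exacerbators cleanly and often end up at the top of the table.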
What we have Learned
• Sufficient test of the solution landscape
  – Which algorithms work and which do not
  – Key predictive variables
• Significant effort spent on identification of key predictive variables
  – Most variables are relatively easy to measure
• Identification of key next steps
Pfizer Proprietary Information│ 16
Predictive Analytics contest
• Results were used to refine data/question
• Second contest run on CrowdANALYTIX to predict patients that will have increased disease severity
  – 350 people registered
  – 146 people submitted solutions
  – 2527 different code versions
Data Set: 60 clinical variables, 3309 patients, 430 exacerbators
Additional high-level annotation provided
Crowd Sourced prediction contest
• Winning solutions used a number of different approaches
  – Place 1. Extremely Random Forest Classifier
  – Place 2. Logistic Regression
  – Place 3. SVM with Regularization
  – Place 4. Logistic Regression with elastic net
  – Place 5. PCA
[Figure: bar chart of accuracy (axis from 0.60 to 0.76) for Previous Exacerbation, Random Forest and Winning Solution, annotated +2% and +14%]
• Unique creation and selection of variables
  – PCA, Random Forest, Linear combinations
Integration and utilization
• Need to develop front-end visualization (app, dashboard or web site)
  – Algorithm does not need to be available to the end user
  – Data is collected at screening visit and entered, and probability of outcome is reported back
• Turned to the crowd to develop visualization dashboard
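The flow described above (screening data in, probability out, model hidden from the user) can be sketched with a small scoring function. The winning model is not given in the slides, so a logistic model stands in for it here. The coefficients, variable names and function names below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical coefficients, not taken from any contest solution
COEFFICIENTS = {
    "prior_exacerbations": 0.8,
    "qol_score": 0.03,
    "fev1_percent_predicted": -0.02,
}
INTERCEPT = -1.5


def exacerbation_probability(screening):
    """Map screening-visit values to a probability with a logistic function."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * screening[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def report(patient_id, screening):
    """Return the one-line summary a clinician-facing view might display."""
    p = exacerbation_probability(screening)
    return f"{patient_id}: predicted probability of exacerbation {p:.0%}"


# Hypothetical screening-visit entry
print(report("S00001", {"prior_exacerbations": 2, "qol_score": 40, "fev1_percent_predicted": 55}))
```

Keeping the model behind a function like `report` means the front end only needs the entered values and the returned probability, which matches the point that the algorithm itself need not be exposed to the end user.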
Visualization
• Crowdsource the visualization
Clinical Prediction Dashboard
• Trial Information Configuration:
  – List of sites
  – Target enrollments
  – Inclusion/exclusion criteria
• Enrollment Summary:
  – Summary of estimated target number of patients
  – Visualization of enrollment numbers
  – Visualization of geographic enrollment numbers
• Patient Prediction Scores
  – Summary of each patient’s predicted efficacy, safety and dropout scores.
Patient Prediction dashboards
General Information
Sites
Inclusion/Exclusion criteria
Integration with algorithm and ECR/Clinical trials
• Visualization needs to be integrated with predictive algorithm
• Data input directly from ECR (Pfizer Electronic Clinical Record)
• Multiple views to support different
  – Clinical view (most simplistic) for direct use by clinicians
  – Research view
  – Administration view
Summary
• Crowdsourcing can be an effective tool for predictive analytics
  – Leverages crowds of data scientists to identify and build the best-performing model
  – Access to domain knowledge experts
  – On-demand resources, particularly when consultants may not be appropriate
  – Winners usually outperform the state-of-the-art methods