[trying to correct] selection bias
Slides: ims.nus.edu.sg/events/2017/quan/files/noah2.pdf (2017-07-11)
TRANSCRIPT
The following ideas may have partly been borrowed from...
He’s the one in the center... giving everyone else a hard time at seminar!
What is Precision Medicine?
I will stick mostly to oncology...
Not because I know much there...
But I definitely know less about everything else!
What is Precision Medicine?
The practice of medicine has always been about
- characterizing dysfunction
- treating based on specific characterizations
What is Precision Medicine?
In the beginning this was based on simple observation alone:
you’ve been vomiting and missed your period → Pregnant
Now we have more sophisticated methods:
hCG in urine → Pregnant
In oncology, tumors are characterized using histology
What is Precision Medicine?
My understanding is:
Medicine attempts to differentiate diseases...
to develop treatments that target specific disease characteristics
Precision medicine attempts to differentiate diseases...
more precisely?
to develop treatments that target specific disease characteristics
That reads like a high schooler “not-plagiarizing” an essay...
What is Precision Medicine?
My understanding is:
Medicine attempts to differentiate diseases...
to develop treatments that target specific disease characteristics
[Biomolecular] Precision medicine attempts to differentiate diseases using biomolecular profiling
to develop treatments that target specific biomolecular disease characteristics
What am I leaving out?
Screening diagnostics
e.g. cfDNA
Actionable prognostic biomarkers
e.g. oncotypeDX
It is often forgotten that the goal is to find actionable biomarkers
Back to “Predictive Biomarkers”
Two common scenarios:
Developing a targeted treatment + diagnostic
Developing a new diagnostic for an existing, non-targeted treatment
Targeted Treatments
30+ targeted cancer drugs [1] with many different targets
The primary FDA-specified “biomolecular” indications were
- HER2/HR status
- KRAS/EGFR mutation
- BRAF mutation
Many with no “biomolecular indication”...
only approved in very specific cancer types though!
(histology-based personalization!)
[1] From “Overview of FDA-approved Anti-Cancer Drugs Used for Targeted Therapy”, WCRJ 2015; 2(3): e553
The Road to Failure in Precision Medicine
Where have I seen little success?
Characterizing the [in]effectiveness of non-targeted treatments
Why do poor treatments tend not to work?
???
Why do I tend to miss free throws?
Because I keep forgetting to wear my lucky shirt...?
Or maybe because I’m generally bad at basketball...
The Road to Failure in Precision Medicine
Where have I seen little success?
Characterizing the [in]effectiveness of non-targeted treatments
Why do poor treatments tend not to work?
Because they tend not to work...
Why do I tend to miss free throws?
Because I keep forgetting to wear my lucky shirt...?
Or maybe because I’m generally bad at basketball...
The Road to Success in Precision Medicine?
What is the best place for statisticians on that road?
Is it building fancier methods?
(in some avenues things work pretty well with simple methods)
Or domain expertise?
Or some other option?
Solve Easy Problems!
EE/CS does this well!
Very approximately solve useful + “easy” domain problems
Statistics seems to have a deeper, but slower, prodding phenotype.
Sometimes the problems are messy...
A familiar problem - testing multiple hypotheses
Prostate cancer data [2]
n = 102 samples:
50 healthy controls
52 prostate cancer patients
p = 6033 genes
[2] Singh et al. (2002)
A familiar problem - testing multiple hypotheses
Interested in
δ_j = (µ_{1j} − µ_{2j}) / σ_j
Calculate a (scaled) two-sample t-statistic for each gene:
z_j = (x̄_j^(c) − x̄_j^(d)) / s_j,
where s_j is your favorite estimate of the standard deviation
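To make the computation concrete, here is a minimal Python sketch (not from the original slides); `X`, `group`, and the random placeholder data are hypothetical stand-ins for the Singh et al. expression matrix, and the pooled standard deviation is just one choice of s_j.

```python
import numpy as np

def scaled_t_stats(X, group):
    """z_j = (mean of controls - mean of cases) / s_j, per gene j."""
    Xc, Xd = X[group == 0], X[group == 1]
    n1, n2 = len(Xc), len(Xd)
    diff = Xc.mean(axis=0) - Xd.mean(axis=0)
    # One "favorite" estimate of sigma_j: the pooled standard deviation.
    pooled_var = ((n1 - 1) * Xc.var(axis=0, ddof=1) +
                  (n2 - 1) * Xd.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return diff / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
X = rng.normal(size=(102, 6033))      # random placeholder, NOT the real data
group = np.repeat([0, 1], [50, 52])   # 50 healthy controls, 52 cancer patients
z = scaled_t_stats(X, group)          # roughly N(delta_j, sqrt(1/n1 + 1/n2))
```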
A familiar problem - testing multiple hypotheses
Suppose I adjust for multiplicity...
and find 10 differentially expressed genes.
However, I also want to estimate the effect size (δ_j) for those 10.
Given that I already adjusted for multiplicity in testing...
can I just report the unadjusted estimates δ̂_j?
NO!
Estimating Effect Sizes
The test statistics are approximately
z_j ∼ N(δ_j, √(n₁⁻¹ + n₂⁻¹)), with
δ_j = (µ_{1j} − µ_{2j}) / σ_j
We’re pretty good at testing whether δ_j = 0
Bonferroni, Benjamini-Hochberg (/Yekutieli)
We’re pretty bad at estimating δ_j
(especially for the most extreme values of δ_j)
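For the testing half of that statement, here is a hand-rolled Benjamini-Hochberg sketch (not from the talk), assuming the normal model above with a few hypothetical non-null genes mixed in:

```python
import numpy as np
from scipy.stats import norm

def bh_reject(pvals, alpha=0.1):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest index with p_(k) <= alpha * k / m."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

n1, n2 = 50, 52
se = np.sqrt(1 / n1 + 1 / n2)
rng = np.random.default_rng(1)
delta = np.concatenate([np.zeros(6000), np.full(33, 0.5)])  # mostly null genes
z = rng.normal(delta, se)              # z_j ~ N(delta_j, sqrt(1/n1 + 1/n2))
pvals = 2 * norm.sf(np.abs(z) / se)    # two-sided p-values under the null
selected = bh_reject(pvals, alpha=0.1)
```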
Before we move on
Two flavors of past approaches:
- Conditioning on exceeding a threshold (univariate correction)
- Empirical Bayes (multivariate correction)
This talk will be more related to Empirical Bayes
Illustrative Example
500 data points based on z_j | δ_j ∼ N(δ_j, 1)
[Figure: densities of the means δ_j and the statistics z_j]
Illustrative Example
[Figure: densities of the means and the statistics, as above]
The standard estimate δ̂_j = z_j is poor:
E‖z‖₂² = ‖δ‖₂² + p
James-Stein shrinks toward the overall mean
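For reference, a sketch of positive-part James-Stein shrinkage toward the grand mean (not from the slides), under the unit-variance model of the illustration:

```python
import numpy as np

def james_stein(z):
    """Shrink each z_j toward the grand mean (positive-part James-Stein)."""
    p, zbar = len(z), z.mean()
    s2 = np.sum((z - zbar) ** 2)
    factor = max(0.0, 1.0 - (p - 3) / s2)
    return zbar + factor * (z - zbar)

rng = np.random.default_rng(2)
delta = rng.normal(0, 1, size=500)
z = rng.normal(delta, 1)
print(np.mean((z - delta) ** 2),               # naive risk
      np.mean((james_stein(z) - delta) ** 2))  # JS risk, smaller on average
```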
Illustrative Example
[Figure: a multimodal density of z, spread over roughly 0 to 30]
What should we shrink towards here??
Need local shrinkage
Winner’s Curse
Some definitions:
z_(k): the k-th order statistic
j(k): the index of the k-th order statistic
Note: j(k) is the inverse of the “rank” operator, so z_{j(k)} = z_(k)
By Jensen’s inequality (and our pictures):
E[z_(p)] ≥ max_j δ_j ≥ E[δ_{j(p)}]
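A tiny simulation (not from the talk) checking this chain of inequalities under z_j | δ_j ∼ N(δ_j, 1):

```python
import numpy as np

rng = np.random.default_rng(3)
delta = rng.normal(0, 1, size=500)    # fixed means
top_z, top_delta = [], []
for _ in range(2000):
    z = rng.normal(delta, 1)
    k = np.argmax(z)                  # k = j(p), the index achieving z_(p)
    top_z.append(z[k])
    top_delta.append(delta[k])
print(np.mean(top_z), delta.max(), np.mean(top_delta))
# Typically prints E[z_(p)] > max delta_j > E[delta_{j(p)}], with clear gaps.
```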
Winner’s Curse
Across repeated simulated draws, the observed gap was
z_(p) − δ_{j(p)} = 2.52, 3.29, 2.78, 2.63, 2.43
[Figure: densities of the means and the statistics for each draw]
Winner’s Curse
Define β_k by
β(δ)_k = E[z_(k) − δ_{j(k)}]
Now consider the estimate
δ̂_{j(k)} = z_(k) − β(δ)_k
or equivalently
δ̂_j = z_j − β(δ)_{r(j)}
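An oracle sketch of this estimate (not from the slides): approximate β(δ)_k by Monte Carlo using the true δ, which is only possible in simulation, then subtract rank-by-rank:

```python
import numpy as np

def beta_oracle(delta, reps=2000, rng=None):
    """beta(delta)_k = E[z_(k) - delta_{j(k)}], approximated by simulation."""
    rng = rng or np.random.default_rng(4)
    gaps = np.zeros(len(delta))
    for _ in range(reps):
        z = rng.normal(delta, 1)
        order = np.argsort(z)              # order[k] is j(k+1), 0-indexed
        gaps += np.sort(z) - delta[order]  # z_(k) - delta_{j(k)}
    return gaps / reps

def debias(z, beta):
    """delta_hat_j = z_j - beta_{r(j)}, with r(j) the rank of z_j."""
    ranks = np.argsort(np.argsort(z))
    return z - beta[ranks]

delta = np.repeat([5, 10, 15, 20, 25], 40)  # the toy example below
rng = np.random.default_rng(5)
z = rng.normal(delta, 1)
delta_hat = debias(z, beta_oracle(delta))
```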
Winner’s Curse...?
A more efficient estimate:
E‖δ̂ − δ‖₂² = E‖z − δ‖₂² − Σ_k β(δ)_k²
Removes the selection bias:
E[δ̂_{j(p)}] = E[δ_{j(p)}]
Toy example
200 data points with δ_j equal to each of 5, 10, 15, 20, 25.
[Figure: density of z, and estimated mean vs. z for Naive, Oracle Debias, and Truth]
Estimating β
Use the parametric bootstrap; estimate β(δ) by
β̂ = β(δ̂)
[Figure: estimated mean vs. z for Oracle Debias, Bootstrap Debias, and Truth]
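A sketch of that plug-in (not necessarily the talk's implementation): resimulate around the naive estimate δ̂ = z and average the rank gaps; one could also iterate this.

```python
import numpy as np

def beta_of(center, reps=2000, rng=None):
    """Monte-Carlo beta(center)_k = E[z*_(k) - center_{j(k)}], z* ~ N(center, 1)."""
    rng = rng or np.random.default_rng(6)
    gaps = np.zeros(len(center))
    for _ in range(reps):
        z_star = rng.normal(center, 1)
        order = np.argsort(z_star)
        gaps += np.sort(z_star) - center[order]
    return gaps / reps

rng = np.random.default_rng(7)
delta = np.repeat([5, 10, 15, 20, 25], 40)
z = rng.normal(delta, 1)
beta_hat = beta_of(z)                               # plug in delta_hat = z
delta_hat = z - beta_hat[np.argsort(np.argsort(z))]
```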
Non-parametric Empirical Bayes
Assume δ ∼ g(·)
Estimate g by ĝ
Take δ̂ = E_ĝ[δ | z]
Extremely strong in simple scenarios.
Difficult/impossible to apply in general.
Similar to considering E[δ_{j(k)}]
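One standard recipe for this posterior mean is Tweedie's formula, E[δ | z] = z + (log f)′(z) for N(δ, 1) noise, which needs only the marginal density f of the z's. Here is a sketch using a kernel density estimate; this is an illustrative choice, not necessarily the estimator behind the "NP EB" results shown later:

```python
import numpy as np
from scipy.stats import gaussian_kde

def tweedie(z, eps=1e-3):
    """delta_hat_j = z_j + d/dz log f_hat(z_j), f_hat a kernel density estimate."""
    f = gaussian_kde(z)
    score = (np.log(f(z + eps)) - np.log(f(z - eps))) / (2 * eps)
    return z + score

rng = np.random.default_rng(8)
delta = rng.normal(0, 2, size=500)    # draws from an (unknown) prior g
z = rng.normal(delta, 1)
delta_hat_eb = tweedie(z)
```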
More Complex Scenarios
What about correlation among estimates?
Maybe interested in regression coefficients?
Or entries of a precision matrix?
Or a very complicated parameter based on a very complicated procedure?
More Complex Scenarios
There is a similar, simple framework to accommodate all of these!
(Details are notationally dense, but not hard)
A particularly intriguing scenario
Often consider several candidate biomolecular signatures...
for predicting response to a new test treatment
Want an estimate of the ATE in the test-positive group for the best signature
Fit on phase II data and select the best (induces a bias!)
but signatures are likely very correlated (maybe less bias?)
This framework can quite easily be applied there.
Back to Prostate Cancer
n = 102 samples:
50 healthy controls
52 prostate cancer patients
p = 6033 genes
Interested in
δ_j = (µ_{1j} − µ_{2j}) / σ_j
Prostate Cancer — Scaled t-statistics
[Figure: density of the scaled t-statistics, spread over roughly −1 to 1]
Prostate Cancer — Shrinkage
[Figure: shrunken estimate vs. z for uncor bootstrap, para bootstrap, nonpara bootstrap, NP EB, and JS]
Prostate Cancer — Evaluation
SSE of a 50-50 split

Method        k = 50          k = 25          k = 15
ebayes        204.33 (3.11)   110.13 (2.83)    71.03 (2.40)
para-uncor    190.81 (2.40)    93.56 (1.84)    54.93 (1.40)
para-cor      178.65 (1.97)    87.90 (1.55)    51.07 (1.17)
nonpara       191.73 (2.42)    93.65 (1.84)    54.75 (1.37)
unadjusted    726.62 (8.05)   400.35 (5.76)   258.56 (4.21)
Rank Conditional Coverage
Consider intervals I_1, . . . , I_p, ordered as I_{j(1)}, . . . , I_{j(p)}
We know average coverage is OK:
(1/p) Σ_{j≤p} P(δ_j ∈ I_j) = (1/p) Σ_{k≤p} P(δ_{j(k)} ∈ I_{j(k)}) = 1 − α
Rank Conditional Coverage
The problem is that
P(δ_{j(k)} ∈ I_{j(k)}) << 1 − α for k interesting
P(δ_{j(k)} ∈ I_{j(k)}) ∼ 1 for k boring
The most interesting intervals are also the most under-covering
Conditioning on “rejecting the null” doesn’t fix this!
Rank Conditional Coverage
For random intervals I_1, . . . , I_p we call
RCC_k ≡ P(δ_{j(k)} ∈ I_{j(k)})
the Rank Conditional Coverage.
Generally we want to control RCC uniformly at level 1 − α
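A quick simulation (not from the slides) estimating RCC_k for the naive intervals z_(k) ± z_{α/2} in the toy example; the extreme ranks under-cover badly:

```python
import numpy as np
from scipy.stats import norm

alpha = 0.1
halfwidth = norm.ppf(1 - alpha / 2)
delta = np.repeat([5, 10, 15, 20, 25], 40)
p, reps = len(delta), 2000
hits = np.zeros(p)
rng = np.random.default_rng(9)
for _ in range(reps):
    z = rng.normal(delta, 1)
    order = np.argsort(z)
    hits += np.abs(np.sort(z) - delta[order]) <= halfwidth  # coverage at rank k
rcc = hits / reps  # RCC_k: near 1 - alpha at middle ranks, far below at extremes
```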
Intervals for large δ
We would like a procedure that forms I_1, . . . , I_p with
P(δ_{j(k)} ∈ I_{j(k)}) ≥ 1 − α for every k
Intervals for large δ
Using a similar framework + resampling, can construct intervals!
Details are notationally dense, but not hard
As before there is an approximation by resampling
Interval Examples (n=100, p=500)
δ ∼ N(0, 1)
[Figure: rank conditional coverage by order statistic (1 to 500), comparing Naive and BS intervals]
Interval Examples (n=100, p=500)
δ_j = cor(x_j, y), where X ∼ N(0, Σ) and y = Σ_{j=1}^{20} x_j β_j + ε, with β_1, . . . , β_20 ∼ N(0, 1)
[Figure: rank conditional coverage for order statistics 400 to 500, comparing Naive, Bootstrap, WFB, and WFB2 intervals]
Takeaways!
A simple formulation for dealing with “selection bias” in high dimensions
Revolves around the distribution of z_(k) − δ_{j(k)}
You have to be careful with plug-ins.
More Complex Scenarios — EXTRA
Consider a distribution F and a parameter vector Θ(F).
Let Θ̂ be an empirical estimate of Θ(F) based on X_1, . . . , X_n ∼ F.
If we define
β(F)_k = E[Θ̂_(k) − Θ(F)_{j(k)}]
and
Θ̃_j = Θ̂_j − β(F)_{r(j)}
then
E‖Θ̃ − Θ(F)‖₂² = E‖Θ̂ − Θ(F)‖₂² − Σ_k β(F)_k²
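A generic plug-in sketch of this framework (not from the slides): estimate β(F)_k by resampling from an estimate of F, here the nonparametric bootstrap, treating the full-data estimate Θ̂ as the stand-in truth. `estimator` is any hypothetical function mapping a data matrix to a parameter vector.

```python
import numpy as np

def debias_generic(X, estimator, reps=1000, rng=None):
    """Rank-debias an arbitrary vector-valued estimator via the bootstrap."""
    rng = rng or np.random.default_rng(10)
    theta_hat = estimator(X)
    n, p = len(X), len(theta_hat)
    gaps = np.zeros(p)
    for _ in range(reps):
        Xb = X[rng.integers(n, size=n)]           # resample rows of X
        tb = estimator(Xb)
        order = np.argsort(tb)
        gaps += np.sort(tb) - theta_hat[order]    # Theta*_(k) - theta_hat_{j(k)}
    ranks = np.argsort(np.argsort(theta_hat))
    return theta_hat - (gaps / reps)[ranks]
```

For instance, `estimator` could return every cor(x_j, y) from a matrix whose last column is y, matching the correlation example on the earlier slide.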
Intervals for large δ EXTRA
Rather than using the mean of our distribution, use quantiles!
- Let G_k(δ) denote the distribution of z_(k) − δ_{j(k)}
- Define L(δ)_k and U(δ)_k to be the 1 − α/2 and α/2 quantiles of G_k(δ), i.e.
  P[U_k ≤ z_(k) − δ_{j(k)} ≤ L_k] = 1 − α
- Pivot! Define I_k = [z_(k) − L_k, z_(k) − U_k]
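Finally, a sketch pulling the pivot construction together (not from the slides): estimate the G_k quantiles by parametric bootstrap around δ̂ = z, then pivot.

```python
import numpy as np

def rank_pivot_intervals(z, alpha=0.1, reps=2000, rng=None):
    """I_k = [z_(k) - L_k, z_(k) - U_k] using bootstrap quantiles of
    z_(k) - delta_{j(k)}, resampled around delta_hat = z."""
    rng = rng or np.random.default_rng(11)
    p = len(z)
    gaps = np.empty((reps, p))
    for b in range(reps):
        z_star = rng.normal(z, 1)
        order = np.argsort(z_star)
        gaps[b] = np.sort(z_star) - z[order]
    L = np.quantile(gaps, 1 - alpha / 2, axis=0)  # upper quantile L_k
    U = np.quantile(gaps, alpha / 2, axis=0)      # lower quantile U_k
    z_sorted = np.sort(z)
    return z_sorted - L, z_sorted - U             # lower, upper endpoints by rank

rng = np.random.default_rng(12)
delta = np.repeat([5, 10, 15, 20, 25], 40)
lo, hi = rank_pivot_intervals(rng.normal(delta, 1))
```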