SENSITIVITY, SPECIFICITY, AND RELATIVES
BMED 6700
February 16, 2012
Overview
Definitions of Sensitivity, Specificity, Positive and Negative
Predictive Values, Likelihood Ratio Positive and Negative, and
Measure of Agreement.
Performance of Tests: ROC Curves and Area Under ROC.
Assessing Combined Tests
The following problem was posed by Casscells, Schoenberger, and
Graboys (1978) to 60 students and staff at an elite medical school:
If a test to detect a disease whose prevalence is 1/1000 has a false
positive rate of 5%, what is the chance that a person found to have
a positive result actually has the disease, assuming you know
nothing about the person’s symptoms or signs?
Assuming that the probability of a positive result given the
disease is 1, the answer to this problem is approximately 2%.
Casscells et al. found that only 18% of participants gave this
answer. The most frequent response was 95%.
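The 2% answer can be checked by counting cases in a hypothetical cohort. The sketch below is only an illustration: the cohort size N is an arbitrary choice, not a value from the study, and the variable names are introduced here.

```matlab
% Natural-frequency check of the Casscells et al. problem.
% N is an arbitrary illustrative cohort size (an assumption).
N   = 100000;            % hypothetical cohort
pre = 1/1000;            % prevalence stated in the problem
fpr = 0.05;              % false positive rate, i.e. 1 - specificity
se  = 1;                 % sensitivity, assumed equal to 1 as in the text

nD = N * pre;            % number with the disease
tp = se * nD;            % true positives
fp = fpr * (N - nD);     % false positives among the disease-free
p  = tp / (tp + fp);     % P(disease | positive result)

fprintf('P(disease | positive) = %.4f\n', p);   % approximately 0.02
```

About 100 true positives are swamped by roughly 5000 false positives, which is why the intuitive answer of 95% is so far off.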
Fundamental 2 x 2 Table
                     disease (D)     no disease (C)   total
test positive (P)    TP              FP               nP = TP + FP
test negative (N)    FN              TN               nN = FN + TN
total                nD = TP + FN    nC = FP + TN     n
TP true positives (test positive, disease present)
FP false positives (test positive, disease absent)
FN false negatives (test negative, disease present)
TN true negatives (test negative, disease absent)
nP total number of positives (TP + FP)
nN total number of negatives (TN + FN)
nD total number with disease present (TP + FN)
nC total number without disease present (TN + FP)
n total sample size (TP+FP+FN+TN)
Definitions
Sensitivity (Se) Se = TP/(TP + FN) = TP/nD
Specificity (Sp) Sp = TN/(FP + TN) = TN/nC
Prevalence (Pre) Pre = (TP + FN)/(TP + FP + FN + TN) = nD/n
Positive Predictive Value (PPV) PPV = TP/(TP + FP) = TP/nP
Negative Predictive Value (NPV) NPV = TN/(TN + FN) = TN/nN
Likelihood Ratio Positive (LRP) LRP = Se/(1-Sp)
Likelihood Ratio Negative (LRN) LRN = (1-Se)/Sp
Apparent Prevalence (APre) APre = nP/n
Concordance, Agreement (Ag) Ag =(TP + TN)/n
Imagine a test that classifies all subjects as positive – trivially
the sensitivity is 100%. Since there are no negatives, the specificity
is zero. Likewise, the test that classifies all subjects as negative has
a specificity of 100% and zero sensitivity.
One of the most important measures is the Positive Predictive Value,
PPV. It is the proportion of true positives among all positives,
TP/nP. This ratio is correct only if the population prevalence is
well estimated by nD/n, that is, if the table is representative of
its population.
If the table is formed from a convenience sample, the prevalence
(Pre) must be supplied as external information, and the positive
predictive value is calculated as
PPV = (Se × Pre) / (Se × Pre + (1 − Sp) × (1 − Pre)).
Why is the Positive Predictive Value so important? Imagine an
almost perfect test for a particular disease, with a sensitivity of
100% and a specificity of 99%. If the prevalence of the disease in
the population is 10%, then among 1000 people there are 100 true
positives and about 9 false positives, so roughly one positive in
twelve is false. However, if the population prevalence is 1/10000,
then for each true positive there would be approximately 100 false
positives.
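The effect of prevalence can be checked numerically. The following is a minimal sketch using the PPV formula with external prevalence; the anonymous function name ppv is introduced here for illustration and is not part of the course files.

```matlab
% PPV as a function of sensitivity, specificity, and external prevalence.
% 'ppv' is a hypothetical helper name used only in this sketch.
ppv = @(se, sp, pre) (se .* pre) ./ (se .* pre + (1 - sp) .* (1 - pre));

se  = 1.00;                 % sensitivity of the almost perfect test
sp  = 0.99;                 % specificity of the almost perfect test
pre = [0.10, 1/10000];      % the two prevalences discussed in the text

p = ppv(se, sp, pre);       % PPV at each prevalence
fp_per_tp = (1 - p) ./ p;   % false positives per true positive

fprintf('Pre = %g: PPV = %.4f, FP per TP = %.2f\n', [pre; p; fp_per_tp]);
```

With the same test, the PPV drops from about 0.92 at 10% prevalence to about 0.01 at a prevalence of 1/10000.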
The Likelihood Ratio Positive, LRP = Se/(1-Sp), is the ratio of the
probability of a positive test result in a patient with the disease
to the probability of a positive result in a patient without it.
The Likelihood Ratio Negative, LRN = (1-Se)/Sp, is the ratio of the
probability of a negative test result in a patient with the disease
to the probability of a negative result in a patient without it.
Likelihood ratios convert pre-test odds into post-test odds:
Post-test Disease Odds (positive result) = LRP × Pre-test Disease Odds.
Post-test Disease Odds (negative result) = LRN × Pre-test Disease Odds.
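A short numerical sketch of the odds form of Bayes' rule follows. The values of se, sp, and pre are assumed for illustration only, and LRP and LRN are taken as defined in the Definitions list, both applied to the odds of disease.

```matlab
% Odds form of Bayes' rule with likelihood ratios.
% se, sp, pre are assumed illustrative values, not data from the lecture.
se = 0.90;  sp = 0.80;  pre = 0.20;

lrp = se / (1 - sp);               % likelihood ratio positive
lrn = (1 - se) / sp;               % likelihood ratio negative

pre_odds      = pre / (1 - pre);   % pre-test odds of disease
post_odds_pos = lrp * pre_odds;    % odds of disease after a positive result
post_odds_neg = lrn * pre_odds;    % odds of disease after a negative result

% Convert odds back to probabilities: p = odds / (1 + odds)
p_pos = post_odds_pos / (1 + post_odds_pos);   % equals PPV
p_neg = post_odds_neg / (1 + post_odds_neg);   % equals 1 - NPV

fprintf('P(D | +) = %.4f, P(D | -) = %.4f\n', p_pos, p_neg);
```

The probability after a positive result agrees with the PPV formula evaluated at the same prevalence, which is a useful consistency check.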
D-Dimer Example
The data below consist of quantitative plasma D-dimer levels
among patients undergoing pulmonary angiography for suspected
pulmonary embolism (PE). The patients who exceed the threshold
of 500 ng/mL are classified as positive.
The gold standard for PE is the pulmonary angiogram.
Goldhaber et al. (1993), from Brigham and Women’s Hospital at
Harvard Medical School, considered a population of patients who
are suspected of PE based on a battery of symptoms. The
summarized data for 173 patients are provided in the table below:
                              acute PE present   no PE present   total
positive (D-d ≥ 500 ng/mL)    42                 96              138
negative (D-d < 500 ng/mL)    3                  32              35
total                         45                 128             173
function [se sp pre ppv npv lrp ag yi] = sesp(tp, fp, fn, tn)
%
%INPUT: 2x2 Contingency (Confusion) Table
% tp-true positives; fp-false positives;
% fn-false negatives; tn-true negatives
%---------
% OUTPUT
% se-sensitivity, sp-specificity, pre-prevalence(for random sample)
% ppv-positive predictive value, npv-negative predictive value,
% lrp - likelihood ratio positive, ag-agreement,
% yi - Youden-type index; printed values equal (se+sp-1)/sqrt(2)
% EXAMPLE OF USE:
% D-dimer as a test for acute PE (Goldhaber et al, 1993)
% [s1, s2, p1, p2, p3, lr, a, yi]=sesp(42,96,3,32);
%
[a b c d e f g h] = sesp(42,96,3,32);
Se Sp Pre PPV NPV LRP Ag Yi
0.9333 0.2500 0.2601 0.3043 0.9143 1.2444 0.4277 0.1296
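Only the help header of sesp.m is shown above. A minimal script-form sketch of the computations it presumably performs is given below. The formulas follow the Definitions list; the expression for yi is an assumption inferred from the printed output, which matches (se + sp - 1)/sqrt(2).

```matlab
% Sketch of the sesp.m computations for the D-dimer table
% (script form; not the original function body).
tp = 42; fp = 96; fn = 3; tn = 32;

n   = tp + fp + fn + tn;       % total sample size
se  = tp / (tp + fn);          % sensitivity
sp  = tn / (fp + tn);          % specificity
pre = (tp + fn) / n;           % prevalence (valid for a random sample)
ppv = tp / (tp + fp);          % positive predictive value
npv = tn / (tn + fn);          % negative predictive value
lrp = se / (1 - sp);           % likelihood ratio positive
ag  = (tp + tn) / n;           % agreement
yi  = (se + sp - 1) / sqrt(2); % assumed form, inferred from the output

fprintf('%8.4f', [se sp pre ppv npv lrp ag yi]);
fprintf('\n');
```

The line printed by this sketch should reproduce the eight values listed under the column headings Se through Yi.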
k Independent Tests, Parallel and Serial Strategies.
In the parallel strategy the combination is positive if at least one
test is positive. It is negative if all tests are negative.
Se = 1− [(1− Se1)× (1− Se2)× · · · × (1− Sek)]
Sp = Sp1 × Sp2 × · · · × Spk.
The overall sensitivity is larger than any individual sensitivity, and
specificity smaller than any individual specificity.
In the serial strategy, the combination is positive if all tests are
positive. It is negative if at least one test is negative.
Se = Se1 × Se2 × · · · × Sek.
Sp = 1− [(1− Sp1)× (1− Sp2)× · · · × (1− Spk)]
The overall sensitivity is smaller than any individual sensitivity,
while the specificity is larger than any individual specificity.
Parikh et al (2008) provide an example of combining two tests for
sarcoidosis. Ocular sarcoidosis is an idiopathic multi-system
granulomatous disease, where the diagnosis is made by a
combination of clinical, radiological and laboratory findings. The
gold standard is a tissue biopsy showing noncaseating granuloma.
Angiotensin-converting enzyme (ACE) test has a sensitivity of
73% and a specificity of 83% to diagnose sarcoidosis.
Abnormal gallium scan (AGS) has a sensitivity of 91% and a
specificity of 84%.
Though individually the specificity of either test is not
impressive, for the serial combination, the specificity becomes
Sp = 1− (1− 0.84)× (1− 0.83) = 1− (0.16× 0.17) = 0.97.
The combination sensitivity becomes 0.73× 0.91 = 0.66.
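Both strategies can be evaluated with the product formulas above. The sketch below applies them to the ACE and AGS figures, assuming the two tests are independent as the formulas require; the variable names are introduced here.

```matlab
% Parallel and serial combination of k independent tests.
se = [0.73, 0.91];   % sensitivities of ACE and AGS
sp = [0.83, 0.84];   % specificities of ACE and AGS

% Parallel: positive if at least one test is positive
se_par = 1 - prod(1 - se);
sp_par = prod(sp);

% Serial: positive only if all tests are positive
se_ser = prod(se);
sp_ser = 1 - prod(1 - sp);

fprintf('parallel: Se = %.4f, Sp = %.4f\n', se_par, sp_par);
fprintf('serial:   Se = %.4f, Sp = %.4f\n', se_ser, sp_ser);
```

The serial values agree with the hand calculation in the text; the parallel values show the opposite trade, with higher sensitivity and lower specificity than either single test.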
The ROC (receiver operating characteristic) curve is defined as a
graphical plot of sensitivity vs. (1 - specificity).
To increase the apparently low specificity in the previous D-dimer
analysis, the threshold for a positive result is raised from
500 ng/mL to 650 ng/mL.
                              acute PE present   no PE present   total
positive (D-d ≥ 650 ng/mL)    31                 33              64
negative (D-d < 650 ng/mL)    14                 95              109
total                         45                 128             173
This new table results in
[a b c d e f g h] = sesp(31,33,14,95);
Se Sp Pre PPV NPV LRP Ag YI
0.6889 0.7422 0.2601 0.4844 0.8716 2.6721 0.7283 0.3048
Combining this with the output of 500 ng/mL threshold, we get
vectors 1-sp = [0 1-0.7422 1-0.25 1], and se = [0 0.6889
0.9333 1].
[Figure: rudimentary ROC curve for the D-dimer test. Horizontal axis: 1−specificity; vertical axis: sensitivity. Marked points: (0.2578, 0.6889) and (0.75, 0.9333). Area = 0.7297.]
The m-file RocDdimer.m plots this “rudimentary” ROC curve.
The curve is rudimentary since it is based on only two thresholds
of a single test.
Note that the points (0,0) and (1,1) always belong to ROC curves.
These two points correspond to the trivial tests in which all
patients test negative or all patients test positive.
The area under the ROC curve (AUC) is a well-accepted
measure of test performance. The closer the area is to 1, the more
the ROC curve bows toward the upper-left corner, implying that both
sensitivity and specificity of the test are high. It is interesting
that some researchers assign an academic scale to AUC as an informal
measure of test performance.
AUC performance
0.9-1.0 A
0.8-0.9 B
0.7-0.8 C
0.6-0.7 D
0.0-0.6 F
MATLAB code auc.m calculates AUC when the vectors
1-specificity and sensitivity are supplied.
function A = auc(csp, se)
% A = auc(csp,se) computes the area under the ROC curve
% where 'csp' and 'se' are vectors representing (1-specificity)
% and (sensitivity), used to plot the ROC curve
% The length of the vectors has to be the same
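The body of auc.m is not shown. One plausible sketch, assuming the area is obtained by the trapezoidal rule over the plotted points, uses the built-in trapz. The name auc_sketch is hypothetical, and the points are assumed to be sorted by increasing 1-specificity.

```matlab
% Trapezoidal-rule sketch of an AUC computation (not the original auc.m).
% 'auc_sketch' is a hypothetical name; csp must be in increasing order.
auc_sketch = @(csp, se) trapz(csp, se);

% Rudimentary D-dimer ROC curve from the two thresholds above
csp = [0, 1 - 0.7422, 1 - 0.25, 1];   % 1 - specificity
se  = [0, 0.6889, 0.9333, 1];         % sensitivity

A = auc_sketch(csp, se);
fprintf('Area = %.4f\n', A);
```

For the D-dimer points this gives an area of about 0.73, in line with the value reported for the rudimentary ROC curve.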
Exercise 1. HAAH Improves the Test for Prostate
Cancer.
A new procedure based on a protein called human aspartyl
(asparaginyl) beta-hydroxylase, or HAAH, adds to the accuracy of
standard prostate-specific antigen (PSA) testing for prostate
cancer. The findings were presented at the 2008 Genitourinary
Cancers Symposium (Keith et al, 2008).
The research involved 233 men with prostate cancer and 43 healthy
men, all over 50 years old. Results showed that the HAAH test had
an overall sensitivity of 95% and specificity of 93%.
(a) From the reported percentages, form a table with true positives,
false positives, true negatives and false negatives (tp, fp, tn, and
fn). You will need to round to the nearest integers since the
specificity and sensitivity were reported as integer percents.
(b) Suppose that for men of age 50+ in the US, the prevalence of
prostate cancer is 7%. Suppose Jim Smith is randomly selected
from this group and tests positive with the HAAH test. What is the
probability that Jim has prostate cancer?
(c) Suppose that Bill Schneider is a randomly selected person from
the sample of n = 276 (= 233 + 43) subjects involved in the HAAH
study. What is the probability that Bill has prostate cancer if he
tests positive and no other information is available? What is this
probability called? What is different here from (b)?
Solution:
(a) Recall that sensitivity is the ratio of true positives to the
total number of subjects with the disease. Since 233 subjects have
the disease, the sensitivity of 95% means that there are
233 · 0.95 = 221.35 ≈ 221 true positives. Thus tp = 221. This
gives 233 − 221 = 12 false negatives, thus fn = 12.
Similarly, 43 subjects do not have the disease. Since the specificity
is 0.93, the true negatives are 43 · 0.93 = 39.99 ≈ 40. This means
tn = 40 and fp = 43 − 40 = 3. The table is
                 disease          no disease       total
test positive    tp = 221         fp = 3           tot.pos = 224
test negative    fn = 12          tn = 40          tot.neg = 52
total            tot.dis = 233    tot.ndis = 43    total = 276
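The rounding in part (a) can be reproduced in a few lines. This is a small sketch of the same arithmetic; the variable names nD and nC follow the notation of the fundamental 2 x 2 table.

```matlab
% Reconstructing the 2x2 table from reported Se, Sp and group sizes.
nD = 233;  nC = 43;       % men with prostate cancer, healthy men
se = 0.95; sp = 0.93;     % reported sensitivity and specificity

tp = round(se * nD);      % 221.35 rounds to 221
fn = nD - tp;             % 12
tn = round(sp * nC);      % 39.99 rounds to 40
fp = nC - tn;             % 3

fprintf('tp = %d, fp = %d, fn = %d, tn = %d\n', tp, fp, fn, tn);
```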
(b) If the prevalence is external information,
P(disease | test positive)
= (sensitivity · prevalence) / (sensitivity · prevalence + (1 − specificity) · (1 − prevalence))
= (221/233 × 7/100) / (221/233 × 7/100 + 3/43 × 93/100)
= 0.5058.
PPV is 0.5058.
(c) If the table is representative of the population,
PPV = tp/(tp + fp) = 221/224 = 0.9866.
PPV is 0.9866.
In both (b) and (c) we have found the positive predictive value, that
is, P(disease present | test positive). However, (b) and (c) differ
in the information about where the subject comes from, which is
critical for the prevalence.
If the subject comes from the general population, then the
prevalence is 0.07.
If we selected the subject from the group involved in this study
(that is, the selected person is one of the 276 subjects), then the
“prevalence” refers to this particular group and is
(tp + fn)/n = 233/276.
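The difference between (b) and (c) comes down to which prevalence enters Bayes' formula. The sketch below computes both answers from the table in (a) and confirms that plugging the sample prevalence into the same formula gives tp/(tp + fp). The variable names are introduced here for illustration.

```matlab
% PPV with external prevalence (b) versus in-sample prevalence (c).
tp = 221; fp = 3; fn = 12; tn = 40;
se = tp / (tp + fn);                       % 221/233
sp = tn / (tn + fp);                       % 40/43

bayes_ppv = @(pre) se * pre / (se * pre + (1 - sp) * (1 - pre));

pre_ext = 0.07;                            % prevalence among US men 50+
pre_smp = (tp + fn) / (tp + fp + fn + tn); % 233/276, study "prevalence"

ppv_b = bayes_ppv(pre_ext);                % answer to (b)
ppv_c = tp / (tp + fp);                    % answer to (c)

fprintf('(b) PPV = %.4f   (c) PPV = %.4f\n', ppv_b, ppv_c);
fprintf('Bayes with sample prevalence: %.4f\n', bayes_ppv(pre_smp));
```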
Exercise 2. Hypothyroidism.
Low values of a total thyroxine (T4) test can be indicative of
hypothyroidism (Goldstein and Mushlin, 1987). Hypothyroidism is
a condition in which the body lacks sufficient thyroid hormone.
Since the main purpose of thyroid hormone is to “run the body’s
metabolism”, it is understandable that people with this condition
will have symptoms associated with a slow metabolism. Over five
million Americans have this common medical condition.
A total of 195 patients, among which 59 have confirmed
hypothyroidism, have been tested for the level of T4. If the
patients with T4-level ≤ 5 are considered positive for
hypothyroidism, the following table is obtained:
T4 value            Hypothyroid   Euthyroid   Total
Positive, T4 ≤ 5    35            5           40
Negative, T4 > 5    24            131         155
Total               59            136         195
However, if the thresholds for T4 are 6, 7, 8 and 9, the following tables are obtained.
T4 value            Hypothyroid   Euthyroid   Total
Positive, T4 ≤ 6    39            10          49
Negative, T4 > 6    20            126         146
Total               59            136         195
T4 value            Hypothyroid   Euthyroid   Total
Positive, T4 ≤ 7    46            29          75
Negative, T4 > 7    13            107         120
Total               59            136         195
T4 value            Hypothyroid   Euthyroid   Total
Positive, T4 ≤ 8    51            61          112
Negative, T4 > 8    8             75          83
Total               59            136         195
T4 value            Hypothyroid   Euthyroid   Total
Positive, T4 ≤ 9    57            96          153
Negative, T4 > 9    2             40          42
Total               59            136         195
There is a tradeoff between sensitivity and specificity. One can
improve the sensitivity by moving the threshold to a higher T4
value; that is, make the criterion for a positive test less strict. One
can improve the specificity by moving the threshold to a lower T4
value; that is, make the criterion for a positive test more strict.
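The tradeoff is visible directly in the five tables. The following sketch recomputes sensitivity and specificity for each threshold from the tabulated counts; the vector names are introduced here for illustration.

```matlab
% Sensitivity and specificity across the five T4 thresholds.
thr = [5 6 7 8 9];          % positive if T4 <= thr
tp  = [35 39 46 51 57];     % hypothyroid patients testing positive
fp  = [5 10 29 61 96];      % euthyroid patients testing positive
nD  = 59;                   % total hypothyroid
nC  = 136;                  % total euthyroid

se = tp / nD;               % rises as the threshold increases
sp = (nC - fp) / nC;        % falls as the threshold increases

fprintf('T4 <= %d: Se = %.4f, Sp = %.4f\n', [thr; se; sp]);
```

Each increase of the threshold raises the sensitivity and lowers the specificity.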
(a) For the test that uses T4 = 7 as threshold, find sensitivity,
specificity, positive and negative predictive values, likelihood
ratio positive, and degree of agreement. You can use MATLAB program
sesp.m.
(b) Using the given thresholds for the test to be positive, plot the
ROC curve. What threshold would you recommend? Explain your
choice.
(c) Find the area under the ROC curve. You can use MATLAB file
auc.m.
[a b c d e f g h] = sesp(35, 5, 24, 131);
% Se Sp Pre PPV NPV LRP Ag YI
% 0.5932 0.9632 0.3026 0.8750 0.8452 16.1356 0.8513 0.3935
[a b c d e f g h] = sesp(39, 10, 20, 126);
% Se Sp Pre PPV NPV LRP Ag YI
% 0.6610 0.9265 0.3026 0.7959 0.8630 8.9898 0.8462 0.4154
[a b c d e f g h] = sesp(46, 29, 13, 107);
% Se Sp Pre PPV NPV LRP Ag YI
% 0.7797 0.7868 0.3026 0.6133 0.8917 3.6563 0.7846 0.4005
[a b c d e f g h] = sesp(51, 61, 8, 75);
% Se Sp Pre PPV NPV LRP Ag YI
% 0.8644 0.5515 0.3026 0.4554 0.9036 1.9272 0.6462 0.2941
[a b c d e f g h] = sesp(57, 96, 2, 40);
% Se Sp Pre PPV NPV LRP Ag YI
% 0.9661 0.2941 0.3026 0.3725 0.9524 1.3686 0.4974 0.1840
se = [0, 0.5932, 0.6610, 0.7797, 0.8644, 0.9661, 1];
csp = [0, 1 - 0.9632, 1 - 0.9265, 1 - 0.7868, 1-0.5515, 1-0.2941, 1];
figure(1)
plot(csp, se, 'r-')
hold on
plot(csp, se, 'ro')
plot([0 1],[0 1], 'r-')
xlabel('1 - specificity')
ylabel('sensitivity')
a = auc(csp, se)
% a = 0.8527 (Grade of B).
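For part (b), one common way to pick a threshold is to maximize the Youden index J = Se + Sp - 1, which weights sensitivity and specificity equally. That criterion is an assumption of this sketch rather than the recommended answer; unequal costs of false positives and false negatives could justify a different choice.

```matlab
% Choosing a T4 threshold by the largest Youden index (one possible
% criterion, assumed here; it weights Se and Sp equally).
thr = [5 6 7 8 9];
se  = [0.5932 0.6610 0.7797 0.8644 0.9661];
sp  = [0.9632 0.9265 0.7868 0.5515 0.2941];

J = se + sp - 1;            % Youden index for each threshold
[Jmax, k] = max(J);         % position of the largest index

fprintf('Largest Youden index J = %.4f at T4 <= %d\n', Jmax, thr(k));
```

Under this criterion the threshold T4 ≤ 6 is selected, which is also the row with the largest YI value in the sesp output above.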