Model Selection and Inference: Motivation, Mechanics, and Interpretation
Gail Olson and Dan Rosenberg
Department of Fisheries and Wildlife
Oregon State University
www.oregonstate.edu/~rosenbed/workshop.htm
Goal of Workshop
• Provide motivation for a conceptually simple approach for the analysis of data using multiple models emphasizing an a priori approach
• Provide the mechanics of how to use AIC
• Guidance on how to interpret results from an AIC approach
• Discuss how this may benefit your research
Starters
We assume:
• The research started with an intriguing and important question
AND
• You used a proper experimental or probability-based sampling design
Analytical strategies cannot compensate for a failure on either of these points
Goal of Research in Management of Natural Resources
• understand nature and how it reacts to perturbations
• make predictions based on reliable inferences from analysis of empirical data
Steps in Making Reliable Inferences
• Inference from Sample to the Population
• Identify and understand patterns and mechanisms
• Statistical models to aid detection and interpretation
Why models?
“All models are wrong, but some are useful” (Box 1976)
[Figure: Pr(use) plotted against distance]
What is Meant by “Model”?
1. Theory: A hypothesis that has survived repeated efforts to falsify it
2. Hypothesis: a story about how the world works
3. Model: an abstraction or simplification of the real world; models as tools for the evaluation of hypotheses
Statistical models separate noise from information inherent in “data”
Statistical models as expression of specific hypotheses
This is particularly important in the model selection framework, which recognizes that there is not necessarily a single model appropriate for inference
Single vs Multiple Models
Traditional Hypothesis Testing (Single Model)
• Emphasis on the test itself; usually not informative
• “Habitat selection was not significantly different among crop types (P > 0.05).”
• Probability of use is unrelated to distance from a nest, habitat type, and landscape context
• A trivial straw man
• Null (H0): d = 0; hab1 = hab2 = hab3; patch size = 0
• Alternative (HA): d ≠ 0 (d < 0 or d > 0); etc.
• Reject H0 if the test statistic is such that p ≤ 0.05. That is, if the probability of data at least this extreme arising under the null is 0.05 or less (even exactly 0.05), reject H0 in favor of HA
• alpha level is arbitrary
All we typically learn is that the sample sizes were not large enough to detect differences
A Multiple Model Approach
• Avoid “pet” hypothesis
• All models equally likely in their selection or weight
• Simultaneously comparing and ranking models
• Emphasis on direction, magnitude, and precision of effects
• Estimates can be based on multiple models
Single vs Multiple Models
Probability of use is:
A. unrelated to distance from a nest
B. related linearly
C. related exponentially
All hypotheses receive equal initial weight in evaluation, and all models can be used in inference so one does not have to select a single model
Emphasis on an a priori Model Set
Hypotheses Expressed as Statistical Models
A Global Model
• has many parameters representing plausible effects and the state of the science, as well as relevant study design issues; most complex model of set
Subsets
• can be considered special cases of the global model; fewer parameters, not necessarily nested; always of same response variable and estimated from the same set of data
These all become “candidate models”
Goal:
• formulation of a useful set of a priori models
• selection and weighting of models for ranking hypotheses and parameter estimation
Developing an a priori Model Set
1. Have the question crystal-clear
2. Bring in your (team’s) understanding of the problem
3. Incorporate past research via literature review
4. Understand the expectation of the process based on theory and include this expectation in your model set
5. Include models of opposing views
6. Should be subjective: bring in various views and thoughts
7. Avoid including all possible models “just because you can”
8. Number of parameters must be considered in terms of sample size
9. Number of models should balance keeping to a small number of biologically plausible models against excluding potentially important models
Simplicity and Clarity as Goals
A Model of Habitat Selection
Per unit area, Pr(use) = f(distance to focal site) + barriers + attractants
Hypotheses and their Rationale
A. Hypotheses related to distance effects
[Figure: Pr(use) versus distance from the nest]
Hypotheses and their Rationale
B. Hypotheses related to crop type
• cover type by dominant species
• cover type by structure
C. Hypotheses related to landscape characteristics
• patch size
• distance to perennial crop
• dominant type within home-range
The Set of Candidate Models
Global Model: The most complex model
Pr(use)= distance (polynomial), crop types, patch type, distance to perennial crop, dominant in home range
Model Subsets: Includes one or more parameters
• distance (linear)
• distance (log)
• distance (polynomial)
• Crop-only models include a parameter for each crop type
• Crop types combined into structure classes
• Best distance model + crop parameters
• Best distance model + structure parameters
• No effects model
• Best distance + cover or crop model + patch type
• Etc.
[Figure: model selection criteria (axis 0.00 to 0.30) for the candidate models D(l)Crop, D(P)Crop, D(l)Cover, D(P)Cover, Crop, Cover, Dist (L), Dist (P)]
Conformity of Burrowing Owl Space-Use Patterns to the Central-Place Model
Large individual (and/or sampling) variation
[Figure: percent of locations (0 to 60) versus distance (km) from nest (0.0 to 3.0); series: Agriculture, Fragmented]
Summary: Motivation for an a priori Model Selection Approach
Statistical models to separate pattern from noise
Single vs. multiple model approaches
Insignificance of Statistical Significance Testing (Johnson 1999)
Emphasis on parameter estimation and uncertainty
Ranking and evaluating competing hypotheses
Inference from multiple models: often difficult to identify the best model
Akaike’s Information Criterion (AIC)
• Metric to rank and compare models
• Hirotugu Akaike (1973): “An Information Criterion”
• Simple metric with DEEP theory
  - Boltzmann’s entropy (Physics)
  - Kullback-Leibler discrepancy (Information theory)
  - Maximum Likelihood Theory (Statistics)
Kullback-Leibler Discrepancy
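For reference, the Kullback-Leibler discrepancy between the truth f and an approximating model g is usually written as below. The symbols f, g, x, and θ are notation assumed here, not taken from the slides.

$$I(f, g) = \int f(x)\,\log\!\left(\frac{f(x)}{g(x \mid \theta)}\right)\,dx$$

It measures the information lost when g is used to approximate f. AIC is an estimate of the expected relative value of this quantity, which is why only differences in AIC among models carry meaning.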
Maximum Likelihood (ML)
• Good statistical properties (asymptotically)
  - Unbiased
  - Minimum variance
• Links models, parameters, data
• L(parameters | model, data)
• Usually expressed as a log value: log(L(θ | g(y), y))
• Aim is to maximize the log value
ML Example
• Binomial model: L(p | binomial, y)
$$L(p \mid \text{binomial}, y) = \binom{n}{y}\, p^{y} (1-p)^{n-y}$$
• Log(L(p | bin, y))
$$\log L(p \mid \text{bin}, y) = \log\binom{n}{y} + y \log(p) + (n-y) \log(1-p)$$
For n=11 and y=7
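A minimal sketch of the binomial example for n=11 and y=7, assuming a simple grid search stands in for a proper optimizer. The function name and the grid resolution are illustrative choices, not part of the workshop material.

```python
import math

def binomial_log_likelihood(p, n, y):
    """Log-likelihood of success probability p given y successes in n trials."""
    log_coeff = math.log(math.comb(n, y))
    return log_coeff + y * math.log(p) + (n - y) * math.log(1 - p)

n, y = 11, 7

# Evaluate the log-likelihood on a grid of candidate p values (0.01 to 0.99).
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: binomial_log_likelihood(p, n, y))

# The analytical maximum likelihood estimate for a binomial is y / n.
p_hat = y / n

print(f"grid maximum near p = {best_p:.2f}, analytical MLE = {p_hat:.4f}")
```

The grid maximum lands on the grid point closest to y/n, which illustrates that maximizing the log value recovers the familiar estimate of a proportion.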
Model over-fitting
Principle of Parsimony
AIC Basics
$$\mathrm{AIC} = -2\log L + 2k$$
$-2\log L = -2\log\big(L(\hat{\theta} \mid y)\big)$ (model fit)
k = number of parameters (“penalty”)
AICc for small sample sizes
$$\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$$
• Less biased
• Use when n/k < 40
• Better, use all the time!
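The two criteria can be sketched as small helper functions. The function names and the example inputs (log-likelihood, k, n) are hypothetical values chosen only to show the arithmetic.

```python
def aic(log_likelihood, k):
    """AIC = -2 log L + 2k, where k is the number of estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def aicc(log_likelihood, k, n):
    """Small-sample AICc = AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical inputs for illustration only.
example_log_likelihood = -100.0
example_k = 3
example_n = 50

print(aic(example_log_likelihood, example_k))
print(aicc(example_log_likelihood, example_k, example_n))
```

As n grows relative to k, the correction term shrinks toward zero and AICc converges to AIC, which is consistent with the advice to use AICc all the time.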
Model Selection
• Compute AICc for each model
• Rank lowest to highest
• Lowest AICc = “best” model
• Example:
Northern Spotted Owl Survival Analysis
Effects of Seasonal Climate covariates(Precipitation and Temperature)
Model ranking by AICc
Model                      AICc
Pen                        1736.40
Pen+Pln                    1737.91
Pen+Ten                    1738.31
no climate                 1738.72
Pdp'                       1739.54
Phs                        1739.82
Ths                        1740.56
Tln+Pln                    1741.07
Pen+Ten+Pln+Tln            1741.09
Pws+Tws                    1741.67
Pen+Ten+Pws+Tws            1742.36
Pen+Ten+Pln+Tln+Pws+Tws    1745.05
ΔAICc
ΔAICc = AICc(model) - AICc(min)
Compare model relative to “best” model
Rules of Thumb (B&A):
0-2 = Competing, substantial support
4-7 = Less supported
10+ = Essentially no support
Model                      AICc       ΔAICc
Pen                        1736.40    0
Pen+Pln                    1737.91    1.506
Pen+Ten                    1738.31    1.906
no climate                 1738.72    2.32
Pdp'                       1739.54    3.137
Phs                        1739.82    3.416
Ths                        1740.56    4.157
Tln+Pln                    1741.07    4.66
Pen+Ten+Pln+Tln            1741.09    4.682
Pws+Tws                    1741.67    5.265
Pen+Ten+Pws+Tws            1742.36    5.953
Pen+Ten+Pln+Tln+Pws+Tws    1745.05    8.643
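A short sketch of the difference calculation, using the AICc values from the spotted owl table. Because it starts from AICc rounded to two decimals, the differences agree with the tabled three-decimal values only to about 0.01. The variable names are illustrative.

```python
# AICc values copied from the model ranking table.
aicc_values = {
    "Pen": 1736.40,
    "Pen+Pln": 1737.91,
    "Pen+Ten": 1738.31,
    "no climate": 1738.72,
    "Pdp'": 1739.54,
    "Phs": 1739.82,
    "Ths": 1740.56,
    "Tln+Pln": 1741.07,
    "Pen+Ten+Pln+Tln": 1741.09,
    "Pws+Tws": 1741.67,
    "Pen+Ten+Pws+Tws": 1742.36,
    "Pen+Ten+Pln+Tln+Pws+Tws": 1745.05,
}

# Difference between each model's AICc and the minimum AICc in the set.
min_aicc = min(aicc_values.values())
delta = {model: round(value - min_aicc, 2) for model, value in aicc_values.items()}

# Models within 2 units of the best model ("competing, substantial support").
competing = [model for model, d in delta.items() if d <= 2.0]

for model, d in sorted(delta.items(), key=lambda item: item[1]):
    print(f"{model:<26}{d:>6.2f}")
```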
Relative rankings
Akaike weights
• Relative likelihood of each model
• Specific to model set (Σ wi = 1)
$$w_i = \frac{\exp(-0.5\,\Delta_i)}{\sum_{r=1}^{R} \exp(-0.5\,\Delta_r)}$$
Model weights
Model                      ΔAICc    Weight    Rel. Likelihood
Pen                        0        0.3318    1.0000
Pen+Pln                    1.506    0.1562    0.4709
Pen+Ten                    1.906    0.1279    0.3855
no climate                 2.32     0.1040    0.3135
Pdp'                       3.137    0.0691    0.2084
Phs                        3.416    0.0601    0.1813
Ths                        4.157    0.0415    0.1251
Tln+Pln                    4.66     0.0323    0.0973
Pen+Ten+Pln+Tln            4.682    0.0319    0.0962
Pws+Tws                    5.265    0.0239    0.0719
Pen+Ten+Pws+Tws            5.953    0.0169    0.0510
Pen+Ten+Pln+Tln+Pws+Tws    8.643    0.0044    0.0133
Total                               1.0000    3.0143
Model weights
Model                      AICc       ΔAICc    Weight
Pen                        1736.40    0        0.3318
Pen+Pln                    1737.91    1.506    0.1562
Pen+Ten                    1738.31    1.906    0.1279
no climate                 1738.72    2.32     0.1040
Pdp'                       1739.54    3.137    0.0691
Phs                        1739.82    3.416    0.0601
Ths                        1740.56    4.157    0.0415
Tln+Pln                    1741.07    4.66     0.0323
Pen+Ten+Pln+Tln            1741.09    4.682    0.0319
Pws+Tws                    1741.67    5.265    0.0239
Pen+Ten+Pws+Tws            1742.36    5.953    0.0169
Pen+Ten+Pln+Tln+Pws+Tws    1745.05    8.643    0.0044
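The weights in the table can be reproduced from the tabled AICc differences. This is a minimal sketch: each relative likelihood is exp(-0.5 × difference), and each weight is that value divided by the sum over the model set. Variable names are illustrative.

```python
import math

# AICc differences from the best model, copied from the table.
delta_aicc = {
    "Pen": 0.0,
    "Pen+Pln": 1.506,
    "Pen+Ten": 1.906,
    "no climate": 2.32,
    "Pdp'": 3.137,
    "Phs": 3.416,
    "Ths": 4.157,
    "Tln+Pln": 4.66,
    "Pen+Ten+Pln+Tln": 4.682,
    "Pws+Tws": 5.265,
    "Pen+Ten+Pws+Tws": 5.953,
    "Pen+Ten+Pln+Tln+Pws+Tws": 8.643,
}

# Relative likelihood of each model given the data.
relative_likelihood = {m: math.exp(-0.5 * d) for m, d in delta_aicc.items()}

# Normalize so that the weights sum to 1 across this model set.
total = sum(relative_likelihood.values())
weights = {m: rl / total for m, rl in relative_likelihood.items()}

for model, w in weights.items():
    print(f"{model:<26}{w:.4f}")
```

Because the weights are normalized within the set, adding or removing a candidate model changes every weight.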
Fun things to do with weights
• Evidence ratios: compare one model to another
• Confidence sets: what models are more likely?
• Importance values: what variables are most important?
Evidence Ratios
Compare best model (Pen) with “no climate” model:
w(Pen) = 0.3318, w(no climate) = 0.1040
ER = 0.3318/0.1040 = 3.19
Pen model ~3X more likely than no climate model
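Since the normalizing sum cancels when one weight is divided by another, an evidence ratio can also be computed directly from the two AICc differences. A small sketch, with a hypothetical function name:

```python
import math

def evidence_ratio(delta_i, delta_j):
    """Ratio of Akaike weights w_i / w_j, computed from AICc differences."""
    return math.exp(-0.5 * (delta_i - delta_j))

# Pen (difference 0) versus the no climate model (difference 2.32).
er_pen_vs_no_climate = evidence_ratio(0.0, 2.32)
print(round(er_pen_vs_no_climate, 2))
```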
Confidence Set
Model                      ΔAICc    Weight    Cum. Weight
Pen                        0        0.3318    0.3318
Pen+Pln                    1.506    0.1562    0.4880
Pen+Ten                    1.906    0.1279    0.6158
no climate                 2.32     0.1040    0.7199
Pdp'                       3.137    0.0691    0.7890
Phs                        3.416    0.0601    0.8491
Ths                        4.157    0.0415    0.8906
Tln+Pln                    4.66     0.0323    0.9229
Pen+Ten+Pln+Tln            4.682    0.0319    0.9548
Pws+Tws                    5.265    0.0239    0.9787
Pen+Ten+Pws+Tws            5.953    0.0169    0.9956
Pen+Ten+Pln+Tln+Pws+Tws    8.643    0.0044    1.0000
95% confidence set: the first nine models (cumulative weight 0.9548)
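One way to build such a set is to rank models by weight and accumulate until the chosen level is reached. The sketch below uses the tabled weights; the function name and the default level are illustrative assumptions.

```python
# Akaike weights copied from the table.
model_weights = {
    "Pen": 0.3318,
    "Pen+Pln": 0.1562,
    "Pen+Ten": 0.1279,
    "no climate": 0.1040,
    "Pdp'": 0.0691,
    "Phs": 0.0601,
    "Ths": 0.0415,
    "Tln+Pln": 0.0323,
    "Pen+Ten+Pln+Tln": 0.0319,
    "Pws+Tws": 0.0239,
    "Pen+Ten+Pws+Tws": 0.0169,
    "Pen+Ten+Pln+Tln+Pws+Tws": 0.0044,
}

def confidence_set(weights, level=0.95):
    """Smallest set of top-ranked models whose cumulative weight reaches level."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for model, weight in ranked:
        selected.append(model)
        cumulative += weight
        if cumulative >= level:
            break
    return selected

models_95 = confidence_set(model_weights)
print(len(models_95), models_95[-1])
```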
Importance values
• Cement Hardening Example (B&A)
• Time to hardening (y) based on composition of 4 different ingredients (xi)
• Regression: y = b0 + b1(x1) + b2(x2) + b3(x3) + b4(x4)
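An importance value for a variable is commonly computed as the sum of the Akaike weights of every model that contains it. The model weights for the cement example are not given here, so this sketch uses the spotted owl climate weights from the earlier tables instead. Splitting model names on "+" to recover the covariates is an assumption about the naming scheme.

```python
# Akaike weights from the spotted owl climate model set.
owl_weights = {
    "Pen": 0.3318,
    "Pen+Pln": 0.1562,
    "Pen+Ten": 0.1279,
    "no climate": 0.1040,
    "Pdp'": 0.0691,
    "Phs": 0.0601,
    "Ths": 0.0415,
    "Tln+Pln": 0.0323,
    "Pen+Ten+Pln+Tln": 0.0319,
    "Pws+Tws": 0.0239,
    "Pen+Ten+Pws+Tws": 0.0169,
    "Pen+Ten+Pln+Tln+Pws+Tws": 0.0044,
}

def importance_values(weights, null_model="no climate"):
    """Sum the weights of all models in which each covariate appears."""
    totals = {}
    for model, weight in weights.items():
        if model == null_model:
            continue  # the model without climate covariates adds to no variable
        for covariate in model.split("+"):
            totals[covariate] = totals.get(covariate, 0.0) + weight
    return totals

importance = importance_values(owl_weights)
for covariate, value in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"{covariate:<6}{value:.4f}")
```

These sums are easiest to compare when each variable appears in a similar number of candidate models.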
AIC in regression analyses
• Number of parameters: k = number of variables (xi) + intercept (if used) + error variance ($\sigma^2$)
• AIC may be calculated from the estimated error variance ($\hat{\sigma}^2$) as: $\mathrm{AIC} = n\log(\hat{\sigma}^2) + 2k$
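A minimal sketch of the least-squares form of AIC for a one-predictor regression, assuming the error variance is estimated by its maximum likelihood form, RSS/n. The data are hypothetical and the function name is illustrative.

```python
import math

def least_squares_aic(x, y):
    """AIC = n log(sigma2_hat) + 2k for a straight-line least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and the ML estimate of the error variance.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / n

    # k counts one slope, the intercept, and the error variance.
    k = 3
    return n * math.log(sigma2_hat) + 2 * k

# Hypothetical data for illustration only.
x_data = [1.0, 2.0, 3.0, 4.0]
y_data = [1.0, 3.0, 2.0, 4.0]
aic_value = least_squares_aic(x_data, y_data)
print(round(aic_value, 3))
```

This form drops constants that are shared by all models fit to the same data, so it is suitable for comparing models but not for reporting an absolute likelihood.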
Multi-model inference
Model Averaging
• Incorporates model selection uncertainty
• Used for parameter estimation
  - Directly estimated or not
  - E.g. regression coefficients, predicted values
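Model averaging can be sketched as a weighted mean of the per-model estimates, with a standard error that adds the between-model spread to each model's own sampling variance. The estimates, standard errors, and weights below are hypothetical, and the weights are assumed to sum to 1 over the models being averaged.

```python
import math

def model_average(weights, estimates, std_errors):
    """Return the model-averaged estimate and its unconditional standard error."""
    averaged = sum(w * est for w, est in zip(weights, estimates))
    # Each model contributes its sampling variance plus its squared
    # deviation from the averaged estimate (model selection uncertainty).
    unconditional_se = sum(
        w * math.sqrt(se ** 2 + (est - averaged) ** 2)
        for w, est, se in zip(weights, estimates, std_errors)
    )
    return averaged, unconditional_se

# Hypothetical values for three candidate models.
example_weights = [0.5, 0.3, 0.2]
example_estimates = [1.0, 2.0, 3.0]
example_std_errors = [0.20, 0.25, 0.30]

avg_estimate, avg_se = model_average(
    example_weights, example_estimates, example_std_errors
)
print(round(avg_estimate, 3), round(avg_se, 3))
```

The unconditional standard error is never smaller than the weighted mean of the individual standard errors, which is how model selection uncertainty enters the estimate.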
Pitfalls to avoid
• Use same data set for all models
  - Caution: missing values
• Transform X’s but not Y
• Number of parameters known?
  - “hidden” parameters
  - “lost” parameters
Bottom line: Know what you are doing!
Interpreting Results
Some issues:
• Models differing by 1 parameter
• Model ambiguity
• Null model best
• Model redundancy
Model                      K     AICc       ΔAICc    Deviance
Pen                        8     1736.40    0.000    461.182
Pen+Pln                    9     1737.91    1.506    460.665
Pen+Ten                    9     1738.31    1.906    461.065
no climate                 7     1738.72    2.320    465.523
Pdp'                       8     1739.54    3.137    464.319
Phs                        8     1739.82    3.416    464.598
Ths                        8     1740.56    4.157    465.339
Tln+Pln                    9     1741.07    4.660    463.819
Pen+Ten+Pln+Tln            11    1741.09    4.682    459.785
Pws+Tws                    9     1741.67    5.265    464.423
Pen+Ten+Pws+Tws            11    1742.36    5.953    461.056
Pen+Ten+Pln+Tln+Pws+Tws    13    1745.05    8.643    459.680
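The slides do not spell out the "models differing by 1 parameter" issue. One common reading is that a model made of the best model plus a single extra parameter can fall within 2 units of the best model even when the extra parameter barely improves the fit. The sketch below flags such models in the table and reports how little the deviance drops. The flagging rule and the function name are assumptions made for illustration.

```python
# (model, K, AICc difference, deviance) copied from the table.
candidate_models = [
    ("Pen", 8, 0.000, 461.182),
    ("Pen+Pln", 9, 1.506, 460.665),
    ("Pen+Ten", 9, 1.906, 461.065),
    ("no climate", 7, 2.320, 465.523),
    ("Pdp'", 8, 3.137, 464.319),
    ("Phs", 8, 3.416, 464.598),
    ("Ths", 8, 4.157, 465.339),
    ("Tln+Pln", 9, 4.660, 463.819),
    ("Pen+Ten+Pln+Tln", 11, 4.682, 459.785),
    ("Pws+Tws", 9, 5.265, 464.423),
    ("Pen+Ten+Pws+Tws", 11, 5.953, 461.056),
    ("Pen+Ten+Pln+Tln+Pws+Tws", 13, 8.643, 459.680),
]

def one_parameter_extensions(models, max_delta=2.0):
    """Flag models that are the best model plus exactly one extra parameter."""
    best_name, best_k, _, best_deviance = min(models, key=lambda m: m[2])
    best_terms = set(best_name.split("+"))
    flagged = []
    for name, k, delta, deviance in models:
        terms = set(name.split("+"))
        if k == best_k + 1 and best_terms < terms and delta <= max_delta:
            # Report the deviance reduction bought by the extra parameter.
            flagged.append((name, round(best_deviance - deviance, 3)))
    return flagged

flagged_models = one_parameter_extensions(candidate_models)
for name, deviance_drop in flagged_models:
    print(f"{name}: deviance reduced by {deviance_drop}")
```

Here both flagged models reduce the deviance by well under 1 unit, so their apparent support may come mostly from containing Pen rather than from the added covariate.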
Model Ambiguity
Model Name                    k     AICc        ΔAICc
hilpsp                        12    1123.922    0.000
himpsl                        12    1124.913    0.991
hinpl                         12    1125.082    1.160
hilpsl                        12    1125.448    1.526
hi1500p                       12    1125.622    1.700
himpsp                        12    1125.838    1.916
hinpn                         12    1125.853    1.931
hilpsl                        13    1126.181    2.259
hi2400l lm2400l hir1p hir2p   15    1126.195    2.273
no1500q                       12    1126.303    2.381
no habitat covariates         11    1126.331    2.409
hi1500l                       12    1126.448    2.526
hi1500l lm1500l               13    1126.571    2.649
no2400q                       12    1126.607    2.685
hi1500p lm1500p               13    1126.765    2.843
hi2400p                       12    1126.841    2.919
hicorep                       12    1126.988    3.066
hi600p hir1p                  13    1127.143    3.221
hi2400l                       12    1127.505    3.583
elev                          12    1127.538    3.616
hi600q                        13    1127.630    3.709
elevl                         12    1127.640    3.718
hi1500p hiedgen               13    1127.694    3.772
hicorel                       12    1127.702    3.780
Plus 40 more models < 7 ΔAICc
NSO Productivity: modeled as a function of habitat covariates