Model Selection and Inference: Motivation, Mechanics, and Interpretation Gail Olson and Dan Rosenberg Department of Fisheries and Wildlife Oregon State University www.oregonstate.edu/~rosenbed/workshop.htm

Post on 02-Jan-2016
Page 1: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Model Selection and Inference: Motivation, Mechanics, and Interpretation

Gail Olson and Dan Rosenberg

Department of Fisheries and Wildlife

Oregon State University

www.oregonstate.edu/~rosenbed/workshop.htm

Page 2: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Goal of Workshop

• Provide motivation for a conceptually simple approach for the analysis of data using multiple models emphasizing an a priori approach

• Provide the mechanics of how to use AIC

• Provide guidance on how to interpret results from an AIC approach

• Discuss how this may benefit your research

Page 3: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Starters

We assume:

• The research started with an intriguing and important question

AND

• You used a proper experimental or probability-based sampling design

Analytical strategies cannot account for the failure of these points

Page 4: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Goal of Research in Management of Natural Resources

• understand nature and how it reacts to perturbations

• make reliable predictions based on inferences from analysis of empirical data

Page 5: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Steps in Making Reliable Inferences

• Inference from Sample to the Population

• Identify and understand patterns and mechanisms

• Statistical models to aid detection and interpretation

Why models?

“All models are wrong, but some are useful” (Box 1976)

[Figure: Pr(use) plotted against distance]

Page 6: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

What is Meant by “Model”?

1. Theory: A hypothesis that has survived repeated efforts to falsify it

2. Hypothesis: a story about how the world works

3. Model: an abstraction or simplification of the real world; models as tools for the evaluation of hypotheses

Statistical models separate noise from information inherent in “data”

Statistical models as expression of specific hypotheses

This is particularly important in the model selection framework: recognition that there is not necessarily a single model appropriate for inference

Page 7: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Single vs Multiple Models

Traditional Hypothesis Testing (Single Model)

• Emphasis on the test itself; usually not informative

• “Habitat selection was not significantly different among crop types (P > 0.05).”

• Probability of use is unrelated to distance from a nest, habitat type, and landscape context

• A trivial straw man

• Null (H0): d = 0; hab1 = hab2 = hab3; patch size = 0

• Alternative (HA): d ≠ 0 (d < 0 or d > 0); etc.

• Reject H0 if the test statistic is such that p ≤ 0.05. That is, if the probability of obtaining data at least this extreme when the null is true is 0.05 or less, reject H0 in favor of HA

• alpha level is arbitrary

Page 8: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Traditional Hypothesis Testing (Single Model)

All we typically learn is that the sample sizes were not large enough to detect differences

Page 9: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

A Multiple Model Approach

• Avoid “pet” hypothesis
• All models equally likely in their selection or weight
• Simultaneously comparing and ranking models
• Emphasis on direction, magnitude, and precision of effects
• Estimates can be based on multiple models

Single vs Multiple Models

Probability of use is:
A. unrelated to distance from a nest
B. related linearly
C. related exponentially

All hypotheses receive equal initial weight in evaluation, and all models can be used in inference so one does not have to select a single model

Page 10: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Emphasis on an a priori Model Set

Page 11: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Hypotheses Expressed as Statistical Models

A Global Model
• has many parameters representing plausible effects and the state of the science, as well as relevant study design issues; most complex model of the set

Subsets
• can be considered special cases of the global model; fewer parameters, not necessarily nested; always of the same response variable and estimated from the same set of data

These all become “candidate models”

Goal:
• formulation of a useful set of a priori models
• selection and weighting of models for ranking hypotheses and parameter estimation

Page 12: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Developing an a priori Model Set

1. Have the question crystal-clear

2. Bring in your (team’s) understanding of the problem

3. Incorporate past research via literature review

4. Understand the expectation of the process based on theory and include this expectation in your model set

5. Include models of opposing views

6. Should be subjective: bring in various views and thoughts

7. Avoid including all possible models “just because you can”

8. Number of parameters must be considered in terms of sample size

9. Number of models should be a balance between small number of biologically plausible models and not excluding potentially important models

Simplicity and Clarity as Goals

Page 13: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

A Model of Habitat Selection

Per unit area, Pr(use) = f(distance to focal site) + barriers + attractants

[Figure: map of the study area]

Page 14: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Hypotheses and their Rationale

A. Hypotheses related to distance effects

[Figure: Pr(use) versus distance from the nest]

Page 15: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Hypotheses and their Rationale

B. Hypotheses related to Crop Type

• cover type by dominant species

• cover type by structure

C. Hypotheses related to landscape characteristics

• patch size

• distance to perennial crop

• dominant type within home-range

Page 16: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

The Set of Candidate Models

Global Model: The most complex model

Pr(use)= distance (polynomial), crop types, patch type, distance to perennial crop, dominant in home range

Model Subsets: Includes one or more parameters
• distance (linear)
• distance (log)
• distance (polynomial)
• Crop-only models: include a parameter for each crop type
• Crop types combined into structure classes
• Best distance model + crop parameters
• Best distance model + structure parameters
• No effects model
• Best distance + cover or crop model + patch type
• Etc.

Page 17: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

[Figure: bar chart of a model selection criterion (axis from 0.00 to 0.30) for the models D(l)Crop, D(P)Crop, D(l)Cover, D(P)Cover, Crop, Cover, Dist (L), and Dist (P)]

Page 18: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Conformity of Burrowing Owl Space-Use Patterns to the Central-Place Model

Large individual (and/or sampling) variation

[Figure: percent of locations (0 to 60) versus distance (km) from nest (0.0 to 3.0), for Agriculture and Fragmented landscapes]

Page 19: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Summary: Motivation for an a priori Model Selection Approach

Statistical models to separate pattern from noise

Single vs. multiple model approaches

Insignificance of Statistical Significance Testing (Johnson 1999)

Emphasis on parameter estimation and uncertainty

Ranking and evaluating competing hypotheses

Inference from multiple models: often difficult to identify the best model

Page 20: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Akaike’s Information Criterion (AIC)

• Metric to rank and compare models
• Hirotugu Akaike (1973): “An Information Criterion”
• Simple metric with DEEP theory
  Boltzmann’s entropy - Physics
  Kullback-Leibler discrepancy - Information theory
  Maximum Likelihood Theory - Statistics

Page 21: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Kullback-Leibler Discrepancy
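
The slide's formula did not survive in this transcript. As a rough illustration only, the standard discrete form of the Kullback-Leibler discrepancy, I(f, g) = Σ f_i log(f_i / g_i), can be sketched as below. The distributions `truth`, `model_a`, and `model_b` are made-up numbers, not from the workshop.

```python
import math


def kl_discrepancy(f, g):
    """Discrete Kullback-Leibler discrepancy I(f, g) = sum f_i * log(f_i / g_i).

    f plays the role of "truth" and g the approximating model.
    """
    if len(f) != len(g):
        raise ValueError("f and g must have the same number of categories")
    total = 0.0
    for f_i, g_i in zip(f, g):
        if f_i == 0.0:
            continue  # a category with zero true probability contributes nothing
        total += f_i * math.log(f_i / g_i)
    return total


# Hypothetical distributions, for illustration only.
truth = [0.5, 0.3, 0.2]
model_a = [0.4, 0.4, 0.2]
model_b = [0.1, 0.1, 0.8]

print("I(truth, model_a) =", kl_discrepancy(truth, model_a))
print("I(truth, model_b) =", kl_discrepancy(truth, model_b))
```

A smaller value means less information is lost when the model stands in for truth. In practice truth is unknown, which is why AIC estimates the relative discrepancy rather than the discrepancy itself.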

Page 22: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Maximum Likelihood (ML)

• Good statistical properties (asymptotically)
  Unbiased
  Minimum variance
• Links models, parameters, data
• L(parameters | model, data)
• Usually expressed as a log value: $\log L(\theta \mid g(y), y)$
• Aim is to maximize the log value

Page 23: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

ML Example

• Binomial model: L(p | binomial, y)

$$L(p \mid \text{binomial}, y) = \binom{n}{y}\, p^{y} (1-p)^{n-y}$$

• Log(L(p | bin, y))

$$\log L(p \mid \text{bin}, y) = \log\binom{n}{y} + y \log(p) + (n-y)\log(1-p)$$

For n = 11 and y = 7
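
A minimal sketch of the binomial example, assuming a simple grid search over p (the slide does not say how the maximum was found). The function name and grid resolution are illustrative choices.

```python
import math


def binomial_log_likelihood(p, n, y):
    """log L(p | binomial, y) = log C(n, y) + y*log(p) + (n - y)*log(1 - p)."""
    return (math.log(math.comb(n, y))
            + y * math.log(p)
            + (n - y) * math.log(1.0 - p))


n, y = 11, 7

# Evaluate the log-likelihood on a grid of p values strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, n, y))

print("grid maximum at p =", p_hat)
print("closed-form MLE y/n =", y / n)
```

The grid maximum should sit next to the closed-form estimate y/n, which is where the log-likelihood curve peaks.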

Page 24: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Model over-fitting

Page 25: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Principle of Parsimony

Page 26: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

AIC Basics

$$\mathrm{AIC} = -2\log L + 2k$$

$$-2\log L = -2\log\big(L(\hat{\theta} \mid y)\big) \quad \text{(model fit)}$$

k = number of parameters; 2k is the “penalty”
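
As a sketch, the AIC formula translates directly into a one-line function. The example reuses the binomial case with n = 11 and y = 7 and assumes k = 1 (the single parameter p); the helper names are illustrative.

```python
import math


def aic(log_likelihood, k):
    """AIC = -2 * log L(theta_hat | y) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k


# Maximized binomial log-likelihood for n = 11, y = 7 at p_hat = y / n.
n, y = 11, 7
p_hat = y / n
max_log_lik = (math.log(math.comb(n, y))
               + y * math.log(p_hat)
               + (n - y) * math.log(1.0 - p_hat))

print("AIC for the one-parameter binomial model:", aic(max_log_lik, k=1))
```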

Page 27: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

AICc for small sample sizes

$$\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$$

• Less biased

• Use when n/k < 40

• Better, use all the time!
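
A minimal sketch of the small-sample correction, with made-up inputs. It also shows why using AICc all the time costs little: the correction term shrinks toward zero as n grows relative to k.

```python
def aicc(aic_value, k, n):
    """AICc = AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic_value + (2.0 * k * (k + 1)) / (n - k - 1)


# Hypothetical AIC of 26.0 with k = 3 parameters, at increasing sample sizes.
for n in (20, 120, 1200):
    print(n, aicc(26.0, k=3, n=n))
```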

Page 28: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Model Selection

• Compute AICc for each model
• Rank lowest to highest
• Lowest AICc = “best” model
• Example: Northern Spotted Owl Survival Analysis
  Effects of Seasonal Climate covariates (Precipitation and Temperature)

Page 29: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Model ranking by AICc

Model                      AICc
Pen                        1736.40
Pen+Pln                    1737.91
Pen+Ten                    1738.31
no climate                 1738.72
Pdp'                       1739.54
Phs                        1739.82
Ths                        1740.56
Tln+Pln                    1741.07
Pen+Ten+Pln+Tln            1741.09
Pws+Tws                    1741.67
Pen+Ten+Pws+Tws            1742.36
Pen+Ten+Pln+Tln+Pws+Tws    1745.05

Page 30: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

ΔAICc

ΔAICc = AICc(model) - AICc(min)

Compare model relative to “best” model

Rules of Thumb (B&A):
0-2 = Competing, substantial support
4-7 = Less supported
10+ = Essentially no support
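
A short sketch of the ΔAICc calculation, using a few AICc values from the spotted owl ranking. The function name is illustrative. Because the inputs are rounded to two decimals, the differences can disagree with the slide's three-decimal ΔAICc values in the last digit.

```python
def delta_aicc(aicc_by_model):
    """Return AICc(model) - AICc(min) for every model in the set."""
    best = min(aicc_by_model.values())
    return {model: value - best for model, value in aicc_by_model.items()}


# Subset of the AICc values reported for the survival models.
aicc_by_model = {
    "Pen": 1736.40,
    "Pen+Pln": 1737.91,
    "no climate": 1738.72,
    "Pen+Ten+Pln+Tln+Pws+Tws": 1745.05,
}

deltas = delta_aicc(aicc_by_model)
for model, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{model:<26} {delta:6.2f}")
```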

Page 31: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Relative rankings

Model                      AICc      ΔAICc
Pen                        1736.40   0
Pen+Pln                    1737.91   1.506
Pen+Ten                    1738.31   1.906
no climate                 1738.72   2.32
Pdp'                       1739.54   3.137
Phs                        1739.82   3.416
Ths                        1740.56   4.157
Tln+Pln                    1741.07   4.66
Pen+Ten+Pln+Tln            1741.09   4.682
Pws+Tws                    1741.67   5.265
Pen+Ten+Pws+Tws            1742.36   5.953
Pen+Ten+Pln+Tln+Pws+Tws    1745.05   8.643

Page 32: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Akaike weights

• Relative likelihood of each model

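• Specific to model set ($\sum_i w_i = 1$)

$$w_i = \frac{\exp(-0.5\,\Delta_i)}{\sum_{r=1}^{R} \exp(-0.5\,\Delta_r)}$$

where $\Delta_i$ is the ΔAICc of model i and R is the number of models in the set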
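
As a sketch, the weights can be reproduced from the twelve ΔAICc values reported for the spotted owl survival models. The function name is an illustrative choice.

```python
import math


def akaike_weights(deltas):
    """w_i = exp(-0.5 * delta_i) / sum over r of exp(-0.5 * delta_r)."""
    relative_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative_likelihoods)
    return [rl / total for rl in relative_likelihoods]


# Delta AICc values for the 12 survival models, best model first.
deltas = [0.0, 1.506, 1.906, 2.32, 3.137, 3.416,
          4.157, 4.66, 4.682, 5.265, 5.953, 8.643]

weights = akaike_weights(deltas)
print("weight of best model:", round(weights[0], 4))
print("sum of weights:", round(sum(weights), 4))
```

The numerator exp(-0.5 Δ) is the relative likelihood of each model. Dividing by the sum over the model set is what makes the weights specific to that set.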

Page 33: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Model weights

Model                      ΔAICc   Weight   Rel. Likelihood
Pen                        0       0.3318   1.0000
Pen+Pln                    1.506   0.1562   0.4709
Pen+Ten                    1.906   0.1279   0.3855
no climate                 2.32    0.1040   0.3135
Pdp'                       3.137   0.0691   0.2084
Phs                        3.416   0.0601   0.1813
Ths                        4.157   0.0415   0.1251
Tln+Pln                    4.66    0.0323   0.0973
Pen+Ten+Pln+Tln            4.682   0.0319   0.0962
Pws+Tws                    5.265   0.0239   0.0719
Pen+Ten+Pws+Tws            5.953   0.0169   0.0510
Pen+Ten+Pln+Tln+Pws+Tws    8.643   0.0044   0.0133
Total                              1.0000   3.0143

Page 34: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Model weights

Model                      AICc      ΔAICc   Weight
Pen                        1736.40   0       0.3318
Pen+Pln                    1737.91   1.506   0.1562
Pen+Ten                    1738.31   1.906   0.1279
no climate                 1738.72   2.32    0.1040
Pdp'                       1739.54   3.137   0.0691
Phs                        1739.82   3.416   0.0601
Ths                        1740.56   4.157   0.0415
Tln+Pln                    1741.07   4.66    0.0323
Pen+Ten+Pln+Tln            1741.09   4.682   0.0319
Pws+Tws                    1741.67   5.265   0.0239
Pen+Ten+Pws+Tws            1742.36   5.953   0.0169
Pen+Ten+Pln+Tln+Pws+Tws    1745.05   8.643   0.0044

Page 35: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Fun things to do with weights

• Evidence ratios: Compare one model to another

• Confidence sets: What models are more likely?

• Importance values: What variables are most important?

Page 36: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Evidence Ratios

Compare best model (Pen) with “no climate” model:

w(Pen) = 0.3318, w(no climate) = 0.1040

ER = 0.3318/0.1040 = 3.19

Pen model ~ 3X more likely than no climate model
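
The same ratio can be sketched in code. The second calculation relies on the identity that the ratio of two Akaike weights equals exp(0.5 × the difference in ΔAICc), which follows from the weight formula because the shared denominator cancels.

```python
import math


def evidence_ratio(weight_i, weight_j):
    """Evidence ratio of model i relative to model j."""
    return weight_i / weight_j


# Akaike weights for the Pen and "no climate" models.
er_from_weights = evidence_ratio(0.3318, 0.1040)

# Equivalent route: the difference in delta AICc between the two models is 2.32.
er_from_deltas = math.exp(0.5 * 2.32)

print(round(er_from_weights, 2), round(er_from_deltas, 2))
```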

Page 37: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Confidence Set

Model                      ΔAICc   Weight   Cum. Wts
Pen                        0       0.3318   0.3318
Pen+Pln                    1.506   0.1562   0.4880
Pen+Ten                    1.906   0.1279   0.6158
no climate                 2.32    0.1040   0.7199
Pdp'                       3.137   0.0691   0.7890
Phs                        3.416   0.0601   0.8491
Ths                        4.157   0.0415   0.8906
Tln+Pln                    4.66    0.0323   0.9229
Pen+Ten+Pln+Tln            4.682   0.0319   0.9548
Pws+Tws                    5.265   0.0239   0.9787
Pen+Ten+Pws+Tws            5.953   0.0169   0.9956
Pen+Ten+Pln+Tln+Pws+Tws    8.643   0.0044   1.0000

95% confidence set: Pen through Pen+Ten+Pln+Tln (cumulative weight 0.9548)
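
A minimal sketch of building a confidence set, assuming the usual rule of adding models in order of decreasing weight until the cumulative weight first reaches the chosen level. The function name and the `level` parameter are illustrative.

```python
def confidence_set(weights_by_model, level=0.95):
    """Smallest set of top-ranked models whose cumulative weight reaches `level`."""
    ranked = sorted(weights_by_model.items(), key=lambda item: item[1], reverse=True)
    chosen = []
    cumulative = 0.0
    for model, weight in ranked:
        chosen.append(model)
        cumulative += weight
        if cumulative >= level:
            break
    return chosen


# Akaike weights reported for the 12 survival models.
weights_by_model = {
    "Pen": 0.3318, "Pen+Pln": 0.1562, "Pen+Ten": 0.1279, "no climate": 0.1040,
    "Pdp'": 0.0691, "Phs": 0.0601, "Ths": 0.0415, "Tln+Pln": 0.0323,
    "Pen+Ten+Pln+Tln": 0.0319, "Pws+Tws": 0.0239, "Pen+Ten+Pws+Tws": 0.0169,
    "Pen+Ten+Pln+Tln+Pws+Tws": 0.0044,
}

models_95 = confidence_set(weights_by_model)
print(len(models_95), "models; last one included:", models_95[-1])
```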

Page 38: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Importance values

• Cement Hardening Example (B&A)
• Time to hardening (y) based on composition of 4 different ingredients (xi)
• Regression: y = b0 + b1(x1) + b2(x2) + b3(x3) + b4(x4)
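
The cement data are not reproduced in this transcript, so the sketch below illustrates importance values with the spotted owl weights instead. It assumes the usual definition: a variable's importance is the sum of the Akaike weights of every model that contains it. Splitting model names on "+" is a convenience for this example only.

```python
def importance_values(weights_by_model):
    """Sum Akaike weights over all models that include each variable."""
    importance = {}
    for model, weight in weights_by_model.items():
        for variable in model.split("+"):
            importance[variable] = importance.get(variable, 0.0) + weight
    return importance


# Akaike weights reported for the 12 survival models.
weights_by_model = {
    "Pen": 0.3318, "Pen+Pln": 0.1562, "Pen+Ten": 0.1279, "no climate": 0.1040,
    "Pdp'": 0.0691, "Phs": 0.0601, "Ths": 0.0415, "Tln+Pln": 0.0323,
    "Pen+Ten+Pln+Tln": 0.0319, "Pws+Tws": 0.0239, "Pen+Ten+Pws+Tws": 0.0169,
    "Pen+Ten+Pln+Tln+Pws+Tws": 0.0044,
}

importance = importance_values(weights_by_model)
for variable, value in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"{variable:<12} {value:.4f}")
```

Importance values are easiest to compare when each variable appears in a similar number of candidate models, which is not the case in this model set.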

Page 39: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

AIC in regression analyses

• Number of parameters: k = number of variables (xi) + intercept (if used) + error variance (σ²)

• AIC may be calculated from the estimated error variance (σ̂²) as: $\mathrm{AIC} = n\log(\hat{\sigma}^2) + 2k$
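
A hedged sketch of the least-squares form of AIC. It assumes the error variance is the maximum likelihood estimate, σ̂² = RSS / n (residual sum of squares over sample size), and the numbers in the example are hypothetical.

```python
import math


def count_parameters(n_variables, intercept=True):
    """k = number of variables + intercept (if used) + 1 for the error variance."""
    return n_variables + (1 if intercept else 0) + 1


def aic_least_squares(rss, n, k):
    """AIC = n * log(sigma_hat^2) + 2k, with sigma_hat^2 = RSS / n."""
    sigma2_hat = rss / n
    return n * math.log(sigma2_hat) + 2.0 * k


# Hypothetical regression: 2 predictors plus intercept, n = 10, RSS = 20.
k = count_parameters(n_variables=2)
print("k =", k)
print("AIC =", aic_least_squares(rss=20.0, n=10, k=k))
```

Counting the error variance in k is the step most often missed, and it is one of the “hidden” parameters that matter when comparing models.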

Page 40: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Multi-model inference

Model Averaging

• Incorporates model selection uncertainty
• Used for parameter estimation
  Directly estimated or not
  E.g. regression coefficients, predicted values
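
A minimal sketch of a model-averaged estimate, assuming the common form in which each model's estimate is weighted by its Akaike weight. The estimates and weights below are hypothetical.

```python
def model_average(estimates, weights):
    """Weighted average of per-model estimates using Akaike weights.

    Weights are renormalized so the result is valid even when only a
    subset of the model set contains the parameter.
    """
    if len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total_weight = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total_weight


# Hypothetical estimates of one parameter from three candidate models.
estimates = [0.82, 0.79, 0.85]
weights = [0.50, 0.30, 0.20]

print("model-averaged estimate:", model_average(estimates, weights))
```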

Page 41: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Pitfalls to avoid

• Use same data set for all models
  Caution: missing values
• Transform X’s but not Y
• Number of parameters known?
  “hidden” parameters
  “lost” parameters

Bottom line: Know what you are doing!

Page 42: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Interpreting Results

Some issues:
• Models differing by 1 parameter
• Model ambiguity
• Null model best
• Model redundancy

Page 43: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Model                      K    AICc      ΔAICc   Deviance
Pen                        8    1736.40   0.000   461.182
Pen+Pln                    9    1737.91   1.506   460.665
Pen+Ten                    9    1738.31   1.906   461.065
no climate                 7    1738.72   2.320   465.523
Pdp'                       8    1739.54   3.137   464.319
Phs                        8    1739.82   3.416   464.598
Ths                        8    1740.56   4.157   465.339
Tln+Pln                    9    1741.07   4.660   463.819
Pen+Ten+Pln+Tln            11   1741.09   4.682   459.785
Pws+Tws                    9    1741.67   5.265   464.423
Pen+Ten+Pws+Tws            11   1742.36   5.953   461.056
Pen+Ten+Pln+Tln+Pws+Tws    13   1745.05   8.643   459.680

Page 44: Model Selection and Inference:  Motivation, Mechanics, and Interpretation

Model Ambiguity

NSO Productivity: Modeled as function of Habitat covariates

Model Name                     k    AICc       ΔAICc
hilpsp                         12   1123.922   0.000
himpsl                         12   1124.913   0.991
hinpl                          12   1125.082   1.160
hilpsl                         12   1125.448   1.526
hi1500p                        12   1125.622   1.700
himpsp                         12   1125.838   1.916
hinpn                          12   1125.853   1.931
hilpsl                         13   1126.181   2.259
hi2400l lm2400l hir1p hir2p    15   1126.195   2.273
no1500q                        12   1126.303   2.381
no habitat covariates          11   1126.331   2.409
hi1500l                        12   1126.448   2.526
hi1500l lm1500l                13   1126.571   2.649
no2400q                        12   1126.607   2.685
hi1500p lm1500p                13   1126.765   2.843
hi2400p                        12   1126.841   2.919
hicorep                        12   1126.988   3.066
hi600p hir1p                   13   1127.143   3.221
hi2400l                        12   1127.505   3.583
elev                           12   1127.538   3.616
hi600q                         13   1127.630   3.709
elevl                          12   1127.640   3.718
hi1500p hiedgen                13   1127.694   3.772
hicorel                        12   1127.702   3.780

Plus 40 more models with ΔAICc < 7