We start Part II (Quantifying Uncertainty) today

Lab 2 - Equations
Tomorrow - Tue 3-5 or 7-9 PM - SN 4117

Assignment 2 – Data Equations
Due Wednesday

Data = Model + Residual
Chapter 5

Data Equations

Data = Model + Residual

Data Equations

• Central concept in the course
• We’ll teach a general approach that will allow you to set up an appropriate analysis of data
• You don’t have to worry about whether you have selected the ‘right’ test
• We are going to use data equations to compare models to data

Data = Model + Residual

Flexible approach

Data = Model + Residual

Symbolic expression Analyse data

• Increase confidence using model triangle
– Link symbols with graphical and verbal model

[Figure: model triangle linking Data with Verbal, Graphical, and Formal models]

Assessing fit

• We’ll use data equations to measure:
– How well the model fits the data (goodness of fit)
– Error rate

[Figure: model and data, with axes labelled plant height and time in sunlight]

• We do not expect a perfect fit

• But model and observed values should be close

Definition

• Three terms: Y = Ŷ + ε
  Data = Model + Residual
  Observed = Expected (Fitted) + Error
• Residuals: ε = Y - Ŷ

e.g. Dobzhansky’s Fruit Flies

• Dobzhansky pioneered work on fruit fly evolutionary genetics in the lab and field

• One research question he addressed was:
– Does genetic variability decrease at higher altitude, due to stronger selection in extreme environments?

Nothing in Biology Makes Sense Except in the Light of Evolution

Heterozygosity H (%)   Elevation E (km)
0.59                   0.26
0.37                   0.91
0.41                   1.4
0.4                    1.89
0.31                   2.44
0.18                   2.62
0.2                    3.05


Nothing in Biology Makes Sense Except in the Light of Statistics?

Model?

• Many options
• First we’ll check deviance from a single value
• Heterozygosity in Drosophila = 40%

Data = Model + Residual
H = Ĥ + ε
H = 40% + ε

Data  =  Model  +  Res.     Res.²
0.59  =  0.4    +   0.19    0.0361
0.37  =  0.4    +  -0.03    0.0009
0.41  =  0.4    +   0.01    0.0001
0.4   =  0.4    +   0.00    0.0000
0.31  =  0.4    +  -0.09    0.0081
0.18  =  0.4    +  -0.22    0.0484
0.2   =  0.4    +  -0.20    0.0400
∑                  -0.34    0.1336

Model 1: Deviations from a Single Value Model

• With this simple model, we can form 7 data equations

The summed residuals (∑ res) are a measure of bias
The summed squared residuals (∑ res²) are a measure of goodness of fit
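The seven data equations for the single value model can be checked with a few lines of code. This is a minimal sketch in Python, not part of the course material; the variable names (`H`, `model`, `residuals`) are illustrative, and the heterozygosity values are entered as proportions, as in the table.

```python
# Heterozygosity values from the table, as proportions
H = [0.59, 0.37, 0.41, 0.40, 0.31, 0.18, 0.20]
model = 0.40  # single value model: heterozygosity = 40%

# Data = Model + Residual, so Residual = Data - Model
residuals = [h - model for h in H]

sum_res = sum(residuals)                   # summed residuals (bias)
sum_res2 = sum(r ** 2 for r in residuals)  # summed squared residuals (fit)

# Print one data equation per observation, then the two sums
for h, r in zip(H, residuals):
    print(f"{h:.2f} = {model:.2f} + {r:+.2f}")
print(f"sum res = {sum_res:.2f}, sum res^2 = {sum_res2:.4f}")
```

A negative sum of residuals means the observations sit, on balance, below the model value of 40%.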

What if the parameter is unknown?

• Use statistical methods to make the "best" estimate
• What does "best" mean?
– Residuals should sum to zero (unbiased estimate)
– Residuals should be as small as possible
• The mean meets both criteria
• Next model: Deviations from the Mean
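Why the mean meets both criteria can be shown in two short steps. The sketch below reads "as small as possible" as "smallest summed squared residuals", which is an assumption; $n$ is the number of observations and $c$ stands for any other candidate single value.

```latex
% Criterion 1: residuals from the mean sum to zero
\sum_{i=1}^{n} (H_i - \bar{H}) = \sum_{i=1}^{n} H_i - n\bar{H} = n\bar{H} - n\bar{H} = 0

% Criterion 2: any other single value c gives larger summed squared residuals
\sum_{i=1}^{n} (H_i - c)^2
  = \sum_{i=1}^{n} (H_i - \bar{H})^2 + n(\bar{H} - c)^2
  \ge \sum_{i=1}^{n} (H_i - \bar{H})^2
```

The cross term in the second line vanishes because of the first line. With $c = 0.4$ and the seven heterozygosity values, the extra term is $7 \times (0.3514 - 0.4)^2 \approx 0.0165$.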

Data = Model + Residual

Data  =  Model   +  Res.      Res.²
0.59  =  0.3514  +   0.2386   0.0569
0.37  =  0.3514  +   0.0186   0.0003
0.41  =  0.3514  +   0.0586   0.0034
0.4   =  0.3514  +   0.0486   0.0024
0.31  =  0.3514  +  -0.0414   0.0017
0.18  =  0.3514  +  -0.1714   0.0294
0.2   =  0.3514  +  -0.1514   0.0229
∑                    0        0.1171

$\bar{H} = 0.3514$

Model 2: Deviations from the Mean

• Form 7 data equations
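The same check works when the parameter is estimated from the data. The following is a minimal Python sketch with illustrative variable names; the only change from a fixed-value model is that the model value is the mean of the observations.

```python
# Heterozygosity values from the table, as proportions
H = [0.59, 0.37, 0.41, 0.40, 0.31, 0.18, 0.20]

# The model parameter is now estimated from the data
mean_H = sum(H) / len(H)

# Residual = Data - Model, with the mean as the model
residuals = [h - mean_H for h in H]
sum_res = sum(residuals)                   # zero, up to floating-point rounding
sum_res2 = sum(r ** 2 for r in residuals)  # summed squared residuals

print(f"mean = {mean_H:.4f}")
print(f"sum res = {abs(sum_res):.4f}, sum res^2 = {sum_res2:.4f}")
```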

Single value vs. Mean model

Single value model: $H = \hat{H} + \varepsilon$
$\hat{H} = 0.4$, ∑ res = -0.34, ∑ res² = 0.1336

Mean model: $H = \bar{H} + \varepsilon$
$\bar{H} = 0.3514$, ∑ res = 0, ∑ res² = 0.1171

• Mean model: unbiased and better fit
• But biological criteria have been replaced by statistical criteria




What about elevation? ...Don't you remember my question?

Does genetic variability decrease at higher altitude, due to stronger selection in extreme environments?

Model 3: Deviations from a linear trend

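Data = Model + Residual
$H = H_o + \beta_E \cdot E + \varepsilon$

• What’s all that?
• Where:
– $\beta_E$ is the heterozygosity gradient (%/km)
– $E$ is elevation (km)
– $H_o$ is the offset

Remember the equation of a straight line?

Units: $H$ (%), $H_o$ (%), $\beta_E$ (%/km), $E$ (km), $\varepsilon$ (%)

Estimate slope ($\beta_E$) and offset ($H_o$)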


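Heterozygosity H (%)   Elevation E (km)
0.59                   0.26
0.2                    3.05

$\hat{\beta}_E = \frac{\Delta y}{\Delta x} = \frac{(20 - 59)\,\%}{(3.05 - 0.26)\,\text{km}} = -13.98\ \%/\text{km}$

$H = H_o + \beta_E \cdot E$

$H = H_o + (-13.98\ \%/\text{km}) \cdot E$
$59\,\% = H_o + (-13.98\ \%/\text{km}) \cdot 0.26\ \text{km}$
$H_o = 59\,\% - (-13.98\ \%/\text{km}) \cdot 0.26\ \text{km}$
$H_o = 62.63\,\%$

$H = 0.626 - 0.1398 \cdot E + \varepsilon$
Parameters estimated using the first and last $H$ and $E$ values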

With this equation, we can calculate fitted values

Elevation  Data  =  Model   +  Res.      Res.²
0.26       0.59  =  0.5900  +   0.0000   0.0000
0.91       0.37  =  0.4991  +  -0.1291   0.0167
1.4        0.41  =  0.4306  +  -0.0206   0.0004
1.89       0.4   =  0.3622  +   0.0378   0.0014
2.44       0.31  =  0.2853  +   0.0247   0.0006
2.62       0.18  =  0.2601  +  -0.0801   0.0064
3.05       0.2   =  0.2000  +   0.0000   0.0000
∑                              -0.1673   0.0256
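The fitted values and residuals for the two-point line can be recomputed from the elevation and heterozygosity pairs. This is a minimal Python sketch with illustrative variable names, assuming heterozygosity is entered as a proportion (so the slope is about -0.14 per km rather than -13.98 %/km). It keeps the unrounded slope and offset, so the last printed digit may differ from values worked out with rounded parameters.

```python
E = [0.26, 0.91, 1.40, 1.89, 2.44, 2.62, 3.05]  # elevation (km)
H = [0.59, 0.37, 0.41, 0.40, 0.31, 0.18, 0.20]  # heterozygosity (proportion)

# Slope from the first and last points: change in H over change in E
slope = (H[-1] - H[0]) / (E[-1] - E[0])
# Offset: solve H[0] = offset + slope * E[0] for the offset
offset = H[0] - slope * E[0]

# Fitted value for every elevation, then Residual = Data - Model
fitted = [offset + slope * e for e in E]
residuals = [h - f for h, f in zip(H, fitted)]
sum_res = sum(residuals)
sum_res2 = sum(r ** 2 for r in residuals)

for e, h, f, r in zip(E, H, fitted, residuals):
    print(f"{e:.2f}  {h:.2f} = {f:.4f} + {r:+.4f}  {r ** 2:.4f}")
print(f"sum res = {sum_res:.4f}, sum res^2 = {sum_res2:.4f}")
```

Because the line is forced through the first and last points, those two residuals are zero by construction, and nothing constrains the remaining five to balance out.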



There’s a better way to estimate slope

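• "Least squares" estimate of slope ($\hat{\beta}_E$)
• Estimate offset ($\hat{H}_o$) from mean values

$\hat{\beta}_E = \frac{\sum (H - \bar{H})(E - \bar{E})}{\sum (E - \bar{E})^2} = -0.127$

$H = \hat{H}_o + \hat{\beta}_E \cdot E + \varepsilon$

• Line passes through mean coordinates ($\bar{E}$, $\bar{H}$)
• We know less about Y-intercept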

Model 4: Deviations from a linear trend (least squares)

$H = 0.58 - 0.127 \cdot E + \varepsilon$
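The least squares estimates can be reproduced directly from the slope formula. The sketch below is in plain Python with illustrative variable names. It takes the offset as mean H minus slope times mean E, which follows from the line passing through the mean coordinates, and it assumes heterozygosity is entered as a proportion.

```python
E = [0.26, 0.91, 1.40, 1.89, 2.44, 2.62, 3.05]  # elevation (km)
H = [0.59, 0.37, 0.41, 0.40, 0.31, 0.18, 0.20]  # heterozygosity (proportion)

n = len(H)
mean_E = sum(E) / n
mean_H = sum(H) / n

# Least squares slope: summed cross-products over summed squared E deviations
sxy = sum((h - mean_H) * (e - mean_E) for h, e in zip(H, E))
sxx = sum((e - mean_E) ** 2 for e in E)
slope = sxy / sxx

# Offset from the mean values: the line passes through (mean_E, mean_H)
offset = mean_H - slope * mean_E

# Residual = Data - Model; these sum to zero up to floating-point rounding
residuals = [h - (offset + slope * e) for h, e in zip(H, E)]
sum_res = sum(residuals)

print(f"H = {offset:.2f} {slope:+.3f} * E")
print(f"sum res = {abs(sum_res):.4f}")
```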

Model comparison

Single value model based on prior knowledge | ∑ res = -0.34
Mean model (least squares) | ∑ res = 0
Linear trend (two data points) | ∑ res = -0.17
Linear trend (least squares) | ∑ res = 0

Two unbiased models

Mean model (least squares) | ∑ res² = 0.1171
Linear trend (least squares) | ∑ res² = 0.0204
Reduction in squared deviance ∑ res² = 0.0966
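The comparison of the two unbiased models can be sketched in code as well. This Python example uses a hypothetical helper, `summed_squares`, and recomputes both fits from the seven rounded pairs in the table, so its totals may differ from the quoted ones in the last decimal place.

```python
E = [0.26, 0.91, 1.40, 1.89, 2.44, 2.62, 3.05]  # elevation (km)
H = [0.59, 0.37, 0.41, 0.40, 0.31, 0.18, 0.20]  # heterozygosity (proportion)


def summed_squares(data, fitted):
    """Summed squared residuals, where Residual = Data - Model."""
    return sum((d - f) ** 2 for d, f in zip(data, fitted))


n = len(H)
mean_E = sum(E) / n
mean_H = sum(H) / n

# Mean model: every fitted value is the mean heterozygosity
fitted_mean = [mean_H] * n

# Linear trend (least squares): slope from deviations, offset from the means
slope = (sum((h - mean_H) * (e - mean_E) for h, e in zip(H, E))
         / sum((e - mean_E) ** 2 for e in E))
offset = mean_H - slope * mean_E
fitted_trend = [offset + slope * e for e in E]

ss_mean = summed_squares(H, fitted_mean)
ss_trend = summed_squares(H, fitted_trend)
reduction = ss_mean - ss_trend  # squared deviance accounted for by elevation

print(f"mean model:   sum res^2 = {ss_mean:.4f}")
print(f"linear trend: sum res^2 = {ss_trend:.4f}")
print(f"reduction:    {reduction:.4f}")
```

The reduction is the part of the squared deviance from the mean that the elevation term accounts for.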
