Transcript
Page 1: Predicting Median Substrate

Predicting Median Substrate for Oregon and Washington EMAP Sites Utilizing GIS Data

Julia J. Smith

December 12, 2005

Page 2: Predicting Median Substrate

Why Predict Median Substrate?

Indicator of overall stream health
• Bed load transport
• Stream power
• Macroinvertebrate habitat
• Fish habitat
• How is human development affecting a stream?

Page 3: Predicting Median Substrate

What is LD50?

LD50 is a measure of median substrate.
• Geometric mean of class boundaries
• Log10 of the geometric means
• Several samples at each site
• LD50 is the median value of log10(geometric mean of class)

Page 4: Predicting Median Substrate

Substrate Classifications

Substrate size (mm)   Class             Geometric mean (mm)   Log10 of geom. mean
8000-4000             Bedrock           5656.85               3.7527
4000-250              Boulders          1000.00               3.0000
250-64                Cobbles           126.49                2.1020
64-16                 Gravel (coarse)   32.00                 1.5052
16-2                  Gravel (fine)     5.66                  0.7526
2-0.06                Sand              0.35                  -0.4604
0.06-0.001            Fines             0.00775               -2.1109
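As an illustration of how the table values and LD50 fit together, here is a minimal Python sketch. The class boundaries come from the table; the function names and the five-sample site are hypothetical and are not data from the study.

```python
import math
from statistics import median

# Class boundaries in mm (lower, upper), taken from the substrate table.
CLASS_BOUNDS_MM = {
    "Bedrock": (4000, 8000),
    "Boulders": (250, 4000),
    "Cobbles": (64, 250),
    "Gravel (coarse)": (16, 64),
    "Gravel (fine)": (2, 16),
    "Sand": (0.06, 2),
    "Fines": (0.001, 0.06),
}

def log10_geometric_mean(lower_mm, upper_mm):
    """Log10 of the geometric mean of the two class boundaries."""
    return math.log10(math.sqrt(lower_mm * upper_mm))

def ld50(sample_classes):
    """Median of log10(geometric mean of class) over the samples at one site."""
    return median(log10_geometric_mean(*CLASS_BOUNDS_MM[c]) for c in sample_classes)

# Hypothetical site with five substrate samples (illustration only).
site = ["Sand", "Gravel (fine)", "Gravel (coarse)", "Cobbles", "Cobbles"]
site_ld50 = ld50(site)
```

With an even number of samples the median averages the two middle values, which would be consistent with intermediate values such as 1.13 (halfway between fine and coarse gravel) appearing in the LD50 keys on the site maps.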

Page 5: Predicting Median Substrate

Washington EPA Sites for LD50 Study

[Map of Washington sites] LD50 key: -2.11, -0.46, 0.15, 0.75, 1.13, 1.51, 1.80, 2.10, 2.55, 3, 3.75

Page 6: Predicting Median Substrate

Oregon EPA Sites for LD50 Study

[Map of Oregon sites] LD50 key: -2.11, -1.29, -0.46, 0.15, 0.75, 1.13, 1.51, 1.80, 2.10, 3, 3.75

Page 7: Predicting Median Substrate

Geomorphic Metrics

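$$D_{50} = \frac{\tau}{(\rho_s - \rho)\, g\, \tau_c^*} = \frac{\rho h S}{(\rho_s - \rho)\, \tau_c^*}$$

• $\tau = \rho g h S$ is the total bank-full shear stress
• $\rho_s$ is the density of sediment
• $\rho$ is fluid density
• $g$ is gravitational acceleration
• $h$ is bank-full depth
• $S$ is channel slope
• $\tau_c^*$ is the (dimensionless) critical shear stress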
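A worked example of the relation D50 = ρhS / ((ρs − ρ) τc*) may help. The sketch below is an illustration only: the default densities (quartz sediment, fresh water) and the critical shear stress value of 0.03 are assumptions chosen for the example, not values stated on the slide.

```python
import math

def predicted_d50(depth_m, slope, rho_s=2650.0, rho=1000.0, tau_c_star=0.03):
    """Predicted median grain size in meters.

    D50 = rho * h * S / ((rho_s - rho) * tau_c*), where the gravitational
    acceleration cancels because tau = rho * g * h * S.
    The default parameter values are assumptions for illustration.
    """
    return rho * depth_m * slope / ((rho_s - rho) * tau_c_star)

# Hypothetical reach: 1 m bank-full depth, 1% channel slope.
d50_m = predicted_d50(depth_m=1.0, slope=0.01)

# Convert to mm and take log10 to put the value on the LD50 scale.
d50_log10_mm = math.log10(d50_m * 1000.0)
```

For this hypothetical reach the predicted grain size is about 0.2 m, which falls in the cobble class.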

Page 8: Predicting Median Substrate

Geomorphic Metrics

[Figure: distance-weighted stream power (y-axis) plotted against LD50 (x-axis)]

Distance-weighted Stream Power versus LD50: r = 0.327, p-value = 2.63 x 10^-12
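For readers who want to reproduce this kind of summary, the sketch below computes a correlation coefficient and the matching t statistic. It assumes the reported r is a Pearson correlation, which the slides do not state, and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical metric values and LD50 values for four sites (illustration only).
metric = [0.02, 0.10, 0.05, 0.20]
ld50_values = [-0.46, 0.75, 1.51, 2.10]
r = pearson_r(metric, ld50_values)
```

The p-value would then come from a t distribution with n - 2 degrees of freedom, which needs a statistics library beyond the Python standard library.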

Page 9: Predicting Median Substrate

Geomorphic Metrics

[Figure: outlet link mean slope (y-axis) plotted against LD50 (x-axis)]

Outlet link mean slope versus LD50: r = 0.214, p-value = 3.78 x 10^-6

Page 10: Predicting Median Substrate

Geologic Metrics

[Figure: percent unconsolidated geologic type (y-axis) plotted against LD50 (x-axis)]

Percent unconsolidated geologic type versus LD50: r = -0.246, p-value = 1.18 x 10^-7

Page 11: Predicting Median Substrate

Climatic Metrics

[Figure: average annual precipitation (y-axis) plotted against LD50 (x-axis)]

Annual average precipitation versus LD50: r = 0.199, p-value = 1.56 x 10^-6

Page 12: Predicting Median Substrate

Climatic Metrics

[Figure: average annual potential evapotranspiration in mm (y-axis) plotted against LD50 (x-axis)]

Average annual potential evapotranspiration (mm) versus LD50: r = -0.046, p-value = 0.342

Page 13: Predicting Median Substrate

Land Cover Metrics

1. Developed
2. Barren
3. Forest
4. Grasses
5. Agriculture
6. Wetlands
7. Open water/perennial ice and snow
8. Shrubland

Page 14: Predicting Median Substrate

Land Cover Metrics

[Figure: percent forest (y-axis) plotted against LD50 (x-axis)]

Percentage of watershed that is forest versus LD50: r = 0.19, p-value = 3.516 x 10^-5

Page 15: Predicting Median Substrate

Distance-Weighted metrics

$$\text{Weighted Area}_j = \frac{A_j\, e^{-\alpha \bar{d}_j}}{\sum_{i=1}^{n} A_i\, e^{-\alpha \bar{d}_i}}$$

• $j$ represents the land cover type of concern
• $A_j$ represents the total area for land cover type $j$ in the watershed
• $\alpha$ represents the coefficient of exponential decay
• $\bar{d}_j$ represents the average distance from the outlet for land cover of type $j$
• $n$ represents the total number of land cover types
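The distance weighting can be sketched in a few lines of Python. This sketch assumes the exponent is negative (so that land cover far from the outlet counts for less, as "exponential decay" suggests) and uses a hypothetical decay coefficient and hypothetical areas and distances.

```python
import math

def distance_weighted_areas(areas, mean_distances, decay):
    """Share of each land cover type after exponential distance weighting.

    areas[j] is the total area of type j, mean_distances[j] its average
    distance from the outlet, and decay the (assumed positive) coefficient
    of exponential decay.
    """
    weights = [a * math.exp(-decay * d) for a, d in zip(areas, mean_distances)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical watershed with three land cover types (illustration only).
areas = [10.0, 30.0, 60.0]          # any consistent area unit
mean_distances = [1.0, 5.0, 10.0]   # any consistent distance unit
weighted = distance_weighted_areas(areas, mean_distances, decay=0.1)
unweighted = distance_weighted_areas(areas, mean_distances, decay=0.0)
```

With a decay coefficient of zero the result reduces to the plain area proportions; with a positive coefficient the type closest to the outlet gains weight relative to its raw share.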

Page 16: Predicting Median Substrate

Additional Land Cover Metrics

Buffered metrics – land cover measured within a buffer around the stream (30 meters, 100 meters, 300 meters)

Buffered and Distance-weighted metrics

Page 17: Predicting Median Substrate

Goals

• Predict LD50 without visiting sites
• Small number of predictors for a scientifically sensible model

Page 18: Predicting Median Substrate

Methods-Stepwise Variable Selection

Multiple Linear Regression
• Top-in-tier models
• Top geomorphic models plus one from each of the remaining tiers

Page 19: Predicting Median Substrate

Akaike’s Information Criterion

$$\mathrm{AIC} = N \log\left(\frac{RSS}{N}\right) + 2(p + 2)$$

• $N$ is the number of observations
• $p$ is the number of predictors
• $RSS$ is the sum of squared residuals
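As a small worked example of the criterion AIC = N log(RSS/N) + 2(p + 2), the sketch below compares two hypothetical models. It assumes the natural logarithm, and the RSS values are made up for illustration.

```python
import math

def aic(rss, n_obs, n_predictors):
    """AIC = N * log(RSS / N) + 2 * (p + 2), using the natural logarithm."""
    return n_obs * math.log(rss / n_obs) + 2 * (n_predictors + 2)

# Hypothetical comparison: adding a fourth predictor lowers RSS from 100 to 90.
aic_three = aic(rss=100.0, n_obs=100, n_predictors=3)
aic_four = aic(rss=90.0, n_obs=100, n_predictors=4)
prefer_four = aic_four < aic_three
```

The extra predictor costs 2 units of penalty, so it is worth keeping only if it reduces N log(RSS/N) by more than that, as it does in this hypothetical case.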

Page 20: Predicting Median Substrate

AIC in stepwise variable selection

Forward Stepwise Selection -

Method for choosing the top predictor from each tier

1. Start with the intercept model

2. Choose the variable that reduces AIC the most and include in model.

Stepwise selection in both directions-

Method chosen for choosing all top Geomorphic predictors

1. Start with full model.

2. Add and subtract variables until the model with minimum AIC is found or iteration stops.
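The forward step described above can be sketched as follows. This is a minimal standard-library illustration, not the software used in the study: the least-squares solver, the function names, and the two candidate predictors with their values are all hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def rss(columns, y):
    """RSS of the least-squares fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def aic(rss_value, n_obs, n_predictors):
    """AIC = N * log(RSS / N) + 2 * (p + 2)."""
    return n_obs * math.log(rss_value / n_obs) + 2 * (n_predictors + 2)

def forward_step(candidates, y, selected=()):
    """Return (name, AIC) of the candidate whose addition gives the lowest AIC."""
    best_name, best_aic = None, float("inf")
    for name, column in candidates.items():
        if name in selected:
            continue
        columns = [candidates[s] for s in selected] + [column]
        score = aic(rss(columns, y), len(y), len(columns))
        if score < best_aic:
            best_name, best_aic = name, score
    return best_name, best_aic

# Hypothetical predictors and response for six sites (illustration only).
candidates = {
    "slope": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "precip": [3.0, 1.0, 2.0, 5.0, 4.0, 6.0],
}
ld50_values = [0.3, 0.5, 0.6, 0.9, 1.0, 1.3]

intercept_aic = aic(rss([], ld50_values), len(ld50_values), 0)
first_name, first_aic = forward_step(candidates, ld50_values)
```

Selection in both directions would repeat this kind of step while also trying to drop each variable already in the model, stopping when no single addition or removal lowers AIC.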

Page 21: Predicting Median Substrate

Methods: CART Classification and Regression Trees

[Figure: example regression tree]

Split rules shown in the tree: DWSP2 < 0.03129; snow_jan < 190.6; MENTR >= 20.35; b30_l11 < 0.003034; r8_l80_A >= 0.0917; b100_l51 < 0.004057; prcp_sep < 19.05; avgt_jun >= 12.58; prcp_may < 46.6; link_sa4 < 0.08306; prcp_jan < 47.49; b30_r7_l30 >= 0.01239; mint_apr >= 2.647; min_elev >= 1025

Terminal-node values: -1.66, -1.03, 0.69, 0.565, 0.941, -0.823, 0.298, 1.49, -1.04, -0.172, 1.02, 1.65, 0.439, 1.49, 2.01

Page 22: Predicting Median Substrate

Methods: CART Classification and Regression Trees

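Predicted Response:

$$\hat{y}(x_i) = \sum_{j=1}^{q} \hat{a}_j \, \mathbf{1}(x_i \in N_j)$$

where $N_1, \dots, N_q$ are the terminal nodes of the tree and $\hat{a}_j$ is the fitted value (the mean response) in node $N_j$.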
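To make the predicted response concrete, here is a minimal sketch of a regression tree with a single split, so q = 2 terminal nodes. Each node predicts the mean response of the sites that fall in it. The predictor values and responses are hypothetical, and a real CART fit would split recursively and prune.

```python
def node_mean(values):
    """Fitted value for a terminal node: the mean response in the node."""
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the node mean."""
    mean = node_mean(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Threshold on x that minimizes the summed within-node squared error.

    Returns (threshold, left_mean, right_mean).
    """
    levels = sorted(set(x))
    best = None
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, node_mean(left), node_mean(right))
    return best[1:]

def predict(x_new, threshold, left_mean, right_mean):
    """Predicted response: the mean of the node that x_new falls into."""
    return left_mean if x_new < threshold else right_mean

# Hypothetical predictor and LD50 values for six sites (illustration only).
x = [0.01, 0.02, 0.03, 0.05, 0.06, 0.07]
y = [-0.46, -0.46, 0.75, 1.51, 1.51, 2.10]
threshold, left_mean, right_mean = best_split(x, y)
prediction = predict(0.04, threshold, left_mean, right_mean)
```

Because every site in a node receives the same prediction, a single node can cover several observed substrate sizes.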

Page 23: Predicting Median Substrate

Hybrid of Multiple Linear Regression and CART

• Utilize CART on the residuals of the multiple linear regression
• Add indicator variables for the terminal nodes of the tree to the multiple linear regression equation (one fewer indicator than the number of terminal nodes)
• Create a new multiple regression model with the original variables and the indicator variables
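The three hybrid steps can be sketched end to end. This is an illustration under strong simplifications, not the study's implementation: one linear predictor, a residual tree with a single split on a second variable (two terminal nodes, hence one indicator), a small hand-written least-squares solver, and hypothetical data.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(design, y):
    """Least-squares coefficients and residuals for a design matrix given as rows."""
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    residuals = [yi - sum(bj * xj for bj, xj in zip(beta, row))
                 for row, yi in zip(design, y)]
    return beta, residuals

def sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_threshold(z, residuals):
    """Single-split tree on the residuals: threshold on z with the lowest within-node SSE."""
    levels = sorted(set(z))
    best_cost, best_t = float("inf"), None
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [r for zi, r in zip(z, residuals) if zi < t]
        right = [r for zi, r in zip(z, residuals) if zi >= t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost, best_t = cost, t
    return best_t

# Hypothetical data for eight sites (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # predictor in the linear model
z = [0.2, 0.9, 0.1, 0.8, 0.3, 0.7, 0.4, 0.6]   # variable offered to the tree
y = [0.6, 1.9, 1.5, 3.1, 2.4, 4.0, 3.6, 4.9]   # response

# Step 1: multiple linear regression (here a single predictor plus intercept).
beta_linear, resid_linear = fit([[1.0, xi] for xi in x], y)

# Step 2: tree on the residuals (here one split, so two terminal nodes).
threshold = best_threshold(z, resid_linear)

# Step 3: refit with one node indicator (number of terminal nodes minus one).
hybrid_design = [[1.0, xi, 1.0 if zi >= threshold else 0.0] for xi, zi in zip(x, z)]
beta_hybrid, resid_hybrid = fit(hybrid_design, y)

rss_linear = sum(r * r for r in resid_linear)
rss_hybrid = sum(r * r for r in resid_hybrid)
```

The refit can never have a larger residual sum of squares than the original regression because the models are nested, so any gain should be judged on a predictive measure rather than on fit alone.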

Page 24: Predicting Median Substrate

Predictive-ability Statistics

$$PRESS_p = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_{i(i)} \right)^2$$

$$R^2_{prediction} = 1 - \frac{PRESS_p}{SSTO}$$
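A minimal sketch of these two statistics for a one-predictor regression follows. It reads the subscript i(i) as the prediction for site i from a model fitted without site i, and takes SSTO to be the total sum of squares about the mean; both readings are assumptions, and the data are hypothetical.

```python
def fit_line(x, y):
    """Intercept and slope of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def press(x, y):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict observation i.
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total

def r2_prediction(x, y):
    """1 - PRESS / SSTO, where SSTO is the total sum of squares."""
    mean_y = sum(y) / len(y)
    ssto = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press(x, y) / ssto

# Hypothetical predictor and LD50 values for six sites (illustration only).
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
y = [0.3, 0.5, 0.6, 0.9, 1.0, 1.3]
press_value = press(x, y)
r2_pred = r2_prediction(x, y)
```

Because each site is predicted by a model that never saw it, this statistic is usually lower than the ordinary or adjusted R2 of the same model.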

Page 25: Predicting Median Substrate

Analysis Comparison – Top 4-tier Models

Problems with top 4-tier models
• Low adjusted R2
• Low predictive ability
• Over-prediction and under-prediction of fine and bedrock substrate
• Non-normal residuals

Benefit of top 4-tier models
• Small number of predictors

Page 26: Predicting Median Substrate

Example of Non-normality of Residuals: Top 4-Tier Model

[Figure: Normal Q-Q plot of residuals; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Page 27: Predicting Median Substrate

Analysis Comparison – Geomorphic plus Top 3-Tier Models

Problems with top geomorphic plus top 3-tier model
• Increase in number of variables
• Predictive ability still low
• Over-prediction and under-prediction of fine and bedrock substrate
• Some collinearity between variables

Page 28: Predicting Median Substrate

Analysis Comparison – Geomorphic plus Top 3-Tier Models

Benefits of top geomorphic plus top 3-tier model
• Improved predictions
• Improved normality of residuals

Page 29: Predicting Median Substrate

Comparison of Analysis - CART

Problems with CART
• Low predictive ability
• Predicts several observed substrate sizes in one node
• Over-prediction and under-prediction of fines and bedrock substrate
• Omitting one site creates a different tree

Benefits of CART
• Simple analysis
• Missing variables not an issue

Page 30: Predicting Median Substrate

CART Predictions

[Figure: LD50 CART predictions (y-axis) plotted against observed LD50 values (x-axis)]

Page 31: Predicting Median Substrate

Comparison of Analysis-Hybrids

Problems with hybrid models
• Increased number of variables
• Collinearity with introduction of node indicator variables
• Non-normal residuals

Page 32: Predicting Median Substrate

Comparison of Analysis-Hybrids

Benefits of hybrid models
• Residuals closer to normal
• Increased predictive ability
• Explains some of the variation created by fitting a linear model to ordinal data

Page 33: Predicting Median Substrate

One example: Residual Tree for Hybrid Geomorphic plus Top 3-Tier Model

Most promising multiple regression prediction model: Geomorphic plus top 3-tier

Response   Adjusted R2   PRESSp for LD50   MSPR    R2 (prediction)
LD50       0.362         504.802           1.274   0.319

Page 34: Predicting Median Substrate

One example: Residual Tree for Hybrid Geomorphic plus Top 3-Tier Model

[Figure: regression tree fitted to the residuals]

Split rules shown in the tree: slp_elon < 0.3566; out_sa < 0.008686; CVENTR >= 0.1489; out_sa >= 0.004734; link_slope >= 0.002764; topo_wet >= 8.152; shed_slp >= 14.97; link_sa < 0.0431; link_sa >= 0.08093; b30_r5_l42 >= 0.929; CVCON >= 0.4208; b30_r5_l42 < 0.5441; CVCON >= 0.4342; avgt_jun < 12.32; b30_r5_l42 >= 0.759; slp_elon < 0.5467; MENTB >= 15.63

Terminal-node values: -0.8348, -1.1, -0.1191, 0.6496, -0.6906, -0.6472, -0.8996, -0.09977, -0.9114, 0.2462, -0.9708, 0.0004686, -0.2892, 0.4309, 0.581, 0.4488, 0.7804, 0.8367

Page 35: Predicting Median Substrate

One example: Observed vs. Predicted for Hybrid Geomorphic plus Top 3-Tier Model

Plot of predictions against observed LD50

[Figure: cross-validation LD50 predictions (y-axis) plotted against observed LD50 values (x-axis)]

Page 36: Predicting Median Substrate

QQ-Plot of Residuals for Hybrid Model

[Figure: Normal Q-Q plot of residuals; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Page 37: Predicting Median Substrate

Coast Range Ecoregion

• Less skewed distribution of LD50
• No measurements are outliers
• Similar ecosystem throughout region

Page 38: Predicting Median Substrate

Ecoregion Distributions

[Figure: distribution of LD50 (x-axis, from -3 to 3) for each level 3 ecoregion (y-axis)]

Ecoregions shown: Blue Mountains, Cascades, Coast Range, Colorado Plateau, Columbia Plateau, Eastern Cascades Slopes and Foothills, Klamath Mountains, North Cascades, Northern Basin and Range, Northern Rockies, Puget Lowland, Snake River Plain, Willamette Valley

Page 39: Predicting Median Substrate

Coast Range EMAP Sites

[Map of Coast Range sites] LD50 key: -2.11, -1.29, -0.46, 0.75, 1.13, 1.51, 1.80, 2.10, 3, 3.75

Page 40: Predicting Median Substrate

Top 4-Tier Coast Range Model

Predictors
• Average aspect (climatic)
• Average watershed elevation (geomorphic)
• % watershed as volcanic geologic type (geologic)
• % wetlands (distance weighted and buffered)

Page 41: Predicting Median Substrate

QQ-Plot: Top 4-Tier Coast Range

[Figure: Normal Q-Q plot of residuals; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Page 42: Predicting Median Substrate

Observed versus Predicted: Top 4-Tier Coast Range Model

[Figure: cross-validated LD50 predictions (y-axis) plotted against observed LD50 (x-axis)]

Page 43: Predicting Median Substrate

Coast Range Model: Top Geomorphic Variables

1. Average watershed elevation (m)
2. Drainage density
3. Mean slope within a 300-meter buffer
4. Ratio of width of stream to width of floodplain
5. Coefficient of average hill connectivity
6. Distance to the first tributary (m)
7. Percent of landscape with less than 4% slope
8. Percent of landscape with less than 7% slope
9. Measure of size and complexity of river
10. Percent of stream as cascade
11. Distance-weighted stream power
12. Watershed relief divided by its length

Page 44: Predicting Median Substrate

QQ-Plot: Coast Range Geomorphic plus Top 3-Tier model

[Figure: Normal Q-Q plot of residuals; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Page 45: Predicting Median Substrate

Observed versus Predicted: Coast Range Geomorphic + Top 3-Tier

[Figure: cross-validation LD50 predictions (y-axis) plotted against observed LD50 (x-axis)]

Page 46: Predicting Median Substrate

CART - Coast Range Ecoregion

[Figure: CART predicted LD50 values (y-axis) plotted against observed LD50 values (x-axis)]

Predictions versus Observed LD50

Page 47: Predicting Median Substrate

Coast Range: Hybrid Models

Benefits of hybrid
• Improved prediction
• Improved fit
• Improved normality of residuals

Problems with hybrid
• Increased number of predictors
• Collinearity with node indicator variables

Page 48: Predicting Median Substrate

QQ-Plot: Coast Range Hybrid Top 4-Tier

[Figure: Normal Q-Q plot of residuals; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Page 49: Predicting Median Substrate

Observed versus Predicted: Coast Range Hybrid Top 4-Tier

[Figure: cross-validation LD50 predictions (y-axis) plotted against observed LD50 values (x-axis)]

Page 50: Predicting Median Substrate

QQ-Plot: Coast Range Hybrid Geomorphic plus Top 3-Tier

[Figure: Normal Q-Q plot of residuals; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Page 51: Predicting Median Substrate

Observed versus Predicted: Coast Hybrid Geomorphic plus Top 3-Tier

[Figure: cross-validation LD50 predictions (y-axis) plotted against observed LD50 (x-axis)]

Page 52: Predicting Median Substrate

Comparison of Coast Models

Model                          Adjusted R2   R2 (prediction)
Top 4-tier                     0.384         0.362
Geomorphic plus top-3          0.548         0.495
CART                           NA            0.087
Top 4-tier hybrid              0.552         0.503
Geomorphic plus top-3 hybrid   0.700         0.614

Page 53: Predicting Median Substrate

Conclusions

• LD50 is difficult to predict
• Additional geomorphic predictors increase prediction ability
• Hybrid models increase prediction ability
• More success in Coast Range Ecoregion

Page 54: Predicting Median Substrate

Future Work

Logistic Regression
• Ordinal data treated as continuous in this study
• 12 categories might require more sophisticated methods

Spatial Analysis
• Appears to be spatial correlation in distribution of LD50

