predicting median substrate

Post on 11-Jan-2016


Predicting Median Substrate

for Oregon and Washington EMAP sites

Utilizing GIS data

Julia J. Smith

December 12, 2005

Why Predict Median Substrate?

Indicator of overall stream health
• Bed load transport
• Stream Power
• Microinvertebrate habitat
• Fish habitat
• How is human development affecting a stream

What is LD50?

LD50 is a measure of median substrate.
• Geometric mean of class boundaries
• Log10 of the geometric means
• Several samples at each site
• LD50 is the median value of log10(geometric mean of class)
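The definition can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the class boundaries follow the deck's substrate classification slide, while the dictionary keys, the function names, and the five-sample site are made up for the example.

```python
import math
from statistics import median

# Class boundaries in mm (lower, upper), from the substrate classification slide.
# The key names are illustrative labels, not identifiers from the study.
CLASS_BOUNDS_MM = {
    "bedrock": (4000, 8000),
    "boulders": (250, 4000),
    "cobbles": (64, 250),
    "coarse_gravel": (16, 64),
    "fine_gravel": (2, 16),
    "sand": (0.06, 2),
    "fines": (0.001, 0.06),
}


def log10_geometric_mean(lower_mm, upper_mm):
    """Log10 of the geometric mean of the two class boundaries."""
    return math.log10(math.sqrt(lower_mm * upper_mm))


def ld50(sample_classes):
    """Median of log10(geometric mean of class) over the samples at one site."""
    scores = [log10_geometric_mean(*CLASS_BOUNDS_MM[c]) for c in sample_classes]
    return median(scores)


# Hypothetical site with five substrate samples (made-up data).
site = ["sand", "fine_gravel", "cobbles", "coarse_gravel", "fine_gravel"]
site_ld50 = ld50(site)  # the middle score is the fine-gravel value
```

With an even number of samples the median averages the two middle scores, which may be why intermediate values such as 0.146 (halfway between the sand and fine-gravel scores) appear on the LD50 axis of the plots.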

Substrate Classifications

Substrate size (mm)   Class             Geometric mean (mm)   Log10 of geom. mean
8000-4000             Bedrock           5656.85                3.7527
4000-250              Boulders          1000.00                3.0000
250-64                Cobbles           126.49                 2.1020
64-16                 Gravel (coarse)   32.00                  1.5052
16-2                  Gravel (fine)     5.66                   0.7526
2-.06                 Sand              0.35                  -0.4604
.06-.001              Fines             0.00775               -2.1109

Washington EPA Sites for LD50 Study

[Map of sites] LD50 key: -2.11, -0.46, 0.15, 0.75, 1.13, 1.51, 1.80, 2.10, 2.55, 3, 3.75

Oregon EPA Sites for LD50 Study

[Map of sites] LD50 key: -2.11, -1.29, -0.46, 0.15, 0.75, 1.13, 1.51, 1.80, 2.10, 3, 3.75

Geomorphic Metrics

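$$D_{50} = \frac{\tau}{(\rho_s - \rho)\, g\, \tau_c^*} = \frac{\rho h S}{(\rho_s - \rho)\, \tau_c^*}$$

• $\tau$ is the total bank-full shear stress
• $\rho_s$ is the density of sediment
• $\rho$ is fluid density
• $g$ is gravitational acceleration
• $h$ is bank-full depth
• $S$ is channel slope
• $\tau_c^*$ is critical shear stress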
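As a rough numerical illustration, the slide's relation can be read as D50 = ρhS / ((ρs − ρ)τc*), with the total bank-full shear stress given by τ = ρghS. The sketch below evaluates both forms. The default densities (water at 1000 kg/m^3, sediment at 2650 kg/m^3), the critical shear stress value of 0.03, and the example reach are assumptions chosen for the example, not values from the study.

```python
def predicted_d50(depth_m, slope, tau_c_star, rho=1000.0, rho_s=2650.0):
    """Predicted D50 in metres from bank-full depth and channel slope."""
    return rho * depth_m * slope / ((rho_s - rho) * tau_c_star)


def predicted_d50_from_shear(tau, tau_c_star, rho=1000.0, rho_s=2650.0, g=9.81):
    """The same quantity computed from the total bank-full shear stress (Pa)."""
    return tau / ((rho_s - rho) * g * tau_c_star)


# Hypothetical reach: 1 m bank-full depth, 1% slope (made-up numbers).
h, S, tcs = 1.0, 0.01, 0.03
d50_m = predicted_d50(h, S, tcs)

# Depth-slope product for shear stress; both forms should agree.
tau = 1000.0 * 9.81 * h * S
d50_from_tau = predicted_d50_from_shear(tau, tcs)
```

The two functions differ only in whether g has already been cancelled, so they serve as a consistency check on each other.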

Geomorphic Metrics

[Figure: distance-weighted stream power (y-axis) versus LD50 (x-axis); r = 0.327, p-value = 2.63 x 10^-12]

Geomorphic Metrics

[Figure: outlet link mean slope (y-axis) versus LD50 (x-axis); r = 0.214, p-value = 3.78 x 10^-6]

Geologic Metrics

[Figure: percent unconsolidated geologic type (y-axis) versus LD50 (x-axis); r = -0.246, p-value = 1.18 x 10^-7]

Climatic Metrics

[Figure: average annual precipitation (y-axis) versus LD50 (x-axis); r = 0.199, p-value = 1.56 x 10^-6]

Climatic Metrics

[Figure: average annual potential evapotranspiration in mm (y-axis) versus LD50 (x-axis); r = -0.046, p-value = 0.342]

Land Cover Metrics

1. Developed
2. Barren
3. Forest
4. Grasses
5. Agriculture
6. Wetlands
7. Open water/perennial ice and snow
8. Shrubland

Land Cover Metrics

[Figure: percentage of watershed that is forest (y-axis) versus LD50 (x-axis); r = 0.19, p-value = 3.516 x 10^-5]

Distance-Weighted metrics

$$\text{Weighted Area}_j = \frac{A_j\, e^{-\alpha \bar{d}_j}}{\sum_{i=1}^{n} A_i\, e^{-\alpha \bar{d}_i}}$$

• $j$ represents the land cover type of concern
• $A_j$ represents the total area for land cover type $j$ in the watershed
• $\alpha$ represents the coefficient of exponential decay
• $\bar{d}_j$ represents the average distance from the outlet for land cover of type $j$
• $n$ represents the total number of land cover types
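A small sketch of this weighting, assuming the exponent is the negative of the decay coefficient times the average distance, so that land cover far from the outlet counts for less. The function name, the decay value, and the three-class watershed are illustrative only.

```python
import math


def distance_weighted_areas(areas, mean_distances, decay):
    """Share of exponentially down-weighted area for each land cover type.

    areas[j] is the total area of type j, mean_distances[j] its average
    distance from the outlet, and decay the exponential decay coefficient.
    """
    weighted = [a * math.exp(-decay * d) for a, d in zip(areas, mean_distances)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical watershed: forest, agriculture, developed (made-up numbers).
areas = [60.0, 30.0, 10.0]        # area of each type, any consistent unit
distances = [12.0, 5.0, 1.0]      # average distance from outlet
shares = distance_weighted_areas(areas, distances, decay=0.1)

# With no decay the weights reduce to plain area fractions.
plain = distance_weighted_areas(areas, distances, decay=0.0)
```

In this example the developed land makes up 10% of the area but sits close to the outlet, so its weighted share is larger than its plain share.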

Additional Land Cover Metrics

Buffered metrics: land cover measured within a fixed distance of the stream (30 meters, 100 meters, 300 meters)

Buffered and Distance-weighted metrics

Goals

• Predict LD50 without visiting sites
• Small number of predictors for a scientifically sensible model

Methods-Stepwise Variable Selection

• Multiple Linear Regression
• Top-in-tier models
• Top geomorphic models plus one from each of the remaining tiers

Akaike’s Information Criterion

$$\mathrm{AIC} = N \log\!\left(\frac{RSS}{N}\right) + 2(p + 2)$$

• $N$ is the number of observations
• $p$ is the number of predictors
• $RSS$ is the sum of squared residuals
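Reading the slide's criterion as AIC = N log(RSS/N) + 2(p + 2), a direct transcription might look like the following. The natural logarithm is assumed, and the function name and the numbers are illustrative.

```python
import math


def aic(rss, n_obs, n_predictors):
    """AIC for a linear model: N * log(RSS / N) + 2 * (p + 2)."""
    return n_obs * math.log(rss / n_obs) + 2 * (n_predictors + 2)


# Made-up comparison: does a fourth predictor pay for itself?
aic_small = aic(rss=100.0, n_obs=100, n_predictors=3)
aic_large = aic(rss=99.0, n_obs=100, n_predictors=4)
extra_predictor_helps = aic_large < aic_small
```

Each extra predictor adds 2 to the penalty term, so it lowers AIC only if it shrinks N log(RSS/N) by more than 2. In the made-up comparison the drop in RSS from 100 to 99 is too small, and the smaller model is preferred.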

AIC in stepwise variable selection

Forward Stepwise Selection -

Method for choosing the top predictor from each tier

1. Start with the intercept model

2. Choose the variable that reduces AIC the most and include in model.
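The two steps above can be sketched as follows for a single tier: fit the intercept model, then try each candidate on its own and keep the one with the lowest AIC. This is a simplified illustration using simple linear regression and the AIC form N log(RSS/N) + 2(p + 2). The predictor names and data are invented, and the study's actual software is not stated in the slides.

```python
import math


def aic(rss, n_obs, n_predictors):
    """AIC for a linear model: N * log(RSS / N) + 2 * (p + 2)."""
    return n_obs * math.log(rss / n_obs) + 2 * (n_predictors + 2)


def rss_simple_regression(x, y):
    """Residual sum of squares of the least-squares line y = b0 + b1 * x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def top_predictor(candidates, y):
    """Return (name, AIC) of the best single predictor.

    Returns (None, intercept-model AIC) if no candidate lowers AIC.
    """
    n = len(y)
    mean_y = sum(y) / n
    best_name = None
    best_aic = aic(sum((yi - mean_y) ** 2 for yi in y), n, 0)
    for name, x in candidates.items():
        score = aic(rss_simple_regression(x, y), n, 1)
        if score < best_aic:
            best_name, best_aic = name, score
    return best_name, best_aic


# Made-up tier with two hypothetical candidate predictors.
ld50_values = [0.8, 1.1, 1.4, 2.1, 2.4, 3.2]
tier = {
    "slope": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "precip": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
chosen, chosen_aic = top_predictor(tier, ld50_values)
```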

Stepwise selection in both directions-

Method chosen for choosing all top Geomorphic predictors

1. Start with full model.

2. Add and subtract variables until the model with minimum AIC is found or iteration stops.

Methods: CART Classification and Regression Trees

[Figure: regression tree for LD50. Root split: DWSP2 < 0.03129. Further splits: snow_jan < 190.6, MENTR >= 20.35, b30_l11 < 0.003034, r8_l80_A >= 0.0917, b100_l51 < 0.004057, prcp_sep < 19.05, avgt_jun >= 12.58, prcp_may < 46.6, link_sa4 < 0.08306, prcp_jan < 47.49, b30_r7_l30 >= 0.01239, mint_apr >= 2.647, min_elev >= 1025. Terminal-node predictions: -1.66, -1.03, 0.69, 0.565, 0.941, -0.823, 0.298, 1.49, -1.04, -0.172, 1.02, 1.65, 0.439, 1.49, 2.01.]

Methods: CART Classification and Regression Trees

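Predicted Response:

$$\hat{y}(x_i) = \sum_{j=1}^{q} \hat{a}_j \, 1(x_i \in N_j)$$

where $q$ is the number of terminal nodes, $N_j$ is the $j$-th terminal node, and $\hat{a}_j$ is the predicted value for node $N_j$.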
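In words, a regression tree predicts a constant for each terminal node, and a site receives the constant of the one node it falls into. The sketch below writes the prediction as a sum of node values times membership indicators, taking each node value to be the mean response of the training sites in that node (the usual choice for regression trees). The single split, the variable name `stream_power`, and the data are hypothetical.

```python
def node_means(train_x, train_y, regions):
    """Mean training response within each terminal node."""
    means = []
    for contains in regions:
        ys = [y for x, y in zip(train_x, train_y) if contains(x)]
        means.append(sum(ys) / len(ys))
    return means


def predict(x, regions, a_hat):
    """Sum of node values times indicators; exactly one indicator equals 1."""
    return sum(a * (1 if contains(x) else 0) for contains, a in zip(regions, a_hat))


# Hypothetical tree with one split, so q = 2 terminal nodes.
regions = [
    lambda site: site["stream_power"] < 0.03,
    lambda site: site["stream_power"] >= 0.03,
]

# Made-up training sites and LD50 values.
train_x = [
    {"stream_power": 0.01},
    {"stream_power": 0.02},
    {"stream_power": 0.05},
    {"stream_power": 0.08},
]
train_y = [-0.46, 0.75, 1.51, 2.10]

a_hat = node_means(train_x, train_y, regions)
y_new = predict({"stream_power": 0.04}, regions, a_hat)
```

Because every site in a node gets the same prediction, a tree with few terminal nodes can only output a few distinct LD50 values.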

Hybrid of Multiple Linear Regression and CART

• Utilize CART on the residuals
• Add indicator variables to the multiple linear regression equation, one fewer than the number of terminal nodes in the tree
• Create new multiple regression model with variables and indicator variables

Predictive-ability Statistics

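$$PRESS_p = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_{i(i)}\right)^2$$

$$R^2_{prediction} = 1 - \frac{PRESS_p}{SSTO}$$

where $\hat{Y}_{i(i)}$ is the prediction for observation $i$ from the model fitted without observation $i$, and $SSTO$ is the total sum of squares.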
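A leave-one-out sketch of these statistics, using a one-predictor regression to keep the code short. Each observation is predicted from a model fitted to all the other observations, the squared prediction errors are summed to give PRESS, and R2 prediction is one minus PRESS divided by the total sum of squares. The function names and data are made up for the illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def press(x, y):
    """Sum of squared errors when each point is predicted without itself."""
    total = 0.0
    for i in range(len(y)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


def r2_prediction(x, y):
    """1 - PRESS / SSTO, where SSTO is the total sum of squares."""
    mean_y = sum(y) / len(y)
    ssto = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press(x, y) / ssto


# Made-up data: one predictor and six LD50 values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.8, 1.1, 1.4, 2.1, 2.4, 3.2]
press_value = press(x, y)
r2_pred = r2_prediction(x, y)
```

PRESS is never smaller than the ordinary residual sum of squares for a least-squares fit, so R2 prediction is a more conservative summary than the usual R2.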

Analysis Comparison – Top 4-tier Models

Problems with top 4-tier models
• Low Adjusted R2
• Low Predictive Ability
• Over-prediction and under-prediction of fine and bedrock substrate
• Non-normal residuals

Benefit of top 4-tier models
• Small number of predictors

Example of Non-normality of Residuals: Top 4-Tier Model

[Figure: normal Q-Q plot of residuals, sample quantiles versus theoretical quantiles]

Analysis Comparison – Geomorphic plus Top 3-Tier Models

Problems with top geomorphic plus top 3-tier model
• Increase in number of variables
• Predictive ability still low
• Over-prediction and under-prediction of fine and bedrock substrate
• Some collinearity between variables

Analysis Comparison – Geomorphic plus Top 3-Tier Models

Benefits of top geomorphic plus top 3-tier model
• Improved predictions
• Improved normality of residuals

Comparison of Analysis - CART

Problems with CART
• Low predictive ability
• Predicts several observed substrate sizes in one node
• Over-prediction and under-prediction of fines and bedrock substrate
• Omitting one site creates a different tree

Benefits of CART
• Simple analysis
• Missing variables not an issue

CART Predictions

[Figure: CART LD50 predictions (y-axis) versus observed LD50 values (x-axis)]

Comparison of Analysis-Hybrids

Problems with hybrid models
• Increased number of variables
• Collinearity with introduction of node indicator variables
• Non-normal residuals

Comparison of Analysis-Hybrids

Benefits of hybrid models
• Residuals closer to normal
• Increased predictive ability
• Explains some of the variation created by fitting a linear model to ordinal data

One example: Residual Tree for Hybrid Geomorphic plus Top 3-Tier Model

Most promising multiple regression prediction model: Geomorphic plus top 3-tier

Response   Adjusted R2   PRESSp for LD50   MSPR    R2 prediction
LD50       0.362         504.802           1.274   0.319

One example: Residual Tree for Hybrid Geomorphic plus Top 3-Tier Model

[Figure: regression tree fitted to the residuals. Root split: slp_elon < 0.3566. Further splits: out_sa < 0.008686, CVENTR >= 0.1489, out_sa >= 0.004734, link_slope >= 0.002764, topo_wet >= 8.152, shed_slp >= 14.97, link_sa < 0.0431, link_sa >= 0.08093, b30_r5_l42 >= 0.929, CVCON >= 0.4208, b30_r5_l42 < 0.5441, CVCON >= 0.4342, avgt_jun < 12.32, b30_r5_l42 >= 0.759, slp_elon < 0.5467, MENTB >= 15.63. Terminal-node values: -0.8348, -1.1, -0.1191, 0.6496, -0.6906, -0.6472, -0.8996, -0.09977, -0.9114, 0.2462, -0.9708, 0.0004686, -0.2892, 0.4309, 0.581, 0.4488, 0.7804, 0.8367.]

One example: Observed vs. Predicted for Hybrid Geomorphic plus Top 3-Tier Model

[Figure: cross-validation LD50 predictions (y-axis) versus observed LD50 values (x-axis)]

QQ-Plot of Residuals for Hybrid Model

[Figure: normal Q-Q plot of residuals, sample quantiles versus theoretical quantiles]

Coast Range Ecoregion

• Less skewed distribution of LD50
• No measurements are outliers
• Similar ecosystem throughout region

Ecoregion Distributions

[Figure: distribution of LD50 (x-axis, -3 to 3) by level 3 ecoregion: Blue Mountains, Cascades, Coast Range, Colorado Plateau, Columbia Plateau, Eastern Cascades Slopes and Foothills, Klamath Mountains, North Cascades, Northern Basin and Range, Northern Rockies, Puget Lowland, Snake River Plain, Willamette Valley]

Coast Range EMAP Sites

[Map of sites] LD50 key: -2.11, -1.29, -0.46, 0.75, 1.13, 1.51, 1.80, 2.10, 3, 3.75

Top 4-Tier Coast Range Model

Predictors
• Average aspect (climatic)
• Average watershed elevation (geomorphic)
• % watershed as volcanic geologic type (geologic)
• % wetlands (distance weighted and buffered)

QQ-Plot: Top 4-Tier Coast Range

[Figure: normal Q-Q plot of residuals, sample quantiles versus theoretical quantiles]

Observed versus Predicted: Top 4-Tier Coast Range Model

[Figure: cross-validated LD50 predictions (y-axis) versus observed LD50 (x-axis)]

Coast Range Model: Top Geomorphic Variables

1. Average watershed elevation (m)
2. Drainage density
3. Mean slope within a 300-meter buffer
4. Ratio of width of stream to width of floodplain
5. Coefficient of average hill connectivity
6. Distance to the first tributary (m)
7. Percent of landscape with less than 4% slope
8. Percent of landscape with less than 7% slope
9. Measure of size and complexity of river
10. Percent of stream as cascade
11. Distance-weighted stream power
12. Watershed relief divided by its length

QQ-Plot: Coast Range Geomorphic plus Top 3-Tier model

[Figure: normal Q-Q plot of residuals, sample quantiles versus theoretical quantiles]

Observed versus Predicted: Coast Range Geomorphic + Top 3-Tier

[Figure: cross-validation LD50 predictions (y-axis) versus observed LD50 (x-axis)]

CART - Coast Range Ecoregion

[Figure: predictions versus observed LD50; CART predicted LD50 values (y-axis) against observed LD50 values (x-axis)]

Coast Range: Hybrid Models

Benefits of hybrid
• Improved prediction
• Improved fit
• Improved normality of residuals

Problems with hybrid
• Increased number of predictors
• Collinearity with node indicator variables

QQ-Plot: Coast Range Hybrid Top 4-Tier

[Figure: normal Q-Q plot of residuals, sample quantiles versus theoretical quantiles]

Observed versus Predicted: Coast Range Hybrid Top 4-Tier

[Figure: cross-validation LD50 predictions (y-axis) versus observed LD50 values (x-axis)]

QQ-Plot: Coast Range Hybrid Geomorphic plus Top 3-Tier

[Figure: normal Q-Q plot of residuals, sample quantiles versus theoretical quantiles]

Observed versus Predicted: Coast Hybrid Geomorphic plus Top 3-Tier

[Figure: cross-validation LD50 predictions (y-axis) versus observed LD50 (x-axis)]

Comparison of Coast Models

Model                          Adjusted R2   R2 prediction
Top 4-tier                     0.384         0.362
Geomorphic plus top-3          0.548         0.495
CART                           NA            0.087
Top 4-tier hybrid              0.552         0.503
Geomorphic plus top-3 hybrid   0.700         0.614

Conclusions

• LD50 is difficult to predict
• Additional geomorphic predictors increase prediction ability
• Hybrid models increase prediction ability
• More success in Coast Range Ecoregion

Future Work

Logistic Regression
• Ordinal data treated as continuous in this study
• 12 categories might require more sophisticated methods

Spatial Analysis
• Appears to be spatial correlation in distribution of LD50
