# logistic regression used to relate ground water … · logistic regression used to relate ground...

Post on 30-Jul-2018

214 views

Embed Size (px)

TRANSCRIPT

Logistic Regression Used To Relate Logistic Regression Used To Relate Ground Water Quality To ManGround Water Quality To Man--made made

And Natural CausesAnd Natural CausesBetter then Multiple RegressionBetter then Multiple Regression

Jim Tesoriero, Michael G. Rupert, Jim Tesoriero, Michael G. Rupert, Lonna M. Frans, and Lonna M. Frans, and

Alex K. (Sandy) Williamson (Presenter)Alex K. (Sandy) Williamson (Presenter)National WaterNational Water--Quality Assessment ProgramQuality Assessment Program

U.S. Geological SurveyU.S. Geological Survey

Vulnerability Factors:Vulnerability Factors:

Relate waterRelate water--quality data to quality data to naturalnatural and and humanhuman factorsfactors

NaturalNatural Hydrogeology:Hydrogeology: ease with which contaminant ease with which contaminant can reach aquifer or sampling point can reach aquifer or sampling point called called susceptibility or potential vulnerabilitysusceptibility or potential vulnerability

HumanHuman Contaminant Source:Contaminant Source: effects of land use effects of land use activities activities called vulnerabilitycalled vulnerability

Vulnerability MethodsVulnerability Methods

ProcessProcess--based Simulation Modelsbased Simulation Models

Overlay and Index Methods, i.e.. Overlay and Index Methods, i.e.. DRASTICDRASTIC--typetype

Statistical Methods Statistical Methods Regression, Linear or Regression, Linear or Logistic, using Logistic, using Existing waterExisting water--quality dataquality data

Logistic RegressionLogistic Regression

Answers a simpler question than linear regression does. Answers a simpler question than linear regression does.

Therefore it can answer the simpler question with a higher Therefore it can answer the simpler question with a higher degree of confidence.degree of confidence.

Probability that a constituent concentration is above a specifieProbability that a constituent concentration is above a specified d level is predicted.level is predicted.

Variables included in the model and their relative weights are Variables included in the model and their relative weights are chosen statistically.chosen statistically.

Let me explain with a few graphs.

Linear RegressionLinear Regression

Multiple RegressionMultiple Regression

Logistic RegressionLogistic Regression

Logistic RegressionLogistic Regression

0

1

2

3

4

5

6

7

8

9

10

0 50 100 150 200

Well Depth (meters)

Nitr

ate

(mg/

L)

Predictions Of Concentrations By Linear RegressionMay Not Be Possible At Acceptable Certainty Level...

It May Be Possible to Predict the Probability that a Concentration Exceeds a Threshold

0

5

10

15

20

25

0 50 100 150 200

[NO3]3 mg/L

Prob

abili

ty in

Per

cent

Well Depth (meters)

0

20

40

60

80

100

0 50 100 150 200Well Depth (meters)

Prob

abili

ty o

f Ele

vate

d N

itrat

e

Puget Study Results Consistent with Logistic Regression Fit

AgricultureUrban

Forest

Agricultural StudyUrban Study

Predicted depth to which wells need to be cased to have at least an 80-percent

chance of obtaining water

with a nitrate concentration

less than 3 milligrams per

liter.

Advantages of Statistical Method for Advantages of Statistical Method for Vulnerability AssessmentVulnerability Assessment

Probability that a threshold is exceeded can be Probability that a threshold is exceeded can be predicted both spatially and temporallypredicted both spatially and temporally

Variables in model and their relative Variables in model and their relative importance is determined statisticallyimportance is determined statistically

Data collection needs are minimal if existing Data collection needs are minimal if existing water quality and ancillary data can be usedwater quality and ancillary data can be used

Advantages of Advantages of Logistic RegressionLogistic Regression

Calculates actual probability of detections, instead of assigning subjective categories such as High VulnerabilityWeighting and interaction of variables such as geology and soils is automatically evaluated by logistic regressionAll possible combinations of independent variables such as land use and soils can be numerically evaluated, and the best model selected. Can be used for pesticide data (few detections, concentrations near laboratory reporting limits)

ReferencesReferenceshttp://water.usgs.gov/pubs/wri/wri02http://water.usgs.gov/pubs/wri/wri02--4269/pdf/WRIR024269/pdf/WRIR02--4269.pdf4269.pdf

http://wa.water.usgs.gov/pubs/misc/ps.gw.vol35.no.6http://wa.water.usgs.gov/pubs/misc/ps.gw.vol35.no.6.html.htmlhttp://www.ecy.wa.gov/events/hg/PastSymposhttp://www.ecy.wa.gov/events/hg/PastSymposia/abstracts2000.pdfia/abstracts2000.pdf page 10 & 77page 10 & 77http://webserver.cr.usgs.gov/midconherb/html/texas.http://webserver.cr.usgs.gov/midconherb/html/texas.htmlhtmlhttp://wa.water.usgs.gov/pubs/wri/wri004110http://wa.water.usgs.gov/pubs/wri/wri004110

The EndThe End

Sandy Williamson, presenterSandy Williamson, presenterU.S. Geological Survey, Water Resources,U.S. Geological Survey, Water Resources,http://water.usgs.gov/nawqa/datahttp://water.usgs.gov/nawqa/data1201 Pacific Avenue, Suite 600, Tacoma, WA 1201 Pacific Avenue, Suite 600, Tacoma, WA 9840298402(253) 428(253) 428--3600 ext. 2683; cell (253) 3763600 ext. 2683; cell (253) 376--82738273EE--mail [email protected] [email protected]

GIS DATAGIS DATA

Chemical UseChemical UseElevationElevationGeologyGeologyHydrogeomorphic regionsHydrogeomorphic regionsLand coverLand coverPrecipitationPrecipitationSoilsSoils

ConclusionsConclusions

GroundGround--water vulnerability assessed using water vulnerability assessed using statistical approach.statistical approach.

Useful tool for prioritizing monitoring and the Useful tool for prioritizing monitoring and the selection of source areas.selection of source areas.

Method can be used to evaluate other Method can be used to evaluate other contaminants contaminants and levels (e.g., MCLs).and levels (e.g., MCLs).

OPTIMUM NITRATE OPTIMUM NITRATE CONCENTRATIONCONCENTRATION

Three nitrate concentrations were evaluated to Three nitrate concentrations were evaluated to determine which concentration generated the determine which concentration generated the most effective logistic regression model: 2 mg/L, most effective logistic regression model: 2 mg/L, 5mg/L, and 10mg/L.5mg/L, and 10mg/L.The nitrate data were transformed to detect/no The nitrate data were transformed to detect/no detect nomenclature at each of these three detect nomenclature at each of these three concentrations, and logistic regression models concentrations, and logistic regression models were developed.were developed.

AbstractAbstractGroundGround--water vulnerability water vulnerability typically typically has typically been assessed using qualitative methods has typically been assessed using qualitative methods expressed as relative measures of risk, like DRASTIC. The logistexpressed as relative measures of risk, like DRASTIC. The logistic regression approach has the ic regression approach has the advantage of having both model variables and coefficient values advantage of having both model variables and coefficient values determined on the basis of determined on the basis of existing waterexisting water-- quality information. Unlike DRASTquality information. Unlike DRASTIICC--type methods, type methods, the logistic regression the logistic regression approach approach it does not depend on the somewhat arbitrary assignment of variait does not depend on the somewhat arbitrary assignment of variables and weighting bles and weighting factors based on qualitative criteria. Logistic regression is a factors based on qualitative criteria. Logistic regression is a great way to rigorously relate mangreat way to rigorously relate man--made and natural factors to groundmade and natural factors to ground-- water quality. It usually produces more statistical confidence water quality. It usually produces more statistical confidence than regular multiple regression because logisticthan regular multiple regression because logisticalal regression tries to answer a simpler yesregression tries to answer a simpler yes or or /no /no question (contaminant exceeds threshold) rather than the multiplquestion (contaminant exceeds threshold) rather than the multiple regression question relating a e regression question relating a wide range of concentration values to contributing factors. Logiwide range of concentration values to contributing factors. Logistic regression can be used to stic regression can be used to assess assess groundground--water water aquifer susceptibility (relative ease with which contaminants wiaquifer susceptibility (relative ease with which contaminants will ll can can reach reach aquifer) and groundaquifer) and ground--water vulnerability (relative ease with which contaminants will water vulnerability (relative ease with which contaminants will reach aquifer reach aquifer for a given set of landfor a given set of land--use practices). In 3use practices). In 3threethree USGS studies in Washington and Colorado, the USGS studies in Washington and Colorado, the variables that best explain the occurrence of high nitrate or pevariables that best explain the occurrence of high nitrate or pesticides includedsticides included the the (1) (1) well well and/or casing depth, and/or casing depth, (2) (2) the percentage of urban and agricultural land or the amount of fthe percentage of urban and agricultural land or the amount of fertilizer ertilizer applied within a radius of 3.2 kilometers of the well, applied within a radius of 3.2 kilometers of the well, (3) (3) surficial geology, and the surficial geology, and the (4) (4) mean soil mean soil hydrologic group, which is a measure of soil hydrologic group, which is a measure of soil --infiltration rate. Maps can be made of the predicted infiltration rate. Maps can be made of the predicted depth to which wells would need to be cased in order to have an depth to which wells would need to be cased in order to have an XX--percentpercentageage probability of probability of drawing water with high nitrate or pesticides or the predicted pdrawing water with high nitrate or pesticides or the predicted probability of high nitrate or robability of high nitrate or pesticides for wells cased to median casing depth. pesticides for wells cased to median casing depth.

Logistic Regression Used To Relate Ground Water Quality To Man-made And Natural CausesBetter then Multiple RegressionVulnerability Factors:Vulnerability MethodsLogistic RegressionLinear RegressionMultiple RegressionLogistic RegressionLogistic RegressionPredicted depth to which wells need to be cased to have at least an 80-percent chance of obtaining water with a nitrate concenAdvantages of Statistical Method for Vulnerability AssessmentAdvantages of Logistic RegressionReferencesThe EndGIS DATAConclusionsOPTIMUM NITRATE CONCENTRATIONAbstract