logistic regression used to relate ground water … · logistic regression used to relate ground...
TRANSCRIPT
Logistic Regression Used To Relate Logistic Regression Used To Relate Ground Water Quality To ManGround Water Quality To Man--made made
And Natural CausesAnd Natural Causes––Better then Multiple RegressionBetter then Multiple Regression
Jim Tesoriero, Michael G. Rupert, Jim Tesoriero, Michael G. Rupert, Lonna M. Frans, and Lonna M. Frans, and
Alex K. (Sandy) Williamson (Presenter)Alex K. (Sandy) Williamson (Presenter)National WaterNational Water--Quality Assessment ProgramQuality Assessment Program
U.S. Geological SurveyU.S. Geological Survey
Vulnerability Factors:Vulnerability Factors:
Relate waterRelate water--quality data to quality data to naturalnatural and and humanhuman factorsfactors
NaturalNatural Hydrogeology:Hydrogeology: ease with which contaminant ease with which contaminant can reach aquifer or sampling point can reach aquifer or sampling point –– called called susceptibility or potential vulnerabilitysusceptibility or potential vulnerability
HumanHuman Contaminant Source:Contaminant Source: effects of land use effects of land use activities activities –– called vulnerabilitycalled vulnerability
Vulnerability MethodsVulnerability Methods
ProcessProcess--based Simulation Modelsbased Simulation Models
Overlay and Index Methods, i.e.. Overlay and Index Methods, i.e.. DRASTICDRASTIC--typetype
Statistical Methods Statistical Methods –– Regression, Linear or Regression, Linear or Logistic, using Logistic, using Existing waterExisting water--quality dataquality data
Logistic RegressionLogistic Regression
Answers a simpler question than linear regression does. Answers a simpler question than linear regression does.
Therefore it can answer the simpler question with a higher Therefore it can answer the simpler question with a higher degree of confidence.degree of confidence.
Probability that a constituent concentration is above a specifieProbability that a constituent concentration is above a specified d level is predicted.level is predicted.
Variables included in the model and their relative weights are Variables included in the model and their relative weights are chosen statistically.chosen statistically.
Let me explain with a few graphs….
Linear RegressionLinear Regression
Multiple RegressionMultiple Regression
Logistic RegressionLogistic Regression
Logistic RegressionLogistic Regression
0
1
2
3
4
5
6
7
8
9
10
0 50 100 150 200
Well Depth (meters)
Nitr
ate
(mg/
L)
Predictions Of Concentrations By Linear RegressionMay Not Be Possible At Acceptable Certainty Level...
It May Be Possible to Predict the Probability that a Concentration Exceeds a Threshold
0
5
10
15
20
25
0 50 100 150 200
[NO3]≥3 mg/L
Prob
abili
ty in
Per
cent
Well Depth (meters)
0
20
40
60
80
100
0 50 100 150 200Well Depth (meters)
Prob
abili
ty o
f Ele
vate
d N
itrat
e
Puget Study Results Consistent with Logistic Regression Fit
AgricultureUrban
Forest
Agricultural StudyUrban Study
Predicted depth to which wells need to be cased to have at least an 80-percent
chance of obtaining water
with a nitrate concentration
less than 3 milligrams per
liter.
Advantages of Statistical Method for Advantages of Statistical Method for Vulnerability AssessmentVulnerability Assessment
Probability that a threshold is exceeded can be Probability that a threshold is exceeded can be predicted both spatially and temporallypredicted both spatially and temporally
Variables in model and their relative Variables in model and their relative importance is determined statisticallyimportance is determined statistically
Data collection needs are minimal if existing Data collection needs are minimal if existing water quality and ancillary data can be usedwater quality and ancillary data can be used
Advantages of Advantages of Logistic RegressionLogistic Regression
• Calculates actual probability of detections, instead of assigning subjective categories such as “High Vulnerability”•Weighting and interaction of variables such as geology and soils is automatically evaluated by logistic regression•All possible combinations of independent variables such as land use and soils can be numerically evaluated, and the best model selected.• Can be used for pesticide data (few detections, concentrations near laboratory reporting limits)
ReferencesReferenceshttp://water.usgs.gov/pubs/wri/wri02http://water.usgs.gov/pubs/wri/wri02--4269/pdf/WRIR024269/pdf/WRIR02--4269.pdf4269.pdf
http://wa.water.usgs.gov/pubs/misc/ps.gw.vol35.no.6http://wa.water.usgs.gov/pubs/misc/ps.gw.vol35.no.6.html.htmlhttp://www.ecy.wa.gov/events/hg/PastSymposhttp://www.ecy.wa.gov/events/hg/PastSymposia/abstracts2000.pdfia/abstracts2000.pdf page 10 & 77page 10 & 77http://webserver.cr.usgs.gov/midconherb/html/texas.http://webserver.cr.usgs.gov/midconherb/html/texas.htmlhtmlhttp://wa.water.usgs.gov/pubs/wri/wri004110http://wa.water.usgs.gov/pubs/wri/wri004110
The EndThe End
Sandy Williamson, presenterSandy Williamson, presenterU.S. Geological Survey, Water Resources,U.S. Geological Survey, Water Resources,http://water.usgs.gov/nawqa/datahttp://water.usgs.gov/nawqa/data1201 Pacific Avenue, Suite 600, Tacoma, WA 1201 Pacific Avenue, Suite 600, Tacoma, WA 9840298402(253) 428(253) 428--3600 ext. 2683; cell (253) 3763600 ext. 2683; cell (253) 376--82738273EE--mail [email protected] [email protected]
GIS DATAGIS DATA
Chemical UseChemical UseElevationElevationGeologyGeologyHydrogeomorphic regionsHydrogeomorphic regionsLand coverLand coverPrecipitationPrecipitationSoilsSoils
ConclusionsConclusions
GroundGround--water vulnerability assessed using water vulnerability assessed using statistical approach.statistical approach.
Useful tool for prioritizing monitoring and the Useful tool for prioritizing monitoring and the selection of source areas.selection of source areas.
Method can be used to evaluate other Method can be used to evaluate other contaminants contaminants and levels (e.g., MCLs).and levels (e.g., MCLs).
OPTIMUM NITRATE OPTIMUM NITRATE CONCENTRATIONCONCENTRATION
Three nitrate concentrations were evaluated to Three nitrate concentrations were evaluated to determine which concentration generated the determine which concentration generated the most effective logistic regression model: 2 mg/L, most effective logistic regression model: 2 mg/L, 5mg/L, and 10mg/L.5mg/L, and 10mg/L.The nitrate data were transformed to detect/no The nitrate data were transformed to detect/no detect nomenclature at each of these three detect nomenclature at each of these three concentrations, and logistic regression models concentrations, and logistic regression models were developed.were developed.
AbstractAbstractGroundGround--water vulnerability water vulnerability typically typically has typically been assessed using qualitative methods has typically been assessed using qualitative methods expressed as relative measures of risk, like DRASTIC. The logistexpressed as relative measures of risk, like DRASTIC. The logistic regression approach has the ic regression approach has the advantage of having both model variables and coefficient values advantage of having both model variables and coefficient values determined on the basis of determined on the basis of existing waterexisting water-- quality information. Unlike DRASTquality information. Unlike DRASTIICC--type methods, type methods, the logistic regression the logistic regression approach approach it does not depend on the somewhat arbitrary assignment of variait does not depend on the somewhat arbitrary assignment of variables and weighting bles and weighting factors based on qualitative criteria. Logistic regression is a factors based on qualitative criteria. Logistic regression is a great way to rigorously relate mangreat way to rigorously relate man--made and natural factors to groundmade and natural factors to ground-- water quality. It usually produces more statistical confidence water quality. It usually produces more statistical confidence than regular multiple regression because logisticthan regular multiple regression because logisticalal regression tries to answer a simpler yesregression tries to answer a simpler yes or or /no /no question (contaminant exceeds threshold) rather than the multiplquestion (contaminant exceeds threshold) rather than the multiple regression question relating a e regression question relating a wide range of concentration values to contributing factors. Logiwide range of concentration values to contributing factors. Logistic regression can be used to stic regression can be used to assess assess groundground--water water aquifer susceptibility (relative ease with which contaminants wiaquifer susceptibility (relative ease with which contaminants will ll can can reach reach aquifer) and groundaquifer) and ground--water vulnerability (relative ease with which contaminants will water vulnerability (relative ease with which contaminants will reach aquifer reach aquifer for a given set of landfor a given set of land--use practices). In 3use practices). In 3threethree USGS studies in Washington and Colorado, the USGS studies in Washington and Colorado, the variables that best explain the occurrence of high nitrate or pevariables that best explain the occurrence of high nitrate or pesticides includedsticides included the the (1) (1) well well and/or casing depth, and/or casing depth, (2) (2) the percentage of urban and agricultural land or the amount of fthe percentage of urban and agricultural land or the amount of fertilizer ertilizer applied within a radius of 3.2 kilometers of the well, applied within a radius of 3.2 kilometers of the well, (3) (3) surficial geology, and the surficial geology, and the (4) (4) mean soil mean soil hydrologic group, which is a measure of soil hydrologic group, which is a measure of soil --infiltration rate. Maps can be made of the predicted infiltration rate. Maps can be made of the predicted depth to which wells would need to be cased in order to have an depth to which wells would need to be cased in order to have an XX--percentpercentageage probability of probability of drawing water with high nitrate or pesticides or the predicted pdrawing water with high nitrate or pesticides or the predicted probability of high nitrate or robability of high nitrate or pesticides for wells cased to median casing depth. pesticides for wells cased to median casing depth.