Bacterial Contamination in Texas Coastal Bays:
Data Characterization
James Seppi
CE397 – Statistics in Water Resources
Spring 2009
Background
CWA mandates classification of impaired water bodies.
  › "Median fecal coliform concentration in bay and gulf waters, exclusive of buffer zones, shall not exceed 14 colonies per 100 ml, with not more than 10% of all samples exceeding 43 colonies per 100 ml." - TAC, Title 30, Part 1, Chapter 307, Rule §307.7
Future work at the CRWR – modeling for determination of TMDL
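The two-part criterion quoted above can be checked mechanically. The following is a minimal sketch, assuming the samples are plain numeric concentrations in colonies per 100 mL with no censored values; the function name `meets_criterion` and the constant names are illustrative, not taken from the rule or from the project.

```python
from statistics import median

MEDIAN_LIMIT = 14.0         # colonies per 100 mL, from the quoted rule
EXCEEDANCE_LEVEL = 43.0     # colonies per 100 mL, from the quoted rule
MAX_EXCEED_FRACTION = 0.10  # "not more than 10% of all samples"


def meets_criterion(samples):
    """Return (ok, sample_median, exceed_fraction) for a list of concentrations.

    Assumes every value is an actual measurement; censored "less than"
    values need the methods discussed later in the slides.
    """
    if not samples:
        raise ValueError("need at least one sample")
    med = median(samples)
    frac = sum(1 for x in samples if x > EXCEEDANCE_LEVEL) / len(samples)
    ok = med <= MEDIAN_LIMIT and frac <= MAX_EXCEED_FRACTION
    return ok, med, frac
```

Both conditions must hold at once: a bay with a low median can still fail if too many individual samples exceed 43 colonies per 100 mL.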
Background - Bays
East Matagorda Bay
Cedar Lakes
Tres Palacios/Turtle Bays
Lavaca/Chocolate Bays
Cox Bay
Crancahua Bay
San Antonio/Hynes/Guadalupe Bays
Copano Bay
Matagorda Bay
Data
TCEQ Surface Water Quality Monitoring – accessible online
Fecal Colony Forming Units / 100 mL
~1972-2005
Detection Limit of 2 cfu/100mL
Censored Data – “Less Thans”
  › Ex: <10 cfu/100mL
Measured at multiple stations per bay
Data
Statistics - Project Goals
1) Confirm Data are LogNormally-Distributed
2) Calculate Median and 90th Percentiles
  › Calculate Confidence Intervals
  › For period of record, for last 5 years, and for last 7 years
3) Calculate Prediction Intervals
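As an illustration of goal 2, one distribution-free way to put a confidence interval on a median or 90th percentile is to pick order statistics using the normal approximation to the binomial. This is a sketch only: the slides do not say which interval method was used, the function name `percentile_ci` is hypothetical, and the approach ignores censoring (if a selected rank falls on a nondetect, the returned value is just a reporting limit).

```python
import math
from statistics import NormalDist


def percentile_ci(data, p, conf=0.95):
    """Large-sample nonparametric CI for the p-th quantile (0 < p < 1).

    Ranks are taken as n*p -/+ z*sqrt(n*p*(1-p)) + 0.5, rounded to the
    nearest integer and clamped to 1..n.
    """
    x = sorted(data)
    n = len(x)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half = z * math.sqrt(n * p * (1 - p))
    lo_rank = min(max(round(n * p - half + 0.5), 1), n)
    hi_rank = min(max(round(n * p + half + 0.5), 1), n)
    return x[lo_rank - 1], x[hi_rank - 1]
```

Because the bounds are observed values, intervals of this kind can only take values that occur in the data set.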
Statistics
How to deal with all the censored data and those at the detection limit?
Best method of estimation?
Large data sets (mostly)

Bay                         n      n.cen   % Censored
Cedar Lake                  65     3       4.62%
Lavaca Bay                  5839   2754    47.17%
Copano Bay                  1787   1266    70.84%
Cox Bay                     483    297     61.49%
Crancahua Bay               1054   617     58.54%
East Matagorda Bay          1668   1192    71.46%
Matagorda Bay               2777   1632    58.77%
San Antonio Bay             2742   1599    58.32%
Tres Palacios/Turtle Bay    3777   2025    53.61%
Statistics - NADA
Underused in the field, even though we have lots of nondetects in environmental data.
Very important!
Statistics – NADA
Three approaches detailed
  › Substitution
  › Maximum Likelihood Estimation
  › Regression on Order Statistics
Statistics – NADA MLE
Three approaches detailed
  › Substitution
  › Maximum Likelihood Estimation
      50-80% censored data
      Large number of data points
  › Regression on Order Statistics
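To make the MLE idea concrete, here is a minimal sketch of fitting a lognormal distribution to left-censored data. It is not the NADA package's implementation: it uses a plain EM iteration on the log scale, written with the Python standard library only, and the function name `censored_lognormal_mle` and its arguments are illustrative assumptions.

```python
import math
from collections import Counter
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for pdf and cdf


def censored_lognormal_mle(detects, limits, iterations=300):
    """Estimate (mu, sigma) of log-concentration from left-censored data.

    detects: measured values; limits: the "less than" limit of each
    censored sample. EM treats each nondetect as a normal variate
    truncated above at log(limit).
    """
    logs = [math.log(x) for x in detects]
    cens = Counter(math.log(c) for c in limits)  # log-limit -> count
    n = len(logs) + sum(cens.values())
    sum_y = sum(logs)
    sum_y2 = sum(y * y for y in logs)
    # Crude start: pretend each nondetect sits exactly at its limit.
    start = logs + [c for c, k in cens.items() for _ in range(k)]
    mu = sum(start) / n
    sigma = max(math.sqrt(sum((v - mu) ** 2 for v in start) / n), 1e-3)
    for _ in range(iterations):
        s1, s2 = sum_y, sum_y2
        for c, k in cens.items():
            a = (c - mu) / sigma
            lam = STD.pdf(a) / max(STD.cdf(a), 1e-300)  # lower-tail Mills ratio
            m1 = mu - sigma * lam                        # E[z | z < c]
            var = sigma ** 2 * (1 - a * lam - lam ** 2)  # Var[z | z < c]
            s1 += k * m1
            s2 += k * (var + m1 * m1)
        mu = s1 / n
        sigma = math.sqrt(max(s2 / n - mu * mu, 1e-12))
    return mu, sigma
```

Under the lognormal assumption the median is exp(mu) and the 90th percentile is exp(mu + 1.2816 sigma), so both can fall below the detection limit when most of the data are censored.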
Statistics – NADA MLE
These don’t look so good… MLE might be overestimating SD
Bay                         Mean     SD
Cedar Lake                  158.67   1425.47
Copano Bay                  118.83   59419.63
Cox Bay                     97.36    17599.06
Crancahua Bay               390.41   149704.98
East Matagorda Bay          199.75   175014.38
Lavaca Bay                  392.71   57165.47
Matagorda Bay               273.56   96400.48
San Antonio Bay             77.24    5482.49
Tres Palacios/Turtle Bay    265.00   45315.73
Results – NADA MLE Plots
Results – NADA MLE

Bay                         Median   Lower Conf   Upper Conf
Cedar Lake                  17.55    10.18        30.26
Copano Bay                  0.24     0.17         0.33
Cox Bay                     0.54     0.34         0.86
Crancahua Bay               1.02     0.76         1.36
East Matagorda Bay          0.23     0.16         0.32
Lavaca Bay                  2.70     2.45         2.98
Matagorda Bay               0.78     0.64         0.94
San Antonio Bay             1.09     0.93         1.27
Tres Palacios/Turtle Bay    1.55     1.36         1.77

Bay                         90th Percentile   Lower Conf   Upper Conf
Cedar Lake                  258.37            127.07       525.33
Copano Bay                  21.78             17.26        27.49
Cox Bay                     33.55             22.28        50.52
Crancahua Bay               84.65             62.92        113.90
East Matagorda Bay          25.51             19.84        32.80
Lavaca Bay                  154.04            137.29       172.82
Matagorda Bay               62.55             52.17        74.99
San Antonio Bay             45.89             39.27        53.62
Tres Palacios/Turtle Bay    94.41             81.62        109.21
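The MLE medians and 90th percentiles can be tied back to the MLE mean and SD reported earlier. The slides do not state how the mean and SD were computed; the sketch below assumes the standard lognormal back-transform (mean = exp(mu + sigma²/2), SD = mean · sqrt(exp(sigma²) − 1)) and uses the Cedar Lake values from the tables as input. The function name is illustrative.

```python
import math
from statistics import NormalDist


def lognormal_moments_from_quantiles(median, p90):
    """Back out (mu, sigma) on the log scale, then the arithmetic mean and SD."""
    z90 = NormalDist().inv_cdf(0.90)  # about 1.2816
    mu = math.log(median)
    sigma = (math.log(p90) - mu) / z90
    mean = math.exp(mu + sigma ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return mu, sigma, mean, sd


# Cedar Lake MLE results from the tables: median 17.55, 90th percentile 258.37
mu, sigma, mean, sd = lognormal_moments_from_quantiles(17.55, 258.37)
```

Under that assumption the Cedar Lake inputs give a mean and SD close to the tabulated 158.67 and 1425.47. This suggests the very large SDs may simply reflect a large log-scale sigma, since the SD grows with exp(sigma²), rather than a failure of the fit.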
Statistics – NADA ROS
Three approaches detailed
  › Substitution
  › Maximum Likelihood Estimation
  › [Robust] Regression on Order Statistics
      Regression equation on probability plot
      Use sample data where we have it
      Assume distribution only for censored data
      Impute values for censored points
      Best for small data sets
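The ROS steps listed above can be sketched for the simplest case. This is a deliberately reduced illustration, not the NADA routine: it assumes a single censoring limit that lies below every detected value (real data sets with several limits need adjusted plotting positions), uses Blom plotting positions, and the function name `simple_ros` is hypothetical.

```python
import math
from statistics import NormalDist


def simple_ros(detects, n_censored):
    """Return imputed nondetects followed by the sorted detected values.

    Assumes all nondetects share one limit below every detect, so they
    occupy ranks 1..n_censored. Needs at least two detected values.
    """
    ordered = sorted(detects)
    n = len(ordered) + n_censored
    std = NormalDist()
    # Blom plotting positions converted to normal quantiles for ranks 1..n
    z = [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    z_cen, z_det = z[:n_censored], z[n_censored:]
    y = [math.log(v) for v in ordered]
    # Ordinary least squares of log(concentration) on normal quantile,
    # fitted to the detected values only
    zbar = sum(z_det) / len(z_det)
    ybar = sum(y) / len(y)
    sxx = sum((a - zbar) ** 2 for a in z_det)
    sxy = sum((a - zbar) * (b - ybar) for a, b in zip(z_det, y))
    slope = sxy / sxx
    intercept = ybar - slope * zbar
    # Impute the censored points from the fitted line; detects stay as measured
    imputed = [math.exp(intercept + slope * q) for q in z_cen]
    return imputed + ordered
```

Summary statistics such as the median or 90th percentile are then computed on the combined list, so the distributional assumption only affects the imputed part.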
Results – NADA ROS Plots
Results – NADA ROS

Bay                         Median   Lower Conf   Upper Conf
Cedar Lake                  17.00    3.07         30.00
Copano Bay                  0.50     0.47         0.68
Cox Bay                     1.06     0.99         2.00
Crancahua Bay               1.96     1.66         2.57
East Matagorda Bay          0.62     0.56         0.81
Lavaca Bay                  4.00     4.00         5.00
Matagorda Bay               1.47     1.59         2.12
San Antonio Bay             2.05     1.99         2.51
Tres Palacios/Turtle Bay    2.59     2.50         3.15

Bay                         90th Percentile   Lower Conf   Upper Conf
Cedar Lake                  306.00            110.00       920.00
Copano Bay                  23.00             20.00        33.00
Cox Bay                     33.00             23.00        70.00
Crancahua Bay               79.00             70.00        130.00
East Matagorda Bay          33.00             23.00        33.00
Lavaca Bay                  170.00            130.00       170.00
Matagorda Bay               70.00             49.00        79.00
San Antonio Bay             46.00             33.00        49.00
Tres Palacios/Turtle Bay    110.00            79.00        130.00
Results – Prediction Intervals
Bay                         Mean        Lower PI    Upper PI
Cedar Lake                  158.66759   158.14966   158.9488
Copano Bay                  118.83158   114.54587   121.1857
Cox Bay                     97.35657    94.49062    98.92486
Crancahua Bay               390.40849   388.28908   391.5455
East Matagorda Bay          199.75109   195.32916   202.1536
Lavaca Bay                  392.71178   391.40887   393.4091
Matagorda Bay               273.56241   271.14292   274.8631
San Antonio Bay             77.23869    75.22121    78.338
Tres Palacios/Turtle Bay    265.00299   263.28773   265.9233
Prediction Interval – “bracket the range of locations for … observations not currently in the data set.”
Finding a value outside should happen only 1-0.95 = 5% of the time
Used MLE method to get params
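For reference, a textbook two-sided prediction interval for one new observation from a lognormal distribution can be sketched as below. This is an illustration under stated assumptions and not necessarily the calculation behind the table: `mu` and `sigma` are log-scale parameters (for example from an MLE fit), `n` is the sample size, and a normal quantile is used in place of Student's t, which is reasonable only for large n. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_prediction_interval(mu, sigma, n, conf=0.95):
    """Interval expected to contain one future observation with probability conf."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half = z * sigma * math.sqrt(1 + 1 / n)  # half-width on the log scale
    return math.exp(mu - half), math.exp(mu + half)
```

An interval built this way is symmetric on the log scale, so on the original scale it is centered multiplicatively on the median exp(mu) and extends much further above it than below.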
Future Work
Repeat for last 5-years and last 7-years of data
  › Is water quality in bays improving/declining?
Use method/findings in Copano Bay project to predict median/90th %ile given geomean from model
Look at spatial variation in each bay
  › Though regulation is not done this way
Thanks & Questions
Thanks to:
  › Stephanie Johnson
  › Grace Chen
  › Sammy Sandoval
  › Dr. Maidment
Results without NADA

Bay               Median   Lower Conf   Upper Conf
Cedar Lake        17       10           30
Chocolate Bay     4        4            5
Copano Bay        2        2            2
Cox Bay           2        2            2
Crancahua Bay     2        2            2
East Matagorda    2        2            2
Matagorda Bay     2        2            2
San Antonio       2        2            2
Tres Palacios     2        2            2

Bay               90th Percentile   Lower Conf   Upper Conf
Cedar Lake        350               110          920
Chocolate Bay     170               130          170
Copano Bay        23                20           33
Cox Bay           33                23           70
Crancahua Bay     79                70           130
East Matagorda    33                23           33
Matagorda Bay     70                49           79
San Antonio       46                33           49
Tres Palacios     110               79           130