9 data quality and implementation hassweddlavalle · introduction ¨ reporting data “as is” is...
TRANSCRIPT
![Page 1: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/1.jpg)
Improving State Reporting of Crime Trends: Initial Testing and Replication of the NIBRS Data Quality and Imputation Package (NIBRS DQIP) in Two States
Christina R. LaValle, M.S., Walter Reed Army Institute of ResearchAlan Wedd, M.S., Ohio Statistical Analysis CenterStephen M. Haas, Ph.D., ICF Fellow
SAC Western Regional Training InstituteMay 23, 2018
![Page 2: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/2.jpg)
Introduction
¨ Reporting data “as is” is often not accurate.
¨ IBR data is an essential data source for tracking crime trends in states without victimization surveys.
¨ Imputed data allows for more stable crime trends.
¨ WVIBR data are well suited to study data quality and imputation100% population and crime coverage with NIBRS.
¨ Established history of NIBRS reporting (since 2006).
¨ Relative stability and consistency in crime reporting.
![Page 3: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/3.jpg)
Our Approach
¨ Designed to assist states in being more timely and accurate in reporting their state crime trends
¨ Accessibility is important – want the approach to be easy for state programs to administer and report
¨ NIBRS data is often criticized for being incomplete or non-representative
¨ We sought to develop an “easy-to-use” tool that can be employed by state repository personnel and data analysts who work with state-level IRB data
![Page 4: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/4.jpg)
Our Approach (cont.)
¨ Two main issues impact data quality for WV and other states: missing data and outliers.
¨ Guidelines were developed to classify reports of zero as ‘true zero’ or ‘missing’ based on agency type, crime total, and number of consecutive zeros reported across crime types.
¨ Different imputation methods were tested using simulation based on the missing value patterns identified by classifying zeros and detecting outliers.
¨ Simulations were also conducted to determine the impact of estimating crime totals with complete agency data deleted at multiple missing data scenarios.
![Page 5: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/5.jpg)
Process of Applying Imputation Methods
1. Obtain crime data (ORI, agency name, crime counts).
2. Identify missing data (zero reports), manually inspect, and classify as missing data or true zeros.
3. Identify irregular data, manually inspect, and classify as outlier or acceptable.
4. Identify non-reporting agencies.
5. Obtain population estimates for municipal police departments and county sheriff departments.
6. Obtain MSA status for all non-municipal agencies.
7. Apply imputation methods.8. Calculate statistics.
![Page 6: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/6.jpg)
Imputation Considerations
¨ Imputation methods systematically estimate missing data values.
¨ There were two types of missing data situations:¤ Estimating for agencies that reported some but not all
months of data (i.e., partial reporting agencies) and,
¤ Estimating for agencies that did not report any data (i.e., non-reporting agencies).
¨ Several different imputation methods for missing data for both partial reporting (4) and non-reporting (5) situations were investigated and compared to the FBI method.
![Page 7: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/7.jpg)
Classifying Zeros - Example
Year Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Total
Violent 2007 1 0 0 1 1 3 0 1 0 1 1 0 9
Property 2007 17 8 19 20 19 13 15 18 10 20 19 13 191
Non-indx 2007 17 13 22 15 20 13 15 21 21 18 11 17 203
Violent 2008 0 2 1 1 0 3 0 0 2 0 1 0 10
Property 2008 12 11 14 7 31 17 26 16 13 11 14 13 185
Non-indx 2008 26 18 17 17 17 9 21 11 8 11 11 5 171
Violent 2009 1 1 1 2 0 0 1 0 0 1 1 0 8
Property 2009 11 11 15 16 17 0 22 25 13 14 19 13 176
Non-indx 2009 11 15 12 18 7 0 10 11 9 8 6 3 110
WVIBRS Data: County Sheriff Department (population (balance of): 19,333)County Agency with moderate reporting
![Page 8: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/8.jpg)
Classifying Zeros - Example
Year Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Total
Violent 2007 0 0 0 0 1 1 0 0 0 0 0 0 2
Property 2007 5 3 2 2 1 0 6 6 2 1 1 1 30
Non-indx 2007 3 3 2 2 2 3 7 6 2 2 2 0 34
Violent 2008 0 0 0 0 0 0 0 0 0 0 0 0 0
Property 2008 1 2 1 4 0 1 4 2 2 2 1 0 20
Non-indx 2008 5 0 2 2 0 2 1 2 3 4 1 2 24
Violent 2009 1 0 0 0 0 0 0 0 0 0 0 0 1
Property 2009 1 0 0 0 0 0 0 0 0 0 0 0 1
Non-indx 2009 1 2 1 0 0 1 0 0 0 0 0 0 5
WVIBRS Data: Municipal Police Department (population: 5,177)PD with moderate reporting
![Page 9: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/9.jpg)
Classifying Zeros - Example
Year Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Total
Violent 2007 0 0 0 0 0 0 0 0 0 0 0 0 0
Property 2007 1 0 1 4 0 0 0 0 0 0 0 0 6
Non-indx 2007 1 0 4 0 0 0 0 2 3 0 0 0 10
Violent 2008 0 1 0 1 0 0 0 0 0 0 0 0 2
Property 2008 0 2 3 4 2 0 0 3 1 1 1 0 17
Non-indx 2008 0 2 2 3 0 0 0 3 1 8 2 0 21
Violent 2009 0 1 0 0 0 0 0 0 0 0 1 0 2
Property 2009 1 3 0 5 1 2 0 0 6 4 5 9 36
Non-indx 2009 0 15 4 2 1 0 0 0 5 9 0 0 36
WVIBRS Data: Campus PoliceCollege campus with seasonal reporting
![Page 10: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/10.jpg)
Classifying Zeros - Example
Year Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Total
Violent 2007 0 0 0 0 0 0 1 1 0 0 0 0 2
Property 2007 3 0 0 0 0 0 0 0 0 0 0 0 3
Non-indx 2007 0 3 1 2 0 0 0 1 1 1 1 0 10
Violent 2008 1 1 0 0 0 0 0 0 1 0 2 0 5
Property 2008 0 1 0 0 0 0 0 0 0 0 1 0 2
Non-indx 2008 0 0 0 1 0 2 0 0 0 1 0 0 4
Violent 2009 0 0 1 0 0 0 0 0 0 1 0 0 2
Property 2009 0 0 0 0 0 0 0 0 0 0 0 0 0
Non-indx 2009 2 0 2 2 0 0 1 0 0 1 2 2 12
WVIBRS Data: Municipal Police Department (population: 2,589)PD with spare reporting
![Page 11: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/11.jpg)
Zero Classification Guidelines
¨ Helper variables for guidelines¤ NCZ - number of consecutive months were all crime counts are zero¤ Total P – total number of property crimes¤ PopStatus – population status (population or zero-population)
1. If any crime count in the same month for violent, property, or non-index crimes are non-zero any zero reported is a true zero.2. If Total P > 25 and NCZ > 0, zeros are identified as missing. 3. If NCZ ≥ 4, zeros are identified as missing.4. Guidelines 2 and 3 may not hold for zero-population agencies. Manual inspection is required for data identified, the reporting pattern for the year is used to assist with classification.
![Page 12: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/12.jpg)
Data Quality Tool
![Page 13: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/13.jpg)
Enter Violent, Property, Non-index Data
![Page 14: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/14.jpg)
Running the Classifying Zeros Macro
![Page 15: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/15.jpg)
Classifying Zeros Guidelines
![Page 16: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/16.jpg)
Classifying Zeros, Inspecting Output
![Page 17: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/17.jpg)
Classifying Zeros, Example 1
![Page 18: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/18.jpg)
Classifying Zeros, Example 2
![Page 19: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/19.jpg)
Classifying Zeros, Example 3
![Page 20: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/20.jpg)
Outlier Detection
¨ Two, complementary techniques and graphical analysis.¨ Outlier statistics were developed by WVSAC.¨ Ratio of Ranges (Rr)
¤ Identifies the agency with suspected outliers¨ Ratio to Median (Yi)
¤ Identifies the month containing suspected outliers¨ Graphical Analysis
¤ Histogram¤ Dot Plot¤ Line Chart
![Page 21: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/21.jpg)
Outlier Detection - Formulas
¨ Ratio or Ranges, Rr
!"#$% ='()* − ,'
-.-/0 1"234 1.56-'()* − '(78
!"9$::$( = '(78 − ,'-.-/0 1"234 1.56-
'()* − '(78¨ Ratio to Median, Yi
;2 = '7,' <." 4/1ℎ 3.6-ℎ 2
![Page 22: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/22.jpg)
Outlier Detection - Example
WVIBRS Data: Municipal Police Department (population: 49,129)Large PD with high crime counts
Year Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Total
Violent 2009 24 3 5 3 21 29 28 24 21 31 27 31 247
Property 2009 244 7 13 14 223 276 255 266 273 295 278 248 2392
Non-indx 2009 186 37 38 59 207 192 220 198 199 230 223 193 1982
Violent 2010 31 32 38 27 22 20 26 19 28 20 20 18 301
Property 2010 209 156 228 272 312 310 268 278 208 260 217 195 2913
Non-indx 2010 174 153 168 171 187 215 203 205 175 227 150 166 2194
Violent 2011 22 7 21 24 24 28 3 40 24 21 36 21 271
Property 2011 180 15 237 277 243 255 18 302 272 267 246 245 2557
Non-indx 2011 183 21 154 203 204 178 39 211 207 178 176 162 1916
![Page 23: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/23.jpg)
Back to the Data Quality Tool…
![Page 24: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/24.jpg)
Generate Data for Outlier Detection
![Page 25: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/25.jpg)
Generate Data for Graphical Analysis
![Page 26: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/26.jpg)
Running the Outlier Detection Macro
![Page 27: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/27.jpg)
Outlier Detection, Macro Overview
![Page 28: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/28.jpg)
Outlier Detection, Adjusting Thresholds
![Page 29: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/29.jpg)
Outlier Detection, Inspecting Output
![Page 30: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/30.jpg)
Outlier Detection, Inspecting Output cont.
![Page 31: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/31.jpg)
Outlier Detection, Inspecting Output cont.
![Page 32: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/32.jpg)
Outlier Detection – Graphical Analysis Histogram
0123456789
101112
22 26 30 34 38
Freq
uen
cy
Violent 2010
0123456789
101112
9 15 21 27 33
Freq
uen
cy
Violent 2009
0123456789
101112
11 19 27 35 43
Freq
uen
cy
Violent 2011
![Page 33: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/33.jpg)
Outlier Detection – Graphical Analysis Dot Plot
0 6 12 18 24 30 36 42 48 54 60
Crim e Count
Violent 201020Range: 25.0AverageMonthyCount: 24MedianMonthlyCount:
0 5 10 15 20 25 30 35 40 45 50
Crim e Count
Violent 200928Range: 20.5AverageMonthyCount: 24MedianMonthlyCount:
0 6 12 18 24 30 36 42 48 54 60
Crim e Count
Violent 201137Range: 22.5AverageMonthyCount: 23MedianMonthlyCount:
![Page 34: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/34.jpg)
Outlier Detection – Graphical Analysis Line Chart
06
121824303642485460
1 2 3 4 5 6 7 8 9 10 11 12
Mo
nth
ly C
rim
e C
ou
nt
M onth
Violent 2010
05
101520253035404550
1 2 3 4 5 6 7 8 9 10 11 12
Mo
nth
ly C
rim
e C
ou
nt
M onth
Violent 2009
06
121824303642485460
1 2 3 4 5 6 7 8 9 10 11 12
Mo
nth
ly C
rim
e C
ou
nt
M onth
Violent 2011
![Page 35: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/35.jpg)
Back to the Data Quality Tool…
![Page 36: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/36.jpg)
Running the Graphical Analysis Macro
![Page 37: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/37.jpg)
Graphical Analysis, Acceptable Data
![Page 38: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/38.jpg)
Graphical Analysis, Outliers
![Page 39: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/39.jpg)
Data Quality Research Findings
¨ Zero classification guidelines identified about 1 out of 5 agencies. On average, 1 out of 10 agencies were actually classified as having missing data.
¨ Traditional outlier detection methods failed to identify known irregularities in agency data.
¨ About 1 out of 5 agencies for property crimes and 1 in 10 agencies in violent crimes were identified by at least one outlier detection method (Rr or Yi).
¨ The quantity of data identified and classified as missing or irregular was consistent during the five year study period.
![Page 40: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/40.jpg)
WV Partial Reporting Imputation Method
Crime Total = Partial Crime Total + Q1*(N1) + Q2*(N2) + Q3*(N3) + Q4*(N4)
¨ Nx = number of missing values per period (can vary from 0 to 3)¨ Q1 = average of December, January, February crime counts;¨ Q2 = average of March, April, May crime counts;¨ Q3 = average of June, July, August crime counts;¨ Q4 = average of September, October, November crime counts.¨ If N1 = 3, then Q1 = minimum[Q2, Q3, Q4].¨ If N2 or N4 = 3, then Q2 = Q4.¨ If N3 = 3, then Q3 = maximum[Q1, Q2, Q4].¨ If N2 and N4 = 3, then Q2 = Q4 = average[Q1, Q3].¨ If data for three entire quarters were missing, the average of the
remaining values are used for the respective Qx.
![Page 41: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/41.jpg)
WVIBR and UCR Plots
050 0
10 00
15 00
20 00
25 00
30 0035 00
40 00
45 00
1 2 3 4 5 6 7 8 9 10 11 12
Crim
e Co
unt T
otal
(All
Age
ncie
s co
mbi
ned)
M onth
WV Property Crimes 2009
050
10 015 020 025 030 035 040 045 050 0
1 2 3 4 5 6 7 8 9 10 11 12
Crim
e Co
unt T
otal
(All
Age
ncie
s co
mbi
ned)
M onth
WV Violent Crimes 2009
0
20 000 0
40 000 0
60 000 0
80 000 0
10 000 00
12 000 00
Jan Feb M ar Ap r M ay June Jul Au g Sep t O c t N ov
UCR Data 1997 to 1999
19 97 19 98 19 99
![Page 42: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/42.jpg)
WV Non-reporting Imputation Method
¨ Crime Total = Population Group Crime Rate*Agency’s Population
Group WV Population Groups1 25,000+2 10,000-24,9993 5,000-9,9994 2,500-4,9995 1,000-2,499 + colleges6 Less than 1,0007 Not Applicable8 Non-MSA counties & State Police9 MSA counties & State Police
100,000
![Page 43: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/43.jpg)
WV Non-reporting Imputation Method
¨ Data needed:¤ Incident data aggregated by month¤ Population data (U.S. Census Bureau)
n Population 2000-2009n http://www.census.gov/popest/data/cities/totals/2009/SUB-EST2009-states.html
n Population 2010 - 2012n http://www.census.gov/popest/data/cities/totals/2012/SUB-EST2012.html
¤ County MSA status (U.S. Census Bureau)n County FIPS codes 2010
n https://www.census.gov/geo/reference/codes/cou.html
n County MSA 2003-2009n http://www.census.gov/population/metro/data/defhist.html
¤ Complete state agency list n State Police or state repository
![Page 44: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/44.jpg)
WV Non-reporting Imputation Method
¨ Obtain and format data.¨ Assign population estimates to municipal and county agencies.¨ Assign MSA status to non-municipal agencies (i.e., county, state
police, DNR, task forces, etc.).¨ Group municipal agencies by population (Population Groups 1-6).¨ Group non-municipal agencies by MSA status (Population Groups 8
and 9).¨ Calculate group crime rates by summing the total crime then dividing
by the total population for all full reporting agencies in each group and multiplying by 100,000.
¨ Apply the imputation to non-reporting agencies by multiplying the non-reporting agency’s population by the group crime rate then dividing by 100,000.
![Page 45: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/45.jpg)
Imputation Tool
![Page 46: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/46.jpg)
Running the Imputation Tool, Step 1
![Page 47: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/47.jpg)
Imputation Tool Output1
![Page 48: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/48.jpg)
Running the Imputation Tool, Step 2
![Page 49: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/49.jpg)
Imputation Tool Output 2
![Page 50: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/50.jpg)
Replication in Ohio
![Page 51: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/51.jpg)
OIBRS Reporting – State
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Population Agencies
Perc
ent C
over
ed
Ohio UCR Coverage
SRS OIBRS
![Page 52: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/52.jpg)
OIBRS Reporting – County
![Page 53: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/53.jpg)
Challenges – Mapping
![Page 54: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/54.jpg)
Challenges – Framing Data
“The rate of stalking victims reported to OIBRS…”
02468
1012141618202224
2013 2014 2015 2016 2017
Rate
per
100
,000
Stalking Rate
![Page 55: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/55.jpg)
Other Challenges
¨ Time lag between OIBRS and SRS
¨ Can’t use data for inferential work
¨ Lack of available and validated methods
![Page 56: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/56.jpg)
Advantages of these methods
¨ Replication
¨ Easy to implement
¨ Designed for state-level NIBRS data
![Page 57: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/57.jpg)
Goal of this project
¨ Replicate WV work with OIBRS data ¤ Ohio is good candidate for replication
![Page 58: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/58.jpg)
Methods
¨ OIBRS data from 2014-2016¨ Zero classification, outlier detection,
imputation/simulation
Year Reporting Agencies
Population Covered
2014 564 77.9%
2015 556 78.6%
2016 542 78.6%
![Page 59: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/59.jpg)
Methods – Differences & Guidance
¨ Zero Classification ¤ Extracted and aggregated data from OIBRS
¤ Referenced “completeness” variable in database for zero and low population agencies
¤ Categorized agency size differently
![Page 60: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/60.jpg)
Methods – Differences & Guidance
¨ Outlier Detection¤ Experiment w/Rr and Yi values
¤ Save workbook after each step
¤ Helpful to have “comparison agency”
![Page 61: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/61.jpg)
Methods – Differences & Guidance
¨ Imputation/Simulation ¤ Agency type, population data were already in
database, only needed MSA status
¤ Ran fewer simulations than WV
¤ Updated workbook from Christina
![Page 62: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/62.jpg)
Results – Zero Classification
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
2014 2015 2016 Average LaV alle etal.(2014)
Perc
ent I
dent
ified
/Cla
ssifi
ed
Identification and Classification
Agencies Identified Agencies Classified
![Page 63: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/63.jpg)
Results – Zero Classification
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
2014 2015 2016 Average LaV alle etal.(2014)
Acc
urac
y
Accuracy
![Page 64: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/64.jpg)
Results – Zero Classification
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
2014 2015 2016 Average
Acc
urac
y
Accuracy – Agency SizeLar ge Small
![Page 65: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/65.jpg)
Results – Outlier Detection
0%
2%
4%
6%
8%
10%
12%
14%
2014 2015 2016 Average LaV alle et al.(2014)
Perc
ent I
dent
ified
Violent Crime - Identification
Yi Rr Yi and Rr
![Page 66: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/66.jpg)
Results – Outlier Detection
0%
5%
10%
15%
20%
25%
30%
2014 2015 2016 Average LaV alle et al. (2014)
Perc
ent I
dent
ified
Property Crime - IdentificationYi Rr Yi and Rr
![Page 67: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/67.jpg)
Results – Outlier Detection
0%
1%
2%
3%
4%
5%
2014 2015 2016 Average LaV alle et al.(2014)
Perc
ent o
f A
genc
ies
With
Out
liers
Outlier Frequency
Property Crime Violent Crime
![Page 68: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/68.jpg)
Accuracy Measures for Imputation
¨ Mean Absolute Error (MAE): the average of the absolute distance between imputed and original values. It describes how much, on average, imputed values differ from original values. Smaller values are better.
![Page 69: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/69.jpg)
Results – Simulation and Imputation
0
5
10
15
20
25
30
35
40
45
50
2014 2015 2016 Average
MAE
MAE – Violent Crime
LaV alle et al. (2013) Method FBI Method
![Page 70: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/70.jpg)
Accuracy Measures for Imputation
¨ Root Mean Squared Error (RMSE): the standard deviation of the prediction error. It is a measure of consistency and variation and sensitive to large over or under estimates. Smaller values are better.
![Page 71: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/71.jpg)
Results – Simulation and Imputation
0
50
100
150
200
250
300
2014 2015 2016 Average
RMSE
RMSE – Violent Crime
LaV alle et al. (2013) Method FBI Method
**
**
![Page 72: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/72.jpg)
Accuracy Measures for Imputation
¨ Bias: the average distance between imputed and original values used to indicate the tendency for a method to over- or underestimate values. Bias of zero indicates no bias, negative bias indicates underestimation, and positive bias indicates overestimation.
![Page 73: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/73.jpg)
Results – Simulation and Imputation
-15-14-13-12-11-10-9-8-7-6-5-4-3-2-10
Bias
Bias
LaV alle et al. (2013) Method FBI Method
2014 2015 2016 Average
![Page 74: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/74.jpg)
Results – Simulation and Imputation
0
50
100
150
200
250
300
2014 2015 2016 Average
MAE
MAE - Property Crime
LaV alle et al. (2013) Method FBI Method
**
** **
![Page 75: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/75.jpg)
Results – Simulation and Imputation
0
200
400
600
800
1,000
1,200
2014 2015 2016 Average
RMSE
RMSE - Property Crime
LaV alle et al. (2013) Method FBI Method
**** **
**
![Page 76: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/76.jpg)
Results – Simulation and Imputation
-100
-90
-80
-70
-60
-50
-40
-30
-20
-10
0
Bias
Bias - Property Crime
LaV alle et al. (2013) Method FBI Method
**
****
2014 2015 2016 Average
![Page 77: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/77.jpg)
Discussion
¨ These methods improved OIBRS data quality¤ Will likely work in other states
![Page 78: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/78.jpg)
Discussion – Zero Classification
¨ Significant improvement over not classifying zeros
¨ Most errors were false positives for small agencies¤ Easy to detect, probably easy to address
![Page 79: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/79.jpg)
Discussion – Outlier Detection
¨ Significant improvement over not detecting outliers
¨ Most errors were false positives for large agencies ¤ Easy to detect, probably easy to address
![Page 80: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/80.jpg)
Discussion – Simulation and Imputation
¨ Both methods performed similarly for violent crime
¨ WV method was better than FBI method for property crime
¨ Much better than not imputing
![Page 81: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/81.jpg)
Discussion
¨ Ohio and WV had similar amounts of missing data and outliers
![Page 82: 9 Data Quality and Implementation HassWeddLavalle · Introduction ¨ Reporting data “as is” is often not accurate. ¨ IBR data is an essential data source for tracking crime trends](https://reader035.vdocuments.mx/reader035/viewer/2022071022/5fd68fd136d8cc19251d6d22/html5/thumbnails/82.jpg)
Future Work
¨ Replicating, refining, and expanding these methods
¨ Improving incident level data