Tool for Assessing Impact of Changing Editing Rules On Cost & Quality
Alaa Al-Hamad, Begoña Martín, Gary Brown
Processing, Editing & Imputation Branch
Business Surveys
1. Overview
• Data Editing in the ONS
• Error Detection Rules Problems
• Survey Managers' Dilemma
• Proposed Tool
• Tool illustration & output
• Conclusion and Further Work
2. Editing in the ONS
A costly component of the data cleaning process in the ONS is data editing
Data Editing is defined as:
• An activity aimed at detecting and correcting errors in data (ONS Glossary)
In practice this involves:
• Detection of error-suspect data (using Editing Rules)
  Ex. Fail if A + B > (estimated parameter)
• Verification/correction of error-suspect data from source
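As a minimal sketch of the example rule above ("Fail if A + B > estimated parameter"), the check can be written as a small function. The field names `A` and `B`, the sample records and the threshold value are all hypothetical; the slides do not say how the parameter is estimated.

```python
def fails_edit_rule(a: float, b: float, threshold: float) -> bool:
    """Return True when a record is error-suspect: A + B exceeds the threshold."""
    return a + b > threshold


# Hypothetical returns; the field names and the threshold are made up.
records = [
    {"id": 1, "A": 120.0, "B": 30.0},
    {"id": 2, "A": 400.0, "B": 250.0},
    {"id": 3, "A": 300.0, "B": 200.0},
]
THRESHOLD = 500.0

# Records that fail would be sent for verification/correction from source.
suspects = [r["id"] for r in records if fails_edit_rule(r["A"], r["B"], THRESHOLD)]
print(suspects)
```

Record 3 sits exactly on the threshold and passes, because the rule uses a strict inequality.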
3. Detection Rules Problems
If rule parameters are too conservative:
• Increased response burden (unnecessary recontacts)
• Reduced data quality (over-validation errors and biases)
• Costly in terms of staff & resources
If rule parameters are too liberal:
• Uncorrected errors are allowed through
• Reduced data quality
• Costly in terms of reputation
• Less costly in terms of staff & resources
4. Survey Managers' Dilemma
When managers are asked to achieve savings: 'Savings vs Quality Impact'
• An easy way to make quick savings is to loosen the rule parameters so that less data will be edited
The challenge is:
• Where to stop?
• What impact will such action have on the estimates?
Remember: quality loss is defined not solely by the number of error failures but also by the size of the errors
5. Proposed Tool
Ideally what is required is a dynamic routine for editing rules parameters that is applicable to all business surveys and:
• offers a choice of different quality measurement criteria
• considers all editing rules simultaneously
• outputs proposed changes to parameters
• outputs savings and quality loss per changed rule and in total
A dynamic routine has not yet been developed so we have pursued a pragmatic solution with the same criteria
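One simple way to consider several rules at once is a brute-force grid search over candidate parameter values. The sketch below is an illustration only, not the ONS routine: the rule functions, field names (`turnover`, `change_pct`, `in_error`), grids and records are all hypothetical. It assumes that past verification has already established whether each record was truly in error.

```python
from itertools import product


def grid_search(records, rules, grids, current):
    """Try every combination of gate values and count the effect.

    rules:   name -> function(record, gate) returning True when the record fails
    grids:   name -> list of candidate gate values
    current: name -> gate value in use today
    """
    def failed(gates):
        # A record needs editing if it fails at least one rule.
        return {i for i, rec in enumerate(records)
                if any(rule(rec, gates[name]) for name, rule in rules.items())}

    baseline = failed(current)
    names = list(rules)
    results = []
    for combo in product(*(grids[n] for n in names)):
        gates = dict(zip(names, combo))
        released = baseline - failed(gates)  # no longer edited
        missed = sum(1 for i in released if records[i]["in_error"])
        results.append({"gates": gates, "savings": len(released),
                        "errors_missed": missed})
    return results


# Hypothetical data and rules.
records = [
    {"turnover": 150, "change_pct": 10, "in_error": False},
    {"turnover": 250, "change_pct": 10, "in_error": True},
    {"turnover": 50, "change_pct": 50, "in_error": True},
    {"turnover": 50, "change_pct": 70, "in_error": False},
    {"turnover": 50, "change_pct": 10, "in_error": False},
]
rules = {
    "size": lambda rec, gate: rec["turnover"] > gate,
    "change": lambda rec, gate: rec["change_pct"] > gate,
}
results = grid_search(records, rules,
                      grids={"size": [100, 200], "change": [40, 60]},
                      current={"size": 100, "change": 40})
for row in results:
    print(row)
```

Exhaustive search grows quickly with the number of rules and candidate values, so a sketch like this only suits a handful of parameters.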
6. Suitable Measurements
A Measure of Savings:
Savings = number of records that no longer require editing
A measure of impact:
Exact impact on final estimates is:
• difficult to calculate
• time consuming
• costly
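Instead, use relative change:
$$\text{Relative change} = \frac{\sum w \, (X_{\text{After}} - X_{\text{Before}})}{\sum w \, X_{\text{Before}}}$$
• where $X_{\text{Before}}$ and $X_{\text{After}}$ = a response before and after the parameter change, and $w$ = a calibration weight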
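The two measures can be sketched in a few lines. This assumes the relative change is the weighted sum of (after − before) divided by the weighted sum of the "before" responses, expressed as a percentage; the function names and sample values are hypothetical.

```python
def savings(failed_before: set, failed_after: set) -> int:
    """Number of records that failed the old rules but pass the new ones."""
    return len(failed_before - failed_after)


def relative_change_pct(before, after, weights) -> float:
    """Weighted relative change in the responses, in percent.

    before / after: response values under the existing and the changed rules
    weights:        calibration weights, one per record
    """
    numerator = sum(w * (xa - xb) for w, xb, xa in zip(weights, before, after))
    denominator = sum(w * xb for w, xb in zip(weights, before))
    return 100.0 * numerator / denominator


# Hypothetical example: one record keeps an uncorrected value of 230
# instead of the corrected 200 once the rule is loosened.
weights = [2, 1, 4]
before = [100, 200, 50]
after = [100, 230, 50]

print(savings({1, 2, 3, 4}, {3}))
print(relative_change_pct(before, after, weights))
```

Only records whose value differs between the two scenarios contribute to the numerator, so a few large missed errors can outweigh many small ones.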
7. Routine illustration
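[Figure: under the existing rules each record either fails or passes; failed records contain either no error or an error. Under the loosened rules some records that used to fail now pass: A = records with no error, B = records with an error (marked B* in the figure).]
Savings = #(A + B)
Errors missed = #(B)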
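The counting in the illustration can be sketched as follows, assuming A is the set of released records with no error and B the set of released records with an error. The rule functions, the `value` and `in_error` fields and the sample records are hypothetical; the error flag is assumed to be known from earlier verification.

```python
def compare_rule_sets(records, existing_rule, loosened_rule):
    """Count records released by loosening: group A (no error) and group B (error)."""
    group_a = 0
    group_b = 0
    for rec in records:
        # Only records that failed before and pass now are released from editing.
        if existing_rule(rec) and not loosened_rule(rec):
            if rec["in_error"]:
                group_b += 1
            else:
                group_a += 1
    return {"savings": group_a + group_b, "errors_missed": group_b}


# Hypothetical records and thresholds.
records = [
    {"value": 50, "in_error": False},
    {"value": 150, "in_error": False},
    {"value": 180, "in_error": True},
    {"value": 250, "in_error": True},
    {"value": 120, "in_error": True},
]
outcome = compare_rule_sets(
    records,
    existing_rule=lambda rec: rec["value"] > 100,
    loosened_rule=lambda rec: rec["value"] > 200,
)
print(outcome)
```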
8. Example of Rules Changes
Rule 1: fail if $S_o$ > £199K and $100 \times (S_c - S_o)/S_o$ > 40
Rule 2: fail if $S_c$ > £199K and $100 \times (S_o - S_c)/S_c$ > 40
Rule 3: fail if $S_{c,t-1}$ was returned and $S_{c,t-1} - S_{o,t}$ > £5K
Annotations: Alter Gate 1, Alter Gate 2, Alter Gate 3
9. Routine Results
Rules                    | Routine Output
Gate1   Gate2   Gate3    | Savings   Errors Missed   Relative Change (%)
600     40      10       | 111       77              0.56
600     40      50       | 205       171             1
600     40      40       | 192       158             1.28
600     40      20       | 160       126             1.32
250     40      100      | 243       209             2.96
300     40      100      | 243       209             2.96
600     40      100      | 243       209             2.96
600     30      200      | 274       240             3.93
600     40      200      | 274       240             3.93
600     50      200      | 274       240             3.93
10. Conclusions
• Changes to validation rules to achieve savings are often made in isolation, without considering their impact on the quality of the survey output
• In this work we offer a simple but effective decision support tool to:
– quantify savings & loss in quality resulting from changing editing rules
– help managers identify the editing rules that have the most impact on quality
– identify the parameters that minimise quality loss for given savings, and vice versa
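The last point can be illustrated with the routine output listed in section 9. The sketch below assumes the relative change (%) is used as the quality-loss measure; the function names and the target values are hypothetical, while the rows are copied from the results slide.

```python
# (gate1, gate2, gate3, savings, errors_missed, relative_change_pct)
ROWS = [
    (600, 40, 10, 111, 77, 0.56),
    (600, 40, 50, 205, 171, 1.00),
    (600, 40, 40, 192, 158, 1.28),
    (600, 40, 20, 160, 126, 1.32),
    (250, 40, 100, 243, 209, 2.96),
    (300, 40, 100, 243, 209, 2.96),
    (600, 40, 100, 243, 209, 2.96),
    (600, 30, 200, 274, 240, 3.93),
    (600, 40, 200, 274, 240, 3.93),
    (600, 50, 200, 274, 240, 3.93),
]


def min_loss_for_savings(rows, target_savings):
    """Setting with the smallest relative change that still meets the savings target."""
    eligible = [r for r in rows if r[3] >= target_savings]
    return min(eligible, key=lambda r: r[5]) if eligible else None


def max_savings_for_loss(rows, max_change_pct):
    """Setting with the largest savings whose relative change stays within the limit."""
    eligible = [r for r in rows if r[5] <= max_change_pct]
    return max(eligible, key=lambda r: r[3]) if eligible else None


print(min_loss_for_savings(ROWS, 200))
print(max_savings_for_loss(ROWS, 1.5))
```

With these rows both queries return the setting (600, 40, 50). Where several settings tie, `min` and `max` return the first one listed.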
11. Further Work
Other elements of further work:
• Make the routine more dynamic
• Enhance the impact measure
• Investigate varying the parameters by domain (e.g. Standard Industrial Classification (SIC), employment sizeband)
• Apply the routine to other surveys
12. Questions
Over to you!