process for data quality assurance · pdf file“good decisions require good data”...

26
“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy (MCHP) Mahmoud Azimaee Data Analyst at ICES

Upload: vantruc

Post on 28-Mar-2018

218 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

“Good decisions require good data”

Process for

Data Quality Assurance

at Manitoba Centre for Health Policy (MCHP)

Mahmoud Azimaee

Data Analyst at ICES

Page 2: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Literature and Resources

• CIHI Data Quality Framework, (2009

edition)

• UK’s NHS Data Quality Reports

• Handbook on Data Quality

Assessment Methods and Tools, (European Commission)

• Handbook on Improving Quality by

Analysis of Process Variables , (European Commission)

• Data fitness

(Australian National Statistical Service)

Page 3: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Data Quality at MCHP

1. Data Quality Indicators

2. Rating System

– CIHI Data Quality Framework, (2009 edition)

3. Data Quality Report

– UK’s NHS Data Quality Reports

4. Practical Approach

5. Automation

– Cody’s Data Cleaning

Techniques Using SAS, (by Ron Cody)

Page 4: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Data Quality Indicators and Rating

System • Example:

– Completeness: Rate of missing values for all data

elements.

• Consistency : Agreement with registry database.

Page 5: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Level of Agreement With Other Databases

Completeness

Measurement Error

Level of Bias

Degree of Problems with Consistency

Accuracy Reliability

Availability and Quality of:

Documents , Policies and Procedures, Formats Libraries, Metadata, Data Model Diagrams

Time to Acquisition

Time to Release

Currency of Data

Identifying Units of Analysis (Persons, Places, Things, ...)

Level of Agreement With the Literature and available reports

Internal Consistency

Stability Across Time

Linkability

Completeness

(Missing Values)

Correctness

(Invalid codes, Invalid Dates, Out of Range, Outliers and Extreme Observations)

Accuracy Internal Validity

External Validity Timeliness Interpretability

Database Level

(In Data Management)

Research Level

(In a Specific Research Projects)

Data Quality Assurance

MCHP Data Quality Framework:

Page 6: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Data

Management

Process at

MCHP

6. Release Data to Analyst(s) and Researcher(s)

Meet with programmer(s) and researcher(s) to present data structure and content

5. Document Data

Including original documents, data model diagram, SPDS data dictionary, history, file variations and structural changes, revisions and common problems and data quality report, where available

4. Evaluate Data Quality

Test the installed data using standardized protocol

Identify solutions to address deficiencies in data quality

Prepare data quality report for addition to standard documentation

3. Apply SAS Programs

Apply Normalization or De-normalization as required

Normalization can be defined as the practice of optimizing table structures by eliminating redundancy and inconsistent dependency

Apply data field and SAS format

standards

Install on SPD server

(This includes indexing, sorting and clustering)

Create Metadata

If there is a problem, liaise with the source

agency

2. Become Familiar with Data Structure and Content

Review provided documentation If required, create a data model for the

original data If receiving test data, test it and send

feedback to the source agency

1. Formulate the Request and Receive the Data

Check the data sharing

agreements

Liaise with the source agency to acquire available data, data model diagram, data dictionary, documentation about historical changes in data content,

format, and structure, data quality reports

Prepare the data request

letter

Receive the data and associated documentation

Page 7: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

How to Present Data Quality

Results?

• CIHI Data Quality Report

• UK’s NHS Data Quality Report

– VODIM Test Analysis Methodology

• Valid

• Other

• Default

• Invalid

• Missing

• Valid

• Invalid

• Missing

• Outlier

VIMO!

Page 8: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

VIMO Table

Page 9: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning
Page 10: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

(1) I just discovered that the

data system we have been

working on for the last five

years has major data quality

problems.

(2) That is why I treat data

systems the same way I do

sausage – I do not want to know

what is inside either one.

(3) Ouch!! That is why I am a

vegetarian!

Conversation from: Data Quality and Record Linkage Techniques, Thomas N. Herzog, et al. 2007, Springer

Page 11: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

• Example 1: Identifying Outliers/Extreme

Observations: 1. Standard Deviation (Mean +/- 2*SD)

2. Trimmed Standard Deviation

(MeanTrimmed10% +/- 2*1.49*SDTrimmed10%)

3. Interquartile Range

(Q1 – k*IQR , Q3 + k*IQR), k=2.5

– Ordered statistics for calculating quartiles is very

memory intensive

=>> P² method to approximate the quartiles

(Using QMETHOD=P2 in PROC MEANS)

[piecewise-parabolic (P²) algorithm invented by

Jain and Chlamtac (1985)]

Operational Approaches

Page 12: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Operational Approaches

Example 2: Stability Across Time

Based on CIHI guideline:

– Trend analysis is used to examine changes in

core data elements over time

– No change across years may also be an

indication of a problem if the data is expected to

naturally trend upward or downward

– Changes in methodology or inclusion/exclusion

criteria should be taken into account to determine

whether the observed changes were real or not.

Page 13: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Example 2: Stability Across Time (Continued)

• Identify unusual changes

– Outlier analysis

• Outlier analysis requires a model

– How to choose an appropriate model in an

automated fashion?

• Fit a series of common models:

– Simple Linear: Y=β0 + β 1X

– Quadratic: Y= β 0 + β 1X2

– Exponential: Y= β 0 + β 1exp(X)

– Logarithmic: Y= β 0 + β 1log(X)

– SQRT: Y= β 0 + β 1 𝑥

– Inverse: Y= β 0 + β 1 1

𝑥

– Negative Exponential: Y= β 0 + + β 1Exp(-X)

Page 14: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Example 2: Stability Across Time (Continued)

• Choose the best model with the minimum MSE

• Re-fit the chosen model on the data

• Do an outlier analysis

– Estimate Studentized residuals for each observation (with the

current observation deleted)

• Flag significant observations as potential outliers

• Flag observation with no changes over time

• How about Small Cell Size Policy? (0<Frequency<6)

– Use the actual values in modeling but flag and then

force them to 3 in the report

Page 15: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

• MCHP’s data repository includes over 65

health and other administrative databases,

(linkable using a common encrypted

individual identifier).

• Annual updates for most of the databases

in its repository.

• Designing an automated process became a

must!

Automation

Page 16: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

• A SAS Macro based application package

was developed (16 Macros)

– Pre Data Quality Macro (1)

– Main Macros (6)

– Intermediate Macros (9)

Automation

Page 17: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Automation (Continued)

Documentation System

VIMO Macro

GETNOBS Macro

INVALID Macro

POSTMUN Macro

OUTLIER Macro

GETFORMAT Macro

GETVARLIST Macro

METADATA MACRO

Special Features:

• Can handle standalone and Clustered tables

• Can Validate Postal and Municipal codes

Page 18: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

LINK Macro

GETNOBS Macro

Automation (Continued)

Page 19: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

TREND Macro

GETNOBS Macro

FISCAL Macro

MONTHLY Macro

Automation (Continued)

Page 20: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

CONTENT MACRO

AGREEMENT MACRO

PHINCHECKMACRO

• Checks 3rd and 5th positions of PHINs which must be 0 and 9

• Compares the distribution of the first position with the corresponding

PHINs from registry files

Automation (Continued)

Page 21: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Non-Automated Indicators

• Internal Consistency

• Timeliness

Page 22: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Level of Agreement With Other Databases

Completeness

Measurement Error

Level of Bias

Degree of Problems with Consistency

Accuracy Reliability

TREND Macro LINK Macro PHINCHECK Macro AGREEMENT Macro

VOMO Macro

Availability and Quality of:

Documents , Policies and Procedures, Formats Libraries, Metadata, Data Model Diagrams

Time to Acquisition

Time to Release

Identifying Units of Analysis (Persons, Places, Things, ...)

Level of Agreement With the Literature and available reports

Internal Consistency

Stability Across Time

Linkability

Completeness

(Missing Values)

Correctness

(Invalid codes, Invalid Dates, Out of Range, Outliers and Extreme Observations)

Accuracy Internal Validity

External Validity Timeliness Interpretability

Database Level

(In Data Management)

Research Level

(In a Specific Research Projects)

Data Quality Assurance

Page 23: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

• Data Quality Website

Page 24: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

• Central Format Library

• Metadata Database

• Standardization

– Bad standards are better than no standards at all!

Missing Links!

Page 25: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

• Data Quality Algebra

• Data Quality Axioms

Data Quality As A Science

Page 26: Process for Data Quality Assurance · PDF file“Good decisions require good data” Process for Data Quality Assurance at Manitoba Centre for Health Policy ... –Cody’s Data Cleaning

Acknowledgment

• Mr. Mark Smith (MCHP Associate Director, Repository)

• Dr. Lisa Lix (Associate Professor at University of Saskatchewan)

CONTACT INFORMATION

Mahmoud Azimaee

Institute for Clinical Evaluative Sciences

Work Phone: (647) 480-4055 (Ex. 3618)

E-mail: [email protected]

Web: www.dastneveshteha.com