TRANSCRIPT
Public Aggregate Reporting – DHCS Business Reports Overview
Linette Scott, MD, MPH
Chief Medical Information Officer, DHCS
July 1, 2015
Public Aggregate Reporting for DHCS Business Reports (PAR-DBR)
HIPAA Standard for De-identification
Overview of the PAR-DBR
Steps of the PAR-DBR
HIPAA STANDARD FOR DE-IDENTIFICATION
HIPAA standard for de-identification of protected health information:
“Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information”
DHCS is a HIPAA Covered Entity
Health Insurance Portability and Accountability Act (HIPAA)
http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html
HIPAA De-identification Standard
Two methods described in the standard:
Safe Harbor
18 identifiers of the individual or of relatives, employers, or household members of the individual must be removed
In the context of other publicly available information
Expert Determination
HIPAA Safe Harbor
a) Names
b) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:
- The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
- The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000
c) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
HIPAA Safe Harbor Cont.
d) Telephone numbers
e) Fax numbers
f) Email addresses
g) Social security numbers
h) Medical record numbers
i) Health plan beneficiary numbers
j) Account numbers
k) Certificate/license numbers
l) Vehicle identifiers and serial numbers, including license plate numbers
m) Device identifiers and serial numbers
n) Web Universal Resource Locators (URLs)
o) Internet Protocol (IP) addresses
p) Biometric identifiers, including finger and voice prints
q) Full-face photographs and any comparable images
r) Any other unique identifying number, characteristic, or code
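Items b) and c) of the Safe Harbor list are the ones that call for transformation rather than outright removal. A minimal Python sketch of those two rules follows. The function names and the `zip3_population` lookup are hypothetical, and the population figures in the usage lines are made up for illustration; real values would come from current Census data.

```python
from datetime import date

def generalize_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Keep the first three ZIP digits only when that area has more than 20,000 people."""
    prefix = zip_code[:3]
    # zip3_population is an assumed lookup: three-digit prefix -> population
    if zip3_population.get(prefix, 0) > 20_000:
        return prefix
    return "000"

def generalize_age(age: int) -> str:
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age >= 90 else str(age)

def generalize_date(d: date) -> int:
    """Drop every element of a date except the year."""
    return d.year

# Illustrative (invented) populations for two three-digit ZIP areas
populations = {"958": 500_000, "059": 15_000}
print(generalize_zip("95814", populations))
print(generalize_zip("05901", populations))
print(generalize_age(93))
print(generalize_date(date(2015, 7, 1)))
```

This only covers the generalization rules; the remaining identifiers in the list are removed rather than transformed.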
What Usually Leads to Expert Determination?
Time
The time period is less than a year
Geography
Less than statewide
Other
Rare diagnosis
Specific combinations of variables
Expert Determination
Apply statistical or scientific principles
Very small risk that the anticipated recipient could identify an individual
Documents the methods and results of the analysis that justify such determination
http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#standard
OVERVIEW OF THE PAR-DBR
Purpose of the PAR-DBR
Establish guidelines to be used for reports and documents generated by DHCS programs for public release that include data (tables, charts, graphics)
Create consistency in the analysis and presentation of data in reports and documents
Protect confidentiality of personal data held by DHCS
Compliance with laws that govern data release
Public Aggregate Reporting – DHCS Business Reports
These Guidelines provide a method and process for Expert Determination
Key Steps:
Evaluate the Numerator
Evaluate the Denominator
Use the Publication Scoring Criteria
Suppress data that has higher risk
Departmental document review processes
Public Aggregate Reporting
http://www.dhcs.ca.gov/dataandstats/statistics/Documents/3_1_Population_Distribution_Age_Gender.pdf
A Table with Suppressed Data
http://www.dhcs.ca.gov/dataandstats/statistics/Documents/RASD_Issue_Brief_MC_Births.pdf
STEPS FOR THE PAR-DBR
Defining Table Cell
The cells in the table are the boxes that have values in them, as opposed to the row and column headers
Year | # of Medi-Cal Members in Fee For Service (in thousands) | # of Medi-Cal Members in Managed Care (in thousands)
2012 | 2,775 | 4,853
2011 | 3,067 | 4,527
http://www.dhcs.ca.gov/dataandstats/statistics/Documents/1_6_Annual_Historic_Trend.pdf - Data in the Table
Defining Numerator & Denominator
Numerator – number of events with the characteristics of the given row and column
Denominator – the population from which the events arise
Year | # of Medi-Cal Members in Fee For Service (in thousands) | # of Medi-Cal Members in Managed Care (in thousands)
2012 | 2,775 | 4,853
2011 | 3,067 | 4,527
Numerator: # of Medi-Cal Members in Fee For Service (in thousands): 2,775
Denominator: # of Medi-Cal Members in 2012 (in thousands): 7,628
http://www.dhcs.ca.gov/dataandstats/statistics/Documents/1_6_Annual_Historic_Trend.pdf - Data in the Table
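The 2012 row can be worked through as a short arithmetic check. This sketch assumes, as the figures suggest, that the denominator is the sum of the Fee For Service and Managed Care cells; the variable names are illustrative only.

```python
# Values from the 2012 row of the table, in thousands of Medi-Cal members
fee_for_service_2012 = 2_775
managed_care_2012 = 4_853

# Denominator: the population from which the events arise (all 2012 members)
denominator = fee_for_service_2012 + managed_care_2012

# Numerator: the events in one table cell (Fee For Service members in 2012)
numerator = fee_for_service_2012

share = numerator / denominator
print(denominator)        # 7628
print(f"{share:.1%}")     # 36.4%
```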
Figure 3: Reporting Assessment Decision Tree
Assesses risk for data release of aggregate data through a stepwise process. Aggregate data may be derived from record level data with identifiers, record level data without identifiers or previously aggregated data.
Step 1 – Numerator Condition
Have the Numerators (the table cells) been derived from greater than 10 members (beneficiaries)?
If Yes, go to Step 2. If No, go to Step 3.
Step 2 – Denominator Condition
Is the population denominator for the numerators in the table cells greater than 20,000 individuals?
If Yes, go to Step 5. If No, go to Step 3.
Step 3 – Apply Publication Scoring Criteria to assess risk
If the score is ≤ 12, go to Step 5. If the score is > 12, go to Step 4.
Step 4 – Suppress Small Cells and Complementary Cells
Small Cells are those with numerators fewer than 11, and Complementary Cells are those that could be used to recalculate the suppressed Small Cells.
Step 5 – Submit Aggregate Data Analysis for Document Review
Program Management Review
Expert Determination Review*
OLS Review for legal risk
OPA Review
* Review for Expert Determination will be performed by individuals who have been qualified as experts by OLS and who meet the HIPAA Privacy Rule implementation specifications: “A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable.” [45 CFR Section 164.514(b)(1)]
A stepwise decision tree to assess aggregate data for de-identification
Serves as a tool and guideline for the Expert Determination
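The branching in the decision tree can be sketched as a small Python function that returns the sequence of steps visited. The function name and parameters are hypothetical, and the sketch covers only the Yes/No routing through Steps 1 to 5, not the reviews performed within Step 5.

```python
def assessment_path(smallest_numerator, denominator, score=None):
    """Return the list of decision-tree steps visited for one table."""
    path = [1]
    # Step 1: numerators derived from greater than 10 members?
    if smallest_numerator > 10:
        path.append(2)
        # Step 2: population denominator greater than 20,000?
        if denominator > 20_000:
            path.append(5)
            return path
    # Either condition failed: Step 3, apply the Publication Scoring Criteria
    path.append(3)
    if score is None:
        raise ValueError("a Publication Scoring Criteria score is needed at Step 3")
    # Score <= 12 goes to document review (Step 5); > 12 goes to suppression (Step 4)
    path.append(5 if score <= 12 else 4)
    return path

print(assessment_path(2_775_000, 7_628_000))      # [1, 2, 5]
print(assessment_path(50, 15_000, score=10))      # [1, 2, 3, 5]
print(assessment_path(8, 500_000, score=14))      # [1, 3, 4]
```

The first call uses the 2012 Fee For Service figures from the earlier table; the other two use invented values to exercise the remaining branches.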
Reporting Assessment Decision Tree Steps 1 & 2
A minimum cell size is set for the Numerator
A minimum value is set for the Denominator
Both the minimum cell size for the numerator and the minimum value for the denominator must be met
DHCS has identified a minimum value of 11 for the numerator and requires a denominator greater than 20,000.
Both conditions must be met to release the data in the table cell; otherwise proceed to Step 3
Step 3 – Apply Publication Scoring Criteria to assess risk
Step 4 – Suppress Small Cells and Complementary Cells if score is greater than 12
Common Public Reporting Variables
Variable: a symbol standing in for an unknown numerical value in an equation
Common variables in health data aggregation:
Age
Sex
Race
Ethnicity
Time
Geography (State, County, Medical Service Study Area, ZIP Code)
Diagnosis/Condition
Provider (Type, Specialty, Location)
Variables - Ranges
A given variable may have different ranges assigned to it
Ranges assigned to the variable may be defined many ways
Example – Age Groupings
0-10, 11-20, 21-30, etc. (years old) … provides equal groupings, commonly used, may not apply to a particular program
0-4, 5-11, 12-18, 19-21 (years old) … correlates to school environments: pre-school, elementary, junior high/high school, post-school
Publication Scoring Criteria – Step 3
Variable | Characteristics | Score
Sex | Male or Female | +1
Age Range | >10-year age range | +2
Age Range | 6-10 year age range | +3
Age Range | 3-5 year age range | +5
Age Range | 1-2 year age range | +7
Race Group | White, Asian, Black | +3
Race Group | Detailed Race | +5
Hispanic Ethnicity | yes or no | +2
Hispanic Ethnicity | Detailed ethnicity | +3
Language Spoken | English, Spanish, Other Language | +2
Publication Scoring Criteria – Cont.
Variable | Characteristics | Score
Events (Numerator) | 1000+ events in a specified population | +2
Events (Numerator) | 100-999 events | +3
Events (Numerator) | 11-99 events | +5
Events (Numerator) | <11 events | +8
Geography | State or geography with population >2,000,000 | -5
Geography | Population 560,001 - 2,000,000 | -3
Geography | Population 20,001 - 560,000 | 0
Geography | Population ≤ 20,000 | +5
Data Year | 5 years aggregated | -5
Data Year | 2-4 years aggregated | 0
Data Year | 1 year (e.g., 2001) | +3
Data Year | Bi-Annual | +4
Data Year | Quarterly | +5
Data Year | Monthly | +7
Release Scoring Criteria approximately quantifies two re-identification risks:
size of potential population
variable specificity
Add the score assigned to each variable characteristic:
If the score is more than 12, cell sizes must be 11 or more before releasing data
Compiling the Score
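Compiling the score amounts to looking up one value per variable and summing. The sketch below encodes the Publication Scoring Criteria tables as a Python dictionary; the dictionary keys and the function name are hypothetical labels chosen for illustration, and the two example tables are invented.

```python
# Scores transcribed from the Publication Scoring Criteria slides.
# Key names are illustrative labels, not official DHCS identifiers.
SCORES = {
    "sex": {"male_or_female": 1},
    "age_range": {">10": 2, "6-10": 3, "3-5": 5, "1-2": 7},
    "race_group": {"white_asian_black": 3, "detailed": 5},
    "hispanic_ethnicity": {"yes_no": 2, "detailed": 3},
    "language": {"english_spanish_other": 2},
    "events": {"1000+": 2, "100-999": 3, "11-99": 5, "<11": 8},
    "geography": {">2000000": -5, "560001-2000000": -3,
                  "20001-560000": 0, "<=20000": 5},
    "data_year": {"5_years": -5, "2-4_years": 0, "1_year": 3,
                  "bi_annual": 4, "quarterly": 5, "monthly": 7},
}

def publication_score(table_spec):
    """Add the score assigned to each variable characteristic in the table."""
    return sum(SCORES[variable][level] for variable, level in table_spec.items())

# Hypothetical table: sex by 6-10 year age bands, 11-99 events per cell,
# one county with 20,001-560,000 residents, a single data year.
county_table = {
    "sex": "male_or_female",
    "age_range": "6-10",
    "events": "11-99",
    "geography": "20001-560000",
    "data_year": "1_year",
}
print(publication_score(county_table))      # 12

# Adding detailed race pushes the same table over the threshold of 12.
detailed_table = {**county_table, "race_group": "detailed"}
print(publication_score(detailed_table))    # 17
```

A table scoring 12 or less proceeds to review, while one scoring more than 12 needs cell sizes of 11 or more before release.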
Suppression – Step 4
Complementary Cells
Any time a single cell could be calculated based on row or column totals when it is suppressed, then additional cells in that row and/or column will also need to be suppressed
The total value of the cells suppressed should be 11 or more
Additional Aggregation Examples
Extend the time period included
Group regional geography
Goal – large enough groups that suppression is no longer necessary
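The suppression rule on this slide can be illustrated for a single row whose total is published. This is a simplified sketch, not the DHCS procedure: it assumes one row only, treats any value below 11 as a small cell, and picks the smallest remaining cells as the additional ones to suppress. The function name and the sample values are hypothetical.

```python
def suppress_row(values, threshold=11):
    """Suppress small cells in one row, plus enough additional cells that
    the suppressed values cannot be recovered from the row total."""
    suppressed = {i for i, v in enumerate(values) if v < threshold}
    if not suppressed:
        return list(values)
    # Candidate additional cells, smallest first (an assumed selection rule)
    remaining = sorted(
        (i for i in range(len(values)) if i not in suppressed),
        key=lambda i: values[i],
    )
    # Keep suppressing until at least two cells are hidden and the hidden
    # values total 11 or more
    while remaining and (
        len(suppressed) < 2 or sum(values[i] for i in suppressed) < threshold
    ):
        suppressed.add(remaining.pop(0))
    return [None if i in suppressed else v for i, v in enumerate(values)]

print(suppress_row([5, 40, 120, 300]))   # [None, None, 120, 300]
print(suppress_row([3, 4, 200, 15]))     # [None, None, 200, None]
print(suppress_row([50, 60]))            # [50, 60]
```

A real table also needs the same check applied down each column, which this single-row sketch does not attempt.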
Aggregate Variables - Examples
Higher Aggregation
Age Groups: 0-21, 22-64, 65 and older
Ethnicity: Hispanic or Not Hispanic
Geography: Medical Service Study Areas (542 in CA)
Time: three years
Diagnosis: Diagnosis Related Groups (DRGs), Episode Grouper
Lower Aggregation
Age Groups: one year increments (1, 2, 3, etc.)
Race: White, Black, American Indian, Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, Native Hawaiian, Guamanian, Samoan, Other
Geography: ZIP Code (2,591 in CA)
Time: monthly
Diagnosis: Specific ICD-9 Codes
To achieve a minimum number in the given cell, results are combined over the associated variable:
Geographic areas,
Multiple years, or
Subgroups (e.g., age groups)
The number of variables in a table will affect the amount of aggregation necessary
For example, if the results for a 5-year age group (ages 1-5) do not yield an adequate number of cases, then the age group is extended to cover more ages (ages 1-10)
Aggregation
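The age-group example can be sketched as a routine that merges adjacent groups until each holds a minimum count. The function name, the greedy left-to-right strategy, and the counts below are assumptions for illustration; the minimum of 11 is the numerator threshold used elsewhere in these guidelines.

```python
def merge_age_groups(groups, minimum=11):
    """Merge adjacent (low_age, high_age, count) groups until each merged
    group has at least `minimum` cases."""
    merged = []
    current = None
    for low, high, count in groups:
        if current is None:
            current = (low, high, count)
        else:
            # Extend the open group to cover more ages
            current = (current[0], high, current[2] + count)
        if current[2] >= minimum:
            merged.append(current)
            current = None
    if current is not None:
        if merged:
            # Fold an undersized trailing group into the previous one
            low, _, count = merged.pop()
            merged.append((low, current[1], count + current[2]))
        else:
            merged.append(current)
    return merged

# Invented counts: ages 1-5 alone fall short, so the group grows to ages 1-10
groups = [(1, 5, 4), (6, 10, 9), (11, 15, 20), (16, 20, 3)]
print(merge_age_groups(groups))   # [(1, 10, 13), (11, 20, 23)]
```

If the whole input still falls short of the minimum, the sketch returns it as one undersized group, which would then need aggregation over another variable such as geography or time.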
Program Management
Expert Determination
Office of Legal Services Privacy Team
Office of Public Affairs
Step 5 – Approval Processes
Public Aggregate Reporting for DHCS Business Reports
A multi-step process supports public reporting
Balances public reporting with protecting confidentiality
Will continue to review and revise as the data landscape changes and matures
DHCS is committed to supporting data publishing while being consistent with the HIPAA De-identification Standard
Thank you!