Education Data Warehouse Building Blocks: Identity Matching and Data Governance

IPMA, May 21, 2013


AGENDA

• Vision (Marc Baldwin)
• Identity matching (John Sabel)
• Data Governance (Melissa Beard)
• Questions

IDENTITY MATCHING

Protecting Personally Identifiable Information (PII)

• Step 1: Isolate PII data from all other data
  • Link PII data in an isolated environment to create linking IDs.
  • Perform data analysis on the linked data in a different environment. This environment has no PII data.
• Step 2: Redact data
  • FERPA (Family Educational Rights and Privacy Act) requirements.
  • Subject to data sharing agreements.
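Step 1 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ERDC implementation: the field names (`name`, `birth_date`, `sector`, `credits`) and the function `split_pii` are hypothetical, and the exact-match key stands in for the far more involved identity matching described on later slides.

```python
from itertools import count

# Assumed PII field names; a real feed would have many more.
PII_FIELDS = ("name", "birth_date")


def split_pii(records):
    """Return (pii_rows, analysis_rows); only pii_rows contain PII."""
    next_id = count(1)
    link_ids = {}  # identity key -> linking ID, kept inside the isolated environment
    pii_rows, analysis_rows = [], []
    for record in records:
        # Crude identity key: normalized PII values (an assumption for this sketch).
        key = tuple(str(record[f]).strip().lower() for f in PII_FIELDS)
        if key not in link_ids:
            link_ids[key] = next(next_id)
            pii_rows.append({"link_id": link_ids[key], **{f: record[f] for f in PII_FIELDS}})
        # The analysis environment receives the linking ID and non-PII fields only.
        non_pii = {k: v for k, v in record.items() if k not in PII_FIELDS}
        analysis_rows.append({"link_id": link_ids[key], **non_pii})
    return pii_rows, analysis_rows


records = [
    {"name": "Ann Lee", "birth_date": "1995-03-02", "sector": "K12", "credits": 24},
    {"name": "ann lee ", "birth_date": "1995-03-02", "sector": "College", "credits": 15},
    {"name": "Bo Park", "birth_date": "1996-07-19", "sector": "K12", "credits": 22},
]
pii_rows, analysis_rows = split_pii(records)
```

The two "Ann Lee" rows share one linking ID, so they can be analyzed together without the analysis environment ever seeing a name or birth date.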

P-20 Data Warehouse Inputs through Outputs

Personally Identifiable Information (PII) is encapsulated in the MDM Oracle database.

[Diagram with four stages: Input, Identity Matching, Data Store, Output (Business Intelligence)]
• Input: sector data providers (DEL, OSPI, SBCTC, PCHEES, ESD, DRS, NSC, L&I)
• Identity Matching: Master Data Management (MDM), an Oracle database, holds the PII data
• Data Store: Operational Data Store, a SQL Server database, holds linked IDs only (PII data stripped) and the non-PII data (bulk of data)
• Output (Business Intelligence): stars, cubes, data sets

Identity Matching Challenges

• Most education data involves deduplication (i.e., consolidation).
• Between sources of data, the number and quality of common identifiers vary.
  • Public post-secondary instruction data has SSNs, but K12 data does not.
• Idiosyncrasies in data
  • For example, Jan 1st birth dates are often used when the birth day and month are unknown.
  • Twins
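The two idiosyncrasies above (placeholder Jan 1st birth dates and twins) could be handled along these lines. This is a hedged sketch: the labels `exact`, `weak`, `year_only`, and `mismatch`, the function names, and the record fields are all assumptions made for illustration, not rules taken from the presentation.

```python
from datetime import date


def is_placeholder(d: date) -> bool:
    """Jan 1st may stand in for an unknown birth day and month."""
    return (d.month, d.day) == (1, 1)


def compare_birth_dates(a: date, b: date) -> str:
    """Classify agreement between two birth dates."""
    if a == b:
        # Agreement on Jan 1st is weaker evidence, since many records share the placeholder.
        return "weak" if is_placeholder(a) else "exact"
    if (is_placeholder(a) or is_placeholder(b)) and a.year == b.year:
        return "year_only"
    return "mismatch"


def possible_twins(rec_a: dict, rec_b: dict) -> bool:
    """Flag pairs that share a surname and birth date but differ in first name."""
    return (
        rec_a["last_name"].lower() == rec_b["last_name"].lower()
        and rec_a["birth_date"] == rec_b["birth_date"]
        and rec_a["first_name"].lower() != rec_b["first_name"].lower()
    )
```

A pair flagged by `possible_twins` would be a candidate for manual review rather than an automatic merge.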

Identity Matching Challenge Matrix

                                         Many Common Identifiers (Easy)   Few Common Identifiers (Hard)
Linking Two Data Sources* (Easy)         Easy²                            Hard x Easy
Deduplicating One Data Source** (Hard)   Easy x Hard                      Hard²

* Example: Linking birth certificate data to hospitalization data.
** Example: Post-secondary instruction data. A single student can be enrolled in multiple colleges, both longitudinally (over time) and at the same time.

Identifiers in the Perfect World

SELECT K12.*, College.*
FROM K12
INNER JOIN College
  ON K12.Bulletproof_Surefire_Global_Student_ID = College.Bulletproof_Surefire_Global_Student_ID

Identifiers in the Other Perfect World

SELECT K12.*, College.*
FROM K12
INNER JOIN College
  ON K12.SSN = College.SSN

Note: Every student has a valid, properly assigned SSN.
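The "perfect world" joins only work when the shared identifier is present and correct on both sides. The sketch below, using Python's built-in `sqlite3` module with invented table contents, shows what an inner join on SSN does when a K12 row has no SSN (the situation the Challenges slide describes). The column `student_name` and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE K12 (student_name TEXT, SSN TEXT);
    CREATE TABLE College (student_name TEXT, SSN TEXT);
    INSERT INTO K12 VALUES ('Ann Lee', '000-00-0001'), ('Bo Park', NULL);
    INSERT INTO College VALUES ('Ann Lee', '000-00-0001'), ('Bo Park', '000-00-0002');
    """
)

# The perfect-world join: rows with a NULL SSN never satisfy the equality.
matched = conn.execute(
    "SELECT K12.student_name FROM K12 INNER JOIN College ON K12.SSN = College.SSN"
).fetchall()

# K12 students the join silently drops.
unmatched = conn.execute(
    "SELECT K12.student_name FROM K12 LEFT JOIN College ON K12.SSN = College.SSN "
    "WHERE College.SSN IS NULL"
).fetchall()
conn.close()
```

Here `matched` contains only Ann Lee, while Bo Park, who appears in both tables, ends up in `unmatched` because the K12 row carries no SSN.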

Addressing Identity Matching Challenges

• Deduplicate each data source first
  • You can then take advantage of source-specific identifiers. For example, K12 data has the State Student Identifier (SSID).
• Merge the deduplicated data source with the rest of the data warehouse.*

* This is itself a deduplication process.
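The two-step approach above might look roughly like this in Python. It is a minimal sketch under loud assumptions: the functions `dedupe_by_source_id` and `merge_into_warehouse`, the field names, and especially the exact-match key on last name plus birth date are hypothetical simplifications of the real merge.

```python
def dedupe_by_source_id(rows, id_field):
    """Collapse rows sharing a source-specific identifier into one record per ID."""
    merged = {}
    for row in rows:
        record = merged.setdefault(row[id_field], {})
        for field, value in row.items():
            # Keep the first non-empty value seen for each field.
            if value and not record.get(field):
                record[field] = value
    return list(merged.values())


def merge_into_warehouse(warehouse, source_rows, id_field,
                         key_fields=("last_name", "birth_date")):
    """Attach each deduplicated source record to a warehouse person, or add a new one."""
    index = {tuple(p[f] for f in key_fields): p for p in warehouse}
    for row in source_rows:
        key = tuple(row[f] for f in key_fields)
        person = index.get(key)
        if person is None:
            person = {f: row[f] for f in key_fields}
            warehouse.append(person)
            index[key] = person
        person.setdefault("source_ids", []).append(row[id_field])
    return warehouse


k12_rows = [
    {"ssid": "S1", "last_name": "Lee", "birth_date": "1995-03-02", "district": "North"},
    {"ssid": "S1", "last_name": "Lee", "birth_date": "", "district": "South"},
    {"ssid": "S2", "last_name": "Park", "birth_date": "1996-07-19", "district": "North"},
]
deduped = dedupe_by_source_id(k12_rows, "ssid")

warehouse = [{"last_name": "Lee", "birth_date": "1995-03-02", "source_ids": ["C9"]}]
merge_into_warehouse(warehouse, deduped, "ssid")
```

Deduplicating on the SSID first means the second step only has to place one K12 record per student, which is the advantage the slide points to.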

Identity Matching Opportunities

• Use name change data
  • For example, DOH marriage and divorce data.*

* As of 2012, marriage and divorce data contains inferred name changes for females only.
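One way name change data could feed the matching is to treat a former and a new surname as equivalent for the same person. The sketch below assumes a hypothetical name-change record layout (`first_name`, `birth_date`, `old_last`, `new_last`); the actual layout of the DOH data is not given in the slides.

```python
def known_surnames(record, name_changes):
    """Return every surname the person in `record` is known to have used."""
    surnames = {record["last_name"].lower()}
    for change in name_changes:
        same_person = (
            change["first_name"].lower() == record["first_name"].lower()
            and change["birth_date"] == record["birth_date"]
        )
        pair = {change["old_last"].lower(), change["new_last"].lower()}
        # A single pass: chains of several name changes would need repeated passes.
        if same_person and surnames & pair:
            surnames |= pair
    return surnames


def match_with_name_changes(rec_a, rec_b, name_changes):
    """Match on first name and birth date, allowing surnames linked by a name change."""
    if rec_a["first_name"].lower() != rec_b["first_name"].lower():
        return False
    if rec_a["birth_date"] != rec_b["birth_date"]:
        return False
    return bool(known_surnames(rec_a, name_changes) & known_surnames(rec_b, name_changes))


k12 = {"first_name": "Maria", "last_name": "Gomez", "birth_date": "1990-04-12"}
college = {"first_name": "Maria", "last_name": "Reyes", "birth_date": "1990-04-12"}
changes = [
    {"first_name": "Maria", "birth_date": "1990-04-12", "old_last": "Gomez", "new_last": "Reyes"}
]
```

Without the name-change record the two sample rows would not match; with it, `match_with_name_changes(k12, college, changes)` links them.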

Identity Matching Mechanics

• First, deterministically deduplicate data
  • Always strive first to minimize false positives and then try to minimize false negatives.
  • These matches are then auto-merged.
• Second, use probabilistic techniques to auto-merge additional data
• Last, use probabilistic techniques to create manual review sets
  • These are selectively merged.
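The three passes above can be sketched as a single pair classifier. Everything specific here is an assumption: the thresholds `AUTO_MERGE` and `MANUAL_REVIEW` are invented values, and `difflib.SequenceMatcher` is a crude stand-in for a real probabilistic model, which the slides do not specify.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; real values would be tuned against reviewed matches.
AUTO_MERGE = 0.90
MANUAL_REVIEW = 0.75


def deterministic_match(a, b):
    """Strict rule that favors avoiding false positives: shared SSN and birth date."""
    return (
        bool(a.get("ssn"))
        and a.get("ssn") == b.get("ssn")
        and a["birth_date"] == b["birth_date"]
    )


def similarity(a, b):
    """Stand-in score: name similarity, forced to zero when birth dates differ."""
    if a["birth_date"] != b["birth_date"]:
        return 0.0
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def classify_pair(a, b):
    """Route a candidate pair to auto-merge, manual review, or no match."""
    if deterministic_match(a, b):
        return "auto_merge_deterministic"
    score = similarity(a, b)
    if score >= AUTO_MERGE:
        return "auto_merge_probabilistic"
    if score >= MANUAL_REVIEW:
        return "manual_review"
    return "no_match"
```

Pairs that land in `manual_review` correspond to the review sets on the slide, which are merged selectively by a person rather than automatically.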

DATA GOVERNANCE

ERDC Data Governance

• No data warehouse without data governance
• Rules of engagement
• Goal: Link data so it can be shared
• Data contributors
• Data sharing policy workgroup
• Defined set of tasks
• Temporary
• Small group of problem-solvers

P-20W DATA GOVERNANCE COMMITTEE STRUCTURE

Office of Financial Management
Education Research & Data Center (ERDC)

• Data Stewards Committee: Experts directly familiar with data from their agency used in research.
• Data Custodians Committee: Technical experts responsible for the technical delivery of data to and from the warehouse.
• Research Coordination Committee: Policy experts who interact with agency decision-makers, stakeholders, and researchers.
• ERDC Guidance Committee: Agency directors or deputies from agencies contributing data.

CONTACT INFORMATION

• Marc Baldwin, OFM Assistant Director, Forecasting
  • [email protected]
  • 360-902-0590
• John Sabel, Education Research Analyst
  • [email protected]
  • 360-902-0943
• Melissa Beard, Data Governance Coordinator
  • [email protected]
  • 360-902-0584