5/21/2014
DATA PREPARATION AND PROFILING:
STRATEGIES, CHALLENGES, AND EXPERIENCES
TIM NORRIS AND MARK LUNDGREN
TODAY'S AGENDA
• Introductions
• Data Profiling and Readiness
• Lessons Learned
• Future Direction
ABOUT THE P20W DATA WAREHOUSE
• Statewide longitudinal data system
• De-identified data about people's early childhood, kindergarten through 12th grade, higher education, and workforce experiences and performance
• Collected and linked from existing state agency data systems
• Includes data about the kinds of services people receive, the programs in which they participate, their academic performance, and their program or degree completion
• Includes a range of demographic data, so we can look at different groups of people
• Personally identifiable information, such as names, social security numbers, addresses, and other data that can identify an individual, is not part of the research database
[Diagram: data sources, data management and governance, and output]
• Data Sources: ECEAP students; K-12 students; K-12 teachers; CTC students; Baccalaureate students; National Student Clearinghouse; Workforce; IPEDS Financial data
• Data Management, Governance (ERDC): Standards, confidentiality, security; Critical questions; Data dictionary, matching, longitudinal linking, cross-sector derived elements; P-20/W datasets
• Output: Research; Data to partner agencies; Collaborative research; Ad-hoc requests (data and research) for partners and legislature; External requests for data; Feedback reports (on behalf of agencies)
• Other labels shown in the output area: PCHEES, LEAP, OFM
DATA FLOW PROCESS
•Chart of data flow goes here
DATA SOURCE CHARACTERISTICS
• Over 20 source data feeds
• Data systems being developed in parallel
• Some migrated historic data, some didn't
DATA PREPARATION: DATA PROFILING
• Do it early, do it often
• Verification of data dictionary
• Descriptive statistics
• Distinct counts and percentages
• Zeros, blanks, and nulls
• Minimum and maximum values
• Patterns of data
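The column-level checks listed above can be sketched in a few lines. This is a minimal illustration in plain Python, not the tooling the presenters used; the function name `profile_column`, the sample values, and the pattern notation (digits shown as `9`, letters as `A`) are assumptions made for the example.

```python
import re
from collections import Counter


def _to_number(text):
    """Return text as a float, or None if it does not parse as a number."""
    try:
        return float(text)
    except ValueError:
        return None


def profile_column(values):
    """Profile one column of raw string values; None stands for a null."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    blanks = sum(1 for v in values if v is not None and v.strip() == "")
    present = [v.strip() for v in values if v is not None and v.strip() != ""]

    numbers = [_to_number(v) for v in present]
    zeros = sum(1 for n in numbers if n == 0)

    # Compare numerically only when every non-blank value is numeric;
    # otherwise fall back to string ordering.
    if present and all(n is not None for n in numbers):
        low, high = min(numbers), max(numbers)
    elif present:
        low, high = min(present), max(present)
    else:
        low = high = None

    counts = Counter(present)
    percentages = {v: round(100 * c / total, 1) for v, c in counts.items()}

    # Reduce each value to its shape: digits become 9, letters become A.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in present
    )

    return {
        "total": total, "nulls": nulls, "blanks": blanks, "zeros": zeros,
        "distinct": len(counts), "counts": counts, "percentages": percentages,
        "min": low, "max": high, "patterns": patterns,
    }


# Hypothetical sample: a ZIP-code-like field with one malformed value.
sample = ["98501", "98502", "98501", "", None, "0", "9850A"]
profile = profile_column(sample)
print(profile["patterns"].most_common())
```

Comparing the pattern counts and the distinct values against the data dictionary is one way to spot entries such as `9850A` that the documented format does not allow.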
DATA PREPARATION: DATA PROFILING
• Dataset validation checks
• Counts of records by time, institution
• Values and codes over time
• Systematic changes (0,1 to Y,N)
• Values defined in data dictionary
• Quality of data
• Names and identifiers
• Data elements
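The dataset-level checks above can be illustrated with a short sketch. It assumes records arrive as dictionaries with hypothetical `year`, `institution`, and `flag` fields; the helper names are invented for this example. A change in the set of observed codes between years is only a prompt for review, since it may reflect a recoding (such as 0,1 to Y,N) or simply a code that went unused that year.

```python
from collections import Counter, defaultdict


def counts_by(records, *keys):
    """Count records for each combination of the given fields."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def codes_by_year(records, field, year_field="year"):
    """Collect the set of codes observed in `field` for each year."""
    seen = defaultdict(set)
    for r in records:
        seen[r[year_field]].add(r[field])
    return dict(seen)


def coding_changes(codes_per_year):
    """List consecutive year pairs whose observed code sets differ."""
    years = sorted(codes_per_year)
    return [
        (a, b) for a, b in zip(years, years[1:])
        if codes_per_year[a] != codes_per_year[b]
    ]


def undefined_codes(records, field, allowed):
    """Count values in `field` that the data dictionary does not define."""
    return Counter(r[field] for r in records if r[field] not in allowed)


# Hypothetical extract in which a flag was recoded from 0/1 to Y/N.
records = [
    {"year": 2011, "institution": "A", "flag": "0"},
    {"year": 2011, "institution": "A", "flag": "1"},
    {"year": 2011, "institution": "B", "flag": "1"},
    {"year": 2012, "institution": "A", "flag": "Y"},
    {"year": 2012, "institution": "B", "flag": "N"},
]

record_counts = counts_by(records, "year", "institution")
changes = coding_changes(codes_by_year(records, "flag"))
not_in_dictionary = undefined_codes(records, "flag", allowed={"Y", "N"})
print(record_counts, changes, not_in_dictionary)
```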
DATA PREPARATION: DATA PROFILING
• Toolset varied by analyst
  • SAS
  • Informatica Data Analyst
  • Excel
• Goal of understanding the data
  • Constraints
  • Completeness, patterns over time
  • Values of each data element
DATA PREPARATION: DATA READINESS
• Document and expand on the results of the profiling process
• Generate the "go-to" resource for follow-up questions
• Resource to begin data loading
• Content that feeds the data dictionary
DATA PREPARATION: DATA READINESS
• Information about:
  • Data provider
  • Data file
  • Data elements
READINESS CONTENT ITEMS

| Dataset elements | Data element |
| --- | --- |
| Number of records | Name and description |
| Years provided | Acceptable values |
| Primary key | Data format/length |
| Business owner and steward | Business rules |
| Update frequency | Identity matching flag |
| Extract process | Field/record level data rules |
| Known issues | Security category |
| Dataset level rules | Notes |
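One way to hold these readiness items in a structured form is sketched below. This is an illustration only, not the presenters' template: the class names, the field types, the extra `name` field on the dataset, and the `unfilled_items` helper are all assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ElementReadiness:
    """Readiness items recorded for a single data element."""
    name: str
    description: str = ""
    acceptable_values: List[str] = field(default_factory=list)
    data_format: str = ""            # data format/length
    business_rules: str = ""
    identity_matching_flag: bool = False
    data_rules: str = ""             # field/record level data rules
    security_category: str = ""
    notes: str = ""


@dataclass
class DatasetReadiness:
    """Readiness items recorded for a whole dataset."""
    name: str                        # assumed label, not in the slide
    number_of_records: Optional[int] = None
    years_provided: List[int] = field(default_factory=list)
    primary_key: List[str] = field(default_factory=list)
    business_owner: str = ""
    steward: str = ""
    update_frequency: str = ""
    extract_process: str = ""
    known_issues: List[str] = field(default_factory=list)
    dataset_level_rules: List[str] = field(default_factory=list)
    elements: List[ElementReadiness] = field(default_factory=list)


def unfilled_items(record):
    """Return the names of readiness items that are still empty."""
    return [
        f.name for f in fields(record)
        if getattr(record, f.name) in (None, "", [])
    ]


# Hypothetical, partly completed readiness record.
enrollment = DatasetReadiness(
    name="enrollment", number_of_records=1000, years_provided=[2012, 2013]
)
print(unfilled_items(enrollment))
```

Listing the unfilled items gives a simple checklist of follow-up questions for the data provider before loading begins.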
DATA READINESS TEMPLATE
WHAT WE’VE LEARNED
• Customers need to be involved
• Dictionaries don't match data
• Educate our analysts on the data and the customer on the vision of the database
• Avoid custom extracts
• More time required up front
TOWARD THE FUTURE
• Empower the provider by offering guidance and tools for profiling
• Develop a feedback process that returns data quality findings and edits to the customer
• Open and transparent
QUESTIONS?