![Page 1: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/1.jpg)
UNIVERSITY OF WASHINGTON
Managing and Analyzing
Global Health Data
Seattle, August 30, 2011
Peter Speyer, Director of Data Development
![Page 2: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/2.jpg)
IHME Background
• Global institute dedicated to providing independent, rigorous, and scientific measurements and evaluations to accelerate progress on global health
• Part of the Department of Global Health at the University of Washington
• Funded by the Bill & Melinda Gates Foundation and the State of Washington (‘core funding’), and other funders through specific research grants
• Created in 2007
• 70 researchers, 30 staff
![Page 3: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/3.jpg)
IHME Mission
Our goal is to improve the health of the world’s populations
by providing the best information on population health
![Page 4: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/4.jpg)
![Page 5: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/5.jpg)
Health Data
Health-related data
• Social determinants
• Risk factors
Population-based data
• Household / facility surveys
• Census
• Vital registration
• Registries (provider, disease)
Facility-based data
• Health records
• Administrative data (financial, operational)
• Research data (DSS, clinical trials, etc.)
Individual-based data
• Personal health records
• “Quantified self”
• Disease-based social networks
Health Data Innovation
• Patient engagement
• Open data
• Health apps
![Page 6: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/6.jpg)
Key Health Data Challenges
Find & access data
Disseminate data
Use data
![Page 7: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/7.jpg)
Key Health Data Challenges
• Lack of transparency
• Timeliness of data
• Lack of documentation
• Access vs. privacy
![Page 8: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/8.jpg)
Key Health Data Challenges
• Sheer quantity of data files (30TB, 20K+ source datasets, 40M files)
• Diverse source data types and formats (pdf, csv, SPSS, CSPro, …)
• Data quality issues
![Page 9: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/9.jpg)
Key Health Data Challenges
• Make results data engaging
• Accountability: share results, code, source data
• Accommodate diverse audiences (expertise, geographies)
![Page 10: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/10.jpg)
Example: Global Burden of Disease
Mortality & causes of death
• Sources: census, surveys, vital registration, verbal autopsy
• Estimates: covariate models, spatial-temporal regressions; weighted combination of models
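The slide does not say how the models are weighted. As a minimal sketch, assuming each candidate model is weighted by the inverse of its out-of-sample RMSE (an illustrative choice, not necessarily the scheme used for GBD), a weighted combination of model predictions could look like this. All model names and numbers are hypothetical.

```python
def inverse_rmse_weights(rmse_by_model):
    """Turn out-of-sample RMSEs into weights that sum to 1 (assumed scheme)."""
    inverse = {name: 1.0 / rmse for name, rmse in rmse_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine_predictions(predictions_by_model, weights):
    """Weighted average of the per-model predictions, element by element."""
    names = list(predictions_by_model)
    length = len(predictions_by_model[names[0]])
    return [
        sum(weights[name] * predictions_by_model[name][i] for name in names)
        for i in range(length)
    ]


# Hypothetical inputs: two models predicting a death rate for three country-years.
rmse = {"covariate_model": 0.5, "spatial_temporal": 0.25}
predictions = {
    "covariate_model": [10.0, 12.0, 9.0],
    "spatial_temporal": [13.0, 12.0, 6.0],
}

weights = inverse_rmse_weights(rmse)
ensemble = combine_predictions(predictions, weights)
print(weights)
print(ensemble)
```

A model with a lower error gets a larger share of the final estimate, so the combined prediction leans toward the better-performing model without discarding the other one.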
Morbidity
• Sources: literature reviews, surveys, registries, hospital data
• Disease modeling: compartmental Bayesian model
• Health severity weights
Burden of disease
• DALYnator
300 diseases
40 risk factors
21 regions
1990, 2005, 2010
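The slide does not describe how the DALYnator works internally. The standard definition of a disability-adjusted life year combines the two strands above: DALY = YLL + YLD, where years of life lost (YLL) come from mortality and years lived with disability (YLD) come from morbidity scaled by a severity weight between 0 and 1. The following is a minimal sketch of that arithmetic with made-up numbers, ignoring refinements such as age weighting and discounting. The function and parameter names are assumptions for illustration.

```python
def years_of_life_lost(deaths, remaining_life_expectancy):
    """YLL: deaths times the standard remaining life expectancy at age of death."""
    return deaths * remaining_life_expectancy


def years_lived_with_disability(cases, severity_weight, avg_duration_years=1.0):
    """YLD: cases times a severity weight in [0, 1] times average duration."""
    if not 0.0 <= severity_weight <= 1.0:
        raise ValueError("severity_weight must be between 0 and 1")
    return cases * severity_weight * avg_duration_years


def disability_adjusted_life_years(yll, yld):
    """DALY is the sum of the mortality and morbidity components."""
    return yll + yld


# Hypothetical inputs for one disease in one region and year.
yll = years_of_life_lost(deaths=100, remaining_life_expectancy=30.0)
yld = years_lived_with_disability(cases=2000, severity_weight=0.25,
                                  avg_duration_years=2.0)
daly = disability_adjusted_life_years(yll, yld)
print(yll, yld, daly)
```

Repeating this calculation per disease, region, and year is what makes the burden comparable across the dimensions listed on the slide.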
![Page 11: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/11.jpg)
GBD Country Years, Causes of Death 1950-2009
![Page 12: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/12.jpg)
GBD Country Years, Causes of Death 1950-2009
| Data source | Countries | Site-years | # of Deaths |
|---|---:|---:|---:|
| VR | 128 | 4,190 | 722,267,710 |
| Household Surveys | 136 | 2,827 | 10,132,976 |
| Surveillance Systems | 12 | 126 | 717,698 |
| National VA | 21 | 71 | 301,855 |
| Subnational VA | 59 | 442 | 2,606,815 |
| Mortuary Registries | 6 | 25 | 54,316 |
| TOTAL | | 7,680 | 735,564,116 |
![Page 13: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/13.jpg)
Solutions: Computing Infrastructure
• Analysis with statistical packages
– Projects with 100K+ lines of code
• File system
– 60TB disc space
– Redundant backup
• Cluster with 63 nodes (+300% in 2011), ~2000 cores
– Runs 24x7, very little downtime
• Virtual environments to test new applications, serve them to collaborators, etc.
![Page 14: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/14.jpg)
Solutions: Global Health Data Exchange
Objectives
• Transparency => data catalog
• Access => data repository
• Information => data community (future)
Approach
• One record per dataset
• Standardized metadata
• Internal users (10K records): files on file server
• External users (5K records): files for download
Implementation
• CMS: Drupal
• Search: SOLR
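To make "one record per dataset" with standardized metadata and Solr search more concrete, here is a minimal sketch. The metadata fields, the core name `catalog`, and the host are assumptions for illustration and are not taken from the GHDx schema. Only the `/select` handler and the `q`, `fq`, `rows`, and `wt` parameters are standard Solr.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class DatasetRecord:
    """One catalog record per dataset; the field names are hypothetical."""
    title: str
    country: str
    year: int
    data_type: str
    keywords: list = field(default_factory=list)


def build_solr_query_url(base_url, text, filters, rows=10):
    """Build a URL for Solr's standard /select request handler."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each fq parameter narrows the result set without affecting relevance scores.
    params += [("fq", f'{name}:"{value}"') for name, value in sorted(filters.items())]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical record and search; no request is actually sent.
record = DatasetRecord(
    title="Example Household Survey",
    country="Exampleland",
    year=2008,
    data_type="survey",
    keywords=["household", "mortality"],
)
url = build_solr_query_url(
    "http://localhost:8983/solr/catalog",
    "household survey",
    {"country": record.country, "data_type": record.data_type},
)
print(url)
```

Keeping the same metadata fields on every record is what lets filters such as country or data type work uniformly across thousands of heterogeneous source datasets.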
![Page 15: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/15.jpg)
![Page 16: Managing and Analyzing Health Data (VLDB Conference)](https://reader035.vdocuments.mx/reader035/viewer/2022070304/54c67b724a7959a4368b4631/html5/thumbnails/16.jpg)
UNIVERSITY OF WASHINGTON
Thank you!
[email protected]
@peterspeyer
www.ghdx.org
Peter Speyer
Director of Data Development