Genomic Studies at Inova - National Academies (sites.nationalacademies.org/cs/groups/ssbsite/...data...)
![Page 1: Genomic Studies at Inova - National Academiessites.nationalacademies.org/cs/groups/ssbsite/...Data Blending Data Enrichment . Omic Data Process AWS Objects Initial Omic Storage(S3)](https://reader034.vdocuments.mx/reader034/viewer/2022050304/5f6c93939814d9354478da25/html5/thumbnails/1.jpg)
Genomic Studies at Inova
Inova Translational Medicine Institute
Aaron Black, PMP
April 1, 2015
Organization
• Background
• Data management
• Challenges \ Lessons Learned
Background: The Healthcare System
Six-hospital + ambulatory healthcare system
• Largest healthcare system in Northern VA
• Two million patient visits/year
• 20,000 deliveries/year
Main hospital
• 1,000 beds, including Children’s hospital
• 10,000 deliveries/year
Background: The Research Institute
• Started: 2010
• Overall goal: research on integration of genomic information into the practice of medicine
• Staffing: 1/3 clinical, 1/3 bioinformatic/IT, 1/3 laboratory
Background: ITMI Studies
Themes
• Trio-based WGS
• Other ‘omics
• Comprehensive clinical data
• Integrated laboratory
• Unified database
Preterm Birth Study (2011)
• Molecular associations with preterm birth
• ~500 preterm birth (PTB) trios, ~500 full-term (FT) trios
• WGS + ‘omics + clinical data
• Specimens: blood, saliva, cord blood, placenta
Longitudinal Study (2012)
• WGS + ‘omics + clinical data on 5,000 → 10,000 trio-based families
• Longitudinal study (≥18 yrs)
• Blood, saliva, urine, cord blood, placenta
• DNA, RNA, protein, epigenetic + clinical data
Congenital Disorders Study (2012)
• Mostly NICU-based
• Any other patient with a “congenital/genetic” disorder
• ~2-3 families/week
• Trio-based WGS, etc.
Infrastructure: Hybrid Cloud
Cloud
On-Premises
Data Management
Yes, a Hybrid
[Diagram labels: AWS; Object Store; NoSQL \ Hadoop; Value pair \ Graph; Research Network; Inova Infrastructure; Relational Databases; Database]
ITMI Data Collection
[Diagram labels: Patient; Consent; Bio Materials; Omic Data; Reporting & Analysis; External Collaborators; On-premises Inova Research staff; On-premises Epic & Laboratory & Clinical & Datawarehouse; On-premises HPC Cluster & Storage; 1,024 cores, 1.2 PB storage, 16 TB RAM]
The Stats
• Participants
– 9,750 participants (enrolled), >110 different countries of birth
• Files and data
– Manage 3+ petabytes of storage capacity (between cloud and on-premises)
– ~10,000,000 files
– File sizes range from kilobytes to 100+ GB per file
• Specimens
– >450,000 specimens
• Whole genome sequences
– 7,300+
– ~36,500,000,000 variants!
– Also have epigenetic data
• Clinical
– 55,000+ patient diagnoses (longitudinal)
– 110,000+ surveys and case report forms
• Lab results
– 2,000,000+ discrete lab results
– 2,200,000+ discrete variables from case report forms and surveys
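As a quick sanity check on these figures, dividing the variant count by the number of sequenced genomes gives the average number of variants per genome. This is only arithmetic on the slide's rounded numbers ("7,300 +" is treated as exactly 7,300, which is an assumption), not a description of how ITMI counts variants.

```python
# Rounded figures taken from the slide; "7,300 +" is treated as exactly 7,300.
TOTAL_VARIANTS = 36_500_000_000
WHOLE_GENOMES = 7_300


def variants_per_genome(total_variants: int, genomes: int) -> float:
    """Average number of variant calls per sequenced genome."""
    return total_variants / genomes


avg_variants = variants_per_genome(TOTAL_VARIANTS, WHOLE_GENOMES)
print(f"{avg_variants:,.0f} variants per genome on average")
```

With these inputs the average comes out to about five million variants per genome.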
General Data Process
[Diagram labels: AWS Objects; CRFs; EHR; Omic Analysis; Survey; Staging; DW; Mart (x5); Data Lineage; Data Blending; Data Enrichment]
Omic Data Process
[Diagram labels: AWS Objects; Initial Omic Storage (S3); Raw Data; Run Data; Log \ Meta Data; Failed QC Data; Long Term Storage; ITMI Network; Inova Network; Staging; Avere Run Data; Avere Raw Data; Analysis Data; DW]
Data Model
(1) Logically Model
(2) Break down each
LONGITUDINAL DATA
[Figure: timeline running from Start to End. Events, with dates masked as xx/xx/xxxx: Mom DOB, Infant DOB, Mom Dx, Baby Dx, Mom Survey, Mom Specimen Coll., Baby Specimen Coll., Mom Consent, WGS Data, Baby Lab. Axis values: -13123, -230, -230, -101, 0, 1, 31, 65, 103, 180. Legend: Labs, Dx, WGS, Consent, Birth, Event, Survey, Objects, Pointers.]
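The longitudinal slide pairs masked event dates with signed integers, which reads like a relative timeline. The sketch below shows one way such a timeline could be built: each event is stored as a signed day offset from an anchor date. Treating the infant's date of birth as day 0 is an assumption, and the dates are hypothetical values chosen so that a few offsets match numbers on the slide. Which event corresponds to which number cannot be recovered from the slide.

```python
from datetime import date


def day_offset(event_date: date, anchor_date: date) -> int:
    """Signed number of days from the anchor date to the event."""
    return (event_date - anchor_date).days


# Hypothetical dates for illustration only; the slide masks all real dates.
infant_dob = date(2014, 1, 1)  # assumed anchor (day 0)
events = {
    "Mom Consent": date(2013, 9, 22),
    "Infant DOB": infant_dob,
    "Baby Lab": date(2014, 1, 2),
    "Mom Survey": date(2014, 2, 1),
}

# Sort events chronologically by their offset from the anchor.
timeline = sorted(
    (day_offset(when, infant_dob), name) for name, when in events.items()
)
for offset, name in timeline:
    print(f"{offset:>6}  {name}")
```

Storing offsets instead of calendar dates keeps the order and spacing of events while leaving the actual dates out of the analysis data set.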
Visualization
Web Portal
III. Challenges \ Lessons Learned
Challenges
• Scalability
– Storage
– Compute
• Data Movement
• IT Standards
– Data
– Programming
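The data movement challenge can be made concrete with back-of-the-envelope arithmetic. The sketch below estimates how long one petabyte takes to move over a network link. The link speeds are assumptions chosen for illustration, since the talk does not state ITMI's actual bandwidth, and the estimate ignores protocol overhead unless `efficiency` is lowered.

```python
def transfer_days(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link of link_gbps gigabits per second."""
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, in bytes

# Assumed link speeds (hypothetical, not taken from the talk).
for gbps in (1, 10):
    print(f"1 PB over {gbps} Gbps: {transfer_days(PETABYTE, gbps):.1f} days")
```

Even at a sustained 10 Gbps, a single petabyte needs more than a week in transit.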
Lessons Learned
• Build infrastructure around data
– Network is bottleneck
– Hidden costs
• Manage Data Tiers
[Figure axis label: Cheaper / byte]
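One way to act on "Manage Data Tiers" is to place each file by how recently it was used, so that rarely touched data moves to storage that is cheaper per byte. The sketch below is a minimal illustration only: the tier names and idle-time thresholds are hypothetical, and the talk does not describe ITMI's actual tiering policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_idle_days: float  # files idle up to this many days stay in the tier


# Hypothetical tiers, ordered from fastest and most expensive to cheapest per byte.
TIERS = (
    Tier("hot (on-premises HPC storage)", 30),
    Tier("warm (cloud object store)", 365),
    Tier("cold (long-term archive)", float("inf")),
)


def choose_tier(idle_days: float) -> Tier:
    """Return the first tier whose idle-time limit covers the file."""
    for tier in TIERS:
        if idle_days <= tier.max_idle_days:
            return tier
    # Only reached if idle_days is NaN, since the last tier has no upper limit.
    raise ValueError("idle_days must be a number")


for idle in (5, 90, 4000):
    print(f"idle {idle:>4} days -> {choose_tier(idle).name}")
```

A real policy would likely weigh retrieval cost and network transfer time as well as idle time.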
Lessons Learned
• Spend time to model
• Build strong metadata layer
• Make it understandable to your team
• Use best practices
• IT Partners that understand the business
Acknowledgements
Inova Translational Medicine Institute: John Niederhuber, MD; Joe Vockley, PhD; Greg Eley, PhD; Aaron Black, PMP; Kathi Huddleston, PhD; Ram Iyer, PhD; Dale Bodian, PhD; Wendy Wong, PhD; Alina Khromykh, MD; Dan Stauffer, PhD; Sarah Ruppert, CGC; Tiffani DeMarco, CGC; Kim Rutledge, CGC; and team
Inova Health System: David Ascher, MD; Larry Maxwell, MD; Al Khoury, MD; George Bronsky, MD; Barbara Nies, MD; and team
Fairfax Neonatal Associates: Robin Baker, MD; Rajiv Baveja, MD; and team
Institute for Systems Biology: Ilya Shmulevich, PhD; Jared Roach, MD, PhD; Brady Bernard, PhD; Gustavo Glusman, PhD; and team