breaking the curse of dimensionality
TRANSCRIPT
![Page 1: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/1.jpg)
ORNL is managed by UT-Battelle for the US Department of Energy
Breaking the curse of dimensionality
Explainable-AI and Evidence Mining as Applied to Systems Biology
Dan Jacobson
This research is supported by an INCITE award
and uses resources of the Oak Ridge Leadership
Computing Facility at the Oak Ridge National
Laboratory, which is supported by the Office of
Science of the U.S. Department of Energy under
Contract No. DE-AC05-00OR22725.
![Page 2: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/2.jpg)
Experimental Data Types
• Natural Variation
– Genome Wide Association Studies
– 28 million SNPs
– ~140,000 Primary Phenotypes
• Morphology/Phenology
• Molecular
• Microbiomes & Metagenomes
• Omics & Meta-omics
– Genomics, Transcriptomics, Proteomics, Metabolomics
• All publicly available Genomes
• Differential/Time Series Expression Studies
• Systems Biology Approach
– Combining datasets across omics layers, sample sets, and species
![Page 3: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/3.jpg)
Traditional Results
![Page 4: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/4.jpg)
Integrated Vision: From Systems Biology to 3D Structural Interactions
Structures: 28 million compounds
![Page 5: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/5.jpg)
Integrated Vision: From Human Systems Biology to 3D Structural Interactions - Pharmacogenomics and Personalized Medicine
• Co-evolution
• 1000 Genomes Project
• Protein cross species
• Human RNA-seq
• Crystal Structures
• Protein-protein interaction
• Human interactome
• ENCODE
• Explainable-AI
• iRF/TiRF
• DNNs
• MIPs
• Public GWAS data
• As available from collaboration with VA
• Genetic data
• Clinical phenotypes
• Polypharmacy data
Structures: 28 million compounds
![Page 6: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/6.jpg)
Integrated Vision: From Human Systems Biology to 3D Structural Interactions - Pharmacogenomics and Personalized Medicine
Structures: 28 million compounds
![Page 7: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/7.jpg)
SNP Vectors Phenotype Vectors
Single QTL mapping: 28 million tests per phenotype
• SNP Matrix expansion from interpolation (ANL)
• Control for effects of population structure
![Page 8: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/8.jpg)
![Page 9: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/9.jpg)
140,000 Manhattan Plots???
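The scale behind that question follows from simple arithmetic on the figures quoted in the slides (28 million SNPs, about 140,000 phenotypes). A minimal sketch; the variable names are illustrative only:

```python
# Figures quoted in the slides; the names are illustrative, not from the talk.
N_SNPS = 28_000_000      # single-QTL tests per phenotype
N_PHENOTYPES = 140_000   # approximate number of primary phenotypes

# One association test per SNP per phenotype, one Manhattan plot per phenotype.
total_tests = N_SNPS * N_PHENOTYPES

print(f"{total_tests:,} single-SNP tests")  # 3,920,000,000,000 single-SNP tests
print(f"{N_PHENOTYPES:,} Manhattan plots to inspect")
```

Even before any interaction is considered, that is roughly 3.9 trillion tests, which is why reading the plots one by one does not scale.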
![Page 10: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/10.jpg)
Network Theory
• Networks can be used to represent biological systems
– Nodes
• Represent any object (genes, SNPs, proteins, metabolites, species, microbiomes, etc.)
– Edges
• Represent a relationship between two nodes (correlation, co-occurrence, physical contact, etc.)
• Relationships can be quantitative (represented by the thickness of the line)
• Integration and Visualization of Systems Biology Models
• Mathematical Structure
– Allows the network to be computed upon
– Millions of nodes
– Trillions of edges
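As a rough illustration of nodes, weighted edges, and "computing upon" a network, here is a minimal sketch of an undirected weighted graph stored as an adjacency map. The node names and weights are hypothetical and chosen only for the example:

```python
from collections import defaultdict

# Hypothetical toy network: node names and weights are made up for illustration.
edges = [
    ("geneA", "geneB", 0.92),         # e.g. a co-expression correlation
    ("geneA", "SNP_17", 0.40),        # e.g. an association strength
    ("geneB", "metabolite_X", 0.75),  # e.g. a co-occurrence score
]

# Undirected weighted graph as an adjacency map: node -> {neighbor: weight}.
graph = defaultdict(dict)
for u, v, w in edges:
    graph[u][v] = w
    graph[v][u] = w

def degree(node):
    """Number of edges touching a node."""
    return len(graph[node])

def strongest_neighbor(node):
    """Neighbor connected to the node by the heaviest edge."""
    return max(graph[node], key=graph[node].get)

print(degree("geneA"))              # 2
print(strongest_neighbor("geneA"))  # geneB
```

An adjacency map keeps memory proportional to the number of edges, which matters once a network has far more possible node pairs than actual relationships.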
![Page 11: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/11.jpg)
Deeper Discoveries in Systems Biology: The Balance Between Type 1 and Type 2 Error
Our ability to reconstruct the entirety of a complex biological system improves as more population-scale endo-, meso- and exo-phenotypes are measured and combined with deep layers of experimental data collected on individual genotypes.
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery. Deborah Weighill, Piet Jones, Manesh Shah, Priya Ranjan, Wellington Muchero, Jeremy Schmutz, Avinash Sreedasyam, David Macaya Sanz, Robert Sykes, Nan Zhao, Madhavi Martin, Stephen DiFazio, Timothy Tschaplinski, Gerald Tuskan, Daniel Jacobson. Front. Energy Res. - Bioenergy and Biofuels, DOI: 10.3389/fenrg.2018.00030
![Page 12: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/12.jpg)
GWAS: Single QTL Mapping
• Very Powerful
• Frequently does not capture a significant portion (often the majority) of the genetic signal
• Often does not find complete genetic architectures for complex phenotypes (Dementia, Alzheimer’s, Schizophrenia, Cardiovascular disease, PTSD, Suicide, Addiction, etc.)
![Page 13: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/13.jpg)
Epistatic Example: Transcription Initiation Complex
![Page 14: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/14.jpg)
The Need for Speed
[Figure: Pairwise Epistatic Compute Time per Phenotype. x-axis: SNPs (millions), 0 to 30; y-axis: CPU hours (millions), -50 to 250]
4-way combinations = 2.4 x 10^20 CPU hours per phenotype
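To see why exhaustive epistasis scans blow up, the sketch below counts how many k-SNP combinations exist for the 28 million SNPs quoted in the slides. It counts tests only; the CPU-hour figures on the slide also depend on a per-test cost that the slides do not state, so no attempt is made to reproduce them here.

```python
from math import comb

N_SNPS = 28_000_000  # SNP count quoted in the slides

# Number of distinct k-SNP combinations an exhaustive scan would have to test.
for k in (1, 2, 3, 4):
    print(f"{k}-way: {comb(N_SNPS, k):.3e} combinations per phenotype")
```

Pairs are already on the order of 10^14 per phenotype, and four-way sets are on the order of 10^28.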
![Page 15: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/15.jpg)
15 Presentation name
Breaking the curse of dimensionality
10M Genetic Variants in >40k genes
Genes do not work in isolation: 10^170 potential interactions among variants
Linking genetic variants to phenotypes requires the exploration of an enormous space
To obtain accuracy and insight, we are developing procedures to detect interactions of any form or order at the same computational cost as main effects
Explainable-AI
![Page 16: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/16.jpg)
Machine and Deep Learning Algorithms
• Great at classification
• Essentially black boxes
• Don’t reveal the interactions between variables that lead to the classification
• Need Explainable AI
![Page 17: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/17.jpg)
Finding Higher Order Combinatorial Interactions in Complex Systems
• X matrix and Y vector
• Iterative Random Forests
![Page 18: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/18.jpg)
iRF Workflow
[Figure: iRF workflow. Boxes: Training Data, Test Data, Weighted Random Forest, Tree Ensemble, Feature Importance, Prediction Accuracy, Node Filtering, Random Intersection Trees, Feature Interactions, Interactions (refined), Bootstrap Aggregation (Bagging), Interactions (stabilized). The example decision tree splits on X2, X3, X7, X9, X10 and X11; example feature sets along its decision paths include {X2, X11, X7}, {X2, X11, X1}, {X9, X3}, {X9, X11, X10}, {X2, X11} and {X3, X9, X10}.]
DOE Collaboration: Ben Brown - LBNL
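The idea behind mining interactions from decision paths can be sketched as follows: features that keep appearing together on many paths are candidate interactions. The code below is a deliberately simplified, exhaustive stand-in (it intersects every pair of paths once), not the randomized intersection-tree procedure used in iRF and not the authors' implementation. The feature sets echo the examples in the workflow figure; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Feature sets seen along decision paths, echoing the workflow figure.
paths = [
    {"X2", "X11", "X7"},
    {"X2", "X11", "X1"},
    {"X9", "X3"},
    {"X9", "X11", "X10"},
    {"X2", "X11"},
    {"X3", "X9", "X10"},
]

def surviving_intersections(paths, min_size=2):
    """Count feature sets that survive intersecting each pair of paths."""
    counts = Counter()
    for a, b in combinations(paths, 2):
        common = frozenset(a & b)
        if len(common) >= min_size:  # keep only multi-feature candidates
            counts[common] += 1
    return counts

for features, n in surviving_intersections(paths).most_common():
    print(sorted(features), n)
```

On this toy input the pair {X2, X11} survives three intersections, more than any other set, so it would be the strongest candidate interaction.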
![Page 19: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/19.jpg)
iRF – X Matrix and 1 Y Vector
SNP Vectors Phenotype Vectors
![Page 20: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/20.jpg)
iRF – X Matrix and 1 Y Vector
SNP Vectors Phenotype Vectors
4-way combination = 1000 CPU hours per phenotype (140,000 phenotypes)
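A quick back-of-the-envelope check puts the two quoted figures side by side: 1000 CPU hours per phenotype here, against 2.4 x 10^20 CPU hours per phenotype quoted earlier for exhaustive 4-way testing. The constant names are illustrative.

```python
# Figures quoted in the slides; constant names are illustrative.
IRF_HOURS_PER_PHENOTYPE = 1_000          # 4-way, iRF
EXHAUSTIVE_HOURS_PER_PHENOTYPE = 2.4e20  # 4-way, exhaustive testing
N_PHENOTYPES = 140_000

total_irf_hours = IRF_HOURS_PER_PHENOTYPE * N_PHENOTYPES
reduction = EXHAUSTIVE_HOURS_PER_PHENOTYPE / IRF_HOURS_PER_PHENOTYPE

print(f"{total_irf_hours:,} CPU hours across all phenotypes")  # 140,000,000
print(f"{reduction:.1e}x fewer CPU hours per phenotype")       # 2.4e+17x
```

At 140 million CPU hours in total, the full phenotype set is still a leadership-class computing job, but no longer an impossible one.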
![Page 21: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/21.jpg)
Tensor iterative Random Forests (TiRFs)
• Effectively build forests that can be mined for interactions within a multi-dimensional X, a multi-dimensional Y and interactions between multiple dimensions in X and Y, all at the same time.
![Page 22: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/22.jpg)
SNP Vectors Phenotype Vectors
![Page 23: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/23.jpg)
TiRF – X Matrix and Y Matrix Simultaneously
SNP Vectors Phenotype Vectors
![Page 24: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/24.jpg)
Clinical Genomics and Human Systems Biology: DOE & VA – MVP Champion
• ORNL
– Clinical records 23+ million patients, 20 years
– 358,000 Genotypes
– => 4 million genotypes
![Page 25: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/25.jpg)
VA Use Case: Polypharmacy
• Simultaneous use of multiple medications
• Of concern if 5 or more medications are used
![Page 26: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/26.jpg)
Why Worry About Polypharmacy?
• Drugs interact with each other; the more you are on, the more interactions can occur
• Side effects add up and are more pronounced in older individuals
• Medications are approved by FDA based on short-term trials that typically exclude:
– Those with other diagnoses on other medications
– 65+ year olds
![Page 27: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/27.jpg)
Interaction Network
• Drug Set Simulations
– For all set sizes 2 – 30
• Create 20 million random sets of drugs for each set size
• 58 million sets
– Check for drug to drug edges amongst all possible pairs in each set for the shared target and shared pathway networks
• 567 Billion interaction tests
• Clinical Data
– Create drug sets from clinical records
– Check for drug to drug edges amongst all possible pairs in each set for the shared target and shared pathway networks
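The simulation step described above can be sketched in a few lines: draw random drug sets of a given size, then count how many pairs in each set are joined by an edge in an interaction network. The drug names, the edge list, and the function names below are hypothetical, and a single toy network stands in for the shared target and shared pathway networks.

```python
import random
from itertools import combinations

# Hypothetical drugs and interaction edges, for illustration only.
drugs = ["drugA", "drugB", "drugC", "drugD", "drugE", "drugF"]
interaction_edges = {
    frozenset(("drugA", "drugB")),
    frozenset(("drugB", "drugC")),
    frozenset(("drugD", "drugE")),
}

def count_edges(drug_set):
    """Count known drug-to-drug edges among all pairs in one drug set."""
    return sum(
        frozenset(pair) in interaction_edges
        for pair in combinations(sorted(drug_set), 2)
    )

def mean_edges(set_size, n_sets, rng):
    """Average edge count over n_sets random drug sets of one size."""
    total = sum(count_edges(rng.sample(drugs, set_size)) for _ in range(n_sets))
    return total / n_sets

rng = random.Random(0)  # fixed seed so the run is repeatable
for size in range(2, len(drugs) + 1):
    print(size, round(mean_edges(size, 1_000, rng), 2))
```

Applying the same `count_edges` function to drug sets taken from clinical records would give the observed counts to compare against this random baseline.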
![Page 28: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/28.jpg)
Drug Interaction: Simulation vs Clinical Practice
[Figure: known interactions (y-axis, 0 to 45) vs. drug set size (x-axis, 0 to 16). Series: Non_HIV_Mean, Simulation]
![Page 29: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/29.jpg)
Polypharmacy Morbidity & Mortality
[Figure: interactions (axis 0 to 45) and hazard ratio (axis 1.00 to 2.60) vs. number of drugs (2 to 16). Series: HIV+ Interactions, Uninfected Interactions, Simulation, HIV+ HR, Uninfected HR, each with a linear trend line]
![Page 30: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/30.jpg)
Drugs & Interactions
Outcomes (morbidity and mortality)
VA: Preliminary Results
• Polypharmacy
• Clinically relevant patterns
• iRF
• Steps toward automated phenotyping
• Interaction edges -> morbidity & mortality
• Diseaseome
• iRF on diagnostic codes
• 600,000 patients
• Relationships between all known human conditions
• Co-morbidity map
• Discovered 9th-order combinations
• 1.5 x 10^26 possible 9-way combinations
![Page 31: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/31.jpg)
![Page 32: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/32.jpg)
TiRF – Any Set of Matrices or Tensor Dimensions Simultaneously
• Spatial and temporal/longitudinal information
• Different Omics layers (genome, transcriptome, proteome, metabolome, microbiome…)
• Quantum chemical tensors
![Page 33: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/33.jpg)
Tensors: Matrices Cubes
![Page 34: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/34.jpg)
Tensors: Matrices Cubes Polytopes
![Page 35: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/35.jpg)
From data matrix to cube to polytopes.
Machine Learning:
TiRF
![Page 36: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/36.jpg)
Tensor iterative Random Forests (TiRFs)
• Effectively build forests that can be mined for interactions within a multi-dimensional X, a multi-dimensional Y and interactions between multiple dimensions in X and Y, all at the same time.
• Applications in Systems Biology
– Plants
– Microbes
– Humans, Mice
– Drosophila
• Applications in Text Mining
– Electronic Health Records
– Scientific Literature
• Simulation Models
– Combinatorial parameter sweeps (X) model output (Y)
• Any domain with a high-dimensional set of matrices
Iterative Deep Neural Networks (iDNNs)
• Unpacking the black box
• Discovering the interactions encoded in DNNs
![Page 37: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/37.jpg)
High Order Interactions:
Explainable AI: Machine and Deep Learning Integration
TiRFs
DNNs
MIPs
![Page 38: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/38.jpg)
High Order Interactions:
Explainable AI: Machine and Deep Learning Integration
TiRFs
DNNs
MIPs
Network Evolution
MENNDL
![Page 39: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/39.jpg)
Exposome
![Page 40: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/40.jpg)
High Order Interactions: Exposome – Adverse Outcome Networks
Explainable AI: Machine and Deep Learning Integration
TiRFs
DNNs
MIPs
Titan/Summit
![Page 41: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/41.jpg)
Collaborators
• Ben Brown
• Jerry Tuskan
• Steve DiFazio
• Wayne Joubert
• Amy Justice
• Edmon Begoli
Computational Infrastructure
• Oak Ridge Leadership Computing Facility (OLCF)
• Compute and Data Environment for Science (CADES)
Acknowledgements
Joint Genome Institute
![Page 42: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/42.jpg)
Acknowledgements
CBI
PMI
LDRD
VA
Oak Ridge Leadership Computing Facility (OLCF) at ORNL
Compute and Data Environment for Science (CADES) at ORNL
INCITE
Joint Genome Institute (JGI)
Oak Ridge National Laboratory (ORNL)
Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville
![Page 43: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/43.jpg)
Acknowledgements
• JAIL Team effort
– Debbie Weighill
– Piet Jones
– Carissa Bleker
– Armin Geiger
– Marek Piatek
– Ben Garcia
– Ashley Cliff
– Jonathon Romero
– David Kainer
– Annie Fouche
– Sandra Truong
– Ryan McCormick
– Priya Ranjan
– Manesh Shah
– Doug Hyatt
– Blake Wiley
– Jesse Marks
– Ian Hodge
– Annabel Large
– Chris Ellis
![Page 44: Breaking the curse of dimensionality](https://reader031.vdocuments.mx/reader031/viewer/2022022401/621677f00585aa60e6098c21/html5/thumbnails/44.jpg)
Questions?
Machine Learning:
TiRF