BMI Meets Big Data - ACMI 2016 Winter Symposium
Biomedical Informatics Meets Big Data: Implications for Research, Training, and Policy
Philip R.O. Payne, PhD, FACMI, The Ohio State University, Department of Biomedical Informatics
Justin B. Starren, MD, PhD, FACMI, Northwestern University, Biomedical Informatics Center (NUBIC)
Peter J. Embi, MD, MS, FACMI, The Ohio State University, Department of Biomedical Informatics
ACMI 2016 Winter Symposium
Big Data in Biomedicine: Terms, Definitions, and Concepts
What Makes Big Data “Big”?
Volume, Velocity, Variability, Veracity
Rethinking Science
Aug. 2015, VIVO 2015 ©Starren 2015
Theory, Experimentation, Computation
Variants in Terminology
Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. [1]
Data science is a broad field that refers to the collective processes, theories, concepts, tools and technologies that enable the review, analysis and extraction of valuable knowledge and information from raw data. [1]
Data analytics is a process of sifting, organizing, and examining vast amounts of information and then drawing conclusions based on that analysis. [2]
Sources: [1] http://www.techopedia.com/; [2] http://discovery.osu.edu/
Theories and Methods
Sources
Applications
SAMPLING
FILTERING
Big Data is About Filtering
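The contrast between sampling and filtering can be illustrated with a minimal Python sketch. The record fields (`patient_id`, `age`, `has_diabetes`), the synthetic data, and the helper names `sample` and `filter_rows` are all hypothetical and chosen only for illustration; they do not come from the slides.

```python
import random

# Hypothetical patient records; field names and values are illustrative only.
records = [
    {"patient_id": i, "age": 20 + (i % 60), "has_diabetes": i % 7 == 0}
    for i in range(10_000)
]

def sample(rows, k, seed=0):
    """Sampling: draw k records at random, regardless of their content."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

def filter_rows(rows, predicate):
    """Filtering: scan every record and keep only those that match a criterion."""
    return [row for row in rows if predicate(row)]

# A random subset of 100 records, used to estimate properties of the whole.
sampled = sample(records, 100)

# Every record in the full data set that meets the criterion of interest.
filtered = filter_rows(records, lambda r: r["has_diabetes"] and r["age"] >= 65)

print(len(sampled), len(filtered))
```

Sampling looks at a small, representative subset, while filtering requires access to the full data set in order to find all the cases of interest, which is one reading of why the slide ties big data to filtering.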
Types of Big Data in Biomedicine
[Figure: data types plotted by Data per Individual (horizontal axis: Kilobytes, Megabytes, Gigabytes, Terabytes, Petabytes) against Number of Patients Necessary for Big Data (vertical axis: One, Thousands, Millions).]
Data types shown: Claims; EHR Data; Basic Genomics; Clinical Imaging; Microbiomics, Cell Population Sequencing, Epigenomics, etc.; Proteomics; PHR Data; Microarray; Exposome / Sensor; Survey Data
Big Data in Biomedicine
Source: FSM Big Data Survey 2013
Note that “size” was not the most common descriptor.
Data Scientist: Sexiest Job of the 21st Century
Source: Harvard Business Review, http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
The “Hype Cycle” and Big Data
Sources: Gartner Whitepaper, “Gartner Hype Cycle for Emerging Technologies 2014”
Big Data
Data Science
The Healthcare Information Age?
The Advent of the Printing Press and the 1st Information Age
Characteristics | Before the Printing Press | After the Printing Press
Cost | High: Printed materials only available to the extremely wealthy | Low: Printed materials become cost effective for general public
Ubiquity | Low: Copies of printed materials had to be transcribed by hand, limiting number of instances | High: Mass production of printed materials leads to broad dissemination and access
Reproducibility | Low: Errors of transcription and omission very common | High: Systematic printing processes ensure fidelity of materials
Growth in HIT and Big Data in the Healthcare Information Age
Characteristics | Before HIT and Big Data | After HIT and Big Data
Cost | High: Data sets generated and/or curated on an as-needed basis | Low: Data production and storage costs decreasing faster than Moore's Law
Ubiquity | Low: Proprietary data situated in vendor or project-specific repositories and formats | High: Data becoming a renewable resource enabled by diverse re-use scenarios
Reproducibility | Low: Errors of transcription and omission very common | High: Linked public data enables the creation of a “commons” model
Data, Data, Everywhere…
Molecular Phenotype
Environment
Enterprise Systems and Data Repositories: EHR, CRMS, Data Warehouse(s)
Emergent Sources: PHR, Instruments, etc.
UbiComp + Sensors
The Data Commons = An Ecosystem of Open Data and Tools That Can Be Adopted and Adapted in Order to Generate Value
Open data; Open source; Open methodology; Open peer review; Open access; Open educational resources
Cumulative, transparent, and reproducible science and innovation
Watson M. When will ‘open science’ become simply ‘science’? Genome biology. 2015;16(1):101.
Using Big Data to Answer Questions
Sources: IBM Whitepaper, “Learn Why Analytics Drive Better Business Outcomes Now”
[Figure: analytics cycle with stages Align, Anticipate, Act, Transform, Learn. Labels: Goals and Information; See and Shape Outcomes; Decide and Optimize; Data-driven Decision Making; Improve Every Outcome.]
Is this really Data Science?
Virtuous Cycle of Data Science and Informatics
Biological and Social Processes
Data
Data Science focuses on converting data (esp. big data) into knowledge
Informatics includes both analysis and creation of digital interventions
Observed, Measured or Instrumented to Produce
New Knowledge and Insight
Precision Medicine: “Big Data meets Personalized Medicine”
[Figure: Overall Health Condition (vertical axis) plotted over Time (horizontal axis). Labels: Genotype; Environment; Health History; Current State; Optimal Current State; Optimal Future State; Probable Future State; Genetic Risk; Prediction; Quality Initiatives.]
Precision Medicine: Big Data meets Personalized Medicine
Data Science: The Science of Big Data
Big Data: The four V’s; Filtering vs. Sampling
Informatics: Data Science meets the Human Condition
Data Analytics: Problem Solving Using Big Data
Relationship Models
Data Science
Informatics
Synonymy
Problem: Informatics has an applied and sociotechnical component.
Data Science
Informatics
Subsumption
Problem: Informatics does implementation.
New Term for Big Data at NIH: Uses Subsumption model 1
Data Science
Informatics
Subsumption 2
Problem: Data Scientists will not accept this view. Data Science students learn things that informatics students often do not.
Data Science
Implementation Science
Sociotechnical
Informatics
Intersection
Problem: Data Science does not have a sociotechnical or implementation science component.
Data Science
Implementation Science
Sociotechnical
Informatics
Cognitive
Workflow
HCI
Intersection 2
Problem: Sociotechnical and implementation science are not the same.
Implementation Science
Sociotechnical
Informatics
Data Science
Informatics vs. Data Science?
Informatics: Data Science meets the Human Condition
Thank you for your time and attention!
• [email protected]
• [email protected]