WCIT 2014, Amnon Shvo: Translational & interoperable health infrastructure
DESCRIPTION
Workshop at WCIT 2014: Translational & interoperable health infrastructure. Amnon Shvo, University of Haifa
TRANSCRIPT
© Amnon Shabo (Shvo)
Translational & Interoperable Health Infostructure -
The Servant of Three Masters
Amnon Shabo (Shvo), PhD
Chair, EFMI Translational Health Informatics Work Group
Chair, IMIA Health Record Banking Work Group
Co-chair, HL7 Clinical Genomics Work Group
Research Fellow, University of Haifa
Towards a universal health information language
Revolutionizing healthcare through independent lifetime health records
This lecture is partially based on my recent publication:
“The Patient-centric translational health record”
Agenda
Translational Medicine and informatics
Universal Exchange Language?
Translational Health Information Language!
Semantic Warehousing of Health Information
The vision - Independent Health Record Banks
Translational Medicine Basic Concepts
T = Translational Barrier
Each T is tough but when Tn succeeds and Tn+1 fails… it’s frustrating!
Source: Sarkar, IN. Biomedical informatics and translational medicine. Journal of Translational Medicine 2010, 8:22.
Translational Health Informatics
Translational Medicine involves data-driven approaches
CBR; machine learning; simulation, etc.
Analyze observational data found in operational health information
systems and use them for both -
discovery of new insights and suggesting hypotheses to be
checked in controlled trials
refinement of established evidence and clinical guidelines
Translational Research is about translating results of studies
in all relevant disciplines:
Biology
Analytics
Technology (IT, systems, equipment, modalities, devices, etc.)
Socio-economic, bio-ethical and medico-legal considerations
Common informatics infrastructure (language) is needed across disciplines!
Informaticians as Translators
Enable a feedback loop
Source: Sarkar, IN. Biomedical informatics and translational medicine. Journal of Translational Medicine 2010, 8:22.
Agenda
Translational Medicine and informatics
Universal Exchange Language?
Translational Health Information Language!
Semantic warehousing of health information
The vision - Independent Health Record Banks
USA President’s Report on How Best to Use HIT
Key recommendations from the US PCAST Report:
“The initial approach to meaningful use* has focused on
driving physicians to adopt EHR systems that perform
important quality-improving functions within the practice
and, to a lesser extent, on developing capabilities for
broader sharing”
“Creation and dissemination of a
universal exchange language for healthcare information”
“An infrastructure for locating patient records”
“Rigorously protecting privacy and security”
* US Federal Meaningful Use of HIT – Incentives criteria set in the US for reimbursement
Flat representations are flat tires!
Health data semantics and context
cannot be faithfully represented
using flat structures (e.g., a list of
entries), rather, it requires a
compositional language that
associates data entries into a
meaningful statement
Universal Exchange Language? Start with Statements!
[Diagram: Codes are inserted into basic health objects (Participant Object, Observation Object, Medication Object, Procedure Object); a Grammar combines the objects into a Clinical Statement]
Example: Observation O1 (consisting of Observations O11 and O12 and related to Subject S1) is the reason for Procedure P1 (performed by Clinician C1), which is the cause of Observation O2…
Others | Docs | Pharma | Lab
SNOMED, LOINC, ICD, etc.
It’s available through the new generation of standards!
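The example statement on this slide can be sketched in code. The following is only an illustration in plain Python, not an HL7 data model: the class names (`Act`, `Relation`, `Participation`) and the relation names (`has_component`, `reason_for`, `cause_of`) are assumptions made for this sketch and are not HL7 vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Act:
    """A basic health object (Observation, Procedure, ...) that carries a code."""
    kind: str                      # e.g. "Observation", "Procedure"
    ident: str                     # local identifier, e.g. "O1"
    code: Optional[str] = None     # placeholder for a SNOMED/LOINC/ICD code
    relations: List["Relation"] = field(default_factory=list)
    participants: List["Participation"] = field(default_factory=list)

    def relate(self, rel_type: str, target: "Act") -> "Act":
        self.relations.append(Relation(rel_type, target))
        return self

    def participate(self, role: str, who: str) -> "Act":
        self.participants.append(Participation(role, who))
        return self


@dataclass
class Relation:
    rel_type: str   # hypothetical names: "has_component", "reason_for", "cause_of"
    target: Act


@dataclass
class Participation:
    role: str       # e.g. "subject", "performer"
    who: str


def render(act: Act, depth: int = 0) -> List[str]:
    """Flatten the statement graph into indented lines for display."""
    pad = "  " * depth
    lines = [f"{pad}{act.kind} {act.ident}"]
    for p in act.participants:
        lines.append(f"{pad}  [{p.role}] {p.who}")
    for r in act.relations:
        lines.append(f"{pad}  -{r.rel_type}->")
        lines.extend(render(r.target, depth + 2))
    return lines


# O1 (components O11, O12; subject S1) is the reason for P1 (performer C1),
# and P1 is the cause of O2.
o2 = Act("Observation", "O2")
p1 = Act("Procedure", "P1").participate("performer", "C1").relate("cause_of", o2)
o1 = (Act("Observation", "O1")
      .participate("subject", "S1")
      .relate("has_component", Act("Observation", "O11"))
      .relate("has_component", Act("Observation", "O12"))
      .relate("reason_for", p1))

statement = render(o1)
print("\n".join(statement))
```

The point of the sketch is that the meaning lives in the relations between objects; a flat list of the same five objects would lose it.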
Examples of Data Sets in a Hypertension Study
Blood Pressure:
Systolic and Diastolic measurements are components
Mean is derived from the above components
Heart rate (HR) measurement is timed with
blood pressure (BP)
Anti-hypertension drug
Is taken with the indication of
Hypertension
Microalbuminuria
HR + BP were measured…
before and after taking a drug
Dose increased if Sys.BP>x
BP, HR measured at T = -8, -4, 0, +4, +8, +12, +16, +24, +48
Taken from different columns in the source data…
Losartan 50 mg/day (T=0)
Losartan 100 mg/day (T=+4)
Semantics is often implicit!
Source: Hypergenes Cohorts
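A minimal sketch of how these implicit semantics could be made explicit, assuming plain Python dataclasses. Several things here are assumptions, not taken from the slide: the mean is computed with the common estimate DBP + (SBP - DBP) / 3, the threshold `x` is left as a parameter, and the sample readings are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BloodPressure:
    systolic: float   # mmHg, component
    diastolic: float  # mmHg, component

    @property
    def mean(self) -> float:
        """Derived from the components (assumed estimate: DBP + (SBP - DBP) / 3)."""
        return self.diastolic + (self.systolic - self.diastolic) / 3


@dataclass(frozen=True)
class VitalSigns:
    t: int                # time offset relative to drug start, in the source's units
    bp: BloodPressure
    heart_rate: float     # timed together with bp


def next_dose(current_mg: int, vitals: VitalSigns, threshold: float,
              increased_mg: int) -> int:
    """Dose is increased if systolic BP exceeds the threshold x."""
    return increased_mg if vitals.bp.systolic > threshold else current_mg


# Illustrative values only; 140.0 is a hypothetical stand-in for x.
visit = VitalSigns(t=4, bp=BloodPressure(150.0, 90.0), heart_rate=72.0)
dose = next_dose(50, visit, threshold=140.0, increased_mg=100)
print(visit.bp.mean, dose)
```

Keeping systolic and diastolic as components of one object, and heart rate in the same timed record, preserves the links that are lost when the values sit in unrelated columns.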
Putting Detached Data into a Clinical Statement
Observation
Blood Pressure
SNOMED [BP code]
Relation:
[timed]
SubstanceAdministration
DrugTherapy
[Losartan intake details: dose, time, etc. ]
Entity / Role
ManufacturedMaterial
[Losartan ]
Relation:
[participation]
Observation
Heart Rate
LOINC [HR code]
Relation:
[timed]
Observation [Organizer]
Vital Signs
Relation:
[comp]
Relation:
[comp]
Clinical Genomics Statement
e.g., an OMIM Entry:
Despite the dramatic responses to EGFR inhibitors in
patients with non-small cell lung cancer, most patients
ultimately have a relapse. Kobayashi et al. (2005)
reported a patient with EGFR-mutant, Gefitinib-responsive,
advanced non-small cell lung cancer who had a relapse
after 2 years of complete remission during treatment with
Gefitinib. The DNA sequence of the EGFR gene in his
tumor biopsy specimen at relapse revealed the presence
of a second mutation (131550.0006). Structural modeling
and biochemical studies showed that this second mutation
led to the Gefitinib resistance.
Example: Clinical Genomic Statement
Observation
SequenceVariation
[EGFR Variant id
131550.0001]
Relation:
[cause]
Observation
ClinicalPhenotype
[responsive]
Relation:
[subject]
SubstanceAdministration
DrugTherapy
[Gefitinib intake details: dose, time, etc. ]
Entity / Role
ManufacturedMaterial
[Gefitinib ]
Relation:
[participation]
Observation
SequenceVariation
[EGFR Variant id
131550.0006]
Relation:
[cause]
Observation
ClinicalPhenotype
[resistant]
Relation:
[SAS]
OBSERVED
INTERPRETIVE
Goal is to provide it ON TIME!!
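The observed/interpretive split in this diagram can be illustrated with a small sketch. It uses plain Python dataclasses, not the HL7 Clinical Genomics schema; the class and function names are assumptions for this sketch, while the variant ids, phenotype values and drug name are the ones shown on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SequenceVariation:      # OBSERVED part
    gene: str
    variant_id: str


@dataclass(frozen=True)
class ClinicalPhenotype:      # INTERPRETIVE part
    value: str                # e.g. "responsive", "resistant"
    drug: str                 # the substance the phenotype refers to


@dataclass(frozen=True)
class GenomicStatement:
    observed: SequenceVariation
    interpreted: ClinicalPhenotype   # linked to the observation by a "cause" relation


def phenotypes_for(statements: List[GenomicStatement], drug: str) -> List[str]:
    """List the interpretations recorded for one drug, keyed by variant id."""
    return [f"{s.observed.variant_id}: {s.interpreted.value}"
            for s in statements if s.interpreted.drug == drug]


statements = [
    GenomicStatement(SequenceVariation("EGFR", "131550.0001"),
                     ClinicalPhenotype("responsive", "Gefitinib")),
    GenomicStatement(SequenceVariation("EGFR", "131550.0006"),
                     ClinicalPhenotype("resistant", "Gefitinib")),
]
summary = phenotypes_for(statements, "Gefitinib")
print(summary)
```

Because each interpretation stays attached to the observation that caused it, a later re-interpretation can replace the phenotype without touching the observed variant.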
Clinical Genomics Statement Model
Indications
Phenotypes
Omics Observation
Performers
Specimen
Genomic Source
Clinical Genomic Statement
Associated Observations
encapsulation: Key omics data
reference: Raw omics data
Observed
Interpreted
* GTR is based on constraining the HL7 Clinical Document Architecture (CDA) base standard
* CDA: Clinical Document Architecture, an HL7 standard describing the generic structure of clinical documents with narrative along with structured data following a clinical statement model.
Specializes the HL7 Clinical Statement model
Aligned with HL7 Clinical Genomics specs
Subset is used by the Genetic Testing Report (GTR)*
Narrative Structured ‘Reconciliation’
Health information language needs to
accommodate unstructured data (e.g.,
clinician's narrative or patient’s story),
while maintaining interlinks to
structured data entries corresponding
to contents that have been structured
HL7/ISO CDA (Clinical Document Architecture)
CDA
Human-to-Human
Machine-to-Machine
Printed
Bedside
…
EMR
Transcription
…
Medical Records
Transformation
…
Clinical Decision Support
Patient held-records alerts
…
CDA Overview
CDA – a generic specification
Could be used to represent various types of documents:
Consultation note
Visit / progress note
Referral letter
Discharge summary
Operative note
…
A document type is also called ‘template’ or ‘implementation guide’
[Diagram: a CDA Document has a Header and a Body; the Body holds Sections, each with Narrative and coded CDA Entries that follow the Clinical Statement model]
CDA IG: Genetic Testing Report (GTR)
Define an implementation guide for a genetic testing
report that is human readable and machine-processable
Targets all types of GTR producers, e.g., genetic labs, clinical geneticists
Readable content is larger in scope, e.g., detailed description of
the tests performed along with references
Machine-processable content should be limited, e.g., exclude raw data
Ballot a Universal IG; then derive specific types of GTR:
Healthcare & Research
Realm-specific guides
Omic-specific guides
Developed using the MDHT* open source tool
* MDHT - Model Driven Health Tool
GTR Overall Layout
Sections order constraint
Section titles constraint
Document code constraint
GTR Rendered – The Header
Draft that has not been clinically validated
GTR Rendered – Summary Section
Draft that has not been clinically validated
Genomic Observations Organizer
Clinical Genomic Statement - Overall Interpretation
Overall Interpretation
Performers
Within the Overall Interpretation Section
CGS-OI
CGS Reference
CGS Reference
CGS Reference
Test Details Section
CGS Test Details Section
CGS Test Details Section
CGS
GTR
Specific genetic testing's
GTR UML Model - Model Driven Health Tool
Sections multiplicity:
Summary SHALL appear exactly once
TestDetails SHALL appear at least once
TestInformation MAY appear once
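The three multiplicity rules above can be checked mechanically. The sketch below is a plain Python stand-in, not MDHT or OCL; the function name `check_sections` and the representation of a report as a list of section names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) occurrences per section; None means unbounded.
RULES: Dict[str, Tuple[int, Optional[int]]] = {
    "Summary": (1, 1),           # SHALL appear exactly once
    "TestDetails": (1, None),    # SHALL appear at least once
    "TestInformation": (0, 1),   # MAY appear once
}


def check_sections(sections: List[str]) -> List[str]:
    """Return one message per violated multiplicity rule (empty list if valid)."""
    counts = Counter(sections)
    errors = []
    for name, (lo, hi) in RULES.items():
        n = counts[name]
        if n < lo:
            errors.append(f"{name}: expected at least {lo}, found {n}")
        if hi is not None and n > hi:
            errors.append(f"{name}: expected at most {hi}, found {n}")
    return errors


print(check_sections(["Summary", "TestDetails", "TestDetails"]))
print(check_sections(["TestDetails"]))
```

In the actual guide such constraints are expressed in the UML model and validated by the tooling; the sketch only shows what the rules mean for a concrete document.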
The Challenge of Raw and Mass Data
Key data out of raw/mass data sets
pertaining to an individual should
be encapsulated in its native
format into clinical data structures,
where 'bubbled-up' items could be
associated with phenotypic data
(using clinical data standards)
HL7 Clinical Genomics: Encapsulate & Bubble-up
Clinical Practices
Genomic Data Sources
EHR
System
Bubble up the most clinically-significant raw
genomic data into specialized HL7 objects and
link them with clinical data from the patient EHR
Decision Support Applications
Knowledge (KBs, Ontologies, registries,
reference DBs, Papers, etc.)
the challenge…
Encapsulation by
predefined &
constrained
bioinformatics
schemas
Bubbling-up is
done continuously
by specialized CDS
applications
re-analysis
XML Fusion: Encapsulation of Raw Genomic Data
Raw genomic data represented in Bioinformatics markup
HL7 v3 XML
Family Health History – A Convergence Test Case
EHR, PHR, Genomics
Enable Decision Support, e.g., risk analysis algorithms
Proof of Concept
PHR: HHS Surgeon General FHH tool
Patient enters data
Data exported as HL7 Pedigree instance
CDS: HughesRiskApps
Patient data from Surgeon General tool is imported
Pedigree is constructed
Risk assessment algorithms run
Family Health History – HITSP Recommendations
Now recommended also by MU
Agenda
Translational Medicine and informatics
Universal Exchange Language?
Translational Health Information Language!
Semantic warehousing of health information
The vision - Independent Health Record Banks
Towards a Translational Health Information Language
Research Metadata: ISA
Scientific Knowledge: Nano-publication
Omics Data: iPOP
Bridge Standards: (e.g., GTR, DIR, PHMR)
Key Data: Encapsulated or referenced
Decision Support: Health eDecisions (HeD)
Compositional Syntax: HL7 Clinical Statement & CDA, openEHR
Constraining Syntax: ADL (AML), UML+OCL
Clinical Data: SemanticHealthNet
Layer labels: KNOWLEDGE; Raw & mass or research DATA; Point of Care DATA
Image Data: DICOM
Device Data: Continua & IEEE
Medical Terminologies: UMLS, epSOS TAS & SemanticHealthNet
Profiling: IHE
openEHR
Flow labels: provenance, findings, reasoning, utilization, Bubble-up, reasoning
Nano-publication – structuring the ‘narrative articles’ using a nanopublication format, which is the smallest
unit of publication: a single assertion, associating two concepts by means of a predicate in machine-readable
format with proper metadata on provenance and context (http://nanopub.org/wordpress/)
HeD = The US ONC S&I Framework Health eDecisions Initiative (HeD), developing CDS Services standards
and CDS content for use as “knowledge artifacts”
(http://wiki.siframework.org/Health+eDecisions+Homepage)
UMLS = Unified Medical Language System (http://www.nlm.nih.gov/research/umls/)
epSOS TAS = Terminology Access Service (http://www.epsos.eu/)
ISA = Investigation – Study – Assay (http://isa-tools.org/)
iPOP = Integrative Personal Omics Profile (http://www.ncbi.nlm.nih.gov/pubmed/22424236)
GTR = Genetic Testing Report, developed by HL7 Clinical Genomics
(http://www.hl7.org/dstucomments/showdetail.cfm?dstuid=95)
DIR = Diagnostic Imaging Report (http://www.hl7.org/implement/standards/product_brief.cfm?product_id=13)
PHMR = Personal Healthcare Monitoring Report
(http://www.hl7.org/implement/standards/product_brief.cfm?product_id=33)
SemanticHealthNet = EC FP7 project, harmonizing major standards in health
(http://www.semantichealthnet.eu/)
CDA = Clinical Document Architecture
(http://www.hl7.org/implement/standards/product_brief.cfm?product_id=7)
openEHR = Open source for EHR, based on CEN EN13606 spec for EHR (http://www.openehr.org/)
ADL = Archetype Definition Language (http://www.openehr.org/downloads/ADLworkbench/learning_about)
UML = Unified Modeling Language (http://www.uml.org/)
OCL = Object Constraint Language (http://www.omg.org/spec/OCL/)
IHE = Integrating the Healthcare Enterprise (http://www.ihe.net/)
* THIL Acronym Glossary
Nanopublication
“A nanopublication is the smallest unit of publishable information: an assertion about
anything that can be uniquely identified and attributed to its author. Nanopublications
support fine-grained attribution to authors and institutions, with the intention of
incentivising the reuse of knowledge. These assertions are organized using (a) the
domain semantics drawn from community ontologies and information models, and (b)
nanopublication representation model permitting provenance, annotation, attribution
and citation.” *
Nano- and micro-publications may
be important to machine learning
technologies, as they surface
the essence of a publication,
while the full text can still be
crunched for possible nuances
* Source: Barend Mons et al., The Open PHACTS RDF/Nanopublication Working Group V1.81 26-03-2012
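As a rough illustration of the definition quoted above, a nanopublication can be thought of as one assertion plus its attribution and provenance. The sketch below uses plain Python dataclasses rather than the RDF-based nanopublication format; the class and field names are assumptions for this sketch, and the example assertion paraphrases the OMIM entry shown earlier in this talk.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Assertion:
    subject: str      # first concept
    predicate: str    # relation between the two concepts
    obj: str          # second concept


@dataclass(frozen=True)
class Nanopublication:
    assertion: Assertion
    attributed_to: str    # attribution of the assertion
    source: str           # provenance: where the assertion was published

    def as_triple(self) -> Tuple[str, str, str]:
        a = self.assertion
        return (a.subject, a.predicate, a.obj)


nanopub = Nanopublication(
    assertion=Assertion("EGFR variant 131550.0006", "leads_to", "Gefitinib resistance"),
    attributed_to="Kobayashi et al. (2005)",
    source="OMIM",
)
print(nanopub.as_triple())
```

A machine learning pipeline could consume such triples directly, while still keeping a pointer back to the full text through the provenance fields.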
The ISA Format for Assay’s Metadata
ISA:
-Investigation
-Study
-Assay
ISA captures and
communicates the
complex metadata
required to interpret
experiments employing
combinations of
technologies, and
the associated data files
Source: Sansone et al. Toward interoperable bioscience data. Nature 2012.
HeD* CDS Guidance Service Use Case
Source: USA ONC Standards & Interoperability Framework - Use Case Development and Functional Requirements for Interoperability
CDS Guidance Service; HeD = ONC Health eDecisions initiative
HeD Lifecycle of a CDS Knowledge Artifact
Source: USA ONC Standards & Interoperability Framework - CDS Knowledge Artifact Schema Implementation Guide, October 2012
Such flagging could help machine learning systems recognize
faster what knowledge artifacts become obsolete
The rise of the ‘narciss-ome’…
Transformative paper in the Cell Journal
Reviewed in Nature
(http://www.nature.com/news/the-rise-of-the-narciss-ome-1.10240)
iPOP – Integrative Personal Omics Profile
Our personal omics change over time!
Longitudinal examinations of genome, proteome, metabolome,
autoantibodies, etc. of an individual (one of the authors)
Monitor healthy and disease states
Predict and act accordingly (the data predicted diabetes and diabetes
was diagnosed; life style changes make it manageable!)
Packaging Omics ‘Testing’ on Individual Level
Leverage a landmark paper* that presented the benefits of a variety of omics ‘tests’ done on a healthy individual
Results were packaged as an iPOP – Integrative Personal Omics Profile
Should be standardized and be part of the translational EHR!!
* iPOP paper can be found here:
https://register.mssm.edu/seminar/CLR9011/downloads/2013JUN20/8.15.13-dudley1.pdf
Rethinking Standards and Languages…
• Rethinking Domain Specific Standards
  • Wouldn’t it be better if we strive for a universal health information language?!
  • Domain standards will then be just different usages of that language
  • Current situation is that various domain standards are semantically inconsistent
• Are standards for exchange only?!
  • Natural languages are used for both communication and self writing
  • Similarly, standards could be used beyond exchange for internal representation
  • Differences between representations for exchange and internal needs often make the exchange of data not fully semantically interoperable
Agenda
Translational Medicine and informatics
Universal Exchange Language?
Translational Health Information Language!
Semantic warehousing of health information
The vision - Independent Health Record Banks
The HyperGenes Project
Integrated demographic, clinical and environmental data with
genomic data on Essential Hypertension (EH) from well-established
historical cohorts in Europe (~25)
Genome-wide Association Study (1M SNPs for each subject;
~12,000 subjects)
Created disease models that incorporated the new findings
Created a common infostructure for both research and CDS!
Translational effort!
Translational Health Info-structure
Health Data Warehouse
Data Staging Repository
(Curation, normalization, de-identification, standardization and integration)
Knowledge Analytics Hub
(rules, guidelines, functions, algorithms, literature, evidences, insights, etc.)
Annotate
HEALTHCARE
iEHR*
Biomedical Analytics
Enrich & Export
BI Analysis
Study Analysis
Data mining
New knowledge, preferably standardized
Patient/subject data, preferably standardized
CGS*-based Clinical Data
Information Marts (e.g., transMart)
Standards-based Data
Load
Images (e.g., DICOM)
Templates (e.g., C-CDA*)
Existing sources of knowledge
Patient generated data and DtC* output data
Transform
Encapsulate
reference
THIL*
Ontologies (e.g., HPO*)
Mass & Raw Data: Common formats in optimized storage (cloud-enabled)
Omics (e.g., VCF)
Sensors (e.g., IEEE)
Privacy & Security
* iEHR: Interoperable electronic health record
* C-CDA: Consolidated CDA – common templates of clinical documents
* DtC: Direct-to-Consumer testing and monitoring, etc.
* CGS: ‘Clinical Genomics Statement’ standard representation
* HPO: Human Phenotype Ontology
* THIL: Translational Health Information Language
Warehouse Information Models
A BII warehouse can have multiple information
models, each dedicated to a specific solution/project
A warehouse information model is created based on
selection of generic standards such as HL7 CDA and-
Constraining the standards
Interrelating the standards
The Hypergenes Information model is based on HL7
CDA, Pedigree and Genetic Variation standards that
were (1) constrained and (2) interrelated in a way that
there is a single CDA & Pedigree per subject and
multiple Genetic Variations (see next slide)
Hypergenes Warehouse Information Model
CDA Template
Header
subject id
….
Body
reference1 to GV
reference2 to GV
reference2 to PD
….
clinical &
environmental
observations
GV Template
subject id
Genomic
Observations
Phenotypes
Raw Genomic Data
subject id
HapMap / BSML / MAGE
Relational schemas
optimized for persistency
Encapsulation or
referencing
Pedigree Template
One instance per subject
subject id
Genotype
Phenotype
Observed Interpretive
or
*Template is a set of constraints specific to a project/solution
Health Records Disease Model
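The per-subject cardinality of this model (a single CDA and a single Pedigree per subject, with any number of Genetic Variation instances) can be expressed as a simple check. This is a sketch under assumptions: the function name `check_warehouse`, the `(subject_id, kind)` tuples and the kind labels `"CDA"`, `"Pedigree"`, `"GV"` are hypothetical and not part of the Hypergenes implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def check_warehouse(instances: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group instances by subject id and report subjects that break the model.

    instances: (subject_id, kind) pairs, with kind in {"CDA", "Pedigree", "GV"}.
    """
    per_subject = defaultdict(lambda: defaultdict(int))
    for subject_id, kind in instances:
        per_subject[subject_id][kind] += 1

    problems: Dict[str, List[str]] = {}
    for subject_id, counts in per_subject.items():
        # Exactly one CDA and one Pedigree; GV count is unconstrained.
        issues = [f"expected exactly one {kind}, found {counts[kind]}"
                  for kind in ("CDA", "Pedigree") if counts[kind] != 1]
        if issues:
            problems[subject_id] = issues
    return problems


instances = [
    ("s1", "CDA"), ("s1", "Pedigree"), ("s1", "GV"), ("s1", "GV"),
    ("s2", "CDA"), ("s2", "CDA"),
]
report = check_warehouse(instances)
print(report)
```

The subject id is the only join key in this sketch, which mirrors how the three templates on the slide all carry it.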
BII Ontology of Essential Hypertension
ETL Processes into the BII
Terminology
Servers
Data Warehouse
Normalization,
Standardization
& Validation
Build CDA Store
HL7 Persistors
ONTOLOGY
Cohort
Data
Harmonization Data Extraction
Rich Expressiveness vs. Interoperability
There’s currently a tension between the two goals:
The more expressive it is -
the less interoperable it is
Expressive structures lead to optionality
Possible solution: Constraints
Archetypes with EHR 13606 (European/ISO Standard)
HL7 Templates (no formalism has been agreed upon)
OCL is examined
GELLO (OCL-based) for clinical decision support
Public registries of templates
Need dedicated IT to provide registry services!
In research settings
Granularity, specificity and heterogeneity of data is higher
The same constraining technologies allow for
capturing similarities while preserving disparities
Cohorts harmonization could be reassessed depending on analysis results,
creating new computed fields and aggregates
Hypergenes Data Standardization
OWL Ontology
Standard-based
Instances
(e.g., CDA)
Instance
Generation
Engine
Data Source
Mapping local Vocabularies
Template Model
Conform to the
Template Model
Representing constraints
Java API
Adapter
CTS
Using MDHT (UML+OCL) to
represent and validate
constraints
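The "mapping local vocabularies" step in this pipeline can be sketched as a lookup from local column names to standard codes, producing standardized entries and a list of columns still to be mapped. Everything named here is hypothetical: the local column names, the `standardize` function, and the target codes, which are left as placeholders rather than real LOINC codes. The actual project used an OWL ontology, a terminology service and an instance generation engine rather than a dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical local-to-standard mapping; the codes are placeholders.
LOCAL_TO_STANDARD: Dict[str, Tuple[str, str]] = {
    "sbp": ("LOINC", "<systolic BP code>"),
    "dbp": ("LOINC", "<diastolic BP code>"),
    "hr": ("LOINC", "<heart rate code>"),
}


def standardize(row: Dict[str, float]) -> Tuple[List[dict], List[str]]:
    """Turn one source row into coded entries; collect columns without a mapping."""
    entries: List[dict] = []
    unmapped: List[str] = []
    for column, value in row.items():
        if column in LOCAL_TO_STANDARD:
            system, code = LOCAL_TO_STANDARD[column]
            entries.append({"system": system, "code": code, "value": value})
        else:
            unmapped.append(column)
    return entries, unmapped


entries, unmapped = standardize({"sbp": 150.0, "hr": 72.0, "bmi_local": 27.5})
print(entries)
print(unmapped)
```

Reporting unmapped columns instead of dropping them keeps harmonization decisions visible, which matters when cohorts are reassessed after analysis.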
Hypergenes BII Features for Mart Creation
User
Schema
RDF Store
RIM-based
XML Database
Mass Data
e.g., Genomic; Images; Sensor...
Non-XML format
Promotion
SPARQL API
Data Mart
Relational
Data Mart
Semantic Web Tools &
Applications
Agenda
Translational Medicine and informatics
Universal Exchange Language?
Translational Health Information Language!
Semantic warehousing of health information
The vision - Independent Health Record Banks
Motivation and Passion…
KNOWLEDGE
We don’t know much more than we know
Case-based reasoning
The case is the lifetime EHR
Health Record Banking
DATA
Individual’s Data Fragmentation
Decision making is hard!
Humans
Machines
Case-based (tacit) knowledge
Trial & error
Sustainability
From Medical Records to the EHR…
Medical records
time
content
From medicine to health…
Longitudinal, possibly life long
Cross-institutional
Medical record: Every authenticated recording of medical care (e.g., clinical documents, patient chart, lab results, medical imaging, personal genetics, etc.)
Health record: Any data items related to the individual’s health (including data such as genetic, self-documentation, preferences, occupational, environmental, life style, nutrition, exercise, risk assessment data, physiologic and biochemical parameter tracking, etc.)
Longitudinal (possibly lifetime) EHR: A single computerized entity that continuously aggregates and summarizes the medical and health records of individuals throughout their lifetime
Should also include genetic data
EHR – layers of temporal and summative data
Temporal Data
Summative Info
EHR
Evidence
Sensitivities | Diagnoses | Medications | etc.
Medical records: charts, documents, lab results, imaging, etc.
Topical
summary
Non-
redundant
lists
Ongoing extraction and summarization
Personal genetic
variations
Genetic-based
disorders
Haifa Research Lab
The Conceptual Transition
[Diagram. Current constellation: Providers, each with Operational IT Systems and a Medical Records Archive. New constellation, following New Legislation: Providers with Operational IT Systems, linked by Standard-based Communications to Independent Health Records Banks that hold the Medical Records Archive. Labels: Patient, Individual]
The End
Thanks for your attention!
Questions?
Comments: [email protected]
Towards a universal health information language
Revolutionizing healthcare through independent lifetime health records