AN OPEN-ACCESS HIGH PERFORMANCE COMPUTING SYSTEM FOR DEVELOPING RESEARCH APPLICATIONS (APPS) Mohammad Adibuzzaman, PhD Research Scientist [email protected] Mohammad Adibuzzaman 1 1 Regenstrief Center for Healthcare Engineering, Purdue University, West Lafayette, USA

AN OPEN-ACCESS HIGH PERFORMANCE COMPUTING SYSTEM FOR DEVELOPING RESEARCH APPLICATIONS (APPS)

Mohammad Adibuzzaman, PhD

Research Scientist

[email protected]

Mohammad Adibuzzaman1

1Regenstrief Center for Healthcare Engineering, Purdue University, West Lafayette, USA


OUTLINE

• Use cases

  • Hemorrhage detection

  • AF risk stratification

• Computational platform

• What's next?


Hemorrhage Detection


HEMORRHAGE, AHE AND DEATH

• Hemorrhage leads to acute hypotensive episode (AHE) or shock, and shock leads to death.


PROBLEM DESCRIPTION

• Hemorrhage results in over 80% of operating room deaths after major trauma [2]

• Almost 50% of deaths in the first 24 hours of trauma care are due to hemorrhage [2]

• Heart rate, mean arterial pressure, and shock index poorly predict the need for continued resuscitation and the effectiveness of treatment [1]

[2]

A METHOD IS NEEDED TO IDENTIFY PATIENTS THAT REQUIRE IMMEDIATE MEDICAL CARE


STATE OF THE ART

COMPENSATORY RESERVE INDEX (CRI)

• Requires ~30 heartbeats of baseline patient data

• Estimates the remaining proportion of physiological reserve available to compensate for loss of blood volume [3]

• Compares individual waveforms to a large library of reference waveforms (using lower body negative pressure (LBNP))

[3]


ANIMAL STUDY: DATA

• Immature swine (N=7)

• Underwent continuous hemorrhage of 10 ml/kg over 30 minutes while SBP was recorded [4]

• Eigenvalues were calculated for each window of 2000 samples (20 seconds)

• Correlation coefficients were determined between mixing rate and each vital sign (HR, SBP, PP, shock index)
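The correlation step above can be sketched as follows. This is a minimal illustration, not the study's code: the per-window values are made-up placeholders, `pearson` and `shock_index` are hypothetical helper names, and shock index is taken as heart rate divided by systolic blood pressure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def shock_index(hr, sbp):
    """Shock index: heart rate (beats/min) divided by systolic BP (mmHg)."""
    return hr / sbp


# Placeholder per-window values, one entry per 20-second window (NOT study data).
mixing_rate = [0.10, 0.12, 0.15, 0.19, 0.24]
hr = [95, 100, 108, 118, 130]
sbp = [110, 104, 97, 88, 80]

si = [shock_index(h, s) for h, s in zip(hr, sbp)]
r_si = pearson(mixing_rate, si)    # mixing rate vs. shock index
r_sbp = pearson(mixing_rate, sbp)  # mixing rate vs. SBP
```

The same `pearson` call would be repeated for HR and PP; only the second argument changes.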


ALGORITHM

Arterial Blood Pressure Markov Chain Transition Probability Matrix [4]

State 1 (70-75 mmHg)

State 2 (75-80 mmHg)

State 3 (80-85 mmHg)

0.82  0.18  0
0.03  0.93  0.04
0     0.11  0.89
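A minimal sketch of how such a matrix could be estimated from a pressure series and summarized by its eigenvalues. Several things here are assumptions rather than the method of [4]: the helper names are hypothetical, the matrix rows and columns are assumed to be ordered State 1 to State 3, the closed-form eigenvalue step only works for a 3-state chain, and "mixing rate" is taken as one minus the second-largest eigenvalue modulus.

```python
import cmath


def bp_state(pressure_mmhg, low=70.0, width=5.0, n_states=3):
    """Map a pressure to a 0-based state index using 5 mmHg bins from 70 mmHg."""
    index = int((pressure_mmhg - low) // width)
    return index if 0 <= index < n_states else None


def transition_matrix(states, n_states):
    """Estimate a row-stochastic transition matrix from a state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that are never left stay as a zero row rather than being guessed.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def second_eigenvalue_modulus(p):
    """Second-largest eigenvalue modulus of a 3x3 row-stochastic matrix.

    One eigenvalue of a stochastic matrix is 1, so the other two have
    sum trace(P) - 1 and product det(P), and solve a quadratic.
    """
    s = p[0][0] + p[1][1] + p[2][2] - 1.0
    d = det3(p)
    root = cmath.sqrt(s * s - 4.0 * d)
    return max(abs((s + root) / 2.0), abs((s - root) / 2.0))


# Matrix from the slide (assumed row/column order: State 1, State 2, State 3).
P = [[0.82, 0.18, 0.00],
     [0.03, 0.93, 0.04],
     [0.00, 0.11, 0.89]]

slem = second_eigenvalue_modulus(P)
mixing_rate = 1.0 - slem  # assumed definition (spectral gap)
```

For more than three states a general eigenvalue routine would be needed; the quadratic shortcut is used here only to keep the sketch dependency-free.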


ANIMAL STUDY: RESULTS

• Mixing rates from each successive transition probability matrix are compiled into a single graph [4]


ANIMAL STUDY: RESULTS

[4]

CORRELATION COEFFICIENTS


TRANSLATIONAL STUDY: CHALLENGE DATA

CINC CHALLENGE DATA 2009

• Minute-by-minute data

• Acute hypotensive episode (AHE) is defined as a period of 30 minutes or more during which at least 90% of the mean arterial pressure (MAP) measurements were at or below 60 mmHg

• Training data

  • Four groups, 15 records for each group

  • GROUP H1 (ACUTE HYPOTENSIVE EPISODE IN FORECAST WINDOW, TREATED WITH PRESSORS)

  • GROUP H2 (AHE IN FORECAST WINDOW, NOT TREATED WITH PRESSORS)

  • GROUP C1 (RECORDS NOT CONTAINING ACUTE HYPOTENSIVE EPISODES)

  • GROUP C2 (AHE, BUT NOT IN FORECAST WINDOW)
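The AHE definition above translates fairly directly into a sliding-window check. The sketch below is one possible reading, assuming exactly one MAP value per minute with no gaps; `find_ahe_onset` is a hypothetical helper name, and how missing readings should be handled is left open.

```python
def find_ahe_onset(map_values, window=30, threshold=60.0, fraction=0.9):
    """Return the start minute of the first window meeting the AHE definition.

    A window qualifies when at least `fraction` of its `window` MAP readings
    are at or below `threshold` mmHg. Returns None if no window qualifies.
    """
    if len(map_values) < window:
        return None
    low = [1 if value <= threshold else 0 for value in map_values]
    count = sum(low[:window])  # low readings in the first window
    for start in range(len(low) - window + 1):
        if start > 0:
            # Slide the window by one minute: add the new reading, drop the old one.
            count += low[start + window - 1] - low[start - 1]
        if count / window >= fraction:
            return start
    return None


# Placeholder series (NOT patient data): 20 normal minutes, 27 low, 3 borderline, 10 normal.
example_map = [75.0] * 20 + [55.0] * 27 + [65.0] * 3 + [75.0] * 10
onset = find_ahe_onset(example_map)
```

The returned index is the start of the first qualifying 30-minute window, which can precede the first low reading by a few minutes because up to 10% of the window may be above threshold.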


TRANSLATIONAL STUDY: MIMIC DATABASE

• MIMIC II Waveform Database Matched Subset [6]

• Challenge data in Matched Subset

• Clinical records (SBP, DBP, MAP, HR) – 1 reading per minute [7]

• Waveform records (ECGs, continuous blood pressure waveforms) – 125 samples per second [7]

[6]

MIMIC DATA


TRANSLATIONAL STUDY: PATIENT DATA

MIMIC II WAVEFORM DATABASE MATCHED SUBSET

• 10 minutes prior to onset of forecast window to establish baseline

• 60 minutes of data in forecast window (AHE ~30 minutes into forecast window)

Example Patient BP Waveform Data (125 Hz, ~72 hours total data, T0 known):

Observation Window (noncritical, 10 minutes)

T0 (onset of forecast window)

Forecast Window (60 minutes)

AHE of Interest (~30 minutes into Forecast Window)
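As a rough illustration of the windowing above, the observation and forecast windows can be expressed as sample ranges around T0. The function name and the idea of indexing T0 as a sample offset into the record are assumptions made for this sketch.

```python
SAMPLE_RATE_HZ = 125  # waveform sampling rate stated on the slide


def window_slices(t0_index, fs=SAMPLE_RATE_HZ, observation_min=10, forecast_min=60):
    """Return (observation, forecast) slices of sample indices around T0.

    t0_index is the sample offset of T0 (onset of the forecast window)
    within the patient's waveform record.
    """
    obs_len = observation_min * 60 * fs
    forecast_len = forecast_min * 60 * fs
    obs_start = t0_index - obs_len
    if obs_start < 0:
        raise ValueError("record does not contain a full observation window before T0")
    return slice(obs_start, t0_index), slice(t0_index, t0_index + forecast_len)


# Hypothetical T0 located 100,000 samples into the record.
observation, forecast = window_slices(100_000)
```

At 125 Hz the 10-minute observation window spans 75,000 samples and the 60-minute forecast window spans 450,000 samples, so `waveform[observation]` and `waveform[forecast]` select the two segments from any indexable sample sequence.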


PATIENT MR WAVEFORMS


PATIENT SI WAVEFORMS


PATIENT SBP WAVEFORMS


Stroke risk stratification


STROKE, AF AND MORTALITY

• Absolute risk, relative risk, and risk stratification of ischemic stroke/TIA among patients with an established diagnosis of persistent AF and paroxysmal AF.

• Atrial fibrillation (AF) has been shown to be an independent risk factor for ischemic stroke [1].

• Current approaches use scoring methods for risk assessment based on different physiological conditions, such as the CHADS2 [2, 3] and CHA2DS2-VASc [4] scores.
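For readers unfamiliar with the two scores named above, the sketch below computes them using the commonly published point assignments. The `Patient` record and its field names are hypothetical, and this is an illustration of the scoring arithmetic only, not the stratification method proposed in the talk.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical minimal record of the risk factors used by both scores."""
    age: int
    female: bool = False
    chf: bool = False                # congestive heart failure
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia: bool = False   # prior stroke, TIA, or thromboembolism
    vascular_disease: bool = False


def chads2(p):
    """CHADS2: 1 point each for CHF, hypertension, age >= 75, diabetes; 2 for prior stroke/TIA."""
    return (int(p.chf) + int(p.hypertension) + int(p.age >= 75)
            + int(p.diabetes) + 2 * int(p.prior_stroke_tia))


def cha2ds2_vasc(p):
    """CHA2DS2-VASc: adds vascular disease, age 65-74, and female sex; age >= 75 counts 2."""
    if p.age >= 75:
        age_points = 2
    elif p.age >= 65:
        age_points = 1
    else:
        age_points = 0
    return (int(p.chf) + int(p.hypertension) + age_points + int(p.diabetes)
            + 2 * int(p.prior_stroke_tia) + int(p.vascular_disease) + int(p.female))


example = Patient(age=78, female=True, hypertension=True, prior_stroke_tia=True)
scores = (chads2(example), cha2ds2_vasc(example))
```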


COHORT SELECTION


INITIAL RESULT


Data Infrastructure


RESEARCH TO TRANSLATION: BIG DATA IN HEALTHCARE

Patient data

  • EHR

  • Device

  • Genomics

Integration, De-identification, Data broker

High Performance Computing

Analytics, Visualization


RESEARCH TO TRANSLATION: BIG DATA IN HEALTHCARE

Big Data

Preprocess

High Performance Computing

Analysis/Code

Publication

Reproduce

Evidence Based Medicine/FDA Approval


JANITOR WORK?


PROPOSED ARCHITECTURE

Big Data

High Performance Computing

Analysis

Publication

Reproduce/Analysis

Publication

Evidence Based Medicine/FDA Approval


MULTI-PARAMETER INTELLIGENT MONITORING IN INTENSIVE CARE (MIMIC II)

MIMIC III

Clinical Database

• 58,000 hospital admissions

• 2001-2012

• Nurse-entered physiology

• Medications

• Laboratory data

• Nursing notes

• Discharge notes

• Format: CSV, SQL

• ~40 GB

Waveform Database

• 23,180 records

• 2001-2012

• Waveforms

  • ECG

  • Blood pressure

  • Plethysmography

• Format: Text, Matlab

• ~3 TB compressed

Matched Subset

• 4,897 waveform and 5,266 numeric records matched with 2,809 clinical records


MIMIC III ACCESS PLATFORM

• Clinical

  • PostgreSQL

  • CSV

• Waveform

  • PhysioBank ATM (one record at a time)

  • Rsync (batch); install rsync on Ubuntu with the command

    • sudo apt-get -y install rsync

  • Matlab WFDB (Waveform Database) toolbox

    • rdsamp('mimic2wdb/31/3141595/3141595_0008')


LIMITATIONS OF CURRENT PLATFORM

1. High-level browsing and exploration of the database

  • How many patients with Acute Kidney Injury?

2. Integration of heterogeneous data sources

  • SQL and waveform or text

3. Cohort selection according to research goal based on clinical criteria

  • At least 8 hours of continuous minute-by-minute HR and BP trend within the first 24 hours of admission

4. Reproduce different machine learning and statistical algorithms

  • Logistic Regression

  • Multivariate Regression

  • Artificial Neural Network

5. No parallelism
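The cohort criterion in item 3 (at least 8 hours of continuous minute-by-minute HR and BP within the first 24 hours) is the kind of rule the platform would need to evaluate per patient. A minimal sketch follows, assuming each trend is a list with one entry per minute since admission and `None` marking a missing reading; both function names are hypothetical.

```python
from itertools import islice


def longest_complete_run(hr, bp, limit_minutes=24 * 60):
    """Longest stretch (in minutes) within the limit where HR and BP are both present."""
    best = current = 0
    for h, b in islice(zip(hr, bp), limit_minutes):
        if h is not None and b is not None:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a gap in either signal breaks continuity
    return best


def meets_cohort_criterion(hr, bp, min_hours=8):
    """True if the patient has at least min_hours of continuous HR and BP in the first 24 hours."""
    return longest_complete_run(hr, bp) >= min_hours * 60


# Placeholder trends (NOT patient data): 8 complete hours, one gap, then more data.
hr_trend = [80] * 480 + [None] + [82] * 100
bp_trend = [120] * 581
eligible = meets_cohort_criterion(hr_trend, bp_trend)
```

Run once per patient, this is an embarrassingly parallel scan, which is where the lack of parallelism noted in item 5 becomes a bottleneck.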


RESEARCH WITH MIMIC DATABASE

TABLE 1. RESEARCH PROJECTS WITH MIMIC DATABASE. ‘C’ FOR CLINICAL AND ‘W’ FOR WAVEFORM DATABASE.

Citation | Research Problem | Methods | Cohort Selection Criteria | Data | Population Size
[5] | Mortality prediction with acute kidney injury (AKI) | Multivariable Regression | ICD9 = AKI and ICU stay ≥3 days | C | 1,400
[6] | Local customized mortality prediction; outcome is survival to hospital discharge | Logistic Regression (LR), Bayesian Network (BN), Artificial Neural Network (ANN) | ICD9 = acute kidney injury (AKI) AND/OR ICD9 = subarachnoid hemorrhage (SAH) | C | 1,400 for AKI, 223 for SAH
[7] | Association of hypermagnesemia and systolic blood pressure (SBP) | Sequential Multivariable Linear Regression | Different exclusion criteria based on missing data, clinical diagnosis | C | 10,521
[8] | Whether age, co-morbidities and clinical context modulate the effect of transfusion on survival; outcome is 30-day or 1-year mortality | Logistic Regression | Admitted to MICU, SICU, CCU, or CSRU; occurrence of nadir hematocrit between 20% and 30% and age ≥ 30 | C | 9,809
[9] | Study the link between proton-pump inhibitor (PPI) use and low serum magnesium concentration | Sequential Multivariable Regression | Exclusion criteria include ICD9 codes that might influence PPI use; outcome is first serum magnesium level recorded within 36 h of admission | C | 11,490
[10] | Compare general and fluid-based resuscitation and vasopressor use in ICU | Univariate Regression | Two disease-based groups: 802 with pneumonia and 143 with diagnosis of pancreatitis; inclusion criteria include receiving >250 mL/hour solution for more than one hour | C | 2,944

Most of the studies use only the Clinical database


PROPOSED ARCHITECTURE

• Platform

  • Clinical

    • PostgreSQL

  • Waveform

    • SciDB

  • Integration

    • R

  • Interface

    • R/Shiny

• SciDB Capabilities

  • CROSS_JOIN: Combine two arrays, aligning cells with equal dimension values

  • MERGE: Union-like combination of two arrays

  • WINDOW: Apply aggregates over a moving window

    • window(input, NUM_PRECEDING_X, NUM_FOLLOWING_X, NUM_PRECEDING_Y..., aggregate(ATTNAME) [as ALIAS] [, aggregate2...])

  • SORT: Unpack and sort

  • UNIQ: Select unique elements from a sorted array

  • KENDALL, PEARSON, SPEARMAN: Correlation metrics

  • Distributed Computing
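To make the WINDOW capability concrete, here is a plain-Python sketch of a one-dimensional moving-window aggregate with "preceding" and "following" counts, mirroring the shape of the operator signature above. It is not SciDB code, and the edge handling (windows truncated at the array boundaries) is an assumption of this sketch rather than a statement about SciDB's behavior.

```python
def window_aggregate(values, num_preceding, num_following, aggregate):
    """Apply `aggregate` over a moving window along one dimension.

    For each position i the window covers i - num_preceding .. i + num_following,
    truncated where it would run past either end of the array.
    """
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i - num_preceding)
        hi = min(n, i + num_following + 1)
        result.append(aggregate(values[lo:hi]))
    return result


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


# Placeholder heart-rate samples (NOT patient data), smoothed with a 3-point window.
heart_rate = [80, 82, 90, 85, 83]
smoothed = window_aggregate(heart_rate, 1, 1, mean)
```

In the proposed platform the equivalent computation would run inside the array database, distributed across instances, instead of pulling the full series into a client process.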


PROPOSED ARCHITECTURE

Waveform Database: SciDB (Distributed DB), ICU Time Series

Clinical Data: Postgres (Single Server DB)

Bash/Python

'R'/Shiny


WAVEFORM DATABASE DESIGN IN SCIDB

MIMIC_Metadata

• Dimension: File_ID

• Attributes: Start_Time: datetime, mimiciii_id: int32

MIMIC_Numeric

• Dimensions: File_ID, Elapsed_Time

• Attributes: II: float, V: float, resp: float, …
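The two arrays share the File_ID dimension, so a cross-join on it attaches the start time and clinical identifier to every sample. The sketch below imitates that join with plain Python dictionaries. The sample values, the dates, and the assumption that Elapsed_Time counts seconds from Start_Time are all placeholders chosen for illustration.

```python
from datetime import datetime, timedelta

# MIMIC_Metadata: one cell per File_ID (placeholder values, NOT real records).
metadata = {
    1: {"start_time": datetime(2101, 1, 1, 8, 0, 0), "mimiciii_id": 42},
}

# MIMIC_Numeric: one cell per (File_ID, Elapsed_Time).
numeric = {
    (1, 0): {"II": 0.12, "V": 0.30, "resp": 0.05},
    (1, 1): {"II": 0.15, "V": 0.28, "resp": 0.06},
    (2, 0): {"II": 0.11, "V": 0.31, "resp": 0.04},  # no metadata for file 2
}


def join_on_file_id(numeric, metadata, seconds_per_step=1.0):
    """Align numeric cells with metadata cells that have the same File_ID.

    Cells whose File_ID has no metadata are dropped, as in an inner join.
    """
    rows = []
    for (file_id, elapsed), attrs in sorted(numeric.items()):
        meta = metadata.get(file_id)
        if meta is None:
            continue
        timestamp = meta["start_time"] + timedelta(seconds=elapsed * seconds_per_step)
        rows.append({
            "file_id": file_id,
            "mimiciii_id": meta["mimiciii_id"],
            "timestamp": timestamp,
            **attrs,
        })
    return rows


joined = join_on_file_id(numeric, metadata)
```

Keeping Start_Time in a small metadata array and storing only Elapsed_Time with the samples avoids repeating a timestamp for every waveform value.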


HARDWARE

• 12 cores (24 hyperthreaded cores)

• 6 TB disk

• 64 GB RAM

• 8 instances of SciDB


USE CASE ONE

• https://mimic.catalyzecare.org:3838/sample-apps/bcollar/measurementErrors/


USE CASE TWO

• https://mimic.catalyzecare.org:3838/sample-apps/zhou482/Adverse%20Effect/


ISSUES TO BE ADDRESSED

• Sustainability

• Privacy/Security

• Scalability


ACKNOWLEDGEMENT

• Roger Mark, Professor, MIT

• Alistair Johnson, Post-doctoral Researcher, MIT

• Elias Bareinboim, Assistant Professor, Purdue University

• Yonghan Jung, PhD Candidate, Purdue University

• Yiyan Zhou, Undergraduate Student, Purdue University

• Brett Collar, Undergraduate Student, Purdue University

• Yao Chen, PhD Candidate, Purdue University

• Ananth Grama, Professor, Purdue University


QUESTIONS