Saritor: A Hadoop Ecosystem to Advance Clinical Research and Practice


Introduction

Facebook, Twitter, LinkedIn and Yahoo share the same underlying infrastructure, Apache Hadoop. All four of these applications consume, process and store millions of records consisting of structured, unstructured, image and video data. Because healthcare data shares many of these characteristics, Hadoop should be an ideal environment for ingesting, storing and using healthcare data.

Methods

A virtual Apache Hadoop version 1.0 infrastructure consisting of a single NameNode server and four Task Node servers was set up within the UCI Medical Center data center. Ubuntu Linux running on VMware was the chosen OS. The Hadoop modules used were Hadoop Common, Hadoop Distributed File System (HDFS), MapReduce, Pig, Mahout and ZooKeeper. Java scripted routines processed the legacy data. A Mirth HL7 listener and a Java scripted routine processed the HL7 data.

Results

The legacy data of 1.2 million patients, contained in 9 million patient medical records, was successfully ingested into the Saritor Hadoop Distributed File System. For researchers, the drag-and-drop query and visualization tool allowed visualization of the legacy data. For clinicians in patient care, complete patient records were retrieved via a web browser. HL7 messages from all source systems, physiological monitoring data in one-minute intervals, ventilator data in one-minute intervals and EMR-generated data were ingested and stored. Algorithms for sepsis, hospital-acquired conditions and 30-day readmissions can be built into Mahout for real-time surveillance.

Discussion

Our initial findings demonstrated that the Hadoop ecosystem is well suited for the ingestion, storage and retrieval of both legacy EMR data and runtime EMR data. Minimal programming is required to process legacy data, and processing runtime EMR data requires cloning existing interfaces. Real-time clinical surveillance supports a wide range of use cases. Hadoop is an ecosystem that is affordable, scalable and highly available, and it allows clinical research and clinical practice to coexist in the same system.
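The Methods describe HL7 data arriving through a Mirth listener and a scripted routine. The poster does not show that routine, so the following is only a minimal Python sketch of the general idea: splitting a pipe-delimited HL7 v2 message into segments and pulling numeric observations into a flat record. The sample message, the function names and the record layout are all assumptions made for illustration and are not taken from the Saritor system.

```python
# Illustrative only: a hand-written HL7 v2 message, not data from the Saritor system.
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|MONITOR|ICU|SARITOR|UCI|201301011200||ORU^R01|MSG0001|P|2.3",
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "OBX|1|NM|HR^Heart Rate||72|bpm",
    "OBX|2|NM|RR^Respiratory Rate||16|breaths/min",
])


def parse_segments(message):
    """Split an HL7 v2 message into (segment_id, fields) pairs.

    Segments are separated by carriage returns and fields by '|'.
    """
    segments = []
    for line in message.split("\r"):
        if line:
            fields = line.split("|")
            segments.append((fields[0], fields))
    return segments


def extract_observations(message):
    """Return a flat record with the patient ID and numeric OBX observations."""
    record = {"patient_id": None, "observations": {}}
    for segment_id, fields in parse_segments(message):
        if segment_id == "PID":
            # PID-3 holds the patient identifier list; keep the first component.
            record["patient_id"] = fields[3].split("^")[0]
        elif segment_id == "OBX" and fields[2] == "NM":
            code = fields[3].split("^")[0]  # OBX-3: observation identifier
            record["observations"][code] = float(fields[5])  # OBX-5: value
    return record


record = extract_observations(SAMPLE_MESSAGE)
print(record)
```

A production interface would also need to handle escape sequences, repeated fields and acknowledgements, which this sketch deliberately leaves out.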

Charles Boicey, MS, RN-BC1, Lisa Dahm, PhD1, David Gonzalez1, Mahesh Rangarajan2, Rushipriya Panda2, Jeff Markham3

1University of California, Irvine, 2CMC Americas, 3Hortonworks


The Clinical and Translational Science Awards (CTSA) is a registered trademark of DHHS.

Feed-forward Learning

Create layers of knowledge that improve the understanding, one layer at a time.

Figure elements:
• New Learning (Pattern Refinement)
• Historical Data Sets
• Hypothesis / Algorithm Model (Core Engine with the Equations / Analysis)
• Statistical Techniques
• Publish new version to Repository
• Output / Results (Actual)
• Input Data Attributes, Rules, Parameters
• Real-time Data Feeds
• Training and Test Data Sets for testing the model hypothesis
• Use the new baseline for real-time analysis of the incoming feeds

Modeling Possibilities:
• Linear Equation (to start with)
• Regression Models (Linear / Multivariate)
• Neural Networks (Layers of knowledge)
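The first modeling possibility listed above is a linear equation fitted on a training set and checked on a test set. A minimal sketch of that idea in plain Python follows. The data points, the every-third-point hold-out split and the function names are invented for illustration; the poster does not specify how the actual models were fitted.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predicted and actual values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical historical data, made up for illustration: y is roughly 2x + 1.
historical = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.0), (4, 9.1), (5, 10.9)]

# Hold out every third point as the test set; train on the rest.
train = [p for i, p in enumerate(historical) if i % 3 != 2]
test = [p for i, p in enumerate(historical) if i % 3 == 2]

model = fit_line([x for x, _ in train], [y for _, y in train])
test_error = mean_squared_error(model, [x for x, _ in test], [y for _, y in test])
print("slope, intercept:", model)
print("test MSE:", test_error)
```

A low error on the held-out points is what would justify treating the fitted line as the new baseline for incoming feeds; a multivariate regression or neural network would replace `fit_line` while keeping the same train and test split.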

Algorithm Management

Available Data Set
• Training Data Set
• Test Data Set

Diagnosis Patterns Repository

Training Data Set flow:
• Input Data Attributes, Rules, Parameters
• Hypothesis / Algorithm Model (Core Engine with the Equations / Analysis), using Statistical Techniques
• Output / Results (Actual)
• Analyze Output for Model Behavior (Actual versus Desired)
• Not Satisfactory: Identify Improvements, then Feedback and Refine the Model
• Satisfactory Result: Matches Expectation, then Release for Testing the Model

Test Data Set flow:
• Input Data Attributes, Rules, Parameters
• Hypothesis / Algorithm Model (Core Engine with the Equations / Analysis), using Statistical Techniques
• Output / Results (Actual)
• Analyze Output for Model Behavior (Actual versus Desired)
• Not Satisfactory: Identify Improvements, then Feedback and Refine the Model
• Satisfactory Result: Matches Expectation, then Baseline the Pattern and Publish new version to Repository
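The refine, test and publish cycle in the Algorithm Management figure can be sketched as a small loop. The example below is a rough Python illustration under stated assumptions: the "model" is a single alert threshold, "Identify Improvements" is a one-step search around the current threshold, and the repository is an in-memory list. None of these choices, names or data values come from the poster.

```python
def accuracy(threshold, samples):
    """Fraction of samples where (value >= threshold) matches the desired label."""
    hits = sum((value >= threshold) == desired for value, desired in samples)
    return hits / len(samples)


def refine(threshold, samples, step=1.0):
    """Identify an improvement: try one step up or down and keep the best scorer.

    The current threshold is listed first so that ties leave it unchanged.
    """
    candidates = [threshold, threshold + step, threshold - step]
    return max(candidates, key=lambda t: accuracy(t, samples))


def manage_algorithm(initial, training, test, repository, target=0.9, max_rounds=20):
    """Refine on training data, then test; publish a new version if both pass."""
    threshold = initial
    for _ in range(max_rounds):
        if accuracy(threshold, training) >= target:
            # Satisfactory on training data: release for testing.
            if accuracy(threshold, test) >= target:
                # Matches expectation: baseline the pattern and publish it.
                repository.append({"version": len(repository) + 1, "threshold": threshold})
                return True
            return False
        refined = refine(threshold, training)
        if refined == threshold:
            return False  # no further improvement found
        threshold = refined
    return False


# Hypothetical (value, desired_alert) pairs, made up for illustration.
training = [(2.0, False), (3.0, False), (4.0, False), (6.0, True), (7.0, True), (8.0, True)]
test = [(1.0, False), (3.5, False), (6.5, True), (9.0, True)]

repository = []
published = manage_algorithm(2.0, training, test, repository)
print(published, repository)
```

Keeping the test set out of the refinement step mirrors the separation between the two flows in the figure: the test data only decides whether a version is published, never how the model is adjusted.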

Cohort Discovery

Legacy Data Visualization

Algorithm Management

Feed-forward Learning

Hadoop Distributed File System (HDFS)

Hive

User/Role Based Access Control

Neo4j: Graph Database

Mahout: Compute pattern

MapReduce: Generate and filter raw data from HDFS
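The MapReduce component above is described as generating and filtering raw data from HDFS. The pattern can be illustrated without a cluster: the Python sketch below runs a map phase that filters records, a shuffle that groups by key, and a reduce phase that aggregates. The in-memory list stands in for files on HDFS, and the record layout (patient_id|record_type|value) and function names are hypothetical; a real job would be written against the Hadoop MapReduce API.

```python
from collections import defaultdict

# Hypothetical pipe-delimited raw records, made up for illustration.
RAW_LINES = [
    "P001|LAB|lactate=2.1",
    "P001|ORDER|blood culture",
    "P002|LAB|lactate=4.3",
    "P001|LAB|wbc=13.2",
    "malformed line",
]


def map_phase(line):
    """Emit (patient_id, 1) for each well-formed LAB record; filter out the rest."""
    parts = line.split("|")
    if len(parts) == 3 and parts[1] == "LAB":
        yield parts[0], 1


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one patient."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce over the input lines and return a dict."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


lab_counts = run_job(RAW_LINES)
print(lab_counts)
```

Because each line is mapped independently and each key is reduced independently, the same logic can be spread across many nodes, which is what makes the approach scale to millions of records.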

TDS (Legacy System)
• 22 Years Patient Data
• 1.2M Patients
• 9M Records
• Orders
• Labs
• Transcribed Results
• Patient Record

HL7 Feed
• Lab Results
• Physiological Monitors
• Ventilators
• Transcribed Reports
• Radiology Results
• Endoscopy Results
• Orders

EMR Generated Data
• RN Documentation
• Provider Documentation

External Data
• Home Monitoring
• Personal Health Record
• Social Media: Twitter, Foursquare, Yelp, RSS & Blog

MongoDB: Store data matrix for pattern recognition

Query Language

Clinician Viewer
• Events (Sepsis) / Chronic Disease Monitoring
• Legacy Data Viewer
• Predictive Analytics

Research Viewer
• Legacy + EMR Data
• Cohort Discovery
• Relationship / Graph Analysis
• De-identified at presentation

Quality/Operations Viewer
• Patient Throughput (RTLS)
• Quality Measures
• Patient Engagement
• Asset Utilization Metrics

Saritor Business Services: Request / Reply processing Engine (HTML 5 / RESTful Services / JSON driven)

External Interfaces: "Saritor Surround" Ecosystem
