RapidMiner
An entrance to explore MIMIC-III?
Sven Van Poucke, MD
December 8th 2015
Disclosure
Relevant financial arrangement or affiliation with RapidMiner that could be perceived as a real or apparent conflict of interest in the context of the subject of this presentation:
Academia License: RapidMiner Studio, Server, Radoop
“RapidMiner Academia provides free or substantially discounted use of the commercial version of our platform to students, professors, researchers and other academics at educational institutions.”
No reimbursement for consultancy and/or travel expenses.
François Englert, theoretical physicist, 2013 Nobel Prize laureate (shared with Peter Higgs)
Christian de Duve, Nobel Prize 1974
(shared with Albert Claude and George E. Palade)
Corneille Heymans
1938
1921
Paul Janssen 1926–2003
Ziekenhuis Oost-Limburg, Genk, Belgium
Big data: a gap between actual and potential data usage?
• healthy living, wellness, sport
• health care as industrial process
• public health
• MEDICINE
data scientist
budget
domain expert
“Evidence 1.0”
• Physicians as non-coding scientists.
• The struggle to apply new medical knowledge….
• Most evidence regarding the effectiveness of medical innovations has been generated by studies involving patients who differ from my patients!
Cross Industry Standard Process for Data Mining, CRISP-DM
Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing 2000; 5:13-22
“Evidence 2.0”
Patient characteristics, descriptive statistics…
Correlation, causation? Feature selection….
Modeling, validation, ensemble methods. Helps model future (vaccination)
Shows how different actions will affect therapeutic performance and points clinicians toward the optimal choice.
https://data-analytics.ghost.io/this-is-data-science/
Querying MIMIC-III(II)
Design perspective main screen
“Evidence 2.0”
Querying MIMIC-III
• 1. Retrieve tables from Repository
Querying MIMIC-III
• 2. Read database (SQL)
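The "Read database (SQL)" step amounts to sending a query to the database that holds MIMIC-III. The sketch below is not RapidMiner; it only illustrates, in plain Python with an in-memory SQLite database, the kind of query such a step might run. The table and column names (`patients`, `admissions`, `subject_id`, `hadm_id`, `admission_type`, `hospital_expire_flag`) follow the public MIMIC-III schema, but every row is invented for illustration and the choice of query is an assumption.

```python
import sqlite3

# Toy stand-in for a MIMIC-III database. Column names follow the MIMIC-III
# schema; all rows are invented and carry no clinical meaning.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE admissions (
    hadm_id INTEGER PRIMARY KEY,
    subject_id INTEGER,
    admission_type TEXT,
    hospital_expire_flag INTEGER
);
INSERT INTO patients VALUES (1, 'F'), (2, 'M'), (3, 'M');
INSERT INTO admissions VALUES
    (100, 1, 'EMERGENCY', 0),
    (101, 2, 'EMERGENCY', 1),
    (102, 2, 'ELECTIVE', 0),
    (103, 3, 'EMERGENCY', 0);
""")

# Count admissions and in-hospital deaths per gender for one admission type.
QUERY = """
SELECT p.gender,
       COUNT(*) AS n_admissions,
       SUM(a.hospital_expire_flag) AS n_deaths
FROM admissions AS a
JOIN patients AS p ON p.subject_id = a.subject_id
WHERE a.admission_type = ?
GROUP BY p.gender
ORDER BY p.gender
"""

rows = conn.execute(QUERY, ("EMERGENCY",)).fetchall()
for gender, n_admissions, n_deaths in rows:
    print(gender, n_admissions, n_deaths)
conn.close()
```

Passing the admission type as a bound parameter (`?`) rather than pasting it into the SQL string keeps the query reusable for other values.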
Querying MIMIC-III
• 3. Access by Radoop-Hadoop ecosystem
Radoop-Hadoop
Descriptive analytics
• Sample
• Filter
• Selection
• Joins
• Generate attributes
• Charts
• ….
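The operations listed above (join, generate attribute, filter, selection, sample) can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration of what each step does to the data: the rows are invented, and the simplified columns `birth_year`, `admit_year` and `los_days` are hypothetical stand-ins rather than actual MIMIC-III fields.

```python
import random

# Invented rows loosely shaped like MIMIC-III PATIENTS and ADMISSIONS.
# birth_year, admit_year and los_days are hypothetical simplifications.
patients = [
    {"subject_id": 1, "gender": "F", "birth_year": 1950},
    {"subject_id": 2, "gender": "M", "birth_year": 1975},
    {"subject_id": 3, "gender": "M", "birth_year": 1940},
]
admissions = [
    {"hadm_id": 100, "subject_id": 1, "admit_year": 2010, "los_days": 4.0},
    {"hadm_id": 101, "subject_id": 2, "admit_year": 2011, "los_days": 12.5},
    {"hadm_id": 102, "subject_id": 3, "admit_year": 2012, "los_days": 1.5},
    {"hadm_id": 103, "subject_id": 3, "admit_year": 2013, "los_days": 9.0},
]

# Join: attach patient attributes to each admission via subject_id.
by_subject = {p["subject_id"]: p for p in patients}
joined = [{**a, **by_subject[a["subject_id"]]} for a in admissions]

# Generate attribute: derive age at admission from two existing columns.
for row in joined:
    row["age_at_admission"] = row["admit_year"] - row["birth_year"]

# Filter: keep admissions with a length of stay above 3 days.
filtered = [row for row in joined if row["los_days"] > 3]

# Selection: keep only the attributes needed downstream.
keep = ("hadm_id", "gender", "age_at_admission")
selected = [{k: row[k] for k in keep} for row in filtered]

# Sample: draw 2 rows without replacement, seeded for repeatability.
sample = random.Random(0).sample(selected, 2)

for row in selected:
    print(row)
```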
Descriptive analytics: sample
Descriptive analytics
Descriptive analytics: generate attribute
Descriptive analytics
Descriptive analytics: filter
“Evidence 2.0”
Diagnostic analytics
Predictive analytics
Diagnostic analytics
“Evidence 2.0”
Predictive analytics
Predictive analytics
• Ensemble learning methods in RapidMiner
(Decision Stump, AdaBoost, Random Forest, Bagging, W-J48, Decision Tree, Naive Bayes, Stacking, Logistic Regression, Support Vector Machine).
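To make two of the listed methods concrete, the sketch below combines decision stumps with bagging: each stump is trained on a bootstrap resample of the data, and the ensemble predicts by majority vote. It is a generic pure-Python illustration, not the RapidMiner operators themselves; the two features (age, lactate) and all values are invented, and the function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Return (feature, threshold, left_label, right_label) with fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            # An empty right side falls back to the left-side label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_bagging(rows, labels, n_estimators=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


# Invented toy data: (age in years, lactate in mmol/L) with a binary outcome.
X = [(65, 1.0), (70, 1.2), (50, 0.9), (80, 4.5),
     (60, 5.0), (75, 3.8), (55, 1.1), (68, 4.2)]
y = [0, 0, 0, 1, 1, 1, 0, 1]

stumps = fit_bagging(X, y)
print(predict_bagging(stumps, (82, 5.2)))
print(predict_bagging(stumps, (48, 0.8)))
```

A single stump is a weak learner; voting over many stumps trained on different resamples reduces the variance of the prediction, which is the idea behind bagging and Random Forest.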
Predictive analytics
• Precision-recall curves (AUPRC) for the 3 best models.
Random Forest (RF) in association with Backward Selection (BS) and 69 features (left), with Forward Selection (FS) and 8 features (middle) and Gini Selection (GS) and 5 features (right).
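AUPRC is the area under the precision-recall curve. One common step-wise estimate of that area is average precision: rank the cases by predicted score and average the precision observed at each true positive. The sketch below implements that estimate in plain Python; it is not the computation used for the curves on the slide, the function name is hypothetical, the labels and scores are invented, and tied scores are not given any special treatment.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 0/1 ground truth; scores: predicted scores (higher = more positive).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank cases from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive is retrieved.
            total += true_positives / rank
    return total / n_pos


# Invented example: 3 positives among 6 cases.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
ap = average_precision(labels, scores)
print(round(ap, 4))
```

Unlike the ROC curve, the precision-recall curve ignores true negatives, which is why it is often preferred when the outcome of interest is rare.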
Predictive analytics
Prescriptive analytics
• Clinical decision support
• Clinical decision automation
What is next?
• Text mining
• Image mining
• Time series
• Web mining
Web Mining: Twitter
RapidMiner Streams
RapidMiner Server
• The Collaboration tier of RapidMiner Server supports a team of data scientists and analysts.
• It provides a server-based repository for sharing data sources, analytical processes, predictive applications, and best practices.
• Work together more efficiently while accessing, reusing, and sharing content in a version-controlled, secure, and centrally managed environment.
RapidMiner Server
• The RapidMiner Server Computation tier is optimized for performance and lets you run big jobs on enterprise hardware anywhere. With a few clicks you can push jobs to the server and continuously get progress feedback, while freeing up your personal system for other important work.
Future projects: RM server
• RapidMiner Server Deployment tier. Easily integrate with cloud, business, and IT systems through an open API, a rich set of connectors, and a unique ability to deploy processes as web services. Set up scheduled processing, continuously score data in real-time, and graphically build interactive predictive and visual web applications to maximize the value of predictive analytics in your business.
RapidMiner Dashboard
Sven Van Poucke, MD Department of Anesthesiology, Critical Care, Emergency Medicine, Pain Therapy
Ziekenhuis Oost-Limburg Genk, Belgium