Health Risk Prediction via Mining Big Health Data
Vincent S. Tseng
Department of Computer Science, National Chiao Tung University
Taiwan
Taiwan-Italy bilateral workshop on Smart City, 2015
-
2
Emerging Needs in Smart City Development
-
3
Outline
Brief Bio Sketch
Introduction
  Traditional Health Risk Assessment
  Health Risk Mining and Prediction
Some Recent Developments
  Large-Scale Population-based Health Data Mining
  Disease Risk Patterns Mining
  Health Risk Mining for Chronic Diseases
  Early Prediction & Monitoring for Disease Outbreak
Concluding Remarks
-
4
Brief Bio Sketch
Dr. Vincent S. Tseng
Professional Positions
  Professor, Dept. of Computer Science, National Chiao Tung University, Taiwan
  Chair, IEEE Computational Intelligence Society (CIS) Tainan Section
  Review board member for government units in Taiwan, including the Ministry of Science and Technology and the Ministry of Health and Welfare
  Editorial board: IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Journal on Biomedicine and Health Informatics (JBHI), ACM Transactions on Knowledge Discovery from Data (TKDD), etc.
  President, Taiwanese Association for Artificial Intelligence (2011-2012)
  Director, Institute of Medical Informatics, NCKU, Taiwan (2008/8-2011/7)
  Director, Medical Informatics Center, NCKU Hospital (2004-2007)
Has published more than 300 papers in refereed journals/conferences related to data mining and intelligent computing
Has held/filed more than 15 patents in the USA and ROC
Has conducted 50+ academic/industrial research projects
-
5
Main research topics in my research group
Development of Intelligent and High-Performance Big Data Mining Techniques
Applications in Emerging Domains
  Biomedical/Health Data Mining
  Mobile and Social Network Data Mining
  Multimedia Data Mining
  Manufacturing Data Mining
  Etc.
-
6
Big Data Mining Platform
[Figure: architecture of the big data mining platform on top of a Cloud, GPU & Stream Computing Platform (label: C++).
Data access: ODBC, direct, or custom sources through a Data Access C API.
Components: Data Preparation Components; Mining Engine Components (Association Rules, Sequential Patterns, Clustering, Classification); Rules Retrieve Components; Prediction Components; Applications Module.
Process: Input Data, Data Access, Data Preparation, Data Modeling, Presentation, Deploy.
Outputs: Interesting Patterns, Clusters, Associations, Predictive Models, Reports, Models/Rules.]
-
7
Applications in Various Domains
-
8
A large-scale research initiative in 2012 aimed at:
  Innovations around smartphone-based research
  Collecting smartphone data in everyday life conditions
  Community-based evaluation of related mobile data analysis methodologies
Data source: Lausanne Data Collection Campaign
-
9
User Profile/Behavior Modeling and Prediction
Personal information: media files, calendar, applications
Social information: call log, contacts, Bluetooth
Device information: process, accelerometer, system information
Location information: GSM, WLAN, sequence of place visits
-
10
-
11
Biomedical/Health Data Mining
Gene Expression Mining: association pattern analysis, clustering, time-series analysis
Protein Expression Mining: mass spectrometry, LC/MS mining
Biomarker & Health Risk Mining: vital sign analysis (ECG/EEG), disease biomarker analysis, health risk assessment, tele-care platform, patient behavior mining
Gene Regulation Network Analysis
Protein Structure Mining
Data Mining Techniques: association rule mining, sequential pattern mining, clustering, classification, time-series analysis, etc.
-
12
Trends of Medicine & Healthcare
Medicine
  Personalized Medicine: personalized treatment
  Preventive Medicine: early detection
Preventive Healthcare
  Personalized preventive risk assessment
-
13
Health Risk Assessment (HRA) - General health examination -
[Figure: the examinee takes a health examination; doctors assess and interpret the health examination report and give suggestions. Sample report item: Waist-Hip Ratio (male ...) 0.88.]
Is there some potential health risk?
In a general health examination, health conditions are diagnosed and assessed simply as normal/abnormal by lab results.
Lack of predictive assessment of health risk.
-
14
Predictive HRA - Scenario -
[Figure: historic health examination data feed the HRA System, shown alongside the Traditional Health Examination System.]
Traditional Health Examination Report: Mr. Chen
Examined Date | Weight | Height | Blood Pressure
2009/05/03 | 59 kg | 165.0 cm | 120~70
2008/04/04 | 65 kg | 165.2 cm | 110~70
2007/10/16 | 64 kg | 165.1 cm | 120~70
2005/09/30 | 61 kg | 165.2 cm | 110~70
-
15
HRA System
Traditional Health Report
Integrated Report
Doctor
Mr. Chen
Diabetes Prediction Model
Pneumonia Prediction Model
Heart Disease Prediction Model
Apoplexy Prediction Model
Malignancy Prediction Model
Predictive HRA (cont.)- Scenario-
-
16
Disease Prediction Related Work
[Huang et al., 2007]
[Figure: a feature set from clinical examination goes through feature selection (physician selection) and feature weighting, then classification with Naive Bayes, C4.5, and IB1; output classes: Good / Bad.]
-
17
Intelligent Database Laboratory, CSIE, NCKU
Disease Prediction Related Work (cont.)
-
18
Disease Prediction Related Work (cont.)
[Palaniappan et al., 2008] Intelligent Heart Disease Prediction System (IHDPS)
-
19
Disease Prediction Related Work (cont.)
[Patil et al., 2009]
[Figure: process of the Heart Attack Prediction System. Labels: heart disease databases, preprocessing, clustered using K-means, frequent pattern mining, weighted.]
-
20
Predictive Risk Study of Hepatocellular Carcinoma (HCC) - [Chen et al. JAMA06]
Prospective cohort study of 3653 participants (aged 30-65) in Taiwan
Six main indicators:
  Sex
  Age
  Cigarette smoking
  Alcohol consumption
  Serostatus for the hepatitis B e antigen (HBeAg)
  Serum alanine aminotransferase level
Elevated serum HBV DNA level (≥10,000 copies/mL) is a strong risk predictor of hepatocellular carcinoma, independent of HBeAg, serum alanine aminotransferase level, and liver cirrhosis.
-
21
Taiwan offers new models to predict Hepatitis C cancer risk
Announced at the 2012 Conference of the Asian Pacific Association for Study of the Liver by Dr. CJ Chen's team
The model delivers prediction results with 80% accuracy, based on a large-scale population screening study in Taiwan
Main indicators:
  Age
  Liver function indexes ALT and AST
  Hepatitis C virus RNA in serum
  Genotype of the virus
  Liver cirrhosis
Free Smartphone App provided for use
-
22
Health Risk Assessment - Disease Prediction -
Traditional prediction mechanisms (neural network, statistics) were built on a static and simple view of health/medical records.
Physical examination item (reference range) | Value
Body height (cm) | 166
Body weight (kg) | 62.8
BMI (18.5~24) | 22.8
Systolic blood pressure (110~140) | 120
Diastolic blood pressure (60~90) | 85
Waist circumference (cm) | 79
Hip circumference (cm) | 89
Waist-Hip Ratio (male
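To make the "static and simple view" concrete, the following minimal Python sketch recomputes the derived items from the examination values listed above and flags each one as normal/abnormal against its reference range. The reference ranges and values are copied from the slide; the function and variable names are hypothetical, and no waist-hip ratio range is applied because the slide truncates it.

```python
# Reference ranges copied from the slide (low, high); names are illustrative.
REFERENCE_RANGES = {
    "bmi": (18.5, 24.0),
    "systolic_bp": (110, 140),
    "diastolic_bp": (60, 90),
}


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def waist_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist circumference divided by hip circumference."""
    return waist_cm / hip_cm


def flag(value: float, low: float, high: float) -> str:
    """Static assessment: a single value is either inside the range or not."""
    return "normal" if low <= value <= high else "abnormal"


# Examination values from the slide.
record = {
    "height_cm": 166, "weight_kg": 62.8,
    "systolic_bp": 120, "diastolic_bp": 85,
    "waist_cm": 79, "hip_cm": 89,
}

derived = {
    "bmi": round(bmi(record["weight_kg"], record["height_cm"]), 1),
    "systolic_bp": record["systolic_bp"],
    "diastolic_bp": record["diastolic_bp"],
}
report = {name: flag(derived[name], *REFERENCE_RANGES[name])
          for name in REFERENCE_RANGES}
whr = waist_hip_ratio(record["waist_cm"], record["hip_cm"])

print(derived["bmi"], report, round(whr, 4))
```

Every item is judged on one snapshot only, which is the limitation the later slides address with trend and pattern mining.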
-
23
Health Risk Prediction - Divination for Health?
What have we seen?
  A snapshot? (point)
  Temporal evolution? (segment)
  Full view? (full coverage)
-
24
Challenges
Big data with heterogeneous data sources: Volume, Variety, Velocity, etc.
Data quality problem
Data imbalance problem
Post-processing of large analyzed/extracted results
Need for deep incorporation of medical domain knowledge
Privacy issue
-
Some Recent Developments in Taiwan
25
-
Large-Scale Population-based Health Data Mining
26
-
Health-related information is derived from various data
Question: what's the probability of occurrence of adverse event B after taking treatment A?
One hospital's data
  Through the electronic medical record (EMR) system we could calculate the incidence of event B in patients receiving treatment A.
  Limitation: patients with event B might be diagnosed in another hospital.
Several hospitals' data
  Through an EMR exchange system we could have a more accurate estimate of the incidence of event B.
  Limitation: difficult to detect a rare event B.
All hospitals: population-based data
  Through National Health Insurance claims data we could have a very large patient sample size to detect a rare event B.
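The difference between the single-hospital and population-based estimates can be illustrated with a small Python sketch. The claim records, patient identifiers, and hospital names below are entirely made up, and the sketch ignores the time order of A and B for brevity, so it is an illustration of the counting argument rather than an analysis method from the talk.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, hospital, event).
# "A" is the treatment, "B" is the adverse event.
claims = [
    ("p1", "H1", "A"), ("p1", "H1", "B"),
    ("p2", "H1", "A"), ("p2", "H2", "B"),  # event B diagnosed in another hospital
    ("p3", "H1", "A"),
    ("p4", "H1", "A"),
    ("p5", "H2", "A"), ("p5", "H2", "B"),
    ("p6", "H2", "A"),
]


def incidence_of_b_after_a(records):
    """Fraction of patients with treatment A who also have event B."""
    events = defaultdict(set)
    for patient, _hospital, event in records:
        events[patient].add(event)
    treated = [p for p, seen in events.items() if "A" in seen]
    if not treated:
        return 0.0
    with_b = [p for p in treated if "B" in events[p]]
    return len(with_b) / len(treated)


# Hospital H1 sees only its own records, so it misses p2's event B.
single_hospital = incidence_of_b_after_a([r for r in claims if r[1] == "H1"])
# Population-based data links records across all hospitals.
population = incidence_of_b_after_a(claims)

print(single_hospital, population)
```

In this toy data the single-hospital view underestimates the incidence because one adverse event was recorded elsewhere.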
27
-
Micro view through hospital-based data
Conclusions from hospital A
Conclusions from hospital B
28
-
Macro view through bigger data
29
-
Febrile convulsion as an example
Occurrence of febrile convulsion
A frightened mother would ask the doctor: what is the probability that my child will become a patient with epilepsy?
30
-
The probability varied greatly with different datasets
Mainly due to referral bias
31
-
Importance of using big health data to detect drug adverse effects
Lancet 2005;365:475-481
32
-
33
National Health Insurance Research Database in Taiwan
National Health Insurance (NHI)
  Established on March 1, 1995
  Serves 99.2% of the Taiwanese population (20M+)
  Covers 92.62% of medical institutions
Longitudinal Health Insurance Database (LHID), sampled from NHIRD
  Includes health records of 951,044 people
  1997 ~ now
Strongly representative of Taiwan
  Every living region
  Long time interval: 15+ years
Reference : National Health Insurance, http://www.nhi.gov.tw, 2012
-
Evolution of value-added analysis of health datasets in Taiwan
[Figure: timeline from 2000 to 2015 (ticks at 2000, 2005, 2010, 2015) showing three generations of datasets. 1G: NHI. 2G: NHI linked with CR, COD, and BR. 3G: NHI, CR, COD, and BR plus lab data & patient-centered outcome, on cloud computing.]
NHI: National Health Insurance
BR: Birth Registry
CR: Cancer Registry
COD: Cause of Death (mortality)
Lab: Laboratory data
34
-
Incorporation of more heterogeneous datasets
[Figure: NHI, CR, COD, and BR data, lab data & patient-reported outcome, sensor-based biomarker monitoring data, and environmental monitoring data are combined through cloud computing to drive a Smart Health Risk Alert.]
35
-
36
Rich Topics for Explorations
Taiwan's government units have launched large-scale projects for value-added analysis on the national health data:
  Disease markers discovery
  Disease progression model
  Adverse drug reactions (ADRs)
  Medication redundancy
  Public health issues
  Privacy issue
  etc.
-
37
Integrated Platform for Big Health Data Analytics
[Figure: databases DB1, DB2, ..., DBn are loaded and migrated into a cloud system that offers integrated database and data analytics services. DB Service (DB server): data integration, data processing, query interface, visualization, data output, data download. Data Analytics Service: cloud-based data mining components that return mining results to the user.]
-
38
Goal: Healthcare as a Service
Data Cloud, Computing Cloud
-
Disease Risk Patterns Mining
39
-
40
Goal
To develop an effective framework for
1. Mining disease risk patterns
For further medical research
2. Assessing disease risk
Identify potential patients for health monitoring and diagnosis assistance
-
41
System Framework
-
42
System Main Frame
-
43
Pattern Annotation & Visualization Frame
-
44
Case Study: Chronic Kidney Disease (CKD)
Leads to End-Stage Renal Disease (ESRD)
  Dialysis
  High cost of NT$30 billion per year
Not easy to detect at an early stage
Not a single specific disease
Multiple and complex causes
Reference : UNITED STATES RENAL DATA SYSTEM, 2013 Atlas of End-Stage Renal Disease
-
45
Example Risk Patterns Discovered
Well-Known Related
Risk Pattern | ICD-9-CM Definition of Ancestors | Support | Confidence | PubMed Search
{40190} → CKD | 40190: Essential hypertension, unspecified | 18.05% | 76.41% | 83
{25000} → CKD | 25000: Diabetes mellitus without mention of complication, type II, or unspecified type, not stated as uncontrolled | 14.57% | 80.78% | 1313
Potentially Surprising
Risk Pattern | ICD-9-CM Definition of Ancestors | Support | Confidence
{53300} → CKD | 53300: Peptic ulcer, site unspecified | 14.42% | 65.56%
{30000} → CKD | 30000: Anxiety state | 9.22% | 64.83%
{52300} → CKD | 52300: Gingival and periodontal diseases | 37.71% | 57.63%
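As a reminder of what the Support and Confidence columns usually mean, here is a minimal Python sketch using the textbook association-rule definitions. The slides do not state the exact definitions used in the study, so this is an assumption; the patient records are invented, and `rule_stats` is a hypothetical helper name.

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = share of all records containing antecedent and consequent
    confidence = share of records containing the antecedent that also
                 contain the consequent
    """
    total = len(transactions)
    has_antecedent = [t for t in transactions if antecedent <= t]
    has_both = [t for t in has_antecedent if consequent in t]
    support = len(has_both) / total if total else 0.0
    confidence = len(has_both) / len(has_antecedent) if has_antecedent else 0.0
    return support, confidence


# Invented patient histories: each set holds ICD-9-CM codes plus the outcome.
patients = [
    {"40190", "CKD"},
    {"40190", "25000", "CKD"},
    {"40190"},
    {"25000", "CKD"},
    {"53300"},
]

support, confidence = rule_stats(patients, {"40190"}, "CKD")
print(f"{{40190}} -> CKD: support={support:.2%}, confidence={confidence:.2%}")
```

A pattern with high confidence but few literature hits (the PubMed Search column) is what the slide labels as potentially surprising.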
-
46
Health Risk Mining for Chronic Diseases
-
47
Mining of Potential Health Risk Trend
Health warning: discover potential risk using trend analysis
The current health examination report does not carry out trend analysis
[Figure: two line charts of value against examination date (2006/03, 2007/02, 2008/03, 2009/02), one for triglyceride values and one for cholesterol values, each with a 200 mg/dl reference line.]
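One simple way to picture the trend analysis is a least-squares slope over the examination history, used to flag a value that is still under the 200 mg/dl line but heading toward it. The Python sketch below is only an illustration under stated assumptions: the measurement values are hypothetical (the slide shows charts, not numbers), the 12-month projection horizon is an arbitrary choice, and the talk does not say which trend method was actually used.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def trend_warning(months, values, threshold, horizon_months=12):
    """Warn when the latest value is below the threshold but the linear
    trend projects it to reach the threshold within the horizon."""
    projected = values[-1] + slope(months, values) * horizon_months
    return values[-1] < threshold and projected >= threshold


# Months elapsed since the first exam for 2006/03, 2007/02, 2008/03, 2009/02.
months = [0, 11, 24, 35]
rising = [120, 150, 170, 195]   # hypothetical triglyceride values (mg/dl)
stable = [150, 148, 152, 149]   # hypothetical values with no clear trend

print(trend_warning(months, rising, threshold=200))
print(trend_warning(months, stable, threshold=200))
```

Each individual reading in the rising series would pass a normal/abnormal check, yet the series as a whole raises a warning.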
-
48
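Health Risk Mining and Prediction - in Large-Scale and Dynamic View
[Figure: the examinee's examination results (sample item: Waist-Hip Ratio (male ...) 0.88) go through risk pattern mining and health risk prediction, and the results are provided to doctors.]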
-
49
General System Framework
[Figure: three-phase framework.
Phase 1. Health Risk Pattern Mining: the profile, value, and report datasets form an integrated dataset; with parameter settings from the doctor, health risk patterns are mined.
Phase 2. Model Construction: feature integration and disease predictor building produce the disease prediction model.
Phase 3. Risk Prediction: a new health check is fed to the disease prediction model, which outputs prediction results.]
-
50
Key Steps
Feature Selection
Pattern Mining: frequent pattern mining, surprising pattern mining, etc.
Modeling: decision tree, SVM, neural network, etc.
Prediction: ensemble, etc.
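The ensemble step can be sketched as a majority vote over several base models. In the Python sketch below the three base models are trivial threshold rules standing in for the decision tree, SVM, and neural network named above; the rule names and cut-off values are placeholders for illustration only, not clinical thresholds and not the models from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Placeholder base models: each maps an examination record to a label.
def fpg_rule(record):
    return "risk" if record["fpg"] >= 100 else "normal"


def hba1c_rule(record):
    return "risk" if record["hba1c"] >= 5.7 else "normal"


def waistline_rule(record):
    return "risk" if record["waistline"] >= 90 else "normal"


BASE_MODELS = [fpg_rule, hba1c_rule, waistline_rule]


def ensemble_predict(record):
    """Combine the base models' outputs by majority vote."""
    return majority_vote([model(record) for model in BASE_MODELS])


examinee_a = {"fpg": 105, "hba1c": 5.9, "waistline": 80}
examinee_b = {"fpg": 90, "hba1c": 5.2, "waistline": 95}
print(ensemble_predict(examinee_a), ensemble_predict(examinee_b))
```

With an odd number of binary voters no ties occur; other combination schemes such as weighted voting would replace only `majority_vote`.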
-
51
Experimental Evaluation
Health examination data of a medical center in Taiwan
  Time period: February 1996 ~ August 2009
  Total number of subjects: 14,218
  Target disease: diabetes, based on Fasting Plasma Glucose (FPG)
Items:
  TriacylGlycerol
  HDL-C
  Total-cholesterol
  Height
  Weight
  Waistline
  Arm girth
  Systolic pressure (Left Hand)
  Systolic pressure (Right Hand)
  Diastolic Pressure (Left Hand)
  Diastolic Pressure (Right Hand)
  Diastolic Pressure (Before Stand)
  Sphygmus (Left Hand)
  Sphygmus (Right Hand)
  Sphygmus (Before Stand)
  HbA1c
  Fasting Plasma Glucose
  OGTT
-
52
Experimental Results (FPG)
[Figure: bar chart of Accuracy, Precision, Sensitivity, Specificity, and F-Measure (axis 0% to 100%) for the All, Female, and Male groups.]
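For reference, the five measures reported in the chart can be computed from a confusion matrix as in the Python sketch below. The counts are hypothetical and chosen only to show the arithmetic; they are not the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + sensitivity:
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters here because health data is often imbalanced, as noted under Challenges.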
-
53
Experimental Results (High-Density Lipoprotein)
[Figure: bar chart of Accuracy, Precision, Sensitivity, Specificity, and F-Measure (axis 0% to 100%) for the All, Female, and Male groups.]
-
54
Decision Tree
[Figure: decision tree whose nodes test health risk patterns. Node labels: Health Risk Pattern - HbA1c (F2) (A238); Health Risk Pattern - HbA1c (F2) (A211); Health Risk Pattern - Diastolic Pressure (Left Hand) (F17) (A87); Health Risk Pattern - HbA1c (F2) (A286).]
>0: represents the existence of the health risk pattern.
5.0: the 5 records are classified into C3 by the rule. (Note that they are correctly classified.)
-
55
Decision Tree (cont.)
Target Attribute: Fasting plasma glucose (FPG)
C1 Represent unhealthy range (100)
[Figure: decision tree, continued. Node labels: Health Risk Pattern - HbA1c (F2) (A286); Health Risk Pattern - HbA1c (F2) (A238); Health Risk Pattern - Diastolic Pressure (Left Hand) (F17) (A87); Health Risk Pattern - HbA1c (F2) (A211).]
-
56
Health Examination Historic Data
Our System
Assume five common chronic disease prediction models have been built via our health analysis system from historic health examination data.
Outpatient Data
Diabetes Prediction Model
Pneumonia Prediction Model
Heart Disease Prediction Model
Apoplexy Prediction Model
Malignancy Prediction Model
Practical Application - Scenario-
-
Early Prediction & Monitoring for Disease Outbreak
- Case Study on Asthma Care
57
-
58
Asthma Care
Asthma is a chronic disease
When asthma attacks: airway constricts; apply MDI
Potential triggers: cold air; warm, moist air; allergens; stress; cold
-
59
Prediction of Asthma Outbreak
Sliding window size: 5
[Figure: combined data from day1 to day5 are sent to the server, which produces the asthma outbreak prediction for day6.]
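The sliding-window setup (five days of combined data used to predict the sixth day) can be sketched in Python as follows. The daily records and outbreak labels are placeholders, and `sliding_windows` is a hypothetical helper name; the sketch shows only how training samples would be cut from a time series.

```python
def sliding_windows(daily_records, labels, window_size=5):
    """Pair each run of `window_size` consecutive days with the label of
    the day that immediately follows the window."""
    samples = []
    for start in range(len(daily_records) - window_size):
        window = daily_records[start:start + window_size]
        target = labels[start + window_size]
        samples.append((window, target))
    return samples


# Placeholder data: day numbers stand in for each day's combined record
# (symptoms, weather, air pollutants); 1 marks an outbreak day.
days = [1, 2, 3, 4, 5, 6, 7, 8]
outbreak = [0, 0, 0, 0, 0, 1, 0, 1]

samples = sliding_windows(days, outbreak, window_size=5)
for window, target in samples:
    print(window, "->", target)
```

Eight days of data yield three samples: days 1 to 5 predict day 6, days 2 to 6 predict day 7, and days 3 to 7 predict day 8.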
-
60
Integrated Data Mining Mechanism
[Figure: for chronic disease patients, user profiles, the bio-signal dataset, and the environment dataset feed data mining (association rule, sequential pattern, time series, classification). The results are stored in an understandable pattern DB that links symptoms and factors, which drives the Predictive Alarm Engine.]
-
61
Data Mining Workflow
[Figure: user profiles, the environment dataset, and the asthma symptom dataset go through data pre-processing into an integrated dataset, which is split into a training dataset and a testing dataset. The mining steps, organized into Phase 1 and Phase 2, are sequential pattern mining and classification rule mining (PBD and PBC). The resulting rules feed the Predictive Alarm Engine.]
Example rules:
{1st asthma symptom} → Potential Asthma
{5th allergy symptom} → Potential Asthma
{PM10 is low, Moderate temperature} → None Asthma
-
62
Integrated Data: Patient Symptoms
Asthma Symptom Dataset
Asthma Symptoms
Level | Fever Symptom | Nighttime Symptom | Daytime Symptom
0 | No | Sleep well | No cough, exercise regularly
1 | Yes | Sleep well, but with intermittent cough | Intermittent dry cough
2 | | Wake up coughing, can fall asleep after inhaling steroids | Cough with phlegm, cough when exercising
3 | | Serious cough, can hardly fall asleep | Wheezing, use of inhaled vasodilators
4 | | Short of breath, incessant cough, need medicine and go to hospital immediately |
Allergy Symptoms
Level | Nose Symptom | Eye Symptom | Skin Symptom
0 | Itchy and rubbing | Normal | Normal
1 | Sneeze | Rubbing | Itchy, no swelling
2 | Nasal congestion | Swelling and photophobia | Local swelling
3 | Running nose | | More than 2 rash blocks
-
63
Integrated Data: Environment Data
Weather Information
Source data: Central Weather Bureau, Taiwan
Attributes: Temperature | Humidity | Highest Temperature | Lowest Temperature | Temperature Difference
-
64
Air Pollutants Source Data: Open Environmental Database, Taiwan
Air pollution attributes:
Integrated Data: Environment Data (cont.)
-
65
Example of Decision Tree Output
[Figure: decision tree with Yes/No branches. Test nodes: 2nd allergy symptom; PM10 is high; Low humidity; Catch a cold; 5th asthma symptom; 3rd allergy symptom; High temperature difference. Leaves: Potential Asthma / None Potential Asthma.]
-
66
Example Induction Rules
, inhaled maintenance medicine and bronchodilator → High risk [sup: 14.16%, conf: 100%]
, no need medicine → Normal [sup: 5.584%, conf: 100%]
, no need medicine, difference in temperature: 2, the indicator of pollutants: O3 → Normal [sup: 5.076%, conf: 100%]
, no need medicine, , no need medicine, humidity: pattern(up, flat) → High risk [sup: 0.761%, conf: 100%]
, inhaled maintenance medicine or oral maintenance medicine day and night, , inhaled maintenance medicine and oral maintenance medicine day and night, the quality of air: good general → High risk [sup: 2.792%, conf: 100%]
, inhaled maintenance medicine or oral maintenance medicine day and night, , inhaled maintenance medicine and oral maintenance medicine day and night, difference in temperature: 8 → High risk [sup: 3.553%, conf: 100%]
, inhaled maintenance medicine or oral maintenance medicine day and night, the quality of air: general good bad, PM10:50150 → High risk [sup: 7.614%, conf: 100%]
-
67
Performance of Classifiers
[Figure: bar chart, average of 10 experiments (PBC summary), showing Precision (Out) and Recall (Out) on an axis from 0% to 90% for the Air pollution, Weather, Asthma, and Combined datasets.]
-
68
Intelligent Mobile Healthcare: Framework
[Figure: the patient's data are transmitted automatically over GPRS to the medical center (database, server, data mining system, patient monitoring system), which also receives environmental information. The patient can query by cell phone, receives urgent event notices, and is shown messages. The medical station/clinician accesses and queries the system online over the Internet.]
-
69
Location-Aware Asthma Alert
Data Mining System
Predictive Alert System
-
70
Concluding Remarks
Points -> Segment -> Coverage
Full view for health risk prediction!
-
71
Concluding Remarks (cont.)- A Highly Integrated Framework for Smart Healthcare via Big Data Mining
-
Thanks for your attention & look forward to collaborations
-
73