MUDIM (Petr Šimeček, Euromise)
system for multidimensional
compositional models (Radim Jiroušek)
C++ code, distributed as an R package
focused on medical applications
Contents:
idea of conditional independence and (de)composition
possible applications of MUDIM: expert systems, data mining
STULONG dataset
CI (Conditional Independence) - Theory of Storks
[Diagram: BIRTH RATE, STORK POPULATION]
CI - Theory of Storks
[Diagram: BIRTH RATE and STORK POPULATION, statistically connected]
Do storks deliver newborns?
CI - Theory of Storks
[Diagram: BIRTH RATE, STORK POPULATION, ENVIRONMENT]
No!
CI - Theory of Storks
[Diagram: ENVIRONMENT connected to both BIRTH RATE and STORK POPULATION]
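To make the storks slides concrete, here is a minimal R sketch with hypothetical toy probabilities (not STULONG data and not taken from the slides). It builds a joint distribution in which ENVIRONMENT drives both BIRTH RATE and STORK POPULATION, then checks that the two are dependent marginally but conditionally independent once ENVIRONMENT is fixed. The level names (`rural`/`urban`, `high`/`low`, `many`/`few`) are invented for illustration.

```r
# Hypothetical toy numbers (not data): ENVIRONMENT E, BIRTH RATE B, STORK POPULATION S.
p_E   <- c(rural = 0.4, urban = 0.6)
p_B_E <- rbind(rural = c(high = 0.7, low = 0.3),   # P(B | E), rows = E
               urban = c(high = 0.3, low = 0.7))
p_S_E <- rbind(rural = c(many = 0.8, few = 0.2),   # P(S | E), rows = E
               urban = c(many = 0.1, few = 0.9))

# Joint table P(E, B, S) = P(E) P(B | E) P(S | E): E is a common cause of B and S.
joint <- array(0, dim = c(2, 2, 2),
               dimnames = list(E = names(p_E), B = colnames(p_B_E), S = colnames(p_S_E)))
for (e in 1:2) for (b in 1:2) for (s in 1:2)
  joint[e, b, s] <- p_E[e] * p_B_E[e, b] * p_S_E[e, s]

# Marginally, births and storks are dependent: P(B = high | S) changes with S ...
p_BS <- apply(joint, c(2, 3), sum)                   # P(B, S)
p_high_given_S <- p_BS["high", ] / colSums(p_BS)
print(p_high_given_S)

# ... but within each environment, S carries no further information about B.
p_high_given_ES <- joint[, "high", ] / apply(joint, c(1, 3), sum)   # rows = E, cols = S
print(p_high_given_ES)
```

Here P(B = high | S) is about 0.64 with many storks and 0.35 with few, while P(B = high | E, S) is 0.7 in every rural cell and 0.3 in every urban cell.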
CI – Weather
[Diagram: WEATHER YESTERDAY, WEATHER TODAY, WEATHER TOMORROW]
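The weather slides show the same idea along a chain: once today's weather is known, yesterday's weather presumably tells us nothing more about tomorrow. A minimal sketch, assuming a hypothetical two-state Markov chain with a made-up transition matrix `P`:

```r
# Hypothetical two-state weather chain (toy numbers, not data).
states <- c("sun", "rain")
P <- matrix(c(0.8, 0.2,
              0.4, 0.6), nrow = 2, byrow = TRUE,
            dimnames = list(from = states, to = states))   # P[today, tomorrow]
p_yest <- c(sun = 0.5, rain = 0.5)

# Joint P(Y, T, M) = P(Y) P(T | Y) P(M | T) for yesterday, today, tomorrow.
joint <- array(0, dim = c(2, 2, 2),
               dimnames = list(Y = states, T = states, M = states))
for (y in 1:2) for (t in 1:2) for (m in 1:2)
  joint[y, t, m] <- p_yest[y] * P[y, t] * P[t, m]

# P(tomorrow = sun | yesterday, today): rows = yesterday, columns = today.
p_sun_given_YT <- joint[, , "sun"] / apply(joint, c(1, 2), sum)
print(p_sun_given_YT)    # each column is constant: yesterday adds nothing

# Without today, yesterday IS informative: two-step transition probabilities.
print((P %*% P)[, "sun"])
```

Each column of `p_sun_given_YT` is constant (0.8 after a sunny today, 0.4 after a rainy one), whereas the two-step probabilities differ by yesterday's weather (0.72 vs 0.56): yesterday and tomorrow are dependent, but conditionally independent given today.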
CI – Sample Medical Data
Node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …
CI – Sample Medical Data
Edge = (unconditional) statistical connection (correlation) between the pair of variables
Node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …
CI – Storks & Weather
[Diagram: BIRTH RATE, STORK POPULATION, ENVIRONMENT; YESTERDAY, TODAY, TOMORROW]
CI – Sample Medical Data
Arrow = causality between the pair of variables
Node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …
Locality - illustration
[Diagram: variable X, directly explanatory variables for X, other variables]
If we know the directly explanatory variables for X, then knowledge about the other variables is useless for predicting X.
Applications – Expert Systems
Causality
Idea of Compositional Models
$$\frac{\kappa(X_1,X_2)\,\pi(X_2,X_3)}{\kappa(X_2)}$$
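A minimal sketch of this construction, assuming the slide's formula reads $p(x_1,x_2,x_3) = \kappa(x_1,x_2)\,\pi(x_2,x_3)/\kappa(x_2)$: two hypothetical 2×2 tables are glued along the shared variable $X_2$ into one three-dimensional distribution. The helper `compose_tables` and the toy numbers are invented here and are not MUDIM's API.

```r
# Hypothetical 2x2 distributions (toy numbers): kappa over (X1, X2), pi over (X2, X3).
kappa_12 <- matrix(c(0.30, 0.10,
                     0.20, 0.40), nrow = 2, byrow = TRUE,
                   dimnames = list(X1 = c("a", "b"), X2 = c("0", "1")))
pi_23 <- matrix(c(0.25, 0.25,
                  0.10, 0.40), nrow = 2, byrow = TRUE,
                dimnames = list(X2 = c("0", "1"), X3 = c("u", "v")))

# p(x1, x2, x3) = kappa(x1, x2) * pi(x2, x3) / kappa(x2)
compose_tables <- function(kappa_12, pi_23) {
  kappa_2 <- colSums(kappa_12)                 # marginal kappa(X2), assumed > 0
  out <- array(0, dim = c(nrow(kappa_12), ncol(kappa_12), ncol(pi_23)),
               dimnames = c(dimnames(kappa_12), dimnames(pi_23)[2]))
  for (j in seq_len(ncol(kappa_12)))           # one slice per value of X2
    out[, j, ] <- outer(kappa_12[, j] / kappa_2[j], pi_23[j, ])
  out
}

p <- compose_tables(kappa_12, pi_23)
sum(p)                      # a proper distribution: sums to 1
apply(p, c(2, 3), sum)      # (X2, X3) marginal reproduces pi_23
```

Because the toy tables agree on the marginal of $X_2$, the result keeps both $\kappa$ and $\pi$ as its two-dimensional marginals, and by construction $X_1$ and $X_3$ are conditionally independent given $X_2$.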
Applications – Expert Systems
Causality
What is the distribution of if we know ?
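For a small model, a query of this kind can be answered by brute force: fix the observed variables in the joint table, sum out everything else, and renormalise. A minimal R sketch, assuming the whole joint distribution fits in memory as an array with named dimensions; `query()` and the toy table are hypothetical and are not MUDIM's API.

```r
# Distribution of `target` given observed values in `evidence` (hypothetical helper).
query <- function(joint, target, evidence) {
  idx <- dimnames(joint)                                   # every level of every variable
  for (v in names(evidence)) idx[[v]] <- evidence[[v]]     # restrict observed variables
  sub <- do.call(`[`, c(list(joint), unname(idx), list(drop = FALSE)))
  marg <- apply(sub, target, sum)                          # sum out all but the target
  marg / sum(marg)                                         # renormalise
}

# Toy joint distribution over X1, X2, X3 (made-up numbers summing to 1).
joint <- array(c(0.10, 0.05, 0.15, 0.20, 0.05, 0.10, 0.25, 0.10),
               dim = c(2, 2, 2),
               dimnames = list(X1 = c("a", "b"), X2 = c("0", "1"), X3 = c("u", "v")))

query(joint, "X3", list(X1 = "a"))   # P(X3 | X1 = a)
```

Here P(X3 = u | X1 = a) = 0.25 / 0.55 = 5/11 and P(X3 = v | X1 = a) = 6/11. Real systems avoid building the full joint table, which grows exponentially with the number of variables.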
Data Mining
We don’t know “anything”; there are lots of variables and lots of possible relations between them.
We need to formulate possible hypotheses, suggest some promising models, etc. (useful in pre-research).
Data Mining
[Diagram: Variables, Data]
Direction of Causality Problem
is equivalent to
are equivalent, but they are not equivalent to
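The diagrams on this slide did not survive the transcript. A standard illustration of the problem (assumed here, not recovered from the slide) compares three structures over X, Y, Z: the chain X → Y → Z, the fork X ← Y → Z and the collider X → Y ← Z. The sketch below, with hypothetical toy parameters, checks which independences each implies. Chain and fork both make X and Z dependent but independent given Y, so data alone cannot tell their arrow directions apart, while the collider shows the reverse pattern.

```r
# Build P(X, Y, Z) for three binary variables from a generator f(x, y, z).
make_joint <- function(f) {
  lv <- as.character(0:1)
  p <- array(0, dim = c(2, 2, 2), dimnames = list(X = lv, Y = lv, Z = lv))
  for (x in 0:1) for (y in 0:1) for (z in 0:1) p[x + 1, y + 1, z + 1] <- f(x, y, z)
  p / sum(p)
}
bern <- function(v, p1) if (v == 1) p1 else 1 - p1   # P(V = v) for Bernoulli(p1)

# Hypothetical toy parameters for the three structures.
chain    <- make_joint(function(x, y, z)
  bern(x, 0.3) * bern(y, c(0.2, 0.9)[x + 1]) * bern(z, c(0.1, 0.7)[y + 1]))
fork     <- make_joint(function(x, y, z)
  bern(y, 0.4) * bern(x, c(0.2, 0.8)[y + 1]) * bern(z, c(0.3, 0.6)[y + 1]))
collider <- make_joint(function(x, y, z)
  bern(x, 0.5) * bern(z, 0.5) * bern(y, if (x == z) 0.9 else 0.1))

# Largest deviation of P(X, Z) from P(X) P(Z), marginally and given each value of Y.
dep_XZ <- function(p) {
  pxz <- apply(p, c(1, 3), sum)
  max(abs(pxz - outer(rowSums(pxz), colSums(pxz))))
}
dep_XZ_given_Y <- function(p) {
  max(sapply(1:2, function(y) {
    s <- p[, y, ] / sum(p[, y, ])                  # P(X, Z | Y = y)
    max(abs(s - outer(rowSums(s), colSums(s))))
  }))
}

res <- sapply(list(chain = chain, fork = fork, collider = collider),
              function(p) c(marginal = dep_XZ(p), given_Y = dep_XZ_given_Y(p)))
res
```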
STULONG Dataset
Dataset containing research data on cardiovascular disease (1976-79)
1417 patients (Czech middle-aged men)
244 attributes surveyed with each patient at the entry examination
37 selected attributes are described here
(Incomplete) List of Attributes
AGE, MARITAL STATUS, EDUCATION, OCCUPATION, PHYSICAL ACTIVITY, TRANSPORT TO JOB, SMOKING, ALCOHOL, TEA AND COFFEE
MYOCARDIAL INFARCTION
HYPERTENSION, ICTUS, HYPERLIPIDEMIA, CHEST PAIN, ASTHMA, HEIGHT & WEIGHT, BLOOD PRESSURE, …
Graph of Correlated Pairs
[Graph over the 37 attributes: MARIT.STAT, EDUC, RESP, ACT.IN.JOB, ACT.AFTER.JOB, TRANSPORT, TRANSPORT.TIME, SMOKING, SMOKING.YR, ALCOHOL.FREQ, BEER.DAILY, WINE.DAILY, LIQ.DAILY, COFFEE, TEA, SUGAR, IM, HT, HTD, HTL, DIABET, HYPLIP, PAIN.CHEST, PAIN.LL, ASTHMA, HEIGHT, WEIGHT, SYST1, DIAST1, SYST2, DIAST2, TRIC, SUBSC, CHLST, TRIGL, URINE, AGE]
464 of 666 possible pairs are statistically connected (p = 0.05)
Graph of Correlated Pairs 2
160 of 666 possible pairs are statistically connected (p = 0.05/666)
[Graph over the same 37 attributes]
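The pair counts come from testing every pair of the 37 attributes, choose(37, 2) = 666 tests; the stricter threshold 0.05/666 is a Bonferroni correction for those multiple comparisons. The slides do not say which test was used, so the sketch below assumes a chi-squared test of independence on categorical columns and runs it on a small simulated data frame, since the STULONG data are not included here. `count_connected_pairs()` is a hypothetical helper.

```r
# Count associated pairs among the columns of a data frame of factors,
# at a raw threshold alpha and at the Bonferroni threshold alpha / n_pairs.
count_connected_pairs <- function(df, alpha = 0.05) {
  pairs <- combn(names(df), 2)                         # all unordered column pairs
  pvals <- apply(pairs, 2, function(v)
    suppressWarnings(chisq.test(table(df[[v[1]]], df[[v[2]]]))$p.value))
  n_pairs <- ncol(pairs)
  c(n_pairs    = n_pairs,
    raw        = sum(pvals < alpha),
    bonferroni = sum(pvals < alpha / n_pairs))
}

# Simulated stand-in data (not STULONG): b copies a, c and d are independent noise.
set.seed(42)
n <- 200
a <- factor(sample(c("lo", "hi"), n, replace = TRUE))
toy <- data.frame(a = a,
                  b = a,
                  c = factor(sample(c("x", "y", "z"), n, replace = TRUE)),
                  d = factor(sample(c("p", "q"), n, replace = TRUE)))

r <- count_connected_pairs(toy)
r
choose(37, 2)    # number of pairs among 37 attributes: 666
```

The Bonferroni count can never exceed the raw count; with many truly associated attributes, as in STULONG, the gap between the two (464 vs 160) shows how much a strict multiple-testing correction prunes the graph.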
[Directed graph over the same 37 attributes]
56 arrows
Risk Factors for Hypertension

```r
summary(glm(HT ~ HYPLIP + IM + AGE + SUBSC, data = C, family = "binomial"))
```

```
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.322730   1.274252  -3.392 0.000693 ***
IM           1.246937   0.513342   2.429 0.015138 *
HYPLIP       1.126383   0.333971   3.373 0.000744 ***
SUBSC        0.009521   0.003978   2.393 0.016699 *
AGE          0.245182   0.136678   1.794 0.072835 .
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
```
Risk Factors for Hypertension
Interpretation: HYPERLIPIDEMIA and IM each roughly triple the odds of hypertension.
Each three years of AGE roughly double the odds (AGE is significant only at the 0.1 level, p = 0.073).
There is also a small but demonstrable connection to the skinfold above the musculus subscapularis (SUBSC).
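The interpretation follows from exponentiating the logistic-regression coefficients: exp(estimate) is the multiplicative change in the odds of hypertension per unit of the predictor. A minimal R sketch using the estimates and standard errors printed for this model; it assumes AGE is coded so that three units correspond to the three years in the interpretation. With the fitted model object `fit` at hand, `exp(coef(fit))` and `exp(confint.default(fit))` give the same odds ratios and Wald intervals.

```r
# Estimates and standard errors copied from the printed glm() coefficients.
est <- c(IM = 1.246937, HYPLIP = 1.126383, SUBSC = 0.009521, AGE = 0.245182)
se  <- c(IM = 0.513342, HYPLIP = 0.333971, SUBSC = 0.003978, AGE = 0.136678)

odds_ratio <- exp(est)                              # odds multiplier per unit increase
wald_ci <- exp(cbind(lower = est - qnorm(0.975) * se,
                     upper = est + qnorm(0.975) * se))
round(cbind(OR = odds_ratio, wald_ci), 2)

exp(3 * est[["AGE"]])                               # odds multiplier for 3 units of AGE
```

This gives odds ratios of about 3.5 for IM and 3.1 for HYPLIP, and about 2.09 for a three-unit increase in AGE; the 95% Wald interval for a single unit of AGE includes 1, matching its p-value above 0.05.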