MUDIM (Petr Šimeček, Euromise)


Post on 08-Jan-2016

Page 1: MUDIM (Petr Šimeček , Euromise )

MUDIM (Petr Šimeček, Euromise)

system for multidimensional compositional models (Radim Jiroušek)

C++ code, distributed as R-package

focused on medical applications

Page 2: MUDIM (Petr Šimeček , Euromise )

Contents:

idea of conditional independence and (de)composition

possible applications of MUDIM: expert systems, data mining

STULONG dataset

Page 3: MUDIM (Petr Šimeček , Euromise )

CI - Theory of Storks

BIRTH RATE

STORK POPULATION

Page 4: MUDIM (Petr Šimeček , Euromise )

CI - Theory of Storks

BIRTH RATE

STORK POPULATION

Statistically connected

Do storks deliver newborns?

Page 5: MUDIM (Petr Šimeček , Euromise )

CI - Theory of Storks

BIRTH RATE

STORK POPULATION

ENVIRONMENT

No!

Page 6: MUDIM (Petr Šimeček , Euromise )

CI - Theory of Storks

BIRTH RATE

STORK POPULATION

ENVIRONMENT

ENVIRONMENT is connected to BIRTH RATE and connected to STORK POPULATION.
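The stork argument can be checked on a small table. The sketch below assumes a common-cause structure (ENVIRONMENT influences both other variables) and uses made-up probabilities; none of the numbers or names come from the presentation.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
p_env = {"rural": 0.5, "urban": 0.5}          # P(ENVIRONMENT)
p_storks_high = {"rural": 0.8, "urban": 0.2}  # P(many storks | ENVIRONMENT)
p_births_high = {"rural": 0.7, "urban": 0.3}  # P(high birth rate | ENVIRONMENT)


def bern(p_true, value):
    """Probability of a True/False value given P(True)."""
    return p_true if value else 1.0 - p_true


# Joint table in which storks and births each depend only on the environment.
joint = {
    (e, s, b): p_env[e] * bern(p_storks_high[e], s) * bern(p_births_high[e], b)
    for e, s, b in product(p_env, (True, False), (True, False))
}


def prob(pred):
    """Total probability of all (env, storks, births) cells satisfying pred."""
    return sum(p for (e, s, b), p in joint.items() if pred(e, s, b))


# Marginally, P(storks, births) differs from P(storks) * P(births).
marginal_gap = prob(lambda e, s, b: s and b) - (
    prob(lambda e, s, b: s) * prob(lambda e, s, b: b)
)


def conditional_gap(env):
    """Same difference, computed inside one fixed environment."""
    p_e = prob(lambda e, s, b: e == env)
    p_s = prob(lambda e, s, b: e == env and s) / p_e
    p_b = prob(lambda e, s, b: e == env and b) / p_e
    p_sb = prob(lambda e, s, b: e == env and s and b) / p_e
    return p_sb - p_s * p_b


print("marginal gap:", marginal_gap)
for env in p_env:
    print(env, "conditional gap:", conditional_gap(env))
```

With these numbers the two variables are statistically connected overall, yet the connection vanishes once ENVIRONMENT is fixed.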

Page 7: MUDIM (Petr Šimeček , Euromise )

CI – Weather

WEATHER YESTERDAY

WEATHER TOMORROW

WEATHER TODAY

Page 8: MUDIM (Petr Šimeček , Euromise )

CI – Weather

WEATHER YESTERDAY

WEATHER TODAY

WEATHER TOMORROW

Page 9: MUDIM (Petr Šimeček , Euromise )

CI – Sample Medical Data

Node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …

Page 10: MUDIM (Petr Šimeček , Euromise )

CI – Sample Medical Data

Edge = (unconditional) statistical connection (correlation) between the pair of variables

Node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …

Page 11: MUDIM (Petr Šimeček , Euromise )

CI – Storks & Weather

BIRTH RATE

STORK POPULATION

ENVIRONMENT

YESTERDAY

TODAY

TOMORROW

Page 12: MUDIM (Petr Šimeček , Euromise )

CI – Storks & Weather

BIRTH RATE

STORK POPULATION

ENVIRONMENT

YESTERDAY

TODAY

TOMORROW

Page 13: MUDIM (Petr Šimeček , Euromise )

CI – Sample Medical Data

Arrow = causality between the pair of variables

Node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …

Page 14: MUDIM (Petr Šimeček , Euromise )

Locality - illustration

Variable X

Directly explanatory variables for X

Other variables

If we know the values of the directly explanatory variables for X, then knowledge of the other explanatory variables adds nothing to predicting X.
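The locality statement can be illustrated with the weather chain from the earlier slides. The sketch assumes a chain yesterday → today → tomorrow with hypothetical transition probabilities, so that "today" is the only directly explanatory variable for "tomorrow".

```python
from itertools import product

states = ("sun", "rain")

# Hypothetical numbers, for illustration only.
p_first = {"sun": 0.6, "rain": 0.4}      # P(yesterday)
p_next = {                                # P(next day | current day)
    "sun": {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.4, "rain": 0.6},
}

# Joint distribution over (yesterday, today, tomorrow).
joint = {
    (y, t, m): p_first[y] * p_next[y][t] * p_next[t][m]
    for y, t, m in product(states, repeat=3)
}


def p_sun_tomorrow(today, yesterday=None):
    """P(tomorrow = sun | today[, yesterday]), read off the joint table."""
    rows = {
        k: p
        for k, p in joint.items()
        if k[1] == today and (yesterday is None or k[0] == yesterday)
    }
    total = sum(rows.values())
    return sum(p for k, p in rows.items() if k[2] == "sun") / total


for today, yesterday in product(states, repeat=2):
    print(today, yesterday,
          p_sun_tomorrow(today), p_sun_tomorrow(today, yesterday))
```

For every combination the two printed probabilities agree: once today's weather is known, yesterday's weather does not change the prediction.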

Page 15: MUDIM (Petr Šimeček , Euromise )

Applications – Expert Systems

Causality

Page 20: MUDIM (Petr Šimeček , Euromise )

Idea of Compositional Models

$$\pi(X_1,X_2) \triangleright \kappa(X_2,X_3) = \frac{\pi(X_1,X_2)\,\kappa(X_2,X_3)}{\kappa(X_2)}$$
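A minimal sketch of this composition, assuming the slide's formula multiplies π(X1,X2) by κ(X2,X3) and divides by the marginal κ(X2). The function name `compose` and the example tables are hypothetical; this is not MUDIM's API.

```python
def compose(pi, kappa):
    """Compose pi over (x1, x2) with kappa over (x2, x3).

    Each cell is pi(x1, x2) * kappa(x2, x3) / kappa(x2). Cells where
    kappa(x2) is zero are skipped; the composition is only well defined
    when pi puts no mass on such x2 values.
    """
    # Marginal of kappa on the shared variable x2.
    kappa_x2 = {}
    for (x2, _x3), q in kappa.items():
        kappa_x2[x2] = kappa_x2.get(x2, 0.0) + q

    result = {}
    for (x1, x2), p in pi.items():
        for (k2, x3), q in kappa.items():
            if k2 == x2 and kappa_x2[x2] > 0:
                result[(x1, x2, x3)] = p * q / kappa_x2[x2]
    return result


# Hypothetical two-valued distributions.
pi = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
kappa = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.4}

model = compose(pi, kappa)

# The (x1, x2) marginal of the result reproduces pi.
marginal_12 = {}
for (x1, x2, _x3), p in model.items():
    marginal_12[(x1, x2)] = marginal_12.get((x1, x2), 0.0) + p

print(model)
print(marginal_12)
```

The result is a three-dimensional distribution built from two two-dimensional ones, which is what makes the approach attractive when there are many variables.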

Page 21: MUDIM (Petr Šimeček , Euromise )

Applications – Expert Systems

Causality

What is the distribution of a variable of interest if we know the values of some other variables?
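Such a query amounts to conditioning a joint distribution on the known values. The sketch below works on a small explicit table; the function `query`, the variable names and the probabilities are all hypothetical and only illustrate the calculation.

```python
def query(joint, names, target, evidence):
    """Return P(target | evidence) from a joint table keyed by value tuples."""
    index = {name: i for i, name in enumerate(names)}
    dist = {}
    for values, p in joint.items():
        # Keep only the cells that agree with everything we know.
        if all(values[index[n]] == v for n, v in evidence.items()):
            key = values[index[target]]
            dist[key] = dist.get(key, 0.0) + p
    total = sum(dist.values())
    if total == 0:
        raise ValueError("evidence has zero probability")
    return {k: p / total for k, p in dist.items()}


# Hypothetical joint distribution over two yes/no attributes.
names = ("SMOKING", "HT")
joint = {
    ("yes", "yes"): 0.15,
    ("yes", "no"): 0.25,
    ("no", "yes"): 0.10,
    ("no", "no"): 0.50,
}

posterior = query(joint, names, "HT", {"SMOKING": "yes"})
print(posterior)
```

A table over all variables grows exponentially with their number, which is why a factorized model is needed in practice.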

Page 22: MUDIM (Petr Šimeček , Euromise )

Data Mining

We don’t know “anything”: there are many variables and many possible relations between them.

We need to formulate possible hypotheses, suggest some promising models, etc. (useful in pre-research).

Page 23: MUDIM (Petr Šimeček , Euromise )

Data Mining

Variables

Data

Page 24: MUDIM (Petr Šimeček , Euromise )

Direction of Causality Problem

is equivalent to

are equivalent, but they are not equivalent to

Page 25: MUDIM (Petr Šimeček , Euromise )

STULONG Dataset

= dataset containing research data on cardiovascular disease (1976-79)

1417 patients (Czech middle-aged men)

244 attributes surveyed for each patient at the entry examination

37 selected attributes are described here

Page 26: MUDIM (Petr Šimeček , Euromise )

(Incomplete) List of Attributes

AGE

MARITAL STATUS

EDUCATION

OCCUPATION

PHYSICAL ACTIVITY

TRANSPORT TO JOB

SMOKING

ALCOHOL

TEA AND COFFEE

MYOCARDIAL INFARCTION

HYPERTENSION

ICTUS

HYPERLIPIDEMIA

CHEST PAIN

ASTHMA

HEIGHT & WEIGHT

BLOOD PRESSURE

…

Page 27: MUDIM (Petr Šimeček , Euromise )

Graph of Correlated Pairs

[Graph over 37 attributes: MARIT.STAT, EDUC, RESP, ACT.IN.JOB, ACT.AFTER.JOB, TRANSPORT, TRANSPORT.TIME, SMOKING, SMOKING.YR, ALCOHOL.FREQ, BEER.DAILY, WINE.DAILY, LIQ.DAILY, COFFEE, TEA, SUGAR, IM, HT, HTD, HTL, DIABET, HYPLIP, PAIN.CHEST, PAIN.LL, ASTHMA, HEIGHT, WEIGHT, SYST1, DIAST1, SYST2, DIAST2, TRIC, SUBSC, CHLST, TRIGL, URINE, AGE]

464 of 666 possible pairs are statistically connected (p = 0.05)

Page 28: MUDIM (Petr Šimeček , Euromise )

Graph of Correlated Pairs 2

[Graph over the same 37 attributes as on the previous slide]

160 of 666 possible pairs are statistically connected (p = 0.05/666)
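The pair count and the stricter threshold on this slide follow from simple arithmetic, reproduced below. Dividing the significance level by the number of tests corresponds to a Bonferroni correction, a name the slides do not use. The variable names are illustrative.

```python
from math import comb

n_attributes = 37                 # selected STULONG attributes
n_pairs = comb(n_attributes, 2)   # number of unordered pairs of attributes

alpha = 0.05
corrected_alpha = alpha / n_pairs  # per-test threshold used on this slide

# Expected number of "connected" pairs at level alpha if no pair were
# truly connected (every test a false positive with probability alpha).
expected_by_chance = alpha * n_pairs

print(n_pairs)
print(corrected_alpha)
print(expected_by_chance)
```

The last figure shows why the uncorrected graph is hard to trust pair by pair: a few dozen of its edges could appear by chance alone.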

Page 29: MUDIM (Petr Šimeček , Euromise )

[Directed graph over the same 37 attributes, from MARIT.STAT to AGE]

56 arrows

Page 30: MUDIM (Petr Šimeček , Euromise )

Risk Factors for Hypertension

> summary(glm(HT~HYPLIP+IM+AGE+SUBSC,data=C,family="binomial"))

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.322730   1.274252  -3.392 0.000693 ***
IM           1.246937   0.513342   2.429 0.015138 *
HYPLIP       1.126383   0.333971   3.373 0.000744 ***
SUBSC        0.009521   0.003978   2.393 0.016699 *
AGE          0.245182   0.136678   1.794 0.072835 .
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Page 31: MUDIM (Petr Šimeček , Euromise )

Risk Factors for Hypertension

Interpretation:

HYPERLIPIDEMIA and IM each roughly triple the odds of hypertension.

Every three years of AGE roughly double the odds.

There is also a small but demonstrable connection to the skinfold above the musculus subscapularis (SUBSC).
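These statements follow from exponentiating the logistic-regression coefficients, since each coefficient is a change in log-odds per unit of its variable. The sketch below repeats that arithmetic with the estimates from the previous slide. It assumes, as the interpretation suggests, that one unit of AGE is one year.

```python
from math import exp

# Coefficient estimates copied from the glm summary on the previous slide.
coef = {
    "IM": 1.246937,
    "HYPLIP": 1.126383,
    "SUBSC": 0.009521,
    "AGE": 0.245182,
}

# Odds ratio per one-unit increase of each variable.
odds_ratio = {name: exp(beta) for name, beta in coef.items()}

# Odds ratio for a three-unit increase of AGE (assumed to be three years).
age_three_years = exp(3 * coef["AGE"])

for name, value in odds_ratio.items():
    print(name, round(value, 3))
print("AGE, 3 units:", round(age_three_years, 3))
```

IM and HYPLIP give odds ratios a little above 3, three units of AGE give a ratio a little above 2, and one unit of SUBSC raises the odds by about one percent.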