Post on 08-Jan-2016

MUDIM (Petr Šimeček, Euromise)

system for multidimensional compositional models (Radim Jiroušek)

C++ code, distributed as R-package

focused on medical applications

Contents:

idea of conditional independence and (de)composition

possible applications of MUDIM: expert system, data mining

STULONG dataset

CI - Theory of Storks

[Diagram: BIRTH RATE and STORK POPULATION are statistically connected.]

Do storks deliver newborns?

No! [Diagram: ENVIRONMENT is connected to BIRTH RATE and to STORK POPULATION.]
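The stork argument can be checked numerically. The sketch below is only an illustration: the variable names and all probabilities are hypothetical, chosen so that storks and births both depend on the environment and on nothing else.

```python
from itertools import product

# Hypothetical probabilities, invented for this illustration only.
p_env = {"rural": 0.5, "urban": 0.5}
p_storks_given_env = {"rural": 0.8, "urban": 0.2}  # P(many storks | env)
p_births_given_env = {"rural": 0.7, "urban": 0.3}  # P(high birth rate | env)

# Joint P(env, storks, births): storks and births are generated
# independently of each other once the environment is fixed.
joint = {}
for env, s, b in product(p_env, (True, False), (True, False)):
    ps = p_storks_given_env[env] if s else 1 - p_storks_given_env[env]
    pb = p_births_given_env[env] if b else 1 - p_births_given_env[env]
    joint[(env, s, b)] = p_env[env] * ps * pb


def prob(pred):
    """Probability of the event described by pred(env, storks, births)."""
    return sum(p for key, p in joint.items() if pred(*key))


# Marginally: P(storks, births) - P(storks) * P(births) is not zero.
marginal_gap = (prob(lambda e, s, b: s and b)
                - prob(lambda e, s, b: s) * prob(lambda e, s, b: b))


def conditional_gap(env):
    """Same difference, computed inside one fixed environment."""
    p_e = prob(lambda e, s, b: e == env)
    p_s = prob(lambda e, s, b: e == env and s) / p_e
    p_b = prob(lambda e, s, b: e == env and b) / p_e
    p_sb = prob(lambda e, s, b: e == env and s and b) / p_e
    return p_sb - p_s * p_b


print("marginal gap:", marginal_gap)
for env in p_env:
    print(env, "conditional gap:", conditional_gap(env))
```

With these numbers the marginal gap is positive (storks and births look connected), while the gap inside each environment vanishes up to rounding: the two variables are conditionally independent given ENVIRONMENT.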

CI – Weather

[Diagram: WEATHER YESTERDAY, WEATHER TODAY, WEATHER TOMORROW]

CI – Sample Medical Data

[Diagram legend]

node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …

edge = (unconditional) statistical connection (correlation) between the pair of variables

CI – Storks & Weather

[Diagram: BIRTH RATE, STORK POPULATION, ENVIRONMENT; YESTERDAY, TODAY, TOMORROW]

CI – Sample Medical Data

[Diagram legend]

node = variable (attribute), e.g. AGE, BLOOD PRESSURE, …

arrow = causality between the pair of variables

Locality - illustration

[Diagram: Variable X; directly explanatory variables for X; other variables]

If we know the values of the directly explanatory variables for X, then knowledge of the other variables is useless for predicting X.
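Locality can be illustrated with the weather example, assuming a simple chain in which tomorrow's weather depends directly only on today's. The states and transition probabilities below are hypothetical.

```python
from itertools import product

STATES = ("sun", "rain")

# Hypothetical numbers: distribution of yesterday's weather and a
# one-day transition table P(next day | current day).
p_first = {"sun": 0.6, "rain": 0.4}
p_next = {"sun": {"sun": 0.8, "rain": 0.2},
          "rain": {"sun": 0.4, "rain": 0.6}}

# Joint distribution over (yesterday, today, tomorrow).
joint = {
    (y, t, m): p_first[y] * p_next[y][t] * p_next[t][m]
    for y, t, m in product(STATES, repeat=3)
}


def p_tomorrow(tomorrow, today, yesterday=None):
    """P(tomorrow | today), or P(tomorrow | today, yesterday) if given."""
    match = [
        (key, p) for key, p in joint.items()
        if key[1] == today and (yesterday is None or key[0] == yesterday)
    ]
    total = sum(p for _, p in match)
    return sum(p for key, p in match if key[2] == tomorrow) / total


for today, yesterday in product(STATES, repeat=2):
    print(today, yesterday,
          p_tomorrow("rain", today),
          p_tomorrow("rain", today, yesterday))
```

For every combination the two printed probabilities agree: once TODAY (the directly explanatory variable) is known, YESTERDAY adds nothing to the prediction of TOMORROW.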

Applications – Expert Systems

Causality

Idea of Compositional Models

$$\pi(X_1, X_2) \triangleright \kappa(X_2, X_3) = \frac{\pi(X_1, X_2)\,\kappa(X_2, X_3)}{\kappa(X_2)}$$
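A minimal sketch of composing two low-dimensional distributions, assuming the composition operator as it is usually defined for compositional models: pi(x1, x2) composed with kappa(x2, x3) equals pi(x1, x2) * kappa(x2, x3) / kappa(x2), where kappa(x2) is the marginal of kappa. The two tables are hypothetical and this is not MUDIM's own code.

```python
from itertools import product

VALUES = (0, 1)

# Hypothetical two-dimensional distributions.
pi = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}     # pi(x1, x2)
kappa = {(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.3}  # kappa(x2, x3)

# Marginal kappa(x2), needed in the denominator.
kappa_x2 = {x2: sum(kappa[(x2, x3)] for x3 in VALUES) for x2 in VALUES}

# Composition: a three-dimensional distribution over (x1, x2, x3).
# It is defined only where kappa(x2) > 0, which holds for these tables.
composed = {
    (x1, x2, x3): pi[(x1, x2)] * kappa[(x2, x3)] / kappa_x2[x2]
    for x1, x2, x3 in product(VALUES, repeat=3)
}

# The (x1, x2) marginal of the result reproduces pi.
marginal_x1_x2 = {
    (x1, x2): sum(composed[(x1, x2, x3)] for x3 in VALUES)
    for x1, x2 in product(VALUES, repeat=2)
}
print(sum(composed.values()), marginal_x1_x2)
```

The result is a proper distribution whose (X1, X2) marginal equals the first argument; the second argument contributes only the conditional behaviour of X3 given X2.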

Applications – Expert Systems

Causality

What is the distribution of if we know ?

Data Mining

We know almost nothing in advance: there are many variables and many possible relations between them.

We need to formulate candidate hypotheses, suggest some promising models, etc. (useful in pre-research).

[Diagram: Variables, Data]

Direction of Causality Problem

is equivalent to

are equivalent, but they are not equivalent to
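A minimal sketch of why the direction can be undecidable from data alone, assuming the simplest case of two variables X and Y (the joint table is hypothetical): the same joint distribution factorizes equally well in either direction.

```python
# Hypothetical joint distribution of two binary variables X and Y.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# Conditional tables for each reading of the arrow.
p_y_given_x = {(x, y): joint[(x, y)] / p_x[x] for (x, y) in joint}
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for (x, y) in joint}

# "X explains Y": P(x) * P(y | x).  "Y explains X": P(y) * P(x | y).
x_to_y = {(x, y): p_x[x] * p_y_given_x[(x, y)] for (x, y) in joint}
y_to_x = {(x, y): p_y[y] * p_x_given_y[(x, y)] for (x, y) in joint}

print(x_to_y)
print(y_to_x)
```

Both factorizations reproduce the original table, so no amount of observational data on X and Y alone can favour one direction over the other.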

STULONG Dataset

= dataset containing research data on cardiovascular disease (1976-79)

1417 patients (Czech middle-aged men)

244 attributes surveyed with each patient at the entry examination

37 selected attributes are described here

(Incomplete) List of Attributes

AGE, MARITAL STATUS, EDUCATION, OCCUPATION, PHYSICAL ACTIVITY, TRANSPORT TO JOB, SMOKING, ALCOHOL, TEA AND COFFEE

MYOCARDIAL INFARCTION, HYPERTENSION, ICTUS, HYPERLIPIDEMIA, CHEST PAIN, ASTHMA, HEIGHT & WEIGHT, BLOOD PRESSURE, …

Graph of Correlated Pairs

[Graph over the 37 selected attributes: MARIT.STAT, EDUC, RESP, ACT.IN.JOB, ACT.AFTER.JOB, TRANSPORT, TRANSPORT.TIME, SMOKING, SMOKING.YR, ALCOHOL.FREQ, BEER.DAILY, WINE.DAILY, LIQ.DAILY, COFFEE, TEA, SUGAR, IM, HT, HTD, HTL, DIABET, HYPLIP, PAIN.CHEST, PAIN.LL, ASTHMA, HEIGHT, WEIGHT, SYST1, DIAST1, SYST2, DIAST2, TRIC, SUBSC, CHLST, TRIGL, URINE, AGE]

464 of 666 possible pairs are statistically connected (p = 0.05)

Graph of Correlated Pairs 2

160 of 666 possible pairs are statistically connected (p = 0.05/666)

[Graph over the same 37 attributes]
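The figures on these two slides can be reproduced with a little arithmetic. The sketch below assumes that 666 is the number of unordered pairs among the 37 selected attributes and that p = 0.05/666 is a Bonferroni-style correction for testing all pairs at once; the variable names are illustrative.

```python
from math import comb

n_attributes = 37
n_pairs = comb(n_attributes, 2)       # unordered pairs of attributes

alpha = 0.05                          # per-test level on the first graph
corrected_alpha = alpha / n_pairs     # per-test level on the second graph

# If no pair were truly connected, testing every pair at level alpha
# would still flag about this many pairs by chance.
expected_chance_hits = alpha * n_pairs

print(n_pairs, corrected_alpha, expected_chance_hits)
```

The corrected level is far stricter than 0.05, which is consistent with the drop from 464 to 160 connected pairs.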

[Directed graph over the same 37 attributes]

56 arrows

Risk Factors for Hypertension

> summary(glm(HT ~ HYPLIP + IM + AGE + SUBSC, data = C, family = "binomial"))

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.322730   1.274252  -3.392 0.000693 ***
IM           1.246937   0.513342   2.429 0.015138 *
HYPLIP       1.126383   0.333971   3.373 0.000744 ***
SUBSC        0.009521   0.003978   2.393 0.016699 *
AGE          0.245182   0.136678   1.794 0.072835 .
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Risk Factors for Hypertension

Interpretation:

HYPERLIPIDEMIA and IM each roughly triple the odds of hypertension.

Every three years of AGE doubles the odds.

There is also a small but demonstrable connection to the skinfold above the musculus subscapularis (SUBSC).
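The interpretation follows from exponentiating the logistic-regression estimates, since exp(coefficient) is the factor by which the odds change per unit of the predictor. The sketch below copies the estimates from the output shown earlier; the dictionary and variable names are illustrative.

```python
from math import exp

# Estimates copied from the glm summary (log-odds scale).
coef = {
    "IM": 1.246937,
    "HYPLIP": 1.126383,
    "SUBSC": 0.009521,
    "AGE": 0.245182,
}

# Odds ratio per one unit of each predictor.
odds_ratio = {name: exp(beta) for name, beta in coef.items()}

# Odds ratio for a three-unit increase in AGE.
age_three_units = exp(3 * coef["AGE"])

for name, ratio in odds_ratio.items():
    print(f"{name}: {ratio:.3f}")
print(f"AGE, three units: {age_three_units:.3f}")
```

The ratios come out at roughly 3.5 for IM, 3.1 for HYPLIP and 2.1 for three units of AGE, matching "triple" and "double" on the slide. The SUBSC ratio is only slightly above 1 per unit, which is the small connection mentioned last.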
