Discrete Multivariate Analysis Analysis of Multivariate Categorical Data

Upload: ophelia-johnston

Post on 03-Jan-2016


Page 1: Discrete Multivariate Analysis Analysis of Multivariate Categorical Data

Discrete Multivariate Analysis

Analysis of Multivariate Categorical Data


Example 1

Data Set #1 - A two-way frequency table

Serum                 Systolic Blood Pressure
Cholesterol     <127   127-146   147-166    167+    Total
<200             117       121        47      22      307
200-219           85        98        43      20      246
220-259          119       209        68      43      439
260+              67        99        46      33      245
Total            388       527       204     118     1237

In this study we examine n = 1237 individuals, measuring X, Systolic Blood Pressure, and Y, Serum Cholesterol.


Example 2

The following data were taken from a study of parole success involving 5587 parolees in Ohio between 1965 and 1972 (a ten percent sample of all parolees during this period).


The study involved a dichotomous response Y, based on a one-year follow-up:
• Success (no major parole violation), or
• Failure (returned to prison either as technical violators or with a new conviction).

The predictors of parole success included are:
1. Type of committed offense (Person offense or Other offense),
2. Age (25 or Older or Under 25),
3. Prior Record (No prior sentence or Prior Sentence), and
4. Drug or Alcohol Dependency (No drug or Alcohol dependency or Drug and/or Alcohol dependency).


• The data were randomly split into two parts. The counts for each part are displayed in the table, with those for the second part in parentheses.

• The second part of the data was set aside for a validation study of the model to be fitted in the first part.


Table

                                No drug or alcohol dependency            Drug and/or alcohol dependency
                                25 or older         Under 25             25 or older         Under 25
                                Person    Other     Person    Other      Person    Other     Person    Other
                                offense   offense   offense   offense    offense   offense   offense   offense
No prior sentence of any kind
  Success                          48        34        37        49         48        28        35        57
                                  (44)      (34)      (29)      (58)       (47)      (38)      (37)      (53)
  Failure                           1         5         7        11          3         8         5        18
                                   (1)       (7)       (7)       (5)        (1)       (2)       (4)      (24)
Prior sentence
  Success                         117       259       131       319        197       435       107       291
                                 (111)     (253)     (131)     (320)      (202)     (392)     (103)     (294)
  Failure                          23        61        20        89         38       194        27       101
                                  (27)      (55)      (25)      (93)       (46)     (215)      (34)     (102)


Analysis of a Two-way Frequency Table:


Frequency Distribution (Serum Cholesterol and Systolic Blood Pressure)

Serum                 Systolic Blood Pressure
Cholesterol     <127   127-146   147-166    167+    Total
<200             117       121        47      22      307
200-219           85        98        43      20      246
220-259          119       209        68      43      439
260+              67        99        46      33      245
Total            388       527       204     118     1237


Joint and Marginal Distributions (Serum Cholesterol and Systolic Blood Pressure)

Serum                 Systolic Blood Pressure            Marginal distn
Cholesterol     <127   127-146   147-166    167+    (Serum Chol.)
<200            9.46      9.78      3.80    1.78        24.82
200-219         6.87      7.92      3.48    1.62        19.89
220-259         9.62     16.90      5.50    3.48        35.49
260+            5.42      8.00      3.72    2.67        19.81
Marginal distn
(BP)           31.37     42.60     16.49    9.54       100.00

The Marginal distributions allow you to look at the effect of one variable, ignoring the other. The joint distribution allows you to look at the two variables simultaneously.
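The percentages in the joint and marginal table can be reproduced directly from the frequency table. The following is a minimal sketch in plain Python; the function name `joint_and_marginals` is an illustrative choice, not something from the slides.

```python
# Observed counts: rows = serum cholesterol, columns = systolic blood pressure.
counts = [
    [117, 121, 47, 22],   # <200
    [85, 98, 43, 20],     # 200-219
    [119, 209, 68, 43],   # 220-259
    [67, 99, 46, 33],     # 260+
]


def joint_and_marginals(table):
    """Return (joint, row_marginal, col_marginal) as percentages of the grand total."""
    n = sum(sum(row) for row in table)
    joint = [[100.0 * x / n for x in row] for row in table]
    row_marginal = [100.0 * sum(row) / n for row in table]
    col_marginal = [100.0 * sum(col) / n for col in zip(*table)]
    return joint, row_marginal, col_marginal


joint, chol_marginal, bp_marginal = joint_and_marginals(counts)
print([round(p, 2) for p in joint[0]])        # joint distribution, first row
print([round(p, 2) for p in chol_marginal])   # marginal distribution of cholesterol
print([round(p, 2) for p in bp_marginal])     # marginal distribution of blood pressure
```

Each marginal is simply the joint distribution summed over the other variable, which is why it "ignores" that variable.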


Conditional Distributions (Systolic Blood Pressure given Serum Cholesterol)

The conditional distribution allows you to look at the effect of one variable, when the other variable is held fixed or known.

Serum                 Systolic Blood Pressure
Cholesterol     <127   127-146   147-166    167+     Total
<200           38.11     39.41     15.31    7.17    100.00
200-219        34.55     39.84     17.48    8.13    100.00
220-259        27.11     47.61     15.49    9.79    100.00
260+           27.35     40.41     18.78   13.47    100.00
Marginal distn
(BP)           31.37     42.60     16.49    9.54    100.00


Conditional Distributions (Serum Cholesterol given Systolic Blood Pressure)

Serum                 Systolic Blood Pressure            Marginal distn
Cholesterol     <127   127-146   147-166    167+    (Serum Chol.)
<200           30.15     22.96     23.04   18.64        24.82
200-219        21.91     18.60     21.08   16.95        19.89
220-259        30.67     39.66     33.33   36.44        35.49
260+           17.27     18.79     22.55   27.97        19.81
Total         100.00    100.00    100.00  100.00       100.00
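Both conditional tables come from the same counts: dividing each row by its row total gives blood pressure given cholesterol, and dividing each column by its column total gives cholesterol given blood pressure. A small sketch, with illustrative function names:

```python
# Observed counts: rows = serum cholesterol, columns = systolic blood pressure.
counts = [
    [117, 121, 47, 22],   # <200
    [85, 98, 43, 20],     # 200-219
    [119, 209, 68, 43],   # 220-259
    [67, 99, 46, 33],     # 260+
]


def row_conditionals(table):
    """Distribution of the column variable within each row, in percent."""
    return [[100.0 * x / sum(row) for x in row] for row in table]


def col_conditionals(table):
    """Distribution of the row variable within each column, in percent."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100.0 * x / t for x, t in zip(row, col_totals)] for row in table]


bp_given_chol = row_conditionals(counts)
chol_given_bp = col_conditionals(counts)
print([round(p, 2) for p in bp_given_chol[0]])   # BP distribution for cholesterol <200
print([round(p, 2) for p in chol_given_bp[3]])   # cholesterol 260+ share within each BP group
```

If the two variables were independent, every row of `bp_given_chol` would equal the marginal distribution of blood pressure.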


[GRAPH: Conditional distributions of Systolic Blood Pressure given Serum Cholesterol. Horizontal axis: Systolic Blood Pressure (<127, 127-146, 147-166, 167+). Vertical axis: percent (10% to 50%). One curve for each Serum Cholesterol group (<200, 200-219, 220-259, 260+) and one for the Marginal Distribution.]


Notation:

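Let $x_{ij}$ denote the frequency (no. of cases) where X (row variable) is i and Y (column variable) is j.

$$x_{i\cdot} = R_i = \sum_{j=1}^{c} x_{ij}$$

$$x_{\cdot j} = C_j = \sum_{i=1}^{r} x_{ij}$$

$$x_{\cdot\cdot} = N = \sum_{i=1}^{r}\sum_{j=1}^{c} x_{ij} = \sum_{i=1}^{r} x_{i\cdot} = \sum_{j=1}^{c} x_{\cdot j}$$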


Different Models

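The Multinomial Model:

Here the total number of cases N is fixed and $x_{ij}$ follows a multinomial distribution with parameters $\pi_{ij}$, where

$$\pi_{ij} = P[X = i, Y = j]$$

$$f(x_{11}, x_{12}, \ldots, x_{rc}) = \binom{N}{x_{11}\; x_{12} \cdots x_{rc}} \pi_{11}^{x_{11}} \pi_{12}^{x_{12}} \cdots \pi_{rc}^{x_{rc}} = \frac{N!}{x_{11}!\, x_{12}! \cdots x_{rc}!}\, \pi_{11}^{x_{11}} \pi_{12}^{x_{12}} \cdots \pi_{rc}^{x_{rc}}$$

$$\mu_{ij} = E[x_{ij}] = N\pi_{ij}$$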


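The Product Multinomial Model:

Here the row (or column) totals $R_i$ are fixed and, for a given row i, $x_{ij}$ follows a multinomial distribution with parameters $\pi_{j|i}$.

$$f(x_{11}, x_{12}, \ldots, x_{rc}) = \prod_{i=1}^{r} \binom{R_i}{x_{i1}\; x_{i2} \cdots x_{ic}} \pi_{1|i}^{x_{i1}} \pi_{2|i}^{x_{i2}} \cdots \pi_{c|i}^{x_{ic}}$$

$$\mu_{ij} = E[x_{ij}] = R_i\,\pi_{j|i}$$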


The Poisson Model:

In this case we observe over a fixed period of time and all counts in the table (including Row, Column and overall totals) follow a Poisson distribution. Let $\mu_{ij}$ denote the mean of $x_{ij}$.

$$\mu_{ij} = E[x_{ij}]$$

$$f(x_{ij}) = \frac{\mu_{ij}^{x_{ij}}}{x_{ij}!}\, e^{-\mu_{ij}}$$

$$f(x_{11}, x_{12}, \ldots, x_{rc}) = \prod_{i=1}^{r}\prod_{j=1}^{c} \frac{\mu_{ij}^{x_{ij}}}{x_{ij}!}\, e^{-\mu_{ij}}$$
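The three sampling models are closely related. One standard link, which the slides do not state explicitly, is that independent Poisson counts conditioned on their total N follow the multinomial model with $\pi_{ij} = \mu_{ij}/\sum\mu_{ij}$. The sketch below checks this numerically on a small table; the counts `x` and means `mu` are hypothetical values chosen only for illustration.

```python
import math


def log_poisson_pmf(k, mean):
    """log of the Poisson probability of observing k when the mean is `mean`."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_multinomial_pmf(x, probs):
    """log of the multinomial probability of counts x with total sum(x)."""
    out = math.lgamma(sum(x) + 1)
    for xi, pi in zip(x, probs):
        out += xi * math.log(pi) - math.lgamma(xi + 1)
    return out


# Hypothetical 2 x 2 table (flattened) and hypothetical Poisson means.
x = [3, 5, 2, 6]
mu = [2.5, 4.0, 3.0, 6.5]

# Poisson model: the cells are independent, so the log-probabilities add.
log_joint_poisson = sum(log_poisson_pmf(k, m) for k, m in zip(x, mu))

# The total N is itself Poisson with mean sum(mu).
log_total = log_poisson_pmf(sum(x), sum(mu))

# Conditioning on N: P(table | N) = P(table) / P(N).
log_conditional = log_joint_poisson - log_total

# Multinomial model with cell probabilities mu_ij / sum(mu).
log_multi = log_multinomial_pmf(x, [m / sum(mu) for m in mu])

print(log_conditional, log_multi)
```

The two printed values agree up to floating-point error, which is one way to see why the same estimated expected frequencies arise under each model.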


Independence


Multinomial Model

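If X and Y are independent, then

$$\pi_{ij} = P[X = i, Y = j] = P[X = i]\,P[Y = j] = \pi_{i\cdot}\,\pi_{\cdot j}$$

and

$$\mu_{ij} = N\pi_{ij} = N\,\pi_{i\cdot}\,\pi_{\cdot j}$$

The estimated expected frequency in cell (i,j) in the case of independence is:

$$\hat{m}_{ij} = \hat{\mu}_{ij} = N\,\hat{\pi}_{i\cdot}\,\hat{\pi}_{\cdot j} = N\,\frac{x_{i\cdot}}{N}\,\frac{x_{\cdot j}}{N} = \frac{x_{i\cdot}\,x_{\cdot j}}{N} = \frac{R_i\,C_j}{N}$$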


The same can be shown for the other two models – the Product Multinomial model and the Poisson model

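namely, the estimated expected frequency in cell (i,j) in the case of independence is:

$$\hat{m}_{ij} = \hat{\mu}_{ij} = \frac{R_i\,C_j}{N} = \frac{x_{i\cdot}\,x_{\cdot j}}{x_{\cdot\cdot}}$$

Standardized residuals are defined for each cell:

$$r_{ij} = \frac{x_{ij} - \hat{m}_{ij}}{\sqrt{\hat{m}_{ij}}}$$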


The Chi-Square Statistic

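$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} r_{ij}^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}}$$

The Chi-Square test for independence

Reject H0: independence if

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}} > \chi^2_{\alpha}, \qquad df = (r-1)(c-1)$$

where $\chi^2_{\alpha}$ is the upper $\alpha$ critical value of the chi-square distribution with df degrees of freedom.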
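The expected frequencies, standardized residuals and chi-square statistic can be computed in a few lines. This is a minimal sketch for the serum cholesterol by blood pressure table; the function name `independence_fit` is illustrative, and the constant `CRIT_05_DF9` is the tabulated upper 5% point of the chi-square distribution with 9 degrees of freedom, entered by hand rather than computed.

```python
import math

# Observed counts: rows = serum cholesterol, columns = systolic blood pressure.
counts = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]


def independence_fit(table):
    """Expected frequencies R_i * C_j / N, standardized residuals, chi-square and df."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]
    resid = [
        [(x - m) / math.sqrt(m) for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(table, expected)
    ]
    chi2 = sum(r * r for row in resid for r in row)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return expected, resid, chi2, df


expected, resid, chi2, df = independence_fit(counts)
CRIT_05_DF9 = 16.919  # tabulated upper 5% point, chi-square with 9 df

print(round(expected[0][0], 2), round(resid[0][0], 2))
print(round(chi2, 2), df, chi2 > CRIT_05_DF9)
```

Cells with large positive or negative residuals show where the table departs most from independence.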


Table: Expected frequencies, (Observed frequencies), Standardized Residuals

Serum                 Systolic Blood Pressure
Cholesterol     <127   127-146   147-166    167+    Total
<200           96.29    130.79     50.63   29.29      307
               (117)     (121)      (47)    (22)
                2.11     -0.86     -0.51   -1.35
200-219        77.16    104.80     40.57   23.47      246
                (85)      (98)      (43)    (20)
                0.86     -0.66      0.38   -0.72
220-259       137.70    187.03     72.40   41.88      439
               (119)     (209)      (68)    (43)
               -1.59      1.61     -0.52    0.17
260+           76.85    104.38     40.40   23.37      245
                (67)      (99)      (46)    (33)
               -1.12     -0.53      0.88    1.99
Total            388       527       204     118     1237

$\chi^2$ = 20.85 (p = 0.0133)


Example

In this example, N = 57,407 cases in which an individual was victimized twice by crime were studied.

The crime of the first victimization (X) and the crime of the second victimization (Y) were noted.

The data are tabulated on the following slide.


Table 1: Frequencies

First                           Second Victimization in Pair
Victimization
in Pair        Ra       A      Ro   PP/PS      PL       B      HL      MV    Total
Ra             26      50      11       6      82      39      48      11      273
A              65    2997     238      85    2553    1083    1349     216     8586
Ro             12     279     197      36     459     197     221      47     1448
PP/PS           3     102      40      61     243     115     101      38      703
PL             75    2628     413     229   12137    2658    3689     687    22516
B              52    1117     191     102    2649    3210    1973     301     9595
HL             42    1251     206     117    3757    1962    4646     391    12372
MV              3     221      51      24     678     301     367     269     1914
Total         278    8645    1347     660   22558    9565   12394    1960    57407


Table 2: Standardized residuals

First                           Second Victimization in Pair
Victimization
in Pair        Ra       A      Ro   PP/PS      PL       B      HL      MV
Ra           21.5     1.4     1.8     1.6    -2.4    -1.0    -1.9     0.6
A             3.6    47.4     2.6    -1.4   -14.1    -9.2   -11.7    -4.5
Ro            1.9     4.1    28.0     4.7    -4.6    -2.8    -5.2    -0.3
PP/PS        -0.2    -0.4     5.8    18.6    -2.0    -0.2    -4.1     2.9
PL           -3.3   -13.1    -5.0    -1.9    35.0   -17.9   -16.8    -2.9
B             0.8    -8.6    -2.3    -0.8   -18.3    40.3    -2.2    -1.5
HL           -2.3   -14.2    -4.9    -2.1   -15.8    -2.2    38.2    -1.5
MV           -2.1    -4.0     0.9     0.4    -2.7    -1.0    -2.3    25.2

Test statistic for independence: 11,430 (highly significant)
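The standardized residuals in Table 2 follow from Table 1 with the same formula used for the two-way blood pressure table. The sketch below recomputes only the diagonal cells, where the first and second victimization are the same crime; the helper name `std_residual` is an illustrative choice.

```python
import math

labels = ["Ra", "A", "Ro", "PP/PS", "PL", "B", "HL", "MV"]

# Table 1: rows = first victimization, columns = second victimization.
freq = [
    [26, 50, 11, 6, 82, 39, 48, 11],
    [65, 2997, 238, 85, 2553, 1083, 1349, 216],
    [12, 279, 197, 36, 459, 197, 221, 47],
    [3, 102, 40, 61, 243, 115, 101, 38],
    [75, 2628, 413, 229, 12137, 2658, 3689, 687],
    [52, 1117, 191, 102, 2649, 3210, 1973, 301],
    [42, 1251, 206, 117, 3757, 1962, 4646, 391],
    [3, 221, 51, 24, 678, 301, 367, 269],
]

row_tot = [sum(row) for row in freq]
col_tot = [sum(col) for col in zip(*freq)]
n = sum(row_tot)


def std_residual(i, j):
    """(observed - expected) / sqrt(expected), with expected = R_i * C_j / N."""
    m = row_tot[i] * col_tot[j] / n
    return (freq[i][j] - m) / math.sqrt(m)


diagonal = {labels[i]: round(std_residual(i, i), 1) for i in range(len(labels))}
print(n)
print(diagonal)
```

Every diagonal residual is large and positive, which is the pattern Table 2 shows: a repeat victimization is much more likely to involve the same crime than independence would predict.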


Table 3: Conditional distribution of second victimization given the first victimization (%)

First                           Second Victimization in Pair
Victimization
in Pair        Ra       A      Ro   PP/PS      PL       B      HL      MV    Total
Ra            9.5    18.3     4.0     2.2    30.0    14.3    17.6     4.0    100.0
A             0.8    34.9     2.8     1.0    29.7    12.6    15.7     2.5    100.0
Ro            0.8    19.3    13.6     2.5    31.7    13.6    15.3     3.2    100.0
PP/PS         0.4    14.5     5.7     8.7    34.6    16.4    14.4     5.4    100.0
PL            0.3    11.7     1.8     1.0    53.9    11.8    16.4     3.1    100.0
B             0.5    11.6     2.0     1.1    27.6    33.5    20.6     3.1    100.0
HL            0.3    10.1     1.7     0.9    30.4    15.9    37.6     3.2    100.0
MV            0.2    11.5     2.7     1.3    35.4    15.7    19.2    14.1    100.0
Marginal      0.5    15.1     2.3     1.1    39.3    16.7    21.6     3.4    100.0


Log Linear Model


Recall, if the two variables, rows (X) and columns (Y) are independent then

$$\mu_{ij} = N\pi_{ij} = N\,\pi_{i\cdot}\,\pi_{\cdot j}$$

and

$$\ln\mu_{ij} = \ln N + \ln\pi_{i\cdot} + \ln\pi_{\cdot j}$$


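In general let

$$u = \frac{1}{rc}\sum_{i}\sum_{j} \ln\mu_{ij}$$

$$u_{1(i)} = \frac{1}{c}\sum_{j} \ln\mu_{ij} - u$$

$$u_{2(j)} = \frac{1}{r}\sum_{i} \ln\mu_{ij} - u$$

$$u_{12(i,j)} = \ln\mu_{ij} - u - u_{1(i)} - u_{2(j)}$$

then

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)} \qquad (1)$$

where

$$\sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = \sum_{i} u_{12(i,j)} = \sum_{j} u_{12(i,j)} = 0$$

Equation (1) is called the log-linear model for the frequencies $x_{ij}$.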


Note: X and Y are independent if

$$u_{12(i,j)} = 0 \quad \text{for all } i, j$$

In this case the log-linear model becomes

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)}$$
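The u-terms are just row, column and overall averages of the log expected frequencies. The sketch below computes them for two tables built from the blood pressure data: the observed counts (used here in place of the $\mu_{ij}$, which is an illustrative assumption) and the expected frequencies under independence. The function name `u_terms` is hypothetical.

```python
import math


def u_terms(mu):
    """Decompose ln(mu_ij) into u, u1(i), u2(j) and u12(i,j) with sum-to-zero constraints."""
    r, c = len(mu), len(mu[0])
    log_mu = [[math.log(m) for m in row] for row in mu]
    u = sum(sum(row) for row in log_mu) / (r * c)
    u1 = [sum(row) / c - u for row in log_mu]
    u2 = [sum(log_mu[i][j] for i in range(r)) / r - u for j in range(c)]
    u12 = [[log_mu[i][j] - u - u1[i] - u2[j] for j in range(c)] for i in range(r)]
    return u, u1, u2, u12


counts = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]
row_tot = [sum(row) for row in counts]
col_tot = [sum(col) for col in zip(*counts)]
n = sum(row_tot)
independent = [[ri * cj / n for cj in col_tot] for ri in row_tot]

u_obs, u1_obs, u2_obs, u12_obs = u_terms(counts)
u_ind, u1_ind, u2_ind, u12_ind = u_terms(independent)

print([round(v, 3) for v in u12_obs[0]])   # interaction terms, first row, observed table
print(max(abs(v) for row in u12_ind for v in row))   # essentially zero under independence
```

For the independence table every interaction term vanishes up to rounding error, while the observed table has non-zero interactions whose rows and columns each sum to zero.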


Another formulation

$$\ln\mu_{ij} = u^{*} + u^{*}_{1(i)} + u^{*}_{2(j)} + u^{*}_{12(i,j)}$$

where

$$u^{*}_{1(I)} = u^{*}_{2(J)} = u^{*}_{12(I,j)} = u^{*}_{12(i,J)} = 0$$


Three-way Frequency Tables


With two variables the dependence structure is simple: the variables are either dependent or independent.

When there are three or more variables the dependence structure is much more complicated.


Marginal distributions

Distributions of two variables ignoring the third.

1. X1, X2 ignoring X3

2. X1, X3 ignoring X2

3. X2, X3 ignoring X1

Distributions of one variable ignoring the other two.

1. X1 ignoring X2, X3

2. X2 ignoring X1, X3

3. X3 ignoring X1, X2


Conditional distributions

Distributions of two variables given the third.

1. X1, X2 given X3

2. X1, X3 given X2

3. X2, X3 given X1

Distributions of one variable given the other two.

1. X1 given X2, X3

2. X2 given X1, X3

3. X3 given X1, X2


Distributions of one variable given either of the other two.

1. X1 given X2

2. X1 given X3

3. X2 given X1

4. X2 given X3

5. X3 given X1

6. X3 given X2


Example

Data from the Framingham Longitudinal Study of Coronary Heart Disease (Cornfield [1962])

Variables

1. Systolic Blood Pressure (X): <127, 127-146, 147-166, 167+
2. Serum Cholesterol: <200, 200-219, 220-259, 260+
3. Heart Disease: Present, Absent

The data are tabulated on the next slide.


Three-way Frequency Table

Coronary    Serum               Systolic Blood Pressure (mm Hg)
Heart       Cholesterol
Disease     (mg/100 cc)     <127   127-146   147-166    167+
Present     <200               2         3         3       4
            200-219            3         2         0       3
            220-259            8        11         6       6
            260+               7        12        11      11
Absent      <200             117       121        47      22
            200-219           85        98        43      20
            220-259          119       209        68      43
            260+              67        99        46      33


Log-Linear model for three-way tables

Let $\mu_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$

where

$$0 = \sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = \sum_{k} u_{3(k)} = \sum_{i} u_{12(i,j)} = \sum_{j} u_{12(i,j)}$$

$$= \sum_{i} u_{13(i,k)} = \sum_{k} u_{13(i,k)} = \sum_{j} u_{23(j,k)} = \sum_{k} u_{23(j,k)}$$

$$= \sum_{i} u_{123(i,j,k)} = \sum_{j} u_{123(i,j,k)} = \sum_{k} u_{123(i,j,k)}$$


Hierarchical Log-linear models for categorical Data

For three way tables

The hierarchical principle:

If an interaction is in the model, also keep lower order interactions and main effects associated with that interaction


1. Model: (All Main effects model)

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)}$$

i.e. $u_{12(i,j)} = u_{13(i,k)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$.

Notation:

[1][2][3]

Description:

Mutual independence between all three variables.


2. Model:

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)}$$

i.e. $u_{13(i,k)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$.

Notation:

[12][3]

Description:

Independence of Variable 3 with variables 1 and 2.


3. Model:

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)}$$

i.e. $u_{12(i,j)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$.

Notation:

[13][2]

Description:

Independence of Variable 2 with variables 1 and 3.


4. Model:

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{23(j,k)}$$

i.e. $u_{12(i,j)} = u_{13(i,k)} = u_{123(i,j,k)} = 0$.

Notation:

[23][1]

Description:

Independence of Variable 1 with variables 2 and 3.


5. Model:

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)}$$

i.e. $u_{23(j,k)} = u_{123(i,j,k)} = 0$.

Notation:

[12][13]

Description:

Conditional independence between variables 2 and 3 given variable 1.


6. Model:

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{23(j,k)}$$

i.e. $u_{13(i,k)} = u_{123(i,j,k)} = 0$.

Notation:

[12][23]

Description:

Conditional independence between variables 1 and 3 given variable 2.


7. Model:

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)} + u_{23(j,k)}$$

i.e. $u_{12(i,j)} = u_{123(i,j,k)} = 0$.

Notation:

[13][23]

Description:

Conditional independence between variables 1 and 2 given variable 3.


8. Model:

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)}$$

i.e. $u_{123(i,j,k)} = 0$.

Notation:

[12][13][23]

Description:

Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.


9. Model: (the saturated model)

$$\ln\mu_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$$

Notation:

[123]

Description:

No simplifying dependence structure.


Hierarchical Log-linear models for 3 way table

Model           Description
[1][2][3]       Mutual independence between all three variables.
[1][23]         Independence of Variable 1 with variables 2 and 3.
[2][13]         Independence of Variable 2 with variables 1 and 3.
[3][12]         Independence of Variable 3 with variables 1 and 2.
[12][13]        Conditional independence between variables 2 and 3 given variable 1.
[12][23]        Conditional independence between variables 1 and 3 given variable 2.
[13][23]        Conditional independence between variables 1 and 2 given variable 3.
[12][13][23]    Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.
[123]           The saturated model.
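The bracket notation lists only the highest-order terms; the hierarchical principle supplies the rest. The sketch below expands a bracket list into the full set of u-terms and counts free parameters under the sum-to-zero constraints. The function names and the `levels` dictionary are illustrative; the level counts (4, 4, 2) are those of the Framingham example, and the degrees of freedom use the rule "cells minus parameters fitted" given later in these notes.

```python
from itertools import combinations


def expand(generators):
    """All u-terms implied by the hierarchical principle.

    Each generator is a string of variable labels, e.g. "12" for u12.
    The empty string stands for the overall term u.
    """
    terms = set()
    for g in generators:
        for size in range(len(g) + 1):
            for combo in combinations(sorted(g), size):
                terms.add("".join(combo))
    return terms


def n_parameters(generators, levels):
    """Number of free parameters: each term contributes the product of (levels - 1)."""
    total = 0
    for term in expand(generators):
        k = 1
        for v in term:
            k *= levels[v] - 1
        total += k
    return total


# Framingham example: variable 1 and variable 2 have 4 levels, variable 3 has 2.
levels = {"1": 4, "2": 4, "3": 2}
n_cells = 4 * 4 * 2

for model in (["1", "2", "3"], ["13", "23"], ["12", "13", "23"], ["123"]):
    df = n_cells - n_parameters(model, levels)
    print(model, sorted(expand(model)), df)
```

The saturated model [123] uses as many parameters as there are cells, so it leaves zero degrees of freedom.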


Maximum Likelihood Estimation

Log-Linear Model


For any Model it is possible to determine the maximum Likelihood Estimators of the parameters

Example

Two-way table – independence – multinomial model

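$$f(x_{11}, x_{12}, \ldots, x_{rc}) = \binom{N}{x_{11}\; x_{12} \cdots x_{rc}} \pi_{11}^{x_{11}} \pi_{12}^{x_{12}} \cdots \pi_{rc}^{x_{rc}}$$

$$= \frac{N!}{x_{11}!\, x_{12}! \cdots x_{rc}!} \left(\frac{\mu_{11}}{N}\right)^{x_{11}} \left(\frac{\mu_{12}}{N}\right)^{x_{12}} \cdots \left(\frac{\mu_{rc}}{N}\right)^{x_{rc}}$$

since

$$\mu_{ij} = E[x_{ij}] = N\pi_{ij} \quad \text{or} \quad \pi_{ij} = \frac{\mu_{ij}}{N}$$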


Log-likelihood

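$$l(\mu_{11}, \mu_{12}, \ldots, \mu_{rc}) = \ln N! - \sum_{i}\sum_{j} \ln x_{ij}! + \sum_{i}\sum_{j} x_{ij}\ln\mu_{ij} - \ln N \sum_{i}\sum_{j} x_{ij}$$

$$= K + \sum_{i}\sum_{j} x_{ij}\ln\mu_{ij}$$

where

$$K = \ln N! - \sum_{i}\sum_{j} \ln x_{ij}! - N\ln N$$

With the model of independence

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)}$$

and

$$l(u, u_{1(1)}, \ldots, u_{1(r)}, u_{2(1)}, \ldots, u_{2(c)}) = K + \sum_{i}\sum_{j} x_{ij}\left(u + u_{1(i)} + u_{2(j)}\right)$$

$$= K + Nu + \sum_{i} x_{i\cdot}\,u_{1(i)} + \sum_{j} x_{\cdot j}\,u_{2(j)}$$

with

$$\sum_{i} u_{1(i)} = \sum_{j} u_{2(j)} = 0$$

also

$$\sum_{i}\sum_{j} \mu_{ij} = \sum_{i}\sum_{j} e^{u + u_{1(i)} + u_{2(j)}} = e^{u}\sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} = N$$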


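Let

$$g(u, u_{1(1)}, \ldots, u_{1(r)}, u_{2(1)}, \ldots, u_{2(c)}, \lambda_1, \lambda_2, \lambda) = K + Nu + \sum_{i} x_{i\cdot}\,u_{1(i)} + \sum_{j} x_{\cdot j}\,u_{2(j)}$$

$$+ \lambda_1 \sum_{i} u_{1(i)} + \lambda_2 \sum_{j} u_{2(j)} + \lambda\left(e^{u}\sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} - N\right)$$

Now

$$\frac{\partial g}{\partial u} = N + \lambda\, e^{u}\sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} = N(1 + \lambda) = 0$$

so that

$$\lambda = -1$$

$$\frac{\partial g}{\partial u_{1(i)}} = x_{i\cdot} + \lambda_1 + \lambda\, e^{u} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} = x_{i\cdot} + \lambda_1 - N\,\frac{e^{u_{1(i)}}}{\sum_{i'} e^{u_{1(i')}}} = 0$$

$$\frac{e^{u_{1(i)}}}{\sum_{i'} e^{u_{1(i')}}} = \frac{x_{i\cdot} + \lambda_1}{N} = \frac{x_{i\cdot}}{N}$$

since

$$\sum_{i} \frac{x_{i\cdot} + \lambda_1}{N} = 1 + \frac{r\lambda_1}{N} = 1 \quad \text{and} \quad \lambda_1 = 0$$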


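Now

$$e^{u_{1(i)}} = x_{i\cdot}\,K_1 \quad \text{or} \quad u_{1(i)} = \ln x_{i\cdot} + \ln K_1$$

where $K_1 = \sum_{i'} e^{u_{1(i')}}/N$ does not depend on i, and

$$\sum_{i} u_{1(i)} = \sum_{i} \ln x_{i\cdot} + r\ln K_1 = 0$$

Hence

$$\ln K_1 = -\frac{1}{r}\sum_{i} \ln x_{i\cdot}$$

and

$$u_{1(i)} = \ln x_{i\cdot} - \frac{1}{r}\sum_{i} \ln x_{i\cdot}$$

Similarly

$$u_{2(j)} = \ln x_{\cdot j} - \frac{1}{c}\sum_{j} \ln x_{\cdot j}$$

Finally

$$\sum_{i}\sum_{j} \mu_{ij} = \sum_{i}\sum_{j} e^{u + u_{1(i)} + u_{2(j)}} = e^{u}\sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}} = N$$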


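Hence

$$e^{u} = \frac{N}{\sum_{i} e^{u_{1(i)}} \sum_{j} e^{u_{2(j)}}}$$

Now

$$e^{u_{1(i)}} = \frac{x_{i\cdot}}{\left(\prod_{i=1}^{r} x_{i\cdot}\right)^{1/r}} \quad \text{and} \quad e^{u_{2(j)}} = \frac{x_{\cdot j}}{\left(\prod_{j=1}^{c} x_{\cdot j}\right)^{1/c}}$$

$$e^{u} = \frac{N \left(\prod_{i=1}^{r} x_{i\cdot}\right)^{1/r} \left(\prod_{j=1}^{c} x_{\cdot j}\right)^{1/c}}{\sum_{i} x_{i\cdot} \sum_{j} x_{\cdot j}} = \frac{1}{N}\left(\prod_{i=1}^{r} x_{i\cdot}\right)^{1/r} \left(\prod_{j=1}^{c} x_{\cdot j}\right)^{1/c}$$

Hence

$$u = \frac{1}{r}\sum_{i} \ln x_{i\cdot} + \frac{1}{c}\sum_{j} \ln x_{\cdot j} - \ln N$$

Note

$$\ln\mu_{ij} = u + u_{1(i)} + u_{2(j)}$$

$$= \frac{1}{r}\sum_{i} \ln x_{i\cdot} + \frac{1}{c}\sum_{j} \ln x_{\cdot j} - \ln N + \ln x_{i\cdot} - \frac{1}{r}\sum_{i} \ln x_{i\cdot} + \ln x_{\cdot j} - \frac{1}{c}\sum_{j} \ln x_{\cdot j}$$

$$= \ln x_{i\cdot} + \ln x_{\cdot j} - \ln N$$

or

$$\hat{\mu}_{ij} = \frac{x_{i\cdot}\,x_{\cdot j}}{N}$$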
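The closed-form result can be checked numerically. The sketch below uses the blood pressure by cholesterol counts to (a) rebuild $\ln\hat{\mu}_{ij}$ from the formulas for u, $u_{1(i)}$ and $u_{2(j)}$ and (b) confirm that the log-likelihood kernel $\sum x_{ij}\ln\mu_{ij}$ is larger at $\hat{\mu}_{ij} = x_{i\cdot}x_{\cdot j}/N$ than at a competing independence table with the same total. The competing table with equal row proportions is an arbitrary choice made for illustration.

```python
import math

counts = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]
row_tot = [sum(row) for row in counts]
col_tot = [sum(col) for col in zip(*counts)]
n = sum(row_tot)
r, c = len(row_tot), len(col_tot)

# (a) u-terms from the closed-form maximum likelihood solution.
mean_log_row = sum(math.log(x) for x in row_tot) / r
mean_log_col = sum(math.log(x) for x in col_tot) / c
u = mean_log_row + mean_log_col - math.log(n)
u1 = [math.log(x) - mean_log_row for x in row_tot]
u2 = [math.log(x) - mean_log_col for x in col_tot]
mu_hat = [[math.exp(u + a + b) for b in u2] for a in u1]


# (b) kernel of the log-likelihood: the part that depends on mu.
def kernel(table, mu):
    return sum(x * math.log(m) for xr, mr in zip(table, mu) for x, m in zip(xr, mr))


def independence_table(row_p, col_p, total):
    return [[total * a * b for b in col_p] for a in row_p]


col_p = [x / n for x in col_tot]
competitor = independence_table([0.25] * 4, col_p, n)

print(round(mu_hat[0][0], 2), round(row_tot[0] * col_tot[0] / n, 2))
print(kernel(counts, mu_hat) > kernel(counts, competitor))
```

Any other choice of row or column proportions gives a kernel no larger than the one at $\hat{\mu}_{ij}$.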


Comments

• Maximum Likelihood estimates can be computed for any hierarchical log-linear model, including models with more than 2 variables.

• In certain situations the equations need to be solved numerically.

• For the saturated model (all interactions and main effects), the estimate of $\mu_{ijk\ldots}$ is $x_{ijk\ldots}$.


Goodness of Fit Statistics

These statistics can be used to check if a log-linear model will fit the observed frequency table.


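Goodness of Fit Statistics

The Chi-squared statistic:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \sum_{i,j,k} \frac{(x_{ijk} - \hat{\mu}_{ijk})^2}{\hat{\mu}_{ijk}}$$

The Likelihood Ratio statistic:

$$G^2 = 2\sum \text{Observed}\,\ln\!\left(\frac{\text{Observed}}{\text{Expected}}\right) = 2\sum_{i,j,k} x_{ijk}\ln\!\left(\frac{x_{ijk}}{\hat{\mu}_{ijk}}\right)$$

d.f. = # cells - # parameters fitted

We reject the model if $\chi^2$ or $G^2$ is greater than $\chi^2_{\alpha}$, the upper $\alpha$ critical value of the chi-square distribution with the above d.f.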


Example: Variables

1. Systolic Blood Pressure (B)
2. Serum Cholesterol (C)
3. Coronary Heart Disease (H)

Coronary    Serum               Systolic Blood Pressure (mm Hg)
Heart       Cholesterol
Disease     (mg/100 cc)     <127   127-146   147-166    167+
Present     <200               2         3         3       4
            200-219            3         2         0       3
            220-259            8        11         6       6
            260+               7        12        11      11
Absent      <200             117       121        47      22
            200-219           85        98        43      20
            220-259          119       209        68      43
            260+              67        99        46      33


Goodness of fit testing of Models

MODEL         DF    LIKELIHOOD-     PROB.    PEARSON     PROB.
                    RATIO CHISQ              CHISQ
-----         --    -----------    -------   -------    -------
B,C,H.        24        83.15      0.0000    102.00     0.0000
B,CH.         21        51.23      0.0002     56.89     0.0000
C,BH.         21        59.59      0.0000     60.43     0.0000
H,BC.         15        58.73      0.0000     64.78     0.0000
BC,BH.        12        35.16      0.0004     33.76     0.0007
BH,CH.        18        27.67      0.0673     26.58     0.0872   n.s.
CH,BC.        12        26.80      0.0082     33.18     0.0009
BC,BH,CH.      9         8.08      0.5265      6.56     0.6824   n.s.

Possible Models:
1. [BH][CH]: B and C independent given H.
2. [BC][BH][CH]: all two factor interaction model.
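Two of the models in the table have closed-form fitted values, so their goodness of fit statistics can be sketched without an iterative algorithm: mutual independence (B,C,H) uses the product of the one-way margins divided by N squared, and [BH][CH] uses the BH and CH margins divided by the H margin. These closed forms are standard results for such models rather than something stated on the slides, and the helper names below are illustrative. The sketch prints G² and the Pearson statistic without asserting that they reproduce the tabulated program output.

```python
import math

# x[h][c][b]: h = heart disease (0 = present, 1 = absent),
# c = serum cholesterol group, b = systolic blood pressure group.
x = [
    [[2, 3, 3, 4], [3, 2, 0, 3], [8, 11, 6, 6], [7, 12, 11, 11]],
    [[117, 121, 47, 22], [85, 98, 43, 20], [119, 209, 68, 43], [67, 99, 46, 33]],
]
cells = [(h, c, b) for h in range(2) for c in range(4) for b in range(4)]
n = sum(x[h][c][b] for h, c, b in cells)


def margin(keep):
    """Sum the table over the variables not named in `keep`, e.g. keep="hb"."""
    out = {}
    for cell in cells:
        key = tuple(v for name, v in zip("hcb", cell) if name in keep)
        h, c, b = cell
        out[key] = out.get(key, 0) + x[h][c][b]
    return out


m_h, m_c, m_b = margin("h"), margin("c"), margin("b")
m_hb, m_hc = margin("hb"), margin("hc")

# Closed-form fitted values for the two models.
fit_bch = {(h, c, b): m_h[(h,)] * m_c[(c,)] * m_b[(b,)] / n ** 2 for h, c, b in cells}
fit_bh_ch = {(h, c, b): m_hb[(h, b)] * m_hc[(h, c)] / m_h[(h,)] for h, c, b in cells}


def pearson(fit):
    return sum((x[h][c][b] - fit[(h, c, b)]) ** 2 / fit[(h, c, b)] for h, c, b in cells)


def g_squared(fit):
    # Cells with an observed count of 0 contribute 0 to the sum.
    return 2 * sum(
        x[h][c][b] * math.log(x[h][c][b] / fit[(h, c, b)])
        for h, c, b in cells
        if x[h][c][b] > 0
    )


# Degrees of freedom are taken from the DF column of the table.
for name, fit, df in [("B,C,H", fit_bch, 24), ("BH,CH", fit_bh_ch, 18)]:
    print(name, df, round(g_squared(fit), 2), round(pearson(fit), 2))
```

The model [BC][BH][CH] has no closed-form fit and would need an iterative method, which is outside the scope of this sketch.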


Model 1: [BH][CH] Log-linear parameters

Heart disease - Blood Pressure Interaction: $u_{HB(i,j)}$

            Bp
Hd        <127   127-146   147-166    167+
Pres    -0.256    -0.241     0.066   0.431
Abs      0.256     0.241    -0.066  -0.431

$$z = \frac{u_{HB(i,j)}}{\sigma_{u_{HB(i,j)}}}$$

            Bp
Hd        <127   127-146   147-166    167+
Pres    -2.607    -2.733     0.660   4.461
Abs      2.607     2.733    -0.660  -4.461

Multiplicative effect

$$\tau_{HB(i,j)} = \exp\left(u_{HB(i,j)}\right) = e^{u_{HB(i,j)}}$$

            Bp
Hd        <127   127-146   147-166    167+
Pres     0.774     0.786     1.068   1.538
Abs      1.291     1.272     0.936   0.650

Log-Linear Model

$$\ln\mu_{ijk} = u + u_{H(i)} + u_{B(j)} + u_{C(k)} + u_{HB(i,j)} + u_{HC(i,k)}$$

$$\mu_{ijk} = e^{u}\, e^{u_{H(i)}}\, e^{u_{B(j)}}\, e^{u_{C(k)}}\, e^{u_{HB(i,j)}}\, e^{u_{HC(i,k)}} = \tau\,\tau_{H(i)}\,\tau_{B(j)}\,\tau_{C(k)}\,\tau_{HB(i,j)}\,\tau_{HC(i,k)}$$
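The multiplicative effects are the exponentials of the log-linear parameters, so the third table can be recovered from the first. A short sketch using the Model 1 heart disease by blood pressure parameters as printed (rounded to three decimals, so the last digit can differ slightly):

```python
import math

bp_groups = ["<127", "127-146", "147-166", "167+"]

# Log-linear interaction parameters for Model 1, copied from the table.
u_hb = {
    "Pres": [-0.256, -0.241, 0.066, 0.431],
    "Abs": [0.256, 0.241, -0.066, -0.431],
}

# Multiplicative effects: exp(u) for each cell.
tau_hb = {hd: [math.exp(u) for u in row] for hd, row in u_hb.items()}

for hd, row in tau_hb.items():
    print(hd, [round(t, 3) for t in row])

# Because the Pres and Abs parameters are equal and opposite,
# the corresponding multiplicative effects are reciprocals of each other.
products = [p * a for p, a in zip(tau_hb["Pres"], tau_hb["Abs"])]
print([round(v, 6) for v in products])
```

A multiplicative effect above 1 raises the expected frequency of that cell relative to the model without the interaction, and a value below 1 lowers it.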


Heart Disease - Cholesterol Interaction: $u_{HC(i,k)}$

            Chol
Hd        <200   200-219   220-259    260+
Pres    -0.233    -0.325     0.063   0.494
Abs      0.233     0.325    -0.063  -0.494

$$z = \frac{u_{HC(i,k)}}{\sigma_{u_{HC(i,k)}}}$$

            Chol
Hd        <200   200-219   220-259    260+
Pres    -1.889    -2.268     0.677   5.558
Abs      1.889     2.268    -0.677  -5.558

Multiplicative effect

$$\tau_{HC(i,k)} = \exp\left(u_{HC(i,k)}\right) = e^{u_{HC(i,k)}}$$

            Chol
Hd        <200   200-219   220-259    260+
Pres     0.792     0.723     1.065   1.640
Abs      1.262     1.384     0.939   0.610


Model 2: [BC][BH][CH] Log-linear parameters

Blood pressure - Cholesterol interaction: $u_{BC(j,k)}$

              Bp
Chol        <127   127-146   147-166    167+
<200       0.222    -0.019    -0.034  -0.169
200-219    0.114    -0.041     0.013  -0.086
220-259   -0.114     0.154    -0.058   0.018
260+      -0.221    -0.094     0.079   0.237

$$z = \frac{u_{BC(j,k)}}{\sigma_{u_{BC(j,k)}}}$$

              Bp
Chol        <127   127-146   147-166    167+
<200       2.680    -0.236    -0.326  -1.291
200-219    1.270    -0.472     0.117  -0.626
220-259   -1.502     2.253    -0.636   0.167
260+      -2.487    -1.175     0.785   2.051

Multiplicative effect

$$\tau_{BC(j,k)} = \exp\left(u_{BC(j,k)}\right) = e^{u_{BC(j,k)}}$$

              Bp
Chol        <127   127-146   147-166    167+
<200       1.248     0.981     0.967   0.844
200-219    1.120     0.960     1.013   0.918
220-259    0.892     1.166     0.944   1.018
260+       0.802     0.910     1.082   1.267


Heart disease - Blood Pressure Interaction: $u_{HB(i,j)}$

            Bp
Hd        <127   127-146   147-166    167+
Pres    -0.211    -0.232     0.055   0.389
Abs      0.211     0.232    -0.055  -0.389

$$z = \frac{u_{HB(i,j)}}{\sigma_{u_{HB(i,j)}}}$$

            Bp
Hd        <127   127-146   147-166    167+
Pres    -2.125    -2.604     0.542   3.938
Abs      2.125     2.604    -0.542  -3.938

Multiplicative effect

$$\tau_{HB(i,j)} = \exp\left(u_{HB(i,j)}\right) = e^{u_{HB(i,j)}}$$

            Bp
Hd        <127   127-146   147-166    167+
Pres     0.809     0.793     1.056   1.475
Abs      1.235     1.261     0.947   0.678


Heart Disease - Cholesterol Interaction: $u_{HC(i,k)}$

            Chol
Hd        <200   200-219   220-259    260+
Pres    -0.212    -0.316     0.069   0.460
Abs      0.212     0.316    -0.069  -0.460

$$z = \frac{u_{HC(i,k)}}{\sigma_{u_{HC(i,k)}}}$$

            Chol
Hd        <200   200-219   220-259    260+
Pres    -1.712    -2.199     0.732   5.095
Abs      1.712     2.199    -0.732  -5.095

Multiplicative effect

$$\tau_{HC(i,k)} = \exp\left(u_{HC(i,k)}\right) = e^{u_{HC(i,k)}}$$

            Chol
Hd        <200   200-219   220-259    260+
Pres     0.809     0.729     1.071   1.584
Abs      1.237     1.372     0.933   0.631


Next topic: Discrete Multivariate Analysis II