Multinomial logistic regression: Analysis of multi-category outcomes and its application to a Salmonella Enteritidis investigation in Ontario

PHO Rounds: Epidemiology, February 21, 2013

Dr. Laura Rosella, Scientist; Ryan Walton, Epidemiologist



• www.oahpp.ca

Overview

1. An overview of multinomial logistic regression (Laura)

2. An applied example of a Salmonella Enteritidis investigation (Ryan)


Learning Objectives

1. Familiarity with multinomial logistic regression, including the ability to identify situations when the technique may be useful

2. An understanding of the strengths and limitations of multinomial logistic regression

3. Increased awareness of Salmonella Enteritidis (SE) epidemiology in Ontario

4. Working knowledge of the application of multinomial logistic regression to public health practice, and where to look for more information


Multinomial Logit Analysis: When to use it

Binomial logistic regression handles an outcome with two categories, but how do you deal with more than two? One option is to re-categorize into two outcomes.

E.g., pain scale as an outcome; options are: no pain, mild pain, moderate pain, severe pain

Could re-categorize as: no pain versus mild/moderate/severe pain, OR no/mild pain versus moderate/severe pain

Challenges with this approach: results in an inevitable loss of information

Results can change depending on how you collapse your categories
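To make the last point concrete, here is a minimal sketch using hypothetical counts (not data from any study) for exposed and unexposed groups across the four pain categories. The helper `collapsed_odds_ratio` is a name introduced here for illustration: it collapses the ordered outcome at a chosen cut and returns the odds ratio for being in the higher group.

```python
def collapsed_odds_ratio(exposed, unexposed, cut):
    """Odds ratio for being in a category at index >= cut (vs. below cut),
    comparing exposed with unexposed, after collapsing an ordered outcome into two groups."""
    exp_low, exp_high = sum(exposed[:cut]), sum(exposed[cut:])
    unexp_low, unexp_high = sum(unexposed[:cut]), sum(unexposed[cut:])
    return (exp_high / exp_low) / (unexp_high / unexp_low)


# Hypothetical counts for: no pain, mild, moderate, severe (illustrative only)
exposed = [20, 10, 30, 40]
unexposed = [30, 30, 20, 20]

or_any_pain = collapsed_odds_ratio(exposed, unexposed, cut=1)    # no pain vs. mild/moderate/severe
or_mod_severe = collapsed_odds_ratio(exposed, unexposed, cut=2)  # no/mild vs. moderate/severe
print(round(or_any_pain, 2), round(or_mod_severe, 2))
```

With these made-up counts, "no pain versus any pain" gives an odds ratio of about 1.71, while "no/mild versus moderate/severe" gives 3.5: the conclusion depends on an arbitrary choice of cut.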


Multinomial Logit Analysis: When to use it

One might try to use Ordinary Least Squares (linear) regression with a categorical outcome coded numerically (e.g., 1, 2, 3 for low, medium, high)

Challenges with this approach:

The residuals cannot be normally distributed (an OLS assumption)

The OLS model makes nonsensical predictions, since the dependent variable is NOT continuous

The coding is completely arbitrary, i.e., recoding the dependent variable can give very different results
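A small illustration of the last point, as a sketch with hypothetical data: with a binary exposure `x` and a nominal outcome with levels A, B and C, the OLS slope depends entirely on which numbers are attached to the labels. `ols_slope` is an illustrative helper written here, not a library function.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


x = [0, 0, 0, 1, 1, 1]                    # hypothetical binary exposure
outcome = ["A", "A", "B", "B", "C", "C"]  # hypothetical nominal outcome

# Three equally arbitrary numeric codings of the same labels
codings = {
    "A=1, B=2, C=3": {"A": 1, "B": 2, "C": 3},
    "A=1, B=3, C=2": {"A": 1, "B": 3, "C": 2},
    "A=3, B=2, C=1": {"A": 3, "B": 2, "C": 1},
}
slopes = {name: ols_slope(x, [code[o] for o in outcome]) for name, code in codings.items()}
for name, slope in slopes.items():
    print(name, round(slope, 3))
```

The slope is about 1.33, 0.67 or -1.33 depending on the coding, even though the data are identical.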


Multinomial Logit Analysis: When to use it

One might delete one of the categories

Challenges with this approach:

Losing information, data, and power


Multinomial Logistic Regression

Another model, which considers the full form of the outcome, is called the multinomial or polytomous logistic model, because the outcome is no longer assumed to be BINOMIAL but rather MULTINOMIAL

Powerful

Slightly more complicated interpretation because you are no longer comparing two outcomes (but this isn't a reason not to use it)


Types of multinomial outcomes

1. Nominal

The outcomes have no order, i.e., a discrete choice. E.g., the outcome is a particular strain of influenza (H3N2, Flu B, H1N1), with no numeric meaning

A nominal distribution is also assumed when the outcome may have an order; however, this order is not easily captured

E.g., outcome for asthma: primary care physician visit, hospitalization, death

Even though they range from least to most severe, it may not be appropriate to treat them as points on a single scale of increasing severity

Referred to as the generalized logit model


Types of multinomial outcomes

2. Ordinal

When the categories are ordered

E.g. Disease Scales, Intensity (low, medium, high)

Referred to as the proportional odds model

If the response is ordinal, using this ordering information can: Result in a simpler, more parsimonious model

Increase power to detect associations


Ordinal logistic regression

The cumulative logit model, $\text{logit}[P(Y \le j)] = \alpha_j + \beta x$, describes the effect of the covariate x on the log-odds of a response in category j or below

This implies that the curve for each cumulative probability has the same shape but is shifted; the shift is determined by the intercept ($\alpha_j$)

The baseline category is either the highest or the lowest


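Worked example with three ordered categories, where $\pi_1 = 0.3$, $\pi_2 = 0.4$ and $\pi_3 = 0.3$:

$$\text{logit}[P(Y \le 1)] = \log\frac{\pi_1}{\pi_2 + \pi_3} = \log\frac{0.3}{0.4 + 0.3}$$

$$\text{logit}[P(Y \le 2)] = \log\frac{\pi_1 + \pi_2}{\pi_3} = \log\frac{0.3 + 0.4}{0.3}$$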
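The two cumulative logits in this worked example can be checked numerically. This minimal sketch (the function name `cumulative_logits` is illustrative) takes the category probabilities 0.3, 0.4 and 0.3 in order and returns logit[P(Y <= j)] for each cutpoint j.

```python
import math


def cumulative_logits(probs):
    """logit[P(Y <= j)] for j = 1..J-1, given category probabilities listed in order."""
    logits = []
    cumulative = 0.0
    for p in probs[:-1]:  # the last cutpoint would give P(Y <= J) = 1, which has no logit
        cumulative += p
        logits.append(math.log(cumulative / (1.0 - cumulative)))
    return logits


probs = [0.3, 0.4, 0.3]  # pi_1, pi_2, pi_3 from the example
logits = cumulative_logits(probs)
print([round(v, 4) for v in logits])
```

Both values are about 0.847 in magnitude with opposite signs, because the example probabilities are symmetric around the middle category.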


Proportional Odds Assumption

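The same $\beta$ applies to any given category and to each cumulative probability

$e^{\beta}$ is the cumulative odds ratio for a one-unit increase in x, the same at every cutpoint (its reciprocal applies when modelling higher rather than lower categories)

This is known as the PROPORTIONAL ODDS ASSUMPTION

The $\beta$s are independent of j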

If this assumption holds we can use ONE coefficient to study shifts between any of the categories of the dependent variable
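A sketch of what the assumption implies, using hypothetical intercepts and a single shared slope (all values made up for illustration): under $\text{logit}[P(Y \le j)] = \alpha_j + \beta x$, the cumulative odds ratio comparing x = 1 with x = 0 comes out identical at every cutpoint and equals $e^{\beta}$.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_odds_ratios(alphas, beta, x0, x1):
    """Odds ratio for P(Y <= j), comparing covariate value x1 with x0, at each cutpoint j,
    under the cumulative logit model logit[P(Y <= j)] = alpha_j + beta * x."""
    ratios = []
    for alpha in alphas:
        p0, p1 = expit(alpha + beta * x0), expit(alpha + beta * x1)
        ratios.append((p1 / (1 - p1)) / (p0 / (1 - p0)))
    return ratios


# Hypothetical parameters: 4 ordered categories -> 3 cutpoints sharing one slope
alphas = [-1.5, 0.0, 1.2]  # intercepts increase with j so that P(Y <= j) increases
beta = 0.7
ratios = cumulative_odds_ratios(alphas, beta, x0=0.0, x1=1.0)
print([round(r, 4) for r in ratios], round(math.exp(beta), 4))
```

If fitted separate binary models at each cutpoint gave clearly different odds ratios, that would be evidence against the assumption.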


Example

Merani, Abdulla, Kwong, Rosella, et al. Increasing tuition fees in a country with two different models of medical education. Medical Education 2010;44:577-586.


Example

In 2001, the odds of Quebec (QB) students reporting increasing levels of financial stress were 40% lower than those of students in the rest of Canada

Although the overall rates of financial stress changed only slightly between 2001 and 2007, students in Quebec were much less likely to report financial stress than students outside Quebec


Testing the proportional odds assumption

Can use graphical methods as described in Harrell*

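Can be tested in statistical software, most commonly with a score test; the test tends to be on the conservative side (it rejects the assumption readily, especially with many covariates or large samples)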

* Harrell, Jr., F. E. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. Springer-Verlag, New York.


What to do if Proportional Odds Test Fails

Collapse two or more levels, particularly if some of the levels have small N

Estimate separate models using dichotomizations to see how different they are

Examine graphically

Run a multinomial nominal logit model

Use the partial proportional odds model (available in SAS through PROC GENMOD; advanced)


Logit models for nominal responses

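In the baseline-category logit model, $\log(\pi_j / \pi_J) = \alpha_j + \beta_j x$, note that $\beta$ has a j subscript, meaning that each comparison with the baseline category J has a different estimate, unlike under the proportional odds assumption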


Logit models for nominal responses

Multinomial regression fits the baseline-category logit model simultaneously for all outcome categories, each compared with the baseline

Choice of the baseline category is arbitrary
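To illustrate why the choice of baseline is arbitrary, the sketch below (hypothetical coefficients; the helper names are illustrative) computes category probabilities from a baseline-category logit model, $\log(\pi_j/\pi_{\text{baseline}}) = \alpha_j + \beta_j x$, then re-expresses the same model with a different baseline. The fitted probabilities do not change; only the coefficients do, and the slope comparing B with A becomes $\beta_B - \beta_A$.

```python
import math


def category_probs(alphas, betas, x, baseline):
    """Category probabilities from a baseline-category (generalized) logit model,
    log(pi_j / pi_baseline) = alpha_j + beta_j * x for every non-baseline category j."""
    eta = {baseline: 0.0}  # the baseline's linear predictor is fixed at 0
    for cat in alphas:
        eta[cat] = alphas[cat] + betas[cat] * x
    denom = sum(math.exp(v) for v in eta.values())
    return {cat: math.exp(v) / denom for cat, v in eta.items()}


def change_baseline(alphas, betas, old, new):
    """Re-express coefficients relative to a new baseline category:
    alpha_j(new) = alpha_j(old) - alpha_new(old), and likewise for beta."""
    a = dict(alphas, **{old: 0.0})
    b = dict(betas, **{old: 0.0})
    new_alphas = {c: a[c] - a[new] for c in a if c != new}
    new_betas = {c: b[c] - b[new] for c in b if c != new}
    return new_alphas, new_betas


# Hypothetical coefficients for outcome categories A and B, with C as the baseline
alpha_c = {"A": 0.5, "B": -0.2}
beta_c = {"A": 0.8, "B": 1.1}
p_c = category_probs(alpha_c, beta_c, x=1.0, baseline="C")

# Same model, re-expressed with A as the baseline
alpha_a, beta_a = change_baseline(alpha_c, beta_c, old="C", new="A")
p_a = category_probs(alpha_a, beta_a, x=1.0, baseline="A")

print({k: round(v, 4) for k, v in sorted(p_c.items())})
print({k: round(v, 4) for k, v in sorted(p_a.items())})
print(round(beta_a["B"], 4))  # B vs. A slope equals beta_B - beta_A
```

This is also why software defaults matter: the same fit can be reported against different reference categories.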


Outcome: willingness to vaccinate children against HPV

Willing to vaccinate only if the vaccine is free

Willing to vaccinate even if the vaccine is not free

Not willing to vaccinate/don't know

Ran a multinomial logistic regression


Interpretation:

Prior awareness of HPV was associated with greater parental willingness to vaccinate. This was true both for being willing to vaccinate only if the vaccine is free (OR: 1.42; 95% CI: 1.21-1.66) and for being willing to vaccinate even if the vaccine is not free (OR: 1.96; 95% CI: 1.75-2.20), compared to the not willing/don't know group.


SAS code

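Ordinal: (outcome coded 1 = low, 2 = med, 3 = high)

proc logistic data = your_dataset;
/* With more than two ordered levels, the default is the cumulative logit (proportional odds) model */
/* Code the levels numerically: character values such as low/med/high would be sorted alphabetically */
model outcome = X1 X2;
run;

Multinomial: (outcome A, B, C)

proc logistic data = your_dataset;
/* link = glogit fits the generalized (baseline-category) logit model; ref = 'A' makes A the reference */
model outcome (ref = 'A') = X1 X2 / link = glogit;
run;

NB: Make sure you understand the reference category, as different packages have different default settings (PROC LOGISTIC, for example, uses the last ordered response level as the glogit reference unless ref = is given)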


Summary

Multinomial logistic regression can allow you to extract more information from your data and prevent the loss of information due to collapsing

Interpretation requires careful consideration when comparing multiple categories

Like any regression model, multinomial regression has assumptions, which should be carefully scrutinized


Source: http://www.sagestossel.com


What is Salmonella Enteritidis (SE)?

Gram-negative, rod-shaped, facultatively anaerobic bacterium

A serovar of S. enterica subspecies enterica

Phage-typing used to discriminate between clusters of SE (Ward et al., 1987)

Symptoms include nausea, vomiting, abdominal pain, diarrhea and fever

It is estimated that for every reported case of salmonellosis in Canada, there are approximately 13-37 cases in the population (Thomas et al., 2006)

Source: http://salmonellablog.com


Figure 1. Number of confirmed cases of SE in Ontario by month, 2007-2010

Data source: Ontario Ministry of Health and Long-Term Care, integrated Public Health Information System (iPHIS) database, extracted by Public Health Ontario [2013/02/13]

[Figure: confirmed cases of SE in Ontario (y-axis, 0 to 140) by episode month (x-axis, January to December), one series per year, 2007 to 2010]


Provincial SE Investigation

Partnership between PHO (then OAHPP) and the Ministry of Health and Long-Term Care

Support from public health units across the province

Centralized interviewing

Questionnaire based on previous SE outbreaks, relevant literature

Symptoms, travel, animal exposures, and 3-day food history

Eggs, chicken, dairy, nuts, certain fruits and vegetables

Multiple phage types (PTs), therefore multiple hypotheses

Timeline: hypothesis-generating stage (July to December 2010), followed by the case-control study (January to August 2011)


Hypothesis-Generating Stage Objective

To examine the associations between various risk factors and infection with different domestic PTs of SE

To inform the development of a case-control study that will investigate risk factors for infection with domestic PTs of SE


Methods: Study Population

SE cases identified daily from Public Health Ontario Laboratory line list

Age, sex, city of residence (used to impute health unit)

SE investigation dataset contains interview data for 238 individuals

Collected between July 12 and December 10, 2010

Interview Status Total (%)

Interviewed 238 (65)

Lost to follow-up 97 (27)

Refused 31 (8)

Total 366

Table 1. Status of follow-up procedure for SE cases identified from OAHPP PHL line list between July 11 and December 1, 2010

Source: Ontario SE investigation


Methods: Inclusion and Exclusion Criteria

Case definition: Laboratory confirmation of SE infection with clinically compatible signs and symptoms (i.e., headache, diarrhea, abdominal pain, nausea, fever) in an Ontario resident with lab received date on or after July 10, 2010

Inclusion criteria: Interviewed case (access to telephone, ability to communicate in English), PT available

Exclusion criteria: Travel outside of Canada/U.S. in 3 days before illness onset, potential secondary cases (i.e., living in the same household as someone who was ill with similar symptoms in the week before illness onset)


Columns: Phage type; Non-travel, n (%); International travel, n (%); Total

PT 1 2 (9) 21 (91) 23

PT 2 1 (100) 0 (0) 1

PT 3 0 (0) 2 (100) 2

PT 4 1 (25) 3 (75) 4

PT 5b 2 (14) 12 (86) 14

PT 6a 2 (100) 0 (0) 2

PT 6c 0 (0) 1 (100) 1

PT 7a 0 (0) 2 (100) 2

PT 8 52 (87) 8 (13) 60

PT 13 14 (88) 2 (12) 16

PT 13a 32 (86) 5 (14) 37

PT 14b 1 (50) 1 (50) 2

PT 15a 0 (0) 2 (100) 2

PT 21 1 (25) 3 (75) 4

PT 22 4 (80) 1 (20) 5

PT 23 1 (100) 0 (0) 1

PT 27 0 (0) 1 (100) 1

PT 51 5 (83) 1 (17) 6

PT 55 0 (0) 1 (100) 1

PT 256 0 (0) 1 (100) 1

PT 339 1 (100) 0 (0) 1

Atypical PT 2 (67) 1 (33) 3

Untypable PT 2 (67) 1 (33) 3

Unknown 2 (67) 1 (33) 3

Total 125 (64) 70 (36) 195

123 individuals meeting case definition

Source: Ontario SE investigation

Table 2. Reported travel history for interviewed SE cases stratified by phage type (secondary cases excluded), 2010


Methods: Measurement of Variables

Exposure variables obtained from interview data

Time from date of symptom onset to date of interview: average = 18.4 days

Example questions:

Do you remember what you ate on Sunday, February 3?

During this 3-day period, did you eat any cooked eggs (either at home or at a restaurant)?

Yes

No

Don't know

What type of nuts did you eat? (CHECK ALL THAT APPLY)

Peanuts

Almonds

Cashews

Pistachios

Pecans


Table 3. Food items reported by SE cases with greater than 10% frequency during hypothesis-generating stage, 2010

Columns: Food item; OSEI n (%); CDC Food Atlas (%); Nesbitt et al. (%)

Cooked eggs 51 (41) 66.5 82.2

Runny eggs 18 (15) 17.7 42.0

Firm eggs 40 (33)

Eggs at home 45 (37) - -

Eggs at restaurant 15 (12) - -

Uncooked eggs 12 (10) 7.3 5.9

Poultry 94 (76) 84.0 91.4

Poultry at home 67 (54) - -

Poultry at restaurant 41 (33) - -

Poultry from unfrozen 32 (26) - -

Poultry from frozen 17 (14) - -

Processed poultry 31 (25) - 19.9

Fast food poultry 14 (11)

Deli meat (poultry) 19 (15) 25.2 19.0

Unaware poultry juice colour 26 (21) - -

Prepared meal from raw poultry 18 (15) - -

Pasteurized milk 84 (68) 82.0 87.0

Any cheese 91 (74) 74.0 66.7

Processed cheese 45 (37) - -

Dried or powdered cheese 13 (11) - -

Hard cheese 72 (59) 73.7 -

Soft cheese 16 (13) 12.6 -

Peanut butter 37 (30) 54.1 -

Nuts 16 (13) - -

Almonds 12 (10) - -

Carrots 50 (41) 59.8 80.5

Broccoli 38 (31) 29.3 43.4

Peppers 37 (30) - 27.0

Onions 51 (41) 40.8 45.9

Lettuce 63 (51) 72.6 84.0

Spinach 15 (12) 16.8 19.9

Tomatoes 62 (50) 68.2 72.5

Mushrooms 25 (20) 12.7 36.5

Strawberries 40 (33) 32.8 31.5

Cantaloupe 13 (11) 27.5 22.8

Source: Ontario SE investigation


Methods: Multinomial Logistic Regression

Case-case comparisons leverage exposure data collected for cases (usually with different diseases), in the absence of control data

Consistent with our hypothesis about PT-specific risk factors, case-case comparisons could be made between PTs

A multinomial approach maximizes power for finding associations

Outcome (nominal): infection with PT 8 or PT 13a or other PT

Main exposures of interest: Processed chicken products, items containing uncooked eggs

Potential confounders, effect modifiers: sex, age, remaining risk factors


Methods: Exposures of Interest

Processed Chicken Products

Risk factor for S. Heidelberg infections in Canada (Currie et al., 2005)

Epidemiological evidence (e.g., barbecue cluster)

Food samples positive for SE

Items Containing Uncooked Eggs

Eggs have been implicated in past SE outbreaks

Epidemiologic evidence (e.g., new protein shake diet)

Biological explanation


Methods: Hosmer and Lemeshow Model Building

Test univariate models; select those with p-values < 0.2

Variables selected at this step: cooked eggs, poultry at home, peanut butter, nuts, spinach, tomatoes, age, sex

Run multivariable models (adding variables in order of univariate p-value); retain those with p-values < 0.2 or whose removal alters the effect size by more than 10% (10% rule)

Choose and test possible interaction terms

model phagetype (ref = 'Other PT') = procpltry conunckegg age sex eatnuts spinach anytomatoe eatckeggs homepltry / link = glogit;
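The screening and 10% change-in-estimate steps of this model-building strategy can be sketched as below. This is a simplified illustration, not the investigation's actual code: the helper names, variable names and p-values are hypothetical, and in a multinomial model the 10% rule would be checked for each outcome-specific coefficient of the exposures of interest.

```python
def screen_univariate(p_values, threshold=0.2):
    """Keep candidate variables whose univariate p-value is below the threshold,
    ordered from smallest to largest p-value (the order used to enter the multivariable model)."""
    return sorted((name for name, p in p_values.items() if p < threshold), key=p_values.get)


def changes_estimate(beta_full, beta_reduced, tolerance=0.10):
    """10% rule: True if dropping a variable moves the exposure coefficient by more than
    `tolerance`, as a proportion of its value in the fuller model."""
    return abs(beta_reduced - beta_full) / abs(beta_full) > tolerance


# Hypothetical univariate p-values (illustrative only, not results from the investigation)
p_values = {"cooked_eggs": 0.15, "spinach": 0.08, "carrots": 0.55, "age": 0.03}
candidates = screen_univariate(p_values)

# Hypothetical exposure coefficient (log odds ratio) with and without a candidate variable
keep_as_confounder = changes_estimate(beta_full=1.59, beta_reduced=1.33)
print(candidates, keep_as_confounder)
```

Here "carrots" is screened out, and the candidate whose removal shifts the hypothetical exposure coefficient by about 16% would be retained as a confounder.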


Table 4. Descriptive statistics for selected cases from the Ontario SE investigation, July to December 2010

Item Count (n) Frequency (%)

Gender (n = 122)

Male 64 52.5

Female 58 47.5

Age (n = 123)

0-19 53 43.1

20-39 40 32.5

40 + 30 24.4

PT (n = 123)

PT 8 52 42.3

PT 13a 32 26.0

Other PTs 39 31.7

Symptoms (n = 123)

Diarrhea 113 91.9

Nausea 57 46.3

Fever 81 65.9

Source: Ontario SE investigation

Results: Descriptive Statistics (Person)

Results: Descriptive Statistics (Place)

Source: Ontario SE investigation

Figure 2. Epidemic curve for cases selected from the Ontario SE Investigation (n=110)

Results: Descriptive Statistics (Time)

[Figure: confirmed cases of SE in Ontario (y-axis, 0 to 5) by symptom onset date (x-axis)]

Source: Ontario SE investigation


Table 5a. Unadjusted multinomial regression analysis of SE PT 8 and SE PT 13a, with all other phage types as the reference category

Columns: Item; No. of cases in analysis; SE PT 8 n (%); SE PT 8 OR (95% CI); SE PT 13a n (%); SE PT 13a OR (95% CI); p-value

Processed Chicken Products

No 92 34 (37.0) 1 23 (25.0) 1 0.08

Yes 31 18 (58.1) 3.95 (1.12-13.1) 9 (29.0) 3.28 (0.88-12.2)

Items Containing Uncooked Eggs

No 111 48 (43.2) 1 25 (22.5) 1 0.43

Yes 12 4 (33.3) 0.98 (0.20-4.80) 5 (41.7) 2.24 (0.48-10.5)

Results: Multinomial Logistic Regression

Source: Ontario SE investigation

Table 5b. Age- and sex-adjusted multinomial regression analysis of SE PT 8 and SE PT 13a, with all other phage types as the reference category

Columns: Item; No. of cases in analysis; SE PT 8 n (%); SE PT 8 OR (95% CI); SE PT 13a n (%); SE PT 13a OR (95% CI); p-value

Processed Chicken Products

No 92 34 (37.0) 1 23 (25.0) 1 0.11

Yes 31 18 (58.1) 3.78 (1.09-13.1) 9 (29.0) 2.77 (0.71-10.8)

Items Containing Uncooked Eggs

No 111 48 (43.2) 1 25 (22.5) 1 0.48

Yes 12 4 (33.3) 0.99 (0.20-4.87) 5 (41.7) 2.15 (0.45-10.2)

Results: Multinomial Logistic Regression

Source: Ontario SE investigation

Table 5c. Full model* multinomial regression analysis of SE PT 8 and SE PT 13a, with all other phage types as the reference category

Columns: Item; No. of cases in analysis; SE PT 8 n (%); SE PT 8 OR (95% CI); SE PT 13a n (%); SE PT 13a OR (95% CI); p-value

Processed Chicken Products

No 92 34 (37.0) 1 23 (25.0) 1 0.10

Yes 31 18 (58.1) 4.91 (1.13-21.4) 9 (29.0) 2.89 (0.58-14.5)

Items Containing Uncooked Eggs

No 111 48 (43.2) 1 25 (22.5) 1 0.43

Yes 12 4 (33.3) 1.48 (0.23-9.41) 5 (41.7) 3.25 (0.48-22.1)

Results: Multinomial Logistic Regression

Source: Ontario SE investigation

* Full model is adjusted for age, sex, and consumption of cooked eggs, poultry at home, nuts, spinach, and tomatoes

Table 6. Full model* multinomial regression analysis of SE PT 8 and SE PT 13a, with all other phage types as the reference category

* Full model also contains sex, age and consumption of processed chicken and items containing uncooked eggs

Full Model* Multinomial Regression

Columns: Item; No. of cases in analysis; SE PT 8 n (%); SE PT 8 OR (95% CI); SE PT 13a n (%); SE PT 13a OR (95% CI); p-value

Cooked Eggs

No 72 29 (40.3) 1 22 (30.6) 1 0.28

Yes 51 23 (45.1) 0.92 (0.33-2.60) 10 (19.6) 0.40 (0.12-1.39)

Poultry at Home

No 56 19 (33.9) 1 13 (23.2) 1 0.67

Yes 67 33 (49.3) 1.57 (0.59-4.19) 19 (28.4) 1.35 (0.42-4.29)

Nuts

No 107 45 (42.1) 1 29 (27.1) 1 0.30

Yes 16 6 (37.5) 0.55 (0.13-2.23) 2 (12.5) 0.21 (0.03-1.55)

Spinach

No 108 47 (43.5) 1 25 (23.1) 1 0.16

Yes 15 5 (33.3) 1.15 (0.21-6.43) 7 (46.7) 3.95 (0.70-22.3)

Tomatoes

No 61 30 (49.2) 1 12 (19.7) 1 0.07

Yes 62 22 (35.5) 0.53 (0.19-1.52) 20 (32.3) 1.84 (0.55-6.14)

Source: Ontario SE investigation

Results: Interpretation

Domestic cases of SE PT 8 had 4.91 (95% CI: 1.13-21.4) times the odds of having consumed processed chicken products compared to domestic cases of other phage types, when controlling for sex, age, and consumption of cooked eggs, poultry at home, nuts, spinach, and tomatoes.
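Odds ratios such as 4.91 (1.13-21.4) are usually Wald estimates of the form $\exp(\hat\beta \pm 1.96\,\text{SE})$. Assuming that form, the sketch below recovers the log odds ratio and its standard error from the published interval; `log_or_and_se` is an illustrative helper, and small discrepancies come from rounding in the reported bounds.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def log_or_and_se(or_lower, or_upper):
    """Recover the log odds ratio and its standard error from a 95% Wald interval,
    assuming the interval was computed as exp(beta +/- Z_975 * SE)."""
    log_lower, log_upper = math.log(or_lower), math.log(or_upper)
    beta = (log_lower + log_upper) / 2          # midpoint on the log scale
    se = (log_upper - log_lower) / (2 * Z_975)  # half-width divided by z
    return beta, se


# Reported full-model estimate for SE PT 8 vs. other PTs, processed chicken products
beta, se = log_or_and_se(1.13, 21.4)
print(round(math.exp(beta), 2), round(se, 3), round(beta / se, 2))
```

The recovered odds ratio is about 4.92, the SE about 0.75 and the Wald z-statistic about 2.1: consistent with an interval that excludes 1 but is wide because of the small number of cases.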


Discussion: Strengths and Limitations

Limitation: small sample size (especially when stratified by PT)

Limitation: recall issues

Limitation: attributable proportion explained

Limitation: validity of comparison group (aggregation of other PTs)

Strength: centralized interviewing

Eliminates inter-health unit differences in follow-up procedure

Standard questionnaire, limited number of interviewers

Strength: multinomial approach, proof of principle

In agreement with the notion of PT-specific hypotheses

Uses the greatest number of SE cases, increasing power for evaluating associations of interest compared to binomial logistic regression


Discussion: Implications

Results support laboratory and epidemiologic evidence implicating processed chicken as a risk factor for infection with domestic PTs of SE

Nine processed chicken samples tested positive during the hypothesis-generating stage (four PT 8, three PT 13a, one PT 19, one PT 22)

Anecdotal reports from cases indicated a lack of consumer knowledge about the uncooked nature of some products

Results fed into a provincial case-control study that ran from January to August 2011

Questionnaire design, food sampling, sample size calculations

In the case-control study, after adjustment for confounders, the following were significantly associated with SE infection:

Consuming any poultry: aOR 2.24, 95% CI 1.31-3.83

Processed chicken: aOR 3.32, 95% CI 1.26-8.7

Not washing hands after handling raw eggs: OR 2.82, 95% CI 1.48-5.37


Discussion: Looking Ahead

"If any meat product is not a ready-to-eat meat product but has the appearance of, or could be mistaken for, a ready-to-eat meat product, the meat product shall bear the following information on its label: (a) the words 'must be cooked', 'raw product', 'uncooked' or any equivalent words or word as part of the common name of the product to indicate that the product requires cooking before consumption; and (b) comprehensive cooking instructions such as an internal temperature-time relationship that, if followed, will result in a ready-to-eat meat product" (SOR/90-288).


Laura's References

Agresti A. Categorical Data Analysis. New Jersey: John Wiley; 2012.

Harrell FE Jr. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. New York: Springer-Verlag; 2001.

Armstrong BG, Sloan M. Ordinal models for epidemiologic data. Am J Epidemiol 1989;129:191-204.

Ananth CV, Kleinbaum DG. Regression models for ordinal responses: a review of methods and applications. International Journal of Epidemiology 1997;26:1323-1333.

Preisser JS, Koch GG. Categorical data analysis in public health. Annual Review of Public Health 1997;18:51-82.


Ryan's References

Ward LR, de Sa JD, Rowe B. A phage-typing scheme for Salmonella enteritidis. Epidemiol Infect 1987;99(2):291-294.

Thomas MK, Majowicz SE, Sockett PN, et al. Estimated numbers of community cases of illness due to Salmonella, Campylobacter and verotoxigenic Escherichia coli: pathogen-specific community rates. Can J Infect Dis Med Microbiol 2006 Jul-Aug;17(4):229-234.

Currie A, MacDougal M, Aramini J, et al. Frozen chicken nuggets and strips and eggs are leading risk factors for Salmonella Heidelberg infections in Canada. Epidemiol Infect 2005;133:809-816.

Tighe MK, Savage R, Vrbova L, et al. The epidemiology of travel-related Salmonella Enteritidis in Ontario, Canada, 2010-2011. BMC Public Health 2012;12:310.

Nesbitt A, Majowicz S, Finley R, et al. Food consumption patterns in the Waterloo Region, Ontario, Canada: a cross-sectional telephone survey. BMC Public Health 2008;8:370.

Centers for Disease Control and Prevention (CDC). Foodborne Active Surveillance Network (FoodNet) Population Survey Atlas of Exposures. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2006-2007.

Varga C, Middleton D, Walton R, et al. Evaluating risk factors for endemic human Salmonella Enteritidis infections with different phage types in Ontario, Canada using multinomial logistic regression and a case-case study approach. BMC Public Health 2012;12:866.


Public Health Ontario

Dr. Dean Middleton

Rachel Savage

Steven Johnson

Caitlin Johnson

Duri Song

Diana Yung