
6. TUTORIAL FOR MULTINOMIAL LOGIT ANALYSIS

Note: Marketing Engineering supports importing and exporting data to Excel for this model. Please refer to the Excel Input Output Guide (excelinputoutputguide.pdf) for details on how to use this feature.

CASE: BOOKBINDERS BOOK CLUB, p. 185 (Tutorial 6, July 2005; this tutorial includes an appendix on regression in Excel for use with the Bookbinders Book Club case)

In recent years, many companies have built extensive databases containing individual-level choice data. One of the most powerful models to have emerged in recent years to analyze individual-level choice data is the multinomial logit (MNL) model. In the MNL model, the dependent variable is choice: for example, the brand of detergent a customer bought when faced with a choice of five different brands of detergents. The independent variables influencing this choice could be price, promotion (whether a detergent brand was on promotion), or product characteristics, such as whether a detergent is scent-free. If we have such data from a number of customers over a single purchase occasion or over multiple purchase occasions, we can use an MNL model for analysis.

The software implements the non-nested multinomial logit model, with an option for latent-class segmentation analysis to identify groups of customers with similar response patterns. It uses the EM (Expectation Maximization) algorithm to implement maximum likelihood estimation with latent classes. The likelihood of a particular sample's occurrence (i.e., the observed values of choices made by customers) is given by a likelihood function or, more conventionally, a log-likelihood function. The iterative EM algorithm estimates parameter values by maximizing this log-likelihood function. The iterations stop if one of the following criteria is met (within a desired tolerance) for successive iterations: (1) the likelihood function does not improve, (2) the parameter estimates do not change, (3) the search gradients do not change, or (4) the program does not converge even after a large number of iterations.

The input data should have a particular structure, described below, for this program to operate correctly. The first column is the choice variable, coded 1 or 0 (purchase or no purchase) for each alternative under consideration. The remaining columns correspond to independent variables, one per column, including any dummy variables. The rows represent "cases." Each case (e.g., customer) consists of two or more contiguous rows, one for each alternative, where the first column indicates whether that customer chose that alternative (dependent variable) and the remaining columns contain the values of each independent variable included in the model. Alternatively, each case can consist of multiple observations (e.g., purchases made over several purchase occasions). When there is more than one observation per case, the observation sets must be organized sequentially. If there are N alternatives, M cases, and P observations per case, then the total number of rows of data would generally be N*M*P (except when the data set concerns a two-alternative choice situation, in which there are no observations of the independent variables for one of the choice alternatives, as described below). Note: For each case, only one alternative is chosen, i.e., has a choice value equal to 1. The following shows how the data are organized for analysis:
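For intuition, the choice probabilities the MNL model assigns within one case, and the N*M*P row count, can be sketched as follows. All data values, coefficients, and counts here are hypothetical illustrations, not part of any Marketing Engineering file:

```python
import numpy as np

def mnl_probabilities(X, beta):
    """MNL choice probabilities for one case: P(i) = exp(x_i'b) / sum_j exp(x_j'b)."""
    u = X @ beta                 # deterministic utility of each alternative
    e = np.exp(u - u.max())      # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical case with 3 alternatives and 2 independent variables (price, promotion)
X = np.array([[3.0, 1.0],
              [2.5, 0.0],
              [4.0, 1.0]])
beta = np.array([-1.2, 0.8])     # illustrative coefficients, not estimates
p = mnl_probabilities(X, beta)
print(p.round(3))                # the three probabilities sum to 1

# Row count of the stacked input file: N alternatives x M cases x P observations
N, M, P = 3, 50, 2
print(N * M * P)                 # 300 rows of data
```

Each block of N contiguous rows in the file corresponds to one such probability computation.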

If you have two choice alternatives, the input data may have a special structure. In some such situations, there are no observed values of the independent variables for one of the two alternatives. For example, your data may consist of the following choice alternatives: buy or not buy (this is the type of data used in the BookBinders Book Club case discussed in the text). Data for such binary logit applications can be structured in one of two ways, as shown below: (1) one row of data per case per purchase occasion (our recommended structure for such data), or (2) two rows of data per case per purchase occasion. The following table shows how the data should be structured if you choose to represent each case with just one row of data:

If you choose to represent this type of data with two rows per case, it should be structured as shown in the table below. Note that the independent variables corresponding to the second alternative in each case are set to the base level of 0.
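A minimal sketch of the conversion between the two layouts, using hypothetical columns (choice, amount purchased, months since last purchase):

```python
def expand_to_two_rows(rows):
    """Convert one-row-per-case binary data (choice, x1, x2, ...) into the
    two-row layout: the second row gets the complementary choice value and
    base-level 0s for every independent variable."""
    out = []
    for row in rows:
        choice, xs = row[0], list(row[1:])
        out.append([choice] + xs)                 # alternative 1: observed values
        out.append([1 - choice] + [0] * len(xs))  # alternative 2: base level 0
    return out

# Two hypothetical cases: (choice, amount purchased, months since last purchase)
one_row = [[1, 113, 8],
           [0, 418, 6]]
print(expand_to_two_rows(one_row))
```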

Depending on the structure of your data set, you should select the appropriate Number of Alternatives in the Setup menu (see below). If there are two rows of data per case, set Number of Alternatives to 2, otherwise to 1. If you have more than 2 choice alternatives, or if you have 2 specific choice alternatives (e.g., Brand A and Brand B), where each alternative has an associated set of observed values of the independent variables, organize the input data as shown below for a four-alternative choice context. The Choice variable column indicates whether a particular choice alternative was chosen by a customer (case) on a particular choice occasion and the remaining columns contain values of the independent variables for each choice alternative for each case. The screen below illustrates this data organization from the sample data file ABBLOGIT.DAT:

To enter the names of the available alternatives, go to the Edit menu and choose Edit Row Labels. If you have n choice alternatives, enter the names of these alternatives in the first n rows.

Note: If you make changes to the data to evaluate alternative solutions, the program will not automatically save these changes. You can save the changes under a separate filename by going to the File menu and choosing Save As.

The rest of this tutorial is structured primarily to illustrate the use of the MNL program for the Bookbinders Book Club exercise. From the Model menu, select Multinomial Logit Analysis. You will be prompted for a data file. For this example, select the file called BBBC.DAT. You will then see the following screen:

Number of Alternatives: Enter up to eight (choice) alternatives for analysis. For the BBBC.DAT file supplied with version 2.0 of the Marketing Engineering software, set this to 1.

Number of Cases: Enter the number of cases (customers) for analysis. For the BBBC case, this value is 1600.

Observations/Case: Enter the number of observations per case. For example, if you observed the choices of each customer over five purchase occasions, this value should be set to 5. The default value is 1, which applies to the BBBC case.

Significance (%): Specify the significance level for the statistical tests. The program uses this value to identify statistically significant coefficients from the analysis.

Maximum Segments: This value represents the maximum number of segments that you believe exist in the data. Based on the analysis and the accompanying indices of fit, you can select the actual number of segments; the program will then display further results only for the selected number of segments. For the BBBC case, set this to 1 and do the analysis before you experiment with additional segments.

Alt. Dummy Present?: If you check this option, the program will assume that your data file reflects one of the following two situations. First, you have already specified dummy variables that represent alternative-specific constants; if there are n alternatives, you would need n-1 dummy variables (see the example below, which contains three dummy variables, called CONST 1, CONST 2, and CONST 3, for a four-alternative choice situation). Second, you have not specified any dummy variables because you do not want to include alternative-specific dummy variables in your analysis. In general, it is a good idea to include alternative-specific constants in the model. If you do not check this option, the program will automatically generate the dummy variables for alternative-specific constants and set the first choice alternative as the base alternative in the dummy-variable specification. For the BBBC case, do not check this box.

Holdout sample size (percent): Specify a number between 0 and 100 (0 is the default) to represent the percentage of the data you want to use as a holdout sample for assessing the predictive validity of the model. The program will then select the closest feasible sample of a size equal to, or smaller than, the percentage you specify. For example, if you have 110 customers (cases) with one observation per customer and you specify 15 percent, the program will set aside the bottom 16 observations in the data file as a holdout sample, and the model will be estimated on the first 94 customers.

Split by time, not case: This option indicates whether the holdout sample should be split by customers (the default) or by time. The latter option is feasible only if you have more than one observation per case. If you would like the holdout sample to be split by time, check this box. If you have 100 cases with three observations per case, and you specify a holdout sample size of 40 percent split by time, the program will use the last observation in each case (33.3 percent) as the holdout sample. That is, 100 observations out of the total of 300 will be used for predictive validation, and the remaining 200 will be used for model estimation.

ID Present: This option indicates whether a unique name is assigned to each customer. IDs are useful in implementing the results of choice models for customer targeting. This option is disabled in the educational version.

After you specify all of the parameters, click OK. To run the program, go to the Run menu and choose Run Model. When the program starts running, it will show a progress bar, which indicates how long it might take to complete the estimation. (Be patient if you have a large data set or if you have specified a large number for Maximum Segments.)
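The holdout sizing rules above reduce to simple arithmetic. This sketch (not the program's actual code) reproduces the two worked examples:

```python
import math

def holdout_counts(n_cases, pct):
    """Closest feasible holdout no larger than pct percent of the cases."""
    holdout = math.floor(n_cases * pct / 100.0)
    return holdout, n_cases - holdout

# 110 customers, 15 percent: 16 held out, 94 used for estimation
print(holdout_counts(110, 15))

def time_split_counts(n_cases, obs_per_case, pct):
    """Split by time: hold out whole observations per case, no more than pct percent."""
    obs_out = math.floor(obs_per_case * pct / 100.0)
    return n_cases * obs_out, n_cases * (obs_per_case - obs_out)

# 100 cases, 3 observations each, 40 percent: last observation per case held out
print(time_split_counts(100, 3, 40))
```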
When the program has estimated all of the coefficients for all of the segments, it will display the following dialog box, which allows you to evaluate the results and select the appropriate number of segments in the data:

The drop-down box at the top of this dialog box allows you to explore the model performance for any number of segments, up to the Maximum Segments that you specified. The box under Parameter (Coefficients) Estimates shows the coefficient estimates in each segment. By clicking the SEGMENT DETAILS button, you can see details of the model results for the selected number of segments; look in the box below Details.

Select the number of segments based on the solution that has the lowest values of AIC, BIC, and CAIC. When AIC, BIC, and CAIC provide conflicting indications about the number of segments to choose, we recommend that you rely on CAIC. It may sometimes be useful to also explore whether the hit ratio and average choice probability for the selected solution are good. The higher the hit ratio (the highest possible value is 100 percent), and the larger the average choice probability compared with (1/n)*100, where n is the number of alternatives, the better the solution. For more details, see the description of the various diagnostics later in this tutorial. Once you determine the best number of segments, enter that number in the slot marked Enter the number of segments to retain for further analysis. All the reports and diagnostics will be based on the number of segments you select. Click OK.

Note: The program may fail to yield valid estimates for some ill-conditioned data sets, for example, where some sets of independent variables are highly correlated, or where the model is not identified (i.e., there are not enough data points for estimating all the parameters). When the program encounters ill-conditioned data matrices at any step of the estimation process, it may proceed by making very small changes to the specific data causing the ill-conditioning. However, the resulting estimates may be unreliable (as suggested by high standard errors of the estimates or high elasticities). In such situations, we recommend that you select a different number of maximum segments or drop some of the correlated variables before conducting analyses or interpreting the results.
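Before running the model, one way to screen for such ill-conditioning is to check the pairwise correlations and the condition number of the independent-variable matrix. This is a generic sketch on synthetic data; the 0.9 cutoff is an arbitrary choice, not a Marketing Engineering setting:

```python
import numpy as np

def conditioning_report(X, names, cutoff=0.9):
    """Flag highly correlated independent-variable pairs and report the
    condition number of the data matrix (large values suggest trouble)."""
    corr = np.corrcoef(X, rowvar=False)
    flagged = [(names[i], names[j], corr[i, j])
               for i in range(len(names)) for j in range(i + 1, len(names))
               if abs(corr[i, j]) > cutoff]
    return flagged, np.linalg.cond(X)

# Synthetic data in which x2 is a near-duplicate of x1
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, a + rng.normal(scale=0.01, size=200), rng.normal(size=200)])
flagged, cond = conditioning_report(X, ["x1", "x2", "x3"])
print(flagged)   # the near-duplicate pair (x1, x2) is flagged
```

Dropping one variable from each flagged pair before estimation usually removes the problem.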

You will now see a summary of the results in a sequence of screens, similar to the example below. Click the Back and Next buttons to move back and forth among the screens. Click Print to get a printout of a screen.

The screens start with a table of coefficient estimates for each segment, followed by tables showing the elasticities for each variable in each segment (see the diagnostics section later in this tutorial to interpret these elasticities). The coefficients that are significantly different from 0 (at the specified level of significance) are shown in bold. After you view these summaries, the program displays charts summarizing the coefficients as bar charts. You can view additional charts by going to the Results menu, choosing Summary and then View Next Chart as shown below (or click the N button on the Menu Bar).

You can access and view these results in tabular form (before exiting the program) by going to the Results menu and choosing View Diagnostics, as described next.

DIAGNOSTICS

To view the statistically useful information about the analysis, go to the Results menu and select View Diagnostics. The extensive set of diagnostics is organized into eight components.

Note: For multiple-segment solutions, the estimates may differ each time you run the program. Most often the differences will be small. This is because our EM algorithm starts from a random assignment of customers to segments and, therefore, converges to a different local optimum (i.e., assignment of individuals to segments) each time it is run.

Diagnostics 1

The first set of diagnostics simply summarizes the input data. It shows the sizes of the data sets, indicates the number of rows of data (records) that the model processed, and displays the means of all the variables. The means are shown only if you do not specify a holdout sample for predictive validation. The means of the variables are particularly useful for detecting any problems with the data setup: do the means have the expected values? Note that the two choice alternatives in the BBBC case are labeled Response and Dummy. The Dummy represents the second choice alternative, included to ensure that choice probabilities sum to 1.0.

Diagnostics 2

The second set of diagnostics refers to goodness-of-fit indices called AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), and CAIC (Consistent AIC). All three criteria indicate superior model performance the closer they are to zero. These goodness-of-fit measures take into account model parsimony. That is, all other things being equal, given two models with equal (log) likelihood values, the model with fewer parameters is better. For segmentation purposes, this means that if you are comparing a two-segment model with a three-segment model, you should select the model with the lower values of these indices.
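The three criteria follow standard definitions: AIC = -2LL + 2k, BIC = -2LL + k ln(N), and CAIC = -2LL + k(ln(N) + 1), where LL is the maximized log-likelihood, k the number of parameters, and N the sample size. A sketch with invented log-likelihood values:

```python
import math

def information_criteria(log_lik, k, n):
    """AIC, BIC, and CAIC from the maximized log-likelihood log_lik,
    number of parameters k, and sample size n; lower values are better."""
    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + k * math.log(n)
    caic = -2 * log_lik + k * (math.log(n) + 1)
    return aic, bic, caic

# Hypothetical comparison: a second segment raises the log-likelihood but
# also the parameter count, so the penalized criteria may get worse.
print(information_criteria(-850.0, 5, 1600))
print(information_criteria(-845.0, 11, 1600))
```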

Diagnostics 3

The third set of diagnostics provides information about the parameter estimates and the variance-covariance matrix of the estimates for each segment. This information is useful for identifying the parameters that are statistically significantly different from zero. You can interpret these coefficients much as you interpret regression coefficients. A significant variable influences the choice probabilities of each alternative, whereas an insignificant variable does not.

Diagnostics 4

The fourth set of diagnostics shows the members belonging to each segment and the segment sizes (both the number of customers in each segment and the segment size as a proportion of the estimation sample; the following exhibit shows a two-segment solution for the BBBC case). This information is useful for determining the characteristics of the individuals in each segment for targeting purposes. Note that the diagnostic file contains two different sets of segment membership information. The first set, titled "membership estimates," lists the members of a segment as determined by the EM algorithm; the data of these members were used to compute the coefficients for the corresponding segment. In addition, the program lists the segment membership forecasts based on the estimation (or prediction) sample. These lists provide the model-predicted memberships of individuals (based on the estimation sample if no holdout sample is specified, or on the prediction (holdout) sample if one is specified). An example is shown below for a two-segment solution based on the BBBC case data.

Diagnostics 5

The fifth set of diagnostics is the estimated probability of each case (e.g., customer) selecting each of the alternatives.

Additional useful diagnostics are the hit ratio (the percentage of customers for whom the predicted choice alternative is the same as the known actual alternative chosen by the customer) and the average choice probability, which denotes the average of the estimated choice probabilities for the choices actually made by the customers. Note that for computing the hit ratio, each customer is assigned the alternative for which that customer's choice probability is the highest. Hit ratios can be computed using the estimation sample, or part of the data can be set aside for holdout prediction (i.e., these data are not used in estimating the model parameters) and the hit ratio estimated on the holdout sample. If there is no holdout sample, the program estimates the hit ratio on the estimation sample. Otherwise, it computes the hit ratio on the holdout sample, which is a better indicator of the predictive validity of the model.

To interpret the average choice probability, note that for a naive model that assigns equal choice probability to each alternative, the average choice probability would be 1/n, where n is the number of alternatives. For the BBBC case, the naive model would have an average choice probability of 0.5. If the average choice probability of the model is sufficiently larger than that of the naive model, there is a benefit to using the model to predict which alternative each customer would actually choose. If the estimation and prediction samples are not distinct (i.e., the holdout prediction sample is set at 0 percent, the default value), then the hit rate and average choice probability are measures of goodness of fit rather than of the predictive ability of the model.
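The two measures can be sketched as follows, using invented probabilities for four customers choosing between two alternatives:

```python
import numpy as np

def fit_summary(probs, chosen):
    """Hit ratio (%) and average choice probability.

    probs:  (cases x alternatives) array of estimated choice probabilities
    chosen: index of the alternative each customer actually chose
    """
    predicted = probs.argmax(axis=1)  # assign the highest-probability alternative
    hit_ratio = (predicted == chosen).mean() * 100
    avg_choice_prob = probs[np.arange(len(chosen)), chosen].mean()
    return hit_ratio, avg_choice_prob

# Hypothetical probabilities for four customers and two alternatives
probs = np.array([[0.70, 0.30],
                  [0.40, 0.60],
                  [0.80, 0.20],
                  [0.55, 0.45]])
chosen = np.array([0, 1, 1, 0])
print(fit_summary(probs, chosen))  # compare the average against the naive 1/n = 0.5
```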

Diagnostics 6

The sixth set of diagnostics gives the estimated choice probabilities and can be used to compute the estimated choice share (market share) of each alternative in each segment. If a holdout sample is used for prediction, the choice shares are computed on the holdout sample rather than on the sample used for model estimation; otherwise, they are computed on the estimation sample. In the following example, the market share forecast for Response is 25 percent, which indicates that 25 percent of the customers responded to the BBBC promotion. (Note that each customer is assigned the alternative for which that customer's choice probability is the highest.)
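The share computation, assigning each customer to the highest-probability alternative, can be sketched as follows (the probabilities are hypothetical):

```python
import numpy as np

def choice_shares(probs):
    """Estimated choice share of each alternative: the fraction of customers
    assigned to it, where each customer gets their highest-probability alternative."""
    assigned = probs.argmax(axis=1)
    return np.bincount(assigned, minlength=probs.shape[1]) / len(probs)

# Hypothetical estimated probabilities for four customers, two alternatives
probs = np.array([[0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4],
                  [0.1, 0.9]])
print(choice_shares(probs))
```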

Diagnostics 7

The seventh set of diagnostics provides information about the elasticity of impact that each variable has on choice shares, computed as follows: for each independent variable, the element in the (i,j)th position indicates the percentage change in the choice of alternative j for a one percent change in the variable for the ith choice alternative. For example, elements (1,1) and (1,2) in the following elasticity matrix for the amount-purchased variable are 0.2145 and -0.0715. This means that if we increase the amount purchased by one percent, the share of customers responding to the promotion would go up by 0.2145 percent and the share of customers not responding would go down by 0.0715 percent. Here, the magnitudes of these elasticities reflect the fact that only about a fourth of the customers in the BBBC database responded to the direct mail offer.
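Applying these elasticities can be sketched as follows; the starting shares are hypothetical, chosen to match the roughly one-fourth response rate mentioned above:

```python
def shares_after_change(shares, elasticity_row, pct_change):
    """New shares after a pct_change (%) in a variable for alternative i,
    using row i of the elasticity matrix (% change in share per 1% change)."""
    return [s * (1 + e * pct_change / 100.0)
            for s, e in zip(shares, elasticity_row)]

# Hypothetical starting shares of 25% responders / 75% non-responders,
# with the elasticities 0.2145 and -0.0715 quoted above
print(shares_after_change([25.0, 75.0], [0.2145, -0.0715], 1.0))
```

Note that the elasticities are percentage changes of the shares themselves, not percentage-point changes.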

Diagnostics 8

Finally, we provide some statistics that can help evaluate how well the chosen (full parametric) model compares to two naive models. The first naive model assigns equal probabilities to all alternatives (i.e., all parameters are equal to 0), and the second naive model sets all parameters other than the alternative-specific constants to 0. The reported Chi-square value is asymptotically distributed as a Chi-square distribution with the indicated degrees of freedom (DF).

The goodness-of-fit index provides additional information about the performance of the model. Rho-Square is similar to the R-squared measure in regression: it is an index of the extent to which the full parametric model performs better than a null model with all parameters equal to 0. Rho-Bar-Square is another goodness-of-fit measure, similar to adjusted R-squared in regression, which corrects for the number of parameters included in the full parametric model.
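These measures follow standard definitions: the likelihood-ratio chi-square is 2(LL_full - LL_null), Rho-Square is 1 - LL_full/LL_null, and Rho-Bar-Square subtracts the parameter count from LL_full first. A sketch with invented values; for 1600 binary cases, the equal-probability null model has log-likelihood 1600*ln(0.5):

```python
import math

def model_fit_stats(ll_full, ll_null, k):
    """Likelihood-ratio chi-square vs. the all-zero-parameter null model, plus
    Rho-Square and Rho-Bar-Square (which corrects for the k parameters)."""
    chi_square = 2 * (ll_full - ll_null)
    rho_sq = 1 - ll_full / ll_null
    rho_bar_sq = 1 - (ll_full - k) / ll_null
    return chi_square, rho_sq, rho_bar_sq

# Null log-likelihood for 1600 binary cases; the full model's value is invented
ll_null = 1600 * math.log(0.5)
print(model_fit_stats(-850.0, ll_null, 5))
```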

LIMITATIONS OF THE EDUCATIONAL VERSION OF THE SOFTWARE

Maximum number of cases x number of observations per case: 10,000
Maximum number of variables: 20
Maximum number of choice alternatives: 8
Maximum number of segments: 9

APPENDIX: REGRESSION IN EXCEL FOR THE BOOKBINDERS BOOK CLUB CASE

As a point of comparison for the logit model, we show how to analyze the BBBC data using ordinary least squares regression. Open the BBBC.XLS file.

To start the regression-analysis tool, open the Tools menu, select Add-Ins, and then Analysis ToolPak.

Next, from the Tools menu, open Data Analysis and select Regression.

You are now set to conduct regression analysis. Specify the regression model, as shown in the screen below. In particular, check the Labels box.
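For readers who prefer to replicate this comparison outside Excel, an ordinary least squares fit (a linear probability model) can be sketched in Python. The data below are synthetic stand-ins for BBBC-style variables, not the actual case data:

```python
import numpy as np

# Synthetic BBBC-like data: a 0/1 purchase choice and two predictors
rng = np.random.default_rng(1)
amount = rng.uniform(15, 300, size=200)          # amount purchased
months = rng.integers(1, 36, size=200)           # months since last purchase
y = ((0.001 * amount - 0.004 * months
      + rng.normal(0.0, 0.3, size=200)) > 0.0).astype(float)

# Ordinary least squares via numpy (intercept plus two slopes)
X = np.column_stack([np.ones(200), amount, months])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # analogous to the coefficients in Excel's regression output
```

The fitted values of such a model are predicted probabilities only loosely, since they are not constrained to [0, 1]; this is one reason the logit model is preferred for choice data.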

After the model runs, you should get regression results as shown in the screen below: