Applied Choice Analysis




  • http://www.cambridge.org/9780521844260

  • Applied Choice Analysis

    A Primer

Almost without exception, everything human beings undertake involves a choice. In recent years, there has been a growing interest in the development and application of quantitative statistical methods to study choices made by individuals, with the purpose of gaining a better understanding both of how choices are made and of forecasting future choice responses. In this primer, the authors provide an unintimidating introduction to the main techniques of choice analysis and include detail on themes such as data collection and preparation, model estimation and interpretation, and the design of choice experiments. A companion website to the book provides practice data sets and software to estimate the main discrete choice models such as multinomial logit, nested logit, and mixed logit. This primer will be an invaluable resource to students, as well as of immense value to consultants, professionals, researchers, and anyone else interested in choice analysis and modeling.
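As a point of orientation for new readers (this is the standard textbook formulation from the discrete choice literature, not a quotation from this book): the multinomial logit model mentioned above assigns each alternative i in a choice set C a probability proportional to the exponential of its observed utility V_i:

```latex
P(i \mid C) \;=\; \frac{\exp(V_i)}{\sum_{j \in C} \exp(V_j)}
```

Nested logit and mixed logit generalize this form by relaxing the independence assumptions embedded in it.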

    Companion website www.cambridge.org/0521605776

David A. Hensher is Director of the Institute of Transport Studies and Professor of Management in the Faculty of Economics and Business at the University of Sydney.

John M. Rose is a Lecturer at the Institute of Transport Studies at the University of Sydney.

William H. Greene is Professor of Economics and Entertainment and Media Faculty Fellow in the Department of Economics at the Stern School of Business, New York University.

Applied Choice Analysis: A Primer

David A. Hensher, The University of Sydney

John M. Rose, The University of Sydney

William H. Greene, New York University

  • Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press, The Edinburgh Building, Cambridge, UK

First published in print format 2005

© David A. Hensher, John M. Rose, William H. Greene 2005

Information on this title: www.cambridge.org/9780521844260

This book is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.


Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this book, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

    Published in the United States of America by Cambridge University Press, New York

    www.cambridge.org

hardback

paperback

eBook (NetLibrary)

http://www.cambridge.org/9780521844260
http://www.cambridge.org

  • Contents

List of figures
List of tables
Preface

    Part I Basic topics

1 In the beginning

2 Basic notions of statistics
2.1 Introduction
2.2 Data
2.2.1 The importance of understanding data
2.3 A note on mathematical notation
2.3.1 Summation
2.3.2 Product
2.4 Probability
2.4.1 Relative frequencies
2.4.2 Defining random variables
2.4.3 Probability distribution functions
2.4.4 Cumulative distribution functions
2.4.5 Multivariate probability density functions
2.4.6 The multivariate probability function
2.4.7 Marginal probability density functions
2.4.8 Conditional probability density functions
2.4.9 Defining statistical independence
2.5 Properties of random variables
2.5.1 Expected value
2.5.1.1 Properties of expected values
2.5.2 Variance
2.5.2.1 Properties of variance


2.5.3 Covariance
2.5.3.1 Properties of covariance
2.5.4 The variance-covariance matrix
2.5.5 Correlation
2.5.5.1 Properties of the correlation coefficient
2.5.6 Correlation and variances
2.6 Sample population statistics
2.6.1 The sample mean
2.6.2 The sample variance
2.6.3 The sample covariance
2.6.4 The sample correlation coefficient
2.7 Sampling error and sampling distributions
2.8 Hypothesis testing
2.8.1 Defining the null and alternative hypotheses
2.8.2 Selecting the test-statistic
2.8.3 Significance of the test and alpha
2.8.4 Performing the test
2.8.5 Example hypothesis test: the one-sample t-test
2.9 Matrix algebra
2.9.1 Transposition
2.9.2 Matrix addition and subtraction
2.9.3 Matrix multiplication by a scalar
2.9.4 Matrix multiplication
2.9.5 Determinants of matrices
2.9.6 The identity matrix
2.9.7 The inverse of a matrix
2.9.8 Linear and quadratic forms
2.9.9 Positive definite and negative definite matrices
2.10 Conclusion
Appendix 2A Measures of correlation or similarity

3 Choosing
3.1 Introduction
3.2 Individuals have preferences, and they count
3.3 Using knowledge of preferences and constraints in choice analysis
3.4 Setting up a behavioral choice rule
3.5 Deriving a basic choice model
3.6 Concluding overview

4 Paradigms of choice data
4.1 Introduction
4.2 Data consistent with choice
4.3 Revealed preference data
4.3.1 Choice-based sampling


4.4 Stated preference (or stated choice) data
4.5 Further comparisons
4.6 Why not use both RP and SP data?
4.7 Socio-demographic characteristic data

5 Processes in setting up stated choice experiments
5.1 Introduction
5.2 What is an experimental design?
5.2.1 Stage 1: Problem definition refinement
5.2.2 Stage 2: Stimuli refinement
5.2.2.1 Refining the list of alternatives
5.2.2.2 Refining the list of attributes and attribute levels
5.2.3 Stage 3: Experimental design considerations
5.2.3.1 Labeled versus unlabeled experiments
5.2.3.2 Reducing the number of levels
5.2.3.3 Reducing the size of experimental designs
5.2.3.4 Dummy and effects coding
5.2.3.5 Calculating the degrees of freedom required
5.2.3.6 Blocking the design
5.2.4 Stage 4: Generating experimental designs
5.2.4.1 Assigning an attribute as a blocking variable
5.2.5 Stage 5: Allocating attributes to design columns
5.3 A note on unlabeled experimental designs
5.4 Optimal designs
Appendix 5A Designing nested attributes
Appendix 5B Assignment of quantitative attribute-level labels

6 Choices in data collection
6.1 Introduction
6.2 General survey instrument construction
6.3 Questionnaires for choice data
6.3.1 Stage 6: Generation of choice sets
6.3.2 Stage 7: Randomizing choice sets
6.3.3 Stage 8: Survey construction
6.3.3.1 Choice context
6.3.3.2 Use an example
6.3.3.3 Independence of choice sets
6.3.3.4 More than one choice
6.3.3.5 The no-choice or delay-choice alternative
6.4 Revealed preferences in questionnaires
6.5 Studies involving both RP and SP data
6.6 Using RP data in SP experiments: the current alternative
6.7 Sampling for choice data: the theory
6.7.1 Simple random samples


6.7.2 Stratified random sampling
6.7.3 Conclusion to the theory of calculating sample sizes
6.8 Sampling for choice data: the reality

7 NLOGIT for applied choice analysis: a primer
7.1 Introduction
7.2 About the software
7.2.1 About NLOGIT
7.2.2 About NLOGIT/ACA
7.2.3 Installing NLOGIT/ACA
7.3 Starting NLOGIT/ACA and exiting after a session
7.3.1 Starting the program
7.3.2 Inputting the data
7.3.3 Reading data
7.3.4 The project file
7.3.5 Leaving your session
7.4 Using NLOGIT
7.5 How to get NLOGIT to do what you want
7.5.1 Using the Text Editor
7.5.2 Command format
7.5.3 Commands
7.5.4 Using the Project File Box
7.6 Useful hints and tips
7.6.1 Limitations in NLOGIT (and NLOGIT/ACA)
7.7 NLOGIT software
7.7.1 Support
7.7.2 The program installed on your computer
7.7.3 Using NLOGIT/ACA in the remainder of the book
Appendix 7A Diagnostic and error messages

8 Handling choice data
8.1 Introduction
8.2 The basic data setup
8.2.1 Entering multiple data sets: stacking and melding
8.2.2 Handling data on the non-chosen alternative in RP data
8.2.3 Combining sources of data
8.2.4 Weighting on an exogenous variable
8.2.5 Handling rejection: the no option
8.3 Entering data into NLOGIT
8.3.1 Entering data directly into NLOGIT
8.3.2 Importing data into NLOGIT
8.3.2.1 The Text/Document Editor
8.3.3 Reading data into NLOGIT
8.3.4 Writing data into NLOGIT
8.3.5 Saving data sets


8.3.6 Loading data into NLOGIT
8.3.6.1 Changing the maximum default size of the Data Editor
8.4 Data entered into a single line
8.5 Data cleaning
8.5.1 Testing for multicollinearity using NLOGIT
Appendix 8A Design effects coding
Appendix 8B Converting single-line data commands

9 Case study: mode-choice data
9.1 Introduction
9.2 Study objectives
9.3 The pilot study
9.3.1 Pilot sample collection
9.3.1.1 Interviewer briefing
9.3.1.2 Interviewing
9.3.1.3 Analysis of contacts
9.3.1.4 Interviewer debriefing
9.4 The main survey
9.4.1 The mode-choice experiment
9.4.1.1 Detailed description of attributes
9.4.1.2 Using the showcards
9.4.2 RP data
9.4.3 The household questionnaire
9.4.4 The commuter questionnaire
9.4.5 The sample
9.4.5.1 Screening respondents
9.4.5.2 Interviewer briefing
9.4.5.3 Interviewing
9.4.5.4 Analysis of total contacts
9.4.5.5 Questionnaire check edit
9.4.5.6 Coding and check edit
9.4.5.7 Data entry
9.4.5.8 SPSS setup
9.5 The case study data
9.5.1 Formatting data in NLOGIT
9.5.2 Getting to know and cleaning the data
Appendix 9A The contextual statement associated with the travel choice experiment
Appendix 9B Mode-choice case study data dictionary
Appendix 9C Mode-choice case study variable labels

10 Getting started modeling: the basic MNL model
10.1 Introduction
10.2 Modeling choice in NLOGIT: the MNL command


10.3 Interpreting the MNL model output
10.3.1 Maximum likelihood estimation
10.3.2 Determining the sample size and weighting criteria used
10.3.3 Interpreting the number of iterations to model convergence
10.3.4 Determining overall model significance
10.3.5 Comparing two models
10.3.6 Determining model fit: the pseudo-R2
10.3.7 Type of response and bad data
10.3.8 Obtaining estimates of the indirect utility functions
10.3.8.1 Matrix: LastDsta/LastOutput
10.4 Interpreting parameters for effects and dummy coded variables
10.5 Handling interactions in choice models
10.6 Measures of willingness to pay
10.7 Obtaining choice probabilities for the sample
10.8 Obtaining the utility estimates for the sample
Appendix 10A Handling unlabelled experiments

11 Getting more from your model
11.1 Introduction
11.2 Adding to our understanding of the data
11.2.1 Show
11.2.2 Descriptives
11.2.3 Crosstab
11.3 Adding to our understanding of the model parameters
11.3.1 ;Effects: elasticities
11.3.2 Calculating arc elasticities
11.3.3 ;Effects: marginal effects
11.4 Simulation
11.4.1 Marginal effects for categorical coded variables
11.4.2 Reporting marginal effects
11.5 Weighting
11.5.1 Endogenous weighting
11.5.2 Weighting on an exogenous variable
11.6 Calibrating the alternative-specific constants of choice models estimated on SP data
11.6.1 Example (1) (the market shares of all alternatives are known a priori)
11.6.2 Example (2) (the market shares for some alternatives are unknown)
Appendix 11A Calculating arc elasticities

12 Practical issues in the application of choice models
12.1 Introduction
12.2 Calibration of a choice model for base and forecast years
12.3 Designing a population data base: synthetic observations


12.4 The concept of synthetic households
12.4.1 Synthetic households generation framework
12.5 The population profiler
12.5.1 The synthetic household specification
12.6 The sample profiler
12.7 Establishing attribute levels associated with choice alternatives in the base year and in a forecast year
12.8 Bringing the components together in the application phase
12.9 Developing a decision support system
12.9.1 Using the data sample averages in creating the DSS
12.9.2 Using the choice data in creating the DSS
12.9.3 Improving the look of the DSS
12.9.4 Using the DSS
12.10 Conclusion

    Part II Advanced topics

13 Allowing for similarity of alternatives
13.1 Introduction
13.2 Moving away from IID between all alternatives
13.3 Setting out the key relationships for establishing a nested logit model
13.4 The importance of a behaviorally meaningful linkage mechanism between the branches on a nested structure
13.5 The scale parameter
13.6 Bounded range for the IV parameter
13.7 Searching for the best tree structure
Appendix 13A Technical details of the nested logit model

14 Nested logit estimation
14.1 Introduction
14.2 The Hausman test of the IIA assumption
14.3 The nested logit model commands
14.3.1 Normalizing and constraining IV parameters
14.3.2 RU1 and RU2
14.3.3 Specifying start values for the NL model
14.3.4 A quick review of the NL model
14.4 Estimating an NL model and interpreting the output
14.4.1 Estimating the probabilities of a two-level NL model
14.4.2 Comparing RU1 to RU2
14.5 Specifying utility functions at higher levels of the NL tree
14.6 Handling degenerate branches in NL models
14.7 Three-level NL models
14.8 Searching for the best NL tree structure: the degenerate nested logit
14.9 Combining sources of data: SP-RP


14.10 Additional commands
Appendix 14A The Hausman test of the IIA assumption for models with alternative-specific parameter estimates
Appendix 14B Three-level NL model system of equations

15 The mixed logit model
15.1 Introduction
15.2 Mixed logit choice models
15.3 Conditional distribution for sub-populations with common choices
15.4 Model specification issues
15.4.1 Selecting the random parameters
15.4.2 Selecting the distribution of the random parameters
15.4.3 Imposing constraints on a distribution
15.4.4 Selecting the number of points for the simulations
15.4.5 Preference heterogeneity around the mean of a random parameter
15.4.6 Accounting for observations drawn from the same individual: correlated choice situations
15.4.7 Accounting for correlation between parameters
15.5 Willingness-to-pay challenges
15.6 Conclusions

16 Mixed logit estimation
16.1 Introduction
16.2 The mixed logit model basic commands
16.3 NLOGIT output: interpreting the mixed logit model
16.4 How can we use random parameter estimates?
16.4.1 A note on using the lognormal distribution
16.5 Imposing constraints on a distribution
16.6 Revealing preference heterogeneity around the mean of a random parameter
16.6.1 Using the non-stochastic distribution
16.6.2 Handling insignificant heterogeneity around the mean parameter estimates
16.7 Correlated parameters
16.8 Common-choice-specific parameter estimates: conditional parameters
16.9 Presenting the distributional outputs graphically using a kernel density estimator
16.10 Willingness-to-pay issues and the mixed logit model

Glossary
References
Index

  • Figures

2.1 The PDF of a continuous random variable
2.2 The CDF of a discrete random variable
2.3 The CDF of a continuous random variable
2.4 Plotting the home-work, work-home trip times
2.5 Sampling distribution for 1000 draws
2.6 One-tailed test distribution
2.7 Two-tailed test distribution
2.8 The relationship between α and β
2.9 The rejection region for one- or two-tailed tests
2.10 Calculating the determinant of a matrix in Excel
2.11 A singular matrix
2.12 The inverse of a matrix in Microsoft Excel
2.13 Calculating the inverse of a matrix in Microsoft Excel
3.1 The identification of an individual's preferences for bus use
3.2 The budget or resource constraint
3.3 Changes to the budget or resource constraint
3.4 Individual preferences subject to a budget constraint
3.5 Indifference curves with budget constraints
3.6 Demand curve construction
3.7 Changes in demand and changes in quantity demanded
4.1 The technological frontier and the roles of RP and SP data
5.1 The experimental design process
5.2 Mapping part-worth utility
5.3 Stages in deriving fractional factorial designs
5.4 Estimation of linear vs quadratic effects
5.5 Generating designs using SPSS
5.6 Specifying the number of attribute levels per attribute
5.7 Specifying the minimum number of treatment combinations to generate
5.8 Calculating interaction design codes using Microsoft Excel


5.9 Microsoft Excel commands to generate correlations
5.10 Microsoft Excel Data Analysis and Correlation dialog boxes
6.1 Stages 6-8 of the experimental design process
6.2 Example choice sets
6.3 The meaning of grey
6.4 Example shoe choice set
6.5 Mode-choice example
6.6 Example of choice set with more than two choices
6.7 Collecting RP data for inclusion in an SP experiment
6.8 Choice sets using fixed attribute-level labels
6.9 Choice sets using percentage attribute-level labels
6.10 Calculating Z2 using Microsoft Excel
6.11 Calculating sample sizes using Microsoft Excel
6.12 Calculating the allowable error using Microsoft Excel
6.13 Within-stratum acceptable error using overall population proportions
6.14 Within-stratum acceptable error using strata population proportions
7.1 Initial NLOGIT desktop
7.2 File Menu on Main Desktop and Open Project... Explorer
7.3 NLOGIT desktop after Project File Input
7.4 Dialog for Exiting NLOGIT and Saving the Project File
7.5 Dialog for Opening the Text Editor
7.6 Text Editor Ready for Command Entry
7.7 Commands in the Text Editor
8.1 Choice set with the no-travel alternative
8.2 Creating new variables using the toolbar commands
8.3 Naming new variables using the New Variable dialog box
8.4 Project dialog box
8.5 The New Text/Document Editor pushbutton
8.6 The Go pushbutton
8.7 Changing the default workspace available in NLOGIT
8.8 The Options dialog box
8.9 Differences between the data and experimental design setups
9.1 Instructions and format of the initial mode-choice survey
9.2 Example of the format of the mode-choice experiment showcard
10.1 Likelihood surface
10.2 A complex negative LL surface with local sub-optimal solutions
10.3 The sigmoid curve
10.4 Using Excel to perform the -2LL test
10.5 The -2LL Chi-square test
10.6 Using Excel to calculate a pseudo-R2
10.7 Mapping the pseudo-R2 to the linear R2
10.8 Matrix: LastDsta button
10.9 Matrix: LastOutp example
10.10 Linear and non-linear in parameters marginal utilities


10.11 Marginal utility estimates for the number of vehicles-number of drivers interaction
10.12 Marginal utility estimates for the fare-time interaction
10.13 Model probabilities saved within NLOGIT
10.14 Calculating choice probabilities using Excel
10.15 Using Excel to perform what-if scenarios on the choice probabilities
10.16 The Project dialog box and New Scalar dialog box
10.17 Estimated utilities saved within NLOGIT
10.18 Using Excel to test what-if scenarios on the utility estimates
10.19 Obtaining the average utility using the Calc; command
10A.1 Two scenarios from an unlabeled choice experiment
11.1 Proportions of correct and incorrect predictions
11.2 Marginal effects as the slopes of the tangent lines to the cumulative probability curve
11.3 Marginal effects for a categorical (dummy coded) variable
11A.1 Data Editor for calculating arc elasticities
11A.2 Within-choice-set elasticities
12.1 Synthetic household generation process
12.2 Constructing the front end of a DSS
12.3 Using the Microsoft Excel Data Validation option
12.4 Creating the back end of a DSS
12.5 Setting up the output for the DSS
12.6 Creating a graph to visually represent predicted market adjustments due to policy changes
12.7 Using the attribute-level data averages as the initial DSS attribute levels
12.8 Creating a front end using percentage changes
12.9 Copying the raw choice data into the DSS
12.10 Linking the back end to the front end of the DSS
12.11 Using the vlookup command in Microsoft Excel
12.12 Using the if statement to restrict the possible attribute-level ranges in the data
12.13 Inserting the parameter estimates into the back end of the DSS
12.14 Calculating the utilities for each alternative
12.15 Calculating the choice probabilities
12.16 PivotTable and PivotChart Report command menu
12.17 PivotTable and PivotChart Wizard box
12.18 Selecting the pivot table data using the PivotTable Wizard
12.19 The third PivotTable and PivotChart Wizard
12.20 Creating the pivot table through the Layout Wizard
12.21 Calculating the choice shares using the Microsoft pivot table
12.22 Refreshing the data of the pivot table
12.23 The Introduction screen buttons
12.24 The Microsoft Excel Forms toolbar


12.25 Recording macros
12.26 The DSS Help screen
12.27 Run Model and Reset Attribute Levels for DSS2
13.1 A tree diagram to recognize the (potential) linkages between choice sets
13A.1 A four-level NL tree structure
13A.2 Relationship between the scale parameters at levels 1 and 2 of an NL tree structure
13A.3 Relationship between the scale parameters at levels 2 and 3 of an NL tree structure
13A.4 Relationship between the scale parameters at levels 3 and 4 of an NL tree structure
14.1 Example NL tree structure one
14.2 Example tree structure two
14.3 Example tree structure three
14.4 Example tree structure four
14.5 Calculating probability and utilities of an NL model using NLOGIT
14.6 Example calculations of the probabilities and utilities of an NL model using NLOGIT
14.7 An NL tree structure with a degenerate alternative
14.8 An NL tree structure with two degenerate alternatives
14.9 A three-level NL tree structure with degenerate branches
14.10 Pooling RP and SP data sources
14.11 Saving predicted probabilities, conditional probabilities, and IV parameters
14.12 Saving the utility estimates in NL models
15.1 1000 draws on the unit square
16.1 Testing for statistical differences in means of triangular and normal distributions
16.2 Testing dispersion of the toll random parameter
16.3 Testing dispersion of the cartc random parameter
16.4 Histogram of a randomly drawn normal distribution with mean zero and standard deviation one
16.5 Histogram of a hypothetical sample for a lognormal distributed random parameter
16.6 Hypothetical individual-specific parameter estimates derived from a lognormal distribution
16.7 Hypothetical parameter distribution for a standard normal distribution
16.8 Histogram of hypothetical parameter estimates for a uniform distribution
16.9 Transforming a uniform distribution into a triangular distribution
16.10 Histogram of hypothetical parameter estimates for a triangular distribution
16.11 Individual-specific parameter estimates derived from an unconditional lognormal random parameter

    16.11 Individual-specific parameter estimates derived from an unconditionallognormal random parameter 645


16.12 Placing a constraint upon the dispersion of the cartc random parameter estimate
16.13 Means and spreads of ctc, ctnc, ctp, and cntp random parameters
16.14 Plotting the marginal utilities from unconditional parameter estimates
16.15 Locating the conditional parameter estimates
16.16 A matrix with the stored conditional random parameter estimates
16.17 Kernel density function for fuel and toll cost parameters
16.18 WTP-I matrix with WTP ratio
16.19 Calculating the VTTS for conditional parameter estimates (unconstrained distribution)
16.20 Calculating the VTTS for conditional parameter estimates (constrained distribution)
16.21 The car VTTS distribution
16.22 The public transport VTTS distribution
16.23 Plotting the VTTS derived from triangular distributions

  • Tables

2.1 Calculating the relative frequencies for travel to work
2.2 The PDF of a discrete random variable
2.3 PDF and CDF for a discrete random variable
2.4 Frequency counts for home-work and work-home trips
2.5 The bivariate distribution for home-work and work-home trips
2.6 Marginal probability distributions
2.7 Demonstrating statistical independence of two random variables
2.8 Demonstrating that the home-work and work-home travel times are not statistically independent
2.9 The expected value of rolling two dice
2.10 Calculating the average home-work and work-home travel times
2.11 The expected values of two separate rolls of a die
2.12 The expected value of a random variable multiplied by a constant
2.13 Calculating the variance of rolling two dice
2.14 The variance of two independent throws of a die
2.15 Calculating X1X2 f(X1, X2) for home-work and work-home trips
2.16 Calculating X1X2 f(X1, X2) for the roll of two dice
2.17 Estimating the variances for home-work and work-home trips
2.18 Respondent ratings of two soft drinks
2.19 Calculating the sample covariance between the ratings of two soft drinks
2.20 The relationships between H0, H1, Type I errors, and Type II errors
2.21 Comparing alpha to the probability value
2A.1 Appropriate correlation formula
5.1 Full factorial design
5.2 Full factorial design coding
5.3 Comparison of design codes and orthogonal codes
5.4 Choice treatment combination
5.5 Labeled choice experiment
5.6 Attribute levels for expanded number of alternatives
5.7 Dummy coding


5.8 Effects coding structure
5.9 Effects coding formats
5.10 Minimum treatment combination requirements for main effects only fractional factorial designs
5.11 Enumeration of all two-way interactions
5.12 Orthogonal coding of fractional factorial design
5.13 Orthogonal codes for main effects plus all two-way interaction columns
5.14 Design correlation
5.15 Attributes assigned to design columns
5.16 Using blocking variables to determine allocation of treatment combinations
5.17 Effects coding design of table 5.15
5.18 Correlation matrix for effects coded design
5.19 3^4 fractional factorial design
5.20 Randomizing treatment combinations to use for additional design columns
5.21 Correlation matrix for randomizing treatment combinations
5.22 Using the foldover to generate extra design columns
5.23 Correlation matrix for designs using foldovers to generate additional columns
5A.1 Price-quality combinations
5A.2 Fractional factorial of a 6^1 3^2 design with price-quality nested attribute
5A.3 Correlation matrix of a fractional factorial of a 6^1 3^2 design with nested price-quality attribute
5B.1 Comparison of coding formats
5B.2 Estimating linear effects using quantitative attributes
5B.3 Correlation matrix for design shown in table 5B.2
5B.4 Estimating linear effects using different quantitative attribute-level labels
5B.5 Correlation matrix for the design shown in table 5B.3
6.1 Attaching cognitively meaningful attribute labels to attribute levels
6.2 A reproduction of the first two treatment combinations for the table 5.11 design
6.3 A 2^4 orthogonal design with two blocking variables
6.4 Randomization of choice sets across surveys
6.5 Reported attribute levels and SP attribute-level labels, different approaches
6.6 Attribute-level labels for a 2^4 orthogonal design
6.7 Pivoting SP attribute levels from reported attribute levels using percentages
6.8 Attribute-level labels as percentage changes for a 2^4 orthogonal design


6.9 Pivoting SP attribute levels from reported attribute levels using wider percentages
6.10 Differences between relative and absolute treatment of sample proportions
6.11 Breakdown of intercity travellers, by purpose of trip
6.12 Strata sample sizes calculated using the overall population proportions
6.13 Strata sample sizes calculated using the strata population proportions
8.1 Most general choice data format in NLOGIT
8.2 Varying alternatives within choice sets
8.3 Varying the number of alternatives within choice sets: (1)
8.4 Varying the number of alternatives within choice sets: (2)
8.5 Entering socio-demographic characteristics
8.6 Combining SP and RP data
8.7 Exogenous weights entered
8.8 Adding the no-choice or delay-choice alternative
8.9 Data entered into a single line
8.10 RP data in a single-line data format
8A.1 Design effects coding
9.1 Attributes and levels used for the initial mode-choice showcards
9.2 Instructions and format of the initial mode-choice survey
9.3 Orthogonal fractional factorial mode-choice pilot experimental design
9.4 The set of attributes and attribute levels in the mode-choice experiment
9.5 Mode-choice experimental design
9.6 Correlation matrix for mode-choice experimental design
9.7 Number of interviews for each city
9.8 Targeted number of interviews, by location
9.9 Summary at urban area-wide level of profile of households, by fleet size
9.10 Final questionnaire totals, by state (after ITS editing)
9.11 Breakdown, by city, for the SP and RP data sets
9.12 Profile of RP modal share, chosen main mode
9.13 Profile of RP modal share, alternative mode
9.14 Effects coding for the wkremply variable
10.1 Reproduction of table 8.8
10.2 Searching over a range of values to maximize L*
10.3 Calculating log likelihood values
10.4 Part-worth utility estimates for the number of vehicles-number of drivers interaction
10.5 Part-worth utility estimates for the fare-time interaction


11.1 Relationship between elasticity of demand, change in price, and revenue
11.2 Reporting marginal effects
12.1 Household distribution at population level: indexing 25 core cells
12.2 Cross-tabulation of households at core-cell level
12.3 SHD1 method
12.4 SHD2: uniform distribution method
12.5 Uniform distribution allocation
12.6 SHD3
12.7 Household characteristics: a comparison of synthetic household projection to Census data
12.8 Resident characteristics: a comparison of synthetic household projection to Census data
14.1 SP-RP alti2 values
16.1 Comparison of the VTTS derived from unconditional parameter estimates, conditional (unconstrained), and conditional (constrained) parameter estimates
16.2 Statistical outputs for VTTS estimated using triangular distributions

    Preface

    I'm all in favor of keeping dangerous weapons out of the hands of fools. Let's start with typewriters. (Frank Lloyd Wright, 1868–1959)

    Almost without exception, everything human beings undertake involves a choice (consciously or sub-consciously), including the choice not to choose. Some choices are the result of habit while others are fresh decisions made with great care, based on whatever information is available at the time from past experiences and/or current inquiry.

    Since the 1970s, there has been a steadily growing interest in the development and application of quantitative statistical methods to study choices made by individuals (and, to a lesser extent, groups of individuals). With an emphasis on both understanding how choices are made and forecasting future choice responses, a healthy literature has evolved. Reference works by Louviere, Hensher, and Swait (2000) and Train (2003) synthesize the contributions. However, while these two sources represent the state of the art (and practice), they are technically advanced and often a challenge for the beginner and the practitioner.

    Discussions with colleagues over the last few years have revealed a gap in the literature of choice analysis – a book that assumes very little background and offers an entry point for individuals interested in the study of choice, regardless of their background. Writing such a book increasingly became a challenge for us. It is often more difficult to explain complex ideas in very simple language than to protect one's knowledge-base with complicated deliberations.

    There are many discussion topics in this primer that are ignored in most books on the subject, yet are issues which students have pointed out in class as important in giving them a better understanding of what is happening in choice modeling. The lament that too many books on discrete choice analysis are written for the well informed is common, and is sufficient incentive to write this book.

    This primer for beginners is our attempt to meet the challenge. We agreed to try and write the first draft without referring to any of the existing material, as a means (hopefully) of encouraging a flow of explanation. Pausing to consult can often lead to terseness in the prose (as writers of novels can attest). Further draft versions leading to the final product did, however, cross-reference to the literature to ensure we had acknowledged appropriate material. This primer, however, is not about ensuring that all contributors to the literature on choice are acknowledged, but rather to ensure that the novice choice analyst is given a fair go in their first journey through this intriguing topic.

    We dedicate this book to the beginners, but we also acknowledge our research colleagues who have influenced our thinking as well as co-authored papers over many years. We especially recognize Dan McFadden (2000 Nobel Laureate in Economics), Ken Train, Chandra Bhat, Jordan Louviere, Andrew Daly, Moshe Ben-Akiva, and David Brownstone. Colleagues and doctoral students at the University of Sydney read earlier versions. In particular, we thank Sean Puckett, Kwang Kim, and Louise Knowles, and the January 2004 graduate class in Choice Analysis at The University of Sydney, who were guinea pigs for the first full use of the book in a teaching environment. Sean Puckett also contributed to the development of the glossary.

    Part I

    Basic topics

    If knowledge can create problems, it is not through ignorance that we can solve them. (Isaac Asimov, 1920–1992)

    1 In the beginning

    Education is a progressive discovery of our own ignorance. (Will Durant, 1885–1981)

    Why did we choose to write this primer? Can it be explained by some inherent desire to seek personal gain, or was it some other less self-centered interest? In determining the reason, we are revealing an underlying objective. It might be one of maximizing our personal satisfaction level or that of satisfying some community-based objective (or social obligation). Whatever the objective, it is likely that there are a number of reasons why we made such a choice (between writing and not writing this primer), accompanied by a set of constraints that had to be taken into account. An example of a reason might be to promote the field of research and practice of choice analysis; examples of constraints might be the time commitment and the financial outlay.

    Readers should be able to think of choices that they have made in the last seven days. Some of these might be repetitive and even habitual (such as taking the bus to work instead of the train or car), or buying the same daily newspaper (instead of other ones on sale); other choices might be a one-off decision (such as going to the movies to watch a latest release, or purchasing this book). Many choice situations involve more than one choice, such as choosing a destination and the means of transport to get there, or choosing where to live and the type of dwelling.

    The storyline above is rich in information about what we need to include in a study of the choice behavior of individuals. To arrive at a choice, an individual must have considered a set of alternatives. These alternatives are usually called the choice set. Logically, one must evaluate at least two alternatives to be able to make a choice (one of these alternatives may be not to make a choice or not to participate at all). At least one actual choice setting must exist (e.g. choosing where to live), but there may be more than one choice (e.g. what type of dwelling to live in, whether to buy or rent, and how much to pay per week if rented). The idea that an individual may have to consider a number of choices leads to a set of inter-related choices.

    Determining the set of alternatives to be evaluated in a choice set is a crucial task in choice analysis. Getting this wrong will mean that subsequent tasks in the development of a choice model will be missing relevant information. We often advise analysts to devote considerable time to the identification of the choices that are applicable in the study of a specific problem. This is known as choice set generation. In identifying the relevant choices, one must also consider the range of alternatives and start thinking about what influences the decision to choose one alternative over another. These influences are called attributes if they relate to the description of an alternative (e.g. travel time of the bus alternative), but an individual's prejudices (or tastes) will also be relevant and are often linked to socio-economic characteristics such as personal income, age, gender, and occupation.

    To take a concrete example, a common problem for transportation planners is to study the transport-related choices made by a sample of individuals living in an urban area. Individuals make many decisions related to their transportation needs. Some of these decisions are taken occasionally (e.g. where to live and work) while others are taken more often (e.g. departure time for a specific trip). These examples highlight a very important feature of choice analysis – the temporal perspective. Over what time period are we interested in studying choices? As the period becomes longer, the number of possible choices that can be made (i.e. are not fixed or predetermined) is likely to increase. Thus if we are interested in studying travel behavior over a five-year period, then it is reasonable to assume that an individual can make choices related to the locations of both living and working, as well as the means of transport and departure time. That is, a specific choice of means of transport may indeed be changed as a consequence of the person changing where they reside or work. In a shorter period such as one year, choosing among modes of transport may be conditional on where one lives or works, but the latter is not able to be changed given the time that it takes to relocate one's employment.

    The message in the previous paragraphs is that careful thought is required to define the choice setting so as to ensure that all possible behavioral responses (as expressed by a set of choice situations) can be accommodated when a change in the decision environment occurs. For example, if we increase fuel prices, then the cost of driving a car increases. If one has studied only the choice of mode of transport, then the decision maker will be forced to modify the choice among a given set of modal alternatives (e.g. bus, car, train). However, it may be that the individual would prefer to stay with the car but to change the time of day they travel so as to avoid traffic congestion and conserve fuel. If the departure-time choice model is not included in the analysis, then experience shows that the modal-choice model tends to force a substitution between modes which in reality is a substitution between travel at different times of day by car.

    Armed with a specific problem or a series of associated questions, the analyst now recognizes that to study choices we need a set of choice situations (or outcomes), a set of alternatives, and a set of attributes that belong to each alternative. But how do we take this information and convert it to a useful framework within which we can study the choice behavior of individuals? To do this, we need to set up a number of behavioral rules under which we believe it is reasonable to represent the process by which an individual considers a set of alternatives and makes a choice. This framework needs to be sufficiently realistic to explain past choices and to give confidence in likely behavioral responses in the future that result in staying with an existing choice or making a new choice. The framework should also be capable of assessing the likely support for alternatives that are not currently available, be they new alternatives in the market or existing ones that are physically unavailable to some market segments.

    The following sections of this primer will introduce the main rules that are needed to start understanding the richness of methods available to study choice. We will start right from the beginning and learn to walk before we run. We will be pedantic in the interest of clarity, since what is taken for granted by the long-established choice analyst is often gobbledy-gook to the beginner. Intolerance on the part of such experts has no place in this primer.

    We have found in our courses that the best way to understand the underlying constructs that are the armory of choice analysis is to select one specific choice problem and follow it through from the beginning to the end. We will do this here. Selecting such a single case study is always fraught with problems since, no matter what we select, it will not be ideal for every reader. To try and offer a smorgasbord of case studies would (in our view) defeat the purpose of this primer. While readers will come from different disciplinary backgrounds such as economics, geography, environmental science, marketing, health science, statistics, engineering, transportation, logistics, and so forth, and will be practicing in these and other fields, the tools introduced through one case study are universally relevant.

    A reader who insists that this is not so is at a disadvantage; she is committing the sin of assuming uniqueness in behavioral decision making and choice response. Indeed, the great virtue of the methods developed under the rubric of choice analysis is their universal relevance. Their portability is amazing. Disciplinary boundaries and biases are a threat to this strength. While it is true that specific disciplines have a lot to offer to the literature on choice analysis, we see these offerings as contributions to the bigger multi-disciplinary effort.

    The case study focuses on a transport choice – the choice between a number of public and private modes of transport for travel within an urban area. The data were collected in 1994 as part of a larger study that resulted in the development of an environmental impact simulator to assess the impact of policy instruments on the demand for travel. We have selected this specific context because we have a real data set (provided on the primer website) that has all of the properties we need to be able to illustrate the following features of choice analysis:

    1. There are more than two alternatives (in particular, car drive alone, car ride share, train, and bus). This is important because a choice situation involving more than two alternatives introduces a number of important behavioral conditions that do not exist when studying a binary choice.

    2. It is possible to view the set of alternatives as more than one choice (e.g. choosing between public and private modes, choosing among the private modes, and choosing among the public modes). This will be important later to show how to set up a choice problem with more than one (inter-related) choice decision.

    3. Two types of choice data have emerged as the primary sources of choice response. These are known as revealed preference (RP) and stated preference (SP) data. RP data refer to situations where the choice is actually made in real market situations; in contrast, SP data refer to situations where a choice is made by considering hypothetical situations (which are typically the same alternatives in the RP data set, but are described by different levels of the same attributes to those observed in actual markets, as well as additional attributes not in the data collected from actual markets). SP data are especially useful when considering the choice among existing and new alternatives, since the latter are not observed in RP data. The case study data have both RP and SP choice data, with the SP choice set comprising the exact same four alternatives in the RP data set plus two new alternatives – light rail and a dedicated busway system.

    4. Often in choice modeling we over- and under-sample individuals observed to choose specific alternatives. This is common where particular alternatives are dominant or popular (in this data it is use of the car compared to public transport). The case study data have over-sampled existing choosers of bus and train and under-sampled car users. In establishing the relative importance of the attributes influencing the choice among the alternatives, we would want to correct for this over- and under-sampling strategy by weighting the data to ensure reproduction of the population choice shares. These weighted choice shares are more useful than the sample choice shares.

    5. The data have a large number of attributes describing each alternative and characteristics describing the socio-economic profile of each sampled trip maker (e.g. personal income, age, car ownership status, occupation). This gives the analyst plenty of scope to explore the contributions of attributes of alternatives and characteristics of individuals to explaining choice behavior.

    6. The alternatives are well-defined modes of transport that are described by labels such as bus, train, car drive alone, and car ride share. A data set with labeled alternatives is preferred over one where the alternatives are not well defined in terms of a label, such as abstract alternatives that are only defined by combinations of attributes. Labeled alternatives enable us to study the important role of alternative-specific constants.

    7. Finally, most analysts have had personal experience in choosing a mode of transport for the journey to work. Thus the application should be very familiar.

    The following chapters set out the process of choice analysis in a logical sequence consistent with what researchers and practitioners tend to do as they design their study and collect all the necessary inputs to undertake data collection, analysis, and reporting. We begin with a discussion of what we are seeking to understand in a study of choice; namely, the role of an individual's preferences and the constraints that limit the ability to choose alternatives that are the most preferred in an unconstrained setting. Having established the central role of preferences and constraints, we are ready to formalize a framework within which a set of behavioral rules can be introduced to assist the analyst in accommodating these individual preferences, recognizing that the analyst does not have as much information about these individual preferences as the individual decision maker being studied. The behavioral rules are used to develop a formal model of choice in which we introduce the sources of individual preferences (i.e. attributes), constraints on such preferences (i.e. characteristics of individuals, peer influences, and other contextual influences), and the available set of alternatives to choose from. This is where we introduce choice models such as multinomial logit and nested logit.


    With a choice-modeling framework set out, we are ready to design the data stage. Important issues discussed are survey design and administration, data paradigms, data collection strategies, and data preparation for model estimation. Many analysts have difficulties in preparing their data in a format suitable for model estimation. Although there are a number of software options available for model estimation, we have selected NLOGIT for two reasons – it is the most popular software package for choice model estimation and it is the package that the authors have greatest expertise in using (William Greene and David Hensher are the developers of NLOGIT). We set out, step by step, what the analyst must do to run a simple choice model and then introduce more advanced features. The results are interpreted in a way that ensures that the main outputs are all considered and reported as appropriate. Estimating models is only one critical element of the choice-modeling process. The findings must be used in various ways such as forecasting, scenario analysis, valuation (willingness to pay, or WTP), and understanding of the role of particular attributes and characteristics. We discuss the most common ways of applying the results of choice models, such as simulating the impact of changes in levels of attributes, deriving marginal rates of substitution (or values) of one attribute relative to another (especially if one attribute is measured in monetary units), and constructing empirical distributions of specific attributes and ratios of attributes. Throughout the book we add numerous hints under the heading of "As an aside". This format was chosen as a way of preserving the flow of the argument but placing useful tips where they would best be appreciated. Before we can delve into the foundations of choice analysis, we need to set out the essential statistical concepts and language that readers new to statistics (or very rusty) will need to make the rest of the book easier to read.

    2 Basic notions of statistics

    If scientific reasoning were limited to the logical processes of arithmetic, we should not get very far in our understanding of the physical world. One might as well attempt to grasp the game of poker entirely by the use of the mathematics of probability. (Vannevar Bush, 1890–1974)

    2.1 Introduction

    This chapter is intended to act as a review of the basic statistical concepts, knowledge of which is required for the reader to fully appreciate the chapters that follow. It is not designed to act as a substitute for a good grounding in basic statistics, but rather as a summary of knowledge that the reader should already possess. For the less confident statistician, we recommend obtaining and reading other books on the subject alongside this and subsequent chapters. In particular, for the completely statistically challenged, we recommend Statistics without Tears: A Primer for Non-Mathematicians (Rowntree 1991). More confident readers may find books such as those by Howell (1999) and Gujarati (1999, chapters 2–5) to be of particular use.

    2.2 Data

    Data are fundamental to the analysis and modeling of real world phenomena such as consumer and organizational behavior. Understanding data is therefore critical to any study application, and nowhere more than to studies involving discrete choice analysis. The data sets which we use, whether collected by ourselves or by others, will invariably be made up of numerous observations on multiple variables (a variable being an object that can take on many different values). Only through understanding the qualities possessed by each variable will the analyst be capable of deriving the most benefit from their data.

    We may define variables on a number of different dimensions. Firstly, variables may be qualitative or quantitative. A qualitative variable is one in which the true or naturally occurring levels or categories taken by that variable are not described as numbers but rather by verbal groupings (e.g. the levels or categories that hair color may take might include red, blond, brown, black). For such variables, comparisons are based solely on the qualities possessed by that particular variable. Quantitative variables, on the other hand, are those in which the natural levels take on certain quantities (e.g. price, travel time). That is, quantitative variables are measurable in some numerical unit (e.g. dollars, minutes, inches, etc.).

    A second dimension is whether the variable is continuous or non-continuous in nature. A continuous variable is one which can theoretically assume any value between the lowest and highest points on the scale on which it is being measured (e.g. speed, price, time, height). For continuous variables, it is common to use a scale of measure such as minutes to quantify the object under study; however, invariably, such scale measures are potentially infinitely divisible (e.g. one could measure time in seconds, or thousandths of seconds, and so on). As such, continuous-level data will only be an approximation of the true value taken by the object under study, with the precision of the estimate dependent upon the instrument of measure. Non-continuous variables, sometimes referred to as discrete variables, differ from continuous variables in that they may take on only a relatively few possible distinct values (e.g. male and female for gender).

    A third dimension used in describing data is that of scales of measurement. Scales of measurement describe the relationships between the characteristics of the numbers or levels assigned to objects under study. Four classificatory scales of measurement were developed by Stevens (1951) and are still in use today. These are nominal, ordinal, interval, and ratio.

    Nominal scaled data

    A nominal scaled variable is a variable in which the levels observed for that variable are assigned unique values – values which provide classification but which do not provide any indication of order. For example, we may assign the value zero to represent males and one to represent females; however, in doing so, we are not saying that females are better than males or males are better than females. The numbers used are only to categorize objects. As such, it is common to refer to such variables as categorical variables (note that nominal data must be discrete). For nominal scaled data, all mathematical operations are meaningless (i.e. addition, subtraction, division, and multiplication).

    Ordinal scaled data

    Ordinal scaled data are data in which the values assigned to levels observed for an object are (1) unique and (2) provide an indication of order. An example of this is the ranking of products in order of preference. The highest-ranked product is more preferred than the second-highest-ranked product, which in turn is more preferred than the third-ranked product, etc. While we may now place the objects of measure in some order, we cannot determine distances between the objects. For example, we might know that product A is preferred to product B; however, we do not know by how much product A is preferred to product B. Thus, addition, subtraction, division, and multiplication are meaningless in terms of ordinal scales.


    Interval scaled data

    Interval scaled data are data in which the levels of an object under study are assigned values which are (1) unique, (2) provide an indication of order, and (3) have an equal distance between scale points. The usual example is temperature (Centigrade or Fahrenheit). In either scale, 41 degrees is higher than 40 degrees, and the increase in heat required to go from 40 degrees to 41 degrees is the same as the amount of heat to go from 20 degrees to 21 degrees. However, zero degrees is an arbitrary figure – it does not represent an absolute absence of heat (it does, however, represent the temperature at which water freezes when using the Centigrade scale). Because of this, you may add and subtract interval scale variables meaningfully, but ratios are not meaningful (that is, 40 degrees is not strictly twice as hot as 20 degrees).

    Ratio scaled data

    Ratio scaled data are data in which the values assigned to levels of an object are (1) unique, (2) provide an indication of order, (3) have an equal distance between scale points, and (4) the zero point on the scale of measure used represents an absence of the object being observed. An example would be asking respondents how much money was spent on fast food last week. This variable has order ($1 is less than $2), has equal distances among scale points (the difference between $2 and $1 is the same as the difference between $1,000 and $999), and has an absolute zero point ($0 spent represents a lack of spending on fast food items). Thus, we can add, subtract, and divide such variables meaningfully ($1 is exactly half of $2).
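    The interval-versus-ratio distinction can be made concrete with a short numeric sketch (our illustration, not from the book): converting Celsius to Kelvin shifts the arbitrary zero to a true zero, which preserves differences but changes ratios.

```python
# A sketch of why ratios are meaningless on an interval scale:
# Celsius has an arbitrary zero, Kelvin a true zero.

def c_to_k(celsius):
    # Shift of origin: Kelvin = Celsius + 273.15
    return celsius + 273.15

hot, mild = 40.0, 20.0

# Differences survive the change of origin (the interval property) ...
print(hot - mild, c_to_k(hot) - c_to_k(mild))      # 20.0 20.0

# ... but ratios do not: 40 degrees C is not "twice as hot" as 20 degrees C.
print(hot / mild)                                  # 2.0
print(round(c_to_k(hot) / c_to_k(mild), 2))        # 1.07
```

    The same ratio computed on the Kelvin scale gives roughly 1.07, not 2, which is why "twice as hot" is only meaningful on a ratio scale.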

    2.2.1 The importance of understanding data

    Information such as that given in the previous section can be found in any of a number of statistics books. This fact alone suggests that understanding the types of data one has is an important (perhaps the most important) element in conducting any study. It is sometimes lost on the practitioner that the type of data she has (whether she collected it herself or not) dictates the type of analysis that can be undertaken. All statistical analysis makes assumptions with regard to the data used. For example, if one has collected data on income to use as the dependent variable in a linear regression analysis, the income variable must be collected in a format that meets the data requirements of the analytical technique for which it was collected to be used. Thus, in collecting data (i.e. in writing surveys), one must always be cognizant of the types of analysis one intends to conduct, even if the analysis is not to be conducted for another six months. Statistics, like a game of chess, requires the players to always be thinking several moves ahead.

    2.3 A note on mathematical notation

    In this section we outline the mathematical notation that is used throughout the book.


    2.3.1 Summation

    The Greek capital letter sigma, Σ, indicates summation or addition. For example:

        Σ(i=1 to n) Xi = X1 + X2 + ... + Xn

    where i is an index of summation indicating that for some variable X, we take the first value of X (i = 1) and add each subsequent value taken by X up to the nth appearance of X. The subscript i is important in mathematical notation as it is used to denote a variable as opposed to a constant term. A variable, as the name suggests, is a quantity that is able to assume any set of values (i.e. has no fixed quantitative value). A constant is a quantity which is fixed at some level and as such does not vary. Constant terms are generally denoted without a subscript i (e.g. k).

    In practice, one can abbreviate the summation as follows:

        Σi Xi or simply ΣXi

    The summation operator has several useful properties:

    1. The summation of a constant term (note we drop the subscript i for constant terms):

        Σ(i=1 to n) k = k + k + ... + k = nk

    For example, where n = 3 and k = 4:

        Σ(i=1 to 3) 4 = 4 + 4 + 4 = 3 × 4 = 12

    2. The summation of a constant term multiplied by a variable:

        Σ(i=1 to n) kXi = kX1 + kX2 + ... + kXn = k(X1 + X2 + ... + Xn) = k Σ(i=1 to n) Xi

    For example, where k = 3 and X1 = 2, X2 = 3, and X3 = 4:

        Σ(i=1 to 3) 3Xi = (3 × 2) + (3 × 3) + (3 × 4) = 3(2 + 3 + 4) = 3 Σ(i=1 to 3) Xi = 27

    3. Summing two variables:

        Σ(i=1 to n) (Xi + Yi) = Σ(i=1 to n) Xi + Σ(i=1 to n) Yi

    For example, assume X1 = 2, X2 = 3, and X3 = 4 and Y1 = 4, Y2 = 3, and Y3 = 2:

        Σ(i=1 to 3) (Xi + Yi) = Σ(i=1 to 3) Xi + Σ(i=1 to 3) Yi = (2 + 3 + 4) + (4 + 3 + 2) = 18


    4. Adding and multiplying constants:

        Σ(i=1 to n) (z + kXi) = nz + k Σ(i=1 to n) Xi

    For example, assume z = 5, k = 3, and X1 = 2, X2 = 3, and X3 = 4:

        Σ(i=1 to 3) (5 + 3Xi) = 3 × 5 + 3 Σ(i=1 to 3) Xi = 3 × 5 + 3(2 + 3 + 4) = 15 + 27 = 42
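    The four properties above can be checked numerically. This sketch (our illustration, using Python's built-in sum over generator expressions) reuses the values from the worked examples: n = 3, z = 5, k = 3, X = (2, 3, 4), and Y = (4, 3, 2).

```python
# Numerical check of summation properties 1-4, with the same values
# as the worked examples above.

n, z, k = 3, 5, 3
X = [2, 3, 4]
Y = [4, 3, 2]

# Property 1: summing a constant n times gives n * k (here with k = 4)
print(sum(4 for _ in range(n)), n * 4)                      # 12 12

# Property 2: a constant multiplier can be pulled outside the sum
print(sum(k * x for x in X), k * sum(X))                    # 27 27

# Property 3: the sum of a sum is the sum of the sums
print(sum(x + y for x, y in zip(X, Y)), sum(X) + sum(Y))    # 18 18

# Property 4: combining the two rules, sum(z + k*Xi) = n*z + k*sum(Xi)
print(sum(z + k * x for x in X), n * z + k * sum(X))        # 42 42
```

    In each pair of printed values the left side evaluates the summation term by term and the right side uses the property, so the two always agree.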

    Occasionally, you may observe a double summation sign. An example of the double summation is shown below:

        Σ(i=1 to n1) Σ(j=1 to n2) X1i X2j

    The summation operator is used extensively throughout the chapters that follow. Slightly less common is the product operator, which we discuss next.

    2.3.2 Product

    Represented by the Greek capital pi, Π, the product symbol is used to denote that the analyst is to take the product of (multiply) the terms indicated:

        Π(i=1 to n) Xi

    As with the summation operand, the product operand is usually abbreviated. Common abbreviations include:

        Πi Xi or simply ΠXi

    To demonstrate the product operand, consider a situation where X1 = 2, X2 = 3, and X3 = 4:

        Π(i=1 to 3) Xi = 2 × 3 × 4 = 24
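    In code, the product operand corresponds to multiplying the terms in a loop, or to math.prod in the Python standard library. This sketch (ours) uses the same values as the worked example above.

```python
import math

X = [2, 3, 4]

# Explicit loop form of the product operand
product = 1
for x in X:
    product *= x

print(product)           # 24
print(math.prod(X))      # 24
```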

    In the next section we discuss the topic of probability, which is central to understanding discrete choice modeling.

    2.4 Probability

    Consider the possible set of outcomes of an experiment in which each potential outcome is mutually exclusive (i.e. only one outcome can be observed at any one time) and equally likely to occur. Given n possible outcomes and m sets of circumstances that are likely to result in outcome A, the probability that outcome A will occur is given as:

        P(A) = m/n                                                            (2.1)

    For example, assume that our experiment consists of the rolling of a single die (note that die is the singular, dice the plural). From a single roll of our die, the possible set of outcomes (known collectively as the population or sample space) consists of six possibilities. We will observe a one, a two, a three, a four, a five, or a six. The outcomes are mutually exclusive in that we cannot observe a one and a two at the same time, and each potential outcome, assuming the die has not been tampered with, is equally likely to occur. Assume that our desired outcome, A, is to observe a roll of three. Only one possible circumstance, m, out of six possible outcomes (i.e. n = 6) will result in our desired outcome. Hence, from (2.1), the probability that we will observe a three in a single roll of our die is given as

        P(A) = 1/6
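    Equation (2.1) can be applied directly in code. This sketch (our illustration) enumerates the sample space of a single die and counts the circumstances favourable to outcome A, using the standard-library Fraction type to keep the result exact.

```python
# Classical probability P(A) = m / n for the single-die example.
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]                  # n = 6 equally likely outcomes
favourable = [s for s in sample_space if s == 3]   # m = 1 way of rolling a three

p_a = Fraction(len(favourable), len(sample_space))
print(p_a)        # 1/6
```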

    In the above example, the nature of the experiment was such that the probability of any outcome may be known a priori (i.e. in advance). But what if the universal set of outcomes is not finite, or the possibility of any two outcomes occurring is not equal? In order to understand the probabilities associated with such problems, one must consider relative frequencies.

    2.4.1 Relative frequencies

    Consider table 2.1. In table 2.1 we show the observed frequency counts for various traveltimes to work. These observed frequency counts, sometimes referred to as absolute fre-quency counts, are given in the second column of the table. The relative frequencies, given

    Table 2.1. Calculating the relative frequencies for travel to work

    Travel time to work (minutes)   Absolute frequency observed   Relative frequency
     0 ≤ X ≤  5                      8                            0.07 (8/115)
     5 < X ≤ 10                     15                            0.13 (15/115)
    10 < X ≤ 15                     17                            0.15 (17/115)
    15 < X ≤ 20                     24                            0.21 (24/115)
    20 < X ≤ 25                     19                            0.17 (19/115)
    25 < X ≤ 30                     15                            0.13 (15/115)
    30 < X ≤ 35                     10                            0.09 (10/115)
    35 < X ≤ 40                      7                            0.06 (7/115)
    Total                          115                            1

  • 14 Applied Choice Analysis

    in the third column of the table, are calculated as the ratio of the absolute frequency counts over the sum of the absolute frequency counts. For example, the relative frequency for those traveling to work taking between 20 and 25 minutes is 0.17 (i.e. 19/115). Given that we do not know a priori the probability of a given outcome occurring (i.e. that we will observe an individual with a travel time to work of 21–25 minutes), we must rely on the relative frequencies to establish the probabilities for us. That is, we will use the relative frequencies as the probabilities themselves. Note that this requires knowledge of the absolute frequency counts, which we cannot know in advance with certainty (we can guess or estimate them in advance, however). Thus, from table 2.1, we can state that the probability that we will observe a randomly chosen individual with a travel time to work of between 20 and 25 minutes is 0.17.
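    The relative frequency arithmetic in table 2.1 is easily reproduced. The sketch below (in Python; the counts are those of table 2.1, the variable names are our own) computes the third column from the second:

```python
# Absolute frequency counts from table 2.1 (5-minute travel-time bands)
counts = [8, 15, 17, 24, 19, 15, 10, 7]
total = sum(counts)  # 115 observations in all

# Relative frequency = absolute frequency / total count
rel_freq = [c / total for c in counts]

# Probability of a 20-25 minute trip is the fifth band's relative frequency
print(round(rel_freq[4], 2))   # 0.17, i.e. 19/115
```

    Note that the relative frequencies necessarily sum to one, just as the probabilities over an exhaustive set of outcomes must.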

    We may use the relative frequencies of outcomes of an event as probabilities provided the sample size of absolute frequencies is sufficiently large. The law of large numbers suggests that a sample size of 30 may be sufficiently large; however, for some types of problems, much larger samples may be required.

    In the next sections, we discuss distribution functions. Before we do so, however, we need to formally define random variables.

    2.4.2 Defining random variables

    The concept of random variables is central to the understanding of probability. A random variable, sometimes referred to as a stochastic variable, is a variable in which the value the variable is observed to take is determined as part of an experiment. That is to say, we do not know the exact value the variable will be observed to take with any certainty a priori. The two preceding illustrations provide examples of random variables. In the case of rolling a single die, the exact outcome of each roll cannot be known before the roll takes place (though the probabilities for each outcome can be calculated with certainty). In the second example, while the probability that we will observe an individual with a specific travel-to-work time may be calculated, we cannot say for certain that the next randomly observed individual will have a specific travel time to work value.

    When dealing with random variables, it is necessary to study the distributions of outcomes that may be observed. In the next two sections we discuss two very important concepts related to the distribution functions of random variables: the probability distribution function and the cumulative distribution function.

    2.4.3 Probability distribution functions

    Consider the roll of a single die. The probability that the value observed will be a one is the same as the probability that the value observed will be a three or a six. In each case, the probability that any particular value will be observed is exactly 1/6. Now consider the rolling of two dice. The sum of the values of the two dice must lie between two and twelve; however, the probability that a particular value will be observed is not necessarily equal to the probability of any of the other values being observed. Why?


    Table 2.2. The PDF of a discrete random variable

    Number observed (X)   Possible (die1, die2) outcomes        Number of possible outcomes   f(X)
     2                    (1,1)                                  1                            1/36
     3                    (1,2) (2,1)                            2                            1/18
     4                    (1,3) (3,1) (2,2)                      3                            1/12
     5                    (1,4) (2,3) (3,2) (4,1)                4                            1/9
     6                    (1,5) (2,4) (3,3) (4,2) (5,1)          5                            5/36
     7                    (1,6) (2,5) (3,4) (4,3) (5,2) (6,1)    6                            1/6
     8                    (2,6) (3,5) (4,4) (5,3) (6,2)          5                            5/36
     9                    (3,6) (4,5) (5,4) (6,3)                4                            1/9
    10                    (4,6) (5,5) (6,4)                      3                            1/12
    11                    (5,6) (6,5)                            2                            1/18
    12                    (6,6)                                  1                            1/36
    Sum:                                                        36                            1

    In table 2.2, we show all possible outcomes that may be observed in the rolling of our two dice. Only one combination of circumstance will result in the sum of our two dice equalling two. That is, we will only ever observe the sum of our two dice equalling two if the face values of both of our dice happen to equal one. Two possible outcomes, however, will result in the sum of our dice equalling three: the first die equals one and the second equals two, or the first die equals two and the second die equals one. Similarly, there are six possible outcomes that will result in the sum of our dice equalling seven. Given that six different independent outcomes will result in the sum of our two dice equalling seven, but only one independent outcome will result in an observed value of two, it stands to reason that the probability of observing the sum of our two dice equalling seven is greater than the probability of observing our two dice equalling two.

    From table 2.2, it can be seen that there are 36 independent possible outcomes that we might witness. The probability that any one outcome will be observed is therefore 1/36; however, the probability that we will observe a particular outcome in terms of summing the face values of our two dice will generally be much larger. We can calculate the probability of observing a particular value for our two dice by dividing the number of outcomes that will result in that value being observed (given in the second-to-last column) by the total number of possible outcomes (which in this case is 36). The resulting values, given in the last column of table 2.2, represent the probability that for any one roll of our two dice, the sum of the values observed will equal a particular value.

    The last column, labeled f(X), is known as the probability density function (PDF). The PDF of a (discrete random) variable (in this case the roll of two dice) represents the probability distribution over the various values that the (discrete random) variable might take. As the 36 possible outcomes shown in table 2.2 represent the exhaustive set of possible outcomes, the probabilities will sum to one.
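    The PDF in table 2.2 can be checked by enumerating all 36 outcomes. A minimal sketch in Python (our own construction, not from the text):

```python
from collections import Counter
from fractions import Fraction

# All 36 equally likely (die1, die2) outcomes
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

# Count how many outcomes produce each sum, then divide by 36 for f(X)
tally = Counter(d1 + d2 for d1, d2 in outcomes)
pdf = {x: Fraction(n, 36) for x, n in tally.items()}

print(pdf[7])             # 1/6  -- six outcomes sum to seven
print(pdf[2])             # 1/36 -- only (1, 1) sums to two
print(sum(pdf.values()))  # 1    -- the PDF sums to one
```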


    [Figure: the probability density f(X) plotted against travel time to work, X, in minutes; the shaded area under the curve between 30 and 40 minutes gives the probability of X being between 30 and 40 minutes.]

    Figure 2.1 The PDF of a continuous random variable

    Mathematically, we may represent the PDF of a discrete random variable as (2.2).

    f(X) = P(X = xi)   for i = 1, 2, . . . , n
         = 0           for X ≠ xi                    (2.2)

    In words, (2.2) states that the probability that a discrete random variable takes the value xi is given by the PDF, and that this probability is equal to zero for X not equal to xi.

    The PDF of a continuous random variable, though similar to that of a discrete random variable, cannot be treated the same. Consider a continuous random variable such as travel time to work, represented as X. Travel time to work is a continuous random variable which, when measured in (say) minutes, may take on any value within a range. The probability that we will observe an exact event for a continuous random variable is said to be zero. For example, the probability that it will take exactly 34.00000 minutes to get to work tomorrow is zero. Rather than establish the probability that a specific event will be observed to occur, when dealing with a continuous random variable, it is necessary to establish the probability within a range of events rather than for a specific event. Hence, when dealing with continuous random variables such as travel time to work, we say things such as the probability that it will take between 30 and 40 minutes may be non-zero. Note that the probability may still be zero (you may work at home), but there may also be a non-zero probability attached to observing an outcome within the specified range. Figure 2.1 shows the probability for observing a travel time to work between 30 and 40 minutes.

    2.4.4 Cumulative distribution functions

    Related to the PDF of a random variable is the cumulative distribution function (CDF). The CDF, denoted as F(X), may be mathematically defined as in (2.3).

    F(X) = P(X ≤ x)    (2.3)

    In words, (2.3) states that the CDF is equal to the probability that a random variable, X, is observed to take on a value less than or equal to some known value, x. To demonstrate the CDF of a random variable, reconsider our earlier example of rolling two dice. The PDF of


    Table 2.3. PDF and CDF for a discrete random variable

                           PDF                            CDF
    Number observed (X)    Value observed for X   f(X)    Value observed for X   F(X)
     2                      2 ≤ X <  3            1/36    X ≤  2                 1/36
     3                      3 ≤ X <  4            1/18    X ≤  3                 1/12
     4                      4 ≤ X <  5            1/12    X ≤  4                 1/6
     5                      5 ≤ X <  6            1/9     X ≤  5                 5/18
     6                      6 ≤ X <  7            5/36    X ≤  6                 5/12
     7                      7 ≤ X <  8            1/6     X ≤  7                 7/12
     8                      8 ≤ X <  9            5/36    X ≤  8                 13/18
     9                      9 ≤ X < 10            1/9     X ≤  9                 5/6
    10                     10 ≤ X < 11            1/12    X ≤ 10                 11/12
    11                     11 ≤ X < 12            1/18    X ≤ 11                 35/36
    12                     12 ≤ X                 1/36    X ≤ 12                 1

    observing any outcome is as before. The CDF is calculated as the sum of the PDF values of X less than or equal to a given value of x. Mathematically,

    F(x) = Σ f(X)  over all X ≤ x    (2.4)

    where Σ f(X) over all X ≤ x denotes the sum of the PDF values of X less than or equal to some specified value, x. We show this relationship between the PDF and the CDF in table 2.3. For example, in table 2.3, the probability that X takes a value less than four (i.e. x = 4) is 1/12 (i.e. 1/36 + 1/18). Similarly, the probability that X takes a value less than nine (i.e. x = 9) is 13/18 (i.e. 1/36 + 1/18 + 1/12 + 1/9 + 5/36 + 1/6 + 5/36).

    The information in table 2.3 pertains to a discrete random variable. As X, the sum of the values obtained from the roll of two dice, is discrete, the CDF is discontinuous in form and the CDF is drawn as a step function, as shown in figure 2.2. This is because X can only take on a discrete indivisible value (i.e. we cannot observe X = 4.2). For continuous random variables, however, one may observe fractions of events; for example, we may observe a travel time to work of 34.23697 minutes. The CDF of a continuous random variable is therefore continuous, as shown in figure 2.3.

    2.4.5 Multivariate probability density functions

    All the examples we have considered above have related to a single random variable. The PDF of table 2.2 and the CDF of table 2.3 are known as univariate distribution functions. In this section we consider the case of experiments where the observed outcomes are dependent on two or more random variables. The probability distributions of such experiments are known as multivariate probability functions. In the examples that follow, we discuss the case of bivariate (or two-variable) problems.


    [Figure: a step-function plot of cumulative probability against the summed face value of the two dice.]

    Figure 2.2 The CDF of a discrete random variable

    [Figure: a smooth curve of cumulative probability (0 to 1) against travel time to work, 5 to 55 minutes.]

    Figure 2.3 The CDF of a continuous random variable

    2.4.6 The multivariate probability function

    Consider the example of travel times to and from work. Given different traffic densities over a day, the amount of time spent traveling to work may not necessarily be the same as the amount of time spent traveling home from work. The total time spent traveling to and from work will be a function of the two separate journeys. Table 2.4 shows the frequency counts for home–work and work–home travel times observed for 115 individuals.

    As with the previous travel time to work example, the probability of observing a specific set of events (e.g. a work–home trip of 11–15 minutes and a home–work trip of 6–10 minutes) is given as the relative frequency for that event. Table 2.5 shows the relative frequencies for each cell of table 2.4. Each relative frequency is calculated as the ratio of the cell's absolute frequency over the total population of events (i.e. 115). Table 2.5 represents the

    Table 2.4. Frequency counts for home–work and work–home trips (a cross-tabulation of travel time to work against travel time from work, in minutes)

    | Prob [ chi squared > value ] = .00000 |

    | Response data are given as ind. choice. |

    | Number of obs.= 2369, skipped 0 bad obs. |

    +-----------------------------------------------+

    +----------+-------------+----------------+--------+---------+

    | Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |

    +----------+-------------+----------------+--------+---------+

    ASCCART -.06565401 .07603514 -.863 .3879

    CST -.18276235 .02172310 -8.413 .0000

    ASCCARNT .39256214 .07707012 5.094 .0000

    ASCBUS .02150109 .09910716 .217 .8282

    ASCTN -.04096956 .09358895 -.438 .6616

    ASCBUSW .32598510 .15223518 2.141 .0322

    BUSWFARE -.26940378 .04163606 -6.470 .0000

    Ignoring the model fit for the present, the linear effect fare attribute is statistically significant and of the correct sign (i.e. higher busway fares will result in lower utility levels for the busway alternative as well as a lower probability of that alternative being selected, all else being equal). The utility function for the busway alternative specified with a linear effect busway fare attribute is given below:

    Vbusway = 0.3259850999 − 0.269403771 × Fare

    Substituting busway fare values of $1, $3, and $5 into the above equation suggests utility levels of 0.06, −0.48, and −1.02, respectively, ceteris paribus. We plot these values in figure 10.10 along with those derived from the non-linear specification.

    From figure 10.10, the estimated utilities derived from the linear specification of the busway fare almost perfectly match those shown from the linear trend line fitted from the utility estimates derived from the non-linear estimates. While this need not necessarily be the case, it does provide substantive weight to the argument of using a linear effect rather than non-linear effects for the busway fare attribute in the above example.
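    The three utility levels quoted above follow directly from the estimated utility function. A quick check in Python (coefficients taken from the NLOGIT output above):

```python
# Busway parameter estimates from the linear-effects model output above
ASC_BUSWAY = 0.32598510   # ASCBUSW
BETA_FARE = -0.26940378   # BUSWFARE

def v_busway(fare):
    """Utility of the busway alternative at a given fare, ceteris paribus."""
    return ASC_BUSWAY + BETA_FARE * fare

for fare in (1, 3, 5):
    print(fare, round(v_busway(fare), 2))
# $1 -> 0.06, $3 -> -0.48, $5 -> -1.02
```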


    Above, we have relied on an example which has required only two parameter estimates for a single non-linear coded variable. What if a non-linear coded variable requires more than two associated parameter estimates? In such cases, the analyst is required to perform multiple Wald-tests for linear restrictions simultaneously. That is, the analyst is required to test whether each slope associated with the non-linear coded variable is equal to every other slope. In general, the command syntax used in NLOGIT is of the following form:

    ;Wald: <restriction 1>, <restriction 2>, . . . , <restriction n>

    By separating each linear restriction with a comma (,), the analyst is able to conduct several Wald-tests simultaneously. For example, assume an attribute has four attribute levels which are effects coded into three separate variables. A model estimated using these effects coded variables will produce three parameter estimates (assuming that they are not specified as generic parameters). Assuming that these parameters are the seventh, eighth, and ninth parameters estimated within the model, the NLOGIT command syntax for the Wald-test of linear restrictions would look as shown below:

    ;Wald: b(7) - b(8) = 0, b(7)-b(9)=0, b(8)-b(9)=0

    The total number of linear restrictions required for the test may be calculated using (10.23):

    Number of linear combinations = n(n − 1)/2    (10.23)

    where n is the number of parameters estimated as part of the non-linear coding system. For the preceding example, using (10.23) we obtain the following number of combinations:

    Number of linear combinations = 3(3 − 1)/2 = 3

    Thus, the above hypothetical variable, specified as a non-linear effect using effects (or dummy) coding, will require three linear restrictions to be tested simultaneously.
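    The bookkeeping behind (10.23) can be automated. Below is a hypothetical Python helper (our own, not part of NLOGIT) that lists every pairwise equality restriction among a set of parameter positions and writes out the corresponding ;Wald: clause:

```python
from itertools import combinations

def wald_clause(param_positions):
    """Build the ';Wald:' clause testing equality of every pair of slopes.

    The number of restrictions is n(n - 1)/2, as in equation (10.23).
    """
    pairs = list(combinations(param_positions, 2))
    clause = ";Wald: " + ", ".join(f"b({i}) - b({j}) = 0" for i, j in pairs)
    return clause, len(pairs)

clause, n = wald_clause([7, 8, 9])
print(n)       # 3, since 3(3 - 1)/2 = 3
print(clause)  # ;Wald: b(7) - b(8) = 0, b(7) - b(9) = 0, b(8) - b(9) = 0
```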

    The use of the Wald-test for linear restrictions to test whether the slopes of non-linear coded variables are equal is but the first step in determining whether a variable should be specified as being either linear or non-linear. The analyst is also required to consider the impact upon the overall model fit. We advise that the analyst conduct not only the Wald-test for linear restrictions but also the log likelihood ratio-test described earlier. This test will determine whether the overall model fit is improved when a variable is treated either as a linear or non-linear effect, by examining the LL function of models estimated specifying the variable as linear and as non-linear effects.

    For the above example, we leave it to the reader to show that the specification of the busway fare attribute as a non-linear effect does not statistically improve the model beyond the model fit derived when the busway fare attribute is specified as a linear effect. Nevertheless, it is foreseeable that a situation may arise whereby the Wald-test of linear restrictions suggests that a variable is best represented as a linear effect yet the overall model is improved statistically if the variable is specified as a non-linear effect (or vice versa). In such a case, the analyst must make a choice between overall model performance (and perhaps parsimony) and the estimates derived for that single variable.

    To conclude the discussion on non-linear coded variables, consider the commonly asked question of what one should do if one or more parameter components of a variable that has been dummy or effects coded are found to be statistically insignificant. To demonstrate, consider the following NLOGIT output derived when the dummy coded fare attribute is related to the train alternative and not the busway alternative:

    +---------+--------------+----------------+--------+---------+

    |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |

    +---------+--------------+----------------+--------+---------+

    ASCCART -.08183882 .07579985 -1.080 .2803

    CST -.20090871 .02090049 -9.613 .0000

    ASCCARNT .41295828 .07713362 5.354 .0000

    ASCBUS .02315993 .09921531 .233 .8154

    ASCTN -.89160160 .15865305 -5.620 .0000

    FARE1D .78442543 .17629246 4.450 .0000

    FARE3D -.07905742 .19509511 -.405 .6853

    ASCBUSW .09300017 .08587716 1.083 .2788

    In the above example output, the fare1d parameter is statistically significant while fare3d is statistically insignificant (i.e. fare1d ≠ 0; fare3d = 0). This suggests that, ceteris paribus, a fare of $1 is statistically different to the base fare of $5 (really, the average unobserved effect for the train alternative, given that we have used dummy codes and not effects codes). The positive sign suggests that a $1 train fare is preferred to a $5 train fare. The negative but insignificant fare3d parameter suggests that a fare of $3 is statistically the same as the $5 fare (where fare5d = 0). The question is: should the analyst include fare3d in the model specification, or should the variable be removed and the model re-estimated (as is common practice when a variable is found to be insignificant)?

    Traditionally, the answer to this question is that when multiple variables are derived from a single attribute or variable due to dummy or effects coding, unless the entire system of dummy coded or effects coded variables is found to be statistically insignificant, all of the variables are included within the model specification. The reason for this is that when a variable in the system of linked variables is removed and the model re-estimated, the impact of that variable, albeit statistically insignificant, enters the model via the constant term of the utility function from which the variable was removed. As such, the removal of an insignificant component of a dummy- or effects-coded variable may be viewed as confounding it with the constant term of the associated utility function.

    Tradition aside, if the removal of one or more components of a dummy- or effects-coded variable due to statistical insignificance is observed to have little or no impact upon the constant term of the associated utility function, and given that the overall model fit is not adversely affected by their removal, the analyst should seriously consider their removal. This should particularly be the case for complex models where parsimony may be a factor in determining the final model specification.
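    To make the dummy- versus effects-coding distinction concrete, the sketch below (our own illustration, not the book's data) codes a three-level fare attribute ($1, $3, $5) into the two variables fare1d and fare3d, with $5 as the base level:

```python
def code_fare(fare, scheme="dummy"):
    """Return (fare1d, fare3d) for a 3-level fare attribute with $5 as base.

    Under dummy coding the base level is coded (0, 0), so its effect is
    absorbed into the alternative's constant; under effects coding it is
    coded (-1, -1), so the level effects sum to zero across the levels.
    """
    if fare == 1:
        return (1, 0)
    if fare == 3:
        return (0, 1)
    base = -1 if scheme == "effects" else 0
    return (base, base)  # fare == 5, the base level

print(code_fare(1))             # (1, 0)
print(code_fare(5))             # (0, 0)   -- dummy base
print(code_fare(5, "effects"))  # (-1, -1) -- effects base
```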


    10.5 Handling interactions in choice models

    The examples we have used to this point have assumed that there are no significant interaction effects present within the data. This assumption is somewhat justifiable given the well-known findings of Dawes and Corrigan (1974), who found that the vast majority of variance within linear models (recall that choice models are linear in the utility functions) can be explained by main effects attributes only. Indeed, Dawes and Corrigan found that:

    • 70–90 percent of variance may be explained by main effects
    • 5–15 percent of variance may be explained by two-way interactions
    • Higher-order interactions account for the remaining variation.

    The Dawes and Corrigan findings suggest that models incorporating all main effects andall two-way interactions will account for between 75 and 100 percent of the variation!

    It is possible that non-design variables also have significant interaction effects upon choice. Consider the case of the number of vehicles owned and the number of licensed drivers within that household. It is possible that the number of vehicles owned within a household will have a differential impact upon the choice of mode when considered in concert with the number of licensed drivers within that same household. As such, the analyst may consider estimating a model inclusive of the interaction effect between the number of vehicles owned within a household (numbvehs in the data set) and the number of licensed drivers within each household (drivlic). The following NLOGIT command syntax may be used to generate just such an interaction variable:

    CREATE

    ;vehlic=numbvehs*drivlic $

    For the carnt alternative, to estimate a model that includes the number of vehicles–number of licensed drivers interaction, the following command syntax may be employed. We have not included the number of vehicles and number of drivers with licenses as separate variables within the model specification. Unlike attributes from an experimental design, which may be manufactured such that the interaction of two or more attributes may be treated orthogonally to the main effects of the interacted attributes, the interaction of non-design variables will likely be highly collinear with the variables that were used to produce the interaction. Thus the inclusion of the separate variables along with their associated interaction is likely to induce multicollinearity within the model:

    NLOGIT

    ;lhs = choice, cset, altij

    ;Choices=cart, carnt, bus, train, busway, LR

    ;Model:

    U(cart) = asccart + cst*fuel /
    U(carnt) = asccarnt + cst*fuel + vehlic*vehlic /
    U(bus) = ascbus + cst*fare /
    U(train) = asctn + cst*fare /
    U(busway) = ascbusw + cst*fare /
    U(LR) = cst*fare $

  • The basic MNL model 353

    The above command syntax produces the following NLOGIT output:

    +----------------------------------------------+

    | Discrete choice (multinomial logit) model |

    | Maximum Likelihood Estimates |

    | Model estimated: Feb 30, 2003 at 07:45:00AM. |

    | Dependent variable Choice |

    | Weighting variable None |

    | Number of observations 2369 |

    | Iterations completed 5 |

    | Log likelihood function -3214.791 |

    | R2=1-LogL/LogL* Log-L fncn R-sqrd RsqAdj |

    | No coefficients -4244.6782 .24263 .24188 |

    | Constants only. Must be computed directly. |

    | Use NLOGIT ;...; RHS=ONE $ |

    | Chi-squared[ 2] = 1649.79774 |

    | Prob [ chi squared > value ] = .00000 |

    | Response data are given as ind. choice. |

    | Number of obs.= 2369, skipped 0 bad obs. |

    +----------------------------------------------+

    +----------+-------------+----------------+----------+----------+

    | Variable | Coefficient | Standard Error | b/St.Er. | P[|Z|>z] |

    +----------+-------------+----------------+----------+----------+

    ASCCART -.08134599 .07552487 -1.077 .2814

    CST -.20179932 .01917615 -10.523 .0000

    ASCCARNT .14618264 .11271695 1.297 .1947

    VEHLIC .14212966 .04326984 3.285 .0010

    ASCBUS .02211412 .09926283 .223 .8237

    ASCTN -.04219174 .09373865 -.450 .6526

    ASCBUSW .09484292 .08589973 1.104 .2695

    The number of vehicles–number of drivers interaction is a statistically significant contributor in explaining mode choice. From the output above, the utility function for the carnt alternative may be stated as follows:

    Vcarnt = 0.1461826366 + 0.1421296639 × vehlic
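    The utility of the carnt alternative at various vehicle–driver combinations follows directly from this function. A sketch in Python (coefficients from the output above; the combinations shown are illustrative):

```python
# carnt parameter estimates from the NLOGIT output above
ASC_CARNT = 0.14618264
BETA_VEHLIC = 0.14212966   # vehicles-owned x licensed-drivers interaction

def v_carnt(num_vehicles, num_drivers):
    """Utility of carnt given the vehicles-by-drivers interaction term."""
    return ASC_CARNT + BETA_VEHLIC * (num_vehicles * num_drivers)

for vehs, drvs in [(1, 1), (2, 1), (2, 2)]:
    print(vehs, drvs, round(v_carnt(vehs, drvs), 2))
# (1,1) -> 0.29, (2,1) -> 0.43, (2,2) -> 0.71
```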

    Table 10.4 shows the part-worth utility estimates for various combinations of number ofvehicle