Discrete Choice Analysis: Predicting Demand and Market Shares MIT, June 11-15, 2012 Case Studies Workbook




Credits

The principal authors of this edition of the case studies workbook are Gianluca Antonini, Carmine Gioia, Emma Frejinger, and Michaël Thémans, with contributions by Maya Abou Zeid, Ricardo Alvarez-Daziano, Ramachandran Balakrishna, Charisma Choudhury, Matteo Sorci and Yang Wen. There have been many other Teaching Assistants over the years who have provided significant inputs to the materials on which this workbook is based.

The development of the case studies in this workbook was initiated and supervised by Moshe Ben-Akiva for use in the MIT graduate course on Demand Modeling and in the one-week continuing education course on Discrete Choice Analysis. Michel Bierlaire, Denis Bolduc and Joan Walker participated in the development of the case studies and contributed many comments and suggestions.


Contents

I Introduction and Biogeme
  1 Introduction
  2 Biogeme
    2.1 Install Biogeme
    2.2 Invoke Biogeme under Windows
    2.3 Install Emacs
    2.4 Input Files
    2.5 Output Files
    2.6 Step-by-Step Example
    2.7 BioSim

II Case Studies
  3 Introduction to Model Building
    3.1 Practical Information
  4 Binary Logit
    4.1 Challenge Question
    4.2 Choice-Lab-Fashion Marketing Case
    4.3 Netherlands Mode Choice Case
    4.4 Airline Itinerary Case
  5 Logit
    5.1 Challenge Question
    5.2 Swissmetro Case
    5.3 Choice of Residential Telephone Services Case
    5.4 Airline Itinerary Case
  6 Specification Testing
    6.1 Swissmetro Case
    6.2 Choice of Residential Telephone Services Case
    6.3 Airline Itinerary Case
  7 Forecasting
    7.1 Guidelines
    7.2 Swissmetro Case
    7.3 Choice of Residential Telephone Services Case
    7.4 Airline Itinerary Case
  8 Multivariate (Generalized) Extreme Value Models
    8.1 Challenge Question
    8.2 Swissmetro Case
    8.3 Choice of Residential Telephone Services Case
  9 Mixtures of Logit and GEV Models
    9.1 Challenge Question
    9.2 Swissmetro Case
  10 Simultaneous RP/SP Estimation
    10.1 Model Specification with RP Data
    10.2 Model Specification with SP Data
    10.3 Model Specification with Combined RP-SP Data

A Datasets
    A.1 Choice-Lab-Fashion Marketing Case
    A.2 Netherlands Mode Choice Case
    A.3 Swissmetro Case
    A.4 Choice of Residential Telephone Services Case
    A.5 Airline Itinerary Case
    A.6 Facial Expressions Recognition Case
    A.7 Italy Mode Choice Case

List of Tables

1.1 Datasets and applications
1.2 Datasets and applications
4.1 BL Challenge: Netherlands results
4.2 BL: Choice lab marketing case estimation results
4.3 BL: Choice lab marketing case estimation results
4.4 BL: Netherlands mode choice case estimation results
4.5 BL: Netherlands mode choice case estimation results
4.6 BL: Netherlands mode choice case estimation results
4.7 BL: Airline itinerary case estimation results
4.8 BL: Airline itinerary case estimation results
4.9 BL: Airline itinerary case estimation results
5.1 Logit model Challenge: Italy mode choice, Logit model estimation results
5.2 Logit model: Swissmetro estimation results
5.3 Logit model: Swissmetro estimation results
5.4 Logit model: Swissmetro estimation results
5.5 Logit model: Telephone services case estimation results
5.6 Logit model: Telephone services case estimation results
5.7 Logit model: Telephone services case estimation results
5.8 Logit model: Airline itinerary case estimation results
5.9 Logit model: Airline itinerary case estimation results
5.10 Logit model: Airline itinerary case estimation results
6.1 Specification Testing: Swissmetro market segmentation test
6.2 Specification Testing: Swissmetro IIA test
6.3 Specification Testing: Swissmetro models for Cox test
6.4 Specification Testing: Swissmetro M1 estimation results
6.5 Specification Testing: Swissmetro M2 estimation results
6.6 Specification Testing: Swissmetro MC estimation results
6.7 Specification Testing: Swissmetro piecewise linear model
6.8 Specification Testing: Swissmetro power series model
6.9 Specification Testing: Swissmetro Box-Cox transformed model
6.10 Specification Testing: Telephone market segmentation test
6.11 Specification Testing: Telephone IIA test
6.12 Specification Testing: Telephone non-nested test
6.13 Specification Testing: Telephone piecewise linear model
6.14 Specification Testing: Telephone power series model
6.15 Specification Testing: Telephone Box-Cox transformed model
6.16 Specification Testing: Swissmetro market segmentation test
6.17 Specification Testing: Airline Itinerary IIA test
6.18 Specification Testing: Airline itinerary models for Cox test
6.19 Specification Testing: Airline itinerary M1 estimation results
6.20 Specification Testing: Airline itinerary M2 estimation results
6.21 Specification Testing: Airline itinerary MC estimation results
6.22 Specification Testing: Airline itinerary piecewise linear model
6.23 Specification Testing: Airline itinerary power series model
6.24 Specification Testing: Airline itinerary Box-Cox transformed model
7.1 Forecasting: Swissmetro fuel cost policy
7.2 Forecasting: Telephone new cost policy
7.3 Forecasting: Airline itinerary fuel cost policy
8.1 MEV Challenge: Swissmetro NL estimation results
8.2 MEV: Swissmetro NL estimation results
8.3 MEV: Swissmetro CNL estimation results
8.4 MEV: Swissmetro CNL estimation unknown α
8.5 MEV: Telephone NL estimation results
8.6 MEV: Telephone NL estimation results
8.7 MEV: Telephone CNL estimation results
8.8 MEV: Telephone CNL estimation with unknown α
9.1 Mixtures Challenge: Airline itinerary case
9.2 Mixtures: Swissmetro alternative specific variance specification
9.3 Mixtures: Swissmetro error component specification
9.4 Mixtures: Swissmetro error component specification
9.5 Mixtures: Swissmetro random coefficient specification
9.6 Mixtures: Swissmetro mixture of nested Logit estimation
9.7 Mixtures: Swissmetro panel data specification
10.1 RP-SP: BL with RP data estimation results
10.2 RP-SP: BL with SP data estimation results
10.3 RP-SP: BL with RP-SP data estimation results
A.1 Choice-Lab Marketing Case: Description of variables
A.2 Choice-Lab Marketing Case: Descriptive statistics
A.3 Netherlands Mode Choice Case: Description of variables
A.4 Netherlands Mode Choice Case: Description of variables
A.5 Netherlands Mode Choice Case: Description of variables
A.6 Netherlands Mode Choice Case: Descriptive statistics
A.7 Swissmetro Case: Description of variables
A.8 Swissmetro Case: Description of variables
A.9 Swissmetro Case: Descriptive statistics
A.10 Swissmetro Case: Cantons
A.11 Telephone Services Case: Service options
A.12 Telephone Services Case: Description of variables
A.13 Telephone Services Case: Descriptive statistics
A.14 The choice of airline itinerary: Description of Variables
A.15 The choice of airline itinerary: Description of Variables
A.16 The choice of airline itinerary: Description of Variables
A.17 The choice of airline itinerary: description of Variables
A.18 The choice of airline itinerary: descriptive Statistics
A.19 The choice of airline itinerary: descriptive Statistics
A.20 Facial Expressions Case: Description of Variables
A.21 Facial Expressions Case: Description of Variables
A.22 Facial Expressions Case: Descriptive Statistics
A.23 Facial Expressions Case: Logit Model Results
A.24 Italy Mode Choice Case: Description of variables
A.25 Italy Mode Choice Case: Descriptive statistics
A.26 Italy Mode Choice Case: RP Logit Model Results
A.27 Italy Mode Choice Case: SP Logit Model Results
A.28 Italy Mode Choice Case: RP/SP Logit Model Results
A.29 Italy Mode Choice Case: RP NL Results
A.30 Italy Mode Choice Case: RP/SP NL Results
A.31 Italy Mode Choice Case: SP Logit Model with Agent Effect Results
A.32 Italy Mode Choice Case: RP/SP Logit Model with Agent Effect Results
A.33 Italy Mode Choice Case: RP/SP NL with Agent Effect Results

List of Figures

2.1 Biogeme: DOS example
2.2 Biogeme: DOS example
2.3 Biogeme: DOS example
2.4 Biogeme: Example of data file
2.5 Biogeme: Example of model file
2.6 Biogeme: Example of DOS commands
4.1 BL: Marketing case Biogeme snapshot
5.1 Logit model Challenge: Italy mode choice logit model specification
6.1 IIA test: Biogeme snapshot IITest section
6.2 Specification Testing: Swissmetro Biogeme snapshot
6.3 Specification Testing: Swissmetro Biogeme snapshot
6.4 IIA test: Biogeme snapshot IITest section
6.5 Specification Testing: Telephone Biogeme snapshot
6.6 Specification Testing: Telephone Biogeme snapshot
6.7 IIA test: Biogeme snapshot IITest section
6.8 Specification Testing: Airline itinerary Biogeme snapshot
6.9 Specification Testing: Airline itinerary Biogeme snapshot
7.1 Forecasting: Swissmetro market shares
7.2 Forecasting: Telephone market shares
7.3 Forecasting: Market Shares for Non-stop Itinerary
8.1 MEV: Swissmetro NL correlation structure
8.2 MEV Challenge: Swissmetro NL correlation structure
8.3 MEV: Swissmetro NL Biogeme snapshot
8.4 MEV: Swissmetro NL correlation structure
8.5 MEV: Swissmetro CNL correlation structure
8.6 MEV: Swissmetro CNL Biogeme snapshot
8.7 MEV: Telephone NL correlation structure
8.8 MEV: Telephone Biogeme snapshot
8.9 MEV: Telephone Biogeme snapshot
8.10 MEV: Telephone Biogeme snapshot
8.11 MEV: Telephone CNL correlations structure
8.12 MEV: Telephone Biogeme snapshot
8.13 MEV: Telephone Biogeme snapshot
9.1 Mixtures Challenge: Airline itinerary logit model specification with a random parameter
9.2 Mixtures: Biogeme snapshot alternative specific variance specification
9.3 Mixtures: Biogeme snapshot error component specification
9.4 Mixtures: Biogeme snapshot error component specification
9.5 Mixtures: Biogeme snapshot random coefficient specification
9.6 Mixtures: Biogeme snapshot Log Normal specification
9.7 Mixtures: Biogeme snapshot SB specification
A.1 The choice of airline itinerary: Survey Example
A.2 Facial Expressions Case: Primary Expressions
A.3 Facial Expressions Case: Facial Measures
A.4 Facial Expressions Case: Image Examples
A.5 Facial Expressions Case: Interpretation of Results

Part I

Introduction and Biogeme


Chapter 1

Introduction

The objective of this workbook is to offer the reader a guide to the application of discrete choice models through case studies. The workbook is addressed to both academics and practitioners, from the very beginner to the most advanced user. It presents a stepwise approach to building, estimating, and interpreting a rich variety of models with application to the fields of transportation, engineering, marketing and economics. Examples of model specifications are provided for each case study together with possible interpretations of the estimation results.

The model building process is illustrated step by step, starting with the simplest model and then adding complexity to it. These models are provided to illustrate an iterative model specification process and to inspire the reader to continue the model development process. The workbook does not substitute for the theoretical treatment of discrete choice models, and should be used as a companion to Ben-Akiva and Lerman (1985). Direct references to Ben-Akiva and Lerman (1985) are therefore provided in each case study, and the theoretical material in this document is consequently kept to a minimum.

The case studies start with simple binary logit models and continue with more complex models such as Generalized Extreme Value and mixtures of logit models. An integral part of the workbook is the treatment of forecasting, specification testing, estimation of models based on revealed preference (RP) and stated preference (SP) data, and panel data. The workbook includes the following chapters:


• Chapter 2 presents an introduction to the freeware Biogeme, which is used for the model estimations. This chapter guides the reader through the installation and use of the software. It also provides a small hands-on example on how to get started and estimate a simple model.

• Chapter 3 gives an introduction to model building and discusses some general guidelines on how to work with the case studies.

• Chapters 4 and 5 treat the binary logit model and the logit model, respectively. These chapters are very important. They represent the standard and most used models in the field of discrete choice modeling. Moreover, an extensive amount of hand holding is provided in order to familiarize the reader with Biogeme.

• Chapter 6 deals with specification testing; it includes several important topics such as the McFadden IIA test, non-linear specification tests, non-nested hypothesis tests, and market segmentation tests.

• Chapter 7 introduces forecasting techniques that are used to estimate population market shares and to test policy scenarios.

• Chapter 8 treats the specification and estimation of Multivariate (Generalized) Extreme Value models and includes the Nested Logit and Cross Nested Logit models. These models are very useful in building intuition for the understanding of the more complex techniques handled in Chapter 9.

• Chapter 9 deals with mixtures of logit models, which represent the state of the art in discrete choice modeling. This chapter includes several specifications: alternative specific variance models, error component, random coefficient, and Mixed GEV models.

• Chapter 10 treats the simultaneous estimation of models based on revealed and stated preference data.

• Appendix A contains the descriptions of the datasets.

The following five datasets have been used in the case studies:

• Netherlands Mode Choice: Data on intercity travelers’ choices between the transport modes of rail and car.

• Choice-Lab-Fashion: Data on clients of a business-to-business firm that collects and processes financial and customer data for its clients in the fashion industry. The dataset includes choices of which information products were purchased by the client over time, as well as the choice to remain as a client or drop as a client.

• Residential Telephone Choice: Data on households’ choices of local telephone services.

• Swissmetro: Data on travelers’ choices of transport mode among a proposed underground system (Swissmetro), train and car.

• Airline Itinerary: Data on travelers’ ranking of different airline itineraries.

Table 1.1 indicates the use of the datasets with respect to the different case studies.

In addition, the following two datasets are provided:

• Italy Mode Choice: Data on travelers’ choices between the transport modes rail, bus and car.

• Facial Expression Recognition: Data on people’s interpretations of facial expressions.

Table 1.2 indicates the type of models that can be specified with the different datasets.


Type of Model          Netherlands  Choice-Lab  Residential  Swissmetro  Airline
                       Mode Choice  Fashion     Telephone                Itinerary
Binary Logit           √            √           -            -           √
Logit                  -            -           √            √           √
Specification Testing  -            -           √            √           √
Forecasting            -            -           √            √           √
MEV                    -            -           √            √           -
Mixtures of Logit      -            -           -            √           -
RP/SP                  √            -           -            -           -

Table 1.1: Datasets and applications

Type of Model          Italy        Facial
                       Mode Choice  Expressions
Binary Logit           -            -
Logit                  √            √
Specification Testing  √            √
Forecasting            √            √
MEV                    √            √
Mixtures of Logit      √            √
RP/SP                  √            -

Table 1.2: Datasets and applications


Chapter 2

Biogeme

BIerlaire Optimization toolbox for GEv Model Estimation (Biogeme) is a freeware package designed for the estimation of logit models, nested logit models and more complex models in the Multivariate Extreme Value (MEV) family, as well as mixtures of these models (e.g. mixed logit). All information related to Biogeme is maintained at:

http://biogeme.epfl.ch

2.1 Install Biogeme

A graphical version is available for Windows and for Mac OS X; no installation is needed to run this version. Download Biogeme from the web page or the course USB stick and save the file to a directory of your choice, then double-click the winBiogeme.exe file to start Biogeme.

In the remainder of this section, we describe how to install the command line version of Biogeme under Windows. For installation under any other platform, we refer the reader to the Biogeme home page.

1. Open a DOS window (from the Start menu, select Run; in the dialog box, type cmd and select OK).

2. In order to use Biogeme from any directory on your computer, you need to place the program in a directory that is in your “path” (environment variable). To find out which directories are in your path, type path (in the DOS window) and press the Enter key. An example is given in Figure 2.1, where there are several possible directories, for example C:\WINDOWS\system32 or C:\WINDOWS. Note that the directories are separated by a “;” character.

Figure 2.1: DOS example of choosing a path

3. Select a directory in your path, for example C:\WINDOWS.

4. Download Biogeme from the web site or copy it from the course USB stick to the chosen directory. The following files should be available: winBiogeme.exe, biogeme.exe, and biosim.exe.

5. To check whether the installation has been successful, type biogeme in the DOS window. A message displaying the version of Biogeme should then appear (this is shown in Figure 2.2).

6. Please do not forget to register with the users group, whose homepage is http://groups.yahoo.com/group/biogeme/

Here you can find answers to frequently asked questions as well as information on new versions of the software.
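The path check in steps 2 and 5 can also be scripted. The following is a minimal Python sketch, not part of Biogeme or the course material: the helper names split_path and find_program are hypothetical, and it assumes only that the path is a list of directories separated by ";" as described above.

```python
import shutil


def split_path(path_value, separator=";"):
    """Split a PATH-style string into its directories, dropping empty entries."""
    return [directory for directory in path_value.split(separator) if directory]


def find_program(name="biogeme"):
    """Return the full location of an executable found on the path, or None."""
    return shutil.which(name)


# Example with the two directories mentioned in step 2.
example_path = r"C:\WINDOWS\system32;C:\WINDOWS"
directories = split_path(example_path)
```

Calling find_program("biogeme") after step 4 returns a file location if the program was copied into a directory on the path, and None otherwise.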


Figure 2.2: Output after correctly installing Biogeme

2.2 Invoke Biogeme under Windows

Biogeme is invoked in a DOS command window or a Cygwin command window under Windows using the following statement structure:

biogeme model_file sample_file.dat

Note that the model file is given without its file extension, while the sample file is given with its extension. When typing this command, the files are assumed to be located in the current directory.
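As an illustration of the statement structure, the sketch below builds the command as a list of arguments in Python. The function name build_biogeme_command is hypothetical; the only behavior taken from the text is that the model file is passed without its extension and the sample file with it.

```python
import os


def build_biogeme_command(model_file, sample_file, program="biogeme"):
    """Return the argument list: program, model name without extension, data file."""
    model_name, _extension = os.path.splitext(model_file)
    return [program, model_name, sample_file]


command = build_biogeme_command("mymodel.mod", "sample.dat")
# The list could then be run from the directory holding both files, e.g. with
# subprocess.run(command, check=True), provided biogeme is on the path.
```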

Some useful DOS commands are listed below:

• To select a drive (e.g. C), just type C: at the prompt.

• To change to a directory (e.g. C:\Biogeme), just type cd C:\Biogeme.

• To see the content of a directory, use Windows Explorer, or type dir.

An example of DOS commands is given in Figure 2.3. The current directory in the example is first C:\Documents and Settings\Emma Frejinger. When typing the command dir, the content of this directory is displayed. In order to move to the directory My Documents, the command cd "My Documents" is used (note that the quotation marks are optional). Finally, the current directory is C:\Documents and Settings\Emma Frejinger\My Documents. In order to return to the parent directory, type cd followed by two dots (cd ..).

Figure 2.3: DOS example of commands

2.3 Install Emacs

To use Biogeme, you need a text editor. Wordpad is fine, but Emacs is recommended. Note that Notepad (footnote 1) should not be used. If you want to install Emacs (which is window driven), the procedure is the following:

1. Create a directory for Emacs, for example C:\Emacs

2. Download Emacs for Windows from the web site http://www.gnu.org/software/emacs/ or copy the file Emacs-23.2.zip from the USB stick.

3. Unzip the file into the directory.

4. In the subdirectory bin, execute addpm.exe.

5. Emacs is now available from the Windows Start menu: Start -> Programs -> Gnu Emacs -> Emacs

Footnote 1: Notepad adds characters at the end of each line that Biogeme cannot read.

2.4 Input Files

Biogeme reads the following files:

• a file containing the model specification: model_file.mod;

• a file containing the data: sample_file.dat;

• a file containing the parameters controlling the behavior of Biogeme and of its optimization algorithms: default.par.

The model and data files are essential, while the parameter file in general does not need to be edited (it is created with default values when Biogeme is invoked).

Model Specification File

You can take a look at the examples on the USB stick and read the instructions given on the website http://biogeme.epfl.ch, under the menus Biogeme and Examples, to understand the details of this file. In general, the specification of the model file is explained in each case study. Here we list some important facts for the labs.

• Variable names are case-sensitive and should be typed exactly as they appear in the list of variable names in the corresponding data file.

• Every string in the file must be ended with a blank space (even if it is followed by a parenthesis).

• Starting values, lower bounds and upper bounds for all model parameters to be estimated should be in float format (including the decimal point).

• If there is an Alternative Specific Constant (ASC) defined for each utility function, at least one of these must be fixed (typically set to zero), or absent from the model.

• 0.0 is a reasonable starting value for ASCs and other parameters β in the utility functions.

Data File

All data files needed for the labs are provided on the USB stick. Their structure is the following:

• The first row contains the list of the variables in the file (the case is important).

• Each subsequent row contains the associated data, one row for each observation.

• No missing value is allowed and all rows must have exactly the same number of entries. If a value is missing, a meaningless value must be written (e.g. 99999.9).

• Typical information for a given observation is:

– the observed choice;

– the description of the choice set through attributes describing the availability of each alternative;

– the attributes of each alternative; and

– the socio-economic characteristics of the decision-maker.
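The structural rules above lend themselves to a quick check before a data file is handed to Biogeme. The following Python sketch is only an illustration: check_data_file is a hypothetical helper, and it assumes whitespace-separated columns and purely numeric entries, which the text does not state explicitly.

```python
def check_data_file(text):
    """Return (variable names, list of problems) for whitespace-separated data.

    Row numbers count non-blank lines, with the header as row 1.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("the data file is empty")
    header = lines[0].split()
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        entries = line.split()
        if len(entries) != len(header):
            problems.append(
                f"row {number}: expected {len(header)} entries, found {len(entries)}"
            )
            continue
        for name, value in zip(header, entries):
            try:
                float(value)
            except ValueError:
                problems.append(f"row {number}: {name} is not numeric ({value!r})")
    return header, problems


good = "id choice rail_cost\n1 0 40\n2 1 35\n"
bad = "id choice rail_cost\n1 0\n2 1 NA\n"
good_header, good_problems = check_data_file(good)
bad_header, bad_problems = check_data_file(bad)
```

In this sketch a short row and a non-numeric entry are both reported, which mirrors the rule that a missing value must be replaced by a meaningless number such as 99999.9 rather than left out.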

Parameters File

This file is divided into different sections associated with different types of parameters. Each section contains a list of parameters and their corresponding values. The most useful parameters for standard users are defined in the section [GEV], in particular the following ones:

• gevAlgo, which allows selection of the optimization algorithm to be used for the maximum likelihood estimation;

• gevTtestThreshold, which sets the threshold for the t-test hypothesis tests on explanatory variables in the model.


This is an example of a parameter file:

[GEV]
gevAlgo="CFSQP"
//gevAlgo="SOLVOPT"
//gevAlgo="DONLP2"
//gevAlgo="BIO"
gevTtestThreshold=1.96

The remaining sections are designed for advanced users, to allow flexibility to change parameters’ default values in the different optimization algorithms.

Note that if you do not specify a parameter file, Biogeme will create a default one called default.par, where the “BIO” algorithm is selected.
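To make the layout of the parameter file concrete, here is a small Python sketch that reads a file of this shape into a dictionary. It is not Biogeme's own reader: parse_parameter_file is a hypothetical helper, and it assumes only what the example shows, namely section headers in square brackets, name=value lines, and // for comments.

```python
def parse_parameter_file(text):
    """Read [section] headers and name=value lines, skipping // comments."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("//"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            # Values are kept as strings; surrounding double quotes are dropped.
            sections[current][name.strip()] = value.strip().strip('"')
    return sections


example = """[GEV]
gevAlgo="CFSQP"
//gevAlgo="BIO"
gevTtestThreshold=1.96
"""
parameters = parse_parameter_file(example)
```

Here the commented-out line is ignored, so the active algorithm in the example is CFSQP and the t-test threshold is read as the string "1.96".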

2.5 Output Files

Biogeme automatically generates several output files, which are described below. The most important is mymodel.html, which contains the estimation results and some statistics in an easily readable format.

• A file containing the results of the maximum likelihood estimation: mymodel.rep.

• The same file in HTML format: mymodel.html.

• A file containing the specification of the estimated model in the same format as the model specification file mymodel.mod: mymodel.res.

• A file containing some descriptive statistics on the sample, such as the number of excluded observations, the total number of observations, details of group membership, etc.: mymodel.sta.

The following files are provided in order to help understand possible problems:

• A file containing messages produced by Biogeme during the run: mymodel.log.

• A file containing the specification of the model as it has actually been understood by Biogeme: speFile.debug.

• A file containing the data stored in Biogeme to represent the model: model.debug.

• A file containing the values of the parameters which have been actually used by Biogeme: parameters.out.

These filenames may be modified according to the following rules:

1. If an input file mymodel.xxx does not exist, Biogeme attempts to open the file default.xxx. If this file does not exist either, Biogeme exits with an error. Typically, the parameter file is not model dependent. Therefore, it is recommended to call it default.par to avoid copying it for each different model to be estimated.

2. If an output file mymodel.xxx already exists, Biogeme does not overwrite it. Instead, it creates the file mymodel~1.xxx. If the file mymodel~1.xxx exists, Biogeme creates the file mymodel~2.xxx, and so on.
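The two naming rules can be sketched in a few lines of Python. This is an illustration of the rules as stated, not Biogeme's actual code; the function names are hypothetical, and existing stands for the set of filenames already present in the working directory.

```python
def input_filename(model_name, extension, existing):
    """Rule 1: use mymodel.xxx if present, otherwise fall back to default.xxx."""
    for candidate in (f"{model_name}.{extension}", f"default.{extension}"):
        if candidate in existing:
            return candidate
    raise FileNotFoundError(f"neither {model_name}.{extension} nor default.{extension} exists")


def output_filename(model_name, extension, existing):
    """Rule 2: never overwrite; append ~1, ~2, ... until the name is free."""
    candidate = f"{model_name}.{extension}"
    counter = 0
    while candidate in existing:
        counter += 1
        candidate = f"{model_name}~{counter}.{extension}"
    return candidate


files = {"mymodel.mod", "default.par", "mymodel.rep", "mymodel~1.rep"}
chosen_par = input_filename("mymodel", "par", files)
chosen_rep = output_filename("mymodel", "rep", files)
```

With the files listed in the example, the parameter file falls back to default.par and the third estimation report is written to mymodel~2.rep.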

To avoid any ambiguity, Biogeme displays the filenames actually used for a specific run.

If you want more detailed information on the output files generated by Biogeme, see the menu Biogeme on the website http://biogeme.epfl.ch.

2.6 Step-by-Step Example

In order to help first time users of Biogeme, we provide in this section a simpleexample where we go through the estimation of a model step-by-step. Theexample works through the estimation of a binary logit model of travelers’choices between auto and rail for intercity trips (Netherlands mode choicedataset). It uses a dataset of 223 travelers. For each traveler, a chosen mode(either rail or auto) for a particular trip was collected, as well as the traveltimes and travel costs of both the traveler’s rail alternative and the traveler’sauto alternative. These travel times and travel costs are used as explanatoryvariables for the model, and the deterministic utility specifications are

28

Page 33: 220714279-Workbook-2012

step-by-step example 29

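$$V_{\text{car}} = \text{ASC}_{\text{car}} + \beta_{\text{cost}}\,\text{car\_cost} + \beta_{\text{time}}\,\text{car\_time}$$

$$V_{\text{rail}} = \beta_{\text{cost}}\,\text{rail\_cost} + \beta_{\text{time}}\,\text{rail\_time}.$$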
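To make the link between these utilities and choice probabilities concrete, the following sketch evaluates the two utilities for one traveler and applies the standard binary logit formula. The attribute values are those of the first observation in the example data file; the parameter values are hypothetical placeholders, not estimates.

```python
import math


def utilities(asc_car, beta_cost, beta_time, row):
    """Deterministic utilities of car and rail for one observation."""
    v_car = asc_car + beta_cost * row["car_cost"] + beta_time * row["car_time"]
    v_rail = beta_cost * row["rail_cost"] + beta_time * row["rail_time"]
    return v_car, v_rail


def prob_car(v_car, v_rail):
    """Binary logit probability of choosing car."""
    return 1.0 / (1.0 + math.exp(v_rail - v_car))


# First observation of the example data file.
traveler = {"rail_cost": 40.0, "rail_time": 2.5, "car_cost": 5.0, "car_time": 1.167}

# Hypothetical parameter values, chosen only for illustration.
v_car, v_rail = utilities(asc_car=0.0, beta_cost=-0.1, beta_time=-1.0, row=traveler)
p_car = prob_car(v_car, v_rail)
print(f"V_car={v_car:.3f}  V_rail={v_rail:.3f}  P(car)={p_car:.3f}")
```

With all parameters set to zero, which is the starting point of the estimation, both utilities are zero and each alternative has probability 0.5.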

The example works through 4 steps: (1) examining the data file, (2) examining the model specification file, (3) estimating the model, and (4) examining the outputs. As you go through the example, make sure that you know where the referenced files are located, how to open the files, and the basic contents of each of the files.

Step 1: Model and Data Files

Before using Biogeme, you need to specify a model according to the data file (see footnote 2). In the case studies, you never need to modify the data file, but you need to specify your model file accordingly. An example of a data file is given in Figure 2.4, where the first six and last five rows of the complete file are shown.

id    choice  rail_cost  rail_time  car_cost  car_time
1     0       40         2.5        5         1.167
2     0       35         2.016      9         1.517
3     0       24         2.017      11.5      1.966
4     0       7.8        1.75       8.333     2
5     0       28         2.034      5         1.267
...
219   1       35         2.416      6.4       1.283
220   1       30         2.334      2.083     1.667
221   1       35.7       1.834      16.667    2.017
222   1       47         1.833      72        1.533
223   1       30         1.967      30        1.267

Figure 2.4: Example of Data File

Footnote 2: The files can be edited and viewed with a text editor such as Wordpad or GNU Emacs. Note that Notepad should not be used.

Each row in the data file corresponds to one observation, except the first one, which contains the column names. The first column id contains a unique identifier of the observation. The column named choice shows which alternative has been chosen. In this example, there are two alternatives, train (rail) and car. The choice is coded with a variable taking the value 0 if car is chosen and 1 if train is chosen. It can be seen that in the first five observations the car alternative has been chosen, and in the last five observations the train alternative has been chosen. The other four columns contain the values of the alternative attributes: rail_cost, rail_time, car_cost and car_time.
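Before writing a model file, it can help to check the layout of the data file programmatically. The sketch below parses a whitespace-separated text with a header row, as in Figure 2.4, and counts how often each alternative is chosen. The sample text contains only three of the rows shown in the figure, and the helper name parse_data is hypothetical.

```python
SAMPLE = """id choice rail_cost rail_time car_cost car_time
1 0 40 2.5 5 1.167
2 0 35 2.016 9 1.517
219 1 35 2.416 6.4 1.283
"""


def parse_data(text):
    """Return one dict per observation, keyed by the column names of the header row."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(names, (float(value) for value in line.split())))
        row["id"] = int(row["id"])
        row["choice"] = int(row["choice"])
        rows.append(row)
    return rows


rows = parse_data(SAMPLE)
counts = {0: 0, 1: 0}  # 0 = car, 1 = train
for row in rows:
    counts[row["choice"]] += 1
print(counts)
```

To apply it to the complete file, the text can be read from disk first, for example with open("data.dat").read(), assuming the file is in the current directory.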

Based on this data file, we specify a binary logit model containing the cost and travel time attributes as well as the alternative specific constants (the constant of one alternative is normalized to zero). This simple model specification file is shown in Figure 2.5. (Comments in the file are given after //.)

The section [Choice] defines in which column Biogeme can find the identifier of the chosen alternative. In this example, the column name is choice. In section [Beta], we define the parameters that are included in the utilities. Here we have four parameters: two alternative specific constants named ASC_CAR and ASC_RAIL as well as the cost (BETA_COST) and travel time (BETA_TIME) parameters. In addition to the name of each parameter, we specify:

• default value that will be used as a starting point for the estimation, normally set to 0.0;

• lower and upper bounds: normally you can keep -100.0 and 100.0. These bounds serve as “safe-guards” for the algorithm; and

• status variable that is 0 if the parameter should be estimated and 1 if it should be set to the default value.

In this example, we estimate all the parameters except the alternative specific constant of the rail alternative, which is set to zero.

In section [Utilities], we specify the deterministic parts of the utilities. Each row corresponds to one alternative and we need to specify:

• identifier of the alternative, which must be coherent with the identifier given in section [Choice], in our case 0 and 1;

• name of the alternative (can be arbitrarily chosen);


[Choice]
choice

[Beta]
// Name    DefaultValue LowerBound UpperBound status
ASC_CAR    0.0 -100.0 100.0 0
ASC_RAIL   0.0 -100.0 100.0 1
BETA_COST  0.0 -100.0 100.0 0
BETA_TIME  0.0 -100.0 100.0 0

[Utilities]
//Id Name Avail linear-in-parameter expression
0 Car one ASC_CAR * one + BETA_COST * car_cost +
BETA_TIME * car_time
1 Rail one ASC_RAIL * one + BETA_COST * rail_cost +
BETA_TIME * rail_time

[Expressions]
// Define here arithmetic expressions for name that are not directly
// available from the data
one = 1

[Model]
// Currently, only $MNL (multinomial logit), $NL (nested logit), $CNL
// (cross-nested logit) and $NGEV (Network GEV model) are valid keywords
//
$MNL

Note that there should be one line in the [Utilities] section for each alternative in the model file (they are split in two here because of the size).

Figure 2.5: Biogeme Example of Model File


• availability of the alternative: here both alternatives are always available, so this value is set to one. Biogeme understands what one means because it is specified in the [Expressions] section;

• linear-in-parameter specification of the deterministic part of the utility, that is, a list of terms separated by a +. Each term is composed of the name of a parameter (as defined in the [Beta] section) and the name of a variable (as defined in the data file). The names of the variables and parameters must be written exactly in the same way as defined in the data file and [Beta] section, respectively.

In section [Expressions], you can define expressions that appear in the availability conditions or utility functions. Here we have only specified that one means the numerical value one.

Finally, we need to specify which type of discrete choice model we want to estimate, in this case a logit model (also known as multinomial logit, MNL).

Now we have a data file that we name data.dat and a model file named model.mod. Both files are saved to the same directory. Here we have chosen to save them to C:\BiogemeFiles.

Step 2: Model Estimation

Under Windows, Biogeme is invoked in a DOS command window (see footnote 3). First of all, you have to go to the directory where you have placed the model and data files. Figure 2.6 shows the procedure for this example (the command cd changes the current directory and the command dir displays the content of the current directory). Second, when the current directory is the one containing the model and data files, Biogeme can be invoked with the command:

biogeme model data.dat

Note that the model file is given without file extension while the data file is given with it. After the estimation is finished, Biogeme displays the file names it has actually used for the estimation as well as the names of the result files. All the result files are placed in the current directory, that is, the directory where you have the model and data files.
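If you prefer to launch estimations from a script, the same command can be assembled programmatically. The sketch below only builds the argument list according to the convention just described (model name without extension, data file with extension); the helper name biogeme_command is hypothetical, and the commented-out call assumes that the biogeme executable is on your PATH.

```python
from pathlib import Path


def biogeme_command(model_file, data_file):
    """Build the command line: model name without extension, data file with it."""
    return ["biogeme", Path(model_file).stem, data_file]


command = biogeme_command("model.mod", "data.dat")
print(" ".join(command))

# To run the estimation from Python (requires Biogeme to be installed):
# import subprocess
# subprocess.run(command, cwd=r"C:\BiogemeFiles", check=True)
```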

Footnote 3: The DOS command window can be opened by choosing Run... under the Start Menu and then typing cmd.


Figure 2.6: Biogeme Example of DOS commands

Step 3: Estimation Results

For our example, Biogeme writes the following information after the estimation is completed:

Biogeme Input files
===================
Parameters: default.par
Model specification: model.mod
Sample 1 : data.dat

Biogeme Output files
====================
Estimation results: model.rep
Estimation results (HTML): model.html
Result model spec. file: model.res
Sample statistics: model.sta

Biogeme Debug files
===================
Screen copy: model.log
Parameters debug: parameters.out
Model debug: model.debug
Model spec. file debug: __specFile.debug

Model informations: Multinomial Logit Model
==================
The minimum argument of exp was -3.45471

Note that there are three input files. In addition to the model and data file, there is a file named default.par that contains the parameters which control Biogeme. Since we did not provide such a file, Biogeme automatically creates one with the default settings.


The estimation results can be found in model.html. This file contains the same information as the model.rep file, but is written in HTML format, which can conveniently be opened in any browser such as Mozilla Firefox or Internet Explorer. There are two other result files:

• model.res containing the specification of the estimated model in the same format as the model specification file (here model.mod); and

• model.sta containing data statistics.

A copy of the messages displayed in the DOS command window can be found in the model.log file. If you have problems with your estimation, you can consult the debug files: model.log, parameters.out, model.debug and __specFile.debug. See section 2.5 for more information on these files.

2.7 BioSim

BioSim is a package provided with Biogeme that can be used for computing predicted probabilities. BioSim is invoked exactly like Biogeme. BioSim can compute predicted probabilities for all model types that can be estimated with Biogeme, as long as it is not a panel data setting. BioSim is used in the case study on forecasting, Chapter 7.

Below we indicate how to use BioSim if you have a model named mymodel.mod that you have just estimated with Biogeme using the data file mydata.dat.

1. Rename the result file mymodel.res to mymodel_res.mod. This file contains the estimated parameter values to be used for computing the probabilities.

2. Invoke BioSim with the command:

biosim mymodel_res mydata.dat

3. BioSim reports the results in the file mymodel_res.enu. Each line in this file corresponds to a line in the data file. It is important to note that only observations that have been used in the estimation/simulation are reported in the mymodel_res.enu file. That is, if you have excluded observations in the [Exclude] section, these observations are not present in the .enu file. For each observation, BioSim reports the probability for the chosen alternative as well as the probability for each alternative in the choice set.

4. If you want to analyze the BioSim output file with software such as Excel, then save the file in text format (.txt).

See the Biogeme menu on the website http://biogeme.epfl.ch for more details on BioSim.


Part II

Case Studies


Chapter 3

Introduction to Model Building

The process of building models is not straightforward and requires knowledge of theory (e.g. consumer theory in the case of marketing), statistical tools, as well as subjective judgment from the model builder. Hence, it is not possible to give an exact algorithm for how to build models, but there are some guidelines. Chapter 7 in Ben-Akiva and Lerman (1985) contains good advice and procedures for model development. Based on this chapter, we give below some general guidelines on how to approach the case studies (see the introduction of each case study for specific guidelines).

• Start each case study by studying the provided model specifications (the .mod files). Try to understand the underlying assumptions and how these assumptions are modeled.

• Estimate the example models with Biogeme and analyze the result files (.html). Compare your interpretations with those provided.

• Continue the model development and formulate your own assumptions (select the variables to include and how they should affect the utilities) and modify the model file accordingly. Estimate this modified model. Examples of questions to ask yourself after the estimation are: Do the coefficients you included have the expected signs? Are they significantly different from zero? Does the new model have a better model fit than the original one?
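The questions about sign and significance can be checked mechanically once the estimates and standard errors are known. The sketch below uses the two travel time coefficients reported in Table 4.5 of the Binary Logit case study; the critical value 1.96 corresponds to a two-sided test at the 5% level under the usual asymptotic normal approximation, and the function names are hypothetical.

```python
def t_statistic(estimate, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


def is_significant(estimate, std_error, critical_value=1.96):
    """True if the coefficient differs from zero at the chosen level."""
    return abs(t_statistic(estimate, std_error)) > critical_value


# (estimate, robust standard error) as reported in Table 4.5.
coefficients = {
    "beta_time_car": (-2.26, 0.485),
    "beta_time_rail": (-0.543, 0.396),
}

for name, (estimate, std_error) in coefficients.items():
    sign_ok = estimate < 0  # travel time is expected to reduce utility
    print(name, round(t_statistic(estimate, std_error), 2),
          "expected sign" if sign_ok else "unexpected sign",
          "significant" if is_significant(estimate, std_error) else "not significant")
```

Both coefficients have the expected negative sign, but only the car travel time coefficient passes the significance check, which matches the discussion in the case study.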


It is important for the model to have an intuitive interpretation. It is always possible to improve the model fit by adding parameters to the specification, but the model is useful only if it can be interpreted intuitively.

As you go through the case studies, you become familiar with more and more modeling concepts and statistical tools for analyzing your models. Consequently, you are able to perform more and more sophisticated analysis of your models.

The Binary Logit case study deals with different specifications of the attributes, generic versus alternative specific attributes, as well as including socio-economic characteristics of the decision-maker. In the case study on Logit, we show how to statistically test (log-likelihood ratio test) if an unrestricted model is significantly better than a restricted one. More statistical tests are introduced in the Specification Testing case study, where we discuss market segmentation and testing of correlation among alternatives. We also test different ways of including variables in the model. Before working on more general models allowing for correlation among alternatives in the Multivariate Extreme Value Models case study, we show how to use discrete choice models for forecasting. The specification of error component and random coefficients models is covered in the case study on Mixtures of Logit models. Finally, we discuss simultaneous RP/SP estimation in the last case study.

3.1 Practical Information

Before starting a case study for the first time, you need to have the following programs installed:

• The latest version of Biogeme, which is the reference estimation software. It is distributed on the course USB stick and on the Biogeme website (http://biogeme.epfl.ch). The installation process is described in section 2.1, page 21.

• A text editor for editing the model files and for reading the data; Wordpad works even though we prefer to work with GNU Emacs. Section 2.3 (page 24) shows how to install Emacs on your computer. Please note that Notepad should not be used.


The programs that you need to use when working on the case studies are:

• Biogeme, which you can use in two different ways: with the graphical user interface (GUI, for Windows or Mac OS X) or with the command line version. How to invoke Biogeme is described in section 2.2 on page 23.

• Emacs or Wordpad.

• Windows Explorer, which is convenient for opening the result files. Otherwise, the .html result file can be opened directly with Internet Explorer (or another browser of your choice).

Note that depending on the choice of optimization algorithm in Biogeme, the estimation results can differ slightly. See section 2.4 for more details on how to specify the Biogeme parameters.


Chapter 4

Binary Logit

This case study deals with the estimation of Binary Logit (BL) models using a dataset of your interest. The case study will help you to get familiar with the estimation techniques and the basic statistical tests used in the specification process of BL models.

For this case study, you can choose between the Choice-Lab-Fashion Marketing, the Netherlands Mode Choice and the Airline Itinerary datasets. A detailed description of each dataset can be found in Appendix A.

Before starting the case study, read the general introduction to the case studies in Chapter 3. The introduction discusses how to go through the case study and gives you some guidelines on the model building process.

The examples of model specifications that we have provided can be found in the following sections: Choice-Lab-Fashion Marketing in section 4.2 on page 46, Netherlands Mode Choice in section 4.3 on page 52 and Airline Itinerary in section 4.4 on page 57.


4.1 Challenge Question

The Netherlands mode choice dataset. This case study deals with the estimation of a mode choice behavior model for intercity travelers using revealed preference data. The survey was conducted during 1987 for the Netherlands Railways to assess factors that influence the choice between rail and car for intercity travel.

Context. Nijmegen is a small city on the eastern side of the Netherlands near the border with Germany. The city has typical rail connections with the major cities in the western metropolitan area called the Randstad (which contains Amsterdam, Rotterdam and The Hague). Trips from Nijmegen to the Randstad take approximately two hours by both rail and car. A binary choice model can be developed to model the mode choice of travelers for intercity travel.

Data description. Please read Appendix A.2 of the workbook for details.

Files to use with Biogeme:
Model file: BL_NL_socioec_g2.mod
Data file: netherlands.dat

After estimating two models that only include variables that were attributes of the alternatives, a modeler would like to test whether the socioeconomic variable gender, which indicates the respondent’s gender, has any impact on the model.

The modeler came up with the following model:

$$V_{\text{car}} = \text{ASC}_{\text{car}} + \beta_{\text{time\_car}}\,\text{car\_time} + \beta_{\text{cost}}\,\text{car\_cost} + \beta_{\text{gender1}}\,\text{gender}$$

$$V_{\text{rail}} = \beta_{\text{time\_rail}}\,\text{rail\_time} + \beta_{\text{cost}}\,\text{rail\_cost} + \beta_{\text{gender2}}\,\text{gender}$$

The variable is categorical and equals one if the gender is female and zero if male.

The model is estimated in Biogeme, and the results are listed in Table 4.1.


Estimation results

Parameter  Parameter     Parameter  Robust          Robust
number     name          estimate   standard error  t statistic
1          ASC_car        2.85      1.02             2.80
2          β_cost        -0.130     0.0265          -4.89
3          β_gender1     -0.338     5.80e+06         0.00
4          β_gender2      0.338     5.80e+06         0.00
5          β_time_car    -2.34      0.495           -4.73
6          β_time_rail   -0.529     0.414           -1.28

Summary statistics
Number of observations = 228
L(0) = −158.038
L(β) = −115.880
ρ̄² = 0.229

Table 4.1: Estimation results with socioeconomic characteristics

Question: Do you agree with the above approach? Motivate your answer.


4.2 Choice-Lab-Fashion Marketing Case

Binary Logit with Customer Characteristics

Files to use with Biogeme:
Model file: BL_Marketing_1.mod
Data file: marketing.dat

In this model, we try to assess which factors characterize customers’ choice of dropping out as clients of Choice-Lab-Fashion. The decision maker (a Choice-Lab-Fashion customer) faces a binary choice: either to remain a client or to drop out as a client. The dependent variable (Choice) equals 1 if the customer “drops” next year and 0 otherwise. The model is estimated using the following variables:

• NegProfit: dummy variable for negative profit,

• NegEquity: dummy variable for negative equity,

• LRSC: dummy variable indicating if the legal status of the firm is limited responsibility stock owned company,

• LnNbEmpl: natural logarithm of total number of employees, and

• LnAge: natural logarithm of the company’s age.

For estimation purposes, we normalize the alternative remain client, and the estimated coefficients are therefore interpreted relative to it. The following expressions are the systematic parts of the utilities for the two alternatives:

$$V_{\text{remain}} = 0$$

$$V_{\text{drop}} = \text{ASC}_{\text{drop}} + \beta_{\text{NegProfit}}\,\text{NegProfit} + \beta_{\text{NegEquity}}\,\text{NegEquity} + \beta_{\text{LRSC}}\,\text{LRSC} + \beta_{\text{Empl}}\,\text{LnNbEmpl} + \beta_{\text{Age}}\,\text{LnAge}.$$

Figure 4.1 shows a snapshot of the Biogeme code that corresponds to the systematic parts of the utility functions.

[Choice]
Choice

[Beta]
// Name      Value LowerBound UpperBound status (0=variable, 1=fixed)
ASC_remain   0.0 -100.0 100.0 1
ASC_drop     0.0 -100.0 100.0 0
b_NegProfit  0.0 -100.0 100.0 0
b_NegEquity  0.0 -100.0 100.0 0
b_LRSC       0.0 -100.0 100.0 0
b_Empl       0.0 -100.0 100.0 0
b_Age        0.0 -100.0 100.0 0

[Utilities]
// Id Name Avail linear-in-parameter expression
0 Alt1 avail ASC_remain * one
1 Alt2 avail ASC_drop * one + b_NegProfit * NegProfit +
b_NegEquity * NegEquity + b_LRSC * LRSC + b_Empl * LnNbEmpl +
b_Age * LnAge

[Model]
// Currently, only $MNL (multinomial logit), $NL (nested logit), $CNL
// (cross-nested logit) and $NGEV (Network GEV model) are valid
// keywords
$MNL

[Expressions]
// Define here arithmetic expressions for name that are not directly
// available from the data
one = 1
avail = 1

Figure 4.1: Snapshot from the Biogeme code

Section [Choice] indicates the dependent variable in the dataset, which is the variable identifying the chosen alternative. The coding of the dependent variable is consistent with the “Id” given in section [Utilities]. Section [Beta] lists the parameters which we intend to use in our systematic utilities. If the status is set to one, the parameter is kept fixed at its value; otherwise, it is estimated. This is how we normalize one of the alternative specific constants (ASC_remain). The parameter names must be exactly the same as those used in the [Utilities] section (note that Biogeme is case sensitive). In [Utilities], we define the systematic utilities. Since both options are available to all customers, we have set the availability to 1 in the [Expressions] section. For further details on Biogeme, see Chapter 2.

Estimation results

Parameter  Parameter     Parameter  Robust          Robust
number     name          estimate   standard error  t statistic
1          ASC_drop      -0.535     0.0880           -6.08
2          β_LRSC        -0.234     0.0470           -4.97
3          β_Empl        -0.186     0.0143          -12.98
4          β_Age         -0.0973    0.0286           -3.41
5          β_NegEquity    0.185     0.104             1.78*
6          β_NegProfit    0.199     0.0483            4.11

Summary statistics
Number of observations = 15934
L(0) = −11044.607
L(β) = −7590.130
ρ̄² = 0.312

Table 4.2: Estimation results
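As an illustration of how the estimates in Table 4.2 translate into a probability, the sketch below evaluates the systematic utility of dropping out for one firm and applies the binary logit formula with the utility of remaining normalized to zero. The firm's characteristics are hypothetical, chosen only for illustration.

```python
import math

# Parameter estimates as reported in Table 4.2.
ASC_DROP = -0.535
BETAS = {
    "NegProfit": 0.199,
    "NegEquity": 0.185,
    "LRSC": -0.234,
    "LnNbEmpl": -0.186,
    "LnAge": -0.0973,
}


def prob_drop(firm):
    """P(drop) = 1 / (1 + exp(-V_drop)), since V_remain = 0."""
    v_drop = ASC_DROP + sum(BETAS[name] * firm[name] for name in BETAS)
    return 1.0 / (1.0 + math.exp(-v_drop))


# Hypothetical firm: negative profit, positive equity, stock company,
# 10 employees, 5 years old.
firm = {
    "NegProfit": 1,
    "NegEquity": 0,
    "LRSC": 1,
    "LnNbEmpl": math.log(10),
    "LnAge": math.log(5),
}
p_drop = prob_drop(firm)
print(f"P(drop) = {p_drop:.3f}")
```

Changing one characteristic at a time (for example, the number of employees) shows the direction of each effect discussed in the text.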

The estimation results for this first model (BL_Marketing_1.mod) are shown in Table 4.2. Given our specification, the negative sign of ASC_drop can be interpreted as follows: all else being equal, the decision maker prefers to remain a client of the company.

The coefficient β_Age is negative and statistically significantly different from zero, indicating that the older the customer (age of the firm), the less likely it is to leave the company. Note that the coefficient β_Age can also capture other effects. Young firms might be more vulnerable to closing down when facing financial difficulties (and in need of cutting costs), which could explain why they decide to drop out. However, there might also be other viable explanations. For example, new firms might be interested in buying a one-time list of addresses for direct marketing purposes (i.e. product 3).

The significant and negative estimate of the coefficient β_LRSC (limited responsibility stock companies) implies that stock owned limited responsibility companies are less likely to drop out as clients compared to non-stock limited responsibility firms.

The coefficient β_Empl is negative and significantly different from zero, which implies that larger firms are less likely to drop out. It could be that large firms are better established in the market, or may be operating in industries where access to companies’ financial information is key to their success. This could be, for example, banks and financial institutions. We could also speculate that large companies have larger client databases and establish credit policies based on credit rating information provided by Choice-Lab-Fashion. A small company might only buy a one-time credit rating report for one of its clients, and this might happen very sporadically.

Of the financial indicator variables, only negative profit is significantly different from zero. Companies needing to cut costs are more likely to drop out as clients of Choice-Lab-Fashion, as expected.

Binary Logit with Type of Purchased Product

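Files to use with Biogeme:
Model file: BL_Marketing_2.mod
Data file: marketing.dat

In this model, we keep all the independent variables from the previous model and add a set of variables describing the product purchased by the decision maker. The idea is to verify if there are any patterns of loyalty that can be explained by the type of products that clients have purchased. The systematic parts of the utilities are:

$$V_{\text{remain}} = 0$$

$$\begin{aligned}
V_{\text{drop}} = {} & \text{ASC}_{\text{drop}} + \beta_{\text{NegProfit}}\,\text{NegProfit} + \beta_{\text{NegEquity}}\,\text{NegEquity} \\
& + \beta_{\text{LRSC}}\,\text{LRSC} + \beta_{\text{Empl}}\,\text{LnNbEmpl} + \beta_{\text{Age}}\,\text{LnAge} \\
& + \beta_{\text{IndAnalysis}}\,\text{IndAnalysis} + \beta_{\text{CreditInfo}}\,\text{CreditInfo} \\
& + \beta_{\text{Accounts}}\,\text{Accounts} + \beta_{\text{Monitor}}\,\text{Monitor} \\
& + \beta_{\text{Web}}\,\text{Web} + \beta_{\text{CD}}\,\text{CD} + \beta_{\text{CRM}}\,\text{CRM} + \beta_{\text{Internet}}\,\text{Internet} \\
& + \beta_{\text{OpenDB}}\,\text{OpenDB} + \beta_{\text{Other}}\,\text{Other}.
\end{aligned}$$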

In Table 4.3, we show the estimation results for this model. All product choice coefficients have a negative sign and are significantly different from zero. However, they vary in magnitude. The largest coefficient absolute values are found for the products that provide integrated and web based services (CRM, Internet and Web), which are the solutions that provide clients with the most complete and updated data. We could speculate that these might be solutions that clients use most frequently and that play an important role in their day to day decisions. The alternative specific constant ASC_drop is positive and significant, compared to negative and significant in the previous model. This indicates that clients are more likely to drop out than remain as clients. This should be investigated further.

We have now identified some variables that have a significant impact on customer drop outs. However, we can provide Choice-Lab-Fashion with an extra, valuable piece of information: a list of the top 100 clients that have the highest probability of dropping out in the next year. Since we have data only until 2002, what we can calculate is the probability that a client will drop out in 2003. One way of doing so is to divide the dataset in two samples: a training sample and a test sample. First, we use the training sample (2000-2001) and estimate the model. Second, we calculate the predicted probability of dropping out with the test sample (2002) using the model estimated from the training sample. Third, we list the data in descending order and pick the 100 clients with the largest probability. Choice-Lab-Fashion could analyze the listing and decide for which clients it is worth considering a retention strategy. We remind the reader that the dataset also includes other variables. Therefore, it is advisable to improve the specification and run additional models.
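The three-step procedure can be sketched as follows. The estimation itself is done in Biogeme, so the coefficients are taken as given here. Everything in the example data is hypothetical: the field names client_id and year, the four records, and the two-variable specification with placeholder coefficient values.

```python
import math


def split_by_year(records, test_year):
    """Step 1: earlier years form the training sample, test_year the test sample."""
    training = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return training, test


def drop_probability(asc_drop, betas, record):
    """Step 2: predicted probability of dropping out (V_remain = 0)."""
    v_drop = asc_drop + sum(betas[name] * record[name] for name in betas)
    return 1.0 / (1.0 + math.exp(-v_drop))


def top_n_at_risk(test_records, asc_drop, betas, n):
    """Step 3: sort by predicted probability, descending, and keep the first n."""
    scored = [(drop_probability(asc_drop, betas, r), r["client_id"]) for r in test_records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Hypothetical records and placeholder coefficients, for illustration only.
records = [
    {"client_id": "A", "year": 2002, "NegProfit": 1, "LnAge": 1.0},
    {"client_id": "B", "year": 2002, "NegProfit": 0, "LnAge": 3.0},
    {"client_id": "C", "year": 2001, "NegProfit": 1, "LnAge": 2.0},
    {"client_id": "D", "year": 2002, "NegProfit": 0, "LnAge": 1.0},
]
training, test = split_by_year(records, test_year=2002)
ranking = top_n_at_risk(test, asc_drop=-0.5, betas={"NegProfit": 0.2, "LnAge": -0.1}, n=2)
print([client_id for _, client_id in ranking])
```

In practice the coefficients would come from the model estimated on the training sample, and n would be set to 100.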


Estimation results

Parameter  Parameter       Parameter  Robust          Robust
number     name            estimate   standard error  t statistic
1          ASC_drop         1.49      0.115            12.97
2          β_LRSC          -0.169     0.0497           -3.41
3          β_Empl          -0.131     0.0153           -8.55
4          β_Age           -0.216     0.0320           -6.74
5          β_NegEquity      0.322     0.114             2.83
6          β_NegProfit      0.275     0.0519            5.30
7          β_CRM           -2.56      0.707            -3.61
8          β_Internet      -2.24      0.139           -16.16
9          β_Web           -2.25      0.117           -19.26
10         β_CD            -1.80      0.0706          -25.47
11         β_Monitor       -1.02      0.252            -4.04
12         β_IndAnalysis   -1.08      0.0624          -17.36
13         β_Accounts      -1.04      0.0635          -16.36
14         β_Other         -0.613     0.0539          -11.37
15         β_CreditInfo    -0.568     0.0540          -10.51

Summary statistics
Number of observations = 15934
L(0) = −11044.607
L(β) = −6717.804
ρ̄² = 0.390

Table 4.3: Estimation results


4.3 Netherlands Mode Choice Case

Model Specification with Generic Attributes

Files to use with Biogeme:
Model file: BL_NL_generic.mod
Data file: netherlands.dat

In this first model, we assume that the total travel time (in-vehicle and out-of-vehicle) and travel cost of the modes are the only factors influencing the mode choice. We also assume that the coefficients of the explanatory variables are generic, i.e. they do not vary between alternatives. The expression of utility for this simple model can be written as:

$$V_{\text{car}} = \text{ASC}_{\text{car}} + \beta_{\text{time}}\,\text{car\_time} + \beta_{\text{cost}}\,\text{car\_cost}$$

$$V_{\text{rail}} = \beta_{\text{time}}\,\text{rail\_time} + \beta_{\text{cost}}\,\text{rail\_cost}$$

Estimation results

Parameter  Parameter  Parameter  Robust          Robust
number     name       estimate   standard error  t statistic
1          ASC_car    -0.798     0.275           -2.90
2          β_cost     -0.113     0.0241          -4.67
3          β_time     -1.33      0.354           -3.75

Summary statistics
Number of observations = 228
L(0) = −158.038
L(β) = −123.133
ρ̄² = 0.202

Table 4.4: Estimation results with generic attributes
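The summary statistics in Table 4.4 can be reproduced by hand. The sketch below assumes that L(0) is the log-likelihood of a model in which all parameters are zero (so that, with two always-available alternatives, each has probability 0.5), and that the reported goodness-of-fit index is the adjusted likelihood ratio index of Ben-Akiva and Lerman (1985), 1 − (L(β) − K)/L(0), with K the number of estimated parameters. Under these assumptions the computed values agree with the table.

```python
import math


def null_log_likelihood(n_observations, n_alternatives=2):
    """Log-likelihood when every alternative is equally likely."""
    return n_observations * math.log(1.0 / n_alternatives)


def adjusted_rho_squared(final_ll, null_ll, n_parameters):
    """Adjusted likelihood ratio index: 1 - (L(beta) - K) / L(0)."""
    return 1.0 - (final_ll - n_parameters) / null_ll


null_ll = null_log_likelihood(228)
rho_bar_sq = adjusted_rho_squared(final_ll=-123.133, null_ll=-158.038, n_parameters=3)
print(round(null_ll, 3), round(rho_bar_sq, 3))
```

Because the index penalizes the number of parameters, it can be used to compare specifications of different size estimated on the same data.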

The estimation results are shown in Table 4.4. All the estimated coefficients are statistically significantly different from zero. Looking at the alternative specific constant, the negative sign indicates that, the rest of the utilities being equal, car is less preferred than rail. However, this may be due to the fact that the model is too simple and there are important variables left out of the model. The negative signs of the generic coefficients for cost and travel time indicate, as expected, that the utility perceived by the decision maker for either of the two alternatives decreases as cost and travel time increase.

Model Specification with Alternative Specific Attributes

Files to use with Biogeme:
Model file: BL_NL_specific.mod
Data file: netherlands.dat

In the second specification, we relax the hypothesis of generic travel time coefficients. The alternative specific coefficients are more relevant if people perceive a minute spent in one mode to be different than a minute spent in the other mode. To illustrate this idea, two different travel time coefficients are introduced for car and rail. The corresponding utility function is given below:

$$V_{\text{car}} = \text{ASC}_{\text{car}} + \beta_{\text{time\_car}}\,\text{car\_time} + \beta_{\text{cost}}\,\text{car\_cost}$$

$$V_{\text{rail}} = \beta_{\text{time\_rail}}\,\text{rail\_time} + \beta_{\text{cost}}\,\text{rail\_cost}$$

The estimation results are shown in Table 4.5. This model has a better adjusted likelihood ratio index than the model with generic travel time coefficients. However, the coefficient for the travel time of the rail alternative is not statistically significantly different from zero. The coefficient for the travel time of the car alternative is negative and significant as expected, and is also greater in absolute value than the generic one presented in the previous table (-2.26 vs. -1.33). As in the previous example, the negative sign indicates that the utility perceived by the decision maker for the car alternative decreases as travel time increases. However, it appears that travel time does not affect the car and rail alternatives in the same way. The results indicate that people have less negative utility for travel time in rail compared to car. This may be due to the fact that people can make better use of their time when traveling by rail. The alternative specific constant for the car alternative now has the reversed sign, denoting increased preference for car (given everything else the same), which is more intuitive. A likelihood ratio test can be performed to test whether or not there is a significant improvement in the goodness-of-fit in the modified specification with alternative specific coefficients for travel times.

Estimation results

Parameter  Parameter     Parameter  Robust          Robust
number     name          estimate   standard error  t statistic
1          ASC_car        2.43      0.973            2.50
2          β_cost        -0.123     0.0256          -4.79
3          β_time_car    -2.26      0.485           -4.66
4          β_time_rail   -0.543     0.396           -1.37*

Summary statistics
Number of observations = 228
L(0) = −158.038
L(β) = −118.023
ρ̄² = 0.228

Table 4.5: Estimation results with alternative-specific attributes

Generic vs. Specific Test

The likelihood ratio test (see pages 28 and 164-167 in Ben-Akiva and Lerman (1985)) can be used to test the generic vs. the alternative-specific specification. The likelihood ratio test statistic for the null hypothesis of generic attributes is

$$-2\left(L(\beta_G) - L(\beta_{AS})\right)$$

where G and AS denote the generic and alternative-specific models, respectively. It is χ² distributed with the number of degrees of freedom equal to the number of restrictions (K_AS − K_G). In this case, −2(−123.133 + 118.023) = 10.220, and K_AS − K_G = 4 − 3 = 1. Since χ²(0.95, 1) = 3.841 at a 95% level of confidence, we can conclude that the model with the alternative-specific coefficients has a significant improvement in fit.
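The same test can be carried out numerically. The sketch below recomputes the statistic from the two final log-likelihoods (Tables 4.4 and 4.5) and compares it with the critical value 3.841 quoted above. The p-value helper relies on the identity P(X > x) = erfc(√(x/2)), which is valid only for a χ² distribution with one degree of freedom; for more restrictions a statistical library would be needed.

```python
import math


def likelihood_ratio_statistic(ll_restricted, ll_unrestricted):
    """-2 (L(restricted) - L(unrestricted))."""
    return -2.0 * (ll_restricted - ll_unrestricted)


def chi2_p_value_one_dof(x):
    """Upper tail probability of a chi-square with ONE degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


CRITICAL_95_1DOF = 3.841  # value quoted in the text

lr = likelihood_ratio_statistic(ll_restricted=-123.133, ll_unrestricted=-118.023)
reject_generic = lr > CRITICAL_95_1DOF
print(round(lr, 3), reject_generic, chi2_p_value_one_dof(lr))
```

Since the statistic exceeds the critical value, the null hypothesis of generic travel time coefficients is rejected, in line with the conclusion above.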


Model Specification with Socio-Economic Characteristics

Files to use with Biogeme:
Model file: BL_NL_socioec.mod
Data file: netherlands.dat

The previous two models only included variables that were attributes of thealternatives. We now introduce a socioeconomic variable gender which in-dicates the respondent’s gender. The variable is categorical and equals oneif the gender is female and zero if male. Since the variable gender does notvary by alternative (recall that only difference in utility matters), we havenormalized the alternative car to zero. As is shown in the utility functionbelow, the gender variable only enters the utility of the rail alternative. How-ever, this is an arbitrary normalization, as we could also have normalized therail alternative.

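$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{\text{time car}}\,\text{cartime} + \beta_{cost}\,\text{carcost} \\
V_{rail} &= \beta_{\text{time rail}}\,\text{railtime} + \beta_{cost}\,\text{railcost} + \beta_{gender}\,\text{gender}
\end{aligned}
$$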

The estimation results are shown in Table 4.6. There is a slight improvement in the adjusted likelihood ratio index (0.235 versus 0.228). The coefficient of the gender variable is positive and statistically significant, which indicates that women have a higher probability than men of choosing rail over car. The reader can verify that if we had included the gender variable in the utility of the car alternative instead of the rail alternative, the conclusion would remain unchanged. In fact, the results would be exactly the same, except that the coefficient would have the opposite sign: it would become negative. The interpretation would be that women have a lower probability than men of choosing car over rail, which is exactly the result we had before. The coefficients of the other explanatory variables are almost unchanged with respect to the previous model.
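The equivalence of the two normalizations can be checked numerically. The sketch below uses the estimates of Table 4.6; the trip attributes are hypothetical values chosen only for illustration (their units are not those of the dataset), and the function names are not part of Biogeme.

```python
import math

# Estimates from Table 4.6
ASC_CAR, B_GENDER, B_COST = 2.85, 0.675, -0.130
B_TIME_CAR, B_TIME_RAIL = -2.34, -0.529

def p_rail(v_car, v_rail):
    """Binary logit probability of choosing rail."""
    return 1.0 / (1.0 + math.exp(v_car - v_rail))

def prob_gender_in_rail(x, gender):
    """Specification of the text: gender enters the rail utility."""
    v_car = ASC_CAR + B_TIME_CAR * x["cartime"] + B_COST * x["carcost"]
    v_rail = B_TIME_RAIL * x["railtime"] + B_COST * x["railcost"] + B_GENDER * gender
    return p_rail(v_car, v_rail)

def prob_gender_in_car(x, gender):
    """Alternative normalization: gender enters the car utility with the opposite sign."""
    v_car = ASC_CAR + B_TIME_CAR * x["cartime"] + B_COST * x["carcost"] - B_GENDER * gender
    v_rail = B_TIME_RAIL * x["railtime"] + B_COST * x["railcost"]
    return p_rail(v_car, v_rail)

# Hypothetical trip attributes, for illustration only
trip = {"cartime": 1.5, "carcost": 20.0, "railtime": 2.0, "railcost": 25.0}

for gender in (0, 1):
    print(gender, prob_gender_in_rail(trip, gender), prob_gender_in_car(trip, gender))
```

Only the difference between the two utilities enters the probability, so both specifications give the same choice probabilities for every respondent.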


Estimation results
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar        2.85      1.02             2.80
2          βgender       0.675     0.329            2.05
3          βcost        -0.130     0.0265          -4.89
4          βtime car    -2.34      0.495           -4.73
5          βtime rail   -0.529     0.414           -1.28*

Summary statistics
Number of observations = 228
L(0) = −158.038
L(β) = −115.880
ρ̄² = 0.235

Table 4.6: Estimation results with socioeconomic characteristics
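The goodness-of-fit index reported in the summary statistics can be recomputed from the two log likelihoods. The sketch below assumes that the reported value is the adjusted likelihood ratio index of Ben-Akiva and Lerman (1985), 1 − (L(β) − K)/L(0), with K the number of estimated parameters; under that assumption it reproduces the values of Tables 4.5 and 4.6.

```python
def rho_squared(ll_zero, ll_beta):
    """Likelihood ratio index 1 - L(beta)/L(0)."""
    return 1.0 - ll_beta / ll_zero

def adjusted_rho_squared(ll_zero, ll_beta, n_params):
    """Adjusted likelihood ratio index 1 - (L(beta) - K)/L(0)."""
    return 1.0 - (ll_beta - n_params) / ll_zero

# Table 4.5 has 4 estimated parameters, Table 4.6 has 5
table_4_5 = adjusted_rho_squared(-158.038, -118.023, 4)
table_4_6 = adjusted_rho_squared(-158.038, -115.880, 5)

print(round(table_4_5, 3), round(table_4_6, 3))
```

The adjustment penalizes the extra parameter, so an increase in the adjusted index means the gain in log likelihood outweighs the cost of the added coefficient.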


4.4 Airline Itinerary Case

Model Specification with Generic Attributes

Files to use with Biogeme:
Model file: BL airline generic.mod
Data file: airline.dat

We assume that the choice variable (dependent variable) includes the following alternatives:

Option 1 a non-stop flight,

Option 2 a flight with one stop on the same airline.

In this first model, we assume that leg room, fare, total travel time and schedule delays (early and late) are the factors influencing the choice. We also assume that the coefficients are generic, i.e., they do not vary between alternatives. The deterministic part of the utilities for this simple model can be expressed as:

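$$
\begin{aligned}
V_1 &= \beta_{Fare}\,\text{Opt1 FARE} + \beta_{\text{Total TT}}\,\text{TripTimeHours 1} + \beta_{Legroom}\,\text{Opt1 Legroom} \\
&\quad + \beta_{SchedDE}\,\text{Opt1 SchedDelayEarly} + \beta_{SchedDL}\,\text{Opt1 SchedDelayLate} \\
V_2 &= ASC_2 + \beta_{Fare}\,\text{Opt2 FARE} + \beta_{\text{Total TT}}\,\text{TripTimeHours 2} + \beta_{Legroom}\,\text{Opt2 Legroom} \\
&\quad + \beta_{SchedDE}\,\text{Opt2 SchedDelayEarly} + \beta_{SchedDL}\,\text{Opt2 SchedDelayLate}
\end{aligned}
$$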

where fare is coded as Opt1 FARE and Opt2 FARE in units of $100, in order to reduce numerical issues, and the schedule delay is split into early and late components through the variables:

• Opt1 SchedDelayEarly,

• Opt1 SchedDelayLate,


• Opt2 SchedDelayEarly and

• Opt2 SchedDelayLate;

The leg room is coded as a continuous variable measured in inches. These variables are coded in the "[Expressions]" section of the model file.

The estimation results are reported in Table 4.7. The results indicate that, all other things being equal, the first option (without a stop) is preferred. All the estimated coefficients are significantly different from zero. The signs of the time coefficient βTotal TT and the fare coefficient βFare are negative, as expected, meaning that the utility of an alternative decreases as travel time and fare increase. The signs of the schedule delay coefficients are both negative, indicating that people dislike delays. The positive sign of the leg room coefficient indicates that people prefer seats with more space.

Estimation results
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASC2         -1.41      0.176            -8.02
2          βFare        -1.83      0.104           -17.65
3          βLegroom      0.115     0.0179            6.41
4          βSchedDE     -0.111     0.0213           -5.23
5          βSchedDL     -0.118     0.0189           -6.25
6          βTotal TT    -0.236     0.0966           -2.44

Summary statistics
Number of observations = 3093
L(0) = −2143.904
L(β) = −1171.504
ρ̄² = 0.451

Table 4.7: Estimation results with generic attributes
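To make the sign interpretation concrete, the estimates of Table 4.7 can be plugged into the binary logit formula. The sketch below is illustrative only: the two itineraries are hypothetical, the dictionary keys are invented for this example, and the units assumed for the delay variables are not taken from the dataset description.

```python
import math

# Estimates from Table 4.7
BETA = {"asc2": -1.41, "fare": -1.83, "legroom": 0.115,
        "sched_de": -0.111, "sched_dl": -0.118, "total_tt": -0.236}

def utility(opt, is_option2):
    """Deterministic utility of one itinerary; ASC2 only applies to option 2."""
    v = (BETA["fare"] * opt["fare"]
         + BETA["total_tt"] * opt["hours"]
         + BETA["legroom"] * opt["legroom"]
         + BETA["sched_de"] * opt["delay_early"]
         + BETA["sched_dl"] * opt["delay_late"])
    return v + (BETA["asc2"] if is_option2 else 0.0)

def prob_option1(opt1, opt2):
    """Binary logit probability of choosing option 1 (non-stop)."""
    return 1.0 / (1.0 + math.exp(utility(opt2, True) - utility(opt1, False)))

# Hypothetical itineraries: fare in $100, travel time in hours, leg room in inches
nonstop = {"fare": 4.0, "hours": 4.0, "legroom": 31, "delay_early": 0, "delay_late": 1}
one_stop = {"fare": 3.5, "hours": 6.0, "legroom": 31, "delay_early": 0, "delay_late": 1}

print(prob_option1(nonstop, one_stop))
```

With identical attributes the probability of the non-stop option depends only on ASC2, which is why a negative ASC2 translates into a preference for option 1.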


Logit Model with Alternative-Specific Attributes

Files to use with Biogeme:
Model file: BL airline specific.mod
Data file: airline.dat

In this second specification we relax the hypothesis of generic coefficients. To illustrate this idea, two different time coefficients are introduced, one for each alternative. The corresponding utility functions are reported below:

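$$
\begin{aligned}
V_1 &= \beta_{Fare}\,\text{Opt1 FARE} + \beta_{\text{Total TT1}}\,\text{TripTimeHours 1} + \beta_{Legroom}\,\text{Opt1 Legroom} \\
&\quad + \beta_{SchedDE}\,\text{Opt1 SchedDelayEarly} + \beta_{SchedDL}\,\text{Opt1 SchedDelayLate} \\
V_2 &= ASC_2 + \beta_{Fare}\,\text{Opt2 FARE} + \beta_{\text{Total TT2}}\,\text{TripTimeHours 2} + \beta_{Legroom}\,\text{Opt2 Legroom} \\
&\quad + \beta_{SchedDE}\,\text{Opt2 SchedDelayEarly} + \beta_{SchedDL}\,\text{Opt2 SchedDelayLate}
\end{aligned}
$$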

The estimation results are reported in Table 4.8. In this case, a time coefficient is estimated for each of the two options. Both signs are negative, as expected. The absolute value of βTotal TT1 is larger, meaning that people are more sensitive to time in the case of non-stop flights. The interpretation of the other parameters remains the same.

Generic vs Specific Test

The likelihood ratio test can be used to test the generic vs. the alternative-specific model specifications. The likelihood ratio test statistic for the null hypothesis of generic attributes is

−2(L(βR) − L(βU)),

where R and U denote the restricted (generic) and unrestricted (alternative-specific) models, respectively. It is χ² distributed with the number of degrees of freedom equal to the number of restrictions ($K_U - K_R$), with $K_U$ and $K_R$ the numbers of estimated coefficients in the unrestricted and restricted models, respectively. In this case, −2(−1171.504 + 1171.318) = 0.372. Since this is below the critical value $\chi^2_{0.90,1} = 2.71$ at the 90% level of confidence, the null hypothesis of a generic time coefficient cannot be rejected. The model with alternative-specific coefficients therefore does not fit significantly better.

Estimation results
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASC2         -1.48      0.205            -7.22
2          βFare        -1.82      0.105           -17.27
3          βLegroom      0.115     0.0179            6.41
4          βSchedDE     -0.112     0.0214           -5.21
5          βSchedDL     -0.118     0.0190           -6.24
6          βTotal TT1   -0.257     0.104            -2.47
7          βTotal TT2   -0.236     0.0967           -2.44

Summary statistics
Number of observations = 3093
L(0) = −2143.904
L(β) = −1171.318
ρ̄² = 0.450

Table 4.8: Binary model with alternative specific attributes

Inclusion of Socio-Economic Characteristics

Files to use with Biogeme:
Model file: BL airline socioec.mod
Data file: airline.dat

The previous two models only include variables that are attributes of the alternatives. We now introduce a socio-economic characteristic, namely the gender of the respondent. MALE is a dummy variable equal to 1 if the respondent is male and 0 if female. Since socio-economic variables do not vary among the alternatives (recall that only differences in utilities matter), we have normalized the coefficient of MALE in alternative 1 to zero, so that the variable only enters the utility of alternative 2. This normalization is arbitrary, as we could equally have normalized the coefficient in alternative 2. The utility functions can now be written as follows:

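$$
\begin{aligned}
V_1 &= \beta_{Fare}\,\text{Opt1 FARE} + \beta_{\text{Total TT}}\,\text{TripTimeHours 1} + \beta_{Legroom}\,\text{Opt1 Legroom} \\
&\quad + \beta_{SchedDE}\,\text{Opt1 SchedDelayEarly} + \beta_{SchedDL}\,\text{Opt1 SchedDelayLate} \\
V_2 &= ASC_2 + \beta_{\text{Male Opt2}}\,\text{Male} + \beta_{Fare}\,\text{Opt2 FARE} + \beta_{\text{Total TT}}\,\text{TripTimeHours 2} \\
&\quad + \beta_{Legroom}\,\text{Opt2 Legroom} + \beta_{SchedDE}\,\text{Opt2 SchedDelayEarly} + \beta_{SchedDL}\,\text{Opt2 SchedDelayLate}
\end{aligned}
$$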

Estimation results
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASC2         -1.44      0.184            -7.86
2          βFare        -1.83      0.104           -17.66
3          βLegroom      0.115     0.0179            6.41
4          βMale Opt2    0.0620    0.105             0.59
5          βSchedDE     -0.111     0.0212           -5.22
6          βSchedDL     -0.118     0.0189           -6.26
7          βTotal TT    -0.234     0.0967           -2.42

Summary statistics
Number of observations = 3093
L(0) = −2143.904
L(β) = −1171.329
ρ̄² = 0.450

Table 4.9: Binary model with socio-economic characteristics

The estimation results are reported in Table 4.9. The coefficient βMale Opt2 is not statistically significantly different from zero (t statistic of 0.59), so the data give no evidence that men and women differ in their preferences between the two options. The interpretation of the other coefficients remains the same as in the previous model specifications.
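The significance statements used throughout these case studies come down to comparing a t statistic with a critical value. A minimal sketch, assuming a two-sided test at the 95% level (critical value 1.96) and using the estimates and robust standard errors of Table 4.9; the function names are illustrative.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t statistic for the hypothesis that the coefficient equals null_value."""
    return (estimate - null_value) / std_error

def is_significant(estimate, std_error, critical=1.96):
    """Two-sided test against zero; 1.96 corresponds to the 95% level."""
    return abs(t_statistic(estimate, std_error)) > critical

t_male = t_statistic(0.0620, 0.105)            # βMale Opt2
male_significant = is_significant(0.0620, 0.105)
fare_significant = is_significant(-1.83, 0.104)  # βFare

print(round(t_male, 2), male_significant, fare_significant)
```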


Chapter 5

Logit

The topic of this case study is the logit model, sometimes called the Multinomial Logit (MNL). Different specifications are introduced using a stepwise modeling strategy, which increases the complexity by adding different variables at each step. The objectives of this case study can be summarized as follows:

• Specification and estimation of a basic logit model making use of generic attributes.

• Specification and estimation of a logit model including alternative-specific attributes.

• Introduction of generic vs specific test techniques (likelihood ratio test).

For this case study, you can choose between the Swissmetro, the Residential Telephone Services and the Airline Itinerary datasets. A detailed description of each dataset can be found in Appendix A.

Before starting the case study, read the general introduction to the case studies in Chapter 3. The introduction discusses how to go through the case study and gives you some guidelines on the model building process.

The examples of model specifications that we have provided can be found in the following sections: Swissmetro in section 5.2, Residential Telephone Services in section 5.3 and Airline Itinerary in section 5.4.


5.1 Challenge Question

The Italy mode choice dataset. The data were collected in Cagliari, the capital of Sardinia, Italy. In 1998, the local rail authority decided to upgrade the service into a metropolitan-like commuter train service, increasing the speed, the frequency and the number of stations inside the corridor. In order to analyze the impact of a potential new train system, three types of surveys were conducted: a qualitative survey using focus groups to gain a good understanding of the phenomenon, a revealed preference (RP) survey describing current trips, and a stated preference (SP) survey to evaluate the introduction of radical improvements to the existing alternative.

In this challenge question, we focus on the RP survey. Households were randomly selected from the telephone directory and each member of the family over the age of 12 was asked to participate. The data were tested for consistency and validity for mode choice modeling, and only people with an actual modal choice among Car, Bus and Train were retained. This left a final sample of 318 observations for model estimation.

Data description. Please read Appendix A.7 of the workbook for details.

Files to use with Biogeme:
Model file: mnl-RP Italy Challenge.mod
Data file: italy.dat

Figure 5.1 gives a suggested Biogeme specification of the model.

Question: Does this model make sense to you? What results do you expect when you try to estimate this model?

The results estimated by Biogeme are given in Table 5.1. Do they correspond to your expectations?


[Choice]
ch

[Beta]
// Name      Value  LowerBound  UpperBound  status
ASC_car      0      -1000       1000        0
ASC_train    0      -1000       1000        0
B_cost       0      -1000       1000        0
B_Veh_time   0      -1000       1000        0
B_Wal_time   0      -1000       1000        0
B_nb_car     0      -1000       1000        0

[Utilities]
// Id  Name     Avail  linear-in-parameter expression (beta1*x1 + beta2*x2 + ... )
1      TrainRP  av1    ASC_train * one + B_Veh_time * tt_t + B_Wal_time * wt_t
                       + B_cost * c_t
2      CarRP    av2    ASC_car * one + B_Veh_time * tt_c + B_Wal_time * wt_c
                       + B_cost * c_c + B_nb_car * nb_car
3      BusRP    av3    B_Veh_time * tt_b + B_Wal_time * wt_b + B_cost * c_b

[Model]
$MNL

[Expressions]
one = 1
nb_car = car_lic * 10 * ( ch == 2 )

Figure 5.1: Italy mode choice, logit Specification


Logit Model Estimation Results
Variable  Variable     Coefficient  standard  t-stat. 0
number    name         estimate     error
1         ASC_car      -48.7        13.2      -3.70
2         ASC_train     -1.30        0.996    -1.31
3         B_Veh_time    -0.101       0.0775   -1.31
4         B_Wal_time    -0.257       0.0516   -4.98
5         B_cost        -4.32        1.78     -2.43
6         B_nb_car      33.3         6.25      5.32

Summary statistics
Number of observations = 318
L(0) = −294.215
L(β) = −22.406
ρ̄² = 0.903

Table 5.1: Estimation results for the logit model related to the Italy mode choice dataset


5.2 Swissmetro Case

Model Specification with Generic Attributes

Files to use with Biogeme:
Model file: MNL SM generic.mod
Data file: swissmetro.dat

The dataset consists of survey data collected on the trains between St. Gallen and Geneva in Switzerland. The idea is to analyze the impact of modal innovation in transportation, represented by the Swissmetro, against the more classic types of transport modes. The choice variable consists of three alternatives: train, Swissmetro and car (for car owners). In this first model specification, we assume that travel time, cost and headway of public transportation modes influence the utility functions. We also assume that the coefficients of the explanatory variables are generic, that is, they do not vary over the alternatives. The corresponding expressions of the utilities are defined as follows:

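$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{time}\,\text{CAR TT} + \beta_{cost}\,\text{CAR CO} \\
V_{train} &= \beta_{time}\,\text{TRAIN TT} + \beta_{cost}\,\text{TRAIN COST} + \beta_{he}\,\text{TRAIN HE} \\
V_{SM} &= ASC_{SM} + \beta_{time}\,\text{SM TT} + \beta_{cost}\,\text{SM COST} + \beta_{he}\,\text{SM HE}
\end{aligned}
$$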

where CAR TT is the car travel time, CAR CO is the car cost, TRAIN TT is the train travel time, TRAIN COST is the train cost (considering the ownership of the Swiss annual season ticket, GA), TRAIN HE is the train headway (in minutes), SM TT is the Swissmetro travel time, SM COST is the Swissmetro cost (considering the ownership of GA), and SM HE is the Swissmetro headway.

The estimation results are shown in Table 5.2. For estimation purposes, we have normalized the alternative specific constant of train to zero. The estimated values of the alternative specific constants ASCcar and ASCSM show that, all else being equal, car and Swissmetro are preferred to train. Moreover, the higher value of ASCSM shows a greater preference for Swissmetro than for car. As expected, both the travel time and cost coefficients have negative signs: the higher the travel time or the cost of an alternative, the lower its utility. The negative estimate of the headway coefficient βhe indicates that the higher the headway, and thus the lower the frequency of service, the lower the utility.

Logit model with generic attributes
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar        0.189     0.0798            2.37
2          ASCSM         0.451     0.0932            4.84
3          βcost        -0.0108    0.000682        -15.90
4          βhe          -0.00535   0.000983         -5.45
5          βtime        -0.0128    0.00104         -12.23

Summary statistics
Number of observations = 6768
L(0) = −6964.663
L(β) = −5315.386
ρ̄² = 0.236

Table 5.2: Logit model with generic attributes
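The effect of these coefficients on the choice probabilities can be illustrated with the logit formula for three alternatives. The sketch below uses the estimates of Table 5.2; the trip attributes are hypothetical values, the dictionary keys use underscores purely for this example, and travel times are assumed to be in minutes like the headways.

```python
import math

# Estimates from Table 5.2 (train is the reference alternative)
ASC_CAR, ASC_SM = 0.189, 0.451
B_COST, B_HE, B_TIME = -0.0108, -0.00535, -0.0128

def utilities(x):
    """Deterministic utilities of the generic specification."""
    return {
        "car": ASC_CAR + B_TIME * x["CAR_TT"] + B_COST * x["CAR_CO"],
        "train": B_TIME * x["TRAIN_TT"] + B_COST * x["TRAIN_COST"] + B_HE * x["TRAIN_HE"],
        "sm": ASC_SM + B_TIME * x["SM_TT"] + B_COST * x["SM_COST"] + B_HE * x["SM_HE"],
    }

def logit_probabilities(v):
    """P(i) = exp(V_i) / sum_j exp(V_j), shifted by max(V) for numerical stability."""
    v_max = max(v.values())
    expv = {name: math.exp(u - v_max) for name, u in v.items()}
    total = sum(expv.values())
    return {name: e / total for name, e in expv.items()}

# Hypothetical trip, for illustration only
trip = {"CAR_TT": 120, "CAR_CO": 60,
        "TRAIN_TT": 150, "TRAIN_COST": 50, "TRAIN_HE": 60,
        "SM_TT": 70, "SM_COST": 65, "SM_HE": 20}

probs = logit_probabilities(utilities(trip))
# Doubling the train headway (halving its frequency) lowers the train share
probs_low_freq = logit_probabilities(utilities(dict(trip, TRAIN_HE=120)))

print(probs)
print(probs_low_freq)
```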

Model Specification with Alternative Specific Attributes

Files to use with Biogeme:
Model file: MNL SM specific.mod
Data file: swissmetro.dat

In this second model, we relax the hypothesis of generic coefficients. To illustrate this idea, we use three different cost coefficients, one for each alternative. The corresponding utility functions are


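$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{time}\,\text{CAR TT} + \beta_{\text{car cost}}\,\text{CAR CO} \\
V_{train} &= \beta_{time}\,\text{TRAIN TT} + \beta_{\text{train cost}}\,\text{TRAIN COST} + \beta_{he}\,\text{TRAIN HE} \\
V_{SM} &= ASC_{SM} + \beta_{time}\,\text{SM TT} + \beta_{\text{SM cost}}\,\text{SM COST} + \beta_{he}\,\text{SM HE}
\end{aligned}
$$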

Logit model with alternative specific travel cost
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar       -0.971     0.134            -7.22
2          ASCSM        -0.444     0.102            -4.34
3          βcar cost    -0.00949   0.00116          -8.21
4          βhe          -0.00542   0.00101          -5.36
5          βSM cost     -0.0109    0.000703        -15.49
6          βtime        -0.0111    0.00120          -9.26
7          βtrain cost  -0.0293    0.00169         -17.32

Summary statistics
Number of observations = 6768
L(0) = −6964.663
L(β) = −5068.559
ρ̄² = 0.271

Table 5.3: Logit model with alternative-specific cost attributes

The estimation results for this model specification are shown in Table 5.3. All three alternative-specific cost coefficients are significant. The influence of cost differs across alternatives, with a larger negative impact for train than for car and Swissmetro. In this model, the ASCs are negative, implying a preference, all else being equal, for the train alternative. These results differ from those of the previous model, where ASCcar and ASCSM were positive and significant. The larger negative value of ASCcar implies that car is more negatively perceived with respect to train than Swissmetro is. Since the deterministic utilities are very simple, including only three explanatory variables, the alternative specific constants can capture various effects. Their signs and magnitudes should therefore be further investigated.

Generic vs. Specific Test

To test whether a coefficient should be generic or alternative-specific, we use the likelihood ratio test (see pages 28 and 164-167 in Ben-Akiva and Lerman, 1985). We compare the log likelihoods of the restricted and unrestricted models of interest. The restricted model has a generic travel cost coefficient across the three alternatives, and the unrestricted model has alternative-specific travel cost coefficients. Hence, the null hypothesis is

$$H_0 : \beta_{\text{car cost}} = \beta_{\text{train cost}} = \beta_{\text{SM cost}}$$

and the test statistic for the null hypothesis is given by

$$-2(L_R - L_U)$$

which is asymptotically distributed as χ² with $df = K_U - K_R$ degrees of freedom, where $K_U$ and $K_R$ are the numbers of estimated parameters in the unrestricted and restricted models, respectively. We reject the null hypothesis that the restrictions are true if

$$-2(L_R - L_U) > \chi^2_{(1-\alpha),\,df}$$

where α is the level of significance. In this specific case, $K_U = 7$ and $K_R = 5$, so df = 2, and using α = 0.05 yields

$$-2(-5315.386 + 5068.559) = 493.654 > 5.991$$

We can therefore reject the null hypothesis and conclude that the travel cost coefficient should be alternative-specific.
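The decision rule above can be sketched in Python. With two degrees of freedom the χ² upper-tail probability is exp(−x/2), so the critical value has the closed form −2 ln(α); the sketch relies on this and therefore only covers df = 2. The function names are illustrative.

```python
import math

def chi2_critical_df2(alpha):
    """Upper critical value of a chi-square with 2 df: solves exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)

def likelihood_ratio_test(ll_r, ll_u, k_r, k_u, alpha=0.05):
    """Return (statistic, critical value, reject H0) for a test with df = 2."""
    df = k_u - k_r
    if df != 2:
        raise ValueError("the closed-form critical value used here only covers df = 2")
    stat = -2.0 * (ll_r - ll_u)
    critical = chi2_critical_df2(alpha)
    return stat, critical, stat > critical

# Log likelihoods and parameter counts from Tables 5.2 and 5.3
stat, critical, reject = likelihood_ratio_test(-5315.386, -5068.559, k_r=5, k_u=7)
print(f"LR = {stat:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```

For other degrees of freedom the critical value would normally be read from a χ² table or obtained from a statistics library.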

Model Specification with Socio-Economic Characteristics

Files to use with Biogeme:
Model file: MNL SM socioec.mod
Data file: swissmetro.dat


To capture the average of the differences between the individuals in the sample, we make use of socio-economic characteristics. These variables do not change over the choice set and are individual specific. In this example, we add two variables to the model: a dummy variable (SENIOR) for senior people (age above 65) and a dummy variable that captures the effect of the Swiss annual season ticket for train (GA). A few observations, where the variable AGE is unknown (coded as 6), are removed from the estimation. The deterministic utilities are:

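$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{time}\,\text{CAR TT} + \beta_{\text{car cost}}\,\text{CAR CO} + \beta_{senior}\,\text{SENIOR} \\
V_{train} &= \beta_{time}\,\text{TRAIN TT} + \beta_{\text{train cost}}\,\text{TRAIN COST} + \beta_{he}\,\text{TRAIN HE} + \beta_{ga}\,\text{GA} \\
V_{SM} &= ASC_{SM} + \beta_{time}\,\text{SM TT} + \beta_{\text{SM cost}}\,\text{SM COST} + \beta_{he}\,\text{SM HE} + \beta_{senior}\,\text{SENIOR} + \beta_{ga}\,\text{GA}
\end{aligned}
$$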

The estimation results for this model are shown in Table 5.4. The coefficients of the socio-economic variables have been estimated and are significantly different from zero at a 95% confidence level. The negative sign of the age coefficient (referring to the SENIOR dummy variable) reflects a preference of older individuals for the train alternative. It seems a reasonable conclusion, dictated probably by safety reasons with respect to the car choice and a kind of "inertia" with respect to the modal innovation represented by the Swissmetro alternative. The coefficient related to the ownership of the Swiss annual season ticket (GA) is positive, as expected. It reflects a preference for the SM and train alternatives with respect to car, given that the traveler possesses a season ticket. Finally, the interpretation of the alternative specific constants is similar to that of the previous model specification.


Logit model with socio-economic variables
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar       -0.608     0.143            -4.24
2          ASCSM        -0.135     0.106            -1.26
3          βcar cost    -0.00936   0.00117          -8.02
4          βhe          -0.00586   0.00106          -5.55
5          βSM cost     -0.0104    0.000744        -14.02
6          βtime        -0.0111    0.00121          -9.20
7          βtrain cost  -0.0268    0.00176         -15.24
8          βsenior      -1.88      0.109           -17.31
9          βga           0.557     0.191             2.91

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −4927.167
ρ̄² = 0.291

Table 5.4: Logit model with socio-economic variables


5.3 Choice of Residential Telephone Services Case

Model Specification with Generic Attributes

Files to use with Biogeme:
Model file: MNL Tel generic.mod
Data file: telephone.dat

In this example, we model the household's choice of service option for local telephone services. The choice variable (dependent variable) includes the following alternatives: budget measured (BM), standard measured (SM), local flat (LF), extended flat (EF) and metro flat (MF). In this first model, we assume that the cost of the calling plan is the only factor influencing the choice of the calling plan. We also assume that the coefficients of the explanatory variables are generic, i.e. they do not vary among the alternatives. The expressions of the utilities for this simple model can be written as:

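$$
\begin{aligned}
V_{BM} &= ASC_{BM} + \beta_{cost}\,\ln(\text{cost BM}) \\
V_{SM} &= \beta_{cost}\,\ln(\text{cost SM}) \\
V_{LF} &= ASC_{LF} + \beta_{cost}\,\ln(\text{cost LF}) \\
V_{EF} &= ASC_{EF} + \beta_{cost}\,\ln(\text{cost EF}) \\
V_{MF} &= ASC_{MF} + \beta_{cost}\,\ln(\text{cost MF})
\end{aligned}
$$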

Here we have included the natural logarithm of the cost in order to better capture differences in cost among alternatives.
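One consequence of the logarithmic specification is that the utility difference between two plans depends on the ratio of their costs rather than on the absolute difference. A minimal sketch, using the cost coefficient reported in Table 5.5 and hypothetical monthly costs:

```python
import math

B_COST = -2.03  # generic cost coefficient from Table 5.5

def cost_utility(cost):
    """Cost term of the utility, beta_cost * ln(cost)."""
    return B_COST * math.log(cost)

# Hypothetical costs: the same 2:1 ratio with different absolute gaps
diff_low = cost_utility(10.0) - cost_utility(5.0)
diff_high = cost_utility(40.0) - cost_utility(20.0)

print(diff_low, diff_high)  # both equal beta_cost * ln(2)
```

Doubling the cost of a plan therefore lowers its utility by the same amount whether the plan is cheap or expensive.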

The estimation results are shown in Table 5.5. The results indicate that, all else being equal, the budget measured (BM) alternative is the least desired alternative and the metro area flat (MF) is the most preferred alternative. The alternative specific constant for the extended flat (EF) alternative is not significantly different from zero, as shown by the related t statistic. The sign of the cost coefficient is negative, as expected, meaning that the utility of an alternative decreases as its cost increases.


Model Specification with Alternative-Specific Attributes

Files to use with Biogeme:
Model file: MNL Tel specific.mod
Data file: telephone.dat

In this second specification, we relax the hypothesis of generic coefficients. To illustrate this idea, two different cost coefficients are introduced, one for the flat alternatives and the other for the measured alternatives. The corresponding utility functions are shown below:

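$$
\begin{aligned}
V_{BM} &= ASC_{BM} + \beta_{\text{M cost}}\,\ln(\text{cost BM}) \\
V_{SM} &= \beta_{\text{M cost}}\,\ln(\text{cost SM}) \\
V_{LF} &= ASC_{LF} + \beta_{\text{F cost}}\,\ln(\text{cost LF}) \\
V_{EF} &= ASC_{EF} + \beta_{\text{F cost}}\,\ln(\text{cost EF}) \\
V_{MF} &= ASC_{MF} + \beta_{\text{F cost}}\,\ln(\text{cost MF})
\end{aligned}
$$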

The estimation results are shown in Table 5.6. In this case, separate cost coefficients are estimated for the flat and the measured alternatives. Both signs are negative, as expected, and the larger absolute value of βM cost indicates that people are more sensitive to cost in the case of measured alternatives. The value and the sign of the budget measured alternative specific constant still indicate that this option is the least desired, all else being equal. The ASCs of the flat options are not significant.

Generic vs. Specific Test

The likelihood ratio test (see pages 28 and 164-167 in Ben-Akiva and Lerman, 1985) can be used to test a generic versus an alternative-specific model specification. The likelihood ratio test statistic for the null hypothesis of generic attributes is

$$-2(L(\beta_R) - L(\beta_U))$$

where R and U denote the restricted (generic) and unrestricted (alternative-specific) models, respectively. It is χ² distributed with the number of degrees of freedom equal to the number of restrictions ($K_U - K_R$), where $K_U$ and $K_R$ are the numbers of estimated coefficients in the unrestricted and restricted models, respectively. In this case, −2(−477.557 + 476.608) = 1.898. Since this is below the critical value $\chi^2_{0.95,1} = 3.841$ at the 95% level of confidence, the null hypothesis of a generic cost coefficient cannot be rejected. The restricted model should therefore be preferred.

Model Specification with Socio-Economic Characteristics

Files to use with Biogeme:
Model file: MNL Tel socioec.mod
Data file: telephone.dat

The previous two models only include variables that are attributes of the alternatives. We now introduce a socio-economic characteristic, namely the number of users in the household (users), in the utility of the flat options. It should be noted that the socio-economic variables do not vary among the alternatives and are individual specific. The utility functions can now be written as follows:

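$$
\begin{aligned}
V_{BM} &= ASC_{BM} + \beta_{\text{M cost}}\,\ln(\text{cost BM}) \\
V_{SM} &= \beta_{\text{M cost}}\,\ln(\text{cost SM}) \\
V_{LF} &= ASC_{LF} + \beta_{\text{F cost}}\,\ln(\text{cost LF}) + \beta_{users}\,\text{users} \\
V_{EF} &= ASC_{EF} + \beta_{\text{F cost}}\,\ln(\text{cost EF}) + \beta_{users}\,\text{users} \\
V_{MF} &= ASC_{MF} + \beta_{\text{F cost}}\,\ln(\text{cost MF}) + \beta_{users}\,\text{users}
\end{aligned}
$$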

The estimation results are shown in Table 5.7. The coefficient of the users variable is statistically significantly different from zero and indicates that households have a stronger preference for the flat options when the number of users is higher (as expected). The interpretation of the other coefficients remains the same as in the previous model specifications.
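The size of this effect can be read from the logit odds. In a logit model the ratio of two choice probabilities is exp(V_i − V_j), so each additional user multiplies the odds of any flat option relative to any measured option by exp(βusers), all else being equal. A minimal sketch with the estimate of Table 5.7 (the function name is illustrative):

```python
import math

B_USERS = 0.394  # estimate from Table 5.7

def odds_multiplier(extra_users):
    """Factor applied to the odds of a flat option versus a measured option."""
    return math.exp(B_USERS * extra_users)

one_more = odds_multiplier(1)
two_more = odds_multiplier(2)
print(round(one_more, 2), round(two_more, 2))
```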


Logit model with generic attributes
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCBM        -0.721     0.152            -4.76
2          ASCLF         1.20      0.159             7.56
3          ASCEF         1.00      0.703             1.42
4          ASCMF         1.74      0.267             6.51
5          βcost        -2.03      0.212            -9.55

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −477.557
ρ̄² = 0.139

Table 5.5: Logit model with generic attributes

Logit model with alternative specific attributes
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCBM        -0.747     0.155            -4.82
2          ASCLF         0.155     0.691             0.22
3          ASCEF        -0.0920    1.00             -0.09
4          ASCMF         0.479     0.817             0.59
5          βM cost      -2.16      0.243            -8.90
6          βF cost      -1.71      0.273            -6.25

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −476.608
ρ̄² = 0.139

Table 5.6: Logit model with alternative-specific attributes


Logit model with socio-economic characteristics
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCBM        -0.731     0.153            -4.77
2          ASCLF        -0.0871    0.700            -0.12
3          ASCEF        -0.319     1.02             -0.31
4          ASCMF         0.274     0.830             0.33
5          βusers        0.394     0.108             3.63
6          βM cost      -1.96      0.246            -7.96
7          βF cost      -1.79      0.286            -6.25

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −468.791
ρ̄² = 0.151

Table 5.7: Logit model with socio-economic characteristics


5.4 Airline Itinerary Case

Logit model with Generic Attributes

Files to use with Biogeme:
Model file: MNL airline generic.mod
Data file: airline.dat

The choice set consists of the following three alternatives:

1. a non-stop flight,

2. a flight with one stop on the same airline,

3. a flight with one stop and a change of airline.

We define the deterministic part of the utility of each alternative by including the alternative specific constants (ASCs) and five attributes, namely fare (in units of $100, in order to reduce numerical issues), legroom, total travel time (Total TT), and early and late schedule delays (SchedDE and SchedDL), with their respective generic coefficients βFare, βLegroom, βTotal TT, βSchedDE and βSchedDL:

$$
\begin{aligned}
V_1 &= ASC_1 + \beta_{Fare}\,\text{Fare}_1 + \beta_{Legroom}\,\text{Legroom}_1 + \beta_{\text{Total TT}}\,\text{Total TT}_1 \\
&\quad + \beta_{SchedDE}\,\text{SchedDE}_1 + \beta_{SchedDL}\,\text{SchedDL}_1 \\
V_2 &= ASC_2 + \beta_{Fare}\,\text{Fare}_2 + \beta_{Legroom}\,\text{Legroom}_2 + \beta_{\text{Total TT}}\,\text{Total TT}_2 \\
&\quad + \beta_{SchedDE}\,\text{SchedDE}_2 + \beta_{SchedDL}\,\text{SchedDL}_2 \\
V_3 &= ASC_3 + \beta_{Fare}\,\text{Fare}_3 + \beta_{Legroom}\,\text{Legroom}_3 + \beta_{\text{Total TT}}\,\text{Total TT}_3 \\
&\quad + \beta_{SchedDE}\,\text{SchedDE}_3 + \beta_{SchedDL}\,\text{SchedDL}_3
\end{aligned}
$$

One of the alternative specific constants (arbitrarily ASC1) is normalized to zero for identification. The corresponding alternative is the reference alternative for the ASCs. This is important for the interpretation we will perform in the next paragraphs.

Given our specification, and everything else being equal, an ASC with a negative sign indicates a lower utility level for the corresponding alternative compared to the normalized one (i.e., the first one). As can be observed in Table 5.8, this is the case for both other alternatives (ASC2 and ASC3 are negative and statistically significant). It means that alternative 1 is preferred to alternatives 2 and 3, i.e., the alternative without a stop is preferred to the alternatives with stops, all other things being equal.

Generic logit model estimation
Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASC2         -1.26      0.126            -9.95
2          ASC3         -1.49      0.127           -11.72
3          βFare        -0.0194    0.000795        -24.37
4          βLegroom      0.222     0.0266            8.35
5          βSchedDE     -0.130     0.0161           -8.08
6          βSchedDL     -0.0883    0.0145           -6.10
7          βTotal TT    -0.326     0.0671           -4.85
...

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2333.701
ρ̄² = 0.410

Table 5.8: Logit model with generic attributes

The parameter related to leg room has a positive sign and is significantly different from zero. It implies that more leg room increases the utility of the alternative. For the other parameters (fare, delays and travel time), the sign is negative. It means that all these factors have a negative impact on utility: they make the alternative less likely to be chosen.

Logit model with Alternative-Specific Coefficients

Files to use with Biogeme:
Model file: MNL airline specific.mod
Data file: airline.dat

Next we present an (unrestricted) model with alternative-specific travel time coefficients and compare it with the (restricted) model with generic coefficients presented in the previous section. We carry out a statistical test (likelihood ratio test) to assess whether one specification is significantly better than the other. We perform the analysis on the coefficient of the travel time. The deterministic utilities for this model with alternative-specific travel time coefficients are:

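$$
\begin{aligned}
V_1 &= ASC_1 + \beta_{Fare}\,\text{Fare}_1 + \beta_{Legroom}\,\text{Legroom}_1 + \beta_{\text{Total TT 1}}\,\text{Total TT}_1 \\
&\quad + \beta_{SchedDE}\,\text{SchedDE}_1 + \beta_{SchedDL}\,\text{SchedDL}_1 \\
V_2 &= ASC_2 + \beta_{Fare}\,\text{Fare}_2 + \beta_{Legroom}\,\text{Legroom}_2 + \beta_{\text{Total TT 2}}\,\text{Total TT}_2 \\
&\quad + \beta_{SchedDE}\,\text{SchedDE}_2 + \beta_{SchedDL}\,\text{SchedDL}_2 \\
V_3 &= ASC_3 + \beta_{Fare}\,\text{Fare}_3 + \beta_{Legroom}\,\text{Legroom}_3 + \beta_{\text{Total TT 3}}\,\text{Total TT}_3 \\
&\quad + \beta_{SchedDE}\,\text{SchedDE}_3 + \beta_{SchedDL}\,\text{SchedDL}_3
\end{aligned}
$$

Note that instead of only βTotal TT, we now have βTotal TT 1, βTotal TT 2 and βTotal TT 3.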

The results for the unrestricted model are reported in Table 5.9.

Generic vs Specific Test Under the null hypothesis:

H0 : βTotal TT 1 = βTotal TT 2 = βTotal TT 3

We reject null hypothesis (generic travel time coefficient) if :

−2(LR − LU) > χ((1−α),df

Next we describe the standard steps to perform the test:

1. LR and LU represent the log-likelihood for both the restricted and theunrestricted models:

LR = −2333.701

LU = −2320.447

2. The degree of freedom is given by the difference in the number of esti-mated parameters between the models:

df = KU − KR = 9− 7 = 2

80

Page 85: 220714279-Workbook-2012

airline itinerary case 81

Logit model estimation

Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASC2         -1.43      0.183           -7.81
2          ASC3         -1.64      0.192           -8.53
3          βFare        -0.0193    0.000802        -24.05
4          βLegroom     0.226      0.0267          8.45
5          βSchedDE     -0.139     0.0163          -8.53
6          βSchedDL     -0.104     0.0137          -7.59
7          βTotal TT1   -0.332     0.0735          -4.52
8          βTotal TT2   -0.299     0.0696          -4.29
9          βTotal TT3   -0.302     0.0699          -4.31

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2320.447
$\bar{\rho}^2$ = 0.412

Table 5.9: Logit model with alternative-specific travel-time attributes


3. −2(LR − LU) = −2(−2333.701+ 2320.447) = 26.508

4. The critical value $\chi^2_{0.95,2}$ is 5.991.

5. Since 26.508 > 5.991, we reject the null hypothesis H0 of a generic coefficient in favor of alternative-specific coefficients.
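The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Biogeme workflow: the function names `chi2_sf` and `likelihood_ratio_test` are hypothetical, and the chi-square tail probability is computed from the standard closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def likelihood_ratio_test(ll_restricted, ll_unrestricted, df, alpha=0.05):
    """Return the statistic -2(L_R - L_U), its p-value and the reject decision."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    p_value = chi2_sf(stat, df)
    return stat, p_value, p_value < alpha

# Log-likelihoods of the generic (restricted) and alternative-specific (unrestricted) models
stat, p_value, reject = likelihood_ratio_test(-2333.701, -2320.447, df=9 - 7)
print(round(stat, 3), reject)
```

Rejecting when the p-value is below α is equivalent to comparing the statistic with the critical value in step 4.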

Inclusion of Socio-Economic Characteristics

Files to use with Biogeme:
Model file: MNL airline socioecon.mod
Data file: airline.dat

It is reasonable to assume that people make choices not only in relation to the attributes that characterize the alternatives but also depending on personal characteristics or socio-economic indicators. The availability of individual-specific information gives us the opportunity to model part of the heterogeneity present in the population. We modify the previous model by adding the income of the respondents to the utilities.

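$$
\begin{aligned}
V_1 &= \text{ASC}_1 + \beta_{\text{Fare}} \cdot \text{Fare}_1 + \beta_{\text{Legroom}} \cdot \text{Legroom}_1 + \beta_{\text{Total TT 1}} \cdot \text{Total TT}_1 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{SchedDE}_1 + \beta_{\text{SchedDL}} \cdot \text{SchedDL}_1 + \beta_{\text{Inc}1} \cdot \text{Income} \\
V_2 &= \text{ASC}_2 + \beta_{\text{Fare}} \cdot \text{Fare}_2 + \beta_{\text{Legroom}} \cdot \text{Legroom}_2 + \beta_{\text{Total TT 2}} \cdot \text{Total TT}_2 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{SchedDE}_2 + \beta_{\text{SchedDL}} \cdot \text{SchedDL}_2 + \beta_{\text{Inc}2} \cdot \text{Income} \\
V_3 &= \text{ASC}_3 + \beta_{\text{Fare}} \cdot \text{Fare}_3 + \beta_{\text{Legroom}} \cdot \text{Legroom}_3 + \beta_{\text{Total TT 3}} \cdot \text{Total TT}_3 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{SchedDE}_3 + \beta_{\text{SchedDL}} \cdot \text{SchedDL}_3 + \beta_{\text{Inc}3} \cdot \text{Income}
\end{aligned}
$$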

Since income does not vary across the alternatives and only differences in utilities matter, we need to normalize the income coefficient of one alternative to zero. We interpret the estimated coefficients for the remaining alternatives relative to the reference alternative, which we arbitrarily choose to be alternative 1. This is similar to what we did when specifying alternative-specific constants.

We assume that the income of the respondent affects each alternative differently.
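The need for normalization can be checked numerically. The sketch below uses hypothetical utilities and hypothetical "unnormalized" income coefficients (none of these numbers are estimates from the workbook) to show that subtracting the coefficient of alternative 1 from all three income coefficients leaves the logit choice probabilities unchanged, so only two of the three coefficients can be identified.

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit probabilities, computed in a numerically stable way."""
    m = max(utilities)
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values, for illustration only
income = 5.0
base_utilities = [0.2, -1.0, -1.1]      # everything except the income term
beta_income = [0.03, -0.0119, -0.0455]  # one income coefficient per alternative

# Normalize: subtract the coefficient of alternative 1 from all coefficients
beta_normalized = [b - beta_income[0] for b in beta_income]

p_full = logit_probabilities(
    [v + b * income for v, b in zip(base_utilities, beta_income)])
p_norm = logit_probabilities(
    [v + b * income for v, b in zip(base_utilities, beta_normalized)])

print([round(p, 4) for p in p_full])
print([round(p, 4) for p in p_norm])
```

Both printed lists agree, because the normalization shifts every utility by the same amount for a given respondent.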

The estimation results of this model are reported in Table 5.10.


Logit model estimation

Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASC2         -1.07      0.215           -4.96
2          ASC3         -1.05      0.228           -4.61
3          βFare        -0.0195    0.000807        -24.18
4          βIncome2     -0.0419    0.0148          -2.83
5          βIncome3     -0.0755    0.0154          -4.90
6          βLegroom     0.227      0.0268          8.49
7          βMI          -0.578     0.159           -3.64
8          βSchedDE     -0.139     0.0163          -8.50
9          βSchedDL     -0.104     0.0139          -7.49
10         βTotal TT1   -0.335     0.0735          -4.56
11         βTotal TT2   -0.301     0.0696          -4.32
12         βTotal TT3   -0.304     0.0698          -4.36

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2307.488
$\bar{\rho}^2$ = 0.415

Table 5.10: Logit model with socio-economic variables


Therefore we have specified two different β parameters associated with the attribute “income”; βInc for alternative 1 has been normalized to zero. The two parameter estimates have negative signs, implying that the higher the income of the respondent, the lower the probability of choosing these two alternatives (with stops) compared to the first one (without stops).

In this model, we need to deal with missing data for income, which is coded as -1 or 99. We define “Income” as the income variable without the values -1 and 99. One solution would be to exclude the observations with missing income (-1 and 99) from the whole data set, using the [Exclude] section, which tells Biogeme which observations to ignore.

A second and better solution consists in defining another variable, called “MissingIncome” (MI). “MissingIncome” is equal to 1 if the income variable is -1 or 99, and 0 otherwise. With this approach we no longer exclude any observation because of missing income: the [Exclude] section is left unchanged, and the new variable is added to the utility function. Its coefficient is reported as βMI in Table 5.10.
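The recoding behind the second solution can be sketched as follows. This is an illustration only: the function name `recode_income` is hypothetical, and setting “Income” to 0 when the raw value is missing is an assumption, since the text does not state which value the cleaned variable takes for those observations.

```python
MISSING_CODES = {-1, 99}  # codes used for missing income in the data

def recode_income(raw_income):
    """Return (income, missing_income) for one observation.

    Assumption: a missing income is replaced by 0 so that only the
    MissingIncome dummy contributes to the utility for that respondent.
    """
    if raw_income in MISSING_CODES:
        return 0, 1
    return raw_income, 0

# Hypothetical raw values, for illustration only
for raw in (3, -1, 99, 7):
    print(raw, recode_income(raw))
```

With this coding every observation stays in the sample, and the coefficient of the dummy absorbs the average effect of not reporting income.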


Chapter 6

Specification Testing

The topic of this case study is the testing of different hypotheses regarding both model specifications and structures. The objectives can be summarized as follows:

• Illustration of the market segmentation concept and related testing.

• Explanation of the McFadden IIA test to test the assumption of independence between alternatives.

• Testing of non-nested hypotheses using the Cox test.

• Testing of non-linear specifications using the piecewise linear approximation, the power series expansion and the Box-Cox transformation methods.

For this case study, you can choose between the Swissmetro, the Residential Telephone Services and the Airline Itinerary datasets. A detailed description of each dataset can be found in Appendix A.

Before starting the case study, read the general introduction to the case studies on page 17. The introduction discusses how to go through the case study and gives you some guidelines on the model building process.

The examples of model specifications that we have provided can be found in the following sections: Swissmetro in section 6.1, Residential Telephone Services in section 6.2 and Airline Itinerary in section 6.3.


6.1 Swissmetro Case

Market Segmentation

Files to use with Biogeme:
Model files: SpecTest SM male.mod, SpecTest SM female.mod, SpecTest SM full.mod
Data file: swissmetro.dat

In this example, the segmentation is made on the gender variable. We first create two market segments as follows:

• Male: all observations where MALE=1 belong to this subgroup.

• Female: all observations where MALE=0 belong to this subgroup.

Following the procedure described in Ben-Akiva and Lerman (1985) (pages 194-204), we estimate a model on the full data set. Then we estimate the same model for each gender group separately. Note that we use the [Exclude] section in the model specification file to define which observations should be excluded from the estimation. We obtain the values shown in Table 6.1. The expressions of the utility functions are the same for all models. Note that we define the dummy variable SENIOR, which takes the value 1 for individuals aged above 65 and 0 otherwise.

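$$
\begin{aligned}
V_{\text{car}} &= \text{ASC}_{\text{car}} + \beta_{\text{time}}\,\text{CAR\_TT} + \beta_{\text{car cost}}\,\text{CAR\_CO} + \beta_{\text{senior}}\,\text{SENIOR} \\
V_{\text{train}} &= \beta_{\text{time}}\,\text{TRAIN\_TT} + \beta_{\text{train cost}}\,\text{TRAIN\_COST} + \beta_{\text{he}}\,\text{TRAIN\_HE} + \beta_{\text{ga}}\,\text{GA} \\
V_{\text{SM}} &= \text{ASC}_{\text{SM}} + \beta_{\text{time}}\,\text{SM\_TT} + \beta_{\text{SM cost}}\,\text{SM\_COST} + \beta_{\text{he}}\,\text{SM\_HE} \\
&\quad + \beta_{\text{senior}}\,\text{SENIOR} + \beta_{\text{ga}}\,\text{GA}
\end{aligned}
$$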

The null hypothesis is that there is no taste variation across the market segments:

$$H_0: \beta^{\text{Male}} = \beta^{\text{Female}}$$

Note that in the above equation Male and Female refer to market segments and not to variables in the dataset.


Model             Log likelihood  Number of coefficients
Male              -3680.002       9
Female            -1110.618       9
Restricted model  -4927.167       9

Table 6.1: Values for the market segmentation test

The likelihood ratio test (with 18 − 9 = 9 degrees of freedom) yields

$$
\begin{aligned}
LR &= -2\Big(L_N(\beta) - \sum_{g=1}^{G} L_{N_g}(\beta^g)\Big) \\
&= -2(-4927.167 + 3680.002 + 1110.618) = 273.094
\end{aligned}
$$

$$\chi^2_{0.95,9} = 16.919$$

and we can therefore reject the null hypothesis at a 95% level of confidence.
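The market segmentation test can be sketched in Python as follows. The helper name `segmentation_test` is hypothetical, and the small dictionary of 95% critical values is copied from standard chi-square tables rather than computed.

```python
def segmentation_test(ll_pooled, k_pooled, segments):
    """Likelihood ratio test of a pooled model against segment-specific models.

    segments is a list of (log_likelihood, number_of_coefficients) pairs,
    one pair per market segment.
    """
    ll_segments = sum(ll for ll, _ in segments)
    k_segments = sum(k for _, k in segments)
    stat = -2.0 * (ll_pooled - ll_segments)
    df = k_segments - k_pooled
    return stat, df

# 95% chi-square critical values from standard tables (only the df used here)
CRITICAL_95 = {9: 16.919, 13: 22.362}

# Values from Table 6.1: pooled model, then the male and female segments
stat, df = segmentation_test(-4927.167, 9, [(-3680.002, 9), (-1110.618, 9)])
reject = stat > CRITICAL_95[df]
print(round(stat, 3), df, reject)
```

The same helper applies to any number of segments, since the degrees of freedom are obtained by summing the coefficients over the segments and subtracting those of the pooled model.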

McFadden IIA Test

Files to use with Biogeme:
Model files: SpecTest SM socioec bis.mod, SpecTest SM IIA.mod
Data files: swissmetro.dat, swissmetro exclude.dat
Command file: doit.bat
Supplementary software: biomerge.exe

We are studying the impact of the modal innovation, represented by the Swissmetro, against traditional transport modes represented by car and train. It seems logical to expect some kind of relationship between the traditional alternatives. They are probably correlated, where the source of this correlation might be the presence of unobserved attributes shared by the car and train alternatives. In order to test this assumption, we follow the procedure described in McFadden (1987) and Train et al. (1989). The procedure is semi-automatic in Biogeme. First we estimate a logit model (SpecTest SM socioec bis.mod) on the full data set swissmetro.dat. The specification file SpecTest SM socioec bis.mod contains a section describing the correlation we want to test. The corresponding Biogeme snapshot is shown in Figure 6.1. Alternative 1 corresponds to train, and 3 to car.

[IIATest]

C13 1 3

Figure 6.1: Biogeme snapshot: IIATest section

Then the estimated model is applied to the same data file, using BioSim. Because the section [IIATest] is defined in the original .mod file, auxiliary variables are automatically computed for each observation and reported in the .enu output file. The original .dat file and the .enu file are merged using BIOMERGE in order to create a new data file. In fact, to do the merging we use swissmetro exclude.dat, because some observations are excluded in the original estimation. Now we specify a new model (SpecTest SM IIA.mod) which includes the auxiliary variables in the utility functions associated with train and car. Finally, we estimate this model on the new data file created by merging. We show the estimation results in Table 6.2. Note that the entire procedure described above can be carried out automatically using the command file doit.bat.

The focus in this test is not the sign of the estimated IIA parameter. What matters is the value of the t-statistic of this coefficient. βIIA is significantly different from 0 at a 95% level of confidence (t-statistic of 2.35 in Table 6.2). This indicates that the IIA property does not hold for the car and train alternatives. This kind of correlation can be captured with GEV models, which are treated in one of the case studies (Chapter 8).

Note that we can also do a likelihood ratio test for the null hypothesis H0 : βIIA = 0. The test statistic for the null hypothesis is given by

−2(LR − LU) = −2(−5245.512 + 5237.543) = 15.938

where the restricted model is the model without the auxiliary variables (SpecTest SM socioec bis.mod) and the unrestricted model is the model with the auxiliary variables. The test statistic is asymptotically χ2 distributed with 1 degree of freedom since there is 1 restriction. Since 15.938 > 3.841 (the critical value of the χ2 distribution with 1 degree of freedom at a 95% level of confidence), we reject the null hypothesis and conclude that the IIA property does not hold for the car and train alternatives.
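Both versions of the IIA test can be reproduced from the reported numbers. The sketch below is only an arithmetic check with a hypothetical helper name; it uses the estimate and robust standard error of βIIA from Table 6.2 and the two log-likelihoods quoted in the text.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t statistic of an estimate against a hypothesized value."""
    return (estimate - null_value) / std_error

# t-test on the auxiliary coefficient (estimate and robust std. error, Table 6.2)
t_iia = t_statistic(0.301, 0.128)

# Likelihood ratio test: model without vs with the auxiliary variables
lr_iia = -2.0 * (-5245.512 - (-5237.543))

T_CRITICAL_95 = 1.96      # two-sided normal critical value
CHI2_CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom

print(round(t_iia, 2), abs(t_iia) > T_CRITICAL_95)
print(round(lr_iia, 3), lr_iia > CHI2_CRITICAL_95)
```

Both tests lead to the same decision here, which is expected since they test the same single restriction.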


Logit model for car/train IIA test

Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar       0.217      0.159           1.37
2          ASCSM        0.486      0.129           3.76
3          βcost        -0.00121   0.000116        -10.40
4          βcar time    -0.0103    0.000965        -10.69
5          βtrain time  -0.0118    0.00116         -10.11
6          βSM time     -0.0112    0.00168         -6.65
7          βhe          -0.00516   0.00111         -4.65
8          βga          6.66       0.703           9.48
9          βIIA         0.301      0.128           2.35

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5237.543
$\bar{\rho}^2$ = 0.246

Table 6.2: Logit model for IIA test


Test of Non-Nested Hypotheses

Files to use with Biogeme:
Model files: SpecTest SM M1.mod, SpecTest SM M2.mod, SpecTest SM MC.mod
Data file: swissmetro.dat

In discrete choice analysis, we often perform tests based on so-called nested hypotheses, which means that we specify two models such that the first one (the restricted model) is a special case of the second one (the unrestricted model). For this type of comparison, the classical likelihood ratio test can be applied. However, there are situations in which we want to compare models that are not nested, meaning that one model cannot be obtained as a restricted version of the other. One way to compare two non-nested models is to build a composite model from which both models can be derived. We can then perform two likelihood ratio tests, one for each of the restricted models against the composite model. This procedure is known as the Cox test of separate families of hypotheses.

Cox Test

The Cox test is described in detail in Ben-Akiva and Lerman (1985), pages 171-174, and in the Textbook of the course, in section “Tests of Non-Nested Hypothesis”. Assume that we want to test a model M1 against another model M2 (and one model is not a restricted version of the other). We start by generating a composite model MC such that both models M1 and M2 are restricted cases of MC. We then test M1 against MC and M2 against MC using the likelihood ratio test. There are three possible outcomes of this test:

• One of the two models is rejected. Then we keep the one that is not rejected.

• Both models are rejected. Then better models should be developed. The composite model could be used as a new basis for future specifications.

• Both models are accepted. Then we choose the model with the higher $\bar{\rho}^2$ index.


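We show next the expressions of the utility functions used for the three different models M1, M2 and MC. M1 has the following systematic utilities

$$
\begin{aligned}
V_{\text{car}} &= \text{ASC}_{\text{car}} + \beta_{\text{car time}}\,\text{CAR\_TT} + \beta_{\text{car cost}}\,\text{CAR\_CO} \\
V_{\text{train}} &= \beta_{\text{train time}}\,\text{TRAIN\_TT} + \beta_{\text{train cost}}\,\text{TRAIN\_CO} \\
V_{\text{SM}} &= \text{ASC}_{\text{SM}} + \beta_{\text{SM time}}\,\text{SM\_TT} + \beta_{\text{SM cost}}\,\text{SM\_CO}
\end{aligned}
$$

where both the time and cost related coefficients are alternative specific. The systematic utilities of M2 are

$$
\begin{aligned}
V_{\text{car}} &= \text{ASC}_{\text{car}} + \beta_{\text{time}}\,\text{CAR\_TT} + \beta_{\text{car cost}}\,\text{CAR\_CO} \\
V_{\text{train}} &= \beta_{\text{time}}\,\text{TRAIN\_TT} + \beta_{\text{train cost}}\,\text{TRAIN\_CO} + \beta_{\text{he}}\,\text{TRAIN\_HE} + \beta_{\text{ga}}\,\text{GA} \\
V_{\text{SM}} &= \text{ASC}_{\text{SM}} + \beta_{\text{time}}\,\text{SM\_TT} + \beta_{\text{SM cost}}\,\text{SM\_CO} + \beta_{\text{he}}\,\text{SM\_HE} + \beta_{\text{ga}}\,\text{GA}
\end{aligned}
$$

where only the cost related coefficient is assumed to be alternative specific, the headway of train and SM has been added, and one socio-economic variable has been added to the model. We now define the composite model MC with the following systematic utilities

$$
\begin{aligned}
V_{\text{car}} &= \text{ASC}_{\text{car}} + \beta_{\text{car time}}\,\text{CAR\_TT} + \beta_{\text{car cost}}\,\text{CAR\_CO} \\
V_{\text{train}} &= \beta_{\text{train time}}\,\text{TRAIN\_TT} + \beta_{\text{train cost}}\,\text{TRAIN\_CO} + \beta_{\text{he}}\,\text{TRAIN\_HE} + \beta_{\text{ga}}\,\text{GA} \\
V_{\text{SM}} &= \text{ASC}_{\text{SM}} + \beta_{\text{SM time}}\,\text{SM\_TT} + \beta_{\text{SM cost}}\,\text{SM\_CO} + \beta_{\text{he}}\,\text{SM\_HE} + \beta_{\text{ga}}\,\text{GA}
\end{aligned}
$$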

In Table 6.3, we summarize the differences between the various models, and we show in Tables 6.4, 6.5 and 6.6 the estimation results for the M1, M2 and MC models, respectively.

At this point, we can apply the likelihood ratio test for M1 against MC. In this case, the null hypothesis is:

$$H_0: \beta_{\text{he}} = \beta_{\text{ga}} = 0$$


Models used for the Cox test

Model  Parameters  Description
M1     8           two ASC’s, three alternative specific time coefficients and three alternative specific cost coefficients
M2     8           two ASC’s, one generic time coefficient, three alternative specific cost coefficients, one generic headway coefficient and one socio-economic coefficient
MC     10          two ASC’s, three alternative specific time coefficients, three alternative specific cost coefficients, one generic headway coefficient and one socio-economic coefficient

Table 6.3: Summary of the different model specifications

M1 model: estimation results

Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar       -0.260     0.138           -1.89
2          ASCSM        0.113      0.106           1.06
3          βcar cost    -0.00785   0.00149         -5.26
4          βtrain cost  -0.0308    0.00193         -15.98
5          βSM cost     -0.0113    0.000790        -14.24
6          βcar time    -0.0129    0.00163         -7.91
7          βtrain time  -0.00870   0.00118         -7.34
8          βSM time     -0.0112    0.00178         -6.25

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5065.901
$\bar{\rho}^2$ = 0.271

Table 6.4: Estimation results for the M1 model


M2 model: estimation results

Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar       -0.872     0.140           -6.24
2          ASCSM        -0.410     0.103           -3.99
3          βcar cost    -0.00934   0.00116         -8.02
4          βtrain cost  -0.0284    0.00176         -16.08
5          βSM cost     -0.0104    0.000743        -13.99
6          βtime        -0.0111    0.00120         -9.22
7          βhe          -0.00533   0.00102         -5.25
8          βga          0.521      0.191           2.72

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5055.843
$\bar{\rho}^2$ = 0.272

Table 6.5: Estimation results for the M2 model


MC model: estimation results

Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar       -0.529     0.158           -3.35
2          ASCSM        -0.126     0.116           -1.08
3          βcar cost    -0.00776   0.00150         -5.18
4          βtrain cost  -0.0300    0.00200         -14.97
5          βSM cost     -0.0108    0.000828        -12.99
6          βcar time    -0.0129    0.00162         -7.94
7          βtrain time  -0.00866   0.00120         -7.22
8          βSM time     -0.0111    0.00179         -6.19
9          βhe          -0.00535   0.00101         -5.31
10         βga          0.513      0.193           2.65

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5047.205
$\bar{\rho}^2$ = 0.273

Table 6.6: Estimation results for the MC model


As usual, −2(L(M1) − L(MC)) is χ2 distributed with K = 2 degrees of freedom. In this case, we have:

−2(−5065.901 + 5047.205) = 37.392 > 5.991

The result of this first test is that we can reject the null hypothesis. Applying the same test for M2 against MC, we have

$$H_0: \beta_{\text{car time}} = \beta_{\text{train time}} = \beta_{\text{SM time}}$$

In this case, the likelihood ratio test with K = 2 degrees of freedom gives

−2(−5055.843 + 5047.205) = 17.276 > 5.991

and we can therefore reject the null hypothesis in this case as well. Since both models are rejected, better models should be developed. If both models were accepted, we would choose the one with the higher $\bar{\rho}^2$ index.
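The decision logic of the Cox test can be written down compactly. The following sketch uses hypothetical function names and the final log-likelihoods from Tables 6.4, 6.5 and 6.6; the critical value is the tabulated 95% chi-square value for 2 degrees of freedom.

```python
def lr_statistic(ll_restricted, ll_unrestricted):
    """Likelihood ratio statistic -2(L_R - L_U)."""
    return -2.0 * (ll_restricted - ll_unrestricted)

def cox_outcome(reject_m1, reject_m2):
    """Map the two test results to the three outcomes listed in the text."""
    if reject_m1 and reject_m2:
        return "both rejected: develop better models"
    if reject_m1:
        return "keep M2"
    if reject_m2:
        return "keep M1"
    return "both accepted: choose the higher adjusted rho-squared"

CRITICAL_95_DF2 = 5.991  # chi-square critical value, 2 degrees of freedom

LL_M1, LL_M2, LL_MC = -5065.901, -5055.843, -5047.205  # Tables 6.4, 6.5, 6.6

stat_m1 = lr_statistic(LL_M1, LL_MC)
stat_m2 = lr_statistic(LL_M2, LL_MC)
outcome = cox_outcome(stat_m1 > CRITICAL_95_DF2, stat_m2 > CRITICAL_95_DF2)
print(round(stat_m1, 3), round(stat_m2, 3), outcome)
```

Both tests happen to have 2 degrees of freedom here; in general each test uses the difference in the number of parameters between the composite model and the model being tested.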

Tests of Non-Linear Specifications

Files to use with Biogeme:
Model files: SpecTest SM piecewise.mod, SpecTest SM powerseries.mod, SpecTest SM boxcox.mod
Data file: swissmetro.dat

In the previous case study, the models were specified with linear-in-parameter formulations of the deterministic parts of the utilities (i.e. parameters that remain constant throughout the whole range of the values of each variable). However, in some cases non-linear specifications may be more justified. In this section, we test three different non-linear specifications of the deterministic utility functions (see Ben-Akiva and Lerman, 1985, pages 174-179), namely the piecewise linear approximation, the power series method and the Box-Cox transformation.


[Expressions]

TRAIN_TT1 = min( TRAIN_TT , 90)

TRAIN_TT2 = max(0,min( TRAIN_TT - 90, 90))

TRAIN_TT3 = max(0,min( TRAIN_TT - 180 , 90))

TRAIN_TT4 = max(0,TRAIN_TT - 270)

Figure 6.2: Biogeme snapshot concerning the piecewise variables definition

Piecewise Linear Approximation

In this first example, we want to test the hypothesis that the travel time parameter for the train alternative takes different values over different ranges of the variable itself. We split the range of values for travel time t (which is t ∈ [35, 1022], expressed in minutes) into four intervals: traintt1 ∈ [0, 90], traintt2 ∈ ]90, 180], traintt3 ∈ ]180, 270] and traintt4 > 270. We show in Figure 6.2 the corresponding Biogeme code.
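The definitions in Figure 6.2 can be mirrored in Python to see how a given travel time is split across the four piecewise variables. The function names are hypothetical; the breakpoints are those of the text, and the coefficients used in `train_time_utility` are the piecewise estimates reported in Table 6.7.

```python
def piecewise_segments(value, breakpoints=(90, 180, 270)):
    """Split a non-negative value into the amounts falling in each interval."""
    segments = []
    lower = 0
    for upper in breakpoints:
        # Portion of the value inside ]lower, upper], capped at the interval width
        segments.append(max(0, min(value - lower, upper - lower)))
        lower = upper
    # Last, open-ended interval
    segments.append(max(0, value - lower))
    return segments

# Piecewise train travel time coefficients from Table 6.7
BETAS_TRAIN_TIME = (-0.0135, -0.0109, -0.00208, -0.0179)

def train_time_utility(train_tt):
    """Contribution of train travel time to the train utility."""
    return sum(b * s for b, s in zip(BETAS_TRAIN_TIME, piecewise_segments(train_tt)))

print(piecewise_segments(200))  # [90, 90, 20, 0]
print(round(train_time_utility(200), 4))
```

The four pieces always sum to the original travel time, so a model with four equal coefficients collapses to the linear specification.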

The systematic utility expressions used in this model are

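$$
\begin{aligned}
V_{\text{car}} &= \text{ASC}_{\text{car}} + \beta_{\text{car time}}\,\text{CAR\_TT} + \beta_{\text{car cost}}\,\text{CAR\_CO} \\
V_{\text{train}} &= \beta_{\text{train time1}}\,\text{TRAIN\_TT1} + \beta_{\text{train time2}}\,\text{TRAIN\_TT2} + \beta_{\text{train time3}}\,\text{TRAIN\_TT3} + \beta_{\text{train time4}}\,\text{TRAIN\_TT4} \\
&\quad + \beta_{\text{train cost}}\,\text{TRAIN\_CO} + \beta_{\text{he}}\,\text{TRAIN\_HE} + \beta_{\text{ga}}\,\text{GA} \\
V_{\text{SM}} &= \text{ASC}_{\text{SM}} + \beta_{\text{SM time}}\,\text{SM\_TT} + \beta_{\text{SM cost}}\,\text{SM\_CO} + \beta_{\text{he}}\,\text{SM\_HE} + \beta_{\text{ga}}\,\text{GA}
\end{aligned}
$$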

We can see from the estimation results shown in Table 6.7 that all time coefficients of the piecewise linear expression are negative. The coefficient associated with very long trips is the largest in absolute value, meaning that each additional minute beyond four and a half hours (270 minutes) penalizes the utility of the train alternative the most.

We perform the likelihood ratio test where the restricted model is the one with linear train travel time (the MC model from the previous section) and the unrestricted model is the piecewise linear specification. The null hypothesis is

$$H_0: \beta_{\text{train time1}} = \beta_{\text{train time2}} = \beta_{\text{train time3}} = \beta_{\text{train time4}}$$

The test yields

−2(−5047.205 + 5041.952) = 10.506

and since $\chi^2_{0.95,3} = 7.815$, we can reject the null hypothesis of a linear train travel time at a 95% level of confidence.

Piecewise linear model: estimation results

Parameter  Parameter     Parameter  Robust          Robust
number     name          estimate   standard error  t statistic
1          ASCcar        -0.991     0.434           -2.28
2          ASCSM         -0.584     0.421           -1.39
3          βcar cost     -0.00776   0.00150         -5.18
4          βtrain cost   -0.0301    0.00204         -14.78
5          βSM cost      -0.0107    0.000828        -12.97
6          βcar time     -0.0129    0.00162         -7.94
7          βtrain time1  -0.0135    0.00508         -2.65
8          βtrain time2  -0.0109    0.00180         -6.05
9          βtrain time3  -0.00208   0.00224         -0.93
10         βtrain time4  -0.0179    0.00551         -3.25
11         βSM time      -0.0112    0.00179         -6.24
12         βhe           -0.00534   0.00101         -5.30
13         βga           0.515      0.193           2.67

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5041.952
$\bar{\rho}^2$ = 0.274

Table 6.7: Estimation results for the piecewise linear model

The Power Series Expansion

We introduce here a power series expansion for the train travel time variable. In principle, we could add a full polynomial expression, but here we introduce just the squared term. The resulting model specification is practically the same as the MC model, with the exception of the train alternative:

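$$
\begin{aligned}
V_{\text{train}} &= \beta_{\text{train time}}\,\text{TRAIN\_TT} + \beta_{\text{train time sq}}\,\text{TRAIN\_TT\_SQ} \\
&\quad + \beta_{\text{train cost}}\,\text{TRAIN\_CO} + \beta_{\text{he}}\,\text{TRAIN\_HE} + \beta_{\text{ga}}\,\text{GA}
\end{aligned}
$$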

The estimation results for this specification are shown in Table 6.8. The estimated parameter associated with the linear term of the power series expansion is negative, while the estimated parameter associated with the squared term is positive. However, the cumulative effect of the travel time variable on the utility is still negative, as can be easily verified by a plot of utility versus travel time for a reasonable range of rail travel time.
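Instead of a plot, the claim can also be checked numerically. The sketch below evaluates the combined linear and squared contribution using the two train time estimates from Table 6.8 over the observed range of train travel times, [35, 1022] minutes; the function and variable names are hypothetical.

```python
B_TRAIN_TIME = -0.0109         # linear term, Table 6.8
B_TRAIN_TIME_SQ = 0.00000628   # squared term, Table 6.8

def time_contribution(train_tt):
    """Combined contribution of travel time (minutes) to the train utility."""
    return B_TRAIN_TIME * train_tt + B_TRAIN_TIME_SQ * train_tt ** 2

# Travel time at which the marginal effect of one more minute changes sign
turning_point = -B_TRAIN_TIME / (2 * B_TRAIN_TIME_SQ)

always_negative = all(time_contribution(t) < 0 for t in range(35, 1023))
print(always_negative, round(turning_point, 1))
```

With these estimates the contribution stays negative over the whole observed range, although the marginal disutility of an extra minute vanishes at roughly 868 minutes, so the specification should not be extrapolated far beyond the data.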

We perform the likelihood ratio test where the restricted model is the one with linear train travel time (the MC model from the previous section) and the unrestricted model is the power series expansion specification. The null hypothesis is

$$H_0: \beta_{\text{train time sq}} = 0$$

The test yields

−2(−5047.205 + 5046.573) = 1.264

and since $\chi^2_{0.95,1} = 3.841$, we cannot reject the null hypothesis of a linear rail travel time at a 95% level of confidence.

Power series model: estimation results

Parameter  Parameter       Parameter   Robust          Robust
number     name            estimate    standard error  t statistic
1          ASCcar          -0.693      0.190           -3.65
2          ASCSM           -0.289      0.149           -1.94
3          βcar cost       -0.00776    0.00150         -5.18
4          βtrain cost     -0.0299     0.00201         -14.86
5          βSM cost        -0.0108     0.000828        -12.99
6          βcar time       -0.0129     0.00162         -7.95
7          βtrain time     -0.0109     0.00190         -5.72
8          βtrain time sq  0.00000628  0.00000282      2.23
9          βSM time        -0.0111     0.00178         -6.23
10         βhe             -0.00537    0.00101         -5.31
11         βga             0.515       0.194           2.65

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5046.573
$\bar{\rho}^2$ = 0.273

Table 6.8: Estimation results for the power series model

[GeneralizedUtilities]

1 B_TRAIN_TIME * ( ( ( TRAIN_TT ) ^ LAMBDA - 1 ) / LAMBDA )

Figure 6.3: Biogeme snapshot of Box-Cox transformation

The Box-Cox Transformation

In this section, we analyze the possibility of testing non-linear transformations of variables that are non-linear in the unknown parameters. One possible transformation is the Box-Cox, expressed as

$$\frac{x^{\lambda} - 1}{\lambda}, \quad \text{where } x \geq 0.$$

We apply this transformation to the train time variable. The utilities remain exactly the same, except that this variable is replaced by its Box-Cox transformation. This introduces one more unknown parameter, λ. We show in Figure 6.3 a Biogeme snapshot from the model specification file to illustrate how utility functions that are non-linear in parameters are implemented.

The results related to the Box-Cox transformed model are shown in Table 6.9. The Box-Cox transformation reduces to a linear function as a special case when the parameter λ is equal to 1. Looking at the estimated values, we see that λ is significantly different from 1 at a 95% level of confidence (t-stat = -2.13). Note though that the parameter βtrain time associated with train travel time is not significant.

We can also perform a likelihood ratio test as follows. The null hypothesis is given by:

$$H_0: \lambda = 1$$

The χ2 statistic for this null hypothesis is as follows:

$$-2(L(\beta_L) - L(\beta_{BC})) = -2(-5047.205 + 5045.420) = 3.570$$

$$\chi^2_{0.95,1} = 3.841 > 3.570$$

Therefore, the null hypothesis of a linear specification cannot be rejected at a 95% level of confidence. Note that the t-test and the likelihood ratio test for testing one restriction are asymptotically equivalent. Here the t-statistic with respect to 1 is equal to -2.13, which only marginally exceeds the critical value of 1.96 in absolute value, so λ is close to being insignificantly different from 1. In small samples, the likelihood ratio test is preferred to the t-test. Therefore, we prefer the linear specification over the Box-Cox transformation in this case.
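The transformation and the t-test against 1 can be illustrated with a short sketch. The function names are hypothetical; the code assumes strictly positive x (which holds for the travel times considered here) and uses the logarithm as the limiting case when λ approaches 0. The values of λ and its robust standard error are those of Table 6.9.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation (x^lam - 1) / lam, with log(x) as the limit at lam = 0."""
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def t_statistic(estimate, std_error, null_value):
    """t statistic of an estimate against a hypothesized value."""
    return (estimate - null_value) / std_error

# lambda = 1 gives a linear function of x (shifted by -1)
print(box_cox(120.0, 1.0))

# t-test of lambda against 1, using the estimate and robust std. error of Table 6.9
t_lambda = t_statistic(0.465, 0.251, null_value=1.0)
print(round(t_lambda, 2))
```

The t statistic printed by Biogeme in Table 6.9 (1.85) tests λ against 0, which is why the test against 1 has to be computed separately.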

Box-Cox transformed model: estimation results

Parameter  Parameter    Parameter  Robust          Robust
number     name         estimate   standard error  t statistic
1          ASCcar       -1.72      1.01            -1.71
2          ASCSM        -1.32      1.01            -1.31
3          βcar cost    -0.00776   0.00150         -5.18
4          βtrain cost  -0.0298    0.00200         -14.90
5          βSM cost     -0.0107    0.000828        -12.98
6          βcar time    -0.0129    0.00162         -7.95
7          βtrain time  -0.128     0.160           -0.80
8          βSM time     -0.0111    0.00178         -6.23
9          βhe          -0.00535   0.00101         -5.30
10         βga          0.508      0.194           2.62
11         λ            0.465      0.251           1.85

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5045.420
$\bar{\rho}^2$ = 0.273

Table 6.9: Estimation results for the Box-Cox transformed model


6.2 Choice of Residential Telephone Services Case

Market Segmentation

Files to use with Biogeme:
Model files: SpecTest Tel low inc.mod, SpecTest Tel med inc.mod, SpecTest Tel high inc.mod, MNL Tel socioec.mod
Data file: telephone.dat

We test whether there is taste variation across market segments. We define segments based on income and divide the population into three income groups. We estimate separate models for each income group using the same model specification, namely MNL Tel socioec.mod used in the logit case study, and compare the estimation results with a model based on the complete dataset. The results in terms of final log-likelihood are summarized in Table 6.10.

The null hypothesis is that there is no taste variation across the market segments, that is

$$H_0: \beta^{HI} = \beta^{MI} = \beta^{LI}$$

Performing a likelihood ratio test,

$$
\begin{aligned}
LR &= -2\Big(L_N(\beta) - \sum_{g=1}^{G} L_{N_g}(\beta^g)\Big) \\
&= -2(-468.791 + 120.103 + 297.990 + 46.668) = 8.060
\end{aligned}
$$

$$\chi^2_{0.95,13} = 22.362$$

Since 8.060 < 22.362, the null hypothesis cannot be rejected; that is, we find no evidence of taste variation across the income segments.


Model                           Definition              Log-likelihood  Nb. of coefficients
Low Income                      Income < 10000          -120.103        6
Medium Income                   10000 < Income < 40000  -297.990        7
High Income                     Income > 40000          -46.668         7
Pooled Data (Restricted Model)  All                     -468.791        7

Table 6.10: Results for the market segmentation test

IIA test estimation

Parameter  Parameter  Parameter  Robust          Robust
number     name       estimate   standard error  t statistic
1          ASCBM      -0.185     0.233           -0.79
2          ASCLF      0.801      0.166           4.82
3          ASCEF      1.07       0.833           1.28
4          ASCMF      1.83       0.279           6.56
5          βcost      -1.26      0.228           -5.51
6          βIIAm      0.832      0.334           2.49
7          βIIAf      1.83       0.538           3.41

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −460.754
$\bar{\rho}^2$ = 0.165

Table 6.11: Estimation results for the IIA test


[IIATest]

C12 1 2

C345 3 4 5

Figure 6.4: Biogeme snapshot: IIATest section

McFadden IIA Test

Files to use with Biogeme:
Model files: MNL base.mod, MNL base IIA.mod
Data file: telephone.dat
Command file: doit.bat
Supplementary software: biomerge.exe

For the telephone dataset, it is possible that there are common unobserved attributes between the measured options (alternatives BM and SM) and common unobserved attributes among the flat options (alternatives LF, EF and MF). We can perform the McFadden IIA test to check this. The procedure is described in McFadden (1987) and Train et al. (1989). We estimate a logit model (MNL base.mod) on the full dataset telephone.dat. The specification file (MNL base.mod) contains a section describing the correlation we want to test. The corresponding Biogeme snapshot is shown in Figure 6.4. Alternatives 1 and 2 correspond to measured options, alternatives 3, 4 and 5 to flat options. Then the estimated model is applied to the same data file, using BioSim. Because the section [IIATest] is defined in the original .mod file, auxiliary variables are automatically computed for each observation and reported in the .enu output file. The original .dat file and the .enu file are merged using BIOMERGE in order to create a new data file. As discussed above, we assume in this case that there are 2 subsets of alternatives suspected to be correlated: C1 = {BM, SM} and C2 = {LF, EF, MF}. Now we specify a new model (MNL base IIA.mod) which includes the auxiliary variables in the utility functions associated with measured and flat options. Finally, we estimate the model on the new data file created by merging and obtain the results shown in Table 6.11. Note that the entire procedure described above can be carried out automatically using the command file doit.bat.

We do a likelihood ratio test where the null hypothesis is

$$H_0: \beta_{IIA_m} = \beta_{IIA_f} = 0$$

The test statistic for the null hypothesis is given by

−2(LR − LU) = −2(−477.557 + 460.747) = 33.620

where the restricted model is the model without the auxiliary variables and the unrestricted model is the model with the auxiliary variables. The test statistic is asymptotically χ2 distributed with 2 degrees of freedom since there are 2 restrictions. Since 33.620 > 5.991 (the critical value of the χ2 distribution with 2 degrees of freedom at a 95% level of confidence), we reject the null hypothesis. Both auxiliary coefficients are also individually significant (t-statistics of 2.49 and 3.41 in Table 6.11), so we conclude that the IIA assumption holds neither for the group of measured alternatives nor for the group of flat alternatives. In the presence of such correlations, GEV models like the Nested Logit are more appropriate.

Test of Non-Nested Hypotheses

In discrete choice analysis, we often perform tests based on so-called nested hypotheses, which means that we specify two models such that the first one (the restricted model) is a special case of the second one (the unrestricted model). For this type of comparison, the classical likelihood ratio test can be applied. However, there are situations in which we want to compare models that are not nested, meaning that one model cannot be obtained as a restricted version of the other. One way to compare two non-nested models is to build a composite model from which both models can be derived. We can then perform two likelihood ratio tests, one for each of the restricted models against the composite model. This procedure is known as the Cox test of separate families of hypotheses.

Cox Test

Files to use with Biogeme:
Model files: SpecTest Tel M1.mod, SpecTest Tel M2.mod, SpecTest Tel MC.mod
Data file: telephone.dat

The Cox test is described in detail in Ben-Akiva and Lerman (1985), pages 171-174, and in the Textbook of the course, in section “Tests of Non-Nested Hypothesis”. Assume that we want to test a model M1 against another model M2 (and one model is not a restricted version of the other). We start by generating a composite model MC such that both models M1 and M2 are restricted cases of MC. We then test M1 against MC and M2 against MC using the likelihood ratio test. There are three possible outcomes of this test:

• One of the two models is rejected. Then we keep the one that is not rejected.

• Both models are rejected. Then better models should be developed.

• Both models are accepted. Then we choose the model with the higher $\bar{\rho}^2$ index.

The deterministic parts of the utility functions for each of the three modelspecifications are:

1. M1

VBM = ASCBM + βMcostcostBM

VSM = βMcostcostSM

VLF = ASCLF + βFcostcostLF

VEF = ASCEF + βFcostcostEF

VMF = ASCMF + βFcostcostMF

2. M2

VBM = ASCBM + βcostcostBM

VSM = βcostcostSM

VLF = ASCLF + βcostcostLF + βusersusers

VEF = ASCEF + βcostcostEF + βusersusers

VMF = ASCMF + βcostcostMF + βusersusers


Model   Nb. of parameters   Log-likelihood   ρ̄²
M1      6                   -476.040         0.140
M2      6                   -471.151         0.148
MC      7                   -467.804         0.153

Table 6.12: Results from the non-nested hypothesis test

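3. MC

$$
\begin{aligned}
V_{BM} &= \text{ASC}_{BM} + \beta_{Mcost}\,\text{cost}_{BM} \\
V_{SM} &= \beta_{Mcost}\,\text{cost}_{SM} \\
V_{LF} &= \text{ASC}_{LF} + \beta_{Fcost}\,\text{cost}_{LF} + \beta_{users}\,\text{users} \\
V_{EF} &= \text{ASC}_{EF} + \beta_{Fcost}\,\text{cost}_{EF} + \beta_{users}\,\text{users} \\
V_{MF} &= \text{ASC}_{MF} + \beta_{Fcost}\,\text{cost}_{MF} + \beta_{users}\,\text{users}
\end{aligned}
$$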

The estimation results of the different models are summarized in Table 6.12. We first compare the M1 model specification against the composite model MC by means of a likelihood ratio test:

$$ H_0 : \beta_{users} = 0 $$

$$ -2\,(L(\hat{\beta}_{M1}) - L(\hat{\beta}_{MC})) = -2\,(-476.040 + 467.804) = 16.472 $$

$$ \chi^2_{0.95,1} = 3.841 < 16.472 $$

We can therefore reject the null hypothesis of not including socio-economic variables. We then compare M2 against MC:

$$ H_0 : \beta_{Mcost} = \beta_{Fcost} $$

$$ -2\,(L(\hat{\beta}_{M2}) - L(\hat{\beta}_{MC})) = -2\,(-471.151 + 467.804) = 6.694 $$

$$ \chi^2_{0.95,1} = 3.841 < 6.694 $$

We can therefore reject the null hypothesis of generic coefficients. Since both models are rejected, we need to develop better models. Had both models been accepted, we could have used ρ̄² to choose which model to keep.
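The two comparisons and the three-way decision rule can be checked numerically. The following Python sketch is only an illustration: the function names are hypothetical, the log-likelihoods and parameter counts are those of Table 6.12, and the 95% critical values are the ones quoted in this chapter, hardcoded rather than computed.

```python
# Log-likelihoods and numbers of parameters taken from Table 6.12.
MODELS = {
    "M1": {"loglik": -476.040, "k": 6},
    "M2": {"loglik": -471.151, "k": 6},
    "MC": {"loglik": -467.804, "k": 7},
}

# 95% chi-square critical values quoted in the text, keyed by degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991}


def lr_test(restricted, unrestricted):
    """Return (statistic, degrees of freedom, rejected) for a likelihood ratio test."""
    stat = -2.0 * (restricted["loglik"] - unrestricted["loglik"])
    df = unrestricted["k"] - restricted["k"]
    return stat, df, stat > CHI2_95[df]


def cox_outcome(m1, m2, mc):
    """Apply the three possible outcomes of the Cox test listed above."""
    reject1 = lr_test(m1, mc)[2]
    reject2 = lr_test(m2, mc)[2]
    if reject1 and reject2:
        return "both rejected: develop better models"
    if reject1:
        return "keep M2"
    if reject2:
        return "keep M1"
    return "both accepted: compare adjusted rho-squared"


stat1, df1, _ = lr_test(MODELS["M1"], MODELS["MC"])
stat2, df2, _ = lr_test(MODELS["M2"], MODELS["MC"])
print(f"M1 vs MC: {stat1:.3f} with {df1} degree(s) of freedom")
print(f"M2 vs MC: {stat2:.3f} with {df2} degree(s) of freedom")
print(cox_outcome(MODELS["M1"], MODELS["M2"], MODELS["MC"]))
```

With the values of Table 6.12, both statistics exceed 3.841, so the sketch returns the "both rejected" outcome, in line with the conclusion above.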

The adjusted likelihood ratio index ρ̄² is computed as follows (it is provided in the Biogeme result file):

$$ \bar{\rho}^2 = 1 - \frac{L(\hat{\beta}) - K}{L(0)} $$

where K is the number of estimated parameters.


So, for the two models M1 and M2, we obtain respectively:

$$ \bar{\rho}^2_{1} = 0.140, \qquad \bar{\rho}^2_{2} = 0.148 $$
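As a quick check of the adjusted likelihood ratio index, the short Python sketch below recomputes the values of Table 6.12. The function name is hypothetical; the null log-likelihood L(0) = −560.250 is the one reported for the telephone data in the estimation tables of this chapter.

```python
def adjusted_rho_squared(loglik, null_loglik, n_params):
    """Adjusted likelihood ratio index: 1 - (L(beta) - K) / L(0)."""
    return 1.0 - (loglik - n_params) / null_loglik


NULL_LOGLIK = -560.250  # L(0) for the telephone data set

# (model name, final log-likelihood, number of parameters) from Table 6.12
RESULTS = [("M1", -476.040, 6), ("M2", -471.151, 6), ("MC", -467.804, 7)]

for name, loglik, k in RESULTS:
    print(f"{name}: {adjusted_rho_squared(loglik, NULL_LOGLIK, k):.3f}")
```

Rounded to three decimals, this gives 0.140, 0.148 and 0.153.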

Tests of Non-Linear Specifications

In the previous case study, the models were specified with linear-in-parameter formulations of the deterministic parts of the utilities (parameters that remain constant throughout the whole range of the values of each variable). However, in some cases, non-linear specifications may be more justified (e.g. sensitivity to cost may not be the same in all cost ranges). In this section, we test three different non-linear specifications of the deterministic utility functions (see Ben-Akiva and Lerman, 1985, pages 174-179), namely the piecewise linear approximation, the power series method and the Box-Cox transformation. We have used the logit model with alternative specific cost coefficients as the base model (SpecTest_Tel_M1.mod).

Piecewise Linear Approximation

Files to use with Biogeme:
Model file: SpecTest_Tel_piecewise.mod
Data file: telephone.dat

In the first model, we assume that the coefficient of measured cost assumes different values for different ranges of the cost variable. The full range of values for the measured cost variable is $3.28 to $433.5. We split the range of values for cost_i (which is cost_i ∈ [3.28, 433.5], expressed in dollars) into three different intervals: cost_i1 ∈ [0, 10], cost_i2 ∈ ]10, 50] and cost_i3 > 50. The selection of these ranges is based on a priori hypotheses of the user behavior and distribution of cost in the observed sample. The reader is encouraged to experiment with different ranges. An extract from the Biogeme model file to code the ranges of costs is presented in Figure 6.5.


[Expressions]

// Define here arithmetic expressions for name

// that are not directly available from the data

cost11 =min(cost1 ,10)

cost12 =max(0,min(cost1 - 10 ,40))

cost13 =max(0,cost1 - 50)

cost21 =min(cost2 ,10)

cost22 =max(0,min(cost2 - 10 ,40))

cost23 =max(0,cost2 - 50)

Figure 6.5: Biogeme snapshot for the piecewise linear approximation
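To see what the expressions of Figure 6.5 do, the following Python sketch mirrors them for a single cost value. The function name and the example costs are illustrative; the breakpoints 10 and 50 are those used in the figure.

```python
def piecewise_cost(cost, breakpoints=(10.0, 50.0)):
    """Split a cost into three pieces, mirroring cost11, cost12 and cost13."""
    low, high = breakpoints
    piece1 = min(cost, low)                          # part of the cost in [0, 10]
    piece2 = max(0.0, min(cost - low, high - low))   # part of the cost in ]10, 50]
    piece3 = max(0.0, cost - high)                   # part of the cost above 50
    return piece1, piece2, piece3


# Smallest observed cost, an intermediate value, and the largest observed cost.
for cost in (3.28, 30.0, 433.5):
    pieces = piecewise_cost(cost)
    print(cost, pieces, sum(pieces))
```

For any non-negative cost the three pieces add up to the original value, so the linear specification is recovered when the three coefficients are equal.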

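The deterministic utility functions are

$$
\begin{aligned}
V_{BM} &= \text{ASC}_{BM} + \beta_{Mcost1}\,\text{cost}_{BM1} + \beta_{Mcost2}\,\text{cost}_{BM2} + \beta_{Mcost3}\,\text{cost}_{BM3} \\
V_{SM} &= \beta_{Mcost1}\,\text{cost}_{SM1} + \beta_{Mcost2}\,\text{cost}_{SM2} + \beta_{Mcost3}\,\text{cost}_{SM3} \\
V_{LF} &= \text{ASC}_{LF} + \beta_{Fcost}\,\text{cost}_{LF} \\
V_{EF} &= \text{ASC}_{EF} + \beta_{Fcost}\,\text{cost}_{EF} \\
V_{MF} &= \text{ASC}_{MF} + \beta_{Fcost}\,\text{cost}_{MF}
\end{aligned}
$$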

The results shown in Table 6.13 indicate that the sensitivity to measured cost becomes less important in the range 10 < cost_i < 50 compared to the range cost_i < 10, but has a steep increase for higher costs. This model has a higher log-likelihood than the model with a linear cost coefficient. To test whether or not the improvement in goodness-of-fit is statistically significant, we need to perform a likelihood ratio test between the two different specifications.

The null hypothesis in this case is

$$ H_0 : \beta_{Mcost1} = \beta_{Mcost2} = \beta_{Mcost3} $$

The χ² statistic for this null hypothesis is as follows:

$$ -2\,(L(\hat{\beta}_R) - L(\hat{\beta}_U)) = -2\,(-476.040 + 474.703) = 2.674 $$

$$ \chi^2_{0.95,2} = 5.991 > 2.674 $$


Piecewise linear approximation

Parameter   Parameter   Parameter   Robust           Robust
number      name        estimate    standard error   t statistic
1           ASC_BM      -0.613      0.152            -4.03
2           ASC_LF      -0.631      0.500            -1.26
3           ASC_EF      -0.843      0.869            -0.97
4           ASC_MF      -0.261      0.640            -0.41
5           β_Mcost1    -0.294      0.0661           -4.44
6           β_Mcost2    -0.149      0.0665           -2.23
7           β_Mcost3    -1.23       0.629            -1.96
8           β_Fcost     -0.105      0.0217           -4.84

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −474.703
ρ̄² = 0.138

Table 6.13: Estimation results for the piecewise linear approximation


In this test, the restricted model (R) is represented by the linear specification while the unrestricted model (U) corresponds to the piecewise linear specification. The improvement in goodness-of-fit due to the introduction of the piecewise linear specification is not significant and the null hypothesis that the cost coefficient is linear cannot be rejected.

The Power Series Expansion

Files to use with Biogeme:
Model file: SpecTest_Tel_powerseries.mod
Data file: telephone.dat

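In this test, we relax the hypothesis of linear coefficients for measured options by assuming a second order power series (a squared term and a linear term). The corresponding systematic utility functions are

$$
\begin{aligned}
V_{BM} &= \text{ASC}_{BM} + \beta_{Mcost1}\,\text{cost}_{BM} + \beta_{Mcost2}\,\text{cost}_{BM}^2 \\
V_{SM} &= \beta_{Mcost1}\,\text{cost}_{SM} + \beta_{Mcost2}\,\text{cost}_{SM}^2 \\
V_{LF} &= \text{ASC}_{LF} + \beta_{Fcost}\,\text{cost}_{LF} \\
V_{EF} &= \text{ASC}_{EF} + \beta_{Fcost}\,\text{cost}_{EF} \\
V_{MF} &= \text{ASC}_{MF} + \beta_{Fcost}\,\text{cost}_{MF}.
\end{aligned}
$$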

From the estimation results presented in Table 6.14, it may be noted that the coefficient of the squared term is positive while the coefficient of the linear term is negative, and the coefficient of the linear term is greater in absolute value than that of the squared term. Since the coefficient of the squared term is very small in magnitude, the total effect of cost on utility is expected to remain negative over the observed cost range, which can be easily verified through a plot of utility versus cost.
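Instead of a plot, the sign of the total cost effect can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, the two coefficients are the estimates of Table 6.14, and the cost range [3.28, 433.5] is the one given for the measured cost variable earlier in this case study.

```python
B_MCOST1 = -0.227     # coefficient of the linear term (Table 6.14)
B_MCOST2 = 0.000475   # coefficient of the squared term (Table 6.14)
COST_MIN, COST_MAX = 3.28, 433.5  # observed range of the measured cost


def cost_contribution(cost):
    """Contribution of measured cost to the utility under the power series."""
    return B_MCOST1 * cost + B_MCOST2 * cost ** 2


for cost in (COST_MIN, 50.0, 200.0, COST_MAX):
    print(f"cost = {cost:7.2f}   contribution = {cost_contribution(cost):9.3f}")

# Cost at which the contribution would return to zero: -b1 / b2.
zero_crossing = -B_MCOST1 / B_MCOST2
print(f"contribution is negative for 0 < cost < {zero_crossing:.1f}")
```

The zero crossing lies above the largest observed cost, so the contribution stays negative over the whole observed range.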

To test whether or not we should prefer the power series expansion specification over the linear specification, we need to perform a likelihood ratio test. The null hypothesis in this case is:

$$ H_0 : \beta_{Mcost2} = 0 $$

The χ² statistic for this null hypothesis is as follows:


Power series estimation

Parameter   Parameter   Parameter   Robust           Robust
number      name        estimate    standard error   t statistic
1           ASC_BM      -0.563      0.147            -3.83
2           ASC_LF      -0.162      0.370            -0.44
3           ASC_EF      -0.377      0.813            -0.46
4           ASC_MF      0.215       0.532            0.41
5           β_Mcost1    -0.227      0.0427           -5.32
6           β_Mcost2    0.000475    0.0000936        5.07
7           β_Fcost     -0.107      0.0218           -4.91

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −475.465
ρ̄² = 0.139

Table 6.14: Estimation results for the power series expansion

$$ -2\,(L(\hat{\beta}_R) - L(\hat{\beta}_U)) = -2\,(-476.040 + 475.465) = 1.150 $$

$$ \chi^2_{0.95,1} = 3.841 > 1.150 $$

where the restricted model (R) is the linear specification and the unrestricted model (U) now corresponds to the power series specification. Therefore, we cannot reject the null hypothesis of a linear specification at a 95 % level of confidence, and we select the linear specification over the power series expansion specification.

The Box-Cox Transformation

Files to use with Biogeme:
Model file: SpecTest_Tel_boxcox.mod
Data file: telephone.dat

In this section, we analyze the possibility of testing non-linear transformations of variables which are non-linear in the unknown parameters. One such transformation is the Box-Cox transformation, expressed as

$$ \frac{x^{\lambda} - 1}{\lambda}, \quad \text{where } x \geq 0, $$

where λ is a parameter that has to be estimated. We apply such a transformation to the measured cost variable. The utilities remain the same with the substitution of the measured cost variable with its Box-Cox transformation. The Biogeme snapshot defining such a transformation is shown in Figure 6.6.

[Utilities]
// Id Name Avail linear-in-parameter expression
1 BM avail1 ASC_BM * one
2 SM avail2 ASC_SM * one
3 LF avail3 ASC_LF * one + B_FCOST * cost3
4 EF avail4 ASC_EF * one + B_FCOST * cost4
5 MF avail5 ASC_MF * one + B_FCOST * cost5

[GeneralizedUtilities]
1 B_MCOST * ( ( ( cost1 )^ LAMBDA - 1)/LAMBDA )
2 B_MCOST * ( ( ( cost2 )^ LAMBDA - 1)/LAMBDA )

Figure 6.6: Biogeme snapshot for the Box-Cox transformation

The parameter λ is estimated along with the other parameters.

The estimation results are shown in Table 6.15. The estimate of λ was not found to be statistically significantly different from 0. However, it is statistically significantly different from 1 (the t-statistic w.r.t. 1 is -2.51). Therefore, we should prefer this non-linear specification over the linear specification. We can also perform a likelihood ratio test as follows. The null hypothesis is given by:

$$ H_0 : \lambda = 1 $$

The χ² statistic for this null hypothesis is as follows:

$$ -2\,(L(\hat{\beta}_L) - L(\hat{\beta}_{BC})) = -2\,(-476.040 + 472.624) = 6.832 $$

$$ \chi^2_{0.95,1} = 3.841 < 6.832 $$

Therefore, the null hypothesis of a linear specification can be rejected at a 95 % level of confidence, and we prefer the Box-Cox transformation.
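The two t-statistics quoted for λ can be reproduced from the estimate and robust standard error of Table 6.15. The Python sketch below also includes a small Box-Cox function; both function names are hypothetical, and the treatment of λ close to 0 through the logarithm (the limit of the transformation) is an assumption of this sketch, not something taken from the Biogeme model file.

```python
import math


def box_cox(x, lam, eps=1e-8):
    """Box-Cox transformation (x**lam - 1) / lam, with the log limit near lam = 0."""
    if abs(lam) < eps:
        return math.log(x)  # limit of the transformation as lam tends to 0 (x > 0)
    return (x ** lam - 1.0) / lam


def t_stat(estimate, std_err, reference=0.0):
    """t-statistic of an estimate against a reference value."""
    return (estimate - reference) / std_err


LAMBDA_HAT, LAMBDA_SE = 0.234, 0.305  # estimate and robust std. error, Table 6.15

print(f"t-statistic w.r.t. 0: {t_stat(LAMBDA_HAT, LAMBDA_SE):.2f}")
print(f"t-statistic w.r.t. 1: {t_stat(LAMBDA_HAT, LAMBDA_SE, 1.0):.2f}")

# With lam = 1 the transformation is x - 1, i.e. linear in x.
print(box_cox(10.0, 1.0), box_cox(10.0, LAMBDA_HAT))
```

The first statistic is within ±1.96 and the second is not, which matches the discussion above.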


Box-Cox estimation

Parameter   Parameter   Parameter   Robust           Robust
number      name        estimate    standard error   t statistic
1           ASC_BM      -0.695      0.166            -4.19
2           ASC_LF      -1.76       1.20             -1.46
3           ASC_EF      -1.98       1.39             -1.43
4           ASC_MF      -1.39       1.28             -1.09
5           β_Fcost     -0.104      0.0215           -4.83
6           β_Mcost     -1.30       0.880            -1.47
7           λ           0.234       0.305            0.77

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −472.624
ρ̄² = 0.144

Table 6.15: Estimation results for the Box-Cox transformation


6.3 Airline Itinerary Case

Market Segmentation

Files to use with Biogeme:
Model files: SpecTest_airline_male.mod, SpecTest_airline_female.mod, SpecTest_airline_GenderNA.mod, SpecTest_airline_full.mod
Data file: airline.dat

In this example, we test if there exists taste variation across market segments. The segmentation is made on the gender variable. We first create three market segments as follows: Male, Female, and no answer (NA). The sum of the number of observations for each segment is equal to the total number of observations:

$$ N_{Male} + N_{Female} + N_{NA} = N $$

We estimate a model on the full data set. Then we estimate the same model for each gender group separately. Note that we make use of the [Exclude] section in the model specification file to define the observations which should be excluded for the estimation. We obtain the values shown in Table 6.16. The expressions of the utility functions are the same for all models:

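$$
\begin{aligned}
V_1 &= \text{ASC}_1 + \beta_{\text{Fare}} \cdot \text{Fare}_1 + \beta_{\text{Legroom}} \cdot \text{Legroom}_1 + \beta_{\text{Total\_TT1}} \cdot \text{Total\_TT}_1 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt1\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt1\_SchedDelayLate} \\
V_2 &= \text{ASC}_2 + \beta_{\text{Fare}} \cdot \text{Fare}_2 + \beta_{\text{Legroom}} \cdot \text{Legroom}_2 + \beta_{\text{Total\_TT2}} \cdot \text{Total\_TT}_2 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt2\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt2\_SchedDelayLate} \\
V_3 &= \text{ASC}_3 + \beta_{\text{Fare}} \cdot \text{Fare}_3 + \beta_{\text{Legroom}} \cdot \text{Legroom}_3 + \beta_{\text{Total\_TT3}} \cdot \text{Total\_TT}_3 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt3\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt3\_SchedDelayLate}
\end{aligned}
$$

The travel time coefficients are alternative specific, which gives the 9 coefficients per model reported in Table 6.16.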

Let us remark that one of the three alternative specific constants ASC1, ASC2 and ASC3 must be fixed (typically to 0) for normalization purposes.

The null hypothesis assumes no taste variation across the market segments:

$$ H_0 : \beta_{Male} = \beta_{Female} = \beta_{NA} $$


Model              Log likelihood   Number of coefficients
Male               -1195.819        9
Female             -929.325         9
NA                 -178.017         9
Restricted model   -2320.447        9

Table 6.16: Values for the market segmentation test

In the null hypothesis, β_segment denotes the vector of coefficients of a market segment. Note that in the null hypothesis Male, Female and NA refer to market segments and not to variables in the dataset.

The likelihood ratio test (with 27 − 9 = 18 degrees of freedom) yields

$$
\begin{aligned}
LR &= -2\left( L_N(\hat{\beta}) - \left( L_{N_{Male}}(\hat{\beta}_{Male}) + L_{N_{Female}}(\hat{\beta}_{Female}) + L_{N_{NA}}(\hat{\beta}_{NA}) \right) \right) \\
&= -2\,(-2320.447 + 1195.819 + 929.325 + 178.017) = 34.572
\end{aligned}
$$

$$ \chi^2_{0.95,18} = 28.87 $$

and we can therefore reject the null hypothesis at a 95% level of confidence: market segmentation on gender does exist.
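The segmentation test can be reproduced from the values of Table 6.16. In the Python sketch below (variable names are illustrative), the unrestricted log-likelihood is the sum over the three segments and the degrees of freedom are the difference in the total number of coefficients; the critical value 28.87 is the one quoted above, hardcoded rather than computed.

```python
# (log-likelihood, number of coefficients) per segment, from Table 6.16.
SEGMENTS = {
    "Male": (-1195.819, 9),
    "Female": (-929.325, 9),
    "NA": (-178.017, 9),
}
POOLED = (-2320.447, 9)   # restricted model estimated on the full data set
CHI2_95_DF18 = 28.87      # 95% critical value for 18 degrees of freedom

# The unrestricted model is the collection of the segment-specific models.
unrestricted_loglik = sum(loglik for loglik, _ in SEGMENTS.values())
unrestricted_k = sum(k for _, k in SEGMENTS.values())

lr = -2.0 * (POOLED[0] - unrestricted_loglik)
df = unrestricted_k - POOLED[1]

print(f"LR = {lr:.3f} with {df} degrees of freedom")
print("reject H0" if lr > CHI2_95_DF18 else "cannot reject H0")
```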

McFadden IIA Test

Files to use with Biogeme:
Model files: SpecTest_airline_full.mod, SpecTest_airline_IIA.mod
Data file: airline.dat

In this survey, the choice is made between three flight itineraries, two of which are with the same company. It is possible that there are common unobserved attributes between the two itineraries of the same company, so these two alternatives might be correlated. In order to test this assumption, we perform the McFadden IIA test. First we estimate a logit model (SpecTest_airline_full_bis.mod) on the full data set airline.dat. The specification file SpecTest_airline_full_bis.mod contains a section describing the correlation we want to test. The corresponding Biogeme snapshot is shown in Figure 6.7. Alternative 1 corresponds to an itinerary without stops, and alternative 2 to an itinerary with the same company but with one stop.

Biogeme SpecTest_airline_full_bis airline.dat

[IIATest]

C12 1 2

Figure 6.7: Biogeme snapshot: IIATest section

By defining the section [IIATest] in the original .mod file, auxiliary variables are automatically computed for each observation, and reported in the .enu output file. Biogeme also produces a file, SpecTest_airline_full_bis.res, containing the specification of the estimated model in the same format as the model specification file. We need to rename it as a .mod file, SpecTest_airline_full_bis_res.mod, in order to apply it on the same data file, using BioSim:

biosim SpecTest_airline_full_bis_res airline.dat

The original .dat file and the SpecTest_airline_full_bis_res.enu file need to be merged in order to create a new data file that contains both the original model variables and the auxiliary variables. This step is performed using BIOMERGE:

biomerge airline.dat SpecTest_airline_full_bis_res.enu

The merged data file is stored into a file named biomergeOutput.lis. We rename this file as SpecTest_airline_IIATest.dat. Now we specify a new model (SpecTest_airline_IIA.mod) which includes the auxiliary variables in the utility functions associated with alternatives 1 and 2. Finally, we estimate this model on the new data file created by merging the original data file and SpecTest_airline_full_bis_res.enu, using the following command:

Biogeme SpecTest_airline_IIA SpecTest_airline_IIATest.dat


Logit model for IIA test for itineraries 1 and 2

Parameter   Parameter     Parameter   Robust           Robust
number      name          estimate    standard error   t statistic
1           ASC_2         -1.51       0.211            -7.14
2           ASC_3         -1.65       0.194            -8.51
3           β_Fare        -0.0198     0.00104          -18.94
4           β_Legroom     0.232       0.0281           8.24
5           β_SchedDE     -0.143      0.0168           -8.49
6           β_SchedDL     -0.107      0.0145           -7.40
7           β_Total_TT1   -0.341      0.0744           -4.58
8           β_Total_TT2   -0.304      0.0700           -4.34
9           β_Total_TT3   -0.312      0.00111          -4.65
10          β_IIA         -0.0489     0.0714           -4.37

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2320.155
ρ̄² = 0.412

Table 6.17: Logit model for IIA test


The estimation results are shown in Table 6.17.

In the IIA Test, we are interested in the value of the t-statistic for the coefficient related to the auxiliary variables. If β_IIA is significantly different from 0 at a 95% level of confidence, this indicates that the IIA property does not hold for alternatives 1 and 2. It would mean that alternatives 1 and 2 share some unobserved attributes.

However, Table 6.17 shows that parameter β_IIA is not significantly different from 0. Hence we cannot conclude that the IIA property does not hold. The calibration of more complex models such as Generalized Extreme Value (GEV) models, which capture correlation between alternatives sharing some common characteristics, might not be justified in this case. We can hence keep the logit specification.
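The significance check on the auxiliary coefficient can be sketched in Python as below. The function name is hypothetical, 1.96 is the usual two-sided 95% critical value, and the inputs are the estimate and robust standard error printed for the IIA coefficient in Table 6.17. Their ratio is about −0.68, which agrees with the conclusion stated above; note that it does not match the t statistic printed in that row of the table.

```python
T_CRITICAL_95 = 1.96  # two-sided 95% critical value of the normal distribution


def significance(estimate, std_err, critical=T_CRITICAL_95):
    """Return the t-statistic and whether it differs significantly from 0."""
    t_value = estimate / std_err
    return t_value, abs(t_value) > critical


# Estimate and robust standard error of the IIA coefficient (Table 6.17).
t_iia, iia_significant = significance(-0.0489, 0.0714)
print(f"t = {t_iia:.2f}, significant at 95%: {iia_significant}")
```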

Let us note that the whole procedure for the IIA test can be performed automatically by double-clicking on the batch file doit.bat.

Test of Non-Nested Hypotheses

Files to use with Biogeme:
Model files: SpecTest_airline_full.mod (M1), SpecTest_airline_full_LogFare.mod (M2), SpecTest_airline_full_C.mod (MC)
Data file: airline.dat

In discrete choice analysis, we often perform tests based on so-called nested hypotheses, which means that we specify two models such that the first one (the restricted model) is a special case of the second one (the unrestricted model). For this type of comparison, the classical likelihood ratio test can be applied. However, there are situations, such as non-linear specifications, in which we aim at comparing models which are not nested, i.e. one model cannot be obtained as a restricted version of the other. One way to compare two non-nested models is to build a composite model from which both models can be derived. We can thus perform two likelihood ratio tests, testing each of the restricted models against the composite model. This procedure is known as the Cox test of separate families of hypotheses.


Cox Test

The Cox test is described in detail in Ben-Akiva and Lerman (1985), pages 171-174, and in the Textbook of the course, section “Tests of Non-Nested Hypothesis”. Assume that we want to test a model M1 against another model M2 (and one model is not a restricted version of the other). We start by generating a composite model MC such that both models M1 and M2 are restricted cases of MC. We then test M1 against MC and M2 against MC using the likelihood ratio test. There are three possible outcomes of this test:

• One of the two models is rejected. Then we keep the one that is not rejected.

• Both models are rejected. Then better models should be developed. The composite model could be used as a new basis for future specifications.

• Both models are accepted. Then we choose the model with the highest ρ̄² index.

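We present here the expressions of the utility functions used for three different models M1, M2 and MC developed on the airline itinerary case study.

M1 has the following systematic utilities:

$$
\begin{aligned}
V_1 &= \text{ASC}_1 + \beta_{\text{Fare}} \cdot \text{Fare}_1 + \beta_{\text{Legroom}} \cdot \text{Legroom}_1 + \beta_{\text{Total\_TT1}} \cdot \text{Total\_TT}_1 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt1\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt1\_SchedDelayLate} \\
V_2 &= \text{ASC}_2 + \beta_{\text{Fare}} \cdot \text{Fare}_2 + \beta_{\text{Legroom}} \cdot \text{Legroom}_2 + \beta_{\text{Total\_TT2}} \cdot \text{Total\_TT}_2 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt2\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt2\_SchedDelayLate} \\
V_3 &= \text{ASC}_3 + \beta_{\text{Fare}} \cdot \text{Fare}_3 + \beta_{\text{Legroom}} \cdot \text{Legroom}_3 + \beta_{\text{Total\_TT3}} \cdot \text{Total\_TT}_3 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt3\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt3\_SchedDelayLate}
\end{aligned}
$$

where the fare enters the utilities linearly.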


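The systematic utilities of M2 are expressed as follows:

$$
\begin{aligned}
V_1 &= \text{ASC}_1 + \beta_{\text{LogFare}} \cdot \log(\text{Fare}_1) + \beta_{\text{Legroom}} \cdot \text{Legroom}_1 + \beta_{\text{Total\_TT1}} \cdot \text{Total\_TT}_1 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt1\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt1\_SchedDelayLate} \\
V_2 &= \text{ASC}_2 + \beta_{\text{LogFare}} \cdot \log(\text{Fare}_2) + \beta_{\text{Legroom}} \cdot \text{Legroom}_2 + \beta_{\text{Total\_TT2}} \cdot \text{Total\_TT}_2 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt2\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt2\_SchedDelayLate} \\
V_3 &= \text{ASC}_3 + \beta_{\text{LogFare}} \cdot \log(\text{Fare}_3) + \beta_{\text{Legroom}} \cdot \text{Legroom}_3 + \beta_{\text{Total\_TT3}} \cdot \text{Total\_TT}_3 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt3\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt3\_SchedDelayLate}
\end{aligned}
$$

where the fare enters the utilities in logarithmic form.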

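We now define the composite model MC with the following systematic utilities:

$$
\begin{aligned}
V_1 &= \text{ASC}_1 + \beta_{\text{Fare}} \cdot \text{Fare}_1 + \beta_{\text{LogFare}} \cdot \log(\text{Fare}_1) + \beta_{\text{Legroom}} \cdot \text{Legroom}_1 \\
&\quad + \beta_{\text{Total\_TT1}} \cdot \text{Total\_TT}_1 + \beta_{\text{SchedDE}} \cdot \text{Opt1\_SchedDelayEarly} \\
&\quad + \beta_{\text{SchedDL}} \cdot \text{Opt1\_SchedDelayLate} \\
V_2 &= \text{ASC}_2 + \beta_{\text{Fare}} \cdot \text{Fare}_2 + \beta_{\text{LogFare}} \cdot \log(\text{Fare}_2) + \beta_{\text{Legroom}} \cdot \text{Legroom}_2 \\
&\quad + \beta_{\text{Total\_TT2}} \cdot \text{Total\_TT}_2 + \beta_{\text{SchedDE}} \cdot \text{Opt2\_SchedDelayEarly} \\
&\quad + \beta_{\text{SchedDL}} \cdot \text{Opt2\_SchedDelayLate} \\
V_3 &= \text{ASC}_3 + \beta_{\text{Fare}} \cdot \text{Fare}_3 + \beta_{\text{LogFare}} \cdot \log(\text{Fare}_3) + \beta_{\text{Legroom}} \cdot \text{Legroom}_3 \\
&\quad + \beta_{\text{Total\_TT3}} \cdot \text{Total\_TT}_3 + \beta_{\text{SchedDE}} \cdot \text{Opt3\_SchedDelayEarly} \\
&\quad + \beta_{\text{SchedDL}} \cdot \text{Opt3\_SchedDelayLate}
\end{aligned}
$$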

Table 6.18 summarizes the differences between the various models and Tables 6.19, 6.20 and 6.21 show the estimation results for models M1, M2 and MC, respectively.

Now we can apply the likelihood ratio test for M1 against MC. In this case, the null hypothesis is:

$$ H_0 : \beta_{\text{LogFare}} = 0 $$

As usual, −2(L(M1) − L(MC)) is χ² distributed with K = 1 degree of freedom. In this case, we have:

$$ -2\,(-2320.447 + 2271.656) = 97.582 > 3.84 $$


Models used for the Cox test

Model   Parameters   Description
M1      9            two ASCs, one generic cost linear coefficient, three alternative specific time coefficients and three generic coefficients (for legroom, schedule delay for early departure, schedule delay for late departure)
M2      9            two ASCs, one generic cost logarithmic coefficient, three alternative specific time coefficients and three generic coefficients (for legroom, schedule delay for early departure, schedule delay for late departure)
MC      10           two ASCs, one generic cost linear coefficient, one generic cost logarithmic coefficient, three alternative specific time coefficients and three generic coefficients (for legroom, schedule delay for early departure, schedule delay for late departure)

Table 6.18: Summary of the different model specifications


Parameter   Parameter   Parameter   Robust
number      name        estimate    standard error   t-stat   p-value
1           ASC2        -1.43       0.183            -7.81    0.00
2           ASC3        -1.64       0.192            -8.53    0.00
3           Fare        -0.0193     0.000802         -24.05   0.00
4           Legroom     0.226       0.0267           8.45     0.00
5           SchedDE     -0.139      0.0163           -8.53    0.00
6           SchedDL     -0.104      0.0137           -7.59    0.00
7           Total_TT1   -0.332      0.0735           -4.52    0.00
8           Total_TT2   -0.299      0.0696           -4.29    0.00
9           Total_TT3   -0.302      0.0699           -4.31    0.00

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2320.447
ρ̄² = 0.412

Table 6.19: Estimation results for model M1

The result of this first test is that we can reject the null hypothesis H0, which means that the composite model is better than M1. The linear model is rejected. Applying the same test for M2 against MC, the null hypothesis is

$$ H_1 : \beta_{\text{Fare}} = 0. $$

In this case, the likelihood ratio test with K = 1 degree of freedom (MC has one more parameter than M2) gives

$$ -2\,(-2283.103 + 2271.656) = 22.894 > 3.84 $$

and we can therefore reject the null hypothesis H1 in this case as well. The logarithmic model is also rejected. Since both models are rejected, better models should be developed: we cannot keep the composite model with two different cost-related coefficients since it does not have a behavioral interpretation. If both models had been accepted, we would choose the one with the highest ρ̄² index.


Parameter   Parameter   Parameter   Robust
number      name        estimate    standard error   t-stat   p-value
1           ASC2        -1.82       0.194            -9.39    0.00
2           ASC3        -2.09       0.200            -10.46   0.00
3           Fare        -8.54       0.305            -28.02   0.00
4           Legroom     0.219       0.0261           8.38     0.00
5           SchedDE     -0.142      0.0167           -8.50    0.00
6           SchedDL     -0.105      0.0139           -7.54    0.00
7           Total_TT1   -0.465      0.0729           -6.37    0.00
8           Total_TT2   -0.335      0.0690           -4.86    0.00
9           Total_TT3   -0.321      0.0692           -4.63    0.00

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2283.103
ρ̄² = 0.422

Table 6.20: Estimation results for model M2

Tests of Non-Linear Specifications

Files to use with Biogeme:
Model files: SpecTest_airline_piecewise.mod, SpecTest_airline_powerseries.mod, SpecTest_airline_boxcox.mod
Data file: airline.dat

The models studied previously were specified with linear-in-parameter formulations of the deterministic parts of the utilities (i.e. parameters that remain constant throughout the whole range of the values of each variable). However, in some cases non-linear specifications may be more justified. In this section, we test three different non-linear specifications of the deterministic utility functions: a piecewise linear specification of the time parameter of the non-stop itinerary, a power series method and a Box-Cox transformation.


Parameter   Parameter   Parameter   Robust
number      name        estimate    standard error   t-stat   p-value
1           ASC2        -1.69       0.193            -8.74    0.00
2           ASC3        -1.94       0.199            -9.72    0.00
3           Fare        -0.00658    0.00154          -4.28    0.00
4           Legroom     0.223       0.0265           8.40     0.00
5           LogFare     -5.96       0.665            -8.96    0.00
6           SchedDE     -0.142      0.0167           -8.51    0.00
7           SchedDL     -0.106      0.0140           -7.57    0.00
8           Total_TT1   -0.415      0.0739           -5.62    0.00
9           Total_TT2   -0.324      0.0694           -4.67    0.00
10          Total_TT3   -0.316      0.0697           -4.53    0.00

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2271.656
ρ̄² = 0.425

Table 6.21: Estimation results for model MC


Piecewise Linear Approximation

In this first example, we want to test the hypothesis that the travel time related parameter for the non-stop itinerary alternative assumes different values for different ranges of values of the variable itself. We split the range of values for travel time TripTimeHours_1 ∈ [0.67, 6.35] (expressed in hours) into three different intervals: TripTimeHours_1_1 ∈ [0, 2], TripTimeHours_1_2 ∈ ]2, 3], TripTimeHours_1_3 > 3. Figure 6.8 displays the corresponding Biogeme code.

[Expressions]

TripTimeHours_1_1 = min( TripTimeHours_1 , 2)

TripTimeHours_1_2 = max(0,min( TripTimeHours_1 - 2, 1))

TripTimeHours_1_3 = max(0,TripTimeHours_1 - 3)

Figure 6.8: Biogeme snapshot for the definition of the variables related to the piecewise linear approximation

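The systematic utility expressions used in this model are given as follows:

$$
\begin{aligned}
V_1 &= \text{ASC}_1 + \beta_{\text{Fare}} \cdot \text{Fare}_1 + \beta_{\text{Legroom}} \cdot \text{Legroom}_1 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt1\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt1\_SchedDelayLate} \\
&\quad + \beta_{\text{Total\_TT1\_1}} \cdot \text{Total\_TT1\_1} + \beta_{\text{Total\_TT1\_2}} \cdot \text{Total\_TT1\_2} \\
&\quad + \beta_{\text{Total\_TT1\_3}} \cdot \text{Total\_TT1\_3} \\
V_2 &= \text{ASC}_2 + \beta_{\text{Fare}} \cdot \text{Fare}_2 + \beta_{\text{Legroom}} \cdot \text{Legroom}_2 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt2\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt2\_SchedDelayLate} \\
&\quad + \beta_{\text{Total\_TT2}} \cdot \text{Total\_TT}_2 \\
V_3 &= \text{ASC}_3 + \beta_{\text{Fare}} \cdot \text{Fare}_3 + \beta_{\text{Legroom}} \cdot \text{Legroom}_3 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt3\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt3\_SchedDelayLate} \\
&\quad + \beta_{\text{Total\_TT3}} \cdot \text{Total\_TT}_3
\end{aligned}
$$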

The estimation results are shown in Table 6.22. All time coefficients related to the piecewise linear expression are negative. The coefficient associated with short trips (< 2 hours) is the largest in absolute value, meaning that the same increase of travel time penalizes the utility of the non-stop alternative more if the trip is shorter than 2 hours than if it is longer than 2 hours. Similarly, the same increase of travel time penalizes the utility of the non-stop alternative more for trips of intermediate duration (between 2 and 3 hours) than for trips lasting longer than 3 hours.

Piecewise linear model: estimation results

Parameter   Parameter       Coeff.     Robust           Robust
number      name            estimate   standard error   t-stat
1           ASC_2           -2.33      0.412            -5.65
2           ASC_3           -2.55      0.438            -5.83
3           β_Fare          -0.0193    0.000799         -24.10
4           β_Legroom       0.227      0.0267           8.51
5           β_SchedDE       -0.140     0.0165           -8.47
6           β_SchedDL       -0.105     0.0137           -7.64
7           β_Total_TT1_1   -0.825     0.238            -3.47
8           β_Total_TT1_2   -0.443     0.188            -2.36
9           β_Total_TT1_3   -0.229     0.0889           -2.57
10          β_Total_TT2     -0.300     0.0701           -4.29
11          β_Total_TT3     -0.301     0.0701           -4.29
. . .

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2315.041
ρ̄² = 0.413

Table 6.22: Estimation results for the piecewise linear model

We perform a likelihood ratio test where the restricted model is the one with linear travel time for the non-stop alternative and the unrestricted model is the piecewise linear specification. The null hypothesis is given as follows:

$$ H_0 : \beta_{\text{Total\_TT1\_1}} = \beta_{\text{Total\_TT1\_2}} = \beta_{\text{Total\_TT1\_3}} $$

The statistic for the likelihood ratio test is the following:

$$ -2\,(-2320.447 + 2315.041) = 10.812 $$

Since χ²(0.95, 2) = 5.99, we can reject the null hypothesis of a linear travel time for the non-stop alternative at a 95% level of confidence.


The Power Series Expansion

We introduce here a power series expansion for the travel time of the non-stop itinerary. Other polynomial expressions could be tried as well, but in the following example, we only specify a squared term.

The specification of the model presented in this section is the same as the one presented in the previous section, except for the utility of the non-stop itinerary. The latter is given as follows:

$$
\begin{aligned}
V_1 &= \text{ASC}_1 + \beta_{\text{Fare}} \cdot \text{Fare}_1 + \beta_{\text{Legroom}} \cdot \text{Legroom}_1 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt1\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt1\_SchedDelayLate} \\
&\quad + \beta_{\text{Total\_TT1}} \cdot \text{Total\_TT}_1 + \beta_{\text{Total\_TT1\_sq}} \cdot \text{Total\_TT1\_sq}
\end{aligned}
$$

Power series model: estimation results

Parameter   Parameter        Coeff.     Robust           Robust
number      name             estimate   standard error   t-stat
1           ASC_2            -2.21      0.298            -7.42
2           ASC_3            -2.43      0.312            -7.78
3           β_Fare           -0.0193    0.000800         -24.11
4           β_Legroom        0.227      0.0267           8.51
5           β_SchedDE        -0.139     0.0165           -8.46
6           β_SchedDL        -0.105     0.0137           -7.63
7           β_Total_TT1      -0.870     0.172            -5.05
8           β_Total_TT1_sq   0.0745     0.0220           3.38
9           β_Total_TT2      -0.301     0.0701           -4.30
10          β_Total_TT3      -0.302     0.0701           -4.31
. . .

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2314.435
ρ̄² = 0.414

Table 6.23: Estimation results for the power series model


The estimation results for this specification are shown in Table 6.23. The estimated parameter associated with the linear term of the power series expansion is negative while the estimated parameter associated with the squared term is positive. However, for reasonable travel times, the cumulative effect of the travel time variable on the utility is still negative, as the coefficient associated with the squared term is much smaller in absolute value.

In order to see if the power series specification is better than the linear one, we perform a likelihood ratio test. Here, the restricted model is the one with linear travel time for the non-stop alternative and the unrestricted model is the one with the power series expansion. The null hypothesis is given by:

$$ H_0 : \beta_{\text{Total\_TT1\_sq}} = 0 $$

The statistic for the likelihood ratio test is given as follows:

$$ -2\,(-2320.447 + 2314.435) = 12.024 $$

Since χ²(0.95, 1) = 3.841, we can reject the null hypothesis of a linear travel time for the non-stop alternative at a 95% level of confidence.

The Box-Cox Transformation

In this section, we specify a Box-Cox transformation, which is a non-linear transformation of a variable that also depends on an unknown parameter λ.

Precisely, a Box-Cox transformation of a variable x is given as follows:

$$ \frac{x^{\lambda} - 1}{\lambda}, \quad \text{where } x \geq 0. $$

We apply this transformation to the travel time variable for the non-stop itinerary. The utilities are the same as the previous models, apart from the one relative to the non-stop itinerary, which we report below:

$$
\begin{aligned}
V_1 &= \text{ASC}_1 + \beta_{\text{Fare}} \cdot \text{Fare}_1 + \beta_{\text{Legroom}} \cdot \text{Legroom}_1 \\
&\quad + \beta_{\text{SchedDE}} \cdot \text{Opt1\_SchedDelayEarly} + \beta_{\text{SchedDL}} \cdot \text{Opt1\_SchedDelayLate} \\
&\quad + \beta_{\text{Total\_TT1}} \cdot \frac{\text{Total\_TT}_1^{\lambda} - 1}{\lambda}
\end{aligned}
$$


[GeneralizedUtilities]

1 Total_TT1 * ( ( ( TripTimeHours_1 ) ^ LAMBDA - 1 ) / LAMBDA )

Figure 6.9: Biogeme snapshot of Box-Cox transformation

Let us note that in this specification, we have one more unknown parameter, λ. Figure 6.9 displays a Biogeme snapshot from the model specification file.

The results relative to the model including the Box-Cox transformation are shown in Table 6.24.

Let us remark that the Box-Cox transformation reduces to a linear function as a special case when the parameter λ is equal to 1. The estimate of λ is significantly different from 1 at a 95 % level of confidence, with a t-test equal to −3.36.

We perform a likelihood ratio test between the linear model and the Box-Cox model. The null hypothesis is given by:

$$ H_0 : \lambda = 1 $$

The statistic of the likelihood ratio test for this null hypothesis is given as follows:

$$ -2\,(-2320.447 + 2314.574) = 11.746 $$

$$ \chi^2_{0.95,1} = 3.841 < 11.746 $$

The null hypothesis of a linear specification is hence rejected at a 95 % level of confidence. Therefore, the Box-Cox transformation of the time is more adequate.


Box-Cox transformed model: estimation results

Parameter  Parameter   Coeff.     Robust           Robust
number     name        estimate   standard error   t-stat
1          ASC2        -1.51      0.263            -5.77
2          ASC3        -1.74      0.280            -6.22
3          Fare        -0.0193    0.000799         -24.12
4          lambda      -0.139     0.338            -0.41
5          Legroom     0.227      0.0267           8.52
6          SchedDE     -0.140     0.0165           -8.47
7          SchedDL     -0.105     0.0137           -7.63
8          Total TT1   -1.24      0.372            -3.34
9          Total TT2   -0.306     0.0681           -4.49
10         Total TT3   -0.306     0.0683           -4.48
. . .

Summary statistics
Number of observations = 3609
L(0) = −3964.892
L(β) = −2314.574
ρ² = 0.414

Table 6.24: Estimation results for the Box-Cox transformed model


Chapter 7

Forecasting

The objective of this case study is to forecast market shares for different policy scenarios using the models estimated in the logit model case study. You can choose between the Swissmetro, Residential Telephone Services and Airline Itinerary datasets. A detailed description of each dataset can be found in Appendix A.

The provided forecasting examples are given in the following sections: Swissmetro in section 7.2 on page 135, Residential Telephone Services in section 7.3 on page 138 and Airline Itinerary in section 7.4 on page 141.

7.1 Guidelines

This case study differs from the previous ones since you do not develop new model specifications. Instead, you use the model specifications from the logit model case study. In addition to the programs you normally use, you need a spreadsheet application such as OpenOffice Calc or Microsoft Office Excel.

The estimated coefficients of a discrete choice model can be used to calculate the choice probability of each alternative for each observation in the sample. In forecasting, however, we are interested in the aggregate market shares for the entire population or for different segments. It could also be interesting to know how these aggregate market shares are affected by a change in an independent variable.

In this case study, you learn to aggregate the individual probabilities to obtain market shares and to test the effect of different alternative scenarios on the market shares. In all case studies it is assumed that the available sample is a random sample of the population.

Start by studying the given “base case” as well as the corresponding forecasting scenario based on cost policy changes. Use the given model and spreadsheet (distributed on the course USB key) to test and analyze the proposed scenarios.


7.2 Swissmetro Case

Forecasting the Effect of Change in Swissmetro Cost

Files to use with BIOGEME:
Model files: MNL SM socioec.mod
             MNL SM socioec res.mod
             MNL SM socioec res2.mod
Data file: swissmetro.dat
Excel worksheet: swissmetro.xls

In this case study, we forecast the effects of a change in Swissmetro costs across different market segments. (See Chapter 6 in Ben-Akiva and Lerman, 1985 for details on forecasting techniques.) Suppose that we know that market segmentation exists on income. We can then consider three markets, namely low income, medium income and high income, which are defined as follows:

• Low Income: under $50,000 (INCOME = 0 or 1)

• Medium Income: between $50,000 and $100,000 (INCOME = 2)

• High Income: over $100,000 (INCOME = 3).

We use the model MNL SM socioec.mod, from the case study on logit models (Chapter 5). The procedure used for forecasting market shares is the following:

• Estimate the model with BIOGEME.

• Compute predicted probabilities with BioSim. See Section 2.7 on page 34 for instructions on how to use BioSim.

• Excel can be used for editing and processing the data and probabilities. For example, you can open the data file with Excel and paste the probabilities given in the BioSim result file into the Excel file.

We have provided an Excel file (swissmetro.xls) containing the observations and their corresponding probabilities. This file has also been used for computing market shares by averaging the alternative probabilities over each market segment.
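The averaging step performed in the spreadsheet can also be sketched in code. The example below is only an illustration: the function names and the four observations are hypothetical (they are not taken from swissmetro.dat or from a BioSim result file), while the INCOME coding of the segments follows the definitions given above.

```python
from collections import defaultdict


def income_segment(row):
    """Map the INCOME code to a market segment (0 or 1: low, 2: medium, 3: high)."""
    if row["INCOME"] in (0, 1):
        return "low"
    if row["INCOME"] == 2:
        return "medium"
    return "high"


def market_shares(rows, segment_of):
    """Average the predicted probabilities of each alternative within each segment."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        segment = segment_of(row)
        counts[segment] += 1
        for alternative, probability in row["probs"].items():
            totals[segment][alternative] += probability
    return {
        segment: {alt: total / counts[segment] for alt, total in alts.items()}
        for segment, alts in totals.items()
    }


# Hypothetical observations: one record per respondent with predicted probabilities
rows = [
    {"INCOME": 1, "probs": {"CAR": 0.10, "TRAIN": 0.30, "SM": 0.60}},
    {"INCOME": 0, "probs": {"CAR": 0.20, "TRAIN": 0.20, "SM": 0.60}},
    {"INCOME": 2, "probs": {"CAR": 0.30, "TRAIN": 0.10, "SM": 0.60}},
    {"INCOME": 3, "probs": {"CAR": 0.40, "TRAIN": 0.10, "SM": 0.50}},
]
shares = market_shares(rows, income_segment)
```

Because the sample is assumed to be a random sample of the population, the unweighted average within a segment is used as the forecast of that segment's market share.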


We would like to investigate the influence of cost on the market shares of Swissmetro. We therefore increase the cost of the Swissmetro by 20% and we forecast the market shares after this change. We modify the file MNL SM socioec res.mod to take into account the cost policy in the following way:

[Expressions]

SM_COST = 1.2 * SM_CO * ( GA == 0 )

We name this file MNL SM socioec res2.mod. It is provided with this case study. We simulate again using BioSim in order to obtain the alternative probabilities under this new scenario. The probabilities are integrated in the Excel file (swissmetro.xls) and the market shares can be computed in the same way as for the base case. The results for the base case and the new cost scenario are given in Table 7.1. We can note a decrease in the market shares of Swissmetro for all market segments. However, the decrease is modest (2 to 6 percentage points), which indicates that travelers are not very sensitive to cost changes for this new transportation mode.
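To see how the modified cost expression propagates to the choice probabilities, the sketch below recomputes logit probabilities for a single traveler before and after the 20% increase. All coefficients and attribute values are hypothetical and chosen only for illustration; they are not the estimates of MNL SM socioec.mod, and the utility functions are simplified. Only the structure of the cost expression, factor * SM_CO * ( GA == 0 ), is taken from the model file shown above.

```python
import math

# Hypothetical coefficients, NOT the estimated values of the workbook model
ASC_SM = 0.3
B_TIME = -0.01
B_COST = -0.01


def sm_cost(sm_co, ga, factor=1.0):
    """Mirror of SM_COST = factor * SM_CO * ( GA == 0 ): GA holders pay nothing."""
    return factor * sm_co * (1 if ga == 0 else 0)


def utilities(factor, ga=0):
    """Systematic utilities of one hypothetical traveler (made-up times and costs)."""
    return {
        "CAR": B_TIME * 150 + B_COST * 80,
        "TRAIN": B_TIME * 170 + B_COST * 60,
        "SM": ASC_SM + B_TIME * 90 + B_COST * sm_cost(70, ga, factor),
    }


def logit_probabilities(v):
    """Multinomial logit probabilities, shifted by max(v) for numerical stability."""
    shift = max(v.values())
    exps = {alt: math.exp(u - shift) for alt, u in v.items()}
    total = sum(exps.values())
    return {alt: e / total for alt, e in exps.items()}


base = logit_probabilities(utilities(1.0))
scenario = logit_probabilities(utilities(1.2))
```

For a traveler holding a GA (ga=1) the cost term is zero in both cases, so the scenario leaves that traveler's probabilities unchanged; this is one reason why the aggregate effect of the price increase is dampened.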

Figure 7.1 shows the market shares of the Swissmetro alternative for the low and high income segments as a function of changes in Swissmetro cost. Surprisingly, the sensitivity to cost is higher for the high income group than for the low income group. This might indicate that a different model specification should be attempted (for example, one that includes income as an explanatory variable). Equally surprising, the Swissmetro alternative has a higher market share for the low income group than for the high income group. This could be due to the SP data collection, where the price for Swissmetro may not have been high enough to capture the differences between these groups.

             Base case                    Forecast
         Low INC  Med INC  Hi INC    Low INC  Med INC  Hi INC
CAR        14       28       32        16       31       36
TRAIN      23       12        9        24       13       10
SM         62       60       60        60       56       54

Table 7.1: Market Shares (percent) for increased cost of Swissmetro

[Figure: market share (%) of the Swissmetro alternative plotted against changes in Swissmetro cost (-20%, -10%, base, +10%, +20%), with one curve for the high income segment and one for the low income segment.]

Figure 7.1: Swissmetro: Market Shares for Low and High Income Segments

It would also be interesting to investigate the impact on the market shares for the following two policy scenarios:

• The Swissmetro SA has decided to provide a 20% discount to youths (age < 24) and a 50% discount to the elderly (age > 65) when using Swissmetro. To compensate for the lost revenue, the company considers increasing the general Swissmetro fare uniformly by 10%.

• The Swissmetro SA is considering an alternative option of making incremental investment in Swissmetro and initially starting with half the maglev trains they originally planned to purchase. To meet the growing demand, they are also considering doubling the frequency of the regular trains.


7.3 Choice of Residential Telephone Services Case

Forecasting the Effect of Change in Cost Across Market Segments

Files to use with Biogeme:
Model files: MNL Tel socioec.mod
             MNL Tel socioec res.mod
             MNL Tel socioec res2.mod
Data file: telephone.dat
Excel worksheet: telephone.xls

In this case study, we forecast the effects of a change in the cost of alternatives across different market segments. (See Chapter 6 in Ben-Akiva and Lerman, 1985 for details on forecasting techniques.) Suppose that we know that market segmentation exists on income (Inc). We can then consider three markets, namely low income, medium income and high income. We define these market segments as follows:

• Low Income: under $20,000 (Inc = 1 or 2)

• Medium Income: Between $20,000 and $40,000 (Inc = 3 or 4)

• High Income: Over $40,000 (Inc = 5).

We use the model MNL Tel socioec.mod from the case study on logit models (Chapter 5). The procedure used for forecasting market shares is the following:

• Estimate the model with Biogeme.

• Compute predicted probabilities with BioSim. See Section 2.7 on page 34 for instructions on how to use BioSim.

• Excel can be used for editing and processing the data and probabilities. For example, you can open the data file with Excel and paste the probabilities given in the BioSim result file into the Excel file.


We have provided an Excel file (telephone.xls) containing the observations and their corresponding probabilities. This file has also been used for computing market shares by averaging the alternative probabilities over each market segment.

Assume that the telephone company, in an effort to increase revenues, considers raising the fixed costs for alternatives SM, LF, EF and MF by $4, $6, $7 and $11, respectively. We would like to forecast the market shares after this change. We modify the file MNL Tel socioec res.mod to take into account the cost policy in the following way:

[Expressions]

logcost1 = log(cost1 )

logcost2 = log(cost2 + 4 )

logcost3 = log(cost3 + 6 )

logcost4 = log(cost4 + 7 )

logcost5 = log(cost5 + 11 )

We name this file MNL Tel socioec res2.mod, and it is provided with this case study. We simulate again using BioSim in order to obtain the alternative probabilities under this new scenario. The probabilities are integrated in the Excel file (telephone.xls), and the market shares can be computed in the same way as for the base case. The results for the base case and the new cost scenario are given in Table 7.2. The cost change leaves the shares of the EF and MF alternatives almost unchanged. There is, however, a substantial shift towards the BM alternative in all market segments.

Figure 7.2 shows the market shares of the standard measure (SM) alternative for the low and high income segments as a function of changes in SM cost. We can see that the sensitivity to cost is about the same for the two market segments. The SM alternative has, however, a higher market share for the low income group than for the high income group.

             Base case                    Forecast
         Low INC  Med INC  Hi INC    Low INC  Med INC  Hi INC
BM         19       14       13        34       26       23
SM         30       28       23        22       21       18
LF         40       43       41        34       39       37
EF          0        1        2         0        1        2
MF         11       14       21        10       13       19

Table 7.2: Market Shares (percent)

[Figure: market share (%) of the SM alternative plotted against changes in SM cost (-20%, -10%, base, +10%, +20%), with one curve for the high income segment and one for the low income segment.]

Figure 7.2: Market Shares for Low and High Income Segments, SM alternative

It would also be interesting to investigate the impact on the market shares for the following two policy scenarios:

• Due to legal restrictions, the telephone company is expected to subsidize the telephone costs of elderly households (a household with at least 1 household member older than 65 years) and low-income households (a household with annual household income less than $20,000). The telephone company must provide a 50% discount on these households’ telephone costs. To compensate for these losses in revenue, the company considers increasing the telephone costs of all other households uniformly by 10%.

• Due to recession, the number of employed persons per household has been reduced to half of that in the previous scenario, and the telephone company has decided to provide a 20% discount for households that have no employed persons. To compensate for these losses in revenue, the company considers increasing the telephone costs of households with at least one employed person by 10%.


7.4 Airline Itinerary Case

Forecasting the Effect of Change in the Cost of the Non-stop Itinerary

Files to use with Biogeme:
Model files: MNL airline.mod
             MNL airline res.mod
             MNL airline res2.mod
Data file: airline.dat
Excel worksheet: airline.xls

In this case study, we are interested in forecasting the effects of changes in the fare of the non-stop airline itinerary for different market segments, i.e. individuals who pay for their trips and individuals whose airplane ticket is paid by a third party. We assume that there is evidence of market segmentation between these two groups. Precisely, the segments are defined as follows:

• Traveler pays: category “traveler is paying for the trip” (q03 WhoPays = 1)

• Third party pays: categories “employer pays” (q03 WhoPays = 2) and “third party pays” (q03 WhoPays = 3)

The base model we are using here is MNL airline.mod. The procedure used to forecast the market shares of the different airline itineraries is the following:

• Estimate the model with Biogeme.

• Compute the predicted probabilities with BioSim. See Section 2.7 on page 34 for instructions on how to use BioSim.

• Excel can be used for editing and processing the data and probabilities. For example, you can open the data file with Excel and paste the probabilities given in the BioSim result file into the Excel file.


An Excel file (airline.xls) which contains the observations and their corresponding probabilities is provided. In this file you can also find the market shares for each alternative, which were obtained by averaging the probabilities of the alternative over each market segment.

We would like to investigate the influence of a change in the non-stop itinerary fare on the market shares of the three alternatives. For example, we increase the fare of the non-stop itinerary by 20% and observe the subsequent changes in the market shares.

From the estimation procedure of model MNL airline.mod we obtained file MNL airline.res. This file has been renamed as MNL airline res.mod and is also provided in the folder that contains the files for this case study. We now modify it in order to take into account the change of fare in the non-stop itinerary. This is performed in the section called [Expressions] as follows:

[Expressions]

HighFare_1 = 1.2 * Fare_1

The modified file is called MNL airline res2.mod and is also provided with this case study. We perform a new simulation with BioSim in order to obtain the probabilities of the different alternatives for this scenario. The probabilities have been included in the Excel file (airline.xls) and the market shares are computed in the same way as for the base case. The results for the base case and the new cost scenario are reported in Table 7.3.

A substantial decrease in the market share of the non-stop itinerary can be noticed for both market segments. This shows that individuals are sensitive to variations in the fare of the direct flight.

Figure 7.3 shows the evolution of the market share of the non-stop flight itinerary for the market segments of individuals who pay for their trips and individuals who do not, with respect to several changes in the non-stop flight fare. As expected, we notice that individuals who pay for their flight are slightly more sensitive to changes in the fare of the non-stop alternative. This result shows that the variable indicating who pays for the trip could be included as an explanatory variable in the model.


               Base case                     Forecast
         Traveler   Third party        Traveler   Third party
         pays       pays               pays       pays
Opt1     69.4       69.5               43.9       42.9
Opt2     16.4       15.9               29.9       30.1
Opt3     14.2       14.6               26.1       27.1

Table 7.3: Market Shares (percent) for an increased cost of the non-stop itinerary

[Figure: market share (%) of the non-stop alternative plotted against changes in the cost of the non-stop alternative (-20%, -10%, base, +10%, +20%), with one curve for the “Traveler pays” segment and one for the “Third party pays” segment.]

Figure 7.3: Airline itinerary: Market Shares for “Traveler pays” and “Third party pays” segments


Chapter 8

Multivariate (Generalized) Extreme Value Models

The topic of this case study is the specification and estimation of Multivariate (Generalized) Extreme Value (MEV) models. Different specifications are introduced using a stepwise modeling strategy, increasing the complexity at each step. The objectives of this case study can be summarized as follows:

• Specification and estimation of Nested Logit (NL) models.

• Testing of the nesting parameters.

• Estimation of Cross Nested Logit (CNL) models, with fixed alpha parameters.

• Estimation of CNL models with unknown alpha parameters.

For this case study, you can choose between the Swissmetro and Residential Telephone Services datasets. A detailed description of each dataset can be found in Appendix A.

We focus here on the correlation among alternatives and different ways to include this correlation in the model structure. We iteratively test different types of nesting structures for the Nested and Cross-Nested Logit models.

The examples of model specifications that we have provided can be found in the following sections: Swissmetro in section 8.2 on page 150 and Residential Telephone Services in section 8.3 on page 158.


8.1 Challenge Question

The Swissmetro dataset

Innovation in the market for intercity passenger transportation is a difficult enterprise, as the existing modes (private car, coach, rail, as well as regional and long-distance air services) continue to innovate in their own right by offering new combinations of speeds, services, prices and technologies. Consider for example high-speed rail links between the major centers or direct regional jet services between smaller countries. The Swissmetro SA in Geneva is promoting such an innovation: a mag-lev underground system operating at speeds up to 500 km/h in partial vacuum, connecting the major Swiss conurbations, in particular along the Mittelland corridor (St. Gallen, Zurich, Bern, Lausanne and Geneva).

The dataset consists of survey data collected on the trains between St. Gallen and Geneva, Switzerland, during March 1998. The interviewed respondents provided information in order to analyze the impact of the modal innovation in transportation, represented by the Swissmetro, a revolutionary mag-lev underground system, against the usual transport modes represented by car and train. The Swissmetro is a true innovation. It is therefore not appropriate to base forecasts of its impact on existing revealed preference (RP) data. As a consequence, a stated preference (SP) survey has been conducted, which yielded 6759 usable observations.

Data description

Please read Appendix A.3 of the workbook for details.

Estimation of a Nested Logit Model

Files to use with Biogeme:
Model file: GEV SM NL Challenge.mod
Data file: swissmetro.dat

We hypothesize that the public transportation alternatives share unobservable factors. We want our model to incorporate the potential correlation pattern between the unobservable parts of the Swissmetro and train alternatives. We group them inside the Public nest. The Car alternative remains alone in the Private nest.


[Figure: nesting tree with two nests. The Private nest contains Car; the Public nest contains Train and SM.]

Figure 8.1: The correlation structure of the specified NL model

The model structure is shown in Figure 8.1.

The model file used by Biogeme is shown in Figure 8.2.

When we ran this model in Biogeme, we obtained the results shown in Table 8.1.

Questions: Can we use this model? Motivate your answer.


[Choice]

CHOICE

[Beta]

// Name Value LowerBound UpperBound status (0=variable, 1=fixed)

ASC_CAR 0 -1000 1000 0

ASC_SBB 0 -1000 1000 0

ASC_SM 0 -1000 1000 0

B_COST 0 -1000 1000 0

B_CAR_TIME 0 -1000 1000 0

B_TRAIN_TIME 0 -1000 1000 0

B_SM_TIME 0 -1000 1000 0

B_HE 0 -1000 1000 0

B_GA 0 -1000 1000 0

[Utilities]

// Id Name Avail linear-in-parameter expression (beta1*x1 + beta2*x2 + ... )

1 SBB_SP TRAIN_AV_SP B_TRAIN_TIME * TRAIN_TT + B_COST * TRAIN_CO + B_HE * TRAIN_HE

+ B_GA * GA

2 SM_SP SM_AV ASC_SM * one + B_SM_TIME * SM_TT + B_COST * SM_CO + B_HE * SM_HE

+ B_GA * GA

3 Car_SP CAR_AV_SP ASC_CAR * one + B_CAR_TIME * CAR_TT + B_COST * CAR_CO

[Model]

$NL

[NLNests]

// Name paramvalue LowerBound UpperBound status list of alt

public 1.0 1 10 0 1 2

private 1.0 1 10 1 3

[Expressions]

one = 1

Figure 8.2: Swissmetro NL specification for Biogeme


NL Model Estimation Results

Variable  Variable       Coefficient  Robust     Robust     Robust
number    name           estimate     std error  t-stat. 0  t-stat. 1
1         ASC_CAR        0.256        0.163      1.57
2         ASC_SM         0.434        0.129      3.37
3         B_CAR_TIME     -0.0104      0.00111    -9.30
4         B_COST         -0.00124     0.000178   -6.95
5         B_GA           7.18         0.976      7.35
6         B_HE           -0.00541     0.00108    -5.01
7         B_SM_TIME      -0.0110      0.00187    -5.87
8         B_TRAIN_TIME   -0.0120      0.00179    -6.69
9         µ_private      1.0
10        µ_public       1.14         0.160      7.10       0.87

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5244.668
ρ² = 0.245

Table 8.1: Estimation results for the Swissmetro Nested Logit model


8.2 Swissmetro Case

Estimation of a Nested Logit Model

Files to use with Biogeme:
Model file: GEV SM NL.mod
Data file: swissmetro.dat

The application of the IIA McFadden test in the case study on specification testing revealed that the IIA assumption does not hold between the car and train alternatives. This is an indication of probable correlation between car and train. We start with a Nested Logit (NL) specification, where the car and train alternatives are both assigned to the same nest and the Swissmetro is alone in a second nest, as shown in Figure 8.4. See Chapter 10 in Ben-Akiva and Lerman (1985) for details on the NL model.

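The expressions of the systematic utility functions for each alternative used in this model specification are

$$V_{car} = ASC_{car} + \beta_{CAR\_time} \, \text{CAR\_TT} + \beta_{cost} \, \text{CAR\_CO}$$

$$V_{train} = \beta_{TRAIN\_time} \, \text{TRAIN\_TT} + \beta_{cost} \, \text{TRAIN\_CO} + \beta_{he} \, \text{TRAIN\_HE} + \beta_{GA} \, \text{GA}$$

$$V_{sm} = ASC_{SM} + \beta_{SM\_time} \, \text{SM\_TT} + \beta_{cost} \, \text{SM\_CO} + \beta_{he} \, \text{SM\_HE} + \beta_{GA} \, \text{GA},$$

and Figure 8.3 shows an extract from the .mod file illustrating the nest specification with Biogeme. Note that only one of the two nest parameters can be estimated: the Innovative nest contains a single alternative, so its parameter is fixed to 1. The estimation results are shown in Table 8.2.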
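The role of the nest parameters can be illustrated by computing nested logit probabilities directly. The sketch below assumes the normalization used in this chapter (top-level scale µ = 1, one scale µ_m per nest) and the usual decomposition into a nest probability and a within-nest probability. The function name and the utility values are hypothetical; only the nest structure and the value 1.64 echo the model discussed here.

```python
import math


def nested_logit_probabilities(v, nests):
    """Nested logit probabilities with the top-level scale normalized to 1.

    v:     {alternative: systematic utility}
    nests: {nest name: (mu_m, [alternatives in the nest])}
    """
    # Sum of exp(mu_m * V_j) within each nest, and the corresponding logsum
    sums = {name: sum(math.exp(mu_m * v[a]) for a in alts)
            for name, (mu_m, alts) in nests.items()}
    logsums = {name: math.log(sums[name]) / nests[name][0] for name in nests}
    denominator = sum(math.exp(ls) for ls in logsums.values())

    probabilities = {}
    for name, (mu_m, alts) in nests.items():
        p_nest = math.exp(logsums[name]) / denominator
        for a in alts:
            # P(a) = P(nest) * P(a | nest)
            probabilities[a] = p_nest * math.exp(mu_m * v[a]) / sums[name]
    return probabilities


# Hypothetical utilities for one traveler
V = {"car": -1.0, "train": -1.2, "sm": -0.8}

nl = nested_logit_probabilities(
    V, {"classic": (1.64, ["car", "train"]), "innovative": (1.0, ["sm"])}
)
# With all nest parameters equal to 1 the model collapses to the logit model
mnl_like = nested_logit_probabilities(
    V, {"classic": (1.0, ["car", "train"]), "innovative": (1.0, ["sm"])}
)
```

For a nest with a single alternative the value of µ_m cancels out of the probability, which is why the parameter of the Innovative nest cannot be estimated.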

The alternative specific constants show a preference for the Swissmetro alternative compared to the other modes, all the rest remaining constant. The cost and travel time coefficients have the expected negative sign. The coefficient related to the ownership of the Swiss annual season ticket (GA) is positive as expected, reflecting the preference for the SM and train alternatives with respect to the car alternative. The negative estimated value of the headway parameter βhe indicates that the higher the headway, the lower the frequency of service, and thus the lower the utility. Finally, the scale parameter of the random term associated with the classic nest has been estimated as µclassic = 1.64.

[NLNests]
// Name paramvalue LowerBound UpperBound status list of alt
Classic 1.0 1 10 0 1 3
Innovative 1.0 1 10 1 2

Figure 8.3: Biogeme snapshot

[Figure: nesting tree with two nests. The Innovative nest contains SM; the Classic nest contains Car and Train.]

Figure 8.4: The correlation structure of the specified NL model

To be consistent with random utility theory, the inequality $\mu / \mu_m < 1$ must hold; with µ being normalized to 1, this implies $\mu_m > 1$. To see if this is the case here, we can test the null hypothesis $H_0 : \mu_m = 1$. Since there is a single restriction, we can use either a t-test or a likelihood ratio test, which are asymptotically equivalent. The t-statistic with respect to 1 can be computed as follows:

$$\frac{\mu_m - 1}{\text{std err of } \mu_m}$$

It is also output by Biogeme. Here the t-statistic with respect to 1 is 4.86, which indicates that µclassic is significantly different from 1, and hence there is a significant correlation between the car and train alternatives.
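As a quick numerical check, the t-statistic can be recomputed from the rounded values reported in Table 8.2, which gives a value close to (but not exactly) the 4.86 output by Biogeme. The sketch also evaluates the within-nest correlation implied by the estimate, using the standard nested logit result corr = 1 − (µ/µ_m)² with µ = 1; that formula is not stated in this workbook and is brought in here as an assumption. Function names are hypothetical.

```python
def t_stat_against(estimate, std_err, hypothesized):
    """t-statistic of an estimate against a hypothesized value."""
    return (estimate - hypothesized) / std_err


def implied_nest_correlation(mu_m, mu=1.0):
    """Correlation between the utilities of two alternatives sharing nest m
    (standard nested logit result, assumed here): 1 - (mu / mu_m)**2."""
    return 1.0 - (mu / mu_m) ** 2


# Rounded values of mu_classic and its robust standard error from Table 8.2
t_classic = t_stat_against(1.64, 0.132, 1.0)
corr_classic = implied_nest_correlation(1.64)
```

With µ_m = 1 the implied correlation is zero, which is the logit case tested by the null hypothesis.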

We can also do a likelihood ratio test as follows. The test statistic for the null hypothesis is given by

$$-2(L_R - L_U) = -2(-5245.550 + 5207.794) = 75.512$$

where the restricted model is the logit model (SpecTest SM socioec bis.mod) and the unrestricted model is the nested logit model. The test statistic is asymptotically χ² distributed with 1 degree of freedom since there is 1 restriction. Since 75.512 > 3.841 (the critical value of the χ² distribution with 1 degree of freedom at a 95% level of confidence), we reject the null hypothesis (logit model) and accept the nested logit model.

NL model

Parameter  Parameter      Parameter   Robust          Robust     Robust
number     name           estimate    standard error  t-stat. 0  t-stat. 1
1          ASC_car        0.0272      0.119           0.23
2          ASC_SM         0.243       0.119           2.05
3          β_cost         -0.000986   0.000105        -9.36
4          β_car_time     -0.00874    0.00101         -8.64
5          β_train_time   -0.0113     0.000958        -11.77
6          β_SM_time      -0.00995    0.00163         -6.09
7          β_he           -0.00472    0.000862        -5.48
8          β_ga           5.39        0.582           9.26
9          µ_classic      1.64        0.132           12.42      4.86

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5207.794
ρ² = 0.250

Table 8.2: NL estimation results

[Figure: cross-nested tree. The Rail-Based nest contains SM and Train; the Classic nest contains Train and Car.]

Figure 8.5: A representative scheme for the CNL correlation structure.

Estimation of a Cross-Nested Logit Model with Fixed Alphas

Files to use with Biogeme:
Model file: GEV SM CNL fix.mod
Data file: swissmetro.dat

In this model, we relax the assumption that an alternative can belong to only one nest and we assume that the train alternative can be assigned to two different nests. This correlation structure is motivated by considering the train alternative as a classic transportation mode (along with the car, against the more innovative Swissmetro) on one hand, and as a rail-based mode (like the Swissmetro) on the other hand. We represent this cross-nested structure in Figure 8.5. See Abbe et al. (2007) for a detailed description of the Cross-Nested Logit (CNL) model.

In Figure 8.6 we show a snapshot from the Biogeme .mod file illustrating the CNL nest specification. The estimation results are shown in Table 8.3. The alternative-specific constants now have a negative sign. All other coefficients have the expected signs.

[CNLNests]
// Name paramvalue LowerBound UpperBound status
classic 1.0 1 10 0
Rail_based 1.0 1 10 0

[CNLAlpha]
// Alt Nest value LowerBound UpperBound status
Car classic 1 0.00001 1.0 1
Train classic 0.5 0.00001 1.0 1
Train Rail_based 0.5 0.00001 1.0 1
SM Rail_based 1 0.00001 1.0 1

Figure 8.6: Biogeme snapshot

In this CNL specification, we have fixed the α_train_classic and α_train_rail coefficients to 0.5. This means that we assume that the train alternative belongs equally to both nests, classic and rail-based. This assumption will be relaxed in the next section. Thus, the CNL with fixed α’s is a restricted model of the CNL with variable α’s.

Estimation of a Cross-Nested Logit Model with Unknown Alphas

Files to use with Biogeme:
Model file: GEV SM CNL var.mod
Data file: swissmetro.dat

In Table 8.4, we show the results for the CNL specification with variable α coefficients. We also want to underline the fact that in both CNL specifications the condition

$$\sum_m \alpha_{jm} = 1$$

has been imposed. Such a condition is not necessary for the validity of the model. It is imposed for identification purposes. We refer the interested reader to Abbe et al. (2007) for more theoretical details.
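The normalization condition is easy to verify for a given set of allocation parameters. The following sketch uses a hypothetical helper and checks the fixed α values shown in the [CNLAlpha] section of Figure 8.6, as well as the pair of values estimated for the train alternative in Table 8.4.

```python
def allocation_sums(alphas):
    """Sum the allocation parameters alpha_jm over the nests m, for each alternative j."""
    sums = {}
    for (alternative, _nest), value in alphas.items():
        sums[alternative] = sums.get(alternative, 0.0) + value
    return sums


# Fixed alphas from the [CNLAlpha] section (Figure 8.6)
fixed_alphas = {
    ("Car", "classic"): 1.0,
    ("Train", "classic"): 0.5,
    ("Train", "Rail_based"): 0.5,
    ("SM", "Rail_based"): 1.0,
}
# Estimated alphas for the train alternative (Table 8.4)
estimated_train = {("Train", "classic"): 0.486, ("Train", "Rail_based"): 0.514}

fixed_ok = all(abs(s - 1.0) < 1e-9 for s in allocation_sums(fixed_alphas).values())
estimated_ok = all(abs(s - 1.0) < 1e-9 for s in allocation_sums(estimated_train).values())
```

Because the two train parameters sum to one, they share the same standard error in Table 8.4 and their t-statistics mirror each other.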


CNL model with fixed α’s

Parameter  Parameter      Parameter   Robust          Robust     Robust
number     name           estimate    standard error  t-stat. 0  t-stat. 1
1          ASC_car        -0.838      0.0787          -10.65
2          ASC_SM         -0.457      0.0744          -6.15
3          β_cost         -0.00705    0.000526        -13.39
4          β_car_time     -0.00628    0.00122         -5.17
5          β_train_time   -0.00863    0.00105         -8.18
6          β_SM_time      -0.00715    0.00151         -4.74
7          β_he           -0.00298    0.000533        -5.58
8          β_ga           0.618       0.0940          6.57
9          µ_classic      2.85        0.260           10.93      7.09
10         µ_rail_based   4.73        0.483           9.78       7.71

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5120.738
ρ² = 0.263

Table 8.3: Estimation results for the CNL specification. The α coefficients are fixed.


CNL model with unknown α’s

Parameter  Parameter         Parameter   standard error  t-stat. 0  t-stat. 1
number     name              estimate
1          ASC_car           -0.849      0.0692          -12.26
2          ASC_SM            -0.460      0.0656          -7.01
3          β_cost            -0.00697    0.000440        -15.85
4          β_car_time        -0.00621    0.000583        -10.66
5          β_train_time      -0.00849    0.000660        -12.85
6          β_SM_time         -0.00711    0.000745        -9.54
7          β_he              -0.00293    0.000510        -5.75
8          β_ga              0.620       0.0886          7.00
9          µ_classic         2.87        0.212           13.54      8.82
10         µ_rail_based      4.90        0.722           6.78       5.40
11         α_train_classic   0.486       0.0265          18.35      -19.40
12         α_train_rail      0.514       0.0265          19.40      -18.35

Summary statistics
Number of observations = 6759
L(0) = −6958.425
L(β) = −5120.608
ρ² = 0.262

Table 8.4: Estimation results for the CNL specification. The α coefficients are estimated.
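The ρ² values in the summary statistics of Tables 8.3 and 8.4 appear to be adjusted for the number of parameters, i.e. computed as 1 − (L(β) − K)/L(0). This reading is an inference from the reported numbers rather than something stated in the tables, and the parameter counts K = 10 and K = 12 below are simply the number of rows listed in each table. The function name is hypothetical.

```python
def adjusted_rho_squared(ll_null, ll_final, n_params):
    """Adjusted likelihood ratio index: 1 - (L(beta) - K) / L(0)."""
    return 1.0 - (ll_final - n_params) / ll_null


# CNL with fixed alphas (Table 8.3): 10 parameters listed
rho_fixed = adjusted_rho_squared(-6958.425, -5120.738, 10)
# CNL with unknown alphas (Table 8.4): 12 parameters listed
rho_variable = adjusted_rho_squared(-6958.425, -5120.608, 12)
```

Under this reading, the model with estimated α’s has a slightly lower index than the model with fixed α’s despite its higher log-likelihood, because the small gain in fit does not offset the penalty for the additional parameters.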


To select between the nested logit and the CNL model with variable α’s, we can test the null hypothesis $H_0 : \alpha_{train\_rail} = 0, \ \mu_{rail\_based} = 1$. Since there are multiple restrictions, we cannot use multiple t-tests but should rather use a likelihood ratio test as follows. The test statistic for the null hypothesis is given by

$$-2(L_R - L_U) = -2(-5207.794 + 5120.608) = 174.372$$

where the restricted model is the nested logit model and the unrestricted model is the CNL model with variable α’s. The test statistic is asymptotically χ² distributed with 2 degrees of freedom since there are 2 restrictions. Since 174.372 > 5.991 (the critical value of the χ² distribution with 2 degrees of freedom at a 95% level of confidence), we reject the null hypothesis (nested logit model) and accept the CNL model with variable α’s. We can thus conclude that the train alternative is correlated with both the Swissmetro and car alternatives.

To select between the CNL model with fixed α’s and the CNL model with variable α’s, we can test the null hypothesis $H_0 : \alpha_{train\_rail} = 0.5$. Since there is a single restriction, we can use either a t-test or a likelihood ratio test, which are asymptotically equivalent. The t-statistic with respect to 0.5 is 0.53, which indicates that α_train_rail is not significantly different from 0.5. Hence we cannot reject the null hypothesis (CNL model with fixed α’s), and we retain it over the CNL model with variable α’s.

We can also do a likelihood ratio test as follows. The test statistic for the null hypothesis is given by

$$-2(L_R - L_U) = -2(-5120.738 + 5120.608) = 0.260$$

where the restricted model is the CNL model with fixed α’s and the unrestricted model is the CNL model with variable α’s. The test statistic is asymptotically χ² distributed with 1 degree of freedom since there is 1 restriction. Since 0.260 < 3.841 (the critical value of the χ² distribution with 1 degree of freedom at a 95% level of confidence), we cannot reject the null hypothesis (CNL model with fixed α’s), and we retain it over the CNL model with variable α’s.

In conclusion, both the nested logit model and the CNL model with fixed α’s are restricted versions of the CNL model with variable α’s. Since we have rejected the nested logit model but could not reject the CNL model with fixed α’s, we select the CNL model with fixed α’s.
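The selection sequence above can be summarized with p-values instead of critical values. The sketch below is an illustration with hypothetical function names; it relies on the closed-form upper tail probabilities of the χ² distribution for 1 and 2 degrees of freedom, so it needs only the standard library. The log-likelihoods are those of Tables 8.2, 8.3 and 8.4.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with 1 or 2 degrees of freedom."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 2")


def lr_statistic(ll_restricted, ll_unrestricted):
    """Likelihood ratio statistic -2(L_R - L_U)."""
    return -2.0 * (ll_restricted - ll_unrestricted)


LL_NL, LL_CNL_FIXED, LL_CNL_VAR = -5207.794, -5120.738, -5120.608

# Nested logit against CNL with variable alphas: 2 restrictions
p_nl = chi2_upper_tail(lr_statistic(LL_NL, LL_CNL_VAR), df=2)
# CNL with fixed alphas against CNL with variable alphas: 1 restriction
p_fixed = chi2_upper_tail(lr_statistic(LL_CNL_FIXED, LL_CNL_VAR), df=1)

reject_nl = p_nl < 0.05        # nested logit is rejected
reject_fixed = p_fixed < 0.05  # CNL with fixed alphas is not rejected
```

A p-value below 0.05 corresponds to a statistic above the critical value at the 95% level, so both formulations lead to the same decisions.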


8.3 Choice of Residential Telephone Services Case

Estimation of a Nested Logit Model

Files to use with Biogeme:
Model file: GEV Tel NL unrestricted.mod
Data file: telephone.dat

The application of the IIA McFadden test in the case study on specification testing revealed that the IIA assumption does not hold between the SM and BM alternatives, and does not hold among the EF, LF, and MF alternatives either. We start by giving some examples of possible nesting structures for the Nested Logit (NL) model in Figure 8.7. See Chapter 10 in Ben-Akiva and Lerman (1985) for details on the NL model.

The sample model file describes the first nesting structure shown in Figure 8.7. The expressions of the utilities for this simple NL model are

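$$
\begin{aligned}
V_{BM} &= ASC_{BM} + \beta_{cost} \ln(cost_{BM}) \\
V_{SM} &= \beta_{cost} \ln(cost_{SM}) \\
V_{LF} &= ASC_{LF} + \beta_{cost} \ln(cost_{LF}) \\
V_{EF} &= ASC_{EF} + \beta_{cost} \ln(cost_{EF}) \\
V_{MF} &= ASC_{MF} + \beta_{cost} \ln(cost_{MF}).
\end{aligned}
$$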

We show a snapshot of the Biogeme code in Figure 8.8. In the first column, we write the name of the nest and in the last column the alternatives that belong to it. Here the alternative numbers must correspond to those used in the utility functions under the column ID. The estimation results of the NL model are shown in Table 8.5.
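To make the role of the nest parameters concrete, the following is a minimal Python sketch of two-level nested logit choice probabilities for the first nesting structure. It assumes the standard NL formula with the top-level scale µ normalized to 1; the function name and the monthly costs are hypothetical, and only the point estimates are taken from Table 8.5. It is an illustration, not the computation Biogeme performs internally.

```python
import math

def nested_logit_probabilities(utilities, nests, mu=1.0):
    """Return {alternative: probability} for a two-level nested logit.

    utilities: {alternative: V_i}
    nests:     {nest name: (mu_m, [alternatives in the nest])}
    mu:        top-level scale, normalized to 1 as in the text
    """
    # Inclusive value of each nest: (1 / mu_m) * ln(sum_j exp(mu_m * V_j)).
    inclusive = {}
    for name, (mu_m, alts) in nests.items():
        nest_sum = sum(math.exp(mu_m * utilities[a]) for a in alts)
        inclusive[name] = math.log(nest_sum) / mu_m
    top_denominator = sum(math.exp(mu * value) for value in inclusive.values())

    probabilities = {}
    for name, (mu_m, alts) in nests.items():
        # Probability of the nest, then probability of the alternative within it.
        p_nest = math.exp(mu * inclusive[name]) / top_denominator
        nest_denominator = sum(math.exp(mu_m * utilities[a]) for a in alts)
        for a in alts:
            probabilities[a] = p_nest * math.exp(mu_m * utilities[a]) / nest_denominator
    return probabilities

# Point estimates from Table 8.5; the monthly costs below are hypothetical.
asc = {"BM": -0.378, "SM": 0.0, "LF": 0.893, "EF": 0.847, "MF": 1.41}
beta_cost = -1.49
cost = {"BM": 6.0, "SM": 9.0, "LF": 12.0, "EF": 14.0, "MF": 25.0}
utilities = {a: asc[a] + beta_cost * math.log(cost[a]) for a in asc}

nl = nested_logit_probabilities(
    utilities, {"N_MEAS": (2.06, ["BM", "SM"]), "N_FLAT": (2.29, ["LF", "EF", "MF"])}
)
# With both nest parameters equal to 1 the model collapses to the logit model.
mnl = nested_logit_probabilities(
    utilities, {"N_MEAS": (1.0, ["BM", "SM"]), "N_FLAT": (1.0, ["LF", "EF", "MF"])}
)
print(nl)
print(mnl)
```

Setting both nest parameters to 1 reproduces the logit probabilities, which is why the logit model is the restricted model in the test of the nest parameters.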

To be consistent with random utility theory, the inequality µ/µ_m < 1, with µ being normalized to 1, implies µ_m > 1. To see if this is the case here, we can test the null hypothesis H0: µ_meas = µ_flat = 1. Since there are multiple restrictions here, we cannot do multiple t-tests. We should do a likelihood


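[Figure 8.7 shows three candidate nesting structures: (a) a Measured nest containing BM and SM and a Flat nest containing LF, EF and MF; (b) BM and SM as separate alternatives with a Flat nest containing LF, EF and MF; (c) a Measured nest containing BM and SM with LF, EF and MF as separate alternatives.]

Figure 8.7: The possible nesting structures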

[NLNests]

// Name paramvalue LowerBound UpperBound status list of alt

N_MEAS 1.0 1.0 10.0 0 1 2

N_FLAT 1.0 1.0 10.0 0 3 4 5

Figure 8.8: Biogeme snapshot


NL with generic attributes

Parameter  Parameter  Parameter  Robust          Robust     Robust
number     name       estimate   standard error  t stat. 0  t stat. 1
1          ASC_BM     -0.378     0.117           -3.22
2          ASC_LF     0.893      0.158           5.64
3          ASC_EF     0.847      0.391           2.17
4          ASC_MF     1.41       0.238           5.90
5          β_cost     -1.49      0.243           -6.13
6          µ_meas     2.06       0.573           3.60       1.86
7          µ_flat     2.29       0.763           3.00       1.69

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −473.219
ρ2 = 0.143

Table 8.5: NL with generic attributes
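The two t-statistic columns of Table 8.5 differ only in the value the estimate is compared against (0 or 1). A minimal Python sketch, with a hypothetical helper name, recomputes them for the nest parameters from the rounded estimates and standard errors shown in the table; small differences from the reported values are due to that rounding.

```python
def t_stat(estimate, std_error, null_value=0.0):
    """t-statistic of an estimate against a hypothesized value."""
    return (estimate - null_value) / std_error

# (estimate, robust standard error) as reported in Table 8.5
nest_params = {"mu_meas": (2.06, 0.573), "mu_flat": (2.29, 0.763)}

for name, (estimate, std_error) in nest_params.items():
    t0 = t_stat(estimate, std_error, 0.0)  # "t stat. 0" column
    t1 = t_stat(estimate, std_error, 1.0)  # "t stat. 1" column
    print(f"{name}: t stat. 0 = {t0:.2f}, t stat. 1 = {t1:.2f}")
```

For a nest parameter the relevant comparison is against 1, since µ_m = 1 corresponds to no correlation within the nest.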

ratio test as follows. The test statistic for the null hypothesis is given by

$$-2(L_R - L_U) = -2(-477.557 + 473.219) = 8.676$$

where the restricted model is the logit model (MNL Tel generic.mod) and the unrestricted model is the nested logit model. The test statistic is asymptotically χ² distributed with 2 degrees of freedom since there are 2 restrictions. Since 8.676 > 5.991 (the critical value of the χ² distribution with 2 degrees of freedom at a 95% level of confidence), we reject the null hypothesis (logit model) and accept the nested logit model.
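The likelihood ratio test above can be reproduced with a few lines of Python. This is a sketch using only the standard library: the function names are hypothetical, and the p-value helper relies on the closed-form χ² tail probabilities that exist for 1 and 2 degrees of freedom, which are the only cases used in this case study.

```python
import math

def lr_statistic(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic -2 (L_R - L_U)."""
    return -2.0 * (loglik_restricted - loglik_unrestricted)

def chi2_p_value(statistic, degrees_of_freedom):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if degrees_of_freedom == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if degrees_of_freedom == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

# Logit (restricted) against nested logit (unrestricted), 2 restrictions.
statistic = lr_statistic(-477.557, -473.219)
p_value = chi2_p_value(statistic, 2)
print(f"LR statistic = {statistic:.3f}, p-value = {p_value:.4f}")
print("reject H0 at the 5% level" if p_value < 0.05 else "do not reject H0")
```

Comparing the p-value with 0.05 is equivalent to comparing the statistic with the critical values 3.841 (1 degree of freedom) and 5.991 (2 degrees of freedom) used in the text.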

The µ_m’s of the two nests can also be set equal to each other. This can be done in two ways. One way is to keep the µ_m’s fixed to 1 and estimate µ (the related Biogeme code is shown in Figure 8.9).

Alternatively, we can constrain the two nest coefficients to be equal while keeping µ fixed to 1 (Figure 8.10).

The estimation results for this last specification are shown in Table 8.6.


[Mu]

// Value LowerBound UpperBound Status

+1.0000000e+00 +0.0000000e+00 +1.0000000e+00 0

[NLNests]

// Name paramvalue LowerBound UpperBound status list of alt

N_MEAS 1.0 1.0 10.0 1 1 2

N_FLAT 1.0 1.0 10.0 1 3 4 5

Figure 8.9: Biogeme snapshot

[NLNests]

// Name paramvalue LowerBound UpperBound status list of alt

N_MEAS 1.0 1.0 10.0 0 1 2

N_FLAT 1.0 1.0 10.0 0 3 4 5

[ConstraintNestCoef]

// List of pairs of nests for which the associated

// coefficients must be constrained to be equal

// Syntax: COEF_NEST_A = COEF_NEST_B

N_MEAS = N_FLAT

Figure 8.10: Biogeme snapshot


NL with linear constraints

Parameter  Parameter  Parameter
number     name       estimate   standard error  t stat. 0  t stat. 1
1          ASC_BM     -0.368     0.110           -3.35
2          ASC_LF     0.882      0.167           5.29
3          ASC_EF     0.833      0.398           2.09
4          ASC_MF     1.39       0.251           5.51
5          β_cost     -1.50      0.257           -5.83
6          µ_meas     2.16       0.519           4.17       2.24
7          µ_flat     2.16       0.519           4.17       2.24

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −473.288
ρ2 = 0.143

Table 8.6: NL with linear constraint on nest parameters

Estimation of a Cross-Nested Logit Model with Fixed Alphas

Files to use with Biogeme:
Model file: GEV Tel CNL fix.mod
Data file: telephone.dat

In this section and the next one, we specify two different Cross-Nested Logit (CNL) models (see Abbe et al. (2007) for a detailed description of the CNL model) using both fixed and variable degrees of membership. The major premise here is that such specifications are mainly for demonstration purposes. However, an assumption that might make sense is that the standard measured alternative (SM) is likely to be correlated with both measured and flat options. Indeed, if we look at its definition, it turns out that it may belong to both nests, since it also has a fixed monthly charge. Based on this hypothesis, the proposed cross-nested structure is shown in Figure 8.11.

We present the CNL model with the same deterministic utility functions as


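[Figure 8.11 shows a Measured nest and a Flat nest over the alternatives BM, SM, LF, EF and MF, with SM belonging to both nests.]

Figure 8.11: The cross-nested structure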

in the previous model. The corresponding snapshot from the Biogeme code for this cross-nesting specification is shown in Figure 8.12.

Note that we define α_CNL so that the SM alternative belongs equally to both the flat and the measured nests. This assumption will be relaxed in the next section. Thus, CNL with fixed α’s is a restricted model of CNL with variable α’s. The estimation results are shown in Table 8.7.

Cross-Nested Logit Model with Variable Alphas

Files to use with Biogeme:
Model file: GEV Tel CNL var.mod
Data file: telephone.dat

In the previous CNL model, we assumed that the SM alternative belongs equally to the measured nest and the flat nest by fixing α_SM,meas and α_SM,flat to be equal to 0.5. This assumption can be relaxed, and we can estimate the share of SM in each nest during the estimation of the model parameters. The corresponding Biogeme snapshot is shown in Figure 8.13. From the results presented in Table 8.8, we see that the alternative SM has a very small share in the flat nest.

[CNLNests]
// Name paramvalue LowerBound UpperBound status
N_MEAS 1.0 1 10 0
N_FLAT 1.0 1 10 0

[CNLAlpha]
// Alt Nest value LowerBound UpperBound status
BM N_MEAS 1 0 1.0 1
SM N_MEAS 0.5 0 1.0 1
SM N_FLAT 0.5 0 1.0 1
LF N_FLAT 1 0 1.0 1
EF N_FLAT 1 0 1.0 1
MF N_FLAT 1 0 1.0 1

Figure 8.12: Biogeme snapshot

CNL estimation results

Parameter  Parameter  Parameter  Robust          Robust     Robust
number     name       estimate   standard error  t stat. 0  t stat. 1
1          ASC_BM     -0.791     0.0769          -10.28
2          ASC_LF     0.460      0.241           1.91
3          ASC_EF     0.405      0.393           1.03
4          ASC_MF     0.845      0.329           2.57
5          β_cost     -1.21      0.311           -3.91
6          µ_meas     3.14       1.18            2.66       1.81
7          µ_flat     2.36       1.14            2.08       1.19

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −474.429
ρ2 = 0.141

Table 8.7: CNL estimation results

We also want to underline the fact that in both CNL specifications the condition

$$\sum_m \alpha_{jm} = 1$$

has been imposed. Such a condition is not necessary for the validity of the model. It is imposed for identification purposes. We refer the interested reader to Abbe et al. (2007) for more theoretical details.
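The role of the α’s can be illustrated with a small Python sketch of cross-nested logit probabilities. It assumes one common formulation of the CNL generating function, G(y) = Σ_m (Σ_j α_jm^(µ_m/µ) y_j^(µ_m))^(µ/µ_m); the workbook refers to Abbe et al. (2007) for the exact definition, so this formulation, the function name and the utility values below are assumptions made for illustration only.

```python
import math

def cnl_probabilities(utilities, nest_scales, alphas, mu=1.0):
    """Cross-nested logit probabilities under the formulation stated above.

    utilities:   {alternative: V_j}
    nest_scales: {nest: mu_m}
    alphas:      {(alternative, nest): alpha_jm}; missing pairs mean alpha = 0
    """
    # Nest terms: sum_j alpha_jm^(mu_m / mu) * exp(mu_m * V_j).
    nest_sums = {}
    for m, mu_m in nest_scales.items():
        nest_sums[m] = sum(
            alphas.get((j, m), 0.0) ** (mu_m / mu) * math.exp(mu_m * v)
            for j, v in utilities.items()
        )
    denominator = sum(
        s ** (mu / nest_scales[m]) for m, s in nest_sums.items() if s > 0.0
    )

    probabilities = {}
    for i, v in utilities.items():
        p = 0.0
        for m, mu_m in nest_scales.items():
            a = alphas.get((i, m), 0.0)
            if a == 0.0 or nest_sums[m] == 0.0:
                continue
            p_nest = nest_sums[m] ** (mu / mu_m) / denominator
            p_within = a ** (mu_m / mu) * math.exp(mu_m * v) / nest_sums[m]
            p += p_nest * p_within
        probabilities[i] = p
    return probabilities

# Hypothetical utilities; memberships as in Figure 8.12 (SM split equally).
utilities = {"BM": -3.0, "SM": -3.3, "LF": -2.8, "EF": -3.1, "MF": -3.4}
alphas = {("BM", "N_MEAS"): 1.0, ("SM", "N_MEAS"): 0.5, ("SM", "N_FLAT"): 0.5,
          ("LF", "N_FLAT"): 1.0, ("EF", "N_FLAT"): 1.0, ("MF", "N_FLAT"): 1.0}

cnl = cnl_probabilities(utilities, {"N_MEAS": 3.14, "N_FLAT": 2.36}, alphas)
# With all nest parameters equal to 1 and alphas summing to 1, CNL reduces to logit.
cnl_as_logit = cnl_probabilities(utilities, {"N_MEAS": 1.0, "N_FLAT": 1.0}, alphas)
print(cnl)
print(cnl_as_logit)
```

Under this formulation, setting the membership of SM in the flat nest to 0 (and in the measured nest to 1) gives back the nested logit structure, which is the restriction tested in this section.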

To select between the nested logit and CNL model with variable α’s, we can test the null hypothesis H0: α_SM,flat = 0. Since there is a single restriction, we can use either a t-test or a likelihood ratio test, which are asymptotically equivalent. The t-statistic with respect to 0 is 0.00, which indicates that α_SM,flat is not significantly different from 0, and hence we accept the null hypothesis (nested logit model) and reject the CNL model with variable α’s.

We can also do a likelihood ratio test as follows. The test statistic for the null hypothesis is given by

$$-2(L_R - L_U) = -2(-473.219 + 473.219) = 0.000$$

where the restricted model is the nested logit model and the unrestricted model is the CNL model. The test statistic is asymptotically χ² distributed with 1 degree of freedom since there is 1 restriction. Since 0.000 < 3.841 (the critical value of the χ² distribution with 1 degree of freedom at a 95% level of confidence), we accept the null hypothesis (nested logit model) and reject the CNL model with variable α’s. We can thus conclude that the SM alternative is correlated only with the measured nest but not with the flat nest.

To select between the CNL model with fixed α’s and the CNL model with variable α’s, we can test the null hypothesis H0: α_SM,flat = 0.5. Since there is a single restriction, we can use either a t-test or a likelihood ratio test, which are asymptotically equivalent. The t-statistic with respect to 0.5 is -0.58, which indicates that α_SM,flat is not significantly different from 0.5, and hence we accept the null hypothesis (CNL model with fixed α’s) and reject the CNL model with variable α’s.

We can also do a likelihood ratio test as follows. The test statistic for the null hypothesis is given by

$$-2(L_R - L_U) = -2(-474.429 + 473.219) = 2.420$$

where the restricted model is the CNL model with fixed α’s and the unrestricted model is the CNL model with variable α’s. The test statistic is


[CNLNests]

// Name paramvalue LowerBound UpperBound status

N_MEAS 1.0 1 10 0

N_FLAT 1.0 1 10 0

[CNLAlpha]

// Alt Nest value LowerBound UpperBound status

BM N_MEAS 1 0 1.0 1

SM N_MEAS 0.5 0 1.0 0

SM N_FLAT 0.5 0 1.0 0

LF N_FLAT 1 0 1.0 1

EF N_FLAT 1 0 1.0 1

MF N_FLAT 1 0 1.0 1

Figure 8.13: Biogeme snapshot

asymptotically χ² distributed with 1 degree of freedom since there is 1 restriction. Since 2.420 < 3.841 (the critical value of the χ² distribution with 1 degree of freedom at a 95% level of confidence), we accept the null hypothesis (CNL model with fixed α’s) and reject the CNL model with variable α’s.

Since both the nested logit model and the CNL model with fixed α’s are preferred to the unrestricted model (CNL model with variable α’s), we select the nested logit model because it has a higher ρ2 than the CNL model with fixed α’s (0.143 vs. 0.141).
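The ρ2 values used in this comparison can be recomputed from the summary statistics. The sketch below assumes that the reported ρ2 is the adjusted likelihood ratio index, 1 − (L(β) − K)/L(0), with K the number of estimated parameters listed in each table; this is inferred from the fact that the reported numbers are reproduced, not stated in the text, and the function name is hypothetical.

```python
def adjusted_rho_squared(loglik_null, loglik_final, n_params):
    """Adjusted likelihood ratio index: 1 - (L(beta) - K) / L(0)."""
    return 1.0 - (loglik_final - n_params) / loglik_null

# (L(0), L(beta), number of parameters listed) from Tables 8.5, 8.7 and 8.8
models = {
    "nested logit": (-560.250, -473.219, 7),
    "CNL, fixed alphas": (-560.250, -474.429, 7),
    "CNL, variable alphas": (-560.250, -473.219, 9),
}

for name, (loglik_null, loglik_final, n_params) in models.items():
    value = adjusted_rho_squared(loglik_null, loglik_final, n_params)
    print(f"{name}: {value:.3f}")
```

Because the index penalizes additional parameters, the CNL model with variable α’s obtains a lower value than the nested logit model even though both reach the same final log likelihood.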


CNL with α_CNL variable

Parameter  Parameter   Parameter
number     name        estimate   standard error  t stat. 0  t stat. 1
1          ASC_BM      -0.378     1.07            -0.35
2          ASC_EF      0.847      1.13            0.75
3          ASC_LF      0.893      1.08            0.83
4          ASC_MF      1.41       1.09            1.28
5          β_cost      -1.49      0.257           -5.80
6          µ_flat      2.29       0.640           3.58       2.02
7          µ_meas      2.06       0.575           3.59       1.85
8          α_SM,flat   9.40e-005  1.06            0.00       -0.94
9          α_SM,meas   1.00       1.06            0.94       0.00

Summary statistics
Number of observations = 434
L(0) = −560.250
L(β) = −473.219
ρ2 = 0.139

Table 8.8: CNL α_CNL variable


Chapter 9

Mixtures of Logit and GEV Models

This case study deals with the specification of mixtures of Logit models. The objectives can be summarized as follows:

• Gaining an overview of the different formulations of mixtures of logit and becoming familiar with the concepts of flexible correlation structures and taste heterogeneity.

• Specification and estimation of alternative specific variance models.

• Specification and estimation of error component models.

• Specification and estimation of random coefficients models.

• Specification and estimation of mixtures of GEV models.

For this case study, the Swissmetro dataset is considered. Details on the dataset can be found in the Appendix, section A.3.

The general guidelines presented on page 17 discuss how to go through the case study.


9.1 Challenge Question

The Airline Itinerary Case. The data come from an Internet choice survey conducted by the Boeing Company in the Fall of 2004. Boeing was interested in understanding the sensitivity that air passengers have toward the attributes of an airline itinerary, such as fare, travel time, transfers, legroom, and aircraft. The survey was executed on a sample of customers of an Internet airline booking service. There are 1633 respondents, each providing one Stated Preference response. Each respondent was faced with three choice alternatives based on the origin-destination market request that she entered into the itinerary search engine. The first alternative is always a non-stop flight, the second a flight with one stop on the same airline, and the third a flight with one stop and a change of airline.

Data description. Please read Appendix A.5 of the workbook for details.

Files to use with Biogeme:
Model file: Mixture airline.mod
Data file: airline.dat

We propose a specification of a logit model with a random parameter. The utility functions include the alternative specific attributes for legroom and for schedule delay (early and late departures). Two attributes capturing the fare are also included: one for business trips and one for non-business trips. The travel time parameter is assumed to be randomly distributed over the population. Constants are included for all alternatives except the first one, which has arbitrarily been chosen as the reference. Figure 9.1 gives a suggested Biogeme specification of the model.

Question: Does this model make sense to you? What results do you expect when you try to estimate this model?

The results estimated by Biogeme are given in Table 9.1. Do they correspond to your expectations?


[Choice]

SP1_MostAttractive

[Beta]

// Name Value LowerBound UpperBound status

ASC_1 0 -10000 10000 1

ASC_2 0 -10000 10000 0

ASC_3 0 -10000 10000 0

BETA_LogFare_Business 0 -10000 10000 0

BETA_LogFare_NonBusiness 0 -10000 10000 0

BETA_TotalTripTime 0 -10000 10000 0

BETA_TotalTripTime_std 0 -10000 10000 0

BETA_Legroom 0 -10000 10000 0

BETA_SchedDelayEarly 0 -10000 10000 0

BETA_SchedDelayLate 0 -10000 10000 0

[Utilities]

// Id Name Avail linear-in-parameter expression (beta1*x1 + beta2*x2 + ... )

1 Opt1 one ASC_1 * one + BETA_LogFare_Business * Opt1LogFare_Business

+ BETA_LogFare_NonBusiness * Opt1LogFare_NonBusiness

+ BETA_TotalTripTime [ BETA_TotalTripTime_std ] * Opt1_TotalTriptime

+ BETA_Legroom * Opt1_Legroom

+ BETA_SchedDelayEarly * Opt1_SchedDelayEarly

+ BETA_SchedDelayLate * Opt1_SchedDelayLate

2 Opt2 one ASC_2 * one + BETA_LogFare_Business * Opt2LogFare_Business

+ BETA_LogFare_NonBusiness * Opt2LogFare_NonBusiness

+ BETA_TotalTripTime [ BETA_TotalTripTime_std ] * Opt2_TotalTriptime

+ BETA_Legroom * Opt2_Legroom

+ BETA_SchedDelayEarly * Opt2_SchedDelayEarly

+ BETA_SchedDelayLate * Opt2_SchedDelayLate

3 Opt3 one ASC_3 * one + BETA_LogFare_Business * Opt3LogFare_Business

+ BETA_LogFare_NonBusiness * Opt3LogFare_NonBusiness

+ BETA_TotalTripTime [ BETA_TotalTripTime_std ] * Opt3_TotalTriptime

+ BETA_Legroom * Opt3_Legroom

+ BETA_SchedDelayEarly * Opt3_SchedDelayEarly

+ BETA_SchedDelayLate * Opt3_SchedDelayLate

[Expressions]

one = 1

Opt1LogFare_Business = log( Opt1_Fare ) * ( Trip_Purpose <> 2 )

Opt1LogFare_NonBusiness = log( Opt1_Fare ) * ( Trip_Purpose == 2 )

Opt2LogFare_Business = log( Opt2_Fare ) * ( Trip_Purpose <> 2 )

Opt2LogFare_NonBusiness = log( Opt2_Fare ) * ( Trip_Purpose == 2 )

Opt3LogFare_Business = log( Opt3_Fare ) * ( Trip_Purpose <> 2 )

Opt3LogFare_NonBusiness = log( Opt3_Fare ) * ( Trip_Purpose == 2 )

[Model]

$MNL

[Draws]

100

Figure 9.1: Airline itinerary logit model specification with a random parameter


Model Estimation Results

Variable  Variable                Coefficient  Standard
number    name                    estimate     error     t-stat. 0  p-value
1         ASC_2                   -1.14        0.230     -4.95      0.00
2         ASC_3                   -1.22        0.229     -5.31      0.00
3         β_Legroom               0.219        0.0455    4.81       0.00
4         β_LogFare_Business      -7.54        1.01      -7.44      0.00
5         β_LogFare_NonBusiness   -10.5        0.900     -11.66     0.00
6         β_SchedDelayEarly       -0.196       0.0285    -6.86      0.00
7         β_SchedDelayLate        -0.127       0.0257    -4.93      0.00
8         β_TotalTripTime         -0.665       0.191     -3.48      0.00
9         β_TotalTripTime_std     -0.579       0.208     -2.78      0.01

Summary statistics
Number of observations = 1633
L(0) = −1794.034
L(β) = −1008.504
ρ2 = 0.433

Table 9.1: Estimation results for the Airline itinerary logit model with a random parameter


9.2 Swissmetro Case

Alternative Specific Variance Model

Files to use with Biogeme:
Model file: Mixture SM AltSpVar.mod
Data file: swissmetro.dat

In this first model specification, we assume that the ASC’s are randomly distributed. We show below the utility expressions and in Figure 9.2 the related Biogeme snapshot¹.

$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{time}\,\mathrm{CAR\_TT} + \beta_{cost}\,\mathrm{CAR\_CO} \\
V_{train} &= \beta_{time}\,\mathrm{TRAIN\_TT} + \beta_{cost}\,\mathrm{TRAIN\_CO} + \beta_{he}\,\mathrm{TRAIN\_HE} \\
V_{SM} &= ASC_{SM} + \beta_{time}\,\mathrm{SM\_TT} + \beta_{cost}\,\mathrm{SM\_CO} + \beta_{he}\,\mathrm{SM\_HE}
\end{aligned}
$$

This model is very simple. The parameters are assumed to be generic over the alternatives, and just a few variables are taken into account. ASC_car and ASC_SM are now randomly distributed, with means α_car and α_SM and standard deviations σ_car and σ_SM, all of which are estimated. We normalize with respect to the train alternative, and the estimation results are shown in Table 9.2. Note that this is a simplification of the proper estimation process that is needed for alternative specific variance estimation. Recall that the normalization is not arbitrary, in that only the minimum variance alternative can be normalized to 0. Therefore, proper estimation requires first that an unidentified model be estimated (with all three variances in this case). Then, the model should be re-estimated with the smallest variance from the unidentified model normalized to 0.

The estimated time, cost and headway coefficients are negative, showing their negative impact on the utility functions. The estimated time and cost coefficients are numerically very close (−0.0166 and −0.0169), indicating a similar negative impact per unit, which is larger than that of headway (−0.00763). The estimated ASC’s show that, all the rest

¹ Lines in the Biogeme snapshots have been broken, but in the original Biogeme .mod file they are not.


Estimation results

Parameter  Parameter  Parameter  Robust          Robust
number     name       estimate   standard error  t statistic
1          α_car      0.244      0.107           2.29
2          α_SM       0.845      0.178           4.75
3          σ_car      0.0992     0.0974          1.02
4          σ_SM       2.92       0.417           7.00
5          β_cost     -0.0169    0.00155         -10.94
6          β_he       -0.00763   0.00133         -5.72
7          β_time     -0.0166    0.00192         -8.66

Summary statistics
Number of draws = 100
Number of observations = 6768
L(0) = −6964.663
L(β) = −5257.982
ρ2 = 0.244

Table 9.2: Alternative specific variance specification


[Utilities]

// Id Name Avail linear-in-parameter expression

1 SBB_SP TRAIN_AV_SP ASC_SBB * one + BETA_TIME * TRAIN_TT +

BETA_COST * TRAIN_COST + BETA_HE * TRAIN_HE

2 SM_SP SM_AV ASC_SM [ ASC_SM_std ] * one + BETA_TIME * SM_TT

+ BETA_COST * SM_COST + BETA_HE * SM_HE

3 Car_SP CAR_AV_SP ASC_CAR [ ASC_CAR_std ] * one +

BETA_TIME * CAR_TT + BETA_COST * CAR_CO

Figure 9.2: The Biogeme snapshot illustrating the alternative specific variance specification

remaining constant, both car and Swissmetro alternatives are preferred, on average, to the train alternative. The average preference for the innovative transportation mode is larger in value, and its standard deviation is significantly different from zero as well as greater than the mean. This means that part of the population prefers the train to the Swissmetro (all the rest being constant). We could argue that one of the reasons is stricter budget constraints, for example among individuals with lower incomes. Note also that the variance parameter σ_car for the ASC associated with the car alternative is not significant. We could therefore define the parameter ASC_car as a constant in order to reduce the complexity of the model.

Only 100 random draws have been used for the estimation. Note that this is not enough. We have chosen few draws in order to decrease the estimation time for the case study. For more theoretical details on this choice, we refer the reader to Train (2003)².
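The effect of the number of draws can be illustrated with a small Python sketch of a simulated choice probability. It assumes a simplified binary setting (Swissmetro against one other alternative) with a normally distributed constant; the function name, the utility values and the seeds are hypothetical, and the sketch uses plain pseudo-random draws rather than reproducing what Biogeme does.

```python
import math
import random

def simulated_probability(v_sm, v_other, sigma, n_draws, seed=0):
    """Average binary logit probability of SM over draws of a normal error term."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        xi = rng.gauss(0.0, 1.0)      # standard normal draw
        u_sm = v_sm + sigma * xi      # utility with the random constant
        total += 1.0 / (1.0 + math.exp(-(u_sm - v_other)))
    return total / n_draws

# Mean and standard deviation of ASC_SM from Table 9.2; other terms set to 0.
for n_draws in (100, 1000, 100000):
    estimates = [simulated_probability(0.845, 0.0, 2.92, n_draws, seed=s) for s in range(3)]
    print(n_draws, [round(p, 4) for p in estimates])
```

With few draws the simulated probability changes noticeably from one seed to another; the variation shrinks as the number of draws grows, which is the trade-off between reliability and computation time mentioned in the text.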

Error Component Model

Files to use with Biogeme:
Model files: Mixture SM EC1.mod, Mixture SM EC2.mod
Data file: swissmetro.dat

² The number of random draws is an important issue in simulated estimations. For reliable values, such a number should theoretically be ∞, as the Simulated Maximum Likelihood estimator is not consistent for a finite number of draws. In practical applications, the trade-off between the reliability of the estimates and a reasonable computational time becomes the most important issue. By default, Biogeme uses pseudo-random draws.


[Utilities]

// Id Name Avail linear-in-parameter expression

1 SBB_SP TRAIN_AV_SP ASC_SBB * one + BETA_TIME * TRAIN_TT +

BETA_COST * TRAIN_COST + BETA_HE * TRAIN_HE +

RAIL [ RAIL_std ] * one

2 SM_SP SM_AV ASC_SM * one + BETA_TIME * SM_TT + BETA_COST * SM_COST

+ BETA_HE * SM_HE + RAIL [ RAIL_std ] * one

3 Car_SP CAR_AV_SP ASC_CAR * one + BETA_TIME * CAR_TT +

BETA_COST * CAR_CO

Figure 9.3: The Biogeme snapshot illustrating how the error component specification is implemented.

This first error component model attempts to capture the correlation between the train and Swissmetro alternatives. They are both rail-based transportation modes, so the hypothesis is that they share unobserved attributes. We show below the systematic utility expressions and in Figure 9.3 the related Biogeme snapshot.

$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{time}\,\mathrm{CAR\_TT} + \beta_{cost}\,\mathrm{CAR\_CO} \\
V_{train} &= \beta_{time}\,\mathrm{TRAIN\_TT} + \beta_{cost}\,\mathrm{TRAIN\_CO} + \beta_{he}\,\mathrm{TRAIN\_HE} + \zeta_{rail} \\
V_{SM} &= ASC_{SM} + \beta_{time}\,\mathrm{SM\_TT} + \beta_{cost}\,\mathrm{SM\_CO} + \beta_{he}\,\mathrm{SM\_HE} + \zeta_{rail}
\end{aligned}
$$

The train and SM modes share the random term ζ_rail, which is assumed to be normally distributed, ζ_rail ∼ N(m_rail, σ²_rail). We estimate the standard deviation σ_rail of this error component, while the mean m_rail is fixed to zero. The estimation results are shown in Table 9.3. The interpretation is substantially the same as before. σ_rail has been estimated significantly different from zero, capturing the correlation between the train and the Swissmetro alternatives. Its square, σ²_rail, is the element of the variance-covariance matrix of the error components that captures the covariance between Swissmetro and train.

In the following model, we use a more complex error structure. The idea is that train and SM are correlated, both being rail-based transportation modes, but also that train and car are correlated representing more classical


Estimation results

Parameter  Parameter  Parameter  Robust          Robust
number     name       estimate   standard error  t statistic
1          ASC_car    0.184      0.0801          2.30
2          ASC_SM     0.449      0.0935          4.80
3          β_cost     -0.0109    0.000684        -15.92
4          β_he       -0.00536   0.000984        -5.45
5          β_time     -0.0128    0.00105         -12.19
6          σ_rail     0.153      0.0576          2.66

Summary statistics
Number of draws = 100
Number of observations = 6768
L(0) = −6964.663
L(β) = −5314.698
ρ2 = 0.236

Table 9.3: Error component specification. The σ_rail coefficient is the standard deviation of the random term capturing the unobserved shared attributes between the train and Swissmetro alternatives.


[Utilities]

// Id Name Avail linear-in-parameter expression

1 SBB_SP TRAIN_AV_SP ASC_SBB * one + BETA_TIME * TRAIN_TT +

BETA_COST * TRAIN_COST + BETA_HE * TRAIN_HE +

RAIL [ RAIL_std ] * one +

CLASSIC [ CLASSIC_std ] * one

2 SM_SP SM_AV ASC_SM * one + BETA_TIME * SM_TT +

BETA_COST * SM_COST + BETA_HE * SM_HE +

RAIL [ RAIL_std ] * one

3 Car_SP CAR_AV_SP ASC_CAR * one + BETA_TIME * CAR_TT +

BETA_COST * CAR_CO + CLASSIC [ CLASSIC_std ] * one

Figure 9.4: The Biogeme snapshot for the second error component specification.

transportation modes with respect to the more innovative Swissmetro. The corresponding utility functions are

$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{time}\,\mathrm{CAR\_TT} + \beta_{cost}\,\mathrm{CAR\_CO} + \zeta_{classic} \\
V_{train} &= \beta_{time}\,\mathrm{TRAIN\_TT} + \beta_{cost}\,\mathrm{TRAIN\_CO} + \beta_{he}\,\mathrm{TRAIN\_HE} + \zeta_{rail} + \zeta_{classic} \\
V_{SM} &= ASC_{SM} + \beta_{time}\,\mathrm{SM\_TT} + \beta_{cost}\,\mathrm{SM\_CO} + \beta_{he}\,\mathrm{SM\_HE} + \zeta_{rail}
\end{aligned}
$$

and the related Biogeme snapshot is shown in Figure 9.4. As before, the random terms are assumed to be normally distributed, ζ_rail ∼ N(m_rail, σ²_rail) and ζ_classic ∼ N(m_classic, σ²_classic). The standard deviations, σ_rail and σ_classic, are estimated, while the means m_rail and m_classic are fixed to zero.
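The correlation pattern implied by the two error components can be written out explicitly. The Python sketch below builds the covariance matrix of the error-component part of the utilities from the loadings in the utility functions above, assuming independent components; the function name is hypothetical, the standard deviations are the estimates reported in Table 9.4, and the i.i.d. extreme value term of each alternative is left out.

```python
def error_component_covariance(loadings, sigmas):
    """Covariance of the error-component part of the utilities.

    loadings: {alternative: {component: loading}}
    sigmas:   {component: standard deviation}; components are assumed independent
    """
    alternatives = list(loadings)
    covariance = {}
    for a in alternatives:
        for b in alternatives:
            covariance[(a, b)] = sum(
                loadings[a].get(c, 0) * loadings[b].get(c, 0) * s ** 2
                for c, s in sigmas.items()
            )
    return covariance

# Which component enters which utility (see the utility functions above).
loadings = {
    "car": {"classic": 1},
    "train": {"rail": 1, "classic": 1},
    "SM": {"rail": 1},
}
sigmas = {"rail": 0.0982, "classic": 2.86}  # estimates from Table 9.4

cov = error_component_covariance(loadings, sigmas)
for pair in (("car", "train"), ("train", "SM"), ("car", "SM")):
    print(pair, round(cov[pair], 4))
```

Car and train share σ²_classic, train and SM share σ²_rail, and car and SM are uncorrelated through the error components.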

A similar correlation pattern could be specified by means of a Cross-Nested Logit model where the SM alternative belongs to a rail nest, the car alternative belongs to a classic nest and the train alternative is assigned with certain degrees of membership to both rail and classic nests. In the model, we have normalized with respect to the train alternative. The estimation results are shown in Table 9.4.

ASC_SM and ASC_car have positive values, indicating a preference towards Swissmetro and car over train, all the rest being constant. The interpretation of the cost, time and headway coefficients remains the same. Only the


Estimation results

Parameter  Parameter  Parameter  Robust          Robust
number     name       estimate   standard error  t statistic
1          ASC_car    0.254      0.110           2.32
2          ASC_SM     0.865      0.238           3.63
3          β_cost     -0.0166    0.00165         -10.05
4          β_he       -0.00759   0.00134         -5.66
5          β_time     -0.0160    0.00197         -8.12
6          σ_classic  2.86       0.526           5.44
7          σ_rail     0.0982     0.101           0.97

Summary statistics
Number of draws = 100
Number of observations = 6768
L(0) = −6964.663
L(β) = −5261.818
ρ2 = 0.243

Table 9.4: Error component specification. Train and car share unobserved attributes through ζ_classic and train and SM through ζ_rail.


[Utilities]

// Id Name Avail linear-in-parameter expression

1 SBB_SP TRAIN_AV_SP ASC_SBB * one + BETA_TIME * TRAIN_TT

+ BETA_TRAIN_COST [ BETA_TRAIN_COST_std ] * TRAIN_COST

+ BETA_HE [ BETA_HE_std ] * TRAIN_HE

2 SM_SP SM_AV ASC_SM * one + BETA_TIME * SM_TT

+ BETA_SM_COST [ BETA_SM_COST_std ] * SM_COST

+ BETA_HE [ BETA_HE_std ] * SM_HE

3 Car_SP CAR_AV_SP ASC_CAR * one + BETA_TIME * CAR_TT

+ BETA_CAR_COST [ BETA_CAR_COST_std ] * CAR_CO

Figure 9.5: The Biogeme snapshot for the random coefficient specification.

standard deviation related to ζ_classic is significantly different from zero³.

Random Coefficients

Files to use with Biogeme:
Model file: Mixture SM Randcoeff.mod
Data file: swissmetro.dat

In this specification, the unknown parameters are assumed to be randomly distributed over the population. They capture the so-called taste variation of individuals. The utility expressions are shown below and the related Biogeme snapshot in Figure 9.5.

$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{time}\,\mathrm{CAR\_TT} + \beta_{car\,cost}\,\mathrm{CAR\_CO} \\
V_{train} &= \beta_{time}\,\mathrm{TRAIN\_TT} + \beta_{train\,cost}\,\mathrm{TRAIN\_CO} + \beta_{he}\,\mathrm{TRAIN\_HE} \\
V_{SM} &= ASC_{SM} + \beta_{time}\,\mathrm{SM\_TT} + \beta_{SM\,cost}\,\mathrm{SM\_CO} + \beta_{he}\,\mathrm{SM\_HE}
\end{aligned}
$$

We have three alternative-specific coefficients for the cost variable, which are normally distributed with means m_car cost, m_train cost, and m_SM cost and standard deviations σ_car cost, σ_train cost, and σ_SM cost, respectively. The coefficient

³ In this workbook, the estimated standard deviations are always reported as positive. In Biogeme they may be reported as negative. If so, just ignore the sign and consider the absolute value.


Estimation results

Parameter  Parameter      Parameter  Robust          Robust
number     name           estimate   standard error  t statistic
1          ASC_car        -1.47      0.177           -8.30
2          ASC_SM         -0.915     0.130           -7.07
3          m_car cost     -0.0168    0.00409         -4.11
4          σ_car cost     0.00883    0.00329         2.68
5          m_train cost   -0.0588    0.00484         -12.14
6          σ_train cost   0.0229     0.00209         10.94
7          m_SM cost      -0.0162    0.00217         -7.48
8          σ_SM cost      0.00814    0.00204         3.99
9          m_he           -0.00619   0.00121         -5.12
10         σ_he           0.00102    0.00415         0.25
11         β_time         -0.0129    0.00168         -7.72

Summary statistics
Number of draws = 100
Number of observations = 6768
L(0) = −6964.663
L(β) = −4979.704
ρ2 = 0.283

Table 9.5: Random coefficient specification assuming normal distributions.

related to headway is also assumed to be randomly distributed over the population, with mean m_he and standard deviation σ_he.

The estimation results are shown in Table 9.5. The ASC’s have negative signs, and their values now show a preference, all the rest remaining constant, for the train with respect to both the car and Swissmetro alternatives. The mean of the car cost coefficient is negative, as expected, and the standard deviation σ_car cost is significantly different from zero. Given the estimated mean and standard deviation, the probability that the parameter has a negative value is 97.15%. The assumed Normal distribution allows for non-zero probabilities of having a positive car cost coefficient. Similar considerations can be made for the other random coefficients. The mean of the train cost coefficient is negative, as expected, and both the mean and the standard deviation are


[GeneralizedUtilities]

1 exp( BETA_TIME [ BETA_TIME_std ] ) * TRAIN_TT

2 exp( BETA_TIME [ BETA_TIME_std ] ) * SM_TT

3 exp( BETA_TIME [ BETA_TIME_std ] ) * CAR_TT

Figure 9.6: The Biogeme Log-Normal specification.

significant. Computing the cumulative distribution function (cdf) for the Normal distribution with these parameters, we observe that the cumulative probability of having a train cost coefficient less than zero is 99.49%. For the SM cost parameter (both mean and standard deviation are significant), we have the cdf for negative values equal to 97.67%. The mean of the headway parameter is negative, as expected, and its standard deviation has not been estimated significantly different from zero.
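The probabilities of a negative cost coefficient quoted in the text can be recomputed from the estimates in Table 9.5. The following Python sketch evaluates the Normal cdf at zero using only the standard library; the helper name is hypothetical, and the inputs are the rounded estimates from the table, so the last digit may differ slightly.

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    """Cumulative distribution function of a Normal(mean, std^2) variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# (mean, standard deviation) of the cost coefficients from Table 9.5
cost_coefficients = {
    "car": (-0.0168, 0.00883),
    "train": (-0.0588, 0.0229),
    "SM": (-0.0162, 0.00814),
}

share_negative = {
    name: normal_cdf(0.0, mean, std) for name, (mean, std) in cost_coefficients.items()
}
for name, share in share_negative.items():
    print(f"P(cost coefficient < 0) for {name}: {100.0 * share:.2f}%")
```

The complement of each value is the share of the population that the Normal assumption assigns a positive cost coefficient, which is one motivation for the bounded distributions discussed next.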

Different distributions. We show here two examples of Biogeme code to specify a random coefficient model where the parameters are log-normally and Johnson’s SB distributed. The Biogeme snapshots are shown in Figures 9.6 and 9.7, respectively. Recall that a variable X is log-normally distributed if Y = ln(X) is normally distributed. We can easily define such a distribution in Biogeme by assuming a generic time coefficient to be log-normally distributed.

In the case of Johnson’s SB distribution, the functional form is derived using a Logit-like transformation of a Normal distribution, as defined in the following equation

$$\xi = a + (b - a)\,\frac{e^{\zeta}}{e^{\zeta} + 1}$$

where ζ ∼ N(µ, σ²). This distribution is very flexible; it is bounded between a and b, and its shape can change from very flat to bimodal as the parameters of the normal variable change. It requires the estimation of four parameters (a, b, µ and σ) and a nonlinear specification, assuming, as before, a generic time coefficient following such a distribution.
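The Johnson’s SB transformation can be explored numerically with a short Python sketch that draws ζ from a Normal distribution and applies the equation above. The function names and the parameter values (a, b, µ, σ) are hypothetical and chosen only to illustrate a time coefficient bounded between a negative lower bound and zero.

```python
import math
import random

def johnson_sb(zeta, a, b):
    """Map a normal draw zeta to the interval (a, b): a + (b - a) e^zeta / (e^zeta + 1)."""
    return a + (b - a) * math.exp(zeta) / (math.exp(zeta) + 1.0)

def draw_johnson_sb(a, b, mu, sigma, n_draws, seed=0):
    """Draw n_draws values of xi, with zeta ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [johnson_sb(rng.gauss(mu, sigma), a, b) for _ in range(n_draws)]

# Hypothetical bounds and normal parameters for a time coefficient.
a, b = -0.05, 0.0
for sigma in (0.5, 1.5, 3.0):
    draws = draw_johnson_sb(a, b, mu=0.0, sigma=sigma, n_draws=10000)
    near_bounds = sum(1 for x in draws if x < a + 0.1 * (b - a) or x > b - 0.1 * (b - a))
    print(f"sigma = {sigma}: min = {min(draws):.4f}, max = {max(draws):.4f}, "
          f"share within 10% of a bound = {near_bounds / len(draws):.2f}")
```

For small σ the draws concentrate around the middle of the interval, while for large σ most of the mass moves toward the two bounds, which is the bimodal shape mentioned in the text.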

The topic of the functional form for random coefficient distributions is treated in more detail in, for example, Train (2003) and Walker et al. (2007).


[GeneralizedUtilities]

1 ( A + ( ( B - A ) * ( exp( BETA_TIME [ BETA_TIME_std ] )

/ ( exp( BETA_TIME [ BETA_TIME_std ] ) + 1 ) ) ) ) * TRAIN_TT

2 ( A + ( ( B - A ) * ( exp( BETA_TIME [ BETA_TIME_std ] )

/ ( exp( BETA_TIME [ BETA_TIME_std ] ) + 1 ) ) ) ) * SM_TT

3 ( A + ( ( B - A ) * ( exp( BETA_TIME [ BETA_TIME_std ] )

/ ( exp( BETA_TIME [ BETA_TIME_std ] ) + 1 ) ) ) ) * CAR_TT

Figure 9.7: The Biogeme SB specification.

Mixture of GEV Models

Files to use with Biogeme:
Model file: Mixture SM M-NL.mod
Data file: swissmetro.dat

In this example, we capture the substitution patterns using a Nested Logit model, and we allow for some parameters to be randomly distributed over the population.

$$
\begin{aligned}
V_{car} &= ASC_{car} + \beta_{car\,time}\,\mathrm{CAR\_TT} + \beta_{cost}\,\mathrm{CAR\_CO} \\
V_{train} &= \beta_{train\,time}\,\mathrm{TRAIN\_TT} + \beta_{cost}\,\mathrm{TRAIN\_CO} + \beta_{he}\,\mathrm{TRAIN\_HE} + \beta_{ga}\,\mathrm{GA} + \beta_{senior}\,\mathrm{SENIOR} \\
V_{SM} &= ASC_{SM} + \beta_{SM\,time}\,\mathrm{SM\_TT} + \beta_{cost}\,\mathrm{SM\_CO} + \beta_{he}\,\mathrm{SM\_HE} + \beta_{ga}\,\mathrm{GA} + \beta_{seats}\,\mathrm{SM\_SEATS}
\end{aligned}
$$

We have added the socio-economic characteristics senior (a dummy variable for senior people, i.e. age above 65), ga and SM seats. A few observations have been removed where the variable Age was missing. We specify a nest composed of alternatives car and train representing standard transportation modes, while the Swissmetro alternative represents the technological innovation. We further assume a generic cost parameter and three randomly distributed alternative-specific time parameters. Normal distributions are

183

Page 188: 220714279-Workbook-2012

184 mixtures of logit and gev models

Estimation resultsParameter Parameter Parameter Robust Robust Robustnumber name estimate standard error t stat. 0 t stat. 1

1 ASCcar -0.145 0.120 -1.212 ASCSM 0.185 0.115 1.613 βsenior 1.53 0.132 11.654 mcar time -0.0134 0.000996 -13.435 βcost -0.00961 0.000817 -11.766 σcar time 0.00462 0.000499 9.267 βga 1.01 0.149 6.798 βhe -0.00467 0.000878 -5.329 βseats -0.262 0.100 -2.6210 mSM time -0.0152 0.00132 -11.5611 σSM time 0.00877 0.00139 6.2912 mtrain time -0.0158 0.00109 -14.5013 σtrain time 0.000741 0.000661 1.1214 µclassic 1.85 0.141 13.10 6.02

Summary statisticsNumber of draws = 100

Number of observations = 6759

L(0) = −6958.425

L(β) = −4956.477

ρ2 = 0.286

Table 9.6: Mixture of Nested Logit estimation results

used for the random coefficients, that is,

βcar time ∼ N(mcar time, σ2car time)

βtrain time ∼ N(mtrain time, σ2train time)

βSM time ∼ N(mSM time, σ2SM time).

The estimation results are shown in Table 9.6. The nest parameter is estimated to be significantly different from 1, showing a correlation between the train and car alternatives, as expected. The three mean parameters for the time coefficients have been estimated with negative signs (as expected) and are significantly different from zero. Their numerical values differ only slightly, suggesting that a generic specification would probably have been acceptable. For the car and Swissmetro time coefficients, the estimated standard deviations are significant and smaller in magnitude than the corresponding mean values. This means that their distributions over the population are quite peaked, indicating that the way different individuals perceive the negative impact of travel time on the alternatives' utilities is not so different. Finally, given the narrow shape of the estimated random coefficient distributions, choices other than the normal, such as bounded distributions, would probably be suitable.
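One way to see why a bounded distribution may be attractive is to compute the share of the population that a normal distribution implies to have a positive time coefficient. The sketch below does this with the means and standard deviations reported in Table 9.6; the helper names are illustrative and the calculation is only a rough diagnostic, not part of the Biogeme workflow.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def share_positive(mean, std):
    """P(beta > 0) when beta ~ N(mean, std^2)."""
    return 1.0 - normal_cdf((0.0 - mean) / std)


# (mean, standard deviation) of the time coefficients, taken from Table 9.6.
estimates = {
    "car": (-0.0134, 0.00462),
    "SM": (-0.0152, 0.00877),
    "train": (-0.0158, 0.000741),
}

for mode, (mean, std) in estimates.items():
    print(f"{mode}: share with positive time coefficient = {share_positive(mean, std):.4f}")
```

The implied share is small for car and negligible for train, but it is a few percent for Swissmetro, which is the kind of counter-intuitive sign a bounded distribution would rule out.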

Mixture of Logit with Panel Data

Files to use with Biogeme:
Model file: Mixture SM panel.mod
Data file: swissmetro.dat

In this example, we take into account the fact that we have panel data in the sample file. Indeed, the sample file is composed of nine observations per individual. These nine observations correspond to the choices made by a single respondent in nine hypothetical mode choice situations described in the questionnaire of the Swissmetro survey. The idea is thus to specify a model which is able to deal with sequences of observed choices and with the intrinsic correlation among the choices of a sequence.

The specification file Mixture SM panel.mod is based on the model MNL SM specific.mod with alternative-specific cost coefficients, which has been analyzed in the Case Study dealing with logit models. We have added the following section:

[PanelData]
ID
ZERO_SIGMA_PANEL

where ID is the name of the variable in the dataset identifying the observations belonging to a given individual, and ZERO_SIGMA_PANEL is the name of the random coefficient which will not vary across observations from the same individual.

The way we deal with panel data is therefore to use a Mixture of Logit model with a random coefficients specification. More precisely, we add individual-specific error terms (specified in Biogeme by ZERO [ SIGMA_PANEL ] * one) to two of the three alternatives (one alternative must be normalized), where the standard deviation (SIGMA_PANEL) needs to be estimated while the mean (ZERO) is fixed to zero. The utility functions for this model can therefore be specified in Biogeme as follows:

[Utilities]
Car ASC_CAR * one + BETA_TIME * CAR_TT + BETA_CAR_COST * CAR_CO + ZERO [ SIGMA_PANEL ] * one
Train ASC_SBB * one + BETA_TIME * TRAIN_TT + BETA_TRAIN_COST * TRAIN_COST + BETA_HE * TRAIN_HE + ZERO [ SIGMA_PANEL ] * one
SM ASC_SM * one + BETA_TIME * SM_TT + BETA_SM_COST * SM_COST + BETA_HE * SM_HE
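The role of the panel term can be sketched outside Biogeme as follows. This is a simplified Python illustration, not Biogeme's implementation: for one individual, a single draw of the error term is shared by all of that person's observations, the logit probabilities of the observed sequence are multiplied, and the product is averaged over draws. The function names and the input layout are assumptions made for the example.

```python
import math
import random


def logit_prob(utilities, chosen):
    """Logit probability of the chosen alternative (max-shifted for stability)."""
    shift = max(utilities.values())
    exp_u = {alt: math.exp(v - shift) for alt, v in utilities.items()}
    return exp_u[chosen] / sum(exp_u.values())


def simulated_panel_likelihood(observations, sigma_panel, n_draws=100, seed=0):
    """Simulated likelihood of ONE individual's sequence of choices.

    observations: list of (systematic utilities dict, chosen alternative),
    with keys "Car", "Train" and "SM" as in the [Utilities] section.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # One draw per individual, constant across that person's observations.
        xi = sigma_panel * rng.gauss(0.0, 1.0)
        sequence_prob = 1.0
        for utilities, chosen in observations:
            u = dict(utilities)
            u["Car"] += xi    # the panel term enters Car and Train only;
            u["Train"] += xi  # SM is the normalized alternative
            sequence_prob *= logit_prob(u, chosen)
        total += sequence_prob
    return total / n_draws


# Hypothetical systematic utilities for two choice situations of one person.
obs = [({"Car": 0.0, "Train": 0.0, "SM": 0.0}, "SM"),
       ({"Car": 0.0, "Train": 0.0, "SM": 0.0}, "SM")]
print(simulated_panel_likelihood(obs, sigma_panel=2.39))
```

Because the same draw appears in every observation of a respondent, the choices within a sequence are correlated, which is what a plain logit on pooled observations ignores.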

We see from the estimation results presented in Table 9.7 that the coefficient σ_panel is highly significant, which means that this model captures intrinsic correlations among the observations of the same individual. Moreover, the final log-likelihood value is −4235.440, which is much higher (that is, smaller in absolute value) than the value −5068.560 obtained with the model MNL SM specific.mod without a panel term. The interpretation of the other coefficients remains the same as for the coefficients of MNL SM specific.mod, except that ASC_SM is no longer significantly different from 0.


Estimation results

| Parameter number | Parameter name | Parameter estimate | Robust standard error | Robust t statistic |
|---|---|---|---|---|
| 1 | ASC_car | -0.988 | 0.390 | -2.53 |
| 2 | ASC_SM | -0.291 | 0.531 | -0.55 |
| 3 | β_car_cost | -0.0132 | 0.00324 | -4.07 |
| 4 | β_train_cost | -0.0323 | 0.00574 | -5.63 |
| 5 | β_SM_cost | -0.0163 | 0.00262 | -6.22 |
| 6 | β_he | -0.00757 | 0.00127 | -5.96 |
| 7 | β_time | -0.0190 | 0.00616 | -3.09 |
| 8 | σ_panel | 2.39 | 0.216 | 11.06 |

Summary statistics
Number of draws = 100
Number of individuals = 752
L(0) = −6964.663
L(β) = −4235.440
ρ² = 0.391

Table 9.7: Mixture of logit model with panel data.
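The goodness-of-fit figure in the summary statistics can be checked by hand. The sketch below computes both the plain and the parameter-adjusted likelihood ratio index from the log-likelihood values in Table 9.7; the function names are illustrative. Under this calculation, the reported value of 0.391 appears to correspond to the adjusted index with 8 estimated parameters, although the workbook does not say so explicitly.

```python
def rho_squared(ll_null, ll_final):
    """Likelihood ratio index: 1 - L(beta) / L(0)."""
    return 1.0 - ll_final / ll_null


def adjusted_rho_squared(ll_null, ll_final, n_params):
    """Adjusted index: 1 - (L(beta) - K) / L(0), penalizing K parameters."""
    return 1.0 - (ll_final - n_params) / ll_null


# Values taken from Table 9.7 (8 estimated parameters).
ll_null, ll_final, n_params = -6964.663, -4235.440, 8

print(round(rho_squared(ll_null, ll_final), 3))
print(round(adjusted_rho_squared(ll_null, ll_final, n_params), 3))
```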


Chapter 10

Simultaneous RP/SP Estimation

This case study deals with the simultaneous estimation of a Binary Logit model from revealed and stated preference (RP and SP) data. The objective is to estimate Binary Logit models with RP, SP and combined RP/SP datasets and to compare the results of the three models.

The intercity mode choice dataset from Nijmegen, the Netherlands, will be used in this case study. The survey was conducted during 1987 for the Netherlands Railways to assess factors that influence the choice between rail and car for intercity travel. A detailed description of the data collection method and variable definitions is presented in the Appendix, section A.2.


10.1 Model Specification with RP Data

Files to use with Biogeme:
Model file: RP-SP NL rp.mod
Data file: netherlands.dat

The simple RP model consists of travel time and travel cost with generic coefficients for both alternatives.

$$
\begin{aligned}
V_{\text{auto}} &= \beta_{\text{time}}\,\text{cartime} + \beta_{\text{cost}}\,\text{carcost} \\
V_{\text{rail}} &= \mathrm{ASC}_{\text{rp-rail}} + \beta_{\text{time}}\,\text{railtime} + \beta_{\text{cost}}\,\text{railcost}
\end{aligned}
$$

The estimation results are shown in Table 10.1. The results show that the utility of a mode decreases as total travel time and travel cost increase.

10.2 Model Specification with SP Data

Files to use with Biogeme:
Model file: RP-SP NL sp.mod
Data file: netherlands.dat

The simple SP model is estimated with a generic cost coefficient, a generic time coefficient, and an inertia variable (rpchoice) in the rail utility. The inertia variable captures the effect of the respondent's actual choice on his/her SP response (based on the hypothesis that people who have chosen a particular mode in an actual case will tend to have a bias towards that mode).

The sample consists of 1511 observations. The coefficients have the expected signs, and they are significantly different from zero at the 95% level of confidence. Note that the ASC associated with the rail alternative is now negative. Combined with the inertia coefficient, this implies that the intercept is negative for car users and positive for rail users. The inertia effect of the actual choice is significant in the SP experiment.
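As a worked check of the statement about the intercepts, using the estimates reported in Table 10.2 and the coding of rpchoice given in the Appendix (0 for auto, 1 for rail), the rail intercept for each group is:

$$
\begin{aligned}
\text{car users } (\text{rpchoice} = 0):\quad & \mathrm{ASC}_{\text{sp-rail}} + \beta_{\text{inert}} \cdot 0 = -1.62 \\
\text{rail users } (\text{rpchoice} = 1):\quad & \mathrm{ASC}_{\text{sp-rail}} + \beta_{\text{inert}} \cdot 1 = -1.62 + 2.72 = 1.10
\end{aligned}
$$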

The estimation results are shown in Table 10.2.


10.3 Model Specification with Combined RP-SP Data

Files to use with Biogeme:
Model file: RP-SP NL rpsp.mod
Data file: netherlands.dat

Having defined the utility functions for the RP model as

$$U^{RP} = V^{RP} + \varepsilon^{RP}$$

and those of the SP model as

$$U^{SP} = V^{SP} + \varepsilon^{SP},$$

we have already estimated the RP model and the SP model separately. Now, in order to perform a joint estimation of both models, that is, an RP-SP model, the variances of the error terms must be the same. This is why we assume that

$$\mathrm{Var}(\varepsilon^{RP}) = \mathrm{Var}(\theta\varepsilon^{SP}) = \theta^2\,\mathrm{Var}(\varepsilon^{SP}).$$

The utilities for the RP and SP models can now be rewritten as

$$
\begin{aligned}
U^{RP} &= V^{RP} + \varepsilon^{RP} \\
\theta U^{SP} &= \theta V^{SP} + \theta\varepsilon^{SP}
\end{aligned}
$$

and the error terms ($\varepsilon^{RP}$ and $\theta\varepsilon^{SP}$) of both models have the same variance. Assume that $V^{SP}_{in} = \beta X^{SP}_{in}$ is a linear-in-parameters specification. Then $\theta V^{SP}_{in} = \theta\beta X^{SP}_{in}$, where both θ and β must be estimated, which introduces a nonlinear specification.

In this example, the combined RP-SP model consists of total travel time and travel cost for both types of observations, and inertia (rpchoice) in rail for the SP observations. The scale of the RP observations is fixed at 1, and θ therefore represents the scale of the SP observations. The model is estimated on a total of 1739 observations.
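The joint specification can be sketched as a log-likelihood function in Python. This is a simplified illustration under stated assumptions, not the Biogeme model file: the record layout, the dictionary keys and the function names are hypothetical, and only the structure (shared time and cost coefficients, context-specific constants, SP utilities multiplied by θ) follows the description above.

```python
import math


def prob_rail(v_rail, v_car, scale=1.0):
    """Binary logit probability of rail with utilities multiplied by `scale`."""
    return 1.0 / (1.0 + math.exp(-scale * (v_rail - v_car)))


def joint_loglik(records, beta, theta):
    """Log-likelihood of pooled RP and SP records; the RP scale is fixed at 1."""
    ll = 0.0
    for r in records:
        v_car = beta["time"] * r["car_time"] + beta["cost"] * r["car_cost"]
        v_rail = beta["time"] * r["rail_time"] + beta["cost"] * r["rail_cost"]
        if r["sp"]:
            # SP records: own constant, inertia term, and scale theta.
            v_rail += beta["asc_sp_rail"] + beta["inert"] * r["rpchoice"]
            scale = theta
        else:
            v_rail += beta["asc_rp_rail"]
            scale = 1.0
        p = prob_rail(v_rail, v_car, scale)
        ll += math.log(p if r["chose_rail"] else 1.0 - p)
    return ll


# Two made-up records (one RP, one SP), for illustration only.
records = [
    {"sp": False, "car_time": 1.7, "car_cost": 16.5, "rail_time": 2.0,
     "rail_cost": 31.0, "chose_rail": False},
    {"sp": True, "car_time": 1.7, "car_cost": 20.0, "rail_time": 1.8,
     "rail_cost": 25.0, "rpchoice": 0, "chose_rail": False},
]
beta = {"time": -1.32, "cost": -0.05, "asc_rp_rail": 0.798,
        "asc_sp_rail": -4.79, "inert": 8.03}
print(joint_loglik(records, beta, theta=0.339))
```

Because θ multiplies the whole SP utility, the product θβ appears in the SP probabilities, which is the nonlinearity mentioned above.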

The estimation results are shown in Table 10.3. The negative and significant coefficient for the alternative specific constant in the SP rail alternative indicates that, all else being equal, car users tend to dislike rail in the SP case. The inertia dummy was found to have a large impact on the utility, both in terms of value and statistical significance. The scale parameter θ was also found to be significantly different from one, indicating a significant difference in the variance between the RP and SP data.

Finally, we can perform a likelihood ratio test for stability of preferences.¹ Specifically, the null hypothesis is:

$$H_0 : \beta^{RP} = \beta^{SP}.$$

The test statistic for the null hypothesis is given by

$$-2(L_R - L_U) = -2(-780.124 + 123.133 + 656.991) = 0.000$$

where the restricted model is the combined RP-SP model and the unrestricted model consists of the separate RP and SP models.

The test statistic is asymptotically χ² distributed with degrees of freedom equal to $K_{RP} + K_{SP} - K_{RP\text{-}SP} = 3 + 4 - 6 = 1$.

Since 0.000 < 3.841 (the critical value of the χ² distribution with 1 degree of freedom at a 95% level of confidence), we cannot reject the null hypothesis of stability of preferences, and we therefore retain the combined RP-SP model.
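The arithmetic of this test can be reproduced with a few lines of Python. The sketch below uses the final log-likelihood values from Tables 10.1 to 10.3; the function names are illustrative, and the p-value formula relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def likelihood_ratio_test(ll_restricted, ll_unrestricted, critical_value=3.841):
    """Return the LR statistic and whether H0 is rejected at the given critical value."""
    # Clamp at zero to guard against tiny negative values from rounding.
    stat = max(-2.0 * (ll_restricted - ll_unrestricted), 0.0)
    return stat, stat > critical_value


ll_rp, ll_sp, ll_joint = -123.133, -656.991, -780.124
stat, reject = likelihood_ratio_test(ll_joint, ll_rp + ll_sp)
print(f"LR statistic = {stat:.3f}, reject H0: {reject}")
print(f"p-value (1 d.o.f.) = {chi2_sf_1df(stat):.3f}")
```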

¹ Note that the likelihood ratio test in such a situation is an approximate test. The test results are asymptotically valid if the standard errors and the robust standard errors are approximately the same. However, if there are substantial differences between the standard errors and the robust standard errors, the likelihood ratio test results may be misleading and Wald / Lagrange Multiplier tests are more appropriate.


BL with RP data

| Parameter number | Parameter name | Parameter estimate | Robust standard error | Robust t statistic |
|---|---|---|---|---|
| 1 | ASC_rp-rail | 0.798 | 0.275 | 2.90 |
| 2 | β_cost | -0.0499 | 0.0107 | -4.67 |
| 3 | β_time | -1.33 | 0.354 | -3.75 |

Summary statistics
Number of observations: 228
L(0) = −158.038
L(β) = −123.133
ρ² = 0.202

Table 10.1: BL with RP data estimation results

BL with SP data

| Parameter number | Parameter name | Parameter estimate | Robust standard error | Robust t statistic |
|---|---|---|---|---|
| 1 | ASC_sp-rail | -1.62 | 0.128 | -12.65 |
| 2 | β_inert | 2.72 | 0.144 | 18.91 |
| 3 | β_cost | -0.0170 | 0.00384 | -4.42 |
| 4 | β_time | -0.447 | 0.0977 | -4.58 |

Summary statistics
Number of observations: 1511
L(0) = −1047.350
L(β) = −656.991
ρ² = 0.369

Table 10.2: BL with SP data estimation results


BL with combined RP-SP data

| Parameter number | Parameter name | Parameter estimate | Robust standard error | Robust t statistic |
|---|---|---|---|---|
| 1 | ASC_rp-rail | 0.798 | 0.275 | 2.9 |
| 2 | ASC_sp-rail | -4.79 | 1.35 | -3.54 |
| 3 | β_inert | 8.03 | 1.91 | 4.21 |
| 4 | β_cost | -0.05 | 0.00965 | -5.18 |
| 5 | β_time | -1.32 | 0.293 | -4.51 |
| 6 | θ | 0.339 | 0.0817 | -8.09* |

Summary statistics
Number of observations: 1739
L(0) = −1205.383
L(β) = −780.124
ρ² = 0.348

* Robust t statistic for the null hypothesis θ = 1

Table 10.3: BL with RP and SP data estimation results


Bibliography

Abbe, E., Bierlaire, M. and Toledo, T. (2007). Normalization and correlation of cross-nested logit models, Transportation Research B: Methodological 41(7): 795–808.

Ben-Akiva, M. and Lerman, S. R. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand, MIT Press, Cambridge, MA.

Ben-Akiva, M. and Morikawa, T. (1990). Revealed preferences and stated intentions, Transportation Research A 24A(6): 485–495.

Bierlaire, M., Axhausen, K. and Abay, G. (2001). The acceptance of modal innovation: The case of swissmetro, Proceedings of the 1st Swiss Transportation Research Conference, Ascona, Switzerland. www.strc.ch/bierlaire.pdf.

Cherchi, E. and Ortuzar, J. (2002). Mixed RP/SP models incorporating interaction effects, Transportation 29: 371–395.

Ekman, P. and Friesen, W. V. (1978). Facial Action Coding System Investigator's Guide, Consulting Psychologists Press, Palo Alto, CA.

Kanade, T., Cohn, J. and Tian, Y. L. (2000). Comprehensive database for facial expression analysis, Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), pp. 46–53.

McFadden, D. (1987). Regression-based specification tests for the multinomial logit model, Journal of Econometrics 34(1/2): 63–82.

Train, K. (2003). Discrete Choice Methods with Simulation, Cambridge University Press, University of California, Berkeley.

Train, K., Ben-Akiva, M. and Atherton, T. (1989). Consumption patterns and self-selecting tariffs, Review of Economics and Statistics 71(1): 62–73.

Train, K., McFadden, D. and Ben-Akiva, M. (1987). The demand for local telephone service: a fully discrete model of residential calling patterns and service choices, Rand Journal of Economics.

Walker, J., Ben-Akiva, M. and Bolduc, D. (2007). Identification of parameters in normal error component logit-mixture (neclm) models, Journal of Applied Econometrics 22(6): 1095–1125.


Appendix A

Datasets

A.1 Choice-Lab-Fashion Marketing Case

Context

Choice-Lab-Fashion¹ is a European company that specializes in collecting and processing data on companies operating in the fashion industry. The company sells marketing solutions and operates in a business-to-business market covering producers and distributors of clothes, shoes and accessories. Choice-Lab-Fashion provides its clients with a selection of products that will help them make the right decisions in connection with competitor analysis, segmentation, sales management and direct marketing. Choice-Lab-Fashion has lately been experiencing a decrease in its customer base. As the fashion industry has been consolidating in the past few years, one reason might be the shrinking of the target market. Choice-Lab-Fashion has also been introducing new products that might have been cannibalizing some older ones.

The management team would like to investigate whether it is possible to understand which factors characterize customer departure. However, the company does not have a survey, and in the near future they would like to see if it is possible to learn something from the available customer data. The customer data, described below, include a set of socio-economic characteristics of Choice-Lab-Fashion's clients and the type of product they purchased. These variables are individual specific and do not vary between alternatives.

¹ By using this dataset, the student agrees to use it only for academic purposes related to this course. The user is responsible for keeping the data on a computer with secure access. The user of the data must not transfer it or distribute it to third parties. The name Choice-Lab-Fashion is fictitious.

Data

The Choice-Lab-Fashion customer database includes an unbalanced panel of data from 2000 until 2002. For each row of data, we observe the customer ID, the year of observation, some financial and economic indicators and the type of products that the customer has purchased. The dependent variable indicates a binary choice. It is equal to one when the customer decides to defect and zero when the customer decides to stay with the company. (There is no more data on a client once it has defected.)

Note that out of 16220 observations, there is one for which the choice variable equals zero (the customer is still with the company) but none of the products has been purchased. We believe that this is probably an error from the data provider, and this observation has therefore been excluded in the model estimations (see the [Exclude] section in the .mod Biogeme files).
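The exclusion rule described above can be expressed as a simple filter. The sketch below is a Python illustration of the logic only, not the content of the [Exclude] section; it assumes each observation is a dictionary keyed by the variable names listed in Table A.1, and the function names are hypothetical.

```python
# Product purchase dummies, named as in Table A.1.
PRODUCT_COLUMNS = [
    "IndAnalysis", "CreditInfo", "Accounts", "Monitor", "Web",
    "CD", "CRM", "Internet", "OpenDB", "Other",
]


def is_suspect(row):
    """True if the customer stays (Choice == 0) but no product was purchased."""
    return row["Choice"] == 0 and all(row[col] == 0 for col in PRODUCT_COLUMNS)


def drop_suspect_rows(rows):
    """Return the observations that pass the consistency check."""
    return [row for row in rows if not is_suspect(row)]


# Two made-up observations: the second one would be excluded.
no_products = {col: 0 for col in PRODUCT_COLUMNS}
rows = [
    {"Choice": 0, **no_products, "CD": 1},
    {"Choice": 0, **no_products},
]
print(len(drop_suspect_rows(rows)))
```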

• Unit of analysis: firm

• Observation period: 2000 - 2002

• Choice set: whether the firm remains a client or not.

Choice-Lab-Fashion has 10 different products which are described below.

• Product 1: Fashion Industry Analysis Report
The fashion industry analysis report provides key figures for the past 5 years on the clothing, shoes and accessories sectors. Fashion industry analysis reports include mergers, acquisitions and bankruptcies that have characterized the industry in the past 5 years. The report also provides an opportunity to compare the 10 largest competitors in the given sector (clothes, shoes or accessories).

• Product 2: Fashion Credit Info
Choice-Lab-Fashion helps the client if it is about to grant credit to either a domestic or foreign based customer. The fashion credit report also contains an overview of the company's management, history, and accounting figures. This service is mainly used by manufacturing companies in relation to their distributors or wholesalers.

• Product 3: Individual Accounts Database
This product allows clients to retrieve copies of the past 5 annual accounts of companies via the Choice-Lab-Fashion web site. The accounts are sent via email. This product is mainly used by first time shoppers and does not generate a lot of revenue for Choice-Lab-Fashion, but it gives Choice-Lab-Fashion exposure to potential clients. If the client purchases any of the main data access products offered by Choice-Lab-Fashion within 30 days after this first purchase, the cost of this first purchase will be deducted from the client's new invoice.

• Product 4: Customized Individual Business Monitoring
Choice-Lab-Fashion monitors different areas (i.e., product launches, financial information, ownership change) of a list of companies of the client's choice. This product is customized to the client's needs.

• Product 5: Web Access Real-Time Fashion Data
Web access real-time fashion data is an Internet based program with real-time data. It allows the user to perform company searches on the full Choice-Lab-Fashion data or on geographic segments of it. The data cannot be downloaded, but the user can generate simple reports as text files.

• Product 6: CD-Fashion
CD-Fashion is the most complete database of companies, in the form of a CD which is updated semiannually. It contains accounting and financial data, company addresses, and ownership information. As manufacturers often produce several brands, for the top 30 manufacturers in each country the data also include information on the names of the brands produced. (Choice-Lab-Fashion collects this information via the daily press and telephone interviews with the different manufacturers every six months.)


• Product 7: CRM-F Integrated
CRM-F Integrated is a user-friendly Internet based customer relations management system. It is an integrated and professional tool to help control and plan all activities directed towards customers, prospects and suppliers. This solution is normally purchased by large accounts that have a well developed and integrated IT platform.

• Product 8: Internet-Credit
An internet-credit annual subscription gives access to credit information on companies. The credit information provides a detailed overview of the company's credit limits and credit rating. The data cannot be downloaded, but the client can generate small reports as text files.

• Product 9: Open Fashion Data Base
Real-time access to the Choice-Lab-Fashion database. This feature ensures that the client has access to the latest company data. These data are updated daily by Choice-Lab-Fashion's staff. The data include new product launches, mergers, acquisitions and bankruptcies. The data can be downloaded as a data file.

• Product 10: Other Customized Solutions
This product includes other solutions such as:
Fashion Event Analysis: describes what is going on in a specific area in terms of events related to the fashion world.
In-depth Interviews - Focus Groups: Choice-Lab-Fashion can carry out in-depth interviews and focus groups in the fashion industry for a list of companies identified by its customers.

Variables and Descriptive Statistics

In Table A.1, we summarize the variables of the dataset, and in Table A.2 we summarize the descriptive statistics.


| Variable | Description |
|---|---|
| Choice | Equals 1 if customer drops next year; 0 otherwise |
| ID | Company ID |
| IndAnalysis | Equals 1 if product 1 has been purchased; 0 otherwise |
| CreditInfo | Equals 1 if product 2 has been purchased; 0 otherwise |
| Accounts | Equals 1 if product 3 has been purchased; 0 otherwise |
| Monitor | Equals 1 if product 4 has been purchased; 0 otherwise |
| Web | Equals 1 if product 5 has been purchased; 0 otherwise |
| CD | Equals 1 if product 6 has been purchased; 0 otherwise |
| CRM | Equals 1 if product 7 has been purchased; 0 otherwise |
| Internet | Equals 1 if product 8 has been purchased; 0 otherwise |
| OpenDB | Equals 1 if product 9 has been purchased; 0 otherwise |
| Other | Equals 1 if product 10 has been purchased; 0 otherwise |
| Age | Number of years the client has existed |
| Rating | Client credit rating: 100 represents the best and 0 the worst (this is a proxy for the current financial condition of the client) |
| Year | Year of observation |
| NegProfit | Equals 1 if profit < 0; 0 otherwise |
| NegEquity | Equals 1 if equity < 0; 0 otherwise |
| LRSC | Equals 1 if company is a limited responsibility stock owned company; 0 otherwise |
| LRC | Equals 1 if a company is a limited responsibility company; 0 otherwise |
| NbEmpl | Total number of employees |
| LnNbEmpl | Natural log of the number of employees |
| LnAge | Natural log of the age of the company |

Table A.1: Description of the variables in the dataset


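| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| Choice | 0.19 | 0.39 | 0 | 1 |
| ID | 253164.74 | 259851.50 | 830 | 1091364 |
| IndAnalysis | 0.27 | 0.45 | 0 | 1 |
| CreditInfo | 0.35 | 0.48 | 0 | 1 |
| Accounts | 0.29 | 0.46 | 0 | 1 |
| Monitor | 0.02 | 0.14 | 0 | 1 |
| Web | 0.13 | 0.34 | 0 | 1 |
| CD | 0.34 | 0.47 | 0 | 1 |
| CRM | 0.00 | 0.06 | 0 | 1 |
| Internet | 0.06 | 0.24 | 0 | 1 |
| OpenDB | 0.00 | 0.04 | 0 | 1 |
| Other | 0.52 | 0.50 | 0 | 1 |
| Age | 29.94 | 27.27 | 1 | 380 |
| Rating | 55.96 | 18.66 | 0 | 100 |
| Year | 2000.98 | 0.82 | 2000 | 2002 |
| NegProfit | 0.24 | 0.43 | 0 | 1 |
| NegEquity | 0.04 | 0.19 | 0 | 1 |
| LRSC | 0.77 | 0.42 | 0 | 1 |
| LRC | 0.20 | 0.40 | 0 | 1 |
| NbEmpl | 52.76 | 106.53 | 1 | 989 |
| LnNbEmpl | 2.83 | 1.58 | 0 | 6.90 |
| LnAge | 3.10 | 0.76 | 0 | 5.94 |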

Table A.2: Descriptive statistics


A.2 Netherlands Mode Choice Case

Context

Nijmegen is a small city on the eastern side of the Netherlands, near the border with Germany. The city has typical rail connections with the major cities in the western metropolitan area called the Randstad (which contains Amsterdam, Rotterdam and The Hague). Trips from Nijmegen to the Randstad take approximately two hours by both rail and car. A binary choice model can be developed to model the mode choice of travelers for intercity travel.

Data Collection

This dataset was collected by a survey conducted in this corridor during 1987 by the Netherlands Railways to assess factors that influence the choice between car and rail (see Ben-Akiva and Morikawa, 1990). The sample consisted of residents of Nijmegen who:

• made a trip in the previous three months to Amsterdam, Rotterdam or The Hague;

• did not use a yearly rail pass, or other types of pass which would eliminate the marginal cost of the trip;

• had the possibility of using a car, namely, possessed a driver's license and had a car available in the household; and

• had the possibility of using rail, namely, did not have any very heavy baggage, were not handicapped, and did not need to visit multiple destinations.

Qualifying residents of Nijmegen were identified in a random telephone survey and requested to participate in a home interview. A total of 235 interviews were conducted out of the 365 people who were reached by telephone and satisfied the above criteria. The entire home interview was administered using laptop microcomputers, so the respondents replied to the questions on the computer screen. The respondents were requested to report the characteristics of the above-mentioned trip, and those of a trip to the same destination but with the unchosen mode. So the attribute values of both modes were provided by the respondents rather than calculated from network data. The data have 228 observations (some observations had to be discarded because of inconsistency), each including the following items:

• mode used (rail or car)

• trip purpose

• travel cost (for both chosen mode and unchosen mode)

• in-vehicle travel time (for both chosen mode and unchosen mode)

• access and egress time (for both chosen mode and unchosen mode)

• number of transfers for rail mode

• socio-economic characteristics of the respondent (e.g., age, gender)

Variables and Descriptive Statistics

In addition to the 228 RP observations, all individuals (except two) provided up to nine stated preference (SP) responses to hypothetical changes in network attributes. There is a total of 1739 RP and SP observations available.

The variables in this dataset are summarized in Tables A.3, A.4 and A.5 (if the type of data is not specified, it means that the variable appears in both RP and SP).

Note that even though the out-of-vehicle times are obtained from the RP survey, the same values can be used for SP because in the SP survey, respondents referred to the trip they reported in the RP survey, and so they would have considered out-of-vehicle time in evaluating the hypothetical alternatives.

In Table A.6, we show the descriptive statistics for some of the variables. Note that for RP specific attributes, the descriptive statistics in Table A.6 only concern a subsample of the observations.


| Name | Description | Data |
|---|---|---|
| id | Unique numerical identifier for each subject | |
| rp | 1 if the record is an RP choice, 0 otherwise | |
| sp | 1 if the record is an SP choice, 0 otherwise (note: rp + sp = 1) | |
| choice | Mode choice (and setting) indicator: 0 for auto in RP context, 1 for rail in RP context, 10 for auto in SP context, 11 for rail in SP context | |
| rp choice | Mode choice indicator for the person's actual choice: 0 for auto, 1 for rail (note: rpchoice = choice for RP records) | |
| rail ivtt | In-vehicle travel time for rail (hours) | |
| rail cost | Cost (per person) for rail (Guilders) | |
| rail transfers | Number of transfers for rail | |
| rp transfers | Number of rail transfers in the RP choice (note: rail transfers = rp transfers for RP records) | RP |
| rail comfort | Comfort level for rail in the SP exercises: 0 = least comfortable, 1 = medium comfort, 2 = most comfortable; -1 for RP records | SP |

Table A.3: Description of variables
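Since the choice variable packs both the survey context and the chosen mode into a single code, it can help to decode it before analysis. The following Python sketch follows the coding listed in Table A.3; the function name and the returned labels are illustrative choices.

```python
def decode_choice(choice):
    """Split the combined choice code into (context, mode).

    Coding from Table A.3: 0 = auto (RP), 1 = rail (RP),
    10 = auto (SP), 11 = rail (SP).
    """
    if choice not in (0, 1, 10, 11):
        raise ValueError(f"unexpected choice code: {choice}")
    context = "SP" if choice >= 10 else "RP"
    mode = "rail" if choice % 10 == 1 else "auto"
    return context, mode


for code in (0, 1, 10, 11):
    print(code, decode_choice(code))
```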


| Name | Description | Data |
|---|---|---|
| rp rail ovt | Access plus egress time for rail (hours) in the RP choice | RP |
| rail acc mode | Walk access dummy for rail in the RP choice: 1 = respondent walked to station, 0 = other access mode; -1 for SP records | RP |
| rail egr mode | Walk egress dummy for rail in the RP choice: 1 = respondent walked from station, 0 = other egress mode; -1 for SP records | RP |
| seat status | First class dummy for rail in the RP choice: 1 = respondent traveled in first class, 0 = other class(es); -1 for SP records | RP |
| car ivtt | In-vehicle time for auto (hours) | |
| car cost | Cost (per person) for auto (Guilders) | |
| rp car ovt | Out-of-vehicle time (hours) for auto in the RP choice | RP |
| car parking fee | Free parking dummy for auto in the RP choice: 1 = traveler can park for free, 0 = traveler must pay for parking; -1 for SP records | RP |
| purpose | Business trip dummy: 1 = business trip, 0 = other purposes | |

Table A.4: Description of variables


| Name | Description |
|---|---|
| arrival time | Fixed arrival time dummy: 1 = traveler must arrive at a given time, 0 = traveler has flexibility in arrival time |
| gender | Gender dummy: 1 = female, 0 = male |
| npersons | Number of persons traveling together |
| age | Age dummy: 1 = 41 or older, 0 = 40 or younger |
| employ status | Unemployment dummy: 1 = unemployed, 0 = employed |
| mainearn | Main earner dummy: 1 = main earner in the family, 0 otherwise |

Table A.5: Description of variables


| Variable | Mean | Std. Dev. | Minimum | Maximum |
|---|---|---|---|---|
| choice (RP) | 0.36 | 0.48 | 0 | 1 |
| choice (SP) | 10.27 | 0.44 | 10 | 11 |
| npersons | 2.46 | 1.30 | 1 | 6 |
| car ivtt | 1.71 | 0.38 | 0.75 | 3.05 |
| car cost | 16.52 | 15.74 | 0.25 | 112.5 |
| rail ivtt | 2.00 | 0.49 | 0.75 | 4.17 |
| rail cost | 31.09 | 11.79 | 5.45 | 93.75 |
| purpose | 0.16 | 0.37 | 0 | 1 |
| rail transfers | 0.57 | 0.68 | 0 | 3 |
| gender | 0.45 | 0.50 | 0 | 1 |
| age | 0.33 | 0.47 | 0 | 1 |
| employ status | 0.49 | 0.50 | 0 | 1 |
| mainearn | 0.48 | 0.50 | 0 | 1 |
| arrival time | 0.39 | 0.49 | 0 | 1 |
| rail acc mode | 0.25 | 0.43 | 0 | 1 |
| rail egr mode | 0.26 | 0.44 | 0 | 1 |
| seat status | 0.07 | 0.26 | 0 | 1 |
| car parking fee | 0.65 | 0.48 | 0 | 1 |
| rail comfort | 0.74 | 0.64 | 0 | 2 |
| rp rail ovt | 0.55 | 0.25 | 0.08 | 1.50 |
| rp car ovt | 0.09 | 0.11 | 0 | 0.83 |

Table A.6: Descriptive statistics


A.3 Swissmetro Case

This dataset consists of survey data collected on the trains between St. Gallen and Geneva, Switzerland, during March 1998. The respondents provided information in order to analyze the impact of the modal innovation in transportation, represented by the Swissmetro, a revolutionary mag-lev underground system, against the usual transport modes represented by car and train.

Context

Innovation in the market for intercity passenger transportation is a difficult enterprise, as the existing modes (private car, coach, rail, as well as regional and long-distance air services) continue to innovate in their own right by offering new combinations of speeds, services, prices and technologies. Consider, for example, high-speed rail links between the major centers or direct regional jet services between smaller countries. The Swissmetro SA in Geneva is promoting such an innovation: a mag-lev underground system operating at speeds up to 500 km/h in partial vacuum, connecting the major Swiss conurbations, in particular along the Mittelland corridor (St. Gallen, Zurich, Bern, Lausanne and Geneva).

Data Collection

The Swissmetro is a true innovation. It is therefore not appropriate to base forecasts of its impact on existing revealed preference (RP) data. To assess the impact, it is necessary to obtain data from surveys of hypothetical markets/situations which include the innovation. Survey data were collected on rail-based travel by interviewing 470 respondents. Due to data problems, only 441 are used here. Nine stated choice situations were generated for each of the 441 respondents, offering three alternatives: rail, Swissmetro and car (only for car owners).

A similar method for relevant car trips with a household or telephone survey was deemed impractical. The sample was therefore constructed using license plate observations on the motorways in the corridor by means of video recorders. A total of 10529 relevant license plates were recorded during September 1997. The central Swiss car license agency had agreed to send a survey pack to up to 10000 owners of these cars. By April 1998, 9658 letters had been mailed, of which 1758 were returned. A total of 1070 persons filled in the survey completely and were willing to participate in the second SP survey, which was generated using the same approach as for the rail interviews. From the license-plate based survey, 750 usable SP surveys were returned.

Variables and Descriptive Statistics

The variables of the dataset are described in Tables A.7 and A.8, and the descriptive statistics are summarized in Table A.9. A more detailed description of the data set as well as the data collection procedure is given in Bierlaire et al. (2001).


| Variable | Description |
|---|---|
| GROUP | Different groups in the population |
| SURVEY | Survey performed in train (0) or car (1) |
| SP | It is fixed to 1 (stated preference survey) |
| ID | Respondent identifier |
| PURPOSE | Travel purpose. 1: Commuter, 2: Shopping, 3: Business, 4: Leisure, 5: Return from work, 6: Return from shopping, 7: Return from business, 8: Return from leisure, 9: other |
| FIRST | First class traveler (0 = no, 1 = yes) |
| TICKET | Travel ticket. 0: None, 1: Two way with half price card, 2: One way with half price card, 3: Two way normal price, 4: One way normal price, 5: Half day, 6: Annual season ticket, 7: Annual season ticket Junior or Senior, 8: Free travel after 7pm card, 9: Group ticket, 10: Other |
| WHO | Who pays (0: unknown, 1: self, 2: employer, 3: half-half) |
| LUGGAGE | 0: none, 1: one piece, 3: several pieces |
| AGE | It captures the age class of individuals. The age-class coding scheme is of the type: 1: age≤24, 2: 24<age≤39, 3: 39<age≤54, 4: 54<age≤65, 5: 65<age, 6: not known |
| MALE | Traveler's gender. 0: female, 1: male |
| INCOME | Traveler's income per year [thousand CHF]. 0 or 1: under 50, 2: between 50 and 100, 3: over 100, 4: unknown |
| GA | Variable capturing the effect of the Swiss annual season ticket for the rail system and most local public transport. It is 1 if the individual owns a GA, zero otherwise. |
| ORIGIN | Travel origin (a number corresponding to a Canton, see Table A.10) |

Table A.7: Description of variables
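The age-class coding in Table A.7 is a set of half-open intervals, so boundary ages are easy to misclassify when the coding is applied to other data. A minimal sketch of the mapping follows; the function name `age_class` and the use of `None` for a missing age are assumptions made for illustration.

```python
def age_class(age):
    """Map an age in years to the AGE class of Table A.7.

    A missing age (None) is mapped to class 6, "not known".
    """
    if age is None:
        return 6
    if age <= 24:
        return 1
    if age <= 39:
        return 2
    if age <= 54:
        return 3
    if age <= 65:
        return 4
    return 5


# Boundary ages fall in the lower class because the upper bounds are inclusive.
print([age_class(a) for a in (24, 25, 39, 40, 65, 66, None)])
```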


| Variable | Description |
|---|---|
| DEST | Travel destination (a number corresponding to a Canton, see Table A.10) |
| TRAIN_AV | Train availability dummy |
| CAR_AV | Car availability dummy |
| SM_AV | SM availability dummy |
| TRAIN_TT | Train travel time [minutes]. Travel times are door-to-door making assumptions about car-based distances (1.25*crow-flight distance) |
| TRAIN_CO | Train cost [CHF]. If the traveler has a GA, this cost equals the cost of the annual ticket. |
| TRAIN_HE | Train headway [minutes]. Example: If there are two trains per hour, the value of TRAIN_HE is 30. |
| SM_TT | SM travel time [minutes] considering the future Swissmetro speed of 500 km/h |
| SM_CO | SM cost [CHF] calculated at the current relevant rail fare, without considering GA, multiplied by a fixed factor (1.2) to reflect the higher speed. |
| SM_HE | SM headway [minutes]. Example: If there are two Swissmetros per hour, the value of SM_HE is 30. |
| SM_SEATS | Seats configuration in the Swissmetro (dummy). Airline seats (1) or not (0). |
| CAR_TT | Car travel time [minutes] |
| CAR_CO | Car cost [CHF] considering a fixed average cost per kilometer (1.20 CHF/km) |
| CHOICE | Choice indicator. 0: unknown, 1: Train, 2: SM, 3: Car |

Table A.8: Description of variables
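Three of the attributes in Table A.8 are defined by simple rules: headway from service frequency, Swissmetro cost from the rail fare, and car cost from distance. The sketch below restates those rules as functions. The function names and the example fare and distance are hypothetical; the factors 1.2 and 1.20 CHF/km are the ones quoted in the table.

```python
def headway_minutes(departures_per_hour):
    """Headway in minutes for a service with the given number of departures per hour."""
    return 60.0 / departures_per_hour


def sm_cost(rail_fare_chf, factor=1.2):
    """Swissmetro cost: the relevant rail fare (ignoring GA) times a fixed factor."""
    return factor * rail_fare_chf


def car_cost(distance_km, cost_per_km=1.20):
    """Car cost using a fixed average cost per kilometer."""
    return cost_per_km * distance_km


print(headway_minutes(2))   # two departures per hour, as in the table's example
print(sm_cost(50.0))        # hypothetical rail fare of 50 CHF
print(car_cost(100.0))      # hypothetical distance of 100 km
```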


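| Variable | Min | Max | Mean | St. Dev. |
|---|---|---|---|---|
| GROUP | 2 | 3 | 2.63 | 0.48 |
| SURVEY | 0 | 1 | 0.63 | 0.48 |
| SP | 1 | 1 | 1.00 | 0.00 |
| ID | 1 | 1192 | 596.50 | 344.12 |
| PURPOSE | 1 | 9 | 2.91 | 1.15 |
| FIRST | 0 | 1 | 0.47 | 0.50 |
| TICKET | 1 | 10 | 2.89 | 2.19 |
| WHO | 0 | 3 | 1.49 | 0.71 |
| LUGGAGE | 0 | 3 | 0.68 | 0.60 |
| AGE | 1 | 6 | 2.90 | 1.03 |
| MALE | 0 | 1 | 0.75 | 0.43 |
| INCOME | 0 | 4 | 2.33 | 0.94 |
| GA | 0 | 1 | 0.14 | 0.35 |
| ORIGIN | 1 | 25 | 13.32 | 10.14 |
| DEST | 1 | 26 | 10.80 | 9.75 |
| TRAIN_AV | 1 | 1 | 1.00 | 0.00 |
| CAR_AV | 0 | 1 | 0.84 | 0.36 |
| SM_AV | 1 | 1 | 1.00 | 0.00 |
| TRAIN_TT | 31 | 1049 | 166.63 | 77.35 |
| TRAIN_CO | 4 | 5040 | 514.34 | 1088.93 |
| TRAIN_HE | 30 | 120 | 70.10 | 37.43 |
| SM_TT | 8 | 796 | 87.47 | 53.55 |
| SM_CO | 6 | 6720 | 670.34 | 1441.59 |
| SM_HE | 10 | 30 | 20.02 | 8.16 |
| SM_SEATS | 0 | 1 | 0.12 | 0.32 |
| CAR_TT | 0 | 1560 | 123.80 | 88.71 |
| CAR_CO | 0 | 520 | 78.74 | 55.26 |
| CHOICE | 1 | 3 | 2.15 | 0.63 |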

Table A.9: Descriptive statistics


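| Number | Canton |
|---|---|
| 1 | ZH |
| 2 | BE |
| 3 | LU |
| 4 | UR |
| 5 | SZ |
| 6 | OW |
| 7 | NW |
| 8 | GL |
| 9 | ZG |
| 10 | FR |
| 11 | SO |
| 12 | BS |
| 13 | BL |
| 14 | Schaffhausen |
| 15 | AR |
| 16 | AI |
| 17 | SG |
| 18 | GR |
| 19 | AG |
| 20 | TH |
| 21 | TI |
| 22 | VD |
| 23 | VS |
| 24 | NE |
| 25 | GE |
| 26 | JU |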

Table A.10: Coding of Cantons


A.4 Choice of Residential Telephone Services Case

Context

Local telephone service typically involves the choice between flat (i.e., a fixed monthly charge for unlimited calls within a specified geographical area) and measured (i.e., a reduced fixed monthly charge for a limited number of calls and additional usage charges for additional calls) services. Various flat rate services differ by the size of the geographical area within which calling is provided at no extra charge, the monthly charge being higher for larger areas. Measured services differ with respect to the threshold number (or dollar value) of calls beyond which the customer is charged. The availability of each service may depend on the geographical location within the service area.

In developing a model of the residential demand for local telephone service, it is necessary to explicitly account for the inter-relationship between class of service choice and usage patterns. For example, expected usage patterns will influence the household's choice of service option since households with high usage levels typically could minimize their monthly bill for local telephone service by choosing some sort of flat rate service, while households with relatively low usage would be better off with a measured service. Given that a household has chosen a particular service option, usage patterns would be dependent to a certain extent upon the service option that is chosen since it determines the marginal price of calls. To accommodate these interrelationships, the model representing the household's choice of calling patterns and service options needs to include:

1. choice of the service option, which is modeled conditional upon the calling portfolio chosen by the household;

2. choice of the calling portfolio or the usage pattern as represented by the number and duration of calls by time of day and calling band.

This case study deals only with the first choice.


Data Collection

A household survey was conducted in 1984 for a telephone company among 434 households in Pennsylvania. The dataset involves choices among five calling plans and consists of various attributes and socio-economic characteristics. It was originally used to develop a model system to predict residential telephone demand (Train et al., 1987).

Variables and Descriptive Statistics

In the current application, five types of services are involved: two measured options and three flat options. The availability of these service options varies depending upon geographic location. Table A.11 below lists the five service alternatives and their availability within the different service areas. Names and definitions of the variables are shown in Table A.12. Some descriptive statistics of the dataset are summarized in Table A.13.

Complications caused by very few respondents choosing alternative 4: If you examine the dataset, you will see that only 3 of the respondents chose alternative 4 (extended area flat service). This implies that it is not possible to estimate numerous alternative specific coefficients for alternative 4. The intuition is that the dataset does not provide enough information on why people chose or did not choose alternative 4. If you try to estimate too many alternative specific coefficients for alternative 4, you get a "Singularity in the Hessian" error, and in order to estimate the model you have to reduce the number of coefficients specific to alternative 4. A practical solution to this problem is to use an "enriched sample", although such a sample is not available here. It is, however, not recommended to omit the observations for which the chosen alternative is 4 or to combine alternative 4 with a different alternative.
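A quick tabulation of the chosen alternatives before specifying a model makes this kind of problem visible early. The sketch below flags alternatives chosen fewer times than a threshold. The helper name `sparse_alternatives`, the threshold of 30 and the example split are assumptions for illustration: only the total of 434 households and the 3 choices of alternative 4 come from the text, the other counts are invented.

```python
from collections import Counter


def sparse_alternatives(choices, alternatives, min_count=30):
    """Return {alternative: count} for alternatives chosen fewer than min_count times.

    min_count is an arbitrary rule of thumb, not a statistical test.
    """
    counts = Counter(choices)
    return {
        alt: counts.get(alt, 0)
        for alt in alternatives
        if counts.get(alt, 0) < min_count
    }


# Illustrative split of 434 households; only the 3 choices of alternative 4
# match the text, the remaining counts are made up.
choices = [1] * 70 + [2] * 120 + [3] * 180 + [4] * 3 + [5] * 61

print(sparse_alternatives(choices, alternatives=range(1, 6)))
```

An alternative flagged this way can usually support an alternative specific constant, but few additional alternative specific coefficients.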


| Service option | Description | Available in metro, suburban, some perimeter areas | Available in other perimeter areas | Available in non-metro areas |
|---|---|---|---|---|
| 1. Budget measured | No fixed monthly charge; usage charges apply to each call made. | yes | yes | yes |
| 2. Standard measured | A fixed monthly charge covers up to a specified dollar amount (greater than the fixed charge) of local calling, after which usage charges apply to each call made. | yes | yes | yes |
| 3. Local flat | A greater monthly charge that may depend upon residential location; unlimited free calling within local calling area; usage charges apply to calls made outside local calling area. | yes | yes | yes |
| 4. Extended area flat | A further increase in the fixed monthly charge to permit unlimited free calling within an extended area. | no | yes | no |
| 5. Metro area flat | The greatest fixed monthly charge that permits unlimited free calling within the entire metropolitan area. | yes | yes | no |

Table A.11: Service options and their availability


| Name | Description |
|---|---|
| age0 | number of household members under age 6 |
| age1 | number of household members age 6-12 |
| age2 | number of household members age 13-19 |
| age3 | number of household members age 20-29 |
| age4 | number of household members age 30-39 |
| age5 | number of household members age 40-54 |
| age6 | number of household members age 55-64 |
| age7 | number of household members 65 and older |
| area | location of household residence. 1=metro, 2=suburban, 3=perimeter with extended, 4=perimeter without extended, 5=non-metro |
| avail1, avail2, avail3, avail4, avail5 | binary indicators of availability of each option. availX=0 if alternative X is not available to the household, availX=1 if alternative X is available to the household |
| choice | chosen alternative (dependent variable). 1=budget measured, 2=standard measured, 3=local flat, 4=extended flat, 5=metro flat |
| cost1, cost2, cost3, cost4, cost5 | costX = monthly cost (in $) of alternative X |
| employ | number of household members employed |
| inc | annual household income. 1=under $10,000, 2=$10,000-20,000, 3=$20,000-30,000, 4=$30,000-40,000, 5=over $40,000 |
| ones | ones = 1 for all observations |
| status | marital status. 1=single, 2=married, 3=widowed, 4=divorced, 5=other |
| users | number of phone users in household |

Table A.12: Description of variables


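| Name | Mean | Max | Min | St. Dev. | Range |
|---|---|---|---|---|---|
| age0 | 0.21 | 4 | 0 | 0.53 | 4 |
| age1 | 0.23 | 3 | 0 | 0.58 | 3 |
| age2 | 0.24 | 4 | 0 | 0.67 | 4 |
| age3 | 0.41 | 3 | 0 | 0.71 | 3 |
| age4 | 0.44 | 2 | 0 | 0.73 | 2 |
| age5 | 0.36 | 2 | 0 | 0.67 | 2 |
| age6 | 0.31 | 3 | 0 | 0.61 | 3 |
| age7 | 0.38 | 2 | 0 | 0.65 | 2 |
| area | 2.93 | 5 | 1 | 1.65 | 4 |
| avail1 | 1.00 | 1 | 1 | 0.00 | 0 |
| avail2 | 1.00 | 1 | 1 | 0.00 | 0 |
| avail3 | 1.00 | 1 | 1 | 0.00 | 0 |
| avail4 | 0.03 | 1 | 0 | 0.17 | 1 |
| avail5 | 0.65 | 1 | 0 | 0.48 | 1 |
| choice | 2.65 | 5 | 1 | 1.17 | 4 |
| cost1 | 11.73 | 433.5 | 3.28 | 24.13 | 430.22 |
| cost2 | 11.49 | 432.8 | 5.78 | 23.90 | 427.02 |
| cost3 | 14.82 | 435.5 | 7.03 | 23.56 | 428.47 |
| cost4 | 62.19 | 433.03 | 10.48 | 117.88 | 422.55 |
| cost5 | 27.48 | 38.28 | 23.28 | 4.17 | 15 |
| employ | 1.07 | 3 | 0 | 0.89 | 3 |
| inc | 2.53 | 5 | 1 | 1.28 | 4 |
| ones | 1.00 | 1 | 1 | 0.00 | 0 |
| status | 2.22 | 5 | 1 | 0.91 | 4 |
| users | 2.30 | 6 | 1 | 1.28 | 5 |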

Table A.13: Descriptive Statistics


A.5 Airline Itinerary Case

These data come from an Internet choice survey conducted by the Boeing Company in the Fall of 2004. Boeing was interested in understanding the sensitivity that air passengers have toward the attributes of an airline itinerary, such as fare, travel time, transfers, legroom, and aircraft. The survey was administered to a sample of the customers of an Internet airline booking service. The Internet service takes a specific user request for travel in a city pair and interrogates the web sites of airlines that provide service in that market, returning to the user a compiled list of available itineraries. While that interrogation was taking place, randomly selected customers were recruited to be surveyed.

A typical page of the survey instrument is shown in Figure A.1. The respondent was offered three choices based on the origin-destination market request that the respondent entered into the itinerary search engine. The first alternative is always a non-stop flight, the second always a flight with 1 stop on the same airline, and the third always a flight with 1 stop and a change of airline. The respondent was asked to rank the available choices and was also given the option to decline all of the stated options. Demographic data collected included age, gender, income, occupation, and education. Situational variables that were identified included: a) the desired departure time; b) trip purpose; c) who is paying for the trip; and d) the number in the travel party. All trips were for origin-destination city pairs in the United States.

There are 1633 respondents, each providing 1 SP response. Descriptions of the available variables are reported in Tables A.14 to A.17 and some descriptive statistics are given in Tables A.18 and A.19.


Figure A.1: Example of Survey Instrument


| Variable | Description |
|---|---|
| SubjectId | Unique identifier for each respondent. |
| q17_Gender | 1 if male, 2 if female, 99 or -1 if missing. |
| q15_Age | Age (1 = Less than 18 years, 2 = 18-24 years, 3 = 25-34 years, 3.5 = 25-44 years, 4 = 35-44 years, 5 = 45-54 years, 6 = 55-64 years, 7 = 65-74 years, 8 = 75 years or older, 99 or -1 if missing) |
| q19_Occupation | Occupation (01 = Executive and Managerial, 02 = Professional, 03 = Technicians and related support, 04 = Sales, 05 = Administrative support, 06 = Services, 07 = Precision production, craft, repair, 08 = Machine operators, assemblers, inspectors, 09 = Transportation and material moving, 10 = Handlers, cleaners, helpers, 11 = Farming, forestry, and fishing, 12 = Armed forces, 99 or -1 if missing) |
| q16_Income | Annual income in 100$; -1 or 99 if income information is missing |
| q20_Education | Education (01 = Less than High School Diploma, 02 = High School Graduate, 03 = Some college, No Degree, 04 = Associate Degree - Occupational, 05 = Associate Degree - Academic, 06 = Bachelors Degree, 07 = Masters Degree, 08 = Professional Degree, 09 = Doctorate Degree, 99 or -1 if missing) |
| q11_DepartureOrArrivalIsImportant | Importance of punctuality of departure or arrival (1 = departure is important; 2 = arrival is important; otherwise, not important) |

Table A.14: Description of Respondent Specific Variables

| Variable | Description |
|---|---|
| BestAlternative_X | The chosen alternative is X |

Table A.15: Description of Survey Responses


| Variable | Description |
|---|---|
| q02_TripPurpose | Trip purpose (1=business, 2=leisure, 3=attending conference/seminar/training, 4=both business and leisure, 0=trip purpose missing) |
| q03_WhoPays | 1 if the traveler is paying for the trip, 2 if it is his employer, 3 if it is a third party, 0 if missing |
| q12_IdealDepTime | Respondent's ideal departure time (hours after midnight), -1 indicates a missing value |
| q14_PartySize | Number of persons traveling, -1 and 99 indicate missing values |
| OriginGMT | Origin city time zone (minutes from GMT (Greenwich Mean Time)) |
| DestinationGMT | Destination city time zone (minutes from GMT) |
| Direction | Direction of itinerary (1=East to West, 2=West to East, 3=North-South, 0=missing) |

Table A.16: Description of Trip Specific Attributes


| Variable | Description |
|---|---|
| DepartureTimeHours_X | Option X: Departure time, local (hours after midnight) |
| ArrivalTimeHours_X | Option X: Arrival time, local (hours after midnight) |
| FlyingTimeHours_X | Option X: Total time in air (hours) |
| TripTimeHours_X | Option X: Total trip time (hours) |
| Legroom_X | Option X: Legroom, 1 = 2 inches less than typical, 2 = typical, 3 = 2 inches more than typical, 4 = 4 inches more than typical |
| AirlineFirstFlight_X | Option X: Airline for first leg (only known to arbitrary airline number for proprietary reasons) |
| AirlineSecondFlight_X | Option X: Airline for second leg (if there exists a second leg) (only known to arbitrary airline number for proprietary reasons) |
| AirplaneFirstFlight_X | Option X: Airplane for first leg (only known to arbitrary airplane number for proprietary reasons) |
| AirplaneSecondFlight_X | Option X: Airplane for second leg (if there exists a second leg) (only known to arbitrary airplane number for proprietary reasons) |
| Fare_X | Option X: Fare ($) |

Table A.17: Description of Alternative Specific Attributes, where X Corresponds to Choice Option (1), (2) and (3)
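Departure and arrival times are given in local time, so differencing them does not give elapsed time when origin and destination lie in different time zones. The sketch below shows one way to reconcile them with the GMT offsets. It assumes that OriginGMT and DestinationGMT are positive minutes behind GMT (300 for US Eastern, 480 for US Pacific); this sign convention is an assumption, although it reproduces the mean trip times in Tables A.18 and A.19 from the mean departure and arrival times. The function name is hypothetical.

```python
def elapsed_trip_hours(departure_local, arrival_local, origin_gmt_min, destination_gmt_min):
    """Elapsed trip time in hours from local clock times and GMT offsets.

    Offsets are assumed to be minutes behind GMT, so a westbound trip
    (destination offset larger than origin offset) gains clock time.
    """
    time_zone_shift = (destination_gmt_min - origin_gmt_min) / 60.0
    return arrival_local - departure_local + time_zone_shift


# Hypothetical itinerary: leave an Eastern-time city at 8:00, arrive in a
# Pacific-time city at 11:00 local time.
print(elapsed_trip_hours(8.0, 11.0, 300, 480))

# Consistency check on the sample means for option 1 (Table A.18).
print(round(elapsed_trip_hours(11.72, 15.21, 382.18, 397.34), 2))
```

A derived variable of this kind is a useful sanity check on TripTimeHours_X before the time attributes enter a utility function.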


| Variable | Average | St. Dev. | Min | Max |
|---|---|---|---|---|
| SubjectId | 1807.50 | 1043.41 | 1.00 | 3613.00 |
| q17_Gender | 1.46 | 0.50 | 1.00 | 2.00 |
| q15_Age | 3.95 | 1.15 | 1.00 | 8.00 |
| q19_Occupation | 2.54 | 1.90 | 1.00 | 12.00 |
| q16_Income | 8.09 | 3.53 | 1.00 | 14.00 |
| q20_Education | 5.88 | 1.71 | 1.00 | 9.00 |
| q02_TripPurpose | 2.04 | 0.76 | 1.00 | 4.00 |
| q03_WhoPays | 1.20 | 0.46 | 1.00 | 3.00 |
| q14_PartySize | 1.70 | 0.99 | 1.00 | 5.00 |
| OriginGMT | 382.18 | 82.08 | 300.00 | 480.00 |
| DestinationGMT | 397.34 | 82.87 | 300.00 | 480.00 |
| Direction | 1.59 | 0.49 | 1.00 | 2.00 |
| BestAlternative_1 | 0.69 | 0.46 | 0.00 | 1.00 |
| BestAlternative_2 | 0.16 | 0.37 | 0.00 | 1.00 |
| DepartureTimeHours_1 | 11.72 | 3.34 | 6.00 | 18.00 |
| ArrivalTimeHours_1 | 15.21 | 3.35 | 7.67 | 21.63 |
| FlyingTimeHours_1 | 3.74 | 1.59 | 0.67 | 6.35 |
| TripTimeHours_1 | 3.74 | 1.59 | 0.67 | 6.35 |
| Legroom_1 | 2.46 | 1.12 | 1.00 | 4.00 |
| AirlineFirstFlight_1 | 4.61 | 2.56 | 1.00 | 11.00 |
| AirlineSecondFlight_1 | 0.00 | 0.00 | 0.00 | 0.00 |
| AirplaneFirstFlight_1 | 4.52 | 2.30 | 1.00 | 8.00 |
| AirplaneSecondFlight_1 | 0.00 | 0.00 | 0.00 | 0.00 |
| Fare_1 | 405.66 | 199.87 | 80.00 | 1330.00 |

Table A.18: Descriptive Statistics of Variables


| Variable | Average | St. Dev. | Min | Max |
|---|---|---|---|---|
| DepartureTimeHours_2 | 11.67 | 3.35 | 6.00 | 18.00 |
| ArrivalTimeHours_2 | 16.92 | 3.36 | 9.17 | 24.10 |
| FlyingTimeHours_2 | 4.24 | 1.59 | 1.17 | 6.85 |
| TripTimeHours_2 | 5.50 | 1.68 | 1.83 | 8.85 |
| Legroom_2 | 2.48 | 1.13 | 1.00 | 4.00 |
| AirlineFirstFlight_2 | 4.68 | 2.65 | 1.00 | 11.00 |
| AirlineSecondFlight_2 | 0.00 | 0.00 | 0.00 | 0.00 |
| AirplaneFirstFlight_2 | 4.51 | 2.29 | 1.00 | 8.00 |
| AirplaneSecondFlight_2 | 0.00 | 0.00 | 0.00 | 0.00 |
| Fare_2 | 407.07 | 200.96 | 80.00 | 1390.00 |
| DepartureTimeHours_3 | 11.66 | 3.34 | 6.00 | 18.00 |
| ArrivalTimeHours_3 | 16.89 | 3.41 | 9.25 | 24.03 |
| FlyingTimeHours_3 | 4.24 | 1.59 | 1.17 | 6.85 |
| TripTimeHours_3 | 5.48 | 1.67 | 1.92 | 8.85 |
| Legroom_3 | 2.53 | 1.13 | 1.00 | 4.00 |
| AirlineFirstFlight_3 | 4.65 | 2.59 | 1.00 | 11.00 |
| AirlineSecondFlight_3 | 4.65 | 2.65 | 1.00 | 11.00 |
| AirplaneFirstFlight_3 | 4.50 | 2.31 | 1.00 | 8.00 |
| AirplaneSecondFlight_3 | 4.50 | 2.28 | 1.00 | 8.00 |
| Fare_3 | 405.20 | 197.68 | 80.00 | 1275.00 |

Table A.19: Descriptive Statistics of Variables


A.6 Facial Expressions Recognition Case

"... the face is the most extraordinary communicator, capable of accurately signaling emotion in a bare blink of a second, capable of concealing emotion equally well..."

Deborah Blum

These data come from an ongoing Internet survey called the EPFL Facial Expressions Evaluation Survey, available at: http://lts5www.epfl.ch/face

The goal is to collect a dataset with observations from a heterogeneous group of respondents, making it possible to investigate which human factors play a role in the perception of human expressions. A further goal is to understand which facial parts are important and what their impact is on the expression recognition task performed by different people.

During the survey each respondent is asked to associate 23 images of facial expressions with one of seven proposed alternatives:

1. happiness;

2. surprise;

3. fear;

4. disgust;

5. sadness;

6. anger; or

7. neutral.

At the beginning of the survey the respondent is also asked to provide the socio-economic characteristics described in Table A.20.

The images used in the survey come from the Cohn-Kanade database (Kanade et al., 2000). Examples are shown in Figure A.4. This database consists of expression sequences of persons (for clarity called subjects), starting from a neutral expression and ending most of the time in the peak of the facial expression. The subjects are university students enrolled in introductory psychology classes. They ranged in age from 18 to 30 years. Subjects were instructed by an experimenter to perform a series of 23 facial displays. Six of the displays were based on descriptions of prototypic emotions: happiness, anger, fear, disgust, sadness and surprise. There are 104 subjects in the database but only 10 of them (8 women and 2 men) gave their consent for publication. The subset of the Cohn-Kanade database used in this survey consists of the 1274 images of these 10 subjects.

Figure A.2: List of the AUs related to the 6 primary expressions

In the field of automated facial expression recognition systems, the Facial Action Coding System (FACS) has become the leading standard for measuring facial expressions. FACS is a system originally developed by Paul Ekman and Wallace Friesen (see Ekman and Friesen, 1978). It is a human-observer based system designed to detect subtle changes in facial features. It defines expressions as a combination of a subset of the 46 Action Units (AUs), which correspond to contraction or relaxation of one or several muscles. Figure A.2 shows the subset of AUs that are related to the six prototypic expressions. In this case study we consider the measures related to mouth and eyes; see Table A.21 and Figure A.3 for descriptions of the attributes. Some statistics are reported in Table A.22.

In the following sections we provide an example specification of a logit model.


| Variable | Description |
|---|---|
| UserID | Unique identifier for each participant. |
| UserLocation | Location for the current survey (01 = Home, 02 = Work, 03 = Other) |
| UserGender | 1 if male, 0 otherwise |
| UserBirthDate | Age in years, 0 if no information |
| UserOccupation | Occupation (00 = None, 01 = Medical, 02 = Educational, 03 = Management, 04 = Scientific, 05 = Engineering, 06 = Technical, 07 = Rural, 08 = Other) |
| UserFormation | Education (04 = High School, 05 = University, 06 = PhD, 07 = Other) |
| UserEthnic | Ethnic (00 = None, 01 = White, 02 = Black, 03 = Asian, 04 = Mixed White-Black, 05 = Mixed White-Asian, 06 = Mixed Asian-Black, 07 = Other) |
| UserRegion | Continent (00 = None, 01 = Africa, 02 = Antarctica, 03 = Asia, 04 = Australia, 05 = Europe, 06 = North America, 07 = South America) |
| UserScienceKW | Participant scientific knowledge (00 = None, 02 = Behavioral Science, 03 = Social Science, 04 = Computer Science, 05 = Cognitive Science, 06 = Other) |
| UserLanguage | Survey language (01 = French, 02 = English, 03 = Italian) |

Table A.20: Description of Participant Socio-Economic Variables


Figure A.3: Facial measures: width and height of left and right eye and mouth

| Variable | Description |
|---|---|
| Choice | Choice indicator (1 = happiness, 2 = surprise, 3 = fear, 4 = disgust, 5 = sadness, 6 = anger, 7 = neutral) |
| mouth_w | mouth width, normalized pixel measure |
| mouth_h | mouth height, normalized pixel measure |
| leye_w | left eye width, normalized pixel measure |
| leye_h | left eye height, normalized pixel measure |
| reye_w | right eye width, normalized pixel measure |
| reye_h | right eye height, normalized pixel measure |

Table A.21: Description of Variables


Figure A.4: Examples of images in the database


| Variable | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|
| Choice | 3.84 | 2.07 | 1.00 | 7.00 |
| UserID | 902.74 | 484.29 | 59.00 | 1713.00 |
| UserLocation | 1.50 | 0.58 | 1.00 | 3.00 |
| UserGender | 0.58 | 0.49 | 0.00 | 1.00 |
| UserBirthDate | 1977.23 | 9.12 | 1935 | 2001 |
| UserJob | 3.89 | 2.52 | 0.00 | 8.00 |
| UserEthnic | 1.14 | 1.08 | 0.00 | 7.00 |
| UserRegion | 4.66 | 1.40 | 0.00 | 7.00 |
| UserScienceKW | 3.48 | 2.26 | 0.00 | 6.00 |
| UserLanguage | 1.64 | 0.75 | 1.00 | 3.00 |
| mouth_w | 0.14 | 0.02 | 0.10 | 0.20 |
| mouth_h | 0.07 | 0.04 | 0.02 | 0.22 |
| leye_w | 0.07 | 0.01 | 0.06 | 0.09 |
| leye_h | 0.03 | 0.01 | 0.02 | 0.05 |
| reye_w | 0.07 | 0.01 | 0.06 | 0.09 |
| reye_h | 0.04 | 0.01 | 0.01 | 0.06 |

Table A.22: Descriptive Statistics of Variables


Example of Model Specification

In this section we describe a logit model specification. The corresponding Biogeme model file is called MNL_exp_fm.mod and the data file is called expressions.dat.

The deterministic utility functions include alternative specific constants for all alternatives except neutral, which has been selected as the reference alternative. We also estimate alternative specific coefficients related to eye height (average of the left and right eye height), mouth width and mouth height. Because these facial measures take the same value for every alternative, their coefficients must be alternative specific and cannot be included in all utilities; here the utility of the neutral alternative is normalized to zero.

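\begin{align*}
V_{\text{neutral}} &= 0 \\
V_{\text{anger}} &= ASC_{A} + \beta_{\text{mouth\_height\_A}}\,\text{mouth\_height} + \beta_{\text{mouth\_width\_A}}\,\text{mouth\_width} + \beta_{\text{eyes\_height\_A}}\,\text{eyes\_height} \\
V_{\text{disgust}} &= ASC_{D} + \beta_{\text{mouth\_height\_D}}\,\text{mouth\_height} + \beta_{\text{mouth\_width\_D}}\,\text{mouth\_width} + \beta_{\text{eyes\_height\_D}}\,\text{eyes\_height} \\
V_{\text{fear}} &= ASC_{F} + \beta_{\text{mouth\_height\_F}}\,\text{mouth\_height} + \beta_{\text{mouth\_width\_F}}\,\text{mouth\_width} + \beta_{\text{eyes\_height\_F}}\,\text{eyes\_height} \\
V_{\text{happiness}} &= ASC_{H} + \beta_{\text{mouth\_height\_H}}\,\text{mouth\_height} + \beta_{\text{mouth\_width\_H}}\,\text{mouth\_width} + \beta_{\text{eyes\_height\_H}}\,\text{eyes\_height} \\
V_{\text{sadness}} &= ASC_{SA} + \beta_{\text{mouth\_height\_SA}}\,\text{mouth\_height} + \beta_{\text{mouth\_width\_SA}}\,\text{mouth\_width} + \beta_{\text{eyes\_height\_SA}}\,\text{eyes\_height} \\
V_{\text{surprise}} &= ASC_{SU} + \beta_{\text{mouth\_height\_SU}}\,\text{mouth\_height} + \beta_{\text{mouth\_width\_SU}}\,\text{mouth\_width} + \beta_{\text{eyes\_height\_SU}}\,\text{eyes\_height}
\end{align*}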

The estimation results are shown in Table A.23. All the explanatory variables have their expected signs. For example, $\beta_{\text{eyes\_height\_A}}$ and $\beta_{\text{mouth\_height\_A}}$ have negative values, meaning that the utility of the anger alternative increases if the heights of the mouth and eyes are smaller than in the neutral expression. The coefficients related to the surprise alternative, $\beta_{\text{mouth\_height\_SU}}$ and $\beta_{\text{mouth\_width\_SU}}$, show as expected that the utility of this alternative increases when the mouth is taller and narrower than in the neutral expression. Images illustrating the interpretation of these coefficients are shown in Figure A.5.

Figure A.5: Estimated parameters interpretation: first row Anger, second row Surprise
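The sign interpretation can be checked numerically by turning the utilities into logit choice probabilities. The sketch below is an illustration only: it copies the point estimates from Table A.23 and evaluates the model at the sample means of Table A.22, then at a larger mouth height. The second mouth height (0.12) is a hypothetical value, and the names `COEF` and `probabilities` are not taken from the Biogeme model file.

```python
import math

# Point estimates from Table A.23:
# (ASC, beta_mouth_height, beta_mouth_width, beta_eyes_height)
COEF = {
    "neutral":   (0.0, 0.0, 0.0, 0.0),  # reference alternative, V = 0
    "anger":     (3.96, -35.5, -5.14, -58.3),
    "disgust":   (8.23, 10.3, -24.8, -146.0),
    "fear":      (-7.36, 57.0, 29.1, -25.8),
    "happiness": (-16.9, 17.2, 119.0, -53.6),
    "sadness":   (3.88, -46.6, -13.6, -3.93),
    "surprise":  (-0.895, 66.7, -18.9, -37.9),
}


def probabilities(mouth_w, mouth_h, leye_h, reye_h):
    """Logit probabilities of the seven expression labels for one image."""
    eyes_h = 0.5 * (leye_h + reye_h)  # average of left and right eye height
    v = {
        name: asc + b_mh * mouth_h + b_mw * mouth_w + b_eh * eyes_h
        for name, (asc, b_mh, b_mw, b_eh) in COEF.items()
    }
    v_max = max(v.values())  # subtract the maximum for numerical stability
    exp_v = {name: math.exp(value - v_max) for name, value in v.items()}
    total = sum(exp_v.values())
    return {name: e / total for name, e in exp_v.items()}


base = probabilities(mouth_w=0.14, mouth_h=0.07, leye_h=0.03, reye_h=0.04)
open_mouth = probabilities(mouth_w=0.14, mouth_h=0.12, leye_h=0.03, reye_h=0.04)

for name in COEF:
    print(f"{name:10s} {base[name]:.3f} -> {open_mouth[name]:.3f}")
```

Since the surprise alternative has the largest mouth height coefficient, its probability necessarily rises when only mouth height increases, whatever the starting point.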


Logit model estimation

| Parameter number | Parameter name | Parameter estimate | Robust standard error | Robust t statistic |
|---|---|---|---|---|
| 1 | ASC_A | 3.96 | 0.897 | 4.41 |
| 2 | ASC_D | 8.23 | 0.845 | 9.74 |
| 3 | ASC_F | -7.36 | 0.931 | -7.90 |
| 4 | ASC_H | -16.9 | 1.44 | -11.73 |
| 5 | ASC_SA | 3.88 | 0.801 | 4.84 |
| 6 | ASC_SU | -0.895 | 0.930 | -0.96 |
| 7 | β_eyes_height_A | -58.3 | 11.8 | -4.93 |
| 8 | β_eyes_height_D | -146 | 12.8 | -11.46 |
| 9 | β_eyes_height_F | -25.8 | 12.0 | -2.15 |
| 10 | β_eyes_height_H | -53.6 | 12.3 | -4.35 |
| 11 | β_eyes_height_SA | -3.93 | 11.4 | -0.35 |
| 12 | β_eyes_height_SU | -37.9 | 11.4 | -3.34 |
| 13 | β_mouth_height_A | -35.5 | 9.28 | -3.82 |
| 14 | β_mouth_height_D | 10.3 | 4.03 | 2.56 |
| 15 | β_mouth_height_F | 57.0 | 4.66 | 12.23 |
| 16 | β_mouth_height_H | 17.2 | 8.48 | 2.03 |
| 17 | β_mouth_height_SA | -46.6 | 6.91 | -6.74 |
| 18 | β_mouth_height_SU | 66.7 | 4.97 | 13.41 |
| 19 | β_mouth_width_A | -5.14 | 5.89 | -0.87 |
| 20 | β_mouth_width_D | -24.8 | 4.54 | -5.46 |
| 21 | β_mouth_width_F | 29.1 | 5.52 | 5.27 |
| 22 | β_mouth_width_H | 119 | 9.42 | 12.6 |
| 23 | β_mouth_width_SA | -13.6 | 5.44 | -2.50 |
| 24 | β_mouth_width_SU | -18.9 | 5.77 | -3.27 |

Summary statistics

Number of observations = 2889

$\mathcal{L}(0) = -5621.73$

$\mathcal{L}(\hat{\beta}) = -3721.22$

$\bar{\rho}^2 = 0.334$

Table A.23: Estimation results for the logit model
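The summary statistics can be cross-checked by hand. The sketch below assumes that all seven alternatives are available in every observation, so that the null log-likelihood is $-N \ln 7$, and uses the usual definitions $\rho^2 = 1 - \mathcal{L}(\hat{\beta})/\mathcal{L}(0)$ and $\bar{\rho}^2 = 1 - (\mathcal{L}(\hat{\beta}) - K)/\mathcal{L}(0)$ with $K = 24$ estimated parameters. Under these assumptions the value 0.334 reported in the table coincides with the adjusted statistic.

```python
import math

n_obs = 2889          # number of observations (Table A.23)
n_alternatives = 7    # assumed available in every observation
n_params = 24         # estimated parameters in Table A.23
ll_null = -5621.73    # L(0)
ll_final = -3721.22   # final log-likelihood

# Equal-probability log-likelihood: every alternative has probability 1/7.
ll_null_check = -n_obs * math.log(n_alternatives)

rho2 = 1.0 - ll_final / ll_null
rho2_adjusted = 1.0 - (ll_final - n_params) / ll_null

print(f"-N ln(7)       = {ll_null_check:.2f}")
print(f"rho^2          = {rho2:.3f}")
print(f"adjusted rho^2 = {rho2_adjusted:.3f}")
```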


A.7 Italy Mode Choice Case

Context

Cagliari is the capital of Sardinia, Italy. With a metropolitan area of more than 450,000 inhabitants, it contains one third of the island's population. Although Sardinia's railway system can be described as a suburban service, the last 20 km, overlapping the Cagliari-Assemini corridor, actually pass through an urban area. The area under study, to the north of Cagliari, had 20,490 inhabitants at the time of the study in 1998, and generated about 10,000 trips a day in the corridor of interest. Of these, 75% use car, 20% go by bus, 3% by train and 2% by other modes. More than 80% of the work and other-purpose trips are made by car. To halt the fall in rail demand in the corridor, which has dropped to 3%, the local rail authority decided to upgrade the service into a metropolitan-like commuter train service, increasing not only speed and frequency, but also the number of stations inside this corridor. In order to analyze the impact of a potential new train system, three types of surveys were conducted: a qualitative survey using focus groups to gain a good understanding of the phenomenon, a revealed preference (RP) survey describing current trips, and a stated preference (SP) survey to evaluate the introduction of radical improvements to the existing alternative.

A description of the data as well as the modeling results are reported in Cherchi and Ortuzar (2002). The RP data concern the choice between car, bus and train; the SP data consider the binary choice between a new train service (quicker, more frequent, with a lower fare and more stations than the current one) and the alternative currently chosen by car and bus users.

Data Collection

The data were collected in 1998. First, the RP survey collected data on travellers' actual trips and socio-economic characteristics in the form of a 24-hour travel diary.

Households were randomly selected from the telephone directory and each member of the family over the age of 12 was asked to participate. With a response rate of 83%, a total of 524 responses were obtained, yielding a total of 1840 reported trips. From these trips, only 748 observations actually referred to the corridor of interest. After testing consistency and validity of the data for mode choice modeling (only people with an actual modal choice among Car, Bus and Train were considered), a final sample of 338 observations was left for model estimation.

Then, an SP survey was administered to 90% of the 338 individuals already interviewed. The SP survey was designed as a stated choice experiment between a proposed new train service and the mode currently used, with customized attribute levels (a percentage variation of the values declared in the RP survey). The design for bus users considered four variables at three levels each: travel time, cost, frequency and comfort. The design for car users included three variables at three levels each (travel time, cost and frequency), while comfort was modeled by one two-level variable. Each respondent provided 9 choices, according to the situations defined by an appropriate block design. After validation, a total of 1396 mixed RP/SP observations were obtained.

Variables and Descriptive Statistics

We use a sub-sample of 1152 observations, of which the first 318 correspond to the RP data. The available variables are described in Table A.24 and some descriptive statistics are given in Table A.25.
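Because the RP rows come first, the RP and SP indicator variables can be rebuilt from the row position alone, and their means should match the shares implied by the two counts. The sketch below is a minimal illustration; the helper name `rp_sp_dummies` is hypothetical and the 1-based row numbering is an assumption about how the file is ordered.

```python
N_TOTAL = 1152  # observations in the sub-sample
N_RP = 318      # the first 318 rows are RP observations


def rp_sp_dummies(row_number):
    """Return (rp, sp) dummies for a 1-based row number in the data file."""
    rp = 1 if row_number <= N_RP else 0
    return rp, 1 - rp


rp_share = N_RP / N_TOTAL
sp_share = 1.0 - rp_share
print(f"RP share: {rp_share:.2f}, SP share: {sp_share:.2f}")
print(rp_sp_dummies(318), rp_sp_dummies(319))
```

The resulting shares can be compared with the means of the rp and sp variables reported in Table A.25.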


| Variable | Description |
|---|---|
| ch | Mode choice. 1: Train RP, 2: Car RP, 3: Bus RP, 4: Train SP, 5: Car SP, 6: Bus SP |
| av1 | Train RP availability dummy |
| av2 | Car RP availability dummy |
| av3 | Bus RP availability dummy |
| av4 | Train SP availability dummy |
| av5 | Car SP availability dummy |
| av6 | Bus SP availability dummy |
| tt_t | Train travel time [minutes] |
| tt_c | Car travel time [minutes] |
| tt_b | Bus travel time [minutes] |
| wt_t | Train walking time [minutes] |
| wt_c | Car walking time [minutes] |
| wt_b | Bus walking time [minutes] |
| c_t | Train travel cost [Euro] |
| c_c | Car travel cost [Euro] |
| c_b | Bus travel cost [Euro] |
| tr_t | Number of train transfers |
| tr_b | Number of bus transfers |
| frq_t | Number of trains in an interval of time (60 min) |
| frq_b | Number of buses in an interval of time (60 min) |
| cflw_t | Train low comfort dummy |
| cfav_t | Train average comfort dummy |
| cflw_b | Bus low comfort dummy |
| cfav_b | Bus average comfort dummy |
| car_lic | Number of cars available in each household divided by the number of members with driving licenses |
| id | Actual survey ID |
| rp | RP responses dummy |
| sp | SP responses dummy |

Table A.24: Description of variables


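| Variable | Mean | Std. Dev. | Minimum | Maximum |
|---|---|---|---|---|
| ch | 4.00 | 1.26 | 1.00 | 6.00 |
| av1 | 0.25 | 0.44 | 0.00 | 1.00 |
| av2 | 0.18 | 0.38 | 0.00 | 1.00 |
| av3 | 0.28 | 0.45 | 0.00 | 1.00 |
| av4 | 0.72 | 0.45 | 0.00 | 1.00 |
| av5 | 0.37 | 0.48 | 0.00 | 1.00 |
| av6 | 0.35 | 0.48 | 0.00 | 1.00 |
| tt_t | 19.52 | 7.45 | 0.00 | 55.00 |
| tt_c | 12.32 | 12.49 | 0.00 | 80.00 |
| tt_b | 16.61 | 15.25 | 0.00 | 70.00 |
| wt_t | 20.03 | 8.84 | 0.00 | 62.00 |
| wt_c | 1.34 | 4.31 | 0.00 | 40.00 |
| wt_b | 9.02 | 9.29 | 0.00 | 45.00 |
| c_t | 1.15 | 0.72 | 0.00 | 2.58 |
| c_c | 1.13 | 1.14 | 0.00 | 7.18 |
| c_b | 0.31 | 0.41 | 0.00 | 2.01 |
| tr_t | 0.49 | 0.51 | 0.00 | 2.00 |
| tr_b | 0.24 | 0.44 | 0.00 | 2.00 |
| frq_t | 5.24 | 2.38 | 0.00 | 12.00 |
| frq_b | 3.56 | 2.97 | 0.00 | 12.00 |
| cflw_t | 0.12 | 0.32 | 0.00 | 1.00 |
| cfav_t | 0.53 | 0.50 | 0.00 | 1.00 |
| cflw_b | 0.57 | 0.50 | 0.00 | 1.00 |
| cfav_b | 0.06 | 0.23 | 0.00 | 1.00 |
| car_lic | 0.20 | 0.36 | 0.00 | 1.00 |
| id | 311.54 | 108.53 | 1.00 | 420.00 |
| rp | 0.28 | 0.45 | 0.00 | 1.00 |
| sp | 0.72 | 0.45 | 0.00 | 1.00 |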

Table A.25: Descriptive statistics


Example of RP/SP Logit Model Specification

In this section, we describe logit model specifications for the RP data alone, the SP data alone, and the combined RP/SP data. The corresponding Biogeme model files are called mnl-RP.mod, mnl-SP.mod and mnl-RPSP.mod, respectively. The data file is called italy.dat.

The deterministic utility functions for the RP alternatives include travel times, walking times, travel costs, frequency for train and bus, and number of transfers for train and bus. For car we include the variable car_lic (the ratio between the number of cars in the household and the number of members with a driving license). We also include constants for all alternatives except bus, which is arbitrarily chosen as the reference.

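$$
\begin{aligned}
V_{\mathrm{train\,RP}} &= \mathrm{ASC}_{\mathrm{train\,RP}} + \beta_{tt}\,\mathrm{tt\_t} + \beta_{wt}\,\mathrm{wt\_t} + \beta_{c}\,\mathrm{c\_t} + \beta_{frq}\,\mathrm{frq\_t} + \beta_{tr}\,\mathrm{tr\_t} \\
V_{\mathrm{car\,RP}} &= \mathrm{ASC}_{\mathrm{car\,RP}} + \beta_{tt}\,\mathrm{tt\_c} + \beta_{wt}\,\mathrm{wt\_c} + \beta_{c}\,\mathrm{c\_c} + \beta_{carlic}\,\mathrm{car\_lic} \\
V_{\mathrm{bus\,RP}} &= \beta_{tt}\,\mathrm{tt\_b} + \beta_{wt}\,\mathrm{wt\_b} + \beta_{c}\,\mathrm{c\_b} + \beta_{frq}\,\mathrm{frq\_b} + \beta_{tr}\,\mathrm{tr\_b}
\end{aligned}
$$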

The estimation results are reported in Table A.26. All the coefficients of the explanatory variables have their expected signs. The alternative-specific constants indicate that, when the remaining deterministic utilities are equal, the car and bus alternatives are preferred over the train alternative.
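To make the RP specification concrete, the following minimal sketch evaluates the three RP utility functions with the point estimates of Table A.26 and converts them into logit choice probabilities. It is an illustration only: the dictionary keys assume that the variable names of Table A.24 are written with underscores (for example `tt_t`), the function names are hypothetical, and the traveller described by `trip` is invented rather than taken from italy.dat.

```python
import math

# Point estimates copied from Table A.26 (RP logit model).
BETA = {
    "asc_train": -0.997, "asc_car": -0.276,
    "tt": -0.0390, "wt": -0.102, "c": -0.972,
    "frq": 0.203, "tr": -0.757, "carlic": 5.46,
}


def rp_utilities(x, b=BETA):
    """Deterministic RP utilities; bus is the reference (no constant)."""
    return {
        "train": (b["asc_train"] + b["tt"] * x["tt_t"] + b["wt"] * x["wt_t"]
                  + b["c"] * x["c_t"] + b["frq"] * x["frq_t"]
                  + b["tr"] * x["tr_t"]),
        "car": (b["asc_car"] + b["tt"] * x["tt_c"] + b["wt"] * x["wt_c"]
                + b["c"] * x["c_c"] + b["carlic"] * x["car_lic"]),
        "bus": (b["tt"] * x["tt_b"] + b["wt"] * x["wt_b"] + b["c"] * x["c_b"]
                + b["frq"] * x["frq_b"] + b["tr"] * x["tr_b"]),
    }


def logit_probabilities(v, available):
    """Logit probabilities computed over the available alternatives only."""
    vmax = max(v[a] for a in available)  # subtract the maximum for stability
    expv = {a: math.exp(v[a] - vmax) for a in available}
    total = sum(expv.values())
    return {a: expv[a] / total for a in available}


# Hypothetical traveller (illustrative values, not an observation from the data).
trip = {
    "tt_t": 20, "wt_t": 20, "c_t": 1.2, "frq_t": 5, "tr_t": 0,
    "tt_c": 12, "wt_c": 1, "c_c": 1.1, "car_lic": 0.5,
    "tt_b": 17, "wt_b": 9, "c_b": 0.3, "frq_b": 4, "tr_b": 0,
}
probs = logit_probabilities(rp_utilities(trip), ["train", "car", "bus"])
print(probs)
```

Passing a shorter `available` list mimics the role of the availability dummies av1 to av3: alternatives that are not available are simply left out of the denominator.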

We use the same deterministic utility functions for the SP alternatives, except that we add the variables related to comfort and remove the car license variable. From the estimation results reported in Table A.27 we note that the interpretation of the coefficients of the explanatory variables remains the same and that the new coefficients have intuitive signs.

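$$
\begin{aligned}
V_{\mathrm{train\,SP}} &= \mathrm{ASC}_{\mathrm{train\,SP}} + \beta_{tt}\,\mathrm{tt\_t} + \beta_{wt}\,\mathrm{wt\_t} + \beta_{c}\,\mathrm{c\_t} + \beta_{frq}\,\mathrm{frq\_t} + \beta_{tr}\,\mathrm{tr\_t} + \beta_{cf1}\,\mathrm{cflw\_t} + \beta_{cf2}\,\mathrm{cfav\_t} \\
V_{\mathrm{car\,SP}} &= \mathrm{ASC}_{\mathrm{car\,SP}} + \beta_{tt}\,\mathrm{tt\_c} + \beta_{wt}\,\mathrm{wt\_c} + \beta_{c}\,\mathrm{c\_c} \\
V_{\mathrm{bus\,SP}} &= \beta_{tt}\,\mathrm{tt\_b} + \beta_{wt}\,\mathrm{wt\_b} + \beta_{c}\,\mathrm{c\_b} + \beta_{frq}\,\mathrm{frq\_b} + \beta_{tr}\,\mathrm{tr\_b} + \beta_{cf1}\,\mathrm{cflw\_b} + \beta_{cf2}\,\mathrm{cfav\_b}
\end{aligned}
$$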


RP Logit Model Estimation

Parameter  Parameter      Parameter  Robust          Robust
number     name           estimate   standard error  t statistic
1          ASC_train_RP   -0.997     0.594           -1.68
2          ASC_car_RP     -0.276     1.02            -0.27
3          β_c            -0.972     0.283           -3.43
4          β_carlic        5.46      1.64             3.33
5          β_frq           0.203     0.129            1.58
6          β_tr           -0.757     0.497           -1.52
7          β_tt           -0.0390    0.0189          -2.06
8          β_wt           -0.102     0.0313          -3.26

Summary statistics
Number of observations = 318
L(0) = -294.22
L(β) = -86.720
$\bar{\rho}^2$ = 0.678

Table A.26: RP data: Estimation results for a logit model


SP Logit Model Estimation

Parameter  Parameter      Parameter  Robust          Robust
number     name           estimate   standard error  t statistic
1          ASC_train_SP   -0.756     0.260           -2.91
2          ASC_car_SP      0.0426    0.421            0.10
3          β_c            -1.37      0.355           -3.86
4          β_cf1          -2.03      0.278           -7.32
5          β_cf2          -1.07      0.173           -6.16
6          β_frq           0.226     0.036            6.21
7          β_tr           -0.395     0.213           -1.85
8          β_tt           -0.0689    0.0152          -4.55
9          β_wt           -0.0233    0.0118          -1.97

Summary statistics
Number of observations = 834
L(0) = -578.085
L(β) = -507.646
$\bar{\rho}^2$ = 0.106

Table A.27: SP data: Estimation results for a Logit model
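The goodness-of-fit value in the summary statistics of Tables A.26 and A.27 can be reproduced as the adjusted likelihood ratio index, $\bar{\rho}^2 = 1 - (L(\hat{\beta}) - K)/L(0)$, where $K$ is the number of estimated parameters. That this adjusted form is the one reported is inferred from the numbers, not stated in the text; the short check below uses a hypothetical helper name.

```python
def adjusted_rho_square(ll_null, ll_final, n_params):
    """Adjusted likelihood ratio index: 1 - (L(beta) - K) / L(0)."""
    return 1.0 - (ll_final - n_params) / ll_null


# Log-likelihood values and parameter counts taken from Tables A.26 and A.27.
rho_rp = adjusted_rho_square(-294.22, -86.720, 8)
rho_sp = adjusted_rho_square(-578.085, -507.646, 9)
print(round(rho_rp, 3), round(rho_sp, 3))
```

The low value for the SP model is partly a consequence of the null model used: with only two alternatives available in each SP choice situation, the equal-probability model is already a much closer benchmark than in the three-alternative RP case.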


Recall that the utility functions for the RP model are defined as

$$U^{RP} = V^{RP} + \varepsilon^{RP},$$

and those of the SP model as

$$U^{SP} = V^{SP} + \varepsilon^{SP}.$$

Previously we presented estimation results of the RP and SP models separately. We now want to perform a joint estimation of both models, that is, an RP/SP model in which the coefficients common to both models (such as $\beta_{tt}$, $\beta_{wt}$, $\beta_{c}$ and $\beta_{frq}$) are estimated on both datasets. For this to be valid, the error terms of the two models must have the same variance. We therefore introduce a scale parameter $\mu$ and assume that

$$\mathrm{Var}(\varepsilon^{RP}) = \mathrm{Var}(\mu\,\varepsilon^{SP}) = \mu^{2}\,\mathrm{Var}(\varepsilon^{SP}).$$

The utilities for the RP and SP models can now be rewritten as

$$
\begin{aligned}
U^{RP} &= V^{RP} + \varepsilon^{RP} \\
\mu U^{SP} &= \mu V^{SP} + \mu\,\varepsilon^{SP}
\end{aligned}
$$

and the error terms ($\varepsilon^{RP}$ and $\mu\,\varepsilon^{SP}$) of both models have the same variance. Assume that $V^{SP}_{in} = \beta X^{SP}_{in}$ is a linear-in-parameters specification. Then $\mu V^{SP}_{in} = \mu\beta X^{SP}_{in}$, where both $\mu$ and $\beta$ are to be estimated.
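The role of the scale parameter in the joint estimation can be sketched as follows: RP observations enter the log-likelihood with their utilities as they are, while SP utilities are multiplied by $\mu$ before the logit formula is applied. This is a minimal illustration under stated assumptions, not the Biogeme implementation; the function names and the toy utilities are hypothetical.

```python
import math


def logit_prob(utilities, chosen, scale=1.0):
    """Logit probability of `chosen` when all utilities are multiplied by `scale`."""
    vmax = max(utilities.values())  # shift by the maximum for numerical stability
    expv = {a: math.exp(scale * (v - vmax)) for a, v in utilities.items()}
    return expv[chosen] / sum(expv.values())


def joint_log_likelihood(rp_obs, sp_obs, mu):
    """RP terms use scale 1; SP terms use scale mu (mu must be positive)."""
    ll = sum(math.log(logit_prob(v, ch)) for v, ch in rp_obs)
    ll += sum(math.log(logit_prob(v, ch, scale=mu)) for v, ch in sp_obs)
    return ll


# Toy observations: (deterministic utilities V, chosen alternative).
rp_obs = [({"train": -1.0, "car": 0.5, "bus": 0.0}, "car")]
sp_obs = [({"train": 0.2, "car": -0.3}, "train"),
          ({"train": -0.4, "bus": 0.1}, "bus")]

ll_equal_scale = joint_log_likelihood(rp_obs, sp_obs, mu=1.0)
ll_scaled = joint_log_likelihood(rp_obs, sp_obs, mu=0.5)
print(ll_equal_scale, ll_scaled)
```

A value of $\mu$ below one flattens the SP probabilities towards equal shares, which corresponds to SP error terms with a larger variance than the RP ones; $\mu = 1$ gives the pooled model with identical variances.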

The estimation results of the combined model are shown in Table A.28. All the parameters have the expected sign, but the scale parameter $\mu$ for the SP alternatives is not significantly different from one (estimate 0.922, t statistic with respect to 1 of -0.22). Accordingly, we cannot reject the hypothesis that the RP and SP error terms have the same variance.
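The two t statistics reported for $\mu$ in Table A.28 differ only in the null value they test against. The sketch below recomputes both from the reported estimate and robust standard error; the helper name is hypothetical and 1.96 is the usual two-sided 5% critical value.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t statistic of `estimate` against the hypothesis value `null_value`."""
    return (estimate - null_value) / std_error


# Estimate and robust standard error of mu from Table A.28.
t_vs_zero = t_statistic(0.922, 0.348)
t_vs_one = t_statistic(0.922, 0.348, null_value=1.0)
print(round(t_vs_zero, 2), round(t_vs_one, 2))

# mu = 1 is not rejected at the 5% level when |t| is below 1.96.
same_variance_not_rejected = abs(t_vs_one) < 1.96
```

For a scale parameter the relevant test is the one against 1, since $\mu = 1$ means equal RP and SP error variances; the test against 0 is of little interest here.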

Example of NL Model Specification

In this section, we describe an example of an NL model specification for the RP data alone and for the combined RP/SP data. The corresponding Biogeme model files are called nl-RP.mod and nl-RPSP.mod, respectively.

We postulate a nested structure for the public transport RP alternatives (train and bus). This correlation structure requires the estimation of a scale parameter for each nest. The estimation results are shown in Tables A.29 and A.30, which also report the additional parameters $\lambda_{car}$ for the car-only nest and $\lambda_{pub}$ for the public transport nest. Note that in the mixed RP/SP estimation we have to include one-alternative nests, in the same way as we do for the RP car alternative.

Combined RP/SP Logit Model Estimation

Parameter  Parameter      Parameter  Robust     Robust     Robust
number     name           estimate   st. error  t stat. 0  t stat. 1
1          ASC_train_RP   -1.43      0.437      -3.27
2          ASC_car_RP      0.298     0.837       0.36
3          ASC_train_SP   -0.659     0.288      -2.29
4          ASC_car_SP     -0.422     0.526      -0.80
5          β_c            -1.13      0.268      -4.22
6          β_carlic        5.43      1.59        3.42
7          β_cf1          -2.18      0.812      -2.69
8          β_cf2          -1.15      0.462      -2.50
9          β_frq           0.231     0.0790      2.93
10         β_tr           -0.471     0.245      -1.92
11         β_tt           -0.0644    0.0152     -4.23
12         β_wt           -0.0444    0.0236     -1.88
13         µ               0.922     0.348       2.65      -0.22

Summary statistics
Number of observations = 1152
L(0) = -872.30
L(β) = -599.798
$\bar{\rho}^2$ = 0.297

Table A.28: Combined RP/SP data: Estimation results for a Logit Model

RP NL Estimation

Parameter  Parameter      Parameter  Standard  t          t
number     name           estimate   error     stat. 0    stat. 1
1          ASC_train_RP    0.0165    0.246      0.07
2          ASC_car_RP     -0.290     0.831     -0.35
3          β_c            -0.738     0.216     -3.42
4          β_carlic        4.34      1.18       3.69
5          β_frq           0.103     0.0629     1.63
6          β_tr           -0.616     0.315     -1.95
7          β_tt           -0.0201    0.0133    -1.52
8          β_wt           -0.0578    0.0181    -3.19
9          λ_car           5.65      2.14       2.64       2.17
10         λ_pub           5.65      2.14       2.64       2.17

Summary statistics
Number of observations = 318
L(0) = -294.22
L(β) = -76.34
$\bar{\rho}^2$ = 0.707

Table A.29: RP data: Estimation results for a NL model

In both cases, the nesting parameters are significantly different from one (t statistic of 2.17 with respect to 1 in both Table A.29 and Table A.30), which supports the specified nested structure.
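The effect of a nest parameter larger than one can be illustrated with a small nested logit calculation. The sketch assumes that each $\lambda$ acts as the scale of its nest, with the scale of the upper level normalized to one, so that a value of one collapses the model to a logit; whether this matches the normalization used in the model files is an assumption. The function name and the utilities are hypothetical.

```python
import math


def nested_logit_probabilities(v, nests, nest_scale):
    """Nested logit probabilities with the top-level scale normalized to 1.

    v: {alternative: deterministic utility}
    nests: {nest: [alternatives]}
    nest_scale: {nest: scale of the nest, expected to be >= 1}
    """
    sums = {m: sum(math.exp(nest_scale[m] * v[a]) for a in alts)
            for m, alts in nests.items()}
    # Logsum (expected maximum utility) of each nest.
    logsums = {m: math.log(sums[m]) / nest_scale[m] for m in nests}
    denom = sum(math.exp(ls) for ls in logsums.values())
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(logsums[m]) / denom
        for a in alts:
            # P(alternative) = P(nest) * P(alternative | nest)
            probs[a] = p_nest * math.exp(nest_scale[m] * v[a]) / sums[m]
    return probs


# Hypothetical utilities; nest structure as postulated for the RP alternatives.
v = {"train": -0.5, "bus": 0.0, "car": 0.3}
nests = {"pub": ["train", "bus"], "car": ["car"]}
scales = {"pub": 5.65, "car": 5.65}  # point estimates of Table A.29

p = nested_logit_probabilities(v, nests, scales)
print(p)
```

For the one-alternative car nest the logsum equals the utility of car whatever the value of its scale, which is why such a nest adds no correlation of its own.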

Example of Agent Effect Model Specification

Since there are several observations for each individual in the SP survey (panel data), we describe examples of different model specifications accounting for the intrinsic correlation among the observations of the same individual. The corresponding Biogeme model files are called mnl-SP_agentEffect.mod for an SP logit model, mnl-RPSP_agentEffect.mod for a mixed RP/SP logit model and nl-RPSP_agentEffect.mod for the mixed RP/SP nested logit.

Combined RP/SP NL Estimation

Parameter  Parameter      Parameter  Standard  t          t
number     name           estimate   error     stat. 0    stat. 1
1          ASC_train_RP   -0.192     0.172     -1.12
2          ASC_car_RP      0.0673    0.758      0.09
3          ASC_train_SP   -0.345     0.195     -1.77
4          ASC_car_SP     -0.341     0.305     -1.12
5          β_c            -0.815     0.217     -3.76
6          β_carlic        4.47      1.18       3.78
7          β_cf1          -1.42      0.476     -2.99
8          β_cf2          -0.765     0.271     -2.82
9          β_frq           0.141     0.0459     3.08
10         β_tr           -0.382     0.167     -2.28
11         β_tt           -0.0380    0.0124    -3.07
12         β_wt           -0.0335    0.0119    -2.82
13         µ               1.38      0.447      3.09       0.86
14         λ_car           4.99      1.84       2.71       2.17
15         λ_pub           4.99      1.84       2.71       2.17
16         λ_SP_train      4.99      1.84       2.71       2.17
17         λ_SP_car        4.99      1.84       2.71       2.17
18         λ_SP_bus        4.99      1.84       2.71       2.17

Summary statistics
Number of observations = 1152
L(0) = -872.30
L(β) = -590.90
$\bar{\rho}^2$ = 0.302

Table A.30: Combined RP/SP data: Estimation results for a NL model

In order to account for an agent effect for the SP repeated choices (each respondent provided 9 choices), we consider an additive common random term affecting the utilities:

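$$
\begin{aligned}
V_{\mathrm{train\,RP}} &= \mathrm{ASC}_{\mathrm{train\,RP}} + \beta_{tt}\,\mathrm{tt\_t} + \beta_{wt}\,\mathrm{wt\_t} + \beta_{c}\,\mathrm{c\_t} + \beta_{frq}\,\mathrm{frq\_t} + \beta_{tr}\,\mathrm{tr\_t} + \nu_{panel} \\
V_{\mathrm{car\,RP}} &= \mathrm{ASC}_{\mathrm{car\,RP}} + \beta_{tt}\,\mathrm{tt\_c} + \beta_{wt}\,\mathrm{wt\_c} + \beta_{c}\,\mathrm{c\_c} + \beta_{carlic}\,\mathrm{car\_lic} \\
V_{\mathrm{bus\,RP}} &= \beta_{tt}\,\mathrm{tt\_b} + \beta_{wt}\,\mathrm{wt\_b} + \beta_{c}\,\mathrm{c\_b} + \beta_{frq}\,\mathrm{frq\_b} + \beta_{tr}\,\mathrm{tr\_b} + \nu_{panel} \\
V_{\mathrm{train\,SP}} &= \mathrm{ASC}_{\mathrm{train\,SP}} + \beta_{tt}\,\mathrm{tt\_t} + \beta_{wt}\,\mathrm{wt\_t} + \beta_{c}\,\mathrm{c\_t} + \beta_{frq}\,\mathrm{frq\_t} + \beta_{tr}\,\mathrm{tr\_t} + \beta_{cf1}\,\mathrm{cflw\_t} + \beta_{cf2}\,\mathrm{cfav\_t} + \nu_{panel} \\
V_{\mathrm{car\,SP}} &= \mathrm{ASC}_{\mathrm{car\,SP}} + \beta_{tt}\,\mathrm{tt\_c} + \beta_{wt}\,\mathrm{wt\_c} + \beta_{c}\,\mathrm{c\_c} + \beta_{carlic}\,\mathrm{car\_lic} \\
V_{\mathrm{bus\,SP}} &= \beta_{tt}\,\mathrm{tt\_b} + \beta_{wt}\,\mathrm{wt\_b} + \beta_{c}\,\mathrm{c\_b} + \beta_{frq}\,\mathrm{frq\_b} + \beta_{tr}\,\mathrm{tr\_b} + \beta_{cf1}\,\mathrm{cflw\_b} + \beta_{cf2}\,\mathrm{cfav\_b} + \nu_{panel}
\end{aligned}
$$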

The term $\nu_{panel}$ corresponds to an additional error term which we assume is normally distributed with zero mean and standard deviation $\sigma_{panel}$. It represents random taste variation across individuals. The results obtained with 500 draws for simulated maximum likelihood estimation are presented in the following tables, which include the estimate of $\sigma_{panel}$. We present a logit model estimation with random agent effect (Table A.31), a mixed RP/SP logit model estimation with random agent effect (Table A.32) and a mixed RP/SP NL estimation with random agent effect (Table A.33). In the last model, the standard deviation of the random agent effect $\sigma_{panel}$ is significant (t statistic of 2.52) and the parameters $\lambda$ validate the nested correlation structure. The parameter $\sigma_{panel}$ is also significant in the SP logit model of Table A.31 (t statistic of 5.24), but not in the combined RP/SP logit model of Table A.32 (t statistic of 0.86).
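The simulated likelihood contribution of one individual can be sketched as follows: for each draw of $\nu_{panel}$, the draw is added to the utilities of the alternatives that carry the agent effect in all of that individual's observations, the logit probabilities of the observed choices are multiplied, and the products are averaged over the draws. This is a minimal sketch under stated assumptions (plain pseudo-random normal draws, hypothetical function names and toy utilities), not the procedure implemented in Biogeme.

```python
import math
import random


def logit_prob(v, chosen):
    """Logit probability of the chosen alternative."""
    vmax = max(v.values())
    expv = {a: math.exp(u - vmax) for a, u in v.items()}
    return expv[chosen] / sum(expv.values())


def simulated_panel_probability(choices, sigma, n_draws=500, seed=0,
                                affected=("train", "bus")):
    """Simulated probability of one individual's sequence of choices.

    choices: list of (utilities dict, chosen alternative) for the individual.
    Each draw nu ~ N(0, sigma^2) is shared by all observations of the
    individual and added to the utilities of the `affected` alternatives.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        nu = rng.gauss(0.0, sigma)
        product = 1.0
        for v, chosen in choices:
            shifted = {a: u + (nu if a in affected else 0.0)
                       for a, u in v.items()}
            product *= logit_prob(shifted, chosen)
        total += product
    return total / n_draws


# Two hypothetical SP choices made by the same respondent.
choices = [({"train": 0.2, "car": -0.3}, "train"),
           ({"train": -0.4, "car": 0.1}, "car")]
p_panel = simulated_panel_probability(choices, sigma=0.75)
p_no_effect = simulated_panel_probability(choices, sigma=0.0, n_draws=10)
print(p_panel, p_no_effect)
```

Because the same draw enters every observation of the individual, the choices are correlated; setting the standard deviation to zero removes the effect and gives back the product of ordinary logit probabilities.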


SP Logit Model Estimation with Agent Effect

Parameter  Parameter      Parameter  Robust          Robust
number     name           estimate   standard error  t statistic
1          ASC_train_SP   -0.827     0.276           -2.99
2          ASC_car_SP      0.136     0.515            0.26
3          β_c            -1.46      0.337           -4.33
4          β_cf1          -2.09      0.336           -6.24
5          β_cf2          -1.13      0.212           -5.32
6          β_frq           0.236     0.0366           6.45
7          β_tr           -0.389     0.302           -1.29
8          β_tt           -0.0742    0.0185          -4.02
9          β_wt           -0.0191    0.0172          -1.11
10         σ_panel         0.750     0.143            5.24

Summary statistics
Number of observations = 834
L(0) = -578.08
L(β) = -501.81
$\bar{\rho}^2$ = 0.115

Table A.31: SP data: Estimation results for a logit model with agent effect


Combined RP/SP Logit Model Estimation with Agent Effect

Parameter  Parameter      Parameter  Robust          Robust
number     name           estimate   standard error  t statistic
1          ASC_train_RP   -1.33      0.716           -1.85
2          ASC_car_RP      0.479     1.04             0.46
3          ASC_train_SP   -0.814     0.607           -1.34
4          ASC_car_SP     -0.616     1.12            -0.55
5          β_c            -1.34      0.613           -2.19
6          β_carlic        6.30      3.26             1.94
7          β_cf1          -2.79      2.52            -1.10
8          β_cf2          -1.52      1.46            -1.04
9          β_frq           0.294     0.233            1.26
10         β_tr           -0.546     0.459           -1.19
11         β_tt           -0.0796    0.0398          -2.00
12         β_wt           -0.0593    0.0633          -0.94
13         σ_panel         1.07      1.25             0.86
14         µ               0.735     0.695           -0.38*

Summary statistics
Number of observations = 1152
L(0) = -872.300
L(β) = -593.425
$\bar{\rho}^2$ = 0.304

* t statistic w.r.t. 1

Table A.32: Combined RP/SP data: Estimation results for a logit model with agent effect


Combined RP/SP NL Estimation with Agent Effect

Parameter  Parameter      Parameter  Standard  t          t
number     name           estimate   error     stat. 0    stat. 1
1          ASC_train_RP   -0.202     0.172     -1.17
2          ASC_car_RP      0.0602    0.784      0.08
3          ASC_train_SP   -0.330     0.189     -1.74
4          ASC_car_SP     -0.339     0.308     -1.10
5          β_c            -0.824     0.232     -3.56
6          β_carlic        4.56      1.23       3.70
7          β_cf1          -1.37      0.471     -2.90
8          β_cf2          -0.748     0.272     -2.75
9          β_frq           0.139     0.0467     2.98
10         β_tr           -0.395     0.195     -2.03
11         β_tt           -0.0385    0.0130    -2.96
12         β_wt           -0.0338    0.0125    -2.71
13         σ_panel         0.472     0.187      2.52
14         µ               1.49      0.498      2.99       0.98
15         λ_car           4.97      1.90       2.61       2.09
16         λ_pub           4.97      1.90       2.61       2.09
17         λ_SP_train      4.97      1.90       2.61       2.09
18         λ_SP_car        4.97      1.90       2.61       2.09
19         λ_SP_bus        4.97      1.90       2.61       2.09

Summary statistics
Number of observations = 1152
L(0) = -872.300
L(β) = -585.647
$\bar{\rho}^2$ = 0.307

Table A.33: Combined RP/SP data: Estimation results for a NL model with agent effect
