
Predicting Carrier Load Cancellation

by

Ali Al-Habib

Bachelor of Science, Computer Science, King Fahd University of Petroleum and Minerals, 2005

and

Nicolas Favier Gonzalez

Bachelor of Science, Industrial Engineering, University of Cuyo, 2014

Bachelor of Science, Mechanical Engineering, École Nationale d’Ingénieurs de Saint-Étienne, 2013

SUBMITTED TO THE PROGRAM IN SUPPLY CHAIN MANAGEMENT

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE IN SUPPLY CHAIN MANAGEMENT

AT THE

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2018

© 2018 Ali Al-Habib and Nicolas Favier Gonzalez. All rights reserved.

The authors hereby grant to MIT permission to reproduce and to distribute publicly paper and electronic copies of this capstone document in whole or in part in any medium now known or hereafter created.

Signature of Author: …………………………………………………………………………………………

Ali Al-Habib

Department of Supply Chain Management

May 11, 2018

Signature of Author: …………………………………………………………………………………………

Nicolas Favier Gonzalez

Department of Supply Chain Management

May 11, 2018

Certified by: .............…………………………………………………………………………………………

Dr. Christopher Mejia

Director, MIT SCALE Latin America Network

Capstone Advisor

Accepted by: …………………………………………………………………………………………………

Dr. Yossi Sheffi

Director, Center for Transportation and Logistics

Elisha Grey II Professor of Engineering Systems

Professor, Civil and Environmental Engineering


Predicting Carrier Load Cancellation

by

Ali Al-Habib

and

Nicolas Favier Gonzalez

Submitted to the Program in Supply Chain Management

on May 11, 2018 in Partial Fulfilment of the

Requirements for the Degree of Master of Applied Science in Supply Chain Management

ABSTRACT

Truckload cancellations by carriers disrupt trucking industry operations. Extrapolating the findings from the third-party logistics provider (3PL) data studied in this research to the whole trucking industry gives an estimated 32 million cancellations every year, resulting in around $4.6 billion in extra costs. If these cancellations could be predicted, shippers and transportation brokers could avoid the loss of money and resources caused by the required rebooking process. This research explores the key drivers of load cancellations using historical cancellation patterns. It evaluates the applicability of different predictive models built using three years of data from a 3PL. These models include logistic regression, random forest, neural networks and k-nearest neighbors; however, the research focuses mostly on logistic regression, as it provides more insight into the main drivers of the cancellations. The resulting models correctly predicted only 16% of the cancelled loads. In an effort to improve the accuracy of the logistic regression model, a tradeoff analysis was developed to study the impact of adjusting the classification threshold. The analysis showed that a lower threshold can raise the share of correctly predicted cancellations to 42%. However, for every additional cancelled load predicted correctly, around 3 uncancelled loads are predicted as cancelled. As all models gave comparable results, the research concludes that the available load information and historical cancellation behaviors are not enough to predict future cancellations. The research concludes by recommending business solutions to reduce the probability of cancellations. These solutions include educating carriers on the impact of cancellations and encouraging them to give longer notice when a cancellation is inevitable. Further research might focus on surveying carriers to identify the root causes of cancellations and capture details related to these causes.

Capstone Advisor: Dr. Christopher Mejia

Title: Director, MIT SCALE Latin America Network


ACKNOWLEDGEMENTS

We would like to thank our advisor, Dr. Christopher Mejia, for his support and guidance throughout the project. We would also like to thank our writing instructors, Toby Gooley and Pamela Siska, for their help and feedback throughout the writing process.

We also thank our sponsor company for its trust and continuous support. The support team provided prompt feedback and guidance that helped us tremendously during the project phases.

Final thanks go to the SCM 2018 class for their support throughout the year.

- Ali and Nicolas

I would like to thank my family, especially my parents, for their encouragement and support. Special thanks also go to my wife, Rabab, for her unconditional love and support throughout the year. I would also like to thank my sons, Hasan and Elyas, for bringing joy to my life.

- Ali Al-Habib

I would like to thank my family, friends and my girlfriend for their support and motivation throughout this year. I would also like to thank the Fulbright program and the Aeropuertos Argentina 2000 organization for giving me the opportunity to live this wonderful experience.

- Nicolas Favier Gonzalez


Table of Contents

1. Introduction
2. Literature Review
   2.1. Models Evaluation
   2.2. Evaluating Performance
   2.3. Section Summary
3. Methodology
   3.1. DEFINE
      3.1.1. Process Mapping
   3.2. MEASURE
      3.2.1. Data Collection
      3.2.2. Data Cleaning and Preparation
      3.2.3. Outliers Processing
      3.2.4. Data Analysis
   3.3. ANALYZE
      3.3.1. Correlation and Multi-Collinearity Analysis
      3.3.2. Variable Normalization
      3.3.3. Bootstrap Forest Predictor Screening and Stepwise Regression
   3.4. IMPROVE
      3.4.1. Logistic Regression
      3.4.2. Model Accuracy Analysis
      3.4.3. Machine Learning
   3.5. CONTROL
      3.5.1. Sensitivity Analysis
4. Results
   4.1. Available Dataset
   4.2. Enriched Dataset
   4.3. Unpredictability Testing
   4.4. Further analysis
   4.5. Logistic regression Threshold Sensitivity Analysis
5. Discussion
   5.1. Threshold Reduction
   5.2. Further Research
   5.3. Business Strategy
6. Conclusion
7. Reference List
8. Appendices
   Appendix A: Project Gantt
   Appendix B: Cancellation Causes Brainstorming
   Appendix C: Data Glossary
   Appendix D: Data Cleaning Process
   Appendix E: Descriptive Analytics
   Appendix F: Variables Prediction Rankings


List of Figures

Figure 1: Six-Sigma Continuous Improvement DMAIC Cycle

Figure 2: Load Process Map

Figure 3: Example of calculating and aggregating cancellation ratios

Figure 4: Forward Stepwise Regression algorithm’s results

Figure 5: Model Parameters and characteristics

Figure 6: Threshold tradeoff curves

Figure 7: Relative variation of FN and FP

Figure 8: Total number of FN and FP loads

Figure 9: Recall vs Precision

Figure 10: Sensitivity vs Specificity

Figure 11: Receiver Operating Characteristic (ROC) Curve


List of Tables

Table 1: Weight of Shipments by Transportation Mode

Table 2: Value of Shipments by Transportation Mode

Table 3: Comparing supervised learning algorithms

Table 4: Summary of the cancellations and their impact

Table 5: Logistic regression confusion matrix on the available dataset

Table 6: Machine learning models’ results on the available dataset

Table 7: Logistic regression confusion matrix on the enriched dataset

Table 8: Machine learning models’ results on the enriched dataset

Table 9: Logistic regression confusion matrix on the new dataset

Table 10: Machine learning models’ results on the new dataset

Table 11: Confusion matrix for loads with at least 10 previous records

Table 12: Confusion matrix for loads that happened on the next seven days

Table 13: Additional analyses results


1. Introduction

Trucking is the largest freight transportation mode in the United States, accounting for around 75% of the total weight and value of shipments transported every year. Table 1 and Table 2 show the distribution of shipments among transportation modes:

Table 1: Weight of Shipments by Transportation Mode: 2007, 2013, and 2040 (millions of tons)

Reprinted from Freight Facts and Figures, by U.S. Department of Transportation Bureau of Transportation Statistics 2015.

Table 2: Value of Shipments by Transportation Mode: 2007, 2013, and 2040 (billions of 2007 dollars)

Reprinted from Freight Facts and Figures, by U.S. Department of Transportation Bureau of Transportation Statistics 2015.

Approximately 48% of the trucking industry is full truckload (FTL), 11% is less-than-truckload (LTL), and the remaining 41% is private or dedicated fleets (AT Kearney, 2016). With a total of 13.9 billion tons transported by truck and an average of 36 tons per truckload (U.S. Department of Transportation Federal Highway Administration, 2016), trucks move roughly 386 million loads; applying the 48% FTL share gives an estimated 185 million contracted full truckloads transported each year in the United States.

The U.S. trucking industry is highly fragmented, with many owner-operator carriers: approximately 90% of carriers operate six or fewer trucks (American Trucking Associations, 2017). This fragmentation adds to the operational complexity of the industry and makes it difficult to organize and predict. In this context, shippers and third-party logistics providers face the challenge of carrier load cancellations.

A third-party logistics (3PL) provider sponsored this project to assess the ability to predict cancellations of future loads. The company provides truckload, less-than-truckload, and intermodal brokerage services and transportation management services to more than 14,000 shippers, from Fortune 100 companies to small businesses. It identifies the right equipment for customers’ freight from a growing network of more than 40,000 prequalified local, regional, and national carriers. It acts as the intermediary that connects shippers with carriers to move shippers’ loads from their origins to their destinations. 3PL companies are compensated by commission once a load has been moved successfully.

Based on an initial analysis of the studied dataset, approximately 17% of confirmed loads get cancelled (also known as bounced) by carriers. On average, each cancellation is estimated to result in $145 of extra cost, caused by the higher cost of rebooking the load with a new carrier.

Little information is available on the impact of truckload cancellations at the level of the overall trucking industry. However, extrapolating the cancellation metrics from this analysis to the whole industry gives an indication of their scale. Applying the 17% cancellation ratio to the estimated 185 million annual full truckloads yields approximately 32 million cancelled loads per year. At $145 per cancellation, these cancellations translate into extra costs of around $4.6 billion each year. Therefore, the ability to predict these cancellations and plan accordingly can be valuable to companies in this industry because of the potential cost savings.
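The extrapolation above can be reproduced in a few lines. This is a sketch under one stated assumption: that the 185 million full truckloads come from dividing the 13.9 billion tons by 36 tons per truckload and then applying the 48% FTL share. The report does not spell this step out, but the resulting figures match.

```python
# Inputs quoted in the text; deriving the FTL load count this way is an assumption.
TOTAL_TONS_PER_YEAR = 13.9e9   # tons moved by truck (FHWA, 2016)
TONS_PER_TRUCKLOAD = 36        # average tons per truckload (FHWA, 2016)
FTL_SHARE = 0.48               # full-truckload share of trucking (AT Kearney, 2016)
CANCELLATION_RATIO = 0.17      # share of confirmed loads cancelled in the 3PL data
COST_PER_CANCELLATION = 145    # average extra cost per cancellation, in USD

truckloads = TOTAL_TONS_PER_YEAR / TONS_PER_TRUCKLOAD   # ~386 million loads
ftl_loads = truckloads * FTL_SHARE                      # ~185 million FTL loads
cancellations = ftl_loads * CANCELLATION_RATIO          # ~31.5 million cancellations
extra_cost = cancellations * COST_PER_CANCELLATION      # ~4.57 billion USD

print(f"FTL loads per year:     {ftl_loads / 1e6:.0f} million")
print(f"Cancellations per year: {cancellations / 1e6:.0f} million")
print(f"Extra cost per year:    ${extra_cost / 1e9:.1f} billion")
```

Rounded, this gives about 185 million FTL loads, 32 million cancellations and $4.6 billion in extra cost per year, consistent with the figures in the text.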

A big advantage of the trucking industry is the availability of operational data. 3PLs usually use information systems in which all booking and load information is stored. These systems also keep records of all carriers and shippers, which provides a detailed record of past loads. Predictive analytics can be used to process this data and generate insights into future trends. Knowledge created from this type of analytics can help in developing better plans or avoiding potential risks (Olson & Wu, 2017).

In the context of carrier load cancellations, transactional data can be used as the basis of predictive analysis to identify which loads are more likely to be cancelled based on their characteristics. Having this information before a cancellation occurs can be very valuable: it allows the company to make better decisions when choosing carriers, or to develop a mitigation plan in case a cancellation happens. This can potentially translate into cost reduction and increased productivity for the company.

Many models can predict an event based on patterns in historical data. The objective of this project is to explore the key characteristics that drive load cancellations. The project will measure the impact of load, shipper and carrier characteristics on load cancellations and provide insights into cancellation patterns. To achieve this, a logistic regression model will be applied to evaluate the significance of the different load attributes in explaining carrier cancellations. Other machine learning models will also be evaluated to validate the results obtained by the logistic regression model.

This project will exploit the available dataset and identify the most significant characteristics that provide insights into cancellations. The results of the project will help the company make business decisions and will indicate focus areas for future research. Although the data provided by the company covers a considerable number of features, other details that might impact load cancellations were not studied in this project. This is due to limited access to external information, such as carriers’ schedules and loads that are not managed by the company.

This report starts by reviewing available literature related to predictive modeling and the approaches followed in similar problems. Then, the methodology section explains the steps followed in the data analysis, feature evaluation, and predictive model development. The models’ performance is then presented and analyzed in the results section. Finally, potential future research and next steps are covered in the discussion and conclusion sections.


2. Literature Review

Different predictive models can be used to predict the outcome of a dependent variable based on a set of independent variables. These models fit different contexts and solve various types of problems, and each has pros and cons that need to be evaluated. Accordingly, a literature review was conducted to identify the models that are suitable for predicting carrier load cancellations and the appropriate approaches for evaluating model performance.

2.1. Models Evaluation

Multiple predictive models are commonly used to solve this kind of problem. The suitability of each model depends mainly on the characteristics of the dataset it is applied to and on the model’s objective. Table 3 shows a brief comparison among the different models.

Table 3: Comparing supervised learning algorithms

Reprinted from Comparing supervised learning algorithms, by Kevin Markham, 2015.

Simple and multiple linear regression models predict a dependent variable from one or more independent variables using a linear relationship. These models are usually easy to understand and relate to the real world. Although many data come in a nonlinear form, linear regression can still be used after transforming the variables so that their relationship becomes linear (for example, by taking logarithms). Regression models can include as many independent variables as needed. However, correlation and multi-collinearity among variables must be assessed to avoid information overlap among independent variables. In a multiple linear regression model, there should be low correlation among independent variables, to avoid overlapping information, and high correlation between each independent variable and the dependent variable, to justify its predictive strength (Olson & Wu, 2017).
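One common way to assess multi-collinearity is the variance inflation factor (VIF): each independent variable is regressed on all the others, and VIF = 1 / (1 - R²). The NumPy sketch below uses synthetic data in which x2 is built as a near-copy of x1 and x3 is independent; it illustrates the check in general and is not necessarily the procedure this project used. A frequently quoted rule of thumb treats VIF values above 5 or 10 as a sign of problematic overlap.

```python
import numpy as np

def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (rows = observations, columns = independent variables).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns plus an intercept.
    """
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        r2 = 1.0 - residuals.var() / y.var()
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)  # nearly a copy of x1: overlapping information
x3 = rng.normal(size=500)             # unrelated to x1 and x2
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vif, 1))
```

Here x1 and x2 receive very large VIFs because each is almost fully explained by the other, while x3 stays close to 1.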

Logistic regression is an extension of regression models. It estimates the likelihood of a specific event by using an underlying regression model (de Menezes, Liska, Cirillo & Vivanco, 2017). While linear regression models work well with continuous dependent variables, they do not provide plausible estimates for categorical outputs. Logistic regression, on the other hand, transforms the output of the underlying regression model into a probability between 0 and 1 through the probability of acceptance formula (the logistic function), and this probability can then be converted into a categorical output (Olson & Wu, 2017).
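For reference, the standard form of the logistic model with k independent variables is shown below. The symbols (β for the coefficients, τ for the classification threshold) are notation introduced here rather than taken from this report. The log-odds are linear in the variables, which is why the sign and size of each coefficient convey the direction and magnitude of its effect.

```latex
p(\text{cancel} \mid x_1,\dots,x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\qquad\Longleftrightarrow\qquad
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k

\hat{y} =
\begin{cases}
1 \;(\text{cancelled}) & \text{if } p \ge \tau \\
0 \;(\text{not cancelled}) & \text{otherwise}
\end{cases}
```

A threshold of τ = 0.5 is a common default; lowering τ flags more loads as likely cancellations, which is the kind of tradeoff the threshold analysis in this report studies.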

Machine learning algorithms are another family of prediction models that have been used to solve similar problems. These algorithms use historical data to predict continuous, categorical or binary future results (Kotsiantis, Zaharakis & Pintelas, 2007). There are several types of machine learning algorithms, such as Neural Networks, Decision Trees and Rule Induction (Olson & Wu, 2017).

An MIT SCM 2017 thesis, “Predicting On-time Delivery in the trucking industry” by Alcoba and Ohlund (2017), addressed a similar problem with a different objective: predicting truckload on-time delivery based on certain load characteristics. Both problems share a comparable objective, predicting the likelihood of a binary outcome, and both require a predictive model that uses a list of known load characteristics (independent variables) to predict the probability of an event (dependent variable). Alcoba and Ohlund approached on-time delivery prediction by building a logistic regression model.

Besides model accuracy, model interpretability is an important aspect to consider when selecting a model. Regression models are self-explanatory: the assigned coefficients represent the direction and magnitude of the impact of each independent variable. Machine learning algorithms such as Neural Networks, on the other hand, are “black box” models and very complex to analyze (Olson & Wu, 2017). Accordingly, logistic regression was selected as the main model to predict load cancellations.

To validate the output of the logistic regression model, the same datasets were used to build other machine learning models. As a wide range of models is available, three models were selected based on their applicability to load cancellation prediction. The models’ underlying mechanisms were also considered, so that the selection covers different machine learning techniques. The selected models are:

• Neural Network models consist of layers of nodes connected by arcs with assigned weights. This architecture allows the model to learn through feedback loops, comparing the network’s output to the target values and adjusting the weights accordingly. These models are complex; however, they tend to perform well in problems with a high level of unpredictability (Olson & Wu, 2017).


• k-Nearest Neighbor models are instance-based learning models that use the similarity of records’ properties to classify unclassified records. The algorithm locates the k closest classified instances (in terms of similar properties) and uses their classes, typically through a majority vote, to determine the class of the unclassified instance. These models have been used in real applications and have demonstrated a high level of stability compared to other machine learning models (Kotsiantis et al., 2007).

• Random Forest models are collections of unpruned decision trees. Decision trees are classification models based on hierarchies of if-then splitting rules. These models determine the important variables based on their ability to predict outcomes. Random forest and decision tree models are widely applied in business data mining problems and provide reliable outcomes (Olson & Wu, 2017).
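To make the k-Nearest Neighbor idea concrete, the sketch below classifies a new record by majority vote among its k closest training records. It is a minimal illustration, not the implementation used in this project. The two features (a carrier’s historical cancellation ratio and a normalized booking lead time) and all values are hypothetical, and both features are assumed to be scaled to [0, 1] so that Euclidean distance weighs them comparably.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the majority label among the k training records nearest to `query`.

    train: list of (features, label) pairs, where features is a tuple of floats.
    query: tuple of floats with the same length as each training feature tuple.
    """
    # Sort training records by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda record: math.dist(record[0], query))[:k]
    # Majority vote among the neighbours' labels (1 = cancelled, 0 = not cancelled).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (cancellation_ratio, normalized_lead_time) -> label
train = [
    ((0.05, 0.90), 0),
    ((0.10, 0.80), 0),
    ((0.08, 0.70), 0),
    ((0.40, 0.10), 1),
    ((0.35, 0.20), 1),
    ((0.50, 0.15), 1),
]

print(knn_classify(train, (0.45, 0.20)))  # 1: nearest records are all cancellations
print(knn_classify(train, (0.07, 0.85)))  # 0: nearest records are all completed loads
```

Choosing an odd k avoids tied votes between the two classes.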

2.2. Evaluating Performance

Once a suitable technique is selected to predict truckload cancellations, it is important to test the model’s robustness on a new dataset. Model accuracy will be compared against the expected performance level to decide whether the model is fit for predicting load cancellations. If the model accuracy does not meet the expected level, the model will be reformulated by introducing new features to the dataset.

The selection of one algorithm over another can be based on multiple metrics, such as prediction accuracy, recall, precision, sensitivity and specificity (recall and sensitivity are computed with the same formula). Prediction accuracy (the percentage of correct predictions over total predictions) and sensitivity (the percentage of true positives over total actual positives) reflect the overall model performance and its applicability to the problem. Accordingly, these two metrics will be used to compare the models’ results. To obtain an unbiased evaluation of model accuracy, a training dataset should be used to build the models, while a different set (i.e., a test dataset) should be used to evaluate their accuracy (Kotsiantis et al., 2007).
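A minimal sketch of the holdout approach described above, assuming a simple random split: records are shuffled once with a fixed seed and divided into disjoint training and test sets. The 30% test fraction and the seed are illustrative choices, not values taken from this project.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of `records` and split it into (train, test) lists with no overlap."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

loads = list(range(1, 101))  # stand-ins for 100 load IDs
train, test = train_test_split(loads)
print(len(train), len(test))
```

Because the goal is predicting future cancellations, a time-based split (training on earlier loads and testing on later ones) is a reasonable alternative that keeps information from the future out of the training set.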

Confusion matrices will be used to demonstrate the models’ performance. A confusion matrix for two classes consists of four possible outcomes. Given a classification model (i.e., a classifier) and a set of instances (i.e., test data), the confusion matrix provides the following outcomes (Fawcett, 2005):

• True Positive: the instance is positive and predicted positive

• True Negative: the instance is negative and predicted negative

• False Positive: the instance is negative and predicted positive

• False Negative: the instance is positive and predicted negative

This matrix allows the model’s precision and accuracy to be determined. It also shows whether the model is good enough for the expected performance level.
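The sketch below counts the four confusion-matrix outcomes and derives accuracy, precision, sensitivity (recall) and specificity from them. It also converts predicted probabilities into classes with an adjustable threshold, to show how lowering the threshold trades additional true positives for additional false positives. The labels and probabilities are hypothetical toy values, not results from the sponsor company’s data.

```python
def classify(probabilities, threshold=0.5):
    """Predict 1 (cancelled) when the probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = cancelled, 0 = not cancelled)."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),
    }

def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def metrics(c):
    """Standard metrics derived from a confusion matrix."""
    total = c["TP"] + c["TN"] + c["FP"] + c["FN"]
    return {
        "accuracy": safe_ratio(c["TP"] + c["TN"], total),
        "precision": safe_ratio(c["TP"], c["TP"] + c["FP"]),
        "sensitivity": safe_ratio(c["TP"], c["TP"] + c["FN"]),  # same as recall
        "specificity": safe_ratio(c["TN"], c["TN"] + c["FP"]),
    }

# Hypothetical test set: actual outcomes and model probabilities of cancellation.
actual = [1, 0, 1, 0, 0, 1, 0, 0]
probs = [0.70, 0.20, 0.40, 0.60, 0.10, 0.30, 0.35, 0.05]

for threshold in (0.5, 0.3):
    counts = confusion_counts(actual, classify(probs, threshold))
    print(threshold, counts, metrics(counts))
```

In this toy example, lowering the threshold from 0.5 to 0.3 raises sensitivity from 1/3 to 1.0 but lowers specificity from 0.8 to 0.6: two more cancellations are caught at the cost of one more false alarm.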


2.3. Section Summary

This literature review analyzed various methods that can be used to build a model that predicts the probability of an event based on known facts. It also described approaches for testing the different models and identifying the most suitable one for the desired objective.

Although predictive analytics have previously been applied to the trucking industry, none of the reviewed literature applied these predictive models to predict truckload cancellations. This research will analyze industry data and apply a logistic regression model to understand the key drivers of cancellations in the trucking industry.


3. Methodology

This section explains the steps and methods that were used to develop the model for predicting carrier load cancellations based on load characteristics. The Six-Sigma continuous improvement cycle (Define, Measure, Analyze, Improve, Control, or DMAIC, shown in Figure 1) was followed as a framework to structure the multiple phases of the project (Pillet, 2004).

Figure 1: Six-Sigma Continuous Improvement DMAIC Cycle (Source: iancos.wordpress.com/)

Based on this DMAIC cycle, a detailed action plan was developed to set the roadmap for the project (refer to Appendix A: Project Gantt for details). This section details the load lifecycle, the data cleaning process, variable selection and model development. Finally, it covers the approach used to test and validate the models’ accuracy, as well as the process followed for the sensitivity analysis.

3.1. DEFINE

3.1.1. Process Mapping

The first step of the research was a site visit to the sponsor company’s headquarters. The main objective of this visit was to understand the details of the company’s processes. Based on the information collected during the visit, a process map was created detailing the load creation and execution processes.


Figure 2: Load Process Map. This figure shows the different stages through which a load moves from the booking until the delivery.

Figure 2 shows the typical load lifecycle for 3PL operations, which gives a general perspective of how the process works.

During the visit, a brainstorming session was conducted with the sponsor company’s team to identify all possible variables that could impact load cancellations (refer to Appendix B: Cancellation Causes Brainstorming for the full list of variables). The session resulted in a list of 48 different variables grouped into four categories (Load Impact, Shipper Impact, Carrier Impact and Other Impacts). Based on this list, the sponsor company committed to provide three years of operational data to be used in the analysis.

3.2. MEASURE

3.2.1. Data Collection

The sponsor company provided stop-level data on its operations during 2015, 2016 and part of 2017. Stop-level data contains information about each stop a load makes between its first pickup point and its last drop point. Each load is therefore represented in the data by at least two stops (one pickup and one drop), and many loads are represented by more stops, as they go through multiple pickup or drop points. A preliminary review of the received data was done, and a glossary of all variables was created (refer to Appendix C: Data Glossary for detailed information about the dataset). The data glossary was validated with the sponsor company’s team to verify the inclusion of all desired variables and to confirm the meaning of each variable. Further meetings were held to validate assumptions regarding outliers and missing data. The data received consisted of:

• Total number of records = 10,706,177

• Number of unique loads = 3,629,543


3.2.2. Data Cleaning and Preparation

Once all questions about the data were clarified, the data preparation and cleaning process started. Data cleaning consisted of eliminating records with invalid information and replacing others based on validated assumptions. These cleanup steps were required to ensure data consistency and to remove outliers that could distort the analysis. For details regarding the cleaning process, refer to Appendix D: Data Cleaning Process.

As mentioned earlier, the provided dataset was at stop level, which means that each load is represented by more than one record. For the analysis, and for predicting the probability of future load cancellations, the data were transformed to load level (one record per load cancelled or shipped).

Once this transformation was performed, the load-level data contained 3,997,488 loads, including all completed and cancelled loads.

To transform the data from stop level to load level, some variables were aggregated (such as weights and pallets), and new variables were created to describe the trip information contained in the stop-level data. Cancellation ratios were also created for several variables; a cancellation ratio is the percentage of cancelled loads out of all loads booked. Combined ratios (for example, for a carrier and city combination) were also created to give more detailed insight into cancellation behavior. Figure 3 illustrates how these ratios were calculated and aggregated to load level for a given load.


Figure 3: Example of calculating and aggregating cancellation ratios to load-level data for carrier and city combination
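The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the sponsor's pipeline: the stop-level field layout, the load IDs and the choice of averaging the carrier-city ratios over a load's stops are all assumptions made for the example. Note that a ratio computed this way includes each load's own outcome, which matters for the bias discussed in the results section.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stop-level rows: (load_record_id, carrier, city, weight, cancelled).
# Every load record has at least two stops; `cancelled` is 1 if that booking bounced.
STOPS = [
    ("L1", "C1", "Dallas", 10.0, 1),
    ("L1", "C1", "Austin", 5.0, 1),
    ("L2", "C1", "Dallas", 8.0, 0),
    ("L2", "C1", "Houston", 4.0, 0),
    ("L3", "C2", "Dallas", 6.0, 0),
    ("L3", "C2", "Austin", 6.0, 0),
]


def to_load_level(stops):
    """Collapse stop-level rows into one row per load with a carrier-city ratio."""
    outcome = {}                        # load record -> cancelled flag
    loads_per_pair = defaultdict(set)   # (carrier, city) -> load records touching it
    for rec, carrier, city, _weight, cancelled in stops:
        outcome[rec] = cancelled
        loads_per_pair[(carrier, city)].add(rec)

    # Cancellation ratio = cancelled loads / all loads booked for that combination.
    ratio = {pair: sum(outcome[r] for r in recs) / len(recs)
             for pair, recs in loads_per_pair.items()}

    loads = {}
    for rec, carrier, city, weight, cancelled in stops:
        row = loads.setdefault(rec, {"weight": 0.0, "stops": 0,
                                     "pair_ratios": [], "cancelled": cancelled})
        row["weight"] += weight         # additive attributes are summed
        row["stops"] += 1
        row["pair_ratios"].append(ratio[(carrier, city)])
    for row in loads.values():
        # Assumption: a load's carrier-city ratio is the mean over its stops.
        row["carrier_city_ratio"] = mean(row.pop("pair_ratios"))
    return loads


for rec, row in to_load_level(STOPS).items():
    print(rec, row)
```

Summing weights and averaging ratios are only two of many possible aggregation rules; a maximum or a pickup-city-only ratio would fit the same structure.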

Moreover, all categorical variables were turned into numerical or dummy variables to be used in

the different models.

3.2.3. Outliers Processing

After the data cleaning process, a statistical analysis was performed to identify the shape and distribution of the continuous variables. Certain outliers were identified and further analyzed using box-and-whisker plots (Potter, 2006). A set of boundaries was defined to remove records with outliers from the dataset. These boundaries were validated by the sponsor company’s team to match the possible ranges of values in real operations:

• Miles [0 – 3,500]

• Cost [$0 – $12,000]

• Dead head miles [0 – 3,500]

• Book to load days [0 – 21 days]

• Book to Pick Up hours [0 – 504 hours]

• Load duration [0 – 14 days]
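As a rough sketch of this step, the snippet below computes box-and-whisker (Tukey) fences using the conventional 1.5 × IQR whisker length (an assumption; the text does not state the multiplier used) and then applies the validated boundaries listed above. The field names in `BOUNDS` are hypothetical.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Whisker limits of a box-and-whisker plot: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Boundaries validated with the sponsor (list above); the field names are hypothetical.
BOUNDS = {
    "miles": (0, 3_500),
    "cost": (0, 12_000),
    "deadhead_miles": (0, 3_500),
    "book_to_load_days": (0, 21),
    "book_to_pickup_hours": (0, 504),
    "load_duration_days": (0, 14),
}


def within_bounds(load):
    """Keep a load only if every bounded field it contains lies in its validated range."""
    return all(lo <= load[field] <= hi
               for field, (lo, hi) in BOUNDS.items() if field in load)


miles = [120, 340, 560, 800, 950, 1_200, 1_500, 2_100, 9_800]
print(tukey_fences(miles))  # candidate outliers lie outside these limits
clean = [m for m in miles if within_bounds({"miles": m})]
```

The fences only flag candidates; the final cut uses the business-validated ranges, which is why a statistically unusual but operationally possible value can survive.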

3.2.4. Data Analysis

An exhaustive descriptive analysis of the data was carried out to provide a general overview of the current situation. Its objective was to assess the magnitude of load cancellations and to identify possible drivers of these cancellations in the data. Table 4 summarizes the total number of loads, cancellation ratios and extra costs of cancellations over the three years studied:

Table 4: Summary of the cancellations and their impact during the last three years on the 3PL company

Year     Total Loads    Cancelled Loads    Cancellation Ratio    Bounces Cost (Actual – Cancelled)
2015     1,049,936      194,070            18.5%                 $21,293,232
2016     1,325,797      235,914            17.8%                 $31,060,148
2017     1,245,634      204,479            16.4%                 $38,926,148
Total    3,621,367      634,463            17.5%                 $91,279,528

According to the data in Table 4, 17.5% of booked loads are cancelled. The average estimated cost of a cancellation is $145, with a standard deviation of $480 and an interquartile range of $200 (Q1 = $0, median = $50 and Q3 = $200). For more descriptive analytics, refer to Appendix E: Descriptive Analytics.
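The ratios in Table 4 can be reproduced directly from the load counts. The short check below does so; note that the total ratio is pooled over all loads rather than averaged across the three yearly ratios.

```python
# (total loads, cancelled loads) per year, copied from Table 4.
TABLE_4 = {
    2015: (1_049_936, 194_070),
    2016: (1_325_797, 235_914),
    2017: (1_245_634, 204_479),
}

yearly = {year: round(100 * cancelled / total, 1)
          for year, (total, cancelled) in TABLE_4.items()}
all_loads = sum(total for total, _ in TABLE_4.values())
all_cancelled = sum(cancelled for _, cancelled in TABLE_4.values())
pooled = round(100 * all_cancelled / all_loads, 1)  # pooled ratio, not a mean of yearly ratios

print(yearly, all_loads, all_cancelled, pooled)
# {2015: 18.5, 2016: 17.8, 2017: 16.4} 3621367 634463 17.5
```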

3.3. ANALYZE

3.3.1. Correlation and Multi-Collinearity Analysis

Once the data were cleaned and prepared, a total of 64 attributes were obtained. Of these, 12 attributes were excluded because they were not useful for building the models: IDs, dates and attributes that are only known after the cancellation event. The information contained in date attributes was rolled up and reflected in durations, while high-cardinality attributes (i.e., IDs) were removed and replaced by cancellation (bounce) ratios (Shipper Bounce Ratio and Carrier Bounce Ratio).


Moreover, attributes identified as unreliable because of the way they were collected were also removed. The removed attributes are:

• Load ID

• Load Date

• Customer ID

• Carrier ID

• Bounced Date Time

• Book by Date Time

• Empty Time

• Expected Empty Location

• First Pick Up Date Time

• Max Pay

• Actual Load Cost

• First Load Moved On

The remaining 52 variables were further analyzed and assessed for use in the model. To avoid overfitting and excessive model complexity, several analyses were carried out to identify the best features for predicting load cancellations. First, a simple correlation analysis was used to identify highly correlated (positive or negative) variable pairs. Highly correlated variables are likely to affect the model outcomes by increasing the risk of over-fitting (Thompson, 2009); therefore, one variable of each such pair should be excluded.

Using a threshold of 0.5 on the absolute value of the correlation coefficient, the following groups of variables were identified as highly correlated:

• Miles, Cost, Market Cost, Load Duration

• Book to Load Days, Book to Pick up Hours

• Zip Bounce Ratio, Zip3 Bounce Ratio, City Bounce Ratio, Shipper Bounce Ratio

• Carrier Bounce Ratio, Carrier Equipment Type Bounce Ratio

• Shipper Carrier Bounce Ratio, Carrier City Bounce Ratio

• Carrier City Bounce Ratio, Carrier Load Day Bounce Ratio, Carrier Pickup Time Bounce Ratio

• Equipment Drivers, Equipment Power Units

• Carrier Equipment Type Bounce Ratio, Carrier CS Bounce Ratio

• Load Week Day, Load Over Weekend
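A minimal sketch of this pairwise screening, assuming Pearson correlation (the text does not name the coefficient) and the 0.5 absolute threshold. The column names and values are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.5):
    """List (name_a, name_b, r) for every pair of columns with |r| > threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Illustrative values only.
columns = {
    "miles": [100, 200, 300, 400, 500],
    "cost": [250, 480, 700, 950, 1_200],
    "load_week_day": [3, 1, 4, 1, 5],
}
print(correlated_pairs(columns))  # keep one variable from each flagged pair
```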


Based on these results the following variables were discarded:

• Cost, Market Cost and Load Duration

• Book to Load Days

• Zip Bounce Ratio, Zip3 Bounce Ratio and City Bounce Ratio

• Carrier Bounce Ratio

• Shipper Carrier Bounce Ratio

• Carrier Load Day Bounce Ratio and Carrier Pick up Time Bounce Ratio

• Equipment Drivers

• Carrier CS Bounce Ratio

• Load Over Weekend

Once the highly correlated attributes were excluded, a multicollinearity analysis was performed. Variance Inflation Factor (VIF) analysis was used to identify and eliminate potential multicollinearity among three or more variables, which pairwise correlations cannot always detect. Such multicollinearity would unnecessarily inflate the standard errors of the model’s coefficients (Akinwande, 2015). However, the VIF analysis on the dataset did not identify any additional correlations beyond those captured by the simple correlation analysis.
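One way to compute VIFs without fitting a separate regression for each variable is the identity VIF_j = 1 / (1 - R_j^2) = (R^-1)_jj, the j-th diagonal entry of the inverse of the correlation matrix R. The sketch below uses that identity with a small pure-Python matrix inverse; it is an illustration only, and a statistics package would normally be used on a dataset of this size.

```python
from math import sqrt


def corr_matrix(cols):
    """Pearson correlation matrix of equal-length columns."""
    def center(c):
        m = sum(c) / len(c)
        return [v - m for v in c]
    cs = [center(c) for c in cols]
    norms = [sqrt(sum(v * v for v in c)) for c in cs]
    k = len(cs)
    return [[sum(a * b for a, b in zip(cs[i], cs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect multicollinearity")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif(cols):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(corr_matrix(cols))
    return [inv[j][j] for j in range(len(cols))]


miles = [100, 200, 300, 400, 500]
week_day = [3, 1, 4, 1, 5]
print(vif([miles, week_day]))  # both equal 1 / (1 - r^2) with r^2 = 0.125
```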

The outcome of the two correlation analyses led to a reduction of 14 features from the original

dataset (52 to 38 features).

3.3.2. Variable Normalization

Numerical load characteristics were normalized to avoid the possible negative impact of combining variables on very different scales (big and small numbers) in the logistic regression model (Kutner, Nachtsheim & Neter, 2004). Normalization involved standardizing each numerical attribute to a mean of zero and a standard deviation of one (z-scores), which gives all attributes comparable ranges without changing the shape of their distributions. However, no significant difference was found when comparing the preliminary results of the models built with normalized and non-normalized variables. Therefore, the simpler approach of using the non-normalized values was taken to reduce model complexity.
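Standardization as described above can be sketched as follows (using the population standard deviation, an arbitrary choice here; the values are illustrative). For an unpenalized logistic regression, rescaling a predictor changes its coefficient but not the fitted probabilities, which is consistent with the lack of difference observed above.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-scores: (x - mean) / standard deviation.

    The result has mean 0 and standard deviation 1, but the shape of the
    distribution is unchanged (a skewed variable stays skewed).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


book_to_pickup_hours = [6, 20, 30, 48, 72, 200]  # illustrative values
z = standardize(book_to_pickup_hours)
print([round(v, 2) for v in z])
```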

3.3.3. Bootstrap Forest Predictor Screening and Stepwise Regression

After finalizing the set of attributes to be used in the model, the next step was to identify the features that best describe the probability of cancellation. The Predictor Screening algorithm, which is based on Random Forest models, is commonly used to rank attributes by their contribution to predicting the desired outcome (Biau, 2012). This algorithm was applied to the loads dataset to obtain the ranking (refer to Appendix F: Variables Prediction Rankings for detailed results).

To identify the optimal combination of features to include in a regression model, a forward stepwise regression algorithm was used. This algorithm is widely used to identify the best subset of features for a regression model. It builds a sequence of regression models by adding or removing features and measuring their impact on the adjusted R² (Kutner et al., 2004).
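The forward selection loop can be sketched as below. This is a simplified illustration under stated assumptions: it fits ordinary least squares (as the adjusted R² criterion implies), greedily adds the feature giving the best adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p predictors, and stops when no candidate improves it. The data and column names are hypothetical, and the solver is a bare Gaussian elimination without safeguards for singular matrices.

```python
def solve(a, b):
    """Solve a x = b with Gaussian elimination and partial pivoting (no singularity checks)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(features, y):
    """R^2 of an OLS fit of y on an intercept plus the given feature columns."""
    X = [[1.0] + [f[i] for f in features] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)
    ybar = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_stepwise(columns, y, max_features=5):
    """Greedily add the feature that most improves adjusted R^2; stop when none does."""
    chosen, best = [], float("-inf")
    while len(chosen) < max_features:
        scores = []
        for name in columns:
            if name not in chosen:
                trial = chosen + [name]
                r2 = r_squared([columns[c] for c in trial], y)
                scores.append((adjusted_r2(r2, len(y), len(trial)), name))
        if not scores:
            break
        score, name = max(scores)
        if score <= best:
            break
        chosen.append(name)
        best = score
    return chosen, best


# Hypothetical data: y depends mainly on x1.
columns = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 2, 1, 2, 1, 2, 1],
    "x3": [3, 7, 1, 8, 2, 6, 4, 5],
}
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]
print(forward_stepwise(columns, y, max_features=3))
```

Capping `max_features` mirrors the complexity tradeoff described next: the stopping rule here is "no improvement", while the study chose the cut-off by inspecting how little each extra feature added.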

The tradeoff between model improvement and complexity was assessed to select the features to be included in the model. According to the results of the forward stepwise regression (Figure 4), the sixth and subsequent features contribute very little to model accuracy. Adding them would increase the model’s complexity without a considerable improvement in predictive power. Accordingly, only the top five attributes were used in the model:

• Carrier City Bounce Ratio

• Book to Pick up Hours

• Carrier Equipment Type Bounce Ratio

• Contract-Spot

• Shipper Bounce Ratio

Figure 4: Forward Stepwise Regression algorithm’s results.


3.4. IMPROVE

3.4.1. Logistic Regression

Once the best features for predicting load cancellations were identified, the logistic regression model was built. First, all categorical features were transformed into dummy variables for use in the regression model. This step involved creating a new binary variable for each value of the categorical attributes, which is necessary for regression models. Then, the dataset was randomly split into training and testing samples. Following a common rule of thumb, the data were split as follows:

• Training set: 80% of the dataset

• Testing set: 20% of the dataset
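A minimal sketch of the dummy encoding and the random 80/20 split. The column name `contract_spot` and its levels are hypothetical stand-ins for the Contract-Spot attribute. The text describes one binary variable per category value; the sketch instead drops one reference level by default, a common choice when the model has an intercept, because keeping every level makes the dummies perfectly collinear with it.

```python
import random


def one_hot(rows, column, drop_first=True):
    """Replace a categorical column with 0/1 dummy columns, one per category value.

    With drop_first=True the first (sorted) level is the reference and gets no
    column, which avoids perfect collinearity with the model's intercept.
    """
    levels = sorted({row[column] for row in rows})
    kept = levels[1:] if drop_first else levels
    encoded = []
    for row in rows:
        new = {k: v for k, v in row.items() if k != column}
        for level in kept:
            new[f"{column}={level}"] = int(row[column] == level)
        encoded.append(new)
    return encoded


def train_test_split(rows, train_frac=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


loads = [
    {"contract_spot": "Contract", "miles": 420},
    {"contract_spot": "Spot", "miles": 150},
    {"contract_spot": "Contract", "miles": 900},
    {"contract_spot": "Spot", "miles": 60},
    {"contract_spot": "Contract", "miles": 310},
]
train, test = train_test_split(one_hot(loads, "contract_spot"))
print(len(train), len(test))  # 4 1
```

A fixed seed makes the split reproducible, which helps when comparing several models on the same testing set.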

The training set was used to fit the logistic regression and create the predictive model (de Menezes et al., 2017). Figure 5 shows the resulting model parameters, all of which are statistically significant at the 1% level (i.e., at a 99% confidence level).
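For illustration, a logistic regression models P(cancel | x) = 1 / (1 + e^-(b0 + b·x)). The pure-Python sketch below fits it by batch gradient ascent on the log-likelihood; the learning rate, epoch count and toy data are assumptions, and unlike the tool used to produce Figure 5 it does not compute standard errors or significance levels.

```python
from math import exp


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def predict_proba(b0, b, x):
    """P(cancel | x) = sigmoid(b0 + b . x)."""
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Maximum-likelihood fit by batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = yi - predict_proba(b0, b, xi)  # per-load gradient contribution
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


# Toy data: one feature (e.g. a bounce ratio) and a 0/1 cancellation label.
X = [[0.05], [0.10], [0.20], [0.30], [0.60], [0.70], [0.80], [0.90]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b = fit_logistic(X, y)
print(round(predict_proba(b0, b, [0.85]), 2))
```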

Figure 5: Model Parameters and characteristics

3.4.2. Model Accuracy Analysis

As mentioned, the training set was used to build the model and obtain its parameters. The testing set was used to validate the model’s performance on new data that were not used in the building process. Confusion matrices were created to measure how accurately the model predicts cancellations in the testing set (Fawcett, 2005).
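The two summary figures reported with the confusion matrices in the results are the error rate, (FP + FN) / total, and the share of missed bounces, FN / (FN + TP). A small helper, checked here against the counts of Table 5, might look like this:

```python
def confusion_counts(actual, predicted):
    """Return (TN, FP, FN, TP) for 0/1 labels, where 1 means the load was cancelled."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif a:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def error_rate(tn, fp, fn, tp):
    """Share of all loads that were misclassified."""
    return (fp + fn) / (tn + fp + fn + tp)


def missed_bounces(tn, fp, fn, tp):
    """Share of actual cancellations the model did not flag: FN / (FN + TP)."""
    return fn / (fn + tp)


table_5 = (652_501, 2_956, 129_727, 1_971)  # TN, FP, FN, TP from Table 5
print(f"{error_rate(*table_5):.2%} {missed_bounces(*table_5):.2%}")  # 16.86% 98.50%
```

A low error rate can coexist with almost all bounces being missed, because cancellations are only about a sixth of the loads; that is why both figures are reported.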


Finally, an additional three months of new data were requested. These data were used to test the model on unseen loads and to determine whether it was good enough to predict cancellations of new loads.

3.4.3. Machine Learning

A series of machine learning algorithms was also applied to the dataset to validate the results obtained with the logistic regression model. As referenced in the literature review section, the machine learning models used were:

• Neural Networks

• Random Forest

• K-Nearest Neighbors

The results of these models were compared with those of the logistic regression model to verify its robustness. All models gave equivalent results under the same assumptions. Detailed results are discussed in the results section.
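Of the three algorithms, K-Nearest Neighbors is the simplest to sketch. The version below is illustrative only (majority vote, Euclidean distance, hypothetical two-feature inputs). Because it is distance based, unlike logistic regression it is sensitive to feature scales, so its inputs would normally be standardized first.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=5):
    """Label x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: [carrier-city bounce ratio, standardized book-to-pickup hours]
train_X = [[0.10, -1.0], [0.15, -0.8], [0.20, -0.9], [0.70, 1.1], [0.80, 0.9], [0.85, 1.2]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [0.75, 1.0], k=3))  # 1
```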

3.5. CONTROL

3.5.1. Sensitivity Analysis

The last step of the analysis was a sensitivity analysis that tested the model under different assumptions and circumstances. A logistic regression threshold analysis was conducted to measure the impact of the threshold selection on the model’s robustness. This analysis covered the model’s accuracy, specificity, sensitivity, precision and recall.


4. Results

Multiple models were developed to predict the probability of load cancellation using the available

dataset. Logistic regression and machine learning models were initially based on the attributes available in

the original dataset along with simple calculated fields to assess their predictive power. The dataset was

later enriched by introducing fields that represent cancellation behaviors of carriers and by linking new data

sources. This section presents the results obtained by the predictive models on each dataset and the approach

followed to confirm the results.

4.1. Available Dataset

A logistic regression model was initially developed using the load-level data aggregated from the provided stop-level data (for details about the list of variables, refer to Appendix F: Variables Prediction Rankings). Simple calculated fields were added to the dataset (such as load weekday and month, load duration, and days and hours between booking and load pickup) to convert date fields into numerical variables that can be used in the model. The results showed that the available attributes did not have enough predictive power to predict carrier load cancellations. Table 5 shows the confusion matrix of the logistic regression model: only 1.5% of the cancellations were predicted correctly.

Table 5: Logistic regression confusion matrix on the available dataset

               Predicted No    Predicted Yes    Total
Actual No      652,501         2,956            655,457
Actual Yes     129,727         1,971            131,698
Total          782,228         4,927            787,155

Error: 16.86%
Missed Bounces: 98.50%

Machine learning models were also developed to confirm the results obtained by the logistic regression model. Neural Networks, Random Forest and K-Nearest Neighbor algorithms were applied to the same dataset, and their performance was compared with that of the logistic regression. K-Nearest Neighbor detected more cancellations than the other models (it missed 84.44% of the bounces, versus more than 99% for the others), but its overall error was higher. This difference reflects a tradeoff between True Positives and False Positives. However, compared with random prediction, all algorithms performed poorly in predicting carrier load cancellation. The results of the machine learning algorithms are summarized in Table 6.


Table 6: Machine learning models’ results on the available dataset

Model                 Error     Missed Bounces
Neural Networks       16.73%    99.95%
Random Forest         16.61%    99.48%
K-Nearest Neighbor    19.90%    84.44%

4.2. Enriched Dataset

Since none of the variables in the provided dataset were good predictors of load cancellation, an effort was made to enrich the dataset with more information (for details about the list of variables, refer to Appendix F: Variables Prediction Rankings). This effort focused on historical bounce ratios for selected characteristics, to test whether these ratios would better predict cancellations. A bounce ratio is the percentage of cancelled loads out of the total loads for a given characteristic.

Bounce ratios were calculated for single characteristics (such as carrier, shipper, city and state) as well as for combined characteristics (such as carrier-city, carrier-contract type and carrier-equipment type). These ratios were computed over the three-year dataset and added as variables to the load information. In addition, weather alert information was obtained from the National Centers for Environmental Information (NCEI) and added to the data to test the impact of weather severity on cancellation decisions.

The three-year dataset was split into modeling and testing subsets, so that the logistic regression model could be developed on one subset and its performance tested on the other. Initially, the carrier-city bounce ratio appeared to be a good predictor of load cancellation, and the model was able to correctly predict more than 60% of the cancelled loads. Table 7 shows the confusion matrix for the new logistic regression model.

Table 7: Logistic regression confusion matrix on the enriched dataset

               Predicted No    Predicted Yes    Total
Actual No      638,652         16,880           655,532
Actual Yes     52,155          79,468           131,623
Total          690,807         96,348           787,155

Error: 8.77%
Similar improvements in cancellation prediction were also achieved with the other machine learning models tested earlier, as shown in Table 8.

Table 8: Machine learning models’ results on the enriched dataset

Model                 Error     Missed Bounces

However, this improvement had to be interpreted carefully, as the most significant variables in these models were the newly introduced bounce ratios. These bounce ratios were calculated over the whole three-year dataset. This may have biased the results, because the actual bounces of the evaluated loads were already incorporated into those ratios (a form of data leakage). This possible effect required testing the model on a new dataset that was used neither in training the model nor in calculating the ratios.
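One way to avoid this kind of bias is to compute each load's ratio only from loads booked before it. The sketch below does that for the carrier-city ratio; the tuple layout is hypothetical, and the `min_history` default of 10 simply mirrors the minimum-history filter used in the hypothesis tests later in the results. It is not how the ratios in this study were computed.

```python
from collections import defaultdict


def historical_ratios(loads, min_history=10):
    """Carrier-city bounce ratio for each load, using only loads booked earlier.

    `loads` holds (booking_time, carrier, city, cancelled) tuples (hypothetical layout).
    Results follow chronological order; None means fewer than `min_history` earlier loads.
    """
    booked = defaultdict(int)
    bounced = defaultdict(int)
    ratios = []
    for _time, carrier, city, cancelled in sorted(loads, key=lambda r: r[0]):
        key = (carrier, city)
        n = booked[key]
        ratios.append(bounced[key] / n if n >= min_history else None)
        # Counts are updated only after the ratio is recorded, so a load's own
        # outcome never feeds into its own predictor.
        booked[key] += 1
        bounced[key] += cancelled
    return ratios


history = [(1, "C1", "Dallas", 1), (2, "C1", "Dallas", 0),
           (3, "C1", "Dallas", 0), (4, "C1", "Dallas", 1)]
print(historical_ratios(history, min_history=2))  # [None, None, 0.5, 0.3333333333333333]
```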

A new dataset covering three months of loads was provided and used to test the model’s performance. The model’s predictive capability dropped significantly when tested on the new data (Table 9), which confirmed the initial concern about bias in the bounce ratios. The results showed that neither load characteristics nor historical cancellation patterns provide sufficient information to correctly predict the cancellation probability of future loads.

Table 9: Logistic regression confusion matrix on the new dataset

               Predicted No    Predicted Yes    Total
Actual No      59,883          3,735            63,618
Actual Yes     8,903           1,722            10,625
Total          68,786          5,457            74,243

Error: 17.02%
The same approach was followed to confirm the results with machine learning algorithms. The same algorithms used earlier (Neural Networks, Random Forest and K-Nearest Neighbor) were trained on the previous three years of data and tested against the new three-month data. All models performed poorly and did not provide sufficient accuracy to be considered (Table 10). The similarity of results across the different models led to the conclusion that the drivers of carrier load cancellations are not explained by any of the existing or added data.

Table 10: Machine learning models’ results on the new dataset

Model                 Error     Missed Bounces
4.3. Unpredictability Testing

The results obtained by all the models suggested that the available data are not sufficient to predict carrier load cancellations. They also showed that none of the machine learning models outperformed the logistic regression model. Accordingly, further tests were carried out with the logistic regression model because of its interpretability and ease of development.


Further testing was conducted to confirm the conclusion reached from the models’ results. Two main hypotheses were developed and tested:

1. Prediction accuracy will improve if the model is only applied to loads with enough historical data. As the main predictors in the developed model were the calculated bounce ratios, the model is not expected to perform well when no previous ratios exist. Accordingly, the new dataset was filtered to include only loads with at least 10 previous records from which to compute a ratio. The model was tested on the filtered data, and the prediction accuracy was again not within an acceptable range (Table 11): the model predicted fewer than 20% of the cancellations.

Table 11: Confusion matrix for loads with at least 10 previous records

               Predicted No    Predicted Yes    Total
Actual No      21,449          368              21,817
Actual Yes     2,222           542              2,764
Total          23,671          910              24,581

Error: 10.54%

2. Prediction accuracy will improve if the model is only applied over shorter time horizons. Since cancellation ratios are dynamic and change over time, the data were filtered to include only the first seven days of the new dataset. This filter was intended to exclude the uncertainty of any new cancellation patterns that might have developed in the newer loads. The resulting prediction accuracy was not acceptable for this filtered dataset either, as the model predicted only 20% of the cancellations (Table 12):

Table 12: Confusion matrix for loads in the first seven days of the new dataset

               Predicted No    Predicted Yes    Total
Actual No      2,147           31               2,178
Actual Yes     176             44               220
Total          2,323           75               2,398

Error: 8.63%

The low accuracy obtained from the model showed that neither hypothesis held: predictive power could not be improved with better historical data. This supported the conclusion that cancellations cannot be predicted with the available data. Such poor accuracy suggests that cancellations are either random events or caused by other factors that are not captured by the studied dataset.


4.4. Further analysis

Additional tests were performed to further validate the lack of predictive power of the available data. The following scenarios were tested:

• Cluster the data using specific attributes (miles, costs, book to pick up hours), and build

different models for each cluster. The objective was to test if cancellation behaviors differ for

different clusters that could not be captured by a single model.

• Reduce dimensionality of data using principal component analysis and use the most descriptive

principal components as an input for the machine learning algorithms. Then, validate if those

principal components could be used to build a better model.

• Develop a model that focuses only on two-stop loads (single pickup and single drop), excluding loads with multiple stops. The objective was to eliminate possible factors affecting multiple-stop loads that were not captured in the data.

• Consider only the cancelled record for each cancelled load. The data had the same load repeated

for each cancellation and for the actual completed load. By only keeping the cancelled records,

the model can better capture the cancellation patterns.

• Build a model that relies only on time-related attributes (“Month”, “Day of the week” and “Book to Pickup Hours”) and assess their impact on load cancellations.

All these additional scenarios were tested by developing multiple models and comparing their results with the previously developed model. None of the scenarios showed a significant improvement over the original logistic regression model. A summary of the results is shown in Table 13.


Table 13: Additional analyses results compared to the results obtained for the logistic regression.

Test                                                                              Error     Missed Bounces
Logistic Regression (Threshold=0.5) - Base Scenario                               17.02%    83.79%
Cost Clustering
    Low Cost (<= $500)                                                            18.20%    99.06%
    Mid Cost                                                                      16.67%    98.46%
    High Cost (>= $6000)                                                          8.49%     100.00%
Miles Clustering
    Same day delivery (<= 250 mi)                                                 16.07%    99.18%
    Next Day delivery                                                             18.08%    98.18%
    Long Haul (>= 550 mi)                                                         18.08%    98.18%
Book To pickup Hours Clustering
    Less than 24h                                                                 8.53%     100.00%
    Between 24h and 48h                                                           16.91%    100.00%
    Between 48h and 72h                                                           20.58%    99.99%
    More than 72h                                                                 22.33%    99.58%
PCA Analysis
    K-Nearest Neighbors                                                           17.49%    82.50%
One record per load (first cancellation) and only two stops scenario              17.07%    99.74%
Model based on time attributes (month, day of the week, book to pick-up hours)    17.03%    100.00%

4.5. Logistic regression Threshold Sensitivity Analysis

The final test was a sensitivity analysis of the logistic regression threshold, which converts the model’s numerical output (a probability) into a binary prediction. The base analysis used a threshold of 0.5: if the model output is between 0.5 and 1, the load is predicted as cancelled (a bounce), while if it is between 0 and 0.5, the load is predicted as not cancelled.

Reducing the threshold to the average cancellation ratio (17%) might improve the model’s ability to predict cancellations. This reduction would improve the prediction of cancelled loads by decreasing the number of False Negatives (wrongly predicted as not cancelled) and increasing the number of True Positives (correctly predicted as cancelled). However, it would also increase the number of False Positives (wrongly predicted as cancelled) and affect the overall model accuracy.
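The threshold sweep can be sketched as a function that recomputes the confusion-matrix counts and the usual rates for each candidate threshold. The probabilities below are made up, and a load is treated as cancelled when its predicted probability is at least the threshold (an assumption about how values exactly at the threshold are handled).

```python
def metrics_at(probs, actual, threshold):
    """Confusion-matrix counts and rates when loads with p >= threshold are flagged."""
    tp = sum(1 for p, a in zip(probs, actual) if p >= threshold and a)
    fp = sum(1 for p, a in zip(probs, actual) if p >= threshold and not a)
    fn = sum(1 for p, a in zip(probs, actual) if p < threshold and a)
    tn = len(actual) - tp - fp - fn
    return {
        "threshold": threshold,
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(actual),
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # also called sensitivity
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def sweep(probs, actual, thresholds):
    """Evaluate the same predicted probabilities at several thresholds."""
    return [metrics_at(probs, actual, t) for t in thresholds]


# Made-up model outputs and outcomes (1 = cancelled).
probs = [0.10, 0.20, 0.30, 0.60, 0.40, 0.80]
actual = [0, 0, 1, 1, 0, 1]
for row in sweep(probs, actual, [0.5, 0.17]):
    print(row["threshold"], row["fn"], row["fp"], round(row["accuracy"], 2))
```

Because the model's probabilities are computed once, the sweep is cheap: only the counting is repeated for each threshold.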


Such a tradeoff is common in medical diagnosis, where a threshold determines whether a disease is diagnosed, as shown in Figure 6:

Figure 6: Threshold tradeoff curves. Distributions of test results for people with a disease (comparable to cancellations) and people without the disease. Decreasing the criterion value (threshold) increases the True Positives but also increases the False Positives. Because the two distributions overlap, any change in the criterion value is always a tradeoff (Source: MEDCALC)

So, if the threshold (criterion value in Figure 6) is moved to the left (decreased) to reduce the number of False Negatives (FN), the number of False Positives (FP) will inevitably increase. Given that, for 3PL companies, failing to predict actual cancellations (FN) is worse than predicting uncancelled loads as cancelled (FP), this could be a strategy worth considering.

However, another aspect to consider in this tradeoff between FN and FP is the rate of change of each. If the tradeoff were proportional, the approach could be justified, as it reduces the error that matters most to the company. If the relationship between the two errors is not proportional, on the other hand, it is important to understand the impact of the tradeoff on the overall results. Therefore, it is necessary to assess how the two errors (FP and FN) change as the threshold changes.

To verify how the model accuracy changes with the threshold, a sensitivity analysis was conducted. The analysis assessed the rate of change of both FN and FP as the threshold is modified. The actual numbers of FN and FP loads were recorded to identify the operational impact of this tradeoff. Finally, recall, precision, sensitivity, specificity and the Receiver Operating Characteristic (ROC) curve were plotted to provide a broader understanding of how threshold changes affect the model. Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 show the results obtained:


Figure 7: Relative variation of FN and FP compared to the base value of loads for a threshold of 0.5 and model accuracy evolution

for different threshold values

Figure 8: Total number of FN and FP loads over the new testing data set and bounces predicted correctly (as % of total bounces)

for different threshold values

[Figure 7 chart: “Relative Variation of FP and FN”; horizontal axis: Threshold (0 to 0.60); vertical axes: % Variation from Base (Threshold=0.5) and Accuracy (%); series: TN, FN Relative Variation, FP (Missed Not Bounces).]

[Figure 8 chart: horizontal axis: Threshold (0 to 0.60); vertical axes: Loads and % of Bounces predicted correctly of total Bounces; series: TP, FN (Missed Bounces), Model Accuracy.]

Figure 9: Recall vs Precision (Recall = TP/(TP+FN) and Precision = TP/(TP+FP))

Figure 10: Sensitivity vs Specificity (Sensitivity = TP/(TP+FN) and Specificity =TN/(TN+FP))

[Figure 9 and Figure 10 charts: “Recall vs Precision” (series: Recall, Precision, Model Accuracy) and “Sensitivity vs Specificity” (series: Sensitivity, Specificity, Model Accuracy), each plotted as a percentage against Threshold (0 to 0.60).]

Figure 11: Receiver Operating Characteristic (ROC) Curve illustrating the ability of the classifier as the threshold changes

As observed in the charts, reducing the threshold decreases the number of FN, which improves the prediction of cancelled loads. However, FP increase at a much higher rate. For a threshold of 0.17 (the average cancellation ratio), FN decreased 31% compared with the base value obtained with a 0.5 threshold, while FP increased 244%. In terms of total loads, the threshold reduction allowed an additional 2,743 cancelled loads to be predicted correctly, while an additional 9,130 loads were wrongly predicted as cancelled.
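These relative changes follow from the base counts at the 0.5 threshold on the new three-month dataset (Table 9). The arithmetic, assuming that each additional correct prediction removes exactly one false negative (the number of actual cancellations does not change), is:

```python
# Counts at the 0.5 threshold on the new three-month dataset (Table 9).
BASE_FN, BASE_FP, BASE_TP, ACTUAL_BOUNCES = 8_903, 3_735, 1_722, 10_625
# Changes reported when the threshold is lowered to 0.17.
EXTRA_TP, EXTRA_FP = 2_743, 9_130

fn_change = -EXTRA_TP / BASE_FN          # every extra TP is one fewer FN
fp_change = EXTRA_FP / BASE_FP
caught = (BASE_TP + EXTRA_TP) / ACTUAL_BOUNCES
print(f"FN {fn_change:+.0%}, FP {fp_change:+.0%}, cancellations caught {caught:.0%}")
# FN -31%, FP +244%, cancellations caught 42%
```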

Based on these results, the model could potentially be used with a new threshold, but that change would not improve the model. A new threshold would merely reduce the error most relevant to the company while increasing the other error at a much higher rate.

[Figure 11 chart: “ROC Curve”, Sensitivity versus 1-Specificity (0% to 100%), with a 45° reference curve.]

5. Discussion

As presented in the results section, the studied dataset is not sufficient to predict the probability of a load being cancelled. Neither the load characteristics nor the carriers’ behavior over time give enough insight into the main drivers of cancellations.

Changing the threshold to improve the prediction of cancelled loads could be an option if the company is willing to accept the tradeoff between False Positives and False Negatives explained above. However, this approach would reduce the model’s overall accuracy, reducing one type of error by increasing the other at a much higher rate.

Based on this assessment, there are three potential approaches to address the cancellation issue, which could be pursued simultaneously.

5.1. Threshold Reduction

One alternative, based on the results obtained, is to use the model with a lower threshold. With a threshold of 0.17 (the average bounce ratio), the number of missed cancellations (FN) decreases by 31%, and the model correctly predicts 42% of the cancelled loads. However, the cost of this alternative is a much larger relative increase (244%) in the number of loads wrongly predicted as cancelled. The over-cost associated with this tradeoff must be assessed to evaluate its applicability. Loads predicted to be cancelled will most probably need a contingency plan or more supervision than the rest of the loads. By reducing the threshold, the company will correctly predict more cancelled loads, but it will need to monitor, or have contingency plans for, a much larger number of loads. The number of flagged loads changes at a ratio of about 4.3 to 1: to predict one additional cancelled load correctly, about 4.3 additional loads will be predicted as cancelled ((2,743 + 9,130) / 2,743 ≈ 4.33).

The decision of whether to use the model under this tradeoff would most probably depend on the company’s strategy and cost structure. Considering the average cost per cancellation ($145), changing the threshold is only justified if the cost of the added errors to the company’s operations is lower than the savings. Accordingly, changing the threshold is feasible only if all the actions taken to control those additional 4.3 loads do not cost more than $145 (the average potential saving from avoiding a cancellation). This means that $33.5 per load is the maximum additional cost for developing contingency plans or tracking loads predicted to be cancelled.
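The break-even figure can be reproduced from the counts in the results section. The sketch assumes, as the text does, that every correctly predicted cancellation saves the full average cost of $145.

```python
EXTRA_TP, EXTRA_FP = 2_743, 9_130   # threshold lowered from 0.5 to 0.17
AVG_CANCELLATION_COST = 145.0       # average saving per avoided cancellation, USD


def break_even(extra_tp, extra_fp, saving_per_cancellation):
    """Return (flagged loads per extra cancellation caught, max cost per flagged load)."""
    flags_per_catch = (extra_tp + extra_fp) / extra_tp
    return flags_per_catch, saving_per_cancellation / flags_per_catch


ratio, max_cost = break_even(EXTRA_TP, EXTRA_FP, AVG_CANCELLATION_COST)
print(f"{ratio:.2f} loads flagged per extra catch; break-even ${max_cost:.2f} per load")
# 4.33 loads flagged per extra catch; break-even $33.50 per load
```

If only part of a cancellation's cost can actually be avoided once it is predicted, `saving_per_cancellation` should be reduced accordingly, which lowers the break-even cost per flagged load.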

5.2. Further Research

Another approach is to do further research on the main drivers of cancellations and to capture more information about these causes. As seen in the results, cancellations are mostly related to the carrier and the cities in the load route. However, cancellation behavior does not remain constant over time, so the causes are probably diverse. Surveys of existing carriers could be conducted to capture the range of reasons for cancellations. In addition, the actual cause of each new cancellation could be recorded to understand the details and drivers in each case. The survey may include potential root causes that can be categorized and studied in the future. Moreover, transactional data related to the captured causes should be recorded with the load information to facilitate future analyses. Once a considerable number of surveys has been collected, further research can be done to build better models for predicting load cancellations.

Potential operational details that could be collected and may be useful for predicting carrier load cancellations include:

• Information regarding the sequence of back-to-back loads that use the same truck

• Information regarding the carrier’s available capacity on the load day

• Information regarding drivers’ behavior

However, the challenge is that such information is not fully captured by any single 3PL company, because carriers interact with multiple shippers and 3PLs at the same time. A delay on any of their loads (not necessarily booked by the same company) may cause the cancellation of another load. Moreover, carriers may overbook their capacity to reduce their risk of underutilization, and then pick the loads that are more profitable and convenient. This may also cause cancellations of the 3PL company’s loads, even if the company is not booking that carrier’s full capacity. Therefore, capturing such information will be a challenge, as it is available not at the company level but at the industry level.

5.3. Business Strategy

Another approach that can help reduce the impact of the cancellations is to redefine parts of the company’s booking process. Currently, carriers face no defined consequences for cancellations. This may incentivize carriers to overbook their capacity and later cancel the “least attractive” options. As the main drivers of carrier cancellations have not been identified, a possible way to increase carriers’ commitment is to include penalty clauses in the contracts. These penalties can be either financial or a reduction in the business awarded to the carrier.

Companies can also develop carrier ratings based on cancellation ratios. These ratings can be used in future contract negotiations and performance reviews. Making carriers aware of the consequences of cancellations might help reduce these cancellations, or encourage carriers to give longer notice when a cancellation is inevitable.
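A carrier rating of this kind could start from each carrier's share of bounced bookings. The following is a minimal sketch. It assumes bookings are available as `(carrier_id, is_bounced)` pairs. The rating bands, the grade letters and the minimum-volume cut-off are all hypothetical choices for illustration, not values from this research.

```python
from collections import Counter


def carrier_cancellation_ratios(bookings, min_bookings=1):
    """Bounced share of each carrier's bookings.

    bookings: iterable of (carrier_id, is_bounced) pairs, one per booking.
    Carriers with fewer than min_bookings bookings are left out, because a
    ratio computed on very few bookings is unreliable.
    """
    totals, bounces = Counter(), Counter()
    for carrier_id, is_bounced in bookings:
        totals[carrier_id] += 1
        bounces[carrier_id] += int(bool(is_bounced))
    return {c: bounces[c] / n for c, n in totals.items() if n >= min_bookings}


# Hypothetical rating bands: (upper bound on cancellation ratio, grade).
BANDS = ((0.02, "A"), (0.05, "B"), (0.10, "C"))


def rate_carriers(ratios, bands=BANDS):
    """Assign the first grade whose upper bound the ratio does not exceed, else 'D'."""
    return {c: next((grade for limit, grade in bands if r <= limit), "D")
            for c, r in ratios.items()}


bookings = ([("C1", False)] * 99 + [("C1", True)]
            + [("C2", True)] * 3 + [("C2", False)] * 17
            + [("C3", True)] + [("C3", False)] * 24
            + [("C4", True)])
ratios = carrier_cancellation_ratios(bookings, min_bookings=10)
print(rate_carriers(ratios))  # {'C1': 'A', 'C2': 'D', 'C3': 'B'}
```

The minimum-volume filter keeps carrier C4, with a single booking, from being graded on one observation. Where to set that cut-off would be a business decision.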


6. Conclusion

Load cancellations are a common issue within the trucking industry. The cost impact of cancelled loads on companies represents a significant amount of money every year. This extra cost is associated with finding a replacement carrier on the spot market and with lost productivity in the rebooking process.

3PL companies have access to a huge amount of detailed transactional data generated by their operations. Descriptive analysis of these data revealed some factors that may impact the probability of load cancellations. However, those factors (characteristics) were not enough to build a predictive model with considerable accuracy.

A tradeoff can improve the model by reducing the logistic regression’s threshold at which a load is predicted as cancelled. This tradeoff improves the prediction of cancelled loads but, at the same time, predicts more uncancelled loads as cancelled. With this approach, the model can predict at a decent level of accuracy and sensitivity. However, the tradeoff also decreases the model’s specificity. At a much higher rate, it increases the number of loads wrongly predicted as cancelled. This would affect the company’s operation, as the number of loads that must be monitored would increase at a very high rate.
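The effect of lowering the threshold on sensitivity and specificity can be illustrated on a toy example. The probabilities below are hypothetical stand-ins for the logistic regression’s output, and the helper names are illustrative. The sketch only shows how the three metrics are computed from predictions at two thresholds.

```python
def predict(probabilities, threshold):
    """Flag a load as cancelled when its predicted probability reaches the threshold."""
    return [p >= threshold for p in probabilities]


def evaluate(actual, predicted):
    """Accuracy, sensitivity (share of cancelled loads caught) and specificity."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    fp = sum(not a and p for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "flagged": tp + fp,  # loads that would need monitoring
    }


# Hypothetical model outputs and actual outcomes (True = cancelled).
probabilities = [0.90, 0.40, 0.30, 0.20, 0.35, 0.10, 0.05, 0.60]
actual = [True, True, False, False, True, False, False, False]

for threshold in (0.5, 0.25):
    print(threshold, evaluate(actual, predict(probabilities, threshold)))
```

In this toy data, moving the threshold from 0.5 to 0.25 raises sensitivity from 1/3 to 1. Specificity falls from 0.8 to 0.6, and the number of flagged loads grows from 2 to 5. This is the same direction of change described above.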

Based on these results, different approaches can be followed to lower the impact of load cancellations. First, the company can use the model with a lower threshold and accept the high number of False Positives in order to correctly predict up to 42% of the cancelled loads. Another alternative is to improve the model by surveying the carriers to identify the causes of cancellations. Surveys can identify the root causes and enable building appropriate metrics to capture them. Once the data are captured, they can be incorporated into the predictive model to improve its accuracy. One last alternative is to adopt business measures that discourage carriers from cancelling loads, or that encourage them to give longer notice when cancellation is inevitable.

Finally, the trucking industry is known to be very complex, with many stakeholders that are not necessarily interconnected. This complexity might limit the possibility of building a good predictive model using only the company’s data. As carriers work with many shippers and brokers at the same time, cancellations might be a consequence of delays or cancellations of other loads that are not managed by the same company. As a result, companies will always have a limited view of the factors that might impact load cancellations, which hinders their ability to build a sound predictive model.


7. Reference List

Alcoba, R. D., Ohlund, K. W. (2017). Predicting On-time Delivery in the Trucking Industry. Massachusetts Institute of Technology, Center for Transportation and Logistics.

Akinwande, M. O. (2015). Variance Inflation Factor: As a Condition for the Inclusion of Suppressor Variable(s) in Regression Analysis. Open Journal of Statistics, 5, 754-767. Retrieved from http://www.scirp.org/

American Trucking Associations (2017). Reports, Trends & Statistics. Retrieved from http://www.trucking.org/

AT Kearney (2016). CSCMP’s Annual State of Logistics Report. Logistics in Transition: New Drivers at the Wheel. Retrieved from https://www.lee-associates.com/

Biau, G. (2012). Analysis of a Random Forests Model. Journal of Machine Learning Research, 13, 1063-1095. Retrieved from http://www.jmlr.org/

U.S. Department of Transportation Bureau of Transportation Statistics (2015). Freight Facts and Figures. Retrieved from https://www.bts.gov/

de Menezes, F. S., Liska, G. R., Cirillo, M. A., Vivanco, M. J. F. (2017). Data classification with binary response through the Boosting algorithm and logistic regression. Expert Systems with Applications, 69, 62-73. Retrieved from https://www.journals.elsevier.com/

Fawcett, T. (2005). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861-874. Retrieved from https://www.journals.elsevier.com/

Kotsiantis, S. B., Zaharakis, I. D., Pintelas, P. E. (2007). Machine learning: a review of classification and combining techniques. Artificial Intelligence Review, 26, 159-190. DOI 10.1007/s10462-007-9052-3

Kutner, M. H., Nachtsheim, C. J., Neter, J. (2004). Applied Linear Regression Models. New York, NY: McGraw-Hill Irwin. ISBN 0-07-238688-6

Markham, K. (2015). Comparing supervised learning algorithms. Retrieved from http://www.dataschool.io/

MEDCALC easy-to-use statistical software (2018). ROC curve analysis. Retrieved from https://www.medcalc.org/

National Centers for Environmental Information (2017). Storm Events Database: years 2015, 2016 and 2017. Retrieved from https://www.ncdc.noaa.gov/

Olson, D. L., Wu, D. (2017). Predictive Data Mining Models (1st ed.). Singapore: Springer. DOI 10.1007/978-981-10-2543-3

Pillet, M. (2004). Six Sigma Comment l’Appliquer. Paris: Éditions d’Organisation. ISBN 2-7081-3029-3

Potter, K. (2006). Methods for Presenting Statistical Information: The Box Plot. University of Utah School of Computing.

Thompson, W. R. (2009). Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis (Thesis). Florida State University College of Arts and Sciences.

U.S. Department of Transportation Federal Highway Administration (2003). Commercial Vehicle Weight Standards. Retrieved from https://ops.fhwa.dot.gov/


8. Appendices

Appendix A: Project Gantt

This appendix shows the project plan that was followed during this research:

Figure A 1: Project Gantt. This Gantt chart provides detailed information on the plan followed to complete the project.

[Figure A 1 chart content: Gantt chart with columns Action Plan, Activity and Month (September to May, weeks 1 to 4 of each month), organized by the DMAIC phases Define, Measure, Analyze, Improve and Control. Activities: Kick-off meeting; Process mapping; Variables listing and data review; Literature review; Problem modeling; Data cleaning; Data analysis and relevant variable definition; Research Expo; Variable selection; Modeling; Risk analysis; Model testing (scenario analyzing); Financial analysis improvement with new model; Recommendations/sensitivity analysis; Capstone project writing and presentation.]

Appendix B: Cancellation Causes Brainstorming

This appendix shows the list of variables identified in the brainstorming process as potential

attributes for the model:

Figure B 1: Diagram of potential variables impacting the cancellation probability of a certain load

[Figure B 1 diagram content: root node “Cancellations”, with top-level branches Load Impact, Shipper Impact, Carrier Impact and Other Impacts. Sub-branches: Carrier History Impact, Carrier Characteristics Impact, Carrier Issues Impact, Shipper Characteristics Impact, Shipment History Impact, Facility Impact, Internal Factors Impact, External Factors Impact, Price Impact, Load Characteristics Impact, Trip Characteristics Impact, Contract Characteristics Impact. Candidate variables: Carrier Size, Carrier Type, Loads/Year, Bounce/Carrier, Carrier ID, Carrier Length of Relationship, Safety Rating, Number of Claims/Incidence, Carrier Rep, Rep Tenure, Carrier Conference, Shipper ID, Facility, Industry, Shipper Length of Relationship, Shipper Size, Facility Dwell Time, Shipments/Year, Weather, Natural Disaster, Geography, Day of the Week, Book Time, Load Time, Load ID, Origin, Destination, Number of Stops, Load Cost, Load Rate, Spot Price, Appointment Type, Lead Time, Empty Time, High Risk, High Value, Book Lead Time, Service Level, On-Time Delivery, On-Time Pick Up, Equipment Type, Dead Head, Length of Haul, Duration, Weight, Loading Time, Unloading Time, Contract Type, Load Changes.]

Appendix C: Data Glossary

This appendix lists all the attributes included in the dataset provided by the company, along with a brief description:

Table C 1: Dataset Glossary

| Field | Description | Values |
|---|---|---|
| Loadid | ID of the load. | |
| LoadDate | Date on which the load is to be executed. | |
| EquipmentType | Type of truck required. | V = Van; R = Refrigerated |
| CustomerID | Shipper ID. | |
| Industry | Shipper industry type. | |
| Conference | Carrier reps team in the 3PL company. | Conferences are limited to the USA. |
| Team | Expedited service: two drivers executing one load. | |
| HighValue | Indicates whether the load is high value, which means the carrier needs higher-value insurance to execute the load. | Loads above $100k in product value are considered high value. |
| Miles | Total miles for the load trip. | |
| NumStops | Number of stops on a load. A load can be multi-pick or multi-stop. | |
| LoadStopID | ID for each unique stop. | |
| Type | Stop type: pickup or delivery. | 1 = Pickup; 2 = Delivery |
| Sequence | Order of the stops. | |
| FacilityID | Pickup facility ID. | |
| CityName | City name of the pickup facility. | |
| StateCode | State code of the city of the pickup facility. | |
| ZipCode | Zip code of the city of the pickup facility. | |
| Contract-Spot | Specifies whether the load is booked at a spot price or a contract price. This reflects the latest status of the load. | C = Contract; S = Spot |
| Weight | Weight of the load. | Up to 45 tons. |
| Pallets | Number of pallets in the load. | Up to 48 pallets. |
| Code | Type of appointment. | Appt = Fixed date and time; Notice = Shipper informs the carrier; Open = Carrier can arrive at any time within a certain period; None = No appointment defined |
| Appt | Date and time of the pickup. | |
| ArrivedAtFacility | Actual date and time of arrival at the facility. | |
| DepartedFacility | Actual date and time of departure from the facility. | |
| Cost | Cost of the load. | |
| FacilityOpenTime | Time the pickup facility opens on the pickup day. | FacilityOpenTime = FacilityCloseTime represents 24h operation. |
| FacilityCloseTime | Time the pickup facility closes on the pickup day. | FacilityOpenTime = FacilityCloseTime represents 24h operation. |
| LoadRank | Contract carriers are broken down into Primary, Backup and Secondary. | Primary = 1st carrier the load was tendered to; Backup = Backup carrier; Secondary = 2nd carrier the load was tendered to (if both primary and backup do not show up); Spot = Carrier from the spot market |
| CarrierID | ID of the carrier. | |
| BookByDateTime | Date when the load was assigned to the carrier. | Blank means the load was not covered. |
| CarrierType | Lists each carrier on the load and whether it bounced the load or actually hauled the freight. | Bounced = Carrier cancelled the load; Actual = Carrier hauled the load |
| IsBounced | Indicates whether the load was bounced. | |
| BouncedDateTime | Date the bounce was recorded. | |
| MaxPay | Maximum allowed payment for the carrier. | MaxPay is sometimes open (no value) because the freight is a “must move” and has to go, no matter the cost. |
| EmptyTime | Date and time at which the carrier’s truck is expected to be empty, as stated when the carrier is booked on the load. | |
| ExpectedEmptyLocation | City and state where the truck is expected to be empty when the carrier is booked on the load. | |
| DeadHeadMiles | Miles expected to be driven with an empty truck, as of when the carrier is booked on the load. | |
| MarketCost | Spot market cost for the booked load. | |
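The glossary encodes 24-hour operation as FacilityOpenTime equal to FacilityCloseTime. A feature such as “is the facility open at the appointment time” therefore needs that special case. The following is a minimal sketch with hypothetical function names. The glossary does not specify how closing times past midnight are handled, so this code treats a close time earlier than the open time as an overnight window. That treatment is an assumption.

```python
from datetime import time


def is_24h(open_time: time, close_time: time) -> bool:
    """Per the glossary, FacilityOpenTime == FacilityCloseTime means 24h operation."""
    return open_time == close_time


def facility_open_at(t: time, open_time: time, close_time: time) -> bool:
    """Whether a pickup facility is open at time t on the pickup day."""
    if is_24h(open_time, close_time):
        return True
    if open_time < close_time:  # same-day window, e.g. 08:00 to 17:00
        return open_time <= t < close_time
    # Assumed: a close time earlier than the open time wraps past midnight.
    return t >= open_time or t < close_time


print(facility_open_at(time(3, 0), time(0, 0), time(0, 0)))    # True (24h)
print(facility_open_at(time(7, 30), time(8, 0), time(17, 0)))  # False
print(facility_open_at(time(23, 0), time(20, 0), time(6, 0)))  # True
```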


Appendix D: Data Cleaning Process

This appendix provides further details on the cleaning process performed on the data. The data received were first cleaned to obtain a valid dataset that could be used for modeling. The main cleanup steps were:

• Remove all records where there is a mismatch between IsBounced and CarrierType fields

(IsBounced=False and CarrierType=Bounced) - Removed Records = 774,587

• Remove all records where Type has a wrong value (Type must be 1=PickUp or 2=Delivery) -

Removed Records = 1,873

• Remove all records where cost is less than 0 - Removed Records = 107

• Remove all unique load bookings (Loadid, CarrierID, IsBounced, BouncedDate combinations)

where no record exists for the first stop (Sequence=1) - Removed Records = 8,695

After completing the above data cleaning process, the remaining data consisted of:

• Total number of records = 9,919,181 (92.6% of original data)

• Number of unique loads = 3,621,367 (99.8% of original data)
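The four cleaning filters listed above can be sketched as a pipeline that also counts the records each step removes, mirroring the counts reported. This is a minimal sketch under several assumptions. Records are dicts keyed by the field names used above, and the steps run in the listed order. The helper names and the sample records are hypothetical.

```python
BOOKING_KEY = ("Loadid", "CarrierID", "IsBounced", "BouncedDate")


def booking_key(record):
    """Unique load booking, as defined in the cleaning steps."""
    return tuple(record[field] for field in BOOKING_KEY)


def clean_records(records):
    """Apply the four cleaning filters in order; return (kept, removed_counts)."""
    removed = {}

    def apply(name, keep, rows):
        kept = [r for r in rows if keep(r)]
        removed[name] = len(rows) - len(kept)
        return kept

    rows = list(records)
    # 1. IsBounced=False combined with CarrierType=Bounced is inconsistent.
    rows = apply("bounce_mismatch",
                 lambda r: not (not r["IsBounced"] and r["CarrierType"] == "Bounced"),
                 rows)
    # 2. Type must be 1 (pickup) or 2 (delivery).
    rows = apply("invalid_type", lambda r: r["Type"] in (1, 2), rows)
    # 3. Negative costs are invalid.
    rows = apply("negative_cost", lambda r: r["Cost"] >= 0, rows)
    # 4. Drop whole bookings that have no record for the first stop.
    with_first_stop = {booking_key(r) for r in rows if r["Sequence"] == 1}
    rows = apply("no_first_stop", lambda r: booking_key(r) in with_first_stop, rows)
    return rows, removed


def rec(load, carrier, bounced, carrier_type, stop_type, seq, cost):
    """Build a minimal hypothetical stop-level record."""
    return {"Loadid": load, "CarrierID": carrier, "IsBounced": bounced,
            "BouncedDate": "2017-03-01" if bounced else None,
            "CarrierType": carrier_type, "Type": stop_type,
            "Sequence": seq, "Cost": cost}


sample = [
    rec(1, "A", False, "Actual", 1, 1, 500.0),   # kept
    rec(1, "A", False, "Actual", 2, 2, 500.0),   # kept
    rec(2, "B", False, "Bounced", 1, 1, 300.0),  # IsBounced/CarrierType mismatch
    rec(3, "C", True, "Bounced", 3, 1, 100.0),   # invalid Type
    rec(4, "D", False, "Actual", 1, 1, -5.0),    # negative cost
    rec(5, "E", False, "Actual", 2, 2, 200.0),   # booking without a first stop
]
kept, removed = clean_records(sample)
print(len(kept), removed)
```

Step 4 is evaluated after the earlier filters, so a booking whose only first-stop record was removed in steps 1 to 3 also loses its remaining stops. Whether the original cleaning worked the same way is not stated.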

Table D 1 summarizes the observations made during the transformation process and the actions taken:

Table D 1: Summary of observations on the data and the actions taken to resolve them

| Observation | Action Taken |
|---|---|
| A load can be bounced and completed by the same carrier. A load can also be bounced by the same carrier multiple times. | Load-level data are designed so that each record represents a unique combination of the following fields: Loadid, CarrierID, IsBounced, BouncedDate. |
| The first stop of some loads (id, carrier, bounce combination) is duplicated due to different values in MarketCost or MaxPay. | Average values were considered for these duplicates. |
| Load weight and pallets data are available at the stop level. | The sums of weights and pallets for all pickup stops were aggregated to the load level. |
| Some loads have cost = 0. | MarketCost was assumed as the cost for these loads. |
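The three actions in Table D 1 can be sketched as one pass that collapses cleaned stop-level records into load-level records. This is a minimal sketch with several assumptions. The input is the cleaned records as dicts, so every booking has a first stop. Duplicated first-stop rows share the same Cost and LoadStopID. An open MaxPay is stored as `None`. The function names and the sample data are hypothetical.

```python
from statistics import mean

BOOKING_KEY = ("Loadid", "CarrierID", "IsBounced", "BouncedDate")


def _average(values):
    """Mean of the non-missing values; MaxPay can be open (None)."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def to_load_level(stops):
    """Collapse cleaned stop-level records into one record per load booking."""
    bookings = {}
    for s in stops:
        bookings.setdefault(tuple(s[f] for f in BOOKING_KEY), []).append(s)

    loads = []
    for key, rows in bookings.items():
        first = [r for r in rows if r["Sequence"] == 1]  # may be duplicated
        # Deduplicate pickup stops by LoadStopID so that a duplicated first
        # stop is not counted twice in the weight and pallet totals.
        pickups = {r["LoadStopID"]: r for r in rows if r["Type"] == 1}.values()
        market_cost = _average(r["MarketCost"] for r in first)
        cost = first[0]["Cost"]
        loads.append({
            **dict(zip(BOOKING_KEY, key)),
            "MarketCost": market_cost,
            "MaxPay": _average(r["MaxPay"] for r in first),
            "Weight": sum(r["Weight"] for r in pickups),
            "Pallets": sum(r["Pallets"] for r in pickups),
            # Loads with Cost = 0 take the market cost instead.
            "Cost": market_cost if cost == 0 else cost,
        })
    return loads


base = {"Loadid": 7, "CarrierID": "A", "IsBounced": False, "BouncedDate": None, "Cost": 0}
stops = [
    {**base, "Sequence": 1, "Type": 1, "LoadStopID": 10, "Weight": 20000,
     "Pallets": 10, "MarketCost": 1000, "MaxPay": 1200},
    {**base, "Sequence": 1, "Type": 1, "LoadStopID": 10, "Weight": 20000,
     "Pallets": 10, "MarketCost": 1100, "MaxPay": None},
    {**base, "Sequence": 2, "Type": 1, "LoadStopID": 11, "Weight": 5000,
     "Pallets": 4, "MarketCost": 1000, "MaxPay": 1200},
    {**base, "Sequence": 3, "Type": 2, "LoadStopID": 12, "Weight": 25000,
     "Pallets": 14, "MarketCost": 1000, "MaxPay": 1200},
]
print(to_load_level(stops))
```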


Appendix E: Descriptive Analytics

This appendix provides further details on the descriptive analytics performed on the available data. Table E 1 presents a statistical descriptive analysis of the load data, giving insight into the ranges and variability of the numerical characteristics:

Table E 1: Summary statistics of load characteristics for all loads (since 2015)

| Statistic | Total Distance (miles) | Number of Stops | Cost | Dead Head (miles) | Weight (ton) |
|---|---|---|---|---|---|
| mean | 615 | 2 | $1,613.17 | 41 | 32,960 |
| Std. | 522 | 0.5 | $1,391.63 | 67.8 | 13,660 |
| Min | 0 | 2 | $0.00 | 0 | 0 |
| Q1 | 245 | 2 | $700.00 | 0 | 22,853 |
| median | 464 | 2 | $1,200.00 | 18 | 40,368 |
| Q3 | 822 | 2 | $2,000.00 | 61 | 43,379 |
| max | 3,500 | 31 | $12,000.00 | 3,341 | 45,000 |
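The rows of Table E 1 are standard summary statistics and can be reproduced for any numerical field with the Python standard library. The minimal sketch below uses hypothetical deadhead values, not the company data. Quartile definitions differ between tools. This sketch uses `statistics.quantiles` with `method="inclusive"` (linear interpolation between order statistics), so its results may differ slightly from the software used in the capstone.

```python
from statistics import mean, quantiles, stdev


def describe(values):
    """Summary statistics in the layout of Table E 1."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": max(values),
    }


deadhead_miles = [0, 0, 12, 18, 25, 61, 140]  # hypothetical values
print(describe(deadhead_miles))
```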

Cancellation ratios (the percentage of cancelled loads out of all loads) were analyzed for multiple variables to identify the characteristics with distinguishable cancellation probabilities:

Figure E 1: Cancellation Ratios by Industry. This chart represents the cancellation ratio for each of the shippers’ industries.


Figure E 2: Cancellation Ratio by Carrier Maturity (in years). Carrier maturity is defined by the length of the carrier’s relationship with the company. This chart shows that the cancellation ratio is higher for carriers that have been working with the company for a shorter period of time.

Figure E 3: Cancellation Ratio by City: This map reflects the number of loads that go through different cities and the cancellation

ratio (bounces/total loads). Size represents the number of loads, while the color represents the bounce ratio.


Figure E 4: Cancellation Ratio by Deadhead Miles. This chart shows that the cancellation ratio tends to increase for loads with

longer deadhead miles.
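Ratios such as those in Figures E 1 to E 17 come from grouping loads by a characteristic, binning it first if it is numerical, and dividing bounced loads by all loads in each group. The minimal sketch below assumes each record represents one load whose `IsBounced` flag marks whether it was cancelled. The bin edges and sample loads are hypothetical, and the helper names are illustrative. The same `cancellation_ratio_by` works for categorical fields such as Industry or Conference by passing, for example, `lambda load: load["Industry"]`.

```python
from bisect import bisect_right
from collections import Counter


def bucket(value, edges):
    """Label value with the half-open bin [edges[i-1], edges[i]) it falls in."""
    i = bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]}"
    if i == len(edges):
        return f">={edges[-1]}"
    return f"{edges[i - 1]}-{edges[i]}"


def cancellation_ratio_by(loads, group_of):
    """Share of loads with IsBounced set, per group returned by group_of(load)."""
    totals, bounced = Counter(), Counter()
    for load in loads:
        group = group_of(load)
        totals[group] += 1
        bounced[group] += int(bool(load["IsBounced"]))
    return {g: bounced[g] / totals[g] for g in totals}


DEADHEAD_EDGES = [0, 50, 100, 250]  # hypothetical bin edges, in miles
loads = [
    {"DeadHeadMiles": 10, "IsBounced": False},
    {"DeadHeadMiles": 20, "IsBounced": True},
    {"DeadHeadMiles": 120, "IsBounced": True},
    {"DeadHeadMiles": 130, "IsBounced": True},
    {"DeadHeadMiles": 400, "IsBounced": False},
]
by_deadhead = cancellation_ratio_by(
    loads, lambda load: bucket(load["DeadHeadMiles"], DEADHEAD_EDGES))
print(by_deadhead)  # {'0-50': 0.5, '100-250': 1.0, '>=250': 0.0}
```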

Figure E 5: Cancellation Ratio by Conference. This chart shows the cancellation ratio behavior within the Conferences managing

FTL within the U.S.


Figure E 6: Cancellation Ratio by High Value indicator. This chart shows the cancellation ratios for high value (1) loads vs. non-

high value loads (0).

Figure E 7: Cancellation Ratio by Number of Stops in a load. This chart shows how the cancellation ratio changes for the different

number of stops per load.


Figure E 8: Cancellation Ratio by Type of Contract (with shipper), where C=Contract and S=Spot.

Figure E 9: Cancellation Ratio by trip length. This chart shows that next-day delivery has a higher cancellation ratio than the other options.


Figure E 10: Cancellation Ratio by Load Cost (in $). This chart shows a decreasing trend in the cancellation ratios as the cost of the load increases.

Figure E 11: Cancellation Ratio over Time. This chart shows the cancellations trend over time for the three-year dataset.


Figure E 12: Cancellation Ratio by number of claims against carrier. This chart shows how the cancellation ratios are higher for

carriers with higher number of claims.

Figure E 13: Cancellation Ratio by State (represented by the color) and Total Cancellations by State (represented by the number). This chart shows how the cancellation ratio changes from one state to another (dark orange represents a high cancellation ratio, while dark blue represents a low cancellation ratio) and the total number of cancellations that occurred in each state.


Figure E 14: Cancellation Ratio by first pickup appointment time slot. This chart shows how the cancellation ratio changes across appointment time slots for the first pickup (Morning: between 6am and 11am; Afternoon: between 11am and 4pm; Evening: between 4pm and 9pm; Late night: between 9pm and 6am).
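The time slots of Figure E 14 can be assigned from the first pickup appointment with a few comparisons. The minimal sketch below assumes that each boundary hour belongs to the later slot (for example, 11:00 counts as Afternoon). The figure does not say how boundaries were handled, and the function name is hypothetical.

```python
from datetime import datetime


def appointment_slot(appt: datetime) -> str:
    """Time slot of the first pickup appointment, as defined for Figure E 14."""
    hour = appt.hour
    if 6 <= hour < 11:
        return "Morning"
    if 11 <= hour < 16:
        return "Afternoon"
    if 16 <= hour < 21:
        return "Evening"
    return "Late night"  # 9pm to 6am, wrapping past midnight


for ts in ("2017-05-01 06:00", "2017-05-01 10:59", "2017-05-01 11:00",
           "2017-05-01 21:30", "2017-05-02 05:59"):
    print(ts, appointment_slot(datetime.strptime(ts, "%Y-%m-%d %H:%M")))
```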

Figure E 15: Cancellation Ratio by month. This chart shows how the cancellation ratio varies over the months.


Figure E 16: Cancellation Ratio by day of the week. This chart shows how the cancellation ratio varies over the weekdays.

Figure E 17: Cancellation Ratio by Book to Pick up Hours (time between the booking of the load and the first pickup). This chart

shows that the probability of cancellation increases as the time between the booking and pickup increases.
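The book-to-pickup feature of Figure E 17 is a time difference in hours. The minimal sketch below assumes the booking time is BookByDateTime and that the first pickup is the Appt of the stop with Sequence 1. The glossary states that a blank BookByDateTime means the load was not covered, so such loads get no value. The function name is hypothetical.

```python
from datetime import datetime
from typing import Optional


def book_to_pickup_hours(book_by: Optional[datetime],
                         first_pickup_appt: datetime) -> Optional[float]:
    """Hours between BookByDateTime and the first pickup appointment."""
    if book_by is None:  # blank BookByDateTime: the load was not covered
        return None
    return (first_pickup_appt - book_by).total_seconds() / 3600


print(book_to_pickup_hours(datetime(2017, 3, 1, 8, 0), datetime(2017, 3, 2, 14, 0)))  # 30.0
print(book_to_pickup_hours(None, datetime(2017, 3, 2, 14, 0)))  # None
```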


Appendix F: Variables Prediction Rankings

This appendix shows the feature prediction rankings for the available and enriched datasets.

Figure F 1: Predictor screening of variables in the available dataset. The figure shows the result of the predictor screening

algorithm that ranks the features based on their prediction power.


Figure F 2: Predictor screening of variables in the enriched dataset. The figure shows the result of the predictor screening algorithm that ranks the features based on their prediction power.
