Predicting Carrier Load Cancellation
by
Ali Al-Habib
Bachelor of Science, Computer Science, King Fahd University of Petroleum and Minerals, 2005
and
Nicolas Favier Gonzalez
Bachelor of Science, Industrial Engineering, University of Cuyo, 2014
Bachelor of Science, Mechanical Engineering, École Nationale d’Ingénieurs de Saint-Étienne, 2013
SUBMITTED TO THE PROGRAM IN SUPPLY CHAIN MANAGEMENT
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE IN SUPPLY CHAIN MANAGEMENT
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
JUNE 2018
© 2018 Ali Al-Habib and Nicolas Favier Gonzalez. All rights reserved.
The authors hereby grant to MIT permission to reproduce and to distribute publicly paper and electronic
copies of this capstone document in whole or in part in any medium now known or hereafter created.
Signature of Author: …………………………………………………………………………………………
Ali Al-Habib
Department of Supply Chain Management
May 11, 2018
Signature of Author: …………………………………………………………………………………………
Nicolas Favier Gonzalez
Department of Supply Chain Management
May 11, 2018
Certified by: .............…………………………………………………………………………………………
Dr. Christopher Mejia
Director, MIT SCALE Latin America Network
Capstone Advisor
Accepted by: …………………………………………………………………………………………………
Dr. Yossi Sheffi
Director, Center for Transportation and Logistics
Elisha Gray II Professor of Engineering Systems
Professor, Civil and Environmental Engineering
Predicting Carrier Load Cancellation
by
Ali Al-Habib
and
Nicolas Favier Gonzalez
Submitted to the Program in Supply Chain Management
on May 11, 2018 in Partial Fulfilment of the
Requirements for the Degree of Master of Applied Science in Supply Chain Management
ABSTRACT
Truckload cancellations by carriers are causing disruptions in the trucking industry operations. By
extrapolating the findings from the 3PL’s data studied in this research to the whole trucking industry, it is
estimated that 32 million cancellations occur every year. These cancellations result in around $4.6 billion
extra cost. If these cancellations can be predicted, shippers and transportation brokers can avoid loss of
money and resources caused by the required rebooking process. This research explores the key drivers of
loads’ cancellation using historical cancellation patterns. It evaluates the applicability of different predictive
models that were built using three-year data from a third-party logistics provider. These models include
logistic regression, random forest, neural networks and k-nearest neighbors. However, the research focuses
mostly on logistic regression, as it provides more insight into the main drivers of cancellations. The
resulting models correctly predicted only 16% of the cancelled loads. In an effort to improve
the accuracy of the logistic regression model, a tradeoff analysis was conducted to study the impact of
adjusting the classification threshold. The analysis showed that using a lower threshold can raise the share of correctly predicted
cancellations to 42%. However, for every additional cancelled load predicted correctly, around 3
uncancelled loads are predicted as cancelled. As all models gave comparable results, the research concludes
that the available load information and historical cancellation behaviors are not enough to predict future
cancellations. The research concludes by recommending business solutions to be implemented in order to
reduce the probability of cancellations. These solutions include educating carriers on the impact of
cancellation and encouraging them to cancel with a longer lead time when cancellation is inevitable.
Moreover, further research might focus on surveying carriers to identify the root causes of cancellations
and capture details related to these causes.
Capstone Advisor: Dr. Christopher Mejia
Title: Director, MIT SCALE Latin America Network
ACKNOWLEDGEMENTS
We would like to thank our advisor, Dr. Christopher Mejia, for his support and guidance throughout the
project. We would also like to thank our writing instructors, Toby Gooley and Pamela Siska, for their help
and feedback throughout the writing process.
We also thank our sponsor company for their trust and continuous support. The support team provided
prompt feedback and guidance that helped us tremendously during the project phases.
Final thanks to the SCM 2018 Class for their support throughout the year.
- Ali and Nicolas
I would like to thank my family, especially my parents, for their encouragement and support. Special thanks
also to my wife, Rabab, for her unconditional love and support throughout the year. I would also like to thank
my sons, Hasan and Elyas, for bringing joy to my life.
- Ali Al-Habib
I would like to thank my family, friends and my girlfriend for their support and motivation through this
year. I would also like to thank the Fulbright program and Aeropuertos Argentina 2000 organization for
giving me the possibility of living this wonderful experience.
- Nicolas Favier Gonzalez
Table of Contents
1. Introduction
2. Literature Review
2.1. Models Evaluation
2.2. Evaluating Performance
2.3. Section Summary
3. Methodology
3.1. DEFINE
3.1.1. Process Mapping
3.2. MEASURE
3.2.1. Data Collection
3.2.2. Data Cleaning and Preparation
3.2.3. Outliers Processing
3.2.4. Data Analysis
3.3. ANALYZE
3.3.1. Correlation and Multi-Collinearity Analysis
3.3.2. Variable Normalization
3.3.3. Bootstrap Forest Predictor Screening and Stepwise Regression
3.4. IMPROVE
3.4.1. Logistic Regression
3.4.2. Model Accuracy Analysis
3.4.3. Machine Learning
3.5. CONTROL
3.5.1. Sensitivity Analysis
4. Results
4.1. Available Dataset
4.2. Enriched Dataset
4.3. Unpredictability Testing
4.4. Further analysis
4.5. Logistic regression Threshold Sensitivity Analysis
5. Discussion
5.1. Threshold Reduction
5.2. Further Research
5.3. Business Strategy
6. Conclusion
7. Reference List
8. Appendices
Appendix A: Project Gantt
Appendix B: Cancellation Causes Brainstorming
Appendix C: Data Glossary
Appendix D: Data Cleaning Process
Appendix E: Descriptive Analytics
Appendix F: Variables Prediction Rankings
List of Figures
Figure 1: Six-Sigma Continuous Improvement DMAIC Cycle
Figure 2: Load Process Map
Figure 3: Example of calculating and aggregating cancellation ratios
Figure 4: Forward Stepwise Regression algorithm’s results
Figure 5: Model Parameters and characteristics
Figure 6: Threshold tradeoff curves
Figure 7: Relative variation of FN and FP
Figure 8: Total number of FN and FP loads
Figure 9: Recall vs Precision
Figure 10: Sensitivity vs Specificity
Figure 11: Receiver Operating Characteristic (ROC) Curve
List of Tables
Table 1: Weight of Shipments by Transportation Mode
Table 2: Value of Shipments by Transportation Mode
Table 3: Comparing supervised learning algorithms
Table 4: Summary of the cancellations and their impact
Table 5: Logistic regression confusion matrix on the available dataset
Table 6: Machine learning models’ results on the available dataset
Table 7: Logistic regression confusion matrix on the enriched dataset
Table 8: Machine learning models’ results on the enriched dataset
Table 9: Logistic regression confusion matrix on the new dataset
Table 10: Machine learning models results on the new dataset
Table 11: Confusion matrix for loads with at least 10 previous records
Table 12: Confusion matrix for loads that happened on the next seven days
Table 13: Additional analyses results
1. Introduction
Truckload represents the largest transportation mode in the United States. It accounts for around
75% of the total volume and value of shipments transported every year. Table 1 and Table 2 show the
distribution of shipments among transportation modes:
Table 1: Weight of Shipments by Transportation Mode : 2007, 2013, and 2040 (millions of tons)
Reprinted from Freight Facts and Figures, by U.S. Department of Transportation Bureau of Transportation Statistics 2015.
Table 2: Value of Shipments by Transportation Mode : 2007, 2013, and 2040 (billions of 2007 dollars)
Reprinted from Freight Facts and Figures, by U.S. Department of Transportation Bureau of Transportation Statistics 2015.
Within the trucking industry, approximately 48% is represented by Full Truck
Load (FTL), 11% is represented by Less Than Truck Load (LTL) and the remaining 41% is represented by
private or dedicated fleets (AT Kearney, 2016). With a total weight of 13.9 billion tons transported via
trucks and an average of 36 tons per truckload (U.S. Department of Transportation Federal Highway
Administration, 2016), it is estimated that around 185 million contracted full truckloads are transported
each year in the United States.
The trucking industry is highly fragmented in the U.S. with many owner-operator carriers.
Approximately 90% of the carriers operate six or fewer trucks (American Trucking Associations, 2017).
This fact adds to the operational complexity of the industry and makes it difficult to organize and predict.
Due to this context, shippers and third-party logistics providers in the industry face the challenge of carrier
load cancellation.
A third-party logistics (3PL) provider company sponsored this project to assess the ability to predict
cancellations of future loads. The company provides truckload, less-than-truckload, and intermodal
brokerage services and transportation management services to more than 14,000 shippers, from Fortune
100 companies to small businesses. It identifies the right equipment for customers’ freight from a growing
network of more than 40,000 prequalified local, regional, and national carriers. It acts as the middleman
that connects shippers with carriers to facilitate moving shippers’ loads from their origins to destinations.
3PL companies receive compensation by means of commission once the load is moved successfully.
Based on an initial analysis of the studied dataset, approximately 17% of confirmed loads get
cancelled (also known as bounced) by carriers. On average, each cancellation is estimated to result in $145
of extra cost, which is caused by the higher costs of the rebooked loads with new carriers.
There is not enough information regarding the impact of truckload cancellations at the overall
trucking industry level. However, if the same cancellation metrics implied by this analysis are
extrapolated to the whole industry, interesting results can be observed. By extrapolating the calculated 17%
cancellation ratio, it can be estimated that approximately 32 million loads are cancelled per year. These
cancellations translate into extra costs of around $4.6 billion each year. Therefore, the capacity to predict
these cancellations and plan accordingly can be valuable to companies in this industry due to the potential
cost savings.
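The extrapolation above can be retraced with a few lines of arithmetic. The sketch below is illustrative only: the constant names are hypothetical, and it assumes that the 185 million figure is obtained by applying the 48% full truckload share to the total number of truckloads, which is consistent with the numbers quoted in this section but is not stated explicitly.

```python
# Figures quoted in the introduction; constant names are illustrative.
TOTAL_TONS = 13.9e9          # tons transported by truck per year
TONS_PER_TRUCKLOAD = 36      # average tons per truckload
FTL_SHARE = 0.48             # assumed share applied to obtain full truckloads
CANCELLATION_RATIO = 0.17    # share of confirmed loads cancelled in the studied data
COST_PER_CANCELLATION = 145  # estimated extra cost per cancellation, in USD

# Estimated number of full truckloads per year.
ftl_loads = TOTAL_TONS / TONS_PER_TRUCKLOAD * FTL_SHARE

# Extrapolate the sponsor's cancellation ratio and cost to the whole industry.
cancelled_loads = ftl_loads * CANCELLATION_RATIO
extra_cost = cancelled_loads * COST_PER_CANCELLATION

print(f"Full truckloads per year: {ftl_loads / 1e6:.0f} million")
print(f"Cancelled loads per year: {cancelled_loads / 1e6:.1f} million")
print(f"Extra cost per year: ${extra_cost / 1e9:.2f} billion")
```

Under these assumptions the results land close to the rounded figures reported above (roughly 32 million cancellations and $4.6 billion per year). The estimate is only as reliable as the assumption that one 3PL's cancellation ratio and rebooking cost hold for the whole industry.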
A big advantage of the trucking industry is the availability of operational data. 3PLs usually use
information systems where all the bookings and loads information is stored. These systems also keep
records of all the carriers and shippers, which provides a sufficient level of detail about past loads. Predictive
analytics can be used to process this data and generate insights into future trends. Knowledge created from
this type of analytics can help in developing better plans or avoiding potential risks (Olson & Wu, 2017).
In the context of carrier load cancellations, transactional data can be used as the base of predictive
analysis to identify which loads are more likely to be cancelled based on their characteristics. Having this
information, before the cancellation occurs, can be very valuable. This information will allow the company
to make better decisions when choosing carriers, or to develop a mitigation plan in case a cancellation
happens. This can potentially translate into cost reduction and productivity increase for the company.
Nowadays, there are many explanatory models that can predict an event based on historical data
patterns. The objective of this project is to explore the key characteristics that drive load cancellations. The
project will measure the impact of loads’, shippers’ and carriers’ characteristics on load cancellations and
provide insights into cancellation patterns. To achieve this, a logistic regression model will be applied to
evaluate the significance of the different load attributes in affecting the carriers’ cancellations. Other
machine learning models will also be evaluated to validate the results obtained by the logistic regression model.
This project will exploit the available dataset and identify the most significant characteristics that
provide insights into cancellations. The result of the project will help the company in making business
decisions and will provide insights on focus areas of future research. Although the data provided by the
company covers a considerable number of features, other details that might impact load cancellations were
not studied in this project. This is due to limitations in the ability to access external information such as
carriers’ schedules and loads that are not managed by the company.
This report starts by reviewing available literature related to predictive modeling and approaches
followed in similar problems. Then, the methodology section explains the steps followed in the data
analysis, features evaluation, and predictive models’ development. Models’ performance is then presented
and analyzed in the results section. Finally, potential future research and next steps are mentioned in the
discussion and conclusion sections.
2. Literature Review
Different predictive models can be used to predict an outcome of a dependent variable based on a
set of independent variables. These models fit different contexts and solve various types of problems. Each
model has pros and cons that need to be evaluated. Accordingly, a literature review was conducted to identify the
models that are suitable for predicting carrier load cancellations and the appropriate approaches for
evaluating models’ performance.
2.1. Models Evaluation
Multiple predictive models are commonly used to solve similar kinds of problems. Each model has
its own pros and cons based on the dataset it is applied to and the model’s objective. Model selection
depends mainly on the studied dataset and the problem characteristics. Table 3 shows a brief comparison
among the different models.
Table 3: Comparing supervised learning algorithms
Reprinted from Comparing supervised learning algorithms, by Kevin Markham, 2015.
Simple and multiple linear regression models are used to analyze correlation among a list of
independent variables in order to predict a dependent variable using a linear relationship. These models are
usually easy to understand and relate to the real world. Although many datasets exhibit nonlinear relationships,
linear regression can still be used after transforming the variables so that they are linearly related. Regression models can
include as many independent variables as needed. However, correlation and multi-collinearity among
variables must be assessed to avoid information overlap among independent variables. In a multiple linear
regression model, there should be low correlation among independent variables to avoid information
overlap, and high correlation between the independent and dependent variables to justify their predictive
strength (Olson & Wu, 2017).
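As an illustration of the correlation screening described above, the following sketch computes pairwise Pearson correlations among candidate independent variables and flags pairs that overlap strongly. The column names, the toy values and the 0.8 cutoff are all hypothetical choices made for this example; they are not taken from the studied dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| at or above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy data: cost grows almost linearly with miles, pallets does not.
columns = {
    "miles": [100, 250, 400, 800, 1200],
    "cost": [300, 640, 980, 1900, 2800],
    "pallets": [10, 4, 22, 8, 15],
}

flagged = correlated_pairs(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} overlap strongly (r = {r:.3f})")
```

A flagged pair suggests that one of the two variables may be dropped or combined before fitting a regression model. Pairwise correlation does not detect collinearity that involves three or more variables, so it is only a first screening step.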
Logistic regression is an extension of the regression models. It provides an approximation to predict
the likelihood of a specific event by using an underlying regression model (de Menezes, Liska, Cirillo &
Vivanco, 2017). While linear regression models work well with continuous dependent variables, they do
not provide plausible estimation for categorical output. On the other hand, logistic regression can transform
the output of the underlying regression model into categorical output through the probability of acceptance
formula (Olson & Wu, 2017).
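A minimal sketch of this transformation is given below, assuming the standard logistic (sigmoid) function is what maps the underlying linear score to a probability. The feature values, coefficients and intercept are hypothetical and chosen only to make the arithmetic visible; they are not the fitted values of this research.

```python
from math import exp


def cancellation_probability(features, coefficients, intercept):
    """Map a linear regression score to a probability with the logistic function."""
    linear_score = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + exp(-linear_score))


def classify(probability, threshold=0.5):
    """Turn the probability into a binary outcome: True means predicted cancelled."""
    return probability >= threshold


# Hypothetical load: a carrier cancellation ratio of 0.30 and a binary flag set to 1.
p = cancellation_probability(features=[0.30, 1.0], coefficients=[2.0, 0.5], intercept=-2.0)

print(f"Estimated cancellation probability: {p:.3f}")
print("Predicted cancelled at threshold 0.50:", classify(p))
print("Predicted cancelled at threshold 0.25:", classify(p, threshold=0.25))
```

The sign and size of each coefficient indicate the direction and strength of a variable's effect on the score. Lowering the threshold labels more loads as cancelled, which raises the number of cancellations caught at the price of more false alarms.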
Machine learning algorithms are another class of prediction models that have been used to solve similar
problems. These algorithms use historical data to predict continuous, categorical or binary future results
(Kotsiantis, Zaharakis & Pintelas, 2007). There are several types of machine learning algorithms, such
as Neural Networks, Decision Trees and Rule Induction (Olson & Wu, 2017).
An MIT SCM 2017 thesis “Predicting On-time Delivery in the trucking industry” by Alcoba and
Ohlund (2017) addressed a similar problem, but with a different objective. It addressed the challenge of
predicting truckload on-time delivery based on certain characteristics. In theory, both problems share a
comparable objective, which is predicting the likelihood of a binary output. Also, both problems require
a predictive model that is based on a list of known load characteristics (independent variables) to predict
the probability of an event (dependent variable). The approach Alcoba and Ohlund followed for solving the
on-time delivery prediction was through creating a logistic regression model.
Besides the models’ accuracy, model interpretability is an important aspect to consider while
selecting a model. Regression models are self-explanatory in that the assigned coefficients represent the
direction and magnitude of the impact of each independent variable. On the other hand, machine learning
algorithms such as Neural Networks are “black box” models and very complex to analyze (Olson & Wu,
2017). Accordingly, logistic regression was selected as the main model to predict load cancellations.
To validate the output of the logistic regression model, the same datasets were used to build other
machine learning models. As a wide range of models is available, three models were selected based on their
applicability to the load cancellation prediction. Moreover, the underlying functions of the models were
also considered to cover various machine learning techniques. The selected models are:
• Neural Networks models are represented by layers of nodes that are connected by arcs with
assigned weights. The architecture of these models allows learning through feedback loops by
comparing nodes output to the target values and adjusting the weights accordingly. These
models are complex; however, they tend to perform well in problems with a high level of
unpredictability (Olson & Wu, 2017).
• k-Nearest Neighbor models are instance-based learning models that use the similarities in
records’ properties to classify the unclassified records. This algorithm locates the k (number
of instances) closest classified instances (in terms of similar properties) to determine the class
of an unclassified instance. These models have been used in real applications and demonstrated
a high level of stability compared to other machine learning models (Kotsiantis et al., 2007).
• Random Forest models are collections of unpruned decision trees. Decision trees are
classification models that are based on association rules. These models determine the important
variables based on their ability to predict outcomes. Random forest and decision tree models
are widely applied in business data mining problems and provide reliable outcomes (Olson &
Wu, 2017).
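Of the three models listed above, k-nearest neighbors is the simplest to show in a few lines. The sketch below is a minimal illustration that assumes Euclidean distance and a majority vote; the two normalized features and the labels are hypothetical, and the models built in this research were not necessarily implemented this way.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k closest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance to the query
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical loads described by two features already normalized to [0, 1].
train_points = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
                (0.80, 0.90), (0.90, 0.80), (0.85, 0.70)]
train_labels = ["shipped", "shipped", "shipped",
                "cancelled", "cancelled", "cancelled"]

print(knn_predict(train_points, train_labels, query=(0.82, 0.85)))
print(knn_predict(train_points, train_labels, query=(0.12, 0.18)))
```

Because the vote depends on distances, the features need to be on comparable scales before the model is applied, otherwise the variable with the largest range dominates the result.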
2.2. Evaluating Performance
Once a suitable technique is defined to predict truckload cancellations, it is important to test the
model robustness with a new dataset. Model accuracy will be assessed against the expected significance level
to decide whether the model is fit for predicting load cancellation. If the model accuracy does not meet the expected level, the
model will be reformulated by introducing new features to the dataset.
The selection of an algorithm over another can be based on multiple metrics such as prediction
accuracy, recall, precision, sensitivity and specificity. Prediction accuracy (percentage of correct
predictions over total predictions) and sensitivity (percentage of true positive over total positives) reflect
the overall model performance and its applicability to the problem. Accordingly, these two metrics will be
used to compare models’ results. To obtain an unbiased evaluation of the model accuracy, a training dataset
should be used to build the models while a different set (i.e., test dataset) should be used to evaluate their
accuracy (Kotsiantis et al., 2007).
Confusion Matrices will be used to demonstrate the models’ performance. A confusion matrix for
two classes consists of four possible outcomes. Given a classification model (i.e., classifier) and the
instances (i.e., test data), the confusion matrix provides the following outcomes (Fawcett, 2005):
• True Positive: Reflects that instance is positive and predicted positive
• True Negative: Reflects that instance is negative and predicted negative
• False Positive: Reflects that instance is negative and predicted positive
• False Negative: Reflects that instance is positive and predicted negative
This matrix allows determining the model precision and accuracy. It also allows identifying if the
model is good enough for the expected significance level.
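The four outcomes and the metrics named in this section can be computed as in the sketch below, where a positive instance denotes a cancelled load. The function names and the ten example loads are hypothetical; the metric definitions are the standard ones.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels, where 1 means cancelled."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Standard metrics; assumes each denominator is non-zero."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # correct predictions over all predictions
        "sensitivity": tp / (tp + fn),                # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical test set of ten loads, three of which were actually cancelled.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
results = metrics(tp, tn, fp, fn)
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

In this toy case accuracy is 0.70 even though only one of the three cancellations is caught. When cancelled loads are a minority of all loads, accuracy alone can look acceptable while sensitivity remains low, which is why both metrics are reported together.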
2.3. Section Summary
This literature review analyzed various methods that can be used to formulate a predictive model
for predicting the probability of an event based on known facts. It also mentioned the approaches for testing
the different models and identifying the most suitable one for the desired objective.
Although predictive analytics has been previously applied to the trucking industry, the reviewed
literature has not applied these predictive models to predict truckload cancellations. This research will
analyze industry data and apply a logistic regression model to understand the key drivers of cancellations in
the trucking industry.
3. Methodology
This section explains the steps and methods that were used to develop the model for predicting
carrier load cancellations based on loads’ characteristics. The Six-Sigma continuous improvement cycle
(Define, Measure, Analyze, Improve, Control, shown in Figure 1) was followed as a framework to structure
the multiple phases of the project (Pillet, 2004).
Figure 1: Six-Sigma Continuous Improvement DMAIC Cycle (Source: iancos.wordpress.com/)
Based on this DMAIC cycle, a detailed action plan was elaborated to set the roadmap for the project
(refer to Appendix A: Project Gantt for details). The section first explains the load
lifecycle, data cleaning process, variable selection and model development. Finally, it covers
the approach used to test and validate the models’ accuracy as well as the process followed for the sensitivity
analysis.
3.1. DEFINE
3.1.1. Process Mapping
The first step of the research was a site visit to the sponsor company’s headquarters. The main
objective of this visit was to understand the company’s process details. Based on the information collected
during the visit, a process map was created detailing the load creation and execution processes.
Figure 2: Load Process Map. This figure shows the different stages through which a load moves from the booking until the delivery.
Figure 2 shows the typical load lifecycle for 3PL operations, which gives a general perspective
of how the process works.
A brainstorming session was conducted during the visit with the sponsor company’s team to
identify all possible variables that could impact load cancellations (refer to Appendix B: Cancellation
Causes Brainstorming for details about the full list of variables). The session resulted in a list of 48 different
variables categorized into four categories (Load Impact, Shipper Impact, Carrier Impact and Other
Impacts). Based on this list, the sponsor company committed to provide three years of operational data to be
used in the analysis.
3.2. MEASURE
3.2.1. Data Collection
The sponsor company provided stop-level data on their operations during 2015, 2016 and part of
2017. Stop-level data gives information about each stop, where a certain load goes from the first pickup
point to the last drop point. This means that each load is represented in the data by at least two stops (one
pickup and one drop). However, many loads are represented by multiple stops, as they go through multiple
pickup or drop points. A preliminary review of the received data was done, and a glossary of all variables
was created (refer to Appendix C: Data Glossary for detailed information about the dataset). The data
glossary was validated with the sponsor company’s team to verify the inclusion of all desired variables and
to confirm the meaning of each variable. Further meetings were carried out to validate assumptions
regarding outliers and missing data. The data received consisted of:
• Total number of records = 10,706,177
• Number of unique loads = 3,629,543
3.2.2. Data Cleaning and Preparation
Once all the questions about the data were clarified, the data preparation and cleaning process
started. Data cleaning consisted of eliminating records with invalid information and replacing others with
validated assumptions. The cleanup steps were required to ensure data consistency and to remove outliers
that can create undesired impact on the analysis. For details regarding the cleaning process refer to
Appendix D: Data Cleaning Process.
The provided dataset, as mentioned earlier, was at stop-level, which means that each load is
represented by more than one record. However, for the purpose of the analysis and for predicting the
probability of future load cancellations, the data were transformed to load-level (one record per load
cancelled or shipped).
Once this process was performed, the total number of loads that were included in the load-level
data was 3,997,488. This number includes all the completed and cancelled loads.
To transform the data from stop-level to load-level, some variables were aggregated (like
weights and pallets) and new variables were created to describe the trip information contained in the
stop-level data. A set of cancellation ratios was created for some of the variables. A cancellation ratio is
the number of cancelled loads as a percentage of all loads booked. Combined ratios were also created to give
more detailed insights into cancellation behavior. Figure 3 illustrates an example of how these ratios were
calculated and aggregated to load-level for a given load.
Figure 3: Example of calculating and aggregating cancellation ratios to load-level data for carrier and city combination
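The stop-level to load-level transformation and the carrier-city cancellation ratio described above can be sketched as follows. This is a minimal illustration only: the field names (`load_id`, `carrier`, `city`, `weight`, `pallets`, `cancelled`) and the sample rows are hypothetical and do not come from the sponsor company's data.

```python
from collections import defaultdict

# Hypothetical stop-level rows: one dict per stop (all names and values are assumptions).
stops = [
    {"load_id": "L1", "carrier": "C1", "city": "Dallas", "weight": 1000, "pallets": 4, "cancelled": False},
    {"load_id": "L1", "carrier": "C1", "city": "Dallas", "weight": 500, "pallets": 2, "cancelled": False},
    {"load_id": "L2", "carrier": "C1", "city": "Dallas", "weight": 800, "pallets": 3, "cancelled": True},
    {"load_id": "L2", "carrier": "C1", "city": "Dallas", "weight": 0, "pallets": 0, "cancelled": True},
    {"load_id": "L3", "carrier": "C2", "city": "Austin", "weight": 700, "pallets": 5, "cancelled": False},
    {"load_id": "L3", "carrier": "C2", "city": "Austin", "weight": 0, "pallets": 0, "cancelled": False},
]


def to_load_level(stop_rows):
    """Collapse stop-level rows into one record per load, summing weights and pallets."""
    loads = {}
    for row in stop_rows:
        load = loads.setdefault(row["load_id"], {
            "load_id": row["load_id"], "carrier": row["carrier"], "city": row["city"],
            "cancelled": row["cancelled"], "weight": 0, "pallets": 0, "stops": 0,
        })
        load["weight"] += row["weight"]
        load["pallets"] += row["pallets"]
        load["stops"] += 1
    return list(loads.values())


def bounce_ratios(loads, keys):
    """Cancelled loads divided by all booked loads, per combination of `keys`."""
    booked = defaultdict(int)
    cancelled = defaultdict(int)
    for load in loads:
        group = tuple(load[k] for k in keys)
        booked[group] += 1
        cancelled[group] += int(load["cancelled"])
    return {group: cancelled[group] / booked[group] for group in booked}


loads = to_load_level(stops)
carrier_city = bounce_ratios(loads, ("carrier", "city"))

# Attach the combined ratio to every load-level record as a new variable.
for load in loads:
    load["carrier_city_bounce_ratio"] = carrier_city[(load["carrier"], load["city"])]
```

The same `bounce_ratios` helper can be called with a single key, for example `("carrier",)`, to obtain the non-combined ratios.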
Moreover, all categorical variables were turned into numerical or dummy variables to be used in
the different models.
3.2.3. Outliers Processing
After the data cleaning process, a statistical analysis was developed to identify the shape and
distribution of the continuous variables. Certain outliers were identified and were further analyzed using a
box and whiskers plot (Potter, 2006). A set of boundaries was defined to remove records with outliers from
the dataset. These boundaries were validated by the sponsor company’s team to match the possible ranges
of values in the real operations:
• Miles [0 – 3,500]
• Cost [0 – 12,000]
• Dead head miles [0 – 3,500]
• Book to load days [0 – 21 days]
• Book to Pick Up hours [0 – 504 hours]
• Load duration [0 – 14 days]
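As a rough sketch, the boundaries above could be applied as a simple record filter. The bounds are taken from the list; the field names are hypothetical, and treating both ends of each range as inclusive is an assumption.

```python
# Bounds copied from the list above; the dictionary keys are hypothetical field names.
BOUNDS = {
    "miles": (0, 3500),
    "cost": (0, 12000),
    "dead_head_miles": (0, 3500),
    "book_to_load_days": (0, 21),
    "book_to_pickup_hours": (0, 504),
    "load_duration_days": (0, 14),
}


def within_bounds(record, bounds=BOUNDS):
    """True when every bounded field lies inside its range (ends assumed inclusive)."""
    return all(low <= record[field] <= high for field, (low, high) in bounds.items())


def remove_outliers(records, bounds=BOUNDS):
    """Keep only the records whose bounded fields are all within range."""
    return [record for record in records if within_bounds(record, bounds)]


# Two made-up loads: the second exceeds the mileage bound and is dropped.
sample = [
    {"miles": 420, "cost": 950, "dead_head_miles": 35, "book_to_load_days": 2,
     "book_to_pickup_hours": 48, "load_duration_days": 1},
    {"miles": 4100, "cost": 950, "dead_head_miles": 35, "book_to_load_days": 2,
     "book_to_pickup_hours": 48, "load_duration_days": 1},
]
cleaned = remove_outliers(sample)
```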
3.2.4. Data Analysis
An exhaustive descriptive data analysis was carried out to provide a general overview of the current
situation. The objective of the analysis was to assess the scale of load cancellations and provide insights
into possible drivers of these cancellations in the data. Table 4 summarizes the total number of loads,
cancellation ratios and extra costs of cancellations over the three-year dataset:
Table 4: Summary of the cancellations and their impact during the last three years on the 3PL company
Year    Total Loads   Cancelled Loads   Cancellation Ratio   Bounces Cost (Actual – Cancelled)
2015    1,049,936     194,070           18.5%                $21,293,232
2016    1,325,797     235,914           17.8%                $31,060,148
2017    1,245,634     204,479           16.4%                $38,926,148
Total   3,621,367     634,463           17.5%                $91,279,528
According to the data presented in Table 4, 17.5% of booked loads are cancelled. The estimated cost
of each cancellation is $145 with a standard deviation of $480 and an interquartile range of $200 (Q1=$0,
median=$50 and Q3=$200). For more descriptive analytics, refer to Appendix E: Descriptive Analytics.
3.3. ANALYZE
3.3.1. Correlation and Multi-Collinearity Analysis
Once the data were cleaned and prepared, a total of 64 attributes were obtained. From these
attributes, 12 were excluded as they were not useful for the model-building process. The
excluded attributes included IDs, dates and attributes that are only known after the cancellation event.
Information carried by date attributes was rolled up and reflected in durations, while high-cardinality
attributes (i.e., IDs) were removed and replaced by cancellation (bounce) ratios (Shipper Bounce Ratio and
Carrier Bounce Ratio).
Moreover, attributes that were identified as non-reliable due to the way they were collected were
also removed. The removed attributes are:
• Load ID
• Load Date
• Customer ID
• Carrier ID
• Bounced Date Time
• Book by Date Time
• Empty Time
• Expected Empty Location
• First Pick Up Date Time
• Max Pay
• Actual Load Cost
• First Load Moved On
The remaining 52 variables were further analyzed and assessed to be used in the model. To avoid
overfitting and an overly complex model, several analyses were carried out to identify the best features
to predict load cancellations. First, a simple correlation analysis was used to identify highly correlated
(positive or negative) variable pairs. Highly correlated variables are likely to affect the model outcomes by
increasing the risk of overfitting (Thompson, 2009), and therefore one variable of each pair should be excluded.
Treating absolute correlation values greater than 0.5 as high, the following variables were
identified as highly correlated:
• Miles, Cost, Market Cost, Load Duration
• Book to Load Days, Book to Pick up Hours
• Zip Bounce Ratio, Zip3 Bounce Ratio, City Bounce Ratio, Shipper Bounce Ratio
• Carrier Bounce Ratio, Carrier Equipment Type Bounce Ratio
• Shipper Carrier Bounce Ratio, Carrier City Bounce Ratio
• Carrier City Bounce Ratio, Carrier Load Day Bounce Ratio, Carrier Pickup Time Bounce Ratio
• Equipment Drivers, Equipment Power Units
• Carrier Equipment Type Bounce Ratio, Carrier CS Bounce Ratio
• Load Week Day, Load Over Weekend
Based on these results the following variables were discarded:
• Cost, Market Cost and Load Duration
• Book to Load Days
• Zip Bounce Ratio, Zip3 Bounce Ratio and City Bounce Ratio
• Carrier Bounce Ratio
• Shipper Carrier Bounce Ratio
• Carrier Load Day Bounce Ratio and Carrier Pick up Time Bounce Ratio
• Equipment Drivers
• Carrier CS Bounce Ratio
• Load Over Weekend
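The pairwise screening described above can be sketched as follows. The 0.5 cut-off on the absolute correlation comes from the text; the Pearson coefficient, the column names and the sample values are illustrative assumptions, and choosing which variable of each pair to drop is left to the analyst.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every pair whose |r| exceeds the threshold."""
    pairs = []
    for name_a, name_b in combinations(columns, 2):
        r = pearson(columns[name_a], columns[name_b])
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Made-up values: cost grows almost linearly with miles, booking lead time does not.
columns = {
    "miles": [100, 200, 300, 400, 500],
    "cost": [210, 390, 620, 790, 1010],
    "book_to_pickup_hours": [30, 10, 50, 20, 40],
}
pairs = highly_correlated_pairs(columns)
```

In this toy example only the miles and cost pair exceeds the cut-off, which mirrors the first group in the list above.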
Once the highly correlated attributes were excluded, a multicollinearity analysis was done.
Variance Inflation Factor (VIF) analysis was used to identify and eliminate potential multicollinearity
among three or more variables. This analysis identifies multicollinearity among the variables that would
unnecessarily inflate the standard error of the model’s coefficients (Akinwande, 2015). However, the result
of the VIF analysis on the dataset did not identify any additional correlations that were not captured by the
simple correlation analysis.
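For illustration, a VIF can be computed for each variable as the corresponding diagonal element of the inverse of the correlation matrix, which is equivalent to 1/(1 - R²) from regressing that variable on the others. The sketch below uses only the standard library; the column names and values are hypothetical, and it does not represent the software the authors used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [value - factor * p for value, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance Inflation Factor per column: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical columns with a pairwise correlation of 0.3.
factors = vif({
    "miles": [100, 200, 300, 400, 500],
    "book_to_pickup_hours": [30, 10, 50, 20, 40],
})
```

A VIF close to 1 indicates little collinearity. The text does not state which cut-off was applied, so any threshold used with this sketch is the reader's choice.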
The outcome of the two correlation analyses led to a reduction of 14 features from the original
dataset (52 to 38 features).
3.3.2. Variable Normalization
Numerical load characteristics were normalized to avoid the possible negative impact of combining
large and small numbers on the logistic regression model (Kutner, Nachtsheim & Neter, 2004). The normalization
process involved rescaling each numerical attribute to a mean of zero and a standard deviation of one,
so that all attributes have comparable ranges. However, no significant difference
was found in the model after comparing the preliminary results of the normalized and non-normalized
variables. Therefore, the simplest approach was taken by using the non-normalized values to reduce the
model complexity.
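The rescaling described above (mean of zero, standard deviation of one) can be sketched with the standard library. The sample values are made up, and using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant attribute carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


miles = [100, 200, 300, 400, 500]  # hypothetical attribute values
z_miles = standardize(miles)
```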
3.3.3. Bootstrap Forest Predictor Screening and Stepwise Regression
After finalizing the set of attributes to be used in the model, the next step was to identify the features
that best describe the probability of cancellation. The Predictor Screening algorithm, which is based on Random
Forest models, is commonly used to rank features by their predictive power: it ranks the attributes
according to their contribution to predicting the desired outcome (Biau, 2012). This
algorithm was applied to the loads dataset and the ranking was obtained accordingly (refer to Appendix F:
Variables Prediction Rankings for detailed results).
In order to identify the optimal combination of features to be included in a regression model, a
forward stepwise regression algorithm was used. This algorithm is widely used to identify the best subset
of features for a regression model. It creates a sequence of regression models by adding or
removing features and measuring their impact on the adjusted R² (Kutner et al., 2004).
The tradeoff between model improvement and complexity was assessed in order to select the features
to be included in the model. According to the results of the forward stepwise regression (Figure 4), the
contributions of features six onward to the model accuracy are very low. Adding these features to the model
would increase its complexity without considerable improvement in prediction power. Accordingly, only
the top five attributes were used in the model:
• Carrier City Bounce Ratio
• Book to Pick up Hours
• Carrier Equipment Type Bounce Ratio
• Contract-Spot
• Shipper Bounce Ratio
Figure 4: Forward Stepwise Regression algorithm’s results.
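The forward selection logic and the stopping rule described above (stop once an extra feature adds very little) can be sketched generically. In this illustration the `score` callable stands in for fitting a regression and returning its adjusted R²; the feature names, the toy contributions and the `min_gain` cut-off are all hypothetical.

```python
def forward_stepwise(candidates, score, min_gain=0.01):
    """Greedy forward selection: add the feature with the best score at each step
    and stop when the best available improvement falls below `min_gain`."""
    selected = []
    remaining = list(candidates)
    best_score = score(selected)
    history = []
    while remaining:
        trial_score, feature = max((score(selected + [f]), f) for f in remaining)
        if trial_score - best_score < min_gain:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = trial_score
        history.append((feature, trial_score))
    return selected, history


# Toy scoring function: each feature adds a fixed, made-up amount to the score.
TOY_GAIN = {"feature_a": 0.30, "feature_b": 0.10, "feature_c": 0.002}


def toy_score(features):
    return sum(TOY_GAIN[f] for f in features)


selected, history = forward_stepwise(TOY_GAIN, toy_score)
```

With these toy numbers the third feature is rejected because it would raise the score by only 0.002, which is the same kind of judgment applied to features six onward in Figure 4.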
3.4. IMPROVE
3.4.1. Logistic Regression
Once the best features to be used for predicting load cancellations were identified, the Logistic
Regression model was built. First, all the categorical features were transformed into dummy variables to
be used in the regression model. This step involved creating a new binary variable for each value in the
categorical attributes, which is necessary for regression models. Then, the dataset was split into training
and testing samples by random selection. Using a common rule of thumb, the data were split as follows:
• Training set: 80% of the dataset
• Testing set: 20% of the dataset
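The two preparation steps above (dummy variables and a random 80/20 split) can be sketched as follows. The record layout, the `contract_type` field and the random seed are hypothetical.

```python
import random


def one_hot(records, field):
    """Replace `field` with one 0/1 column per distinct value found in the data."""
    categories = sorted({record[field] for record in records})
    encoded = []
    for record in records:
        row = {key: value for key, value in record.items() if key != field}
        for category in categories:
            row[f"{field}_{category}"] = int(record[field] == category)
        encoded.append(row)
    return encoded


def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten made-up loads with a single categorical attribute.
loads = [{"load_id": i, "contract_type": "spot" if i % 3 == 0 else "contract"}
         for i in range(10)]
encoded = one_hot(loads, "contract_type")
train, test = train_test_split(encoded)
```

When a regression includes an intercept, one dummy per categorical attribute is usually dropped as the reference level to avoid perfect collinearity; whether that was done here is not stated in the text.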
The training set was used to fit the logistic regression and create the predictive model
(de Menezes et al., 2017). Figure 5 shows the model parameters obtained; all parameters
are statistically significant at the 1% level (which corresponds to a confidence level of 99%).
Figure 5: Model Parameters and characteristics
3.4.2. Model Accuracy Analysis
As mentioned, the training set was used to build the model and obtain its parameters. The testing
set was used to validate the model performance on data that were not used in the
building process. Confusion matrices were created to assess how accurately the model can predict
cancellations in the testing set (Fawcett, 2005).
Finally, an additional three months of new data were requested. The data were used to test the
model under unknown circumstances to identify whether the model was good enough to predict load
cancellations for new loads.
3.4.3. Machine Learning
A series of machine learning algorithms was also applied to the dataset to validate the results
obtained with the Logistic Regression model. As referenced in the literature review section, the
machine learning models used were:
• Neural Networks
• Random Forest
• K-Nearest Neighbors
The results obtained by these models were compared to the results obtained by the Logistic
Regression model to verify the robustness of the regression model. All models gave equivalent
results under the same assumptions. Detailed results are discussed further in the results section.
3.5. CONTROL
3.5.1. Sensitivity Analysis
The last step of the analysis was developing a sensitivity analysis that allowed testing the model
under different assumptions and circumstances. Logistic regression threshold analysis was conducted to
measure the impact of the threshold selection on the model robustness. This analysis covered the model’s
accuracy, specificity, sensitivity, precision, and recall.
4. Results
Multiple models were developed to predict the probability of load cancellation using the available
dataset. Logistic regression and machine learning models were initially based on the attributes available in
the original dataset along with simple calculated fields to assess their predictive power. The dataset was
later enriched by introducing fields that represent cancellation behaviors of carriers and by linking new data
sources. This section presents the results obtained by the predictive models on each dataset and the approach
followed to confirm the results.
4.1. Available Dataset
A Logistic regression model was initially developed using the load-level data, which was
aggregated from the provided stop-level data (for details about the list of variables refer to Appendix F:
Variables Prediction Rankings). Simple calculated fields were added to the dataset (like load weekday and
month, load duration, days and hours between booking and load pickup) to convert date fields into
numerical variables that can be used in the model. The model results reflected that the available attributes
did not have enough prediction power to predict carrier load cancellation. Table 5 shows the confusion
matrix of the logistic regression model which reflects that only 1.5% of the cancellations were predicted
correctly in this model.
Table 5: Logistic regression confusion matrix on the available dataset
              Predicted No   Predicted Yes   Total
Actual No     652,501        2,956           655,457
Actual Yes    129,727        1,971           131,698
Total         782,228        4,927           787,155
Error: 16.86%
Missed Bounces: 98.50%
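The two summary figures under the confusion matrix follow directly from its four cells. The sketch below recomputes them from the Table 5 counts; the function and key names are illustrative.

```python
def confusion_summary(tn, fp, fn, tp):
    """Error and missed-bounce rates from the four cells of a confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "error": (fp + fn) / total,               # share of all loads misclassified
        "missed_bounces": fn / (fn + tp),         # cancelled loads predicted as not cancelled
        "bounces_predicted": tp / (fn + tp),      # cancelled loads predicted correctly
    }


# Cell counts taken from Table 5.
table5 = confusion_summary(tn=652_501, fp=2_956, fn=129_727, tp=1_971)
print(f"Error: {table5['error']:.2%}")
print(f"Missed bounces: {table5['missed_bounces']:.2%}")
```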
Machine learning models were also developed to confirm the results obtained by the logistic
regression model. Neural Networks, Random Forest, and K-Nearest Neighbor algorithms were applied to
the same dataset to evaluate their performance and compare it to the logistic regression performance.
It was noted that K-Nearest Neighbor predicted cancellations more accurately than the other models, but
its overall accuracy was lower. This can be explained by a tradeoff between True Positives and False
Positives, which caused this performance difference. However, when compared to a random
prediction, all algorithms performed poorly in predicting carrier load cancellation. Results of the machine
learning algorithms are summarized in Table 6.
Table 6: Machine learning models’ results on the available dataset
Model                Error    % Missed Bounces
Neural Networks      16.73%   99.95%
Random Forest        16.61%   99.48%
K-Nearest Neighbor   19.90%   84.44%
4.2. Enriched Dataset
As none of the variables in the provided dataset proved to be a good predictor of load cancellation, an effort was
made to enrich the dataset with more information (for details about the list of variables refer to Appendix
F: Variables Prediction Rankings). The focus was on historical bounce ratios for selected characteristics,
to test whether these ratios would provide better results in predicting cancellations. A bounce ratio is
the percentage of cancelled loads out of the total loads for that characteristic.
Bounce ratios were calculated for single characteristics (like carriers, shippers, city and state) as
well as combined characteristics (like carrier-city, carrier-contract type and carrier-equipment type). These
ratios were obtained using the three-year dataset and added as additional variables to the load information.
Moreover, weather alerts information was obtained from National Centers for Environmental Information
(NCEI) and added to the data to test the impact of weather severity on cancellation decisions.
The three-year dataset was split into modeling and testing parts, which allowed the logistic
regression model to be developed and its performance tested on different data. Initially, the carrier-city bounce ratio
appeared to be a good predictor of load cancellation, and the model was able to accurately predict more
than 60% of the cancelled loads. Table 7 shows the confusion matrix for the new logistic regression model.
Table 7: Logistic regression confusion matrix on the enriched dataset
              Predicted No   Predicted Yes   Total
Actual No     638,652        16,880          655,532
Actual Yes    52,155         79,468          131,623
Total         690,807        96,348          787,155
Error: 8.77%
Missed Bounces: 39.62%
Similar improvements were also achieved in the cancellations prediction using the other machine
learning models that were tested earlier, as shown in Table 8.
Table 8: Machine learning models’ results on the enriched dataset
Model                Error    % Missed Bounces
Neural Networks      8.67%    39.04%
Random Forest        8.70%    42.13%
K-Nearest Neighbor   9.33%    44.32%
However, this improvement had to be interpreted carefully, as the most significant
variables in these models were the newly introduced bounce ratios. These bounce ratios were calculated
over the whole three-year dataset. This may have biased the results on the available dataset, as
the actual outcome of each load was already incorporated into the ratios used to predict it. This possible effect required testing the
model on a new dataset that was used neither in the models’ training nor in the ratio calculation.
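The concern can be made concrete with a small sketch: when a ratio is computed over all loads, each load's own outcome feeds the feature used to predict it, whereas a ratio computed only from loads before a cut-off does not. The data, field names and cut-off below are hypothetical, and the fallback for carriers without history is an assumption.

```python
from collections import defaultdict


def carrier_bounce_ratios(loads):
    """Cancelled loads divided by booked loads, per carrier."""
    booked, cancelled = defaultdict(int), defaultdict(int)
    for load in loads:
        booked[load["carrier"]] += 1
        cancelled[load["carrier"]] += int(load["cancelled"])
    return {carrier: cancelled[carrier] / booked[carrier] for carrier in booked}


def add_history_ratio(history, new_loads, default):
    """Attach ratios computed from `history` only, so the outcome of a new load
    never contributes to its own feature."""
    ratios = carrier_bounce_ratios(history)
    return [dict(load, carrier_bounce_ratio=ratios.get(load["carrier"], default))
            for load in new_loads]


# Made-up loads ordered by booking day.
loads = [
    {"day": 1, "carrier": "C1", "cancelled": True},
    {"day": 2, "carrier": "C1", "cancelled": False},
    {"day": 3, "carrier": "C2", "cancelled": False},
    {"day": 4, "carrier": "C1", "cancelled": True},
    {"day": 5, "carrier": "C3", "cancelled": True},
]
cutoff_day = 3
history = [load for load in loads if load["day"] <= cutoff_day]
future = [load for load in loads if load["day"] > cutoff_day]

# Carriers with no history fall back to the overall historical ratio (an assumption).
overall = sum(load["cancelled"] for load in history) / len(history)
scored = add_history_ratio(history, future, default=overall)

# For comparison: ratios over ALL loads already contain the future outcomes.
leaky = carrier_bounce_ratios(loads)
```

In the toy data, carrier C3 has a single load that was cancelled, so the all-data ratio for C3 is 1.0 and gives the answer away, while the history-only feature falls back to the overall ratio.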
A new dataset was provided covering three months of loads, which was used to test the model’s
performance. Prediction capability of the model dropped significantly (as shown in Table 9) when tested
on the new data, which confirmed the initial concern of the bounce ratios bias. The obtained results reflected
that neither load characteristics nor historical cancellation patterns provide sufficient information to
correctly predict the cancellation probability of future loads.
Table 9: Logistic regression confusion matrix on the new dataset
              Predicted No   Predicted Yes   Total
Actual No     59,883         3,735           63,618
Actual Yes    8,903          1,722           10,625
Total         68,786         5,457           74,243
Error: 17.02%
Missed Bounces: 83.79%
The same approach was followed to confirm the results using machine learning algorithms. The
same algorithms used earlier (Neural Networks, Random Forest, and K-Nearest Neighbor) were trained
using the previous three-year data and tested against the new three-month data. All models performed
poorly and did not provide sufficient accuracy to be considered (as shown in Table 10). The similarity of
results among the different models led to the conclusion that the drivers of carrier load cancellations are not
explained by any of the existing data or the data added to the loads information.
Table 10: Machine learning models results on the new dataset
Model                Error    % Missed Bounces
Neural Networks      16.78%   84.70%
Random Forest        16.19%   87.98%
K-Nearest Neighbor   16.41%   86.66%
4.3. Unpredictability Testing
The results obtained by all the models suggested that the available data are not sufficient to predict
carrier load cancellations. They also showed that none of the machine learning models outperformed the
logistic regression model. Accordingly, further tests were carried out based on the logistic
regression model due to its interpretability and ease of development.
Further testing was conducted to confirm the conclusion reached from the models’ results. Two
main hypotheses were developed and tested to confirm the unpredictability conclusion.
1. Prediction accuracy will improve if the model is only applied to loads with enough historical
data. As the main predictors in the developed model were the calculated bounce ratios, the
model is not expected to perform well when no previous ratios exist. Accordingly, the new
dataset was filtered to only include the loads with at least 10 previous records from which to compute a ratio.
The model was tested using the filtered data, and the prediction accuracy was again not within an
acceptable range (Table 11), as it could predict less than 20% of the cancellations.
Table 11: Confusion matrix for loads with at least 10 previous records
              Predicted No   Predicted Yes   Total
Actual No     21,449         368             21,817
Actual Yes    2,222          542             2,764
Total         23,671         910             24,581
Error: 10.54%
Missed Bounces: 80.39%
2. Prediction accuracy will improve if the model is only applied to shorter time horizons. Since
cancellation ratios are dynamic and change over time, the data were filtered to include only the
first seven days of the new dataset. The filter was applied to exclude any uncertainty from new
cancellation patterns that might have developed in the new loads. The resulting prediction
accuracy was not acceptable for this filtered dataset either, as the model was capable of
predicting only 20% of the cancellations (Table 12):
Table 12: Confusion matrix for loads that happened on the next seven days
              Predicted No   Predicted Yes   Total
Actual No     2,147          31              2,178
Actual Yes    176            44              220
Total         2,323          75              2,398
Error: 8.63%
Missed Bounces: 80.00%
The low accuracy obtained from the model showed that both hypotheses were false and that prediction
power could not be improved with better historical data. Accordingly, the conclusion that cancellations are
unpredictable with the available data was confirmed. This poor accuracy means that cancellations are either random events or
caused by other factors that are not captured by the studied dataset.
4.4. Further analysis
Additional tests were done to further validate the lack of prediction power of the available data.
The following assumptions were tested:
• Cluster the data using specific attributes (miles, costs, book to pick up hours), and build
different models for each cluster. The objective was to test if cancellation behaviors differ for
different clusters that could not be captured by a single model.
• Reduce dimensionality of data using principal component analysis and use the most descriptive
principal components as an input for the machine learning algorithms. Then, validate if those
principal components could be used to build a better model.
• Develop a model that focuses only on two-stop loads (single pickup and single drop), excluding
loads with multiple stops. The objective was to eliminate possible factors that affect
multi-stop loads and were not captured in the data.
• Consider only the cancelled record for each cancelled load. The data had the same load repeated
for each cancellation and for the actual completed load. By only keeping the cancelled records,
the model can better capture the cancellation patterns.
• Build a model that relies only on time related attributes (“Month”, “Day of the week” and
“Book to Pickup Hours”) and assess their impact on load cancellations.
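The first test in the list can be sketched as a segmentation step that precedes model fitting. The cost boundaries used here ($500 and $6,000) are the ones reported for the cost clusters in Table 13; the record layout and sample values are hypothetical, and the fitting of one model per segment is omitted.

```python
from collections import defaultdict


def cost_cluster(cost):
    """Assign a load to a cost cluster using the boundaries reported in Table 13."""
    if cost <= 500:
        return "low"
    if cost >= 6000:
        return "high"
    return "mid"


def split_by_cluster(loads, cluster_of=cost_cluster, field="cost"):
    """Group loads by cluster so that a separate model can be fitted to each group."""
    groups = defaultdict(list)
    for load in loads:
        groups[cluster_of(load[field])].append(load)
    return dict(groups)


# Made-up loads covering the three clusters.
loads = [
    {"load_id": 1, "cost": 320, "cancelled": False},
    {"load_id": 2, "cost": 1450, "cancelled": True},
    {"load_id": 3, "cost": 7200, "cancelled": False},
    {"load_id": 4, "cost": 500, "cancelled": True},
]
clusters = split_by_cluster(loads)
```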
All these additional assumptions were tested by developing multiple models and comparing the
results with the previously developed model. None of the scenarios gave results that showed significant
improvement from the original logistic regression model. Summary of the results is shown in Table 13.
Table 13: Additional analyses results compared to the results obtained for the logistic regression.
Test                                                                            Error    Missed Bounces
Logistic Regression (Threshold=0.5) - Base Scenario                             17.02%   83.79%
Cost Clustering
  Low Cost (<= $500)                                                            18.20%   99.06%
  Mid Cost                                                                      16.67%   98.46%
  High Cost (>= $6000)                                                          8.49%    100.00%
Miles Clustering
  Same day delivery (<= 250 mi)                                                 16.07%   99.18%
  Next Day delivery                                                             18.08%   98.18%
  Long Haul (>= 550 mi)                                                         18.08%   98.18%
Book To pickup Hours Clustering
  Less than 24h                                                                 8.53%    100.00%
  Between 24h and 48h                                                           16.91%   100.00%
  Between 48h and 72h                                                           20.58%   99.99%
  More than 72h                                                                 22.33%   99.58%
PCA Analysis
  Neural Networks                                                               16.38%   86.16%
  Random Forest                                                                 14.84%   93.67%
  K-Nearest Neighbors                                                           17.49%   82.50%
One record per load (first cancellation) and only two stops scenario            17.07%   99.74%
Model based on time attributes (month, day of the week, book to pick-up hours)  17.03%   100.00%
4.5. Logistic regression Threshold Sensitivity Analysis
The final test was a sensitivity analysis of the logistic regression threshold used to convert the
numerical regression output to a binary output. For the base analysis a threshold of 0.5 was used. This
means that if the model output is a value between 0.5 and 1, the load is predicted as cancelled (a bounce),
while if the output value is between 0 and 0.5, the load is predicted as not cancelled.
Reducing the threshold to the average cancellation ratio (17%) might improve the prediction power
of the model and enhance the model results. This reduction would improve the prediction accuracy of
cancelled loads through decreasing the number of False Negatives (wrongly predicted as not cancelled) and
increasing the number of True Positives (accurately predicted as cancelled). However, this action would
also impact the number of False Positives (wrongly predicted as cancelled) and the overall model accuracy.
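The effect of moving the threshold can be sketched on a handful of made-up model outputs. The scores and labels below are hypothetical and only serve to show the mechanics; treating a score equal to the threshold as a predicted cancellation is an assumption.

```python
def confusion_counts(actual, scores, threshold):
    """Count TP, FP, FN and TN when scores at or above `threshold` are flagged."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for is_cancelled, score in zip(actual, scores):
        predicted_cancelled = score >= threshold
        if predicted_cancelled and is_cancelled:
            counts["tp"] += 1
        elif predicted_cancelled:
            counts["fp"] += 1
        elif is_cancelled:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(c):
    """Accuracy, sensitivity (recall), specificity and precision from the counts."""
    return {
        "accuracy": ratio(c["tp"] + c["tn"], sum(c.values())),
        "sensitivity": ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": ratio(c["tn"], c["tn"] + c["fp"]),
        "precision": ratio(c["tp"], c["tp"] + c["fp"]),
    }


# Hypothetical loads: 1 = cancelled, 0 = shipped, with made-up model outputs.
actual = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.62, 0.10, 0.30, 0.25, 0.05, 0.55, 0.18, 0.12, 0.20, 0.08]

base = confusion_counts(actual, scores, threshold=0.5)
lowered = confusion_counts(actual, scores, threshold=0.17)
sweep = {t: metrics(confusion_counts(actual, scores, t)) for t in (0.1, 0.17, 0.3, 0.5)}
```

In this toy case lowering the threshold from 0.5 to 0.17 removes both False Negatives but triples the False Positives, which is the kind of tradeoff examined in the rest of this section.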
Such a tradeoff is commonly used in disease diagnosis to define the threshold at which a disease will
be diagnosed, as shown in Figure 6:
Figure 6: Threshold tradeoff curves. Curve of people diagnosed with a disease (comparable to cancellations) and people without
the disease. It shows that decreasing the criterion value (threshold) will increase the True Positives but will also increase the False
Positives. The distribution of the test results will overlap and changes in criterion value will always be a tradeoff (Source:
MEDCALC)
So, if the threshold (criterion value in Figure 6) is moved to the left (decreased) to reduce the
number of False Negatives (FN), that will inevitably increase the False Positives (FP). Given that, for 3PL
companies, failing to predict actual cancellations (FN) is worse than predicting uncancelled loads as
cancelled (FP), this could be a strategy worth considering.
However, another aspect to consider in this tradeoff between FN and FP is the rate of change in
each category. If this tradeoff were proportional, the approach could be justified, as it reduces the error that
is significant to the company. On the other hand, if the relationship between the two errors is not
proportional, it is important to understand the impact of this tradeoff on the overall results. Therefore, an
assessment was made of how the two errors (FP and FN) change as the threshold changes.
To verify how the model accuracy changes by changing the threshold, a sensitivity analysis was
developed. The analysis assessed the rate of change in both FN and FP as the threshold is modified. The
actual number of FN loads and FP loads were recorded to identify the operational impact of this tradeoff.
Finally, recall, precision, sensitivity, specificity and the Receiver Operating Characteristic (ROC) Curve
were plotted to provide a broader understanding of how the threshold changes impact the model. Figure 7,
Figure 8, Figure 9, Figure 10 and Figure 11 show the results obtained:
Figure 7: Relative variation of FN and FP compared to the base value of loads for a threshold of 0.5 and model accuracy evolution
for different threshold values
[Chart "Relative Variation of FP and FN". X-axis: Threshold (0 to 0.60). Y-axes: % Variation from Base (Threshold=0.5) and Accuracy (%). Legend: TN, FN Relative Variation, FP (Missed Not Bounces).]
Figure 8: Total number of FN and FP loads over the new testing data set and bounces predicted correctly (as % of total bounces)
for different threshold values
[Chart "Loads". X-axis: Threshold (0 to 0.60). Y-axes: Loads and % of Bounces predicted correctly of total Bounces. Legend: TP, FN (Missed Bounces), Model Accuracy.]
Figure 9: Recall vs Precision (Recall = TP/(TP+FN) and Precision = TP/(TP+FP))
[Chart "Recall vs Precision". X-axis: Threshold (0 to 0.60). Y-axis: 0% to 90%. Legend: Recall, Precision, Model Accuracy.]
Figure 10: Sensitivity vs Specificity (Sensitivity = TP/(TP+FN) and Specificity = TN/(TN+FP))
[Chart "Sensitivity vs Specificity". X-axis: Threshold (0 to 0.60). Y-axis: 0% to 100%. Legend: Sensitivity, Specificity, Model Accuracy.]
Figure 11: Receiver Operating Characteristic (ROC) Curve illustrating the ability of the classifier as the threshold changes
[Chart "ROC Curve". X-axis: 1-Specificity (0% to 100%). Y-axis: Sensitivity (0% to 100%). Legend: ROC Curve, 45° Curve.]
As observed in the charts, reducing the threshold decreases the number of FN, which improves the
prediction of cancelled loads. However, the rate at which FP increases is much higher. For a threshold
value of 0.17 (which represents the average cancellation ratio), FN decreased 31% compared to the base
value obtained with the 0.5 threshold, while FP increased 244%. In terms of total loads, the threshold
reduction allowed an additional 2,743 cancelled loads to be predicted correctly. On the other hand, an additional 9,130
loads were wrongly predicted as cancelled.
Based on these results, the model could potentially be used under a new threshold, but that change
would not improve the model. A new threshold would just reduce the most relevant error for the
company by increasing, at a much higher rate, the error on the other side.
5. Discussion
As presented in the results section, the studied dataset is not good enough to predict the probability
of a load being cancelled. Based on these results, neither the load characteristics nor the carrier behavior
over time give enough insights into the main drivers of cancellations.
The alternative of changing the threshold, to increase the prediction power of the cancelled loads,
could be used if the company is willing to accept the explained tradeoff between False Positives and False
Negatives. However, this approach would reduce the overall model’s accuracy and force the model to
reduce one type of error by increasing another at a much higher rate.
Based on this assessment, there are three potential approaches, which could be taken
simultaneously, that can be adopted to face the cancellation issue.
5.1. Threshold Reduction
One alternative, based on the results obtained, is to use the model with a lower threshold. By using
a threshold of 0.17 (average bounce ratio), the number of missed cancellations falls by 31%, and the
model can predict 42% of the cancelled loads correctly. However, the cost of this alternative is an increase,
at a much higher rate (244%), in the number of loads wrongly predicted as cancelled. An assessment of the
extra cost associated with this tradeoff must be made in order to evaluate its applicability. Loads predicted
to be cancelled will most probably have some contingency plan or would need more supervision than the
rest of the loads. By reducing the threshold, the company will correctly predict more cancelled loads, but it
will need to monitor or have contingency plans for a much larger number of loads. The ratio at which the
number of loads changes is 4.3 to 1, which means that to predict 1 additional cancelled load correctly, 4.3
additional loads will be predicted as cancelled.
The decision of whether to use the model under this tradeoff would most probably depend
on the company strategy and cost structure. If the average cost per cancellation ($145) is considered,
changing the threshold is only justified if the cost of the added errors to the company’s operation is lower.
Accordingly, changing the threshold is feasible if all the actions taken to control those additional 4.3 loads
do not cost more than $145 (which is the average potential saving from avoiding a cancellation). This
means that $33.5 per load is the maximum additional cost for developing contingency plans or
tracking loads predicted to be cancelled.
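The figures quoted in this subsection can be reproduced from the counts reported earlier. The sketch below is one reading of the arithmetic that matches the 4.3 ratio and the $33.5 limit: the 4.3 additional flagged loads are taken to include the additional correctly predicted load itself. The variable names are illustrative.

```python
# Counts at the base threshold of 0.5, taken from Table 9.
base_fn, base_fp, base_tp = 8_903, 3_735, 1_722

# Changes reported for a threshold of 0.17.
extra_tp = 2_743   # additional cancelled loads predicted correctly
extra_fp = 9_130   # additional loads wrongly predicted as cancelled

avg_cancellation_cost = 145.0  # average cost per cancellation, in dollars

fn_reduction = extra_tp / base_fn                      # share of missed bounces recovered
fp_increase = extra_fp / base_fp                       # relative growth of false alarms
recall_at_017 = (base_tp + extra_tp) / (base_tp + base_fn)

# Assumption: every additional flagged load (true or false alarm) needs follow-up.
flagged_per_extra_catch = (extra_tp + extra_fp) / extra_tp
break_even_cost_per_flagged_load = avg_cancellation_cost / flagged_per_extra_catch

print(f"FN reduction: {fn_reduction:.0%}, FP increase: {fp_increase:.0%}")
print(f"Cancelled loads predicted at 0.17: {recall_at_017:.0%}")
print(f"Flagged loads per extra catch: {flagged_per_extra_catch:.1f}")
print(f"Break-even cost per flagged load: ${break_even_cost_per_flagged_load:.1f}")
```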
5.2. Further Research
Another approach is to do further research on the main drivers of the cancellations and capture
more information about these causes. As seen in the results, the cancellations are mostly related to the
carrier and the cities in the load route. However, the cancellation behavior does not remain constant over
time, so the causes are probably diverse. Surveys of existing carriers can be implemented to capture the
range of reasons for cancellations. In addition, the real cause of each new cancellation can be captured to
understand the details and drivers in each case. The survey may include some potential root causes that can
be categorized and studied in the future. Moreover, transactional data related to the captured causes should be
recorded with the loads information to facilitate future analyses. Once a considerable number of surveys has been
collected, further research can be done in order to build better models for predicting load cancellations.
Some of the operational details that could be collected and could be useful for predicting carrier
load cancellations are:
• Information regarding sequence of back-to-back loads that use the same truck
• Information regarding available capacity of the carrier during the load day
• Information regarding drivers’ behavior
However, the challenge is that such information is not fully captured by a single 3PL
company, because carriers interact with multiple shippers and 3PLs at the same
time. A delay on any of the loads (not necessarily booked by the same company) may cause a cancellation
of another load. Moreover, carriers may overbook their capacity to reduce their risk of underutilization, and
then pick the loads that are more profitable and convenient. This may also cause cancellations of the 3PL
company’s loads, even if the company is not fully booking that carrier’s capacity. Therefore, capturing such
information will be a challenge, as it is not available at the company level but only at the industry level.
5.3. Business Strategy
Another approach that can help to reduce the impact of cancellations is to redefine parts of the
company’s booking process. Currently, carriers face no defined consequences for cancellations.
This may incentivize carriers to overbook their capacity and cancel the “least attractive” options later. As
the main drivers of carrier cancellations have not been identified, a possible way to increase the
commitment of carriers is to include penalty clauses in the contract. These penalties can be either financial
penalties or a reduction in the business awarded to the carrier.
Companies can also develop carrier ratings based on cancellation ratios. These ratings can be used
for future contract negotiations and performance reviews. Making carriers aware of the consequences of
cancellations might help in reducing these cancellations or in obtaining longer notice when a cancellation is
inevitable.
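A carrier rating of the kind described above could be derived from booking history roughly as follows. This is only a sketch: the input reuses the CarrierID and IsBounced fields from the dataset glossary, while the rating tiers, their cut-offs, and the minimum-load rule are hypothetical choices made for illustration.

```python
from collections import defaultdict


def cancellation_ratios(bookings):
    """Return {carrier_id: (bounced, total, ratio)} from (carrier_id, is_bounced) pairs."""
    counts = defaultdict(lambda: [0, 0])  # carrier_id -> [bounced, total]
    for carrier_id, is_bounced in bookings:
        counts[carrier_id][1] += 1
        if is_bounced:
            counts[carrier_id][0] += 1
    return {c: (b, t, b / t) for c, (b, t) in counts.items()}


def rate_carrier(ratio, total, min_loads=3):
    """Map a cancellation ratio to a rating tier (hypothetical cut-offs)."""
    if total < min_loads:
        return "unrated"  # too little history to judge the carrier
    if ratio <= 0.05:
        return "A"
    if ratio <= 0.15:
        return "B"
    return "C"


# Illustrative bookings only; these are not figures from the dataset.
bookings = [
    ("C1", False), ("C1", False), ("C1", False), ("C1", False),
    ("C2", True), ("C2", False), ("C2", True), ("C2", False),
    ("C3", True),
]
ratings = {
    carrier: rate_carrier(ratio, total)
    for carrier, (bounced, total, ratio) in cancellation_ratios(bookings).items()
}
print(ratings)
```

Requiring a minimum number of loads before rating a carrier avoids penalizing new carriers on the basis of one or two bookings.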
6. Conclusion
Load cancellations are a common issue within the trucking industry. The cost impact of cancelled
loads on companies represents a significant amount of money every year. This extra cost is associated with
finding a replacement carrier on the spot market and losing productivity in the rebooking process.
3PL companies have access to a huge amount of detailed transactional data generated by their
operations. Descriptive analysis of these data showed some factors that may impact the probability of load
cancellations. However, those factors (characteristics) were not enough to build a predictive model with
considerable accuracy.
Reducing the logistic regression threshold at which a load is predicted as cancelled improves the
prediction of cancelled loads, but at the same time it predicts more uncancelled loads as cancelled. With
this approach, the model can predict at a decent level of accuracy and sensitivity. However, this tradeoff
also decreases the model’s specificity and increases, at a much higher rate, the number of loads wrongly
predicted as cancelled. This would have an impact on the company’s operation, as the number of loads that
must be monitored will increase at a very high rate.
Based on these results, different approaches can be followed to lower the impact of load
cancellations. First, the company can use the model with a lower threshold and accept the high number of
False Positives in order to correctly predict up to 42% of the cancelled loads. Another alternative is to
improve the model by surveying the carriers to identify the causes of cancellations. Surveys can identify
the root causes and enable building appropriate metrics to capture these causes. Once the data are captured,
they can be incorporated into the predictive model to improve its accuracy. One last alternative is to adopt
business decisions that discourage carriers from cancelling loads or encourage them to provide longer
notice if cancellation is inevitable.
Finally, it is known that the trucking industry is very complex, with many stakeholders that are not
necessarily interconnected. This complexity might limit the possibility of building a good predictive
model using the company data alone. As carriers work with many shippers and brokers at the same time,
cancellations might be a consequence of delays or cancellations in other loads that are not managed by the
same company. This implies that companies will always have a limited view of all the factors that
might impact load cancellations, which consequently hinders their ability to build a sound predictive model.
7. Reference List
Alcoba, R. D., Ohlund, K. W. (2017). Predicting On-time Delivery in the Trucking Industry. Massachusetts
Institute of Technology, Center for Transportation and Logistics.
Akinwande, M. O. (2015). Variance Inflation Factor: As a Condition for the Inclusion of Suppressor
Variable(s) in Regression Analysis. Open Journal of Statistics, 5, 754-767. Retrieved form
http://www.scirp.org/
American Trucking Associations (2017). Reports, Trends & Statistics. Retrieved from
http://www.trucking.org/
AT Kearney (2016). CSCMP’s Annual State of Logistics Report. Logistics in Transition: New Drivers at
the Wheel. Retrieved from https://www.lee-associates.com/
Biau, G. (2012). Analysis of a Random Forests Model. Journal of Machine Learning Research 13, 1063-
1095. Retrieved from http://www.jmlr.org/
U.S. Department of Transportation Bureau of Transportation Statistics (2015). Freight Facts and Figures.
Retrieved from https://www.bts.gov/
de Menezes, F. S., Liska, G. R., Cirillo, M. A., Vivanco, M. J. F. (2017). Data classification with binary
response through the Boosting algorithm and logistic regression. Elsevier, 69, 62-73. Retrieved from
https://www.journals.elsevier.com/
Fawcett, T. (2005). An introduction to ROC analysis. Elsevier, 27, 861-874. Retrieved from
https://www.journals.elsevier.com/
Kotsiantis, S.B., Zaharakis, I.D., Pintelasstates, P.E. (2007). Machine learning: a review of classification
and combining techniques. Artif Intell, 26, 159-190. DOI 10.1007/s10462-007-9052-3
Kutner, M. H.; Nachtsheim, C. J.; Neter, J. (2004). Applied Linear Regression Models. New York, NY:
McGraw-Hill Irwin. ISBN 0-07-238688-6
Markham, K. (2015). Comparing supervised learning algorithms. Retrieved from
http://www.dataschool.io/
MEDCALC easy-to-use statistical software (2018). ROC curve analysis. Retrieved from:
https://www.medcalc.org/
National Centers for Environmental Information (2017). Storm Events Database: years 2015, 2016 and
2017. Retrieved from: https://www.ncdc.noaa.gov/
Olson, D. L., Wu, D. (2017). Predictive Data Mining Models (1st ed.). Singapore: Springer. DOI
10.1007/978-981-10-2543-3
Pillet, M. (2004). Six Sigma Comment l’Appliquer. Paris : Éditions d’Organisation. ISBN: 2-7081-3029-3
Potter, K. (2006). Methods for Presenting Statistical Information: The Box Plot. University of Utah School
of Computing.
38
Thompson, W.R. (2009). Variable Selection of Correlated Predictors in Logistic Regression: Investigating
the Diet-Heart Hypothesis (Thesis). Florida State University College of Arts and Sciences.
U.S. Department of Transportation Federal Highway Administration (2003). Commercial Vehicle Weight
Standards. Retrieved from https://ops.fhwa.dot.gov/
39
8. Appendices
Appendix A: Project Gantt
This appendix shows the project plan that was followed during this research:
Figure A 1: Project Gantt. This Gantt chart provides detailed information regarding the plan followed to complete the project.
[Figure A 1: Gantt chart of the action plan. Columns: months September through May, each divided into weeks 1 to 4. Phases: Define, Measure, Analyze, Improve, Control. Activities: Kick-off meeting; Process Mapping; Variables Listing and Data Review; Literature Review; Problem Modeling; Data Cleaning; Data Analysis and Relevant Variable Definition; Research Expo; Variable Selection; Modeling; Risk Analysis; Model Testing (Scenario Analyzing); Financial analysis improvement with new model; Recommendations/Sensitivity Analysis; Capstone Project Writing and Presentation.]
Appendix B: Cancellation Causes Brainstorming
This appendix shows the list of variables identified in the brainstorming process as potential
attributes for the model:
Figure B 1: Diagram of potential variables impacting the cancellation probability of a certain load
[Figure B 1: diagram of potential variables affecting Cancellations, with four main branches: Load Impact, Shipper Impact, Carrier Impact, and Other Impacts.
Sub-branches: Price Impact; Load Characteristics Impact; Trip Characteristics Impact; Contract Characteristics Impact; Shipper Characteristics Impact; Shipment History Impact; Facility Impact; Carrier Characteristics Impact; Carrier History Impact; Carrier Issues Impact; Internal Factors Impact; External Factors Impact.
Variables shown: Carrier Size; Carrier Type; Loads/Year; Bounce/Carrier; Carrier ID; Carrier Length of Relationship; Safety Rating; Number of Claims/Incidence; Shipper ID; Facility; Industry; Shipper Length of Relationship; Shipper Size; Facility Dwell Time; Shipments/Year; Carrier Rep; Weather; Natural Disaster; Geography; Rep Tenure; Day of the Week; Book Time; Load Time; Load ID; Origin; Destination; Number of Stops; Load Cost; Load Rate; Spot Price; Appointment Type; Lead Time; Empty Time; High Risk; High Value; Book Lead Time; Service Level; On-Time Delivery; On-Time Pick Up; Equipment Type; Dead Head; Length of Haul; Duration; Weight; Loading Time; Unloading Time; Contract Type; Load Changes; Carrier Conference.]
Appendix C: Data Glossary
This appendix lists all the attributes included in the dataset provided by the company, along with a
brief description:
Table C 1: Dataset Glossary
Field | Description | Values
Loadid | ID of the load. |
LoadDate | Date of the load to be executed. |
EquipmentType | Type of the required truck. | V = Van; R = Refrigerated
CustomerID | Shipper ID. |
Industry | Shipper industry type. |
Conference | Carrier reps team in the 3PL company. | Conferences are limited to USA.
Team | Expedited service: 2 drivers executing one load. |
HighValue | Indicator of whether the load is high value, which means the carrier needs higher-value insurance to execute the load. | Above 100k$ in product value is considered high value.
Miles | Total miles for the load trip. |
NumStops | Number of stops on a load. A load can be multi pick or multi stop. |
LoadStopID | ID for each unique stop. |
Type | Stop type: pickup or delivery. | 1 = Pickup; 2 = Delivery
Sequence | The order of stops. |
FacilityID | Pickup facility ID. |
CityName | City name of the pickup facility. |
StateCode | State code of the city of the pickup facility. |
ZipCode | Zip code of the city of the pickup facility. |
Contract-Spot | Specifies whether the load is booked at spot price or contract price. This reflects the latest status of the load. | C = Contract; S = Spot
Weight | Weight of the load. | Up to 45 tons.
Pallets | Number of pallets in the load. | Up to 48 pallets.
Code | Type of appointment. | Appt = Fixed date and time; Notice = Shipper informs the carrier; Open = Carrier can arrive within a certain period; None = No appointment defined
Appt | Date and time of the pickup. |
ArrivedAtFacility | Actual date and time of arrival at the facility. |
DepartedFacility | Actual date and time of departure from the facility. |
Cost | Cost of load. |
FacilityOpenTime | Time the pickup facility opens on the pickup day. | FacilityOpenTime = FacilityCloseTime represents 24h operation.
FacilityCloseTime | Time the pickup facility closes on the pickup day. | FacilityOpenTime = FacilityCloseTime represents 24h operation.
LoadRank | Contract carriers are broken down into Primary, Backup and Secondary. | Primary = 1st carrier that the load was tendered to; Backup = Backup carrier; Secondary = 2nd carrier that the load was tendered to (if both primary and backup do not show up); Spot = Carrier from the spot market
CarrierID | ID for the carrier. |
BookByDateTime | Date when the load was assigned to the carrier. | Blank means the load was not covered.
CarrierType | Lists all carriers that bounced or actually hauled the freight. | Bounced = Carrier cancelled the load; Actual = Carrier hauled the load
IsBounced | Indicator of whether the load is bounced. |
BouncedDateTime | Date the bounce was recorded. |
MaxPay | Maximum allowed payment for the carrier. | Sometimes Max Pay is open (no value) because the freight is a “must move” and has to go, no matter the cost.
EmptyTime | Date and time for the carrier when they are booked on the load. |
ExpectedEmptyLocation | City and state where the truck is expected to be empty when they are booked on the load. |
DeadHeadMiles | Miles expected with an empty truck when the carrier is booked on the load. |
MarketCost | Spot market cost for the booked load. |
Appendix D: Data Cleaning Process
This appendix provides further details regarding the cleaning process applied to the data. The data
received were first cleaned to obtain a valid dataset that could be used for modeling. The main cleanup
steps were:
• Remove all records where there is a mismatch between IsBounced and CarrierType fields
(IsBounced=False and CarrierType=Bounced) - Removed Records = 774,587
• Remove all records where Type has a wrong value (Type must be 1=PickUp or 2=Delivery) -
Removed Records = 1,873
• Remove all records where cost is less than 0 - Removed Records = 107
• Remove all unique load bookings (Loadid, CarrierID, IsBounced, BouncedDate combinations)
where no record exists for the first stop (Sequence=1) - Removed Records = 8,695
After completing the above data cleaning process, the data remaining consisted of:
• Total number of records = 9,919,181 (92.6% of original data)
• Number of unique loads = 3,621,367 (99.8% of original data)
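The four cleanup rules listed above can be sketched as a sequence of filters. This is an illustration in plain Python rather than the tooling actually used; field names follow the data glossary, the helper names are hypothetical, and the first rule is assumed to remove any disagreement between IsBounced and CarrierType, not only the one combination given as an example.

```python
def booking_key(rec):
    """Fields that identify one unique load booking."""
    return (rec["Loadid"], rec["CarrierID"], rec["IsBounced"], rec["BouncedDate"])


def clean_records(records):
    """Apply the four cleanup rules in order; return (kept_rows, removed_counts)."""
    removed = {}

    def keep_if(rule_name, rows, predicate):
        kept = [r for r in rows if predicate(r)]
        removed[rule_name] = len(rows) - len(kept)
        return kept

    # Rule 1 (assumption: any mismatch between the two fields is dropped).
    rows = keep_if("flag_mismatch", records,
                   lambda r: r["IsBounced"] == (r["CarrierType"] == "Bounced"))
    # Rule 2: stop type must be 1 (pickup) or 2 (delivery).
    rows = keep_if("bad_stop_type", rows, lambda r: r["Type"] in (1, 2))
    # Rule 3: negative costs are invalid.
    rows = keep_if("negative_cost", rows, lambda r: r["Cost"] >= 0)
    # Rule 4: drop bookings that have no record for the first stop.
    with_first_stop = {booking_key(r) for r in rows if r["Sequence"] == 1}
    rows = keep_if("no_first_stop", rows,
                   lambda r: booking_key(r) in with_first_stop)
    return rows, removed


def rec(loadid, seq, stop_type=1, cost=100.0, bounced=False, ctype="Actual"):
    """Build one illustrative stop-level record (made-up values)."""
    return {"Loadid": loadid, "CarrierID": "C1", "IsBounced": bounced,
            "BouncedDate": None, "CarrierType": ctype, "Type": stop_type,
            "Sequence": seq, "Cost": cost}


sample = [
    rec(1, 1), rec(1, 2, stop_type=2),  # valid load: both stops are kept
    rec(2, 1, ctype="Bounced"),         # IsBounced=False but CarrierType=Bounced
    rec(3, 1, stop_type=3),             # invalid stop type
    rec(4, 1, cost=-5.0),               # negative cost
    rec(5, 2, stop_type=2),             # booking without a first-stop record
]
kept, removed = clean_records(sample)
print(len(kept), removed)
```

Recording the number of rows removed by each rule mirrors the "Removed Records" counts reported in the list.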
Table D 1 summarizes the observations that were made during the transformation process and the
actions taken:
Table D 1: Summary of observations on the data and the actions taken to resolve them
Observation | Action Taken
A load can be bounced and completed by the same carrier; a load can be bounced by the same carrier multiple times | Load-level data are designed so that each record represents a unique combination of the following fields: Loadid, CarrierID, IsBounced, BouncedDate
First stop of some loads (id, carrier, bounce combination) is duplicated due to different values in MarketCost or MaxPay | Average values were considered for these duplicates
Load weight and pallets data are available on stop-level | The sums of weights and pallets for all pickup stops were aggregated to the load-level
Some loads have cost = 0 | MarketCosts were assumed as the cost for these loads
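The actions in Table D 1 amount to rolling stop-level records up to one row per booking. A rough sketch of that roll-up is shown below. It assumes the records have already been cleaned so that every booking has a first stop, that Cost is identical on every record of a booking, and that duplicated stops share a LoadStopID; these are assumptions for illustration, and `to_load_level` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def to_load_level(stop_records):
    """Aggregate stop-level records to one row per unique load booking."""
    groups = defaultdict(list)
    for r in stop_records:
        key = (r["Loadid"], r["CarrierID"], r["IsBounced"], r["BouncedDate"])
        groups[key].append(r)

    loads = []
    for key, rows in groups.items():
        first_stop = [r for r in rows if r["Sequence"] == 1]
        # Duplicated first stops differ in MarketCost or MaxPay: use the average.
        market_cost = mean([r["MarketCost"] for r in first_stop])
        max_pays = [r["MaxPay"] for r in first_stop if r["MaxPay"] is not None]
        # Count each pickup stop once, even if its record is duplicated.
        pickups = {r["LoadStopID"]: r for r in rows if r["Type"] == 1}
        cost = rows[0]["Cost"]
        loads.append({
            "key": key,
            "MarketCost": market_cost,
            "MaxPay": mean(max_pays) if max_pays else None,
            "Weight": sum(r["Weight"] for r in pickups.values()),
            "Pallets": sum(r["Pallets"] for r in pickups.values()),
            # Loads with cost = 0 fall back to the market cost.
            "Cost": cost if cost != 0 else market_cost,
        })
    return loads


def stop(stop_id, seq, stop_type, market_cost, max_pay):
    """Build one illustrative stop record (made-up values)."""
    return {"Loadid": 1, "CarrierID": "C1", "IsBounced": False, "BouncedDate": None,
            "LoadStopID": stop_id, "Sequence": seq, "Type": stop_type,
            "Weight": 1000, "Pallets": 10, "Cost": 0.0,
            "MarketCost": market_cost, "MaxPay": max_pay}


sample = [
    stop(10, 1, 1, 900.0, 1000.0),  # first stop
    stop(10, 1, 1, 1100.0, None),   # duplicate of the first stop, different MarketCost
    stop(11, 2, 2, 900.0, 1000.0),  # delivery stop
]
load = to_load_level(sample)[0]
print(load)
```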
Appendix E: Descriptive Analytics
This appendix provides further details regarding the descriptive analytics performed on the available
data. Table E 1 presents a descriptive statistical analysis of the load data, giving insight into the ranges
and variability of the numerical characteristics:
Table E 1: Load Characteristics Statistics summary of all the loads (since 2015)
Statistic | Total Distance (miles) | Number of Stops | Cost | Dead Head (miles) | Weight (ton)
Mean | 615 | 2 | $1,613.17 | 41 | 32,960
Std. | 522 | 0.5 | $1,391.63 | 67.8 | 13,660
Min | 0 | 2 | $0.00 | 0 | 0
Q1 | 245 | 2 | $700.00 | 0 | 22,853
Median | 464 | 2 | $1,200.00 | 18 | 40,368
Q3 | 822 | 2 | $2,000.00 | 61 | 43,379
Max | 3,500 | 31 | $12,000.00 | 3,341 | 45,000
Cancellation ratios (percentage of cancelled loads out of all loads) were analyzed for multiple variables
to identify the characteristics with distinguishable cancellation probabilities:
Figure E 1: Cancellation Ratios by Industry. This chart represents the cancellation ratio for each of the shippers’ industries.
Figure E 2: Cancellation Ratio by Carrier Maturity (in years). Carrier maturity is defined by the length of the carrier’s relationship
with the company. This chart reflects that cancellation ratio is higher for carriers that have been working with the company for a
shorter period of time.
Figure E 3: Cancellation Ratio by City: This map reflects the number of loads that go through different cities and the cancellation
ratio (bounces/total loads). Size represents the number of loads, while the color represents the bounce ratio.
Figure E 4: Cancellation Ratio by Deadhead Miles. This chart shows that the cancellation ratio tends to increase for loads with
longer deadhead miles.
Figure E 5: Cancellation Ratio by Conference. This chart shows the cancellation ratio behavior within the Conferences managing
FTL within the U.S.
Figure E 6: Cancellation Ratio by High Value indicator. This chart shows the cancellation ratios for high value (1) loads vs. non-
high value loads (0).
Figure E 7: Cancellation Ratio by Number of Stops in a load. This chart shows how the cancellation ratio changes for the different
number of stops per load.
Figure E 8: Cancellation Ratio by Type of Contract (with shipper), where C=Contract and S=Spot.
Figure E 9: Cancellation Ratio by trip length. This chart shows that next-day delivery has a higher cancellation ratio than the other
options.
Figure E 10: Cancellation Ratio by Load Cost (in $). This chart shows a decreasing trend in the cancellation ratios as the cost of
the load increases.
Figure E 11: Cancellation Ratio over Time. This chart shows the cancellations trend over time for the three-year dataset.
Figure E 12: Cancellation Ratio by number of claims against carrier. This chart shows that the cancellation ratios are higher for
carriers with a higher number of claims.
Figure E 13: Cancellation Ratio by State (represented by the color) and Total Cancellations by State (represented by the number).
This chart shows how the cancellation ratio changes from one state to another (dark orange represents a high cancellation ratio
while dark blue represents a low cancellation ratio) and the total number of cancellations that happened in each state.
Figure E 14: Cancellation Ratio by first pickup appointment time slot. This chart represents how the cancellation ratio changes
for different appointment time slots for the first pickup (Morning: between 6am and 11am, Afternoon: between 11am and 4pm, Evening:
between 4pm and 9pm, Late night: between 9pm and 6am).
Figure E 15: Cancellation Ratio by month. This chart shows how the cancellation ratios vary over the months.
Figure E 16: Cancellation Ratio by day of the week. This chart shows how the cancellation ratios vary over the weekdays.
Figure E 17: Cancellation Ratio by Book to Pick up Hours (time between the booking of the load and the first pickup). This chart
shows that the probability of cancellation increases as the time between the booking and pickup increases.
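The derived time features behind Figures E 14 and E 17 could be computed along the following lines. This is a minimal sketch: the function names are hypothetical, and the slot boundaries are assumed to be half-open (for example, 11:00 counts as Afternoon), since the captions do not say how boundary times are assigned.

```python
from datetime import datetime


def appointment_slot(appt):
    """Bucket a pickup appointment time into the four slots of Figure E 14."""
    hour = appt.hour
    if 6 <= hour < 11:
        return "Morning"
    if 11 <= hour < 16:
        return "Afternoon"
    if 16 <= hour < 21:
        return "Evening"
    return "Late night"  # 9pm to 6am, wrapping past midnight


def book_to_pickup_hours(booked_at, pickup_at):
    """Hours between booking the load and the first pickup appointment."""
    return (pickup_at - booked_at).total_seconds() / 3600.0


# Illustrative timestamps only.
booked = datetime(2017, 3, 1, 8, 0)
pickup = datetime(2017, 3, 2, 14, 0)
print(appointment_slot(pickup), book_to_pickup_hours(booked, pickup))
```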
Appendix F: Variables Prediction Rankings
This appendix shows the feature prediction rankings for the available and enriched datasets.
Figure F 1: Predictor screening of variables in the available dataset. The figure shows the result of the predictor screening
algorithm that ranks the features based on their prediction power.
Figure F 2: Predictor screening of variables in the enriched dataset. The figure shows the result of the predictor screening
algorithm that ranks the features based on their prediction power.