Airline Flights Delay Prediction - 2014 Spring Data Mining Project

Flight Delay Prediction Model Vishwanath K, Viral Tarpara, Haozhe Wang, Ling Zhou

Upload: haozhe-wang

Post on 15-Jul-2015


TRANSCRIPT

Page 1: Airline flights delay prediction- 2014 Spring Data Mining Project

Flight Delay Prediction Model

Vishwanath K, Viral Tarpara, Haozhe Wang, Ling Zhou

Page 2: Airline flights delay prediction- 2014 Spring Data Mining Project

Business Problem Overview

Flight delays are a challenging problem for all airline companies. They lead to:

● Financial losses.

● Negative impact on their business reputation.

Cost of Delays in the US (total: $32.9B)

Cost to Airlines:        $8.3B
Cost to Passengers:      $16.7B
Cost from Lost Demand:   $3.9B
GDP Impact:              $4B

Source: Total Cost Impact Study

Page 3: Airline flights delay prediction- 2014 Spring Data Mining Project

Business Problem Overview

A model that predicts flight delays can help airline companies:

● Optimize operations

● Reduce further losses

Page 4: Airline flights delay prediction- 2014 Spring Data Mining Project

Literature Review on Delay Costs

The airline industry incurs an average cost of about $11,300 per delayed flight.

● Based on an average of 61,000 delayed flights per month

● Excludes costs to passengers and lost demand
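As a rough consistency check (not a calculation shown on the slides), the per-flight figure can be annualized and compared with the $8.3B airline cost on the cost slide. The variable names below are illustrative only.

```python
# Rough check: annualize the per-flight delay cost quoted on the slide.
cost_per_delayed_flight = 11_300      # USD, from the slide
delayed_flights_per_month = 61_000    # monthly average, from the slide

annual_cost = cost_per_delayed_flight * delayed_flights_per_month * 12
print(f"Approximate annual cost to airlines: ${annual_cost / 1e9:.2f}B")
```

The result lands close to the $8.3B figure, which suggests the two numbers come from the same estimate.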

A more accurate delay prediction system can help to identify operational variables that contribute to delays.

While some conditions, such as weather, are not controllable factors, the way airlines and airports operate and optimize resources in the face of "acts of god" is controllable.

Page 5: Airline flights delay prediction- 2014 Spring Data Mining Project

Data Understanding

Dataset: On-Time Performance

From the Research and Innovative Technology Administration, BTS

Page 6: Airline flights delay prediction- 2014 Spring Data Mining Project

Data Understanding

Potentially useful variables (grouped on the slide under Time, Operation, Geography, and Airline):

● Quarter, Month, Day of Month

● Flight Number

● Origin Airport, Destination Airport

● Departure Block, Arrival Block

● Carrier

● Departure Delay, Arrival Delay

Page 7: Airline flights delay prediction- 2014 Spring Data Mining Project

Data Preparation

Training:

● Selected Attributes from 2012 Data

● Derived Attributes from 2011 Data

● Attributes from Additional Dataset

Testing:

● Selected Attributes from 2013 Data

● Derived Attributes from 2012 Data

● Attributes from Additional Dataset

Page 8: Airline flights delay prediction- 2014 Spring Data Mining Project

Data Preparation

Selected Attributes:

1. Quarter

2. Month

3. Day of Month

4. FL_NUM: Flight Number

5. Origin: Origin Airport

6. Dest: Destination Airport

7. UniqueCarrier: Unique Carrier Code

8. DepTimeBLK: Departure Time Block, Hourly Intervals

9. ArrTimeBLK: Arrival Time Block, Hourly Intervals

Target: ArrDel: Arrival Delay, 1=Y, 0=N

Removed for the project. To build the full model, these attributes are necessary.

Page 9: Airline flights delay prediction- 2014 Spring Data Mining Project

Data Preparation

Derived Attributes:

1. Airline_Delay: the percentage of delay by each airline in one year

2. Flight_Delay: the percentage of delay by each specific flight in one year

3. Day_Delay: the percentage of delay by day of month for all flights in one year

4. Origin_Delay: the percentage of delay by each origin airport for all flights in one year

5. Dest_Delay: the percentage of delay by each destination airport for all flights in one year

6. Dep_BLK_Delay: the percentage of delay by each departure block for all flights in one year

7. Arr_BLK_Delay: the percentage of delay by each arrival block for all flights in one year
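The derived attributes above are all group-wise delay rates computed from the previous year's data. A minimal sketch of that idea follows, assuming each flight is a dictionary with the field names from the selected-attributes slide. The helper `delay_rate_by` and the toy rows are hypothetical, not the authors' code.

```python
from collections import defaultdict

def delay_rate_by(rows, key):
    """Return {key value: fraction of flights delayed} for one year of rows."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        delayed[row[key]] += row["ArrDel"]  # ArrDel is 1 (delayed) or 0 (on time)
    return {k: delayed[k] / totals[k] for k in totals}

# Toy prior-year rows, made up for illustration.
prior_year = [
    {"UniqueCarrier": "AA", "Origin": "ORD", "ArrDel": 1},
    {"UniqueCarrier": "AA", "Origin": "ORD", "ArrDel": 0},
    {"UniqueCarrier": "DL", "Origin": "ATL", "ArrDel": 0},
    {"UniqueCarrier": "DL", "Origin": "ORD", "ArrDel": 1},
]

airline_delay = delay_rate_by(prior_year, "UniqueCarrier")  # Airline_Delay
origin_delay = delay_rate_by(prior_year, "Origin")          # Origin_Delay

# Attach the prior-year rates to a current-year flight record.
flight = {"UniqueCarrier": "AA", "Origin": "ORD"}
flight["Airline_Delay"] = airline_delay.get(flight["UniqueCarrier"])
flight["Origin_Delay"] = origin_delay.get(flight["Origin"])
```

The same helper would cover the other derived attributes by changing the grouping key (flight number, day of month, destination, departure block, arrival block).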

Page 10: Airline flights delay prediction- 2014 Spring Data Mining Project

Data Preparation

Additional Dataset: Schedule Employees

From the Research and Innovative Technology Administration, BTS

Page 11: Airline flights delay prediction- 2014 Spring Data Mining Project

Data Preparation

Additional Attributes:

1. Full Time Employees in current month

2. Part Time Employees in current month

3. FTE Employees: Full Time Equivalent Employees in current month

(2 part time= 1 full time)

4. Total Employees in current month

We wanted to see if historical on-time performance and current staffing levels were enough to build a decent model.

Page 12: Airline flights delay prediction- 2014 Spring Data Mining Project

Data Preparation

Large dataset (2.9 GB)

Merge these attributes by month (via Excel VLOOKUP)

Use data of one month, January, to build the model.
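The slides did this merge with Excel VLOOKUP. As a sketch of the same lookup in code, the example below joins a staffing table onto flight rows and keeps only January. It assumes the join key is (carrier, month), which the slides do not state. The table contents and the helper `add_staffing` are hypothetical.

```python
# Hypothetical staffing table keyed by (carrier, month); values are made up.
employees = {
    ("AA", 1): {"full_time": 1000, "part_time": 200},
    ("DL", 1): {"full_time": 800, "part_time": 100},
}

def add_staffing(flight, employees):
    """Return a copy of the flight row with staffing attributes attached."""
    staff = employees.get((flight["UniqueCarrier"], flight["Month"]))
    merged = dict(flight)
    if staff is None:  # no matching row, similar to #N/A from VLOOKUP
        return merged
    merged["FullTime"] = staff["full_time"]
    merged["PartTime"] = staff["part_time"]
    # FTE rule from the slide: 2 part time = 1 full time.
    merged["FTE"] = staff["full_time"] + staff["part_time"] / 2
    merged["Total"] = staff["full_time"] + staff["part_time"]
    return merged

flights = [
    {"UniqueCarrier": "AA", "Month": 1, "FL_NUM": 22},
    {"UniqueCarrier": "DL", "Month": 2, "FL_NUM": 7},
]

# Keep only January, as the project did, and attach the staffing columns.
january = [add_staffing(f, employees) for f in flights if f["Month"] == 1]
```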

Page 13: Airline flights delay prediction- 2014 Spring Data Mining Project

Modeling

• Naive Bayes

• Decision tree: J48 (with various leaf sizes)

• Logistic Regression “refused” to process in Weka

Page 14: Airline flights delay prediction- 2014 Spring Data Mining Project

Modeling

Preprocess

• Convert the type of attributes

• Convert CSV file to ARFF (70 MB)

Training:

• Instances: 422539

• Attributes: 19

Testing:

• Instances: 478145

• Attributes: 19
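The slides do not say how the CSV-to-ARFF conversion was done. The following is a minimal sketch of what an ARFF file contains: a relation name, typed attribute declarations, then comma-separated data rows. The writer function `write_arff` and the three example attributes are hypothetical and cover only numeric and nominal types.

```python
import io

def write_arff(relation, attributes, rows, out):
    """Write rows to `out` in ARFF format.

    `attributes` is a list of (name, type) pairs, where type is the string
    "numeric" or a list of allowed nominal values.
    """
    out.write(f"@relation {relation}\n\n")
    for name, kind in attributes:
        spec = kind if isinstance(kind, str) else "{" + ",".join(kind) + "}"
        out.write(f"@attribute {name} {spec}\n")
    out.write("\n@data\n")
    for row in rows:
        out.write(",".join(str(v) for v in row) + "\n")

attributes = [
    ("Month", "numeric"),
    ("UniqueCarrier", ["AA", "DL"]),
    ("ArrDel", ["0", "1"]),  # class declared as nominal rather than numeric
]
rows = [(1, "AA", 1), (1, "DL", 0)]

buf = io.StringIO()
write_arff("flights", attributes, rows, buf)
arff_text = buf.getvalue()
```

Declaring the target as nominal is the kind of type conversion the first preprocessing bullet refers to; classifiers such as J48 need a nominal class attribute.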

Page 15: Airline flights delay prediction- 2014 Spring Data Mining Project

NaiveBayes Modeling

On Training Data

Confusion Matrix of Naïve Bayes:

a b <-- classified as

333876 28289 | a = 0 (on-time)

45761 14613 | b = 1 (delay)

Model         Accuracy   ROC Area
Naïve Bayes   82.475%    0.694

The 45761 delayed flights classified as on-time (false negatives) are the high-cost errors; lower is better.
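The reported accuracy can be reproduced from the confusion matrix. The sketch below uses the four Naïve Bayes counts from the slide. The variable names and the extra recall figure are additions for illustration.

```python
# Naive Bayes confusion matrix on the training data (rows = actual class).
tn, fp = 333876, 28289   # actual on-time: predicted on-time, predicted delay
fn, tp = 45761, 14613    # actual delay:   predicted on-time, predicted delay

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
delay_recall = tp / (tp + fn)  # share of delayed flights that were caught

print(f"instances={total} accuracy={accuracy:.3%} "
      f"FN={fn} delay recall={delay_recall:.1%}")
```

The instance count matches the 422539 training instances, and the accuracy matches the 82.475% on the slide. The recall figure shows that only about a quarter of delayed flights are flagged, which accuracy alone hides.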

Page 16: Airline flights delay prediction- 2014 Spring Data Mining Project

Modeling- snapshot

J48 with different parameters:

MinObjNum   Accuracy   ROC Area
15          88.4917%   0.85
25          87.7308%   0.791
50          87.3311%   0.774
100         87.0414%   0.767
150         86.8746%   0.761

Page 17: Airline flights delay prediction- 2014 Spring Data Mining Project

Modeling - snapshot

Confusion Matrix of J48, 15:

a b <-- classified as

356407 5758 | a = 0

42869 17505 | b = 1

Confusion Matrix of J48, 25:

a b <-- classified as

356570 5595 | a = 0

46247 14127 | b = 1

Confusion Matrix of J48, 50:

a b <-- classified as

356885 5280 | a = 0

48251 12123 | b = 1

Confusion Matrix of J48, 100:

a b <-- classified as

357881 4284 | a = 0

50471 9903 | b = 1

Page 18: Airline flights delay prediction- 2014 Spring Data Mining Project

Training Model Performance: J48

[Figure: ROC Area of J48 on the training data for MinObjNum = 15, 25, 50, 100, 150; vertical axis from 0.7 to 0.86.]

The trend levels off around 0.76.

Page 19: Airline flights delay prediction- 2014 Spring Data Mining Project

Model Evaluation

• The evaluation is mainly based on delayed flights falsely classified as on-time (false negatives): this is the case where passengers are led to expect an on-time arrival but end up being late.

• We choose the training model with the largest AUC and the smallest false negative value.

MinObjNum     Accuracy   ROC Area   FN Value   Results
15            88.4917%   0.85       42869      Reject
25            87.7308%   0.791      46247      Reject
50            87.3311%   0.774      48251      Reject
100           87.0414%   0.767      50471      Reject
150           86.8746%   0.761      51276      Reject
Naive Bayes   82.475%    0.694      45761      Accept

Page 20: Airline flights delay prediction- 2014 Spring Data Mining Project

Model Evaluation

Model Performance on Testing Data (Jan 2013)

Model               ROC Area   FN Value
J48_minObjNum=100   0.512      82442
Naive Bayes         0.583      74058

Page 21: Airline flights delay prediction- 2014 Spring Data Mining Project

Deployment

Example: Avoiding the Most Delay-Prone Parts of the System

● Schedule your flight without a layover.

● Avoid the major hubs by using smaller airports. Chicago ORD, New York City (all airports), and Atlanta were the worst in terms of congestion.

● Early morning departures have better on-time performance.

● Late afternoon and early evening have the worst on-time performance.

Page 22: Airline flights delay prediction- 2014 Spring Data Mining Project

"When I can, I try to arrive the night before," says Russell Hayward, a USA TODAY Road Warrior. "But that eats up a whole work day, wasted travel time due to airline uncertainty." (Woodyard, 2001)