Flight Delays and Cancellations
Asad Zaidi, Soubhi Hadri
Department of Electrical and Computer Engineering
The University of Oklahoma
December, 2017
EDA & Flight Delay Prediction
Introduction:
• The data was collected and published by the U.S. Department of Transportation and covers flights in 2015.
• It is available on Kaggle.
• The question to answer: which airline should you fly on?
Dataset Discovery:
The dataset contains three CSV files:
1- airlines.csv
2- airports.csv
3- flights.csv
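A minimal pandas sketch of loading the three files. It assumes they keep their Kaggle file names, sit in one directory, and that airlines.csv maps an IATA_CODE column to an AIRLINE name column (a schema assumption). The helper name load_flight_data and the added AIRLINE_NAME column are hypothetical, not taken from the authors' script.

```python
from pathlib import Path

import pandas as pd


def load_flight_data(data_dir="."):
    """Read the three CSV files and attach the full airline name to each flight.

    Hypothetical helper; assumes airlines.csv has IATA_CODE and AIRLINE columns.
    """
    base = Path(data_dir)
    airlines = pd.read_csv(base / "airlines.csv")
    airports = pd.read_csv(base / "airports.csv")
    # low_memory=False lets pandas infer each column's type from the whole file
    # instead of chunk by chunk, avoiding mixed-type columns in the large file
    flights = pd.read_csv(base / "flights.csv", low_memory=False)
    # Series indexed by airline code, used as a lookup table for .map()
    code_to_name = airlines.set_index("IATA_CODE")["AIRLINE"]
    flights["AIRLINE_NAME"] = flights["AIRLINE"].map(code_to_name)
    return airlines, airports, flights
```

Codes missing from airlines.csv map to NaN, which makes unmatched rows easy to spot.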
Exploratory Analysis:
Missing Data:
Many NaNs!
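One way to quantify the missing data is a per-column count and percentage of NaNs. A small pandas sketch; the function name missing_summary is hypothetical.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of NaNs per column, most-missing column first.

    Columns with no missing values are left out of the result.
    """
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": 100 * counts / len(df),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)
```

Calling missing_summary(flights) on the loaded flights table lists which columns need dropping, filling, or filtering before analysis.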
Negative Delay!
Negative delay values belong to flights that were ahead of schedule.
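Because a negative delay means the flight was ahead of schedule, a signed delay can be split into two non-negative parts, lateness and earliness, so each can be summarized on its own. A pandas sketch; split_delay and the DELAY/AHEAD output names are hypothetical.

```python
import pandas as pd


def split_delay(delay: pd.Series) -> pd.DataFrame:
    """Separate a signed delay in minutes into lateness and earliness.

    Both output columns are non-negative; NaNs (e.g. cancelled flights) stay NaN.
    """
    return pd.DataFrame({
        "DELAY": delay.clip(lower=0),     # minutes late, 0 if on time or early
        "AHEAD": (-delay).clip(lower=0),  # minutes early, 0 if on time or late
    })
```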
Best Airlines:
ON_TIME_PER
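The slides do not define ON_TIME_PER. A sketch, assuming it is the percentage of an airline's departed flights whose DEPARTURE_DELAY is at most a threshold. The default threshold of 0 minutes is a hypothetical choice; 15 minutes is another common margin.

```python
import pandas as pd


def on_time_percentage(flights: pd.DataFrame, threshold: float = 0) -> pd.Series:
    """Percentage of flights per AIRLINE with DEPARTURE_DELAY <= threshold.

    Rows without a DEPARTURE_DELAY (e.g. cancelled flights) are ignored.
    """
    departed = flights.dropna(subset=["DEPARTURE_DELAY"])
    on_time = departed["DEPARTURE_DELAY"] <= threshold
    # Mean of a boolean Series is the fraction of True values
    return (100 * on_time.groupby(departed["AIRLINE"]).mean()).rename("ON_TIME_PER")
```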
MEAN_DEPARTURE_DELAY
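A sketch of MEAN_DEPARTURE_DELAY, assuming it averages DEPARTURE_DELAY over late flights only (delay > 0), so early departures do not offset late ones. The function name is hypothetical.

```python
import pandas as pd


def mean_departure_delay(flights: pd.DataFrame) -> pd.Series:
    """Average minutes late per AIRLINE, counting only flights that left late."""
    # NaN > 0 is False, so flights without a recorded delay drop out here too
    late = flights[flights["DEPARTURE_DELAY"] > 0]
    return late.groupby("AIRLINE")["DEPARTURE_DELAY"].mean().rename("MEAN_DEPARTURE_DELAY")
```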
MEAN_DEPARTURE_AHEAD
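The mirror-image metric: assuming MEAN_DEPARTURE_AHEAD is the average number of minutes an airline's early flights left ahead of schedule, reported as a positive number. The function name is hypothetical.

```python
import pandas as pd


def mean_departure_ahead(flights: pd.DataFrame) -> pd.Series:
    """Average minutes ahead of schedule per AIRLINE, over early flights only."""
    early = flights[flights["DEPARTURE_DELAY"] < 0]
    minutes_ahead = -early["DEPARTURE_DELAY"]  # flip the sign so earliness is positive
    return minutes_ahead.groupby(early["AIRLINE"]).mean().rename("MEAN_DEPARTURE_AHEAD")
```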
CANCELLED_PERCENTAGE
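Assuming CANCELLED is a 0/1 indicator per flight, its mean within each airline is the cancelled fraction, and multiplying by 100 gives a percentage. The function name is hypothetical.

```python
import pandas as pd


def cancelled_percentage(flights: pd.DataFrame) -> pd.Series:
    """Percentage of each AIRLINE's scheduled flights that were cancelled."""
    return (100 * flights.groupby("AIRLINE")["CANCELLED"].mean()).rename("CANCELLED_PERCENTAGE")
```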
CANCELLATION_REASONS
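A sketch of a per-airline breakdown of cancellation reasons. It assumes the flights file stores a one-letter code in a CANCELLATION_REASON column; the code-to-label mapping below follows the dataset's description on Kaggle as best recalled and should be checked against it. pd.crosstab with normalize="index" turns each airline's row into proportions, scaled here to percentages.

```python
import pandas as pd

# Assumed meaning of the reason codes; verify against the dataset documentation.
REASON_LABELS = {
    "A": "Airline/Carrier",
    "B": "Weather",
    "C": "National Air System",
    "D": "Security",
}


def cancellation_reasons(flights: pd.DataFrame) -> pd.DataFrame:
    """Share (in %) of each cancellation reason within each AIRLINE's cancellations."""
    cancelled = flights[flights["CANCELLED"] == 1]
    reasons = cancelled["CANCELLATION_REASON"].map(REASON_LABELS)
    # Rows: airlines; columns: reasons; each row sums to 100
    return pd.crosstab(cancelled["AIRLINE"], reasons, normalize="index") * 100
```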
DIVERTED_FLIGHTS
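The slides do not say whether DIVERTED_FLIGHTS is a count or a rate, so this sketch returns both, assuming DIVERTED is a 0/1 indicator like CANCELLED. The DIVERTED_PERCENTAGE column and the function name are hypothetical. Named aggregation (keyword arguments to agg) needs pandas 0.25 or later.

```python
import pandas as pd


def diverted_flights(flights: pd.DataFrame) -> pd.DataFrame:
    """Number and percentage of diverted flights per AIRLINE."""
    return flights.groupby("AIRLINE")["DIVERTED"].agg(
        DIVERTED_FLIGHTS="sum",                   # count of flights with DIVERTED == 1
        DIVERTED_PERCENTAGE=lambda s: 100 * s.mean(),
    )
```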
The same analysis was repeated for ARRIVAL_TIME.
Best Airlines:
MEAN_SPEED
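MEAN_SPEED is not defined in the slides. A plausible sketch divides DISTANCE by in-air time, assuming the flights file has an AIR_TIME column in minutes and DISTANCE in miles, which gives miles per hour. Rows without a positive AIR_TIME (for example cancelled flights) are skipped. The function name is hypothetical.

```python
import pandas as pd


def mean_speed(flights: pd.DataFrame) -> pd.Series:
    """Average per-flight speed in mph for each AIRLINE (assumed units)."""
    # NaN > 0 is False, so rows with missing AIR_TIME are dropped as well
    flown = flights[flights["AIR_TIME"] > 0]
    speed = flown["DISTANCE"] / (flown["AIR_TIME"] / 60)  # miles / hours
    return speed.groupby(flown["AIRLINE"]).mean().rename("MEAN_SPEED")
```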
Simple ranking using:
• MEAN_SPEED
• MEAN_DEPARTURE_DELAY
• MEAN_DEPARTURE_AHEAD
• CANCELLED_PERCENTAGE
• DIVERTED_FLIGHTS
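One simple way to combine the five metrics is rank aggregation: rank the airlines on each metric (1 = best), then sum the ranks, lowest total first. This is a sketch, not necessarily how the authors combined them; the "higher is better" direction for each metric is an assumption.

```python
import pandas as pd

# Assumed direction for each metric: True means a higher value is better.
HIGHER_IS_BETTER = {
    "MEAN_SPEED": True,
    "MEAN_DEPARTURE_DELAY": False,
    "MEAN_DEPARTURE_AHEAD": True,
    "CANCELLED_PERCENTAGE": False,
    "DIVERTED_FLIGHTS": False,
}


def simple_ranking(metrics: pd.DataFrame) -> pd.Series:
    """Sum of per-metric ranks for each airline (index), best airline first.

    `metrics` has one row per airline and one column per key of HIGHER_IS_BETTER.
    """
    ranks = pd.DataFrame({
        # ascending=False gives rank 1 to the largest value
        col: metrics[col].rank(ascending=not higher)
        for col, higher in HIGHER_IS_BETTER.items()
    })
    return ranks.sum(axis=1).sort_values()
```

Rank sums ignore how far apart airlines are on each metric; that keeps the ranking robust to outliers but treats a tiny gap the same as a large one.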
Flight Delay Prediction
• Convolutional Neural Network (CNN).
• TensorFlow (Python).
• Columns used:
• AIRLINE
• DAY_OF_WEEK
• ORIGIN_AIRPORT
• DESTINATION_AIRPORT
• DISTANCE
• DEPARTURE_DELAY
Steps:
• Remove:
• flights with DEPARTURE_DELAY < 0
• CANCELLED flights
• DIVERTED flights
• Encode with one-hot encoding:
• AIRLINE
• ORIGIN_AIRPORT
• DESTINATION_AIRPORT
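The preprocessing steps above can be sketched in pandas as follows. It assumes CANCELLED and DIVERTED are 0/1 indicators; the function name prepare and the choice of float 0/1 dummy columns are hypothetical, not the authors' code.

```python
import pandas as pd

FEATURES = ["AIRLINE", "DAY_OF_WEEK", "ORIGIN_AIRPORT", "DESTINATION_AIRPORT", "DISTANCE"]
TARGET = "DEPARTURE_DELAY"
CATEGORICAL = ["AIRLINE", "ORIGIN_AIRPORT", "DESTINATION_AIRPORT"]


def prepare(flights: pd.DataFrame):
    """Drop early, cancelled, and diverted flights, then one-hot encode categories.

    Returns (X, y): the feature matrix and the DEPARTURE_DELAY target.
    """
    keep = (
        (flights[TARGET] >= 0)          # also drops rows where the delay is NaN
        & (flights["CANCELLED"] == 0)
        & (flights["DIVERTED"] == 0)
    )
    data = flights.loc[keep, FEATURES + [TARGET]]
    # One 0/1 column per airline and per airport; DAY_OF_WEEK and DISTANCE stay numeric
    encoded = pd.get_dummies(data, columns=CATEGORICAL, dtype=float)
    X = encoded.drop(columns=TARGET)
    y = encoded[TARGET]
    return X, y
```

With a few hundred airports, the one-hot columns make X wide and mostly zeros, so memory use grows quickly with the number of rows.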
First Try:
• Regression:
• 5 convolution layers.
• 2 pooling layers.
• 2 fully connected layers with dropout.
• Loss function: mean squared error.
• Bad results!
• Possible reasons:
• Could not use the full dataset.
• Inappropriate encoding.
• Network structure.
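A tf.keras sketch that matches the listed layer counts of the first try: 5 convolution layers, 2 pooling layers, 2 fully connected layers with dropout, and mean squared error loss. Everything else is an assumption: each one-hot encoded row is treated as a 1D signal of shape (n_features, 1), and the filter counts, kernel sizes, dropout rate, optimizer, and single linear output unit are hypothetical, not the authors' settings.

```python
import tensorflow as tf


def build_regression_cnn(n_features: int) -> tf.keras.Model:
    """Hypothetical regression CNN: 5 Conv1D, 2 MaxPooling1D, 2 Dense + Dropout."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features, 1)),  # encoded row as a 1D signal
        tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(128, 3, padding="same", activation="relu"),
        tf.keras.layers.Flatten(),
        # The two fully connected layers, each followed by dropout
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1),  # linear output: predicted delay in minutes
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

With padding="same", only the two pooling layers shorten the sequence (by a factor of 4 overall), so n_features must be at least 4. One-hot columns have no natural order, which makes sliding convolution kernels over them an unusual fit; this may be part of the "network structure" problem noted on the slide.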
Second Try:
• Convert the problem from regression to classification.
• Group delay values into 5 levels.
• Use a CNN structure similar to AlexNet.
• Result:
• Still running :D
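Turning the regression target into 5 classes can be sketched with pd.cut. The slides do not give the cut points, so the edges below (in minutes) are hypothetical; negative delays, already removed during preprocessing, would map to NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges in minutes: [0,5), [5,15), [15,30), [30,60), [60,inf)
DELAY_EDGES = [0, 5, 15, 30, 60, np.inf]


def delay_levels(delay: pd.Series) -> pd.Series:
    """Map non-negative delays to integer class labels 0..4."""
    # labels=False returns bin indices; right=False makes each bin closed on the left
    return pd.cut(delay, bins=DELAY_EDGES, labels=False, right=False)
```

Fixed edges keep the classes interpretable but, since most flights leave on time or only slightly late, the low classes would likely dominate; pd.qcut with 5 quantiles is an alternative that balances class sizes at the cost of less meaningful boundaries.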
Script on GitHub: https://github.com/SubhiH/Flight-Delays-and-Cancellations-EDA
Thank you