bi ppt finale

Upload: vipul-neema

Post on 06-Apr-2017

Predict Aircraft Damage Upon Bird Strike

Group 01
Members: Ketan Bansal, Mayank Aterkar, Ritu Pandey, Roheen Chaturvedi, Vipul Neema


AGENDA
- Problem definition and background
- Data set description
- Variable description
- Preprocessing and predictive analysis
- Model comparison
- Conclusion
- Future scope

PROBLEM BACKGROUND AND DESCRIPTION

- Bird and other wildlife strikes with aircraft cause over $900 million in annual damage to U.S. civil and military aviation.
- These strikes put the lives of aircraft crew members and their passengers at risk.
- Over 250 people have been killed worldwide as a result of wildlife strikes since 1988.

- According to Bird Strike Committee USA, a recent bird strike on a Transavia B738 at Girona on Jul 11th 2014 left the plane under repair for 10 days (http://avherald.com/h?article=4774b5a9&opt=0).
- The data cover incidents involving airplane bird strikes in the United States.

KEY FEATURES OF THE DATASET
- 37 input variables in total
- About 99,404 records
- Source: Federal Aviation Administration website
- Target variable: Effect_Indicated_Damage

VARIABLE DESCRIPTION

PREPROCESSING AND PREDICTIVE ANALYSIS

FILE IMPORT

- Load the data into a data source.
- Check each variable's summary, significance, and role.
- Relevant variables are selected by the Variable Selection node.
- Target variable: Effect: Indicated Damage

VARIABLE SELECTION
- Used to identify the variables that are important for predicting the target variable.
- A few variables, such as Flight Date and Record_ID, were rejected manually.
- They had no effect on the outcome of the target variable.

IMPUTE

- Missing values are replaced instead of removing the record altogether.
- Mean, Median, and Count are the most commonly used methods for this.
- Missing values in interval variables were replaced by the mean.
- Count (the most frequent value) was used to fill in missing values in nominal variables.
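The imputation rule described on this slide can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the SAS Impute node itself; the column names `altitude_ft` and `wildlife_size` and the helper `impute` are hypothetical.

```python
from statistics import mean, mode

def impute(records, interval_cols, nominal_cols):
    """Return a copy of records with None filled in:
    mean for interval columns, most frequent value for nominal columns."""
    fill = {}
    for col in interval_cols:
        fill[col] = mean(r[col] for r in records if r[col] is not None)
    for col in nominal_cols:
        fill[col] = mode(r[col] for r in records if r[col] is not None)
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in records
    ]

# Hypothetical toy rows; None marks a missing value.
rows = [
    {"altitude_ft": 1000, "wildlife_size": "Small"},
    {"altitude_ft": None, "wildlife_size": "Small"},
    {"altitude_ft": 3000, "wildlife_size": None},
]
filled = impute(rows, ["altitude_ft"], ["wildlife_size"])
```

The original rows are left untouched, so the imputed and raw data can be compared side by side.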

SAMPLE
- Used to draw a sample of the data that reflects the whole dataset.
- Over-sampling was required because records with the positive target value were a rare event in the original dataset.
- It is needed to bias the sample toward the rare event for classification.
- Stratified sampling was used with Equal as the criterion.
- This puts a higher proportion of rare-event observations in the sample than in the original data.
- After the node was applied, both classes had an equal number of records, about 15,000 in total.
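As a rough illustration of equal-size stratified sampling, the sketch below draws the same number of records from each target class, limited by the rarest class. It is an assumption about how the idea could be reproduced in plain Python, not a description of the SAS Sample node internals; `equal_stratified_sample` and the `damage` field are hypothetical names.

```python
import random

def equal_stratified_sample(records, target, seed=0):
    """Draw the same number of records from every class of `target`."""
    rng = random.Random(seed)
    strata = {}
    for r in records:
        strata.setdefault(r[target], []).append(r)
    # Every class contributes as many records as the rarest class has.
    n = min(len(group) for group in strata.values())
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, n))
    rng.shuffle(sample)
    return sample

# Hypothetical toy data: 10 damage cases among 100 strikes.
data = [{"id": i, "damage": 1 if i < 10 else 0} for i in range(100)]
balanced = equal_stratified_sample(data, "damage")
print(len(balanced), sum(r["damage"] for r in balanced))  # 20 10
```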

DATA PARTITION
- The data set produced by the previous nodes is partitioned into training, validation, and testing data.
- Training: model building
- Validation: avoiding over-fitting
- Testing: final assessment of the model
- Ratio used: 50:30:20
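A 50:30:20 split can be sketched as a simple shuffled partition. This is a minimal sketch assuming plain random assignment; the Data Partition node may stratify on the target, which is not shown here. The function name `partition` is hypothetical.

```python
import random

def partition(records, percents=(50, 30, 20), seed=0):
    """Shuffle records and split them into train/validation/test by percent."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * percents[0] // 100
    n_valid = n * percents[1] // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test

train, valid, test = partition(range(15000))
print(len(train), len(valid), len(test))  # 7500 4500 3000
```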

Predictive Analysis

Decision Tree

Regression
- Logistic regression
- Selection methods: Forward, Backward, Stepwise, None
- No need to create dummy variables for the target!

DECISION TREE

Tree ended up with 16 leaf nodes.

Variable importance, in the order given by SAS:
1. Aircraft Airline operator
2. Wildlife Size
3. Wildlife number struck
4. Altitude Bin
5. Phase of flight

Cumulative lift = 1.84
Accuracy = 73.8%
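For readers unfamiliar with cumulative lift, the sketch below shows one common way to compute it: the positive rate among the top-scored records divided by the overall positive rate. The depth used for the figures on the slides is not stated, so `depth_pct` and the toy scores are assumptions for illustration only.

```python
def cumulative_lift(actual, scores, depth_pct=10):
    """Positive rate in the top depth_pct% of scores over the overall rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) * depth_pct // 100)
    top_rate = sum(a for _, a in ranked[:k]) / k
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate

# Hypothetical toy data: 1 = damage, 0 = no damage.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
lift_top20 = cumulative_lift(actual, scores, depth_pct=20)
```

A lift above 1 means the model concentrates damage cases among its highest-scored records better than random selection would.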

REGRESSION
- Regression helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the others are held fixed.
- Regression can be performed linearly or logistically; since our target variable is binary, we used logistic regression.
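To make the "vary one variable, hold the others fixed" idea concrete, here is a minimal sketch of how a logistic model turns a linear score into a probability. The coefficients, intercept, and feature names (`large_bird`, `altitude_kft`) are invented for illustration and are not the fitted values from this project.

```python
import math

def damage_probability(features, coefficients, intercept=0.0):
    """Logistic model: probability = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    score = intercept + sum(coefficients[name] * value
                            for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients, not estimated from the bird-strike data.
coef = {"large_bird": 1.2, "altitude_kft": -0.1}
base = damage_probability({"large_bird": 0, "altitude_kft": 2}, coef, intercept=-1.0)
varied = damage_probability({"large_bird": 1, "altitude_kft": 2}, coef, intercept=-1.0)
```

Only `large_bird` changes between the two calls, so the difference between `varied` and `base` isolates the effect of that one variable.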

LOGIT REGRESSION
- Cumulative lift = 1.92
- Confusion matrix: the true positives and true negatives, added together and divided by the total number of records, give the accuracy of the model.
- Accuracy = 79.6%
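The relationship between the confusion matrix and accuracy can be sketched as follows. The labels below are toy values chosen for illustration, not the project's predictions.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary 0/1 labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts

def accuracy(counts):
    """Correct predictions (TP + TN) divided by all predictions."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Hypothetical toy labels: 1 = damage, 0 = no damage.
actual = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))  # {'tp': 3, 'tn': 5, 'fp': 1, 'fn': 1} 0.8
```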

Other Logit Regression Methods
- Backward regression: accuracy = 76.73%
- Forward regression: accuracy = 76.73%
- Stepwise regression: accuracy = 76.73%

The forward-selection technique begins with no variables in the model. For each of the independent variables, the FORWARD method calculates statistics that reflect the variable's contribution to the model if it is included.
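The greedy idea behind forward selection can be sketched generically. This is a simplification: SAS ranks candidates with significance statistics and an entry threshold, whereas the sketch below accepts any user-supplied `score` function and stops when no candidate improves it. The function name, the `min_gain` parameter, and the toy contributions are all hypothetical.

```python
def forward_select(candidates, score, min_gain=1e-9):
    """Start with no variables; repeatedly add the one that improves score most."""
    selected = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        trial_score, trial_var = max((score(selected + [c]), c) for c in remaining)
        if trial_score - best <= min_gain:
            break  # no remaining variable contributes enough
        selected.append(trial_var)
        remaining.remove(trial_var)
        best = trial_score
    return selected

# Hypothetical per-variable contributions used as a stand-in for a fitted model.
contribution = {"wildlife_size": 0.10, "altitude_bin": 0.05, "record_id": 0.0}

def toy_score(variables):
    return 0.5 + sum(contribution[v] for v in variables)

chosen = forward_select(contribution, toy_score)
print(chosen)  # ['wildlife_size', 'altitude_bin']
```

In practice `score` would fit a model on the training data and evaluate it on the validation data.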

All give the same accuracy! Why?

Optimizing the Results
- Tried different ratios in the Data Partition node.
- Increased the maximum number of classes in the Variable Selection node.
- Tried different stratification criteria in the Sample node.
- Arrived at optimal levels of accuracy for all the models.

So Which Model Is the Best Fit?
The Model Comparison node tries to answer that!

On what basis? Misclassification rate!

How to interpret results? Correlation is not causation!

Comparison of Misclassification Rates for All Models

| Model | Misclassification rate |
| --- | --- |
| Logistic Regression | 0.22944 |
| Regression (stepwise) | 0.23143 |
| Regression (backward) | 0.23143 |
| Regression (forward) | 0.23143 |
| Decision Tree | 0.24182 |
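The ranking that the Model Comparison node performs on these figures amounts to picking the smallest misclassification rate. A minimal sketch using the rates from the slide (the dictionary layout and variable names are assumptions):

```python
# Misclassification rates as reported on the slide.
rates = {
    "Logistic Regression": 0.22944,
    "Regression (stepwise)": 0.23143,
    "Regression (backward)": 0.23143,
    "Regression (forward)": 0.23143,
    "Decision Tree": 0.24182,
}

# Lower misclassification rate is better.
ranking = sorted(rates.items(), key=lambda item: item[1])
best_model = ranking[0][0]
print(best_model)  # Logistic Regression
```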

MODEL COMPARISON

Result of the Model Comparison Node

Comparison of cumulative lift charts for all the models. The curve for Regression has been highlighted.

Shortcomings! Where the Model could have been Improved!

- The original data set contained a large number of missing values. The accuracy of all the models would have been better with fewer missing values.
- To account for the rare event, we had to reduce our data set to only 15% of its original size, so the results were less accurate.

Business Application
Problem:
- 250 people killed since 1988
- $900 million/year in damage (defense and civil aviation)

Causes/Reasons & Possible Solutions:

| Cause/Reason | Solutions |
| --- | --- |
| Intersection of aviation routes and bird migration routes | Change aviation routes at various points in the year. The Wildlife Management department at the airport should invest more in analyzing changes in routes due to landing issues and air traffic. |
| Airports and aviation settlements near natural habitats and places like ponds | In general, airports and aviation centers are located away from cities, which puts them near natural wildlife habitats and places like ponds. |
| Height of travel | Flying heights should be changed from month to month to account for migration and changes in the flocks' natural flying heights. |

Bird Strikes and Their Effects

Prevention Is Better than Cure
Existing solutions:
- Spiral marks on the turbine fans

The spiral appears to be dangerous

Preventive Measures

- Technological changes in the manufacturing of turbines and engines, because the engine fails once birds are sucked in
- Increasing the number of engines, for the same reason
- Campaigns like Strike Out, started by the aviation industry in the USA
- Bird Strike Committee USA Steering Committee

Strike Out: helps planes detect birds and other planes nearby, preventing collisions with both.
Steering Committee: made up of members from the organizations below, dedicated to preventing bird strikes at the national level:
- Federal Aviation Administration
- U.S. Department of Agriculture
- Department of Defense
- U.S. Airports
- Private Sector Services
- Airlines
- Aerospace Industry

Future Scope
- Measures of the areas most affected by airplane bird strikes
- Cost of repair, using cluster analysis, hidden patterns, causation, and correlation
- Which of the existing solutions is best