ASK-the-Expert: Active learning based knowledge discovery using the expert
Kamalika Das, Data Sciences Group
NASA Ames Research Center
ML Workshop, August 2017
Problem
• Identify safety events in flight operational data
• Unsupervised anomaly detection
• SME review of anomalies
Unsupervised anomaly detection
[Figure: statistical flight anomalies, each marked OS or NOS]
Unsupervised anomaly detection
• Lack of definition of ‘safety’ incident
• One-class SVM based anomaly detection
[Figure: one-class SVM illustration with axes x1 and x2]
S. Das, B. Matthews, A. Srivastava, N. Oza. 2010. Multiple kernel learning for heterogeneous anomaly detection: algorithm and aviation safety case study. In Proceedings of the 16th ACM SIGKDD (KDD '10). 47-56.
State of the art
[Diagram: existing system]
• Data collection: FAA facilities (TRACON, ARTCC)
• Data processing: data filter; feature selection and normalization; data merge; calculate flight separation and turn-to-final features
• MKAD: unsupervised anomaly detection, separating nominals from anomalies
• Labels: operationally significant events
Proposed approach
[Diagram: active learning with rationales framework]
• Input features go to MKAD, which separates anomalies from nominals
• Active learner (2-class classification/ranking algorithm, trained using an active learning strategy) selects an instance for labeling
• SME returns a label and a rationale
• Output: operationally significant anomalies and uninteresting anomalies
Active learning framework
[Diagram: statistical anomalies are flights described by features f1, f2, f3, …, fn plus a rationale feature f*. Bootstrap samples form the labeled pool (labels such as O and N); the remaining flights form the unlabeled pool (labels unknown). The active learner picks a sample flight x* to label, and the expert returns its label y* (e.g. OS) together with a rationale (e.g. loss of separation).]
Active learner
• Model: 2-class multiple kernel SVM
• Active learning strategy: most likely positive
• Automated feature construction: multiple kernel learning + decision tree construction
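The labeling loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the scorer is a nearest-centroid stand-in for the 2-class multiple kernel SVM, and the names `fit_scorer`, `run_active_learning`, and `ask_expert` are hypothetical. The rationale returned by the expert is only recorded here; turning it into a feature f* is not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Labeled = List[Tuple[Vector, int]]  # (features, label); 1 = OS, 0 = NOS


def _mean(rows: List[Vector]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def _dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fit_scorer(labeled: Labeled) -> Callable[[Vector], float]:
    """Stand-in model (assumption): a higher score means 'more likely OS'.

    The labeled pool must contain at least one flight of each class,
    which is the role of the bootstrap samples.
    """
    pos = _mean([x for x, y in labeled if y == 1])
    neg = _mean([x for x, y in labeled if y == 0])
    return lambda x: _dist(x, neg) - _dist(x, pos)


def run_active_learning(
    bootstrap: Labeled,
    unlabeled: Dict[str, Vector],
    ask_expert: Callable[[str], Tuple[int, str]],
    budget: int,
) -> Tuple[Labeled, List[str], Dict[str, str]]:
    """Query the expert `budget` times using the most-likely-positive strategy."""
    labeled = list(bootstrap)
    pool = dict(unlabeled)
    queried: List[str] = []
    rationales: Dict[str, str] = {}
    for _ in range(budget):
        if not pool:
            break
        scorer = fit_scorer(labeled)  # retrain on the current labeled pool
        # Most likely positive: pick the unlabeled flight with the top score.
        flight_id = max(pool, key=lambda fid: scorer(pool[fid]))
        label, rationale = ask_expert(flight_id)
        labeled.append((pool.pop(flight_id), label))
        queried.append(flight_id)
        if rationale:
            rationales[flight_id] = rationale
    return labeled, queried, rationales
```

In this sketch `ask_expert` is any callable that maps a flight id to a `(label, rationale)` pair, so it can be backed by a lookup table in experiments or by an interactive annotator.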
ASK-the-Expert tool: architecture
K. Das, I. Avrekh, B. Matthews, M. Sharma, N. Oza. 2017. ASK-the-Expert: Active learning based knowledge discovery using the expert. In Proceedings of ECML-PKDD 2017. To be published.
Annotator component
Coordinator component
Multiple kernel support vector machine
• 2-class SVM objective:
• Decision function:
• Multiple kernel 2-class SVM: classifying between operationally significant (OS) and uninteresting (NOS) flights
[Figure: flight time series for flights 1, 2, 3, …, m over the feature set f1, f2, f3, …, fn; one kernel per feature, combined as a weighted average of all feature kernels with kernel weights η1, η2, η3, …, ηn]
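For reference, the standard textbook form of a soft-margin 2-class SVM and of a weighted-sum multiple kernel is written out below. It is an assumption that the slide uses this form; the notation is illustrative. Here x_i is flight i with label y_i in {−1, +1}, C is the regularization constant, φ is the feature map, K_m is the kernel computed on feature f_m, and η_m is its weight.

```latex
% Soft-margin 2-class SVM (primal objective)
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_i
\quad\text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0

% Decision function in kernel form
f(x) = \operatorname{sign}\Bigl(\sum_{i}\alpha_i\,y_i\,K(x_i,x)+b\Bigr)

% Multiple kernel: weighted average of the per-feature kernels
K(x_i,x_j) = \sum_{m=1}^{n}\eta_m\,K_m(x_i,x_j),\qquad \eta_m\ \ge\ 0,\quad \sum_{m=1}^{n}\eta_m = 1
```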
Rationale feature construction
• How to set weights: η1, η2, …, ηn s.t. ηm ≥ 0 and ∑ ηm = 1
• SimpleMKL algorithm
  – Modified objective function
  – Alternates between optimizing classifier margin and weights of kernels
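The constraint on the kernel weights and the weighted average of feature kernels can be sketched as follows. This is not SimpleMKL itself: the SVM-solving half of the alternation is omitted, and the weight update shown is a plain projected gradient step, swapped in for illustration in place of SimpleMKL's own update rule. The function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {eta : eta_m >= 0, sum(eta) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # ends up holding the value for the largest valid j
    return [max(x - theta, 0.0) for x in v]


def combine_kernels(kernels: Sequence[Matrix], eta: Sequence[float]) -> Matrix:
    """Weighted average of per-feature kernel matrices with weights eta."""
    if len(kernels) != len(eta):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in eta) or abs(sum(eta) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    size = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(eta, kernels)) for j in range(size)]
        for i in range(size)
    ]


def weight_step(eta: Sequence[float], gradient: Sequence[float],
                step_size: float) -> List[float]:
    """One projected gradient step on the kernel weights (illustrative only)."""
    moved = [w - step_size * g for w, g in zip(eta, gradient)]
    return project_to_simplex(moved)
```

An alternating scheme would train the SVM on `combine_kernels(kernels, eta)`, compute a gradient of the objective with respect to `eta`, call `weight_step`, and repeat until the weights stop changing.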
Rationale feature construction
• Decision tree induction
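At its smallest scale, decision tree induction reduces to finding one threshold on one feature, i.e. a depth-1 tree. The sketch below does that with Gini impurity. The slide does not state the splitting criterion or which features are used, so the criterion, the function names, and the toy values in the usage note are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: Sequence[float],
               labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best 'value < threshold' split.

    The threshold is None when no split improves on the unsplit impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Tuple[Optional[float], float] = (None, gini(list(labels)))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal feature values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left: List[int] = [y for _, y in pairs[:i]]
        right: List[int] = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

For example, with hypothetical horizontal separations `[1.0, 2.0, 4.0, 5.0]` (miles) and labels `[1, 1, 0, 0]`, the candidate thresholds are the midpoints between consecutive sorted values, and the split that separates the two classes perfectly has impurity zero.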
Data
[Figure labels: altitude, vertical separation, horizontal separation, runway]
ORIGINAL FEATURES
• Latitude
• Longitude
• Altitude
• Ground speed
• Horizontal separation
• Vertical separation
• Aircraft size
• Turn-to-final (TTF) parameters:
  – Maximum overshoot
  – Speed at TTF
  – Distance at TTF
  – Angle at TTF
  – Altitude difference at TTF
• Nearest neighboring (NN) flight info:
  – NN flight on same runway
  – NN flight on parallel runway
  – NN flight part of the same flow
Rationale features
“Loss of separation”
• Horizontal separation < 3 miles AND vertical separation < 1000 ft AND nearest neighboring flight is not on parallel runways and not part of the same flow
“Large overshoot”
• Maximum overshoot is greater than a threshold based on values of flights with positive labels
“Unusual flight path”
• Overall deviation from expected (average) trajectory of all landing flights on that runway
[Figure: expected trajectory and actual trajectory between begin point and landing point, showing deviation from expected path; annotations: vertical separation < 1000 ft, horizontal separation < 3 miles]
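The “Loss of separation” rationale above is a conjunction of four conditions, which maps directly onto a boolean feature. The sketch below encodes it with the thresholds stated on the slide; the `FlightSnapshot` type and its field names are hypothetical and not part of the tool.

```python
from dataclasses import dataclass


@dataclass
class FlightSnapshot:
    """Hypothetical container for the values the rule needs."""
    horizontal_sep_miles: float
    vertical_sep_ft: float
    nn_on_parallel_runway: bool  # nearest neighboring flight on a parallel runway
    nn_in_same_flow: bool        # nearest neighboring flight part of the same flow


def loss_of_separation(flight: FlightSnapshot,
                       h_limit_miles: float = 3.0,
                       v_limit_ft: float = 1000.0) -> bool:
    """True when all four conditions of the rationale hold."""
    return (
        flight.horizontal_sep_miles < h_limit_miles
        and flight.vertical_sep_ft < v_limit_ft
        and not flight.nn_on_parallel_runway
        and not flight.nn_in_same_flow
    )
```

Evaluating such a function on every flight yields one extra binary column, which is one way a rationale could be added to the feature set as f*.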
Experimental setup
• Dataset: 30 NM airspace around Denver International Airport for Aug 2014
  – Training set: ~2400 flights
  – Statistical anomalies: 153
  – OS flights: 24
• 2-fold cross validation with 10 random bootstraps for each fold
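The protocol above (2 folds, 10 random bootstraps per fold) gives 20 runs in total. One way to enumerate them is sketched below. The bootstrap size, the seed, and the function names are assumptions, since the slide does not give them, and the sketch does not enforce that each bootstrap contains both classes.

```python
import random
from typing import Dict, Iterable, List, Tuple


def two_fold_splits(ids: Iterable[int],
                    rng: random.Random) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the ids and return the two (train, test) pairs of a 2-fold split."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    return [(first, second), (second, first)]


def experiment_plan(ids: Iterable[int], n_bootstraps: int = 10,
                    bootstrap_size: int = 2, seed: int = 0) -> List[Dict]:
    """List every (fold, bootstrap) run with its initial labeled sample."""
    rng = random.Random(seed)
    plan: List[Dict] = []
    for fold, (train, test) in enumerate(two_fold_splits(ids, rng)):
        for b in range(n_bootstraps):
            # Random initial labeled pool drawn from the training half only.
            initial = rng.sample(train, bootstrap_size)
            plan.append({"fold": fold, "bootstrap": b,
                         "initial_labeled": initial,
                         "train": train, "test": test})
    return plan
```

Results would then be averaged over the runs in the plan to produce one learning curve per strategy.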
Performance analysis
• Metrics: precision@5 and precision@10
• Most-likely positive strategy
[Figure: learning curves for different active learning strategies]
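Precision@k is the fraction of the k top-ranked flights that are truly operationally significant. A minimal sketch follows; the function name is hypothetical, and dividing by k even when fewer than k flights are ranked is a convention chosen here, not something stated on the slide.

```python
from typing import Sequence


def precision_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked items whose true label is 1 (OS).

    `ranked_labels` holds the true labels ordered by descending model score.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels[:k]) / k
```

With a ranking whose true labels are `[1, 0, 1, 1, 0, 0, 1, 0, 0, 0]`, three of the top five are OS, so precision@5 is 0.6, and four of the top ten are OS, so precision@10 is 0.4.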
Performance analysis
75% savings in labeling effort
[Figure: learning curves for most likely positive strategy with and without rationales]
M. Sharma, K. Das, M. Bilgic, B. Matthews, D. Nielsen, N. Oza. 2016. Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation. In Proceedings of ECML-PKDD 2016. pp 209-225.
Performance analysis
[Table: comparison of number of labeled flights required by various strategies to achieve a target performance measure. ‘n/a’ represents that the target performance cannot be achieved by a method even with 45 labeled flights.]
Performance benefits
• Generalization
  – Two different test datasets: July 2014 and July 2015
  – Average improvement in precision@5: ~30%
  – Average improvement in precision@10: ~65%
• Review time
  – Up to 75% reduction in review time for same target performance
Summary
• Up to 75% reduction in SME review time
• Method and tool are agnostic to domain
  – Can be tailored to work in any domain suffering from lack of labeled data
Acknowledgement
• This work is supported by Center Innovation Fund (CIF) 2017 award
• Team:
  – Nikunj Oza, NASA Ames Research Center
  – Bryan Matthews, SGT Inc.
  – Illya Avrekh, SGT Inc.
  – Manali Sharma, PhD Student, Illinois Institute of Technology
  – Sayeri Lala, Undergraduate Student, Massachusetts Institute of Technology
Thank You