ASK-the-Expert: Active learning based knowledge discovery using the expert Kamalika Das Data Sciences Group NASA Ames Research Center ML Workshop, August 2017



Problem
• Identify safety events in flight operational data
• Unsupervised anomaly detection
• SME review of anomalies

[Figure: unsupervised anomaly detection produces statistical flight anomalies, each marked OS or NOS]

Unsupervised anomaly detection
• Lack of definition of 'safety' incident
• One-class SVM based anomaly detection

[Figure: one-class SVM illustration, two panels with axes x1 and x2 and an angle Θ]

S. Das, B. Matthews, A. Srivastava, N. Oza. 2010. Multiple kernel learning for heterogeneous anomaly detection: algorithm and aviation safety case study. In Proceedings of the 16th ACM SIGKDD (KDD '10). 47-56.
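For reference, the one-class SVM named above is usually written in the ν-parameterized form below. This is the standard textbook formulation and may differ from the exact variant used in the cited MKAD work. The symbols w, φ, ρ, ξ_i, ν and N are notation assumed here, not taken from the slides.

```latex
\min_{w,\,\xi,\,\rho}\ \tfrac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu N}\sum_{i=1}^{N}\xi_i - \rho
\quad\text{s.t.}\quad
\langle w, \phi(x_i)\rangle \ge \rho - \xi_i,\qquad \xi_i \ge 0

f(x) = \operatorname{sign}\big(\langle w, \phi(x)\rangle - \rho\big)
```

Under this convention, flights with f(x) = -1 fall outside the learned region and would be reported as statistical anomalies.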

State of the art

[Figure: existing system. Diagram labels:]
• Data collection: FAA facilities (TRACON, ARTCC)
• Data processing: data filter; feature selection and normalization; data merge; calculate flight separation and turn-to-final features
• MKAD (unsupervised anomaly detection): nominals, anomalies
• Labels: operationally significant events

Proposed approach

[Figure: active learning with rationales framework. Diagram labels: input features; MKAD; anomalies; nominals; active learning strategy; instance for labeling; SME; label; rationale; training; active learner (2-class classification/ranking algorithm); output: operationally significant anomalies and uninteresting anomalies]

Active learning framework

[Figure: flights from the statistical anomalies, each described by features f1, f2, f3, …, fn and an additional feature f*. Bootstrap samples form the labeled pool (labels O, N, N); the remaining flights, with unknown labels (?), form the unlabeled pool]

Active learner
• Model: 2-class multiple kernel SVM
• Active learning strategy: most likely positive
• Automated feature construction: multiple kernel learning + decision tree construction

Sample to label: flight x*
SME response for flight x*: label y* (for example OS) and a rationale (for example "loss of separation")
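The query loop implied by the labeled pool, unlabeled pool, and "most likely positive" strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation. A toy centroid-based scorer stands in for the 2-class multiple kernel SVM, the `oracle` callable stands in for the SME, and rationales are left out entirely. All function names are hypothetical.

```python
import math

def fit_centroid_scorer(labeled):
    """Toy stand-in for the 2-class SVM (an assumption, not the slides' model).

    Scores a point by how much closer it is to the positive centroid than to
    the negative one. Needs at least one positive and one negative example,
    which is why the bootstrap samples must contain both classes.
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]

    def centroid(points):
        return tuple(sum(c) / len(points) for c in zip(*points))

    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: math.dist(x, c_neg) - math.dist(x, c_pos)

def most_likely_positive(score, unlabeled):
    """Index of the unlabeled instance with the highest positive score."""
    return max(range(len(unlabeled)), key=lambda i: score(unlabeled[i]))

def active_learning(labeled, unlabeled, oracle, budget):
    """Query the oracle for up to `budget` instances, refitting after each label."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    queried = []
    for _ in range(min(budget, len(unlabeled))):
        score = fit_centroid_scorer(labeled)
        x = unlabeled.pop(most_likely_positive(score, unlabeled))
        y = oracle(x)  # in the real setting, the SME supplies this label
        labeled.append((x, y))
        queried.append((x, y))
    return queried
```

Picking the highest-scoring instance, rather than the most uncertain one, suits a setting where positives are rare and the goal is to surface them quickly.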

ASK-the-Expert tool: architecture

K. Das, I. Avrekh, B. Matthews, M. Sharma, N. Oza. 2017. ASK-the-Expert: Active learning based knowledge discovery using the expert. In Proceedings of ECML-PKDD 2017. To be published.

Annotator component

Coordinator component

Multiple kernel support vector machine

• Multiple kernel 2-class SVM: classifying between operationally significant (OS) and uninteresting (NOS) flights
• 2-class SVM objective: (formula not preserved in this transcript)
• Decision function: (formula not preserved in this transcript)

[Figure: flights 1, 2, 3, …, m, each with a feature set of time series f1, f2, …, fn; one kernel per feature, combined as a weighted average of all feature kernels with kernel weights η1, …, ηn]
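The slide names a 2-class SVM objective and a decision function, but the transcript does not preserve the formulas. A standard way to write them for a weighted combination of feature kernels is given below. This is the textbook soft-margin dual and may differ from the slide's exact notation. The symbols α_i, y_i, C, b and the flight indices i, j are assumptions introduced here; η_m and n follow the kernel weights named in the talk.

```latex
K(x_i, x_j) = \sum_{m=1}^{n} \eta_m K_m(x_i, x_j),
\qquad \eta_m \ge 0,\quad \sum_{m=1}^{n} \eta_m = 1

\max_{\alpha}\ \sum_{i} \alpha_i
  - \tfrac{1}{2} \sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i} \alpha_i y_i = 0

f(x) = \operatorname{sign}\Big( \sum_{i} \alpha_i y_i K(x_i, x) + b \Big)
```

Here y_i = +1 would mark an OS flight and y_i = -1 an NOS flight, and each K_m compares two flights on a single feature.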

Rationale feature construction

• How to set weights η1, η2, …, ηn s.t. ηm ≥ 0 and Σ ηm = 1
• SimpleMKL algorithm
  – Modified objective function
  – Alternates between optimizing classifier margin and weights of kernels
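The constraint on the kernel weights and the weighted kernel combination can be illustrated with a short sketch. It shows only how a set of weights is kept non-negative and summing to 1, and how the combined kernel matrix is formed. It does not implement the SimpleMKL optimization itself. The function names and the clip-and-rescale normalization are assumptions made for illustration.

```python
def normalize_weights(raw):
    """Clip negative weights to zero and rescale so the weights sum to 1."""
    clipped = [max(0.0, w) for w in raw]
    total = sum(clipped)
    if total == 0:
        raise ValueError("at least one kernel weight must be positive")
    return [w / total for w in clipped]

def combine_kernels(kernels, eta):
    """Weighted sum of equally sized square kernel (Gram) matrices.

    kernels: one matrix per feature, each a list of rows.
    eta: one weight per kernel, assumed already normalized.
    """
    if len(kernels) != len(eta):
        raise ValueError("need exactly one weight per kernel")
    size = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(eta, kernels)) for j in range(size)]
        for i in range(size)
    ]
```

A kernel whose weight ends up at zero contributes nothing to the combined matrix, so the learned weights indicate which features the classifier relies on.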

Rationale feature construction

• Decision tree induction
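The slide does not say how the decision tree is induced. As a rough illustration of the basic building block, the sketch below finds a single threshold on one feature that best separates positive from negative labels by weighted Gini impurity. The choice of Gini impurity, the function names, and the single-split scope are assumptions, not details from the talk.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value > threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, inf) if all values are identical.
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue
        t = (lo + hi) / 2.0
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if cost < best[1]:
            best = (t, cost)
    return best
```

A rule of the form `feature > threshold` found this way can be read directly as a candidate binary feature.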

Data

ORIGINAL FEATURES
• Latitude
• Longitude
• Altitude
• Ground speed
• Horizontal separation
• Vertical separation
• Aircraft size
• Turn-to-final (TTF) parameters:
  – Maximum overshoot
  – Speed at TTF
  – Distance at TTF
  – Angle at TTF
  – Altitude difference at TTF
• Nearest neighboring (NN) flight info:
  – NN flight on same runway
  – NN flight on parallel runway
  – NN flight part of the same flow

[Figure: runway diagram showing altitude, vertical separation, and horizontal separation]

Rationale features

“Loss of separation”
• Horizontal separation < 3 miles AND vertical separation < 1000 ft AND nearest neighboring flight is not on parallel runways and not part of the same flow

“Large overshoot”
• Maximum overshoot is greater than a threshold based on values of flights with positive labels

“Unusual flight path”
• Overall deviation from expected (average) trajectory of all landing flights on that runway

[Figure: expected trajectory and actual trajectory between begin point and landing point, showing deviation from expected path; separation diagram marking vertical separation < 1000 ft and horizontal separation < 3 miles]

Experimental setup

• Dataset: 30 NM airspace around Denver International Airport for Aug 2014
  – Training set: ~2400 flights
  – Statistical anomalies: 153
  – OS flights: 24
• 2-fold cross validation with 10 random bootstraps for each fold
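One way to read the evaluation protocol above is sketched below: split the flights into two folds, and within each fold draw 10 random initial labeled sets to start the active learner. Treating a "bootstrap" as a random initial labeled seed set is an assumption based on the "bootstrap samples" in the framework slide. The seed size of 4, the function name, and the lack of stratification are also assumptions.

```python
import random

def two_fold_bootstrap_runs(flight_ids, n_bootstraps=10, seed_size=4, rng_seed=0):
    """Yield (fold, run, seed_ids, pool_ids, test_ids) for 2-fold cross validation.

    seed_ids: flights labeled up front (the bootstrap sample)
    pool_ids: remaining training flights, available for active learning queries
    test_ids: the held-out fold
    """
    rng = random.Random(rng_seed)
    ids = list(flight_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    folds = [ids[:half], ids[half:]]
    for fold in range(2):
        train, test = folds[fold], folds[1 - fold]
        for run in range(n_bootstraps):
            seed_ids = rng.sample(train, seed_size)
            chosen = set(seed_ids)
            pool_ids = [i for i in train if i not in chosen]
            yield fold, run, seed_ids, pool_ids, test
```

With few positive flights, a real setup would likely need seed sets that contain both classes; this sketch does not enforce that.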

Performance analysis
• Metrics: precision@5 and precision@10
• Most-likely positive strategy

[Figure: learning curves for different active learning strategies]
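The precision@k metrics named above measure what fraction of the k highest-ranked flights are truly positive. A minimal sketch follows, assuming higher scores mean "more likely OS" and labels are 1 for OS and 0 otherwise. The function name is hypothetical.

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring items whose label is 1.

    Divides by k even if fewer than k items are available, which is the
    usual convention for precision@k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k
```

This matches a workflow in which the SME reviews only the top few flights on the ranked list, so errors further down the ranking do not count against the model.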

Performance analysis

75% savings in labeling effort

[Figure: learning curves for most likely positive strategy with and without rationales]

M. Sharma, K. Das, M. Bilgic, B. Matthews, D. Nielsen, N. Oza. 2016. Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation. In Proceedings of ECML-PKDD 2016. pp 209-225.

Performance analysis

[Table: comparison of number of labeled flights required by various strategies to achieve a target performance measure. ‘n/a’ represents that the target performance cannot be achieved by a method even with 45 labeled flights.]
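The comparison described above can be computed from learning curves as sketched below. The function names and the (labels, metric) pair representation are assumptions. No values from the talk's table are reproduced here, since the transcript does not preserve them.

```python
def labels_to_reach(curve, target):
    """Smallest number of labeled flights at which the metric reaches `target`.

    curve: iterable of (n_labeled, metric_value) pairs.
    Returns None when the target is never reached (the table's 'n/a').
    """
    reached = [n for n, value in curve if value >= target]
    return min(reached) if reached else None

def relative_savings(baseline_labels, method_labels):
    """Fraction of labeling effort saved relative to a baseline strategy."""
    if baseline_labels <= 0:
        raise ValueError("baseline_labels must be positive")
    return 1.0 - method_labels / baseline_labels
```

For instance, a hypothetical method that needs 10 labels where a baseline needs 20 would give `relative_savings(20, 10) == 0.5`.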

Performance benefits

• Generalization
  – Two different test datasets: July 2014 and July 2015
  – Average improvement in precision@5: ~30%
  – Average improvement in precision@10: ~65%
• Review time
  – Up to 75% reduction in review time for same target performance

Summary

• Up to 75% reduction in SME review time
• Method and tool are agnostic to domain
  – Can be tailored to work in any domain suffering from lack of labeled data

Acknowledgement

• This work is supported by Center Innovation Fund (CIF) 2017 award
• Team:
  – Nikunj Oza, NASA Ames Research Center
  – Bryan Matthews, SGT Inc.
  – Illya Avrekh, SGT Inc.
  – Manali Sharma, PhD Student, Illinois Institute of Technology
  – Sayeri Lala, Undergraduate Student, Massachusetts Institute of Technology

Thank You