TAC 2018 Streaming Multimedia KBP Pilot
Hoa Trang Dang
National Institute of Standards and Technology
Background
• NIST will evaluate performers in the DARPA AIDA Program (Active Interpretation of Disparate Alternatives).
• Some AIDA evaluations will be open evaluations in TAC and TRECVID.
• The goal of AIDA is to develop a semantic engine that automatically generates multiple alternative analytic interpretations of a situation, based on a variety of unstructured sources that may be noisy, conflicting, or deceptive.
• Documents can contain a mix of multilingual text, speech, images, and video, including metadata.
• A document can be as small as a single tweet, or as large as a Web page containing a news article with text, pictures, and video clips.
  § All data will be in streaming mode; systems can access the data only once in raw format, but may access a KB containing a structured semantic representation of all data seen to date.
ACTIVE INTERPRETATION OF DISPARATE ALTERNATIVES (AIDA)
• Input: a scenario (“Benghazi”), a document stream, and several topics. For each topic:
  • TA1 outputs all Knowledge Elements (entities, relations, events, etc., defined in the ontology) in the documents, including alternative interpretations
  • TA2 fuses KEs from TA1 into the TA2 KB, maintaining alternative interpretations
  • TA3 constructs internally consistent hypotheses (partial KBs) from the TA2 KB
[Figure: TA1 → TA2 → TA3]
Scenario-Specific Ontology
• Scenarios will involve events such as international conflicts, natural disasters, violence at international events, or protests and demonstrations.
• AIDA will extend the KBP ontology of entities, relations, events, belief, and sentiment to include additional concepts needed to cover informational conflicts in each topic in the scenario.
• Ideally, there would be a single ontology for all topics in the scenario (?)
AIDA KB Representation
• A Knowledge Element (KE) is a structured representation of entities, relations, events, etc., likely an augmented triple as in the Cold Start KB
• The triple is augmented with provenance and confidence
• Provenance is a set of justifications; each justification has a justification-level confidence
• KE-level confidence is explicitly provided by TA1 and TA2, and is an aggregation of justification-level confidences
• The KB contains conflicting KEs (as found in the raw documents)
• Representation, not reconciliation, of conflicts
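One way to picture the representation above is as a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, a justification's span is assumed to be a pair of character offsets, and the slides do not say how justification-level confidences are aggregated, so the noisy-OR rule used here is an assumption (a max or a learned combiner would fit the description equally well).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Justification:
    """One piece of provenance: a pointer into a raw document plus its own confidence."""
    doc_id: str
    span: tuple[int, int]  # assumed: character offsets; other media would need other locators
    confidence: float


@dataclass
class KnowledgeElement:
    """An augmented triple: (subject, predicate, object) + provenance + confidence."""
    subject: str
    predicate: str
    obj: str
    provenance: set[Justification] = field(default_factory=set)

    def confidence(self) -> float:
        # Assumed aggregation (noisy-OR): the KE is wrong only if every justification is wrong.
        p_all_wrong = 1.0
        for j in self.provenance:
            p_all_wrong *= 1.0 - j.confidence
        return 1.0 - p_all_wrong


ke = KnowledgeElement(":Entity1", "type", "PER")
ke.provenance.add(Justification("doc1", (10, 15), 0.5))
ke.provenance.add(Justification("doc2", (3, 8), 0.5))
print(ke.confidence())  # 0.75
```

A KE with no justifications gets confidence 0.0 under this rule, and each additional independent justification can only raise the KE-level confidence.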
What is allowed in the KB representation?
• AIDA: “Although there may be need for some natural language, image thumbnails, featurized media, etc. in the KB for reference, registration, or matching purposes, it is expected that most of the assertions in the KB will be expressible in the structured representation, with elements derived from an ontology.”
• Features accessible to TA1/TA2 in a KE cannot be document-level content features (?). Allowable features include:
  • Number of supporting docs, and links to docs (but can't read docs)
  • Time of first supporting doc, most recent supporting doc
• Comments/recommendations from participating teams are welcome regarding what features should be allowed in the KB
• For evaluation purposes, provenance accessible to LDC should be pointers into the raw documents denoting text spans, audio spans, images, or video shots
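The allowable features listed above (number of supporting documents, links to them, and the times of the first and most recent supporting documents) can be computed from provenance pointers alone, without reading document content. A minimal sketch, assuming each pointer carries a document ID, one of the four provenance kinds named on the slide, and the document's timestamp; `Pointer`, `kb_features`, and the locator strings are hypothetical, not a defined AIDA format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal

# The four provenance kinds named on the slide.
Kind = Literal["text_span", "audio_span", "image", "video_shot"]


@dataclass(frozen=True)
class Pointer:
    doc_id: str
    kind: Kind
    locator: str        # hypothetical, e.g. "120-135" for a text span or a shot ID
    doc_time: datetime  # assumed: when the supporting document entered the stream


def kb_features(pointers: list[Pointer]) -> dict:
    """KB-level features derived from provenance only; no document content is read.
    Assumes at least one pointer (min/max of an empty collection raises ValueError)."""
    doc_times = {p.doc_id: p.doc_time for p in pointers}
    return {
        "num_supporting_docs": len(doc_times),
        "doc_links": sorted(doc_times),
        "first_doc_time": min(doc_times.values()),
        "latest_doc_time": max(doc_times.values()),
    }


ptrs = [
    Pointer("d1", "text_span", "120-135", datetime(2018, 1, 5)),
    Pointer("d1", "image", "img2", datetime(2018, 1, 5)),
    Pointer("d7", "video_shot", "shot3", datetime(2018, 2, 1)),
]
feats = kb_features(ptrs)
print(feats["num_supporting_docs"], feats["doc_links"])  # 2 ['d1', 'd7']
```

Two pointers into the same document count as one supporting document, which is why `d1` is counted once.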
TAC/TRECVID 2018 Tasks (Pilot)
• Task 1: Extract all events, subevents or actions, entities, relations, locations, times, and sentiment from a multimedia document stream, conditioned on zero or more different contexts, or hypotheses (TAC, TRECVID 2018)
  • Output is a set of all possible KEs, including confidence and provenance
  • Mention-level output, including within-document linking
• Task 2: Build a KB by aggregating all KEs from TA1 and the “user” (TAC 2018)
  • Output is a KB including cross-document linking
  • Evaluate by queries (with entry points) and assessment
• [Task 3: Create hypotheses from Task 2 KBs (AIDA program-internal in 2018)]
Training/Evaluation Data
• One new scenario per evaluation cycle; 4 scenarios total over the lifetime of the AIDA program
• 100K docs per scenario, including relevant and irrelevant documents
• 5-20% of docs will be relevant to the scenario
• 200 labeled docs per scenario
• 12-20 topics per scenario
• At least one foreign language per scenario, plus English
• AIDA: “Government will provide linguistic resources and tools of a quality and composition to be determined, but consisting at least of the type and size found in a LORELEI Related Language Pack (LRLP)”
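For scale, the per-scenario figures above work out as follows (plain arithmetic on the slide's numbers, not additional specifications):

```latex
\begin{aligned}
0.05 \times 100{,}000 &= 5{,}000 \ \text{relevant documents (low end)}\\
0.20 \times 100{,}000 &= 20{,}000 \ \text{relevant documents (high end)}\\
\frac{200}{100{,}000} &= 0.2\% \ \text{of a scenario's documents are labeled}
\end{aligned}
```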
Low Resource Language Packs
• 1M-2M+ words of monolingual text from news, web text, and social media
• 300K-1.1M+ words of parallel text of variable quality (professional, crowd, found, comparable)
• Annotations for 25K-75K words per language, including:
  • Simple Named Entity (PER, ORG, GPE, LOC/FAC)
  • KB linking of names to GeoNames and the CIA World Factbook
  • Situation Frames: needs/issues for an incident (e.g., urgent shelter need in Kermanshah province)
  • Full Entity (name, nom, pro) and within-doc coreference
  • Predicate-argument annotation of disaster-relevant Acts and States
• Grammatical resources ranging from a full grammatical sketch, to found resources (dictionaries, grammars, primers, gazetteers), to lexicons
• Basic NLP tools, including word and sentence segmenters, encoding converters, and name taggers
Related TRECVID Tasks
TRECVID (2001–Present)
• Shot boundary detection: Identify the shot boundaries in the given video clip(s)
• High-level feature extraction/Semantic Indexing: Given a standard set of shot boundaries and a list of feature (concept) definitions, return, for each feature, a list of shots ranked by the likelihood that the feature is present
• Ad-hoc Video Search: Given a statement of information need, return a ranked list of shots that best satisfy the need; similar to semantic indexing, but with complex concepts (combinations of concepts), e.g., find a group of children playing frisbee in a park
• Rushes Summarization: Given a video from the rushes test collection, automatically create an MPEG-1 summary clip, no longer than a maximum duration, that shows the main objects and events in the rushes video to be summarized
• Surveillance event detection: Detect a set of predefined events and identify their occurrences temporally
• Content-based copy detection: Given a test collection of videos and a set of (video, audio, video+audio) queries, determine for each query the place, if any, where some part of the query occurs, with possible transformations, in the test collection
• Known-item Search: Given a text-only description of the desired video and a test collection of videos with associated metadata, automatically return a list of up to 100 video IDs ranked by the probability of being the one sought
• Instance Search: Given a collection of test videos, a master shot reference, and a collection of queries that delimit a person, object, or place entity in some example video, locate for each query the 1000 shots most likely to contain a recognizable instance of the entity [AIDA TA2 cross-doc coref]
• Multimedia Event Detection: Given a collection of test videos and a list of test events, indicate whether each test event is present anywhere in each test video, and give the strength of evidence for each such judgment
• Localization: Given a video shot, determine the presence of a concept temporally within the shot, with respect to a subset of the frames comprised by the shot, and spatially, for each such frame that contains the concept, as a bounding rectangle [AIDA provenance?]
Latest Task Introduced in 2016: Video-to-Text
• Given a set of 2000 URLs of Twitter (Vine) videos and sets of text descriptions (each composed of 2000 sentences), systems are asked to work on and submit results for two subtasks:
  • Matching and Ranking: Return for each video URL a ranked list of the most likely text descriptions that correspond to (were annotated for) the video, from each of the different text description sets.
  • Description Generation: Automatically generate a text description (1 sentence) for each video URL, independently and without taking into consideration the existence of the text description sets.
• Systems and annotators were encouraged to describe videos using 4 facets:
  • Who is the video describing, such as concrete objects and beings (kinds of persons, animals, things)
  • What are the objects and beings doing? (generic actions, conditions/states, or events)
  • Where, such as locale, site, place, geographic, architectural (kind of place, geographic or architectural)
  • When, such as time of day, season, etc.
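The Matching and Ranking subtask above amounts to producing, for each video URL and each description set, an ordering of that set's sentences. A minimal sketch, assuming video and sentence embeddings are already available (computing them is the substance of the task and is not shown) and using cosine similarity as the ranking score; the function names and toy vectors are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_descriptions(video_vecs: dict[str, list[float]],
                      sentence_vecs: dict[str, list[float]]) -> dict[str, list[str]]:
    """For each video URL, order one description set's sentence IDs, most similar first."""
    ranked = {}
    for url, v in video_vecs.items():
        ranked[url] = sorted(sentence_vecs,
                             key=lambda s: cosine(v, sentence_vecs[s]),
                             reverse=True)
    return ranked


# Toy 2-d "embeddings" (hypothetical); the ranking is run once per description set.
videos = {"vine1": [1.0, 0.0], "vine2": [0.0, 1.0]}
set_a = {"s1": [0.9, 0.1], "s2": [0.1, 0.9], "s3": [0.5, 0.5]}
ranked = rank_descriptions(videos, set_a)
print(ranked["vine1"])  # ['s1', 's3', 's2']
```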
Examples of concepts used in the TRECVID Semantic Indexing (SIN) task:
Airplane, Anchorperson, Animal, Basketball, Beach, Bicycling, Boat_Ship, Boy, Bridges, Bus, Car_Racing, Chair, Cheering, Classroom, Computers, Dancing, Demonstration_Or_Protest, Greeting, Hand, Highway, Sitting_Down, Stadium, Swimming, Telephones, Throwing, Baby, Door_Opening, Fields, Flags, Forest, George_Bush, Hill, Lakes, Military_Airplane, Explosion_Fire, Female-Human-Face-Closeup, Flowers, Girl, Government-Leader, Instrumental_Musician, Oceans, Quadruped, Skating, Skier, Soldiers, Studio_With_Anchorperson, Traffic, Kitchen, Meeting, Motorcycle, News_Studio, Nighttime, Office, Old_People, People_Marching, Press_Conference, Reporters, Roadway_Junction, Running, Singing
Multimedia
• Each document can contain a mix of text, speech, images, and video, including metadata.
• Multiple languages: English plus 1-2 foreign languages (TBA)
• LDC will provide language packs containing resources for each language
• All participants will be given the same documents
• Participants are allowed to process information in a proper subset of the languages or media types
• NIST may report evaluation results broken down by language, media type, etc.
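A breakdown by language and media type, as NIST may report, is a group-by over per-item scores. A minimal sketch, assuming each assessed item carries a language tag, a media type, and a numeric score; the tags (including the placeholder `foreign_1`, since the foreign languages are TBA) and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def breakdown(scored: list[tuple[str, str, float]]) -> dict[tuple[str, str], float]:
    """Average a per-item score within each (language, media type) group."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for lang, media, score in scored:
        groups[(lang, media)].append(score)
    return {key: mean(vals) for key, vals in groups.items()}


results = [
    ("eng", "text", 1.0),
    ("eng", "text", 0.0),
    ("eng", "video", 1.0),
    ("foreign_1", "text", 0.5),  # hypothetical placeholder language tag
]
print(breakdown(results))
# {('eng', 'text'): 0.5, ('eng', 'video'): 1.0, ('foreign_1', 'text'): 0.5}
```

A team that processes only a subset of languages or media types simply contributes no items to the other groups in this sketch.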
Streaming Extraction
• Documents arrive in batches, as chunks.
• ~100 documents/chunk (?), with a cap on the length of time covered in a chunk
• The TA1 (and TA2?) system emits KEs (triple + confidence + extras) after each chunk.
• At specified time points in the stream, the set of accumulated KEs is evaluated.
• Scoring uses metrics derived from ranked precision/recall.
• At some of those points, a wild hypothesis appears!
  • A hypothesis = a set of proposed tuples.
  • The TA1 system outputs KEs primed by the hypothesis, which are evaluated.
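The streaming protocol above can be sketched as a loop that touches each chunk's raw documents exactly once and scores the accumulated, confidence-ranked KEs at the specified checkpoints. This is a sketch under stated assumptions: KEs are reduced to bare triples, a KE seen again keeps its highest confidence, and precision and recall at a cutoff k stand in for the unspecified ranked precision/recall metrics. `run_stream`, `precision_recall_at_k`, and the toy data are hypothetical; hypothesis-primed output is not modeled.

```python
from typing import Callable, Iterable

KE = tuple[str, str, str]    # (subject, predicate, object), simplified
Scored = tuple[KE, float]    # KE plus system confidence


def precision_recall_at_k(ranked: list[KE], gold: set[KE], k: int) -> tuple[float, float]:
    top = ranked[:k]
    hits = sum(1 for ke in top if ke in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def run_stream(chunks: Iterable[list[str]],
               extract: Callable[[list[str]], list[Scored]],
               checkpoints: set[int],
               gold: set[KE],
               k: int) -> dict[int, tuple[float, float]]:
    """Read each chunk once; score the accumulated KEs at the given chunk indices."""
    accumulated: dict[KE, float] = {}
    scores: dict[int, tuple[float, float]] = {}
    for i, chunk in enumerate(chunks, start=1):
        # The only point where raw documents are seen; they are not revisited later.
        for ke, conf in extract(chunk):
            accumulated[ke] = max(conf, accumulated.get(ke, 0.0))
        if i in checkpoints:
            ranked = sorted(accumulated, key=accumulated.get, reverse=True)
            scores[i] = precision_recall_at_k(ranked, gold, k)
    return scores


# Hypothetical extractor output per document.
fake_outputs = {
    "doc1": [(("E1", "type", "PER"), 0.9)],
    "doc2": [(("E2", "type", "ORG"), 0.4)],
    "doc3": [(("E1", "attacked", "E2"), 0.7)],
}
gold = {("E1", "type", "PER"), ("E1", "attacked", "E2")}
scores = run_stream(
    chunks=[["doc1", "doc2"], ["doc3"]],
    extract=lambda chunk: [out for doc in chunk for out in fake_outputs[doc]],
    checkpoints={1, 2},
    gold=gold,
    k=2,
)
print(scores)  # {1: (0.5, 0.5), 2: (1.0, 1.0)}
```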
TA1 Extraction Conditioned on Context
• TA1 must be capable of accepting alternate contexts and producing alternate analyses for each context.
• For example, the analysis of a certain image produces knowledge elements representing a bus on a road. However, knowledge elements in one or more hypotheses suggest that this is a river rather than a road. The analysis algorithm should use this information for additional analysis of the image, with priors favoring a boat.
• Simplifying assumptions for evaluation purposes:
  • Contexts are coherent hypotheses (represented as a partial KB) drawn from a small static set of possible hypotheses that are produced manually by LDC
  • Only “what if” hypotheses are input to TA1; KEs and confidence values resulting from “what if” hypotheses do not get passed on to TA2 but are evaluated separately
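The bus/boat example above can be read as a Bayesian update: the image analysis supplies a likelihood for each label, and the hypothesis supplies the prior. A minimal sketch, assuming the analysis exposes per-label scores and that a hypothesis can be turned into a prior over the same labels; the numbers are illustrative and `reweight` is hypothetical, not a prescribed AIDA mechanism.

```python
def reweight(likelihoods: dict[str, float], prior: dict[str, float]) -> dict[str, float]:
    """Posterior over labels: likelihood from image analysis times context prior, normalized."""
    unnorm = {label: likelihoods[label] * prior.get(label, 0.0) for label in likelihoods}
    total = sum(unnorm.values())
    if total == 0.0:
        return unnorm  # the prior rules out every label the analysis produced
    return {label: score / total for label, score in unnorm.items()}


# Illustrative image-analysis scores for one detected object.
likelihoods = {"bus": 0.6, "boat": 0.4}

# No context: a flat prior leaves the analysis unchanged, so "bus" wins.
no_context = reweight(likelihoods, {"bus": 0.5, "boat": 0.5})

# "What if" context: hypothesis KEs say the surface is a river, so the prior favors a boat.
with_river = reweight(likelihoods, {"bus": 0.2, "boat": 0.8})

print(max(no_context, key=no_context.get), max(with_river, key=with_river.get))  # bus boat
```

Under the simplifying assumptions above, only the unconditioned result (`no_context`) would feed TA2, while `with_river` would be evaluated separately.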
How is Task 1 different from past TRECVID and TAC component tasks?
• Multimedia
• Streaming input
• Can't go back to reanalyze raw docs in previous data chunks
• TA1 has access to the TA2 KB encoding previously added KEs
• Multiple hypotheses and interpretations
• Expanded ontology to cover informational conflicts in the scenario
• TA1 outputs all possible extractions and interpretations, not just the most confident ones
• TA1 extraction from data items may be conditioned on a hypothesis
How is Task 2 different from Cold Start KBP?
• Multimedia
• Streaming input
• TA2 has no access to raw data items to assist in fusing incoming KEs with the existing KB; it can only use what is represented in the incoming KEs and the existing KB
• Multiple hypotheses and interpretations
• Expanded ontology to cover informational conflicts in the scenario
• The TA2 KB must maintain all possible KEs (even low-confidence KEs) in order to support creation of multiple hypotheses and disparate interpretations
• TA2 KEs and confidences could in principle be conditioned on hypotheses in the future, but for 2018 the TA2 KB is independent of any “what if” hypotheses.
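A toy illustration of the TA2 constraints above: fusion sees only the incoming KE (its triple and justifications), identical triples pool their justifications, and conflicting or low-confidence KEs are kept rather than reconciled or pruned. It is a minimal sketch assuming KEs can be matched by exact triple equality; real cross-document linking must decide when differently named nodes refer to the same entity, which is not attempted here. `TA2KB` and its methods are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


class TA2KB:
    """Toy TA2 KB: fuses KEs using only what each incoming KE carries."""

    def __init__(self) -> None:
        # triple -> justifications as (pointer_id, confidence); nothing is ever pruned
        self.kes: dict[Triple, list[tuple[str, float]]] = defaultdict(list)

    def fuse(self, triple: Triple, justifications: list[tuple[str, float]]) -> None:
        # An identical triple pools its justifications with the existing KE.
        # A conflicting triple is stored alongside it: no reconciliation,
        # and no confidence threshold, so low-confidence KEs survive.
        self.kes[triple].extend(justifications)

    def alternatives(self, subject: str, predicate: str) -> list[Triple]:
        """All competing triples asserted for (subject, predicate), in arrival order."""
        return [t for t in self.kes if t[0] == subject and t[1] == predicate]


kb = TA2KB()
kb.fuse(("Obj1", "type", "Bus"), [("doc1:img3", 0.7)])
kb.fuse(("Obj1", "type", "Boat"), [("doc4:img1", 0.1)])   # low confidence, kept anyway
kb.fuse(("Obj1", "type", "Bus"), [("doc9:shot2", 0.6)])   # same triple from a later chunk
print(kb.alternatives("Obj1", "type"))
```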
Evaluation by Assessment
• Evaluate using post-submission assessment and clustering of pooled mentions
• To support evaluation of TA1 extraction conditioned on context, ground truth must be conditioned on a small set of hypotheses, predetermined by LDC.
• Only targeted KEs (relevant to hypotheses) will be evaluated
• Only the k highest-confidence mentions/justifications for each KE will be pooled and assessed
• LDC might provide exhaustive annotation of mentions of entities for a small set of documents, for gold-standard-based “NER” evaluation
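The pooling rule above (only targeted KEs, and only the k highest-confidence justifications per KE) can be sketched as follows. It assumes the KEs in different submissions have already been aligned to shared IDs, which the real evaluation has to establish through assessment and clustering; `pool`, the IDs, and the confidence values are hypothetical.

```python
import heapq

Justs = list[tuple[str, float]]   # (justification_id, confidence)


def pool(runs: list[dict[str, Justs]], targeted: set[str], k: int) -> dict[str, set[str]]:
    """Union, over submitted runs, of each targeted KE's k highest-confidence justifications."""
    pooled: dict[str, set[str]] = {ke: set() for ke in targeted}
    for run in runs:
        for ke, justs in run.items():
            if ke not in targeted:
                continue  # non-targeted KEs are not assessed
            top = heapq.nlargest(k, justs, key=lambda j: j[1])
            pooled[ke].update(j_id for j_id, _ in top)
    return pooled


run_a = {"KE1": [("a1", 0.9), ("a2", 0.2), ("a3", 0.8)], "KE9": [("a9", 0.99)]}
run_b = {"KE1": [("b1", 0.5)]}
pooled = pool([run_a, run_b], targeted={"KE1"}, k=2)
print(sorted(pooled["KE1"]))  # ['a1', 'a3', 'b1']
```

Note that `a2` is never assessed even though it may be correct; bounding assessment cost this way is the trade-off the top-k rule makes.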
AIDA Evaluation Schedule
• Three 18-month phases
• January 2018 kick-off
• ~Sept 2018: Eval Pilot
• ~May 2019: Eval 1 (Phase 1)
• ~Nov 2020: Eval 2 (Phase 2)
• ~May 2022: Eval 3 (Phase 3)
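As a consistency check on the schedule above, assume the three 18-month phases run back-to-back from a January 2018 kickoff. The phase boundaries then fall in July 2019, January 2021, and July 2022, and each phase evaluation lands about two months before its phase ends. The sketch works at month granularity; day-of-month values are placeholders, and `add_months`/`months_between` are hypothetical helpers.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """First day of the month that is `months` after d's month."""
    m = d.month - 1 + months
    return date(d.year + m // 12, m % 12 + 1, 1)


def months_between(a: date, b: date) -> int:
    """Whole calendar months from a's month to b's month."""
    return (b.year - a.year) * 12 + (b.month - a.month)


kickoff = date(2018, 1, 1)  # assumed: phases start at the kickoff month
# First month after each phase ends.
phase_boundaries = [add_months(kickoff, 18 * n) for n in (1, 2, 3)]
evals = [date(2019, 5, 1), date(2020, 11, 1), date(2022, 5, 1)]  # "~May 2019", "~Nov 2020", "~May 2022"
months_before_phase_end = [months_between(e, p) for e, p in zip(evals, phase_boundaries)]
print(phase_boundaries)
print(months_before_phase_end)  # [2, 2, 2]
```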
TAC 2018 Streaming MM KBP Pilot Evaluation Schedule
• Sample/training/eval data release:
  • ~January: scenario and 3 mostly labeled topics for training; all 100K unlabeled docs for the scenario (foreign languages announced at this time)
  • ~April: 3 additional labeled topics for training
  • ~September: 6 “evaluation” topics
• Early September (?): Task 1 evaluation window
• Mid September (?): Task 2 evaluation window