open universes and nuclear weapons

Open universes and nuclear weapons. Stuart Russell, Computer Science Division, UC Berkeley

Author: rupert

Post on 11-Jan-2016


DESCRIPTION

Open universes and nuclear weapons. Outline: why we need expressive probabilistic languages; BLOG combines probability and first-order logic; application to global seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT). The world has things in it!!

TRANSCRIPT

  • Open universes and nuclear weapons

    Stuart Russell
    Computer Science Division, UC Berkeley

  • Outline

    Why we need expressive probabilistic languages
    BLOG combines probability and first-order logic
    Application to global seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT)

  • The world has things in it!!

    Expressive language => concise models => fast learning, sometimes fast reasoning
    E.g., rules of chess:
    1 page in first-order logic: On(color,piece,x,y,t)
    ~100000 pages in propositional logic: WhiteKingOnC4Move12
    ~100000000000000000000000000000000000000 pages as atomic-state model: R.B.KB.RPPP..PPP..N..N..PP.q.pp..Q..n..n..ppp..pppr.b.kb.r
    [Note: chess is a tiny problem compared to the real world]
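To make the gap between the first-order and propositional encodings concrete, the sketch below grounds the single predicate On(color,piece,x,y,t) into propositional symbols of the form WhiteKingOnC4Move12. It counts symbols, not pages, and the 100-move horizon and all Python names are assumptions for illustration only.

```python
from itertools import product

COLORS = ["White", "Black"]
PIECES = ["King", "Queen", "Rook", "Bishop", "Knight", "Pawn"]
FILES = "ABCDEFGH"
RANKS = range(1, 9)
MAX_MOVES = 100  # assumed horizon; the slides do not give one


def ground_on_predicate(max_moves=MAX_MOVES):
    """Yield one propositional symbol per grounding of On(color,piece,x,y,t)."""
    moves = range(1, max_moves + 1)
    for color, piece, f, r, t in product(COLORS, PIECES, FILES, RANKS, moves):
        yield f"{color}{piece}On{f}{r}Move{t}"


symbols = set(ground_on_predicate())
print(len(symbols))                      # 2 * 6 * 8 * 8 * 100 groundings
print("WhiteKingOnC4Move12" in symbols)  # the symbol quoted on the slide
```

One first-order predicate already expands into tens of thousands of propositional symbols, and every rule that mentions it must be repeated per grounding, which is roughly where the page-count blow-up comes from.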

  • Brief history of expressiveness

    Representations: atomic, propositional, first-order/relational
    Formalisms: logic, probability
    Dates (in slide order): 5th C B.C., 19th C, 17th C, 20th C, 21st C (be patient!)

  • First-order probabilistic languages

    Gaifman [1964]: Possible worlds with objects and relations, probabilities attached to (infinitely many) sentences
    Halpern [1990]: Probabilities within sentences, constraints on distributions over first-order possible worlds
    Poole [1993], Sato [1997], Koller & Pfeffer [1998], various others:
    KB defines distribution exactly (cf. Bayes nets)
    assumes unique names and domain closure like Prolog, databases (Herbrand semantics)

  • Herbrand vs full first-order

    Given Father(Bill,William) and Father(Bill,Junior)
    How many children does Bill have?

    Herbrand semantics: 2
    First-order logical semantics: Between 1 and ∞

  • Possible worlds

    Propositional
    First-order + unique names, domain closure
    First-order open-universe
    [Figure: example worlds over the symbols A, B, C, D for each case]

  • Open-universe models

    Essential for learning about what exists, e.g., vision, NLP, information integration, tracking, life
    [Note the GOFAI Gap: logic-based systems going back to Shakey assumed that perceived objects would be named correctly]
    Key question: how to define distributions over an infinite, heterogeneous set of worlds?

  • Bayes nets build propositional worlds

    [Figure: network over Burglary, Alarm, Earthquake]
    World built one variable at a time: Burglary, not Earthquake, Alarm
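A minimal sketch of how a Bayes net builds a propositional world one variable at a time, in topological order. The conditional probabilities are textbook-style numbers assumed for illustration; the slides give none.

```python
import random

# Assumed probabilities (not from the slides).
P_BURGLARY = 0.001
P_EARTHQUAKE = 0.002
P_ALARM = {  # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def sample_world(rng):
    """Sample parents first, then the child conditioned on their values."""
    burglary = rng.random() < P_BURGLARY
    earthquake = rng.random() < P_EARTHQUAKE
    alarm = rng.random() < P_ALARM[(burglary, earthquake)]
    return {"Burglary": burglary, "Earthquake": earthquake, "Alarm": alarm}


def world_probability(world):
    """Probability of a complete world: the product of one factor per variable."""
    p = P_BURGLARY if world["Burglary"] else 1 - P_BURGLARY
    p *= P_EARTHQUAKE if world["Earthquake"] else 1 - P_EARTHQUAKE
    p_alarm = P_ALARM[(world["Burglary"], world["Earthquake"])]
    p *= p_alarm if world["Alarm"] else 1 - p_alarm
    return p


slide_world = {"Burglary": True, "Earthquake": False, "Alarm": True}
print(sample_world(random.Random(0)))
print(world_probability(slide_world))
```

The world named on the slide (Burglary, not Earthquake, Alarm) gets probability P(Burglary) x P(not Earthquake) x P(Alarm | Burglary, not Earthquake) under these assumed numbers.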

  • Open-universe models in BLOG

    Construct worlds using two kinds of steps, proceeding in topological order:
    Dependency statements: Set the value of a function or relation on a tuple of (quantified) arguments, conditioned on parent values
    Number statements: Add some objects to the world, conditioned on what objects and relations exist so far

  • Semantics

    Every well-formed* BLOG model specifies a unique proper probability distribution over open-universe possible worlds; equivalent to an infinite contingent Bayes net

    * No infinite receding ancestor chains, no conditioned cycles, all expressions finitely evaluable

  • Example: Citation Matching

    [Lashkari et al 94] Collaborative Interface Agents, Yezdi Lashkari, Max Metral, and Pattie Maes, Proceedings of the Twelfth National Conference on Artificial Intelligence, MIT Press, Cambridge, MA, 1994.

    Metral M. Lashkari, Y. and P. Maes. Collaborative interface agents. In Conference of the American Association for Artificial Intelligence, Seattle, WA, August 1994.

    Are these descriptions of the same object?

    Core task in CiteSeer, Google Scholar, over 300 companies in the record linkage industry

  • (Simplified) BLOG model

    #Researcher ~ NumResearchersPrior();
    Name(r) ~ NamePrior();
    #Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
    Title(p) ~ TitlePrior();
    PubCited(c) ~ Uniform({Paper p});
    Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
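The generative reading of the citation model can be sketched in Python as a forward sampler that alternates number statements and dependency statements. This is not BLOG and not the authors' code: every prior here (name list, title list, count ranges, the noise in noisy_citation_text) is a stand-in assumed for illustration, because the slides leave them unspecified.

```python
import random

# Stand-in priors (assumptions): names from the example citation, toy titles.
NAMES = ["Lashkari", "Metral", "Maes"]
TITLES = ["Collaborative interface agents", "Some other paper"]


def noisy_citation_text(name, title, rng):
    """Toy stand-in for NoisyCitationGrammar: varies field order and case."""
    text = f"{name}. {title}." if rng.random() < 0.5 else f"{title}, {name}"
    return text.lower() if rng.random() < 0.3 else text


def sample_world(rng, num_citations=4):
    # Number statement: #Researcher ~ NumResearchersPrior()
    researchers = [f"R{i}" for i in range(rng.randint(1, 3))]
    # Dependency statement: Name(r) ~ NamePrior()
    name = {r: rng.choice(NAMES) for r in researchers}
    # Number statement: #Paper(FirstAuthor = r) ~ NumPapersPrior(...)
    first_author = {}
    for r in researchers:
        for _ in range(rng.randint(1, 2)):
            first_author[f"P{len(first_author)}"] = r
    # Dependency statement: Title(p) ~ TitlePrior()
    title = {p: rng.choice(TITLES) for p in first_author}
    # PubCited(c) ~ Uniform({Paper p}); Text(c) ~ NoisyCitationGrammar(...)
    papers = sorted(first_author)
    citations = []
    for _ in range(num_citations):
        p = rng.choice(papers)
        text = noisy_citation_text(name[first_author[p]], title[p], rng)
        citations.append({"pub_cited": p, "text": text})
    return {"name": name, "first_author": first_author,
            "title": title, "citations": citations}


world = sample_world(random.Random(0))
for c in world["citations"]:
    print(c["pub_cited"], "|", c["text"])
```

Citation matching runs this process in reverse: given only the citation texts, infer how many papers and researchers exist and which citations share a PubCited value.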

  • Citation Matching Results

    Four data sets of ~300-500 citations, referring to ~150-300 papers

    Error (fraction of clusters not recovered correctly)

    Method                                           Reinforce  Face   Reason  Constraint
    Phrase Matching [Lawrence et al. 1999]           0.21       0.06   0.14    0.11
    Generative Model + MCMC [Pasula et al. 2002]     0.06       0.03   0.04    0.07
    Conditional Random Field [Wellner et al. 2004]   0.053      0.031  0.063   0.049

  • Example: Sibyl attacks

    Typically between 100 and 10,000 real people
    About 90% are honest, have one login ID
    Dishonest people own between 10 and 1000 logins
    Transactions may occur between logins
    If two logins are owned by the same person (sibyls), then a transaction is highly likely; otherwise, a transaction is less likely (depending on the honesty of each login's owner)
    A login may recommend another after a transaction:
    Sibyls with the same owner usually recommend each other; otherwise, the probability of recommendation depends on the honesty of the two owners

  • #Person ~ LogNormal[6.9, 2.3]();
    Honest(x) ~ Boolean[0.9]();
    #Login(Owner = x) ~ if Honest(x) then 1 else LogNormal[4.6,2.3]();
    Transaction(x,y) ~ if Owner(x) = Owner(y) then SibylPrior()
        else TransactionPrior(Honest(Owner(x)), Honest(Owner(y)));
    Recommends(x,y) ~ if Transaction(x,y) then
        if Owner(x) = Owner(y) then Boolean[0.99]()
        else RecPrior(Honest(Owner(x)), Honest(Owner(y)));

    Evidence: lots of transactions and recommendations, maybe some Honest(.) assertions
    Query: Honest(x)
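The LogNormal parameters in the model line up with the ranges in the informal description, which a quick calculation can show. The sketch assumes the two parameters are the mean and standard deviation of log(X); the slides do not say whether the second parameter is a standard deviation or a variance, so treat this reading as an assumption.

```python
import math


def lognormal_summary(mu, sigma):
    """Return (exp(mu - sigma), exp(mu), exp(mu + sigma)).

    Assumes mu and sigma are the mean and standard deviation of log(X),
    so exp(mu) is the median and the outer values span one sigma in log space.
    """
    return math.exp(mu - sigma), math.exp(mu), math.exp(mu + sigma)


people_low, people_median, people_high = lognormal_summary(6.9, 2.3)
logins_low, logins_median, logins_high = lognormal_summary(4.6, 2.3)

print(f"#Person: {people_low:.0f} to {people_high:.0f}, median {people_median:.0f}")
print(f"#Login (dishonest owner): {logins_low:.0f} to {logins_high:.0f}, "
      f"median {logins_median:.0f}")
```

Under that reading, #Person spans roughly 100 to 10,000 and a dishonest owner's login count spans roughly 10 to 1000, matching the informal description.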

  • Example: classical data association

  • #Aircraft(EntryTime = t) ~ NumAircraftPrior();
    Exits(a, t)
        if InFlight(a, t) then ~ Bernoulli(0.1);
    InFlight(a, t)
        if t < EntryTime(a) then = false
        elseif t = EntryTime(a) then = true
        else = (InFlight(a, t-1) & !Exits(a, t-1));
    State(a, t)
        if t = EntryTime(a) then ~ InitState()
        elseif InFlight(a, t) then ~ StateTransition(State(a, t-1));
    #Blip(Source = a, Time = t)
        if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));
    #Blip(Time = t) ~ NumFalseAlarmsPrior();
    ApparentPos(r)
        if (Source(r) = null) then ~ FalseAlarmDistrib()
        else ~ ObsCPD(State(Source(r), Time(r)));

  • Inference

    Theorem: BLOG inference algorithms (rejection sampling, importance sampling, MCMC) converge to correct posteriors for any well-formed* model, for any first-order query
    Current generic MCMC engine is quite slow
    Applying compiler technology
    Developing user-friendly methods for specifying piecemeal MCMC proposals
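As a toy illustration of rejection sampling over open-universe worlds (the simplest of the three algorithms named above), the sketch below asks how many aircraft exist given a count of radar blips. It is not the BLOG engine: the uniform prior on the number of aircraft, the detection probability, and the false-alarm distribution are all assumptions chosen to keep the example small.

```python
import random
from collections import Counter

MAX_AIRCRAFT = 4                     # assumed: #Aircraft uniform on 0..4
DETECTION_PROB = 0.9                 # assumed: each aircraft yields one blip w.p. 0.9
FALSE_ALARM_WEIGHTS = [0.6, 0.3, 0.1]  # assumed: P(#false alarms = 0, 1, 2)


def sample_world(rng):
    """Forward-sample one world: number of aircraft, then the blips seen."""
    num_aircraft = rng.randint(0, MAX_AIRCRAFT)
    detections = sum(rng.random() < DETECTION_PROB for _ in range(num_aircraft))
    false_alarms = rng.choices([0, 1, 2], weights=FALSE_ALARM_WEIGHTS)[0]
    return num_aircraft, detections + false_alarms


def rejection_sample(observed_blips, num_samples, rng):
    """Keep only worlds that match the evidence; normalise the survivors."""
    counts = Counter()
    for _ in range(num_samples):
        num_aircraft, blips = sample_world(rng)
        if blips == observed_blips:
            counts[num_aircraft] += 1
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


posterior = rejection_sample(observed_blips=3, num_samples=50_000,
                             rng=random.Random(0))
for n, p in posterior.items():
    print(n, round(p, 3))
```

Worlds of different sizes compete to explain the same evidence, which is the defining feature of open-universe inference. Rejection sampling wastes every sample that disagrees with the evidence, which is one reason practical systems move to MCMC.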

  • CTBT

    Bans testing of nuclear weapons on earth
    Allows for outside inspection of 1000 km²
    182/195 states have signed
    153/195 have ratified
    Need 9 more ratifications including US, China
    US Senate refused to ratify in 1999 (too hard to monitor)

  • 2053 nuclear explosions

  • 254 monitoring stations

  • Vertically Integrated Seismic Analysis

    The problem is hard:
    ~10000 detections per day, 90% false
    CTBT system (SEL3) finds 69% of significant events plus about twice as many spurious (nonexistent) events
    16 human analysts find more events, correct existing ones, throw out spurious events, generate LEB (ground truth)
    Unreliable below magnitude 4 (1 kT)
    Solve it by global probabilistic inference
    NET-VISA finds around 88% of significant events

  • Generative model for IDC arrival data

    Events occur in time and space with magnitude
    Natural spatial distribution a mixture of Fisher-Binghams
    Man-made spatial distribution uniform
    Time distribution Poisson with given spatial intensity
    Magnitude distribution Gutenberg-Richter (exponential)
    Aftershock distribution (not yet implemented)
    Travel time according to IASPEI91 model plus Laplacian error distribution for each of 14 phases
    Detection depends on magnitude, distance, station*
    Detected azimuth, slowness plus Laplacian error
    False detections with station-dependent distribution

  • #SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE];
    IsEarthQuake(e) ~ Bernoulli(.999);
    EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution()
        Else UniformEarthDistribution();
    Magnitude(e) ~ Exponential(log(10)) + MIN_MAG;
    Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));
    IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s));
    #Arrivals(site = s) ~ Poisson[TIME_DURATION*FALSE_RATE(s)];
    #Arrivals(event=e, site) = If IsDetected(e,s) then 1 else 0;
    Time(a) ~ If (event(a) = null) then Uniform(0,TIME_DURATION)
        else IASPEI(EventLocation(event(a)),SiteLocation(site(a)),Phase(a)) + TimeRes(a);
    TimeRes(a) ~ Laplace(TIMLOC(site(a)), TIMSCALE(site(a)));
    Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360)
        else GeoAzimuth(EventLocation(event(a)),SiteLocation(site(a))) + AzRes(a);
    AzRes(a) ~ Laplace(0, AZSCALE(site(a)));
    Slow(a) ~ If (event(a) = null) then Uniform(0,20)
        else IASPEI-SLOW(EventLocation(event(a)),SiteLocation(site(a))) + SlowRes(site(a));
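The first few statements of the seismic model can be sketched as a forward sampler: a Poisson number of events, an earthquake flag, and a Gutenberg-Richter magnitude drawn as Exponential(log(10)) + MIN_MAG. The numeric values of TIME_DURATION, EVENT_RATE and MIN_MAG below are assumptions (the slides give none), and the Poisson sampler is a generic textbook method, not the NET-VISA code.

```python
import math
import random

TIME_DURATION = 24.0  # hours (assumed value)
EVENT_RATE = 5.0      # events per hour (assumed value)
MIN_MAG = 2.0         # minimum magnitude (assumed value)


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_magnitude(rng):
    """Magnitude(e) ~ Exponential(log(10)) + MIN_MAG."""
    return MIN_MAG + rng.expovariate(math.log(10))


def sample_events(rng):
    """#SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE], then per-event attributes."""
    num_events = sample_poisson(TIME_DURATION * EVENT_RATE, rng)
    return [
        {"is_earthquake": rng.random() < 0.999,  # IsEarthQuake(e) ~ Bernoulli(.999)
         "magnitude": sample_magnitude(rng)}
        for _ in range(num_events)
    ]


events = sample_events(random.Random(0))
print(len(events), "events; largest magnitude",
      round(max(e["magnitude"] for e in events), 2))
```

With rate log(10), the chance that a magnitude exceeds MIN_MAG + m is 10^(-m), so each extra magnitude unit is ten times rarer. That is the Gutenberg-Richter relationship the slide refers to.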

  • Forward model structure

    [Figure labels: Seismic event, Propagation (each shown twice); Detected at Station 1?, Detected at Station 2?; Station 1 noise, Station 2 noise; Station 1 picks, Station 2 picks]
    Labels added in successive builds:
    Type, Time, Location, Depth, Magnitude, Phase
    Travel time, Amplitude decay
    Arrival time*, Amplitude*, Azimuth*, Slowness*, Phase*

  • Travel-time residual (station 6)

  • Detection probability as a function of distance (station 6, mb 3.5)

    P phase
    S phase

  • Overall Pick Error

  • Overall Azimuth Error

  • Phase confusion matrix

  • Fraction of LEB events missed

  • Event distribution: LEB vs SEL3

  • Event distribution: LEB vs NET-VISA

  • Why does NET-VISA work?

    Multiple empirically calibrated seismological models
    Improving model structure and quality improves the results
    Sound Bayesian combination of evidence
    Measured arrival times, phase labels, azimuths, etc., NOT taken literally
    Absence of detections provides negative evidence
    More detections per event than SEL3 or LEB

  • Example of using extra detections

  • NEIC event (3.0) missed by LEB

  • NEIC event (3.7) missed by LEB

  • NEIC event (2.6) missed by LEB

  • Why does NET-VISA not work (perfectly)?

    Needs hydroacoustic for mid-ocean events
    Weaknesses in model:
    Travel time residuals for all phases along a single path are assumed to be uncorrelated
    Each phase arrival is assumed to generate at most one detection; in fact, multiple detections occur
    Arrival detectors use high SNR thresholds, look only at local signal to make hard decisions

  • Detection-based and signal-based monitoring

    [Figure labels: events, detections, waveform signals; SEL3, NET-VISA, SIG-VISA]

  • Summary

    Expressive probability models are very useful
    BLOG provides a generative language for defining first-order, open-universe models
    Inference via MCMC over possible worlds
    Other methods welcome!
    CTBT application is typical of multi-sensor monitoring applications that need vertical integration and involve data association
    Scaling up inference is the next step

    python debug.py 15 visa 254 -w 4 -r .1
    python debug.py 15 visa 2069 -w 4 -r .1
    python debug.py 15 visa 2338 -w 4 -r .1