EXTRACTING EVENTS FROM PROBABILISTIC STREAMS Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington

Page 1: Extracting Events from Probabilistic Streams

EXTRACTING EVENTS FROM PROBABILISTIC STREAMS

Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu

University of Washington

Page 2: Extracting Events from Probabilistic Streams

One Slide Overview

Motivating App: RFID Ecosystem

Tagged people, cups, books, keys, laptops, etc.

Event queries [Cayuga, SASE, Snoop]

Alert when anyone enters the coffee room

Two problems

Missed readings: read rates in practice are low

Granularity mismatch, e.g. Office v. Antenna 41

Instead, infer location from sensors

Proposal: keep probabilities and query with PEEX+

PEEX+ (Probabilistic Event EXtraction) keeps data probabilistic to get higher precision and recall (P/R) and is still efficient.

Page 3: Extracting Events from Probabilistic Streams

Motivating Apps

RFID apps

Diary and Active Calendar Application.

○ Alert if I go to a database meeting.

Supply chain

○ Alert if Mach 3 razors are being stolen

Many independent HMMs

Elder care [Intel, Patterson]

○ Alert if elder takes their medicine with water

Financial applications on predictive HMM

○ Alert if head-and-shoulders market

Page 4: Extracting Events from Probabilistic Streams

Outline

RFID to Probabilities via Particle Filters

PEEX+ query language

Extended Regular Query Algorithm

Experiments

Page 5: Extracting Events from Probabilistic Streams

The source of probabilities

[Figure: map of the 6th Floor in PAC showing the antennas and the connectivity diagram. Each orange particle is a guess of the true location; the blue ring is ground truth.]

Page 6: Extracting Events from Probabilistic Streams

PFs to a (prob) DB

At(tag,loc)

Tag (person)   t   Loc   P
Joe            7   O2    0.4
                   H2    0.2
                   H3    0.4
Joe            8   O2    0.6
                   H2    0.2
                   H3    0.2
Sue            7   …     …

To query Particle Filter output, query At

Page 7: Extracting Events from Probabilistic Streams

Semantics of the Model

At(tag,loc)

Tag   t   Loc   P
Joe   7   O2    0.4
          H2    0.2
          H3    0.4
Joe   8   O2    0.6
          H2    0.2
          H3    0.2
Sue   7   …     …

One possible stream (world):

Tag   t   Loc
Joe   7   O2
Joe   8   O2
Sue   7   …

Prob = 0.4 * 0.6 * …

NB: Markovian correlations OK

Query semantics: sum the weights of all worlds where Q is true at time t

“Joe enters O2 at t=8”: $(0.2 + 0.4) \times 0.6 = 0.36$, where $0.2 + 0.4$ is the probability that Joe is outside O2 (in H2 or H3) at t=7
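The possible-worlds semantics on this slide can be checked by brute force. The sketch below is only an illustration, not the PEEX+ implementation: it enumerates every possible stream for Joe from the table's marginals, assumes the two timesteps are independent (the "no correlations" case), and sums the weights of the worlds in which Joe is outside O2 at t=7 and inside O2 at t=8. The names `at_joe`, `possible_worlds` and `prob_enters` are hypothetical.

```python
from itertools import product

# Marginals for Joe, copied from the At(tag, loc) table on this slide.
at_joe = {
    7: {"O2": 0.4, "H2": 0.2, "H3": 0.4},
    8: {"O2": 0.6, "H2": 0.2, "H3": 0.2},
}


def possible_worlds(marginals):
    """Yield (world, weight) pairs, assuming timesteps are independent."""
    times = sorted(marginals)
    for locs in product(*(marginals[t] for t in times)):
        weight = 1.0
        for t, loc in zip(times, locs):
            weight *= marginals[t][loc]
        yield dict(zip(times, locs)), weight


def prob_enters(marginals, room, t):
    """Sum the weights of worlds where the tag is outside `room` at t-1 and inside at t."""
    return sum(
        weight
        for world, weight in possible_worlds(marginals)
        if world[t - 1] != room and world[t] == room
    )


p = prob_enters(at_joe, "O2", 8)
print(round(p, 2))  # prints 0.36
```

Enumeration is exponential in the number of timesteps, so it only serves to make the semantics concrete on a two-step example.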

Page 8: Extracting Events from Probabilistic Streams

Outline

RFID to Probabilities via Particle Filters

PEEX+ query language

Extended Regular Query Algorithm

Experiments

Page 9: Extracting Events from Probabilistic Streams

A hierarchy of PEEX+ queries

Regular Queries

Alert me when Joe goes to the coffee room

Extended Regular

Alert when anyone goes to the coffee room

Safe

Alert when anyone goes to the coffee room and a DB member follows them.

Hard

Others (Simulation)

This line is sharp for some queries

Page 10: Extracting Events from Probabilistic Streams

Peex+ Queries

Fragment of Cayuga; queries define events.

Operator                                   Description
$At(p, l_1)$                               Base stream
$At(p, l_1);\ At(p, l_2)$                  Sequence (semicolon)
$\sigma_{Room(l_1)}(At(p, l_1))$           Select
$At(p, l)^{+}\{\sigma_{Hall(l)}, p\}$      Kleene+ (parameterized by $\{P, V\}$)

Annotations on the examples: same p in both; p in some location

Technical Point: left-to-right evaluation, $E_1; E_2; E_3 = (E_1; E_2); E_3$

Page 11: Extracting Events from Probabilistic Streams

Regulars and Extended Regular

Query is regular if no variable is shared between subgoals

$\sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

Query is extended regular if any variable shared by two subgoals is shared by all subgoals, i.e. a templated regular query

$\sigma_{l = \text{'502'}}(At(p, \text{'501'});\ At(p, l))$: p is shared between subgoals

Page 12: Extracting Events from Probabilistic Streams

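Wrinkle in the language: Filter v. Selection

“Alert next time Joe is in 502 after he is in 501”

$q_{\text{filter}} = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$

“Alert if the next place Joe is in after 501 is 502”

$q_{\text{select}} = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

[Figure: timeline of At events (Joe,501), (Joe,502), (Joe,503) over Time, with the answers of the two queries marked Yes and No]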
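The difference between the two queries is easiest to see on a deterministic stream. The sketch below is a simplified reading of the slide, not Cayuga or PEEX+ code: it only looks at the first 501 reading, and the function names and the example stream ordering are hypothetical, chosen so that the two queries disagree.

```python
def next_match(events, start, pred):
    """Index of the first event at or after `start` that satisfies `pred`, else None."""
    for i in range(start, len(events)):
        if pred(events[i]):
            return i
    return None


def q_filter(events, person="Joe"):
    """At(person,'501'); At(person,'502'): wait for the next 502 reading after a 501 reading."""
    i = next_match(events, 0, lambda e: e == (person, "501"))
    if i is None:
        return False
    return next_match(events, i + 1, lambda e: e == (person, "502")) is not None


def q_select(events, person="Joe"):
    """sigma_{l='502'}(At(person,'501'); At(person,l)): the very next reading must be 502."""
    i = next_match(events, 0, lambda e: e == (person, "501"))
    if i is None:
        return False
    j = next_match(events, i + 1, lambda e: e[0] == person)
    return j is not None and events[j][1] == "502"


# Hypothetical ordering: Joe passes through 503 between 501 and 502.
stream = [("Joe", "501"), ("Joe", "503"), ("Joe", "502")]
print(q_filter(stream), q_select(stream))  # prints True False
```

The filter query skips the 503 reading because it does not match the second goal, while the select query binds `l` to 503 first and then rejects it.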

Page 13: Extracting Events from Probabilistic Streams

Outline

RFID to Probabilities via Particle Filters

PEEX+ query language

Extended Regular Query Algorithm

Experiments

Page 14: Extracting Events from Probabilistic Streams

Why are ER queries hard?

Regular Queries ~ Regular Expressions

Mapping is non-trivial

○ similar to Cayuga [Demers et al. 06]

Queries have #P combined complexity

○ Can encode mDNF as regular expression

Intuition: n-sized automaton leads to $time(2^n)$

Extended regular ~ 1 NFA per person

k persons implies O(k)-size automaton

Exponential cost

When ER, can avoid blowup

Page 15: Extracting Events from Probabilistic Streams

Algorithm for Regular Queries Overview

Deterministic Algorithm

1. Compile a query q

   1. NFA-like thing in a language $L_q$

   2. Mapping $M_q$ from events to subsets of $L_q$

2. At runtime, at time t have events E

   1. Create set of symbols at time t: $M_q(E) = \bigcup_{e \in E} M_q(e)$

   2. Process NFA on $M_q(E)$

Focus on the compilation

Page 16: Extracting Events from Probabilistic Streams

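Compile Select and Filter

$q_{\text{filter}} = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$

$q_{\text{select}} = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

Intuition: each goal maps to two letters:

match (m): matches filter

accept (a): accepted by select

$L = \{m_1, a_1, m_2, a_2\}$

[Figure: automaton over L ending in a Final state, with transitions labeled by $a_1$, $a_2$ and $\{m_2\}$ and annotated "does contain" / "does not contain"]

Language and automaton are the same for both queries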

Page 17: Extracting Events from Probabilistic Streams

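The difference is the mapping

$L = \{m_1, a_1, m_2, a_2\}$

[Figure: automaton over L ending in a Final state, with transitions labeled by $a_1$, $a_2$ and $\{m_2\}$ and annotated "does contain" / "does not contain"]

$q_{\text{filter}} = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$

$q_{\text{select}} = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

Event           Filter            Select
(Joe, 501)      {m1, a1}          {m1, a1, m2}
(Joe, 502)      {m2, a2}          {m2, a2}
(Joe, l0)                         {m2}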
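The step "create the set of symbols at time t" is a union of the per-event mappings. A minimal sketch of that step follows, using the letter sets from the table above. The function names are hypothetical, `l0` is read as "any location other than 501 and 502", and sending every event the table does not list (including the filter query's `l0` case) to the empty set is an assumption.

```python
def m_filter(event):
    """Letters for q_filter = At('Joe','501'); At('Joe','502')."""
    person, loc = event
    if person != "Joe":
        return set()
    if loc == "501":
        return {"m1", "a1"}
    if loc == "502":
        return {"m2", "a2"}
    return set()  # assumption: events not listed in the table produce no letters


def m_select(event):
    """Letters for q_select = sigma_{l='502'}(At('Joe','501'); At('Joe',l))."""
    person, loc = event
    if person != "Joe":
        return set()
    if loc == "501":
        return {"m1", "a1", "m2"}
    if loc == "502":
        return {"m2", "a2"}
    return {"m2"}  # matches the goal At('Joe', l) but is not accepted by the select


def symbols(mapping, events):
    """M_q(E): union of M_q(e) over all events e seen at one timestep."""
    out = set()
    for event in events:
        out |= mapping(event)
    return out


events_at_t = [("Joe", "503"), ("Bob", "502")]
print(symbols(m_filter, events_at_t), symbols(m_select, events_at_t))
```

Only the mapping differs between the two queries; the automaton that consumes the resulting symbol sets is shared.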

Page 18: Extracting Events from Probabilistic Streams

Regular Queries w. Probabilities

Probabilistic Algorithm

1. Compile a query q (stays the same)

   1. NFA with transitions in a language $L_q$

   2. Mapping $M_q$ from events to subsets of $L_q$

2. At time t have events E with probs

   1. Create set of symbols at time t: $M_q(E) = \bigcup_{e \in E} M_q(e)$ (distribution on inputs)

   2. Process NFA on $M_q(E)$ (distribution on states)

State at t+1 only depends on state at t and input at t+1

Algorithm is constant in data, exponential in |Q|
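Because the state at t+1 depends only on the state at t and the input at t+1, the runtime can carry a distribution over automaton states instead of a single state. The sketch below illustrates that idea under stated assumptions: inputs at different timesteps are independent, and the hand-written three-state automaton for "Joe enters O2" (`delta`, with states `start`, `in`, `out`, `hit`) is hypothetical rather than the output of the PEEX+ compiler.

```python
def step(state_dist, input_dist, delta):
    """One forward step: push a distribution over states through the automaton.

    state_dist: {state: prob} at time t
    input_dist: {symbol: prob} at time t+1 (assumed independent of the past)
    delta:      function (state, symbol) -> next state
    """
    new_dist = {}
    for state, p_state in state_dist.items():
        for symbol, p_symbol in input_dist.items():
            nxt = delta(state, symbol)
            new_dist[nxt] = new_dist.get(nxt, 0.0) + p_state * p_symbol
    return new_dist


def delta(state, loc):
    """Hypothetical automaton: 'hit' means Joe was outside O2 and has just entered it."""
    if loc != "O2":
        return "out"
    return "hit" if state == "out" else "in"


readings = [
    {"O2": 0.4, "H2": 0.2, "H3": 0.4},  # t = 7
    {"O2": 0.6, "H2": 0.2, "H3": 0.2},  # t = 8
]

dist = {"start": 1.0}
for reading in readings:
    dist = step(dist, reading, delta)

print(round(dist.get("hit", 0.0), 2))  # prints 0.36
```

Each step touches every (state, symbol) pair once, so the work per timestep does not grow with the length of the stream; it grows with the size of the automaton, which is where the dependence on the query comes from.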

Page 19: Extracting Events from Probabilistic Streams

Extension to Extended regular

“Alert when anyone in 501 and next step in 502”

$q_{\text{select}} = \sigma_{l = \text{'502'}}(At(p, \text{'501'});\ At(p, l))$

If we substitute for p, the result is regular:

$q[p \to \text{Joe}] = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

$q[p \to \text{Tom}] = \sigma_{l = \text{'502'}}(At(\text{'Tom'}, \text{'501'});\ At(\text{'Tom'}, l))$

Bindings use disjoint sets of tuples.

Algorithm: independent copies, multiply

Depends on # distinct values (shared vars), not # of timesteps, so it can stream
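One way to read "independent copies, multiply" is the following: evaluate the regular query once per binding of p, then combine the per-person probabilities. Since the bindings use disjoint sets of tuples, the sketch below treats them as independent, so the probability that nobody satisfies the query is the product of the individual complements. This is an interpretation of the slide rather than the authors' code, and the per-person probabilities are made-up inputs.

```python
from math import prod


def prob_anyone(per_person):
    """P(at least one binding of p satisfies the query), assuming independent bindings."""
    return 1.0 - prod(1.0 - p for p in per_person.values())


# Hypothetical per-person results of the regular query q[p -> person] at one timestep.
per_person = {"Joe": 0.36, "Tom": 0.5}
print(round(prob_anyone(per_person), 2))  # prints 0.68
```

The state kept between timesteps is one small distribution per distinct person, which is why the cost tracks the number of distinct values of the shared variable and not the number of timesteps.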

Page 20: Extracting Events from Probabilistic Streams

Recap of Algorithms

Regular Queries

Compiled them to an NFA, then used image

Data complexity O(1)

Extended regular

Several regulars multiplied together

Depends on number of distinct people in the data, not number of time steps.

Markov Correlations: more arithmetic & state

Page 21: Extracting Events from Probabilistic Streams

PEEX+ Algorithms and Analysis

Compilation procedures

Safe plans.

More complicated, based on algebra

Cost grows with data (useful for archives)

Aggregates

Complexity: Can we do better?

For a restricted class, draw a crisp line

Minor variants of safe result in hardness

Page 22: Extracting Events from Probabilistic Streams

Outline

RFID to Probabilities via Particle Filters

PEEX+ query language

Extended Regular Query Algorithm

Experiments

Page 23: Extracting Events from Probabilistic Streams

Experimental Setup

Quality Experiment

52 objects, 352 locations, 10k sq. ft.

○ 2x30m trace with 10 m break in between

Participants marked down true locations

“Alert when anyone enters the Coffee Room”

$\sigma_{Coffee(l_2)}(\sigma_{Hallway(l_1)}(At(p, l_1));\ At(p, l_2))$

Consider two Scenarios

Realtime (No correlations) v. MLE

Archived (Smoothing) v. Viterbi

In practice, can smooth in a short time

Page 24: Extracting Events from Probabilistic Streams

Quality: Realtime

Declare an event “true” if its Pr > threshold

Vary threshold

[Figure: three panels, Precision, Recall and F1, each with an axis from 0 to 1]

10% improvement in F1
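The thresholding step can be made concrete with a small scoring routine. The sketch below is not the authors' evaluation harness: the event identifiers, probabilities and ground truth are invented for illustration, and `prf1` is a hypothetical helper that declares an event true when its probability exceeds the threshold and then computes precision, recall and F1 against the ground truth.

```python
def prf1(scored_events, truth, threshold):
    """Precision, recall and F1 when events with Pr > threshold are declared true."""
    predicted = {event for event, pr in scored_events.items() if pr > threshold}
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up example: four candidate events with probabilities, two of which really happened.
scores = {"e1": 0.9, "e2": 0.6, "e3": 0.3, "e4": 0.1}
truth = {"e1", "e3"}

for threshold in (0.2, 0.5):
    print(threshold, prf1(scores, truth, threshold))
```

Lowering the threshold admits more events, which tends to raise recall at the expense of precision; sweeping it traces out curves like the ones in the figure.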

Page 25: Extracting Events from Probabilistic Streams

Quality: Archived

Smoothing v. Viterbi

PEEX keeps track of Markovian Correlations

[Figure: three panels, Precision, Recall and F1, each with an axis from 0 to 1]

Approx. 30% gain in F1

Page 26: Extracting Events from Probabilistic Streams

Performance

Page 27: Extracting Events from Probabilistic Streams

Conclusion

Showed PEEX+

Processed output of several inference tasks

○ Applies more generally than just RFID

Quality (F1) gains by keeping probability

50% from probs, 50% from correlations

Performance was usable in real-time

No indexing!

Preprint available on request

Page 28: Extracting Events from Probabilistic Streams
Page 29: Extracting Events from Probabilistic Streams

Future Work

Implementing archived stream indexing.

Aggregations in time

Aggressive indexing

Ranking? Top-K?

Sharper lines for complexity

Are there more streamable queries?

Richer language

Similar to linear style plans

What do people need?

Temporal Models!

Consistency

Page 30: Extracting Events from Probabilistic Streams

Correlations

Page 31: Extracting Events from Probabilistic Streams

Sequencing by example

Sequencing is parameterized [Cayuga]

$\sigma_{l = \text{'502'}}(At(p, \text{'501'});\ At(p, l))$

[Figure: timeline of events (Joe,501), (Bob,502), (Joe,502), (Joe,503) over Time]

Semicolon means “the next event among those that match next goal”

Semicolon is not “after”
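Parameterized sequencing can be sketched on the four events from the timeline. The code below is an illustrative reading of the slide rather than Cayuga's implementation: once a 501 event binds p, the semicolon advances to the next event that matches the goal At(p, l) for that same p, so another person's event in between is ignored. The function names and the assumed event order (as listed in the figure) are hypothetical.

```python
def sequence_matches(events):
    """For each At(p,'501') event, pair p with the location of p's next event."""
    matches = []
    for i, (person, loc) in enumerate(events):
        if loc != "501":
            continue
        for other, next_loc in events[i + 1:]:
            if other == person:  # next event among those matching At(p, l)
                matches.append((person, next_loc))
                break
    return matches


def select_502(events):
    """sigma_{l='502'} applied to the sequence matches."""
    return [(person, loc) for person, loc in sequence_matches(events) if loc == "502"]


timeline = [("Joe", "501"), ("Bob", "502"), ("Joe", "502"), ("Joe", "503")]
print(select_502(timeline))  # prints [('Joe', '502')]
```

Bob's 502 reading does not match the second goal once p is bound to Joe, and Joe's later 503 reading is never considered because the sequence already advanced on his 502 reading.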

Page 32: Extracting Events from Probabilistic Streams

Compilation by example

Each goal “corresponds” to two letters:

move (m): the query should advance

accept (a): the next subgoal accepts

$q_1 = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

$L_1 = \{m_1, a_1, m_2, a_2\}$

Mapping $M_q$:

(Joe, 501)  →  {m1, a1, m2}
(Joe, 502)  →  {m2, a2}
(Joe, l0)   →  {m2}

Any other maps to empty set

[Figure: automaton over $L_1$ ending in a Final state, with transitions labeled by $a_1$, $a_2$ and $\{m_2\}$ and annotated "does contain" / "does not contain"]

Page 33: Extracting Events from Probabilistic Streams

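Subtle example..

$q_1 = \sigma_{l = \text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$

$L_1 = \{m_1, a_1, m_2, a_2\}$

Mapping $M_1$:

(Joe, 501)  →  {m1, a1, m2}
(Joe, 502)  →  {m2, a2}
(Joe, l0)   →  {m2}

Any other maps to empty set

[Figure: automaton over $L_1$ ending in a Final state, with transitions labeled by $a_1$, $a_2$ and $\{m_2\}$ and annotated "does contain" / "does not contain"]

What about:

$q_2 = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$

Mapping $M_2$: (Joe, l0)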