EXTRACTING EVENTS FROM PROBABILISTIC STREAMS
Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu
University of Washington
One Slide Overview
Motivating App: RFID Ecosystem
○ Tagged people, cups, books, keys, laptops, etc.
Event queries [Cayuga, SASE, Snoop]
○ Alert when anyone enters the coffee room
Two problems
○ Missed readings: read rates in practice are low
○ Granularity mismatch, e.g. Office v. Antenna 41
Instead, infer location from sensors
Proposal: keep the probabilities and query with PEEX+
PEEX+ (Probabilistic Event EXtraction) keeps data probabilistic to get higher precision/recall and is still efficient.
Motivating Apps
RFID apps
Diary and Active Calendar Application
○ Alert if I go to a database meeting
Supply chain
○ Alert if Mach 3 razors are being stolen
Many independent HMMs
Elder care [Intel, Patterson]
○ Alert if elder takes their medicine with water
Financial applications on predictive HMM
○ Alert if head-and-shoulders market
Outline
RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments
The source of probabilities
[Figure: 6th Floor in PAC, shown with the connectivity diagram and antennas. Each orange particle is a guess of the true location; the blue ring is ground truth.]
PFs to a (prob) DB
To query Particle Filter output, query At
At(tag,loc)
Tag   t   Loc   P
Joe   7   O2    0.4
          H2    0.2
          H3    0.4
Joe   8   O2    0.6
          H2    0.2
          H3    0.2
Sue   7   …     …
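The slides do not say how particles become the P column. A minimal sketch, assuming equally weighted particles and a simple count-and-normalize step (the function name and the five sample particles are hypothetical):

```python
from collections import Counter

def particles_to_at(tag, t, particle_locs):
    """Turn one tag's particles at time t into At rows (tag, t, loc, prob).

    Assumption: every particle carries the same weight, so the probability
    of a location is the fraction of particles that sit in it.
    """
    counts = Counter(particle_locs)
    total = len(particle_locs)
    return [(tag, t, loc, n / total) for loc, n in sorted(counts.items())]

# Five hypothetical particles for Joe at t=7.
rows = particles_to_at("Joe", 7, ["O2", "O2", "H2", "H3", "H3"])
for row in rows:
    print(row)
```

With these particles the rows reproduce the Joe, t=7 entries of the table (O2 0.4, H2 0.2, H3 0.4).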
Semantics of the Model
A possible stream (world) picks one location per tag and timestep from At(tag,loc):
Tag   t   Loc
Joe   7   O2
Joe   8   O2
Sue   7   …
Prob = 0.4 * 0.6 * …
NB: Markovian correlations OK
Query semantics: sum the weights of all worlds where Q is true at time t
“Joe enters O2 at t=8”: $(0.2 + 0.4) \cdot 0.6 = 0.36$, where $0.2 + 0.4$ is the probability that Joe is outside O2 (in H2 or H3) at t=7
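The possible-worlds semantics can be checked by brute force on the Joe rows of the At table. This is a sketch only: it assumes the timesteps are independent (no Markovian correlations), and the helper names are hypothetical.

```python
from itertools import product

# Joe's marginals from the At table.
at_joe = {
    7: {"O2": 0.4, "H2": 0.2, "H3": 0.4},
    8: {"O2": 0.6, "H2": 0.2, "H3": 0.2},
}

def worlds(at):
    """Yield (world, probability) for every possible stream.

    Assumption: timesteps are independent, so a world's weight is the
    product of the per-timestep probabilities.
    """
    times = sorted(at)
    for locs in product(*(at[t] for t in times)):
        prob = 1.0
        for t, loc in zip(times, locs):
            prob *= at[t][loc]
        yield dict(zip(times, locs)), prob

def enters(world, room, t):
    """True if the tag is in `room` at t and was elsewhere at t-1."""
    return world[t] == room and world[t - 1] != room

p_enter = sum(prob for world, prob in worlds(at_joe) if enters(world, "O2", 8))
print(round(p_enter, 2))  # 0.36
```

Enumeration is exponential in the number of timesteps, which is why the later slides compile queries to automata instead.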
Outline
RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments
A hierarchy of PEEX+ queries
Regular Queries
○ Alert me when Joe goes to the coffee room
Extended Regular
○ Alert when anyone goes to the coffee room
Safe
○ Alert when anyone goes to the coffee room and a DB member follows them.
Hard
○ Others (Simulation)
This line is sharp for some queries
PEEX+ Queries
Fragment of Cayuga; queries define events.
Operator    Description    Example
At          Base stream    $At(p, l_1)$ (p in some location)
;           Sequence       $At(p, l_1); At(p, l_2)$ (same p in both)
$\sigma$    Select         $\sigma_{Room(l_1)}(At(p, l_1))$
+           Kleene+        $\sigma_{Hall(l)}(At(p, l))+\{p, \ldots\}$, written $g+\{V, P\}$ in general
Technical Point: left-to-right eval, $E_1; E_2; E_3 = (E_1; E_2); E_3$
Regulars and Extended Regular
Query is regular if no variable is shared between subgoals
$\sigma_{l='502'}(At('Joe','501'); At('Joe', l))$
Query is extended regular if any variable shared by two subgoals is shared by all subgoals, i.e. a templated regular query
$\sigma_{l='502'}(At(p,'501'); At(p, l))$ (p is shared between subgoals)
Wrinkle in the language: Filter v. Selection
“Alert next time Joe is in 502 after he is in 501”
$q_{filter} = At('Joe','501'); At('Joe','502')$
“Alert if the next place Joe is in after 501 is 502”
$q_{select} = \sigma_{l='502'}(At('Joe','501'); At('Joe', l))$
[Figure: timeline of At events (Joe,501), (Joe,502), (Joe,503), with the outcomes Yes and No marked]
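The two readings can be compared directly on a deterministic stream. The sketch below follows the two quoted English descriptions; the function names and the sample stream are hypothetical and are not taken from the slide's figure.

```python
def locations(events, tag):
    """Locations visited by `tag`, in time order."""
    return [loc for who, loc in events if who == tag]

def q_filter(events, tag, first, second):
    """Filter: after `tag` is at `first`, wait for the next event that
    matches At(tag, second); events at other locations are skipped."""
    locs = locations(events, tag)
    return any(a == first and second in locs[i + 1:] for i, a in enumerate(locs))

def q_select(events, tag, first, second):
    """Select: after `tag` is at `first`, take the very next At(tag, l)
    event and only then test l == second."""
    locs = locations(events, tag)
    return any(a == first and b == second for a, b in zip(locs, locs[1:]))

# Hypothetical stream: Joe passes through 503 between 501 and 502.
stream = [("Joe", "501"), ("Joe", "503"), ("Joe", "502")]
print(q_filter(stream, "Joe", "501", "502"))  # True
print(q_select(stream, "Joe", "501", "502"))  # False
```

The queries disagree exactly when some other location is visited between 501 and 502.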
Outline
RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments
Why are ER queries hard?
Regular Queries ~ Regular Expressions
Mapping is non-trivial
○ similar to Cayuga [Demers et al. 06]
Queries have #P combined complexity
○ Can encode mDNF as a regular expression
Intuition: an n-sized automaton leads to $\text{time}(2^n)$
Extended regular ~ 1 NFA per person
k persons implies an O(k)-size automaton: exponential cost
When ER, can avoid blowup
Algorithm for Regular Queries: Overview
Deterministic Algorithm
1. Compile a query q
   1. NFA-like thing in a language $L_q$
   2. Mapping $M_q$ from events to subsets of $L_q$
2. At runtime, at time t, have events E
   1. Create the set of symbols at time t: $M_q(E) = \bigcup_{e \in E} M_q(e)$
   2. Process the NFA on $M_q(E)$
Focus on the compilation
Compile Select and Filter
$q_{filter} = At('Joe','501'); At('Joe','502')$
$q_{select} = \sigma_{l='502'}(At('Joe','501'); At('Joe', l))$
Intuition: goal maps to two letters:
○ match (m): matches filter
○ accept (a): accepted by select
$L = \{m_1, a_1, m_2, a_2\}$
[Figure: automaton with transitions labeled $a_1$ and $a_2$ leading to the Final state, and edges marked "does contain" / "does not contain" $\{m_2\}$]
Language and automaton are the same for both queries
The difference is the mapping
Event         Filter              Select
(Joe, 501)    $\{m_1, a_1\}$      $\{m_1, a_1, m_2\}$
(Joe, 502)    $\{m_2, a_2\}$      $\{m_2, a_2\}$
(Joe, l_0)                        $\{m_2\}$
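One plausible reading of the automaton and the two mappings, as a runnable sketch. The transition rules are an interpretation of the figure labels (advance on $a_1$, finish on $a_2$, wait while the symbol set does not contain $m_2$, drop the partial match when it contains $m_2$ without $a_2$). The empty set for unlisted events under the filter mapping is likewise an assumption, as are all function names and the sample stream.

```python
FINAL = 2

def step(states, symbols):
    """Advance the set of active NFA states on one set of symbols."""
    nxt = {0}  # assumption: a new match may start at every timestep
    for s in states:
        if s == 0 and "a1" in symbols:
            nxt.add(1)              # first goal accepted
        elif s == 1:
            if "a2" in symbols:
                nxt.add(FINAL)      # second goal accepted
            elif "m2" not in symbols:
                nxt.add(1)          # "does not contain m2": keep waiting
            # contains m2 without a2: this partial match is dropped
    return nxt

def m_filter(event):
    """Filter mapping; unlisted events map to the empty set (assumption)."""
    table = {("Joe", "501"): {"m1", "a1"}, ("Joe", "502"): {"m2", "a2"}}
    return table.get(event, set())

def m_select(event):
    """Select mapping; any other Joe location l_0 maps to {m2}."""
    who, loc = event
    if who != "Joe":
        return set()
    return {"501": {"m1", "a1", "m2"}, "502": {"m2", "a2"}}.get(loc, {"m2"})

def fires(stream, mapping):
    """Timesteps at which the automaton reaches the final state."""
    states, hits = {0}, []
    for t, events in enumerate(stream):
        symbols = set().union(*(mapping(e) for e in events))  # M_q(E)
        states = step(states, symbols)
        if FINAL in states:
            hits.append(t)
    return hits

# Hypothetical stream: one event per timestep.
stream = [[("Joe", "501")], [("Joe", "503")], [("Joe", "502")]]
print(fires(stream, m_filter))  # [2]
print(fires(stream, m_select))  # []
```

The same `step` function serves both queries; only the mapping passed to `fires` changes, which is the point the slide makes.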
Regular Queries w. Probabilities
Probabilistic Algorithm
1. Compile a query q (stays the same)
   1. NFA with transitions in a language $L_q$
   2. Mapping $M_q$ from events to subsets of $L_q$
2. At time t, have events E with probs (distribution on inputs)
   1. Create the set of symbols at time t: $M_q(E) = \bigcup_{e \in E} M_q(e)$
   2. Process the NFA on $M_q(E)$ (distribution on states)
Algorithm is constant in data, exponential in |Q|
State at t+1 only depends on state at t and input at t+1
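A sketch of how a distribution on inputs becomes a distribution on states. It assumes independent timesteps, uses Joe's rows from the At table, and runs a hypothetical select-style query, "the next place Joe is in after H2 is O2". The transition rules are one reading of the automaton, not a definitive implementation, and all names are illustrative.

```python
from collections import defaultdict

FINAL = 2

def step(states, symbols):
    """Deterministic step on a set of NFA states (assumed transition rules)."""
    nxt = {0}  # a new match may start at every timestep
    for s in states:
        if s == 0 and "a1" in symbols:
            nxt.add(1)
        elif s == 1:
            if "a2" in symbols:
                nxt.add(FINAL)
            elif "m2" not in symbols:
                nxt.add(1)
    return nxt

def mapping(loc):
    """Select-style mapping for the hypothetical query on Joe's location."""
    return {"H2": {"m1", "a1", "m2"}, "O2": {"m2", "a2"}}.get(loc, {"m2"})

def advance(dist, loc_probs):
    """Push a distribution over state sets through one uncertain input."""
    nxt = defaultdict(float)
    for states, p in dist.items():
        for loc, q in loc_probs.items():
            nxt[frozenset(step(states, mapping(loc)))] += p * q
    return dict(nxt)

at_joe = {
    7: {"O2": 0.4, "H2": 0.2, "H3": 0.4},
    8: {"O2": 0.6, "H2": 0.2, "H3": 0.2},
}

dist = {frozenset({0}): 1.0}   # distribution on states before any input
p_fire = {}
for t in sorted(at_joe):
    dist = advance(dist, at_joe[t])   # depends only on the state at t-1 and input at t
    p_fire[t] = sum(p for states, p in dist.items() if FINAL in states)

print(round(p_fire[8], 2))  # 0.12
```

The work per timestep depends on the number of reachable state sets and not on the length of the stream, which matches the slide's "constant in data, exponential in |Q|".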
Extension to Extended regular
“Alert when anyone in 501 and next step in 502”
$q_{select} = \sigma_{l='502'}(At(p,'501'); At(p, l))$
If substitute for p, result is regular
$q[p \to Joe] = \sigma_{l='502'}(At('Joe','501'); At('Joe', l))$
$q[p \to Tom] = \sigma_{l='502'}(At('Tom','501'); At('Tom', l))$
Bindings use disjoint sets of tuples. Algorithm: independent copies, multiply
Depends on # distinct values (shared vars), not # of timesteps, so it can stream
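One way to read "independent copies, multiply": run one regular query per binding of p, then combine the per-binding probabilities, treating the bindings as independent because they use disjoint tuples. The function name and the numbers below are hypothetical.

```python
import math

def prob_any(per_binding_probs):
    """P(the query fires for at least one binding of p).

    Assumption: bindings are independent, so the probability that no
    binding fires is the product of the per-binding misses.
    """
    return 1.0 - math.prod(1.0 - p for p in per_binding_probs.values())

# Hypothetical per-person probabilities at the current timestep.
p_anyone = prob_any({"Joe": 0.12, "Tom": 0.5})
print(round(p_anyone, 2))  # 0.56
```

The state kept is one entry per distinct person, independent of how many timesteps have been processed.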
Recap of Algorithms
Regular Queries
○ Compiled them to an NFA, then used image
○ Data complexity O(1)
Extended regular
○ Several regulars multiplied together
○ Depends on number of distinct people in the data, not number of time steps
Markov Correlations: more arithmetic & state
PEEX+ Algorithms and Analysis
Compilation procedures
Safe plans
○ More complicated, based on algebra
○ Cost grows with data (useful for archives)
Aggregates
Complexity: Can we do better?
○ For a restricted class, draw a crisp line
○ Minor variants of safe result in hardness
Outline
RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments
Experimental Setup
Quality Experiment
52 objects, 352 locations, 10k sq. ft.
○ 2 x 30 min trace with a 10 min break in between
Participants marked down true locations
“Alert when anyone enters the Coffee Room”
$\sigma_{Coffee(l_2)}(\sigma_{Hallway(l_1)}(At(p, l_1)); At(p, l_2))$
Consider two Scenarios
○ Realtime (No correlations) v. MLE
○ Archived (Smoothing) v. Viterbi
In practice, can smooth in a short time
Quality: Realtime
Declare an event “true” if its Pr > threshold
Vary threshold
[Figure: three plots, Precision, Recall and F1, each on a 0 to 1 axis]
10% improvement in F1
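The thresholding step and the three metrics can be sketched as follows. The event identifiers, probabilities and ground truth are made-up illustrations, not data from the experiment.

```python
def prf1(event_probs, truth, threshold):
    """Precision, recall and F1 when events with Pr > threshold are declared true."""
    predicted = {e for e, p in event_probs.items() if p > threshold}
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical query output and ground truth.
event_probs = {"e1": 0.9, "e2": 0.6, "e3": 0.3, "e4": 0.1}
truth = {"e1", "e3"}

for threshold in (0.2, 0.5):
    print(threshold, prf1(event_probs, truth, threshold))
```

Lowering the threshold trades precision for recall, which is what the "vary threshold" plots sweep over.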
Quality: Archived
Smoothing v. Viterbi
PEEX keeps track of Markovian Correlations
[Figure: three plots, Precision, Recall and F1, each on a 0 to 1 axis]
Approx. 30% gain in F1
Performance
Conclusion
Showed PEEX+
Processed output of several inference tasks
○ Applies more generally than just RFID
Quality (F1) gains by keeping probability
○ 50% from probs, 50% from correlations
Performance was usable in real-time
○ No indexing!
Preprint available on request
Future Work
Implementing archived stream indexing
○ Aggregations in time
○ Aggressive indexing
○ Ranking? Top-K?
Sharper lines for complexity
○ Are there more streamable queries?
Richer language
○ Similar to linear style plans
○ What do people need?
Temporal Models!
○ Consistency
○ Correlations
Sequencing by example
Sequencing is parameterized [Cayuga]
$\sigma_{l='502'}(At(p,'501'); At(p, l))$
[Figure: timeline of events (Joe,501), (Bob,502), (Joe,502), (Joe,503)]
Semicolon means “the next event among those that match next goal”
Semicolon is not “after”
Compilation by example
Each goal “corresponds” to two letters:
○ move (m): the query should advance
○ accept (a): the next subgoal accepts
$q_1 = \sigma_{l='502'}(At('Joe','501'); At('Joe', l))$
$L_1 = \{m_1, a_1, m_2, a_2\}$
Mapping $M_q$:
○ $(Joe, 501) \to \{m_1, a_1, m_2\}$
○ $(Joe, 502) \to \{m_2, a_2\}$
○ $(Joe, l_0) \to \{m_2\}$
○ Any other maps to empty set
[Figure: automaton with transitions labeled $a_1$ and $a_2$ leading to the Final state, and edges marked "does contain" / "does not contain" $\{m_2\}$]
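Subtle example
$q_1 = \sigma_{l='502'}(At('Joe','501'); At('Joe', l))$
$L_1 = \{m_1, a_1, m_2, a_2\}$
Mapping $M_1$:
○ $(Joe, 501) \to \{m_1, a_1, m_2\}$
○ $(Joe, 502) \to \{m_2, a_2\}$
○ $(Joe, l_0) \to \{m_2\}$
○ Any other maps to empty set
[Figure: automaton with transitions labeled $a_1$ and $a_2$ leading to the Final state, and edges marked "does contain" / "does not contain" $\{m_2\}$]
What about $q_2 = At('Joe','501'); At('Joe','502')$ and the event $(Joe, l_0)$ under $M_2$?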