Ensuring Safe Composi1on of Distributed Processes Jacob Beal
QA4SASO Workshop @ IEEE SASO September, 2015
A generative approach to safety
Restrict your development environment…
… to contain only resilient distributed systems.
sensors&
local&func,ons&
actuators&
Applica'on*Code*
Developer*APIs*
Field*Calculus*Constructs*
Resilient*Coordina'on*Operators*
Device*Capabili'es*
built&ins) repnbr if
TG ifCbuilt&ins)
communica,on& state& restric,on&
Percep,on&summarizeaverageregionMax…
Ac,on& State&
Collec,ve&Behavior&
distanceTobroadcastpartition…
timerlowpassrecentTrue…
collectivePerceptioncollectiveSummarymanagementRegions…
Crowd&Management&dangerousDensity crowdTrackingcrowdWarning safeDispersal
Everything is a wireless computer
Example: Services for Mass Events
4
Example: Managing Crowd Danger
Dangerous Density Warning
Dispersal Advice
Crowd Es1ma1on
Conges1on-‐Aware Naviga1on
Distributed services are complex
How to make engineering resilience tractable?
Aggregate Programming
sensors&
local&func,ons&
actuators&
Applica'on*Code*
Developer*APIs*
Field*Calculus*Constructs*
Resilient*Coordina'on*Operators*
Device*Capabili'es*
built&ins) repnbr if
TG ifCbuilt&ins)
communica,on& state& restric,on&
Percep,on&summarizeaverage
regionMax…
Ac,on& State&
Collec,ve&Behavior&
distanceTobroadcast
partition…
timerlowpass
recentTrue…
collectivePerceptioncollectiveSummary
managementRegions…
Crowd&Management&dangerousDensity crowdTrackingcrowdWarning safeDispersal
Aggregate Programming Stack
Field Calculus
Variable
Abstract & Call Func1ons
Local Ops Literal Value
Communica1on
Domain Change
State
[Viroli et al., ‘13, Damiani et al., ‘15]
Field Calculus is Space-Time Universal
Space-time Universal = arbitrarily good approximation of any causal, finitely-approximable computation
[Beal, ’10; Beal et al., SCW 14]
!"#$%
&'()$%
!"#$%
&'()$%
!"#$%
&'()$%
!"#$%&'!"()*+&##,$&-%+&..$/0!1&2/"'+
!"#$%
&'()$%
!"
!"#$%
&'()$%
!"
!"#$%
&'()$%
!" #$%"#$&"
&$'"%$&"
#"
#$'"&$("#$(" %$(" '$("
Specifica1on:
Acausal Non-‐approximable
Instantiation: Protelis
• Java-hosted & integrated • Java-like syntax • Eclipse support
Daemon'Device'
Protelis'Parser'
Protelis'Device'
Environment'Variables'
Protelis'Interpreter'
Protelis)Program)
Service'Manager'Daemon'
Networked'Service'
Start/Stop'Signals'
Status,'Sockets'
Other&Services&
Other&Managers&
Enterprise&Server&
Protelis)Parser)
Protelis)Device)
Environment)Variables)
Protelis)Interpreter)
Protelis)Program)
Other&Devices&
Simulated*Device*
Protelis*Parser*
Protelis*Device*
Environment*Variables*
Protelis*Interpreter*
Protelis)Program)
Simulated*Environment*
Simula8on*Builder*
Simula/on)Script)
Alchemist*Simulator*
Simulated*Network*
Architecture:
[Pianini, Viroli, & Beal, SAC ‘15]
sensors&
local&func,ons&
actuators&
Applica'on*Code*
Developer*APIs*
Field*Calculus*Constructs*
Resilient*Coordina'on*Operators*
Device*Capabili'es*
built&ins) repnbr if
TG ifCbuilt&ins)
communica,on& state& restric,on&
Percep,on&summarizeaverage
regionMax…
Ac,on& State&
Collec,ve&Behavior&
distanceTobroadcast
partition…
timerlowpass
recentTrue…
collectivePerceptioncollectiveSummary
managementRegions…
Crowd&Management&dangerousDensity crowdTrackingcrowdWarning safeDispersal
Aggregate Programming Stack
Self-Stabilizing Building Blocks
G
Informa5on spreading
C
Informa5on collec5on
3 1
1
4
3
7
3 0
2
T
Short-‐term memory
Resilience by construcOon: all programs from these building blocks are also self-‐stabilizing!
Applying building blocks:
Example API algorithms from building blocks: distance-‐to (source) max-‐likelihood (source p) broadcast (source value) path-‐forecast (source obstacle) summarize (sink accumulate local null) average (sink value) integral (sink value) region-‐max (sink value) Omer (length) limited-‐memory (value Omeout) random-‐voronoi (grain metric) group-‐size (region) broadcast-‐region (region source value) recent-‐event (event Omeout) distance-‐avoiding-‐obstacles (source obstacles)
Since based on these building blocks, all programs built this way are self-‐stabilizing!
Complex Example: Crowd Management (def crowd-tracking (p) ;; Consider only Fruin LoS E or F within last minute (if (recently-true (> (density-est p) 1.08) 60) ;; Break into randomized ‘‘cells’’ and estimate danger of each (+ 1 (dangerous-density (sparse-partition 30) p)) 0))
(def recently-true (state memory-time) ;; Make sure first state is false, not true... (rt-sub (not (T 1 1)) state memory-time))(def rt-sub (started s m) (if state 1 (limited-memory s m)))
(def dangerous-density (partition p) ;; Only dangerous if above critical density threshold... (and (> (average partition (density-est p)) 2.17) ;; ... and also involving many people. (> (summarize partition + (/ 1 p) 0) 300)))
(def crowd-warning (p range) (> (distance-to (= (crowd-tracking p) 2)) range)
(def safe-navigation (destination p) (distance-avoiding-obstacles destination (crowd-warning p)))
18 lines non-‐whitespace code 10 library calls (21 ops) IF: 3 G: 11 C: 4 T: 3
Optimization of Dynamics
Minimal'Resilient'Implementa.on'
Op.mized'Implementa.on'
Self5Org.'System'Specifica.on'
Substitution Library
"Building Block" Library
Decompose(into(Building(Blocks(
Op3mize(by(subs3tu3on(
Subs3tu3on(rela3ons(
0 100 200 300 400 500 600 700 800 900 1000
0
0.2
0.4
0.6
0.8
1
Time
Mea
n Er
ror
0 200 400 600 800 1000 1200 1400 1600 1800 200010−1
100
101
102
103
Time
Error
See our talk in the main program of SASO!
Predicting and Controlling Dynamics
Distance estimate: prediction from transient response:
0 100 200 300 400 500 600 700 800 900 1000
100
101
102
Time
∆+
D" D">" >"
r1" r2"
output%U"source%s1% s2% sout%d1% d2%s0%
0 100 200 300 400 500 600 700 800 900 10000
0.2
0.4
0.6
0.8
1
Time
Frac
tion
Erro
r
See our talk in the FOCAS/SCOPES workshop!
sensors&
local&func,ons&
actuators&
Applica'on*Code*
Developer*APIs*
Field*Calculus*Constructs*
Resilient*Coordina'on*Operators*
Device*Capabili'es*
built&ins) repnbr if
TG ifCbuilt&ins)
communica,on& state& restric,on&
Percep,on&summarizeaverage
regionMax…
Ac,on& State&
Collec,ve&Behavior&
distanceTobroadcast
partition…
timerlowpass
recentTrue…
collectivePerceptioncollectiveSummary
managementRegions…
Crowd&Management&dangerousDensity crowdTrackingcrowdWarning safeDispersal
Aggregate Programming Stack
Crowd Safety Services
def dangerousDensity(p, r) { let mr = managementRegions(r*2, () -‐> { nbrRange }); let danger = average(mr, densityEst(p, r)) > 2.17 && summarize(mr, sum, 1 / p, 0) > 300; if(danger) { high } else { low } } def crowdTracking(p, r, t) { let crowdRgn = recentTrue(densityEst(p, r)>1.08, t); if(crowdRgn) { dangerousDensity(p, r) } else { none }; } def crowdWarning(p, r, warn, t) { distanceTo(crowdTracking(p,r,t) == high) < warn }
DisseminaOon of new versions Pre-‐empOve modulaOon of prioriOes
Dependency-Directed Recovery
See our talk in the main program of SASO!
2 4 6 8 10 12 14 16 18 20 220
10
20
30
40
50
60
70
# of services
Res
tart
time
(sec
onds
)
Fixed�OrderErdos�RenyiLayeredClustered
Drama1cally beSer recovery 1me
2 4 6 8 10 12 14 16 18 20 22
0
10
20
30
40
50
60
70
80
90
100
# of services
Pe
rce
nt o
f q
ue
ry o
rig
in
ato
rs a
ffe
cte
d
Erdos�Renyi
Layered
Clustered
Fewer services disrupted
App#1#
Portal'
Exchange'
Core'Logic'
Database'
App'Server'Legacy'Unix'
App#2#
Gateway#B#Gateway#A#
Core#1#
Core#2#
Core#3#
Iden1fy & shut down affected services
Non-‐dependent services s1ll run
Distributed Rewind and Replay
Tactical Cloud Services
sensors&
local&func,ons&
actuators&
Applica'on*Code*
Developer*APIs*
Field*Calculus*Constructs*
Resilient*Coordina'on*Operators*
Device*Capabili'es*
built&ins) repnbr if
TG ifCbuilt&ins)
communica,on& state& restric,on&
Percep,on&summarizeaverage
regionMax…
Ac,on& State&
Collec,ve&Behavior&
distanceTobroadcast
partition…
timerlowpass
recentTrue…
collectivePerceptioncollectiveSummary
managementRegions…
Crowd&Management&dangerousDensity crowdTrackingcrowdWarning safeDispersal
Summary: Aggregate Methodology
Daemon'Device'
Protelis'Parser'
Protelis'Device'
Environment'Variables'
Protelis'Interpreter'
Protelis)Program)
Service'Manager'Daemon'
Networked'Service'
Start/Stop'Signals'
Status,'Sockets'
Other&Services&
Other&Managers&
Enterprise&Server&
restrict
Minimal'Resilient'Implementa.on'
Op.mized'Implementa.on'
Self5Org.'System'Specifica.on'
Substitution Library
"Building Block" Library
Decompose(into(Building(Blocks(
Op3mize(by(subs3tu3on(
Subs3tu3on(rela3ons(
Acknowledgements
• Shane Clark • Partha Pal • Kyle Usbeck
• Mirko Viroli • Danilo Pianini
• Soura Dasgupta • Raghu Mudumbai • Amy Kumar
• Ferruccio Damiani