Maria Hybinette, UGA 1
Towards Adaptive Caching for
Parallel and Distributed Simulation
Abhishek Chugh & Maria Hybinette
Computer Science Department
The University of Georgia
WSC-2004
Simulation Model Assumptions

[Figure: airspace example spanning Atlanta and Munich, modeled as LPs]

- Collection of Logical Processes (LPs)
- Assume LPs do not share state variables
- Communicate by exchanging time-stamped messages
Problem & Goal

Problem: Inefficiency in PDES: redundant computations

Observation: Computations repeat:
» Long runs of simulations
» Cyclic systems
» Communication network simulations

Goal: Increase efficiency by reusing computations
Approach

Cache computations and re-use them when they repeat, instead of re-computing.

[Figure: LPs exchanging messages, with a cache intercepting repeated computations]
Approach: Adaptive Caching

Cache computations and re-use them when they repeat, instead of re-computing.

- Generic caching mechanism, independent of simulation engine and application
- Caveat: different factors impact the effectiveness of caching
» Proposal: an adaptive approach

[Figure: LPs exchanging messages, with a cache]
Factors Affecting Caching Effectiveness

- Cache size
- Cost of looking up into the cache and updating the cache
- Execution time of the computation
- Probability of a hit: hit rate
Effective Caching Cost

E(Cost_use_cache) =
    hit_rate * Cost_lookup_hit
  + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)
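The expected-cost formula above can be sketched directly as code; the function name and the cost units are illustrative, not part of the paper's API.

```c
/* Expected per-event cost of using the cache, following the formula
   on this slide. Costs are in arbitrary time units. */
double expected_cache_cost(double hit_rate,
                           double cost_lookup_hit,
                           double cost_lookup_miss,
                           double cost_computation,
                           double cost_insert)
{
    return hit_rate * cost_lookup_hit
         + (1.0 - hit_rate) * (cost_lookup_miss
                               + cost_computation
                               + cost_insert);
}
```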
Caching is Not Always a Good Idea

E(Cost_use_cache) =
    hit_rate * Cost_lookup_hit
  + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)

- The hit rate may be low, or the computation may be very fast
- Caching is worthwhile only when Cost_use_cache < Cost_computation
How Much Speedup is Possible?

Neglecting cache warm-up and fixed costs:

Expected Speedup = Cost_computation / Cost_use_cache

Upper bound (hit_rate = 1):
    = Cost_computation / Cost_lookup

In our experiments Cost_computation / Cost_lookup = ~3.5
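Combining the two formulas gives a quick estimate of the attainable speedup. This helper is a sketch with hypothetical numbers (and it assumes hit and miss lookups cost the same), not measured data.

```c
/* Expected speedup from caching, neglecting warm-up and fixed costs.
   cost_lookup is used for both hit and miss lookups (an assumption). */
double expected_speedup(double hit_rate,
                        double cost_lookup,
                        double cost_computation,
                        double cost_insert)
{
    double cost_use_cache =
        hit_rate * cost_lookup
        + (1.0 - hit_rate) * (cost_lookup + cost_computation + cost_insert);
    return cost_computation / cost_use_cache;
}
```

With hit_rate = 1 this reduces to Cost_computation / Cost_lookup, the upper bound above; with a hit rate of 0 it drops below 1, i.e. caching hurts.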
Related Work

Function Caching: replace application-level function calls with cache queries:
» Introduced by: Bellman (1957); Michie (1968)
» Incremental computations:
  – Pugh & Teitelbaum (1989); Liu & Teitelbaum (1995)
» Sequential discrete event simulation:
  – Staged Simulation: Walsh & Sirer (2003): function caching + currying (breaking up computations), re-ordering, and pre-computation

Decision Tool Techniques for PADS: multiple runs of similar simulations
» Simulation Cloning: Hybinette & Fujimoto (1998); Chen, Turner, et al. (2002); Straßburger (2000)
» Updateable Simulations: Ferenci et al. (2002)

Related Optimization Techniques
» Lazy Re-Evaluation: West (1988)
Overview of Adaptive Caching

Execution time:

1. Warm-up execution phase, for each function:
   a) Monitor: hit rate, query time, function run time
   b) Determine utility of using the cache
2. Main execution phase, for each function:
   a) Use the cache (or not) depending on the results from 1
   b) Randomly sample: hit rate, query time, function run time
      » Revise the decision if conditions change
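The per-function decision in step 1b can be sketched as follows. The struct and its field names are hypothetical; the rule simply compares the expected cost with caching against always recomputing, using the monitored statistics.

```c
#include <stdbool.h>

/* Hypothetical per-function statistics gathered during warm-up. */
typedef struct {
    double hit_rate;    /* observed fraction of cache hits     */
    double query_time;  /* average cache lookup time           */
    double insert_time; /* average cache insert time           */
    double run_time;    /* average time of the raw computation */
} FuncStats;

/* Use the cache only when its expected cost beats recomputing. */
bool should_use_cache(const FuncStats *s)
{
    double cost_with_cache =
        s->hit_rate * s->query_time
        + (1.0 - s->hit_rate) * (s->query_time + s->run_time
                                 + s->insert_time);
    return cost_with_cache < s->run_time;
}
```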
What’s New

- Decision to use the cache is made dynamically
» in response to unpredictable local conditions for each LP at execution time
- Relieves the user of having to know whether something is worth caching
» the adaptive method automatically identifies caching opportunities and rejects poor caching choices
- Easy-to-use caching API
» independent of application or simulation kernel
» cache middleware
- Distributed cache
» each LP maintains its own independent cache
Pseudo-Code Example

// LP CODE WITH CACHING
LP_init(int argc, char **argv)
{
    cacheInitialize(argc, argv);
}
Pseudo-Code Example

// LP CODE WITH CACHING
LP_init(int argc, char **argv)
{
    cacheInitialize(argc, argv);
}

Proc(state, msg, MyPE)
{
    retval = cacheCheckStart(currentstate, event);
    if (retval == NULL)
    {
        /* original LP code: compute new state and events to be scheduled */
        /* allow cache to save the results */
        cacheCheckEnd(newstate, newevents);
    }
    else
    {
        newstate = retval.state;
        newevents = retval.events;
    }
    schedule(newevents);
}
Implementation
Caching Middleware

[Figure: layered stack: Simulation Application / Cache Middleware / Simulation Kernel]
Caching Middleware (Hit)

[Figure: the middleware checks the cache with the state/message; cache hit, so the cached result is returned to the application]
Caching Middleware (Miss)

[Figure: the middleware checks the cache with the state/message; on a miss, or when cache lookup is expensive, the kernel runs the computation; on a miss, the new state & message are cached]
Cache Implementation

- Hash table with separate chaining
- Input: current state & message
- Output: state and output message(s)
- Hash function: djb2 (Dan Bernstein; Perl)
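For reference, djb2 is short enough to reproduce; this is the standard Bernstein hash, with the separate-chaining bucket then chosen by taking the hash modulo the table size.

```c
/* djb2 string hash (Dan Bernstein): hash = hash * 33 + c,
   starting from 5381. Bucket index = djb2_hash(key) % table_size. */
unsigned long djb2_hash(const unsigned char *str)
{
    unsigned long hash = 5381;
    int c;
    while ((c = *str++) != 0)
        hash = ((hash << 5) + hash) + c;  /* hash * 33 + c */
    return hash;
}
```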
Memory Management

- Distributed cache; one for each LP
- Pre-allocate a memory pool for the cache in each LP during the initialization phase
- Upper limit is parameterized
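A pre-allocated pool with a parameterized upper limit might look like the following sketch; the names are ours, not the paper's API, and it is a minimal bump allocator that ignores alignment and freeing, just to show the idea.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-LP cache pool, allocated once at initialization so
   cache inserts never call malloc during the main execution phase. */
typedef struct {
    char  *base;
    size_t size;  /* parameterized upper limit, in bytes */
    size_t used;
} CachePool;

int pool_init(CachePool *p, size_t limit)
{
    p->base = malloc(limit);
    p->size = limit;
    p->used = 0;
    return p->base != NULL;
}

void *pool_alloc(CachePool *p, size_t n)
{
    if (p->used + n > p->size)
        return NULL;  /* pool exhausted: the cache skips this insert */
    void *ptr = p->base + p->used;
    p->used += n;
    return ptr;
}
```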
Experiments

- 3 sets of experiments with P-Hold:
» Proof of concept (no adaptive caching): hit rate
» Evaluation of the impact of cache size and simulation running time on speedup (no caching/caching)
» Evaluation of adaptive caching with regard to the cost of event computation
- 16-processor SGI Origin 2000
» 4 processors used
- “Curried” out time stamps
Hit Rate versus Progress

[Figure: hit rate (%) versus progress (simulated time), for cache sizes 90 KB (10%), 25000 KB (25%), and 10000 KB (100%)]

- As expected, the hit rate increases as cache size increases
- Maximum hit rate for the large cache
- The hit rate sets an upper bound for speedup
Speedup vs Cache Size

[Figure: speedup (no caching/caching) versus size of cache (KB), for computations of 5 msec and 3 msec]

- Speedup improves as the size of the cache increases
- Beyond a size of 9,000 KB speedup declines and levels off
- Better performance for simulations whose computations have higher latency
Speedup vs Cost_computation

[Figure: speedup (caching/no caching) versus computational latency (msec), non-adaptive caching]

Non-adaptive caching suffers, with a speedup of 0.82 for low-latency computations, improving to 1 as the computational latency approaches 1.5 msec.
Speedup vs Cost_computation

[Figure: speedup (caching/no caching) versus computational latency (msec), non-adaptive vs. adaptive]

- Adaptive caching tracks the cost of consulting the cache in comparison to running the actual computation
- Adaptive caching achieves a speedup of 1 for small computational latencies (it selects performing the computation instead of consulting the cache)
Summary & Future Work

Summary:
- Middleware implementation that requires no major structural revision of application code
- Best-case speedup approaches 3.5; worst-case speedup of 1 (speedup is limited by a hit rate of 70%)
- With randomly generated information (such as time stamps), caching may become ineffective unless precautions are taken

Future Work:
- Function caching instead of LP caching
- Look at series of functions to jump forward
- Adaptive replacement strategies
Closing

“A sword wielded poorly will kill its owner”
-- Ancient Proverb
Pseudo-Code Example

// ORIGINAL LP CODE
LP_init()
{
}

Proc(state, msg, MyPE)
{
    val1 = fancy_function(msg->param1, state->key_part);
    val2 = fancier_function(msg->param3);
    state->key_part = val1 + val2;
}
Pseudo-Code Example

// ORIGINAL LP CODE
LP_init()
{
}

Proc(state, msg, MyPE)
{
    val1 = fancy_function(msg->param1, state->key_part);
    val2 = fancier_function(msg->param3);
    state->key_part = val1 + val2;
}

// LP CODE WITH CACHING
LP_init()
{
    cache_init(FF1, SIZE1, 2, fancy_function);
    cache_init(FF2, SIZE2, 1, fancier_function);
}

Proc(state, msg, MyPE)
{
    val1 = cache_query(FF1, msg->param1, state->key_part);
    val2 = cache_query(FF2, msg->param3);
    state->key_part = val1 + val2;
}