domain-specialized cache management for graph...
TRANSCRIPT
Domain-Specialized Cache Management
for Graph Analytics
Priyank Faldu, Boris Grot
This research is partially supported by a grant from Oracle Labs.
Jeff Diamond
1
Cache management in the age of big data
Variety of application domains
Working set size much larger than typical SPEC benchmarks
- Vastly different cache access patterns across domains
2
Data Analytics Graph Analytics Machine Learning
HPCA'20
Cache management in the age of big data
Variety of application domains
Working set size much larger than typical SPEC benchmarks
- Vastly different cache access patterns across domains
Yet, cache management mechanisms are “domain-agnostic”
- Assumption: one size fits all
2
Data Analytics Graph Analytics Machine Learning
HPCA'20
Cache management in the age of big data
Variety of application domains
Working set size much larger than typical SPEC benchmarks
- Vastly different cache access patterns across domains
Yet, cache management mechanisms are “domain-agnostic”
- Assumption: one size fits all
2
Graph Analytics
A case for domain-specialized cache management
HPCA'20
Domain-agnostic techniques for graph analytics
3
HPCA'20
Domain-agnostic techniques for graph analytics
3
Winner of the latest cache replacement championship
HPCA'20
Domain-agnostic techniques for graph analytics
3
HPCA'20
Domain-agnostic techniques for graph analytics
3
Slowdown
HPCA'20
Domain-agnostic techniques for graph analytics
3
Slowdown
1-15% geomean slowdown
HPCA'20
Outline
➢ Performance of domain-agnostic cache management
➢ Graph analytics
➢ GRASP: domain-specialized cache management
- Software-guided reuse-prediction
- Hardware-enforced cache management
➢ Performance evaluation
4
HPCA'20
Applications of graph analytics
Extract meaningful information out of complex many-to-many
relationships among objects
Community Analysis
- Identify customers with similar interests
5
HPCA'20
Applications of graph analytics
Extract meaningful information out of complex many-to-many
relationships among objects
Community Analysis
- Identify customers with similar interests
Connectivity Analysis
- Find weakness in a network
Path Analysis
- Route optimization for distribution and supply chain
Centrality Analysis
- Most influential people and information in social media
And many others …5
HPCA'20
Real-world graphs & power-law degree distribution
Small fraction of vertices have high connectivity – hot vertices
Large fraction of vertices have low connectivity – cold vertices
Prevalent in many domains – e.g., Twitter user-follower graph
6
HPCA'20
Real-world graphs & power-law degree distribution
Small fraction of vertices have high connectivity – hot vertices
Large fraction of vertices have low connectivity – cold vertices
Prevalent in many domains – e.g., Twitter user-follower graph
6
∼700
Average User
HPCA'20
Real-world graphs & power-law degree distribution
Small fraction of vertices have high connectivity – hot vertices
Large fraction of vertices have low connectivity – cold vertices
Prevalent in many domains – e.g., Twitter user-follower graph
6
∼700
Average User
∼72M
Donald Trump
HPCA'20
Real-world graphs & power-law degree distribution
Small fraction of vertices have high connectivity – hot vertices
Large fraction of vertices have low connectivity – cold vertices
Prevalent in many domains – e.g., Twitter user-follower graph
6
∼700
Average User
∼72M
Donald Trump
How does connectivity influence cache locality?
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
Example Graph
Vertex Properties
Hot
Hot
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0Example Graph
Vertex Properties
Hot
Hot
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V0 E P2 E P5
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V0 E P2 E P5 V1 E P4
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V5
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2 V5 E P4
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V5
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2 V5 E P4
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
Key observation: vertex reuse is proportional to its degree
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V5
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2 V5 E P4
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
Key observation: vertex reuse is proportional to its degree
HPCA'20
A canonical example of graph analyticsComputes property for a vertex based on its neighbors' properties
7
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V5
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2 V5 E P4
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
Key observation: vertex reuse is proportional to its degree
Hot vertices → Small footprint + High reuse
HPCA'20
Challenging to identify hot vertices in hardwareDomain-agnostic techniques rely on purely hardware mechanisms
8
HPCA'20
Challenging to identify hot vertices in hardwareDomain-agnostic techniques rely on purely hardware mechanisms
8
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V5
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2 V5 E P4
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
HPCA'20
Challenging to identify hot vertices in hardwareDomain-agnostic techniques rely on purely hardware mechanisms
8
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V5
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2 V5 E P4
Reason ❶ Irregular Accesses
Example Graph
Vertex Properties
Cache Accesses in Time
Hot
Hot
HPCA'20
Challenging to identify hot vertices in hardwareDomain-agnostic techniques rely on purely hardware mechanisms
8
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V5
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2 V5 E P4
Reason ❶ Irregular Accesses
Example Graph
Vertex Properties
Reason ❷ Long Reuse Distances
Cache Accesses in Time
Hot
Hot
HPCA'20
Challenging to identify hot vertices in hardwareDomain-agnostic techniques rely on purely hardware mechanisms
8
V4
V0
V1
V2 V3
V5
P0
P1
P2
P3
P4
P5
V0
V1
V2
V3
V4
V5
V0 E P2 E P5 V1 E P4 V2 E P0 E P3 E P1 V3 E P4 V4 E P2 V5 E P4
Reason ❶ Irregular Accesses
Example Graph
Vertex Properties
Reason ❷ Long Reuse Distances
Cache Accesses in Time
Hot
Hot
Idea: Leverage domain-knowledge for reuse prediction
HPCA'20
Proposal: GRASP – a software-hardware co-design
Software aids hardware in identifying hot vertices
Hardware preferentially caches hot vertices
9
HPCA'20
Outline
➢ Performance of domain-agnostic cache management
➢ Graph analytics
➢ GRASP: domain-specialized cache management
- Software-guided reuse-prediction
- Hardware-enforced cache management
➢ Performance evaluation
10
HPCA'20
GRASP: Software-guided reuse-prediction
Task: Let software aid hardware in identifying hot vertices
11
HPCA'20
GRASP: Software-guided reuse-prediction
Task: Let software aid hardware in identifying hot vertices
Challenge: Non-trivial due to sparse distribution of hot
vertices in memory
11
P0
P1
P2
P3
P4
P5
Vertex Properties
Hot
Hot
HPCA'20
GRASP: Software-guided reuse-prediction
Task: Let software aid hardware in identifying hot vertices
Challenge: Non-trivial due to sparse distribution of hot
vertices in memory
11
Idea: Leverage prior graph reordering optimization
P0
P1
P2
P3
P4
P5
Vertex Properties
Hot
Hot
HPCA'20
Optimization: skew-aware graph reordering
Vertices are ordered in memory based on their assigned IDs
Changing vertex order to improve cache locality [IISWC’19]
12
HPCA'20
Optimization: skew-aware graph reordering
Vertices are ordered in memory based on their assigned IDs
Changing vertex order to improve cache locality [IISWC’19]
V4
V0
V1
V2 V3
V5
Hot
Hot
Original Vertex Order12
HPCA'20
Optimization: skew-aware graph reordering
Vertices are ordered in memory based on their assigned IDs
Changing vertex order to improve cache locality [IISWC’19]
Degree-based SortV4
V0
V1
V2 V3
V5 V0
V2
V3
V1 V4
V5
Hot
Hot
Original Vertex Order
Graph is unchanged
12
HPCA'20
Optimization: skew-aware graph reordering
Vertices are ordered in memory based on their assigned IDs
Changing vertex order to improve cache locality [IISWC’19]
Degree-based Sort
Hot vertices are placed
in a contiguous region
V4
V0
V1
V2 V3
V5 V0
V2
V3
V1 V4
V5
Hot
Hot
Original Vertex Order
Hot
Hot
New Vertex Order
Graph is unchanged
12
HPCA'20
Optimization: skew-aware graph reordering
Vertices are ordered in memory based on their assigned IDs
Changing vertex order to improve cache locality [IISWC’19]
Degree-based Sort
Hot vertices are placed
in a contiguous region
V4
V0
V1
V2 V3
V5 V0
V2
V3
V1 V4
V5
Hot
Hot
Original Vertex Order
Hot
Hot
New Vertex Order
Graph is unchanged
12
Easy to communicate the region boudary to hardware
HPCA'20
GRASP: Region-based lightweight interface
❶ Preprocessing:
Software applies
skew-aware
reordering
13
Hot
Vertices
Cold
Vertices
HPCA'20
GRASP: Region-based lightweight interface
❶ Preprocessing:
Software applies
skew-aware
reordering
13
Hot
Vertices
Cold
Vertices
Region
Start
Region
End
Architecturally exposed
configuration registers
HPCA'20
GRASP: Region-based lightweight interface
❶ Preprocessing:
Software applies
skew-aware
reordering
❷ Initialization:
Software populates
configuration registers
13
Hot
Vertices
Cold
Vertices
Region
Start
Region
End
Architecturally exposed
configuration registers
HPCA'20
GRASP: Region-based lightweight interface
❶ Preprocessing:
Software applies
skew-aware
reordering
❷ Initialization:
Software populates
configuration registers
❸ Initialization:
Hardware logically
partitions the
Property Array13
Hot
Vertices
Cold
Vertices
Region
Start
Region
End
Architecturally exposed
configuration registers
High Reuse
Region
Low Reuse
Region
HPCA'20
GRASP: Region-based lightweight interface
❶ Preprocessing:
Software applies
skew-aware
reordering
❷ Initialization:
Software populates
configuration registers
❸ Initialization:
Hardware logically
partitions the
Property Array13
Hot
Vertices
Cold
Vertices
Region
Start
Region
End
LLC
size
Architecturally exposed
configuration registers
High Reuse
Region
Low Reuse
Region
HPCA'20
GRASP: Region-based lightweight interface
❶ Preprocessing:
Software applies
skew-aware
reordering
❷ Initialization:
Software populates
configuration registers
❸ Initialization:
Hardware logically
partitions the
Property Array13
Hot
Vertices
Cold
Vertices
Region
Start
Region
End
LLC
size
Architecturally exposed
configuration registers
High Reuse
Region
Low Reuse
Region
Software involvement is limited to initialization
HPCA'20
GRASP: Reuse prediction at runtime
14
High Reuse Hint
Low Reuse Hint
Cache AccessDoes it
belong to
High Reuse
Region?
Yes
No
HPCA'20
GRASP: Reuse prediction at runtime
14
High Reuse Hint
Low Reuse Hint
Cache AccessDoes it
belong to
High Reuse
Region?
Yes
No
Prediction is entirely done in hardware
HPCA'20
Outline
➢ Performance of domain-agnostic cache management
➢ Graph analytics
➢ GRASP: domain-specialized cache management
- Software-guided reuse-prediction
- Hardware-enforced cache management
➢ Performance evaluation
15
HPCA'20
GRASP: Hardware-enforced cache management
Task: Preferentially cache hot vertices
Challenge: LLC capacity is limited- Not all hot vertices can fit
16
HPCA'20
GRASP: Hardware-enforced cache management
Task: Preferentially cache hot vertices
Challenge: LLC capacity is limited- Not all hot vertices can fit
16
Hot
Vertices
Cold
Vertices
High Reuse
Region
Low Reuse
Region
Hot vertices but predicted to have
low reuse due to limited LLC capacity
LLC
size
HPCA'20
GRASP: Hardware-enforced cache management
Task: Preferentially cache hot vertices
Challenge: LLC capacity is limited- Not all hot vertices can fit
16
Hot
Vertices
Cold
Vertices
High Reuse
Region
Low Reuse
Region
Hot vertices but predicted to have
low reuse due to limited LLC capacity
Requirement: Keep cache management flexible
LLC
size
HPCA'20
GRASP: Preferential but flexible cache management
17
LRU Cache management Technique
Highest Priority
(MRU)
Lowest Priority
(LRU)
Way 1 Way 4
4-Way Set Associative Cache
HPCA'20
GRASP: Preferential but flexible cache management
17
LRU Cache management Technique
Insertion
Way 1 Way 4
4-Way Set Associative Cache
HPCA'20
GRASP: Preferential but flexible cache management
17
LRU Cache management Technique
Insertion
Hit Promotion
Way 1 Way 4
4-Way Set Associative Cache
HPCA'20
GRASP: Preferential but flexible cache management
17
LRU Cache management Technique
Insertion
Hit Promotion
Eviction
Way 1 Way 4
4-Way Set Associative Cache
HPCA'20
GRASP: Preferential but flexible cache management
17
LRU Cache management Technique
GRASP policies for Low Reuse Prediction
High
Reuse
Low
Reuse
Cache
Access
Insertion
Hit Promotion
Eviction
Way 1 Way 4
4-Way Set Associative Cache
GRASP policies for High Reuse Region
HPCA'20
GRASP: Preferential but flexible cache management
17
LRU Cache management Technique
GRASP policies for Low Reuse Prediction
High
Reuse
Low
Reuse
Cache
Access
Insertion
Insertion
Eviction
Way 1 Way 4
Way 1 Way 4
4-Way Set Associative Cache
GRASP policies for High Reuse Region
HPCA'20
Hit Promotion
GRASP: Preferential but flexible cache management
17
LRU Cache management Technique
GRASP policies for Low Reuse Prediction
High
Reuse
Low
Reuse
Cache
Access
Insertion
Insertion
Hit Promotion
Hit Promotion
Eviction
Way 1 Way 4
Way 1 Way 4
4-Way Set Associative Cache
GRASP policies for High Reuse Region
HPCA'20
GRASP: Preferential but flexible cache management
17
LRU Cache management Technique
GRASP policies for Low Reuse Prediction
High
Reuse
Low
Reuse
Cache
Access
Insertion
Insertion
Hit Promotion
Hit Promotion
Eviction
Eviction
Way 1 Way 4
Way 1 Way 4
4-Way Set Associative Cache
GRASP policies for High Reuse Region
HPCA'20
GRASP is simple!
Software- Off the shelf skew-aware reordering optimization
- Compatible with multiple skew-aware reordering techniques
18
HPCA'20
GRASP is simple!
Software- Off the shelf skew-aware reordering optimization
- Compatible with multiple skew-aware reordering techniques
Lightweight Interface- Software configures a pair of registers at initialization
- No software dependency after initialization
18
HPCA'20
GRASP is simple!
Software- Off the shelf skew-aware reordering optimization
- Compatible with multiple skew-aware reordering techniques
Lightweight Interface- Software configures a pair of registers at initialization
- No software dependency after initialization
Hardware- Lightweight address comparison logic to infer the reuse hint
- Trivial policy changes
- Minimal modifications to cache structure – no additional metadata
18
HPCA'20
GRASP is simple!
Software- Off the shelf skew-aware reordering optimization
- Compatible with multiple skew-aware reordering techniques
Lightweight Interface- Software configures a pair of registers at initialization
- No software dependency after initialization
Hardware- Lightweight address comparison logic to infer the reuse hint
- Trivial policy changes
- Minimal modifications to cache structure – no additional metadata
18
Accelerating graph analytics at minimal cost
HPCA'20
Outline
➢ Performance of domain-agnostic cache management
➢ Graph analytics
➢ GRASP: domain-specialized cache management
- Software-guided reuse-prediction
- Hardware-enforced cache management
➢ Performance evaluation
19
HPCA'20
Evaluation methodology
Evaluated 25 benchmarks (5 applications x 5 graph datasets)- Graph applications from the Ligra framework [PPoPP’13]
- Graph datasets are 0.3GB – 8GB in Compressed Sparse Row
(CSR) format
Datasets are reordered using DBG [IISWC’19]- Degree-Based Grouping is state-of-the-art skew-aware reordering
Evaluated on the Sniper simulator [TACO’14]- 8 Out of Order cores
- 16MB shared LLC (2MB per core)
20
HPCA'20
Domain-agnostic techniques vs GRASP
21
HPCA'20
Domain-agnostic techniques vs GRASP
21
HPCA'20
Domain-agnostic techniques vs GRASP
21
✓ No Slowdown
✓ Up to 10.2% speed-up
HPCA'20
More results in paper
Evaluation of pinning-based techniques
Evaluation of GRASP on low-/no-skew graph datasets
Evaluation of GRASP on top of other reordering schemes
… and more
22
HPCA'20
Key take away: one size does NOT fit all
23
Look beyond domain-agnostic cache management
HPCA'20
Thank You
I am on the job market24
Priyank Faldu
Source code https://github.com/faldupriyank
Personal website www.faldupriyank.com
HPCA'20