Cache Coherence: “Can we do a better job of supporting cache coherence?”

Ross Daly, Chan Kim

Uploaded by mercia on 23-Feb-2016


Page 1: Cache Coherence “Can we do a better job of supporting cache coherence?”

Cache Coherence

“Can we do a better job of supporting cache coherence?”

Ross Daly, Chan Kim

Page 2

Definition of CC

• Single-Writer, Multiple-Reader (SWMR) invariant: “For any given memory location, at any given moment in time, there is either a single core that may write it (and that may also read it) or some number of cores that may read it.”

• “Data-Value Invariant: the value of a memory location at the start of an epoch is the same as the value of the memory location at the end of its last read-write epoch”

- D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence, volume 6 of Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, May 2011.
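The Python sketch below makes the two invariants concrete by checking them on a hand-built snapshot and an epoch trace. The permission model (`Perm`) and the function names are hypothetical illustrations and do not come from the primer. A real protocol enforces these invariants in hardware rather than checking them after the fact.

```python
from __future__ import annotations

from enum import Enum


class Perm(Enum):
    """One core's access permission for a single memory location (illustrative)."""
    NONE = 0        # may neither read nor write
    READ_ONLY = 1   # may read
    READ_WRITE = 2  # may read and write


def satisfies_swmr(perms: list[Perm]) -> bool:
    """SWMR at one moment: exactly one read-write core and no readers,
    or no writer and any number of read-only cores."""
    writers = sum(1 for p in perms if p is Perm.READ_WRITE)
    readers = sum(1 for p in perms if p is Perm.READ_ONLY)
    return (writers == 1 and readers == 0) or writers == 0


def satisfies_data_value(initial: int, epochs: list[tuple[str, int, int]]) -> bool:
    """Data-value invariant for one location over a sequence of epochs.

    Each epoch is (kind, value_at_start, value_at_end), where kind is "rw" or "ro".
    Every epoch must start with the value left by the last read-write epoch
    (or `initial` if there has been none), and a read-only epoch cannot change it.
    """
    last_rw_value = initial
    for kind, start, end in epochs:
        if start != last_rw_value:
            return False
        if kind == "rw":
            last_rw_value = end
        elif end != start:
            return False
    return True
```

Two examples show how the checks behave. One read-write core alongside idle cores passes SWMR, while a writer plus a concurrent reader fails it. A read-only epoch that still sees the value from before the last write fails the data-value check.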

Page 3

Goals

• Improve cache coherence performance on multi-core/many-core systems.

• Scale the number of cores to increase performance.

• Scale the number of cores without increasing cache coherence complexity.

Page 4

Xpoint Cache: Motivation

Page 5

Xpoint: Architecture (2D)

[Figure: typical bus-based architecture vs. Xpoint architecture]

Page 6

Xpoint: Architecture (3D)

Page 7

Xpoint: Results

• 29x speedup on a 32-core system

• 45x speedup on a 64-core system

• 2.1x improvement over a conventional 64-core bus
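The slides do not say what baseline the speedups are measured against. If the baseline is a single core (an assumption, not stated in the slides), the parallel efficiency for speedup S on N cores would be:

```latex
E = \frac{S}{N}, \qquad
E_{32} = \frac{29}{32} \approx 0.906, \qquad
E_{64} = \frac{45}{64} \approx 0.703
```

Under the same assumption, doubling the core count from 32 to 64 raises the speedup by a factor of 45/29 ≈ 1.55.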

Page 8

Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Motivation

• Keeping track of every block in the directory requires a huge amount of storage.

• A directory cache requires less storage, but it suffers from directory cache misses.

• Most accessed blocks (about 75% on average) are private.

Page 9

Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Private vs. Shared Blocks

• Coarse-grain strategy (page granularity)

• The OS detects when a private page must become shared.

• Every newly loaded page starts out private.

• When another processor accesses a block in a private page, the page becomes shared.
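The classification steps above could be modeled roughly as follows. This is a minimal sketch that assumes a 4 KB page and a per-page "keeper" core recorded on first touch. `PageEntry`, `PageClassifier`, and the return convention are hypothetical names, not the paper's data structures. Accesses that report `is_private=True` are the ones that would bypass the directory cache.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PageEntry:
    """Hypothetical page-table entry extended with a private/shared flag."""
    keeper: int          # core that first touched the page
    private: bool = True


@dataclass
class PageClassifier:
    """OS-level, page-granularity private/shared classification (sketch)."""
    page_size: int = 4096  # assumed page size in bytes
    table: dict[int, PageEntry] = field(default_factory=dict)

    def access(self, core: int, addr: int) -> tuple[bool, bool]:
        """Record an access; return (is_private, became_shared_now)."""
        page = addr // self.page_size
        entry = self.table.get(page)
        if entry is None:
            # First touch ("page load"): every new page starts out private.
            self.table[page] = PageEntry(keeper=core)
            return True, False
        if entry.private and entry.keeper != core:
            # Another core touched a private page: the OS marks it shared,
            # and coherence must now be recovered for the keeper's blocks.
            entry.private = False
            return False, True
        return entry.private, False
```

The classification is coarse. A page counts as shared as soon as a second core touches any block in it, even if the two cores never touch the same block.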

Page 10

Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks

Page 11

Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Coherence Recovery Mechanism

• Flushing-based recovery mechanism

- Flushing all the blocks within a page may increase the miss rate.

• Updating-based recovery mechanism
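This rough Python sketch shows the two recovery options when a page turns shared. It models the keeper's L1 as a dict from block address to a dirty flag, and the directory as a dict from block address to a set of sharers. The flushing variant follows the slide directly. The updating variant rests on an assumption of this sketch: that recovery registers the keeper's cached blocks in the directory instead of evicting them. The page size is illustrative.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumed page size in bytes


def blocks_in_page(cache: dict[int, bool], page: int) -> list[int]:
    """Block addresses of `page` present in a cache mapping addr -> dirty flag."""
    return [addr for addr in cache if addr // PAGE_SIZE == page]


def recover_by_flushing(cache: dict[int, bool], page: int) -> int:
    """Evict the keeper's blocks of `page`; return how many needed a write-back."""
    writebacks = 0
    for addr in blocks_in_page(cache, page):
        if cache.pop(addr):  # count dirty blocks, which would be written back
            writebacks += 1
    return writebacks


def recover_by_updating(cache: dict[int, bool], page: int, keeper: int,
                        directory: dict[int, set[int]]) -> None:
    """Keep the keeper's blocks cached and record the keeper in the directory."""
    for addr in blocks_in_page(cache, page):
        directory.setdefault(addr, set()).add(keeper)
```

The two variants trade off differently. Flushing empties the keeper's cache of that page, so its next accesses to those blocks miss. Updating keeps the blocks cached but has to create directory entries for them.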

Page 12

Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks: Results

• Directory caches can avoid tracking about 57% of the accessed blocks.

• Shortens the runtime of parallel applications by 15% at the same directory cache size, or maintains system performance with directory caches 8 times smaller.

Page 13

Complexity-Effective Multicore Coherence

• Similarity (with the previous paper): motivation

- Private and shared blocks

• Difference: simplifying the protocol

- Directory-less

Page 14

Complexity-Effective Multicore Coherence: Simplifying the protocol

• Dynamic write policy

- Write-back vs. write-through (write-back for private data, write-through for shared data)

• VIPS cache coherence protocol

- Valid/Invalid, Private/Shared
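The dynamic write policy can be pictured with a simplified L1 model. This sketch assumes each line already carries its Private/Shared classification. It also writes shared data through to the last-level cache immediately, which is a simplification: the real protocol's timing of write-throughs is not modeled here. `Line`, `VipsL1`, and the dict-based LLC are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Line:
    """An L1 line: Valid/Invalid bit plus Private/Shared classification."""
    value: int
    private: bool
    valid: bool = True
    dirty: bool = False


@dataclass
class VipsL1:
    """Hypothetical L1 that picks the write policy per line."""
    llc: dict[int, int]  # shared last-level cache: addr -> value
    lines: dict[int, Line] = field(default_factory=dict)

    def store(self, addr: int, value: int, private: bool) -> None:
        line = self.lines.setdefault(addr, Line(value, private))
        line.value, line.valid = value, True
        if line.private:
            line.dirty = True        # write-back: LLC updated only on eviction
        else:
            self.llc[addr] = value   # write-through (simplified: immediate)

    def evict(self, addr: int) -> None:
        line = self.lines.pop(addr)
        if line.dirty:
            self.llc[addr] = line.value  # write back a dirty private line
```

For example, a private store leaves the LLC untouched until `evict()` writes the dirty line back. A shared store, by contrast, is visible in the LLC right away.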

Page 15

Complexity-Effective Multicore Coherence: Directory-less

• Self-invalidation

- Readers are allowed to make unregistered copies of a memory location, as long as they promise to invalidate these copies at the next synchronization point.

- Does this satisfy the definition of cache coherence?

• Selective flushing

• Write-through at word granularity with a per-word dirty bit
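Self-invalidation and selective flushing could be sketched as below. The sketch assumes 8 words per line and uses a single `sync()` call to stand in for a synchronization point. The paper may treat acquire and release separately; this sketch merges them. `SharedLine` and `Core` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

WORDS_PER_LINE = 8  # assumed: 64-byte lines of 8-byte words


@dataclass
class SharedLine:
    words: list[int]
    dirty: list[bool] = field(default_factory=lambda: [False] * WORDS_PER_LINE)


@dataclass
class Core:
    """Hypothetical core whose L1 holds unregistered copies of shared lines."""
    llc: dict[int, list[int]]
    l1: dict[int, SharedLine] = field(default_factory=dict)

    def load(self, line_addr: int, word: int) -> int:
        if line_addr not in self.l1:
            # Unregistered copy: no directory records that this core holds it.
            self.l1[line_addr] = SharedLine(list(self.llc[line_addr]))
        return self.l1[line_addr].words[word]

    def store(self, line_addr: int, word: int, value: int) -> None:
        self.load(line_addr, word)  # bring the line in if needed
        line = self.l1[line_addr]
        line.words[word] = value
        line.dirty[word] = True     # per-word dirty bit

    def sync(self) -> None:
        """Synchronization point: flush dirty words, then self-invalidate."""
        for addr, line in self.l1.items():
            for i, is_dirty in enumerate(line.dirty):
                if is_dirty:
                    self.llc[addr][i] = line.words[i]  # word-granularity write-through
        self.l1.clear()  # drop all copies; later loads refetch from the LLC
```

Consider two cores sharing `llc = {0: [0] * 8}`. Core `b` has loaded line 0, and core `a` then stores 42 into word 3. After that, `b.load(0, 3)` still returns 0 until `b` calls `sync()`. If `b` has meanwhile written word 5, both cores can flush their own dirty words without overwriting each other. In this model, the stale read before `b` synchronizes would violate the SWMR definition at that moment. That suggests the answer to the question on this slide is "not in the strict sense". Protocols of this kind are usually argued to appear coherent only to data-race-free programs.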

Page 16

Complexity-Effective Multicore Coherence: Simplifying the protocol: Synchronization

• Synchronization relies on data races.

• Atomic instructions spin locally in the core's L1 until another core changes the condition.

• In this paper, a core does not send invalidations to other cores when it executes a write instruction.

• Solution?
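The problem above can be reproduced in a few lines. In this hypothetical model, `SpinCore.load_l1` serves the flag from a private copy that no one invalidates. `load_llc` instead bypasses the L1 and reads the shared level on every iteration. The bounded loop stands in for an infinite spin. Reading the synchronization variable from the shared level is only one possible answer to "Solution?", assumed here for illustration; it is not necessarily what the paper does.

```python
from __future__ import annotations

from collections.abc import Callable


class SpinCore:
    """Hypothetical core whose L1 never receives invalidations."""

    def __init__(self, llc: dict[str, int]) -> None:
        self.llc = llc
        self.l1: dict[str, int] = {}

    def load_l1(self, addr: str) -> int:
        # Fill the L1 on first use, then keep returning the (possibly stale) copy.
        if addr not in self.l1:
            self.l1[addr] = self.llc[addr]
        return self.l1[addr]

    def load_llc(self, addr: str) -> int:
        # Bypass the L1 so every read sees the latest value in the shared level.
        return self.llc[addr]


def spin_until_released(read: Callable[[str], int], max_iters: int,
                        release_at: int, llc: dict[str, int]) -> bool:
    """Spin until read("flag") == 0; another core clears the flag at `release_at`."""
    for i in range(max_iters):
        if i == release_at:
            llc["flag"] = 0  # the releasing core's write reaches the shared level
        if read("flag") == 0:
            return True
    return False
```

Start with `llc = {"flag": 1}`. Spinning through `load_l1` never observes the release, so the function returns `False` after `max_iters` iterations. Spinning through `load_llc` returns `True` as soon as the releasing write lands.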

Page 17

Complexity-Effective Multicore Coherence: Simplifying the protocol: Results

• Outperformed a MESI directory protocol by 4.8%

• Reduced network energy consumption by 14.2%

• Simulated with 15 parallel benchmarks on 16 cores