Ncore™ Cache Coherent Interconnect
Technology Overview, 24 May 2016
Craig Forrest, Chief Technology Officer
David Kruckemyer, Chief Hardware Architect
Copyright © 2016 Arteris
Contents
○ About Arteris
○ Caches, Cache Coherency and Challenges
○ Introducing Ncore Cache Coherent Interconnect
○ Summary
Arteris: The on-chip interconnect leader
Arteris Product Milestones
○ Founded in 2003 to pioneer network-on-chip (NoC) interconnect
○ NoC Solution = first released NoC implementation in 2005
○ FlexNoC® = second generation Arteris NoC in 2009/2010
○ FlexPSI = die-to-die or chip-to-chip parallel interface in 2013
○ FlexNoC Resilience Package™ = Functional Safety option in 2014
○ FlexNoC Physical™ = Physically aware IP with FlexNoC Version 3 in 2015
○ Ncore™ Cache Coherent Interconnect = Heterogeneous cache coherency in 2016
Company
○ Headquarters and Engineering Development in Campbell, USA
○ Worldwide support offices (USA, France, China, Korea, India, Japan)
Customer Adoption
[Chart: customer adoption by year]
Year   2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
Count     1    6    9   13   20   41   52   58   67   76   79
* Customer data current as of 1 May 2016
Awards: [award logos]
Arteris has become the standard for complex and low-power SoCs
Customers shipped > 1B SoCs as of 2015
240 Design Starts, 146 Tape-Outs, 108 Chips Produced

Year   Design Starts   Tape-Outs   Chips Produced
2006               1           -                -
2007               5           1                -
2008              13           5                1
2009              26          11                4
2010              41          19               11
2011              85          32               20
2012             128          55               33
2013             159          99               51
2014             190         119               79
2015             229         140              104
2016             240         146              108

*Data is cumulative. Design data is customer-reported and subject to change. Data is current as of 1 May 2016.
Arteris Customers: Arteris technology is becoming a standard
Segments: Mobility; Automotive, IoT (Internet of Things), Camera & CE (Consumer Electronics); SSD (Solid State Drive), Networking & Automation
[Figure: customer logos. Anonymized labels shown:]
○ Very Large SoC Maker
○ Toshiba Japan System OEM
○ Automotive SoC Maker
○ Major SSD Vendor
○ Defense Contractor
○ Defense Contractor
○ Silicon Foundry
○ Major IP Provider
○ Large Drone Maker
○ Japan Tier 1 SoC Maker
○ Defense Contractor
○ Major Auto & CE SoC Maker
○ Major Automotive OEM
○ Major SSD Vendor
Current as of 1 May 2016
Arteris interconnect IP now covers coherent and non-coherent use cases
[Figure: Arteris Interconnect IP Products, an example SoC block diagram. Labels shown:]
○ Ncore™ Cache Coherent Interconnect: CPU Subsystem (4× A57 with L2 cache, 4× A53 with L2 cache)
○ FlexNoC® Non-coherent Interconnect
○ FlexWay® Interconnect (labelled Subsystem Interconnect): Security Subsystem and Application IP Subsystem, each drawn with IP blocks
○ Security labels: CRI, Crypto, Firewall (PCF+), RSA-PSS, Cert. Engine
○ Design-Specific Subsystems: GPU Subsystem (3D Graphics, 2D GR.), DSP Subsystem (A/V), AES, MPEG, etc.
○ High Speed Wired Peripherals: USB 3, USB 2 (PHY 3.0, 2.0), PCIe (PHY), Ethernet (PHY)
○ Wireless Subsystem: WiFi, GSM, LTE, LTE Adv.
○ InterChip Links™, HDMI, MIPI, Display, PMU, JTAG, I/O Peripherals
○ Memory Subsystem: Memory Scheduler, Memory Controller, Wide IO, LP DDR, DDR3 (PHYs)
Modern SoC Design Challenges
○ SCALABILITY: How to scale systems up as the number of coherent agents increases?
○ HETEROGENEITY: How to integrate coherent processing elements using different protocols, different semantics, or having different cache characteristics?
○ SYSTEM INTEGRATION: How to integrate IP that is not cache coherent and achieve better performance?
○ PHYSICAL DESIGN: How to create a cache coherent system that is easily placed on chip?
○ POWER MANAGEMENT: How to optimize power consumption of complex systems?
Why Caches?
○ Caches are small, fast memories tightly coupled to processing elements
○ Reduced average memory latency means higher performance
  • Temporal locality
  • Spatial locality
○ High bandwidth due to high frequency and wide interfaces
○ Fewer off-chip DRAM accesses resulting in lower power consumption
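The latency bullet can be made concrete with the standard average memory access time (AMAT) relation: hit time plus miss rate times miss penalty. The sketch below is only illustrative; the function name, cycle counts, and miss rate are assumed values, not figures from the presentation.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (all in cycles)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Assumed, illustrative numbers: a 4-cycle cache hit and a 200-cycle DRAM access.
# With no cache, every access pays the full DRAM latency.
no_cache = average_memory_access_time(hit_time=0, miss_rate=1.0, miss_penalty=200)
# With a cache that misses 5% of the time, most accesses cost only the hit time.
with_cache = average_memory_access_time(hit_time=4, miss_rate=0.05, miss_penalty=200)

print(f"no cache: {no_cache:.1f} cycles, with cache: {with_cache:.1f} cycles")
```

Temporal and spatial locality are what keep the miss rate low in practice; the lower the miss rate, the closer the average latency gets to the hit time.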
Why Cache Coherency?
○ Caches create multiple copies of data
  • Managing these copies in software is difficult
○ Hardware cache coherency creates the illusion of a flat, shared memory
  • Caches are invisible to software
  • Multiple copies are kept consistent
○ But… managing copies in hardware requires a lot of communication
  • Must check every place there may be a valid copy → Snoop
  • Snoop filters reduce communication by tracking cache contents
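To illustrate why tracking cache contents cuts snoop traffic, here is a minimal sketch that compares broadcast snooping with a filter that records which caches may hold each line. The `SnoopFilter` class, its method names, and the agent names are hypothetical and chosen for illustration; they do not describe Ncore's implementation.

```python
class SnoopFilter:
    """Tracks which caches may hold each line (hypothetical structure)."""

    def __init__(self):
        self.sharers = {}  # line address -> set of cache ids

    def record_fill(self, cache_id, addr):
        """Note that cache_id now holds a copy of the line at addr."""
        self.sharers.setdefault(addr, set()).add(cache_id)

    def record_evict(self, cache_id, addr):
        """Note that cache_id no longer holds the line at addr."""
        holders = self.sharers.get(addr)
        if holders is None:
            return
        holders.discard(cache_id)
        if not holders:
            del self.sharers[addr]

    def targets(self, requester, addr):
        """Caches that must be snooped for this request."""
        return sorted(self.sharers.get(addr, set()) - {requester})


def broadcast_targets(all_caches, requester):
    """Without a filter, every other cache is snooped."""
    return [c for c in all_caches if c != requester]


caches = ["cpu0", "cpu1", "gpu", "dsp"]
sf = SnoopFilter()
sf.record_fill("cpu1", 0x1000)

broadcast = broadcast_targets(caches, "cpu0")  # snoops three caches
filtered = sf.targets("cpu0", 0x1000)          # snoops only the tracked holder
untracked = sf.targets("cpu0", 0x2000)         # no holder recorded, no snoops
print(len(broadcast), filtered, untracked)
```

The saving grows with the number of caches: broadcast traffic scales with the agent count, while filtered traffic scales with the number of actual sharers.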
Ncore Cache Coherent Interconnect IP
[Figure: Ncore interconnect connecting three classes of agents]
○ Coherent Agents: CPU Cluster with Cache ($), GPU with Cache ($), …
○ Non-coherent Agents: Image Processing, Display Processing, …, Subsystems, Peripherals
○ Memory Agents: DRAM, SRAM
Ncore Interconnect Architecture
[Figure: Ncore architecture block diagram. Blocks shown: Coherent Agent Interfaces attached to agent Caches ($); Directory with multiple Snoop Filters; CCTI; Coherent Memory Interfaces; Non-coherent Bridges with Proxy Caches ($) attached to the Non-coherent Subsystem.]
Coherent Read Example – Cache Hit
[Figure: Ncore architecture block diagram with a Consumer and a Producer, each a Cache ($) behind a Coherent Agent Interface. Numbered markers ❶ to ❸ trace the read.]
Coherent Read Example – Cache Misses
[Figure: Ncore architecture block diagram with a Consumer Cache ($) behind a Coherent Agent Interface, and Memory. Numbered markers ❶ to ❹ trace the read.]
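The two read examples can be sketched as a small directory-style lookup. The slides number the steps without describing them, so the step wording and the mapping to three steps for a hit and four for a miss are assumptions made for illustration. The function name and data layout are hypothetical, and the direct scan over caches stands in for a directory and snoop filter lookup.

```python
def coherent_read(addr, consumer, caches, memory):
    """Return (data, steps) for a read issued by `consumer`.

    caches: dict of agent name -> dict of addr -> data (each agent's cache)
    memory: dict of addr -> data (backing memory)
    """
    steps = [f"{consumer} sends read for {addr:#x} to its coherent agent interface"]
    # Stand-in for the directory lookup: find another agent holding the line.
    owner = next(
        (name for name, lines in caches.items() if name != consumer and addr in lines),
        None,
    )
    if owner is not None:
        # Hit in another agent's cache: snoop it and forward its data.
        steps.append(f"directory snoops {owner}")
        data = caches[owner][addr]
        steps.append(f"{owner} supplies data to {consumer}")
    else:
        # No cached copy anywhere: fall back to memory.
        steps.append("directory finds no cached copy")
        data = memory[addr]
        steps.append("coherent memory interface reads memory")
        steps.append(f"memory data returned to {consumer}")
    caches[consumer][addr] = data
    return data, steps


caches = {"consumer": {}, "producer": {0x40: "fresh"}}
memory = {0x40: "stale", 0x80: "from-memory"}

hit_data, hit_steps = coherent_read(0x40, "consumer", caches, memory)
miss_data, miss_steps = coherent_read(0x80, "consumer", caches, memory)
for step in hit_steps + miss_steps:
    print(step)
```

In the hit case the consumer receives the producer's up-to-date copy rather than the stale value in memory, which is the point of keeping the caches coherent.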
Ncore Benefits
1. True heterogeneous coherency
2. Highly scalable systems
3. Higher performance with non-coherent IP
4. Lower power consumption
5. Easier chip floorplanning
Benefit #1: True heterogeneous coherency
Two features are primarily responsible for enabling Ncore’s unique heterogeneous cache coherency capabilities:
1. Support for multiple coherence models
2. Use of multiple configurable snoop filters to accommodate different cache organizations
Benefit #1: True heterogeneous coherency
Support for heterogeneous coherent agents
○ Cache coherent agents can differ greatly, which increases the difficulty in integrating them into a system-on-chip
  • Logical – coherence models
  • Physical – cache organization, transaction table sizes
○ Ncore adapts to each coherent agent’s behavior and characteristics
  • Coherent agent interfaces adapt individual coherence models to a generic model using a lightweight messaging layer
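One way to picture an interface that adapts an agent's own coherence model to a generic one is a per-agent translation table. This is a sketch under stated assumptions: the generic message set, the native operation names, and the `AgentInterface` class are all hypothetical and are not taken from Ncore or from any real protocol.

```python
from enum import Enum


class GenericMsg(Enum):
    """Hypothetical generic coherence messages used inside the interconnect."""
    READ_SHARED = "read_shared"
    READ_EXCLUSIVE = "read_exclusive"
    WRITE_BACK = "write_back"


class AgentInterface:
    """Translates one agent's native operations into generic messages."""

    def __init__(self, name, op_map):
        self.name = name
        self.op_map = op_map  # native op name -> GenericMsg

    def to_generic(self, native_op, addr):
        try:
            msg = self.op_map[native_op]
        except KeyError:
            raise ValueError(f"{self.name}: unsupported native op {native_op!r}") from None
        return (msg, addr, self.name)


# Two agents with different (made-up) native vocabularies.
cpu_if = AgentInterface("cpu", {
    "LoadShared": GenericMsg.READ_SHARED,
    "LoadOwn": GenericMsg.READ_EXCLUSIVE,
    "Evict": GenericMsg.WRITE_BACK,
})
accel_if = AgentInterface("accel", {
    "RD": GenericMsg.READ_SHARED,
    "RDX": GenericMsg.READ_EXCLUSIVE,
    "WB": GenericMsg.WRITE_BACK,
})

cpu_msg = cpu_if.to_generic("LoadShared", 0x100)
accel_msg = accel_if.to_generic("RD", 0x100)
print(cpu_msg[0], accel_msg[0])
```

Because both agents end up speaking the same generic messages, the rest of the system only has to handle one vocabulary, whatever each agent uses natively.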
Benefit #1: True heterogeneous coherency
Coherent agent interfaces adapt individual coherence models to a generic model
[Figure: Ncore architecture block diagram (Coherent Agent Interfaces, Directory with Snoop Filters, CCTI, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches).]
Benefit #1: True heterogeneous coherency
With multiple configurable snoop filters
[Figure: Ncore architecture block diagram showing a Directory with multiple Snoop Filters, alongside Coherent Agent Interfaces, CCTI, Coherent Memory Interfaces, and Non-coherent Bridge(s) with Proxy Cache ($) to the Non-coherent Domain.]
○ Cache coherent agents can have very different behaviors
  • Cache organization
  • Coherency models
  • Workloads
○ Associating caching agents that share common properties with individual snoop filters can consume less die area than a monolithic snoop filter
Benefit #1: True heterogeneous coherency
Multiple snoop filters are more area-efficient than one
[Figure: four caching agents A, B, C and D, each with a Cache ($). Traditional Approach: requests (REQ) look up one Monolithic Snoop Filter (X) tracking A, B, C and D. Ncore Approach: requests (REQ) look up Snoop Filter #1 (Y) and Snoop Filter #2 (Z).]
Multiple snoop filters are smaller: area(Y+Z) < area(X)
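A rough storage estimate shows how area(Y+Z) < area(X) can come about. The model below is a simplifying assumption, not Arteris's sizing method: every filter entry stores a tag plus one presence bit per agent it tracks, each filter is sized for all lines of its agents, and lines shared across filters are ignored. The cache sizes, tag width, and the grouping of A and B into filter Y and C and D into filter Z are also assumed.

```python
def filter_bits(entries, tag_bits, tracked_agents):
    """Storage for one snoop filter: each entry holds a tag plus one
    presence bit per tracked agent (a simplified, assumed model)."""
    return entries * (tag_bits + tracked_agents)


# Assumed cache sizes, in lines, for agents A, B, C and D.
lines = {"A": 8192, "B": 8192, "C": 1024, "D": 1024}
TAG_BITS = 30  # assumed tag width

# Monolithic filter X: covers every line, with a presence bit for all four agents.
x = filter_bits(sum(lines.values()), TAG_BITS, tracked_agents=4)

# Split filters: Y tracks A and B, Z tracks C and D (assumed grouping).
y = filter_bits(lines["A"] + lines["B"], TAG_BITS, tracked_agents=2)
z = filter_bits(lines["C"] + lines["D"], TAG_BITS, tracked_agents=2)

print(f"X = {x} bits, Y + Z = {y + z} bits")
```

Under these assumptions the saving comes from narrower entries, since each split filter carries presence bits only for its own agents. Real designs would also weigh associativity, organization, and lines shared between groups.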
Benefit #2: Highly scalable systems
With a configurable, modular approach
○ Transaction processing and data bandwidth scaling
  • Each component can be scaled individually (add or subtract components)
  • Ports per component can be scaled individually (add or remove ports)
○ Why is configurable interconnect superior to fixed-function, centralized controllers?
  • Meet performance goals without wasted resources
  • Easily adjust system design as requirements evolve
  • Build derivative chips based on the same platform
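The two scaling knobs, component count and ports per component, can be sketched as a tiny configuration record. The `ComponentConfig` class, its field names, the port width, and the assumption that peak bandwidth scales linearly with both knobs are all illustrative; none of them is an Ncore configuration parameter.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ComponentConfig:
    """Hypothetical sizing of one component type in the interconnect."""
    instances: int
    ports_per_instance: int
    bytes_per_port_per_cycle: int

    def peak_bytes_per_cycle(self) -> int:
        # Assumes ideal, linear scaling with instances and ports.
        return self.instances * self.ports_per_instance * self.bytes_per_port_per_cycle


base = ComponentConfig(instances=2, ports_per_instance=1, bytes_per_port_per_cycle=16)
more_components = replace(base, instances=4)      # scale by adding components
more_ports = replace(base, ports_per_instance=2)  # scale by adding ports

for name, cfg in [("base", base), ("more components", more_components), ("more ports", more_ports)]:
    print(name, cfg.peak_bytes_per_cycle())
```

A derivative chip in this picture is just another configuration record built from the same base, which is the platform-reuse point made in the bullets.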
Benefit #2: Highly scalable systems
Add more components or ports to scale bandwidth
[Figure: Ncore architecture block diagram with an extra Coherent Agent Interface and Cache ($) added. Callouts: "Add more components…" and "…or add more ports".]
Benefit #3: Higher performance with non-coherent IP
Using configurable proxy caches
Advantages (new and novel)
1. Better for sharing data between non-coherent agents and coherent agents
2. Better for sharing data between non-coherent agents
○ Using a proxy cache minimizes communication through DRAM
○ Additional system benefits
  • Pre-fetch effect – fetch cache lines vs. individual data
  • Write-gathering benefit – writes accumulated in cache
  • Optimizes coherent memory accesses
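The write-gathering benefit can be shown with a toy model that counts DRAM writes. This is a sketch, not the Ncore proxy cache: the `ProxyCache` class, the 64-byte line size, and the flush policy are assumptions, and the model fills untouched bytes with zeros where real hardware would track byte enables or fetch the line first.

```python
LINE_BYTES = 64  # assumed cache line size


class ProxyCache:
    """Toy write-gathering cache: dirty lines reach DRAM only on flush."""

    def __init__(self):
        self.dirty = {}       # line address -> bytearray(LINE_BYTES)
        self.dram_writes = 0  # number of line-sized writes sent to DRAM

    def write(self, addr, data: bytes):
        """Merge a small write into its cache line without touching DRAM."""
        offset = addr % LINE_BYTES
        line_addr = addr - offset
        if offset + len(data) > LINE_BYTES:
            raise ValueError("write crosses a cache line boundary")
        line = self.dirty.setdefault(line_addr, bytearray(LINE_BYTES))
        line[offset:offset + len(data)] = data

    def flush(self):
        """Write each dirty line to DRAM once, then clear the cache."""
        self.dram_writes += len(self.dirty)
        self.dirty.clear()


pc = ProxyCache()
small_writes = 16
for i in range(small_writes):
    # Sixteen 4-byte writes that all fall within one 64-byte line.
    pc.write(0x1000 + 4 * i, bytes([i] * 4))
pc.flush()

print(f"{small_writes} small writes became {pc.dram_writes} DRAM write(s)")
```

Without gathering, each of the small writes would be a separate DRAM transaction; with it, they are merged into one line-sized write.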
Benefit #3: Higher performance with non-coherent IP
Using configurable proxy caches: sharing between non-coherent & coherent agents
[Figure: Ncore architecture block diagram with a Consumer and a Producer. Numbered markers ❶ to ❺ trace the data sharing.]
Benefit #3: Higher performance with non-coherent IP
Using configurable proxy caches: sharing between non-coherent agents
[Figure: Ncore architecture block diagram with a Consumer and a Producer. Numbered markers ❶ to ❹ trace the data sharing.]
Benefit #4: Lower power consumption
With multiple clock and voltage domains
[Figure: Ncore architecture block diagram (Coherent Agent Interfaces, Directory with Snoop Filters, CCTI, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches).]
Benefit #5: Easier chip floorplanning
With a highly distributed architecture
○ Reserve less area for cache coherent interconnect
  • Place it in existing “white space” routing channels – easier P&R
○ Locate modular Ncore components closer to critical IP – better timing
○ Minimize wiring congestion
[Figure caption: Hub- and crossbar-based coherent interconnects require significant contiguous reserved die area. Source: Andrei Frumusanu, AnandTech]
Summary
Ncore™ Cache Coherent Interconnect IP is targeted at heterogeneous SoCs.
Benefits
○ Scalability
○ Configurability
○ Area efficiency
○ High performance
○ Optimal power consumption
Major Unique Features
○ Multiple configurable snoop filters
○ Multiple configurable proxy caches
○ Modular distributed architecture
RESULT: Custom-configured interconnect IP that meets exact system requirements
To request more information, visit us at http://www.arteris.com/contact