Architecting a multi-core server SoC for the cloud
Barry Wolford
Sr. Director of Technology, Chief SoC Architect
Qualcomm Datacenter Technologies, Inc.
Linley Processor Conference October 4-5, 2017
Agenda
• Seeding the Cloud…
• Qualcomm Centriq™ 2400 Overview
• Selected Architecture Features
• Summary
Qualcomm Centriq is a product of Qualcomm Datacenter Technologies, Inc.
The shift to the cloud…
Traditional enterprise
Monolithic | Stateful | OS or VM bound | Scale up | Siloed
Cloud environments
Microservices | Mix of stateless / stateful | Containerized | Scale out | DevOps | Multi-tenant
More than 50 percent of servers sold by 2020 will be deployed for cloud computing services*
Source: IDC
…driving new requirements for datacenter infrastructure…
Throughput scalability
Performance at scale
Power efficiency
Workload-optimized infrastructure tiers
Application level redundancy
Efficient resource pooling
Cloud environments
Microservices | Mix of stateless / stateful | Containerized | Scale out | DevOps | Multi-tenant
…driving new considerations in processor architecture…
Among those are…
• Aggregate performance
  ◦ Throughput, concurrency, parallelism
• Thread density
  ◦ VM-hosting, multi-instance/multi-tenant
• Thread isolation
  ◦ Reliable performance, SLAs
• Quality of service
  ◦ SLAs, “noisy neighbors”, tail latencies
• Power efficiency
  ◦ Performance/Watt
Qualcomm Centriq 2400
• World’s first 10nm server processor
• Qualcomm® Falkor™ CPU
  ◦ Qualcomm's 5th-generation custom core design
  ◦ ARMv8-compliant / AArch64 only
• Highly integrated server SoC
  ◦ Single-chip platform-level solution
  ◦ Integrated South Bridge
• High core count (up to 48 cores)
  ◦ High-performance single-threaded CPU
  ◦ 1 CPU per “thread”
• Distributed architecture
  ◦ Increase parallelism
  ◦ Maximize concurrency
• Targeting cloud and throughput-oriented workloads
  ◦ Virtualization and containerization
  ◦ Multi-instance and multi-tenancy
Purpose-built for the Cloud
Qualcomm Falkor is a product of Qualcomm Datacenter Technologies, Inc.
SoC Overview
[Block diagram: QDF2400 SoC. Falkor duplexes (two CPUs, each with its own L1, sharing an L2), L3 cache, DDR4 memory controllers, PCIe Gen3, SATA, IMC, DMA, and low-speed IO, all attached to a coherent ring.]
Foundational Elements
[Floorplan diagram: 24 Falkor duplexes, 12 L3 cache blocks, 6 DDR memory controllers (MC), and IO blocks (PCIe and SATA SerDes, USB, IMC, and others), connected by the Coherent Segmented Ring Interconnect.]
Falkor Core Duplex
Qualcomm System Bus is a product of Qualcomm Technologies, Inc.
LLC & Memory
[Floorplan diagram: 24 Falkor duplexes, 12 L3 cache blocks, 6 DDR memory controllers (MC), and IO blocks.]
On-Chip Interconnect
Even Interleave - CW
Odd Interleave - CW
Even Interleave - CCW
Odd Interleave - CCW
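The legend above names four ring lanes: an even and an odd interleave, each running clockwise (CW) and counter-clockwise (CCW). The surviving slide text does not say how traffic is steered onto them, so the following is only a minimal sketch under stated assumptions: the interleave is picked from the low bit of the cache-line index, and the direction is whichever gives fewer hops. `LINE_BYTES`, `NUM_STOPS`, and `select_lane` are hypothetical names and values, not taken from the presentation.

```python
LINE_BYTES = 128  # assumed interleave granularity (hypothetical)
NUM_STOPS = 16    # assumed number of ring stops (hypothetical)


def select_lane(addr, src, dst, num_stops=NUM_STOPS):
    """Return (interleave, direction, hops) for one request.

    Assumption: even/odd interleave follows the low bit of the line index,
    and the direction is the one with the fewest hops (ties go to CW).
    """
    if not (0 <= src < num_stops and 0 <= dst < num_stops):
        raise ValueError("ring stop index out of range")

    # Even or odd interleave, chosen by the line index parity.
    interleave = "odd" if (addr // LINE_BYTES) & 1 else "even"

    # Hop counts in each direction around the ring.
    cw_hops = (dst - src) % num_stops
    ccw_hops = (src - dst) % num_stops

    if cw_hops <= ccw_hops:
        return interleave, "CW", cw_hops
    return interleave, "CCW", ccw_hops
```

With these assumptions, consecutive cache lines alternate between the two interleaves, which spreads a sequential stream across both rings, and no request travels more than half way around.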
[Floorplan diagram: Falkor duplexes, L3 cache blocks, DDR memory controllers (MC), and IO blocks on the Coherent Segmented Ring Interconnect.]
Distributed LLC & DDR
[Floorplan diagram: 12 L3 cache blocks and 6 DDR memory controllers (MC) distributed across the SoC among the Falkor duplexes and IO blocks.]
Distributed PoC & Snoop Filter
[Floorplan diagram: 24 PoS/PoC and snoop filter (SnpF) blocks distributed across the SoC floorplan.]
Distributed IOMMUs
[Floorplan diagram: Falkor duplexes, L3 cache blocks, DDR memory controllers (MC), and IO blocks.]
L3 Quality of Service (QoS) Extensions
QoS Extensions:
• Hardware Abstracted QoS Domain Identifier
• Per Client (Core/Virtual Machine, IO/Virtual Function)
• Per-Resource Monitoring and Way-based Allocation
• Monitor Utilization per QoSID per L3
• Policy Enforcement per QoSID per L3
• Instruction/Data Granularity
• Fine-Tune Cache Allocation per Thread or Class of Threads
Shared Resource Contention
- Distributed L3 Cache
- Limited/No Allocation Policy Enforcement
[Diagram: two panels, “No L3 QoS” and “L3 QoS”. In each, VM/Thread 0 on CPU 0, VM/Thread 1 on CPU 1, and IO/VF 0 on Device 0 share the L3.]
Improved cache utilization and per-workload performance (lower application latency) for critical workloads…
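The way-based allocation idea can be illustrated with a small software model. This is a minimal sketch, not the hardware's actual policy: it assumes one cache set, an LRU victim choice restricted to the ways a QoSID is allowed to allocate into, and hits permitted in any way. `WayPartitionedSet`, `way_masks`, and `occupancy` are hypothetical names introduced for illustration.

```python
class WayPartitionedSet:
    """One cache set whose ways are partitioned per QoSID (illustrative model)."""

    def __init__(self, num_ways, way_masks):
        self.num_ways = num_ways
        self.way_masks = way_masks        # QoSID -> bitmask of ways it may allocate into
        self.tags = [None] * num_ways     # tag stored in each way (None = empty)
        self.owner = [None] * num_ways    # QoSID that allocated each way
        self.last_use = [0] * num_ways    # timestamp of the last access per way
        self.clock = 0

    def access(self, qos_id, tag):
        """Return True on a hit, False on a miss (allocating if permitted)."""
        self.clock += 1
        # Lookup: a hit is allowed in any way, regardless of the mask.
        for way in range(self.num_ways):
            if self.tags[way] == tag:
                self.last_use[way] = self.clock
                return True
        # Miss: the victim must come from the ways this QoSID may use.
        mask = self.way_masks[qos_id]
        allowed = [w for w in range(self.num_ways) if (mask >> w) & 1]
        if not allowed:
            return False  # this QoSID is not allowed to allocate at all
        # Prefer an empty way, otherwise the least recently used allowed way.
        victim = min(allowed, key=lambda w: (self.tags[w] is not None, self.last_use[w]))
        self.tags[victim] = tag
        self.owner[victim] = qos_id
        self.last_use[victim] = self.clock
        return False

    def occupancy(self):
        """Monitoring: number of ways currently held by each QoSID."""
        counts = {}
        for qos_id in self.owner:
            if qos_id is not None:
                counts[qos_id] = counts.get(qos_id, 0) + 1
        return counts


def run(masks):
    """QoSID 0 loads two lines, QoSID 1 streams ten, then QoSID 0 re-reads."""
    cache = WayPartitionedSet(4, masks)
    for tag in (10, 11):
        cache.access(0, tag)
    for tag in range(100, 110):
        cache.access(1, tag)
    return [cache.access(0, tag) for tag in (10, 11)], cache.occupancy()
```

Calling `run({0: 0b0011, 1: 0b1100})` keeps QoSID 0's two lines resident while QoSID 1 streams through its own two ways; `run({0: 0b1111, 1: 0b1111})` models the “No L3 QoS” case, where the streaming client evicts them.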
Memory Bandwidth Compression
Constrained Memory Bandwidth
- Channel limited peak MT/s
- Limited number of DDR Channels
Bandwidth Compression:
• Proprietary algorithm
• Inline compression w/in Memory Controllers
• Fully transparent to software
• Compress 128B line to 64B when possible
• ECC is encoded with compression bit
• Very low latency decompression
• 2 – 4 cycles
• Effective on compressible bandwidth intensive workloads
[Diagram: Uncompressed Memory (128B lines) shows twelve lines, 0 to B, each as two 64B halves (0a 0b … Ba Bb). Compressed Memory shows lines 0, 1, 3, 4, 8, and A as a single 64B unit each.]
Memory Access Stream – w/o Bandwidth Compression: 0a 0b 1a 1b 2a 2b 3a 3b 4a 4b 5a 5b 6a 6b 7a 7b 8a 8b
Memory Access Stream – w/ Bandwidth Compression: 0 1 2a 2b 3 4 5a 5b 6a 6b 7a 7b 8 9a 9b A Ba Bb
Increased effective memory bandwidth and reduced power for compressible workloads…
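The effect on the access stream is simple arithmetic and can be checked with a short sketch. It only counts 64B transfers; it does not model the compression algorithm, which the slide describes as proprietary. The set of compressible lines (0, 1, 3, 4, 8, A) is read off the slide's diagram, and `transfers_needed` and `effective_bandwidth_gain` are hypothetical helper names.

```python
LINE_BYTES = 128      # uncompressed line size, from the slide
TRANSFER_BYTES = 64   # size of a compressed line, from the slide


def transfers_needed(lines, compressible):
    """Count 64B transfers needed to move the given lines.

    A compressible line takes one transfer; any other line takes
    LINE_BYTES // TRANSFER_BYTES transfers.
    """
    full = LINE_BYTES // TRANSFER_BYTES
    return sum(1 if line in compressible else full for line in lines)


def effective_bandwidth_gain(lines, compressible):
    """Ratio of uncompressed to compressed transfer counts for the same lines."""
    lines = list(lines)
    if not lines:
        raise ValueError("need at least one line")
    baseline = transfers_needed(lines, set())
    return baseline / transfers_needed(lines, compressible)


# Lines 0..B from the diagram; 0, 1, 3, 4, 8 and A are drawn as compressed.
diagram_lines = range(12)
diagram_compressible = {0x0, 0x1, 0x3, 0x4, 0x8, 0xA}
```

For the diagram's twelve lines this gives 18 transfers with compression against 24 without, matching the two access streams: the same 18 transfer slots carry lines 0 through 8 uncompressed, or lines 0 through B compressed.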
Summary
For more information, visit us at: www.qualcomm.com & www.qualcomm.com/blog
Thank you
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2017 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.