A Breakthrough New CPU Architecture Revives IPC Scaling
DESCRIPTION
Soft Machines introduces the revolutionary VISC™ microprocessor architecture, reviving growth in performance per watt.
Soft Machines, a Silicon Valley semiconductor startup, has announced the Soft Machines VISC™ architecture. The company's investors include Samsung Ventures, AMD, Mubadala, RVC, KACST, RUSNANO, and TAQNIA. The VISC architecture is a genuine breakthrough in raising microprocessor performance per watt of power consumed. The development will make it possible to significantly improve energy efficiency across all segments of the computing ecosystem. The VISC architecture was designed as a solution to two problems: scaling the frequency of single-core processors and the complexity of programming multi-core processors.
http://www.rusnano.com/about/press-centre/news/20141024-soft-machines-predstavlyaet-revolyutsionnuyu-architekturu-mikroprotsessorov-visc
TRANSCRIPT
A Breakthrough New CPU Architecture Revives IPC Scaling
Mohammad Abdallah
Founder, President and CTO
Linley Processor Conference, October 23, 2014
• Emerging from stealth mode
• Developed new VISC™ Architecture
• 7 years, $125M R&D
• ~250 employees, 75+ patents filed
Introducing Soft Machines™
2 ©Copyright 2014, All Rights Reserved
The Death of CPU Scaling
“The failure of CPU scaling after 30 years of continual improvements may have slammed the door on the easiest and most common type of performance scaling…”
Source: “The Death of CPU Scaling”, ExtremeTech (2012)
2014
Microprocessor Scaling Realities after 2004
Transistor scaling continues
Clock speed flat
Power budget flat
Perf/clock flat
Source: “The Free Lunch is Over”, Herb Sutter
Industry Response: Multi-Core
Core1 Core2
Thread1 Thread2
Advantages:
• Utilizes growing transistor budget
• Performance scaling for parallel code
• Improves throughput
Challenges:
• ST performance doesn’t scale
• Threading/multicore coding complexity
• Amdahl’s Law of diminishing returns
• Dark silicon
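The diminishing returns that the slide attributes to Amdahl's Law can be made concrete with a short calculation. The sketch below is illustrative only: the function name `amdahl_speedup` and the 90% parallel fraction are assumptions chosen for the example, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of run time that can be spread
    across cores; the remaining (1 - p) stays serial on one core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% of the run time is parallelizable.
    for cores in (1, 2, 4, 8, 16, 64):
        print(f"{cores:3d} cores -> {amdahl_speedup(0.9, cores):.2f}x")
```

With a 90% parallel fraction the speedup can never reach 10x however many cores are added, because the serial 10% remains; this is the single-thread bottleneck the slide lists under Challenges.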
• Revive CPU performance scaling
• Utilize Moore’s Law transistor scaling
• Mitigate dark silicon
• Liberate ISA dependency
CPU Architecture Challenge
VISC™ Architecture Wave
Architecture waves (figure):
• CISC (IBM/Intel), 1970s – early 1980s
• RISC (MIPS), late 1980s – 2010s
• VISC (Soft Machines), 2010s
Software Scalability/Productivity layer: Assembly, Compilation, Concurrency Extraction
Device Physics Scalability layer: Code Memory Size, Short Pipeline, Deep OoO Pipeline, Processor Speed, Virtual Cores/Threads, Processor Power
VISC Architecture scales on both physical and software productivity layers
VISC™ Processor Block Diagram
Diagram labels:
• Sequential Code (SW Single Thread)
• Global Front End
• Virtual HW Threads (HW threadlets)
• Virtual Cores: Virtual Core1, Virtual Core2, Virtual Core3, Virtual Core4
• Physical cores: Core1, Core2, Core3, Core4, each with an L1 D$
• L2$ & Memory
VISC™ CPU Usage Example
• VISC dynamically allocates resources across virtual cores based on individual application needs
• Performance/watt balanced for both single & multi-thread applications
Diagram: two configurations of the same two physical cores (Core1 and Core2, each with an L1 D$), running Virtual HW Threads/Threadlets
• Single SW Thread (Heavy App): Virtual Core1
or
• Dual SW Threads (Heavy App, Light App): Virtual Core1 and Virtual Core2
VISC™ Architecture Prototype Pipeline
Pipeline of Virtual Threads Across the Virtual Cores
Pipeline stage labels: Fetch, Allocate/Dispatch, EXE, Mem/long latency Execution, RF read, Virtual Thread Formation
Diagram labels:
• SW Single Thread
• Global Front End
• Virtual HW Threads (HW threadlets)
• Virtual Cores: Virtual Core1, Virtual Core2
• Physical cores: Core1, Core2, each with an L1 D$
• L2$ & Memory
VISC™ Revives IPC Curve
                  ARM A15   Intel Atom  Soft Machines  Apple A7  ARM A57   Intel Haswell
                  1C        1C          2VC Proto      1C        1C        1C
Compiled Code     32-bit    32-bit      32-bit         32-bit    32-bit    64-bit
Cache             1M        2M          1M             1M+4M     2M        2M
Pipeline          Moderate  Moderate    Shallow        Moderate  Moderate  Deep
IPC (SPEC 2006)*  0.71      0.69        2.1            1.0       0.87      1.39
* Company-conducted benchmark tests and projections, using the industry-standard compiler GCC 4.6 or equivalent
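To read the IPC row as relative advantages, the published values can be divided directly. The sketch below only restates the slide's own numbers; the dictionary and function names are made up for the example. Note that the Soft Machines entry is a two-virtual-core prototype while the others are single cores, so the ratios compare the slide's configurations, not like-for-like silicon.

```python
# IPC (SPEC 2006) values as printed on the slide (company-conducted tests
# and projections). The names below are illustrative, not from the talk.
IPC_SPEC2006 = {
    "ARM A15 1C": 0.71,
    "Intel Atom 1C": 0.69,
    "Soft Machines 2VC Proto": 2.1,
    "Apple A7 1C": 1.0,
    "ARM A57 1C": 0.87,
    "Intel Haswell 1C": 1.39,
}


def ipc_ratios(table: dict, reference: str) -> dict:
    """Return reference IPC divided by every other entry's IPC."""
    ref_ipc = table[reference]
    return {
        name: ref_ipc / ipc
        for name, ipc in table.items()
        if name != reference
    }


if __name__ == "__main__":
    ratios = ipc_ratios(IPC_SPEC2006, "Soft Machines 2VC Proto")
    for name, ratio in ratios.items():
        print(f"vs {name}: {ratio:.2f}x")
```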
Mobile CPU designs are pursuing higher ARCH/µARCH complexity
• 2006, The Basic: A8, 2-way
• 2009, The Simple: A9, 2-way OoO
• 2011, The Moderate: A15, 3-way
• 2013, The Big: Apple A7, 6-way
• 2014, The Ultimate: Haswell, 8-way
• Extracting ILP has significant complexity
• OoO complexity increases quadratically with machine width
• VISC complexity increases linearly with number of virtual cores
• VISC performance/watt benefits from this linear scaling
VISC™ Concurrency Extraction: Linear vs. Quadratic Complexity
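The contrast between quadratic and linear growth can be illustrated with a toy cost model. This is a sketch under stated assumptions, not the company's model: it assumes out-of-order complexity is exactly width squared with a unit constant, and that each virtual core is a fixed-width block whose cost simply adds up. The function names are hypothetical.

```python
def ooo_complexity(width: int) -> int:
    """Toy model: one wide OoO core, cost grows with width squared."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return width ** 2


def visc_complexity(virtual_cores: int, width_per_core: int) -> int:
    """Toy model: cost adds up linearly over fixed-width virtual cores."""
    if virtual_cores < 1:
        raise ValueError("virtual_cores must be at least 1")
    return virtual_cores * ooo_complexity(width_per_core)


if __name__ == "__main__":
    width_per_core = 2  # assumed width of one virtual core
    for cores in (1, 2, 4, 8):
        total_width = cores * width_per_core
        print(
            f"total width {total_width:2d}: "
            f"monolithic {ooo_complexity(total_width):4d}, "
            f"{cores} virtual cores {visc_complexity(cores, width_per_core):3d}"
        )
```

Under these assumptions the monolithic design costs `n` times more than `n` virtual cores of the same aggregate width, which is the gap the slide's linear-versus-quadratic claim points at.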
System Energy Approach: DRVFS
Virtual Cores – DRVFS
• DRVFS: linear increase in power
• P ∝ No. of virtual core resources
• Higher Perf/MHz enables DVFS scaling DOWN
Physical Cores – DVFS
• DVFS: quadratic increase in power
• P ∝ V² × F
• Lower Perf/MHz requires DVFS scaling UP
Use Case: Rush to low power mode (boosting performance or response time)
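The two power relations on this slide, P ∝ V² × F for DVFS and P ∝ number of virtual core resources for DRVFS, can be compared numerically. The sketch below works in ratios relative to a baseline, so no absolute wattage is assumed; the function names and the example scale factors are hypothetical and are not taken from the talk.

```python
def dvfs_power_ratio(voltage_scale: float, frequency_scale: float) -> float:
    """Relative dynamic power under DVFS, from P proportional to V^2 * F."""
    if voltage_scale <= 0 or frequency_scale <= 0:
        raise ValueError("scale factors must be positive")
    return voltage_scale ** 2 * frequency_scale


def drvfs_power_ratio(resource_scale: float) -> float:
    """Relative power when only the allocated resources change (linear)."""
    if resource_scale <= 0:
        raise ValueError("resource_scale must be positive")
    return resource_scale


if __name__ == "__main__":
    # Assumed example: +20% frequency needing +10% voltage,
    # versus allocating 20% more virtual core resources.
    print(f"DVFS : {dvfs_power_ratio(1.10, 1.20):.3f}x power")
    print(f"DRVFS: {drvfs_power_ratio(1.20):.3f}x power")
```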
VISC™ Single Thread SPEC/Watt
Chart: Power vs. Single Thread Performance, Mobile and Server panels, comparing a 1C App CPU with 1VC (2C) and 1VC (4C). Annotations: 1/3 and 1/4 power; 1.7x, 1.8x, 2.1x, 2.2x performance.
Same performance at 1/4 to 1/3 of the power, or 1.7-2.2x performance at the same power*
* Company-conducted benchmark tests and projections for 28nm
VISC™ Dual Thread SPEC/Watt
Chart: Power vs. Dual Thread Performance, Mobile and Server panels, comparing a 2C App CPU with 2VC (2C) and 2VC (4C). Annotations: 1/2 and 0.4x power; 1.4x, 1.5x, 1.8x, 1.9x performance.
Same performance at 0.4 to 0.5x of the power, or 1.4-1.9x performance at the same power*
* Company-conducted benchmark tests and projections for 28nm
VISC™ Technology Prototype
Working Silicon
• VISC Processor Proof-of-Concept Prototype
  • IPC scalability
  • VISC architecture
  • Software efficiency
• Full Platform
  • VISC Dual Virtual Core Processor
  • SoC with 3D, Video, DRAM controller, HD video….
• Full System functionality
  • Linux OS
  • UEFI BIOS
  • Benchmarks running on Linux
  • Android ICS booting
Silicon Results: Performance/MHz Dual Virtual Core/A15 IPC Ratio
VISC™ Architecture
Diagram labels:
• Guest Sequential Code (OS & Hypervisor, Single Thread)
• Virtual SW layer: Converter, Guest ISA, Virtual ISA
• Global Front End
• Virtual HW Threads/Threadlets
• Virtual Cores: Virtual Core1, Virtual Core2, Virtual Core3, Virtual Core4
• Physical cores: Core1, Core2, Core3, Core4, each with an L1 D$
• L2$ & Memory
VISC™ Run-time SW Architecture
Diagram labels:
• Guest Code (ARM, X86)
• High level Virtual Machine
• Low level Virtual Machine
• Guest/VM to native mapping
• Dynamic optimization
• Hot Pass
• Native Code
• SMI API
• VISC™ Processor
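The slide names guest-to-native mapping and dynamic optimization without describing how they work. As a generic illustration of that family of techniques, and explicitly not Soft Machines' design, the sketch below shows a common pattern from dynamic binary translation: interpret guest code until a block proves hot, then translate it once and reuse the cached result. The two-opcode guest "ISA", the `HOT_THRESHOLD` value, and all names are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical guest code: a block is a tuple of (opcode, operand) pairs.
GuestBlock = Tuple[Tuple[str, int], ...]

HOT_THRESHOLD = 3  # assumed: interpreted runs before a block counts as hot


def interpret(block: GuestBlock, acc: int) -> int:
    """Slow path: decode and execute one guest instruction at a time."""
    for op, arg in block:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc


def translate(block: GuestBlock) -> Callable[[int], int]:
    """Decode the block once and return a reusable callable for it."""
    steps: List[Callable[[int], int]] = []
    for op, arg in block:
        if op == "add":
            steps.append(lambda acc, a=arg: acc + a)
        elif op == "mul":
            steps.append(lambda acc, a=arg: acc * a)
        else:
            raise ValueError(f"unknown opcode: {op}")

    def native(acc: int) -> int:
        for step in steps:
            acc = step(acc)
        return acc

    return native


class Runtime:
    """Counts block executions and caches translations of hot blocks."""

    def __init__(self) -> None:
        self.exec_counts: Dict[GuestBlock, int] = {}
        self.code_cache: Dict[GuestBlock, Callable[[int], int]] = {}

    def run(self, block: GuestBlock, acc: int) -> int:
        cached = self.code_cache.get(block)
        if cached is not None:
            return cached(acc)  # fast path: reuse the translation
        count = self.exec_counts.get(block, 0) + 1
        self.exec_counts[block] = count
        if count >= HOT_THRESHOLD:
            self.code_cache[block] = translate(block)
        return interpret(block, acc)
```

A caller would create one `Runtime` and pass each guest block through `run`; the result is the same on both paths, and only the decode cost changes once the block is cached.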
• Silicon-proven VISC™ architecture delivers a 3-4x IPC advantage on single- and multi-threaded applications without software changes
  • Resulting in a ~2-4x performance/watt advantage
• VISC architecture is scalable from IoT to mobile to servers due to its modularity and symmetry
  • Number of virtual cores, virtual threads, and virtual instruction layer
• VISC virtual instruction layer provides ISA-agnostic, optimized run-time platform capabilities
Summary