
OpenPiton+Ariane: The RISC-V Hardware Research Platform

Princeton University and ETH Zürich

http://openpiton.org http://pulp-platform.org


Princeton Parallel Research Group

• Computer Architecture after Moore’s Law
• Redesigning the Data Center of the Future
• Biodegradable Computing (Materials)
• 12 PhD Students
• 3 Undergraduates

[Photo: Winter 2018 Ski Trip]

Support

This work was partially supported by the NSF under Grants No. CNS-1823222, CCF-1823032, CCF-1217553, CCF-1453112, and CCF-1438980, Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreements No. FA8650-18-2-7846, FA8650-18-2-7852, and FA8650-18-2-7862, AFOSR under Grant No. FA9550-14-1-0148, and DARPA under Grants No. N66001-14-1-4040 and HR0011-13-2-0005. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA), the NSF, AFOSR, or the U.S. Government.


The world’s first open source, general purpose, multithreaded manycore processor

• Open source manycore
• Written in Verilog RTL
• Scales to ½ billion cores
• Configurable core, uncore
• Includes synthesis and back-end flow
• Simulate in VCS, ModelSim, NCSim, Verilator, Icarus
• ASIC & FPGA verified
• ASIC power and energy fully characterized [HPCA 2018]
• Runs full stack multi-user Debian Linux
• Used for Architecture, Programming Language, Compilers, Operating Systems, Security, EDA research

[Figure: Tile, Chip, and Chipset hierarchy]

OpenPiton+Ariane

• Collaboration between Princeton University and the PULP team from ETH Zürich
• Goal is to develop a permissively licensed, Linux-capable manycore research platform based on RISC-V
• Ariane
– RV64GC core
– Linux capable
• OpenPiton
– Research manycore system
– OpenSPARC T1 based
– Coherent NoC, distributed cache

Parallel Ultra Low Power (PULP)

• Project started in 2013 by Luca Benini
• A collaboration between University of Bologna and ETH Zürich
– Large team: about 60 people in total, not all working on PULP
• Key goal: how to get the most BANG for the ENERGY consumed in a computing system
• We were able to start with a clean slate, with no need to remain compatible with legacy systems

How we started with open source processors

• Our research was not developing processors…
• … but we needed good processors for systems we build for research
• Initially (2013) our options were
– Build our own (support for SW and tools)
– Use a commercial processor (licensing, collaboration issues)
– Use what is openly available (OpenRISC, ...)
• We started with OpenRISC
– First chips until mid-2016 were all using OpenRISC cores
– We spent time improving the microarchitecture
• Moved to RISC-V later
– Larger community, more momentum
– Transition was relatively simple (new decoder)

PULP RISC-V Family Explained

32 bit
§ Low Cost Core
  § Zero-riscy: RV32-ICM
  § Micro-riscy: RV32-CE
§ Core with DSP enhancements
  § RI5CY: RV32-ICMX
  § SIMD, HW loops, bit manipulation, fixed point
§ Floating-point capable Core
  § RI5CY + FPU: RV32-ICMFX

64 bit
§ Linux capable Core
  § Ariane: RV64-GC
  § Full privileged specification
  § “OS Core”

See also other tutorials on PULP/HERO

An Application class processor

• Virtual Memory
– Multi-program environment
– Efficient sharing and protection
• Operating System
– Highly sequential code
– Increase frequency to gain performance
• Large software infrastructure
– Drivers for hardware (PCIe, ethernet)
– Application SW (e.g.: Tensorflow, …)

• Larger address space (64-bit)
• Requires more hardware support
– MMU (TLBs, PTW)
– Privilege Levels
– More Exceptions (page fault, illegal access)

→ Ariane: an application class processor


ARIANE: Linux capable 64-bit core

• Application class processor
• Linux Capable
– Tightly integrated D$ and I$
– M, S and U privilege modes
– TLB, SV39
– Hardware PTW
• Optimized for performance
– Frequency: 1.5 GHz (22 FDX)
– Area: ~ 175 kGE
– Critical path: ~ 25 logic levels
• 6-stage pipeline
– In-order issue
– Out-of-order write-back
– In-order commit
• Scoreboarding
• Designed for extendibility
• Branch-prediction
– Return Address Stack (RAS)
– Branch Target Buffer (BTB)
– Branch History Table (BHT)
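The SV39 entry in the list above refers to the RISC-V Sv39 virtual-memory scheme: 39-bit virtual addresses, 4 KiB pages, and a three-level page table indexed by three 9-bit virtual page numbers. The sketch below is only an illustration of how such an address splits into the indices a hardware page-table walker would use; the function and class names are hypothetical and do not come from the Ariane source code.

```python
from typing import NamedTuple

PAGE_OFFSET_BITS = 12   # 4 KiB pages
VPN_BITS = 9            # each of the three VPN fields is 9 bits wide
VA_BITS = 39            # Sv39 virtual addresses are 39 bits wide


class Sv39Address(NamedTuple):
    vpn2: int    # index into the root page table
    vpn1: int    # index into the second-level table
    vpn0: int    # index into the leaf table
    offset: int  # byte offset within the 4 KiB page


def is_canonical(va: int) -> bool:
    """Bits 63..39 must all equal bit 38 for a valid Sv39 address."""
    upper = va >> (VA_BITS - 1)  # bits 63..38, 26 bits in total
    return upper == 0 or upper == (1 << (64 - VA_BITS + 1)) - 1


def split_sv39(va: int) -> Sv39Address:
    """Split a 64-bit virtual address into page-table indices and offset."""
    if not 0 <= va < (1 << 64):
        raise ValueError("expected a 64-bit unsigned address")
    if not is_canonical(va):
        raise ValueError("non-canonical Sv39 address")
    mask = (1 << VPN_BITS) - 1
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn0 = (va >> PAGE_OFFSET_BITS) & mask
    vpn1 = (va >> (PAGE_OFFSET_BITS + VPN_BITS)) & mask
    vpn2 = (va >> (PAGE_OFFSET_BITS + 2 * VPN_BITS)) & mask
    return Sv39Address(vpn2, vpn1, vpn0, offset)


if __name__ == "__main__":
    print(split_sv39(0x80201ABC))
```

A TLB caches the result of this walk so that most accesses skip the three table lookups.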


ARIANE: Linux capable 64-bit core


RISC-V Debug

• RI5CY/Ariane contain performance counters
– SoC performance monitoring not part of RISC-V spec
• Trace task group working on PC tracing
– UltraSoC leading efforts
– PULP effectively engaging
– Working on implementation for PULPissimo

• Draft specification 0.13
– More or less frozen
• Defines debug registers for
– run/halt/single-step
– reading/writing GPR, FPR and CSRs
– Querying hart status
• JTAG interface
• OpenOCD support
• SiFive influenced

OpenPiton System Overview

[Figure, built up over several slides: system overview showing Tile, Chip, and Chipset. Labeled blocks: P-Mesh Off-Chip Routers (3), Chip Bridge, P-Mesh Chipset Crossbars (3), DRAM, Wishbone SDHC, AXI I/O]


Tile Overview

[Figure: tile block diagram. Labeled blocks: Modified OpenSPARC T1 Core, FPU, CCX Arbiter, L1.5 Cache, L2 Cache Slice + Directory Cache, MITTS (Traffic Shaper), P-Mesh Routers (3), To Other Tiles]


Silicon Proven Designs: Ariane

• Ariane has been taped out in GlobalFoundries 22nm FDX in 2017 and 2018
• The system features 16 kByte of instruction and 32 kByte of data cache.
• Poseidon:
– Area: 0.23 mm², 175 kGE
– 0.2 - 1.7 GHz (0.5 V – 1.15 V)
• Kosmodrom:
– RV64GCXsmallFloat
– Transprecision / Vector FPU
– Ariane HP
  • 8T library, 0.8V, 1.3 GHz
  • 55 mW @ 1 GHz
– Ariane LP
  • 7.5T ULP library, 0.5V, 250 MHz
  • 5 mW @ 200 MHz

[Figure: Poseidon layout (Ariane) and Kosmodrom layout (Ariane LP, Ariane HP, L2, NTX). Other labels: Issue, QUENTIN, KERBIN, HYPERDRIVE]
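The power figures quoted for Ariane HP and Ariane LP can be compared by converting them to average energy per clock cycle (E = P / f). The helper below is a hypothetical back-of-the-envelope calculation; the slide does not state the workload or what the power numbers include, so the result is only a rough comparison.

```python
def energy_per_cycle_pj(power_mw: float, freq_mhz: float) -> float:
    """Average energy per clock cycle in picojoules (E = P / f)."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    # 1 mW / 1 MHz = 1 nJ per cycle, so scale by 1000 to get picojoules.
    return power_mw * 1000.0 / freq_mhz


if __name__ == "__main__":
    hp = energy_per_cycle_pj(55.0, 1000.0)  # Ariane HP: 55 mW @ 1 GHz
    lp = energy_per_cycle_pj(5.0, 200.0)    # Ariane LP: 5 mW @ 200 MHz
    print(f"HP: {hp} pJ/cycle, LP: {lp} pJ/cycle")
```

Under these assumptions the low-power variant spends roughly half the energy per cycle of the high-performance one, at one fifth of the clock rate.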


Silicon Proven Designs: Piton Chip

• 25-core
– 2 Threads per core
– 64-bit Architecture
– Modified OpenSPARC T1 Core
• 3 NoCs (P-Mesh)
– 64-bit, 2D Mesh
– Extend off-chip enabling multichip systems
• Directory-Based Cache System
– 64KB L2 Cache per core (Shared)
– 8KB L1.5 Data Cache
– 8KB L1 Data Cache
– 16KB L1 Instruction Cache
• IBM 32nm SOI Process
– 6mm x 6mm
– 460 Million Transistors
• Target: 1GHz Clock @ 900mV
• 208 Pin CQFP Package

[Figure: chip diagram. Labels: Off-Chip Memory and I/O Interface, Tile, Network On-Chip Links]
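To give a feel for what a 2D mesh network implies for latency, the sketch below routes a packet between tiles using dimension-ordered (XY) routing. Two things here are assumptions that the slides do not state: that the 25 tiles are arranged as a 5 x 5 grid, and that packets travel along X first and then along Y. All names are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) tile position in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the tiles visited from src to dst: X dimension first, then Y."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


def hop_count(src: Coord, dst: Coord) -> int:
    """Number of router-to-router hops; equals the Manhattan distance."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


if __name__ == "__main__":
    MESH_DIM = 5  # assumption: 25 tiles laid out as 5 x 5
    far_corner = (MESH_DIM - 1, MESH_DIM - 1)
    print(xy_route((0, 0), far_corner))
    print(hop_count((0, 0), far_corner))
```

In such a layout the worst-case distance grows with the sum of the mesh dimensions rather than with the tile count.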


Piton Test Setup

[Figure: Piton test setup. Labeled parts: DRAM + I/O, Chipset FPGA (Kintex 7), Bridge FPGA (Spartan 6), Piton + Heat Sink, Bulk Decoupling, Power Supply, Misc. Configuration]

[McKeown et al, HotChips 2016] [McKeown et al, IEEE MICRO 2017] [McKeown et al, HPCA 2018]


Putting it all together

[Figure: tile block diagram. Labeled blocks: Modified OpenSPARC T1 Core, FPU, CCX Arbiter, L1.5 Cache, L2 Cache Slice + Directory Cache, MITTS (Traffic Shaper), P-Mesh Routers (3), To Other Tiles]

§ Native L1.5 interface is the ideal point to attach a new core
§ Well defined interface similar to CCX from OpenSPARC
§ Write-through cache protocol
§ Coherency mechanism: only need to support invalidation messages
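The last two points explain why attaching a core at the L1.5 is comparatively simple: a write-through private cache never holds dirty data, so the only coherence action it must handle is dropping a line when told to. The toy model below illustrates that idea with word-granularity entries and a single shared store standing in for the L1.5/L2 side. It is a simplified sketch with hypothetical class names, not the actual OpenPiton or Ariane interface.

```python
from typing import Dict, List


class SharedMemory:
    """Stand-in for the L1.5/L2 side: holds the authoritative data."""

    def __init__(self) -> None:
        self.data: Dict[int, int] = {}
        self.caches: List["WriteThroughL1"] = []

    def attach(self, cache: "WriteThroughL1") -> None:
        self.caches.append(cache)

    def write(self, addr: int, value: int, writer: "WriteThroughL1") -> None:
        self.data[addr] = value
        # Every other private cache only needs to drop its stale copy.
        for cache in self.caches:
            if cache is not writer:
                cache.invalidate(addr)

    def read(self, addr: int) -> int:
        return self.data.get(addr, 0)


class WriteThroughL1:
    """Private cache that never holds dirty data."""

    def __init__(self, memory: SharedMemory) -> None:
        self.memory = memory
        self.lines: Dict[int, int] = {}
        memory.attach(self)

    def load(self, addr: int) -> int:
        if addr not in self.lines:  # miss: refill from the level below
            self.lines[addr] = self.memory.read(addr)
        return self.lines[addr]

    def store(self, addr: int, value: int) -> None:
        self.lines[addr] = value               # update the local copy
        self.memory.write(addr, value, self)   # and always write through

    def invalidate(self, addr: int) -> None:
        self.lines.pop(addr, None)  # nothing to write back


if __name__ == "__main__":
    mem = SharedMemory()
    core_a, core_b = WriteThroughL1(mem), WriteThroughL1(mem)
    core_a.store(0x40, 7)
    print(core_b.load(0x40))
```

Because invalidation discards only clean data, the core-side cache needs no write-back path and no ownership states.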


FPGA Prototyping Platforms

Available:
• Digilent Genesys2
– $999 ($600 academic)
– 1-2 cores at 66MHz
• Xilinx VC707
– $3500
– 1-4 cores at 60MHz
• Digilent Nexys Video
– $500 ($250 academic)
– 1 core at 30MHz

In progress:
• Xilinx VCU118, BittWare XUPP3R
– $7000-8000
– >100MHz
• Amazon AWS F1
– Rent by the hour


OpenPiton Philosophy

• Focus/Value is in the Uncore
– Not religious about ISA
– Provide whole working system
• We are practical
– Use Verilog (Ariane is SV)
– Industry standard tools
– Use the best tool for job (including commercial CAD tools)
• Primarily for research, but welcome industry also
• Licensing
– All our code, Hypervisor, are BSD-like
– Linux, T1 core (GPL or LGPL)
– Ariane (Solderpad)
• Scalability (Million Core)


OpenPiton Community

• Visit http://openpiton.org
• [email protected]
• Building a community
– Welcome community contributions
– Thousands of Downloads
• Google Group


Doing Research with OpenPiton + Ariane

• Software
– Install on Debian, test scalability
• Operating System
– Recompile kernel, rebuild SW, run
• Hardware/Software Co-design
– Add new instructions, change compiler/HV/OS/SW
• Architecture
– Change parameters, rebuild HW, run

[Figure: system stack layers: Apps, Compiler/Runtime, HV/OS, ISA, HW]


Enabled Research

• Coherence Domain Restriction
– Fu et al. MICRO 2015
• Execution Drafting
– McKeown et al. MICRO 2014
• Memory Inter-arrival Time Traffic Shaper
– Zhou et al. ISCA 2016
• Oblivious RAM
– Fletcher et al. ASPLOS 2015
• DVFS modelling
• Multiple outside papers
• Numerous class research projects

[Figure: Execution Drafting pipeline (Fetch, Thread Select, Decode, Execute, Memory, and Writeback stages) showing Program A and Program B instructions, lead instructions, and successfully drafted instructions]

[Figure: distribution of traffic for App 1, App 2, and App 3: frequency versus request inter-arrival time (t, 2t, 3t) over time, contrasting uniform traffic with more bursty traffic]
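The traffic-shaper material above is described in terms of a distribution of request inter-arrival times. The sketch below only shows how such a distribution can be computed from a list of request timestamps, binned in multiples of a base interval t. It is not the MITTS mechanism itself, and the function name and example traces are made up for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def interarrival_histogram(timestamps: Sequence[int], bin_width: int) -> Dict[int, int]:
    """Count gaps between consecutive requests, binned in multiples of bin_width."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return dict(Counter(gap // bin_width for gap in gaps))


if __name__ == "__main__":
    T = 10  # hypothetical base interval t, in cycles
    uniform = [0, 10, 20, 30, 40]  # one request every t cycles
    bursty = [0, 1, 2, 3, 40]      # a burst followed by a long idle gap
    print(interarrival_histogram(uniform, T))
    print(interarrival_histogram(bursty, T))
```

Both example traces issue five requests over 40 cycles, yet their histograms differ: the uniform trace falls into a single bin, while the bursty one splits between very short gaps and one long gap.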