raven: a 28nm risc-v vector processor with … raven: a 28nm risc-v vector processor with integrated...

45
1 Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer, Andrew Waterman, Alberto Puggelli, Jaehwa Kwak, Ruzica Jevtic, Ben Keller, Stevo Bailey, Milovan Blagojevic, Pi-Feng Chiu, Henry Cook, Rimas Avizienis, Brian Richards, Elad Alon, Borivoje Nikolic, Krste Asanovic University of California, Berkeley

Upload: doankiet

Post on 10-Mar-2018

241 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

1

Raven: A 28nm RISC-V Vector Processor with

Integrated Switched-Capacitor DC-DC

Converters and Adaptive Clocking

Yunsup Lee, Brian Zimmer, Andrew Waterman,

Alberto Puggelli, Jaehwa Kwak, Ruzica Jevtic, Ben Keller,

Stevo Bailey, Milovan Blagojevic, Pi-Feng Chiu,

Henry Cook, Rimas Avizienis, Brian Richards,

Elad Alon, Borivoje Nikolic, Krste Asanovic

University of California, Berkeley

Page 2: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Motivation

Energy efficiency constrains everything - SoCs are designed with an increasing number of

voltage domains for better power management

- Dynamic voltage and

frequency scaling (DVFS)

maximizes energy

efficiency while

meeting performance

constraints

2

• Off-chip conversion

Few voltage domains

Costly off-chip components

Slow mode transitions

✖ ✖ ✖

• On-chip conversion

Many domains

No off-chip components

Fast transitions

✔ ✔ ✔

Page 3: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

3

iPhone 6

Front Side Back Side

Credits: iFixit

Voltage

Regulators

Primary Power

Management IC

Secondary Power

Management IC

Page 4: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Raven Project Goals

Build a microprocessor that is:

4

• Fine-grained DVFS

• High conversion efficiency Energy-efficient

• Entirely on-chip converter

• Low area overhead Low-cost

Page 5: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Talk Outline

Motivation/Raven Project Goals

On-Chip Switched-Capacitor DC-DC Converters

Raven3 Chip Architecture

Raven3 Implementation

Raven3 Evaluation

RISC-V Chip Building at UC Berkeley

Summary

For more details on the Switched-Capacitor DC-DC Converters, please take a

look at HC23 tutorial “Fully Integrated Switched-Capacitor DC-DC Conversion”

by Elad Alon

5

Page 6: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Switched-Capacitor (SC) DC-DC Converters

6

Page 7: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

SC DC-DC Converters Cont’d

7

Partition capacitor

and switches into

many “unit cells” for

better modularity

Keeps the custom

design modular

Makes it easier to

floorplan SC DC-DC

converter

Page 8: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Traditional Approach: Interleaved Switching

8

Switch one unit cell at a time to smooth out

voltage ripple Pros

- Voltage ripple at the output is

suppressed

- Great for digital designs with

fixed frequency clocks

Cons - Each unit cell charge shares

with other unit cells

- Causes an efficiency loss

beyond typical switching

losses

Page 9: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Raven’s Approach: Simultaneous Switching

Switch all unit cells simultaneously when Vout

reaches a lower bound Vref

Add an adaptive clock generator so that clock

frequency tracks the voltage ripple 9

Pros - Simplifies the design

- No charge sharing losses

- Better energy efficiency

Cons - Need to deal with big ripple

on voltage output

Page 10: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Self-Adjusting Clock Generator

Replica tracks critical path with voltage ripple

Controller quantizes clock edge

10

Block Diagram (Tunable Replica Path)

DLL2Ghz input,

16 phases

ControllerSelect closest edge

Tunable

Replica

Path

Vout .........

...

...

......

Adaptive system clock

Vout

Page 11: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Reconfigurable SC Converters for DVFS

11

Simple lower bound control

1V Mode 1.8V 1/2 Mode 1V 2/3 Mode 1V 1/2 Mode(~1V) (~0.9V) (~0.67V) (~0.5V)

Unit Cell Topologies

Lower-bound Controller

Operational Waveform

1V

1V

1V

1.8V

1V

1.8V

1V

1V

1V

1V

1V

1V

VoutVout

VoutVout

+FSM

2GHz comparator

Vref

Vout toggle

toggle

Vref

Vout

Φ1Φ1 Φ1 Φ1

Φ2Φ2Φ2

Φ2

Φ2

Φ2

Φ2 Φ1

Φ1

Φ1

Φ1

off

off

Φ2

Φ2 Φ1

Φ1

Φ1

Φ1

off

Φ2

off

off

Φ1

Φ1

Φ1

Φ1

off

off

Φ2

Φ2

Φ2

Φ2

off

off

off

off

Efficiency Improvements

Interleaving Rippling (this approach)

Videal VoutVlow

VhighVout

VidealVlow

Vhigh

clk

Adaptive clock negates losses by increasing the average clock frequency

caused by charge sharing with interleaved converters

clk

No losses∆ V loss

∆ V

Vlow

Vin

Vlow

VinVhigh

Vlow

VinVin

Vhigh

toggletoggle

Operational Waveform

z

bypass

serial

Page 12: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Raven3 Chip Architecture

12 12

Page 13: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

RISC-V is a new, open, and completely free

general-purpose instruction set architecture (ISA)

developed at UC Berkeley starting in 2010

RISC-V is simple and a clean-slate design - The base (enough to boot Linux and run modern

software stack) has less than 50 instructions

RISC-V is modular and has been designed to be

flexible and extensible - Better integrate accelerators with host cores

RISC-V software stack - GNU tools (GCC/Binutils/glibc/newlib/GDB),

LLVM/Clang, Linux, Yocto (OpenJDK, Python, Scala)

Checkout http://riscv.org for more details 13

Page 14: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Rocket Scalar Core

14

PC IF ID EX MEM

To Vector

Accelerator

64-bit 5-stage single-issue in-order pipeline

Design minimizes impact of long clock-to-output delays of

SRAMs

64-entry BTB, 256-entry BHT, 2-entry RAS

MMU supports page-based virtual memory

IEEE 754-2008-compliant FPU

Supports SP, DP Fused-Multiply-Add (FMA) with HW

support for subnormals

WB

Page 15: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

ARM Cortex-A5 vs. RISC-V Rocket

Category ARM Cortex-A5 RISC-V Rocket

ISA 32-bit ARM v7 64-bit RISC-V v2

Architecture Single-Issue In-Order Single-Issue In-Order 5-stage

Performance 1.57 DMIPS/MHz 1.72 DMIPS/MHz

Process TSMC 40GPLUS TSMC 40GPLUS

Area w/o Caches 0.27 mm2 0.14 mm2

Area with 16K Caches 0.53 mm2 0.39 mm2

Area Efficiency 2.96 DMIPS/MHz/mm2 4.41 DMIPS/MHz/mm2

Frequency >1GHz >1GHz

Dynamic Power <0.08 mW/MHz 0.034 mW/MHz

- PPA reporting conditions - 85% utilization, use Dhrystone for benchmark, frequency/power at TT

0.9V 25C, all regular Vt transistors

- 10% higher in DMIPS/MHz, 49% more area-efficient 15

Page 16: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Hwacha Vector Accelerator

16

Page 17: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

17

1.3mm X 1.8mm Raven3 Chip

Fabricated in ST 28nm FDSOI

(Fully Depleted Silicon-on-Insulator)

Chip area: 2.34 mm2

Single-Core RISC-V Processor with a Vector Accelerator

Core area: 1.19mm2

Integrated Switched-Capacitor DC-DC Converter

Converter area: 0.19mm2

Area overhead: 16%

Adaptive Clock Generator

971 MHz

34 GFLOPS/W w/o Converter

26 GFLOPS/W w/ Converter

122 pins total

60 for signals

62 for power/ground

Page 18: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Raven3 Test setup

Conf. 1

Conf. 2

Conf. 3

18

Daughterboard: Chip-on-board

Motherboard: Programmable

supplies

Host: Zedboard (ZYNQ FPGA,

network accessible)

Page 19: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Raven3 Voltage Measurements

19

Page 20: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Measuring Efficiency

20

Traditional efficiency measurement is not

appropriate for our design 1. Difficult to measure on-chip current and voltage

2. Cannot measure adaptive clock overhead

Page 21: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Raven3 Achieves >80% System Efficiency

System efficiency = Ratio of energy required to

finish same workload in same time with the

converter versus without the converter

21

Page 22: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Raven3 Achieves 34 Double-Precision GFLOPS/W

Double-precision matrix-multiplication running on

vector unit used as energy-efficiency metric - 34 GFLOPS/W without DC-DC converter

- 26 GFLOPS/W with DC-DC converter

- Note, GFLOPS/W is inverse of nJ/FLOP

22

Page 23: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Five 28nm & Six 45nm RISC-V Chips Taped Out So Far

23

Raven-1 Raven-2

Raven-3

Raven-3.5

EOS14

EOS16

EOS18

EOS20

EOS22 EOS24

2011 2012 2013 2014 2015

May Apr Aug Feb Jul Sep Mar Nov Mar

Raven: ST 28nm FDSOI

SWERVE: TSMC 28nm

EOS: IBM 45nm SOI

SWERVE

Apr

Page 24: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Agile Hardware Development Methodology

24

C++

FPGA

ASIC Flow

Tape-in

Tape-out

Big Chip

Tape-out

“Tape-in”: Designs that

could be taped out - LVS clean & DRC sane

- Pass RTL/gate-level

simulation and timing

Fully scripted ASIC flow - RTL change to chip <1d

- Get early feedback

- Automatic nightly

regressions

- Identify source of subtle

bugs

Check longer programs on

FPGA

Iterate quickly on RTL with

using the

C++ emulator (Checkout

chisel.eecs.berkeley.edu

for more details)

Page 25: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Summary

28nm Raven3 processor features: - Fine-grained, wide-range DVFS (20ns, 0.45-1V)

- Entirely on-chip voltage conversion

- High system efficiency (>80%)

- Extreme energy efficiency (34/26 GFLOPS/W)

Key enablers - RISC-V, a simple, yet powerful ISA (free and open!)

- Agile development, build hardware like software

- Chisel, lets designers do more things with less effort

RISC-V core generators and software tools open-

sourced at http://riscv.org

Energy-efficient, low-cost on-chip DC/DC

converters are buildable!

25

Page 26: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Acknowledgements

Funding: BWRC members, ASPIRE members,

DARPA PERFECT Award Number HR0011-12-2-

0016, Intel ARO, AMD, GRC, Marie Curie FP7,

NSF GRFP, NVIDIA Fellowship

Fabrication donation by STMicroelectronics

Tom Burd, James Dunn, Olivier Thomas, and

Andrei Vladimirescu

26

Page 27: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

27

SUPPLEMENTAL

SLIDES

Page 28: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Test Chip Setup

28

Page 29: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

FPGA Setup

29

Page 30: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

“Rocket Chip” SoC Generator

30

Page 31: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Who should use the Rocket Chip Generator?

31

1. Change Parameters 2. Develop New Accelerators 3. Develop Own RISC-V Core 4. Develop Own Device

Page 32: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Chisel is a new HDL embedded in Scala - Rely on good software engineering practices such as

abstraction, reuse,

object oriented

programming,

functional programming

- Build hardware

like software

Chisel is not a

high-level

synthesis tool - It is a program

that writes RTL

Chisel Program

C++

code FPGA

Verilog ASIC

Verilog

Software

Simulator

C++ Compiler

Scala/JVM

FPGA

Emulation

FPGA Tools

GDS

Layout

ASIC Tools

32

Page 33: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Functional Programming 101

(1, 2, 3) map { n => n+1 } Result: (2, 3, 4)

(1, 2, 3) zip (a, b, c) Result: ((1, a), (2, b), (3, c))

((1, a), (2, b), (3, c)) map { case (left, right) =>

left } Result: (1, 2, 3)

(1, 2, 3) foreach { n => print(n) } Result: “123”

for (x <- 1 to 3; y <- 1 to 3) yield (x, y) Result: ((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2),

(3,3))

33

Page 34: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Functional Programming Example Used in AHB-Lite Crossbar

class AHBXbar(nMasters: Int, nSlaves: Int) extends Module { val io = new Bundle { val masters = Vec.fill(nMasters){new AHBMasterIO}.flip val slaves = Vec.fill(nSlaves){new AHBSlaveIO}.flip } val buses = List.fill(nMasters){ Module(new AHBBus(nSlaves)) } val muxes = List.fill(nSlaves){ Module(new AHBSlaveMux(nMasters)) } (buses.map(b => b.io.master) zip io.masters) foreach { case (b, m) => b <> m } (muxes.map(m => m.io.out) zip io.slaves ) foreach { case (x, s) => x <> s } for (m <- 0 until nMasters; s <- 0 until nSlaves) yield { buses(m).io.slaves(s) <> muxes(s).io.ins(m) } } 34

Page 35: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

TileLinkIO

35

- TileLinkIO consists of Acquire, Probe, Release, Grant, Finish

Client

Manager

Cache

Acqu

ire

Fin

ish

Gra

nt

Pro

be

Rele

ase

Client

Acqu

ire

Fin

ish

Gra

nt

Cache

Pro

be

Rele

ase

Page 36: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

UncachedTileLinkIO

36

Client

Manager

Cache

Acqu

ire

Fin

ish

Gra

nt

Pro

be

Rele

ase

Client

Acqu

ire

Fin

ish

Gra

nt

- UncachedTileLinkIO consists of Acquire, Grant, Finish - Convertors for TileLinkIO/UncachedTileLinkIO in uncore

library

Page 37: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

ROCCIO

Rocket sends

coprocessor instruction

via the Cmd interface

Accelerator responds

through Resp interface

Accelerator sends

memory requests to

L1D$ via CacheIO

busy bit for fences

IRQ, S, exception bit

used for virtualization

UncachedTileLinkIO for

instruction cache on

accelerator

PTWIO for page-table

walker ports on

accelerator

37

Page 38: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Vector Bank Execution Diagram

R R

R R

R

W

R

R

W

R

R

W

R

R

38

After a 2-cycle initial startup latency, the banked RF is

effectively able to read out 2 operands/cycle.

Page 39: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Our Physical Design Flow

39

RTL Code (Verilog)

Synthesis

Place-and-Route

Gate-level Netlist

Signed-Off Design

Formality

Formal Verification

PrimeTime/StarRC

Static Timing Analysis

VCS Post-PNR

Gate-level Simulation

Hierarchical SYN & PNR

UPF-based MV SYN & PNR

Chisel Source Code

Chisel

Page 40: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

What’s Different about RISC-V?

Simple - Far smaller than other commercial ISAs

Clean-slate design - Clear separation between user and privileged ISA - Avoids µarchitecture or technology-dependent

features

A modular ISA - Small standard base ISA - Multiple standard extensions

Designed for extensibility/specialization - Variable-length instruction encoding - Vast opcode space available for instruction-set

extensions

Stable - Base and standard extensions are frozen - Additions via optional extensions, not new versions

40

Page 41: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

SPECint2006 Code Size

RISC-V is smallest ISA for 32- and 64-bit addresses

- Average 34% smaller for RV32C, 42% smaller for RV64C

41

142%

100%

141% 131% 129%

169%

80%

100%

120%

140%

160%

180%

RV64C RV64 X86-64 ARMv8 MIPS64

64-bit Address

134%

100%

140%

126% 136%

101%

173%

126%

80%

100%

120%

140%

160%

180%

32-bit Address

Page 42: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

RISC-V Documentation, Software Ecosystem

Documentation - RISC-V User-Level ISA Specification V2

- RISC-V Compressed ISA Specification V1.7

- RISC-V Privileged-Level ISA Specification V1.7

Modern Software Stack - GCC/Binutils/glibc/newlib/GDB

- LLVM/Clang

- Linux

- Yocto (OpenJDK, Python, Scala, …)

RISC-V Software Implementation - ANGEL JS ISA Simulator

- Spike ISA Simulator

- QEMU

Checkout http://riscv.org for more details 42

Page 43: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

RISC-V Core Generators from UC Berkeley

Z-scale: Family of Tiny Cores - Similar class: ARM Cortex M0/M0+/M3/M4

- Integrates with AHB-Lite interconnect

- Open-Sourced

Rocket: Family of In-order Cores - Currently 64-bit single-issue only

- Plans to work on dual-issue, 32-bit options

- Similar class: ARM Cortex A5/A7/A53

- Will integrate with AXI4 interconnect

- Open-Sourced

BOOM: Family of Out-of-Order Cores - Supports 64-bit single-, dual-, quad-issue

- Similar class: ARM Cortex A9/A15/A57

- Will integrate with AXI4 interconnect

43

Page 44: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

CoreMark Scores

44

0

1.00

2.00

3.00

4.00

5.00

6.00

CoreMark/MHzC

ore

Mark

/MH

z

UC BerkeleyIndustry*Comparisons

2

Checkout Chris Celio’s “BOOM: Berkeley Out-of-Order

Machine” talk from 2nd RISC-V Workshop for more details

Page 45: Raven: A 28nm RISC-V Vector Processor with … Raven: A 28nm RISC-V Vector Processor with Integrated Switched-Capacitor DC-DC Converters and Adaptive Clocking Yunsup Lee, Brian Zimmer,

Glossary

DVFS: Dynamic Voltage and Frequency Scaling

SC: Switched Capacitor

DLL: Delay-Locked Loop

LDO: Low-Dropout Regulator

FDSOI: Fully Depleted Silicon-on-Insulator

FMA: Fused-Multiply-Add

BTB: Branch Target Buffer

BHT: Branch History Table

RAS: Return Address Stack

45