pulpino: a small single-core risc-v soc -...

1
1 Introduction 3 RI5CY - Implementation 4 PULPino 5 Conclusion and Outlook PULPino: A small single-core RISC-V SoC Andreas Traber , Florian Zaruba, Sven Stucki, Antonio Pullini, Germain Haugou, Eric Flamand, Frank K. Gürkaynak, Luca Benini ETH Zurich, Integrated Systems Laboratory PULPino is an open-source microcontroller-like platform featuring a 32-bit RISC-V core. PULPino evolved from the effort of publishing PULP, the Parallel Ultra-Low- Power Platform that is developed as a joint project between ETH Zurich and the University of Bologna, as open-source. Since PULP is a huge project, PULPino has been created as a first-step which re-uses the IPs and cores that were developed for PULP. While the focus for PULP is on extreme energy-efficiency, the spotlight for PULPino is on simplicity and ease of use. Our goal is to create a platform for low power signal processing, this means tuning the core micro-architecture and the ISA towards this goal. For this purpose we have created non-standard RISC-V instruction set extensions. Those extensions have a low overhead in terms of area and power when they are not in active use. Where those extensions are applicable, they allow for speedups of up to 2x. The first set of extensions that were created are hardware loops, post-in- crementing load and stores, multiply-accumulate with optional sub-word selection and extensions for the ALU that can perform common opera- tions like minimum, maximum, average and absolute value in one cycle. PULP Parallel Ultra Low Power P P 2 RI5CY - Signal Processing ISA Extensions Cortex-M4 OR10N (Base) RI5CY (Base) OR10N (Ext.) RI5CY (Ext.) aes-cbc keccak sha conv2d ӽW fdct ipm ӾU mmul8 mmul16 mmul32 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 2.1 Speedup vs RISC-V Simple vector addition for (i = 0; i < 100; i++) { d[i] = a[i] + 1; } Baseline mv x4, 0 mv x5, 100 Lstart: lw x2, 0(x10) addi x10, x10, 4 addi x2, x2, 1 sw x2, 0(x11) addi x11, x11, 4 addi x4, x4, 1 bne x4, x5, Lstart + hardware loops lp.setupi 100, Lend lw x2, 0(x10) addi x10, x10, 4 addi x2, x2, 1 sw x2, 0(x11) Lend: addi x11, x11, 4 + post. increment lp.setupi 100, Lend lw x2, 4(x10!) addi x2, x3, 1 Lend: sw x2, 4(x11!) - 2 instr. - branch cost - 2 instr. PULPino is a minimal system focused on the core. It has no caches, no memory hierarchy and no DMA. It re-uses the peripherals and the core from the PULP project. It contains an event unit that can put the core to sleep and wake it up when there is an event or interrupt from a periph- eral. To program PULPino the SPI slave or the advanced debug unit can be used. Both have access to the whole memory map and can control PULPino. PULPino can also be used standalone without an external host that controls it. In this mode it starts by using the boot loader contained in the boot ROM in order to load its program from an external SPI flash. PULPino is available for RTL simulation, FPGA and ASIC. The first tape- out will be done by the end of January in UMC 65 nm technology as a student project. RI5CY is a simple 4-stage in-order RISC-V core with support for vector- ized interrupts on 32 lines, events so that the core can sleep and save energy, exceptions on illegal instructions and memory accesses, basic debug support with software breakpoints and the possibility to use core internal performance counters. Integrated Systems Laboratory 1. scaled from 90nm to 65nm Core Instr. RAM instr data Data RAM Bridge Bridge Bridge AXI4 Interconnect APB Bridge UART GPIO SPI Master I C Boot RAM Adv. Debug Unit SPI Slave debug Timer Event Unit JTAG SPI SPI I C UART GPIO 2 2 PULPino is being released open-source under the SolderPad license. The release includes the complete PULPino RTL source, all IPs, the RI5CY core, the environment for RTL simulation and the complete FPGA build flow. We are currently working on extending the ISA further towards low power signal processing and adding a simple branch predictor to the core. For even easier extensibility and configuration PULPino is currently being ported to IP-XACT. 2. ARM Cortex M4, http://www.arm.com/products/processors/cortex-m/cortex-m4-processor.php RI5CY ARM Cortex M4 2 Technology 65 nm Conditions 25°C, 1.2 V Dynamic Power [μW/MHz] 17.5 Area [mm 2 ] 0.050 90 nm 25°C, 1.2 V 32.82 0.119 65 nm 25°C, 1.2 V 23.2 1 0.062 1 GPR addr rdata wdata Data Interface LSU ALU MULT CSR addr rdata Instruction Interface IF ID ID EX EX WB '0' '2' Decoder Prefetch Buffer CoreMark 0.5 1 1.5 2 2.5 3 CoreMark Score/MHz

Upload: others

Post on 18-Feb-2020

80 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PULPino: A small single-core RISC-V SoC - iis-projectsiis-projects.ee.ethz.ch/images/d/d0/Pulpino_poster_riscv2015.pdf · PULPino: A small single-core RISC-V SoC Andreas Traber, Florian

1 Introduction

3 RI5CY - Implementation

4 PULPino

5 Conclusion and Outlook

PULPino: A small single-core RISC-V SoCAndreas Traber, Florian Zaruba, Sven Stucki, Antonio Pullini, Germain Haugou, Eric Flamand, Frank K. Gürkaynak, Luca BeniniETH Zurich, Integrated Systems Laboratory

PULPino is an open-source microcontroller-like platform featuring a 32-bit RISC-V core. PULPino evolved from the effort of publishing PULP, the Parallel Ultra-Low-Power Platform that is developed as a joint project between ETH Zurich and the University of Bologna, as open-source. Since PULP is a huge project, PULPino has been created as a first-step which re-uses the IPs and cores that were developed for PULP. While the focus for PULP is on extreme energy-efficiency, the spotlight for PULPino is on simplicity and ease of use.

Our goal is to create a platform for low power signal processing, this means tuning the core micro-architecture and the ISA towards this goal. For this purpose we have created non-standard RISC-V instruction set extensions. Those extensions have a low overhead in terms of area and power when they are not in active use. Where those extensions are applicable, they allow for speedups of up to 2x.

The first set of extensions that were created are hardware loops, post-in-crementing load and stores, multiply-accumulate with optional sub-word selection and extensions for the ALU that can perform common opera-tions like minimum, maximum, average and absolute value in one cycle.

PULPParallel Ultra Low Power

P

P

2 RI5CY - Signal Processing ISA Extensions

Cortex-M4 OR10N (Base) RI5CY (Base) OR10N (Ext.) RI5CY (Ext.)

aes-cbc keccak sha conv2d fdct ipm mmul8 mmul16 mmul320.40.50.60.70.80.9

11.11.21.31.41.51.61.71.81.9

22.1

Spee

dup

vsRI

SC-V

Simple vector addition

for (i = 0; i < 100; i++) { d[i] = a[i] + 1;}

Baseline

mv x4, 0 mv x5, 100Lstart: lw x2, 0(x10) addi x10, x10, 4 addi x2, x2, 1 sw x2, 0(x11) addi x11, x11, 4 addi x4, x4, 1 bne x4, x5, Lstart

+ hardware loops

lp.setupi 100, Lend lw x2, 0(x10) addi x10, x10, 4 addi x2, x2, 1 sw x2, 0(x11)Lend: addi x11, x11, 4

+ post. increment

lp.setupi 100, Lend lw x2, 4(x10!) addi x2, x3, 1Lend: sw x2, 4(x11!)

- 2 instr. - branch cost

- 2 instr.

PULPino is a minimal system focused on the core. It has no caches, no memory hierarchy and no DMA. It re-uses the peripherals and the core from the PULP project. It contains an event unit that can put the core to sleep and wake it up when there is an event or interrupt from a periph-eral.

To program PULPino the SPI slave or the advanced debug unit can be used. Both have access to the whole memory map and can control PULPino. PULPino can also be used standalone without an external host that controls it. In this mode it starts by using the boot loader contained in the boot ROM in order to load its program from an external SPI flash.

PULPino is available for RTL simulation, FPGA and ASIC. The first tape-out will be done by the end of January in UMC 65 nm technology as a student project.

RI5CY is a simple 4-stage in-order RISC-V core with support for vector-ized interrupts on 32 lines, events so that the core can sleep and save energy, exceptions on illegal instructions and memory accesses, basic debug support with software breakpoints and the possibility to use core internal performance counters.

Integrated Systems Laboratory

1. scaled from 90nm to 65nm

Core

Instr.RAM

instr data DataRAM

Bridge

Bridge

Bridge

AXI4 Interconnect

APB

Bridge

UARTGPIO SPIMasterI C

BootRAM

Adv.Debug Unit

SPISlave

debug

Timer EventUnit

JTAGSPISPII CUARTGPIO

2

2

PULPino is being released open-source under the SolderPad license. The release includes the complete PULPino RTL source, all IPs, the RI5CY core, the environment for RTL simulation and the complete FPGA build flow.

We are currently working on extending the ISA further towards low power signal processing and adding a simple branch predictor to the core. For even easier extensibility and configuration PULPino is currently being ported to IP-XACT.

2. ARM Cortex M4, http://www.arm.com/products/processors/cortex-m/cortex-m4-processor.php

RI5CY ARM Cortex M42

Technology 65 nm

Conditions 25°C, 1.2 V

Dynamic Power [μW/MHz] 17.5

Area [mm2] 0.050

90 nm

25°C, 1.2 V

32.82

0.119

65 nm

25°C, 1.2 V

23.21

0.0621

GPR

addr rdatawdataData Interface

LSU

ALU

MULT

CSR

addrrdataInstruction Interface

IFID

IDEX

EXWB

'0'

'2'

DecoderPrefetchBuffer

CoreMark

0.5

11

1.5

2

2.5

3

Core

Mar

kSc

ore/

MH

z