out-of-order openrisc 2 semesters project
Out-of-Order OpenRISC
2 semesters project
Semester B: OR1200 ISA Extension Final B Presentation
By: Vova Menis-Lurie, Sonia Gershkovich
Advisor: Mony Orbach
10.3.14
Spring 2013
Content:
1. Project Overview
   a. Background
   b. Goals
2. The System: OR1200
3. Project Flow
   a. Simulation Environment
   b. Out-of-Order implementation
   c. Super Scalar implementation
   d. ISA Extension
4. Conclusions
Project Overview
Background
• OpenRISC 1200 (OR1200) is an open-source Verilog implementation of the OR1000 ISA
• In part A, we created a basic working environment on the XUPV5 board and an SoC with the OR1200 CPU
Project Goal
Initial goal:
Out-of-Order execution processor implementation based on the OR1200 implementation
Changed goal:
Super Scalar processor implementation based on the OR1200 implementation
Final goal:
ISA Extension implementation for OR1200
[Figure: CPU]
[Figure: OR1200 top-level block diagram. Blocks: CPU, QMEM, MMU (IMMU, DMMU), Cache (ICache, DCache), Store Buffer, WBI (Instruction WBIU, Data WBIU). All data paths are 32 bits wide; the two WBIUs connect to the WB buses.]
Project Flow
1. Cache initialization function in assembly to enable the cache.
   (The WB interface protocol requires 3 cycles for each transaction, which is not effective for RTL analysis and implementation improvements.)
2. Simulation environment creation (testbench)
3. Out-of-Order implementation – attempt
4. Super-Scalar implementation – attempt
5. ISA extension of the current implementation
Simulation Environment
Environment features:
• UART interface emulation
• Waveform generation
• One Makefile for:
  • RTL compilation
  • Testbench instantiation
  • C program compilation
  • Running the simulation
  • Assembly code file creation
  • XILINX RAM initialization file
• Advanced monitor:
  • Monitors all data and control transactions of the SoC
  • Monitors states and SPRS values
  • Creates log files with the desired information:
    • State of the register file after each command
    • Execution time analysis
Out-of-Order implementation – attempt
Fundamental statements (based on the Tomasulo algorithm):
• Execution parallelism must be implemented
• Non-architectural shadow registers must be implemented
• In-order commitment (software executes in order)
[Figure: OR1200 CPU block diagram. Blocks: OR1200 IF, GenPC, CTRL, Except, Freeze, ALU, MAC, LSU, FPU, SPRS, CFGR, RF, Operand MUX, WB MUX. Signals: PC, Next PC.]
• LSU instruction parallelism requires multi-port memory and a wider bus: multi-port Cache, QMEM and MMU
• Branch prediction is not necessary: the delay slot is handled at the compiler level
• Multiple ALUs are not an effective solution: ALU instructions already execute in one cycle
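The combination of parallel execution and in-order commitment can be illustrated with a toy timing model. This is only a sketch: the latencies, the one-issue-per-cycle rule and the function name `simulate` are assumptions made for illustration, not figures from the OR1200, and data dependencies are ignored.

```python
# Toy model: instructions issue one per cycle, finish out of order
# (depending on unit latency), and commit strictly in program order.
# ASSUMPTION: latencies below are invented for illustration only.

def simulate(program, latency):
    """program: list of (name, unit); latency: dict unit -> cycles.

    Returns (completion_order, commit_order, commit_cycle).
    """
    finish = {}
    for issue_cycle, (name, unit) in enumerate(program):
        finish[name] = issue_cycle + latency[unit]

    # Order in which results become available (held in shadow registers).
    completion_order = sorted(finish, key=lambda n: finish[n])

    commit_order, commit_cycle = [], {}
    last_commit = 0
    for name, _unit in program:
        # The oldest instruction commits once it has finished,
        # at most one commit per cycle.
        last_commit = max(last_commit + 1, finish[name])
        commit_cycle[name] = last_commit
        commit_order.append(name)
    return completion_order, commit_order, commit_cycle


program = [("load", "LSU"), ("add", "ALU"), ("mul", "MAC")]
latency = {"LSU": 4, "ALU": 1, "MAC": 3}
completion_order, commit_order, commit_cycle = simulate(program, latency)
```

In this example the add finishes before the load, but its result has to wait in a non-architectural register until the load commits, which is why the shadow registers appear among the fundamental statements.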
Super Scalar implementation – attempt
Fundamental statements:
• Still in-order commitment: multiple execution must not affect the in-order execution seen by software
• Non-parallel Fetch and Decode to avoid instruction dependencies

• The Fetch and Decode units would have to be completely rewritten based on the current implementation
• The exception engine must support 2 pipes, which requires a complete redesign of the exception unit
• Not all dependencies can be seen at the fetch/decode stage: LSU results may be required
• A multi-port SPRS must be implemented
• Parallel LSU instruction execution in 2 pipes requires multi-port memories and a wider bus
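The kind of check a two-pipe issue stage would need can be sketched as follows. This is an illustrative model only: the `Instr` tuple, the `can_dual_issue` helper and the rule set are assumptions, and it covers only the register dependencies that are visible at decode, not the LSU results mentioned above.

```python
from typing import NamedTuple, Optional, Tuple

# ASSUMPTION: a single-ported memory system, so two memory
# instructions cannot go down the two pipes together.
MEM_OPS = {"l.lwz", "l.sw"}


class Instr(NamedTuple):
    op: str                 # mnemonic
    dest: Optional[int]     # destination register number, or None
    srcs: Tuple[int, ...]   # source register numbers


def can_dual_issue(first: Instr, second: Instr) -> bool:
    """Return True if `second` may issue in the same cycle as `first`."""
    # Read-after-write: second reads what first writes.
    raw = first.dest is not None and first.dest in second.srcs
    # Write-after-write: both write the same register.
    waw = first.dest is not None and first.dest == second.dest
    # Structural hazard on the single LSU/memory port.
    both_mem = first.op in MEM_OPS and second.op in MEM_OPS
    return not (raw or waw or both_mem)


add = Instr("l.add", dest=3, srcs=(1, 2))
sub_dependent = Instr("l.sub", dest=5, srcs=(3, 4))
sub_independent = Instr("l.sub", dest=5, srcs=(6, 4))
load_a = Instr("l.lwz", dest=7, srcs=(8,))
load_b = Instr("l.lwz", dest=9, srcs=(10,))
```

Here `can_dual_issue(add, sub_dependent)` is rejected because of the dependency on r3, while the independent pair is accepted.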
ISA Extension – final goal
• The gcc OR1000 compiler and assembler support empty slots for custom ISA extensions
• 7 non-parameterized commands:
  • l.cust1
  • l.cust2
  • l.cust3
  • l.cust4
  • l.cust6
  • l.cust7
  • l.cust8
• 1 highly parameterized command:
  • l.cust5 Rd, Ra, Rb, L immediate[5:0], K immediate[4:0]
  • Allows 2048 commands (6-bit L and 5-bit K give 2^11 combinations), each operating on 3 registers
• The compiler will not use the ISA extension when generating assembly code from C code, but gcc allows assembly commands to be used alongside C code
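The 2048 figure follows from the two immediates: 6 bits of L and 5 bits of K. A minimal sketch of packing the l.cust5 operands into a 32-bit instruction word is given below. The field positions (opcode 0x3c in bits 31:26, then D, A, B, L, K) are assumed from the OpenRISC 1000 architecture manual and should be checked against it; `encode_cust5` is a hypothetical helper name.

```python
# Sketch: pack l.cust5 operands into a 32-bit instruction word.
# ASSUMPTION: layout opcode[31:26]=0x3c, rD[25:21], rA[20:16],
# rB[15:11], L[10:5], K[4:0]; verify against the OR1000 manual.
CUST5_OPCODE = 0x3C


def encode_cust5(rd, ra, rb, l_imm, k_imm):
    """Return the 32-bit word for `l.cust5 rd, ra, rb, L, K`."""
    fields = (("rd", rd, 5), ("ra", ra, 5), ("rb", rb, 5),
              ("L", l_imm, 6), ("K", k_imm, 5))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((CUST5_OPCODE << 26) | (rd << 21) | (ra << 16)
            | (rb << 11) | (l_imm << 5) | k_imm)


# Number of distinct (L, K) pairs: 2**6 * 2**5
NUM_VARIANTS = (1 << 6) * (1 << 5)
```

For example, `hex(encode_cust5(3, 4, 5, 2, 1))` gives the word for a command with K=0x1 and L=2 operating on r3, r4 and r5 under the assumed layout.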
l.cust Commands Implementation
4 non-parameterized commands:
• l.cust1: set flag (unconditional)
• l.cust2: unset flag (unconditional)
• l.cust3: set carry (unconditional)
• l.cust4: unset carry (unconditional)
l.cust Commands Implementation
l.cust5 parameterized command: the K immediate defines the command, the L immediate defines the options
• K=0x1: replace byte L of A with byte 0 of B and put the result in D
• K=0x2: set bit A[L] (result in D)
• K=0x3: unset bit A[L] (result in D)
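A software model of these three commands might look like the sketch below. It assumes that byte 0 is the least significant byte and that L is a byte index (0 to 3) for K=0x1 and a bit index (0 to 31) for K=0x2 and K=0x3; the function names are hypothetical and not taken from the RTL.

```python
MASK32 = 0xFFFFFFFF


def replace_byte(a, b, l):
    """K=0x1: D = A with byte l replaced by the low byte of B.

    ASSUMPTION: byte 0 is the least significant byte, l in 0..3.
    """
    shift = 8 * l
    return ((a & ~(0xFF << shift)) | ((b & 0xFF) << shift)) & MASK32


def set_bit(a, l):
    """K=0x2: D = A with bit l set."""
    return (a | (1 << l)) & MASK32


def clear_bit(a, l):
    """K=0x3: D = A with bit l cleared."""
    return a & ~(1 << l) & MASK32
```

Such a model can serve as a reference when comparing against the register-file log produced by the simulation environment.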
• K=0x4: slice A (MSBs) and B (LSBs) and put the result in D: D = {A[31:L], B[L-1:0]}
• K=0x5: slice B (MSBs) and A (LSBs) and put the result in D: D = {B[31:L], A[L-1:0]}
• K=0x6: rotate A (reverse the bit order): D = A[0:31]
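The slice and rotate commands can be modelled as below. The sketch reads the slice as the upper 32-L bits of one source placed above the lower L bits of the other, and reads "rotate A" with D = A[0:31] as a full bit-order reversal; both readings, and the function names, are assumptions.

```python
MASK32 = 0xFFFFFFFF


def splice(hi_src, lo_src, l):
    """K=0x4 as splice(a, b, l); K=0x5 as splice(b, a, l).

    Result keeps bits 31..l of hi_src and bits l-1..0 of lo_src.
    """
    low_mask = (1 << l) - 1
    return ((hi_src & ~low_mask) | (lo_src & low_mask)) & MASK32


def reverse_bits(a, width=32):
    """K=0x6: D[width-1-i] = A[i] for every bit i."""
    result = 0
    for i in range(width):
        result |= ((a >> i) & 1) << (width - 1 - i)
    return result
```

With L=0 the splice returns the first source unchanged, since no low bits are taken from the second one.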
• K=0x7: rotate A bit-wise within each half-word: D = {A[16:31], A[0:15]}
• K=0x8: rotate A bit-wise within each byte: D = {A[24:31], A[16:23], A[8:15], A[0:7]}
• K=0xa: check if A is even; if true, D=1 and the flag is set, else D=0
• K=0xb: check if A is odd; if true, D=1 and the flag is set, else D=0
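These commands can be sketched in software as follows. The model treats K=0x7 and K=0x8 as reversing the bit order inside each 16-bit or 8-bit group, which is what the reversed index ranges suggest. For K=0xa and K=0xb it returns D together with a boolean telling whether the flag is set; what happens to the flag when the check fails is not stated on the slide, so that part is left open. All names are hypothetical.

```python
def reverse_bits_in_groups(a, group):
    """Reverse the bit order inside each `group`-bit field of a 32-bit word.

    group=16 models K=0x7, group=8 models K=0x8.
    """
    result = 0
    for base in range(0, 32, group):
        for i in range(group):
            bit = (a >> (base + i)) & 1
            result |= bit << (base + group - 1 - i)
    return result


def is_even(a):
    """K=0xa: return (D, flag_set); D=1 and the flag is set if A is even."""
    d = 1 if (a & 1) == 0 else 0
    return d, bool(d)


def is_odd(a):
    """K=0xb: return (D, flag_set); D=1 and the flag is set if A is odd."""
    d = a & 1
    return d, bool(d)
```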
• K=0xe:
  • L=2: rotate the 2 MSB bytes of A with the 2 LSB bytes: D = {A[15:0], A[31:16]}
  • L=4: rotate A byte-wise: D = {A[7:0], A[15:8], A[23:16], A[31:24]}
  • L=8: rotate A half-byte-wise: D = {A[3:0], A[7:4], A[11:8], A[15:12], A[19:16], A[23:20], A[27:24], A[31:28]}
• K=0xf:
  • L=0: mirror LSBs: D = {A[0:15], A[15:0]}
  • L=1: mirror MSBs: D = {A[31:16], A[16:31]}
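The K=0xe and K=0xf variants reorder half-words, bytes, nibbles or mirrored halves of A. A minimal reference model, assuming the concatenations are read in the usual Verilog sense (leftmost element is most significant, and a reversed index range means reversed bit order), could be written as below; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def swap_halfwords(a):
    """K=0xe, L=2: D = {A[15:0], A[31:16]}."""
    return ((a << 16) | (a >> 16)) & MASK32


def swap_bytes(a):
    """K=0xe, L=4: D = {A[7:0], A[15:8], A[23:16], A[31:24]}."""
    return int.from_bytes((a & MASK32).to_bytes(4, "big"), "little")


def swap_nibbles(a):
    """K=0xe, L=8: reverse the order of the eight 4-bit groups."""
    result = 0
    for i in range(8):
        result |= ((a >> (4 * i)) & 0xF) << (4 * (7 - i))
    return result


def _reverse16(h):
    """Reverse the bit order of a 16-bit value."""
    return int(f"{h & 0xFFFF:016b}"[::-1], 2)


def mirror_lsbs(a):
    """K=0xf, L=0: D = {A[0:15], A[15:0]}."""
    low = a & 0xFFFF
    return (_reverse16(low) << 16) | low


def mirror_msbs(a):
    """K=0xf, L=1: D = {A[31:16], A[16:31]}."""
    high = (a >> 16) & 0xFFFF
    return (high << 16) | _reverse16(high)
```

Under this reading, the L=4 variant of K=0xe is an endianness swap of the 32-bit word.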
ISA Extension – FPGA proven
Test C program
UART output
FPGA Utilization
Old RTL vs. New RTL: ~1% change
Conclusions
• The given implementation is not suitable for any significant micro-architecture improvements
• Out-of-Order / Super-Scalar OR1200 implementations are possible but should be done from scratch
• Software written in assembly can easily be optimized for a specific application thanks to the l.cust instructions (2048 instructions with 5 operands)
Thank you!