Post on 19-Dec-2015
![Page 1: Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation](https://reader030.vdocuments.mx/reader030/viewer/2022033018/56649d3f5503460f94a196f0/html5/thumbnails/1.jpg)
Advanced Processor Architectures for Embedded Systems
Witawas Srisa-an
CSCE 496: Embedded Systems Design and Implementation
(R)evolution of Processors
Rock Hard
Ice Hard
Play-dough Hard
(R)evolution of Processors
Rock Hard
Ice Hard
Play-dough Hard
Hardwired general-purpose processor (GPP): performs well in most conditions, but not in extreme conditions
(R)evolution of Processors
Rock Hard
Ice Hard
Play-dough Hard
GPP with FPGAs: custom designs perform well in some extreme conditions
Requires extensive knowledge of hardware design
(R)evolution of Processors
Rock Hard
Ice Hard
Play-dough Hard
GPP with embedded programmable logic: reconfiguration is triggered by software
(R)evolution of Processors
• Ice Hard
  – Contains ASIC (Application-Specific Integrated Circuit) designs
    • Increases time-to-market
    • Takes time to reconfigure
Software Hotspots
• In DSP
  – 80% of the processing load is spent in 20% of the code
    • Hand-tuned assembly that can take thousands of cycles to execute
    • Less portable
  – The remaining 80% of the code consists of complex system functions
    • Runs well on most GPPs
Software Hotspots Example
• When a 16-QAM (quadrature amplitude modulation) modem (19.2 kbaud) is implemented entirely in software, it takes 177,000 instruction cycles to execute on a TI C6711 DSP
• With an FPGA co-processor: a few cycles
Solving Hotspots
[Figure: four approaches: processor + FPGA; multiple DSPs; DSP-enabled processors; RISC processor with programmable logic]
An Example of a Configurable Processor (Stretch S5000)
[Figure: Stretch S5000 block diagram; blocks shown: control, ALU, FPU, 32-bit register file (RF), 128-bit wide register file (WRF), ISEF (Instruction-Set Extension Fabric), 32 KB data RAM, 256 KB SRAM, 32 KB D-cache, 32 KB I-cache, MMU, I/O, I/O + DMA]
• S5 engine: common to all S5000 processors
• 300 MHz Xtensa-V 32-bit RISC processor
• I/O subsystem tailored to markets and applications
• Programmable-logic data path inside the RISC processor
• 32 × 128-bit wide registers + flexible wide load/store instructions
Programmable Logic Architecture
[Figure: RISC data path (DP), Instruction Set Extension Fabric (ISEF), WR and AR register files, and memory, linked by 128-bit and 32-bit data paths]
ISEF Resources
• An ISEF includes:
  – Computation resources
  – Routing resources
  – Pipeline resources
  – State register resources
• 2 types of computation resources:
  – 4096 arithmetic units (AUs) for arithmetic and logic operations
  – 8192 multiplier units (MUs) for multiply and shift operations
• Example: a single ISEF may implement
  – 32 16 × 16 multipliers
  – 128 32-bit ALUs
Wide Register
• The wide register file holds WR data
  – 32 WR registers (128 bits each)
  – Divided into 2 banks of 16 registers (WRA and WRB)
• The WRA/WRB types associate a variable with WR bank A/B
– WRA v1, v2, v3;
– WRB w1, w2, w3;
• The WR type defaults to WRA
– Use WRA/WRB to avoid unnecessary register moves between the two WR banks
[Figure: WR register file with banks WRA and WRB, registers 0–15 in each, each 128 bits wide; three 128-bit read ports (ReadPort 0–2) and two 128-bit write ports (WritePort 0–1)]
Extension Instructions (EIs)
• The power of the Software Configurable Processor (SCP) architecture comes from the ability to define new, complex instructions that operate on very wide data
• An Extension Instruction goes through 3 steps:
  1. EI definition: write a Stretch-C function
  2. EI compilation: compile the Stretch-C function
  3. EI use: call the EI through its intrinsic in the application code (C/C++)
Extension Instructions
1. Define an Extension Instruction (write Stretch-C; file vector.xc):
#include <stretch.h>

SE_FUNC void V_AND8(WR v1, WR vMask, WR *vOut)
{
    *vOut = v1 & vMask;   /* bitwise AND of two 128-bit WR operands */
}
2. Compile and link the EI (Stretch-C source file: *.xc)
3. Use the EI in C/C++ application code (calling intrinsics):
#include "vector.h"

WR v1, vMask, vOut;
…
WRL128I(&v1, (WR *) memSrc1Ptr, 0);   /* wide load: 128 bits into v1 */
V_AND8(v1, vMask, &vOut);             /* the EI, executed in the ISEF */
WRS128I(vOut, (WR *) memDstPtr, 0);   /* wide store of the 128-bit result */
Extension Instructions
• Extension Instructions
  – Are issued by the Xtensa processor
  – Read source operands from the 128-bit WR and/or 32-bit AR register files
  – Execute in the ISEF
  – Write destination operands to WR
• Once the ISEF is configured with the new instruction, it may be
  – Called as an intrinsic from application C code
  – Used as an assembly instruction in an assembly source file
[Figure: ISEF operand ports: three 128-bit WR read ports (ReadPort 0–2), two 32-bit AR read ports (ReadPort 0–1), and two 128-bit write ports (WritePort 0–1)]
Writing Stretch-C Functions
#include <stretch.h>

SE_FUNC void V_AND128(WR v1, WR v2, WR *vOut)
{
    *vOut = v1 & v2;   /* 128-bit bitwise AND of the two inputs */
}
• #include the stretch.h header file
• Stretch-C functions are identified by the keyword SE_FUNC void
• The EI name is the Stretch-C function name (for single-instruction functions)
• The EI's source and destination operands are defined by the Stretch-C function parameters
• The EI's operation is defined by the statements in the Stretch-C function body
Extension Instruction Parameters 1
• Extension Instructions are user-defined assembly instructions that use input and output operands
• An Extension Instruction can specify up to 3 parameters
  – 0, 1, 2, or 3 inputs
  – 0, 1, or 2 outputs
• Input and output parameters reside in register files
  – Inputs come from the WR or AR register files
  – Outputs may only be written to the WR register file
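[Figure: the extension unit (ISEF) reads 128-bit operands from the WR file (banks WRA and WRB) and 32-bit operands from the AR file, and writes 128-bit results to WR]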
Assembly:
# result = a + b
ADD result, a, b
Stretch-C:
// RESULT = A + B
V_ADD4(A, B, &RESULT);
Extension Instruction Parameters 2
• EI source operands (inputs) may include
  – Up to 3 WR inputs (use WR, WRA, or WRB)
  – Up to 2 AR inputs (use int, short, etc.)
• EI destination operands (outputs) may include
  – Up to 2 WR outputs, each writing a separate WR bank
  – Use C pointer notation for outputs
• A single WR parameter may be used as both an input and an output operand
SE_FUNC void FOO(int c1, WR v1, WRB *vOut) { }          /* AR input, WR input, output to bank B */
SE_FUNC void FOO(WR v1, WRA *vOut1, WRB *vOut2) { }     /* two outputs, one per bank */
SE_FUNC void FOO(WR v1, WRA *vInOut1, WRB *vOut2) { }   /* vInOut1 is both input and output */
Example of Stretch-C
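• RGB2YCrCb
Y = 0.299 R + 0.587 G + 0.114 B
R - Y = 0.701 R - 0.587 G - 0.114 B
B - Y = -0.299 R - 0.587 G + 0.886 B
Cb = 0.564 (B - Y) + 128, Cr = 0.713 (R - Y) + 128
Or, in fixed point with 8 fractional bits (coefficients × 256, offset 128 × 256 = 32768):
Y = (77R + 150G + 29B) >> 8
Cb = (-43R - 85G + 128B + 32768) >> 8
Cr = (128R - 107G - 21B + 32768) >> 8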
RGB2YCC
SE_FUNC void rgb2ycc(WR A, WR *B)
{
    se_uint<8> r[5], g[5], b[5];   /* unsigned: pixel components range 0..255 */
    se_uint<8> y[5], cb[5], cr[5];
    int i, j;

    /* unpack A to RGB data (5 pixels x 24 bits = 120 of the 128 bits);
       does not use any ISEF logic */
    for (i = 0; i < 5; i++) {
        j = i * 3 * 8;
        r[i] = A(j+7, j);
        g[i] = A(j+15, j+8);
        b[i] = A(j+23, j+16);
    }

    /* convert 5 pixels */
    for (i = 0; i < 5; i++) {
        y[i]  = ( 77*r[i] + 150*g[i] +  29*b[i]) >> 8;
        cb[i] = (-43*r[i] -  85*g[i] + 128*b[i] + 32768) >> 8;
        cr[i] = (128*r[i] - 107*g[i] -  21*b[i] + 32768) >> 8;
    }

    /* pack YCbCr to B */
    *B = (cr[4], cb[4], y[4], cr[3], cb[3], y[3], cr[2], cb[2], y[2],
          cr[1], cb[1], y[1], cr[0], cb[0], y[0]);
}
Stretch Compiler
• Stretch compile: rgb2ycc.xc (which includes <stretch.h>) → scc → libei.h, libei.a
• Compile: rgb2ycc.c → scc → rgb2ycc.o
• Link: rgb2ycc.o + libei.a → scc → rgb2ycc.exe
• Run rgb2ycc.exe on the target
Compiler Options
• The scc shell takes .xc files (Stretch compiler) and .c, .cc files (C/C++ compiler: xt-xcc, gcc, …); intermediate files include .xo, .xr, libei.a, and libei.h; the Stretch linker produces the .exe
• The compilation option selects one of three targets:
  – -ms5610: Aruba device (S5000); the .xo object file includes the configuration bitstream for the ISEF
  – -ms5-iss (default), -stretch-nobits: Instruction Set Simulator; a .dll implements the Extension Instructions (EIs)
  – -ms5-native: native host (e.g., x86); C++ functions implement the EIs
Summary
• Software Configurable Processor
  – Describe hardware using C/C++
    • But not trivial: a basic understanding of the architecture is needed
  – Reconfiguration can take place in 150 microseconds
    • 2 ISEFs per chip: can ping-pong between them
    • Configuration files are stored in SDRAM; use DMA to preload them
• The ISEF is proprietary and is NOT an FPGA