introduction cell processor

Post on 29-Nov-2014

shared by Mansoor Mirza

Introduction Cell Processor

Why Cell Processor

- Performance has improved with increases in clock frequency
  - Made possible by increasing transistor density
  - The clock frequency is the timing reference for a processor
- Power density
  - Leakage currents increase as transistor density rises
  - Leakage increases idle power consumption

History of Cell Processor

- A powerful processor for the next generation after the PS2
- Powerful multimedia and broadband network interface
- IBM contributed to shaping the concept of the Cell processor
- Collaboration with Toshiba: the STI (Sony, Toshiba, IBM) alliance

Development of Cell
- 1999: Sony proposed a partnership with IBM for the successor of the PS2
- 2001: the STI alliance began development of Cell
- 2004: first prototype of Cell
- 2005: Sony unveiled the PS3 at E3
- 2006: official release of the PS3; Cell SDK released by IBM
- 2008: IBM Roadrunner became the fastest supercomputer in the world (1.026 PFLOPS)

Overview of Cell

[Figure: overview of the Cell processor. Credit: Design and Animation, Game Programming, Graphics Programming, Matthew Scarpino]

[Figure: overview of the Cell processor. Credit: 6.189 IAP 2007, MIT]

Cell components

- Memory Interface Controller (MIC)
- Bus Interface Controller (BIC)
- PowerPC Processor Element/Unit (PPE/PPU)
- Synergistic Processing Element/Unit (SPE/SPU)
- Element Interconnect Bus (EIB)
- Input/Output Interface (IOIF)

MIC
- Connects the processor with system memory
- Two channels to system memory
- Extreme Data Rate Dynamic Random Access Memory (XDR DRAM)
  - Can support 8 data transfers per clock cycle
  - Provides high data flow at a low clock frequency
- The PS3 contains 256 MB of XDR DRAM
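
As a rough check on "high data flow at low frequency", the peak XDR bandwidth can be estimated from the channel count and the number of transfers per clock. The sketch below is a back-of-the-envelope calculation, not a vendor figure. It assumes the "8 data transfers" are counted per clock cycle, and the 400 MHz memory clock and the 32 data bits per channel are assumptions that the slides do not state.

```python
def peak_bandwidth_gb_per_s(clock_hz, transfers_per_clock, bits_per_channel, channels):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bits_per_channel // 8
    bytes_per_second = clock_hz * transfers_per_clock * bytes_per_transfer * channels
    return bytes_per_second / 1e9

# Assumed values (not from the slides): 400 MHz clock, 32 data bits per channel.
XDR_CLOCK_HZ = 400_000_000
BITS_PER_CHANNEL = 32

# From the slides: 8 transfers per clock and two channels.
bandwidth = peak_bandwidth_gb_per_s(XDR_CLOCK_HZ, 8, BITS_PER_CHANNEL, 2)
print(f"{bandwidth:.1f} GB/s")  # 25.6 GB/s
```

Under these assumptions, the eight transfers per clock are what let a modest clock frequency reach a bandwidth of tens of gigabytes per second.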

PPU
- Based on the IBM PowerPC architecture
- RISC architecture
- Control center of the Cell
  - Runs the operating system
  - Manages interrupts
  - Manages the shared L2 cache
  - Issues work to the SPUs
- 64-bit architecture
- Supports SIMD
- Supports Cell-related functions
- Dual-threaded processor
- Computational power is reduced
  - The PPU is not the computational element of the Cell
  - Reduces power consumption

[Figure: PPU (credit: Matthew Scarpino)]

Functional units of PPU

[Figure: functional units of the PPU (credit: Matthew Scarpino)]

- Instruction Unit (IU)
  - Fetches and executes instructions
- Load and Store Unit
  - Receives memory access requests
- Vector/Scalar Unit (VSU)
  - Contains the Floating-Point Unit
  - Performs floating-point operations on individual or multiple operands
- Fixed-Point Unit (FXU)
  - Performs fixed-point operations
  - Arithmetic and logical operations
- Memory Management Unit (MMU)
  - Performs virtual memory management
- PPU registers
  - Provide quick access to operands
  - Some functional units can access only processor registers

PPU registers
- 32 general-purpose registers
- 32 floating-point registers
- Link register
  - Holds the branch address of the upcoming target
- Count register
  - Holds the branch address of the upcoming target, or
  - Holds a loop counter
- Fixed-point exception register
  - Holds carry and overflow bits for fixed-point operations
- Condition register
  - Holds the status of arithmetic, logical, or comparison operations
- Floating-point status and control register
  - Holds the status of scalar floating-point operations
- Vector registers
  - Contain data for vector operations
- Vector status and control register
  - Holds the saturation bit for vector operations
- Vector register save and restore register
  - Used to save vector registers in case of a context switch

SPU
- Basic workhorse of the Cell
- Designed to execute SIMD instructions
- Separate instruction set
- Takes work from the PPU
- Does not have any cache
- No virtual memory
- Each SPU contains only 256 KB of memory
- An SPU can directly access only its own 256 KB memory
- Direct Memory Access (DMA) is required to transfer data to the SPU
- Memory alignment is required to pass data to the SPU
- Different methods are available to communicate with the PPU and other memory

[Figure: SPU (credit: Matthew Scarpino)]

Purpose of SPU
- Take 128-bit data into a local register
- Apply an operation to it
- Save the result to local memory
- Two distinct pipelines
  - The even pipeline handles mathematical operations
  - The odd pipeline handles everything else
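
The load, operate, store cycle described above can be sketched in plain Python. This is only an illustration of the idea, not SPU code. It treats a 128-bit register as four 32-bit lanes, which is one of several possible lane layouts. The helper names and the big-endian byte order used to pack the lanes are assumptions made for the example.

```python
import struct

MASK32 = 0xFFFFFFFF  # each lane wraps around at 32 bits

def load_quadword(local_store, offset):
    """Read 16 bytes from the local store as four big-endian 32-bit lanes."""
    return list(struct.unpack_from(">4I", local_store, offset))

def simd_add(a, b):
    """Add two 128-bit values lane by lane, as one SIMD instruction would."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]

def store_quadword(local_store, offset, lanes):
    """Write four 32-bit lanes back to the local store."""
    struct.pack_into(">4I", local_store, offset, *lanes)

local_store = bytearray(64)
store_quadword(local_store, 0, [1, 2, 3, 0xFFFFFFFF])
store_quadword(local_store, 16, [10, 20, 30, 1])

# Load two operands, apply one operation to all lanes, save the result.
result = simd_add(load_quadword(local_store, 0), load_quadword(local_store, 16))
store_quadword(local_store, 32, result)
print(load_quadword(local_store, 32))  # [11, 22, 33, 0]
```

The last lane wraps to 0 because each lane is independent: a carry never spills into the neighbouring lane.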

SPU functional units
- SPU Control Unit (SCN)
  - Fetches and dispatches instructions
  - Performs branching and other control operations
- SPU even fixed-point unit
  - Handles logic/arithmetic operations
  - Performs comparisons and reciprocal operations for floating point
- SPU odd fixed-point unit
  - Performs bit-level shifts, rotations, and shuffling
- SPU floating-point unit
  - Performs floating-point operations
- SPU load/store unit
  - Performs loads and stores
  - Manages branch targets and DMA to the local store
- SPU channel and DMA unit
  - Communicates with the Memory Flow Controller (MFC)
  - Controls DMA transfers

SPU registers
- 128 general-purpose registers
- Floating-point status and control registers
  - Contain the status and results of floating-point operations

SPU Local Store (LS)
- Each SPU contains 256 KB of very low-latency memory
- It serves the SPU in the role a local cache would play
- All data transfer is the responsibility of the programmer

SPU Local Store (LS), continued
- Not a cache, just an SRAM
- Only one read or write operation per clock cycle
- Operations accessing the LS
  - DMA: transfers data from main memory to the LS
  - SPU load/store: reads or writes 16 bytes at a time
  - Instruction fetch: reads 128 bytes of the LS at once
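
Because an SPU load or store moves 16 bytes at a time, it helps to think of the LS as an array of 16-byte quadwords. The sketch below models a load under two assumptions that the slides do not state: the low four address bits are ignored, so every access is quadword aligned, and addresses wrap around at the 256 KB boundary. The function names are illustrative only.

```python
LS_SIZE = 256 * 1024   # local store size from the slides
QUADWORD = 16          # bytes moved by one SPU load or store

def effective_address(address):
    """Wrap the address into the LS and round it down to a 16-byte boundary."""
    return (address % LS_SIZE) & ~(QUADWORD - 1)

def load_quadword(local_store, address):
    """Return the 16 bytes of the aligned quadword that contains `address`."""
    start = effective_address(address)
    return bytes(local_store[start:start + QUADWORD])

local_store = bytearray(LS_SIZE)
local_store[32:48] = bytes(range(16))

# Any address inside the quadword fetches the same 16 bytes.
assert load_quadword(local_store, 32) == load_quadword(local_store, 45)
```

This is one reason alignment matters when passing data to an SPU: a value that straddles a 16-byte boundary cannot be fetched with a single load.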

SPU Local Store (LS), continued
- Does not support virtual memory
- Trade-off between cache coherence and fetching data into the LS explicitly
  - The LS is a low-latency memory
  - Other processors use cache coherence protocols
  - Data is transferred to the LS over the high-throughput EIB via DMA instead of through cache coherence protocols
  - This keeps the hardware simple

Communication between the SPU and the rest of the system
- DMA
- Mailboxes
- Events and signals

DMA
- Transfers data to the LS
- Asynchronous in nature
  - The SPU continues its operation while the DMA transfer is in progress
- Transfers data in chunks whose size in bytes is a power of 2
- Provides controls to manage and synchronize the data transfer
- A single DMA transfer can move at most 16 KB
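
Since one DMA transfer is limited to 16 KB, a larger buffer has to be moved as a series of requests. The sketch below shows one way to plan that split. It is a planning helper in plain Python, not MFC code. The size rule it checks (1, 2, 4, 8, or a multiple of 16 bytes) is an assumption based on general Cell documentation. Every power of 2 up to 16 KB satisfies it.

```python
MAX_DMA_BYTES = 16 * 1024  # largest single transfer, per the slides

def is_valid_dma_size(size):
    """Assumed rule: 1, 2, 4, 8, or a multiple of 16 bytes, up to 16 KB."""
    if size in (1, 2, 4, 8):
        return True
    return 0 < size <= MAX_DMA_BYTES and size % 16 == 0

def split_transfer(total_bytes):
    """Split a large transfer (a multiple of 16 bytes) into (offset, size) requests."""
    if total_bytes <= 0 or total_bytes % 16 != 0:
        raise ValueError("total size must be a positive multiple of 16 bytes")
    chunks = []
    offset = 0
    while offset < total_bytes:
        size = min(MAX_DMA_BYTES, total_bytes - offset)
        chunks.append((offset, size))
        offset += size
    return chunks

chunks = split_transfer(40 * 1024)
print(chunks)  # [(0, 16384), (16384, 16384), (32768, 8192)]
```

Because DMA is asynchronous, an SPU program can issue the next request from such a list while it is still processing the previous chunk.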

EIB
- Connects all the system components
- Consists of four data rings (two clockwise and two counter-clockwise)
- One ring is for control signals
- One bus cycle can transfer 16 bytes of data
- Each ring can carry three DMA requests simultaneously
- Each DMA takes at least 8 cycles to complete
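
The EIB figures above can be combined into a simple upper bound. The sketch below is arithmetic on the slide's numbers only. It assumes that all four rings carry data at the same time and that the "at least 8 cycles" figure refers to a 128-byte transfer. Neither assumption is stated in the slides, and the bandwidth achieved in practice is lower.

```python
BYTES_PER_CYCLE = 16     # one ring moves 16 bytes per bus cycle
DATA_RINGS = 4           # two clockwise, two counter-clockwise
TRANSFERS_PER_RING = 3   # concurrent transfers on one ring

def min_cycles(transfer_bytes):
    """Bus cycles needed to move one transfer at 16 bytes per cycle (rounded up)."""
    return -(-transfer_bytes // BYTES_PER_CYCLE)

def peak_bytes_per_cycle():
    """Upper bound when every ring carries its maximum number of transfers."""
    return DATA_RINGS * TRANSFERS_PER_RING * BYTES_PER_CYCLE

print(min_cycles(128))         # 8
print(peak_bytes_per_cycle())  # 192
```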

MFC (Memory Flow Controller)
- Coprocessor that handles communication between the SPU and the EIB
- Processes data transfers without interrupting the SPU
  - The SPU requests the MFC to get the data
  - The MFC handles the rest of the data transfer

Mailboxes
- Simplest way to transfer data between the PPU and an SPU
- Can transfer only 4 bytes of data at a time
- Provide one-to-one communication
- Mailbox channels
  - Outgoing mailbox
  - Outgoing interrupt mailbox
    - Holds data for the outside world and causes an interrupt if applicable
  - Incoming mailbox
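
A mailbox behaves like a very small queue of 32-bit words. The sketch below is a simplified software model of that behaviour, not the hardware interface. The class and its methods are hypothetical, and the capacities (one entry for the outgoing mailbox, four for the incoming one) are assumptions that the slides do not state.

```python
from collections import deque

class Mailbox:
    """A bounded queue of 32-bit messages (a simplified model, not the real hardware)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def write(self, value):
        """Queue one 4-byte message; return False if the mailbox is full."""
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("a mailbox message must fit in 4 bytes")
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(value)
        return True

    def read(self):
        """Remove and return the oldest message, or None if the mailbox is empty."""
        return self.entries.popleft() if self.entries else None

# Capacities are assumptions: one entry going out, four entries coming in.
outgoing = Mailbox(capacity=1)
incoming = Mailbox(capacity=4)

first_ok = outgoing.write(0xCAFE)   # the SPU posts a status word for the PPU
second_ok = outgoing.write(1)       # rejected until the PPU reads the first word
received = outgoing.read()
print(first_ok, second_ok, received)  # True False 51966
```

With only 4 bytes per message, mailboxes suit status codes or addresses. Bulk data still goes through DMA.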

Events and signals
- Commonly used for DMA notifications
- Signals can be sent directly to the outside world
- Signals can provide one-to-many communication

Software development of Cell

- Different instruction sets for the SPU and the PPU
- Different compilers are required to compile the two kinds of code in an application
- The SPU code is embedded in the PPU executable

- Tools to compile an application for Cell
  - PPU compiler: ppu-gcc
  - SPU compiler: spu-gcc
  - Embed SPU code in the PPU executable: ppu-embedspu
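
The three tools are typically run in sequence: compile the SPU code, wrap the SPU executable in a PPU object file, then compile and link the PPU program. The sketch below only assembles the command lines as lists and does not run them. The file names are hypothetical. The argument order for ppu-embedspu (symbol name, SPU executable, output object) and the `-lspe2` link flag are assumptions to check against the SDK documentation.

```python
def build_commands(ppu_source, spu_source, program):
    """Return the command lines for a PPU program with one embedded SPU program."""
    spu_program = f"{program}_spu"       # SPU executable; also used as the symbol name
    embedded = f"{spu_program}_embed.o"  # PPU object file that wraps the SPU executable
    return [
        ["spu-gcc", spu_source, "-o", spu_program],
        # Assumed argument order: symbol name, SPU executable, output object.
        ["ppu-embedspu", spu_program, spu_program, embedded],
        # Assumed: the PPU program links against the SPE runtime library.
        ["ppu-gcc", ppu_source, embedded, "-lspe2", "-o", program],
    ]

commands = build_commands("main_ppu.c", "kernel_spu.c", "demo")
for command in commands:
    print(" ".join(command))
```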

- Cell simulator
  - Full System Simulator
  - Emulates all system components
  - Can provide cycle-accurate information
  - Provides a graphical interface to see and interact with system components

[Figure: Full System Simulator. Credit: IBM Full System Simulator user guide]

- Three modes
  - Fast mode
  - Simple mode
  - Cycle mode
- Graphical visualization of the SPU and PPU
- Provides debugging and profiling information
- Provides system utilization information
