Synergistic Processing In Cell’s Multicore Architecture Michael Gschwind, et al. Presented by: Jia Zou CS258 3/5/08


Page 1

Synergistic Processing In Cell’s Multicore Architecture

Michael Gschwind, et al.

Presented by: Jia Zou, CS258, 3/5/08

Page 2

Goal for Cell

• Increase processor efficiency to get the most performance per area
• Reduce area per core to fit more cores in a given chip area
• Take advantage of application parallelism
– Aimed at data-processing-intensive applications

Page 3

Cell Architecture

Page 4

Design Philosophy

• Simple cores, lots of them
– Any complexity reduction directly translates into increased performance
– Exploiting the compiler to eliminate hardware complexity
• PPE serves as controller, SPEs provide performance
– PPE and SPEs share address translation and virtual memory architecture

Page 5

Synergistic Processor Unit

Page 6

Data alignment for Scalar and Vector Processing

• SPU has no separate support for scalar processing
– Unified scalar/SIMD register file
– Unified execution units
– Simpler control unit
• Software-controlled data-alignment approach
– Simplifies scalar data extraction, insertion, and sharing between scalar and vector data
• Increases compiler efficiency
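The software-controlled alignment idea can be sketched in a few lines of Python. This is only an illustrative model, not the SPU instruction set: the helper names (`load_quadword`, `rotate_left_bytes`, `load_scalar_word`, `insert_scalar_word`) are hypothetical, and the sketch assumes 16-byte quadwords, big-endian byte order, word-aligned scalar addresses, and a scalar slot at byte offset 0 of the register.

```python
QUADWORD = 16        # bytes moved by every load/store in this model
PREFERRED_SLOT = 0   # assumed byte offset where a 32-bit scalar lives in a register


def load_quadword(memory, address):
    """Return the aligned 16-byte quadword that contains `address`."""
    base = address & ~(QUADWORD - 1)  # the low 4 address bits are ignored
    return bytes(memory[base:base + QUADWORD])


def rotate_left_bytes(quadword, count):
    """Rotate a quadword left by `count` bytes (software alignment step)."""
    count %= QUADWORD
    return quadword[count:] + quadword[:count]


def load_scalar_word(memory, address):
    """Scalar load = aligned quadword load + rotate into the scalar slot."""
    quadword = load_quadword(memory, address)
    rotated = rotate_left_bytes(quadword, (address - PREFERRED_SLOT) % QUADWORD)
    return int.from_bytes(rotated[PREFERRED_SLOT:PREFERRED_SLOT + 4], "big")


def insert_scalar_word(memory, address, value):
    """Scalar store = read quadword, merge the 4-byte scalar, write quadword back.

    Assumes `address` is word-aligned and `value` fits in 32 bits.
    """
    base = address & ~(QUADWORD - 1)
    offset = address - base
    quadword = bytearray(memory[base:base + QUADWORD])
    quadword[offset:offset + 4] = value.to_bytes(4, "big")
    memory[base:base + QUADWORD] = quadword


memory = bytearray(range(32))
word = load_scalar_word(memory, 20)        # bytes 20..23 of memory
insert_scalar_word(memory, 8, 0xDEADBEEF)  # read-modify-write of quadword 0
```

Because memory only ever moves whole aligned quadwords, the alignment work becomes ordinary instructions that a compiler can schedule, which is the trade the slide describes.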

Page 7

Scalar Layering

Page 8

Data-Parallel Conditional Execution
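As a rough illustration of data-parallel conditional execution, the sketch below evaluates both sides of an if/else for every vector element and then merges the results with a bitwise select driven by a compare mask, so no branch is needed. The function names (`compare_greater`, `select_bits`, `abs_difference`) and the 4 x 32-bit vector shape are assumptions made for this example, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF  # each vector element is modeled as an unsigned 32-bit word


def compare_greater(a, b):
    """Per-element compare: all-ones mask where a[i] > b[i], else all zeros."""
    return [MASK32 if x > y else 0 for x, y in zip(a, b)]


def select_bits(if_false, if_true, mask):
    """Per-bit select: take bits from `if_true` where the mask is 1."""
    return [((f & ~m) | (t & m)) & MASK32
            for f, t, m in zip(if_false, if_true, mask)]


def abs_difference(a, b):
    """Branch-free version of: r[i] = a[i]-b[i] if a[i] > b[i] else b[i]-a[i]."""
    then_path = [(x - y) & MASK32 for x, y in zip(a, b)]  # computed for all lanes
    else_path = [(y - x) & MASK32 for x, y in zip(a, b)]  # computed for all lanes
    mask = compare_greater(a, b)
    return select_bits(else_path, then_path, mask)


result = abs_difference([5, 1, 7, 3], [2, 4, 7, 9])
```

Both paths are always computed, so this trades some redundant arithmetic for the removal of a hard-to-predict branch.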

Page 9

Deterministic Data Delivery

• Each SPE has a local store
– 4 KB to 4 GB address range
– Stores both instructions and data
– All memory operations that the SPU executes refer to the address space of this local store
– Different from cache memory by:
  • No cache coherency problem
  • Offers low and deterministic access latency
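The contrast with a cache can be made concrete with a toy model. In the sketch below every local-store access costs the same number of cycles, while a small direct-mapped cache costs more or less depending on whether the line is resident. The cycle counts, the class names, and the 16-byte line size are placeholder assumptions chosen for illustration, not measured Cell figures.

```python
LINE = 16                # bytes per access in this model (assumed)
LOCAL_STORE_CYCLES = 6   # assumed fixed latency, for illustration only
CACHE_HIT_CYCLES = 2     # assumed
CACHE_MISS_CYCLES = 100  # assumed


class LocalStore:
    """Software-managed memory: every load costs the same."""

    def __init__(self, size_bytes):
        self.data = bytearray(size_bytes)

    def load(self, address):
        base = address & ~(LINE - 1)
        return bytes(self.data[base:base + LINE]), LOCAL_STORE_CYCLES


class DirectMappedCache:
    """Hardware-managed cache: latency depends on earlier accesses."""

    def __init__(self, backing, num_lines):
        self.backing = backing
        self.num_lines = num_lines
        self.tags = [None] * num_lines

    def load(self, address):
        line_number = address // LINE
        index = line_number % self.num_lines
        hit = self.tags[index] == line_number
        self.tags[index] = line_number  # fill (or keep) the line
        base = line_number * LINE
        cycles = CACHE_HIT_CYCLES if hit else CACHE_MISS_CYCLES
        return bytes(self.backing[base:base + LINE]), cycles


trace = [0, 4, 64, 0, 8]
local_store = LocalStore(256)
cache = DirectMappedCache(bytearray(256), num_lines=4)
local_latencies = [local_store.load(a)[1] for a in trace]
cache_latencies = [cache.load(a)[1] for a in trace]
```

With a constant access cost, a compiler can schedule loads at known distances from their uses; with the cache model, the same trace yields a mix of hit and miss latencies that is only known at run time.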

Page 10

Statically Scheduled ILP

• Instruction fetches are scheduled statically
• Delivers up to two instructions per cycle
– One to each execution complex
• Static branch prediction: a prepare-to-branch instruction initiates instruction prefetch
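A simplified way to picture "up to two instructions per cycle, one to each complex" is an in-order issue model in which the instruction order chosen by the compiler decides whether two neighbors can issue together. The pairing rule used here (first instruction for the "even" complex, second for the "odd" complex, and no dependence between them) and the assignment of opcodes to complexes are simplifying assumptions for this sketch. It counts issue slots only and ignores result latencies, fetch alignment, and branches.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    pipe: str        # "even" or "odd": which execution complex it needs (assumed labels)
    dest: str        # register written
    sources: tuple   # registers read


def count_issue_cycles(program):
    """Count issue cycles for an in-order machine that can pair neighbors."""
    cycles = 0
    i = 0
    while i < len(program):
        first = program[i]
        second = program[i + 1] if i + 1 < len(program) else None
        can_pair = (
            second is not None
            and first.pipe == "even"
            and second.pipe == "odd"
            and first.dest not in second.sources  # no same-cycle dependence
        )
        i += 2 if can_pair else 1
        cycles += 1
    return cycles


# Hypothetical instruction stream; names and pipe assignments are illustrative.
naive_order = [
    Instr("add", "even", "r3", ("r1", "r2")),
    Instr("load", "odd", "r4", ("r10",)),
    Instr("mul", "even", "r5", ("r3", "r4")),
    Instr("store", "odd", "r11", ("r5",)),    # depends on mul: cannot pair with it
]
cycles_needed = count_issue_cycles(naive_order)
```

Since the hardware does not reorder instructions in this model, reaching two instructions per cycle is the job of the compiler's schedule, which matches the design philosophy of moving complexity from hardware into software.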

Page 11

SPE Microarchitecture

Page 12

Design Goals and Decisions