Programming with Linux on the Playstation3
FOSDEM
Architecture overview: introducing the Cell BE
Installing Linux
SIMD programming in C/C++
Asynchronous data transfer with the DMA
Who am I
  Java / Python developer at Nuxeo (FOSS document management server)
  Interested in Artificial Intelligence (and need fast Support Vector Machines)
  Slides to be published at: http://oliviergrisel.name
PS3 architecture overview
  CPU: IBM Cell/BE @ 3.2GHz, 218 GFLOPS
  Main RAM: 256MB XDR
  GPU: Nvidia RSX, 1.8 TFLOPS (SP) / 356 GFLOPS programmable
  VRAM: 256MB GDDR3 (2x128b@700MHz)
  System Bus: 2.5 GB/s
The Cell Broadband Engine
  1 PPE core @ 3.2GHz
    64bit hyperthreaded PowerPC
    512KB L2 cache
  8 SPE cores @ 3.2GHz
    128bit SIMD optimized
    256KB SRAM
PS3 Clusters
  Cheap cluster for academic researchers
  Carolina State U. and U. Massachusetts at D.
  8+1 cluster with ssh and MPI
PS3 GRID Computing
  PS3GRID project based on BOINC
    30,000 atoms simulation
  Folding@Home
    1 PFLOPS with 800 TFLOPS from PS3s
    BlueGene == 280 TFLOPS
Linux on the PS3
  Lv1 Hypervisor shipped with the default firmware
  Partition utility in the Sony Game OS menu
  Choose your favorite distro
  Install a powerpc64-smp or ps3 kernel
  Install gcc-spu + libspe2
Programming the Cell/BE in C
  Program the PPE as a chief conductor to spread the numerical code to SPEs
  Use POSIX threads to start SPE subroutines in parallel
  Use SPE intrinsics to perform vector instructions
  Eliminate branches as much as possible in SPE code
  Align your data to 16 bytes
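The conductor pattern above can be sketched in portable C. This is only an illustration under stated assumptions: plain POSIX threads stand in for SPE contexts, and the names `work_t`, `worker`, `parallel_sum` and `NUM_WORKERS` are hypothetical. On a real Cell, each thread body would instead load and run an SPE program through libspe2 (for example with `spe_context_run`).

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* hypothetical worker count for this sketch */

/* One slice of the input handed to each worker (hypothetical type). */
typedef struct {
    const float *data;    /* start of the slice */
    size_t       count;   /* number of elements in the slice */
    float        result;  /* partial sum written by the worker */
} work_t;

/* Thread body: stands in for the numerical kernel an SPE would run. */
static void *worker(void *arg)
{
    work_t *w = (work_t *)arg;
    float acc = 0.0f;
    for (size_t i = 0; i < w->count; i++)
        acc += w->data[i];
    w->result = acc;
    return NULL;
}

/* Conductor: split the array, start one thread per slice, join, combine. */
float parallel_sum(const float *data, size_t count)
{
    pthread_t threads[NUM_WORKERS];
    work_t    work[NUM_WORKERS];
    int       started[NUM_WORKERS];
    size_t    chunk = count / NUM_WORKERS;
    float     total = 0.0f;

    for (int i = 0; i < NUM_WORKERS; i++) {
        size_t offset = (size_t)i * chunk;
        work[i].data   = data + offset;
        /* the last worker also takes the remainder */
        work[i].count  = (i == NUM_WORKERS - 1) ? count - offset : chunk;
        work[i].result = 0.0f;
        started[i] = (pthread_create(&threads[i], NULL, worker, &work[i]) == 0);
        if (!started[i])
            worker(&work[i]);   /* fall back to running the slice inline */
    }

    for (int i = 0; i < NUM_WORKERS; i++) {
        if (started[i])
            pthread_join(threads[i], NULL);
        total += work[i].result;
    }
    return total;
}
```

Calling `parallel_sum(values, n)` from the main thread plays the role of the PPE: it only distributes slices and combines partial results, while the workers do the arithmetic.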
Introduction to SIMD programming
  128 bits registers (SSE2, Altivec, SPE)
    2 x double
    4 x float
    4 x int
  Introduce new vector types
  1 vector float operation == 4 float operations
  Operations: logical (and, or, cmp, ...), arithmetic (+, *, abs, ...), shuffling
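To make "1 vector float operation == 4 float operations" concrete, here is a minimal sketch that uses the GCC/Clang generic vector extension instead of SPE intrinsics, so it can be tried on any machine with one of those compilers. The type name `v4f` and the functions `axpy4` and `hsum4` are hypothetical names chosen for this example.

```c
/* Four floats packed in one 128-bit value (GCC/Clang vector extension). */
typedef float v4f __attribute__((vector_size(16)));

/* Computes a * x + y on the four lanes with ordinary operators. */
v4f axpy4(v4f a, v4f x, v4f y)
{
    return a * x + y;
}

/* Sums the four lanes of a vector back into one scalar. */
float hsum4(v4f v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

With `a = {2, 2, 2, 2}`, `x = {1, 2, 3, 4}` and `y = {1, 1, 1, 1}`, a single call to `axpy4` produces the four results `{3, 5, 7, 9}`. Whether the compiler actually emits one SIMD instruction depends on the target and optimization flags.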
SIMD programming – the big picture
Not always SIMDizable
SIMD programming with libspe2 and gcc-spu
  #include <spu_intrinsics.h>
  Avoid scalar types; use vector types instead:
    vec_float4, vec_double2, vec_char16, ...
  Vector intrinsics:
    d = spu_and(a, b);      /* bitwise AND */
    e = spu_madd(a, b, c);  /* a * b + c on each lane */
  Compile:
    spu-gcc pure_spe_prog.c -o pure_spe_prog.elf
Branch elimination
  Avoid branching (if / else); select the result per lane instead:
    c = spu_sel(a, b, spu_cmpgt(a, d));  /* c = (a > d) ? b : a, on each lane */
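The select idiom can be emulated in portable C to show what happens inside one 32-bit lane. This is a sketch, not the SPU implementation: `cmpgt_mask`, `select_bits` and `select_if_greater` are hypothetical helpers that mirror the roles of `spu_cmpgt` and `spu_sel` for a single lane.

```c
#include <stdint.h>

/* All-ones mask when a > d, all-zeros otherwise (one lane of a vector compare). */
uint32_t cmpgt_mask(int32_t a, int32_t d)
{
    return (uint32_t)0 - (uint32_t)(a > d);
}

/* Takes bits from b where mask is 1 and from a where mask is 0. */
uint32_t select_bits(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Same result as: return (a > d) ? b : a; written without if / else. */
int32_t select_if_greater(int32_t a, int32_t b, int32_t d)
{
    return (int32_t)select_bits((uint32_t)a, (uint32_t)b, cmpgt_mask(a, d));
}
```

Both candidate values are always computed and the mask merely picks one, which is why this style suits a core where a mispredicted branch is expensive.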
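A sample SPE program
#include <spu_intrinsics.h>

/* Accumulator readable either as one vector or as four floats. */
volatile union {
    vec_float4 vec;
    float part[4];
} sum;

/* Assumes xp and yp are 16-byte aligned and size is a multiple of 4. */
float dot_product(const float* xp, const float* yp, const int size) {
    sum.vec = (vec_float4) {0, 0, 0, 0};
    vec_float4* xvp = (vec_float4*) xp;
    vec_float4* yvp = (vec_float4*) yp;
    vec_float4* xvp_end = xvp + size / 4;
    while (__builtin_expect(xvp < xvp_end, 1)) {
        sum.vec = spu_madd(*xvp, *yvp, sum.vec);  /* sum += x * y on 4 lanes */
        xvp++;
        yvp++;
    }
    /* Horizontal sum of the four partial results. */
    return sum.part[0] + sum.part[1] + sum.part[2] + sum.part[3];
}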
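The sample above only covers `size / 4` full vectors, so any leftover elements are ignored when `size` is not a multiple of 4. A portable sketch of one way to handle that tail follows; a plain four-element accumulator stands in for `vec_float4`, and `dot_product_with_tail` is a hypothetical name.

```c
#include <stddef.h>

/* Dot product that processes 4 elements per step, then a scalar tail. */
float dot_product_with_tail(const float *x, const float *y, size_t size)
{
    float  lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};  /* mimics the four vector lanes */
    size_t vec_end = size - size % 4;            /* largest multiple of 4 <= size */
    size_t i;

    /* Main loop: one iteration corresponds to one vector multiply-add. */
    for (i = 0; i < vec_end; i += 4) {
        lane[0] += x[i]     * y[i];
        lane[1] += x[i + 1] * y[i + 1];
        lane[2] += x[i + 2] * y[i + 2];
        lane[3] += x[i + 3] * y[i + 3];
    }

    float sum = lane[0] + lane[1] + lane[2] + lane[3];

    /* Tail: the 0 to 3 elements that do not fill a whole vector. */
    for (; i < size; i++)
        sum += x[i] * y[i];

    return sum;
}
```

An alternative on the SPE side is to pad the input arrays with zeros up to a multiple of 4, which keeps the inner loop free of any scalar code.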
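DMA with the SPUs' Memory Flow Controllers
  #include <spu_mfcio.h>
  /* main memory -> local store */
  mfc_get(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
  /* local store -> main memory */
  mfc_put(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
  /* get with barrier: ordered with respect to the other commands of the same tag group */
  mfc_getb(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
  /* select the tag to wait on, then block until its transfers complete */
  mfc_write_tag_mask(1 << DMA_TAG);
  spu_mfcstat(MFC_TAG_UPDATE_ALL);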
Double buffering – the problem
Double buffering – the big picture
Double buffering with MFC
  1. SPU queues MFC GET to fill buffer #1
  2. SPU queues MFC GET to fill buffer #2
  3. SPU waits for buffer #1 to finish filling
  4. SPU processes buffer #1
  5. SPU queues MFC PUT back content of buffer #1
  6. SPU queues MFC GETB to refill buffer #1
  7. SPU waits for buffer #2 to finish filling
  8. SPU processes buffer #2
  (...)
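The ordering of the steps above can be sketched in portable C. This is a simulation under stated assumptions, not SPU code: synchronous `memcpy` calls stand in for `mfc_get` / `mfc_put`, so nothing actually overlaps here and only the control flow is shown. All names (`fake_get`, `fake_put`, `process`, `double_buffered_scale`, `CHUNK`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4   /* elements per buffer (hypothetical, tiny for readability) */

/* Stand-in for mfc_get: on an SPU this would be queued and run asynchronously. */
static void fake_get(float *local, const float *main_mem, size_t n)
{
    memcpy(local, main_mem, n * sizeof(float));
}

/* Stand-in for mfc_put. */
static void fake_put(const float *local, float *main_mem, size_t n)
{
    memcpy(main_mem, local, n * sizeof(float));
}

/* The "processing" step of this sketch: doubles each element in place. */
static void process(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= 2.0f;
}

/* Doubles data[0..count); count is assumed to be a multiple of CHUNK. */
void double_buffered_scale(float *data, size_t count)
{
    float  buffers[2][CHUNK];
    size_t nchunks = count / CHUNK;

    if (nchunks == 0)
        return;

    /* Step 1: start filling the first buffer. */
    fake_get(buffers[0], data, CHUNK);

    for (size_t c = 0; c < nchunks; c++) {
        int cur  = (int)(c & 1);
        int next = cur ^ 1;

        /* Queue the fetch of the following chunk into the other buffer.
           On real hardware a refill would be a barriered get (mfc_getb)
           so that it is ordered after the earlier put from that buffer. */
        if (c + 1 < nchunks)
            fake_get(buffers[next], data + (c + 1) * CHUNK, CHUNK);

        /* On real hardware: wait here for the tag of buffers[cur]. */
        process(buffers[cur], CHUNK);

        /* Write the processed chunk back to main memory. */
        fake_put(buffers[cur], data + c * CHUNK, CHUNK);
    }
}
```

The point of the pattern is that, with asynchronous DMA, the fetch into `buffers[next]` proceeds while `process` works on `buffers[cur]`, so the SPU rarely sits idle waiting for data.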
Some resources
  Cell BE Programming Tutorial (ibm.com, 190 pages)
  IBM developerWorks short programming tutorials
    Search for articles by Jonathan Bartlett
  Barcelona Supercomputing Center (software)
    http://www.bsc.es/projects/deepcomputing/linuxoncell/
  PS3 programming workshops (videos)
    http://www.cc.gatech.edu/~bader/CellProgramming.html
  #ps3dev on freenode
Thanks, credits, licensing
  Most diagrams come from the excellent GFDL'd tutorial by Geoff Levand (Sony Corp)
    http://www.kernel.org/pub/linux/kernel/people/geoff/cell
  Pictures and trademarks belong to their respective owners (Sony, IBM, Universities, Folding@Home, PS3GRID, ...)
  All remaining work is GFDL