Cell Broadband Engine
Programming Handbook
Including the PowerXCell 8i Processor
Version 1.11
May 12, 2008
Copyright and Disclaimer
© Copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation 2006, 2008.
All Rights Reserved
Printed in the United States of America May 2008
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, and service names may be trademarks or service marks of others.
All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS IS” BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.
IBM Systems and Technology Group
2070 Route 52, Bldg. 330
Hopewell Junction, NY 12533-6351
The IBM home page can be found at ibm.com®.
The IBM semiconductor solutions home page can be found at ibm.com/chips.
Version 1.11, May 12, 2008
Programming Handbook
Cell Broadband Engine
Version 1.11, May 12, 2008
Contents, Page 3 of 884
Contents
List of Figures .... 19
List of Tables .... 23
Preface .... 29
Related Publications .... 29
Conventions and Notation .... 30
Referencing Registers, Fields, and Bit Ranges .... 31
Terminology .... 32
Reserved Regions of Memory and Registers .... 32
Revision Log .... 33
1. Overview of CBEA Processors .... 39
1.1 Background .... 40
1.1.1 Motivation .... 40
1.1.2 Power, Memory, and Frequency .... 42
1.1.3 Scope of this Handbook .... 42
1.2 Hardware Environment .... 44
1.2.1 The Processor Elements .... 44
1.2.2 Element Interconnect Bus .... 44
1.2.3 Memory Interface Controller .... 45
1.2.4 Cell Broadband Engine Interface Unit .... 45
1.3 Programming Environment .... 46
1.3.1 Instruction Sets .... 46
1.3.2 Storage Domains and Interfaces .... 46
1.3.3 Byte Ordering and Bit Numbering .... 48
1.3.4 Runtime Environment .... 49
2. PowerPC Processor Element .... 51
2.1 PowerPC Processor Unit .... 52
2.2 PowerPC Processor Storage Subsystem .... 54
2.3 PPE Registers .... 54
2.4 PowerPC Instructions .... 57
2.4.1 Data Types .... 57
2.4.2 Addressing Modes .... 57
2.4.3 Instructions .... 58
2.5 Vector/SIMD Multimedia Extension Instructions .... 59
2.5.1 SIMD Vectorization .... 59
2.5.2 Data Types .... 61
2.5.3 Addressing Modes .... 61
2.5.4 Instruction Types .... 61
2.5.5 Instructions .... 62
2.5.6 Graphics Rounding Mode .... 62
2.6 Vector/SIMD Multimedia Extension C/C++ Language Intrinsics .... 62
2.6.1 Vector Data Types .... 62
2.6.2 Vector Literals .... 63
2.6.3 Intrinsics .... 63
3. Synergistic Processor Elements .... 65
3.1 Synergistic Processor Unit .... 65
3.1.1 Local Storage .... 66
3.1.2 Register File .... 69
3.1.3 Execution Units .... 70
3.1.4 Floating-Point Support .... 70
3.2 Memory Flow Controller .... 72
3.2.1 Channels .... 74
3.2.2 Mailboxes and Signalling .... 74
3.2.3 MFC Commands and Command Queues .... 74
3.2.4 Direct Memory Access Controller .... 75
3.2.5 Synergistic Memory Management Unit .... 76
3.3 SPU Instruction Set .... 76
3.3.1 Data Types .... 76
3.3.2 Instructions .... 77
3.4 SPU C/C++ Language Intrinsics .... 77
3.4.1 Vector Data Types .... 78
3.4.2 Vector Literals .... 78
3.4.3 Intrinsics .... 78
4. Virtual Storage Environment .... 79
4.1 Introduction .... 79
4.2 PPE Memory Management .... 80
4.2.1 Memory Management Unit .... 81
4.2.2 Address-Translation Sequence .... 82
4.2.3 Enabling Address Translation .... 83
4.2.4 Effective-to-Real-Address Translation .... 83
4.2.5 Segmentation .... 85
4.2.6 Paging .... 87
4.2.7 Translation Lookaside Buffer .... 93
4.2.8 Real Addressing Mode .... 100
4.2.9 Effective Addresses in 32-Bit Mode .... 103
4.3 SPE Memory Management .... 103
4.3.1 Synergistic Memory Management Unit .... 103
4.3.2 Enabling Address Translation .... 104
4.3.3 Segmentation .... 105
4.3.4 Paging .... 108
4.3.5 Translation Lookaside Buffer .... 108
4.3.6 Real Addressing Mode .... 117
4.3.7 Exception Handling and Storage Protection .... 118
5. Memory Map .... 121
5.1 Introduction .... 121
5.1.1 Configuration-Ring Initialization .... 123
5.1.2 Allocated Regions of Memory .... 123
5.1.3 Reserved Regions of Memory .... 126
5.1.4 The Guarded Attribute .... 126
5.2 PPE Memory Map .... 126
5.2.1 PPE Memory-Mapped Registers .... 126
5.2.2 Predefined Real-Address Locations .... 127
5.3 SPE Memory Map .... 127
5.3.1 SPE Local-Storage Memory Map .... 128
5.3.2 SPE Memory-Mapped Registers .... 129
5.4 BEI Memory-Mapped Registers .... 130
5.4.1 I/O .... 131
6. Cache Management .... 133
6.1 PPE Caches .... 133
6.1.1 Configuration .... 134
6.1.2 Overview of PPE Cache .... 134
6.1.3 L1 Caches .... 136
6.1.4 Branch History Table and Link Stack .... 141
6.1.5 L2 Cache .... 141
6.1.6 Instructions for Managing the L1 and L2 Caches .... 146
6.1.7 Effective-to-Real-Address Translation Arrays .... 150
6.1.8 Translation Lookaside Buffer .... 150
6.1.9 Instruction-Prefetch Queue Management .... 150
6.1.10 Load Subunit Management .... 150
6.2 SPE Caches .... 151
6.2.1 Translation Lookaside Buffer .... 151
6.2.2 Atomic Unit and Cache .... 151
6.3 Replacement Management Tables .... 154
6.3.1 PPE TLB Replacement Management Table .... 154
6.3.2 PPE L2 Replacement Management Table .... 157
6.3.3 SPE TLB Replacement Management Table .... 158
6.4 I/O Address-Translation Caches .... 159
7. I/O Architecture .... 161
7.1 Overview .... 161
7.1.1 I/O Interfaces .... 161
7.1.2 System Configurations .... 162
7.1.3 I/O Addressing .... 164
7.2 Data and Access Types .... 165
7.2.1 Data Lengths and Alignments .... 165
7.2.2 Atomic Accesses .... 166
7.3 Registers and Data Structures .... 166
7.3.1 IOCmd Configuration Register .... 166
7.3.2 I/O Segment Table Origin Register .... 166
7.3.3 I/O Segment Table .... 169
7.3.4 I/O Page Table .... 171
7.3.5 IOC Base Address Registers .... 174
7.3.6 I/O Exception Status Register .... 176
7.4 I/O Address Translation .... 176
7.4.1 Translation Overview .... 176
7.4.2 Translation Steps .... 178
7.5 I/O Exceptions .... 180
7.5.1 I/O Exception Causes .... 180
7.5.2 I/O Exception Status Register .... 181
7.5.3 I/O Exception Mask Register .... 181
7.5.4 I/O-Exception Response .... 181
7.6 I/O Address-Translation Caches .... 181
7.6.1 IOST Cache .... 181
7.6.2 IOPT Cache .... 183
7.7 I/O Storage Model .... 188
7.7.1 Memory Coherence .... 188
7.7.2 Storage-Access Ordering .... 189
7.7.3 I/O Accesses to Other I/O Units through an IOIF .... 194
7.7.4 Examples .... 195
8. Resource Allocation Management .... 203
8.1 Introduction .... 203
8.2 Requesters .... 206
8.2.1 PPE and SPEs .... 206
8.2.2 I/O .... 206
8.3 Managed Resources .... 207
8.4 Tokens .... 208
8.4.1 Tokens Required for Single-CBEA-Processor Systems .... 208
8.4.2 Operations Requiring No Token .... 212
8.4.3 Tokens Required for Multi-CBEA-Processor Systems .... 213
8.5 Token Manager .... 213
8.5.1 Request Tracking .... 213
8.5.2 Token Granting .... 214
8.5.3 Unallocated RAG .... 215
8.5.4 High-Priority Token Requests .... 216
8.5.5 Memory Tokens .... 216
8.5.6 I/O Tokens .... 220
8.5.7 Unused Tokens .... 220
8.5.8 Memory Banks, IOIF Allocation Rates, and Unused Tokens .... 220
8.5.9 Token Request and Grant Example .... 221
8.5.10 Allocation Percentages .... 225
8.5.11 Efficient Determination of TKM Priority Register Values .... 226
8.5.12 Feedback from Resources to Token Manager .... 228
8.6 Configuration of PPE, SPEs, MIC, and IOC .... 229
8.6.1 Configuration Register Summary .... 229
8.6.2 SPE Address-Range Checking .... 231
8.7 Changing Resource-Management Registers with MMIO Stores .... 233
8.7.1 Changes to the RAID .... 233
8.7.2 Changing a Requester’s Token-Request Enable .... 234
8.7.3 Changing a Requester’s Address Map .... 235
8.7.4 Changing a Requester’s Use of Multiple Tokens per Access .... 236
8.7.5 Changing Feedback to the TKM .... 236
8.7.6 Changing TKM Registers .... 236
8.8 Latency Between Token Requests and Token Grants .... 237
8.9 Hypervisor Interfaces .... 237
9. PPE Interrupts .... 239
9.1 Introduction .... 239
9.2 Summary of Interrupt Architecture .... 240
9.3 Interrupt Registers .... 244
9.4 Interrupt Handling .... 245
9.5 Interrupt Vectors and Definitions .... 246
9.5.1 System Reset Interrupt (Selectable or x‘00..00000100’) .... 248
9.5.2 Machine Check Interrupt (x‘00..00000200’) .... 249
9.5.3 Data Storage Interrupt (x‘00..00000300’) .... 251
9.5.4 Data Segment Interrupt (x‘00..00000380’) .... 252
9.5.5 Instruction Storage Interrupt (x‘00..00000400’) .... 253
9.5.6 Instruction Segment Interrupt (x‘00..00000480’) .... 254
9.5.7 External Interrupt (x‘00..00000500’) .... 254
9.5.8 Alignment Interrupt (x‘00..00000600’) .... 255
9.5.9 Program Interrupt (x‘00..00000700’) .... 256
9.5.10 Floating-Point Unavailable Interrupt (x‘00..00000800’) .... 257
9.5.11 Decrementer Interrupt (x‘00..00000900’) .... 257
9.5.12 Hypervisor Decrementer Interrupt (x‘00..00000980’) .... 258
9.5.13 System Call Interrupt (x‘00..00000C00’) .... 258
9.5.14 Trace Interrupt (x‘00..00000D00’) .... 259
9.5.15 VXU Unavailable Interrupt (x‘00..00000F20’) .... 260
9.5.16 System Error Interrupt (x‘00..00001200’) .... 260
9.5.17 Maintenance Interrupt (x‘00..00001600’) .... 261
9.5.18 Thermal Management Interrupt (x‘00..00001800’) .... 263
9.6 Direct External Interrupts .... 265
9.6.1 Interrupt Presentation .... 265
9.6.2 IIC Interrupt Registers .... 266
9.6.3 SPU and MFC Interrupts .... 271
9.6.4 Other External Interrupts .... 272
9.7 Mediated External Interrupts .... 276
9.7.1 Mediated External Interrupt Architecture .... 276
9.7.2 Mediated External Interrupt Implementation .... 279
9.8 SPU and MFC Interrupts Routed to the PPE .... 280
9.8.1 Interrupt Types and Classes .... 280
9.8.2 Interrupt Registers .... 282
9.8.3 Interrupt Definitions .... 286
9.8.4 Handling SPU and MFC Interrupts .... 289
9.9 Thread Targets for Interrupts .... 291
9.10 Interrupt Priorities .... 291
9.11 Interrupt Latencies .... 293
9.12 Machine State Register Settings Due to Interrupts .... 293
9.13 Interrupts and Hypervisor .... 295
9.14 Interrupts and Multithreading .... 295
9.15 Checkstop
9.16 Use of an External Interrupt Controller
9.17 Relationship Between CBEA Processor and PowerPC Interrupts
10. PPE Multithreading
10.1 Multithreading Guidelines
10.2 Thread Resources
10.2.1 Registers
10.2.2 Arrays, Queues, and Other Structures
10.2.3 Pipeline Sharing and Support for Multithreading
10.3 Thread States
10.3.1 Privilege States
10.3.2 Suspended or Enabled State
10.3.3 Blocked or Stalled State
10.4 Thread Control and Status Registers
10.4.1 Machine State Register (MSR)
10.4.2 Hardware Implementation Register 0 (HID0)
10.4.3 Logical Partition Control Register (LPCR)
10.4.4 Control Register (CTRL)
10.4.5 Thread Status Register Local and Remote (TSRL and TSRR)
10.4.6 Thread Switch Control Register (TSCR)
10.4.7 Thread Switch Time-Out Register (TTR)
10.5 Thread Priority
10.5.1 Thread-Priority Combinations
10.5.2 Choosing Useful Thread Priorities
10.5.3 Examples of Priority Combinations on Instruction Scheduling
10.6 Thread Control and Configuration
10.6.1 Resuming and Suspending Threads
10.6.2 Setting the Instruction-Dispatch Policy: Thread Priority and Temporary Stalling
10.6.3 Preventing Starvation: Forward-Progress Monitoring
10.6.4 Multithreading Operating-State Switch
10.7 Pipeline Events and Instruction Dispatch
10.7.1 Instruction-Dispatch Rules
10.7.2 Pipeline Events that Stall Instruction Dispatch
10.8 Suspending and Resuming Threads
10.8.1 Suspending a Thread
10.8.2 Resuming a Thread
10.8.3 Exception and Interrupt Interactions With a Suspended Thread
10.8.4 Thread Targets and Behavior for Interrupts
11. Logical Partitions and a Hypervisor
11.1 Introduction
11.1.1 The Hypervisor and the Operating Systems
11.1.2 Partitioning Resources
11.1.3 An Example Flowchart
11.2 PPE Logical-Partitioning Facilities
11.2.1 Enabling Hypervisor State
11.2.2 Hypervisor-State Registers
11.2.3 Controlling Real Memory
11.2.4 Controlling Interrupts and Environment
11.3 SPE Logical-Partitioning Facilities
11.3.1 Access Privilege
11.3.2 Memory-Management Facilities
11.3.3 Controlling Interrupts
11.3.4 Other SPE Management Facilities
11.4 I/O-Address Translation
11.4.1 IOC Memory Management Units
11.4.2 I/O Segment and Page Tables
11.5 Resource Allocation Management
11.5.1 Combining Logical Partitions with Resource Allocation
11.5.2 Resource Allocation Groups and the Token Manager
11.6 Power Management
11.6.1 Entering Low-Power States
11.6.2 Thread State Suspension and Resumption
11.7 Fault Isolation
11.8 Code Sample
11.8.1 Error Codes and Hypervisor-Call (hcall) Tokens
11.8.2 C Functions for PowerPC 64-bit ELF Hypervisor Call
12. SPE Context Switching
12.1 Introduction
12.2 Data Structures
12.2.1 Local Storage Context Save Area
12.2.2 Context Save Area
12.3 Overview of SPE Context-Switch Sequence
12.3.1 Save SPE Context
12.3.2 Restore SPE Context
12.4 Implementation Considerations
12.4.1 Locking
12.4.2 Watchdog Timers
12.4.3 Waiting for Events
12.4.4 PPE’s SPU Channel Access Facility
12.4.5 SPE Interrupts
12.4.6 Suspending the MFC DMA Queue
12.4.7 SPE Context-Save Sequence and Context-Restore Sequence Code
12.4.8 SPE Parameter Passing
12.4.9 Storage for SPE Context-Save Sequence and Context-Restore Sequence Code
12.4.10 Harvesting an SPE
12.4.11 Scheduling
12.4.12 Light-Weight SPE Context Save
12.5 Detailed Steps for SPE Context Switch
12.5.1 Context-Save Sequence
12.5.2 Context-Restore Sequence
12.6 Considerations for Hypervisors
13. Time Base and Decrementers
13.1 Introduction
13.2 Time-Base Facility
13.2.1 Clock Domains
13.2.2 Time-Base Registers
13.2.3 Time-Base Frequency
13.2.4 Time-Base Sync Mode Controls
13.2.5 Reading and Writing the TB Register
13.2.6 Computing Time-of-Day
13.3 Decrementers
13.3.1 PPE Decrementers
13.3.2 SPE Decrementers
13.3.3 Using an SPU Decrementer to Monitor SPU Code Performance
14. Objects, Executables, and SPE Loading
14.1 Introduction
14.2 ELF Overview and Extensions
14.2.1 Overview
14.2.2 SPE-ELF Extensions
14.3 Runtime Initializations and Requirements
14.3.1 PPE Initial Machine State
14.3.2 SPE Initial Machine State for Linux
14.4 Linker Requirements
14.4.1 SPE Linker Requirements
14.4.2 PPE Linker Requirements
14.5 The CESOF Format
14.5.1 CESOF Overview
14.5.2 CESOF Use Convention of ELF
14.5.3 Embedding an SPE-ELF Executable in a PPE-ELF Object: The .spu.elf Section
14.5.4 The spe_program_handle Data Structure
14.5.5 The TOE: Accessing Symbol Values Defined in EA Space
14.5.6 Future Software Tool Chain Enhancements for CESOF
14.6 SPE Runtime Loader
14.6.1 Runtime Loader Overview
14.6.2 SPE Runtime Loader Requirements
14.6.3 Example SPE Runtime Loader Framework Definition
14.7 SPE Execution Environment
14.7.1 Signal Types for the SPE Stop-and-Signal Instruction
15. Power and Thermal Management
15.1 Power Management
15.1.1 Slow State
15.1.2 PPE Pause (0) State
15.1.3 SPU Pause State
15.1.4 MFC Pause State
15.2 Thermal Management
15.2.1 Thermal-Management Operation
15.2.2 Configuration-Ring Settings
15.2.3 Thermal Registers
15.2.4 Thermal Sensor Status Registers
15.2.5 Thermal Sensor Interrupt Registers
15.2.6 Dynamic Thermal-Management Registers
16. Performance Monitoring
16.1 How It Works
16.2 Events (Signals)
16.3 Performance Counters
16.4 Trace Array
17. SPE Channel and Related MMIO Interface
17.1 Introduction
17.1.1 An SPE’s Use of its Own Channels
17.1.2 Access to Channel Functions by the PPE and other SPEs
17.1.3 Channel Characteristics
17.1.4 Channel Summary
17.1.5 Channel Instructions
17.1.6 Channel Capacity and Blocking
17.2 SPU Event-Management Channels
17.3 SPU Signal-Notification Channels
17.4 SPU Decrementer
17.4.1 SPU Write Decrementer Channel
17.4.2 SPU Read Decrementer Channel
17.5 MFC Write Multisource Synchronization Request Channel
17.6 SPU Read Machine Status Channel
17.7 SPU Write State Save-and-Restore Channel
17.8 SPU Read State Save-and-Restore Channel
17.9 MFC Command Parameter Channels
17.9.1 MFC Local Storage Address Channel
17.9.2 MFC Effective Address High Channel
17.9.3 MFC Effective Address Low or List Address Channel
17.9.4 MFC Transfer Size or List Size Channel
17.9.5 MFC Command Tag Identification Channel
17.9.6 MFC Class ID and MFC Command Opcode Channel
17.10 MFC Tag-Group Management Channels
17.10.1 MFC Write Tag-Group Query Mask Channel
17.10.2 MFC Read Tag-Group Query Mask Channel
17.10.3 MFC Write Tag Status Update Request Channel
17.10.4 MFC Read Tag-Group Status Channel
17.10.5 MFC Read List Stall-and-Notify Tag Status Channel
17.10.6 MFC Write List Stall-and-Notify Tag Acknowledgment Channel
17.11 MFC Read Atomic Command Status Channel
17.12 SPU Mailbox Channels
18. SPE Events
18.1 Introduction
18.2 Events and Event-Management Channels
18.2.1 Event Conditions and Bit Definitions for Event-Management Channels
18.2.2 Pending Event Register (Internal, SPE-Hidden)
18.2.3 SPU Read Event Status
18.2.4 SPU Write Event Mask
18.2.5 SPU Write Event Acknowledgment
18.2.6 SPU Read Event Mask
18.3 SPU Interrupt Facility
18.4 Interrupt Address Save-and-Restore Channels
18.4.1 SPU Read State Save-and-Restore
18.4.2 SPU Write State Save-and-Restore
18.4.3 Nested Interrupts Using SPU Write State Save-and-Restore
18.5 Event-Handling Protocols
18.5.1 Synchronous Event Handling Using Polling or Stalling
18.5.2 Asynchronous Event Handling Using Interrupts
18.5.3 Protecting Critical Sections from Interruption
18.6 Event-Specific Handling Guidelines
18.6.1 Protocol with Multiple Events Enabled
18.6.2 Procedure for Handling the Multisource Synchronization Event
18.6.3 Procedure for Handling the Privileged Attention Event
18.6.4 Procedure for Handling the Lock-Line Reservation Lost Event
18.6.5 Procedure for Handling the Signal-Notification 1 Available Event
18.6.6 Procedure for Handling the Signal-Notification 2 Available Event
18.6.7 Procedure for Handling the SPU Write Outbound Mailbox Available Event
18.6.8 Procedure for Handling the SPU Write Outbound Interrupt Mailbox Available Event
18.6.9 Procedure for Handling the SPU Decrementer Event
18.6.10 Procedure for Handling the SPU Read Inbound Mailbox Available Event
18.6.11 Procedure for Handling the MFC SPU Command Queue Available Event
18.6.12 Procedure for Handling the DMA List Command Stall-and-Notify Event
18.6.13 Procedure for Handling the Tag-Group Status Update Event
18.7 Developing a Basic Interrupt Handler
18.7.1 Basic Interrupt Protocol Features and Design
18.7.2 FLIH Design
18.7.3 SLIH Design and Registering SLIH Functions
18.7.4 Example Application Code
18.8 Nested Interrupt Handling
18.8.1 Nested Handler Design
18.8.2 FLIH Design for Nested Interrupts
18.9 Using a Dedicated Interrupt Stack
18.10 Sample Applications
18.10.1 SPU Decrementer Event
18.10.2 Tag-Group Status Update Event
18.10.3 DMA List Command Stall-and-Notify Event
18.10.4 MFC SPU Command Queue Available Event
18.10.5 SPU Read Inbound Mailbox Available Event
18.10.6 SPU Signal-Notification Available Event
18.10.7 Lock-Line Reservation Lost Event
18.10.8 Privileged Attention Event
-
Programming Handbook
Cell Broadband Engine
Version 1.11, May 12, 2008
Contents, Page 13 of 884
19. DMA Transfers and Interprocessor Communication .......... 513
19.1 Introduction .......... 513
19.2 MFC Commands .......... 514
19.2.1 DMA Commands .......... 516
19.2.2 DMA List Commands .......... 518
19.2.3 Synchronization Commands .......... 518
19.2.4 Command Modifiers .......... 519
19.2.5 Tag Groups .......... 519
19.2.6 MFC Command Issue .......... 521
19.2.7 Replacement Class ID and Transfer Class ID .......... 521
19.2.8 DMA-Command Completion .......... 522
19.3 PPE-Initiated DMA Transfers .......... 523
19.3.1 MFC Command Issue .......... 523
19.3.2 MFC Command-Queue Control Registers .......... 525
19.3.3 DMA-Command Issue Status and Errors .......... 525
19.4 SPE-Initiated DMA Transfers .......... 529
19.4.1 MFC Command Issue .......... 530
19.4.2 MFC Command-Queue Monitoring Channels .......... 531
19.4.3 DMA Command Issue Status and Errors .......... 532
19.4.4 DMA List Command Example .......... 536
19.5 Performance Guidelines for MFC Commands .......... 539
19.6 Mailboxes .......... 539
19.6.1 Reading and Writing Mailboxes .......... 540
19.6.2 Mailbox Blocking .......... 541
19.6.3 Dealing with Anticipated Messages .......... 541
19.6.4 Uses of Mailboxes .......... 541
19.6.5 SPU Outbound Mailboxes .......... 542
19.6.6 SPU Inbound Mailbox .......... 547
19.7 Signal Notification .......... 551
19.7.1 SPU Signalling Channels .......... 551
19.7.2 Uses of Signaling .......... 552
19.7.3 Mode Configuration .......... 552
19.7.4 SPU Signal Notification 1 Channel .......... 553
19.7.5 SPU Signal Notification 2 Channel .......... 553
19.7.6 Sending Signals .......... 553
19.7.7 Receiving Signals .......... 556
19.7.8 Differences Between Mailboxes and Signal Notification .......... 559
20. Shared-Storage Synchronization .......... 561
20.1 Shared-Storage Ordering .......... 561
20.1.1 Storage Model .......... 561
20.1.2 PPE Ordering Instructions .......... 564
20.1.3 SPU Ordering Instructions .......... 568
20.1.4 MFC Ordering Mechanisms .......... 572
20.1.5 MFC Multisource Synchronization Facility .......... 577
20.1.6 Scenarios for Using Ordering Mechanisms .......... 584
20.2 PPE Atomic Synchronization .......... 585
20.2.1 Atomic Synchronization Instructions .......... 585
20.2.2 PPE Synchronization Primitives .......... 587
20.2.3 SPE Synchronization Primitives .......... 590
20.3 SPE Atomic Synchronization .......... 597
20.3.1 MFC Commands for Atomic Updates .......... 597
20.3.2 The MFC Read Atomic Command Status Channel .......... 599
20.3.3 Avoiding Livelocks .......... 599
20.3.4 Synchronization Primitives .......... 601
21. Parallel Programming .......... 609
21.1 Challenges .......... 609
21.2 Patterns of Parallel Programming .......... 609
21.2.1 Terminology .......... 610
21.2.2 Finding Parallelism .......... 611
21.2.3 Strategies for Parallel Programming .......... 612
21.3 Steps for Parallelizing a Program .......... 614
21.3.1 Step 1: Understand the Problem .......... 614
21.3.2 Step 2: Choose Programming Tools and Technology .......... 614
21.3.3 Step 3: Develop High-Level Parallelization Strategy .......... 615
21.3.4 Step 4: Develop Low-Level Parallelization Strategy .......... 615
21.3.5 Step 5: Design Data Structures for Efficient Processing .......... 615
21.3.6 Step 6: Iterate and Refine .......... 616
21.3.7 Step 7: Fine-Tune .......... 616
21.4 Levels of Parallelism in the CBEA Processors .......... 617
21.4.1 SIMD Parallelization .......... 618
21.4.2 Superscalar Parallelization .......... 618
21.4.3 Hardware Multithreading .......... 618
21.4.4 Multiple Execution Units .......... 618
21.4.5 Multiple CBEA Processors .......... 619
21.5 Tools for Parallelization .......... 620
21.5.1 Language Extensions: Intrinsics and Directives .......... 620
21.5.2 Compiler Support for Single Shared-Memory Abstraction .......... 621
21.5.3 OpenMP Directives .......... 621
21.5.4 Compiler-Controlled Software Cache .......... 623
21.5.5 Compiler and Runtime Support for Code Partitioning .......... 626
21.5.6 Thread Library .......... 627
22. SIMD Programming .......... 629
22.1 SIMD Basics .......... 629
22.1.1 Converting Scalar Data to SIMD Data .......... 630
22.1.2 Approaching SIMD Coding Methodically .......... 634
22.1.3 Coding for Effective Auto-SIMDization .......... 645
22.2 Auto-SIMDizing Compilers .......... 647
22.2.1 Motivation and Challenges .......... 648
22.2.2 Examples of Invalid and Valid SIMDization .......... 650
22.3 SIMDization Framework for a Compiler .......... 654
22.3.1 Phase 1: Basic-Block Aggregation .......... 656
22.3.2 Phase 2: Short-Loop Aggregation .......... 656
22.3.3 Phase 3: Loop-Level Aggregation .......... 657
22.3.4 Phase 4: Alignment Devirtualization .......... 658
22.3.5 Phase 5: Length Devirtualization .......... 663
22.3.6 Phase 6: SIMD Code Generation and Instruction Scheduling .......... 664
22.3.7 SIMDization Example: Multiple Sources of SIMD Parallelism .......... 665
22.3.8 SIMDization Example: Multiple Data Lengths .......... 668
22.3.9 Vector Operations and Mixed-Mode SIMDization .......... 673
22.4 Other Compiler Optimizations .......... 674
22.4.1 OpenMP .......... 674
22.4.2 Subword Data Types .......... 674
22.4.3 Backend Scheduling for SPEs .......... 675
22.4.4 Interacting with Typical Optimizations .......... 676
23. Vector/SIMD Multimedia Extension and SPU Programming .......... 679
23.1 Architectural Differences .......... 679
23.1.1 Registers .......... 680
23.1.2 Data Types .......... 681
23.1.3 Instruction-Set Differences .......... 682
23.2 Porting SIMD Code from the PPE to the SPEs .......... 684
23.2.1 Code-Mapping Considerations .......... 684
23.2.2 Simple Macro Translation .......... 685
23.2.3 Full Functional Mapping .......... 688
23.2.4 Code-Portability Typedefs .......... 689
23.2.5 Compiler-Target Definition .......... 689
24. SPE Programming Tips .......... 691
24.1 DMA Transfers .......... 691
24.1.1 Initiating DMA Transfers from SPEs .......... 692
24.1.2 Overlapping DMA Transfers and Computation .......... 692
24.1.3 DMA Transfers and LS Accesses .......... 697
24.2 SPU Pipelines and Dual-Issue Rules .......... 698
24.3 Eliminating and Predicting Branches .......... 699
24.3.1 Function-Inlining and Loop-Unrolling .......... 700
24.3.2 Predication Using Select-Bits Instruction .......... 700
24.3.3 Branch Hints .......... 701
24.3.4 Program-Based Branch Prediction .......... 705
24.3.5 Profile or Linguistic Branch-Prediction .......... 706
24.3.6 Software Branch-Target Address Cache .......... 707
24.3.7 Using Control Flow to Record Branch History .......... 708
24.4 Loop Unrolling and Pipelining .......... 709
24.5 Offset Pointers .......... 712
24.6 Transformations and Table Lookups .......... 712
24.6.1 The Shuffle-Bytes Instruction .......... 712
24.6.2 Fast SIMD 8-Bit Table Lookups .......... 713
24.7 Integer Multiplies .......... 716
24.8 Scalar Code .......... 716
24.8.1 Scalar Loads and Stores .......... 716
24.8.2 Promoting Scalar Data Types to Vector Data Types .......... 718
24.9 Unaligned Loads .......... 718
Appendix A. PPE Instruction Set and Intrinsics .......... 723
A.1 PowerPC Instruction Set .......... 723
A.1.1 Data Types .......... 723
A.1.2 PPE Instructions .......... 723
A.1.3 Microcoded Instructions .......... 733
A.2 PowerPC Extensions in the PPE .......... 740
A.2.1 New PowerPC Instructions .......... 740
A.2.2 Implementation-Dependent Interpretation of PowerPC Instructions .......... 743
A.2.3 Optional PowerPC Instructions Implemented .......... 746
A.2.4 PowerPC Instructions Not Implemented .......... 747
A.2.5 Endian Support .......... 747
A.3 Vector/SIMD Multimedia Extension Instructions .......... 748
A.3.1 Data Types .......... 748
A.3.2 Vector/SIMD Multimedia Extension Instructions .......... 748
A.3.3 Graphics Rounding Mode .......... 752
A.4 C/C++ Language Extensions (Intrinsics) for Vector/SIMD Multimedia Extensions .......... 754
A.4.1 Vector Data Types .......... 754
A.4.2 Vector Literals .......... 755
A.4.3 Intrinsics .......... 756
A.5 Issue Rules .......... 760
A.6 Pipeline Stages .......... 762
A.6.1 Instruction-Unit Pipeline .......... 762
A.6.2 Vector/Scalar Unit Issue Queue .......... 764
A.6.3 Stall and Flush Points .......... 765
A.7 Compiler Optimizations .......... 767
A.7.1 Instruction Arrangement .......... 767
A.7.2 Avoiding Slow Instructions and Processor Modes .......... 767
A.7.3 Avoiding Dependency Stalls and Flushes .......... 768
A.7.4 General Recommendations .......... 770
Appendix B. SPU Instruction Set and Intrinsics .......... 771
B.1 SPU Instruction Set .......... 771
B.1.1 Data Types .......... 771
B.1.2 Instructions .......... 771
B.1.3 Fetch and Issue Rules .......... 779
B.1.4 Inline Prefetch and Instruction Runout .......... 783
B.2 C/C++ Language Extensions (Intrinsics) for SPU Instructions .......... 784
B.2.1 Vector Data Types .......... 784
B.2.2 Vector Literals .......... 786
B.2.3 Intrinsics .......... 787
B.2.4 Inline Assembly ..........................