Frank Casilio, Computer Engineering, May 15, 1997: Multithreaded Processors
Frank Casilio
Computer Engineering
May 15, 1997
Multithreaded Processors
1997 Frank Casilio, Computer Engineering
Problems with Multiprocessors
• Memory Latency
• Context Switching Time
• Communication/Synchronization Latency
• Cache Coherence
• Writes To Memory
• Poor Programming Model
Motivation
• Reduce/Tolerate Memory Latency
• General Purpose Machine
• Scalability
• Shared Memory
• Simpler Programming Model
Typical Ways To Reduce Latency
• On-Chip Cache
• Shortens Round Trip To Memory
• Fast Buses & Networks
• Hardware Synchronization
• Prefetching
Multi-Threading: The Concept
• Support For Multiple Concurrent Hardware Contexts
• Tolerates Latency Instead of Reducing It
• Swap Contexts During Latencies
• Experimental Systems Have Existed Since The 1950s
• Only 2 Commercial Systems Ever Produced
  • HEP
  • Tera MTA
Parameters That Affect Efficiency
• Number Of Contexts Supported
• Switching Overhead
• Run Length (Granularity)
• Average Latency To Be Hidden
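How these four parameters interact can be sketched with a simple analytic model of the kind commonly used for multithreaded processors. It is an illustration rather than something taken from the slides: the function name and the symbols `n` (contexts), `r` (run length in cycles), `c` (switch overhead in cycles) and `l` (latency in cycles) are assumptions, and the model assumes that every context behaves identically.

```python
def utilization(n, r, c, l):
    """Estimate the fraction of cycles spent on useful work.

    n: number of hardware contexts (assumed identical)
    r: run length, cycles of work between long-latency events
    c: context switch overhead in cycles
    l: latency to be hidden, in cycles
    """
    if n < 1 or r <= 0 or c < 0 or l < 0:
        raise ValueError("need n >= 1, r > 0, c >= 0, l >= 0")
    # Too few contexts: the processor idles while all of them wait.
    linear = n * r / (r + l)
    # Enough contexts: only the switch overhead is lost.
    saturated = r / (r + c)
    return min(linear, saturated)


if __name__ == "__main__":
    for contexts in (1, 5, 10, 20):
        print(contexts, utilization(contexts, r=10, c=0, l=90))
```

Under these assumptions, adding contexts helps only until the latency is fully covered; beyond that point the switch overhead sets the ceiling.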
Switching Theory
• Determines How Often Contexts Switch
• Two Different Types
  • Fine Grained
  • Coarse Grained
• Directly Related to Cost
Fine Grained Switching
• Switches Contexts Every Cycle
• Many Long-Latency Operations Tolerated
• Requires More Contexts
• Workload Requirements
• Can Simplify Overall Processor Complexity
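The trade-off above can be illustrated with a small cycle-level simulation. This is a sketch under stated assumptions, not a model of any particular machine: the function name is hypothetical, every context is assumed to wait a fixed `latency` cycles after issuing before it can issue again, and the processor picks the next ready context in round-robin order each cycle.

```python
def simulate_fine_grained(num_contexts, latency, cycles):
    """Return the fraction of cycles in which an instruction issues."""
    ready_at = [0] * num_contexts  # first cycle each context may issue
    next_ctx = 0                   # round-robin starting point
    issued = 0
    for cycle in range(cycles):
        for offset in range(num_contexts):
            ctx = (next_ctx + offset) % num_contexts
            if ready_at[ctx] <= cycle:
                # Issue one instruction, then this context waits.
                ready_at[ctx] = cycle + latency
                issued += 1
                next_ctx = (ctx + 1) % num_contexts
                break
    return issued / cycles


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, simulate_fine_grained(n, latency=8, cycles=800))
```

With fewer contexts than the latency in cycles, issue slots go empty; once the number of contexts reaches the latency, an instruction issues every cycle, which is why this scheme needs many contexts.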
Coarse Grained Switching
• Switches Contexts After A Couple Of Cycles
• Has Problems With Sporadic Latencies
• Requires Fewer Contexts
• Requires More Complex Processors
The TERA MTA
• First Commercial Multithreaded Machine Since 1978
• Uniform Shared Memory
• Scalable
• Direct Relationship b/w PE’s & Throughput
• Fine Grained Architecture
The Tera MTA Cont’d
• Toroidal Interconnection
• 12 Million Dollar Base System
• 16-256 Processor Versions
Processor Characteristics
• Support For 128 Threads
• 16 Protection Domains
• 333 MHz Nominal Speed
• 0 Context Switching Overhead!!!
• 1 GFLOP Peak Performance
Processor Characteristics Cont’d
• Load-Store Architecture
• 3 Addressing Modes
• 31 64-bit GPRs
• 3 Operations Per Instruction
  • 1 Memory Reference
  • 1 Arithmetic Operation
  • 1 Control (e.g., Branch)
• 6 kW Of Power Dissipation Per Processor
Interconnection Network
• 3-D Torus Contains p^(3/2) Nodes
• Packet Switching
• 3 Cycles of Latency Per Node
• Messages Are Assigned Random Priorities
• 164-Bit Packets
  • 64 Bits Are Data
• 2.67 GB/s Bandwidth In Each Direction
• 2 HIPPI Channels / Processor For Net Connection
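The network figures above lend themselves to a quick calculation. The sketch below is illustrative only: the function names and the example torus dimensions are hypothetical, routing is assumed to take the shortest path with wraparound in each dimension, and each hop is assumed to cost exactly the 3 cycles quoted per node.

```python
CYCLES_PER_NODE = 3   # from the slide
PACKET_BITS = 164     # from the slide
DATA_BITS = 64        # from the slide


def torus_hops(src, dst, dims):
    """Shortest hop count between two nodes of a torus with wraparound."""
    return sum(min((d - s) % n, (s - d) % n)
               for s, d, n in zip(src, dst, dims))


def network_latency_cycles(src, dst, dims):
    """Assumed one-way latency: 3 cycles for every hop taken."""
    return CYCLES_PER_NODE * torus_hops(src, dst, dims)


def payload_fraction():
    """Share of each packet that carries data."""
    return DATA_BITS / PACKET_BITS


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical small torus
    print(torus_hops((0, 0, 0), (3, 0, 0), dims))
    print(network_latency_cycles((0, 0, 0), (2, 2, 2), dims))
    print(round(payload_fraction(), 2))
```

Roughly 39% of each packet is data; the rest is presumably address and control information, which the slide does not break down.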
Memory
• 8, 16, 32 and 64 Bit Addressable
• 4 Bits per Word Of Access State For Synchronization
• Memory Units Equipped With Error Correcting Code
• Memory Usage Is Randomized Across All Banks
• Either 2p or 4p Units, Interleaved 64 Ways
• 16 MB DRAM Chips
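The per-word access state used for synchronization can be pictured with a software analogy. The slide only says that each word carries 4 bits of access state; the sketch below assumes that one of those bits acts as a full/empty flag, and it emulates that behaviour with Python threads. The class and method names are hypothetical, and real hardware would do this per memory word without locks.

```python
import threading


class SyncWord:
    """Software emulation of one memory word with a full/empty flag."""

    def __init__(self):
        self._value = None
        self._full = False
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        """Wait until the word is empty, store the value, mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Wait until the word is full, take the value, mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value


def run_demo(count=5):
    """Pass `count` integers from a producer thread through one word."""
    word = SyncWord()
    producer = threading.Thread(
        target=lambda: [word.write_when_empty(i) for i in range(count)])
    producer.start()
    received = [word.read_when_full() for _ in range(count)]
    producer.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

Because every read empties the word and every write fills it, producer and consumer stay in lockstep without any separate lock or flag variable in the program itself.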
Input / Output
• Maximum Strategy Gen5 XL RAID
• Sustained Bandwidth of 130 MB/s
• At Least p/16 Disk Arrays Are Required
• System Capacity of 300p GB
• 20p MB/s In Each Direction
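The I/O figures above all scale with the processor count p, so they can be tabulated for a given configuration. This is a small sketch with a hypothetical function name; it simply applies the formulas from the slide and rounds the disk-array count up to a whole unit.

```python
import math


def io_sizing(p):
    """Apply the slide's I/O scaling rules to a p-processor system."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return {
        "min_disk_arrays": math.ceil(p / 16),   # at least p/16 arrays
        "capacity_gb": 300 * p,                 # 300p GB
        "bandwidth_mb_s_each_direction": 20 * p,  # 20p MB/s
    }


if __name__ == "__main__":
    for processors in (16, 256):
        print(processors, io_sizing(processors))
```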
Operating System
• Distributed Parallel Version Of Unix
• Highly Concurrent Version Of Berkeley
• Allows Systems To Run p Tasks Truly Parallel
• Streams Are Dynamically Created w/o OS Intervention
• Processes Are Broken Up Into Tasks By OS
• Two Tier Scheduler Provides Better Resource Allocation
  • PL Scheduler
  • PB Scheduler
Software / Languages
• Implicit And Explicit Parallelism Is Allowed
• Automatic Parallelization Of:
  • C, C++ & Fortran By The Compiler
• High Degree of Cray Compatibility
• Easy To Program b/c Of Architecture
System Performance
• 3.84-12.8 Times Performance Of Cray T90/32
• 1K x 1K Matrix Multiply in 50 ms
• Integer Sort of 100M Keys in 36 ms
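The two timings above can be converted into rates for comparison. The sketch below makes assumptions the slide does not state: "1K" is taken as 1024, the multiply is assumed to use the conventional 2n³ floating-point operations, and the function names are hypothetical.

```python
def matmul_gflops(n, seconds):
    """Sustained GFLOP/s for an n x n matrix multiply (2*n**3 flops)."""
    flops = 2 * n ** 3
    return flops / seconds / 1e9


def sort_rate_gkeys(keys, seconds):
    """Billions of keys sorted per second."""
    return keys / seconds / 1e9


if __name__ == "__main__":
    print(round(matmul_gflops(1024, 0.050), 1))
    print(round(sort_rate_gkeys(100_000_000, 0.036), 2))
```

Under these assumptions the multiply runs at about 43 GFLOP/s, far more than the 1 GFLOP peak of a single processor, so the figures presumably describe a large multi-processor configuration; the slide does not say which one.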
Conclusion
• Proven Effectiveness
• Logical Step For Multiprocessor Computers
• Still Very Pricey
• Allow General Purpose Workload
• Scalable
• Shared Memory
Questions?
Instruction Pipeline
Breakdown Of A Task
[Figure: one Task divided into four Teams, with eight VPs at the lowest level]
Deciding The Number Of Contexts
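The content of this slide did not survive in the transcript. As a rough guide only, a common rule of thumb is that latency is fully hidden once the other contexts supply enough work to cover it. The sketch below encodes that rule; the function name and the symbols `r` (run length), `c` (switch overhead) and `l` (latency), all in cycles, are assumptions and not the author's method.

```python
import math


def contexts_to_saturate(r, c, l):
    """Smallest number of identical contexts that keeps the processor busy.

    Each context works for r cycles, costs c cycles to switch away from,
    and then waits l cycles; saturation needs n * (r + c) >= r + l.
    """
    if r <= 0 or c < 0 or l < 0:
        raise ValueError("need r > 0, c >= 0, l >= 0")
    return max(1, math.ceil((r + l) / (r + c)))


if __name__ == "__main__":
    print(contexts_to_saturate(r=10, c=0, l=90))
    print(contexts_to_saturate(r=1, c=0, l=69))
```

For instance, with a one-cycle run length, zero switch overhead and a hypothetical 69-cycle latency, 70 contexts would be needed, which fits within the 128 threads per processor quoted earlier.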