Slide 1 · HPEC 9/22/09
Exascale Computing: Embedded Style
Peter M. Kogge, McCourtney Chair in Computer Science & Engineering
Univ. of Notre Dame; IBM Fellow (retired)
May 5, 2009
TeraFlop: Embedded
PetaFlop: Departmental
ExaFlop: Data Center
Slide 2
Thesis
• Last 30 years:
  “Gigascale” computing first in a single vector processor
  “Terascale” computing first via several thousand microprocessors
  “Petascale” computing first via several hundred thousand cores
• Commercial technology, to date:
  Always shrunk the prior “XXX” scale to a smaller form factor
  Shrink, with speedup, enabled the next “XXX” scale
• Space/Embedded computing has lagged far behind
  Environment forced implementation constraints
  Power budget limited both clock rate & parallelism
• “Exascale” now on the horizon
  But beginning to suffer similar constraints as space
  And technologies to tackle exa challenges are very relevant
Especially Energy/Power
Slide 3
Topics
• The DARPA Exascale Technology Study
• The 3 Strawmen Designs
• A Deep Dive into Operand Access
Slide 4
Disclaimers
This presentation reflects my interpretation of the final report of the Exascale working group only, and not necessarily the views of the universities, corporations, or other institutions with which the members are affiliated.
Furthermore, the material in this document does not reflect the official views, ideas, opinions, and/or findings of DARPA, the Department of Defense, or the United States government.
http://www.cse.nd.edu/Reports/2008/TR-2008-13.pdf
Note: Separate Exa Studies on Resiliency & Software
Slide 5
The Exascale Study Group
11 Academic, 6 Non-Academic, 5 “Government”, plus Special Domain Experts
NAME, Affiliation:
Steve Keckler, UT-Austin
Keren Bergman, Columbia
Dean Klein, Micron
Shekhar Borkar, Intel
Peter Kogge, Notre Dame
Dan Campbell, GTRI
Kathy Yelick, UC-Berkeley
Sherman Karp, STA
Stan Williams, HP
Jon Hiller, STA
Thomas Sterling, LSU
Kerry Hill, AFRL
Allan Snavely, SDSC
Bill Harrod, DARPA
Steve Scott, Cray
Paul Franzon, NCSU
Al Scarpelli, AFRL
Monty Denneau, IBM
Mark Richards, Georgia Tech
Bill Dally, Stanford
Bob Lucas, USC/ISI
Bill Carlson, IDA
10+ Study Meetings over 2nd half 2007
Slide 6
The DARPA Exascale Technology Study
• Exascale = 1,000X the capability of Petascale
• Exascale != Exaflops, but:
  Exascale at the data center size => Exaflops
  Exascale at the “rack” size => Petaflops for departmental systems
  Exascale embedded => Teraflops in a cube
• Teraflops to Petaflops took 14+ years
  1st Petaflops workshop: 1994
  Thru NSF studies, HTMT, HPCS … to get us to Peta now
• Study Questions:
  Can we ride silicon to Exa by 2015?
  What will such systems look like?
  Can we get 1 EF in 20 MW & 500 racks?
  Where are the Challenges?
Slide 7
The Study’s Approach
• Baseline today’s:
  Commodity Technology
  Architectures
  Performance (Linpack)
• Articulate scaling of potential application classes
• Extrapolate roadmaps for:
  “Mainstream” technologies
  Possible offshoots of mainstream technologies
  Alternative and emerging technologies
• Use technology roadmaps to extrapolate use in “strawman” designs
• Analyze results & identify “Challenges”
Slide 8
Context: Focus on Energy, Not Power
Slide 9
CMOS Energy 101
[Figure: CMOS inverter charging/discharging a load capacitance C through resistance R]
Charging: dissipate CV²/2 in the resistance, and store CV²/2 on the capacitance.
Discharging: dissipate the stored CV²/2 from the capacitance.
One clock cycle dissipates C·V².
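A minimal numeric sketch of the C·V² arithmetic above; the capacitance and voltage values are illustrative assumptions, not from the slides:

```python
# Switching energy per cycle and the resulting dynamic power (illustrative values).
C = 1e-15        # switched capacitance, farads (1 fF, assumed)
V = 1.0          # supply voltage, volts (assumed)
f = 1.5e9        # clock frequency, Hz (1.5 GHz, matching the strawman clock)

E_charge  = 0.5 * C * V**2   # dissipated in the resistance while charging
E_stored  = 0.5 * C * V**2   # stored on the capacitor, dissipated on discharge
E_cycle   = E_charge + E_stored   # = C * V^2 per full cycle
P_dynamic = E_cycle * f           # dynamic power if switched every cycle

print(f"Energy per cycle: {E_cycle:.2e} J")    # 1.00e-15 J
print(f"Dynamic power:    {P_dynamic:.2e} W")  # 1.50e-06 W per gate
```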
Slide 10
ITRS Projections
Assume capacitance of a circuit scales as feature size
[Chart annotations: 330X, 15X]
90 nm picked as the breakpoint because that’s when Vdd, and thus clocks, flattened
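To make the scaling assumption concrete, a hedged sketch: if capacitance scales with feature size, energy per operation scales roughly as (feature size) × Vdd². The Vdd values below are illustrative placeholders, not ITRS data:

```python
# Relative energy per operation under the slide's assumption:
# E ~ C * V^2, with C proportional to feature size.
nodes = {        # feature size (nm) -> assumed Vdd (V), illustrative only
    90: 1.2,
    45: 1.0,
    22: 0.8,
}
E90 = 90 * 1.2**2            # reference energy (arbitrary units) at 90 nm
for nm, vdd in nodes.items():
    rel = (nm * vdd**2) / E90
    print(f"{nm:3d} nm: relative energy/op = {rel:.2f}")
```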
Slide 11
Energy Efficiency
Slide 12
The 3 Study Strawmen
Slide 13
Architectures Considered
• Evolutionary Strawmen
  “Heavyweight” Strawman based on commodity-derived microprocessors
  “Lightweight” Strawman based on custom microprocessors
• Aggressive Strawman
  “Clean Sheet of Paper” CMOS Silicon
Slide 14
A Modern HPC System
Power Distribution: Processors 56%, Routers 33%, Memory 9%, Random 2%
Silicon Area Distribution: Memory 86%, Random 8%, Processors 3%, Routers 3%
Board Area Distribution: White Space 50%, Processors 24%, Memory 10%, Routers 8%, Random 8%
Slide 15
A “Lightweight” Node Alternative

2 Nodes per “Compute Card.” Each node:
• A low-power compute chip
• Some memory chips
• “Nothing Else”

System Architecture:
• Multiple identical boards per rack
• Each board holds multiple Compute Cards
• “Nothing Else”

The compute chip:
• 2 simple dual-issue cores
• Each with dual FPUs
• Memory controller
• Large eDRAM L3
• 3D message interface
• Collective interface
• All at a sub-GHz clock

“Packaging the Blue Gene/L supercomputer,” IBM J. R&D, March/May 2005
“Blue Gene/L compute chip: Synthesis, timing, and physical design,” IBM J. R&D, March/May 2005
Slide 16
Possible System Power Models: Interconnect Driven
• Simplistic: a highly optimistic model
  Max power per die grows as per ITRS
  Power for memory grows only linearly with # of chips (power per memory chip remains constant)
  Power for routers and common logic remains constant, regardless of the obvious need to increase bandwidth
  True if energy per bit moved/accessed decreases as fast as “flops per second” increases
• Fully Scaled: a pessimistic model
  Same as Simplistic, except memory & router power grow with peak flops per chip
  True if energy per bit moved/accessed remains constant
• Real world: somewhere in between (modeled in the sketch below)
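A hedged sketch contrasting the two models. All chip counts and wattages are placeholders invented for illustration, not study data; the only difference between the models is whether memory and router power scale with per-chip flops:

```python
# Compare the "Simplistic" and "Fully Scaled" power models as peak
# flops per chip grows. Numbers are illustrative placeholders.
def system_power(flops_per_chip, n_cpu, n_mem, n_router,
                 cpu_w, mem_w, router_w, fully_scaled=False):
    # Fully Scaled: memory & router power grow with peak flops per chip,
    # here relative to an assumed 100 GF/s baseline chip.
    scale = flops_per_chip / 1e11 if fully_scaled else 1.0
    return n_cpu * cpu_w + n_mem * mem_w * scale + n_router * router_w * scale

args = dict(n_cpu=200_000, n_mem=3_200_000, n_router=50_000,
            cpu_w=150, mem_w=2, router_w=30)   # all assumed
for f in (1e11, 1e12, 1e13):                   # 100 GF -> 10 TF per chip
    lo = system_power(f, fully_scaled=False, **args) / 1e6
    hi = system_power(f, fully_scaled=True,  **args) / 1e6
    print(f"{f:.0e} F/s per chip: simplistic {lo:6.1f} MW, fully scaled {hi:7.1f} MW")
```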
Slide 17
1 EFlop/s “Clean Sheet of Paper” Strawman
• 4 FPUs + RegFiles per Core (= 6 GF/s @ 1.5 GHz)
• 1 Chip = 742 Cores (= 4.5 TF/s)
• 213 MB of L1 I&D; 93 MB of L2
• 1 Node = 1 Proc Chip + 16 DRAMs (16 GB)
• 1 Group = 12 Nodes + 12 Routers (= 54 TF/s)
• 1 Rack = 32 Groups (= 1.7 PF/s)
• 384 nodes / rack
• 3.6 EB of Disk Storage included
• 1 System = 583 Racks (= 1 EF/s)
• 166 MILLION cores
• 680 MILLION FPUs
• 3.6 PB = 0.0036 bytes/flops
• 68 MW with aggressive assumptions
Sizing done by “balancing” power budgets with achievable capabilities
Largely due to Bill Dally, Stanford
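Re-deriving the roll-up from the per-core numbers above is a useful sanity check; this sketch just repeats the slide’s own arithmetic:

```python
# Roll up the aggressive strawman from core to system (slide numbers only).
GF = 1e9
core   = 4 * 1.5 * GF            # 4 FPUs @ 1.5 GHz = 6 GF/s per core
chip   = 742 * core              # ~4.45 TF/s ("= 4.5 TF/s")
group  = 12 * chip               # ~53.4 TF/s ("= 54 TF/s")
rack   = 32 * group              # ~1.71 PF/s ("= 1.7 PF/s")
system = 583 * rack              # ~0.997 EF/s ("= 1 EF/s")
cores  = 742 * 12 * 32 * 583     # ~166 million cores
mem_gb = 583 * 384 * 16          # 384 nodes/rack * 16 GB -> ~3.58 PB

print(f"system: {system/1e18:.3f} EF/s, {cores/1e6:.0f} M cores, {mem_gb/1e6:.2f} PB")
```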
Slide 18
A Single Node (No Router)
[Chart: Node Power breakdown]
Characteristics:
• 742 Cores; 4 FPUs/core
• 16 GB DRAM (“stacked” memory)
• 290 Watts
• 1.08 TF Peak @ 1.5 GHz
• ~3000 Flops per cycle
Slide 19
1 EFlops Aggressive Strawman Data Center Power Distribution
• 12 nodes per group
• 32 groups per rack
• 583 racks
• 1 EFlops / 3.6 PB
• 166 million cores
• 67 MWatts
Slide 20
Data Center Performance Projections
[Chart: performance projections for the Heavyweight and Lightweight strawmen; Exascale is reached, but not at 20 MW!]
Slide 21
Power Efficiency
Exascale Study goal: 1 EFlops @ 20 MW
UHPC RFI goals: Module 80 GF/W; System 50 GF/W
[Chart point: Aggressive Strawman]
Slide 22
Energy Efficiency
Slide 23
Data Center Total Concurrency
Billion-way concurrency
Million-way concurrency
Thousand-way concurrency
Slide 24
Data Center Core Parallelism
170 million cores, AND we will need 10-100X more threading for latency management
Slide 25
Key Take-Aways
• Developing Exascale systems is really tough
  In any time frame, for any of the 3 classes
• Evolutionary progression is at best 2020ish
  And with limited memory
• 4 key challenge areas: Power, Concurrency, Memory Capacity, Resiliency
• Requires coordinated, cross-disciplinary efforts
Slide 26
Embedded Exa: A Deep Dive into Interconnect to Deliver Operands
Slide 27
Tapers
• Bandwidth Taper: how the effective bandwidth of operands being sent to a functional unit varies with the location of the operands in the memory hierarchy.
  Units: GBytes/sec, bytes/clock, operands per flop time
• Energy Taper: how the energy cost of transporting operands to a functional unit varies with the location of the operands in the memory hierarchy.
  Units: GBytes/Joule, operands per Joule
• Ideal tapers: “flat” (it doesn’t matter where the operands are)
• Real tapers: huge dropoffs (sketched below)
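A hedged sketch of what a bandwidth-taper computation looks like, in operands per flop at each level of the hierarchy. The per-level bandwidths are invented placeholders, not the report’s data:

```python
# Bandwidth taper: deliverable operands per flop at each hierarchy level.
peak_flops = 4.5e12      # flop/s per chip (aggressive strawman chip)
word_bytes = 9           # one 72-bit operand (slide 34)

levels = {               # level -> assumed deliverable bandwidth, bytes/s
    "registers":   4.0e14,
    "L1":          1.0e14,
    "L2":          2.0e13,
    "local DRAM":  5.0e12,
    "remote node": 5.0e11,
}
for name, bw in levels.items():
    ops_per_flop = (bw / word_bytes) / peak_flops
    print(f"{name:>12}: {ops_per_flop:8.4f} operands per flop")
# Even these made-up numbers show a ~1,000X taper from registers to remote.
```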
Slide 28
An Exa Single Node for Embedded
Slide 29
The Access Path: Interconnect-Driven
[Diagram: access path from the MICROPROCESSOR through a ROUTER (and more routers) to “some sort of memory structure”]
Slide 30
Sample Path: Off-Module Access (energy tallied in the sketch below)
1. Check local L1 (miss)
2. Go thru TLB to remote L3 (miss)
3. Across chip to correct port (thru routing table RAM)
4. Off-chip to router chip
5. 3 times thru router and out
6. Across microprocessor chip to correct DRAM I/F
7. Off-chip to get to correct DRAM chip
8. Cross DRAM chip to correct array block
9. Access DRAM array
10. Return data to correct I/F
11. Off-chip to return data to microprocessor
12. Across chip to Router Table
13. Across microprocessor to correct I/O port
14. Off-chip to correct router chip
15. 3 times thru router and out
16. Across microprocessor to correct core
17. Save in L2, L1 as required
18. Into Register File
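One way to see why this path is so expensive is to tally an assumed per-hop energy over the steps. The picojoule figures below are illustrative guesses, not the report’s numbers; the point is that transport dwarfs the DRAM access itself:

```python
# Tally energy along the off-module access path above (illustrative pJ values).
hop_pj = {"on_chip": 5, "off_chip": 20, "router": 30, "dram": 10}

path = [
    "on_chip",                     # steps 1-3: L1/TLB checks, cross chip to port
    "off_chip",                    # step 4: to router chip
    "router", "router", "router",  # step 5: 3 times thru router
    "on_chip",                     # step 6: across chip to DRAM I/F
    "off_chip",                    # step 7: to DRAM chip
    "on_chip",                     # step 8: cross DRAM chip to array block
    "dram",                        # step 9: the DRAM array access
    "on_chip",                     # step 10: back to the I/F
    "off_chip",                    # step 11: return to microprocessor
    "on_chip",                     # steps 12-13: router table, to I/O port
    "off_chip",                    # step 14: to router chip
    "router", "router", "router",  # step 15: 3 times thru router
    "on_chip",                     # steps 16-18: to core, caches, register file
]
total = sum(hop_pj[h] for h in path)
print(f"total ~{total} pJ; the DRAM array access itself is only {hop_pj['dram']} pJ")
```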
Slide 31
Taper Data from Exascale Report
Slide 32
Bandwidth Tapers
1,000X Decrease Across the System!!
Slide 33
Energy Analysis: Possible Storage Components
[Chart: energy per access for possible storage components, from Cacti 5.3 plus extrapolation to technology for a 2017 system]
Slide 34
Summary Transport Options
1 Operand/word = 72 bits
Slide 35
Energy Tapers vs Goals
Slide 36
Reachable Memory vs Energy
Slide 37
What Does This Tell Us?
• There are a lot more energy sinks than you think
  And we have to take all of them into consideration
• Cost of interconnect dominates
• Must design for on-board or stacked DRAM, with DRAM blocks CLOSE
  And take physical placement into account
• Reach vs energy per access looks “linear”
• For 80 GF/W, cannot afford ANY memory references (see the arithmetic below)
• We NEED to consider the entire access path:
  Alternative memory technologies: reduce access cost
  Alternative packaging: reduce bit movement cost
  Alternative transport protocols: reduce # of bits moved
  Alternative execution models: reduce # of movements
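A closing back-of-envelope check of the 80 GF/W claim. The per-access energies below are assumptions for illustration; the budget arithmetic itself is exact: 80 GF/W leaves only 12.5 pJ per flop, and that must cover operand delivery too:

```python
# Energy budget per flop at 80 GF/W vs assumed operand-access costs.
budget_pj = 1e12 / 80e9            # (1 W / 80 GF/s) expressed in pJ/flop = 12.5

access_pj = {                      # assumed cost per 72-bit operand, pJ
    "register file":     1,
    "on-chip SRAM":     10,
    "stacked DRAM":     50,
    "off-board DRAM": 1000,
}
for src, pj in access_pj.items():
    print(f"{src:>14}: {pj:5d} pJ -> {pj/budget_pj:6.1f}x the {budget_pj:.1f} pJ/flop budget")
# Under these assumptions, anything past the register file blows the budget.
```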