EECE 571R (Spring 2009) Massively parallel/distributed platforms Matei Ripeanu matei at ece.ubc.ca


Page 1: EECE 571R (Spring 2009) Massively parallel/distributed platforms

EECE 571R (Spring 2009)

Massively parallel/distributed platforms

Matei Ripeanu

matei at ece.ubc.ca

Page 2: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Contact Info

Email: matei @ ece.ubc.ca
Office: KAIS 4033
Office hours: by appointment (email me)
Course page:

http://www.ece.ubc.ca/~matei/EECE571/

Page 3: EECE 571R (Spring 2009) Massively parallel/distributed platforms

EECE 571R: Course Goals

• Primary
– Gain understanding of the fundamental issues that affect the design of:
• Large-scale systems – massively parallel / massively distributed

– Survey the main current research themes

– Gain experience with distributed systems research
• Research on: federated systems, networks

• Secondary
– By studying a set of outstanding papers, build knowledge of how to do & present research

– Learn how to read papers & evaluate ideas

Page 4: EECE 571R (Spring 2009) Massively parallel/distributed platforms

What I’ll Assume You Know

• Basic Internet architecture
– IP, TCP, DNS, HTTP

• Basic principles of distributed computing
– Asynchrony (cannot distinguish between communication failures and latency)
– Incomplete & inconsistent global state knowledge (cannot know everything correctly)
– Failures happen (in large systems, even rare failures of individual components aggregate to high failure rates)

• If there are things that don’t make sense, ask!

Page 5: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Outline

• Case study (and project ideas):– Amdahl’s Law in the Multi-core Era

• Administrative: course schedule/organization

Page 6: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Amdahl’s Law in the Multicore Era

Acknowledgement: some slides borrowed from presentations by Mark D. Hill, David Patterson, Jim Larus, Saman Amarasinghe, Richard Brunner, and Luddy Harrison

Page 7: EECE 571R (Spring 2009) Massively parallel/distributed platforms
Page 8: EECE 571R (Spring 2009) Massively parallel/distributed platforms
Page 9: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Moore’s Laws

Page 10: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Technology & Moore’s Law

Transistor 1947

Integrated Circuit 1958 (a.k.a. Chip)

Moore’s Law 1964:

# Transistors per Chip doubles every two years (or 18 months)

Page 11: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Architects & Another Moore’s Law

Microprocessor 1971

50M transistors ~2000

Popular Moore’s Law:

Processor (core) performance doubles every two years

Page 12: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Multicore Chip (a.k.a. Chip Multiprocessors)

Why Multicore?
• Power → simpler structures
• Memory → concurrent accesses to tolerate off-chip latency
• Wires → intra-core wires shorter
• Complexity → divide & conquer

But more cores; NOT faster cores

Will effective chip performance keep doubling every two years?

Eight 4-way cores, 2006

Page 13: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Virtuous Cycle, circa 1950 – 2005

World-Wide Software Market (per IDC): $212b (2005) → $310b (2010)

Increased processor performance → Larger, more feature-full software → Larger development teams → Higher-level languages & abstractions → Slower programs → (back to) Increased processor performance

Page 14: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Virtuous Cycle, 2005 – ???

Increased processor performance → Larger, more feature-full software → Larger development teams → Higher-level languages & abstractions → Slower programs → …

World-Wide Software Market $212b (2005) → ?

X GAME OVER — NEXT LEVEL?

Thread Level Parallelism & Multicore Chips

Page 15: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Summary: A Corollary to Amdahl’s Law

• Develop Simple Model of Multicore Hardware
– Complements Amdahl’s software model
– Fixed chip resources for cores
– Core performance improves sub-linearly with resources

• Shows Need For Research To
– Increase parallelism (Are you surprised?)
– Increase core performance (especially for larger chips)
– Refine asymmetric designs (e.g., one core enhanced)
– Refine dynamically harnessing cores for serial performance

Page 16: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Outline

• Recall Amdahl’s Law

• A Model of Multicore Hardware

• Symmetric Multicore Chips

• Asymmetric Multicore Chips

• Dynamic Multicore Chips

• Caveats & Wrap Up

Page 17: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Amdahl’s Law

• Begins with Simple Software Assumption (Limit Arg.)
– Fraction F of execution time perfectly parallelizable
– No overhead for scheduling, synchronization, communication, etc.
– Fraction 1 – F completely serial

• Time on 1 core = (1 – F) / 1 + F / 1 = 1

• Time on N cores = (1 – F) / 1 + F / N

Page 18: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Amdahl’s Law [1967]

• Implications:
– Attack the common case when introducing parallelization: when F is small, optimizations will have little effect.
– The aspects you ignore also limit speedup: as N approaches infinity, speedup is bound by 1 / (1 – F).

Amdahl’s Speedup = 1 / ((1 – F) / 1 + F / N)
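The law is easy to check numerically. Below is a minimal Python sketch of the speedup formula (the function name is mine, not from the slides):

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: f = perfectly parallelizable fraction, n = number of cores."""
    return 1.0 / ((1.0 - f) / 1.0 + f / n)

# F = 0.9 on 16 cores: 1 / (0.1 + 0.9/16) = 6.4
print(amdahl_speedup(0.9, 16))

# As N grows, speedup stays bound by 1/(1 - F) = 10 here
print(amdahl_speedup(0.9, 10**9))
```

Note how the second call shows the "aspects you ignore" bound: a billion cores still cannot beat 1/(1 – F).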

Page 19: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Amdahl’s Law [1967]

• Discussion:
– Can you ever obtain super-linear speedups?

Amdahl’s Speedup = 1 / ((1 – F) / 1 + F / N)

Page 20: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Amdahl’s Law [1967]

• For mainframes, Amdahl expected 1 – F = 35%
– For 4 processors, speedup ≈ 2
– For infinitely many processors, speedup < 3
– Therefore, stay with mainframes with one/few processors

• Q: Does it make sense to design massively multicore chips? What kind?

Amdahl’s Speedup = 1 / ((1 – F) / 1 + F / N)

Page 21: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Designing Multicore Chips Is Hard

• Designers must confront single-core design options
– Instruction fetch, wakeup, select
– Execution unit configuration & operand bypass
– Load/store queue(s) & data cache
– Checkpoint, log, runahead, commit

• As well as additional design degrees of freedom
– How many cores? How big is each?
– Shared caches: levels? How many banks?
– Memory interface: how many banks?
– On-chip interconnect: bus, switched?

Page 22: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Want Simple Multicore Hardware Model

To Complement Amdahl’s Simple Software Model

(1) Chip Hardware Roughly Partitioned into
– (I) Multiple Cores (with L1 caches)
– (II) The Rest (L2/L3 cache banks, interconnect, pads, etc.)
– Assume: Changing Core Size/Number does NOT change The Rest

(2) Resources for Multiple Cores Bounded
– Bound of N resources per chip for cores
– Due to area, power, cost ($$$), or multiple factors
– Bound = Power? (but pictures here use Area)

Page 23: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Want Simple Multicore Hardware Model, cont.

(3) Architects can improve single-core performance using more of the bounded resource

• A Simple Base Core
– Consumes 1 Base Core Equivalent (BCE) of resources
– Provides performance normalized to 1

• An Enhanced Core (in the same process generation)
– Consumes R BCEs
– Provides performance Perf(R)

• What does function Perf(R) look like?

Page 24: EECE 571R (Spring 2009) Massively parallel/distributed platforms

More on Enhanced Cores

• Performance Perf(R) from consuming R BCEs of resources

• If Perf(R) > R, always enhance the core
– Cost-effectively speeds up both sequential & parallel sections

• Therefore, equations assume Perf(R) < R

• Graphs assume Perf(R) = square root of R
– 2x performance for 4 BCEs, 3x for 9 BCEs, etc.
– Why? Models diminishing returns with “no coefficients”

• How to speed up the enhanced core?
– <Insert favorite or TBD micro-architectural ideas here>

Page 25: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Outline

• Recall Amdahl’s Law

• A Model of Multicore Hardware

• Symmetric Multicore Chips

• Asymmetric Multicore Chips

• Dynamic Multicore Chips

• Caveats & Wrap Up

Page 26: EECE 571R (Spring 2009) Massively parallel/distributed platforms

How Many (Symmetric) Cores per Chip?

• Each Chip Bounded to N BCEs (for all cores)
• Each Core consumes R BCEs
• Assume Symmetric Multicore = All Cores Identical
• Therefore, N/R Cores per Chip ((N/R)·R = N)
• For an N = 16 BCE Chip:

Sixteen 1-BCE cores | Four 4-BCE cores | One 16-BCE core

Page 27: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Performance of Symmetric Multicore Chips

• Serial Fraction 1 – F uses 1 core at rate Perf(R)
• Serial time = (1 – F) / Perf(R)

• Parallel Fraction F uses N/R cores at rate Perf(R) each
• Parallel time = F / (Perf(R) · (N/R)) = F·R / (Perf(R)·N)

• Therefore, w.r.t. one base core:

Symmetric Speedup = 1 / ((1 – F) / Perf(R) + F·R / (Perf(R)·N))

• Implications?

Enhanced Cores speed up both Serial & Parallel sections
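The symmetric-chip model can be sketched in a few lines of Python, using the slides' Perf(R) = √R assumption (function names are mine):

```python
import math

def perf(r):
    # Slides' assumption: enhanced-core performance = sqrt(R BCEs)
    return math.sqrt(r)

def symmetric_speedup(f, n, r):
    """N-BCE chip built as N/R symmetric cores of R BCEs each."""
    serial_time = (1.0 - f) / perf(r)
    parallel_time = f * r / (perf(r) * n)
    return 1.0 / (serial_time + parallel_time)

# N = 16 data points quoted on the following slides:
print(round(symmetric_speedup(0.5, 16, 16), 1))  # 4.0: one 16-BCE core
print(round(symmetric_speedup(0.9, 16, 2), 1))   # 6.7: eight 2-BCE cores
```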

Page 28: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Symmetric Multicore Chip, N = 16 BCEs

F=0.5: Opt. Speedup S = 4 = 1/(0.5/4 + 0.5·16/(4·16))

Need to increase parallelism to make multicore optimal!

[Chart: speedup vs. R for chips of 16, 8, 4, 2, and 1 cores; optimum at F=0.5, R=16, Cores=1, Speedup=4]

Page 29: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Symmetric Multicore Chip, N = 16 BCEs

At F=0.9, multicore is optimal, but speedup is limited

Need to obtain even more parallelism!

[Chart: F=0.5, R=16, Cores=1, Speedup=4 vs. F=0.9, R=2, Cores=8, Speedup=6.7]

Page 30: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Symmetric Multicore Chip, N = 16 BCEs

F matters: Amdahl’s Law applies to multicore chips

Researchers should target parallelism F first

F→1, R=1, Cores=16, Speedup→16

Page 31: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Symmetric Multicore Chip, N = 16 BCEs

As Moore’s Law enables N to go from 16 to 256 BCEs,

More core enhancements? More cores? Or both?

Recall F=0.9, R=2, Cores=8, Speedup=6.7

Page 32: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Symmetric Multicore Chip, N = 256 BCEs

As Moore’s Law increases N, often need enhanced core designs

Researcher should target single-core performance too

F=0.9: R=28 (vs. 2), Cores=9 (vs. 8), Speedup=26.7 (vs. 6.7). CORE ENHANCEMENTS!

F→1: R=1 (vs. 1), Cores=256 (vs. 16), Speedup=204 (vs. 16). MORE CORES!

F=0.99: R=3 (vs. 1), Cores=85 (vs. 16), Speedup=80 (vs. 13.9). CORE ENHANCEMENTS & MORE CORES!
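The "best R" values above fall out of a brute-force search over the symmetric model. A sketch under the same assumptions (Perf(R) = √R, chip fixed at N BCEs; helper names are mine):

```python
import math

def symmetric_speedup(f, n, r):
    perf = math.sqrt(r)  # slides' diminishing-returns assumption
    return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))

def best_core_size(f, n):
    """Exhaustive search for the R (BCEs per core) that maximizes speedup."""
    return max(range(1, n + 1), key=lambda r: symmetric_speedup(f, n, r))

# N = 256: F=0.9 favors a few big cores (R=28); F=0.99 favors many small ones (R=3)
for f in (0.9, 0.99):
    r = best_core_size(f, 256)
    print(f, r, round(symmetric_speedup(f, 256, r), 1))
```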

Page 33: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Outline

• Recall Amdahl’s Law

• A Model of Multicore Hardware

• Symmetric Multicore Chips

• Asymmetric Multicore Chips

• Dynamic Multicore Chips

• Caveats & Wrap Up

Page 34: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Asymmetric (Heterogeneous) Multicore Chips

• Symmetric Multicore Required All Cores Equal
• Why Not Enhance Some (But Not All) Cores?

• For Amdahl’s Simple Software Assumptions
– One Enhanced Core
– Others are Base Cores

• How?
– <fill in favorite micro-architecture techniques here>
– Model ignores design cost of asymmetric design

• How does this affect our hardware model?

Page 35: EECE 571R (Spring 2009) Massively parallel/distributed platforms

How Many Cores per Asymmetric Chip?

• Each Chip Bounded to N BCEs (for all cores)
• One R-BCE Core leaves N – R BCEs
• Use N – R BCEs for N – R Base Cores
• Therefore, 1 + N – R Cores per Chip
• For an N = 16 BCE Chip:

Symmetric: Four 4-BCE cores | Asymmetric: One 4-BCE core & Twelve 1-BCE base cores

Page 36: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Performance of Asymmetric Multicore Chips

• Serial Fraction 1 – F same, so serial time = (1 – F) / Perf(R)

• Parallel Fraction F
– One core at rate Perf(R)
– N – R cores at rate 1
– Parallel time = F / (Perf(R) + N – R)

• Therefore, w.r.t. one base core:

Asymmetric Speedup = 1 / ((1 – F) / Perf(R) + F / (Perf(R) + N – R))
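Under the same Perf(R) = √R assumption, the asymmetric formula reproduces the figures quoted on the later slides; a minimal Python check (names mine):

```python
import math

def asymmetric_speedup(f, n, r):
    """One enhanced R-BCE core (rate sqrt(R)) plus N - R base cores (rate 1)."""
    perf = math.sqrt(r)
    serial_time = (1.0 - f) / perf
    parallel_time = f / (perf + n - r)
    return 1.0 / (serial_time + parallel_time)

# N = 256, F = 0.99, R = 41: one 41-BCE core + 215 base cores
print(round(asymmetric_speedup(0.99, 256, 41)))  # 166
```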

Page 37: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Asymmetric Multicore Chip, N = 256 BCEs

Number of Cores = 1 (Enhanced) + 256 – R (Base)

How do Asymmetric & Symmetric speedups compare?

[Chart: speedup vs. R for chips of 256, 253, 241, 193, … 1 cores]

Page 38: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Recall Symmetric Multicore Chip, N = 256 BCEs

Recall F=0.9, R=28, Cores=9, Speedup=26.7

Page 39: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Asymmetric Multicore Chip, N = 256 BCEs

Asymmetric offers greater speedup potential than Symmetric

• As Moore’s Law increases N, Asymmetric gets better

Some researchers should target developing asymmetric multicores

F=0.9: R=118 (vs. 28), Cores=139 (vs. 9), Speedup=65.6 (vs. 26.7)

F=0.99: R=41 (vs. 3), Cores=216 (vs. 85), Speedup=166 (vs. 80)

Page 40: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Outline

• Recall Amdahl’s Law

• A Model of Multicore Hardware

• Symmetric Multicore Chips

• Asymmetric Multicore Chips

• Dynamic Multicore Chips

• Caveats & Wrap Up

Page 41: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Dynamic Multicore Chips

• Why NOT Have Your Cake and Eat It Too?

• N Base Cores for Best Parallel Performance
• Harness R Cores Together for Serial Performance

• How? DYNAMICALLY Harness Cores Together
– <insert favorite or TBD techniques here>

[Figure: the same chip in parallel mode vs. sequential mode]

How would one model this chip?

Page 42: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Performance of Dynamic Multicore Chips

• N Base Cores Where R Can Be Harnessed

• Serial Fraction 1 – F uses R BCEs at rate Perf(R)
• Serial time = (1 – F) / Perf(R)

• Parallel Fraction F uses N base cores at rate 1 each
• Parallel time = F / N

• Therefore, w.r.t. one base core:

Dynamic Speedup = 1 / ((1 – F) / Perf(R) + F / N)
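A quick numerical check of the dynamic formula, again assuming Perf(R) = √R (names mine):

```python
import math

def dynamic_speedup(f, n, r):
    """Serial phase harnesses R base cores to run at sqrt(R); parallel phase uses all N."""
    perf = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf + f / n)

# N = 256, F = 0.99: harnessing all cores for the serial phase (R = 256)
print(round(dynamic_speedup(0.99, 256, 256)))  # 223
```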

Page 43: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Recall Asymmetric Multicore Chip, N = 256 BCEs

What happens with a dynamic chip?

Recall F=0.99, R=41, Cores=216, Speedup=166

Page 44: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Dynamic Multicore Chip, N = 256 BCEs

Dynamic offers greater speedup potential than Asymmetric

Researchers should target dynamically harnessing cores

F=0.99: R=256 (vs. 41), Cores=256 (vs. 216), Speedup=223 (vs. 166)

Note: #Cores always N=256

Page 45: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Outline

• Recall Amdahl’s Law

• A Model of Multicore Hardware

• Symmetric Multicore Chips

• Asymmetric Multicore Chips

• Dynamic Multicore Chips

• Caveats & Wrap Up

Page 46: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Three Multicore Amdahl’s Laws

Symmetric Speedup = 1 / ((1 – F) / Perf(R) + F·R / (Perf(R)·N))
– Sequential section: 1 Enhanced Core
– Parallel section: N/R Enhanced Cores

Asymmetric Speedup = 1 / ((1 – F) / Perf(R) + F / (Perf(R) + N – R))
– Sequential section: 1 Enhanced Core
– Parallel section: 1 Enhanced & N – R Base Cores

Dynamic Speedup = 1 / ((1 – F) / Perf(R) + F / N)
– Sequential section: 1 Enhanced Core
– Parallel section: N Base Cores

Page 47: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Software Model Charges 1 of 2

• Serial fraction not totally serial
– Can extend software model to tree algorithms, etc.

• Parallel fraction not totally parallel
– Can extend for varying or bounded parallelism

• Serial/Parallel fraction may change
– Can extend for Weak Scaling [Gustafson, CACM’88]
– Run larger, more parallel problem in constant time

Page 48: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Software Model Charges 2 of 2

• Synchronization, communication, scheduling effects?
– Can extend for overheads and imbalance

• Software challenges for asymmetric multicore worse
– Can extend for asymmetric scheduling, etc.

• Software challenges for dynamic multicore greater
– Can extend to model overheads to facilitate

Page 49: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Hardware Model Charges

• Naïve to consider total resources for cores fixed
– Can extend hardware model to how core changes affect ‘The Rest’

• Naïve to bound Cores by one resource (esp. area)
– Can extend for Pareto optimal mix of area, power, complexity, reliability, …

• Naïve to ignore challenges due to off-chip bandwidth limits & benefits of last-level caching
– Can extend for modeling these

• Naïve to use performance = square root of resources
– Can extend as equations can use any function

Page 50: EECE 571R (Spring 2009) Massively parallel/distributed platforms

• Just because the model is simple
– Does NOT mean the conclusions are wrong

• Prediction
– While the truth is more complex
– These basic observations will hold

Page 51: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Dynamic Multicore Chip, N = 1024 BCEs

F→1, R=1024, Cores=1024, Speedup→1024!

NOT Possible Today

NOT Possible EVER Unless We Dream & Act

Page 52: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Summary so far: A Corollary to Amdahl’s Law

• Developed Simple Model of Multicore Hardware
– Complements Amdahl’s software model
– Fixed chip resources for cores
– Core performance improves sub-linearly with resources

• Shows Need For Research To
– Increase parallelism (Are you surprised?)
– Increase core performance (especially for larger chips)
– Refine asymmetric designs (e.g., one core enhanced)
– Refine dynamically harnessing cores for serial performance

Page 53: EECE 571R (Spring 2009) Massively parallel/distributed platforms
Page 54: EECE 571R (Spring 2009) Massively parallel/distributed platforms

What about clusters / clouds / massively distributed platforms?

Page 55: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Amdahl’s Law – for multi-processor/distributed

• Begins with Simple Software Assumption (Limit Arg.)
– Fraction F of execution time perfectly parallelizable
– No overhead for scheduling, synchronization, communication, etc.
– Fraction 1 – F completely serial

Amdahl’s Speedup = 1 / ((1 – F) / 1 + F / N)

Page 56: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Clusters / Distributed systems

Page 57: EECE 571R (Spring 2009) Massively parallel/distributed platforms

This class

• Weekly schedule (tentative)

Page 58: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Course Organization/Syllabus/etc.

Page 59: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Administrivia: Course structure

• Paper discussion (most classes) & lectures

• Student projects
– Aim high! Have fun! It’s a class project, not your PhD!
– Teams of up to 3 students
– Project presentations at the end of the term

Page 60: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Administrivia: Grading

• Paper reviews: 35%
• Class participation: 10%
• Discussion leading: 10%
• Project: 45%

Page 61: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Administrivia: Paper Reviewing (1)

• Goals:
– Think about what you read
– Expand your knowledge beyond the papers that are assigned
– Get used to writing paper reviews

• Reviews due by midnight the day before the class
• Be professional in your writing
• Keep an eye on the writing style:
– Clarity
– Beware of traps: learn to use them in writing and detect them in reading
– Detect (and stay away from) trivial claims. E.g., the first sentence of the Introduction: “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”

Page 62: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Administrivia: Paper Reviewing (2)

Follow the form provided when relevant.

• State the main contribution of the paper

• Critique the main contribution:
– Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution)
– Explain your rating in a sentence or two

• Rate how convincing the methodology is:
– Do the claims and conclusions follow from the experiments?
– Are the assumptions realistic?
– Are the experiments well designed?
– Are there different experiments that would be more convincing?
– Are there other alternatives the authors should have considered?
– (And, of course, is the paper free of methodological errors?)

Page 63: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Administrivia: Paper Reviewing (3)

• What is the most important limitation of the approach?

• What are the three strongest and/or most interesting ideas in the paper?

• What are the three most striking weaknesses in the paper?

• Name three questions that you would like to ask the authors.

• Detail an interesting extension to the work not mentioned in the future work section.

• Optional comments on the paper that you’d like to see discussed in class.

Page 64: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Administrivia: Discussion leading

• Come prepared!
– Prepare a discussion outline
– Prepare questions:
• “What if”s
• Unclear aspects of the proposed solution
• …
– Similar ideas in different contexts
– Initiate short brainstorming sessions

• Leaders do NOT need to submit paper reviews

• Main goals:
– Keep the discussion flowing
– Keep the discussion relevant
– Engage everybody (I’ll have an eye on this, too)

Page 65: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Administrivia: Projects

• Combine with your research if relevant to the class

• Get approval from all instructors if your final projects overlap:
– Don’t sell the same piece of work twice
– You can get more than twice as many results with less than twice as much work

• Aim high!
– Put in one extra month and get a publication out of it
– It is doable!

• Try ideas that you postponed out of fear: it’s just a class, not your PhD.

Page 66: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Administrivia: Project deadlines (tentative)

• 2nd week: 3-slide idea presentation: “What is the research question you aim to answer?”

• 4th week: 2-page project proposal
• 6th week: 3-page related work survey
– Know relevant work in your problem area
– If an implementation project, list tools and similar projects
– Expand the proposal

• 8th week: 4-page midterm project report due
– Have a clear image of what’s possible/doable
– Report preliminary results

• [see schedule] In-class project presentation
– Demo, if appropriate

• Last week of exam session: 10-page write-up

Page 67: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Next Class (Mon, 19/01)

• Note room change: KAIS
• Discussion of:
– Project ideas
– Papers

To do:
• Subscribe to the mailing list
• Volunteers for discussion leaders for class next week

Page 68: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Questions?

Page 69: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Backup Slides

Page 70: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Cost-Effective Parallel Computing

• Isn’t Speedup(P) < P inefficient? (P = processors)
• If only throughput matters, use P separate computers instead?

• But much of a computer’s cost is NOT in the processor [Wood & Hill, IEEE Computer 2/95]

• Let Costup(P) = Cost(P) / Cost(1)

• Parallel computing is cost-effective when:
– Speedup(P) > Costup(P)
• E.g., for an SGI PowerChallenge w/ 500MB:
– Costup(32) = 8.6
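The cost-effectiveness test is easy to illustrate. The Costup(32) = 8.6 figure is from the slide; the parallel fractions below are hypothetical values of my choosing:

```python
def amdahl_speedup(f, p):
    # f = parallelizable fraction, p = processors
    return 1.0 / ((1.0 - f) + f / p)

COSTUP_32 = 8.6  # SGI PowerChallenge w/ 500MB [Wood & Hill]

# Parallel computing pays off once Speedup(P) > Costup(P)
for f in (0.90, 0.95):
    s = amdahl_speedup(f, 32)
    print(f, round(s, 1), "cost-effective" if s > COSTUP_32 else "not cost-effective")
```

With these (hypothetical) fractions, F = 0.90 falls short of the costup while F = 0.95 clears it, which is the slide's point: the bar is Costup(P), not P.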

Page 71: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Symmetric Multicore Chip, N = 16 BCEs

Page 72: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Symmetric Multicore Chip, N = 256 BCEs

Page 73: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Symmetric Multicore Chip, N = 1024 BCEs

Page 74: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Asymmetric Multicore Chip, N = 16 BCEs

Page 75: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Asymmetric Multicore Chip, N = 256 BCEs

Page 76: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Asymmetric Multicore Chip, N = 1024 BCEs

Page 77: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Dynamic Multicore Chip, N = 16 BCEs

Page 78: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Dynamic Multicore Chip, N = 256 BCEs

Page 79: EECE 571R (Spring 2009) Massively parallel/distributed platforms

Dynamic Multicore Chip, N = 1024 BCEs

Page 80: EECE 571R (Spring 2009) Massively parallel/distributed platforms