
Post on 15-Dec-2015


1

Microprocessor: From the Humble Random-Logic Replacement to the Giant Killer

History, the Present, the International Scene, Own Research, and some Closing Thoughts for CE Students

Ganesh Gopalakrishnan, School of Computing, University of Utah

• NSF CSR-SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis (the ``Gauss'' project)
• 2005-TJ-1318 (SRC/Intel Customization), Scaling Formal Methods Towards Hierarchical Protocols in Shared Memory Processors (the ``MPV'' project)
• Microsoft HPC Innovation Center, ``Formal Analysis and Code Generation Support for MPI''

2

The Microprocessor Rules!

Computers are incredibly fast, accurate and stupid;

humans are incredibly slow, inaccurate and brilliant;

together they are powerful beyond imagination

-- Albert Einstein

Virtually all computers are based on the humble “micro”

3

This Talk

• How did the “micro” come about?
• What is the latest in the world of the micro?
• What about the international scene?
• What research am I doing?
• How about some time-tested advice for CE students?

4

Birth of the micro, Mu-P, …

• Intel’s 4004 and TI’s TMS-1000 were the first
• [Photo: the 4004 with cover removed (left) and on (right)]
• Patent awarded to TI!
• Intel made a single-chip computer for Datapoint
• Marketed it as the 8008 when Datapoint did not use the design

5

Revolution of the 70s and 80s

• Intel: 4004, 4040, 8008, 8080, 8085, 8086, 80186, 80286, 80386, 80486, Pentium, PPro, … now “X86” (also Itanium)
• Motorola: 6800, 6810, 6820, 68000, 68010, 68020, … then PowerPC (collaboration with IBM)
• Other companies
• Burst of activity: EVERY student wanted to build something out of a “mu-P” in the 70s and 80s.

6

… and it turned into a Giant Killer!

• It became amply clear in the 80s that it was going to replace “mainframes”
  • Casual experiments comparing the Sun-2 (68020) with Digital’s VAX 11/750 and 780
• The birth of the IBM PC around 1980 started things going mu-P’s way!

7

… and a super Giant Killer!

John Hennessy’s prediction during SC’97 (http://news-service.stanford.edu/news/1997/november19/supercomp1119.html):

• John Hennessy: “Today’s microprocessor chipping away at supercomputer market”
• Traditionally designed supercomputers will vanish within a decade. They have!
• Bus-based multi-microprocessor machines rule!
• Clusters of them fill vast rooms now!

8

IBM ASCI White Machine

Released in 2000
-- Peak performance: 12.3 teraflops
-- Processors used: IBM RS/6000 SP Power3s, 375 MHz
-- There are 8,192 of these processors
-- The total amount of RAM is 6 TB
-- Two hundred cabinets, covering the area of two basketball courts

9

IBM BlueGene/L

The first machine in the family, Blue Gene/L, is expected to operate at a peak performance of about 360 teraflops (360 trillion operations per second), and occupy 64 racks, taking up only about the same space as half of a tennis court. Researchers at the Lawrence Livermore National Laboratory (LLNL) plan to use Blue Gene/L to simulate physical phenomena that require computational capability much greater than presently available, such as cosmology and the behavior of stellar binary pairs, laser-plasma interactions, and the behavior and aging of high explosives.

10

IBM Power-5 based supercomputer 8 die x 2 CPUs x 2-way execution = 32-way shared memory machine!

11

Sun Niagara processor 8 CPU cores (I’ve heard it is a 32-way machine too – maybe 4-way SMT?)

12

So what are the design issues? Complex cache coherence protocols !

Silicon debugging is becoming a headache !

Programming apps is becoming hard !

13

What is cache coherence?

Thread and process interactions need to coordinate. Otherwise something analogous to this will happen!

Teller 1                           Teller 2
Read bank balance ($100)           Read bank balance ($100)
Add $10 on scratch paper ($110)    Subtract $10 on scratch paper ($90)
Enter $110 into account            Enter $90 into account

USER LEFT WITH $90 – NOT WITH $100 !!
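The teller scenario can be replayed as a fixed interleaving of reads and writes. The following is a minimal sketch in Python, not a model of real cache hardware: `run_tellers` and its `serialized` flag are hypothetical names introduced only for this illustration, and the interleaving is hard-coded so the outcome is deterministic.

```python
def run_tellers(start_balance, serialized):
    """Replay the two-teller scenario and return the final balance.

    serialized=False: both tellers read before either writes (the slide's case).
    serialized=True:  teller 1 finishes its read-modify-write before teller 2 starts.
    """
    account = {"balance": start_balance}

    def read():
        return account["balance"]

    def write(value):
        account["balance"] = value

    if serialized:
        scratch1 = read()          # teller 1 reads $100
        write(scratch1 + 10)       # teller 1 enters $110
        scratch2 = read()          # teller 2 reads the updated $110
        write(scratch2 - 10)       # teller 2 enters $100
    else:
        scratch1 = read()          # teller 1 reads $100
        scratch2 = read()          # teller 2 also reads $100 (stale once teller 1 writes)
        write(scratch1 + 10)       # teller 1 enters $110
        write(scratch2 - 10)       # teller 2 overwrites it with $90
    return account["balance"]


print(run_tellers(100, serialized=False))
print(run_tellers(100, serialized=True))
```

The uncoordinated run loses teller 1's deposit because teller 2 works from a stale copy of the balance. Keeping such copies from going stale unnoticed is, loosely speaking, the job a coherence protocol does for cached data.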

14

Cache Coherence Protocol Verification

My “MPV” research project develops techniques to ensure that cache coherence protocols are correct

[Figure: dir and mem blocks, with labels “Chip-level protocols”, “Intra-cluster protocols”, and “Inter-cluster protocols”]

15

Programming these supercomputers!

My “Gauss” project (in collaboration with Robert M. Kirby) ensures that supercomputer programs do not contain bugs, and also perform efficiently

Virtually all supercomputers are programmed using the “MPI” communication library

Mis-using this library can often result in bugs that show up only after porting

P1

MPI_SEND(to P2, Msg)

MPI_RECV(from P2, Msg)

P2

MPI_SEND(to P1, Msg)

MPI_RECV(from P1, Msg)

If the system does not provide sufficient buffering, the sends may both block, thus causing a deadlock !
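The dependence on buffering can be explored with a toy model. The sketch below is written in plain Python and does not use MPI or any MPI binding; `completes`, the `("send", peer)` / `("recv", peer)` encoding, and the `buffer_slots` parameter are all assumptions made for this illustration. A send finishes if the channel has a free buffer slot, or if the peer is already waiting in a matching receive (a rendezvous).

```python
from collections import deque


def completes(programs, buffer_slots):
    """Return True if every process finishes, False if the run deadlocks.

    programs:     dict mapping process name -> list of ("send", peer) or ("recv", peer)
    buffer_slots: messages each one-way channel can hold (hypothetical knob)
    """
    pc = {p: 0 for p in programs}  # index of the next operation per process
    channel = {(s, d): deque() for s in programs for d in programs if s != d}

    def current_op(p):
        return programs[p][pc[p]] if pc[p] < len(programs[p]) else None

    progress = True
    while progress:
        progress = False
        for p in programs:
            op = current_op(p)
            if op is None:
                continue  # this process has finished
            kind, peer = op
            if kind == "send":
                if current_op(peer) == ("recv", p) and not channel[(p, peer)]:
                    # Rendezvous: the receiver is already waiting, hand over directly.
                    pc[p] += 1
                    pc[peer] += 1
                    progress = True
                elif len(channel[(p, peer)]) < buffer_slots:
                    # Buffered send: copy the message out and move on.
                    channel[(p, peer)].append("Msg")
                    pc[p] += 1
                    progress = True
            elif channel[(peer, p)]:
                # Receive a message that was buffered earlier.
                channel[(peer, p)].popleft()
                pc[p] += 1
                progress = True
    return all(pc[p] == len(ops) for p, ops in programs.items())


# The slide's program: both processes send first, then receive.
head_to_head = {
    "P1": [("send", "P2"), ("recv", "P2")],
    "P2": [("send", "P1"), ("recv", "P1")],
}
# One common repair: P2 receives before it sends.
reordered = {
    "P1": [("send", "P2"), ("recv", "P2")],
    "P2": [("recv", "P1"), ("send", "P1")],
}

print(completes(head_to_head, buffer_slots=1))
print(completes(head_to_head, buffer_slots=0))
print(completes(reordered, buffer_slots=0))
```

In this model the head-to-head program finishes with one buffer slot per channel and deadlocks with none, which mirrors the kind of bug that only appears after porting to a system with different buffering. Reordering one side's operations removes the dependence on buffering.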

16

Ensuring Simulations are Correct and Efficient

(Photo courtesy NHTSA)

17

Silicon Debugging: Can’t see “inside” CPUs without paying a huge price

• CANNOT assume there is a “front-side bus”
• CANNOT record all link traffic
• CAN ONLY generate sets of possible cache states
• HOW BEST can one match against designed behavior?
• I did a prelim study of a simple example during sabbatical
• Organizing workshop talks in November (FMCAD 2006)

[Figure: four CPUs, with labels “Invisible ‘miss’ traffic” and “Visible ‘miss’ traffic”]

18

The real rage these days

Multicores!

• Putting two simple CPUs achieves 80% performance per CPU with only 50% of the power per CPU
• The chip as a whole gives 1.6x performance for the same power, PROVIDED we can keep the cores busy
• Simple way to keep ’em busy
  • Virus-checker in background while user computes
  • Photoshop on one and Windows on another
• More complex ways to keep multiple cores busy are being investigated
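The 1.6x figure follows from simple arithmetic. As a worked sketch, assume the baseline is one complex core with performance 1 and power 1 (a normalization introduced here, not stated on the slide), and that each simple core delivers 0.8 of that performance at 0.5 of that power:

```latex
\begin{align*}
\text{chip performance} &= 2 \times 0.8 = 1.6 \\
\text{chip power}       &= 2 \times 0.5 = 1.0
\end{align*}
```

The performance figure is an upper bound: it is reached only when both cores have useful work at the same time.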

19

The real rage these days

Transactional Memories!

• Users cause too many bugs when programming using locks
• Transactional memories allow shared-memory threads to “watch” each other’s read/write actions
• Conflicting accesses can roll back and retry
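The rollback-and-retry idea can be sketched with a version counter. This is a single-threaded illustration in Python, not a real transactional memory: `VersionedCell`, `atomically`, and the `interfere` hook (which stands in for a second thread) are hypothetical names, and a real implementation would need the final check and write to happen as one indivisible step.

```python
class VersionedCell:
    """A shared value plus a version number that changes on every write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


def atomically(cell, update, interfere=None, max_retries=10):
    """Optimistically apply update(value); retry if someone else wrote first.

    Returns (committed_value, attempts_used).
    """
    for attempt in range(1, max_retries + 1):
        seen_version, seen_value = cell.version, cell.value  # start of transaction
        new_value = update(seen_value)                       # work on a private copy
        if interfere is not None:
            interfere(cell, attempt)        # simulated conflicting write by another thread
        if cell.version == seen_version:    # nobody wrote in between: commit
            cell.value = new_value
            cell.version += 1
            return new_value, attempt
        # Conflict detected: drop new_value (roll back) and go around again.
    raise RuntimeError("too many conflicts")


def teller2(cell, attempt):
    """Hypothetical second teller: withdraws $10 during teller 1's first attempt."""
    if attempt == 1:
        cell.value -= 10
        cell.version += 1


account = VersionedCell(100)
result = atomically(account, lambda balance: balance + 10, interfere=teller2)
print(result, account.value)
```

Here the first attempt computes $110 from a stale balance, notices the version changed, and retries from $90, so the account ends at $100 instead of losing an update.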

20

LOTS of hard problems remain open

How to provide memory bandwidth?

• Put multicore CPU chip on top of highly dense DRAM chip (e.g. 8 GB)
• Most users will buy just “one of those”
• Others will buy SDRAM module add-ons
  • Slooow access: may need to re-learn olden-day techniques of overlay programming!!

21

Learn from History – Learn Computer History

“If you want to understand today, you have to search yesterday.” ~Pearl Buck

Things are changing SO fast that basic principles are often being diluted

Get excited by studying computer history and seeing how much better off we are (also be chagrined by all the lost opportunity!)

22

Where to learn computer history?

Computer History Museum, Mountain View

Intel Museum, Santa Clara

Boston Computer Museum

Many in the UK (Manchester, London, …)

Travel widely – be inspired by what you see!

23

It is all International, baby!

• Learn multiple cultures, how the world works
• Anything that’s automated is outsourced
• That said, the US has to try VERY hard to give up its amazing set of advantages
  • Amazing work ethic
  • Individuality
  • Infrastructure !!!!!!!!
  • The talent pool is VERY deep here in the US
• But, sadly, we as a nation are REALLY trying hard to foolishly give up a lot
• Simple but sure-footed corrections now, and we are well off

24

A glimpse of the International Scene

• Lessons from MSR India
  • Amazing talent pool
  • Relatively high availability of talent
• Lessons from Intel India
  • Talent pool still lacks depth and abilities of many of our CEs
  • We can stay competitive in hardware for a LONG time to come
• Yet, … it’s easy to ignore how SMART and numerous the talent outside the US is …
  • Seeing first-hand is amazingly useful!
  • Apply for international internships!

25

Gradual loss of manufacturing = death

• How many goods are made in China these days?
  • Any green-house laws?
  • Airbus assembly factory moving to China
  • Autos … watch out
• How much software / services made in India?
• THE REAL DANGER
  • Loss of manufacturing kills pride and incentive to learn. We don’t want that in CE

26

Recipe for success

• The best ideas don’t always work
  • Wait for the world to be ready for the ideas
  • The devil is in the detail
  • Too much established momentum
  • Decide goal (short-term impact vs. long-term)
• Quiet tenacity
  • Tenacity without ruffling feathers needlessly
• Work hard! Work smart! Learn theory! Be a champion algorithm / program designer! Learn advanced hardware design!
• Learn to write extremely clearly and precisely!
• Learn to give inspiring talks! (Be inspired first!)