1
Microprocessor: From the Humble Random-Logic Replacement to the Giant Killer
History, the Present, the International Scene, Own Research, and some Closing Thoughts for CE Students
Ganesh Gopalakrishnan, School of Computing, University of Utah
• NSF CSR-SMA: Toward Reliable and Efficient Message Passing Software Through Formal Analysis (the “Gauss” project)
• 2005-TJ-1318 (SRC/Intel Customization), Scaling Formal Methods Towards Hierarchical Protocols in Shared Memory Processors (the “MPV” project)
• Microsoft HPC Innovation Center, “Formal Analysis and Code Generation Support for MPI”
2
The Microprocessor Rules!
Computers are incredibly fast, accurate and stupid;
humans are incredibly slow, inaccurate and brilliant;
together they are powerful beyond imagination
-- Albert Einstein
Virtually all computers are based on the humble “micro”
3
This Talk
How did the “micro” come about?
What is the latest in the world of the micro?
What about the international scene?
What research am I doing?
How about some time-tested advice for CE students?
4
Birth of the micro, Mu-P, …
Intel’s 4004 and TI’s TMS-1000 were the first
[Figure: 4004 with cover removed (L) and on (R)]
Patent awarded to TI!
Intel made a single-chip computer for Datapoint
Marketed it as the 8008 when Datapoint did not use the design
5
Revolution of the 70s and 80s
Intel: 4004, 4040, 8008, 8080, 8085, 8086, 80186, 80286, 80386, 80486, Pentium, PPro, … now “X86” (also Itanium)
Motorola: 6800, 6810, 6820, 68000, 68010, 68020, … then PowerPC (collab with IBM)
Other companies
Burst of activity: EVERY student wanted to build something out of a “mu-P” in the 70s and 80s.
6
… and it turned into a Giant Killer!
It became amply clear in the 80s that it was going to replace “mainframes”
Casual experiments were conducted comparing the Sun-2 (68020) with Digital’s VAX 11/750 and 780
The birth of the IBM PC around 1980 started things going mu-P’s way!
7
… and a super Giant Killer!
John Hennessy’s prediction during SC’97: (http://news-service.stanford.edu/news/1997/november19/supercomp1119.html)
John Hennessy: “Today’s microprocessor chipping away at supercomputer market”
Traditionally designed supercomputers will vanish within a decade – they have!
Bus-based multi microprocessor machines rule! Clusters of them fill vast rooms now!
8
IBM ASCI White Machine
Released in 2000
-- Peak performance: 12.3 teraflops
-- Processors used: IBM RS6000 SP Power3, 375 MHz
-- There are 8,192 of these processors
-- The total amount of RAM is 6 TB
-- Two hundred cabinets, covering the area of two basketball courts
9
IBM BlueGene/L
The first machine in the family, Blue Gene/L, is expected to operate at a peak performance of about 360 teraflops (360 trillion operations per second), and occupy 64 racks, taking up only about the same space as half of a tennis court. Researchers at the Lawrence Livermore National Laboratory (LLNL) plan to use Blue Gene/L to simulate physical phenomena that require computational capability much greater than presently available, such as cosmology and the behavior of stellar binary pairs, laser-plasma interactions, and the behavior and aging of high explosives.
12
So what are the design issues?
Complex cache coherence protocols!
Silicon debugging is becoming a headache!
Programming apps is becoming hard!
13
What is cache coherence?
Thread and process interactions need to be coordinated
Otherwise something analogous to this will happen!
Teller 1                            Teller 2
Read bank balance ($100)            Read bank balance ($100)
Add $10 on scratch paper ($110)     Subtract $10 on scratch paper ($90)
Enter $110 into account             Enter $90 into account
USER LEFT WITH $90, NOT WITH $100!!
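The teller story can be replayed step by step. The following is only a minimal sketch: the names run_schedule, STEPS, interleaved and one_at_a_time are made up for illustration, and the "tellers" are simulated in a single thread so the outcome is repeatable.

```python
def run_schedule(balance, schedule):
    """Replay teller steps in the given order and return the final balance."""
    scratch = {}  # each teller's private scratch-paper value
    deltas = {"teller1": +10, "teller2": -10}
    for teller, step in schedule:
        if step == "read":
            scratch[teller] = balance          # copy the shared balance
        elif step == "compute":
            scratch[teller] += deltas[teller]  # work on the private copy
        elif step == "write":
            balance = scratch[teller]          # overwrite the shared balance
    return balance


STEPS = ("read", "compute", "write")

# Both tellers read before either writes: the slide's bad interleaving.
interleaved = [(t, s) for s in STEPS for t in ("teller1", "teller2")]

# Teller 1 finishes all three steps before teller 2 starts.
one_at_a_time = [(t, s) for t in ("teller1", "teller2") for s in STEPS]

print(run_schedule(100, interleaved))    # teller 1's deposit is lost
print(run_schedule(100, one_at_a_time))  # both updates survive
```

With a starting balance of $100, the interleaved order ends at $90 and the one-at-a-time order ends at $100. Coordination has to rule out the first kind of ordering.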
14
Cache Coherence Protocol Verification
My “MPV” research project develops techniques to ensure that cache coherence protocols are correct
[Figure: hierarchy of protocols (chip-level, intra-cluster, inter-cluster) connecting directories (dir) and memories (mem)]
15
Programming these supercomputers!
My “Gauss” project (in collaboration with Robert M. Kirby) ensures that supercomputer programs do not contain bugs, and also perform efficiently
Virtually all supercomputers are programmed using the “MPI” communication library
Mis-using this library can often result in bugs that show up only after porting
P1
MPI_SEND(to P2, Msg)
MPI_RECV(from P2, Msg)
P2
MPI_SEND(to P1, Msg)
MPI_RECV(from P1, Msg)
If the system does not provide sufficient buffering, the sends may both block, thus causing a deadlock !
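The head-to-head send pattern can be explored with a toy model. This sketch does not call a real MPI library. The function deadlocks and its buffer_slots parameter are hypothetical, and the model assumes a send completes either by taking a free buffer slot or by meeting a receive that is already waiting.

```python
def deadlocks(programs, buffer_slots):
    """Return True if some process can never finish its list of operations.

    programs maps a process name to a list of ("send", peer) or
    ("recv", peer) operations. buffer_slots is the assumed number of
    messages the system can buffer per (sender, receiver) channel.
    """
    pc = {p: 0 for p in programs}  # next operation index per process
    queued = {}                    # (src, dst) -> buffered message count
    progress = True
    while progress:
        progress = False
        for p, ops in programs.items():
            if pc[p] == len(ops):
                continue  # this process has finished
            kind, peer = ops[pc[p]]
            if kind == "send":
                ch = (p, peer)
                if queued.get(ch, 0) < buffer_slots:
                    # Buffered send: deposit the message and move on.
                    queued[ch] = queued.get(ch, 0) + 1
                    pc[p] += 1
                    progress = True
                elif (queued.get(ch, 0) == 0
                      and pc[peer] < len(programs[peer])
                      and programs[peer][pc[peer]] == ("recv", p)):
                    # No buffering: hand over directly to a waiting receive.
                    pc[p] += 1
                    pc[peer] += 1
                    progress = True
            else:  # recv
                ch = (peer, p)
                if queued.get(ch, 0) > 0:
                    queued[ch] -= 1
                    pc[p] += 1
                    progress = True
    return any(pc[p] < len(ops) for p, ops in programs.items())


head_to_head = {
    "P1": [("send", "P2"), ("recv", "P2")],
    "P2": [("send", "P1"), ("recv", "P1")],
}
reordered = {
    "P1": [("send", "P2"), ("recv", "P2")],
    "P2": [("recv", "P1"), ("send", "P1")],
}

print(deadlocks(head_to_head, buffer_slots=0))  # both sends block
print(deadlocks(head_to_head, buffer_slots=1))  # buffering hides the bug
print(deadlocks(reordered, buffer_slots=0))     # safe without buffering
```

In this model the same program passes on a system with one buffer slot and hangs on a system with none, which is why such bugs tend to show up only after porting. Reordering one process so that it receives first removes the dependence on buffering.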
17
Silicon Debugging: Can’t see “inside” CPUs without paying a huge price
• CANNOT assume there is a “front-side bus”
• CANNOT record all link traffic
• CAN ONLY generate sets of possible cache states
• HOW BEST can one match against designed behavior?
• I did a prelim study of a simple example during sabbatical
• Organizing workshop talks in November (FMCAD 2006)
[Figure: four CPUs (cpu cpu cpu cpu) with invisible “miss” traffic and visible “miss” traffic]
18
The real rage these days
Multicores!
Putting two simple CPUs on a chip achieves 80% performance per CPU with only 50% of the power per CPU
The chip as a whole gives 1.6x performance for the same power, PROVIDED we can keep the cores busy
Simple ways to keep ’em busy:
Virus-checker in background while user computes
Photoshop on one and Windows on another
More complex ways to keep multiple cores busy are being investigated
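The multicore arithmetic can be written out as a rough sketch. The function chip_relative is a made-up helper, the 80% and 50% figures are the slide's, and the model assumes an idle core contributes no performance but still counts toward the power budget.

```python
def chip_relative(cores, perf_per_core, power_per_core, busy_cores=None):
    """Return (performance, power) relative to one big single-core CPU.

    Assumption: an idle core adds no performance but still draws its
    share of the power budget.
    """
    busy = cores if busy_cores is None else busy_cores
    performance = busy * perf_per_core
    power = cores * power_per_core
    return performance, power


# Both cores busy: 2 x 0.8 performance for 2 x 0.5 power.
print(chip_relative(2, 0.8, 0.5))

# Only one core busy: the chip is slower than the single big CPU.
print(chip_relative(2, 0.8, 0.5, busy_cores=1))
```

With both cores busy the chip delivers about 1.6 times the performance at the same power. With one core idle it delivers only about 0.8 times, which is why keeping the cores busy matters so much.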
19
The real rage these days
Transaction Memories!
Users cause too many bugs when programming using locks
Transaction memories allow shared memory threads to “watch” each other’s read/write actions
Conflicting accesses can roll back and retry
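The roll-back-and-retry idea can be illustrated with a toy, single-threaded model. This is not how any particular transactional memory is implemented. The classes VersionedCell and Transaction and the function run_conflicting_tellers are hypothetical, and a version counter stands in for the hardware or runtime that “watches” the accesses.

```python
class VersionedCell:
    """A shared value plus a counter that changes on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """Reads a cell, buffers one write, and commits only if nobody else did."""

    def __init__(self, cell):
        self.cell = cell
        self.read_version = None
        self.pending = None

    def read(self):
        self.read_version = self.cell.version  # remember what we saw
        return self.cell.value

    def write(self, value):
        self.pending = value  # buffered until commit

    def commit(self):
        if self.cell.version != self.read_version:
            return False  # conflict detected: discard the buffered write
        self.cell.value = self.pending
        self.cell.version += 1
        return True


def run_conflicting_tellers(start):
    """Two tellers read the same balance; the loser rolls back and retries."""
    account = VersionedCell(start)
    t1, t2 = Transaction(account), Transaction(account)
    t1.write(t1.read() + 10)  # teller 1: add $10
    t2.write(t2.read() - 10)  # teller 2: subtract $10 from the same old balance
    t1.commit()               # teller 1 commits first
    retries = 0
    while not t2.commit():    # teller 2's read is stale, so it must retry
        retries += 1
        t2.write(t2.read() - 10)
    return account.value, retries


print(run_conflicting_tellers(100))
```

Starting from $100, teller 2's first commit is rejected, its retry sees $110, and the account ends at $100 after one retry instead of losing the deposit.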
20
LOTS of hard problems remain open
How to provide memory bandwidth?
Put multicore CPU chip on top of highly dense DRAM chip (e.g. 8 GB)
Most users will buy just “one of those”
Others will buy SDRAM module add-ons
Slooow access: may need to re-learn olden-day techniques of overlay programming!!
21
Learn from History – Learn Computer History
If you want to understand today, you have to search yesterday. ~Pearl Buck
Things are changing SO fast that basic principles are often being diluted
Get excited by studying computer history and seeing how much better off we are (also be chagrined by all the lost opportunity!)
22
Where to learn computer history?
Computer History Museum, Mountain View
Intel Museum, Santa Clara
Boston Computer Museum
Many in the UK (Manchester, London, …)
Travel widely – be inspired by what you see!
23
It is all International, baby!
Learn multiple cultures, how the world works
Anything that’s automated is outsourced
That said, the US has to try VERY hard to give up its amazing set of advantages
Amazing work ethic
Individuality
Infrastructure !!!!!!!!
The talent pool is VERY deep here in the US
But, sadly, we as a nation are REALLY trying hard to foolishly give up a lot
With simple but sure-footed corrections now, we will be well off
24
A glimpse of the International Scene
Lessons from MSR India
Amazing talent pool
Relatively high availability of talent
Lessons from Intel India
Talent pool still lacks the depth and abilities of many of our CEs
We can stay competitive in hardware for a LONG time to come
Yet, … it’s easy to ignore how SMART and numerous the talent outside the US is …
Seeing first-hand is amazingly useful!
Apply for international internships!
25
Gradual loss of manufacturing death
How many goods are made in China these days?
Any green-house laws?
Airbus assembly factory moving to China
Autos… watch out
How much software / services made in India?
THE REAL DANGER
Loss of manufacturing kills pride and incentive to learn – we don’t want that in CE
26
Recipe for success
The best ideas don’t always work
Wait for the world to be ready for the ideas
The devil is in the detail
Too much established momentum
Decide goal (short-term impact vs. long-term)
Quiet tenacity
Tenacity without ruffling feathers needlessly
Work hard! Work smart! Learn theory! Be a champion algorithm / program designer! Learn advanced hardware design!
Learn to write extremely clearly and precisely!
Learn to give inspiring talks! (Be inspired first!)