1
Computer Architecture Research Overview
Rajeev Balasubramonian
School of Computing, University of Utah
http://www.cs.utah.edu/~rajeev
2
What is Computer Architecture?
3
What is Computer Architecture?
• If the Intel Pentium4 has a faster clock speed than the IBM Power4, does it execute your programs faster?
4
What is Computer Architecture?
• If the Intel Pentium4 has a faster clock speed than the IBM Power4, does it execute your programs faster?
[Figure: clock ticks and completing instructions plotted against time for Case 1 and Case 2]
5
What is Computer Architecture?
To a large extent, computer architecture determines:
• the number of instructions used to execute a program
• the time each instruction takes to execute
• the idle cycles when no work gets done
• the number of instructions that can execute in parallel
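The factors above combine into the usual relation: execution time = instructions × cycles per instruction × cycle time. The sketch below works through it for two made-up chips. The function name and all numbers are assumptions for illustration only, not measurements of the Pentium4 or the Power4.

```python
def execution_time_ns(instructions, cycles_per_instruction, clock_ghz):
    """Execution time = instructions x cycles per instruction x cycle time."""
    cycle_time_ns = 1.0 / clock_ghz  # a 2 GHz clock ticks every 0.5 ns
    return instructions * cycles_per_instruction * cycle_time_ns

# Hypothetical chip A: faster clock, but each instruction needs more cycles.
chip_a_ns = execution_time_ns(1_000_000, cycles_per_instruction=2.0, clock_ghz=3.0)

# Hypothetical chip B: slower clock, but it completes one instruction per cycle.
chip_b_ns = execution_time_ns(1_000_000, cycles_per_instruction=1.0, clock_ghz=2.0)

print(f"chip A: {chip_a_ns:.0f} ns, chip B: {chip_b_ns:.0f} ns")
```

With these assumed numbers, the chip with the slower clock finishes first, so clock speed alone does not settle the question.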
6
A Typical Microprocessor
[Block diagram: Branch Predictor, L1 Instr Cache, Decode & Rename, Issue Logic, Register File, four ALUs, L1 Data Cache, L2 Cache]
7
Architecture Trends in the 90s
• Performance was the ultimate metric
• Transistors were a limiting factor
As on-chip transistors became available in the 90s, more functionality and complex circuitry were added to boost performance. Most of the low-hanging fruit has now been picked.
8
Hitting the Wall
We have now hit the following walls:
• Single core performance
• Memory
• Complexity
• Power, temperature
9
Hitting the Power Wall
Power is as important a metric today as performance
From Shekhar Borkar, MICRO’99
10
The Advent of Multi-Core Chips
• In the past, performance magically increased by 50% every year
• In the future, this improvement will be only ~20% every year … unless … the application is multi-threaded!
[Figure legend: Core, Cache bank]
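The gap between 50% and ~20% per year is easier to appreciate once it compounds. A minimal sketch, assuming the rates quoted on this slide stay constant over the period (the function name is illustrative):

```python
def cumulative_speedup(annual_improvement, years):
    """Compound a fixed yearly performance improvement over several years."""
    return (1.0 + annual_improvement) ** years

for years in (1, 5, 10):
    past = cumulative_speedup(0.50, years)    # 50% per year
    future = cumulative_speedup(0.20, years)  # ~20% per year
    print(f"{years:2d} years: {past:5.1f}x at 50%/yr vs {future:4.1f}x at 20%/yr")
```

Over ten years the two rates give roughly 58x versus about 6x, which is the performance that single-threaded code stands to lose.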
11
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
For publications, see http://www.cs.utah.edu/~rajeev/research.html
12
Interconnects as a Bottleneck
• In the past, on-chip data transmission on wires cost almost nothing
• Interconnect speed and power have been improving, but not at the same rate as transistor speeds
Hence, relative to computation, communication is much more expensive
• In the near future, it will take 100 cycles to travel across the chip
• 50% of chip power can be attributed to interconnects
13
Interconnects in Multi-Core Chips
[Figure: CPU 1, CPU 2, and CPU 3 with L1 caches, connected to the L2 cache and L2 controllers]
14
Not all Wires are Created Equal
                      B-Wires   L-Wires   W-Wires   PW-Wires
Relative latency      1x        0.5x      1.6x      3.2x
Relative area         1x        4x        0.5x      0.5x
Dynamic power (W/m)   2.65      1.46      2.9       0.87
Static power (W/m)    1.02      0.57      1.16      0.31
15
Data Transfers have Varying Needs
• Example of a cache coherence transaction: Read exclusive request for a shared block
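One way to picture matching transfers to wires is a lookup over the wire table from the previous slide. The sketch below is only an illustration: the `pick_wire` policy (lowest latency for critical messages, lowest dynamic power otherwise) is an assumption, not the mapping used in the research, and it ignores the 4x area cost of L-Wires.

```python
# Wire properties copied from the table on the previous slide.
WIRES = {
    "B":  {"latency": 1.0, "area": 1.0, "dynamic_w_per_m": 2.65, "static_w_per_m": 1.02},
    "L":  {"latency": 0.5, "area": 4.0, "dynamic_w_per_m": 1.46, "static_w_per_m": 0.57},
    "W":  {"latency": 1.6, "area": 0.5, "dynamic_w_per_m": 2.90, "static_w_per_m": 1.16},
    "PW": {"latency": 3.2, "area": 0.5, "dynamic_w_per_m": 0.87, "static_w_per_m": 0.31},
}

def pick_wire(latency_critical):
    """Hypothetical policy: fastest wire if the message is on the critical
    path, otherwise the wire with the lowest dynamic power."""
    metric = "latency" if latency_critical else "dynamic_w_per_m"
    return min(WIRES, key=lambda name: WIRES[name][metric])

print(pick_wire(latency_critical=True))   # critical message
print(pick_wire(latency_critical=False))  # message that can tolerate delay
```

Under this policy a critical message goes on L-Wires and a delay-tolerant one on PW-Wires. A real design would also have to weigh area and message width.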
16
Other Interconnect Choices
• Optical interconnects: speed of light, cost in converting between optical and electrical domains
• 3D chips: reduces communication distances, low cost for vertical signal transmission, increase in power density
17
3D Layouts
[Figure: 3D layouts across Die 0 and Die 1: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered). Legend: cluster, cache bank, intra-die horizontal wire, inter-die vertical wire]
18
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Clustered architectures: a relatively low-complexity, scalable solution that easily handles multiple threads
19
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Heterogeneous perf/power
Cores that execute the OS
Cores that verify results
20
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Hardware to support transactional memory
21
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Faults are caused by high energy particles that deposit enough charge to toggle bits
Variations in conditions may cause a circuit to not produce its result in time
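To make the first kind of fault concrete, the sketch below toggles one bit of a stored word and catches the change with a single parity bit. Parity is used here only as a familiar illustration of error detection. It is an assumption of this example, not a technique the slides propose, and the helper names are hypothetical.

```python
def parity(word):
    """Return 1 if the word has an odd number of 1 bits, else 0."""
    return bin(word).count("1") % 2

def flip_bit(word, position):
    """Model a particle strike that toggles one bit of the word."""
    return word ^ (1 << position)

stored = 0b1011_0010
stored_parity = parity(stored)       # saved alongside the data

corrupted = flip_bit(stored, 3)      # a single bit is toggled
fault_detected = parity(corrupted) != stored_parity
print("fault detected:", fault_detected)

# A second flip restores the parity, so one parity bit misses double errors.
double_flip = flip_bit(corrupted, 0)
print("double flip detected:", parity(double_flip) != stored_parity)
```

A single parity bit detects any odd number of flipped bits but cannot say which bit changed, so stronger codes or redundant execution are needed to recover.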
22
Research Methodologies
It’s all about the simulators!
• SimpleScalar, Wattch, and HotSpot: about 10,000 lines of C code that model the flow of instructions through a modern processor
• Inputs: a configuration file that specifies processor parameters, and a benchmark program (say, gzip)
• Outputs: how long the program runs on the simulated processor (SimpleScalar), how much power is consumed (Wattch), and what the peak temperature is (HotSpot)
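The input/output shape of such a simulator can be sketched in a few lines. This toy model is in no way SimpleScalar's code or configuration format: the `Config` fields, the trace encoding, and the timing rule are all assumptions chosen to show how processor parameters plus a program yield a cycle count.

```python
from dataclasses import dataclass

@dataclass
class Config:
    """Hypothetical processor parameters (stand-in for a configuration file)."""
    issue_width: int = 4     # instructions issued per cycle
    miss_penalty: int = 100  # stall cycles per cache miss

def simulate(config, trace):
    """Return a cycle count for a trace of 'op' and 'miss' events.

    Toy timing rule: instructions issue issue_width per cycle, and every
    cache miss adds a fixed stall that is not overlapped with other work.
    """
    issue_cycles = -(-len(trace) // config.issue_width)  # ceiling division
    stall_cycles = trace.count("miss") * config.miss_penalty
    return issue_cycles + stall_cycles

trace = ["op"] * 7 + ["miss"]  # stand-in for a benchmark program
print(simulate(Config(), trace))
print(simulate(Config(issue_width=1, miss_penalty=10), trace))
```

Changing a parameter and rerunning the same trace is the basic experiment loop; the real tools model the pipeline in far more detail.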
23
Evaluating a New Idea
• Lots of reading (it’s better than waiting for divine inspiration)
• Identify bottlenecks, identify problems, develop an idea, repeatedly question that idea
• Understand simulator
• Engineer a solution, modify simulator code (perhaps, write fewer than 1000 lines of C code)
• Analyze data (things never work the first time), engineer/optimize/debug your solution
• Write papers
• Implement in silicon?
24
To Learn More…
• CS/EE 3810: Computer Organization
• CS/EE 6810: Computer Architecture
• CS/EE 7810: Advanced Computer Architecture
• CS/EE 7820: Parallel Computer Architecture
• CS 7937 / 7940: Architecture Reading Seminar