CS425 Computer Systems Architecture
Fall 2019
Introduction
Vassilis Papaefstathiou
Outline
• Logistics
• CPU Evolution
• Course goal (what is Computer Architecture?)
Course Information
• Elective course in Hardware and Computer Systems (E4)
⎻ 6 ECTS
• Prerequisite: CS225 Computer Organization
• Instructors:
⎻ Dr. Vassilis Papaefstathiou ([email protected])
⎻ Prof. Manolis Katevenis ([email protected])
• Teaching Assistants:
⎻ Mr. Sotiris Totomis ([email protected])
⎻ Mr. Iasonas Mastorakis ([email protected])
• Lectures:
⎻ Monday 16:15 – 18:00 (H.206)
⎻ Wednesday 16:15 – 18:00 (H.206)
⎻ Friday 16:15 – 18:00 (H.206), backup slot when needed
Grading
• Homeworks & Programming Assignments: 35%
⎻ Mandatory
⎻ Average grade > 4.5
• Midterm Exam: 20% (mandatory)
• Final Exam: 45% (grade > 4.5)
Course Material
• Website:
⎻ http://www.csd.uoc.gr/~hy425
• Mailing List:
⎻ [email protected] (subscribe with majordomo)
• Textbook:
⎻ Hennessy and Patterson, Computer Architecture: A Quantitative Approach. 3rd Edition available in Greek (Tziolas publishers, translation by D. Pnevmatikatos, D. Serpanos and G. Stamoulis). ISBN 97896041807693
Tentative Schedule
• Fundamentals, metrics, pipelining (1.5 weeks)
• Instruction Level Parallelism (2 weeks)
• Branch prediction (1 week)
• Multiple issue, VLIW, vector, multithreading (2 weeks)
• Memory hierarchy, caches and optimizations (2.5 weeks)
• Multicore processors, cache coherence (2 weeks)
• Main memory technologies (1 week)
• Advanced topics (1 week)
History in Computer Devices
• EDSAC, University of Cambridge, UK, 1949-1958 (mercury-based memory, logic, punched tape, teleprinter, EDSAC2 1965)
Computing Systems Today
• The world is a large parallel system
⎻ Microprocessors everywhere
⎻ Vast infrastructure behind them
• ’70s: microprocessors & supercomputers
• ’80s: compilers, OS, RISC, x86
• ’90s: Internet, WWW, PDA
• ’00s: mobile, cell phones, embedded CPUs
• ’10s: Internet of Things (IoT)
[Figure labels: MEMS for sensor nets, Internet connectivity, clusters, massive cluster, Gigabit Ethernet, robots, routers, cars, sensor nets, refrigerators]
Improvement in Computer
• Radical progress in computers due to:
⎻ Technological improvements (next few slides)
o steady
⎻ Better computer architectures (course focus)
o less consistent
Technology: Transistor Revolution
• Bell Labs, 1948: first transistor
• Intel 4004, 1971 (Moore and Noyce founded Intel in 1968)
⎻ 4-bit
⎻ 2,300 transistors
⎻ 740 kHz operation
⎻ 10 μm (= 10,000 nm) PMOS technology
• Intel Core i7, 2011
⎻ 64-bit
⎻ 2,600,000,000 transistors
⎻ 3.4 GHz
⎻ 32 nm
[Figure: transistor cross-section with source and drain labelled]
Technology: Moore’s Law
• In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would grow exponentially with time; the popularly quoted doubling period is 18 months (Moore's own estimates were one year in 1965, revised to two years in 1975)
• A common restatement is that semiconductor technology doubles its effectiveness every 18 months
• In practice a new technology is introduced every ~two years, with circuit-layout feature sizes about 70% of those of the previous technology, i.e. roughly half the area per transistor
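The doubling period can be checked against the two data points quoted on the Transistor Revolution slide (Intel 4004 and Intel Core i7). The sketch below assumes pure exponential growth between those two points; the function names are illustrative and not part of the course material.

```python
import math

# Data points quoted on the "Transistor Revolution" slide.
YEAR_4004, TRANSISTORS_4004 = 1971, 2_300
YEAR_I7, TRANSISTORS_I7 = 2011, 2_600_000_000


def doubling_time_years(year0, count0, year1, count1):
    """Average time for the count to double, assuming pure exponential growth."""
    doublings = math.log2(count1 / count0)
    return (year1 - year0) / doublings


def density_gain(linear_shrink):
    """Density gain per generation if both layout dimensions shrink by the same factor."""
    return 1.0 / (linear_shrink ** 2)


implied = doubling_time_years(YEAR_4004, TRANSISTORS_4004, YEAR_I7, TRANSISTORS_I7)
print(f"Implied doubling time: {implied:.2f} years")
print(f"Density gain for a 0.7x shrink: {density_gain(0.7):.2f}x")
```

With these two data points the implied doubling time comes out at roughly two years, and a 0.7x linear shrink roughly doubles density, which matches the ~two-year technology cadence on the slide.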
Technology: Transistor Count
• 4004: 2.3K transistors
• 10-core Xeon: 2.6B transistors
• 28-core Xeon Platinum: 8B
• Apple A13: 8.5B
• Nvidia Volta V100 GPU: 21B
• 32-core AMD Epyc: 19B
• 64-core AMD Epyc Rome: 32B
Technology constantly on the move
• Number of transistors is not the limiting factor
⎻ Currently ~5+ billion transistors/chip
⎻ Problems: power, heat, latency
• 3-dimensional chip technology?
⎻ Sandwiches of silicon (Package on Package)
⎻ “Through-silicon Vias” (TSVs) for communication
⎻ FinFET
• On-chip optical connections?
⎻ Power savings for large packets
• Intel Core i7 (“Ivy Bridge”)
⎻ 4 cores + GPU
⎻ 22 nm, tri-gate (“3D”) transistors
⎻ 1.4B transistors
⎻ Shared L3 cache: 8 MB
⎻ L2 cache: 1 MB (256 KB x 4), L1: 64 KB/core
Technology vs microarchitecture: Intel CPU evolution
• Kaby Lake: optimization step (14 nm)
• Intel announced in March 2016: Process-Architecture-Optimization
• Cannon Lake: upcoming 10 nm successor (process step)
• Ice Lake: 10 nm (architecture step)
[Figure: Intel tick-tock timeline, with ticks in 2006, 2008, 2010, 2012 and 2014 alternating with tocks, followed by an optimization step in 2016 and a 10 nm process step in 2017]
Transistor size trends and questions
• Feature sizes, higher performance?
⎻ Transistor size went down from 10 microns to 14 nanometers
⎻ Linear drop in feature size gives a quadratic increase in density
⎻ Linear increase in transistor performance
• Where is the catch?
⎻ Supply voltage can be reduced less with each generation, to maintain safe operation
⎻ Higher resistance and capacitance per unit of length
⎻ Shorter wires but with higher resistance/capacitance
⎻ Wire delays improving poorly compared to transistors
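The scaling argument above can be made concrete with a first-order model. This is a minimal sketch, not the course's own derivation: it assumes wire resistance per unit length grows as 1/s² when the cross-section shrinks by a factor s in both dimensions, and it holds capacitance per unit length constant (optimistic compared with the slide, which says capacitance also rises). The function name and parameters are hypothetical.

```python
def scaled_wire_delay(delay, s, shrink_length=True):
    """First-order RC wire delay after scaling feature size by s (0 < s < 1).

    Assumptions (not from the slides): resistance per unit length grows as
    1/s^2 because the wire cross-section shrinks in both dimensions, and
    capacitance per unit length stays roughly constant.
    """
    r_per_len = 1.0 / s**2          # relative resistance per unit length
    c_per_len = 1.0                 # relative capacitance per unit length
    length = s if shrink_length else 1.0
    # Distributed RC delay is proportional to (r * length) * (c * length).
    return delay * r_per_len * c_per_len * length**2


s = 14e-9 / 10e-6   # linear shrink from 10 microns to 14 nm (slide figures)
print(f"Linear shrink factor: {1 / s:.0f}x")
print(f"Density gain (quadratic): {1 / s**2:.0f}x")
print(f"Local wire delay (wire shrinks with the design): {scaled_wire_delay(1.0, 0.7):.2f}x")
print(f"Global wire delay (wire length fixed): {scaled_wire_delay(1.0, 0.7, shrink_length=False):.2f}x")
```

Under these assumptions a wire that shrinks with the design keeps the same delay while transistors get faster, and a wire of fixed length gets slower, which is one way to read "wire delays improving poorly compared to transistors".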
Limiting Force: Power Density
Microprocessor Size ≈ 2 cm²
Crossroads: Uniprocessor Performance
Constrained by power, instruction level parallelism, memory latency
Move to multicore
Trends – All in one
The End of the Uniprocessor Era
• Power wall: power expensive, transistors free
⎻ Can put more on chip than can afford to turn on
• ILP wall: law of diminishing returns on more HW for ILP
• Memory wall: memory slow, multiplies fast
⎻ 200 clock cycles to DRAM memory vs. 4 clocks for multiply
• Power Wall + ILP Wall + Memory Wall = Brick Wall
⎻ Uniprocessor performance now 2X every 5(?) years
Single biggest change in the history of computing systems
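The numbers on this slide translate into simple ratios. The sketch below uses the slide's 200-cycle DRAM access, 4-cycle multiply and 5-year doubling; the 18-month doubling period from the Moore's Law slide is included only as a comparison point, and treating it as a stand-in for earlier performance growth is an assumption made here.

```python
def annual_growth(doubling_years):
    """Yearly improvement factor implied by a given doubling period."""
    return 2 ** (1.0 / doubling_years)


DRAM_ACCESS_CYCLES = 200   # from the slide
MULTIPLY_CYCLES = 4        # from the slide

multiplies_per_dram_access = DRAM_ACCESS_CYCLES // MULTIPLY_CYCLES
print(f"Multiplies that fit in one DRAM access: {multiplies_per_dram_access}")
print(f"Doubling every 5 years   -> {annual_growth(5.0):.2f}x per year")
print(f"Doubling every 18 months -> {annual_growth(1.5):.2f}x per year")
```

A 5-year doubling period corresponds to roughly 15% improvement per year, against roughly 59% per year for an 18-month doubling period.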
Many Core Chips: The future is here
• “Many Core” refers to many processors/chip
⎻ 64? 128? Hard to say exact boundary
• How to program these?
⎻ Use 2 CPUs for video/audio
⎻ Use 1 for word processor, 1 for browser
⎻ 76 for virus checking???
• Something new is clearly needed here…
Intel 80-core multicore chip, 2007, 65nm – 100M transistors
Intel Many Integrated Core Architecture (MIC), 50-cores, 2012, 22nm, commercial
Intel Single-Chip Cloud Computer (SCC), 48-cores, 2010, 4 memory controllers, 24-router mesh
What is Computer Architecture
• In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.
[Figure: Application at one end and Physics at the other; the gap between them is too large to bridge in one step]
Abstraction Layers in Modern Systems
Layers, from top to bottom:
• Application
• Algorithm
• Programming Language
• Operating System/Virtual Machine
• Instruction Set Architecture (ISA)
• Microarchitecture
• Gates/Register-Transfer Level (RTL)
• Circuits
• Devices
• Physics
Annotations on the figure:
• Original domain of the computer architect (’50s-’80s)
• Domain of recent computer architecture (’90s)
• Reliability, power, …
• Parallel computing, security, …
• Reinvigoration of computer architecture, mid-2000s onward
Computer Architecture is an Integrated Approach
• What really matters is the functioning of the complete system
⎻ Hardware, runtime system, compiler, operating system, and application
⎻ In networking, this is called the “End to End argument”
• Computer architecture is not just about transistors, individual instructions, or particular implementations
⎻ E.g., original RISC projects replaced complex instructions with a compiler + simple instructions
• It is very important to think across all hardware/software boundaries
⎻ New technology → new capabilities → new architectures → new tradeoffs
⎻ Delicate balance between backward compatibility and efficiency
Defining Computer Architecture (ISA)
Instruction Set Architecture
• ISAs converged to a common RISC paradigm
⎻ CISC ISAs implemented on RISC pipelines
• Load-store architectures, general-purpose registers
• Aligned memory addressing, simple addressing modes
• Byte, word, double-word, quad-word operands
• Arithmetic, logic, control operations
• Fixed-length encoding
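Aligned addressing can be illustrated with a one-line check. This is a minimal sketch: the operand sizes below assume MIPS-style naming (word = 4 bytes), which the slide does not state, and `is_aligned` is a hypothetical helper.

```python
# Operand sizes in bytes; assumed MIPS-style naming (word = 4 bytes).
OPERAND_BYTES = {"byte": 1, "halfword": 2, "word": 4, "double-word": 8}


def is_aligned(address, size):
    """True if an access of `size` bytes at `address` is naturally aligned."""
    return address % size == 0


for addr in (0x1000, 0x1002, 0x1003):
    ok = [name for name, size in OPERAND_BYTES.items() if is_aligned(addr, size)]
    print(f"{addr:#06x}: aligned for {', '.join(ok)}")
```

Restricting accesses to naturally aligned addresses means an operand never straddles two memory words, which keeps the load/store path simple.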
Example: MIPS R3000
Programmable storage
• 2^32 x bytes
• 31 x 32-bit GPRs (R0=0)
• 32 x 32-bit FP regs (paired DP)
• PC
[Figure: register file r0, r1, …, r31 and PC]
Data types? Format? Addressing modes?
Arithmetic logical
• Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU, AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
• SLL, SRL, SRA, SLLV, SRLV, SRAV
• MUL, DIV
Memory Access
• LB, LBU, LH, LHU, LW, LWL, LWR
• SB, SH, SW, SWL, SWR
Control
• J, JAL, JR, JALR
• BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
32-bit instructions on word boundary
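The slide leaves the instruction format as an open question. As an illustration of what fixed-length 32-bit encoding buys, the sketch below splits a word into the standard MIPS R-type fields (opcode, rs, rt, rd, shamt, funct); the helper name `decode_r_type` is hypothetical and only the R-type format is covered.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source register
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination register
        "shamt": (word >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct": word & 0x3F,           # bits 5..0, operation selector
    }


# Encoding of: add $8, $9, $10 (opcode 0, funct 0x20)
fields = decode_r_type(0x012A4020)
print(fields)
```

Because every field sits at a fixed bit position, decoding needs only shifts and masks, with no dependence on the length of earlier instructions.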
ISA vs Computer Architecture
• Old definition of computer architecture == instruction set design
⎻ Other aspects of computer design called implementation
⎻ Suggests that implementation is uninteresting or less challenging
• Computer architecture >> ISA
• Architect’s job is much more than instruction set design; the technical hurdles today are more challenging than those in instruction set design
Defining Computer Architecture
Architecture = ISA (+prog. lang.) + Organization + Hardware
• Processor Architecture
⎻ Pipelining, hazards, ILP, HW/SW interface
• Memory hierarchies
• Interconnects
• I/O systems
• Hardware technology used (e.g. component size)
• Computer architecture focuses on organization and quantitative principles of design
Execution is not just about HW and ISA
• The VAX fallacy
⎻ Produce one instruction for every high-level concept
⎻ Absurdity: Polynomial Multiply
o Single hardware instruction
o But why? Is this really faster???
• RISC Philosophy
⎻ Full system design
⎻ Hardware mechanisms viewed in context of complete system
⎻ Cross-boundary optimization
• Modern programmer does not see assembly language
⎻ Many do not even see “low-level” languages like “C”.
[Figure labels: Program, Source-to-Source Transformations, Compiler, Libraries, Linker, Application Binary, Library Services, OS Services, Hypervisor, Hardware]
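To make the RISC side of this argument concrete, the sketch below shows how a polynomial operation reduces to a short loop of simple multiply and add steps (Horner's rule), the kind of sequence a compiler can emit without a dedicated instruction. It assumes the high-level concept is polynomial evaluation; the slide only names "Polynomial Multiply", and `poly_eval` is a hypothetical helper.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule.

    `coeffs` lists coefficients from the highest degree down to the constant.
    Each loop iteration is one multiply and one add.
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


# 2x^2 + 3x + 1 evaluated at x = 2
print(poly_eval([2.0, 3.0, 1.0], 2.0))
```

The loop body maps onto two simple instructions per coefficient, so whether a single complex instruction is faster depends on the implementation rather than on the ISA alone.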
Computer Architecture Topics
[Figure: map of computer architecture topics]
• Instruction Set Architecture
• Pipelining and Instruction Level Parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, dynamic compilation
• Memory Hierarchy: L1 cache, L2 cache, DRAM
• Input/Output and Storage: disks, tape, RAID
• Network Communication, Other Processors
• VLSI
• Further labels: addressing, protection, exception handling; coherence, bandwidth, latency; emerging technologies, interleaving, bus protocols
Executive Summary
• The processor you built in HY225
• What you’ll understand after taking HY425
• Also, the technology behind multi-core processors
Next Lecture: Major Design Challenges
• Power
• CPU time
• Memory latency/bandwidth
• Storage latency/bandwidth
• Transactions per second
• Intercommunication
• Dependability