Processor Arch: Pipelined The Memory Hierarchy
![Page 1: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/1.jpg)
Processor Arch: Pipelined
Memory Hierarchy
MADE BY: Zhong Zhineng & Song Yixin
![Page 2: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/2.jpg)
Processor Arch: Pipelined
![Page 3: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/3.jpg)
• Throughput and latency change
Pipeline Basics
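To make the throughput and latency trade-off concrete, here is a minimal sketch. The delay figures (300 ps of combinational logic, 20 ps per pipeline register) and the function name are illustrative assumptions, not values taken from these slides.

```python
def pipeline_metrics(stage_delays_ps, register_delay_ps):
    """Return (throughput in giga-instructions/s, latency in ps)."""
    # The clock must be slow enough for the slowest stage plus its register.
    clock_ps = max(stage_delays_ps) + register_delay_ps
    throughput_gips = 1000.0 / clock_ps          # one instruction per clock
    latency_ps = clock_ps * len(stage_delays_ps)  # time for one instruction
    return throughput_gips, latency_ps

# Assumed numbers: 300 ps of logic, either unpipelined or cut into 3 stages.
unpipelined = pipeline_metrics([300], 20)
three_stage = pipeline_metrics([100, 100, 100], 20)

print(unpipelined)  # throughput 3.125 GIPS, latency 320 ps
print(three_stage)  # throughput about 8.33 GIPS, latency 360 ps
```

Under these assumptions pipelining raises throughput by roughly 2.7x while latency grows from 320 ps to 360 ps, because every instruction now passes through three registers instead of one.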
![Page 4: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/4.jpg)
• Critical moments
Pipeline Basics
![Page 5: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/5.jpg)
• Limitations of pipelining
• Nonuniform partitioning
• Decreasing returns to pipeline depth
• Stages
• AMD Zen 2 (3rd Gen Ryzen): 19 stages
• Intel Ice Lake (10th Gen Core): 14-19 stages
Pipeline Basics
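Both limitations can be seen with a little arithmetic. The sketch below assumes 300 ps of total logic and a 20 ps register delay; these numbers and the helper names are hypothetical and chosen only for illustration.

```python
def clock_period_ps(stage_delays_ps, register_delay_ps=20):
    # The slowest stage sets the clock for every stage.
    return max(stage_delays_ps) + register_delay_ps

# Nonuniform partitioning: same 300 ps of logic, split unevenly.
uniform = clock_period_ps([100, 100, 100])    # 120 ps
nonuniform = clock_period_ps([50, 150, 100])  # 170 ps, limited by the 150 ps stage

def throughput_gips(total_logic_ps, stages, register_delay_ps=20):
    # Ideal even split; register delay is paid once per stage.
    return 1000.0 / (total_logic_ps / stages + register_delay_ps)

# Diminishing returns: doubling the depth gives less than double the throughput.
depths = (1, 3, 6, 12)
gains = [throughput_gips(300, n) for n in depths]
for n, g in zip(depths, gains):
    print(f"{n:2d} stages: {g:5.2f} GIPS")
```

Going from 6 to 12 stages in this model improves throughput by only about 1.56x, since the fixed register delay becomes a larger share of each clock period.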
![Page 6: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/6.jpg)
SEQ->SEQ+
![Page 7: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/7.jpg)
• Circuit retiming
• PC update stage
• No hardware PC registers
SEQ->SEQ+
![Page 8: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/8.jpg)
SEQ+->PIPE-
![Page 9: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/9.jpg)
• Adding pipeline registers
• Select PC
• Select A: valP is needed only by call
(in the Memory stage) and by jXX
(in the Execute stage), and neither
instruction needs valA, so the
Select A block merges valP into valA
to reduce the number of pipeline registers.
SEQ+->PIPE-
![Page 10: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/10.jpg)
• Data hazard
• Control hazard
Problems: Data/Control Dependency
![Page 11: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/11.jpg)
• Handling data hazard
• Handling control hazard
A Simple Solution: Bubbles and Stalls
![Page 12: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/12.jpg)
Another Solution: Forwarding
• The Decode stage needs a value
that has not yet been written
back to the register file.
• Principle: forward whenever
possible; stall only when
forwarding cannot supply the value.
• Sel+Fwd A
• Fwd B
![Page 13: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/13.jpg)
HCL:
Modified HCL: Select PC & Fetch
![Page 14: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/14.jpg)
HCL:
Pay attention to the priority order of the cases!
Modified HCL: Decode & Write back
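The importance of case order in the Sel+Fwd A logic can be sketched in Python. This is not the slide's HCL; it is an illustration that assumes the usual Y86-64 forwarding sources (Execute, then Memory, then Write-back), and the function name, string icodes, and the `sources` list are hypothetical.

```python
RNONE = 0xF  # assumed encoding for "no register"

def sel_fwd_a(d_icode, d_valP, d_srcA, d_rvalA, sources):
    """Pick the value of valA in Decode.

    sources: (destination register, value) pairs ordered from the
    youngest in-flight instruction (Execute) to the oldest (Write-back).
    """
    if d_icode in ("call", "jXX"):
        return d_valP                 # Select A: pass valP instead of valA
    if d_srcA != RNONE:
        for dst, value in sources:    # first match wins: youngest writer
            if dst == d_srcA:
                return value
    return d_rvalA                    # no hazard: use the register file value

# Two earlier instructions both write register 2; the younger one (value 3)
# is in Execute and the older one (value 10) is in Memory.
in_flight = [(2, 3), (RNONE, 0), (2, 10), (RNONE, 0), (RNONE, 0)]
chosen = sel_fwd_a("addq", 0, 2, 99, in_flight)
print(chosen)  # 3
```

If the cases were checked oldest first, the stale value 10 would be forwarded, which is why the order of the choices matters.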
![Page 15: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/15.jpg)
Modified HCL: Execute
HCL: left out
![Page 16: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/16.jpg)
Modified HCL: Memory
HCL: left out
![Page 17: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/17.jpg)
Hazard: Load/Use
• Forwarding alone cannot resolve every hazard.
• The previous instruction loads a value from memory into a register, and the
current instruction needs the value in that register.
• The pipeline must stall for one cycle and insert a bubble, then forward from the Memory stage.
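The detection condition for a load/use hazard can be sketched as follows. The signal names (`e_icode`, `e_dstM`, `d_srcA`, `d_srcB`) and the set of load instructions are assumptions based on the Y86-64 PIPE design that these slides appear to follow.

```python
LOAD_INSTRUCTIONS = {"mrmovq", "popq"}  # assumed: instructions that read memory into a register

def load_use_hazard(e_icode, e_dstM, d_srcA, d_srcB):
    """True when the instruction in Execute loads a register that the
    instruction in Decode wants to read."""
    return e_icode in LOAD_INSTRUCTIONS and e_dstM in (d_srcA, d_srcB)

RAX, RCX = 0, 1  # register ids used only for this example

# mrmovq 0(%rbx), %rax followed immediately by addq %rax, %rcx
hazard = load_use_hazard("mrmovq", RAX, RAX, RCX)
# irmovq $1, %rax followed by addq %rax, %rcx can be handled by forwarding
no_hazard = load_use_hazard("irmovq", RAX, RAX, RCX)
print(hazard, no_hazard)  # True False
```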
![Page 18: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/18.jpg)
Hazard: ret
• The PC of the instruction following ret is not known until the Memory stage.
• Insert three bubbles.
![Page 19: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/19.jpg)
Hazard: Branch Misprediction
• The correct branch direction is known only after the Execute stage.
• Insert two bubbles.
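The bubble counts for the three hazards (one for load/use, three for ret, two for a mispredicted branch) translate into a cycles-per-instruction estimate. The instruction and condition frequencies below are illustrative assumptions, not measurements from these slides.

```python
def cpi(hazards):
    """hazards: (instruction frequency, condition frequency, bubbles) triples."""
    penalty = sum(freq * cond * bubbles for freq, cond, bubbles in hazards)
    return 1.0 + penalty  # 1.0 is the ideal one instruction per cycle

assumed_mix = [
    (0.25, 0.20, 1),  # loads; 20% of them cause a load/use hazard; 1 bubble
    (0.20, 0.40, 2),  # conditional jumps; 40% mispredicted; 2 bubbles
    (0.02, 1.00, 3),  # ret; always costs 3 bubbles
]
estimate = cpi(assumed_mix)
print(round(estimate, 2))  # 1.27
```

With this assumed mix, mispredicted branches contribute the largest share of the penalty (0.16 of the 0.27 extra cycles).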
![Page 20: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/20.jpg)
Hazard Combination
![Page 21: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/21.jpg)
Hazard Detection & Control
![Page 22: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/22.jpg)
Implementing Pipeline Control
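A minimal sketch of how the stall and bubble signals could be derived from the three hazard conditions, including the combined cases. The signal names follow the Y86-64 PIPE convention and are assumptions; the slide's own control logic is not reproduced in this transcript.

```python
def pipeline_control(load_use, mispredicted_branch, ret_in_pipeline):
    """Return the stall/bubble signals for the F, D and E pipeline registers."""
    return {
        # Hold the PC while waiting for a load or for ret to resolve.
        "F_stall": load_use or ret_in_pipeline,
        # Keep the dependent instruction in Decode during a load/use hazard.
        "D_stall": load_use,
        # Cancel the fetched instruction, unless Decode is being stalled.
        "D_bubble": mispredicted_branch or (ret_in_pipeline and not load_use),
        # Cancel or delay the instruction entering Execute.
        "E_bubble": mispredicted_branch or load_use,
    }

# Combination case: load/use hazard while a ret is in Decode.
combo = pipeline_control(load_use=True, mispredicted_branch=False,
                         ret_in_pipeline=True)
print(combo)
```

The `not load_use` term matters for the combination case: a register cannot be stalled and bubbled in the same cycle, so the stall takes priority in Decode.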
![Page 23: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/23.jpg)
Memory Hierarchy
![Page 24: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/24.jpg)
Example
![Page 25: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/25.jpg)
• RAM: SRAM & DRAM
• Disk
• Bus structure
Storage Technology
![Page 26: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/26.jpg)
• Random Access Memory (RAM)
  • Volatile and more expensive than hard disk
• SRAM versus DRAM
  • SRAM
    • doesn't need refresh
    • faster and more stable, more expensive
    • used as cache memories
  • DRAM
    • higher density, lower power consumption
    • used as main memory
RAM
![Page 27: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/27.jpg)
• Row Access Strobe (RAS)
• Column Access Strobe (CAS)
• Memory module: Read & Write a word
• FPM DRAM, SDRAM, DDR SDRAM
DRAM: Access
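The two-step RAS/CAS access can be modelled with a toy chip. This is a sketch under the assumption that the chip is a 2-D array of supercells with an internal row buffer; the class and method names are hypothetical.

```python
class DramChip:
    """Toy model of a DRAM chip organised as rows x cols supercells."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        self.row_buffer = None

    def ras(self, row):
        # Row Access Strobe: copy the whole row into the internal row buffer.
        self.row_buffer = list(self.cells[row])

    def cas(self, col):
        # Column Access Strobe: pick one supercell out of the buffered row.
        return self.row_buffer[col]

chip = DramChip(rows=4, cols=4)
chip.cells[2][1] = 0xAB

chip.ras(2)          # first send the row address
value = chip.cas(1)  # then send the column address
print(hex(value))    # 0xab
```

Sending the address in two halves lets the row and column share the same address pins, at the cost of a longer access time.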
![Page 28: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/28.jpg)
• Nonvolatile, unlike RAM
• PROM: can be programmed only once
• EPROM
• EEPROM -> flash memory
• firmware: stored in ROM
ROM
![Page 29: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/29.jpg)
• Bus transaction: read and write
• System bus: connecting CPU and I/O bridge
• Memory bus: connecting I/O bridge and main memory
• I/O bus: connecting the I/O bridge to disks, graphics cards and other devices
BUS
![Page 30: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/30.jpg)
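• Capacity:
$$\text{Capacity} = \frac{\#\,\text{bytes}}{\text{sector}} \times \frac{\#\,\text{avg. sectors}}{\text{track}} \times \frac{\#\,\text{tracks}}{\text{surface}} \times \frac{\#\,\text{surfaces}}{\text{platter}} \times \frac{\#\,\text{platters}}{\text{disk}}$$
• Access time:
  • avg seek time + avg rotation time + avg transfer time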
DISK
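Both formulas can be worked through numerically. The disk geometry and timing figures below are illustrative assumptions, and the function names are hypothetical.

```python
def disk_capacity_bytes(bytes_per_sector, avg_sectors_per_track,
                        tracks_per_surface, surfaces_per_platter,
                        platters_per_disk):
    # Straight product of the five factors in the capacity formula.
    return (bytes_per_sector * avg_sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters_per_disk)

def avg_access_time_ms(rpm, avg_seek_ms, avg_sectors_per_track):
    ms_per_rotation = 60_000 / rpm
    avg_rotation_ms = ms_per_rotation / 2                     # wait half a turn on average
    avg_transfer_ms = ms_per_rotation / avg_sectors_per_track  # time to read one sector
    return avg_seek_ms + avg_rotation_ms + avg_transfer_ms

capacity = disk_capacity_bytes(512, 300, 20_000, 2, 5)
access_ms = avg_access_time_ms(rpm=7200, avg_seek_ms=9, avg_sectors_per_track=400)

print(capacity / 10**9)     # 30.72 (GB, with G = 10^9 for disks)
print(round(access_ms, 2))  # 13.19
```

In this example the seek and rotation terms dominate; the transfer of a single sector takes only about 0.02 ms.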
![Page 31: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/31.jpg)
• K(kilo), M(mega), G(giga), T(tera): context dependent
• DRAM & SRAM: $K = 2^{10}$, $M = 2^{20}$, $G = 2^{30}$, $T = 2^{40}$
• Disk & network: $K = 10^{3}$, $M = 10^{6}$, $G = 10^{9}$, $T = 10^{12}$
Unit Conversion
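A short sketch of why the context matters: the same byte count reads differently under the two conventions. The "1 T" disk size is an arbitrary example.

```python
BINARY = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}     # DRAM & SRAM
DECIMAL = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}   # disk & network

disk_bytes = 1 * DECIMAL["T"]            # a disk sold as "1 T" in decimal units
in_binary_t = disk_bytes / BINARY["T"]   # the same bytes counted in units of 2^40

print(round(in_binary_t, 4))  # 0.9095
```

The gap grows with the prefix: about 2.4% at K but about 9% at T.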
![Page 32: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/32.jpg)
• Solid State Disk (SSD)
• Sequential access faster than random access
• Writes slower than reads
• Modifying a page requires erasing its entire block and copying the block's other live pages to a new block
SSD
![Page 33: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/33.jpg)
Development Trends
![Page 34: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/34.jpg)
• Temporal locality
• Spatial locality
• Data access: temporal or spatial locality
  • The smaller the stride, the better the spatial locality.
  • Repeated references to the same variable give temporal locality.
• Instruction fetch: both kinds of locality
  • The smaller the loop body and the greater the number of iterations, the
better the locality.
Locality
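The effect of stride on spatial locality can be shown with a toy direct-mapped cache. The cache geometry (64-byte blocks, 64 sets), the 8-byte element size, and the function name are assumptions made for this sketch.

```python
def miss_rate(addresses, block_size=64, num_sets=64):
    """Simulate a direct-mapped cache and return the fraction of misses."""
    tags = [None] * num_sets
    misses = 0
    for addr in addresses:
        block = addr // block_size   # which memory block the byte lives in
        index = block % num_sets     # which cache set that block maps to
        if tags[index] != block:     # miss: load the block into the set
            tags[index] = block
            misses += 1
    return misses / len(addresses)

ELEM = 8   # bytes per array element (assumed)
N = 4096   # number of accesses

stride1 = [i * ELEM for i in range(N)]      # step length 1: neighbours share a block
stride8 = [i * 8 * ELEM for i in range(N)]  # step length 8: every access hits a new block
same_var = [0] * N                          # repeated reference to one variable

print(miss_rate(stride1))   # 0.125
print(miss_rate(stride8))   # 1.0
print(miss_rate(same_var))  # one miss out of 4096 accesses
```

With stride 1, one miss brings in eight elements, so seven of every eight accesses hit; with stride 8, each access lands in a different block and nothing is reused.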
![Page 35: Processor Arch: Pipelined The Memory Hierarchy](https://reader034.vdocuments.mx/reader034/viewer/2022051412/627dcf96a4eaa273db47c10f/html5/thumbnails/35.jpg)
Thanks for listening.