Pipelining Cache
By Riman Mandal
Contents
▪ What is Pipelining?
▪ Cache optimization
▪ Why Pipelining Cache?
▪ Cache Hit and Cache Access
▪ How can we implement pipelining in a cache
▪ Effects of cache pipelining
▪ References
What is Pipelining?
Un-pipelined: start and finish a job before moving to the next job.

[Figure: car-assembly analogy; each job takes 24 hrs from start to finish. Throughput: 1 car / 24 hrs. Parallelism: 1.]
What is Pipelining? (cont.)
Pipelined: break the job into small stages.

[Figure: the job is split into three 8-hr stages (Engine, Body, Paint); cars 1 through 4 overlap across the stages. Throughput: 1 car / 8 hrs. Parallelism: 3. Speedup: 3x.]
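The throughput arithmetic behind the analogy above can be sketched in Python (the function name and numbers are illustrative, not from the slides):

```python
# Ideal pipeline: once full, one job completes per stage time.
def throughput_per_hr(job_time_hrs, n_stages):
    stage_time = job_time_hrs / n_stages  # e.g. 24 hrs / 3 stages = 8 hrs
    return 1.0 / stage_time               # jobs completed per hour

unpipelined = throughput_per_hr(24, 1)    # 1 car per 24 hrs
pipelined = throughput_per_hr(24, 3)      # 1 car per 8 hrs
print(pipelined)  # 0.125 cars per hour, i.e. 1 car per 8 hrs
# speedup = pipelined / unpipelined, ideally equal to the stage count (3)
```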
What is Pipelining? (cont.)
Un-pipelined: start and finish an instruction's execution before moving to the next instruction.

[Figure: each instruction takes 3 ns, passing through FET, DEC, and EXE within a single long cycle (Cyc 1, Cyc 2, Cyc 3).]
What is Pipelining? (cont.)
Pipelined: break the instruction execution into small stages.

[Figure: FET, DEC, and EXC stages of 1 ns each; instructions IR1 through IR4 overlap, so one instruction completes every cycle.]

Un-pipelined clock speed = 1 / 3 ns ≈ 333 MHz
Pipelined clock speed = 1 / 1 ns = 1 GHz
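The clock-speed figures above follow directly from the stage delay; a quick sketch using the values from the slide:

```python
# Clock period is set by the slowest stage; frequency is its reciprocal.
def clock_speed_hz(stage_delay_ns):
    return 1.0 / (stage_delay_ns * 1e-9)

print(f"{clock_speed_hz(3.0) / 1e6:.0f} MHz")  # un-pipelined: 333 MHz
print(f"{clock_speed_hz(1.0) / 1e9:.0f} GHz")  # pipelined: 1 GHz
```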
Cache optimization
▪ Average memory access time (AMAT) = Hit time + Miss rate × Miss penalty
▪ 5 metrics: hit time, miss rate, miss penalty, bandwidth, power consumption
▪ Optimizing cache access time:
– Reducing the hit time (first-level cache, way prediction)
– Increasing cache bandwidth (pipelined cache, non-blocking cache, multibanked cache)
– Reducing the miss penalty (critical word first, merging write buffers)
– Reducing the miss rate (compiler optimizations)
– Reducing the miss penalty or miss rate via parallelism (prefetching)
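The AMAT formula above can be checked with a small helper; the sample numbers are assumed for illustration, not taken from the slides:

```python
# Average memory access time: AMAT = hit time + miss rate * miss penalty.
def amat_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assumed example: 1 ns hit, 5% miss rate, 20 ns penalty to the next level.
print(amat_ns(1.0, 0.05, 20.0))  # 2.0 ns on average
```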
Why Pipelining Cache?
▪ Used primarily for the L1 cache.
▪ The cache takes multiple cycles to access:
– An access arrives in cycle N (hit).
– Another access arrives in cycle N+1 (hit) and has to wait.

Hit time = actual hit time + wait time
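The wait-time effect can be illustrated with a toy model of an un-pipelined cache that stays busy for several cycles per access (the 3-cycle access time is an assumption for illustration):

```python
# Toy model: an un-pipelined cache is busy for ACCESS_CYCLES per hit,
# so back-to-back requests queue up and their effective hit time grows.
ACCESS_CYCLES = 3

def effective_hit_times(arrival_cycles):
    free_at = 0          # first cycle the cache is available again
    times = []
    for t in arrival_cycles:
        start = max(t, free_at)       # wait if the cache is still busy
        free_at = start + ACCESS_CYCLES
        times.append(free_at - t)     # actual hit time + wait time
    return times

print(effective_hit_times([0, 1]))  # [3, 5]: the second access waits 2 cycles
```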
Cache Hit and Cache Access
[Figure: set-associative cache lookup. The address is split into Tag, Set, and Offset fields; the set bits index into the cache (Set 0, Set 1, Set 2). Each way's stored tag is compared against the address tag, qualified by the valid bit, to produce the hit signals and the location of the data.]
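The address split shown above (Tag / Set / Offset) amounts to bit slicing; the field widths here are assumed for illustration, not from the slides:

```python
# Split an address into tag / set index / block offset.
OFFSET_BITS = 6   # assumed 64-byte blocks
INDEX_BITS = 7    # assumed 128 sets

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x12345))  # (9, 13, 5)
```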
Designing a 3-Stage Pipeline Cache

▪ Read the tag and valid bit.
▪ Combine the results to determine an actual hit, and start the data read.
▪ Finish the data read and transfer the data to the CPU.

Retrieve tag and valid bit → Is hit? Start data read → Serve CPU request
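A minimal sketch of the three stages above moving one access forward per cycle (the structure is assumed; real hardware would also carry tag-compare results between stages):

```python
# Three pipeline registers carry accesses through the stages:
#   stage1: read tag + valid bit; stage2: hit known, data read started;
#   stage3: data transferred to the CPU.
def simulate_3stage(requests):
    stage1 = stage2 = stage3 = None
    pending = list(requests)
    done = []
    while pending or stage1 or stage2 or stage3:
        if stage3 is not None:
            done.append(stage3)   # stage 3 completes: serve the CPU
        stage3 = stage2           # advance each access one stage
        stage2 = stage1
        stage1 = pending.pop(0) if pending else None
    return done

print(simulate_3stage(["A", "B", "C"]))  # ['A', 'B', 'C']
```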
Stage 1: Read tag and valid bit
Stage 2: If hit, start reading
Stage 3: Supply data to CPU
Designing a 2-Stage Pipeline Cache

▪ Check the tag and valid bit, combine them to determine an actual hit, and locate the data.
▪ Read the data and serve the CPU request.

Retrieve tag and valid bit → Is hit? → Serve CPU request
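With S single-cycle stages, N back-to-back hits take S + (N - 1) cycles, so a deeper pipeline raises each access's latency while throughput stays at one access per cycle. A quick comparison of the 2- and 3-stage designs (formula assumed from standard pipeline accounting):

```python
# Total cycles for n back-to-back hits in an s-stage pipeline,
# one cycle per stage: the first hit takes s cycles, then one per cycle.
def total_cycles(n_accesses, n_stages):
    return n_stages + (n_accesses - 1)

print(total_cycles(100, 2))  # 101 cycles with the 2-stage design
print(total_cycles(100, 3))  # 102 cycles with the 3-stage design
```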
Example
▪ Instruction-cache pipeline stages:
– Pentium: 1 stage
– Pentium Pro through Pentium III: 2 stages
– Pentium 4: 4 stages
Pipeline Cache Efficiency
▪ Increases cache bandwidth.
▪ Increasing the number of pipeline stages leads to:
– a greater penalty on mispredicted branches
– more clock cycles between issuing the load and using the data

Technique          Hit time   Bandwidth   Miss penalty   Miss rate   Power consumption
Pipelining cache      −           +
References
▪ https://www.udacity.com/course/high-performance-computer-architecture--ud007
▪ https://www.youtube.com/watch?v=r9AxfQB_qlc
▪ Hennessy & Patterson, "Computer Architecture: A Quantitative Approach", Fifth Edition