lab 3 – cache & memoryweigao/ece1175/spring2021/lab3... · 2020. 9. 28. · ece 1175 – lab...
ECE 1175 Embedded Systems Design
Lab 3 – Cache & Memory
ECE 1175 – Lab 3
Monitor Cache Misses
• Cache basics
• Performance analysis tool – perf
• Lab task 1
Direct GPIO Access
• Virtual/physical memory basics
• Raspberry Pi GPIO
• Lab task 2
You need to use C/C++ to complete your lab work.
Cache Basics
Cache: Fast but small memory close to the processor
Caches on Raspberry Pi 3 Processor BCM2837
[Figure: cache hierarchy of the BCM2837. ARM Cortex-A53: L1 cache per core (data cache and instruction cache), L2 cache, main memory (used by CPU). VideoCore IV: instruction cache, uniform cache, and texture memory unit (labelled per slice), L2 cache, main memory (used by GPU).]
Cache Basics
How does cache work?
[Figure: CPU, cache, and memory (addresses 0, 1, 2, ..., 7, ...). The CPU looks for data at address 0 in the cache; on a miss, the entire block is loaded from memory into the cache.]
If the data is not in the cache, the entire block is loaded into the cache.
The block size depends on the specific cache design.
Cache Basics
Impact on C programming
In C, multidimensional arrays are stored in row-major order in the memory. The way you access entries affects cache misses.
Address   Row-major   Column-major
0         a11         a11
1         a12         a21
2         a13         a31
3         a21         a12
4         a22         a22
5         a23         a32
6         a31         a13
…         …           …
Cache Basics
Impact on C programming
Traverse a 2D array in row major order
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        // Access a[i][j]
    }
}
Sequential access (stride = 1): only a few compulsory cache misses.

Traverse a 2D array in column major order
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        // Access a[j][i]
    }
}
Not sequential access (stride = N): close to 100% cache misses!
Let’s assume N is very large.
Performance Analysis Tool – perf
perf
Provides access to performance counters
• Hardware: CPU cycles, bus cycles, cache misses, etc.
• Software: task clock, page faults, alignment faults, etc.
• Use perf list to see available events
Offers a rich set of commands
• Support for multiple events
• Repeated measurement
• Processor-wide mode
• Use perf help <command> to check info on a specific command
Performance Analysis Tool – perf
Use perf on Raspberry Pi OS
• To install: sudo apt-get install linux-perf
• Bypass version check on Raspbian
1. Check your installed perf version
2. Open /usr/bin/perf (use vim, nano, etc.)
3. Change exec "perf_$version" "$@" to exec "perf_4.9" "$@" (replace 4.9 with your installed version)
Performance Analysis Tool – perf
An example of analyzing your program via perf
perf stat -e event1,event2,event3 [...] ./your_program
• Add the events you want to measure after -e.
• perf returns the measurements for those events.
For more details: https://perf.wiki.kernel.org/index.php/Tutorial
Lab Task 1
Analyze your matrix multiplication program via perf
$\mathbf{C} = \mathbf{A}\mathbf{B}$ is defined by $c_{ij} = \sum_{k=0}^{N-1} a_{ik} b_{kj}$.
Very high cache miss rate! With k in the innermost loop, the entries of B are not sequentially accessed in memory.
To reduce cache misses, you can try interchanging your loops. Use perf to measure L1 data cache misses.
Virtual/Physical Memory Basics
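In modern operating systems, physical memory is hidden from user programs: a program in user space sees only virtual addresses, which the OS maps to physical memory.
[Figure: virtual memory addresses (0, 1, 2, ..., 7, ...) in user space mapped to physical memory addresses (0, 1, 2, ..., 7, ...).]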
Device file – /dev/mem
/dev/mem
• An image of the main memory of the computer
• Byte addresses in /dev/mem are interpreted as physical memory addresses (actual RAM addresses, registers).
For more details: https://man7.org/linux/man-pages/man4/mem.4.html
Create mapping – mmap()
mmap() with /dev/mem
• You can create a mapping from virtual to physical memory.
[Figure: an address range of virtual memory in user space mapped onto physical memory.]
For more details: https://man7.org/linux/man-pages/man2/mmap.2.html
Lab Task 2
Direct GPIO manipulation on Raspberry Pi OS
• Find the physical address of the GPIO registers in the manual.
• Use mmap() and /dev/mem to create a mapping.
• Control the GPIO registers from user space.
[Figure: virtual memory in user space mapped to the GPIO control registers in physical memory, which set the Pi GPIO pins high/low and get pin status.]
Raspberry Pi GPIO
GPIO pinouts of Raspberry Pi 3
https://pinout.xyz/
For your test, please select pins that have no other dedicated function.
Lab Task 2
For check-off
• You can program your GPIO pins to generate voltage patterns (e.g. a square wave) and verify your configuration using a multimeter or an oscilloscope.
What’s the benefit of direct control from the OS?
• Easier than low-level assembly code on bare metal.
• The OS provides more functionality for developing interesting applications.
• Many APIs are available, but you can customize your own for specific purposes.
You can refer to the example here: https://elinux.org/RPi_GPIO_Code_Samples.
Thank you!