MS Thesis Defense
“Improving Performance, Power, and Security of Multicore Systems using Cache Organization”
By
Tania Jareen
CoE EECS Department
April 21, 2014
Jareen 2
About Me
Tania Jareen
MS in Electrical Engineering with Thesis
GTA for Routing and Switching–II
Publications:
“An Effective Locking-Free Caching Technique for Power-Aware Multicore Computing Systems,” accepted in the IEEE ICIEV-2014 conference.
“A Novel Level-1 Cache Mapping Approach to Improve System Security without Compromising Performance to Power Ratio,” in preparation.
Committee Members
Dr. Abu Asaduzzaman, EECS Dept.
Dr. Ramazan Asmatulu, ME Dept.
Dr. Zheng Chen, EECS Dept.
“Improving Performance, Power, and Security of Multicore Systems using Cache Organization”
Outline ►
Introduction
Problem Statement
Some Important Terms
Previous Work
Proposal
Simulation
Simulation Results
Conclusions
Future Work
Q U E S T I O N S ? Any time, please.
Introduction
Multicore System
A multicore system is a collection of parallel or concurrent processing units; it divides a large, complex problem into many small tasks
Main goal: to solve a complex problem faster
Dual-core System
Problem Statement
Challenges for Multicore System
High Average Memory Latency
High Total Power Consumption
Cache Side Channel Security Attack
Contributions
Propose a multicore system design to reduce the average memory latency
Propose a multicore system design to reduce the total power consumption
Propose a multicore system design to provide hardware level security
Some Important Terms
■ Cache
A small buffer that stores recently used information
Helps mitigate the speed gap between the processor and main memory
Significantly increases the overall performance of the system
Logically, the cache is placed between the CPU and main memory
Cache and Main Memory (Computer Desktop Encyclopedia)
Some Important Terms
■ Cache Organization
Cache Hit – the requested data is present in the cache
Cache Miss – the requested data is not present in the cache
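The hit/miss distinction can be illustrated with a minimal direct-mapped cache sketch (a hypothetical class for illustration, not from the thesis): each memory block maps to exactly one line, and a lookup hits only if that line currently holds the requested block.

```python
class DirectMappedCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = [None] * num_lines  # stored block address per line

    def access(self, block_addr):
        """Return 'hit' or 'miss'; on a miss, fill the line."""
        index = block_addr % self.num_lines
        if self.lines[index] == block_addr:
            return "hit"
        self.lines[index] = block_addr  # fetch the block into the cache
        return "miss"

cache = DirectMappedCache(num_lines=4)
print(cache.access(10))  # miss (cold cache)
print(cache.access(10))  # hit  (same block)
print(cache.access(14))  # miss (14 % 4 == 2, evicts block 10)
```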
Cache Organization
Some Important Terms
■ Cache Replacement Policy
Because cache memory size is limited, some blocks must be replaced to make room for new blocks
Replacement should be done so that the miss ratio stays low
Some well-known cache replacement policies: Least Recently Used (LRU), Random, Most Recently Used (MRU), First In First Out (FIFO), etc.
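LRU, the policy used later in the simulation, can be sketched as follows (a minimal illustration; the class and method names are hypothetical): when the cache is full, the block untouched for the longest time is evicted.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_addr -> data, oldest first

    def access(self, block_addr):
        if block_addr in self.blocks:
            self.blocks.move_to_end(block_addr)  # mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)      # evict least recently used
        self.blocks[block_addr] = object()
        return "miss"

cache = LRUCache(capacity=2)
print(cache.access("A"))  # miss
print(cache.access("B"))  # miss
print(cache.access("A"))  # hit
print(cache.access("C"))  # miss - evicts "B", the least recently used
print(cache.access("B"))  # miss
```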
Cache Replacement Policy (Aaron Toponce)
Some Important Terms
■ Memory Update Policy
A combination of the read policy and the write policy
Read Policy – indicates how a word is read
Write Policy – indicates how a write to a memory block is handled. Examples: Write-Through, Write-Back
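The difference between the two write policies can be sketched with a small counting helper (hypothetical, for illustration only): write-through sends every store to main memory, while write-back only marks cached blocks dirty and writes each dirty block once, at eviction.

```python
def count_memory_writes(writes, policy):
    """writes: list of block addresses written; returns number of main-memory writes."""
    if policy == "write-through":
        return len(writes)        # every store goes straight to memory
    dirty = set(writes)           # write-back: each written block becomes dirty once
    return len(dirty)             # ...and is written to memory at eviction time

print(count_memory_writes([7, 7, 7, 9], "write-through"))  # 4
print(count_memory_writes([7, 7, 7, 9], "write-back"))     # 2
```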
Some Important Terms
■ Cache Locking
Locks the most frequently used data for future accesses
Locked blocks are not evicted during replacement
Increases the hit ratio and performance
Reduces average memory access time and power consumption
Problems: hard to predict which blocks to lock; not all processor configurations support it; reduces the effective cache size
Locked Cache
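The eviction rule behind cache locking can be sketched as follows (an illustrative helper, not the thesis implementation; it assumes an LRU ordering of the set is already available): locked blocks are simply skipped when picking a victim.

```python
def choose_victim(lru_order, locked):
    """lru_order: block addresses, least recently used first.
    Return the first evictable (unlocked) block, or None if all are locked."""
    for block in lru_order:
        if block not in locked:
            return block
    return None

print(choose_victim([5, 9, 2], locked={5}))  # 9 (block 5 is locked, so it survives)
```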
Some Important Terms
■ Victim Cache
One of the oldest and most popular techniques to improve performance
Placed between CL1 and CL2
Holds the victim blocks evicted during cache replacement
Reduces average memory latency and total power consumption
Victim cache Organization
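The victim-cache behavior can be sketched as a tiny fully associative buffer (a minimal illustration with hypothetical names): blocks evicted from CL1 are parked here, and a CL1 miss that hits the victim cache avoids the longer trip to CL2.

```python
from collections import deque

class VictimCache:
    def __init__(self, capacity):
        self.blocks = deque(maxlen=capacity)  # oldest victim dropped when full

    def park(self, victim_block):
        """Called when CL1 evicts a block."""
        self.blocks.append(victim_block)

    def lookup(self, block_addr):
        """On a CL1 miss: hit here means the block returns to CL1 cheaply."""
        if block_addr in self.blocks:
            self.blocks.remove(block_addr)    # the block moves back into CL1
            return True
        return False

vc = VictimCache(capacity=2)
vc.park(10)           # block 10 evicted from CL1
print(vc.lookup(10))  # True  - served from the victim cache, not CL2
print(vc.lookup(10))  # False - it already moved back to CL1
```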
Some Important Terms
■ Stream Buffering
On a cache miss, the required blocks along with some additional blocks are brought from main memory to CL2 and then copied to CL1
The additional blocks are kept in the stream buffer
Helps reduce average memory latency and total power consumption
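The fetch pattern above can be sketched as follows (illustrative only; the function name and the prefetch depth of 3 are assumptions): on a miss, the next sequential blocks are fetched alongside the required one and parked in the stream buffer.

```python
def fetch_with_stream_buffer(miss_block, depth=3):
    """Return (block sent to CL1, extra blocks parked in the stream buffer)."""
    extra = [miss_block + i for i in range(1, depth + 1)]  # next sequential blocks
    return miss_block, extra

to_cl1, stream_buffer = fetch_with_stream_buffer(40)
print(to_cl1)         # 40
print(stream_buffer)  # [41, 42, 43] - later sequential misses hit the buffer
```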
Some Important Terms
■ Cache Side Channel Attack
A hardware attack, mainly on the cache
Extracts sensitive information from the cache by passive monitoring
Exploits physical properties (examples: timing variation, power consumption, acoustic emissions, heat production) [1,2,3,4]
A silent attack, but among the most dangerous
Some Important Terms
■ Asymmetric Encryption
Step 1: The receiver generates a private/public key pair and shares the public key with the sender.
Step 2: The sender encrypts the information using the public key.
Step 3: The sender sends the encrypted information to the receiver.
Step 4: The receiver decrypts the information using its own private key.
Asymmetric Encryption
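The four steps can be traced with a toy RSA example (textbook-sized numbers, NOT secure; the primes, exponents, and message are illustrative assumptions, not from the thesis):

```python
# Step 1: the receiver builds the key pair.
p, q = 61, 53                  # secret primes
n = p * q                      # 3233, part of the public key (e, n)
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # 2753, private exponent (modular inverse, Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # Steps 2-3: sender encrypts with the public key
plaintext = pow(ciphertext, d, n)  # Step 4: receiver decrypts with the private key
print(ciphertext)  # 2790
print(plaintext)   # 65
```

Only the receiver knows `d`, so anyone may encrypt but only the receiver can decrypt.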
Previous Work
■ To Improve Average Memory Latency and Total Power Consumption:
Victim cache between CL1 and main memory, plus stream buffering [6]
Problem – no guarantee that the victim blocks are the ones with the highest miss counts
Selective Victim Caching [7]
Problem – may pollute the cache; requires prediction
Previous Work
Selective Pre-Fetching [8]
Problem – requires a history of references
Cache Locking [9]
Problem – hard to predict the blocks with high miss counts; not all processor configurations support it
■ To Improve Cache-Level Security:
Partitioned Cache [1]
Problem – cache underutilization; depends on software support
Dynamic Memory-to-Cache Remapping [5]
Proposed Mechanism
■ Smart Victim Cache (SVC)
MCB = Miss Cache Block
VCB = Victim Cache Block
SBB = Stream Buffering Block
BACMI = Block Address and Cache Miss Information
SLLC = Shared Last Level Cache
Proposed Cache Organization with SVC
Work Flow Diagram
Proposed Mechanism
Block size = 128 Bytes; main memory = 4 GB
SVC Size (KB) | Num. of Blocks | SVC1: MCB (blocks) | SVC2: VCB+SBB (blocks) | Max. Num. of BACMIs (MCB*16)
2             | 16             | 8                  | 5 + 3                  | 128
2             | 16             | 5                  | 8 + 3                  | 80
4             | 32             | 8                  | 21 + 3                 | 128
4             | 32             | 16                 | 13 + 3                 | 256
8             | 64             | 8                  | 53 + 3                 | 128
8             | 64             | 48                 | 13 + 3                 | 768
16            | 128            | 8                  | 117 + 3                | 128
16            | 128            | 112                | 13 + 3                 | 1792
32            | 256            | 8                  | 245 + 3                | 128
32            | 256            | 240                | 13 + 3                 | 3840
Maximum Number of BACMI entries for a given SVC with various MCB
MCB = Miss Cache Block
VCB = Victim Cache Block
SBB = Stream Buffering Block
BACMI = Block Address and Cache Miss Information
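The table's arithmetic can be sketched as follows (assuming, per the slide, 128-byte blocks, 3 SBB blocks, and 16 BACMI entries per MCB block; the function name is illustrative): the SVC's blocks split into MCB and VCB+SBB, and BACMI capacity is MCB times 16.

```python
BLOCK_SIZE = 128          # bytes per block (from the slide)
ENTRIES_PER_MCB = 16      # BACMI entries per MCB block (from the slide)

def svc_row(svc_kb, mcb_blocks, sbb_blocks=3):
    """Return (total blocks, VCB blocks, max BACMI entries) for one table row."""
    total_blocks = svc_kb * 1024 // BLOCK_SIZE
    vcb_blocks = total_blocks - mcb_blocks - sbb_blocks
    max_bacmi = mcb_blocks * ENTRIES_PER_MCB
    return total_blocks, vcb_blocks, max_bacmi

print(svc_row(2, mcb_blocks=8))     # (16, 5, 128)    - matches the 2 KB row
print(svc_row(32, mcb_blocks=240))  # (256, 13, 3840) - matches the 32 KB row
```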
Simulation
■ Assumptions
SVC can be enabled and disabled
All cores equally share SVC
LRU replacement policy is used
Write-Back update policy is used
Simulation
■ Workload
Moving Picture Experts Group-4 (MPEG-4)
Advanced Video Coding (H.264/AVC)
Matrix Inversion (MI)
Fast Fourier Transform (FFT)
H.264/AVC behaves similarly to MPEG-4; MI behaves similarly to FFT
Simulation
■ Input Parameters
Number of cores = 4
SVC size = 2, 4, 8, 16, 32 KB
I1/D1 size of CL1 = 8/8, 16/16, 32/32, 64/64, 128/128 KB
CL2 size = 256, 512, 1024, 2048, 4096 KB
Line size = 16, 32, 64, 128, 256 B
Associativity level = 1-, 2-, 4-, 8-, 16-way
Simulation
■ Assumption for Delay Penalty
Number of cycles for any load or store operation = 100
Number of cycles for any branch operation = 150
Satisfy Any Instruction at | Number of Cycles
ALU         | 1
Private CL1 | 3
Shared CL2  | 10
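A hedged sketch of how such cycle counts combine into an average memory latency (a standard AMAT-style estimate, not the thesis simulator; it assumes the 100-cycle load/store figure is the main-memory penalty, and the miss rates are made-up):

```python
def avg_memory_latency(l1_cycles, l2_cycles, mem_cycles, l1_miss, l2_miss):
    """Hit latencies in cycles; miss rates are fractions of accesses reaching each level."""
    return l1_cycles + l1_miss * (l2_cycles + l2_miss * mem_cycles)

# CL1 = 3 cycles, CL2 = 10 cycles (from the table); memory penalty assumed 100 cycles
print(avg_memory_latency(3, 10, 100, l1_miss=0.25, l2_miss=0.5))  # 18.0
```

Anything that lowers the miss rates (SVC, stream buffering) lowers this latency directly.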
Simulation
■ Assumption for Power Consumption
Component        | Power Consumption (mWatts/Operation)
CPU              | 3.6
I1               | 2.7
Other Components | 2.1
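A hedged sketch of a total-power estimate from these per-operation costs (illustrative bookkeeping, not the thesis power model; the operation counts are made-up):

```python
POWER_MW_PER_OP = {"CPU": 3.6, "I1": 2.7, "Other Components": 2.1}

def total_power_mw(op_counts):
    """op_counts: number of operations performed by each component."""
    return sum(POWER_MW_PER_OP[name] * n for name, n in op_counts.items())

total = total_power_mw({"CPU": 100, "I1": 80, "Other Components": 50})
print(round(total, 1))  # 681.0
```

Fewer cache misses means fewer operations overall, which is how SVC reduces total power.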
Simulation Results
■ Impact of SVC Size
Simulation Results
■ Impact of SVC and CL1 Size
For MPEG-4, both latency and total power consumption decrease as the cache size increases. Latency and power decrease the most with SVC and no locking.
Impact of SVC and CL1 Size on Memory Latency and Total Power Consumption
Simulation Results
■ Impact of SVC and Line Size
For MPEG-4, latency and power consumption decrease as the line size increases. Both decrease the most with SVC and no locking.
Impact of SVC and Line Size on Memory Latency and Total Power Consumption
Simulation Results
■ Impact of SVC and Associativity Level
For MPEG-4, latency and power consumption decrease as the associativity level increases. Both decrease the most with SVC and no locking.
Impact of SVC and Associativity Level on Memory Latency and Total Power Consumption
Simulation Results
■ Impact of SVC and CL2/SLLC Size
For MPEG-4, as CL2 size increases, latency becomes stable but power consumption increases. Both latency and power consumption decrease the most when SVC is used with no locking.
Impact of SVC and CL2/SLLC Size on Memory Latency and Total Power Consumption
Simulation Results
■ Comparison of SVC and Cache Line Locking
Average memory latency and total power consumption decrease as CL2 locking increases from 0% to 25%. Both decrease further with SVC and no locking, compared to using locking or using neither SVC nor locking.
Comparison of SVC and Cache Line Locking
Proposed Solution for Security Improvement
■ Randomized Cache Mapping Between D1X and CL1 (Solution-1)
Randomized Cache Mapped Between D1X and CL1
Proposed Solution for Security Improvement
■ Problem with Solution-1
Requires extra hardware to implement D1X
Increases memory latency
Increases total power consumption by about 17%
Proposed Modified Solution for Security Improvement
■ Randomized Cache Mapping Between Main Memory and CL1 (Solution-2)
Randomized Cache Mapped between CL1 and Main Memory
It is expected that the probability of a cache side-channel attack decreases by about 40K to 1 for 16 blocks of CL1
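The randomized mapping idea behind Solution-2 can be sketched as follows (an illustrative assumption of how such a mapping could work, not the thesis design): a secret random permutation replaces the fixed modulo index, so an attacker cannot predict which CL1 set a given address occupies.

```python
import random

def make_random_mapping(num_sets, seed=None):
    """Return a function mapping a block address to a randomized CL1 set."""
    rng = random.Random(seed)
    permutation = list(range(num_sets))
    rng.shuffle(permutation)                  # secret; e.g., regenerated per boot
    return lambda block_addr: permutation[block_addr % num_sets]

map_to_set = make_random_mapping(num_sets=16, seed=42)
# Conventional mapping would put block 5 in set 5; here the set is unpredictable,
# but the mapping is still a bijection, so every set is used exactly once:
print(sorted(map_to_set(b) for b in range(16)) == list(range(16)))  # True
```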
Conclusions
Using several levels of cache in multicore systems causes serious performance and power issues
Caches shared among cores in a multicore system create hardware-level security threats
The proposed SVC significantly increases system performance by reducing memory latency and power consumption
The proposed cache randomization technique between main memory and CL1 significantly reduces the probability of a cache attack
Conclusions
With SVC, average memory latency is reduced by 17% compared to CL2 cache locking
With SVC, total power consumption is reduced by 21% compared to CL2 cache locking
According to our estimates, the probability of a cache side-channel attack decreases by about 40K to 1 for 16 blocks of CL1
Future Work
Explore the impact of SVC on average memory latency and total power consumption for real-time embedded systems and handheld computers
Explore the randomized cache mapping technique between CL1 and main memory on real-time embedded systems and handheld computers
QUESTION
“Improving Performance, Power, and Security of Multicore Systems using Cache Organization”
Thank You
Contact:
Full Name: Tania Jareen
Telephone: (316) 516-8516
E-mail: [email protected]
“Improving Performance, Power, and Security of Multicore Systems using Cache Organization”
References
1. D. Page, “Partitioned Cache Architecture as a Side-Channel Defense Mechanism,” in Cryptology ePrint Archive, Report 2005/280, 2005.
2. O. Aciicmez, “Yet another Micro Architectural Attack: exploiting I-Cache,” in CSAW ’07 Proceedings of the 2007 ACM workshop on Computer security architecture, pp. 11-18, DOI: 10.1145/1314466.1314469, 2007.
3. P.C. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems,” Springer Berlin Heidelberg, pp. 104-113, DOI: 10.1007/3-540-68697-5_9, 1996.
4. P. Kocher, et al., "Differential Power Analysis," in Proceedings of the 19th Annual International Cryptology Conference on Advances in Cryptology, 1999.
References
5. Z. Wang and R.B. Lee, "A novel cache architecture with enhanced performance and security," in Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture, 2008.
6. N.P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” Western Research Laboratory (WRL), Digital Equipment Corporation, URL:https://www.cis.upenn.edu/~cis501/papers/joupp victim.pdf, 1990.
7. D. Stiliadis and A. Varma, “Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches,” in IEEE Transactions on Computers, Vol. 46, No. 5, pp. 603-610, DOI: 10.1109/12.589235, 1997.
References
8. R. Pendse and H. Katta, “Selective Prefetching: Prefetching when only required,” in the 42nd Midwest Symposium on Circuits and Systems, Vol. 2. pp. 866-869, DOI: 10.1109/MWSCAS.1999.867772, 1999.
9. A. Asaduzzaman, F.N. Sibai, and M. Rani, “Improving cache locking performance of modern embedded systems via the addition of a miss table at the L2 cache level,” in the EUROMICRO Journal of Systems Architecture, Vol. 56, Issue 4-6. pp 151-162, 2010.