Cache Coherence Simulation Team #1 Members: Shahab Helmi Anh Nguyen K. Harish Kodali Phuc Nguyen Supervised by: Professor Gita Alaghband


Post on 30-Dec-2015


Cache Coherence Protocols in Multicore Architectures

Feel free to ask questions.

Outline

Snooping
- Inputs (Tests)
- Performance Metrics
- Implementation
- Results & Analysis

Directory
- Review
- Evaluation Plan
- Implementation
- Results & Analysis

Simulators

Implemented 2 simulators for coherence:
- Snooping Simulator (C#)
- Directory Simulator (C++)

Protocols:
- MSI
- MESI
- MOSI

[Figure: multicore processor chip. Each core has a private data (L1) cache and a cache controller; the cores connect through an interconnection network to main memory.]

Parameters

Hardware parameters:
- Number of cores
- Cache and memory latency (cycles)
- Memory and cache size (number of blocks)

Input parameters:
- Input size: number of load/store requests for each core.
- Store percentage: distribution of load vs. store requests. For example, if it is set to 40, 40% of the requests will be stores and 60% of them will be loads.

Larger input size -> higher probability that cores need a block at the same time.
Larger store percentage -> more conflicts -> more stalls.

Inputs

Tests for the number of invalidate messages in MSI and MESI and the number of write-backs in MSI and MOSI:

Name  #cores  C Latency  M Latency  M Blocks  Cache Blocks  Input Size  Store %
L1    8       3          100        10K       1K            1K          0
L2    8       3          100        10K       1K            1K          40
L3    8       3          100        10K       1K            1K          80
M1    8       3          100        10K       1K            10K         0
M2    8       3          100        10K       1K            10K         40
M3    8       3          100        10K       1K            10K         80
H1    8       3          100        10K       1K            100K        0
H2    8       3          100        10K       1K            100K        40
H3    8       3          100        10K       1K            100K        80

Inputs (contd)

Sensitivity of write-backs to the cache size:

Name  #cores  C Latency  M Latency  M Blocks  Cache Blocks  Input Size  Store %
MC1   8       3          100        10K       10            10K         50
MC2   8       3          100        10K       100           10K         50
MC3   8       3          100        10K       1K            10K         50

Sensitivity of write-backs to the # of cores:

Name  #cores  C Latency  M Latency  M Blocks  Cache Blocks  Input Size  Store %
MW1   2       3          100        10K       100           10K         50
MW2   4       3          100        10K       100           10K         50
MW3   8       3          100        10K       100           10K         50

Goals: number of invalidate messages (MESI), number of write-backs (MOSI)

Metrics

Per core:
- Write-backs
- Memory reads
- Invalidate messages
- Coherence messages (broadcast between cores)
- Memory messages (coherence messages sent to the memory)
- Data responses
- Stalls
- Cache hits
- Cache misses
- Replacements (evictions): when the cache is full

Metrics (contd)

Per protocol:
- Write-backs
- Invalidate messages
- Coherence messages (broadcast between cores)
- Memory messages (coherence messages sent to the memory)
- All messages
- Memory references (reads/writes from memory)
- Stalls
- Cache hits
- Cache misses

Input

InputElement.cs

    public string _Command = "Load";
    public int _Core = 2;
    public int _BlockID = 25;

Core #2 needs to load the block that is originally located in the 25th block of memory (copies of it could be held in caches!).

Generator.cs: This class generates the input (input elements) according to the input size and store percentage parameters.

InputTest.cs: Generates an input using the Generator.cs class and outputs the values (command, block ID, core):

    Load 3850 2
    Load 5207 6
    Load 7230 4
    Store 4374 3
    Store 5998 5
    Load 4247 3
    Store 7729 1
    Load 1040 0
    Store 862 2
    Store 4738 4
    Load 2152 7
    Load 6976 1
    Store 8759 6
    Store 8347 3
    Load 7171 0
    ..

Input (contd)

Tests.cs: Includes the predefined configurations that we used to test our simulator.
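The input generation described above can be sketched as follows. This is a minimal C++ illustration, not the team's Generator.cs: the names `InputElement`, `generateInput`, and the `seed` parameter are hypothetical, and the uniform choice of block IDs and the final shuffle are assumptions about how requests are spread over memory and interleaved between cores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// One request, mirroring the three fields shown for InputElement.cs.
struct InputElement {
    std::string command;  // "Load" or "Store"
    int core;             // index of the requesting core
    int blockId;          // memory block the request targets
};

// Hypothetical generator: creates inputSize requests for each core.
// Each request is a store with probability storePercent / 100.
// Block IDs are drawn uniformly from [0, memoryBlocks) (an assumption).
std::vector<InputElement> generateInput(int cores, int memoryBlocks,
                                        int inputSize, int storePercent,
                                        unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> block(0, memoryBlocks - 1);
    std::uniform_int_distribution<int> percent(0, 99);

    std::vector<InputElement> input;
    input.reserve(static_cast<std::size_t>(cores) * inputSize);
    for (int i = 0; i < inputSize; ++i) {
        for (int c = 0; c < cores; ++c) {
            bool isStore = percent(rng) < storePercent;
            input.push_back({isStore ? "Store" : "Load", c, block(rng)});
        }
    }
    // Interleave the cores' requests in random order (an assumption).
    std::shuffle(input.begin(), input.end(), rng);
    return input;
}
```

With a store percentage of 0 every request is a load, and with 100 every request is a store, which matches the meaning of the parameter given in the Parameters slide.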

Cores

Core.cs: We use this class to keep track of our metrics for each core.

CacheBlock.cs: Each cache block has 3 fields. [Figure: field list not recovered.]

Protocols

MSI.cs, MESI.cs, MOSI.cs

Methods:
- Load: loads a block into the cache.
- Evict: if the cache is full, this method chooses a block and evicts it.
- UpdateState: updates the state of a block, for example M -> S.
- OWNE(BlockID = 1, Core = 2): returns true if cache 2 has a copy of the memory block with ID 1 in the E state. OWNS and OWNO are the corresponding checks for the S and O states.
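To make the roles of Load, Evict, UpdateState, and the OWNE/OWNS/OWNO checks concrete, here is a small C++ sketch of a private cache. It is only an illustration under assumptions: the class `Cache`, its method names, and the oldest-first victim choice are hypothetical, since the slides do not say how the C# classes pick a victim or store their blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_map>

enum class State { M, O, E, S, I };

// Hypothetical private cache of one core. Capacity must be at least 1.
class Cache {
public:
    explicit Cache(std::size_t capacity) : capacity_(capacity) {}

    // Load: bring a block into the cache in the given state.
    // Returns true if another block had to be evicted to make room.
    bool load(int blockId, State s) {
        auto it = blocks_.find(blockId);
        if (it != blocks_.end()) {  // already present: just set the state
            it->second = s;
            return false;
        }
        bool evicted = false;
        if (blocks_.size() == capacity_) {
            evict();
            evicted = true;
        }
        blocks_[blockId] = s;
        order_.push_back(blockId);
        return evicted;
    }

    // Evict: choose a victim (the oldest block here, an assumption) and
    // drop it. Returns true if the victim was dirty (M or O), i.e. the
    // eviction implies a write-back.
    bool evict() {
        int victim = order_.front();
        order_.pop_front();
        State s = blocks_[victim];
        blocks_.erase(victim);
        return s == State::M || s == State::O;
    }

    // UpdateState: change the state of a cached block, e.g. M -> S.
    void updateState(int blockId, State s) {
        auto it = blocks_.find(blockId);
        if (it != blocks_.end()) it->second = s;
    }

    // Generalizes OWNE / OWNS / OWNO: true if the block is cached in state s.
    bool owns(int blockId, State s) const {
        auto it = blocks_.find(blockId);
        return it != blocks_.end() && it->second == s;
    }

private:
    std::size_t capacity_;
    std::unordered_map<int, State> blocks_;
    std::deque<int> order_;  // insertion order, used to pick the victim
};
```

In this sketch `cache.owns(1, State::E)` plays the role of OWNE(BlockID = 1, ...) for one core's cache; a simulator would keep one such object per core.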

Statistics

Summary.cs, Protocol Statistics.cs: These classes calculate the number of messages, cache hits and so on for each protocol.

For example: Cache Hits = cache hits of cache 1 + cache hits of cache 2 + ...
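The per-protocol totals are sums of the per-core counters, as in the cache-hits example. A minimal C++ sketch of that aggregation follows; the struct `CoreStats` and the function `summarize` are hypothetical names, and only a few of the listed metrics are included.

```cpp
#include <cassert>
#include <vector>

// A subset of the per-core metrics listed in the Metrics slides.
struct CoreStats {
    long cacheHits = 0;
    long cacheMisses = 0;
    long writeBacks = 0;
    long invalidates = 0;
    long stalls = 0;
};

// Per-protocol totals: each field is the sum over all cores,
// e.g. cache hits = hits of cache 1 + hits of cache 2 + ...
CoreStats summarize(const std::vector<CoreStats>& cores) {
    CoreStats total;
    for (const CoreStats& c : cores) {
        total.cacheHits += c.cacheHits;
        total.cacheMisses += c.cacheMisses;
        total.writeBacks += c.writeBacks;
        total.invalidates += c.invalidates;
        total.stalls += c.stalls;
    }
    return total;
}
```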

Demo!

Requirements:
- .NET Framework: CSCI5593 -> bin -> Debug -> CSCI5593.exe
- Visual Studio 2013

Metrics:
- Number of invalidate messages. Protocols: MSI, MESI
- Number of write-backs. Protocols: MSI, MOSI
- Write-back reduction vs. cache size. Protocols: MSI, MESI, MOSI
- Write-back reduction vs. number of cores. Protocols: MSI, MOSI

Evaluations (1/9)

Tests for the number of invalidate messages in MSI and MESI and the number of write-backs in MSI and MOSI:

Name  #cores  C Latency  M Latency  M Blocks  Cache Blocks  Input Size  Store %
L1    8       3          100        10K       1K            1K          0
L2    8       3          100        10K       1K            1K          40
L3    8       3          100        10K       1K            1K          80
M1    8       3          100        10K       1K            10K         0
M2    8       3          100        10K       1K            10K         40
M3    8       3          100        10K       1K            10K         80
H1    8       3          100        10K       1K            100K        0
H2    8       3          100        10K       1K            100K        40
H3    8       3          100        10K       1K            100K        80

Evaluations (2/9)

[Figure: chart not recovered.]

Evaluations (3/9)

[Figure: chart; data labels not recoverable.]

Situation: A core first reads a block and then subsequently writes it.

Evaluations (4/9)

[Figure: chart; data labels not recoverable.]

Situation: A cache has a block in state M or E and receives a GetS from another core.

Evaluations (5/9)

[Figure: chart; data labels not recoverable.]

Evaluations (6/9)

Sensitivity of write-backs to the cache size:

Name  #cores  C Latency  M Latency  M Blocks  Cache Blocks  Input Size  Store %
MC1   8       3          100        10K       10            10K         50
MC2   8       3          100        10K       100           10K         50
MC3   8       3          100        10K       1K            10K         50

Evaluations (7/9)

[Figure: chart; data labels not recoverable.]

Evaluations (8/9)

Sensitivity of write-backs to the # of cores:

Name  #cores  C Latency  M Latency  M Blocks  Cache Blocks  Input Size  Store %
MW1   2       3          100        10K       100           10K         50
MW2   4       3          100        10K       100           10K         50
MW3   8       3          100        10K       100           10K         50

Evaluations (9/9)

[Figure: chart; data labels 10.73, 10.63, 10.63.]
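The two situations named in the evaluation slides can be traced with a few lines of C++. The sketch below is a simplified model under stated assumptions, not the simulator's code: all names are hypothetical, the read is assumed to happen while no other cache holds the block, and a GetS to a clean E block is not counted as a write-back here (some protocol specifications still send a message to memory in that case).

```cpp
#include <cassert>

enum class Protocol { MSI, MESI, MOSI };
enum class State { M, O, E, S, I };

struct Counters {
    int invalidates = 0;
    int writeBacks = 0;
};

// State installed by a load miss when no other cache holds the block.
// Only MESI has the E state; MSI and MOSI install the block in S.
State stateAfterExclusiveLoad(Protocol p) {
    return p == Protocol::MESI ? State::E : State::S;
}

// Situation 1: a store to a block the core has already read.
// From S (or O) the writer must send an invalidate to the other caches;
// from E the upgrade to M is silent.
State storeHit(State s, Counters& n) {
    if (s == State::S || s == State::O) ++n.invalidates;
    return State::M;
}

// Situation 2: the holder of a block sees a GetS from another core.
State onRemoteGetS(Protocol p, State s, Counters& n) {
    if (s == State::M) {
        if (p == Protocol::MOSI) return State::O;  // stay owner, no write-back
        ++n.writeBacks;                            // MSI/MESI update memory
        return State::S;
    }
    if (s == State::E) return State::S;  // clean block (see assumption above)
    return s;                            // O, S, I: unchanged by a GetS
}
```

Under these assumptions a read followed by a write costs one invalidate in MSI and none in MESI, and a GetS to a modified block costs one write-back in MSI and none in MOSI. These are the effects the invalidate-message and write-back comparisons are designed to measure.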

Review (Directory Protocol)

[Figure: multiprocessor system with a distributed directory. Each node has a core, a private data (L1) cache, a cache controller, a directory controller, a directory, and main memory; the nodes are connected by an interconnection network.]

Evaluation Plan

In this presentation, we present the results of implementing a multiprocessor system model with a distributed directory.

Simulator:
- Written in C++

Evaluation metrics:
- Number of write-backs vs. cache size and block size
- Number of write-backs vs. cores
- Number of stalls vs. cores
- Number of cycles vs. cores
- Number of hits and misses vs. cores

System Implementation

[Figure: cores with their cache controllers, connected to a directory controller that holds the CacheBlock entries.]

- The cache controller sends its request to the directory.
- A single directory controller is a bottleneck.
- With several directory controllers, the cache controller responds to every request by unicasting a message.
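With more than one directory controller, each request has to be sent to the one controller responsible for the block. The slides do not say how the simulator assigns blocks to directory controllers; the C++ sketch below assumes simple interleaving by block ID, and the names `Message`, `homeDirectory`, and `makeRequest` are hypothetical.

```cpp
#include <cassert>

// A unicast message: exactly one sender and one receiver.
struct Message {
    int fromCore;     // requesting core / cache controller
    int toDirectory;  // directory controller that receives the request
    int blockId;      // block the request is about
};

// Assumed mapping: blocks are interleaved over the directory controllers.
// blockId must be non-negative and directories must be at least 1.
int homeDirectory(int blockId, int directories) {
    return blockId % directories;
}

// The cache controller addresses its request to the block's home directory
// only, instead of broadcasting it to every node.
Message makeRequest(int core, int blockId, int directories) {
    return Message{core, homeDirectory(blockId, directories), blockId};
}
```

With a single directory controller every request maps to controller 0, which is the bottleneck case; with three controllers, consecutive block IDs are spread over controllers 0, 1, and 2.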

Define messages and states

[Figure: tables of message types and states, not recovered.]

Cache controller request

Cache controller request: A closer look

Directory controller response

All functions

- MOSI_protocol_cache_request: executes a cache controller request.
- MOSI_protocol_directory_request: executes a directory controller response.
- I_state_cache: performs the cache actions when the block is in the I state.
- Transition_I_to_SD: performs the cache actions when the block is in the I state and wants to change to the S state with condition D (the transient state between I and S, waiting for data).
- Directory_I: performs the directory action upon receiving a message from a cache controller for a block in the I state.
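The per-state functions listed above suggest a dispatch on the current state of the block. The following C++ sketch shows that structure for a load that misses in I. It is an assumption-laden illustration, not the team's code: the function names are modeled on I_state_cache and Transition_I_to_SD, while the enums, the `Step` struct, and the event names are hypothetical.

```cpp
#include <cassert>

enum class CacheState { I, IS_D, S };            // IS_D: I -> S, waiting for data
enum class Event { Load, DataFromDirectory };
enum class Action { None, SendGetS, Stall, Hit };

struct Step {
    CacheState next;  // state of the block after the event
    Action action;    // what the cache controller does
};

// Modeled on I_state_cache: a load to an invalid block sends GetS to the
// directory and moves to the transient state.
Step iStateCache(Event e) {
    if (e == Event::Load) return {CacheState::IS_D, Action::SendGetS};
    return {CacheState::I, Action::None};
}

// Modeled on Transition_I_to_SD: the block becomes S when the data arrives;
// any other request to the block has to wait (stall).
Step transitionIToSD(Event e) {
    if (e == Event::DataFromDirectory) return {CacheState::S, Action::None};
    return {CacheState::IS_D, Action::Stall};
}

// Modeled on MOSI_protocol_cache_request: dispatch on the current state.
Step cacheRequest(CacheState s, Event e) {
    switch (s) {
        case CacheState::I:
            return iStateCache(e);
        case CacheState::IS_D:
            return transitionIToSD(e);
        case CacheState::S:
            return {CacheState::S,
                    e == Event::Load ? Action::Hit : Action::None};
    }
    return {s, Action::None};
}
```

Only three of the states are modeled; a full MOSI controller would add the M and O states, stores, and the remaining transient states in the same style.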

MOSI protocol: Number of cores: 8; Number of requests/cycle: 4

Evaluation, Effects of Cache Size

Block size = 16 bytes:

L1 Cache Size (KB)   Write-Back/Memory References
16                   68116
32                   136278
64                   272795
128                  545639

Cache size = 128 bytes:

L1 Block Size (bytes)   Write-Back/Memory References
16                      613779
32                      34277
64                      17180
128                     8537

[Figures: write-backs vs. L1 cache size (KB); write-backs vs. L1 block size (bytes).]

Number of write-backs vs. cores

[Figure: number of write-backs vs. cores.]
mean(MOSI/MSI) = 0.7816

Number of stalls vs. cores

Number of blocks/cache: 1000
Number of caches: 100
Number of requests/cycle: 4

[Figure: number of stalls vs. cores.]
mean(MOSI/MSI) = 1.1002

Number of cycles vs. cores

Number of blocks/cache: 1000
Number of caches: 100
Number of requests/cycle: 4

[Figure: number of cycles vs. cores.]
mean(MOSI/MSI) = 1.459

Number of hits and misses vs. cores

Number of blocks/cache: 1000
Number of caches: 100
Number of requests/cycle: 4

[Figures: number of hits and number of misses vs. cores.]
mean(MOSI/MSI) = 1.345
mean(MOSI/MSI) = 1.273

References

[1] Daniel J. Sorin, Mark D. Hill, and David A. Wood. A Primer on Memory Consistency and Cache Coherence. Morgan & Claypool Publishers, 2011.
[2] Linda Bigelow, Veynu Narasiman, and Aater Suleman. "An Evaluation of Snoop-Based Cache Coherence Protocols."
[3] Anoop Tiwari. Performance Comparison of Cache Coherence Protocol on Multi-Core Architecture. Diss., 2014.
[4] Mu-Tien Chang, Shih-Lien Lu, and Bruce Jacob. "Impact of Cache Coherence Protocols on the Power Consumption of STT-RAM-Based LLC."
[5] CMU 15-418: Parallel Architecture and Programming. Lecture series, Spring 2012.

Q&A