SISTEMI EMBEDDED Computer Organization Memory Hierarchy, Cache Memory Federico Baronti Last version: 20160524


Memory Hierarchy

• Ideal memory is fast, large, and inexpensive
• Not feasible with current memory technology, so use a memory hierarchy
• Exploits program behavior (locality of reference) to make it appear as though memory is on average fast and large

Caches and Locality of Reference

• The cache is between processor and memory
• Makes large, slow main memory appear fast
• Typical program behavior involves executing instructions in loops and accessing data arrays
• Effectiveness is based on locality of reference
  – Temporal locality: instructions/data that have been recently accessed are likely to be accessed again
  – Spatial locality: nearby instructions or data are likely to be accessed after the current access

More Cache Concepts

• To exploit spatial locality, transfer a cache block (or line) with multiple adjacent words from memory
  – Later accesses to nearby words are fast, provided that the cache still contains the block
• Mapping function determines where a block from memory is to be located in the cache
  – Direct or Associative mapping
• When the cache is full, a replacement algorithm determines which block has to be removed from the cache

Cache Operation

• Processor issues Read and Write requests as if it were accessing main memory directly
• But control circuitry first checks the cache
  – If desired information is present in the cache, a read or write hit occurs
• For a read hit, main memory is not involved; the cache provides the desired information
• For a write hit, there are two approaches:
  – Write-back or Write-through

Handling Cache Writes

• Write-through protocol: update cache & memory. Memory is always updated.
• Write-back protocol: only update the cache; memory is updated later when the block is replaced
  – Write-back scheme needs a modified or dirty bit to mark blocks that are updated in the cache and need to be written to main memory when they are replaced
• If the same location is written repeatedly, then write-back is much better than write-through
  – Block memory update is often more efficient, even if writing back unchanged words
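
To make the trade-off concrete, here is a minimal Python sketch. It assumes a deliberately simplified model that is not in the slides: every write hits the same cached block, and that block is replaced exactly once at the end. The function name and the default block size of 16 words (borrowed from the scenario used in the mapping slides) are illustrative choices.

```python
def words_written_to_memory(num_write_hits, policy, block_size=16):
    """Count words sent to main memory for repeated write hits on one block.

    Simplified model (an assumption): all writes hit the same cached block,
    which is replaced once after the last write.
    """
    if policy == "write-through":
        # Every write hit also updates main memory, one word at a time.
        return num_write_hits
    if policy == "write-back":
        # Writes only set the dirty bit; the whole block is written back
        # on replacement, including words that never changed.
        dirty = num_write_hits > 0
        return block_size if dirty else 0
    raise ValueError(f"unknown policy: {policy}")


through = words_written_to_memory(100, "write-through")
back = words_written_to_memory(100, "write-back")
```

With 100 writes to the same block, write-through sends 100 words to memory while write-back sends a single 16-word block. With only one write the comparison flips (1 word versus 16), which is the "writing back unchanged words" case.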

Handling Cache Misses

• If desired information is not present in the cache, a read or write miss occurs
• For a read miss, the block with the desired word is transferred from main memory to the cache
• For a write miss under write-through protocol, the information is written directly to the main memory
• Under write-back protocol, first transfer the block containing the addressed word into the cache. Then overwrite the specific word in the cached block

Mapping Functions

• A block of consecutive words in main memory must be transferred to the cache after a miss
• The mapping function determines the location of a block in the cache
• Three mapping functions:
  – Direct, Associative and Set-Associative Mapping
• Let's consider the following scenario:
  – Cache with 128 blocks of 16 words
  – Main memory with 64K words (4K blocks), word-addressable, so 16-bit address

Direct Mapping

• Simplest approach uses a fixed mapping: mem. block j → cache block (j mod 128)
• Only one unique location for each mem. block
  – Two blocks may contend for the same location even if the cache is not fully utilized
  – New block always overwrites the previous block
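
As a rough illustration of the fixed mapping, the Python sketch below lists the memory blocks that compete for cache block 0 in the 128-block cache of the scenario. The name `cache_block_for` is made up for this example.

```python
NUM_CACHE_BLOCKS = 128   # cache size from the scenario
NUM_MEMORY_BLOCKS = 4096 # 4K blocks of main memory


def cache_block_for(mem_block):
    """Direct mapping: memory block j goes to cache block j mod 128."""
    return mem_block % NUM_CACHE_BLOCKS


# All memory blocks that map to cache block 0. They contend for that one
# location even when the other 127 cache blocks are empty.
contenders = [j for j in range(NUM_MEMORY_BLOCKS) if cache_block_for(j) == 0]
```

Here `contenders` starts with blocks 0, 128 and 256 and holds 32 blocks in total (4096 / 128), so a program alternating between any two of them evicts one block on every access to the other.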

Direct Mapping

• Address is divided into 3 fields: tag, block (or line) index, word (or offset)
• Cache with 128 blocks of 16 words; main memory with 64K words (4K blocks); word-addressable memory, so 16-bit address
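
The field widths follow from the scenario: 16 words per block need 4 word bits, 128 cache blocks need 7 block bits, and the remaining 5 bits of the 16-bit address form the tag. A small Python sketch of the split (the function name `split_direct` is an illustrative choice, not from the slides):

```python
WORD_BITS = 4    # 16 words per block -> 2**4
BLOCK_BITS = 7   # 128 cache blocks   -> 2**7
TAG_BITS = 16 - BLOCK_BITS - WORD_BITS  # the remaining 5 bits


def split_direct(address):
    """Split a 16-bit word address into (tag, block, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


# Address 0x0810 lies in memory block 129 (0x0810 >> 4), which maps to
# cache block 129 mod 128 = 1 with tag 129 // 128 = 1.
fields = split_direct(0x0810)
```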

Associative Mapping

• Full flexibility: locate block anywhere in cache
• Block field of address no longer needs any bits
• Tag field is enlarged to encompass those bits
• Larger tag stored in cache with each block
• For hit/miss, compare all tags simultaneously in parallel against tag field of given address
• This associative search increases complexity
• Flexible mapping also requires appropriate replacement algorithm when cache is full

Associative Mapping

• Cache with 128 blocks of 16 words; main memory with 64K words (4K blocks); word-addressable memory, so 16-bit address
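
With no block field, the 16-bit address splits into a 12-bit tag and a 4-bit word. The Python sketch below shows the split and a tag search over all 128 cache blocks. The names `split_associative` and `lookup` are hypothetical, and the sequential loop only stands in for what the hardware does in parallel.

```python
WORD_BITS = 4  # 16 words per block; the other 12 address bits are the tag


def split_associative(address):
    """Split a 16-bit word address into (tag, word) fields."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


def lookup(stored_tags, address):
    """Return the index of the cache block holding the address, or None.

    stored_tags has one entry per cache block (None for an empty block).
    Hardware compares every stored tag at once; this loop is sequential.
    """
    tag, _word = split_associative(address)
    for index, stored in enumerate(stored_tags):
        if stored == tag:
            return index
    return None


tags = [None] * 128
tags[5] = 0x081            # memory block 0x081 happens to sit in cache block 5
hit_index = lookup(tags, 0x0810)
miss_index = lookup(tags, 0x0820)
```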

Set-Associative Mapping

• Combination of direct & associative mapping
• Group blocks of cache into sets
• Block field bits map a block to a unique set
• But any block within a set may be used
• Associative search involves only tags in a set
• Replacement algorithm is only for blocks in set
• Reducing flexibility also reduces complexity
• k blocks/set → k-way set-associative cache
  – Direct Mapping corresponds to 1-way
  – Associative Mapping corresponds to all-way

2-way Set-Associative Mapping

• Cache with 128 blocks of 16 words; main memory with 64K words (4K blocks); word-addressable memory, so 16-bit address
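
For the 2-way case the 128 blocks form 64 sets, so the address needs 6 set bits, 4 word bits and a 6-bit tag. The Python sketch below computes the field widths for any k-way organization, assuming all sizes are powers of two. The function name `field_widths` is illustrative.

```python
def field_widths(num_blocks, words_per_block, ways, address_bits=16):
    """Return (tag_bits, set_bits, word_bits) for a k-way cache.

    Assumes num_blocks, words_per_block and ways are powers of two.
    """
    num_sets = num_blocks // ways
    word_bits = (words_per_block - 1).bit_length()  # log2 for powers of two
    set_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits


two_way = field_widths(128, 16, ways=2)
direct = field_widths(128, 16, ways=1)         # direct mapping as 1-way
associative = field_widths(128, 16, ways=128)  # fully associative as 128-way
```

The same function reproduces the two extremes: 1-way gives the 5/7/4 split of direct mapping, and 128-way gives a 12-bit tag with no set field.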

Stale Data

• Each block has a valid bit, initialized to 0
• No hit if valid bit is 0, even if a tag match occurs
• Valid bit is set to 1 when a block is placed in the cache
• When power is turned on, all valid bits are set to 0
• Because of DMA, main memory can change w/o a read or write performed by the processor
  – Invalidate a cache block when the corresponding block in memory is modified by DMA
  – If write-back, transfer the block from cache to memory before starting a DMA that has such a block as source. This action can be achieved by flushing the cache.
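
The valid-bit rules can be sketched in a few lines of Python. The sketch assumes a direct-mapped cache with 128 blocks (the slide does not fix the organization), and `CacheLine`, `is_hit` and `invalidate_on_dma_write` are names invented for this example.

```python
from dataclasses import dataclass

NUM_CACHE_BLOCKS = 128  # assumption: direct-mapped cache from the scenario


@dataclass
class CacheLine:
    valid: bool = False  # valid bit is 0 after power-on
    tag: int = 0


def is_hit(line, tag):
    """A hit needs both a set valid bit and a matching tag."""
    return line.valid and line.tag == tag


def invalidate_on_dma_write(lines, mem_block):
    """Clear the valid bit if a DMA write modified a cached memory block."""
    index = mem_block % NUM_CACHE_BLOCKS
    tag = mem_block // NUM_CACHE_BLOCKS
    if is_hit(lines[index], tag):
        lines[index].valid = False


lines = [CacheLine() for _ in range(NUM_CACHE_BLOCKS)]
hit_after_power_on = is_hit(lines[0], 0)   # tag matches, but valid bit is 0

lines[1] = CacheLine(valid=True, tag=1)    # memory block 129 placed in cache
hit_before_dma = is_hit(lines[1], 1)
invalidate_on_dma_write(lines, 129)        # DMA overwrites block 129 in memory
hit_after_dma = is_hit(lines[1], 1)
```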

LRU Replacement Algorithm

• Replacement is trivial for direct mapping, but needs a method for associative mapping
• Consider temporal locality of reference and use a least-recently-used (LRU) algorithm
• For k-way set associativity, each block in a set has a counter ranging from 0 to k-1, which is updated w/ the following rules:
  – Hitting on a block clears its counter value to 0; counters that were originally lower than that of the hit block are incremented by one, and all the others remain unchanged
  – When a miss occurs and the set is not full, the counter of the new block is set to 0 and all the others are increased by one
  – When a miss occurs and the set is full, replace the block w/ counter = k-1, set its counter to 0 and increment by one all the other counters
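
The three counter rules translate almost directly into code. Below is a minimal Python sketch for a single set. The class name `LRUSet` and the use of a dictionary from tag to counter are choices made for this example, not something the slides prescribe.

```python
class LRUSet:
    """One set of a k-way set-associative cache with LRU counters."""

    def __init__(self, k):
        self.k = k
        self.counters = {}  # tag -> counter in the range 0 .. k-1

    def access(self, tag):
        """Access a block; return (hit, evicted_tag)."""
        if tag in self.counters:
            # Hit: counters lower than the hit block's are incremented,
            # higher ones stay unchanged, the hit block is cleared to 0.
            old = self.counters[tag]
            for other, count in self.counters.items():
                if count < old:
                    self.counters[other] = count + 1
            self.counters[tag] = 0
            return True, None

        evicted = None
        if len(self.counters) == self.k:
            # Miss on a full set: the block with counter k-1 is the LRU one.
            evicted = next(t for t, c in self.counters.items()
                           if c == self.k - 1)
            del self.counters[evicted]
        # Miss (full or not): all remaining counters go up by one,
        # and the new block starts at 0.
        for other in self.counters:
            self.counters[other] += 1
        self.counters[tag] = 0
        return False, evicted


lru = LRUSet(k=4)
for block in ["A", "B", "C", "D"]:
    lru.access(block)             # four misses fill the set
hit_result = lru.access("A")      # A becomes most recently used
miss_result = lru.access("E")     # set is full, so B (the LRU block) goes
```

After the sequence A, B, C, D, A, E the set evicts B rather than A, because the hit on A reset its counter while B drifted up to k-1.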

Hit Rate and Miss Penalty

• Performance of a memory hierarchy is determined by the hit rate and the miss penalty
• Hit rate depends on the cache size and its organization (mapping function, block size)
• Miss penalty includes the time to detect the miss, transfer one block from the main mem. to the cache and finally deliver the requested word to the proc. It depends on the main memory access time, which is usually much larger for the first word of the block than for the remaining ones.
  – Let's assume that the cache access time is 1 clock cycle, the access for the first word in mem. is Nfirst = 7 cycles and for the following words Nmore = 1 cycle, and the block size B = 8 words.
  – Then, the miss penalty Nmiss = (1 + 1 × Nfirst + (B - 1) × Nmore + 1) - 1 = 15, where one cycle is for detecting the cache miss and another for providing the requested word to the proc.
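
The arithmetic can be checked with a short Python sketch. It reads the trailing "- 1" as the one-cycle cache access that a hit would cost anyway, which is an interpretation: the slide does not spell this out. The function and parameter names are illustrative.

```python
def miss_penalty(n_first, n_more, block_size, cache_access=1):
    """Extra cycles caused by a cache miss, following the slide's formula."""
    detect = 1                                        # detect the miss
    transfer = n_first + (block_size - 1) * n_more    # fetch the whole block
    deliver = 1                                       # hand the word to the processor
    # Subtract the cache access time (assumed meaning of the final "- 1").
    return detect + transfer + deliver - cache_access


n_miss = miss_penalty(n_first=7, n_more=1, block_size=8)
```

With the slide's values this gives 1 + 7 + 7 + 1 - 1 = 15 cycles. Doubling the block size to 16 words raises the penalty to 23 cycles under the same assumptions.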

Effect on Pipelining Performance

• Assume that: freq. of cache misses during fetch pmiss-fetch = 5%, freq. of cache misses during mem access pmiss-mem = 10%, freq. of Load and Store instr. pLD-ST = 30%. Then:
  – δcache-miss = Nmiss × (pmiss-fetch + pLD-ST × pmiss-mem) = 15 × (0.05 + 0.03) = 1.2
  – F = R/(1 + δcache-miss) = R/2.2 = 0.45R
• W/o cache, i.e., pmiss-fetch = 100% and pmiss-mem = 100%, mem access time penalty is Nfirst - 1 cycles
  – δmem = (Nfirst - 1) × (1 + pLD-ST) = 7.8
  – F = R/(1 + δmem) = R/8.8 = 0.11R
• Cache improves performance by a factor of 4 (8.8/2.2).
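
The figures above can be reproduced with a few lines of Python. The sketch assumes that throughput is F = R/(1 + δ), with δ the average number of stall cycles per instruction, which is what the slide's numbers imply (R/2.2 for δ = 1.2). Variable names are illustrative.

```python
def relative_throughput(delta):
    """Throughput as a fraction of R, assuming F = R / (1 + delta)."""
    return 1.0 / (1.0 + delta)


n_miss, n_first = 15, 7
p_miss_fetch, p_miss_mem, p_ld_st = 0.05, 0.10, 0.30

# With cache: stalls come from fetch misses plus Load/Store data misses.
delta_cache_miss = n_miss * (p_miss_fetch + p_ld_st * p_miss_mem)

# Without cache: every fetch and every Load/Store pays Nfirst - 1 cycles.
delta_mem = (n_first - 1) * (1 + p_ld_st)

speedup = relative_throughput(delta_cache_miss) / relative_throughput(delta_mem)
```

The result is about 0.45R with the cache and about 0.11R without it, a speedup close to 4 up to floating-point rounding.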
