SISTEMI EMBEDDED
Computer Organization: Memory Hierarchy, Cache Memory
Federico Baronti. Last version: 2016-05-24
Memory Hierarchy
• Ideal memory is fast, large, and inexpensive
• Not feasible with current memory technology, so use a memory hierarchy
• Exploits program behavior (locality of reference) to make memory appear, on average, fast and large
Caches and Locality of Reference
• The cache sits between processor and memory
• Makes the large, slow main memory appear fast
• Typical program behavior involves executing instructions in loops and accessing data arrays
• Effectiveness is based on locality of reference
– Temporal locality: instructions/data that have been accessed recently are likely to be accessed again
– Spatial locality: nearby instructions or data are likely to be accessed after the current access
More Cache Concepts
• To exploit spatial locality, transfer a cache block (or line) with multiple adjacent words from memory
– Later accesses to nearby words are fast, provided that the cache still contains the block
• The mapping function determines where a block from memory is to be located in the cache
– Direct or associative mapping
• When the cache is full, a replacement algorithm determines which block has to be removed from the cache
Cache Operation
• Processor issues Read and Write requests as if it were accessing main memory directly
• But control circuitry first checks the cache
– If the desired information is present in the cache, a read or write hit occurs
• For a read hit, main memory is not involved; the cache provides the desired information
• For a write hit, there are two approaches:
– Write-back or write-through
Handling Cache Writes
• Write-through protocol: update cache & memory. Memory is always up to date.
• Write-back protocol: only update the cache; memory is updated later, when the block is replaced
– The write-back scheme needs a modified or dirty bit to mark blocks that are updated in the cache and need to be written to main memory when they are replaced
• If the same location is written repeatedly, then write-back is much better than write-through
– A block memory update is often more efficient, even if unchanged words are written back
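As an illustration, the two write-hit policies can be sketched in C for a single cache line; this is a minimal sketch with invented names (`write_hit`, `evict`), not code from the slides or the textbook:

```c
#include <string.h>

/* One cache line holding a 16-word block (layout is illustrative). */
enum policy { WRITE_THROUGH, WRITE_BACK };

struct line { int tag; int valid; int dirty; int data[16]; };

/* Write hit: the cache is always updated; memory is updated now
   (write-through) or marked for a later update (write-back). */
static void write_hit(enum policy p, struct line *ln, int word, int value,
                      int *mem_block /* the block's copy in main memory */)
{
    ln->data[word] = value;
    if (p == WRITE_THROUGH)
        mem_block[word] = value;   /* memory always kept up to date */
    else
        ln->dirty = 1;             /* remember the block was modified */
}

/* On replacement, a write-back cache must flush a dirty block first. */
static void evict(struct line *ln, int *mem_block)
{
    if (ln->dirty)
        memcpy(mem_block, ln->data, sizeof ln->data);
    ln->valid = 0;
    ln->dirty = 0;
}
```

Note how repeated writes to the same word under write-back touch memory only once, at eviction, which is exactly why it wins when the same location is written repeatedly.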
Handling Cache Misses
• If the desired information is not present in the cache, a read or write miss occurs
• For a read miss, the block with the desired word is transferred from main memory to the cache
• For a write miss under the write-through protocol, the information is written directly to main memory
• Under the write-back protocol, first transfer the block containing the addressed word into the cache, then overwrite the specific word in the cached block
Mapping Functions
• A block of consecutive words in main memory must be transferred to the cache after a miss
• The mapping function determines the location of a block in the cache
• Three mapping functions:
– Direct, associative, and set-associative mapping
• Let's consider the following scenario:
– Cache with 128 blocks of 16 words
– Main memory with 64K words (4K blocks), word-addressable, so 16-bit addresses
Direct Mapping
• Simplest approach uses a fixed mapping: memory block j → cache block (j mod 128)
• Only one unique location for each memory block
– Two blocks may contend for the same location even if the cache is not fully utilized
– A new block always overwrites the previous block
• The address is divided into 3 fields: tag, block (or line) index, and word (or offset)
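With the scenario above (16-bit word addresses, 128 cache blocks of 16 words), the three fields fall out of the address with shifts and masks. A small C sketch (function names are mine, not from the slides):

```c
#include <stdint.h>

/* Direct mapping, scenario from the slides:
   16-bit word address = | tag (5 bits) | block (7 bits) | word (4 bits) |
   - 16 words/block   -> 4-bit word (offset) field
   - 128 cache blocks -> 7-bit block (line) field
   - remaining 5 bits -> tag */
#define WORD_BITS  4
#define BLOCK_BITS 7

static uint16_t word_field (uint16_t a) { return a & 0x000F; }
static uint16_t block_field(uint16_t a) { return (a >> WORD_BITS) & 0x007F; }
static uint16_t tag_field  (uint16_t a) { return a >> (WORD_BITS + BLOCK_BITS); }
```

The 7-bit block field computes exactly j mod 128 for memory block j, so memory blocks j and j + 128 contend for the same cache line.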
Associative Mapping
• Full flexibility: locate a block anywhere in the cache
• The block field of the address no longer needs any bits
• The tag field is enlarged to encompass those bits
• The larger tag is stored in the cache with each block
• For hit/miss detection, all tags are compared simultaneously, in parallel, against the tag field of the given address
• This associative search increases complexity
• Flexible mapping also requires an appropriate replacement algorithm when the cache is full
Set-Associative Mapping
• Combination of direct & associative mapping
• Group blocks of the cache into sets
• The block field bits map a block to a unique set
• But any block within the set may be used
• Associative search involves only the tags in one set
• The replacement algorithm applies only to blocks in a set
• Reducing flexibility also reduces complexity
• k blocks/set → k-way set-associative cache
– Direct mapping corresponds to 1-way
– Associative mapping corresponds to all-way
2-way Set-Associative Mapping
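The three mappings differ only in how the 16-bit address is split. For k blocks per set, the field widths can be derived mechanically; a sketch assuming the slides' 128-block, 16-word-block cache (helper names are mine):

```c
/* Bits needed to index n equal items (n a power of two). */
static int log2i(int n) { int b = 0; while (n > 1) { n >>= 1; b++; } return b; }

/* Address-field widths for a k-way set-associative version of the
   slides' cache: 16-bit addresses, 128 blocks of 16 words. */
static void fields(int k, int *word, int *set, int *tag)
{
    int sets = 128 / k;        /* k blocks per set */
    *word = log2i(16);         /* 4 bits: word (offset) within a block */
    *set  = log2i(sets);       /* bits selecting the set */
    *tag  = 16 - *set - *word; /* the rest is stored as the tag */
}
```

k = 1 reproduces direct mapping (tag 5 / block 7 / word 4), k = 2 the 2-way case (6/6/4), and k = 128 (one set containing every block) the fully associative split (12/0/4).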
Stale Data
• Each block has a valid bit, initialized to 0
• No hit if the valid bit is 0, even if a tag match occurs
• The valid bit is set to 1 when a block is placed in the cache
• When power is turned on, all valid bits are set to 0
• Because of DMA, main memory can change without a read or write performed by the processor
– Invalidate a cache block when the corresponding block in memory is modified by DMA
– If write-back is used, transfer the block from cache to memory before starting a DMA operation that has such a block as source. This can be achieved by flushing the cache.
LRU Replacement Algorithm
• Replacement is trivial for direct mapping, but a method is needed for associative mapping
• Consider temporal locality of reference and use a least-recently-used (LRU) algorithm
• For k-way set associativity, each block in a set has a counter ranging from 0 to k−1, which is updated with the following rules:
– Hitting on a block clears its counter value to 0; counters originally lower in the set are incremented, and all the others remain unchanged
– When a miss occurs and the set is not full, the counter of the new block is set to 0 and all the others are increased by one
– When a miss occurs and the set is full, replace the block with counter = k−1, set its counter to 0, and increment all the other counters by one
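The three counter-update rules above can be written out directly; a C sketch for a 4-way set (array layout and names are my own):

```c
#define K 4  /* 4-way set-associative: counters range over 0..K-1 */

/* Counter 0 marks the most recently used block, K-1 the least recent. */

/* Hit: counters originally lower than the hit block's are incremented,
   the hit block's counter is cleared, the rest stay unchanged. */
static void lru_hit(int counter[K], int hit_way)
{
    int c = counter[hit_way];
    for (int i = 0; i < K; i++)
        if (counter[i] < c)
            counter[i]++;
    counter[hit_way] = 0;
}

/* Miss: fill an empty way if the set is not full, otherwise evict the
   block whose counter is K-1; either way the new block gets counter 0
   and every other valid counter is incremented. */
static int lru_miss(int counter[K], int valid[K])
{
    int way = -1;
    for (int i = 0; i < K && way < 0; i++)
        if (!valid[i]) way = i;               /* set not full */
    if (way < 0)
        for (int i = 0; i < K; i++)
            if (counter[i] == K - 1) way = i; /* set full: evict LRU */
    for (int i = 0; i < K; i++)
        if (valid[i] && i != way)
            counter[i]++;
    valid[way] = 1;
    counter[way] = 0;
    return way;
}
```

Because the counters of a full set always form a permutation of 0..K−1, exactly one block carries counter K−1 and the eviction choice is unambiguous.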
Hit Rate and Miss Penalty
• The performance of a memory hierarchy is determined by the hit rate and the miss penalty
• The hit rate depends on the cache size and its organization (mapping function, block size)
• The miss penalty includes the time to detect the miss, transfer one block from main memory to the cache, and finally deliver the requested word to the processor. It depends on the main memory access time, which is usually much larger for the first word of the block than for the remaining ones.
– Let's assume that the cache access time is 1 clock cycle, the access time for the first word in memory is Nfirst = 7 cycles and for the following words Nmore = 1 cycle, and the block size is B = 8 words.
– Then the miss penalty is Nmiss = (1 + 1 × Nfirst + (B − 1) × Nmore + 1) − 1 = 15, where one cycle is for detecting the cache miss and another for providing the requested word to the processor.
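The arithmetic of the example can be checked with a one-line helper (a sketch; parameter names follow the slides):

```c
/* Miss penalty in cycles: 1 cycle to detect the miss, nfirst for the
   first word of the block, nmore for each of the remaining words, and
   1 to forward the requested word to the processor, minus the 1-cycle
   cache access that a hit would have cost anyway. */
static int miss_penalty(int nfirst, int nmore, int block_words)
{
    return (1 + nfirst + (block_words - 1) * nmore + 1) - 1;
}
```

With Nfirst = 7, Nmore = 1 and B = 8 this yields the slides' Nmiss = 15.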
Effect on Pipelining Performance
• Assume the following: frequency of cache misses during fetch pmiss-fetch = 5%, frequency of cache misses during memory access pmiss-mem = 10%, frequency of Load and Store instructions pLD-ST = 30%. Then:
– δcache-miss = Nmiss × (pmiss-fetch + pLD-ST × pmiss-mem) = 15 × (0.05 + 0.30 × 0.10) = 1.2
– F = R/2.2 = 0.45R
• Without a cache, i.e., pmiss-fetch = 100% and pmiss-mem = 100%, the memory access time penalty is Nfirst − 1 cycles:
– δmem = (Nfirst − 1) × (1 + pLD-ST) = 7.8
– F = R/8.8 = 0.11R
• The cache improves performance by a factor of 4.
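The factor-of-4 claim follows from the average stall cycles per instruction; a quick check in C (function names are mine, symbols follow the slides):

```c
#include <math.h>

/* Average stall cycles per instruction added by cache misses:
   every fetch may miss, and Load/Store instructions may miss again
   on their data access. */
static double delta_cache_miss(double n_miss, double p_fetch,
                               double p_mem, double p_ldst)
{
    return n_miss * (p_fetch + p_ldst * p_mem);
}

/* Without a cache every memory access pays Nfirst - 1 extra cycles,
   once per fetch plus once per Load/Store. */
static double delta_mem(double n_first, double p_ldst)
{
    return (n_first - 1.0) * (1.0 + p_ldst);
}
```

With the slides' numbers, δcache-miss = 1.2 gives F = R/(1 + 1.2) = 0.45R, δmem = 7.8 gives F = R/(1 + 7.8) = 0.11R, and the ratio 8.8/2.2 is exactly 4.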
References
• C. Hamacher, Z. Vranesic, S. Zaky, N. Manjikian, "Computer Organization and Embedded Systems," McGraw-Hill International Edition
– Chapter 8: Sections 8.5–8.7.1