caching tutorial for web authors and webmasters.pdf
7/30/2019 Caching Tutorial for Web Authors and Webmasters.pdf
Caching Tutorial for Web Authors and Webmasters
This is an informational document. Although technical in nature, it attempts to make the concepts involved understandable and applicable in real-world situations. Because of this, some aspects of the material are simplified or omitted, for the sake of clarity. If you are interested in the minutia of the subject, please explore the References and Further Information at the end.
1. What's a Web Cache? Why do people use them?
2. Kinds of Web Caches
    1. Browser Caches
    2. Proxy Caches
3. Aren't Web Caches bad for me? Why should I help them?
4. How Web Caches Work
5. How (and how not) to Control Caches
    1. HTML Meta Tags vs. HTTP Headers
    2. Pragma HTTP Headers (and why they don't work)
    3. Controlling Freshness with the Expires HTTP Header
    4. Cache-Control HTTP Headers
    5. Validators and Validation
6. Tips for Building a Cache-Aware Site
7. Writing Cache-Aware Scripts
8. Frequently Asked Questions
9. Implementation Notes: Web Servers
10. Implementation Notes: Server-Side Scripting
11. References and Further Information
12. About This Document
A Web cache sits between one or more Web servers (also known as origin servers) and a client or many clients, and watches requests come by, saving copies of the responses (like HTML pages, images and files, collectively known as representations) for itself. Then, if there is another request for the same URL, it can use the response that it has, instead of asking the origin server for it again.
There are two main reasons that Web caches are used:
- To reduce latency: Because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time for it to get the representation and display it. This makes the Web seem more responsive.
- To reduce network traffic: Because representations are reused, it reduces the amount of bandwidth used by a client. This saves money if the client is paying for traffic, and keeps their bandwidth requirements lower and more manageable.
ng Tutorial for Web Authors and Webmasters http://www.mnot.net/cache_docs
4 12/4/2012
BROWSER CACHES
If you examine the preferences dialog of any modern Web browser (like Internet Explorer, Safari or Mozilla), you'll probably notice a "cache" setting. This lets you set aside a section of your computer's hard disk to store representations that you've seen, just for you. The browser cache works according to fairly simple rules. It will check to make sure that the representations are fresh, usually once a session (that is, once in the current invocation of the browser).
This cache is especially useful when users hit the "back" button or click a link to see a page they've just looked at. Also, if you use the same navigation images throughout your site, they'll be served from browsers' caches almost instantaneously.
PROXY CACHES
Web proxy caches work on the same principle, but on a much larger scale. Proxies serve hundreds or thousands of users in the same way; large corporations and ISPs often set them up on their firewalls, or as standalone devices (also known as intermediaries).
Because proxy caches aren't part of the client or the origin server, but instead are out on the network, requests have to be routed to them somehow. One way to do this is to use your browser's proxy setting to manually tell it what proxy to use; another is using interception. Interception proxies have Web requests redirected to them by the underlying network itself, so that clients don't need to be configured for them, or even know about them.
Proxy caches are a type of shared cache; rather than just having one person using them, they usually have a large number of users, and because of this they are very good at reducing latency and network traffic. That's because popular representations are reused a number of times.
GATEWAY CACHES
Also known as "reverse proxy caches" or "surrogate caches," gateway caches are also intermediaries, but instead of being deployed by network administrators to save bandwidth, they're typically deployed by Webmasters themselves, to make their sites more scalable, reliable and better performing.
Requests can be routed to gateway caches by a number of methods, but typically some form of load balancer is used to make one or more of them look like the origin server to clients.
Content delivery networks (CDNs) distribute gateway caches throughout the Internet (or a part of it) and sell caching to interested Web sites. Speedera and Akamai are examples of CDNs.
This tutorial focuses mostly on browser and proxy caches, although some of the information is suitable for those interested in gateway caches as well.
Web caching is one of the most misunderstood technologies on the Internet. Webmasters in particular fear losing control of their site, because a proxy cache can "hide" their users from them, making it difficult to see who's using the site.
Unfortunately for them, even if Web caches didn't exist, there are too many variables on the Internet to assure that they'll be able to get an accurate picture of how users see their site. If this is a big concern for you, this tutorial will teach you how to get the statistics you need without making your site cache-unfriendly.
Another concern is that caches can serve content that is out of date, or stale. However, this tutorial can show you how to configure your server to control how your content is cached.
On the other hand, if you plan your site well, caches can help your Web site load faster, and save load on your server and Internet link. The difference can be dramatic; a site that is difficult to cache may take several seconds to load, while one that takes advantage of caching can seem instantaneous in comparison. Users will appreciate a fast-loading site, and will visit more often.
CDNs are an interesting development, because unlike many proxy caches, their gateway caches are aligned with the interests of the Web site being cached, so that these problems aren't seen. However, even when you use a CDN, you still have to consider that there will be proxy and browser caches downstream.
Think of it this way; many large Internet companies are spending millions of dollars setting up farms of servers around the world to replicate their content, in order to make it as fast to access as possible for their users. Caches do the same for you, and they're even closer to the end user. Best of all, you don't have to pay for them.
The fact is that proxy and browser caches will be used whether you like it or not. If you don't configure your site to be cached correctly, it will be cached using whatever defaults the cache's administrator decides upon.
All caches have a set of rules that they use to determine when to serve a representation from the cache, if it's available. Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser cache, or the proxy administrator).
Generally speaking, these are the most common rules that are followed (don't worry if you don't understand the details, it will be explained below):
1. If the response's headers tell the cache not to keep it, it won't.
2. If the request is authenticated or secure (i.e., HTTPS), it won't be cached.
3. A cached representation is considered fresh (that is, able to be sent to a client without checking with the origin server) if:
    - It has an expiry time or other age-controlling header set, and is still within the fresh period, or
    - If the cache has seen the representation recently, and it was modified relatively long ago.
   Fresh representations are served directly from the cache, without checking with the origin server.
4. If a representation is stale, the origin server will be asked to validate it, or tell the cache whether the copy that it has is still good.
5. Under certain circumstances (for example, when it's disconnected from a network) a cache can serve stale responses without checking with the origin server.
6. If no validator (an ETag or Last-Modified header) is present on a response, and it doesn't have any explicit freshness information, it will usually (but not always) be considered uncacheable.
Together, freshness and validation are the most important ways that a cache works with content. A fresh representation will be available instantly from the cache, while a validated representation will avoid sending the entire representation over again if it hasn't changed.
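The rules above can be sketched roughly in Python. This is a simplified illustration, not how any particular cache is implemented: the function names, the returned labels, the exact-case header dictionary and the 10% heuristic fraction are all assumptions made for the example, and authentication and HTTPS handling are left out.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, stored_at, heuristic_fraction=0.1):
    """Return how long a stored response may be served unchecked, or None."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return timedelta(seconds=int(value))
    if "Expires" in headers:
        return parsedate_to_datetime(headers["Expires"]) - stored_at
    if "Last-Modified" in headers:
        # Heuristic (assumed fraction): seen recently, modified long ago.
        modified = parsedate_to_datetime(headers["Last-Modified"])
        return (stored_at - modified) * heuristic_fraction
    return None


def decide(headers, stored_at, now):
    """Classify what a cache might do with a stored response at time `now`."""
    directives = [d.strip().lower()
                  for d in headers.get("Cache-Control", "").split(",")]
    if "no-store" in directives:
        return "do not store"
    lifetime = freshness_lifetime(headers, stored_at)
    if lifetime is not None and now - stored_at < lifetime:
        return "fresh"          # serve directly from the cache
    if "ETag" in headers or "Last-Modified" in headers:
        return "validate"       # ask the origin server if the copy is still good
    return "uncacheable" if lifetime is None else "refetch"
```

Both `stored_at` and `now` are expected to be timezone-aware UTC datetimes, so that they can be compared with the dates parsed from the headers.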
There are several tools that Web designers and Webmasters can use to fine-tune how caches will treat their sites. It may require getting your hands a little dirty with your server's configuration, but the results are worth it. For details on how to use these tools with your server, see the Implementation sections below.
If your site is hosted at an ISP or hosting farm and they don't give you the ability to set arbitrary HTTP headers, complain loudly; these are tools necessary for doing your job.
HTML META TAGS AND HTTP HEADERS
HTML authors can put tags in a document's <HEAD> section that describe its attributes. These meta tags are often used in the belief that they can mark a document as uncacheable, or expire it at a certain time.
Meta tags are easy to use, but aren't very effective. That's because they're only honored by a few browser caches, not proxy caches (which almost never read the HTML in the document). While it may be tempting to put a Pragma: no-cache meta tag into a Web page, it won't necessarily cause it to be kept fresh.
On the other hand, true HTTP headers give you a lot of control over how both browser caches and proxies handle your representations. They can't be seen in the HTML, and are usually automatically generated by the Web server. However, you can control them to some degree, depending on the server you use. In the following sections, you'll see what HTTP headers are interesting, and how to apply them to your site.
HTTP headers are sent by the server before the HTML, and only seen by the browser and any intermediate caches. Typical HTTP 1.1 response headers might look like this:
    HTTP/1.1 200 OK
    Date: Fri, 30 Oct 1998 13:19:41 GMT
    Server: Apache/1.3.3 (Unix)
    Cache-Control: max-age=3600, must-revalidate
    Expires: Fri, 30 Oct 1998 14:19:41 GMT
    Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
    ETag: "3e86-410-3596fbbc"
    Content-Length: 1040
    Content-Type: text/html
The HTML would follow these headers, separated by a blank line. See the Implementation sections for information about how to set HTTP headers.
PRAGMA HTTP HEADERS (AND WHY THEY DON'T WORK)
Many people believe that assigning a Pragma: no-cache HTTP header to a representation will make it uncacheable. This is not necessarily true; the HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honor this header, the majority won't, and it won't have any effect. Use the headers below instead.
CONTROLLING FRESHNESS WITH THE EXPIRES HTTP HEADER
The Expires HTTP header is a basic means of controlling caches; it tells all caches how long the associated representation is fresh for. After that time, caches will always check back with the origin server to see if a document is changed. Expires headers are supported by practically every cache.
Most Web servers allow you to set Expires response headers in a number of ways. Commonly, they will allow setting an absolute time to expire, a time based on the last time that the client retrieved the representation (last access time), or a time based on the last time the document changed on your server (last modification time).
Expires headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don't change much, you can set an extremely long expiry time on them, making your site appear much more responsive to your users. They're also useful for controlling caching of a page that is regularly changed. For instance, if you update a news page once a day at 6am, you can set the representation to expire at that time, so caches will know when to get a fresh copy, without users having to hit "reload".
It's important to make sure that your Web server's clock is accurate if you use this header. One way to do this is using the Network Time Protocol (NTP); talk to your local system administrator to find out more.
The only value valid in an Expires header is a HTTP date; anything else will most likely be interpreted as "in the past", so that the representation is uncacheable. Also, remember that the time in a HTTP date is Greenwich Mean Time (GMT), not local time.
For example:
    Expires: Fri, 30 Oct 1998 14:19:41 GMT
Although the Expires header is useful, it has some limitations. First, because there's a date involved, the clocks on the Web server and the cache must be synchronised; if they have a different idea of the time, the intended results won't be achieved, and caches might wrongly consider stale content as fresh.
Another problem with Expires is that it's easy to forget that you've set some content to expire at a particular time. If you don't update an Expires time before it passes, each and every request will go back to your Web server, increasing load and latency.
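As an illustration of the daily news page case, the following Python sketch computes an Expires value that always points at the next update time, so it never needs to be edited by hand. The function names are hypothetical, and the example assumes the 6am update is expressed in GMT (a site updating at 6am local time would need to convert first).

```python
from datetime import datetime, time, timedelta, timezone
from email.utils import format_datetime


def next_daily_expiry(now, hour=6):
    """Return the next occurrence of hour:00 GMT strictly after `now`."""
    candidate = datetime.combine(now.date(), time(hour), tzinfo=timezone.utc)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def expires_header(now, hour=6):
    """Format an Expires header line using an HTTP date (always GMT)."""
    return "Expires: " + format_datetime(next_daily_expiry(now, hour), usegmt=True)


# `now` must be a timezone-aware UTC datetime.
print(expires_header(datetime.now(timezone.utc)))
```

Because the value is recomputed on every response, the problem of a forgotten, already-passed expiry time does not arise.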
CACHE-CONTROL HTTP HEADERS
HTTP 1.1 introduced a new class of headers, Cache-Control response headers, to give Web publishers more control over their content, and to address the limitations of Expires.
Useful Cache-Control response headers include:
- max-age=[seconds]: specifies the maximum amount of time that a representation will be considered fresh. Similar to Expires, this directive is relative to the time of the request, rather than absolute. [seconds] is the number of seconds from the time of the request you wish the representation to be fresh for.
- s-maxage=[seconds]: similar to max-age, except that it only applies to shared (e.g., proxy) caches.
- public: marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
- private: allows caches that are specific to one user (e.g., in a browser) to store the response; shared caches (e.g., in a proxy) may not.
- no-cache: forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.
- no-store: instructs caches not to keep a copy of the representation under any conditions.
- must-revalidate: tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you're telling the cache that you want it to strictly follow your rules.
- proxy-revalidate: similar to must-revalidate, except that it only applies to proxy caches.
For example:
    Cache-Control: max-age=3600, must-revalidate
When both Cache-Control and Expires are present, Cache-Control takes precedence. If you plan to use the Cache-Control headers, you should have a look at the excellent documentation in HTTP 1.1; see References and Further Information.
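To make the precedence concrete, here is a small Python sketch that splits a Cache-Control value into directives and then picks a freshness lifetime. The function names and the `expires_delta` parameter (the number of seconds implied by an Expires header, if any) are assumptions made for this example; real caches apply more rules than this.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            directives[name.lower()] = argument.strip('"') or None
    return directives


def lifetime_seconds(directives, shared_cache, expires_delta=None):
    """Prefer s-maxage (shared caches only), then max-age, then Expires."""
    if shared_cache and (directives.get("s-maxage") or "").isdigit():
        return int(directives["s-maxage"])
    if (directives.get("max-age") or "").isdigit():
        return int(directives["max-age"])
    return expires_delta


directives = parse_cache_control("max-age=3600, must-revalidate")
print(lifetime_seconds(directives, shared_cache=False, expires_delta=60))
```

In the usage line, the Expires-derived value of 60 seconds is ignored because max-age is present.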
VALIDATORS AND VALIDATION
In How Web Caches Work, we said that validation is used by servers and caches to communicate when a representation has changed. By using it, caches avoid having to download the entire representation when they already have a copy locally, but they're not sure if it's still fresh.
Validators are very important; if one isn't present, and there isn't any freshness information (Expires or Cache-Control) available, caches will not store a representation at all.
The most common validator is the time that the document last changed, as communicated in the Last-Modified header. When a cache has a representation stored that includes a Last-Modified header, it can use it to ask the server if the representation has changed since the last time it was seen, with an If-Modified-Since request.
HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the representation does. Because the server controls how the ETag is generated, caches can be sure that if the ETag matches when they make an If-None-Match request, the representation really is the same.
Almost all caches use Last-Modified times as validators; ETag validation is also becoming prevalent.
Most modern Web servers will generate both ETag and Last-Modified headers to use as validators for static content (i.e., files) automatically; you won't have to do anything. However, they don't know enough about dynamic content (like CGI, ASP or database sites) to generate them; see Writing Cache-Aware Scripts.
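A minimal Python sketch of ETag validation from the server's side follows. The hashing scheme in `make_etag` is an assumption chosen for illustration (servers are free to generate ETags however they like), and the sketch ignores weak validators (the W/ prefix).

```python
import hashlib


def make_etag(body):
    """Derive an ETag from the response body (an assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, body), answering If-None-Match when it matches."""
    etag = make_etag(body)
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    if etag in candidates or "*" in candidates:
        # The cache's copy is still good: send no body at all.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


status, headers, _ = respond(b"<html>hello</html>")
print(status, headers["ETag"])
print(respond(b"<html>hello</html>", if_none_match=headers["ETag"])[0])
```

The second call simulates a cache revalidating its stored copy; because the body has not changed, only the short 304 response is sent.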
Besides using freshness information and validation, there are a number of other things you can do to make your site more cache-friendly.
- Use URLs consistently: this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, it should use the same URL. This is the easiest and most effective way to make your site cache-friendly. For example, if you use "/index.html" in your HTML as a reference once, always use it that way.
- Use a common library of images and other elements and refer back to them from different places.
- Make caches store images and pages that don't change often by using a Cache-Control: max-age header with a large value.
- Make caches recognise regularly updated pages by specifying an appropriate max-age or expiration time.
- If a resource (especially a downloadable file) changes, change its name. That way, you can make it expire far in the future, and still guarantee that the correct version is served; the page that links to it is the only one that will need a short expiry time.
- Don't change files unnecessarily. If you do, everything will have a falsely young Last-Modified date. For instance, when updating your site, don't copy over the entire site; just move the files that you've changed.
- Use cookies only where necessary: cookies are difficult to cache, and aren't needed in most situations. If you must use a cookie, limit its use to dynamic pages.
- Minimize use of SSL: because encrypted pages are not stored by shared caches, use them only when you have to, and use images on SSL pages sparingly.
- Check your pages with REDbot: it can help you apply many of the concepts in this tutorial.
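One way to apply the "if a resource changes, change its name" tip is to derive part of the file name from the file's content. The Python sketch below is only one possible naming scheme; the function name, the 8-character hash length and the example path are assumptions, not something the tutorial prescribes.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# The same content always maps to the same name; new content gets a new name.
print(versioned_name("/downloads/tool.zip", b"release 1"))
print(versioned_name("/downloads/tool.zip", b"release 2"))
```

With names like these, the file itself can be given a very long freshness lifetime, and only the page that links to it needs a short one.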
By default, most scripts won't return a validator (a Last-Modified or ETag response header) or freshness information (Expires or Cache-Control). While some scripts really are dynamic (meaning that they return a different response for every request), many (like search engines and database-driven sites) can benefit from being cache-friendly.
Generally speaking, if a script produces output that is reproducible with the same request at a later time (whether it be minutes or days later), it should be cacheable. If the content of the script changes only depending on what's in the URL, it is cacheable; if the output depends on a cookie, authentication information or other external criteria, it probably isn't.
- The best way to make a script cache-friendly (as well as perform better) is to dump its content to a plain file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes your life easier. Remember to only write files that have changed, so the Last-Modified times are preserved.
- Another way to make a script cacheable in a limited fashion is to set an age-related header for as far in the future as practical. Although this can be done with Expires, it's probably easiest to do so with Cache-Control: max-age, which will make the request fresh for an amount of time after the request.
- If you can't do that, you'll need to make the script generate a validator, and then respond to If-Modified-Since and/or If-None-Match requests. This can be done by parsing the HTTP headers, and then responding with 304 Not Modified when appropriate. Unfortunately, this is not a trivial task.
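The first approach, writing generated content to a plain file only when it has actually changed, might be sketched in Python as follows. The function name is hypothetical, and the temporary-file-then-rename step and the 0644 permissions are design choices made for this example so that the Web server never sees a half-written file.

```python
import os
import tempfile


def publish_if_changed(path, new_content):
    """Write new_content to path only if it differs; return True if written."""
    try:
        with open(path, "rb") as existing:
            if existing.read() == new_content:
                return False  # unchanged: leave the file and its mtime alone
    except FileNotFoundError:
        pass
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as out:
        out.write(new_content)
    os.chmod(tmp, 0o644)      # assumed: the server needs read access
    os.replace(tmp, path)     # swap the new file into place in one step
    return True
```

Because unchanged output is never rewritten, the file's modification time (and therefore the Last-Modified header the server derives from it) only moves when the content does.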
Some other tips:
- Don't use POST unless it's appropriate. Responses to the POST method aren't kept by most caches; if you send information in the path or query (via GET), caches can store that information for the future.
- Don't embed user-specific information in the URL unless the content generated is completely unique to that user.
- Don't count on all requests from a user coming from the same host, because caches often work together.
- Generate Content-Length response headers. It's easy to do, and it will allow the response of your script to be used in a persistent connection. This allows clients to request multiple representations on one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster.
See the Implementation Notes for more specific information.
WHAT ARE THE MOST IMPORTANT THINGS TO MAKE CACHEABLE?
A good strategy is to identify the most popular, largest representations (especially images) and work with them first.
HOW CAN I MAKE MY PAGES AS FAST AS POSSIBLE WITH CACHES?
The most cacheable representation is one with a long freshness time set. Validation does help reduce the time that it takes to see a representation, but the cache still has to contact the origin server to see if it's fresh. If the cache already knows it's fresh, it will be served directly.
I UNDERSTAND THAT CACHING IS GOOD, BUT I NEED TO KEEP STATISTICS ON HOW MANY PEOPLE VISIT MY PAGE!
If you must know every time a page is accessed, select ONE small item on a page (or the page itself), and make it uncacheable, by giving it suitable headers. For example, you could refer to a 1x1 transparent uncacheable image from each page. The Referer header will contain information about what page called it.
Be aware that even this will not give truly accurate statistics about your users, and is unfriendly to the Internet and your users; it generates unnecessary traffic, and forces people to wait for that uncached item to be downloaded. For more information about this, see On Interpreting Access Statistics in the references.
HOW CAN I SEE A REPRESENTATION'S HTTP HEADERS?
Many Web browsers let you see the Expires and Last-Modified headers in a "page info" or similar interface. If available, this will give you a menu of the page and any representations (like images) associated with it, along with their details.
To see the full headers of a representation, you can manually connect to the Web server using a Telnet client.
To do so, you may need to type the port (by default, 80) into a separate field, or you may need to connect to www.example.com:80 or www.example.com 80 (note the space). Consult your Telnet client's documentation.
Once you've opened a connection to the site, type a request for the representation. For instance, if you want to see the headers for http://www.example.com/foo.html, connect to www.example.com, port 80, and type:
    GET /foo.html HTTP/1.1 [return]
    Host: www.example.com [return][return]
Press the Return key every time you see [return]; make sure to press it twice at the end. This will print the headers, and then the full representation. To see the headers only, substitute HEAD for GET.
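The same manual request can be scripted. The Python sketch below builds the request text and, if called, sends it over a plain socket and returns only the header block. The function names are hypothetical, and the added Connection: close line is an assumption made so that the server closes the connection after answering.

```python
import socket


def build_request(host, path, method="HEAD"):
    """Build the raw request; each [return] in the text becomes CRLF."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}",
             "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch_headers(host, path, port=80, timeout=10):
    """Send a HEAD request and return the status line and headers as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path))
        response = b""
        while b"\r\n\r\n" not in response:
            chunk = conn.recv(4096)
            if not chunk:
                break
            response += chunk
    return response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")


# Example (requires network access):
# print(fetch_headers("www.example.com", "/foo.html"))
```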
MY PAGES ARE PASSWORD-PROTECTED; HOW DO PROXY CACHES DEAL WITH THEM?
By default, pages protected with HTTP authentication are considered private; they will not be kept by shared caches. However, you can make authenticated pages public with a Cache-Control: public header; HTTP 1.1-compliant caches will then allow them to be cached.
If you'd like such pages to be cacheable, but still authenticated for every user, combine the Cache-Control: public and no-cache headers. This tells the cache that it must submit the new client's authentication information to the origin server before releasing the representation from the cache. This would look like:
    Cache-Control: public, no-cache
Whether or not this is done, it's best to minimize use of authentication; for example, if your images are not sensitive, put them in a separate directory and configure your server not to force authentication for it. That way, those images will be naturally cacheable.
SHOULD I WORRY ABOUT SECURITY IF PEOPLE ACCESS MY SITE THROUGH A CACHE?
SSL pages are not cached (or decrypted) by proxy caches, so you don't have to worry about that. However, because caches store non-SSL requests and URLs fetched through them, you should be conscious about unsecured sites; an unscrupulous administrator could conceivably gather information about their users, especially in the URL.
In fact, any administrator on the network between your server and your clients could gather this type of information. One particular problem is when CGI scripts put usernames and passwords in the URL itself; this makes it trivial for others to find and use their login.
If you're aware of the issues surrounding Web security in general, you shouldn't have any surprises from proxy caches.
I'M LOOKING FOR AN INTEGRATED WEB PUBLISHING SOLUTION. WHICH ONES ARE CACHE-AWARE?
It varies. Generally speaking, the more complex a solution is, the more difficult it is to cache. The worst are ones which dynamically generate all content and don't provide validators; they may not be cacheable at all. Speak with your vendor's technical staff for more information, and see the Implementation notes below.
MY IMAGES EXPIRE A MONTH FROM NOW, BUT I NEED TO CHANGE THEM IN THE CACHES NOW!
The Expires header can't be circumvented; unless the cache (either browser or proxy) runs out of room and has to delete the representations, the cached copy will be used until then.
The most effective solution is to change any links to them; that way, completely new representations will be loaded fresh from the origin server. Remember that any page that refers to these representations will be cached as well. Because of this, it's best to make static images and similar representations very cacheable, while keeping the HTML pages that refer to them on a tight leash.
If you want to reload a representation from a specific cache, you can either force a reload while using the cache (in Firefox, holding down shift while pressing "reload" will do this by issuing a Pragma: no-cache request header), or have the cache administrator delete the representation through their interface.
I RUN A WEB HOSTING SERVICE. HOW CAN I LET MY USERS PUBLISH CACHE-FRIENDLY PAGES?
If you're using Apache, consider allowing them to use .htaccess files and providing appropriate documentation.
Otherwise, you can establish predetermined areas for various caching attributes in each virtual server. For instance, you could specify a directory /cache-1m that will be cached for one month after access, and a /no-cache area that will be served with headers instructing caches not to store representations from it.
Whatever you are able to do, it is best to work with your largest customers first on caching. Most of the savings (in bandwidth and in load on your servers) will be realized from high-volume sites.
I'VE MARKED MY PAGES AS CACHEABLE, BUT MY BROWSER KEEPS REQUESTING THEM ON EVERY REQUEST. HOW DO I FORCE THE CACHE TO KEEP REPRESENTATIONS OF THEM?
Caches aren't required to keep a representation and reuse it; they're only required to not keep or use them under some conditions. All caches make decisions about which representations to keep based upon their size, type (e.g., image vs. html), or by how much space they have left to keep local copies. Yours may not be considered worth keeping around, compared to more popular or larger representations.
Some caches do allow their administrators to prioritize what kinds of representations are kept, and some allow representations to be "pinned" in cache, so that they're always available.
Generally speaking, it's best to use the latest version of whatever Web server you've chosen to deploy. Not only will they likely contain more cache-friendly features, new versions also usually have important security and performance improvements.
APACHE HTTP SERVER
Apache uses optional modules to include headers, including both Expires and Cache-Control. Both modules are available in the 1.2 or greater distribution.
The modules need to be built into Apache; although they are included in the distribution, they are not turned on by default. To find out if the modules are enabled in your server, find the httpd binary and run httpd -l; this should print a list of the available modules (note that this only lists compiled-in modules; on later versions of Apache, use httpd -M to include dynamically loaded modules as well). The modules we're looking for are mod_expires and mod_headers.
If they aren't available, and you have administrative access, you can recompile Apache to include them. This can be done either by uncommenting the appropriate lines in the Configuration file, or using the --enable-module=expires and --enable-module=headers arguments to configure (1.3 or greater). Consult the INSTALL file found with the Apache distribution.
Once you have an Apache with the appropriate modules, you can use mod_expires to specify when representations should expire, either in .htaccess files or in the server's access.conf file. You can specify expiry from either access or modification time, and apply it to a file type or as a default. See the module documentation for more information, and speak with your local Apache guru if you have trouble.
To apply Cache-Control headers, you'll need to use the mod_headers module, which allows you to specify arbitrary HTTP headers for a resource. See the mod_headers documentation.
Here's an example .htaccess file that demonstrates the use of some headers.
.htaccess files allow web publishers to use commands normally only found in configuration files. They affect the content of the directory they're in and their subdirectories. Talk to your server administrator to find out if they're enabled.
    ### activate mod_expires
    ExpiresActive On
    ### Expire .gif's 1 month from when they're accessed
    ExpiresByType image/gif A2592000
    ### Expire everything else 1 day from when it's last modified
    ### (this uses the Alternative syntax)
    ExpiresDefault "modification plus 1 day"
    ### Apply a Cache-Control header to index.html
    <Files index.html>
    Header append Cache-Control "public, must-revalidate"
    </Files>
Note that mod_expires automatically calculates and inserts a Cache-Control: max-age header as appropriate.
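To see what a value such as A2592000 means in practice, the following Python sketch interprets the letter-plus-seconds form: A counts from the access time, M from the file's modification time. It is an illustration of the arithmetic only, not mod_expires itself; the function names are hypothetical, times are plain epoch seconds, and clamping a negative max-age to zero is an assumption of this example.

```python
def expiry_time(code, access_time, modification_time):
    """Turn 'A<seconds>' or 'M<seconds>' into an absolute expiry time."""
    base_letter, seconds = code[0].upper(), int(code[1:])
    if base_letter == "A":
        base = access_time
    elif base_letter == "M":
        base = modification_time
    else:
        raise ValueError("expected a code starting with A or M")
    return base + seconds


def max_age(code, access_time, modification_time):
    """Seconds of freshness left at access time (never negative here)."""
    return max(0, expiry_time(code, access_time, modification_time) - access_time)


# 2592000 seconds is 30 days, the "1 month" in the example above.
print(2592000 == 30 * 24 * 3600)
print(max_age("A2592000", access_time=1000, modification_time=0))
```

Note the difference between the two bases: an access-based expiry gives every visitor the full lifetime, while a modification-based expiry gives everyone the same absolute expiry moment.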
Apache 2's configuration is very similar to that of 1.3; see the 2.2 mod_expires and mod_headers documentation for more information.
MICROSOFT IIS
Microsoft's Internet Information Server makes it very easy to set headers in a somewhat flexible way. Note that this is only possible in version 4 of the server, which will run only on NT Server.
To specify headers for an area of a site, select it in the Administration Tools interface, and bring up its properties. After selecting the HTTP Headers tab, you should see two interesting areas; Enable Content Expiration and Custom HTTP headers. The first should be self-explanatory, and the second can be used to apply Cache-Control headers.
See the ASP section below for information about setting headers in Active Server Pages. It is also possible to set headers from ISAPI modules; refer to MSDN for details.
NETSCAPE/IPLANET ENTERPRISE SERVER
As of version 3.6, Enterprise Server does not provide any obvious way to set Expires headers. However, it has supported HTTP 1.1 features since version 3.0. This means that HTTP 1.1 caches (proxy and browser) will be able to take advantage of Cache-Control settings you make.
To use Cache-Control headers, choose Content Management | Cache Control Directives in the administration server. Then, using the Resource Picker, choose the directory where you want to set the headers. After setting the headers, click "OK". For more information, see the NES manual.
Because the emphasis in server-side scripting is on dynamic content, it doesn't make for very cacheable pages, even when the content could be cached. If your content changes often, but not on every page hit, consider setting a Cache-Control: max-age header; most users access pages again in a relatively short period of time. For instance, when users hit the "back" button, if there isn't any validator or freshness information available, they'll have to wait until the page is re-downloaded from the server to see it.
One thing to keep in mind is that it may be easier to set HTTP headers with your Web server rather than in the scripting language. Try both.
CGI
CGI scripts are one of the most popular ways to generate content. You can easily append HTTP response headers by adding them before you send the body; most CGI implementations already require you to do this for the Content-Type header. For instance, in Perl:
    #!/usr/bin/perl
    print "Content-type: text/html\n";
    print "Expires: Thu, 29 Oct 1998 17:04:19 GMT\n";
    print "\n";
    ### the content body follows...
Since it's all text, you can easily generate Expires and other date-related headers with in-built functions. It's even easier if you use Cache-Control: max-age:
    print "Cache-Control: max-age=600\n";
This will make the script cacheable for 10 minutes after the request, so that if the user hits the "back" button, they won't be resubmitting the request.
The CGI specification also makes request headers that the client sends available in the environment of the script; each header has "HTTP_" prepended to its name. So, if a client makes an If-Modified-Since request, it will show up as HTTP_IF_MODIFIED_SINCE.
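As a rough sketch of how a CGI script might use that environment variable, the Python function below compares HTTP_IF_MODIFIED_SINCE with the time the content last changed and picks a status accordingly. The function name and the way `last_modified` is obtained are assumptions for this example; a real script would also print the headers and, for a 200 response, the body.

```python
import os
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_headers(environ, last_modified):
    """Return (status_line, header_lines) for a CGI response."""
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates have 1s resolution
    headers = ["Last-Modified: " + format_datetime(last_modified, usegmt=True)]
    since = environ.get("HTTP_IF_MODIFIED_SINCE")
    if since:
        try:
            if last_modified <= parsedate_to_datetime(since):
                return "Status: 304 Not Modified", headers
        except (TypeError, ValueError):
            pass  # unparseable date: fall through to a full response
    return "Status: 200 OK", headers


# `last_modified` must be a timezone-aware UTC datetime.
status, headers = conditional_headers(
    os.environ, datetime(1998, 6, 29, 2, 28, 12, tzinfo=timezone.utc))
print(status)
```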
See also the cgi_buffer library, which automatically handles ETag generation and validation, Content-Length generation and gzip content-coding for Perl and Python CGI scripts with a one-line include. The Python version can also be used to wrap arbitrary CGI scripts.
SERVER SIDE INCLUDES
SSI (often used with the extension .shtml) is one of the first ways that Web publishers were able to get dynamic content into pages. By using special tags in the pages, a limited form of in-HTML scripting was available.
Most implementations of SSI do not set validators, and as such are not cacheable. However, Apache's implementation does allow users to specify which SSI files can be cached, by setting the group execute permissions on the appropriate files, combined with the XBitHack full directive. For more information, see the mod_include documentation.
PHP
PHP is a server-side scripting language that, when built into the server, can be used to embed scripts inside a page's HTML, much like SSI, but with a far larger number of options. PHP can be used as a CGI script on any Web server (Unix or Windows), or as an Apache module.
By default, representations processed by PHP are not assigned validators, and are therefore uncacheable. However, developers can set HTTP headers by using the Header() function.
For example, a script can send a Cache-Control header, as well as an Expires header three days in the future.
Remember that the Header() function MUST come before any other output.
You'll have to create the HTTP date for an Expires header by hand; PHP doesn't provide a function to do it for you (although recent versions have made it easier; see PHP's date documentation). Of course, it's easy to set a Cache-Control: max-age header, which is just as good for most situations.
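To show what creating the HTTP date by hand involves, here is a small Python sketch (not PHP, and not the tutorial's own listing). The helper names http_date and expires_header are hypothetical; the name tables are spelled out so the result does not depend on the system locale.

```python
import time

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def http_date(epoch_seconds):
    """Format epoch seconds as an HTTP date, which is always expressed in GMT."""
    t = time.gmtime(epoch_seconds)
    return "%s, %02d %s %04d %02d:%02d:%02d GMT" % (
        DAYS[t.tm_wday], t.tm_mday, MONTHS[t.tm_mon - 1],
        t.tm_year, t.tm_hour, t.tm_min, t.tm_sec,
    )


def expires_header(days, now=None):
    """Return an Expires header line the given number of days after now."""
    if now is None:
        now = time.time()
    return "Expires: " + http_date(now + days * 24 * 60 * 60)
```

For instance, expires_header(3) gives a header three days in the future, the case described above.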
For more information, see the manual entry for header.
See also the cgi_buffer library, which automatically handles ETag generation and validation, Content-Length generation and gzip content-coding for PHP scripts with a one-line include.
COLD FUSION
Cold Fusion, by Macromedia, is a commercial server-side scripting engine, with support for several Web servers on Windows, Linux and several flavors of Unix.
Cold Fusion makes setting arbitrary HTTP headers relatively easy, with the CFHEADER tag. Unfortunately, their example for setting an Expires header is a bit misleading.
It doesn't work like you might think, because the time (in this case, when the request is made) doesn't get converted to an HTTP-valid date; instead, it just gets printed as a representation of Cold Fusion's Date/Time object. Most clients will either ignore such a value, or convert it to a default, like January 1, 1970.
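One way to catch this kind of mistake is to check a header value before sending it. The Python sketch below is illustrative only: looks_like_http_date is a hypothetical helper, it only checks the preferred fixed-length HTTP date format (not the older formats HTTP also accepts), and it does not verify that the day name matches the date.

```python
import re

# Pattern for the preferred HTTP date form, e.g. "Thu, 29 Oct 1998 17:04:19 GMT".
HTTP_DATE = re.compile(
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{2} "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} "
    r"\d{2}:\d{2}:\d{2} GMT"
)


def looks_like_http_date(value):
    """Return True if value has the shape of the preferred HTTP date format."""
    return HTTP_DATE.fullmatch(value) is not None
```

A value that is merely some printed form of a date object, rather than an HTTP date, fails this check.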
However, Cold Fusion does provide a date formatting function that will do the job: GetHttpTimeString. In combination with DateAdd, it's easy to set Expires dates, for instance to declare that representations of the page expire in one month.
You can also use the CFHEADER tag to set Cache-Control: max-age and other headers.
Remember that Web server headers are passed through in some deployments of Cold Fusion (such as CGI); check yours to determine whether you can use this to your advantage, by setting headers on the server instead of in Cold Fusion.
ASP AND ASP.NET
Active Server Pages, built into IIS and also available for other Web servers, also allows you to set HTTP headers. For instance, to set an expiry time, you can use the Expires property of the Response object, specifying the number of minutes from the request to expire the representation. Cache-Control headers can be added through the Response object as well.
When setting HTTP headers from ASPs, make sure you either place the Response method calls before any HTML generation, or use Response.Buffer to buffer the output. Also, note that some versions of IIS set a Cache-Control: private header on ASPs by default, and must be declared public to be cacheable by shared caches.
In ASP.NET, Response.Expires is deprecated; the proper way to set cache-related headers is with Response.Cache;
Response.Cache.SetExpires(DateTime.Now.AddMinutes(60));
Response.Cache.SetCacheability(HttpCacheability.Public);
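The difference between public and private responses matters to shared caches, and can be sketched as a simplified check in Python. The helper names parse_cache_control and shared_cache_may_store are hypothetical, and the logic is deliberately reduced: it ignores freshness, authentication and quoted values containing commas, all of which a real cache has to consider.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of lower-cased directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (such as "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(value):
    """Rough test: a shared cache must not store private or no-store responses."""
    directives = parse_cache_control(value)
    return "private" not in directives and "no-store" not in directives
```

Under this sketch, a response marked private is only usable by a browser cache, which is why declaring ASP output public matters for proxies.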
HTTP 1.1 SPECIFICATION
The HTTP 1.1 spec has many extensions for making pages cacheable, and is the authoritative guide to implementing the protocol. See sections 13, 14.9, 14.21, and 14.25.
WEB-CACHING.COM
An excellent introduction to caching concepts, with links to other online resources.
ON INTERPRETING ACCESS STATISTICS
Jeff Goldberg's informative rant on why you shouldn't rely on access statistics and hit counters.
REDBOT
Examines HTTP resources to determine how they will interact with Web caches, and generally how well they use the protocol.
CGI_BUFFER LIBRARY
One-line include in Perl CGI, Python CGI and PHP scripts automatically handles ETag generation and validation, Content-Length generation and gzip Content-Encoding, correctly. The Python version can also be used as a wrapper around arbitrary CGI scripts.
This document is Copyright 1998-2012 Mark Nottingham. This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.
All trademarks within are property of their respective holders.
Although the author believes the contents to be accurate at the time of publication, no liability is assumed for them, their application or any consequences thereof. If any misrepresentations, errors or other need for clarification is found, please contact the author immediately.
The latest revision of this document can always be obtained from http://www.mnot.net/cache_docs/
Translations are available in: Belarusian, Chinese, Czech, German, and French.
February 9, 2012