Cache Storage For the Next Billion Students: Anirudh Badam, Sunghwan Ihm Research Scientist: KyoungSoo Park Presenter: Vivek Pai Collaborator: Larry Peterson

Uploaded by erica-harton

Posted on 29-Mar-2015


TRANSCRIPT


Cache Storage For the Next Billion

Students: Anirudh Badam, Sunghwan Ihm

Research Scientist: KyoungSoo Park

Presenter: Vivek Pai

Collaborator: Larry Peterson


The Next Billion

Developing regions are not all alike
Many people have stable food, clean water, and reasonable power
Connectivity, however, is bad

A growing middle class wants education & technology
These people are the next billion


Bad Networking & Options

Africa is often backhauled through Europe
Satellite latency is no fun
Ghana: 2Mbps for $6000/month!

Emerging option: disk
A 1TB disk now costs $200
Even latency is better than satellite
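To put the slide's numbers side by side, the short script below estimates how long the 2Mbps Ghana link would take to deliver one disk's worth of data and what that link time would cost. It is a back-of-the-envelope sketch assuming decimal units (1TB = 10^12 bytes), a link running at full rate around the clock, and 30-day months.

```python
# Back-of-the-envelope comparison of link vs. shipped disk (illustrative only).
# Assumptions: 1 TB = 10**12 bytes, the 2 Mbps link runs flat out 24/7,
# and a month is 30 days.

LINK_BPS = 2_000_000           # Ghana link from the slide: 2 Mbps
LINK_COST_PER_MONTH = 6_000    # ...at $6,000/month
DISK_BYTES = 10**12            # 1 TB disk
DISK_COST = 200                # ...at $200
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30


def transfer_days(num_bytes: int, bits_per_second: int) -> float:
    """Days needed to move num_bytes over a fully utilized link."""
    return num_bytes * 8 / bits_per_second / SECONDS_PER_DAY


def link_cost(days: float) -> float:
    """Link rental for the given number of days, prorated from the monthly price."""
    return days / DAYS_PER_MONTH * LINK_COST_PER_MONTH


days = transfer_days(DISK_BYTES, LINK_BPS)
print(f"1 TB over 2 Mbps: {days:.1f} days, about ${link_cost(days):,.0f} of link time")
print(f"1 TB on a shipped disk: ${DISK_COST}")
```

Under those assumptions the link needs about 46 days and over $9,000 of rent to carry what a single $200 disk holds.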


Enter the Tiny Laptops

Problem: memory in the 256MB range


Making Storage Work

Populate disk with content
Preloaded HTTP cache
Preloaded WAN accelerator cache
Preloaded Web sites: Wikipedia, etc.

Ship disk to schools
Update as needed

Pull: update caches on demand during peak hours
Push: send updates off-peak, overnight


Deployment Scenarios

Special servers per school
2 per school for redundancy

Average school size: 100 students @ $100/laptop = $10K/school

Problems
2 servers @ $5K each double the per-school cost
Servers don’t ride the laptop commodity curves

Solution: no servers, just laptops


Goal: 1 TB Cache Store on a 256MB Laptop

Why caching?
Improves Web access
Improves WAN access

Problem
Large disks are really slow
Disk storage requires an index
In-memory indices optimize disk access


Memory Index Sizing

Squid: popular HTTP cache
72 bytes/object
Web objects average 8KB each
1TB = 125M objects
125M objects = 9GB of RAM just for the index

Commercial caches: better RAM usage
32 bytes/object
1TB disk = 4GB of RAM
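The slide's arithmetic, spelled out as a small script. The 8KB average object and the 72- and 32-byte entry sizes come from the slide; treating KB and TB as decimal units is an assumption, and it is what makes 1TB come out to exactly 125M objects.

```python
# Index RAM needed to cover a 1 TB disk, using the slide's per-object figures.
# Assumption: decimal units (1 TB = 10**12 bytes, 8 KB = 8,000 bytes).

DISK_BYTES = 10**12
AVG_OBJECT_BYTES = 8_000


def objects_on_disk(disk_bytes: int, avg_object_bytes: int) -> int:
    """Number of average-sized objects that fill the disk."""
    return disk_bytes // avg_object_bytes


def index_ram_bytes(num_objects: int, bytes_per_entry: int) -> int:
    """RAM consumed by an in-memory index with one fixed-size entry per object."""
    return num_objects * bytes_per_entry


n = objects_on_disk(DISK_BYTES, AVG_OBJECT_BYTES)
for name, per_object in [("Squid", 72), ("commercial", 32)]:
    gb = index_ram_bytes(n, per_object) / 10**9
    print(f"{name}: {gb:.0f} GB of RAM to index {n:,} objects")
```

Either figure is more than an order of magnitude above a 256MB laptop's entire memory.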


Revisiting Cache Indexing

Seek reduction is important
Most objects are small
Access is largely random

High insert rate
Assume a 50% hit rate
Assume a 50% cacheable rate
Insert rate = 25% of request rate

High delete rate
Caches are largely full
If insert rate = 25%, delete rate = 25%
Deletion via LRU, etc.
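The rate argument as a single equation. Here r is the request rate, h the hit rate, and c the fraction of misses that are cacheable; reading the "cacheable rate" as a fraction of misses is an assumption, since only misses produce new objects to insert. Once the cache is full, every insert forces roughly one deletion of a similar-sized object.

```latex
r_{\text{insert}} = r\,(1 - h)\,c = r\,(1 - 0.5)(0.5) = 0.25\,r,
\qquad
r_{\text{delete}} \approx r_{\text{insert}} = 0.25\,r \quad \text{(full cache, similar-sized objects)}
```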


Restarting the Design

Eliminate in-memory index
Treat disk like memory
Optimize data structures for locality
Use location-sensitive algorithms
Measure performance

Now consider what to add
For each addition, measure performance
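A minimal sketch of what "eliminate the in-memory index, treat disk like memory" can look like: the URL's hash alone picks a fixed disk block, so a lookup is one block read and RAM holds nothing per object. Everything here (the `DirectMappedStore` name, 8KB blocks, the 16-byte BLAKE2b digest, overwrite-on-collision) is a hypothetical choice for illustration, with a `bytearray` standing in for the raw disk.

```python
import hashlib
import struct
from typing import Optional

BLOCK_SIZE = 8192                 # assumed: one 8 KB disk block per object
HEADER = struct.Struct("<16sI")   # 16-byte URL digest + 4-byte payload length


class DirectMappedStore:
    """Hypothetical index-free store: the URL hash alone selects the disk block."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        # A bytearray stands in for the raw disk; no per-object state lives in RAM.
        self.disk = bytearray(num_blocks * BLOCK_SIZE)

    def _locate(self, url: str):
        digest = hashlib.blake2b(url.encode(), digest_size=16).digest()
        block = int.from_bytes(digest[:8], "little") % self.num_blocks
        return digest, block * BLOCK_SIZE

    def put(self, url: str, data: bytes) -> bool:
        if len(data) > BLOCK_SIZE - HEADER.size:
            return False  # object does not fit in one block; not handled in this sketch
        digest, offset = self._locate(url)
        # A hash collision simply overwrites the previous occupant.
        HEADER.pack_into(self.disk, offset, digest, len(data))
        start = offset + HEADER.size
        self.disk[start:start + len(data)] = data
        return True

    def get(self, url: str) -> Optional[bytes]:
        digest, offset = self._locate(url)
        stored_digest, length = HEADER.unpack_from(self.disk, offset)  # one "seek"
        if stored_digest != digest:
            return None  # empty block, or another URL owns it: a miss
        start = offset + HEADER.size
        return bytes(self.disk[start:start + length])


if __name__ == "__main__":
    store = DirectMappedStore(num_blocks=1024)   # an 8 MiB stand-in disk
    store.put("http://example.com/a", b"hello")
    print(store.get("http://example.com/a"))
    print(store.get("http://example.com/b"))
```

The price of zero RAM per object is that colliding URLs evict each other and there is no recency information; that is the kind of feature the "now consider what to add" step would weigh against its memory and seek cost.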


What This Yields

HashCache family
One basic storage engine
Pluggable algorithms & indexing

HashCache proxy
Web proxy using the HashCache engine
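A sketch of one way a single engine can expose indexing as a pluggable knob: keys hash to a small set of slots imagined to sit together on disk (one seek per set), eviction is LRU within the set, echoing the LRU deletion mentioned earlier, and an optional number of hash bits per object kept in RAM lets most misses skip the disk. The class, its parameters, and the Python dictionaries standing in for disk and RAM are all hypothetical; this is not the HashCache implementation.

```python
import hashlib
from collections import Counter, OrderedDict
from typing import Optional


class SetAssociativeStore:
    """Hypothetical set-associative cache with a tunable amount of RAM indexing.

    A key's hash picks one set of `ways` slots. Each set is imagined as one
    contiguous disk region, so checking a set costs one (simulated) seek.
    With hint_bits > 0, a few hash bits per stored object are mirrored in RAM.
    """

    def __init__(self, num_sets: int, ways: int, hint_bits: int = 0):
        self.num_sets, self.ways, self.hint_bits = num_sets, ways, hint_bits
        self.sets = [OrderedDict() for _ in range(num_sets)]  # "disk", LRU -> MRU order
        self.hints = [Counter() for _ in range(num_sets)]     # "RAM", hint -> count
        self.disk_reads = 0

    def _place(self, key: str):
        h = int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "little")
        set_index = h % self.num_sets
        hint = (h // self.num_sets) & ((1 << self.hint_bits) - 1)
        return set_index, hint

    def get(self, key: str) -> Optional[bytes]:
        s, hint = self._place(key)
        if self.hint_bits and self.hints[s][hint] == 0:
            return None               # RAM hints rule out a hit: no disk read
        self.disk_reads += 1          # read the whole set in one seek
        entries = self.sets[s]
        if key not in entries:
            return None
        entries.move_to_end(key)      # mark as most recently used
        return entries[key]

    def put(self, key: str, value: bytes) -> None:
        s, hint = self._place(key)
        entries = self.sets[s]
        if key not in entries:
            if len(entries) >= self.ways:
                victim, _ = entries.popitem(last=False)  # evict the set's LRU entry
                self.hints[s][self._place(victim)[1]] -= 1
            self.hints[s][hint] += 1
        entries[key] = value
        entries.move_to_end(key)
```

With `hint_bits=0` every lookup touches the disk but RAM holds nothing per object; raising `hint_bits` trades a few bits of memory per object for fewer seeks on misses.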


Performance Comparison


Index Bits Per Object

[Figure: bar chart of index bits per object; values shown: 240, 576]


Index Bits Per Object

[Figure: the same chart with additional bars; values shown (bits per object): 240, 576, 0, 11, 31, 39]
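The chart's values translate into capacity once an object size is fixed. The sketch below asks how much disk a given RAM budget can index at each nonzero bits-per-object value, for a 256MB laptop and for a 2GB index budget. It assumes 8KB average objects (from the "Memory Index Sizing" slide), decimal units, and that the whole budget goes to the per-object index; these are estimates under those assumptions, not the numbers plotted in the deck.

```python
# Back-of-the-envelope: disk capacity a RAM budget can index at a given
# number of index bits per object. Assumptions: 8 KB (8,000-byte) average
# objects, decimal units, and the whole budget spent on the per-object index.

AVG_OBJECT_BYTES = 8_000


def indexable_disk_tb(ram_bytes: int, bits_per_object: int) -> float:
    """Terabytes of 8 KB objects whose index entries fit in ram_bytes."""
    num_objects = ram_bytes * 8 // bits_per_object
    return num_objects * AVG_OBJECT_BYTES / 10**12


# 0-bit designs are left out: with no per-object RAM index, RAM does not bound them.
for bits in (576, 240, 39, 31, 11):
    small = indexable_disk_tb(256 * 10**6, bits)   # a 256 MB laptop
    large = indexable_disk_tb(2 * 10**9, bits)     # a 2 GB index budget
    print(f"{bits:>3} bits/object: 256 MB -> {small:6.2f} TB, 2 GB -> {large:6.1f} TB")
```

Under these assumptions only the lowest values (11 bits, or no RAM index at all) let 256MB cover 1TB. Which bar belongs to which system is not recoverable from this transcript.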


HashCache Memory


Storage Limits w/2GB Index


Beyond Diminishing Returns

HTTP cacheability has an upper limit
Beyond that, revalidating cached items helps
Revalidation on demand, or in the background
Uncached content is still cacheable

Wide-area accelerators
Must still contact servers, though
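A simplified sketch of the two revalidation modes: an on-demand check at request time, and a background pass that collects entries about to go stale so they can be refreshed off-peak. The freshness rule follows the general shape of HTTP caching (explicit `max-age` first, otherwise a heuristic lifetime; RFC 9111 suggests a fraction of the time since `Last-Modified`, with 10% as a typical setting). The `CachedEntry` type and both function names are hypothetical, and the age calculation ignores the `Age` header and clock skew.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CachedEntry:
    url: str
    stored_at: float                       # when the response was cached (epoch seconds)
    max_age: Optional[float] = None        # from Cache-Control: max-age, if any
    date: Optional[float] = None           # Date header, epoch seconds
    last_modified: Optional[float] = None  # Last-Modified header, epoch seconds


def freshness_lifetime(e: CachedEntry) -> float:
    if e.max_age is not None:
        return e.max_age
    if e.date is not None and e.last_modified is not None:
        # Heuristic freshness: 10% of the time since last modification.
        return max(0.0, 0.1 * (e.date - e.last_modified))
    return 0.0  # no freshness information: treat as stale


def needs_revalidation(e: CachedEntry, now: float) -> bool:
    """On-demand check (simplified age: time since the entry was stored)."""
    return now - e.stored_at >= freshness_lifetime(e)


def background_batch(entries: List[CachedEntry], now: float, horizon: float) -> List[str]:
    """URLs that will be stale within `horizon` seconds, for off-peak revalidation."""
    return [e.url for e in entries if needs_revalidation(e, now + horizon)]
```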


Why WAN Acceleration?

Lots of slowly changing data
Wikipedia
News sites
“Customized” sites

WAN acceleration middleboxes
Custom protocol between boxes
Standard protocols to the rest of the net
Less desirable than caches for the Web


WAN Acceleration Dilemma

WAN accelerators use chunks
Transit stream broken into chunks

Small chunks = high compression
Also lots of small objects

Large chunks = high performance
But worse for compression

Memory & disk are both important
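A minimal sketch of the chunking step, assuming content-defined boundaries found with a gear-style rolling hash (the slides do not say which fingerprint their accelerator uses; Rabin fingerprints are the classic alternative). The `avg_bits` knob sets the target chunk size, which is exactly the small-versus-large tradeoff above; the gear table seed, `min_size`, and `max_size` are illustrative assumptions.

```python
import random
from typing import List

# Hypothetical gear table: 256 pseudo-random 64-bit values (fixed seed).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data: bytes, avg_bits: int, min_size: int = 64, max_size: int = 1 << 16) -> List[bytes]:
    """Content-defined chunking with a gear rolling hash.

    A boundary is declared where the top avg_bits bits of the hash are zero,
    so chunks average roughly 2**avg_bits bytes beyond min_size.
    """
    shift = 64 - avg_bits
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64   # older bytes shift out after 64 steps
        size = i - start + 1
        if (size >= min_size and (h >> shift) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


if __name__ == "__main__":
    rng = random.Random(7)
    data = bytes(rng.getrandbits(8) for _ in range(100_000))
    edited = data[:50_000] + b"!" + data[50_000:]   # one-byte insertion
    for bits in (8, 12):
        before, after = chunk(data, bits), chunk(edited, bits)
        reused = len(set(before) & set(after))
        print(f"~{2**bits} B target: {len(before)} chunks, {reused} reused after the edit")
```

Because boundaries depend on content rather than offsets, an insertion only disturbs the chunks around it. Smaller targets find more redundancy but produce many more objects to index and fetch.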


Merging WAN Acc & HashCache

Easily index a huge number of chunks
Small chunks OK
Large chunks better

Store chunks redundantly
Optimize for performance & compression
Communicate tradeoffs to the cache layer
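A sketch of "store chunks redundantly" combined with a sender that exploits it: every piece of data is kept at two resolutions, and the encoder sends a single large-chunk reference when the peer has it (fewer lookups and disk reads), falling back to small-chunk references and then raw bytes. Fixed-size chunks, SHA-1 digests, the 256/1024-byte sizes, and the assumption that both ends hold identical caches are simplifications for illustration, not the system's actual protocol.

```python
import hashlib
import random
from typing import Dict, List, Set, Tuple

SMALL = 256           # hypothetical small chunk size, in bytes
LARGE = 4 * SMALL     # hypothetical large chunk: exactly four small chunks

Token = Tuple[str, bytes]   # ("large" | "small", digest) or ("raw", payload)


def digest(piece: bytes) -> bytes:
    return hashlib.sha1(piece).digest()


def remember(data: bytes, store: Dict[bytes, bytes]) -> None:
    """Store data redundantly at both resolutions, keyed by digest."""
    for size in (LARGE, SMALL):
        for i in range(0, len(data), size):
            piece = data[i:i + size]
            store[digest(piece)] = piece


def encode(data: bytes, peer_has: Set[bytes]) -> List[Token]:
    """Prefer one large reference; else small references; else raw bytes."""
    tokens: List[Token] = []
    for i in range(0, len(data), LARGE):
        big = data[i:i + LARGE]
        if digest(big) in peer_has:
            tokens.append(("large", digest(big)))
            continue
        for j in range(0, len(big), SMALL):
            small = big[j:j + SMALL]
            d = digest(small)
            tokens.append(("small", d) if d in peer_has else ("raw", small))
    return tokens


def decode(tokens: List[Token], store: Dict[bytes, bytes]) -> bytes:
    return b"".join(value if kind == "raw" else store[value] for kind, value in tokens)


if __name__ == "__main__":
    rng = random.Random(1)
    old = bytes(rng.getrandbits(8) for _ in range(4 * LARGE))  # a page seen earlier
    cache: Dict[bytes, bytes] = {}
    remember(old, cache)                       # both ends assumed to hold this cache
    new = old[:3000] + b"changed!" + old[3008:]
    tokens = encode(new, set(cache))
    print([kind for kind, _ in tokens])
    assert decode(tokens, cache) == new
```

Unchanged regions travel as one large reference each, while only the region around the edit drops down to small references and raw bytes, which is the performance/compression balance the slide describes.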


Deployments

Two cache instances deployed
Both in Africa
Shared machines, multiple services

Working with OLPC on deployment
Working on licensing
Hopefully resolved this year
Goal: all-in-one server for schools


Longer Term Goals

Effort started around server consolidation
Virtualization is nice, except for memory
Many apps are very page-fault sensitive
Extracting & sharing components is desirable

More work in developing regions
Even within the US: poor, rural areas, etc.
Customization for school-like workloads
More work on peak/off-peak behavior