Cache Storage For the Next Billion

Students: Anirudh Badam, Sunghwan Ihm

Research Scientist: KyoungSoo Park

Presenter: Vivek Pai

Collaborator: Larry Peterson

The Next Billion

Developing regions are not all alike
  Many people have stable food, clean water, reasonable power
  Connectivity, however, is bad

Growing middle class with desire for education & technology
  These people are the next billion


Bad Networking & Options

Africa often backhauled through Europe
  Satellite latency not fun
  Ghana: 2Mbps, $6000/month!

Emerging option: disk
  1TB disk now $200
  Even latency better than satellite


Enter the Tiny Laptops

Problem – memory in 256MB range


Making Storage Work

Populate disk with content
  Preloaded HTTP cache
  Preloaded WAN accelerator cache
  Preloaded Web sites – Wikipedia, etc

Ship disk to schools

Update as needed
  Pull update caches on-demand during peak
  Push updates off peak, overnight


Deployment Scenarios

Special servers per school
  2 for redundancy

Average school size: 100 students
  @ $100/laptop, $10K/school

Problems
  2 servers @ $5K doubles per-school cost
  Servers don’t ride laptop commodity curves

Solution: no servers, just laptops
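
The cost argument on this slide can be checked with a few lines of arithmetic. This is a minimal sketch using only the slide's figures; it assumes "$5K" is the price of each server, and the variable and function names are illustrative.

```python
# Figures from the slide; names are illustrative only.
STUDENTS_PER_SCHOOL = 100
LAPTOP_COST = 100        # dollars per laptop
SERVER_COST = 5_000      # dollars per server (assumed: "$5K" is per server)
SERVERS_PER_SCHOOL = 2   # two for redundancy


def school_cost(servers: int) -> int:
    """Total hardware cost in dollars for one school with `servers` servers."""
    return STUDENTS_PER_SCHOOL * LAPTOP_COST + servers * SERVER_COST


laptops_only = school_cost(0)
with_servers = school_cost(SERVERS_PER_SCHOOL)
print(f"laptops only: ${laptops_only}, with servers: ${with_servers}")
print(f"cost multiplier from adding servers: {with_servers / laptops_only:.1f}x")
```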


Goal: 1 TB Cache Store on a 256MB Laptop

Why caching?
  Improves Web access
  Improves WAN access

Problem
  Large disks are really slow
  Disk storage requires index
  In-memory indices optimize disk access


Memory Index Sizing

Squid: popular HTTP cache
  72 bytes/object
  Web objects average 8KB each
  1TB = 125M objects
  125M objects = 9GB RAM just for index

Commercial caches: better RAM usage
  32 bytes/object
  1TB disk = 4GB RAM
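
The sizing numbers above follow from one division and one multiplication. A minimal sketch, assuming decimal units (1TB = 10^12 bytes, 8KB = 8,000 bytes) so that the results match the slide's round figures; the function name is illustrative.

```python
TB = 10**12                 # decimal terabyte (assumption, matches the slide's rounding)
AVG_OBJECT_BYTES = 8_000    # "8KB" read as 8,000 bytes (assumption)


def index_ram_bytes(disk_bytes: int, bytes_per_entry: int,
                    avg_object_bytes: int = AVG_OBJECT_BYTES) -> int:
    """RAM needed for an in-memory index with one fixed-size entry per cached object."""
    num_objects = disk_bytes // avg_object_bytes
    return num_objects * bytes_per_entry


num_objects = TB // AVG_OBJECT_BYTES
print(f"objects on a 1TB disk: {num_objects / 1e6:.0f}M")
print(f"72 bytes/object index: {index_ram_bytes(TB, 72) / 1e9:.0f} GB")
print(f"32 bytes/object index: {index_ram_bytes(TB, 32) / 1e9:.0f} GB")
```

Either figure is far beyond the 256MB available on the target laptop, which is the motivation for rethinking the index.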


Revisiting Cache Indexing

Seek reduction important
  Most objects small
  Access largely random

High insert rate
  Assume hit rate is 50%
  Assume cachable rate is 50%
  Insert rate = 25% of request rate

High delete rate
  Caches largely full
  If insert rate = 25%, delete rate = 25%
  Deletion using LRU, etc
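
The 25% figures can be reproduced as below. This sketch reads "cachable rate" as the fraction of misses that can be stored, and assumes a full cache evicts roughly one object per insert; both readings are assumptions, and the function names are illustrative.

```python
def insert_rate(hit_rate: float, cacheable_rate: float) -> float:
    """Fraction of requests that cause an insert: misses that are cacheable."""
    return (1.0 - hit_rate) * cacheable_rate


def delete_rate(inserts: float, cache_full: bool = True) -> float:
    """In a full cache, each insert is assumed to evict about one object."""
    return inserts if cache_full else 0.0


inserts = insert_rate(hit_rate=0.5, cacheable_rate=0.5)
deletes = delete_rate(inserts)
print(f"insert rate: {inserts:.0%} of requests, delete rate: {deletes:.0%} of requests")
```

Under these assumptions half of all requests modify the index, so the index structure has to handle writes as cheaply as reads.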


Restarting the Design

Eliminate in-memory index
  Treat disk like memory
  Optimize data structures for locality
  Use location-sensitive algorithms
  Measure performance

Now consider what to add
  For each addition, measure performance
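
One way to read "eliminate in-memory index, treat disk like memory" is to use the disk file itself as a hash table: the hash of the key picks a fixed-size slot, and a lookup is a single seek plus a comparison. The sketch below is a hypothetical illustration of that idea, not the HashCache implementation; the class name, slot size, and header layout are all assumptions.

```python
import hashlib
import os
import struct
import tempfile

SLOT_SIZE = 4096                  # bytes per on-disk slot (hypothetical)
HEADER = struct.Struct("<20sI")   # SHA-1 digest of the key, then payload length
MAX_PAYLOAD = SLOT_SIZE - HEADER.size


class DiskHashStore:
    """Hypothetical store with no in-memory index: the file is the hash table."""

    def __init__(self, path: str, num_slots: int):
        self.num_slots = num_slots
        # Preallocate a fixed-size file of zero-filled slots.
        with open(path, "wb") as f:
            f.truncate(num_slots * SLOT_SIZE)
        self.f = open(path, "r+b")

    def _locate(self, key: bytes):
        """Map a key to (digest, byte offset of its slot)."""
        digest = hashlib.sha1(key).digest()
        slot = int.from_bytes(digest[:8], "little") % self.num_slots
        return digest, slot * SLOT_SIZE

    def put(self, key: bytes, value: bytes) -> None:
        if len(value) > MAX_PAYLOAD:
            raise ValueError("value too large for one slot")
        digest, offset = self._locate(key)
        self.f.seek(offset)
        # Overwriting the slot implicitly evicts whatever was stored there.
        self.f.write(HEADER.pack(digest, len(value)) + value)
        self.f.flush()

    def get(self, key: bytes):
        digest, offset = self._locate(key)
        self.f.seek(offset)
        stored_digest, length = HEADER.unpack(self.f.read(HEADER.size))
        if stored_digest != digest:
            return None   # empty slot, or a different key hashed here
        return self.f.read(length)

    def close(self) -> None:
        self.f.close()


path = os.path.join(tempfile.mkdtemp(), "cache.bin")
store = DiskHashStore(path, num_slots=1024)
store.put(b"http://example.org/a", b"hello")
print(store.get(b"http://example.org/a"))
print(store.get(b"http://example.org/missing"))
store.close()
```

The index costs no RAM here, at the price of one disk seek per lookup (including misses) and eviction by hash collision rather than by LRU. Those are the kinds of tradeoffs that the "for each addition, measure performance" step would then quantify.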


What This Yields

HashCache family
  One basic storage engine
  Pluggable algorithms & indexing

HashCache proxy
  Web proxy using HashCache engine


Performance Comparison


Index Bits Per Object

[Bar chart of index bits per object; values shown: 240, 576, 0, 0, 11, 31, 39]


HashCache Memory


Storage Limits w/2GB Index
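
The chart for this slide did not survive, but the relationship it describes can be sketched from numbers elsewhere in the deck. The code below assumes a 2GB (decimal) index budget, the 8KB average object size from the sizing slide, and the per-object index sizes listed on the "Index Bits Per Object" slide. It is an illustration of the arithmetic only, not a reproduction of the original chart.

```python
INDEX_BYTES = 2 * 10**9     # "2GB" read as decimal gigabytes (assumption)
AVG_OBJECT_BYTES = 8_000    # average Web object size from the sizing slide


def max_cache_bytes(bits_per_object: int):
    """Largest cache a 2GB index can cover; None when no index RAM is needed."""
    if bits_per_object == 0:
        return None   # the index imposes no limit; disk size is the only bound
    max_objects = INDEX_BYTES * 8 // bits_per_object
    return max_objects * AVG_OBJECT_BYTES


for bits in (576, 240, 39, 31, 11):
    print(f"{bits:3d} bits/object -> {max_cache_bytes(bits) / 1e12:.2f} TB")
```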


Beyond Diminishing Returns

HTTP cachability has upper limit
  Beyond that, revalidating items helps
  Revalidation on demand, or in background

Uncached content still cachable
  Wide-area accelerators
  Must still contact servers, though


Why WAN Acceleration?

Lots of slowly-changing data
  Wikipedia
  News sites
  “Customized” sites

WAN acceleration middleboxes
  Custom protocol between boxes
  Standard protocols to rest of net
  Less desirable than caches for Web


WAN Acceleration Dilemma

WAN accelerators use chunks
  Transit stream broken into chunks

Small chunks = high compression
  Also lots of small objects

Large chunks = high performance
  But worse for compression

Memory & disk important
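
The chunk-size tradeoff can be seen in a toy experiment. The sketch below uses fixed-size chunks and SHA-1 fingerprints as a simplification (deployed WAN accelerators often use content-defined chunk boundaries instead). It measures how many bytes must be resent after three single-byte edits, and how many chunks the receiver has to index. All names and sizes are illustrative assumptions.

```python
import hashlib
import random


def chunks(data: bytes, size: int):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def transfer_cost(old: bytes, new: bytes, chunk_size: int):
    """Return (bytes that must be sent, chunks to index) for `new`,
    given that the receiver already holds every chunk of `old`."""
    known = {hashlib.sha1(c).digest() for c in chunks(old, chunk_size)}
    new_chunks = chunks(new, chunk_size)
    sent = sum(len(c) for c in new_chunks
               if hashlib.sha1(c).digest() not in known)
    return sent, len(new_chunks)


# A 64 KiB pseudo-random "page", then a copy with three single-byte edits.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
edited = bytearray(old)
for pos in (1000, 20000, 50000):
    edited[pos] ^= 0xFF
new = bytes(edited)

for size in (256, 4096):
    sent, indexed = transfer_cost(old, new, size)
    print(f"chunk={size:4d}B  resent={sent:5d}B  chunks indexed={indexed}")
```

Small chunks resend far less data for the same edits but multiply the number of entries to index and look up, which is why both memory and disk behavior matter.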


Merging WAN Acc & HashCache

Easily index huge # chunks
  Small chunks OK
  Large chunks better

Store chunks redundantly
  Optimize for performance & compression
  Communicate tradeoffs to cache layer


Deployments

Two cache instances deployed
  Both in Africa
  Shared machines, multiple services

Working with OLPC on deployment
  Working on licensing
  Hopefully resolved this year
  Goal: all-in-one server for schools


Longer Term Goals

Effort started around server consolidation
  Virtualization nice, except for memory
  Many apps very page-fault sensitive
  Extracting & sharing components desirable

More work in developing regions
  Even within the US: poor, rural, etc
  Customization for school-like workloads
  More work on peak/off-peak behavior
