how the cloud works
![Page 1: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/1.jpg)
HOW THE CLOUD WORKS
Ken Birman
Cornell University
![Page 2: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/2.jpg)
Consider Facebook
Popular social networking site (currently blocked in China), with > 1B users, growing steadily
Main user interface: very visual, with many photos, icons, ways to comment on things, “like/dislike”
Rapid streams of updates
![Page 3: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/3.jpg)
Facebook Page
Page itself was downloaded from Facebook.com
  They operate many data centers, all can serve any user
Page was full of URLs
  Each URL triggered a further download. Many fetched photos or videos
User’s wall: a continuously scrolling information feed with data from her friends, news, etc
![Page 4: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/4.jpg)
Facebook image fetching architecture
[Figure: diagram with components labeled local cache, Akamai, Facebook Edge (Edge Cache), Facebook Resizer (Resizer Cache), Haystack, and Cloud]
![Page 5: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/5.jpg)
The system...
Operates globally, on every continent
Has hundreds of Facebook Edge sites
Dozens of Resizer locations just in the USA, many more elsewhere
A “few” Haystack systems for each continent
Close relationships with Akamai, other network providers
![Page 6: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/6.jpg)
Things to notice
The cloud isn’t “about” one big system
We see multiple systems that talk easily to one another, all using widely adopted standards (XML, HTML, MPLS...)
They play complementary roles
  Facebook deals with Akamai for caching of certain kinds of very popular images, AdNet and others for advertising, etc
And within the Facebook cloud are many, many interconnected subsystems
![Page 7: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/7.jpg)
Why so many caches?
To answer, need to start by understanding what a cache does
  A cache is a collection of web pages and images, fetched previously and then retained
  A request for an image already in the cache will be satisfied from the cache
Goal is to spread the work of filling the Facebook web page so widely that no single element could get overloaded and become a bottleneck
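As a rough illustration of how a chain of caches absorbs load, the sketch below walks a request through several layers and only contacts the backend when every layer misses. The class name `CacheLayer`, the `fetch` helper, and the layer names are assumptions made for this example and do not describe Facebook's actual software.

```python
from typing import Callable, Dict, List, Optional


class CacheLayer:
    """One cache in a chain (e.g. browser, edge, origin); names are illustrative."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[bytes]:
        value = self.store.get(key)
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value


def fetch(key: str, layers: List[CacheLayer], backend: Callable[[str], bytes]) -> bytes:
    """Try each layer in order; if every layer misses, ask the backend.

    Every layer that missed keeps a copy on the way back, so the next
    request for the same key is absorbed closer to the user.
    """
    missed: List[CacheLayer] = []
    for layer in layers:
        value = layer.get(key)
        if value is not None:
            break
        missed.append(layer)
    else:
        # No layer had the object: this is the only path that loads the backend.
        value = backend(key)
    for layer in missed:
        layer.store[key] = value
    return value
```

With two layers, the first request for a key misses everywhere and reaches the backend once; the second request is answered by the first layer and never travels further.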
![Page 8: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/8.jpg)
But why so many layers?
Akamai.com is a web company dedicated to caching for rapid data (mostly images) delivery
  If Facebook uses Akamai, why would Facebook ever need its own caches?
  Do the caches “talk to each other”? Should they?
To understand the cloud we should try to understand answers to questions like these
![Page 9: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/9.jpg)
Memcached: A popular cache
Stands for “In-Memory Caching Daemon”
A simple, very common caching tool
Each machine maintains its own (single) cache

    function get_foo(foo_id)
        # Look in the cache first, under an agreed key name
        foo = memcached_get("foo:" . foo_id)
        return foo if defined foo
        # Cache miss: fetch from the database, then remember the result
        foo = fetch_foo_from_database(foo_id)
        memcached_set("foo:" . foo_id, foo)
        return foo
    end
![Page 10: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/10.jpg)
Should we use Memcached everywhere?
Cached data can be stale (old/incorrect)
  A cache is not automatically updated when data changes; need to invalidate or update the entry yourself
  And you may have no way to know that the data changed
When a cache gets full, we must evict less popular content. What policy will be used? (Memcached: LRU)
When applications (on one machine) share Memcached, they need to agree on the naming rule they will use for content
  Otherwise could end up with many cached copies of Angelina Jolie and Brad Pitt, “filling up” the limited cache space
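The three concerns on this slide (manual invalidation, LRU eviction, and a shared naming rule) can be sketched in a few lines. This is a toy in-process cache written for illustration; it is not Memcached's implementation, and `LRUCache` and `photo_key` are hypothetical names.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    """A tiny least-recently-used cache (illustrative, not Memcached itself)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def set(self, key: str, value: Any) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

    def invalidate(self, key: str) -> None:
        # The application must call this itself when the underlying data changes.
        self.entries.pop(key, None)


def photo_key(person: str, size: str) -> str:
    """Hypothetical shared naming rule: every application builds keys this way."""
    return f"photo:{person.strip().lower()}:{size}"
```

If two applications both build their keys through `photo_key`, a photo of the same person at the same size occupies one slot instead of several, which is the point of agreeing on a naming rule.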
![Page 11: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/11.jpg)
Fun with Memcached
There are systems built on the same in-memory caching idea as Memcached that have become very important
  Berkeley Spark system is a good example
  Spark ≈ Memcached-style caching + a nice rule for naming what the cache contains
Spark approach focuses on in-memory caching on behalf of the popular MapReduce/Hadoop computing tool
![Page 12: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/12.jpg)
MapReduce/Hadoop
Used when searching or indexing very large data sets constructed from collections of web data
  For example, all the web pages on the Internet
  Or all the friending relationships in all of Facebook
Idea is to spread the data over many machines, then run highly parallel computations on unchanging data
  The actual computing tends to be simple programs
  Map step: spreads computing out. Reduce: combines intermediary results. Result aggregated with exactly one copy of each intermediary output
Often iterative: second step depends on output of first step
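A single-process word count shows the shape of the map and reduce steps. The function names here are chosen for this sketch and are not Hadoop's API; in a real deployment each `map_step` call would run on a different machine and the grouping would happen over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_step(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key so each key is reduced exactly once."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all intermediate values for one key."""
    return word, sum(counts)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each document is mapped independently, which is what makes the step parallel.
    intermediate = [pair for doc in documents for pair in map_step(doc)]
    return dict(reduce_step(w, c) for w, c in shuffle(intermediate).items())
```

Calling `word_count(["the cloud", "The cloud works"])` counts "the" and "cloud" twice each and "works" once.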
![Page 13: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/13.jpg)
Spark: Memcached for MapReduce
Spark developers reasoned that if MapReduce uses files for the intermediary results, file I/O would be a performance-limiting cost
  Confirmed this using experiments
  It also turned out that many steps recompute nearly the identical thing (for example by counting words in a file)
In-memory caching (as in Memcached) can help...
  if MapReduce can find the precomputed results
  and if we “place” tasks to run where those precomputed results are likely to be found
![Page 14: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/14.jpg)
Spark “naming convention”
Key idea in Spark: rather than name intermediate results using URLs or file names, they use the “function that produced the result”
  Represented in a functional programming notation based on the Microsoft LINQ syntax
  In effect: “This file contains ((X))”
  Since the underlying data is unchanging, e.g. a file, “X” has same meaning at all times
  Thus ((X)) has a fixed meaning too
By cleverly keeping subresults likely to be reused, Spark obtained huge speedups, often 1000x or more!
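The idea of naming a result by the computation that produced it can be sketched with nested tuples as cache keys. This is only an illustration of the naming rule; `LineageCache` is a hypothetical class and does not reflect Spark's real API or internals.

```python
from typing import Any, Callable, Dict, Tuple


class LineageCache:
    """Caches results under a key that records how each result was computed."""

    def __init__(self) -> None:
        self.results: Dict[Tuple, Any] = {}
        self.evaluations = 0  # number of real computations performed

    def apply(self, func_name: str, func: Callable[[Any], Any],
              source_key: Tuple, source_value: Any) -> Tuple[Tuple, Any]:
        """Return (key, func(source_value)), reusing any result with the same lineage.

        The key is (func_name, source_key). Because the input is assumed to be
        immutable, the same key always denotes the same value.
        """
        key = (func_name, source_key)
        if key not in self.results:
            self.evaluations += 1
            self.results[key] = func(source_value)
        return key, self.results[key]
```

Starting from a source key such as `("load", "pages.txt")` (a made-up file name), applying "split" and then "count" yields the key `("count", ("split", ("load", "pages.txt")))`. Asking for the same count again finds that key in the cache and skips the computation.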
![Page 15: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/15.jpg)
Spark has become very important
While idea is simple, providing a full range of control over what is in the cache and when it is searched and when things are evicted is complex
  Spark functionality was initially very limited but has become extremely comprehensive
  There is a user community of tens of thousands
In the cloud, when things take off, they “go viral”! … even software systems
![Page 16: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/16.jpg)
Key insight
In-memory caching can have huge and very important performance implications
Caching, in general, is of vital importance in the cloud, where many computations run at high load and data rates and immense scale
But not every cache works equally well!
![Page 17: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/17.jpg)
Back to Facebook
Seeing how important Spark became for MapReduce, we can ask questions about the Facebook caching strategy
  Are these caches doing a good job? What hit rate do they achieve?
  Should certain caches “focus” on retaining only certain kinds of data, while other caches specialize in other kinds of data?
  When a Facebook component encounters a photo or video, can we “predict” the likely value of caching a copy?
![Page 18: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/18.jpg)
Using Memcached in a pool of machines
Facebook often has hundreds or thousands of machines in one spot, each can run Memcached
They asked: why not share the cache data?
  Leads to a distributed cache structure
  They built one using ideas from research experiences
A distributed hash table offers a simple way to share data in a large collection of caches
![Page 19: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/19.jpg)
How it works
We all know how a HashMap or HashTable works in a language like Java or C# or C++
  You take the object you want to save and compute a HashCode for it. This is an integer and will look “random” but is deterministic for any single object
  For example, it could be the XOR of the bytes in a file
Hashcodes are designed to spread data very evenly over the range of possible integers
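To make the two properties concrete (deterministic, and evenly spread), the sketch below compares the slide's XOR example with a hash code cut from SHA-256. Using SHA-256 is a choice made for this illustration, not something the slides prescribe. Python's built-in `hash()` is avoided because, for strings, its value changes between program runs by default.

```python
import hashlib


def xor_hash(data: bytes) -> int:
    """The slide's toy example: XOR of all bytes (result is only 0..255)."""
    result = 0
    for byte in data:
        result ^= byte
    return result


def spread_hash(data: bytes) -> int:
    """A better-spread 32-bit hash code: the first 4 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
```

Both functions are deterministic, but `xor_hash` can only produce 256 distinct values and ignores byte order (`b"ab"` and `b"ba"` collide), so it spreads data poorly. `spread_hash` uses the full 32-bit range.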
![Page 20: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/20.jpg)
Network communication
It is easy for a program on biscuit.cs.cornell.edu to send a message to a program on “jam.cs.cornell.edu”
  Each program sets up a “network socket”
  Each machine has an IP address, you can look them up and programs can do that too via a simple Java utility
  Pick a “port number” (this part is a bit of a hack)
  Build the message (must be in binary format)
  Java utils has a request
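The steps on this slide (look up an address, pick a port, build a binary message, send it) can be tried on a single machine over the loopback interface. The slide refers to Java utilities; this sketch uses Python's standard `socket` and `struct` modules instead. The message layout (2-byte key length, key bytes, 4-byte value) and all function names are assumptions invented for the example.

```python
import socket
import struct
import threading


def encode_message(key: str, value: int) -> bytes:
    """Binary layout (assumed): 2-byte key length, key bytes, 4-byte signed value."""
    raw = key.encode("utf-8")
    return struct.pack("!H", len(raw)) + raw + struct.pack("!i", value)


def decode_message(data: bytes):
    (length,) = struct.unpack("!H", data[:2])
    key = data[2:2 + length].decode("utf-8")
    (value,) = struct.unpack("!i", data[2 + length:6 + length])
    return key, value


def recv_exact(conn: socket.socket, count: int) -> bytes:
    """Read exactly count bytes; TCP may deliver a message in pieces."""
    chunks = b""
    while len(chunks) < count:
        piece = conn.recv(count - len(chunks))
        if not piece:
            raise ConnectionError("peer closed the connection early")
        chunks += piece
    return chunks


def serve_once(server: socket.socket, received: list) -> None:
    """Accept one connection, read one message, and acknowledge it."""
    conn, _addr = server.accept()
    with conn:
        header = recv_exact(conn, 2)
        (length,) = struct.unpack("!H", header)
        body = recv_exact(conn, length + 4)
        received.append(decode_message(header + body))
        conn.sendall(b"OK")


def send_message(host: str, port: int, key: str, value: int) -> bytes:
    address = socket.gethostbyname(host)  # look up the machine's IP address
    with socket.create_connection((address, port)) as conn:
        conn.sendall(encode_message(key, value))
        return recv_exact(conn, 2)


def demo() -> tuple:
    received: list = []
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server, received))
    worker.start()
    reply = send_message("127.0.0.1", port, "ken", 2110)
    worker.join()
    server.close()
    return reply, received
```

Running `demo()` starts a one-shot server thread, sends the pair ("ken", 2110) to it, and returns the acknowledgement together with what the server decoded.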
![Page 21: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/21.jpg)
Distributed Hash Tables
It is easy for a program on biscuit.cs.cornell.edu to send a message to a program on “jam.cs.cornell.edu”
... so, given a key and a value
  1. Hash the key
  2. Find the server that “owns” the hashed value
  3. Store the key,value pair in a “local” HashMap there
To get a value, ask the right server to look up key
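The three steps can be sketched in one process, with each "server" represented by an object holding its own local map. In a real DHT the `put` and `get` calls would travel over the network. `Server`, `SimpleDHT`, and `stable_hash` are names invented for this sketch, and the ownership rule (hash modulo the number of servers) is one simple possibility.

```python
import hashlib
from typing import Dict, List, Optional


def stable_hash(text: str) -> int:
    """Deterministic hash code (Python's built-in hash() of a str varies between runs)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class Server:
    """Stands in for one machine; a real DHT would reach it over the network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_map: Dict[str, object] = {}


class SimpleDHT:
    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers

    def owner(self, key: str) -> Server:
        # Steps 1 and 2: hash the key, then pick the server that owns that hash.
        return self.servers[stable_hash(key) % len(self.servers)]

    def put(self, key: str, value: object) -> None:
        # Step 3: store the pair in the owner's local map.
        self.owner(key).local_map[key] = value

    def get(self, key: str) -> Optional[object]:
        return self.owner(key).local_map.get(key)
```

One caveat of the modulo rule: when the number of servers changes, most keys map to a different owner, so a cache built this way loses much of its content after a membership change.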
![Page 22: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/22.jpg)
List of machines
There are several ways to track the machines in a network. Facebook just maintains a table
  In each FB data center there is a big table of all machines currently in use
  Every machine has a private copy of this table, and if a machine crashes or joins, it is quickly updated (within seconds)
Can we turn our table of machines into a form of HashMap?
![Page 23: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/23.jpg)
From a table of machines to a DHT
Take the healthy machines
Compute the HashCode for each using its name, or ID
These are integers in range [Int.MinValue, Int.MaxValue]
Rule: an object with HashCode (O) will be placed on the K machines closest to (O)
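The placement rule can be sketched as follows. Two details are assumptions of this sketch and are not stated on the slide: hash codes are taken as unsigned 32-bit integers (the slide uses the signed range), and "closest" is measured around a circle so that the two ends of the range are neighbors.

```python
import hashlib
from typing import List

HASH_BITS = 32
HASH_SPACE = 1 << HASH_BITS


def hash_code(name: str) -> int:
    """Deterministic 32-bit hash code for a machine name or object key."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:4], "big")


def ring_distance(a: int, b: int) -> int:
    """Distance between two hash codes, treating the range as a circle (an assumption)."""
    diff = abs(a - b)
    return min(diff, HASH_SPACE - diff)


def replicas_for(key: str, healthy_machines: List[str], k: int) -> List[str]:
    """Return the k healthy machines whose hash codes are closest to the key's."""
    target = hash_code(key)
    ranked = sorted(healthy_machines,
                    key=lambda m: (ring_distance(hash_code(m), target), m))
    return ranked[:k]
```

Because every machine holds the same table of healthy machines, each one computes the same answer locally. If the closest machine crashes and is removed from the table, the remaining replicas keep their role and the next-closest machine joins the set.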
![Page 24: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/24.jpg)
Side remark about tracking members
This Facebook approach uses a “group” of machines and a “view” of the group
  We will make use of this in later lectures too
  But it is not the only way! Many DHTs track just log(N) of the members and build a routing scheme that takes log(N) “hops” to find an object (See: Chord, Pastry...)
The FB approach is an example of a 1-hop DHT
Cloud systems always favor 1-hop solutions if feasible
![Page 25: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/25.jpg)
Distributed Hash Tables
[Figure: four servers (123.45.66.781, 123.45.66.782, 123.45.66.783, 123.45.66.784) labeled with IP.hashcode()%N values 77, 98, 13 and 175. dht.Put(“ken”, 2110) and dht.Get(“ken”) both compute “ken”.hashcode()%N = 77, and the pair (“ken”, 2110) is shown in the hashmap kept by 123.45.66.782]
![Page 26: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/26.jpg)
Facebook image “stack”
We decided to study the effectiveness of caching in the FB image stack, jointly with Facebook researchers
This stack’s role is to serve images (photos, videos) for FB’s hundreds of millions of active users
  About 80B large binary objects (“blobs”) / day
  FB has a huge number of big and small data centers
    “Point of presence” or PoP: some FB-owned equipment normally near the user
    Akamai: A company FB contracts with that caches images
    FB resizer service: caches but also resizes images
    Haystack: inside data centers, has the actual pictures (a massive file system)
![Page 27: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/27.jpg)
What we instrumented in the FB stack
Think of Facebook as a giant distributed HashMap
  Key: photo URL (id, size, hints about where to find it...)
  Value: the blob itself
![Page 28: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/28.jpg)
Facebook traffic for a week
Client activity varies daily....
... and different photos have very different popularity statistics
![Page 29: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/29.jpg)
Facebook cache effectiveness
Existing caches are very effective...
... but different layers are more effective for images with different popularity ranks
![Page 30: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/30.jpg)
Facebook cache effectiveness
Each layer should “specialize” in different content.
Photo age strongly predicts effectiveness of caching
![Page 31: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/31.jpg)
Hypothetical changes to caching?
We looked at the idea of having Facebook caches collaborate at national scale…
… and also at how to vary caching based on the “busyness” of the client
![Page 32: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/32.jpg)
Social networking effect?
Hypothesis: caching will work best for photos posted by famous people with zillions of followers
Actual finding: not really
![Page 33: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/33.jpg)
Locality?
Hypothesis: FB probably serves photos from close to where you are sitting
Finding: Not really...
… just the same, if the photo exists, it finds it quickly
![Page 34: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/34.jpg)
Can one conclude anything?
By learning what patterns of access arise, and how effective it is to cache given kinds of data at various layers, we can customize cache strategies
Each layer can look at an image and ask “should I keep a cached copy of this, or not?”
Smart decisions make Facebook more effective!
![Page 35: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/35.jpg)
Strategy varies by layer
Browser should cache less popular content but not bother to cache the very popular stuff
Akamai/PoP layer should cache the most popular images, etc...
We also discovered that some layers should “cooperatively” cache even over huge distances
  Our study discovered that if this were done in the resizer layer, cache hit rates could rise 35%!
![Page 36: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/36.jpg)
… many research questions arise
Can we design much better caching solutions?
Are there periods with bursts of failures? What causes them and what can be done?
How much of the data in a typical cache gets reused? Are there items dropped from cache that should have been retained?
![Page 37: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/37.jpg)
Overall picture in cloud computing
Facebook example illustrates a style of working
  Identify high-value problems that matter to the community because of the popularity of the service, the cost of operating it, the speed achieved, etc
  Ask how best to solve those problems, ideally using experiments to gain insight
  Then build better solutions
![Page 38: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/38.jpg)
Learning More?
We have a paper with more details and data in the 2013 edition of the ACM Symposium on Operating Systems Principles (SOSP)
  First author is Qi Huang, a Chinese student who created the famous PPLive system, was studying for his PhD at WUST, then came to Cornell to visit
  Qi will eventually earn two PhD degrees! One awarded by Cornell, one by WUST after he finishes
  A very amazing and talented cloud computing researcher
![Page 39: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/39.jpg)
More about caching
Clearly, caching is central to modern cloud computing systems!
But the limitation that we are caching static data is worrying
  MapReduce/Hadoop use purely static data
  FB images and video are static data too
  But in general, cloud computing will have very dynamic kinds of data, rapidly changing
![Page 40: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/40.jpg)
Coherent Cache
We say that a cache is coherent if it is always a perfect real-time replica of the “true” data
  True object could be in a database or file system
  Or we could dispense with the true object and use only the in-memory versions. In this case the cache isn’t really a cache but is actually an in-memory replication scheme
In the cloud, file system access is too slow!
  So we should learn more about coherent caching
![Page 41: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/41.jpg)
What could a coherent cache hold?
A standard cache just has “objects” from some data structure
A coherent cache could hold the entire data structure!
  A web graph with pages and links
  A graph of social network relationships, Twitter feeds and followers, etc
  Objects on a Beijing street, so that a self-driving car can safely drive to a parking area, park itself, and drive back later to pick you up
![Page 42: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/42.jpg)
Coherent data replication
Clearly we will need to spread our data widely: “Partitioning” is required
  We partition data in space (like with a DHT)
  Also in time (e.g. version of the database at time T, T+1, …)
  Sometimes hierarchically (e.g. users from the US North East, US Central, US North West…)
Famous paper by Jim Gray & others: The Dangers of Replication and a Solution
  Shows that good partitioning functions are critically needed
  Without great partitioning, replication slows a system down!
![Page 43: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/43.jpg)
Aspects of coherent replication
A partitioning method
Many servers, small subsets for each partition (“shard” has become the common term)
Synchronization mechanisms for conflicting actions
A method for updating the data
A way to spread the “read only” work over the replicas
Shard membership tracking
Handling of faults that cause whole shard to crash
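A few of these aspects (partitioning, small replica groups per shard, an update method, and spreading read-only work) can be sketched in one process. This is a deliberately simplified illustration with hypothetical class names. It omits synchronization of conflicting updates, membership tracking, and fault handling, which are the hard parts the list points to.

```python
import hashlib
import itertools
from typing import Dict, List, Optional


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.reads = 0


class Shard:
    """A small group of replicas that all hold the same partition of the data."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self._next_reader = itertools.cycle(replicas)

    def update(self, key: str, value: str) -> None:
        # Simplification: apply the write to every replica before returning.
        # A real system also needs ordering, locking, and failure handling.
        for replica in self.replicas:
            replica.data[key] = value

    def read(self, key: str) -> Optional[str]:
        replica = next(self._next_reader)  # spread read-only work round-robin
        replica.reads += 1
        return replica.data.get(key)


class ShardedStore:
    def __init__(self, shard_count: int, replicas_per_shard: int) -> None:
        self.shards = [
            Shard([Replica(f"shard{s}-replica{r}") for r in range(replicas_per_shard)])
            for s in range(shard_count)
        ]

    def shard_for(self, key: str) -> Shard:
        # Partitioning function: a stable hash of the key picks the shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:4], "big") % len(self.shards)]

    def update(self, key: str, value: str) -> None:
        self.shard_for(key).update(key, value)

    def read(self, key: str) -> Optional[str]:
        return self.shard_for(key).read(key)
```

Because `update` finishes only after every replica in the shard has the new value, any replica chosen for a later read returns the latest write. That property holds in this single-threaded sketch; preserving it with concurrent writers and crashes requires real synchronization and membership protocols.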
![Page 44: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/44.jpg)
Does Facebook have coherency?
Experiments reveal answer: “no”
  Create a Facebook account and put many images on it
  Then form networks with many friends
  Now update your Facebook profile image a few times
Your friends may see multiple different images of you on their “wall” for a long period of time!
  This reveals that it takes a long time (hours) for old data to clear from the FB cache hierarchy!
![Page 45: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/45.jpg)
Inconsistencies in the cloud
![Page 46: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/46.jpg)
In fact this is common in today’s cloud
We studied many major cloud providing systems
Some guarantee coherency for some purposes but the majority are at best weakly coherent
  When data changes they need a long time to reflect the updates
  They cache data heavily and don’t update it promptly
![Page 47: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/47.jpg)
CAP Theorem
Proposed by Berkeley Professor Eric Brewer
  “You can have just 2 of Consistency, Availability, and Partition Tolerance”
  He argues that consistency is the guarantee to relax
  We will look at this more closely later
Many people adopt his CAP-based views
  This justifies non-coherent caching
  But their systems can’t solve problems in ways guaranteed to be safe
![Page 48: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/48.jpg)
High assurance will need more!
Remember that we are interested in the question of how one could create a high assurance cloud!
Such a cloud needs to make promises
  If a car drives on a street it must not run over people
  If a power system reconfigures it must not explode the power generators
  A doctor who uses a hospital computing system needs correct and current data
![Page 49: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/49.jpg)
So… we must look at coherent caching
In fact we will focus on “data replication”
  Data that should be in memory, for speed
  With a few shard members holding the information
  But with guarantees: if you compute using it, the data is current
Why not a full database?
  Our coherent replication methods would live in structures like the Facebook infrastructure: big, complex
  We need to build these in ways optimized to the uses
  Hence databases might be elements but we can’t just hand the whole question to a database system
![Page 50: How The Cloud Works](https://reader035.vdocuments.mx/reader035/viewer/2022062815/56816915550346895de03118/html5/thumbnails/50.jpg)
Summary
We looked at the architecture of a typical very large cloud computing system (Facebook)
We saw that it uses caching extremely aggressively
Caching is fairly effective, but could improve
Coherent caching is needed in high assurance systems but seems to be a much harder challenge