distributed systemspages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server...

50
DISTRIBUTED SYSTEMS Shivaram Venkataraman CS 537, Spring 2019

Upload: others

Post on 29-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

DISTRIBUTED SYSTEMS

Shivaram Venkataraman CS 537, Spring 2019

Page 2: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

ADMINISTRIVIA

Project 5: Due April 29. Last Project! Final Exam: Everything before the last lecture Discussion today: Worksheet on topics after midterm Peer mentors for next semester! https://forms.gle/h7zXQidTP4QxiwVD8

Page 3: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

AGENDA / LEARNING OUTCOMES

What are the design principles for systems that operate across machines? How to handle partial failures?

Page 4: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

RECAP

Page 5: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

OPERATING SYSTEMS: THREE EASY PIECES

Three conceptual pieces

1. Virtualization

2. Concurrency

3. Persistence

Page 6: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

DISTRIBUTED SYSTEMS

Page 7: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

HOW DOES GOOGLE SEARCH WORK?

Page 8: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

What is a Distributed System?

A distributed system is one where a machine I’ve never heard of can cause my program to fail.— Leslie Lamport

Definition: More than one machine working together to solve a problem

Examples: –  client/server: web server and web client–  cluster: page rank computation

Page 9: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Why Go Distributed?

More computing power More storage capacity Fault tolerance Data sharing

Page 10: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

New Challenges

System failure: need to worry about partial failure

Communication failure: links unreliable-  bit errors-  packet loss-  node/link failure

Why are network sockets less reliable than pipes?

Page 11: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Communication Overview

Raw messages: UDP Reliable messages: TCP Remote procedure call: RPC

Page 12: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Raw Messages: UDP

UDP : User Datagram Protocol  API: - reads and writes over socket file descriptors - messages sent from/to ports to target a process on machine Provide minimal reliability features: - messages may be lost - messages may be reordered - messages may be duplicated - only protection: checksums to ensure data not corrupted

Page 13: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Raw Messages: UDP

Advantages –  Lightweight –  Some applications make better reliability decisions themselves (e.g., video

conferencing programs) Disadvantages –  More difficult to write applications correctly

Page 14: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Reliable Messages: Layering strategy

TCP: Transmission Control Protocol Using software to build

reliable logical connections over unreliable physical connections

Page 15: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Technique #1: ACK

Sender [send message] [recv ack]

Receiver [recv message] [send ack]

Ack: Sender knows message was received What to do about message loss?

Page 16: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Technique #2: Timeout

Sender [send message] [start timer] … waiting for ack … [timer goes off] [send message] [recv ack]

Receiver [recv message] [send ack]

Page 17: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

TIMEOUT

How long to wait? Too long? –  System feels unresponsive

Too short? –  Messages needlessly re-sent –  Messages may have been dropped due to overloaded server. Resending makes

overload worse!

Page 18: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

LOST ACK PROBLEM

Sender [send message] [timeout] [send message] [recv ack]

Receiver [recv message] [send ack] [ignore message] [send ack]

Page 19: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

SEQUENCE NUMBERS

Sequence numbers - senders gives each message an increasing unique seq number - receiver knows it has seen all messages before N

Suppose message K is received. - if K <= N, Msg K is already delivered, ignore it - if K = N + 1, first time seeing this message - if K > N + 1 ?

Page 20: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

TCP

TCP: Transmission Control Protocol Most popular protocol based on seq nums Buffers messages so arrive in order Timeouts are adaptive

Page 21: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Communications Overview

Raw messages: UDP Reliable messages: TCP Remote procedure call: RPC

Page 22: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

RPC

Remote Procedure Call

What could be easier than calling a function?

Approach: create wrappers so calling a function on another machine feels just like calling a local function!

Page 23: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

RPC int main(…) {

int x = foo(”hello”); } int foo(char *msg) {

send msg to B recv msg from B

}

Machine A int foo(char *msg) {

… } void foo_listener() {

while(1) { recv, call foo }

}

Machine B

Page 24: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

RPC int main(…) {

int x = foo(”hello”); } int foo(char *msg) {

send msg to B recv msg from B

}

Machine A int foo(char *msg) {

… } void foo_listener() {

while(1) { recv, call foo }

}

Machine B

client wrapper

server wrapper

Page 25: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

RPC Tools

RPC packages help with two components(1) Runtime library

–  Thread pool–  Socket listeners call functions on server

(2) Stub generation–  Create wrappers automatically–  Many tools available (rpcgen, thrift, protobufs)

Page 26: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Wrapper Generation

Wrappers must do conversions: - client arguments to message - message to server arguments - convert server return value to message - convert message to client return value Need uniform endianness (wrappers do this) Conversion is called marshaling/unmarshaling, or serializing/deserializing

Page 27: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Wrapper Generation: Pointers

Why are pointers problematic? Address passed from client not valid on server Solutions? Smart RPC package: follow pointers and copy data

Page 28: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Sender [call] [tcp send] [recv] [ack]

Receiver [recv] [ack] [exec call] … [return] [tcp send]

RPC over TCP?

Why wasteful?

Page 29: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

RPC over UDP

Strategy: use function return as implicit ACK Piggybacking technique What if function takes a long time? then send a separate ACK

Sender [call] [udp send] [recv]

Receiver [recv] [exec call] … [return] [tcp send]

Page 30: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

BREAK! NO QUIZ!

Course feedback: https://aefis.wisc.edu

Page 31: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

DISTRIBUTED SYSTEMS IN PRACTICE

Page 32: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

DISTRIBUTED SYSTEMS

Classic systems, algorithms

Grapevine: An exercise in distributed computing Andrew Birrell Roy Levin Roger M. Needham Mike Schroeder Communications of the ACM | April 1982, Vol 25

Time, Clocks, and the Ordering of Events in a Distributed System L. Lamport Communications of the ACM Vol. 21, No. 7 (July 1978)

CS 739

Page 33: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Google 1997

Page 34: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Data, Data, Data

“…Storage space must be used efficiently to store indices and, optionally, the documents themselves. The indexing system must process hundreds of gigabytes of data efficiently…”

Page 35: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Commodity CPUs Lots of disks Low bandwidth network

Google 2001

Cheap !

Page 36: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Datacenter Evolution

Facebook’s daily logs: 60 TB Google web index: 10+ PB

0

5

10

15

2010 2011 2012 2013 2014 2015

Moore's Law

Overall Data

(IDC report*)

Page 37: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Datacenter Evolution

Google data centers in The Dalles, Oregon

Page 38: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Datacenter Evolution

Capacity: ~10000 machines

Bandwidth: 12-24 disks per node

Latency: 256GB RAM cache

Page 39: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing
Page 40: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Jeff Dean @ Google

Page 41: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

MAPREDUCE

Page 42: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

PROGRAMMING MODEL

Data type: Each record is (key, value) Map function:

(Kin, Vin) à list(Kinter, Vinter)

Reduce function: (Kinter, list(Vinter)) à list(Kout, Vout)

Page 43: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Example: Word Count

def mapper(line): for word in line.split(): output(word, 1) def reducer(key, values): output(key, sum(values))

Page 44: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Word Count Execution

the quick brown fox

the fox ate the mouse

how now brown cow

Map

Map

Map

Reduce

Reduce

brown, 2 fox, 2 how, 1 now, 1 the, 3

ate, 1 cow, 1

mouse, 1 quick, 1

the, 1 brown, 1

fox, 1 quick, 1

the, 1 fox, 1 the, 1

how, 1 now, 1

brown, 1

ate, 1 mouse, 1

cow, 1

Input Map Shuffle & Sort Reduce Output

Page 45: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Word Count Execution

the quick brown fox

Map Map

the fox ate the mouse

Map

how now brown cow

Automatically split work

Schedule tasks with locality

JobTracker Submit a Job

Page 46: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Fault Recovery If a task crashes:

–  Retry on another node –  If the same task repeatedly fails, end the job

the quick brown fox

Map Map

the fox ate the mouse

Map

how now brown cow

Page 47: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Fault Recovery If a task crashes:

–  Retry on another node –  If the same task repeatedly fails, end the job

Requires user code to be deterministic

the quick brown fox

Map Map

the fox ate the mouse

Map

how now brown cow

Page 48: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

Fault Recovery

If a node crashes: –  Relaunch its current tasks on other nodes What about task inputs ? File system replication

the quick brown fox

Map Map

the fox ate the mouse

Map

how now brown cow

Page 49: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

the quick brown fox

Map

Fault Recovery

If a task is going slowly (straggler): –  Launch second copy of task on another node –  Take the output of whichever finishes first

the quick brown fox

Map

the fox ate the mouse

Map

how now brown cow

Page 50: DISTRIBUTED SYSTEMSpages.cs.wisc.edu/~shivaram/cs537-sp19-notes/dist...– client/server: web server and web client – cluster: page rank computation. Why Go Distributed? More computing

NEXT STEPS

Next class: Distributed Filesystem(NFS) Discussion this week: Worksheet and review, Q&A for P5