distributed systems and security: an introduction brad karp ucl computer science cs gz03 / 4030 1 st...

15
Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

Upload: harriet-french

Post on 27-Dec-2015

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

Distributed Systems and Security:

An Introduction

Brad KarpUCL Computer Science

CS GZ03 / 40301st October, 2007

Page 2: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

2

Today’s Lecture

• Logistics• Course Communication• Overview of Distributed Systems

– What are they?– Why build them?– Why are they hard to build well?

• Detailed Syllabus: Distributed Systems• Assessment regime• Questionnaire

Page 3: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

3

Logistics

• Meeting Times/Locations– Monday: 9 AM – 10 AM , MPEB 1.02– Monday: 1 PM – 2 PM, Roberts 309– Tuesday: 1 PM – 2 PM, Anatomy Gavin de Beer

LT

• Schedule– 1st October – 30th October: Distributed Systems– 5th November – 9th November: Reading Week

(no lecture)– 12th November – 11th December: Security

Page 4: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

4

Course Communication

• Course web page– http://www.cs.ucl.ac.uk/staff/B.Karp/gz03/f2007

/– Detailed calendar: readings, lecture topics,

coursework, announcements/corrections– Your responsibility: check page daily!

• Course mailing lists– {gz03,4030}@<department’s domain>– Subscribe by sending mail to {gz03-request,4030-request}@<department’s domain> with one-word subject join

– Must subscribe from UCL CS email address– Used for course announcements– Your responsibility: check email daily!

Page 5: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

5

What Is a Distributed System?

• Multiple computers (“machines,” “hosts,” “boxes,” &c.)– Each with CPU, memory, disk, network

interface– Interconnected by LAN or WAN (e.g.,

Internet)

• Application runs across this dispersed collection of networked hardware

• But user sees single, unified system

Page 6: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

6

What Is a Distributed System?(Alternate Take)

“A distributed system is a system in which I can’t do my work because some computer that I’ve never even heard of has failed.”

– Leslie Lamport, Microsoft Research (ex DEC)

Page 7: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

7

Start Simple: Centralized System

• Suppose you run Gmail• Workload:

– Inbound email arrives; store on disk– Users retrieve, delete their email

• You run Gmail on one server with disk

GmailServer (PC)

EmailSender

EmailSender

EmailSender

EmailReader

EmailReader

EmailReader

What are shortcomings of this design?

Page 8: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

8

Why Distribute? For Availability

• Suppose Gmail server goes down, or network between client and it goes down

• No incoming mail delivered, no users can read their inboxes

• Fix: replicate the data on several servers– Increased chance some server will be reachable– Consistency? One server down when delete

message, then comes back up; message returns in inbox

– Latency? Replicas should be far apart, so they fail independently

– Partition resilience? e.g., airline seat database splits, one seat remains, bought twice, once in each half!

Page 9: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

9

Why Distribute?For Scalable Capacity

• What if Gmail a huge success?• Workload exceeds capacity of one

server• Fix: spread users across several servers

– Best case: linear scaling—if U users per box, N boxes support NU users

– Bottlenecks? If each user’s inbox on one server, how to route inbound mail to right server?

– Scaling? How close to linear?– Load balance? Some users get more mail

than others!

Page 10: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

10

Performance Can Be Subtle

• Goal: predictable performance under high load

• 2 employees run a Starbucks– Employee 1: takes orders from customers,

calls them out to Employee 2– Employee 2:

• writes down drink orders (5 seconds per order)• makes drinks (10 seconds per order)

• What is throughput under increasing load?

Page 11: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

11

Starbucks Throughput

• Peak system performance: 4 drinks / min• What happens when load > 4 orders / min?• What happens to efficiency as load increases?

What would preferable curve be?What design achieves that goal?

Page 12: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

12

Why Are Distributed SystemsHard to Design?

• Failure: of hosts, of network– Remember Lamport’s lament

• Heterogeneity– Hosts may have different data representations

• Need consistency (many specific definitions)– Users expect familiar “centralized” behavior

• Need concurrency for performance– Avoid waiting synchronously, leaving resources

idle– Overlap requests concurrently whenever possible

Page 13: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

13

Security

• Before Internet:– Encryption and authentication using cryptography– Between parties known to each other (e.g.,

diplomatic wire)• Today:

– Entire Internet of potential attackers– Legitimate correspondents often have no prior

relationship– Online shopping: how do you know you gave credit

card number to amazon.com? How does amazon.com know you are authorized credit card user?

– Software download: backdoor in your new browser?– Software vulnerabilities: remote infection by

worms!– Crypto not enough alone to solve these problems!

Page 14: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

14

Detailed Syllabus

• No textbook• Readings: research papers on real, built

distributed systems that illustrate concepts– You must read papers by day assigned!– Lectures will assume you have.

• Lectures: (some) background for papers; review system from paper; discuss system

• See schedule on course web page

Page 15: Distributed Systems and Security: An Introduction Brad Karp UCL Computer Science CS GZ03 / 4030 1 st October, 2007

15

How Will You Be Evaluated?

• 2 courseworks, 15% total– One will involve significant programming: 10%

• Out: 9th Oct, Due: 29th Oct

– Other will be written problem set: 5%• Out: 20th Nov, Due: 13th Dec

• 85% final exam– 2.5 hours; rubric: 3 of 5 questions– 4030 (4th-years): must get >= 40% to pass

• Overall:– 4030 (4th-years): must get >= 40% mean to pass– GZ03 (DCNDS, SSE): must get >= 50% mean to

pass