Yeah! That’s what I’d like to know.
Indranil Gupta (Indy), Lecture 3: Cloud Computing Continued, January 25, 2011
CS 525 Advanced Distributed Systems, Spring 2011
All Slides © IG (Acknowledgments: Md Yusuf Sarwar and Md Ahsan Arefin)


Page 1:

Yeah! That’s what I’d like to know.

Indranil Gupta (Indy)
Lecture 3: Cloud Computing Continued
January 25, 2011
CS 525 Advanced Distributed Systems, Spring 2011

All Slides © IG (Acknowledgments: Md Yusuf Sarwar and Md Ahsan Arefin)

Page 2: “A Cloudy History of Time” © IG 2010

• First large datacenters: ENIAC, ORDVAC, ILLIAC; many used vacuum tubes and mechanical relays
• Timesharing industry (1975): market share Honeywell 34%, IBM 15%, Xerox 10%, CDC 10%, DEC 10%, UNIVAC 10%; machines included the Honeywell 6000 & 635, IBM 370/168, Xerox 940 & Sigma 9, DEC PDP-10, UNIVAC 1108
• Data processing industry: $70 M in 1968; $3.15 Billion in 1978
• Grids (1980s-2000s): GriPhyN (1970s-80s), Open Science Grid and Lambda Rail (2000s), Globus & other standards (1990s-2000s)
• Berkeley NOW project, supercomputers, server farms (e.g., Oceano)
• P2P systems (90s-00s): many millions of users; many GB per day
• Clouds

Page 3: So What’s New in Today’s Clouds?

Besides massive scale, three major features:

I. On-demand access: pay-as-you-go, no upfront commitment.
– Anyone can access it (e.g., the Washington Post – Hillary Clinton example)

II. Data-intensive nature: what was MBs has now become TBs.
– Daily logs, forensics, Web data, etc.
– Do you know the size of the Wikipedia dump?

III. New cloud programming paradigms: MapReduce/Hadoop, Pig Latin, DryadLINQ, Swift, and many others.
– High in accessibility and ease of programmability

A combination of one or more of these gives rise to novel and unsolved distributed computing problems in cloud computing.

Page 4:

OK, so that’s what a cloud looks like today. Now, suppose I want to start my own company, Devils Inc. Should I buy a cloud and own it, or should I outsource to a public cloud?

Page 5: Single-site Cloud: to Outsource or Own? [OpenCirrus paper]

• A medium-sized organization wishes to run a service for M months
– The service requires 128 servers (1024 cores) and 524 TB
– Same as the UIUC CCT cloud site
• Outsource (e.g., via AWS): monthly cost
– S3 costs $0.12 per GB-month; EC2 costs $0.10 per CPU-hour
– Storage ~ $62 K
– Total ~ $136 K
• Own: monthly cost
– Storage ~ $349 K / M
– Total ~ $1555 K / M + $7.5 K (includes 1 sysadmin per 100 nodes)
– Using a 0.45 : 0.4 : 0.15 split for hardware : power : network

Page 6: Single-site Cloud: to Outsource or Own?

• Breakeven analysis: it is preferable to own if
– $349 K / M < $62 K (storage)
– $1555 K / M + $7.5 K < $136 K (overall)
• Breakeven points:
– M > 5.55 months (storage)
– M > 12 months (overall)
• Not surprising: cloud providers benefit monetarily most from storage
• Assumes hardware lasts for 3 years (a typical lifetime)
• Systems are typically underutilized; with a system utilization of x%, it is still preferable to own if x > 33.3%
– Even with a CPU utilization of 20% (a typical value for industry clouds, e.g., at Yahoo), a storage utilization > 47% makes owning preferable
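The breakeven arithmetic above can be sketched in a few lines of Python (an illustration, not from the OpenCirrus paper; `breakeven_months` is a hypothetical helper name). Owning wins once the amortized upfront cost per month drops below the outsourcing premium; the results land near the slide’s rounded 5.55- and 12-month figures.

```python
# Sketch of the outsource-vs-own breakeven analysis. Costs are in $K per
# month; M is the service lifetime in months.

def breakeven_months(own_upfront_k, own_monthly_k, outsource_monthly_k):
    """Smallest M for which owning is cheaper:
    own_upfront_k / M + own_monthly_k < outsource_monthly_k."""
    return own_upfront_k / (outsource_monthly_k - own_monthly_k)

storage_m = breakeven_months(349, 0, 62)      # storage-only comparison
overall_m = breakeven_months(1555, 7.5, 136)  # overall comparison

print(f"own storage if M > {storage_m:.2f} months")   # ~5.6 months
print(f"own overall if M > {overall_m:.2f} months")   # ~12.1 months
```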

Page 7:

I want to do research in this area. I am sure there are no grand challenges in cloud computing!

Page 8: 10 Challenges [Above the Clouds]
(Index: Performance, Data-related, Scalability, Logistical)

• Availability of service: use multiple cloud providers; use elasticity; prevent DDoS
• Data lock-in: enable surge computing; standardize APIs
• Data confidentiality and auditability: deploy encryption, VLANs, firewalls; geographical data storage
• Data transfer bottlenecks: data backup/archival; higher-BW switches; new cloud topologies; FedExing disks
• Performance unpredictability: QoS; improved VM support; flash memory; schedule VMs
• Scalable storage: invent a scalable store
• Bugs in large distributed systems: invent debuggers; real-time debugging; predictable pre-run-time debugging
• Scaling quickly: invent good auto-scalers; snapshots for conservation
• Reputation fate sharing
• Software licensing: pay-for-use licenses; bulk-use sales

Page 9: A More Bottom-Up View of Open Research Directions

Myriad interesting problems acknowledge the characteristics that make today’s cloud computing unique: massive scale + on-demand + data-intensive + new programmability + infrastructure- and application-specific details.

• Monitoring: of systems & applications; single-site and multi-site
• Storage: massive scale; global storage; for specific apps or classes
• Failures: what is their effect, what is their frequency, how do we achieve fault-tolerance?
• Scheduling: moving tasks to data, dealing with federation
• Communication bottleneck: within applications, within a site
• Locality: within clouds, or across them
• Cloud topologies: non-hierarchical, other hierarchical
• Security: of data, of users, of applications; confidentiality, integrity
• Availability of data
• Seamless scalability: of applications, of clouds, of data, of everything
• Inter-cloud/multi-cloud computations
• A second generation of other programming models? Beyond MapReduce!
• Pricing models
• Exploring the limits of today’s cloud computing

Page 10:

Alright. But I bet that if I have a ton of data to process, it is very difficult to write a program for it!

Page 11: New Parallel Programming Paradigms: MapReduce

• Highly parallel data processing
• Originally designed by Google (OSDI 2004 paper)
• Open-source version called Hadoop, by Yahoo!
– Hadoop is written in Java; your implementation could be in Java, or any executable
• Google (MapReduce)
– Indexing: a chain of 24 MapReduce jobs
– ~200K jobs processing 50 PB/month (in 2006)
• Yahoo! (Hadoop + Pig)
– WebMap: a chain of 100 MapReduce jobs
– 280 TB of data, 2500 nodes, 73 hours
• Annual Hadoop Summit: 300 attendees in 2008, 700 in 2009

Page 12: What is MapReduce?

• Terms are borrowed from functional languages (e.g., Lisp)

Sum of squares:
• (map square ‘(1 2 3 4))
– Output: (1 4 9 16)
[processes each record sequentially and independently]
• (reduce + ‘(1 4 9 16))
– (+ 16 (+ 9 (+ 4 1)))
– Output: 30
[processes the set of all records in a batch]

• Let’s consider a sample application: WordCount
– You are given a huge dataset (e.g., the Wikipedia dump) and asked to list the count for each of the words in each of the documents therein
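The Lisp example above has a direct Python analogue (a minimal sketch): map squares each record independently, and reduce folds the whole batch with +.

```python
# map: processes each record independently; reduce: folds the whole batch.
from functools import reduce

squares = list(map(lambda x: x * x, [1, 2, 3, 4]))
total = reduce(lambda a, b: a + b, squares)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```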

Page 13: Map

• Process individual records to generate intermediate key/value pairs.

Input <filename, file text>:
Welcome Everyone
Hello Everyone

Output (key, value) pairs:
(Welcome, 1)
(Everyone, 1)
(Hello, 1)
(Everyone, 1)

Page 14: Map

• Process individual records in parallel to generate intermediate key/value pairs.

Input <filename, file text>:
Welcome Everyone (Map task 1)
Hello Everyone (Map task 2)

Output (key, value) pairs:
(Welcome, 1), (Everyone, 1) from Map task 1; (Hello, 1), (Everyone, 1) from Map task 2

Page 15: Map

• Process a large number of individual records in parallel to generate intermediate key/value pairs.

Input <filename, file text>:
Welcome Everyone
Hello Everyone
Why are you here
I am also here
They are also here
Yes, it’s THEM!
The same people we were thinking of
…….

Output (key, value) pairs from the Map tasks:
(Welcome, 1)
(Everyone, 1)
(Hello, 1)
(Everyone, 1)
(Why, 1)
(Are, 1)
(You, 1)
(Here, 1)
…….
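The Map step shown above can be sketched as a small Python generator (an illustration; `map_wordcount` is a hypothetical name, and a real Hadoop mapper implements a Mapper interface instead): each record is a (filename, file text) pair, and the mapper emits (word, 1) for every word.

```python
# Map step of WordCount: emit (word, 1) for each word in the record's text.

def map_wordcount(filename, file_text):
    for word in file_text.split():
        yield (word, 1)

pairs = list(map_wordcount("doc1", "Welcome Everyone Hello Everyone"))
print(pairs)  # [('Welcome', 1), ('Everyone', 1), ('Hello', 1), ('Everyone', 1)]
```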

Page 16: Reduce

• Processes and merges all intermediate values associated with each key.

Input (key, value) pairs:
(Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1)

Output (key, value) pairs:
(Everyone, 2), (Hello, 1), (Welcome, 1)

Page 17: Reduce

• Processes and merges all intermediate values in parallel, by partitioning keys across Reduce tasks.

Input (key, value) pairs:
(Welcome, 1), (Everyone, 1), (Hello, 1), (Everyone, 1)

Output (key, value) pairs, split across Reduce task 1 and Reduce task 2:
(Everyone, 2), (Hello, 1), (Welcome, 1)
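A matching sketch of the Reduce step (illustrative; in a real run the framework’s shuffle does the grouping): intermediate pairs are grouped by key, then each key’s values are summed.

```python
# Reduce step of WordCount: group intermediate pairs by key, sum each group.
from collections import defaultdict

def reduce_wordcount(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_wordcount([("Welcome", 1), ("Everyone", 1),
                           ("Hello", 1), ("Everyone", 1)])
print(counts)  # {'Welcome': 1, 'Everyone': 2, 'Hello': 1}
```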

Page 18: Some Other Applications of MapReduce

• Distributed grep:
– Input: a large set of files
– Map: emits a line if it matches the supplied pattern
– Reduce: simply copies the intermediate data to the output
• Count of URL access frequency:
– Input: a Web access log, e.g., from a proxy server
– Map: processes the web log and outputs <URL, 1>
– Reduce: emits <URL, total count>
• Reverse Web-link graph:
– Input: a Web graph (page → page)
– Map: processes the web graph and outputs <target, source>
– Reduce: emits <target, list(source)>
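The reverse Web-link graph job above can be sketched the same way (hypothetical helper names, for illustration): map emits (target, source) for each out-link, and reduce collects the list of sources pointing at each target.

```python
# Reverse Web-link graph: map emits (target, source); reduce gathers sources.
from collections import defaultdict

def map_links(source, targets):
    for target in targets:
        yield (target, source)

def reduce_links(pairs):
    incoming = defaultdict(list)
    for target, source in pairs:
        incoming[target].append(source)
    return dict(incoming)

edges = list(map_links("a.html", ["b.html", "c.html"])) \
      + list(map_links("d.html", ["b.html"]))
print(reduce_links(edges))  # {'b.html': ['a.html', 'd.html'], 'c.html': ['a.html']}
```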

Page 19: Programming MapReduce

• Externally (for the user):
1. Write a Map program (short); write a Reduce program (short)
2. Submit the job; wait for the result
3. Need to know nothing about parallel/distributed programming!
• Internally (for the cloud, and for us distributed systems researchers):
1. Parallelize Map
2. Transfer data from Map to Reduce
3. Parallelize Reduce
4. Implement storage for Map input, Map output, Reduce input, and Reduce output

Page 20: Inside MapReduce

• For the cloud (and for us distributed systems researchers):
1. Parallelize Map: easy! Each map task is independent of the others.
2. Transfer data from Map to Reduce:
– All Map output records with the same key are assigned to the same Reduce task
– Use a partitioning function, e.g., hash(key)
3. Parallelize Reduce: easy! Each reduce task is independent of the others.
4. Implement storage for Map input, Map output, Reduce input, and Reduce output:
– Map input: from the distributed file system
– Map output: to local disk (at the Map node); uses the local file system
– Reduce input: from (multiple) remote disks; uses the local file systems
– Reduce output: to the distributed file system
(local file system = Linux FS, etc.; distributed file system = GFS (Google File System), HDFS (Hadoop Distributed File System))
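The partitioning function in step 2 can be sketched as below (illustrative; Hadoop’s default HashPartitioner does essentially this): hashing the key modulo the number of Reduce tasks guarantees that all records with the same key meet at the same reducer.

```python
# Partitioning: route each intermediate key to one of R reduce tasks.
R = 4  # number of reduce tasks (illustrative choice)

def partition(key, num_reduce_tasks=R):
    return hash(key) % num_reduce_tasks

# Equal keys always land in the same partition:
print(partition("Everyone") == partition("Everyone"))  # True
```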

Page 21: Internal Workings of MapReduce

Page 22: Fault Tolerance

• Worker failure
– The Master keeps 3 states for each worker task: (idle, in-progress, completed)
– The Master sends periodic pings to each worker to keep track of it (a central failure detector)
– If a worker fails while a task is in progress, mark the task as idle
– If a Map worker fails after its tasks have completed, mark those tasks as idle too (re-execute them)
– A Reduce task does not start until all Map tasks are done and all of its (the Reduce’s) data has been fetched
• Master failure
– Checkpoint
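The master’s bookkeeping described above can be sketched as a tiny state table (an illustration, not Google’s implementation): on a worker failure, its in-progress tasks, and its completed Map tasks (whose output lived on the failed node’s local disk), are reset to idle for rescheduling.

```python
# Master-side task states: idle, in-progress, completed.
tasks = {
    "map-0":    {"state": "completed",   "worker": "w1", "kind": "map"},
    "map-1":    {"state": "in-progress", "worker": "w1", "kind": "map"},
    "reduce-0": {"state": "in-progress", "worker": "w2", "kind": "reduce"},
}

def handle_worker_failure(tasks, failed_worker):
    """Reset the failed worker's in-progress tasks, and its map tasks even
    if completed (their output on local disk is gone), back to idle."""
    for task in tasks.values():
        if task["worker"] != failed_worker:
            continue
        if task["state"] == "in-progress" or task["kind"] == "map":
            task["state"] = "idle"

handle_worker_failure(tasks, "w1")
print(sorted(name for name, t in tasks.items() if t["state"] == "idle"))
# ['map-0', 'map-1']
```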

Page 23: Locality and Backup Tasks

• Locality
– The cloud has a hierarchical topology
– GFS stores 3 replicas of each 64 MB chunk, possibly on different racks (e.g., 2 on one rack, 1 on a different rack)
– Attempt to schedule a map task on a machine that contains a replica of the corresponding input data: why?
• Stragglers (slow nodes)
– Due to a bad disk, network bandwidth, CPU, or memory
– Keep track of the “progress” of each task (% done)
– Perform backup (replicated) execution of straggler tasks: a task is considered done when its first replica completes
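The progress-tracking idea above can be sketched like this (a hypothetical heuristic for illustration; the MapReduce paper’s actual policy launches backups for the remaining in-progress tasks near the end of a phase): flag tasks whose progress lags well behind the mean, and schedule backup copies for them.

```python
# Flag straggler tasks whose progress (% done) lags far behind the mean.
def find_stragglers(progress, threshold=0.5):
    """progress: task name -> fraction done in [0.0, 1.0].
    Returns tasks below `threshold` times the mean progress."""
    mean = sum(progress.values()) / len(progress)
    return [name for name, done in progress.items() if done < threshold * mean]

print(find_stragglers({"t0": 0.9, "t1": 0.85, "t2": 0.2}))  # ['t2']
```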

Page 24: Grep

• Workload: 10^10 100-byte records; extract records matching a rare pattern (92K matching records)
• Testbed: 1800 servers, each with 4 GB RAM, dual 2 GHz Xeons, dual 160 GB IDE disks, and gigabit Ethernet per machine
• Locality optimization helps:
– 1800 machines read 1 TB at a peak of ~31 GB/s
– Without it, rack switches would limit throughput to 10 GB/s
• Startup overhead is significant for short jobs

Page 25: Discussion Points

• Hadoop always either outputs complete results, or none
– Partial results?
– Can you characterize the partial results of a partial MapReduce run?
• Storage: is the local-write, remote-read model good for Map output/Reduce input?
– What happens on node failure?
– Can you treat intermediate data separately, as a first-class citizen?
• The entire Reduce phase needs to wait for all Map tasks to finish: in other words, a barrier
– Why? What is the advantage? What is the disadvantage?
– Can you get around this?

Page 26:

Hmm, CCT and OpenCirrus are new. But are there no older testbeds?

Page 27: PlanetLab

• A community resource open to researchers in academia and industry
• http://www.planet-lab.org/
• About 1077 nodes at 494 sites across the world
• Founded at Princeton University (led by Prof. Larry Peterson), but owned in a federated manner by the 494 sites
• Node: a dedicated server that runs components of PlanetLab services
• Site: a location, e.g., UIUC, that hosts a number of nodes
• Sliver: a virtual division of each node; currently uses VMs, but could also use other technology; needed for timesharing across users
• Slice: a spatial cut-up of the PL nodes, per user; a slice is a way of giving each user (Unix-shell-like) access to a subset of PL machines, selected by the user; a slice consists of multiple slivers, one at each component node
• Thus, PlanetLab allows you to run real world-wide experiments
• Many services have been deployed atop it, used by millions (not just researchers): application-level DNS services, monitoring services, CDN services
• If you need a PlanetLab account and slice for your CS525 experiment, let me know ASAP! There are a limited number of these available for CS525.

All images © PlanetLab

Page 28: 11 Yeah! That’s what I’d like to know. Indranil Gupta (Indy) Lecture 3 Cloud Computing Continued January 25, 2011 CS 525 Advanced Distributed Systems Spring

Emulab• A community resource open to researchers in academia and industry• https://www.emulab.net/ • A cluster, with currently 475 nodes• Founded and owned by University of Utah (led by Prof. Jay Lepreau)

• As a user, you can:– Grab a set of machines for your experiment– You get root-level (sudo) access to these machines– You can specify a network topology for your cluster (ns file format)

• Thus, you are not limited to only single-cluster experiments; you can emulate any topology

• If you need an Emulab account for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.

28All images © Emulab

Page 29:

And then there were…Grids!

What is it?

Page 30: Example: Rapid Atmospheric Modeling System (RAMS), Colorado State U

• Hurricane Georges, 17 days in September 1998
– “RAMS modeled the mesoscale convective complex that dropped so much rain, in good agreement with recorded data”
– Used 5 km spacing instead of the usual 10 km
– Ran on 256+ processors
• Can one run such a program without access to a supercomputer?

Page 31: [Figure: distributed computing resources spread across the Wisconsin, MIT, and NCSA sites]

Page 32: An Application Coded by a Physicist

• Four jobs: Job 0, Job 1, Job 2, Job 3
• The output files of Job 0 are input to Job 2; the output files of Job 2 are input to Job 3
• Jobs 1 and 2 can be concurrent

Page 33: An Application Coded by a Physicist (continued)

• Stages of a job: Init, Stage in, Execute, Stage out, Publish
• A job may take several hours/days
• The Execute stage is computation-intensive, so massively parallel
• The files staged in and out (output of Job 0 → input to Job 2; output of Job 2 → input to Job 3) are several GBs

Page 34: [Figure: Jobs 0-3 scheduled across the Wisconsin, MIT, and NCSA sites]

Page 35: [Figure: Jobs 0-3 again; the Condor protocol operates within the Wisconsin site, while the Globus protocol connects Wisconsin, MIT, and NCSA]

Page 36: Globus Protocol

• The internal structure of the different sites is invisible to Globus
• Globus handles external allocation & scheduling, and the stage-in & stage-out of files

Page 37: Condor Protocol

• Within a site (e.g., Wisconsin, running Job 0 and Job 3), Condor handles internal allocation & scheduling, monitoring, and the distribution and publishing of files

Page 38: Tiered Architecture (OSI 7-layer-like)

From top to bottom:
• High-energy physics apps
• Resource discovery, replication, brokering
• Globus, Condor
• Workstations, LANs

Page 39: The Grid Recently

• Some are 40 Gbps links! (The TeraGrid links)
• “A parallel Internet”

Page 40: Globus Alliance

• The alliance involves U. Illinois Chicago, Argonne National Laboratory, USC-ISI, U. Edinburgh, and the Swedish Center for Parallel Computers
• Activities: research, testbeds, software tools, applications
• Globus Toolkit (latest version: GT3)
– “The Globus Toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. Its latest version, GT3, is the first full-scale implementation of the new Open Grid Services Architecture (OGSA).”

Page 41: Some Things Grid Researchers Consider Important

• Single sign-on: a collective job set should require only one user authentication
• Mapping to local security mechanisms: some sites use Kerberos, others use Unix
• Delegation: credentials to access resources are inherited by subcomputations, e.g., from job 0 to job 1
• Community authorization: e.g., third-party authentication
• For clouds, you additionally need to worry about failures, scale, the on-demand nature, and so on

Page 42: Discussion Points

• Cloud computing vs. Grid computing: what are the differences?
• National Lambda Rail: hot in the 2000s; funding pulled in 2009
• What has happened to the Grid computing community?
– See the Open Cloud Consortium
– See the CCA conference (2008, 2009)

Page 43: Entrepreneurial Tidbits: Where to Get Your Ideas From (for Your Project)

• Sometimes the best ideas come from something that you would rather use yourself (or someone close to you would)
– Adobe’s John Warnock developed Illustrator because his wife was a graphic designer and the LaserWriter was not enough
– Bluemercury.com was founded by a woman interested in high-end cosmetics herself
– Militaryadvantage.com was started by a young veteran who had had trouble finding out about benefits and connecting with other servicemen and women when he was in service
– Facebook was developed by Zuckerberg, who wanted to target exclusive clubs at Harvard for his own reasons
– Flickr developed its photo IM’ing and sharing service internally for themselves
– Craigslist originally started as an email list for events (and then the cc feature broke, forcing Newmark to put it all on a website)
– Yahoo! was started by Yang and Filo, when they were grad students, as a collection of links to research papers in the EE field; slowly they came to realize that this idea could be generalized to the entire web
• Sometimes the best ideas come bottom-up from what you are really passionate about
– Gates and Microsoft, Wozniak and Apple, Zuckerberg and Facebook
• The above two categories are not mutually exclusive!
• (Geschke, Adobe) “If you aren’t passionate about what you are going to do, don’t do it. Work smart and not long, because you need to preserve all of your life, not just your work life.”

Page 44: Administrative Announcements

If you don’t have a project partner yet, please wait a few minutes after lecture (i.e., now!) to see if you can find partner(s).