programming the cloud - epfl€¦ · programming the cloud james larus extreme computing group,...

34
Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Upload: others

Post on 01-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Programming the Cloud James Larus

eXtreme Computing Group, Microsoft Research PPoPP / HPCA

February 14, 2011

Page 2: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  New  compu*ng  pla/orm  •  Ubiquitous  access  •  Inherently  distributed  •  Many,  diverse  clients  (single  func*on  →  rich)  

•  “Unlimited”  computa*on  and  storage,  on  demand  

Client + Cloud Computing

2  

Page 3: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Cloud – or Fog?

3  

Page 4: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Amazon Kindle

4  

Page 5: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Evolution

5  

Page 6: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Revolution

6  

Page 7: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

New Hardware → New Software → New Experiences

7  

           

           

           

           

Page 8: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Dennard  scaling  (“Moore’s  Law”)  looks  to  end  within  a  few  years  •  Transistors  no  longer  running  faster  •  Soon  transistor  count  will  stop  increasing  

•  Finally,  break  von  Neumann  boQleneck  •  Parallelism  at  all  levels:  circuits,  processors,  systems  •  Distributed  too  

•  Heterogeneous,  diverse  world  •  Processors  •  Client  devices  •  Sensors  •  Programming  languages  

•  Communica*on  as  important  as  computa*on  •  For  users  as  well  as  computers!  

Quantitative Becomes Qualitative

8  

Page 9: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Rapid  global  adop*on  •  High  standard  for  availability  and  reliability  •  Con*nual  security  challenges  

•  Rapid  evolu*on  •  Amazon  /  EBay  →  Google  →  Facebook  /  TwiQer  →  …    •  Unprecedented  engineering  challenges  at  each  step  

•  Sensors  •  GPS,  camera,  temperature,  …  

•  Social  •  Mail,  Messenger,  Facebook,  TwiQer,  on-­‐line  games,  …  

We’ve Never Seen This Before, cont’d

9  

Pre-PC Era (1980)

PC Era (1995)

Internet Era (2000)

Consumer Era (Today+)

Mainframe Era

Page 10: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Does Programming Need to Change?

10  

Page 11: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Programming  model  ≠  programming  language  •  Paradigm  for  thinking  about  and  structuring  solu*ons  

•  Cloud  programming  models  do  not  exist  •  Web  1.0  technology  (PHP,  Ruby,  ASP.NET,  …)  

•  Lack  makes  building  scalable,  reliable  web  services  challenging,  even  for  experts  

•  No  integra*on  with  rapidly  diversifying  clients  

•  Challenge:  enable  run-­‐of-­‐the-­‐mill  developers  to  build  client  +  cloud  apps  

Cloud Programming Models

11  

Page 12: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Client + Cloud System

12  

Client  

Service  

Network  

Servers  

Storage  

Sensors  

Page 13: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

New Programming Model – New Problems (and some old, unsolved ones)

13  

Concurrency

Parallelism

Distribution

Availability Performance

Reliability

Power Security & Privacy

Page 14: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Services  are  inherently  concurrent  •  Simultaneously  process  mul*ple  requests  •  Built  from  parallel  processes  and  machines  

•  Threads  or  events?  •  Threads:  familiar  sequen*al  model  

•  But,  not  sequen*al  since  state  can  change  while  thread  is  preempted  

•  Thread  and  context  switching  overhead  limit  concurrency  

•  Events:  handlers  fracture  program  control  flow  •  Program  logic  split  across  handlers  

•  Explicit  manipula*on  of  local  state  

•  High-­‐level  (state  machine,  Actor,  …)  models?  •  Parallel  programming,  redux  

•  Absence  of  consensus  retards  research,  development,  reuse,  interoperability,  …  

Concurrency

14  

Page 15: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Modern  processors  are  parallel  •  Increased  performance  +  power  efficiency  

•  Processors  becoming  heterogeneous  •  Non-­‐isomorphic  func*onal  units  

•  Data  centers  are  parallel,  message-­‐passing  clusters  •  Driven  by  cost,  throughput,  availability  concerns  

•  Shared-­‐memory  parallel  programming  is  long-­‐standing  mistake  •  State  of  the  art:  threads  and  synchroniza*on  (assembly  language)  

•  Can’t  even  agree  on  shared  memory  seman*cs  

•  Abolish  shared  memory!  

Parallelism

15  

Page 16: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Data  movement  and  placement  is  fundamental  concern  

•  Network  bandwidth,  latency,  reliability  constraints  •  Why  are  40  computers  a  fundamental  building  block?  

•  Data  is  usually  too  large  to  move  •  Reverse  of  CPU-­‐centric  worldview:  bring  computa*on  to  data  

•  Mobile  clients  connected  by  narrow  straws  •  Cellular  service  limited  by  shared  bandwidth  and  spectrum  alloca*on  

•  WiFi  limited  by  power  and  access  point  availability  

Communications

16  

Page 17: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Distributed  systems  are  rich  source  of  difficult  problems  •  Replica*on  •  Consistency  •  Quorum  

•  Well-­‐studied  field  with  good  solu*ons  •  Replica*on  •  Persistence    •  Relaxed  consistency  

•  Techniques  are  complex  •  Integrate  into  programming  model  

Distribution

17  

Page 18: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Services  must  be  highly  available  •  Building  na*onal  infrastructure  •  Blackberry/Google/TwiQer…  outage  affect  millions  of  people  and  gets  na*onal  aQen*on  

•  High  availability  is  difficult  •  Hard  to  eliminate  “single  points  of  failure”  

•  Murphy’s  Law  rules  

•  An*the*cal  to  rapid  sooware  evolu*on  

•  Programming  languages  offer  liQle  support  for  systema*c  error  handling  •  Dispropor*onate  number  of  bugs  in  error-­‐handling  code  

Availability

18  

Page 19: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  System-­‐level  concern  •  Extends  far  beyond  code  running  on  single  machine  •  Most  performance  tools  focus  on  low-­‐level  details  (CPU,  cache,  …)  

•  Build,  observe,  tweak,  overprovision,  pray  •  Wasteful  and  uncertain  

•  Scalable  system  must  grow  by  adding  machines,  not  rewri*ng  sooware  •  Big-­‐O  nota*on  for  scalability  

•  Architecture  is  star*ng  point  •  Model  and  simulate  before  building  system  

•  Adap*vity  •  Performance  SLAs  should  be  specified  behavior  •  Systems  need  to  be  introspec*ve  and  capable  of  adap*ng  behavior  to  load  

Performance

19  

Page 20: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Power  constraints  at  both  extremes  

•  Data  centers  •  Limited  by  available  power  

•  Costs  heavily  influence  by  power  and  cooling  (CapEx  and  OpEx)  

•  Clients  •  BaQery  constrains  computa*on  and  communica*ons  

Power

20  

Page 21: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Reliability

•  Considerable  progress  in  past  decade  on  defect  detec*on  tools  •  Tools  focused  on  local  proper*es  (e.g.,  buffer  overruns,  test  coverage,  races,  etc.)  •  LiQle  effort  on  system-­‐wide  proper*es  

•  Modular  checking  •  Whole  program  analysis  expensive  and  difficult  and  not  prac*cal  for  services  

•  Asser*ons  and  annota*ons  at  module  boundaries  

•  New  domain  of  defects  •  Message  passing  

•  Code  robustness  •  Poten*al  performance  boQlenecks  

21  

Page 22: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Centralized  computa*on  and  data  creates  aQrac*ve  targets  •  Vandalism  and  fraud  ⇒  theo  and  espionage  

•  Is  current,  self-­‐administrator  model  beQer?  

•  Centralized  data  also  creates  incen*ves  for  misuse  by  service  providers  •  No  general  understanding  or  community  standards  for  acceptable  use  of  PII  

•  Unfortunately,  closely  *ed  to  (large)  sums  of  money  

•  Is  current  system  infrastructure  fatally  flawed?  •  Should  we  start  afresh?  •  Do  we  know  how  to  start  afresh  and  do  beQer?  

Security & Privacy

22  

Page 23: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Orleans

•  Programming  framework  for  client  +  cloud  compu*ng  •  Under  development  in  XCG  

•  Goals  •  Simple,  accessible  programming  

model  •  Scalable,  resilient  systems  •  Correct,  maintainable  systems  •  Single  model  for  client  and  server  

23  

Windows  

.NET  

  Azure  

Orleans  

Page 24: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Orleans •  Key  concepts  

•  Grains  •  Ac*va*ons  •  Message  passing  •  Promises  •  Transac*ons  

24  

Page 25: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Grain - Unit of Computation •  Encourage  scalable  architecture  and  

enable  automated  management  •  Sharded  data  par**oned  among  

servers  •  Encapsula*on  allows  migra*on  for  

load  balancing  and  reliability  •  Replica*on  for  throughput  and  

reliability  •  Internally  single-­‐threaded  

•  Simple  programming  model  •  Concurrency  through  replicated  

grains  (transac*onal  shared  state)  •  Strongly  typed  and  versioned  

interfaces  

25  

Data

Code

Request

Reply

Receive  

Compute  Reply  

Page 26: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Activation - Parallelism Across Requests

•  Mul*ple  ac*va*ons  of  a  grain  execute  concurrently  •  Single  request  processed  

completely  by  one  grain  •  Each  ac*va*on  isolated  

•  Use  to  reduce  latency  and  increase  system  throughput  

•  Mul*-­‐master  branch-­‐and-­‐merge  updates  for  persistent  state  

26  

Page 27: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Message Passing - Asynchronous Stranger in a Synchronous World •  Fundamental  in  distributed  systems  •  BeQer,  scalable  programming  model  

•  Performance  /  correctness  isola*on  

•  Clearly-­‐defined  points  of  interac*on  

•  LiQle  language  support  beyond  message-­‐passing  libraries  •  Message  passing  is  asynchronous  •  Func*ons  calls  are  synchronous  

27  

Page 28: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Promises - Resolve Impedance Mismatch

•  Bind  future  computa*on  to  future  result  •  Remote  opera*ons  return  promise  •  Caller  binds  closure  that  will  be  

evaluated  when  result  appears  •  Can  block  and  wait  for  result  (bad)  

•  Makes  concurrency  explicit  •  Mechanism  for  handling  remote  

errors  

28  

Page 29: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Transactions – Consistency and Error Handling

•  Grains  process  request  in  a  transac*on  •  Computa*on  either  commits  and  

leaves  state  consistent  •  Or,  terminates  cleanly  

•  Goals  •  Isolate  dis*nct  computa*ons  •  Consistent  (replicated)  ac*va*ons  •  Simplify  error  handling  

•  Lightweight,  op*mis*c  implementa*on  •  Less  than  serializable  

29  

Page 30: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

Orleans Runtime

•  Factor  out  common,  important  func*onality  to  cloud  apps  •  Complex  to  implement  •  Hard  to  get  correct  •  Typically  aoerthought  

•  Deployment,  management,  maintenance  challenging  for  services  

30  

Azure

Applica*on

Orleans  Run+me Geo-­‐Distribu*on Data  Replica*on  and  Consistency Performance  Monitoring Adap*ve  Control

Distributed  Debugging Run*me  (Correctness)  Monitoring Deployment,  Configura*on,  Maintenance

Page 31: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Challenge:  scale  to  10K’s  nodes  and  provide  service-­‐level  reliability  on  unreliable  pla/orm  •  Data  par**oning,  service  replica*on,  automated  monitoring  and  control  

•  Orleans  architecture  is  collec*on  of  interdependent,  composable  services  with  declara*ve  service-­‐level  agreements  (SLAs)  •  Each  built  from  one  or  more  types  of  grain  

•  Independently  deployable  and  updatable    •  SLA,  with  Marlowe  performance  monitoring,  supports  dynamic,  adap*ve  resource  alloca*on  and  op*miza*on  

Application Model

31  

Page 32: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Deployment,  monitoring,  maintenance  are  major  challenges  in  building  and  running  cloud  service  •  Ad-­‐hoc,  one-­‐off  solu*ons  raise  cost  and  complexity  of  building  and  running  a  service  

•  Opera*ons  is  integral  part  of  Orleans  design  •  Strongly  versioned  grains  and  services  •  Can  concurrently  run  mul*ple  version  of  applica*on  

•  Support  for  single-­‐box,  test  cluster,  rolling  deployment,  and  emergency  rollback  

•  Integral  monitoring  and  control  of  applica*ons  (Marlowe)  

Operation Model

32  

Page 33: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Enterprise  Java  Beans  •  Java  framework  for  building  enterprise-­‐scale  apps  

•  Orleans:  larger  scale,  more  diverse  clients,  and  simpler  model  

•  Microsoo  App  Fabric  •  CLR-­‐based,  server  component  model  combine  components  in  exis*ng  technologies  (.NET,  WCF,  Workflow,  …)  

•  Incremental  solu*on  that  makes  it  easier  to  build  and  deploy  solu*ons  based  on  exis*ng  technologies  and  provides  some  support  for  solving  distributed  system  problems  

•  Enterprise-­‐scale  systems  

•  Orleans:  focused  replacement  for  exis*ng  Windows  and  .NET  programming  models  for  building  cloud  apps  

•  Salesforce.com  Service  Cloud  •  Integrated  environment  for  building  3-­‐*ered  business  apps  

•  New,  Java-­‐  /  C#-­‐like  language  for  middle  *er  

•  Mul*-­‐tenancy  support  

•  Hosted  by  Salesforce.com  

•  Google  App  Engine  •  Python  or  Java  frameworks  

•  BigTable  for  data  storage  –  schema-­‐less,  strongly  consistent,  op*mis*c  concurrency,  transac*onal  

•  Designed  for  serving  Web  applica*ons  –  passively  driven  by  web  requests,  queues,  or  scheduled  tasks  

•  Stateless  app  logic  *er,  must  complete  within  30  seconds  

Similar Frameworks

33  

Page 34: Programming the Cloud - EPFL€¦ · Programming the Cloud James Larus eXtreme Computing Group, Microsoft Research PPoPP / HPCA February 14, 2011

•  Client  +  cloud  marks  fundamental  change  in  compu*ng  •  Inflec*on  point,  paradigm  shio,  …    

•  New  challenges  require  new  solu*ons  •  Parallelism    /  concurrency  /  distribu*on  •  Reliability  /  availability  •  Communica*ons  •  Power  •  Security  /  privacy  

•  Opportunity  to  rethink  compu*ng  •  Reduce  aggrava*on  and  management  burden  •  Move  to  seamless,  ubiquitous  compu*ng  •  Develop  new,  natural  user  interfaces  

Conclusion

34