hpc cloud: high performance computing in infrastructure cloudscloud+anl+v1.pdf · 2012-06-18 ·...

27
HPC Cloud: High Performance Computing in Infrastructure Clouds Kate Keahey [email protected] Argonne Na6onal Laboratory Computa6on Ins6tute, University of Chicago 1 15/06/12

Upload: others

Post on 03-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

HPC Cloud: High Performance Computing in Infrastructure Clouds

Kate  Keahey  

[email protected]  

Argonne  Na6onal  Laboratory  

Computa6on  Ins6tute,  University  of  Chicago  

1  15/06/12  

Page 2: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Clouds of All Types

15/06/12  2  

Infrastructure-as-a-Service (IaaS)

Platform-as-a-Service (PaaS)

Community-­‐specific  tools,  applica6ons  and  portals  

Software-as-a-Service (SaaS) Control  

Specializa/on  

Page 3: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Technology Trends

15/06/12   3  

Page 4: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Virtualization

  System-­‐level  virtualiza6on  –  Emulates  a  computer  similar  

to  a  real  one  

–  The  VM  runs  a  full  OS  

  VM  images  –  Snapshots  –  Migra6on  

  Scheduling    

Hardware

Type 1 Hypervisor

Privilegeddomain Guest domains

PrivilegedOS

Guest OS

Guest OS

Userland Userland Userland

15/06/12  4  

Page 5: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Widespread Virtualization

  From  IBM  in  1967…  

  …to  Xen  in  2003  –  The  Need  for  Speed  –  Open  Source  

  Other  hypervisors  –  KVM  –  Palacios,  KHVMM  

15/06/12  5  

From  Pra(  et  al.,  SOSP  2003  

Page 6: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Virtualization: Game-changing Benefits

  Benefits  to  users  –  Control  over  environment  

•  Touches  all  aspects  from  convenience  to  privilege  

–  Convenient  packaging/distribu6on  mechanism    •  No  makefiles,  no  valida6on  and  revalida6on,  facilitates  sharing  

  Benefits  to  providers  –  Security  implica6ons  of  isola6on  –  Speed  of  deployment,  switching  between  environments  (cri6cal  difference  from  reimaging)  

  Migra6on  and  snapshoang  

15/06/12  6  

Page 7: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Infrastructure Clouds: Game-Changing Benefits

  Benefits  to  users  –  On-­‐demand  access:  manage  peaks  in  demand  –  Pay-­‐as-­‐you-­‐go:  manage  “valleys”  in  demand  

–  Viable  infrastructure  outsourcing  model  

  Benefits  to  providers  –  Consolida6on  and  economies  of  scale  

  Retail  versus  wholesale  resources  

15/06/12  7  

Page 8: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Clouds and HPC

15/06/12   8  

Page 9: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Clouds and HPC

15/06/12  9  

Can  HPC  workloads  be  run  in  a  cloud?    

Can  a  supercomputer  operate  as  a  cloud?    

HPC  Cloud  

Page 10: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

The Nimbus Project @ ANL

15/06/12  10  

Enable  providers  to  build  IaaS  clouds  

Enable  users  to  use  IaaS  clouds  

Nimbus  Infrastructure    

Nimbus  Pla8orm  

Workspace  Service  

Cumulus  

Context  Broker  

Cloudinit.d  

High-­‐quality,  extensible,  customizable,    open  source  implementa6on  

Elas6c  Scaling  Tools  

Enable  developers  to  extend,    experiment  and  customize  

Page 11: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

The Red Question (RQ)

15/06/12  11  

Napper  et  al.,  “Will  Cloud  CompuDng  Reach  Top500?”  (The  answer  was:  “No”)  

Virtual  Cluster  from  AWS  #146  on  Top500    (currently  #42)  

2009  

2010  

Total  Dme  to  change  your  mind:   1  year  

Can  HPC  workloads  be  run  in  a  cloud?    

Page 12: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

RQ: Performance

  I/O  Performance  –  Bandwidth  –  Latency  

  Instability  

15/06/12  12  

How  good  are  clouds  for  HPC?    Resources:  -­‐  The  Magellan  Report  -­‐  “Comparisons:  not  as  Odious  as  Once  Thought”,  see  www.scienceclouds.org/blog  

From  Santos  et  al.,  Xen  Summit  2007  

From  Ramakrishnan  et  al.,  PMBS’11  

Page 13: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

RQ: Storage

  Performance  challenges  –  I/O  impact  –  Buffer  caching  not  enabled  –  Bandwidth  conten6on  –  Variability  

  Geang  the  custom  infrastructure  –  Storage  op6ons  of  all  shapes  and  sizes  –  How  do  we  combine  offerings?  

  Price  performance  considera6ons  –  Instance  service  levels  versus  HPC  needs  

15/06/12  13  

Page 14: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

RQ: Overall Performance

15/06/12  14  

From  Ramakrishnan  et  al.,  PMBS’11  

PARATEC   MILC  

Overall impact on tightly-coupled applications

Also  see  Jackson  et  al.  ,  CloudCom’10  

Page 15: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

RQ: Size and Scale

  “U6lity  Supercompu6ng”  

  Record  (CycleCompu6ng)  –  Computa6onal  chemistry  app  

–  ~6,000  instances    –  ~50,000  cores  –  ~$5,000  per  hour  

  Nimbus  for  OOI    –  BigData  analy6cs  –  Elas6c  scaling  and  HA  –  ~2,000  instances  

15/06/12  15  

Page 16: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

RQ: “Other Considerations”

  Special  hardware  –  E.g.,  Amazon  GPUs  instances  

  Noise    –  Significant  variability  

  MTBF  and  failure  rates  –  Both  a  hardware  and  sooware  considera6on  –  Significant  lack  of  reliability  –  Ask  Franck…  

15/06/12  16  

How  good  are  clouds  for  HPC?    “Mohammad  and  the  Mountain”  see  www.scienceclouds.org/blog  

See  Jackson  et  al.  ,  CloudCom’10  

Page 17: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Programming Models

  Leverage  on-­‐demand,  large  provider  pool  

  Adapt  to  failures  –  Master/Slave  example  

15/06/12  17  

Auto-­‐scale   Redeploy   Role  transfer  +  leader  elecDon  

Page 18: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

RQ: Open Issues

  More  understanding  of  performance  –  S6ll  need  understanding  of  external  connec6vity,  storage/performance  tradeoffs,  MTBF  &  noise  

–  Ongoing  issue  as  the  offerings  change  dynamically  

  Performance  characteriza6on  –  Lightweight,  easy  to  run,  as  conclusive  as  possible  

  Storage  space/op6ons  s6ll  largely  unexplored    Programming  models  

15/06/12  18  

Page 19: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

The Blue Question (BQ)

  Is  it  feasible  to  turn  a  supercomputer  into  a  cloud?  

  Challenges:  –  Hypervisor  challenges  –  Deployment  speed    

–  Resource  management  model  

15/06/12  19  

Page 20: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

BQ: Hypervisor Issues

  Challenges:  architecture,  I/O  drivers,  and  performance  

  New  hypervisors  –  KHVMM  (Blue  Gene/P)  

  Xen-­‐IB  (since  ~2006)    “Symbio6c  virtualiza6on”    

–  Passthrough  I/O  –  Preemp6on  control  

–  Op6mized  paging  

  Research  sooware  

15/06/12  20  

CTH  (shockwave  simulaDon)  within  ~5%  of  naDve  performance  

On  Cray  XT4  (Sandia)  

From  Lange  et  al.,  VEE’  11  

Page 21: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

BQ: Deployment Scale and Speed

  Moving  images  is  the  main  component  of  VM  deployment    

  Challenge:  make  image  deployment  faster  

  LANTorrent:  the  BitTorrent  principle  on  a  LAN  –  Streaming  

–  Minimizes  conges6on  at  the  switch  

  Detec6ng  and  elimina6ng  duplicate  transfers  

  BoBom  line:  a  thousand  VMs  in  10  minutes  

15/06/12  21  

EvaluaDon  using  the    Magellan  resource    At  Argonne  NaDonal  Laboratory  

  Other  approaches:    –  Nicolae  et  al.,  HPDC’11  –  Riteau  et  al.,  QCOW  

Page 22: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

BQ: Resource Management

  Challenge:  high  performance  versus  cost    Mul6-­‐tenancy  

–  Noisy  neighbor  –  Interleaving  needs  –  Not  suitable  for  HPC  with  current  methods  

  Single-­‐tenancy  –  U6liza6on  challenge  –  Preemp6ble  instances  and  spot  instances:  increase  u6liza6on  

without  sacrificing  the  ability  to  respond  to  on-­‐demand  requests  

–  Preemp6ble  with  snapshoang:  increase  u6liza6on  without  sacrificing  the  ability  to  resume  computa6on  

15/06/12  22  

Paper:  Sotomayor  et  al.,  HPDC’08  

Page 23: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

BQ: Resource Management (2)

15/06/12  23  

Preemp/on  Enabled  Average  u6liza6on:  83.82%    Maximum  u6liza6on:  100%  

Preemp/on  Disabled  Average  u6liza6on:  36.36%    Maximum  u6liza6on:  43.75%  

Infrastructure  U6liza6on  (%)  Infrastructure  U6liza6on  (%)  

From  Marshall  et  al.,  CCGrid’11  

Page 24: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

BQ: Open Questions

  Hypervisors  –  Rapid  progress  but  s6ll  many  tradeoffs  to  examine  –  Need  robust,  open  source  solu6ons  

  Resource  management  –  Combine  various  concerns  to  allow  users  to  build  berer  virtual  clusters  

–  U6liza6on,  policies,  energy  saving,  etc.    Storage  management  

–  Service  levels  –  Combining  storage  and  compute  clouds  

  Image  management  

15/06/12  24  

Page 25: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Why Do We Care?

  Makeup  of  Top500  by  Franck:  –  87%  commodity  clusters  (similar  to  Amazon  CC)  –  13%  luxury  compu6ng  

  Cloud  compu6ng  creates  a  “cri6cal  mass”  –  Arracts  applica6ons  and  investment  

  Can  we  afford  to  not  support  it?  

15/06/12  25  

Page 26: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Open Issues

  Performance  characteriza6on  –  Conclusive,  easy  to  run,  and  lightweight  –  Reflec6ng  both  applica6on  requirements  and  platorm  idiosyncrasies  

–  “Cloud500”    Filling  the  gap  between  theory  and  prac6ce    “Shining  star”  demo  

  Models      Opportuni6es:  big  data  

15/06/12  26  

Page 27: HPC Cloud: High Performance Computing in Infrastructure CloudsCloud+ANL+v1.pdf · 2012-06-18 · HPC Cloud: High Performance Computing in Infrastructure Clouds Kate%Keahey% keahey@mcs.anl.gov%

Build-a-Collaboration

  KerData  team  (Rennes)  

  Myriads  team  (Rennes)    Avalon  team  (Lyon)    LBNL    Northwestern    University  of  Colorado    ISI    OpenCirrus  (interna6onal)  

15/06/12  27