A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters


Page 1: A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters

A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters

Miguel G. Xavier, Marcelo V. Neves, Cesar A. F. De Rose
[email protected]

Faculty of Informatics, PUCRS, Porto Alegre, Brazil

February 13, 2014

Page 2

Outline

• Introduction
• Container-based Virtualization
• MapReduce
• Evaluation
• Conclusion

Page 3

Introduction

• Virtualization
  • Allows resources to be shared
  • Hardware independence, availability, isolation and security
  • Better manageability
  • Widely used in datacenters/cloud computing
• MapReduce clusters and virtualization
  • Usage scenarios
    • Better resource sharing
    • Cloud computing
• However, hypervisor-based technologies have traditionally been avoided in MapReduce environments

Page 4

Container-based Virtualization

• A group of processes on a Linux box, put together in an isolated environment
• A lightweight virtualization layer
• Non-virtualized drivers
• Shared operating system

[Figure: layer diagrams. Container-based virtualization: Hardware → Host OS → Virtualization Layer → Guest Processes. Hypervisor-based virtualization: Hardware → Virtualization Layer → Guest OS → Guest Processes.]

Page 5

Container-based Virtualization

• Each container has:
  • Its own network interface (and IP address)
    • Bridged, routed, …
  • Its own filesystem
• Isolation (security)
  • Containers A and B can't see each other
• Isolation (resource usage)
  • RAM, CPU, I/O
• Current systems
  • Linux-VServer, OpenVZ, LXC

Page 6

Container-based Virtualization

• Implements Linux namespaces
  • Mount – mounting/unmounting file systems
  • UTS – hostname, domainname
  • IPC – SysV message queues, semaphores, shared memory segments
  • Network – IPv4/IPv6 stacks, routing, firewall, /proc/net, sockets
  • PID – own set of PIDs
  • Chroot acts as a filesystem namespace
• Current systems
  • Linux-VServer, OpenVZ, LXC
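The namespace types above are requested through the unshare(2) system call. A minimal Python sketch via ctypes (not from the slides; flag values come from <linux/sched.h>, and actually entering a namespace requires root/CAP_SYS_ADMIN):

```python
import ctypes
import os

# Clone flags for the namespace types listed above (from <linux/sched.h>)
CLONE_NEWNS  = 0x00020000  # Mount: private view of mounted file systems
CLONE_NEWUTS = 0x04000000  # UTS: own hostname/domainname
CLONE_NEWIPC = 0x08000000  # IPC: SysV queues, semaphores, shared memory
CLONE_NEWPID = 0x20000000  # PID: own set of process IDs
CLONE_NEWNET = 0x40000000  # Network: own IPv4/IPv6 stacks, routing, firewall

def unshare(flags):
    """Move the calling process into new namespaces (requires CAP_SYS_ADMIN)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

# Example (as root): give this process a private hostname namespace.
# unshare(CLONE_NEWUTS)
```
Container systems such as LXC combine several of these flags at once so the container gets its own mounts, hostname, IPC objects, PIDs and network stack.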

Page 7

Container-based Systems

• Linux-VServer
  • Implements its own features in the Linux kernel
  • Limits the scope of the file system for different processes through the traditional chroot
• OpenVZ
• Linux Containers (LXC)
  • Based on cgroups

Page 8

Hypervisor- vs Container-based Systems

Hypervisor                    Container
Different kernel per OS       Single kernel
Device emulation              Syscalls
Many FS caches                Single FS cache
Limits per machine            Limits per process
High performance overhead     Low performance overhead

Page 9

MapReduce

• MapReduce
  • A parallel programming model
  • Simplicity, efficiency and high scalability
  • It has become a de facto standard for large-scale data analysis
• MapReduce has also attracted the attention of the HPC community
  • A simpler approach to the parallelism problem
  • Highly visible cases where MapReduce has been successfully used by companies like Google, Yahoo!, Facebook and Amazon
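The programming model can be illustrated with the canonical word-count example. This is a minimal single-process sketch (real frameworks such as Hadoop distribute the same phases across a cluster):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts emitted for each word
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(documents):
    pairs = chain.from_iterable(map_phase(d) for d in documents)
    return reduce_phase(shuffle(pairs))
```
For example, `word_count(["a b a", "b c"])` yields `{"a": 2, "b": 2, "c": 1}`.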

Page 10

MapReduce and Containers

• Apache Mesos
  • Shares a cluster between multiple different frameworks
  • Creates another level of resource management
  • Management is taken away from the cluster's RMS
• Apache YARN
  • Hadoop Next Generation
  • Better job scheduling/monitoring
  • Uses virtualization to share a cluster among different applications

Page 11

Evaluation

• Experimental environment
  • Hadoop cluster composed of 4 nodes
  • Two processors with 8 cores (without threads) per node
  • 16 GB of memory per node
  • 146 GB of disk per node
• Analysis of performance
  • Through micro-benchmarks
    • HDFS evaluation (TestDFSIO)
    • NameNode evaluation (NNBench)
    • MapReduce evaluation (MRBench)
  • Through macro-benchmarks (WordCount, TeraSort)
• Analysis of isolation
  • Through the IBS benchmark
• At least 50 executions were performed for each experiment
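With at least 50 runs per experiment, each reported number summarizes a distribution. A small illustrative helper for that aggregation (hypothetical, not the authors' tooling):

```python
import statistics

def summarize(run_times):
    """Mean and sample standard deviation of repeated benchmark runs."""
    return statistics.mean(run_times), statistics.stdev(run_times)
```
For example, `summarize([10.0, 12.0, 14.0])` returns a mean of 12.0 with a standard deviation of 2.0.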

Page 12

HDFS Evaluation

• Settings:
  • Replication of 3 blocks
  • File size from 100 MB to 3000 MB
• All container-based systems have performance similar to native
• The results for OpenVZ show a loss of 3 Mbps
  • This is due to the CFQ scheduler

[Figure: TestDFSIO throughput (Mbps) vs. file size for lxc, native, ovz and vserver]
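TestDFSIO-style throughput figures follow from the volume of data moved and the elapsed time. A minimal sketch of that calculation (function name is illustrative, not TestDFSIO's actual output):

```python
def throughput_mbps(megabytes_moved, elapsed_seconds):
    """Sustained throughput in megabits per second (1 MB = 8 Mb)."""
    return megabytes_moved * 8.0 / elapsed_seconds
```
For instance, writing a 3000 MB file in 1000 s corresponds to 24 Mbps.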

Page 13: A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters

HDFS  Evalua8on    •  All  of  Container-­‐based  

systems  obtained  performance  results  similar  to  na8ve    

 •  Linux-­‐VServer  uses  a  

Physical-­‐based  network    

[Figure: TestDFSIO throughput (Mbps) vs. file size for lxc, native, ovz and vserver]

Page 14

NameNode Evaluation using NNBench

• The NNBench benchmark was chosen to evaluate the NameNode component
• Linux-VServer reaches an average latency of 48 ms, while LXC obtained the worst result with an average of 56 ms
• The differences are not so significant if the absolute numbers are considered
• However, the strengths are that no exception was observed during the high HDFS management stress, and that all systems were able to respond as effectively as the native system

                     Native   LXC     OpenVZ   VServer
Open/Read (ms)       0.51     0.52    0.51     0.49
Create/Write (ms)    54.65    56.89   51.96    48.90

• Generates operations on 1000 files on HDFS

Page 15

MapReduce Evaluation using MRBench

• The results obtained from MRBench show that the MapReduce layer suffers no substantial effect while running on the different container-based virtualization systems

                  Native   LXC     OpenVZ   VServer
Execution time    14251    13577   14304    13614
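The table above can be read as relative overhead with respect to the native run; a small illustrative helper (hypothetical, using the MRBench numbers above):

```python
def overhead_pct(native, system):
    """Execution-time difference relative to native, in percent
    (negative values mean the system ran faster than native)."""
    return 100.0 * (system - native) / native

# MRBench execution times from the table above
native = 14251
systems = {"LXC": 13577, "OpenVZ": 14304, "VServer": 13614}
overheads = {name: round(overhead_pct(native, t), 2) for name, t in systems.items()}
```
All three systems stay within about 5% of native, consistent with the slide's conclusion.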

Page 16

Analyzing Performance with WordCount

[Figure: WordCount execution time (seconds) for Native, LXC, OpenVZ and VServer]

• 30 GB of input data
• The peak of performance degradation for OpenVZ is explained by the I/O scheduler overhead

Page 17

Analyzing Performance with TeraSort

[Figure: TeraSort execution time (seconds) for Native, LXC, OpenVZ and VServer]

• Standard map/reduce sort
• Steps:
  • Generate 30 GB of input data
  • Run TeraSort on that input data
• An HDFS block size of 64 MB

Page 18

Performance Isolation

• First, the baseline application runs alone in container A and its execution time is recorded
• Then, the baseline application runs in container A while a stress test runs in container B, and the execution time is recorded again
• The performance degradation (%) is computed from the two execution times
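The measurement protocol above can be sketched as follows (a minimal illustration; on a real cluster the timings come from the benchmark itself):

```python
import time

def measure(workload):
    """Wall-clock execution time of a callable workload, in seconds."""
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start

def degradation_pct(baseline, stressed):
    """Slowdown of the baseline application when a stress test runs
    in the neighbouring container, in percent."""
    return 100.0 * (stressed - baseline) / baseline
```
For example, a run that takes 100 s alone and 108.3 s next to a memory stress test shows 8.3% degradation, matching the LXC memory figure on the next slide.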

Page 19

Performance Isolation

        CPU   Memory   I/O    Fork bomb
LXC     0%    8.3%     5.5%   0%

• We chose LXC as the representative container-based virtualization system to be evaluated
• The limiting of CPU usage per container works well
  • No significant impact was noted
• A little performance degradation (memory and I/O) needs to be taken into account
• The fork bomb stress test reveals that LXC has a security subsystem that ensures feasibility

Page 20

Conclusions

• We found that all container-based systems reach near-native performance for MapReduce workloads
• The performance isolation results revealed that LXC has improved its capabilities for restricting resources among containers
• Although some works are already taking advantage of container-based systems on MapReduce clusters, this work demonstrated the benefits of using container-based systems to support MapReduce clusters

Page 21

Future Work

• We plan to study performance isolation at the network level
• We plan to study scalability while increasing the number of nodes
• We plan to study aspects of green computing, such as the trade-off between performance and energy consumption

Page 22

Thank you for your attention!