
89 Fifth Avenue, 7th Floor

New York, NY 10003

www.TheEdison.com

212.367.7400

 

 

 

 

 

 

 

 

White Paper

HP 3PAR Thin Deduplication:
A Competitive Comparison


Printed in the United States of America

Copyright 2014 Edison Group, Inc. New York.

Edison Group offers no warranty, either expressed or implied, on the information contained herein and shall be held harmless for errors resulting from its use.

All products are trademarks of their respective owners.

First Publication: June 2014

Produced by: Chris M. Evans, Senior Analyst; Manny Frishberg, Editor; Barry Cohen, Editor-in-Chief


Table of Contents

Executive Summary
Introduction
    Objective
    Audience
    Contents of this Report
Space Optimization in Primary Storage
    Data Deduplication
        Technical Features
    Managing Resiliency
    Making the Cost of Flash Acceptable
    Anticipated Space Savings
HP 3PAR Thin Deduplication: Deep Dive
    Background
        Hardware Acceleration
    Thin Deduplication Implementation
    Express Indexing
    Thin Clones
    Space Savings and Write Efficiency
Competitive Analysis
    SolidFire Storage System
    Pure Storage FlashArray
    EMC XtremIO
Conclusions and Recommendations
    Interpreting Savings


Executive Summary

As data growth continues at exponential rates, IT departments are being asked to deliver storage at ever-increasing levels of efficiency – the classic "do more with less" dilemma. At the same time, traditional storage arrays are failing to keep up with I/O density requirements, and customers are transitioning to all-flash systems, which have a much higher raw $/GB price point. Space reduction technologies such as thin provisioning, compression and data deduplication form a key strategy in all-flash systems by helping businesses meet their storage needs while driving high levels of efficiency.

HP 3PAR StoreServ's thin deduplication feature continues the story of delivering value to customers through optimizing the way their shared storage systems store data. Thin deduplication further leverages HP 3PAR's custom application-specific integrated circuit (ASIC) to minimize the impact of performing deduplication inline as data is written to the array. Strong data integrity is maintained through additional integrity checks on every deduplicated write, a process achieved at line speed using the ASIC technology.

HP 3PAR StoreServ thin deduplication is the latest feature in a line of thin technologies, including thin provisioning, thin persistence and thin reclaim, that deliver value and cost savings to the customer. Each of these technologies is built into the 3PAR StoreServ architecture.

In this study, HP 3PAR StoreServ was compared to competing all-flash offerings from SolidFire, Pure Storage and EMC. All of the solutions offer inline (real-time) deduplication, although FlashArray from Pure Storage also performs some post-processing of data. Both SolidFire and Pure Storage integrate compression into their space saving technologies (and into their savings figures). Only HP 3PAR and Pure Storage offer additional data integrity checking through hash verification.

From thin deduplication alone (not including Zero Page Detect), HP 3PAR StoreServ achieves up to 10:1 savings, depending on the data type in use. This exceeds the figures claimed by the three competing platforms, two of which also include compression and pattern detection in their calculated figures.

In summary, thin deduplication, added to the existing set of thin technologies, extends HP 3PAR StoreServ's leadership in offering customers highly efficient, highly scalable primary storage for every enterprise requirement.


Introduction

Objective

This report looks at the implementation of data deduplication on the HP 3PAR StoreServ storage platform and compares its features and functionality to equivalent products in the marketplace today. The constant drive to "do more with less" means all space reduction technologies are valuable tools for increasing the efficiency of primary storage arrays. The ubiquity of flash, as we will discuss, means primary deduplication is ready for production implementation.

Audience

Decision makers in organizations looking to deliver highly efficient deployments of centralized storage will find that this report provides an understanding of the technical issues in deploying deduplication and the resultant benefits it can deliver.

Contents of this Report

• Executive Summary – A summary of the background and conclusions derived from Edison's research and analysis.

• Space Optimization in Primary Storage – A primer on the evolution of shared storage and the space saving techniques that help to manage exponential growth.

• HP 3PAR Thin Deduplication: Deep Dive – An in-depth discussion of the features and functionality of the HP 3PAR StoreServ thin deduplication feature.

• Competitive Analysis – An examination of the implementation of deduplication in competitive storage platforms, with comparison to HP 3PAR StoreServ.

• Conclusions and Recommendations – A summary of the findings from the research.

   


Space Optimization in Primary Storage

The exponential rate of data growth has been a significant challenge for many organizations to manage since the introduction of shared storage over 20 years ago. Demand for storage is insatiable, with estimates of growth varying from 50-100 percent per annum. To help manage growth, storage vendors have implemented software features that optimize the use of physical storage capacity. These include:

• Thin Provisioning – a space reduction technique that stores only host-written data to disk. Space savings are made by storing only the actual data written to each volume, rather than reserving the whole capacity of the volume as "thick" provisioned implementations do. Thin provisioning solutions can save anywhere from 35-75 percent of physical disk capacity, depending on the data profile; however, ongoing housekeeping is required to keep efficiency at optimum levels. HP 3PAR StoreServ systems see an average of 65 percent savings based on field data.

• Zero Page Reclaim – a space reduction technique that identifies pages of empty or "zeroed" data and removes them from physical disk, retaining metadata to indicate that the logical page in the volume is empty. Most solutions use post-processing zero page reclaim (ZPR), as the overhead of identifying empty pages in real time impacts I/O performance. However, the HP 3PAR StoreServ platform is unique in using a dedicated ASIC processor that identifies and eliminates zero pages in real time (known as Inline Zero Detect), reducing disk I/O and saving disk capacity. A minimal sketch of this check follows this list.

• Data Compression – a space reduction technique that identifies repeated patterns or redundancy in data and removes it, leaving in place metadata that allows the original information to be recreated. Although compression can deliver significant savings, the processor overhead means many vendors have chosen not to implement the technology.

• Space Efficient Snapshots and Clones – although not directly a space reduction technique, snapshots and clones of primary data can be taken space-efficiently, using metadata to track the differences between the primary volume and the snapshots. On some architectures there are performance implications from using snapshots, and some also require space to be reserved for a snapshot pool; no such restrictions exist on the HP 3PAR StoreServ platform.
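The following minimal Python sketch (purely illustrative, not HP 3PAR code) shows the zero-page check referenced above: a page that is entirely zero-filled never needs to reach physical media, only a metadata marker recording that the logical page is empty. On HP 3PAR StoreServ the comparison itself is offloaded to the ASIC so it can run inline at line speed; the 16KiB page size below simply mirrors the page granularity described later in this report.

PAGE_SIZE = 16 * 1024          # 16 KiB page granularity (assumption for illustration)
ZERO_PAGE = bytes(PAGE_SIZE)   # a page of all zeros

def write_page(volume_map, lba, page):
    """Store a host write, skipping physical allocation for zeroed pages."""
    assert len(page) == PAGE_SIZE
    if page == ZERO_PAGE:            # on 3PAR this compare is performed inline by the ASIC
        volume_map[lba] = None       # metadata only: the logical page is empty
        return "zero page detected, nothing written to backend"
    volume_map[lba] = page           # a real array would allocate and write to flash here
    return "written to backend"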


Data Deduplication

Deduplication is a space reduction technique that identifies redundant or duplicate data in physical storage, removing the redundant copies so that a single copy of the data is retained on disk. Metadata (in the form of lookup tables in memory) is used to map logical volumes to the single-instance copies of data. Significant savings in physical disk capacity can be achieved where systems contain large amounts of similar or repeated data, such as virtual server and virtual desktop environments. To date, deduplication has been widely used in disk backup systems, where savings of 90-95 percent (up to a 20:1 reduction in physical capacity) have been realized.
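As a simple illustration of this mapping (a hedged sketch, not any vendor's implementation), the Python below keeps one physical copy per unique block fingerprint and a per-volume lookup table that maps logical block addresses onto those single-instance copies:

import hashlib

class DedupStore:
    """Illustrative single-instance store: one stored copy per unique block."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> the single stored copy of the data
        self.volumes = {}   # volume name -> {logical block address: fingerprint}

    def write(self, volume, lba, data):
        fp = hashlib.sha256(data).hexdigest()      # fingerprint of the block
        self.blocks.setdefault(fp, data)           # store only if not already present
        self.volumes.setdefault(volume, {})[lba] = fp

    def read(self, volume, lba):
        return self.blocks[self.volumes[volume][lba]]

The savings come from the setdefault step: when many volumes write identical blocks (virtual machine golden images, for example), only one physical copy is kept, while each volume's metadata simply points at it.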

Technical Features

Some of the technical features of data deduplication include:

• Inline/Post Processing – data deduplication can be performed either as data is being committed to disk, in which case it is known as inline, or after the data is on disk, so-called post processing. Inline processing requires fast, efficient algorithms in order to minimize any impact on performance, with the added benefit that space savings are realized immediately. Post processing removes any direct performance impact; however, physical disk space usage will vary as data is written to disk and deduplication is performed as a background task.

• Fixed/Variable Block Size – deduplication techniques identify potentially duplicate data using either fixed or variable block sizes. Variable block algorithms typically produce higher deduplication ratios than fixed-block solutions but require more processing overhead. Smaller fixed block sizes also tend to produce better results, but cost more in processor overhead and system memory through additional metadata lookups (see the sketch after this list).

• Data Hashing – hashing refers to the process of generating a compact checksum value from a block of data. The hash value from each block is used as the fingerprint to reference that data in metadata tables and when comparing new data for deduplication. Hashing techniques vary in their reliability, with some algorithms generating the same hash value for different data, known as a "hash collision". There is a balance to be struck between the complexity of the hash algorithm and the impact on performance, so some implementations use lightweight hashing and validate all data before confirming duplicates.

• Data Profile – the deduplication of data results in a more randomized pattern of access for a single volume, as the original physical locations for blocks of data are not determined by the logical volume layout. Random data access is more difficult for HDD-based storage arrays to manage, as random I/O results in significant latency from mechanical disk head movement. Flash storage, on the other hand, has no such issues, making this technology highly suited to managing deduplicated data.
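The trade-off described in the Fixed/Variable Block Size item can be seen in a few lines of illustrative Python (an assumption-laden sketch, not tied to any particular array): halving the block size multiplies the number of fingerprints, and therefore the metadata, that the system must track for the same host write.

import hashlib

def fingerprints(data, block_size):
    """Split a write into fixed-size blocks and fingerprint each one."""
    pad = (-len(data)) % block_size          # pad the tail so every block is full-size
    data = data + bytes(pad)
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

payload = bytes(64 * 1024)                    # a hypothetical 64 KiB host write
print(len(fingerprints(payload, 16 * 1024)))  # 4 metadata entries at a 16 KiB block size
print(len(fingerprints(payload, 4 * 1024)))   # 16 metadata entries at a 4 KiB block size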

Managing Resiliency

In systems that are highly deduplicated, a single block of data may be a component of tens or hundreds of logical volumes. As a result, the impact of losing data due to a hardware failure is much higher than in non-deduplicated environments. Data loss could occur through logical corruption (due to a software bug) or through hardware failure (such as two disks failing in a RAID group using single parity). Some deduplication implementations are enabled by default and cannot be disabled by the administrator, which may be undesirable for certain data types.

Making the Cost of Flash Acceptable

All-flash arrays are a recent entrant into the shared storage marketplace. These appliances use flash exclusively as the permanent storage medium. Flash is much more expensive per GB than traditional hard drives, and as a result, vendors of these products have looked for ways to make the cost of all-flash arrays more acceptable against the historical $/GB measurement.

One approach has been to quote array capacities after space reduction savings have been applied. The result is a much more palatable cost that is more in line with traditional disk-based arrays. However, basing purchasing decisions on anticipated space savings can be risky unless the data profile is well known or validated first.
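The arithmetic behind such "effective" pricing is straightforward. Using purely hypothetical numbers (not taken from any vendor's price list):

\[
\text{effective } \$/\mathrm{GB} = \frac{\text{raw } \$/\mathrm{GB}}{\text{data reduction ratio}},
\qquad \text{e.g. } \frac{\$10/\mathrm{GB}}{5} = \$2/\mathrm{GB}.
\]

The risk described above is that the denominator is an estimate: if the real data only reduces 2:1, the effective cost doubles.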

Anticipated Space Savings

The aim of deduplication is to make savings on physical disk space. Savings vary with the type of data being optimized, with highly redundant data such as virtual server and VDI (Virtual Desktop Infrastructure) deployments seeing the best results. Structured data, encrypted data and media content do not usually realize much in the way of savings, as the data has usually already been optimized by the application. Data savings may also change over time as information is created and destroyed through its normal lifecycle. The savings made from deduplication should therefore be seen as an additional benefit rather than a core capacity measurement.


HP 3PAR Thin Deduplication: Deep Dive

Background

The HP 3PAR StoreServ architecture is based on a cache-coherent active-mesh cluster comprised of multiple controller nodes and disk shelves. All controllers participate in data access in an "active-active" configuration, ensuring that the resources of all nodes are used to service I/O requests. The HP 3PAR OS uses a three-level mapping methodology, similar to that used in enterprise operating systems, to store and track physical and virtual resources. With the introduction of flash technology, the HP 3PAR StoreServ architecture is ideally placed to exploit faster storage media through features that include the existing range of thin technologies.

Physical space on backend storage is divided into 1GB units known as chunklets. Chunklets are then combined to create logical disks (LDs), applying data protection (RAID) and data placement rules to each LD. Virtual volumes (VVs), or logical unit numbers (LUNs), are then created out of logical disks; these are the entities assigned to hosts, using a page size granularity of 16KiB. Data resilience is achieved by distributing data across multiple nodes, disk shelves and disks.
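For a sense of scale, and treating the 1GB chunklet as a binary 1 GiB (an assumption made only for this illustration), each chunklet contains

\[
\frac{1\,\mathrm{GiB}}{16\,\mathrm{KiB}} = \frac{2^{30}}{2^{14}} = 2^{16} = 65{,}536 \text{ pages,}
\]

which is the granularity at which the zero-detect and deduplication features discussed below operate.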

Hardware Acceleration

One of the key differentiators of the 3PAR StoreServ platform is the use of a custom hardware controller, or ASIC. The ASIC, now in its fourth generation, provides line-speed zero page detect for each 16KiB block of data written to the array. It is a core technology in delivering the existing 3PAR StoreServ thin technologies, including thin provisioning, thin persistence, thin conversion and thin copy reclamation.

Thin Deduplication Implementation

Thin deduplication is a new feature initially implemented on HP 3PAR StoreServ 7450 Storage Systems deployed with the generation four ASIC or later. The feature is provided as a no-cost option within the base HP 3PAR OS suite, giving customers the option of immediate cost savings at no additional charge. Thin deduplication is available for both virtual volumes and snapshots.

Thin deduplication is an inline deduplication process that takes advantage of the generation four ASIC to perform hash calculations on each 16KiB block of data as it is written to the system. When data is received by the system, the hash calculation effort is offloaded to the ASIC and delivered at wire speed. The array then uses a feature called Express Indexing to check whether the new data already exists in the system. If a hash match is found, the ASIC is used to do a bit-by-bit comparison of the new data with the copy on the backend flash to ensure no hash collision has occurred. As this function is offloaded to the ASIC and performed at line speed, there is negligible CPU overhead.

Express Indexing

The HP 3PAR operating system uses a process called Express Indexing to detect duplicate page data. The process takes advantage of the innovative and robust tri-level indexing system used within the OS to store and manage traditional (non-deduplicated) volumes.

When data is received by the array, Express Indexing calculates a hash value for each 16KiB block of data. The hash value is then used to check whether the new data block already exists on the system by "walking" the metadata tables using the hash value. If a matching block of data is located, it is read from the backend and compared at the bit level (using XOR) in the ASIC: the XOR of two equal pages results in a page of zeros, which is detected inline by the ASIC's built-in zero-detection engine.

A successful comparison results in a "dedupe hit", in which case the virtual volume LBA pointers are updated to reference the located data and the incoming data is discarded. In the unlikely event a hash collision is detected, the data is stored to disk directly associated with the virtual volume and is not treated as deduplicated. If the new data is not located at lookup, a new data block is allocated and the data is written to backend storage.
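The decision path just described can be summarized in a short, hedged Python sketch (purely illustrative; the function and structure names are invented for this example, and the hash and XOR steps stand in for work the 3PAR ASIC performs in hardware):

import hashlib

PAGE = 16 * 1024   # 16 KiB pages

def pages_match(a, b):
    """Bit-for-bit comparison via XOR: two equal pages XOR to a page of zeros."""
    return not any(x ^ y for x, y in zip(a, b))

def dedup_write(index, backend, vv_pointers, lba, page):
    """index: hash -> block address; backend: list of stored pages;
    vv_pointers: this volume's LBA -> block address metadata."""
    h = hashlib.sha256(page).digest()        # hash calculated per 16 KiB page
    addr = index.get(h)                      # "walk" the metadata tables by hash
    if addr is not None:
        stored = backend[addr]               # read the candidate from backend flash
        if pages_match(stored, page):
            vv_pointers[lba] = addr          # dedupe hit: update the LBA pointer,
            return "dedupe hit"              # incoming data is discarded
        new_addr = len(backend)              # hash collision: store the data
        backend.append(page)                 # privately against this volume,
        vv_pointers[lba] = new_addr          # not treated as deduplicated
        return "hash collision"
    new_addr = len(backend)                  # unique data: allocate a new block
    backend.append(page)
    index[h] = new_addr
    vv_pointers[lba] = new_addr
    return "new block written"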

With this innovative technique, the HP 3PAR StoreServ solution makes efficient use of existing memory structures to track unique and deduplicated data and map it to virtual volumes. Because of the 3PAR memory structure design, there is no need to keep reference counts to shared data: any unreferenced data is eventually cleaned up by an online garbage collection process using a "mark and sweep" algorithm.
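A hedged sketch of that garbage collection idea (again illustrative Python with invented structure names, not HP 3PAR internals): mark every block address still referenced by any virtual volume's metadata, then sweep away the rest, so no per-block reference counters are ever maintained.

def mark_and_sweep(volumes, backend):
    """volumes: {volume name: {lba: block address}}; backend: {block address: data}."""
    # Mark phase: collect every block address still referenced by volume metadata.
    marked = {addr for mapping in volumes.values() for addr in mapping.values()}
    # Sweep phase: reclaim any stored block that nothing references any more.
    for addr in list(backend):
        if addr not in marked:
            del backend[addr]
    return marked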

Thin Clones

The abstraction of logical and physical volume content through deduplication provides the ability to implement features such as thin clones. A thin clone is a replica of a volume that is created by copying only the metadata that associates a virtual volume with the physical data on disk. At initial creation, thin clones point to the same blocks of data as the cloned volume; however, as volumes are updated and the content of data changes, new writes will map to different deduplicated blocks (or create new blocks), so no direct overwrite process occurs. Thin clones continue to "stay thin" if updated data continues to map to existing deduplicated data on the array.

Thin clones allow HP 3PAR StoreServ to implement highly efficient, instant volume copies for hypervisor cloning functions such as VAAI on VMware vSphere and ODX on Microsoft Hyper-V.
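Continuing the illustrative Python from the Express Indexing sketch (invented names, not vendor code), a thin clone amounts to copying a volume's LBA-to-block mapping and nothing else; later writes simply remap individual entries in the clone:

def thin_clone(volumes, source, clone_name):
    """Create a clone by copying only the LBA -> block address metadata.
    The physical deduplicated blocks themselves are shared, not copied."""
    volumes[clone_name] = dict(volumes[source])
    return volumes[clone_name]

def overwrite(volumes, name, lba, new_block_addr):
    # A later write to the clone just remaps that one LBA; the source volume's
    # mapping and the shared blocks are untouched, so the clone "stays thin"
    # for as long as its data continues to match existing deduplicated blocks.
    volumes[name][lba] = new_block_addr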

Space Savings and Write Efficiency

HP 3PAR Thin Deduplication has been shown to deliver savings of up to 10:1, depending on the source data. This matches the levels of savings claimed by other all-flash storage vendors. HP has also researched the difference between using the 3PAR StoreServ platform's default 16KiB block size and the smaller 4KiB size used by other platforms. Tests showed only a modest improvement in savings of less than 15 percent. As a result, HP chose to remain with the existing 16KiB block size, as this results in the optimum use of processor and memory resources.

HP also looked at telemetry data from tens of thousands of existing customer systems. This showed that the sweet spot for deduplication lies between 8KiB and 16KiB block sizes. Smaller values saw some modest improvement in savings but introduced higher system load.

HP 3PAR StoreServ's write striping capability means that write I/O is distributed evenly across SSDs, reducing the risk of catastrophic device failure. HP provides a 5-year unconditional warranty on cMLC drives in StoreServ systems.

Inline Zero Detect means zeroed data is removed from the I/O pipeline and not written to backend storage, further reducing wear on SSD devices. Finally, features such as Adaptive Write and Adaptive Sparing provide additional SSD management, extending usable SSD capacity by a further 20 percent.

All of the features described are fully integrated with the new thin deduplication technology.

   


Competitive Analysis

Data deduplication has not been widely adopted in the primary storage marketplace; however, the all-flash array vendors have made the technology part of their new architecture designs. The notable exception in terms of early adoption is NetApp, which introduced deduplication technology into Data ONTAP as early as 2007. Unfortunately, that implementation was based on post-processing of data, and it consequently limited aggregate sizes due to the performance impact of the post-processing task.

In the all-flash startup market, deduplication has become a "table stakes" feature, with vendors looking to emphasize the effective cost per GB of their products after space saving techniques have been applied. This has caused problems for Violin Memory, which has no native space reduction technologies in its products.

Three vendors offering deduplication have been chosen for comparison with the HP 3PAR StoreServ technology: SolidFire's Storage System, Pure Storage FlashArray and EMC XtremIO. All of these systems are new technology from startups and therefore have deduplication built into their architectures.

SolidFire Storage System

SolidFire's Storage System has been available since 2012, evolving through three generations of hardware and six generations of the platform's Element operating system. The SolidFire architecture is a scale-out, "shared nothing", loosely coupled node design that uses a back-end 10GbE network for inter-node communication. Systems can expand and shrink by adding and removing nodes. Data protection is implemented through simple mirroring of data between nodes.

SolidFire uses a content-based data placement algorithm to distribute data evenly across a node complex. Space reduction is achieved through a combination of data deduplication and compression. As data is received by the system, it is divided into 4KiB blocks and compressed before being hashed. The content is then routed to the node responsible for managing that range of hash values. If the new data is found to be a duplicate, a reference to the existing content is stored against the volume and the incoming data is discarded; if the data is unique, it is written to SSD. Candidate duplicates are not verified against the stored copy before the incoming data is discarded.
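A hedged Python sketch of that content-based placement (illustrative only; SolidFire's actual compression codec, hash function and routing scheme are not documented here, so zlib, SHA-256 and a simple modulo are stand-ins):

import hashlib
import zlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical 4-node cluster
BLOCK = 4 * 1024                                    # 4 KiB blocks

def place(block):
    """Compress, fingerprint, then route the block to the node owning its hash range."""
    assert len(block) == BLOCK
    compressed = zlib.compress(block)               # compression happens before hashing
    fp = hashlib.sha256(compressed).digest()
    owner = NODES[fp[0] % len(NODES)]               # fingerprint decides the owning node
    return owner, fp, compressed

Because placement is decided by the fingerprint rather than by the volume layout, identical content naturally lands on the same node, which is what allows each node to deduplicate its own slice of the hash space.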

Page 13: Edison HP 3PAR Primary Deduplication White Paper Fifth Avenue, 7th Floor New York, NY 10003  212.367.7400 !!!!! WhitePaper&!! HP3PAR&Thin&Deduplication:& ACompetitiveComparison&

 

 

Edison:  HP  3PAR  StoreServ  Thin  Deduplication  A  Competitive  Comparison       Page  10  

Compressing data as it is written to the system results in blocks of variable length, which are then written in a tightly packed arrangement on backend storage. This means that as data is expired from the system, housekeeping is required to reclaim usable space and restack content on the physical media.

SolidFire delivers inline deduplication based on a 4KiB block size, and it is always enabled. The company claims between 4:1 and 10:1 efficiency savings based on both compression and deduplication, although no breakdown by method is given.

Pure Storage FlashArray

Pure Storage released its first FlashArray product in May 2012. The system is built on a scale-up architecture consisting of dual active-active redundant controller nodes and shelves of solid-state disk (SSD).

FlashArray uses five different techniques for data reduction,1 known collectively as "FlashReduce". The components are:

• Pattern Removal – this looks for repeated patterns in data, including identifying zeroed data.

• Inline Compression – this process uses a lightweight implementation of the LZO (Lempel-Ziv-Oberhumer) algorithm and is a "first pass" at compression, performed inline before data is committed to disk.

• Adaptive Inline Deduplication – deduplication is performed inline using a variable-size block deduplication algorithm, based on blocks from 4KiB to 32KiB in 512-byte increments (the minimum size is based on SSD page writes, which are 4KiB).

• Deep Reduction – this process uses a patent-pending form of the Huffman encoding algorithm and is performed as a post-processing task to achieve more aggressive space savings.

• Copy Reduction – all snapshots and clones in a FlashArray system are deduplication-aware. This capability is also implemented in the HP 3PAR StoreServ platform.

Deduplication is always enabled within FlashArray systems; however, the architecture allows the deduplication process to be curtailed during periods of heavy system load. In this scenario, hash lookups may be abandoned and potentially duplicate data written to disk. As a result, FlashArray uses the Deep Reduction feature to identify missed deduplication opportunities and to apply compression more aggressively than could be achieved inline.

1 http://www.purestorage.com/blog/pure-storage-flash-bits-adaptive-data-reduction/


FlashArray deduplication cannot be disabled on a per-volume basis; all volumes have deduplication applied to them. Pure Storage quotes its space savings using a "real-time" ticker on its website, which shows savings based on information reported by customer arrays. This shows an overall reduction rate of 5.72:1, with 2.13:1 achieved from deduplication and 2.68:1 from compression.
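As a consistency check on those figures, the individual ratios combine multiplicatively, and their product is indeed close to the quoted overall rate (the small gap is rounding in the published numbers):

\[
2.13 \times 2.68 \approx 5.71 \approx 5.72{:}1
\]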

EMC XtremIO

EMC acquired the Israeli startup XtremIO in 2012, with the first GA products shipping at the end of 2013. The all-flash XtremIO platform is based on a scale-out node architecture of paired controllers called X-Bricks, each of which encapsulates a fixed amount of flash (25 drives) per controller pair. Multiple X-Bricks are connected through an RDMA mesh.

The XtremIO design uses a content-based data placement architecture in which data is stored in 4KiB blocks based on the hash value generated for each write I/O. This results in an even distribution of data across all nodes in a system, with each node managing a part of the hash value address space. The distribution mechanism means system expansion is a non-trivial exercise, and at the time of writing XtremIO systems cannot be expanded.

The XtremIO operating system (XIOS) runs a number of processes (called modules) that manage data flow in the XtremIO system. As write I/Os are received, the Routing module splits the data into 4KiB chunks and calculates the hash value of each chunk. The Control module maintains a hash table of stored data and checks whether the hash value represents data already stored by the system. If the data is unique, the hash value is recorded and the data is passed to a Data module to store on SSD. If the data is a duplicate, the Data module simply increments a reference count and discards the data. The XtremIO system is therefore heavily dependent on maintaining accurate reference counters for each 4KiB block of stored data.

XtremIO deduplication is based on fixed 4KiB blocks, with no verification of the hash value before committing to disk. Deduplication is global across the entire XtremIO cluster, due to the use of content-based data placement. However, data is not replicated across nodes; instead, XtremIO uses a RAID-6-style protection mechanism called XDP, which writes data redundantly within each X-Brick with a capacity overhead of around 8 percent. Loss of an X-Brick therefore means data becomes inaccessible. The current design of XDP means no flexibility in data protection mechanisms is available, and deduplication cannot be turned off for more sensitive data. EMC claims a 5:1 deduplication ratio in its documentation when quoting usable capacity.
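The quoted 8 percent overhead is consistent with a wide parity stripe across the 25 SSDs in an X-Brick. Assuming, purely for illustration, two parity columns per 25-drive stripe (this layout is an assumption of the example, not a figure taken from EMC documentation):

\[
\frac{2}{25} = 0.08 = 8\% \text{ capacity overhead.}
\]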


Conclusions and Recommendations

Data deduplication is a technology that can offer significant space and cost savings in primary storage. Due to the random access nature of deduplicated data, the technology has not seen much traction or deployment in traditional arrays; instead, it has become a key feature for all-flash solutions, which cope capably with the random I/O profile.

The underlying design and architecture of the HP 3PAR StoreServ platform mean it is well suited to the requirements of deduplication on flash storage. HP 3PAR StoreServ Thin Deduplication continues the evolution of the platform's space saving features, adding to the savings customers are already achieving through thin provisioning, thin reclaim, thin conversion and thin persistence.

Thin Deduplication leverages the 3PAR StoreServ custom ASIC to perform hashing and data integrity checking at line speed; the ASIC continues to be a key differentiator in the primary array marketplace.

In comparison to other platforms, HP 3PAR StoreServ implements Thin Deduplication with little or no performance overhead and gives the customer the ability to choose which data should be considered for deduplication on a volume-by-volume basis. In keeping with the 3PAR StoreServ ethos, space saving settings can be changed dynamically without requiring extra work by the customer or restricting the array design or layout.

Interpreting Savings

Vendors' explanations of space savings are often murky and lack transparency. Some vendors exclude their RAID overhead; some include all space saving techniques (including thin provisioning) without providing a breakdown of the savings and how they are achieved. There is also typically no discussion of how much space metadata occupies on backend storage.

In the product comparisons, EMC XtremIO quotes a savings ratio of 5:1 (without any detail on how this is achieved), Pure Storage quotes 5.72:1 and SolidFire quotes values from 4:1 to 10:1. Note that the figures from Pure Storage and SolidFire also include compression savings; compression carries a considerable processor overhead and is not currently an HP 3PAR StoreServ feature.


HP 3PAR StoreServ systems achieve deduplication ratios of up to 10:1 without including savings from the other Thin Technologies. Space savings from Inline Zero Detect, for example, are not included but can be significant, making overall savings much greater.

Data deduplication ratios alone are not a true indication of the benefit of deduplication technology. HP 3PAR StoreServ integrates deduplication with existing thin technologies and features such as Thin Clones to deliver a comprehensive, integrated space saving solution.

With the release of thin deduplication, HP 3PAR StoreServ continues to maintain leadership in delivering highly efficient primary storage solutions to customers.

4AA5-3223ENW