Cleversafe Object-based Dispersed Storage


© 2012 Silverton Consulting, Inc. All Rights Reserved twitter.com/RayLucchesi | RayOnStorage.com
+1-720-221-7270 | SilvertonConsulting.com

 

Introduction

As the IT industry has evolved, data centers have adopted block-oriented storage, in the form of storage area networks (SAN) and direct-attached storage (DAS), as well as file-oriented network-attached storage (NAS). More recently, a new storage paradigm has emerged to supplement these. It originated in high-performance computing environments and is based on a new unit of data called an object.

Objects, or data elements, consist of data and metadata and can solve many of the problems found in current DAS, SAN and NAS solutions. An object repository provides a new way to store, access and manage data, and as such, no longer needs to adhere to traditional storage system restrictions or protocols. Object storage has become the foundation for a number of sophisticated applications, such as active archives, vast content farms and omnipresent cloud storage services, to name just a few.

The Cleversafe® Dispersed Storage® solution provides object storage functionality not found in other vendor offerings. For instance, its product uses an innovative information dispersal approach to distribute data across a number of nodes or locations, yielding a much more robust, fault-tolerant system in the face of drive, node and/or site outages.

Why object storage?

IT and end-user unstructured data multiplies rapidly, often leading to orphaned files, a manageability morass, or worse. When this happens, file systems must be partitioned, data must be moved, and new mount points or shares must be created. All this consumes extra administrative time and causes unnecessary end-user confusion.

In contrast, an object repository can support vast numbers of data elements. Such repositories can grow from thousands to billions of objects within the same system environment, without partitioning or other application or end-user disruptions. One may need to add system nodes to accommodate data growth, but this can be done without altering data elements, changing how they are accessed, or taking system outages.

Next, customary file and block metadata, or information about data, is defined and controlled by standards committees, which makes it limited, fixed in form and hard to extend. For example, the Internet Engineering Task Force (IETF) defined NFS file metadata[1], which includes such items as the filename, directory path, creation, last-access and last-modified dates, as well as file size and physical location. Changing or removing NFS metadata usually involves moving, modifying or deleting the file altogether. Beyond that, there is typically no way to extend file metadata other than by maintaining an associated store alongside the original file system or by encoding additional information into directory paths. Object metadata, on the other hand, can be created, added to or modified almost at will. This yields a flexible, easily adaptable, rich set of information about data that can be used to better manage the elements of a repository over its lifetime. Such easy extensibility enables more automation and other services unavailable with conventional storage systems.

Another problem with today's file and block storage is that data is typically accessible only within a single IT location. Solutions exist that extend access beyond the data center boundary, but historically they have been expensive and highly proprietary. Objects, by contrast, can be read or written over the Internet. As such, this data can be processed from anywhere in the world with Web access, opening up new possibilities and yielding a more disaster-tolerant storage solution.

Silverton Consulting, Inc. StorInt™ Briefing

Object characteristics

Objects are essentially a package of data along with rich metadata, identified by a single object-ID. Data elements are normally read or written sequentially, in one continuous access, and may contain any binary information.

Equally important, metadata can hold any information about an object and can be modified or extended far beyond anything available in today's file and block storage systems. Thus, application and system designers can define whatever information they need to help catalogue, process and manage data elements. With such versatility, object repositories can be tailored to meet many diverse customer requirements. For example, metadata could be used to identify:

• Lifecycle attributes – in an intelligent archive, objects can be moved to different storage tiers to reduce expense as data ages. Lifecycle metadata can identify how aggressively to manage the item or how quickly to move the data down to less expensive tiers of storage.

• Expiration attributes – in a compliance repository, some records may have different expiration dates than other data. Providing an expiration date at creation time can ensure that important records are not modified or deleted until they have properly expired.

• Processing attributes – in a video library, some clips may need further processing at ingest time, e.g., transcoding to other formats. Supplying processing metadata when a video clip is created can enable the system to convert the segment into the required formats before it's needed.

[1] Please see http://tools.ietf.org/html/rfc5661#section-5.1 for more information.
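To make the object model concrete, here is a minimal Python sketch of a data element as described above: opaque data, open-ended metadata and a single object-ID. The class name, the method names and the metadata keys (lifecycle, expires, transcode_to) are hypothetical illustrations, not any vendor's API.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date
from typing import Any


@dataclass
class DataElement:
    """Hypothetical object: opaque bytes plus open-ended metadata, named by one object-ID."""

    data: bytes
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assigned once at creation; a random UUID is just one possible ID scheme.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def tag(self, **attrs: Any) -> None:
        """Add or change metadata at will; the data and the object-ID are untouched."""
        self.metadata.update(attrs)

    def may_delete(self, today: date) -> bool:
        """Honor an optional 'expires' attribute (compliance use case).

        In this sketch, objects without an expiration date carry no retention hold.
        """
        expires = self.metadata.get("expires")
        return expires is None or today >= expires


# Usage: one clip carrying lifecycle, expiration and processing attributes.
clip = DataElement(b"\x00\x00\x01\xba", {"content_type": "video/mpeg"})
clip.tag(lifecycle="aggressive", expires=date(2019, 12, 31), transcode_to=["h264", "webm"])
```

Because metadata is just a mapping attached to the object, new attribute types can be introduced later without any schema change to the data itself.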

Object storage advantages

First and foremost, object-based storage systems scale with ease. Most of these systems are multi-node clusters built out of storage, access and management components, with a system interconnect between nodes. As such, they grow by adding more cluster nodes, scaling from a few TB to multiple PB within the same system environment. Storage, access and management components can usually be added independently of one another, but to maintain adequate performance one may need to add access elements as capacity grows over time.

Some object stores can span multiple sites, creating a geographically dispersed storage system. In this case, a storage cluster at each location participates in a single distributed storage system spanning all sites. With such storage, data is commonly retrievable from one or more sites, or from multiple nodes at a single location, and as such is more fault tolerant.

In addition, most object stores support REST (Representational State Transfer) interfaces. REST is the architectural style behind HTTP and today's World Wide Web, and it is in action every day when we browse the Internet. These access conventions are generally considered more loosely coupled than traditional storage interfaces and, as a result, easier to extend. This allows metadata to be easily added to object data and permits access to data elements from anywhere with a link to the Internet.

Another benefit of RESTful interfaces is that they are simpler to map to other protocols, e.g., by using a file system gateway to access an object store. In this fashion, data elements can be read or written by standard IT applications that currently use file or block storage. Object repositories front-ended by such file gateways may sacrifice some advantages, such as extensible metadata, but they open data element access to standard applications and current end-user computing environments.
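The REST access described above boils down to ordinary HTTP verbs on object URLs, with metadata carried alongside the data. The Python sketch below only builds the requests with the standard library; sending one would be `urllib.request.urlopen(req)`. The endpoint URL and the `x-object-meta-` header prefix are hypothetical placeholders, since each object store defines its own URL layout and metadata headers.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and metadata-header convention, for illustration only.
BASE_URL = "https://objects.example.com/archive"
META_PREFIX = "x-object-meta-"


def _url(key: str) -> str:
    # Percent-encode the key but keep "/" so keys can look like paths.
    return f"{BASE_URL}/{urllib.parse.quote(key)}"


def put_object(key: str, data: bytes, metadata: dict[str, str]) -> urllib.request.Request:
    """Create or replace an object; user metadata travels as extra HTTP headers."""
    headers = {META_PREFIX + name: value for name, value in metadata.items()}
    headers["Content-Type"] = "application/octet-stream"
    return urllib.request.Request(_url(key), data=data, headers=headers, method="PUT")


def get_object(key: str) -> urllib.request.Request:
    """Retrieve an object (under this assumed convention, metadata returns as headers)."""
    return urllib.request.Request(_url(key), method="GET")


def delete_object(key: str) -> urllib.request.Request:
    """Remove an object."""
    return urllib.request.Request(_url(key), method="DELETE")


req = put_object("clips/2012/intro.mpg", b"\x00\x00\x01\xba", {"owner": "media"})
```

A file system gateway of the kind mentioned above would sit in front of calls like these, translating open, read and write on a path into GETs and PUTs on a key; any metadata the file protocol cannot express is what gets sacrificed.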

Object storage use cases

Object stores are ideal for hosting large quantities of data elements, as in content storage, content distribution, data archives and cloud storage. Specifically:

• Content storage – media storage solutions can contain millions of media segments, whose sheer number and metadata requirements can overburden classic file systems. By using an object store, however, content repositories can support almost any number of MPEG files and provide the metadata needed to manage all of them. For example, metadata can be supplied for video data, such as speech-to-text transcriptions, facial recognition results, clip abstracts, etc. With an object repository's extensible metadata, even more information can be added about video fragments, making them more searchable and thus more discoverable.

• Content distribution – video distribution centers can hold thousands of videos whose streaming requirements may easily tax the performance of customary file systems. In contrast, object repositories can be implemented across multiple sites, with data residing at many locations to provide fast, regional video streaming. In this way, content distribution can be scaled up to meet whatever streaming performance a customer environment requires.

• Intelligent data archives – object storage makes it possible to build intelligent data archives that are almost impossible to provide with file systems alone. Most file data passes through a predictable access cycle: it is referenced extensively for the first week to 90 days after creation or modification, and then access rates fall off precipitously. By migrating or archiving this data through a multi-tier object store as it ages, one can reduce costs, moving it to slower storage commensurate with its drop in access intensity.

• Cloud storage – cloud data storage can be hard to support with traditional data center storage systems. As discussed previously, object repositories with RESTful interfaces are inherently Web-enabled and thereby a better cloud-based storage medium. Also, with extensible metadata, cloud data services can be tailored to the needs of each data element rather than to the limited capabilities of classic storage systems.
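The intelligent-archive idea comes down to a placement rule driven by age and lifecycle metadata. The Python sketch below assumes three hypothetical tiers and reads the week-to-90-days access pattern above as two cut-offs, 7 and 90 days since last access. The tier names, the cut-offs and the doubling for an "aggressive" lifecycle attribute are illustrative assumptions, not a product policy.

```python
from datetime import datetime, timedelta

# Assumed cut-offs taken from the access pattern described above.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def choose_tier(last_access: datetime, now: datetime, lifecycle: str = "normal") -> str:
    """Pick a (hypothetical) storage tier from an object's age and lifecycle metadata."""
    age = now - last_access
    if lifecycle == "aggressive":
        # Hypothetical policy: aggressively managed objects age twice as fast.
        age *= 2
    if age <= HOT_WINDOW:
        return "fast-disk"
    if age <= WARM_WINDOW:
        return "capacity-disk"
    return "dispersed-archive"
```

A background job could run this over each object's metadata and migrate any object whose current tier differs from the result; because the policy reads metadata rather than directory paths, it can change without moving or renaming anything.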

Cleversafe object-based dispersed storage

Cleversafe's Dispersed Storage Network (dsNet®) solution is an object storage system that spans multiple nodes or geographically dispersed locations and can be deployed either as a cluster of hardware appliances or as a software-only solution. Because of these flexible deployment options, customers can implement their dsNet store on hardware they already own or purchase a complete, integrated and tested storage solution from Cleversafe.

With either approach, Cleversafe functionality is partitioned across the following components:

• dsNet Manager – one instance is required to configure, upgrade and monitor the object repository.

• Accesser® – two or more instances are required at each Cleversafe storage site; they provide multiple clients with access to the stored data elements.

• Slicestor® – multiple instances are required at each Cleversafe location; they provide the actual storage for all data elements.
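The component rules above (a dsNet Manager, at least two Accessers per site, and multiple Slicestors per location) are easy to check mechanically before a rollout. The Python sketch below encodes just those three rules; the `Site` record and the `layout_problems` function are hypothetical, and a real dsNet deployment would involve many more constraints than this.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of one storage location."""

    name: str
    accessers: int
    slicestors: int


def layout_problems(managers: int, sites: list[Site]) -> list[str]:
    """Return rule violations for a planned deployment (an empty list means OK)."""
    problems = []
    if managers < 1:
        problems.append("a dsNet Manager instance is required")
    for site in sites:
        if site.accessers < 2:
            problems.append(f"{site.name}: needs two or more Accessers")
        if site.slicestors < 2:
            problems.append(f"{site.name}: needs multiple Slicestors")
    return problems
```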

As discussed previously, Cleversafe's dispersed storage system is built around an information dispersal algorithm that slices up objects and distributes the slices across multiple storage nodes or locations. The advantages of this approach include:

• Cost-effective data protection – with dispersed storage, a mathematically determined, minimal amount of check (parity) information is added to the sliced data to tolerate location outages. Achieving this level of availability with conventional storage would require full replicas of the data at multiple sites, significantly increasing storage capacity and thus system costs.

• Configurable levels of data protection – with the data protection described above, dsNet availability levels can be configured to provide whatever fault tolerance one's object store requires, based on site layout, network connectivity and storage configuration. Cleversafe data protection can be set to survive 1, 2 or even N site failures, all with a single parameter change. Naturally, this may require more parity, but the system automatically computes and stores the revised check information for all data elements.

• Inherent data security – with dsNet information dispersal, no single location holds all of an object's data, since slices are scattered across multiple nodes or sites. Even if someone could read all the information at one node, all they would get are pieces of data and parity information, with no way of knowing which bits belong to which objects. Thus, dispersed storage is inherently more secure than more common object stores that keep each object's data together within a single node.
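The k-of-n idea behind information dispersal can be shown in a few lines. The Python sketch below is a simple Reed-Solomon-style erasure code in the spirit of Rabin's information dispersal algorithm, not Cleversafe's implementation: each group of k bytes defines a polynomial over the prime field GF(257), and slice x stores that polynomial's value at x, so any k of the n slices rebuild the data by Lagrange interpolation. The function names are hypothetical. A production code would typically work in GF(2^8) so every slice value fits in a byte (here check values can reach 256), and because slices 1..k hold raw bytes, this sketch by itself does not provide the confidentiality discussed above.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate, at x, the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Cut data into n slices (ids 1..n) such that any k of them suffice to rebuild it."""
    if not 0 < k <= n <= 256:
        raise ValueError("need 0 < k <= n <= 256")
    padded = data + bytes(-len(data) % k)  # zero-pad to a whole number of k-byte groups
    slices: dict[int, list[int]] = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), k):
        # The k bytes of this group are the polynomial's values at x = 1..k.
        points = list(zip(range(1, k + 1), padded[start:start + k]))
        for x in range(1, n + 1):
            # Slices 1..k get the raw bytes back; slices k+1..n get check values.
            slices[x].append(_interpolate(points, x))
    return slices


def reconstruct(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k surviving slices."""
    if len(available) < k:
        raise ValueError(f"need at least {k} slices, got {len(available)}")
    chosen = sorted(available)[:k]
    out = bytearray()
    for group in range(len(available[chosen[0]])):
        points = [(x, available[x][group]) for x in chosen]
        out.extend(_interpolate(points, pos) for pos in range(1, k + 1))
    return bytes(out[:length])


message = b"Cleversafe dispersed storage"
slices = disperse(message, k=4, n=6)
survivors = {x: slices[x] for x in (2, 3, 5, 6)}  # slices 1 and 4 lost
```

With k = 4 and n = 6, the store holds 6/4 = 1.5 times the original data and survives the loss of any two slices, whereas surviving two lost copies by replication takes three full copies, or 3 times the data. Raising n while holding k fixed is the kind of single-parameter change that buys more failure tolerance at the cost of more check data.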

Moreover, Cleversafe storage is readily scalable and easily supports billions of data elements. In fact, Shutterfly, a Cleversafe customer, started out with a 217 TB store and quickly scaled it to multiple PB, storing over 15 billion objects today.[2]

[2] Please see http://www.cleversafe.com/images/pdf/shutterfly-cleversafe-case-study-07142012.pdf for more information.

Cleversafe's object store can also be accessed through a RESTful interface, as well as through a defined software API. For the REST access protocol, HTTP-oriented PUT, GET, DELETE and LIST commands are used to create, retrieve, delete and identify data elements within the dsNet storage repository. At data element creation, the application issuing the PUT request receives an object-ID, which uniquely identifies its data and metadata within the repository. Any application using the storage repository is responsible for remembering the object-ID returned by Cleversafe.

Furthermore, Cleversafe storage solutions provide extensive integrity checking to ensure that objects are readily accessible and always correct. This integrity verification operates continuously, validating that data in the object repository remains accessible as stored. The same facilities are used at retrieval time to ensure that current and correct data is always read.

In addition to the inherent security provided by information dispersal, Cleversafe also offers SecureSlice™ keyless encryption technology. With SecureSlice, an object's data is encrypted and cryptographically signed before being sliced and written to Slicestors. Thus, during read back, data can be decrypted only after a predefined threshold of slices has been retrieved, making it impossible to read individual portions of data without the whole threshold being present.

While Cleversafe provides a very capable, standalone object store, it has partnered with several third-party solutions to supply unique, vertical or industry-specific data services over the dsNet storage repository. For instance:

• iRODS™ (integrated Rule-Oriented Data System) is an open source solution that can integrate with Cleversafe storage to supply automated policy management over data elements. The iRODS data grid application is widely deployed in data-intensive research and high-performance computing environments throughout the world. It provides easy scalability, automated management and shareability for large collections of scientific data used by researchers across the globe.

• QStar Archive Manager is data archiving software that creates a gateway supporting NFS and CIFS/SMB data center protocol access to Cleversafe's object store. The QStar archive is presented as a network-mountable file share that provides automated storage tiering across high-speed disk and the back-end dsNet storage as a function of data access frequency or age. This data archive was designed to support vast quantities of data and easy scalability from TB to PB without system disruption.

• Mezeo Cloud Storage is an enterprise-class, cloud-based file sync solution. The combined Mezeo and Cleversafe solution provides secure, highly available data center file synchronization using cloud storage, enabling easier collaboration and intrinsic data protection for enterprise files. Further, as a cloud-based storage system, data in the Mezeo and Cleversafe solution can be accessed securely from any Internet-enabled location.
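The keyless SecureSlice behaviour described earlier, where data can be decrypted only once a threshold of slices is back, can be illustrated with an all-or-nothing transform, a published technique for this kind of scheme; whether SecureSlice works exactly this way is an assumption here. In the Python sketch below, each object is encrypted under a random key, the key is then hidden by XOR with a hash of the entire ciphertext, and a known canary appended before encryption acts as an integrity check. The resulting package would then go through information dispersal like any other data, so no key is ever stored separately. The SHA-256 keystream is a toy stand-in for a real cipher such as AES, which the Python standard library does not include, and all names are hypothetical.

```python
import hashlib
import hmac
import os

CANARY = b"\x00" * 16  # known trailer, verified on decode as a simple integrity check


def _keystream(key: bytes, length: int) -> bytes:
    """Toy stream cipher: SHA-256(key || counter) blocks. Stand-in for AES-CTR only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def aont_encode(data: bytes) -> bytes:
    """Encrypt under a random key, then hide that key inside the package itself."""
    key = os.urandom(32)  # fresh per object; never stored anywhere on its own
    plain = data + CANARY
    cipher = _xor(plain, _keystream(key, len(plain)))
    # Recovering the key requires hashing the *whole* ciphertext, i.e. every slice's worth.
    masked_key = _xor(key, hashlib.sha256(cipher).digest())
    return cipher + masked_key


def aont_decode(package: bytes) -> bytes:
    """Invert aont_encode; raise if the package is incomplete or altered."""
    cipher, masked_key = package[:-32], package[-32:]
    key = _xor(masked_key, hashlib.sha256(cipher).digest())
    plain = _xor(cipher, _keystream(key, len(cipher)))
    if not hmac.compare_digest(plain[-len(CANARY):], CANARY):
        raise ValueError("integrity check failed: package altered or incomplete")
    return plain[:-len(CANARY)]


package = aont_encode(b"quarterly results")
```

Changing even one byte of the package changes the hash, which yields the wrong key and scrambles the whole plaintext, so partial or tampered data reveals nothing and is detected by the canary.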


Summary

In short, Cleversafe dispersed storage implements a highly resilient object storage solution that goes well beyond traditional IT storage systems. Cleversafe has demonstrated that dispersed storage can scale to high capacities and support billions of data elements. Just as important, configurable data protection, flexible security and extensible metadata are inherent features of the Cleversafe dsNet system.

Furthermore, third-party applications extend Cleversafe storage to support high-performance and scientific research data grids, vast data archives and immense cloud storage systems. Given all this, Cleversafe's object storage and its application ecosystem provide a compelling set of advanced functionality for the large data collections that many new and emerging data center solutions require.

Silverton Consulting, Inc. is a Storage, Strategy & Systems consulting services company based in the USA, offering products and services to the data storage community.