StorageExpo Brussels Keynote - Caringo



Page 1

Traditional storage models disrupted!

Paul Carpentier, CTO, Caringo, Inc.

Storage Expo Brussels

March 24, 2010

Page 2

“May you live in interesting times!”

= old Chinese curse! (Curse?? In boring, well-administered Confucian times, “interesting” was synonymous with chaos and upheaval: a nightmare to the orderly and conservative Confucian mind.)

Interesting times for storage indeed!
• Exploding storage capacity requirements for unstructured data
• Imploding budgets: no more business as usual
• Extra needs: long-term archiving, compliance
• Geographically scattered data and applications
• Paralyzing overall complexity

Page 3

About Caringo

• Founded 2005 (Paul Carpentier, Jonathan Ring, Mark Goros)
• Privately held (US, Belgian & Dutch shareholders)
• HQ in Austin, Texas
• Headcount 40
• Near-virtual company – internet infrastructure
  • Gmail, Skype, Wiki, Bugzilla, Drupal, colo DC for dev & test
  • Extremely low overhead
• CAStor SW + commodity X86 HW = “private storage clouds”
• Extreme simplicity as foundation of robustness & performance

content storage simplified!

Page 4

Data explosion in the storage ecosystem

[Figure: Video, Documents, Photographs, Medical Images, Audio, E-Mail; Access, Store, Distribute]

==> unstructured data is the problem!!

Page 5

File storage & sharing: the reality out there

Unstructured data
• Over 95% is “unstructured” ¹

Massive file growth
• Up to 120% per year ²

Low reuse of files ³
• 90% never accessed after creation
• 65% of files accessed are only accessed once ³

Aging files occupying expensive storage
• Software needed to migrate files to secondary storage
• Added cost and complexity

Must meet compliance mandates
• Secondary storage tier required

[Chart: 90% of files never accessed again, 10% accessed; 65% accessed once]

¹ IDC, The Expanding Digital Universe
² The Economic Impact of File Virtualization, IDC
³ Measurement and Analysis of Large-Scale Network File System Workloads, UC Santa Cruz

ugly! unbelievable!
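A rough worked example of the figures on this slide may help. It is only a sketch under two assumptions that the slide does not spell out: that "120% per year" means the file volume multiplies by 2.2 every year, and that the 65% applies to the 10% of files that are accessed at all. The function names are illustrative.

```python
from fractions import Fraction


def growth_factor(annual_growth_pct, years):
    """Compound growth factor after `years` at `annual_growth_pct` percent per year.

    Assumption: +120% per year means the volume multiplies by 2.2 each year.
    """
    return (1 + Fraction(annual_growth_pct, 100)) ** years


def access_breakdown(never_pct=90, once_of_accessed_pct=65):
    """Split all files (in percent) into never / once / more-than-once accessed.

    Assumption: the 65% is a share of the files that are accessed at all.
    """
    never = Fraction(never_pct)
    accessed = 100 - never
    once = accessed * Fraction(once_of_accessed_pct, 100)
    return {
        "never_accessed": never,
        "accessed_once": once,
        "accessed_more_than_once": accessed - once,
    }


if __name__ == "__main__":
    # Volume multiplier after five years of 120% annual growth.
    print(float(growth_factor(120, 5)))
    # Share of all files in each access category, in percent.
    print({k: float(v) for k, v in access_breakdown().items()})
```

Under these assumptions only a few percent of all files are ever read more than once, which is the point the slide makes about aging files on expensive storage.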

Page 6

Customer priorities

Source: 2007 Brocade Customer Survey Results

==> reduce complexity!!!

Page 7

File storage challenges

Today’s storage requirements are different
• Millions and billions of files on thousands of large disk drives

File systems simply cannot stretch any further
• They hit maximums on file system size and object count
• The weight of layers of complexity and virtualization makes them brittle
• They are hard/impossible to parallelize

Newer file systems are high-maintenance
• Even with layers of virtualization, underlying file systems must still be managed, provisioned, migrated, backed up and maintained
• Require highly skilled administrators

Volume of file data is major information management problem
• Folder/subfolder/filename scheme becomes cryptic at scale (millions/billions)
• File systems provide no proper meta-informational context for files

not your father’s file server!

Page 8

Next generation: Internet scale … according to analyst 1!

“Very large part of storage capacity taken up by file-based, rich digital content”

5 key infrastructure requirements (Enterprise Strategy Group):
• Infinite scaling – in real-time, dynamically, no human intervention
• No boundaries – expand beyond walls of IT department
• Operationally efficient – leverage commodity components, policy-based automation
• Self-managing – auto re-balance and optimize, no human intervention
• Self-healing – withstand failures, automatically adjust/heal itself

Page 9

Next generation: object-based storage … according to analyst 2!

Criteria for technology as defined by IDC:
• Self-referencing – Unique address for each file/object
• Described by metadata – Beyond standard file system
• Location independence
• Dynamic presentation – Not fixed to a traditional tree format
• Intelligent replication/distribution

Page 10

Required/desirable characteristics

High performance object storage
• Easily address performance needs for small and/or large file workloads

Operational robustness and efficiency
• Self-managing and self-healing cluster to minimize human intervention (error prone and costly)

Data protection & preservation
• Archive unstructured data for the long term
• Address regulatory compliance with provable content integrity

Cost-effective scaling
• Add capacity without interruption or need to provision storage
• Scale from Terabytes to Petabytes in a single cluster

Investment protection
• Add new generation hardware at any time without disruption
• Add software licenses organically, in lockstep with business needs

above all: simplicity!!!

Page 11

Ingredients

Commodity hardware
• Nodes: X86 rack mounted servers – even entry level
• SATA drives – direct attached

Massively parallel cluster
• Switched Gbit Ethernet between nodes

Networking standards
• HTTP, NTP, SNMP external; UDP, TCP internal

Stripped embedded Linux
• Boots off USB stick, CD-ROM or PXE net boot
• Zero install - no SW ever installed on disk

Content Addressing
• 1 object, 1 unique identifier – up to 100 Million per node!

Plain vanilla is the new stracciatella!
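The per-node figure invites a quick sizing estimate. The sketch below is back-of-the-envelope arithmetic only. It assumes that every stored replica counts against the 100 million per-node limit, and the default of two replicas is a hypothetical service level, not a number from the slides.

```python
OBJECTS_PER_NODE = 100_000_000  # "up to 100 Million per node" from the slide


def min_nodes(total_objects, replicas=2, per_node=OBJECTS_PER_NODE):
    """Smallest node count that can hold `total_objects` with `replicas` copies each.

    Assumption: each replica occupies one object slot on some node.
    """
    stored = total_objects * replicas
    return -(-stored // per_node)  # integer ceiling division


if __name__ == "__main__":
    # One billion objects, two copies each.
    print(min_nodes(1_000_000_000, replicas=2))
```

The estimate covers object count only. Disk capacity and throughput would need their own sizing.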

Page 12

Non-ingredients

File Systems
• They do not scale, they break and they don’t parallelize properly
• Not used on the outside, also not used on the inside

Fibre channel, iSCSI, SAS, RAID
• Why use hardware if software will do for a fraction of the price?

Any other exotic, specialized or expensive HW
• Parallelization is the name of the performance game, not exotics

Manual install, provisioning, admin, intervention, operations
• Humans are too expensive (and unreliable!) to be an integral part of the operational storage management loop

… they’d spoil the soup!!

Page 13

Content Addressing

Regular file system storage (“location based addressing”):
• Specifies the location of the container:
  Amsterdam_srv3/maindocs/erp/2009/budgets/rev2/draft/prodlines.xls
• Content may still be updated within container – pathname remains identical

Content Addressed Storage (CAS):
• Specifies the identity of the content, optionally prefixed with server address:
  http://cas.yoursite.com/b8f929292ee20bd070b73557ae47de6f
• Unique identifier for immutable content
  • new or updated content means: new identifier!
  • usually 128 bit, may be content hash or random
  • uniqueness guaranteed by probability
• Flat address space, no location information
• Ideal for parallel clustered object storage (object = data+metadata)

… a serial number for every object!
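The ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how CAStor derives its identifiers: the choice of BLAKE2b truncated to 128 bits, the class `FlatObjectStore` and all function names are assumptions made for the example. The collision estimate uses the standard birthday approximation n(n-1)/2^(bits+1).

```python
import hashlib
import secrets


def hash_identifier(data: bytes) -> str:
    """128-bit identifier derived from the content (BLAKE2b, 16-byte digest)."""
    return hashlib.blake2b(data, digest_size=16).hexdigest()


def random_identifier() -> str:
    """128-bit identifier drawn at random, independent of the content."""
    return secrets.token_hex(16)


def collision_probability(num_objects: int, bits: int = 128) -> float:
    """Birthday-bound estimate that any two of `num_objects` identifiers collide."""
    return num_objects * (num_objects - 1) / 2 ** (bits + 1)


class FlatObjectStore:
    """Flat address space: identifier -> (data, metadata), no paths or folders."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata=None) -> str:
        # Content is immutable: storing changed content yields a new identifier.
        identifier = hash_identifier(data)
        self._objects[identifier] = (bytes(data), dict(metadata or {}))
        return identifier

    def get(self, identifier: str):
        return self._objects[identifier]


if __name__ == "__main__":
    store = FlatObjectStore()
    rev1 = store.put(b"budget rev 1", {"name": "prodlines.xls"})
    rev2 = store.put(b"budget rev 2", {"name": "prodlines.xls"})
    print(rev1, rev2, rev1 != rev2)
    # Chance of any collision among a trillion random 128-bit identifiers.
    print(collision_probability(10**12))
```

Both revisions stay retrievable under their own identifiers, in contrast to the pathname case, where an update silently replaces the content behind the same name.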

Page 14

Key characteristics

Massively scalable storage cluster
• Start small and scale to billions of objects
• As you grow from TBs to PBs, throughput also grows

Increase capacity seamlessly
• No disruption in operations or data availability. No migration!

Symmetric parallel architecture
• All nodes perform all functions, no specialized access nodes
• No single point of failure, high availability out of the box

Manages and repairs itself automatically - faster than RAID

Data is replicated for protection – range of service levels

Continuous data availability – even during recovery

Continuous internal checking – for content integrity

Local and Wide Area Replication – for DR and backup

[Figure: cluster of nodes 1, 2, 3, 4 … n interconnected via GigE]

… simple, robust, parallel!
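A toy model can make the replication and self-repair behaviour concrete. The sketch below is an in-memory simulation written for this transcript, not CAStor's algorithm: the `Cluster` class, the least-loaded placement rule and the default of two replicas are all assumptions.

```python
class Cluster:
    """Symmetric cluster: every node stores objects, none has a special role."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        self.nodes = {name: {} for name in node_names}  # node -> {object_id: data}

    def _least_loaded(self, exclude=()):
        # Candidate nodes ordered by object count, then name (for determinism).
        candidates = [n for n in self.nodes if n not in exclude]
        return sorted(candidates, key=lambda n: (len(self.nodes[n]), n))

    def put(self, object_id, data):
        for name in self._least_loaded()[: self.replicas]:
            self.nodes[name][object_id] = data

    def holders(self, object_id):
        return [n for n, objs in self.nodes.items() if object_id in objs]

    def get(self, object_id):
        # Any surviving replica can serve the read.
        for objs in self.nodes.values():
            if object_id in objs:
                return objs[object_id]
        raise KeyError(object_id)

    def fail_node(self, name):
        del self.nodes[name]
        self.heal()

    def heal(self):
        # Re-create missing replicas from surviving copies on other nodes.
        all_ids = {oid for objs in self.nodes.values() for oid in objs}
        for oid in all_ids:
            current = self.holders(oid)
            missing = self.replicas - len(current)
            if missing > 0:
                data = self.nodes[current[0]][oid]
                for name in self._least_loaded(exclude=current)[:missing]:
                    self.nodes[name][oid] = data


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2", "node3", "node4"], replicas=2)
    for i in range(10):
        cluster.put(f"obj{i}", f"payload {i}".encode())
    cluster.fail_node("node1")
    print(all(len(cluster.holders(f"obj{i}")) == 2 for i in range(10)))
```

Reads keep working while `heal` runs because at least one copy survives a single node failure. A real cluster would also have to handle concurrent failures, integrity checks and network partitions.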

Page 15

CAStor

Simple object storage interface

HTTP 1.1: open standard already
• High performance and throughput

Standardize the on-the-wire protocol
• No client API issues

Basic HTTP methods will do
• GET, HEAD, POST, DELETE, PUT

Several “Cloud Storage Standards” emerging
• All HTTP based
• Amazon S3 most mature & credible so far
  • But only available from Amazon…
• Some may be too complicated again
  • CDMI (SNIA)
  • Simple Cloud API

[Figure: application clients each connecting over HTTP to CAStor; object "MyFile"]

… usually: HTTP!
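To show how little is needed when plain HTTP methods are the interface, here is a minimal in-memory object server with a client round trip, using only the Python standard library. It is an illustration of the idea, not CAStor's actual protocol: the URL layout, the status codes and the hash-based identifier are assumptions, and only POST, GET and DELETE are implemented.

```python
import hashlib
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

OBJECTS = {}  # identifier -> bytes; in-memory stand-in for the cluster


class ObjectHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Store the request body and return its (hypothetical) identifier.
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length)
        identifier = hashlib.blake2b(data, digest_size=16).hexdigest()
        OBJECTS[identifier] = data
        self._reply(201, identifier.encode("ascii"))

    def do_GET(self):
        data = OBJECTS.get(self.path.lstrip("/"))
        if data is None:
            self._reply(404)
        else:
            self._reply(200, data)

    def do_DELETE(self):
        found = OBJECTS.pop(self.path.lstrip("/"), None) is not None
        self._reply(204 if found else 404)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def demo():
    """Start a local server, then POST, GET, DELETE and GET again."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address

    def request(method, path, body=None):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request(method, path, body=body)
        resp = conn.getresponse()
        payload = resp.read()
        conn.close()
        return resp.status, payload

    try:
        status, raw_id = request("POST", "/", b"hello object")
        created = (status, raw_id.decode("ascii"))
        fetched = request("GET", "/" + created[1])
        deleted = request("DELETE", "/" + created[1])
        gone = request("GET", "/" + created[1])
    finally:
        server.shutdown()
        server.server_close()
    return created, fetched, deleted, gone


if __name__ == "__main__":
    print(demo())
```

Any HTTP client in any language can talk to such an interface without a vendor library, which is the "no client API issues" argument made on the slide.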

Page 16

Private storage clouds in the enterprise?

• Economic, manageable, sustainable solution for growing amount of unstructured data
• Can be geographically dispersed, yet logically centralized
• Long term archive and compliance within reach
• Green opportunities
• For which applications?
• How to integrate & deploy apps?

… absolutely!!

Page 17

The problem with clouds & apps

• The definition of cloud is … cloudy
• The positioning of cloud storage is … woolly
• The classification of cloud storage applications thus far is … not very scientific, to say the least ;-)
• Any confusion will always favor the status quo…
• … so it is the storage industry’s problem and challenge to help clarify these issues in the mind of potential buyers!

Page 18

Trying to classify cloud applications…

• Try to list the use cases for the cloud apps:
  • Web Publishing
  • Content Archiving
  • Primary Storage
  • Secondary Storage …
• Not really a linear list, but rather orthogonal, like:

[Figure: grid with one axis Primary Storage / Secondary Storage and the other axis Web Content / Enterprise Content]

Page 19

Turning problem into opportunity

• There clearly is a cloud storage classification vacuum in the industry and especially in the mind of potential buyers
• Most analysts and vendors try to push a storage-technical classification, tweaked to suit their own offerings: grid, cloud, cluster, NAS, CAS, COS, RAIN, MAID, whatever.
• The prospect looks at this from his/her application’s point of view and doesn’t see the match → FRUSTRATION!!
• A golden opportunity presents itself to introduce a cloud storage classification that looks at the world the way the prospect does: from an application data perspective

Page 20

The Cloud Storage Application Quadrant:

Map any app… the way the customer sees it!!

[Figure: grid with axes "tiers" (primary, secondary, safety) and "exposure" (private, shared, public). Applications placed on the grid:]
• (enterprise) content archiving
• web content publishing
• personal PC backup service
• CloudFolder (HSM)
• Swedish Music Archive
• Johns Hopkins University CIDR genomic info repository & archive
• large bank – client docs repository & archive
• large telco handset content sharing & repository

Page 21

Integrating & leveraging the enterprise cloud

[Figure: six application boxes labeled App]

…learn by watching actual customers!

Page 22

Integrating & leveraging the enterprise cloud

[Figure: application boxes labeled App, and AURA (Active Unified Repository & Archive)]

…learn by watching actual customers!

Page 23

Summarizing:

• Serious storage environment disruption in full swing
• Unstructured content is the main issue
• Private storage clouds in the enterprise can help:
  • reducing complexity, TCO & power consumption
  • long term archive
  • compliance
  • unified repository as foundation of new application stack

→ Make that archive work for a living!

Page 24

Q & A: your turn!

[email protected]