storageexpo brussels keynote - caringo · 7 file"storage"challenges"...
TRANSCRIPT
1
Traditional storage models disrupted!
Paul Carpentier, CTO, Caringo, Inc.
Storage Expo Brussels
March 24, 2010
2
“May you live in interesting times!”
= old Chinese curse!
(Curse?? In boring, well-administered Confucian times, “interesting” was synonymous with chaos and upheaval: a nightmare to the orderly and conservative Confucian mind.)
Interesting times for storage indeed!
• Exploding storage capacity requirements for unstructured data
• Imploding budgets: no more business as usual
• Extra needs: long-term archiving, compliance
• Geographically scattered data and applications
• Paralyzing overall complexity
3
About Caringo
• Founded 2005 (Paul Carpentier, Jonathan Ring, Mark Goros)
• Privately held (US, Belgian & Dutch shareholders)
• HQ in Austin, Texas
• Headcount 40
• Near-virtual company – internet infrastructure
• Gmail, Skype, Wiki, Bugzilla, Drupal, colo DC for dev & test
• Extremely low overhead
• CAStor SW + commodity X86 HW = “private storage clouds”
• Extreme simplicity as foundation of robustness & performance
content storage simplified!
4
Data explosion in the storage ecosystem
[Diagram labels: Video, Documents, Photographs, Medical Images, Audio; Access, Store, Distribute]
==> unstructured data is the problem!!
5
File storage & sharing: the reality out there
Unstructured data
• Over 95% is “unstructured” [1]
Massive file growth
• Up to 120% per year [2]
Low reuse of files [3]
• 90% never accessed after creation
• 65% of files accessed are only accessed once
Aging files occupying expensive storage
• Software needed to migrate files to secondary storage
• Added cost and complexity
Must meet compliance mandates
• Secondary storage tier required
[Chart: 90% of files never accessed again; 10% accessed, 65% of those accessed only once]
[1] IDC, The Expanding Digital Universe
[2] The Economic Impact of File Virtualization, IDC
[3] Measurement and Analysis of Large-Scale Network File System Workloads, UC Santa Cruz
ugly! unbelievable!
6
Customer priorities
Source: 2007 Brocade Customer Survey Results
==> reduce complexity!!!
7
File storage challenges
Today’s storage requirements are different
• Millions and billions of files on thousands of large disk drives
File systems simply cannot stretch any further
• They hit maximums on file system size and object count
• The weight of layers of complexity and virtualization makes them brittle
• They are hard/impossible to parallelize
Newer file systems are high-maintenance
• Even with layers of virtualization, underlying file systems must still be managed, provisioned, migrated, backed up and maintained
• Require highly skilled administrators
Volume of file data is a major information management problem
• Folder/subfolder/filename scheme becomes cryptic at scale (millions/billions)
• File systems provide no proper meta-informational context for files
not your father’s file server!
8
“Very large part of storage capacity taken up by file-‐based, rich digital content”
5 key infrastructure requirements (Enterprise Strategy Group):
• Infinite scaling – in real time, dynamically, no human intervention
• No boundaries – expand beyond walls of IT department
• Operationally efficient – leverage commodity components, policy-based automation
• Self-managing – auto re-balance and optimize, no human intervention
• Self-healing – withstand failures, automatically adjust/heal itself
Next generation: Internet scale … according to analyst 1!
9
Criteria for technology as defined by IDC:
• Self-referencing – Unique address for each file/object
• Described by metadata – Beyond standard file system
• Location independence
• Dynamic presentation – Not fixed to a traditional tree format
• Intelligent replication/distribution
Next generation: object-based storage … according to analyst 2!
10
Required/desirable characteristics
High performance object storage
• Easily address performance needs for small and/or large file workloads
Operational robustness and efficiency
• Self-managing and self-healing cluster to minimize human intervention (error-prone and costly)
Data protection & preservation
• Archive unstructured data for the long term
• Address regulatory compliance with provable content integrity
Cost-effective scaling
• Add capacity without interruption or need to provision storage
• Scale from Terabytes to Petabytes in a single cluster
Investment protection
• Add new generation hardware at any time without disruption
• Add software licenses organically, in lockstep with business needs
above all: simplicity!!!
11
Ingredients
Commodity hardware
• Nodes: X86 rack mounted servers – even entry level
• SATA drives – direct attached
Massively parallel cluster
• Switched Gbit Ethernet between nodes
Networking standards
• HTTP, NTP, SNMP external; UDP, TCP internal
Stripped embedded Linux
• Boots off USB stick, CD-ROM or PXE net boot
• Zero install – no SW ever installed on disk
Content Addressing
• 1 object, 1 unique identifier – up to 100 Million per node!
Plain vanilla is the new stracciatella!
12
Non-ingredients
File Systems
• They do not scale, they break and they don’t parallelize properly
• Not used on the outside, also not used on the inside
Fibre Channel, iSCSI, SAS, RAID
• Why use hardware if software will do for a fraction of the price?
Any other exotic, specialized or expensive HW
• Parallelization is the name of the performance game, not exotics
Manual install, provisioning, admin, intervention, operations
• Humans are too expensive (and unreliable!) to be an integral part of the operational storage management loop
… they’d spoil the soup!
13
Content Addressing
Regular file system storage (“location-based addressing”):
• Specifies the location of the container: Amsterdam_srv3/maindocs/erp/2009/budgets/rev2/draft/prodlines.xls
• Content may still be updated within container – pathname remains identical
Content Addressed Storage (CAS):
• Specifies the identity of the content, optionally prefixed with server address: http://cas.yoursite.com/b8f929292ee20bd070b73557ae47de6f
• Unique identifier for immutable content
• New or updated content means: new identifier!
• Usually 128 bit, may be content hash or random
• Uniqueness guaranteed by probability
• Flat address space, no location information
• Ideal for parallel clustered object storage (object = data+metadata)
… a serial number for every object!
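The two identifier options on this slide (content hash or random, 128 bit, unique "by probability") can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the slide does not say which hash or random source CAStor uses, so MD5 appears only because it happens to be 128 bits wide, and the function names are hypothetical. The last function is the standard birthday-bound approximation for the chance that any two of n random identifiers collide.

```python
import hashlib
import secrets


def content_hash_id(data: bytes) -> str:
    """128-bit identifier derived from the content itself.

    MD5 is an illustrative choice (it is 128 bits wide); the slides do not
    name the hash that is actually used.
    """
    return hashlib.md5(data).hexdigest()


def random_id() -> str:
    """128-bit identifier drawn at random, unrelated to the content."""
    return secrets.token_hex(16)  # 16 random bytes = 32 hex characters


def collision_probability(n_objects: int, bits: int = 128) -> float:
    """Birthday-bound approximation: p is roughly n(n-1) / 2^(bits+1)."""
    return n_objects * (n_objects - 1) / 2 ** (bits + 1)


if __name__ == "__main__":
    rev1 = content_hash_id(b"budget rev1")
    rev2 = content_hash_id(b"budget rev2")
    print(rev1 != rev2)                    # updated content, new identifier
    print(random_id())                     # different on every call
    print(collision_probability(10 ** 9))  # about 1.5e-21 for a billion objects
```

With a content hash, storing identical bytes twice yields the same identifier; with random identifiers, every store operation yields a fresh one. Either way, the estimate shows why a flat 128-bit address space is considered safe at the scale of billions of objects.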
14
Key characteristics
Massively scalable storage cluster
• Start small and scale to billions of objects
• As you grow from TBs to PBs, throughput also grows
Increase capacity seamlessly
• No disruption in operations or data availability. No migration!
Symmetric parallel architecture
• All nodes perform all functions, no specialized access nodes
• No single point of failure, high availability out of the box
Manages and repairs itself automatically – faster than RAID
Data is replicated for protection – range of service levels
Continuous data availability – even during recovery
Continuous internal checking – for content integrity
Local and Wide Area Replication – for DR and backup
[Diagram: cluster nodes 1, 2, 3, 4 … n connected over GigE]
… simple, robust, parallel!
15
CAStor
Simple object storage interface
HTTP 1.1: open standard already
• High performance and throughput
Standardize the on-the-wire protocol
• No client API issues
Basic HTTP methods will do
• GET, HEAD, POST, DELETE, PUT
Several “Cloud Storage Standards” emerging
• All HTTP based
• Amazon S3 most mature & credible so far
• But only available from Amazon…
• Some may be too complicated again
• CDMI (SNIA)
• Simple Cloud API
[Diagram: three application clients, each connected over HTTP; object labeled MyFile]
… usually: HTTP!
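To make "basic HTTP methods will do" concrete, here is a toy sketch in Python. It is not the CAStor API: the in-memory server, the URL layout, the status codes and the use of a Location header to return the new identifier are all assumptions chosen for illustration. The point is only that a plain HTTP client, with no vendor library, is enough to store, read, inspect and delete an object.

```python
import http.client
import secrets
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE = {}  # identifier -> bytes; stands in for the storage cluster


class ToyObjectStore(BaseHTTPRequestHandler):
    """Hypothetical object store: POST creates, GET/HEAD read, DELETE removes."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        object_id = secrets.token_hex(16)  # fresh 128-bit identifier
        STORE[object_id] = self.rfile.read(length)
        self._send(201, extra={"Location": "/" + object_id})

    def do_GET(self):
        self._read(send_body=True)

    def do_HEAD(self):
        self._read(send_body=False)

    def do_DELETE(self):
        found = STORE.pop(self.path.lstrip("/"), None) is not None
        self._send(204 if found else 404)

    def _read(self, send_body):
        body = STORE.get(self.path.lstrip("/"))
        if body is None:
            self._send(404)
        else:
            self._send(200, body=body, send_body=send_body)

    def _send(self, status, body=b"", send_body=True, extra=None):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        for name, value in (extra or {}).items():
            self.send_header(name, value)
        self.end_headers()
        if send_body and body:
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def call(port, method, path, body=None):
    """Send one request with the standard library client; no custom API."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, path, body=body)
        response = conn.getresponse()
        return response.status, response.getheader("Location"), response.read()
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyObjectStore)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_port
    try:
        post_status, location, _ = call(port, "POST", "/", b"hello object")
        get_status, _, body = call(port, "GET", location)
        head_status, _, head_body = call(port, "HEAD", location)
        delete_status, _, _ = call(port, "DELETE", location)
        gone_status, _, _ = call(port, "GET", location)
        return {"post": post_status, "get": get_status, "body": body,
                "head": head_status, "head_body": head_body,
                "delete": delete_status, "gone": gone_status}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Because the identifier is handed back by the server rather than chosen by the client, the client never needs to know where in the cluster the object lives, which matches the flat, location-free address space described for content addressing.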
16
Private storage clouds in the enterprise?
• Economic, manageable, sustainable solution for growing amount of unstructured data
• Can be geographically dispersed, yet logically centralized
• Long term archive and compliance within reach
• Green opportunities
• For which applications?
• How to integrate & deploy apps?
… absolutely!!
17
The problem with clouds & apps
• The definition of cloud is … cloudy
• The positioning of cloud storage is … woolly
• The classification of cloud storage applications thus far is … not very scientific, to say the least ;-)
• Any confusion will always favor the status quo…
• … so it is the storage industry’s problem and challenge to help clarify these issues in the mind of potential buyers!
18
Trying to classify cloud applications…
• Try to list the use cases for the cloud apps:
• Web Publishing
• Content Archiving
• Primary Storage
• Secondary Storage …
• not really a linear list, but rather orthogonal, like:
[Diagram: two orthogonal axes, Primary Storage / Secondary Storage and Web Content / Enterprise Content]
19
Turning problem into opportunity
• There clearly is a cloud storage classification vacuum in the industry and especially in the mind of potential buyers
• Most analysts and vendors try to push a storage-technical classification, tweaked to suit their own offerings: grid, cloud, cluster, NAS, CAS, COS, RAIN, MAID, whatever.
• The prospect looks at this from his/her application’s point of view and doesn’t see the match: FRUSTRATION!
• A golden opportunity presents itself to introduce a cloud storage classification that looks at the world the way the prospect does: from an application data perspective
20
The Cloud Storage Application Quadrant:
[Diagram axes: tiers (primary, secondary, safety) against exposure (private, shared, public)]
Applications mapped on the quadrant:
• (enterprise) content archiving
• web content publishing
• personal PC backup service
• CloudFolder (HSM)
• Swedish Music Archive
• Johns Hopkins University CIDR genomic info repository & archive
• large bank – client docs repository & archive
• large telco handset content sharing & repository
Map any app… the way the customer sees it!
21
Integrating & leveraging the enterprise cloud
[Diagram: six App boxes]
…learn by watching actual customers!
22
Integrating & leveraging the enterprise cloud
[Diagram: App boxes and AURA (Active Unified Repository & Archive)]
…learn by watching actual customers!
23
Summarizing:
• Serious storage environment disruption in full swing
• Unstructured content is the main issue
• Private storage clouds in the enterprise can help:
• reducing complexity, TCO & power consumption
• long term archive
• compliance
• unified repository as foundation of new application stack
Make that archive work for a living!