Sponsored workshop by Cleversafe from Structure:Data 2012
Sponsored workshop from Cleversafe. #dataconf More at http://event.gigaom.com/structuredata/
Store and Analyze Big Data Without Limits
March 23, 2012
Friday, July 27, 2012
Copyright © 2012 Cleversafe, Inc. All rights reserved.
Big Data Challenges
75% of data is generated by individuals, and enterprises have liability for 80% of data generated
From 800 exabytes in 2008 to 35,000 exabytes in 2020
90% of data is in unstructured format, and 89% of growth in storage is in unstructured format
Concern for data security and reliability in the Cloud
Public Cloud deployments and content depots are projected to grow to $7.4B by 2014 to accommodate capacity
“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”
– IDC Extracting Value from Chaos, May 2011
Data Storage is Transforming
Capacity-Optimized storage growing 63% annually*
• Growing 100X every 10 years
• Required new methods
[Chart: Data (axis 0 to 5000, units not shown) by Year, 2002 to 2012]
Traditional Data: Numbers, text, databases
New Data: Images, scans, audio files, videos, hi-res videos
Practical Applications for a 10 Exabyte Data Storage System
• Understand certain IP traffic patterns for tracking fraudulent activity
• Determine online purchasing patterns for a retailer or merchandiser to help launch a new product or service
• Identify hot new trends in entertainment, sports, gaming, etc.
• In this election year, understand the appeal of a political message and more directly target potential voters
RAID Can’t Effectively Scale
• RAID is not ideal for storing large amounts (PB) of digital content.
• RAID does not allow configurable reliability to be established.
• Increasing amounts of stored data are raising the risk of data loss and corruption.
• Spindle size is increasing faster than IO performance, causing longer rebuild times and longer exposure to data loss.
• Spindle size is approaching the amount of data read per expected Unrecoverable Read Error (URE), increasing the risk of read failures and silent data corruption.
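The rebuild-time and URE bullets can be made concrete with a rough calculation. The sketch below is illustrative only: the 3 TB capacity matches the drives named later in this deck, but the sustained rebuild rate (100 MB/s), the URE rate (1 error per 10^14 bits read) and the 6-drive RAID 5 group are assumed figures that do not come from the slides.

```python
import math

def rebuild_hours(capacity_bytes: float, rate_bytes_per_sec: float) -> float:
    """Best-case time to read or rewrite one whole drive, in hours."""
    return capacity_bytes / rate_bytes_per_sec / 3600

def prob_ure(bytes_read: float, ure_per_bit: float) -> float:
    """Probability of at least one unrecoverable read error while reading
    bytes_read, treating bit errors as independent events."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way for tiny p
    return -math.expm1(bits * math.log1p(-ure_per_bit))

DRIVE_BYTES = 3e12       # 3 TB drive (capacity used later in the deck)
REBUILD_RATE = 100e6     # ASSUMED sustained rate: 100 MB/s
URE_PER_BIT = 1e-14      # ASSUMED URE rate: 1 error per 1e14 bits read
SURVIVING_DRIVES = 5     # ASSUMED: rebuild of one drive in a 6-drive RAID 5 group

hours = rebuild_hours(DRIVE_BYTES, REBUILD_RATE)
p_one_drive = prob_ure(DRIVE_BYTES, URE_PER_BIT)
p_rebuild = prob_ure(SURVIVING_DRIVES * DRIVE_BYTES, URE_PER_BIT)

print(f"Best-case rebuild time: {hours:.1f} h")
print(f"P(URE) reading one full drive: {p_one_drive:.0%}")
print(f"P(URE) reading {SURVIVING_DRIVES} surviving drives: {p_rebuild:.0%}")
```

Under these assumptions the rebuild window grows linearly with capacity at a fixed transfer rate, and the chance of hitting a read error during the rebuild grows with the total amount of data that must be read.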
How Dispersed Storage Technology Works
1. Data is expanded, virtualized, transformed, sliced and dispersed using Information Dispersal Algorithms.
2. Slices are distributed to separate disks, storage nodes and geographic locations.
3. Even with individual servers or entire sites down, real-time, bit-perfect data is retrieved from a subset of slices.
[Diagram: DATA, Cleversafe IDA, slices at SITE 1, SITE 2, SITE 3 and SITE 4, Cleversafe IDA, DATA]
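The slides do not describe the internals of the Cleversafe IDA, so the following is not that algorithm. It is a minimal sketch of the general idea (slice data so that any subset of sufficient size rebuilds it exactly), using a generic Reed-Solomon-style code over a small prime field. The function names `disperse` and `retrieve` and the parameters k=3, n=5 are hypothetical and chosen only for illustration.

```python
from itertools import combinations

P = 257  # smallest prime above 255, so every byte value is a field element

def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def disperse(data: bytes, k: int, n: int):
    """Return n slices (lists of ints); any k of them can rebuild `data`."""
    padded = data + bytes(-len(data) % k)   # zero-pad to a multiple of k
    slices = [[] for _ in range(n)]
    for off in range(0, len(padded), k):
        # The k bytes of this block are the polynomial's values at x = 1..k
        points = [(x + 1, padded[off + x]) for x in range(k)]
        for s in range(n):
            slices[s].append(_interpolate(points, s + 1))
    return slices

def retrieve(available: dict, k: int, length: int) -> bytes:
    """Rebuild the data from any k slices, given as {slice_index: symbols}."""
    if len(available) < k:
        raise ValueError("not enough slices to rebuild the data")
    chosen = [(i, available[i]) for i in sorted(available)][:k]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(idx + 1, symbols[pos]) for idx, symbols in chosen]
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])

data = b"bit perfect data"
slices = disperse(data, k=3, n=5)

# Simulate every way of losing two of the five slices
recovered_ok = all(
    retrieve({i: slices[i] for i in keep}, k=3, length=len(data)) == data
    for keep in combinations(range(5), 3)
)
print(recovered_ok)
```

In this toy scheme the stored volume is n/k times the original, which is the sense in which dispersal "expands" data without keeping full copies.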
What Does a Limitless Scale Storage System Look Like?
• Single instance of data with guaranteed reliability and availability – not RAID and copy based
• Built-in geographic distribution for high availability and site failure tolerance
• Data concurrency with multiple simultaneous readers and writers
• Continuous data availability through upgrade cycles and storage node replacement
• Flat namespace with highly efficient metadata management and no database or master name node
• Architecture delivers independent scaling of storage capacity and performance
• Take advantage of largest capacity, most power-efficient disk drives available in the industry
10 Exabyte Data Storage System Configuration
• Data integrity and availability provided without the overhead of replication
• Deployed across multiple sites for site failure tolerance and high availability
• High bandwidth network between sites
• Utilize a portable datacenter (PD) container model for rapid setup and mobility
• Each PD houses multiple racks for storage and a single rack for network connectivity
• Flat architecture with no centralized database or management node
• Hundreds of simultaneous readers/writers with instantaneous access to billions of objects
[Image label: Portable Datacenter (PD)]
System Configuration
• 16 sites across the US
• High bandwidth WAN
• IDA W32, T22, 1.45 expansion
• Massively parallel distributed readers/writers
• Filter capability with ingest
• Access embedded in application
• 35 PDs per site (560 total)
• 21 Racks / PD (11,760 total)
• 189 Storage Nodes / PD (105,840 total)
• 45 3TB drives per storage node (4.7M total)
• ~15 EB raw, ~10 EB usable
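The totals on this slide follow from multiplying out the per-site figures. The sketch below reproduces that arithmetic. It assumes decimal units (1 EB = 1,000,000 TB) and reads "W32, T22" as a dispersal width of 32 slices with a read threshold of 22, an interpretation that matches the stated 1.45 expansion but is not spelled out in the slides.

```python
SITES = 16
PDS_PER_SITE = 35
RACKS_PER_PD = 21
NODES_PER_PD = 189
DRIVES_PER_NODE = 45
DRIVE_TB = 3

# ASSUMED reading of "IDA W32, T22": 32 slices written, any 22 needed to read
IDA_WIDTH = 32
IDA_THRESHOLD = 22

total_pds = SITES * PDS_PER_SITE
total_racks = total_pds * RACKS_PER_PD
total_nodes = total_pds * NODES_PER_PD
total_drives = total_nodes * DRIVES_PER_NODE

raw_eb = total_drives * DRIVE_TB / 1e6       # decimal units: 1 EB = 1e6 TB
expansion = IDA_WIDTH / IDA_THRESHOLD        # stored bytes per byte of user data
usable_eb = raw_eb / expansion
max_lost_slices = IDA_WIDTH - IDA_THRESHOLD  # slices that can be unavailable

print(f"PDs: {total_pds:,}  racks: {total_racks:,}  nodes: {total_nodes:,}")
print(f"drives: {total_drives:,}")
print(f"raw: {raw_eb:.2f} EB  expansion: {expansion:.2f}  usable: {usable_eb:.2f} EB")
print(f"slices that may be lost per object: {max_lost_slices}")
```

The computed raw and usable capacities come out slightly below the rounded "~15 EB raw, ~10 EB usable" on the slide, which is consistent with those figures being approximations.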
System Architecture
Near Real-time Parallel Data Analyzers (and filters)
Multiple Simultaneous Writers
Very Big Data Sources
Multiple Simultaneous Readers and Writers
Secondary (Parallel) Data Analyzers
Very Large Object Storage Cloud
• Deployed across multiple sites
• Using container-based (POD) model
• Flat architecture, no central database
Analysis & Results
Data & Indexes
Metadata
Use Case: Store and Analyze 6 months of Internet traffic
Total Global Monthly Internet Traffic Growing 32% Annually
[Chart: monthly traffic in PB; callout: 80 Exabytes per month in Dec. 2015]
Source: Cisco VNI, 2010
IP Traffic    North America Monthly    Worldwide Monthly
2012          12 EB                    37 EB
2015          23 EB                    80 EB
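As a rough cross-check of the growth figures, the compound annual growth rate implied by the 2012 and 2015 table entries can be computed directly. This is only a sketch: it assumes exactly three years between the two rows, and the helper names `cagr` and `project` are illustrative.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1

def project(start: float, rate: float, years: float) -> float:
    """Value after compounding `start` at `rate` per year for `years`."""
    return start * (1 + rate) ** years

YEARS = 3  # ASSUMED span between the 2012 and 2015 rows

north_america = cagr(12, 23, YEARS)     # EB per month, from the table
worldwide = cagr(37, 80, YEARS)         # EB per month, from the table
ww_2015_at_32 = project(37, 0.32, YEARS)

print(f"North America implied growth: {north_america:.1%} per year")
print(f"Worldwide implied growth:     {worldwide:.1%} per year")
print(f"37 EB compounded at 32% for 3 years: {ww_2015_at_32:.1f} EB")
```

The table endpoints imply rates somewhat below the headline 32%. The slide does not say which period or measurement the 32% figure covers, so the two need not be in conflict.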
Use Case: Store and Analyze 6 months of Internet traffic
Very Large Scale Processing Requirements
• Ingest/Filter: 4.6 TB per sec
• Analyze/Index: ~0.5 TB per sec (assuming a 10:1 filter of IP traffic)
Very Large Scale Storage Requirements
• Store 10 EB, grow to 1,000 EB
• ~900 GB/sec of data ingest
• Growing 32% per year
Source: Cisco VNI, 2010
Year    North America Monthly    North America Rolling 6 Months*
2012    12 EB                    96 EB
Potential Solutions:
• Massively parallel, distributed systems pioneered by Google, Yahoo, etc.
Traditional data storage systems are not capable of this scale
Cleversafe Focus
* Rolling 6 months requires capacity to store 8 months' worth of data in order to safely capture the next month before deleting the oldest month's worth of data
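The processing rates and the rolling-capacity figure on this slide can be derived from the 12 EB per month North America number. The sketch below shows the arithmetic, assuming a 30-day month and decimal units. The ~900 GB/sec storage ingest figure is not derived here because the slide does not state its basis.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # ASSUMED 30-day month
EB = 1e18                            # decimal exabyte, in bytes
TB = 1e12                            # decimal terabyte, in bytes

NA_MONTHLY_EB = 12        # North America monthly IP traffic, 2012
FILTER_RATIO = 10         # 10:1 filter of IP traffic, from the slide
MONTHS_OF_CAPACITY = 8    # capacity held for a rolling 6 months, per the footnote

ingest_tb_s = NA_MONTHLY_EB * EB / SECONDS_PER_MONTH / TB
analyze_tb_s = ingest_tb_s / FILTER_RATIO
rolling_capacity_eb = NA_MONTHLY_EB * MONTHS_OF_CAPACITY

print(f"Ingest/Filter: {ingest_tb_s:.1f} TB per sec")
print(f"Analyze/Index: {analyze_tb_s:.2f} TB per sec")
print(f"Rolling capacity: {rolling_capacity_eb} EB")
```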
Key Takeaways
• RAID can’t effectively scale to multi-petabytes and beyond
• A limitless scale data storage system requires:
  – Single instance of data with guaranteed reliability and availability – not RAID and copy based
  – Built-in geographic distribution for high availability and site failure tolerance
  – Data concurrency with multiple simultaneous readers and writers
  – Flat namespace with highly efficient metadata management and no database or master name node
Sponsored Workshop