Sponsored workshop by Cleversafe from Structure:Data 2012

Store and Analyze Big Data Without Limits March 23, 2012 Friday, July 27, 2012

Upload: gigaom

Post on 28-Nov-2014


DESCRIPTION

Sponsored workshop from Cleversafe. #dataconf More at http://event.gigaom.com/structuredata/

TRANSCRIPT

Page 1: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

Store and Analyze Big Data Without Limits

March 23, 2012

Friday, July 27, 2012

Page 2: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

Copyright © 2012 Cleversafe, Inc. All rights reserved.

Big Data Challenges

75% of data is generated by individuals, and enterprises have liability for 80% of data generated

From 800 exabytes in 2008 to 35,000 exabytes in 2020

90% of data is unstructured format, and 89% of growth in storage is unstructured format

Concern for data security and reliability in the Cloud

Public Cloud deployments and content depots are projected to grow to $7.4B by 2014 to accommodate capacity

“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”

– IDC Extracting Value from Chaos, May 2011

Page 6: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

Data Storage is Transforming

Capacity-Optimized storage growing 63% annually*

• Growing 100X every 10 years

• Required new methods

[Chart: data stored by year, 2002 to 2012. Vertical axis "Data", 0 to 5000; horizontal axis "Year".]

Traditional Data: Numbers, text, databases

New Data: Images, scans, audio files, videos, hi-res videos
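
The two growth figures on this slide can be compared with a little compound-growth arithmetic. The sketch below is only an illustration; the function names are hypothetical and it assumes steady year-over-year compounding, which the slide does not state. Under that assumption, 100X in 10 years works out to roughly 58% per year, and 63% per year compounds to roughly 130X in 10 years, so the two figures are in the same range.

```python
def annual_rate_for_multiple(multiple: float, years: float) -> float:
    """Annual growth rate that yields `multiple` after `years` of compounding."""
    return multiple ** (1 / years) - 1


def multiple_after(rate: float, years: float) -> float:
    """Total growth multiple after `years` at a fixed annual `rate`."""
    return (1 + rate) ** years


# Figures taken from the slide: 100X every 10 years, 63% annually.
rate_for_100x = annual_rate_for_multiple(100, 10)
multiple_at_63_percent = multiple_after(0.63, 10)

print(f"100X in 10 years is about {rate_for_100x:.1%} per year")
print(f"63% per year is about {multiple_at_63_percent:.0f}X in 10 years")
```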

Page 7: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

Practical Applications for a 10 Exabyte Data Storage System

• Understand certain IP traffic patterns for tracking fraudulent activity

• Determine online purchasing patterns for a retailer or merchandiser to help launch a new product or service

• Identify hot new trends in entertainment, sports, gaming, etc.

• In this election year, understand the appeal of a political message and more directly target potential voters

Page 8: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

RAID Can’t Effectively Scale

• RAID is not ideal for storing large amounts (PB) of digital content.

• RAID does not allow configurable reliability to be established.

• Increasing amounts of stored data are raising the risk of data loss and corruption.

• Spindle size is increasing faster than IO performance, causing longer rebuild times and longer exposure to data loss.

• Spindle size is approaching the amount of data read per expected Unrecoverable Read Error (URE), so reading an entire drive is increasingly likely to hit an error, causing silent data corruption.
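
The URE point can be made concrete with a rough probability estimate. This is a minimal sketch, not a figure from the slides: the function name is hypothetical, and the default rate of one unrecoverable error per 10^14 bits read is an assumed, commonly quoted drive specification. It also assumes errors are independent and that 1 TB is 10^12 bytes.

```python
import math


def p_read_error(capacity_tb: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading the whole drive once.

    ure_per_bit is an ASSUMED spec value (1 error per 1e14 bits read),
    not a number taken from the presentation.
    """
    bits = capacity_tb * 1e12 * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding loss.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


for size_tb in (1, 3, 6):
    print(f"{size_tb} TB drive: {p_read_error(size_tb):.1%} chance of a URE per full read")
```

Under these assumptions a full read of a 3 TB drive has roughly a one-in-five chance of hitting an error, and the chance grows with capacity, which is why rebuilds that must read entire drives become riskier as spindles get larger.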

Page 9: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

How Dispersed Storage Technology Works

1. Data is expanded, virtualized, transformed, sliced and dispersed using Information Dispersal Algorithms.

2. Slices are distributed to separate disks, storage nodes and geographic locations.

3. Even with individual servers or entire sites down, real-time, bit-perfect data is retrieved from a subset of slices.

[Diagram: DATA passes through the Cleversafe IDA, slices are spread across SITE 1, SITE 2, SITE 3 and SITE 4, and a subset of slices passes back through the Cleversafe IDA to rebuild DATA]
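
The slicing and retrieval described on this slide can be illustrated with a toy erasure code. The sketch below is not Cleversafe's algorithm, which the slides do not detail. It is a textbook polynomial scheme over the prime field GF(257), in the spirit of Reed-Solomon coding, and it omits the encryption and transformation steps. All names (`disperse`, `rebuild`, `width`, `threshold`) are hypothetical.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data: bytes, width: int, threshold: int):
    """Return `width` slices (width <= 256); any `threshold` rebuild `data`."""
    padded = data + bytes(-len(data) % threshold)  # zero-pad to a full block
    slices = [[] for _ in range(width)]
    for start in range(0, len(padded), threshold):
        block = padded[start:start + threshold]
        # The block's bytes are the polynomial's values at x = 1..threshold.
        points = [(x + 1, y) for x, y in enumerate(block)]
        for s in range(width):
            slices[s].append(_interpolate(points, s + 1))
    return slices


def rebuild(available: dict, threshold: int, length: int) -> bytes:
    """Rebuild from {slice_index: slice} holding at least `threshold` slices."""
    chosen = sorted(available.items())[:threshold]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(idx + 1, sl[pos]) for idx, sl in chosen]
        out.extend(_interpolate(points, x) for x in range(1, threshold + 1))
    return bytes(out[:length])


data = b"big data without limits"
slices = disperse(data, width=8, threshold=5)
# Simulate losing three of the eight slices (for example, a site outage).
survivors = {i: slices[i] for i in (1, 3, 4, 6, 7)}
print(rebuild(survivors, threshold=5, length=len(data)) == data)
```

In this toy scheme the stored volume is width/threshold times the original (8/5 = 1.6 here), which is how a dispersal code can tolerate several lost slices without keeping full copies.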

Page 10: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

What Does a Limitless Scale Storage System Look Like?

• Single instance of data with guaranteed reliability and availability – not RAID and copy based

• Built-in geographic distribution for high availability and site failure tolerance

• Data concurrency with multiple simultaneous readers and writers

• Continuous data availability through upgrade cycles and storage node replacement

• Flat namespace with highly efficient metadata management and no database or master name node

• Architecture delivers independent scaling of storage capacity and performance

• Take advantage of largest capacity most power-efficient disk drives available in the industry
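
The slides do not say how data is located without a database or master name node. One well-known general technique, shown here purely as an assumption and not as Cleversafe's method, is to compute placement from the object name itself, for example with rendezvous (highest-random-weight) hashing. The function and node names below are hypothetical.

```python
import hashlib


def place_slices(object_name: str, nodes: list[str], width: int) -> list[str]:
    """Pick `width` distinct nodes for an object using only its name.

    Every client that knows the node list computes the same answer,
    so no central lookup service is needed.
    """
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}/{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The nodes with the highest scores for this object hold its slices.
    return sorted(nodes, key=score, reverse=True)[:width]


nodes = [f"node-{i}" for i in range(12)]
placement = place_slices("video-0001", nodes, width=8)
print(placement)
```

Because the placement depends only on the object name and the node list, any reader or writer can find an object's slices independently, which is the property a flat namespace with no master node requires.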

Page 11: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

10 Exabyte Data Storage System Configuration

• Data integrity and availability provided without the overhead of replication

• Deployed across multiple sites for site failure tolerance and high availability

• High bandwidth network between sites

• Utilize a portable datacenter (PD) container model for rapid setup and mobility

• Each PD houses multiple racks for storage and a single rack for network connectivity

• Flat architecture with no centralized database or management node

• Hundreds of simultaneous readers/writers with instantaneous access to billions of objects

[Image: Portable Datacenter (PD)]

Page 12: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

System Configuration

• 16 sites across the US

• High bandwidth WAN

• IDA W32, T22, 1.45 expansion

• Massively parallel distributed readers/writers

• Filter capability with ingest

• Access embedded in application

• 35 PDs per site (560 total)

• 21 Racks / PD (11,760 total)

• 189 Storage Nodes / PD (105,840 total)

• 45 3TB drives per storage node (4.7M total)

• ~15 EB raw, ~10EB usable
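
The totals on this slide follow from straightforward multiplication, reproduced below as a check. The sketch reads "W32, T22" as an IDA width of 32 slices with a threshold of 22, which is an interpretation rather than something the slide spells out, and it uses decimal units (1 EB = 10^6 TB). The variable names are illustrative.

```python
# Figures from the slide.
SITES = 16
PDS_PER_SITE = 35
RACKS_PER_PD = 21
NODES_PER_PD = 189
DRIVES_PER_NODE = 45
DRIVE_TB = 3
IDA_WIDTH = 32       # assumed meaning of "W32"
IDA_THRESHOLD = 22   # assumed meaning of "T22"

pds = SITES * PDS_PER_SITE
racks = pds * RACKS_PER_PD
nodes = pds * NODES_PER_PD
drives = nodes * DRIVES_PER_NODE

raw_eb = drives * DRIVE_TB / 1e6          # TB to EB, decimal units
expansion = IDA_WIDTH / IDA_THRESHOLD     # raw bytes stored per usable byte
usable_eb = raw_eb / expansion

print(f"PDs: {pds:,}  racks: {racks:,}  nodes: {nodes:,}  drives: {drives:,}")
print(f"expansion: {expansion:.2f}  raw: {raw_eb:.1f} EB  usable: {usable_eb:.1f} EB")
```

With these inputs the drive count comes to about 4.76 million, raw capacity to about 14.3 EB (which the slide rounds to ~15 EB), and usable capacity to about 9.8 EB, in line with the ~10 EB figure.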

Page 13: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

System Architecture

[Diagram labels: Very Big Data Sources; Near Real-time Parallel Data Analyzers (and filters); Multiple Simultaneous Writers; Multiple Simultaneous Readers and Writers; Secondary (Parallel) Data Analyzers; Data & Indexes; Metadata; Analysis & Results]

Very Large Object Storage Cloud

• Deployed across multiple sites

• Using container-based (POD) model

• Flat architecture, no central database

Page 14: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

Use Case: Store and Analyze 6 months of Internet traffic

Total Global Monthly Internet Traffic Growing 32% Annually

[Chart: total global monthly Internet traffic in PB, reaching 80 exabytes per month in Dec. 2015]

Source: Cisco VNI, 2010

IP Traffic    North America Monthly    Worldwide Monthly
2012          12 EB                    37 EB
2015          23 EB                    80 EB

Page 15: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

Use Case: Store and Analyze 6 months of Internet traffic

Very Large Scale Processing Requirements

• Ingest/Filter: 4.6 TB per sec

• Analyze/Index: ~0.5 TB per sec (assuming a 10:1 filter of IP traffic)

Very Large Scale Storage Requirements

• Store 10 EB, growing to 1,000 EB

• ~900 GB/sec of data ingest

• Growing 32% per year

        North America Monthly    North America Rolling 6 Months*
2012    12 EB                    96 EB

Source: Cisco VNI, 2010

Potential Solutions:

• Massively parallel, distributed architectures pioneered by Google, Yahoo, etc.

Traditional data storage systems are not capable of this scale

[Callout: Cleversafe Focus]

* Rolling 6 months requires capacity to store 8 months’ worth of data in order to safely capture the next month before deleting the oldest month’s worth of data
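
The processing and capacity figures on this slide can be reproduced from the 12 EB per month traffic number. The sketch below is a back-of-the-envelope check under stated assumptions: a 30-day month, decimal units (1 EB = 10^6 TB), and the slide's 10:1 filter ratio. The variable names are illustrative. It does not attempt to derive the ~900 GB/sec ingest figure, whose basis the slide does not give.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumes a 30-day month

monthly_traffic_eb = 12    # North America monthly IP traffic, 2012 (from the slide)
filter_ratio = 10          # the slide's assumed 10:1 filter
months_of_capacity = 8     # per the slide's footnote on a rolling 6-month window

ingest_tb_per_sec = monthly_traffic_eb * 1e6 / SECONDS_PER_MONTH
analyze_tb_per_sec = ingest_tb_per_sec / filter_ratio

rolling_capacity_eb = monthly_traffic_eb * months_of_capacity
filtered_capacity_eb = rolling_capacity_eb / filter_ratio

print(f"ingest/filter: {ingest_tb_per_sec:.1f} TB per sec")
print(f"analyze/index: {analyze_tb_per_sec:.2f} TB per sec")
print(f"rolling window, unfiltered: {rolling_capacity_eb} EB")
print(f"rolling window after 10:1 filter: {filtered_capacity_eb:.1f} EB")
```

Under these assumptions the ingest rate comes to about 4.6 TB per sec and the post-filter rate to just under 0.5 TB per sec, matching the slide. Eight months of unfiltered traffic is 96 EB, and after a 10:1 filter it is about 9.6 EB, which appears consistent with the 10 EB system described in the presentation.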

Page 16: SPONSORED WORKSHOP by Cleversafe from Structure:Data 2012

Key Takeaways

• RAID can’t effectively scale to multi-petabytes and beyond

• A limitless scale data storage system requires:

– Single instance of data with guaranteed reliability and availability – not RAID and copy based

– Built-in geographic distribution for high availability and site failure tolerance

– Data concurrency with multiple simultaneous readers and writers

– Flat namespace with highly efficient metadata management and no database or master name node
