101 ab 1415-1445


Upload: chiou-nan-chen

Post on 22-Apr-2015


Page 1: 101 ab 1415-1445

© Supermicro 2012

The Infrastructure of Tomorrow, Today – Integrating Supermicro, Greenplum and SAS to enable Big Data Analytics

Jeff Tsai 蔡穎碩, Solution Manager

Page 2: 101 ab 1415-1445

Agenda

Big Data Analytics Platform & Infrastructure

EMC+Supermicro

1,000 Nodes Hadoop Cluster

Page 3: 101 ab 1415-1445

“Big Data Is Less About Size, And More About Freedom” – TechCrunch

“Findings: ‘Big Data’ Is More Extreme Than Volume” – Gartner

“Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World” – IDB

“Total data: ‘bigger’ than big data” – 451 Group

THE ERA OF BIG DATA IS HERE…

Page 4: 101 ab 1415-1445

Data Sources Are Expanding

THE DIGITAL UNIVERSE WILL GROW 44X IN THE NEXT 10 YEARS

Source: 2011 IDC Digital Universe Study

Page 5: 101 ab 1415-1445

BIG Data is Just a Bunch of Data to Store…?

[Chart, 2009–2014, vertical axis 0–90: file-based vs. block-based storage. Source: IDC]

File Based: 60.7% CAGR; Block Based: 21.8% CAGR

By 2012, 80% of all storage capacity sold will be for file-based data

Big Data Sources

OR

Page 6: 101 ab 1415-1445

To Create Significant value to your business…

HOW?...

Page 7: 101 ab 1415-1445

Make BIG Data Accessible

Identify the data source

Store the data

Connect applications and users

Utilize the data in different views

Page 8: 101 ab 1415-1445

EMC UAP Solutions – Analytics Platform

This is what my analytics environment looks like…

Page 9: 101 ab 1415-1445

Building The Big Data Analytics “Stack”

Greenplum Chorus: Enterprise Collaboration Platform for Data

Analytic Toolsets (Business Analytics, BI, Statistics, etc.)

Greenplum Database (Enterprise & Community Editions): World’s Most Scalable MPP Database Platform

Greenplum HD (Hadoop, Enterprise & Community Editions): Enterprise Analytics Platform for Unstructured Data

Greenplum Data Computing Appliances: Purpose-built for Big Data Analytics

Page 10: 101 ab 1415-1445

EMC ACQUIRES GREENPLUM IN JULY 2010

“For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant….” – Gartner

Greenplum Becomes the Foundation of EMC’s Data Computing Division

Page 11: 101 ab 1415-1445
Page 12: 101 ab 1415-1445

SAS at a Glance

Company Highlights:

• Founded 1976; 11,000+ employees in 400+ offices
• 2010 worldwide revenue: $2.43 B
• IDC: SAS is the leader in analytics with a 34.5% market share (Analytics and Reporting)
• 4.5 million users worldwide
• 50,000+ sites in 114 countries
• From Tools to Vertical Solutions

[Pie chart: breakdown by industry]

Services: 11%
Financial Services: 42%
Retail: 4%
Other: 2%
Manufacturing: 6%
Healthcare & Life Sciences: 8%
Government: 14%
Energy & Utilities: 2%
Education: 3%
Communications: 8%

Page 13: 101 ab 1415-1445

Overview

Revenues: FY09 $500M, FY10 $721M , FY11 ~$1B

Global Footprint: >100 Countries

Production: Facilities in the US, EU and Asia

Engineering: 70% of workforce in engineering (30% growth through recession)

Market Share: #1 Server Channel (SMCI enables ~10% of global server market)

Brand Equity: Growing public profile since 2007 IPO

Corporate Focus: Energy Efficiency, Earth-friendly, Green Technology Innovation

Founded in 1993; HQ in San Jose, CA; NASDAQ: SMCI since 2007

SMC Inc., HQ: San Jose, CA

SMC BV: The Netherlands

SMC TW: Taiwan

Page 14: 101 ab 1415-1445

Product Family

Resource Optimized (WIO/UIO)

Twin Architecture

GPU SuperComputing

Embedded

SuperBlade

Storage Server

Workstation

Mainstream Business Solutions

Application Optimized: Multi I/O

Data Center Optimized

Page 15: 101 ab 1415-1445

Server Building Block Solutions®

Building blocks (1):

>550 Motherboards
>1300 Chassis
>350 Cooling Modules
>140 Power Supplies
Open CPU / Memory
Operating Systems / Applications

In-House Design and Server Building Block Solutions®

[Diagram labels: Technology Partners; Customer Requirements; OEM Specs; In-House Design; Optimized Data Center; Tri-Lab; Application Optimized]

(1) As of Q2, 2009

Page 16: 101 ab 1415-1445

Big Data Analytics on Hadoop

Internet companies are not built on SQL but are building Analytics on Hadoop/NoSQL

Existing Hadoop Users (Internet)

This is what I think my analytics environment looks like…

[Diagram labels: Web Portal, Social Networks; Web Apps; ETL Tools; BI & Reporting; Pig; Hive; HBase; MapReduce Layer; Hadoop Storage; Hadoop System Management & Coordination]

Page 17: 101 ab 1415-1445

Hadoop Components (hadoop.apache.org)

• HDFS: Hadoop Distributed File System

• MapReduce: Framework for writing scalable data applications

• Pig: Procedural language that abstracts lower-level MapReduce

• ZooKeeper: Highly reliable distributed coordination

• Hive: Data warehouse infrastructure built on top of Hadoop

• HBase: Database for random, real-time read/write access

• Oozie: Workflow/coordination to manage jobs

• Mahout: Scalable machine learning libraries
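
To make the MapReduce entry in the component list more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the task. It is plain Python and does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is here", "big data is real"]
    print(word_count(sample))
```

In a real cluster the map and reduce calls would run in parallel on many nodes, with the framework handling the shuffle between them; the sketch only shows the data flow.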

Page 18: 101 ab 1415-1445

What can Hadoop do for you?

Financial Services
• Knowing customers better
• Risk analysis and management
• Fraud detection and security analytics

Telecommunications
• Customer churn prevention
• Price optimization and marketing
• Network analysis and optimization
• Customer experience management

Healthcare
• Patient care quality
• Drug development

Web & e-Tailing
• Web usage, click stream behavior
• Market & customer segmentation
• Ad customer targeting
• On-line fraud detection

Government
• Fraud detection
• Compliance and regulatory analytics

Retail
• Market and consumer segmentation
• Merchandising and cross-selling
• Promotion and campaign analysis

Data Source: Cloudera

Page 19: 101 ab 1415-1445

Hadoop Use Cases

LinkedIn – “People You May Know” and other facts

Yahoo! – Hadoop to support ad systems and web search

Visa – Credit card fraud detection and analysis

T-Mobile – Churn analysis, user experience

Amazon, Baidu, AOL, eBay, Facebook, Twitter, …

Data Source: Cloudera

Page 20: 101 ab 1415-1445

Hadoop Cluster HW selection

What’s the HW configuration for Hadoop clusters?...

It depends; workloads matter.

CPU Intensive
• Machine learning
• Natural language processing
• Complex data mining
• Feature extraction

I/O Intensive
• Data importing and exporting
• Indexing
• Searching
• Grouping
• Decoding/decompressing

Data Storage
• Capacity
• Number of data mirrors (replicas)

TCO
• Rack space
• Power consumption

Different workloads

General Configuration
• 2 Quad Core CPUs
• 16-96 GB Memory
• 2 x GbE (Gigabit Ethernet)
• 1 TB-2 TB Disk x n
• 1U/2U Rack mount
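
The storage items on this slide (capacity, the number of data mirrors, and 1TB-2TB disks x n per node) interact in a simple way: every mirrored copy divides the usable capacity. The following is a rough sizing sketch under stated assumptions, not a vendor sizing tool. The 12 disks per node and the 3 copies in the usage example are hypothetical values picked for illustration, and the sketch ignores space needed for the operating system, temporary data and growth headroom.

```python
import math


def usable_capacity_tb(nodes, disks_per_node, disk_tb, copies):
    """Usable capacity in TB when every block is stored `copies` times."""
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb / copies


def nodes_needed(data_tb, disks_per_node, disk_tb, copies):
    """Smallest node count whose usable capacity holds `data_tb` of data."""
    usable_per_node_tb = disks_per_node * disk_tb / copies
    return math.ceil(data_tb / usable_per_node_tb)


if __name__ == "__main__":
    # Hypothetical example: 10 nodes, 12 x 2 TB disks each, 3 copies of the data.
    print(usable_capacity_tb(nodes=10, disks_per_node=12, disk_tb=2, copies=3))
    # Hypothetical example: how many such nodes hold 500 TB of data?
    print(nodes_needed(data_tb=500, disks_per_node=12, disk_tb=2, copies=3))
```

With these example inputs, 240 TB of raw disk yields 80 TB of usable space, which is why the number of mirrors belongs in the same discussion as rack space and power consumption.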

Page 21: 101 ab 1415-1445

Production-scale testing of Apache Trunk & hosted environment for customer POCs

Proven at Scale with Worldwide Support

Industry’s largest Hadoop support team

Industry’s most accomplished Hadoop talents (from Yahoo!, LinkedIn, Talend, etc.)

Tested at scale on the Greenplum Analytics Workbench

1,000-node, 24-petabyte cluster

Multi-million dollar investment by EMC and partners

Reduced risk for EMC customers

Certification of partner products

Bringing Rapid Innovation to Hadoop

Page 22: 101 ab 1415-1445

Supermicro Server Functions in the Cluster

Supermicro Data Nodes

Supermicro Infrastructure Nodes

2U Storage Server

2U Twin2 Server

• 1,000+ Physical Supermicro Server Nodes (10k virtual nodes)

• 12,000 Processor Cores

• 24 Petabytes of Storage Capacity (6Gbps SATA)

• 48 Terabytes RAM

• 56 Gbps InfiniBand Connectivity
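
Dividing the cluster totals above by the node count gives a feel for the average physical node. This is a back-of-the-envelope sketch that assumes exactly 1,000 nodes (the slide says "1,000+", so the real averages are at most these values) and decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB); the dictionary keys and the `per_node` helper are hypothetical names.

```python
# Cluster totals taken from the slide, converted to decimal units.
CLUSTER = {
    "nodes": 1000,        # assumption: exactly 1,000 physical nodes
    "cores": 12000,       # processor cores
    "storage_tb": 24000,  # 24 PB expressed in TB
    "ram_gb": 48000,      # 48 TB expressed in GB
}


def per_node(cluster):
    """Average of every cluster-wide total per physical node."""
    nodes = cluster["nodes"]
    return {key: value / nodes for key, value in cluster.items() if key != "nodes"}


if __name__ == "__main__":
    for name, value in per_node(CLUSTER).items():
        print(f"{name}: {value}")
```

Under these assumptions the average node has 12 cores, 24 TB of disk and 48 GB of RAM.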

Page 23: 101 ab 1415-1445

Supermicro Multi-Node Server Solutions

Switch Data Center - Las Vegas NV

Page 24: 101 ab 1415-1445

Initial Benchmark Data

[Chart: vertical axis in minutes]

…Results before fine-tuning.

World record performance results expected to be announced before 2013.

Page 25: 101 ab 1415-1445

Other testing programs – Supermicro & Intel

CPU Benchmark

Page 26: 101 ab 1415-1445

Supermicro Advantages

Why Supermicro…

Building Blocks for Different Workloads & Requirements
- Meet any Hadoop workload by model
- I/O, CPU, Disks, Density
- Customize to specific workload requirements

High Efficiency, High Quality
- Green IT
- High Efficiency Power
- High Quality for highest system availability and best utilization

Proven Solutions
- EMC Greenplum proven solutions
- 100% Apache Hadoop Compatible
- Benchmark and testing programs with partners

TCO

Solutions for Cost-Effective Hadoop Clusters

Best choice of Hadoop hardware platforms

Page 27: 101 ab 1415-1445

Shipped Directly From US, NL, TW

Turnkey Hadoop: Supermicro Complete Rack Solutions

One Stop Shop for Hardware, End to End Total Solutions

Speed Up Deployment With Ready to Run Rack Systems

Single Source, Consistent Build Quality and Delivery Time

Multi-Vendor Compatibility Test, Zero Compatibility Issues

Premium Service With Competitive Pricing

Page 28: 101 ab 1415-1445

Broad Product Portfolios and Building Blocks

Best platform for your Hadoop cluster

Page 29: 101 ab 1415-1445

Q&A

Thank You

SMC Inc., HQ: San Jose, CA

SMC BV: The Netherlands

SMC TW: Taiwan