IBM PureData System for Analytics Powered by Netezza Hossein Sarshar

Post on 18-Jul-2015


Page 1: Netezza pure data

IBM PureData System for Analytics

Powered by Netezza

Hossein Sarshar

Page 2: Netezza pure data

Agenda
• What is PureData and Netezza
  o History
  o Characteristics
  o Product chain
• PureData Hardware Architecture
  o Introduction
  o Hardware architecture
  o Parallel structures
• Analytics with PureData
  o Introduction
  o In-database analytics tools
• Demo

IBM® PureData™ for Analytics 2

Page 3: Netezza pure data

What is PureData and Netezza

PureSystems

PureFlex PureApplication


Page 4: Netezza pure data

What is PureData and Netezza

In 2010, IBM acquired Netezza, a data warehouse and analytics appliance vendor founded in 2000 in Marlborough, MA. IBM later rebranded the product line as PureData.

PureSystems: PureFlex, PureApplication, PureData

Page 5: Netezza pure data

PureSystems Product Family

• PureFlex:
  o Combines and optimizes compute, storage, networking and virtualization capabilities under a single, unified management console into an infrastructure system.
• PureApplication:
  o A platform system designed and tuned specifically for transactional web and database applications.
• PureData:
  o Based on Netezza technology, PureData is all that data experts need in a single, well-tuned appliance.
  o Variants: Operational Analytics, Transactions, Analytics

Page 6: Netezza pure data

PureSystems Characteristics
• Built-in Expertise:
  o No indexing, tuning, or partitioning.
  o Fully parallel, optimized in-database analytics.
  o No storage administration.
  o No software installation.
• Integration by Design:
  o Server, storage, and database in one easy-to-use package.
  o Automatic parallelization and resource optimization to scale economically.
  o Enterprise-class security and platform management.
• Simplified Experience:
  o Up and running in hours.
  o Minimal ongoing administration.
  o Standard interfaces to best-of-breed analytics, BI, and data integration tools.
  o Built-in analytics capabilities allow users to derive insight from data quickly.
  o Easy connectivity to other Big Data Platform components.

Each of these comes as an appliance, comparable to a simplified yet powerful private cloud with minimal administration.

Page 7: Netezza pure data

PureData Introduction
• It is a data warehousing and data analytics appliance that is fast enough to process terabytes of data in seconds. It is a fully parallel machine.
• Netezza's core technology uses FPGAs (Field-Programmable Gate Arrays) to filter out unnecessary data in parallel.
• PureData uses Netezza technology to perform deep analytics on huge amounts of data in a reasonable time.
• It is purpose-built for high-performance analytics.
• It supports common schema designs (3NF, star, denormalized tables).

Page 8: Netezza pure data

PureData Architecture
• Disk storage: RAID 1 disks, high-speed data streams
• SMP Host: Red Hat Linux servers; optimizer and compiler; a gateway to the system
• Snippet Blades (S-Blades): query accelerator using FPGAs

Page 9: Netezza pure data

S-Blades (SPU)


Page 10: Netezza pure data

S-Blades

Diagram labels:
• IBM BladeCenter Server: Intel quad-core CPUs
• Netezza DB Accelerator: dual-core FPGAs
• DRAM
• Two SAS Expander Modules

Page 11: Netezza pure data

S-Blades Overview
• Each S-Blade has 8 Intel cores on the IBM BladeCenter server and 8 FPGA cores on the Netezza DB Accelerator.
  o An FPGA has dimensions similar to a CPU's, consumes about one-fifth of the power, and runs at about one-fifth of the clock speed.
  o More caching capability.
  o Low latency and high throughput.
• Each S-Blade takes ownership of 6-8 disks.
• Queries are divided into subqueries (snippets) that are processed by the S-Blades.
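The division of a query into subqueries can be illustrated with a small scatter/gather sketch in Python. It is a software analogy only, not Netezza code: the names `distribute`, `run_snippet`, and `run_query`, the blade count, the CRC32-based row placement, and the sample rows are all assumptions made for illustration.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_BLADES = 4  # hypothetical blade count, chosen only for the example


def distribute(rows, num_blades):
    """Assign each (key, value) row to one blade's slice by hashing its key."""
    slices = [[] for _ in range(num_blades)]
    for key, value in rows:
        # CRC32 is an assumed stand-in for whatever placement the appliance uses.
        slices[zlib.crc32(key.encode("utf-8")) % num_blades].append((key, value))
    return slices


def run_snippet(data_slice):
    """Subquery run by one blade: a partial sum per key over its own slice."""
    partial = Counter()
    for key, value in data_slice:
        partial[key] += value
    return partial


def run_query(rows, num_blades=NUM_BLADES):
    """Scatter the rows, run the subqueries in parallel, then merge on the host."""
    slices = distribute(rows, num_blades)
    with ThreadPoolExecutor(max_workers=num_blades) as pool:
        partials = list(pool.map(run_snippet, slices))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)


if __name__ == "__main__":
    sample = [("east", 1), ("west", 2), ("east", 3)]  # made-up rows
    print(run_query(sample))
```

Each blade works only on its own slice, so no blade needs another blade's data until the final merge. That is the property the shared-nothing design relies on.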


Page 12: Netezza pure data

PureData AMPP (Shared-Nothing) Architecture

Netezza Appliance (diagram labels):
• Client workloads: Advanced Analytics, Loader, ETL, BI, Applications
• Hosts: SMP Host
• Network Fabric
• S-Blades™: each with its own FPGA, Memory, and CPU
• Disk Enclosures

Page 13: Netezza pure data

FPGA Secret Sauce

Example query:

    select DISTRICT,
           PRODUCTGRP,
           sum(NRX)
    from MTHLY_RX_TERR_DATA
    where MONTH = '20091201'
      and MARKET = 509123
      and SPECIALTY = 'GASTRO'
    group by DISTRICT, PRODUCTGRP

Processing of each slice of table MTHLY_RX_TERR_DATA (compressed):
• FPGA core:
  o Uncompress
  o Project: select DISTRICT, PRODUCTGRP, sum(NRX)
  o Restrict, Visibility: where MONTH = '20091201' and MARKET = 509123 and SPECIALTY = 'GASTRO'
• CPU core:
  o Complex ∑, Group by, …: sum(NRX)

Using FPGAs eliminates a tremendous amount of unnecessary data movement.
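The split of work between the FPGA core and the CPU core can be sketched in Python as two stages. This is a software simulation under stated assumptions, not how the hardware is programmed: `zlib` and JSON stand in for the appliance's compressed storage format, the function names are hypothetical, the sample rows are made up, and the visibility check is not modeled.

```python
import json
import zlib
from collections import defaultdict

# Made-up rows; only the column names come from the slide's query.
ROWS = [
    {"DISTRICT": "D1", "PRODUCTGRP": "P1", "NRX": 10,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "GASTRO"},
    {"DISTRICT": "D1", "PRODUCTGRP": "P1", "NRX": 5,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "GASTRO"},
    {"DISTRICT": "D2", "PRODUCTGRP": "P1", "NRX": 7,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "CARDIO"},
    {"DISTRICT": "D2", "PRODUCTGRP": "P2", "NRX": 3,
     "MONTH": "20091101", "MARKET": 509123, "SPECIALTY": "GASTRO"},
    {"DISTRICT": "D2", "PRODUCTGRP": "P2", "NRX": 4,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "GASTRO"},
]


def compress_slice(rows):
    """Stand-in for a compressed table slice on disk."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def fpga_stage(compressed_slice):
    """Uncompress, restrict (WHERE clause), and project (needed columns only)."""
    rows = json.loads(zlib.decompress(compressed_slice).decode("utf-8"))
    for row in rows:
        if (row["MONTH"] == "20091201"
                and row["MARKET"] == 509123
                and row["SPECIALTY"] == "GASTRO"):
            yield row["DISTRICT"], row["PRODUCTGRP"], row["NRX"]


def cpu_stage(filtered_rows):
    """Group by (DISTRICT, PRODUCTGRP) and sum NRX over the surviving rows."""
    totals = defaultdict(int)
    for district, productgrp, nrx in filtered_rows:
        totals[(district, productgrp)] += nrx
    return dict(totals)


result = cpu_stage(fpga_stage(compress_slice(ROWS)))
```

In this sketch only three of the five rows, and three of the six columns, ever reach `cpu_stage`. Rows and columns that cannot affect the result are dropped before the aggregation step sees them.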

Page 14: Netezza pure data

PureData System Configuration


Page 15: Netezza pure data

PureData System Configuration


Page 16: Netezza pure data

PureData System Configuration

Specs            | N3001-002 | N3001-005 | N3001-010 | N3001-020 | N3001-040 | N3001-080
Rack system      | Single    | Single    | Single    | Multi     | Multi     | Multi
Racks            | 1         | 1         | 1         | 2         | 4         | 8
Active S-Blades  | 2         | 4         | 7         | 14        | 28        | 56
CPU Cores        | 40        | 80        | 140       | 280       | 560       | 1120
FPGA Cores       | 32        | 64        | 112       | 224       | 448       | 896
User Data in TB  | 32        | 98        | 192       | 384       | 768       | 1536

N3001 is the newest IBM PureData model line.
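A quick way to read the configuration table is to divide the core counts by the number of active S-Blades. The short Python sketch below does that arithmetic. The figures are copied from the table, while the dictionary layout and the `per_blade` helper are only illustrative.

```python
# model: (active S-Blades, CPU cores, FPGA cores), copied from the table.
MODELS = {
    "N3001-002": (2, 40, 32),
    "N3001-005": (4, 80, 64),
    "N3001-010": (7, 140, 112),
    "N3001-020": (14, 280, 224),
    "N3001-040": (28, 560, 448),
    "N3001-080": (56, 1120, 896),
}


def per_blade(model):
    """Return (CPU cores per S-Blade, FPGA cores per S-Blade) for a model."""
    blades, cpu_cores, fpga_cores = MODELS[model]
    return cpu_cores / blades, fpga_cores / blades


if __name__ == "__main__":
    for name in MODELS:
        cpu, fpga = per_blade(name)
        print(f"{name}: {cpu:g} CPU cores and {fpga:g} FPGA cores per S-Blade")
```

With the table's figures, every model works out to 20 CPU cores and 16 FPGA cores per active S-Blade, so compute capacity grows in step with the number of S-Blades.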

Page 17: Netezza pure data

What is Achievable
• An agile analytics platform.
• Minimal administration effort to install and manage.
• Petabyte-scale capacity.
• Linear speedup by adding racks.
• Big Data meets deep analytics => no need to sample.

Page 18: Netezza pure data

High Performance Analytics Architecture


Page 19: Netezza pure data

PureData Analytics Modules


Page 20: Netezza pure data

Netezza In-Database Analytics Options
• Classification
• Time Series
• Clustering
• Association Rules
• Simulation and Monte Carlo Analysis
• Geospatial

Page 21: Netezza pure data

Demo
• Installation
• Client Tool Exploration
• Command Execution

Page 22: Netezza pure data

Summary
• A system for analytics.
• An out-of-the-box solution.
• It uses FPGA technology to speed up query execution.
• It uses a shared-nothing approach.
• PureData uses open standards to communicate with the outside world.
• It offers many Netezza (NZ) and third-party in-database options to enrich analytics.

Page 23: Netezza pure data

References
• http://www-01.ibm.com/software/data/netezza/
• http://www.ibm.com/ibm/puresystems/ca/en/
