Simplifying Big Data


Upload: trinhdieu

Post on 24-Mar-2018


TRANSCRIPT

Page 1: Simplifying Big Data - · PDF fileBig Data – Dream IT. Build IT. Realize IT Paul Kent, Vice President, Big Data, SAS Andy Mendelsohn, Senior Vice President, Database Server Technologies,
Page 2

Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

Simplifying Big Data: Best Practices for On-Premises and Cloud Architectures (CON8746)

Paul Kent, Vice President, Big Data, SAS
Jean-Pierre Dijcks, Master Product Manager, Big Data, Oracle

Presented with

Page 3

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 4

Big Data Platform Options from Oracle

•  Oracle Big Data Appliance
   –  Simple to Deploy
   –  Fast out of the Box
   –  Lower Cost than DIY Clusters

•  Oracle Big Data Cloud Service
   –  Same architecture as on-premises
   –  High performance data processing
   –  Enables Big Data SQL in Oracle Public Cloud

•  Oracle Big Data SQL
   –  Oracle SQL across ALL your data
   –  Secure access to data in NoSQL, Hadoop and Oracle Database
   –  High performance on un-modeled data

•  Oracle Exadata
   –  Best platform for all Oracle Database workloads
   –  Unique software that maximizes Oracle Database
   –  Standardized, optimized, hardened end-to-end

Page 5

100% Upward Compatibility with On-Premises
Enables Coexistence and Migration

[Diagram: Oracle Cloud and Private Cloud, labeled "Coexistence and Migration"]

•  Same Architecture
•  Same Standards
•  Same Products

Page 6

Program Agenda

1  Right Engine – Right Job
2  Right Platform for the Engine
3  Configuring for Mixed Workloads
4  Real-World Examples with SAS

Page 7

Right Engine – Right Job

Page 8

A simple set of criteria

[Chart: RDBMS, NoSQL DB, and Hadoop scored on a scale from 0 to 5 against the criteria below]

Performance
•  Concurrency
•  Complex Query Response Times
•  Single Record Read/Write Performance
•  Bulk Write Performance

Security
•  Privileged User Security
•  General User Security
•  Governance Tools

Cost
•  System per TB Cost
•  Backup per TB Cost
•  Skills Acquisition Cost

Page 9

High-level Technology Comparison

                            HDFS                   NoSQL                      RDBMS
Ingest
  Data Type                 Chunk                  Record                     Transaction
  Write Type                Synchronous            Eventually Consistent      ACID Compliant
  Data Preparation          No Parsing             No Parsing                 Parsing and Validation

Page 10

High-level Technology Comparison

                            HDFS                   NoSQL                      RDBMS
Ingest
  Data Type                 Chunk                  Record                     Transaction
  Write Type                Synchronous            Eventually Consistent      ACID Compliant
  Data Preparation          No Parsing             No Parsing                 Parsing and Validation
DR
  DR Type                   Second Cluster         Node Replica               Second RDBMS
  DR Unit                   File                   Record                     Transaction
  DR Timing                 Batch                  Record                     Transaction

Page 11

High-level Technology Comparison

                            HDFS                   NoSQL                      RDBMS
Ingest
  Data Type                 Chunk                  Record                     Transaction
  Write Type                Synchronous            Eventually Consistent      ACID Compliant
  Data Preparation          No Parsing             No Parsing                 Parsing and Validation
DR
  DR Type                   Second Cluster         Node Replica               Second RDBMS
  DR Unit                   File                   Record                     Transaction
  DR Timing                 Batch                  Record                     Transaction
Access
  Complex Analytics?        Yes                    No                         Yes
  Query Speed               Slow                   Fast for simple questions  Fast
  # of Data Access Methods  One (full table scan)  One (index lookup)         Many (Optimized)

                            Affordable Scale       Low Predictable Latency    Flexible Performance

Page 12

Right Platform for the Engine

Page 13

Big Data Appliance X5-2 Hardware

•  Sun Oracle X5-2L Servers, with each:
   –  2 * 18 core Intel Xeon E5-2699 v3
   –  128 GB DDR4 Memory
   –  96 TB SAS Disk Space

•  3 InfiniBand switches for all internal Hadoop and Spark traffic

•  1 Cisco Management Switch

Note: Support for Cloudera CDH Data Hub included
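The per-server figures on this slide can be turned into simple ratios that are handy when comparing node profiles. The sketch below only uses the three numbers quoted on the slide (2 sockets of 18 cores, 128 GB memory, 96 TB disk); the function name `node_ratios` is made up for illustration and is not part of any Oracle tooling.

```python
# Per-server figures quoted on the BDA X5-2 slide.
SOCKETS = 2
CORES_PER_SOCKET = 18
MEMORY_GB = 128
DISK_TB = 96


def node_ratios(sockets, cores_per_socket, memory_gb, disk_tb):
    """Return core count plus memory and raw disk per core for one server.

    Hypothetical helper for illustration only.
    """
    cores = sockets * cores_per_socket
    return {
        "cores": cores,
        "memory_gb_per_core": memory_gb / cores,
        "disk_tb_per_core": disk_tb / cores,
    }


if __name__ == "__main__":
    ratios = node_ratios(SOCKETS, CORES_PER_SOCKET, MEMORY_GB, DISK_TB)
    for name, value in ratios.items():
        print(f"{name}: {value:.2f}")
```

With these inputs each server has 36 cores, which works out to roughly 3.6 GB of memory and 2.7 TB of raw disk per core.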

Page 14

Hadoop Hardware – Pricing Trends with BDA (List)

[Chart: list price trends across BDA X2-2, BDA X3-2, BDA X4-2, and BDA X5-2, on an axis from $0 to $2,500. Series: Price per TB (disk raw), Price per Core, Price for GB/Mem]

Page 15

Hardware Profiles for Hadoop

Storage
•  Dense drives provide flexibility as well as low cost per TB
•  HDFS supports dense drives with more scalability enhancements in CDH 5.5
•  Recommendation: Storage is cheap, ignore it for sizing

CPU
•  “Hadoop” nodes are increasingly running mixed workloads and “external processes”:
   –  HDFS
   –  ETL & Stream Processing
   –  Machine Learning
•  Recommendation: Size CPUs per node towards the high-end core counts

Memory
•  Mixed workloads and external processes do need memory
•  Large DIMMs do still command a premium in 2-socket servers
•  Recommendation: Size memory to the mid-spectrum due to hockey stick effects. This is changing!

Page 16

Hardware Profiles for Hadoop – Conclusion

•  Stop worrying about disk space. It is cheap and available.
   –  SSD? Once real data tiering is available, or for NoSQL Databases
•  Worry instead about:
   –  Network
      •  Expensive and often the forgotten bottleneck
      •  Think about data ingest and the required capacity
   –  CPU
      •  Mixed workloads will require more CPUs within the node
      •  Everyone wants to run compute at the same time
   –  Memory
      •  Be conscious of actual needs and restrictions
      •  Not all SW can make use of lots of memory

Be wary of myths and lore. The hardware world has changed…
Instead, do the math and calculate the bottlenecks and usages you expect
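As one way to "do the math" for the network point, the sketch below estimates how long a data ingest takes over a single link. It is a back-of-the-envelope model under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 Gbit/s = 10^9 bits/s) and a `utilization` factor for the share of link bandwidth actually achieved. The function name, the 0.7 default, and the example numbers are illustrative and do not come from the slides.

```python
def ingest_hours(data_tb, link_gbit_per_s, utilization=0.7):
    """Estimate hours needed to move data_tb terabytes over one network link.

    Assumes decimal units and a constant effective throughput of
    link speed * utilization. Hypothetical helper, not a vendor formula.
    """
    if data_tb < 0 or link_gbit_per_s <= 0 or not 0 < utilization <= 1:
        raise ValueError("inputs out of range")
    total_bytes = data_tb * 1e12
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * utilization
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # Example with made-up numbers: a 10 TB daily ingest on 10 and 40 Gbit/s links.
    for link in (10, 40):
        print(f"{link} Gbit/s: {ingest_hours(10, link):.2f} h")
```

Comparing the result with the available batch window shows quickly whether the network, rather than disk or CPU, is the limiting resource.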

Page 17

Virtualization Profiles for Hadoop

[Diagram labels: Disk, CPU]

Page 18

Virtualization Profiles for Hadoop

Performance / Data Storage

[Diagram labels: Disk, CPU, VM1, VM2]

Hypervisor overhead is typically low, but the I/O overhead has an impact, which is why SR-IOV is applied to ensure high throughput

Retain “direct attached disks” for Hadoop workloads

Goal: Data permanently lives in the “Hadoop nodes”

Page 19

Virtualization Profiles for Hadoop

Flexibility / Application Hosting

[Diagram labels: Disk, CPU, VM-C1, VM-C2, VM-S1, VM-S2]

Drive towards flexibility at the expense of performance

Goal: Spin up environments and the data quickly

Page 20

Recommendations and Future?

•  If you don’t need them, don’t use them
   –  Remember you will need to apply OS updates to all VMs, and when they have data, this may not be desirable

•  Don’t create many small VMs, rather create large VMs
   –  Pay attention to the number of VMs on a physical node, and consider what happens if the physical node goes offline (data loss?)

•  Futures
   –  Docker?

Page 21

Configuring for Mixed Workloads

Page 22

Sharing and Managing Resources

•  Note: this is not about “security/isolation” but instead about multiple workloads (tenants) on a single cluster
   –  Hence we are ignoring VMs

•  Two main ways (outside of VM/Docker) of preventing resource grabs:
   –  Linux Control Groups (cgroups)
   –  Yet Another Resource Negotiator (YARN)

[Diagram label: Private Cloud]

Page 23

YARN

•  Capacity Scheduler
   –  Designed for multiple tenants
   –  Based on resource queues
   –  Enables administrators to distribute resources to apps and users

•  Fair Scheduler
   –  Designed to average out resources per application
   –  Does support queues, but as a minimum guarantee
   –  Lower maintenance, but less control
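The difference between the two schedulers can be illustrated with a toy model. This is a deliberately simplified sketch, not YARN's actual algorithm: it ignores memory, queue elasticity, weights, preemption, and minimum guarantees. The queue names, application names, and the 720-vcore cluster size are assumptions made up for the example.

```python
def capacity_shares(total_vcores, queue_percent):
    """Toy Capacity Scheduler view: each queue is assigned a fixed percentage.

    queue_percent maps a (hypothetical) queue name to its configured share;
    the shares must add up to 100.
    """
    if abs(sum(queue_percent.values()) - 100) > 1e-9:
        raise ValueError("queue percentages must sum to 100")
    return {queue: total_vcores * pct / 100 for queue, pct in queue_percent.items()}


def fair_shares(total_vcores, running_apps):
    """Toy Fair Scheduler view: running applications split resources evenly."""
    if not running_apps:
        return {}
    share = total_vcores / len(running_apps)
    return {app: share for app in running_apps}


if __name__ == "__main__":
    cluster_vcores = 720  # made-up cluster size
    print(capacity_shares(cluster_vcores, {"etl": 50, "analytics": 30, "adhoc": 20}))
    print(fair_shares(cluster_vcores, ["job_a", "job_b", "job_c"]))
```

In the capacity model an administrator decides the split up front, which matches the "more control" trade-off on the slide; in the fair model the split follows whatever happens to be running.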

Page 24

Why cgroups?

•  YARN does require developers to write applications in a specific way
   –  Sometimes it is quicker and simpler to not adhere to YARN

•  Sometimes the requirements are not overly complex
   –  cgroups enable “guarantees” by “hard partitioning” of the nodes at the Linux level
   –  cgroups are static but create a simple-to-administer and robust separation of resources

Page 25

Real World Examples with SAS

Page 26

4 Segments

•  SAS and Oracle Partnership
•  Data Lake Dream
•  Customer #1 – Telco (Asia)
•  Customer #2 – Banking
•  Things you might not know…

Page 27

Big Data – Dream IT. Build IT. Realize IT
Paul Kent, Vice President, Big Data, SAS
Andy Mendelsohn, Senior Vice President, Database Server Technologies, Oracle Corporation
Maureen Chew, Oracle Corporation
Gary Granito, Oracle Corporation
488-2013

Page 28

Copyright © 2012, SAS Institute Inc. All rights reserved.

SAS AND ORACLE
WORKING TOGETHER TO CREATE CUSTOMER VALUE

•  Joint R&D development and Product Management teams in Cary and Redwood Shores
•  Focus on driving SAS technology components to run natively in Oracle database
•  Joint performance engineering optimizations
•  Template physical architectures developed based on use-cases
•  Physically tested and benchmarked together
•  Reduction in physical effort
•  Overall reduction in lifecycle costs
•  Best Practice papers
•  SAS and Oracle Engineers provide joint "Sizing and Architecture Analysis and Design"

Page 29

Copyright © 2013, SAS Institute Inc. All rights reserved.

ORACLE ENGINEERED SYSTEMS FOR

•  SuperCluster
•  ExaData
•  ExaLogic
•  Virtual Compute Appliance
•  Big Data Appliance
•  Database Backup, Recovery, Logging Appliance
•  ZFS Storage Appliance

Page 30

Data Lake Dream

Page 31

Future State: Data Lake Pattern

[Diagram labels, grouped as they appear in the extraction]

200+ Data Sources: Demand Deposits Data, Credit Applications Data, Credit Card Transactions, Mortgage Data
Finance: Forecasting, Profitability, Asset Tracking, Others
Risk: Client Scanning, Operational Exposure, Commercial, Consumer
Regulatory: Federal, SEC, Compliance Reporting, Others
Fraud: Money Laundering, Check Fraud, Loan Fraud
Data marts and tools: Datamart, Datamart, DataMart, SAS, Cognos, Cognos, QlikView, Query Tool, SAS, SAS, SAS
Math Rack Analysts

Page 32

/ 3 Stage Refinement

/source /transform /data

/card

/checking

/mortgage

/customer

/product
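The slide shows three stage directories and five subject areas but does not spell out how they nest. A minimal sketch follows, assuming each stage holds one subdirectory per subject area; that nesting is an assumption, and the function name `refinement_layout` is invented for the example.

```python
import posixpath

# Names taken from the slide.
STAGES = ["source", "transform", "data"]
SUBJECT_AREAS = ["card", "checking", "mortgage", "customer", "product"]


def refinement_layout(root="/"):
    """List one path per (stage, subject area) pair.

    Assumes subject areas repeat under every stage, which the slide
    does not state explicitly.
    """
    return [
        posixpath.join(root, stage, area)
        for stage in STAGES
        for area in SUBJECT_AREAS
    ]


if __name__ == "__main__":
    for path in refinement_layout():
        print(path)
```

Under this assumption a data set moves from /source/card to /transform/card to /data/card as it is refined.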

Page 33

SAS BIG DATA ON BIG DATA APPLIANCE

•  Flexible Architectural options for SAS deployments
•  Can run on Starter, Half and Full configurations
•  Optionally select nodes “N, N-1, N-2, …” for additional SAS Services such as SAS Compute Tier, SAS MidTier
•  Optionally select node subset “N, N-1, N-2, N-3, …” for more dedicated resources for SAS Analytic Compute Environment by shifting Big Data Appliance roles
•  Option to selectively add more memory on a per node basis depending on specific workload distribution

Page 34

SAS VISUAL ANALYTICS, HIGH-PERFORMANCE ANALYTIC COMPUTE ENVIRONMENT CO-LOCATED WITH HADOOP

STARTER BDA

[Diagram labels: SAS Midtier; SAS Visual Analytics; Metadata Server; SAS Compute; SAS HPA Root Node]

Page 35

SAS VISUAL ANALYTICS, HIGH-PERFORMANCE ANALYTIC COMPUTE ENVIRONMENT CO-LOCATED WITH HADOOP

FULL RACK BDA

[Diagram labels: SAS Midtier; Metadata Server; SAS Compute; SAS HPA Root Node; LASR Worker 17 with HDFS Data 17; LASR Worker 18 with HDFS Data 18]

Page 36

Customer #1: Telco Provider (Asia)

Page 37

TELCO (ASIA) CHALLENGES
THE SITUATION…

•  Increasing need for Data
•  More data held at more granular levels
•  Expansion of Analytics users in the enterprise

[Diagram labels, in extraction order]
RAS, NOC, SOC, RAS, CEM, Collection, EPQM, HR, Supply Chain
DI Temp Space, Staging, CMDM, CI Temp Space, VA Workspace, EM Workspace, ADHOC (ADM/ABTs), ADM / ABTs
XXXXX BCA, XXXXX CLM, PLDT Home, PLDT EICB, XXXXX Network, IRM// Fraud Finance

Page 38

TELCO (ASIA) BIG DATA ARCHITECTURE IMPLEMENTATION APPROACH

[Diagram labels, in extraction order]
In Hadoop Processing
ADM & ABT DI Jobs (ETL Hadoop Target)
ADM & ABT DI Jobs (ETL SAS Target)
SAN Storage
ADM DI Jobs using Sqoop (Hive)
ABT DI Jobs (ELT Hadoop Target) (Pass Through + Explicit SQL + Hive Optimization)
Phase 0
Phase 1
Phase 2
EXADATA
BDA - CLOUDERA HADOOP
ADM (Hive)
ABTs (Hive)

Page 39

TELCO (ASIA) DATA MANAGEMENT ON HADOOP
SAMPLE JOB FLOW - ADM

Phase 0: Source Table, Extract Data (Implicit SQL to partitioned table), Table Loader (Proc Append), Target Table (SAS)

Phase 1: Source Table, Extract Data (Implicit SQL to partitioned table), Table Loader (Proc Append), Target Table (Hive)

Phase 2: Source Table, Sqoop Data (Oozie & Sqoop Import), Target Table (Hive)

Page 40

Copyright © 2014, SAS Institute Inc. All rights reserved.

TELCO (ASIA) DATA MANAGEMENT ON HADOOP
OPTIMIZED USING HADOOP SQOOP+OOZIE VIA CUSTOM SAS DI SQOOP TRANSFORM

Subject Area: Analytic Data Mart (ADM)

Columns: Table; Load Duration in hours (pre-optimization); Load Duration in hours (post-optimization); Improvement (in hours); % Improvement; Delta Load Aggregation Level; Delta Load Data Size (in GB)

Table            Pre (h)  Post (h)  Improvement (h)  % Improvement  Delta Load Level  Size (GB)
FactTable1       0.7000   0.2383    0.4617           65.95%         DAILY               5.60
FactTable2       1.9500   0.5031    1.4469           74.20%         WEEKLY             18.50
FactTable3       3.8167   1.1969    2.6197           68.64%         MONTHLY            37.20
FactTable4       6.0667   0.6214    5.4453           89.76%         DAILY               7.70
FactTable5       0.5333   0.0617    0.4717           88.44%         DAILY               1.70
FactTable6       0.2500   0.0528    0.1972           78.89%         DAILY               0.73
FactTable7       0.4833   0.1011    0.3822           79.08%         WEEKLY              2.20
FactTable8       0.9667   0.2597    0.7069           73.13%         MONTHLY             6.30
FactTable9       5.7000   0.0697    5.6303           98.78%         DAILY               1.70
FactTable10      0.9333   0.1094    0.8239           88.27%         WEEKLY              4.10
FactTable11      1.7333   0.2517    1.4817           85.48%         MONTHLY             8.10
FactTable12      2.0833   0.2033    1.8800           90.24%         DAILY              10.10
FactTable13      4.4000   0.8367    3.5633           80.98%         WEEKLY             39.80
FactTable14      8.1333   2.6169    5.5164           67.82%         MONTHLY           104.80
FactTable15      0.4833   0.0447    0.4386           90.75%         DAILY               0.75
FactTable16      1.1000   0.0842    1.0158           92.35%         WEEKLY              1.80
FactTable17      1.3500   0.1008    1.2492           92.53%         MONTHLY             3.60
FactTable18      0.3833   0.0950    0.2883           75.22%         DAILY               2.40
FactTable19      0.6833   0.3014    0.3819           55.89%         WEEKLY             14.70
FactTable20      4.8333   1.1042    3.7292           77.16%         MONTHLY            50.90
DimensionTable1  8.1000   2.9511    5.1489           63.57%         N/A               213.70
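The improvement columns in this table follow from the pre- and post-optimization load durations. The sketch below shows the arithmetic, assuming the percentage is the time saved divided by the original duration; the function name is illustrative. Because the published figures were presumably computed from unrounded timings, recomputing from the four-decimal values can differ in the last digit for some rows.

```python
def load_improvement(pre_hours, post_hours):
    """Return (hours saved, percent improvement) for one table load.

    Percent improvement is taken as hours saved relative to the
    pre-optimization duration (an assumption about the slide's formula).
    """
    if pre_hours <= 0:
        raise ValueError("pre_hours must be positive")
    saved = pre_hours - post_hours
    return saved, saved / pre_hours * 100


if __name__ == "__main__":
    # Two rows copied from the table: (name, pre, post).
    rows = [("FactTable9", 5.7000, 0.0697), ("FactTable14", 8.1333, 2.6169)]
    for name, pre, post in rows:
        saved, pct = load_improvement(pre, post)
        print(f"{name}: saved {saved:.4f} h ({pct:.2f}%)")
```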

Page 41

TELCO (ASIA) DATA MANAGEMENT ON HADOOP
OUTCOMES

•  Reuse existing workflows
•  Retarget outputs to Hadoop Friendly Formats
•  Selectively upgrade processing to Hadoop Optimal
•  Batch Window Dragon Slayed!
•  Infrastructure no longer challenged by explosive growth

Page 42

Customer #2: Bank (Europe)

Page 43

BANK EUROPE REFERENCE DIAGRAM

[Diagram labels]
SAS Clients (EM, EG, SAS Studio); SAS/ACCESS to Hadoop
SAS HPDM (gbhpap01); SAS/ACCESS to Hadoop; SSH
HPA/TKGrid Root Node: gbbdap01
HPA/TKGrid Work Nodes: gbbdap02 through gbbdap09 and gbbdap19 through gbbdap27
Hadoop PROD Node (NM/DN = Node Manager/Data Node), shown once per node
EP = Embedded Process, shown once per node
NN
VA/VS SMP
HBASE/Others

Page 44

SAS® VISUAL ANALYTICS EXPLORER

•  Data exploration at massive scale
•  Intuitive visual analytics

Page 45

SAS® VISUAL STATISTICS

•  Descriptive and Predictive Modeling
•  Model comparison
•  Dynamic group-by processing

Page 46

SAS® IN-MEMORY STATISTICS FOR HADOOP

In-Memory Statistics for Hadoop: Programming interface for SAS model development

Page 47

BANK EUROPE THINGS OF NOTE

1. SAS was not licensed for all nodes in the BDA
   •  SAS EP jobs scheduled by YARN will have full visibility to data across the cluster
   •  SASHDAT (SAS High Performance Data Binary Format) needs data sets to stay on the nodes licensed for SAS; some attention with Hadoop Balancer required.

2. SAS was not licensed for all cores on the nodes it was licensed for
   •  SAS Licensing Posture has improved – you can license say 200 cores on a 1000 core cluster and control limits with CGROUPS and YARN
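To make the 200-of-1000-cores example concrete, the sketch below computes a per-node CPU cap for the Linux CFS bandwidth controller, where a cgroup may use `cpu.cfs_quota_us` microseconds of CPU time per `cpu.cfs_period_us` period (so a quota of 8 periods equals 8 cores). The slide does not say how the limit was actually configured; the 25-node cluster, the even spread of licensed cores, and the function name are assumptions for illustration.

```python
def cfs_quota_us(licensed_cores_total, nodes, period_us=100_000):
    """Per-node CFS quota that caps a cgroup at its share of licensed cores.

    Assumes licensed cores are spread evenly across nodes. 100000 us is the
    usual default CFS period. Hypothetical helper for illustration.
    """
    if nodes <= 0 or licensed_cores_total <= 0:
        raise ValueError("inputs must be positive")
    cores_per_node = licensed_cores_total / nodes
    return int(cores_per_node * period_us)


if __name__ == "__main__":
    # Made-up layout: 1000 cores as 25 nodes of 40 cores, 200 cores licensed.
    quota = cfs_quota_us(licensed_cores_total=200, nodes=25)
    print(f"cpu.cfs_period_us = 100000, cpu.cfs_quota_us = {quota}")
```

With these numbers each node's SAS cgroup would be capped at 8 of its 40 cores, which adds up to the 200 licensed cores across the cluster.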

Page 48

Things you might not know

Page 49

SPD Engine with Hadoop

§  Support for running on MapR 4.0.2

§  Support for Code Accelerator

§  Enhanced WHERE pushdown: AND, OR, NOT, parenthesis, range operators and in-lists

§  Parallel write support can improve write performance up to 40%

§  Optionally uses Apache Curator/Zookeeper as a distributed lock server. No more physical lock files.

libname spdat spde '/user/dodeca' hdfshost=default;

Page 50

SAS GRID MANAGER FOR HADOOP

[Diagram labels]
Traditional Grid Manager: Grid control node, Compute nodes, Shared File System, Run SAS job, LSF
Hadoop: YARN Resource Manager, YARN NM (three shown), Local Disks, Shared File System, Run SAS job

Leverage the processing power of the cluster
•  SAS runs completely in Hadoop cluster
•  YARN with Oozie prioritizes and schedules SAS jobs in cluster
•  All existing SAS Grid Manager integrated (e.g. EM) capabilities
•  Reduction of SAN storage
•  No separate SAS compute tier

Page 51

ORACLE ENGINEERED SYSTEMS FOR

•  SuperCluster
•  ExaData
•  ExaLogic
•  Virtual Compute Appliance
•  ZFS Storage Appliance
•  Database Backup, Recovery, Logging Appliance
•  Big Data Appliance
