Enterprise Data Warehouse Optimization: 7 Keys to Success

© Hortonworks Inc. 2011–2017. All Rights Reserved. Scott Gnau, CTO, Hortonworks, @Scott_Gnau. David Loshin, President, Knowledge Integrity, [email protected]

Upload: hortonworks

Post on 21-Jan-2018


TRANSCRIPT

Page 1: Enterprise Data Warehouse Optimization: 7 Keys to Success

© Hortonworks Inc. 2011–2017. All Rights Reserved

Scott Gnau, CTO, Hortonworks, @Scott_Gnau
David Loshin, President, Knowledge Integrity, [email protected]

Page 2: Enterprise Data Warehouse Optimization: 7 Keys to Success

Legacy Architectures Impede Performance

• Data warehouse performance is no longer solely defined in terms of computation speed
• Optimal performance reflects the ability to maximize value across a range of dimensions
• The static design of legacy platforms has not kept pace with growing desire for business intelligence and analytics

[Diagram labels: EDW; Capital Costs; Operations Costs; Scalability; Analytic Flexibility; Time to Value; Data Quality; Data Variety]

© 2017 Knowledge Integrity, [email protected] (301) 754-6350

Page 3: Enterprise Data Warehouse Optimization: 7 Keys to Success

Step 1: Leverage Horizontal Scalability

• DW appliances require significant capital investment
  – System must be sized to meet anticipated needs
  – Allows for unused capacity at beginning
  – Requires increased “step-up” investments on regular intervals
• Hadoop finesses this challenge
  – Relies on commodity components
  – Start with what you need, grow with increased demand
  – Introduce newer hardware seamlessly
  – Exploit innovations to speed performance (e.g., Stinger.next, Low Latency Analytical Processing)

[Diagram: four groups, each labeled Rack switch, NameNode, and four DataNode & TaskTracker nodes]
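The sizing argument on this slide can be made concrete with a small sketch. The yearly demand figures, the 200 TB appliance step, and the 10 TB node size are hypothetical values chosen only for illustration. They do not come from the presentation or from any price list.

```python
def provisioned_capacity(demand_tb, unit_tb):
    """Capacity needed per period when capacity is bought in unit_tb increments."""
    # -(-d // unit) is integer ceiling division.
    return [-(-d // unit_tb) * unit_tb for d in demand_tb]

def unused_capacity(demand_tb, unit_tb):
    """Total idle capacity (TB-periods) summed over all periods."""
    provisioned = provisioned_capacity(demand_tb, unit_tb)
    return sum(p - d for p, d in zip(provisioned, demand_tb))

# Hypothetical demand per year, in TB.
demand = [42, 71, 113, 158, 236]

# Assumption: the appliance grows in 200 TB "step-up" increments,
# while a commodity cluster grows one 10 TB node at a time.
appliance_idle = unused_capacity(demand, unit_tb=200)
cluster_idle = unused_capacity(demand, unit_tb=10)

print("appliance idle TB-years:", appliance_idle)
print("cluster idle TB-years:", cluster_idle)
```

Under these assumptions the coarse step size is what leaves capacity idle. The smaller the increment, the closer provisioned capacity tracks actual demand.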

Page 4: Enterprise Data Warehouse Optimization: 7 Keys to Success

Step 2: Augment EDW Storage with Hive

• The value of existing EDW investments can be extended using a Hybrid Architecture
• Hive continues to evolve with innovative performance improvements:
  – In-memory caching and persistent query executors
  – Column-oriented distributed data organization
  – Improved security using Apache Ranger
  – SQL ACID Merge

[Diagram labels: Hadoop Cluster; EDW]

Page 5: Enterprise Data Warehouse Optimization: 7 Keys to Success

Step 3: Increase Data Flexibility

• Conventional data warehouse architectures are organized using a dimensional model
  – Facts represent events
  – Dimensions characterize the facts
• The dimensional model is suited to typical DW operations
  – Aggregation and rolled-up reporting
  – “Slice and dice”
• However, this model forces all data into predetermined schema (“schema-on-write”)
  – Introduces bias, creates constraints and limits data flexibility
• Alternative: schema-on-read
  – Data sets are captured in their source formats
  – Frees data consumers to apply their own organization
  – Allows logical structure to be layered on top of data in source format
  – Enables use of creative algorithms for analytics, text mining, and machine learning
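A minimal sketch of schema-on-read, assuming the source data is a set of JSON lines. The event fields, the `read_with_schema` helper, and both schemas are hypothetical and exist only to illustrate the idea. The raw records are stored once, and each consumer layers its own structure on top at read time.

```python
import json

# Raw events kept in their source format (JSON lines); fields vary by record.
RAW_EVENTS = [
    '{"user": "a1", "action": "view", "sku": "X-100", "ts": "2017-01-05"}',
    '{"user": "b2", "action": "buy", "sku": "X-100", "price": 19.5, "ts": "2017-01-06"}',
    '{"user": "a1", "action": "buy", "sku": "Y-200", "price": 5.0, "ts": "2017-01-06"}',
]

def read_with_schema(raw_lines, schema):
    """Apply a consumer-defined schema at read time.

    schema maps an output column to (source field, converter, default).
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            column: convert(record[field]) if field in record else default
            for column, (field, convert, default) in schema.items()
        }

# Consumer 1 cares about revenue per SKU.
sales_schema = {"sku": ("sku", str, None), "revenue": ("price", float, 0.0)}
# Consumer 2 cares about activity per user.
activity_schema = {"user": ("user", str, None), "action": ("action", str, None)}

revenue_by_sku = {}
for row in read_with_schema(RAW_EVENTS, sales_schema):
    revenue_by_sku[row["sku"]] = revenue_by_sku.get(row["sku"], 0.0) + row["revenue"]

actions_by_user = {}
for row in read_with_schema(RAW_EVENTS, activity_schema):
    actions_by_user.setdefault(row["user"], []).append(row["action"])
```

Neither consumer had to wait for a shared model to include the fields it needs, which is the flexibility the slide contrasts with schema-on-write.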

Page 6: Enterprise Data Warehouse Optimization: 7 Keys to Success

Step 4: Use Unstructured Data

• Data warehouses are engineered around structured data
• Many sources of increasing volume of unstructured data
  – Apps running on Internet-connected devices generate text streams
  – Machine-generated unstructured content
  – Semi-structured sources
• Applications that consume both structured and unstructured data provide fuller visibility into analytical results
• Tools like Lucene, Solr, Mahout, and other text analytics libraries help to parse and tag unstructured text

[Diagram labels: Ingest; Parse; Tag; Organize; Lucene; Solr; Mahout]
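The ingest, parse, tag, and organize stages named in the diagram can be sketched in plain Python. This is only a toy illustration of the flow. It does not use or imitate the Lucene, Solr, or Mahout APIs, and the tag vocabulary and sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical tag vocabulary: tag -> keywords that trigger it.
TAG_KEYWORDS = {
    "billing": {"invoice", "refund", "charge"},
    "outage": {"down", "offline", "error"},
}

def parse(text):
    """Parse: lowercase the raw text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tag(tokens):
    """Tag: return every tag whose keywords appear among the tokens."""
    token_set = set(tokens)
    return sorted(t for t, words in TAG_KEYWORDS.items() if token_set & words)

def organize(documents):
    """Organize: build an inverted index from tag to document ids."""
    index = defaultdict(list)
    for doc_id, text in documents:  # Ingest: iterate over incoming documents
        for t in tag(parse(text)):
            index[t].append(doc_id)
    return dict(index)

docs = [
    (1, "Customer requests a REFUND for a duplicate charge."),
    (2, "Set-top box went offline; error code 42."),
    (3, "Invoice sent, but the portal is down."),
]
index = organize(docs)
```

Once free text carries tags, it can be joined with structured records (for example, by customer or device id) like any other column.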

Page 7: Enterprise Data Warehouse Optimization: 7 Keys to Success

Step 5: Data Discovery

[Diagram label: Data Ingestion & Transformation]

• Data imported into the data warehouse is homogenized and organized within predefined data models
• This constrains downstream consumers

Page 8: Enterprise Data Warehouse Optimization: 7 Keys to Success

Step 5: Data Discovery

[Diagram: five blocks, each labeled Data Discovery & Preparation]

• Data discovery allows each user to configure the data for their specialized purposes

Page 9: Enterprise Data Warehouse Optimization: 7 Keys to Success

Step 6: Offload ETL to Hadoop

• 60-70% of the effort of data warehousing is attributed to extraction, transformation, and loading (ETL)
• Hadoop is a natural platform for ETL processing:
  – ETL is inherently data parallel, enabling faster execution
  – Development time can be drastically reduced with faster dev/test/debug cycle
  – Resources can be dynamically apportioned and released when ETL processing is completed, lowering costs
• Apache Hive supports SQL ACID Merge which handles inserts, updates, and deletes in a single pass
• Allows for in-database transformations without need for massive refreshes
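The "single pass" behavior of a merge can be illustrated with a small in-memory sketch. This is not Hive's implementation or syntax. It is a plain-Python stand-in that mirrors the three branches of a merge (delete, update, insert), with a hypothetical `customers` table and change batch.

```python
def merge(target, changes):
    """Apply a batch of changes to target in one pass over the batch.

    target:  dict mapping key -> row (a dict of column values)
    changes: iterable of (key, row, deleted) tuples from the source
    Branches, in the spirit of a MERGE statement:
      matched and flagged deleted -> delete
      matched                     -> update
      not matched                 -> insert
    """
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    for key, row, deleted in changes:
        if key in target:
            if deleted:
                del target[key]
                counts["deleted"] += 1
            else:
                target[key].update(row)
                counts["updated"] += 1
        elif not deleted:
            target[key] = dict(row)
            counts["inserted"] += 1
    return counts

customers = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Grace", "city": "Arlington"},
}
batch = [
    (1, {"city": "Cambridge"}, False),                 # existing key: update
    (2, {}, True),                                     # existing key: delete
    (3, {"name": "Edsger", "city": "Austin"}, False),  # new key: insert
]
counts = merge(customers, batch)
```

Only the rows named in the batch are touched. Rows that did not change are left alone, which is why no full refresh of the target is needed.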

Page 10: Enterprise Data Warehouse Optimization: 7 Keys to Success

Step 7: Operational Data Governance

• Delegating more responsibility to the consumer community poses a risk of inconsistent interpretation and use
• Institute operational data governance to support versioning, lineage, and provenance
  – Metadata management
  – Data lineage
  – Archiving policies
  – Versioning policies
  – Data security and protection
• Apache Atlas is an open source component of the Hadoop ecosystem that captures data definitions, hierarchical taxonomies, data elements and their relationships, and lineage
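To show what capturing lineage means in practice, here is a minimal in-memory sketch. It is not the Apache Atlas API. The `LineageGraph` class and the dataset names are hypothetical and serve only to show how recorded relationships answer the question "where did this data come from?".

```python
from collections import defaultdict

class LineageGraph:
    """Minimal in-memory lineage store: which datasets feed which."""

    def __init__(self):
        self._parents = defaultdict(set)

    def record(self, output, inputs):
        """Record that `output` was derived from each dataset in `inputs`."""
        self._parents[output].update(inputs)

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or indirectly."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

lineage = LineageGraph()
lineage.record("staging.orders", ["raw.orders_feed"])
lineage.record("staging.customers", ["raw.crm_export"])
lineage.record("mart.revenue", ["staging.orders", "staging.customers"])

sources = lineage.upstream("mart.revenue")
```

With consumers preparing their own data, a record like this lets two conflicting reports be traced back to the feeds and transformations behind each.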

Page 11: Enterprise Data Warehouse Optimization: 7 Keys to Success

Modernization: Evolving the Hybrid EDW

• Conventional RDBMS-based data warehouses have served organizations well, but are being eclipsed by newer technologies
• Scalable systems built on commodity components are rapidly being adopted for business intelligence and analytics applications
• Optimize the EDW using an evolutionary approach to embracing Hadoop:
  – Expand the storage footprint
  – Increase computational power
  – Broaden the scope of application support
  – Lower costs

Page 12: Enterprise Data Warehouse Optimization: 7 Keys to Success

Questions & Suggestions

• www.knowledge-integrity.com
• www.dataqualitybook.com
• www.decisionworx.com
• If you have questions, comments, or suggestions, please contact me:
  David Loshin
  301-754-6350
  loshin@knowledge-integrity.com

Page 13: Enterprise Data Warehouse Optimization: 7 Keys to Success

The Next Gen EDW is the Big Data Warehouse

→ In Forrester’s 2016 global survey, 59% of respondents stated that leveraging big data and analytics was a critical or high priority.

Page 14: Enterprise Data Warehouse Optimization: 7 Keys to Success

Companies Are Looking to Big Data for EDW Optimization

→ 82% of 2550+ respondents are looking to Big Data for EDW Optimization rather than a straight replacement. (2016 Big Data Maturity Survey)

Page 15: Enterprise Data Warehouse Optimization: 7 Keys to Success

Hortonworks Connected Data Platforms and Solutions

[Diagram:
Hortonworks Solutions: Enterprise Data Warehouse Optimization; Cyber Security and Threat Management; Internet of Things and Streaming Analytics
Hortonworks Connection: Subscription Support; SmartSense; Premier Support; Educational Services; Professional Services; Community Connection
Cloud: Hortonworks Data Cloud; AWS; HDInsight
Data Center: Hortonworks Data Suite; HDF; HDP]

Page 16: Enterprise Data Warehouse Optimization: 7 Keys to Success

Drivers of a Modern BI Infrastructure

Deeper and Broader Data Sets

Complete Data ‘Provenance’

Leading Analytics and Tools

Integrate non-EDW data and EDW data

Total Cost of Ownership

Page 17: Enterprise Data Warehouse Optimization: 7 Keys to Success

Open Source Transformational Impact to EDW

Unmatched Economics: support low cost data-center and cloud architectures for Enterprise Apache Hadoop

Eliminates Risk and Ensures Integration: prevents vendor lock-in and speeds ecosystem adoption of ODPi-compliant core

[Chart labels: Cost Efficiency; Data Variety; EDW; Proprietary Hadoop; Hortonworks Open Source; RDBMS]

Page 18: Enterprise Data Warehouse Optimization: 7 Keys to Success

But, why aren’t more companies running to this solution?

Risky

Hadoop requires a bunch of new skill sets

It’ll take a long time

There’s too much manual coding required

It’s hard to integrate to my BI tool stack

Page 19: Enterprise Data Warehouse Optimization: 7 Keys to Success

Legacy EDW Solution

Page 20: Enterprise Data Warehouse Optimization: 7 Keys to Success

Using Hadoop to Optimize the Data Warehouse

→ Augment EDW with Hive

→ Offload ETL to Hadoop

→ Data Governance

Page 21: Enterprise Data Warehouse Optimization: 7 Keys to Success

Augment current EDW with Hive

Hive LLAP GA: Interactive query in seconds, 10X fast join performance

Ease of Use and Adoption: SQL Standard ACID Merge

Enterprise Readiness: Supports all TPC-DS Queries

Streamlined Operations: Hive Views

Page 22: Enterprise Data Warehouse Optimization: 7 Keys to Success

Hive 2 with LLAP: 26x Performance Boost at 1TB Scale

Hive 2 with LLAP averages 26x faster than Hive 1

[Chart: Hive 1/Tez Time (s), Hive 2/LLAP Time (s), and Speedup (x Factor); axes: Query Time (s) (Lower is Better) and Speedup (x Factor)]

Page 23: Enterprise Data Warehouse Optimization: 7 Keys to Success

Hive LLAP in HDP 2.6: Stable Performance with High Concurrency

Concurrent Queries    Average Runtime
5                     7.76s
25                    36.24s
100                   102.89s

From 5 to 25 concurrent queries: 5x Queries, 4.6x Runtime Difference

From 25 to 100 concurrent queries: 4x Queries, 2.8x Runtime Difference
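The ratios quoted on this slide can be rechecked from the three table rows. The sketch below uses only the figures shown on the slide. The `scaling` helper is a hypothetical name introduced for the calculation.

```python
# Figures from the slide: concurrent queries -> average runtime in seconds.
runtimes = {5: 7.76, 25: 36.24, 100: 102.89}

def scaling(low, high):
    """Return (query multiple, runtime multiple) between two concurrency levels."""
    return high / low, runtimes[high] / runtimes[low]

q1, r1 = scaling(5, 25)    # 5x the queries
q2, r2 = scaling(25, 100)  # 4x the queries

print(f"{q1:.0f}x queries -> {r1:.2f}x runtime")
print(f"{q2:.0f}x queries -> {r2:.2f}x runtime")
```

The runtime multiples come out at roughly 4.67 and 2.84, consistent with the slide's 4.6x and 2.8x. In both steps runtime grows more slowly than the number of concurrent queries, which is the point of the "stable performance" claim.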

Page 24: Enterprise Data Warehouse Optimization: 7 Keys to Success

Offload ETL to Hadoop

→ The Problem:
  – EDWs can consume between 50% and 90% of resources just on ETL/ELT tasks.
  – These jobs interfere with more business-critical tasks like BI and advanced analytics.
→ The Solution:
  – Hive and HDP deliver ETL that scales to petabytes.
  – Economical scale-out processing on commodity servers.
→ The Result:
  – Better SLAs for mission-critical analytics.
  – Limit EDW expansion or retire old systems.

[Diagram labels: ETL/ELT; Data Mart; Data Landing & Deep Archive; Cube Mart; End User Applications; End Users and Apps]
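Why ETL scales out so readily can be seen in a small sketch: each partition of the extract is transformed independently of the others. The sample rows and the `transform` function are hypothetical. A local thread pool stands in for cluster workers purely to show the structure, not to demonstrate any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(partition):
    """Clean one partition: drop rows without an amount, normalize fields."""
    cleaned = []
    for row in partition:
        if row.get("amount") is None:
            continue
        cleaned.append({
            "customer": row["customer"].strip().upper(),
            "amount_cents": round(float(row["amount"]) * 100),
        })
    return cleaned

# Hypothetical extract, already split into independent partitions.
partitions = [
    [{"customer": " acme ", "amount": "12.50"}, {"customer": "zen", "amount": None}],
    [{"customer": "Initech", "amount": "3.10"}],
    [{"customer": "globex ", "amount": "100"}],
]

# No partition depends on another, so the work can be spread across workers.
# map() returns results in the same order as the input partitions.
with ThreadPoolExecutor(max_workers=3) as pool:
    transformed = list(pool.map(transform, partitions))

# Load step: concatenate the transformed partitions.
loaded = [row for part in transformed for row in part]
```

Because the transform carries no shared state, adding workers (or nodes) shortens the job without changing the code, and the workers can be released once the load completes.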

Page 25: Enterprise Data Warehouse Optimization: 7 Keys to Success

Data Governance for EDW Optimization

Industry First: Dynamic Tag-based Security Policies

[Diagram:
Ranger: Manage Access Policies and Audit Logs; Policies (Classification, Prohibition, Time, Location); PDP Resource Cache
Atlas: Track Metadata and Lineage; Metastore (Tags, Assets, Entities)
Atlas Client: Subscribers to Topic; Gets Metadata Updates
Entities in Data Lake: Streams; Pipelines; Feeds; Hive Tables; HDFS Files; HBase Tables]
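The idea behind tag-based policies can be sketched without reference to any product API. The example below is not Ranger or Atlas code. The asset names, tags, groups, and the `is_allowed` function are all hypothetical. It shows a classification-style rule and a time-bound rule, both attached to tags rather than to individual tables or files.

```python
from datetime import date

# Hypothetical catalog: asset name -> set of tags attached to it.
ASSET_TAGS = {
    "hive.sales.customers": {"PII"},
    "hive.sales.orders": set(),
    "hdfs./landing/trial_feed": {"EXPIRES_2017"},
}

# Hypothetical policies keyed by tag rather than by individual asset.
POLICIES = {
    "PII": {"allowed_groups": {"compliance"}},
    "EXPIRES_2017": {"allowed_groups": {"analysts"}, "valid_until": date(2017, 12, 31)},
}

def is_allowed(asset, group, today):
    """Allow access only if every tag on the asset permits the group today."""
    for tag in ASSET_TAGS.get(asset, set()):
        policy = POLICIES.get(tag)
        if policy is None:
            continue  # this tag carries no access policy
        if group not in policy["allowed_groups"]:
            return False
        valid_until = policy.get("valid_until")
        if valid_until is not None and today > valid_until:
            return False
    return True

pii_for_analysts = is_allowed("hive.sales.customers", "analysts", date(2017, 6, 1))
pii_for_compliance = is_allowed("hive.sales.customers", "compliance", date(2017, 6, 1))
trial_after_expiry = is_allowed("hdfs./landing/trial_feed", "analysts", date(2018, 1, 1))
```

In this sketch, tagging a new table as `PII` is enough for the existing rule to apply to it. No per-table policy has to be written, which is what makes the approach dynamic. Untagged assets are allowed here by default, a simplification a real deployment would likely not make.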

Page 26: Enterprise Data Warehouse Optimization: 7 Keys to Success

Use Case 1: Multi-Channel Behavioral Analysis

→ Industry: Mass Media
  – Largest broadcasting and cable company in the world by revenue
  – Multiple channels: Cable (set-top-box), wireless devices, streaming programming
  – 22 million+ subscribers (internet & video)
→ Results:
  – Scalability: 480B rows, 500 nodes
  – 60x query performance improvement
  – Insights: New info improves negotiations
  – Loyalty: Outreach to customers viewing competitive streams; ▼ churn ▲ revenue

[Diagram labels (Before/After), Leading Media Company: Channel Feeds; Hortonworks HDP; Netezza Data Mart; AtScale Intelligence Server; Tableau + MS Excel + R; Tableau + MS Excel]

Page 27: Enterprise Data Warehouse Optimization: 7 Keys to Success

Use Case 2: Campaign Paid-Search Effectiveness

→ Industry: Retail/eCommerce
  – Top US department store (by rev)
  – Online sales $4B+ & growing (11%+ total)
  – 800+ department stores nationwide
→ Results
  – Scale: Millions paid keywords analyzed
  – Speed: Eliminate extract step
  – Insight: Operationalized closed-loop analysis → insight → decision → action
  – Impact: Make and save $millions w/ instant bid decisions over 6-week season → that drives 60% annual revenue

[Diagram labels (Before/After), Leading Retailer: Ad & Paid Keywords; Hortonworks HDP; Vertica Data Marts; AtScale Intelligence Server; Cognos + Tableau + Excel; Tableau + Excel]

Page 28: Enterprise Data Warehouse Optimization: 7 Keys to Success

Use Case 3: Client and Patient Analysis

→ Industry: Managed Health Care
  – Member of Fortune 100
  – Health, life + other insurance products
  – ~52 million members; medical/dental/pharm
→ Results
  – Scalable: BI directly on 264+ nodes data
  – Time: Eliminate data movement step
  – 62x query performance improvement
  – Speed: <2.2 second average query time
  – Insight: Tableau on Hadoop for 1000+
  – Security: Access control by user; HIPAA

[Diagram labels (Before/After), Leading Managed Healthcare Provider: Client/Patient Details; Hortonworks HDP; Netezza Data Mart; AtScale Intelligence Server; Tableau + MS Excel]

Page 29: Enterprise Data Warehouse Optimization: 7 Keys to Success

Next Step:

→ Everyone will receive a free copy of Forrester White Paper titled “The Next-Generation EDW Is The Big Data Warehouse”
→ EDW Optimization with HDP
  – http://hortonworks.com/solutions/edw-optimization/
  – EDW Optimization 7 min video

Page 30: Enterprise Data Warehouse Optimization: 7 Keys to Success

Hortonworks Connected Data Platforms and Solutions

[Diagram: same platform and solutions overview as on Page 15]

Page 31: Enterprise Data Warehouse Optimization: 7 Keys to Success

Thank You