Analytics Modernization: Configuring SAS® Grid Manager for Hadoop

Mark Lochbihler, Channels Partner Engineering, Hortonworks
April 21, 2017

© Hortonworks Inc. 2011 – 2016. All Rights Reserved



Presenter: Mark Lochbihler, Hortonworks, Inc.

Mark is a Principal Architect with 27 years of SAS experience, having spent 17 years within Financial Services. He is currently in his fourth year at Hortonworks and is focused on integrating the Hadoop ecosystem with strategic partner ecosystem products and solutions. Mark has a BS in Computer Science from North Carolina State University and also holds a Six Sigma Black Belt.

Hadoop and YARN 101

[Figure 1: A Hadoop Cluster running Batch, Interactive and Real-Time Engines, including SAS Grid Manager. Diagram labels: Sources (Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured); Existing Systems (ERP, CRM, SCM); EDW; MPP; Analytics (Data Marts, Business Analytics, Visualization & Dashboards); Applications; Batch, Interactive, Real-Time and SAS Grid Manager engines on YARN: Data Operating System, over HDFS (Hadoop Distributed File System).]

Agenda

→ Why
→ Architecture
→ Real-world Sizing
→ Configuration Details
→ Demo: User Perspective
→ Migration Considerations
→ Call to Action

Why Move SAS Workloads to Hadoop?

→ Lower infrastructure and storage costs
→ Optimize performance
→ Minimize administrative overhead

Figure 2: SAS Grid Manager for Hadoop Conceptual Architecture (Reference: SAS Grid Manager for Hadoop)

KEY SAS GRID ARCHITECTURE COMPONENTS

→ SAS Metadata Server – A SAS service supporting, among other objects, the logical-to-physical mapping of SAS Logical Servers to YARN. In our examples, we will be using the local SAS Server, SASGrid.

→ SAS Grid Control Server – A SAS service running on the YARN ResourceManager node. It is called by SAS clients to communicate with the YARN ResourceManager to negotiate resources for SAS jobs.

→ SAS Object Spawner – A SAS service running on the YARN ResourceManager node. It is used to launch SAS containers with YARN.

→ SAS Clients – Includes SAS batch jobs, SASGSUB (a batch grid utility), and interactive clients like SAS Enterprise Guide. SAS Clients will “Connect” to and “Disconnect” from a SAS Logical Server, like SASGrid, defined in SAS Metadata and shown in Figure 4.

KEY HADOOP INTEGRATION POINTS FOR SAS GRID

→ YARN ResourceManager – A Hadoop YARN Master Service responsible for controlling global Hadoop cluster resource usage. The ResourceManager enables multi-tenancy and SLAs. It is also responsible for monitoring NodeManager state, submitting ApplicationMaster requests, verifying container launch and monitoring ApplicationMaster state.

→ YARN NodeManager – This Hadoop YARN Worker Node Service manages local resources on behalf of the requesting service. It also tracks node health and communicates status to the ResourceManager.

→ YARN Capacity Scheduler – A Hadoop YARN service, which can be configured to provide job scheduling policies for SLAs, Users, Groups, and Resources.

→ Hadoop DataNodes – Hadoop Distributed File System (HDFS) storage nodes.

→ Kerberos Service – The Hadoop cluster must be Kerberized.

HADOOP MASTER NODE DECISIONS

→ A SAS Grid Control Server and SAS Object Spawner must be deployed on the same Hadoop Master Node as the YARN ResourceManager.

SAS Grid Manager w YARN Architecture Overview

[Figure 3: SAS Grid Manager for Hadoop w YARN Architecture Overview. Diagram labels: SAS Client (SASGSUB, SAS Batch, SAS EG); SAS Metadata Server; SAS Grid Control Server; SAS Object Spawner; YARN ResourceManager; YARN Capacity Scheduler; twelve YARN NodeManagers hosting Job 1 AM 1, Job 1 Containers 1.1, 1.2 and 1.3, SAS AM 2 and SAS Container 2.1.]

HADOOP WORKER NODE DECISIONS

→ SASHOME and SASCONFIG
Each Hadoop Worker Node that is a candidate to run SAS jobs must be configured so that SASHOME and SASCONFIG are available.

→ SASWORK and SASUTIL
It is critical that each Hadoop Worker Node that is a candidate to run SAS jobs is configured correctly. A large part of the I/O required when running SAS analytics goes to the scratch or temporary locations of SASWORK and SASUTIL. To deliver the necessary performance on a heavily loaded system, SAS requires I/O throughput of 100 MB/sec/core for these file systems. Adequate sizing for SASWORK is also necessary.

→ Traditional Storage and Compute Versus Compute Only Worker Nodes
Within Hadoop, it is common practice to have dual-purpose worker nodes, which run math or programs on or near the same nodes where the Hadoop data resides. With Hadoop 2.x, dedicated Compute Only Hadoop Worker Nodes are also an option. Both are options for SAS Grid Manager for Hadoop. Compute Only Hadoop Worker Nodes no longer host the required services and data for HDFS, which leaves more computing resources dedicated to the programs running on these nodes. The tradeoff for Compute Only Hadoop Nodes is the loss of HDFS data locality. Your site's SAS workload requirements will determine which type of Worker Nodes to deploy for SAS Grid Manager for Hadoop.
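The 100 MB/sec/core guideline for SASWORK and SASUTIL can be turned into a per-node target with simple arithmetic. The sketch below is illustrative only: the function name and the core counts are assumptions, not values from the presentation.

```python
# Guideline quoted on the slide: 100 MB/sec of I/O throughput per core
# for the SASWORK and SASUTIL file systems.
MB_PER_SEC_PER_CORE = 100


def required_scratch_throughput(cores_per_node: int) -> int:
    """Return the MB/sec that a node's SASWORK/SASUTIL storage should sustain."""
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    return cores_per_node * MB_PER_SEC_PER_CORE


# Hypothetical core counts, chosen only to illustrate the calculation.
for cores in (8, 16, 32):
    print(f"{cores} cores -> {required_scratch_throughput(cores)} MB/sec")
```

Under these assumptions, a 16-core worker node would need scratch storage that sustains about 1,600 MB/sec.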

Figure 4: View of SAS Management Console, with expanded SASGrid Logical Server.

REAL WORLD CONFIGURATION EXAMPLE

| Total RAM Per Cluster Node | Available Container RAM Per Node | # Worker Nodes in Cluster | Total Container RAM Available | Amount of Cluster RAM Allocated to SAS Queue | Total Container RAM for SAS Queue |
| 256 GB | 192 GB | 28 | 5.376 TB | 50% | 2.688 TB |

Table 1: Total Cluster YARN Container RAM Available for SAS Users

| Average # of Batch Jobs or Interactive Sessions per SAS User | # Containers Per Job or Session | Average # of Additional Hadoop Containers Spawned from Initial SAS Job Container | Average Total # of Containers per SAS User |
| 2 | 2 | 4 | 8 |

Table 2: Average Number of Containers per SAS User
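The figures in Tables 1 and 2 can be reproduced with a few lines of arithmetic. This is a sketch, not the presenter's method. It assumes decimal units (1 TB = 1000 GB), which is what makes 192 GB x 28 nodes come out as 5.376 TB. The formula for containers per user is one reading of Table 2 that yields the stated total of 8, and all variable names are illustrative.

```python
# Inputs from Table 1.
container_ram_per_node_gb = 192
worker_nodes = 28
sas_queue_percent = 50

# Inputs from Table 2.
jobs_per_user = 2
containers_per_job = 2
additional_containers = 4

# Total YARN container RAM across the cluster, and the share given to SAS.
total_container_ram_gb = container_ram_per_node_gb * worker_nodes
sas_queue_ram_gb = total_container_ram_gb * sas_queue_percent // 100

# Assumed reading of Table 2: the initial job containers plus the
# additional Hadoop containers they spawn.
containers_per_user = jobs_per_user * containers_per_job + additional_containers

print(f"Total container RAM: {total_container_ram_gb / 1000} TB")
print(f"SAS queue RAM:       {sas_queue_ram_gb / 1000} TB")
print(f"Containers per user: {containers_per_user}")
```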

REAL WORLD CONFIGURATION EXAMPLE (Continued)

| SAS App Type (User Type) | Container Size | Anticipated % of User Type on Server | Available Cluster Memory for SAS Jobs | Max # of Containers | Average # Containers Per SAS User | Total # of SAS Users |
| Low (General/Analyst) | 2 GB | 70% | 1.881 TB | 940 | 8 | 117 |
| Medium | 4 GB | 20% | 537 GB | 134 | 8 | 16 |
| High | 8 GB | 10% | 268 GB | 33 | 8 | 4 |
| Totals | | | 2.688 TB | 1107 | | 134 |

Table 3: Breakdown of SAS Application Types to be configured for SAS Users
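The per-type rows of Table 3 appear to follow from the 2.688 TB SAS queue by rounding down at each step. The sketch below assumes that floor-based method and decimal units (1 TB = 1000 GB). The helper name is illustrative. The per-type user counts it produces (117, 16, 4) match the table and sum to 137, whereas the Totals row on the slide lists 134.

```python
SAS_QUEUE_RAM_GB = 2688      # Table 1: 2.688 TB allocated to the SAS queue
CONTAINERS_PER_USER = 8      # Table 2: average containers per SAS user

# (application type, container size in GB, anticipated % of users)
APP_TYPES = [("Low", 2, 70), ("Medium", 4, 20), ("High", 8, 10)]


def size_app_type(container_gb: int, percent: int) -> tuple[int, int, int]:
    """Return (memory in GB, max containers, supported users), rounding down."""
    memory_gb = SAS_QUEUE_RAM_GB * percent // 100
    max_containers = memory_gb // container_gb
    users = max_containers // CONTAINERS_PER_USER
    return memory_gb, max_containers, users


rows = {name: size_app_type(gb, pct) for name, gb, pct in APP_TYPES}
for name, (memory_gb, containers, users) in rows.items():
    print(f"{name:<6} {memory_gb:>5} GB {containers:>5} containers {users:>4} users")

total_containers = sum(row[1] for row in rows.values())
total_users = sum(row[2] for row in rows.values())
print(f"Totals {total_containers:>14} containers {total_users:>4} users")
```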

REAL WORLD CONFIGURATION EXAMPLE (Continued)

YARN Capacity Scheduler Example: 50% of Cluster RAM allocated to SAS Queue

[Figure 5: YARN Capacity Scheduler Logical View - SAS Queue - 50% Hadoop Cluster RAM. Diagram labels: ResourceManager; Scheduler; Capacity Scheduler; Hierarchical Queues; root; Adhoc 30%; SAS 50%; Mrkting 20%; Dev 10%; Reserved 20%; Prod 70%; Prod 80%; Dev 20%; P0 70%; P1 30%.]
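A split like the one in Figure 5 is normally expressed through Capacity Scheduler properties of the form `yarn.scheduler.capacity.<queue-path>.capacity`, where the capacities of sibling queues add up to 100. The sketch below generates such properties for the top level only. It assumes that Adhoc, SAS and Mrkting are the three children of root (their shares sum to 100). The queue name `sas94_queue` is taken from the SAS grid policy file, while `adhoc` and `mrkting` are hypothetical spellings of the figure labels.

```python
# Assumed top-level queues and their percentage of cluster capacity.
TOP_LEVEL_QUEUES = {"adhoc": 30, "sas94_queue": 50, "mrkting": 20}


def capacity_scheduler_properties(parent: str, queues: dict[str, int]) -> dict[str, str]:
    """Build capacity-scheduler property names and values for sibling queues."""
    total = sum(queues.values())
    if total != 100:
        raise ValueError(f"sibling queue capacities must sum to 100, got {total}")
    props = {f"yarn.scheduler.capacity.{parent}.queues": ",".join(queues)}
    for name, percent in queues.items():
        props[f"yarn.scheduler.capacity.{parent}.{name}.capacity"] = str(percent)
    return props


props = capacity_scheduler_properties("root", TOP_LEVEL_QUEUES)
for key, value in props.items():
    print(f"{key} = {value}")
```

Validating the sum before emitting properties catches the most common mistake when rebalancing queues: changing one share without adjusting its siblings.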

REAL WORLD CONFIGURATION EXAMPLE (Continued)

Figure 6: YARN Capacity Scheduler Admin View - SAS Queue - 50% Hadoop Cluster RAM

REAL WORLD CONFIGURATION EXAMPLE (Continued)

(SAS Grid Policy File - Grid Application Type “Low” section)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<GridPolicy defaultAppType="low">
  <GridApplicationType name="low">
    <jobname>SASLow</jobname>
    <priority>10</priority>
    <nice>0</nice>
    <memory>2048</memory>
    <vcores>1</vcores>
    <runlimit>480</runlimit>
    <queue>sas94_queue</queue>
    <hosts>
      <hostGroup>sas94_work</hostGroup>
    </hosts>
  </GridApplicationType>

…………. Continued in paper ………….
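One way to sanity-check a policy section against the sizing tables is to parse it and compare the values. The sketch below uses Python's standard `xml.etree.ElementTree`. The root element is closed here only so that the fragment parses; the real file continues in the paper. The helper name is illustrative, and reading `<memory>` as megabytes is an assumption based on the 2 GB container size listed for the Low type in Table 3.

```python
import xml.etree.ElementTree as ET

# "low" section copied from the slide; </GridPolicy> added only for parsing.
POLICY_FRAGMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<GridPolicy defaultAppType="low">
  <GridApplicationType name="low">
    <jobname>SASLow</jobname>
    <priority>10</priority>
    <nice>0</nice>
    <memory>2048</memory>
    <vcores>1</vcores>
    <runlimit>480</runlimit>
    <queue>sas94_queue</queue>
    <hosts>
      <hostGroup>sas94_work</hostGroup>
    </hosts>
  </GridApplicationType>
</GridPolicy>"""


def read_app_type(root: ET.Element, name: str) -> dict[str, str]:
    """Return the simple settings of one GridApplicationType as a dict."""
    for app in root.findall("GridApplicationType"):
        if app.get("name") == name:
            # Keep leaf elements only; <hosts> is handled separately.
            settings = {c.tag: (c.text or "").strip() for c in app if len(c) == 0}
            settings["hostGroup"] = app.findtext("hosts/hostGroup", default="")
            return settings
    raise KeyError(name)


root = ET.fromstring(POLICY_FRAGMENT.encode("utf-8"))
low = read_app_type(root, root.get("defaultAppType"))
memory_gb = int(low["memory"]) // 1024  # assumes <memory> is in MB

print(f"queue={low['queue']} vcores={low['vcores']} memory={memory_gb} GB")
```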

REAL WORLD CONFIGURATION EXAMPLE (Continued)

Figure 7: Configuring SAS Metadata Groups to SAS Grid Application Types

A DAY IN THE LIFE OF A SAS USER LEVERAGING SAS GRID

DEMO

SAS WORKLOAD MIGRATION CONSIDERATIONS

→ Complement your existing SAS infrastructure; it's not a forklift migration

→ Identify SAS Storage Cost Saving Opportunities
  • libname to hive
  • libname to hdfs
  • filename to hdfs

→ Identify SAS Workload Compute Migration Opportunities
  • SAS jobs that will be using large data sets stored in Hadoop are ideal candidates
  • SAS jobs that would benefit from SAS In-Database push-down to Hive

Call to Action

→ Learn More: Click the attachments tab for links
→ Get Started: Download the Hortonworks Sandbox; click Get Started on hortonworks.com
→ Contact Us: Call or fill out the contact form on Get Started