Hadoop Infrastructure @Uber Past , Present and Future

Download Hadoop Infrastructure @Uber Past , Present and Future

Post on 31-Dec-2016

214 views

Category:

Documents

2 download

Embed Size (px)

TRANSCRIPT

<ul><li><p>U B E R | Data </p><p>HadoopInfrastructure@UberPast,PresentandFutureMayankBansal</p></li><li><p>U B E R | Data </p><p>Transporta=onasreliableasrunningwater,everywhere,foreveryone</p><p>UbersMission</p><p>75+Countries 500+Ci=es</p><p>Andgrowing</p></li><li><p>U B E R | Data </p><p>HowUberworks</p></li><li><p>U B E R | Data </p><p>HowUberworks</p></li><li><p>U B E R | Data </p><p>HowUberworks</p></li><li><p>U B E R | Data </p><p>DataDrivenDecisions</p></li><li><p>U B E R | Data </p><p>DataInfraOnceUpona8me..(2014)</p><p>Kafka Logs </p><p>Key-Val DB </p><p>RDBMS DBs </p><p>S3 </p><p>Applica=ons</p><p>ETL </p><p>BusinessOps</p><p>A/BExperiments</p><p>Adhoc Analytics </p><p>CityOps</p><p>Vertica DataWarehouse</p><p> Data</p><p>Science</p><p>EMR</p></li><li><p>U B E R | Data </p><p>DataInfrastructureToday</p><p>Kafka8 Logs </p><p>Schemaless DB </p><p>SOA DBs </p><p>Service Accounts </p><p>ETL MachineLearning</p><p>Experimenta=on</p><p>Data Science </p><p>Adhoc Analytics Ops/DataScience</p><p>HDFS </p><p>CityOpsDataScience</p><p>Spark|PrestoHive</p></li><li><p>FewTakeaways</p><p> StrictSchemaManagement BecauseourlargestdataaudienceareSQL</p><p>Savvy!(1000sofUberOps!) SQL=StrictSchema</p><p> BigDataProcessingToolsUnlocked-Hive,PrestoandSpark MigrateSQLsavvyusersfromVer=catoHive</p><p>&amp;Presto(1000sofOps&amp;100sofdatascien=sts&amp;analysts)</p><p> Sparkformoreadvancedusers-100sofdatascien=sts</p></li><li><p>U B E R | Data </p><p>HadoopEvolu8on@ebay</p><p>2014</p><p>1XNodes1XPB</p><p>2015 10X Nodes 4X PB Data 3000+ node 30,000+ cores 50+ PB </p><p>2016 90X Nodes </p><p> 40X PB Data </p><p>HadoopEvolu8on@Uber</p></li><li><p>U B E R | Data </p><p>HadoopClusterU=liza=on</p><p> Overprovisioningforthepeakloads.</p><p> Overcapacityforan=cipa=onoffuturegrowth</p></li><li><p>U B E R | Data </p><p>HadoopEvolu8on@ebay</p><p>20140Nodes</p><p>2015 X Nodes </p><p>2016</p><p> 300XNodes</p><p>MesosEvolu8on@Uber</p></li><li><p>U B E R | Data </p><p>MesosClusterU=liza=on</p><p> Overprovisioningforthepeakloads</p><p> Overcapacityforan=cipa=onoffuturegrowth</p></li><li><p>U B E R | Data </p><p>EndGoal</p><p>Online</p><p>Presto</p></li><li><p>U B E R | Data </p><p>Whatweneed?</p><p>GLOBALVIEWOFRESOURCES</p></li><li><p>U B E R | Data </p><p>AvailableResourceManagers</p></li><li><p>U B E R | Data </p><p>MesosvsYARN</p><p>YARN MESOSSingleLevelScheduler TwoLevelSchedulerUseCgroupsforisola=on UseCgroupsforIsola=onCPU,Memoryasaresource CPU,MemoryandDiskasa</p><p>resource</p><p>WorkswellwithHadoopworkloads Workswellwithlongerrunningservices</p><p>YARNsupport=mebasedreserva=ons</p><p>Mesosdoesnothavesupportofreserva=ons</p><p>Dominantresourcescheduling Schedulingisdonebyframeworksanddependsoncasetocasebasis</p><p>ScalesBegerSimilarIsola=on</p><p>Diskisbeger</p><p>ThisisImportant</p><p>ImpforbatchSLAsBegerforbatch</p></li><li><p>U B E R | Data </p><p>Lets8edthemtogether</p><p>YARNisgoodforHadoopMesosisgoodforLongerRunningServices</p><p>InaNutshell</p></li><li><p>U B E R | Data </p></li><li><p>U B E R | Data </p><p> MyriadisMesosFrameworkforApacheYARN</p><p> MesosmanagesDataCenterresources YARNmanagesHadoopworkloads</p><p> Myriad GetsresourcesfromMesos LaunchesNodeManagers</p></li><li><p>U B E R | Data </p><p> YARNwillhandleresourceshanded</p><p>overtoit. Mesoswillworkonrestoftheresources</p><p>MyriadsLimita8onsSta=cResourcePar==oning</p></li><li><p>U B E R | Data </p><p> YARNwillneverbeabletodooversubscrip=on. NodeManagerwillgoaway Fragmenta=onofresources</p><p> Mesosoversubscrip=oncankillYARNtoo</p><p>MyriadsLimita8onsResourceOverSubscrip=on</p></li><li><p>U B E R | Data </p><p> NoGlobalQuotaEnforcement</p><p> NoGlobalPriori=es</p><p>MyriadsLimita8ons</p></li><li><p>U B E R | Data </p><p> Elas=cResourceManagement</p><p> BinPacking Stability</p><p> LongList</p><p>MyriadsLimita8ons</p></li><li><p>U B E R | Data </p><p>UnifiedScheduler</p></li><li><p>U B E R | Data </p><p>HighLevelCharacteris8cs</p><p> GlobalQuotaManagement</p><p> CentralSchedulingpolicies</p><p> Oversubscrip=onforbothOnlineandBatch</p><p> Isola=onandbinpacking</p><p> SLAguaranteesatGlobalLevel</p></li><li><p>U B E R | Data </p><p>UnifiedScheduler</p></li><li><p>U B E R | Data </p><p>FewTakeaways</p><p> Weneedoneschedulinglayeracrossallworkloads</p><p> Par==oningresourcesarenotgood Atleastcansave30%resources</p><p> StabilityandsimplicitywinsinProduc=on Mul=LevelofresourceManagementandschedulingwillnotbescalable</p></li><li><p>U B E R | Data </p></li><li><p>U B E R | Data </p><p>Ques=ons?</p><p>mabansal@uber.commayank@apache.org</p></li><li><p>U B E R | Data </p><p>ThankYou!!!</p></li></ul>

Recommended

View more >