design patterns leveraging spark in pdi - pentahoworld … · 2017-11-06 · design patterns...

Design Patterns Leveraging Spark in PDI Chris Skirde Pentaho Director of Sales Engineering, Hitachi Vantara Rakesh Saha Pentaho Senior Product Manager, Hitachi Vantara

Post on 23-Jun-2018

Quiz Time!

• What is Spark?
  A. A good way to start a fire.
  B. Necessary for a well-running internal combustion engine.
  C. Fast and general-purpose engine for large-scale data processing.
  D. All of the above.

• True or False: Pentaho supports Spark?

• Who is using Spark today (with or without Pentaho)?

Agenda

• Introduction to Spark

• Common design patterns

• How to leverage Spark with Pentaho

Introduction to Spark

• Why are we interested?

• What is it really?

• What’s been done?

Spark Application Architecture

[Diagram; surviving labels: Daemon, PDI/Server]

What Do Those Applications Have in Common?

Common Design Patterns

• Filter/Organize

• Join

• Sum

• Transform/Enrich

• Query

• Machine Learning/Data Science
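The first three patterns in this list (filter, join, sum) can be sketched on a handful of in-memory records. This is plain Python, not Spark or PDI code, and is only meant to show the row-level logic that an engine such as Spark would distribute across a cluster. The sample data, field names, and function names are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration only.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0},
    {"order_id": 2, "customer_id": "c2", "amount": 35.5},
    {"order_id": 3, "customer_id": "c1", "amount": 80.0},
    {"order_id": 4, "customer_id": "c3", "amount": 15.0},
]
customer_regions = {"c1": "EMEA", "c2": "APAC"}  # customer_id -> region


def filter_rows(rows, min_amount):
    """Filter: keep only rows whose amount is at least min_amount."""
    return [row for row in rows if row["amount"] >= min_amount]


def join_region(rows, regions):
    """Join: inner-join rows to the region lookup on customer_id."""
    return [
        {**row, "region": regions[row["customer_id"]]}
        for row in rows
        if row["customer_id"] in regions
    ]


def sum_by(rows, key, value):
    """Sum: total the value field for each distinct key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


large_orders = filter_rows(orders, 30.0)
enriched = join_region(large_orders, customer_regions)
totals = sum_by(enriched, "region", "amount")
print(totals)
```

Each function takes rows in and returns rows (or totals) out, so the three steps chain the same way a sequence of filter, join, and aggregation steps would in a transformation.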

Filter/Organize

Join

Sum (and Other Aggregations)

Transform/Enrich

• Any step you like!

Query – Easy!

• Cloudera: use Hive-on-Spark with Hive2

• Hortonworks: use Spark SQL via Simba

Machine Learning/Data Science

Recap

What we covered today:

• Reviewed what Spark is and why organizations are adopting it

• Discussed several common data integration design patterns

• Linked those design patterns to Pentaho features for you to try

Questions?

Next Steps

Want to learn more?

• “Meet the Experts”: Matt Casters and Mark Hall!

• Adaptive Execution Layer: http://www.pentaho.com/blog/introducing-adaptive-execution-layer-spark-architecture

• SQL on Spark: http://www.pentaho.com/blog/operationalize-spark-big-data-newest-enhancements
