Profiling Spark Applications - Sched: schd.ws/.../87/Apache-Big-Data-2017-Spark-Profiling.pdf

© 2016, Conversant, LLC. All rights reserved.

PROFILING SPARK APPLICATIONS

APACHE BIG DATA NORTH AMERICA 2017

PRESENTED BY: JAYESH THAKRAR, SENIOR SOFTWARE ENGINEER



The Quest For Spark Profiling...


IT ALL BEGAN WITH THE SPARK WEB UI...


THE MISSING SUMMARY...


SUMMARY FOR SPARK APPS? BUT SPARK APPS ARE A MIXED BAG...

• Simple batch to complex ETL and highly iterative ML

• Conditional data flows

• Multiple inputs, multiple outputs

• Executors, jobs, stages, tasks


BUT STILL NEEDED FOR OPERATIONS, TUNING AND TROUBLESHOOTING

• Why did my app take 2x normal time last night?

• Impact of JVM tuning and other changes?

• Impact of input/output formats?

• Impact of configuration or other changes?


BTW, WHAT'S PROFILING?

Source: https://en.wikipedia.org/wiki/Profiling_(computer_programming)


Introducing... Your Events


WHAT DRIVES SPARK UI AND HISTORY SERVER?


SPARK EVENTS

[Diagram labels: Task, Task, Driver + Listener, Event Bus]

Configuration Parameters

spark.eventLog.enabled = true

spark.eventLog.dir = hdfs://<dir>

Sample Events

{"Event":"SparkListenerLogStart","Spark Version":"1.6.0"}

{"Event":"SparkListenerBlockManagerAdded","Block Manager ID":{"Executor ID":"driver","Host":"10.110.104.43","Port":33287},"Maximum Memory":556038881,"Timestamp":1481061154984}
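With event logging enabled, the file under spark.eventLog.dir holds one JSON object per line. A minimal sketch of reading such a log and counting event types follows. It assumes the space-separated key names Spark writes (for example "Spark Version"), and the helper names read_events and count_event_types are hypothetical.

```python
import json
from collections import Counter

# Two sample events in the JSON-lines layout of a Spark event log.
# Key names with spaces ("Spark Version", "Block Manager ID") are assumed.
SAMPLE_LOG = """\
{"Event":"SparkListenerLogStart","Spark Version":"1.6.0"}
{"Event":"SparkListenerBlockManagerAdded","Block Manager ID":{"Executor ID":"driver","Host":"10.110.104.43","Port":33287},"Maximum Memory":556038881,"Timestamp":1481061154984}
"""


def read_events(text):
    """Yield one dict per non-empty line of an event log."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def count_event_types(events):
    """Count how many times each event type occurs."""
    return Counter(event["Event"] for event in events)


counts = count_event_types(read_events(SAMPLE_LOG))
for name, n in sorted(counts.items()):
    print(name, n)
```

For a real log, the same functions could be fed the contents of the event file instead of SAMPLE_LOG.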


SPARK CONFIGURATION


SPARK EVENTS FRAMEWORK

• Event logging = built-in, a thread in the driver

• Ability to create and register custom listeners

sparkContext.addSparkListener(listener: SparkListenerInterface)

• StatsReportListener = Task summary after each stage

• StatsReportListener = Also available for streaming

• See org.apache.spark.scheduler.SparkListener for details
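The listener idea can be illustrated without Spark. The sketch below is a toy event bus in Python, not the Spark API: ListenerBus, StageTaskSummary, and the payload fields are all hypothetical names. It mimics the pattern of a listener that collects task durations and reports a summary when a stage completes, which is roughly what the slide attributes to StatsReportListener.

```python
from statistics import median


class ListenerBus:
    """Toy stand-in for the driver's event bus (not the Spark API)."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def post(self, event_name, payload):
        # Dispatch to a method named after the event, if the listener has one.
        for listener in self._listeners:
            handler = getattr(listener, event_name, None)
            if handler is not None:
                handler(payload)


class StageTaskSummary:
    """Collects task durations and summarizes them when a stage completes."""

    def __init__(self):
        self._durations = {}  # stage_id -> list of task durations (ms)
        self.summaries = {}   # stage_id -> (count, min, median, max)

    def on_task_end(self, payload):
        self._durations.setdefault(payload["stage_id"], []).append(
            payload["duration_ms"]
        )

    def on_stage_completed(self, payload):
        durations = self._durations.pop(payload["stage_id"], [])
        if durations:
            self.summaries[payload["stage_id"]] = (
                len(durations),
                min(durations),
                median(durations),
                max(durations),
            )


bus = ListenerBus()
summary = StageTaskSummary()
bus.add_listener(summary)
for ms in (120, 80, 400):
    bus.post("on_task_end", {"stage_id": 0, "duration_ms": ms})
bus.post("on_stage_completed", {"stage_id": 0})
print(summary.summaries)
```

In a real application the equivalent listener would be written in Scala or Java against org.apache.spark.scheduler.SparkListener and registered with addSparkListener.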


And That Leads To...


SPARKPROFILER PROJECT ON GITHUB

https://github.com/conversant/spark-profiler


CONVERSANT: MORE TOOLS AND STUFF....

http://engineering.conversantmedia.com/posts/

https://github.com/conversant


SPARKPROFILER PROJECT OVERVIEW

Application Hierarchy

Application

Job

Stage

Task
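One way to rebuild this hierarchy from logged events is sketched below. It is not taken from the sparkprofiler code: build_hierarchy is a hypothetical helper, the events are hand-written, and the field names ("App Name", "Job ID", "Stage IDs", "Stage ID", "Task Info") are assumed from Spark's JSON event format. It also simplifies by assigning each stage to a single job.

```python
from collections import defaultdict

# Hand-written events; field names are assumed from Spark's JSON event format.
events = [
    {"Event": "SparkListenerApplicationStart", "App Name": "demo", "App ID": "app-1"},
    {"Event": "SparkListenerJobStart", "Job ID": 0, "Stage IDs": [0, 1]},
    {"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Info": {"Task ID": 0}},
    {"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Info": {"Task ID": 1}},
    {"Event": "SparkListenerTaskEnd", "Stage ID": 1, "Task Info": {"Task ID": 2}},
]


def build_hierarchy(events):
    """Return (app_name, {job_id: {stage_id: [task_id, ...]}})."""
    app_name = None
    stage_to_job = {}
    tree = defaultdict(lambda: defaultdict(list))
    for event in events:
        kind = event["Event"]
        if kind == "SparkListenerApplicationStart":
            app_name = event["App Name"]
        elif kind == "SparkListenerJobStart":
            # Remember which job each stage belongs to.
            for stage_id in event["Stage IDs"]:
                stage_to_job[stage_id] = event["Job ID"]
        elif kind == "SparkListenerTaskEnd":
            stage_id = event["Stage ID"]
            job_id = stage_to_job.get(stage_id)  # None if the job was not seen
            tree[job_id][stage_id].append(event["Task Info"]["Task ID"])
    # Convert nested defaultdicts to plain dicts for printing.
    return app_name, {job: dict(stages) for job, stages in tree.items()}


app_name, hierarchy = build_hierarchy(events)
print(app_name, hierarchy)
```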


SPARKPROFILER PROJECT: PARSER


SPARKPROFILER PROJECT: PROFILER


SPARKPROFILER PROJECT: SUMMARYGENERATOR


KEY ENTITY IN SPARKPROFILER: TASK

Metrics

• taskDuration

• peakMemory

• inputRows

• outputRows

• resultSize

• bytesRead

• recordsRead

• shuffleBytesWritten

• shuffleRecordsWritten

• remoteBlocksFetched

• localBlocksFetched

• remoteBytesRead

• localBytesRead

• totalRecordsRead

Identifying Attributes

• ApplicationName

• ApplicationId

• JobId

• StageId

• StageAttemptId

• TaskId

• AttemptId
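A task record of this shape could be filled from a SparkListenerTaskEnd event roughly as follows. This is a sketch under assumptions, not the sparkprofiler implementation: TaskRecord and task_record are hypothetical names, only a subset of the listed metrics is covered, the event field names are assumed from Spark's JSON event format, and taskDuration is assumed to be finish time minus launch time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    # Identifying attributes
    stage_id: int
    stage_attempt_id: int
    task_id: int
    attempt_id: int
    # Metrics (a subset of those on the slide)
    task_duration: int
    result_size: int
    bytes_read: int
    records_read: int
    shuffle_bytes_written: int


def task_record(event):
    """Flatten one task-end event into a TaskRecord; missing metrics become 0."""
    info = event["Task Info"]
    metrics = event.get("Task Metrics") or {}
    input_metrics = metrics.get("Input Metrics") or {}
    shuffle_write = metrics.get("Shuffle Write Metrics") or {}
    return TaskRecord(
        stage_id=event["Stage ID"],
        stage_attempt_id=event["Stage Attempt ID"],
        task_id=info["Task ID"],
        attempt_id=info["Attempt"],
        # Assumption: duration is wall-clock time between launch and finish.
        task_duration=info["Finish Time"] - info["Launch Time"],
        result_size=metrics.get("Result Size", 0),
        bytes_read=input_metrics.get("Bytes Read", 0),
        records_read=input_metrics.get("Records Read", 0),
        shuffle_bytes_written=shuffle_write.get("Shuffle Bytes Written", 0),
    )


# Hand-written sample event with assumed field names.
sample_event = {
    "Event": "SparkListenerTaskEnd",
    "Stage ID": 3,
    "Stage Attempt ID": 0,
    "Task Info": {
        "Task ID": 42,
        "Attempt": 0,
        "Launch Time": 1481061155000,
        "Finish Time": 1481061156500,
    },
    "Task Metrics": {
        "Result Size": 2048,
        "Input Metrics": {"Bytes Read": 65536, "Records Read": 1000},
        "Shuffle Write Metrics": {"Shuffle Bytes Written": 4096},
    },
}

record = task_record(sample_event)
print(record)
```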


PROFILING OVERVIEW

[Diagram labels: Run Spark application, Events Output File, Summary Generator, Save to Datastore, Interactive Analysis]

{"Event":"SparkListenerLogStart","Spark Version":"1.6.0"}

{"Event":"SparkListenerBlockManagerAdded","Block Manager ID":{"Executor ID":"driver","Host":"10.110.104.43","Port":33287},"Maximum Memory":556038881,"Timestamp":1481061154984}


SUMMARY GENERATOR

• Program to analyze (profile) jobs, stages and tasks and provide summary

• Summary of all metrics across all tasks
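The kind of summary described here can be sketched as a plain aggregation over task records. The summarize function and the sample values below are hypothetical, and the statistics chosen (count, sum, min, max, mean) are an assumption about what a summary might contain rather than the tool's actual output.

```python
from statistics import mean

# Hypothetical per-task metrics, using metric names from the task slide.
tasks = [
    {"taskDuration": 1200, "bytesRead": 4096, "shuffleBytesWritten": 512},
    {"taskDuration": 800, "bytesRead": 2048, "shuffleBytesWritten": 256},
    {"taskDuration": 4000, "bytesRead": 8192, "shuffleBytesWritten": 1024},
]


def summarize(tasks):
    """Return {metric: {count, sum, min, max, mean}} across all tasks."""
    metric_names = sorted({name for task in tasks for name in task})
    summary = {}
    for name in metric_names:
        values = [task[name] for task in tasks if name in task]
        summary[name] = {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
    return summary


summary_by_metric = summarize(tasks)
for name, stats in summary_by_metric.items():
    print(name, stats)
```

A large gap between max and mean for taskDuration is one quick hint of skewed tasks.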


INTERACTIVE ANALYSIS

• Troubleshooting: Which jobs/stages are expensive? Compare time, input/output, shuffle volume, memory, etc. between different jobs and stages of a single run

• Tuning: Compare different runs

  • Impact of changing input/output formats

  • Runtime variations

  • Performance tuning and optimization, e.g. JVM tuning, parallelism, compression
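Comparing two runs can be as simple as totaling task time per stage and looking at the ratio. The sketch below uses made-up (stage id, task duration) pairs, and the function names stage_totals and compare_runs are hypothetical. It shows how a question like "why did my app take 2x normal time last night?" might be narrowed to one stage.

```python
def stage_totals(tasks):
    """Sum task durations (ms) per stage from (stage_id, duration) pairs."""
    totals = {}
    for stage_id, duration in tasks:
        totals[stage_id] = totals.get(stage_id, 0) + duration
    return totals


def compare_runs(baseline, candidate):
    """Return rows of (stage_id, baseline_ms, candidate_ms, ratio)."""
    rows = []
    for stage_id in sorted(set(baseline) | set(candidate)):
        before = baseline.get(stage_id, 0)
        after = candidate.get(stage_id, 0)
        # Ratio is undefined when the stage is absent from the baseline.
        ratio = after / before if before else None
        rows.append((stage_id, before, after, ratio))
    return rows


# Made-up task durations for a normal run and a slow run.
normal_run = [(0, 1000), (0, 1200), (1, 3000)]
slow_run = [(0, 1100), (0, 1100), (1, 9000)]

rows = compare_runs(stage_totals(normal_run), stage_totals(slow_run))
for stage_id, before, after, ratio in rows:
    print(stage_id, before, after, ratio)
```

Here stage 0 is unchanged between runs while stage 1 takes three times as long, so stage 1 is where to look first.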


POTENTIAL FUTURE ENHANCEMENTS

• RDD profiling/analysis

• Dynamic executors

• Handling failed jobs, stages, tasks

• Analysis of streaming jobs

• Automatic advisor for tuning/optimization

• Visualizations, e.g. using Zeppelin

• Save events and summary for

§ Historical analysis

§ Cluster utilization overview

§ Sizing and prediction


Questions?
