Spark Performance
Patrick Wendell, Databricks
About me
- Work on performance benchmarking and testing in Spark
- Co-author of spark-perf
- Wrote instrumentation/UI components in Spark
This talk
- Geared towards existing users
- Current as of Spark 0.8.1
Outline
- Part 1: Spark deep dive
- Part 2: Overview of UI and instrumentation
- Part 3: Common performance mistakes
Why gain a deeper understanding?

Copies all data over the network:

spendPerUser = rdd.groupByKey().map(lambda pair: sum(pair[1])).collect()

Reduces locally before shuffling:

spendPerUser = rdd.reduceByKey(lambda x, y: x + y).collect()

Example RDD of (user, spend) pairs: (patrick, $24), (matei, $30), (patrick, $1), (aaron, $23), (aaron, $2), (reynold, $10), (aaron, $10)…
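The difference can be sketched in plain Python, no Spark required (the two `partitions` below are an invented split of the example data): groupByKey ships every record across the network, while reduceByKey first sums within each partition, so only one record per key per partition is shuffled.

```python
from collections import defaultdict

# Two hypothetical partitions of (user, spend) records.
partitions = [
    [("patrick", 24), ("matei", 30), ("patrick", 1)],
    [("aaron", 23), ("aaron", 2), ("reynold", 10), ("aaron", 10)],
]

# groupByKey-style: every record crosses the network before summing.
shuffled_group = [rec for part in partitions for rec in part]  # 7 records

# reduceByKey-style: sum locally first, then shuffle only the partial sums.
shuffled_reduce = []
for part in partitions:
    local = defaultdict(int)
    for user, spend in part:
        local[user] += spend
    shuffled_reduce.extend(local.items())  # 4 records total

print(len(shuffled_group), len(shuffled_reduce))  # 7 4
```

Both paths produce the same per-user totals; the reduceByKey path just moves fewer records over the network.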
Let’s look under the hood
How Spark works
- RDD: a parallel collection with partitions
- A user application creates RDDs, transforms them, and runs actions
- These result in a DAG of operators
- The DAG is compiled into stages
- Each stage is executed as a series of tasks
Example

sc.textFile("/some-hdfs-data")                  // RDD[String]
  .map(line => line.split("\t"))                // RDD[List[String]]
  .map(parts => (parts[0], int(parts[1])))      // RDD[(String, Int)]
  .reduceByKey(_ + _, 3)                        // RDD[(String, Int)]
  .collect()                                    // Array[(String, Int)]

Operator DAG: textFile → map → map → reduceByKey → collect
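A plain-Python rendering of the same pipeline (the sample `lines` are invented for illustration) makes the type at each step concrete:

```python
from collections import defaultdict

lines = ["a\t1", "b\t2", "a\t3"]               # stands in for the HDFS file

parts = [line.split("\t") for line in lines]   # like RDD[List[String]]
pairs = [(p[0], int(p[1])) for p in parts]     # like RDD[(String, Int)]

summed = defaultdict(int)                      # reduceByKey(_ + _)
for k, v in pairs:
    summed[k] += v

result = sorted(summed.items())                # collect() to the driver
print(result)  # [('a', 4), ('b', 2)]
```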
Execution Graph

textFile → map → map → reduceByKey → collect

The DAG is split at the shuffle into Stage 1 (textFile, map, map) and Stage 2 (reduceByKey, collect).
Execution Graph

Stage 1: read HDFS split → apply both maps → partial reduce → write shuffle data
Stage 2: read shuffle data → final reduce → send result to driver
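Under illustrative assumptions (two input splits, three reduce partitions, hash partitioning), the two stages above can be simulated in plain Python:

```python
from collections import defaultdict

splits = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
NUM_REDUCERS = 3

# Stage 1: per split, partial-reduce locally, then write shuffle data
# bucketed by hash(key) % NUM_REDUCERS.
shuffle_files = [defaultdict(list) for _ in splits]
for i, split in enumerate(splits):
    partial = defaultdict(int)
    for k, v in split:
        partial[k] += v                              # partial reduce
    for k, v in partial.items():
        shuffle_files[i][hash(k) % NUM_REDUCERS].append((k, v))

# Stage 2: each reducer reads its bucket from every map output,
# does the final reduce, and sends its result to the driver.
result = {}
for r in range(NUM_REDUCERS):
    final = defaultdict(int)
    for out in shuffle_files:
        for k, v in out[r]:
            final[k] += v
    result.update(final)

print(sorted(result.items()))  # [('a', 4), ('b', 6), ('c', 5)]
```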
Stage execution
- Create a task for each partition in the new RDD
- Serialize the task
- Schedule and ship the task to slaves

(Stage 1 example: Task 1 through Task 4, one per partition)
Task execution
The fundamental unit of execution in Spark:
A. Fetch input from an InputFormat or a shuffle
B. Execute the task
C. Materialize task output as shuffle data or a driver result

Fetch input → Execute task → Write output (pipelined execution)
Spark Executor
Each executor runs one task pipeline (fetch input → execute task → write output) per core, so tasks on Core 1, Core 2, and Core 3 proceed concurrently.
Summary of components
- Task: fundamental unit of work
- Stage: set of tasks that run in parallel
- DAG: logical graph of RDD operations
- RDD: parallel dataset with partitions
Demo of perf UI
Where can you have problems?
1. Scheduling and launching tasks
2. Execution of tasks
3. Writing data between stages
4. Collecting results
1. Scheduling and launching tasks
Serialized task is large due to a closure

hash_map = some_massive_hash_map()
rdd.map(lambda x: hash_map[x]).countByValue()

Detecting: Spark will warn you! (starting in 0.9…)
Fixing:
- Use a broadcast variable for the large object
- Make your large object into an RDD
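Why the captured object hurts can be seen with stdlib pickle as a rough stand-in for Spark's task serializer (the `"apply_map"` tag and `big_map` are invented for illustration): everything a closure references gets re-serialized and shipped with every task, while a broadcast variable ships only a small handle.

```python
import pickle

big_map = {i: "v%d" % i for i in range(100_000)}

# Closure capture: the task payload carries the whole dict along.
fat_task = pickle.dumps(("apply_map", big_map))

# Broadcast-style: the task carries only a reference, not the data.
slim_task = pickle.dumps(("apply_map", None))

print(len(fat_task), len(slim_task))
```

The fat payload is orders of magnitude larger, and it is paid once per task rather than once per job.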
Large number of "empty" tasks due to a selective filter

rdd = sc.textFile("s3n://bucket/2013-data")
        .map(lambda x: x.split("\t"))
        .filter(lambda parts: parts[0] == "2013-10-17")
        .filter(lambda parts: parts[1] == "19:00")

rdd.map(lambda parts: (parts[2], parts[3])).reduceBy…

Detecting: many short-lived (< 20 ms) tasks
Fixing: use the `coalesce` or `repartition` operator to shrink the RDD's number of partitions after filtering:

rdd.coalesce(30).map(lambda parts: (parts[2]…
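What coalesce buys here can be sketched in plain Python (the 100 partitions and the filter predicate are invented): after a selective filter most partitions are empty, and merging them means each remaining task has real work to do.

```python
# 100 input partitions; a selective filter leaves data in only a few.
partitions = [[("2013-10-17", i)] if i % 25 == 0 else [] for i in range(100)]
filtered = [list(part) for part in partitions]   # the post-filter RDD

empty = sum(1 for p in filtered if not p)        # tasks with nothing to do

# coalesce(4)-style: merge existing partitions into 4, no full shuffle.
coalesced = [sum(filtered[i::4], []) for i in range(4)]

print(empty, len(coalesced))  # 96 4
```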
2. Execution of Tasks
Tasks with high per-record overhead

def write_record(x):
    conn = new_mongo_db_cursor()
    conn.write(str(x))
    conn.close()

rdd.map(write_record)

Detecting: task run time is high
Fixing: use mapPartitions (or mapWith in Scala) to pay the setup cost once per partition:

def write_partition(records):
    conn = new_mongo_db_cursor()
    for x in records:
        conn.write(str(x))
    conn.close()
    return []

rdd.mapPartitions(write_partition)
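The effect can be checked with a mock connection (an invented stand-in for the Mongo cursor) that counts how many times it is opened:

```python
opens = {"count": 0}

class MockConn:
    """Counts connection setup, the per-record cost we want to avoid."""
    def __init__(self):
        opens["count"] += 1
    def write(self, s):
        pass
    def close(self):
        pass

partitions = [[1, 2, 3], [4, 5], [6]]

# map-style: one connection per record.
for part in partitions:
    for x in part:
        conn = MockConn(); conn.write(str(x)); conn.close()
per_record = opens["count"]

# mapPartitions-style: one connection per partition.
opens["count"] = 0
for part in partitions:
    conn = MockConn()
    for x in part:
        conn.write(str(x))
    conn.close()
per_partition = opens["count"]

print(per_record, per_partition)  # 6 3
```

Six records across three partitions means six opens the first way, three the second; with real connections the gap dominates task run time.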
Skew between tasks
Detecting: stage response time dominated by a few slow tasks
Fixing:
- Data skew: poor choice of partition key. Consider a different way of parallelizing the problem; can also use intermediate partial aggregations.
- Worker skew: some executors are on slow/flaky nodes. Set spark.speculation to true, and remove flaky/slow nodes over time.
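One common form of intermediate partial aggregation (not named on the slide, so treat this as one possible reading) is key salting: append a random suffix so a hot key spreads across several tasks, aggregate, then strip the salt and aggregate again. A plain-Python sketch with an invented hot key and salt factor:

```python
import random
from collections import defaultdict

random.seed(0)
records = [("hot", 1)] * 1000 + [("cold", 1)] * 10
SALT = 4

# Phase 1: salt the key so a hot key's records spread over SALT sub-keys,
# each of which could be handled by a different task.
partial = defaultdict(int)
for k, v in records:
    partial[(k, random.randrange(SALT))] += v

# Phase 2: strip the salt and combine the few partials per original key.
final = defaultdict(int)
for (k, _salt), v in partial.items():
    final[k] += v

print(dict(final))  # hot: 1000, cold: 10
```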
3. Writing data between stages
Not having enough buffer cache
Spark writes shuffle data to the OS buffer cache.
Detecting: tasks spend a lot of time writing shuffle data
Fixing: if running large shuffles on large heaps, allow several GB for the buffer cache
Rule of thumb: leave 20% of memory free for the OS and its caches
Not setting spark.local.dir
spark.local.dir is where shuffle files are written; ideally a dedicated disk or set of disks:

spark.local.dir=/mnt1/spark,/mnt2/spark,/mnt3/spark

Mount the drives with noatime and nodiratime.
Not setting the number of reducers
Default behavior: inherits the number of reducers from the parent RDD
Too many reducers: task launching overhead becomes an issue (you will see many small tasks)
Too few reducers: limits parallelism in the cluster
4. Collecting results
Collecting massive result sets

sc.textFile("/big/hdfs/file/").collect()

Fixing:
- If processing the data, push the computation into Spark
- If storing the data, write it directly to parallel storage
Advanced profiling
JVM utilities:
- jstack <pid>: JVM stack trace
- jmap -histo:live <pid>: heap summary
System utilities:
- dstat: I/O and CPU stats
- iostat: disk stats
- lsof -p <pid>: tracks open files
Conclusion
Spark 0.8 provides good tools for monitoring performance.
Understanding Spark's concepts gives you a major advantage in performance debugging.
Questions?