Netflix: Integrating Spark At Petabyte Scale
Ashwin Shankar, Cheolsoo Park
Outline
1. Netflix big data platform
2. Spark @ Netflix
3. Multi-tenancy problems
4. Predicate pushdown
5. S3 file listing
6. S3 insert overwrite
7. Zeppelin, IPython notebooks
8. Use case (Pig vs. Spark)
Netflix Big Data Platform
Netflix data pipeline
(Diagram) Event data (500 bn events/day, ~15 min latency) flows from cloud apps through Suro/Kafka and Ursula into S3; dimension data is extracted daily from Cassandra SSTables via Aegisthus into S3.
Netflix big data platform
(Diagram) Clients and tools reach the platform through gateways and the Big Data API/Portal; jobs run on prod, test, and ad-hoc clusters; services such as Metacat sit alongside the data warehouse.
Our use cases
- Batch jobs (Pig, Hive)
  - ETL jobs
  - Reporting and other analysis
- Interactive jobs (Presto)
- Iterative ML jobs (Spark)
Spark @ Netflix
Mix of deployments
- Spark on Mesos
  - Self-serving AMI
  - Full BDAS (Berkeley Data Analytics Stack)
  - Online streaming analytics
- Spark on YARN
  - Spark as a service
  - YARN application on EMR Hadoop
  - Offline batch analytics
Spark on YARN
- Multi-tenant cluster in the AWS cloud
  - Hosting MR, Spark, and Druid
- EMR Hadoop 2.4 (AMI 3.9.0)
  - d2.4xlarge EC2 instance type
  - 1000+ nodes (100 TB+ total memory)
Deployment
```
S3:
  s3://bucket/spark/1.5/spark-1.5.tgz, spark-defaults.conf (spark.yarn.jar=1440443677)
  s3://bucket/spark/1.4/spark-1.4.tgz, spark-defaults.conf (spark.yarn.jar=1440304023)

Staged assembly jars (timestamped builds):
  /spark/1.5/1440443677/spark-assembly.jar
  /spark/1.5/1440720326/spark-assembly.jar
  /spark/1.4/1440304023/spark-assembly.jar
  /spark/1.4/1440989711/spark-assembly.jar

Genie cluster config:
  name: spark
  version: 1.5
  tags: ['type:spark', 'ver:1.5']
  jars:
    - 's3://bucket/spark/1.5/spark-1.5.tgz'
```

The latest tarball is downloaded from S3 via Genie.
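The version pin in spark-defaults.conf is what makes rollback fast: repointing spark.yarn.jar at a different timestamped build switches versions without redeploying anything. A minimal sketch of that resolution step, using the staged paths above (the helper is illustrative, not the actual deployment code):

```python
def resolve_assembly(paths, version, pinned_timestamp):
    """Return the staged assembly jar matching the spark.yarn.jar pin."""
    prefix = f"/spark/{version}/{pinned_timestamp}/"
    for p in paths:
        if p.startswith(prefix):
            return p
    raise FileNotFoundError(f"no assembly staged for {version} @ {pinned_timestamp}")

paths = [
    "/spark/1.5/1440443677/spark-assembly.jar",
    "/spark/1.5/1440720326/spark-assembly.jar",
    "/spark/1.4/1440304023/spark-assembly.jar",
]

# Deploy new code: stage a new timestamped dir and bump the pin.
# Roll back: repoint the pin to the previous timestamp.
print(resolve_assembly(paths, "1.5", "1440443677"))
```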
Advantages
1. Automate deployment.
2. Support multiple versions.
3. Deploy new code in 15 minutes.
4. Roll back bad code in less than a minute.
Multi-tenancy Problems
Dynamic allocation
Courtesy of “Dynamic allocate cluster resources to your Spark application” at Hadoop Summit 2015
Dynamic allocation

```
// spark-defaults.conf
spark.dynamicAllocation.enabled                           true
spark.dynamicAllocation.executorIdleTimeout               5
spark.dynamicAllocation.initialExecutors                  3
spark.dynamicAllocation.maxExecutors                      500
spark.dynamicAllocation.minExecutors                      3
spark.dynamicAllocation.schedulerBacklogTimeout           5
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout  5
spark.dynamicAllocation.cachedExecutorIdleTimeout         900
```

```
// yarn-site.xml
yarn.nodemanager.aux-services: spark_shuffle, mapreduce_shuffle
yarn.nodemanager.aux-services.spark_shuffle.class: org.apache.spark.network.yarn.YarnShuffleService
```
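With these settings the executor target starts at 3 and grows while the task backlog persists, capped at 500. A toy sketch of the ramp-up, assuming a simple doubling policy per backlogged round (Spark's actual policy requests executors in exponentially growing increments; this is only illustrative):

```python
def ramp(initial=3, max_execs=500, backlogged_rounds=8):
    """Executor target after each round in which tasks stayed backlogged."""
    targets = [initial]
    for _ in range(backlogged_rounds):
        # double the target while backlogged, clamped at maxExecutors
        targets.append(min(targets[-1] * 2, max_execs))
    return targets

print(ramp())  # [3, 6, 12, 24, 48, 96, 192, 384, 500]
```

Once executors sit idle past executorIdleTimeout, the target shrinks back toward minExecutors; cached executors get the longer cachedExecutorIdleTimeout.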
Problem 1: SPARK-6954
“Attempt to request a negative number of executors”
SPARK-6954
Problem 2: SPARK-7955
“Cached data lost”
SPARK-7955
```scala
val data = sqlContext
  .table("dse.admin_genie_job_d")
  .filter($"dateint" >= 20150601 and $"dateint" <= 20150830)
data.persist
data.count
```
Problem 3: SPARK-7451, SPARK-8167
“Job failed due to preemption”
SPARK-7451, SPARK-8167
- Symptom
  - Spark executors/tasks randomly fail, causing job failures.
- Cause
  - Preempted executors/tasks are counted as failures.
- Solution
  - Preempted executors/tasks should be considered as killed.
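The gist of the fix can be modeled in a few lines: a container that exits with YARN's ContainerExitStatus.PREEMPTED (-102) is treated as killed, so it no longer counts toward the job's failure limit. This is a simplified sketch, not Spark's actual code:

```python
PREEMPTED = -102  # YARN ContainerExitStatus.PREEMPTED

def classify_exit(exit_status):
    """Preempted containers are 'killed', not 'failed'."""
    if exit_status == 0:
        return "ok"
    return "killed" if exit_status == PREEMPTED else "failed"

def job_fails(exit_statuses, max_failures=4):
    """Only genuine failures count toward the job's failure limit."""
    failures = sum(1 for s in exit_statuses if classify_exit(s) == "failed")
    return failures >= max_failures

print(job_fails([PREEMPTED] * 4))  # False: preemptions no longer fail the job
print(job_fails([1] * 4))          # True: real crashes still do
```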
Problem 4: YARN-2730
“Spark causes MapReduce jobs to get stuck”
YARN-2730
- Symptom
  - MR jobs time out during localization when running with Spark jobs on the same cluster.
- Cause
  - The NM localizes one job at a time. Since the Spark runtime jar is big, localizing Spark jobs may take long, blocking MR jobs.
- Solution
  - Stage the Spark runtime jar on HDFS with high replication.
  - Make the NM localize multiple jobs concurrently.
Predicate Pushdown
Predicate pushdown
| Case | Behavior |
| --- | --- |
| Predicates with partition cols on partitioned table | Single partition scan |
| Predicates with partition and non-partition cols on partitioned table | Single partition scan |
| No predicate on partitioned table, e.g. `sqlContext.table("nccp_log").take(10)` | Full scan |
| No predicate on non-partitioned table | Single partition scan |
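A toy model of why only partition-column predicates narrow the scan: pruning can only consult the partition values known to the metastore, so a query with no predicate on those columns must touch every partition. Table and column names below are illustrative:

```python
# A partitioned table's metadata: one entry per (dateint, hour) partition.
partitions = [
    {"dateint": d, "hour": h}
    for d in (20150801, 20150802)
    for h in range(24)
]

def prune(parts, **predicates):
    """Keep only partitions matching equality predicates on partition columns."""
    return [p for p in parts if all(p[k] == v for k, v in predicates.items())]

# Predicate on partition cols -> a single partition is scanned.
print(len(prune(partitions, dateint=20150801, hour=0)))  # 1
# No predicate -> every partition is scanned (full scan).
print(len(prune(partitions)))  # 48
```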
Predicate pushdown for metadata
(Diagram) Parser → Analyzer → Optimizer → SparkPlanner; during analysis, the ResolveRelation rule loads table metadata via HiveMetastoreCatalog.getAllPartitions().
What if your table has 1.6M partitions?
SPARK-6910
- Symptom
  - Querying against a heavily partitioned Hive table is slow.
- Cause
  - Predicates are not pushed down into the Hive metastore, so Spark does a full scan for table metadata.
- Solution
  - Push down binary comparison expressions into the Hive metastore via getPartitionsByFilter().
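The idea behind the fix, sketched: compile binary comparisons on partition columns into a filter expression the metastore can evaluate server-side, instead of fetching all partitions and filtering client-side. The string builder below is illustrative, not Spark's code:

```python
def to_metastore_filter(predicates):
    """predicates: (column, op, value) triples, op one of =, <, <=, >, >=."""
    return " and ".join(f"{col} {op} {val}" for col, op, val in predicates)

# Only the partitions in this dateint range are fetched from the metastore.
f = to_metastore_filter([("dateint", ">=", 20150601), ("dateint", "<=", 20150830)])
print(f)  # dateint >= 20150601 and dateint <= 20150830
```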
Predicate pushdown for metadata
(Diagram) Parser → Analyzer → Optimizer → SparkPlanner; with the fix, the HiveTableScans strategy plans a HiveTableScan that fetches only matching partitions via getPartitionsByFilter().
S3 File Listing
Input split computation
- mapreduce.input.fileinputformat.list-status.num-threads
  - The number of threads to use to list and fetch block locations for the specified input paths.
- Setting this property in Spark jobs doesn't help.
File listing for partitioned table
(Diagram) Each partition path becomes one HadoopRDD in a Seq[RDD], and every HadoopRDD lists its input dirs sequentially through the S3N file system.
SPARK-9926, SPARK-10340
- Symptom
  - Input split computation for a partitioned Hive table on S3 is slow.
- Cause
  - Listing files on a per-partition basis is slow.
  - The S3N file system computes data locality hints.
- Solution
  - Bulk list partitions in parallel using AmazonS3Client.
  - Bypass data locality computation for S3 objects.
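A sketch of the bulk-listing idea using a thread pool over partition prefixes. The S3 client is faked with a dict so the example is self-contained; a real version would issue one AmazonS3Client list call per prefix:

```python
from concurrent.futures import ThreadPoolExecutor

# Fake "bucket": partition prefix -> object names (stand-in for AmazonS3Client).
FAKE_S3 = {
    "logs/dateint=20150801/hour=0/": ["part-00000", "part-00001"],
    "logs/dateint=20150801/hour=1/": ["part-00000"],
}

def list_prefix(prefix):
    """One LIST call for a single partition prefix."""
    return [prefix + name for name in FAKE_S3.get(prefix, [])]

def bulk_list(prefixes, parallelism=16):
    """List all partition prefixes concurrently and flatten the results."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        per_prefix = pool.map(list_prefix, prefixes)
    return [f for files in per_prefix for f in files]

files = bulk_list(sorted(FAKE_S3))
print(len(files))  # 3
```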
S3 bulk listing
(Diagram) Each partition path becomes one HadoopRDD in a ParArray[RDD], and all input dirs are bulk listed in parallel through AmazonS3Client.
Performance improvement
(Chart) Listing time in seconds vs. number of partitions (1, 24, 240, 720), comparing Spark 1.5 RC2 against S3 bulk listing, for the query:

SELECT * FROM nccp_log WHERE dateint=20150801 AND hour=0 LIMIT 10;
S3 Insert Overwrite
Problem 1: Hadoop output committer
- How it works:
  - Each task writes output to a temp dir.
  - The output committer renames the first successful task's temp dir to the final destination.
- Problems with S3:
  - S3 rename is a copy plus a delete.
  - S3 is eventually consistent.
  - FileNotFoundException during "rename."
S3 output committer
- How it works:
  - Each task writes output to local disk.
  - The output committer copies the first successful task's output to S3.
- Advantages:
  - Avoids a redundant S3 copy.
  - Avoids eventual-consistency issues.
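The difference between the two committers can be modeled as operation counts against S3: the Hadoop committer's rename costs a copy plus a delete through a temp dir, while the direct committer uploads the winning task's local output once. A toy model (all names illustrative):

```python
def hadoop_commit(s3, task_output):
    """Temp-dir + rename commit: rename on S3 is a copy followed by a delete."""
    s3["tmp/attempt_0/part-00000"] = task_output             # task writes temp dir (PUT)
    s3["final/part-00000"] = s3["tmp/attempt_0/part-00000"]  # "rename": COPY...
    del s3["tmp/attempt_0/part-00000"]                       # ...then DELETE
    return 3  # S3 operations issued

def direct_commit(s3, task_output):
    """Write to local disk; commit is a single upload of the winner's output."""
    local_output = task_output       # task writes local disk, not S3
    s3["final/part-00000"] = local_output  # one PUT at commit time
    return 1

print(hadoop_commit({}, b"rows"), direct_commit({}, b"rows"))  # 3 1
```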
Problem 2: Hive insert overwrite
- How it works:
  - Deletes and rewrites existing output in partitions.
- Problems with S3:
  - S3 is eventually consistent.
  - FileAlreadyExistsException during "rewrite."
Batchid pattern
- How it works:
  - Never deletes existing output in partitions.
  - Each job inserts a unique subpartition called "batchid."
- Advantages:
  - Avoids eventual-consistency issues.
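A sketch of the pattern: each run writes under a fresh batchid subpartition, so no delete or rename ever races with readers. The path layout below is illustrative:

```python
import uuid

def output_path(table, dateint, batchid=None):
    """Partition path with a unique batchid subpartition (layout illustrative)."""
    batchid = batchid or uuid.uuid4().hex
    return f"s3://bucket/{table}/dateint={dateint}/batchid={batchid}/"

a = output_path("nccp_log", 20150801)
b = output_path("nccp_log", 20150801)
assert a != b  # two runs land side by side; nothing is deleted or rewritten
```

Readers then pick the latest (or the blessed) batchid for each partition instead of relying on overwrite semantics.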
Zeppelin, IPython Notebooks
Big data portal
- One-stop shop for all big-data-related tools and services.
- Built on top of the Big Data API.
Out-of-the-box examples
On-demand notebooks

- Zero installation
- Dependency management via Docker
- Notebook persistence
- Elastic resources
Quick facts about Titan

- Task execution platform leveraging Apache Mesos.
- Manages underlying EC2 instances.
- Process supervision and uptime in the face of failures.
- Auto scaling.
Notebook Infrastructure
Ephemeral ports / --net=host mode
(Diagram) Zeppelin and Pyspark Docker containers (172.x.x.x addresses) run in --net=host mode on Titan-cluster host machines (54.x.x.x), so they can talk over ephemeral ports to the Spark AMs running in the YARN cluster.
Use Case: Pig vs. Spark
Iterative job
Iterative job
1. Duplicate the data and aggregate each copy differently.
2. Merge the aggregates back.
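The shape of the job, in a toy sketch: the same rows are aggregated under different keys, then the per-key aggregates are merged back into one result. Field names and data are illustrative:

```python
from collections import defaultdict

rows = [
    {"title": "A", "country": "US", "hours": 2.0},
    {"title": "A", "country": "FR", "hours": 1.0},
    {"title": "B", "country": "US", "hours": 3.0},
]

def aggregate(rows, key):
    """Sum hours grouped by one column."""
    agg = defaultdict(float)
    for r in rows:
        agg[r[key]] += r["hours"]
    return dict(agg)

# Step 1: duplicate the data and aggregate each copy differently.
by_title = aggregate(rows, "title")
by_country = aggregate(rows, "country")

# Step 2: merge the aggregates back into one result.
merged = {"by_title": by_title, "by_country": by_country}
print(merged["by_title"])  # {'A': 3.0, 'B': 3.0}
```

In Pig each pass over the duplicated data is a separate MR job; in Spark the input can be cached once and reused across the aggregations, which is where the speedup on the next slide comes from.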
Performance improvement
(Chart) Runtime in hh:mm:ss for job 1, job 2, and job 3, comparing Pig against Spark 1.2.
Our contributions
SPARK-6018, SPARK-6662, SPARK-6909, SPARK-6910, SPARK-7037, SPARK-7451, SPARK-7850, SPARK-8355, SPARK-8572, SPARK-8908, SPARK-9270, SPARK-9926, SPARK-10001, SPARK-10340
Q&A
Thank You