Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
![Page 1: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/1.jpg)
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Ahsan Javed Awan
![Page 2: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/2.jpg)
About me
● Erasmus Mundus Joint Doctoral Fellow at KTH Sweden and UPC Spain.
● Visiting Researcher at Barcelona Supercomputing Center.
● Speaker at Spark Summit Europe 2016.
● Author of the licentiate thesis “Performance Characterization of In-Memory Data Analytics with Apache Spark”.
● https://www.kth.se/profile/ajawan/
![Page 3: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/3.jpg)
Motivation: Why should we care about architecture support?
![Page 4: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/4.jpg)
Motivation (cont.)
*Source: SGI
● Exponential increase in core count.
● A mismatch between the characteristics of emerging big data workloads and the underlying hardware.
● Newer promising technologies (Hybrid Memory Cubes, NVRAM, etc.).
● Clearing the Clouds, ASPLOS '12
● Characterizing data analysis workloads, IISWC '13
● Understanding the behavior of in-memory computing workloads, IISWC '14
![Page 5: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/5.jpg)
Motivation (cont.)
Scale-in: Fewer nodes of powerful machines
*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/
Phoenix++, Metis, Ostrich, etc.
Hadoop, Spark, Flink, etc.
Our focus
![Page 6: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/6.jpg)
Which Scale-out Framework?
[Picture Courtesy: Amir H. Payberah]
● Tuning of Spark internal parameters.
● Tuning of JVM parameters (heap size, etc.).
● Micro-architecture level analysis using hardware performance counters.
![Page 7: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/7.jpg)
Progress Meeting 12-12-14
Which Benchmarks?
![Page 8: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/8.jpg)
Multi-core Scalability of Apache Spark?
![Page 9: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/9.jpg)
Multicore Scalability of Spark: The Problem of GC?
![Page 10: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/10.jpg)
Multicore Scalability of Spark: Impact of NUMA Awareness?
![Page 11: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/11.jpg)
Multicore Scalability of Spark: Effectiveness of Hyper-Threading?
![Page 12: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/12.jpg)
Multicore Scalability of Spark: Efficacy of Existing Prefetchers?
![Page 13: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/13.jpg)
Our Approach: 2D PIM vs 3D Stacked PIM
High Bandwidth Memories are not required for Spark
![Page 14: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/14.jpg)
Multicore Scalability of Spark: The Problem of File I/O?
![Page 15: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/15.jpg)
Our Approach: Use Near Data Computing Architecture
● Implications of In-Memory Data Analytics with Apache Spark on Near Data Computing Architectures (under submission)
![Page 16: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/16.jpg)
Conclusions
● We advise using executors with memory size less than or equal to 32GB and restrict each executor to use NUMA-local memory.
● We recommend enabling hyper-threading, disabling the next-line L1-D and adjacent cache line L2 prefetchers, and lowering the DDR3 speed to 1333 MT/s.
● We also envision processors with 6 hyper-threaded cores without L1-D next line and adjacent cache line L2 prefetchers per socket.
● The use of high bandwidth memories like Hybrid Memory Cubes is not justified for in-memory data analytics with Spark.
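The first conclusion (executors of at most 32GB, each restricted to NUMA-local memory) can be illustrated with a small planning sketch. This is not the configuration used in the talk: the names `ExecutorPlan` and `plan_executors` are hypothetical, the sketch assumes all of a NUMA node's memory is handed to executors, and it only builds the settings (`spark.executor.memory`, `spark.executor.cores`) and a `numactl` command prefix. How that prefix is attached to executor launch depends on the deployment and is left out.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_EXECUTOR_GB = 32  # cap taken from the first conclusion above


@dataclass(frozen=True)
class ExecutorPlan:
    """Hypothetical description of one executor pinned to one NUMA node."""
    numa_node: int
    memory_gb: int
    cores: int

    def spark_conf(self) -> Dict[str, str]:
        # Standard Spark property names; the values come from the plan.
        return {
            "spark.executor.memory": f"{self.memory_gb}g",
            "spark.executor.cores": str(self.cores),
        }

    def numactl_prefix(self) -> List[str]:
        # Bind both CPU scheduling and memory allocation to the same node,
        # so the executor only touches NUMA-local memory.
        return [
            "numactl",
            f"--cpunodebind={self.numa_node}",
            f"--membind={self.numa_node}",
        ]


def plan_executors(numa_nodes: int, mem_per_node_gb: int,
                   cores_per_node: int) -> List[ExecutorPlan]:
    """Split each NUMA node into equal executors of at most 32GB each."""
    if min(numa_nodes, mem_per_node_gb, cores_per_node) < 1:
        raise ValueError("all arguments must be positive integers")
    # Ceiling division: the fewest executors per node that respects the cap.
    per_node = -(-mem_per_node_gb // MAX_EXECUTOR_GB)
    memory_gb = mem_per_node_gb // per_node
    # Each executor gets at least one core (hardware threads count as cores).
    cores = max(1, cores_per_node // per_node)
    return [
        ExecutorPlan(node, memory_gb, cores)
        for node in range(numa_nodes)
        for _ in range(per_node)
    ]


if __name__ == "__main__":
    # Assumed example machine: 2 NUMA nodes, 128GB and 12 cores per node.
    for plan in plan_executors(2, 128, 12):
        print(" ".join(plan.numactl_prefix()), plan.spark_conf())
```

Using ceiling division keeps every executor at or below the cap even when a node's memory is not a multiple of 32GB, at the cost of slightly smaller executors.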
![Page 17: Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters](https://reader031.vdocuments.mx/reader031/viewer/2022020314/58ecd5241a28abbc478b4649/html5/thumbnails/17.jpg)
THANK YOU.
Email: [email protected]
www.kth.se/profile/ajawan/
Acknowledgements: Mats Brorsson (KTH), Vladimir Vlassov (KTH), Eduard Ayguade (UPC/BSC)