![Page 1: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/1.jpg)
Improving Hadoop performance with reliable opportunistic instances in OpenStack
Telles Nóbrega, Henrique Truta, Andrey Brito
Universidade Federal de Campina Grande
![Page 2: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/2.jpg)
Cloud Computing
● Cheap and flexible way to get resources on demand
● Provides resources in a pay-as-you-go fashion
● Has three basic types of services:
  ○ Infrastructure-as-a-Service
  ○ Software-as-a-Service
  ○ Platform-as-a-Service
![Page 3: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/3.jpg)
Underutilization of resources
● Overall cluster utilization is around 60% for CPU and 50% for RAM
● The same behavior can be seen when we look at single machines of the cluster
● Nevertheless, the user's quota (especially on private clouds) cannot be oversized
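To make the utilization figures concrete, the sketch below estimates how much capacity sits idle on a single host and how many extra instances of a given size would fit there. Only the 60% CPU and 50% RAM figures come from the slide; the host size, the flavor size, and all function names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    vcpus: float
    ram_gb: float


def idle_capacity(total: Capacity, cpu_util: float, ram_util: float) -> Capacity:
    """Return the unused share of a host, given utilization fractions in [0, 1]."""
    return Capacity(total.vcpus * (1 - cpu_util), total.ram_gb * (1 - ram_util))


def instances_that_fit(idle: Capacity, flavor: Capacity) -> int:
    """Whole instances of `flavor` that fit in `idle`; the scarcer resource decides."""
    return int(min(idle.vcpus // flavor.vcpus, idle.ram_gb // flavor.ram_gb))


# Hypothetical host and flavor sizes (assumptions, not taken from the slides).
host = Capacity(vcpus=32, ram_gb=128)
flavor = Capacity(vcpus=2, ram_gb=8)

# Utilization figures quoted on the slide: 60% CPU, 50% RAM.
idle = idle_capacity(host, cpu_util=0.60, ram_util=0.50)
print(instances_that_fit(idle, flavor))
```

In this made-up configuration CPU is the limiting resource, which is why the estimate takes the minimum over both dimensions rather than looking at RAM alone.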
![Page 4: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/4.jpg)
Underutilization of resources (Google Open Data)
![Page 5: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/5.jpg)
Data Processing with Hadoop
● Hadoop is one of the most widely used data processing tools on the market
● Implements the MapReduce paradigm
● Fault tolerant
● Often moved to the cloud
  ○ Easy to scale
  ○ Lower costs
  ○ Gains in flexibility often compensate for the loss in performance
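As a reminder of what the MapReduce paradigm looks like, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API; the function names are illustrative, and the three phases (map, shuffle, reduce) run in a single process rather than being distributed across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["hadoop on the cloud", "the cloud is elastic"]))
```

In a real cluster the map and reduce calls run as independent tasks on different nodes, which is why losing a node in the middle of a job forces its tasks to be rerun elsewhere.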
![Page 6: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/6.jpg)
Hadoop as a Service
● The PaaS model makes Hadoop processing on the cloud easier
● The user can request a configured Hadoop cluster and just submit his/her data and applications
● Amazon Elastic MapReduce is an example; OpenStack Sahara is an equivalent solution
![Page 7: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/7.jpg)
Opportunistic instances
● Cheaper type of instance (somewhat similar to AWS Spot instances)
● Spawned on underused compute resources
● The goal is to use them to speed up Hadoop processing
  ○ However, those instances can be preempted if the resources are needed for higher-priority (regular) instances
  ○ Losing an instance in the middle of a job execution is extremely harmful to application performance
![Page 8: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/8.jpg)
![Page 9: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/9.jpg)
The approach
● A framework that uses opportunistic instances in Hadoop processing with a higher guarantee that these instances will not be lost
● Uses a workload predictor to forecast the cloud utilization in a short time window, estimating how many instances will be available
● If the predictor underestimates the workload, live migration is used so the VM is not lost
  ○ Moves the instance to a different host without turning it off
  ○ Downtime ranges from a few milliseconds to a couple of seconds
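The two steps above (forecast, then migrate when the forecast was too optimistic) can be sketched as follows. The slides do not say which predictor the framework uses, so the moving average here is only a stand-in. The slot model, the safety margin, and every function name are likewise assumptions for illustration, and no OpenStack API is called.

```python
import math
from typing import Dict, Sequence


def forecast_utilization(history: Sequence[float], window: int = 3) -> float:
    """Moving-average forecast of the next utilization sample (fraction in [0, 1]).

    Stand-in predictor: the source does not name the actual model.
    """
    if not history:
        raise ValueError("history must not be empty")
    recent = history[-window:]
    return sum(recent) / len(recent)


def available_slots(total_slots: int, predicted_util: float,
                    safety_margin: float = 0.1) -> int:
    """Slots expected to stay free, keeping a margin for forecast error."""
    reserved = min(1.0, predicted_util + safety_margin)
    return max(0, math.floor(total_slots * (1.0 - reserved)))


def handle_pressure(host_free_slots: Dict[str, int], current_host: str) -> str:
    """Decide what to do when a regular instance needs the opportunistic VM's slot.

    Prefer live migration to the host with the most free slots; preempt only
    if no other host has room.
    """
    candidates = {h: free for h, free in host_free_slots.items()
                  if h != current_host and free > 0}
    if not candidates:
        return "preempt"
    return "migrate:" + max(candidates, key=candidates.get)


# Hypothetical utilization samples and cluster size.
predicted = forecast_utilization([0.55, 0.60, 0.65])
print(available_slots(total_slots=16, predicted_util=predicted))
print(handle_pressure({"host-a": 0, "host-b": 2, "host-c": 1}, "host-a"))
```

The safety margin trades throughput for reliability: a larger margin spawns fewer opportunistic instances but makes the migration fallback less likely to be needed.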
![Page 10: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/10.jpg)
The approach
![Page 11: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/11.jpg)
Results with an extra instance (from 3 to 4)
![Page 12: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/12.jpg)
Results
● VM loss is reduced with prediction
● Live migration has almost no interference with other processes
● Better resource usage in the cloud without the risks of overprovisioning or increasing the quota
![Page 13: Improving Hadoop performance with reliable opportunistic instances in OpenStack](https://reader035.vdocuments.mx/reader035/viewer/2022070605/5a6d8f8c7f8b9ad1418b5be9/html5/thumbnails/13.jpg)
Thank you!
Andrey Brito
Professor - Universidade Federal de Campina Grande
Telles Nóbrega
Master's Degree Student - Universidade Federal de Campina Grande
OpenStack ATC
Henrique Truta
Master's Degree Student - Universidade Federal de Campina Grande
OpenStack ATC