Virtualisation of Hadoop Clusters - Dr G Sudha Sadasivam, Assistant Professor, Department of CSE, PSGCT
VIRTUALISATION OF HADOOP CLUSTERS
Dr G Sudha Sadasivam
Assistant Professor
Department of CSE
PSGCT
Introduction
• A physical machine can host a number of smaller virtual machines (VMs), each running a separate operating system instance.
• Challenges
– Partitioning of a machine
– Concurrent execution of multiple operating systems
– Isolation of virtual machines from one another
– Support for heterogeneous applications
– Low performance overhead
• Xen is a virtual machine monitor (hypervisor) for x86 that supports execution of multiple guest operating systems, each with its own kernel and user-space applications.
Objective
• Automate the creation and deletion of a virtual cluster for hosting Hadoop using Xen.
• A large physical cluster can be simulated on a few physical machines.
Steps
• The user supplies the configuration by editing configuration files.
• The tool generates the user-specified number of VMs running Hadoop.
• Users can manage the Hadoop file system.
• Users can submit jobs for each physical machine.
Need for virtualisation
• Ability to recover quickly from software problems by saving a copy of the guest image.
• High availability by relocating guests when a server machine is inoperable.
• Dynamic load balancing by migrating guests from server machines.
• Consolidation of many services on one physical machine, each administered independently in its own VM.
• Use of the abundant computational power of the physical machine, minimising cost.
• Switching between applications on different operating systems using hypervisors.
HADOOP CLUSTER CONFIGURATION
• The host node is configured as master (NN) and also acts as a slave (DN).
• The guest node (DN) is configured as a slave.
The master is the host OS, which acts as JobTracker/NameNode. The slave is the guest OS, which acts as TaskTracker/DataNode.
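The layout described above can be sketched in a few lines of Python. This is only an illustration, assuming a Hadoop 0.20/1.x style setup in which `fs.default.name` and `mapred.job.tracker` point at the master and `conf/slaves` lists the DataNode/TaskTracker hosts. The host names, the ports 9000 and 9001, and the function name `cluster_layout` are assumptions and do not come from the slides.

```python
def cluster_layout(host, guests, nn_port=9000, jt_port=9001):
    """Return (site properties, conf/slaves text) for a host-plus-guests cluster.

    host and guests are hypothetical host names; the ports are assumed defaults.
    """
    # The NameNode and JobTracker both run on the host OS (the master).
    properties = {
        "fs.default.name": "hdfs://%s:%d" % (host, nn_port),
        "mapred.job.tracker": "%s:%d" % (host, jt_port),
    }
    # The host also acts as a slave, so it is listed together with the guests.
    slaves = [host] + [g for g in guests if g != host]
    return properties, "\n".join(slaves) + "\n"


properties, slaves_text = cluster_layout("host0", ["guest1", "guest2"])
print(properties)
print(slaves_text, end="")
```

Listing the host in `conf/slaves` is what makes it double as a DataNode while still serving as the NameNode.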
Steps in implementing
• Installation of Xen kernel
• Creation of Guest OS
• Configuration of Guest OS
• Installation of Java Development Kit
• Extraction and configuration of Hadoop cluster
• Creating OS image for new guest machines
• Creation and removal of other virtual machines, copying the OS images
Automated Creation of a Hadoop Virtual cluster
An XML file holds the configuration details of each new VM.
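As a rough sketch of how such an XML description could drive VM creation, the Python below parses a cluster file and emits a Xen guest configuration for each VM. The slides do not show the XML schema, so the element and attribute names (`cluster`, `vm`, `base_image`, `memory`, `vcpus`), the image paths and the bridge name `xenbr0` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the real file format is not shown in the slides.
SAMPLE_XML = """
<cluster base_image="/var/xen/images/hadoop-base.img">
  <vm name="guest1" memory="512" vcpus="1"/>
  <vm name="guest2" memory="512" vcpus="1"/>
</cluster>
"""


def parse_cluster(xml_text):
    """Return (base image path, list of VM descriptions) from the XML text."""
    root = ET.fromstring(xml_text.strip())
    vms = []
    for node in root.findall("vm"):
        name = node.get("name")
        vms.append({
            "name": name,
            "memory": int(node.get("memory")),
            "vcpus": int(node.get("vcpus", "1")),
            # Each guest gets its own copy of the base OS image (assumed path).
            "disk": "/var/xen/images/%s.img" % name,
        })
    return root.get("base_image"), vms


def xen_config(vm):
    """Render a minimal Xen guest config; a real guest also needs a kernel
    or bootloader entry, which is omitted here."""
    lines = [
        'name = "%s"' % vm["name"],
        "memory = %d" % vm["memory"],
        "vcpus = %d" % vm["vcpus"],
        "disk = ['file:%s,xvda,w']" % vm["disk"],
        "vif = ['bridge=xenbr0']",
    ]
    return "\n".join(lines) + "\n"


base_image, vms = parse_cluster(SAMPLE_XML)
for vm in vms:
    print(xen_config(vm))
```

In a full script, each generated file would be written to disk after copying the base image to the guest's disk path, and the guest would then be started with the Xen toolstack (for example `xm create <config file>`).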
Automated Shut down of Hadoop Virtual cluster
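One plausible ordering for an automated shutdown is to stop the Hadoop daemons first and then shut down each guest. The Python sketch below only builds the list of commands and does not run them. The Hadoop install path, the image directory and the function name `shutdown_plan` are assumptions. `bin/stop-all.sh` is the Hadoop 0.20/1.x stop script and `xm shutdown` is the classic Xen toolstack command.

```python
def shutdown_plan(guest_names, hadoop_home="/opt/hadoop",
                  remove_images=False, image_dir="/var/xen/images"):
    """Return the ordered commands for taking a virtual Hadoop cluster down.

    Nothing is executed; each step is an argument list that a caller could
    pass to subprocess.run. All paths are assumed, not taken from the slides.
    """
    # Stop the Hadoop daemons from the master before touching the guests.
    steps = [[hadoop_home + "/bin/stop-all.sh"]]
    # Ask Xen to shut down each guest domain.
    for name in guest_names:
        steps.append(["xm", "shutdown", name])
    # Optionally delete the per-guest image copies. A real script must wait
    # for each domain to finish shutting down before removing its image.
    if remove_images:
        for name in guest_names:
            steps.append(["rm", "-f", "%s/%s.img" % (image_dir, name)])
    return steps


for step in shutdown_plan(["guest1", "guest2"]):
    print(" ".join(step))
```

Returning the plan as data makes it easy to review or log the steps before anything is run.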
Advantages of automated virtualization in Hadoop
1. Effective isolation of the datanode from the load caused by other processes on the machine makes the datanode more responsive and reliable.
2. Having multiple virtual machines on each physical machine makes the scheduling units finer-grained, so multiple task trackers can be scheduled on the same machine and the overall utilization of the whole cluster improves.
3. A snapshot of a virtual cluster makes it possible to re-activate the same cluster in the future and resume work from the snapshot (rollback).
Enhancements
1. Providing a graphical console for monitoring and managing the virtual cluster.
2. Creation and migration of virtual machines for the purpose of load balancing.
3. Enabling snapshots of the virtual machine for checkpointing.
4. Providing an intelligent monitoring system that detects the failure of a virtual machine in the cluster and restarts that virtual machine, increasing reliability.
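The monitoring enhancement in item 4 could, for instance, be based on heartbeat timeouts. The sketch below is one possible approach under that assumption and not something the slides describe. The timeout value, the configuration directory and the function names are hypothetical, and the restart commands are only built, not executed.

```python
def failed_vms(last_seen, now, timeout=30.0):
    """Return the names of VMs whose last heartbeat is older than timeout seconds.

    last_seen maps a VM name to the time (in seconds) of its latest heartbeat.
    """
    return sorted(name for name, seen in last_seen.items() if now - seen > timeout)


def restart_commands(names, config_dir="/etc/xen"):
    """Build one 'xm create' command per failed VM (config_dir is an assumed path)."""
    return [["xm", "create", "%s/%s.cfg" % (config_dir, name)] for name in names]


heartbeats = {"guest1": 100.0, "guest2": 60.0}
down = failed_vms(heartbeats, now=105.0)
for command in restart_commands(down):
    print(" ".join(command))
```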
Performance of Physical vs Virtual clusters
[Figure: time in nsec against number of nodes, for physical clusters and virtual clusters]
Master as a Physical Node
7 nodes: data nodes – 6 virtual nodes; name node – 1 physical node
[Figure: time in nsec against number of nodes]
Master as a Virtual Node
7 nodes: data nodes – 1 physical node + 5 virtual nodes; name node – 1 virtual node
[Figure: time in nsec against file size in MB]
Performance with varying number of Virtual nodes
[Figure: time in nanoseconds against file size in MB, for six virtual nodes and four virtual nodes]