![Page 1: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/1.jpg)
GoDataDriven, proudly part of the Xebia Group
Bare metal Hadoop provisioning
With Ansible and Cobbler
Kris Geusebroek, Big Data Hacker
![Page 2: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/2.jpg)
“Give man Hadoop cluster he gain insight for a day. Teach man build Hadoop cluster he soon leave for better job. #bigdata”
-- Big Data Borat
![Page 3: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/3.jpg)
“We’re hiring”
-- Kris Geusebroek
![Page 4: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/4.jpg)
Don’t want to...
Manually install everything needed for a Hadoop cluster...
![Page 5: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/5.jpg)
Separate layers...
- Hardware
- OS
- Basic install and configuration (firewalls, IPsec, IPv6, NTPd, raising ulimits, disk formatting and mounting)
- Cluster install (Cloudera Manager or Hortonworks Data Platform)
- Extra stuff (monitoring with Ganglia, R and R packages, ...)
![Page 6: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/6.jpg)
Want...
- Horizontal scaling: effort for an extra machine is minimal
- Commodity, industry-standard hardware
  - So cope with errors, malfunctions, and re-installation
- Multiple clusters
- Experiment first to find the appropriate configuration for a specific goal
  - Think memory, hard disks, number of nodes
![Page 7: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/7.jpg)
Want...
- Automate all the tasks for every layer
- Parameterise a lot
- Simple configuration of the separate layers
- Definition of roles (master node, data node, etc.)
![Page 8: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/8.jpg)
Possible with...
Vendor-specific tools
The problem here is that they can do only a subset of all tasks
![Page 9: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/9.jpg)
What we have done here...
Nothing new, just another possibility
Nothing tool-specific
- The demo installs Cloudera Manager, but it also works with Hortonworks Data Platform.
Most important is:
![Page 10: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/10.jpg)
Stack...
![Page 12: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/12.jpg)
“Essentially, this solution is CoSSaaS. (Couple of Shell Scripts as a Service)”
-- Big Data Borat
![Page 13: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/13.jpg)
Cobbler...
Cobbler used for
- CMS
- DHCP server
- OS image hosting
- OS kickstart
cobblerd.org
![Page 14: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/14.jpg)
Ansible...
Ansible used for
- Tying it all together
- Initial setup of network config
- One-time push of SSH key
- Full software install
ansible.cc
![Page 15: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/15.jpg)
Cloudera Manager...
Cloudera Manager used for
- Cluster install software
- Currently manual labour; can be automated using the API
cloudera.com
![Page 16: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/16.jpg)
Show me the code...
Add node information to the cobbler CMS
First make the install DVD known to cobbler:
    mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
    cobbler import --path=/mnt/dvd --name=CentOS64
Next make the node information known:
    sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 --mac=<00:00:00:00:00:00> --ip-address=10.20.0.101 --static=True
If needed, re-enable the netboot flag:
    sudo cobbler system edit --name=node01 --netboot-enabled=True
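Registering many nodes by hand gets tedious, so the `cobbler system add` call above could be wrapped in a loop. This is a minimal sketch, not part of the original deck: the "name mac ip" list format, the `DRY_RUN` switch, the function names, and the placeholder MAC addresses are all assumptions for illustration. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: register several nodes in cobbler from a "name mac ip" list.
# The list format, DRY_RUN switch and function names are illustrative assumptions.

PROFILE="CentOS64-x86_64"   # profile name as used on the slide
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands (default), 0 = run them

# Build (and optionally run) the cobbler command for one node.
register_node() {
  local name="$1" mac="$2" ip="$3"
  local cmd=(cobbler system add "--name=$name" "--profile=$PROFILE"
             "--hostname=$name" "--mac=$mac" "--ip-address=$ip" "--static=True")
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "${cmd[*]}"
  else
    sudo "${cmd[@]}"
  fi
}

# Read "name mac ip" lines from stdin, skipping blank lines and comments.
register_nodes() {
  local name mac ip
  while read -r name mac ip; do
    [[ -z "$name" || "$name" == \#* ]] && continue
    register_node "$name" "$mac" "$ip"
  done
}

# Usage: the MAC addresses below are placeholders, not real hardware.
register_nodes <<'EOF'
# name   mac                ip
node01   00:00:00:00:00:01  10.20.0.101
node02   00:00:00:00:00:02  10.20.0.102
EOF
```

Keeping the node list in a plain file means an extra machine costs one extra line, which fits the "effort for an extra machine is minimal" goal.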
![Page 17: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/17.jpg)
Show me the code...
Ansible needs to know what goes where
    [cluster]
    node01
    node02
    node03

    [cobbler]
    cobbler

    [proxy]
    cobbler

    [ganglia-master]
    node01

    [ganglia-nodes:children]
    cluster

    [cloudera-manager]
    node01
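For larger clusters the `[cluster]` group of the inventory could be generated rather than typed. A minimal sketch, assuming nodes follow the `nodeNN` naming used on the slide; the function name `cluster_group` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the [cluster] inventory group for N nodes named node01..nodeNN.
# Assumes the two-digit nodeNN naming from the slide; cluster_group is a made-up name.

cluster_group() {
  local count="$1" i
  echo "[cluster]"
  for ((i = 1; i <= count; i++)); do
    printf 'node%02d\n' "$i"
  done
}

# Usage: prints the group header followed by node01, node02, node03.
cluster_group 3
```

The output can be redirected into the inventory file; the other groups (cobbler, proxy, ganglia-master and so on) stay hand-written because they encode role decisions.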
![Page 18: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/18.jpg)
Show me the code...
For the rest it’s just a DSL thingy with extras
    - hosts:
      - cloudera-manager
      - cluster
      user: root
      sudo: yes
      vars_files:
      - vars/common.yml
      tasks:
      - include: cloudera-manager/tasks/common.yml
      handlers:
      - include: cloudera-manager/handlers/main.yml

    - name: Configure CM4 Repo
      copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root

    - name: Install CM4 common stuff
      yum: name=$item state=installed
![Page 19: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/19.jpg)
Demo...
![Page 20: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/20.jpg)
Shared problems...
- No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
- BIOS settings and different RAID settings are not handled (yet)
- Large amount of initial network traffic with large clusters (N times downloading the same software packages from yum repositories) => repo mirroring to the rescue
- MAC addresses of all nodes must be known
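Repo mirroring comes down to pointing every node at a local copy of the packages. The sketch below prints a yum `.repo` stanza for such a mirror. The repo id, the mirror URL on the `cobbler` host, and the function name are assumptions, and `gpgcheck=0` is a simplification for an isolated install network rather than a recommendation.

```shell
#!/usr/bin/env bash
# Sketch: print a yum .repo stanza that points at a local mirror.
# Repo id, URL and function name are hypothetical; adjust to the real mirror.

mirror_repo_stanza() {
  local id="$1" baseurl="$2"
  cat <<EOF
[$id]
name=$id (local mirror)
baseurl=$baseurl
enabled=1
gpgcheck=0
EOF
}

# Usage: the URL below is a placeholder for wherever the mirror is served.
mirror_repo_stanza cm4-mirror "http://cobbler/mirror/cm4/"
```

A file like this could be pushed to `/etc/yum.repos.d/` with the same `copy` task shown for the CM4 repo, so each package is fetched from upstream once instead of N times.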
![Page 21: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/21.jpg)
Take aways...
- Do automate from the start
- It’s easy
- Use (our) open source code to get a head start: https://github.com/godatadriven/ansible_cluster
- Our team will do the additional consultancy
![Page 22: Bare metal Hadoop provisioning](https://reader034.vdocuments.mx/reader034/viewer/2022050807/53f95aa68d7f729c2e8b4a01/html5/thumbnails/22.jpg)
We’re hiring / Questions? / Thank you!
Kris Geusebroek, Big Data Hacker