Minimum technology stack to set up a Hadoop lab
MINIMUM TECHNOLOGY REQUIREMENTS FOR STARTING WITH A HADOOP LAB
By Anurag Shrivastava
2
Minimum Data Platform
Hortonworks Data Platform 2.0, based upon Hadoop 2.2.0, Apache YARN
Tools: Machine Learning (R), Query Tools (Hive), Map/Reduce (Java), Data Integration (Pig), Version Control/Repository
Data Ingestion: Flume, Sqoop
Operating System Layer: Red Hat Enterprise Linux, Virtual Machines, Dedicated Nodes
3
Minimum Exploration Environment
Exploration Environment: Data Science/Engineering Server, Staging Server, Git and Maven Repository
Hadoop Cluster:
Master Node (Name Node, Job Tracker)
Slave Node 1 (Data Node, Task Tracker)
Slave Node 2 (Data Node, Task Tracker)
Slave Node 3 (Data Node, Task Tracker)
Slave Node 4 (Data Node, Task Tracker)
The cluster sizing depends upon your data volume.
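The slide states only that cluster sizing depends on data volume. A rough way to turn that into a slave node count is sketched below. This is an illustrative estimate, not a method from the deck: the function name `estimate_data_nodes`, the per-node disk size, and the 25% headroom for temporary and intermediate data are all assumptions. The replication factor of 3 is the HDFS default.

```python
import math


def estimate_data_nodes(raw_data_tb, disk_per_node_tb, replication=3, headroom=0.25):
    """Rough number of data nodes needed to store raw_data_tb of input data.

    replication: HDFS block replication factor (HDFS default is 3).
    headroom:    assumed fraction of each node's disk kept free for
                 temporary and intermediate data (hypothetical value).
    """
    if raw_data_tb < 0 or disk_per_node_tb <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid sizing parameters")
    # Every block is stored `replication` times across the cluster.
    stored_tb = raw_data_tb * replication
    # Only part of each node's disk is available for HDFS blocks.
    usable_per_node_tb = disk_per_node_tb * (1 - headroom)
    # Always keep at least one data node, even for an empty lab.
    return max(1, math.ceil(stored_tb / usable_per_node_tb))


if __name__ == "__main__":
    # Hypothetical lab: 4 TB of raw data on nodes with 4 TB of disk each.
    print(estimate_data_nodes(raw_data_tb=4, disk_per_node_tb=4))
```

The estimate covers storage only. CPU, memory, and growth in data volume over time would need to be considered separately.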
4
Data Needs: Customer View
Hard customer data: Basic customer data such as Name, Address and DOB; Products and Agreement; Contact History of All Channels; Credit Scores; Demographic Data
Soft customer data: Social Media Handles; Social Media Feeds; Unstructured Data such as Emails
Transaction History: Transaction feed, in frequent batches or near real time
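The three groups of customer data on this slide could be modelled as one record per customer, as in the sketch below. It is a minimal illustration only: all class names, field names, and types are hypothetical and do not come from the deck, and a real lab would more likely hold this data in Hive tables or HDFS files than in Python objects.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Dict, List, Optional


@dataclass
class HardCustomerData:
    """Structured data, typically from core systems (hypothetical fields)."""
    name: str
    address: str
    date_of_birth: date
    products_and_agreements: List[str] = field(default_factory=list)
    contact_history: List[str] = field(default_factory=list)  # all channels
    credit_score: Optional[int] = None
    demographics: Dict[str, str] = field(default_factory=dict)


@dataclass
class SoftCustomerData:
    """Loosely structured or unstructured data (hypothetical fields)."""
    social_media_handles: List[str] = field(default_factory=list)
    social_media_feeds: List[str] = field(default_factory=list)
    emails: List[str] = field(default_factory=list)


@dataclass
class Transaction:
    timestamp: datetime
    amount: float
    description: str = ""


@dataclass
class CustomerView:
    hard: HardCustomerData
    soft: SoftCustomerData = field(default_factory=SoftCustomerData)
    transactions: List[Transaction] = field(default_factory=list)

    def add_transactions(self, batch: List[Transaction]) -> None:
        """Merge a batch from the transaction feed, keeping time order."""
        self.transactions.extend(batch)
        self.transactions.sort(key=lambda t: t.timestamp)
```

The `add_transactions` method reflects the batch-oriented feed named on the slide. A near-real-time feed would call it with smaller, more frequent batches.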