Minimum technology stack to set up Hadoop lab

MINIMUM TECHNOLOGY REQUIREMENTS FOR STARTING WITH HADOOP LAB By Anurag Shrivastava


Post on 27-Jul-2015


Page 1: Minimum technology stack to setup Hadoop lab

MINIMUM TECHNOLOGY REQUIREMENTS FOR STARTING WITH HADOOP LAB

By Anurag Shrivastava

Page 2: Minimum technology stack to setup Hadoop lab


Minimum Data Platform

Data Platform: Hortonworks Data Platform 2.0, based upon Hadoop 2.2.0 and Apache YARN

Tools: Machine Learning (R), Query Tools (Hive), Map/Reduce (Java), Data Integration (Pig), Version Control/Repository

Data Ingestion: Flume, Sqoop

Operating System Layer: Red Hat Enterprise Linux, on Virtual Machines or Dedicated Nodes

Page 3: Minimum technology stack to setup Hadoop lab


Minimum Exploration Environment

Exploration Environment:

Hadoop Cluster

Master Node (Name Node, Job Tracker)

Slave Node 1 (Data Node, Task Tracker)

Slave Node 2 (Data Node, Task Tracker)

Slave Node 3 (Data Node, Task Tracker)

Slave Node 4 (Data Node, Task Tracker)

Git and Maven Repository

Data Science/Engineering Server

Staging Server

The cluster sizing depends upon your data volume.
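
The remark that cluster sizing depends on data volume can be made concrete with a rough estimate. The sketch below is not from the slides: the function name, the 25% allowance for temporary job output, and the 12 TB of usable disk per node are illustrative assumptions. The replication factor of 3 is the HDFS default.

```python
import math


def data_nodes_needed(raw_tb, disk_per_node_tb, replication=3, temp_overhead=0.25):
    """Roughly estimate how many data nodes a raw data volume requires.

    raw_tb           -- data volume before replication, in terabytes
    disk_per_node_tb -- usable disk per data node (assumed value)
    replication      -- copies HDFS keeps of each block (default 3)
    temp_overhead    -- extra fraction reserved for intermediate job
                        output (assumed value, tune for your workload)
    """
    required_tb = raw_tb * replication * (1 + temp_overhead)
    return math.ceil(required_tb / disk_per_node_tb)


# 10 TB of raw data on nodes with 12 TB of usable disk each
print(data_nodes_needed(raw_tb=10, disk_per_node_tb=12))  # prints 4
```

Under these assumptions, about 10 TB of raw data fits the four slave nodes shown on this slide. A larger volume or smaller disks would call for more nodes.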

Page 4: Minimum technology stack to setup Hadoop lab


Data Needs Customer View

Hard customer data: Basic customer data such as Name, Address and DOB; Products and Agreements; Contact History of All Channels; Credit Scores; Demographic Data

Soft customer data: Social Media Handles; Social Media Feeds; Unstructured Data such as Emails

Transaction History: Transaction feed, in frequent batches or near real time
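
One way to picture the customer view above is as a single record that groups the three kinds of data. The sketch below is illustrative only: every class and field name is hypothetical, chosen to mirror the items on the slide, and does not describe any schema from the original deck.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Dict, List, Optional


@dataclass
class HardCustomerData:
    # Basic customer data
    name: str
    address: str
    date_of_birth: date
    products_and_agreements: List[str] = field(default_factory=list)
    contact_history: List[str] = field(default_factory=list)  # all channels
    credit_score: Optional[int] = None
    demographics: Dict[str, str] = field(default_factory=dict)


@dataclass
class SoftCustomerData:
    social_media_handles: List[str] = field(default_factory=list)
    social_media_feeds: List[str] = field(default_factory=list)
    emails: List[str] = field(default_factory=list)  # unstructured text


@dataclass
class Transaction:
    timestamp: datetime
    amount: float
    description: str = ""


@dataclass
class CustomerView:
    hard: HardCustomerData
    soft: SoftCustomerData = field(default_factory=SoftCustomerData)
    # Filled from a batch or near-real-time transaction feed
    transactions: List[Transaction] = field(default_factory=list)


view = CustomerView(
    hard=HardCustomerData("Jane Doe", "1 Example Street", date(1980, 1, 1))
)
view.transactions.append(Transaction(datetime(2015, 7, 27, 9, 30), 42.0, "card payment"))
print(len(view.transactions))  # prints 1
```

Keeping hard, soft, and transaction data in separate parts reflects that they usually arrive from different sources and at different rates.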