Distributed Data Processing Workshop - SBU

Upload: amir-sedighi

Post on 07-Jul-2015

Category:

Data & Analytics


DESCRIPTION

This presentation is about how to prepare a distributed data processing environment on your PC.

TRANSCRIPT

Page 1: Distributed Data Processing Workshop - SBU

Distributed Data Processing Workshop

Shahid Beheshti University

Faculty of Computer Science and Engineering

Course: Distributed Databases

Instructor: Dr. Hadi Tabatabaei

Presenter: Abolfazl Sedighi, Aban 1393 (October/November 2014)

Page 2: Distributed Data Processing Workshop - SBU

Distributed Data Processing

School of Computer Science and Engineering

A. Sedighi

@amirsedighi

Hexican.com

Page 3: Distributed Data Processing Workshop - SBU

Every Game Needs Its Playing Yard

Page 5: Distributed Data Processing Workshop - SBU

What can I do on a Single Machine?

● MVC programming

● Regular business apps

● 100 GB of data

● Web Surfing

● ...

Page 6: Distributed Data Processing Workshop - SBU

Linux Cluster

Page 9: Distributed Data Processing Workshop - SBU

Introduction

This is a four-session, hands-on, step-by-step tutorial on setting up a Linux cluster on your own machine (notebook or PC), so that you can try out a few big-data processing frameworks and tools.

Page 10: Distributed Data Processing Workshop - SBU

What are we going to do?

● A notebook or a PC is enough to get started.

– Setting up your Linux cluster.

● Distributed log management and real-time search engines.

– What is Elasticsearch?

– Elasticsearch on the cluster.

– Monitoring and usage.

● The most popular distributed data processing framework.

– What is Apache Hadoop?

– Apache Hadoop on the cluster.

– Usage scenarios.

Page 11: Distributed Data Processing Workshop - SBU

What will we learn?

● Leveraging our knowledge of Big-Data.

● Getting familiar with distributed data processing.

● Maximizing availability and reliability.

● Increasing data storage capacity.

● Improving data processing performance.

● Data locality is a silver bullet.

● Increasing cluster utilization.

● Taming giants by giving them a try.

Page 12: Distributed Data Processing Workshop - SBU

Preparing the Linux Cluster - VirtualBox

Page 13: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Hosting

● VirtualBox

– Memory size, disk capacity and CPU cores.

– Network interfaces:

  ● NAT provides Internet access.

  ● Host-Only provides communication between the cluster nodes.
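
The same settings can also be applied from the command line with VirtualBox's `VBoxManage` tool. The following is a minimal sketch, not the procedure used in the workshop: the VM name `node1`, the memory and CPU values, and the adapter name `vboxnet0` are all assumptions. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: the VM name, sizes and adapter name below are assumptions,
# not values taken from the slides.
VM_NAME="node1"
MEM_MB="2048"
CPUS="2"
HOSTONLY_IF="vboxnet0"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_vm() {
  # Memory size and number of CPU cores.
  run VBoxManage modifyvm "$VM_NAME" --memory "$MEM_MB" --cpus "$CPUS"
  # Adapter 1: NAT, so the guest can reach the Internet.
  run VBoxManage modifyvm "$VM_NAME" --nic1 nat
  # Adapter 2: host-only, so the nodes can reach each other and the host.
  run VBoxManage modifyvm "$VM_NAME" --nic2 hostonly --hostonlyadapter2 "$HOSTONLY_IF"
}

configure_vm
```

The VM has to be powered off for `modifyvm` to succeed. Disk capacity is not covered here; it is chosen when the virtual disk is created.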

Page 14: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Adding a Host-Only Network

Page 15: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Adding a NAT Interface

Page 16: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Adding a Host-Only Interface

Page 17: Distributed Data Processing Workshop - SBU

Preparing the Cluster – First Node

● Create a Linux machine inside VirtualBox.

● Install Linux. (I used Ubuntu 12.04.)

– Select Samba.

– Select OpenSSH.

● Give the first node everything the other nodes will need.

– An “install” folder on it.

– Prerequisites such as Java installed on it.

● Shut down the first node.
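
Before shutting the first node down for cloning, it can help to confirm that the prerequisites are really present, because every clone inherits whatever is missing. This is a small sketch of such a check; the command names `java`, `sshd` and `smbd` are assumptions about how Java, the OpenSSH server and Samba appear on the node.

```shell
#!/bin/sh
# Succeeds if the given command can be found on this node.
has_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Print one status line per tool; return non-zero if any is missing.
check_node() {
  missing=0
  for tool in "$@"; do
    if has_tool "$tool"; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed command names for Java, the OpenSSH server and Samba.
check_node java sshd smbd || echo "Install the missing tools before cloning."
```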

Page 18: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Cloning, the VirtualBox Side

● Cloning the first node. (tutorial)
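
Cloning can also be scripted with `VBoxManage clonevm` instead of the VirtualBox GUI. The sketch below only prints one command per new node; the base VM name `node1` and the `nodeN` naming scheme are assumptions made for illustration. Remove the leading `echo` to perform the clones, with the base VM powered off.

```shell
#!/bin/sh
# Print one clonevm command for each node number from $2 to $3,
# using $1 as the (powered-off) base VM. Names are assumptions.
clone_nodes() {
  base="$1"
  i="$2"
  last="$3"
  while [ "$i" -le "$last" ]; do
    echo VBoxManage clonevm "$base" --name "node$i" --register
    i=$((i + 1))
  done
}

# Preview the commands for two extra nodes, node2 and node3.
clone_nodes node1 2 3
```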

Page 19: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Cloning, the Linux side

● Turning the new node on.

● Network configuration

– sudo nano /etc/hosts

– sudo nano /etc/hostname

– sudo nano /etc/network/interfaces

– sudo rm /etc/udev/rules.d/70-persistent-net.rules

● sudo reboot
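
Each clone needs its own host name and its own address on the host-only network, and every node's `/etc/hosts` should list all nodes. Removing `70-persistent-net.rules` matters because that file ties interface names to the original machine's MAC addresses; once it is deleted, udev regenerates it for the clone's new adapters on the next boot. The sketch below generates the text to paste into the files. The subnet `192.168.56.0/24`, the `nodeN` names, the address scheme and the interface name `eth1` are assumptions and should be adapted to your own setup.

```shell
#!/bin/sh
# Assumed addressing: host-only subnet 192.168.56.0/24, with node N at
# 192.168.56.(10+N). Adjust to match your own host-only network.
PREFIX="192.168.56"

# Lines to add to /etc/hosts on every node.
hosts_entries() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "$PREFIX.$((10 + i)) node$i"
    i=$((i + 1))
  done
}

# Static stanza for the host-only adapter in /etc/network/interfaces.
# "eth1" is an assumption: the second adapter of the guest.
interfaces_stanza() {
  node="$1"
  printf 'auto eth1\niface eth1 inet static\n    address %s.%s\n    netmask 255.255.255.0\n' \
    "$PREFIX" "$((10 + node))"
}

hosts_entries 3
interfaces_stanza 2
```

On node 2, `/etc/hostname` would then contain just `node2`.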

Page 20: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Password-less Login

● Do this:

– ssh-keygen

– ssh-copy-id -i ~/.ssh/id_rsa.pub user@host

● Or this:

– ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

– scp ~/.ssh/id_rsa.pub user@host:~/master_key

– ssh user@host

– cat ~/master_key >> ~/.ssh/authorized_keys
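
The manual variant appends the key every time it is run, and `sshd` typically ignores `authorized_keys` when its permissions are too open. The following sketch of the remote-side step avoids both problems. The function name `install_key` and its arguments are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file only if it is not already
# there, and set the permissions sshd expects. The paths are arguments so
# the function can be tried on a scratch directory first.
install_key() {
  pubkey_file="$1"
  ssh_dir="$2"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  # -x: whole-line match, -F: fixed string, -f: read the pattern from a file.
  if ! grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
    cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage on the remote node, after copying the key over with scp:
#   install_key ~/master_key ~/.ssh
```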

Page 21: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Distributed Shell

● Do it like a Commander

– Installing DSH (Optional)
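
Since DSH is optional, the same idea can be approximated with a plain loop over SSH. This is not DSH itself, only a sketch of what a distributed shell does: run one command on every node. The host names are assumptions, and `SSH_CMD` is a hypothetical variable that can be set to `echo` to preview what would be run.

```shell
#!/bin/sh
# Not dsh itself: a plain loop that runs one command on every node over SSH.
# Override SSH_CMD (for example with "echo") to preview without connecting.
SSH_CMD="${SSH_CMD:-ssh}"

run_on_all() {
  cmd="$1"
  shift
  for host in "$@"; do
    echo "== $host =="
    $SSH_CMD "$host" "$cmd"
  done
}

# Usage (relies on password-less login to each node; host names assumed):
#   run_on_all "uptime" node1 node2 node3
```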

Page 22: Distributed Data Processing Workshop - SBU

Preparing the Cluster – Enjoy It

● To scale your cluster, just repeat the cloning step.

Page 23: Distributed Data Processing Workshop - SBU

Next?

● An introduction to distributed log management and analytical search engines.

– How does Elasticsearch work?

– Workshop.

● An introduction to Apache Hadoop.

– How does Apache Hadoop work?

– Workshop.