Cloud Computing project: NSYSU Sec. 1 Demo

Upload: margery-harvey

Posted on 18-Jan-2018


Page 1: Cloud Computing project, NSYSU Sec. 1 Demo (NSYSU EE IT_LAB)

Page 2: Outline
Our system’s architecture
Flow chart of the Hadoop job (web crawler) working on the Hadoop cluster
  – Basic setup
  – Flow chart
Comparison of the crawler’s efficiency on different Hadoop cluster configurations

Page 3: Architecture
Hardware
  – 2 ASUS servers, Intel Xeon CPU X3330 2.66 GHz, 1 TB HD & 3 GB RAM (master, slave1)
  – 1 PC, Intel Core 2 Quad CPU Q6600 2.40 GHz, 500 GB HD, 4 GB RAM (slave2)
Software
  – CentOS 5.03
  – Hadoop 0.20.1

Page 4: Architecture
Machine 01, master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02, slave1 (x.x.x.2): Datanode, TaskTracker
Machine 03, slave2 (x.x.x.3): Datanode, TaskTracker
Administrator: HDFS at http://x.x.x.1:50070, job admin at http://x.x.x.1:50030
User: submits the job

Page 5: HDFS (http://x.x.x.1:50070)

Page 6: HDFS (http://x.x.x.1:50070)

Page 7: Job admin (http://x.x.x.1:50030)

Page 8: Job admin (http://x.x.x.1:50030)

Page 9: Job admin (http://x.x.x.1:50030)

Page 10: Basic setup (Hadoop)
1. Set up password-less communication between the nodes through the SSH protocol
2. Install Java
3. Set the Java path (JAVA_HOME), or any other file path needed, in {hadoop dir}/conf/hadoop-env.sh
4. Set the Namenode and JobTracker host names in {hadoop dir}/conf/hadoop-site.xml (Hadoop 0.20 deprecates this file in favor of core-site.xml and mapred-site.xml)
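Step 4 boils down to two properties: fs.default.name (where the Namenode listens) and mapred.job.tracker (where the JobTracker listens). The sketch below renders them in Hadoop's <configuration>/<property> XML layout. The host name "master" comes from the architecture slide; the ports 9000 and 9001 are placeholder assumptions, not values from the original cluster.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties):
    """Build a Hadoop-style <configuration> document from a dict of name -> value."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumption: "master" is the Namenode/JobTracker host from the slides;
# ports 9000/9001 are placeholders chosen for illustration only.
MASTER = "master"
site_properties = {
    "fs.default.name": f"hdfs://{MASTER}:9000",  # Namenode address (core-site.xml in 0.20)
    "mapred.job.tracker": f"{MASTER}:9001",      # JobTracker address (mapred-site.xml in 0.20)
}

if __name__ == "__main__":
    print('<?xml version="1.0"?>')
    print(hadoop_config_xml(site_properties))
```

The same output works whether the two properties go into one hadoop-site.xml or are split across core-site.xml and mapred-site.xml; only the file they land in changes.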

Page 11: Basic setup (Hadoop)
5. Set up the masters file and the slaves file
6. Format HDFS (Hadoop Distributed File System)
7. Start Hadoop
8. Check Hadoop:
   – HDFS: http://namenode’s IP:50070
   – Job admin: http://JobTracker’s IP:50030
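Steps 5 and 8 can be sketched as follows: conf/masters and conf/slaves are plain text with one host name per line, and the check URLs use the default web UI ports from the slides. The host lists follow the 3-node architecture, where the master also runs a Datanode/TaskTracker and therefore appears in the slaves list; the helper names are hypothetical. (In Hadoop 0.20 the masters file names the host that runs the SecondaryNameNode.)

```python
from pathlib import Path

# Host names from the architecture slide; master also acts as a work node.
MASTERS = ["master"]
SLAVES = ["master", "slave1", "slave2"]


def write_host_files(conf_dir, masters, slaves):
    """Write conf/masters and conf/slaves, one host name per line."""
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)
    (conf / "masters").write_text("\n".join(masters) + "\n")
    (conf / "slaves").write_text("\n".join(slaves) + "\n")


def web_ui_urls(namenode_host, jobtracker_host):
    """Default web UI addresses to open in a browser for step 8."""
    return {
        "HDFS": f"http://{namenode_host}:50070",
        "Job admin": f"http://{jobtracker_host}:50030",
    }
```

For example, write_host_files("{hadoop dir}/conf", MASTERS, SLAVES) with the real path substituted, then web_ui_urls("x.x.x.1", "x.x.x.1") gives the two pages shown on the HDFS and Job admin slides.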

Page 12: Basic setup (crawler)
1. Check your web robot agent file
2. Set up the URL filter file
3. Set up your seed URLs file, either by manual input or from a web URL package
(Some detailed setup steps are omitted here.)
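The slides do not show the URL filter file itself. The sketch below assumes a common regex-rule layout: each line is "+pattern" (accept) or "-pattern" (reject), "#" starts a comment, the first matching rule decides, and URLs matching no rule are rejected. The rules are illustrative, not the project's actual filter.

```python
import re

# Hypothetical filter rules, for illustration only.
FILTER_RULES = r"""
# skip non-http schemes
-^(file|ftp|mailto):
# skip common binary/image suffixes
-\.(gif|jpg|png|zip|exe)$
# accept everything else over http(s)
+^https?://
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern, re.IGNORECASE)))
    return rules


def accept(url, rules):
    """Return the verdict of the first matching rule; reject if none match."""
    for allowed, pattern in rules:
        if pattern.search(url):
            return allowed
    return False
```

Putting reject rules before the broad accept rule matters here: with first-match semantics, a "+^https?://" rule placed first would let image URLs through.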

Page 13: Flow chart
1. Seed URLs
2. The user runs the crawl command as a Hadoop job (map & reduce)
3. The job’s fragments are assigned to each TaskTracker, which fetches the web data
4. The content is stored to the output directory on HDFS: document data, fetch log, link log, and a new fetch list
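To make the cycle concrete, here is a single-process simulation of one crawl round as a map step (fetch a URL, emit its record and its outlinks), a shuffle (group by URL), and a reduce step (merge records). It is not Hadoop code and not the crawler used in the slides; the fake web, the function names, the depth parameter, and the assumption that the new fetch list feeds the next round are all hypothetical.

```python
from collections import defaultdict

# Stand-in for the real web: page -> outgoing links (hypothetical data).
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}


def fetch_map(url):
    """Map: 'fetch' one URL, emit its record plus one pair per discovered link."""
    outlinks = FAKE_WEB.get(url, [])
    yield url, {"fetched": True, "outlinks": outlinks}
    for link in outlinks:
        yield link, None  # newly discovered link, not fetched yet


def merge_reduce(key, values):
    """Reduce: keep a fetched record if one exists, else mark the URL unfetched."""
    for value in values:
        if value is not None:
            return value
    return {"fetched": False, "outlinks": []}


def crawl(seeds, depth):
    db = {}  # url -> record; plays the role of the output directory on HDFS
    fetch_list = list(seeds)
    for _ in range(depth):
        if not fetch_list:
            break
        grouped = defaultdict(list)  # shuffle: group mapper output by URL
        for url in fetch_list:
            for key, value in fetch_map(url):
                grouped[key].append(value)
        for key, values in grouped.items():
            record = merge_reduce(key, values)
            if not db.get(key, {}).get("fetched"):  # never downgrade a fetched page
                db[key] = record
        # New fetch list: every known URL that has not been fetched yet.
        fetch_list = sorted(u for u, r in db.items() if not r["fetched"])
    return db
```

With crawl(["http://a.example/"], depth=2), pages a, b and c are fetched and d is only discovered; a third round fetches d as well.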

Page 14: Hadoop cluster – 1 node
Machine 01, master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker

Page 15: Hadoop cluster – 2 nodes
Machine 01, master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02, slave1 (x.x.x.2): Datanode, TaskTracker

Page 16: Hadoop cluster – 3 nodes
Machine 01, master (x.x.x.1): Namenode, JobTracker, Datanode, TaskTracker
Machine 02, slave1 (x.x.x.2): Datanode, TaskTracker
Machine 03, slave2 (x.x.x.3): Datanode, TaskTracker

Page 17: URL set
Get a URL package from http://dmoz.org/
Select one out of every 500 URLs, so that we end up with around 10,000 URLs
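Keeping one URL out of every 500 is systematic sampling; ending up with about 10,000 URLs implies a package of roughly 10,000 × 500 = 5,000,000 URLs. The sketch below assumes the package has already been flattened into a text file with one URL per line (the file names are hypothetical) and writes the sample to seeds.txt in the same one-URL-per-line layout.

```python
from itertools import islice


def sample_every_nth(urls, n=500, offset=0):
    """Systematic sampling: yield items at positions offset, offset+n, offset+2n, ..."""
    if n <= 0 or not 0 <= offset < n:
        raise ValueError("need n > 0 and 0 <= offset < n")
    return islice(urls, offset, None, n)  # lazy, so huge dumps are never loaded whole


def write_seeds(urls, path):
    """Write one URL per line (assumed seeds.txt layout); return the count written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
            count += 1
    return count


def seeds_from_file(src_path, dst_path, n=500):
    """Hypothetical end-to-end helper: flattened URL list -> sampled seeds file."""
    with open(src_path, encoding="utf-8") as src:
        urls = (line.strip() for line in src if line.strip())
        return write_seeds(sample_every_nth(urls, n), dst_path)
```

Because islice streams its input, the 1-in-500 pass reads the multi-million-line list once without holding it in memory.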

Page 18: Crawler input (seeds.txt)

Page 19: Crawler output
Output to HDFS

Page 20: Speed comparison
Hadoop job time for 9199 URLs:
Work nodes    Time (seconds)
1             1888
2             1679
3             1628
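The three measurements can be summarized as speedup relative to one work node (T1 / Tn), parallel efficiency (speedup / n), and throughput (URLs / Tn). From the table, the speedup is about 1.12x with two nodes and 1.16x with three. The small script below only restates the slide's numbers; it adds no new measurements.

```python
# Measured job times from the slide, in seconds, for 9199 URLs.
TIMES = {1: 1888, 2: 1679, 3: 1628}
URLS = 9199


def scaling_report(times, n_urls):
    """Speedup vs. one node, parallel efficiency, and throughput per cluster size."""
    base = times[1]
    report = {}
    for nodes, seconds in sorted(times.items()):
        speedup = base / seconds
        report[nodes] = {
            "speedup": speedup,
            "efficiency": speedup / nodes,
            "urls_per_s": n_urls / seconds,
        }
    return report


if __name__ == "__main__":
    for nodes, r in scaling_report(TIMES, URLS).items():
        print(f"{nodes} node(s): speedup {r['speedup']:.2f}x, "
              f"efficiency {r['efficiency']:.0%}, {r['urls_per_s']:.2f} URLs/s")
```

In these terms, tripling the number of work nodes shortened the job by about 14% (1888 s to 1628 s), for a parallel efficiency of roughly 39% on three nodes.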

Page 21: Thanks for your attention!