hadoop at rakuten, 2011/07/06

Post on 20-Jan-2015

DESCRIPTION

Hadoop at Rakuten

TRANSCRIPT

1

Hadoop at Rakuten.

Rakuten Inc. Architect Group
Hamba Mitsuharu & Nakagawa Gen
2011/07/06 (Wed)

2

Today’s Agenda.

Hadoop at Rakuten.

1. Our Profile.
2. What is Hadoop?
3. Our Current Hadoop System Overview.
4. Our Hadoop Usage.
5. Our Challenge.
6. Our Future Plan.

3

Our Profile.

Hadoop at Rakuten.

4

From ACT Group
Nakagawa Gen
Hamba Mitsuharu

Our Profile.

5

Our Profile.

Our Mission

Enhancing Hadoop at Rakuten.

6

Our Profile.

Our Latest Tasks: Done.

1. Implementing Ganglia.
2. Implementing HA.

7

Our Profile.

Our Latest Tasks: Now Handing Over.

1. Keeping Up Our Hadoop Cluster.
2. Modifying Our Hadoop Configurations.
3. Implementing Scripts for Daily Chores.

8

Our Profile.

Our Latest Tasks: Current Focus.

1. Evaluating the Related Products.

9

What is Hadoop?

Hadoop at Rakuten.

10

One of the Most Powerful Distributed Processing Frameworks for Large Data Sets.

What is Hadoop?

11

Distributions.

What is Hadoop?

12

Ecosystem.

What is Hadoop?

ETC...

13

What is Hadoop?

HDFS: Hadoop Distributed File System.
MapReduce: Map & Reduce (Includes Shuffle & Sort).

HDFS & MapReduce Constitute Hadoop.
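The Map, Shuffle & Sort, and Reduce steps can be illustrated with a minimal in-memory sketch. This is not Hadoop code and uses no Hadoop API: the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and plain Python lists stand in for files on HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle & sort: group values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in order over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    # Hypothetical sample input standing in for a file read from HDFS.
    sample = ["hadoop at rakuten", "hadoop hdfs mapreduce"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run in parallel on many machines and the framework performs the shuffle and sort between them. The sketch only shows the data flow.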

14

What is Hadoop?

Source : http://horicky.blogspot.com/2008_11_01_archive.html

Input from HDFS.

Process by MapReduce.

Output to HDFS.

15

What is Hadoop?

Simple Example.

Source : http://techblog.yahoo.co.jp/cat207/cat209/hadoop/

16

What is Hadoop?

Source : http://horicky.blogspot.com/2008_11_01_archive.html

In the Common Case, Combine Several Simple Jobs.
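Combining several simple jobs can be sketched as feeding the output of one map/reduce pass into the next. The example below is an in-memory illustration under stated assumptions, not Hadoop code: `run_job` and the two jobs (count words, then group words by their count) are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory map -> shuffle & sort -> reduce pass."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Job 1: count occurrences of each word.
def count_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(word, ones):
    return word, sum(ones)


# Job 2: group words by how often they occurred.
def invert_mapper(word_and_count):
    word, count = word_and_count
    return [(count, word)]


def invert_reducer(count, words):
    return count, sorted(words)


def words_by_frequency(lines):
    """Chain the two jobs: the output of job 1 is the input of job 2."""
    counts = run_job(lines, count_mapper, count_reducer)
    return run_job(counts, invert_mapper, invert_reducer)


if __name__ == "__main__":
    print(words_by_frequency(["a b a", "b a c"]))
```

Each job stays simple on its own. The pipeline gets its power from the chaining, which on a real cluster would pass intermediate results through HDFS.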

17

What is Hadoop?

NameNode & DataNode Constitute HDFS.

Source : http://horicky.blogspot.com/2008_11_01_archive.html

18

What is Hadoop?

Read & Write on HDFS.

Source : http://hadoop.apache.org/common/docs/current/hdfs_design.html#NameNode+and+DataNodes

19

What is Hadoop?

JobTracker & TaskTracker Constitute MapReduce.

Source : http://horicky.blogspot.com/2008_11_01_archive.html

20

What is Hadoop?

Good & Bad Points of Hadoop.

Good! Easy to Scale Out the System. Easy to Implement Distributed Processing.

Bad… There Is a SPoF at the NameNode.

21

Our Current Hadoop System Overview.

Hadoop at Rakuten.

22

Our Current Hadoop System Overview.

The Cluster Infrastructure. #1 For Instance.

Source : http://www.ibm.com/developerworks/linux/library/l-hadoop/

23

Our Current Hadoop System Overview.

The Cluster Infrastructure. #2 In Our Case.

[Diagram: network topology. Recoverable labels: Switch (x4), 1Gbps links, Rack (x12), Client, NN&JT Active, NN&JT Standby, SNN, DN&TT x10 (six groups), DN&TT x3 (three groups), Others x18 (three groups).]

3 Masters & 69 Slaves.
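The stated totals can be cross-checked against the diagram labels. The sketch below assumes that each of the six "x10" labels and each of the three "x3" labels belongs to one DN&TT group. That pairing is a reading of the flattened diagram, not something the slide states explicitly.

```python
# Master roles named in the diagram.
MASTERS = ["NN&JT Active", "NN&JT Standby", "SNN"]

# (number of groups, DN&TT nodes per group), read from the "x10" and "x3" labels.
# Pairing the multipliers with the DN&TT groups is an assumption.
SLAVE_GROUPS = [(6, 10), (3, 3)]


def count_slaves(groups):
    """Sum nodes over all groups: 6 * 10 + 3 * 3."""
    return sum(n_groups * per_group for n_groups, per_group in groups)


if __name__ == "__main__":
    print(len(MASTERS), "masters,", count_slaves(SLAVE_GROUPS), "slaves")
```

Under this reading the numbers agree with the slide: 3 masters and 6 x 10 + 3 x 3 = 69 slaves.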

24

Our Current Hadoop System Overview.

The Monitoring System.
Using Ganglia (& MRTG).
We Can Easily Check Resource Usage at Any Time, Not Only for Each Machine but Also for the Cluster as a Whole.

25

Our Current Hadoop System Overview.

High Availability.
Using DRBD & Heartbeat.

[Diagram: Active and Standby masters, each running NN and JT on /foo/drbd0 and /foo/drbd1; interfaces labeled eth0 and eth1; a Client; the host name v-host.rakuten.co.jp. Caption: DRBD Syncs the Changes.]

Source : Gen

NN: NameNode
JT: JobTracker

26

Our Hadoop Usage.

Hadoop at Rakuten.

27

Our Hadoop Usage.

Who Is Using Our Hadoop.

1. Generating Recommendation Engine Index.
2. Analyzing Redirect Log.
3. Calculating Ad Targeting Index.
4. Measuring Ad Effects.
5. Analyzing Ichiba Merchandise & Order Info.
6. Calculating Ichiba Product Ranking.
7. Analyzing Search Log.
8. Analyzing Rakuten Travel’s Access Log. (Coming Soon...)
9. Analyzing Search Word N-gram. (Coming Soon...)

28

Our Hadoop Usage.

The Issues of The Previous System.

[Diagram: Previous System. Recoverable labels: Purchase, Shop, ITEM, Category, Marketing, Utility, Mail, NFS, Files, Intermediate (x2), Batch Server, Unload, Load, Manipulate.]

1. The RDBMS Is Costly to Keep Up.
2. Needs More and More Storage Space.
3. The System Cannot Handle So Many Job Requests Due to Low Performance.

29

Our Hadoop Usage.

The Effect of The New System.

[Diagram: New System! 1st Step. Recoverable labels: Purchase, Shop, ITEM, Category, Marketing, Utility, Mail, NFS, Files, Intermediate, Batch Server with, Unload, Load, Manipulate.]

1. Get a Scalable System at Very Low Cost. (80% Off for Storage.)
2. Transaction Time Is Dramatically Improved. (50-75% Off.)

30

Our Hadoop Usage.

The Remaining Issues of The New System.

1. Still Halfway to Our Goal of a DWH.
2. The Negative Effects of the Migration from a Dedicated Environment to a Shared Environment.
   1. Security.
   2. Sharing Cluster Resources.

31

Our Challenge.

Hadoop at Rakuten.

32

Our Challenge.

The Issues with Our Hadoop.

1. Likely to Use Up the HDFS Space.
2. Needs a Lot of Electric Power.
3. Need to Share the Cluster Resources Efficiently.
4. Needs More Network Bandwidth.

33

Our Future Plan.

Hadoop at Rakuten.

34

Our Future Plan.

Considering New Slave Machine.

Now Looking for a Machine Which Has…
Low Electric Power Consumption,
About 6 Cores CPU x2,
About 10TB HDD,
About 96GB Memory,
& Naturally Compatible with Our Data Center.

35

Our Future Plan.

Upgrade from Apache to CDH3.

Mr. Eric Sammer (Solution Architect at Cloudera) Described the Advantages of Hadoop from Cloudera on Quora.

Source : http://www.quora.com/What-are-the-advantages-of-getting-Apache-Hadoop-from-Cloudera-rather-than-the-Apache-Software-Foundation

1. A version of Hadoop that has frequent releases (quarterly) that include bug fixes and back ported features (append for HBase, Kerberos security from Y!, etc.).
2. Related projects (Hive, Pig, Oozie, HBase, Flume, Sqoop, etc.) tested together and work as a cohesive system.
3. Simplified installation via Yum / Apt repositories.
4. Tighter integration with the OS (init scripts for daemons, installation of things in common paths, logs in their proper location).
5. A fixed release schedule.
6. Support available from Cloudera with SLAs.

36

Our Future Plan.

Evaluating HBase Using AWS.

Constructing an HBase Cluster on Amazon EC2.
Doing Evaluation & Verification This Summer!

37

Hadoop at Rakuten.

We Need Many More Hadoopers!
Come With Us!

Need Your Help!

38

Thank You.

Hadoop at Rakuten.
