Lessons from Building Large Clusters

©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Phil Day, HP Consulting

8th November 2010

Small vs Large Clusters

Small Production Clusters and Proof of Concept

– Built and run by a few skilful people

– Can be a natural extension to conventional IT

– You know the servers by name

Large Production Clusters

– Built and run by pioneers

– Large development staff

– Major Hadoop contributors

– Understand the problems of scale

Images: Creative Commons 2.0 – Attribution Andrew Morrell (Flickr)

Large Scale Early Adopters

– Have, or want to start with, a small PoC (tens of nodes)

– Want to scale quickly to a large cluster (hundreds of nodes)

– Want the scale of large clusters, but with the build and operational model of a small one

– Want to run the cluster rather than build and develop it

– Need to integrate it with existing systems

Unfortunately not all things in life scale as well as Hadoop

Design – The Technology Challenge

Build – The Engineering Challenge

Transfer to Operations – The Service Management Challenge

Design – The Technology Challenge

Selecting all the right bits

Server Selection

– Core Nodes: Resilient, Big Memory, RAID

– Data Nodes: Not resilient, no RAID or hot swap, basic iLO

– Trade off Disks vs Cores vs Memory to match target load

– Need to consider disk allocation policy

– Network redundancy is useful to avoid rack switch failures

– Edge Nodes (Data ingress/egress & Mgmt)

  – Higher spec data nodes

  – Help provide the “appliance” view of the cluster

  – Have Hadoop installed but don’t run as part of the cluster

Network Selection

– Dual 1Gb from data nodes to rack switches

– 10Gb from rack switches to core, and from Edge nodes
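The network choices above imply an oversubscription ratio at each rack switch, which is simple arithmetic to check. The sketch below is illustrative only: the number of data nodes per rack and the number of 10Gb uplinks per rack switch are assumptions, not figures from this deck, and `rack_oversubscription` is a hypothetical helper name.

```python
def rack_oversubscription(nodes_per_rack, links_per_node=2, link_gbps=1.0,
                          uplinks=1, uplink_gbps=10.0):
    """Ratio of node-facing bandwidth to uplink bandwidth for one rack switch."""
    downstream = nodes_per_rack * links_per_node * link_gbps  # data nodes to rack switch
    upstream = uplinks * uplink_gbps                          # rack switch to core
    return downstream / upstream


# Assumed example: 20 data nodes in a rack, dual 1Gb each, one 10Gb uplink.
ratio = rack_oversubscription(nodes_per_rack=20)
print(f"{ratio:.1f}:1")  # prints "4.0:1"
```

If the dual links are used purely for redundancy rather than bonded for throughput, pass `links_per_node=1` to get the effective ratio.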

Build – The Engineering Challenge

Do you realise how many cardboard boxes that is?

Building at the scale of 500+ servers has its own set of problems

• Space and Environment

• Consistency of Build

• Failures during the Build

• Deployment time and the cost of rework

Two things we found very helpful:

Factory Integration Services

Cluster Management Utility

Build – HP Factory Integration Services

Reducing risk and time

• Many years' experience of building large clusters

• Site inspection

• Build, Configure, Soak Test

• Diagnose and fix DOA (dead-on-arrival) units

• Rack and Label

• Asset tagging

• Custom build and set-up

• Pack and Ship

• On-Site build and integration

www.hp.com/go/factoryexpress

Complex solutions ... made simple

Build – HP Cluster Management Utility

Rack-aware deployment and monitoring

• Proven cluster deployment and management tool

  – 11 years of experience

  – Proven with clusters of 3500+ nodes

• Deployment

  – Network and power load aware deployment

  – Easily extensible

  – Kickstart integration

• Monitoring

  – Scalable, non-intrusive monitoring

  – Collectl integration

• Administration

  – Command line or GUI

  – Cluster-wide configuration

www.hp.com/go/cmu
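One way to read "network and power load aware deployment" is that nodes are installed in waves, with a cap on how many nodes in any one rack are being imaged at the same time. The sketch below illustrates that idea only. It is not CMU's algorithm or API; the function name, node names and per-rack limit are all hypothetical.

```python
from collections import defaultdict


def deployment_waves(node_to_rack, max_per_rack=4):
    """Group nodes into waves with at most max_per_rack nodes from any rack per wave."""
    by_rack = defaultdict(list)
    for node, rack in sorted(node_to_rack.items()):
        by_rack[rack].append(node)

    waves = []
    while any(by_rack.values()):          # loop until every rack's queue is empty
        wave = []
        for rack in sorted(by_rack):
            wave.extend(by_rack[rack][:max_per_rack])      # take a capped slice
            by_rack[rack] = by_rack[rack][max_per_rack:]   # keep the remainder
        waves.append(wave)
    return waves


# Assumed example: 3 racks of 10 nodes each.
nodes = {f"r{r}n{n:02d}": f"rack{r}" for r in range(1, 4) for n in range(1, 11)}
for i, wave in enumerate(deployment_waves(nodes, max_per_rack=4), start=1):
    print(i, len(wave))  # wave sizes: 12, 12, 6
```

Spreading each wave across racks keeps the load on any single rack switch and power feed bounded while the build still proceeds in parallel.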

CMU Dashboard

Cluster Performance over time

[Figure: cluster-wide CPU, Disk (read), Disk (write) and Network panels, with series labelled Map and Red, over a time axis marked 05:00, 10:00 and 15:00]

Operate – The Organisational Challenge

How do we know when it's working?

Clusters are not just large numbers of servers

• At scale it may never be 100% up (like a network), but it can be 100% down (like a server)

• Need to think more in terms of “How healthy is it?”

• Core nodes are important

• Data nodes much less so – unless they fail in patterns

• Edge nodes – somewhere in between

• Look at HDFS health for replication counts

• Nagios & Ganglia

• Collectl / CMU to visualise the cluster
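The "how healthy is it?" view can be sketched as a small classifier that treats node roles differently and flags data node failures that cluster in one rack. The thresholds, role labels and status names below are assumptions chosen for illustration, not values from this deck.

```python
from collections import Counter


def cluster_health(nodes, rack_pattern_threshold=0.5, data_down_threshold=0.1):
    """Classify health from (name, role, rack, is_up) tuples.

    role is "core", "edge" or "data" (hypothetical labels).
    """
    down = [n for n in nodes if not n[3]]

    # Core nodes are important: any core node down is critical.
    if any(role == "core" for _, role, _, _ in down):
        return "critical"

    # Edge nodes sit somewhere in between.
    if any(role == "edge" for _, role, _, _ in down):
        return "degraded"

    data_total = Counter(rack for _, role, rack, _ in nodes if role == "data")
    data_down = Counter(rack for _, role, rack, _ in down if role == "data")

    # Pattern check: several data nodes in the same rack down together.
    for rack, count in data_down.items():
        if count >= 2 and count / data_total[rack] >= rack_pattern_threshold:
            return "degraded"

    # Too many data nodes down overall, even if scattered.
    total = sum(data_total.values())
    if total and sum(data_down.values()) / total >= data_down_threshold:
        return "degraded"

    return "healthy"  # isolated data node failures are tolerated


# Assumed example: 1 core node, 1 edge node, 4 racks of 10 data nodes, all up.
nodes = [("nn1", "core", "r1", True), ("edge1", "edge", "r1", True)]
nodes += [(f"d{r}-{i}", "data", f"r{r}", True) for r in range(1, 5) for i in range(10)]
print(cluster_health(nodes))  # prints "healthy"
```

HDFS replication state, as reported by `hadoop fsck`, would be a further input to a check like this.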

Summary

Key considerations when building a large cluster

• Use a pilot system to establish your server configuration

• Stand on the shoulders of the Pioneers

• Build and test in the factory if you can

• Consistency in the build and configuration is vital

• Cherish the NameNode, protect the Edge Nodes, and develop the right level of indifference to the Data Nodes

• Practise the key recovery cases

• Match training and support to the service expectations

And remember not all things in life scale as well as Hadoop

Questions?