Strategies in Cluster-Design
Gerolf Ziegenhain, TU Kaiserslautern, Germany
Outline of This Talk
● Look at the technologies once again
● Provide more detail for making decisions
● What to consider?
● What should be avoided at all costs?
● Provide keywords / directions for further reading
● A less organized talk
● Contains personal experience
Making Decisions
● Strategic decisions:
  – Do once; changes are difficult and expensive
● Setup is relatively easy
● Therefore, know some numbers (per person, group, university; a sketch for gathering them follows):
  – #jobs
  – Runtime of jobs
  – CPUs per job
  – Memory per job
  – Coupling of the system: latency / bandwidth
  – HDD storage (also consider final storage)
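A minimal sketch for collecting some of these numbers, assuming a PBS/Torque-style batch system (qstat output columns differ between schedulers, so treat this only as a starting point):

    #!/bin/sh
    # Jobs currently in the system, counted per user.
    qstat | awk 'NR > 2 { print $3 }' | sort | uniq -c | sort -rn

    # Histogram of requested walltimes from the full job listing.
    qstat -f | grep 'Resource_List.walltime' | sort | uniq -c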
Buy or Build?
● Buying
  – Less work
  – Higher costs
  – You will get more than you want
  – The vendor may help with consulting
● Building yourself
  – More work
  – High learning effect
  – Lower costs
  – You get exactly what you buy
Technological Overview

[Diagram: Components of a Cluster — overhead servers (DHCP, NIS, firewall, queue, syslog, boot, mirror, admin), login nodes (Login1, Login2), storage (NAS1–NAS3), and the compute nodes, connected by the cluster network; users (User1–User3) enter via the login nodes.]
A Word on Entropy
● Managing 10 workstations differs a lot from managing a cluster
● Entropy of cables:
  – Sort them immediately
  – Use colors
  – Use hook-and-loop tape
  – Use printed labels
Choice of Hardware
● Nodes
● Networking
● Overhead servers
Choosing Nodes
Example: Google
● Stock hardware
● Custom-built low-tech cases
● Modular approach
● Components
  – Mainboard, CPU, memory
  – 2x HDD (stripe)
  – UPS battery
● Advantages:
  – Cheap
  – High learning effect
Example: BlueGene/P
● PowerPC
● Custom-built
  – Boards
  – Chips
  – Networking
● Advantage:
  – Scales very well
Buy a Rack
● Common Beowulf cluster
● Buy ready-built 19" pizza boxes
● Mount them in a 19" rack
  – Usually 42 U high
● Advantages:
  – Less work
  – High packing density
Use Ready-Built Desktops
Processors and Architectures?
● Know your problem
● What to know about your algorithms:
  – How much memory?
  – Can the problem easily be decomposed?
  – What precision?
● Libraries
  – Do they exist for your problem (e.g. QM calculations)?
  – Do they run on all architectures?
● Choices:
  – Architecture (usually AMD / Intel is a good choice)
  – #CPUs
Storage Management
● Know your problem
● Parameters to know:
  – How much HDD space?
  – What is the typical bandwidth?
    ● Evaluating 100 GB files in real time?
    ● Writing out 1 TB files?
● Choices (an NFS sketch follows):
  – NAS (multiple?)
  – SAN
  – Distributed filesystem
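For the plain NAS option, an NFS export is often all that is needed. A minimal sketch, with hostnames and paths invented for illustration:

    # On the NAS server (nas1): export the user data, /etc/exports
    /export/home  192.168.1.0/255.255.255.0(rw,sync,no_subtree_check)

    # Activate the export table and verify it
    exportfs -ra
    showmount -e localhost

    # On each node: mount it (or add the equivalent /etc/fstab entry)
    mount -t nfs nas1:/export/home /home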
Backup
● RAID ≠ backup
  – You can still destroy your data with rm -rf /my_stuff
● Incremental backup (a sketch follows):
  – Critical user configuration
  – Configuration files
  – The complete overhead-server installation
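A minimal sketch of such an incremental backup using rsync with hard links, so unchanged files cost no extra space; the paths are assumptions:

    #!/bin/sh
    # Snapshot-style incremental backup: files unchanged since the last
    # snapshot are hard-linked, only changed files are copied.
    SRC=/etc                        # critical configuration files
    DST=/backup/$(date +%Y-%m-%d)   # today's snapshot
    LAST=/backup/last               # symlink to the previous snapshot

    rsync -a --delete --link-dest="$LAST" "$SRC" "$DST"
    ln -snf "$DST" "$LAST"          # the next run links against today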
Networking
Types
● Know your problem
● Choices:
  – Bandwidth
    ● Gbit < InfiniBand
    ● Gbit: channel bonding possible (see the sketch below)
  – Latency
    ● Gbit > SCI
  – Scalability
    ● Stacked network switches
    ● Fat-tree architecture
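A minimal sketch of Gbit channel bonding on a Debian system (the ifenslave package is assumed; exact option names depend on the ifenslave version, and the addresses are placeholders):

    # /etc/network/interfaces -- two Gbit ports as one logical link
    auto bond0
    iface bond0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bond-slaves eth0 eth1    # physical ports to aggregate
        bond-mode balance-rr     # stripe packets over both links
        bond-miimon 100          # link check interval in ms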
Switches
● Important parameters:
  – Backplane speed
    ● What is the throughput when all ports are under load?
  – Can it be configured?
    ● Auto-sensing
    ● IP
    ● ARP
    ● ...
  – Stackable?
  – (Uplink ports?)
Which #Cores/Node is Optimal?
● Currently cheapest cost per core: 8 cores per node
● Small systems (≤48 nodes)
  – Doesn't matter, because one switch is enough
● Average systems
  – Do you need all-to-all connections?
  – Use separate rings or change the network topology
  – If you want to stick to single-switched networks: the current optimum is 16 CPUs per node
● Big systems
  – Go for a fat-tree network :)
Infrastructure Requirements
● Cooling
  – Each W burned in a CPU ⇒ heat
● Stable power supply
  – Blackouts?
  – Fluctuations in the voltage level
    ● Cheap power supplies will break on fluctuations
Notes about Power Consumption
● Less power consumption ⇒ less heat ⇒ fewer defects(?)
● Running costs per year can easily reach the initial investment!
  – Do the math ⇒ a blade center could also pay off! (a worked example follows)
● Do not switch all nodes on / off at once
  – Voltage peaks!
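A back-of-the-envelope example with assumed numbers (50 nodes drawing 300 W each, electricity at 0.20 EUR/kWh):

    50 nodes × 300 W             = 15 kW continuous draw
    15 kW × 24 h × 365 d         ≈ 131,000 kWh per year
    131,000 kWh × 0.20 EUR/kWh   ≈ 26,000 EUR per year

Cooling adds a comparable amount on top, so a cluster bought for some tens of thousands of euros can indeed burn its purchase price in running costs within a few years.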
Decomposition of the Servers
Why Separate Login Nodes?
● User interaction
● May hang due to jobs
● Security
  – SSH ports are open
  – May be hacked
● Configuration of user packages
  – System is more on the bleeding edge
Splitting Servers
● Easily >10 overhead tasks
● Why not on one big server?
  – Security (one hole ⇒ all broken)
  – Stability
  – Maintenance
    ● Updates (what was done 3 years ago?)
    ● Dependencies (how do software packages interfere?)
    ● No plugin structure (no testing of different variants)
● Solution
  – Split the tasks ⇒ >10 overhead servers
  – Problems:
    ● Cost
    ● Hardware failures?
Combining Servers
● Use Xen (a configuration sketch follows)
● Host servers: 1...3 servers
  – Tolerant to hardware failures
● Further advantages:
  – Greatly reduced costs
  – Complete rollback possible
  – Try different configurations
    ● Experiments are possible on a limited budget
  – Clear separation of tasks
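A minimal sketch of a Xen guest configuration for one overhead task, in the style of the classic xm toolstack; the kernel version, LVM volume, and service name are made up:

    # /etc/xen/dhcp.cfg -- the DHCP task in its own virtual machine
    kernel  = '/boot/vmlinuz-2.6.26-2-xen-686'
    ramdisk = '/boot/initrd.img-2.6.26-2-xen-686'
    memory  = 256
    name    = 'dhcp'
    vif     = ['bridge=xenbr0']
    disk    = ['phy:/dev/vg0/dhcp-disk,xvda1,w']
    root    = '/dev/xvda1 ro'

Started with xm create dhcp.cfg; a complete rollback is then just restoring the LVM volume from a saved copy.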
Administration
Administration Policies
● Interaction with human beings
  – Difficult social aspects
  – A good administrator is never noticed (the system just works)
● Who has the root password?
● Who will document what has been done?
● Split the work, but communicate:
  – Design decisions
  – Buying, writing grant proposals
  – Installation, bug fixing
  – Educating end users
Administration Policies
● User interaction
  – Keep the users informed (mailing list)
  – Monitor the system to head off problems before they occur (a sketch follows)
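A minimal sketch of that kind of monitoring: a cron job that warns before disks fill up. The threshold and mail address are assumptions; dedicated tools (Nagios, Ganglia) do this more thoroughly:

    #!/bin/sh
    # Run hourly from cron; mails a warning for filesystems over 90% full.
    FULL=$(df -P | awk 'NR > 1 && $5+0 > 90 { print $6, $5 }')
    if [ -n "$FULL" ]; then
        echo "$FULL" | mail -s "disk almost full on $(hostname)" admin@example.org
    fi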
Managing Different Groups
● Impossible!
● Each group has to provide at least one person for:
  – Managing user education
  – Monitoring performance
  – Knowing the needs (⇒ cluster-design decisions)
● ⇒ Sharing an administrator is not possible!
● Sharing resources: possible & meaningful
What is the Critical Data?
● What data has to be stored?
  – User programs
  – Final data
  – May be put on a RAID mirror (see the sketch below)
● What data can be exposed to potential loss?
  – Temporary files
  – May be put on a RAID stripe
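A minimal sketch with mdadm, device names invented: a mirror for the data that must survive, a stripe for the scratch space:

    # RAID-1 (mirror) for critical data: one disk may die without loss.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # RAID-0 (stripe) for temporary files: fast and large, but one dead
    # disk loses the whole array.
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc1 /dev/sdd1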
Compilation
● Custom user programs / libraries
● Where to install?
  – /usr/local/ (system-wide)
  – $HOME (per user)
    ● Autotools make it possible to install a whole distribution in the home directory! (a sketch follows)
  ⇒ Depends on how often the code changes
● Choosing a compiler:
  – GNU compilers are good & free
  – Special CPU instructions: buy a compiler
    ● Intel compiler
    ● Portland compiler
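A minimal sketch of the per-user autotools installation mentioned above; the prefix directory is an arbitrary choice:

    # Build and install an autotools package under $HOME instead of /usr/local
    ./configure --prefix=$HOME/sw
    make && make install

    # Make the private prefix visible, e.g. in ~/.bashrc
    export PATH=$HOME/sw/bin:$PATH
    export LD_LIBRARY_PATH=$HOME/sw/lib:$LD_LIBRARY_PATH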
Security
● University networks are
  – Insecure
  – Prized targets
● Risks:
  – SSH password login
  – Open ports
  – Updating
    ● Keep up to date with serious bugs!
  – Users
● Therefore (attacks will happen on a daily basis!):
  – Use a firewall (a minimal sketch follows)
  – Monitor the system for odd behavior
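A minimal iptables sketch of the firewall advice; the trusted network is a placeholder, and a production ruleset needs considerably more care:

    #!/bin/sh
    # Default-deny inbound; allow loopback, established connections,
    # and SSH from the campus network only.
    iptables -P INPUT DROP
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp --dport 22 -s 192.0.2.0/24 -j ACCEPT   # campus net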
Operating Systems
Which Operating System?
● Different OSes / distributions exist
  – But the configuration is widely compatible
  – The way of doing things differs slightly in the details
    ● E.g. directories / files
  – Watch out for licenses: BSD, GPL, ...
● The OS should provide basic, stable & secure functionality:
  – Linux
    ● Debian
    ● RedHat
    ● SuSE (slow, costly, small community)
  – FreeBSD (more secure, but ~older versions)
  – OpenBSD (most secure)
Updating or Not?
● Motivations:
  – Stability
  – Security
  – Features
● Possible solution:
  – Keep login servers and the firewall up to date
  – Keep computation nodes stable (out of date)
  – Works only if the nodes are on an inner network
Rolling Your Own Distribution
● A possible solution for installation issues
● Possibilities:
  – A from-scratch distribution
  – Modify an existing distribution
  – Compile only custom packages (/usr/local/bin)
  – Keep system HDD images and clone them (a sketch follows)
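A minimal sketch of the cloning idea with dd; device names and the node name are made up, and tools like partimage or Clonezilla are the more comfortable variants:

    # Take a raw image of the installed system disk (boot the node into
    # a rescue system first, so the filesystem is not mounted).
    dd if=/dev/sda bs=1M | gzip > /mirror/node-image.gz

    # Clone the image onto a fresh node over the network.
    gzip -dc /mirror/node-image.gz | ssh root@node42 'dd of=/dev/sda bs=1M'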
Lessons Learned
● Reproducible?
  – Making a distribution is exhausting
● Documentation (wiki)
  – Someday you will have to hand it over
  – Or reinstall
● Keep a complete mirror
  – Packages may vanish
The Gentoo Approach
● Use source packages
● Autotools ⇒ binary files
● Create special configuration files for dependencies
  – In Gentoo: portage (→ corvix: egatrop)
  – In BSD: ports
● Alternatives:
  – Linux From Scratch
    ● Missing the configuration files
    ● Relies on autotools
  – Arch Linux
● Websites are good sources for step-by-step howtos
The Debian Approach
● Compile once, distribute binary packages
● Create custom packages with only one command (a sketch follows)
● Advantages:
  – Extremely fast
  – Easier to maintain for a big number of servers
  – Embedded devices use a similar package architecture
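The "one command" is dpkg-deb once a package tree exists; a minimal sketch with an invented package:

    # Package tree: the payload plus a DEBIAN/control file
    mkdir -p mytool-1.0/DEBIAN mytool-1.0/usr/local/bin
    cp mytool mytool-1.0/usr/local/bin/
    cat > mytool-1.0/DEBIAN/control <<EOF
    Package: mytool
    Version: 1.0
    Architecture: i386
    Maintainer: admin@example.org
    Description: locally built cluster tool
    EOF

    # Build the binary package; install it everywhere with dpkg -i
    dpkg-deb --build mytool-1.0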
Our Solution
● Stable base system:
  – Debian overlays
    ● An additional package source with custom packages (see the sketch below)
  – Xen images of the installed Debian system ⇒ even faster reinstallations
● Custom software
  – E.g. user-demanded libraries
  – Compiled in ~
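A minimal sketch of such an overlay: a flat package repository served over HTTP, with the host and path invented:

    # On the mirror server: index the custom .deb files
    cd /var/www/overlay
    dpkg-scanpackages . /dev/null | gzip > Packages.gz

    # On every node: add the overlay to /etc/apt/sources.list and update
    echo 'deb http://mirror.example.org/overlay ./' >> /etc/apt/sources.list
    apt-get update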
Other Cluster Distributions
● Debian-based / RedHat-based ones exist
  – E.g. RocksCluster, CentOS, PelicanHPC, Corvix, ...
● A good source of howtos
● Good as a cheat sheet
● But:
  – HPC is inherently customized
  – Flexibility is highest with a customized installation
  – None of the distros solved a problem that we had
Thank You!
● Acknowledgements:
  – Prof. Dr. rer. nat. Herbert M. Urbassek, TU Kaiserslautern, Germany