Strategies in Cluster-Design
Gerolf Ziegenhain, TU Kaiserslautern, Germany
Outline of This Talk
● Look at the technologies once again
● Provide more detail for making decisions
● What to consider?
● What should be avoided at all costs?
● Provide keywords / directions for further reading
● A less organized talk
● Contains personal experience
Making Decisions
● Strategic decisions:
  – Do once; changes are difficult and expensive
● Setup is relatively easy
● Therefore, know some numbers (per person, group, university; a sketch for gathering them follows):
  – #jobs
  – Runtime of jobs
  – CPUs per job
  – Memory per job
  – Coupling of the system: latency / bandwidth
  – HDD storage (also consider final storage)
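A minimal sketch for collecting some of these numbers, assuming a PBS/Torque-style batch system (qstat output columns differ between schedulers, so treat this only as a starting point):

    #!/bin/sh
    # Jobs currently in the system, counted per user.
    qstat | awk 'NR > 2 { print $3 }' | sort | uniq -c | sort -rn

    # Histogram of requested walltimes from the full job listing.
    qstat -f | grep 'Resource_List.walltime' | sort | uniq -c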
Buy or Build?
● Buying
  – Less work
  – Higher costs
  – You will get more than you want
  – The vendor may help with consulting
● Building yourself
  – More work
  – High learning effect
  – Lower costs
  – You get exactly what you buy
Technological Overview

[Diagram: Components of a Cluster — overhead servers (DHCP, NIS, firewall, queue, syslog, boot, mirror, admin), login nodes (Login1, Login2), storage (NAS1–NAS3), and the compute nodes, connected by the cluster network; users (User1–User3) enter via the login nodes.]
A Word on Entropy
● Managing 10 workstations differs a lot from managing a cluster
● Entropy of cables:
  – Sort them immediately
  – Use colors
  – Use hook-and-loop tape
  – Use printed labels
Choice of Hardware
● Nodes
● Networking
● Overhead servers
Choosing Nodes
Example: Google
● Stock hardware
● Custom-built low-tech cases
● Modular approach
● Components
  – Mainboard, CPU, memory
  – 2x HDD (stripe)
  – UPS battery
● Advantages:
  – Cheap
  – High learning effect
Example: BlueGene/P
● PowerPC
● Custom-built
  – Boards
  – Chips
  – Networking
● Advantage:
  – Scales very well
Buy a Rack
● Common Beowulf cluster
● Buy ready-built 19" pizza boxes
● Mount them in a 19" rack
  – Usually 42 U high
● Advantages:
  – Less work
  – High packing density
Use Ready-Built Desktops
Processors and Architectures?
● Know your problem
● What to know about your algorithms:
  – How much memory?
  – Can the problem easily be decomposed?
  – What precision?
● Libraries
  – Do they exist for your problem (e.g. QM calculations)?
  – Do they run on all architectures?
● Choices:
  – Architecture (usually AMD / Intel is a good choice)
  – #CPUs
Storage Management
● Know your problem
● Parameters to know:
  – How much HDD space?
  – What is the typical bandwidth?
    ● Evaluating 100 GB files in real time?
    ● Writing out 1 TB files?
● Choices (an NFS sketch follows):
  – NAS (multiple?)
  – SAN
  – Distributed filesystem
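For the plain NAS option, an NFS export is often all that is needed. A minimal sketch, with hostnames and paths invented for illustration:

    # On the NAS server (nas1): export the user data, /etc/exports
    /export/home  192.168.1.0/255.255.255.0(rw,sync,no_subtree_check)

    # Activate the export table and verify it
    exportfs -ra
    showmount -e localhost

    # On each node: mount it (or add the equivalent /etc/fstab entry)
    mount -t nfs nas1:/export/home /home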
Backup
● RAID ≠ backup
  – You can still destroy your data with rm -rf /my_stuff
● Incremental backup (a sketch follows):
  – Critical user configuration
  – Configuration files
  – The complete overhead-server installation
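A minimal sketch of such an incremental backup using rsync with hard links, so unchanged files cost no extra space; the paths are assumptions:

    #!/bin/sh
    # Snapshot-style incremental backup: files unchanged since the last
    # snapshot are hard-linked, only changed files are copied.
    SRC=/etc                        # critical configuration files
    DST=/backup/$(date +%Y-%m-%d)   # today's snapshot
    LAST=/backup/last               # symlink to the previous snapshot

    rsync -a --delete --link-dest="$LAST" "$SRC" "$DST"
    ln -snf "$DST" "$LAST"          # the next run links against today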
Networking
Types
● Know your problem
● Choices:
  – Bandwidth
    ● Gbit < InfiniBand
    ● Gbit: channel bonding possible (see the sketch below)
  – Latency
    ● Gbit > SCI
  – Scalability
    ● Stacked network switches
    ● Fat-tree architecture
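A minimal sketch of Gbit channel bonding on a Debian system (the ifenslave package is assumed; exact option names depend on the ifenslave version, and the addresses are placeholders):

    # /etc/network/interfaces -- two Gbit ports as one logical link
    auto bond0
    iface bond0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bond-slaves eth0 eth1    # physical ports to aggregate
        bond-mode balance-rr     # stripe packets over both links
        bond-miimon 100          # link check interval in ms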
Switches
● Important parameters:
  – Backplane speed
    ● What is the throughput when all ports are under load?
  – Can it be configured?
    ● Auto-sensing
    ● IP
    ● ARP
    ● ...
  – Stackable?
  – (Uplink ports?)
Which #Cores/Node is Optimal?
● Currently cheapest cost per core: 8 cores per node
● Small systems (≤48 nodes)
  – Doesn't matter, because one switch is enough
● Average systems
  – Do you need all-to-all connections?
  – Use separate rings or change the network topology
  – If you want to stick to single-switched networks: the current optimum is 16 CPUs per node
● Big systems
  – Go for a fat-tree network :)
Infrastructure Requirements
● Cooling
  – Each W burned in a CPU ⇒ heat
● Stable power supply
  – Blackouts?
  – Fluctuations in the voltage level
    ● Cheap power supplies will break on fluctuations
Notes about Power Consumption
● Less power consumption ⇒ less heat ⇒ fewer defects(?)
● Running costs per year can easily reach the initial investment!
  – Do the math ⇒ a blade center could also pay off! (a worked example follows)
● Do not switch all nodes on / off at once
  – Voltage peaks!
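A back-of-the-envelope example with assumed numbers (50 nodes drawing 300 W each, electricity at 0.20 EUR/kWh):

    50 nodes × 300 W             = 15 kW continuous draw
    15 kW × 24 h × 365 d         ≈ 131,000 kWh per year
    131,000 kWh × 0.20 EUR/kWh   ≈ 26,000 EUR per year

Cooling adds a comparable amount on top, so a cluster bought for some tens of thousands of euros can indeed burn its purchase price in running costs within a few years.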
Decomposition of the Servers
Why Separate Login Nodes?
● User interaction
● May hang due to jobs
● Security
  – SSH ports are open
  – May be hacked
● Configuration of user packages
  – System is more on the bleeding edge
Splitting Servers
● Easily >10 overhead tasks
● Why not on one big server?
  – Security (one hole ⇒ all broken)
  – Stability
  – Maintenance
    ● Updates (what was done 3 years ago?)
    ● Dependencies (how do software packages interfere?)
    ● No plugin structure (no testing of different variants)
● Solution
  – Split the tasks ⇒ >10 overhead servers
  – Problems:
    ● Cost
    ● Hardware failures?
Combining Servers
● Use Xen (a configuration sketch follows)
● Host servers: 1...3 servers
  – Tolerant to hardware failures
● Further advantages:
  – Greatly reduced costs
  – Complete rollback possible
  – Try different configurations
    ● Experiments are possible on a limited budget
  – Clear separation of tasks
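A minimal sketch of a Xen guest configuration for one overhead task, in the style of the classic xm toolstack; the kernel version, LVM volume, and service name are made up:

    # /etc/xen/dhcp.cfg -- the DHCP task in its own virtual machine
    kernel  = '/boot/vmlinuz-2.6.26-2-xen-686'
    ramdisk = '/boot/initrd.img-2.6.26-2-xen-686'
    memory  = 256
    name    = 'dhcp'
    vif     = ['bridge=xenbr0']
    disk    = ['phy:/dev/vg0/dhcp-disk,xvda1,w']
    root    = '/dev/xvda1 ro'

Started with xm create dhcp.cfg; a complete rollback is then just restoring the LVM volume from a saved copy.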
Administration
Administration Policies
● Interaction with human beings
  – Difficult social aspects
  – A good administrator is never noticed (the system just works)
● Who has the root password?
● Who will document what has been done?
● Split the work, but communicate:
  – Design decisions
  – Buying, writing grant proposals
  – Installation, bug fixing
  – Educating end users
Administration Policies
● User interaction
  – Keep the users informed (mailing list)
  – Monitor the system to head off problems before they occur (a sketch follows)
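A minimal sketch of that kind of monitoring: a cron job that warns before disks fill up. The threshold and mail address are assumptions; dedicated tools (Nagios, Ganglia) do this more thoroughly:

    #!/bin/sh
    # Run hourly from cron; mails a warning for filesystems over 90% full.
    FULL=$(df -P | awk 'NR > 1 && $5+0 > 90 { print $6, $5 }')
    if [ -n "$FULL" ]; then
        echo "$FULL" | mail -s "disk almost full on $(hostname)" admin@example.org
    fi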
Managing Different Groups
● Impossible!
● Each group has to provide at least one person for:
  – Managing user education
  – Monitoring performance
  – Knowing the needs (⇒ cluster-design decisions)
● ⇒ Sharing an administrator is not possible!
● Sharing resources: possible & meaningful
What is the Critical Data?
● What data has to be stored?
  – User programs
  – Final data
  – May be put on a RAID mirror (see the sketch below)
● What data can be exposed to potential loss?
  – Temporary files
  – May be put on a RAID stripe
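A minimal sketch with mdadm, device names invented: a mirror for the data that must survive, a stripe for the scratch space:

    # RAID-1 (mirror) for critical data: one disk may die without loss.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # RAID-0 (stripe) for temporary files: fast and large, but one dead
    # disk loses the whole array.
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc1 /dev/sdd1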
Compilation
● Custom user programs / libraries
● Where to install?
  – /usr/local/ (system-wide)
  – $HOME (per user)
    ● Autotools make it possible to install a whole distribution in the home directory! (a sketch follows)
  ⇒ Depends on how often the code changes
● Choosing a compiler:
  – GNU compilers are good & free
  – Special CPU instructions: buy a compiler
    ● Intel compiler
    ● Portland compiler
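A minimal sketch of the per-user autotools installation mentioned above; the prefix directory is an arbitrary choice:

    # Build and install an autotools package under $HOME instead of /usr/local
    ./configure --prefix=$HOME/sw
    make && make install

    # Make the private prefix visible, e.g. in ~/.bashrc
    export PATH=$HOME/sw/bin:$PATH
    export LD_LIBRARY_PATH=$HOME/sw/lib:$LD_LIBRARY_PATH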
Security
● University networks are
  – Insecure
  – Prized targets
● Risks:
  – SSH password login
  – Open ports
  – Updating
    ● Keep up to date with serious bugs!
  – Users
● Therefore (attacks will happen on a daily basis!):
  – Use a firewall (a minimal sketch follows)
  – Monitor the system for odd behavior
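A minimal iptables sketch of the firewall advice; the trusted network is a placeholder, and a production ruleset needs considerably more care:

    #!/bin/sh
    # Default-deny inbound; allow loopback, established connections,
    # and SSH from the campus network only.
    iptables -P INPUT DROP
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp --dport 22 -s 192.0.2.0/24 -j ACCEPT   # campus net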
Operating Systems
Which Operating System?
● Different OSes / distributions exist
  – But the configuration is widely compatible
  – The way of doing things differs slightly in the details
    ● E.g. directories / files
  – Watch out for licenses: BSD, GPL, ...
● The OS should provide basic, stable & secure functionality:
  – Linux
    ● Debian
    ● RedHat
    ● SuSE (slow, costly, small community)
  – FreeBSD (more secure, but ~older versions)
  – OpenBSD (most secure)
Updating or Not?
● Motivations:
  – Stability
  – Security
  – Features
● Possible solution:
  – Keep login servers and the firewall up to date
  – Keep computation nodes stable (out of date)
  – Works only if the nodes are on an inner network
Rolling Your Own Distribution
● A possible solution for installation issues
● Possibilities:
  – A from-scratch distribution
  – Modify an existing distribution
  – Compile only custom packages (/usr/local/bin)
  – Keep system HDD images and clone them (a sketch follows)
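A minimal sketch of the cloning idea with dd; device names and the node name are made up, and tools like partimage or Clonezilla are the more comfortable variants:

    # Take a raw image of the installed system disk (boot the node into
    # a rescue system first, so the filesystem is not mounted).
    dd if=/dev/sda bs=1M | gzip > /mirror/node-image.gz

    # Clone the image onto a fresh node over the network.
    gzip -dc /mirror/node-image.gz | ssh root@node42 'dd of=/dev/sda bs=1M'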
Lessons Learned
● Reproducible?
  – Making a distribution is exhausting
● Documentation (wiki)
  – Someday you will have to hand it over
  – Or reinstall
● Keep a complete mirror
  – Packages may vanish
The Gentoo Approach
● Use source packages
● Autotools ⇒ binary files
● Create special configuration files for dependencies
  – In Gentoo: portage (→ corvix: egatrop)
  – In BSD: ports
● Alternatives:
  – Linux From Scratch
    ● Missing the configuration files
    ● Relies on autotools
  – Arch Linux
● Websites are good sources for step-by-step howtos
The Debian Approach
● Compile once, distribute binary packages
● Create custom packages with only one command (a sketch follows)
● Advantages:
  – Extremely fast
  – Easier to maintain for a big number of servers
  – Embedded devices use a similar package architecture
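The "one command" is dpkg-deb once a package tree exists; a minimal sketch with an invented package:

    # Package tree: the payload plus a DEBIAN/control file
    mkdir -p mytool-1.0/DEBIAN mytool-1.0/usr/local/bin
    cp mytool mytool-1.0/usr/local/bin/
    cat > mytool-1.0/DEBIAN/control <<EOF
    Package: mytool
    Version: 1.0
    Architecture: i386
    Maintainer: admin@example.org
    Description: locally built cluster tool
    EOF

    # Build the binary package; install it everywhere with dpkg -i
    dpkg-deb --build mytool-1.0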
Our Solution
● Stable base system:
  – Debian overlays
    ● An additional package source with custom packages (see the sketch below)
  – Xen images of the installed Debian system ⇒ even faster reinstallations
● Custom software
  – E.g. user-demanded libraries
  – Compiled in ~
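A minimal sketch of such an overlay: a flat package repository served over HTTP, with the host and path invented:

    # On the mirror server: index the custom .deb files
    cd /var/www/overlay
    dpkg-scanpackages . /dev/null | gzip > Packages.gz

    # On every node: add the overlay to /etc/apt/sources.list and update
    echo 'deb http://mirror.example.org/overlay ./' >> /etc/apt/sources.list
    apt-get update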
Other Cluster Distributions
● Debian-based / RedHat-based ones exist
  – E.g. RocksCluster, CentOS, PelicanHPC, Corvix, ...
● A good source of howtos
● Good as a cheat sheet
● But:
  – HPC is inherently customized
  – Flexibility is highest with a customized installation
  – None of the distros solved a problem that we had
Thank You!
● Acknowledgements:
  – Prof. Dr. rer. nat. Herbert M. Urbassek, TU Kaiserslautern, Germany