Server Computing – Motivation and Overview
Prof. Dr. Andreas Polze Hasso Plattner Institute for Software Engineering at University Potsdam [email protected]
The Shifting Paradigm

              Mainframes            PCs                   Service Grids
Focus         Hardware              Software              Middleware
Key player    IBM                   Microsoft             ???
Standards     closed, proprietary   closed, proprietary   open standards, WebServices

[Diagram labels: PC1, PC2, PC3 – Cluster – SuperDome]
OS/360, OS/390, zOS: RACF, JES2, licensing
OpenVMS: clustering, failover, versioning file system
Computer Classification
MIMD lives on
More Complications...
Long retired...
Sequent Symmetry
Intel Paragon
Milestone in Computer Architecture
Each node runs one OS instance (Mach)
Servers have evolved...
• New form factors
• Higher density
• Standard architectures (x64, Itanium)
• Multicore/multithreaded architectures

Advances in operating systems
• Virtualization
• Trustworthiness/security
• Clustering

Need for new programming models, software architectures, services
Some problems remain...
Green IT
Server consolidation will lead to better energy efficiency
Dependability
Umbrella term for operational requirements on a system
■ "Trustworthiness of a computer system such that reliance can be placed on the service it delivers to the user" [Laprie]
General question: how to deal with unexpected events?
System Quality
Peter Tröger, Felix Salfner, Andreas Polze, Operating Systems and Middleware Group

Dependability Research
[Timeline figure: from mission-critical applications with hardware solutions, via large-scale clusters and distributed systems with combined solutions, to large-scale many-core servers with software and combined solutions]
Hardware Revolution in the X86 World
[Figure: heterogeneous computing, memory hierarchy, many-core, processor interconnect]
Hypothesis: Reliability Wisdoms Replaced
Dramatic shift in single-machine reliability aspects
■ SMP becomes a heterogeneous tiled on-chip network
■ Decreasing structural sizes + dynamic frequency and voltage scaling
■ Massive memory increase
More fault classes, less error containment!

Few research results from the HPC perspective [Bianca Schroeder et al.]
■ Type and intensity of workload significantly influence lifetime
■ Failure rates depend on processor count, not hardware type
Observations
Traditional hardware fault models need an update
■ Memory with increased density and data rates
■ Groups of 'simple' cores instead of a monolithic processor
■ Interconnect as crucial component, fault-isolation issues

Reactive fault tolerance becomes inappropriate
■ Recovery time correlates with system size
■ 24/7 business availability demands pro-active fault tolerance
■ Reactive does not scale (example: HPC)

Virtualization as new system layer
■ Dependability of (hardware-supported) hypervisors

Weak tool support for reliability research
■ Missing consideration of 'below-OS' testing
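The point that reactive fault tolerance does not scale can be illustrated with a back-of-the-envelope model, a sketch under stated assumptions (independent, exponentially distributed node failures; Young's approximation for the checkpoint interval; all numbers are illustrative, not measurements):

```python
# Illustrative model: why reactive checkpoint/restart stops scaling.
import math

def system_mtbf(node_mtbf_h, nodes):
    """System MTBF shrinks linearly with node count (independent failures)."""
    return node_mtbf_h / nodes

def optimal_interval(checkpoint_h, mtbf_h):
    """Young's approximation: t_opt = sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_h * mtbf_h)

def utilization(checkpoint_h, mtbf_h):
    """Rough fraction of time spent on useful work
    (ignores restart and recomputation cost)."""
    t = optimal_interval(checkpoint_h, mtbf_h)
    return t / (t + checkpoint_h)

node_mtbf_h = 50_000   # hypothetical: ~5.7 years per node
checkpoint_h = 0.5     # hypothetical: 30 min to write a global checkpoint

for nodes in (100, 10_000, 1_000_000):
    mtbf = system_mtbf(node_mtbf_h, nodes)
    print(f"{nodes:>9} nodes: MTBF {mtbf:8.2f} h, "
          f"useful-work fraction {utilization(checkpoint_h, mtbf):.3f}")
```

Even with generous per-node reliability, the useful-work fraction collapses as the node count grows, which is the argument for pro-active techniques.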
HPI FutureSOC Lab
■ Collaboration with industry for software research on next-generation X86 hardware (32-65 cores, 1-2 TB RAM)
Active research work
■ Gain understanding of new fault classes
■ Failure prediction based on cross-level monitoring data analysis
■ Pro-active virtual machine migration
■ Fault injection based on UEFI firmware technology
■ Parallelized build processing
CPU level: Online Hardware Failure Prediction
Using X86 hardware performance events
■ Instruction retirement, cache misses, branch misprediction, ...
□ Limited number of hardware counter units → exploit event correlations
□ Threshold-triggered, time-triggered
■ Applicable to major cellular multiprocessing platforms (Intel, AMD, SPARC, IBM Power)
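Exploiting event correlations can be sketched as follows: with only a few counter units, two events that correlate strongly need not both be measured, since one can be estimated from the other and its counter unit reused. The event names and sample values here are made up for illustration:

```python
# Sketch (hypothetical data): sharing hardware counter units by
# exploiting correlations between performance events.
from math import sqrt

# Samples taken while both events shared a measurement interval:
cache_misses  = [1200, 1350, 900, 2100, 1800, 950, 1600, 1400]
instr_retired = [52000, 50000, 61000, 39000, 42000, 60000, 45000, 49000]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

r = pearson(cache_misses, instr_retired)
if abs(r) > 0.9:
    # Strong (here: negative) correlation: stop counting one event,
    # estimate it from the other, and free the counter unit.
    print(f"correlated (r = {r:.2f}) -> share one counter unit")
```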
Memory level: observations from our FutureSOC Lab
Date | Severity | Event | Source | Description
15-Jun-2010 13:47:12 | Info | No | BIOS | System boot (POST complete)
15-Jun-2010 13:45:53 | Major | No | [0x00:00] | POST - 'MEM4_DIMM-2D' memory training failed
15-Jun-2010 13:45:53 | Major | No | [0x00:00] | POST - 'MEM4_DIMM-1D' memory training failed
15-Jun-2010 13:45:53 | Major | No | [0x00:00] | POST - 'MEM4_DIMM-2B' memory training failed
15-Jun-2010 13:45:53 | Major | No | [0x00:00] | POST - 'MEM4_DIMM-1B' memory training failed
15-Jun-2010 13:45:53 | Critical | Yes | SMI | 'MEM4_DIMM-1D' Memory: Uncorrectable error (ECC)
15-Jun-2010 13:45:53 | Critical | Yes | SMI | 'MEM4_DIMM-1C' Memory: Uncorrectable error (ECC)
15-Jun-2010 13:45:53 | Critical | Yes | SMI | 'MEM4_DIMM-1B' Memory: Uncorrectable error (ECC)
15-Jun-2010 13:45:53 | Critical | Yes | SMI | 'MEM4_DIMM-1A' Memory: Uncorrectable error (ECC)
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-2D': Memory module failed (disabled)
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module failed (disabled)
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-2B': Memory module failed (disabled)
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1B': Memory module failed (disabled)
15-Jun-2010 13:43:43 | Info | No | BIOS | System boot (POST complete)
14-Jun-2010 17:41:47 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module error
14-Jun-2010 17:26:17 | Major | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module failure predicted
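Logs like the excerpt above lend themselves to simple automated analysis. A minimal sketch, assuming the pipe-separated line format shown (the log lines are taken from the excerpt; the matching logic is an illustration, not the lab's tooling):

```python
# Sketch: find DIMMs whose failure was predicted before they failed,
# based on management-controller log lines in the format shown above.
import re

log = """\
14-Jun-2010 17:26:17 | Major | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module failure predicted
14-Jun-2010 17:41:47 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module error
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module failed (disabled)
"""

predicted, failed = set(), set()
for line in log.splitlines():
    m = re.search(r"'(MEM\d+_DIMM-\w+)': Memory module (.+)", line)
    if not m:
        continue
    dimm, event = m.groups()
    if "failure predicted" in event:
        predicted.add(dimm)
    elif "failed" in event or "error" in event:
        failed.add(dimm)

# DIMMs where a prediction preceded the actual failure:
print(sorted(predicted & failed))   # ['MEM4_DIMM-1D']
```

In the excerpt, the failure of MEM4_DIMM-1D was predicted roughly 15 minutes before the module error, which is exactly the lead time pro-active techniques want to exploit.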
z/PDT + z/OS R11: the mainframe in our lab
Predictive Failure Analysis (PFA) and Runtime Diagnostics (RD) in z/OS R11 ('sick but not dead' incidents)

OS level: our NTrace for Windows
Compiler/linker switches
■ /hotpatch, /functionpadmin
■ Microsoft C compiler shipped with Windows Server 2003 SP1 and later
Hotpatchable:
■ Windows Server 2003 SP1, Vista, Server 2008, Windows 7
■ Windows Research Kernel
[Figure: hotpatching a running function – a CallProxy/EntryThunk pair redirects calls from Foo to the instrumented Foo-5]
"Tracing in a running computer system" ("Ablaufverfolgung in einem laufenden Computersystem"), Pat. pend. DE-10 1009 038 177.5

Hotpatchable function prologue (example: NtfsPinMappedData):
    mov edi, edi        ; 2-byte no-op, patchable with a short jump
    push ebp
    mov ebp, esp
    mov ecx, [ebp+18h]
    mov edx, [ebp+0Ch]
    ...
    retn 10
    nop nop nop nop nop ; /functionpadmin padding for a long jump
Monitoring on application (server) level
• Request package enters platform (source: WSQM)
• Service reachable, but broken (source: Laprie)
• Time for EJB / handler processing (source: JSR-77)
• Finished requests / uptime (source: WSQM)
• Service not reachable (source: WSLA)
[Diagram labels: Service, Resource]

Meta Predictor
Ensemble learning:
• Boosts accuracy – which failure-prone situations can best be identified by hardware, OS, or VMM failure predictors?
• Domain knowledge – operating system vendors know their system best and can provide the most advanced predictor on the OS level
• Pluggable – domain predictors provided by an application vendor can easily be integrated into our anticipatory virtualization architecture
• Ensemble learning can combine predictions across all system levels
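The combination step above can be sketched as weighted voting over per-level failure probabilities. The weights, predictor names, and probabilities below are illustrative assumptions, not values from the lecture:

```python
# Minimal sketch of a meta-predictor: weighted average of per-level
# failure probabilities. In practice the weights would be learned
# from each predictor's past accuracy (ensemble learning).

def meta_predict(predictions, weights, threshold=0.5):
    """Combine per-level failure probabilities into one score."""
    total = sum(weights.values())
    score = sum(predictions[level] * w for level, w in weights.items()) / total
    return score, score >= threshold

# Hypothetical outputs of four domain predictors:
predictions = {"hardware": 0.9, "os": 0.4, "vmm": 0.7, "application": 0.2}
# Hypothetical weights reflecting past predictor accuracy:
weights = {"hardware": 3.0, "os": 2.0, "vmm": 1.0, "application": 1.0}

score, migrate = meta_predict(predictions, weights)
print(f"failure risk {score:.2f} -> trigger migration: {migrate}")
```

The pluggability claim maps directly onto the dictionaries: a new domain predictor is just another entry with its own weight.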
Our idea: Global System Health Indicator
Peter Tröger, Felix Salfner, Andreas Polze, Operating Systems and Middleware Group

[Figure: a server blade runs a bare-metal VMM on a multi-core CPU with mainboard and devices; guest OSes host application servers and their workloads. Monitoring sources per level:
• Hardware level – Machine Check Architecture, CPU hardware profiling
• VMM level – VMware vProbe
• Operating system level – DTrace, Windows Monitoring Kernel
• Application & middleware level – application-specific counters, JSR-77, AppServer monitoring
Per-level predictors feed physical and virtual machine status into the virtualization cluster management, which combines them into a System Health Indicator for multi-level failure prediction.]

Pro-Active VM Migration based upon Multi-Level Failure Prediction

[Figure: two server blades, each with a bare-metal VMM on a multi-core CPU, mainboard and devices; a virtualization cluster management console performs VMM-based monitoring and reactive live migration of guest OSes with application servers and workloads between the blades.]
VM Migration – how long does it take? VMware ESX 4

[Plots: migration time in seconds as a function of block size in KB and allocation rate 1/sec]

VM Migration – VMware ESX 4

VM Migration – XEN Server 5.6
VM migration – lessons learned
• Servers have evolved
  • Ever-growing number of CPU cores
  • Tremendous amounts of memory
• Reliability will become the most sought-after feature of future server systems
  • Higher density and integration levels in future CPUs will lead to multi-bit faults
• Failure prediction and VM migration are a promising concept
  • Fault-isolation boundaries (LPARs, blades) are a must
Server systems call for new programming and management models
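Why migration time depends on the guest's memory allocation rate can be shown with a toy pre-copy model: each round re-sends the pages dirtied during the previous round, so a fast writer makes the dirty set shrink slowly, or not at all. The parameters are illustrative assumptions, not the measured ESX/XEN values:

```python
# Toy model of pre-copy live migration (illustrative parameters).

def precopy_time(mem_mb, dirty_mb_s, bw_mb_s, stop_copy_mb=64, max_rounds=30):
    """Seconds until the remaining dirty set fits the stop-and-copy budget."""
    total, remaining = 0.0, float(mem_mb)
    for _ in range(max_rounds):
        t = remaining / bw_mb_s          # time to send the current dirty set
        total += t
        remaining = dirty_mb_s * t       # pages dirtied while sending
        if remaining <= stop_copy_mb:
            break
    return total + remaining / bw_mb_s   # final stop-and-copy phase

for rate in (10, 50, 90):                # MB/s dirtied by the workload
    print(f"dirty rate {rate:2} MB/s -> "
          f"migration takes {precopy_time(4096, rate, 100):7.1f} s")
```

Once the dirty rate approaches the network bandwidth, the rounds no longer converge and the hypervisor must either extend the stop-and-copy phase or cap the rounds, which matches the observed sensitivity to allocation rate.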
Servers have evolved... New form factors, higher density, standard architectures, multicore/multithreaded. Advances in operating systems: virtualization, trustworthiness/security, clustering.
Need for new programming models, SW architectures, services

Virtualization problems
• Security: extended attack surface
• Virtualization-based malware
• Must trust the hypervisor
Intel VT-x, AMD Pacifica
Hybrid Computing – OpenCL: New Programming Models
One Host plus one or more Compute Devices
• Each Compute Device is composed of one or more Compute Units
• Each Compute Unit is further divided into one or more Processing Elements
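The hierarchy above maps onto OpenCL's index space: an NDRange is split into work-groups (scheduled on Compute Units), whose work-items run on Processing Elements, with global ID = group ID × local size + local ID. A plain-Python sketch of that mapping (not OpenCL API code; a one-dimensional range for simplicity):

```python
# Sketch of OpenCL's 1-D execution hierarchy: NDRange -> work-groups
# (Compute Units) -> work-items (Processing Elements).

def ndrange_ids(global_size, local_size):
    """Yield (group_id, local_id, global_id) for a 1-D NDRange."""
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            yield group_id, local_id, group_id * local_size + local_id

# 8 work-items in groups of 4 -> two work-groups of four items each:
for gid, lid, glob in ndrange_ids(8, 4):
    print(f"group {gid}, item {lid} -> global id {glob}")
```

This is the same ID arithmetic a kernel sees through get_group_id(), get_local_id(), and get_global_id().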
Cloud Computing – the three layers

Infrastructure ("Infrastructure as a Service", "Utility Computing")
• Servers, storage, racks, HVAC, power, communications
• Virtual compute (virtual machines), virtual storage (block store)

Platforms ("Platform as a Service")
• Cloud data store, key-value store, managed container

Applications ("Software as a Service", "on-demand" apps)
• Business applications, analytics applications, productivity applications

Challenges:
• Has to abstract underlying hardware
• Be elastic in scaling to demand
• Pay-per-use basis
Computer architecture drives changes in system software
Andreas Polze, Operating Systems and Middleware
Agenda
The new Mainframe
• History, use cases, success stories
• Engineering for reliability and availability
• Operating systems: VM, MVS, VSE, OS/390, zOS, zLinux
• Mainframe computer architecture
• Virtualization
  • zVM overview
  • zVM Control Program
• Security
  • Security architecture
  • Operating systems security
  • zOS security mechanisms
  • zOS SAF and RACF
Agenda (contd.)
OpenVMS
• History, Use Cases, Success Stories
• HP OpenVMS Strategy & Directions
• VAX Architecture
• VMS Architecture & Timeline
• Disaster Tolerance & Clustering
• OpenVMS Cluster Overview
• OpenVMS Cluster Theory
• DCL, UAF, Batch Job Processing
Agenda (contd.)
Solaris
• History, Overview, Success Stories
• Zones, DTrace, virtualization
• Solaris Architecture
• Process model
• Kernel services
• Solaris scheduler
• Virtual memory system
• Solaris virtual file system
Agenda (contd.)
Future Trends
• Virtualization, Server Consolidation
• Green IT
• Multithreading/Multicore
• Cloud Computing, Web 2.0 Scalability