Server Computing – Motivation and Overview
Prof. Dr. Andreas Polze Hasso Plattner Institute for Software Engineering at University Potsdam [email protected]
The Shifting Paradigm

              Mainframes            PCs                   Service Grids
Focus         Hardware              Software              Middleware
Key player    IBM                   Microsoft             ???
Standards     closed, proprietary   closed, proprietary   open standards, WebServices

[Diagram labels: PC1, PC2, PC3 – Cluster – SuperDome]
OS/360, OS/390, zOS: RACF, JES2, licensing
OpenVMS: clustering, failover, versioning file system
Computer Classification
MIMD lives on
More Complications...
Long retired...
Sequent Symmetry
Intel Paragon
Milestone in Computer Architecture
Each node runs one OS instance (Mach)
Servers have evolved...
• New form factors
• Higher density
• Standard architectures (x64, Itanium)
• Multicore/multithreaded architectures

Advances in operating systems
• Virtualization
• Trustworthiness/security
• Clustering

Need for new programming models, software architectures, services
Some problems remain...
Green IT
Server consolidation will lead to better energy efficiency
Dependability
Umbrella term for operational requirements on a system
■ "Trustworthiness of a computer system such that reliance can be placed on the service it delivers to the user" [Laprie]
General question: how to deal with unexpected events?
System Quality
Peter Tröger, Felix Salfner, Andreas Polze, Operating Systems and Middleware Group

Dependability Research
[Timeline figure: from mission-critical applications with hardware solutions, via large-scale clusters and distributed systems with combined solutions, to large-scale many-core servers with software and combined solutions]
Hardware Revolution in the X86 World
[Figure: heterogeneous computing, memory hierarchy, many-core, processor interconnect]
Hypothesis: Reliability Wisdoms Replaced
Dramatic shift in single-machine reliability aspects
■ SMP becomes a heterogeneous tiled on-chip network
■ Decreasing structural sizes + dynamic frequency and voltage scaling
■ Massive memory increase
More fault classes, less error containment!

Few research results from the HPC perspective [Bianca Schroeder et al.]
■ Type and intensity of workload significantly influence lifetime
■ Failure rates depend on processor count, not hardware type
Observations
Traditional hardware fault models need an update
■ Memory with increased density and data rates
■ Groups of 'simple' cores instead of a monolithic processor
■ Interconnect as crucial component, fault-isolation issues

Reactive fault tolerance becomes inappropriate
■ Recovery time correlates with system size
■ 24/7 business availability demands pro-active fault tolerance
■ Reactive does not scale (example: HPC)

Virtualization as new system layer
■ Dependability of (hardware-supported) hypervisors

Weak tool support for reliability research
■ Missing consideration of 'below-OS' testing
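The point that reactive fault tolerance does not scale can be illustrated with a back-of-the-envelope model, a sketch under stated assumptions (independent, exponentially distributed node failures; Young's approximation for the checkpoint interval; all numbers are illustrative, not measurements):

```python
# Illustrative model: why reactive checkpoint/restart stops scaling.
import math

def system_mtbf(node_mtbf_h, nodes):
    """System MTBF shrinks linearly with node count (independent failures)."""
    return node_mtbf_h / nodes

def optimal_interval(checkpoint_h, mtbf_h):
    """Young's approximation: t_opt = sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_h * mtbf_h)

def utilization(checkpoint_h, mtbf_h):
    """Rough fraction of time spent on useful work
    (ignores restart and recomputation cost)."""
    t = optimal_interval(checkpoint_h, mtbf_h)
    return t / (t + checkpoint_h)

node_mtbf_h = 50_000   # hypothetical: ~5.7 years per node
checkpoint_h = 0.5     # hypothetical: 30 min to write a global checkpoint

for nodes in (100, 10_000, 1_000_000):
    mtbf = system_mtbf(node_mtbf_h, nodes)
    print(f"{nodes:>9} nodes: MTBF {mtbf:8.2f} h, "
          f"useful-work fraction {utilization(checkpoint_h, mtbf):.3f}")
```

Even with generous per-node reliability, the useful-work fraction collapses as the node count grows, which is the argument for pro-active techniques.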
HPI FutureSOC Lab
■ Collaboration with industry for software research on next-generation X86 hardware (32-65 cores, 1-2 TB RAM)
Active research work
■ Gain understanding of new fault classes
■ Failure prediction based on cross-level monitoring data analysis
■ Pro-active virtual machine migration
■ Fault injection based on UEFI firmware technology
■ Parallelized build processing
CPU level: Online Hardware Failure Prediction
Using X86 hardware performance events
■ Instruction retirement, cache misses, branch misprediction, ...
□ Limited number of hardware counter units → exploit event correlations
□ Threshold-triggered, time-triggered
■ Applicable to major cellular multiprocessing platforms (Intel, AMD, SPARC, IBM Power)
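Exploiting event correlations can be sketched as follows: with only a few counter units, two events that correlate strongly need not both be measured, since one can be estimated from the other and its counter unit reused. The event names and sample values here are made up for illustration:

```python
# Sketch (hypothetical data): sharing hardware counter units by
# exploiting correlations between performance events.
from math import sqrt

# Samples taken while both events shared a measurement interval:
cache_misses  = [1200, 1350, 900, 2100, 1800, 950, 1600, 1400]
instr_retired = [52000, 50000, 61000, 39000, 42000, 60000, 45000, 49000]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

r = pearson(cache_misses, instr_retired)
if abs(r) > 0.9:
    # Strong (here: negative) correlation: stop counting one event,
    # estimate it from the other, and free the counter unit.
    print(f"correlated (r = {r:.2f}) -> share one counter unit")
```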
Memory level: observations from our FutureSOC Lab
Date | Severity | Event | Source | Description
15-Jun-2010 13:47:12 | Info | No | BIOS | System boot (POST complete)
15-Jun-2010 13:45:53 | Major | No | [0x00:00] | POST - 'MEM4_DIMM-2D' memory training failed
15-Jun-2010 13:45:53 | Major | No | [0x00:00] | POST - 'MEM4_DIMM-1D' memory training failed
15-Jun-2010 13:45:53 | Major | No | [0x00:00] | POST - 'MEM4_DIMM-2B' memory training failed
15-Jun-2010 13:45:53 | Major | No | [0x00:00] | POST - 'MEM4_DIMM-1B' memory training failed
15-Jun-2010 13:45:53 | Critical | Yes | SMI | 'MEM4_DIMM-1D' Memory: Uncorrectable error (ECC)
15-Jun-2010 13:45:53 | Critical | Yes | SMI | 'MEM4_DIMM-1C' Memory: Uncorrectable error (ECC)
15-Jun-2010 13:45:53 | Critical | Yes | SMI | 'MEM4_DIMM-1B' Memory: Uncorrectable error (ECC)
15-Jun-2010 13:45:53 | Critical | Yes | SMI | 'MEM4_DIMM-1A' Memory: Uncorrectable error (ECC)
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-2D': Memory module failed (disabled)
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module failed (disabled)
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-2B': Memory module failed (disabled)
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1B': Memory module failed (disabled)
15-Jun-2010 13:43:43 | Info | No | BIOS | System boot (POST complete)
14-Jun-2010 17:41:47 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module error
14-Jun-2010 17:26:17 | Major | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module failure predicted
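Logs like the excerpt above lend themselves to simple automated analysis. A minimal sketch, assuming the pipe-separated line format shown (the log lines are taken from the excerpt; the matching logic is an illustration, not the lab's tooling):

```python
# Sketch: find DIMMs whose failure was predicted before they failed,
# based on management-controller log lines in the format shown above.
import re

log = """\
14-Jun-2010 17:26:17 | Major | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module failure predicted
14-Jun-2010 17:41:47 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module error
15-Jun-2010 13:45:40 | Critical | Yes | iRMC S2 | 'MEM4_DIMM-1D': Memory module failed (disabled)
"""

predicted, failed = set(), set()
for line in log.splitlines():
    m = re.search(r"'(MEM\d+_DIMM-\w+)': Memory module (.+)", line)
    if not m:
        continue
    dimm, event = m.groups()
    if "failure predicted" in event:
        predicted.add(dimm)
    elif "failed" in event or "error" in event:
        failed.add(dimm)

# DIMMs where a prediction preceded the actual failure:
print(sorted(predicted & failed))   # ['MEM4_DIMM-1D']
```

In the excerpt, the failure of MEM4_DIMM-1D was predicted roughly 15 minutes before the module error, which is exactly the lead time pro-active techniques want to exploit.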
z/PDT + z/OS R11: the mainframe in our lab
Predictive Failure Analysis (PFA) and Runtime Diagnostics (RD) in z/OS R11 ('sick but not dead' incidents)

OS level: our NTrace for Windows
Compiler/linker switches
■ /hotpatch, /functionpadmin
■ Microsoft C compiler shipped with Windows Server 2003 SP1 and later
Hotpatchable:
■ Windows Server 2003 SP1, Vista, Server 2008, Windows 7
■ Windows Research Kernel
[Figure: hotpatching a running function – a CallProxy/EntryThunk pair redirects calls from Foo to the instrumented Foo-5]
"Tracing in a running computer system" ("Ablaufverfolgung in einem laufenden Computersystem"), Pat. pend. DE-10 1009 038 177.5

Hotpatchable function prologue (example: NtfsPinMappedData):
    mov edi, edi        ; 2-byte no-op, patchable with a short jump
    push ebp
    mov ebp, esp
    mov ecx, [ebp+18h]
    mov edx, [ebp+0Ch]
    ...
    retn 10
    nop nop nop nop nop ; /functionpadmin padding for a long jump
Monitoring on application (server) level
• Request package enters platform (source: WSQM)
• Service reachable, but broken (source: Laprie)
• Time for EJB / handler processing (source: JSR-77)
• Finished requests / uptime (source: WSQM)
• Service not reachable (source: WSLA)
[Diagram labels: Service, Resource]

Meta Predictor
Ensemble learning:
• Boosts accuracy – which failure-prone situations can best be identified by hardware, OS, or VMM failure predictors?
• Domain knowledge – operating system vendors know their system best and can provide the most advanced predictor on the OS level
• Pluggable – domain predictors provided by an application vendor can easily be integrated into our anticipatory virtualization architecture
• Ensemble learning can combine predictions across all system levels
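The combination step above can be sketched as weighted voting over per-level failure probabilities. The weights, predictor names, and probabilities below are illustrative assumptions, not values from the lecture:

```python
# Minimal sketch of a meta-predictor: weighted average of per-level
# failure probabilities. In practice the weights would be learned
# from each predictor's past accuracy (ensemble learning).

def meta_predict(predictions, weights, threshold=0.5):
    """Combine per-level failure probabilities into one score."""
    total = sum(weights.values())
    score = sum(predictions[level] * w for level, w in weights.items()) / total
    return score, score >= threshold

# Hypothetical outputs of four domain predictors:
predictions = {"hardware": 0.9, "os": 0.4, "vmm": 0.7, "application": 0.2}
# Hypothetical weights reflecting past predictor accuracy:
weights = {"hardware": 3.0, "os": 2.0, "vmm": 1.0, "application": 1.0}

score, migrate = meta_predict(predictions, weights)
print(f"failure risk {score:.2f} -> trigger migration: {migrate}")
```

The pluggability claim maps directly onto the dictionaries: a new domain predictor is just another entry with its own weight.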
Our idea: Global System Health Indicator
Peter Tröger, Felix Salfner, Andreas Polze, Operating Systems and Middleware Group

[Figure: a server blade runs a bare-metal VMM on a multi-core CPU with mainboard and devices; guest OSes host application servers and their workloads. Monitoring sources per level:
• Hardware level – Machine Check Architecture, CPU hardware profiling
• VMM level – VMware vProbe
• Operating system level – DTrace, Windows Monitoring Kernel
• Application & middleware level – application-specific counters, JSR-77, AppServer monitoring
Per-level predictors feed physical and virtual machine status into the virtualization cluster management, which combines them into a System Health Indicator for multi-level failure prediction.]

Pro-Active VM Migration based upon Multi-Level Failure Prediction

[Figure: two server blades, each with a bare-metal VMM on a multi-core CPU, mainboard and devices; a virtualization cluster management console performs VMM-based monitoring and reactive live migration of guest OSes with application servers and workloads between the blades.]
VM Migration – how long does it take? VMware ESX 4

[Plots: migration time in seconds as a function of block size in KB and allocation rate 1/sec]

VM Migration – VMware ESX 4

VM Migration – XEN Server 5.6
VM migration – lessons learned
• Servers have evolved
  • Ever-growing number of CPU cores
  • Tremendous amounts of memory
• Reliability will become the most sought-after feature of future server systems
  • Higher density and integration levels in future CPUs will lead to multi-bit faults
• Failure prediction and VM migration are a promising concept
  • Fault-isolation boundaries (LPARs, blades) are a must
Server systems call for new programming and management models
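Why migration time depends on the guest's memory allocation rate can be shown with a toy pre-copy model: each round re-sends the pages dirtied during the previous round, so a fast writer makes the dirty set shrink slowly, or not at all. The parameters are illustrative assumptions, not the measured ESX/XEN values:

```python
# Toy model of pre-copy live migration (illustrative parameters).

def precopy_time(mem_mb, dirty_mb_s, bw_mb_s, stop_copy_mb=64, max_rounds=30):
    """Seconds until the remaining dirty set fits the stop-and-copy budget."""
    total, remaining = 0.0, float(mem_mb)
    for _ in range(max_rounds):
        t = remaining / bw_mb_s          # time to send the current dirty set
        total += t
        remaining = dirty_mb_s * t       # pages dirtied while sending
        if remaining <= stop_copy_mb:
            break
    return total + remaining / bw_mb_s   # final stop-and-copy phase

for rate in (10, 50, 90):                # MB/s dirtied by the workload
    print(f"dirty rate {rate:2} MB/s -> "
          f"migration takes {precopy_time(4096, rate, 100):7.1f} s")
```

Once the dirty rate approaches the network bandwidth, the rounds no longer converge and the hypervisor must either extend the stop-and-copy phase or cap the rounds, which matches the observed sensitivity to allocation rate.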
Servers have evolved... New form factors, higher density, standard architectures, multicore/multithreaded. Advances in operating systems: virtualization, trustworthiness/security, clustering.
Need for new programming models, SW architectures, services

Virtualization problems
• Security: extended attack surface
• Virtualization-based malware
• Must trust the hypervisor
Intel VT-x, AMD Pacifica
Hybrid Computing – OpenCL: New Programming Models
One Host plus one or more Compute Devices
• Each Compute Device is composed of one or more Compute Units
• Each Compute Unit is further divided into one or more Processing Elements
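The hierarchy above maps onto OpenCL's index space: an NDRange is split into work-groups (scheduled on Compute Units), whose work-items run on Processing Elements, with global ID = group ID × local size + local ID. A plain-Python sketch of that mapping (not OpenCL API code; a one-dimensional range for simplicity):

```python
# Sketch of OpenCL's 1-D execution hierarchy: NDRange -> work-groups
# (Compute Units) -> work-items (Processing Elements).

def ndrange_ids(global_size, local_size):
    """Yield (group_id, local_id, global_id) for a 1-D NDRange."""
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            yield group_id, local_id, group_id * local_size + local_id

# 8 work-items in groups of 4 -> two work-groups of four items each:
for gid, lid, glob in ndrange_ids(8, 4):
    print(f"group {gid}, item {lid} -> global id {glob}")
```

This is the same ID arithmetic a kernel sees through get_group_id(), get_local_id(), and get_global_id().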
Cloud Computing – the three layers

Infrastructure ("Infrastructure as a Service", "Utility Computing")
• Servers, storage, racks, HVAC, power, communications
• Virtual compute (virtual machines), virtual storage (block store)

Platforms ("Platform as a Service")
• Cloud data store, key-value store, managed container

Applications ("Software as a Service", "on-demand" apps)
• Business applications, analytics applications, productivity applications

Challenges:
• Has to abstract underlying hardware
• Be elastic in scaling to demand
• Pay-per-use basis
Computer architecture drives changes in system software
Andreas Polze, Operating Systems and Middleware
Agenda
The new Mainframe
• History, use cases, success stories
• Engineering for reliability and availability
• Operating systems: VM, MVS, VSE, OS/390, zOS, zLinux
• Mainframe computer architecture
• Virtualization
  • zVM overview
  • zVM Control Program
• Security
  • Security architecture
  • Operating systems security
  • zOS security mechanisms
  • zOS SAF and RACF
Agenda (contd.)
OpenVMS
• History, Use Cases, Success Stories
• HP OpenVMS Strategy & Directions
• VAX Architecture
• VMS Architecture & Timeline
• Disaster Tolerance & Clustering
• OpenVMS Cluster Overview
• OpenVMS Cluster Theory
• DCL, UAF, Batch Job Processing
Agenda (contd.)
Solaris
• History, Overview, Success Stories
• Zones, DTrace, virtualization
• Solaris Architecture
• Process model
• Kernel services
• Solaris scheduler
• Virtual memory system
• Solaris virtual file system
Agenda (contd.)
Future Trends
• Virtualization, Server Consolidation
• Green IT
• Multithreading/Multicore
• Cloud Computing, Web 2.0 Scalability