Achieving Isolation in Consolidated Environments Jack Lange Assistant Professor University of Pittsburgh


Page 1: Achieving Isolation in Consolidated Environments

Achieving Isolation in Consolidated Environments

Jack Lange
Assistant Professor
University of Pittsburgh

Page 2: Achieving Isolation in Consolidated Environments

Consolidated HPC Environments

• The future is consolidation of commodity and HPC workloads
  – HPC users are moving onto cloud platforms
  – Dedicated HPC systems are moving towards in-situ
    • Consolidated with visualization and analytics workloads
• Can commodity OS/Rs effectively support HPC consolidation?
  – Commodity design goals
    • Maximized resource utilization
    • Fairness
    • Graceful degradation under load

Page 3: Achieving Isolation in Consolidated Environments

Hardware Partitioning

• Current approaches emphasize hardware space sharing
• Current systems do support this, but…
• Interference still exists inside the system software
  – Inherent feature of commodity systems

[Figure: two-socket node, Socket 1 with cores 1–4 and Socket 2 with cores 5–8, each with local memory, divided into an HPC partition and a commodity partition]

Page 4: Achieving Isolation in Consolidated Environments

HPC vs. Commodity Systems

• Commodity systems have a fundamentally different focus than HPC systems
  – Amdahl’s vs. Gustafson’s laws
  – Commodity: Optimized for common case
• HPC: Common case is not good enough
  – At large (tightly coupled) scales, percentiles lose meaning
  – Collective operations must wait for slowest node
  – 1% of nodes can make 99% suffer
  – HPC systems must optimize outliers (worst case)
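The "collective operations must wait for slowest node" point can be illustrated with a small calculation. The sketch below is not from the talk: it assumes each node is independently slow in a given step with some probability (the 1% value is only an illustration), and shows how quickly the chance that a collective step is delayed grows with node count.

```python
def prob_step_delayed(num_nodes: int, p_slow: float) -> float:
    """Probability that at least one of num_nodes nodes is slow in a step.

    Assumption (illustrative only): nodes are slow independently of each
    other, each with probability p_slow per step.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be a probability")
    return 1.0 - (1.0 - p_slow) ** num_nodes


def collective_time(per_node_times):
    """A collective operation finishes only when the slowest node does."""
    return max(per_node_times)


if __name__ == "__main__":
    # With a 1% per-node chance of a slow step, larger jobs are delayed
    # in a growing fraction of their steps.
    for n in (1, 8, 64, 1024):
        print(n, round(prob_step_delayed(n, 0.01), 3))
```

Under these assumptions the per-node common case stays good while the job-level outcome is dominated by outliers, which is why the slide argues for optimizing the worst case.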

Page 5: Achieving Isolation in Consolidated Environments

Multi-stack Approach

• Dynamic Resource Partitions
  – Runtime segmentation of underlying hardware resources
  – Assigned to specific workloads
• Dynamic Software Isolation
  – Prevent interference from other workloads
  – Execute on separate system software stacks
  – Remove cross stack dependencies
• Implementation
  – Independent system software running on isolated resources

Page 6: Achieving Isolation in Consolidated Environments

Least Isolatable Units

• Independently managed sets of isolated HW resources
• Our Approach: Decompose system into sets of isolatable components
  – Independent resources that do not interfere with other components
• Workloads execute on dedicated collections of LIUs
  – Units of allocation
  – CPU, memory, devices
  – Each is managed by an independent system software stack
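One way to picture LIUs as units of allocation is a small bookkeeping model in which every unit belongs to at most one software stack at a time. This is a hypothetical sketch: the class names `LIU`, `Enclave`, and `Partitioner` and the unit identifiers are invented for illustration and do not come from the talk or any real system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LIU:
    """Hypothetical least isolatable unit: one CPU, memory block, or device."""
    kind: str   # "cpu", "memory", or "device"
    ident: str  # illustrative label such as "core3" or "nic0"


@dataclass
class Enclave:
    """Hypothetical workload container with its own system software stack."""
    name: str
    units: set = field(default_factory=set)


class Partitioner:
    """Tracks ownership so no unit is shared between two enclaves."""

    def __init__(self, units):
        self.free = set(units)
        self.owner = {}

    def assign(self, unit, enclave):
        # A unit can only be handed out while it is unowned.
        if unit not in self.free:
            raise ValueError(f"{unit} is not free")
        self.free.remove(unit)
        self.owner[unit] = enclave.name
        enclave.units.add(unit)

    def release(self, unit, enclave):
        # Only the current owner may give a unit back.
        if self.owner.get(unit) != enclave.name:
            raise ValueError(f"{unit} is not owned by {enclave.name}")
        enclave.units.remove(unit)
        del self.owner[unit]
        self.free.add(unit)
```

The only property the model enforces is exclusivity: a core, memory block, or device must be released by one enclave before another can claim it.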

Page 7: Achieving Isolation in Consolidated Environments

Linux Memory Management

• Demand Paging
  – Primary goal is to optimize memory utilization – not performance
  – Reduce overhead of common application behavior (fork/exec)
  – Support many concurrent processes
• Large Pages
  – Integrated with overall demand paging architecture
• Implications for HPC
  – Insufficient resource isolation
  – System noise
  – Linux large page solutions contribute to these problems

Brian Kocoloski and Jack Lange, HPMMAP: Lightweight Memory Management for Commodity Operating Systems, IPDPS 2014

Page 8: Achieving Isolation in Consolidated Environments

Transparent Huge Pages

• Transparent Huge Pages (THP)
  – Fully automatic large page mechanism – no system administration or application cooperation
  – (1) Page fault handler uses large pages when possible
  – (2) khugepaged
• khugepaged
  – Background kernel thread
  – Periodically allocates a large page
  – “Merges” large page into address space of any process requesting THP support
  – Requires global page table lock
  – Driven by OS heuristics – no knowledge of application workload
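To check whether THP is active on a given Linux machine, the system-wide policy can be read from sysfs, where the active mode appears in square brackets (for example `always [madvise] never`). The helper below is a minimal sketch; the function names are made up for this example, and the parsing is kept separate from the file read so it can be tried on a plain string.

```python
import re

# Standard Linux sysfs location of the system-wide THP policy.
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from the sysfs file contents."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError("no active mode found in: %r" % text)
    return match.group(1)


def read_thp_mode(path: str = THP_ENABLED_PATH) -> str:
    """Read and parse the policy file; only works on Linux with THP built in."""
    with open(path) as handle:
        return parse_thp_mode(handle.read())
```

For example, `parse_thp_mode("always [madvise] never")` returns `"madvise"`.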

Page 9: Achieving Isolation in Consolidated Environments

Transparent Huge Pages

• Large page faults shown in green, small faults delayed by merges shown in blue
• Generally periodic, but not synchronized
• Variability increases dramatically under additional load

Page 10: Achieving Isolation in Consolidated Environments

HugeTLBfs

• HugeTLBfs
  – RAM-based filesystem supporting large page allocation
  – Requires pre-allocated memory pools reserved by system administrator
  – Access generally managed through libhugetlbfs
• Limitations
  – Cannot back process stacks and other special regions
  – VMA permission/alignment constraints
  – Highly susceptible to overhead from system load
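The size of the pre-allocated pool can be inspected through the `HugePages_*` and `Hugepagesize` fields of `/proc/meminfo`. The sketch below parses those fields from text and computes how many bytes the pool reserves; the function names are illustrative, and the sample values used in the usage note are invented.

```python
WANTED_FIELDS = {"HugePages_Total", "HugePages_Free", "HugePages_Rsvd", "Hugepagesize"}


def parse_hugepage_pool(meminfo_text: str) -> dict:
    """Extract the huge page pool fields from /proc/meminfo-style text.

    Counts are in pages; Hugepagesize is reported by the kernel in kB.
    """
    pool = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in WANTED_FIELDS:
            pool[key] = int(rest.split()[0])
    return pool


def reserved_bytes(pool: dict) -> int:
    """Bytes set aside for the pool and unavailable to ordinary allocations."""
    return pool["HugePages_Total"] * pool["Hugepagesize"] * 1024
```

With a pool of 512 pages of 2048 kB each, `reserved_bytes` reports 1 GiB held back from the rest of the system, which is the kind of reservation the slide refers to.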

Page 11: Achieving Isolation in Consolidated Environments

HugeTLBfs

• Overhead of small page faults increases substantially
  • Due to memory exhaustion
  • HugeTLBfs memory is removed from pools available to small page fault handler

Page 12: Achieving Isolation in Consolidated Environments

HPMMAP Overview

• High Performance Memory Mapping and Allocation Platform
  – Lightweight memory management for unmodified Linux applications
• HPMMAP borrows from the Kitten LWK to impose isolated virtual and physical memory management layers
• Provide lightweight versions of memory management system calls
• Utilize Linux memory offlining to completely manage large contiguous regions
• Memory available in no less than 128 MB regions
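Linux exposes memory hotplug through `/sys/devices/system/memory`, where `block_size_bytes` gives the size of one offlinable block in hexadecimal. The sketch below is an assumption-laden illustration, not HPMMAP's interface (which the slides do not show): it assumes the 128 MB granularity on the slide corresponds to that block size, and only works out how many blocks a request would need.

```python
# Standard Linux memory hotplug directory; shown for context only, since this
# sketch never touches the filesystem.
MEMORY_SYSFS = "/sys/devices/system/memory"


def parse_block_size(text: str) -> int:
    """Parse block_size_bytes, which the kernel prints as a hex number."""
    return int(text.strip(), 16)


def blocks_needed(request_bytes: int, block_size: int) -> int:
    """Round a request up to whole memory blocks (ceiling division)."""
    if request_bytes <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    return -(-request_bytes // block_size)
```

With a block size of `8000000` hex (128 MiB), a 1 GiB request maps to 8 blocks, and anything slightly larger rounds up to 9, which is the practical meaning of "no less than 128 MB regions" under this assumption.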

Page 13: Achieving Isolation in Consolidated Environments

HPMMAP Application Integration

Page 14: Achieving Isolation in Consolidated Environments

Results

Page 15: Achieving Isolation in Consolidated Environments

Evaluation – Multi-Node Scaling

• Sandia cluster (8 nodes, 1Gb Ethernet)
  – One co-located 4-core parallel kernel build per node
    • No over-committed cores
• 32-rank improvement: 12% for HPCCG, 9% for miniFE, 2% for LAMMPS
• miniFE
  – Network overhead past 4 cores
  – Single-node variability translates into worse scaling (3% improvement in single-node experiment)

Page 16: Achieving Isolation in Consolidated Environments

HPC in the cloud

• Clouds are starting to look like supercomputers…
• But we’re not there yet
  – Noise issues
  – Poor isolation
  – Resource contention
  – Lack of control over topology
• Very bad for tightly coupled parallel apps
  – Require specialized environments that solve these problems
• Approaching convergence
  – Vision: Dynamically partition cloud resources into HPC and commodity zones

Page 17: Achieving Isolation in Consolidated Environments

Multi-stack Clouds

With Jiannan Ouyang and Brian Kocoloski

• Virtualization overhead is not due to hardware costs
  – Results from underlying Host OS/VMM architectures and policies
  – Susceptible to performance overhead and interference
• Goal: provide isolated HPC VMs on commodity systems
  – Each zone optimized for the target applications

[Figure: shared hardware running Kitten (lightweight kernel) with the Palacios VMM hosting an isolated VM, alongside Linux with KVM hosting commodity VM(s)]

Page 18: Achieving Isolation in Consolidated Environments

Multi-OS Architecture

• Goals:
  – Fully isolated and independent operation
  – OS Bypass communication
  – No cross kernel dependencies
• Needed Modifications:
  – Boot process that initializes subset of offline resources
  – Dynamic resource (re)assignment to the Kitten LWK
  – Cross stack shared memory communication
  – Block Driver Interface

Page 19: Achieving Isolation in Consolidated Environments

Isolatable Hardware

• We view system resources as a collection of Isolatable Units
  – In terms of both Performance and Management
• Some hardware makes this easy
  – PCI (w/ MSI, MSI-X)
  – APIC
• Some hardware makes this difficult
  – SATA
  – IO-APIC
  – IOMMU
• Some hardware makes this impossible
  – Legacy IDE
  – PCI (w/ Legacy PCI-INTx IRQs)
• Some hardware cannot be completely isolated
  – SR-IOV PCI devices
  – Hyper-Threaded CPU cores

Page 20: Achieving Isolation in Consolidated Environments

[Figure: two-socket node, Socket 1 with cores 1–4 and Socket 2 with cores 5–8, each with local memory, plus PCI devices (NIC, Infiniband, SATA); each resource is marked as Linux, Offline, or Kitten]

Page 28: Achieving Isolation in Consolidated Environments

Multi-stack Architecture

• Allow multiple dynamically created enclaves
  – Based on runtime isolation requirements
  – Provides flexibility of fully independent OS/Rs
    • Isolated performance and resource management

[Figure: two configurations on shared hardware. First: Linux with KVM hosting commodity VM(s), Kitten (1) with the Palacios VMM hosting an HPC VM, and Kitten (2) running an HPC app. Second: Linux running commodity application(s) alongside the Kitten LWK (lightweight kernel) with the Palacios VMM running an HPC application]

Page 29: Achieving Isolation in Consolidated Environments

Performance Evaluation

• 8 Node Infiniband Cluster
  – Space shared between commodity and HPC workloads
    • Commodity: Hadoop
    • HPC: HPCCG
  – Infiniband passthrough for HPC VM
  – 1Gb Ethernet passthrough for Commodity VM
• Compared Multi-stack (Kitten + Palacios) vs. full Linux environment (KVM)
  – 10 experiment runs for each configuration
• CAVEAT: VM disks were all accessed from Commodity partition
  – Suffers significant interference (Current work)

                    Linux + KVM    Multi-stack (Kitten + Palacios)
Mean Runtime (s)    55.754         52.36
Std Dev             2.231433022    0.386551707
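The table is easier to interpret with two derived numbers: the relative reduction in mean runtime and the coefficient of variation (standard deviation divided by mean) for each configuration. The short sketch below computes them from the values on the slide; the helper names and the dictionary layout are chosen for this example only.

```python
# Values copied from the slide's results table (seconds).
RESULTS = {
    "linux_kvm": {"mean_s": 55.754, "std_s": 2.231433022},
    "multi_stack": {"mean_s": 52.36, "std_s": 0.386551707},
}


def coefficient_of_variation(mean: float, std: float) -> float:
    """Run-to-run variability relative to the mean runtime."""
    return std / mean


def relative_improvement(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    base, multi = RESULTS["linux_kvm"], RESULTS["multi_stack"]
    print("mean runtime reduction:",
          relative_improvement(base["mean_s"], multi["mean_s"]))
    print("std dev ratio:", base["std_s"] / multi["std_s"])
    for name, row in RESULTS.items():
        print(name, "CV:", coefficient_of_variation(row["mean_s"], row["std_s"]))
```

The mean runtime drops by roughly 6%, while the standard deviation shrinks by a factor of a little under 6, so the larger effect in this experiment is on variability rather than on average speed.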

Page 30: Achieving Isolation in Consolidated Environments

Conclusion

• Commodity systems are not designed to support HPC workloads
  – Different requirements and behaviors than commodity applications
• A multi-stack approach can provide HPC environments in commodity systems
  – HPC requirements can be met without separate physical systems
  – HPC and commodity workloads can dynamically share resources
  – Isolated system software environments are necessary

Page 31: Achieving Isolation in Consolidated Environments

Thank you

Jack Lange
Assistant Professor
University of Pittsburgh

[email protected]

http://www.cs.pitt.edu/~jacklange

Page 32: Achieving Isolation in Consolidated Environments

Multi-stack Operating Systems

• Future Exascale Systems are moving towards in situ organization
• Applications traditionally have utilized their own platforms
  – Visualization, storage, analysis, etc.
• Everything must now collapse onto a single platform

Page 33: Achieving Isolation in Consolidated Environments

Performance Comparison

[Figure: two panels, "Linux Memory Management" and "Lightweight Memory Management"; annotations: occasional outliers (large page coalescing), low-level noise]
