
Virtual Hierarchies to Support Server Consolidation

Michael Marty and Mark Hill
University of Wisconsin - Madison

What is Server Consolidation?

Multiple server applications are deployed onto Virtual Machines (VMs), running on a single, more powerful server.

Feasibility

Virtualization Technology (VT) – hardware and software

Many-core CMPs – Sun's Niagara (32 threads); Intel's Tera-scale project (100s of tiles)

CMP Running Consolidated Servers

Characteristics

Isolating the function of VMs

Isolating the performance of consolidated servers

Facilitating dynamic reassignment of VM resources (processor, memory)

Supporting inter-VM memory sharing (content-based page sharing)

How Should the Memory System Be Optimized?

Minimize average memory access time (AMAT) by servicing misses within a VM (see the formula after this list)

Minimize interference among separate VMs to isolate performance

Facilitate dynamic reassignment of cores, caches, and memory to VMs

Support inter-VM page sharing
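For reference, the textbook AMAT decomposition behind the first goal (a standard formula, not from the slides); servicing misses inside the VM shrinks the effective miss penalty:

    \[
    \mathrm{AMAT} = \mathrm{HitTime} + \mathrm{MissRate} \times \mathrm{MissPenalty}
    \]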

Current CMP Memory Systems

Global broadcast – not viable for such a large number of tiles

Global directory – forces memory accesses to cross the chip, failing to minimize AMAT or isolate performance

Statically distributing the directory among tiles – better, but complicates memory allocation, VM reassignment, and scheduling, and limits sharing opportunities

DRAM Dir with Dir Cache (DRAM-DIR)

Main dir in DRAM; Dir cache in Memory Controller

Any tile can be a sharer of the data

Any miss issues a request to the directory

Problems:
1. Fails to minimize AMAT: significant latency to reach the directory, even when the data is nearby
2. Allows the performance of one VM to affect others, due to interconnect and directory contention

Duplicate Tag Directory (TAG-DIR)

Centrally located

Fails to minimize AMAT

Directory contention

Challenging as the number of cores increases (64 cores, 16-way => 1024-way; spelled out below)
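Spelling out the parenthetical arithmetic: a duplicate tag directory copies every core's cache tags into one central structure, so a single lookup must check

    \[
    64 \ \text{cores} \times 16 \ \text{ways per core} = 1024 \ \text{tag ways}
    \]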

Static Cache Bank Dir (STATIC-BANK-DIR)

Home tile (decided by block address or page frame no.)

Home tile maintains sharer & states

A local miss queries the home tile

A replacement from home tile invalidates all copies

Fails to minimize AMAT or isolate VMs (even worse, since a home-tile replacement invalidates every copy); the static mapping is sketched below
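A minimal sketch of the static mapping, assuming interleaving on low-order block-address bits (the exact hash is our assumption; the slide also allows mapping by page frame number):

    /* Sketch of STATIC-BANK-DIR home-tile selection. The home is a
       fixed function of the address alone, independent of which VM
       the requesting core belongs to; that is the source of the
       isolation problem above. */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_TILES  64
    #define BLOCK_BITS 6          /* 64-byte blocks (assumption) */

    unsigned static_home_tile(uint64_t paddr) {
        return (unsigned)((paddr >> BLOCK_BITS) % NUM_TILES);
    }

    int main(void) {
        /* A block can land on any of the 64 tiles, including tiles
           owned by an unrelated VM. */
        printf("home tile for 0x2A40: %u\n", static_home_tile(0x2A40));
        return 0;
    }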

Solution: Two-level virtual hierarchy

Level-one directory protocol for intra-VM coherence: minimizes memory access time and isolates performance

Two alternative global level-two protocols for inter-VM coherence: allow inter-VM sharing due to migration, reconfiguration, and page sharing

VHA and VHB

Level 1 Intra-VM Dir Protocol

Home tile within the VM

Who is the home tile? The number of tiles in a VM is not necessarily a power of 2, and tiles are dynamically reassigned, so the home cannot be a fixed function of the address

Dynamic home tiles via a VM Config Table (64 entries; sketched below)

64-bit sharer vector for each directory entry
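A minimal sketch of this lookup, assuming the 64-entry VM Config Table is indexed by six block-address bits; identifiers and sizes here are illustrative, not the paper's hardware:

    /* Sketch of dynamic home-tile lookup in the level-1 protocol. */
    #include <stdint.h>
    #include <stdio.h>

    #define CONFIG_ENTRIES 64
    #define BLOCK_BITS     6      /* 64-byte blocks (assumption) */

    typedef struct {
        /* Rewritten by the hypervisor on VM reconfiguration, so the
           mapping always names tiles the VM currently owns. */
        uint8_t home_tile[CONFIG_ENTRIES];
    } vm_config_table;

    typedef struct {
        uint64_t sharers;   /* full 64-bit vector: one bit per tile */
    } l1_dir_entry;

    /* Six address bits select a table entry; the entry names the home
       tile, so VM sizes that are not powers of 2 pose no problem. */
    unsigned dynamic_home_tile(const vm_config_table *t, uint64_t paddr) {
        return t->home_tile[(paddr >> BLOCK_BITS) % CONFIG_ENTRIES];
    }

    int main(void) {
        vm_config_table t;
        for (int i = 0; i < CONFIG_ENTRIES; i++)
            t.home_tile[i] = (uint8_t)(i % 3);   /* toy 3-tile VM */
        printf("home tile for 0x1000: %u\n", dynamic_home_tile(&t, 0x1000));
        return 0;
    }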

Level 2 – Option 1: VHA

Dir in DRAM and Dir Cache in Memory Controller

Each entry contains a full 64-bit vector

Why not a home tile ID instead of a full vector? Home tiles change when VMs are reconfigured, and inter-VM page sharing can put copies in several VMs at once, so a single ID would not suffice.

Brief Summary

Level-one Intra-VM protocol handles most of the coherence

Level-two protocol will only be used for inter-VM sharing and dynamic reconfiguration of VMs

Can we reduce the complexity of the level-two protocol?

Level 2 – Option 2: VHB

A single bit tracks whether a block has any cached copies.

Broadcast on misses for inter-VM sharing if the bit is set (sketched below).
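A minimal sketch of the level-2 VHB check at the memory controller; the block count and function names are illustrative assumptions:

    /* One bit per memory block records whether any cached copy
       may exist anywhere on the chip. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_BLOCKS 1024                 /* toy memory (assumption) */
    static uint8_t cached[NUM_BLOCKS / 8];  /* one bit per block */

    static bool bit_get(uint64_t b) { return (cached[b >> 3] >> (b & 7)) & 1; }
    static void bit_set(uint64_t b) { cached[b >> 3] |= (uint8_t)(1u << (b & 7)); }

    /* A miss the level-1 (intra-VM) protocol cannot satisfy reaches
       memory; broadcast only if a copy may exist elsewhere. */
    void vhb_level2_miss(uint64_t block) {
        if (bit_get(block))
            printf("block %llu: broadcast to all tiles (possible inter-VM copy)\n",
                   (unsigned long long)block);
        else
            printf("block %llu: memory replies directly, no broadcast\n",
                   (unsigned long long)block);
        bit_set(block);   /* the requester now holds a cached copy */
    }

    int main(void) {
        vhb_level2_miss(5);   /* first touch: bit clear, no broadcast */
        vhb_level2_miss(5);   /* later miss: bit set, broadcast */
        return 0;
    }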

Advantage of Level 2 Broadcast

Reduces protocol complexity, eliminating many transient states

Enables the level-1 protocol to be inexact:
Use a limited or coarse-grain sharer vector (see the sketch after this list)
Keep no state at all, broadcasting within the VM
Keep no home tag for private data
Victimize a tag without invalidating sharers
Access memory on a prediction without first checking the home tile
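A minimal sketch of the coarse-grain option, assuming one bit covers a group of 4 tiles (the group size is our choice for illustration):

    /* Coarse-grain sharer vector: 64 tiles -> 16-bit vector. VHB's
       broadcast backstop is what lets level-1 state be inexact. */
    #include <stdint.h>
    #include <stdio.h>

    #define TILES_PER_BIT 4

    /* Marks the whole 4-tile group as sharing; invalidations later
       sent to non-sharers in the group are simply acknowledged, so
       the over-approximation costs traffic, not correctness. */
    uint16_t add_sharer(uint16_t vec, unsigned tile) {
        return (uint16_t)(vec | (1u << (tile / TILES_PER_BIT)));
    }

    int main(void) {
        uint16_t vec = 0;
        vec = add_sharer(vec, 9);   /* tiles 8-11 now appear to share */
        printf("coarse vector: 0x%04x\n", vec);   /* prints 0x0004 */
        return 0;
    }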

Uncontended L1-to-L1 Sharing Latency

Normalized Runtime: Homogeneous

STATIC-BANK-DIR & VHA consume tag space in static or dynamic home tiles

VHB: no home tiles for private data

Memory System Stall Cycles

Cycles per Transaction for Mixed

VHB: best overall performance, lowest cycles per transaction

DRAM-DIR: 45%-55% hit rate in the 8 MB directory cache (no partitioning)

STATIC-BANK-DIR: slightly better for oltp but worse for jbb in mixed1; allows interference, letting oltp use other VMs' resources

Conclusion

Future memory systems should be optimized for workload consolidation as well as single workloads.

Maximize shared memory accesses serviced within a VM

Minimize interference among separate VMs

Facilitate dynamic reassignment of resources
