

Dynamic Memory Pressure Aware Ballooning

Jinchun Kim, Viacheslav Fedorov, Paul V. Gratz, A. L. Narasimha Reddy

Department of Electrical and Computer Engineering, Texas A&M University

[email protected], [email protected], [email protected], [email protected]

ABSTRACT

Hardware virtualization is a major component of large scale server and data center deployments due to its facilitation of server consolidation and scalability. Virtualization, however, comes at a high cost in terms of system main memory utilization. Current virtual machine (VM) memory management solutions impose a high performance penalty and are oblivious to the operating regime of the system. Therefore, there is a great need for low-impact VM memory management techniques which are aware of and reactive to current system state, to drive down the overheads of virtualization.

We observe that the host machine operates under different memory pressure regimes, as the memory demand from guest VMs changes dynamically at runtime. Adapting to this runtime system state is critical to reduce the performance cost of VM memory management. In this paper, we propose a novel dynamic memory management policy called Memory Pressure Aware (MPA) ballooning. MPA ballooning dynamically allocates memory resources to each VM based on the current memory pressure regime. Moreover, MPA ballooning proactively reacts and adapts to sudden changes in memory demand from guest VMs. MPA ballooning neither requires additional hardware support nor incurs extra minor page faults in its memory pressure estimation. We show that MPA ballooning provides a 13.2% geomean speed-up versus current ballooning techniques across a set of application mixes running in guest VMs, often yielding performance nearly identical to that of a non-memory constrained system.

CCS Concepts
• Software and its engineering → Software performance;

Keywords
Virtualization, Operating System, Memory Management

1. INTRODUCTION


With the advent of Virtual Private Server (VPS) vendors, such as Amazon Elastic Compute Cloud (EC2) [6] and Rackspace Cloud [5], cloud computing has become a large and growing component of the computing market. To provide adequate service for clients, vendors often utilize hardware virtualization due to its strong server consolidation and high scalability characteristics. In virtualized environments, a hypervisor or virtual machine monitor (VMM) is responsible for managing virtualized hardware resources and the execution of guest virtual machines (VMs). The main memory size specified by each guest VM running on a given host, however, can quickly add up to impose a high cost in required host machine physical memory. Despite this high cost, VMs rarely fully utilize their entire virtualized main memory space, thus wasting this valuable resource. To deal with under-utilization of physical memory in virtualized environments, hypervisor memory-management techniques which enable physical memory overcommitment have been developed; however, they often trade a lower system memory requirement for a significant performance cost, or do not effectively respond and adapt to dynamically changing memory pressure. The goal of this work is to reduce the performance overheads of hypervisor memory management techniques by proposing Memory Pressure Aware (MPA) ballooning, which dramatically improves the responsiveness and adaptivity of a virtualized system.

To dynamically adjust memory allocation, Waldspurger [16] introduced the “ballooning” technique as a kernel module driver with VMware ESX Server [15]. The memory allocation is based on a share-based allocation scheme [17] where the share of each VM is determined by its memory utilization. When the host machine is under memory pressure and needs to reclaim memory from guest VMs, reclamation is done by measuring the portion of idle memory in each guest VM. To measure the idle memory, VMware ESX tracks randomly selected pages by invalidating their associated TLB entries. Subsequent accesses to the sampled pages trigger TLB misses and increase the count of touched pages. The fraction of inactively accessed memory is estimated from the ratio of untouched pages to sampled pages. Similarly, recent studies [19, 12, 18] force TLB misses to estimate the working set size of each VM. Invalidating TLB entries, however, incurs a substantial performance overhead in virtualized systems, since it forces a context switch and requires multiple memory accesses to refill the entries [2].

Further, the current memory allocation policies bear significant performance overhead due to insufficient knowledge about global memory pressure. Hypervisor exclusive cache [12] and Tmem [14] manage an additional layer between physical RAM and the disk, to cache pages evicted by the guest VMs in a common pool controlled by the hypervisor. However, they aggressively and obliviously claw back file cache pages from VMs to the hypervisor, even when sufficient memory is available in the host machine. This causes substantial performance impact due to excess page movement and page faults regardless of global memory pressure. Figure 1 shows the overhead of ballooning with Tmem on individual applications from the PARSEC suite, running in a guest VM with no other load in the system. In the figure, we observe that ballooning with Tmem results in a 10.8% performance overhead compared to a system without ballooning support on memory sensitive applications. In contrast to Tmem, VMware ESX adopts the concept of global memory pressure and activates the ballooning driver only when the host free memory drops toward a statically predefined threshold [15]. This approach also degrades the memory utilization and responsiveness of ballooning because most of the physical memory is likely to be consumed by the guest VMs without actually being used.

Figure 1: Performance degradation of ballooning with Tmem (10.8% on average for memory sensitive applications, 4.9% for memory insensitive applications)

Ideally, the memory allocation strategy for virtualized systems should adapt dynamically to the global memory pressure with minimal overhead. This is the goal of MPA ballooning. The key concept of MPA ballooning is that the hypervisor unobtrusively measures the current system memory pressure and adaptively changes its memory allocation policy. For example, if the host machine is under low memory pressure, MPA ballooning allows guest VMs to have an additional memory cushion so that they can reduce the overhead of acquiring pages from the hypervisor. On the other hand, if the host machine is under heavy memory pressure, the hypervisor perceives the increased memory pressure and reclaims the inactive pages from guest VMs. Unlike prior works [18, 15] which considered a memory cushion only when the host machine has enough free memory, MPA ballooning dynamically changes the amount of base and bonus memory with respect to the different memory pressure regimes. We implement MPA ballooning in the Xen hypervisor and Linux kernel with minor modifications. We show that MPA ballooning allows guest VMs to share memory across all ranges of memory pressure, thus enabling their wider adoption.

The paper makes the following major contributions:

• We identify different regimes of memory pressure in a system and employ this identification to drive the memory allocation and sharing policies in a virtualized environment.

• We develop mechanisms to measure the memory pressure state of the virtualized system to guide the memory allocation policy. These mechanisms neither require additional hardware support nor incur significant performance overhead in memory pressure estimation.

• We implement MPA ballooning and evaluate it on real workloads. The proposed technique improves performance by 18.2% with multiple VMs repeating single applications. In a more realistic scenario with high memory pressure, MPA ballooning improves performance by 13.2% with multiple VMs repeating randomly ordered applications.

2. BACKGROUND

This section discusses the background of operating system memory management and lays out the regimes of memory pressure that virtualized environments may experience.

2.1 OS Memory Use

Generally the OS allocates the majority of its memory at runtime through two primary mechanisms: anonymous pages and file cache pages. First, running applications may request memory from the OS through the “malloc” function. When the OS receives a malloc call, it notes the amount of space requested and returns a virtual address pointer for this space back to the calling application. These malloc’ed pages, together with stack and static data space pages, are collectively known as anonymous pages. Beyond anonymous pages, the OS also uses main memory for file system buffers. The available, unmalloc’ed free space is used as a cache for recently accessed files, since disk access is very slow relative to DRAM access.

Although OSes typically track the sum of malloc’ed and file system buffer pages, attaining an accurate working set footprint of utilized pages can be challenging. To acquire a good estimate of page locality, most OSes categorize these pages into active and inactive pages. For example, the Linux kernel on x86 relies on the Accessed bit in each page table entry, which is automatically set by the hardware. The kernel also uses a software-defined Referenced bit and maintains four different states for page locality. Mac OS X also exploits active and inactive lists for efficient page reclamation [9]. With the support of active and inactive lists, an approximation of the LRU algorithm for page replacement can be established by ordering pages in each list.

2.2 Memory Pressure Regimes

Implementing an accurate memory pressure model is important in a virtualized system because the pressure informs the hypervisor whether to reclaim pages from guest VMs or let them keep pages locally. As described above, anonymous and file pages are the two main components of guest OS memory usage. Comparing the sum of anonymous and file pages against the total available host memory, we classify system memory pressure into three regimes:


Figure 2: Memory pressure regimes (memory usage in GB over execution time; series ΣAnon, ΣFile, and Free; the Low, Mild, and Heavy pressure regions are separated by transition points A and B)

• Low pressure: ΣAnon_i + ΣFile_i < MEM_host

• Mild pressure: ΣAnon_i + ΣFile_i ≥ MEM_host

• Heavy pressure: ΣAnon_i ≥ MEM_host

Here ΣAnon_i represents the sum of anonymous pages across all VMs, ΣFile_i is the sum of clean pages across all VMs, and MEM_host is the total available system memory. We also define the total memory demand and local memory demand as follows.

• Total memory demand = ΣAnon_i + ΣFile_i

• Local memory demand = Anon_i + File_i

To avoid confusion from different terminologies, in this paper we use Anon for total anonymous pages, Active Anon for recently accessed anonymous pages, and Inactive Anon for anonymous pages that have not been accessed recently. Figure 2 illustrates the memory pressure regimes when multiple VMs are running in a host system. In this example, the physical memory in the host machine (MEM_host) is 4GB. The first regime is Low, where the total memory demand (ΣAnon_i + ΣFile_i) is less than the total available memory of the host machine. In this case, there should be no need to evict file pages from each VM because the host machine has enough Free memory for each VM to satisfy its local memory demand. The second regime, Mild, occurs when the total memory demand exceeds the available memory (Point A). At this point, the hypervisor must have a policy for page management since there is not enough room for all the anonymous and file pages of all currently running VMs. In the Heavy regime (beginning at Point B), ΣAnon_i surpasses the total memory of the host machine. In this regime, there is insufficient memory to satisfy the requested anonymous pages, thus the guest OSes must swap pages out to disk, leading to page faults, often at a large performance cost. We note that a given system may dynamically transition through each of these regimes at different points during its runtime. Each of these regimes demands a different virtual machine memory management policy to achieve good performance, motivating our MPA ballooning approach. VMware ESX also maintains memory pressure in terms of host free memory states. However, these states are defined by fixed thresholds of 6%, 4%, 2%, and 1% of host memory respectively [15], which restricts flexible host memory usage.
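To make these regime boundaries concrete, the following sketch (Python, not from the paper) classifies the host into the three regimes from per-VM anonymous and clean file page totals; the variable names and units are illustrative assumptions.

```python
from enum import Enum

class Regime(Enum):
    LOW = "low"      # sum(Anon) + sum(File) <  MEM_host
    MILD = "mild"    # sum(Anon) + sum(File) >= MEM_host
    HEAVY = "heavy"  # sum(Anon)             >= MEM_host

def classify_pressure(anon_per_vm, file_per_vm, mem_host):
    """Classify global memory pressure from per-VM anonymous and clean
    file page totals (all values in the same unit, e.g. GB)."""
    total_anon = sum(anon_per_vm)
    total_demand = total_anon + sum(file_per_vm)
    if total_anon >= mem_host:
        return Regime.HEAVY
    if total_demand >= mem_host:
        return Regime.MILD
    return Regime.LOW

# Example on a 4 GB host (values in GB):
print(classify_pressure([1.0, 1.5], [0.5, 0.5], 4.0))  # Regime.LOW
print(classify_pressure([1.5, 2.0], [1.0, 1.0], 4.0))  # Regime.MILD
print(classify_pressure([2.5, 2.0], [0.5, 0.5], 4.0))  # Regime.HEAVY
```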

3. DESIGN MOTIVATION

Figure 3: False balloon target due to clean pages in vips from the PARSEC suite ((a) Ballooning + Tmem, (b) Ballooning off; each plot shows Anon, Used, and Total memory in MB over execution time)

We observe that traditional ballooning drivers incur a high performance penalty because the algorithm is oblivious to system conditions and often incorrectly sets the memory target for the guest VM. Prior ballooning work is neither capable of reacting fast enough to deal with rapidly changing application memory demands, nor adaptive to the system memory pressure regimes described in Section 2.2.

3.1 Adaptation to System Conditions

Typically the ballooning driver chooses its target based on memory resource requirements. For instance, the ballooning driver implemented in Linux as part of Xen sets the balloon target equal to the sum of the malloc’ed memory at any given time. On the other hand, memory reclamation policies can be classified into two categories: aggressive and speculative. Hypervisor exclusive cache [12] and Tmem [14] belong to the aggressive management policies. These approaches implement a hypervisor-level cache which stores pages evicted from guest VMs to reduce the number of disk I/Os in the near future. A critical problem with aggressive reclamation, however, is that the guest VMs are forced to evict file cache pages without considering the current system state.

Figure 3 illustrates how a typical ballooning driver with Tmem results in performance degradation by evicting and reloading file pages. In the figure, the lightly shaded area Total represents the memory assigned by the hypervisor to the VM, while the darker area Used is the amount of memory actually being used. The application is vips, an image processing application from the PARSEC suite [3], which reads a large input image file using memory mapped I/O. After a file has been loaded, all image operations are applied to the loaded input file. In this experiment vips is iterated four times to analyze the overhead that comes from accessing evicted file pages in Tmem. Since the input image mapped in memory is identical to the content of the physical disk, these pages are not accounted as Anon (shown as the black line). The baseline ballooning driver, therefore, does not change the balloon target (Figure 3a) even though the actual memory usage is much larger than the target. Consequently, the guest OS evicts file pages to make more space to read the input image file. Evicted pages are copied into the Tmem managed cache area. Each time vips reloads the file, the pages in the Tmem cache are accessed again so that the hypervisor can avoid disk I/O, resulting in another copy back (reloading) to the guest VM. Figure 3b shows the actual memory usage of vips running in a VM without ballooning. In this case, each time vips reloads the file, it reuses the file pages maintained in the guest VM’s memory from the previous iteration. The file pages kept with ballooning off are shown as Used above the Anon line in Figure 3b.

Comparing the execution times of Figures 3a and 3b, reloading pages stored in the Tmem cache results in a 20% performance overhead. It is clear that the hypervisor should not reclaim pages aggressively from VMs. In other words, if actively used pages can be kept in the guest VMs, the performance degradation will decrease significantly. Based on this motivation, MPA ballooning adaptively changes its memory allocation and deallocation policy according to the memory pressure state.

Other works based on speculation [16, 19, 18, 7] are less aggressive than Tmem because they let the guest VMs keep pages based on memory access speculation without implementing a hypervisor-level shared cache. Only when the memory pressure exceeds a certain threshold are these pages slowly reclaimed by the hypervisor. This approach, however, can suffer performance degradation when the host machine is under memory pressure, particularly when the memory consumption is changing rapidly. Under these conditions page reclamation should be accelerated such that pages can be reallocated to the VM which can use them most effectively.

3.2 Slow Reclamation and Reallocation

To address the performance cost of latency in memory reclamation and reallocation, changes in the memory demands of each guest VM should be communicated to the hypervisor immediately. With more accurate and timely information about guest VM memory needs, the hypervisor can better manage each VM’s memory allocation based on a view of current system demands. Current ballooning drivers [12, 14] do not react dynamically to varying system conditions since the activation period is statically set by the hypervisor. For example, the Xen self-ballooning driver is invoked every five seconds to update the balloon target [13]. Thus, sudden changes of memory demand are not processed until the end of the current five second period, potentially impacting performance. Similarly, the VMware ESX Server uses sampling to activate memory reclamation. By default, VMware ESX Server samples 100 guest pages for each 60-second period [15].

To analyze the impact of slow response in the memory allocation and deallocation process, we set two VMs to execute dedup at different times. Figure 4a shows VM1 starting dedup at 15 seconds and Figure 4b shows VM2 starting the same application at 85 seconds. In this experiment, the host machine memory is limited to 4.5GB so that we can clearly observe the memory allocation and deallocation process.

Points A and B in Figure 4a illustrate how latency in responding to dynamic memory demand can affect memory allocation. At point A, dedup requests a large chunk of memory, causing Anon to instantly jump to 2.5GB. This occurs in the middle of the five second interval, thus the ballooning driver does not respond to this change and the Total memory assigned to the VM does not change. Meanwhile, dedup rapidly attempts to use the memory it requested, leading to major page faults because insufficient memory is assigned to the VM. Point B in Figure 4a illustrates a further hazard of latency in responding to memory demand. Here, Anon varies dramatically within the five-second sampling period due to a large allocation, use, and deallocation. The ballooning driver is not aware of any of these changes and does not adjust the target.

Figure 4: Slow response to memory allocation and deallocation ((a) VM1 executes dedup at t = 15, (b) VM2 executes dedup at t = 85; each plot shows Anon, Used, and Total memory in MB over execution time, with annotated points A, B, and C)

On the contrary, Point C illustrates how slow memory deallocation impacts the other running VMs. At 85 seconds, VM2 starts running dedup and its Anon jumps up to 2.5GB, seeking more memory. However, VM1 slowly returns memory to the hypervisor, which in turn slows down the memory reallocation to VM2. Therefore, VM2 cannot obtain enough memory to execute dedup and suffers from major page faults. Both the Xen hypervisor and VMware ESX Server adopt a slow deallocation process inspired by control theory and the networking domain [16]. To control the rate of variation in the reclaim process, they both adopt a hysteresis value. In other words, the memory target is determined not just by current data but also by past data stored in the system. However, this approach only covers local memory usage in each VM and does not consider the global memory pressure state. As a result, the memory reallocation speed does not adapt to other VMs’ memory demand or the global memory pressure status.

Naïvely, one might consider increasing the sampling frequency of the ballooning driver; however, this approach could incur high overheads from the additional hypercalls. Instead, we propose to capture the exact moment when the memory demand increases or decreases unexpectedly and to invoke the ballooning driver without delay.

3.3 Working Set Estimation Overhead

Several prior memory allocation policies [16, 19, 18, 12] have adopted a memory trap technique to sample memory accesses by setting a privilege bit in a page table entry. In doing so, the hypervisor samples memory accesses to build a page miss ratio curve which can provide an estimate of the working set size (WSS). These techniques, however, often come at a significant performance cost due to the required TLB and page walk cache entry invalidations. For example, the TLB invalidation instruction in the x86 architecture invalidates all entries in all page walk caches associated with the current PCID, regardless of the virtual addresses to which they correspond [10]. In virtualized environments, invalidating the TLB and page walk cache results in more memory accesses than on a native machine since each pointer in the guest page table must be translated through the hypervisor page table [2].

Figure 5: Adaptive memory cushion. (a) Low pressure (ΣAnon_i + ΣFile_i < MEM_host): balloon target = Anon_i + File_i + Cushion, with the cushion carved from Free memory. (b) Mild pressure (ΣAnon_i + ΣFile_i ≥ MEM_host): balloon target = active Anon_i + active File_i + Cushion, with the cushion carved from the inactive pages. (c) Heavy pressure (ΣAnon_i ≥ MEM_host): balloon target = active Anon_i + Cushion, with the cushion carved from the inactive anonymous pages.

Instead of estimating the memory footprint using high-overhead TLB and page walk cache invalidations, in MPA ballooning we propose to exploit information directly exported by the guest OSes to the hypervisor. Specifically, as part of the ballooning driver, the guest OS reports its active and inactive lists up to the hypervisor [4, 9]. Although this approach does come at the cost of a limited number of additional hypervisor calls, these calls have much less performance overhead than the invalidations used in the sampling approaches. We note that this use of back-channel information is similar to the information passing utilized by the Xen hypervisor’s guest OS ballooning drivers; however, as we discuss in the next section, we greatly extend the interface to provide a much more detailed and timely picture of the guest OS’s memory usage state.
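As an illustration of the kind of back-channel information involved, the sketch below (not the paper's driver code) reads the active and inactive page totals that a Linux guest already exposes in /proc/meminfo; an actual ballooning driver would collect the equivalent counters in the kernel and pass them to the hypervisor through a hypercall, which is not shown here.

```python
def read_guest_page_stats(meminfo_path="/proc/meminfo"):
    """Return active/inactive anonymous and file page totals (in KiB)
    as reported by a Linux guest."""
    wanted = {"Active(anon)", "Inactive(anon)", "Active(file)", "Inactive(file)"}
    stats = {}
    with open(meminfo_path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in wanted:
                stats[key] = int(rest.split()[0])  # /proc/meminfo values are in kB
    return stats

if __name__ == "__main__":
    print(read_guest_page_stats())
```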

4. DESIGN

The proposed MPA ballooning addresses the obliviousness to memory pressure shown by existing ballooning techniques, as discussed in the previous section. The overall design of MPA ballooning can be divided into two components. First, MPA ballooning implements an adaptive memory cushion which dynamically changes its size based on the system memory pressure. Second, MPA ballooning implements an instant response mechanism which dynamically reacts to changes in memory pressure through accelerated memory reclamation and reallocation. Similar to Ginkgo [7], the approach devised in MPA ballooning is hypervisor-agnostic in that the framework only requires a ballooning driver to dynamically control the memory size of a VM, which is already available in most hypervisors such as VMware ESX Server [15], Xen [1], and KVM [8].

4.1 Adaptive Memory Cushion

As noted in Section 3, the hypervisor must dynamically adapt to the current memory pressure in order to avoid unnecessary page movement and disk I/O. We design an adaptive memory cushion in MPA ballooning so that memory can be efficiently reallocated among guest VMs. The adaptive cushion is managed by the hypervisor and its size dynamically varies as the memory pressure changes. Figure 5 describes how MPA ballooning calculates the memory cushion and balloon target based on the different memory pressure regimes. In the Low pressure regime (Figure 5a), the host machine does not suffer from memory pressure because the total memory demand is always lower than MEM_host. Therefore, the hypervisor splits the Free memory into N + 2 slices, where N is the number of guest VMs. Each guest VM then receives a cushion slice of Free/(N+2) above its local memory demand (Anon_i + File_i). This cushion allows the guest VMs to absorb dynamic changes in memory demand without further intervention from the hypervisor. The balloon target for each guest VM is set by adding the adaptive memory cushion on top of the local memory demand. We reserve two memory cushions of Free/(N+2) for the hypervisor as spare memory that can be quickly allocated to a given guest OS in the event of a sudden demand.

When the total memory demand exceeds the MEM_host limit, the host machine enters the Mild pressure regime and the memory cushion must be rebalanced. Since active pages are more likely to be used by guest VMs, while inactive pages are less likely, MPA ballooning uses the inactive memory in the guest VMs as an adaptive cushion in the Mild pressure regime. As shown in Figure 5b, MPA ballooning divides the inactive pages into N + 1 memory cushion slices in the Mild pressure regime. Rebalancing with N + 1 cushions forces each guest VM to yield a portion of its inactive pages to other VMs. These reclaimed pages can be used for guest VMs with high active memory demands. By reclaiming inactive pages from guest VMs, the system may return to the Low pressure regime. If the guest VMs do not change their local memory demands, the retrieved memory remains Free in the hypervisor. The VMware ESX Server also falls back to the low memory pressure state by reclaiming pages from guest VMs when the host free memory drops below 6% of its total memory. In that scheme, however, the memory pressure state is purely based on host memory size [15], so this policy in the VMware ESX Server can lead to underutilized memory. On the other hand, MPA ballooning reclaims and rebalances pages based directly on the guest VMs’ memory demands.

If the total memory demand is dominated by ΣAnon_i (the Heavy pressure regime, Figure 5c), file pages are mostly evicted and the host machine may see performance degradation due to increased file I/O. In this regime, the adaptive memory cushion is built by dividing up the inactive anonymous pages. Note that the size of the cushion dynamically adapts to varying system conditions because the cushion is determined by the inactive portion of memory.
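The balloon-target computation across the three regimes can be summarized in a few lines; the sketch below is one plausible reading of Figure 5 (field names and the per-VM input format are illustrative assumptions, not the actual driver interface).

```python
def balloon_targets(vms, mem_host):
    """Compute per-VM balloon targets under the MPA cushion policy.
    `vms` maps a VM id to a dict with 'anon_active', 'anon_inactive',
    'file_active', and 'file_inactive', in the same unit as mem_host."""
    n = len(vms)
    anon = {v: s["anon_active"] + s["anon_inactive"] for v, s in vms.items()}
    file_ = {v: s["file_active"] + s["file_inactive"] for v, s in vms.items()}
    total_anon, total_file = sum(anon.values()), sum(file_.values())

    targets = {}
    if total_anon + total_file < mem_host:          # Low: share the Free memory
        cushion = (mem_host - total_anon - total_file) / (n + 2)
        for v in vms:
            targets[v] = anon[v] + file_[v] + cushion
    elif total_anon < mem_host:                     # Mild: share the inactive pages
        inactive = sum(s["anon_inactive"] + s["file_inactive"] for s in vms.values())
        cushion = inactive / (n + 1)
        for v, s in vms.items():
            targets[v] = s["anon_active"] + s["file_active"] + cushion
    else:                                           # Heavy: share inactive anonymous pages
        inactive_anon = sum(s["anon_inactive"] for s in vms.values())
        cushion = inactive_anon / (n + 1)
        for v, s in vms.items():
            targets[v] = s["anon_active"] + cushion
    return targets
```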

4.2 Memory Reallocation Trigger

Compared to the guest VM scheduling time slice, which is usually 30ms in the Xen hypervisor [11], the five second response interval of the ballooning driver is relatively slow. VMware ESX Server has a better response since it activates the ballooning driver when the hypervisor detects the 6% threshold of host free memory. However, the detection mechanism, based on a 60 second sampling period [15], potentially slows down the memory reallocation process. MPA ballooning resolves this problem by implementing a memory use threshold trigger which immediately activates the ballooning driver rather than waiting for the next response interval. Thus, memory reallocation is accelerated and the guest VM can avoid major page faults.

Figure 6: Improved response time with trigger ((a) slow response, (b) improved response; each plot shows Anon, Used, and Total memory in MB over execution time, with annotated points A and B)

Figure 6 shows the effect of our memory reallocation trigger. Since an abrupt change in Anon occurs within the response interval (five seconds here), the ballooning driver cannot detect the change in malloc’ed memory and fails to set a correct target in Figure 6a. As a result, the amount of memory given to the guest VM (Total) remains unchanged at Point A, starving the guest VM of memory to meet the sudden demand and leading to performance loss due to major page faults. In MPA ballooning, shown in Figure 6b, the memory reallocation trigger detects the change in memory demand and immediately activates the ballooning driver. Thus MPA ballooning improves reaction time dramatically and reduces the gap between Total and Anon. The reallocation trigger activates the ballooning driver based on the following threshold formula.

|Total_c − Anon_c| × Th < R_size    (1)

Here, Total_c is the current total memory given to a particular guest VM, Anon_c is the current malloc’ed memory prior to the coming change, and R_size is the size of the requested change in Anon_c. If R_size is larger than Th% of the difference between Total_c and Anon_c, the ballooning driver is immediately triggered via a hypervisor call. As shown in Figure 7, we examined a wide range of threshold values and Th = 50% was empirically determined to be optimal.
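Equation (1) amounts to a one-line check that a guest-side allocation path can evaluate before deciding to invoke the ballooning driver; the sketch below uses a fractional Th and hypothetical parameter names.

```python
TH = 0.5  # empirically chosen trigger threshold (Th = 50%)

def should_trigger_reallocation(total_current, anon_current, request_size, th=TH):
    """True if a requested change of `request_size` in anonymous memory is
    large relative to the current slack: |Total_c - Anon_c| * Th < R_size."""
    return abs(total_current - anon_current) * th < request_size

# Example: 512 MB of slack and a 300 MB allocation burst -> trigger immediately.
print(should_trigger_reallocation(total_current=3584, anon_current=3072, request_size=300))  # True
```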

Figure 7: Performance analysis on trigger threshold (geomean speed-up, −4% to 8%, versus threshold Th from 0% to 100%)

4.3 Adaptive Hysteresis

Typically, increasing a memory allocation in MPA ballooning occurs quickly via the reallocation trigger, while decreasing a memory allocation is relatively slow so that the guest OS can adapt to any unexpected change in its local memory demand. When the host machine is under memory pressure (i.e., Mild or Heavy pressure), however, the hypervisor should control the hysteresis value so that guest VMs with inactive pages can quickly return pages to other VMs with high memory demands. As noted in Section 3.2, if the hysteresis value does not adapt to the global memory pressure, the slow memory reallocation will degrade overall system performance. MPA ballooning uses the following formula to dynamically adapt its hysteresis to the current memory pressure regime:

H_MP = H_default × GIP / Inactive_i    (2)

GIP = ΣInactive_i / (Number of VMs)    (3)

Equations (2) and (3) are inspired by prior work [16] which considered the amount of idle pages for page reallocation. Similarly, MPA ballooning dynamically changes the hysteresis value for each guest VM by comparing its inactive page count to the global average. If the hysteresis value H_MP is high, the system responds slowly to both increasing and decreasing memory requests. Alternately, when the hysteresis value is low, the algorithm responds quickly to requests in both directions. In the Low pressure regime, H_MP is set to a high default value (H_default) because the host machine has enough memory to cover requests without rapid hypervisor intervention. When the system enters the Mild or Heavy pressure regime, MPA ballooning compares GIP, which represents the average number of inactive pages across all VMs, with the guest VM’s inactive page count (Inactive_i). If Inactive_i is higher than GIP, it implies that the reclamation process can be accelerated in that VM because it has more inactive pages than average. On the other hand, if the local Inactive_i is less than the global average, we maintain the previous H_MP value. By comparing the local inactive pages with the global average, MPA ballooning accelerates the memory allocation and deallocation process. Note that H_MP relates to the speed of memory management and has nothing to do with the size of an allocation or deallocation.
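A minimal sketch of this hysteresis update, assuming a hypothetical default value and per-VM inactive page counts (the rule of keeping the previous value for below-average VMs follows the text above):

```python
H_DEFAULT = 5.0  # placeholder default hysteresis value

def adapt_hysteresis(inactive_per_vm, prev_h, low_pressure, h_default=H_DEFAULT):
    """Per-VM hysteresis update following Equations (2) and (3).
    `inactive_per_vm` maps VM id -> inactive page count,
    `prev_h` maps VM id -> the previously used hysteresis value."""
    if low_pressure:
        # Enough free memory: keep the high default, i.e. respond slowly.
        return {v: h_default for v in inactive_per_vm}

    gip = sum(inactive_per_vm.values()) / len(inactive_per_vm)   # Eq. (3)
    new_h = {}
    for v, inactive in inactive_per_vm.items():
        if inactive > gip:
            new_h[v] = h_default * gip / inactive                 # Eq. (2): smaller value -> faster reclaim
        else:
            new_h[v] = prev_h.get(v, h_default)                   # below average: keep the previous value
    return new_h
```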

4.4 Implementation

MPA ballooning is a pure software approach and requires minimal changes to the ballooning driver and hypervisor to implement. To evaluate the technique we implemented MPA ballooning in the Linux kernel and the Xen hypervisor. Table 1 shows the number of modified or additional lines of code required to implement each component of MPA ballooning. Since the memory reallocation trigger is activated on the guest OS side, it does not require any change in the hypervisor. To set a new balloon target with an adaptive memory cushion, MPA ballooning requires only one additional hypercall from the guest OS. Furthermore, the design is hypervisor-agnostic because the implementation only requires a ballooning driver, which is already widely adopted in most hypervisors.

Component              Linux Kernel    Hypervisor
Adaptive Cushion       15 lines        120 lines
Reallocation Trigger   25 lines        -
Adaptive Hysteresis    20 lines        20 lines
Total                  60 lines        140 lines

Table 1: Design overhead of MPA ballooning

5. EVALUATION

In this section we first describe our experimental methodology. We then explore MPA ballooning’s performance under different workloads.

5.1 Methodology

All experiments were performed on real hardware, on an Intel Xeon E5-2420 1.90GHz machine. The baseline hardware configuration is shown in Table 2. All experimental results are normalized to an ideal, non-memory constrained case where the hypervisor can fully accommodate the sum of each VM’s configured main memory (4GB each). Additionally, the ballooning driver is disabled so that the hypervisor does not reclaim any pages from the guest VMs. In this configuration, a minimal number of page faults and file I/O requests to load pages from disk to memory occur during the very first execution. By comparing the speedup against the non-memory constrained, ideal case, we clearly show that Tmem causes significant performance degradation and how MPA outperforms Tmem by reducing major page faults and file I/O.

MPA ballooning is evaluated under three different memory restriction models: 20%, 40%, and 60% pressure (20% means 20% less DRAM is available than the sum of the configured guest VMs’ main memory). These models are illustrated in Figure 8; each dashed line represents a memory restriction we applied to the physical memory. In each restriction model, the hypervisor experiences the different memory pressure regimes described in Figure 2. For example, with the 20% restriction model, the hypervisor always stays in the Low pressure regime, while the 40% restriction model experiences a transition from the Low to the Mild regime. Similarly, the 60% model experiences all three memory pressure regimes. With greater restriction amounts, there is a greater likelihood of being in the Mild or Heavy regimes.

MPA ballooning is evaluated using applications from the PARSEC suite [3] of shared memory multiprocessor benchmarks executing in guest VMs. The PARSEC suite includes benchmarks selected from different computation fields, and the memory usage of each benchmark is diverse enough to represent applications running in large scale servers. We test MPA ballooning in two different test scenarios. In the first experiment, we execute 8 VMs simultaneously, with each VM repeatedly executing a single benchmark, selected as one of the memory sensitive applications shown in Figure 1. The second experiment runs multiple VMs where each VM repeats multiple applications mixed in random order. To avoid a biased selection of a certain type of benchmark, we generate 8 randomized mixes of the PARSEC suite (Table 3). Between each iteration, the VM executes a sleep time randomly selected from 0 to 20 seconds in order to evaluate the response time of Tmem and MPA ballooning.

Domain   Component          Configuration
Host     CPU                Intel Xeon E5-2420 1.90GHz
         # Cores            6
         L1 I & D cache     32KB 8-way
         L2 cache           256KB 8-way
         Shared LLC         15MB 20-way
         DRAM               32GB
         VM Technology      VT-x, VT-d, EPT
         Host OS            Ubuntu Linux 12.04.3
         Hypervisor         Xen 4.2.3
Guest    Virtualized CPUs   4
         DRAM/guest         4GB
         Guest OS           Ubuntu Linux 12.04.3

Table 2: Baseline configuration

MIX    Benchmark 1    Benchmark 2    Benchmark 3
VM1    bodytrack      dedup          x264
VM2    blackscholes   raytrace       canneal
VM3    freqmine       dedup          x264
VM4    bodytrack      blackscholes   x264
VM5    vips           raytrace       canneal
VM6    bodytrack      vips           canneal
VM7    freqmine       vips           dedup
VM8    freqmine       blackscholes   raytrace

Table 3: Randomized mixes

Figure 8: Creating memory pressure by restricting MEM_host (ΣAnon, ΣFile, and Free in GB over execution time, with dashed lines marking the 20%, 40%, and 60% pressure restrictions)


Figure 9: Performance analysis in 20% pressure ((a) normalized performance to Ideal, (b) number of major page faults, (c) number of file reads)

5.2 Repeating Single Application

We first evaluate a scenario where each guest VM iterates a specific application. Memory sensitive applications are assigned to guest VMs and the experiment is stopped when all applications have executed more than five instances. The average execution time of each of those first five instances is used for performance evaluation.

As Figure 9a shows, MPA ballooning delivers performance nearly identical to that of the non-memory constrained system (Ideal). Compared to Ideal, MPA shows less than a 2% slowdown, 18.2% better than Tmem across all mixes under the 20% physical memory restriction model. With this relatively small memory restriction, the host machine remains in the Low pressure regime. As a result, the adaptive cushion of MPA ballooning is able to absorb both anonymous and clean page demands, thereby dramatically reducing the amount of page movement between the guest VM and Tmem (seen as major page faults in the guest). Here, the reduction in major page faults (Figure 9b) and file read operations (Figure 9c) corresponds to the speedup of MPA ballooning. The rightmost bar of Figures 9b and 9c, Norm. Reduction, shows the normalized geomean reduction in faults and file reads respectively, across all benchmarks.

There are exceptions where the reduced file I/O does not match the speedup value. For example, with Tmem, freqmine and dedup generate more page faults and file reads but run faster than MPA and Ideal. This is because freqmine and dedup are rescheduled more frequently than other applications, which wait for file I/O. Thus, with Tmem, freqmine and dedup steal scheduling slices from other applications because of those applications’ increased file I/O. As a result we see higher performance degradation and standard deviation with Tmem, also bringing into question scheduling fairness. Tmem exhibits a standard deviation of 23.4% while MPA shows only 6.9%. Overall, the system wide major page faults decrease by 92.5% and the number of file reads decreases by 81.3% compared to Tmem.

Figure 10: Performance analysis in 40% pressure ((a) normalized performance to Ideal, (b) number of major page faults, (c) number of file reads)

In the 40% and 60% memory restriction experiments, MPA ballooning outperforms Tmem by 15.3% and 13.0% respectively. Since there is not enough memory to allocate, Tmem stores most of the evicted anonymous pages and file pages in the hypervisor-level cache. MPA ballooning allows VMs to keep their actively used pages, avoiding page faults. As a result, MPA ballooning reduces page faults and file reads by 76.0% and 65.0% respectively under the 40% restriction model (Figures 10b and 10c). Similarly, the numbers decrease by 83.8% and 60.2% under the 60% restriction model (Figures 11b and 11c). In speedup, Tmem still shows outliers in both memory restriction models, with higher performance degradation and standard deviation compared to MPA ballooning. This result clearly shows that the traditional ballooning technique does not work efficiently as the memory pressure increases.


Figure 11: Performance analysis in 60% pressure ((a) normalized performance to Ideal, (b) number of major page faults, (c) number of file reads)

5.3 Multiple Applications in Random Order

In this experiment, we assume a scenario in which different applications are run in each VM. Based on Table 3, we assign multiple applications to each VM and execute them in random order. The experiment is stopped when all applications have repeated more than three instances. Due to limited space, we show the experiment results from the worst and most realistic scenario (60% memory pressure), where the system experiences transitions through the Low, Mild, and Heavy pressure regimes.

The performance improvement with randomly ordered applications shows a more consistent trend than in the single application repetition scenario. MPA ballooning provides an adaptive memory cushion so that the guest VMs can avoid major page faults and maintain file pages locally. In particular, the reallocation trigger and adaptive hysteresis become more important when multiple applications are mixed. The trigger increases the responsiveness of each VM when there is an urgent memory demand. At the same time, the adaptive hysteresis prompts VMs with a large amount of inactive pages to return their pages quickly to other VMs with urgent needs. Combining these techniques, as shown in Figure 12a, MPA ballooning achieves a 13.2% geomean speedup compared to Tmem. The number of major page faults dramatically decreases by 86.2% (Figure 12b) while the number of file I/Os decreases by 14.5% (Figure 12c). As a result, MPA ballooning reaches 80% of the performance of a non-memory constrained system. Note that MPA ballooning always shows a consistent performance improvement for every guest VM in a highly memory constrained model.

Figure 12: Performance analysis with random ordered applications ((a) normalized performance to Ideal, (b) number of major page faults, (c) number of file reads)

5.4 Hypercall Overheads

Since MPA ballooning requires additional hypercalls to report the number of different types of pages from the guest OS to the hypervisor, one might be concerned that it results in greater hypercall overhead. Figure 13 measures the number of hypercalls from the randomly mixed applications experiment (Section 5.3). In the figure, “Original” represents the fraction of hypercalls used by MPA to control page movement and change page table entries (similar to those used by Tmem). “Extra” represents the fraction of new hypercalls in MPA ballooning used to report malloc’ed/file pages and activate the memory reallocation trigger. Since MPA ballooning leaves actively used pages in the guest VMs, it reduces useless page movement from the VMs to the hypervisor. As a result, MPA ballooning greatly reduces the number of “Original” hypercalls, by 21.8% on average. Furthermore, the additional “Extra” hypercalls amount to less than 0.01% on average relative to Tmem’s hypercalls. Thus, Figure 13 clearly shows that MPA, in fact, significantly reduces the overall number of hypercalls relative to Tmem.


Figure 13: Number of hypercalls normalized to Tmem (Original vs. Extra hypercalls for VM1–VM8 and the geomean)

Figure 14: Sensitivity to overall memory pressure (MPA and Tmem performance, normalized to the non-memory constrained system, under 10% to 60% memory pressure)

5.5 Sensitivity to Memory Pressure

Figure 14 shows the performance gap between MPA ballooning and Tmem under different amounts of memory pressure. We increase the memory pressure from 10% to 60%. The performance value shown is normalized against a non-memory constrained system. The experiment is stopped at 60% memory pressure since the system generates too many page faults beyond 60% pressure due to the limited 32GB of host memory.

When there is only 10% memory pressure, MPA ballooning can achieve performance nearly identical to that of the non-memory constrained system. At this point, MPA ballooning outperforms Tmem by 18%. However, as the amount of memory pressure increases, the performance gap between the two techniques becomes narrower. At 60% memory pressure, the performance gap becomes 13%. This is expected because with increasing memory pressure, the number of pages that can be exploited by MPA ballooning naturally decreases. However, MPA ballooning still provides a substantial performance improvement over Tmem under heavy memory pressure.

6. CONCLUSION

In this paper, we propose MPA ballooning, a VM memory management technique which dynamically adapts to the system memory pressure state. Prior works in VM memory management are oblivious to the system memory pressure and thus incur substantial performance overheads under different memory pressure regimes. We classify memory pressure into the Low, Mild, and Heavy pressure regimes based on the committed memory and file page usage. MPA ballooning leverages back-channel information from the guest VMs to dynamically allocate memory resources to each VM based on the current memory pressure regime. Moreover, MPA ballooning proactively reacts and adapts to sudden changes in memory demand from guest VMs. To the best of our knowledge, MPA ballooning is the first VM memory management technique which dynamically changes the memory allocation policy based on the system’s memory pressure regime. We show that MPA ballooning substantially improves performance versus baseline ballooning, regardless of memory pressure regime.

7. ACKNOWLEDGMENT

This research is supported in part by the National Science Foundation, under grants #1320074 and #1439722.

8. REFERENCES

[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In ACM SIGOPS Operating Systems Review (SOSP), volume 37, pages 164–177, 2003.
[2] T. W. Barr, A. L. Cox, and S. Rixner. SpecTLB: A mechanism for speculative address translation. In Computer Architecture (ISCA), 2011 38th Annual International Symposium on, pages 307–317. IEEE, 2011.
[3] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 72–81, 2008.
[4] D. P. Bovet and M. Cesati. Understanding the Linux Kernel. O'Reilly Media, Inc., 2005.
[5] Rackspace. Rackspace Cloud. http://www.rackspacecloud.com, 2010.
[6] Amazon. Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/ec2/, 2010.
[7] A. Gordon, M. Hines, D. Da Silva, M. Ben-Yehuda, M. Silva, and G. Lizarraga. Ginkgo: Automated, application-driven memory overcommitment for cloud computing. Proc. RESoLVE, 2011.
[8] I. Habib. Virtualization with KVM. Linux Journal, 2008(166):8, 2008.
[9] Apple Inc. Page lists in the kernel. https://developer.apple.com/library/mac/documentation/Performance/Conceptual/ManagingMemory/Articles/AboutMemory.html, 2013.
[10] Intel. Intel 64 and IA-32 Architectures Software Developer's Manual, volume 3A, pages 125–136, 2014.
[11] M. Lee, A. S. Krishnakumar, P. Krishnan, N. Singh, and S. Yajnik. Supporting soft real-time tasks in the Xen hypervisor. In Proceedings of the 6th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), pages 97–108, 2010.
[12] P. Lu and K. Shen. Virtual machine memory access tracing with hypervisor exclusive cache. In USENIX Annual Technical Conference, pages 29–43, 2007.
[13] D. Magenheimer. Add self-ballooning to balloon driver. Discussion on the Xen development mailing list and personal communication, 2008.
[14] D. Magenheimer, C. Mason, D. McCracken, and K. Hackel. Transcendent memory and Linux. In Proceedings of the Linux Symposium, pages 191–200, 2009.
[15] VMware. Understanding memory resource management in VMware vSphere 5.0. Technical White Paper, VMware, 2011.
[16] C. A. Waldspurger. Memory resource management in VMware ESX Server. In ACM SIGOPS Operating Systems Review, pages 181–194, 2002.
[17] C. A. Waldspurger and W. E. Weihl. Lottery scheduling: Flexible proportional-share resource management. In Proceedings of the 1st USENIX Conference on Operating Systems Design and Implementation, page 1. USENIX Association, 1994.
[18] W. Zhao, Z. Wang, and Y. Luo. Dynamic memory balancing for virtual machines. ACM SIGOPS Operating Systems Review, 43(3):37–47, 2009.
[19] P. Zhou, V. Pandey, J. Sundaresan, A. Raghuraman, Y. Zhou, and S. Kumar. Dynamic tracking of page miss ratio curve for memory management. In ACM SIGOPS Operating Systems Review, volume 38, pages 177–188. ACM, 2004.