
A Low-Overhead High-Performance Unified Buffer Management Scheme that Exploits Sequential and Looping References

Jong Min Kim, Jongmoo Choi, Jesung Kim, Sam H. Noh, Sang Lyul Min, Yookun Cho, and Chong Sang Kim

School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Department of Computer Engineering, Hong-Ik University, Seoul 121-791, Korea

Abstract

In traditional file system implementations, the Least Recently Used (LRU) block replacement scheme is widely used to manage the buffer cache due to its simplicity and adaptability. However, the LRU scheme exhibits performance degradation because it does not make use of reference regularities such as sequential and looping references. In this paper, we present a Unified Buffer Management (UBM) scheme that exploits these regularities and yet is simple to deploy. The UBM scheme automatically detects sequential and looping references and stores the detected blocks in separate partitions of the buffer cache. These partitions are managed by appropriate replacement schemes based on their detected patterns. The allocation problem among the divided partitions is also tackled with the use of the notion of marginal gains. In both trace-driven simulation experiments and experimental studies using an actual implementation in the FreeBSD operating system, the performance gains obtained through the use of this scheme are substantial. The results show that the hit ratios improve by as much as 57.7% (with an average of 29.2%) and the elapsed times are reduced by as much as 67.2% (with an average of 28.7%) compared to the LRU scheme for the workloads we used.

This work was supported in part by the Ministry of Education under the BK21 program and by the Ministry of Science and Technology under the National Research Laboratory program.

http://archi.snu.ac.kr/~{jmkim,jskim,symin,cskim}
http://ssrnet.snu.ac.kr/~{choijm,cho}
http://www.cs.hongik.ac.kr/~noh

1 Introduction

Efficient management of the buffer cache by using an effective block replacement scheme is important for improving file system performance when the size of the buffer cache is limited. To this end, various block replacement schemes have been studied [1, 2, 3, 4, 5, 6]. Yet, the Least Recently Used (LRU) block replacement scheme is still widely used due to its simplicity. While simple, it adapts very well to changes in the workload, and it has been shown to be effective when recently referenced blocks are likely to be re-referenced in the near future [7]. A main drawback of the LRU scheme, however, is that it cannot exploit regularities in block accesses such as sequential and looping references and thus yields degraded performance [3, 8, 9]. In this paper, we present a Unified Buffer Management (UBM) scheme that exploits these regularities and yet is simple to deploy. The performance gains are shown to be substantial. Trace-driven simulation experiments show that the hit ratios improve by as much as 57.7% (with an average of 29.2%) compared to the LRU scheme for the traces we considered. Experimental studies using an actual implementation of this scheme in the FreeBSD operating system show that the elapsed time is reduced by as much as 67.2% (with an average of 28.7%) compared to the LRU scheme for the applications we used.

1.1 Motivation

The graphs in Figure 1 show the motivation behind this study. First, Figure 1(a) shows the space-time graph of block references from three applications, namely, cscope, cpp, and postgres (details of which will be discussed in Section 4), executing concurrently. The x-axis is the virtual time, which ticks at each block reference, and the y-axis is the logical block number of the block referenced at the given time. From this graph, we can easily notice sequential and looping reference regularities throughout their execution.

[Figure 1: Caching behaviors of the OPT block replacement scheme and the LRU block replacement scheme (Multi2 (cscope+cpp+postgres) trace). (a) Block references: a space-time plot of logical block number versus virtual time. (b) OPT block replacement scheme and (c) LRU block replacement scheme: hit counts versus inter-reference gap (IRG, x50) for cache sizes of 100, 200, 600, 800, and 1000 blocks.]

Now consider the graphs in Figures 1(b) and 1(c). They show the Inter-Reference Gap (IRG) distributions of blocks that hit in the buffer cache for the off-line optimal (OPT) block replacement scheme and the LRU block replacement scheme, respectively, as the cache size increases from 100 blocks to 1000 blocks. The x-axis is the IRG and the y-axis is the total hit count for the given IRG.

Observe from the corresponding graphs of the two figures the difference with which the two replacement schemes behave. The main difference comes from how looping references (that is, blocks that are accessed repeatedly with a regular reference interval, which we refer to as the loop period) are treated. The OPT scheme retains the blocks in increasing order of loop periods as the cache size increases since the scheme chooses a victim block according to the forward distance (i.e., the difference between the time of the next reference in the future and the current time). From Figure 1(b), we can see that in the OPT scheme the hit counts of blocks with IRGs between 70 and 90 increase gradually as the cache size increases. On the other hand, in the LRU scheme there are no buffer hits at this range of IRGs even when the buffer cache has 1000 blocks. This results from blocks at these IRGs being replaced either by blocks that are sequentially referenced (and thus never re-referenced) or by those with larger IRGs (and thus are replaced before being re-referenced). Although predictable, the regularities of sequential and looping references are not exploited by the LRU scheme, which leads to significantly degraded performance.

From this observation, we devise a new buffer management scheme called the Unified Buffer Management (UBM) scheme. The UBM scheme exploits regularities in reference patterns such as sequential and looping references. Evaluation of the UBM scheme using both trace-driven simulations and an actual implementation in the FreeBSD operating system shows that 1) the UBM scheme is very effective in detecting sequential and looping references, 2) the UBM scheme manages sequentially-referenced and looping-referenced blocks similarly to the OPT scheme, and 3) the UBM scheme shows substantial performance improvements.

1.2 The Remainder of the Paper

The remainder of this paper is organized as follows. In the next section, we review related work. In Section 3, we explain the UBM scheme in detail. In Section 4, we describe our experimental environments and compare the performance of the UBM scheme with those of previous schemes through trace-driven simulations. In Section 5, an implementation of the UBM scheme in the FreeBSD operating system is evaluated. Finally, we provide conclusions and directions for future research in Section 6.

2 Related Work

In this section, we place previous page/block replacement schemes into the following three groups and, in turn, survey the schemes in each group.

- Replacement schemes based on frequency and/or recency factors.

- Replacement schemes based on user-level hints.

- Replacement schemes making use of regularities of references such as sequential references and looping references.

The FBR (Frequency-based Replacement) scheme by Robinson and Devarakonda [1], the LRU-K scheme by O'Neil et al. [2], the IRG (Inter-Reference Gap) scheme by Phalke and Gopinath [5], and the LRFU (Least Recently/Frequently Used) scheme by Lee et al. [6] fall into the first group. The FBR scheme chooses a victim block to be replaced based on the frequency factor, differing from the Least Frequently Used (LFU) scheme mainly in that it considers correlations among references. The LRU-K scheme bases its replacement decision on the blocks' Kth-to-last reference, while the IRG scheme's decision is based on the inter-reference gap factor. The LRFU scheme considers both the recency and frequency factors of blocks. These schemes, however, show limited performance improvements because they do not consider regular references such as sequential and looping references.

Application-controlled file caching by Cao et al. [4] and informed prefetching and caching by Patterson et al. [10] are schemes based on user-level hints. These schemes choose a victim block to be replaced based on user-provided hints about application reference characteristics, allowing different replacement policies to be applied to different applications. However, to obtain user-level hints, users need to accurately understand the characteristics of the block reference patterns of applications. This requires considerable effort from users, limiting the applicability.

The third group of schemes considers regularities of references, and the 2Q scheme by Johnson and Shasha [3], the SEQ scheme by Glass and Cao [8], and the EELRU (Early Eviction LRU) scheme by Smaragdakis et al. [9] fall into this group. The 2Q scheme quickly removes from the buffer cache sequentially-referenced blocks and looping-referenced blocks with long loop periods. This is done by using a special buffer called the A1in queue, in which all missed blocks are initially placed and from which blocks are replaced in FIFO order after a short residence. On the other hand, the scheme holds looping-referenced blocks with short loop periods in the main buffer cache by using a ghost buffer called the A1out queue, in which the addresses of blocks replaced from the A1in queue are temporarily placed to discriminate between frequently referenced blocks and infrequently referenced ones. When a block is re-referenced while its address is in the A1out queue, it is promoted to the main buffer cache. The 2Q scheme, however, has two drawbacks. One is that an additional miss has to occur for a block to be promoted to the main buffer cache from the A1out queue. The other is that careful tuning is required for its two control parameters, namely, the size of the A1in queue and the size of the A1out queue, which may be sensitive to the type of workload.
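In outline, this promotion logic can be expressed as in the following minimal C sketch; the helper functions are hypothetical placeholders for exposition, not the actual 2Q implementation:

    /* Hypothetical helpers; a real implementation would back these with
       hash tables over the A1in (FIFO), A1out (ghost), and Am (main, LRU)
       queues. */
    int  in_Am(int blkno);
    int  in_A1in(int blkno);
    int  in_A1out(int blkno);
    void touch_Am(int blkno);
    void promote_to_Am(int blkno);
    void insert_A1in(int blkno);

    /* On each block reference: first-time misses enter A1in and leave it in
       FIFO order; a block re-referenced while its address is still remembered
       in A1out is promoted to the main buffer cache Am. */
    void twoq_reference(int blkno)
    {
        if (in_Am(blkno))
            touch_Am(blkno);          /* hit in the main cache: LRU update */
        else if (in_A1out(blkno))
            promote_to_Am(blkno);     /* second miss while remembered: promote */
        else if (!in_A1in(blkno))
            insert_A1in(blkno);       /* first miss: short FIFO residence in A1in */
        /* a hit while still in A1in leaves the block where it is */
    }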

The SEQ scheme detects long sequences of page faults and applies the Most Recently Used (MRU) scheme to those pages. However, in determining the victim page, it does not distinguish between sequential and looping references. The EELRU scheme confirms the existence of looping references by examining aggregate recency distributions of referenced pages and changes the page eviction points using a simple on-line cost/benefit analysis. The EELRU scheme, however, does not distinguish between looping references with different loop periods.

3 The Unified Buffer Management Scheme

The Unified Buffer Management (UBM) scheme is composed of the following three main modules.

Detection This module automatically detects sequential and looping references. After detection, block references are classified into sequential, looping, or other references.

Replacement This module applies different replacement schemes to the blocks belonging to the three reference patterns according to the properties of each pattern.

Allocation This module allocates the limited buffer cache space among the three partitions corresponding to sequential, looping, and other references.

In the following subsections, we give a detailed explanation of each of these modules.

3.1 Detection of Sequential and Looping References

The UBM scheme automatically detects sequential, looping, and other references according to the following rules:

Sequential references: consecutive block references occurring only once.

Looping references: sequential references occurring repeatedly with a regular interval.

Other references: references that are detected neither as sequential nor as looping references.

Figure 2 shows the classification process of the UBM scheme. Note that looping references are initially detected as sequential references until they are re-referenced.

For on-line detection of sequential and looping references, information about references to the blocks in each file is maintained in an abstract form. The elements needed are shown in Figure 3.

Information for each file is maintained as a 4-tuple consisting of a file descriptor (fileID), a start block number (start), an end block number (end), and a loop period (period). A reference is categorized as a sequential reference after a given number of consecutive references are made. For a sequential reference the loop period is infinity, while for a looping reference its value is the actual loop period. In real systems, the loop period fluctuates due to various factors including the degree of multiprogramming and scheduling. Hence, this value is set as an exponential average of the measured loop periods. Also, since a looping reference is initially detected as a sequential reference, its blocks are managed just like those belonging to a sequential reference until they are re-referenced. This may make them miss the first time they are re-referenced if there is not enough space in the cache (cf. Section 4.7).

[Figure 2: Classification process of the UBM scheme. Incoming block references that are not consecutive are classified as other references; consecutive references are classified as sequential references, and sequential references that repeat are classified as looping references.]

The resulting table keeps information for the sequences of consecutive block references that have been detected up to the current time and is updated whenever a block reference occurs. In most UNIX file systems, sequences of consecutive block references can be detected by using vnode numbers and consecutive block addresses.

Figure 4 shows an example of sequential and looping references, and the data structure that is used to maintain information for these references. In the figure, the file with fileID 3 is a sequential reference as it has infinity as its loop period. The files with fileID 1 and 2 are looping references with loop periods of 80 and 40, respectively.
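To make the detection rules concrete, the following is a minimal C sketch of how one file's 4-tuple might be updated and classified on each block reference. It is not the actual FreeBSD code of the UBM scheme; SEQ_THRESHOLD, ALPHA, and the structure layout are assumptions made for exposition.

    #include <math.h>

    #define SEQ_THRESHOLD 3   /* consecutive blocks before a run counts as sequential (assumed) */
    #define ALPHA 0.5         /* weight of the exponential average of the loop period (assumed) */

    enum ref_type { REF_OTHER, REF_SEQUENTIAL, REF_LOOPING };

    struct ubm_file {         /* the 4-tuple of Figure 3, plus a timestamp */
        int    fileID;        /* file identifier, e.g., the vnode number */
        int    start, end;    /* first and last block of the detected consecutive run */
        double period;        /* INFINITY for a (so far) sequential run, else the loop period */
        int    last_time;     /* virtual time at which 'start' was last referenced */
    };

    /* Update the tuple for a reference to block 'blkno' at virtual time 'now'. */
    void ubm_detect(struct ubm_file *f, int blkno, int now)
    {
        if (blkno == f->end + 1) {
            f->end = blkno;                        /* the consecutive run grows */
        } else if (blkno == f->start) {
            /* The run is being re-read from its start: a looping reference.
               Smooth the measured period with an exponential average, since
               it fluctuates with multiprogramming and scheduling. */
            double measured = now - f->last_time;
            f->period = isinf(f->period)
                          ? measured
                          : ALPHA * measured + (1.0 - ALPHA) * f->period;
            f->last_time = now;
        } else if (blkno > f->start && blkno <= f->end) {
            /* re-reading the middle of a known run: nothing to update */
        } else {
            f->start = f->end = blkno;             /* start tracking a new run */
            f->period = INFINITY;
            f->last_time = now;
        }
    }

    enum ref_type ubm_classify(const struct ubm_file *f)
    {
        if (!isinf(f->period))
            return REF_LOOPING;                    /* the run repeats with a regular interval */
        if (f->end - f->start + 1 >= SEQ_THRESHOLD)
            return REF_SEQUENTIAL;                 /* a long enough consecutive run, seen once */
        return REF_OTHER;
    }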

3.2 Block Replacement Schemes

The detection mechanism results in files being categorized into three types, that is, sequential, looping, and other references. The buffer cache is divided into three partitions to accommodate the three different types of references. Management of the partitions must be done according to the reference characteristics of the blocks belonging to each partition.

[Figure 3: Elements for representing sequential and looping references. Each detected sequence is summarized as a tuple (fileID, start, end, period); for example, blocks 0 through 10 of the file with fileID 1 are first recorded as (1, 0, 10, infinity) and, once re-referenced, as (1, 0, 10, 80).]

[Figure 4: Example of detection of sequential and looping references. At current time 180, the table contains (fileID, start, end, period) entries (1, 0, 10, 80) and (2, 0, 6, 40) for two looping references and (3, 0, 10, infinity) for a sequential reference.]

[Figure 5: Typical curves of buffer hits per unit time (HIT) and marginal gain (MG) values for sequential, looping, and other references as a function of cache size.]

For the partition that holds sequential references, management is a simple matter. Sequentially-referenced blocks are never re-referenced. Hence, the referenced blocks need not be retained and therefore, the MRU replacement policy is used.

For the partition that holds looping references, victim blocks to be replaced are chosen based on their loop periods because their re-reference probabilities depend on these periods. To do so, we use a period-based replacement scheme that replaces blocks in decreasing order of loop periods, and applies the MRU block replacement scheme among blocks with the same loop period.

Finally, within the partition that holds other references, victim blocks to be replaced can be chosen based on their recency, frequency, or a combination of the two factors. Hence, we may use any of the previously proposed replacement schemes, including the Least Recently Used (LRU), the Least Frequently Used (LFU), the LRU-K, and the Least Recently/Frequently Used (LRFU) schemes, as long as they have a model that approximates the hit ratio for a given buffer size to compute the marginal gain, which will be explained in the next subsection. In this paper, we assume the LRU replacement scheme.
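As an illustration, the following is a minimal sketch of victim selection in the looping partition under the period-based scheme just described; the linked-list representation and field names are assumptions for exposition, not the paper's data structures.

    #include <stddef.h>

    struct loop_block {
        int    blkno;
        double period;         /* estimated loop period of the loop this block belongs to */
        int    mru_seq;        /* recency counter: larger means more recently used */
        struct loop_block *next;
    };

    /* Period-based replacement: evict the block with the largest loop period,
       and among blocks with the same period, the most recently used one (MRU),
       since under a regular loop it is re-referenced furthest in the future. */
    struct loop_block *loop_victim(struct loop_block *head)
    {
        struct loop_block *victim = head;
        for (struct loop_block *b = head; b != NULL; b = b->next)
            if (b->period > victim->period ||
                (b->period == victim->period && b->mru_seq > victim->mru_seq))
                victim = b;
        return victim;
    }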

3.3 Buffer Allocation Based on Marginal Gain

The buffer cache has now been divided into three partitions that are being managed separately. An important problem that should be addressed then is how to allocate the blocks in the cache among the three partitions. To this end, we use the notion of marginal gains, which has frequently been used in resource allocation strategies in various areas of computer systems [10, 11, 12, 13].

Marginal gain is defined as MG(n) = Hit(n) - Hit(n-1), which specifies the expected number of extra buffer hits per unit time that would be obtained by increasing the number of allocated buffers from (n-1) to n, where Hit(n) is the expected number of buffer hits per unit time using n buffers. In the following, we explain how to predict the marginal gains of sequential, looping, and other references, as the buffer cache is partitioned to accommodate each of these types of references.

The expected number of buffer hits per unit time of sequential references when using n buffers is Hit_seq(n) = 0, and thus, the expected marginal gain is always MG_seq(n) = 0.

For looping references, the expected number of buffer hits per unit time and the expected marginal gain value are calculated as follows.

First, for a looping reference loop_i with loop length l_i and loop period p_i, the expected number of buffer hits per unit time when using n buffers is Hit_loop_i(n) = min(n, l_i) / p_i. Thus, if n <= l_i, the expected marginal gain is MG_loop_i(n) = n/p_i - (n-1)/p_i = 1/p_i, and if n > l_i, MG_loop_i(n) = l_i/p_i - l_i/p_i = 0.
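For example, take the looping reference (1, 0, 10, 80) of Figure 4, with loop length l = 11 and loop period p = 80. With n = 5 buffers, Hit_loop(5) = 5/80 = 0.0625 hits per unit time and MG_loop(5) = 1/80 = 0.0125, while with n >= 11 buffers the whole loop is cached, Hit_loop(n) = 11/80 = 0.1375, and MG_loop(n) = 0.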

Now, assume that there are k looping references {loop_1, ..., loop_i, ..., loop_k}, where the loops are arranged in increasing order of loop periods. Let n_max be the maximum of m such that sum_{i=1}^{m} l_i <= n, where n is the number of buffers in the partition for looping references. If n_max = k, then all loops can be held in the buffer cache, and hence Hit_loop(n) = sum_{i=1}^{k} l_i / p_i and MG_loop(n) = 0. Consider now the more complicated case where n_max < k. The expected number of buffer hits per unit time of these looping references when using n buffers is Hit_loop(n) = sum_{i=1}^{n_max} l_i / p_i + min(l_{n_max+1}, n - sum_{i=1}^{n_max} l_i) / p_{n_max+1}. (Recall that we are using the period-based replacement scheme to manage the partition for looping references. Hence, there can be no loops within this partition that have a loop period longer than p_{n_max+1}.) Hence, the expected marginal gain is MG_loop(n) = 1 / p_{n_max+1}.
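These formulas translate directly into code; the following is a sketch under the assumption that the loops are kept in an array sorted by increasing loop period:

    /* Loops are sorted by increasing loop period: p[0] <= p[1] <= ... <= p[k-1];
       l[i] is the loop length of loop_i.  Returns MG_loop(n) for a looping
       partition of n buffers. */
    double mg_loop(int n, int k, const int l[], const double p[])
    {
        int held = 0;                /* buffers consumed by fully cached loops */
        for (int i = 0; i < k; i++) {
            if (held + l[i] <= n)
                held += l[i];        /* loop_i fits entirely: it adds l[i]/p[i] hits
                                        but no marginal gain */
            else
                return 1.0 / p[i];   /* one more buffer caches one more block of loop_i */
        }
        return 0.0;                  /* n_max = k: all loops already fit */
    }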

Finally, for the partition that holds the other references and which is managed by the LRU replacement scheme, the expected number of buffer hits per unit time and the expected marginal gain value can be calculated from the buffer hit ratio using ghost buffers [11, 10] and/or Belady's lifetime function [14]. Ghost buffers are used to estimate the number of buffer hits per unit time for cache sizes larger than the current size when the cache is managed by the LRU replacement scheme. A ghost buffer does not contain any data blocks but maintains the control information needed to count cache hits. Predicting the buffer hit ratio using only ghost buffers is impractical due to the overhead of measuring the hit counts of all LRU stack positions individually. In the UBM scheme, we use an approximation method suggested by Choi et al. [13]. The method utilizes Belady's lifetime function, which is well known to approximate the buffer hit ratio of references that follow the LRU model. Specifically, the hit ratio with n buffers is given by Belady's lifetime function as

    h_other(n) = a (1/1^k + 1/2^k + 1/3^k + ... + 1/n^k)

where a and k are control parameters. Specific a and k values can be calculated on-line by using measured buffer hit ratios at pre-set cache sizes. Ghost buffers are used to determine the hit ratios at these pre-set cache sizes. The overhead of using ghost buffers in this case is minimal as the exact LRU stack positions of referenced blocks need not be located. For example, to calculate the values of the two control parameters a and k, buffer hit ratios at a minimum of two cache sizes, say s_1 and s_2 with s_1 < s_2, are required. Using these measured values and the equation of the lifetime function, we can calculate the values of a and k. Then, the expected number of buffer hits per unit time is given by Hit_other(n) = h_other(n) * (N_other / N_total), where N_other and N_total are the number of other references and the number of total references, respectively, during the observed period. Finally, the expected marginal gain is simply MG_other(n) = Hit_other(n) - Hit_other(n-1).
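Under this form of the lifetime function, the on-line fitting of a and k can be sketched as follows; the bisection search is one possible way to solve the two-point fit and is an assumption made for exposition:

    #include <math.h>

    /* Partial sum S_k(n) = 1/1^k + 1/2^k + ... + 1/n^k, so that
       h_other(n) = a * S_k(n). */
    static double S(int n, double k)
    {
        double s = 0.0;
        for (int i = 1; i <= n; i++)
            s += pow((double)i, -k);
        return s;
    }

    /* Fit a and k from the hit ratios h1 and h2 measured (via ghost buffers)
       at two pre-set cache sizes s1 < s2.  The ratio S(s2,k)/S(s1,k) falls
       monotonically from s2/s1 (at k = 0) toward 1 as k grows, so a bisection
       on k against the measured ratio h2/h1 suffices. */
    void fit_lifetime(int s1, double h1, int s2, double h2, double *a, double *k)
    {
        double lo = 0.0, hi = 16.0;
        for (int it = 0; it < 60; it++) {
            double mid = 0.5 * (lo + hi);
            if (S(s2, mid) / S(s1, mid) > h2 / h1)
                lo = mid;            /* ratio still too large: need a larger k */
            else
                hi = mid;
        }
        *k = 0.5 * (lo + hi);
        *a = h1 / S(s1, *k);
    }

    /* MG_other(n) = Hit_other(n) - Hit_other(n-1) = a / n^k, scaled by the
       share of other references in the observed period. */
    double mg_other(int n, double a, double k, long n_other, long n_total)
    {
        return a * pow((double)n, -k) * ((double)n_other / (double)n_total);
    }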

Figure 5 shows typical curves of both buffer hits per unit time and marginal gain values of sequential, looping, and other references as the number of allocated buffers increases. In the UBM scheme, since the marginal gain of sequential references, MG_seq(n), is always zero, the buffer manager does not allocate more than one buffer to the corresponding partition, except when buffers are not fully utilized. That is, only when there are free buffers at the initial stage of allocation may more than one buffer be allocated to this partition. Thus, in general, buffer allocation is determined between the partitions that hold the looping-referenced blocks and the other-referenced blocks. The UBM scheme tries to maximize the expected number of total buffer hits by dynamically controlling the allocation so that the marginal gain value of looping references, MG_loop(n), and the marginal gain value of other references, MG_other(C - n), where C is the cache size, converge to the same value.

3.4 Interaction Among the Three Modules

Figure 6 shows the overall interactions among the three modules of the UBM scheme. Whenever a block reference occurs in the buffer cache, the detection module updates and/or classifies the reference into one of the sequential, looping, or other reference types (step (1) in Figure 6). In this example, assume that a miss has occurred and the reference is classified as an other reference. As a miss has occurred, the buffer allocation module is called to get additional buffer space for the referenced block (step (2)). The buffer allocation module would normally compare the marginal gain values of looping and other references, choose the one with the smaller marginal gain value, and send a replacement request to the corresponding cache partition as shown in step (3). However, if there is space allocated to a sequential reference, such space is always deallocated first. The cache management module of the selected cache partition decides on a victim block to be replaced using its replacement scheme (step (4)) and deallocates the buffer space of the victim block to the allocation module (step (5)). The allocation module forwards this space to the cache partition for other-referenced blocks (step (6)). Finally, the referenced block is fetched from disk into the buffer space (step (7)).
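The partition selection of steps (2) and (3) then reduces to a simple comparison; a minimal sketch under assumed names:

    enum partition { PART_SEQUENTIAL, PART_LOOPING, PART_OTHER };

    /* Steps (2)-(3) of Figure 6: on a miss, decide which partition gives up a
       buffer.  seq_bufs is the current size of the sequential partition; mg_l
       and mg_o are the current marginal gain values of the looping and other
       partitions (Section 3.3). */
    enum partition choose_victim_partition(int seq_bufs, double mg_l, double mg_o)
    {
        if (seq_bufs > 1)
            return PART_SEQUENTIAL;  /* space beyond one buffer held by sequential
                                        references is always reclaimed first */
        return (mg_l < mg_o) ? PART_LOOPING : PART_OTHER;  /* smaller marginal
                                        gain loses a buffer */
    }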

4 Performance Evaluation

In this section and the next, we discuss the performance of the UBM scheme. This section concentrates on the simulation study, while the next section focuses on the implementation study.

In this section, the performance of the UBM scheme is compared with those of the LRU, 2Q, SEQ, EELRU, and OPT schemes through trace-driven simulations.¹ We also compare the performance of the UBM scheme with that of application-controlled file caching through trace-driven simulations with the same multiple application trace used in [4]. We did not compare the performance of the UBM scheme with those of schemes based on recency and/or frequency factors such as FBR, LRU-K, and LRFU since the benefits from the two factors are largely orthogonal and any of the latter schemes can be used to manage other references in the UBM scheme. We first describe the experimental setup and then present the performance results.

¹ Though the SEQ and EELRU schemes were originally proposed as page replacement schemes, they can also be used as block replacement schemes.

[Figure 6: Overall structure of the UBM scheme, showing the Detector, the Allocator, and the three cache partitions (sequential, looping, and other references) connected by steps (1) through (7) as described in Section 3.4.]

Table 1: Characteristics of the traces used.

  Trace    Applications executed concurrently    # of references    # of unique blocks
  Multi1   cscope, cpp                           15858              2606
  Multi2   cscope, cpp, postgres                 26311              5684
  Multi3   cpp, gnuplot, glimpse, postgres       30241              7453

4.1 Experimental Setup

Traces used in our simulations were obtained by concurrently executing diverse applications on the FreeBSD operating system running on an Intel Pentium PC. The characteristics of the applications are described below.

cpp Cpp is the GNU C compiler pre-processor. The total size of the C sources used as input is roughly 11MB. During execution, the observed block references are sequential and other references.

cscope Cscope is an interactive C source examination tool. The total size of the C sources used as input is roughly 9MB. It exhibits looping references with an identical loop period and other references.

glimpse Glimpse is a text information retrieval utility. The total size of the text files used as input is roughly 50MB. Its block reference characteristics are diverse: it shows sequential references, looping references with different loop periods, and other references.

gnuplot Gnuplot is an interactive plotting program. The size of the raw data used as input is 8MB. Looping references with an identical loop period and other references were observed during execution.

[Figure 7: Reference classification results (trace: Multi2). (a) Block references: the space-time plot of the whole Multi2 (cscope+cpp+postgres) trace. (b) Classified references: separate space-time plots of the detected sequential, looping, and other references.]

postgres Postgres is a relational database system from the University of California at Berkeley. We used join queries among four relations, namely, twothoustup, twentythoustup, hundredthoustup, and twohundredthoustup, which were made from a scaled-up Wisconsin benchmark. The sizes of the four relations are approximately 150KB, 1.5MB, 7.5MB, and 15MB, respectively. It exhibits sequential references, looping references with different loop periods, and other references.

mpeg_play Mpeg_play is an MPEG player from the University of California at Berkeley. The size of the MPEG video file used as input is 5MB. Sequential references dominate in this application.

We used three multiple application traces in our experiments. They are denoted by Multi1, Multi2, and Multi3, and their characteristics are shown in Table 1.

We built simulators for the LRU, 2Q, SEQ, EELRU, and OPT schemes as well as the UBM scheme. Unlike the UBM scheme, which does not have any parameters that need to be tuned, the 2Q, SEQ, and EELRU schemes have one or more parameters whose settings may affect their performance. For example, in the 2Q scheme the parameters are the sizes of the A1in and A1out queues. The parameters of the SEQ scheme are the threshold values used to choose victim blocks among consecutively missed blocks. Finally, for the EELRU scheme, the early and late eviction points, from which a block is replaced, have to be set. In our experiments, we used the values suggested by the authors of each of the schemes.

4.2 Detection Results

Figure 7 shows the classifications resulting from the detection module for the Multi2 trace. The x-axis is the virtual time and the y-axis is the logical block number. The figures are space-time graphs, with Figure 7(a) showing the block references of the whole trace and Figure 7(b) showing how the detection module classified the sequential, looping, and other references from the original references. The results indicate that the UBM scheme accurately classifies the three reference patterns.

4.3 IRG Distribution Comparison of the UBM Scheme with the OPT Scheme

Figure 8 shows the caching behaviors of the OPT and UBM schemes for the Multi2 trace. Compare these results with those shown in Figures 1(b) and 1(c), which use the same trace. Recall that these graphs show the hit counts in the buffer cache using the IRG distributions of referenced blocks. We note that the UBM scheme very closely mimics the behavior of the OPT scheme.

[Figure 8: Caching behaviors of the OPT and UBM schemes (cscope+cpp+postgres trace). (a) OPT block replacement scheme and (b) unified buffer management scheme: hit counts versus inter-reference gap (IRG, x50) for cache sizes of 100, 200, 600, 800, and 1000 blocks.]

4.4 Performance Comparison of the UBM Scheme with Other Schemes

Figure 9 shows the buffer hit ratios of the UBM scheme and the other replacement schemes as a function of cache size with the block size set to 8KB. For most cases, the UBM scheme shows the best performance. Further analysis for each of the schemes is as follows.

The SEQ scheme shows fairly stable performance for all cache sizes. The reason behind its good performance is that it quickly replaces sequentially-referenced blocks that miss in the buffer cache. However, since the scheme does not consider looping references, it shows worse performance than the UBM scheme.

The 2Q scheme shows better performance than the LRU scheme for most cases because it quickly replaces sequentially-referenced blocks and looping-referenced blocks with long loop periods. However, when the cache size is large (caches with 1800 or more blocks for the Multi1 trace, 2800 or more blocks for the Multi2 trace, and 3600 or more blocks for the Multi3 trace), the scheme shows worse performance than the LRU scheme. There are two reasons behind this. First, since the scheme replaces all newly referenced blocks after holding them in the buffer cache for a short time, whenever these blocks are re-referenced, additional misses occur. The ratio of these additional misses to total misses increases as the cache size increases, resulting in a significant impact on the performance of the buffer cache. Second, the scheme does not distinguish between looping-referenced blocks with different loop periods. The performance of the 2Q scheme does not gradually increase with the cache size, but rather surges beyond some cache size and then holds steady. Also, the 2Q scheme exhibits rather anomalous behavior for the Multi2 and Multi3 traces. When the cache sizes are about 2200 blocks and 2800 blocks for Multi2 and Multi3, respectively, the buffer hit ratio of the scheme decreases as the cache size increases. A careful inspection of the results reveals that when the cache size increases, the A1out queue size increases accordingly, and this results in a situation where blocks that should not be promoted are promoted to the main buffer cache, leading to such an anomaly.

The EELRU scheme shows similar or better performance than the LRU scheme as the cache size increases. However, since the scheme chooses a victim block to be replaced based on aggregate recency distributions of referenced blocks, it does not replace quickly enough the sequentially-referenced blocks and the looping-referenced blocks that have long loop periods. Further, like the 2Q scheme, it does not distinguish between looping-referenced blocks with different loop periods. Hence, it does not fare well compared to the UBM scheme.

The LRU scheme shows the worst performance in most cases because it does not give any consideration to the regularities of sequential and looping references.

Finally, the UBM scheme replaces sequentially-referenced blocks quickly and holds the looping-referenced blocks in increasing order of loop periods based on the notion of marginal gains as the cache size increases. Consequently, the UBM scheme improves the buffer hit ratios by up to 57.7% (for the Multi1 trace with 1400 blocks) compared to the LRU scheme, with an average increase of 29.2%.

4.5 Results of Dynamic Buffer Allocation

Figure 10(a) shows the distribution of the buffers allocated to the partitions that hold sequential, looping, and other references as a function of (virtual) time when the buffer cache size is 1000 blocks. Until time 2000, the buffer cache is not fully utilized. Hence, buffers are allocated to any reference that requests them, resulting in the partition that holds sequentially-referenced blocks being allocated a considerable number of blocks. After time 2000, all the buffers have been consumed, and hence, the number of buffers allocated to the partition for sequentially-referenced blocks decreases, while allocations to the partitions for looping and other references start to increase. As a result, at around time 6000, only one buffer is allocated to the partition for sequentially-referenced blocks. From about time 10000, the allocations to the partitions for the looping and other references converge to a steady-state value.

[Figure 9: Performance comparison of the UBM scheme with other schemes. Hit ratio (%) versus cache size (# of blocks) for the Multi1 (cscope+cpp), Multi2 (cscope+cpp+postgres), and Multi3 (cpp+gnuplot+glimpse+postgres) traces under the OPT, UBM, SEQ, 2Q, EELRU, and LRU schemes.]

[Figure 10: Results of dynamic buffer allocation (cache size: 1000 blocks, trace: Multi2). (a) Dynamic buffer allocation: cache blocks assigned to the sequential, looping, and other partitions over virtual time. (b) Expected marginal gain values (time: 20000): marginal gain versus cache size for looping and other references; the curves cross 0.000222 at allocations of 731 and 268 blocks, respectively.]

Figure 10(b) shows the marginal gains as a function of the buffers allocated to looping and other references, calculated at time 20000 of Figure 10(a). Since there are several looping references with different loop periods in the Multi2 trace, we can see from the left graph that the expected marginal gain values of the looping references, MG_loop(n), decrease step-wise as the number of allocated buffers, n, increases. The graph on the right shows the expected marginal gain values of the other references, MG_other(n), which decrease gradually. The UBM scheme dynamically controls the number of buffers allocated to each partition so that the marginal gain values of the two partitions converge to the same value. At time 20000, the two marginal gain values converge to 0.000222, and the numbers of buffers allocated to the partitions for the looping and other references are 731 blocks and 268 blocks, respectively.

4.6 Performance Comparison of the UBM Scheme with Application-Controlled File Caching

The application-controlled file caching (ACFC) scheme [4] is a user-hint based approach to buffer cache management, in contrast to the UBM scheme, which requires no such hints. To compare the performance of these two schemes, we used the ULTRIX multiple application trace (cscope + postgres + linking the kernel) from [4].

Figure 11 shows the buffer hit ratios of the UBM scheme, two ACFC schemes, namely ACFC(HEURISTIC) and ACFC(RMIN), and the LRU, 2Q, SEQ, EELRU, and OPT schemes when the cache size increases from 4MB to 16MB. The ACFC(HEURISTIC) scheme uses user-level hints for each application, while the ACFC(RMIN) scheme uses the optimal off-line strategy for each application. The results for the two ACFC schemes were borrowed from [4], while the results for all the other schemes were obtained from simulations. The results show that the hit ratios of the UBM scheme, which does not make use of any external hints, are comparable to those of the ACFC(RMIN) scheme and higher than those of the ACFC(HEURISTIC) scheme.

[Figure 11: Performance comparison of the UBM scheme with the application-controlled file caching scheme. Hit ratio (%) versus cache size (4MB-16MB) for the ULTRIX (cscope + linking the kernel + postgres) trace under the OPT, UBM, ACFC(RMIN), ACFC(HEURISTIC), 2Q, EELRU, SEQ, and LRU schemes.]

4.7 Warm/Cold Caching Effects

All experiments so far were done with cold caches. To evaluate the performance of the UBM scheme at steady state, we performed additional simulations with warmed-up caches. In the experiments, we initially ran a long-run workload (sdet²) through the buffer cache and, after the cache was warmed up, we ran both the long-run workload and the target workload (cscope + cpp + postgres) concurrently. The cache statistics were collected only after the cache was warmed up.

² Sdet is the SPEC SDET benchmark that simulates a multiprogramming environment.

Experimental results that show the warm/cold caching effects of the UBM scheme are given in Figure 12 and Table 2. The graphs of Figure 12 show that when the cache size is small, the performance improvements by the UBM scheme with warmed-up caches over the LRU scheme are similar to those with cold caches. As the cache size increases, however, the performance increase by the UBM scheme with warmed-up caches is reduced when compared with the performance increase with cold caches, since sequential references are not allowed to be cached at all. Specifically, when the cache is warmed up, blocks belonging to sequential references are not allowed to be cached. If those blocks are re-referenced with a regular interval in the future (i.e., if they turn into looping references), the UBM scheme has to reread them from the disks. In cold caches, however, many of them are reread from the cache partition that holds sequentially-referenced blocks because when the cache is cold, more than one block can be allocated to the partition for the sequentially-referenced blocks.

The resulting performance degradation, however, is not significant, as we can see from Table 2, which summarizes the performance improvements by the UBM scheme for both cold caches and warmed-up caches: the difference in the average performance improvement between cold caches and warmed-up caches is less than 1%. Although the overall performance degradation is negligible, the additional misses may adversely affect the performance perceived by the user due to increased start-up time. As future work, we plan to explore an allocation scheme where blocks are allocated even for sequential references based on the probability that a sequential reference will turn into a looping reference.

5 Implementation of UBM in the FreeBSD Operating System

The UBM scheme was integrated into the FreeBSD operating system running on a 133MHz Intel Pentium PC with 128MB RAM and a 1.6GB Quantum Fireball hard disk. For the experiments, we used five real applications (cpp, cscope, glimpse, postgres, and mpeg_play) that were explained in Section 4. We ran several combinations of three or more of these applications concurrently and measured the elapsed time of each application under the UBM and SEQ schemes, as well as under the built-in LRU scheme, with cache sizes of 8MB, 12MB, and 16MB and the block size set to 8KB.

5.1 Performance Measurements

Figure 13 shows the elapsed time of each individual application under the UBM, SEQ, and LRU schemes. As expected, the UBM scheme shows better performance than the LRU scheme. In the figure, since the postgres and cscope applications access large files repeatedly with a regular interval, they show better improvement in the elapsed time than the other applications. On the other hand, the mpeg_play application does not show as much improvement because it accesses a large video file sequentially.

[Figure 12: The cold/warm caching effects of the UBM scheme in trace-driven simulations (trace: cscope+cpp+postgres+sdet). Hit ratio (%) versus cache size (# of blocks) under the OPT, UBM, SEQ, 2Q, EELRU, and LRU schemes (a) with cold caches and (b) with warmed-up caches.]

Overall, the UBM scheme reduces the elapsed time by up to 67.2% (observed for the cpp + postgres + cscope + mpeg_play case with a 16MB cache) compared to the LRU scheme, with an average improvement of 28.7%. We note that the improvements by the UBM scheme in the elapsed time are comparable to those in the buffer hit ratios that we observed in the previous section.

There are two main benefits from using the UBM scheme. The first is from detecting looping references, managing them by a period-based replacement policy, and allocating buffer space to them based on marginal gains. The second is from giving preference to blocks belonging to sequential references when a replacement is needed. To quantify these benefits, we compared the UBM scheme with the SEQ scheme. The results in Figure 13 show that there is still a substantial difference in the elapsed time between the UBM scheme and the SEQ scheme, indicating that the benefit from carefully handling looping references is significant.

[Figure 13: Performance of the UBM scheme integrated into FreeBSD. Elapsed time (seconds) of each application under the LRU, SEQ, and UBM schemes with 8MB, 12MB, and 16MB caches for the cpp+postgres+cscope, glimpse+cpp+postgres+cscope, and cpp+postgres+cscope+mpeg_play combinations.]

Table 2: Comparison of performance improvements of the UBM scheme compared to the LRU scheme (trace: cscope+cpp+postgres+sdet).

  Improvements with cold caches    Improvements with warmed-up caches
  avg. 19.6% (max. 26.2%)          avg. 18.9% (max. 22.6%)

5.2 Effects of Run-Time Overhead of the UBM Scheme

To measure the run-time overhead of the UBM scheme, we executed a CPU-bound application (cpu_bound) that executes an add instruction repeatedly, along with the other applications.

Figure 14 shows the run-time overhead of the UBM scheme for two combinations of applications, namely, cpu_bound + cpp + postgres + cscope and cpu_bound + glimpse + cpp + postgres + cscope, with a 12MB cache. The elapsed time of the cpu_bound application increases slightly, by around 5%, when using the UBM scheme compared with that when using the LRU scheme. A major source of this overhead is the operations that manipulate the ghost buffers and calculate the marginal gain values. Currently, the UBM scheme is integrated into FreeBSD in a straightforward manner without any optimization, and hence we expect that there is still much room for further optimizing the performance.

The UBM scheme also has the space overhead of maintaining ghost buffers in the kernel memory. The maximum size of the LRU stack to be maintained, including ghost buffers, is limited by the total number of buffers in the file system. Therefore, the space overhead due to ghost buffers is proportional to the difference between the total number of buffers in the system and the number of buffers allocated for other references. In the current implementation, each ghost buffer requires 13 bytes.

[Figure 14: Run-time overhead of the UBM scheme. Elapsed time (seconds) under the LRU and UBM schemes with a 12MB cache for the cpu_bound+cpp+postgres+cscope and cpu_bound+glimpse+cpp+postgres+cscope combinations.]

6 Conclusions and Future Work

This paper starts from the observation that the widely used LRU replacement scheme does not make use of the regularities present in the reference patterns of applications, leading to degraded performance. The Unified Buffer Management (UBM) scheme is proposed to resolve this problem. The UBM scheme automatically detects sequential and looping references and stores the detected blocks in separate partitions of the buffer cache. These partitions are managed by appropriate replacement schemes based on the properties of their detected patterns. The allocation problem among the partitions is also tackled with the use of the notion of marginal gains.

To evaluate the performance of the UBM scheme, experiments were conducted using both trace-driven simulations with multiple application traces and an implementation of the scheme in the FreeBSD operating system. Both simulation and implementation results show that 1) the UBM scheme accurately detects almost all of the sequential and looping references, 2) the UBM scheme manages sequentially- and looping-referenced blocks similarly to the OPT scheme, and 3) the UBM scheme shows substantial performance improvements, increasing the buffer hit ratio by up to 57.7% (with an average increase of 29.2%) and reducing, in an actual implementation in the FreeBSD operating system, the elapsed time by up to 67.2% (with an average of 28.7%) compared to the LRU scheme, for the workloads we considered.

As future research, we are attempting to apply to other references the Least Recently/Frequently Used (LRFU) scheme, which is based on both recency and frequency factors, rather than the LRU scheme, which is based on the recency factor only. Also, as automatic detection of sequential and looping references is possible, we are investigating the possibility of further enhancing performance through prefetching techniques that exploit these regularities, as was attempted in [10] for informed prefetching and caching. Finally, we plan to extend the techniques presented in this paper to systems that integrate virtual memory and file cache management.

Acknowledgements

The authors are grateful to the anonymous reviewers for their constructive comments and to F. Douglis, our "shepherd", for his guidance and help during the revision process. The authors also want to thank P. Cao for providing the ULTRIX trace used in the experiments.

References

[1] J. T. Robinson and M. V. Devarakonda. Data Cache Management Using Frequency-Based Replacement. In Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 134-142, 1990.

[2] E. J. O'Neil, P. E. O'Neil, and G. Weikum. The LRU-K Page Replacement Algorithm for Database Disk Buffering. In Proceedings of the 1993 ACM SIGMOD Conference, pages 297-306, 1993.

[3] T. Johnson and D. Shasha. 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. In Proceedings of the 20th International Conference on VLDB, pages 439-450, 1994.

[4] P. Cao, E. W. Felten, and K. Li. Application-Controlled File Caching Policies. In Proceedings of the USENIX Summer 1994 Technical Conference, pages 171-182, 1994.

[5] V. Phalke and B. Gopinath. An Inter-Reference Gap Model for Temporal Locality in Program Behavior. In Proceedings of the 1995 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 291-300, 1995.

[6] D. Lee, J. Choi, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. On the Existence of a Spectrum of Policies that Subsumes the Least Recently Used (LRU) and Least Frequently Used (LFU) Policies. In Proceedings of the 1999 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 134-143, 1999.

[7] E. G. Coffman, Jr. and P. J. Denning. Operating Systems Theory. Prentice-Hall International Editions, 1973.

[8] G. Glass and P. Cao. Adaptive Page Replacement Based on Memory Reference Behavior. In Proceedings of the 1997 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 115-126, 1997.

[9] Y. Smaragdakis, S. Kaplan, and P. Wilson. EELRU: Simple and Effective Adaptive Page Replacement. In Proceedings of the 1999 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 122-133, 1999.

[10] R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed Prefetching and Caching. In Proceedings of the 15th Symposium on Operating System Principles, pages 1-16, 1995.

[11] D. Thiebaut, H. S. Stone, and J. L. Wolf. Improving Disk Cache Hit-Ratios Through Cache Partitioning. IEEE Transactions on Computers, 41(6):665-676, 1992.

[12] C. Faloutsos, R. Ng, and T. Sellis. Flexible and Adaptable Buffer Management Techniques for Database Management Systems. IEEE Transactions on Computers, 44(4):546-560, 1995.

[13] J. Choi, S. Cho, S. H. Noh, S. L. Min, and Y. Cho. Analytic Prediction of Buffer Hit Ratios. IEE Electronics Letters, 36(1):10-11, 2000.

[14] J. R. Spirn. Program Behavior: Models and Measurements. New York: Elsevier-North Holland, 1977.