

[IEEE 2010 3rd International Symposium on Computational Intelligence and Design (ISCID) - Hangzhou, China (2010.10.29-2010.10.31)] 2010 International Symposium on Computational Intelligence and Design

A Semantic-aware Cache Prefetching Mechanism for Disk Array

Zhiqiang Liu1, Lifang Wang2, Zhike Zhang2, Aihua Zhang1, Zejun Jiang2
1. College of Software and Microelectronics, Northwestern Polytechnical University, Xi'an, China, 710072
2. School of Computer Science and Technology, Northwestern Polytechnical University, Xi'an, China, 710072

Email: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—Cache prefetching is a highly effective technique for improving the I/O performance of disk arrays. A semantic-aware cache prefetching mechanism is designed to improve the cache hit rate of a disk array. The mechanism can identify important properties of each block on the disk, and utilize that information to determine whether the block is live or dead. The allocation status (live or dead) of a block is one kind of semantic information about blocks in a disk array. Knowledge of this semantic information enables the mechanism to selectively cache only live blocks. Simulation experiments have been carried out to evaluate the performance of the semantic-aware cache prefetching mechanism. The results show that the proposed mechanism gives a better hit rate than a mechanism without semantic information for a large number of read requests. The mechanism is especially efficient when the disk capacity utilization rate is low.

Keywords-storage system; disk array; cache algorithm; prefetching

A disk array possesses gigabytes of RAM, called the buffer cache, for caching disk blocks. A critical problem in improving a disk array's I/O performance is to design an effective block prefetching and replacement algorithm for the buffer cache. The cache prefetching [1] algorithm of a disk array is one of the key factors affecting the array's performance. The main motivation for prefetching is to overlap computation with I/O and thus reduce the exposed latency of I/Os [2].

The buffer cache is a very restricted and expensive resource of a disk array. There is no separate space or replacement policy allocated for the prefetch cache [3]: prefetched disk blocks are stored in the buffer cache. All subsequent requests for a block are handled by this cache, and there is no disk read for the block until it is evicted from the cache. Block prefetching can therefore compete for buffer cache entries, so it is important to design a suitable prefetching algorithm that improves the cache utilization rate. Some prefetching techniques use access patterns [4] to predict what data to prefetch, while others prefetch sequential data [5]. Both utilize features of the I/Os to decide what data, and how much data, to prefetch. However, they cannot assess the value of the prefetched blocks or their relationship to the blocks that have already been read. To solve this problem, we propose a cache prefetching algorithm for RAID based on semantic information.

I. SEMANTIC INFORMATION

Storage access protocols such as SCSI and ATA/IDE export a narrow, block-based interface with simple read and write APIs that access logical block addresses. A storage controller internally maps each logical block address to a physical sector. The main disadvantage of the block-based interface is the absence of higher-level semantics.

Compared with a typical storage device, the key advantage of a semantically-smart storage system [6-8] is its ability to identify and utilize important properties of each block on the disk. For instance, given a block, the system can identify whether it is an inode or a data block; if it is an inode block, the system can identify the individual fields in the inode. These properties of each block, extracted from the disk, constitute its semantic information.

How a RAID system gains the semantic information of each block on disk is a challenge. This semantic information can be acquired through two approaches: one is an offline technique, and the other is an online technique [6].

EOF (Extraction of File systems) is an offline tool that gains knowledge of a file system's data structures. The technique EOF uses is isolation combined with known patterns. For example, if two blocks (one a data block and the other an inode) are written to disk during a given test, we can identify each as follows. First, we fill the data block with a known pattern; by monitoring the contents of all written blocks, the storage system can detect such data blocks. Then, because the disk knows that only an inode and a data block are written for the given workload, it can isolate the inode block: it is the block that is not filled with the known pattern. In this manner, EOF can acquire a remarkable amount of detailed information about the on-disk structure of the file system.
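The known-pattern test above can be sketched as follows (a simplified sketch; the block size, the pattern, and the function name are our own illustrative assumptions, not details of the EOF tool):

```python
# Simplified sketch of EOF's isolation-with-known-patterns test. During a
# controlled workload, exactly two blocks are written: a data block we
# filled with KNOWN_PATTERN, and the file's inode block. Any written block
# matching the pattern must be the data block; the remaining block is
# therefore the inode.

BLOCK_SIZE = 4096
KNOWN_PATTERN = bytes([0xAB]) * BLOCK_SIZE  # pattern written into the data block

def classify_written_blocks(written):
    """Map block address -> 'data' or 'inode' for the two-block test write."""
    return {
        addr: ("data" if contents == KNOWN_PATTERN else "inode")
        for addr, contents in written.items()
    }
```

For example, if the monitored writes were a patterned block at address 800 and an opaque metadata block at address 12, the classifier labels 800 as the data block and 12 as the inode.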

The online technique comprises classification, association, and operation inferencing. With classification, the disk system examines the block address of a read or write request and uses it to determine the type of the block. The association technique connects related blocks in a simple and efficient manner. Through operation inferencing, the storage system can infer the types of higher-level "operations" invoked by the file system above.
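A minimal sketch of the classification step, assuming static knowledge of an ext2-like on-disk layout (the region boundaries below are illustrative, not taken from the paper):

```python
# Classification: the block address alone, checked against the known
# on-disk layout of the file system, determines the block's type.

# (start, end, type) triples for one hypothetical block group; ends are exclusive.
LAYOUT = [
    (0, 1, "superblock"),
    (1, 2, "group descriptors"),
    (2, 3, "block bitmap"),
    (3, 4, "inode bitmap"),
    (4, 132, "inode table"),
    (132, 32768, "data"),
]

def classify(block_addr):
    """Return the type of the block at block_addr, per the static layout."""
    for start, end, block_type in LAYOUT:
        if start <= block_addr < end:
            return block_type
    raise ValueError(f"address {block_addr} outside the known layout")
```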

II. SEMANTIC-AWARE CACHE PREFETCHING MECHANISM

978-0-7695-4198-3/10 $26.00 © 2010 IEEE. DOI 10.1109/ISCID.2010.112

The semantic-aware cache prefetching mechanism consists of three modules: the extracting semantic information module, the prefetching module, and the cache management module. The extracting semantic information module picks up the liveness information of blocks. The prefetching module deals with the details of data prefetching, such as how much data to prefetch and when to prefetch. The cache management module preserves useful prefetched data until the I/O requests for those data arrive.

A fundamental piece of information required by the semantic-aware cache prefetching mechanism is the liveness information [9] of a block. A live block is a disk block that contains valid data, i.e., data that is accessible through the file system.

The extracting semantic information module extracts the file-system semantic information of the data stored in the RAID. The module adopts the implicit detection approach [9], so the storage system can infer liveness information efficiently underneath a range of file systems, without any change to the storage interface.
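One simplified way to picture this inference (our sketch, not the implicit detection algorithm of [9]): the storage system snoops writes to the file system's block allocation bitmap and treats a block as live iff its bit is set in the most recently observed bitmap contents.

```python
# Sketch: infer block liveness by observing writes to the file system's
# block allocation bitmap (whose address is known from the on-disk layout).
# A block is considered live iff its bit is set in the last bitmap seen.

class LivenessTracker:
    def __init__(self, bitmap_addr):
        self.bitmap_addr = bitmap_addr
        self.bitmap = b""           # most recently observed bitmap contents

    def observe_write(self, addr, contents):
        """Called for every write the storage system sees."""
        if addr == self.bitmap_addr:
            self.bitmap = contents

    def is_live(self, block_no):
        byte_idx, bit_idx = divmod(block_no, 8)
        if byte_idx >= len(self.bitmap):
            return False            # no bitmap observed yet: assume dead
        return bool((self.bitmap[byte_idx] >> bit_idx) & 1)
```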

The prefetching module determines when to prefetch and how many blocks to prefetch; the efficiency of this module therefore determines the efficiency of the semantic-aware cache prefetching mechanism. Prefetching is beneficial for sequential accesses to a file, i.e., accesses to consecutive blocks of that file. When a file is not accessed sequentially, prefetching can result in extra I/Os by reading data that is never used. For this reason, it is critical to make a best guess as to whether future accesses will be sequential, and to decide accordingly whether to perform prefetching. Many attempts have been made to detect sequential I/O requests for cache prefetching algorithms, and these works are comprehensive and in-depth, so we do not discuss the details here.

The prefetching module is activated when an I/O request misses in the cache, and prefetching is only used for read accesses. A summary of the prefetching module is presented in Figure 1. When a read request arrives, the prefetching module reads the on-demand accessed block and prefetches a number of blocks following it. The blocks that are prefetched are also referred to as a read-ahead group. Only a subset of the prefetched blocks may be live, so caching the whole read-ahead group may result in suboptimal cache space utilization. With knowledge about the liveness of blocks in the disk array, the prefetching module selectively caches only those live blocks. This is called synchronous prefetching, as the prefetched blocks are read along with the on-demand accessed block. The number of live blocks in the read-ahead group determines how many blocks are cached per prefetch.
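The synchronous path can be sketched as follows (the function and parameter names are ours; `is_live` stands in for the extracting semantic information module):

```python
# Synchronous prefetching sketch: on a cache miss, read the demanded block
# together with its read-ahead group, but insert only the live blocks into
# the cache, so dead blocks never consume cache entries.

def synchronous_prefetch(demand, group_size, is_live, read_blocks, cache):
    """Read `demand` plus the following `group_size` blocks.

    is_live(b)        -> liveness from the semantic information module
    read_blocks(list) -> {block: data} read from disk in one request
    cache             -> dict-like block cache
    Returns the read-ahead group, kept for later sequentiality checks.
    """
    group = list(range(demand + 1, demand + 1 + group_size))
    data = read_blocks([demand] + group)
    cache[demand] = data[demand]          # the demanded block is always cached
    for b in group:
        if is_live(b):                    # selective caching: live blocks only
            cache[b] = data[b]
    return group
```

With a demand block 10, a group of 4, and (say) only even-numbered blocks live, the read-ahead group is [11, 12, 13, 14] but only blocks 10, 12, and 14 end up cached.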

The next read request may or may not be sequential with respect to the previous one. If the request accesses a block that the prefetching module has not already prefetched, i.e., the block is not in the previous read request's read-ahead group, the module concludes that prefetching was not useful and falls back to the conservative prefetching described above. However, if the block hits in the previous read-ahead group, the prefetching module performs more aggressive prefetching. The size of the previous read-ahead group is doubled to determine the number of blocks (N) to be prefetched on this access. However, N is never increased beyond a pre-specified maximum (usually 32 blocks). The prefetching module then attempts to read the N contiguous blocks that follow the blocks in the previous read-ahead group, and caches only those blocks that are live. This is called asynchronous prefetching, as the on-demand block has already been prefetched and the new prefetch requests are issued asynchronously.
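The sizing rule above reduces to a few lines (the 32-block cap is from the text; the initial group size of 4 is our illustrative assumption):

```python
# Read-ahead group sizing: a hit inside the previous read-ahead group
# doubles the next group (aggressive, asynchronous prefetching), capped at
# MAX_GROUP; a miss outside it falls back to the conservative initial size.

MAX_GROUP = 32      # pre-specified maximum from the text
INITIAL_GROUP = 4   # illustrative initial read-ahead size (our assumption)

def next_group_size(prev_size, hit_in_prev_group):
    if hit_in_prev_group:
        return min(prev_size * 2, MAX_GROUP)
    return INITIAL_GROUP
```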

Figure 1. Summary of the prefetching module.

The cache management module determines the size of the read-ahead group and the replacement policy of the prefetch cache. There is no separate space or replacement policy allocated for the prefetch cache. Instead, prefetched data are loaded into the single cache and treated just like regular data. When the cache is full, prefetched data are evicted like the rest of the data, according to the cache replacement policy (for example, LRU or ARC).
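A minimal sketch of such a unified cache with LRU replacement (`OrderedDict` serves as the recency list here; an ARC implementation could sit behind the same interface):

```python
# Unified block cache: prefetched and on-demand blocks share one cache and
# one LRU replacement policy; prefetched data gets no special treatment.

from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block -> data, oldest first

    def __setitem__(self, block, data):
        if block in self.blocks:
            self.blocks.move_to_end(block)
        self.blocks[block] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict LRU victim, prefetched or not

    def get(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # a hit refreshes recency
            return self.blocks[block]
        return None                          # miss
```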

III. PERFORMANCE EVALUATION

In this section, we assess the performance of the semantic-aware cache prefetching mechanism.

Figure 2. Comparison of the hit rate of the cache prefetching mechanisms, with and without semantic information, for cache sizes of 8 to 64 MB.

We focus primarily on the read hit rate of the mechanism for various cache sizes and storage capacity utilization rates.



For experimental evaluation of the mechanism, we used an accurate evaluation system for disk array cache algorithms [10].

In the first experiment, we measure the read hit rate of the mechanism for different cache sizes. The storage capacity of the disk array is 40 GB, the capacity utilization is 30%, and live blocks are distributed randomly on the disk. The experiment performs 100,000 read requests, and each request randomly accesses one of the blocks.

Figure 3. Comparison of the hit rate of the cache prefetching mechanisms, with and without semantic information, for capacity utilization rates from 0 to 100%.

Figure 2 clearly shows that the read hit rate of the semantic-aware cache prefetching mechanism is higher than that of the cache prefetching mechanism without semantic information.

Furthermore, we evaluated the difference between the semantic-aware cache prefetching mechanism and the mechanism without semantic information under varying capacity utilization rates. The simulation environment is as follows: the disk capacity is 40 GB with data distributed randomly in it, the cache size is 16 MB, and 100,000 random reads are executed, each reading one block.

The simulation result in Figure 3 shows that the hit rate of the semantic-aware cache prefetching mechanism is higher than that of the mechanism without semantic information. Figure 3 also shows that when the utilization rate of disk capacity is low, the semantic-aware mechanism's hit rate is substantially better, and the higher the utilization rate, the smaller the hit rate difference between the two mechanisms becomes. When the utilization rate of disk capacity reaches almost 90%, the two mechanisms perform almost the same (the difference is 0.7%).
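The shape of this result can be reproduced qualitatively with a toy Monte-Carlo model (ours, not the evaluation system of [10]; all sizes below are arbitrary): reads target live blocks uniformly at random, and the only difference between the two policies is whether the dead blocks in each read-ahead group are cached.

```python
# Toy model of why the semantic-aware mechanism wins at low utilization:
# blindly caching whole read-ahead groups fills the cache with dead blocks
# that no read will ever request, shrinking the effective cache size.

import random
from collections import OrderedDict

def hit_rate(util, semantic, total=4096, cache_cap=256, group=8,
             reads=20000, seed=7):
    rng = random.Random(seed)
    live = set(rng.sample(range(total), int(total * util)))
    live_list = sorted(live)
    cache = OrderedDict()                     # LRU-managed unified cache

    def insert(b):
        cache[b] = None
        cache.move_to_end(b)
        if len(cache) > cache_cap:
            cache.popitem(last=False)         # evict LRU victim

    hits = 0
    for _ in range(reads):
        b = rng.choice(live_list)             # reads target live blocks
        if b in cache:
            hits += 1
            cache.move_to_end(b)
        else:
            insert(b)
            for p in range(b + 1, b + 1 + group):    # read-ahead group
                if p < total and (not semantic or p in live):
                    insert(p)                 # semantic: live blocks only
    return hits / reads
```

In this toy model the gap between the two policies should be wide at 30% utilization and narrow at 90%, mirroring the trend of Figure 3.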

IV. CONCLUSIONS

The semantic-aware cache prefetching mechanism utilizes the semantic information stored in the storage medium to prefetch data. The semantic information indicates whether a block is live or dead; a dead block is a free, unused block in the medium. The semantic-aware cache prefetching mechanism exploits this knowledge to improve performance transparently, or to enhance functionality, beneath a standard block read/write interface.

The evaluation results show that the semantic-aware cache prefetching mechanism is especially suitable when the utilization rate of the medium capacity is low.

ACKNOWLEDGMENT

This work was supported by Shaanxi Province NSF grants 2009JQ8021 and 2009JM8017, and by Aviation Science Foundation grant 2009ZD53044.

REFERENCES

[1] N. Mary and H. Tan, "Improving iSCSI Memory Cache Hit through Prefetching to a Striped Disk," 2010 Second International Conference on Computer Engineering and Applications, vol. 2, Bali Island: IEEE Computer Society, 2010, pp. 504-508.

[2] S. Seelam, I.-H. Chung, J. Bauer, and H.-F. Wen, "Masking I/O latency using application level I/O caching and prefetching on Blue Gene systems," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) Atlanta, GA, USA: IEEE, 2010, pp. 1-12.

[3] A. R. Butt, C. Gniady, and Y. C. Hu, "The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms," IEEE Transactions on Computers, vol. 56, pp. 889-908, 2007.

[4] G. Soundararajan, M. Mihailescu, and C. Amza, "Context-aware prefetching at the storage server," USENIX 2008 Annual Technical Conference on Annual Technical Conference Boston, Massachusetts: USENIX Association, 2008, pp. 377-390.

[5] M. Li, E. Varki, S. Bhatia, and A. Merchant, "TaP: table-based prefetching for storage caches," Proceedings of the 6th USENIX Conference on File and Storage Technologies, San Jose, California, 2008, pp. 21-26.

[6] A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, L. N. Bairavasundaram, T. E. Denehy, F. I. Popovici, V. Prabhakaran, and M. Sivathanu, "Semantically-smart disk systems: past, present, and future," ACM SIGMETRICS Performance Evaluation Review, vol. 33, pp. 29-35, 2006.

[7] M. Sivathanu, V. Prabhakaran, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, "Improving storage system availability with D-GRAID," ACM Transactions on Storage, vol. 1, pp. 133-170, 2005.

[8] L. N. Bairavasundaram, M. Sivathanu, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, "X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs," Proceedings of the 31st annual international symposium on Computer architecture München, Germany: IEEE Computer Society, 2004.

[9] M. Sivathanu, L. N. Bairavasundaram, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, "Life or death at block-level," Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6 San Francisco, CA: USENIX Association, 2004.

[10] W. Lifang, Z. Xingshe, L. Zhiqiang, and W. Xinmin, "An Accurate Evaluation System for Disk Array Cache Algorithms," Journal of Northwestern Polytechnical University, vol. 27, pp. 721-724, 2009.

