

IEEE TRANSACTIONS ON COMPUTERS, VOL. 40, NO. 7, JULY 1991 801

Asynchronous Disk Interleaving: Approximating Access Delays

Michelle Y. Kim, Member, IEEE, and Asser N. Tantawi, Senior Member, IEEE

Abstract: Disk interleaving, or disk striping, distributes a data block across a group of disks and allows parallel transfer of data. Disk interleaving is achieved by dividing a data block into a number of subblocks and placing each subblock on a separate disk. A subblock can be stored on an interleaved disk at a predetermined location (relative to the adjacent subblocks), or it can be stored at any location on the disk. We consider a system in which adjacent subblocks are placed independently of each other, call it an asynchronous disk interleaving system, and analyze its performance implications. Since each of the disks in such a system is treated independently while being accessed as a group, the access delay of a request for a data block in an n-disk system is the maximum of n access delays. Using approximate analysis, we obtain a simple expression for the expected value of such a maximum delay. The analytic approximation is verified by simulation using trace data; the relative error is found to be at most 6%.

Index Terms: Approximating access delays, asynchronous disk interleaving, simulation of an asynchronous system.

I. INTRODUCTION

With increasing processor speeds and multiprocessor organizations, the processing power of a computer system has improved greatly in recent years. This in turn has allowed the scale of computing problems to grow. Many large-scale problems require the processing of huge amounts of data. If the data array is larger than the size of main memory, it is assumed to be stored on external devices such as disks. The data rates of disks are limited by their mechanical speeds, and this has caused a huge speed mismatch between the processing power and the I/O system. Consequently, problems that were once CPU-bound are quickly becoming I/O-bound. Unless the problem of I/O bandwidth is solved, the dramatically improved processing speeds will not result in the desired speedup of the system.

Disk interleaving (also called disk striping)¹ has been suggested as a means of improving bandwidth to disks for large database systems [2], or for scientific applications [6]. More recently, the effectiveness of disk interleaving in computing a large fast Fourier transform has been demonstrated [SI. Disk interleaving has also been studied [12], [13] as a high-performance I/O system architecture in a multiprocessor system. It has been pointed out [7] that the advantages of disk interleaving are twofold: 1) it enables parallelism and 2) it facilitates uniform distribution of requests over multiple disks. One of the major contributors to the performance problems of today's disk systems is the phenomenon that disk access requests are not uniformly distributed over the disks. Thus, only a small number of disks may be heavily utilized while the rest remain idle. This results in I/O bottlenecks, by which the performance of a disk system may be limited. By interleaving disks, it is possible to achieve a more uniform distribution of access requests, thereby improving the overall system performance.

Manuscript received April 11, 1988; revised February 15, 1991. The authors are with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598. IEEE Log Number 9100992.

¹ A group of disk units is interleaved if each data block is divided into portions and succeeding portions of the same block are stored on different disks.

In an interleaved disk system, a data block may be partitioned into subblocks $S_1, S_2, \ldots, S_n$. Subblock $S_i$ is assigned to disk unit $((i - 1) \bmod n) + 1$, where $n$ is the degree of interleaving. That is, subblock 1 is stored on disk 1, subblock 2 of the same data block is stored on disk 2, and so on. As these subblocks are placed across a group of disks, they may be stored at predetermined locations, say at the same physical location, or be stored independently of each other. The former case is known as a synchronized interleaved disk system, and its performance implications have been analyzed in [7]. We call the latter an asynchronous interleaved system, and study its performance implications in this paper. In an asynchronous interleaved disk system, the disks may be treated asynchronously, or independently of each other, and those subblocks belonging to the same data block are stored independently of each other. As a result, the seek and rotational delays involved in the same transfer will be different for each disk. In order to provide an adequate error-correcting scheme upon failure of a disk, checksums may be placed on separate checksum disks to improve reliability with minimal redundancy. See [7] and [4] for a detailed discussion on reliability issues.
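The assignment rule can be illustrated with a few lines of Python. This is only a sketch: the function names are hypothetical, and it reads the rule as disk(i) = ((i - 1) mod n) + 1 with 1-based subblock and disk numbers, which matches the worked example in the text (subblock 1 on disk 1, subblock 2 on disk 2).

```python
def disk_for_subblock(i: int, n: int) -> int:
    """Return the 1-based disk unit that holds 1-based subblock i
    when the degree of interleaving is n (hypothetical helper)."""
    if i < 1 or n < 1:
        raise ValueError("subblock index and degree of interleaving must be >= 1")
    return ((i - 1) % n) + 1


def place_block(num_subblocks: int, n: int) -> dict:
    """Map every subblock index of one data block to its disk unit."""
    return {i: disk_for_subblock(i, n) for i in range(1, num_subblocks + 1)}


# With n = 8 disks, subblocks 1..8 land on disks 1..8 and subblock 9 wraps to disk 1.
print(place_block(10, 8))
```

The rule fixes only which disk holds a subblock. Where on that disk the subblock is stored is what distinguishes the synchronized and asynchronous schemes.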

Evaluating Average I/O Response Time: The response time of an I/O request consists of five components: 1) queueing delay, 2) seek, 3) latency, 4) RPS (rotational positioning sensing) delay, and 5) data transfer. Queueing delay is the time that a request waits for the disk to become free. Seek delay is the time required to position the access mechanism to the cylinder containing the data. Latency is the time required for the correct data to rotate under the read/write head. The RPS feature allows the rotational positioning sensing of the disks to take place while disconnected from the channel. The penalty for this is that, in the absence of a disk cache, if the channel is not free when the disk is ready, the disk has to make a full revolution before it can reattempt a connection. Data transfer time is the time required to transfer data between the disk and main memory.

0018-9340/91/0700-0801$01.00 © 1991 IEEE


In an asynchronous system there is one queue for each independent disk. It has been shown [7] that in a reasonably well-tuned conventional disk system, a disk queue rarely grows beyond one. An n-disk asynchronous system is similar to an n-disk conventional system in that there is one queueing point for each disk. The average number of requests waiting in a queue in an n-disk asynchronous system will be even smaller than that observed in a conventional system, because the data transfer time is reduced: queueing delay is an increasing function of the delays that occur at the disk, and the time to transfer data in an asynchronous system is a fraction of the time in a conventional system. As will be shown, a major portion of a data block access delay is the synchronization delay, by which we mean the gap between the time the first subblock is accessed and the time the last subblock is available. Although the impact of RPS misses on the performance of a disk system is great, there are implementation techniques that could be used to minimize or eliminate the problem. Hence, we shall ignore the impact of RPS miss delay. Of the remaining delays (seek, latency, and data transfer), the last is a constant and does not change from disk to disk. Thus, we consider seek, latency, and the access delay, which is defined as the sum of seek and latency.

The paper is organized as follows. In Section II, we obtain exact and approximate expressions for the expected access delay in an n-disk asynchronous system. These expressions are validated in Section III by simulating an asynchronously interleaved system. Conclusions appear in Section IV.

II. EVALUATION OF THE EXPECTED ACCESS DELAY

In this section, we evaluate the expected maximum delays of disk access requests in asynchronously interleaved disk systems with n disks. We assume that the delays at each disk are independent and identically distributed (i.i.d.) random variables. Thus, the problem is to evaluate the expected value of the maximum of a set of i.i.d. random variables. In general, it is hard to obtain a closed-form expression for the expected maximum. However, for simple distributions such as the exponential and the uniform distributions one can obtain closed-form expressions, as we show in Section II-A. By assuming that the seek time distribution is exponential and the latency distribution is uniform, we can evaluate the expectation of their maximum delays using these expressions. We also give an approximate expression for the expected maximum and apply it to the normal distribution. This approximation method is used to evaluate the expected maximum of the access delay. Unfortunately, the access delay, which is of prime interest to us, does not have a simple distribution such as exponential, uniform, or normal, whose expected maximum can be easily evaluated using simple expressions. However, by approximating its distribution by a normal distribution, we obtain an approximation for the expected maximum value. One interesting observation, as we shall see later, is that the two extreme assumptions of exponential and uniform cases are shown to provide useful bounds on the relative error of the approximation. In Section II-B, we obtain exact expressions for the expected maximum of the access delay for certain seek and latency distributions and evaluate the error due to our approximation.

A. Analysis

Let $\{X_i, i = 1, 2, \ldots, n\}$ be i.i.d. nonnegative random variables with distribution function $F_X$, mean $\mu_X$, and standard deviation $\sigma_X$. Denote by $X_{\max}(n)$ the random variable which is the maximum of $\{X_i, i = 1, 2, \ldots, n\}$; its distribution is given by

$$F_{X_{\max}(n)}(x) = F_X^n(x). \tag{1}$$

Since the random variable $X_{\max}(n)$ is nonnegative, its expectation is expressed as

$$E[X_{\max}(n)] = \int_0^{\infty} \left(1 - F_X^n(x)\right) dx. \tag{2}$$

It is straightforward to evaluate the above expression for exponential and uniform distributions. Assuming that $X$ is exponentially distributed with mean $\mu_X$, we get

$$E[X_{\max}(n)] = H_n \mu_X, \tag{3}$$

where $H_n = \sum_{k=1}^{n} 1/k$ is the $n$th partial sum of the harmonic series. For large $n$, $H_n$ may be approximated by $c + \ln(n)$, where $c = 0.5772$ is Euler's constant. Note that $E[X_{\max}(n)]$ grows logarithmically with $n$. As an example, if the seek times are exponentially distributed, we can obtain the expectation of the maximum seek, $S_{\max}(n)$, from (3). It is clear that seek time cannot be greater than the time it takes to move the access arm from the innermost to the outermost cylinder. Thus, we bound the value of $H_n$ for seek on an IBM 3380 disk by $30/7.2$ $(= 4.2)$, where 30 ms is the time for the maximum amount of movement across cylinders and 7.2 ms the expected seek time [1].
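As a small numerical illustration of (3), the following sketch evaluates the exact harmonic number, the large-n approximation c + ln(n), and the expected maximum seek capped at the full-stroke seek time. The function names are hypothetical; the 7.2 ms mean and 30 ms cap are the IBM 3380 figures quoted above.

```python
import math
from typing import Optional

EULER_GAMMA = 0.5772  # Euler's constant, as quoted in the text


def harmonic(n: int) -> float:
    """Exact harmonic number H_n = sum_{k=1}^{n} 1/k."""
    return sum(1.0 / k for k in range(1, n + 1))


def harmonic_approx(n: int) -> float:
    """Large-n approximation H_n ~ c + ln(n)."""
    return EULER_GAMMA + math.log(n)


def expected_max_seek(mean_seek: float, n: int,
                      max_seek: Optional[float] = None) -> float:
    """E[S_max(n)] = H_n * mean for exponential seeks, optionally capped at
    the physical maximum seek (equivalent to bounding H_n by max/mean)."""
    value = harmonic(n) * mean_seek
    if max_seek is not None:
        value = min(value, max_seek)
    return value


for n in (2, 4, 8, 16, 64):
    print(n, round(harmonic(n), 4), round(harmonic_approx(n), 4),
          round(expected_max_seek(7.2, n, max_seek=30.0), 3))
```

With a 7.2 ms mean, the cap only becomes active once H_n exceeds 30/7.2, that is, for fairly large n.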

Assuming that $X$ is uniformly distributed in the interval $[0, 2\mu_X]$, we get

$$E[X_{\max}(n)] = 2\mu_X \frac{n}{n+1}. \tag{4}$$

Note that the expectation of $X_{\max}(n)$ increases slowly with $n$ and it reaches a constant, namely $2\mu_X$, as $n \to \infty$. Let latency times be uniformly distributed between 0 and 16.6 ms, the maximum rotational latency of the disk. By substituting $2\mu_X = 16.6$ ms into (4), we obtain the expected maximum latency.
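The uniform case can be checked with a short simulation. The sketch below compares the closed form for the maximum of n uniform variates on [0, upper], namely upper * n / (n + 1), with a Monte Carlo estimate. The function names, trial count, and seed are arbitrary choices made for illustration.

```python
import random


def expected_max_uniform(upper: float, n: int) -> float:
    """Closed form for the expected maximum of n i.i.d. U[0, upper] variates."""
    return upper * n / (n + 1)


def simulated_max_uniform(upper: float, n: int,
                          trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity (seeded for repeatability)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0.0, upper) for _ in range(n))
    return total / trials


REV_MS = 16.6  # maximum rotational latency quoted in the text
for n in (2, 4, 8, 16):
    print(n, round(expected_max_uniform(REV_MS, n), 3),
          round(simulated_max_uniform(REV_MS, n), 3))
```

For n = 8 the closed form gives 16.6 * 8/9, roughly 14.76 ms.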

For general distribution functions $F_X$, the integration in (2) is not straightforward. However, $E[X_{\max}(n)]$ may be approximated by a quantity known as the characteristic maximum of the random variable $X$, which we denote by $x_n$ [9], [3]; it is defined by

$$x_n = \min\{x : 1 - F_X(x) \le 1/n\}.$$

For continuous functions, the characteristic maximum is obtained from the equation

$$F_X(x_n) = 1 - \frac{1}{n}.$$

In terms of $x_n$, $E[X_{\max}(n)]$ may be approximated by [5]

$$E[X_{\max}(n)] \approx x_n + n \int_{x_n}^{\infty} \left(1 - F_X(y)\right) dy,$$

which has a lower bound $x_n$. In other words, the expectation of the maximum of $n$ i.i.d. random variables can be approximated by the characteristic maximum:

$$E[X_{\max}(n)] \approx x_n. \tag{6}$$
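For a continuous distribution, the characteristic maximum can be found numerically by solving F(x_n) = 1 - 1/n. The following sketch does this by bisection for any nondecreasing distribution function on the nonnegative axis and uses the exponential distribution as a check, where the solution is known to be mean * ln(n). The helper names and tolerance are illustrative assumptions.

```python
import math


def characteristic_maximum(cdf, n: int, tol: float = 1e-10) -> float:
    """Solve cdf(x) = 1 - 1/n by bisection (cdf continuous, nondecreasing,
    defined for x >= 0)."""
    target = 1.0 - 1.0 / n
    lo, hi = 0.0, 1.0
    while cdf(hi) < target:      # grow the bracket until it contains x_n
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def exponential_cdf(mean: float):
    """Distribution function of an exponential variate with the given mean."""
    return lambda x: 1.0 - math.exp(-x / mean)


mean, n = 7.2, 8
x_n = characteristic_maximum(exponential_cdf(mean), n)
exact = mean * sum(1.0 / k for k in range(1, n + 1))   # H_n * mean, from (3)
print(round(x_n, 4), round(mean * math.log(n), 4), round(exact, 4))
```

In the exponential case the characteristic maximum (mean * ln n) sits below the exact expectation (mean * H_n), in line with its role as a lower bound of the refined approximation.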

As an example, if $X$ is normal with mean $\mu_X$ and standard deviation $\sigma_X$, then for $n > 4$, the characteristic maximum may be approximated by [9]

$$x_n \approx \mu_X + \sigma_X \sqrt{2 \log n}.$$

Therefore, from (6) we have

$$E[X_{\max}(n)] \approx \mu_X + \sigma_X \sqrt{2 \log n}, \tag{7}$$

which is a good approximation for normal distributions and found to be valid for a class of general distributions. Note that E[Xmax(n)] grows as the square root of log n.

Expressions for the expectation of $X_{\max}(n)$ are given in (3), (4), and (7) for exponential, uniform, and normal distributions, respectively. We note that they all have the form

$$E[X_{\max}(n)] = \mu_X + \sigma_X G(n),$$

where $G(n)$ is a function of $n$ which depends on the distribution $F_X$ in an interesting fashion, as illustrated in Fig. 1.

$G(n)$ may be interpreted as the expectation of the maximum of i.i.d. random variables with zero mean and unit variance. For exponential distributions, $G(n) = H_n - 1$, which grows logarithmically with $n$. For normal distributions, $G(n) = \sqrt{2 \log n}$, which grows at a slower rate, as the square root of the logarithm. Finally, for uniform distributions, $G(n) = \sqrt{3}\,(n - 1)/(n + 1)$, which grows very slowly with $n$ and goes to $\sqrt{3}$ as $n \to \infty$. A tight upper bound on $G(n)$ is given by [3]

$$G(n) \le \frac{n - 1}{\sqrt{2n - 1}},$$

which is illustrated in the figure. Note that the curve for the normal distribution is valid only for n > 4. Otherwise, one could use the upper bound.
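The three growth functions and the upper bound can be tabulated with a short script. Two points in this sketch are assumptions rather than statements from the text. First, the bound is taken to be the classical (n - 1)/sqrt(2n - 1) bound for standardized i.i.d. variables. Second, the base of "log" in the normal case is not stated; base 10 is used by default because it reproduces the value 1.34 that the paper quotes for n = 8, whereas the natural logarithm gives about 2.04.

```python
import math


def harmonic(n: int) -> float:
    return sum(1.0 / k for k in range(1, n + 1))


def g_exponential(n: int) -> float:
    """G(n) = H_n - 1 for the exponential distribution."""
    return harmonic(n) - 1.0


def g_uniform(n: int) -> float:
    """G(n) = sqrt(3) (n - 1) / (n + 1) for the uniform distribution."""
    return math.sqrt(3.0) * (n - 1) / (n + 1)


def g_normal(n: int, log=math.log10) -> float:
    """G(n) = sqrt(2 log n); the log base is an assumption (see lead-in)."""
    return math.sqrt(2.0 * log(n))


def g_upper_bound(n: int) -> float:
    """Assumed classical bound (n - 1) / sqrt(2n - 1)."""
    return (n - 1) / math.sqrt(2 * n - 1)


for n in (2, 4, 8, 12, 16):
    print(n, round(g_exponential(n), 3), round(g_normal(n), 3),
          round(g_uniform(n), 3), round(g_upper_bound(n), 3))
```

At n = 8 the exponential, base-10 normal, and uniform values all stay below the bound, while the natural-log variant of the normal expression exceeds it, which is another reason to treat the log base with care.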

Fig. 1. $G(n)$ versus $n$.

B. Examples

We consider two cases: 1) exponential seek and uniform latency and 2) uniform seek and uniform latency.

The access delay $Z$ is the sum of seek and latency, $Z = S + L$. The probability density function of $Z$ is therefore obtained by convolving the probability density functions of seek and latency as

$$f_Z(z) = \int_0^{z} f_S(s)\, f_L(z - s)\, ds. \tag{8}$$

Substituting the probability density functions $f_S$ and $f_L$ into (8) yields the probability density function of the access delay, which by integration and substitution into (2),

$$E[Z_{\max}(n)] = \int_0^{\infty} \left(1 - F_Z^n(t)\right) dt, \tag{9}$$

gives the expected maximum access delay. Although the integration may not be straightforward and the result may not be in a closed form, the above equations can be evaluated for some simple distributions.

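1) Exponential Seek and Uniform Latency: We assume that the seek time is exponentially distributed with mean $1/\lambda$ and latency is uniformly distributed in the range $[0, a]$. Thus, we have $f_S(s) = \lambda e^{-\lambda s}$, $s \ge 0$, and $f_L(l) = 1/a$, $0 \le l \le a$. In order to perform the integration in (8), we divide the range of $z$ into two regions, $z \le a$ and $z > a$, and obtain

$$f_Z(z) = \begin{cases} \dfrac{1}{a}\left(1 - e^{-\lambda z}\right), & z \le a, \\[2mm] \dfrac{1}{a}\, e^{-\lambda z}\left(e^{\lambda a} - 1\right), & z > a. \end{cases}$$

By integrating the above expression we obtain the distribution function of $Z$, which we denote by $F_Z(t) = \int_0^t f_Z(z)\, dz$; it is given by

$$F_Z(t) = \begin{cases} \dfrac{t}{a} - \dfrac{1 - e^{-\lambda t}}{\lambda a}, & t \le a, \\[2mm] 1 - \dfrac{e^{-\lambda t}\left(e^{\lambda a} - 1\right)}{\lambda a}, & t > a. \end{cases} \tag{10}$$

Substituting (10) into (9) yields

$$E[Z_{\max}(n)] = \cdots \tag{11}$$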


We note from (11) that the exact expression for $E[Z_{\max}(n)]$ is given in terms of finite sums and is easily evaluated numerically for various values of $n$. In Fig. 2 we plot $E[Z_{\max}(n)]$, the expected maximum access delay, against $n$, the number of disks. We assume that the mean values of seek and latency are 7.09 and 8.333 ms, respectively. Also, we plot the approximate expression for the expected maximum access delay which we obtained by approximating the distribution of the access delay by a normal distribution, as given in (7). Note that the approximation is valid only for $n > 4$. The relative error of the approximation is plotted in Fig. 3. From Fig. 3, the relative error for $n = 8$ is approximately 7.6%.
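The comparison described above can be reproduced numerically without the closed-form sum. The sketch below integrates 1 - F_Z(t)^n by the trapezoidal rule, using the distribution function of an exponential seek plus a uniform latency obtained by integrating the convolution, and compares the result with the normal approximation. Several choices here are assumptions: the helper names, the integration limit and step count, and the use of a base-10 logarithm in the approximation (the base that matches the value 1.34 quoted in the paper for n = 8).

```python
import math


def cdf_access_delay(t: float, mean_seek: float, a: float) -> float:
    """F_Z(t) for Z = S + L, S exponential with the given mean, L ~ U[0, a]."""
    lam = 1.0 / mean_seek
    if t <= 0.0:
        return 0.0
    if t <= a:
        return t / a - (1.0 - math.exp(-lam * t)) / (lam * a)
    # Same expression as exp(-lam*t) * (exp(lam*a) - 1) / (lam*a), rearranged.
    return 1.0 - (math.exp(-lam * (t - a)) - math.exp(-lam * t)) / (lam * a)


def expected_max(cdf, n: int, upper: float, steps: int = 20000) -> float:
    """Trapezoidal estimate of the integral of 1 - F(t)^n over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (1.0 - cdf(i * h) ** n)
    return total * h


def normal_approx(mean: float, std: float, n: int) -> float:
    """mean + std * sqrt(2 log n), with log taken as base 10 (assumption)."""
    return mean + std * math.sqrt(2.0 * math.log10(n))


MEAN_SEEK = 7.09          # ms, from the text
REV = 2 * 8.333           # ms, so that the mean latency is 8.333 ms
n = 8
exact = expected_max(lambda t: cdf_access_delay(t, MEAN_SEEK, REV), n, upper=300.0)
mean_z = MEAN_SEEK + REV / 2
std_z = math.sqrt(MEAN_SEEK ** 2 + REV ** 2 / 12)   # independent seek and latency
approx = normal_approx(mean_z, std_z, n)
rel_err = abs(approx - exact) / exact
print(round(exact, 3), round(approx, 3), round(100 * rel_err, 2))
```

Under these assumptions the relative error for n = 8 comes out at roughly 7.6%, in agreement with the figure quoted in the text.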

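2) Uniform Seek and Uniform Latency: Assume that both $S$ and $L$ are uniformly distributed in the ranges $[0, b]$ and $[0, a]$, respectively. Their probability density functions $f_S$ and $f_L$ are given by $f_S(s) = 1/b$, $0 \le s \le b$, and $f_L(l) = 1/a$, $0 \le l \le a$, respectively. By substituting these equations into (8) we obtain the probability density function $f_Z$ of the access delay:

$$f_Z(z) = \begin{cases} \dfrac{z}{ab}, & 0 \le z \le a, \\[2mm] \dfrac{1}{b}, & a \le z \le b, \\[2mm] \dfrac{(a + b) - z}{ab}, & b \le z \le a + b. \end{cases}$$

By integrating $f_Z(z)$ we obtain the distribution function of $Z$ as

$$F_Z(t) = \begin{cases} \dfrac{t^2}{2ab}, & 0 \le t \le a, \\[2mm] \dfrac{1}{b}\left(t - \dfrac{a}{2}\right), & a \le t \le b, \\[2mm] 1 - \dfrac{\left((a + b) - t\right)^2}{2ab}, & b \le t \le a + b. \end{cases}$$

Substituting the above equation into (9) yields

$$E[Z_{\max}(n)] = \int_0^{a+b} \left(1 - F_Z^n(t)\right) dt = A + B + C,$$

where

$$A = \int_0^{a} \left(1 - \left(\frac{t^2}{2ab}\right)^{n}\right) dt,$$

$$B = \int_a^{b} \left(1 - \left(\frac{1}{b}\left(t - \frac{a}{2}\right)\right)^{n}\right) dt,$$

and

$$C = \int_b^{a+b} \left(1 - \left(1 - \frac{\left((a + b) - t\right)^2}{2ab}\right)^{n}\right) dt.$$

By summing we get

$$E[Z_{\max}(n)] = \cdots \tag{12}$$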

EXWNENllAL SEEK + UNIFORM LATENCY r

4 8 18 12

Fig. 2.

MPONEKnAL SEEK + UNIFORM LAlENCY

I I I I I I I 4 O n 12 16

Fig. 3.

where a = 9. In Fig. 4 we plot $E[Z_{\max}(n)]$ for various values of $n$, with mean values of seek and latency equal to 25 and 8.333 ms, respectively. Note that the device parameters are those of an IBM 3350 disk system. As previously, we compare the exact $E[Z_{\max}(n)]$ obtained from (12) to the normal approximation discussed in Section II-A. The relative error of the approximation is computed and plotted in Fig. 5. For $n = 8$, the relative error is approximately 0.89%. For all $n$ that are shown in Fig. 5, the relative errors are very small, almost negligible. In the case where the seek time distribution is exponential, we have shown that the relative error is 7.6%, which is still very low. We can, therefore, safely conclude that the normal approximation is a good approximation for a wide class of seek time distributions.
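The uniform-plus-uniform case can be evaluated exactly by integrating 1 - F_Z(t)^n over the three ranges of the piecewise distribution function. The sketch below uses closed forms derived here for that purpose; they are not copied from the paper. The shorthand `alpha = a / (2 b)`, the function names, and the base-10 logarithm in the normal approximation are all assumptions, and the derivation assumes a <= b.

```python
import math


def expected_max_uniform_sum(a: float, b: float, n: int) -> float:
    """E[Z_max(n)] for Z = S + L with S ~ U[0, b], L ~ U[0, a], a <= b.

    The three terms are the integrals of 1 - F_Z(t)^n over [0, a], [a, b],
    and [b, a + b]; the last one is expanded with the binomial theorem."""
    alpha = a / (2.0 * b)
    part_a = a * (1.0 - alpha ** n / (2 * n + 1))
    part_b = (b - a) - b / (n + 1) * ((1.0 - alpha) ** (n + 1) - alpha ** (n + 1))
    series = sum(math.comb(n, k) * (-alpha) ** k / (2 * k + 1) for k in range(n + 1))
    part_c = a * (1.0 - series)
    return part_a + part_b + part_c


def normal_approx(mean: float, std: float, n: int) -> float:
    """mean + std * sqrt(2 log n), with log taken as base 10 (assumption)."""
    return mean + std * math.sqrt(2.0 * math.log10(n))


A_REV = 2 * 8.333   # latency range in ms (mean latency 8.333 ms)
B_SEEK = 2 * 25.0   # seek range in ms (mean seek 25 ms)
n = 8
exact = expected_max_uniform_sum(A_REV, B_SEEK, n)
mean_z = (A_REV + B_SEEK) / 2
std_z = math.sqrt((A_REV ** 2 + B_SEEK ** 2) / 12)
approx = normal_approx(mean_z, std_z, n)
rel_err = abs(approx - exact) / exact
print(round(exact, 3), round(approx, 3), round(100 * rel_err, 2))
```

With these parameters the relative error at n = 8 is about 0.89%, consistent with the value reported in the text.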

III. EXPERIMENTS AND RESULTS

In the previous section, we have obtained an approximation to the expected access delay in an asynchronous interleaved system. Experiments were run to validate our method, and we summarize the results in this section.

Fig. 4. Uniform seek + uniform latency (average seek 25.00 ms, average latency 8.333 ms; horizontal axis: $n$).

Fig. 5. Uniform seek + uniform latency (horizontal axis: $n$).

A. Traces Used

Two types of trace data are used for the experiments: real reference traces and synthetic traces, as described below.

1) Real Traces: The disk reference traces that were available for our experiments originated at the data processing center of a major manufacturing company. The company has large IMS (information management system) databases, which are accessed interactively. Our traces are of IMS database references from 32 of the company's 430 IBM 3350 DASD's (direct access storage devices). A more detailed description of the traces can be found elsewhere [11]. The traces have been filtered to eliminate everything except the sequences of seek addresses, or cylinder addresses, on the disks we are interested in. Once the seek activities have been grouped by disk, we compute their displacements, or the distances the access arms travel. We assume that the access arms are initially positioned at the cylinder where the required data are. The number of displacements is then one smaller than the number of original seek addresses.

Seek times are calculated using the mechanical characteristics of the actuator.² For a high-performance voice coil actuator, the seek time is given by [14]

$$S = a + b\sqrt{\Delta}, \quad \Delta \ge 1,$$

where $\Delta$ is the displacement in cylinders, $a$ is the mechanical settling time, and $b$ is related to track density and the acceleration of the actuator. For 3350 devices seek times are obtained from

$$S = 7.8 + 1.8\sqrt{\Delta}, \quad \Delta \ge 1,$$

whereas for 3380 devices, they are given by

$$S = 2.1 + 0.9\sqrt{\Delta}, \quad \Delta \ge 1.$$

In our experiments, the seek times of both device types have been considered, and our methods have been shown to be equally valid in both cases. Our methods are based on certain assumptions about the distribution of seeks, and the resulting distributions are similar for the two device types, although the magnitudes of the seek times are quite different. Since the traces were taken from 3350 DASD's, we will base our discussion on the 3350 seek times in this section.
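The trace-to-seek-time step can be sketched as follows. The helper names are hypothetical. The coefficients are taken as 7.8 and 1.8 for the 3350 and 2.1 and 0.9 for the 3380; the 1.8 is a reading of a damaged constant in the text, chosen because it reproduces the maximum seek of 45.6860 ms listed for Disk-2 at a displacement of 443 cylinders. Treating a zero displacement as a zero seek time is also an assumption, since the formula is stated only for displacements of at least one cylinder.

```python
import math

# (settling time a in ms, coefficient b) per device type; see the lead-in
# for which of these values are assumptions.
SEEK_PARAMS = {"3350": (7.8, 1.8), "3380": (2.1, 0.9)}


def displacements(cylinders):
    """Arm travel distances between successive seek (cylinder) addresses.
    Yields one fewer value than the number of addresses."""
    return [abs(nxt - cur) for cur, nxt in zip(cylinders, cylinders[1:])]


def seek_time_ms(displacement: int, device: str = "3350") -> float:
    """S = a + b * sqrt(displacement) for displacement >= 1; 0 for no movement."""
    if displacement < 0:
        raise ValueError("displacement must be non-negative")
    if displacement == 0:
        return 0.0
    a, b = SEEK_PARAMS[device]
    return a + b * math.sqrt(displacement)


trace = [120, 120, 563, 90, 90, 91]          # made-up cylinder addresses
print([round(seek_time_ms(d), 3) for d in displacements(trace)])
```

A trace with many repeated cylinder addresses produces many zero seeks, which is the pattern the real traces are reported to show.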

From the above process we have obtained a sequence of seek times for each of the disks that were monitored. Note that from the traces, we have obtained only seek information. As for latency, we assume that it is uniformly distributed between 0 and 16.6 ms, the maximum rotational delay.

We examine in our experiments four of the most active disks among the 32 which were monitored. Their seek time distributions are depicted in Figs. 10-13. The seek activities commonly exhibit a significant number of zero seeks, and vary widely otherwise.

2) Synthetic Traces: Real reference streams are complicated and variable. From a limited number of traces, there is always uncertainty associated with drawing a general conclusion. In order to experiment with other seek distributions which are certainly conceivable, we create two reference streams for seek time: uniform and exponential. Having generated seek times, we generate a sequence of latency times, which are uniformly distributed. These two sequences are then merged to form a sequence of (seek, latency) pairs. In fact, having a uniform seek time distribution implies that the seek times are independent of where data are stored. Although this is not conceivable given today’s storage technology, it is theoret- ically interesting, and worth investigating. More importantly, this uniform distribution of seek times, as we shall see shortly, is shown to provide a lower bound on the relative error of the approximation, while exponential distribution provides an upper bound.
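A synthetic reference stream of this kind might be generated as in the sketch below. The function name, seed, and default parameters are illustrative assumptions: exponential seeks are drawn with a given mean and values above the maximum seek are discarded, uniform seeks are drawn on [0, 2 * mean], and latencies are uniform on one revolution.

```python
import random


def synthetic_trace(kind: str, count: int, mean_seek: float,
                    max_seek: float = 50.0, rev: float = 16.6, seed: int = 0):
    """Return `count` (seek, latency) pairs in ms for an 'exponential' or
    'uniform' seek time distribution (hypothetical helper)."""
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < count:
        if kind == "exponential":
            seek = rng.expovariate(1.0 / mean_seek)
            if seek > max_seek:
                continue                      # discard physically impossible seeks
        elif kind == "uniform":
            seek = rng.uniform(0.0, 2.0 * mean_seek)
        else:
            raise ValueError("kind must be 'exponential' or 'uniform'")
        latency = rng.uniform(0.0, rev)       # latency is always uniform
        pairs.append((seek, latency))
    return pairs


disk_e = synthetic_trace("exponential", 2000, mean_seek=7.09)
disk_u = synthetic_trace("uniform", 2000, mean_seek=25.0)
print(sum(s for s, _ in disk_e) / len(disk_e), sum(s for s, _ in disk_u) / len(disk_u))
```

Discarding the oversized exponential variates slightly truncates the distribution, so the generated stream is not exactly exponential.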

² An arm mechanism that moves the read/write heads and the attached heads form an actuator.


| Seek (Measured) | Disk-1 | Disk-2 | Disk-3 | Disk-4 | Disk-E | Disk-U |
|---|---|---|---|---|---|---|
| Mean | 7.3337 | 14.9320 | 23.7070 | 19.7737 | 7.0916 | 24.8910 |
| Std dev | 7.0629 | 13.7520 | 15.1140 | 13.9850 | 6.9143 | 14.3240 |
| Max | 20.1402 | 45.6860 | 49.0824 | 47.6450 | 47.8950 | 49.9371 |
| Exp max | 16.4092 | 35.1879 | 42.2031 | 37.0035 | 18.5752 | 44.3544 |

Fig. 6. For each disk, mean, standard deviation (std dev), maximum (max), and expected maximum (exp max) seek times, in ms, are measured.

3) Redistribution of Trace Data: The disk references from a single disk may be redistributed over n disks so as to simulate an asynchronous n-disk system. Our goal is to preserve the original access pattern, so that each of the interleaved disks, after the references have been redistributed across them, still maintains the same access pattern as the original disk. While the access pattern is being preserved on each disk, the disks must at the same time be treated independently of each other. This may be accomplished by splitting a sequence of (seek, latency) pairs originally obtained from a single disk into n subsequences ($n = 8$ in our experiments). The split is done by assigning the first $1/n$ of the references to disk 1, the second $1/n$ to disk 2, and so on. In this manner, the original access pattern is preserved on all of the interleaved disks, and yet at any point of reference the references across them are independent of each other.
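One way to read the redistribution step is as a split of the original sequence into n contiguous, equal-length chunks, one per simulated disk. The sketch below follows that reading. The function name is hypothetical, and dropping the leftover references when the length is not a multiple of n is an assumption.

```python
def redistribute(pairs, n: int = 8):
    """Split one disk's (seek, latency) sequence into n contiguous
    subsequences of equal length k; trailing leftovers are dropped."""
    k = len(pairs) // n
    return [pairs[i * k:(i + 1) * k] for i in range(n)]


# Toy example: 20 references over 8 disks gives k = 2 references per disk.
toy = [(float(i), 0.5 * i) for i in range(20)]
disks = redistribute(toy, n=8)
print(len(disks), [len(d) for d in disks])
```

Because each chunk is a contiguous run of the original sequence, the order of references within a simulated disk is the same as on the original disk, while the j-th references of different disks come from distant parts of the trace.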

B. Measurements

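Having regrouped the original trace sequence for each disk, and split each group into eight subsequences of (seek, latency) pairs, each of length $k$, we take measurements of the trace data as follows. Let $S_{i,j}$ and $L_{i,j}$, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$, denote the $j$th seek and latency for disk $i$, respectively. Furthermore, let $Z_{i,j} = S_{i,j} + L_{i,j}$ be the access delay. The sample average values of seek, latency, and access delays are therefore obtained from

$$E[S] = \frac{1}{nk} \sum_{i=1}^{n} \sum_{j=1}^{k} S_{i,j}, \qquad E[L] = \frac{1}{nk} \sum_{i=1}^{n} \sum_{j=1}^{k} L_{i,j},$$

and

$$E[Z] = \frac{1}{nk} \sum_{i=1}^{n} \sum_{j=1}^{k} Z_{i,j},$$

respectively. Define the sequences of maximum values of seek, latency, and access delays as

$$S_{\max,j} = \max_{1 \le i \le n} S_{i,j}, \qquad L_{\max,j} = \max_{1 \le i \le n} L_{i,j}, \qquad Z_{\max,j} = \max_{1 \le i \le n} Z_{i,j},$$

$j = 1, 2, \ldots, k$, respectively. In order to obtain the sample average values of the maximum, we simply evaluate the following expressions:

$$E[S_{\max}(n)] = \frac{1}{k} \sum_{j=1}^{k} S_{\max,j}, \qquad E[L_{\max}(n)] = \frac{1}{k} \sum_{j=1}^{k} L_{\max,j}, \qquad E[Z_{\max}(n)] = \frac{1}{k} \sum_{j=1}^{k} Z_{\max,j}.$$

| Latency | Value |
|---|---|
| (Measured) Mean | 8.2921 |
| (Measured) Max | 16.6230 |
| (Measured) Exp max | 14.8177 |
| (Estimated) *exp max | 14.7779 |
| Relative error | 0.0027 |

Fig. 7. Latency. Measured values and estimated expected maximum (*exp max).

Having computed the sample average values of the maximum, we now compute the corresponding estimated values:

$$E^{*}[S_{\max}(n)] = H_n E[S],$$

where $S$ is exponentially distributed,

$$E^{*}[L_{\max}(n)] = \frac{n}{n+1}\, \mathit{Rev},$$

where $\mathit{Rev}$ is the time for a disk revolution, 16.6 ms, and $L$ is considered to be uniformly distributed, and

$$E^{*}[Z_{\max}(n)] = E[Z] + \sigma_Z \sqrt{2 \log n},$$

where $\sigma_Z$ is the sample standard deviation of the access delay.

Note that for $n = 8$, $H_8 \approx 2.7$ and $\sqrt{2 \log n} \approx 1.34$. In order to determine the goodness of an estimated value, we compute the percent relative error as follows.

Seek:

$$\varepsilon_S = \frac{\left| H_8 E[S] - E[S_{\max}(n)] \right|}{E[S_{\max}(n)]}$$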


| | | Disk-1 | Disk-2 | Disk-3 | Disk-4 | Disk-E | Disk-U |
|---|---|---|---|---|---|---|---|
| (E[S + L]) | Estimate | 15.645 | 23.225 | 31.924 | 28.06 | 15.394 | 33.221 |
| | Lower | 15.268 | 22.504 | 30.857 | 27.161 | 15.032 | 32.3700 |
| | Upper | 16.028 | 23.945 | 32.99 | 28.959 | 17.757 | 34.073 |
| (Std dev) | Estimate | 8.5378 | 14.491 | 15.898 | 14.564 | 8.1085 | 19.029 |
| | Lower | 8.2813 | 14.003 | 15.188 | 13.962 | 7.8619 | 18.45 |
| | Upper | 8.8156 | 15.023 | 16.7 | 15.236 | 8.3757 | 19.655 |
| (Sample) | Mean | 15.645 | 23.225 | 31.924 | 28.06 | 15.394 | 33.221 |
| | Median | 15.371 | 21.661 | 34.68 | 30.275 | 14.401 | 33.282 |
| | Std dev | 8.5379 | 14.491 | 15.898 | 14.564 | 8.1085 | 19.029 |
| (Fitted) | Mean | 15.645 | 23.225 | 31.924 | 28.06 | 15.394 | 33.221 |
| | Median | 15.645 | 23.225 | 31.924 | 28.06 | 15.394 | 33.221 |
| | Std dev | 8.5379 | 14.491 | 15.898 | 14.564 | 8.1085 | 19.029 |

Fig. 8. Analysis of normal distribution fit. Confidence intervals (95%).

| Access Delay | Disk-1 | Disk-2 | Disk-3 | Disk-4 | Disk-E | Disk-U |
|---|---|---|---|---|---|---|
| (Measured) Mean | 15.6344 | 23.2140 | 31.9130 | 28.0490 | 15.3944 | 33.2210 |
| (Measured) Max | 36.5523 | 61.3470 | 63.7150 | 62.7720 | 57.2641 | 66.5760 |
| (Measured) Exp max | 27.6528 | 45.3925 | 51.4120 | 47.1228 | 28.1765 | 53.7402 |
| (Estimated) *exp max | 27.1480 | 43.2678 | 52.9328 | 47.9321 | 26.4484 | 53.4761 |
| Relative error | 0.0183 | 0.0468 | 0.0296 | 0.0172 | 0.0613 | 0.0049 |

Fig. 9. Summary of experiments. For each experiment, measured access delays are summarized. Estimated maximum values (*exp max) are compared to the measured expected maximum delays.

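Latency:

$$\varepsilon_L = \frac{\left| \frac{n}{n+1}\, \mathit{Rev} - E[L_{\max}(n)] \right|}{E[L_{\max}(n)]}$$

Access delay:

$$\varepsilon_Z = \frac{\left| E[Z] + \sigma_Z \sqrt{2 \log n} - E[Z_{\max}(n)] \right|}{E[Z_{\max}(n)]}$$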
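The measurement procedure can be summarized in code. The sketch below takes n subsequences of (seek, latency) pairs, computes the measured expected maximum access delay as the average of the per-reference maxima, forms the normal-approximation estimate, and reports the relative error with the measured value in the denominator (the convention that reproduces the relative errors tabulated in Fig. 9). The function names, the population form of the standard deviation, and the base-10 logarithm are assumptions.

```python
import math


def relative_error(estimated: float, measured: float) -> float:
    """|estimated - measured| / measured."""
    return abs(estimated - measured) / measured


def measure_access_delay(disks):
    """disks: n lists, each holding k (seek, latency) pairs.
    Returns (measured expected maximum, estimated expected maximum, relative error)."""
    n, k = len(disks), len(disks[0])
    z = [[seek + latency for seek, latency in disk] for disk in disks]
    all_z = [value for disk in z for value in disk]
    mean_z = sum(all_z) / (n * k)
    std_z = math.sqrt(sum((value - mean_z) ** 2 for value in all_z) / (n * k))
    # Average over j of the maximum access delay across the n disks.
    measured = sum(max(z[i][j] for i in range(n)) for j in range(k)) / k
    estimated = mean_z + std_z * math.sqrt(2.0 * math.log10(n))
    return measured, estimated, relative_error(estimated, measured)


# Relative errors recomputed from the Disk-1 and Disk-2 columns of Fig. 9.
print(round(relative_error(27.1480, 27.6528), 4))
print(round(relative_error(43.2678, 45.3925), 4))

# Toy input: two disks, two references each.
print(measure_access_delay([[(1.0, 1.0), (2.0, 2.0)], [(3.0, 0.0), (0.0, 1.0)]]))
```

Feeding the function the output of a trace split into eight subsequences gives the measured and estimated columns for one disk.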

C. Results of Measurements and Comparisons

We first examine the latency, seek, and access delay distribu- tions for each of the original trace sequences. We then give the expected maximum values which have been measured on the simulated systems. These measured values are then compared to the corresponding estimated values, and the relative error for each estimated value is computed.

1) Seek: We first inspect the real trace data from the four most active disks, and study the synthetic data which we have created. The seek time distributions found from the four disks from the real trace data are illustrated in Figs. 10-13 for Disk-1 -Disk-4, respectively. A summary of the seek time distributions on all the six disks, including Disk-U with a uniform seek and Disk-E with an exponential seek, is provided in Fig. 6.

Note that each of the seek time distributions from the real trace has a significant number of zero seeks. Disk-1 (Fig. 10) has approximately 50% zero seeks, with a mean of 7.3337 ms. The maximum seek time observed on the disk is 20.14 ms. Disk-2 (Fig. 11) shows almost the same ratio of zero seeks as Disk-1, but its maximum seek time, 45.68 ms, is more than twice as large. Consequently, its mean and standard deviation are almost doubled. Disk-3 (Fig. 12), on the other hand, shows fewer zero seeks, only 25%. Note that, not counting the zero seeks, the nonzero seek time distribution has the well-known bell-shaped curve that characterizes the normal density function. Finally, on Disk-4 (Fig. 13), zero seeks are again less frequent, approximately 32%, but this time the mean and standard deviation are considerably smaller than those found on Disk-3.

As discussed previously, the seek time distribution is complicated. None of the four disks from the real trace data has a simple seek time distribution. We will show in the next section that the sum of seek and latency can be rather accurately approximated by a normal distribution.

On Disk-E, where the seek times have been generated by the exponential number generator, the measured expected maximum seek is 18.5752 ms, as shown in Fig. 6. The expected maximum estimated by the method described previously, $H_8 E[S]$, is 19.143 ms. The relative error is approximately 3%. The reason for this somewhat higher error is as follows. When we generate the exponential numbers using the formula shown previously, a small number of seek times are larger than the maximum time required for a seek. The probability that a generated number $X$ is larger than the maximum $M$ is $\Pr[X > M] = e^{-M/\mu}$, where $\mu$ is the mean value of $X$. Of the 2000 random variates generated, four were larger than 50 ms, the maximum seek time on an IBM 3350 disk system. Although the number is small, we discarded such values and experimented only with the remaining ones. Hence, the resulting distribution was not exactly exponential. When all such large values are included, the relative error naturally becomes smaller; it was 0.9% in our experiment.

Fig. 10. Seek time distribution of Disk-1.

Fig. 11. Seek time distribution of Disk-2.
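The quantities in the Disk-E discussion can be checked with a few lines of Python. This is a minimal sketch, assuming the estimate is the standard result that the expected maximum of n independent exponential variates with mean mu equals mu times the n-th harmonic number. The function names and the 10 ms mean in the tail-probability example are hypothetical and are not taken from the paper.

```python
import math


def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_max_exponential(mean: float, n: int) -> float:
    """Expected maximum of n i.i.d. exponential variates with the given mean."""
    return mean * harmonic(n)


def tail_probability(mean: float, limit: float) -> float:
    """Pr[X > limit] for an exponential variate X with the given mean."""
    return math.exp(-limit / mean)


def relative_error(measured: float, estimated: float) -> float:
    """Relative error of an estimate with respect to the measured value."""
    return abs(estimated - measured) / measured


# Values quoted in the text for the seek times on Disk-E.
rel = relative_error(measured=18.5752, estimated=19.143)
print(f"H_8 = {harmonic(8):.4f}")
print(f"relative error = {rel:.4f}")

# Hypothetical mean seek of 10 ms (an assumption) with the 50 ms limit.
p = tail_probability(mean=10.0, limit=50.0)
print(f"Pr[X > 50 ms] = {p:.5f}, expected count in 2000 draws = {2000 * p:.1f}")
```

The relative error computed from the two quoted values is consistent with the "approximately 3%" stated in the text.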

2) Latency: The latency time distribution is straightforward. A sequence of uniformly distributed latency times was split into eight subsequences, and a measurement was taken to obtain the expected value of the maximum latency. Its corresponding estimated value was then computed. As summarized in Fig. 7, the measured expected maximum is 14.8177 ms, and its estimated value, denoted by (*exp max), is 14.7779 ms. Note that this estimated value of the expected maximum has been computed using the exact method. Thus, the estimate is very accurate, with a small error of 0.27%. This error, although very small, is due to the limited size of the trace data we have used. As the size of the trace data grows, the estimated value and the measured value are likely to converge.

Fig. 12. Seek time distribution of Disk-3.

Fig. 13. Seek time distribution of Disk-4.
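The latency measurement can be imitated with a small simulation. The sketch below assumes that "splitting into eight subsequences" means dealing consecutive latency values to the eight disks in turn, so that each group of eight consecutive values forms one request. It also assumes that the exact method is the closed form n/(n+1) times the upper limit, which holds for the maximum of n independent uniform variates. The rotation time of 16.7 ms, the sample size, and the seed are hypothetical choices, not values from the paper.

```python
import random


def measured_expected_max(delays, n_disks):
    """Average, over requests, of the largest of n_disks consecutive delays.

    Each complete group of n_disks consecutive values is treated as one
    request served by n_disks independent disks (an assumed splitting rule).
    """
    groups = [
        delays[i:i + n_disks]
        for i in range(0, len(delays) - n_disks + 1, n_disks)
    ]
    return sum(max(group) for group in groups) / len(groups)


def exact_expected_max_uniform(upper, n_disks):
    """Expected maximum of n_disks i.i.d. uniform variates on [0, upper]."""
    return upper * n_disks / (n_disks + 1)


rng = random.Random(1991)   # fixed seed so the run is repeatable
rotation_ms = 16.7          # hypothetical full-rotation time
delays = [rng.uniform(0.0, rotation_ms) for _ in range(16000)]

measured = measured_expected_max(delays, n_disks=8)
exact = exact_expected_max_uniform(rotation_ms, n_disks=8)
print(f"measured = {measured:.4f} ms, exact = {exact:.4f} ms")
print(f"relative error = {abs(exact - measured) / measured:.4%}")
```

With a longer sequence the measured average settles closer to the closed-form value, which matches the convergence remark in the text.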

3) Access Delay: Each seek time sequence we have obtained is now paired with a latency sequence generated by the random number generator. The distribution of the sum (seek + latency) for each of the four sequences from the real trace data is plotted in Figs. 14-17. The two synthetic sequences, one with the exponential seek distribution and the other with the uniform distribution, are plotted in Figs. 18 and 19. Examining the six figures, each of which has its own seek time distribution, we can say that for a wide range of seek time distributions the sum of seek and latency may be approximated by a normal distribution. To show this, each of them is fitted to a normal distribution as shown in the figures, and the fit is summarized in Fig. 8.
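A normal fit of this kind and the 95% confidence limits reported in Fig. 8 can be sketched as follows. The sketch assumes the fit uses the sample mean and sample standard deviation and that the limits on the mean are the usual normal-approximation interval, mean plus or minus 1.96 times the standard deviation over the square root of N. The paper does not state its fitting procedure, so both are assumptions. The Disk-4 values come from Fig. 8, and N = 1010 is read from the panel title of Fig. 17.

```python
import math
import statistics


def fit_normal(samples):
    """Return the sample mean and sample standard deviation of the data."""
    return statistics.fmean(samples), statistics.stdev(samples)


def mean_confidence_interval(mean, std, n, z=1.96):
    """Normal-approximation confidence interval for the mean (95% for z=1.96)."""
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width


# Disk-4 figures: mean and standard deviation from Fig. 8, N from Fig. 17.
lo, hi = mean_confidence_interval(mean=28.06, std=14.564, n=1010)
print(f"95% interval for the Disk-4 mean: ({lo:.3f}, {hi:.3f})")

# Fitting a short, made-up sequence of (seek + latency) times in ms.
mu, sigma = fit_normal([12.0, 30.5, 27.1, 41.8, 19.6])
print(f"fitted mean = {mu:.3f} ms, fitted std dev = {sigma:.3f} ms")
```

Under these assumptions the computed interval agrees with the Disk-4 limits listed in Fig. 8 to within rounding.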

Each sequence of (seek + latency) times is then split into eight subsequences as if we were simulating an eight-disk asynchronous system. As before, the expected maximum values are measured. These measured values are then compared to the estimated values. A summary of our experiments is presented in Fig. 9.

Fig. 14. (Seek + latency) time distribution of Disk-1, with normal density function (N = 1968).

Fig. 15. (Seek + latency) time distribution of Disk-2, with normal density function (N = 1556).

Fig. 16. (Seek + latency) time distribution of Disk-3, with normal density function (N = 855).

Fig. 17. (Seek + latency) time distribution of Disk-4, with normal density function (N = 1010).

Note that the mean and maximum values have been measured from the original sequences, treating each sequence as a reference stream per disk. The expected maximum values, both measured and estimated, however, have been obtained after splitting each sequence into eight subsequences.

On the four disks whose traces have been taken from the real trace data, the relative errors of the estimated maximum values range from 1.72% to 4.68%. Disk-U, with the uniform seek, shows the smallest relative error of 0.49%, providing a lower bound, while Disk-E, with the exponential seek, shows the largest relative error of 6.13%.
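The relative errors quoted here follow directly from the measured and estimated expected maxima in Fig. 9. The short script below recomputes them; the dictionary and function names are illustrative only.

```python
# Expected maximum access delays (ms) from Fig. 9.
measured = {
    "Disk-1": 27.6528, "Disk-2": 45.3925, "Disk-3": 51.4120,
    "Disk-4": 47.1228, "Disk-E": 28.1765, "Disk-U": 53.7402,
}
estimated = {
    "Disk-1": 27.1480, "Disk-2": 43.2678, "Disk-3": 52.9328,
    "Disk-4": 47.9321, "Disk-E": 26.4484, "Disk-U": 53.4761,
}
# Relative errors as printed in Fig. 9.
reported = {
    "Disk-1": 0.0183, "Disk-2": 0.0468, "Disk-3": 0.0296,
    "Disk-4": 0.0172, "Disk-E": 0.0613, "Disk-U": 0.0049,
}


def relative_error(measured_value, estimated_value):
    """Relative error of the estimate with respect to the measured value."""
    return abs(estimated_value - measured_value) / measured_value


for disk in measured:
    err = relative_error(measured[disk], estimated[disk])
    print(f"{disk}: computed {err:.4f}, reported {reported[disk]:.4f}")
```

Note that the estimate falls below the measured value on some disks and above it on others, so the error is reported as a magnitude.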

We repeated the same experiments using the IBM 3380 seek times. The results were not significantly different. Therefore, we can say that for a wide range of seek time distributions, the sum of seek and latency can be approximated by a normal distribution. Consequently, as we preserve the same distribution on the asynchronous disk system simulated by the trace sequence at hand, we can estimate its expected maximum using the formula E[S + L] + ad-.

IV. CONCLUSIONS

We have studied the performance implications of asynchronous disk interleaving. We have used approximate analysis to develop simple expressions for various expected delays and demonstrated that they are good for a wide range of seek time distributions.

Disk interleaving is a useful technique in computing large-scale problems that require huge amounts of data. Disk interleaving may be achieved synchronously or asynchronously. Synchronous interleaving may be best suited for algorithms whose reference patterns are regular and structured and whose block sizes are large [8]. Nevertheless, a fully synchronized disk system may no longer be an option once the number of disks in a disk system grows beyond a certain point [12]. A suitable combination of the two may provide an effective I/O system architecture that also scales well with the size of the disk system. In particular, asynchronous interleaving can be used to group clusters of synchronously interleaved disks. Each cluster of synchronous disks can be treated as if it were a single disk, and it can then participate in yet another level of interleaving.

Fig. 18. (Exponential seek + latency) time distribution on Disk-E, with normal density function (N = 1920).

Fig. 19. (Uniform seek + latency) time distribution on Disk-U, with normal density function (N = 1920).

REFERENCES

[1] M. Bohl, Introduction to IBM Direct Access Storage Devices. Science Research Associates, Inc., 1981.

[2] H. Boral and D. J. DeWitt, "Database machines: An idea whose time has passed? A critique of the future of database machines," in Database Machines. Berlin, Germany: Springer-Verlag, 1983.

[3] H. A. David, Order Statistics. New York: Wiley, 1981.

[4] H. Garcia-Molina and K. Salem, "Disk striping," Computer Research Report, Princeton Univ., 1988.

[5] A. Gravey, "A simple construction of an upper bound for the mean of the maximum of N identically distributed random variables," J. Appl. Probability, vol. 22, pp. 844-851, 1985.

[6] O. G. Johnson, "Three-dimensional wave equation computations on vector computers," Proc. IEEE, vol. 72, no. 1, Jan. 1984.

[7] M. Y. Kim, "Synchronized disk interleaving," IEEE Trans. Comput., vol. C-35, no. 11, Nov. 1986.

[8] M. Y. Kim, A. Nigam, G. Paul, and R. J. Flynn, "Disk interleaving and very large fast Fourier transforms," Int. J. Supercomput. Appl., vol. 1, no. 3, pp. 75-96, 1987.

[9] C. Kruskal and A. Weiss, "Allocating independent subtasks on parallel processors," IEEE Trans. Software Eng., vol. SE-11, no. 10, Oct. 1985.

[10] J. R. Lineback, "New features tune Unix for high-end machines," Electronics, Aug. 19, 1985.

[11] C. May, "LARGQ: A study in the design and evaluation of a memory management algorithm," IBM Computer Sci. Res. Rep. RC10048, July 1983.

[12] A. L. N. Reddy and P. Banerjee, "An evaluation of multiple-disk I/O systems," IEEE Trans. Comput., Dec. 1989.

[13] A. L. N. Reddy and P. Banerjee, "Design, analysis, and simulation of I/O architectures for hypercube multiprocessors," IEEE Trans. Parallel Distributed Syst., 1990.

[14] R. A. Scranton, D. A. Thompson, and D. W. Hunter, "The access time myth," IBM Res. Rep. RC10197, Sept. 1983.

Michelle Y. Kim (M'85) received the Ph.D. degree in computer science from Polytechnic University.

She joined IBM at the Thomas J. Watson Research Center, Yorktown Heights, NY, in 1981 as a research staff member, where she is currently a member of the Artificial Intelligence Department. Her research interests include computer architecture and artificial intelligence.

Asser N. Tantawi (M'87-SM'90) received the B.S. and M.S. degrees in computer science from Alexandria University, Alexandria, Egypt, and the Ph.D. degree in computer science from Rutgers University.

He joined the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, in 1982, as a research staff member, where he is currently manager of Systems Connectivity Performance in the High Bandwidth Systems Laboratory. His fields of interest include performance modeling, queuing theory, load balancing, parallel processing, reliability modeling, and high-speed networking.

Dr. Tantawi is a member of the Association for Computing Machinery and ORSA/TIMS. He served as an ACM National Lecturer during 1984-1988.