while optionally exploiting the exposed information to orchestrate its data layouts and access patterns. 7 Conclusions The Atropos disk array LVMemploys a new data organization that allows applications to take advantage of features built into modern disks. Striping data in track-sized units lets them take advantage of zero-latency access to achieve efficient access for sequential access patterns. Taking advantage of request scheduling and knowing exact head switch times enables semi-sequential access, which results in efficient access to diagonal sets of noncontiguous blocks. By exploiting disk characteristics, a new data organization, and exposing high-level constructs about this organization, Atropos can deliver efficient accesses for database systems, resulting in up to 4 speed-ups for decision support workloads, without compromising performance of OLTP workloads. Acknowledgements We thank the members and companies of the PDL Consortium (including EMC, Hewlett-Packard, Hitachi, IBM, Intel, LSI Logic, Microsoft, Network Appliance, Oracle, Panasas, Seagate, Sun, and Veritas) for their interest, insights, feedback, and support. This work is funded in part by NSF grants CCR-0113660, IIS- 0133686, and CCR-0205544, and by an IBM faculty partnership award. References [1] A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving relations for cache performance. International Conference on Very Large Databases (Rome, Italy, 11–14 September 2001), pages 169–180. Morgan Kaufmann Publishing, Inc., 2001. [2] J. S. Bucy and G. R. Ganger. The DiskSim simulation environment version 3.0 reference manual. Technical Report CMU– CS–03–102. Department of Computer Science Carnegie-Mellon University, Pittsburgh, PA, January 2003. [3] M. J. Carey et al. Shoring up persistent applications. ACM SIGMOD International Conference on Management of Data (Minneapolis, MN, 24–27 May 1994). Published as SIGMOD Record, 23(2):383–394, 1994. [4] P. M. Chen and D. A. Patterson. Maximizing performance in a striped disk array. ACM International Symposium on Computer Architecture (Seattle, WA), pages 322–331, June 1990. [5] G. P. Copeland and S. Khoshafian. A decomposition storage model. ACM SIGMOD International Conference on Management of Data (Austin, TX, 28–31 May 1985), pages 268–279. ACM Press, 1985. [6] IBM DB2 Universal Database Administration Guide: Implementation, Document number SC09-2944-005. [7] T. E. Denehy, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Bridging the information gap in storage protocol stacks. Summer USENIX Technical Conference (Monterey, CA, 10–15 June 2002), pages 177–190, 2002. [8] G. R. Ganger. Blurring the line between OSs and storage devices. Technical report CMU–CS–01–166. Carnegie Mellon University, December 2001. [9] G. G. Gorbatenko and D. J. Lilja. Performance of twodimensional data models for I/O limited non-numeric applications. Laboratory for Advanced Research in Computing Technology and Compilers Technical report ARCTiC–02–04. University of Minnesota, February 2002. [10] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, 3rd ed. Morgan Kaufmann Publishers, Inc., San Francisco, CA, 2003. [11] R. Y. Hou and Y. N. Patt. Track piggybacking: an improved rebuild algorithm for RAID5 disk arrays. International Conference on Parallel Processing (Urbana, Illinois), 14–18 August 1995. [12] D. M. Jacobson and J. Wilkes. Disk scheduling algorithms based on rotational position. Technical report HPL–CSP–91– 7. Hewlett-Packard Laboratories, Palo Alto, CA, 24 February 1991, revised 1 March 1991. [13] M. Livny, S. Khoshafian, and H. Boral. Multi-disk management algorithms. ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 69–77, 1987. [14] C. R. Lumb, J. Schindler, and G. R. Ganger. Freeblock scheduling outside of disk firmware. Conference on File and Storage Technologies (Monterey, CA, 28–30 January 2002), pages 275– 288. USENIX Association, 2002. [15] R. Ramamurthy, D. J. DeWitt, and Q. Su. A case for fractured mirrors. International Conference on Very Large Databases (Hong Kong, China, 20–23 August 2002), pages 430–441. Morgan Kaufmann Publishers, Inc., 2002. [16] A. L. N. Reddy and P. Banerjee. A study of parallel disk organizations. Computer Architecture News, 17(5):40–47, September 1989. [17] M. Rosenblum and J. K. Ousterhout. The design and implementation of a log-structured file system. ACMTransactions on Computer Systems, 10(1):26–52. ACM Press, February 1992. [18] K. Salem and H. Garcia-Molina. Disk striping. International Conference on Data Engineering (Los Angeles, CA), pages 336– 342. IEEE, Catalog number 86CH2261-6, February 1986. [19] J. Schindler. Matching application access patterns to storage device characteristics. PhD thesis. Carnegie Mellon University, 2004. [20] J. Schindler, A. Ailamaki, and G. R. Ganger. Lachesis: robust database storage management based on device-specific performance characteristics. International Conference on Very Large Databases (Berlin, Germany, 9–12 September 2003). Morgan Kaufmann Publishing, Inc., 2003. [21] J. Schindler and G. R. Ganger. Automated disk drive characterization. Technical report CMU–CS–99–176. Carnegie-Mellon University, Pittsburgh, PA, December 1999. [22] J. Schindler, J. L. Griffin, C. R. Lumb, and G. R. Ganger. Trackaligned extents: matching access patterns to disk drive characteristics. Conference on File and Storage Technologies (Monterey, CA, 28–30 January 2002), pages 259–274. USENIX Association, 2002. [23] M. Seltzer, P. Chen, and J. Ousterhout. Disk scheduling revisited. Winter USENIX Technical Conference (Washington, DC, 22–26 January 1990), pages 313–323, 1990. [24] M. Shao, J. Schindler, S. W. Schlosser, A. Ailamaki, and G. R. Ganger. Clotho: decoupling memory page layout from storage organization. Technical report CMU–PDL–04–102. Carnegie- Mellon University, Pittsburgh, PA, February 2004. [25] J. Wilkes, R. Golding, C. Staelin, and T. Sullivan. The HP AutoRAID hierarchical storage system. ACMTransactions on Computer Systems, 14(1):108–136, February 1996. [26] B. L. Worthington, G. R. Ganger, Y. N. Patt, and J. Wilkes. Online extraction of SCSI disk drive parameters. ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (Ottawa, Canada), pages 146–156, May 1995. A Access Efficiency Calculations Let T(N;K) be the time it takes to service a request of K sectors that fit onto a single track of a disk with N sectors per track (i.e., track-aligned access). Ignoring seek, and assuming no zero-latency access, this time can be expressed as Tnzl(N;K) = N 1 2N + K N where the first term is the average rotational latency, and the second term is the media access time. For disks with zero-latency access, the first term is not constant; rotational latency decreases with increasing K. Thus, Tzl(N;K) = (N K+1)(N+K) 2N2 + K 1 N These expressions are derived elsewhere [19]. The efficiency of track-based access is the ratio between the raw one revolution time, Trev, and the time it takes to read S = kN sectors for some large k. Hence, En = kTrev Tn(N;N)+(k1)(Ths+Trev) kTrev k (Ths+Trev) where Tn(N;N) is the time to read data on the first track, and (k1)(Ths+Trev) is the time spent in head switches and accessing the remaining tracks. In the limit, the access efficiency is En(N;H) =1 H N (6) which is the maximal streaming efficiency of a disk. The maximal efficiency of semi-sequential quadrangle access is simply Eq(N;H) = Trev Tq(N;S) = Trev Tzl (N;db+(d1)H) (7) with d and b set accordingly. B Relaxing the one-revolution constraint Suppose that semi-sequential access to d blocks, each of size b, from a single quadrangle takes more than one revolution. Then the inequality in Equation 1 will be larger than 1. With probability 1=N, a seek will finish with disk heads positioned exactly at the beginning of the b sectors mapped to the first track (the upper left corner of the quadrangle in Figure 12). In this case, the disk will access all db sectors with maximal efficiency (only incurring head switch of H sectors for every b-sector read). However, with probability 1 1=N, the disk heads will land somewhere “in the middle” of the b sectors after a seek, as illustrated by the arrow in Figure 12. Then, the access will incur a small rotational latency to access the beginning of the nearest b sectors, which are, say, on the k-th track. After this initial rotational latency, which is, on average, equal to (b1)=2N, the (dk)b sectors mapped onto (d k) tracks can be read with maximal efficiency of the semi-sequential quadrangle access. L b b b b b N-(K mod N) Collapsing quadrangle with d = 6 into a request of size db+(d-1)H b b b b b N d Physical layout of a quadrangle across disk tracks of size N H H H H H H head position after seek Figure 12: An alternative representation of quadrangle access. To read the remaining k tracks, the disk heads will need at be positioned to the beginning of the b sectors on the first track. This will incur a small seek and additional rotational latency of L=N. Hence, the resulting efficiency is much lower than when the one-revolution constraint holds, which avoids this rotational latency. We can express the total response time for quadrangle access without the one-revolution constraint as Tq(N;S) = b1 2N + K N +Plat L N (8) where Plat = (N H b1)=N is the probability of incurring the additional rotational latency after reading k out of d tracks, K = db(d 1)H is the effective request size, L = N (K mod N), and S = db is the original request size. To understand this equation, it may be helpful to refer to the bottom portion of Figure 12. The efficiencies of the quadrangle accesses with and without the one-revolution constraint are approximately the same when the time spent in rotational latency and seek for the unconstrained access equals to the time spent in rotational latency incurred during passing over dR residual sectors. Hence, dR N = N 1 N N 1 2N +Seek Ignoring seek and approximating N 1 to be N, this occurs when R 6= 0 and d N 2R: Thus, in order to achieve the same efficiency for the non-constrained access, we will have to access at least d VLBNs. However, this will significantly increase I/O latency. If R = 0 i.e., when there are no residual sectors, the one-revolution constraint already yields the most efficient quadrangle access. |