A decrease in reliability is expected because the spin-down algorithm has become more aggressive.

Hybrid disks present an opportunity for spin-down algorithms to further reduce power consumption while minimizing the performance and reliability impact they impose on the media itself. We now describe four spin-down algorithm and I/O subsystem enhancements.

Spin-down algorithms controlling traditional disks compute the idle period as (current time − last access time), where last access time is the time of the last disk access. If the idle period exceeds the current time-out, the rotating media is spun-down. I/O type, whether read or write, is ignored because any request, regardless of type, will cause the rotating media to spin up. This is not true of hybrid disks, as one of the purposes of adding an NVCache to a hard disk is to extend spun-down periods by servicing I/O to and from the NVCache. Note that this assumes the block I/O layer or disk driver is aware of the rotating media's power state and will redirect I/O while the media is at rest. Our previous work describes a mechanism implemented in the block layer to redirect I/O to and from a physically separate flash-based NVCache while the disk is spun-down. With such support, NVCache utilization is visible to the spin-down algorithm as extended spun-down periods. A spin-down algorithm that is unaware of the hybrid disk will still ignore I/O type because it assumes any I/O will cause a spin-up. With the redirection mechanism above, however, this assumption is false: write requests are unlikely to cause a spin-up. We therefore present Artificial Idle Periods, a spin-down algorithm modification for hybrid disks that considers I/O type when computing a disk's idle time, recording idle time as the time since the last read request. When a request occurs for a disk in active mode, the idle timer is reset only on a read request. The idle period is thus artificially increased to (current time − last read access time).
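To make the difference concrete, here is a minimal Python sketch of the idle-period bookkeeping, contrasting a conventional tracker (any request resets the idle clock) with Artificial Idle Periods (only reads reset it). The class name `IdleTracker`, its fields, and the use of seconds as the time unit are hypothetical; the sketch models only the idle computation, not the I/O redirection itself.

```python
from dataclasses import dataclass


@dataclass
class IdleTracker:
    """Hypothetical idle-period bookkeeping for one disk (times in seconds).

    artificial_idle=False mimics a hybrid-disk-unaware algorithm: every
    request resets the idle clock. artificial_idle=True implements
    Artificial Idle Periods: only reads reset it, because writes can be
    redirected to the NVCache while the media is spun down.
    """
    timeout: float
    artificial_idle: bool = True
    last_reset: float = 0.0  # time of the last request that reset the clock

    def on_request(self, now: float, is_read: bool) -> None:
        if is_read or not self.artificial_idle:
            self.last_reset = now

    def idle_period(self, now: float) -> float:
        # current time - last (read) access time
        return now - self.last_reset

    def should_spin_down(self, now: float) -> bool:
        return self.idle_period(now) > self.timeout


# A read at t=0, then only writes; check at t=16 s with a 12 s time-out.
trace = [(0.0, True), (5.0, False), (10.0, False), (15.0, False)]
conventional = IdleTracker(timeout=12.0, artificial_idle=False)
artificial = IdleTracker(timeout=12.0, artificial_idle=True)
for t, is_read in trace:
    conventional.on_request(t, is_read)
    artificial.on_request(t, is_read)
print(conventional.should_spin_down(16.0), artificial.should_spin_down(16.0))
# prints: False True
```

In this example the conventional tracker sees only 1 s of idleness at t = 16 s and keeps the disk spinning, while the Artificial Idle Periods tracker sees 16 s and spins it down, even though the disk was servicing writes the whole time.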

As a result, even if a hybrid disk is actively servicing requests, it can be spun-down and remain so, provided the I/O consists only of write requests. This modification has several implications. First, duty cycles may be consumed faster: idle periods are artificially increased, so a disk will spin down sooner and probably more frequently. Second, I/O performance may degrade for sequential write workloads, because flash sequential write throughput is only a fraction of that of the rotating media, and the NVCache must still be periodically flushed to disk. Finally, the NVCache will endure more erase operations from its increased workload, decreasing its expected lifetime.

As we will show in our evaluation, even with Artificial Idle Periods, typically less than 10% of the NVCache is used per spin-down period to cache writes. Although the NVCache is checked for the requested data on reads, reads are still typically responsible for initiating spin-ups. Writes cached in the NVCache rarely satisfy read requests because the host operating system is likely to be idle and not evicting buffer cache pages quickly; as a result, most read requests will be satisfied by the buffer cache. To ensure that read requests are satisfied by the NVCache, we propose a Read-Miss Cache, an area of the NVCache populated with read requests that the NVCache could not satisfy. The hypothesis is that read-misses that cause a disk to spin up are likely to cause it to spin up again. When an NVCache read-miss occurs while the rotating media is spun-down, the requested content is read from the newly spun-up disk and returned to the file system. It is stored in the Read-Miss Cache, and subsequent sequential reads are also stored in the Read-Miss Cache, which we refer to as preloading. Preloading stops when a non-sequential read or a write request occurs. Only the most frequently preloaded content is stored in the NVCache; preloading data into the NVCache merely updates its frequency count, including that of the original read-miss.
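The preloading and frequency-counting behaviour can be sketched as follows. This is a hypothetical Python model, not the authors' implementation: it tracks content per sector LBA, treats the cache capacity as a fixed sector count (the dynamic sizing policy is not modeled), and keeps resident only the most frequently preloaded sectors. The class and method names are illustrative.

```python
from collections import Counter


class ReadMissCache:
    """Hypothetical model of a frequency-based Read-Miss Cache.

    Content is tracked per sector LBA and capacity is a fixed number of
    sectors; both are simplifying assumptions.
    """

    def __init__(self, capacity_sectors: int) -> None:
        self.capacity = capacity_sectors
        self.freq = Counter()   # preload count per sector LBA
        self.preloading = False
        self.next_lba = None    # LBA that would continue the sequential run

    def _count(self, lba: int, length: int) -> None:
        for sector in range(lba, lba + length):
            self.freq[sector] += 1

    def resident(self) -> set:
        # Only the most frequently preloaded sectors occupy NVCache space.
        return {lba for lba, _ in self.freq.most_common(self.capacity)}

    def contains(self, lba: int, length: int) -> bool:
        # A spun-down read is served only if every requested sector is resident.
        cached = self.resident()
        return all(s in cached for s in range(lba, lba + length))

    def on_read_miss(self, lba: int, length: int) -> None:
        # A read-miss that spins the disk up counts the missed content
        # and starts preloading.
        self._count(lba, length)
        self.preloading = True
        self.next_lba = lba + length

    def on_request(self, lba: int, length: int, is_read: bool) -> None:
        # Requests arriving after a read-miss, while the disk is spun up.
        if not self.preloading:
            return
        if is_read and lba == self.next_lba:
            self._count(lba, length)   # sequential read: preload it too
            self.next_lba = lba + length
        else:
            self.preloading = False    # non-sequential read or any write
```

Counting rather than storing on every preload mirrors the text: repeated read-misses on the same content raise its count, so it outlives content that was preloaded only once when space runs out.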

The Read-Miss Cache size is dynamic, although a maximum size, expressed as a percentage of the total NVCache, can be supplied to bound its growth. There is also a static minimum Read-Miss Cache size of 1%. The Read-Miss Cache grows when a read-miss causes the disk to spin up, and shrinks when a write cannot be stored in the NVCache because there is no room available.

We implemented hybrid disk functionality in the Linux kernel, mimicking a hybrid disk using flash and a traditional disk by redirecting I/O traffic at the block I/O layer. While the disk is spun-down, I/O is intercepted at the block layer and redirected to flash memory. To spread block erase cycles evenly and reduce page remappings, redirected writes are appended to flash in log order. Each redirected request is prepended with a metadata sector describing the original I/O: starting LBA, length, spin-down interval, etc. The redirected LBA numbers are kept in memory to speed up read requests to flash. When the flash fills up or a read-miss to it occurs, the corresponding disk is spun-up and the flash sectors are flushed to their respective locations on disk.

We also built a simulator that models a hybrid disk, allowing expedient evaluation of the proposed enhancements with several week-long block-level traces. The simulator accounts for power over the given trace in each power state: read/write, seek, idle, standby, and spin-up. For the notebook drive, transitions among the performance idle, active idle, and low-power idle states are made internally by the drive and are workload dependent. We therefore make the worst-case assumption that the drive is always in low-power idle, which provides a lower bound on spin-down algorithm performance relative to idle power. Read/write I/O power is computed using the disk's maximum specified I/O rate, and seek power is computed using the drive's average seek time rating for non-sequential I/O.
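The block-layer redirection described above can be modeled in a few lines. The following Python sketch is a hypothetical user-space model, not the kernel code: `FlashLog`, `Record`, and their methods are illustrative names, sectors are opaque `bytes` objects, and the disk is a plain dictionary. It shows the properties the text relies on: appends in log order with one metadata sector per request (modeled by the `Record` fields), an in-memory LBA map for fast reads, and a flush, triggered by a full log or a read-miss, that replays the log so newer writes win.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One redirected write: a metadata sector followed by its data sectors."""
    start_lba: int
    length: int
    interval: int   # spin-down interval during which the write was redirected
    data: list      # one bytes object per sector


class FlashLog:
    """Hypothetical user-space model of the block-layer redirect log."""

    def __init__(self, capacity_sectors: int) -> None:
        self.capacity = capacity_sectors
        self.used = 0        # flash sectors consumed, metadata included
        self.records = []    # appended strictly in log order
        self.lba_map = {}    # disk LBA -> (record index, offset); newest wins

    def append(self, start_lba: int, sectors: list, interval: int) -> bool:
        """Redirect a write; False means the log is full and the disk must spin up."""
        needed = 1 + len(sectors)  # metadata sector + payload
        if self.used + needed > self.capacity:
            return False
        idx = len(self.records)
        self.records.append(Record(start_lba, len(sectors), interval, list(sectors)))
        for off in range(len(sectors)):
            self.lba_map[start_lba + off] = (idx, off)
        self.used += needed
        return True

    def read(self, start_lba: int, length: int):
        """Serve a read from flash, or return None on a read-miss."""
        out = []
        for lba in range(start_lba, start_lba + length):
            loc = self.lba_map.get(lba)
            if loc is None:
                return None
            idx, off = loc
            out.append(self.records[idx].data[off])
        return out

    def flush(self, disk: dict) -> None:
        """Write records back in log order, so later writes overwrite earlier ones."""
        for rec in self.records:
            for off, sector in enumerate(rec.data):
                disk[rec.start_lba + off] = sector
        self.records.clear()
        self.lba_map.clear()
        self.used = 0
```

Appending instead of updating in place means an overwritten LBA consumes new flash space; the in-memory map simply points at the newest copy, and the flush resolves duplicates by replaying in order.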

The spin-down algorithm implemented is the multiple experts spin-down algorithm, an adaptive spin-down algorithm developed by Helmbold et al. It is based on a class of machine learning algorithms known as Multiple Experts. In this dynamic spin-down algorithm, each expert has a fixed time-out value and a weight associated with it. The time-out value used is the average of the experts' time-out values, weighted by their weights, and it is computed at the end of each idle period. After the next time-out value is calculated, each expert's weight is reduced according to how poorly its time-out value would have performed.

To evaluate the proposed enhancements, we use several block-level access traces, shown in Table 2: four desktop workloads and a personal video recorder workload. Each workload is a trace of disk requests, and each entry is described by I/O time, sector, sector length, and request type (read or write). The first workload, Eng, is a trace from the root disk of a Linux desktop used for software engineering tasks; the root disk holds a ReiserFS file system. The trace was collected by instrumenting the disk driver to record all accesses to the root disk in a memory buffer and to transfer the buffer to userland when it became full. A corresponding userland application appended the buffer to a file on a separate disk. The trace HPLAJW is from a single-user HP-UX workstation. The next trace, PVR, is from a Windows XP machine used as a Home Theater PC running the personal video recording application Beyond TV. The WinPC trace is from a Windows XP desktop used mostly for web browsing, electronic mail, and Microsoft Office applications. The block-level traces for both Windows systems were extracted using a filter driver. The final trace, Mac, is from a Macintosh PowerBook running OS X 10.4.
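The multiple experts weighting scheme can be illustrated with a simplified Python sketch. It is not Helmbold et al.'s exact algorithm: their weight-sharing step is omitted, and both the loss function (the extra idle-power energy an expert would have spent relative to the offline optimum, with the spin-down/spin-up cost expressed as a hypothetical break-even time `breakeven`) and the learning rate `beta` are our assumptions. At the end of each idle period the sketch shrinks every expert's weight multiplicatively by its normalized loss and then recomputes the weighted-average time-out for the next period.

```python
class MultipleExperts:
    """Simplified multiple-experts time-out predictor (a sketch, not the
    original algorithm: no weight sharing, and the loss is an assumption)."""

    def __init__(self, timeouts, breakeven, beta=0.5):
        self.timeouts = list(timeouts)    # each expert's fixed time-out (s)
        self.weights = [1.0] * len(self.timeouts)
        self.breakeven = breakeven        # spin-down cost in idle-seconds
        self.beta = beta                  # 0 < beta < 1; smaller punishes harder

    def timeout(self) -> float:
        # Average of the experts' time-outs, weighted by their weights.
        total = sum(self.weights)
        return sum(w * t for w, t in zip(self.weights, self.timeouts)) / total

    def _cost(self, t: float, idle: float) -> float:
        # Energy (in seconds of idle power) an expert would have spent.
        return idle if t >= idle else t + self.breakeven

    def end_idle_period(self, idle: float) -> float:
        """Update weights from the idle period that just ended; return next time-out."""
        best = min(idle, self.breakeven)  # offline optimum for this period
        losses = [self._cost(t, idle) - best for t in self.timeouts]
        worst = max(losses) or 1.0        # normalise losses to [0, 1]
        self.weights = [w * self.beta ** (loss / worst)
                        for w, loss in zip(self.weights, losses)]
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]  # avoid underflow
        return self.timeout()
```

With experts at 1, 10, and 60 s and a 5 s break-even time, a run of long idle periods shifts weight toward the 1 s expert and pulls the time-out down, whereas a short idle period penalizes that expert and pushes the time-out up.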
The Mac trace was recorded using the Mac OS X command-line tool fs_usage, filtering out file system operations and redirecting disk I/O operations for the root disk to a USB thumb drive. The physical devices we present results for are a SanDisk Ultra II CompactFlash card, a Hitachi Travelstar E7K100 2.5 in drive, and a Hitachi Deskstar 7K500 3.5 in drive. The power consumption for each state is shown in Table 1. In all figures except Figure 9, we show results using the 2.5 in drive; Figure 9 presents results for both the 2.5 in and 3.5 in drives to motivate placing an NVCache in a 3.5 in form factor.

Figure 5 shows the results of making the I/O subsystem aware of a hybrid disk's NVCache, such that a write request occurring while the rotating media is at rest is redirected to the NVCache.

This figure shows the percentage of time the disk can remain spun-down as a function of NVCache size. The Eng trace benefits the most from the write cache, increasing its spun-down time from 71% to 92%, which translates into slightly more than one and a half additional days of spun-down time over the seven-day trace. This is primarily due to the periodicity of writes-while-idle, which occur more frequently than in any other workload. The other workloads also benefit from a write cache, increasing their spun-down time by 4–10%. Figure 5 also shows the percentage increase in spun-down time from adding Artificial Idle Periods to write-caching: Artificial Idle Periods significantly increase the percentage of time a disk is spun-down. The most significant benefit comes when the NVCache is smaller than 1MB. Adding Artificial Idle Periods to such an NVCache increases its utilization, as I/O redirection to the NVCache occurs sooner and more frequently. With larger NVCache sizes, Artificial Idle Periods are utilized less often, so their impact is less pronounced. Even so, the percentage increase from adding Artificial Idle Periods stabilizes between 3.5% and 5% for all but the PVR workload, which stabilizes at a 27% increase in spun-down time. The PVR workload benefits so much from Artificial Idle Periods because it consists of periodic write requests without interleaved read requests. Although write-caching and Artificial Idle Periods are excellent ways to increase the time a disk spends in standby mode, it is important to recognize their reliability impact. Figure 5 shows the number of years expected to elapse before the 2.5 in disk exceeds its duty cycle rating. Enabling write-caching while the disk is spun-down increases reliability as the NVCache size grows. Once the NVCache exceeds 10MB, reliability stabilizes because the NVCache becomes underutilized beyond this point.
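One way to read the expected-years metric is as a linear extrapolation of each trace's spin-up rate. In the sketch below, which is our assumption rather than a formula given in the text, $C$ is the drive's rated number of start/stop (duty) cycles, $S$ is the number of spin-ups observed in a trace lasting $D$ days, and $Y$ is the expected number of years before the rating is exceeded:

$$
Y \approx \frac{C}{365 \cdot S / D} = \frac{C\,D}{365\,S}
$$

Under this reading, any change that doubles the number of spin-ups in a trace halves $Y$, which is why the more aggressive spin-downs produced by Artificial Idle Periods lower the expected years.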
Figure 5 also shows the expected years before the disk exceeds its duty cycle rating with Artificial Idle Periods enabled. Reliability decreases relative to write-caching alone; however, the expected time before exceeding the duty cycle rating is still more than two and a half years for the Mac trace, the lowest of the five workloads. Without write-caching or Artificial Idle Periods, the Mac workload would exceed the duty cycle rating in seven months.

Figure 6 shows the results for a 256MB NVCache with write-caching, Artificial Idle Periods, and a Read-Miss Cache as a function of the maximum Read-Miss Cache size. Figure 6 shows the number of operations satisfied by the Read-Miss Cache while the rotating media is spun-down: with less than half the NVCache enabled for the Read-Miss Cache, the working set of read requests to the NVCache is captured. Figure 6 also shows the average Read-Miss Cache size as a function of the maximum Read-Miss Cache size. The average Read-Miss Cache size grows linearly with the maximum size until 90%, after which it deteriorates quickly, confirming that usually about 10% of a 256MB NVCache is used for write-caching and Artificial Idle Periods. The PVR workload is an exception: it stabilizes at 27% because of its high NVCache utilization from television content recording. Finally, Figure 6 shows the percentage increase in spun-down time from adding a Read-Miss Cache to an NVCache with write-caching and Artificial Idle Periods. The Read-Miss Cache increases the spun-down time percentage by at most 1.5%. Note that with only write-caching and Artificial Idle Periods enabled for a 256MB NVCache, all but the PVR workload are spun-down for 90–95% of the trace duration, which leaves little room for improvement.
Although the Read-Miss Cache does not increase spun-down time significantly, there is still a consistent increase when the Read-Miss Cache is allowed to use the entire NVCache.