The HP AutoRAID hierarchical storage system (HP, 1995)
Jonathan Ledlie
February 18, 2000

As hinted at in our first discussion of RAID, a hybrid of two good ideas often keeps the good aspects of both and leaves the bad ones behind.  Here, the HP group has taken the speed of mirroring (RAID level 1) and combined it with the economy of spreading parity over all of the disks (RAID level 5).  They come close to the speed of JBOD while retaining the safety net that RAID provides.  They exploit a common concept in computer science, locality, but instead of making the mirrored layer a redundant cache for the lower RAID 5 layer, they make it the only store of the active data.
The HP group seeks to resolve two primary difficulties with RAID: it is difficult, even for experts, to configure a RAID properly, and mirroring, which is fastest, is overkill for most applications.  They resolve the first issue, tuning, through software on the array controller itself, which watches how the data is being accessed and dynamically decides where it should go.  The second is resolved through the hybrid approach described above.
Their software divides data into Physical Extent Groups (PEGs), Physical Extents (PEXs), and Relocation Blocks (RBs), where PEGs are the largest and RBs the smallest.  When a higher layer (the file system) asks to write some data, that data is promoted to the RAID 1 layer.  After some period of inactivity, it migrates back to RAID 5.  The system writes the data out in a manner similar to the Log-Structured File System.  This brings with it the garbage collection difficulty associated with that algorithm, but because the array sits beneath the actual file system, operating system vendors do not need to tailor their systems to use the HP AutoRAID (at least that's the idea).  Other good points are that users can add disks of different sizes (at market price), and that the system dynamically spreads the workload over the resources it has available.
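The promote-on-write, demote-when-idle behavior described above can be sketched as a toy two-level store. This is only an illustration under stated assumptions, not the controller's actual algorithm: the class name `TwoLevelStore`, the `idle_limit` parameter, and the tick-based clock are all hypothetical, and the real trigger for migration may differ. The point it shows is that each block lives in exactly one layer at a time, so the mirrored layer is the sole store of active data and not a cache.

```python
from dataclasses import dataclass, field


@dataclass
class TwoLevelStore:
    """Toy model: every block lives in exactly one of two layers."""

    idle_limit: int = 3  # hypothetical: ticks without a write before demotion
    clock: int = 0
    mirrored: dict = field(default_factory=dict)  # block id -> (data, last write tick)
    raid5: dict = field(default_factory=dict)     # block id -> data

    def write(self, block_id, data):
        # A write always lands in the mirrored layer; any copy in the
        # parity layer is dropped so the block is stored only once.
        self.raid5.pop(block_id, None)
        self.mirrored[block_id] = (data, self.clock)

    def read(self, block_id):
        # Reads are served from whichever layer holds the block and do
        # not move it (an assumption of this sketch).
        if block_id in self.mirrored:
            return self.mirrored[block_id][0]
        return self.raid5[block_id]

    def tick(self):
        # Advance time and demote blocks that have gone unwritten for
        # idle_limit ticks. Returns the ids that were demoted.
        self.clock += 1
        idle = [
            block_id
            for block_id, (_, last_write) in self.mirrored.items()
            if self.clock - last_write >= self.idle_limit
        ]
        for block_id in idle:
            data, _ = self.mirrored.pop(block_id)
            self.raid5[block_id] = data
        return idle
```

Writing a block, calling `tick()` until it goes idle, and then writing it again moves it from the mirrored layer to the parity layer and back, which is the cycle the paragraph describes.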
The obvious drawback of putting these decisions (when to migrate data, when to collect the garbage, and how to allocate data) in the controller is that they are taken out of the hands of the operating system, which in many cases knows more about how data will be accessed in the long term.  The application itself probably knows best of all.