For Day 19 of this series, I am going to talk a little about RAID, which stands for Redundant array of independent disks or Redundant array of inexpensive disks, depending on who you believe.
RAID is a technology that allows the use of multiple hard drives, combined in various ways, to improve redundancy, availability and performance, depending on the RAID level used. When a RAID array is presented to a host in Windows, it is called a logical drive. Using RAID, the data is distributed across multiple disks in order to:
• Overcome the I/O bottleneck of a single disk
• Get protection from data loss through the redundant storage of data on multiple disks
• Avoid any one hard drive being a single point of failure
• Manage multiple drives more effectively
Regardless of whether you are using traditional magnetic hard drive storage or newer solid state storage technology, most database servers will employ some sort of RAID technology. RAID improves redundancy, improves performance, and makes it possible to have larger logical drives. RAID is used for both OLTP and DW workloads. Having more spindles in a RAID array helps both IOPS and throughput, although ultimately throughput can be limited by a RAID controller, HBA, NIC, or the PCI-E slot that is being used.
Keep in mind that while RAID does provide redundancy in your data storage, it is not a substitute for an effective backup strategy or a high availability/disaster recovery (HA/DR) strategy. Regardless of what level of RAID you use in your storage subsystem, you still need to run SQL Server full, differential, and log backups as necessary to meet your recovery point objective (RPO) and recovery time objective (RTO) goals.
There are a number of commercially-available RAID configurations, which I’ll review over the coming sections, and each has associated costs and benefits. When considering which level of RAID to use for different SQL Server components, you have to carefully consider your workload characteristics, keeping in mind your hardware budget. If cost is no object, I am going to want RAID 10 for everything, i.e. data files, log file, and tempdb. If my data is relatively static, I may be able to use RAID 5 for my data files. It is also fairly common to use RAID 5 for SQL Server backup files.
During the discussion, I will assume that you have a basic knowledge of how RAID works, and what the basic concepts of striping, mirroring, and parity mean.
RAID 0 (disk striping with no parity)
RAID 0 simply stripes data across multiple physical disks. This allows reads and writes to happen simultaneously, across all of the striped disks, so offering improved read and write performance, compared to a single disk. However, it actually provides no redundancy whatsoever. If any disk in a RAID 0 array fails, the array is off-line and all of the data in the array is lost. This is actually more likely to happen than if you only have a single disk, since the probability of failure for any single disk goes up as you add more disks. There is no disk space loss for storing parity data (since there is no parity data with RAID 0), but I don’t recommend that you use RAID 0 for database use, unless you enjoy updating your resume! RAID 0 is often used by serious computer gaming enthusiasts in order to reduce the time it takes to load portions of their favorite games. They do not keep any important data on their “gaming rigs”, so they are not that concerned about losing one of their drives. Even this usage is declining over time as SSDs become more affordable.
RAID 1 (disk mirroring or duplexing)
You need at least two physical disks for RAID 1. Your data is mirrored between the two disks, i.e. the data on one disk is an exact mirror of that on the other disk. This provides redundancy, since you can lose one side of the mirror without the array going off-line and without any data loss, but at the cost of losing 50% of your space to the mirroring overhead. RAID 1 can improve read performance, but can hurt write performance in some cases, since the data has to be written twice.
On a database server, it is very common to install the Windows Server operating system on two of the internal drives, configured in a RAID 1 array, and using an embedded internal RAID controller on the motherboard. In the case of a non-clustered database server, it is also common to install the SQL Server binaries on the same two drive RAID 1 array as the operating system. This provides basic redundancy for both the operating system and the SQL Server binaries. If one of the drives in the RAID 1 array fails, you will not have any data loss or down-time. You will need to replace the failed drive and rebuild the mirror, but this is a pretty painless operation, especially compared to reinstalling the operating system and SQL Server!
RAID 5 (striping with parity)
RAID 5 is probably the most commonly-used RAID level, for both general file server systems and for SQL Server. RAID 5 requires at least three physical disks. The data, and calculated parity information, is striped across the physical disks by the RAID controller. This provides redundancy because if one of the disks goes down, then the missing data from that disk can be reconstructed from the parity information on the other disks. Also, rather than losing 50% of your storage, in order to achieve redundancy, as for disk mirroring, you only lose 1/N of your disk space (where N equals the number of disks in the RAID 5 array) for storing the parity information. For example, if you had six disks in a RAID 5 array, you would lose 1/6th of your space for the parity information. As you add more disks to a RAID 5 array, the chances of losing any one of the disks goes up (due to simple statistics), so that is a reliability consideration for larger arrays.
However, you will notice a very significant decrease in performance while you are missing a disk in a RAID 5 array, since the RAID controller has to work pretty hard to reconstruct the missing data. Furthermore, if you lose a second drive in your RAID 5 array, the array will go offline, and all of the data will be lost. As such, if you lose one drive, you need to make sure to replace the failed drive as soon as possible. RAID 6 stores more parity information than RAID 5, at the cost of an additional disk devoted to parity information, so you can survive losing a second disk in a RAID 6 array.
Finally, there is a write performance penalty with RAID 5, since there is overhead to write the data, and then to calculate and write the parity information. As such, RAID 5 is usually not a good choice for transaction log drives, where we need very high write performance. I would also not want to use RAID 5 for data files where I am changing more than 10% of the data each day. One good candidate for RAID 5 is your SQL Server backup files. You can still get pretty good backup performance with RAID 5 volumes, especially if you use backup compression and striped backups.
RAID 10 and RAID 0+1
When you need the best possible write performance, you should consider either RAID 0+1 or, preferably, RAID 10. These two RAID levels both involve mirroring (so there is a 50% mirroring overhead) and striping but differ in the details in how it is done in each case.
In RAID 10 (striped set of mirrors), the data is first mirrored and then striped. In this configuration, it is possible to survive the loss of multiple drives in the array (one from each side of the mirror), while still leaving the system operational. Since RAID 10 is more fault tolerant than RAID 0+1, it is preferred for database usage.
In RAID 0+1 (mirrored pair of stripes) the data is first striped, and then mirrored. This configuration cannot handle the loss of more than one drive in each side of the array.
RAID 10 and RAID 0+1 offer the highest read/write performance, but incur a roughly 100% storage cost penalty, which is why they are sometimes called “rich man’s RAID”. These RAID levels are most often used for OLTP workloads, for both data files and transaction log files. As a SQL Server database professional, you should always try to use RAID 10 if you have the hardware and budget to support it. On the other hand, if your data is less volatile, you may be able to get perfectly acceptable performance using RAID 5 for your data files. By “less volatile”, I mean if less than 10% of your data changes per day, then you may still get acceptable performance from RAID 5 for your data files(s).