Determining How Many Disks to Add
How do you know how many disks you need to meet your performance
requirements? The primary performance requirements for a RAID array are adequate throughput and response time. The workload you place on the array and the
amount of work the RAID array can support (i.e., transfers per second) influence
both requirements.
To see what steps you need to take when adding storage capacity
to an NT server, let's look at an example. Imagine that you have a server with a RAID 5
array composed of three 4GB Ultra Wide SCSI 7200rpm hard disks. Having
historical information to work from when adding storage capacity is helpful, so
imagine that you've stress tested your NT file server using Bluecurve's
Bi-Directional copy workload to simulate a file server workload (for information
about Bluecurve, see Carlos Bernal, "Dynameasure Enterprise 1.5,"
September 1997).
From the Bluecurve stress test results, you learn that the maximum
throughput that this configuration (configuration 1) provides at the 20-user
level is 3.8MB per second (MBps) with a response time of 13.9 seconds. When you
review the corresponding Performance Monitor log to determine what's happening
inside NT during the tests, you see that the %Disk Time stays at 100 percent.
Because this counter is pinned at its maximum, I omitted it from the chart you
see in Screen 1, page 187, to make the chart easier to read. As the Disk
Transfers/sec increases against the disk array, the Avg.
Disk Queue Length grows to almost 16 and the average RAID array response time
(i.e., Avg. Disk sec/Transfer) increases to 0.121 second, which is slow. This
information indicates that this RAID array is causing a bottleneck. Now that you
know a bottleneck is occurring, you can use this information to determine the
most economical way to remove the bottleneck and increase the usable disk
capacity.
Estimating Required Additional RAID Performance Capacity
The Avg. Disk Queue Length for configuration 1 is almost 16, which far exceeds
the maximum recommended value of 6 (3 disks * 2 outstanding requests each). Also,
the maximum transfers per second are 139 ([126 + (4 * 73)] / 3) per disk, which
exceeds the roughly 100 transfers per second that one 7200rpm disk can support.
The combination of long queues and excessive numbers of transfers per second
slows the Avg. Disk sec/Transfer response time to 0.121 second.
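The two checks above reduce to simple arithmetic. The following Python sketch reproduces them for configuration 1. The function names are hypothetical, and reading 126 as reads per second and 73 as writes per second (with each RAID 5 write counted as four disk transfers) is an assumption based on the bracketed formula in the text, not something Performance Monitor reports directly.

```python
def max_recommended_queue(disks, outstanding_per_disk=2):
    """Rule of thumb from the text: two outstanding requests per disk."""
    return disks * outstanding_per_disk


def raid5_transfers_per_disk(reads_per_sec, writes_per_sec, disks):
    """Per-disk load, counting each RAID 5 write as four disk transfers."""
    return (reads_per_sec + 4 * writes_per_sec) / disks


# Configuration 1: three disks, figures taken from the bracketed formula.
# Assumption: 126 = reads/sec, 73 = writes/sec.
queue_limit = max_recommended_queue(3)
per_disk = raid5_transfers_per_disk(126, 73, 3)

print(f"Recommended queue limit: {queue_limit}")
print(f"Transfers/sec per disk:  {per_disk:.0f}")
```

Comparing the observed queue length (almost 16) with `queue_limit`, and `per_disk` with the roughly 100 transfers per second a single disk can sustain, gives the same verdict as the text: the array is overloaded on both measures.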
You want to limit each disk in the array to no more than two outstanding
requests at a time, so with a queue of 16 requests you need a minimum of eight
disks (16 / 2) to remove the bottleneck. I recommend you replace the three-disk
RAID 5 array with a 10-disk RAID 5 array. Adding two more disks than the system
requires gives you some room for surges in workload and for future requirements.
Because RAID 5 uses one disk's worth of capacity for parity, this configuration
removes the disk bottleneck and provides 36GB (9 * 4GB) of usable storage
capacity.
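The sizing step above can be sketched in a few lines of Python. The helper names are hypothetical, and the two-disk headroom is simply the author's recommendation expressed as a constant, not a general rule.

```python
import math


def min_disks_for_queue(avg_queue_length, outstanding_per_disk=2):
    """Fewest disks that keep each disk at or below the per-disk queue limit."""
    return math.ceil(avg_queue_length / outstanding_per_disk)


def raid5_usable_gb(disks, disk_size_gb):
    """RAID 5 gives up one disk's worth of capacity to parity."""
    return (disks - 1) * disk_size_gb


HEADROOM_DISKS = 2  # extra disks the article recommends for surges and growth

minimum = min_disks_for_queue(16)
recommended = minimum + HEADROOM_DISKS
usable = raid5_usable_gb(recommended, 4)

print(f"Minimum disks: {minimum}, recommended: {recommended}, usable: {usable}GB")
```

Rounding up with `math.ceil` matters when the observed queue length is not an even multiple of the per-disk limit; a queue of 15, for example, still calls for eight disks.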
Graph 1 shows how the average response times of the RAID array in
configuration 1 compare with those of the new configuration (configuration 2).
Graph 2 shows how the throughput levels of the RAID array for configuration 1
compare with those of configuration 2. Configuration 2 lowered the average
response time from 13.9 seconds to 9.2 seconds and improved the throughput
from 3.8MBps to 4.9MBps at the 20-client level. The Avg. Disk Queue Length
dropped from 16 to 12, and Avg. Disk sec/Transfer dropped from 0.119 second to
0.049 second. These results provide insight into the reason why the throughput
and response time reported by the Bluecurve clients improved significantly. In
addition, Performance Monitor reported that the RAID array provided greater than
7.34MBps of disk throughput while supporting a workload of 68 ([147 + (4 *
117)]/9) transfers per second per disk. This sizing solution provides improved
performance with room to grow.
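To put the before-and-after figures on a common scale, the short Python sketch below converts them to percentage changes. The `percent_change` helper and the dictionary labels are illustrative only; the numbers are the ones quoted above for the 20-client level.

```python
def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100


# (configuration 1, configuration 2) pairs as reported in the text.
results = {
    "client throughput (MBps)": (3.8, 4.9),
    "client response time (s)": (13.9, 9.2),
    "Avg. Disk Queue Length": (16, 12),
}

changes = {name: percent_change(b, a) for name, (b, a) in results.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

A positive value means the metric rose and a negative value means it fell, so higher throughput and lower response time both count as improvements.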
Disk Storage Capacity vs. Disk Performance Capacity
In the example in this article, you learned how to determine the number of
disks you need to add to a RAID array to remove a disk bottleneck and provide
the necessary storage capacity. This example provides 36GB of usable disk
storage capacity. So why did I suggest you create a RAID array using ten 4GB
disks instead of five 9GB disks to provide 36GB of usable storage capacity? The
answer has to do with the supported disk workload. When disk capacity
increases from 4GB to 9GB, the workload each disk can support doesn't increase
if the disks in the RAID array are from the same family (e.g., Ultra Wide SCSI
7200rpm). Regardless of the disk storage capacity, each 7200rpm disk can support
only about 100 transfers per second. Thus, if you use five 9GB disks instead of
ten 4GB disks, you meet the storage capacity goal of 36GB, but the RAID array is
still a bottleneck because five disks fall short of the eight-disk minimum. You
could also use nineteen 2GB disks to provide even better performance, but this
solution is cost prohibitive.
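The trade-off between the three options can be tabulated with a small Python sketch. It assumes, as a simplification, that an array's transfer capacity is just the disk count multiplied by the roughly 100 transfers per second quoted above; the function and field names are hypothetical.

```python
TRANSFERS_PER_DISK = 100  # approximate figure from the text for a 7200rpm disk


def raid5_option(disks, disk_size_gb):
    """Usable capacity and rough aggregate transfer capacity of a RAID 5 array."""
    return {
        "disks": disks,
        "disk_size_gb": disk_size_gb,
        "usable_gb": (disks - 1) * disk_size_gb,       # one disk's worth of parity
        "transfers_per_sec": disks * TRANSFERS_PER_DISK,  # simplified estimate
    }


options = [raid5_option(10, 4), raid5_option(5, 9), raid5_option(19, 2)]

for o in options:
    print(f"{o['disks']:>2} x {o['disk_size_gb']}GB: "
          f"{o['usable_gb']}GB usable, ~{o['transfers_per_sec']} transfers/sec")
```

All three layouts deliver the same usable capacity, so the difference between them is entirely in how many spindles share the workload.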
Meeting Your Storage and Performance Needs
Understanding how to use and evaluate NT's built-in metrics and
distinguishing between storage capacity and disk performance capacity are
important. After you understand these concepts and the relationships of the
information that Performance Monitor provides, you can remove the guesswork
associated with sizing your RAID array and meet your storage and performance
needs. In a future article, I'll show you how you can tune your NT RAID solution
for maximum performance.