

July 2002

High-Availability System Architecture


SideBar    Measuring High Availability

Basic blueprints for file servers, Web servers, and DNS servers

You'll find a glut of articles that discuss high-availability concepts and strategies, as well as a plethora of articles that cover the engineering details of high-availability solutions' components. You're probably ready for an article that shows you how to put those components to use in your IT environment. Perhaps you've promised a high service level agreement (SLA) to your customers, and now you need to know how you're going to keep that promise. If you need to configure a high-availability Windows 2000 file server, Web server, or DNS server, you'll find this article's collection of basic blueprints extremely valuable.

High-Availability Management Phases
Application availability falls as the total application downtime in a given time period (typically a month) rises, and the total downtime is simply the sum of the durations of the individual outages. To increase a system's availability, you need to decrease the duration of outages, decrease the frequency of outages, or both. Before I discuss useful technologies, you need to understand the phases of a postoutage restoration.

In the event of a serious outage, you would need to build a new server from scratch and restore all the data and services in the time available to you. Suppose you've promised an SLA of 99.5 percent, and you're counting on only one outage per month. (For information about calculating availability percentages, see the sidebar "Measuring High Availability.") Within 3 hours and 43 minutes of the start of an outage, you would need to work through the following five restoration phases:

  1. Diagnostic phase—Diagnose the problem and determine an appropriate course of action.
  2. Procurement phase—Identify, locate, transport, and physically assemble replacement hardware, software, and backup media.
  3. Base Provisioning phase—Configure the system hardware and install a base OS.
  4. Restoration phase—Restore the entire system from media, including the system files and user data.
  5. Verification phase—Verify the functionality of the entire system and the integrity of user data.

Regardless of your SLA, you need to know how long the phases take. Each phase can introduce unexpected and unwelcome delays. For example, an unconstrained diagnostic phase can take up the lion's share of your available time. To limit how much time your support engineers spend diagnosing the problem, set up a decision tree in which the engineers proceed to the procurement phase if they don't find what's wrong within 15 minutes. The procurement phase can also be time-consuming if you keep the backup media offsite and have to wait for it to be delivered. I once experienced a situation in which the truck delivering an offsite backup tape crashed en route to our data center. You might think 3 hours and 43 minutes is a short period of time in which to restore service, but in reality, you might have only about 2 hours to complete the actual restoration phase.
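To see where the 3 hours and 43 minutes comes from, and how little of it might be left for the restoration phase, you can work the numbers yourself. The following Python sketch assumes a 31-day month and a single outage. Only the 15-minute diagnostic limit comes from the discussion above; the other phase allowances are hypothetical placeholders that you would replace with your own measurements.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(sla_percent: float, days_in_month: int = 31) -> float:
    """Total downtime per month that still meets the given SLA."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return total_minutes * (100 - sla_percent) / 100


def format_hm(minutes: float) -> str:
    """Render minutes as whole hours and minutes, dropping any fraction."""
    whole = int(minutes)
    return f"{whole // 60} hours {whole % 60} minutes"


budget = downtime_budget_minutes(99.5)

# Hypothetical allowances in minutes; only the diagnostic cap is from the article.
other_phases = {
    "diagnostic": 15,
    "procurement": 45,
    "base provisioning": 30,
    "verification": 13,
}
restore_window = budget - sum(other_phases.values())

print("Downtime budget:", format_hm(budget))
print("Left for restoration:", format_hm(restore_window))
```

With these placeholder allowances, roughly 2 hours remain for the restoration phase, which matches the estimate above. Any delay in an earlier phase comes straight out of that window.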

Blueprint for High-Availability File Servers
A file server doesn't require much CPU capacity or memory. To support 500 users and 200GB of data, you might use one small server, such as a Compaq ProLiant DL380 with two Pentium III processors and 512MB of RAM. With minimal accessories, such a setup costs about $9700 retail. If you use a DLT drive that provides an average transfer rate of 5MBps, physically restoring 200GB of data will take 666 minutes, or 11 hours and 6 minutes. Add an hour for the diagnostic, procurement, base provisioning, and verification phases, and you're looking at 726 minutes to recover from an outage. If you assume one outage per 31-day month (44,640 minutes), then $9700 buys you a file server with an SLA of 98.37 percent.
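The arithmetic behind these figures is easy to reproduce. The Python sketch below assumes decimal units (1GB = 1000MB), truncates the restore time to whole minutes as the text does, and treats the one-hour overhead for the other four phases as a fixed constant. The function names are illustrative only.

```python
def restore_minutes(data_gb: float, rate_mb_per_s: float) -> int:
    """Whole minutes needed to stream data_gb from tape at rate_mb_per_s."""
    seconds = data_gb * 1000 / rate_mb_per_s
    return int(seconds / 60)


def availability_percent(outage_minutes: float, outages_per_month: int = 1,
                         days_in_month: int = 31) -> float:
    """Availability for a month with the given number of equal outages."""
    total_minutes = days_in_month * 24 * 60
    downtime = outages_per_month * outage_minutes
    return 100 * (1 - downtime / total_minutes)


OTHER_PHASES_MINUTES = 60  # diagnostic, procurement, base provisioning, verification

restore = restore_minutes(200, 5)
outage = restore + OTHER_PHASES_MINUTES
sla = availability_percent(outage)

print(f"Restore: {restore} min, outage: {outage} min, availability: {sla:.2f}%")
```

Changing the transfer rate or the data size in the two calls shows quickly which of the two has the larger effect on the SLA you can promise.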

To increase the availability of file servers, you can use standard strategies: Reduce the time required to restore the file share and data during an outage and reduce the frequency of outages. Many technologies address each of these strategies for file servers. As a starting point, let's look at basic implementations that use the following techniques: data partitioning, snapshot backup-and-restore technologies, and fault-tolerant systems.

Data partitioning. In the configuration that Figure 1 shows, FileServer2 contains product data, FileServer3 contains images, and so on. To make this partitioning transparent to the user, you can implement a technology such as Microsoft Dfs, which lets you create a virtual file system from the physical nodes across the network. A user who connects to \\fileserver1\share would see a directory structure that appears to show all data as if it were residing on FileServer1, even though some of the data physically resides on FileServer2.
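The idea of a virtual file system that hides the physical servers can be illustrated with a toy lookup table. The Python sketch below is not the Dfs API and doesn't touch the network. It only models how a path under one root might map to a share on another server. The folder and share names are hypothetical.

```python
# Toy model of a Dfs-style namespace. All share and folder names are hypothetical.
NAMESPACE_ROOT = r"\\fileserver1\share"
LINKS = {
    "products": r"\\fileserver2\products",
    "images": r"\\fileserver3\images",
}


def resolve(virtual_path: str) -> str:
    """Return the physical UNC path behind a path in the virtual namespace."""
    tail = virtual_path[len(NAMESPACE_ROOT):]
    inside = virtual_path.lower().startswith(NAMESPACE_ROOT.lower())
    if not inside or (tail and not tail.startswith("\\")):
        raise ValueError("path is outside the namespace")
    remainder = tail.lstrip("\\")
    if not remainder:
        return NAMESPACE_ROOT
    first, _, rest = remainder.partition("\\")
    target = LINKS.get(first.lower())
    if target is None:
        # Not a link folder: the data lives on the root server itself.
        return virtual_path
    return target + ("\\" + rest if rest else "")


print(resolve(r"\\fileserver1\share\images\logo.gif"))
print(resolve(r"\\fileserver1\share\docs\readme.txt"))
```

The user always types the first form of the path. Only the lookup table knows that the images folder lives on a different server, so you can move data between servers by changing the table rather than retraining users.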

Table 1 shows the availability you can achieve through data partitioning. (This table uses a typical SLA formula and assumes that you need to restore only one server during the outage.) The cost per server goes down as you accumulate servers and as the number and size of the servers' disks decrease. Obviously, the partitioning option is costly in terms of server hardware, so you need to decide whether you can live with an average of 12 hours 6 minutes (726 minutes) unscheduled downtime per month. Perhaps spending an extra $23,800 ($33,500 minus $9700) to reduce that time to 3 hours 13 minutes (193 minutes) makes sense for you. Data partitioning is particularly cost-prohibitive if you have huge quantities of data and need dozens of servers or more.
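You can estimate the effect of partitioning with the same kind of arithmetic. The Python sketch below assumes that the 200GB is divided evenly across the servers, that only one server needs restoring per outage, and that the DLT transfer rate (5MBps) and the one-hour overhead for the other phases stay the same. Under those assumptions, the 193-minute figure corresponds to five servers of about 40GB each. That split is an inference from the numbers in the text, not a value taken from Table 1.

```python
TOTAL_DATA_GB = 200
TAPE_RATE_MB_PER_S = 5
OTHER_PHASES_MINUTES = 60
MINUTES_PER_MONTH = 31 * 24 * 60


def outage_minutes(server_count: int) -> int:
    """Minutes to recover one server when data is split evenly across servers."""
    data_per_server_gb = TOTAL_DATA_GB / server_count
    restore = int(data_per_server_gb * 1000 / TAPE_RATE_MB_PER_S / 60)
    return restore + OTHER_PHASES_MINUTES


def availability_percent(minutes_down: int) -> float:
    """Availability for a 31-day month with one outage of the given length."""
    return 100 * (1 - minutes_down / MINUTES_PER_MONTH)


for servers in (1, 2, 5, 10):
    down = outage_minutes(servers)
    print(f"{servers:>2} servers: {down:>3} min per outage, "
          f"{availability_percent(down):.2f}% availability")
```

The gain from each added server shrinks as the fixed one-hour overhead starts to dominate the outage, which is one reason the cost of partitioning rises faster than the availability it buys.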

Snapshot backup and restore. An alternative to data partitioning is to implement faster technology. Faster tape drives won't necessarily provide a quantum leap in performance, so you'll need to use snapshot backup-and-restore technology, which is typically available from independent hardware vendors (IHVs) of enterprise storage systems, such as EMC and Compaq. Upcoming software snapshot products might change this equation, but for now, you need to turn to the enterprise storage vendors.



Reader Comments
Page 24 of the print article states that RAID 5 technology introduces additional fault tolerance by allocating portions of each disk in the array to parity data.

No, RAID 5 does not provide additional fault tolerance over mirroring. It is just another way of providing fault tolerance, and a more efficient one (mirroring means 50% efficiency, whereas the efficiency of RAID 5 exceeds 66%). It is efficient, but it does not introduce any more fault tolerance.

Murat Yildirimoglu August 13, 2002






 
 Windows IT Pro is a Division of Penton Media Inc.
 Copyright © 2008 Penton Media, Inc., All rights reserved.