Enter file-based replication systems, which intercept writes at the file-system level rather than at the volume block level. Replication engines that work at this level know exactly which parts of a file have changed and send only those parts across the network. These engines can also evaluate filenames, so you can create lists of files to include in or exclude from replication, reducing the demands replication makes on network bandwidth.
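To make the include/exclude idea concrete, here is a minimal Python sketch of filename filtering. The pattern lists and the should_replicate function are hypothetical illustrations; they don't reflect the configuration syntax of any product in this review, and I've assumed that an exclude match takes priority over an include match.

```python
import ntpath
from fnmatch import fnmatchcase

# Hypothetical pattern lists; real products define their own syntax.
INCLUDE = ["*.mdf", "*.ldf", "*.doc"]
EXCLUDE = ["*.tmp", "~*"]


def should_replicate(path, include=INCLUDE, exclude=EXCLUDE):
    """Return True if a write to this file should be sent to the replica.

    Assumption: exclude patterns win over include patterns, and a file
    that matches neither list is not replicated.
    """
    # Compare on the lowercased base name so matching is case-insensitive,
    # as Windows filenames are. ntpath splits on both \ and /.
    name = ntpath.basename(path).lower()
    if any(fnmatchcase(name, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(name, pattern) for pattern in include)
```

With these lists, a write to D:\data\Sales.MDF would be replicated, whereas writes to scratch.tmp or to a temporary file such as ~sales.doc would never reach the network.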
All the replication-based products support failover to cluster nodes located at a remote site (i.e., stretch clusters). For a look at the challenges that stretch clusters present and how high-availability products address them, see the sidebar "Remote Cluster Considerations."
As I've mentioned, high-availability solutions restart an application quickly after a system failure causes the application to stop processing. Another category of solutions—fault-tolerant solutions—lets an application continue after a failure without even a momentary interruption. In this review, I look at high-availability solutions from six vendors and one fault-tolerant solution.
Marathon Endurance Virtual FTserver
Marathon Technologies' Marathon Endurance Virtual FTserver is a software product that turns a pair of identical Intel-based servers into one fault-tolerant computing system. Endurance is built on the same foundation as Marathon’s Endurance 6200, which I reviewed in "Endurance 6200 3.0," July 2001, InstantDoc ID 21140.
Endurance works by running an application on both servers, which Marathon calls CoServers, in instruction lockstep—meaning that both servers execute the same machine instruction at the same time. To achieve this lockstep, the servers must have identical hardware and configurations. Currently both CoServers must run Win2K Advanced Server (AS) Service Pack 4 (SP4) in a dual-processor system. Endurance 6.0, which will be available shortly, will support Windows 2003 and CoServer configurations that have one hyperthreaded processor.
A minimum configuration requires two disk drives on each server: one to boot Windows and the other to boot the Virtual FTserver OS. Each server has a pair of Gigabit Ethernet adapters that's dedicated to communications between the two CoServers. Another Ethernet adapter in each server, called the CoServer Management Link, lets an administrator communicate directly with the CoServer. You can configure up to four additional Ethernet adapters in each CoServer for public network communication—these are the network links that user and application communication traverses.
When the two Endurance CoServers boot up, the first server to start waits for the second to establish communication over the two CoServer links, named CSLink1 and CSLink2. After communication is established, the first server starts Virtual FTserver, the protected virtual server that runs applications. When initialization of Virtual FTserver has been completed on the first server, Endurance copies the context of the first server—including the current state of virtual server memory and the next instruction scheduled to execute—to the second server, momentarily stopping the first server to complete the synchronization. When the servers are synchronized, they begin redundant, fault-tolerant operation, executing in lockstep. If a CoServer isn't available (e.g., it's down for maintenance or repair), the administrator can mark it disabled, and Endurance will run the functioning Virtual FTserver in non–fault-tolerant mode.
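The startup sequence just described can be summarized as a small decision function. This is only a sketch of the order of events as I've described them, not Marathon's implementation; the function name, parameters, and step strings are my own.

```python
def startup_steps(peer_linked, peer_disabled):
    """Return the ordered startup steps for the first CoServer to boot.

    peer_linked:   the second CoServer has established communication
                   over the CoServer links (CSLink1 and CSLink2).
    peer_disabled: the administrator has marked the other CoServer
                   disabled (e.g., for maintenance or repair).
    """
    if peer_disabled:
        # No partner to synchronize with: run unprotected.
        return [
            "start Virtual FTserver",
            "run in non-fault-tolerant mode",
        ]
    if not peer_linked:
        # The first server waits until the second one makes contact.
        return ["wait for peer on CSLink1 and CSLink2"]
    return [
        "start Virtual FTserver on the first CoServer",
        "pause the first CoServer and copy its context to the second",
        "run both CoServers in lockstep (fault-tolerant mode)",
    ]
```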
Endurance supports as much local disk storage as the physical server configuration supports. Mirrored pairs of disks, one in each CoServer, provide fault tolerance. The Endurance Device Redirector presents the virtual server with a single-disk view of the mirrored pair and manages the mirrored operation of the physical disks. After running in single-server (non–fault-tolerant) mode, Endurance performs a rapid remirror by copying only the disk sectors that were modified during single-server operation.
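The rapid-remirror idea is easy to picture as a set of "dirty" sector numbers that accumulates while one side of the mirror is offline. The following Python sketch is a toy model under that assumption; Marathon doesn't document its internal data structures here, and the MirroredDisk class and its method names are hypothetical.

```python
class MirroredDisk:
    """Toy model of a mirrored disk pair, one disk per server."""

    def __init__(self, sector_count):
        self.primary = [b""] * sector_count
        self.secondary = [b""] * sector_count
        self.secondary_online = True
        # Sectors written while the secondary was offline.
        self.dirty = set()

    def write(self, sector, data):
        """Write to both disks, or record the sector as dirty."""
        self.primary[sector] = data
        if self.secondary_online:
            self.secondary[sector] = data
        else:
            self.dirty.add(sector)

    def remirror(self):
        """Bring the secondary back, copying only the dirty sectors.

        Returns the number of sectors copied.
        """
        copied = 0
        for sector in sorted(self.dirty):
            self.secondary[sector] = self.primary[sector]
            copied += 1
        self.dirty.clear()
        self.secondary_online = True
        return copied
```

The payoff is that resynchronization time depends on how much data changed during single-server operation rather than on the size of the disk.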
You use the Endurance Manager GUI, which Figure 1 shows, to monitor and manage Endurance. Endurance Manager runs on Windows XP and Win2K workstations that can communicate with the NICs configured as the CoServer Management Links. In addition, the MTCCONS command lets you perform routine tasks either from a CoServer or remotely.
Installing Endurance is a rather lengthy procedure but is well documented in Marathon’s Installation Guide. At its conclusion, Virtual FTserver is running in fault-tolerant mode as a virtual server that uses both CoServers. You control the virtual server by using Virtual FTserver Desktop (aka Endurance Desktop) from either one of the CoServers. Endurance Desktop operates similarly to remote control applications—you interact with the virtual server when Endurance Desktop has input focus and with the CoServer when the input focus is outside Endurance Desktop.
Marathon Technologies sells Endurance Virtual FTserver only through authorized resellers at a suggested retail price that starts at $12,000 for a uniprocessor version with 1 year of support and upgrades. Fully configured systems sell for less than $20,000.
PolyServe Matrix Server
PolyServe Matrix Server is the first Windows-oriented product from a company that has been serving the Linux market for several years. Matrix Server combines the PolyServe SAN File System (PSFS) with the features of PolyServe Matrix HA, a separate product that provides clustering functionality.
A PolyServe matrix consists of as many as 16 servers connected to SAN-based data storage. PSFS is at the heart of Matrix Server and lets you grow application processing power by adding nodes to the cluster in lieu of using larger (and more expensive) SMP servers. PSFS supports concurrent access to shared data by multiple cluster nodes with a distributed lock manager that coordinates file updates. Full journaling of data updates promotes rapid recovery from hardware failures. Because Matrix Server doesn't require a master/slave relationship between nodes, the administrator is free to configure failover of an application to any other node in the cluster. Matrix Server supports multiple host bus adapters (HBAs), redundant SAN switches, and multiple NICs for enhanced node availability. Integrated fabric management supports Brocade Communications Systems and McDATA switches and automatically adjusts cluster-node configuration in response to both data-path failures and reestablished data paths. Matrix Server supports most IA-32 servers from major manufacturers and isn't limited to hardware on Microsoft’s cluster-certified list.
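To illustrate how a lock manager can coordinate file updates among nodes, here is a deliberately simplified Python sketch. It is a generic shared/exclusive (reader/writer) model that runs in one process; PolyServe doesn't describe PSFS's lock modes or protocol in the material I reviewed, so the LockManager class, its methods, and the two lock modes are assumptions for illustration only.

```python
class LockManager:
    """Single-process stand-in for a cluster lock manager.

    Many nodes may hold a shared lock on a file at once, but an
    exclusive lock (needed to update the file) admits only one holder.
    """

    def __init__(self):
        # Maps a file path to (mode, set of node names holding the lock).
        self._locks = {}

    def acquire(self, node, path, mode):
        """Try to lock path for node; mode is 'shared' or 'exclusive'.

        Returns True if the lock is granted, False if it conflicts.
        Locks are not reentrant in this sketch.
        """
        held = self._locks.get(path)
        if held is None:
            self._locks[path] = (mode, {node})
            return True
        held_mode, holders = held
        if mode == "shared" and held_mode == "shared":
            holders.add(node)
            return True
        return False

    def release(self, node, path):
        """Release node's lock on path, if it holds one."""
        held = self._locks.get(path)
        if held is None:
            return
        _, holders = held
        holders.discard(node)
        if not holders:
            del self._locks[path]
```

In a real cluster, the lock state would itself be distributed across nodes and would have to survive node failures, which this sketch makes no attempt to model.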
The PolyServe Management Console (mxconsole.exe), a Java-based GUI, is the primary administrative interface. As Figure 2 shows, the Management Console allows management of all cluster nodes from a central management station. A command-line interface, mx.exe, supports scripted operation. Currently, no SNMP interface or Windows Management Instrumentation (WMI) provider is available.
Matrix Server lets administrators add nodes to or remove nodes from a cluster while the other nodes continue to operate normally; no halting or pausing is necessary. For enhanced security and performance, administrators can specify which network or networks Matrix Server will use for cluster management traffic. Also, access to administrative functions is password-protected. Only the primary administrative user can change the matrix configuration. Other users, created by using the Matrix Server UserManager or the Mxpasswd command, can only view the matrix configuration. A flexible event-notification system can send administrators event information through email messages, pages, the PolyServe Management Console, or another user-defined process.