Managing User-Maintained Data
The data that users store in OST or PST files on desktop or laptop PCs is the most troublesome type of Exchange data to manage because it's distributed and often inaccessible (from an administrative perspective). OST files are a lesser problem because they're simply slave copies of Exchange server mailbox content. In Microsoft Office Outlook 2003 Cached Exchange Mode, the OST is a complete replica of the online Exchange mailbox, whereas in non-Cached Exchange Mode (or in earlier Outlook versions), the local OST contains a subset of the server mailbox data.
PSTs are a different story. Mailbox quota restrictions often force users to store important email data in PST files, but these files are usually large (several hundred megabytes or more) and, when stored locally, are typically excluded from local hard-disk backup procedures, if such procedures exist at all. Users often place PST files on network shares, which is certainly better than keeping them on local hard disks. But although backups are simpler for network share-based PST files, unchecked PST growth can still be problematic: as a PST grows, so does its chance of corruption, which can be irreparable. In practice, there is little benefit in moving data to a network share-based PST rather than keeping it in the user's mailbox. Moreover, PSTs are inherently insecure. You can encrypt PST files, but decryption utilities are well known and widely available. If users store sensitive corporate data in PSTs on, say, laptops, that data is at risk if a laptop is lost or stolen, and even PSTs on network shares must be adequately protected from unauthorized access. Finally, if you have legal requirements for archiving or retention, unmanaged PST files can get you in a lot of trouble.
Better Backup and Restore
Your choice of backup (and, more importantly, restore) solutions will depend on the amount of data you need to process and how quickly that processing must occur. For server-based data, many enterprise deployments implement procedures that allow databases to be restored within 1 hour (your specific Service Level Agreements, or SLAs, might specify a different figure). For example, to restore 40GB of server data within 1 hour, a tape device must sustain a restore rate (not just a backup rate) of roughly 11MBps or more, because 40GB is about 40,960MB and 1 hour is 3,600 seconds. Many backup solutions now back up to disk first and only later stream the data off to tape, so initial backup rates (i.e., the rate of the backup-to-disk portion) and restore rates can be significantly higher than those traditionally associated with tape-only backup and restore.
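As a quick check on the restore-rate arithmetic, the sketch below computes the minimum sustained rate needed to move a given volume of data within a restore window. It assumes 1GB = 1024MB and a constant rate for the whole job, and it ignores tape positioning, transaction-log replay, and database mount time, all of which lengthen a real restore. The function name is hypothetical.

```python
def required_restore_rate_mbps(data_gb: float, window_hours: float) -> float:
    """Minimum sustained restore rate (MB per second) needed to restore
    data_gb gigabytes within window_hours, using 1GB = 1024MB."""
    data_mb = data_gb * 1024
    window_seconds = window_hours * 3600
    return data_mb / window_seconds


# 40GB of server data restored inside a 1-hour SLA window
rate = required_restore_rate_mbps(40, 1)
print(f"{rate:.1f} MBps")  # 11.4 MBps
```

Because overheads only add time, treat the result as a floor, not a target.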
SAN-based solutions often offer high tape-restore transfer rates; figures in the region of 100GB to 140GB per hour aren't uncommon. Such capability might influence the size limits that you assign to databases: the ability to back up and restore larger volumes of data faster means that you can implement larger databases, which in turn can mean either larger mailbox quotas or more users per server.
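The same relationship can be turned around to estimate how large a database a given restore rate allows. The sketch below assumes a single restore window, databases restored one after another, and mailboxes that are full to quota; the function names and the 200MB quota are illustrative assumptions, not recommendations.

```python
def max_database_gb(rate_gb_per_hour: float, window_hours: float,
                    databases_in_sequence: int = 1) -> float:
    """Largest per-database size (GB) that fits the restore window when
    databases_in_sequence databases must be restored one after another."""
    return rate_gb_per_hour * window_hours / databases_in_sequence


def mailboxes_per_database(database_gb: float, quota_mb: float) -> int:
    """Mailboxes that fit if every mailbox is full to quota (ignores
    database white space and other overhead)."""
    return int(database_gb * 1024 // quota_mb)


for rate in (100, 140):  # SAN restore rates in GB per hour
    db = max_database_gb(rate, 1)
    print(f"{rate} GB/h: {db:.0f} GB database, "
          f"~{mailboxes_per_database(db, 200)} mailboxes at a 200MB quota")
# 100 GB/h: 100 GB database, ~512 mailboxes at a 200MB quota
# 140 GB/h: 140 GB database, ~716 mailboxes at a 200MB quota
```

The trade-off in the text falls out directly: the same database size can be spent on more mailboxes or on larger quotas.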
Windows Server 2003 provides support for the Volume Shadow Copy Service (VSS), which, in conjunction with Exchange 2003, can take a consistent snapshot of an Exchange database in a matter of seconds. Note that the snapshot is merely a point-in-time view of the disk map for the original database file, so if the physical volumes on which the database resides become unavailable, the snapshot is effectively useless (although many vendors attempt to insulate systems from this problem). Therefore, even though databases can be "snapped" in seconds, the snapshot must still be streamed off to some storage medium, typically tape. Conversely, restoring from a snapshot can also take just seconds. VSS-aware storage subsystems and backup and restore solutions can dramatically influence your data-management framework, but be sure to research and test them carefully before putting them into production.
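To see why a snapshot depends on the original volume, consider a toy copy-on-write model. This is a conceptual sketch only, not the VSS API: the Volume and Snapshot classes are hypothetical, and real VSS providers may instead use split-mirror or other techniques that behave differently. Taking the snapshot copies nothing; an old block is preserved only when it is about to be overwritten, so every unchanged block must still be read from the original volume.

```python
class Volume:
    """Toy block device: a mapping from block number to block contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.online = True
        self.snapshots = []

    def snapshot(self):
        # Creating a snapshot copies no data, which is why it takes seconds.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, n, data):
        # Copy-on-write: preserve the old block for every snapshot that has
        # not already saved its own copy, then overwrite the block.
        for snap in self.snapshots:
            if n not in snap.saved:
                snap.saved[n] = self.blocks.get(n)
        self.blocks[n] = data


class Snapshot:
    """Point-in-time view that stores only blocks changed since it was taken."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}

    def read(self, n):
        if n in self.saved:
            return self.saved[n]
        if not self.origin.online:
            raise OSError(f"block {n} exists only on the original volume, which is unavailable")
        return self.origin.blocks.get(n)


vol = Volume({0: b"header", 1: b"page-A", 2: b"page-B"})
snap = vol.snapshot()
vol.write(1, b"page-A2")
print(snap.read(1))  # b'page-A' (preserved by copy-on-write)
print(snap.read(2))  # b'page-B' (read from the original volume)

vol.online = False  # simulate losing the physical volume
try:
    snap.read(2)
except OSError as err:
    print("snapshot unusable:", err)
```

This is why a snapshot is fast to create but is not, on its own, a backup.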
Exchange 2003 (especially with Service Pack 1, SP1) introduces new functionality in the form of the Recovery Storage Group (RSG). The concept is straightforward: if a database in a particular SG becomes unavailable to users and must be restored from backup media, you can mount an empty database in its place so that users homed in the affected database can carry on working while the original database is restored into the RSG. Although none of the users' existing messages will be available during this restore period, the ability to send and receive email is maintained. When the restore is complete, the restored data and the new content that accumulated in the temporary database can be merged. When properly worked into disaster recovery and restore plans, the RSG concept can positively influence SLAs and maximum database sizes. And SP1's Recover Mailbox Data Wizard simplifies merging the restored data with newly created data.
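The merge step at the end of such a recovery can be pictured as combining, per mailbox, the items restored from backup with the items that arrived while the temporary database was in service. The sketch below is a simplified illustration, not how Exchange's tools work internally: mailboxes are plain Python dictionaries, the Message type is hypothetical, and treating items with the same message ID as duplicates is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str  # assumed unique per item, e.g., an Internet Message-ID
    subject: str


def merge_mailboxes(restored, temporary):
    """Combine per-user item lists from the restored database and the
    temporary database, keeping restored items first and skipping duplicates."""
    merged = {}
    for user in restored.keys() | temporary.keys():
        seen = set()
        combined = []
        for msg in restored.get(user, []) + temporary.get(user, []):
            if msg.message_id not in seen:
                seen.add(msg.message_id)
                combined.append(msg)
        merged[user] = combined
    return merged


restored = {"alice": [Message("<1@example.com>", "Q3 report")]}
temporary = {
    "alice": [Message("<2@example.com>", "Outage notice"),
              Message("<1@example.com>", "Q3 report")],  # duplicate of a restored item
    "bob": [Message("<3@example.com>", "Welcome back")],
}
merged = merge_mailboxes(restored, temporary)
print({u: [m.subject for m in msgs] for u, msgs in sorted(merged.items())})
# {'alice': ['Q3 report', 'Outage notice'], 'bob': ['Welcome back']}
```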
Backing up user-maintained data, such as PSTs, presents greater challenges, as I mentioned earlier. Backups of PSTs on local hard disks are almost impossible to enforce or control because they rely almost solely on the user. PSTs on network shares can be backed up centrally but still seem to offer little advantage over large mailboxes in the Exchange database.
All About Archiving
Strictly speaking, archiving solutions differ from regulatory-compliance solutions in the following ways:
- Archiving is often user-initiated, in that a user arbitrarily decides to archive an object from his or her Exchange mailbox to an archive store.
- Arbitrary archiving is often complemented by policy-based archiving of expired content to archive stores.
- Archiving solutions typically don't guarantee that all messages that are created or sent within a system or that pass through an ingress or egress point will be written to an archive store.
You might be aware that Outlook provides a rudimentary form of archiving whereby the user can configure Outlook to move messages older than a defined age to a PST file. However, this approach just moves the data around rather than delivering it to dedicated, protected archive stores, so Outlook archiving isn't a serious contender.
More sophisticated solutions, such as VERITAS Software's KVS Enterprise Vault, can provide user-initiated and policy-based archiving to a second-tier (or higher) data location. Solutions such as these are effective because they can retain a message stub in the user's Exchange mailbox while moving sizeable attachments or message content to the archive store. If a user wants to review archived content, it's often accessible merely by clicking the message stub, at which point the archived content is retrieved. Thus, Exchange storage consumption is optimized while large content is offloaded to a system more suitable for bulk storage.
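The stub mechanism can be sketched as follows. This is a hypothetical model, not the Enterprise Vault API: the Item class, the 90-day age threshold, and the 1,000,000-byte size threshold are assumptions chosen for illustration. A policy pass moves the content of old, large items to an archive store and leaves a stub that keeps the subject and date plus a pointer; opening the stub fetches the content back on demand.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass
class Item:
    subject: str
    received: date
    body: str
    attachments: Dict[str, bytes] = field(default_factory=dict)
    archive_ref: Optional[str] = None  # set once the item has been stubbed


def apply_archive_policy(mailbox, archive, today, min_age_days=90, min_bytes=1_000_000):
    """Policy-based archiving: move the body and attachments of old, large
    items into the archive store and leave a stub in the mailbox."""
    cutoff = today - timedelta(days=min_age_days)
    for item in mailbox:
        size = len(item.body) + sum(len(a) for a in item.attachments.values())
        if item.archive_ref is None and item.received < cutoff and size >= min_bytes:
            ref = str(uuid.uuid4())
            archive[ref] = (item.body, item.attachments)
            # The stub keeps subject and date, so the item still shows in the folder.
            item.body = "[Archived item: open to retrieve]"
            item.attachments = {}
            item.archive_ref = ref


def open_item(item, archive):
    """Simulate clicking a stub: archived content is fetched on demand."""
    if item.archive_ref is None:
        return item.body, item.attachments
    return archive[item.archive_ref]


mailbox = [
    Item("Design spec", date(2005, 1, 10), "See attached.", {"spec.pdf": b"x" * 2_000_000}),
    Item("Lunch?", date(2005, 6, 28), "Noon today?"),
]
archive = {}
apply_archive_policy(mailbox, archive, today=date(2005, 6, 30))
print(mailbox[0].archive_ref is not None, len(mailbox[0].attachments))  # True 0
print(list(open_item(mailbox[0], archive)[1]))  # ['spec.pdf']
```

Only the large, old item is stubbed; the recent, small message stays in the mailbox untouched.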
This type of archiving solution is often integrated with Exchange's Journaling feature to intercept and capture all messages circulating within an Exchange environment. But when large volumes of traffic are expected or when regulatory-compliance issues dominate, even archiving systems that integrate with Exchange Journaling might not provide the non-rewritable, non-erasable storage environment that most regulations stipulate, so they must integrate with, or be replaced by, more advanced technologies.
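To illustrate what "non-rewritable, non-erasable" means in practice, the sketch below pairs a journaling hook with a content-addressed, write-once store. It is a conceptual model under stated assumptions: the class and function names are hypothetical, and an in-memory dictionary stands in for storage. Real compliance products enforce immutability and retention periods in the storage system itself, and a SHA-256 digest detects alteration but is not a digital signature.

```python
import hashlib
import time


class WriteOnceStore:
    """Content-addressed, append-only store: an object's address is the
    SHA-256 digest of its bytes, and nothing can be overwritten or deleted
    through this interface."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Storing identical content twice is harmless: same address, first copy kept.
        self._objects.setdefault(address, (data, time.time()))
        return address

    def get(self, address: str) -> bytes:
        data, _stored_at = self._objects[address]
        # Re-hash on read so that silent corruption or tampering is detected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def delete(self, address: str) -> None:
        raise PermissionError("write-once store: deletion is not permitted")


def journal(message: bytes, store: WriteOnceStore, index: list) -> str:
    """Journaling hook: every message passing through is captured, not just
    the items a user chooses to archive."""
    address = store.put(message)
    index.append(address)
    return address


store = WriteOnceStore()
index = []
for raw in (b"Subject: Q3\r\n\r\nNumbers attached.", b"Subject: Re: Q3\r\n\r\nThanks."):
    journal(raw, store, index)
print(len(index))  # 2
try:
    store.delete(index[0])
except PermissionError as err:
    print(err)  # write-once store: deletion is not permitted
```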
Examples of this form of technology include EMC Centera and HP's Reference Information Storage System (RISS). These solutions let you store static content on disk in a non-modifiable format; they usually implement RAID-like redundancy to protect data integrity and use digital signatures and time stamping to authenticate content. Typically, they also implement sophisticated Hierarchical Storage Management (HSM) systems, in addition to providing content indexing and retrieval. When you're dealing with regulatory compliance, HSM functionality is important because of the huge volume of email that can quickly mount up, especially in larger organizations. Assume that the average user sends 20 messages per day at an average size of 25KB per message. In an organization of 10,000 users, that works out to 200,000 messages per day, or roughly 4.8GB of content per day and 1.7TB per year. If you also need to archive inbound messages, storage requirements can grow significantly. Of course, these are average figures, but I'm aware of one organization with 9,400 users that receives between 120GB and 150GB of email per month.
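The storage estimate above is easy to reproduce and adapt to your own figures. The sketch below assumes 1024-based units and a 365-day year, counts outbound messages only, and uses a hypothetical function name.

```python
def archive_growth(users: int, msgs_per_user_per_day: float,
                   avg_msg_kb: float, days: int = 365):
    """Return (messages per day, GB per day, TB over `days`) for outbound
    mail only, using 1024-based units."""
    msgs_per_day = users * msgs_per_user_per_day
    gb_per_day = msgs_per_day * avg_msg_kb / 1024 / 1024
    tb_per_period = gb_per_day * days / 1024
    return msgs_per_day, gb_per_day, tb_per_period


msgs, gb_day, tb_year = archive_growth(10_000, 20, 25)
print(f"{msgs:,} messages/day, {gb_day:.2f} GB/day, {tb_year:.2f} TB/year")
# 200,000 messages/day, 4.77 GB/day, 1.70 TB/year
```

To account for inbound mail, run the function again with inbound volumes and add the results.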
Many organizations choose to implement an archiving solution as a first step when migrating from one Exchange version or organization to another. This technique reduces the amount of data that must be migrated and can speed up the migration process.
Get Your Act Together
You can no longer ignore the importance of managing Exchange data, especially as email traffic and message size continue to grow and as regulatory-compliance requirements become more commonplace. Users will continue to demand that you retain more data (yet leave it at their disposal) and that you maintain fast recovery times with as little downtime as possible. As an administrator, you must try to meet these demands while operating under your organization's financial, technical, and regulatory constraints. Fortunately, you have many options: mailbox quotas, storage technologies, and archiving solutions. For more information and ideas about your options and how to evaluate them, see the "Learning Path" on page 62, as well as the Web-exclusive sidebars "Putting Exchange Data Management in Context" (http://www.windowsitpro.com, InstantDoc ID 45625) and "Data Management Challenge: How Did We Get Here?" (InstantDoc ID 45624).