Does your firm have a disaster recovery plan? I would venture to say yes. It may be a little dusty, but inevitably a news headline will soon appear spreading panic about a hurricane brewing or a pandemic lurking and all eyes will once again be focused on your plan. Surprisingly, statistics show that very few "disasters" result from natural disasters or large-scale catastrophic events. It is much more likely that your firm is going to need to recover from a hardware failure, data corruption, or air conditioning or power failure than from the swine flu or a Category 5 hurricane. In reality, it doesn't really matter; when any event results in the disruption of information technology services or an outage in your data center, your disaster recovery plan had better be ready and viable.
What Are the Expectations?
The plan is current
If there is one thing that disaster recovery plans are not, it is static. Keeping these plans current requires a lot of diligence on the part of IT. Every time a change is made in the firm's IT operating environment, that change needs to be evaluated against the existing plan to identify any updates the plan requires. Application upgrades, new hardware, changes in infrastructure, new backup technologies and procedures, and new storage components all need to be accounted for and addressed.
Time to recovery and your recovery point
One basic underlying concept in any disaster recovery plan is the acceptable recovery time: the length of time that the organization has agreed it can tolerate being "down." Just after Sept. 11, my organization engaged a third-party consulting firm to devise a recovery plan. In that era of disaster recovery, some systems were deemed mission critical and were allotted a recovery time within the first 24 hours of an incident, but many services weren't expected to resume for days or, in some cases, up to a week after an event. According to a recent survey from Symantec [1], these expectations have changed significantly. Up to 60 percent of an organization's systems are now deemed mission critical, and most organizations expect their mission-critical applications to be restored within four hours of an event. The acceptable recovery window continues to tighten; in the same survey in 2008, only 3 percent of organizations indicated that they could recover skeleton operations within a 12-hour window, and 31 percent of respondents indicated that they could recover baseline operations within a 24-hour window.
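One way to keep recovery time targets honest is to compare each system's agreed target with the time actually measured in its last recovery test. The sketch below assumes a simple inventory; the system names, targets, and test timings are invented for illustration and do not come from the survey.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str                     # hypothetical system name
    rto_hours: float              # agreed acceptable recovery time
    tested_recovery_hours: float  # time measured in the last recovery test


def systems_missing_rto(systems):
    """Return the names of systems whose tested recovery time exceeds the target."""
    return [s.name for s in systems if s.tested_recovery_hours > s.rto_hours]


# Illustrative inventory; every figure here is an assumption.
inventory = [
    System("email", rto_hours=4, tested_recovery_hours=3.5),
    System("document management", rto_hours=4, tested_recovery_hours=9),
    System("intranet", rto_hours=72, tested_recovery_hours=24),
]

print(systems_missing_rto(inventory))  # ['document management']
```

A system that shows up in this list needs either a faster recovery method or a renegotiated target.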
Another area of focus in your plan is your recovery point, which defines how much data you are comfortable losing. For example, if you have real-time replication, you have an effectively immediate recovery point, but if you take a snapshot of your data or server every four hours, you could lose up to four hours' worth of work if something were to happen. Your recovery point varies from system to system and is highly dependent on your backup strategies and the success of those strategies. A lot of organizations still use nightly tape backups as their sole backup strategy. Under this scenario, you could lose up to 24 hours' worth of work product and, in reality, you could lose more. According to industry statistics, 20 percent of nightly tape backups fail to capture all data. In addition, 40 percent of IT managers have indicated an inability to recover data from a tape when they needed it [2]. As a result, many organizations have moved away from tape. However you back up your data, remember when considering your disaster recovery plan that your recovery point is very important, and you're only as good as your last successful backup.
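The recovery point arithmetic can be made concrete with a small sketch. It assumes backups run at a fixed interval and that each failed backup pushes the last good recovery point back by one full interval; the function name and parameters are illustrative only.

```python
def worst_case_loss_hours(backup_interval_hours, consecutive_failed_backups=0):
    """Worst-case hours of work lost if an outage hits just before the next backup.

    Each failed backup pushes the last good recovery point back by one interval.
    """
    if backup_interval_hours < 0 or consecutive_failed_backups < 0:
        raise ValueError("inputs must be non-negative")
    return backup_interval_hours * (1 + consecutive_failed_backups)


print(worst_case_loss_hours(4))      # four-hour snapshots, all succeeding: 4
print(worst_case_loss_hours(24))     # nightly tape, last backup succeeded: 24
print(worst_case_loss_hours(24, 1))  # nightly tape, last night's backup failed: 48
```

The last case is why a nightly backup can cost more than 24 hours of work: one failed run doubles the exposure.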
What Are the Major Challenges?
Cost and investment
Implementing a solid disaster recovery strategy is not cheap, but most organizations seem to accept this fact. According to the same Symantec survey, IT budgets for disaster recovery were up in 2009 compared with 2008 and, even with an economic downturn in which many areas are being cut, expenditures for disaster recovery are expected to remain at this level.
Traditionally, disaster recovery has meant the identification of key systems and the investment in requisite hardware to rebuild those systems at some off-site location. Typically, the hardware needed to be a very close, if not an identical, match to facilitate a successful restoration. Unfortunately, for this large outlay of capital, the investment results in little more than hardware that sits idle "just in case."
Put to the test
Another area where disaster recovery plans commonly fall short is testing. It is not uncommon for plans to go untested, and when they are put to the test, the results are not that surprising: one in four tests results in failure. However, given the wide array of variables that must be accounted for, it is surprising that our disaster recovery plans are as good as they are.
Enter Virtualization
Virtualization technologies have been in existence for several years, but only in the past two years have they gained a strong foothold within the legal environment. Initially, virtualization was brought in as an answer to firms' green initiatives and to address environmental constraints such as power and air conditioning, but many firms have found that there are several secondary benefits to virtualizing their environment.
What is virtualization?
Traditionally, one application has meant one server. Over time, this has led to hundreds of servers, all of which take up space and consume electrical power and cooling, yet many of them are underutilized. With virtualization, several logical servers can be consolidated onto a single physical machine where they are able to share the available resources (e.g., RAM, CPUs). In this scenario, an organization can significantly reduce the number of servers that it supports, which also reduces the physical footprint of the IT infrastructure, the required cooling, and the draw on power.
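As a rough illustration of consolidation, the sketch below packs underutilized servers onto as few physical hosts as it can, looking at RAM alone. It uses a simple first-fit decreasing heuristic, which is an assumption made for the example and not how any particular virtualization product places workloads; the RAM figures are invented.

```python
def consolidate(vm_ram_gb, host_ram_gb):
    """Place virtual servers on hosts by RAM using first-fit decreasing.

    Returns a list of hosts, each a list of the RAM demands placed on it.
    """
    hosts = []
    for demand in sorted(vm_ram_gb, reverse=True):
        if demand > host_ram_gb:
            raise ValueError(f"a {demand} GB server cannot fit on a {host_ram_gb} GB host")
        for host in hosts:
            # Use the first existing host that still has room.
            if sum(host) + demand <= host_ram_gb:
                host.append(demand)
                break
        else:
            # No existing host has room, so bring up another one.
            hosts.append([demand])
    return hosts


# Ten lightly loaded servers (RAM actually used, in GB); figures are invented.
demands = [4, 2, 8, 2, 6, 4, 2, 8, 4, 2]
placement = consolidate(demands, host_ram_gb=32)
print(len(demands), "servers ->", len(placement), "hosts")
```

With these figures, ten physical servers collapse onto two hosts. A real sizing exercise would also account for CPU, storage, peak load, and spare capacity for failover.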
How does virtualization affect disaster recovery?
Virtualization affects disaster recovery in the following ways:
- Immediately, the disaster recovery strategy becomes slightly less complex because there are fewer physical servers to recover and, ultimately, this means a reduction in expenses and less idle equipment.
- Virtual servers are software based and hardware independent. Basically, a virtual machine comprises several files. These files contain the entire server—data, applications, operating system, hot fixes, updates, etc. These files can be restored to any compatible host, regardless of the make and model of the physical box, and your server, application, and data can be up and running immediately. Similarly, system restorations become much more uniform and you are less dependent on system-specific expertise and documentation.
- With the right tools and configuration in place, your virtual servers can automatically fail over to alternate physical machines when a failure or particularly high resource utilization is detected.
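The failover idea in the last point can be sketched in a few lines. This is a toy model, not any vendor's API: the host names, virtual server names, and RAM figures are hypothetical, and the placement rule (largest virtual server first, onto the surviving host with the most free RAM) is an assumption made for the example.

```python
def fail_over(placement, capacity_gb, failed_host):
    """Move the virtual servers from a failed host onto surviving hosts.

    placement: dict mapping host name -> {virtual server name: RAM in GB}
    capacity_gb: dict mapping host name -> usable RAM in GB
    Returns (new_placement, stranded); stranded lists servers with no home.
    """
    new_placement = {h: dict(vms) for h, vms in placement.items() if h != failed_host}
    stranded = []

    def free(host):
        return capacity_gb[host] - sum(new_placement[host].values())

    # Restart the largest virtual servers first so they get first pick of free RAM.
    for vm, ram in sorted(placement[failed_host].items(), key=lambda kv: -kv[1]):
        target = max(new_placement, key=free, default=None)
        if target is not None and free(target) >= ram:
            new_placement[target][vm] = ram
        else:
            stranded.append(vm)
    return new_placement, stranded


# Hypothetical three-host environment.
placement = {
    "host-a": {"email": 8, "dms": 16},
    "host-b": {"intranet": 4},
    "host-c": {"billing": 8},
}
capacity = {"host-a": 32, "host-b": 24, "host-c": 16}

after, stranded = fail_over(placement, capacity, "host-a")
print(after)
print("stranded:", stranded)
```

If the surviving hosts lack spare capacity, some servers end up stranded, which is one reason consolidated environments need headroom.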
Virtualization is not a panacea. There are pitfalls. In a virtual machine environment, a hardware failure has the potential to affect multiple systems instead of a single resource. In addition, it is important to know your applications and how they are used and the resources that they require. With these caveats stated, it is safe to say that virtualization is changing the disaster recovery model and is providing IT departments with the necessary tools and capabilities to meet the ever-increasing demands surrounding firm disaster recovery strategies. However, it is still important to emphasize the necessity of a well-documented plan and the need for regular testing, and it is still imperative to set expectations and define acceptable recovery parameters.
Sources
1. Symantec, Disaster Recovery Global Data: Survey Results June 2009.
2. Regan, Keith, Concerns Raised on Tape Backup Methods, SearchSecurity.com, April 15, 2004.