Cloud Computing Is Reinventing Disaster Recovery Planning
Industry expert Walter Angerer weighs in on the cloud’s changing role in disaster recovery and business continuity.
Mediaplanet: Can you speak to what cloud technology has done for data recovery?
Walter Angerer: Years ago, the data recovery market underwent a major change. Tape-based backup was replaced with disk-to-disk backup, which minimized backup windows and, more importantly, drastically cut restore times. The introduction of disk into the system also improved concurrency.
The introduction of cloud has had an even more significant impact than disk did. Using cloud technology, we can not only provide fast or near-instant restore on premises; for the first time ever, businesses can also perform an instant recovery to an off-site location. Recovery time has gone down from weeks to minutes. That change in technology has completely changed how we think about off-site data recovery.
MP: What are some interesting initiatives currently going on in the data recovery industry?
WA: Providing fast access to off-site data is just a starting point. Today’s vendors must take data protection to the next level. The most significant trend in the industry is to find ways to utilize the recovery infrastructure in the absence of a restore or DR scenario.
Today’s best-of-breed solutions enable companies to turn their data recovery system into a live clone of their production environment and use it as a development and test platform without impacting production. Moving this workload completely into the cloud eliminates the need for cumbersome and expensive maintenance of development and test environments.
The concept of "backup once, use many" will reshape the industry and architecture of data centers in the future.
MP: What’s the greatest benefit that data backup/recovery can provide a business?
WA: Businesses are much more dependent on their IT infrastructure than ever before. The cost of downtime caused by minor system failures is greatly underestimated, and it significantly impacts the bottom line.
We usually don't think about it until systems go down, and we are somewhat accustomed to accepting this downtime, but with what consequences? Gartner predicted that two out of five enterprises that experience a disaster would go out of business within five years of the event.
Minimizing downtime by providing multiple layers of instant recovery capability, including the ability to recover to an independent off-site location (the cloud), will have a major impact on any corporation in less than a year.
MP: Can you name a few obscure causes of downtime that readers should be aware of?
WA: A customer I recently spoke with suffered a devastating SAN failure. Three disk drives in the SAN failed at about the same time, causing a total loss of the SAN. In the absence of an independent recovery infrastructure, the outage lasted for several days.
Another example is a cyberattack. A crypto virus managed to penetrate a corporation’s network, encrypting much of its production data. A regular restore would have taken many days because of the sheer magnitude of lost data and services. This customer had a recovery solution in place and was able to recover within minutes.
MP: What are three important steps readers should take in order to avoid an IT disaster?
WA: Every IT infrastructure should have at least three layers of protection:
- A production layer with built-in redundancy, such as RAID technology, snapshots and the ability to move a workload quickly to different computing infrastructure.
- An independent layer of protection on-site. If a central piece of the production system fails, such as the SAN, it is important to have an independent system on premises that can run the entire production workload while the failure is being fixed.
- An off-site location that can run the production workload within a short period of time. A simple power outage caused by road construction in the neighborhood can take down an entire site for days. The ability to recover the system in a different location is important.
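As a rough illustration of the three layers described above, the short Python sketch below models them as an ordered list and picks the first one that is still able to run the workload. The names `Layer` and `pick_recovery_target` are hypothetical and not taken from any product; a real system would base the health flag on actual monitoring rather than a hard-coded value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    """One layer of protection (hypothetical model for illustration)."""
    name: str
    healthy: bool  # assumed to come from monitoring in a real setup


def pick_recovery_target(layers: List[Layer]) -> Optional[Layer]:
    """Return the first healthy layer, in order of preference."""
    for layer in layers:
        if layer.healthy:
            return layer
    return None  # no layer can currently run the workload


# Layers listed in order of preference: production, on-site, off-site.
layers = [
    Layer("production", healthy=False),  # e.g. a SAN failure
    Layer("independent on-site", healthy=True),
    Layer("off-site (cloud)", healthy=True),
]

target = pick_recovery_target(layers)
print(target.name if target else "no layer available")
```

In this sketch a production failure sends the workload to the independent on-site layer, and the off-site layer is used only if that is unavailable as well.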
Further, every CIO needs a DR procedure that is simple to follow. DR procedures are often lengthy and complicated, and relying on a 200-page procedure to recover from a disaster is a bad idea. Once a system failure occurs, the pressure is on, and it is unrealistic to expect even the best-trained IT professional to walk through such complicated procedures flawlessly and in a reasonable time. Systems should be able to recover with a single click.
Corporations are ignoring the smaller, more frequently occurring outages. In many cases, it is easier to repair the failure than to fail over to a standby solution, but this approach greatly prolongs the outage and costs a lot of money.
Having a system that allows you to restore services quickly and easily without having to repair the failed system will greatly reduce the number of outages. Every time a system is down for an hour or two, companies lose money. Those outages can be avoided with an instant recovery solution that runs on independent hardware.