Posts Tagged ‘RTO’
What’s worse than losing your data?
Losing your data and having no backup.
What’s worse than having no backup?
Having a backup that restores inconsistent data.
That’s precisely the concern that Josh Kirsher raised on the April 10 Wikibon Peer Incite. A lot of people are buying insurance in the form of snapshots of application data, and they rely on consistency groups, thinking this will ensure that the data is application-consistent. It’s the application-consistent snapshot that companies use as the source volume for off-site backups and asynchronous replication, and as an on-premises application recovery point. And it’s consistency groups that enable applications to be restored in minutes rather than hours or days. Unfortunately, consistency groups only work when procedures are perfectly designed, perfectly followed, and constantly maintained, and when no one makes an error.
In today’s dynamic environment, where the servers on which applications run are virtualized, where applications are frequently moved from one physical server to another, where LUNs are quickly created, and where volumes are added to and removed from LUNs daily, the probability of developing a perfect consistency-group process that is precisely followed and continuously maintained, without any human error, is very low. That means that when you need to call upon your insurance, the snapshot or backup you assume is application-consistent, there is a very good chance the data will in fact be inconsistent, and the time to restore consistent application data from paper source documents will be measured in days, not minutes or hours. Companies that primarily transact business electronically may not be able to reconstruct the data at all. This is the scenario that Tim Hays of Animal Health International avoided when he made the decision to protect everything. After all, if he could affordably protect everything, he didn’t have to worry about what he might miss.
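To make that failure mode concrete, here is a minimal, hypothetical sketch (in Python, with invented volume names and no particular array’s API) of the cross-check an administrator would have to keep running: compare the volumes an application actually writes to against the volumes in its consistency group, and flag any drift between the two.

```python
# Hypothetical illustration: detect consistency-group "drift" -- volumes an
# application now uses that its consistency group doesn't cover, and stale
# volumes still in the group. Inventories and names are invented; in practice
# they would come from the storage array and the application/VM layer.

def find_drift(app_volumes: set[str], group_volumes: set[str]) -> dict[str, set[str]]:
    """Compare the volumes an application uses with its consistency group."""
    return {
        "unprotected": app_volumes - group_volumes,  # snapshots will not be application-consistent
        "stale": group_volumes - app_volumes,        # the group no longer matches reality
    }

if __name__ == "__main__":
    # Example: a LUN was added for the database last week,
    # but no one added it to the consistency group.
    app_volumes = {"erp_data_01", "erp_data_02", "erp_logs_01", "erp_data_03"}
    group_volumes = {"erp_data_01", "erp_data_02", "erp_logs_01"}

    drift = find_drift(app_volumes, group_volumes)
    if drift["unprotected"]:
        print("WARNING: volumes outside the consistency group:", sorted(drift["unprotected"]))
    if drift["stale"]:
        print("NOTE: volumes in the group no longer used by the app:", sorted(drift["stale"]))
```

Even a check this simple depends on accurate, up-to-date inventories from both the application side and the storage side, and that is exactly the bookkeeping that breaks down when servers, LUNs, and volumes change daily.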
About thirty years ago, IBM announced the IBM 3380 Direct Access Storage Device. It had a capacity of 2.52 GB and a price that began at $81,000 without the controller. At the time, successful storage solution providers like IBM made their storage systems out of high-quality, high-cost components and charged a premium. The design goal was to prevent failures, because there weren’t a lot of ways to survive failures.
Given the volume of data now created, today’s storage systems are by necessity very different. They are designed with the expectation that components will fail, and fail frequently, but that the data will survive. To achieve acceptable levels of data availability and data protection, storage system suppliers overcome those component failures through software, redundant components, and redundant copies of the data.
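A back-of-the-envelope calculation shows why redundant copies make that trade-off work. The numbers below are assumptions for illustration only, not any vendor’s actual failure model: if each copy can fail independently with some probability during a given window, data is lost only when every copy fails.

```python
# Illustrative arithmetic only (assumed, independent failure probabilities):
# with n redundant copies, data is lost only if all n copies fail.

def prob_data_loss(p_component_failure: float, copies: int) -> float:
    """Probability that every redundant copy fails, assuming independent failures."""
    return p_component_failure ** copies

if __name__ == "__main__":
    p = 0.05  # assumed 5% chance a given copy fails during the window
    for n in (1, 2, 3):
        print(f"copies={n}: P(data loss) = {prob_data_loss(p, n):.6f}")
    # copies=1: P(data loss) = 0.050000
    # copies=2: P(data loss) = 0.002500
    # copies=3: P(data loss) = 0.000125
```

With individually unreliable (and inexpensive) components, a second or third copy drives the probability of losing data down by orders of magnitude, which is exactly the bet modern storage systems make.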
I spent some time this weekend looking at what others are saying about disaster recovery and found this article by Rajen Sheth: “Disaster Recovery by Google.” Rajen is a Senior Product Manager at Google. His article makes some very good points, such as stressing the importance of synchronous mirroring and having a disaster recovery facility located outside of the disaster zone. Of course, he also talks about the cost and complexity of managing multiple data centers.
Since Rajen is a product manager at Google, it’s not surprising that he then recommends that companies migrate to Google’s collaboration applications, such as Gmail, Google Calendar, and Google Docs. One of his key points is that Google’s offering will deliver better disaster recovery, since it has an RPO design target of zero and an RTO design target of “instantaneous,” i.e., RTO=0. Google Apps are, in my opinion, a good example of where cloud-based applications are heading. Unfortunately, despite all the work that Google is doing, only a small portion of any company’s applications would be covered by the current suite of Google Apps.
I’ve continued to look at the data in the Symantec 2010 Disaster Recovery Study. There’s a lot of very useful information in the study. Here’s some of what I found interesting:
• Only 20% of virtual environments are protected by replication or failover technologies
• 60% of virtualized environments are not covered in DR plans
• Actual downtime from outages is more than twice what companies expect
• 40% of DR tests fail to meet the RTO/RPO targets that have been set for the applications
That last one is very interesting. It’s hard to imagine anyone putting up with a 40% failure rate for long. I suspect some things will have to change, and soon. But in these tight budget times, that doesn’t mean companies are going to spend more. In fact, 43% of companies said their disaster recovery budget would decline in the next 12 months.
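For readers who want to see what “meeting the RTO/RPO” means mechanically, here is a small hypothetical sketch of the kind of check behind that 40% statistic: compare the recovery time and data-loss window measured in a DR test against the targets set for each application. The applications and figures below are invented for illustration.

```python
# Hypothetical DR-test scorecard: did each application meet its RTO (maximum
# allowed recovery time) and RPO (maximum allowed data-loss window)?
# All targets and measurements below are invented for illustration.

from dataclasses import dataclass

@dataclass
class DrTestResult:
    app: str
    rto_target_min: float    # maximum allowed recovery time, in minutes
    rpo_target_min: float    # maximum allowed data loss, in minutes
    measured_rto_min: float  # recovery time observed in the test
    measured_rpo_min: float  # data loss window observed in the test

    def passed(self) -> bool:
        return (self.measured_rto_min <= self.rto_target_min
                and self.measured_rpo_min <= self.rpo_target_min)

if __name__ == "__main__":
    results = [
        DrTestResult("ERP", rto_target_min=60, rpo_target_min=15,
                     measured_rto_min=240, measured_rpo_min=30),
        DrTestResult("Email", rto_target_min=120, rpo_target_min=60,
                     measured_rto_min=90, measured_rpo_min=10),
    ]
    failures = [r.app for r in results if not r.passed()]
    print(f"{len(failures)} of {len(results)} applications missed their targets: {failures}")
```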
At Axxana, our sole reason to exist is to provide disaster recovery capabilities to organizations, so you might think that declining budgets for DR are bad news for us, but they’re not. In the world of disaster recovery, when budgets get tight and service levels aren’t being met, something needs to change. And that’s when organizations look for new, more innovative ways to provide data protection and disaster recovery. That’s what we offer: a new class of data protection, Enterprise Data Recording (EDR), that actually enables companies to meet RTO/RPO service levels while lowering the cost of data protection.