Posts Tagged ‘RPO’
When I ask the question, “What’s your RPO?” I typically get an answer like “We’ve got an RPO of 5 minutes.” If I ask “How much data are you willing to lose?” I’ll hear a similar answer. But if I ask someone, “How much data is in your storage system?” they don’t answer me in minutes.
The data created by the mission-critical applications that support a business doesn't get updated in an even, regular way. Update rates are almost always highly variable, full of transaction peaks and valleys. The peaks can happen at predictable times, like a holiday shopping season, and at unexpected times, like when there is panic buying before a hurricane.
That’s the not-so-funny disconnect between data and disaster recovery. We don’t measure data in minutes. We measure it in GBs. Wouldn’t it be better to set our snapshots and our recovery points based upon how much data has changed, instead of how much time has passed?
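To make that concrete, here is a minimal sketch in Python of what a change-based recovery point might look like. The take_snapshot callback, the record_write hook, and the threshold value are all hypothetical, invented for illustration; the point is simply that the trigger is the volume of changed data, not the clock.

```python
# Hypothetical sketch: trigger recovery points on change volume, not wall-clock time.

class ChangeBasedSnapshotter:
    """Takes a snapshot after a fixed amount of data has changed,
    instead of after a fixed number of minutes."""

    def __init__(self, take_snapshot, threshold_bytes=10 * 1024**3):
        self.take_snapshot = take_snapshot      # callback that creates a recovery point
        self.threshold_bytes = threshold_bytes  # e.g. 10 GB of changed data per snapshot
        self.changed_since_snapshot = 0

    def record_write(self, num_bytes):
        """Called for every write the application makes."""
        self.changed_since_snapshot += num_bytes
        if self.changed_since_snapshot >= self.threshold_bytes:
            self.take_snapshot()
            self.changed_since_snapshot = 0


if __name__ == "__main__":
    snapshots = []
    s = ChangeBasedSnapshotter(lambda: snapshots.append("snap"), threshold_bytes=1024)
    for write_size in (300, 500, 400, 900, 600):   # bursty, uneven update pattern
        s.record_write(write_size)
    print(f"{len(snapshots)} snapshot(s) taken")   # 2 snapshots, driven by data change, not time
```

During a transaction peak this scheme takes recovery points more often, and during quiet periods less often, so the amount of data at risk stays roughly constant.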
I have to credit our CTO, Dr. Alex Winokur, for helping me think this through. But I've decided that, unless your RPO is zero, RPO doesn't matter. Instead, we need to make sure we've protected the data to the very last byte.
Amazon has built a fantastic reputation as a provider of cloud services. With multiple data centers, service availability levels at 99.9% and integrated data backup services, Amazon’s EC2 makes perfect sense for new companies that want to build software applications and deliver them as a service. By delivering applications as a service, emerging companies can be a disruptive force competing against established packaged-application vendors. And Amazon EC2 enables these Application-as-a-Service suppliers to avoid the up-front capital costs associated with building multiple, redundant data centers. It doesn’t mean, however, that Amazon EC2 is perfect and without risk.
A look at the Amazon Web Services Service Health Dashboard today showed a number of service interruptions and performance issues in Amazon’s Northern Virginia facility on April 21 – 24. Henry Blodget of Business Insider reported that Amazon had a cloud crash and the “cloud crash destroyed many customers’ data.”
It would take a lot of digging to get to the bottom of why data was lost. The Business Insider article refers to a letter from Amazon to a customer that discusses “an inconsistent data snapshot” and Amazon’s inability to recover the data. Unfortunately, corrupted data that has been carefully copied to another location is still corrupted. That’s why it is important to keep a series of application-consistent snapshots together with transaction journals, so that application data can be restored to its last known good state and updates can be applied to bring the data back to RPO=0. This is precisely what is done with the EMC RecoverPoint/Axxana Phoenix System RP solution. RecoverPoint maintains application-consistent snapshots, and Axxana stores the changed data, protected from fire, smoke, flood, shock, earthquakes, and building collapse.
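The restore sequence described above is simple in outline. The sketch below shows the idea in Python: roll back to the last application-consistent snapshot, then replay the journaled changes captured after it. The snapshot and journal structures here are invented for the example; this is not the actual RecoverPoint or Phoenix implementation.

```python
# Hypothetical illustration of restore-then-replay recovery to RPO=0.
# The snapshot and journal formats here are invented for the example.

def recover(snapshots, journal):
    """Restore the last application-consistent snapshot, then replay
    journaled transactions recorded after it to reach the last byte."""
    last_good = max(snapshots, key=lambda s: s["seq"])     # last consistent snapshot
    data = dict(last_good["state"])                        # restore known-good state
    for txn in sorted(journal, key=lambda t: t["seq"]):
        if txn["seq"] > last_good["seq"]:                  # only changes after the snapshot
            data[txn["key"]] = txn["value"]
    return data


if __name__ == "__main__":
    snapshots = [
        {"seq": 10, "state": {"balance": 100}},
        {"seq": 20, "state": {"balance": 250}},            # last known good state
    ]
    journal = [
        {"seq": 21, "key": "balance", "value": 300},       # changes captured after the snapshot
        {"seq": 22, "key": "balance", "value": 275},
    ]
    print(recover(snapshots, journal))                     # {'balance': 275} -> no lost updates
```

The key point is that an inconsistent snapshot on its own cannot get you here; you need a known-good state plus the journal of everything that changed after it.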
As cloud services are increasingly adopted for mission-critical applications, perhaps it is time to consider a zero-data-loss solution.
About thirty years ago, IBM announced the IBM 3380 Direct Access Storage Device. It had a capacity of 2.52 GB and a price that began at $81,000 without the controller. At the time, successful storage solution providers like IBM made their storage systems out of high-quality, high-cost components and charged a premium. The design goal was to prevent failures, because there weren’t a lot of ways to survive failures.
Given the volume of data now created, today’s storage systems are by necessity very different. They are designed with the expectation that components will fail and fail frequently, but that the data will survive. In order to achieve acceptable levels of data availability and data protection, storage system suppliers overcome the component failures through software, through redundant components, and through redundant copies.
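Here is a toy illustration of that design philosophy, using a hypothetical Replica/ReplicatedStore interface invented for this example: every value is written to several copies, so the data remains readable even after one of the components holding it fails.

```python
# Toy sketch of surviving component failure through redundant copies.
# The replica interface here is invented for illustration.

class Replica:
    def __init__(self):
        self.store = {}
        self.failed = False

    def put(self, key, value):
        if not self.failed:
            self.store[key] = value

    def get(self, key):
        if self.failed:
            raise IOError("replica is down")
        return self.store[key]


class ReplicatedStore:
    """Writes every value to all replicas; reads from the first healthy one."""

    def __init__(self, num_replicas=3):
        self.replicas = [Replica() for _ in range(num_replicas)]

    def put(self, key, value):
        for r in self.replicas:
            r.put(key, value)

    def get(self, key):
        for r in self.replicas:
            try:
                return r.get(key)
            except IOError:
                continue                      # this component failed; try the next copy
        raise IOError("all replicas failed")


if __name__ == "__main__":
    store = ReplicatedStore()
    store.put("record", "payload")
    store.replicas[0].failed = True           # a component fails...
    print(store.get("record"))                # ...but the data survives
```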
I spent some time this weekend looking at what others are saying about disaster recovery and found this article by Rajen Sheth: “Disaster Recovery by Google.” Rajen is a Senior Product Manager at Google. His article makes some very good points, such as stressing the importance of synchronous mirroring and having a disaster recovery facility located outside of the disaster zone. Of course, he also talks about the cost and complexity of managing multiple data centers.
As a product manager for Google, it’s not surprising that he then recommends that companies migrate to Google’s collaboration applications, such as Gmail, Google Calendar, and Google Docs. One of Rajen’s key points is that Google’s offering will deliver better disaster recovery, since it has an RPO design target of zero and an RTO design target of “instantaneous” or RTO=0. Google Apps are, in my opinion, a good example of where cloud-based applications are heading. Unfortunately, despite all the work that Google is doing, only a small portion of any company’s applications would be covered by the current suite of Google Apps.
I’ve continued to look at the data in the Symantec 2010 Disaster Recovery Study. There’s a lot of very useful information in it. Here’s some of what I found interesting:
• Only 20% of virtual environments are protected by replication or failover technologies
• 60% of virtualized environments are not covered in DR plans
• Actual downtime from outages is more than twice what companies expect
• 40% of DR tests fail to meet the RTO/RPO that have been set for the applications
That last one is very interesting. It’s hard to imagine anyone putting up with a 40% failure rate for long. I suspect some things will have to change, and soon. But in these tight budget times, that doesn’t mean companies are going to spend more. In fact, 43% of companies said their disaster recovery budget would decline in the next 12 months.
At Axxana, our sole reason to exist is to provide disaster recovery capabilities to organizations, so you might think that declining DR budgets are bad news. They’re not. In the world of disaster recovery, when budgets get tight and service levels aren’t being met, something needs to change. That’s when organizations look for new, more innovative ways to provide data protection and disaster recovery, and that’s what we offer. We have a new class of data protection, Enterprise Data Recording (EDR), that actually enables companies to meet RTO/RPO service levels while lowering the cost of data protection.
Maybe all data should have RPO=0.
I was thinking today about a conversation I had a few years ago with the storage administrator of a major financial services company. I wanted to understand his perspective on when zero data loss was important and when it wasn’t. He told me that his team spent a lot of time with the application developers and the business unit executives discussing the various recovery point objective (RPO) requirements and the cost of the various approaches. We’ve all been told that RPO should be tied to the business value of the application, and that we shouldn’t over-insure or under-insure our data. Over-insure and you waste money. Under-insure and you increase risk.
But then he told me the challenge wasn’t in determining the RPO requirements when an application was developed. The challenge was determining RPO requirements of applications that are part of a business process that is constantly changing. “I’m fine with the RPO requirements until some developer takes an application that used to be non-critical and puts it into the critical path,” he said.