What’s worse than losing your data?
Losing your data and having no backup.
What’s worse than having no backup?
Having a backup that restores inconsistent data.
That’s precisely the concern that Josh Kirsher raised on the April 10 Wikibon Peer Incite. A lot of companies are buying insurance in the form of snapshots of application data, and they leverage consistency groups, thinking this will ensure that the data is application-consistent. It’s the application-consistent snapshot that companies use as the source volume for off-site backups and asynchronous replication, and as an on-premise application recovery point. And it’s consistency groups that enable applications to be restored in minutes rather than hours or days. Unfortunately, consistency groups only work when procedures are perfectly designed, perfectly followed, constantly maintained, and executed without a single error.
In today’s dynamic environment, where the servers on which applications run are virtualized, where applications move frequently from one physical server to another, and where LUNs are quickly created and volumes are added to and removed from them daily, the probability of developing a perfect consistency-group process that is precisely followed and continuously maintained, with no human error, is very low. That means that when you need to call on your insurance, the snapshot or backup you assumed was application-consistent, the probability is high that the data will in fact be inconsistent, and the time to restore consistent application data from paper source documents will be measured in days, not minutes or hours. Companies that transact business primarily electronically may not be able to reconstruct the data at all. This is the scenario that Tim Hays, of Animal Health International, avoided when he decided to protect everything. After all, if he could affordably protect everything, he didn’t have to worry about what he might miss.
When I ask, “What’s your RPO (recovery point objective)?” I typically get an answer like “We’ve got an RPO of 5 minutes.” If I ask, “How much data are you willing to lose?” I’ll hear a similar answer. But if I ask someone, “How much data is in your storage system?” they don’t answer me in minutes.
The data created by the mission-critical applications that support businesses doesn’t get updated at an even, regular rate. Update rates are almost always highly variable, full of transaction peaks and valleys. The peaks can come at predictable times, like a holiday shopping season, or at unexpected times, like panic buying before a hurricane.
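To put rough numbers on that variability, here’s a back-of-the-envelope sketch in Python. The write rates and the ten-times peak are illustrative assumptions, not measurements from any real system; the point is that the very same five-minute RPO exposes very different amounts of data at different times.

```python
# How much data does a fixed 5-minute RPO actually put at risk?
# The rates below are assumed for illustration only.
rpo_seconds = 5 * 60

rates_mb_per_s = {
    "average day": 10,    # assumed steady-state write rate
    "holiday peak": 100,  # assumed 10x transaction peak
}

for label, rate in rates_mb_per_s.items():
    at_risk_gb = rate * rpo_seconds / 1024
    print(f"{label}: up to {at_risk_gb:.1f} GB exposed by the same 5-minute RPO")
```

On the assumed average day, that five-minute window puts about 3 GB at risk; during the assumed peak, the identical RPO puts nearly 30 GB at risk.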
That’s the not-so-funny disconnect between data and disaster recovery. We don’t measure data in minutes. We measure it in GBs. Wouldn’t it be better to set our snapshots and our recovery points based upon how much data has changed, instead of how much time has passed?
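As a thought experiment, here’s a minimal sketch of what a change-driven recovery point might look like. Everything in it is hypothetical: simulated_change() stands in for a real changed-block-tracking counter that an actual system would read from the storage array or hypervisor, and the 512 MB budget is an arbitrary example.

```python
import random

# Minimal sketch of change-driven recovery points. All names and
# numbers are illustrative, not a real product API.

CHANGE_BUDGET_BYTES = 512 * 1024 * 1024  # one recovery point per 512 MB of change

def simulated_change():
    """Stand-in for polling the storage layer for bytes changed."""
    return random.randint(1, 64) * 1024 * 1024  # 1-64 MB per poll

def take_snapshot(changed):
    print(f"snapshot triggered after {changed / 2**20:.0f} MB of change")

def change_driven_snapshots(polls=100):
    changed = 0
    for _ in range(polls):
        changed += simulated_change()
        if changed >= CHANGE_BUDGET_BYTES:
            # The recovery point fires when the change budget is spent,
            # whether that takes seconds during a peak or hours overnight.
            take_snapshot(changed)
            changed = 0

change_driven_snapshots()
```

Under a scheme like this, the data at risk between recovery points is bounded by the change budget, regardless of whether the application is idling overnight or being slammed during a peak.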
I have to credit our CTO, Dr. Alex Winokur, for helping me think this through. But I’ve decided that unless your RPO is zero, RPO doesn’t matter. Instead, we need to make sure we’ve protected the data down to the very last byte.