Posts Tagged ‘Asynchronous’
What’s worse than losing your data?
Losing your data and having no backup.
What’s worse than having no backup?
Having a backup that restores inconsistent data.
That’s precisely the concern that Josh Kirsher raised on the April 10 Wikibon Peer Incite. A lot of people are buying insurance, in the form of snapshots of application data, and they leverage consistency groups, thinking this will ensure that the data is application-consistent. It’s the application-consistent snapshot that companies use as source volumes for off-site backups and asynchronous replication, and as on-premise application recovery points. And it’s consistency groups that enable applications to be restored in minutes rather than hours or days. Unfortunately, consistency groups only work when procedures are perfectly designed, perfectly followed, constantly maintained, and free of human error.
In today’s dynamic environment, where the servers on which applications run are virtualized, where applications are frequently moved from one physical server to another, and where LUNs are quickly created and volumes are added to and removed from them on a daily basis, the probability of developing a perfect consistency-group process that is precisely followed and continuously maintained, without any human error, is very low. That means that when you need to call upon your insurance, the snapshot or backup that you assume is application-consistent, the probability is very high that the data will in fact be inconsistent, and the time to restore consistent application data from paper source documents will be measured in days, not minutes or hours. Companies that primarily transact business electronically may not be able to reconstruct the data at all. This is the scenario that Tim Hays, of Animal Health International, avoided when he made the decision to protect everything. After all, if he could affordably protect everything, he didn’t have to worry about what he might miss.
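To make that failure mode concrete, a consistency-group audit reduces to a simple set comparison between what an application actually uses and what its replication group still covers. Everything below (the application name, volume names, and group contents) is hypothetical, invented purely for illustration:

```python
# Hypothetical sketch: detecting consistency-group drift.
# All names below are invented; real inventories would come from the
# virtualization layer and the replication software.

app_volumes = {"erp": {"lun01", "lun02", "lun07"}}   # volumes each app uses today
consistency_groups = {"erp": {"lun01", "lun02"}}     # volumes the CG still covers

def find_drift(app_volumes, consistency_groups):
    """Return, per application, volumes in use but missing from its consistency group."""
    drift = {}
    for app, vols in app_volumes.items():
        missing = vols - consistency_groups.get(app, set())
        if missing:
            drift[app] = missing
    return drift

print(find_drift(app_volumes, consistency_groups))  # -> {'erp': {'lun07'}}
```

The point of the sketch is that drift like the newly added `lun07` can be caught mechanically, rather than relying on a perfectly followed manual procedure.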
Having two data centers, especially when they are separated by a significant distance, brings so many advantages, it’s difficult to name them all, but here are just a few:
- The ability to increase the frequency and quality of disaster recovery testing
- The ability to perform site maintenance and upgrades, while maintaining application availability
- The ability to rapidly restore applications and continue operations in the event of a regional disaster
Some organizations have eliminated tape and migrated to disk-based backup methods, leveraging various techniques for creating application-consistent snapshots. This approach can dramatically improve recovery times, but again, requires that the 3rd-party recovery location have all the necessary equipment and software in order to run the applications, once the applications and data are restored. And, again, the location must be unoccupied.
The reason organizations use 3rd-party disaster recovery service providers is, in part, because they don’t want to absorb the full cost of having a second location sitting idle, just in case a disaster happens. It is cost prohibitive for most organizations. But forward-thinking companies have recognized that application development and test environments can be re-purposed for production applications, when a disaster occurs. In this way, no infrastructure is wasted, and no systems are sitting idle. A two-data center architecture, with development, test, and disaster recovery in one location, and production in the other, provides the ideal approach for both resource efficiency and resiliency.
The biggest challenge for organizations may be determining the best way to get all of the current application data from the primary production location to the development, test, and disaster recovery location. Asynchronous replication is clearly the approach of choice, in terms of cost and flexibility for locating the secondary site, but it means that some data will be lost in a disaster. Many of you saw our recent announcement about Animal Health International becoming an Axxana customer. The approach that Animal Health took, combining asynchronous replication with disaster-proof protection of the asynchronous replication lag, is precisely the approach that organizations should take. The combination gives organizations a complete solution that is both affordable and flexible.
What if I told you that I could take your existing asynchronous infrastructure…and turn it into remote synchronous, zero-data-loss replication?
No, I’m not a magician, and I don’t pull rabbits out of a hat. I’m Eli Efrat, Axxana’s CEO, and that’s how I began my latest video, which you can see on Axxana’s home page or on YouTube here. When we started Axxana, our CTO, Dr. Alex Winokur, understood all of the trade-offs that organizations make when choosing between asynchronous and synchronous replication.
Asynchronous replication is inexpensive and can be used over any distance. That’s good. But it also guarantees that in the event of a disaster, you will lose data. That’s bad.
Synchronous replication is expensive and can only be used over limited distances, typically a few tens of miles. That’s bad. But it guarantees that you won’t lose data from a single-site disaster. That’s good.
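The tradeoff can be put in rough numbers. Here is a minimal sketch of the worst-case data at risk under asynchronous replication; the write rate and lag figures are assumptions for illustration, not measurements from any real system:

```python
# Illustrative arithmetic only; the figures below are assumptions.

def async_data_at_risk(change_rate_mb_s, lag_seconds):
    """Worst-case unreplicated data (MB) if disaster strikes at the end of the lag window."""
    return change_rate_mb_s * lag_seconds

# Synchronous replication acknowledges a write only after the remote copy
# confirms it, so its data at risk is zero -- at the price of distance-limited
# latency. Asynchronous replication trails the primary by its lag window:
print(async_data_at_risk(change_rate_mb_s=20, lag_seconds=300))  # -> 6000 (MB)
```

With an assumed 20 MB/sec write rate and a five-minute replication lag, up to 6 GB of committed transactions can vanish; that lag window is exactly the gap Axxana’s approach is meant to protect.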
Have you ever used this phrase?
Well, I guess that’s a risk I’m just going to have to take.
I’ll admit, I’ve said it myself, and it’s usually because either:
- I don’t think the risk is real
- I think the risk is low enough that I’m willing to take a chance
- I think the risk is real, but avoiding the risk seems impossible
- I think the risk is avoidable, but the cost to eliminate the risk seems too high
One of my friends told me about a guy who likes to drive fast cars, so he rented an hour in one of last year’s race cars, took a lesson from a professional driver, and then had a chance to drive by himself on a race track. What better place to safely try your hand at driving fast! The race track offered him an insurance policy for $70 that would cover any damage to the very fast and expensive car. Confident that he was sufficiently trained and that driving a car built for speed on a track designed for fast cars was an acceptable risk, he declined the insurance. Unfortunately, you guessed it, he wrecked the car, and it cost him over $100,000 to pay for the damages.
Many data center managers think like this guy. They view the risk of data loss as relatively unlikely. After all, the important data is protected by RAID and backed up on a regular schedule. They might even use asynchronous replication to copy most of the data to another location. These things do protect against most data-loss risk. Many of the other risks are relatively rare: hurricanes, floods, tornadoes, earthquakes, tsunamis, fires, and building collapse. Perhaps they think these risks are sufficiently low to be acceptable, risks they are willing to take. But what if the company makes the wrong bet? As with the wannabe race car driver, the cost of being wrong is huge. And what if protecting against those other risks were as affordable as the $70 insurance policy? Shouldn’t the company buy it?
Think about what is possible with Axxana’s Phoenix System. It’s time to re-evaluate the notion of acceptable risk in your data center and your disaster recovery plan.
The U.S. Department of State has issued travel warnings to U.S. citizens for thirty-one (31) countries. According to their website:
“Travel Warnings are issued when long-term, protracted conditions that make a country dangerous or unstable lead the State Department to recommend that Americans avoid or consider the risk of travel to that country…”
Despite the reported risk, what’s common to many of these countries is the fact that medium-sized and large corporations continue to operate significant businesses there. Sometimes the corporations have regional headquarters in one of these countries. Other times, as with Axxana, they may be headquartered there. And far too few corporations are adequately prepared to continue operations when the country where they are operating becomes unstable.
It is an unfortunate fact that high bandwidth communication lines are required for metropolitan-area synchronous replication. They are also needed for frequent asynchronous transmissions of snapshots to a remote disaster recovery center. When we meet with companies in the U.S., the U.K. or Central Europe, they may complain about the cost of bandwidth for replication, but at least the bandwidth is available at a price. Anyone with enough money can get as many 1 Gb/sec lines as they need, which will do nicely to protect the data for most applications. And they can take those lines and use them with their favorite storage-controller based, triple-site replication software.
In Johannesburg, South Africa, a company might be lucky to get a pair of 40 Mb/sec lines, which in most cases won’t be enough to protect all of the company’s data. And the cost will be outrageous. So triple-site replication approaches are almost unheard of there. The world may be getting increasingly flat, but it’s a mistake to believe that every region of the world has equal access to an affordable, abundant supply of communications resources.
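A back-of-the-envelope calculation shows why a pair of 40 Mb/sec lines falls short. The daily change rate below is an assumption chosen for illustration:

```python
# Rough sizing check: can a link sustain a given data change rate?
# The 1 TB/day figure is an assumption, not a measured workload.

def min_replication_bandwidth_mbps(daily_change_gb):
    """Average Mb/s needed to ship one day's changed data within that day."""
    megabits = daily_change_gb * 8 * 1000  # GB -> Mb (decimal units)
    return megabits / 86400                # seconds per day

# A 1 TB/day change rate needs roughly 93 Mb/s on average -- already more
# than the 80 Mb/s a pair of 40 Mb/sec lines provides, before allowing any
# headroom for daytime write bursts or protocol overhead.
print(round(min_replication_bandwidth_mbps(1000), 1))  # -> 92.6
```

Real replication traffic is bursty rather than evenly spread across the day, so in practice the shortfall is even larger than this average-rate arithmetic suggests.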
You may have heard that there was a magnitude 5.3 earthquake in Serbia this week. Earthquakes are relatively common in Serbia. They’ve had more than fifteen earthquakes of magnitude 4.0 or greater in the past 10 years, but this was the strongest.
Serbia is developing into a banking and trading hub, so having trading systems that are up and available is important. It’s also important that the banks and trading companies not lose data, and earthquakes are not their friend. These companies don’t operate on the same scale as London, New York, or Tokyo. They can’t afford the latest EMC VMAX and multi-hop replication, but they still need to protect as much data as possible in the event of a disaster. Even if the companies could afford VMAX, the high-bandwidth communication links needed to support metropolitan-area synchronous replication and wide-area asynchronous replication are very expensive. RecoverPoint from EMC is very popular in the country, because it enables companies to move periodic, application-consistent updates to a disaster recovery site or secondary data center. But there’s a tradeoff that companies in Serbia are making, even with RecoverPoint.
I was talking to John McArthur the other day about a use case we are looking at with a customer in Canada. The customer doesn’t want to lose data if and when a disaster hits their primary data center, and their service provider’s DR data center is more than 100 miles away. Therefore, the customer is presented by the storage vendor with two options – either do Multi-Hop (one of two ways for deploying a 3 data center topology for replication over long distances, trying to lose as little data as possible – EE) or go with the new Axxana Phoenix solution. John asked me what I thought of this, and I naturally answered that Multi-Hop is too expensive, too complicated to deploy and doesn’t really solve the problem… (of long distance synchronous replication… – EE). John liked my answer… he said so… and I repeated the 3 reasons why I thought Multi-Hop doesn’t really cut it…: “It is too expensive… too complicated to deploy, and doesn’t really solve the problem…”
As I said that again, I recalled an excellent joke involving a consultant and a flock of sheep:
A shepherd was herding his flock in a remote pasture, when, suddenly, a brand-new BMW advanced out of the dust cloud towards him. The driver, a young man in a Brioni suit, Gucci shoes, Ray Ban sunglasses and YSL tie, leaned out the window and asked the shepherd… “If I tell you exactly how many sheep you have in your flock, will you give me one?” The shepherd looked at the man, obviously a yuppie, then looked at his peacefully grazing flock and calmly answered, “Sure.”
The yuppie parked his car, whipped out his IBM ThinkPad, and connected it to a cell phone. He surfed to a NASA page on the internet, called up a GPS satellite navigation system, scanned the area, and then opened up a database and an Excel spreadsheet with complex formulas. He sent an email on his Blackberry and, after a few minutes, received a response. Finally, he printed out a 130-page report on his miniaturized printer, turned to the shepherd, and said, “You have exactly 1,586 sheep.” “That is correct; take one of the sheep,” said the shepherd. He watched the young man select one of the animals and bundle it into his car.
Then the shepherd said, “If I can tell you exactly what your business is, will you give me back my animal?” “OK, why not,” answered the young man. “Clearly, you are a consultant,” said the shepherd. “That’s correct,” said the yuppie, “but how did you guess that?” “No guessing required,” answered the shepherd. “You turned up here although nobody called you. You want to get paid for an answer I already knew, to a question I never asked, and you don’t know crap about my business. Now give me back my dog.”
So how do you know a storage sales rep is selling you on the idea of multi-hop replication?! It’s easy… no guessing required… it’s just too expensive… too complicated… and doesn’t really solve the problem… If you want Synchronous Replication over your existing Asynchronous lines… whatever the distance may be between your data centers… you need Axxana.