Archive for the ‘Data Communications’ Category
Everyone loves twins. They just make you smile. And if you have twins, you’re so happy, you don’t think about the cost. That’s true of twin babies, but it’s not true of twin data centers.
On yesterday’s webinar with Gil Chaouat, we had an interesting discussion. Gil discussed a global company he had recently met with. The company has two data centers in every region. Two data centers per region is a common approach for organizations like this one that want to deliver a zero-data-loss environment. Two things really stood out for me:
- Two data centers per region is really expensive
- One is sometimes better than two
For this particular company, one of the biggest costs, in addition to the cost of the second data center in every region, was the cost of the high-speed fiber connections between each of the two data centers. And, of course, because fiber connections can break, they had to have two. Twice the cost. They used fiber connections, because they wanted synchronous data replication. It was the only way they could deliver zero data loss. So why is one data center better than two?
This company had no replication between regions. That would have added even more cost, and they wanted to reduce spending, not increase it. And replication between regions, which for them would have been North America to Europe and Europe to Asia, can only be done asynchronously, which means they will lose data in a disaster.
What they are discussing is eliminating the second data center and implementing Axxana’s black box. That way they can use asynchronous replication over IP lines between regions, eliminate the cost of the redundant fiber links, and use Axxana to deliver zero data loss. Doing this also ensures zero data loss in regional disasters that might have affected both data centers in one region. So in this scenario, one is better than two.
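To make the idea concrete, here is a minimal, purely illustrative sketch of the pattern: writes are acknowledged locally, shipped asynchronously over IP, and journaled in a disaster-proof buffer until the remote region confirms them. The class names and logic below are hypothetical illustrations, not Axxana’s actual product or API.

```python
# Purely illustrative sketch (hypothetical names, not Axxana's API):
# every write is sent asynchronously to the remote region, but it is also
# journaled locally in a disaster-proof buffer until the remote site
# acknowledges it. After a disaster, the surviving journal entries are
# merged with the asynchronous copy to close the data-loss gap.

from collections import deque

class DisasterProofJournal:
    """Stands in for a hardened local buffer that survives the disaster."""
    def __init__(self):
        self._entries = {}            # seq -> payload, not yet confirmed remotely

    def record(self, seq, payload):
        self._entries[seq] = payload

    def release(self, seq):
        self._entries.pop(seq, None)  # remote copy confirmed; no longer needed

    def surviving_entries(self):
        return dict(self._entries)

class AsyncReplicator:
    """Asynchronous replication over IP: writes are acknowledged locally first."""
    def __init__(self, journal):
        self.journal = journal
        self.pending = deque()        # writes queued for the remote region
        self.remote = {}              # what the remote region has actually received
        self.seq = 0

    def write(self, payload):
        self.seq += 1
        self.journal.record(self.seq, payload)    # protect it locally right away
        self.pending.append((self.seq, payload))  # ship it whenever the WAN allows
        return self.seq                           # acknowledge the application immediately

    def drain_one(self):
        if self.pending:
            seq, payload = self.pending.popleft()
            self.remote[seq] = payload
            self.journal.release(seq)

def recover_after_disaster(replicator):
    """Remote copy plus surviving journal entries = no lost writes."""
    recovered = dict(replicator.remote)
    recovered.update(replicator.journal.surviving_entries())
    return recovered

if __name__ == "__main__":
    r = AsyncReplicator(DisasterProofJournal())
    for i in range(5):
        r.write(f"txn-{i}")
    r.drain_one()  # only one write crossed the WAN before the "disaster"
    print(sorted(recover_after_disaster(r)))  # all five writes are still recoverable
```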
Of course, when it comes to babies, two is better than one.
Too often, business continuity planning and disaster recovery planning are treated as the same function. Unfortunately, they are not. Business continuity planning helps organizations ensure that applications and processes continue through the myriad day-to-day disruptions that might occur. These include IT component failures, such as a disk-drive failure, a server failure, a dropped network link, or an application bug. Disaster recovery planning helps organizations recover operations after less frequent, but far more devastating events, such as fires, floods, hurricanes, earthquakes, and a variety of man-made disasters. While the data center strategy is only one component of business continuity and disaster recovery planning, it is a key component. And while business continuity and disaster recovery planning are different functions, they must often be considered together because of budget limitations.
There are plenty of advantages to having a business continuity data center in region, a very short distance from the production data center. If the data centers are very close, there will be little impact on transaction latency for the always-important two-phase database commit. Failover times from the production data center to the business continuity data center can be very short. Staff who normally work at the primary data center can easily show up for work at the in-region business continuity data center. And WAN charges between the primary and business continuity data centers will be relatively low.
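To see why proximity matters so much for synchronous commits, here is a back-of-the-envelope sketch. The roughly 5 microseconds per kilometer figure for light in optical fiber is an approximation, the distances are arbitrary examples, and real links add switching, protocol, and storage-array overhead on top.

```python
# Rough round-trip latency added to a synchronously replicated commit.
# Assumes ~5 microseconds per km one-way in optical fiber (an approximation)
# and ignores switching, protocol, and storage-array overhead.

US_PER_KM_ONE_WAY = 5.0

def sync_commit_penalty_us(distance_km, round_trips=1):
    """Extra microseconds a commit waits for the remote acknowledgment."""
    return 2 * distance_km * US_PER_KM_ONE_WAY * round_trips

for km in (10, 50, 100, 500, 1000):
    print(f"{km:5d} km -> ~{sync_commit_penalty_us(km):8.0f} us added per commit")

# At metro distances the penalty is tens to hundreds of microseconds; across
# regions it climbs into the milliseconds, which is why synchronous replication
# is usually confined to nearby data centers.
```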
The problem with an in-region business continuity data center is that it can’t replace an out-of-region disaster recovery data center. The two are simply too close for comfort. And few organizations can afford three data centers. Following are a few of the types of disasters that can prevent an in-region business continuity data center from acting as a disaster recovery data center:
- Electrical-grid failure
- Telecommunications failure
- Transportation systems failure
- Chemical spills
- Radiation leaks
- War, terrorism, and civil unrest
For these types of disasters, it is much more likely that both in-region data centers will be affected, and much more challenging to recover applications and data. One of the trade-offs organizations must make is between how quickly they can recover and how certain they are that they can recover from the full range of disasters that could strike them. We believe that a slight increase in recovery time is well worth the additional assurance that you can actually recover applications after a disaster. Using an in-region business continuity data center as a disaster recovery data center is a little like doing a tandem skydive. It’s fine, as long as nothing goes wrong.
Looking for articles on data replication and disaster recovery techniques, I discovered an interesting post by Hu Yoshida, VP and CTO at HDS. In it, Hu describes the benefits of the Hitachi Universal Replicator (HUR) and its unique pull capability. Read the blog post for more detail, but I think it is important to review and comment on one statement. Hu wrote:
Combinations of synchronous and asynchronous replication may be used to provide out of region disaster recovery with no data loss.
If you take a look at this image, you will see a four-data-center design. The idea is that an organization that really cares about disaster recovery and zero data loss should have both a Primary Data Center and an In-region Recovery Data Center within synchronous remote-copy distance. The architecture then allows both data centers to replicate, using asynchronous remote copy, to one or more Out-of-region Remote Data Centers.
Disaster recovery architectures are designed to protect against infrequent, but catastrophic events. Looking over short time periods in a specific geographic region, these events are, in fact, infrequent. But looked at over longer time periods, disasters are a near certainty. Now, the assumption that must be met in order to support the claim of zero data loss is that an in-region disaster won’t affect both In-region Data Centers. But if the assumption is not met, data will in fact be lost.
One might argue that the probability of losing both in-region data centers is very small. But the probability that two data centers will be knocked out of commission by the same man-made or natural disaster is dramatically higher for two close-by, in-region data centers than it is for two widely separated data centers.
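A simple, purely illustrative calculation shows why the correlated, regional risk dominates. The probabilities below are made-up assumptions chosen only to demonstrate the arithmetic, not industry statistics.

```python
# Hypothetical numbers chosen only to illustrate the argument.
p_independent = 0.01   # chance either data center fails on its own in a given year
p_regional    = 0.002  # chance a single regional disaster takes out the whole metro area

# If the two sites truly failed independently, losing both would be rare:
p_both_independent = p_independent ** 2                      # 0.0001

# But two data centers in the same region share the regional risk, so the
# chance of losing both is dominated by the correlated event:
p_both_in_region = p_regional + (1 - p_regional) * p_both_independent

print(f"independent sites: {p_both_independent:.4%}")        # ~0.01%
print(f"same-region sites: {p_both_in_region:.4%}")          # ~0.21%, roughly 20x higher
```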
Having two data centers in one region adds substantial additional cost for little incremental protection. Business continuity protection does not require a Recovery Data Center. The more elegant approach is to provide business continuity capabilities from within the Primary Data Center, continue to use a Remote Data Center for disaster recovery, and protect yet-to-be-transferred data (the most recent snapshot deltas, and post-snapshot changed data) in our Phoenix System black box located within the Primary Data Center. The probability of losing data within our black box is near zero. Our approach provides substantially more data-recovery insurance for dramatically less money.
For almost 8 years, Sebastian Darrington, who maintains a much-read blog under the title The Storage Chap, has been discussing and debating various approaches to disaster recovery and business continuity planning. In June, he posted an article on The Six Differentiating Features of RecoverPoint. Add to that the excellent post he wrote in January on Zero Data Loss with Asynchronous Replication, and you’ll get a pretty complete picture of the EMC/Axxana benefits.
The sixth differentiating feature he mentions in his June blog post is Bandwidth Reduction. It is specifically this feature that makes RecoverPoint and Axxana the perfect combination for organizations that operate in regions where communications are less reliable or very expensive. Of course, regardless of where an organization operates, there’s no sense in throwing away good money on bandwidth. But in some regions of the world, you can’t get good bandwidth, regardless of the price.
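As a rough illustration of why bandwidth reduction matters in such regions, consider the arithmetic below. The daily change rate and the 5:1 reduction ratio are hypothetical assumptions, not RecoverPoint specifications.

```python
# Hypothetical workload: 200 GB of changed data per day to replicate off-site.
changed_gb_per_day = 200
SECONDS_PER_DAY = 24 * 3600

def required_mbps(gb_per_day, reduction_ratio=1.0):
    """Average WAN bandwidth needed, after an assumed reduction ratio."""
    bits = gb_per_day * 8 * 1000**3 / reduction_ratio
    return bits / SECONDS_PER_DAY / 1e6

print(f"no reduction : {required_mbps(changed_gb_per_day):.1f} Mbps")
print(f"5:1 reduction: {required_mbps(changed_gb_per_day, 5):.1f} Mbps")

# 200 GB/day works out to roughly 18.5 Mbps sustained; a 5:1 reduction brings
# that under 4 Mbps, which matters where bandwidth is scarce or costly.
```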
Let’s face facts. Data backup, data protection, and disaster recovery are difficult. There are more data and more applications to protect and less time to do it. And there are a growing number of risks against which you have to protect your data and applications. Thanks to application and data growth and the integrated nature of applications that support today’s business processes, old data protection and disaster recovery methods simply won’t work. There’s too much complexity and too little time.
Thankfully there are new approaches and new technologies to solve data protection and disaster recovery challenges. Innovative suppliers, like Axxana, are eager to win your trust and win your business. That’s the good news. The bad news is that you don’t have the time to evaluate all of the new technologies and all of the new suppliers to figure out what works and what doesn’t and who to trust and who not to trust.
Faced with a challenge like that, what do you do? If you are like most people, you talk to the people you trust the most. Your larger, long-time suppliers are a logical choice. They have a lot to lose, if they guide you down the wrong path. That’s one of the reasons we chose to partner with the leading information infrastructure supplier, EMC. EMC stands behind the rigorous tests conducted in their ELab, so you don’t have to wonder if our Phoenix System works.
Another logical choice is your peers. What makes peer groups so helpful is that your peers have no financial interest in your decision. They’re not the incumbent supplier, and they’re not the new kid on the block. Their interest is in preserving and enhancing their reputation, which will only be damaged, if they steer you down the wrong path.
We are very pleased to tell you that one of your peers, Tim Hays, VP of IT at Animal Health International, will be talking on a Wikibon Peer Incite on Tuesday, April 10 at noon Eastern Daylight Time. His topic is how he implemented an affordable zero-data-loss disaster recovery solution, one that eliminates the need to classify data and instead protects all of the company’s production data. Not only will he be presenting, but he’ll also be available to answer your questions. If you are thinking about re-architecting disaster recovery or building a second data center, are concerned about the cost of ensuring data protection, or simply can’t figure out how to affordably protect all of your production data, I encourage you to attend.
Dial-in instructions are below. No registration is required.
Date: Tuesday, Apr 10, 2012
Time: 12:00pm – 1:00pm ET (9:00am – 10:00am PT)
I hope you have already had a chance to watch the video of Tim Hays, Vice President of IT at Animal Health International, talking about why he chose Axxana. If not, please stop reading now and go watch it. Here’s the link.
If you listened to Tim, you’ll know that he chose Axxana because of three factors:
- Axxana improved his recovery capabilities
- Axxana integrated with his existing EMC CLARiiON, VNX, and RecoverPoint infrastructure
- Axxana cost him less than historical methods of synchronous replication
Animal Health International is representative of thousands and thousands of cost-conscious organizations. They know how large, global institutions are protecting their data, but in Tim’s words, with these traditional methods of synchronous replication, “The cost of the equipment, the cost of the telecommunications for synchronous I/O were just not at a cost level we were willing to support.” And with Axxana, Animal Health International gets an even higher level of data protection at a dramatically lower cost.
Tim recognizes that some companies have already made a very large investment in traditional synchronous replication. Often those companies chose an expensive synchronous solution because, at the time of the decision, there were no lower-cost alternatives available. Besides, many were required by law or regulation to have the highest levels of data protection, as, for example, in the financial services industry. For those companies, Tim says, “Here’s an opportunity…to reduce those costs.”
The fact is, these organizations already have their processes in place, and few organizations like to disrupt a process that is working, even if it is costly. That’s OK with us. Eventually, those organizations will recognize that our approach not only lowers cost, but also provides superior disaster recovery capabilities. In the meantime, we’ll be plenty busy serving the needs of organizations like Animal Health International.
On 1 May 2011, French investigators recovered the flight data recorder from Air France Flight 447, twenty-three months after the Airbus A330 plunged into the Atlantic Ocean on a flight from Rio de Janeiro to Paris. At the time of the tragic crash, and for months following, there was a great deal of speculation about what caused the crash. Was it mechanical failure, pilot error, or some combination of the two? With the release of the information from the flight data recorder, including the full transcripts of the cockpit voice recordings, investigators now have a clear picture of what occurred. And from analysis of the retrieved data, they can make recommendations to airplane manufacturers and to pilot training programs on how to reduce or eliminate these kinds of tragedies.
In the case of Flight 447, investigators did have some data from the automatic transmissions, but the data was incomplete. Over the past several decades, as storage media have advanced from magnetic tape to solid-state memory, the airline industry has been able to increase the amount of data that flight data recorders store and protect. And with the information from Flight 447’s flight data recorder, retrieved from the ocean floor two miles below the surface, the picture is now complete.
Eyewitness accounts are notoriously unreliable, as this Stanford Journal of Legal Studies article and this APA Monitor article attest. And the stress that comes during and after a disaster only makes eyewitness testimony less reliable still. When disasters strike a business and data is lost, it is sometimes possible to reconstruct data from source documents, but source documents are sometimes lost. Data can also be reconstructed from memory, but, as research shows, memories can be flawed.
For the airline industry, the capture and protection of data in flight data recorders before and during disasters, and the analysis of that data afterward, have been critical to ensuring that air travel is the safest mode of travel. Still, the industry is constantly looking to improve. Imagine if, rather than after 23 months, the data in the flight data recorder had been recoverable immediately. Imagine that, rather than having to be found, the data could have been extracted automatically. Then the analysis of the cause, and the development and modification of procedures that might prevent future tragedies, could have begun almost immediately.
On Sunday, there was a very small fire at a 400,000 square foot data center in Mahwah, New Jersey. The data center is very important, because it houses NYSE Euronext’s matching engine, which matches buy and sell orders for high-frequency trading. The fire was quickly extinguished, and, although communications were temporarily disrupted between 58 trading companies, everything was restored prior to the opening of markets on Monday.
As data center fires go, the timing of this one was extremely fortunate. At Euronext, microseconds matter, and millions of shares trade in milliseconds. Imagine what would have happened had the fire occurred on one of those now-too-frequent market-meltdown days. Imagine that the records of the trades that occurred in the last few seconds before the fire were somehow destroyed and there was no second copy.
Protecting the transaction record for all of these trades matters, down to the last microsecond. You have to get it right. No one wants a call from a customer saying:
You mean to tell me that you didn’t protect the records of my transactions?
Because complete protection is now possible and because Axxana helps companies save money, there really are no excuses. Pretty soon, companies will be saying:
Sure, I have Axxana. Doesn’t everybody?
There’s a LinkedIn group called BCMIX – Business Continuity Management Information eXchange. There are over 7,000 members of this group, which I think shows just how important Business Continuity Management is in organizations today. Members can post questions to the community and get advice from other professionals who are struggling with the same issues. I’m paraphrasing here, but some of the recent topics were:
- Can you develop a profile for what types of individuals are able to manage disasters?
- How do you determine the RTO for critical systems and applications?
- What is the ROI from a Business Continuity Management Program?
I’m always interested in the calculation of an ROI on an intangible such as a BCM program, because its true value, like that of insurance, is not really calculable until after the event. I mean, what is the ROI on a fire extinguisher?
There’s really no ROI on a fire extinguisher until you need it, which, hopefully, is never. But if you do have a fire, you want a fire extinguisher that works well with the type of fire you have. There are different types of fires and different types of fire extinguishers for each type of fire. There are also combination fire extinguishers that work with more than one type of fire. For those of you who want a quick tutorial on fires and fire extinguishers, here’s a helpful website: Fire Extinguisher: 101.
Once you’ve decided what risks you want to reduce, you should get the best possible protection at the lowest possible cost. And that’s where the ROI comes in. Our Phoenix System is like a combination fire extinguisher, because we protect data through a wide variety of disasters: floods, fires, earthquakes, bombings, hurricanes, and building collapses. But we have something else going for us. We actually lower the cost of data protection by reducing data communications costs when replicating data over distance.
Maybe there’s no way to determine the return on a Business Continuity Management plan, but once you’ve made the decision to put a plan in place, you might as well have the best possible coverage at the lowest possible cost. To help you understand the savings that an Axxana Phoenix System investment can provide, we developed an ROI white paper. I hope you find it helpful.
What if I told you that I could take your existing asynchronous infrastructure…and turn it into remote synchronous, zero-data-loss replication?
No, I’m not a magician, and I don’t pull rabbits out of a hat. I’m Eli Efrat, Axxana’s CEO, and that’s how I began my latest video, which you can see on Axxana’s home page or on YouTube here. When we started Axxana, our CTO, Dr. Alex Winokur, understood all of the trade-offs that organizations make when choosing between asynchronous and synchronous replication.
Asynchronous replication is inexpensive and can be used over any distance. That’s good. But it also guarantees that in the event of a disaster, you will lose data. That’s bad.
Synchronous replication is expensive and can only be used over limited distances, typically a few tens of miles. That’s bad. But it guarantees that you won’t lose data in a single-site disaster. That’s good.
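The practical difference comes down to the recovery point: with asynchronous replication, whatever has been committed locally but not yet replicated is lost when disaster strikes. Here is a rough, purely illustrative sketch of that exposure; the throughput and lag figures are assumptions, not measurements of any particular system.

```python
# How much data is at risk with asynchronous replication?
# Roughly: exposure = write throughput x replication lag.
# The numbers below are illustrative assumptions, not measurements.

def data_at_risk_mb(write_mb_per_sec, replication_lag_sec):
    """Megabytes of committed-but-not-yet-replicated data lost if the site fails now."""
    return write_mb_per_sec * replication_lag_sec

for lag in (1, 10, 60):
    print(f"lag {lag:3d} s at 20 MB/s of writes -> up to {data_at_risk_mb(20, lag):5.0f} MB lost")

# Synchronous replication drives this exposure to zero, but only by making every
# commit wait for the remote site, which is what limits it to short distances.
```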