Posts Tagged ‘Disaster Recovery’
For more than 20 years, storage system suppliers have been using software to compensate for the fact that spinning hard disk drives fail. The software, which can be embedded in a hardware device such as a RAID controller or run on a server, is designed to recover data when data is lost. And given the challenges of hard disk drive design, without some sort of protection, data will be lost.
In a hard disk drive, the read/write head literally flies on a cushion of air above the surface of the platter. The height and speed of that flight were once compared to flying a fighter jet three inches off the ground. But when disk drives are put into a storage system, where vibration, heat, and other interference can be transferred from one drive to another, it’s actually more like trying to fly multiple fighter jets in formation three inches off the ground. And all of that assumes the storage system sits in a clean, temperature-controlled environment, with conditioned power and limited floor vibration. Extending the fighter jet analogy, with magnetic heads flying over the platters at a height of 3 microns, a single dust particle, which averages 500 microns, would appear over 40 feet high to the fighter jet, and a typical smoke particle would appear over 80 feet tall.
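For those who like to check the math, the scaling works out roughly like this (taking the 3-inch flight height as the stand-in for the 3-micron fly height):

\[
\frac{3\ \text{in}}{3\ \mu\text{m}} = \frac{76{,}200\ \mu\text{m}}{3\ \mu\text{m}} = 25{,}400,
\qquad
500\ \mu\text{m} \times 25{,}400 \approx 12.7\ \text{m} \approx 41.7\ \text{ft},
\]

which is where the “over 40 feet” figure comes from; the 80-foot smoke particle uses the same scale factor.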
Now imagine you want to build a storage system that can protect the data on your disk drives in the event that there is a fire in the data center, the building floods, smoke fills the room, the floor shakes from an earthquake or explosion, or the ceiling collapses. Today’s spinning disk drive technology is not designed to survive these types of physical conditions. Fortunately, flash memory and solid-state disks, which are much more tolerant of a wide range of adverse conditions, have become much more affordable and can now be applied to solve some of these complex data-protection challenges. That is the subject of Dr. Alex Winokur’s talk, “Flash Forward for Reliable Data Protection and Recovery,” at the upcoming Flash Memory Summit in Santa Clara, California on August 9th. We hope that you can attend.
Have you ever used this phrase?
Well, I guess that’s a risk I’m just going to have to take.
I’ll admit, I’ve said it myself, and it’s usually because either:
- I don’t think the risk is real
- I think the risk is low enough that I’m willing to take a chance
- I think the risk is real, but avoiding the risk seems impossible
- I think the risk is avoidable, but the cost to eliminate the risk seems too high
One of my friends told me about a guy who likes to drive fast cars, so he rented an hour in one of last year’s race cars, took a lesson from a professional driver, and then had a chance to drive by himself on a race track. What better place to safely try your hand at driving fast! The race track offered him an insurance policy for $70 that would cover any damage to the very fast and expensive car. Confident that he was sufficiently trained and that driving a car built for speed on a track designed for fast cars was an acceptable risk, he declined the insurance. Unfortunately, you guessed it, he wrecked the car, and it cost him over $100,000 to pay for the damages.
Many data center managers think like this guy. They view the risk of data loss as relatively unlikely. After all, the important data is protected by RAID and backed up on a regular schedule. They might even use asynchronous replication to copy most of the data to another location. These things do protect against most data-loss risks. Many of the other risks are relatively rare: hurricanes, floods, tornadoes, earthquakes, tsunamis, fires, and building collapse. Perhaps they think these risks are sufficiently low to be acceptable, risks they are willing to take. But what if the company takes the wrong bet? As with the wannabe race car driver, the cost of being wrong is huge. And what if protecting against the other risks were as affordable as the $70 insurance policy? Shouldn’t the company buy it?
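Framed as a rough expected-value calculation with the numbers from the story above (purely illustrative, not an actual risk model), the $70 premium pays for itself whenever the chance of a $100,000 loss exceeds about 0.07 percent:

\[
p \times \$100{,}000 > \$70 \quad\Longrightarrow\quad p > \frac{70}{100{,}000} = 0.0007 = 0.07\%.
\]

The same logic applies in the data center: when the cost of being wrong is large enough, even a very small probability of disaster justifies inexpensive protection.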
Think about what is possible with Axxana’s Phoenix System. It’s time to re-evaluate the notion of acceptable risk in your data center and your disaster recovery plan.
Data privacy has been in the news a lot lately, and reports of information compromise are frequent. Countries are getting serious about data privacy and are imposing stiff fines for failure to adequately protect personal information. According to King & Spalding, in a March 25, 2010, Corporate Practice Group Client Alert, “The UK data protection authority, the Information Commissioner, will have powers to issue fines of up to £500,000 against companies who breach UK data protection laws from 6 April 2010.” King & Spalding go on to explain that the power to impose the fine can be exercised if “the Information Commissioner is satisfied that the breaches are ‘serious’ and of a kind likely to cause substantial damage or distress and provided the company either deliberately breached data protection laws or knew (or should have known) that there was a risk that a breach would occur but failed to take appropriate action.”
These laws are focused on the unauthorized release of personal information, not the protection of information against loss, deletion, or destruction. At the same time, however, laws that mandate disaster recovery capabilities and disaster recovery testing already exist in some industries, such as financial services. In addition, laws such as the Safety Act, which requires the preservation of information, have been introduced to improve the ability of law enforcement organizations to locate individuals engaging in illegal activity on the internet.
What is interesting about some of the information privacy laws is that liability can be assigned and fines assessed even when companies do not know that a risk of data compromise exists. The requirement is that they “should have known.” It’s a matter of corporate responsibility to know what is possible and to take reasonable efforts to protect against bad events. It is not difficult to imagine that a similar responsibility test may be applied in disaster recovery and data retention laws. Organizations will likely be held accountable for what they “should have known” if they failed to act. Given advances in data replication, deduplication, storage tiering, and data archiving technology, organizations should know that all data can be affordably replicated to multiple sites, all data can be protected through a wide range of disasters, and all data can be affordably archived. Consider yourself informed.
This week, as thousands of IT professionals converge at EMC World, many will be getting their first look at Axxana’s Phoenix System RP. The system, which integrates with EMC RecoverPoint and all RecoverPoint-supported platforms, forces IT professionals to change the way they think and to imagine what was previously thought impossible. Now it truly is possible for companies to protect all of their data over any distance through a wide range of disasters. It is not only possible, but it is affordable for virtually any mid-sized and large enterprise customer. And thanks to RecoverPoint’s and the Phoenix System RP’s integration with VCE Vblocks, zero data loss over any distance is also possible for smaller companies that are leveraging public cloud infrastructures based upon VCE Vblocks.
If you look at the home page of our Axxana website, you will see that we have changed our banner this week to honor other great innovators. These individuals imagined and created what was previously thought impossible. The Wright brothers proved that flight was not just for birds, bees, and bats, but that man, too, could fly. Alexander Graham Bell proved that people could remain connected and communicate, hearing each other’s voices over vast distances. John Bardeen and his colleagues, who developed commercially available transistors, proved that electronics could be made affordable for the masses. And Albert Einstein, well, he changed just about everything we thought about the physical world.
In his book The Black Swan: The Impact of the Highly Improbable, Nassim Taleb explains how Europeans could not imagine black swans until they actually saw them. Just like black swans, many will not believe that they can recover their data from the ashes until they see the Phoenix System RP. No one today denies the existence of black swans, and everyone can imagine them. Soon, no one will doubt the ability to protect all data and recover it from the ashes, from the floods, from an earthquake, or from a building collapse. If you are at EMC World, please stop by and see for yourself. We are at Booth 605.
In the world of software development, most testers know the phrase “Happy Path.” The Happy Path is a well-defined test case in which no unusual events occur. If you test your software on the Happy Path, it works, and everyone is happy. Unfortunately, just because software works on the Happy Path doesn’t mean it will work in production environments, under heavy loads, and under a variety of fault conditions.
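To make the idea concrete, here is a minimal, hypothetical sketch in Python. The restore_from_backup function and FlakyStorage class are invented purely for illustration; they are not taken from any real product or test suite:

```python
# Illustration only: a "Happy Path" test versus a fault-condition test.
# FlakyStorage and restore_from_backup are hypothetical stand-ins.

class FlakyStorage:
    """In-memory stand-in for a storage target that can fail mid-write."""
    def __init__(self, fail_after=None):
        self.blocks = []
        self.fail_after = fail_after  # inject a failure after N successful writes

    def write(self, block):
        if self.fail_after is not None and len(self.blocks) >= self.fail_after:
            raise IOError("simulated device failure")
        self.blocks.append(block)


def restore_from_backup(backup_blocks, storage):
    """Copy backup blocks to the storage target and return the count restored."""
    restored = 0
    for block in backup_blocks:
        storage.write(block)
        restored += 1
    return restored


def test_restore_happy_path():
    # Everything works: the test passes, and everyone is happy.
    storage = FlakyStorage()
    assert restore_from_backup([b"a", b"b", b"c"], storage) == 3


def test_restore_with_device_failure():
    # Off the Happy Path: the target fails partway through the restore.
    storage = FlakyStorage(fail_after=1)
    try:
        restore_from_backup([b"a", b"b", b"c"], storage)
        assert False, "expected the simulated failure to surface"
    except IOError:
        assert len(storage.blocks) == 1  # a partial restore is left behind
```

A test plan that only ever runs the first test says nothing about what happens in the second, and that is exactly the gap in most disaster recovery testing.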
Too often today, disaster recovery testing is done on the Happy Path. If you tightly control the disaster recovery test plan, you can prove to your executive team and your auditors that you can recover your data, your applications, and your business processes. But get off the Happy Path, and the probability of recovery quickly approaches zero.
The real world is filled with randomness and unpredictable events. Often, multiple bad events occur in close succession or even simultaneously. But because organizations test their recoverability on the Happy Path, they delude themselves into thinking that they can actually recover in the event of a real disaster. I think it’s time that we all get off the Happy Path and start building disaster recovery plans that enable organizations to survive the unpredictable and the unplanned.
Geary Sikich wrote an interesting article on Continuity Central, titled “Unrealistic Scenarios? C’mon Man!” In the article Geary compares linear planning to nonlinear planning, and he argues that nonlinear planning is required to develop “truly resilient plans and capabilities.”
He describes linear thinking as:
“A process of thought following known cycles or step-by-step progression where a response to a step must be elicited before another step is taken.”
He describes nonlinear thinking as:
“Thinking characterized by expansion in multiple directions, rather than in one direction; based on the premise that there are many points from which one can apply logic to a problem.”
In the summary of the article, Geary wrote one statement that really stuck out for me, and which I think applies to many organizations today:
“Current planning techniques are asking the wrong questions precisely; and we are getting the wrong answers precisely; the result is the creation of false positives;”
Current linear planning techniques argue that the business managers must classify the business value and importance of each of the applications and business processes that support the business, and prioritize business continuity plans to maintain or recover the “important” applications and processes. The challenge with that approach is that environments are dynamic, applications are in constant flux, and the importance of applications to a process may change over time.
We recently met with an investment management company that has a staff of seven, four of whom support applications and three of whom support infrastructure (servers, storage, and networks). The company has over 300 custom applications and 150 packaged applications, supported by those four application developers. For a company with this many applications and this few staff, it is virtually impossible to maintain a ranking of the relative importance of each application.
At Axxana, we would argue that a company that asks “Which applications are important?” will get a very precise answer. But answering that question will lead to the false positive that the company can restore the business to full operations.
If you don’t have both a local high availability (HA) site and a replicated site some distance away for system maintenance and disaster recovery, would it be better to have just the HA site or just the replicated disaster recovery site?
With regard to the HA option, Kathleen Lucey, President of Montague Risk Management and a business continuity management expert, pointed out:
If what you are talking about is local clustering in the same site, then I would not consider this to be HA. The protection afforded by a same-site clustering solution is limited to failover to the designated backup server in the event of a failure of the primary. A larger local event could take down the entire cluster, and so this is not really HA, but more properly local hardware backup.
If you work in the area of Business Continuity Management (BCM), you are probably aware of BS 25999, the British Standards Institution’s standard for BCM. The standard was actually published in two parts:
- BS 25999-1:2006 Business continuity management. Code of practice.
- BS 25999-2:2007 Business continuity management. Specification.
The first publication deals with the “should” of the standard. If an organization is considering the development or enhancement of a business continuity management program, the publication provides a comprehensive set of factors that the organization should consider. It is a set of recommendations and guidelines, not a set of requirements.
The second publication deals with the “shall” of the standard, meaning that, if an organization wants to claim that it has met the standard, as certified by the BSI, these are the things the organization must do.
The U.S. Department of State has issued travel warnings to U.S. citizens for thirty-one (31) countries. According to their website:
“Travel Warnings are issued when long-term, protracted conditions that make a country dangerous or unstable lead the State Department to recommend that Americans avoid or consider the risk of travel to that country…”
Despite the reported risk, what’s common to many of these countries is the fact that medium-sized and large corporations continue to operate significant businesses there. Sometimes the corporations have regional headquarters in one of these countries. Other times, as with Axxana, they may be headquartered there. And far too few corporations are adequately prepared to continue operations when the country where they are operating becomes unstable.
Roger Bilham, a professor of geological sciences at the University of Colorado, was quoted in an Associated Press article, saying, “It’s our fault for not anticipating these things. You know, this is the Earth doing its thing.” That quote was prompted by a very bad year filled with natural disasters. According to the article, natural disasters killed more than 260,000 people through November of this year. It was a bad year, especially compared to 2009, during which only 15,000 died from the effects of natural disasters.
One of the problems with natural disasters is that they are hard to predict. Actually, it’s not hard to predict that one will happen; that’s a certainty. The problem comes when you want to know when and where. But there are some things that we do know. We know that more people are living near earthquake fault lines. In fact, the article states that if the earthquake in Haiti had occurred in 1985 instead of 2010, the death toll would have been around 80,000, because far fewer people were living in Haiti in 1985. We also know that more people are living in flood plains. And although there are people willing to debate the cause, we also know that the earth is getting warmer, which will increase certain types of natural disasters.