Posts Tagged ‘Data loss’
In the last 30 days, Google, LinkedIn, and Microsoft all had outages in their cloud-based services. Caroline Craig of InfoWorld wrote in a recent article, “Crashes are inevitable in the cloud. The trick to a successful cloud strategy is to design for the impending failure.” Everyone using cloud services should expect some downtime, since, as Craig writes, “… no cloud-based service offers a 100 percent uptime guarantee.”
But here’s the problem: InfoWorld, Google, LinkedIn, and Microsoft, in all of their analysis, their post-mortems, and their root-cause analysis disclosures, only talk about downtime. Downtime is only half of the story. Where’s the discussion regarding data loss? That’s the elephant in the room that everyone sees, but few are talking about.
If you want a 100 percent guarantee, I’ll give you one. It’s this.
Every time there is downtime in a cloud service, there is data loss.
If there’s always data loss, then why don’t people talk about it? That’s easy. You don’t even have to be a customer to know whether most cloud services are up and running. Twitter users are more than happy to tweet that a cloud service is down. And most reputable cloud-based services disclose downtime to the world through their status page. Here are just a few examples:
None of these services, however, publicly report the data loss that occurs every time they have an outage. Part of the reason may be that they don’t know. And unlike downtime, data loss doesn’t usually affect everyone, so only the affected person or company knows. While it’s frustrating and costly for the customer, especially if it’s a business customer, they are unlikely to tell anyone that they lost data. In fact, one of the few who acknowledged the potential for data loss after the recent Google Drive outage was @john_mccoubrey, who tweeted, “Some of my docs on Google Drive appear to have been removed. Hope this is just a temporary glitch.”
It’s time we all started talking about the impact of data loss. You know it’s there, even if you’re not talking about it.
Imagine you live in New York. You’ve booked a cross-country flight to attend a party in San Diego, California, where your parents have retired. Your entire family is about to celebrate your parents’ 50th wedding anniversary. It’s a once-in-a-lifetime event, and your brothers and sisters have chosen you to give the speech honoring your parents. You feel both proud and honored to have been selected. It’s Wednesday morning, and you’ve left yourself a full day to get there. You’re a frequent traveler, and you know that flights get delayed. Especially this week, and especially today. Everyone has remarked about how appropriate it is that this special anniversary falls on Thanksgiving Day. Today is the day before Thanksgiving, and it’s the busiest travel day of the year.
When you arrive at the airport for check-in, the agents at the counters are flustered and fellow travelers are obviously upset. The computers are down. You’re told they’ve been down for 10 minutes. Even though you are not surprised by the outage, you feel your tension increasing. But after another five minutes, the computers come back up, everyone breathes a collective sigh of relief, you check in, and the flight, though slightly delayed, departs nearly on time. The pilot apologizes for the delay and tells the passengers, “I think I can make up the time.”
Now, imagine that when you arrive at the airport for check-in, everyone is smiling. The computers are up. Check-ins are being processed, baggage is being checked. The lines are flowing smoothly. But when you reach the check-in desk, the agent tells you, “I’m sorry. I have no record of your reservation. You don’t have a seat on this flight.” You’re angry. She’s sympathetic, but says, “There’s nothing I can do. The flight is full.” You demand a seat on the next flight. But it’s the busiest travel day of the year. All other flights are booked; in fact, oversold. The next available seat is Friday, the day after the party. It turns out that the airline has lost a small amount of data, just a few kilobytes among terabytes of reservation data. Someone at the airline knew there had been a recent data-loss incident, but they didn’t know what was in the lost data. Unfortunately, it was the data that held the record of your reservation. And you miss this once-in-a-lifetime event.
As customers, there is a fundamental difference between the way we perceive downtime and the way we perceive data loss. In most cases, short-term downtime is, at worst, an annoyance. How many times have you walked into a retail store, a restaurant, a bank, a package-shipping service, or an airline check-in desk and heard, “Sorry, the computers are down”? It happens. When computers go down, companies lose productivity. If you are in a hurry and can’t wait or won’t come back, companies may lose a sale. But unless the computer outages are frequent or prolonged, you reluctantly tolerate short-term outages. After all, you can see that the computers are down. Everyone is affected. We’re all in this together. It’s not personal.
Contrast that with data loss. If a company actually lost all of their data, you would know it’s not personal. Just incompetence or a catastrophic event; one that everyone can see. But that’s not the way most data loss happens. Most data is protected. Just not all. So, data loss, more than downtime, becomes personal. That airline lost your reservation. That shipping company lost your package. That bank lost your deposit. Unless you’ve got some physical or electronic receipt, you can’t see or prove data loss. By its very nature, it’s often unseen. You say something happened. They say it didn’t. One of you is lying. And you know it’s them. So when a company loses your data, you lose faith, and they are likely to lose a customer for life. So the next time someone asks you about the value of data, ask them, “What’s the value of a customer for life?”
Looking for articles on data replication and disaster recovery techniques, I discovered an interesting post by Hu Yoshida, VP and CTO at HDS. In it, Hu describes the benefits of the Hitachi Universal Replicator (HUR) and its unique pull capability. Read the blog post for more detail, but I think it is important to review and comment on one statement. Hu wrote:
Combinations of synchronous and asynchronous replication may be used to provide out of region disaster recovery with no data loss.
If you take a look at this image, you will see a four-data-center design. The idea is that an organization that really cares about disaster recovery and zero data loss should have both a Primary Data Center and an In-region Recovery Data Center within synchronous remote-copy distance. The architecture then allows both data centers to replicate, using asynchronous remote copy, to one or more Out-of-region Remote Data Centers.
Disaster recovery architectures are designed to protect against infrequent, but catastrophic events. Looking over short time periods in a specific geographic region, these events are, in fact, infrequent. But looked at over longer time periods, disasters are a near certainty. Now, the assumption that must be met in order to support the claim of zero data loss is that an in-region disaster won’t affect both In-region Data Centers. But if the assumption is not met, data will in fact be lost.
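To make that assumption concrete, here is a minimal sketch in Python of how the data-loss arithmetic works in such a design. It is an illustration only, not anyone’s product code; the class name and the lag parameter are invented. Writes are acknowledged only after the synchronous in-region copy has them, while the asynchronous out-of-region copy trails behind, so “zero data loss” holds only as long as at least one in-region copy survives the disaster.

```python
# Illustrative sketch (invented names, not vendor code): why the zero-data-loss
# claim depends on the in-region synchronous copy surviving a regional disaster.

class ReplicationSketch:
    def __init__(self, async_lag_writes=100):
        self.primary = []                # writes applied at the Primary Data Center
        self.sync_copy = []              # in-region copy, updated before each write is acknowledged
        self.async_copy = []             # out-of-region copy, trails the primary
        self.async_lag_writes = async_lag_writes

    def write(self, record):
        self.sync_copy.append(record)    # synchronous remote copy: no acknowledged write is missing here
        self.primary.append(record)
        # the asynchronous remote copy has only caught up to a lagging point
        behind = len(self.primary) - self.async_lag_writes
        self.async_copy = self.primary[:max(behind, 0)]

    def regional_disaster(self, sync_copy_survives):
        """How many acknowledged writes are lost if the whole region is hit."""
        if sync_copy_survives:
            return 0                     # recover from the in-region copy: zero data loss
        return len(self.primary) - len(self.async_copy)   # only the lagging async copy remains


repl = ReplicationSketch(async_lag_writes=100)
for i in range(1000):
    repl.write(f"txn-{i}")

print(repl.regional_disaster(sync_copy_survives=True))    # 0
print(repl.regional_disaster(sync_copy_survives=False))   # 100 acknowledged writes lost
```

Run it and the point is plain: if the disaster takes out both in-region copies, every acknowledged write that has not yet reached the out-of-region site is gone.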
One might argue that the probability of losing both in-region data centers is very small. But the probability that two data centers will be knocked out of commission by the same man-made or natural disaster is dramatically higher for two close-by, in-region data centers than for two widely separated data centers.
Having two data centers in one region adds substantial cost for little incremental protection. Business continuity does not require a second, in-region Recovery Data Center. The more elegant approach is to provide business continuity capabilities from within the Primary Data Center, continue to use a Remote Data Center for disaster recovery, and protect yet-to-be-transferred data (the most recent snapshot deltas and post-snapshot changed data) in our Phoenix System black box located within the Primary Data Center. The probability of losing data within our black box is near zero. Our approach provides substantially more data-recovery insurance for dramatically less money.
I found a very good article written by Tom Deaderick called “10 Places You Don’t Want a Data Center.” Tom is a Director at OnePartner LLC, which provides high-availability colocation services from the company’s data center in the southwest corner of Virginia. Anyone who is on a site-selection team for a new data center or evaluating new colocation providers should read Tom’s article. OnePartner is doing something right. The company reports having no outages in over 1400 days.
Tom’s #2 place you don’t want a data center is “in a location that suffers from frequent natural disasters.” He includes some useful data on the annual frequency of tornadoes for each state in the United States. Based on a quick glance at the data, you might think you should never build a data center in Texas. The state had an average of 139 tornadoes per year between 1950 and 2004. That’s over 7,600 tornadoes in 55 years. Maryland, on the other hand, had only 6 tornadoes per year over the same period. From a tornado-risk perspective, Maryland is obviously much safer, right? Wrong.
You’ve got to be careful with statistics. Texas, as most Americans know, is the second-largest state in the U.S., with an area of almost 270,000 square miles. Maryland is #42 and covers only 10,455 square miles. So if you calculate the tornado rate per unit of area, Maryland ranks 8th in annual tornado frequency at 5.74 tornadoes per 10,000 square miles, about 10% higher than Texas, which ranks 11th. For the record, Florida is the state with the highest rate, at 9.37 tornadoes per 10,000 square miles.
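Per-area rates are easy to mangle, so here is a quick back-of-the-envelope check in Python of the numbers quoted above. Maryland’s area is the figure cited in this post; for Texas I’ve assumed the commonly cited total area of 268,596 square miles that sits behind the “almost 270,000” figure.

```python
# Quick check of the per-area tornado rates cited above.
states = {
    # state: (average tornadoes per year, area in square miles)
    "Texas":    (139, 268_596),
    "Maryland": (6,   10_455),
}

for state, (per_year, area) in states.items():
    rate = per_year / area * 10_000      # tornadoes per 10,000 square miles per year
    print(f"{state}: {rate:.2f} tornadoes per 10,000 sq mi per year")

# Texas:    5.17 per 10,000 sq mi
# Maryland: 5.74 per 10,000 sq mi  (roughly 10% higher than Texas)
```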
Tom offers 10 important factors to consider when locating a data center. Read the article to get the list, because I don’t want to steal his thunder. But, yes, companies should know the frequency of various types of disasters and obviously avoid known flood plains, airplane take-off and landing paths, and the San Andreas Fault. I wonder if Tom looked at earthquake risk in Virginia. Based on data from the last century, earthquakes there are extremely rare. But, in fact, a significant earthquake occurred in Virginia in August 2011, and there was another, less severe earthquake in the same area just a few days ago. The epicenters of both the August 2011 earthquake and the July 2012 earthquake were almost 350 miles from Tom’s data center. But a much stronger earthquake occurred in southwest Virginia in 1774. I wonder when southwest Virginia will have its next big earthquake. Despite new earthquake-prediction techniques, nobody really knows.
That brings me to my last point. Disasters are, by their nature, simple to track but very difficult to predict. In designing data centers for maximum uptime and minimal data loss, it’s important to protect your data against the disasters you can’t predict.
Early this year, Batley News in the U.K. reported that Cattles Group, a financial services company, was being investigated after the firm lost personal information belonging to a million people, including both customers and employees. You can read the entire article here, but the lost data was on two tapes that went missing. That doesn’t mean the tapes fell into the wrong hands, that the data has actually been accessed by an unauthorized person, or that accounts have been compromised. But under a number of laws in various countries, losing personal information that has been entrusted to an organization is a reportable offense. So Cattles Group notified the police and two other government agencies, and it also notified each of the affected customers and employees.
Despite the continued decline in its use, tape is, in fact, still in use, and there are a number of applications where it remains very valuable and a great technology fit. Two of tape’s historical virtues were that it was removable and transportable, and one tape holds a lot of data. Remember, two tapes held the personal information of a million people. But the fact that tape is removable and transportable is also its liability, so it is not unusual to hear of incidents of lost tapes and, thus, lost data. In fact, there is an entire website, datalossdb.org, devoted to reporting data losses, and you can search the database for data losses associated with tape media.
If the job of the solution is to get your data from one location to another in a secure and cost-effective way, so that you can restore operations after a disaster, I think the improvements in disk-based replication technology, including point-in-time, application-consistent snapshots, data deduplication, and data compression, make it unlikely that tape will survive much longer as a backup medium. Add to that Axxana’s zero-data-loss-over-any-distance capability, and there’s no compelling reason to stay with tape.
I know everyone always says that they are “drowning in data,” but I’m always looking for more. So, I was very happy this week when a very large pile of data landed in my email inbox. The data were the results of research commissioned by EMC and performed by VansonBourne. VansonBourne just released a report based on the research entitled “European Disaster Recovery Survey 2011: Data today gone tomorrow, how well companies are poised for IT Recovery.”
I’ve provided a link to the report in case you want to read it in its entirety, but let me tell you some of what I found interesting. First, there was this:
A quarter of organizations have experienced data loss within the last twelve months.
Hardware failures are the most frequent cause of data loss, at over 60%, and I should probably write more about how Axxana protects against data loss when there is a hardware failure, because we do. Instead, I’ve written a lot about the risk of natural disasters; maybe too much, since natural disasters accounted for only 7% of the reported data losses. It’s just that when a natural disaster occurs, like an earthquake or a flood, the risk to your data can be enormous. Just ask the folks in Japan or Thailand.
More interesting, though, than the causes of data loss were the reported consequences. Here are a few data points from the report on the impact of data loss:
- 43% reported loss of employee productivity
- 28% reported loss of revenue
- 14% reported loss of customers
- 12% reported loss of repeat business
In this fiercely competitive business climate, employee productivity is extremely important in trying to derive profit from revenue. Every deal and every customer is important, and losing repeat business from an existing customer may be the worst outcome of all, since that should be the most profitable.
Even though the average amount of data lost was relatively small at only 400GB, the consequences were significant, which is why we advocate protecting 100% of your data for all applications. When it can be done so cost-effectively, why risk losing productivity, revenue, customers, and repeat sales?
When people first take up the sport of golf, they often believe that distance is everything. They are looking for the long ball, and they want that drive off the tee to be the longest drive possible. After all, the farther the tee shot, the fewer shots they’ll need to get the ball in the cup. Well, not exactly. Anyone who has excelled at the game of golf knows that matches are won and lost on a combination of distance and accuracy.
In the world of disaster recovery it’s exactly the same. You need distance, because if your replication distance is too short, you are likely to have both copies of data in a single disaster zone, exposed to the effects of the same disaster. Distance matters during natural disasters such as the earthquake and tsunami in Japan or the more recent monsoons in Thailand. For better data protection, you want the disaster recovery data copy in another time zone, another geography, and maybe even on another continent, as some of our recent customers are doing.
But just like golf, you also want pinpoint accuracy. In disaster recovery, that means protecting all of the data. Luckily for our customers, we give them long-distance replication at an affordable price, and we make sure that all of the data arrives safely at the recovery site, regardless of the disaster.
Imagine you make cars, and 80 percent of your parts come from 20 percent of your suppliers. The parts are packed in containers and delivered to your manufacturing location on ships. Now imagine there is a disaster, like an earthquake. Your biggest suppliers have great contingency plans that ensure a seamless flow of components, so you can keep making cars. But one of your suppliers, not one of the big 20 percent, is affected and can’t ship parts for several months. Oh, well, it’s not that important. Just apply the 80/20 rule.
The 80/20 rule, also known as the Pareto Principle or Juran’s Pareto Principle, doesn’t always work. The rule originated in an analysis of wealth distribution by the Italian economist Vilfredo Pareto, who estimated that 80% of the wealth in his country was controlled by 20% of the people. Dr. Joseph Juran, a pioneer in quality management, applied Pareto’s analysis to quality management challenges, determining that 20% of the factors account for 80% of an outcome. In manufacturing, this might mean that 20% of your suppliers account for 80% of your output potential, so in supply-chain disaster preparedness, companies logically place the bulk of their focus on the 20% of companies that supply 80% of the parts. Unfortunately, according to Patrick Brennan in his article “Lessons Learned from the Japan Earthquake,” published this summer in the Disaster Recovery Journal, Lesson 1 was “Don’t Apply the 80/20 Rule to Supply Chain Disaster Preparedness.” The 80/20 rule doesn’t work.
When the lack of availability of a $1 part prevents a company from making a $30,000 product, something needs to change.
The same error occurs when attempting to apply the 80/20 rule to the value of data. While it might be convenient to believe that 20% of your data accounts for 80% of the value, the loss of even a small amount of data can have an enormous effect on the output of an analytical process or on the reputation of an organization. Imagine, for example, that a disaster destroys the last three minutes of data, and one of those pieces of data was an email that provided critical evidence to defend against a shareholder claim, or a buy order for fuel in a rising fuel market, or a change to a medication order for a critically ill patient.
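As a purely illustrative sketch (every number below is invented), the toy simulation that follows shows why the 80/20 instinct fails here: when a handful of records carry most of the potential cost and nothing about them looks special in advance, protecting an “important 20%” still leaves most of the exposure in place.

```python
# Toy illustration with invented numbers: the value of individual records is
# unpredictable, so protecting only "the important 20%" leaves the bulk of the
# potential damage unprotected.
import random

random.seed(7)

NUM_RECORDS = 10_000
# Most records are routine; a few (the critical email, the buy order,
# the medication change) would be enormously costly to lose.
cost_if_lost = [1.0] * NUM_RECORDS
for i in random.sample(range(NUM_RECORDS), 5):
    cost_if_lost[i] = 1_000_000.0

# "Protect the important 20%" -- but importance can't be judged in advance,
# so the protected set is effectively arbitrary with respect to true value.
protected = set(random.sample(range(NUM_RECORDS), NUM_RECORDS // 5))

exposed = sum(cost_if_lost[i] for i in range(NUM_RECORDS) if i not in protected)
total = sum(cost_if_lost)
print(f"Share of total loss-cost still exposed: {exposed / total:.0%}")
# Each critical record has an 80% chance of falling outside the protected set,
# so most of the potential damage usually remains unprotected.
```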
You can’t always determine in advance which data will be valuable. Therefore, it is best to provide complete protection for all of your data. If it’s important enough to keep, it’s important enough to protect. Fortunately, we make complete data protection both possible and affordable.