April was a busy month for IT disaster recovery. The key event was the Holborn fire, which crippled many businesses, but there were also major incidents at Bloomberg, who claimed that a hardware failure was at fault for their loss of systems, and at Starbucks, who had to give away free coffee nationally while they were without business-critical systems.
So why is it that leading brands with large IT budgets are still suffering from IT downtime in this day and age? Bloomberg should be especially risk-averse due to the nature of their business, and Starbucks would have seen a big impact on their revenue, which could have been avoided with a more resilient system. The answer probably comes down to priority and, to some extent, market awareness.
What we have learned from our years in the business is that the following two factors are the biggest drivers behind the choice of a disaster recovery solution:
- How well does it protect the business?
- Does it fit within the budget?
Of course there are other elements to consider, such as security and compliance, but for now let’s focus on the two main ones. If decisions are made largely on the performance of a DR solution, why are these companies getting it so wrong? Companies should have their pick of disaster recovery offerings that provide highly reliable resilience. Unfortunately, that’s just not the case, and we’re hearing all too regularly of IT downtime.
Protecting your business
If you outsource your disaster recovery, chances are you may be a little confused by the technologies and all the marketing claims out there. Whether it’s physical or virtual standby, replication, cloud disaster recovery services or online backup that you’ve opted for, there is still a risk that recovery will not go to plan. The reason is that even if you have a standby solution, the testing regime won’t be disciplined enough to guarantee recovery. The only time you can really know that your disaster recovery solution is working is when you’ve just tested it. IT departments are constantly making changes to their IT systems, whether it’s moving a database or adding an application, which means your standby DR system needs to be updated with every change. Because this is rarely done immediately, you may find databases or applications missing when you come to recover your IT systems. Put simply, a DR system isn’t prioritised as much as a live system, so it’s never going to perform as well when you come to use it.
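To make the drift problem concrete, here is a minimal sketch of a check that compares what is running on the live system with what the standby holds. Everything in it is an assumption for illustration: the `find_drift` helper, the inventory format (component name mapped to a version label) and the component names are hypothetical, not part of any particular DR product.

```python
def find_drift(live, standby):
    """Compare two inventories (component name -> version label).

    Returns the components that exist on live but are missing from
    standby, and those present on both but at different versions.
    """
    missing = sorted(name for name in live if name not in standby)
    stale = sorted(
        name for name in live
        if name in standby and standby[name] != live[name]
    )
    return {"missing": missing, "stale": stale}


# Hypothetical inventories, invented for this example.
live = {"orders_db": "v7", "crm_app": "v3", "reporting_app": "v1"}
standby = {"orders_db": "v6", "crm_app": "v3"}

drift = find_drift(live, standby)
print(drift)  # {'missing': ['reporting_app'], 'stale': ['orders_db']}
```

A check like this only tells you that the standby has fallen behind; it does not prove that recovery works, which is why it complements testing rather than replacing it.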
If you manage your DR in-house, the limiting factor on its performance is going to be your maintenance and testing. Most companies test their DR systems once a year, which just isn’t often enough, as the likes of Starbucks and Bloomberg have proven. You’re never going to stop your hardware from failing, but you will speed up the recovery time if you prepare well and test every 24 hours. But testing costs money and time, which is considered better spent working on the live systems to gain competitive advantage. It comes down to priority.
So what’s the answer?
There is only one way to guarantee your recovery, and that’s by keeping a standby system that you test daily. If you test your DR solution every 24 hours, then you know that you will be able to recover to a point no more than 24 hours old. Often, outsourcing is the only way you can prioritise testing and keep it disciplined at this frequency.
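The 24-hour reasoning above can be sketched as a simple freshness check: the age of your last successful DR test is the age of your last proven recovery point. This is only an illustration under stated assumptions; the function names, the 24-hour constant and the timestamps are all made up for the example.

```python
from datetime import datetime, timedelta

# Assumed policy for this sketch: a successful DR test every 24 hours.
TEST_INTERVAL = timedelta(hours=24)


def verified_recovery_age(last_successful_test, now):
    """How old the last proven-recoverable state is."""
    return now - last_successful_test


def is_test_overdue(last_successful_test, now, interval=TEST_INTERVAL):
    """True when the last successful test is older than the policy allows."""
    return verified_recovery_age(last_successful_test, now) > interval


# Hypothetical timestamps, invented for this example.
now = datetime(2015, 5, 1, 9, 0)
tested_daily = datetime(2015, 4, 30, 22, 0)   # tested last night
tested_yearly = datetime(2014, 6, 1, 22, 0)   # tested once, last year

print(is_test_overdue(tested_daily, now))    # False
print(is_test_overdue(tested_yearly, now))   # True
```

With a daily test, the proven recovery point is never more than 24 hours old; with an annual test, it can be up to a year old.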
Businesses that experience downtime of more than one hour in an incident don’t have a standby system that’s tested every 24 hours. They’re therefore putting the business at risk, but it is a risk they can easily reduce.