(and the continual changes to your production systems)
Plan B has had a week of disasters – or rather, our customers have had a week of unfortunate IT disasters and we’ve naturally been drafted in to help. One customer’s primary web server went down, another was infected by a cryptolocker attack, a third lost connectivity when the BT cables coming into their building were damaged, a fourth had a terminal server that was blue-screening, and finally a non-customer has been left without IT systems for days after a power loss, when they failed to restore from their Veeam backup. These problems have all been quite simple and straightforward for Plan B to resolve, because we hold a ‘last known good’ copy of each customer’s servers, tested to within 24 hours, that we could deliver to them instantly – or, in the Veeam case, because we have the expertise to restore from their backup for them (which is currently underway).
Where things get a bit trickier, and resolution takes longer, is when you have replication failures and large amounts of data are involved. For one customer, a bug in their continuous replication software meant that one of their servers failed to replicate successfully. It is a large server holding over 6TB of data, and because we manage the customer’s DR service, which includes daily fault management, we noticed the replication failure immediately. With an office move imminent, it is paramount that this server is replicated quickly so the customer can be confident of protection should anything happen during the move.
As with most technologies, this particular continuous replication software has its limitations. If a replication fails, the software subsequently needs to perform a delta sync, which can take over twice as long as a standard replication. With a 6TB machine this is a long time – estimated at around 5-6 days. And here lies the first challenge of continuous replication: if you’ve got lots of data, you’ll need lots of bandwidth. Furthermore, that bandwidth needs to be available 100% of the time for continuous replication to be effective. If any replication takes longer than expected, the next replication will be set back and your RPO will suffer, or your virtual machines may not be replicated at all. This leaves a big vulnerability. So if you’re considering continuous replication for your business, you’ll need to factor in a large, dedicated amount of bandwidth for your DR to work.
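To see why the numbers stack up this way, a simple back-of-envelope calculation is enough. The sketch below is illustrative only – the 100 Mbit/s link speed and the `efficiency` factor are assumptions, not figures from the customer’s environment – but it shows how a 6TB sync over a typical dedicated line lands in the 5-6 day range quoted above:

```python
def transfer_time_days(data_tb: float, bandwidth_mbps: float,
                       efficiency: float = 1.0) -> float:
    """Rough time to push `data_tb` (decimal terabytes) over a
    `bandwidth_mbps` link, scaled by an assumed link efficiency."""
    bits = data_tb * 1e12 * 8                       # terabytes -> bits
    seconds = bits / (bandwidth_mbps * 1e6 * efficiency)
    return seconds / 86400                          # seconds -> days

# 6 TB over a fully dedicated 100 Mbit/s link:
print(round(transfer_time_days(6, 100), 1))  # -> 5.6 (days)

# The same sync when a rolling backup leaves only half the bandwidth free:
print(round(transfer_time_days(6, 100, efficiency=0.5), 1))  # -> 11.1 (days)
```

The second call illustrates the point made below: anything else competing for the line – such as another backup job – stretches the sync proportionally.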
For the customer in question, the replication did take longer than expected. After deeper investigation it transpired that a rolling backup was slowing the replication by consuming available bandwidth – bandwidth that was desperately needed for the replication. When our customer stopped this backup to free up bandwidth, the replication, by then at 85%, suddenly restarted because a change had occurred on the systems. And here lies the second limitation of continuous replication: if a replication is underway and something changes on your systems, the replication will automatically restart without finishing the existing run. Given the number and frequency of changes happening to live systems, replications are likely to be interrupted regularly, which can cause further errors and problems with replication success.
Continuous replication requires careful management to ensure it works successfully and really offers the level of protection you require for your systems. If you are encountering problems, or would like advice on whether continuous replication is right for your business, please don’t hesitate to contact Plan B to speak to our experts.