Insurance brokers suffer losses of £000s a day

Protecting your IT systems against failure is sometimes seen as a game of chance: the bigger your risk appetite (or the smaller the perceived risk), the less likely you are to prepare fully for IT failure. But how likely is it that you will experience a failure, and what is the impact if you do?

The average company experiences downtime at least once every year, so the answer to the first question is that it isn’t a case of if, but of when, you will experience an IT failure. However, a failure could affect just one of your servers, a group of servers or your entire IT systems. The severity of the failure will determine the answer to the second question: the impact of the failure. An IT failure is both financially damaging and brand damaging, and there is no doubt that it will have a detrimental impact if it can’t be dealt with quickly and without significant data loss. Occasionally, everything seems to be stacked against you, and what at the outset is a simple power outage can have catastrophic results. These demand the attention of every senior exec within the business and ultimately cost you a lot of money in lost revenue, lost customers and compensation, as well as damaging your brand reputation for a good period of time.

A recent incident highlights the worst-case scenario: a power failure caused a hardware (disc) failure, which led to an outage of more than 10 days at the SaaS provider SSP. SSP boasts 40% of the UK insurance market and counts 8 of the top 10 UK insurers as customers. Without their business critical systems, insurers were unable to quote or issue insurance policies without reverting to a time-consuming and difficult manual process.

Plan B assesses the impact of the failure:

The IT failure

On 26 August, a power outage in the Solihull area hit a major data centre that was hosting the systems for SSP. The power outage caused damage to discs in their Storage Area Network (SAN) and affected all customers hosted in the data centre, rendering them unable to use their online systems to process quotations or insurance certificates. The provider assured customers that it was working around the clock to get systems back to normal as fast as possible.

On 1 September, customers still had no access to their systems when the SaaS Provider experienced a further hardware failure in its storage facility. This stemmed from the original power failure and set back its timeline for restoring service to the customers affected by the incident.

SSP stated: “The ‘Storage Area Network’ (SAN) is the high volume disk array that is at the centre of the service. It comprises a large number of storage disks, grouped into cassettes, over which the business and configuration data that supports our systems is housed. It was damage to a small number of these disks following the exceptional disruption to power supply that caused the original loss of service.”

“With support from HP we have dealt with the original damage to the disks and were a long way through the process of restoring the environments and data required to support our customers, when the work was interrupted by a recurrence of the original problem.”

This update from the Chief Executive was sent out to customers via their portal.

[Image: managed service alert]

The decision was made, following a thorough weigh-up of pros and cons, to restore the environment in the original data centre, rather than move it to their secondary data centre in London. They hoped to have customers back up and running on 2 September.

On 5 September, however, SSP’s Chief Executive announced that they had decided to move services to an alternative data centre in West London. They were in the process of unpacking all the backup data, having made the decision the previous night.

The statement read: “Since we made the decision in the early hours of this morning [4 September] to restore service to our customers in our alternate site in West London rather than the original Solihull site, we have been focusing on unpacking very large volumes of transaction and other required data, along with application instances. We are focusing on completing the unpacking of data as quickly as possible, so that we can implement systems for the rest of our affected customers. The pace at which we can do this is in part driven by the speed with which the very large volumes of data can be unpacked and made available.”

Data recovery

In addition to recovering the systems themselves to deliver a working platform for the software and services, the SaaS Provider needed to recover all of their customers’ data onto this new platform.

The software house warned that the process of recovering data that was affected by the recurrence would take some time and said it was working on “estimating that impact as a matter of urgency”.

The note continued: “Once again, we apologise for the disruption to the businesses of our affected customers and to the inconvenience that this incident is causing their staff, management and customers. We are giving the resolution work the highest priority within our business.

“We appreciate your frustration at the length of time this is taking to resolve the issue. We are extremely frustrated at not being able to bring our affected customers back to full functionality as quickly as we would hope.”

It is believed that announcements will be made on 6 September to individual customers about the estimated recovery time of their systems, and there are signs that the first customers have regained some access 11 days after the initial outage.

Broken promises?

There’s no doubt that customers will be asking whether the Disaster Recovery promises made to them in the statement below are in line with the actual experience. A customer uncovered the following on SSP’s DR provision:

[Image: DR promise]

So if this was the promise, why could a speedy recovery not be delivered? I’m sure this is the question many customers are asking.

The provider is effectively running a backup solution, rather than a DR solution. Many SaaS Providers offer this: a backup solution with in-house recovery to a secondary data centre when it is required. It comes with long recovery times and no guarantees. You are at their mercy and have no control when their primary IT systems fail. To save costs, SaaS Providers rarely have a second system constantly running with continual data replication and regular testing, because this is an expensive option and the customer would have to pay a premium for it. Instead, the SaaS provider just holds a copy of your data (not a running copy of systems) in a second data centre, which is a much cheaper option. The impact is that if the first data centre is unavailable, they need to build a replica of the production systems in the secondary data centre, then install the operating systems and software, and finally load the data (once unpacked). This all takes time, and will be costing the SaaS provider money for as long as they are using the compute resource in the secondary data centre.
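To make the difference concrete, here is a minimal sketch in Python that adds up the recovery steps described above for a backup-only setup and compares them with failing over to a running replica. Every duration in it is a made-up placeholder, not a figure from SSP or any other provider, and the names `RecoveryStep`, `total_recovery_hours` and `describe` are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStep:
    name: str
    hours: float  # illustrative placeholder, not a measured figure


def total_recovery_hours(steps):
    """Steps run one after another, so the total is a simple sum."""
    return sum(step.hours for step in steps)


def describe(label, steps):
    hours = total_recovery_hours(steps)
    return f"{label}: {hours:.0f} hours ({hours / 24:.1f} days)"


# Backup-only: nothing is running in the second data centre,
# so everything has to be rebuilt before the data can be loaded.
backup_only = [
    RecoveryStep("build replica of production systems", 48),
    RecoveryStep("install operating systems and software", 24),
    RecoveryStep("unpack and load customer data", 96),
]

# DR with a constantly running, replicated second system:
# recovery is mostly a matter of switching over.
running_replica = [
    RecoveryStep("fail over to running, replicated systems", 1),
]

if __name__ == "__main__":
    print(describe("Backup only", backup_only))
    print(describe("Running replica", running_replica))
```

The point is not the specific numbers but the shape of the calculation: with a backup-only approach every step sits on the critical path, so a delay in any one of them (such as unpacking very large volumes of data) pushes out the whole recovery.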

The saga is probably not over yet because the provider will likely want to migrate everything back to their primary data centre to bring costs back down in the long run (which is probably the main reason why they were reluctant to move to their secondary data centre in the first place).

And as the first customers start to regain access to their systems, more issues emerge:

“OK, we now have access to Pure, BUT, there is no historic documaster data and no access to microsoft applications. At least we can quote NB, and process MTA’s and Renewals. No timescale for when we will get the rest of the functionality back. Hope the rest of you guys get your system back soon.”

Business Impact

From a customer viewpoint, there have been various statements made around the impact of the downtime:

    1. “The impact on us as an organisation is that we, in certain parts of the business, are unable to access client records. We’re unable to transact commodity business which relies on EDI transmission, mainly on the personal lines and the SME side of it.”
    2. “We’re unable to transact that business and we’re having to rely on insurers. Working with insurers in the old fashioned method of engaging with them over the phone and trying to look after the customers as much as possible. It is a wholly unacceptable situation and I’m not very happy about it.”
    3. “If someone needed to do a change of vehicle from today, if someone needed to insure a vehicle from today, someone’s renewal was due today and we couldn’t do it – I can’t predict how they might react, they might want to go somewhere else,”
    4. “So it’s a loss of business and obviously it reflects badly on us. They call us expecting a quote and we can’t do the very basic of offering them one.”

SSP itself will undoubtedly face a loss of customers, requests for compensation and a large financial cost of recovery. All in all, not a good couple of weeks for the provider or their customers.

Social

Customers have also been making their feelings heard through social media:

  • “You don’t deserve to be in business”
  • “our ‘dedicated account manager’ hasn’t returned our calls and your portal updates are meaningless”
  • “please stop the cover up and lies and tell us exactly what is going on, I won’t accept this bs anymore!”
  • “30 yrs in business, have faced many challenges but none as big as this. This won’t happen again! You have failed us all”
  • “X have fundamentally failed in their duty to provide a system which is robust enough to keep brokers’ systems operational and functioning so that we can trade.”
  • “Despite the fact that I’m losing £000’s each day because of @xxx……”

Forums

Forums have also been created by broker customers of SSP. These give brokers a place to discuss, amongst other things, the possibility of moving to a new supplier:

“I’m worried this issue is going to continue well into next week. It would appear they have a major problem. My biggest worry is that they have lost our data. If and when we get back online are SSP going to compensate us? I am going to have to pay my staff overtime just to clear the backlog. We have also turned away numerous new business enquiries. Does anyone know what the service agreement states SSP will do in such a situation? With regards to moving to another supplier I’m sure there are plenty of brokers thinking about this. After changing from YY to Pure I’m not sure I’d want to go through another software change, that said I am losing patience and confidence in SSP.”

“I have concerns that SSP have yet to give any assurances that Data has not been compromised. Does anyone else feel this is strange given the circumstances?”

Customers seeking solace in each other and openly discussing preferred suppliers is potentially very damaging to your business and something that should be addressed with proper communication urgently.

Where did the problems lie?

Once things have settled down and SSP goes through a post-incident debrief, certain questions need to be asked, mainly around its Disaster Recovery provisions. There is a simple solution for SSP, and that is to significantly improve its DR solution. A second data centre location with backup data simply isn’t enough for a SaaS Provider that is providing business critical operations to its customers. However, customers should be educated to ask the right questions around recovery time and data loss guarantees, for it is only customer demand that will enforce better IT availability and business resilience against future failures. One customer has come out on Twitter seeking answers:

[Image: SSP questions]

It is worth customers asking the above questions of all their SaaS Providers, in addition to:

  • What recovery time guarantees do you offer?
  • How will you minimise data loss?

If you are not happy with the answers, think carefully about whether they are the right provider for you.
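One way to make that judgement less subjective is to write down what your business can tolerate and compare it with what the provider actually guarantees. The Python sketch below does this for the two questions above. It is only an illustration: the class `ProviderAnswers`, the function `gaps` and all of the numbers are hypothetical, and a missing guarantee is represented as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderAnswers:
    # None means the provider offers no guarantee at all.
    recovery_time_hours: Optional[float]
    max_data_loss_hours: Optional[float]


def gaps(answers: ProviderAnswers,
         needed_recovery_hours: float,
         tolerable_data_loss_hours: float) -> List[str]:
    """List where the provider's answers fall short of what you need."""
    problems = []

    if answers.recovery_time_hours is None:
        problems.append("no recovery time guarantee")
    elif answers.recovery_time_hours > needed_recovery_hours:
        problems.append(
            f"recovery time guarantee of {answers.recovery_time_hours:g}h "
            f"exceeds needed {needed_recovery_hours:g}h"
        )

    if answers.max_data_loss_hours is None:
        problems.append("no data loss guarantee")
    elif answers.max_data_loss_hours > tolerable_data_loss_hours:
        problems.append(
            f"data loss guarantee of {answers.max_data_loss_hours:g}h "
            f"exceeds tolerable {tolerable_data_loss_hours:g}h"
        )

    return problems


if __name__ == "__main__":
    # Hypothetical provider: no recovery time guarantee, nightly backups only.
    provider = ProviderAnswers(recovery_time_hours=None, max_data_loss_hours=24)
    for problem in gaps(provider, needed_recovery_hours=8,
                        tolerable_data_loss_hours=1):
        print(problem)
```

An empty list means the guarantees on offer meet the limits you set; anything else is a point to raise with the provider before you sign, or a reason to look at an independent DR arrangement.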

As a customer you have the right to ask SaaS Providers about Escrow services which can offer continuity should your SaaS provider encounter an IT failure, or even go bust. Alternatively, sourcing your own independent DR solution will enable you to choose the recovery guarantee that you require for your business critical systems. View our guide to DR solutions for further reading.

If you’d like to discuss a recent incident with Plan B, we are happy to offer free consultancy and advice. Please call us on 08448 707999 or email info@planb.co.uk.